Andrea Ercolino, Software Engineer
I wanted to parse the host of a URL with a regular expression to get its third-level domain.
Let’s test the general case, http://www.dnr.state.oh.us/, which gives:
array(2) { [0]=> string(19) "www.dnr.state.oh.us" [1]=> string(5) "state" }
Pretty good. Now let’s test the edge case, http://google.com/, which has no third-level domain:
array(1) { [0]=> string(10) "google.com" }
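The code for these two tests is not reproduced here. As a rough sketch only, the following assumes a hypothetical pattern (not necessarily the original one) with an optional capturing group for the third-level label, anchored so that it matches the whole host. With it, preg_match should produce the same two dumps as above.

```php
<?php
// Hypothetical pattern (not the author's original): any number of leading
// labels (lazily), then an OPTIONAL captured third-level label, then the
// last two labels of the host.
$pattern = '/^(?:[^.]+\.)*?(?:([^.]+)\.)?[^.]+\.[^.]+$/';

// General case: the third-level label "state" ends up in group 1.
$host = parse_url('http://www.dnr.state.oh.us/', PHP_URL_HOST);
preg_match($pattern, $host, $general);
var_dump($general);

// Edge case: group 1 never takes part in the match and preg_match()
// leaves it out, so $edge only holds the full match at index 0.
$host = parse_url('http://google.com/', PHP_URL_HOST);
preg_match($pattern, $host, $edge);
var_dump($edge);
```

The lazy `*?` matters in this sketch: with a greedy repetition, the engine would end up skipping the optional group and capture nothing even in the general case.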
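WTF, where is my empty submatch? Since when is an optional submatch not a submatch just because it’s empty? (Strictly speaking, the group did not take part in the match at all, and preg_match leaves out any such group that comes after the last group that did.)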
I googled it and found that a bug had already been filed. Its resolution was “won’t fix”! The reason given is backward compatibility, but I cannot imagine how fixing it would break older code.
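The inconsistency can be isolated with two tiny patterns, made up here purely for illustration: preg_match fills `$matches` only up to the last group that actually took part in the match. An unset optional group therefore reads as `""` when a later group matched, and disappears when it is the last group.

```php
<?php
// Group 1 is optional and does not take part in the match, but group 2 does,
// so preg_match() reports group 1 as an empty string.
preg_match('/(x)?(com)$/', 'google.com', $inner);
var_dump($inner);

// Same groups in the opposite order: now the unset optional group is the
// last one, and preg_match() leaves it out of $matches altogether.
preg_match('/(com)(x)?$/', 'google.com', $trailing);
var_dump($trailing);
```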
So I wrote a wrapper around preg_match that fills in the missing groups. With it, in the edge case I now get:
array(2) { [0]=> string(10) "google.com" [1]=> string(0) "" }
Unfortunately the wrapper is more complex than I’d like, because PHP allows regular expressions with named groups, and supporting them requires a lot of additional code. Anyway, I’ve managed to fit it all into a single function that can easily be dropped into any project.
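The wrapper itself is not reproduced here. For comparison, this is a much shorter sketch of the same idea, and not the author’s implementation: it leans on the fact that preg_match_all in `PREG_PATTERN_ORDER` mode pads every group’s array with `""` when trailing groups are unset, and it keeps named keys in place. The function name `preg_match_padded` is hypothetical, and the code assumes PHP 8 (for the `int|false` return type).

```php
<?php
/**
 * Hypothetical wrapper (not the author's code): like preg_match(), but every
 * capturing group, numbered or named, is present in $matches, with "" for
 * groups that did not take part in the match.
 *
 * Trade-off: preg_match_all() scans the whole subject for every match,
 * even though only the first one is kept.
 */
function preg_match_padded(string $pattern, string $subject, ?array &$matches = null): int|false
{
    $count = preg_match_all($pattern, $subject, $all, PREG_PATTERN_ORDER);
    if ($count === false) {
        $matches = [];
        return false; // invalid pattern or PCRE error
    }
    if ($count === 0) {
        $matches = [];
        return 0; // no match, same contract as preg_match()
    }
    // Keep only the first match of each group; array_map() with a single
    // array preserves both integer and named keys, in their original order.
    $matches = array_map(static fn(array $group): string => $group[0], $all);
    return 1;
}

// Usage with a hypothetical pattern that has an optional third-level group:
preg_match_padded('/^(?:[^.]+\.)*?(?:([^.]+)\.)?[^.]+\.[^.]+$/', 'google.com', $m);
var_dump($m); // [0 => "google.com", 1 => ""]
```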
Here is a test with a pattern that uses a named group, in case you were wondering what the result looks like:
array(3) { [0]=> string(10) "google.com" ["subdomain"]=> string(0) "" [1]=> string(0) "" }
This last example also lets me show that my wrapper really returns the expected result. Just by adding to the previous pattern a final group that always matches, the original, buggy preg_match works just fine:
array(4) { [0]=> string(10) "google.com" ["subdomain"]=> string(0) "" [1]=> string(0) "" [2]=> string(10) "google.com" }
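A sketch of that check with plain preg_match, again using a hypothetical pattern rather than the original one: the extra group 2 captures the last two labels, so it is set on every successful match and the unset named group is no longer trailing.

```php
<?php
// Hypothetical named-group pattern, plus a final group (2) that captures the
// last two labels and therefore takes part in every successful match.
$pattern = '/^(?:[^.]+\.)*?(?:(?<subdomain>[^.]+)\.)?([^.]+\.[^.]+)$/';

// Plain preg_match(): because group 2 is set, the unset "subdomain" group
// (also index 1) is reported as an empty string instead of being dropped.
preg_match($pattern, 'google.com', $matches);
var_dump($matches);
```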
Of course you’ll get the same result using the wrapper.