How to fix a preg_match bug

I wanted to parse the host of a URL with a regular expression to get its third-level domain {[.pattern | 1.hilite(=php=)]}

Let’s test the general case with http://www.dnr.state.oh.us/ {[.example-1 | 1.hilite(=php=)]}

array(2) {
  [0]=>
  string(19) "www.dnr.state.oh.us"
  [1]=>
  string(5) "state"
}

Pretty good. And now let’s test the edge case with http://google.com/ {[.example-2 | 1.hilite(=php=)]}

array(1) {
  [0]=>
  string(10) "google.com"
}

WTF, where is my empty submatch? Since when is an optional submatch not a submatch just because it’s empty?
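The behaviour can be reproduced with much smaller patterns. The two patterns below are hypothetical and are not the pattern from this post; they only assume that preg_match drops unmatched groups from the end of the result, while an unmatched group followed by a matched one is reported as an empty string.

```php
<?php
// Hypothetical pattern for illustration only (not the pattern from this post):
// group 2 is optional and is the last group in the pattern.
$trailing = '/^(\w+)(?:\.(\w+))?$/';

preg_match($trailing, 'example.com', $both);   // both groups take part in the match
preg_match($trailing, 'example', $onlyFirst);  // group 2 does not take part

// Second hypothetical pattern: the optional group comes first.
$leading = '/^(a)?(b)$/';
preg_match($leading, 'b', $middle);

print_r($both);      // 3 elements
print_r($onlyFirst); // 2 elements: the trailing unmatched group is missing
print_r($middle);    // 3 elements: the unmatched group 1 is kept as ''
```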

I googled it and found that a bug has already been filed. The chosen resolution was “won’t fix”!! They cite backward compatibility, but I cannot imagine how fixing it would break any older code. Consider:

  • If I expect 3 submatches from my pattern but get 2, then I know (because of the bug) that the missing submatch is the last one and that it’s an empty string. So I add it to the submatches array myself. Would a programmer do anything different to work around this bug?
  • If the bug were fixed globally, my old code would always get 3 submatches from that pattern. My individual fix would never be triggered, and since the last submatch would have the same value (an empty string) as the one my fix would have added, nothing would break. I’d just be left with a bit of stale, unused code.
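The individual fix described in these two points could be sketched as follows. The helper name pad_submatches and its $groups parameter are assumptions made for this example, and it only handles numbered groups. If preg_match already returned every group, the helper changes nothing, which is the point of the second bullet.

```php
<?php
// Hypothetical helper (name and $groups parameter are assumptions):
// fill in numbered groups that preg_match dropped from the end of $matches.
function pad_submatches(array $matches, int $groups): array
{
    if ($matches === []) {
        return $matches;              // no match at all: nothing to pad
    }
    for ($i = 1; $i <= $groups; $i++) {
        if (!array_key_exists($i, $matches)) {
            $matches[$i] = '';        // the empty submatch the caller expected
        }
    }
    return $matches;
}

// Hypothetical pattern with three groups, two of them optional.
preg_match('/^(\w+)(?:\.(\w+))?(?:\.(\w+))?$/', 'example', $m);
$m = pad_submatches($m, 3);           // $m now has indexes 0 to 3
```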

To cleanly fix it myself once and for all, I’ve written a wrapper, ando_preg_match, that has the same signature and returns the expected results.

EDIT: There were some bugs in my own fix for the preg_match bug. For the code, please see the new post.

In the edge case I now get

array(2) {
  [0]=>
  string(10) "google.com"
  [1]=>
  string(0) ""
}

Unfortunately the wrapper is more complex than I’d like, because PHP allows regular expressions with named groups and supporting them requires a lot of additional code. Anyway, I’ve been able to do it all in a single function that can easily be dropped into any project.
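For comparison, here is a minimal sketch of a different approach. It is not the ando_preg_match wrapper from this post. The function name preg_match_padded is hypothetical, and the sketch assumes a valid pattern with no leading whitespace and no start-of-pattern verbs such as (*UTF8). It appends an empty group that always takes part in a match, so that no real group is ever the trailing one, and then removes that extra group from the result.

```php
<?php
// Hypothetical alternative, NOT the post's ando_preg_match: force every group
// to be reported by appending an empty group that always takes part in a match.
function preg_match_padded(string $pattern, string $subject, ?array &$matches = null, int $flags = 0, int $offset = 0)
{
    $pairs = ['(' => ')', '[' => ']', '{' => '}', '<' => '>'];
    $open  = $pattern[0];                   // assumes no leading whitespace
    $close = $pairs[$open] ?? $open;
    $end   = strrpos($pattern, $close);     // modifiers are letters, so this is the closing delimiter
    $body  = substr($pattern, 1, $end - 1);
    $mods  = substr($pattern, $end + 1);

    // With the x modifier, a newline ends any trailing "# comment" in the body.
    $tail = strpos($mods, 'x') !== false ? "\n" : '';

    // (?:...) keeps top-level alternations together; () is the sentinel group.
    $padded = $open . '(?:' . $body . $tail . ')()' . $close . $mods;

    $result = preg_match($padded, $subject, $matches, $flags, $offset);
    if ($result === 1) {
        array_pop($matches);                // the sentinel is always the last element
    }
    return $result;
}

// Hypothetical pattern with an optional named group.
preg_match_padded('/^(?:(?<sub>\w+)\.)?\w+\.\w+$/', 'google.com', $named);
var_dump($named);
```

Because the sentinel is an unnamed group at the very end, it is also the last element when the pattern uses named groups, so a single array_pop is enough.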

Here is a test with a pattern with named groups, just in case you were wondering what it looks like {[.example-3 | 1.hilite(=php=)]}

array(3) {
  [0]=>
  string(10) "google.com"
  ["subdomain"]=>
  string(0) ""
  [1]=>
  string(0) ""
}

Actually, this last example allows me to show that my wrapper really is returning the expected result. In fact, just by adding a final non-empty group to the previous pattern, the original, buggy preg_match works just fine {[.example-4 | 1.hilite(=php=)]}

array(4) {
  [0]=>
  string(10) "google.com"
  ["subdomain"]=>
  string(0) ""
  [1]=>
  string(0) ""
  [2]=>
  string(10) "google.com"
}

Of course you’ll get the same result using the wrapper.
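The same effect can be seen with a stand-in pattern. The pattern below is hypothetical, not the one used in this post. It assumes only that a group which takes part in the match, placed after the optional one, keeps the empty submatch in the result of the plain preg_match.

```php
<?php
// Hypothetical pattern (not the one used in this post): an optional named
// group followed by a final group that always takes part in a match.
$pattern = '/^(?:(?<subdomain>\w+)\.)?(\w+\.\w+)$/';

preg_match($pattern, 'google.com', $withoutSubdomain);
preg_match($pattern, 'www.google.com', $withSubdomain);

var_dump($withoutSubdomain); // 4 elements, 'subdomain' and [1] are ''
var_dump($withSubdomain);    // 4 elements, 'subdomain' and [1] are 'www'
```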

How to fix Fatal error: Exception thrown without a stack frame

From time to time I get the infamous

Fatal error: Exception thrown without a stack frame in Unknown on line 0

but while googling it yesterday I found this article, which describes a method for seeing through it and discovering the real error. It’s very simple yet powerful (and it works).

The key trick is

  1. find the line N where your script is misbehaving. This is tedious but easily done with a binary search over the execution path. At the beginning, the execution path is that of the whole script.
    • find an approximate midpoint of the current execution path
    • put an exit statement there and run the script again
    • do you get the same error?
      • YES: the executed half becomes the next execution path to search
      • NO: the non-executed half becomes the next execution path to search
    • remove the exit
    • repeat until an exit on line N is clean and an exit on line N+1 is dirty
  2. look at what you have on line N and try to figure out how to force your script to perform explicitly, right there, the same operations that are done automatically at shutdown
    • in my case yesterday, like the author of the article, I had a session_start on line N, so I added a session_write_close() on line N+1 and an exit on line N+2. Magically, the real error message was displayed!!
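The session case from step 2 can be wrapped in a small probe. This is only a sketch: the function name probe_shutdown and its $halt parameter are hypothetical, and it assumes the hidden error is raised while the session is written.

```php
<?php
// Sketch only: probe_shutdown() is a hypothetical name, and the session
// calls mirror the case described above.
function probe_shutdown(bool $halt = true): int
{
    session_write_close();        // "line N+1": do now what shutdown would do later
    $status = session_status();   // PHP_SESSION_NONE once no session is active
    if ($halt) {
        exit;                     // "line N+2": stop so later code cannot mask the error
    }
    return $status;
}

// Usage while debugging:
//   session_start();     // "line N", the last clean line found by the bisection
//   probe_shutdown();
```

Passing false for $halt lets the script continue, which can be handy when the probe is moved around during the binary search.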

Translating a string from PHP to JSON

Based on my understanding of this subject, I’ve come up with the following function for translating a string from PHP to JSON, strictly conforming to RFC 4627.

{[ .json_string | 1.hilite(=php,ln-1=) ]}
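As an independent illustration of the escaping rules in RFC 4627 (section 2.5), and not the function from this post, a minimal sketch might look like the following. The name json_string_sketch is hypothetical, and the input is assumed to be valid UTF-8. Only the quotation mark, the backslash and the control characters U+0000 to U+001F must be escaped; everything else may pass through unchanged.

```php
<?php
// Hypothetical sketch, assuming $s is valid UTF-8.
function json_string_sketch(string $s): string
{
    // Short escapes defined by RFC 4627; other control characters use \uXXXX.
    $short = [
        "\x08" => '\b', "\x0C" => '\f', "\n" => '\n', "\r" => '\r', "\t" => '\t',
        '"'    => '\"', '\\'   => '\\\\',
    ];

    // Match control characters, the double quote and the backslash, byte by byte.
    // Bytes of multi-byte UTF-8 sequences are all >= 0x80, so they are left alone.
    $escaped = preg_replace_callback(
        '/[\x00-\x1f"\\\\]/',
        function (array $m) use ($short): string {
            return $short[$m[0]] ?? sprintf('\u%04x', ord($m[0]));
        },
        $s
    );

    return '"' . $escaped . '"';
}

echo json_string_sketch("a null: \0; a new line: \n;"), "\n";
```

The sketch leaves the forward slash unescaped, which the RFC allows.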

A simple test like this
{[ .test | 1.hilite(=php,ln-1=) ]}

yields (in comparison to the _encodeString method of the Zend_Json_Encoder class of Zend Framework)

Zend_Json_Encoder::_encodeString: Array
(
    [0] => "a null: ; a new line: \n; a carriage return: \r;"
    [1] => "a js regex: /([\"'])\\w+\\1/"
    [2] => "a script element: <script type=\"test/javascript\" src=\"http://example.com/all.js\"></script>"
    [3] => "a japanese word: \u307f\u305a"
)
json_string: Array
(
    [0] => "a null: \u0000; a new line: \n; a carriage return: \r;"
    [1] => "a js regex: /([\"'])\\w+\\1/"
    [2] => "a script element: <script type=\"test/javascript\" src=\"http://example.com/all.js\"></script>"
    [3] => "a japanese word: みず"
)