Chili 1.3 Released Today

UPDATE: Chili 1.4 has been released

Changes

  • Fixed a bug in the formula for computing the number of submatches of a regular expression: parentheses that are not explicitly escaped inside a character class are taken literally by default, but Chili didn’t account for them.
  • Replaced the explicit MIT Licence with a link to it.
  • Added “?:” to many parenthesized expressions (inside recipes) that were being used just for grouping.
  • Removed the trace helper. Try the Firebug Lite console instead.
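Every capturing group counts as a submatch, so adding “?:” to an expression that is only used for grouping removes one submatch from the total. A quick way to check this, sketched in JavaScript with a hypothetical helper named groupCount, is to let the engine do the counting: appending an empty alternative makes any pattern match the empty string, and the array returned by exec holds the global match plus one slot per capturing group.

```javascript
// Hypothetical helper: counts capturing groups by letting the engine do it.
// The extra empty alternative "|" guarantees a match against "", and
// exec() returns the global match plus one entry per capturing group.
function groupCount(re) {
  return new RegExp(re.source + "|").exec("").length - 1;
}

console.log(groupCount(/(a|b)c/));   // 1: the group captures
console.log(groupCount(/(?:a|b)c/)); // 0: grouping only
console.log(groupCount(/[(]x(y)/));  // 1: "(" inside [...] is literal
```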

Files

  • download all in a zip
  • read the manual
  • test Chili highlighting Chili
  • test Chili highlighting jQuery
  • test Chili highlighting PHP

AdSense Enzymes

AdSense Enzymes are very simple.

At the simplest level of abstraction, I can directly transclude the custom field in which I’ve stored the ad code. I can do it either by means of a statement in the content of a post or a page, or by means of a call to the metabolize function (available in Enzymes 1.1) in the PHP code of a WordPress template file.

The former method is useful when I want to place an ad unit in a particular/variable position inside the content of a post or a page; the latter method is useful when I want to place an ad unit in a general/constant position inside the blog.

For example, if I put the statement {[1.ad001]} here, it would reproduce the ad unit right here, because in the first post I’ve stored the ad code in a custom field called ad001. But to make the ad unit appear before the posts, I need to find the line in the index.php file of my default theme that reads

<?php if (have_posts()) : ?>

and replace it with this line

<?php metabolize( "{[1.ad001]}" ); if (have_posts()) : ?>

Taking the abstraction one step further, I’d like to store the string 1.ad001 in a new custom field called home. This provides a separation layer that lets me replace the ad code simply by changing the content of a field, rather than by editing the PHP file again.

Indirect transclusion is not directly available in Enzymes, but it can easily be achieved by means of a simple enzyme like this:

preg_match( '/'.$this->e['substrate'].'/', $this->substrate, $matches );
return $this->item( $matches['sub_id'], $matches['sub_key'] );

I’ve called this enzyme get, and I’ve put it into the first post. So the Enzymes statement becomes {[1.get(1.home)]} and the edited line for the index.php file becomes

<?php metabolize( "{[1.get(1.home)]}" ); if (have_posts()) : ?>

I’m currently using the latter for my blog home, so I don’t have to worry about placing ads every time I publish a new post, and the former for my pages, so that I can place the ads inside the content, in a position that I hope will fit better.
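The lookup chain behind {[1.get(1.home)]} can be sketched in JavaScript, with a plain object standing in for the custom fields of the first post. Everything here is hypothetical: fields, item and get only mimic the roles of the custom fields, $this->item and the get enzyme, and the substrate pattern is a guess at what $this->e['substrate'] matches. The sketch also assumes, as the PHP code suggests, that the enzyme receives the content of 1.home (the string 1.ad001) as its substrate.

```javascript
// Hypothetical stand-in for custom fields: post id -> field key -> value.
const fields = {
  1: {
    ad001: "<!-- ad code for unit 001 -->",
    home: "1.ad001", // the separation layer: points at the ad field
  },
};

// Direct transclusion, like {[1.ad001]}: return the content of one field.
function item(id, key) {
  return fields[id]?.[key] ?? "";
}

// Assumed substrate syntax: a post id, a dot, a field key (e.g. "1.ad001").
const SUBSTRATE = /^(?<sub_id>\d+)\.(?<sub_key>\w+)$/;

// Sketch of the get enzyme: parse the substrate into an id and a key,
// then transclude the field they point at.
function get(substrate) {
  const m = SUBSTRATE.exec(substrate);
  return m ? item(m.groups.sub_id, m.groups.sub_key) : "";
}

// {[1.get(1.home)]}: 1.home is read first, then get resolves what it holds.
console.log(get(item(1, "home"))); // <!-- ad code for unit 001 -->
```

Pointing home at a different field (say, a new ad002) would change the ad wherever the statement is used, without touching index.php.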

Why does Google limit the number of ad units per page to three? Is there a technical reason?

Number of submatches of a regular expression

Chili development started because I found a bug at the core of Code Highlighter, and wanted to fix it. The bug was inside the snippet used to count the number of submatches of a given regular expression.

That number is central to the working of a clever parsing engine, which matches one big expression against the target once, rather than matching many smaller expressions many times.

The big expression is built by alternating the smaller ones, so that each of them becomes a submatch of the big expression, like in this example: (A)|(B)|(C).

But that is just the tip of the iceberg, because each of the smaller expressions can in turn have many submatches, which add up to the total number of submatches returned in an array as the result of the big match.

If a match is found, then submatches[0], which holds the global match, is certainly not empty, and neither is submatches[x], where x is the index of the submatch belonging to the first smaller expression that matched.

In the above example, if A, B, and C had no submatches of their own, then submatches[1] would be non-empty if A matched, else submatches[2] would be non-empty if B matched, else submatches[3] would be non-empty if C matched.
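For the simple case where A, B, and C have no groups of their own, the detection can be sketched in JavaScript as follows. The three smaller expressions are made up for illustration. Note that JavaScript reports a submatch that did not take part in the match as undefined rather than as an empty string.

```javascript
// Three smaller expressions without capturing groups of their own
// (illustrative: a number, a word, a run of whitespace).
const parts = ["\\d+", "[a-z]+", "\\s+"];

// Build the big expression (A)|(B)|(C); each part becomes submatch 1, 2 or 3.
const big = new RegExp(parts.map((p) => "(" + p + ")").join("|"), "g");

// For each match, report the text and the 0-based index of the part that matched.
function whichMatched(target) {
  const result = [];
  let m;
  while ((m = big.exec(target)) !== null) {
    // Exactly one of m[1], m[2], m[3] is defined: the one whose part matched.
    const x = m.findIndex((s, i) => i > 0 && s !== undefined);
    result.push([m[0], x - 1]);
  }
  return result;
}

console.log(whichMatched("abc 42")); // → [["abc", 1], [" ", 2], ["42", 0]]
```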

But if A, B, and C had nA, nB, and nC submatches respectively, then x would be 1 for (A), (1+nA)+1 for (B), and (1+nA)+(1+nB)+1 for (C).

Now we have all the information required to detect which smaller expression matched the target: look at submatches[1], else at submatches[2+nA], else at submatches[3+nA+nB]. The first of them that is non-empty belongs to the expression that matched.
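When the smaller expressions do have groups of their own, the indexes must be shifted by the formula above. The JavaScript sketch below takes the counts nA, nB, nC as given by hand (the sample expressions and counts are illustrative) and computes each part's index as 1 plus the sizes of the parts before it.

```javascript
// Smaller expressions that contain groups of their own, with their
// submatch counts supplied by hand (illustrative values).
const parts = [
  { re: "(\\d+)\\.(\\d+)", n: 2 }, // A: nA = 2
  { re: "([a-z])[a-z]*", n: 1 },   // B: nB = 1
  { re: "\\s+", n: 0 },            // C: nC = 0
];

// Index of each part's wrapping group in the big expression:
// 1, (1+nA)+1, (1+nA)+(1+nB)+1, ...
const offsets = [];
let next = 1;
for (const p of parts) {
  offsets.push(next);
  next += 1 + p.n; // the wrapping group plus the part's inner groups
}

const big = new RegExp(parts.map((p) => "(" + p.re + ")").join("|"));

// Look at submatches[1], else [2+nA], else [3+nA+nB]: the first defined one wins.
function matchedPart(target) {
  const m = big.exec(target);
  if (m === null) return -1;
  return offsets.findIndex((x) => m[x] !== undefined);
}

console.log(offsets);            // → [1, 4, 6]
console.log(matchedPart("pi"));  // → 1 (B matched)
```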

The number of open parentheses is related to the number of submatches. However, it is not exactly that number, due to the following exceptions:

  • parentheses are also used just for grouping, as in (?:...)
  • parentheses can be escaped, and considered part of the target
  • the escaping device can be escaped
  • parentheses inside a character class are part of the target even when not escaped
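To see why simply counting open parentheses is not enough, the JavaScript sketch below compares that naive count with the number of groups reported by the engine itself, for one sample pattern per exception, including an unescaped parenthesis inside a character class. Both helper names and the samples are hypothetical.

```javascript
// Naive approach: every "(" counts as a submatch.
function naiveCount(src) {
  return (src.match(/\(/g) || []).length;
}

// Reference count from the engine: the empty alternative always matches "",
// and exec() returns the global match plus one slot per capturing group.
function engineCount(src) {
  return new RegExp(src + "|").exec("").length - 1;
}

const samples = [
  "(?:ab)+",  // grouping only: one "(", 0 submatches
  "\\(ab\\)", // escaped parenthesis: one "(", 0 submatches
  "\\\\(ab)", // escaped backslash, real group: one "(", 1 submatch
  "[(]ab",    // parenthesis inside a character class: one "(", 0 submatches
];

for (const s of samples) {
  console.log(s, naiveCount(s), engineCount(s));
}
```

The third sample is the reason the escaping device matters: a rule that skipped every parenthesis preceded by a backslash would wrongly skip that group.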

Instead of trying to count the open parentheses by means of a single expression that accounts for the above exceptions, I’ve found it cleaner to use three separate steps:

  1. re = X.replace( /\\./g, "%" )
    this replaces every escaped character with "%"
  2. re = re.replace( /\[.*?\]/g, "%" )
    this replaces every character class with "%"
  3. nX = ( re.match( /\((?!\?)/g ) || [] ).length
    this counts all the open parentheses not followed by a "?"

In particular:

  1. This step disables any escaped backslash or open parenthesis (as well as any other escaped character, but I don’t care). This way I’m done with the issue of escaping based on the backslash. X is the source string of the regular expression under examination.
  2. This step disables any open parenthesis inside any character class (as well as any other character inside any character class, but I don’t care). In fact, an open parenthesis inside a character class can be written without escaping it, because there it is taken literally by default.
  3. This step is just the classic short definition of a submatch in a regular expression: an open parenthesis not followed by a "?". nX is the number of submatches of X.
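Put together, the three steps fit in one small JavaScript function, sketched below under the name countSubmatches (hypothetical). It is followed by a cross-check against the engine's own count, obtained by appending an empty alternative so that exec always matches "" and returns one slot per capturing group. Note that named groups such as (?<name>...), added to JavaScript long after Chili was written, are capturing yet start with "(?", so step 3 would not count them.

```javascript
// Count the submatches (capturing groups) of a regular expression source.
function countSubmatches(X) {
  // 1. Replace every escaped character (backslash + any char) with "%",
  //    so neither "\(" nor "\\" can be mistaken for anything else.
  let re = X.replace(/\\./g, "%");
  // 2. Replace every character class with "%": an open parenthesis
  //    inside [...] is literal even when it is not escaped.
  re = re.replace(/\[.*?\]/g, "%");
  // 3. Count the open parentheses not followed by "?", which excludes
  //    (?:...) as well as lookaheads such as (?=...) and (?!...).
  return (re.match(/\((?!\?)/g) || []).length;
}

// Cross-check: the two counts agree for every sample below.
const samples = ["(a)(b(c))", "(?:a)(b)", "\\((a)", "\\\\(a)", "[(](a)", "[\\]](a)", "(?=a)(b)"];
for (const X of samples) {
  const engine = new RegExp(X + "|").exec("").length - 1;
  console.log(X, countSubmatches(X), engine);
}
```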