Full UTF-8 support in WordPress

A few days ago I discovered that WordPress doesn’t support full UTF-8 strings, whose characters are 1 to 4 bytes long. It only supports the Unicode characters belonging to the Basic Multilingual Plane (BMP), whose UTF-8 encodings are 1 to 3 bytes long.
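The relation between code points and UTF-8 lengths is easy to check. Here is a minimal sketch in Python (used only for illustration; the plugin itself is PHP), with hypothetical helper names: characters up to U+FFFF, i.e. in the BMP, take 1 to 3 bytes, while those above take 4.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes the single character `ch` takes in UTF-8."""
    return len(ch.encode("utf-8"))


def in_bmp(ch: str) -> bool:
    """True if `ch` belongs to the Basic Multilingual Plane (U+0000..U+FFFF)."""
    return ord(ch) <= 0xFFFF


samples = ["A", "\u00e9", "\u20ac", "\U0001D11E"]  # A, e-acute, euro sign, G clef
for ch in samples:
    print(f"U+{ord(ch):04X}  {utf8_length(ch)} byte(s)  BMP={in_bmp(ch)}")
# U+0041  1 byte(s)  BMP=True
# U+00E9  2 byte(s)  BMP=True
# U+20AC  3 byte(s)  BMP=True
# U+1D11E  4 byte(s)  BMP=False
```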

This WordPress defect is “caused by” MySQL 5, whose utf8 character set only supports characters in the BMP, i.e. at most 3 bytes per character. Apparently, MySQL 6 will be fully UTF-8 compliant.
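Put another way, a string fits a 3-byte-per-character utf8 column only if its UTF-8 encoding contains no 4-byte sequence, which is the same as containing no code point above U+FFFF. A small Python check of both formulations; `fits_mysql_utf8` and its companion are hypothetical helpers written for illustration, not part of MySQL or WordPress.

```python
def fits_mysql_utf8(text: str) -> bool:
    """True if every character of `text` needs at most 3 bytes in UTF-8,
    i.e. the string fits a BMP-only utf8 column. (Hypothetical helper.)"""
    data = text.encode("utf-8")
    # In valid UTF-8, bytes 0xF0..0xF4 only ever appear as the lead byte
    # of a 4-byte sequence, so finding one means a non-BMP character.
    return not any(b >= 0xF0 for b in data)


def fits_mysql_utf8_by_code_point(text: str) -> bool:
    """Same test, expressed on code points instead of bytes."""
    return all(ord(ch) <= 0xFFFF for ch in text)


print(fits_mysql_utf8("caf\u00e9 \u20ac"))      # True
print(fits_mysql_utf8("G clef: \U0001D11E"))    # False
```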

This morning, with the help of the UTF-8 class I recently developed, I put together a new WordPress plugin that adds full UTF-8 support to WordPress.

Here is the same sentence by Douglas Crockford, from RFC 4627, which I cited in the previous post:

a string containing only the G clef character [𝄞] may be represented as “\uD834\uDD1E”

Windows users may see a rectangle instead of the character (a Windows “feature”), but what they should see is the following:

Image 1: the G clef character
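The escape sequence in Crockford's sentence, `\uD834\uDD1E`, is the UTF-16 surrogate pair for U+1D11E: JSON's `\u` escapes can only name 16-bit code units, so a character outside the BMP has to be split in two. The arithmetic, sketched in Python with hypothetical helper names and checked against the standard `json` module (which happens to write lowercase hex digits):

```python
import json
from typing import Tuple


def surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into a UTF-16 high/low surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary-plane code points need a pair")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


def json_escape(code_point: int) -> str:
    """\\u escape for a single character, in the style of RFC 4627."""
    if code_point <= 0xFFFF:
        return f"\\u{code_point:04X}"
    high, low = surrogate_pair(code_point)
    return f"\\u{high:04X}\\u{low:04X}"


print(json_escape(0x1D11E))          # \uD834\uDD1E
print(json.dumps("\U0001D11E"))      # "\ud834\udd1e"
```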

Note that the G clef above (not the one in the picture 😉) appears in the HTML not as an entity but as a plain UTF-8 character, typed as is in the WordPress editor. You can see this for yourself by comparing the source code of this post (1) with that of the previous one (2).

  1. <blockquote><p>a string containing only the G clef character [<a href="http://www.fileformat.info/info/unicode/char/1d11e/index.htm" target="_blank"><span style="font-size: 2em;">𝄞</span></a>] may be represented as “\uD834\uDD1E”</p></blockquote>
  2. <blockquote><p>a string containing only the G clef character [<a href="http://www.fileformat.info/info/unicode/char/1d11e/index.htm" target="_blank"><span style="font-size: 2em;">&#119070;</span></a>] may be represented as &#8220;\uD834\uDD1E&#8221;</p></blockquote>
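The two snippets differ only in how characters are written: (2) uses numeric character references, where `&#119070;` is the decimal form of U+1D11E and `&#8220;`/`&#8221;` are the curly quotes, while (1) contains the raw characters. A short Python check with the standard `html` module, decoding the references in the text of (2), gives back the text of (1):

```python
import html

# Text content of snippet (1): raw UTF-8 characters.
raw = "[\U0001D11E] may be represented as \u201c\\uD834\\uDD1E\u201d"
# Text content of snippet (2): numeric character references.
refs = "[&#119070;] may be represented as &#8220;\\uD834\\uDD1E&#8221;"

print(html.unescape(refs) == raw)  # True
print(0x1D11E)                     # 119070
```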

Note that the first version of my plugin worked for post and page content, title, excerpt, and searches, but not for custom fields; since version 2.0.0 it covers any character written to and read from the database. Anyway, I’ve opened a ticket about this issue in the WordPress Trac: please drop by and comment 🙂
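One possible way to let characters outside the BMP survive a BMP-only database is to write them as numeric character references and turn them back into characters on read. The Python sketch below only illustrates that idea under this assumption; the function names are hypothetical, and it is not necessarily how the plugin (or its version 2.0.0) actually works.

```python
import re

# Matches a decimal numeric character reference such as "&#119070;".
_REF = re.compile(r"&#(\d+);")


def encode_for_db(text: str) -> str:
    """Replace every character above U+FFFF (4 bytes in UTF-8) with a decimal
    numeric character reference, so the result fits a BMP-only utf8 column.
    Hypothetical helper."""
    return "".join(f"&#{ord(ch)};" if ord(ch) > 0xFFFF else ch for ch in text)


def decode_from_db(text: str) -> str:
    """Turn references above U+FFFF back into characters; references to BMP
    characters are left untouched. Hypothetical helper."""
    def repl(m):
        cp = int(m.group(1))
        return chr(cp) if 0xFFFF < cp <= 0x10FFFF else m.group(0)
    return _REF.sub(repl, text)


original = "G clef: \U0001D11E!"
stored = encode_for_db(original)
print(stored)                               # G clef: &#119070;!
print(decode_from_db(stored) == original)   # True
```

One design caveat: a reference the author typed on purpose, such as a literal `&#119070;`, would come back as the raw character on read, so a real implementation has to decide whether that is acceptable.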

What follows is the code of my Zend_Utf8 class, which I de-Zend-ified before including it in the plugin, so that it can be safely distributed in the wild.

{[ .Ando_Utf8 | 1.hilite(=php=) ]}

2 Replies to “Full UTF-8 support in WordPress”

  1. I was going to publish a new WordPress plugin based on the patch I submitted there. I’ve already written it and it works like a charm, but I’m experiencing a long Internet outage and still can’t publish it. Sorry for that.

  2. I have just published version 2.0.0 of the WordPress plugin Full UTF-8. It now covers any data written to and read from the database.
