Tag based architecture

Web projects, and software projects in general, contain lots of attributes whose values are elements of a specific set. A “status” attribute is one of them, and “deleted”, “published”, “draft” could be its possible values.

I used to represent those attributes by means of normailzed tables, where each set of possible values of an attribute is a table with two columns: an id and a value.

For a “status” attribute, I made a table like this:

status( id, value ) = {
  (1, 'deleted'),
  (2, 'published'),
  (3, 'draft')
}

Then I represented the attribute itself with a foreign key column (“status_id”) in another table, like post( id, status_id, title, … ). With this architecture, all published posts will be selected by an SQL select like this:

[1]
SELECT *
  FROM post
  WHERE status_id = 2

The fact that status_id should be 2 does not tell anything about the selected posts. So the previous select could be rewritten like this:

[2]
SELECT post.*
  FROM post
    JOIN status ON post.status_id = status.id
  WHERE status.value = 'published'

An alternative architecture, which at the same time allows for an SQL select which is as clean as [1] and as clear as [2], represents an attribute as an ENUM column whose elements are the possible values:

post.status = [ 'deleted', 'published', 'draft' ]

With this architecture, all published posts will be selected by an SQL select like this:

[3]
SELECT *
  FROM post
  WHERE status = 'published'

Sadly, the ENUM based architecture comes with a burden. A change of the possible values implies a change of the database structure, whereas in the case of the normalized architecture, that change belongs to the database data, which I prefer. Additionally, if the possible values are more than a few units or if values are dynamic instead of static, then the ENUM solution is unviable.

The main benefit of these architectures is that you can change a possible value of an attribute in one place and have the application work as smoothly as before the change. IMO, this is a pretty little benefit compared to the added complexities. I do make this type of changes when developing, but pervasive translation, which I always use, greatly reduces the need.

Due to the fact that this kind of attributes is very common, I represent all of them with one table: tag( name, notes ), being name the primary key.

For a status attibute, I could have a tag table fragment like:

tag( name ) = {
  ...
  ( '<$ status deleted $>' ),
  ( '<$ status post published $>' ),
  ( '<$ status post draft $>' ),
  ...
}

Then, in a post table I could have a status column whose value is a tag below <$ status $>.

For neighourhoods of Barcelona (Spain), I could have a tag table fragment like:

tag( name ) = {
  ...
  ( '<$ zone spain cataluna barcelona barcelona ciutat_vella $>' ),
  ( '<$ zone spain cataluna barcelona barcelona eixample $>' ),
  ( '<$ zone spain cataluna barcelona barcelona gracia $>' ),
  ( '<$ zone spain cataluna barcelona barcelona horta-guinardo $>' ),
  ( '<$ zone spain cataluna barcelona barcelona les_corts $>' ),
  ( '<$ zone spain cataluna barcelona barcelona nou_barris $>' ),
  ( '<$ zone spain cataluna barcelona barcelona sant_andreu $>' )
  ( '<$ zone spain cataluna barcelona barcelona sant_marti $>' ),
  ( '<$ zone spain cataluna barcelona barcelona sants-montjuic $>' ),
  ( '<$ zone spain cataluna barcelona barcelona sarria-sant_gervasi $>' ),
  ...
}

Then in an address table I could have a zone column whose value is a tag below <$ zone $>.

Implementing some tree operations in a tag model class, I can perform recurrent tasks very easily.

Example: get options for a status select box:

$statuses = Tag::childrenAsOptions('<$ status $>');

Example: get options for a spanish province select box:

$provincias = Tag::childrenAsOptions('<$ zone spain * $>');

The asterisk means any name at that level, which is the ‘comunidad autónoma’ level in Spain.

It’s worth to note that returned options have keys equal to the tag name and values equal to their translations. This way, users always see something meaningful to them, and developers as well see something meaningful to them, because those keys are much richer than flat numbers!

Example: get all the published posts.

$posts = Post::byStatus('<$ status post published $>');

Example: get postal codes of the city of Barcelona (Spain). Supposing postal codes are in a table zip( zone, code ):

$barcelona_codes = Zip::byZone('<$ zone spain cataluna barcelona barcelona $>');

Example: get postal codes of Spain.

$spain_codes = Zip::byZone('<$ zone spain % $>');

The percent is the SQL wildcard.

The status and zone columns should be foreign keys to the tag table. While this is not perfect, it still helps. In fact, even if you could by mistake assign a status tag to a zone column, the database will complain if you try to change or delete a referenced tag.

This architecture has two main benefits:

  1. it allows you to inspect code (SQL and PHP), and immediately understand it because there is no indirection
  2. it allows you to find all occurrences of a given tag in code (SQL and PHP) and data

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.