How to customize MorganJS

MorganJS is easy to install and works nicely out of the box.

{[.old-setup | 1.hilite(=javascript=) ]}

Here is what it looks like. Highlighted HTTP status codes are quite useful.

Screen Shot 2015-11-08 at 14.44.07

Thankfully, it’s possible to customize MorganJS by adding tokens, which are template symbols, like this:

{[ .current-user | 1.hilite(=javascript=) ]}

which can be later used like this:

{[ .middle-setup | 1.hilite(=javascript=) ]}

to produce something like this:

Screen Shot 2015-11-08 at 14.52.16

Uh-oh!! Where are my colors?

I delved into the MorganJS code…

{[ .morgan | 1.hilite(=javascript=) ]}

As you may have noticed, the above code is hard to understand and quite hard-coded too.

  • Hard-coded because, even if the ´dev´ template is documented as ´:method :url :status :response-time ms - :res[content-length]´, it’s really embedded in the code and mixed up with extraneous bits rather than being declared in some option and used like any other MorganJS template.
  • Hard to understand because the function object is being used as a cache for its own executions which entail a compilation step whose raison d’être I still have to grasp. I could be wrong, but this one could be a clear example of over-engineering.

However, my biggest disappointment was that there is no way to reuse the colored ´:status´ token or the coloring functionality, either directly, by calling a method, or indirectly, by copy-pasting some code. A total fail. 🙁

Googling “terminal colors” I eventually got to this Unix StackExchange answer, which I used to write this:

{[ .color-factory | 1.hilite(=javascript=) ]}

A nice side effect of my ´ColorFactory´ function is that I can also use it in the console like this:

{[ .console-log | 1.hilite(=javascript=) ]}

to get something like this:

Screen Shot 2015-11-08 at 16.12.17
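
The idea behind coloring terminal output can be sketched in a few lines. This is a minimal illustration and not the ´ColorFactory´ from this post: the names ´colorize´ and ´statusColor´ are hypothetical, and the mapping from status class to color is an assumption chosen for the example. Only the ANSI escape sequences themselves are standard.

```javascript
// Hypothetical helpers, not the ColorFactory from this post.
// Standard ANSI foreground color codes (a small subset).
var CODES = { red: 31, green: 32, yellow: 33, cyan: 36 };

function colorize(text, color) {
    var code = CODES[color];
    if (code === undefined) {
        return String(text);          // unknown color: leave the text untouched
    }
    // ESC[<code>m switches the color on, ESC[0m resets all attributes
    return '\u001b[' + code + 'm' + text + '\u001b[0m';
}

// Assumed mapping from HTTP status class to color, for illustration only.
function statusColor(status) {
    if (status >= 500) { return 'red'; }
    if (status >= 400) { return 'yellow'; }
    if (status >= 300) { return 'cyan'; }
    if (status >= 200) { return 'green'; }
    return null;                      // 1xx: no color
}

console.log(colorize(404, statusColor(404)), 'GET /missing');
```

A function like ´colorize´ works the same inside a Morgan token callback and in a plain ´console.log´ call, since in both cases the result is just a string.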

Finally, I was able to customize Morgan with this:

{[ .morgan-factory | 1.hilite(=javascript=) ]}

and use it like this:

{[ .new-setup | 1.hilite(=javascript=) ]}

to get something like this:

Screen Shot 2015-11-08 at 16.51.54

 

How to improve filters with promises

I had been programming a filter setup for the Node API of a MEAN stack app.

Having this ´User´ model:

// user.model.js (complete)

var mongoose = require('mongoose');

var schema   = new mongoose.Schema({
    name: String,
    admin: Boolean
});

module.exports = mongoose.model('User', schema);

It allowed a ´User´ controller like this:

// user.controller.js (complete)

var fields = [
    'name', 
    function (admin) { 
        return !!admin.length; 
    }
];

var Item = require('./user.model');
var Controller = require(global.absPath + '/app/shared/CRUD.controller');
module.exports = Controller(Item, fields);

The meaning should be straightforward: copy the ´name´ field as is and make the ´admin´ field a proper boolean. That was made possible by this:

// CRUD.controller.js (excerpt)

module.exports = CRUD_Controller;

function CRUD_Controller(Item, fields) {
    //...
    function Create(req, res) {

        var item = new Item();

        CopyFields(fields, req.body, item);

        item.save(function(err) {

            if (err) {
                return res.send(err);
            }

            res.json({
                message: 'Item created!'
            });

        });

    }


    function CopyFields(fields, data, item) {

        (fields || []).forEach(function(field) {

            switch (typeof field) {

                case 'string':
                    item[field] = data[field];
                    break;

                case 'function':
                    var matches = String(field).match(/^function\s*\(\s*(\w+)\s*\)/);
                    if (!(matches && matches[1])) {
                        console.log('Expected a function with only one argument.');
                        return;
                    }
                    var name = matches[1];
                    item[name] = field(data[name]);
                    break;

            }

        });

    }
    //...
}
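
The trick in the ´'function'´ case is that the target field name is read from the filter's own parameter name. Here is a minimal sketch of that single step in isolation; ´paramName´ is a hypothetical helper, not part of the controller above.

```javascript
// Hypothetical helper isolating the name-extraction step of CopyFields.
function paramName(fn) {
    // Matches "function (name)" or "function(name)" at the start of the source text
    var matches = String(fn).match(/^function\s*\(\s*(\w+)\s*\)/);
    return matches ? matches[1] : null;
}

console.log(paramName(function (admin) { return !!admin.length; })); // admin
console.log(paramName(function (a, b) { return a + b; }));           // null (two arguments)
console.log(paramName(function named(x) { return x; }));             // null (named function)
```

Because the name comes from the source text of the function, this approach only accepts anonymous ´function´ expressions with exactly one argument, and it would break if the file were run through a minifier that renames parameters.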

Then I wanted to add a ´password´ field to the ´User´ model. For storing it I decided to go with “Strong Password Hashing with Node.js Standard Library”. Translated to JavaScript and slightly tweaked, it gave me this:

// hash.js (complete)

var crypto = require('crypto');

module.exports = Hash;

return;

function Hash(options, callback) {

    // Default options.plaintext to 8 random bytes, base64-encoded
    if (!options.plaintext) {
        return crypto.randomBytes(8, function(err, buf) {
            if (err) {
                return callback(err);
            }
            options.plaintext = buf.toString('base64');
            Hash(options, callback);
        });
    }

    // Default options.salt to 64 random bytes (512 bits), base64-encoded
    if (!options.salt) {
        return crypto.randomBytes(64, function(err, buf) {
            if (err) {
                return callback(err);
            }
            options.salt = buf.toString('base64');
            Hash(options, callback);
        });
    } 

    // Default options.iterations to 10k
    if (!options.iterations) {
        options.iterations = 10000;
    }

    // Default options.digest to sha1
    if (!options.digest) {
        options.digest = 'sha1';
    }

    crypto.pbkdf2(options.plaintext, options.salt, options.iterations, 64, options.digest, function(err, key) {
        if (err) {
            return callback(err);
        }
        options.algorithm = 'PBKDF2';
        options.key = key.toString('base64');
        callback(null, options);
    });

}

So my ´User´ model became this:

// user.model.js (complete)

var mongoose = require('mongoose');

var schema   = new mongoose.Schema({
    name: String,
    password: {
        algorithm:  String,
        digest:     String,
        iterations: Number,
        salt:       String,
        key:        String
    },
    admin: Boolean
});

module.exports = mongoose.model('User', schema);
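
The stored fields are exactly what is needed to check a login attempt later: re-derive the key with the same salt, iterations and digest, then compare. The following is a minimal sketch under stated assumptions, not code from the app: ´Verify´ is a hypothetical helper, the record values are made up, and ´crypto.pbkdf2Sync´ is used only to keep the example short (in a server the asynchronous ´crypto.pbkdf2´ is the better fit).

```javascript
var crypto = require('crypto');

// Hypothetical helper: checks a login attempt against a stored password record
function Verify(plaintext, stored) {
    // Re-derive with the stored parameters; 64 is the key length used in hash.js
    var derived = crypto.pbkdf2Sync(plaintext, stored.salt, stored.iterations, 64, stored.digest);
    var expected = Buffer.from(stored.key, 'base64');
    // timingSafeEqual throws on buffers of different lengths, so check that first
    return derived.length === expected.length &&
        crypto.timingSafeEqual(derived, expected);
}

// A record built by hand with the same shape as the password field (made-up values)
var record = { algorithm: 'PBKDF2', digest: 'sha1', iterations: 10000, salt: 'c29tZS1zYWx0' };
record.key = crypto
    .pbkdf2Sync('s3cret', record.salt, record.iterations, 64, record.digest)
    .toString('base64');

console.log(Verify('s3cret', record)); // true
console.log(Verify('wrong', record));  // false
```

Note that the salt is passed as the base64 string itself, not decoded back to bytes, because that is how ´Hash´ feeds it to ´crypto.pbkdf2´.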

Have you noticed that the ´Hash´ function relies on the asynchronous ´crypto.pbkdf2´ function? That’s the standard way in Node, so I wasn’t going to fall back to the synchronous version.

Then my problem was:

How do I make these filters work with deferred values?

Ta-da! Promises:

// user.controller.js (complete)

var Promise = require('es6-promise').Promise;
var fields = [
    'name', 
    function (password) { 
        return new Promise(function (resolve, reject) {
            var Hash = require(global.absPath + '/app/components/auth/hash');
            Hash({plaintext: password}, function (error, result) {
                if (error) {
                    reject(Error(error));
                } else {
                    delete result.plaintext;
                    resolve(result);
                }
            });
        });
    }, 
    function (admin) { 
        return !!admin.length; 
    }
];

var Item = require('./user.model');
var Controller = require(global.absPath + '/app/shared/CRUD.controller');
module.exports = Controller(Item, fields);

To make that work I had to change the ´CRUD´ controller a bit.

The first change was to separate the filtering from the assignment, so that I could later use the ´Promise.all´ method, which waits on promises and accepts plain values as well. That meant moving from a ´CopyFields´ function, which filters and assigns each value in turn, to a ´FilterFields´ function, which filters all values at once and leaves the assignments to the ´Create´ function.

// CRUD.controller.js (broken excerpt) 
 
module.exports = CRUD_Controller; 
 
function CRUD_Controller(Item, fields) { 
    //... 
    function Create(req, res) {

        FilterFields(fields, req.body, function (fFields) {
            var item = new Item();

            fFields.forEach(function (fField) {
                item[fField.name] = fField.value;
            });

            item.save(function(err) {

                if (err) {
                    return res.send(err);
                }

                res.json({
                    message: 'Item created!'
                });

            });
        });

    }


    function FilterFields(fields, data, callback) {

        Promise
            .all((fields || []).map(Filter))
            .then(callback)
            .catch(function (error) {
                console.log(error);
            });


        function Filter(field) {
            var result;

            switch (typeof field) {

                case 'string':
                    result = {
                        name: field,
                        value: data[field]
                    };
                    break;

                case 'function':
                    var matches = String(field).match(/^function\s*\(\s*(\w+)\s*\)/);
                    if (!(matches && matches[1])) {
                        console.log('Expected a function with only one argument.');
                        return;
                    }
                    result = {
                        name: matches[1],
                        value: field(data[matches[1]])
                    };
                    break;

            }

            return result;
        }

    }
    //... 
}

The second change was to add the special treatment my promises needed. You may have noticed that, in the ´case 'function':´ above, ´result.value´ can be a promise BUT that won’t make ´result´ a promise itself!! So the code above wouldn’t work yet, because ´Promise.all´ would resolve before the hashed password is available. Finally, I got this:

// CRUD.controller.js (working excerpt)

module.exports = CRUD_Controller; 
 
function CRUD_Controller(Item, fields) { 
    //... 
    function Create(req, res) {

        FilterFields(fields, req.body, function (fFields) {
            var item = new Item();

            fFields.forEach(function (fField) {
                item[fField.name] = fField.value;
            });

            item.save(function(err) {

                if (err) {
                    return res.send(err);
                }

                res.json({
                    message: 'Item created!'
                });

            });
        });

    }


    function FilterFields(fields, data, callback) {

        Promise
            .all((fields || []).map(Filter))
            .then(callback)
            .catch(function (error) {
                console.log(error);
            });


        function Filter(field) {
            var result;

            switch (typeof field) {

                case 'string':
                    result = {
                        name: field,
                        value: data[field]
                    };
                    break;

                case 'function':
                    var matches = String(field).match(/^function\s*\(\s*(\w+)\s*\)/);
                    if (!(matches && matches[1])) {
                        console.log('Expected a function with only one argument.');
                        return;
                    }
                    result = {
                        name: matches[1],
                        value: field(data[matches[1]])
                    };
                    if (stuff.isPromise(result.value)) {
                        var promise = new Promise(function (resolve, reject) {
                            var name = result.name;
                            result.value.then(function (value) {
                                resolve({
                                    name: name,
                                    value: value
                                });
                            }).catch(function (error) {
                                reject(Error(error));
                            });
                        });
                        result = promise;
                    }
                    break;

            }

            return result;
        }

    }
    //...
}

The added lines make ´result´ a promise if ´result.value´ is one: ´result´ will eventually resolve to the expected result. BTW, the ´stuff.isPromise´ method is the classical ´object.then && typeof object.then == 'function'´.
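
The same lifting can be written more compactly, because ´then´ already returns a new promise. What follows is a sketch of an alternative, not the code the controller uses: ´liftField´ is a hypothetical helper, and it assumes a ´Promise´ implementation is available (native, or the ´es6-promise´ one required earlier).

```javascript
// A thenable check like the one described above
function isPromise(object) {
    return !!object && typeof object.then === 'function';
}

// Hypothetical helper: pairs a field name with a value that may or may not be a promise.
// Promise.resolve wraps plain values and adopts promises, so both cases share one path.
function liftField(name, value) {
    return Promise.resolve(value).then(function (resolved) {
        return { name: name, value: resolved };
    });
}

var delayed = new Promise(function (resolve) {
    setTimeout(function () { resolve('hashed'); }, 10);
});

// An object that merely contains a promise is not itself a promise
console.log(isPromise({ name: 'password', value: delayed })); // false
console.log(isPromise(liftField('password', delayed)));       // true

Promise
    .all([liftField('name', 'Ann'), liftField('password', delayed)])
    .then(function (fields) {
        console.log(fields); // both entries carry resolved values
    });
```

With a helper like this, every filter result becomes a promise, so ´Promise.all´ never sees a plain object hiding a pending value.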

How to scrape AJAX trees

DISCO, the European Dictionary of Skills and Competences, offers the user a tree to be searched or browsed. Inspecting the tree nodes, we see that concepts are contained in LI elements with an liItem class. Executing $('.liItem').length in the console we get 676. Yet they claim to collect more than 104000 concepts. A bold claim?

A better look at the tree structure reveals that some concepts have a data-loaded attribute set to true and some set to false. In particular, true denotes readily available nodes (downloaded with the initial page load) and false denotes nodes that require an AJAX call before being displayed. Leaf nodes are always of the former kind, but internal nodes can be of both kinds. Would we get those 104000 concepts if we unfolded all the false nodes?

We’ll try. Along the way we’ll also store all nodes in a different structure, something more portable than bare HTML. JSON seems a good option. Ironically, DISCO uses getJSON to download HTML snippets. To summarize, we are now going to store the whole HTML tree of DISCO in a JSON structure.

As you probably understood by reading some of my last articles, I had decided to scrape the DISCO tree with the support provided by jsFiddle. That was before I discovered the existence of Custom JavaScript Snippets in Google Chrome Developer Tools. Apparently they’ve been there for quite some time, almost two years now!

Screen Shot 2014-06-21 at 14.44.43

Too bad for me. At least it was fun to make DISCO’s page behave outside of DISCO’s server.

Here is the snippet I came up with:

{[ .snippet | 1.hilite(=javascript=) ]}

There are only a few things to note:

  • I’ve put a limit as a guard to safely try things out
    • with recursive structures –like trees– it’s very useful to limit actions to a small number of nodes before going full monty
    • this limit is just how many nodes to visit, you can start with a low number like 20 or 50 and see how it works
    • you should get quite a long list of messages output to the console, and if all was fine, the last message will be the result
  • The result is a hash of node ids as keys and node objects as values
    • for example, window.disco.nodes[16901] is
      {[ .node | 1.hilite(=javascript=) ]}
      which corresponds to this node in DISCO’s page
      Screen Shot 2014-06-21 at 19.09.54
  • The functions download(node) and download_children(node, children) are mutually recursive
    • their arguments are coherent, i.e. node is an LI element and children is an array of LI elements
    • the latter is not integrated into the former because we need to provide the same treatment to both children readily available and those that will be in the future
    • they start visiting from the two roots –horizontal_skills and vertical_skills– and drill down into the tree structure
  • The UI is never updated by the snippet, instead all the state is automatically kept in memory by the recursive descent
    • if you unfolded aesthetic sensitivity (node 16091) in the tree yourself between two executions with a small number of nodes (say 20), you’d get two different results
    • the first result would (probably) not show aesthetic sensitivity children while the second result would (probably) not show the last two nodes of the first result, thus keeping the number of nodes stuck to the given limit
    • if you want to go back to the initial mint state, a simple reload won’t be enough unless you first delete the session cookie
  • Finally, you can run JSON.stringify(window.disco) and get a nice JSON string which you can copy and paste somewhere and save to a file
    • the hash to string conversion is gonna need some minutes… so many in fact that I left the browser working “indefinitely” (half an hour?)
    • the resulting string is humongous too: 3,785,133 bytes (3.8 MB on disk).
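
The limit guard and the mutual recursion described in the notes above can be sketched on a small in-memory tree. This is an illustration under stated assumptions, not the snippet itself: the tree, ´collect´, ´visit´ and ´visitChildren´ are hypothetical, and there is no AJAX involved, so everything runs synchronously.

```javascript
// Hypothetical in-memory stand-in for the DISCO tree; no AJAX involved.
var tree = {
    id: 1, label: 'root', children: [
        { id: 2, label: 'a', children: [{ id: 4, label: 'a1', children: [] }] },
        { id: 3, label: 'b', children: [] }
    ]
};

function collect(root, limit) {
    var nodes = {};   // hash of node id -> node object
    var count = 0;    // how many nodes have been visited so far

    function visit(node) {
        if (count >= limit) { return; }    // the guard: stop once the limit is hit
        count += 1;
        nodes[node.id] = {
            label: node.label,
            children: node.children.map(function (child) { return child.id; })
        };
        visitChildren(node.children);
    }

    function visitChildren(children) {
        children.forEach(visit);           // mutual recursion with visit
    }

    visit(root);
    return nodes;
}

console.log(Object.keys(collect(tree, 2)).length);   // 2
console.log(Object.keys(collect(tree, 100)).length); // 4
console.log(JSON.stringify(collect(tree, 100)));
```

In the real page the children of a node with data-loaded set to false only arrive after a request, which is why the two functions are kept separate: the second one can be called either immediately or from the request callback.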

Conclusions

The execution of the above snippet with a limit of 105000 nodes takes around 3 minutes on my MacBook Air with 4GB RAM. At the end, you’ll discover that the last node was number 7380!!

Wow, that’s a huge difference from the claimed “more than 104000 concepts”. How can it be?

Even considering that they provide a multilingual thesaurus with 11 languages and could have inflated the count 11 times (7380 * 11 = 81180), the claimed figure is still around 28% higher than that. Could they have added the number of phrases? No, because they separately claim “approximately 36000 example phrases”. They could have instead added the number of synonyms.

{[ .synonyms | 1.hilite(=javascript=) ]}

Running the above code we get 3443 synonyms, which added to 7380 concepts make for 10823 terms, which inflated 11 times make for 119053 terms in all languages.

  • 7380 * 11 + 3443 + X * 10 = 104000
  • X = (104000 - 84623) / 10
  • X = 1937.7 ≈ 56% * 3443 ≈ 26% * 7380
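
The arithmetic above can be checked in the console. The variable names are only for illustration; the figures are the ones quoted in this post.

```javascript
var concepts = 7380;      // nodes found by the snippet
var synonyms = 3443;      // English synonyms counted above
var languages = 11;
var claimed = 104000;

// Concepts in every language, plus the English synonyms
var known = concepts * languages + synonyms;

// X: synonyms per language needed in each of the other 10 languages
var x = (claimed - known) / (languages - 1);

console.log(known);                              // 84623
console.log(x);                                  // 1937.7
console.log(Math.round(100 * x / synonyms));     // 56 (% of English synonyms)
console.log(Math.round(100 * x / concepts));     // 26 (% of concepts)
console.log(Math.round(100 * synonyms / concepts)); // 47 (English synonyms vs concepts)
```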

Hm, I don’t know. It seems to me that they “confused” concepts with terms. Even so, while the English synonym count is around 47% of concepts, the synonym count in each of the other languages would be around 26% of concepts, which is noticeably less.

All in all, 7380 concepts is a good number but it’s only 7% of what they claim.