How to work around the Same Origin policy

How do you programmatically import content from a web page on another domain? I run into this problem from time to time, mostly when I want to try something in JavaScript without deploying a proper setup. Recently I ran into it once again, and once again I hit the thick wall of the Same Origin policy.

Recent browsers support CORS, but when you are fiddling with a third-party page you can’t reasonably ask its owners to have their hosting provider allow requests from your domain of choice, such as jsfiddle.net. Your only option is JSONP. And when you look around for a solution that combines JSONP with the Same Origin policy, you inevitably find YQL, used as a proxy with a query like this:

select * from html where url="--URL--"

YQL is very attractive, because a simple query turns it into a free proxy. If you are lucky, you can find an example of how to use it with jQuery. After some programming work you end up with a nice jQuery plugin, like this:

$.proxyGet = function ( url, callback, options ) {

    // reject anything that doesn't resemble a "plain" URL or a null (see below)
    if (! (url === null || /^(https?:|\/\/)/.test(url))) {
        throw new SyntaxError('Expected a URL.');
    }
    
    // allow detection of current SSL mode by starting the url with '//'
    if (url && url.indexOf('//') === 0) {
        url = window.location.protocol + url;
    }
    
    // you are strongly advised to choose a different proxy. YQL "from html" is a toy !! 
    var yql_proxy = {
        
        // a url with or without a '--URL--' placeholder
        // -- the placeholder will be replaced by the url param
        // -- use a null url param if the proxy url is requestable as is
        url: 'http://query.yahooapis.com/v1/public/yql' + '?q=' + encodeURIComponent('select * from html where url="--URL--" and compat="html5" and xpath="*"') + '&format=xml',
        
        // null (no action) or a function that takes response data and returns clean data
        cleanup: function (data) {
            data = data.results && data.results[0];
            if (! data) return null;
            data = data.replace(/&#xd;/ig, "\r").replace(/&#xa;/ig, "\n");
            return data;
        },
        
        // null (no action) or a function that takes clean data and returns filtered data
        filter: null
    };
    
    // use YQL proxy by default, but allow for customizations
    options = $.extend(yql_proxy, options || {});
    
    // make the jsonp request
    var jsonp_url = options.url.replace('--URL--', encodeURIComponent(url)) + '&callback=?';
    $.getJSON( jsonp_url, function (data) {
        if ($.isFunction(callback)) {
            if ($.isFunction(options.cleanup)) {
                data = options.cleanup(data);
                if ($.isFunction(options.filter)) {
                    data = options.filter(data);
                }
            }
            callback(data);
        }
    } );
    
};
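
The `&callback=?` suffix is what asks jQuery to make a JSONP request instead of a plain Ajax call. As a rough sketch of the mechanism (not jQuery's actual implementation), the helper below registers a one-shot global callback and injects a `<script>` tag that points at the proxy. The names `jsonpRequest`, `doc` and `root` are hypothetical; the last two are parameters only so that the sketch can be exercised outside a browser.

```javascript
// Minimal JSONP sketch. `doc` and `root` default to the browser globals.
var jsonpCounter = 0;

function jsonpRequest(url, onData, doc, root) {
    doc = doc || document;
    root = root || window;

    // unique global function name that the server will call back
    var name = '__jsonp_cb_' + (jsonpCounter++);
    var script = doc.createElement('script');

    root[name] = function (data) {
        // one-shot: clean up before handing the data over
        delete root[name];
        if (script.parentNode) {
            script.parentNode.removeChild(script);
        }
        onData(data);
    };

    // the server is expected to answer with: <name>(<json>);
    script.src = url + (url.indexOf('?') === -1 ? '?' : '&') + 'callback=' + name;
    doc.head.appendChild(script);
    return name;
}
```

Because the response is loaded as a script, the Same Origin policy does not apply to it, which is the whole trick. A production version would also need timeout and error handling, which this sketch leaves out.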

A little diversion. Disco-tools is a European effort to bring order to the messy terminology used for skills and competences, a mess only made worse by the many languages used in European CVs. They decided to publish a thesaurus of all possible skills in the world, properly translated into many languages. Now at version 2, the thesaurus can be browsed like a tree. If you watch the network traffic while clicking around, you will notice Ajax calls that fetch a piece of subtree like this (source view):

{
    "parent": "16091",
    "html": "<li class=\"noBulletsLi\"><ul class=\"innerUl\"><li class=\"liItem cross \" data-termid=\"node_15215_0\" data-loaded=\"true\" data-isterm=\"1\" data-tnr=\"15215\" onclick=\"var ie = ieCheck();if(ie > -1 && ie <=8)window.event.cancelBubble = true;\"><span class=\"itemToBeAdded\" onclick=\"var ie = ieCheck();if(ie > -1 && ie <=8)window.event.cancelBubble = true;\">feeling for form and space</span></li><li class=\"liItem end \" data-termid=\"node_15213_0\" data-loaded=\"true\" data-isterm=\"1\" data-tnr=\"15213\" onclick=\"var ie = ieCheck();if(ie > -1 && ie <=8)window.event.cancelBubble = true;\"><span class=\"itemToBeAdded\" onclick=\"var ie = ieCheck();if(ie > -1 && ie <=8)window.event.cancelBubble = true;\">sense of colour</span></li></ul></li>"
}

You might then want to use my nice jQuery plugin to get that cross-domain content through YQL. Note that disco-tools.eu is very messy: it returns a JSON object but sends the wrong Content-Type (text/html). We need further filtering here, because YQL insists on honoring the received Content-Type and always “corrects” the response by enclosing it in the body of a fictitious HTML page. So you would write something like this:

var ajax_url = "http://disco-tools.eu/disco2_portal/ajax/ajaxCalls.php?ajaxFunction=loadNode&prefix=node_&node=16091&lang_id=0&documents=false";

$.proxyGet(ajax_url, function (data) {console.log("--- using YQL:\n" + data);}, {
    filter: function (data) {
        if (! data) return null;
        data = data.replace(/^<html><head\/><body>(.*?)<\/body><\/html>$/, "$1");
        return data;
    }
});
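
To see what such a filter does in isolation, here is a small sketch run against hand-made sample strings (not real YQL output). `stripHtmlWrapper` is a hypothetical name. One caveat: `.` in a JavaScript regular expression does not match line breaks, so this variant uses `[\s\S]` instead, on the assumption that the wrapped payload may span several lines.

```javascript
// Remove the fictitious <html><head/><body>…</body></html> page around a payload.
function stripHtmlWrapper(data) {
    if (!data) return null;
    // [\s\S] matches any character, including line breaks
    return data.replace(/^<html><head\/><body>([\s\S]*?)<\/body><\/html>$/, '$1');
}

var wrapped = '<html><head/><body>{"parent": "16091"}</body></html>';
console.log(stripHtmlWrapper(wrapped));   // {"parent": "16091"}
```

A string that is not wrapped in this exact way comes back unchanged, so the filter is safe to leave in place for proxies that do not add the wrapper.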

For some vanilla JSON objects, this filter is enough to extract the good JSON part. But sometimes, as in this case, the wrong Content-Type clashes with the unescaped HTML special characters in a complex JSON object, and YQL’s “correction” goes much further, because it absolutely cannot tolerate what it takes for authoring errors. So you would get something like this:

--- using YQL:
{
    "parent": "16091",
    "html": "<li class="&quot;noBulletsLi&quot;"><ul class="&quot;innerUl&quot;"><li class="&quot;liItem" cross="cross" data-isterm="&quot;1&quot;" data-loaded="&quot;true&quot;" data-termid="&quot;node_15215_0&quot;" data-tnr="&quot;15215&quot;" ie="ieCheck();if(ie" onclick="&quot;var"> -1 &amp;&amp; ie &lt;=8)window.event.cancelBubble = true;"&gt;<span class="&quot;itemToBeAdded&quot;" ie="ieCheck();if(ie" onclick="&quot;var"> -1 &amp;&amp; ie &lt;=8)window.event.cancelBubble = true;"&gt;feeling for form and space&lt;/span&gt;&lt;/li&gt;<li class="&quot;liItem" data-isterm="&quot;1&quot;" data-loaded="&quot;true&quot;" data-termid="&quot;node_15213_0&quot;" data-tnr="&quot;15213&quot;" end="end" ie="ieCheck();if(ie" onclick="&quot;var"> -1 &amp;&amp; ie &lt;=8)window.event.cancelBubble = true;"&gt;<span class="&quot;itemToBeAdded&quot;" ie="ieCheck();if(ie" onclick="&quot;var"> -1 &amp;&amp; ie &lt;=8)window.event.cancelBubble = true;"&gt;sense of colour&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;"
}</span></li></span></li></ul></li>

I had to highlight the above snippet as HTML, because that is what YQL thinks it is. For example, you can see that every open HTML tag gets properly closed.

THAT IS EXTREMELY HARD TO FIX IN GENERAL !!!

I had no other choice but to deploy my own proxy and give up on YQL.

<?php

$url = isset($_GET['url']) ? $_GET['url'] : '';
$url = preg_replace('/^(?!https?\b)/', 'http://', $url);
if (preg_match('/^(https?:\/\/)?([\w-]+\.)*bandowebsit\.es\b/', $url)) {
  stop_now();
}
$cached = dirname(__FILE__) . '/cached/' . md5($url);

$pass = '73e69002e737fb0d1c7a19a3159d0634';
$code = isset($_GET['code']) ? $_GET['code'] : '';
if (hash_equals($pass, md5($code))) {
  $result = file_get_contents($url);
  $response = array(
    'url'     => $url,
    'header'  => $http_response_header,  // see also http://php.net/manual/en/reserved.variables.httpresponseheader.php#113361
    'result'  => $result,
  );
  $json_response = json_encode($response);
  file_put_contents($cached, $json_response);
}
elseif (file_exists($cached)) {
  $json_response = file_get_contents($cached);
}
else {
  stop_now();
}

respond($json_response);

//----------------------------------------------------------------------------------------------------------------------

// Sends the response and exits.
function respond($json_response) {
  $callback = isset($_GET['callback']) ? preg_replace('/\W+/', '', $_GET['callback']) : '';
  if ($callback) {
    header('Content-Type: application/javascript');
    echo $callback . '(' . $json_response . ');';
  }
  else {
    header('Content-Type: application/json');
    echo $json_response;
  }
  exit;
}

// Stops execution by returning null in the expected result format.
function stop_now() {
  respond(json_encode(array('result' => null)));
}
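
The `respond` function is the server half of JSONP: it strips every non-word character from the requested callback name, so that the name cannot smuggle arbitrary script into the response, and then wraps the JSON in a call to that name. The same logic as a JavaScript sketch, for illustration only (`wrapJsonp` is a hypothetical name, and the proxy itself stays in PHP):

```javascript
// Mirror of the PHP respond() logic: returns the content type and body to send.
function wrapJsonp(jsonText, callback) {
    // keep only letters, digits and underscores in the callback name
    var name = String(callback || '').replace(/\W+/g, '');
    if (name) {
        return { type: 'application/javascript', body: name + '(' + jsonText + ');' };
    }
    // no usable callback: answer with plain JSON
    return { type: 'application/json', body: jsonText };
}

console.log(wrapJsonp('{"result":null}', 'cb').body);   // cb({"result":null});
```

Without a callback parameter the proxy answers with plain JSON, which is convenient for testing it directly in the address bar.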

And use it like this:

var my_proxy = {
    url: '.../?url=--URL--&code=...',
    cleanup: function (data) {
        data = data.result;
        return (data ? data : null);
    }
};

$.proxyGet(ajax_url, function (data) {console.log("--- using my proxy:\n" + data);}, my_proxy);

to get the same result the browser would get:

--- using my proxy:
{
    "parent": "16091",
    "html": "<li class=\"noBulletsLi\"><ul class=\"innerUl\"><li class=\"liItem cross \" data-termid=\"node_15215_0\" data-loaded=\"true\" data-isterm=\"1\" data-tnr=\"15215\" onclick=\"var ie = ieCheck();if(ie > -1 && ie <=8)window.event.cancelBubble = true;\"><span class=\"itemToBeAdded\" onclick=\"var ie = ieCheck();if(ie > -1 && ie <=8)window.event.cancelBubble = true;\">feeling for form and space</span></li><li class=\"liItem end \" data-termid=\"node_15213_0\" data-loaded=\"true\" data-isterm=\"1\" data-tnr=\"15213\" onclick=\"var ie = ieCheck();if(ie > -1 && ie <=8)window.event.cancelBubble = true;\"><span class=\"itemToBeAdded\" onclick=\"var ie = ieCheck();if(ie > -1 && ie <=8)window.event.cancelBubble = true;\">sense of colour</span></li></ul></li>"
}
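
What arrives in the callback is still a string. Assuming the proxied text is valid JSON, one more step turns it into an object whose `html` member can be inserted into a page. A minimal sketch with a shortened, hand-made sample (`parseNode` is a hypothetical helper):

```javascript
// Parse the proxied text; return null when it is not the expected JSON.
function parseNode(text) {
    try {
        var node = JSON.parse(text);
        return (node && typeof node.html === 'string') ? node : null;
    } catch (e) {
        return null;   // not JSON, e.g. a response mangled into HTML
    }
}

var sample = '{"parent": "16091", "html": "<li class=\\"liItem\\">sense of colour</li>"}';
var node = parseNode(sample);
console.log(node.parent);  // 16091
console.log(node.html);    // <li class="liItem">sense of colour</li>
```

Returning null for unparsable input keeps the calling code the same whichever proxy produced the text.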