This has been quite a challenge, but in the end we made the http://disco-tools.eu/disco2_portal/terms.php page kneel before us. The way was paved with roadblocks. Many were more or less expected and easy to smash, but others (at the end of the 2nd, 5th and 6th attempts) were not only unexpected but also quite difficult to break down, and required special care.
First we downloaded the whole page through my AS-IS proxy instead of YQL, using my simple $.proxyGet() plugin. Then we appended the page to the body of a jsFiddle. As expected, we got a very rough result: no images, no stylesheets, lots of errors in the console.
Many of those errors were caused by the page using relative URLs instead of absolute ones, so we fixed that before appending. As for jQuery, the page downloaded and used version 1.7.2, but we needed jQuery to be available before the page was appended, so we removed it from the page before appending and downgraded our own jQuery to 1.7.2 to guarantee compatibility.
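To give an idea of the URL fix, here is a minimal sketch, not the exact code from the fiddle. The helper name absolutizeUrls is hypothetical, and using a regular expression instead of a real HTML parser is a simplification that only covers quoted src and href attributes.

```javascript
// Hypothetical helper: resolve every quoted src/href value in an HTML string
// against the URL the page was originally served from.
function absolutizeUrls(html, baseUrl) {
  // A regex is a simplification here; it is not a full HTML parser.
  return html.replace(/\b(src|href)=(["'])(.*?)\2/gi, function (match, attr, quote, value) {
    // Leave empty values and in-page anchors such as "#top" untouched.
    if (value === "" || value.charAt(0) === "#") {
      return match;
    }
    try {
      // The URL constructor resolves relative values and keeps absolute ones.
      return attr + "=" + quote + new URL(value, baseUrl).href + quote;
    } catch (e) {
      return match; // values that cannot be parsed stay as they are
    }
  });
}
```

Calling absolutizeUrls(pageHtml, "http://disco-tools.eu/disco2_portal/terms.php") before appending would make images and stylesheets point back to the original server.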
That was all pretty standard stuff.
The first big roadblock appeared at the end of the second attempt. Scripts didn’t execute in the order they were appended. We declared yepnope.js as an External Resource in our jsFiddle and sandwiched the $().append() call between a couple of additional lines: first we removed all the scripts from the page, and then we added them back by means of yepnope. Thus we got rid of all the errors related to appending the page into the body.
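The sandwich could look roughly like the sketch below. The names extractExternalScripts and appendPage are hypothetical, the regex only handles scripts that have a src attribute (inline scripts are left where they are), and I am assuming that yepnope executes the entries of a load array in the order given.

```javascript
// Hypothetical helper: pull external <script src="..."></script> tags out of
// an HTML string and return the stripped HTML plus the script URLs in order.
function extractExternalScripts(html) {
  var urls = [];
  var stripped = html.replace(
    /<script\b[^>]*\bsrc=(["'])(.*?)\1[^>]*>\s*<\/script>/gi,
    function (match, quote, src) {
      urls.push(src);
      return ""; // drop the tag from the markup
    }
  );
  return { html: stripped, urls: urls };
}

// Browser-only part: assumes jQuery ($) and yepnope are already loaded.
function appendPage(pageHtml) {
  var page = extractExternalScripts(pageHtml); // 1. remove the scripts
  $("body").append(page.html);                 // 2. append the script-free page
  yepnope({ load: page.urls });                // 3. add the scripts back in order
}
```

Handing the loading over to a script loader is what restores a predictable execution order; the append itself no longer triggers any external script.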
The second big roadblock appeared at the end of the fifth attempt. Scripts issued relative AJAX calls, which no longer resolved against the original server once the page lived inside the jsFiddle. We injected a beforeSend callback into our AJAX setup so that, for selected URLs, requests pass through my proxy instead of pointing to their original server. Thus we got rid of all the errors related to toggling already downloaded nodes.
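A rough sketch of that rerouting follows. PROXY_PREFIX is a made-up placeholder and not my real proxy address, toProxyUrl is a hypothetical helper, and selecting URLs by host is just one possible way of picking the "selected URLs".

```javascript
var PAGE_BASE = "http://disco-tools.eu/disco2_portal/terms.php";
var PROXY_PREFIX = "https://proxy.example/get?url="; // hypothetical placeholder

// Hypothetical helper: resolve the requested URL against the original page
// and, if it targets the original host, wrap it in a proxy URL.
function toProxyUrl(url) {
  var absolute = new URL(url, PAGE_BASE);
  if (absolute.host !== new URL(PAGE_BASE).host) {
    return url; // not one of the selected URLs, leave it alone
  }
  return PROXY_PREFIX + encodeURIComponent(absolute.href);
}

// Browser-only part: runs only when jQuery is present.
if (typeof jQuery !== "undefined") {
  jQuery.ajaxSetup({
    beforeSend: function (jqXHR, settings) {
      settings.url = toProxyUrl(settings.url); // reroute before the request is sent
    }
  });
}
```

Because the callback is installed through the global AJAX setup, the page's own scripts do not need to be touched.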
That’s all Folks!
Last but not least
My proxy only passes the request through to the intended server if it finds a secret code in the URL. If it does not, instead of just giving up, it tries to fulfill the current request by sending back a previously cached response. If no cached response exists, it returns an empty response. What does that mean for you? It means that for some clicks in the thesaurus you will get the expected responses (cache hits), so that you can fiddle around for real. If instead you unexpectedly see the last error (i.e. the one at the end of the sixth attempt), it is most likely because of a cache miss.
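In code, the decision described above boils down to something like the following sketch. This is not the proxy's actual source: handleProxyRequest and the shape of its options are hypothetical, and I am assuming here that the cache is filled whenever a request carrying the secret code is passed through.

```javascript
// Hypothetical sketch of the proxy's decision logic; every name is assumed.
// options.secret        - the secret code the proxy expects
// options.cache         - a Map from target URL to a stored response body
// options.fetchUpstream - a function that asks the intended server
function handleProxyRequest(targetUrl, code, options) {
  if (code === options.secret) {
    // Authorized: pass the request through and remember the answer
    // (storing it at this point is an assumption of this sketch).
    var fresh = options.fetchUpstream(targetUrl);
    options.cache.set(targetUrl, fresh);
    return fresh;
  }
  if (options.cache.has(targetUrl)) {
    return options.cache.get(targetUrl); // cache hit
  }
  return ""; // cache miss: empty response
}
```

A click whose URL was requested earlier with the secret code gets the stored body back, while any other click gets the empty string.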