So I was writing an article on screen scraping, and one of the questions that came up was "How do you defend against screen scraping?" I think this is actually an interesting question, and it brought up the idea for a side project that maybe someone else has time for.
The idea is that, to prevent screen scraping, the page being scraped must be mutated so as to break the scraper. To do that, you could alter the selectors in the CSS resources and the HTML (for instance, changing every id of "signupButton" to "sarah-goldfarb") or change the structure of the page, meaning the shape of the DOM itself.
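To make the id-renaming idea concrete, here is a rough sketch of what the mutation could look like. The function name `mutateIds` and the shape of the `renames` map are made up for illustration, and the regexes are a deliberate shortcut: they only handle quoted `id` attributes and `#id` selectors, and they ignore JavaScript that looks elements up by id. A real tool would probably want proper HTML and CSS parsers.

```javascript
// Naive sketch: rename ids consistently in an HTML string and a CSS string.
// Regexes are used only for brevity; they will miss unquoted attributes,
// ids referenced from scripts, and other edge cases.

function escapeRegExp(s) {
  // Escape characters that have special meaning inside a RegExp.
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// `renames` is a hypothetical map of old id -> new id.
function mutateIds(html, css, renames) {
  let outHtml = html;
  let outCss = css;
  for (const [oldId, newId] of Object.entries(renames)) {
    const id = escapeRegExp(oldId);
    // Matches id="signupButton" or id='signupButton' in the markup.
    outHtml = outHtml.replace(
      new RegExp(`(\\sid\\s*=\\s*["'])${id}(["'])`, 'g'),
      (_match, open, close) => open + newId + close
    );
    // Matches #signupButton in the stylesheet, but not #signupButtonLarge.
    outCss = outCss.replace(
      new RegExp(`#${id}(?![\\w-])`, 'g'),
      () => '#' + newId
    );
  }
  return { html: outHtml, css: outCss };
}

const result = mutateIds(
  '<button id="signupButton">Go</button>',
  '#signupButton:hover { color: red; }',
  { signupButton: 'sarah-goldfarb' }
);
console.log(result.html);
console.log(result.css);
```

The important property is that the HTML and the CSS are rewritten from the same map, so the page still renders the same while a scraper keyed on the old id stops matching.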
A small Node proxy that does this on streams would be particularly cool.
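As a starting point for the proxy, here is one way the stream side might look. `createMutatingStream` is a hypothetical helper, not an existing API, and it takes the lazy route of buffering the whole response before mutating it, since an id or tag can be split across two chunks. A truly streaming version would need an incremental parser.

```javascript
const { Transform } = require('node:stream');

// Hypothetical helper: wraps any string -> string mutation in a Transform.
// It collects every chunk and applies `mutate` once, when the input ends.
function createMutatingStream(mutate) {
  const chunks = [];
  return new Transform({
    transform(chunk, _encoding, callback) {
      chunks.push(chunk); // chunks arrive as Buffers by default
      callback();
    },
    flush(callback) {
      const body = Buffer.concat(chunks).toString('utf8');
      this.push(mutate(body));
      callback();
    },
  });
}

// Usage sketch inside a proxy handler (names are placeholders):
//   upstreamResponse.pipe(createMutatingStream(myMutation)).pipe(res);
```

Because the body length changes after mutation, a proxy built on this should not forward the upstream `Content-Length` header unchanged.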
Things you're likely to learn:
- More about CSS selector precedence. Which selectors can you easily mutate? Which are harder? Are there examples of selectors which you can't mutate?
- What can XPath do? XPath is a mechanism for querying tree structures (notably XML). If you were g