At my company, we are building infrastructure for running computations over large bodies of text data.
To get familiar with the tech involved, I started with a simple experiment: using the [Common Crawl metadata corpus][1], count crawled URLs grouped by top-level domain (TLD).
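
To make the goal concrete, here is a minimal single-machine sketch of the counting logic in Python. It is only an illustration, not the pipeline used for the experiment. The helper names `tld_of` and `count_by_tld` and the sample URLs are made up for this example. It also assumes a deliberately naive definition of TLD: the last dot-separated label of the host, so `news.bbc.co.uk` counts as `uk`. Reading records from the Common Crawl metadata files is left out entirely.

```python
from collections import Counter
from urllib.parse import urlsplit


def tld_of(url):
    """Return the last dot-separated label of the URL's host, or None.

    Naive by design: hosts without a dot (e.g. "localhost") and IPv4
    addresses are skipped, and multi-part suffixes such as "co.uk"
    are reduced to their final label ("uk").
    """
    try:
        host = urlsplit(url).hostname
    except ValueError:
        # urlsplit rejects some malformed URLs, e.g. a broken IPv6 literal
        return None
    if not host:
        return None
    host = host.rstrip(".")  # tolerate fully qualified names like "example.com."
    if "." not in host:
        return None
    label = host.rsplit(".", 1)[-1]
    if label.isdigit():
        # most likely an IPv4 address, which has no TLD
        return None
    return label


def count_by_tld(urls):
    """Count URLs grouped by naive TLD, ignoring URLs without one."""
    counts = Counter()
    for url in urls:
        tld = tld_of(url)
        if tld is not None:
            counts[tld] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample input; a real run would stream URLs from the corpus.
    sample = [
        "http://www.example.com/index.html",
        "https://example.org/about",
        "http://shop.example.com/cart",
        "https://news.bbc.co.uk/",
    ]
    for tld, n in count_by_tld(sample).most_common():
        print(tld, n)
```

At corpus scale the same two steps, extracting a key per URL and summing per key, map naturally onto a distributed group-and-count job.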