In this (optional) MP you will prepare a corpus of free English newswire text for use in later assignments. This is intended to provide practical Python programming experience, but will not be graded.
For this task we will use the subset of the News Crawl corpus that covers the year 2009. The file is very large (3.7 GB) and is available here as a gzipped TAR archive.
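Once downloaded, an archive of this size can be read without unpacking it to disk. The following is a minimal sketch, assuming the archive contains UTF-8 plain-text files. The function name `iter_archive_lines` and its parameters are illustrative and not part of the assignment.

```python
import tarfile
from typing import Iterator, Tuple


def iter_archive_lines(
    archive_path: str, encoding: str = "utf-8"
) -> Iterator[Tuple[str, str]]:
    """Yield (member_name, line) pairs from each regular file in a gzipped TAR archive.

    Assumption: members are plain-text files in the given encoding.
    Undecodable bytes are replaced rather than raising an error.
    """
    # Mode "r|gz" reads the archive as a forward-only stream, so members are
    # decompressed on the fly and nothing is extracted to disk.
    with tarfile.open(archive_path, mode="r|gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, and other non-file entries
            handle = archive.extractfile(member)
            if handle is None:
                continue
            for raw_line in handle:
                text = raw_line.decode(encoding, errors="replace")
                yield member.name, text.rstrip("\r\n")
```

To use it, loop over `iter_archive_lines(path)`, where `path` is wherever you saved the downloaded archive. Each iteration gives one line of text together with the name of the file it came from.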
- Download the file: