Download one by hand. See if you can get NLTK (the Natural Language Toolkit) running on it to extract places and dates. And see if it works!
Download each HTML file: write a Python script that fetches one at a time.
Use the urllib2 library, which can connect to the web (urllib2 is Python 2; in Python 3 the same functions live in urllib.request).
Read the files with urllib2.
Use the open command to open local files, e.g. open("output.html","w"),
and write out to them from inside Python.
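The download step above can be sketched roughly as follows. This is a minimal sketch, not a finished script: it assumes Python 3 (so urllib.request instead of urllib2), the function name download_page is made up for illustration, and the URL in the comment is a placeholder.

```python
import urllib.request


def download_page(url, out_path):
    """Fetch one URL and save the raw bytes to out_path.

    Returns the number of bytes written. download_page is a
    hypothetical helper name, not part of any library.
    """
    # urlopen connects to the web and returns a file-like response.
    with urllib.request.urlopen(url) as response:
        data = response.read()
    # "wb" rather than "w": read() returns bytes, not text.
    with open(out_path, "wb") as out_file:
        out_file.write(data)
    return len(data)


# Example call (placeholder URL, one file at a time):
# download_page("http://example.com/page1.html", "output.html")
```

Calling the function once per URL in a loop would cover the "one at a time" part; a short pause between requests is probably polite to the server.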
Some examples are here.
Automatically extract place names.
- Named entity extraction: NLTK, the Natural Language Toolkit (not a Stanford project; Stanford publishes a separate NER tagger). Install nltk, then run Python on the downloaded HTML files to search for places and dates (probably possible?). http://nltk.org/
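A rough sketch of what the extraction could look like with NLTK's built-in chunker. Several things here are assumptions: the helper names html_to_text and extract_places are made up, the HTML is reduced to text with the standard library before tagging, and only the labels GPE and LOCATION are treated as places. NLTK's data packages (tokenizer, tagger, chunker) have to be fetched with nltk.download() first, and their names vary between NLTK versions. Whether the default chunker finds dates at all is untested here, so dates may need a different approach.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect the text nodes of an HTML document and drop the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    """Return the text content of an HTML string, whitespace collapsed.

    Note: script and style contents are not filtered out.
    """
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return " ".join(" ".join(collector.chunks).split())


def extract_places(text):
    """Return place-like named entities found by NLTK's default chunker."""
    import nltk  # imported here so html_to_text works without NLTK installed

    places = []
    for sentence in nltk.sent_tokenize(text):
        tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
        tree = nltk.ne_chunk(tagged)
        for subtree in tree.subtrees():
            # Assumption: GPE and LOCATION are the labels that mark places.
            if subtree.label() in ("GPE", "LOCATION"):
                places.append(" ".join(word for word, _tag in subtree.leaves()))
    return places
```

Something like extract_places(html_to_text(page)) on the hand-downloaded file would be a quick first check of whether the results are usable.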
Run through the files with nltk and write out lists of matched names and places.
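The batch step might look something like the sketch below. It assumes the downloaded pages sit in one folder as .html files, and it takes the extraction routine as a parameter (extract, any function from page text to a list of strings) so it does not depend on a particular NLTK setup. The name write_entity_lists and the one-.txt-per-page layout are illustrative choices, not something decided above.

```python
import glob
import os


def write_entity_lists(html_dir, out_dir, extract):
    """Run extract() on every .html file in html_dir.

    Writes one .txt file per page into out_dir, one match per line,
    and returns the list of output paths.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for html_path in sorted(glob.glob(os.path.join(html_dir, "*.html"))):
        # errors="replace" keeps going if a page is not valid UTF-8.
        with open(html_path, encoding="utf-8", errors="replace") as html_file:
            names = extract(html_file.read())
        base = os.path.splitext(os.path.basename(html_path))[0]
        out_path = os.path.join(out_dir, base + ".txt")
        with open(out_path, "w", encoding="utf-8") as out_file:
            for name in names:
                out_file.write(name + "\n")
        written.append(out_path)
    return written
```

Opening a few of the resulting .txt files next to the original pages is the simplest way to judge the matches.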
Check and see if it works!