@amake
Created August 24, 2014 12:09
Journey to the West (西游记) character count
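'''
Count the number of distinct characters in Journey to the West
'''
import urllib2

# One page per chapter; %03d expands to the zero-padded chapter number
URL = 'http://www.sdmz.net/xy/%03d.htm'
CHAPTERS = 100


def do_count():
    chars = set()
    for n in xrange(1, CHAPTERS + 1):
        print 'Chapter', n
        # Decode as GB2312, dropping undecodable bytes, and add each character to the set
        page = urllib2.urlopen(URL % n).read().decode('gb2312', 'ignore')
        chars.update(page)
    # The set holds every distinct character seen, including anything else on the page such as markup
    print len(chars)


if __name__ == '__main__':
    do_count()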
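
The script above targets Python 2 (`urllib2`, `xrange`, `print` statements). A minimal Python 3 sketch of the same idea follows, assuming the same URL pattern and GB2312 decoding. The function names `distinct_chars` and `fetch_chapters` are hypothetical and not part of the original gist; counting is split from fetching so the counting step can be tried on local byte strings without any network access.

```python
import urllib.request

URL = 'http://www.sdmz.net/xy/%03d.htm'  # same URL pattern as the script above
CHAPTERS = 100


def distinct_chars(pages, encoding='gb2312'):
    """Return the set of distinct characters across an iterable of byte strings.

    Hypothetical helper, not part of the original gist.
    """
    chars = set()
    for raw in pages:
        # Undecodable bytes are dropped, as in the original script.
        chars.update(raw.decode(encoding, 'ignore'))
    return chars


def fetch_chapters(url=URL, chapters=CHAPTERS):
    """Yield the raw bytes of each chapter page (hypothetical helper)."""
    for n in range(1, chapters + 1):
        with urllib.request.urlopen(url % n) as response:
            yield response.read()
```

Calling `len(distinct_chars(fetch_chapters()))` would then give the same kind of count as `do_count()`, provided the pages are still reachable at that address.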