@alts
Created April 15, 2010 06:00
#!/usr/bin/env python
# encoding: utf-8
"""
Reads the image URIs for the various graphs on Google's Webmaster Tools crawl
stats page and pulls some data estimates from them.
$ python gwc_decode.py 'https://www.google.com/chart?...'
"""
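import sys

try:  # Python 3
    from urllib.parse import parse_qs, unquote, urlparse
except ImportError:  # Python 2
    from urllib import unquote
    from urlparse import parse_qs, urlparse

# Alphabet of the Chart API "simple encoding": A-Z, a-z, 0-9 map to 0..61.
alphanum = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'


def readURL(url):
    url_parts = urlparse(unquote(url))
    query = parse_qs(url_parts.query)
    if 'chd' not in query or 'chxl' not in query:
        print('URL is not of expected format. Should have keys "chd", "chxl"')
        return
    # The fourth '|'-separated field of chxl is assumed to be the axis maximum.
    max_value = int(query['chxl'][0].split('|')[3].replace(',', ''))
    # Drop the two-character encoding prefix (e.g. "s:").
    data = query['chd'][0][2:]
    result = []
    for datum in data:
        index = alphanum.find(datum)
        # Skip characters outside the alphabet, such as separators.
        if index != -1:
            result.append(int((index + 1) / 62.0 * max_value))
    print(result)


for arg in sys.argv[1:]:
    readURL(arg)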
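
To make the decoding step easier to check by hand, here is a minimal Python 3 sketch of the same idea as a function that returns the values instead of printing them. The name `estimate_values` and the sample URL are made up for illustration and are not part of the original script. It keeps the script's assumption that the fourth `chxl` field is the axis maximum, but swaps the float division for integer arithmetic, `(index + 1) * axis_max // 62`, so the results do not depend on floating-point rounding.

```python
from urllib.parse import parse_qs, urlparse

# Simple-encoding alphabet: A-Z, a-z, 0-9 correspond to indices 0..61.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
)


def estimate_values(url):
    """Return scaled estimates for a chart URL, or None if keys are missing."""
    query = parse_qs(urlparse(url).query)
    if "chd" not in query or "chxl" not in query:
        return None
    # Same assumption as the script: the fourth chxl field is the axis maximum.
    axis_max = int(query["chxl"][0].split("|")[3].replace(",", ""))
    encoded = query["chd"][0][2:]  # drop the two-character prefix, e.g. "s:"
    return [
        (ALPHABET.index(ch) + 1) * axis_max // 62
        for ch in encoded
        if ch in ALPHABET  # skip anything outside the alphabet
    ]


# Hypothetical URL, invented for this example only.
sample = "https://www.google.com/chart?cht=lc&chd=s:Af9&chxl=0:|Jan|Feb|6,200"
print(estimate_values(sample))  # [100, 3200, 6200]
```

Here `A` (index 0), `f` (index 31) and `9` (index 61) scale to 1/62, 32/62 and 62/62 of the assumed maximum of 6200.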