@alecxe
Created September 4, 2014 19:34
# -*- coding: utf-8 -*-
import mechanize
import cookielib
b = mechanize.Browser()
b.set_handle_refresh(True)
b.set_debug_redirects(True)
b.set_handle_redirect(True)
b.set_debug_http(True)
cj = cookielib.CookieJar()
b.set_cookiejar(cj)
b.addheaders = [
('User-agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36'),
('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'),
('Host', 'www.fangraphs.com'),
('Referer', 'http://www.fangraphs.com/leaders.aspx?pos=all&stats=bat&lg=all&qual=y&type=8&season=2014&month=0&season1=2014&ind=0')
]
b.open("http://www.fangraphs.com/leaders.aspx?pos=all&stats=bat&lg=all&qual=y&type=8&season=2014&month=0&season1=2014&ind=0")
def is_form1_form(form):
    return "id" in form.attrs and form.attrs['id'] == "form1"
b.select_form(predicate=is_form1_form)
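# mechanize treats hidden inputs as read-only; unlock the ASP.NET postback fields
b.form.find_control(name='__EVENTTARGET').readonly = False
b.form.find_control(name='__EVENTARGUMENT').readonly = False
# Emulate the page's JavaScript postback for the control named LeaderBoard1$cmdCSV
b.form['__EVENTTARGET'] = 'LeaderBoard1$cmdCSV'
b.form['__EVENTARGUMENT'] = ''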
print b.submit().read()
@stohlern

I am trying to do what I think you're trying to do here. I am trying to download the .csv files from fangraphs.com. I have a script that works using selenium, but it needs a browser, and I'm looking for a headless solution like this one. When you run this, does it actually download the .csv file somewhere? When I run it, the print command just dumps the HTML content, not the .csv file. Thanks, and nice work!
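A minimal sketch of one way to tell the two cases apart and save the result, assuming the body returned by `b.submit().read()` is available as bytes. The script as posted only prints the response, so nothing is written to disk. The helper names `looks_like_html` and `save_csv` are hypothetical, not part of mechanize, and the sketch is written for Python 3 rather than the Python 2 used in the gist.

```python
import csv
import io


def looks_like_html(body):
    """Return True if the start of the response body looks like an HTML page."""
    head = body.lstrip()[:200].lower()
    return b"<!doctype" in head or b"<html" in head


def save_csv(body, path):
    """Write body to path unless it is HTML; return the number of CSV rows."""
    if looks_like_html(body):
        raise ValueError("server returned an HTML page, not CSV")
    with open(path, "wb") as fh:
        fh.write(body)
    # utf-8-sig also accepts input without a byte-order mark
    text = body.decode("utf-8-sig")
    return sum(1 for _ in csv.reader(io.StringIO(text)))
```

With something like this in place, the last line of the script could become `save_csv(b.submit().read(), "leaders.csv")`, where the file name is only an example. A `ValueError` would then indicate that the postback did not trigger the export and the page itself came back.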
