Jeri Elizabeth jerielizabeth

@jerielizabeth
jerielizabeth / gist:4008975
Created November 3, 2012 21:48
Install Beautiful Soup
pip install beautifulsoup4
jerielizabeth / gist:4009376
Created November 3, 2012 23:55
original html file for 43rd congress search
<!-- saved from url=(0053)http://bioguide.congress.gov/biosearch/biosearch1.asp -->
<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>Congressional Biographical Directory</title></head>
<body background="./43rd-congress_files/paper1.gif" text="#000000">
<table border="1" cellpadding="0" cellspacing="0" width="100%">
<tbody><tr>
<td width="100%" valign="TOP" bgcolor="#990000"><center><img src="./43rd-congress_files/topbanner.jpg" border="0"></center></td>
</tr></tbody></table>
jerielizabeth / gist:4009383
Created November 3, 2012 23:58
names and links, 43rd congress
"ADAMS, George Madison",http://bioguide.congress.gov/scripts/biodisplay.pl?index=A000035
"ALBERT, William Julian",http://bioguide.congress.gov/scripts/biodisplay.pl?index=A000074
"ALBRIGHT, Charles",http://bioguide.congress.gov/scripts/biodisplay.pl?index=A000077
"ALCORN, James Lusk",http://bioguide.congress.gov/scripts/biodisplay.pl?index=A000079
"ALLISON, William Boyd",http://bioguide.congress.gov/scripts/biodisplay.pl?index=A000160
"AMES, Adelbert",http://bioguide.congress.gov/scripts/biodisplay.pl?index=A000172
"ANTHONY, Henry Bowen",http://bioguide.congress.gov/scripts/biodisplay.pl?index=A000262
"ARCHER, Stevenson",http://bioguide.congress.gov/scripts/biodisplay.pl?index=A000274
"ARMSTRONG, Moses Kimball",http://bioguide.congress.gov/scripts/biodisplay.pl?index=A000283
"ARTHUR, William Evans",http://bioguide.congress.gov/scripts/biodisplay.pl?index=A000304
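A minimal sketch of reading rows like the ones above back into Python with the standard `csv` module. The two sample rows are copied from the listing. The in-memory `sample` string is an assumption made so the example runs without the CSV file on disk.

```python
import csv
import io

# Two rows copied from the listing above; a real run would open the CSV file instead.
sample = (
    '"ADAMS, George Madison",http://bioguide.congress.gov/scripts/biodisplay.pl?index=A000035\n'
    '"ALBERT, William Julian",http://bioguide.congress.gov/scripts/biodisplay.pl?index=A000074\n'
)

rows = list(csv.reader(io.StringIO(sample)))

for name, url in rows:
    # The quoted name keeps its internal comma, so each row has exactly two fields.
    surname, given = [part.strip() for part in name.split(",", 1)]
    print(surname, "|", given, "|", url)
```

Because the names are quoted, the comma between surname and given names does not split the field.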
jerielizabeth / script.py
Created November 4, 2012 00:40
BeautifulSoup Script
from bs4 import BeautifulSoup
soup = BeautifulSoup(open("43rd-congress.html"), "html.parser")  # name the parser explicitly
final_link = soup.p.a
final_link.decompose()
clean_list = []
links = soup.find_all('a')
for link in links:
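The preview above stops at the loop header, and the original loop body is not shown. As one possible way such a loop could fill `clean_list`, here is a hedged sketch on a small inline page. The HTML string and the "Search Again" link are hypothetical stand-ins for `43rd-congress.html`, and storing `(name, href)` tuples is an assumption, not the author's confirmed approach.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for 43rd-congress.html: one unwanted link inside <p>,
# followed by the links we actually want.
html = """
<html><body>
<p><a href="http://example.com/search">Search Again</a></p>
<table>
<tr><td><a href="http://bioguide.congress.gov/scripts/biodisplay.pl?index=A000035">ADAMS, George Madison</a></td></tr>
<tr><td><a href="http://bioguide.congress.gov/scripts/biodisplay.pl?index=A000074">ALBERT, William Julian</a></td></tr>
</table>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Remove the first link inside the first <p>, as the script above does.
final_link = soup.p.a
final_link.decompose()

clean_list = []
for link in soup.find_all("a"):
    # Assumed loop body: keep the visible name and the target URL of each link.
    clean_list.append((link.get_text(), link.get("href")))

print(clean_list)
```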
jerielizabeth / soupex1.py
Created November 4, 2012 02:45
prettify example
from bs4 import BeautifulSoup
soup = BeautifulSoup(open("43rd-congress.html"), "html.parser")  # name the parser explicitly
print(soup.prettify())
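To see what `prettify()` does without the saved page, the following sketch runs it on a one-line fragment. The fragment is an assumption made for illustration; only the name inside it comes from the data above.

```python
from bs4 import BeautifulSoup

# A compact, single-line fragment (hypothetical, not taken from the saved page).
fragment = "<table><tr><td>ADAMS, George Madison</td></tr></table>"

soup = BeautifulSoup(fragment, "html.parser")

# prettify() returns a string with each tag and text node on its own line,
# indented according to nesting depth.
pretty = soup.prettify()
print(pretty)
```

The indented output makes the nesting of the tags easier to read when deciding which elements to target.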
jerielizabeth / soupex2.py
Created November 4, 2012 03:08
display links
from bs4 import BeautifulSoup
soup = BeautifulSoup(open("43rd-congress.html"), "html.parser")  # name the parser explicitly
final_link = soup.p.a
final_link.decompose()
links = soup.find_all('a')
for link in links:
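The `decompose()` call in this script removes a tag from the parsed tree before the links are collected. A small sketch of that effect, assuming a hypothetical page in which the first `<p>` holds a navigation link that should not appear in the results:

```python
from bs4 import BeautifulSoup

# Hypothetical page: the "Search Again" link is an assumed example of an unwanted link.
html = (
    '<p><a href="#search">Search Again</a></p>'
    '<ul><li><a href="#a">ADAMS, George Madison</a></li>'
    '<li><a href="#b">ALBERT, William Julian</a></li></ul>'
)
soup = BeautifulSoup(html, "html.parser")

before = len(soup.find_all("a"))

# soup.p.a is the first <a> inside the first <p>; decompose() deletes it from the tree.
soup.p.a.decompose()

after = len(soup.find_all("a"))
remaining = [link.get_text() for link in soup.find_all("a")]
print(before, after, remaining)
```

After the call, `find_all('a')` no longer returns the removed link, so the loop only sees the entries of interest.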
jerielizabeth / soupex3.py
Created November 4, 2012 03:11
print results to file
from bs4 import BeautifulSoup
soup = BeautifulSoup(open("43rd-congress.html"), "html.parser")  # name the parser explicitly
final_link = soup.p.a
final_link.decompose()
people = soup.find_all('a')
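The preview ends before anything is written to disk. One way the results could be written to a file is sketched below. The output file name `people.txt`, the temporary directory, and the inline HTML are all assumptions made for illustration.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Hypothetical stand-in for the saved page.
html = (
    '<p><a href="/a">ADAMS, George Madison</a> '
    '<a href="/b">ALBERT, William Julian</a></p>'
)
soup = BeautifulSoup(html, "html.parser")
people = soup.find_all("a")

# Write to a temporary directory so the sketch leaves the working directory untouched.
out_path = os.path.join(tempfile.mkdtemp(), "people.txt")
with open(out_path, "w", encoding="utf-8") as out_file:
    for person in people:
        # One visible name per line.
        out_file.write(person.get_text() + "\n")

# Read the file back to confirm what was written.
with open(out_path, encoding="utf-8") as check:
    written = check.read()
print(written)
```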
jerielizabeth / soupex4.py
Created November 4, 2012 03:12
get text example
from bs4 import BeautifulSoup
soup = BeautifulSoup(open("43rd-congress.html"), "html.parser")  # name the parser explicitly
print(soup.get_text())
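A short sketch of how `get_text()` behaves on a small fragment, including its optional separator and `strip` arguments. The fragment and the word "Representative" are hypothetical and are used only to show how adjacent text nodes are joined.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment with two adjacent text nodes.
html = '<p><a href="#a">ADAMS, George Madison</a><span>Representative</span></p>'
soup = BeautifulSoup(html, "html.parser")

# With no arguments, the text nodes are concatenated directly.
plain = soup.get_text()

# A separator and strip=True keep the pieces apart and trim surrounding whitespace.
separated = soup.get_text(" | ", strip=True)

print(plain)
print(separated)
```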
jerielizabeth / soupex2a.py
Created November 4, 2012 03:31
isolating links first
from bs4 import BeautifulSoup
soup = BeautifulSoup(open("43rd-congress.html"), "html.parser")  # name the parser explicitly
links = soup.find_all('a')
for link in links:
    print(link)  # print each <a> tag, including its markup
jerielizabeth / soupscript.py
Created November 27, 2012 02:32
Script, part 1
from bs4 import BeautifulSoup
import csv
soup = BeautifulSoup(open("43rd-congress.html"), "html.parser")  # name the parser explicitly
final_link = soup.p.a
final_link.decompose()
links = soup.find_all('a')
for link in links:
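This script imports `csv`, but the preview is cut off before the module is used. The sketch below shows one way each link could be written as a name and URL row. It writes to an in-memory buffer instead of a file, and the inline HTML is a hypothetical stand-in for the saved page. Both are assumptions.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical stand-in for 43rd-congress.html.
html = (
    '<a href="http://bioguide.congress.gov/scripts/biodisplay.pl?index=A000035">'
    "ADAMS, George Madison</a>"
    '<a href="http://bioguide.congress.gov/scripts/biodisplay.pl?index=A000074">'
    "ALBERT, William Julian</a>"
)
soup = BeautifulSoup(html, "html.parser")

# An in-memory buffer stands in for an output file opened with newline="".
buffer = io.StringIO()
writer = csv.writer(buffer)

for link in soup.find_all("a"):
    # csv.writer quotes the name automatically because it contains a comma.
    writer.writerow([link.get_text(), link.get("href")])

csv_lines = buffer.getvalue().splitlines()
print("\n".join(csv_lines))
```

The rows produced this way have the same shape as the name and link listing for the 43rd Congress shown earlier.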