Jeri Elizabeth jerielizabeth

import nltk

# Read the raw text to analyze
with open('sample.txt', 'r') as f:
    sample = f.read()

# Split into sentences, tokenize each one, then tag parts of speech
sentences = nltk.sent_tokenize(sample)
tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
# ne_chunk_sents is the NLTK 3 name for the older batch_ne_chunk
chunked_sentences = nltk.ne_chunk_sents(tagged_sentences, binary=True)
.block {
  width: 30%;
  float: left;
  background-color: gray;
  margin: 10px;
}
.navigation {
  background-color: green;
  width: 33%;
}
<section class="block">
  <h2>some header</h2>
  <p>Some text</p>
  <a href="#">Read More</a>
</section>
<section class="block">
  <h2>another header</h2>
  <p>Some text</p>
  <a href="#">Read More</a>
</section>
@jerielizabeth
jerielizabeth / script2.py
Last active October 13, 2015 07:08
Script, Part 2
from bs4 import BeautifulSoup
import csv

# Open the HTML file and create a soup object
with open("43rd-congress.html") as html_file:
    soup = BeautifulSoup(html_file, "html.parser")

# Remove the final link, which sits outside the table
final_link = soup.p.a
final_link.decompose()
jerielizabeth / finalresult.csv
Created November 27, 2012 23:44
End Result
"ADAMS, George Madison",1837-1920,Representative,Democrat,KY,43(1873-1874),http://bioguide.congress.gov/scripts/biodisplay.pl?index=A000035
"ALBERT, William Julian",1816-1879,Representative,Republican,MD,43(1873-1874),http://bioguide.congress.gov/scripts/biodisplay.pl?index=A000074
"ALBRIGHT, Charles",1830-1880,Representative,Republican,PA,43(1873-1874),http://bioguide.congress.gov/scripts/biodisplay.pl?index=A000077
"ALCORN, James Lusk",1816-1894,Senator,Republican,MS,43(1873-1874),http://bioguide.congress.gov/scripts/biodisplay.pl?index=A000079
"ALLISON, William Boyd",1829-1908,Senator,Republican,IA,43(1873-1874),http://bioguide.congress.gov/scripts/biodisplay.pl?index=A000160
"AMES, Adelbert",1835-1933,Senator,Republican,MS,43(1873-1874),http://bioguide.congress.gov/scripts/biodisplay.pl?index=A000172
"ANTHONY, Henry Bowen",1815-1884,Senator,Republican,RI,43(1873-1874),http://bioguide.congress.gov/scripts/biodisplay.pl?index=A000262
"ARCHER, Stevenson",1827-1898,Representative,Democrat,MD,43(1873-1874),htt
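Names such as `ADAMS, George Madison` contain a comma, so splitting each record on commas by hand would break it in the wrong place. A minimal sketch of reading rows like these with Python's `csv` module follows. It assumes the file is comma-delimited with the name field quoted; the column names in `FIELDS` are assumptions too, since the preview shows no header row.

```python
import csv
import io

# Assumed column names; the preview above shows no header row.
FIELDS = ["name", "years", "position", "party", "state", "congress", "link"]

# Two rows from the preview, written in the assumed quoted-CSV form.
sample = (
    '"ADAMS, George Madison",1837-1920,Representative,Democrat,KY,43(1873-1874),'
    'http://bioguide.congress.gov/scripts/biodisplay.pl?index=A000035\n'
    '"ALCORN, James Lusk",1816-1894,Senator,Republican,MS,43(1873-1874),'
    'http://bioguide.congress.gov/scripts/biodisplay.pl?index=A000079\n'
)

# DictReader handles the quoted name field and maps each row onto FIELDS.
reader = csv.DictReader(io.StringIO(sample), fieldnames=FIELDS)
members = list(reader)

# Example query: names of the senators among the rows read.
senators = [m["name"] for m in members if m["position"] == "Senator"]
print(senators)
```

To read the real file instead of the in-memory sample, pass `open("finalresult.csv", newline="")` to `csv.DictReader` in place of the `io.StringIO` object.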
jerielizabeth / soupscript.py
Created November 27, 2012 02:32
Script, Part 1
from bs4 import BeautifulSoup
import csv
with open("43rd-congress.html") as html_file:
    soup = BeautifulSoup(html_file, "html.parser")
final_link = soup.p.a
final_link.decompose()
links = soup.find_all('a')
for link in links:
jerielizabeth / soupex2a.py
Created November 4, 2012 03:31
isolating links first
from bs4 import BeautifulSoup

with open("43rd-congress.html") as html_file:
    soup = BeautifulSoup(html_file, "html.parser")

# Find every anchor tag and print it
links = soup.find_all('a')
for link in links:
    print(link)
jerielizabeth / soupex4.py
Created November 4, 2012 03:12
get text example
from bs4 import BeautifulSoup
with open("43rd-congress.html") as html_file:
    soup = BeautifulSoup(html_file, "html.parser")
print(soup.get_text())
jerielizabeth / soupex3.py
Created November 4, 2012 03:11
print results to file
from bs4 import BeautifulSoup

with open("43rd-congress.html") as html_file:
    soup = BeautifulSoup(html_file, "html.parser")

# Remove the final link, which sits outside the table
final_link = soup.p.a
final_link.decompose()
people = soup.find_all('a')
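The preview above stops after collecting the links. One way the "print results to file" step might look is sketched below. It is an illustration only: it swaps in the standard library's `html.parser` for BeautifulSoup so that it runs without extra installs, and the sample markup, the `LinkCollector` class, the `links_to_csv` helper, and the column names are all hypothetical, not part of the original gist.

```python
import csv
import io
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (text, href) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def links_to_csv(html, out):
    """Write one CSV row per link found in html to the file-like object out."""
    collector = LinkCollector()
    collector.feed(html)
    writer = csv.writer(out)
    writer.writerow(["Name", "Link"])  # assumed column names
    writer.writerows(collector.links)
    return collector.links


# Made-up markup standing in for 43rd-congress.html
sample_html = (
    '<table><tr><td>'
    '<a href="http://example.com/a">ADAMS, George</a>'
    '</td></tr></table>'
)
buffer = io.StringIO()
rows = links_to_csv(sample_html, buffer)
print(buffer.getvalue())
```

To write to disk rather than to an in-memory buffer, pass a file opened with `open("results.csv", "w", newline="")` as `out`; the file name here is also just a placeholder.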