laszlolm/main.py

Last active November 1, 2015 00:15
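import re

import bs4

# Parse the saved page in a single pass. The original read it in fixed-size
# chunks, which can cut an element in half at a chunk boundary. Opening the
# file in binary mode lets BeautifulSoup detect the encoding itself.
with open("index.html", "rb") as page:
    soup = bs4.BeautifulSoup(page, "html5lib")

# Each follower is an <a> that is a direct child of div.list-itemDescription;
# its href is the profile link and its text holds the name.
anchors = soup.select('div.list-itemDescription > a')
links = [a.get('href', '') for a in anchors]

# Open the output once (append mode, as before) instead of once per row.
# Rows are tab-separated: the link, then the name text with line breaks as tabs.
with open("output.csv", "a", encoding="utf-8") as myfile:
    for link, name in zip(links, anchors):
        # Squeeze runs of spaces, turn newlines into tabs, and drop carriage
        # returns and any non-ASCII characters.
        text = re.sub(' +', ' ', name.text).replace('\n', '\t').replace('\r', '')
        text = text.encode('ascii', 'ignore').decode('ascii')
        myfile.write(link + "\t" + text + "\n")

# Format the string before printing; applying % to print()'s return value
# raises a TypeError in Python 3.
print("The function finished with %d found followers. You can download 'output.csv' now" % len(links))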
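If installing bs4 and html5lib is not an option, the same extraction can be approximated with the standard library's html.parser. This is a swapped-in technique, not what the gist uses: html.parser does not rebuild malformed markup the way html5lib does, so results may differ on messy pages. FollowerParser and format_row are hypothetical names introduced for this sketch; the selector logic and the clean-up rules are copied from the script above.

```python
import re
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class FollowerParser(HTMLParser):
    """Hypothetical helper: collects (href, text) for <a> tags that are direct
    children of <div class="list-itemDescription">, mimicking the CSS selector
    'div.list-itemDescription > a'."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open (tag, classes) pairs
        self.current = None  # [href, text parts] of the <a> being read
        self.rows = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        parent = self.stack[-1] if self.stack else None
        if (tag == "a" and parent is not None and parent[0] == "div"
                and "list-itemDescription" in parent[1]):
            self.current = [attrs.get("href") or "", []]
        if tag not in VOID:
            self.stack.append((tag, classes))

    def handle_endtag(self, tag):
        if tag == "a" and self.current is not None:
            self.rows.append((self.current[0], "".join(self.current[1])))
            self.current = None
        # Pop back to the matching open tag; tolerates unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Text of nested inline tags (e.g. <span>) counts too, like bs4's .text.
        if self.current is not None:
            self.current[1].append(data)


def format_row(href, text):
    """Hypothetical helper with the script's clean-up: squeeze spaces, turn
    newlines into tabs, drop carriage returns and non-ASCII characters."""
    text = re.sub(" +", " ", text).replace("\n", "\t").replace("\r", "")
    text = text.encode("ascii", "ignore").decode("ascii")
    return href + "\t" + text + "\n"
```

To run it against the saved page, feed the contents of index.html to a FollowerParser instance, then write format_row(href, text) for each pair in parser.rows to output.csv.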