Skip to content

Instantly share code, notes, and snippets.

@girishramnani
Last active September 20, 2015 16:06
Show Gist options
  • Save girishramnani/61c5f36b604783062cd2 to your computer and use it in GitHub Desktop.
Save girishramnani/61c5f36b604783062cd2 to your computer and use it in GitHub Desktop.
import sys
import bs4
import requests
from itertools import count
def get_unlimited(link):
for i in count(1):
response = requests.get("{}&page={}".format(link,i))
yield response.content.decode()
link = sys.argv[1]
print(link)
past_attr="Nothing"
for content in get_unlimited(link):
soup = bs4.BeautifulSoup(content)
attr = soup.find("span","mb-timestamp")
if not attr:
break
past_attr = attr
print(past_attr)
@girishramnani
Copy link
Author

usage

python file.py "the link"

better use the " in the link or else in some link the whole link wouldnt be taken in.

NOTE - if you have lxml installed then just change the line

soup = bs4.BeautifulSoup(content)

to

soup = bs4.BeautifulSoup(content,"lxml")

as you know lxml is faster.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment