@GeekyShiva
Created January 31, 2017 18:42
A very basic scraper, written to show how a scraper interacts with a page's DOM components and how to implement one.
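import urllib.request

from bs4 import BeautifulSoup

# Put the URL of the profile you want to scrape here.
theurl = "Profile URL which you want to scrape"

# Download the page and parse its HTML into a tree of DOM elements.
with urllib.request.urlopen(theurl) as thepage:
    soup = BeautifulSoup(thepage, "html.parser")

# Text of the page's <title> element.
print(soup.title.text)

# Uncomment to print every link on the page (its href and its text).
# for link in soup.find_all('a'):
#     print(link.get('href'))
#     print(link.text)

# Text of the first <p> inside the profile header card.
print(soup.find('div', {"class": "ProfileHeaderCard"}).find('p').text)

# Print each tweet's number, then the text of the first <p> in its
# <div class="content"> block.
for i, tweets in enumerate(soup.find_all('div', {"class": "content"}), start=1):
    print(i)
    print(tweets.find('p').text)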
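To see what the parsing step does without fetching a live page, here is a minimal sketch that uses only the standard library's `html.parser.HTMLParser`, the parser the scraper selects by passing `"html.parser"` to BeautifulSoup. It is a swapped-in, lower-level technique: instead of querying a finished tree with `find`/`find_all`, it reacts to start-tag, text and end-tag events to collect the text of the first `<p>` in each `<div class="content">`. The class name `TweetTextParser` and the sample markup are hypothetical, made up for illustration; they only mimic the class names the scraper looks for.

```python
from html.parser import HTMLParser


class TweetTextParser(HTMLParser):
    """Hypothetical helper: collect the text of the first <p> inside each
    <div class="content">, assuming reasonably well-nested markup."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0     # > 0 while inside a matching div (tracks nested divs)
        self.in_p = False      # True while inside the <p> being captured
        self.captured = False  # True once the current div's first <p> is taken
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self.div_depth:
                self.div_depth += 1  # a div nested inside a matching div
            elif "content" in (dict(attrs).get("class") or "").split():
                self.div_depth = 1   # entered a new <div class="content">
                self.captured = False
        elif tag == "p" and self.div_depth and not self.captured:
            self.in_p = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "p" and self.in_p:
            self.in_p = False
            self.captured = True

    def handle_data(self, data):
        # Text from child tags such as <a> is kept, like BeautifulSoup's .text.
        if self.in_p:
            self.texts[-1] += data


# Hypothetical sample markup standing in for a downloaded profile page.
SAMPLE_HTML = """
<html><head><title>Example profile</title></head><body>
  <div class="ProfileHeaderCard"><p>Example bio</p></div>
  <div class="tweet"><div class="content"><p>First <a href="#">tweet</a></p></div></div>
  <div class="tweet"><div class="content"><p>Second tweet</p><p>ignored</p></div></div>
</body></html>
"""

parser = TweetTextParser()
parser.feed(SAMPLE_HTML)
parser.close()
for i, text in enumerate(parser.texts, start=1):
    print(i)
    print(text)
```

BeautifulSoup does this event bookkeeping for you and builds a full tree, which is why the scraper can simply chain `find(...)` calls; the hand-written version mainly shows what that tree is built from.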
@GeekyShiva (author):

This requires Python 3.5 and above.
