GeekyShiva/305589b1c756dcf52613235f0fad6f47
Created January 31, 2017 18:42
A very basic scraper for learning how a scraper interacts with DOM components (the page's HTML elements) and how to implement one.
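To see how BeautifulSoup exposes DOM components without touching the network, here is a minimal sketch that parses an inline HTML snippet. The markup, including its `ProfileHeaderCard` and `content` classes, is a hypothetical stand-in shaped like the profile page the scraper targets. It is not real page markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded profile page.
html = """
<html>
  <head><title>Example profile</title></head>
  <body>
    <div class="ProfileHeaderCard"><p>Example bio text</p></div>
    <a href="/about">About</a>
    <div class="content"><p>First post</p></div>
    <div class="content"><p>Second post</p></div>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# Tags are reachable as attributes: soup.title is the first <title> element.
print(soup.title.text)  # Example profile

# find() returns the first matching element (or None), so calls can be chained.
card = soup.find("div", {"class": "ProfileHeaderCard"})
print(card.find("p").text)  # Example bio text

# .get() reads an attribute and returns None if the attribute is absent.
for link in soup.find_all("a"):
    print(link.get("href"), link.text)  # /about About

# find_all() returns every matching element, in document order.
for div in soup.find_all("div", {"class": "content"}):
    print(div.find("p").text)
```

The same `find` / `find_all` / `.text` calls work unchanged on a page fetched with `urllib.request.urlopen`.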
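import urllib.request
from bs4 import BeautifulSoup  # pip install beautifulsoup4

# Replace with the URL of the profile you want to scrape.
theurl = "Profile URL which you want to scrape"

# Download the page and parse its HTML into a tree of DOM elements.
with urllib.request.urlopen(theurl) as thepage:
    soup = BeautifulSoup(thepage, "html.parser")

# Text of the page's <title> element.
print(soup.title.text)

# Uncomment to print every link's href and text:
# for link in soup.find_all('a'):
#     print(link.get('href'))
#     print(link.text)

# Text of the first <p> inside the div with class "ProfileHeaderCard".
header = soup.find('div', {"class": "ProfileHeaderCard"})
if header is not None and header.find('p') is not None:
    print(header.find('p').text)

# Print each tweet's text, numbered from 1.
for i, tweet in enumerate(soup.find_all('div', {"class": "content"}), start=1):
    tweet_p = tweet.find('p')
    if tweet_p is not None:
        print(i)
        print(tweet_p.text)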
This requires Python 3.5 or later.
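One way to make the scraper easier to test is to split fetching from parsing, so the parsing logic can be checked against saved or inline HTML. The sketch below does this. The helper names `fetch_html` and `parse_profile`, the `User-Agent` value, and the 10-second timeout are hypothetical choices, not part of the original script. Sending a browser-like `User-Agent` assumes the target site rejects urllib's default one.

```python
import urllib.request
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its body as text (hypothetical helper)."""
    # Assumption: some sites reject urllib's default User-Agent header.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def parse_profile(html):
    """Return (title, header_text, post_texts); missing parts become None or []."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.text if soup.title else None

    header = soup.find("div", {"class": "ProfileHeaderCard"})
    header_p = header.find("p") if header else None
    header_text = header_p.text if header_p else None

    posts = []
    for div in soup.find_all("div", {"class": "content"}):
        p = div.find("p")
        if p is not None:  # skip "content" divs that have no <p>
            posts.append(p.text)
    return title, header_text, posts


# Usage with inline HTML, so no network access is needed.
sample = (
    '<title>T</title>'
    '<div class="ProfileHeaderCard"><p>Bio</p></div>'
    '<div class="content"><p>Hi</p></div>'
)
title, bio, posts = parse_profile(sample)
for i, text in enumerate(posts, start=1):
    print(i, text)
```

Against a live page, you would call `parse_profile(fetch_html(url))`. Keeping `parse_profile` free of I/O means a missing element yields `None` or an empty list rather than an `AttributeError`.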