@misho-kr
Last active February 2, 2020 05:59
Summary of the "Intermediate Importing Data in Python" course on DataCamp

In this course, you'll extend this knowledge base by learning to import data from the web and by pulling data from Application Programming Interfaces (APIs), such as the Twitter streaming API, which lets you stream real-time tweets.

Led by Hugo Bowne-Anderson, Data Scientist at DataCamp

Importing data from the Internet

The web is a rich source of data from which you can extract various types of insights and findings. In this chapter, you will learn how to get data from the web, whether it is stored in files or in HTML. You'll also learn the basics of scraping and parsing web data.

  • The urllib package provides an interface for fetching data across the web; urlopen() opens a URL much like a file
  • urlretrieve() performs a GET request and saves the response to a local file (see the sketch after the urlopen example below)
  • URL, HTTP and HTTPS, HTML
# urllib module
from urllib.request import urlopen, Request

url = "https://www.wikipedia.org/"
request = Request(url)
response = urlopen(request)
html = response.read()
response.close()
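
urlretrieve() can fetch the same page and write it straight to disk; a minimal sketch (the output filename is illustrative):

# urlretrieve() performs a GET and saves the response to a local file
from urllib.request import urlretrieve

url = "https://www.wikipedia.org/"
urlretrieve(url, "wikipedia.html")  # writes the page to wikipedia.html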

# Requests library
import requests

url = "https://www.wikipedia.org/"
r = requests.get(url)
text = r.text
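
Beyond the body text, the Response object carries HTTP metadata worth checking; a short sketch:

# Inspecting HTTP metadata on the response
import requests

url = "https://www.wikipedia.org/"
r = requests.get(url)
print(r.status_code)              # 200 on success
print(r.headers['content-type'])  # MIME type reported by the server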
  • Scraping web data with BeautifulSoup
from bs4 import BeautifulSoup
import requests

url = 'https://www.crummy.com/software/BeautifulSoup/'
r = requests.get(url)
html_doc = r.text
soup = BeautifulSoup(html_doc, 'html.parser')

print(soup.title)
print(soup.get_text())

for link in soup.find_all('a'):
  print(link.get('href'))
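
Beyond get_text() and find_all(), BeautifulSoup can pull out a single element with find(); a small sketch (the <p> target is illustrative):

# find() returns the first matching tag, or None if there is no match
first_paragraph = soup.find('p')
if first_paragraph is not None:
  print(first_paragraph.get_text())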

Interacting with APIs to import data from the web

In this chapter, you will gain a deeper understanding of how to import data from the web. You will learn the basics of extracting data from APIs, gain insight into the importance of APIs, and practice extracting data by diving into the OMDb and Library of Congress APIs.

  • Loading and exploring JSON documents
import json
with open('snakes.json', 'r') as json_file:
  json_data = json.load(json_file)

for key, value in json_data.items():
  print(key + ':', value)
  • Connecting to an API
  • Parts of a URL: protocol, host, path, and query string (see the params sketch after the example below)
import requests

url = 'http://www.omdbapi.com/?t=hackers'
r = requests.get(url)
json_data = r.json()
for key, value in json_data.items():
  print(key + ':', value)
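
requests can also assemble the query string from a dict, which keeps the parts of the URL explicit; a sketch of the same request:

# Passing the query string as a params dict instead of embedding it
import requests

url = 'http://www.omdbapi.com/'
r = requests.get(url, params={'t': 'hackers'})  # becomes ?t=hackers
print(r.url)  # the full URL requests actually sent
json_data = r.json()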

Diving deep into the Twitter API

In this chapter, you will consolidate your knowledge of interacting with APIs in a deep dive into the Twitter streaming API. You'll learn how to stream real-time Twitter data, and how to analyze and visualize it.

  • Twitter has a number of APIs
  • Streaming APIs
  • Tweets are returned as JSON objects
  • Using Tweepy
import tweepy

# Store OAuth authentication credentials in relevant variables
consumer_key = "..."         # placeholders: fill in your own app credentials
consumer_secret = "..."
access_token = "..."
access_token_secret = "..."

# Pass OAuth details to tweepy's OAuth handler
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

# Define stream listener class
class MyStreamListener(tweepy.StreamListener):
  def __init__(self, api=None):
    super(MyStreamListener, self).__init__()
    # ...
    pass

  def on_status(self, status):
    # ...
    pass

# Create Streaming object and authenticate
l = MyStreamListener()
stream = tweepy.Stream(auth, l)

# This line filters Twitter Streams to capture data by keywords:
stream.filter(track=['apples', 'oranges'])
  • Twitter data to DataFrame
  • Plotting your Twitter data (see the sketch below for both steps)
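
A sketch of those last two steps, assuming the listener collected tweets into a list of dicts named tweets_data (a name introduced here for illustration) with the standard 'text' and 'lang' fields:

# Build a DataFrame from collected tweet JSONs and plot keyword counts
import re
import matplotlib.pyplot as plt
import pandas as pd

# tweets_data is assumed to be a list of tweet dicts from the stream
df = pd.DataFrame(tweets_data, columns=['text', 'lang'])

# Count tweets mentioning each keyword (case-insensitive match)
keywords = ['apples', 'oranges']
counts = [df['text'].str.contains(word, flags=re.IGNORECASE).sum()
          for word in keywords]

plt.bar(keywords, counts)
plt.ylabel('Number of tweets')
plt.show()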