@misho-kr
Last active February 2, 2020 05:59
Summary of the "Intermediate Importing Data in Python" course on DataCamp

In this course, you'll extend this knowledge base by learning to import data from the web and by pulling data from Application Programming Interfaces (APIs), such as the Twitter streaming API, which lets you stream real-time tweets.

Led by Hugo Bowne-Anderson, Data Scientist at DataCamp

Importing data from the Internet

The web is a rich source of data from which you can extract various types of insights and findings. In this chapter, you will learn how to get data from the web, whether it is stored in files or in HTML. You'll also learn the basics of scraping and parsing web data.

  • The urllib package provides an interface for fetching data across the web; urlopen() opens a URL much like a file
  • urlretrieve() performs a GET request and saves the response to a local file (see the sketch after the urlopen example below)
  • URL, HTTP and HTTPS, HTML
# urllib module
from urllib.request import urlopen, Request

url = "https://www.wikipedia.org/"
request = Request(url)
response = urlopen(request)
html = response.read()
response.close()
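
urlretrieve() can fetch the same page and write it straight to disk; a minimal sketch (the output filename is illustrative):

# urlretrieve() performs a GET and saves the response to a local file
from urllib.request import urlretrieve

url = "https://www.wikipedia.org/"
urlretrieve(url, "wikipedia.html")  # writes the page to wikipedia.html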

# Requests library
import requests

url = "https://www.wikipedia.org/"
r = requests.get(url)
text = r.text
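
Beyond the body text, the Response object carries HTTP metadata worth checking; a short sketch:

# Inspecting HTTP metadata on the response
import requests

url = "https://www.wikipedia.org/"
r = requests.get(url)
print(r.status_code)              # 200 on success
print(r.headers['content-type'])  # MIME type reported by the server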
  • Scraping web data with BeautifulSoup
from bs4 import BeautifulSoup
import requests

url = 'https://www.crummy.com/software/BeautifulSoup/'
r = requests.get(url)
html_doc = r.text
soup = BeautifulSoup(html_doc, 'html.parser')

print(soup.title)
print(soup.get_text())

for link in soup.find_all('a'):
  print(link.get('href'))
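
Beyond get_text() and find_all(), BeautifulSoup can pull out a single element with find(); a small sketch (the <p> target is illustrative):

# find() returns the first matching tag, or None if there is no match
first_paragraph = soup.find('p')
if first_paragraph is not None:
  print(first_paragraph.get_text())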

Interacting with APIs to import data from the web

In this chapter, you will gain a deeper understanding of how to import data from the web. You will learn the basics of extracting data from APIs, gain insight into the importance of APIs, and practice extracting data by diving into the OMDb and Library of Congress APIs.

  • Loading and exploring JSON documents
import json
with open('snakes.json', 'r') as json_file:
  json_data = json.load(json_file)

for key, value in json_data.items():
  print(key + ':', value)
  • Connecting to an API
  • Parts of a URL: protocol, host, path, and query string (see the params sketch after the example below)
import requests

url = 'http://www.omdbapi.com/?t=hackers'
r = requests.get(url)
json_data = r.json()
for key, value in json_data.items():
  print(key + ':', value)
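
requests can also assemble the query string from a dict, which keeps the parts of the URL explicit; a sketch of the same request:

# Passing the query string as a params dict instead of embedding it
import requests

url = 'http://www.omdbapi.com/'
r = requests.get(url, params={'t': 'hackers'})  # becomes ?t=hackers
print(r.url)  # the full URL requests actually sent
json_data = r.json()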

Diving deep into the Twitter API

In this chapter, you will consolidate your knowledge of interacting with APIs in a deep dive into the Twitter streaming API. You'll learn how to stream real-time Twitter data, and how to analyze and visualize it.

  • Twitter has a number of APIs
  • Streaming APIs
  • Tweets are returned as JSON objects
  • Using Tweepy
import tweepy

# Store OAuth authentication credentials in relevant variables
consumer_key = "..."         # placeholders: fill in your own app credentials
consumer_secret = "..."
access_token = "..."
access_token_secret = "..."

# Pass OAuth details to tweepy's OAuth handler
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

# Define stream listener class
class MyStreamListener(tweepy.StreamListener):
  def __init__(self, api=None):
    super(MyStreamListener, self).__init__()
    # ...
    pass

  def on_status(self, status):
    # ...
    pass

# Create Streaming object and authenticate
l = MyStreamListener()
stream = tweepy.Stream(auth, l)

# This line filters Twitter Streams to capture data by keywords:
stream.filter(track=['apples', 'oranges'])
  • Twitter data to DataFrame
  • Plotting your Twitter data (see the sketch below for both steps)
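
A sketch of those last two steps, assuming the listener collected tweets into a list of dicts named tweets_data (a name introduced here for illustration) with the standard 'text' and 'lang' fields:

# Build a DataFrame from collected tweet JSONs and plot keyword counts
import re
import matplotlib.pyplot as plt
import pandas as pd

# tweets_data is assumed to be a list of tweet dicts from the stream
df = pd.DataFrame(tweets_data, columns=['text', 'lang'])

# Count tweets mentioning each keyword (case-insensitive match)
keywords = ['apples', 'oranges']
counts = [df['text'].str.contains(word, flags=re.IGNORECASE).sum()
          for word in keywords]

plt.bar(keywords, counts)
plt.ylabel('Number of tweets')
plt.show()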