davegotz/9e017ef479b68bb3c403e77210b062d4
Created March 7, 2019 01:38
Parsing CSV data from the web.
```python
import ssl
import urllib.request

# Required because, on some Python installations, requests to the Census website
# fail with "SSL: CERTIFICATE_VERIFY_FAILED".
# Note: this turns off certificate checking for every urllib HTTPS request this script makes.
ssl._create_default_https_context = ssl._create_unverified_context
# If you follow the instructions on this Stack Overflow page, you should be able to omit the line above.
# https://stackoverflow.com/questions/35569042/ssl-certificate-verify-failed-with-python3/43855394#43855394
#
# Go to the folder where Python is installed; e.g., in my case (macOS) it is installed in the Applications
# folder under the folder name 'Python 3.6'. Now double-click 'Install Certificates.command'. After that, the error was gone.

# Read data from the URL
url = 'https://www2.census.gov/programs-surveys/popest/datasets/2010-2014/state/asrh/scprc-est2014-18+pop-res.csv'

# The response behaves like a binary file: iterating over it goes line by line, and each
# line is a "byte string" until we tell Python how to decode it. We decode it as UTF-8
# (Unicode Transformation Format, 8-bit). See https://en.wikipedia.org/wiki/UTF-8
# The `with` block closes the connection when we are done.
with urllib.request.urlopen(url) as content:
    for byte_line in content:
        # Get one line of the CSV file, decode it to a string, and strip trailing whitespace (including the newline).
        text_line = byte_line.decode('utf-8').rstrip()
        # Split into tokens. (A plain split does not handle quoted fields that contain commas.)
        tokens = text_line.split(',')
        # Print tokens with tabs as separators, using ljust to pad each one with spaces to 10 characters.
        for token in tokens:
            print(token.ljust(10) + '\t', end='')
        # Add a newline at the end of the line.
        print()
```
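Assigning `ssl._create_default_https_context` changes the default for the whole process. A narrower alternative is to pass an explicit `context` to `urllib.request.urlopen` for just the one download. The sketch below assumes that is what you want; `make_context` and `open_census_csv` are hypothetical helper names, and `ssl._create_unverified_context` is a private function, used only because the script above already relies on it.

```python
import ssl
import urllib.request


def make_context(verify: bool = True) -> ssl.SSLContext:
    """Build an SSL context for a single request (hypothetical helper).

    verify=True  -> normal certificate and hostname checking (recommended).
    verify=False -> the same unverified behavior as the script's global patch,
                    but only for requests that are handed this context.
    """
    if verify:
        return ssl.create_default_context()
    # Private API: the same function the script assigns globally.
    return ssl._create_unverified_context()


def open_census_csv(url: str, verify: bool = True):
    """Open the URL with a per-request context instead of a process-wide patch."""
    return urllib.request.urlopen(url, context=make_context(verify))
```

Usage would be `with open_census_csv(url) as content: ...` in place of the plain `urlopen` call. Once the local certificates are fixed (for example with 'Install Certificates.command'), `verify=True` should work and checking stays enabled.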
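Splitting on `','` only works while no field contains a comma inside quotes. Below is a sketch of the same decode-and-split step using the standard-library `csv` module, which understands quoted fields, with `io.TextIOWrapper` doing the UTF-8 decoding so the loop sees strings rather than byte strings. `read_csv_rows` is a hypothetical name, the sample rows are made up, and `io.BytesIO` stands in for the HTTP response so the example runs offline. The response from `urlopen` is also a binary file-like object, so it should be accepted the same way.

```python
import csv
import io


def read_csv_rows(byte_stream, encoding="utf-8"):
    """Yield each CSV record as a list of strings (hypothetical helper).

    byte_stream: any binary file-like object, e.g. the response from urlopen().
    newline="" lets the csv module handle line endings itself, as the csv docs recommend.
    """
    text_stream = io.TextIOWrapper(byte_stream, encoding=encoding, newline="")
    for row in csv.reader(text_stream):
        yield row


# Offline stand-in for the HTTP response; the data is made up for illustration.
sample = io.BytesIO(b'NAME,POP\r\n"Springfield, IL",123\r\n')

for tokens in read_csv_rows(sample):
    print(tokens)  # ['NAME', 'POP'] then ['Springfield, IL', '123']

# The plain split from the script breaks the quoted field apart:
print('"Springfield, IL",123'.split(','))  # ['"Springfield', ' IL"', '123']
```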
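`ljust(10)` pads but never truncates, so any token longer than 10 characters (a name such as 'District of Columbia', for instance) pushes the rest of its row out of line. One way around this, sketched below, is to size each column to its widest token. `format_table` is a hypothetical helper, and the rows are made-up values; it needs all rows in memory before printing, which should be fine for a small state-level file.

```python
def format_table(rows, gap=2):
    """Return rows as text with every column padded to its widest entry (hypothetical helper).

    Unlike a fixed ljust(10), this needs all rows (a non-empty list) in memory first.
    """
    # Width of each column = length of its longest token; rows may have different lengths.
    n_cols = max(len(row) for row in rows)
    widths = [0] * n_cols
    for row in rows:
        for i, token in enumerate(row):
            widths[i] = max(widths[i], len(token))
    lines = []
    for row in rows:
        padded = [token.ljust(widths[i] + gap) for i, token in enumerate(row)]
        # rstrip() drops the trailing padding after the last column.
        lines.append("".join(padded).rstrip())
    return "\n".join(lines)


rows = [
    ["NAME", "POP"],
    ["District of Columbia", "1"],
    ["Ohio", "22"],
]
print(format_table(rows))
```

To combine this with the script, append each `tokens` list to a list inside the loop instead of printing it, then print `format_table(...)` once after the loop.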