@davegotz
Created March 7, 2019 01:38
Parsing CSV data from the web.
import urllib.request
import ssl
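# Required because the Census website's SSL certificate may fail verification on some Python installations.
# Note that this turns off certificate verification for every HTTPS request made by this script.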
ssl._create_default_https_context = ssl._create_unverified_context
# If you follow the instructions on this Stackoverflow page, you should be able to omit the line above.
# https://stackoverflow.com/questions/35569042/ssl-certificate-verify-failed-with-python3/43855394#43855394
#
# Go to the folder where Python is installed, e.g., in my case it is installed in the Applications folder with the
# folder name 'Python 3.6'. Now double-click on 'Install Certificates.command'. After that, the error was gone.
# Read data from the URL
url = 'https://www2.census.gov/programs-surveys/popest/datasets/2010-2014/state/asrh/scprc-est2014-18+pop-res.csv'
content = urllib.request.urlopen(url)
# Go line by line through the data, which is represented as "byte strings" until
# we tell the computer how to decode it. We'll decode it as UTF-8, a variable-width
# encoding of Unicode built from 8-bit units. See https://en.wikipedia.org/wiki/UTF-8
for byte_line in content:
    # Get one line of the CSV file, convert it to a string, and remove the newline.
    text_line = byte_line.decode('utf-8').rstrip()
    # Split into tokens
    tokens = text_line.split(',')
    # Print tokens with tabs as separators, using ljust to pad strings with spaces.
    for token in tokens:
        print(token.ljust(10) + '\t', end='')
    # Add a newline at the end of the line.
    print()
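A possible variation on the script above, offered as a sketch rather than as the author's method. Splitting on ',' breaks any field that contains a comma inside quotes, so this version passes the decoded lines to the standard library's csv.reader instead. The helper names parse_csv_lines, format_row, and fetch_rows are hypothetical, and the sample bytes are made-up data standing in for a downloaded file. fetch_rows assumes the local certificate store is set up correctly, so verification can stay on.

```python
import csv
import io
import ssl
import urllib.request


def parse_csv_lines(byte_lines, encoding='utf-8'):
    """Decode an iterable of byte lines and return the rows as lists of strings."""
    text_lines = (line.decode(encoding) for line in byte_lines)
    # csv.reader understands quoted fields, so embedded commas are preserved.
    return list(csv.reader(text_lines))


def format_row(tokens, width=10):
    """Pad each token to `width` characters and join the tokens with tabs."""
    return '\t'.join(token.ljust(width) for token in tokens)


def fetch_rows(url):
    """Hypothetical helper: download a CSV file with certificate verification on."""
    context = ssl.create_default_context()
    with urllib.request.urlopen(url, context=context) as response:
        return parse_csv_lines(response)


# Offline demonstration using made-up sample data (not real Census output).
sample = io.BytesIO(b'NAME,NOTE\r\n"Washington, D.C.",capital\r\n')
rows = parse_csv_lines(sample)
for row in rows:
    print(format_row(row))
```

Unlike the loop above, format_row puts tabs only between tokens, so no trailing tab is printed. Calling fetch_rows(url) with the Census URL from the script would need network access and a working certificate store.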