@jacobeturpin
Created November 2, 2020 21:13
Sample and Read CSV using Pandas
"""Sample and read sample into pandas"""
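import subprocess
import urllib.request

import pandas as pd

# City of Raleigh, NC Open Data -- Building Permits (2020-11-02)
URI = "https://opendata.arcgis.com/datasets/bdfad82b15344d37beb28d7f90b6c4be_0.csv"
FULL_FN = "full-dataset.csv"
SAMPLE_FN = "sample.csv"
SAMPLE_SIZE = 100000

# Download the full dataset to disk
urllib.request.urlretrieve(URI, FULL_FN)

# Copy the header row, then append SAMPLE_SIZE randomly chosen data rows,
# so read_csv still sees the real column names on the first line.
# Redirection and pipes require shell=True; shuf comes from GNU coreutils.
# Line-based sampling assumes no quoted field contains a newline.
bash_command = (
    f"head -n 1 {FULL_FN} > {SAMPLE_FN} && "
    f"tail -n +2 {FULL_FN} | shuf -n {SAMPLE_SIZE} >> {SAMPLE_FN}"
)
subprocess.run(bash_command, shell=True, check=True)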
df = pd.read_csv(SAMPLE_FN)
print(df.head())
# Do processing here
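The shell step depends on `head`, `tail` and `shuf` (the last is not installed by default on macOS or Windows) and samples by physical line, which can split a CSV record whose quoted field contains a newline. A possible pure-Python alternative, sketched below, swaps in reservoir sampling (Algorithm R) over `csv.reader` records: one pass over the file, exactly `k` rows kept in memory, header preserved. This is not the gist's method, and the helper names `reservoir_sample_csv` and `write_sample` are hypothetical.

```python
import csv
import random


def reservoir_sample_csv(path, k, seed=None):
    """Hypothetical helper: return (header, rows) where rows holds up to k
    data records drawn uniformly at random (Algorithm R) in a single pass."""
    rng = random.Random(seed)
    with open(path, newline="") as f:
        reader = csv.reader(f)  # parses quoted fields, including embedded newlines
        header = next(reader)
        sample = []
        for i, row in enumerate(reader):
            if i < k:
                # Fill the reservoir with the first k records
                sample.append(row)
            else:
                # Keep record i with probability k / (i + 1),
                # overwriting a uniformly chosen slot
                j = rng.randint(0, i)
                if j < k:
                    sample[j] = row
    return header, sample


def write_sample(src, dst, k, seed=None):
    """Hypothetical helper: write the header plus a k-row sample of src to dst."""
    header, rows = reservoir_sample_csv(src, k, seed)
    with open(dst, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
```

Under these assumptions, `write_sample(FULL_FN, SAMPLE_FN, 100000)` would stand in for the `shuf` step, and `pd.read_csv(SAMPLE_FN)` reads the result unchanged. Unlike `shuf`, the sampled rows are not shuffled. If an approximate sample size is acceptable, `pd.read_csv` can also sample during parsing: its `skiprows` parameter accepts a callable evaluated against row indices, e.g. `skiprows=lambda i: i > 0 and random.random() > 0.1` keeps roughly 10% of the data rows.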