@nikolay-shenkov
Created May 22, 2015 16:57
Office hour - read csv data in Python
# -*- coding: utf-8 -*-
"""
Tue May 19, 2015
Reading tabular data in Python
DA ND Office hour
We will use the improved turnstile dataset and
compare several different methods for reading in the data.
There are 42650 rows in the dataset.
"""
import pandas as pd
FILENAME = "turnstile_data_master_with_weather_v2.csv"
# very fast and memory efficient
df = pd.read_csv(FILENAME)
print("dataframe shape: {}".format(df.shape))
print("column names: {}".format(df.columns))
# more details and a lot of input options are available here:
# http://pandas.pydata.org/pandas-docs/stable/io.html
# Some of the more commonly used input parameters to read_csv:
# sep: separator to split fields on
# names: list of column names to use
# quotechar: the character used to denote quoted items
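# A minimal sketch of the three parameters above, using a small in-memory
# sample instead of the turnstile file. The sample text and the column names
# "station", "hour" and "entries" are made up for illustration.
import io
import pandas as pd

sample_text = u"'R001|Main St'|4|120\n'R002'|8|95\n"
sample_df = pd.read_csv(
    io.StringIO(sample_text),
    sep="|",                               # fields are separated by pipes
    names=["station", "hour", "entries"],  # the sample has no header row
    quotechar="'",                         # single quotes wrap text fields
)
# the pipe inside the first quoted field is kept as data, not as a separator
print(sample_df)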
# in IPython we can time the execution of a function
%timeit -n 5 pd.read_csv(FILENAME)
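# Outside IPython the %timeit magic is not available. A rough equivalent,
# sketched with the standard-library timeit module on a small made-up
# in-memory table so that it runs without the turnstile file.
# number=5 plays the role of -n 5; IPython also repeats the measurement
# several times, which this sketch does not do.
import io
import timeit
import pandas as pd

timing_text = u"a,b\n" + u"1,2\n" * 1000

def read_timing_sample():
    # parse the in-memory sample, as read_csv(FILENAME) would parse the file
    return pd.read_csv(io.StringIO(timing_text))

# total time in seconds for 5 calls
elapsed = timeit.timeit(read_timing_sample, number=5)
print("5 reads took {:.4f} s".format(elapsed))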
import csv
entries = []
# use with statement so that the file is automatically closed
# after you exit the with statement
with open(FILENAME, 'r') as f:
    reader = csv.reader(f)
    # you may need to manually skip the header:
    # next(reader)
    for line in reader:
        # each line is a list of strings; keep only the fourth column
        data = line[3]
        entries.append(data)
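# Picking a column by position (line[3]) breaks if the column order changes.
# One alternative, not used in the original session, is csv.DictReader, which
# uses the header row as keys. A sketch on a made-up in-memory sample; the
# column names here are illustrative, not taken from the turnstile file.
import csv
import io

sample_csv = io.StringIO(
    u"unit,date,hour,entries\nR001,05-01-11,0,120\nR002,05-01-11,4,95\n"
)
sample_entries = []
for row in csv.DictReader(sample_csv):
    # each row maps column names to string values
    sample_entries.append(row["entries"])
print(sample_entries)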
# example without using 'with':
f = open(FILENAME, "r")
# do something with f
f.close()
print(entries[:6])
print(len(entries))
# read the entire file and store it in a single string
# loads the contents of the file in memory
with open(FILENAME, 'r') as f:
    contents = f.read()
print("type: {}".format(type(contents)))
print("a very long string of length {}".format(len(contents)))
print("the first 600 characters: {}".format(contents[:600]))
all_lines_file = []
with open(FILENAME, 'r') as f:
    for line in f:
        all_lines_file.append(line)
# each line is a string (including its trailing newline)
# use split() to convert it into a list
print(all_lines_file[:5])
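# A minimal sketch of the split() step on one made-up line (not taken from
# the turnstile file). Note that a plain split(",") does not understand
# quoted fields that contain commas; the csv module handles those.
sample_line = "R001,05-01-11,0,120\n"
# strip the trailing newline first, then split on the separator
fields = sample_line.rstrip("\n").split(",")
print(fields)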