@acviana
Last active August 29, 2015 14:05
Python for data analysis demo for UFRGS Software Carpentry Workshop on 2014-08-29

Numpy

Load the Numpy module:

import numpy as np

Use the Numpy genfromtxt function to load the data, manually defining the column names.

array = np.genfromtxt('logfile.txt',
                      names=['DATA', 'TEMP', 'UMIDADE', 'PRESSAO', 'LUMINOSIDADE'])
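To see what genfromtxt returns without the workshop's logfile.txt, here is a minimal sketch that parses a small in-memory sample. The sample values, the whitespace-delimited layout, and the all-numeric columns are assumptions made for illustration; the real file may look different.

```python
import io

import numpy as np

# Hypothetical sample data: whitespace-delimited, all-numeric columns.
# The real logfile.txt may use a different layout.
sample = io.StringIO(
    "1 21.5 60.0 1013.2 300\n"
    "2 22.0 58.5 1013.0 320\n"
    "3 22.4 57.0 1012.8 350\n"
)

names = ['DATA', 'TEMP', 'UMIDADE', 'PRESSAO', 'LUMINOSIDADE']

# Passing names= produces a structured array: one record per row,
# one named field per column.
sample_array = np.genfromtxt(sample, names=names)

print(sample_array.shape)        # number of rows
print(sample_array.dtype.names)  # the column names passed above
print(sample_array['TEMP'])      # a single column, selected by name
```

Because the result is a structured array, its shape counts only rows; the columns live in the dtype rather than in a second dimension.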

Now check out the size of the array.

array.shape

And now the type and names of each column.

array.dtype

We can access each column by name.

array['TEMP']

And we can easily compute some statistics.

print('Max temp is {}'.format(array['TEMP'].max()))
print('Min temp is {}'.format(array['TEMP'].min()))
print('Mean temp is {}'.format(array['TEMP'].mean()))
print('Std of temp is {}'.format(array['TEMP'].std()))
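One thing to watch for: genfromtxt stores missing or unparseable fields in a float column as nan, and a single nan makes max(), min(), mean(), and std() all return nan. A minimal sketch of NumPy's NaN-aware alternatives, using a made-up temperature array rather than the workshop data:

```python
import numpy as np

# Hypothetical temperatures with one missing reading.
temps = np.array([21.5, np.nan, 22.4, 20.1])

# The plain method propagates the nan.
plain_max = temps.max()
print('Plain max is {}'.format(plain_max))

# The nan* functions skip missing values instead.
print('Max temp is {}'.format(np.nanmax(temps)))
print('Min temp is {}'.format(np.nanmin(temps)))
print('Mean temp is {}'.format(np.nanmean(temps)))
print('Std of temp is {}'.format(np.nanstd(temps)))
```

If the statistics above come back as nan on the real data, swapping in these functions is one way to get usable numbers.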

Matplotlib

Set up inline plotting in the IPython Notebook.

%pylab inline

First we import the matplotlib plotting module.

import matplotlib.pyplot as plt

And we can make a simple plot.

plt.figure()
plt.plot(array['DATA'], array['TEMP'], 'r.')  # red dots, one per measurement
plt.xlabel('DATA')
plt.ylabel('TEMP')
plt.title('TEMP vs DATA')
plt.show()

Pandas

Pandas is an awesome data analysis tool. Check out the webpage here: http://pandas.pydata.org/

Installing Pandas

First we have to install Pandas on our virtual machines. To do this we first need to install pip, the Python package installer. In the folder where you're keeping your Python work from the course, run the following:

$ wget https://bootstrap.pypa.io/get-pip.py
$ sudo python get-pip.py

Now, because our VMs have very little memory, close everything else you have open (other shells, web browsers, etc.) and then run the following to install pandas:

$ sudo pip install pandas

If the VM seems frozen, be patient. The VMs very rarely crash, so just give it a minute.

Using Pandas

Import Pandas and set up the pretty plotting.

import pandas as pd
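# This option exists in the pandas version used in the workshop;
# newer pandas releases have removed it.
pd.options.display.mpl_style = 'default'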

Read in the data file as a tab-separated table, skipping the first 5 rows, defining an index, manually providing the column names, and parsing the index as dates.

df = pd.read_table('../shell/data.txt', 
    skiprows=5, 
    index_col='DATA', 
    names=['DATA', 'TEMP', 'UMIDADE', 'PRESSAO','LUMINOSIDADE'],
    parse_dates=True)
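To try the same call without the workshop's data file, here is a minimal sketch that reads from an in-memory sample. The five header lines, the timestamp format, and the values are all made up for illustration; only the read_table arguments mirror the call above.

```python
import io

import pandas as pd

# Hypothetical file contents: 5 header lines to skip,
# then tab-separated rows with a timestamp in the first column.
sample = io.StringIO(
    "header line 1\n"
    "header line 2\n"
    "header line 3\n"
    "header line 4\n"
    "header line 5\n"
    "2014-08-29 10:00:00\t21.5\t60.0\t1013.2\t300\n"
    "2014-08-29 10:01:00\t22.0\t58.5\t1013.0\t320\n"
)

sample_df = pd.read_table(
    sample,
    skiprows=5,         # ignore the header lines
    index_col='DATA',   # use the first named column as the row index
    names=['DATA', 'TEMP', 'UMIDADE', 'PRESSAO', 'LUMINOSIDADE'],
    parse_dates=True,   # convert the index to datetimes
)

print(type(sample_df.index))
print(sample_df)
```

With parse_dates=True the index becomes a DatetimeIndex, which is what lets Pandas label the time axis sensibly when plotting.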

Check out the first few rows.

df.head()

Check out some quick stats.

df.describe()

Make a nice plot, dropping the 'PRESSAO' (pressure) column because its values are so much larger than the others that they stretch the Y-axis out too much.

df.drop('PRESSAO', axis=1).plot()
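Note that drop returns a new table and leaves the original alone, so the pressure data is still there for later use. A minimal sketch with a made-up DataFrame (the values are hypothetical, not from the workshop file):

```python
import pandas as pd

# Hypothetical stand-in for the workshop data.
sample_df = pd.DataFrame({
    'TEMP': [21.5, 22.0],
    'UMIDADE': [60.0, 58.5],
    'PRESSAO': [1013.2, 1013.0],
})

# axis=1 means "drop a column"; the result is a new DataFrame.
trimmed = sample_df.drop('PRESSAO', axis=1)

print(list(trimmed.columns))    # ['TEMP', 'UMIDADE']
print(list(sample_df.columns))  # ['TEMP', 'UMIDADE', 'PRESSAO']
```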