acviana/2014-08-29-swc-ufrgs-python-demo.md

## 2014-08-29-swc-ufrgs-python-demo.md

      
Numpy

Load the Numpy module:
import numpy as np
Use the Numpy genfromtxt function to load the data, manually defining the column names (DATA is the date, UMIDADE the humidity, PRESSAO the pressure, and LUMINOSIDADE the luminosity).
array = np.genfromtxt('logfile.txt',
                      names=['DATA', 'TEMP', 'UMIDADE', 'PRESSAO', 'LUMINOSIDADE'])
Now check out the size of the array.
array.shape
And now the type and names of each column.
array.dtype
We can access each column by name.
array['TEMP']
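To try the loading and inspection steps without the course's logfile.txt, here is a minimal sketch that runs genfromtxt on a small in-memory sample. The sample rows and the names `sample` and `sample_array` are made up for illustration, and the sketch assumes every column (including DATA) is numeric and whitespace-separated.

```python
import io
import numpy as np

# Hypothetical sample standing in for logfile.txt:
# whitespace-separated numeric columns, one reading per line.
sample = io.StringIO(
    "1 21.5 60.0 1012.0 300\n"
    "2 22.0 58.5 1011.5 320\n"
    "3 23.5 55.0 1011.0 350\n"
)

columns = ['DATA', 'TEMP', 'UMIDADE', 'PRESSAO', 'LUMINOSIDADE']
sample_array = np.genfromtxt(sample, names=columns)

print(sample_array.shape)        # one entry per row of the file
print(sample_array.dtype.names)  # the column names passed in above
print(sample_array['TEMP'])      # a plain 1-D float array for one column
```

Because names are given, the result is a one-dimensional structured array: the shape counts rows only, and the columns live in the dtype.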
And we can easily compute some statistics:
print('Max temp is {}'.format(array['TEMP'].max()))
print('Min temp is {}'.format(array['TEMP'].min()))
print('Mean temp is {}'.format(array['TEMP'].mean()))
print('Standard deviation of temp is {}'.format(array['TEMP'].std()))
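One detail worth knowing about the standard deviation: Numpy's std() divides by N by default (ddof=0), while pandas uses N - 1 (ddof=1) in describe(), so the two tools can report slightly different values for the same column. A short sketch with made-up readings (the `temps` values are hypothetical):

```python
import numpy as np

temps = np.array([21.5, 22.0, 23.5])  # hypothetical temperature readings

population_std = temps.std()       # NumPy default: ddof=0, divides by N
sample_std = temps.std(ddof=1)     # divides by N - 1, matching pandas' describe()

print(population_std, sample_std)
```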
Matplotlib

Set up inline plotting in the IPython Notebook.
%pylab inline

First we import the matplotlib plotting module.
import matplotlib.pyplot as plt
And we can make a simple plot.
plt.figure()
plt.plot(array['DATA'], array['TEMP'], 'r.')
plt.xlabel('DATA')
plt.ylabel('TEMP')
plt.title('TEMP vs DATA')
plt.show()
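Outside the notebook, for example in a script on a machine without a display, the same plot can be written to a file instead. This is only a sketch: the data values and the output file name are hypothetical, and it uses the non-interactive Agg backend.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical values standing in for array['DATA'] and array['TEMP'].
data = [1, 2, 3, 4]
temp = [21.5, 22.0, 23.5, 22.5]

fig, ax = plt.subplots()
ax.plot(data, temp, 'r.')        # red dots, as in the notebook version
ax.set_xlabel('DATA')
ax.set_ylabel('TEMP')
ax.set_title('TEMP vs DATA')
fig.savefig('temp_vs_data.png')  # hypothetical output file name
```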
Pandas

Pandas is an awesome data analysis tool. Check out the webpage here: http://pandas.pydata.org/
Installing Pandas

First we have to install Pandas on our virtual machines. To do this, we first need to install pip, the Python package installer. In the folder where you're keeping your Python work from the course, run the following:
$ wget https://bootstrap.pypa.io/get-pip.py
$ sudo python get-pip.py
Now, because our VMs have very little memory, close everything else you have open (other shells, web browsers, etc.) and then run the following to install pandas:
$ sudo pip install pandas
If the VM seems frozen, be patient. They very rarely crash; just give it a minute.
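Once the install finishes, a quick way to confirm that Python can see the new packages is to ask the import system whether it can locate them. This is a small sketch assuming Python 3; the helper name `is_installed` is made up for illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


for name in ('pip', 'pandas'):
    print(name, 'found' if is_installed(name) else 'missing')
```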
Using Pandas

Import Pandas and set up the pretty plotting.
import pandas as pd
pd.options.display.mpl_style = 'default'
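Note that the mpl_style option was deprecated in later pandas releases and is no longer available in current ones. If the line above raises an error, matplotlib's own style sheets can be used instead. The sketch below picks 'ggplot' as an assumed stand-in for the old pandas look; any name listed in plt.style.available would work.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; not needed inside the notebook
import matplotlib.pyplot as plt

print(plt.style.available)  # names of the style sheets shipped with matplotlib
plt.style.use('ggplot')     # assumption: 'ggplot' as a replacement for the old default style
```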
Read in the data file as a tab-separated table, skipping the first 5 rows, defining an index, manually providing the column names, and parsing the index as date information.
df = pd.read_table('../shell/data.txt', 
    skiprows=5, 
    index_col='DATA', 
    names=['DATA', 'TEMP', 'UMIDADE', 'PRESSAO','LUMINOSIDADE'],
    parse_dates=True)
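To see what each argument does without the course's data.txt, the same call can be run on a small in-memory sample. The header lines, readings, and the names `raw` and `sample_df` below are hypothetical; the sketch assumes the first column holds timestamps in an ISO-like format.

```python
import io
import pandas as pd

# Hypothetical stand-in for data.txt: 5 header lines to skip,
# then tab-separated rows of readings.
raw = (
    "header line 1\n"
    "header line 2\n"
    "header line 3\n"
    "header line 4\n"
    "header line 5\n"
    "2014-08-29 10:00:00\t21.5\t60.0\t1012.0\t300\n"
    "2014-08-29 10:05:00\t22.0\t58.5\t1011.5\t320\n"
)

sample_df = pd.read_table(
    io.StringIO(raw),
    skiprows=5,            # ignore the 5 header lines
    index_col='DATA',      # use the date column as the row index
    names=['DATA', 'TEMP', 'UMIDADE', 'PRESSAO', 'LUMINOSIDADE'],
    parse_dates=True,      # convert the index to timestamps
)

print(sample_df.index)   # timestamps rather than plain strings
print(sample_df.dtypes)  # the remaining four columns are numeric
```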
Check out the first few rows.
df.head()
Check out some quick stats.
df.describe()
Make a nice plot, dropping the 'PRESSAO' (pressure) column because it stretches the Y-axis out too much.
df.drop('PRESSAO', axis=1).plot()
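It may help to know that drop() returns a new DataFrame and leaves the original untouched, so the pressure data is still available afterwards. A minimal sketch with a made-up two-row frame (the values and the names `sample_df` and `without_pressure` are hypothetical):

```python
import pandas as pd

# Hypothetical two-row frame with the same column names as the log data.
sample_df = pd.DataFrame({
    'TEMP': [21.5, 22.0],
    'UMIDADE': [60.0, 58.5],
    'PRESSAO': [1012.0, 1011.5],
    'LUMINOSIDADE': [300, 320],
})

without_pressure = sample_df.drop('PRESSAO', axis=1)  # axis=1 means "drop a column"

print(list(without_pressure.columns))  # PRESSAO is gone from the new frame
print(list(sample_df.columns))         # the original still has all four columns
```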