Created
May 16, 2014 22:20
Plotting a Categorical Variable in matplotlib with pandas
"""
Plotting a categorical variable
----------------------------------
`df` is a pandas DataFrame with a time-series (datetime) index.
`df` has a column `categorical` of dtype object (strings and NaNs); it is a categorical variable representing events.
----------------------------------
>>> print(df[:5])
                    categorical
date
2014-03-15 14:56:50       users
2014-03-15 14:56:50       users
2014-03-15 14:57:15       users
2014-03-15 14:56:56      photos
2014-03-15 14:57:10      photos
>>> print("Type:", df.categorical.dtype)
Type: object
>>> print("Unique vals: ", df.categorical.unique())
Unique vals: [users photos streams nan]
"""
# 1. We want to plot `df.categorical` to see the distribution of events over time
import matplotlib.pyplot as plt
from matplotlib.ticker import FixedLocator

plt.figure()
# Replace NaN with the string 'nan' first: NaN != NaN, so it is an unreliable dict key
events = df.categorical.fillna('nan')
# Create a mapping from categorical values to unique integers
integer_map = dict((val, i) for i, val in enumerate(sorted(set(events))))
# Plot using this integer map, applied over `categorical`
ax = events.map(integer_map).plot(
    marker='.', linestyle='', alpha=0.2,
)
# Set y tick locations
fl = FixedLocator(list(range(len(integer_map))))
ax.yaxis.set_major_locator(fl)
# Set y tick labels, ordered by integer code so each label lands on its own tick
ax.yaxis.set_ticklabels(sorted(integer_map, key=integer_map.get))

# 2. We want to see a histogram of value counts (NaN is excluded by default)
plt.figure()
df.categorical.value_counts().plot(kind='bar')
plt.title('Histogram of events by event type')
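The hand-built integer map in step 1 can also be sketched with pandas' own category codes. This is an alternative to the approach in the gist, not a restatement of it. The helper names `encode_events` and `plot_events` and the `sample` frame are hypothetical, made up for illustration. One difference to keep in mind: pandas gives missing values the code -1 rather than a category of their own, so this sketch leaves NaN rows out of the plot.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def encode_events(series):
    """Return (codes, categories) for a series of strings and NaNs.

    Categories are the sorted unique non-null values; NaN gets code -1.
    """
    cat = series.astype("category")
    return cat.cat.codes, list(cat.cat.categories)


def plot_events(series):
    """Plot the integer codes over the index, one y tick per category."""
    codes, categories = encode_events(series)
    codes = codes[codes >= 0]  # skip missing values rather than plot -1
    ax = codes.plot(marker=".", linestyle="", alpha=0.2)
    ax.set_yticks(list(range(len(categories))))
    ax.set_yticklabels(categories)
    return ax


# Hypothetical sample data shaped like the `df` described in the docstring.
index = pd.to_datetime([
    "2014-03-15 14:56:50", "2014-03-15 14:56:56",
    "2014-03-15 14:57:10", "2014-03-15 14:57:15",
])
sample = pd.DataFrame(
    {"categorical": ["users", "photos", np.nan, "users"]}, index=index
)
codes, categories = encode_events(sample.categorical)

if __name__ == "__main__":
    plot_events(sample.categorical)
    plt.show()
```

Because the categories come back sorted and the codes index into that list, the tick labels cannot drift out of step with the tick positions.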