@amanahuja
Created May 16, 2014 22:20
Plotting a Categorical Variable in matplotlib with pandas
"""
Plotting a categorical variable
----------------------------------
`df` is a pandas DataFrame with a time series index.
`df` has a column `categorical` of dtype object, holding strings and NaNs; it is a categorical variable representing events.
----------------------------------
>>> print(df[:5])
                    categorical
date
2014-03-15 14:56:50       users
2014-03-15 14:56:50       users
2014-03-15 14:57:15       users
2014-03-15 14:56:56      photos
2014-03-15 14:57:10      photos
>>> print("Type:", df.categorical.dtype)
Type: object
>>> print("Unique vals:", df.categorical.unique())
Unique vals: [users photos streams nan]
"""
import matplotlib.pyplot as plt
from matplotlib.ticker import FixedLocator

# 1. We want to plot `df.categorical` to see the distribution of events over time
plt.figure()
# Create a mapping from categorical values to unique integers.
# `unique()` keeps order of first appearance, so the y axis order is stable between runs.
integer_map = {val: i for i, val in enumerate(df.categorical.unique())}
# Plot using this integer map, applied over `categorical`
ax = df.categorical.apply(lambda x: integer_map[x]).plot(
    marker='.', linestyle='', alpha=0.2,
)
# Set y tick locations: one tick per category
fl = FixedLocator(range(len(integer_map)))
ax.yaxis.set_major_locator(fl)
# Set y tick labels; dicts preserve insertion order (Python 3.7+),
# so iterating the keys yields them in tick order 0..n-1
ax.yaxis.set_ticklabels([str(k) for k in integer_map])

# 2. We want to see a histogram of value counts
plt.figure()
df.categorical.value_counts().plot(kind='bar')
plt.title('Histogram of events by event type')
plt.show()
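
The integer mapping above is the step most likely to go wrong: NaN lookups in a dict are fragile, and the tick labels must be listed in the same order as the integers they label. The following is a minimal standard-library sketch of that step, not part of the original gist. The helper names `is_missing`, `build_integer_map`, and `tick_labels` are hypothetical, and dropping missing values (rather than plotting them as their own category) is an assumption made here for illustration.

```python
import math


def is_missing(value):
    """Return True for None and float NaN, the usual missing-value markers."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def build_integer_map(values):
    """Map each distinct non-missing value to an integer, in sorted order."""
    distinct = sorted({v for v in values if not is_missing(v)})
    return {val: i for i, val in enumerate(distinct)}


def tick_labels(integer_map):
    """Return labels ordered by their integer position on the y axis."""
    ordered = sorted(integer_map.items(), key=lambda kv: kv[1])
    return [str(val) for val, _ in ordered]


# Hypothetical sample events, standing in for `df.categorical`
events = ["users", "users", "photos", float("nan"), "streams"]
integer_map = build_integer_map(events)
print(integer_map)               # {'photos': 0, 'streams': 1, 'users': 2}
print(tick_labels(integer_map))  # ['photos', 'streams', 'users']
```

With the DataFrame from the docstring, one way to use this would be `df.categorical.map(build_integer_map(df.categorical))`, which should leave NaN for missing rows so that they are simply left off the plot, with `tick_labels(integer_map)` passed to `ax.yaxis.set_ticklabels`.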