@martinapugliese
Last active August 17, 2016 20:53
Plotting the frequencies in a FreqDist in NLTK instead of the counts.
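The function plots the following quantities, written in terms of the counts c(w) stored in the FreqDist. The samples w_1, w_2, … are taken in descending order of count, which is the order FreqDist.plot() uses:

$$
f(w) = \frac{c(w)}{N}, \qquad N = \sum_{v} c(v), \qquad F_k = \sum_{i=1}^{k} f(w_i)
$$

N is the value returned by fd.N(). F_k is the curve drawn when cumulative=True. Because max_num only limits how many samples are drawn and leaves N unchanged, F_k reaches 1 only when every sample is plotted.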
# Copyright (C) 2016 Martina Pugliese
def plot_freqdist_freq(fd,
                       max_num=None,
                       cumulative=False,
                       title='Frequency plot',
                       linewidth=2):
    """
    As of NLTK version 3.2.1, FreqDist.plot() plots the counts and has no
    kwarg for normalising to frequency. This function works around that.
    INPUT:
    - fd: the FreqDist object
    - max_num: if specified, only plot up to this number of items
      (they are already sorted descending by the FreqDist)
    - cumulative: bool (defaults to False)
    - title: the title to give the plot
    - linewidth: the width of line to use (defaults to 2)
    OUTPUT: plots the relative frequencies and returns None.
    """
    # Work on a copy so the caller's FreqDist keeps its integer counts
    tmp = fd.copy()
    # Total number of sample outcomes, used as the normalising constant
    norm = fd.N()
    # Replace each count with its relative frequency (count / total)
    for key in tmp.keys():
        tmp[key] = float(fd[key]) / norm
    if max_num:
        # Plot only the max_num most common samples
        tmp.plot(max_num, cumulative=cumulative,
                 title=title, linewidth=linewidth)
    else:
        tmp.plot(cumulative=cumulative, title=title, linewidth=linewidth)
    return
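You can check the normalisation step without NLTK or a display. NLTK's FreqDist subclasses collections.Counter, so the sketch below applies the same arithmetic to a plain Counter. It returns the values instead of plotting them. `relative_frequencies` is a hypothetical helper written for illustration and is not part of NLTK.

```python
from collections import Counter


def relative_frequencies(counts, max_num=None, cumulative=False):
    """Hypothetical helper: return (sample, value) pairs, highest count first.

    Mirrors the normalisation in plot_freqdist_freq on a plain Counter.
    Like the original, it divides by the total over ALL samples, even
    when max_num truncates the output.
    """
    total = sum(counts.values())  # plays the role of FreqDist.N()
    if total == 0:
        return []
    # most_common(None) returns every item; most_common(k) the top k
    items = counts.most_common(max_num)
    result = []
    running = 0.0
    for sample, count in items:
        freq = count / total
        running += freq
        # Emit the running sum when cumulative, else the plain frequency
        result.append((sample, running if cumulative else freq))
    return result


tokens = "the cat sat on the mat the end".split()
counts = Counter(tokens)
print(relative_frequencies(counts, max_num=2))
print(relative_frequencies(counts, cumulative=True)[-1])
```

Inside plot_freqdist_freq, the loop body could also use `fd.freq(key)`. FreqDist provides that method for exactly this count / N ratio.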