Last active
August 17, 2016 20:53
Plotting the frequencies in a FreqDist in NLTK instead of the counts.
# Copyright (C) 2016 Martina Pugliese

def plot_freqdist_freq(fd,
                       max_num=None,
                       cumulative=False,
                       title='Frequency plot',
                       linewidth=2):
    """
    As of NLTK version 3.2.1, FreqDist.plot() plots the counts and has no
    kwarg for normalising to frequency. This function works around that.

    INPUT:
        - fd: the FreqDist object
        - max_num: if specified, only plot up to this number of items
          (they are already sorted descending by the FreqDist)
        - cumulative: bool (defaults to False)
        - title: the title to give the plot
        - linewidth: the width of line to use (defaults to 2)
    OUTPUT: plot the frequencies and return None.
    """
    # Work on a copy so the caller's counts are left untouched
    tmp = fd.copy()
    norm = fd.N()  # total number of sample outcomes
    for key in tmp.keys():
        tmp[key] = float(fd[key]) / norm

    if max_num:
        tmp.plot(max_num, cumulative=cumulative,
                 title=title, linewidth=linewidth)
    else:
        tmp.plot(cumulative=cumulative, title=title, linewidth=linewidth)

    return
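
The normalisation step used by the function can be tried on its own, without NLTK or matplotlib. The sketch below uses `collections.Counter` as a stand-in for `FreqDist` (which subclasses `Counter`). The helper name `relative_frequencies` and the sample sentence are illustrative assumptions and are not part of the original gist.

```python
from collections import Counter


def relative_frequencies(counts):
    """Return {sample: count / total}, ordered by descending count.

    Hypothetical helper: mirrors the division by fd.N() done before plotting.
    """
    total = sum(counts.values())
    if total == 0:
        # Avoid dividing by zero on an empty distribution
        return {}
    return {sample: count / total for sample, count in counts.most_common()}


# Illustrative input: "the" occurs 3 times out of 8 tokens
counts = Counter("the cat sat on the mat the end".split())
freqs = relative_frequencies(counts)
print(freqs["the"])
```

Unlike the plotting function, this sketch guards against an empty distribution, for which `fd.N()` would be 0 and the division would fail.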