@phobson
Forked from dmcdougall/email.txt
Created September 19, 2012 05:38
Nabble boxplot email + status of implementation

Proposed features:

What I think we should do (in the order I think we should do it):

  • allow users to pass in a custom function that will compute the median and conf. intervals (my idea).
    • ready for a PR/code review
    • pretty much done
    • DONE: fix docstring about conflicting options (usermedians+conf_intervals vs ci_fxn)
  • option for monochrome boxplot
    • ready for a PR/code review
    • started -- interactive tinkering indicates it's in good shape
    • DONE: write a proper test, add baseline images, update whats_new.rst
  • option for fillable boxplots
    • ready for a PR/code review
    • started, but takes an MPL color instead of True/False
    • basic interactive testing seems good except when patch_artist=True (fixed).
    • DONE: figured out patch artists
    • DONE: write a proper test, add baseline images, update whats_new.rst
  • option for also showing the means on the plot
    • ready for a PR/code review
    • started as he implemented it (no testing yet)
    • I propose that we plot this as a point that's clearly different from the fliers (outliers). This is because in heavily skewed distributions (e.g. lognormal), the mean can fall outside the range between Q1 and Q3. A line floating off in space would look weird at best and be confusing at worst. Therefore the values for this kwarg should be a marker symbol definition instead of True or False.
    • DONE: implement the marker-based version I described above
    • DONE: basic interactive testing
    • DONE: write a proper test, add baseline images, update whats_new.rst
  • no_box option
    • ready for a PR/code review
    • the email proposed turning off boxes when the CIs go beyond q1 and q3 or when a boxplot is very small compared to the axis limits
    • the diff we got only turned off boxes when the median's confidence interval went outside the IQR and was not user-callable. Work on this is started and in good shape according to interactive testing.
    • DONE: add logic to set a threshold that will turn a box off if it's 'small'. This depends on a datascale kwarg that lets the user specify whether data should be compared on a linear or log scale.
    • DONE: set the axis scale based on datascale. At first, using self.[y,x]axis.set_scale('log') squished the scale down to the bottom and didn't seem to perform the transform. (Works now. FIX = using self.set_[y,x]scale!)
    • DONE: write a proper test, add baseline images, update whats_new.rst
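
Two of the items above can be checked with a quick numerical sketch: a user-supplied function that returns a median with its confidence interval, and the claim that the mean of a skewed sample can land outside the box. The function name bootstrap_median_ci, its arguments, and the percentile-bootstrap method are assumptions made for illustration; this is not the ci_fxn signature from the branch. It uses only the standard library.

```python
import random
import statistics


def bootstrap_median_ci(data, n_boot=200, alpha=0.05, seed=0):
    """Hypothetical helper: percentile-bootstrap CI for the median.

    Returns (median, ci_low, ci_high).
    """
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement and record the median of each resample.
    boot_medians = sorted(
        statistics.median(rng.choices(data, k=n)) for _ in range(n_boot)
    )
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = int((1 - alpha / 2) * n_boot) - 1
    return statistics.median(data), boot_medians[lo_idx], boot_medians[hi_idx]


# Heavily skewed (lognormal) sample, generated with a fixed seed.
rng = random.Random(42)
sample = [rng.lognormvariate(0.0, 2.0) for _ in range(5000)]

median, ci_low, ci_high = bootstrap_median_ci(sample)
q1, _, q3 = statistics.quantiles(sample, n=4)
mean = statistics.fmean(sample)

print("median and CI:", median, ci_low, ci_high)
print("Q1, Q3, mean:", q1, q3, mean)
print("mean above Q3:", mean > q3)

# With a matplotlib release that has these kwargs, the values could be
# passed along roughly like this (not run here):
#   ax.boxplot(sample, notch=True,
#              usermedians=[median], conf_intervals=[(ci_low, ci_high)])
```

For a sample like this the mean sits well above Q3, which is why a marker reads better than a line drawn inside (or floating outside) the box.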

What I think we should skip:

  • option for fixed notch-size
    • Dubious statistical validity
    • users can do this with the new usermedians and conf_intervals kwargs and a bit of elbow grease
  • option to manually set axes limits
    • MPL makes this sort of thing easy enough to do after the call to boxplot
    • Boxplot is getting cluttered enough w/o this

Testing Procedure

  1. Navigate to home directory
  2. Execute python -c "import matplotlib as mpl; mpl.test()"
  3. Copy new boxplot images (~/result_images/test_axes/boxplot.*) to ~/sources/matplotlib/lib/matplotlib/tests/baseline_images/test_axes/boxplot.*
  4. Repeat #2 and make sure the relevant tests pass
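
Step 3 can be scripted. The following is a minimal sketch, assuming the directory layout listed above; the function name copy_baselines is hypothetical, and the call at the bottom is left commented out so nothing is overwritten by accident.

```python
import glob
import os
import shutil


def copy_baselines(result_dir, baseline_dir, pattern="boxplot.*"):
    """Copy result images matching `pattern` into the baseline directory.

    Returns the list of destination paths that were written.
    """
    copied = []
    for src in sorted(glob.glob(os.path.join(result_dir, pattern))):
        dest = os.path.join(baseline_dir, os.path.basename(src))
        shutil.copy2(src, dest)
        copied.append(dest)
    return copied


# Example call using the paths from the steps above (uncomment to run):
# home = os.path.expanduser("~")
# copy_baselines(
#     os.path.join(home, "result_images", "test_axes"),
#     os.path.join(home, "sources", "matplotlib", "lib", "matplotlib",
#                  "tests", "baseline_images", "test_axes"),
# )
```

The pattern boxplot.* only matches files whose names begin with "boxplot.", so other images in the results directory are left alone.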

Original email:

Dear matplotlib developers,

I am attaching an updated boxplot method for axes.py that I would like to suggest as a replacement for the present one in 0.90. My code does not change any of the existing functionality, and simply adds a few options to the existing code:

  • I wanted to be able to draw boxplots entirely in black.

  • When plotting several boxes on the same axes, where some boxes are very small in comparison to others, the box outline and notch can totally mask the median line information. So I added an option to draw no box if the box is sufficiently small so that the median line is clear. The inner whisker limits provide the only information about q1 and q3 in this case.

  • I personally don't find the adaptive notch size very attractive for publication-quality plots. I added a feature to keep the notch size fixed (notch==2). This means that very small boxes will foul up, so if the fixed notch size is smaller than the box height then the "no box" option is automatically forced on. Fixed notch size is set using the 'notchsize' parameter.

  • I wanted to include an additional (and optional) line in the box for the mean average of the data (but not associated with an additional notch).

  • I wanted to make the boxes white-fillable in case they are placed over background lines.

  • I wanted to set my own axes limits externally, and prevent ticks on the plot so that I can label the graphs myself and add additional info.

The attached PNG graphic exemplifies these suggestions with the new code for a plot that is appearing in an IEEE journal article (please ignore the black triangle that has nothing to do with the boxplotting). All the new features involve optional parameters to the method signature, defaulting in their absence to the existing functionality.

The code for the method includes an updated docstring that reflects how these features are used. Note that I've also made a few minor edits to the existing parts of the docstring text for clarity: in particular, regarding the return value, as the existing statement apparently was not updated when the method was changed to return a dictionary.

Please feel free to discuss these ideas with me. Regards, Rob