@mwaskom
Last active May 11, 2020 19:06
Notebook for figuring out consistent transformations from a variety of "wide" data to a long-form plot dataframe

phobson commented May 11, 2020

This approach looks good and makes a lot of sense to me. I think it'd make a great addition to seaborn.

This does expand the scope of seaborn away from pretty-much-only tidy data. Is the hope to reduce the support burden from users who actually just need help tidying their data?

mwaskom commented May 11, 2020

> This does expand the scope of seaborn away from pretty-much-only tidy data.

Many functions do handle wide-form data (see the "Current Behavior" section), but that's not widely (heh) appreciated. Partially that is because the handling is a little bit idiosyncratic across the library. And any function that integrates with FacetGrid currently requires long-form data; fixing that is part of this refactor.

The reason that long-form data is preferred is that the mapping from variables to semantics is very explicit and predictable. The main goal here is to make the implicit mappings that you get with wide-form data formal, so they can be more predictable.
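To make "implicit mapping" concrete, here is a minimal pandas sketch of one possible convention: the index becomes x, the cell values become y, and the column names become hue. The variable names and the convention itself are assumptions for illustration only, not what seaborn actually does internally.

```python
import pandas as pd

# Toy wide-form frame: one column per series, index shared across columns
wide = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6]},
    index=pd.Index([10, 20, 30], name="time"),
)

# Assumed convention (illustrative only): index -> x, values -> y, column name -> hue
long = (
    wide.reset_index()
    .melt(id_vars="time", var_name="hue", value_name="y")
    .rename(columns={"time": "x"})
)

print(long)
```

Once the frame is in this shape, every semantic role is a named column, which is the explicitness that long-form data gives you for free.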

Also, seaborn is moving to keyword-only arguments but I am increasingly leaning towards the generic function signature being func(data, *, ...) so that func(data) does something useful for almost any data structure one might have at hand.
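A rough sketch of what that signature shape implies, using a hypothetical stand-in function (`plot_sketch` is not a seaborn function, and the semantic parameter names are assumed): the data can be passed positionally, while everything after the bare `*` must be passed by keyword.

```python
def plot_sketch(data=None, *, x=None, y=None, hue=None):
    """Hypothetical stand-in for a plotting function; just echoes its arguments."""
    return {"data": data, "x": x, "y": y, "hue": hue}


# Passing only the data positionally is allowed
result = plot_sketch([1, 2, 3])

# Passing a semantic positionally is rejected by Python itself
try:
    plot_sketch([1, 2, 3], "a")
except TypeError:
    positional_rejected = True
else:
    positional_rejected = False
```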

mwaskom commented May 11, 2020

A big open question is whether to allow mixing of wide-form data inputs and explicit semantics, e.g. sns.boxplot(data=iris, hue="species"). On the one hand, that seems fairly handy. On the other hand, it starts to blur the distinction between wide/long data in a way that could breed confusion.
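For illustration, one possible reading of such a mixed call is "treat the named column as an explicit semantic and melt the rest as wide-form". The sketch below shows that reshaping in plain pandas on a tiny made-up frame with iris-like column names; it is an assumption about what the behavior could be, not a description of what seaborn does.

```python
import pandas as pd

# Tiny made-up frame with iris-like columns (values are illustrative, not the real dataset)
toy = pd.DataFrame(
    {
        "sepal_length": [5.1, 7.0],
        "petal_length": [1.4, 4.7],
        "species": ["setosa", "versicolor"],
    }
)

# Assumed interpretation: keep the hue column as an identifier, melt the other columns
long = toy.melt(id_vars="species", var_name="variable", value_name="value")

print(long)
```

Here the column names would drive the x axis implicitly while `species` is assigned explicitly, which is exactly the blend of wide and long semantics in question.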
