lynaghk/gist:971172f891a8de633203 Secret

## gistfile1.md

      
    Raw
  

              gistfile1.md
            
          
The system should use sensible defaults to fill out overly concise (i.e., incomplete) plot specifications.
For instance, to compare the price distributions of diamonds by color (a categorical variable), the user might write:
{:data diamonds
 :mapping {:x :color :y :price}
 :layers [{:geom :boxplot}]}
Taking :x by convention to be the independent variable and grouping along it, the system can infer the full specification as
{:data diamonds
 :mapping {:x :group
           :min :min, :max :max, :lower :lower, :middle :middle, :upper :upper}
 :group :color
 :stat (stat/boxplot :dimension :price)
 :layers [{:geom :boxplot}]}
where the boxplot's aesthetics are all mapped to computed values: :group (from the grouper fn), and :min, :lower, ... from the boxplot stat.
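To make the computed values concrete, here is a minimal sketch of what a grouper plus boxplot stat might produce for each group. The names `quantile`, `five-number-summary`, and `boxplot-by-group` are hypothetical, and the quantile convention (rounded index into the sorted values) is an assumption; the real `stat/boxplot` may compute its quartiles differently.

```clojure
;; Hypothetical helpers for illustration; not the actual stat/boxplot code.

(defn quantile
  "Picks the element at the rounded index q*(n-1) of a non-empty sorted
  vector. This is one simple quantile convention among several."
  [sorted-xs q]
  (let [n   (count sorted-xs)
        idx (Math/round (double (* q (dec n))))]
    (nth sorted-xs idx)))

(defn five-number-summary
  "Returns the five values a boxplot geom needs for one group."
  [xs]
  (let [sorted (vec (sort xs))]
    {:min    (first sorted)
     :lower  (quantile sorted 0.25)
     :middle (quantile sorted 0.5)
     :upper  (quantile sorted 0.75)
     :max    (peek sorted)}))

(defn boxplot-by-group
  "Groups rows by group-key, then summarizes value-key within each group.
  Each result row carries :group plus the five summary values."
  [rows group-key value-key]
  (for [[g members] (group-by group-key rows)]
    (assoc (five-number-summary (map value-key members)) :group g)))
```

Calling `(boxplot-by-group diamonds :color :price)` would then yield one row per color, with exactly the keys that the expanded `:mapping` refers to.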
However, in some situations the system should arguably infer mappings that differ from the ones the user provided.
For example, if instead of grouping by a categorical variable the user wants to look at the price distribution by carat (a numeric variable), she should be able to write:
{:data diamonds
 :mapping {:x :carat :y :price}
 :layers [{:geom :boxplot}]}
and the system should expand to
{:data diamonds
 :mapping {:x :group/midpoint, :width :group/width
           :min :min, :max :max, :lower :lower, :middle :middle, :upper :upper}
 :group (group/bin :dimension :carat)
 :stat (stat/boxplot :dimension :price)
 :layers [{:geom :boxplot}]}
This centers each boxplot above its x bin and gives it the same width as the underlying data's x bin (or, say, 90% of that width, for visual nicety).
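A bin grouper that exposes a midpoint and a width could be sketched as follows. This assumes fixed-width bins aligned at 0 and uses hypothetical names (`bin-of`, `bin-rows`); the real `group/bin` may choose bin edges differently.

```clojure
;; Hypothetical sketch of a fixed-width bin grouper; bins are aligned at 0.

(defn bin-of
  "Returns the bin containing x, described by its midpoint and width."
  [bin-width x]
  (let [lo (* bin-width (Math/floor (double (/ x bin-width))))]
    {:group/midpoint (+ lo (/ bin-width 2.0))
     :group/width    bin-width}))

(defn bin-rows
  "Groups rows by the bin that their value along `dimension` falls into.
  Returns a map from bin description to the rows in that bin."
  [rows dimension bin-width]
  (group-by #(bin-of bin-width (get % dimension)) rows))
```

A geom that wants the slightly narrower box can scale `:group/width` by 0.9 at draw time, which leaves the grouper itself free of presentation concerns.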
If the user wants to look explicitly in half-carat intervals, she might write:
{:data diamonds
 :mapping {:x :carat :y :price}
 :group (group/bin :bin-width 0.5)
 :layers [{:geom :boxplot}]}
And the system should still be able to infer that the grouper's :dimension should be set to (:x mapping) and that (:x mapping) should then be set to the midpoint of the bin.
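That one inference step can be written directly as a function over the spec map. The sketch below is only an illustration: it represents the grouper as a plain map with a `:type` key (an assumption, since the shape of the value returned by `group/bin` is not shown here), `fill-bin-defaults` is a hypothetical name, and the `:y` to stat rewrite is left out.

```clojure
;; Hypothetical sketch: the grouper is modeled as a plain map such as
;; {:type :bin :bin-width 0.5}, which is an assumption of this example.

(defn fill-bin-defaults
  "If the spec has a bin grouper without a :dimension, take the dimension
  from (:x mapping) and point :x at the bin midpoint instead.
  Specs that already name a :dimension are returned unchanged."
  [{:keys [mapping group] :as spec}]
  (if (and (= :bin (:type group))
           (nil? (:dimension group)))
    (-> spec
        (assoc-in [:group :dimension] (:x mapping))
        (assoc-in [:mapping :x] :group/midpoint)
        (assoc-in [:mapping :width] :group/width))
    spec))
```

Because the function only acts when `:dimension` is missing, a user who spells out the grouper in full is left alone.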
I have two main questions with this approach:

1. Is it possible to implement cleanly using, e.g., core.logic?
2. Is it even a good idea?

Starting with #2: the advantages are that specifications are smaller, and that users who don't care to specify the exact details of every rendering still get reasonable output.
Advanced users can specify more details as desired, leaving the system less room to fill in the blanks.
The tricky part is ensuring that the system can override "wrong" mappings (e.g., the user says :x :carat when they really mean "group by carat and set x to be the midpoint of each group") without giving advanced users grief à la Clippy.
For the implementation details, what's the smallest syntax we need to handle all possible rewrites?
Can we write a structured sieve that both matches and fills in?
[mapping, geom, grouper, stat                      ;;<- this line matches
 new-mapping, new-geom, new-grouper, new-stat]     ;;<- this line replaces (if not wildcard)

;;boxplots with numeric x
[{:x (numerico ?group) :y ?y} #geom/boxplot{} #group/bin{:dimension ?group} #stat/boxplot{:dimension ?y}
 {:x :group/midpoint, :width #(* 0.9 (:group/width %)), :y ::remove} _ _ _]

;;boxplots with categorical x
[{:x (categoricalo ?group) :y ?y} #geom/boxplot{} #group/categorical{:dimension ?group} #stat/boxplot{:dimension ?y}
 {:x :group/group, :y ::remove} _ _ _]
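As a baseline for comparison, the two rules above can be approximated in plain Clojure as first-match rewrite rules, without core.logic and without unification. Everything here is a hypothetical sketch: column types are assumed to be known up front in two sets, groupers and stats are modeled as plain maps, and the width is left as `:group/width` rather than the 90% function.

```clojure
;; Hypothetical sketch: first-match rewrite rules over a spec map.
;; Column types are hard-coded here; a real system would inspect the data.
(def numeric-columns     #{:carat :price})
(def categorical-columns #{:color :cut :clarity})

(defn boxplot-rule
  "Builds a rule that fires for boxplots whose :x column is in `columns`.
  Settings the user already gave for :group win over the inferred ones."
  [columns group-type new-mapping]
  {:match   (fn [{:keys [mapping geom]}]
              (and (= :boxplot geom)
                   (contains? columns (:x mapping))))
   :rewrite (fn [{:keys [mapping group] :as spec}]
              (assoc spec
                     :mapping new-mapping
                     :group   (merge {:type group-type :dimension (:x mapping)}
                                     group)
                     :stat    {:type :boxplot :dimension (:y mapping)}))})

(def rules
  [;; boxplots with numeric x
   (boxplot-rule numeric-columns :bin
                 {:x :group/midpoint, :width :group/width})
   ;; boxplots with categorical x
   (boxplot-rule categorical-columns :categorical
                 {:x :group/group})])

(defn expand
  "Applies the first rule whose :match accepts the spec, or returns the
  spec unchanged when no rule matches."
  [spec]
  (if-let [rule (first (filter #((:match %) spec) rules))]
    ((:rewrite rule) spec)
    spec))
```

The obvious weakness of this version is that matching and filling-in are two separate functions per rule, which is exactly the duplication a single structured sieve would remove.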