dkochmanski/dataframe protocol.org Secret

## dataframe protocol.org

      
    Polyclot 0.0.1 documentation

Overview

Polyclot is a tool for drawing interactive charts in CLIM. The purpose of
  this document is to explain how to plot data and how to create new
  types of charts. The tool may be used by other CLIM applications as a
  library or as a standalone utility to render charts.

Developer guide

Coding conventions


  class names are enclosed in angle brackets, e.g. <data-frame>
  all classes should be defined with the utility define-class
  types which are not classes are enclosed in percent signs,
    e.g. %index%

Utilities

(define-class <name> superclass slots &rest options)

Defines a class <name>, its constructor function <name> (a trampoline to
  make-instance) and a variable <name> which holds the class object.
  Accepts an additional option :stealth-mixin, which makes this class a
  superclass of the victim class named in the option.

(define-class <record-positions> (<data-frame>)
  ((ink :initform clim:+red+))
  (:stealth-mixin clim:output-record-history)
  (:documentation "OUTPUT-RECORD position scatterplot."))

The ability to mix into an existing class allows objects defined in
  unrelated libraries to be interpreted as, e.g., a data frame.
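
The core of such a macro can be sketched in plain Common Lisp. This is a simplified, hypothetical stand-in named define-class* (not Polyclot's actual definition); it covers only the class, constructor and variable parts described above and deliberately ignores :stealth-mixin. The <point> class and its readers are illustrative names.

```lisp
;; Hypothetical simplified stand-in for DEFINE-CLASS.
;; The :stealth-mixin option is deliberately not handled here.
(defmacro define-class* (name superclasses slots &rest options)
  `(progn
     ;; The class itself.
     (defclass ,name ,superclasses ,slots ,@options)
     ;; A variable named after the class, holding the class object.
     (defparameter ,name (find-class ',name))
     ;; A constructor named after the class: a trampoline to MAKE-INSTANCE.
     (defun ,name (&rest initargs)
       (apply #'make-instance ',name initargs))
     ',name))

;; Usage: <point> is an illustrative class, not part of Polyclot.
(define-class* <point> ()
  ((x :initarg :x :reader point-x)
   (y :initarg :y :reader point-y))
  (:documentation "A 2D point."))

(defparameter *p* (<point> :x 1 :y 2))
```

Because variables and functions live in separate namespaces, the same symbol can name the class, the constructor and the variable without conflict.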
Data frames

A data frame represents a set of data. Data is immutable, but it is
  possible to filter rows and columns with the function sel and to add
  rows and columns with the functions add-rows! and add-cols!.
Column names and row names are immutable strings which must be unique
  within the data frame. Data should be accessed with the mapping
  functions map-data-frame, map-data-frame-rows and map-data-frame-cols,
  and with the function ref, which selects a single element.

Components

<aesthetic> - mapping of a dataframe

<stat> - statistical transformation

<geom> - geometric object (design)

<mods> - collision modifiers (positional adjustment)

<scale> - mapping from data to aesthetic attributes

<coord> - mapping of the object’s position to the plot’s area

Standalone utility

Embedding in a CLIM application

As a frame

As a pane

As an output record

Extending Polyclot

Reference manual

Data Frames

Types

<data-frame>

A protocol class. Every class implementing this protocol must have it as
  a superclass.
<raw-data-frame>

<sel-data-frame>

%index%

Either an integer or a string. If it is an integer, it must be a valid
  zero-based index of a row or column. If it is a string, it must be an
  existing row or column name.
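
The description above could be expressed as a type definition along the following lines. This is a sketch under assumptions: the text does not show how %index% is actually defined, and index-position is a hypothetical helper that resolves an index against a vector of names.

```lisp
;; Assumed definition: the text only says "an integer or a string".
(deftype %index% ()
  '(or (integer 0) string))

;; Hypothetical helper: turn an index into a zero-based position within
;; NAMES (a vector of row or column names), or NIL when it is not valid.
(defun index-position (index names)
  (check-type index %index%)
  (etypecase index
    (integer (and (< index (length names)) index))
    (string  (position index names :test #'string=))))

(index-position 1 #("Price" "Max Speed"))           ; → 1
(index-position "Max Speed" #("Price" "Max Speed")) ; → 1
(index-position "Seats" #("Price" "Max Speed"))     ; → NIL
```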
<invalid-slice> (error)

<invalid-index> (error)

<row-does-not-exist> (<invalid-index>)

<col-does-not-exist> (<invalid-index>)

<insert-error> (error)

<col-name-not-unique> (<insert-error>)

<row-name-not-unique> (<insert-error>)

<row-length-mismatch> (<insert-error>)
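
The hierarchy listed above can be written down with plain define-condition forms. This is only a sketch: Polyclot may define these conditions differently, and the index slot, its reader bad-index and the function describe-lookup are assumptions added for the example.

```lisp
(define-condition <invalid-slice> (error) ())
(define-condition <invalid-index> (error)
  ;; The INDEX slot is an assumption made for this example.
  ((index :initarg :index :reader bad-index)))
(define-condition <row-does-not-exist> (<invalid-index>) ())
(define-condition <col-does-not-exist> (<invalid-index>) ())
(define-condition <insert-error> (error) ())
(define-condition <col-name-not-unique> (<insert-error>) ())
(define-condition <row-name-not-unique> (<insert-error>) ())
(define-condition <row-length-mismatch> (<insert-error>) ())

;; Handling the parent type catches both row and column lookup failures.
(defun describe-lookup (thunk)
  (handler-case (funcall thunk)
    (<invalid-index> (c)
      (format nil "invalid index: ~s" (bad-index c)))))

(describe-lookup
 (lambda () (error '<col-does-not-exist> :index "Seats")))
```

Grouping the conditions under <invalid-index> and <insert-error> lets callers handle a whole family of failures with one clause.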

Accessors

dims <data-frame>

Returns the dimensions of the data frame as two values: the number of
  rows and the number of columns.
cols <data-frame>

Returns the column names of the data frame. The result type is (vector string).
rows <data-frame>

Returns the row names of the data frame. The result type is (vector string).
ref <data-frame> row col

Selects a single element indexed by row and col. Row may be an opaque
  object obtained from map-data-frame-rows; in that case only the column
  is looked up in it. If either the row or the column is not part of the
  data frame, the consequences are undefined.
When row is an index, the function returns five values: value, column
  name, column index, row name and row index. Otherwise it returns three
  values: value, column name and column index.

(ref data-frame "Audi" "Max Speed")
(ref data-frame 42     "Max Speed")
(ref data-frame "Fiat" 42)

The function signals an <invalid-index> error for invalid indexes.
sel <data-frame> rows cols

Returns a <data-frame> which contains a slice of the original
  <data-frame> defined by rows and cols. Each of them is a slice
  specifier:

  T
      select all rows/cols
  (cons index index)
      select elements between indexes
  (cons (eql t) index)
      select all elements up to the index
  (cons index (eql t))
      select all elements starting from the index
  (list s1 e1 s2 e2 …)
      select union of slices (s1 . e1) (s2 . e2) …
  (vector index)
      select elements with specified indexes

The function returns a data frame which is a “window” onto the original
  data frame. To obtain a flattened data frame, call copy-data-frame on
  it.
The function signals an <invalid-slice> error for invalid slice
  specifiers.

(let ((data-frame-1 (sel df (cons 10 20) #("Price" "Max Speed")))
      (data-frame-2 (sel df (cons "Fiat" t) t))
      (data-frame-3 (sel df (list 10 20 "Fiat" t) #("Price")))
      (data-frame-4 (sel df t #(1 4 8))))
  #|do something|#)
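
How a slice specifier might be expanded into concrete positions can be sketched as follows. The helper resolve-slice is hypothetical and not part of Polyclot. It assumes that both ends of a (start . end) range are inclusive, which the text does not state, it does not check integer bounds, and it signals a plain error where Polyclot would signal <invalid-slice> or <invalid-index>.

```lisp
;; Hypothetical helper: expand a slice specifier into a list of
;; zero-based positions within NAMES (a vector of name strings).
;; Assumption: both ends of a (start . end) range are inclusive.
(defun resolve-slice (spec names)
  (let ((max-index (1- (length names))))
    (labels ((pos (index)
               ;; Integers are taken as positions; strings are looked up.
               (etypecase index
                 (integer index)
                 (string (or (position index names :test #'string=)
                             (error "Unknown name: ~s" index)))))
             (range (start end)
               ;; T on either side means "open-ended".
               (loop for i from (if (eq start t) 0 (pos start))
                       to (if (eq end t) max-index (pos end))
                     collect i)))
      (cond ((eq spec t) (range t t))
            ;; A vector of indexes (a bare string is not a valid spec).
            ((and (vectorp spec) (not (stringp spec)))
             (map 'list #'pos spec))
            ;; A dotted pair (start . end).
            ((and (consp spec) (atom (cdr spec)) (cdr spec))
             (range (car spec) (cdr spec)))
            ;; A flat list s1 e1 s2 e2 ... read as a union of ranges.
            ((listp spec)
             (remove-duplicates
              (loop for (start end) on spec by #'cddr
                    append (range start end))
              :from-end t))
            (t (error "Invalid slice specifier: ~s" spec))))))

(defparameter *names* #("Audi" "Fiat" "Honda" "Opel"))

(resolve-slice t *names*)                    ; → (0 1 2 3)
(resolve-slice (cons "Fiat" t) *names*)      ; → (1 2 3)
(resolve-slice (cons t 1) *names*)           ; → (0 1)
(resolve-slice (list 0 1 "Honda" t) *names*) ; → (0 1 2 3)
(resolve-slice #(3 "Audi") *names*)          ; → (3 0)
```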

Modifying the original data frame with add-rows! or add-cols! changes
  the content of the selected data frame if the corresponding row or
  column slice specifier is open-ended toward the end, that is T or
  (cons index (eql t)).
If add-rows! or add-cols! is invoked on a data frame that is a “window”,
  the operation is applied to the original data frame.

Mapping

map-data-frame <data-frame> function

Maps function over every element of the data frame. The function should
  accept six arguments: row name, row index, data row, col name, col
  index and value.

(map-data-frame df (lambda (rname rind row cname cind val)
                     ;; Only the indexes and the value are printed.
                     (declare (ignore rname row cname))
                     (format t "[~s,~s] ~a~%" rind cind val)))

map-data-frame-rows <data-frame> function

Maps function over the rows of a data frame. The function should accept
  three arguments: row name, row index and data row (an opaque object).

map-data-frame-cols data-row function

Maps function over the columns of a data row. The function should accept
  three arguments: column name, column index and value.
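
The relationship between the three mapping functions can be illustrated with a toy model: mapping over every element is mapping over rows and then over the columns of each row. The representation below (a frame as a list of column names followed by rows, and an "opaque" row as a cons of column names and values) and all toy-* names are assumptions for illustration and say nothing about Polyclot's internals.

```lisp
;; Toy representation, assumed for illustration only: a frame is a list
;; (col-names . rows), each row is (row-name . values), and the "opaque"
;; row object handed to callbacks is a cons (col-names . values).
(defun toy-map-rows (frame function)
  (destructuring-bind (col-names . rows) frame
    (loop for (row-name . values) in rows
          for row-index from 0
          do (funcall function row-name row-index
                      (cons col-names values)))))

(defun toy-map-cols (row function)
  (destructuring-bind (col-names . values) row
    (loop for col-name in col-names
          for value in values
          for col-index from 0
          do (funcall function col-name col-index value))))

;; The six-argument mapping is the two three-argument mappings nested.
(defun toy-map-frame (frame function)
  (toy-map-rows
   frame
   (lambda (row-name row-index row)
     (toy-map-cols
      row
      (lambda (col-name col-index value)
        (funcall function row-name row-index row
                 col-name col-index value))))))

(defparameter *cars*
  '(("Price" "Seats")
    ("Audi" 90 5)
    ("Fiat" 40 4)))

;; Collect (row-name col-name value) triples in traversal order.
(defun collect-cells (frame)
  (let ((cells '()))
    (toy-map-frame frame
                   (lambda (rname rind row cname cind val)
                     (declare (ignore rind row cind))
                     (push (list rname cname val) cells)))
    (nreverse cells)))

(collect-cells *cars*)
;; → (("Audi" "Price" 90) ("Audi" "Seats" 5)
;;    ("Fiat" "Price" 40) ("Fiat" "Seats" 4))
```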
Destructive operators

add-rows! <data-frame> &rest name-row-pairs

Adds new data rows, given as alternating row names and sequences of
  values. The function modifies the data frame and returns the modified
  object. To avoid modifying the original data frame, invoke the
  function on a copy.

(setq df (add-rows! (copy-data-frame df)
                    "Honda" '(42 15 22 :xxx "low")
                    "Audi"  '(10 12 44 :yyy "high")))
add-cols! <data-frame> &rest name-fun-pairs

Data frames are row-based, so a column is added by specifying a function
  which accepts the row name, row index and row data and returns the
  column value for that row. The function add-cols! modifies the data
  frame and returns the modified object.

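(setq df (add-cols! df
                    ;; Average of the "Max" and "Min" columns.
                    "AVG" (lambda (row-name row-index row)
                            (declare (ignore row-name row))
                            (/ (+ (ref df row-index "Max")
                                  (ref df row-index "Min"))
                               2))
                    "TYP" (lambda (row-name row-index row)
                            (declare (ignore row-name row))
                            (if (> (ref df row-index "Seats") 3)
                                :comfort
                                :ergonomy))))
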
Constructors

make-data-frame cols &rest rows

Creates a data frame. Cols is a sequence of column names and each of
  rows is a cons whose car is the row name and whose cdr is a sequence
  of column values. The number of values in each row must equal the
  number of column names.

(make-data-frame '(       "col1" "col2" "col3")
                 '("row1" value1 value2 value3)
                 '("row2" value1 value2 value3))

It is a thin wrapper which creates a <data-frame> (the exact class is
  not specified, but it implements all necessary protocols).
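
The constraints on the arguments (unique names, matching row lengths) can be sketched as a toy constructor. The function toy-make-data-frame and its list-based result are hypothetical. It signals plain errors, whereas the condition types listed under Types suggest that Polyclot reports these cases with <col-name-not-unique>, <row-name-not-unique> and <row-length-mismatch>.

```lisp
;; Toy constructor, not Polyclot's: returns a list (col-names . rows)
;; after checking the constraints stated in the text.
(defun toy-make-data-frame (cols &rest rows)
  (let ((width (length cols)))
    (flet ((uniquep (names)
             (= (length names)
                (length (remove-duplicates names :test #'string=)))))
      (unless (uniquep cols)
        (error "Column names are not unique: ~s" cols))
      (unless (uniquep (mapcar #'car rows))
        (error "Row names are not unique."))
      (dolist (row rows)
        (unless (= width (length (cdr row)))
          (error "Row ~s has ~d values, expected ~d."
                 (car row) (length (cdr row)) width)))
      ;; Copy the input so later changes to it do not leak in.
      (cons (coerce cols 'list) (copy-tree rows)))))

(toy-make-data-frame '(       "Price" "Seats")
                     '("Audi" 90      5)
                     '("Fiat" 40      4))
;; → (("Price" "Seats") ("Audi" 90 5) ("Fiat" 40 4))
```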
copy-data-frame <data-frame>

Creates a new data frame with copied data (allocates new rows to store
  names and data).
(let ((new-df (copy-data-frame df)))
  (setq new-df (add-rows! new-df "Foo" '(1 2)))
  ;; add-rows! called on new-df doesn't modify df.
  (ref df "Bar" 0))
join-data-frame <data-frame> <data-frame> &rest args

This function is included for completeness but is left unspecified.