dmkolobov/Ermine Row Types.md

## Ermine Row Types.md

      
    Raw
  

              Ermine Row Types.md
            
          
    Row Types

When talking about Ermine, we often hear about "row types". What are row types
exactly and how can we use them to reason about Ermine programs? This
post is a brief attempt to make that connection.
First, we'll need some imports:
module Tutorial where 

import Prelude 

import Syntax.IO -- provides syntax for working with the IO monad.

-- some stuff that let's us work with databases
import Runners as R 
import Scanners as S 
import Internal.SMEnv

import Constraint
import Field using getF
import Record 

import Vector using type Vector
Records and header types

Like Haskell, Ermine provides records as a means of abstraction. A record
is some data structure with named fields that you can get and set. Unlike records
in Haskell, Ermine records satisfy the following conditions:


All fields of a record must be some primitive type such as Int, String,
Bool, etc.


Field names must be declared prior to their use.


Let's define some fields that we might want to use to store information about a user:
field firstName : String 
field lastName  : String 
field userID    : Int    
We can use these definitions to define a record type representing a user:
type UserRec = Record (| firstName, lastName, userID |)
And then we can construct records of this type:
dmitry : UserRec
dmitry = { firstName = "Dmitry"  
         , lastName  = "Kolobov"
         , userID    = 0
         }
In addition to the usual kinds of types suchs as *( the kind of concrete types ) or
*->* ( the kind of higher-order types accepting one argument. ), Ermine has a rho kind,
which can be thought of as the kind of types which serve as a header for another type with fields.
These header types can be constructed via (|field1,field2,...|) notation.
type User = (| firstName, lastName , userID|) 
And we can use this type to create more consise definitions of our record:
type UserRecAlt = Record User
type UserRecShort = {..User} 
If you type :k Record into the REPL you'll see that Record is a higher-order
type which accepts a type of kind rho as its single type argument,
and returns a concrete type of kind * :

>> :k Record
Record : rho -> * 

This means that we can apply the higher-kinded Record type, to a type of kind rho, like User,
and obtain a concrete type of kind *, like UserRecAlt. Our original definition of UserRec just inlines
the header type definition in the definition of the record.
We can get value out of these records via the getF function:
dmitryID = getF userID dmitry
dmitryFN = getF firstName dmitry 
dmitryLN = getF lastName dmitry
Let's define a function for getting's a user's ID
getID = getF userID
Turns out, our function works on more than our UserRec and UserRecAlt types.
It works for any record which has a userID field!
field foo : String

idtwelve = getID { foo = "hello, world" , userID = 12 }
ideleven = getID { userID = 11 }
Why does that work?
Reasoning about header types

Given some  header type s, we can define a constraint on all header
types r like:

f : forall (r : rho). (exists t. r <- (s,t)) => _

The constraint is read as:

"the header type r can be partitioned into the disjoint header types s and t"

where disjoint means that the intersection of the set of fields represented by the types s
and t is empty, and their union gives us the set of fields represented by the type r.
We can (almost always) omit the existentials and kinds in the signature for brevity:

f : r <- (s,t) => _

Let's use this syntax to we give a type signature to the following synonym for getID:
getID1 : (exists t. r <- ((|userID|), t)) -- the header type `r` can be partitioned into (|userID|) and 
                                          -- some other header type `t`.
      => Record r -> Int 
getID1 = getID

idAlt = getID1 dmitry
The function getID works on values of type UserRec because the record's header type r can be
partitioned into (|userID|) and (|firstName, lastName|)
Similarly, in the definitions of:

idtwelve: The record's header type r can be partitioned into (|userID|) and (|foo|).
ideleven: The record's header type r can be partitioned into (|userID|) and (||) (the empty rho type).


Note that for any header type r, the constraint exists t. r <- (t,(||)) is trivially satisfied
by taking t = r. We'll use this fact later.

Now that we understand that, let's poke around our field definitions a bit:

>> :t firstName 
Field (|firstName|) String 
>> :t lastName 
Field (|lastName|) String
>> :t userID 
Field (|userID|) Int 
>> :k Field 
rho -> * -> * 

So whenever we define a field, the Field type-constructor is applied with a header type
containing just that field as the first argument, and the field's primitive type as the second argument.
Now you should be able to see what the type of getF means:

getF : r <- (h,t) => Field h a -> {..r} -> a

We can get a primitive type a out of a record with the header type r, if and only if
r can be partitioned into the field's header type h( which contains just that field ), and some other
header type t.
This makes it pretty easy to determine what a function operating on things with fields will do.
For instance, lets take a look at signature of the appendR function:

-- Record.e
appendR : c <- (a, b) -> {..a} -> {..b} -> {..c}

Well, we know that the header type c contains all the fields of the headers a and b. We also know
that a and b have no fields in common. We can deduce from this that the function takes two records which
have no fields in common, and return a record containing all the fields of the first record, as well as all the fiels
of the second.
Suppose that our service is currently based only in Antartica. We plan
to expand at the end of the month.
We decide to add a location header type for things that might occur in different locations.
We have a single record  with location = "Antartica", because to date, everything on our platform
happened in Antartica.
field location : String 

type Location = (|location|)

initialLocation : Record Location 
initialLocation = { location = "Antartica" }

augDmitry : Record (|firstName, lastName, userID, location|)
augDmitry = appendR dmitry initialLocation
Header types can be reified via the Row constructor:
type UserRow = Row User 

userRow : UserRow
userRow = { firstName, lastName, userID } -- literal syntax for construction a `Row` type.

type LocationRow = Row Location 

locationRow : LocationRow 
locationRow = { location }
Which can be used to do other interesting things with records, such as projecting
a header onto the record, or removing the fields of a header from a record:
justDmitry : UserRec
justDmitry = projectT userRow augDmitry

justDmitry' : UserRec
justDmitry' = exceptT locationRow augDmitry
Again, it is instructive to look at the constraints placed on the header types involved:

-- Row.e
projectT : r <- (h,t) => Row h -> {..r} -> {..h} 
exceptT : r <- (h,t) => Row h -> {..r} -> {..t}

If you keep in mind that the header types on the right side of the <- are disjoint
sets of fields, whose union is the header type on the left side of the <-, then you can use

set operations to deduce the fields present on any type in the rest of the signature.
Relations

Now that we have a basic understanding of row types, we can start to understand the relational calculus
included in Ermine. I mentioned that the rho kind is the kind of types which serve as a header
for another type with fields.
Records are one such type with fields. Another is Relation, which represents a result from a SQL query.
We can create a relation with a header r by passing the relation function a list of records with the header r:
users : Relation User
users = relation 
          [{ firstName = "dmitry", lastName = "kolobov", userID = 0 }
          ,{ firstName = "jane",   lastName = "smith",   userID = 1 }
          ,{ firstName = "monty",  lastName = "python",  userID = 2 }
          ,{ firstName = "joe",    lastName = "bicycle", userID = 3 }]
Values of type Relation r for some header r are compiled into a SQL query which
returns a result set with columns corresponding to the fields in r. The various
operations on relational types form an algebra for constructing nested SQL queries.
Ermine attempts to abstract over the particular database engine used, though SQL Server
is the most battle-tested.
Because Ermine was designed as a reporting language over massive data-sets, the timing of
and method by which these SQL queries are evaluated and iterated is heavily restricted. For the most part,
relations are meant to  be run during the course of rendering some layout constructed via the
charting and reporting API defined in Layout.Chart and Layout.Report. The result-sets of such
queries are iterated over in a lazy, streaming fashion and serialized for consumption by a writer such
as the HTML writer, which uses the Highcharts API and a library called data-tables to render the data
in an HTML report.
We can cheat the overall design and run these things in the IO monad from the REPL. But first,
I need to introduce( briefly, incompletely, possibly wrong-ly ) two concepts:

scanners : DB-specific objects which handle execution of a SQL query and iteration over the
resulting records.
runners : Provides an algebra for running a suspendable computation in some monad.

We create a runner by choosing a DB engine and then applying the related runner function(defined in
the module Runners) with a JDBC connection string to your database instance.
A scanner is created by binding an SMEnv value in the IO monad and passing it
to the scanner function(defined in the module Scanners) for the chosen DB engine.
url = "jdbc:sqlserver://localhost:1433;databaseName=vagrantdb;user=vagrant;password=v@grantDB"
runner = sqlServer_R url 

-- print a vector of records.
runRel : forall (rel : rho -> *) (r   : rho )
       . Relational rel => rel r -> IO (Vector {..r})
runRel r = smEnv >>= frun
  where 
    frun e = let 
               scanner = sqlServer_S e 
             in 
               run_R runner (runRelation scanner r)

-- print the underlying SQL query.
dumpRel : forall (rel : rho -> *) (r   : rho )
        . Relational rel => rel r -> IO String
dumpRel r = smEnv >>= fdump
  where 
    fdump e = let 
                scanner = sqlServer_S e 
              in 
                dumpQuery_S scanner r
Since we already know what the vector representation of users looks like, lets take a look at
the compiled sql query.
select [firstName],[lastName],[userID] 
from (values ('dmitry','kolobov',0), 
             ('jane','smith',1), 
             ('monty','python',2), 
             ('joe','bicycle',3)) as lit([firstName]
                                        ,[lastName]
                                        ,[userID])
So the relation function just outputs the passed in list as a SQL values clause, and then selects
all the columns comprising the header of the relation.
So let's do something else. Let's create another relation for the contact information:
field phone : String 
field email : String

type ContactInfo = (|email,phone,userID|)

contacts : Relation ContactInfo  -- could also be written as [..ContactInfo] or [email,phone,userID]
contacts = relation 
             [{phone = "000-000-0000", email = "dmitry@example.com", userID = 0 }
             ,{phone = "111-111-1111", email = "jane@example.com"  , userID = 1 }
             ,{phone = "222-222-2222", email = "monty@example.com" , userID = 2 }
             ,{phone = "333-333-3333", email = "joe@example.com"   , userID = 3 }
             ]
We want to join the users and contacts relation, and we can do so with the function join:
usersWithContacts : [userID, firstName, lastName, phone, email]
usersWithContacts = join users contacts
Take a look at the generated SQL query. A SQL JOIN clause is emitted:
usersWithContactsQuery = dumpRel usersWithContacts
Again, look at the type of the join function, which represents a "natural join":

-- Relation.e
join : (a <- (d,e)
       ,b <- (e, f)
       ,c <- (d,e,f)
       ) => [..a] -> [..b] -> [..c]

What this says is that we can join a Relation a with a Relation b, if and only if the header
types a and b contain the columns of the "key" header type e, along with some "extra" columns. The header
type d contains the "extra" columns of a, and the header type f contains the "extra" columns of b.
We can see that the resulting relation contains all of the columns of a and b, including
the join key.
In our definition of usersWithConstactsQuery, we are joining the users and contacts relations.
The header of users can be split into the join key (|userID|) and extra columns (|firstName, lastName|),
wheras the header of contacts can be partitioned into the join key (|userID|) and extra columns
(|firstName, lastName|). Here is a pseudo-code function signature representing this consisely:

join : ( User        <- ((|firstName, lastName|), (|userID|))
       , ContactInfo <- ((|userID|),(|phone, email|))
       , r           <- ((|firstName, lastName|),(|userID|),(|phone,email|))
       ) -> [..User] 
         -> [..ContactInfo] 
         -> [..r]

Many of the functions defined in the module Relation can be picked apart in this way.
Let's do something else. Remember that for any header r, the constraint r <- (t,(||))
is satisfied whenever r = t.
Let's substitute the empty header type for the type e in the signature of join:

joiny : (a <- (a',(||))
        ,b <- (b',(||))
        ,c <- (a',b',(||))
        ) => [..a] -> [..b] -> [..c]

if we simplify:

joiny : c <- (a,b) => [..a] -> [..b] -> [..c]

We see that this looks a lot like the signature of appendR, but for relations.
In fact, this is actually the definition of a cartesian product on two relations.
It takes two relations, and returns a third relation where every row from the
second relation is appended to every row of the first relation.
cart : c <- (a,b) => [..a] -> [..b] -> [..c]
cart = join
We used appendR to add a default location = "Antartica" field to our Record User value.
Can we use this cartesian product function to do something similar for our Relation User?
Yes!
Let's use this to define a function which takes some relation, and adds columns corresponding to
the fields of some record( whose header is disjoint from the relation's header ), with
values initialized to the record's field values.
migrate : (c <- (a,b)) => [..a] -> {..b} -> [..c]
migrate r = cart r . relation . singleton 

usersWithContactsAndLocation = migrate usersWithContacts initialLocation