When talking about Ermine, we often hear about "row types". What are row types exactly and how can we use them to reason about Ermine programs? This post is a brief attempt to make that connection.
First, we'll need some imports:
module Tutorial where
import Prelude
import Syntax.IO -- provides syntax for working with the IO monad.
-- some stuff that let's us work with databases
import Runners as R
import Scanners as S
import Internal.SMEnv
import Constraint
import Field using getF
import Record
import Vector using type Vector
Like Haskell, Ermine provides records as a means of abstraction. A record is some data structure with named fields that you can get and set. Unlike records in Haskell, Ermine records satisfy the following conditions:
-
All fields of a record must be some primitive type such as
Int
,String
,Bool
, etc. -
Field names must be declared prior to their use.
Let's define some fields that we might want to use to store information about a user:
field firstName : String
field lastName : String
field userID : Int
We can use these definitions to define a record type representing a user:
type UserRec = Record (| firstName, lastName, userID |)
And then we can construct records of this type:
dmitry : UserRec
dmitry = { firstName = "Dmitry"
, lastName = "Kolobov"
, userID = 0
}
In addition to the usual kinds of types suchs as *
( the kind of concrete types ) or
*->*
( the kind of higher-order types accepting one argument. ), Ermine has a rho
kind,
which can be thought of as the kind of types which serve as a header for another type with fields.
These header types can be constructed via (|field1,field2,...|)
notation.
type User = (| firstName, lastName , userID|)
And we can use this type to create more consise definitions of our record:
type UserRecAlt = Record User
type UserRecShort = {..User}
If you type :k Record
into the REPL you'll see that Record
is a higher-order
type which accepts a type of kind rho
as its single type argument,
and returns a concrete type of kind *
:
>> :k Record Record : rho -> *
This means that we can apply the higher-kinded Record
type, to a type of kind rho
, like User
,
and obtain a concrete type of kind *
, like UserRecAlt
. Our original definition of UserRec
just inlines
the header type definition in the definition of the record.
We can get value out of these records via the getF
function:
dmitryID = getF userID dmitry
dmitryFN = getF firstName dmitry
dmitryLN = getF lastName dmitry
Let's define a function for getting's a user's ID
getID = getF userID
Turns out, our function works on more than our UserRec
and UserRecAlt
types.
It works for any record which has a userID
field!
field foo : String
idtwelve = getID { foo = "hello, world" , userID = 12 }
ideleven = getID { userID = 11 }
Why does that work?
Given some header type s
, we can define a constraint on all header
types r
like:
f : forall (r : rho). (exists t. r <- (s,t)) => _
The constraint is read as:
"the header type
r
can be partitioned into the disjoint header typess
andt
"
where disjoint means that the intersection of the set of fields represented by the types s
and t
is empty, and their union gives us the set of fields represented by the type r
.
We can (almost always) omit the existentials and kinds in the signature for brevity:
f : r <- (s,t) => _
Let's use this syntax to we give a type signature to the following synonym for getID
:
getID1 : (exists t. r <- ((|userID|), t)) -- the header type `r` can be partitioned into (|userID|) and
-- some other header type `t`.
=> Record r -> Int
getID1 = getID
idAlt = getID1 dmitry
The function getID
works on values of type UserRec
because the record's header type r
can be
partitioned into (|userID|)
and (|firstName, lastName|)
Similarly, in the definitions of:
idtwelve
: The record's header typer
can be partitioned into(|userID|)
and(|foo|)
.ideleven
: The record's header typer
can be partitioned into(|userID|)
and(||)
(the empty rho type).
Note that for any header type
r
, the constraintexists t. r <- (t,(||))
is trivially satisfied by takingt = r
. We'll use this fact later.
Now that we understand that, let's poke around our field definitions a bit:
>> :t firstName Field (|firstName|) String >> :t lastName Field (|lastName|) String >> :t userID Field (|userID|) Int >> :k Field rho -> * -> *
So whenever we define a field, the Field
type-constructor is applied with a header type
containing just that field as the first argument, and the field's primitive type as the second argument.
Now you should be able to see what the type of getF
means:
getF : r <- (h,t) => Field h a -> {..r} -> a
We can get a primitive type a
out of a record with the header type r
, if and only if
r
can be partitioned into the field's header type h
( which contains just that field ), and some other
header type t
.
This makes it pretty easy to determine what a function operating on things with fields will do.
For instance, lets take a look at signature of the appendR
function:
-- Record.e appendR : c <- (a, b) -> {..a} -> {..b} -> {..c}
Well, we know that the header type c
contains all the fields of the headers a
and b
. We also know
that a
and b
have no fields in common. We can deduce from this that the function takes two records which
have no fields in common, and return a record containing all the fields of the first record, as well as all the fiels
of the second.
Suppose that our service is currently based only in Antartica. We plan to expand at the end of the month.
We decide to add a location
header type for things that might occur in different locations.
We have a single record with location = "Antartica"
, because to date, everything on our platform
happened in Antartica.
field location : String
type Location = (|location|)
initialLocation : Record Location
initialLocation = { location = "Antartica" }
augDmitry : Record (|firstName, lastName, userID, location|)
augDmitry = appendR dmitry initialLocation
Header types can be reified via the Row
constructor:
type UserRow = Row User
userRow : UserRow
userRow = { firstName, lastName, userID } -- literal syntax for construction a `Row` type.
type LocationRow = Row Location
locationRow : LocationRow
locationRow = { location }
Which can be used to do other interesting things with records, such as projecting a header onto the record, or removing the fields of a header from a record:
justDmitry : UserRec
justDmitry = projectT userRow augDmitry
justDmitry' : UserRec
justDmitry' = exceptT locationRow augDmitry
Again, it is instructive to look at the constraints placed on the header types involved:
-- Row.e projectT : r <- (h,t) => Row h -> {..r} -> {..h} exceptT : r <- (h,t) => Row h -> {..r} -> {..t}
If you keep in mind that the header types on the right side of the <-
are disjoint
sets of fields, whose union is the header type on the left side of the <-
, then you can use
set operations to deduce the fields present on any type in the rest of the signature.
Now that we have a basic understanding of row types, we can start to understand the relational calculus
included in Ermine. I mentioned that the rho
kind is the kind of types which serve as a header
for another type with fields.
Records are one such type with fields. Another is Relation
, which represents a result from a SQL query.
We can create a relation with a header r
by passing the relation
function a list of records with the header r
:
users : Relation User
users = relation
[{ firstName = "dmitry", lastName = "kolobov", userID = 0 }
,{ firstName = "jane", lastName = "smith", userID = 1 }
,{ firstName = "monty", lastName = "python", userID = 2 }
,{ firstName = "joe", lastName = "bicycle", userID = 3 }]
Values of type Relation r
for some header r
are compiled into a SQL query which
returns a result set with columns corresponding to the fields in r
. The various
operations on relational types form an algebra for constructing nested SQL queries.
Ermine attempts to abstract over the particular database engine used, though SQL Server
is the most battle-tested.
Because Ermine was designed as a reporting language over massive data-sets, the timing of
and method by which these SQL queries are evaluated and iterated is heavily restricted. For the most part,
relations are meant to be run during the course of rendering some layout constructed via the
charting and reporting API defined in Layout.Chart
and Layout.Report
. The result-sets of such
queries are iterated over in a lazy, streaming fashion and serialized for consumption by a writer such
as the HTML writer, which uses the Highcharts API and a library called data-tables
to render the data
in an HTML report.
We can cheat the overall design and run these things in the IO
monad from the REPL. But first,
I need to introduce( briefly, incompletely, possibly wrong-ly ) two concepts:
- scanners : DB-specific objects which handle execution of a SQL query and iteration over the resulting records.
- runners : Provides an algebra for running a suspendable computation in some monad.
We create a runner by choosing a DB engine and then applying the related runner function(defined in
the module Runners
) with a JDBC connection string to your database instance.
A scanner is created by binding an SMEnv
value in the IO
monad and passing it
to the scanner function(defined in the module Scanners
) for the chosen DB engine.
url = "jdbc:sqlserver://localhost:1433;databaseName=vagrantdb;user=vagrant;password=v@grantDB"
runner = sqlServer_R url
-- print a vector of records.
runRel : forall (rel : rho -> *) (r : rho )
. Relational rel => rel r -> IO (Vector {..r})
runRel r = smEnv >>= frun
where
frun e = let
scanner = sqlServer_S e
in
run_R runner (runRelation scanner r)
-- print the underlying SQL query.
dumpRel : forall (rel : rho -> *) (r : rho )
. Relational rel => rel r -> IO String
dumpRel r = smEnv >>= fdump
where
fdump e = let
scanner = sqlServer_S e
in
dumpQuery_S scanner r
Since we already know what the vector representation of users
looks like, lets take a look at
the compiled sql query.
select [firstName],[lastName],[userID]
from (values ('dmitry','kolobov',0),
('jane','smith',1),
('monty','python',2),
('joe','bicycle',3)) as lit([firstName]
,[lastName]
,[userID])
So the relation
function just outputs the passed in list as a SQL values
clause, and then selects
all the columns comprising the header of the relation.
So let's do something else. Let's create another relation for the contact information:
field phone : String
field email : String
type ContactInfo = (|email,phone,userID|)
contacts : Relation ContactInfo -- could also be written as [..ContactInfo] or [email,phone,userID]
contacts = relation
[{phone = "000-000-0000", email = "dmitry@example.com", userID = 0 }
,{phone = "111-111-1111", email = "jane@example.com" , userID = 1 }
,{phone = "222-222-2222", email = "monty@example.com" , userID = 2 }
,{phone = "333-333-3333", email = "joe@example.com" , userID = 3 }
]
We want to join the users
and contacts
relation, and we can do so with the function join
:
usersWithContacts : [userID, firstName, lastName, phone, email]
usersWithContacts = join users contacts
Take a look at the generated SQL query. A SQL JOIN
clause is emitted:
usersWithContactsQuery = dumpRel usersWithContacts
Again, look at the type of the join
function, which represents a "natural join":
-- Relation.e join : (a <- (d,e) ,b <- (e, f) ,c <- (d,e,f) ) => [..a] -> [..b] -> [..c]
What this says is that we can join a Relation a
with a Relation b
, if and only if the header
types a
and b
contain the columns of the "key" header type e
, along with some "extra" columns. The header
type d
contains the "extra" columns of a
, and the header type f
contains the "extra" columns of b
.
We can see that the resulting relation contains all of the columns of a
and b
, including
the join key.
In our definition of usersWithConstactsQuery
, we are joining the users
and contacts
relations.
The header of users
can be split into the join key (|userID|)
and extra columns (|firstName, lastName|)
,
wheras the header of contacts
can be partitioned into the join key (|userID|)
and extra columns
(|firstName, lastName|)
. Here is a pseudo-code function signature representing this consisely:
join : ( User <- ((|firstName, lastName|), (|userID|)) , ContactInfo <- ((|userID|),(|phone, email|)) , r <- ((|firstName, lastName|),(|userID|),(|phone,email|)) ) -> [..User] -> [..ContactInfo] -> [..r]
Many of the functions defined in the module Relation
can be picked apart in this way.
Let's do something else. Remember that for any header r
, the constraint r <- (t,(||))
is satisfied whenever r = t
.
Let's substitute the empty header type for the type e
in the signature of join:
joiny : (a <- (a',(||)) ,b <- (b',(||)) ,c <- (a',b',(||)) ) => [..a] -> [..b] -> [..c]
if we simplify:
joiny : c <- (a,b) => [..a] -> [..b] -> [..c]
We see that this looks a lot like the signature of appendR
, but for relations.
In fact, this is actually the definition of a cartesian product on two relations.
It takes two relations, and returns a third relation where every row from the
second relation is appended to every row of the first relation.
cart : c <- (a,b) => [..a] -> [..b] -> [..c]
cart = join
We used appendR
to add a default location = "Antartica"
field to our Record User
value.
Can we use this cartesian product function to do something similar for our Relation User
?
Yes!
Let's use this to define a function which takes some relation, and adds columns corresponding to the fields of some record( whose header is disjoint from the relation's header ), with values initialized to the record's field values.
migrate : (c <- (a,b)) => [..a] -> {..b} -> [..c]
migrate r = cart r . relation . singleton
usersWithContactsAndLocation = migrate usersWithContacts initialLocation