cdepillabout/haskell-db-comparison.md

## haskell-db-comparison.md

      
    Raw
  

              haskell-db-comparison.md
            
          
    Haskell DB Library Comparison

We will be comparing the following DB libraries:

HRR
Opaleye
Persistent + Esqueleto
Groundhog
postgresql-simple (and postgresql-transactional)

HRR

HRR will connect to our database and define our datatypes for us.  Let's say we have a table like this:
CREATE TABLE account
 (account_id integer primary key autoincrement not null,
  product_cd varchar(10) not null,
  cust_id integer not null,
  open_date date not null,
  close_date date,
  last_activity_date date,
  status text not null,
  open_branch_id integer,
  open_emp_id integer,
  avail_balance float(10,2),
  pending_balance float(10,2),
  check(status = 'ACTIVE' or status = 'CLOSED' or status = 'FROZEN')
  constraint fk_product_cd foreign key (product_cd)
    references product (product_cd),
  constraint fk_a_cust_id foreign key (cust_id)
    references customer (cust_id),
  constraint fk_a_branch_id foreign key (open_branch_id)
    references branch (branch_id),
  constraint fk_a_emp_id foreign key (open_emp_id)
        references employee (emp_id)
 );
Using Template Haskell, we can get it to create types like this:
data Account
  = Account {accountId :: !GHC.Int.Int64,
             productCd :: !String,
             custId :: !GHC.Int.Int64,
             openDate :: !Day,
             closeDate :: !(Maybe Day),
             lastActivityDate :: !(Maybe Day),
             status :: !String,
             openBranchId :: !(Maybe GHC.Int.Int64),
             openEmpId :: !(Maybe GHC.Int.Int64),
             availBalance :: !(Maybe Double),
             pendingBalance :: !(Maybe Double)}
  deriving (Show)

-- Relation type corresponding to Table
account :: Relation () Account
account =  ...

-- Column selectors for This DSL
accountId' :: Pi Account GHC.Int.Int64
accountId'
  = definePi 0
productCd' :: Pi Account String
productCd'
  = definePi 1
custId' :: Pi Account GHC.Int.Int64
custId'
  = definePi 2
....

Here is a simple query for accounts, using the types we defined above.
account1 :: Relation () Account
account1 = relation $ do
  a <- query account
  wheres $ a ! Account.productCd' `in'` values ["CHK", "SAV", "CD", "MM"]
  return a
Here is a projection (only selecting certain rows):
account_3_7 :: Relation () (Maybe Int64, String)
account_3_7 = relation $ do
  a <- query account
  let proj = (,) |$| a ! Account.openEmpId'
                 |*| a ! Account.productCd'
  asc proj
  return proj
Generated SQL:
SELECT ALL T0.open_emp_id AS f0,
           T0.product_cd AS f1
FROM MAIN.account T0
ORDER BY T0.open_emp_id ASC, T0.product_cd ASC
Subquery:
account_9_1 :: Relation () Account1
account_9_1 = relation $ do
  a  <- query account
  ma <- queryScalar $ aggregatedUnique account Account.accountId' max'
  wheres $ just (a ! Account.accountId') .=. flattenMaybe ma
  return $ Account1 |$| a ! Account.accountId'
                    |*| a ! Account.productCd'
                    |*| a ! Account.custId'
                    |*| a ! Account.availBalance'
Generated SQL:
SELECT ALL T0.account_id AS f0,
           T0.product_cd AS f1,
           T0.cust_id AS f2,
           T0.avail_balance AS f3
FROM MAIN.account T0
WHERE (T0.account_id = (SELECT ALL MAX (T1.account_id) AS f0
                        FROM MAIN.account T1))
Joins:
join_5_1_2aT :: Relation () ((String, String), String)
join_5_1_2aT = relation $ do
  e <- query employee
  d <- query department
  on $ e ! Employee.deptId' .=. just (d ! Department.deptId')
  return $ e ! Employee.fname' >< e ! Employee.lname' >< d ! Department.name'
Generated SQL:
SELECT ALL T0.fname AS f0,
           T0.lname AS f1,
           T1.name AS f2
FROM MAIN.employee T0 INNER JOIN MAIN.department T1
ON (T0.dept_id = T1.dept_id)
Pros:

Works with DB2, PostgreSQL, SQLite, MySQL, Microsoft SQL Server and
OracleSQL.
Able to connect to database at compile time and generate datatypes for tables
programatically.
Seems to cover almost all of SQL.

Cons:

No feature to create tables or do migrations.
Seems just a little harder to read than Opaleye.

Opaleye

Completely typesafe SQL library.  Similar to HRR.
-- define table
personTable :: Table (Column PGText, Column PGInt4, Column PGText)
                     (Column PGText, Column PGInt4, Column PGText)
personTable = Table "personTable" (p3 ( required "name"
                                      , required "age"
                                      , required "address" ))

-- Here is a standard query of all people in the personTable:
personQuery :: Query (Column PGText, Column PGInt4, Column PGText)
personQuery = queryTable personTable

-- Here is the SQL generated for `personQuery`.
--
-- ghci> printSql personQuery
-- SELECT name0_1 as result1,
--        age1_1 as result2,
--        address2_1 as result3
--   FROM (SELECT *
--      FROM (SELECT name as name0_1,
--                   age as age1_1,
--                   address as address2_1
--            FROM personTable as T1) as T1) as T1
--

-- Data types and convenient type synonyms.
data Birthday' a b = Birthday { bdName :: a, bdDay :: b }
type Birthday = Birthday' String Day
type BirthdayColumn = Birthday' (Column PGText) (Column PGDate)

-- To get user defined types to work with the typeclass magic they must
-- have instances defined for them.  The instances are derivable with
-- Template Haskell.
$(makeAdaptorAndInstance "pBirthday" ''Birthday')

-- Create birthdayTable
birthdayTable :: Table BirthdayColumn BirthdayColumn
birthdayTable = Table "birthdayTable"
                       (pBirthday Birthday { bdName = required "name"
                                           , bdDay  = required "birthday" })

-- Query all birthdays:
birthdayQuery :: Query BirthdayColumn
birthdayQuery = queryTable birthdayTable

-- Produces the following SQL:
--
-- ghci> printSql birthdayQuery
-- SELECT name0_1 as result1,
--        birthday1_1 as result2
-- FROM (SELECT *
--       FROM (SELECT name as name0_1,
--                    birthday as birthday1_1
--             FROM birthdayTable as T1) as T1) as T1

-- Throw away some columns of query:
nameAge :: Query (Column PGText, Column PGInt4)
nameAge = proc () -> do
  (name, age, _) <- personQuery -< ()
  returnA -< (name, age)

-- Where clause (restriction):
youngPeople :: Query (Column PGText, Column PGInt4, Column PGText)
youngPeople = proc () -> do
  row@(_, age, _) <- personQuery -< ()
  restrict -< age .<= 18

  returnA -< row

-- Inner join:
personAndBirthday ::
  Query (Column PGText, Column PGInt4, Column PGText, Column PGDate)
personAndBirthday = proc () -> do
  (name, age, address) <- personQuery -< ()
  birthday             <- birthdayQuery -< ()

  restrict -< name .== bdName birthday

  returnA -< (name, age, address, bdDay birthday)

-- Composability of queries and joins.  We can't generate "the SQL of" these
-- `QueryArr` combinators.  They are not `Query`s so they don't have any SQL.
restrictIsTwenties :: QueryArr (Column PGInt4) ()
restrictIsTwenties = proc age -> do
  restrict -< (20 .<= age) .&& (age .< 30)

restrictAddressIs1MyStreet :: QueryArr (Column PGText) ()
restrictAddressIs1MyStreet = proc address -> do
  restrict -< address .== pgString "1 My Street, My Town"

twentiesAtAddress' :: Query (Column PGText, Column PGInt4, Column PGText)
twentiesAtAddress' = proc () -> do
  row@(_, age, address) <- personQuery -< ()

  restrictIsTwenties -< age
  restrictAddressIs1MyStreet -< address

  returnA -< row
Pros:

Seems to be more popular than HRR in the western Haskell world:
[https://www.reddit.com/r/haskell/comments/3fuq4s/opaleye_or_relationalrecord/](reddit thread)
Used in production at multiple companies.
Typesafe (even in aggregation functions).
Complete composability in queries.

Cons:

Arrow syntax.  (Although this might be appropriate for ARoW?? ;-p)
Only supports PostgreSQL.
No support for creating tables (or migrations).  Assumes tables already exist.

Esqueleto

Esqueleto builds on persistent and gives us a way to write joins and other complex queries.
Example SQL:
SELECT BlogPost.*, Person.*
FROM BlogPost, Person
WHERE BlogPost.authorId = Person.id
ORDER BY BlogPost.title ASC
In esqueleto, we may write the same query above as:
select $
    from $ \(b, p) -> do
        where_ (b ^. BlogPostAuthorId ==. p ^. PersonId)
        orderBy [asc (b ^. BlogPostTitle)]
        return (b, p)
Here's an example using Yesod to just return a few fields of a record:
getHomeR :: Handler Html
getHomeR = do
    blogs <- runDB
    $ E.select
        $ E.from $ \(blog `E.InnerJoin` author) -> do
            E.on $ blog ^. BlogAuthor E.==. author ^. AuthorId
            return ( blog   ^. BlogId
                   , blog   ^. BlogTitle
                   , author ^. AuthorName
                   )
    defaultLayout $ do
        setTitle "Blog posts"
            [whamlet|
                <ul>
                    $forall (E.Value blogid, E.Value title, E.Value name) <- blogs
                        <li>
                            <a href=@{BlogR blogid}>#{title} by #{name}
            |]
Pros:

Used with persistent.
Joins.
Pretty simple if you already know persistent.

Cons:

Less typesafe than Opaleye / HRR (with regards to aggregation).

Groundhog

Similar to Persistent.  It aims to be an ORM for Haskell.  It seems to have a
little more power than Persistent (it's projections), but still doesn't have the
ability to do joins.
-- define data types to represent our tables
data Customer = Customer {
  customerName :: String,
  phone :: String
} deriving Show
data Product = Product {
  productName :: String,
  quantity :: Int,
  customer :: DefaultKey Customer
}
deriving instance Show Product

-- This creates all instances and definitions for datatypes so they can be
-- mapped to and from sql tables.
mkPersist defaultCodegenConfig [groundhog|
- entity: Customer               # Name of the datatype
  constructors:
    - name: Customer
      fields:
        - name: customerName
          # Set column name to "name" instead of "customerName"
          dbName: name
      uniques:
        - name: NameConstraint
          fields: [customerName] # Inline format of list
- entity: Product
|]

-- Show how to do queries and insertions.
main :: IO ()
main = withSqliteConn ":memory:" $ runDbConn $ do
  runMigration defaultMigrationLogger $ do
    migrate (undefined :: Customer)
    migrate (undefined :: Product)
  johnKey <- insert $ Customer "John Doe" "0123456789"
  get johnKey >>= liftIO . print
  insert $ Product "Oranges" 3 johnKey
  insert $ Product "Apples" 5 johnKey
  janeKey <- insert $ Customer "Jane Doe" "9876543210"
  insert $ Product "Oranges" 4 janeKey
  johnOrders <- select $ (CustomerField ==. johnKey)
    `orderBy` [Asc ProductNameField]
  liftIO $ putStrLn $ "Products for John: " ++ show johnOrders
Groundhog has interesting projections, so you can do something like this:
project ("username: " `append` upper UserNameField)
  $ lower UserNameField `like` "%smith%"

project ((toArith NumberField + 1) * 2)
  $   upper FirstNameField ==. upper UserNameField
  ||. (toArith NumberField * 5) >. (25 :: Int)

update (EmailField =. upper EmailField) (AutoKeyField ==. k)
This would produce the following sql statements:
SELECT 'username: ' || upper(username) FROM mytable WHERE lower(username) LIKE '%smith%'

SELECT (number + 1) * 2 FROM mytable WHERE upper(firstname) = upper(username) OR number * 5 > 25

UPDATE mytable SET email = upper(email) WHERE id = k
Pros:

supports migrations
supports composite keys
Seems to provide more sql functionality than persistent.

Cons:

Doesn't do joins.
Not as commonly used as Persistent?

postgresql-simple

Very simple.  Single class for converting from SQL row to Haskell data types:
-- Some data types
data Present = Present { presentName :: Text }

data Location = Location { locLat :: Double
                         , locLong :: Double
                         }

data Child = Child { childName :: Text
                   , childLocation :: Location
                   }

-- Classes for converting from SQL rows to Haskell datatypes
instance FromRow Present where
  fromRow :: RowParser Present
  fromRow = Present <$> field

instance FromRow Child where
  fromRow :: RowParser Child
  fromRow = do
    childName' <- field
    locLat' <- field
    locLong' <- field
    pure $ Child childName' locLat' locLong'

-- Some functions for querying
allChildren :: Connection -> IO [Child]
allChildren c = query_ c "SELECT name, loc_lat, loc_long FROM child"

allPresents :: Connection -> IO [Present]
allPresents c = query_ c "SELECT name FROM present"
Pros:

Relatively simple.
Allows arbitrary sql statements.
Allows easy use of transactions.

Cons:

Not type safe.  The SQL statements are not checked in any way.
Using postgresql-simple would make it more difficult to switch to a different
database.
No template Haskell for generating relations/tables for you.
No migration support.

Migrations

When using a library like HRR, Opaleye, or Groundhog, it may be necessary to
perform migrations with a separate tool.  This section lists different tools
available.  The three most popular tools are Flyway, Liquibase, and MyBatis
Migrations.
liquibase vs flyway #1
liquibase vs flyway #2
flyway vs liquibase vs mybatis migrations
Flyway

Relatively simple solution.  Seems to be widely used.
Pros:

Relatively simple.

Cons:
Can't migrate backwards.
Liquibase

Seems like the more "enterprise" solution.
Liquibase Homepage
Pros:

Generate database diffs.
Allows easy rollbacks.
Seems pretty powerful.

Cons:

Can't write in normal SQL.  Must use either XML or annotated
sql.

MyBatis Migrations

MyBatis Migrations Homepage
Cons:

Seems like it is relatively tightly linked to Java.
Documentation not too good?

Sqitch

Sqitch Homepage
This records the current state of the database in git commits.
Pros:

Pretty powerful.

Cons:

Pretty complicated command line tool.