philandstuff/codemesh2014.org

## codemesh2014.org

      
    Raw
  

              codemesh2014.org
            
          
    Codemesh 2014, London

keynote: complexity is outside the code


  @jessitron and @tastapod
  slides

when I started programming:


  architecture was always:
    
      box - box - cylinder
    
  
  then layers:
    
      ideally:
        
          box - box - box - box - box - box - cylinder
        
      
      but in practice, code gets snarled up between the layers
      every time you make a change, you need to change every layer
    
  
no silver bullet


  in the assembler days:
    
      cryptic code and brittle code were both slain by compilers
    
  
  but now we have multiple orthogonal problems
    
      latency
      deployment
      search
      security
      caching
      monitoring
    
  
  which don’t get solved in a single sweeping way, like compilers
    solved the problems with assembly language

soa & polyglot persistence


  architecture is now multiple boxes, multiple cylinders, directed
    acyclic graph
  most of the code is glue
    
      because when you break things apart, you spend more effort
        gluing them back together
    
  
moore’s law


  Herb Sutter (2004): the free lunch is over

the happy old days vs now:


  lang stdlib
    
      now: also python and ruby and all of github
    
  
  unix utils
    
      brew install it all!
    
  
  database
    
      MOAR DATABASES!
    
  
  algorithms
    
      read every thesis!
    
  
mistake: our goal is not to write code


  “IT” makes more sense than “Software Engineering”
  goal is in fact: minimize lead time to business impact
    
      lead time: starts when problem exists
      business impact: not “done done done”
    
  
  @jessitron’s geeky akka simulation of a dev team, comparing:
    
      a team where they’re trying to write as quickly as possible,
        not much slack time for tests/monitoring
      a team where they don’t build as fast as possible, and have
        slack to do
      she graphed two metrics for each team:
        
          number of implemented features
          number of working features
        
      
estimation


  “we’re the only industry on the planet that doesn’t give a
    range when giving an estimate”
  compare: “22 days” with “half a month to 2 months”
    
      deliberately vague units
    
  
  starts a conversation
    
      delivery manager: “you’re crazy! how can I plan with that
        estimate”?
      I say because:
        
          domain
          tech
          security
          perf
        
      
      I want the DM to be just as nervous as me about this stuff
      so that they ask questions about how we can reduce
        uncertainty
    
  
research


  research is a first-class task
  “leverage” is the right word for using technology
    
      because “leverage” increases risk
    
  
  research is an ongoing activity
    
      research is not the same thing as a spike
      if all you ever do is a spike, you optimize your tech choice
        for time-to-hello-world
    
  
  your architecture diagram should reflect your uncertainty and
    decisions you haven’t yet made
  box - ? - cylinder

kaizen


  inner loop: continuous learning/improvement
  outer loop: questioning the process itself
  we’re spending less time coding, and more time learning about
    other people’s code
    
      this is what you need to do to exploit reuse
    
  
  IBM invented this in the 1950s
    
      every management grade had an equivalent tech grade
      VP <-> research fellow
      fixed cost, unlimited opportunity
      Benoit Mandelbrot
      Bill Joy
    
  
things other than programming that we do


  evaluation
  monitoring
  deploy
  tests
  program
  databases
  filesystem
  network
  so… you’ve got to be good at all of them
    
      we tried specialising, but: if your job is to deliver to QA,
        all you’ll do is pass QA tests
      you need the whole feedback loop to do your job
      we can’t care about everything, that just leads to hopelessness
        and despair
      we can’t care about nothing
      we kind of just pick a couple and hope for the best
        
          and hope that enough people randomly choose a particular
            issue and hope that something gets done
        
      
Don Syme, the F♯ approach to relaxation


  this is a new talk!

starting position


  the great disputes of compsci should be struggled with
  they are chances to make a better, simpler, more relaxed world as
    much as create opposing camps
  although want to avoid:
    
      “mustard watches: an integrated approach to time and food”
        
          spoof paper lampooning arbitrary smushing of concepts
            together
        
      
then (2003)


  functional languages were
    
      isolated
      noninteroperable
      used their own vms
    
  
  interop standards were a mess:
    
      C-calls
      COM
      CORBA
      XML
    
  
things changed:


  F♯ approach (also scala, swift, clojure):

embrace industry-standard runtime layers, influence them.
  compromise where needed.
relaxation achieved 😎


  todays FP langs are immensely interoperable while staying true to
    FP principles
  F♯ type providers raise interop to a completely new level
  (some tensions remain: some techniquees not easily implemented on
    industry standard VMs)

tl;dr: where we are now


  F♯ is open, cross-platform, independent
  accepting contribs
  there is now an F♯ software foundation
    
      becoming a legal entity as a US nonprofit right now
      http://fsharp.org
    
  
  xamarin provide F♯ tooling for android and iOS
  usual packages for linux
  visual F♯ tools for windows
  F♯ compiler service powers many other tools
  F♯ community is very self-empowered

the puzzle


  pieces:
    
      lang
      openness
      cross-platform
      community
      ide tools
        
          xamarinstudio, emacs, vim, …
        
      
      ecosystem
        
          embrace an open ecosystem of libs, tools and type providers
        
      
F♯ compiler


  flow of contributions:
  github.com/fsharp/fsharp
    
      visualfsharp.codeplex.com (public git repo)
        
          this is what gets packaged into the microsoft visual F♯ tools
        
      
      nuget packages
        
          into emacs/xamarin tools
        
      
language design process


  next release: 4.0
  fsharp uservoice

F♯ Compiler Service


  projects which use it:
    
      emacs plugin
      vim plugin
      Visual f♯ power tools
      IFSharp
      FsReveal – reveal.js-based literate programming
        
          the compiler service is run over the fragments of code to get
            IntelliSense and suchlike
        
      
relaxation achieved:


  enterprise quality + openness + community + tooling = :D

then (2003)


  in 1990s FP langs were anti-OO
  even today any mention of objects in Haskell circles gets an
    “urggh” reaction
  Compiler Design with Standard ML - Chapter 9, Object Oriented
    Programming:
    
      object: to express a dislike or distatse for something
    
  
now: object + functional


  embrace objects, make them fit
    
      particularly for interop
    
  
  but don’t embrade full object orientation
  features to allow merging paradigms:
    
      object expressions to allow expression-oriented programming
    
  
  some functional features clash with object features:
    
      particularly currying vs method overloading
    
  
anecdata comparison between C♯ and F♯


  interesting comparison:
    
      in C♯, nothing stops you creating circular dependencies, and
        you have to make special efforts to avoid it
      in F♯, it’s really hard to do this.
    
  
  in theory and in practice, unmoderated intra-component cycles are
    a disaster
    
      unrestricted circularity is like crack: once people get used to
        it, it’s hard to wean them off
    
  
  language mechanisms that enforce acyclicity are good
  pattern matching is a lovely feature
    
      but without a way to name, abstract, and parameterize
        patterns, it’s a real problem in practice
    
  
enter F♯ Active Patterns


  similar to Scala extractors?

let (|``Even Number``|``Odd Number``|) n =
  if n % 2 = 0 then ``Even Number`` else ``Odd Number``

let (|Const|_|) (e:FSharpExpr) = ...

  relaxation achieved! pattern matching <=> abstraction

code <=> data: Type Providers


  operate in a world between static and dynamic languages

sync <=> async

numbers <=> numbers-with-units

GPU <=> CPU


  fsharp.org/use/gpu
  F♯ quotations feature helps here

what’s ahead?


  most major areas of tension have been resolved
  still some areas that bug me:

repl <=> distribution and scale


  https://github.com/nessos/Vagrant
  https://github.com/nessos/mbrace

conclusion


  false opposites are worth struggling over
  simplicity and unity are key

NoSQL is dead


  Eric Redmond

naming things is hard


  and NoSQL is a rubbish name
  it’s possible to name things that don’t exist like fairies

my history


  working in dbs since 1999 (as a researcher)
  seven databases in seven weeks
  a little riak book
  now at basho

distributed systems


  you should no more make your own distributed system than you
    should make your own crypto library

the thesis:


  the whole talk right here:

Models which aren’t useful should die.  NoSQL is not a useful model,
  so it should die

  NoSQL is ill-defined
  NoSQL is a bad classifier
  What this means for you

We get people who say “I’m stuck on Mongo or Riak” – this is like
  saying “I’m stuck between hat or shoes”
NoSQL is ill-defined


  wiki: NoSQL (often interpreted as Not Only SQL)
    
      but if you’re running a single postgres node, are you running
        NoSQL?
      Not At All SQL? NAASQL (pron. Nâsgul)
    
  
  people desire NoSQL because of some (perceived or actual)
    shortcoming in SQL

perceived deficiencies


  no horizontal scaling
    
      “newsql” – VoltDB
      OceanBase
    
  
  no semi-structured data
    
      postgres: JSONB
      SQL Server 2008 supports semi-structured datatypes
      mongolike
        
          mongo frontend to postgres
        
      
  slower dev
    
      often considered a corollary to no semi-structured data
      ORMs
      just look at RoR!
    
  
  can’t use modern tools (like MR/Hadoop)
    
      MySQL & Postgres both have hadoop connectors
    
  
actual deficiencies


  terrible in representing complex graphs
  high availability
  require indexes for speed
    
      even if all you have is an id
      a key-value store doesn’t have an index for keys, that’s just
        what you look up by
    
  
  distributed SQL is often a subset, or has caveats
    
      “we’re SQL98 compatible but you can’t do nested JOINs”
    
  
  SQL isn’t always required or helpful
    
      sometimes all you need is something like S3
    
  
NoSQL is a bad classifier


  not just ill-defined, actually dangerous

DB classifications


  by models (Graph, doc, col, kv)
  by network topology (mesh, partial-mesh, tree)
  by natural distribution (rdb/graph, doc/col/kv)

models

Graph


  http://en.wikipedia.org/wiki/Graph_database
  easiest thing to model
    
      boxes and lines
    
  
  easy to query (cypher)

START x=node(0)
MATCH x
RETURN x.name

  stores
    
      neo4j (high perf, ACID)
      hypergraphDB (directed hypergraph)
      titan (distributed)
      ArangoDB (flexible modelling)
        
          can also act as KV or doc
          (wait, where does this live in our model again?)
        
      
      SparkleDB (RDF, SPARQL)
      InfinityDB (distributable, embeddable)
        
          parts of your graph in a mobile phone, parts in a cloud
        
      
Key/Value stores


  examples
    
      riak (critical data, simple operations)
        
          high availability & operations
        
      
      aerospike (specialized for SSD+DRAM)
        
          distributed
        
      
      redis (speed, fancy types, messaging)
        
          horribly broken clustering, so only use single-node
            instances
        
      
      leveldb (embeddable)
    
  
  message: despite sharing a data model, the use cases here are
    massively diverse

column store


  row keys, column families
    
      rows are an ephemeral concept
      KKV? (row key, column family key)
    
  
  eg
    
      BigTable
      HBase
        
          ordered, sparse (BigTable based)
        
      
      Cassandra
        
          CQL
            
              thought experiment: if CQL expanded to full SQL, is it no
                longer NoSQL?
            
          
      Accumuluo
      Hypertable
    
  
document datastore


  often the first thing people think of when they hear “NoSQL”
  queries via field of document
  normally JSON, but could be XML or whatever
  eg
    
      CouchDB
        
          embeddable, replication
        
      
      Couchbase
        
          distributed, failover
        
      
      MongoDB
        
          easy to program
        
      
      RethinkDB
        
          created by former mongo users
          distributed joins
          atomic updates
        
      
this is too limited


  riak + search is a document datastore
  cassandra can do K/V
  ArangoDB (document)
  Postgres (column -> hadoop)
  ElasticSearch (inverted index?)
  (graphs are hard to replicate, because walking paths in a graph
    is difficult)
    
      riak used to have link-walking, they took it out because it
        was too much overhead
    
  
model types: revised


  K/V (with or without indexing)
  Graph
  Other
  is this enough?
    
      relational DBs can still do K/V and “other” reasonably well
    
  
Classify by topology?


  what’s entailed by distribution?
    
      what are you trying to achieve?
      sharding vs replication
    
  
  what does it mean to have a network partition?
    
      the CAP theorem
        
          quick explanation:
        
      
say you’re in a bar, and three people order a beer.  one goes to the
  toilet, the other two switch to whisky.  If you ask the separated one
  what the last drink the group had, they can either answer: “I don’t
  know” or “beer”.

  http://aphyr.com/posts/313-strong-consistency-models
    
      there are lots of types of consistency
        
          strong serializable, linearizable, serializable,
            read-your-writes…
        
      
      not all are banned by the CAP theorem in AP systems
      you can have odd things like monotonic consistency
    
  
  harvest/yield tradeoff
    
      you can’t guarantee 100% harvest and 100% yield
      harvest is proportion of request that completes
        
          50% of result set responds
        
      
      yield is probability that request responds at all
    
  
  FLP Impossibility Proof
    
      safety or liveness but not both
        
          safety sounds like consistency - is it?
        
      
      paxos is safe but not necessarily live
      two-phase locking makes neither guarantee
    
  
  there’s loads of these issues
  you can’t make a decision here based on “I need a NoSQL database”
  synchronicity is not the only problem (think lightcones)
    
      intention is the problem
      what does a conflict mean?  what does conflict resolution mean?
        
          first write wins?
          last write wins?
          CRDTs?
        
      
      this depends on the domain
    
  
Single Node


  neo4j
  leveldb
  hbase
  couchdb

Mesh


  every node speaks to every other node
  no SPF
  easy to operate
  can limit cluster size
    
      chattiness scales quadratically with cluster size
    
  
  eg
    
      riak
      HBase (if configured so)
      Couchbase
      BigCouch
      all different data models!
    
  
Partial Mesh


  don’t let every node talk to every other node
  can introduce complexity to operations
  eg
    
      riak + MDC (multi-dc replication)
      HBase (if configured so)
      Cassandra
        
          DC-aware
        
      
Tree/Star


  consistency is easy to guarantee
  but: single point of failure
  leader election
  these systems often assume you can only get one clean partition
    at a time
    
      three-way partitions can prevent leader election
        
          lose availability, fail to gain consistency
        
      
  eg
    
      mongo
      HBase (if configured so)
      Postgres/Mysql (more star than tree)
    
  
classify by natural distribution


  hard to distribute
    
      graphs
      relational joins
    
  
  easy to distribute
    
      key/values
    
  
hard, but not impossible


  titan (distributed graph)
  infinitydb (distributed graph)
  voltDB (distributed SQL)
  postgres cluster (distributed SQL)

easy, but not always there


  redis (KVs are easy to distribute, but Redis Cluster sucks)
    
      don’t roll your own distributed system
    
  
  mongo (document can distributed, but mongo tends to be tough to
    scale/admin)
    
      every time you want to double your capacity, you add three nodes
      3 -> 9 -> 12
    
  
what else?


  time series, FTS, ranges
  defined schema/schemaless/opaque binary
  large object storage
  HA/SC, Harvest/Yield
    
      Riak can choose HA or Strong Consistency
        
          both in the same DB! (although not on same dataset)
        
      
  Message patterns (req/rep, pub/sub)
  stream processing, datastream mining
  dev friendliness

what does this mean for you?

future


  polyglot DBs with data oriented middleware
    
      (this isn’t a thing, but it should be)
    
  
Categories for the Working Programmer


  Jeremy Gibbons, University of Oxford
  goal: show how category theory inspires better code
  but you don’t really need the category theory: it all makes sense
    as functional programs (Haskell) too.

functions that consume lists

Two equations, indirectly defining sum:
sum :: [ Integer ] -> Integer
sum []     = 0
sum (x:xs) = x + sum xs
Programs are equations, and are built out of equations.
This pattern works for other fns too. For any given f and e, these
  equations uniquely define h:
h []     = e
h (x:xs) = f x (h xs)
The unique solution is called foldr f e in Haskell:
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f e []     = e
foldr f e (x:xs) = f x (foldr f e xs)
This shows that patterns can be expressed as pieces of code.
applications of foldr

sum     = foldr (+) 0
and     = foldr (and) True
decimal = foldr (\d x -> (fromInteger d + x) / 10) 0
id      = foldr (:) []
-- etc etc
What’s special about lists?

…only the special syntax.  We could have our own list:
data List a = Nil | Cons a (List a)
Then we could have
foldList :: (a -> b -> b) -> b -> List a -> b
foldList f e Nil         = e
foldList f e (Cons x xs) = f x (foldList f e xs)
This works for other data types
data Tree a = Tip a | Bin (Tree a) (Tree a)
Then we could have
foldTree :: (a -> b) -> (b-> b->b) -> Tree a -> b
foldTree f g (Tip x)     = f x
foldTree f g (Bin xs ys) = g (foldTree f g xs) (foldTree f g ys)
It’s not always so obvious

Rose trees (eg for games or XML)
data Rose a = Node a [Rose a]
(aside: name Rose comes from rhododendron (greek for rose tree))
What’s the right notion of fold here?
foldRose1 :: (a -> c -> b) -> (b -> c -> c) -> c -> Rose a -> b
foldRose1 f g e (Node x ts) = f x (foldr g e (map (foldRose1 f g e) ts))

-- but suppose we might want to process list nodes in a different order?

foldRose2 :: (a -> b -> b) -> ([b] -> b) -> Rose a -> b
foldRose2 f g (Node x ts) = f x (g (map (foldRose2 f g) ts))

-- even more general:

foldRose3 :: (a -> [b] -> b) -> Rose a -> b
foldRose3 f (Node x ts) = f x (map (foldRose3 f) ts)
Which one should we use?  Category theory can guide this decision.
Haskell gets folds for non-empty lists “wrong”:
foldr1, foldl1 :: (a -> a -> a) -> [a] -> a
this won’t let you define the identity function on nonempty lists
Preparing for genericity

Separate out list-specific shape from type recursion:
-- two-level type definition of list
-- ie ListS is nonrecursive, and Fix doesn't know about lists
data ListS a b = NilS | ConsS a b
data Fix s a = In (s a (Fix s a))
type List a = Fix ListS a
eg, [1,2,3] is represented thus
In (ConsS 1 (In (ConsS 2 (In (ConsS 3 (In NilS))))))
convenience fn:
out :: Fix s a -> s a (Fix s a)
out (In x) = x
This is a lot of boilerplate that you wouldn’t want to actaully write
but it’s a general way of constructing data types to help understand
  them
Shape is mostly opaque, just need to ‘locate’ the as and bs
bimap :: (a -> a') -> (b -> b') -> ListS a b -> ListS a' b'
bimap f g NilS        = NilS
bimap f g (ConsS a b) = ConsS (f a) (g b)

-- closely related to 'fmap' from Functor
Now we can define a more cleanly separated version of foldr on List:
foldList :: (ListS a b -> b) -> List a -> b
foldList f = f . bimap id (foldList f) . out

-- eg foldList add :: List Integer -> Integer, where

add :: ListS Integer Integer -> Integer
add NilS        = 0
add (ConsS m n) = m + n
The categorical view, in a nutshell

Think of a bifunctor S.  It is also a functor in each argument
  separately.
An algebra for functor S A is a pair (B,f) where f :: S A B ->
  B
A homomorphism between (B,f) and (C,g) is a function h :: B ->
  C such that
h . f = g . bimap id h

-- (note bimap id == fmap, so)
h . f = g . fmap h
An algebra (B,f) is initial if there is a unique homomorphism from
  it to each (C,g).
eg (List Integer, In) and (Integer, add) are both algebras for
  ListS Integer:
In  :: ListS Integer (List Integer) -> List Integer
add :: ListS Integer Integer -> Integer
and sum :: List Integer -> Integer is a homomorphism from (List
  Integer, In) to (Integer, add).  In fact (List Integer, In) is
  the initial algebra.
Theorem: for all sensible shape functors S, initial algebras exist.
This means that: whatever shape you pick for your datatype, you can
  define a corresponding notion of folds, which will be in some sense
  “most general” for that datatype.
Duality

Recall
fold :: Bifunctor s => (s a b -> b) -> Fix s a -> b
fold f = f . bimap id (fold f) . out
reverse certain arrows:
unfold :: Bifunctor s => (b -> s a b) -> b -> Fix s a
unfold f = In . bimap id (unfold f) . f
and I have a function which will generate a data structure for me.  Eg
  iterate, repeat,…
If you pick the right abstraction, the relationship between fold and
  unfold is obvious.  If you pick the wrong one it doesn’t make it
  clear:
unfoldr :: (b -> Maybe (a,b)) -> b -> [a]
unfoldr f b = case f b of
    Nothing -> []
    Just (a,b') -> a : unfoldr f b'
Conclusion

Category theory is a good organisation tool for library designers
Helps you write better code, with less mess
The maths is really quite pretty
but the Haskell makes sense on its own too.
http://patternsinfp.wordpress.com
  http://www.cs.ox.ac.uk/jeremy.gibbons/
TDD, as in Type-Directed Development


  Clément Delagargue
  clementd

thinking with types


  from a map, I can get a value…
    
      maybe
    
  
  from a string, I can parse a string…
    
      maybe
    
  
  want to abstract away this “maybe” concept
  Option[+A] type
  Option[String] + String => Option[Int] = Option[Int]
  and I can end up with code which is correct by construction
    
      our code obviously has no errors
        
          (cf Tony Hoare)
        
      
  tests can only demonstrate the presence of errors, not the
    absence
  types can give mathematical certainty of properties
  Types ⇔ Properties
  provably > probably
  but if I can’t prove interesting properties, it’s not
    useful. so we need:

expressive type systems

parametricity


  single most important feature: parametricity or parametric
    polymorphism
    
      aka generics
      I don’t see it as a code reuse tool
      it gives you lots of expressivity, by capturing the notion of
        “I don’t know or care what type a is”.  This gives useful
        guarantees. eg:
    
  
def f[A](x: A): A
There is only one possible implementation of this function: the
  identity function.  I don’t know what type A is, so I have no way of
  creating a value of that type other than by using the only instance I
  have, x.
Similarly:
def compose[A,B,C](
  g: (B => C),
  f: (A => B)
): (A => C)
again there is only one possible implementation of this type.
This is more interesting:
def rev[A](x: List[A]): List[A]
There are multiple possible fns here (eg identity, reverse, sort all
  fit the signature), but I can still guarantee some things from the
  type alone:

  Any value in the list rev(xs) must also be in xs, because rev
    has no other way to create an instance of A.

These are the famous theorems for free!
restrictions

There are ways to break parametricity using common language features:
no nulls

null is a value which inhabits every type.  This breaks
  parametricity guarantees: eg with nulls I can define:
def f[A](x: A) = null
no case matching on class

def f[A](x: A) = match x {
  case (x : String) = "foo"
  case (x : Int) = 3
  case _  = x
}
no exceptions

def f[A](x: A) = throw RuntimeException
no side effects

def f[A](x: A) = {
  launchBallisticMissiles();
  x
}
types make communication easy


  you should learn Haskell type syntax:

a -> a  -- function from a to a (lowercase indicates a is *any* type)

Int -> Int -- function from concrete type to concrete type

a -> b -> a   -- equivalent to:
a -> (b -> a) -- think of it as a two-arg function

-- typeclasses:
(Ord a) => [a] -> [a]
-- here Ord is a typeclass.  'a' now means "any type which has a
-- natural order"
-- we still get parametricity, but we can restrict it so it's not over all types

-- fn to remove duplicates:

Eq a => [a] -> [a]
Hoogle

Can use hoogle to search on type: hoogle “Eq a => {a} -> {a}”
Hoogle is smart enough to be more general: hoogle “{Maybe a} -> Maybe
  {a}” which can find:
sequence :: Monad m => [m a] -> m [a]
this fits because Maybe is a Monad, so the sequence function can
  be called on it.
parametricity cont’d

Recall rev:
def rev[A](x: List[A]): List[A]
The only extra thing we need beyond the types is this statment:
rev(xs ++ ys) == rev(ys) ++ rev(xs)
This kind of argument leads to property-based reasoning, and
  property-based testing.
Properties are perfect for ensuring you aren’t caught by edge cases.
Properties are often referred to as laws, which must hold but aren’t
  enforrced by the type system.  So your reasoning becomes:
Types + Operations + Laws
which is another way of talking about algebras.
summary


  types are
    
      high-level reasoning tool
      communication tool
    
  
qs


  what about dependent types?
    
      I worry that your types become as difficult as code to work
        with
    
  
Implementing languages on the BEAM


  Robert Virding

what IS the BEAM?


  a VM to run Erlang
    
      not designed to run other languages
        
          like the JVM in that respect
        
      
  properties of the Erlang system:
    
      lightweight, massive concurrency
      async comms
      process isolation
      error handling
      continuous evolution of the system
      soft realtime
      immutable data
        
          baked into the system
        
      
      predefined set of data types
        
          tuples, lists, maps
          that’s it
        
      
      pattern matching
        
          primitives built in to the machine for doing this
        
      
      functional language
      modules
      no global data
    
  
erlang compiler


  can work on files and erlang abstract code
  can generate .beam files or binaries
  has Core, a nice intermediate language
    
      can be input to the compiler
      simple and regular
      easier to compile to
    
  
internal languages


  erlang -> core erlang -> kernel erlang -> beam assembler
  optimization passes between
  you can get anywhere in the compiler
  you can even write the assembler directly (though not
    recommended)

core erlang


  simple functional language
  “normal” lexical scoping
  just the basics
    
      no records
      no list comprehensions
    
  
  supports pattern matching
  most optimizations done on core
  dialyzer speaks core
    
      sort of 🙍
    
  
example


  argh wall of text
  key point: Core Erlang is the nicest thing to target when
    implementing BEAM languages

support tools


  leex - lexer generator
  yecc - parser generator
  syntax tools - for building erlang code
  XML parsers (xmlerl)
  compiler

leex


  based on lex/flex but simpler
  uses regexes to define tokens
  generates scanning functions
    
      direct use
        
          string/2
        
      
      for the i/o system
        
          token/2, tokens/2
        
      
Definitions.

U = [A-Z]
L = [a-z]
D = [0-9]

Rules.

{L}({U}|{L}|{D}|_)* :
        {token,{atom,TokenLine,list_to_atom(TokenChars)}}.
({U}|_)({U}|{L}|{D}|_)* :
        {token,{var,TokenLine,list_to_atom(TokenChars)}}.
%[^\n]* : skip_token.

Usage:
{ok,Tokens,EndLine} = my_scan:string(Chars, StartLine)

{ok,Tokens,EndLine} = io:request(Io, {get_until,'',my_scan,tokens,[StartLine]})
yecc


  basically yacc for erlang
  LALR(1) parser generator
  generates parse/1

Nontermanals E T F.
Terminals '+' '-' '*' '/' '(' ')' number.
Rootsymbol E.

E -> E '+' T : {'+','$1','$3'}.
E -> E '-' T : {'-','$1','$3'}.
E -> T : '$1'.
T -> T '*' F : {'*','$1','$3'}.
T -> T '/' F : {'/','$1','$3'}.
T -> F : '$1'.
F -> '(' E ')' : '$2'.
F -> number : '$1'.

conflicts


  shift-reduce errors
    
      generally prefers shift to reduce
    
  
  reduce-reduce errors
    
      would like to write:
    
  
expr -> pattern '=' expr ...

but ‘pattern’ and ‘expr’ aren’t distinguishable using 1 token of
  lookahead as LALR(1), so must write:
expr -> expr '=' expr ...

and extract pattern later.
new languages


  Leonard Cohen: New Skin for the Old Ceremony
  you can access all existing erlang code, OTP, etc
  get a language which is as optimal as it can be for the VM

examples


  elixir:
    
      generates erlang
    
  
  lfe:
    
      generates core erlang
    
  
Lisp-flavoured Erlang


  goals:
    
      provide lots of lisp goodies
        
          homoiconicity and macros
        
      
      seamlessly interact with vanilla erlang
    
  
  modify the language to fit with erlang/otp
    
      not common lisp or scheme
      no mutable data
      only erlang data types
      macros are only compile-time
      Lisp-2 (not Lisp-1)
        
          ie separate namespaces for functions and values
        
      
      is as fast as vanilla erlang
    
  
core


  LFE core primites are core erlang
  so it’s almost a one-to-one mapping from erlang to core
  exceptions:
    
      lisp ‘if’ goes to core ‘case’
    
  
elixir


  dynamic functional language
  influenced by ruby
    
      but not ruby on BEAM
    
  
  metaprogramming capabilities via macros
    
      though cannot change syntax
    
  
  many libraries and interfaces standardised and rewritten
  compiles to erlang (rather than core erlang)

“new” datatypes


  elixir has structs
  we have to fake it
    
      use existing map datatype
      special format: key ‘__struct__’ with value of struct name
      special syntax and libs to create and recognise the new
        ‘type’
    
  
New languages


  not just basic erlang
    
      different semantics
      non-erlang datatypes
      non-erlang handling of data
    
  
  erlog (prolog)
  luerl (lua)

erlog


  can use leex
  grammar needs backtracking
    
      cannot use yecc or any other parser generator
      hand written parser
    
  
  data:
    
      good mapping
      structures ⇔ tuples
    
  
  logical variables and unification
    
      they are both “global” and “mutable”
      can be unified
      changes seen everywhere
      we are cunning and use {Index}
    
  
  backtracking
    
      erlang doesn’t have it
      use continuation-passing style
      efficient and works
      executing is not “interruptible”
        
          control is in compiled code
        
      
      if we want to have better control we need an interpreter
    
  
luerl


  implements standard lua 5.2
  lua is
    
      simple, rather neat imperative lang
      dynamic lang
      lexically scoped
      mutable variables/environments
      embeddable
    
  
  leex + yecc good
    
      one reduce-reduce conflict, easily resolved
    
  
  datatypes
    
      mutable key-value tables
      global
    
  
  need to explicitly manage global data
    
      global table store
        
          orddict/array/ETC tables
        
      
      environment store
      environments
      tables
    
  
Qs


  don’t you want to make a concurrent lua?
    
      that’s a good question but will take time to answer
    
  
  wouldn’t it be nice to make the erlang GC available to other
    languages
    
      yes, but it’d make it harder for erlang itself
    
  
Programming and testing a distributed database


  Reid Draper, @reiddraper
  distributed databases have a fundamentally different architecture
    from single-node counterparts
  need to look at:
    
      Communication and messaging
      correctness testing
      benchmarking
      abstraction
    
  
communication and messaging


  single-node dbs don’t need to worry here!
  distributed systems need to consider all sorts of things,
    starting with topology
  “8 fallacies of distributed computing”
    
      the network is reliable
      latency is zero
      bandwidth is infinite
      the network is secure
      topology doesn’t change
      there is one administrator
      transport cost is zero
        
          both monetarily and computationally
        
      
      the network is homogeneous
    
  
RPC


  local and remote procedures are indistinguishable
  procedures are familiar
  let’s lift distributed programming to the same abstraction
  RPC claims transparency
  RFC 674
  unfortunately.
    
      this abstraction fails
      handling failure
        
          what is the remote node never responds?
          what if the TCP conn is closed?
          did it fail halfway through, or right before it returned to
            us?
            
              is it safe to retry?
            
          
          procedure call abstraction tells us nothing here
        
      
      why don’t we get a stacktrace?
        
          we just never hear back
          wasn’t this supposed to be transparent?
        
      
      what if our RPC call makes another RPC call?
      how do you serialize a file descriptor or a db connection?
    
  
  a lot of these problems were known about early on:
    
      RFC 684 critiques procedure calls as a paradigm for
        distributed computing
    
  
message passing


  we send a message to a remote host as a basic operation
  no expectations at this level
    
      higher-level protocols built on top
      when I send a message, I expect a reply
    
  
  erlang’s actor model embodies this

Bloom (UC Berkeley)


  http://boom.cs.berkeley.edu/

bawden’s ideas


  implementing distributed systems using linear naming, 1993, Bawden

correctness testing

unit testing


  Typical unit testing: linear effort for the amount of
    coverage/confidence we need in our code.
  if we want 50 tests, we need to write 5
  are we going to think of edge cases like adding to MAX_INT?
  nondeterminism kills unit testing too

property-based testing


  QuickCheck
    
      haskell, erlang, scala, clojure
    
  
PULSE


  builds on property-based testing
  distributed system is inherently nondeterministic at runtime
  want to avoid “works on my machine”
  want to find race conditions
  pseudorandomly generate schedules between different distributed
    nodes
    
      completely controls interleaving of operations on separate
        nodes
    
  
simulation testing


  simulant
    
      records actions of test into database while test is running
    
  
benchmarking

example


  imagine we have a cron job running on a single node to take a
    backup
    
      make tarball, send to S3
      uses a lot of network bandwidth
      what effect does this have on the system?
    
  
  need to capture this sort of thing in your simulation

abstraction


  programming languages

erlang


  language-level api for distributed communication

haskell


  libraries for dealing with streaming data
    
      powerful list processing with map/filter/fold
      can push that into stream processing as well
      conduit library treats a TCP conn as a stream:
    
  
tcpConn
$$ map parseInt
$$ CL.fold (+) 0
durability


  single node is simple
  can make a distributed database which is more durable than any
    single node in the system is

availability


  in a three node system:
    
      is it okay to return to user before all nodes have received a
        write?
      one may fail, but that’s okay
        
          we may have to repair later
        
      
Chad Fowler, Tiny

TDD with kent beck


  TDD is about pushing complexity away
  solve the easy part first, the bit you understand
  push the hard part away
  when you get to the hard bit, solve the easy bit of that first
  etc

Keep things small


  dose of dissonance
    
      massive, elite team
      huge, efficient codebase
      etc
    
  
Limits of memory


  George Miller - The Magical Number Seven, Plus or Minus Two: Some
    Limits on our Capacity for Processing Information
  long-term memory is (effectively) limitless
  short-term memory is limited
    
      analogy with registers in processor
      “chunking” helps hold things together
    
  
  Nelson Cowan
    
      showed that chunking was responsible for getting up to 7 ± 2
      in fact limit was more like 3 or 4
    
  
  some people can do more, but stress brings them back down

code


  “this is the easy one that everyone understands”
  I used to wield the Refactoring book, and talk about design
    patterns, etc
  anecdote: the 10,000-line Java Bean
    
      I felt I had failed
      new rule: every method can only be 5 lines long
        
          every class can only have 5 methods
          it didn’t work, but it seemed like a good idea
        
      
  small:
    
      functions
        
          in haskell, it doesn’t matter if your parameter names are
            called x, f, because the fn is so small you can see
            what’s going on
        
      
      parameter lists
      services
      actors
      applications
      data infratructure
        
          “wouldn’t it be easier if every user had their own database?”
          we used this idea to broke up our monolithic database into a
            bunch of small ones
        
      
      commits
        
          large commits are:
            
              hard to understand
              hard to review
              hard to revert
            
          
  Test everything that could possibly break
    – Kent Beck


  “that’s great, but my code is inherently big”
    
      no it isn’t
      this is the easy part
      you know how to do this
    
  
Teams


  usually people say that big companies suck
    
      cubicle-dilbert-office space environments
    
  
  what works: take big teams and split them into smaller,
    independent teams
  “Zero to One” by Peter Thiel
  Dunbar’s Number
  Gore-Tex practice of deliberately capping the size of an office
    to 150 people
  Military notion of fireteam – small (4) autonomous team
  Putnam: team size can be key to a successful software project
  Jeff Sutherland: Keep Team Size Under 7!
  larger teams need leadership
    
      leadership is hard
      we don’t know how to do it well
    
  
  larger teams have a communication overhead
  single responsibility principle

Projects


  2005/6 – worst burnout of my career
  the big rewrite
  they had a massive php codebase
  we talked about extracting pieces
  doing the strangler pattern
  but we ended up doing a year long rewrite instead
  we shipped it, and the customers hated it
    
      they sent hate mail
    
  
  I should have stopped it
  it self destructed
  it took me years to recover from that project
    
      maybe I never did?
    
  
The problems with big projects:

  too much change at once
    
      small change is easier to introduce successfully
    
  
  big risk
  no real reward until you get to the end
    
      you can’t learn anything as you go
    
  
  http://ww2.cfo.com/risk-management/2014/06/why-big-projects-go-bad/
  Standish CHAOS report
    
      small projects much more likely to succeed than large
        projects (76% vs 10%)
      http://www.versionone.com/assets/img/files/ChaosManifesto2013.pdf
        8th page (numbered 4 on the page itself)
    
  
Should we just get better at dealing with larger things instead of
  using a crutch? (maybe. maybe you can be the person that can memorise
  9-11 items instead)
Isn’t this all the same as decomposition, cohesion, etc? (yes.)
This conference is focussing on tools and/or techniques, which will
  have an incremental impact.  But making things small will have a
  dramatic impact.
“Big” is worse than dogma, partly because it’s our default setting
References: https://www.wunderlist.com/list/136225945
Cheats & Liars: The Martial Art of Protocol Design


  Pieter Hintjens
  making protocols vs self-organizing around problems
  statement: protocols are contracts
    
      who enjoys signing contracts? say for rental, loans, etc
      what about if you have no contract? it’s far worse
      contracts are a cost of doing business
      contracts are a tool for reducing conflict
      in software you need this a lot
      groups work together best when they have contracts between them
      a protocol isn’t just a TCP RFC with frames and headers etc
        
          it’s a way of working together
        
      
  when you do this, you have to define the parties/roles
    
      a protocol for pizza isn’t just “there will be pizza”
      you need to identify pizza providers, pizza consumers
        
          types of ingredients, sizes, delivery time, price, temperature
            on arrival
          you don’t describe the name of the chef
        
      
  contracts need to be specific, but also minimalistic
  cheats and liars try to bamboozle with irrelevent extra language
  it’s hard to fake a pizza contract
    
      you can sell low-quality pizza but you’re still technically
        fulfilling the contract
    
  
  then you get a natural market
  consumers can choose between suppliers
  contracts should also protect against the cheats and liars
    
      make it hard to cheat
      detect and punish when it happens
    
  
  in the early days of the internet, there was the “embrace and
    extend” wars
    
      the big players own the standardization
      AMQP standardization was an insane amount of chaos and pain
        
          people pay lip-service to standards but want to carve out
            their own captive markets
        
      
      large-scale standardization is subject to regulatory capture
    
  
  if protocols are useful they will attract money
    
      which will attract predatory behaviour
      stuff the committee with your friends
      push through changes which noone understands and can’t fight
      patent it
      trademark it
        
          eg Bluetooth
        
      
      hire all the people who make the protocol
      copyright it
      all of these things happened to AMQP
        
          we will not bet on these protocols
          the protocol had been killed
          we came up with a protocol process that couldn’t be captured
        
      
  ZeroMQ:
    
      write a spec
      if that spec is viable people might try to capture it
      we publish the spec under GPL
      if at some point the author goes insane and makes changes, you
        just fork it
        
          fork is GPL too
        
      
      legal wrapping is designed to be capture-proof
    
  
  workshop: write an RFC
    
      start very simple: send “hello” on one port, receive “world” on
        another port
      lesson: do not solve problems you don’t actually have
        
          don’t solve hypothetical problems, no matter how sure you are
            you’ll hit them tomorrow
          determined, applied laziness is essential
        
      
      early ZeroMQ had no versioning, framing, etc
      it was crap
      we spent no time arguing about anything
      second version added a version header
      next version added metadata
      then handshake
      then security
      very gradual
      could prove everything
      nothing was there that wasn’t desperately needed
      engineers love to create stuff
        
          when we create stuff, we’re usually wrong
          we made assumptions
          we don’t actually need it
        
      
  layering:
    
      secret to long-term success
      when you write a contract, it’s worth nothing
        
          then you give it to people and ask them to use it
          if it’s valuable to someone, they make their own
            implementation
        
      
      when your business is based on your implementation of my
        contract, then my contract wins
      the way you make changes to a contract is:
        
          make a new contract which is a modification of the old
          make it backwards-compatible
        
      
  there’s no hurry – this takes years
  you get the power to create very large, distributed systems of
    people
    
      “living systems”
      not necessarily for one business
      can work very efficiently
    
  
  “I see software as a tool for destroying large, inefficient
    companies”
    
      protocols are a way to organize people to destroy their business
    
  
Q&A


  protocols arise out of a problem, not out of nothing. say where
    you have a lot of different people doing the same thing in
    different ways.  does that change things compared to your
    description?  your MVP needs to convince people to switch
    
      one of my projects is called edgenet
        
          I want everyone on the planet to use it
          I want to start small
          I want to prove the problem is a valid problem
            
              I will not speculate
            
          
      you need to set clear rules and boundaries
      bad actors don’t like this and will stay away
        
          you can’t afford verbal contracts
          you need to have a documented process, written by someone as
            cynical as me
          once a year I will ban somebody
            
              you don’t just absorb damage from bad actors
              you make it clear that it is not acceptable
            
          
  in a self-organizing decentralized team, how come you get to
    decide who gets banned?
    
      you can’t. but you must own your own specific project
      there are other projects with criminals and cheats, which you
        can’t control
      but they just make you look better
      you should warn your friends
        
          blacklist people and spread the news around
          and it’s not your fault when there’s a conflict
          when it doesn’t work, at a certain point
          you need multiple people, ask around
            
              if other people are having issue with the same person,
                don’t be afraid to take steps to ban them
            
          
          it’s not innocent, it’s predatory maliciousness
          punish the evil
        
      
  what do you think about the upcoming MQTT and all that?
    
      big companies make stuff.
      then they try to push it through
      I’m running this devroom at FOSDEM on the IoT
      the MQTT guy came along and said he wanted to make a devroom on
        MQTT
        
          I said we don’t want to promote particular products
          turned out he was a marketing manager
        
      
      MQTT is one of these very sad, corporate, zombies
        
          it’s a big, nasty, clumsy, stupid piece of shit
          I would never build anything on that
          go back to raw TCP, it’s better than MQTT
        
      
      most software is rubbish, so this isn’t really
      (joeerl: what is MQTT?)
        
          it’s “Message Queuing T something”
          pub-sub protocol pulled up by IBM
          it’s marketed as the protocol of the Internet of Things
          (joeerl: what?!)
        
      
conclusion


  most of this is in the ZeroMQ Guide chapter 6 (free!)

Functional Patterns in Domain-driven design applied to Financial Models


  Debasish Ghosh

what is a domain model?


  wikipedia definition
  Domain Driven Design, book by Eric Evans
  historically - rich domain models
    
      class models the domain abstraction
      contains state and behaviour together
      state hidden within private access specifier
      we define “nouns first”
    
  
  lean domain models
    
      algebraic data type models the domain abstraction
        
          immutable by definition
        
      
      contains only the defining state as immmutable values
      immutability is the basis of the design here
      nothing but the bare essential definitions go in the ADT
      “verbs first”
    
  
domain model elements


  rehash of evans’ book
  entities & value objects
    
      modeled with types
    
  
  behaviours
    
      modeled with functions
    
  
  domain rules
    
      constraints and validations
    
  
  bounded context
  ubiquitous language

why functional?


  ability to reason about your code
  increased modularity - clean separation of state & behaviour
  immutable data structures
  concurrency becomes easier

mapping


  problem domain:
    
      place order
      do trade
      process execution
    
  
  solution domain:
    
      functions
      on types
      constrains
    
  
  algebra:
    
      morphisms
      sets
      laws
      not concerned with implementation (right now)
      compose for larger abstractions
    
  
concrete algebraic structure: monoid


  a set of elements S
  a special element e
  an operation ⊗ such that
    
      A ⊗ e = A = e ⊗ A  (identity)
      (A ⊗ B) ⊗ C = A ⊗ (B ⊗ C)   (associativity)
    
  
pattern #1: algebraic API design


  client places order (in some flexible format)
    
      transform to internal domain model entity and place for
        execution
      when execution comes back, trade & allocate to client accounts
    
  
We create some types and some operations on those types:
def clientOrders: ClientOrderSheet => List[Order]

def execute: Market => Account => Order => List[Execution]

def allocate: List[Account] => Execution => List[Trade]
and define certain laws that the operations must satisfy. This is
  the algebra of the API.
For a single algebra we can have multiple interpreters aka implementations.
We can decouple the algebra from the interpreter.
Let’s mine some patterns:
def clientOrders: ClientOrderSheet => List[Order]

def execute: Market => Account => Order => List[Execution]

def allocate: List[Account] => Execution => List[Trade]
Note that clientOrders returns List[Order] and that execute takes an
  Order.  Also execute returns a List[Execution] and allocate takes
  an Execution.
This has the pattern of Kleisli arrow (in haskell type notation:
  Monad m => a -> m b or in scala: A => Monad[B])
Define clientOrders thus instead:
def clientOrders: Kleisli[List, ClientOrderSheet, Order]

def allocate(accounts: List[Account]): Kleisli[List, Execution, Trade]
def tradeGeneration(
  market: Market,
  broker: Account,
  clientAccounts: List[Account]) = {

  clientOrders andThen
    execute(market, broker) andThen
      allocate(clientAccounts)
}

Implementation follows the specification and thus we get the
  Ubiquitous Language for free!
Is “nouns first” really the right approach to DDD?
Claim: with functions and types as the binding glue we get all these
  patterns for free using generic abstractions
Pattern #2:


  DDD has patterns like Factories & Specification
    
      these are just normal idioms in functional programming
      they don’t even need special names
    
  
Monadic validation pattern:
val s = for {
  _ <- validQuantity
  _ <- validValueDate
  t <- validUnitPRice
} yield t
(aside: not sure the for-notation adds much here)
Smart constructor: factory pattern (chapter 6, DDD)
validation: specification pattern (chapter 9, DDD)
FP and DDD summary


  FP is focussed on finding the generic
  we implement all specific patterns in terms of generic abstractions
    
      based on function composition
    
  
Pattern #3: functional modelling encourages parametricity


case class Trade(
  refNo: String,
  instrument: Instrument,
  ...)

def netValue: Trade => Money = //..

def valueTradesInBaseCcy(ts: List[Trade]) = {
  ts.foldLeft(Money(0,baseCcy)) { (acc,e) =>
    acc +  inBaseCcy(netValue(e))
  }
}

case class Transaction(id: String, value: Money)

def highestValueTxn(txns: List[Transaction]) = {
  ts.foldLeft(Money(0,baseCcy)) { (acc,e) =>
    acc > e.value
  }
}
Instead of using a specific collection type (List), we can use a
  generic Foldable type.  This reduces the possibilities available to
  valueTradesInBaseCcy and highestValueTxn and reduces scope for
  error.  This is parametricity working for us.
def mapReduce[F[_], A, B](as: F[A])(f: A=>B)
  (implicit fd: Foldable[F], m: Monoid[B]) = fd.foldMap(as)(f)

def valueTradesInBaseCcy(ts: List[Trade]) =
  mapReduce(ts)(netValue andThen inBaseCcy)

def highestValueTxn(txns: List[Transaction]) =
  mapReduce(txns)(_.value)
Takeaways

Move the algebra from domain-specific abstractions to more generic
  ones, to make our code more generic, less verbose, more robust
The program space in the mapReduce function is shrunk - so scope of
  error is reduced
Increases the parametricity of our program - mapReduce is now
  parametric in type parameters A and B
Parametricity: theorems for free! (not the first time this has been
  mentioned in these notes)
Q&A


  when you’re learning, it’s not apparent which abstractions are
    available - eg I didn’t know about the kleisli arrow.  how does a
    beginner get to that stage?
    
      there’s no short answer
      use the tools which are out there - eg hoogle, FP in Scala
        book
      typeclassopaedia
      edward kmett :)
    
  
Naming things


  @ianbarber


  “Names are the one and only tool you have to explain what a variable
    does in every place it appears, without having to scatter comments
    everywhere.” – Michael Mahemoff

You don’t always need to use a name:
window.setTimeout(
    function() {
        console.log("hello");
    }, 1000
);
you need names if you want to rename things:
http://mollyrocket.com/casey/stream_0029.html (zomg, fails without js)
SessionProperties.Wnode.ClientContext = 1;

  Here ClientContext is the type of a timestamp. the name is wrong
    and misleading, although it does correspond to the implementation

what is a name?


  “A ‘name’ is a symbol – usually a human-readable-string –
    identifying some resource, or set of resources.
  The string need not be meaningful to all users, and need not be
    drawn from a uniform name space.”
  – Schoch

Names can be mangled:
  __TFCCC4test1a1b1c1dfS2_FTS0_1xS1_1vFT1xSi_Si_OVS_1e1f is a symbol
  name in swift
Namespaces


  names drawn from a namespace
  a name can be valid but not be assigned (or bound)
    
      can also define aliases
    
  
  a filesystem is a series of nested namespaces
  example: haskell module HttpClient in httpclient.hs - actually in
    /usr/local/httpclient/httpclient.hs
  name purposes:
    
      name: what we want
      address: where it is
      route: how we get there
    
  
  as you go down this list, gets less important from a human
    cognition point of view
    
      change HttpClient to Sausage - wtf?
      change httpclient.hs to sausage.hs - confusing, but less of an issue
      change /usr/local/httpclient to /home/philandstuff/sausage - not sure
        it would be an issue at all any more
    
  
  philosophy of naming
    
      Wittgenstein
        
          my favourite buzzfeed listicle
        
      
      idea
        
          names gain meaning through convention
            
              names attach to your understanding of things
            
          
          sense is in propositions
          interesting propositions are generalizable
        
      
  algorithm:
    
      name things after yourself
      if you need more names, add numbers to the end
    
  
  example: George Foreman
    
      the George Foreman Grill
      children:
        
          George Jr
          George II
          George III
          Georgette
        
      
  example: PageRank
    
      decent name
      often when we name after someone, we have no good way to
        otherwise describe it:
        
          Pythagoras’ Theorem
          Dijkstra’s Algorithm
        
      
  another approach: the Ronseal Pattern (it does exactly what it
    says on the tin)
    
      “Decking stain” - it stains decking
      note: not named after ingredients
    
  
  Code Complete Ch 11 – naming
    
      “In general, if a name refers to some aspect of computing
        rather than to the problem
    
  
  hungarian notation
    
      analysis:
        http://blogs.msdn.com/b/oldnewthing/archive/2004/06/22/162629.aspx
      there were two approaches: the systems division and the apps
        division
      lpString1 (for long pointer) vs szDest (string terminated by zero)
      apps division was better – about how was to be used, rather
        than what it was
    
  
  “the point of having proper names is to just refer to objects and
    not to describe them” – Searle

-- most unrestricted
haversine :: (Ord a, Floating a) => a -> a -> a -> a -> a

-- says how to use it, but not what it does
haversine :: Float -> Float -> Float -> Float -> Float

-- says what it does, but it imposes an extra cost -- how do I get a
-- Location value?
haversine :: Location -> Location -> Distance

  “Zooko’s Triangle” – Human Meaningful, Secure, Decentralised -
    pick two
    
      (not quite a CAP-theorem guarantee, but a broad property)
    
  
general rules


  is a name necessary?
    
      if it doesn’t need one, don’t give it one
    
  
  has it already been named?
    
      don’t invent new names for things
        
          (grr, hash vs map, map vs collect, fold vs reduce vs inject)
        
      
  what is the context & category?
  how should it be used?
  how will people remember it?
    
      don’t make it:
        
          too long
          too similar to something else
          unpronouncable
        
      
Ideas:

  currentLoggedInUser vs me
    
      a shorter name is expected to be used more
      a longer name indicates you expect it to be used less
        
          but also that the distinction you make with the extra length
            is a distinction worth making
        
      
  if someone doesn’t have the name for our thing, they can’t use it
    
      a lot of our success is about the success for people to active
        use the things we create
    
  
  links
    
      http://objectmentor.com/resources/articles/naming.htm
      http://c2.com/cgi/wiki?MeaningfulName
      http://plato.stanford.edu/entries/names/
    
  
questions


  “a lot of naming comes down to programmer readability, and
    similarly to English.  How do you go about naming for non-English
    audiences?”
    
      comes down to context of audience again
      have to use words that the audience are likely to use
      if you know that your audience are haskell programmers, then
        they will already know the words in Prelude
        
          not necessarily well enough to use in English, but well
            enough to use in Haskell
        
      
      when you first define a name, the comments are a good place to
        expand
      if you’re sitting with someone who is explaining something to
        you, the words they use to explain it are sometimes better
        names than the name itself
      often it doesn’t matter if you get it wrong first time
      but in a widely-used library, it’s hard to change
      the harder it is to change, the more effort you should put in
        to getting it right
    
  
  “is it sometimes useful to name things that you’re not using (in
    pattern matching)?”
    
      yes, sometimes they’re useful purely for documentation rather
        than for use
    
  
  “do you abbreviate names?”
    
      it can depend
    
  
  “what about single-letter variable names? would you follow the
    convention of the community, or break convention and be
    description”
    
      single-letter variable names (particularly l) are confusing
      some odd names can get written when extracting a tmp var to
        break down a large expr
    
  
Verifying Sateful and Side-effecting Programs using Dependent Types


  @edwinbrady, St Andrews University

intro


  where is st andrews?
  wonderful collection of error displays
  it’s pretty hard to write software that works
  famous bugs:
  heartbleed
    
      a bug so bad that a marketing department got involved
      nobody spotted it for a long time
      goto
    
  
  goto fail

Idris

Idris is a Pac-man Complete functional programming langauge with
  dependent types.

  cabal update; cabal install idris
  https://github.com/edwinb/idris-demos

In this talk: Reasoning about side effects in Idris

  extra-functional correctness
  example: communication protocols

example: write the type, the editor automatically gives options for
  implementation:
vzipWith : (a -> b -> c) -> Vect n a -> Vect n b -> Vect n c
vzipWith f [] [] = []
vzipWith f (x :: xs) (y :: ys) = f x y :: vzipWith f xs ys
Since this is the only implementation, the editor can write this
  function for us from the type alone, including splitting the cases
  into separate pattern matches and making the recursive call.
(it’s hard to take notes on this to demonstrate the real interactivity
  offered…)
aside: with dependent types, you can put anything you want in a type
  and it can get very complicated.
Please don’t do that.
It’s very important that the types make sense.
side effects

The Eff type family indicates side-effects:
vadd_check : Vect n Int -> Vect m Int ->
             { [EXCEPTION String] } Eff (Vect m Int)


read_vec : { [STDIO] } Eff (length ** Vect length Int)

do_vadd : { [STDIO, EXCEPTION String] } Eff ()

Note that we only allow STDIO, not BALLISTICMISSILEIO.
This gives a program which will add two user-supplied vectors of the
  same length.
what is Eff?

Eff is effectively a DSL for managing what you’re allowed to do and
  proving that you don’t do anything forbidden.

  eg: open a file: are you allowed to read it?
    
      only if the open was successful
      types capture success and failure and
    
  
  like a Hoare triple
    
      precondition, command, postcondition
    
  
State machine example: door opening


  CLOSED >- knock -> CLOSED >- open door -> OPEN >- close -> CLOSED

knock     : { [DOOR Closed] } Eff ()

openDoor  : { [DOOR Closed] ==> [DOOR Open] } Eff ()
closeDoor : { [DOOR Open] ==> [DOOR CLosed] } Eff ()
state transitions which handle failure:
data Jam = Opened | Jammed

openDoor : { [DOOR Closed] ==>
             {isJammed}
             [DOOR (case isJammed of
                         Opened => Open
                         Jammed => Closed)] } Eff Jam
doorProg = { [STDIO, DOOR Closed] } Eff ()
doorProg = do knock
              jam <- openDoor
              case jam of
                   Opened => closeDoor
                   Jammed => do putStrLn "NNnnnnnnngh"
                                return ()
This is getting deeply nested.  Is there a better way to handle failure?
Exceptions?  Problem with exceptions is that you don’t know what state
  you were in when you threw; so you don’t know what cleanup you need to
  do.
Inspired by perl’s || die syntax:
doorProg = { [STDIO, DOOR Closed] } Eff ()
doorProg = do knock
              Opened <- openDoor | Jammed => return ()
              closeDoor
If you have a precise enough type system, you have a lot of help.
Protocol actions

Similar to session types.  See http://eb.host.cs.st-andrews.ac.uk/drafts/protocols-spls0614.pdf
data Actions : Type where
     DoSend  : (dest : p) -> (pkt : Type) ->
               (const: pkt -> Actions) -> Actions
     DoRecv  : (src : p) -> (pkt : Type) ->
               (const: pkt -> Actions) -> Actions
     DoRec   : Inf Actions -> Actions
     End     : Actions
echo : Protocol ['C, 'S] ()
echo = do 'C ==> 'S | String
          'S ==> 'C | String
          Done
Refine the types to ensure it really is an echo server:
echo : Protocol ['C, 'S] ()
echo = do msg <- 'C ==> 'S | String
          'S ==> 'C | Literal msg
          Done
Q&A


  “could we have a maybe effect to handle the door failure case?”
    
      there are loads of ways to deal with this
      the key thing is that the type says what state you’re in at
        each point
      there is an EXCEPTION effect but you lose the information about
        the state you’re in
    
  
Process calculi for the working programmer


  @thattommyhall
  Second reference in this conf to Mac Lane’s “Categories for the
    Working Mathematician”

Starting from the beginning


  which is of course the start of the 20th century
  David Hilbert vs Kurt Gödel
    
      also Russell and Whitehead
      Principia Mathematica
    
  
  propositional calculus
  example logical proof
    
      ∃v¬φ          (Assumption)
      ¬φ            (Assumption)
      ∀vφ           (Assumption)
      φ             (Universal Quantifier Elimination, 3)
      (φ&¬φ)        (Tautology,2,4)
      ∀vφ -> (φ&¬φ) (CP,5)
      ¬∀vφ          (Tautology,6)
      ¬∀vφ          (EH,7)
    
  
  Hofstadter, Gödel, Escher, Bach
    
      more programmer-friendly syntax with nesting
    
  
lambda calculus

syntax


  e ::= v | λ/v/./e/ | (e e)
  Lisp syntax is inspired by lambda calculus:
    
      (fn [x] (+ x 1))
    
  
semantics

(\x. E) M => E{M/x}
((\x. x + 1) 2) => 2+1 => 3
((fn [x] (+ x 1)) 2)
scope issues

(\x.\y (x y) (y w))
=> (\y ((y w) y))
but the name collision here changes the meaning.  We didn’t mean for
  the y inside (λy. ..) to refer to the same thing as in (y w)
((fn [x]
   (fn [y]
     (x y)))
 (y w))

=> (in a call-by-name version of clojure)

(fn [y]
   ((y w) y))
order of evaluation


  applicative order (call-by-value)
  normal order (call-by-name)

Universal Register Machines


  End
  Inc n m (add 1 to register n, goto m)
  DecB n m p (subtract 1 from reg n, if x=0 goto m else goto p)

Actors = Lambda calculus + Message Passing


  Robin Milner is responsible for a lot of this work
  Pi Calculus syntax
  parallel processes communicating over channels
    
      writes block until corresponding read, reads block until
        corresponding write
    
  
  non-determinism
    
      if two writers write to same channel, nondeterministic which
        message a reader gets
    
  
dining philosophers


  wiki http://en.wikipedia.org/wiki/Dining_philosophers_problem
  solved using Pict

Communicating Sequential Processes (CSP)


  syntax + semantics

Clojure core.async


  CSP for clojure
  one difference from pi calculus: channels are allowed a buffer
    
      to remove blocking from writes and reads
      though the buffer is always bounded
      (one issue with the actor model is that the mailbox is
        unbounded)
    
  
  channels are first-class values which can be sent down channels

from concurrent to distributed


  mobile pict
  join calculus

deterministic parallelism


  LVars (Lindsey Kuper)
    
      RICON
    
  
practical uses


  Spi-calculus (crypto)
  biology (simulations of cells using pi calculus)
  modelling telephony protocols

failure of formal methods?


  “the world just does not suffer significantly from the kind of
    problem that our research was originally intended to solve” – Hoare

who’s doing this?


  INRIA JoCaml
  Microsoft Research (c-omega?)
  F♯ tryjoinads
  Me (maybe…)
  David Pollak has been talking about making clojure core.async
    distributed

conclusion


  not rocket science
  some of this stuff may be the future
  it’s also interesting

QuickCheck Evolution


  John Hughes @rjmh

QuickCheck runthrough


  why is testing hard?
  say you have n features:
    
      O(n) tests
    
  
  but some bugs appear only when features used together:
    
      pairs of features
      O(n^2) tests
    
  
  etc
  you can’t test everything
  where do you stop?

Don’t write tests!

Generate them

QuickCheck


  1999 - Koen Claessen and John Hughes, for HAskell
  2006 - Quviq founded marketing Erlang version
  many extensions
  finding deep bugs for ericsson, volvo, basho, etc

example

Binary search trees:
to_list(leaf) ->
    [];
to_list(to_list({node,L,X,R}) ->
  to_list(L) ++ [X] ++ to_list(R).

member(_,leaf) ->
    false;
member(X,{node,L,Y,R}) ->
    if X==Y ->
            true;
       X<Y ->
            member(X,L);
       X>Y ->
            member(X,R)
    end.

prop_member() ->
    ?FORALL({X,T},{nat(),tree()},
            member(X,T) == lists:member(X,to_list(T))).
?FORALL asserts a property is true for all natural numbers, and
  trees.  We then generate test cases fitting this pattern
shrinking

Every time I run the test, I get a different, potentially large case;
  but shrinking reduces it down to a much smaller and more manageable
  case.
keeping the example

If you have a failing test case, you may want to keep the failing
  testcase (esp if it appears less frequently), to ensure that you
  really have fixed the bug.
Enter: QuickCheck CI for testing github projects

  demo with https://github.com/rjmh/trees
  testing
  coverage
    
      often directs your attention straight to where the bug is
      coverage data is per-test
        
          the bug can only be in the green lines, since the red lines
            weren’t executed
        
      
  failing tests are remembered between builds
  build doesn’t go green until all previously failing cases are
    passing

state machine testing - example


  test the process registry
    
      register(Name,Pid)
      unregister(Name)
      spawn() - to create pids for test data
    
  
  what’s different?
    
      these functions change the state of the registry
    
  
  we won’t find bugs
    
      instead: reverse engineer preconditions
    
  
state machine models


  test case: sequences of API calls
  for each call, model how the state has changed
  can use postconditions on each call
  when a test fails, shrink the sequence of calls

specification of register:
register_pre(S) ->
    S#state.pids /= [].

register_args(S) ->
    [name(),elements(S#state.pids)].

regsiter(Name,Pid) ->
    erlang:register(Name,Pid).

register_next(S,_,[Name,Pid]) ->
    S#state{regs=S# ...

  can conveniently specify the indended behaviour of stateful
    systems
  testing in practice involves
    
      reverse engineering the actual specification
        
          documentation is always wrong
          specification is always incomplete
        
      
      finding and correcting bugs
    
  
automotive: CAN bus testing


  3000 pages of spec
  20,000 lines of QuickCheck, testing
  1,000,000 LoC, 6 suppliers
  200 problems, of which
    
      100 problems in the standard itself
    
  
does this approach scale? yes!
problem: bugs are plural


  QuickCheck finds one bug at a time
    
      not much point running quickcheck again
        
          it will just shrink down to the same testcase
        
      
      fix the bug
      only then do you find the next bug
    
  
  what if fixing the bugs aren’t so easy?
    
      in the CAN bus case, we didn’t have the source code
      would send a bug report, and get a new version maybe 3 weeks
        later
    
  
  in practice:
    
      unfix the spec to allow the bug
      if we’ve modelled the bug we’ve found correctly, quickcheck
        will now find the next bug
      feels icky
    
  
  “More bugs”
    
      compute “bug signatures” from failures already found
      focus testing effor on areas not yet known to be buggy
    
  
  in practice:
    
      move from finding 1 bug, we’d find 20+ bugs in a single run of
        QuickCheck
    
  
want to try it out?


  https://github.com/hanssv/example_proj
  https://github.com/rjmh/kv_eqc

Q&A


  “is quickcheck-ci erlang only?”
    
      you write the properties in erlang
      there’s a C project we’re testing using it
      “will you support quickcheck for haskell?”
        
          that might be fun
          there’s nothing happening along those lines
        
      
Panel: Two steps forward, one step back


  Andreas Stefik
    
      professor at university nevada las vegas
      research how proglangs impact human beings
      mostly work on Quorum these days
    
  
  Chad Fowler
    
      used to be a ruby guy
      usually use:
        
          ruby
          haskell
          clojure
          scala
          go
          node
          erlang
          objective c
        
      
  Jessica Kerr
    
      15 years of java
      scala then clojure
      I play with ruby in order to speak at ruby conferences
    
  
  Steve Vinoski
    
      work for basho
      lately a lot in C (working on Erlang VM)
      long time C++ (started 1988)
      whole bunch of languages
    
  
  Viktor Klang
    
      work for TypeSafe
      mostly scala
      bit of java, bash, applescript
    
  
static vs dynamic typing


  jess: I think Stefik settled that yesterday :)
    
      ie that static typing improves productivity at some point
        between half a year and 3 years
        
          although we have no data in that interval
        
      
      use of generics
    
  
  Chad: what does static typing mean?
    
      stefik: can mean:
        
          interesting haskell syntax
          generics with crazy wildcards
          most developers think it means “I have types on my method
            decls”
            
              this property is a strong predictor (for what?)
            
          
          implementing generics have a tenfold decrease in productivity
            
              using generics gives a slight bump
            
          
  joeerl: currently we use a lot of C and a lot of erlang
    
      in my experience, 50x more errors in the C than on the erlang
      I’d like to see the division between GC (java) and non-GC (C) langs
      microsoft became rich by shipping buggy software early
        
          will static types deliver
        
      
      Stefik:
        
          I didn’t make a sweeping statement
          we ran formal peer-reviewed RCTs
            
              with human users
              we published the data and the methodology
            
          
          it didn’t say “static typing is always better”
          joeerl: what langs were used in that study
          Stefik: we did 8 studies
          joeerl: can you infer results from one language to
            another
          jess: there’s a lot more difference between C and erlang
          Stefik: we didn’t test C in the syntax studies
            
              there might be any number of causes of bugs in a language
              memory leaks
              if you’re saying there’s many things besides type
                systems that can affect your productivity, that’s true
            
          
          Chad: there’s also strong vs weak typing
          steve: types don’t save me across the network
            
              types don’t save me from locking and concurrency
              other things affect productivity and correctness
            
          
          Stefik: that’s true of course
            
              we don’t know much about these other things that affect
                productivity
              there aren’t many studies - theres one from UT Austin which
                showed eightfold increase in bugs with threads+locks vs STM
              this isn’t surprising but it’s a huge productivity impact
              but we’ve also shown that static vs dynamic typing is a
                productivity impact
            
          
  rvirding: how much is the typing, and how much is being
    forced to write down the type?
    
      stefik: the physical manifestation (the fact that it’s in
        the file) matters
        
          we’ve tested this
          the type needs to be in the method declaration
          not universal across programmers:
          92% of novices screw up types in method decls
          people with more experience get a productivity boost
        
      
  Chad: a little over 10 years ago, there was a big movement for
    dynamic movement against static typing
    
      ruby was the big upcoming
      strong dynamic typing vs weak static typing
      now, we’re all pushing for static typing and category theory
      projects still suck just as bad, software still sucks just as
        bad
    
  
  viktor: there’s a difference between structural and nominal types
    
      particularly with concurrency
      you don’t need to ship the defintion of the types around with
        structural types
    
  
  Stefik: my wish is that people start collecting more and more
    evidence on this stuff
    
      pendulum swings are good
      but if we hone down what works and what doesn’t, when we define
        a new language, we can read 100 papers on productivity
    
  
  jess: everybody can’t read every paper! (reference to @aphyr
    debate recently)
    
      steve: you just don’t want to reinvent the wheel
      jess: how do you find that previous work?
      steve: leslie lamport’s still around! you can ask him
        
          reid draper gave a talk here and included a thing on RPC vs
            messaging
            
              had good mentions of RFCs from 1970s (see above)
            
          
          we have google now
          you don’t have to sit in the library
        
      
      jess: to google it you have to have the name already!
      stefik: I’m talking about people defining a new language
        
          you simply can’t read all the papers
          the number of papers published in a month is probably more
            than anyone can read in a lifetime
        
      
      viktor: getting up to speed with where the researcher was when
        they wrote the paper is a significant effort
    
  
  steve: take all the good papers and put them on stackoverflow
    
      people would read them then!
    
  
  stefik: there would be lawsuits
    
      it’s horribly stupid
      institutions are judged not on quality of papers, but number
        they produce
      when they are produced
        
          the publishing company gets paid for the research
          they contribute nothing to any part of the process
          they provide an “upload” button and you click “submit”
        
      
  jess: we’ve discovered this thing called “open source”

distributed system for akka


  francesco: viktor, how do you approach the work you’re doing, and
    how do you combine new ideas with old ideas?
    
      viktor: what helps me the most is to start from scratch
      esp for distributed systems, don’t start with computers
      first think “how would we solve this problem with people”?
      the more I learn about fundamental problems, the more I see
        them everywhere
      not just in computing
      queueing theory
      a lot of people never peer through the base level
    
  
  chad: do you think anyone knows what the list of fundamentals to
    learn is?
  stefik: there are two groups of scholars in the US trying to
    adjust the compsci education
    
      Exploring Computer Science (ECS) & Computer Science Principles
    
  
  jess: fundamentals are different per person
    
      say someone was doing a talk with queueing theory?
      how many people have a local group where such a presentation
        would be accepted?
    
  
open access


  jeremy gibbons: can you get hold of ACM papers?
    
      does the (in general) paywall stop you accessing the paywall?
      do you find ways round it? do you google the title and find the
        copy on the author’s homepage?
    
  
  philandstuff: the biggest barrier for me is pdf as a format needs
    to die

other sources


  jess: there are other sources from papers – eg open source
    
      how do you find how other people solved a problem before?
    
  
  audience: papers have keywords and get indexed
    
      search by keywords
    
  
  jess: I have written programs to write tests and I didn’t know
    what I was doing was called property-based testing
    
      how can we educate each other?
    
  
  viktor: more negative experience reports
    
      why don’t things work?
    
  
  joeerl: Chad’s slides showed 60% of software projects fail (for
    some value of fail)
    
      I’ve been close to some of those
      the employer never admitted the failure
        
          it was a success, but due to market changes it had been
            down-prioritized
        
      
  Chad: host of the interesting failures aren’t technical
  steve: can you make “failmesh”?
    
      francesco: QCon london once had a “majestic failures” track
        
          C.A.R Hoare and the “billion dollar mistake” - the null
            pointer
        
      
  viktor: I would say a mars lander feet/metres problem was a
    technical error
    
      chad: this is the airport security approach
        
          one person had a bomb in their shoe, so everyone takes of
            their shoes
        
      
      jess: a type system like F♯’s with units would have fixed that
        
          people problem behind our couch database issue
        
      
  rjmh: from CHAOS report, among successful projects, less than
    half of the requirements got implemented
    
      are we doing better nowadays?
      maybe agile methodology means we never discover the half of the
        requirements we failed to implement
      stefik: I’ve heard the CHAOS report cited a million times
        
          it’s a for-profit group
          surveys to managers but not the technical developers
          measuring failures is pretty hard
        
      
      chad: everyone knows the standish thing is bullshitty
        
          the one I showed is 2013
          it seems worth showing to me is that it’s a confirmation of
            what we all already know
        
      
      stefik: I show it to my class too, I’m just reminding that it’s
        bullshitty
      rvirding: how many failures are because the project itself was
        stupid?
      stefik: that’s exactly it!
    
  
  audience: in 2012 RBS had a major failure
    
      people weren’t able to withdraw cash for a weekend
      was that a technical failure or a process failure?
        
          steve: you can look at that as a people failure
            
              they didn’t have a process in place to verify code
              in that case, it’s similar: who put these processes
                together, did they practice them?
              you’re suppose to practice doing upgrades, having outages
              if you don’t do it, how can you have confidence?
            
          
      jess: both etsy and netflix at goto amsterdam:
        
          dealing with failure:
            
              don’t blame anyone
              you can fix it at any of 18 levels
                
                  code
                  testing
                
              
  reiddraper: we have poor role models
    
      eg the 25-year-old male programmer who writes an iphone app
      not the person whos been here 40 years and learns slowly
      how can we fix that?
      jess: pictures of joeerl in a purple velvet jacket