- architecture was always:
- box - box - cylinder
- then layers:
- ideally:
- box - box - box - box - box - box - cylinder
- but in practice, code gets snarled up between the layers
- every time you make a change, you need to change every layer
- in the assembler days:
- cryptic code and brittle code were both slain by compilers
- but now we have multiple orthogonal problems
- latency
- deployment
- search
- security
- caching
- monitoring
- which don’t get solved in a single sweeping way, like compilers solved the problems with assembly language
- architecture is now multiple boxes, multiple cylinders, directed acyclic graph
- most of the code is glue
- because when you break things apart, you spend more effort gluing them back together
- Herb Sutter (2004): the free lunch is over
- lang stdlib
- now: also python and ruby and all of github
- unix utils
- brew install it all!
- database
- MOAR DATABASES!
- algorithms
- read every thesis!
- “IT” makes more sense than “Software Engineering”
- goal is in fact: minimize lead time to business impact
- lead time: starts when problem exists
- business impact: not “done done done”
- @jessitron’s geeky akka simulation of a dev team, comparing:
- a team where they’re trying to write as quickly as possible, not much slack time for tests/monitoring
- a team where they don’t build as fast as possible, and have slack to do
- she graphed two metrics for each team:
- number of implemented features
- number of working features
- “we’re the only industry on the planet that doesn’t give a range when giving an estimate”
- compare: “22 days” with “half a month to 2 months”
- deliberately vague units
- starts a conversation
- delivery manager: “you’re crazy! how can I plan with that estimate?”
- I say because:
- domain
- tech
- security
- perf
- I want the DM to be just as nervous as me about this stuff
- so that they ask questions about how we can reduce uncertainty
- research is a first-class task
- “leverage” is the right word for using technology
- because “leverage” increases risk
- research is an ongoing activity
- research is not the same thing as a spike
- if all you ever do is a spike, you optimize your tech choice for time-to-hello-world
- your architecture diagram should reflect your uncertainty and decisions you haven’t yet made
- box - ? - cylinder
- inner loop: continuous learning/improvement
- outer loop: questioning the process itself
- we’re spending less time coding, and more time learning about other people’s code
- this is what you need to do to exploit reuse
- IBM invented this in the 1950s
- every management grade had an equivalent tech grade
- VP <-> research fellow
- fixed cost, unlimited opportunity
- Benoit Mandelbrot
- Bill Joy
- evaluation
- monitoring
- deploy
- tests
- program
- databases
- filesystem
- network
- so… you’ve got to be good at all of them
- we tried specialising, but: if your job is to deliver to QA, all you’ll do is pass QA tests
- you need the whole feedback loop to do your job
- we can’t care about everything, that just leads to hopelessness and despair
- we can’t care about nothing
- we kind of just pick a couple and hope for the best
- and hope that enough people randomly choose a particular issue that something actually gets done
- this is a new talk!
- the great disputes of compsci should be struggled with
- they are chances to make a better, simpler, more relaxed world as much as create opposing camps
- although want to avoid:
- “mustard watches: an integrated approach to time and food”
- spoof paper lampooning arbitrary smushing of concepts together
- functional languages were
- isolated
- noninteroperable
- used their own vms
- interop standards were a mess:
- C-calls
- COM
- CORBA
- XML
- F♯ approach (also scala, swift, clojure): embrace industry-standard runtime layers, influence them. compromise where needed.
- today’s FP langs are immensely interoperable while staying true to FP principles
- F♯ type providers raise interop to a completely new level
- (some tensions remain: some techniques not easily implemented on industry standard VMs)
- F♯ is open, cross-platform, independent
- accepting contribs
- there is now an F♯ software foundation
- becoming a legal entity as a US nonprofit right now
- http://fsharp.org
- xamarin provide F♯ tooling for android and iOS
- usual packages for linux
- visual F♯ tools for windows
- F♯ compiler service powers many other tools
- F♯ community is very self-empowered
- pieces:
- lang
- openness
- cross-platform
- community
- ide tools
- xamarinstudio, emacs, vim, …
- ecosystem
- embrace an open ecosystem of libs, tools and type providers
- flow of contributions:
- github.com/fsharp/fsharp
- visualfsharp.codeplex.com (public git repo)
- this is what gets packaged into the microsoft visual F♯ tools
- nuget packages
- into emacs/xamarin tools
- next release: 4.0
- fsharp uservoice
- projects which use it:
- emacs plugin
- vim plugin
- Visual f♯ power tools
- IFSharp
- FsReveal – reveal.js-based literate programming
- the compiler service is run over the fragments of code to get IntelliSense and suchlike
- enterprise quality + openness + community + tooling = :D
- in 1990s FP langs were anti-OO
- even today any mention of objects in Haskell circles gets an “urggh” reaction
- Compiler Design with Standard ML - Chapter 9, Object Oriented Programming:
- object: to express a dislike or distaste for something
- embrace objects, make them fit
- particularly for interop
- but don’t embrace full object orientation
- features to allow merging paradigms:
- object expressions to allow expression-oriented programming
- some functional features clash with object features:
- particularly currying vs method overloading
- interesting comparison:
- in C♯, nothing stops you creating circular dependencies, and you have to make special efforts to avoid it
- in F♯, it’s really hard to do this.
- in theory and in practice, unmoderated intra-component cycles are a disaster
- unrestricted circularity is like crack: once people get used to it, it’s hard to wean them off
- language mechanisms that enforce acyclicity are good
- pattern matching is a lovely feature
- but without a way to name, abstract, and parameterize patterns, it’s a real problem in practice
- similar to Scala extractors?
let (|``Even Number``|``Odd Number``|) n =
if n % 2 = 0 then ``Even Number`` else ``Odd Number``
let (|Const|_|) (e:FSharpExpr) = ...
- relaxation achieved! pattern matching <=> abstraction
- operate in a world between static and dynamic languages
- fsharp.org/use/gpu
- F♯ quotations feature helps here
- most major areas of tension have been resolved
- still some areas that bug me:
- false opposites are worth struggling over
- simplicity and unity are key
- Eric Redmond
- and NoSQL is a rubbish name
- it’s possible to name things that don’t exist like fairies
- working in dbs since 1999 (as a researcher)
- seven databases in seven weeks
- a little riak book
- now at basho
- you should no more make your own distributed system than you should make your own crypto library
- the whole talk right here:
Models which aren’t useful should die. NoSQL is not a useful model, so it should die
- NoSQL is ill-defined
- NoSQL is a bad classifier
- What this means for you
We get people who say “I’m stuck on Mongo or Riak” – this is like saying “I’m stuck between hat or shoes”
- wiki: NoSQL (often interpreted as Not Only SQL)
- but if you’re running a single postgres node, are you running NoSQL?
- Not At All SQL? NAASQL (pron. Nâsgul)
- people desire NoSQL because of some (perceived or actual) shortcoming in SQL
- no horizontal scaling
- “newsql” – VoltDB
- OceanBase
- no semi-structured data
- postgres: JSONB
- SQL Server 2008 supports semi-structured datatypes
- mongolike
- mongo frontend to postgres
- slower dev
- often considered a corollary to no semi-structured data
- ORMs
- just look at RoR!
- can’t use modern tools (like MR/Hadoop)
- MySQL & Postgres both have hadoop connectors
- terrible in representing complex graphs
- high availability
- require indexes for speed
- even if all you have is an id
- a key-value store doesn’t have an index for keys, that’s just what you look up by
- distributed SQL is often a subset, or has caveats
- “we’re SQL98 compatible but you can’t do nested JOINs”
- SQL isn’t always required or helpful
- sometimes all you need is something like S3
- not just ill-defined, actually dangerous
- by models (Graph, doc, col, kv)
- by network topology (mesh, partial-mesh, tree)
- by natural distribution (rdb/graph, doc/col/kv)
- http://en.wikipedia.org/wiki/Graph_database
- easiest thing to model
- boxes and lines
- easy to query (cypher)
START x=node(0)
MATCH x
RETURN x.name
- stores
- neo4j (high perf, ACID)
- hypergraphDB (directed hypergraph)
- titan (distributed)
- ArangoDB (flexible modelling)
- can also act as KV or doc
- (wait, where does this live in our model again?)
- SparkleDB (RDF, SPARQL)
- InfinityDB (distributable, embeddable)
- parts of your graph in a mobile phone, parts in a cloud
- examples
- riak (critical data, simple operations)
- high availability & operations
- aerospike (specialized for SSD+DRAM)
- distributed
- redis (speed, fancy types, messaging)
- horribly broken clustering, so only use single-node instances
- leveldb (embeddable)
- message: despite sharing a data model, the use cases here are massively diverse
- row keys, column families
- rows are an ephemeral concept
- KKV? (row key, column family key)
- eg
- BigTable
- HBase
- ordered, sparse (BigTable based)
- Cassandra
- CQL
- thought experiment: if CQL expanded to full SQL, is it no longer NoSQL?
- Accumulo
- Hypertable
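The row-key / column-family (“KKV”) model above can be sketched as nested maps. This is an illustrative sketch, not any particular store’s API; all names here are made up:

```python
# A columnar store as two-level keys: row key -> column family -> column -> value.
# Rows are ephemeral: a "row" is just whatever columns happen to share a row key.
store = {}

def put(row_key, family, column, value):
    store.setdefault(row_key, {}).setdefault(family, {})[column] = value

def get(row_key, family, column):
    return store.get(row_key, {}).get(family, {}).get(column)

put("user:42", "profile", "name", "Ada")
put("user:42", "profile", "email", "ada@example.com")
put("user:42", "stats", "logins", 7)

print(get("user:42", "profile", "name"))   # Ada
print(get("user:42", "stats", "logins"))   # 7
```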
- often the first thing people think of when they hear “NoSQL”
- queries via field of document
- normally JSON, but could be XML or whatever
- eg
- CouchDB
- embeddable, replication
- Couchbase
- distributed, failover
- MongoDB
- easy to program
- RethinkDB
- created by former mongo users
- distributed joins
- atomic updates
- riak + search is a document datastore
- cassandra can do K/V
- ArangoDB (document)
- Postgres (column -> hadoop)
- ElasticSearch (inverted index?)
- (graphs are hard to replicate, because walking paths in a graph is difficult)
- riak used to have link-walking, they took it out because it was too much overhead
- K/V (with or without indexing)
- Graph
- Other
- is this enough?
- relational DBs can still do K/V and “other” reasonably well
- what’s entailed by distribution?
- what are you trying to achieve?
- sharding vs replication
- what does it mean to have a network partition?
- the CAP theorem
- quick explanation:
say you’re in a bar, and three people order a beer. one goes to the toilet, the other two switch to whisky. If you ask the separated one what the group’s last drink was, they can answer either: “I don’t know” or “beer”.
- http://aphyr.com/posts/313-strong-consistency-models
- there are lots of types of consistency
- strong serializable, linearizable, serializable, read-your-writes…
- not all are banned by the CAP theorem in AP systems
- you can have odd things like monotonic consistency
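The bar story above, as a toy simulation (purely illustrative): during a partition, a replica can either answer with possibly-stale data (AP) or refuse to answer (CP):

```python
class Unavailable(Exception):
    pass

class Replica:
    def __init__(self):
        self.last_drink = "beer"
        self.partitioned = False

    def read(self, mode):
        if self.partitioned and mode == "CP":
            raise Unavailable("I don't know")  # consistent, but unavailable
        return self.last_drink                 # AP: always answers, possibly stale

r1, r2 = Replica(), Replica()
r2.partitioned = True        # r2 goes to the toilet
r1.last_drink = "whisky"     # the rest of the group switches drinks

print(r1.read("AP"))         # whisky (fresh)
print(r2.read("AP"))         # beer   (stale but available)
try:
    r2.read("CP")
except Unavailable:
    print("I don't know")    # gives up availability to avoid a stale answer
```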
- harvest/yield tradeoff
- you can’t guarantee 100% harvest and 100% yield
- harvest is the proportion of the result set reflected in the response
- 50% of result set responds
- yield is probability that request responds at all
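A toy fan-out query showing the harvest/yield trade-off (shard names and data are made up): with one shard down, you either return partial results (full yield, reduced harvest) or fail the whole request (reduced yield):

```python
shards = {"a": [1, 2], "b": [3, 4], "c": None}  # shard "c" is down

def query(fail_on_partial):
    results = []
    for name, data in shards.items():
        if data is None:
            if fail_on_partial:
                return None, 0.0   # whole request fails: yield takes the hit
            continue               # degrade gracefully: harvest takes the hit
        results.extend(data)
    harvest = sum(1 for d in shards.values() if d is not None) / len(shards)
    return results, harvest

print(query(fail_on_partial=False))  # partial results, harvest 2/3
print(query(fail_on_partial=True))   # request refused entirely
```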
- FLP Impossibility Proof
- safety or liveness but not both
- safety sounds like consistency - is it?
- paxos is safe but not necessarily live
- two-phase locking makes neither guarantee
- there’s loads of these issues
- you can’t make a decision here based on “I need a NoSQL database”
- synchronicity is not the only problem (think lightcones)
- intention is the problem
- what does a conflict mean? what does conflict resolution mean?
- first write wins?
- last write wins?
- CRDTs?
- this depends on the domain
- neo4j
- leveldb
- hbase
- couchdb
- every node speaks to every other node
- no SPF
- easy to operate
- can limit cluster size
- chattiness scales quadratically with cluster size
- eg
- riak
- HBase (if configured so)
- Couchbase
- BigCouch
- all different data models!
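The “chattiness scales quadratically” point is just the full-mesh link count, n(n−1)/2:

```python
def full_mesh_links(n):
    # every node talks to every other node
    return n * (n - 1) // 2

for n in [5, 10, 50, 100]:
    print(n, full_mesh_links(n))
# 5 nodes -> 10 links; 100 nodes -> 4950 links:
# doubling the cluster roughly quadruples the gossip traffic
```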
- don’t let every node talk to every other node
- can introduce complexity to operations
- eg
- riak + MDC (multi-dc replication)
- HBase (if configured so)
- Cassandra
- DC-aware
- consistency is easy to guarantee
- but: single point of failure
- leader election
- these systems often assume you can only get one clean partition at a time
- three-way partitions can prevent leader election
- lose availability, fail to gain consistency
- eg
- mongo
- HBase (if configured so)
- Postgres/Mysql (more star than tree)
- hard to distribute
- graphs
- relational joins
- easy to distribute
- key/values
- titan (distributed graph)
- infinitydb (distributed graph)
- voltDB (distributed SQL)
- postgres cluster (distributed SQL)
- redis (KVs are easy to distribute, but Redis Cluster sucks)
- don’t roll your own distributed system
- mongo (documents can be distributed, but mongo tends to be tough to scale/admin)
- every time you want to double your capacity, you add three nodes
- 3 -> 9 -> 12
- time series, FTS, ranges
- defined schema/schemaless/opaque binary
- large object storage
- HA/SC, Harvest/Yield
- Riak can choose HA or Strong Consistency
- both in the same DB! (although not on same dataset)
- Message patterns (req/rep, pub/sub)
- stream processing, datastream mining
- dev friendliness
- polyglot DBs with data oriented middleware
- (this isn’t a thing, but it should be)
- Jeremy Gibbons, University of Oxford
- goal: show how category theory inspires better code
- but you don’t really need the category theory: it all makes sense as functional programs (Haskell) too.
Two equations, indirectly defining sum:
sum :: [ Integer ] -> Integer
sum [] = 0
sum (x:xs) = x + sum xs
Programs are equations, and are built out of equations.
This pattern works for other fns too. For any given f and e, these equations uniquely define h:
h [] = e
h (x:xs) = f x (h xs)
The unique solution is called foldr f e in Haskell:
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f e [] = e
foldr f e (x:xs) = f x (foldr f e xs)
This shows that patterns can be expressed as pieces of code.
sum = foldr (+) 0
and = foldr (&&) True
decimal = foldr (\d x -> (fromInteger d + x) / 10) 0
id = foldr (:) []
-- etc etc
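For readers without Haskell, the same right fold can be sketched in Python (note functools.reduce is a *left* fold, so we write foldr directly; this is a sketch, not from the talk):

```python
def foldr(f, e, xs):
    # foldr f e [x0, x1, x2] = f(x0, f(x1, f(x2, e)))
    acc = e
    for x in reversed(xs):
        acc = f(x, acc)
    return acc

print(foldr(lambda x, acc: x + acc, 0, [1, 2, 3]))           # sum: 6
print(foldr(lambda x, acc: x and acc, True, [True, False]))  # and: False
print(foldr(lambda d, x: (d + x) / 10, 0, [1, 2, 3]))        # decimal: 0.123
print(foldr(lambda x, acc: [x] + acc, [], [1, 2, 3]))        # id: [1, 2, 3]
```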
There’s nothing special about the built-in list type, only the special syntax. We could have our own list:
data List a = Nil | Cons a (List a)
Then we could have
foldList :: (a -> b -> b) -> b -> List a -> b
foldList f e Nil = e
foldList f e (Cons x xs) = f x (foldList f e xs)
This works for other data types
data Tree a = Tip a | Bin (Tree a) (Tree a)
Then we could have
foldTree :: (a -> b) -> (b-> b->b) -> Tree a -> b
foldTree f g (Tip x) = f x
foldTree f g (Bin xs ys) = g (foldTree f g xs) (foldTree f g ys)
Rose trees (eg for games or XML)
data Rose a = Node a [Rose a]
(aside: name Rose comes from rhododendron (greek for rose tree))
What’s the right notion of fold here?
foldRose1 :: (a -> c -> b) -> (b -> c -> c) -> c -> Rose a -> b
foldRose1 f g e (Node x ts) = f x (foldr g e (map (foldRose1 f g e) ts))
-- but suppose we might want to process list nodes in a different order?
foldRose2 :: (a -> b -> b) -> ([b] -> b) -> Rose a -> b
foldRose2 f g (Node x ts) = f x (g (map (foldRose2 f g) ts))
-- even more general:
foldRose3 :: (a -> [b] -> b) -> Rose a -> b
foldRose3 f (Node x ts) = f x (map (foldRose3 f) ts)
Which one should we use? Category theory can guide this decision.
Haskell gets folds for non-empty lists “wrong”:
foldr1, foldl1 :: (a -> a -> a) -> [a] -> a
this won’t let you define the identity function on nonempty lists
Separate out list-specific shape from type recursion:
-- two-level type definition of list
-- ie ListS is nonrecursive, and Fix doesn't know about lists
data ListS a b = NilS | ConsS a b
data Fix s a = In (s a (Fix s a))
type List a = Fix ListS a
eg, [1,2,3] is represented thus:
In (ConsS 1 (In (ConsS 2 (In (ConsS 3 (In NilS))))))
convenience fn:
out :: Fix s a -> s a (Fix s a)
out (In x) = x
This is a lot of boilerplate that you wouldn’t want to actually write
but it’s a general way of constructing data types to help understand them
Shape is mostly opaque, just need to ‘locate’ the as and bs
bimap :: (a -> a') -> (b -> b') -> ListS a b -> ListS a' b'
bimap f g NilS = NilS
bimap f g (ConsS a b) = ConsS (f a) (g b)
-- closely related to 'fmap' from Functor
Now we can define a more cleanly separated version of foldr on List:
foldList :: (ListS a b -> b) -> List a -> b
foldList f = f . bimap id (foldList f) . out
-- eg foldList add :: List Integer -> Integer, where
add :: ListS Integer Integer -> Integer
add NilS = 0
add (ConsS m n) = m + n
Think of a bifunctor S. It is also a functor in each argument separately.
An algebra for functor S A is a pair (B,f) where f :: S A B -> B.
A homomorphism between (B,f) and (C,g) is a function h :: B -> C such that
h . f = g . bimap id h
-- (note bimap id == fmap, so)
h . f = g . fmap h
An algebra (B,f) is initial if there is a unique homomorphism from it to each (C,g).
eg (List Integer, In) and (Integer, add) are both algebras for ListS Integer:
In :: ListS Integer (List Integer) -> List Integer
add :: ListS Integer Integer -> Integer
and sum :: List Integer -> Integer is a homomorphism from (List Integer, In) to (Integer, add). In fact (List Integer, In) is the initial algebra.
Theorem: for all sensible shape functors S, initial algebras exist.
This means that: whatever shape you pick for your datatype, you can define a corresponding notion of folds, which will be in some sense “most general” for that datatype.
Recall
fold :: Bifunctor s => (s a b -> b) -> Fix s a -> b
fold f = f . bimap id (fold f) . out
reverse certain arrows:
unfold :: Bifunctor s => (b -> s a b) -> b -> Fix s a
unfold f = In . bimap id (unfold f) . f
and I have a function which will generate a data structure for me. Eg iterate, repeat, …
If you pick the right abstraction, the relationship between fold and unfold is obvious. If you pick the wrong one it doesn’t make it clear:
unfoldr :: (b -> Maybe (a,b)) -> b -> [a]
unfoldr f b = case f b of
Nothing -> []
Just (a,b') -> a : unfoldr f b'
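The Maybe-based unfoldr translates directly to Python, with None playing Nothing (a sketch, not from the talk):

```python
def unfoldr(f, b):
    # f returns None (stop) or a pair (element, next seed)
    out = []
    r = f(b)
    while r is not None:
        a, b = r
        out.append(a)
        r = f(b)
    return out

# countdown: seed n produces n and continues with n - 1 until 0
print(unfoldr(lambda n: None if n == 0 else (n, n - 1), 5))  # [5, 4, 3, 2, 1]
```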
Category theory is a good organisation tool for library designers
Helps you write better code, with less mess
The maths is really quite pretty
but the Haskell makes sense on its own too.
http://patternsinfp.wordpress.com http://www.cs.ox.ac.uk/jeremy.gibbons/
- Clément Delafargue
- clementd
- from a map, I can get a value…
- maybe
- from a string, I can parse a string…
- maybe
- want to abstract away this “maybe” concept
Option[+A]

type Option[String]
   + String => Option[Int]
   = Option[Int]
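A minimal Python sketch of the Option idea (all names here are made up for illustration): None stands for the missing case, and flat_map chains functions of type A => Option[B] so the “maybe” plumbing is abstracted away:

```python
def flat_map(opt, f):
    # opt: a value or None; f: value -> (value or None)
    return None if opt is None else f(opt)

def lookup(d, k):      # from a map, I can get a value... maybe
    return d.get(k)

def parse_int(s):      # from a string, I can parse an int... maybe
    try:
        return int(s)
    except ValueError:
        return None

config = {"port": "8080", "host": "localhost"}
print(flat_map(lookup(config, "port"), parse_int))  # 8080
print(flat_map(lookup(config, "host"), parse_int))  # None (not a number)
print(flat_map(lookup(config, "user"), parse_int))  # None (missing key)
```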
- and I can end up with code which is correct by construction
- our code obviously has no errors
- (cf Tony Hoare)
- our code obviously has no errors
- tests can only demonstrate the presence of errors, not the absence
- types can give mathematical certainty of properties
- Types ⇔ Properties
- provably > probably
- but if I can’t prove interesting properties, it’s not useful. so we need:
- single most important feature: parametricity or parametric polymorphism
- aka generics
- I don’t see it as a code reuse tool
- it gives you lots of expressivity, by capturing the notion of “I don’t know or care what type a is”. This gives useful guarantees. eg:
def f[A](x: A): A
There is only one possible implementation of this function: the identity function. I don’t know what type A is, so I have no way of creating a value of that type other than by using the only instance I have, x.
Similarly:
def compose[A,B,C](
g: (B => C),
f: (A => B)
): (A => C)
again there is only one possible implementation of this type.
This is more interesting:
def rev[A](x: List[A]): List[A]
There are multiple possible fns here (eg identity, reverse, sort all fit the signature), but I can still guarantee some things from the type alone:
- Any value in the list rev(xs) must also be in xs, because rev has no other way to create an instance of A.
These are the famous theorems for free!
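The free theorem for any r : List[A] => List[A] says r must commute with map — r can rearrange elements but never invent or inspect them. A quick dynamic check in Python (illustrative only; the real theorem is a static guarantee):

```python
import random

def free_theorem_holds(r, f, xs):
    # for any r :: [a] -> [a]:  map f (r xs) == r (map f xs)
    return [f(x) for x in r(xs)] == r([f(x) for x in xs])

rev = lambda xs: list(reversed(xs))
f = lambda x: x * 10

for _ in range(100):
    xs = [random.randint(0, 9) for _ in range(random.randint(0, 8))]
    assert free_theorem_holds(rev, f, xs)
print("free theorem holds for rev")
```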
There are ways to break parametricity using common language features:
null
is a value which inhabits every type. This breaks
parametricity guarantees: eg with nulls I can define:
def f[A](x: A) = null
def f[A](x: A) = x match {
  case s: String => "foo"
  case i: Int => 3
  case _ => x
}
def f[A](x: A) = throw new RuntimeException
def f[A](x: A) = {
launchBallisticMissiles();
x
}
- you should learn Haskell type syntax:
a -> a -- function from a to a (lowercase indicates a is *any* type)
Int -> Int -- function from concrete type to concrete type
a -> b -> a -- equivalent to:
a -> (b -> a) -- think of it as a two-arg function
-- typeclasses:
(Ord a) => [a] -> [a]
-- here Ord is a typeclass. 'a' now means "any type which has a
-- natural order"
-- we still get parametricity, but we can restrict it so it's not over all types
-- fn to remove duplicates:
Eq a => [a] -> [a]
Can use hoogle to search on type: hoogle “Eq a => [a] -> [a]”
Hoogle is smart enough to be more general: hoogle “[Maybe a] -> Maybe [a]” which can find:
sequence :: Monad m => [m a] -> m [a]
this fits because Maybe is a Monad, so the sequence function can be called on it.
Recall rev:
def rev[A](x: List[A]): List[A]
The only extra thing we need beyond the types is this statement:
rev(xs ++ ys) == rev(ys) ++ rev(xs)
This kind of argument leads to property-based reasoning, and property-based testing.
Properties are perfect for ensuring you aren’t caught by edge cases.
Properties are often referred to as laws, which must hold but aren’t enforced by the type system. So your reasoning becomes:
Types + Operations + Laws
which is another way of talking about algebras.
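The rev law above makes a natural property-based test. A dependency-free sketch using random inputs (libraries like QuickCheck, ScalaCheck or Hypothesis do this properly, with shrinking):

```python
import random

def rev(xs):
    return list(reversed(xs))

def check_property(prop, trials=200):
    # returns None if the property held on every trial, else a counterexample
    for _ in range(trials):
        xs = [random.randint(-5, 5) for _ in range(random.randint(0, 6))]
        ys = [random.randint(-5, 5) for _ in range(random.randint(0, 6))]
        if not prop(xs, ys):
            return (xs, ys)
    return None

# law: rev(xs ++ ys) == rev(ys) ++ rev(xs)
print(check_property(lambda xs, ys: rev(xs + ys) == rev(ys) + rev(xs)))
# a wrong law is quickly refuted with a counterexample:
print(check_property(lambda xs, ys: rev(xs + ys) == rev(xs) + rev(ys)))
```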
- types are
- high-level reasoning tool
- communication tool
- what about dependent types?
- I worry that your types become as difficult as code to work with
- a VM to run Erlang
- not designed to run other languages
- like the JVM in that respect
- not designed to run other languages
- properties of the Erlang system:
- lightweight, massive concurrency
- async comms
- process isolation
- error handling
- continuous evolution of the system
- soft realtime
- immutable data
- baked into the system
- predefined set of data types
- tuples, lists, maps
- that’s it
- pattern matching
- primitives built in to the machine for doing this
- functional language
- modules
- no global data
- can work on files and erlang abstract code
- can generate .beam files or binaries
- has Core, a nice intermediate language
- can be input to the compiler
- simple and regular
- easier to compile to
- erlang -> core erlang -> kernel erlang -> beam assembler
- optimization passes between
- you can get anywhere in the compiler
- you can even write the assembler directly (though not recommended)
- simple functional language
- “normal” lexical scoping
- just the basics
- no records
- no list comprehensions
- supports pattern matching
- most optimizations done on core
- dialyzer speaks core
- sort of 🙍
- argh wall of text
- key point: Core Erlang is the nicest thing to target when implementing BEAM languages
- leex - lexer generator
- yecc - parser generator
- syntax tools - for building erlang code
- XML parsers (xmerl)
- compiler
- based on lex/flex but simpler
- uses regexes to define tokens
- generates scanning functions
- direct use
- string/2
- for the i/o system
- token/2, tokens/2
- direct use
Definitions.
U = [A-Z]
L = [a-z]
D = [0-9]
Rules.
{L}({U}|{L}|{D}|_)* :
{token,{atom,TokenLine,list_to_atom(TokenChars)}}.
({U}|_)({U}|{L}|{D}|_)* :
{token,{var,TokenLine,list_to_atom(TokenChars)}}.
%[^\n]* : skip_token.
Usage:
{ok,Tokens,EndLine} = my_scan:string(Chars, StartLine)
{ok,Tokens,EndLine} = io:request(Io, {get_until,'',my_scan,tokens,[StartLine]})
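The leex rules above (lower-case atoms, upper-case variables, % comments skipped) correspond roughly to this regex tokenizer sketch in Python; all names are made up, and leex generates real Erlang scanning functions rather than anything like this:

```python
import re

# mirrors the leex definitions: U = [A-Z], L = [a-z], D = [0-9]
TOKEN_RE = re.compile(r"""
    (?P<atom>[a-z][A-Za-z0-9_]*)   # atoms start lower-case
  | (?P<var>[A-Z_][A-Za-z0-9_]*)   # variables start upper-case or _
  | (?P<skip>%[^\n]*)              # % comments: skip_token
  | (?P<ws>\s+)
""", re.VERBOSE)

def tokenize(chars, line=1):
    tokens, pos = [], 0
    while pos < len(chars):
        m = TOKEN_RE.match(chars, pos)
        if m is None:
            raise SyntaxError("illegal character at %d" % pos)
        line += m.group().count("\n")
        if m.lastgroup in ("atom", "var"):
            tokens.append((m.lastgroup, line, m.group()))
        pos = m.end()
    return tokens

print(tokenize("foo Bar % comment\nbaz"))
# [('atom', 1, 'foo'), ('var', 1, 'Bar'), ('atom', 2, 'baz')]
```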
- basically yacc for erlang
- LALR(1) parser generator
- generates parse/1
Nonterminals E T F.
Terminals '+' '-' '*' '/' '(' ')' number.
Rootsymbol E.
E -> E '+' T : {'+','$1','$3'}.
E -> E '-' T : {'-','$1','$3'}.
E -> T : '$1'.
T -> T '*' F : {'*','$1','$3'}.
T -> T '/' F : {'/','$1','$3'}.
T -> F : '$1'.
F -> '(' E ')' : '$2'.
F -> number : '$1'.
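The same grammar as a hand-written recursive-descent parser in Python, producing the same tuple trees (the '$1'/'$3' pseudo-variables become explicit arguments; the left recursion becomes loops). A sketch for comparison, not generated code:

```python
def parse(tokens):
    # grammar: E -> E ('+'|'-') T ; T -> T ('*'|'/') F ; F -> '(' E ')' | number
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        assert peek() == tok
        pos += 1

    def parse_f():
        nonlocal pos
        if peek() == "(":
            expect("(")
            e = parse_e()
            expect(")")
            return e
        n = peek()          # a number
        pos += 1
        return n

    def parse_t():
        t = parse_f()
        while peek() in ("*", "/"):
            op = peek(); expect(op)
            t = (op, t, parse_f())   # left recursion becomes a loop
        return t

    def parse_e():
        e = parse_t()
        while peek() in ("+", "-"):
            op = peek(); expect(op)
            e = (op, e, parse_t())
        return e

    return parse_e()

print(parse([1, "+", 2, "*", 3]))   # ('+', 1, ('*', 2, 3))
```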
- shift-reduce errors
- generally prefers shift to reduce
- reduce-reduce errors
- would like to write:
expr -> pattern '=' expr ...
but ‘pattern’ and ‘expr’ aren’t distinguishable using 1 token of lookahead as LALR(1), so must write:
expr -> expr '=' expr ...
and extract pattern later.
- Leonard Cohen: New Skin for the Old Ceremony
- you can access all existing erlang code, OTP, etc
- get a language which is as optimal as it can be for the VM
- elixir:
- generates erlang
- lfe:
- generates core erlang
- goals:
- provide lots of lisp goodies
- homoiconicity and macros
- seamlessly interact with vanilla erlang
- provide lots of lisp goodies
- modify the language to fit with erlang/otp
- not common lisp or scheme
- no mutable data
- only erlang data types
- macros are only compile-time
- Lisp-2 (not Lisp-1)
- ie separate namespaces for functions and values
- is as fast as vanilla erlang
- LFE core primitives are core erlang
- so it’s almost a one-to-one mapping from erlang to core
- exceptions:
- lisp ‘if’ goes to core ‘case’
- dynamic functional language
- influenced by ruby
- but not ruby on BEAM
- metaprogramming capabilities via macros
- though cannot change syntax
- many libraries and interfaces standardised and rewritten
- compiles to erlang (rather than core erlang)
- elixir has structs
- we have to fake it
- use existing map datatype
- special format: key ‘__struct__’ with value of struct name
- special syntax and libs to create and recognise the new ‘type’
- not just basic erlang
- different semantics
- non-erlang datatypes
- non-erlang handling of data
- erlog (prolog)
- luerl (lua)
- can use leex
- grammar needs backtracking
- cannot use yecc or any other parser generator
- hand written parser
- data:
- good mapping
- structures ⇔ tuples
- logical variables and unification
- they are both “global” and “mutable”
- can be unified
- changes seen everywhere
- we are cunning and use {Index}
- backtracking
- erlang doesn’t have it
- use continuation-passing style
- efficient and works
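Continuation-passing backtracking, which the host language lacks natively, can be sketched like this (a hypothetical mini-example, not erlog code): each choice point carries a failure continuation to fall back to:

```python
def choose(candidates, sk, fk):
    # try each candidate in order:
    # sk(value, new_fk) on success, fk() to backtrack out entirely
    def try_from(i):
        if i == len(candidates):
            return fk()                          # exhausted: backtrack further out
        return sk(candidates[i], lambda: try_from(i + 1))
    return try_from(0)

def solve(target):
    # find a, b in 1..4 with a + b == target, via nested choice points
    return choose(
        [1, 2, 3, 4],
        lambda a, fk_a: choose(
            [1, 2, 3, 4],
            lambda b, fk_b: (a, b) if a + b == target else fk_b(),
            fk_a),
        lambda: None)

print(solve(7))   # (3, 4)
print(solve(9))   # None
```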
- executing is not “interruptible”
- control is in compiled code
- if we want to have better control we need an interpreter
- implements standard lua 5.2
- lua is
- simple, rather neat imperative lang
- dynamic lang
- lexically scoped
- mutable variables/environments
- embeddable
- leex + yecc good
- one reduce-reduce conflict, easily resolved
- datatypes
- mutable key-value tables
- global
- need to explicitly manage global data
- global table store
- orddict/array/ETC tables
- environment store
- environments
- tables
- global table store
- don’t you want to make a concurrent lua?
- that’s a good question but will take time to answer
- wouldn’t it be nice to make the erlang GC available to other languages
- yes, but it’d make it harder for erlang itself
- Reid Draper, @reiddraper
- distributed databases have a fundamentally different architecture from single-node counterparts
- need to look at:
- Communication and messaging
- correctness testing
- benchmarking
- abstraction
- single-node dbs don’t need to worry here!
- distributed systems need to consider all sorts of things, starting with topology
- “8 fallacies of distributed computing”
- the network is reliable
- latency is zero
- bandwidth is infinite
- the network is secure
- topology doesn’t change
- there is one administrator
- transport cost is zero
- both monetarily and computationally
- the network is homogeneous
- local and remote procedures are indistinguishable
- procedures are familiar
- let’s lift distributed programming to the same abstraction
- RPC claims transparency
- RFC 674
- unfortunately.
- this abstraction fails
- handling failure
- what if the remote node never responds?
- what if the TCP conn is closed?
- did it fail halfway through, or right before it returned to us?
- is it safe to retry?
- procedure call abstraction tells us nothing here
- why don’t we get a stacktrace?
- we just never hear back
- wasn’t this supposed to be transparent?
- what if our RPC call makes another RPC call?
- how do you serialize a file descriptor or a db connection?
- a lot of these problems were known about early on:
- RFC 684 critiques procedure calls as a paradigm for distributed computing
- we send a message to a remote host as a basic operation
- no expectations at this level
- higher-level protocols built on top
- when I send a message, I expect a reply
- erlang’s actor model embodies this
- Typical unit testing: linear effort for the amount of coverage/confidence we need in our code.
- if we want 50 tests, we need to write 50
- are we going to think of edge cases like adding to MAX_INT?
- nondeterminism kills unit testing too
- QuickCheck
- haskell, erlang, scala, clojure
- builds on property-based testing
- distributed system is inherently nondeterministic at runtime
- want to avoid “works on my machine”
- want to find race conditions
- pseudorandomly generate schedules between different distributed nodes
- completely controls interleaving of operations on separate nodes
- simulant
- records actions of test into database while test is running
- imagine we have a cron job running on a single node to take a backup
- make tarball, send to S3
- uses a lot of network bandwidth
- what effect does this have on the system?
- need to capture this sort of thing in your simulation
- programming languages
- language-level api for distributed communication
- libraries for dealing with streaming data
- powerful list processing with map/filter/fold
- can push that into stream processing as well
- conduit library treats a TCP conn as a stream:
tcpConn
  $= CL.map parseInt
  $$ CL.fold (+) 0
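The same streaming idea sketched in Python, with generators standing in for conduit’s sources and conduits (this is not the conduit API, and tcp_conn is a made-up stand-in for a real socket):

```python
def tcp_conn():
    # stand-in for a socket: a lazy stream of newline-terminated chunks
    yield from ["1\n", "2\n", "3\n"]

def parse_int(lines):
    # a "conduit": transforms one stream into another, lazily
    for line in lines:
        yield int(line)

total = 0
for n in parse_int(tcp_conn()):   # the sink: fold (+) 0 over the stream
    total += n
print(total)   # 6
```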
- single node is simple
- can make a distributed database which is more durable than any single node in the system is
- in a three node system:
- is it okay to return to user before all nodes have received a write?
- one may fail, but that’s okay
- we may have to repair later
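The three-node write above can be sketched as a quorum: with N=3 replicas, acknowledge to the user once W=2 succeed, and remember the rest for later repair. A toy sketch with made-up names (a real system writes concurrently and returns as soon as the quorum is met):

```python
def quorum_write(replicas, key, value, w=2):
    acked, failed = 0, []
    for r in replicas:
        try:
            r[key] = value          # sequential here for simplicity
            acked += 1
        except Exception:
            failed.append(r)
    if acked >= w:
        return True, failed         # ok to return to the user; repair `failed` later
    return False, failed

class Down(dict):
    def __setitem__(self, k, v):
        raise IOError("node down")

nodes = [dict(), dict(), Down()]    # one of three nodes may fail: that's okay
ok, needs_repair = quorum_write(nodes, "k", 42)
print(ok, len(needs_repair))        # True 1
```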
- TDD is about pushing complexity away
- solve the easy part first, the bit you understand
- push the hard part away
- when you get to the hard bit, solve the easy bit of that first
- etc
- dose of dissonance
- massive, elite team
- huge, efficient codebase
- etc
- George Miller - The Magical Number Seven, Plus or Minus Two: Some Limits on our Capacity for Processing Information
- long-term memory is (effectively) limitless
- short-term memory is limited
- analogy with registers in processor
- “chunking” helps hold things together
- Nelson Cowan
- showed that chunking was responsible for getting up to 7 ± 2
- in fact limit was more like 3 or 4
- some people can do more, but stress brings them back down
- “this is the easy one that everyone understands”
- I used to wield the Refactoring book, and talk about design patterns, etc
- anecdote: the 10,000-line Java Bean
- I felt I had failed
- new rule: every method can only be 5 lines long
- every class can only have 5 methods
- it didn’t work, but it seemed like a good idea
- small:
- functions
- in haskell, it doesn’t matter if your parameter names are called x, f, because the fn is so small you can see what’s going on
- parameter lists
- services
- actors
- applications
- data infrastructure
- “wouldn’t it be easier if every user had their own database?”
- we used this idea to break up our monolithic database into a bunch of small ones
- commits
- large commits are:
- hard to understand
- hard to review
- hard to revert
- functions
Test everything that could possibly break – Kent Beck
- “that’s great, but my code is inherently big”
- no it isn’t
- this is the easy part
- you know how to do this
- usually people say that big companies suck
- cubicle-dilbert-office space environments
- what works: take big teams and split them into smaller, independent teams
- “Zero to One” by Peter Thiel
- Dunbar’s Number
- Gore-Tex practice of deliberately capping the size of an office to 150 people
- Military notion of fireteam – small (4) autonomous team
- Putnam: team size can be key to a successful software project
- Jeff Sutherland: Keep Team Size Under 7!
- larger teams need leadership
- leadership is hard
- we don’t know how to do it well
- larger teams have a communication overhead
- single responsibility principle
- 2005/6 – worst burnout of my career
- the big rewrite
- they had a massive php codebase
- we talked about extracting pieces
- doing the strangler pattern
- but we ended up doing a year long rewrite instead
- we shipped it, and the customers hated it
- they sent hate mail
- I should have stopped it
- it self destructed
- it took me years to recover from that project
- maybe I never did?
The problems with big projects:
- too much change at once
- small change is easier to introduce successfully
- big risk
- no real reward until you get to the end
- you can’t learn anything as you go
- http://ww2.cfo.com/risk-management/2014/06/why-big-projects-go-bad/
- Standish CHAOS report
- small projects much more likely to succeed than large projects (76% vs 10%)
- http://www.versionone.com/assets/img/files/ChaosManifesto2013.pdf 8th page (numbered 4 on the page itself)
Should we just get better at dealing with larger things instead of using a crutch? (maybe. maybe you can be the person that can memorise 9-11 items instead)
Isn’t this all the same as decomposition, cohesion, etc? (yes.)
This conference is focussing on tools and/or techniques, which will have an incremental impact. But making things small will have a dramatic impact.
“Big” is worse than dogma, partly because it’s our default setting
References: https://www.wunderlist.com/list/136225945
- Pieter Hintjens
- making protocols vs self-organizing around problems
- statement: protocols are contracts
- who enjoys signing contracts? say for rental, loans, etc
- what about if you have no contract? it’s far worse
- contracts are a cost of doing business
- contracts are a tool for reducing conflict
- in software you need this a lot
- groups work together best when they have contracts between them
- a protocol isn’t just a TCP RFC with frames and headers etc
- it’s a way of working together
- when you do this, you have to define the parties/roles
- a protocol for pizza isn’t just “there will be pizza”
- you need to identify pizza providers, pizza consumers
- types of ingredients, sizes, delivery time, price, temperature on arrival
- you don’t describe the name of the chef
- contracts need to be specific, but also minimalistic
- cheats and liars try to bamboozle with irrelevant extra language
- it’s hard to fake a pizza contract
- you can sell low-quality pizza but you’re still technically fulfilling the contract
- then you get a natural market
- consumers can choose between suppliers
- contracts should also protect against the cheats and liars
- make it hard to cheat
- detect and punish when it happens
- in the early days of the internet, there were the “embrace and extend” wars
- the big players own the standardization
- AMQP standardization was an insane amount of chaos and pain
- people pay lip-service to standards but want to carve out their own captive markets
- large-scale standardization is subject to regulatory capture
- if protocols are useful they will attract money
- which will attract predatory behaviour
- stuff the committee with your friends
- push through changes which no one understands and can’t fight
- patent it
- trademark it
- eg Bluetooth
- hire all the people who make the protocol
- copyright it
- all of these things happened to AMQP
- we will not bet on these protocols
- the protocol had been killed
- we came up with a protocol process that couldn’t be captured
- ZeroMQ:
- write a spec
- if that spec is viable people might try to capture it
- we publish the spec under GPL
- if at some point the author goes insane and makes changes, you just fork it
- fork is GPL too
- legal wrapping is designed to be capture-proof
- workshop: write an RFC
- start very simple: send “hello” on one port, receive “world” on another port
- lesson: do not solve problems you don’t actually have
- don’t solve hypothetical problems, no matter how sure you are you’ll hit them tomorrow
- determined, applied laziness is essential
- early ZeroMQ had no versioning, framing, etc
- it was crap
- we spent no time arguing about anything
- second version added a version header
- next version added metadata
- then handshake
- then security
- very gradual
- could prove everything
- nothing was there that wasn’t desperately needed
- engineers love to create stuff
- when we create stuff, we’re usually wrong
- we made assumptions
- we don’t actually need it
- layering:
- secret to long-term success
- when you write a contract, it’s worth nothing
- then you give it to people and ask them to use it
- if it’s valuable to someone, they make their own implementation
- when your business is based on your implementation of my contract, then my contract wins
- the way you make changes to a contract is:
- make a new contract which is a modification of the old
- make it backwards-compatible
- there’s no hurry – this takes years
- you get the power to create very large, distributed systems of people
- “living systems”
- not necessarily for one business
- can work very efficiently
- “I see software as a tool for destroying large, inefficient companies”
- protocols are a way to organize people to destroy their business
- protocols arise out of a problem, not out of nothing. say where you have a lot of different people doing the same thing in different ways. does that change things compared to your description? your MVP needs to convince people to switch
- one of my projects is called edgenet
- I want everyone on the planet to use it
- I want to start small
- I want to prove the problem is a valid problem
- I will not speculate
- you need to set clear rules and boundaries
- bad actors don’t like this and will stay away
- you can’t afford verbal contracts
- you need to have a documented process, written by someone as cynical as me
- once a year I will ban somebody
- you don’t just absorb damage from bad actors
- you make it clear that it is not acceptable
- in a self-organizing decentralized team, how come you get to decide who gets banned?
- you can’t. but you must own your own specific project
- there are other projects with criminals and cheats, which you can’t control
- but they just make you look better
- you should warn your friends
- blacklist people and spread the news around
- and it’s not your fault when there’s a conflict
- when it doesn’t work, at a certain point
- you need multiple people, ask around
- if other people are having issues with the same person, don’t be afraid to take steps to ban them
- it’s not innocent, it’s predatory maliciousness
- punish the evil
- what do you think about the upcoming MQTT and all that?
- big companies make stuff.
- then they try to push it through
- I’m running this devroom at FOSDEM on the IoT
- the MQTT guy came along and said he wanted to make a devroom on MQTT
- I said we don’t want to promote particular products
- turned out he was a marketing manager
- MQTT is one of these very sad, corporate, zombies
- it’s a big, nasty, clumsy, stupid piece of shit
- I would never build anything on that
- go back to raw TCP, it’s better than MQTT
- most software is rubbish, so this isn’t really news
- (joeerl: what is MQTT?)
- it’s “Message Queuing T something”
- pub-sub protocol pulled up by IBM
- it’s marketed as the protocol of the Internet of Things
- (joeerl: what?!)
- most of this is in the ZeroMQ Guide chapter 6 (free!)
- Debasish Ghosh
- wikipedia definition
- Domain Driven Design, book by Eric Evans
- historically - rich domain models
- class models the domain abstraction
- contains state and behaviour together
- state hidden within private access specifier
- we define “nouns first”
- lean domain models
- algebraic data type models the domain abstraction
- immutable by definition
- contains only the defining state as immutable values
- immutability is the basis of the design here
- nothing but the bare essential definitions go in the ADT
- “verbs first”
- rehash of evans’ book
- entities & value objects
- modeled with types
- behaviours
- modeled with functions
- domain rules
- constraints and validations
- bounded context
- ubiquitous language
- ability to reason about your code
- increased modularity - clean separation of state & behaviour
- immutable data structures
- concurrency becomes easier
- problem domain:
- place order
- do trade
- process execution
- solution domain:
- functions
- on types
- constraints
- algebra:
- morphisms
- sets
- laws
- not concerned with implementation (right now)
- compose for larger abstractions
- a set of elements S
- a special element e
- an operation ⊗ such that
- A ⊗ e = A = e ⊗ A (identity)
- (A ⊗ B) ⊗ C = A ⊗ (B ⊗ C) (associativity)
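These laws can be illustrated in a few lines of Python, using integers under addition as the monoid:

```python
from functools import reduce

# (int, +, 0) is a monoid: 0 is the special element e, + is the operation
e = 0
op = lambda a, b: a + b

assert op(7, e) == 7 == op(e, 7)           # identity law
assert op(op(1, 2), 3) == op(1, op(2, 3))  # associativity law

# a monoid is exactly the structure a fold needs:
total = reduce(op, [1, 2, 3, 4], e)
```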
- client places order (in some flexible format)
- transform to internal domain model entity and place for execution
- when execution comes back, trade & allocate to client accounts
We create some types and some operations on those types:
def clientOrders: ClientOrderSheet => List[Order]
def execute: Market => Account => Order => List[Execution]
def allocate: List[Account] => Execution => List[Trade]
and define certain laws that the operations must satisfy. This is the algebra of the API.
For a single algebra we can have multiple interpreters aka implementations.
We can decouple the algebra from the interpreter.
Let’s mine some patterns:
def clientOrders: ClientOrderSheet => List[Order]
def execute: Market => Account => Order => List[Execution]
def allocate: List[Account] => Execution => List[Trade]
Note that clientOrders returns List[Order] and that execute takes an Order. Also execute returns a List[Execution] and allocate takes an Execution.
This has the pattern of a Kleisli arrow (in haskell type notation: Monad m => a -> m b, or in scala: A => Monad[B]).
Define clientOrders thus instead:
def clientOrders: Kleisli[List, ClientOrderSheet, Order]
def allocate(accounts: List[Account]): Kleisli[List, Execution, Trade]
def tradeGeneration(
market: Market,
broker: Account,
clientAccounts: List[Account]) = {
clientOrders andThen
execute(market, broker) andThen
allocate(clientAccounts)
}
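The same composition can be sketched for the list monad in plain Python; the pipeline functions below are invented stand-ins for the Scala ones above:

```python
def kleisli(f, g):
    """Compose f: A -> list[B] with g: B -> list[C] into A -> list[C]
    (Kleisli composition for the list monad)."""
    return lambda a: [c for b in f(a) for c in g(b)]

# toy stand-ins for the trade pipeline -- the names are illustrative only
client_orders = lambda sheet: [f"order-{i}" for i in range(sheet)]
execute = lambda order: [order + "/exec"]
allocate = lambda execution: [execution + "->acct1", execution + "->acct2"]

trade_generation = kleisli(kleisli(client_orders, execute), allocate)
trades = trade_generation(2)  # 2 orders -> 2 executions -> 4 trades
```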
Implementation follows the specification and thus we get the Ubiquitous Language for free!
Is “nouns first” really the right approach to DDD?
Claim: with functions and types as the binding glue we get all these patterns for free using generic abstractions
- DDD has patterns like Factories & Specification
- these are just normal idioms in functional programming
- they don’t even need special names
Monadic validation pattern:
val s = for {
_ <- validQuantity
_ <- validValueDate
t <- validUnitPrice
} yield t
(aside: not sure the for-notation adds much here)
Smart constructor: factory pattern (chapter 6, DDD)
validation: specification pattern (chapter 9, DDD)
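A rough Python sketch of the monadic-validation idea behind the smart constructor (the validators and the Trade shape are hypothetical):

```python
def valid_quantity(q):
    return ("ok", q) if q > 0 else ("err", "quantity must be positive")

def valid_unit_price(p):
    return ("ok", p) if p > 0 else ("err", "unit price must be positive")

def make_trade(qty, price):
    """Smart constructor: run each validation, short-circuit on the
    first failure, and only build the value when everything passes."""
    for tag, payload in (valid_quantity(qty), valid_unit_price(price)):
        if tag == "err":
            return ("err", payload)
    return ("ok", {"qty": qty, "price": price})

good = make_trade(100, 42.0)
bad = make_trade(-1, 42.0)
```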
- FP is focussed on finding the generic
- we implement all specific patterns in terms of generic abstractions
- based on function composition
case class Trade(
refNo: String,
instrument: Instrument,
...)
def netValue: Trade => Money = //..
def valueTradesInBaseCcy(ts: List[Trade]) = {
ts.foldLeft(Money(0,baseCcy)) { (acc,e) =>
acc + inBaseCcy(netValue(e))
}
}
case class Transaction(id: String, value: Money)
def highestValueTxn(txns: List[Transaction]) = {
  txns.foldLeft(Money(0,baseCcy)) { (acc,e) =>
    if (e.value > acc) e.value else acc
  }
}
Instead of using a specific collection type (List), we can use a generic Foldable type. This reduces the possibilities available to valueTradesInBaseCcy and highestValueTxn, and reduces scope for error. This is parametricity working for us.
def mapReduce[F[_], A, B](as: F[A])(f: A=>B)
(implicit fd: Foldable[F], m: Monoid[B]) = fd.foldMap(as)(f)
def valueTradesInBaseCcy(ts: List[Trade]) =
mapReduce(ts)(netValue andThen inBaseCcy)
def highestValueTxn(txns: List[Transaction]) =
mapReduce(txns)(_.value)
Move the algebra from domain-specific abstractions to more generic ones, to make our code more generic, less verbose, more robust
The program space in the mapReduce function is shrunk - so scope of error is reduced
Increases the parametricity of our program - mapReduce is now parametric in type parameters A and B
Parametricity: theorems for free! (not the first time this has been mentioned in these notes)
- when you’re learning, it’s not apparent which abstractions are available - eg I didn’t know about the kleisli arrow. how does a beginner get to that stage?
- there’s no short answer
- use the tools which are out there - eg hoogle, FP in Scala book
- Typeclassopedia
- edward kmett :)
“Names are the one and only tool you have to explain what a variable does in every place it appears, without having to scatter comments everywhere.” – Michael Mahemoff
You don’t always need to use a name:
window.setTimeout(
function() {
console.log("hello");
}, 1000
);
you need names if you want to rename things:
http://mollyrocket.com/casey/stream_0029.html (zomg, fails without js)
SessionProperties.Wnode.ClientContext = 1;
- Here ClientContext is the type of a timestamp. the name is wrong and misleading, although it does correspond to the implementation
“A ‘name’ is a symbol – usually a human-readable-string – identifying some resource, or set of resources.
The string need not be meaningful to all users, and need not be drawn from a uniform name space.”
– Schoch
Names can be mangled:
__TFCCC4test1a1b1c1dfS2_FTS0_1xS1_1vFT1xSi_Si_OVS_1e1f
is a symbol name in swift
- names drawn from a namespace
- a name can be valid but not be assigned (or bound)
- can also define aliases
- a filesystem is a series of nested namespaces
- example: haskell module HttpClient in httpclient.hs - actually in /usr/local/httpclient/httpclient.hs
- name purposes:
- name: what we want
- address: where it is
- route: how we get there
- as you go down this list, it gets less important from a human cognition point of view
- change HttpClient to Sausage - wtf?
- change httpclient.hs to sausage.hs - confusing, but less of an issue
- change /usr/local/httpclient to /home/philandstuff/sausage - not sure it would be an issue at all any more
- philosophy of naming
- Wittgenstein
- idea
- names gain meaning through convention
- names attach to your understanding of things
- sense is in propositions
- interesting propositions are generalizable
- algorithm:
- name things after yourself
- if you need more names, add numbers to the end
- example: George Foreman
- the George Foreman Grill
- children:
- George Jr
- George II
- George III
- Georgette
- example: PageRank
- decent name
- often when we name after someone, we have no good way to otherwise describe it:
- Pythagoras’ Theorem
- Dijkstra’s Algorithm
- another approach: the Ronseal Pattern (it does exactly what it says on the tin)
- “Decking stain” - it stains decking
- note: not named after ingredients
- Code Complete Ch 11 – naming
- “In general, if a name refers to some aspect of computing rather than to the problem …”
- hungarian notation
- analysis: http://blogs.msdn.com/b/oldnewthing/archive/2004/06/22/162629.aspx
- there were two approaches: the systems division and the apps division
- lpString1 (for long pointer) vs szDest (string terminated by zero)
- apps division was better – about how it was to be used, rather than what it was
- “the point of having proper names is to just refer to objects and not to describe them” – Searle
-- most unrestricted
haversine :: (Ord a, Floating a) => a -> a -> a -> a -> a
-- says how to use it, but not what it does
haversine :: Float -> Float -> Float -> Float -> Float
-- says what it does, but it imposes an extra cost -- how do I get a
-- Location value?
haversine :: Location -> Location -> Distance
- “Zooko’s Triangle” – Human Meaningful, Secure, Decentralised - pick two
- (not quite a CAP-theorem guarantee, but a broad property)
- is a name necessary?
- if it doesn’t need one, don’t give it one
- has it already been named?
- don’t invent new names for things
- (grr, hash vs map, map vs collect, fold vs reduce vs inject)
- what is the context & category?
- how should it be used?
- how will people remember it?
- don’t make it:
- too long
- too similar to something else
- unpronouncable
Ideas:
currentLoggedInUser vs me
- a shorter name is expected to be used more
- a longer name indicates you expect it to be used less
- but also that the distinction you make with the extra length is a distinction worth making
- if someone doesn’t have the name for our thing, they can’t use it
- a lot of our success is about people being able to actively use the things we create
- links
- “a lot of naming comes down to programmer readability, and similarly to English. How do you go about naming for non-English audiences?”
- comes down to context of audience again
- have to use words that the audience are likely to use
- if you know that your audience are haskell programmers, then they will already know the words in Prelude
- not necessarily well enough to use in English, but well enough to use in Haskell
- when you first define a name, the comments are a good place to expand
- if you’re sitting with someone who is explaining something to you, the words they use to explain it are sometimes better names than the name itself
- often it doesn’t matter if you get it wrong first time
- but in a widely-used library, it’s hard to change
- the harder it is to change, the more effort you should put in to getting it right
- “is it sometimes useful to name things that you’re not using (in pattern matching)?”
- yes, sometimes they’re useful purely for documentation rather than for use
- “do you abbreviate names?”
- it can depend
- “what about single-letter variable names? would you follow the convention of the community, or break convention and be descriptive?”
- single-letter variable names (particularly l) are confusing
- some odd names can get written when extracting a tmp var to break down a large expr
- @edwinbrady, St Andrews University
- where is st andrews?
- wonderful collection of error displays
- it’s pretty hard to write software that works
- famous bugs:
- heartbleed
- a bug so bad that a marketing department got involved
- nobody spotted it for a long time
- goto
- goto fail
Idris is a Pac-man Complete functional programming language with dependent types.
cabal update; cabal install idris
- https://github.com/edwinb/idris-demos
In this talk: Reasoning about side effects in Idris
- extra-functional correctness
- example: communication protocols
example: write the type, the editor automatically gives options for implementation:
vzipWith : (a -> b -> c) -> Vect n a -> Vect n b -> Vect n c
vzipWith f [] [] = []
vzipWith f (x :: xs) (y :: ys) = f x y :: vzipWith f xs ys
Since this is the only implementation, the editor can write this function for us from the type alone, including splitting the cases into separate pattern matches and making the recursive call.
(it’s hard to take notes on this to demonstrate the real interactivity offered…)
aside: with dependent types, you can put anything you want in a type and it can get very complicated.
Please don’t do that.
It’s very important that the types make sense.
The Eff type family indicates side-effects:
vadd_check : Vect n Int -> Vect m Int ->
{ [EXCEPTION String] } Eff (Vect m Int)
read_vec : { [STDIO] } Eff (length ** Vect length Int)
do_vadd : { [STDIO, EXCEPTION String] } Eff ()
Note that we only allow STDIO, not BALLISTICMISSILEIO.
This gives a program which will add two user-supplied vectors of the same length.
Eff is effectively a DSL for managing what you’re allowed to do and proving that you don’t do anything forbidden.
- eg: open a file: are you allowed to read it?
- only if the open was successful
- types capture success and failure
- like a Hoare triple
- precondition, command, postcondition
- CLOSED >- knock -> CLOSED >- open door -> OPEN >- close -> CLOSED
knock : { [DOOR Closed] } Eff ()
openDoor : { [DOOR Closed] ==> [DOOR Open] } Eff ()
closeDoor : { [DOOR Open] ==> [DOOR Closed] } Eff ()
state transitions which handle failure:
data Jam = Opened | Jammed
openDoor : { [DOOR Closed] ==>
{isJammed}
[DOOR (case isJammed of
Opened => Open
Jammed => Closed)] } Eff Jam
doorProg : { [STDIO, DOOR Closed] } Eff ()
doorProg = do knock
jam <- openDoor
case jam of
Opened => closeDoor
Jammed => do putStrLn "NNnnnnnnngh"
return ()
This is getting deeply nested. Is there a better way to handle failure?
Exceptions? Problem with exceptions is that you don’t know what state you were in when you threw; so you don’t know what cleanup you need to do.
Inspired by perl’s || die syntax:
doorProg : { [STDIO, DOOR Closed] } Eff ()
doorProg = do knock
Opened <- openDoor | Jammed => return ()
closeDoor
If you have a precise enough type system, you have a lot of help.
Similar to session types. See http://eb.host.cs.st-andrews.ac.uk/drafts/protocols-spls0614.pdf
data Actions : Type where
DoSend : (dest : p) -> (pkt : Type) ->
(cont : pkt -> Actions) -> Actions
DoRecv : (src : p) -> (pkt : Type) ->
(cont : pkt -> Actions) -> Actions
DoRec : Inf Actions -> Actions
End : Actions
echo : Protocol ['C, 'S] ()
echo = do 'C ==> 'S | String
'S ==> 'C | String
Done
Refine the types to ensure it really is an echo server:
echo : Protocol ['C, 'S] ()
echo = do msg <- 'C ==> 'S | String
'S ==> 'C | Literal msg
Done
- “could we have a maybe effect to handle the door failure case?”
- there are loads of ways to deal with this
- the key thing is that the type says what state you’re in at each point
- there is an EXCEPTION effect but you lose the information about the state you’re in
- @thattommyhall
- Second reference in this conf to Mac Lane’s “Categories for the Working Mathematician”
- which is of course the start of the 20th century
- David Hilbert vs Kurt Gödel
- also Russell and Whitehead
- Principia Mathematica
- propositional calculus
- example logical proof
- ∃v¬φ (Assumption)
- ¬φ (Assumption)
- ∀vφ (Assumption)
- φ (Universal Quantifier Elimination, 3)
- (φ&¬φ) (Tautology,2,4)
- ∀vφ -> (φ&¬φ) (CP,5)
- ¬∀vφ (Tautology,6)
- ¬∀vφ (EH,7)
- Hofstadter, Gödel, Escher, Bach
- more programmer-friendly syntax with nesting
- e ::= v | λv.e | (e e)
- Lisp syntax is inspired by lambda calculus:
(fn [x] (+ x 1))
(\x. E) M => E{M/x}
((\x. x + 1) 2) => 2+1 => 3
((fn [x] (+ x 1)) 2)
((\x.\y.(x y)) (y w))
=> (\y.((y w) y))
but the name collision here changes the meaning. We didn’t mean for the y inside (λy. ..) to refer to the same thing as in (y w)
((fn [x]
(fn [y]
(x y)))
(y w))
=> (in a call-by-name version of clojure)
(fn [y]
((y w) y))
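The capture-avoidance rule can be sketched as code: a minimal Python substitution function over terms encoded as tuples (an invented encoding, not from the talk), which renames a binder whenever substituting under it would capture a free variable:

```python
import itertools

_fresh = (f"v{i}" for i in itertools.count())

def free_vars(t):
    kind = t[0]
    if kind == "var":
        return {t[1]}
    if kind == "lam":
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])   # app

def subst(t, x, s):
    """Compute t[s/x], renaming bound variables to avoid capture."""
    kind = t[0]
    if kind == "var":
        return s if t[1] == x else t
    if kind == "app":
        return ("app", subst(t[1], x, s), subst(t[2], x, s))
    v, body = t[1], t[2]
    if v == x:                       # x is shadowed; nothing to substitute
        return t
    if v in free_vars(s):            # would capture: rename the binder first
        v2 = next(_fresh)
        body = subst(body, v, ("var", v2))
        v = v2
    return ("lam", v, subst(body, x, s))

# ((\x.\y.(x y)) (y w)): substituting (y w) for x under \y must rename y
term = ("lam", "y", ("app", ("var", "x"), ("var", "y")))
result = subst(term, "x", ("app", ("var", "y"), ("var", "w")))
```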
- applicative order (call-by-value)
- normal order (call-by-name)
- End
- Inc n m (add 1 to register n, goto m)
- DecB n m p (subtract 1 from reg n, if x=0 goto m else goto p)
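A minimal interpreter for these instructions, assuming the convention that DecB branches to m when the register is already zero and otherwise decrements and jumps to p (the notes are ambiguous on this):

```python
def run(prog, regs):
    """Interpret a tiny register machine.
      ("Inc", n, m)     -- add 1 to register n, goto m
      ("DecB", n, m, p) -- if register n is 0 goto m, else subtract 1, goto p
      ("End",)          -- halt
    """
    pc = 0
    while prog[pc][0] != "End":
        ins = prog[pc]
        if ins[0] == "Inc":
            regs[ins[1]] += 1
            pc = ins[2]
        else:  # DecB
            if regs[ins[1]] == 0:
                pc = ins[2]
            else:
                regs[ins[1]] -= 1
                pc = ins[3]
    return regs

# r0 := r0 + r1, by draining r1 into r0
add = [
    ("DecB", 1, 2, 1),  # 0: r1 empty? then halt, else r1-=1 and goto 1
    ("Inc", 0, 0),      # 1: r0 += 1, back to the loop test
    ("End",),           # 2: halt
]
regs = run(add, {0: 2, 1: 3})
```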
- Robin Milner is responsible for a lot of this work
- Pi Calculus syntax
- parallel processes communicating over channels
- writes block until corresponding read, reads block until corresponding write
- non-determinism
- if two writers write to same channel, nondeterministic which message a reader gets
- wiki http://en.wikipedia.org/wiki/Dining_philosophers_problem
- solved using Pict
- syntax + semantics
- CSP for clojure
- one difference from pi calculus: channels are allowed a buffer
- to remove blocking from writes and reads
- though the buffer is always bounded
- (one issue with the actor model is that the mailbox is unbounded)
- channels are first-class values which can be sent down channels
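The buffered-channel behaviour can be approximated in Python, with a bounded queue.Queue standing in for a core.async channel (puts block when the buffer is full, gets block when it is empty):

```python
import queue
import threading

ch = queue.Queue(maxsize=2)      # a bounded buffer, as in core.async
DONE = object()                  # sentinel to signal the end of the stream

def producer():
    for i in range(5):
        ch.put(i)                # blocks whenever the buffer is full
    ch.put(DONE)

received = []
t = threading.Thread(target=producer)
t.start()
while True:
    v = ch.get()                 # blocks until a value is available
    if v is DONE:
        break
    received.append(v)
t.join()
```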
- mobile pict
- join calculus
- LVars (Lindsey Kuper)
- RICON
- Spi-calculus (crypto)
- biology (simulations of cells using pi calculus)
- modelling telephony protocols
- “the world just does not suffer significantly from the kind of problem that our research was originally intended to solve” – Hoare
- INRIA JoCaml
- Microsoft Research (c-omega?)
- F♯ tryjoinads
- Me (maybe…)
- David Pollak has been talking about making clojure core.async distributed
- not rocket science
- some of this stuff may be the future
- it’s also interesting
- John Hughes @rjmh
- why is testing hard?
- say you have n features:
- O(n) tests
- but some bugs appear only when features used together:
- pairs of features
- O(n^2) tests
- etc
- you can’t test everything
- where do you stop?
- 1999 - Koen Claessen and John Hughes, for Haskell
- 2006 - Quviq founded marketing Erlang version
- many extensions
- finding deep bugs for ericsson, volvo, basho, etc
Binary search trees:
to_list(leaf) ->
[];
to_list({node,L,X,R}) ->
to_list(L) ++ [X] ++ to_list(R).
member(_,leaf) ->
false;
member(X,{node,L,Y,R}) ->
if X==Y ->
true;
X<Y ->
member(X,L);
X>Y ->
member(X,R)
end.
prop_member() ->
?FORALL({X,T},{nat(),tree()},
member(X,T) == lists:member(X,to_list(T))).
?FORALL asserts a property is true for all generated natural numbers and trees. We then generate test cases fitting this pattern.
Every time I run the test, I get a different, potentially large case; but shrinking reduces it down to a much smaller and more manageable case.
If you have a failing test case, you may want to keep the failing testcase (esp if it appears less frequently), to ensure that you really have fixed the bug.
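A miniature of the generate-then-shrink loop in Python, mirroring prop_member above; this is an invented toy, not Quviq's QuickCheck:

```python
import random

def insert(x, t):
    """Binary search tree as nested tuples; None is the empty leaf."""
    if t is None:
        return (None, x, None)
    l, y, r = t
    if x < y:
        return (insert(x, l), y, r)
    if x > y:
        return (l, y, insert(x, r))
    return t

def to_list(t):
    if t is None:
        return []
    l, x, r = t
    return to_list(l) + [x] + to_list(r)

def member(x, t):
    if t is None:
        return False
    l, y, r = t
    if x == y:
        return True
    return member(x, l) if x < y else member(x, r)

def prop_member(x, xs):
    t = None
    for v in xs:
        t = insert(v, t)
    return member(x, t) == (x in to_list(t))

def quickcheck(prop, tries=200):
    """Return None if the property held on every try,
    else a greedily shrunk counterexample."""
    for _ in range(tries):
        x = random.randrange(10)
        xs = [random.randrange(10) for _ in range(random.randrange(20))]
        if not prop(x, xs):
            i = 0
            while i < len(xs):            # shrink: drop elements one by one
                cand = xs[:i] + xs[i + 1:]
                if not prop(x, cand):
                    xs = cand             # still fails: keep the smaller case
                else:
                    i += 1
            return (x, xs)
    return None

failure = quickcheck(prop_member)   # None here: the property holds
```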
Enter: QuickCheck CI for testing github projects
- demo with https://github.com/rjmh/trees
- testing
- coverage
- often directs your attention straight to where the bug is
- coverage data is per-test
- the bug can only be in the green lines, since the red lines weren’t executed
- failing tests are remembered between builds
- build doesn’t go green until all previously failing cases are passing
- test the process registry
- register(Name,Pid)
- unregister(Name)
- spawn() - to create pids for test data
- what’s different?
- these functions change the state of the registry
- we won’t find bugs
- instead: reverse engineer preconditions
- test case: sequences of API calls
- for each call, model how the state has changed
- can use postconditions on each call
- when a test fails, shrink the sequence of calls
specification of register:
register_pre(S) ->
S#state.pids /= [].
register_args(S) ->
[name(),elements(S#state.pids)].
register(Name,Pid) ->
erlang:register(Name,Pid).
register_next(S,_,[Name,Pid]) ->
S#state{regs=S# ...
- can conveniently specify the intended behaviour of stateful systems
- testing in practice involves
- reverse engineering the actual specification
- documentation is always wrong
- specification is always incomplete
- finding and correcting bugs
- 3000 pages of spec
- 20,000 lines of QuickCheck, testing
- 1,000,000 LoC, 6 suppliers
- 200 problems, of which
- 100 problems in the standard itself
does this approach scale? yes!
- QuickCheck finds one bug at a time
- not much point running quickcheck again
- it will just shrink down to the same testcase
- fix the bug
- only then do you find the next bug
- what if fixing the bugs isn’t so easy?
- in the CAN bus case, we didn’t have the source code
- would send a bug report, and get a new version maybe 3 weeks later
- in practice:
- unfix the spec to allow the bug
- if we’ve modelled the bug we’ve found correctly, quickcheck will now find the next bug
- feels icky
- “More bugs”
- compute “bug signatures” from failures already found
- focus testing effort on areas not yet known to be buggy
- in practice:
- we moved from finding 1 bug per run to finding 20+ bugs in a single run of QuickCheck
- “is quickcheck-ci erlang only?”
- you write the properties in erlang
- there’s a C project we’re testing using it
- “will you support quickcheck for haskell?”
- that might be fun
- there’s nothing happening along those lines
- Andreas Stefik
- professor at university nevada las vegas
- research how proglangs impact human beings
- mostly work on Quorum these days
- Chad Fowler
- used to be a ruby guy
- usually use:
- ruby
- haskell
- clojure
- scala
- go
- node
- erlang
- objective c
- Jessica Kerr
- 15 years of java
- scala then clojure
- I play with ruby in order to speak at ruby conferences
- Steve Vinoski
- work for basho
- lately a lot in C (working on Erlang VM)
- long time C++ (started 1988)
- whole bunch of languages
- Viktor Klang
- work for TypeSafe
- mostly scala
- bit of java, bash, applescript
- jess: I think Stefik settled that yesterday :)
- ie that static typing improves productivity at some point between half a year and 3 years
- although we have no data in that interval
- use of generics
- Chad: what does static typing mean?
- stefik: can mean:
- interesting haskell syntax
- generics with crazy wildcards
- most developers think it means “I have types on my method decls”
- this property is a strong predictor (for what?)
- implementing generics gives a tenfold decrease in productivity
- using generics gives a slight bump
- joeerl: currently we use a lot of C and a lot of erlang
- in my experience, 50x more errors in the C than on the erlang
- I’d like to see the division between GC (java) and non-GC (C) langs
- microsoft became rich by shipping buggy software early
- will static types deliver?
- Stefik:
- I didn’t make a sweeping statement
- we ran formal peer-reviewed RCTs
- with human users
- we published the data and the methodology
- it didn’t say “static typing is always better”
- joeerl: what langs were used in that study
- Stefik: we did 8 studies
- joeerl: can you infer results from one language to another
- jess: there’s a lot more difference between C and erlang
- Stefik: we didn’t test C in the syntax studies
- there might be any number of causes of bugs in a language
- memory leaks
- if you’re saying there’s many things besides type systems that can affect your productivity, that’s true
- Chad: there’s also strong vs weak typing
- steve: types don’t save me across the network
- types don’t save me from locking and concurrency
- other things affect productivity and correctness
- Stefik: that’s true of course
- we don’t know much about these other things that affect productivity
- there aren’t many studies - there’s one from UT Austin which showed an eightfold increase in bugs with threads+locks vs STM
- this isn’t surprising but it’s a huge productivity impact
- but we’ve also shown that static vs dynamic typing is a productivity impact
- rvirding: how much is the typing, and how much is being forced to write down the type?
- stefik: the physical manifestation (the fact that it’s in the file) matters
- we’ve tested this
- the type needs to be in the method declaration
- not universal across programmers:
- 92% of novices screw up types in method decls
- people with more experience get a productivity boost
- Chad: a little over 10 years ago, there was a big movement for dynamic typing against static typing
- ruby was the big upcoming
- strong dynamic typing vs weak static typing
- now, we’re all pushing for static typing and category theory
- projects still suck just as bad, software still sucks just as bad
- viktor: there’s a difference between structural and nominal types
- particularly with concurrency
- you don’t need to ship the definition of the types around with structural types
- Stefik: my wish is that people start collecting more and more evidence on this stuff
- pendulum swings are good
- but if we hone down what works and what doesn’t, when we define a new language, we can read 100 papers on productivity
- jess: everybody can’t read every paper! (reference to @aphyr debate recently)
- steve: you just don’t want to reinvent the wheel
- jess: how do you find that previous work?
- steve: leslie lamport’s still around! you can ask him
- reid draper gave a talk here and included a thing on RPC vs messaging
- had good mentions of RFCs from 1970s (see above)
- we have google now
- you don’t have to sit in the library
- jess: to google it you have to have the name already!
- stefik: I’m talking about people defining a new language
- you simply can’t read all the papers
- the number of papers published in a month is probably more than anyone can read in a lifetime
- viktor: getting up to speed with where the researcher was when they wrote the paper is a significant effort
- steve: take all the good papers and put them on stackoverflow
- people would read them then!
- stefik: there would be lawsuits
- it’s horribly stupid
- institutions are judged not on the quality of papers, but on the number they produce
- and when they are produced
- the publishing company gets paid for the research
- they contribute nothing to any part of the process
- they provide an “upload” button and you click “submit”
- jess: we’ve discovered this thing called “open source”
- francesco: viktor, how do you approach the work you’re doing, and how do you combine new ideas with old ideas?
- viktor: what helps me the most is to start from scratch
- esp for distributed systems, don’t start with computers
- first think “how would we solve this problem with people”?
- the more I learn about fundamental problems, the more I see them everywhere
- not just in computing
- queueing theory
- a lot of people never peer through the base level
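viktor names queueing theory as one of those fundamentals. A minimal sketch of the standard M/M/1 mean-response-time formula, W = 1/(μ − λ), shows the kind of non-obvious behaviour it predicts: response time explodes as utilization approaches 100%, which is why systems run near capacity feel fine right up until they don’t.

```python
def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system for an M/M/1 queue: W = 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable when arrivals >= service rate")
    return 1.0 / (service_rate - arrival_rate)

# At 50% utilization the wait is modest; at 95% it is 10x worse
# for less than 2x the load:
print(mm1_response_time(5.0, 10.0))   # 0.2
print(mm1_response_time(9.5, 10.0))   # 2.0
```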
- chad: do you think anyone knows what the list of fundamentals to learn is?
- stefik: there are two groups of scholars in the US trying to adjust compsci education
- Exploring Computer Science (ECS) & Computer Science Principles
- jess: fundamentals are different per person
- say someone wanted to give a talk on queueing theory?
- how many people have a local group where such a presentation would be accepted?
- jeremy gibbons: can you get hold of ACM papers?
- does the (in general) paywall stop you accessing the papers?
- do you find ways round it? do you google the title and find the copy on the author’s homepage?
- philandstuff: the biggest barrier for me is pdf as a format needs to die
- jess: there are other sources from papers – eg open source
- how do you find how other people solved a problem before?
- audience: papers have keywords and get indexed
- search by keywords
- jess: I have written programs to write tests and I didn’t know what I was doing was called property-based testing
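What jess describes can be hand-rolled in a few lines; this is an illustrative sketch (the `my_sort` function and the properties are made up for the example), not any particular library’s API: generate random inputs and assert a property that must hold for all of them, which is the core of property-based testing.

```python
import random
from collections import Counter

def my_sort(xs):
    # the code under test (here just delegating to the builtin)
    return sorted(xs)

def check_sort_property(trials: int = 200) -> None:
    rng = random.Random(42)   # seeded so any failure is reproducible
    for _ in range(trials):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]
        out = my_sort(xs)
        # property 1: output is ordered
        assert all(a <= b for a, b in zip(out, out[1:]))
        # property 2: output is a permutation of the input
        assert Counter(out) == Counter(xs)

check_sort_property()
```

Libraries like QuickCheck (and Hypothesis in Python) add the hard parts on top of this idea, notably shrinking a failing input down to a minimal counterexample.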
- how can we educate each other?
- viktor: more negative experience reports
- why don’t things work?
- joeerl: Chad’s slides showed 60% of software projects fail (for some value of fail)
- I’ve been close to some of those
- the employer never admitted the failure
- it was a success, but due to market changes it had been down-prioritized
- Chad: most of the interesting failures aren’t technical
- steve: can you make “failmesh”?
- francesco: QCon london once had a “majestic failures” track
- C.A.R. Hoare and the “billion dollar mistake” - the null pointer
- viktor: I would say the mars lander feet/metres problem was a technical error
- chad: this is the airport security approach
- one person had a bomb in their shoe, so everyone takes off their shoes
- jess: a type system like F♯’s with units would have fixed that
- the people problem behind our CouchDB issue
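jess’s point is about F♯’s units of measure, which reject the feet/metres mix-up at compile time. A hedged runtime-only sketch of the same idea in Python (the `Quantity` class and the altitude example are invented for illustration):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Quantity:
    value: float
    unit: str

    def __add__(self, other: "Quantity") -> "Quantity":
        # refuse to combine incompatible units instead of silently mixing them
        if self.unit != other.unit:
            raise TypeError(f"cannot add {self.unit} to {other.unit}")
        return Quantity(self.value + other.value, self.unit)

altitude = Quantity(120.0, "metres")
drift = Quantity(30.0, "feet")

try:
    altitude + drift              # the mars-probe bug, caught immediately
except TypeError as e:
    print(e)                      # cannot add metres to feet
```

The difference from F♯ is when the error fires: here at runtime, there before the program ever ships.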
- rjmh: from the CHAOS report, among successful projects, less than half of the requirements got implemented
- are we doing better nowadays?
- maybe agile methodology means we never discover the half of the requirements we failed to implement
- stefik: I’ve heard the CHAOS report cited a million times
- it’s a for-profit group
- surveys to managers but not the technical developers
- measuring failures is pretty hard
- chad: everyone knows the standish thing is bullshitty
- the one I showed is 2013
- what seems worth showing to me is that it’s a confirmation of what we all already know
- stefik: I show it to my class too, I’m just reminding that it’s bullshitty
- rvirding: how many failures are because the project itself was stupid?
- stefik: that’s exactly it!
- audience: in 2012 RBS had a major failure
- people weren’t able to withdraw cash for a weekend
- was that a technical failure or a process failure?
- steve: you can look at that as a people failure
- they didn’t have a process in place to verify code
- in that case, it’s similar: who put these processes together, did they practice them?
- you’re supposed to practice doing upgrades, having outages
- if you don’t do it, how can you have confidence?
- jess: both etsy and netflix at goto amsterdam:
- dealing with failure:
- don’t blame anyone
- you can fix it at any of 18 levels
- code
- testing
- reiddraper: we have poor role models
- eg the 25-year-old male programmer who writes an iphone app
- not the person who’s been here 40 years and learns slowly
- how can we fix that?
- jess: pictures of joeerl in a purple velvet jacket