@philandstuff
Last active February 19, 2024 05:12
Euroclojure 2014

EuroClojure 2014, Krakow

Fergal Byrne, Clortex: Machine Intelligence based on Jeff Hawkins’ HTM Theory

  • @fergbyrne
  • HTM = Hierarchical Temporal Memory
  • Slides

big data

  • big data is like teenage sex
    • no one knows how to do it
    • everyone thinks everyone else is doing it
    • so everyone claims to be doing it
    • (Dan Ariely)

machine learning is important

  • people don’t trust other people
    • they have their own agendas
  • so they place too much trust in machines

asimov’s take

  • we gain knowledge faster than we gain wisdom
    • applies to human knowledge
    • applies to data: gathering data is easy, drawing conclusions is not

a problem in neuroscience

  • rate of papers published is growing exponentially
  • 2013: 1 every 32 minutes
  • 2014 so far: 1 every 17 minutes

can AI learn from neuroscience?

Jeff Hawkins’ goals in HTM

  • Study the neocortex and establish its principles
  • open sourced NuPIC in 2013

neocortex

  • the wrinkly part at the surface of the brain
    • grey matter: processing
    • white matter: wiring
  • about 2mm thick, 10cm^2 in area
  • 30-50MM neurons
  • 1G connections
  • hierarchical
  • uniform
    • ie all looks physically the same
    • all regions have the same algorithm

6 key principles

on-line learning from streaming data

  • up to 10 million senses feed the brain
  • we don’t (can’t) store this data
  • we build models from live data
  • models constantly updated

hierarchy of regions

  • sensory data enters at the bottom
  • models are built in every region
  • things change more slowly as you go up
  • hierarchy enables sequences of sequences
    • seq of waves
    • seq of phonemes
    • seq of words
    • seq of sentences
  • hierarchy works upwards and downwards

sequence memory

  • all sensory data involves time
  • sequence memory allows predictions
  • structure in data elaborated over time
  • sequences can be c

sparse distributed representations

  • in each region, many neurons, few active
  • SDRs represent spatial patterns
  • fault-tolerant, semantic ops, high-capacity
  • key to understanding & building intelligent systems
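
A minimal sketch of the SDR properties listed above, assuming an SDR is modelled as the set of indices of its active bits. The names (`overlap`, `drop-bits`) and the example bit patterns are hypothetical illustrations, not Clortex or NuPIC code.

```clojure
(require '[clojure.set :as set])

;; Each SDR is the set of active bit indices out of a large space
;; (say 2048 possible bits), with only a handful active.
(def cat-sdr #{3 17 402 911 1500 2001})
(def dog-sdr #{3 17 402 640 1500 1999})
(def car-sdr #{8 250 777 1203 1811 2040})

(defn overlap
  "Number of active bits two SDRs share; a crude semantic similarity."
  [a b]
  (count (set/intersection a b)))

(defn drop-bits
  "Simulate faults by removing the n lowest active bits."
  [sdr n]
  (set (drop n (sort sdr))))

(overlap cat-sdr dog-sdr)               ; cat and dog share bits
(overlap cat-sdr car-sdr)               ; cat and car share none
(overlap (drop-bits cat-sdr 2) dog-sdr) ; still closer to dog than to car
```

Overlap acts as a rough similarity measure, and losing a few bits does not change which pattern is the nearest match, which is one way to read "fault-tolerant".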

all regions are both sensory and motor

  • behaviour provides context for sensory data
  • structure in model navigated via behaviour

attention

  • use attention to manage the neocortex
  • planning and previsualisation
  • whole subhierarchies can be switched on and off

layers of neocortex

  • from molecular upwards
  • around 5 or 6

neurons

  • distal dendrites detect coincidence of incoming activity from neighbouring cells
  • you don’t just see what you’re seeing now, you predict what you’re going to see next
  • (reality is much more complicated, but this algorithm is sufficient to explain a lot)

clortex

background: numenta’s nupic

  • in dev since 2005
  • partially implements HTM/CLA
  • python/c++
  • open source

strengths

  • skilled dev team
  • eat their own dog food (grok uses nupic)
  • operates on subset of HTM/CLA principles
  • tunable using swarming on your data
  • works well on streaming scalar data (eg machine-generated)
  • great community – http://numenta.org

limitations

  • codebase has evolved as theory has developed
  • difficult/scary to rewrite for flexibility
  • OO with large, coupled, classes (~1500 LoC per class)
  • need to swarm to find parameters, no real-time control
  • not easy to extend beyond streaming scalar use case

clortex requirements

  • directly analogous to HTM/CLA theory
  • transparently understandable source code
    • a neuroscientist should be able to read & review code
  • directly observable data
  • sufficiently performant
  • useful metrics
  • appropriate platform
    • portability
    • scalability

architectural simplicity

  • first role: be useful!
  • best software is that which is not needed at all
  • human comprehension is king
    • if people can’t understand your code, your code is not finished
    • unit tests are not sufficient in themselves
  • machine sympathy is queen
  • software is a process of R&D
  • software development is challenging & intellectual
    • more science than engineering
      • engineering: you have a good model already, you just have to plug in the particular parameters
      • science: there are a bunch of unknowns which you have to learn & understand

#1: Just use data!

  • maps, vectors, sets
  • all done in a one-page datomic schema

#2: Clojure & its ecosystem

  • clojure data not domain objects

#3: russ miles’ life preserver

  • everything either “core” or “integration”
  • core: a datomic database for the neocortex
  • core: each “patch” of neurons is a graph
  • integration: algorithms, encoders, classifiers, SDRs

key clj libs & tools

  • datomic (+adi)
  • quil/processing
  • incanter
  • lein-midje-doc for literate documentation
  • hoplon-reveal-js for presentations
  • lighttable

review

  • Big Data isn’t just a Machine Intelligence problem
  • HTM is exciting

Logan Campbell, Clojure at a Post Office

history:

  • was at clojure user group
  • a guy turns up and says he’s hiring a team of clojure developers
  • he was at Australia Post
    • a million lines of Java worked on by a team in India
    • wanted to bring it back in-house

project: digital mailbox

  • big companies spend a lot of money sending out bills & junk mail
  • product to seamlessly replace that workflow
  • switch from physical mail to cheaper model
  • consumer can sign up to receive water bill online
  • I was brought on as the “clojure expert”
    • (I’d been playing with it for a couple of years)
  • drama:
    • the people they could hire:
      • really experienced java devs
      • keen on FP
    • they said as they were hiring “you might be doing clojure or you might be doing scala”
    • first few people were scala fans
    • scala v clojure battles
      • “we need static typing”
      • “we need OO for domain modelling”
      • “clojure is slow” (?)
      • “what framework do you use?”
  • “we need static typing? okay, we’ll use core.typed”
  • domain modelling:
    • when people are used to domain modelling in OO, telling them to just use maps feels like a cop-out
    • records + protocols kind of feel like classes
    • it wasn’t until I showed them code I’d written and compared it with their code that they realized that you can just use maps
  • online scala course
    • we did it as a team
    • I also did the exercises in clojure
    • did one exercise three different ways in clojure
      • conditional
      • match
      • stream processing
    • showed them my solutions
      • they already understood the problems because they’d solved them themselves
  • clojure performance was a surprise, because I’d come from ruby (!)
    • clojure is fast
    • there was an underlying feeling that “we need scala for performance”
  • I’m a consultant, so was happy for the team to make the language decisions
    • “if you’re keen on scala, let’s find out a way to pitch it to management”
  • web stack: kept hearing “async async async”
    • felt like premature optimization
    • but still we used http-kit
      • benchmark started to allay fears that clojure was slow

feature: make a payment on a bill

  • not necessarily a full payment
    POST /bills/:bill-id/payments
    Session: user-id
    Post Data: amount
  • GET credit card token for user
    • POST request to payment gateway
  • GET how much left to be paid
  • if payment succeeds: display amount remaining
  • if payment fails: display error

candidate solutions

  • synchronous promises
  • promise monad
  • lamina
  • etc etc

solution 0: synchronous

  • http-kit’s requests return a promise
    • just @deref the promise (blocks the thread)

solution 1.1: promise monad

  • do is aware of promises
    • doesn’t block thread, but waits for promise to be executed before continuing
    • felt natural way to write with promises
    • but incorrect: too much waiting, no concurrency

solution 1.2: promise monad let/do

  • let to define promises
    • do to pseudo-block on them
    • introduces correctness but reduces readability

solution 1.3: let/do/do

  • okay, let’s step away from monads

solution 2: (?)

solution 3: raw promises

  • when to explicitly wait for a particular promise
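
A rough sketch of the raw-promise approach for the payment feature, assuming Clojure futures as stand-ins for http-kit promises (both are dereferenced with `@`). Every function name, delay, and return value here is hypothetical; the real handlers were not shown in the talk.

```clojure
;; Hypothetical stand-ins for the HTTP calls. Each returns immediately
;; with a future that completes after a short simulated delay.
(defn fetch-card-token [user-id]
  (future (Thread/sleep 50) (str "tok-" user-id)))

(defn fetch-amount-owed [bill-id]
  (future (Thread/sleep 50) 100))

(defn post-payment [token amount]
  (future (Thread/sleep 50) {:status :ok :token token :paid amount}))

(defn pay-bill
  "Start the independent requests together, then wait for each result
   only at the point where it is actually needed."
  [user-id bill-id amount]
  (let [token-p (fetch-card-token user-id)       ; both requests start now
        owed-p  (fetch-amount-owed bill-id)      ; and run concurrently
        result  @(post-payment @token-p amount)] ; block only on the token
    (if (= :ok (:status result))
      {:remaining (- @owed-p amount)}
      {:error "payment failed"})))

(pay-bill 42 7 30)
```

Replacing the `let` with three sequential derefs gives the shape of solution 0: simpler to read, but each request waits for the previous one.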

solution 4: raw callbacks

  • not viable
  • would have just written a hacky little promise library

solution 5: core.async:

  • great! same shape as synchronous code, but correct concurrency

solution 6: lamina

  • didn’t feel totally suited to the situation

solution 7: meltdown (LMAX disruptor based)

  • not appropriate

solution 8: pulsar promises

  • looks exactly the same as the synchronous code, except for one character
  • pulsar rearranges your code at the bytecode level
    • uses JVM agents (normally used for tracing/debugging)
  • pass a fn to one of pulsar’s functions
    • turns synchronous code to async code

solution 9: pulsar actors

  • not appropriate

winners

  • 0: synchronous
  • 5: core.async
  • 8: pulsar

scala solution, for comparison

  • scala futures (basically promises)
  • all monadic
  • I don’t understand it entirely
  • concise
  • battle of the benchmarks, fastest first
    • pulsar-async
    • pulsar-sync
    • core-async
    • raw-callback
    • scala-play-future (significantly less than others)

CQRS (command-query responsibility segregation)

  • want fast reads
  • reduce number of queries
  • don’t want to have to update write code every time we add a new reader

structure

  • service A → cassandra → service B
  • custom triggers in cassandra in clojure (just drop in the .jar!)
    • publish to rabbitmq
    • notify index maintainer
    • write index to cassandra
    • service B reads from cassandra

cassandra triggers

  • can just throw the clojure jar in there
  • everything is byte buffers
    • you need to know the type of all the fields out-of-band
    • not self-describing data at all

microservices

  • I thought we would have a user service and a provider service and a mail service
    • but this gets tricky when you want data about users and providers
  • you need to split things much more fine grained
  • user service →
    • authentication
    • multi-factor auth
    • authorization
    • user profile
    • password reset
      • does it belong in user profile?
      • there’s a bit of workflow here
        • send out email
        • get user to click link
        • enough to warrant its own service
  • drama: needed to talk to systems team to deploy
    • I did things badly
    • I didn’t get anything into production in my 6 months there
    • systems team: we need monitoring and config and stuff
      • if we’d had something early on which had gone through these barriers, we would have had much less stress
      • benchmarks end petty arguments

Q&A

can you share some experience with monitoring & resilience?

  • appdynamics
  • classnames are expected to be java-style class names
    • clojure ones are close enough
  • clj-metrics to expose more high-level metrics
    • requests/second from ring
    • number of bills paid
    • appdynamics could pick it up from jmx
  • nomad for configuration

with http-kit+core.async, what happens when server dies and there’s loads of threads?

  • bottleneck was amount of memory
  • when server runs out, it slows down a lot
  • way to get around that is to monitor resources on your machine and ideally have autoscaling

were the scala guys finally writing clojure in the end?

  • we have one person still hardcore for scala, but sees the merits of clojure
  • a few who did the online scala courses are clojure folks now
  • people who come from the java world of static typing feel they need that
  • but now they’ve written code that actually works, they’re more comfortable with that now

Tom Hall, Escaping DSL Hell by having parens all the way down

  • @thattommyhall

DSLs

  • languages made for specific purposes
    • config mgmt
    • science
    • learning
  • distinction between:
    • internal DSLs: embedded in another language
    • external DSLs: implemented in another language

problems with puppet

  • zen of python:
    • namespaces are a honking great idea, let’s do more of them!

puppet namespaces

  • Exec['install'] in two different modules will result in a naming collision
  • fail :(
  • end up with Exec['tom::install'] but this is a hack

iteration

  • file type lets you pass in an array
  • nagios_host doesn’t
  • iteration is responsibility of type, not language
    • as far as I know

but you need to know ruby anyway

  • if you want to extend puppet, you need ruby
  • if you need to know ruby, why do we bother with the puppet DSL in the first place?

experimental features: lambdas and iteration

  • any language where lambdas arrive late is not a good language

ansible

  • just YAML
    • oh wait, I might want to iterate
    • oh wait, I’ve got embedded Jinja templates in my YAML strings
      • what’s the scope of names in my templates?

if you give people a “language” they will expect loops

  • maybe lambdas
  • probably namespaces
  • this has been done before

chef gets it right

  • it’s embedded in ruby
  • you get iteration and namespaces from ruby

teaching people to program

  • if you design a language:
    • you need a parser, which is hard
    • you need an interpreter/compiler, which is hard
  • if you embed it, you get that stuff for free

geomlab

  • minimal language for teaching
  • talks about pictures
  • intro to FP
  • gets you into recursion early on
  • man $ woman - “next to”
  • man & man - “on top of”
  • (man $ woman) $ tree = man $ (woman $ tree)
  • man $ (woman & tree) – scales nicely to get a nice aspect ratio
  • learn about operator precedence
  • de morgan’s laws
    • although not always held, due to scale
  • define functions
   define manrow(n) = manrow(n-1) $ man when n>1
                    ~ manrow(1) = man
  • builds up to an escher tiling
  • but once you’ve done that, where do we go?
    • only exists in this sim
    • if you want to extend it, you need java
    • “I’m really excited about FP now, but I’ve got nowhere to go”

what if we did it in clojurescript?

  • let’s use 'below and 'beside instead of $ and &
  • (below man woman)
  • (beside tree star)
  • http://cljsfiddle.net/fiddle/thattommyhall.geomlab.demo
  • let’s say I want to change man – what does it mean?
    • it’s implemented in the same sort of language
    • I can see there’s a url in there where I fetch an image from the internet
    • I know recursion, because I learned that from the geomlab exercises
    • I can extend the language itself
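
A minimal sketch of what "embedding" buys you, assuming pictures are represented as plain Clojure maps. This is not the code behind the cljsfiddle demo; the map shape and the `leaves` helper are hypothetical.

```clojure
;; A leaf picture names an image; a combined picture records how its
;; two parts are arranged.
(def man   {:img "man"})
(def woman {:img "woman"})
(def tree  {:img "tree"})

(defn beside [a b] {:op :beside :args [a b]})
(defn below  [a b] {:op :below  :args [a b]})

(defn manrow
  "n men in a row, built by recursion like the geomlab exercise."
  [n]
  (if (> n 1)
    (beside (manrow (dec n)) man)
    man))

(defn leaves
  "Count the images in a picture. Ordinary Clojure functions work on
   the DSL's data, so the language can be extended from within."
  [pic]
  (if (:img pic)
    1
    (reduce + (map leaves (:args pic)))))

(leaves (beside (manrow 3) (below woman tree)))
```

Because the DSL is just functions and data, recursion, namespaces, and iteration come from the host language for free.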

science languages

  • R
  • wolfram alpha
  • maple
  • matlab
  • these things just aren’t very good languages, even if they are good at their domain

another problem with DSLs

  • netlogo
  • If you’re based on applets, and Oracle drops applet support, you find you need to port your whole language to a new platform (in this case javascript)
  • again, reimplement in clojurescript?
    • anyone interested in hacking on this with me?

conclusion

  • you probably don’t need to make a new language
  • if you do it will probably be rubbish
    • at least for a while
  • think about power and reach
  • you should embed /deeply/ into clojure

Q&A

what makes a good first language?

  • clojure needs a better day 0 story
  • at some coder dojos where I’ve taught kids, some don’t even know about files and folders
    • so if you say “open a terminal, cd into a directory” you’ve lost them
      • and it’s not their fault

have you had any kids look at your examples here?

  • I’ve done the geomlab example
  • otherwise this is all a recent exploration
  • errors in cljsfiddle are not reported well
    • again problematic for day zero

Mathieu Gauthron, JVM-breakglass

troubleshooting a java application

  • debugger
    • only powerful when you can narrow down the problem to a series of breakpoints
    • when the problem is a race condition, it will change the nature of the problem you’re studying
  • log/print statements
    • you need to plan before compilation
    • when the problem is in production, it might be too late
  • jmx
    • again, you need to plan for it in advance
  • ad-hoc interactive mechanism

what is jvm-breakglass

  • open source
  • integrates with any jvm process
  • console onto a jvm process

main features

  • interactive prompt
  • see inside private members
  • call arbitrary methods
  • create new object instances
  • create new classes
  • monitor object state
  • no need to use clojure to develop the app

how does it work?

  • jvm-breakglass runs inside the JVM and starts an nrepl server
  • you can then connect using an nrepl client (eg lein)

how to use it?

  • add it to your maven dependencies
  • add an entry point (as a <bean> or in java code)
  • connect with lein repl :connect localhost:1112

demo (enterprise application)

  • tomcat JVM
  • employee/dept data structure
  • report generation
  • java/spring mvc webapp
  • jvm-breakglass
  • spring data
    • in XML, naturally

homepage

  • oh no! one of the reports isn’t working?
  • “list employees in london” is empty
    • but we know that employee Mick Jagger lives in london
    • what’s going on?

breakglass to the rescue

  • view environment:
    • current directory, System/getProperties
    • view conf directory
  • list all loaded Spring beans
  • introspect into object private members
    • bean builtin fn
    • to-tree to do so recursively
  • view methods or fields for a given object
  • redefine a class
    • in this case, (proxy [Address] ["1 Mayfair", "SW1", "London"] (getCity [] "London")) to define the new version, overriding a method
    • (.setAddress (:Mick employees) address) to inject it into the live data
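
A small sketch of the two built-ins used above, run against JDK classes rather than the demo's `Address` class (which is not available here). `bean` and `proxy` are standard clojure.core; the example objects are arbitrary choices for illustration.

```clojure
;; `bean` exposes an object's JavaBean getters as a map.
(def uri (java.net.URI. "http://example.com:8080/reports"))

(:host (bean uri)) ; value of getHost
(:port (bean uri)) ; value of getPort

;; `proxy` creates an instance of an anonymous subclass, overriding
;; only the methods listed. Here toString is overridden on Object.
(def patched
  (proxy [Object] []
    (toString [] "London")))

(str patched)
```

In the demo the same pattern overrides `getCity` on a live object graph; here it only overrides `toString`.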

take a step back

  • remember what it’s like to be a java programmer?
  • working with jmx beans and suchlike to try to understand why production is down
  • this stuff looks like magic

Q: how do you convince production people to put nrepl server in place?

  • short answer: impossible
  • that’s not how you present it
  • either you do it sneakily (that’s bad), and only pull the trump card when the team is desperate
  • or you convince the team that it would be useful in the UAT environment, and “of course it’s never going to be used in production”

Q: have you considered a high-level switch that would prevent you mutating anything in the host application?

  • don’t know how you’d be able to do that
  • have been thinking about it
  • maybe using clojail
  • kind of defeats the point

Q: have you tested this with a scala app?

  • haven’t tried
  • I’ve reverse-engineered the java bytecode, and it’s readable
  • as long as you know how it compiles, it seems reasonable

Q: you were using methods like get-obj and passing string name. how does breakglass know which object to get?

  • eg if you have multiple instances of Department, how does it know which department?
    • in Spring it’s a Spring bean which is named
    • if you’re not using Spring, what’s your entry point?
      • when you create your NreplServer to enable jvm-breakglass, you can add your entry points there
      • new NreplServer(port).put("department", myObject);
      • static methods & fields can be used too

Gary Crawford, Using Clojure for Sentiment Analysis of the Twittersphere

stratified medicine

  • determine the best treatment for someone based on their genetic makeup to manage their chronic disease

sentiment analysis

  • Paper: “Twitter mood predicts the stock market”
    • predicted Dow Jones average through monitoring tweets
  • people who suffer chronic disease tend to be immunocompromised
    • what would normally be a minor illness can prove fatal
  • can we use twitter to predict spread of disease?

so we tried

  • score tweets for flu symptoms
  • the data science wasn’t very difficult
    • but scaling it was
  • 30 million geo-tagged tweets sent from UK
  • couldn’t scale, even with
    • HDFS/hadoop
    • mongo/aggregation
    • mongo/mapreduce
    • postgres

how can we do fast, real-time analytics of social media?

  • application: how do people feel about Scotland’s independence referendum?
  • data increases in value as we analyse it
    • tweets
    • analytically prepared data
    • analysis
    • insight
    • predictions
  • the raw data isn’t what you care about
  • don’t store the raw tweets, only store the analytically prepared data
  • stored in redis using ptaoussanis/carmine
    • it has great support for bitmaps

example

  • (car/setbit sentiment tweet-id 1)
  • (car/bitcount "SCOTLAND") – tells me how many tweets have mentioned Scotland
  • how many people in england are happy?
(wcar*
 (car/bitop "AND" "ENGLAND&JOVIALITY" "ENGLAND" "JOVIALITY")
 (car/expire "ENGLAND&JOVIALITY" 10) ;; don't keep the data longer than 10 seconds
 (car/bitcount "ENGLAND&JOVIALITY"))
  • further: “how many people in Scotland are tired or grumpy?”
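
To make the bitmap idea concrete without a Redis server, here is a sketch using `java.util.BitSet` as a stand-in for Redis bitmaps. The helper names (`bitmap`, `combine`, `bitcount`) and the tweet ids are hypothetical; only the AND/OR/count semantics mirror the carmine calls above.

```clojure
(import 'java.util.BitSet)

(defn bitmap
  "BitSet with one bit set per tweet id, like repeated SETBIT calls."
  [ids]
  (let [bs (BitSet.)]
    (doseq [id ids] (.set bs (int id)))
    bs))

(defn combine
  "Combine bitmaps with :and or :or into a fresh BitSet (like BITOP)."
  [op & bitmaps]
  (let [^BitSet result (.clone ^BitSet (first bitmaps))]
    (doseq [^BitSet b (rest bitmaps)]
      (case op
        :and (.and result b)
        :or  (.or result b)))
    result))

(defn bitcount [^BitSet bs] (.cardinality bs))

(def england   (bitmap [1 2 3 5]))
(def scotland  (bitmap [4 6 7 8]))
(def joviality (bitmap [2 3 4]))
(def fatigue   (bitmap [6 8 9]))
(def hostility (bitmap [1 7]))

;; how many tweets from England are happy?
(bitcount (combine :and england joviality))
;; how many tweets from Scotland are tired or grumpy?
(bitcount (combine :and scotland (combine :or fatigue hostility)))
```

Each attribute costs one bit per tweet, so compound questions reduce to bitwise operations followed by a population count.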

getting the data in

  • adamwynne/twitter-api
  • you can specify you only want tweets from a certain geographical locality with a bounding box
    • but this is literally a rectangle
    • need it around Europe
  • LMAX-Exchange/disruptor to communicate
    • journaling
    • syncing
  • business logic

what sentiment?

  • this is hard!
  • “I’m loving #EuroClojure! :D”
  • Positive Affect: enthusiastic, active, alert
  • Negative Affect: subjective distress
  • actually two separate dimensions, not opposites
  • Watson et al, 1988
  • PANAS
  • then PANAS-x
  • then PANAS-t
    • accounts for bias on social media
    • outlines sanitisation
    • validate against 10 real events

sanitisation

where? reverse geocoding

  • don’t want to rely on external services
  • don’t want heavy IO
  • don’t want round trips to database
  • accuracy not too much of a concern
    • we already lose accuracy in interpreting the sentiment of the tweet
  • convert a map of the uk to colours:
    • look up geocode coords in map
    • check colour → get country code
  • problem: the world is a sphere
    • projecting a sphere onto a rectangle
  • prior art in d3.js
  • use JavaFX to exploit it
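
A toy sketch of the colour-map lookup, assuming a simple equirectangular mapping from latitude/longitude to grid cells. The 4x4 raster, its bounds, and the country keywords are invented for illustration; the talk's version used a real map image and a d3.js-style projection.

```clojure
;; Each cell stands in for a pixel whose colour has already been
;; translated to a country code (nil = sea). Row 0 is the north edge.
(def raster
  [[nil  :sct :sct nil]
   [:nir :sct :eng nil]
   [nil  :wal :eng :eng]
   [nil  :wal :eng :eng]])

(def bounds {:min-lon -8.0 :max-lon 2.0 :min-lat 50.0 :max-lat 60.0})

(defn country-at
  "Look up the cell containing the coordinate; no IO, no database."
  [lat lon]
  (let [{:keys [min-lon max-lon min-lat max-lat]} bounds
        rows (count raster)
        cols (count (first raster))
        col  (int (* cols (/ (- lon min-lon) (- max-lon min-lon))))
        row  (int (* rows (/ (- max-lat lat) (- max-lat min-lat))))]
    (get-in raster [row col])))

(country-at 51.5 -0.1)   ; roughly London
(country-at 55.95 -3.19) ; roughly Edinburgh
```

Accuracy is limited by the raster resolution, which matches the note that accuracy is not much of a concern here.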

when?

  • there’s a lot of seconds in a day
  • and even more seconds in a year
  • really not interested in seconds anyway
  • want to group tweets by minute
  • and also group by hour
  • and also group by day, and month, and year
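
One way to derive the grouping keys, sketched with `java.time`. The key formats and the name `bucket-keys` are assumptions; the talk did not show how buckets were named.

```clojure
(import '[java.time Instant ZoneOffset]
        '[java.time.format DateTimeFormatter])

;; One formatter pattern per granularity; seconds are dropped entirely.
(def patterns
  {:minute "yyyy-MM-dd'T'HH:mm"
   :hour   "yyyy-MM-dd'T'HH"
   :day    "yyyy-MM-dd"
   :month  "yyyy-MM"
   :year   "yyyy"})

(defn bucket-keys
  "Map from granularity to the bucket label for instant t (in UTC)."
  [^Instant t]
  (let [utc (.atOffset t ZoneOffset/UTC)]
    (into {}
          (for [[unit pattern] patterns]
            [unit (.format utc (DateTimeFormatter/ofPattern pattern))]))))

(bucket-keys (Instant/parse "2014-06-26T10:15:42Z"))
```

A tweet would then set its bit in one bitmap per bucket label, so counting per minute, hour, or day needs no access to raw timestamps.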

why?

  • why are we doing this?
  • online social media are surveillance
  • the line between public and private is becoming blurred
  • if we don’t need data, we shouldn’t collect it
    • in this example:
      • we’re never more granular than country
      • we’re never more granular than overall sentiment
      • we’re never more granular than minute
    • hopefully this is enough to prevent anyone being identified
  • Datensparsamkeit

Q: have you used Storm for this?

  • no

Q: any preliminary results on the Scotland referendum analysis?

  • I’ve had more luck with tech than data science?

Q: which way should we vote?

  • haha

Q: how do you verify your results?

  • it’s very crude at the moment?

Paul Ingles, Multi-armed Bandit Optimisation in Clojure

  • @pingles

problem statement

  • product optimisation cycles are long, complex, and inefficient
  • the multi-armed bandit model shows lots of things we’re getting wrong
  • eg: online newspapers
    • fundamentally human-led, editorially-led
  • people behave irrationally
  • Dan Ariely & Daniel Kahneman
  • (@philandstuff suggestion: Stuart Sutherland, Irrationality)
  • economist subscription options
    1. online $59
    2. print $125
    3. print & online $125
    • the ridiculousness of option 2 makes option 3 seem more reasonable
  • need machines to optimise at scale; but need humans to provide stuff only they can
  • running RCTs to optimise sites
    • doing so on a continuing basis
    • measuring big effects works with small numbers of participants
    • but measuring small effects requires ever larger numbers
    • to the extent that you can only run ~12 experiments a year
    • which is not really good enough

Bandit strategies can help

  • a product for procrastinators by a procrastinator
  • Product: Notflix!
    • video website
    • http://notflix.herokuapp.com/
    • shows 3 different videos
    • show good videos at top of page, and less good at bottom
    • show best possible thumbnail for each video
  • optimising with multi-armed bandits
    • optimising order and thumbnails

multi-armed bandit problem

  • slot machine = one-armed bandit
  • problem: you have a bunch of money you want to “invest” in a casino
    • you have a number of different machines to play
    • each machine has a different probability of reward
    • you don’t know what that probability is up front
  • need to balance “exploration” and “exploitation”
    • ie learning about the world vs using that knowledge to maximise income
    • analogy: trying new foods out vs sticking to what you like

bandit model

  • number of arms {1, 2, …, K }
  • number of trials: 1, 2, …, T
  • rewards: {0,1}
  • K-headlines
    • options of different text
  • K-buttons
    • options of button text, colour, etc
  • K-pages
    • whole page redesigns
  • explore this space with notflix

bandit strategy

;; choose which arm to pull
(defn select-arm [arms]
  ...)

;; update arm with feedback
(defn pulled [arm]
  ...)
(defn reward [arm x]
  ...)

(defrecord Arm [name pulls value])

ε-greedy

  • “hello world” algorithm
  • generally exploit
  • ε (epsilon) is the rate of exploration
  • eg if ε = 0.1, your strategy is:
    • with probability 10%, try a random arm with equal probability
    • with probability 90%, try the best arm based on current knowledge
  • if ε = 0, always exploit; if ε = 1, always explore
  • example with bernoulli-bandit
(bernoulli-bandit {:arm1 0.1 :arm2 0.1 :arm3 0.1 :arm4 0.1 :arm5 0.9})
  • with ε=0.2, you converge faster on the best arm
  • but ε=0.1, you exploit it more when you find it
  • once you’ve found the best arm, you should be able to double down
    • ie explore more at the beginning (when you have least knowledge) and less at the end
    • lots of extensions to ε-greedy to factor things like this in
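
A sketch of ε-greedy that fills in the strategy skeleton from earlier, under some assumptions: `select-arm` takes ε as an extra argument, `value` is a running mean of rewards, and `simulate` is a hypothetical helper standing in for the talk's bernoulli-bandit. None of this is the speaker's actual code.

```clojure
(defrecord Arm [name pulls value])

(defn select-arm
  "With probability epsilon explore a random arm; otherwise exploit the
   arm with the highest estimated value."
  [epsilon arms]
  (if (< (rand) epsilon)
    (rand-nth arms)
    (apply max-key :value arms)))

(defn pulled [arm]
  (update arm :pulls inc))

(defn reward
  "Fold reward x (0 or 1) into the running mean. Call after `pulled`."
  [arm x]
  (let [n (:pulls arm)
        v (:value arm)]
    (assoc arm :value (+ v (/ (- x v) n)))))

(defn simulate
  "Play `trials` rounds against arms whose true reward probabilities
   are given by `probs`, a map of arm name to probability."
  [epsilon probs trials]
  (reduce (fn [arms _]
            (let [arm (select-arm epsilon (vals arms))
                  x   (if (< (rand) (probs (:name arm))) 1 0)]
              (assoc arms (:name arm) (reward (pulled arm) x))))
          (into {} (for [k (keys probs)] [k (->Arm k 0 0.0)]))
          (range trials)))

(simulate 0.1 {:arm1 0.1 :arm2 0.1 :arm3 0.1 :arm4 0.1 :arm5 0.9} 1000)
```

Results vary from run to run; comparing the `:pulls` counts for ε = 0.1 and ε = 0.2 shows the trade-off between converging quickly and exploiting more once the best arm is found.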

Thompson sampling

  • Arm model
    • Θ_k: Arm k’s hidden true probability of reward (in range [0,1])
    • can build a distribution for Θ_k based on current knowledge
    • small number of pulls means wide distribution; large number means narrow distribution
    • captures uncertainty in value of Θ_k
  • each iteration, take a random sample from each distribution, take the largest sample
    • algorithm naturally balances exploration/exploitation trade-off
    • the more it learns, the narrower the distributions get, and so the more likely it is to choose an arm with a higher expected value
  • incanter example
  • Thompson-sampling example with same Bernoulli-bandit from above
    • compared with ε-greedy, explores much more much earlier, and exploits much more later on
    • considered optimal convergence
  • we can use it to rank things (not just select)
    • take a sample from each arm distribution, then order arms by that value
    • in notflix, can use for ordering the videos we show
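
A sketch of Thompson sampling for 0/1 rewards, assuming each arm's belief about Θ_k is a Beta(wins + 1, losses + 1) distribution (a uniform prior). The Beta sampler below is hand-rolled from uniform draws to stay dependency-free and only handles integer parameters; the talk used incanter instead. The `:wins`/`:losses` arm shape is an assumption.

```clojure
(defn sample-gamma
  "Sample Gamma(k, 1) for a positive integer k as a sum of k Exp(1) draws."
  [k]
  (- (reduce + (repeatedly k #(Math/log (- 1.0 (rand)))))))

(defn sample-beta
  "Sample Beta(a, b) for positive integers a and b."
  [a b]
  (let [x (sample-gamma a)
        y (sample-gamma b)]
    (/ x (+ x y))))

(defn sample-arm
  "Draw one plausible value of the arm's hidden reward probability."
  [{:keys [wins losses]}]
  (sample-beta (inc wins) (inc losses)))

(defn rank-arms
  "Sample once from every arm and order arms by that sample, best first."
  [arms]
  (->> arms
       (map (fn [arm] [(sample-arm arm) arm]))
       (sort-by first >)
       (map second)))

(defn select-arm [arms]
  (first (rank-arms arms)))

(defn reward
  "Record a reward x of 0 or 1 against the arm."
  [arm x]
  (update arm (if (= 1 x) :wins :losses) inc))

(rank-arms [{:name :a :wins 2 :losses 8}
            {:name :b :wins 7 :losses 3}
            {:name :c :wins 0 :losses 0}])
```

Arms with few pulls have wide distributions and so still win a sample now and then, which is where the exploration comes from; `rank-arms` is the ranking variant mentioned for ordering videos.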

applied to notflix

  • video rank bandit
  • for each video, a thumbnail bandit
  • at the end, the best video should be at the top
    • and each video should show the best thumbnail

results

  • videos, worst to best
    • “hero of the coconut pain”
    • “100 Danes eat 1000 chillies”
    • “3 year-old with a portal gun”
  • thumbnail bandit data
  • “we built a fictional but amazing product”

Q: this model assumes bandits have same probability through time

  • can it readapt?
  • Thompson sampling does adapt
    • it won’t change back as quickly

Q: isn’t there an interaction between the two bandits?

  • if the thumbnail is crappy, they might not click the video
  • made an assumption about this
  • in general, if you leave it running over time and let the evidence build, it should be fine in the long run
  • but that is definitely a flaw

Tommi Reiman, Schema and Swagger to improve your web APIs

super simple web api in clojure

  • just using compojure
  • “sausage” as example data
  • PUT /foo/sausage/:id
  • example:
    • in Java: immutable value object
    • in Scala: case class
    • in Clojure:
      • free-form map?
      • constructor fn with bunch of validation?
      • prismatic/schema!

prismatic schema

  • define structure of sausage
  • then call s/validate to validate
  • schema can define functions
(s/defn get-sausage :- (s/maybe Sausage) [id :- Long]
  (@sausages id))

(s/defn ^:always-validate get-sausage2 :- Sausage [id :- Long]
  (@sausages id))

schema coercion

(defmodel Pizza {:id Long
                 :name String
                 :price Double
                 :hot Boolean
                 (s/optional-key :description) String
                 :toppings #{(s/enum :cheese :olives :ham :pepperoni :habanero)}})
  • allows slurping JSON data, but imposing extra types
  • eg above we can slurp toppings from a JSON array into a Clojure set rather than a vector

double schema

  • loose schema for first input
    • (def Customer {...})
  • tighter schema for validated input
    • (def ValidCustomer (merge Customer {...}))

schema selectors

  • accept but remove unrecognised params with select-schema
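
A toy illustration of the "double schema" and selector ideas, using a plain map from key to predicate as a stand-in for a schema. This is deliberately not prismatic/schema's API; `failing-keys` and `select-known` are hypothetical helpers that only show the shape of the technique.

```clojure
;; Loose schema for first input, tighter schema built by merging.
(def Customer {:name string? :email string?})
(def ValidCustomer (merge Customer {:id integer?}))

(defn failing-keys
  "Keys whose value is missing or does not satisfy the schema's predicate."
  [schema m]
  (set (for [[k pred] schema
             :when (not (pred (get m k)))]
         k)))

(defn select-known
  "Accept the input but drop any keys the schema does not mention."
  [schema m]
  (select-keys m (keys schema)))

(def input {:name "Ann" :email "ann@example.com" :debug true})

(failing-keys Customer input)      ; loose schema accepts the input
(failing-keys ValidCustomer input) ; tight schema reports the missing :id
(select-known Customer input)      ; unrecognised :debug is removed
```

Building the tight schema with `merge` keeps the two definitions from drifting apart.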

generative schema

contribs

  • sfx/schema-contrib
  • cddr/integrity

swagger

  • a specification for describing, producing, consuming, visualising RESTful web services
  • https://helloreverb.com/developers/swagger
  • existing adapters
  • clojure options:
    • octohipster
    • swag
    • ring-swagger
      • compojure-api
      • fnhouse-swagger
  • endpoint definitions in JSON
  • data models as a JSON Schema
  • swagger UI
    • visualises the API
  • code gen
    • no clojure support yet (anyone?)
  • swagger-socket
    • run it all on top of websockets

ring-swagger

  • https://github.com/metosin/ring-swagger
  • JSON-Schema has some dates
    • but prismatic/schema will never support dates, as it’s more generic
  • higher level abstractions on top of swagger, but nothing for the web developer

compojure-api

  • an extendable web api lib on top of compojure
  • macros & middleware with good defaults
  • schema-based models & coercion
  • GET* macro to define input and output schemas

fnhouse-swagger

  • prismatic/fnhouse
    • launched at clojure/west
  • defnk with metadata → annotated handler
  • fnhouse-swagger
    • metosin/fnhouse-swagger

summary

  • schema is an awesome tool
  • describe, validate, coerce your data
  • building on top of ring-swagger
    • compojure-api → declarative web apis
    • fnhouse-swagger → meta-data done right
    • or do your own!
  • kekkonen.io
    • CQRS-lib

Renzo Borgatti, The Compiler, the Runtime and other interesting beasts from the clojure codebase

an amazing growth:

  • mar 2006: first commit
  • oct 2006: 30k loc (7 months old)
  • oct 2007: clojure announced!
  • oct 2008: invited to Lisp50 to celebrate 50 years of lisp
  • May 2009: 1.0 + book!
  • now: almost 90k loc

initial milestones

  • apr 06: lisp2java sources
  • may 06: boot.clj appears
  • may 06: STM first cut
  • june 06: first persistent data structure
  • sep 06: java2java sources
  • aug 07: java2bytecode started
  • right after: almost all the rest: refs, lockingtx

drew on lots of sources of knowledge

  • collection of papers

high-level view:

  • (def lister (fn [& args] args))
  • read → analyse → emit/compile → eval
  • although the lines between the stages get blurred at times

reader

  • takes stream, returns data structures
  • PersistentList, Symbol, etc
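A minimal sketch of the pipeline as seen from the REPL, assuming nothing beyond `clojure.core`. Here `read-string` stands in for the reader, and `eval` hides analysis, emission and evaluation behind a single call. The name `lister-var` is only for illustration.

```clojure
;; Reader: text in, plain data structures out.
(def form (read-string "(def lister (fn [& args] args))"))

(class form)          ; clojure.lang.PersistentList
(class (first form))  ; clojure.lang.Symbol (the symbol def)

;; eval analyses the form, emits bytecode for the fn and evaluates the def.
;; Evaluating a def returns the Var it created.
(def lister-var (eval form))

;; Vars are invokable and delegate to the fn they hold.
(def listed (lister-var 1 2 3))  ; a seq of the arguments: (1 2 3)
```

The point of the sketch is that the reader knows nothing about `def` or `fn`; it only produces lists and symbols, and meaning is assigned later.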

analyser

  • input: data structure
  • output: exprs
    • DefExpr
      • Var
      • FnExpr
        • Sym
        • PersistentList
          • FnMethod
            • LocalBinding(Sym(“args”)),
            • BodyExpr
              • PersistentVector
              • LocalBindingExpr

Emission

  • bytecode generation for Exprs
  • prerequisite for evaluation
  • emit() method in Expr interface
  • Notable exception: called over ??

Evaluation

  • transform Exprs into their “usable form”
  • eg
    • new object
    • a var
    • namespace
  • FnExpr is just getCompiledClass().newInstance

Compilation

  • Usually coordination for emit
  • Compiler.compile namespace -> file

Emit

  • input: Exprs
  • output: bytecode

monsters!

RT

  • the RT class gets initialised the first time it is referenced, for example:
final static private Var REQUIRE = RT.var("clojure.core", "require");
  • simply referring to it here causes the static initializers to run
  • RT has a lot of behaviour in static initializers
    • inside it is the doInit(); call
      • which loads all of clojure.core
    • all just from referring to RT in some otherwise unrelated class!

Compiler

  • inner classes for each Expr type

LispReader

  • inner classes for each token you might encounter
  • <clinit>
    • sets up reader macros
      • macros and dispatchMacros (latter for #{ #( #_ #^ etc)
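The two reader-macro tables can be observed from the REPL with `read-string`. This is a small sketch using only `clojure.core`; the comments show what each call returns.

```clojure
;; Plain reader macros, keyed on a single character.
(read-string "'x")                     ; (quote x)
(read-string "@a")                     ; (clojure.core/deref a)
(meta (read-string "^:private foo"))   ; {:private true}

;; Dispatch macros, introduced by # and keyed on the character after it.
(read-string "#{1 2 3}")               ; a set
(read-string "#_ignored 42")           ; 42, the discarded form is never returned
(first (read-string "#(+ % 1)"))       ; fn*, the literal is expanded at read time
```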

analyze()

  • not a class, but a family of methods
    • analyzeSeq
    • new ConstantExpr
    • MapExpr.parse
  • FnExpr.parse
    • invokes the compiling phase during parsing phase

emission

  • ASM lib used to generate bytecode
  • FnExpr.emitMethods()
    • generate a method for each of the arities of the function
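One way to see the per-arity methods is to reflect on the class generated for a multi-arity fn. This is a sketch, assuming a JVM Clojure where each arity compiles to an `invoke` method on the fn's class; `greet` and `invoke-arities` are illustrative names.

```clojure
(def greet
  (fn
    ([] "hi")
    ([a] (str "hi " a))
    ([a b] (str "hi " a " and " b))))

;; Collect the parameter counts of the invoke methods declared on the
;; class the compiler generated for this fn.
(def invoke-arities
  (->> (.getDeclaredMethods (class greet))
       (filter (fn [^java.lang.reflect.Method m] (= "invoke" (.getName m))))
       (map (fn [^java.lang.reflect.Method m] (count (.getParameterTypes m))))
       set))
;; invoke-arities is #{0 1 2}: one invoke method per declared arity
```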

other beasts

  • LockingTransaction and Ref

DynamicClassLoader

  • clojure.lang.DynamicClassLoader.findClass(String)
    • RT.classForName()
    • Compiler$HostExpr.maybeClass()
  • Class.forName() goes up the hierarchy of classloaders and asks each what they know
    • an instance of DynamicClassLoader is created for each namespace
      • and also for each form
    • (this is true for the bootstrap phase; not always true eg in AOT (ahead-of-time) compilation)
  • supporting dynamicity
    • in defineClass:
      • classCache.put(name, new SoftReference(c,rq));
    • in findClass:
      • Reference<Class> cr = classCache.get(name);
    • SoftReferences are used to save PermGen, since if we redef a var we don’t want it to keep consuming PermGen

Bonus: clojure was initially implemented in lisp

  • ~1600 loc to implement read, analyse, compile, eval
  • although emitting Java code, not bytecode
  • was also generating C♯

Q: some things in bytecode can’t be expressed in java

  • is there anything which clojure generates which can’t be decompiled back to Java?
    • I’m pretty sure yes, but not sure exactly what
    • Rich:
      • locals-clearing
      • constructs which use goto (which exists in bytecode but not Java)

Rich Hickey, the insides of core.async channels

aside: here’s what clojure looks like in a good IDE

  • (ie IntelliJ)
  • yes, Compiler.java is massive
    • but if your IDE has a structure editor, you can navigate them all easily
    • it’s all in one file because I don’t want 300 files

aside2: the classloader has a cache in a branch

  • fast-load branch

warning! implementation details ahead

  • subject to change!
  • informational only

the problems

  • single channel implementation
    • for use from both dedicated threads and go threads
      • simultaneously, on same channel
  • alt and atomicity
    • Java CSP libraries often didn’t support alt well
    • it’s tricky to do atomically
  • multi-reader/multi-writer
  • concurrency
    • construct deals with the ick of threads and mutexes
  • (this talk: focus on JVM impl; JS version has fewer of these issues)

API

  • >! >!! put! alt! → channel → <! <!! take! alt!
  • it’s not an RPC mechanism, it’s just a conveyor belt

SPI (service provider interface)

  • >! >!! put! alt! → impl/put! [val handler] → channel → impl/take! [handler] → <! <!! take! alt!

anatomy

  • channel has:
    • pending puts (fifo)
    • a buffer (optional) in the middle
      • contains data
    • pending takes (fifo)
    • flag indicating if channel is closed
  • fifos implemented as linked queues
  • important to distinguish queues of operations from buffer of data

invariants

  • never pending puts and takes simultaneously
  • never takes and anything in buffer
  • never puts and room in buffer
  • take! and put! use channel mutex
  • no global mutex
    • or even multi-channel mutex

put! scenarios

  1. one or more waiting take! operations
    • gets paired up, takes handler gets completed
  2. stuff in the buffer, but with room in buffer
    • puts its stuff in the buffer, succeeds and immediately completes
  3. buffer full (or no buffer)
    • enter puts queue, block
      • results in backpressure
  4. full buffer, but windowed
    • sliding buffer: latest information takes priority, drop head of buffer (oldest item in fifo), put! completes immediately and enters buffer
    • dropping buffer: drop put! on floor, but completes immediately
    • could have more sophisticated policies in future
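The buffer scenarios above can be sketched with the public core.async API. This assumes `org.clojure/core.async` is on the classpath; the var names are illustrative.

```clojure
(require '[clojure.core.async :as async])

;; Scenarios 2 and 3: the first put lands in the buffer, the second finds
;; it full and waits in the puts queue until a take makes room.
(def fixed (async/chan 1))
(async/put! fixed :a)
(async/put! fixed :b)
(def fixed-out [(async/<!! fixed) (async/<!! fixed)])          ; [:a :b]

;; Scenario 4, sliding buffer: puts never block, the oldest item is dropped.
(def sliding (async/chan (async/sliding-buffer 2)))
(doseq [x [1 2 3]] (async/>!! sliding x))
(def sliding-out [(async/<!! sliding) (async/<!! sliding)])    ; [2 3]

;; Scenario 4, dropping buffer: puts never block, the newest item is dropped.
(def dropping (async/chan (async/dropping-buffer 2)))
(doseq [x [1 2 3]] (async/>!! dropping x))
(def dropping-out [(async/<!! dropping) (async/<!! dropping)]) ; [1 2]
```

Only the fixed buffer produces backpressure; both windowed buffers trade data loss for puts that always complete immediately.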

take! scenarios

  1. nothing in buffer
    • enqueued
  2. buffer has stuff, but no puts waiting
    • get data, immediately complete
  3. buffer full (or no buffer), puts pending
    • get something (either head of buffer or get paired with first put!)
    • first waiting put! completes (either enters buffer or hands directly to take!)

close! scenario

  • all pending takes complete with nil (closed)
  • subsequent puts complete with false (already closed) (relatively new)
  • subsequent takes consume ordinarily until empty
    • any pending puts complete with true
    • takes then complete with nil
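A short sketch of the close! behaviour, assuming `org.clojure/core.async` is available. In the core.async releases I am aware of, a put on a closed channel returns `false`, while a take on a closed and drained channel returns `nil`.

```clojure
(require '[clojure.core.async :as async])

(def ch (async/chan 1))
(def before-close (async/>!! ch :x))  ; true, accepted into the buffer
(async/close! ch)
(def after-close (async/>!! ch :y))   ; false, the channel is already closed
(def drained (async/<!! ch))          ; :x, buffered data is still delivered
(def exhausted (async/<!! ch))        ; nil, closed and empty
```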

queue limits

  • puts and takes queues are not unbounded either
  • 1024 pending ops limit
    • somewhat arbitrary, might change
    • will throw if exceeded
      • if you’re seeing this, it’s an architecture smell
    • most likely if you use put! on the edge of your system

alt(s!!)

  • attempts more than one op
  • on more than one channel
  • without global mutex
  • nor multi-channel locks
  • exactly one op can succeed
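As a sketch of "more than one op, exactly one succeeds", the following offers a take and a put in one `alts!!` call. It assumes `org.clojure/core.async` is on the classpath; channel and var names are illustrative.

```clojure
(require '[clojure.core.async :as async])

(def empty-ch (async/chan))    ; unbuffered, nobody is putting
(def roomy-ch (async/chan 1))  ; has buffer space

;; One take and one put offered together; only the put can proceed.
(def result (async/alts!! [empty-ch [roomy-ch :payload]]))
(def value (first result))                              ; true: the put was accepted
(def put-won? (identical? roomy-ch (second result)))    ; true

;; The value put by the winning op is the only effect on either channel.
(def delivered (async/<!! roomy-ch))                    ; :payload
```

`alts!!` returns a `[value port]` pair, so the caller can tell which alternative completed.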

implications

  • registration of handlers is not atomic
  • completion might occur before registrations are finished, or any time thereafter
  • completion of one alternative must ‘disable’ the others atomically
  • cleanup

handlers

  • wrapper around a callback
    • callbacks are icky, so we want to hide them
  • SPI
    • active?
    • commit → callback-fn
    • lock-id → unique-id
    • java.util.concurrent.locks.Lock: lock, unlock

take/put handlers

  • simple wrapper on callback
  • lock is no-op
  • lock-id is 0
  • active? always true
  • commit → the callback

alt handlers

  • each op handler wraps its own callback, but delegates rest to shared “flag” handler
  • flag handler has lock
    • a boolean active? flag that starts true and makes one-time atomic transition
  • commit transitions shared flag and returns callback
    • must be called under lock
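The one-time transition of the shared flag can be illustrated with an atom and `compare-and-set!`. This is a deliberately simplified sketch, not core.async's implementation: the notes say the real flag handler uses a lock, whereas this swaps in an atomic compare-and-set. `make-flag` and `commit!` are hypothetical names.

```clojure
;; Hypothetical names: make-flag and commit! are not core.async functions.
(defn make-flag [] (atom true))

(defn commit!
  "Returns callback if this caller wins the one-time transition, else nil."
  [flag callback]
  (when (compare-and-set! flag true false)
    callback))

(def flag (make-flag))
(def first-try  (commit! flag :take-callback))  ; :take-callback
(def second-try (commit! flag :put-callback))   ; nil, the flag already flipped
```

Because every alternative shares one flag, the first op to commit disables all the others in a single step.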

alt concurrency

  • no global or multi-channel locking
  • but channel does multi-handler locking
    • some ops commit both a put and a take
  • lock-ids used to ensure consistent lock acquisition order
    • (avoids deadlock)

alt cleanup

  • “disabled” handlers will still be in queues
  • channel ops purge

SPI revisited

  • handler callback only invoked on async completion
    • only 2 scenarios
  • when not “parked”, op happens immediately
    • callback is not used
    • non-nil return value is op return
  • only time ops park
    • put! when it gets blocked on full buffer
    • take! when it gets blocked on empty buffer
  • only time ops complete asynchronously
    • take! with pending puts
    • put! with pending takes

wiring !/!!

  • blocking ops (!!)
    • create promise
    • callback delivers
    • only deref promise on nil return from op
      • non-nil indicates immediate success (and so callback never gets called)
  • parking go ops (!)
    • IOC state machine code is callback
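A rough sketch of the promise-plus-callback wiring, built on the public `take!` function rather than the internal SPI. `blocking-take` is a hypothetical helper, and unlike the real `<!!` it always goes through the promise, even when the value is available immediately. It assumes `org.clojure/core.async` is on the classpath.

```clojure
(require '[clojure.core.async :as async])

;; Hypothetical helper, not the library's own <!! definition.
(defn blocking-take [ch]
  (let [p (promise)]
    (async/take! ch (fn [v] (deliver p v)))  ; the callback delivers
    @p))                                     ; the calling thread blocks here

(def ch (async/chan 1))
(async/>!! ch 42)
(def got (blocking-take ch)) ; 42
```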

summary

  • you don’t need to know any of this
  • but understanding the “machine” can help you make good decisions

Q: why use alt! for putting? what’s rationale?

  • taking from multiple channels is like a select(2)
  • when you have consumers of different capabilities
    • I want to try to write to everyone, but whenever the first one is ready, I give it to them
    • Q: what’s the difference between that and having four consumers on a single channel?
      • you might have a priority metric, or a cost metric
      • though yes sometimes you can achieve same result two different ways

Q: why is global or multi-channel mutex not good enough?

  • well it would be easy! :)
  • a global mutex could make registration atomic
  • you’d have to make disabling other alts atomic
  • you’d have to make rendezvous atomic
  • you could have two unrelated sets of channel operations, why should they contend?
  • people hate global locks
  • ruled out by my aesthetic sense :)

Q: David Nolen had an example of 10000 go blocks updating a textarea, did he hit the 1024 limit?

  • no I don’t think so, but not sure exactly

Q: are buffer & queue sizes useful metrics to monitor?

  • that would be great, and making them monitorable is on the TODO list

Q: other possible extensions?

  • buffer policies
    • you might have logic about priority
  • core.async has proven its utility and it’s become important
    • go macro is a great PoC of what you can do with a macro with several kLoC behind it
      • has its own subcompiler inside it
      • kind of implements a subset of clojure
    • maybe build async support into the compiler?
      • move locals from the stack to fields on the method object
      • I don’t need the stack anymore
      • I can be paused and resumed on another thread
      • declare a fn as async
      • comply with this SPI
      • could build other things like generators & yield
    • the pride moment of “look you can do this with a macro” is now dominated by the desire to make this performant and more solid
  • Q: continuations? how do they differ?
    • continuations are more general
    • this won’t use continuation-passing-style
    • it’s related
    • it won’t be like call/cc
    • it won’t be first-class
    • you won’t be able to resume it more than once
    • for a specific set of use-cases
    • Oleg gave a talk showing that generators alone are enough for things people think need a lot more

Q: is there something planned for dynamic binding and the go macro?

  • there are fns which allow you to do the conveyance
    • don’t know if go allows all of them to work

Q: channels on the network?

  • it’s easy to have something you call a channel and put over a wire
  • pretty hard to have all the semantics of these channels over the wire
  • already have queues and all sorts of interfaces to do similar things
  • atomic alt! over more than one wire not going to happen
  • maybe semantics for ports
  • or limitations on alt!
  • the wire has its own semantics, this is the key thing here
    • failure, queueing, delays
  • really easy to just take something from the wire and call put!

Q: is there a typical way to monitor a go block?

  • what kind of monitoring?
  • see that it’s still working, still alive?
  • if the channels were monitorable, you could see if things were producing/consuming properly

Q: what other options did you consider & reject in the design of core.async?

  • something other than CSP?
  • the generators stuff
  • continuations
  • I liked what golang did
    • they made a good choice
    • there’s a java csp lib that impls the same kinds of ops
    • it’s difficult to get the semantics correct
  • wanted alts! to be a regular fn, not syntax
    • which feels like an enhancement over go
  • what we’re putting on these channels is immutable
    • which gives extra robustness

Meta-eX, conference party

  • github: meta-ex
  • twitter: meta_ex
  • soundcloud: meta-ex
  • facebook: meta.ex.live
  • website: http://meta-ex.com
  • wooo!

David Nolen, Invention, Innovation & ClojureScript

  • @swannodette
  • recently left NYT for Cognitect

“The future doesn’t have to be incremental”, Alan Kay

  • talks about Xerox PARC
  • worked there for a decade
  • in that decade, inventions!
    • bitmap screens
    • laser printers
    • GUI
    • PC
    • WYSIWYG & DTP
  • innovating is taking inventions and bringing them to a wider audience

The Dream Machine, JCR Licklider and the Revolution that made personal computing possible

  • M Mitchell Waldrop
  • he believed human factors would play an important role
  • we would all have a computer
  • he helped create the future we live in today
  • he helped ARPA finance PARC’s research
  • he helped finance John McCarthy & Ed Fredkin

Man-Computer Symbiosis, JCR Licklider, 1960

  • talks about the trie data structure
  • (clojure’s persistent data structures use these!)

invention is hard

  • but innovation is equally important
  • Douglas Engelbart’s original mouse wasn’t very usable
    • a tonne of work went into making it more natural, more durable
    • (apple computers reference)
    • this is innovation!

Purely functional data structures, Okasaki

  • this book is about “paper complexity” – stuff that looks good on paper
  • it’s a foundation which people can build variants on
  • Rich did this
    • he doesn’t get credit for inventing the bit-mapped vector trie

the state of clojurescript

  • released 2011-07-20
  • a lot has happened since then
  • early experiments:
    • clojurescript one
    • himera (from fogus)
      • “translations from javascript”
      • showed what value clojurescript provides over javascript

has 81 contributors <3

  • the reason we no longer have copy-on-write data structures is that someone put in the hard work to make persistent ones
  • the reason we have source maps, similar

lighttable - ~11,000 lines of clojurescript

also, the world hasn’t stopped

  • js hosts have improved
  • persistent data structures were a basic performance win
    • COW doesn’t scale well past (say) 100
  • V8 had a lead when we introduced persistent data structures
  • we hoped that others would catch up
    • javascriptcore
    • webkit is trying to get asmjs-level performance with JIT compilation
    • nashorn has come along

demo

  • mori: library for js devs
    • here used to demo performance of persistent data structures
  • comparison:
    • adding 1000000 items to a JS Array
    • adding 1000000 items to a persistent vector
    • 85 ms vs 235 ms (V8)
    • this is really good!
  • comparison:
    • adding 1000000 items to a JS Array
    • adding 1000000 items to a persistent vector (using transients)
    • 85 ms vs 47 ms (!) (v8)
    • transients are faster than mutable arrays
    • javascriptcore: 28 ms (arrays) vs 30 ms (transient vector)
  • nashorn demo
    • benchmark: react running at the command line with om
    • building a template 100 times
      • ~13 ms avg with v8
      • ~8 ms avg with jsc
      • ~14 ms avg with spidermonkey
    • nashorn: slow load time & long warmup time
      • starts really slow (>1s)
      • converges slowly, but:
      • approaches ~23 ms
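The persistent-versus-transient comparison in the demo used mori on JavaScript engines. The same idea can be sketched in Clojure on the JVM as below; this is an analogue, not the benchmark from the talk, and the function names are illustrative.

```clojure
(defn build-persistent [n]
  (reduce conj [] (range n)))

(defn build-transient [n]
  ;; conj! mutates in place; persistent! freezes the result.
  (persistent! (reduce conj! (transient []) (range n))))

(def same? (= (build-persistent 100000) (build-transient 100000)))

;; (time (build-persistent 1000000)) and (time (build-transient 1000000))
;; give a rough comparison; numbers vary by machine and JVM warm-up.
```

Both functions return an ordinary persistent vector, so callers cannot tell which one built it.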

now what?

  • typescript, dart?
    • these assume we want to keep building the same broken kind of stuff
    • cljs: we can build things radically simpler

React

  • library from facebook
  • other libraries have a deep-seated notion that everything is mutable
    • angular, backbone, …
  • react is different: it has a functional mindset
  • the virtual DOM evolves from one value to the next
    • clojurescript allows fast diffing between these values
    • react will do the right thing
  • react has completely taken over the cljs world
    • Om
    • reagent
    • quiescent (much thinner)
    • reacl

Om

  • Om was an experiment to show that representing app state as a single global value was a good idea
    • this had been done before in other areas:
      • databases
      • server-side
  • we’re not going to make interfaces that people haven’t seen before
  • prismatic’s blog post about moving to Om
    • simple components which don’t interact in crazy ways

Goya

  • by Jack Schaedler
    • ui dev for ableton
  • “we can do real undo”
  • Jack saw this and wondered if it would scale
  • Goya: pixel editor
    • surface: immutable vector
  • gets undo without adding complexity to app
  • gets an almost unlimited number of undo steps, without loss of performance
  • github: jackschaedler/goya
  • his app is complicated!
    • the UI is complicated
    • but cljs eliminates unnecessary complexity
  • how much memory does his app use? not much
    • (aside: use google chrome dev tools!)
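Why undo comes almost for free with immutable values can be sketched in a few lines. This is not Goya's code: the 2x2 canvas, `paint!`, `undo!` and `canvas` are hypothetical, and the history is simply a vector of earlier canvas values that share structure with one another.

```clojure
;; Hypothetical model: a 2x2 canvas stored as a flat vector of colours.
(def history (atom [[:white :white :white :white]]))

(defn paint! [idx colour]
  (swap! history (fn [h] (conj h (assoc (peek h) idx colour)))))

(defn undo! []
  ;; Keep the initial canvas so there is always something to show.
  (swap! history (fn [h] (if (> (count h) 1) (pop h) h))))

(defn canvas [] (peek @history))

(paint! 0 :red)
(paint! 3 :blue)
(def painted (canvas))   ; [:red :white :white :blue]
(undo!)
(def undone (canvas))    ; [:red :white :white :white]
```

Each history entry is a full canvas value, yet memory stays modest because unchanged parts are shared between versions.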

innovate!

  • model story needs work
    • js MVCs backbone/ember/angular
    • notion of a model on the client
      • you can do operations on it
    • nothing particularly compelling for this in the react space
    • DataScript
      • export some elements of the datomic api to the client
      • store your data in a flat way
      • sensible query api over it (queries on trees aren’t so fun)
      • datomic allows you to ask for entities
        • lift a tree out of the flat database
  • react model can be further improved
    • addressability
    • immutable everything
      • they have to convert styles and DOM attributes back to javascript objects which have to be walked
    • (one benefit of react: it’s facebook’s problem 😺 )

Q: is it possible to implement Om all in clojurescript using a macro?

  • I suppose you could
  • you might want to compose things dynamically, and macros are static
  • you have to be concerned with the amount of code that a macro generates
  • I would not pursue that idea

Q: is there a community place for shared Om components?

  • I’m not going to spend much time on it
  • if Om needs to be improved to make this happen, I will do that
  • you want to be able to use other people’s code without jumping through too many hoops
  • things get tricky with events & communication between components
    • need some agreement on how people communicate between components

Q: what’s your vision for cljs 1.0? how can we help with the yak shaving?

  • basic things like sharing code
  • shared analysis over clojure and clojurescript
    • would open up a lot of tooling
      • eg linter to lint both languages
    • would like infrastructure for tooling to be much better
  • when you go to 1.0, people lock to that version and are slow to move off it

Q: are you seeing much evidence of cmd-line or server-side cljs?

  • most people doing it are doing node.js

Q: when is cljs going to be self-hosting?

  • it’s not that we don’t want it
    • we’re keen on self-hostability/bootstrappability, if not self-hosting
  • nice to remove the JVM dependency
    • eg lighttable might not want it
  • it’s last-mile stuff at the moment
    • which isn’t that fun
    • and I don’t personally need it so I won’t work on it

Q: do you foresee a pure cljs version of react?

  • if someone wants to shave that yak, that would be awesome
  • if the system is immutable all the way down, the optimizability explodes

Ali Asad Lotia, Why devops needs Clojure

  • @aalotia

background

  • was a dev who had helped get stuff to prod
  • our ops person left
  • they asked me to fill in
  • I said “okay, as long as you hire a replacement soon”
  • they didn’t arrive
  • I missed being able to write code

problem

  • we’re exec’ing a jar, and it keeps taking 3s
  • I saw an opportunity to write a very simple noir app
  • much improved performance
  • people were impressed, asked to see the source code
  • “what is this clojure thing, and why did you use it?”
    • I’m not a seasoned Java dev
  • moved to another company:

Beamly

  • TV focused social network
    • smart TV planner
    • personalised TV Magazine
  • availability

behind the curtain

  • AWS: us-west, us-east, eu-west
  • milli-services (ie not quite µservices)
    • scala
    • node.js

my team

  • build/release automation
    • but we don’t do deploys; we just enable them
  • persistence
  • platform performance/metrics/logs
  • core libs

deploys in the mutable days

  • generate build artefacts
  • define config in puppet
  • deploy artefacts
  • deploy config

phoenix servers

  • base server images, with some configuration changes
  • relatively short-lived
  • didn’t name them or worry when they were switched off

disconnected dev and ops

  • zed shaw:
    • “maybe you use a language like lisp that pretends the computer is some purely functional fantasy land with padded walls for little babies”
    • actually, yes I do

immutable servers

  • kill server for every deploy
  • package new server images (AMIs) in order to deploy new version

Requirement: examine server images

  • aws console
    • some config but manual and not all info
  • python + boto:
    • just got back a list of objects
    • we know there’s more available! we saw it in the console!
  • clojure + amazonica
    • it just gave us data back!
    • data trumps objects every time for this kind of use case
  • console → cli → sdk → repl/scripts

repls are awesome

  • exploration of APIs
  • minimise context switching
  • instant feedback
  • data rich (or richer, at least, in some cases)

team reactions

  • “Soooo many brackets!!!”
    • I don’t see them anymore – paredit deals with it
  • “How do I iterate over this?”
    • why do you need to iterate? what are you trying to do?
  • “I want to change this value”
    • again, what are you trying to do?
  • “Wow, this is really powerful”

Offloading state

  • Immutable servers
  • pass the buck to a service someone else manages
    • application data
    • metrics
  • but autoscaling takes some problems away and gives us others
    • provider defined data model
  • clojure was a great fit for managing autoscaling groups
    • all the information we needed was made visible by a single clj fn
  • in a repl, with ad-hoc tasks, having some clojure code you’ve written and evalling it is really powerful

Observing platform performance

  • knock-on/trickle down effects
  • sensu handlers limited

riemann

  • http://riemann.io
  • had used graphite
    • very data-poor
  • riemann gives you a clojure map, which is a much richer model
  • embedded REPL
  • overridable
  • extensible
  • responsive primary author

tracking our services

  • zion - system knowledge base
    • who owns which service? what do I do when alert X fires?
  • component details

infrastructure as data

  • config
  • metrics
  • logs
  • we had powerful ways to analyse this data without having to resort to glomming 500 scripts together
  • we have a single language which is superlative
  • I sit next to extremely good Scala devs and ask how they would do it
    • “I’d write a case class”

future work

  • analyse logs + metrics
  • catch and correct misconfigurations
  • scripts with upcoming fastload?
    • if clojure fastload is fast enough that we don’t have to worry about startup time, could it replace some of our python scripts?
  • cyanite to replace graphite
    • cassandra/clojure
  • “lisp isn’t a language, it’s a building material”
    • Alan Kay

clojure summary

  • pros
    • core data structures
    • data manipulation
    • community
      • #clojure and #ldnclj on freenode
      • people accept PRs, give real feedback
      • projects move
    • shared aesthetic

refs

  • martin fowler posts above
  • mcohen01/amazonica
  • pyr/cyanite

Q: can you share more info about zion?

  • we will when it’s in decent shape
  • too coupled to our particular environment right now

Q: graphite data poor? can you elaborate, particularly with reference to storage backend?

  • how data is stored is poor
  • all you get is an arbitrarily long key name (hierarchical)
    • a timestamp
    • a single numerical value
  • with Riemann, you can add arbitrary tags to the events
    • persisting them – don’t have a great answer
    • looking at influxdb
    • store time-series data with a richer data model

Leonardo Borges, Taming Asynchronous Workflows with Functional Reactive Programming

  • who has used Reactive extensions?
    • do you think it’s FRP?
  • currently writing “Clojure Reactive Programming: RAW”
  • when people talk about FRP, they mean merely “inspired by FRP”

Naming is hard

what’s the difference?

  • every construct in FRP has a precise mathematical definition
  • free of side-effects
    • kind of like Haskell’s IO monad

history

  • 1997: created in haskell
  • other haskell libs
    • reactive-banana, netwire, sodium
  • FRP-inspired:
    • Rx[.NET/Java/JS], baconjs, reagi (cljs)
  • main abstractions: Behaviours and Events
    • traditionally:
type Behavior a = [Time] -> [a]
type Event a    = [Time] -> [Maybe a]
  • this talk: compositional event systems

motivating example

  • imperative code to iterate over a list
    • lots of changing state
  • functional code
    • we describe what, but not how
    • no mutating variables
    • gain reusable single-purpose functions
  • CES has similar principles
  • think of key presses as a list of keys over time
  • http://bit.ly/rxjava-github
  • http://bit.ly/rxjs-github
  • subscribe to event sources, filter/transform them
  • map behaviour to event streams
    • say, by sampling every second
  • flatMap / selectMany

network IO

  • rather than events from keyboard, mouse etc
  • in javascript: callback hell :(
  • on jvm: clojure promises don’t compose
  • promises in js are slightly better but have limitations

demo: simple polling app

  • partition/zip

quote

  • “FRP is about handling time-varying values like they were regular values”

why not core.async?

  • core.async feels like it’s a lower level of abstraction
  • it’s a great foundation for an FRP-inspired framework
  • reagi is built on top of core.async ( http://bit.ly/reagi )

bonus example: reactive API to AWS

  • retrieve list of resources from a stack
  • for each EC2 instance, get status
  • same for each RDS instance

Q: when do you jump from handling manually to observables?

  • my rule of thumb is if I need anything more than a single callback, I’ll use this (or core.async)

Q: have you used RxJava from clj? How nice is it?

  • works great, so does RxJs

Stuart Sierra, Components: Just enough structure

architecture

  • software architecture is very simple(!)
    • presentation
    • business logic
    • DB
  • actually, much more complex
    • config
    • connections to external resources
      • monitoring
      • queues
      • sessions/connections in pools
    • process state
      • thread pools
      • caches
      • schedulers

Java: structure built in

clojure: not much structure

  • clojure namespaces aren’t classes
    • they’re not instantiable
  • def creates a singleton
  • (def foo (atom ..)) creates global mutable state
  • bootstrapping
(defn start-all! []
  (database/connect!)
  (create-queues!)
  (start-thread-pool!)
  ....
  (start-web-server!))

component

  • immutable data structure (map or record)
  • public api
  • management lifecycle
  • relationships to other components
  • It’s an object (shh!)
    • not using it to represent data

State wrapper component

  • (defrecord DB [host conn] ....)
  • opaque to most consumers (by convention)

Public API

  • fns take component as an argument

Lifecycle: Constructor

  • set up initial state
  • no side effects

Lifecycle: Transitions

  • side effects happen here:
(defprotocol Lifecycle
  (start [component])
  (stop [component]))
  • start and stop return an updated version of the component
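A self-contained sketch of a record implementing this lifecycle, with the protocol redefined locally so it runs without the component library. The `DB` record's "connection" is a plain map standing in for a real database handle, which is an assumption made for illustration.

```clojure
(defprotocol Lifecycle
  (start [component])
  (stop [component]))

;; Hypothetical component: the "connection" is a plain map standing in
;; for a real database handle.
(defrecord DB [host conn]
  Lifecycle
  (start [this]
    (assoc this :conn {:connected-to host}))
  (stop [this]
    (assoc this :conn nil)))   ; assoc, not dissoc, so it stays a DB record

(def created (->DB "localhost" nil))   ; constructor: no side effects
(def started (start created))
(def stopped (stop started))
```

Since `start` and `stop` return new values, `created` is unchanged after `started` is built, which keeps the transitions easy to test.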

Service provider component

  • (defrecord Email [endpoint api-key ...] ...)

Domain model?

  • traditionally intermingle data and behaviour:
public class Customer {
    private String name;
    private Address address;
    public void notify() {...}
    //...
}
  • Let data be data
  • just use a map

domain model component

  • represent aggregate operations
  • (defrecord Customers [db email])
  • db and email are other components, used by the customer component
  • interacts with them entirely through their public APIs
  • to construct a Customers instance, need to get its dependencies

system map

  • takes created but unstarted components:
(defn system [...]
  (component/system-map
   :customers (customers)
   :db (db ...)
   :email (email...)))
  • to start the system, understands dependencies and works out correct dependency order to start each component
  • then wires each component up to the correct (started) dependency
  • stopping the system is similar but in reverse dependency order
  • Before start, dependencies not filled in yet (just nil)
  • after start, fill in dependencies
  • the system is just a map
    • so if I want to inject a test stub, I can just assoc it in:
(defn test-system [...]
  (assoc (system ...)
    :email (stub-email)
    :db    (stub-db)))
  • works as long as I do it before starting any services
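Working out a start order from declared dependencies can be sketched as below. This is not the component library's implementation; `start-order` is a hypothetical helper that takes a map from each component key to the set of keys it depends on.

```clojure
;; Hypothetical helper: deps maps each component key to the keys it needs.
(defn start-order [deps]
  (loop [order [] remaining deps]
    (if (empty? remaining)
      order
      (let [started (set order)
            ;; Components whose dependencies have all been started already.
            ready   (sort (for [[k needs] remaining
                                :when (every? started needs)]
                            k))]
        (when (empty? ready)
          (throw (ex-info "dependency cycle" {:remaining remaining})))
        (recur (into order ready)
               (apply dissoc remaining ready))))))

(def order
  (start-order {:customers #{:db :email} :db #{} :email #{}}))
;; [:db :email :customers]; stopping would walk the same order in reverse
```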

DB for testing

  • fixtures to inject into database
  • mocking the db is too hard unless you use datomic 😏

var substitution & asynchrony

  • with-redefs and binding are delimited in time
    • problems if you dispatch to another thread
    • potential race conditions
    • tightly coupled to implementation
    • wrong level of granularity
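The thread problem with `binding` can be shown with a raw `Thread`, which does not inherit the caller's dynamic bindings. This sketch uses only `clojure.core` and the JDK; `*db*` and the other names are illustrative. Note that `future` and agents do convey bindings, so the issue appears with threads created by other means.

```clojure
(def ^:dynamic *db* :real-db)

(def seen-on-thread (promise))

(binding [*db* :stub-db]
  ;; A raw Thread starts with no thread-local bindings of its own,
  ;; so it sees the root value of *db*.
  (.start (Thread. (fn [] (deliver seen-on-thread *db*)))))

(def seen-inside (binding [*db* :stub-db] *db*)) ; :stub-db
(def seen-outside @seen-on-thread)               ; :real-db
```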

Entry point: main

  • exactly one mutable global for the whole system
  • (def sys (atom nil))
    • use reset! not swap! because start and stop are side-effecting and swap! might call multiple times
    • (@samaaron ed: uses agents for this sort of thing)

Web app: static routes

  • defroutes considered harmful

renaming dependencies

  • you can merge systems
  • name common components with shared keys:
{:a/web-app ..
 :a/server ...
 :db ...
 :email ...}

{:b/web-app ..
 :b/server ...
 :db ...
 :email ...}

(merge system-a system-b)

core.async

  • components take channels as state
  • decouples components from one another
  • system creation can create the channels you want and wire them up

summary

  • advantages
    • once you’re used to the patterns of clear dependencies and boundaries, you maybe don’t even need the library anymore
    • isolation, decoupling
    • testing, refactoring
    • automatic ordering of start/stop
    • easy to swap in alternate implementations
    • everything at most one map lookup away
  • disadvantages
    • requires whole-app buy-in
      • won’t get a lot of the benefits without this
      • porting an existing system can be tedious
    • system map is too big to inspect visually
    • cannot start/stop only part of a system
      • may try to fix someday but don’t really understand how yet
  • possible future
    • “init” acquires resources but doesn’t start?
    • “close”/”stop” separation – close acquired resources and discard dependencies so they can be GC’d
    • (the “stop” method doesn’t dissoc anything)
      • dissoc stops a record being a record
      • you might want to use that state again
    • handle mutable containers for systems
      • currently, library code doesn’t care – you can use an atom or a var or whatever
      • allow individual components to start, stop, or change at runtime
      • deref container and get “current” component with latest deps
      • catch errors, mark component as “failed”
        • this is the tricky part

Philip Potter, Generative testing with clojure.test.check

Chris Ford, the hitchhiker’s guide to the Curry-Howard correspondence

  • number of papers published today by the foremost expert on the Curry-Howard correspondence…
    • 1
  • Don’t panic!
  • Gödel’s incompleteness theorem

introduction

  • a → a
    • this is a proposition in logic
    • but it’s also a type
      • the type of the identity function

our heroes

  • Haskell Curry
    • 1958: textbook on combinatory logic
      • didn’t necessarily understand how revolutionary this idea was
  • William A. Howard (1969)
    • not only does a type correspond to a proposition, but:
      • a function with a type corresponds to a proof of a proposition
    • “The formulae-as-types notion of construction” - finally published in 1980

modus ponens

  • (a → b) → a → b
    • modus ponens
    • type of apply (haskell or idris):
apply :: (a -> b) -> a -> b
apply f x = f x
  • the implementation here corresponds to a proof of modus ponens
  • apply works with any types a and b
  • modus ponens works with any propositions a and b
  • view the type “Integer” as the proposition that integers exist
    • any example – say, 65 – counts as a proof of this proposition
  • here, we use (3==) to prove that Integer -> Bool is populated
(3==) :: Integer -> Bool

apply (3==) 4
False: Bool

composition

  • (a → b) → (b → c) → (a → c)
    • type of function composition
  • length : List a -> Integer
  • (3==) : Integer -> Bool
    • comp length (3==) : List a -> Bool
    • if we accept that List a exists, we now prove that Bool exists

axioms?

  • a → a
  • (a → b) → a → b
  • a → b → (a,b)
    • if I can build an a, and I can build a b, then I can build an (a,b)
  • (a,b) → b

bottom type

  • a → b
  • (a → b) → (b → a)
    • neither of these is true in general
    • the bottom type: ⊥ is guaranteed to have nothing in it
      • represents falsity in the Curry-Howard correspondence
      • represents something it’s impossible to prove, because it’s not true
AnythingGoes : Type
AnythingGoes = (a : Type) -> a

cantProveItAll : AnythingGoes -> _|_
cantProveItAll f = f _|_
  • cantProveItAll shows that AnythingGoes is uninhabited (because if it weren’t, it would imply that ⊥ was inhabited)

harmless(?)

  • types prove our program correct?
  • types only get us so far
    • can still get runtime errors if the types check out

mostly harmless.

  • are types defective?
  • haskell will crash at runtime despite an advanced type system
    • head [] isn’t defined

enter Idris

  • Edwin Brady, creator of Idris
    • (and whitespace)
  • killer feature of Idris:
    • allows you to make condescending remarks about the Haskell type system
    • although it’s really a dialect of Haskell
  • example:
data Nat = Z | S Nat

data List a = Nil | (::) a (List a)

data Vect : Nat -> Type -> Type where
  Nil  : Vect Z a
  (::) : (x : a) ->
         (xs : Vect n a) ->
         Vect (S n) a
  • in Haskell, types can be parameterized on other types
  • in Idris, they can be parameterized on values as well as types
    • Vect 2 Integer is the type of vectors which contain exactly 2 Integers
      • or rather, Vect (S (S Z)) Integer
head : Vect (S n) a -> a
head (x::_) = x

head []
Can't unify Vect 0 a
with Vect (S n) iType
  • trying to take the head of an empty vector is a compile-time error

concatenation

  • signature: Vect m a -> Vect n a -> Vect (m+n) a
  • sort: Ord a => Vect m a -> Vect m a
    • would have caught Phil’s my-sort which dropped duplicates(!)
    • didn’t manage to get this implemented in the lunch break
      • it’s not theorems for free 😉

even number family of types

data Even : Nat -> Type where
  Zero : Even Z
  Next : Even n -> Even (S (S n))

Zero : Even Z
Next (Next (Next Zero)) : Even 6
  • can now show that even numbers sum to even numbers:
add : Even m -> Even n -> Even (m + n)
add Zero y = y
add (Next x) y = Next (add x y)
  • Although we’ve really proved that:
    • there exists an operation which takes Even n and Even m and returns Even (n+m)
    • we chose add but could have chosen any other name
  • can prove that 42 is even
  • Even 3 is a valid type, but it has no inhabitants
    • so we can prove Even 3 -> _|_
    • proof in slides

the unit type

  • () represents truth
    • you don’t need anything else to prove this
    • you can construct it without context
    • (you could use Even 42 to represent truth too)
    • so LifeTheUniverseAndEverything -> Even 42 :)

References

Q: is there a way to specify that sort’s return value is sorted?

  • @bodil thinks it’s true :)

Q: is the type-checker guaranteed to terminate?

  • it’s equivalent to the halting problem

Anna Pawlicka, Reactive data visualisations with Om

Technologies

D3 (data-driven documents)

  • to visualise data
    • table of numbers, bar chart, whatever
  • data bound to DOM
  • interactive - transformations driven by data
  • huge community
    • huge number of plugins and extensions
  • Higher level libs available
    • hide the complexity of d3
    • but if you need to tweak the underlying d3 it’s still available

leaflet.js

  • mapping library (independent of d3, but usable alongside it)
  • mapping data
    • tile layers, vector layers
  • user interaction

dimple.js

  • charting library on d3
  • bar charts

react

  • (interface components)
  • solves one problem: complex UI rendering
  • just the V of MVC
    • say no to “two-way data binding”
  • re-renders the entire UI
    • sounds like a bad idea
    • actually quite performant, due to:
  • virtual DOM
    • diffs between previous and next renders of a UI
  • less code
  • shorter update times

react lifecycle

  • IInitState →
  • IWillMount →
  • IShouldUpdate →
    • IRenderState
    • IRender
  • Om handles most of these for us (particularly IShouldUpdate)

Om

  • entire state of the UI in single piece of data
  • immutable data structures = reference equality check
    • shouldComponentUpdate() can be overridden to take advantage of this
  • snapshottable, free undo
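
A small sketch of why immutable data makes the update check cheap, using plain Clojure maps rather than Om itself. The state shape and the changed? helper are made up for illustration:

```clojure
(def state-1 {:chart {:points [1 2 3]}
              :form  {:user "anna"}})

;; Changing the form produces a new root map, but the untouched :chart
;; subtree is the very same object as before.
(def state-2 (assoc-in state-1 [:form :user] "bob"))

;; A reference comparison is enough to decide whether a subtree changed.
(defn changed? [prev curr path]
  (not (identical? (get-in prev path) (get-in curr path))))

;; Old states remain valid values, so a history vector gives snapshots and undo.
(def history [state-1 state-2])
```

Here (changed? state-1 state-2 [:chart]) is false while the same check on [:form] is true, so only the form would need re-rendering.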

Liberator & core.async

  • component interaction
  • liberator: RESTful apis via defresource macro
  • core.async
    • js developers would freak out looking at it
    • get blocking calls without browser freezing

data sources

  • for example, local sensors
  • may want to perform some sql queries to see patterns in your data
  • may wish others to retrieve data through API (if they don’t like our chart)

Chart & API demo

  • user interacts, triggers API calls to fetch data, updates graph in real time
  • chart implementation
    • om/IInitState to construct
    • om/IRenderState
      • to update
  • device-form
    • om/IWillMount to read shared info to find API endpoint
    • om/IRenderState
  • form-row
  • chart-figure

last.fm chart

  • chart based on last.fm playlist
  • input box takes username, calls last.fm to find that user’s playlist
    • chart then shows bands from most popular downwards

interactive maps

  • input box for postcode lookup
    • uses google geocoding api to get coords
  • can click map to create marker & display coordinates
  • app-model stores map location and coordinates panel contents
  • nice use of core.async sliding-buffer 😎
  • (go (while true ...)) could be (go-loop ...) ? dunno
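
For reference, a sketch of how a sliding buffer behaves, assuming core.async is available. The clicks channel and the coordinates are invented for this example and are not taken from the demo:

```clojure
(require '[clojure.core.async :as async])

;; A channel that keeps only the two most recent values.
(def clicks (async/chan (async/sliding-buffer 2)))

;; Puts never block here: when the buffer is full, the oldest value is dropped.
(doseq [coords [[50.06 19.94] [51.5 -0.12] [52.23 21.01]]]
  (async/>!! clicks coords))

(async/close! clicks)

;; Drain whatever is still buffered into a vector.
(def remaining (async/<!! (async/into [] clicks)))
```

Only the last two clicks survive, which suits UI events where stale values are not worth processing.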

summary

  • fast rendering and interactivity is yours with js + cljs + om
  • immutability = efficiency
  • sane application structure
  • (philandstuff ed: this presentation is very visual, just watch the video!)

algernon, The Face of Inspiration, or how Clojure helps bring Lisp to Python

  • github: algernon
  • twitter: algernoone
  • sorry, I was in the hallway track for this 😦

Malcolm Sparks, Assembling secure clojure applications from re-usable parts

  • @malcolmsparks, juxt

warning! research, evolving ideas, alpha quality

juxt/modular

  • set of components compatible with stuartsierra/component
  • http-kit, bidi router, mustache templating

juxt/cylon

  • security components
    • login form
    • session
    • user domain
    • hashing
    • authn and authz

assertion

  • libraries are great
  • systems are complex

assertion 2

  • a meta-architecture, that can scale to hundreds of diverse projects, is useful

architecture

  • don’t want to just port Spring MVC to clojure
  • components, dependencies, protocols

components

  • reusable bits

dependencies

  • wiring of components together
  • since the system is in a var, I can do a tree-walk on the system, and show it
  • can visualise it with dagre and svg rendering, and react
  • slide deck which shows its own wiring
    • I’m So Meta, Even This Acronym

protocols

  • integration surface
  • necessary for component interchangeability
  • example: bayonet light-bulb fitting
    • can plug a light bulb into it
    • light bulb dies – replace it!
    • over time, can replace entire system by replacing parts
  • hidden couplings
    • copied code
    • database schemas and sql queries
    • URI formation & URI dispatch
    • have to change things in multiple places to effect change
  • juxt/bidi
    • dispatch and forge URIs from the same route data
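
A hand-rolled illustration of the idea, not bidi's actual API: a single route table drives both dispatch and URI formation, so the two cannot drift apart. The names route-table, dispatch and forge are assumptions made for this sketch:

```clojure
;; One piece of route data, used in both directions.
(def route-table {"/login"  :login
                  "/charts" :charts})

;; URI -> handler key
(defn dispatch [routes path]
  (get routes path))

;; handler key -> URI (nil if the handler is not routed)
(defn forge [routes handler]
  (some (fn [[path h]] (when (= h handler) path)) routes))
```

Renaming "/charts" in route-table changes both what is dispatched and every link that is forged, in one place.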

component example

  • constructor
    • defaults with merge
    • schema/validate
  • components are units of cohesion
    • implements multiple protocols:
      • component/Lifecycle
      • WebService
      • JavaScripts
      • TemplateData
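
A sketch of what such a component might look like in plain Clojure. The protocols below are hypothetical local definitions named after the ones in the notes (the real ones in component and modular may differ), and a plain assert stands in for schema/validate:

```clojure
;; Hypothetical protocols, defined locally for this sketch.
(defprotocol Lifecycle
  (start [this])
  (stop [this]))

(defprotocol WebService
  (routes [this]))

(def defaults {:port 8080})

;; One record, several protocols: the component is the unit of cohesion.
(defrecord Website [port started?]
  Lifecycle
  (start [this] (assoc this :started? true))
  (stop [this] (assoc this :started? false))
  WebService
  (routes [_] {"/" :index}))

;; Constructor: merge caller options over defaults, then validate.
(defn new-website [opts]
  (let [{:keys [port] :as config} (merge defaults opts)]
    (assert (integer? port) "port must be an integer")
    (map->Website config)))
```

For example, (new-website {}) uses the default port, while (new-website {:port 3000}) overrides it.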

intermission

  • maze creation in cljs
    • “drunken walk” algorithm
      • the first time you visit a space, you break down the wall
  • core.async visualisation and demo of map<

the index pattern

the interceptor pattern

  • a component is wired in between two components

shared dependency pattern

security

challenge

  • don’t want to re-implement security components
  • tried-and-tested security by default
  • flexibility of ‘roll-your-own’

example: website

  • router routing between sub-websites A and B
  • add a login form to the router, which uses:
    • user-domain
      • password-algo (eg scrypt)
      • user-store (eg cassandra)
    • session-store
    • all comes from cylon
  • add authorization component to website B (again from cylon), using:
    • authenticator
    • session-store (same dep as above)

summary

demo

  • lein new modular myapp
  • lein new modular myapp +cljs
  • lein new modular myapp +cljs +security
  • lein new modular myapp +cljs +security +devtools
  • (dev) fn
    • if your code doesn’t compile on your repl, then you just get loads of stack traces
  • secure content, rather than URI routes
    • there may be multiple routes to the content
    • restrict-handler to wrap a response in a RestrictedHandler which implements IFn to look like a fn and make it invokable
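
A rough sketch of a record that implements IFn so it can be called like a handler function. This is an illustration only: cylon's real RestrictedHandler may look quite different, and the authorized? predicate and the 401 response are assumptions:

```clojure
;; A record that can be invoked with one argument, like a Ring-style handler.
(defrecord RestrictedHandler [handler authorized?]
  clojure.lang.IFn
  (invoke [_ request]
    (if (authorized? request)
      (handler request)
      {:status 401 :body "Unauthorized"})))

(defn restrict-handler [handler authorized?]
  (->RestrictedHandler handler authorized?))

;; The content itself is protected, whichever route leads to it.
(def secret-page
  (restrict-handler (fn [_] {:status 200 :body "secret"})
                    (fn [request] (some? (:session request)))))
```

Because the restriction travels with the handler, adding a second route to secret-page does not open a hole.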

Q: what about hypermedia to decouple URI dispatch & formation?

  • we don’t have HATEOAS because it’s quite hard
  • want to make it easier

that’s all folks!

  • thanks for reading :)

bowika commented Jun 27, 2014

awesome, thanks for doing this

minimal commented Jun 28, 2014

Generative schema gist: https://gist.github.com/davegolland/3bc4277fe109e7b11770

Herbert is like Schema and comes with test.check integration:
https://github.com/miner/herbert#testcheck-integration

beppu commented Jun 29, 2014

These are great notes to what looks to have been a high quality conference. Thank you for sharing your notes.

tzach commented Jun 29, 2014

Thanks for taking the time and effort putting this together.
Much appreciated

@sir-pinecone
Thank you!

kasz commented Jun 29, 2014

Thank you very much. Great supplement where my own notes are lacking.

@thomas-shares
Anna's slides are here:
http://www.slideshare.net/annapawlicka/reactive-data-visualisations-with-om
and her demo is here:
https://github.com/apawlicka/om-data-vis
And thank you for taking these notes, fantastic job.

See you all next year!!! Thomas

@leontalbot
Thanks for these! Regarding React.js re-implementation in pure ClojureScript, take a look at Tesseract: https://github.com/scottrabin/tesseract

@fbehrens
Thanks for sharing !
