- Hello World Open: programming contest organised by Reaktor and Supercell with ~2500 teams, worldwide distribution
- Client-server car race; cars are driven with simple commands such as decelerate/accelerate and change lanes
- Organisers provided test server with simple testing UI
- Parameters for car are quite simple: current angle, position etc
- Clojure works well for processing simple data structures like this - analysis, storing, examining
- Used Incanter to plot bot data during test (throttles vs. angles)
- numeric.expresso used to reverse-engineer physics: basically guessing what the formula might look like
- Other option for solving physics problem: interpolate with lots of data
Code should be on Github by Monday.
- Incanter is useful for visualising data flowing through a live system - easy to create simple graphs. Doesn't have to be used just for stand-alone analytics applications!
- Loved the recorded coding in the presentation. https://twitter.com/dl1ely/status/558906547133505536
- Many people in audience use Hadoop, only a few have used Spark
- Spark brings big data, distributed systems and the JVM together
- Spark value proposition:
- Keeps stuff in memory - no I/O needed for intermediate results
- Can work with many different data sources and data structures
- Spark vs Hadoop:
- Hadoop (= MapReduce + HDFS) is the de-facto standard
- Spark replaces MapReduce, but you can still use HDFS with Spark as well
- What's wrong with MapReduce? Performance - performs I/O after each step. Bad for tasks with many steps, e.g. machine learning
- The main innovation of Spark is data sharing between processing steps
- Resilient Distributed Datasets (RDDs)
- could be on any node
- are partitioned into blocks, e.g. 64 MB in size, and distributed across nodes
- resilient = doesn't matter if some blocks get lost e.g. due to node failure, Spark keeps track of lineage and will re-compute
- Workflow:
- Get RDD from datasource, e.g. HDFS, JDBC query, Cassandra, text file
- "Transform" and get new RDDs
- Perform an "action" and get an end result
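The source -> transform -> action workflow above can be sketched with sparkling. This is a minimal, hedged example - the function names follow sparkling's `sparkling.core`/`sparkling.conf` API as I recall it, and `access.log` is a made-up input file:

```clojure
(require '[sparkling.conf :as conf]
         '[sparkling.core :as spark]
         '[clojure.string :as str])

(def sc (spark/spark-context
         (-> (conf/spark-conf)
             (conf/master "local[*]")
             (conf/app-name "workflow-sketch"))))

;; 1. Get an RDD from a data source (here: a text file)
(def lines (spark/text-file sc "access.log"))

;; 2. "Transform" - each step returns a new RDD; nothing runs yet
(def words (spark/flat-map #(str/split % #"\s+") lines))

;; 3. "Action" - triggers the computation and returns an end result
(spark/count words)
```

Transformations are lazy; only the final action forces Spark to evaluate the whole lineage.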
Now for some live coding - or not!
- Example: compare access log parsing with plain Clojure and sparkling.
- Spark code is easier to unit test than Hadoop equivalents
- Spark solution uses Spark's built-in support for data structures with key-value pairs
- e.g. reduce-by-key only works with RDD in tuple format
- important for performance
- makes code shorter than the pure Clojure version
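The key-value style described above might look like this with sparkling - a hedged sketch where `parse-url` is a hypothetical function extracting the URL from a log line:

```clojure
(require '[sparkling.core :as spark])

;; reduce-by-key only works on an RDD of tuples, so each log line
;; is first mapped to a (url, 1) pair with map-to-pair.
(defn hits-per-url [log-rdd]
  (->> log-rdd
       (spark/map-to-pair (fn [line]
                            (spark/tuple (parse-url line) 1)))
       (spark/reduce-by-key +)))
```

Keeping the data in tuple form (instead of round-tripping through plain Clojure vectors) is exactly what preserves the performance the talk mentions.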
Spark and Clojure
- Clojure is a great fit for Spark's already FP-inspired approach
- Attaching to nREPL on running node is very handy - GorillaREPL used for Notebook-style interface
Tips
- Check out Partitioners to reduce data shuffling
- Data model design is important
- Do stuff by keys
- http://lintool.github.io/SparkTutorial is a good first read
sparkling vs flambo
- flambo promises to be simple, but breaks the rule of "as simple as possible, but not simpler": it converts Scala Tuple2s to Clojure vectors and back when necessary, which loses Partitioner information and makes it slow
- Spark feels like a more modern approach compared to Hadoop
- I don't really have big data use cases at the moment, but Spark offers a good scaling path for growth scenarios
- Should check out GorillaREPL for introspection of live systems
What we're talking about: ability to automatically deploy tested code to production whenever
- Test pyramid in Clojure:
- UI/top level: clj-webdriver (Selenium), kerodon (ring HTML), peridot (any ring app) -> impure, side effects
- Integration: impure, side effects
- Component: pure kind of integration tests
- Unit: clojure.test, midje, with-redefs for simple harnesses. pure
- Contract tests: useful in today's mobile-centric world where we can't influence update rates of mobile apps
- Can act as a stub for clients
- pact, pact-jvm (doesn't have clojure wrapper it seems), janus (not maintained)
- Component/integration tests can be implemented using the same tools as unit tests; the difference is mostly conceptual. They can e.g. be separated into namespaces (`acceptance`, `integration`, `core`, ...). Midje has good support for wildcards.
- Open question: where do generative tests fit in the test pyramid?
Heroku
- Does magic behind the scenes, not very transparent, but includes `lein uberjar`
- A `-main` function, a `:main` key in `project.clj` and a `Procfile` are all needed
- Breaks the "only compile binaries once" principle
Own infrastructure
- Aim for deploying a single JAR
- Will probably include shell scripting, Ansible or something
- One way: create your own .clj file with configuration values. Read it in, merge with default options
- (My note: Component, leaven, duct are better options)
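The "own .clj file with configuration values" approach can be as small as this sketch (the file name, keys and defaults are made up for illustration):

```clojure
(require '[clojure.edn :as edn])

(def defaults
  {:port   8080
   :db-url "jdbc:postgresql://localhost/dev"})

;; Read an EDN/Clojure config file and merge it over the defaults,
;; so the file only needs to contain the overrides.
(defn load-config [path]
  (merge defaults
         (edn/read-string (slurp path))))

;; (load-config "config.clj")
```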
- Think about where in the test pyramid your tests fit
- You can get pretty far with standard Clojure testing tools
- Main features of Datomic:
- No update-in-place, only assertions and retractions (where retraction != removal)
- Immutable data, database as a value
- ACID
- Datalog query language
- Doesn't do storage by itself, instead uses Riak, RDBMS, DynamoDB ...
- Datom!
- Entity, Attribute, Value, Tx, Operation (successful or not)
- 1234, name, John, "at work", true
- Time: easy access to past and future states
- Indexes: EAVT, AEVT, AVET, VAET
- Value of DB passed to queries and functions
- Datalog:
- declarative like SQL, logical language, pattern matching
- :find, (:in,) :where
- most logic will be in :where
- order of statements usually doesn't matter - as a rule of thumb, provide specific datoms first
- Classic Datalog is the Entity API, there's also a new, higher-level Pull API
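A minimal query in the `:find` / `:in` / `:where` shape described above - a sketch with a made-up `:person/name` attribute, also showing the Java interop mentioned below:

```clojure
(require '[datomic.api :as d])

;; db is a database *value*, passed to the query explicitly.
;; Finds names starting with the given prefix.
(d/q '[:find ?name
       :in $ ?prefix
       :where
       [?e :person/name ?name]
       [(.startsWith ?name ?prefix)]]
     db
     "J")
```

Note how the specific datom clause comes first and the predicate last, matching the rule of thumb above.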
- Java methods can be used too, e.g. `.startsWith`
- Powerful DB functions in Clojure, use with `invoke`
- Filters can be used to run queries against different views of the database, e.g. `as-of` (DB value at a certain point in time), `since` (auditing)
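The filters are just functions over a database value - a sketch assuming a Datomic connection `conn`:

```clojure
(require '[datomic.api :as d])

(let [db (d/db conn)]
  ;; the database as it looked at a point in time
  (d/as-of db #inst "2015-01-01")
  ;; only what was added after a point in time - handy for auditing
  (d/since db #inst "2015-01-01"))
```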
Transactor
- Single instance (one standby)
- Handles writes
- Ensures ACID
Use cases for Datomic
- Most useful where going back in time and immutability are important capabilities
- Not suitable as replacement for e.g. ElasticSearch/Solr full text search, where query is done over lots of data
- Datomic has a really powerful query language - the fiddly SQL queries that I sometimes use, for example to generate reports, could probably be replaced by a few lines of Datalog
- So much can be done using validators, transactions, filters.. logic moves to the DB layer
- Wonder what the biggest win would be - the immutability or the query capabilities
- Web developing in Clojure is liberating - no dominant frameworks, many ways of doing things
- ring is de-facto
- similarities in other languages: Rack, WSGI
- called "ring" b/c of client -> request -> response -> client feedback loop
- Other functions handled by small libraries: clout, compojure, hiccup, enlive, friend ..
- Everything works well together, everyone uses plain Clojure data structures
- For more complex applications Component is nice - but start with it from the beginning, as it is very hard to retrofit
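The ring model in one sketch: a handler is just a function from a request map to a response map, and libraries like compojure and hiccup compose around it using plain data:

```clojure
(require '[compojure.core :refer [defroutes GET]]
         '[hiccup.core :refer [html]])

;; A plain ring handler: request map in, response map out.
(defn hello [request]
  {:status  200
   :headers {"Content-Type" "text/html"}
   :body    (html [:h1 "Hello from ring"])})

(defroutes app
  (GET "/" request (hello request)))
```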
- Rationale: lisp inside-out syntax not nice for reading example-based tests - left-to-right is nicer
- Useful add-ons: simple stubbing and mocking, auto-test feature ...
- `future-fact`: like JUnit's `@Ignore` annotation, for TODOs
- Lots of checkers that can be used on the RHS instead of values - check the Midje wiki
- `anything`/`irrelevant`: useful when the return value is not interesting because we are checking for side effects
Stubbing
- `provided`: does essentially the same thing as `with-redefs`
Mocking
- Testing for interactions - "called 5 times" scenario
- Also uses `provided`
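A sketch of both uses of `provided` - stubbing a collaborator and asserting call counts (the functions under test, `greet`, `fetch-user`, `notify-all` and `send-notification`, are made up):

```clojure
(require '[midje.sweet :refer :all])

;; Stubbing: fetch-user is replaced for the duration of the fact.
(fact "greeting uses the user's name"
  (greet 42) => "Hello, Ada!"
  (provided
    (fetch-user 42) => {:name "Ada"}))

;; Mocking/interaction testing: the "called 5 times" scenario.
(fact "notify-all notifies every user"
  (notify-all [1 2 3 4 5]) => anything
  (provided
    (send-notification anything) => nil :times 5))
```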
Integration tests - how to separate them
- Can tag tests with e.g. `:slow` metadata and run with `lein midje :filter -slow`
- Can run into cases with nested facts
- Separate namespaces another option
- Testing is a good organisational inroad for Clojure and Midje is a great testing tool
- Second talk with failed live coding already. Albrecht's approach of having recorded coding paid off!
Slides (generated with test.check generators!)
- Converting example-based tests: think about what properties must hold
- The point is to use only "generators" and "properties" for your test
- Running loads of iterations is great for finding edge cases
- Re-use seed to get the same input sequence
Real world examples - www.stylefruits.de
- Complicated routing - /hosen/lee/farbe-hellblau/...
- Simplify by mapping paths to descriptors and back. Property: `(comp descriptor->path path->descriptor)` is the identity function
- Generator: all valid descriptors
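The round-trip property could look like this with test.check - a sketch where the generator and the `descriptor->path`/`path->descriptor` functions are stand-ins for stylefruits' real ones:

```clojure
(require '[clojure.test.check :as tc]
         '[clojure.test.check.generators :as gen]
         '[clojure.test.check.properties :as prop])

;; Hypothetical generator producing only valid descriptors.
(def gen-descriptor
  (gen/hash-map :category (gen/elements [:hosen :kleider])
                :brand    gen/string-alphanumeric))

;; Property: converting to a path and back yields the original descriptor.
(def roundtrip
  (prop/for-all [d gen-descriptor]
    (= d (path->descriptor (descriptor->path d)))))

;; (tc/quick-check 100 roundtrip)
```

Running a few hundred iterations of a property like this is how they "found lots of bugs".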
- Found lots of bugs
Stateful things
- Paper: "Testing Telecoms Software with Quviq QuickCheck"
- jstepien/states
- michalmarczyk/ctries.clj
- Integration tests: could have generators that seed databases and create events as well
Resources
- QuickCheck paper (Claessen, Hughes)
- "Testing the hard stuff and staying sane", John Hughes
- "Generative testing with clojure.test.check", Philip Potter
- Use `sample` for testing complicated generators
- Can use exploratory/generative testing to find edge cases, then implement unit tests for those - a very valid use case for generative testing!
- 2012: painful experience with AngularJS building a soulseek clone
- State handling main pain point
- Since 2014: success with ClojureScript and Om
- Besides state management lots of other goodies: hotswap (figwheel), time travelling ...
- Compiler is not a bad thing
- core.async - "if you do not understand yet, you are probably thinking too hard - it's simple"
- People have done quite cool samples for others to learn from, e.g. http://shaunlebron.com/t3tr0s-slides
- bhauman/devcards
- Om and Chestnut (lein) vs Reagent and Tenzing (boot)
Missed the lightning talks unfortunately.
Great summary of the day! You can find my slides here (pdf)