- Hello World Open: programming contest organised by Reaktor and Supercell with ~2500 teams, worldwide distribution
- Client-server car race; cars are driven with simple commands such as decelerate/accelerate and change lanes
- Organisers provided test server with simple testing UI
- Parameters for car are quite simple: current angle, position etc
- Clojure works well for processing simple data structures like this - analysis, storing, examining
- Used Incanter to plot bot data during test (throttles vs. angles)
- numeric.expresso used to reverse-engineer physics: basically guessing what the formula might look like
- Other option for solving physics problem: interpolate with lots of data
Code should be on Github by Monday.
- Incanter is useful for visualising data flowing through a live system - easy to create simple graphs. Doesn't have to be used just for stand-alone analytics applications!
- Loved the recorded coding in the presentation. https://twitter.com/dl1ely/status/558906547133505536
- Many people in audience use Hadoop, only a few have used Spark
- Spark brings big data, distributed systems and the JVM together
- Spark value proposition:
- Keeps stuff in memory - no I/O needed for intermediate results
- Can work with many different data sources and data structures
- Spark vs Hadoop:
- Hadoop (= MapReduce + HDFS) is the de-facto standard
- Spark replaces MapReduce, but you can still use HDFS with Spark as well
- What's wrong with MapReduce? Performance - performs I/O after each step. Bad for tasks with many steps, e.g. machine learning
- The main innovation of Spark is data sharing between processing steps
- Resilient Distributed Datasets (RDDs)
- could be on any node
- are partitioned into blocks, e.g. 64 MB in size, and distributed across nodes
- resilient = doesn't matter if some blocks get lost e.g. due to node failure, Spark keeps track of lineage and will re-compute
- Workflow:
- Get RDD from datasource, e.g. HDFS, JDBC query, Cassandra, text file
- "Transform" and get new RDDs
- Perform an "action" and get an end result
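The source -> transform -> action workflow above can be sketched with sparkling. This is a minimal, hedged example - the function names follow sparkling's `sparkling.core`/`sparkling.conf` API as I recall it, and `access.log` is a made-up input file:

```clojure
(require '[sparkling.conf :as conf]
         '[sparkling.core :as spark]
         '[clojure.string :as str])

(def sc (spark/spark-context
         (-> (conf/spark-conf)
             (conf/master "local[*]")
             (conf/app-name "workflow-sketch"))))

;; 1. Get an RDD from a data source (here: a text file)
(def lines (spark/text-file sc "access.log"))

;; 2. "Transform" - each step returns a new RDD; nothing runs yet
(def words (spark/flat-map #(str/split % #"\s+") lines))

;; 3. "Action" - triggers the computation and returns an end result
(spark/count words)
```

Transformations are lazy; only the final action forces Spark to evaluate the whole lineage.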
Now for some live coding - or not!
- Example: compare access log parsing with plain Clojure and sparkling.
- Spark code is easier to unit test than Hadoop equivalents
- Spark solution uses Spark's built-in support for data structures with key-value pairs
- e.g. reduce-by-key only works with RDD in tuple format
- important for performance
- makes code shorter than the pure Clojure version
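The key-value style described above might look like this with sparkling - a hedged sketch where `parse-url` is a hypothetical function extracting the URL from a log line:

```clojure
(require '[sparkling.core :as spark])

;; reduce-by-key only works on an RDD of tuples, so each log line
;; is first mapped to a (url, 1) pair with map-to-pair.
(defn hits-per-url [log-rdd]
  (->> log-rdd
       (spark/map-to-pair (fn [line]
                            (spark/tuple (parse-url line) 1)))
       (spark/reduce-by-key +)))
```

Keeping the data in tuple form (instead of round-tripping through plain Clojure vectors) is exactly what preserves the performance the talk mentions.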
Spark and Clojure
- Clojure is a great fit for Spark's already FP-inspired approach
- Attaching to nREPL on running node is very handy - GorillaREPL used for Notebook-style interface
Tips
- Check out Partitioners to reduce data shuffling
- Data model design is important
- Do stuff by keys
- http://lintool.github.io/SparkTutorial is a good first read
sparkling vs flambo
- flambo promises to be simple, but breaks the rule of "as simple as possible, but not simpler": it converts Scala Tuple2s to Clojure vectors and back when necessary, which loses Partitioner information and makes it slow
- Spark feels like a more modern approach compared to Hadoop
- I don't really have big data use cases at the moment, but Spark offers a good scaling path for growth scenarios
- Should check out GorillaREPL for introspection of live systems
What we're talking about: ability to automatically deploy tested code to production whenever
- Test pyramid in Clojure:
- UI/top level: clj-webdriver (Selenium), kerodon (ring HTML), peridot (any ring app) -> impure, side effects
- Integration: impure, side effects
- Component: pure kind of integration tests
- Unit: clojure.test, midje, with-redefs for simple harnesses. pure
- Contract tests: useful in today's mobile-centric world where we can't influence update rates of mobile apps
- Can act as a stub for clients
- pact, pact-jvm (doesn't have clojure wrapper it seems), janus (not maintained)
- Component/integration tests can be implemented using the same tools as unit tests; the difference is mostly conceptual. They can e.g. be separated into namespaces (`acceptance`, `integration`, `core`, ...). Midje has good support for wildcards.
- Open question: where do generative tests fit in the test pyramid?
Heroku
- Does magic behind the scenes, not very transparent, but includes `lein uberjar`
- A `-main` function, a `:main` key in `project.clj` and a `Procfile` are all needed
- Breaks the "only compile binaries once" principle
Own infrastructure
- Aim for deploying a single JAR
- Will probably include shell scripting, Ansible or something
- One way: create your own .clj file with configuration values. Read it in, merge with default options
- (My note: Component, leaven, duct are better options)
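The "own .clj file with configuration values" approach can be as small as this sketch (the file name, keys and defaults are made up for illustration):

```clojure
(require '[clojure.edn :as edn])

(def defaults
  {:port   8080
   :db-url "jdbc:postgresql://localhost/dev"})

;; Read an EDN/Clojure config file and merge it over the defaults,
;; so the file only needs to contain the overrides.
(defn load-config [path]
  (merge defaults
         (edn/read-string (slurp path))))

;; (load-config "config.clj")
```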
- Think about where in the test pyramid your tests fit
- You can get pretty far with standard Clojure testing tools
- Main features of Datomic:
- No update-in-place, only assertions and retractions (where retraction != removal)
- Immutable data, database as a value
- ACID
- Datalog query language
- Doesn't do storage by itself, instead uses Riak, RDBMS, DynamoDB ...
- Datom!
- Entity, Attribute, Value, Tx, Operation (successful or not)
- 1234, name, John, "at work", true
- Time: easy access to past and future states
- Indexes: EAVT, AEVT, AVET, VAET
- Value of DB passed to queries and functions
- Datalog:
- declarative like SQL, logical language, pattern matching
- :find, (:in,) :where
- most logic will be in :where
- order of statements usually doesn't matter - as a rule of thumb, provide specific datoms first
- Classic Datalog is the Entity API, there's also a new, higher-level Pull API
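A minimal query in the `:find` / `:in` / `:where` shape described above - a sketch with a made-up `:person/name` attribute, also showing the Java interop mentioned below:

```clojure
(require '[datomic.api :as d])

;; db is a database *value*, passed to the query explicitly.
;; Finds names starting with the given prefix.
(d/q '[:find ?name
       :in $ ?prefix
       :where
       [?e :person/name ?name]
       [(.startsWith ?name ?prefix)]]
     db
     "J")
```

Note how the specific datom clause comes first and the predicate last, matching the rule of thumb above.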
- Java methods can be used too, e.g. `.startsWith`
- Powerful DB functions in Clojure, use with `invoke`
- Filters can be used to run queries against different views of the database, e.g. `as-of` (DB value at a certain point in time), `since` (auditing)
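The filters are just functions over a database value - a sketch assuming a Datomic connection `conn`:

```clojure
(require '[datomic.api :as d])

(let [db (d/db conn)]
  ;; the database as it looked at a point in time
  (d/as-of db #inst "2015-01-01")
  ;; only what was added after a point in time - handy for auditing
  (d/since db #inst "2015-01-01"))
```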
Transactor
- Single instance (one standby)
- Handles writes
- Ensures ACID
Use cases for Datomic
- Most useful where going back in time and immutability are important capabilities
- Not suitable as replacement for e.g. ElasticSearch/Solr full text search, where query is done over lots of data
- Datomic has a really powerful query language - the fiddly SQL queries that I sometimes use, for example to generate reports, could probably be replaced by a few lines of Datalog
- So much can be done using validators, transactions, filters.. logic moves to the DB layer
- Wonder what the biggest win would be - the immutability or the query capabilities
- Web developing in Clojure is liberating - no dominant frameworks, many ways of doing things
- ring is de-facto
- similarities in other languages: Rack, WSGI
- called "ring" b/c of client -> request -> response -> client feedback loop
- Other functions handled by small libraries: clout, compojure, hiccup, enlive, friend ..
- Everything works well together, everyone uses plain Clojure data structures
- For more complex applications Component is nice - but start with it from the beginning, as it is very hard to retrofit
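The ring model in one sketch: a handler is just a function from a request map to a response map, and libraries like compojure and hiccup compose around it using plain data:

```clojure
(require '[compojure.core :refer [defroutes GET]]
         '[hiccup.core :refer [html]])

;; A plain ring handler: request map in, response map out.
(defn hello [request]
  {:status  200
   :headers {"Content-Type" "text/html"}
   :body    (html [:h1 "Hello from ring"])})

(defroutes app
  (GET "/" request (hello request)))
```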
- Rationale: lisp inside-out syntax not nice for reading example-based tests - left-to-right is nicer
- Useful add-ons: simple stubbing and mocking, auto-test feature ...
- `future-fact`: like JUnit's `@Ignore` annotation, for TODOs
- Lots of checkers that can be used on the RHS instead of values - check the Midje wiki
- `anything`/`irrelevant`: useful when the return value is not interesting because we are checking for side effects
Stubbing
- `provided`: does essentially the same thing as `with-redefs`
Mocking
- Testing for interactions - "called 5 times" scenario
- Also uses `provided`
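A sketch of both uses of `provided` - stubbing a collaborator and asserting call counts (the functions under test, `greet`, `fetch-user`, `notify-all` and `send-notification`, are made up):

```clojure
(require '[midje.sweet :refer :all])

;; Stubbing: fetch-user is replaced for the duration of the fact.
(fact "greeting uses the user's name"
  (greet 42) => "Hello, Ada!"
  (provided
    (fetch-user 42) => {:name "Ada"}))

;; Mocking/interaction testing: the "called 5 times" scenario.
(fact "notify-all notifies every user"
  (notify-all [1 2 3 4 5]) => anything
  (provided
    (send-notification anything) => nil :times 5))
```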
Integration tests - how to separate them
- Can tag tests with e.g. `:slow` metadata and run with `lein midje :filter -slow`
- Can run into cases with nested facts
- Separate namespaces another option
- Testing is a good organisational inroad for Clojure and Midje is a great testing tool
- Second talk with failed live coding already. Albrecht's approach of having recorded coding paid off!
Slides (generated with test.check generators!)
- Converting example-based tests: think about what properties must hold
- The point is to use only "generators" and "properties" for your test
- Running loads of iterations is great for finding edge cases
- Re-use seed to get the same input sequence
Real world examples - www.stylefruits.de
- Complicated routing - /hosen/lee/farbe-hellblau/...
- Simplify by mapping paths to descriptors and back. Property: `(comp descriptor->path path->descriptor)` is the identity function
- Generator: all valid descriptors
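The round-trip property could look like this with test.check - a sketch where the generator and the `descriptor->path`/`path->descriptor` functions are stand-ins for stylefruits' real ones:

```clojure
(require '[clojure.test.check :as tc]
         '[clojure.test.check.generators :as gen]
         '[clojure.test.check.properties :as prop])

;; Hypothetical generator producing only valid descriptors.
(def gen-descriptor
  (gen/hash-map :category (gen/elements [:hosen :kleider])
                :brand    gen/string-alphanumeric))

;; Property: converting to a path and back yields the original descriptor.
(def roundtrip
  (prop/for-all [d gen-descriptor]
    (= d (path->descriptor (descriptor->path d)))))

;; (tc/quick-check 100 roundtrip)
```

Running a few hundred iterations of a property like this is how they "found lots of bugs".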
- Found lots of bugs
Stateful things
- Paper: "Testing Telecoms Software with Quviq QuickCheck"
- jstepien/states
- michalmarczyk/ctries.clj
- Integration tests: could have generators that seed databases and create events as well
Resources
- QuickCheck paper (Claessen, Hughes)
- "Testing the hard stuff and staying sane", John Hughes
- "Generative testing with clojure.test.check", Philip Potter
- Use `sample` for testing complicated generators
- Can use exploratory/generative testing to find edge cases, then implement unit tests for those - a very valid use case for generative testing!
- 2012: painful experience with AngularJS building a soulseek clone
- State handling main pain point
- Since 2014: success with ClojureScript and Om
- Besides state management lots of other goodies: hotswap (figwheel), time travelling ...
- Compiler is not a bad thing
- core.async - "if you do not understand yet, you are probably thinking too hard - it's simple"
- People have done quite cool samples for others to learn from, e.g. http://shaunlebron.com/t3tr0s-slides
- bhauman/devcards
- Om and Chestnut (lein) vs Reagent and Tenzing (boot)
Missed the lightning talks unfortunately.
Great summary of the day! You can find my slides here (pdf)