Skip to content

Instantly share code, notes, and snippets.

@whostolebenfrog
Created December 4, 2014 16:51
Show Gist options
  • Save whostolebenfrog/17ac0ce63073668b719c to your computer and use it in GitHub Desktop.
Save whostolebenfrog/17ac0ce63073668b719c to your computer and use it in GitHub Desktop.
Clojure exchange 2014, ephemeral first data structures
Wished he had been able to go with Transient-First data structures, but they don't feel like that from a user point of view.
The whole point of transient is to take data structures and to make them go fast.
Ephemeral is take persistent structures, do interesting things to them.
Maybe persistent-first data structures!
Basic concepts;
- transients
--- mutable
--- enforce and demand thread isolation
--- can be frozen
Ephemeral another term for mutable
Implementation strategies
- trees with path copying for efficient PDS (persistent data structure) updates
- A notion of ownership for subtress to support transients
Transients OK to pass between threads but must be in such a way that only one thread acts upon them at any one time. They don't give you thread safety.
Transients goal in life is to become persistent :-)
History of key datastructures;
- Bagwell 2000, ideal hash trees
--- mutable hash array mapped tries
Hickey 2007, Clojure
- persistent hash maps based on HAMTs with path copying
- also vectors based on a modification of the idea behind HAMTs
Hickey 2009, Clojure
- transient vectors
- transient maps added later in 2009 by C.Grand
- c.c/transient originally called mutable, persistent!, immutable!
Hash tables even without a large number of collisions will occasionally need to be resized. Hash array mapped tries don't have this problem. They allocate as much memory as strictly necessary to keep the information.
People... 2011
- cache-aware lock-free concurrent hash tries
- concurrent tries with efficient non-blocking snapshots
- ctries - lock-free mutable HAMTs with O(1) snapshots
- the snapshots are first-class Ctries, independent of originals
A closer look at transients;
- key notion, subtree ownership
- distinguished central location stores and edit marker, an atomic reference
- and so do individual nodes
- if the reference owns the subtree then it can update in place, otherwise it allocate new nodes(?)
- new paths get new edit markers
- at some point we can freeze the transient, at which point we reset all markers to nil
Devised to make certain operations faster in clojure
persistent-first approach in a mutable setting
- NavigableMaps
- essense of NavigableMaps: nearest neighbour queries and subsetting
--- subsetting: narrow down a map to a range of keys, give ones between our bounds and in such a way that the original still retains all the keys
- JDK impls return subcollections as views
--- modifications are reflected in the original view and vice versa
--- cannot add items to views outside the original bounds
--- inappropriate when the subcollection is required for general use e.g. passed on for arbitary use
--- could be OK for read but not really suitable as a first class map
Persistent-first approach cont'd
- BST (including red/black) support join split and range
** didn't follow section on joining trees **
- split: produce subcollections including keys > and < than a given key
- range: extract a subcollection bounded by the given keys (subset)
- all implementations are in O(log n) for red/black and AVL
cont'd
- in a mutable setting those opperations modify the original tree
- range discards keys outside bounds in O(log n) time
- in an immutable setting new trees are returned
- originals are unharmed, submaps are first class
- better NavigableMaps than Java in this respect
- but incompatible the mutable feel
... unless we add transients! ...
A transient can be stored in a TreeMap workalike wrapper
- an independent snapshot can be produced by invalidating the transient
- the original wrapper object is free to install a new edit marker immediately
- the snapshot can be subsette to produce first-class subcollections etc.
- these can be safely returned for arbitary use
The cost: must store edit markers, might be a little slower for updates (though not for lookups)
- no benefit for SortedMap but for NavigableMap perhaps woth a look
Ctries: overview
- map structure with support for concurrent updates
- comparable to regular HAMTs in performance of HAMT operations
- snapshotable in O(1) time with little degradation to performance of other operations
- snapshots are completely independent first-class Ctries
- now implemented in Clojure, ctries.clj
Internals:
- similar to Clojure's PHMs though with a different range of node types
- key structural difference, indirection odes ensure linearisability at a slight performance cost
- compare and swap based in-place updates to the tree structure
- generation marker used to determine subtree ownership
--- generation markers are at root nodes
- no current hardware supports CAS for this
Cool demo using @ to create persistent snapshots
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment