We ended up with what is in kvscala.scala. This
post will walk through each refactoring state we took starting
from the following interface that provides an implicit state
interfact:
trait KVCache[K, V] {
def store(k: K, v: V): Unit
def lookup(k: K): Option[V]
def invalidate(k: K): Unit
}
The expectation of this interface is that we modify the internal
state of the implementation instance of KVCache[K, V]
(we use
InMemoryCache
in our examples (see kvcache.scala source file).
When the cache value is accessed by multiple threads we
require internal coordination via error prone locks to provide
correctness. This incurs the cost of duplicated lock logic in both
mutating operation implementations in every implementation
of KVCache[K, V]
. We also make testing our cache semantics
hard to assert as tests rely on knowing the order of the operations
to determine resulting state. When the coordination is done at the
KVCache
operation implementation level we cannot even specify
ordered sequences of computations in chunks against the key value
cache.
We are lazy and we want everything for free. And we think we can
do better.
Implicit mutable state is not ideal because we end up sharing
it in our usage across threads. Either we end up with contention
due to low level coordination/locks resulting in unpredictable
run-time characteristics or incorrect views of the underlying
data.
The client (consumer of this API) might do something like this:
// Types for the example. And yes, I might be obsessed with season one right now.
case class Character(firstName: String, lastName: String, nickName: Option[String])
sealed trait CellBlock
case object TheSuburbs extends CellBlock
case object TheGhetto extends CellBlock
case object SpanishHarlem extends CellBlock
case object GoldenGirls extends CellBlock
case object Others extends CellBlock
case class Cell(left: Character, right: Character)
// Setup values to make the usage cleaner
val piper = Character("Piper", "Chapman", Some("Pipes"))
val alex = Character("Alex", "Vause", None)
val missclaudette = Character("Claudette", "Pelage", Some("Miss Claudette"))
val leanne = Character("Leanne", "Taylor", None)
// Actualy usage of the API
val c: KVCache[CellBlock, List[Cell]] = InMemoryCache()
c.store(TheGhetto, List(Cell(missclaudette, piper)))
c.store(TheSuburbs, List(Cell(leanne, alex))
Note the usability of the API is not a problem in the above
implicit state model. We aren’t juggling part of the result from
step(N) and threading it into the input of step(N+1). When we
were writing our minimal non-OITNB example in
kvcache.scala we made numerous mistakes
in the second explicit state API design we came up with next.
Demonstrating its error-prone-ness.
The study group translated the above implicit interface and
exposed explicit state in the result types and we ended this
phase of our study session with code that could support the
following:
trait KVCache[K, V] {
def store(k: K, v: V): KVCache[K, V]
// Read-only so we can reuse +KVCache[K, V]+ we used to lookup for next query.
// Therefore, we don't need to return the new cache as it stays the same
def lookup(k: K): Option[V]
def invalidate(k: K): KVCache[K, V]
}
// This is referentially transparent because we could substitute this code:
object Main extends App {
val c: KVCache[String, String] = InMemoryCache()
c.store("index.body.html", "<p>Welcome to Litchfield Penitentiary.</p>")
c.store("about.body.html", """
<p>This penitentiary houses a minimum- and a maximum-security prison located in upstate New York.</p>
<p>The minimum-security prison is comprised of five cell blocks, a Security Housing Unit (SHU), and a psychiatric ward.</p>
<p>The maximum-security prison (MAX) is for inmates with more serious or violent offenses.</p>
""")
def main1 = {
val c2 = c.invalidate("about.body.html")
val v1 = c2.lookup("contact.body.html")
val c3 = c2.invalidate("index.body.html")
val v2 = c3.lookup("contact.body.html")
(v1, v2)
}
def main2 = {
def g(a: String, b: String) = {
val c2 = c.invalidate(a)
c2.lookup(b)
}
val v1 = g("about.body.html", "contact.body.html")
val v2 = g("index.body.html", "contact.body.html")
(v1, v2)
}
println(main1 == main2) // should print true
}
This interface gave us referential transparency (see Main
above), however,
the consuming code then became error prone and could not easily be used concurrently
without a lot of coordination mechanics above the KVCache[K, V]
interface in the
consumer/client code of the interface, which would also be highly error prone
(perhaps worse?) and/or would cause throughput bottlenecks on store
and
invalidate
operations.
As reference this is what code that consumes this API would
look like (and as an anecdote, we kept making logical errors
to get it to run with the expected results):
// I bet you can't tell what TV series I just started watching.
val oitnb1 = InMemory[String, String]()
val oitnb2 = oitnb1.store("larry", "piper")
val oitnb3 = oitnb2.store("alex", "piper")
val oitnb4 = oitnb3.invalidate("larry")
val oitnb5 = oitnb4.invalidate("alex")
// what next??
// I have just finished season one, so no spoilers, this is basically where it is at.
So what did we do in the third step was try to keep the first set of
arguments the same as the previous APIs and returned a function that
accepts the state we need to operate on and returns a tuple with the
desired output result and the new state. We then lifted that function
into a simple type so we could run
the function given the state
to act upon.
Let us look at the type signature changes so far:
// original definition
//def store(k: K, v: V): Unit
// explicit state iteration which assumes the instance
// this works on contains internal immutable state we can
// access to append the new key-value pair to return
// the new cache value back to consumer
//def store(k: K, v: V): KVCache[K, V]
// next step: take a leap of faith and eliminate the internal state of
// the +KVCache[K, V]+ instance
//def store(k: K, v: V): Map[K, V] => (Unit, Map[K, V])
// by creating a data constructor around this return type signature we have
case class State[S, A](run: S => (S, A))
def store(k: K, v: V): State[Map[K, V], Unit]
So now we have something we can keep injecting the underlying state context
into and receiving both the new state and the result of the specific operation. By
itself this is still not giving us the consuming API we want to use, so what next?
In Scala we can use the for-comprehension to describe a sequence of effectful (in
our case we are potentially modifying the internal representation of the cache when
we execute our operations defined for our domain (store
, lookup
, invalidate
).
Something like the following might be a way to express a sequence of KVCache
operations without the error prone boilerplate in the second step usage example:
case class Actor(name: String, imdbId: String)
case class Character(firstName: String, lastName: String, nickName: Option[String])
val lauraPrepon = Actor("Laura Prepon", "nm0696059")
val taylorSchilling = Actor("Taylor Schilling", "nm2279940")
val laverneCox = Actor("Laverne Cox", "nm1209545")
val uzoAduba = Actor("Uzo Aduba", "nm2499064")
val danielleBrooks = Actor("Danielle Brooks", "nm5335029")
val madelineBrewer = Actor("Madeline Brewer", "nm5483400")
def setupCast(c: KVCache[Character, Actor]) = for {
_ <- c.store(Character("Alex", "Vause", None), lauraPrepon)
_ <- c.store(Character("Piper", "Chapman", Some("Pipes"), taylorSchilling)
_ <- c.store(Character("Sophia", "Burset", None), laverneCox)
_ <- c.store(Character("Suzanne", "Warren", Some("Crazy Eyes"), uzoAduba)
_ <- c.store(Character("Tasha", "Jefferson", Some("Tastee"), danielleBrooks)
_ <- c.store(Character("Trisha", "Miller", None), madelineBrewer)
}
def killCharacters(c: KVCache[Actor, Character]) = for {
_ <- c.invalidate(madelineBrewer)
}
def displayCredits(c: KVCache[Actor, Character]) = sys.error("todo")
def showCreditsBeforeAndAfter(c: KVCache[Actor, Character]) = for {
_ <- setupCast(c)
_ <- displayCredits(c)
_ <- killCharacters(c)
_ <- displayCredents(c)
} yield ()
val c: KVCache[Actor, Character] = InMemoryCache()
showCreditsBeforeAndAfter(c)
To be able to use values of type KVCache[K, V]
in a for-comprehension we need to
define a map
and a flatMap
function with the following types:
case class State[S, A](run: S => (A, S)) {
// fmap :: m a -> (a -> b) -> m b
// part of the functor interface
def map[B](f: A => B): State[S, B] = State(
s0 => {
val (a, s1) = run(s0)
(f(a), s1) // (s1, f(a))
}
)
// bind :: m a -> (a -> m b) -> m b
// part of the monad interface
// State[S, A] => (A => State[S, B]) => State[S, B]
def flatMap[B](f: A => State[S, B]): State[S, B] = State(
(s0: S /*State */) => {
val (a, s1) = run(s0)
f(a).run(s1)
}
)
}
It turns out the implementation of flatMap
basically contains the boilerplate
code we needed to remove from the client code of our previous round’s explicit
state design.
Below we can see how we might use operations that return a State[S, A]
in a
sequential computation:
val season1Map: Map[String, String] = Map(
"privileged" -> "Piper"
, "brooding" -> "Alex"
, "scary" -> "Red"
, "funny" -> "Tastee"
, "stylish" -> "Sophia"
, "crazy" -> "Suzanne"
, "pregnant" -> "Dayanara"
)
val season2Map: Map[String, String] = Map(
"scary" -> "Vee"
, "privileged" -> "Soso"
, "funny" -> "Nicky"
) // I am only half way through
val season3Map: Map[String, String] = Map.empty[String, String] // not watched yet
val happyLookup: State[Map[String, String], String] = for {
v <- c.lookup("happy")
} yield v.getOrElse("default")
println(happyLookup.run(season1Map)) // (Tastee, ...)
println(happyLookup.run(season2Map)) // (default, ...)
println(happyLookup.run(season3Map)) // (default, Map())
def set(c: KVCache[String, String])(k: String, v: String) =
for {
_ <- c.store(k, v)
} yieldc
def expire(c: KVCache[String, String])(k: String, v: String) =
for {
v <- c.lookup(k)
_ <- c.invalidate(k)
} yield (v, c)
val setter = set(c)
println(setter("brooding", "Piper").run(season2Map)) // (..., ...)
val expirer = expire(c)
println(expirer("privileged").run(season1Map)) // ((Piper, ...), ...)
We are basically writing small programs that we can run later given some initial state.
We can then reuse these small programs using different initial state (note here our initial
state changes from season to season).
The for-comprehension syntax is basically syntactic sugar and it desugars like the following:
def expire(c: KVCache[String, String])(k: String, v: String) =
for {
v <- c.lookup(k)
_ <- c.invalidate(k)
} yield (v, c)
def expireDesugared(c: KVCache[String, String])(k: String, v: String) =
c.lookup(k) flatMap {
(ov: Option[String]) => c.invalidate(k) flatMap {
(_: Unit) => State(s => ((), s)).map(_ => (ov, c))
}
}
Another observation is that by using map2
, sequence
, both
, mapM_
, etc
we can combine and with map
we can transform the underlying encapsulated result
value without lots of boilerplate code to unwind the value from the container
and wrap it up again if/when we need to do something with the wrapped result.
We didn’t have time this session to implement all the above operations on
State[S, A]
, but hopefully we can see how the following type signatures
would be useful:
// inside case class State[S, A](run: S => (A, S)) {
// full type: State[S, A] => List[State[S, A]] => State[S, List[A]]
// Use when we want to transform a List of states into a State where
// result type is a List.
// For example, I could imagine needing to go from a `List[State[S, Character]]`
// to a `State[S, List[Character]]` in some situations where we can use the
// new value inside a for-comprehension.
def sequence[A](fs: List[State[S, A]]): State[S, List[A]] = sys.error("TODO")
// full type: State[S, A] => State[S, B] => ((A, B) => C) => State[S, C]
// I could see a use for this when parsing disjoint CSV files where we
// pull data from one and data from the matching line in another and
// construct a value representing both peices of data. e.g. actor names
// and bios.
def map2[B,C](sb: State[S, B])(f: (A, B) => C): State[S, C] = sys.error("TODO")
// full type: State[S, A] => State[S, B] => State[S, (A, B)]
def both[A,B](rb: State[S, B]): State[S, (A, B)] = sys.error("TODO")
//}
// inside the companion State object since instance data isn't needed.
// full type: (A => State[S, B]) => List[A] => State[S, Unit]
// Use when we want to do some *action* on the underlying values.
def mapM_(f: A => State[S, B])(xs: List[A]): State[S, Unit] = sys.error("TODO")
We didn’t get to tackle the problem of concurrent access to KVCache[S, A]
directly
yet but by walking through this thought exercise with the Scala compiler we get a
better sense that by changing the API to have properties that allow us to reason
about our code more equationally we can build on top of this more solid foundation
to more adquately express and solve the problem at hand.
Can you imagine some combinators we could define to even combine these already quite
powerful operations together?
As an aside I noticed the parallel to mealy machine model where I speced out the
interface for it in mealy.scala. After we derive implementations for
sequence
and a few other operators on State[S, A]
we might look at
fitting State
into the Mealy
interface which appears a little more
generic but supports the basic semantics.