You often need to flatten a map into a tuple or a case class object. This describes how to do it generically. When using a NoSQL database, for example, you often need to convert maps to tuples/case classes. That's because NoSQL databases have implied schemas, and we have to help the statically typed code by un-implying the schema.

#The Problem: The Need to Flatten a Map

When dealing with NoSQL databases or even traditional RDBMSs, you often need to flatten a map into a tuple or create a case class instance. The question is, how do you do this?

While a map can be flattened into a list of alternating keys and values fairly easily, this does not create a tuple of values:

scala> m
res39: scala.collection.immutable.Map[String,Any] = Map(blah -> 10, hah -> nah)

scala> m.flatMap { case (k,v)=>k::v::Nil}
res42: scala.collection.immutable.Iterable[Any] = List(blah, 10, hah, nah)

The other issue is that the Any values in a Map[String, Any] need to be converted to the right types, since a tuple or case class requires its types to be specified at compile time.

A tuple is a compile-time construct with static type checking. Hence, we need some way to specify, at compile time, how the tuple should be constructed, while allowing you to use this "template" to convert maps.

Also, if it is not obvious at this point, we are considering maps with (String, Any) pairs in them. This post is not about parsing strings into JVM values such as an Int.

In the end, solving this flattening problem is quite straightforward. We need to define an object that contains the property (key) to obtain from the map and a general method to "get" the value. Then we need a template of the target data structure to set the "got" value into. The object that holds the property and the "get" method can be considered a "strategy design pattern" or a "generalized getter" object. These are common in the Java world. Scala provides some syntax and abstraction support that makes their usage more transparent--which is a good thing.

#Creating Metadata

Since you want a template for creating the output tuple from the map, we need some structures that help fill out that template. We'll make the template a tuple of these structures.

First let's think about converting a value of type Any to a statically known type without using casting. The general way to do this is to define a method, say a Converter, that handles the conversion. Thinking ahead, since we want nice error messages to be printed, we'll also need the name of the map key we want to convert. You do not strictly need this extra metadata, but we will include it to enhance usability.

Let's define our converter to take a value and convert it to the right output type:

trait Converter[A] extends (Any => Either[String, A])

Then we need to create a set of converters. This approach uses the typeclass concept.

object Converter {

  def apply[A](transformer: Any => Either[String, A]) = new Converter[A] {
    def apply(value: Any) = transformer(value)
  }

  implicit def intConverter: Converter[Int] = Converter {
    value =>
      value match {
        case v: Int => Right(v)
        case x => Left("Value " + x + " is not an Int")
      }
  }
}
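As a quick sanity check, here is a minimal sketch of using this converter, assuming the definitions above are in scope (the implicits resolve through the Converter companion object):

import Converter._

val c = implicitly[Converter[Int]]
c(10)    // Right(10)
c("10")  // Left("Value 10 is not an Int")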

So by creating converters for the known types and then importing the implicits into scope, we can always ensure we have some converters handy. Of course, we need to add our name metadata to the above to make the Either messages clearer, and add a check for null values, which in our case will return an error in the left of the Either.

Let's put this all together:

trait Converter[A] extends ((Any, String) => Either[String, A])

object Converter {

  def apply[A](transformer: (Any, String) => Either[String, A]) = new Converter[A] {
    def apply(value: Any, name: String) = transformer(value, name)
  }

  def checkNull[A](transformer: (Any, String) => Either[String, A]): Converter[A] = Converter {
    (value, name) =>
      if (value != null) transformer(value, name)
      else Left("Value for " + name + " is null")
  }

  implicit def intConverter: Converter[Int] = Converter.checkNull {
    (value, name) =>
      value match {
        case v: Int => Right(v)
        case x => Left("Value " + x + " is not an Int for " + name)
      }
  }

  implicit def stringConverter: Converter[String] = Converter.checkNull {
    (value, name) =>
      value match {
        case v: String => Right(v)
        case _ => Left("Value " + value + " is not a String for " + name)
      }
  }

  implicit def longConverter: Converter[Long] = Converter.checkNull {
    (value, name) =>
      value match {
        case v: Long => Right(v)
        case _ => Left("Value " + value + " is not a Long for " + name)
      }
  }

  implicit def floatConverter: Converter[Float] = Converter.checkNull {
    (value, name) =>
      value match {
        case v: Float => Right(v)
        case _ => Left("Value " + value + " is not a Float for " + name)
      }
  }

  implicit def shortConverter: Converter[Short] = Converter.checkNull {
    (value, name) =>
      value match {
        case v: Short => Right(v)
        case _ => Left("Value " + value + " is not a Short for " + name)
      }
  }

  implicit def byteConverter: Converter[Byte] = Converter.checkNull {
    (value, name) =>
      value match {
        case v: Byte => Right(v)
        case _ => Left("Value " + value + " is not a Byte for " + name)
      }
  }

  implicit def booleanConverter: Converter[Boolean] = Converter.checkNull {
    (value, name) =>
      value match {
        case v: Boolean => Right(v)
        case _ => Left("Value " + value + " is not a Boolean for " + name)
      }
  }
}

Now we can ensure the converters are available by importing Converter._.
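To illustrate, here is a minimal sketch of exercising the named converters, assuming the definitions above are in scope:

import Converter._

val intC = implicitly[Converter[Int]]
intC(10, "age")     // Right(10)
intC("ten", "age")  // Left("Value ten is not an Int for age")
intC(null, "age")   // Left("Value for age is null")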

#Creating a Template: Implied Schema

We mentioned the need for a template to capture the compile-time aspect. And we need a way to apply that template to a map in order to run the conversion.

First, let's bundle the column name (the implied-schema metadata) with the converter:

class SimpleColumn[A](val name: String)(implicit val converter: Converter[A])

and let's consider a template to be a tuple of these column specifications. This is the "implied schema" that Martin Fowler talks about in his NoSQL talk. For example, the tuple (nameColumn, ageColumn) represents a template where nameColumn is new SimpleColumn[String]("name").
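Concretely, a two-column template might look like this (a sketch, assuming the Converter implicits above are in scope):

import Converter._

val nameColumn = new SimpleColumn[String]("name")
val ageColumn = new SimpleColumn[Int]("age")

// The implied schema: the tuple's element types carry the compile-time information.
val personTemplate = (nameColumn, ageColumn)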

But now we need a function to convert the map, given a template. Let's assume we want a 1-element tuple to start with. This corresponds to having just one value in the map that we care about and want to extract into a Tuple1.

object TemplateFlattener {

  def convert1[T1V, T1 <: SimpleColumn[T1V]](template: Tuple1[T1],
    values: Map[String, Any]): Tuple1[T1V] = {
    val rawValue = values(template._1.name)
    template._1.converter(rawValue, template._1.name) match {
      case Left(_) => null // placeholder error handling; we revisit this below
      case Right(v) => Tuple1(v)
    }
  }

}

You can create a couple of variations of this function that are a bit neater, but hopefully it conveys the requirement adequately.

Here's the rub: in order to get the arity to work out, we need to specify these functions for all possible tuple sizes. There are some tricks using the Product class methods, such as productArity and productIterator, that we could use, but let's assume we just write some boilerplate methods that handle the tuple sizes we care about in our applications; the 2-arity variant sketched below shows how the pattern repeats. We also need to think hard about that nasty null in the return value.
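A minimal sketch of the 2-arity variant (convert2 is a hypothetical name, not part of the code above), using concrete SimpleColumn types and the same placeholder error handling:

  // Inside TemplateFlattener: the same pattern, repeated for arity 2.
  def convert2[T1V, T2V](template: (SimpleColumn[T1V], SimpleColumn[T2V]),
    values: Map[String, Any]): (T1V, T2V) = {
    val r1 = template._1.converter(values(template._1.name), template._1.name)
    val r2 = template._2.converter(values(template._2.name), template._2.name)
    (r1, r2) match {
      case (Right(v1), Right(v2)) => (v1, v2)
      case _ => null // same placeholder as convert1
    }
  }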

Let's test it to make sure it works first:

scala> val nameColumn = new SimpleColumn[String]("name")
scala> val m = Map("name" -> "Joe Smith")

scala> TemplateFlattener.convert1[String, SimpleColumn[String]](Tuple1(nameColumn), m)
res8: Tuple1[String] = (Joe Smith,)

So that looks good, but it is very complicated. How do you scale it and make it easy to use?

It's pretty clear we will need to use some implicits so that a template flattener is automatically selected for us. And that flattener selection will need to be anchored to some type of object. We obviously cannot attach it to the column object because we need something that spans multiple columns.

Our "missing value" strategy also needs to be thought about. If a value is missing in the map, we could just use a null in the value's place, use an Option or signal some type of error indicating the tuple could not be created. That's a fairly wide range of choices. If a value is missing but the tuple should be constructed anyway, then the value should be interpreted as an optional value to begin with in the tuple. So we need to disentangle the concept of missing and the concept of a null value since a map can contain a null as the value of a key. Let's assume that if the value is missing, we should return an error condition and if the value is null, if we indicated it so, that the null is either a null in the returned tuple or a None. We'll need a way to flag if a null or None should be returned. In other words, we need "options."

#Playing with Tuples

In Scala, tuples have some limits; for example, only tuple sizes up to 22 are allowed. HLists were created to allow type-level lists with no size limit, and they fill much the same role as tuples. There are some nuances to and extra flexibility in using HLists, but tuples are good enough for what we need right now.

In order to get the compiler to automatically select a method like convert1, we have to give it some type information, e.g. the target tuple data structure with its types specified. We want to make this easy to use, so instead of having the programmer specify this tuple yet again, let's bundle together the tuple that represents our implied schema and an object whose type represents the conversion from the tuple schema to the target tuple data structure.

So a tuple is an okay way to specify the implied schema, e.g. (nameColumn, ageColumn), but it needs to be applied to the underlying data structure, in our case a map, and it needs to produce an output value tuple using the missing/null logic. A sketch of this bundling follows.
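One way to express the bundling is a typeclass whose type parameters tie the schema tuple to the output tuple; here is a minimal sketch (Flattener is a hypothetical name, with error handling elided):

// Hypothetical typeclass: the schema tuple type determines the output tuple type,
// letting the compiler pick the right arity for us.
trait Flattener[S, Out] {
  def apply(schema: S, values: Map[String, Any]): Out
}

object Flattener {
  implicit def flattener2[A, B]: Flattener[(SimpleColumn[A], SimpleColumn[B]), (A, B)] =
    new Flattener[(SimpleColumn[A], SimpleColumn[B]), (A, B)] {
      def apply(schema: (SimpleColumn[A], SimpleColumn[B]), values: Map[String, Any]) = {
        val a = schema._1.converter(values(schema._1.name), schema._1.name)
        val b = schema._2.converter(values(schema._2.name), schema._2.name)
        (a.right.get, b.right.get) // error handling elided in this sketch
      }
    }

  def flatten[S, Out](schema: S, values: Map[String, Any])(implicit f: Flattener[S, Out]): Out =
    f(schema, values)
}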

#Designing for Ease of Use

What is the right interface to make this easy to use? Let's design the interface by writing a few Scala-like statements:

val input: Stream[Map[String, Any]] = resultIterator.toStream

val transformedInput = input.map(MapRow(_))

By wrapping each item in the stream with a "row" concept, we can enforce a type that we control and can use in type inference, so we do not have to adapt our schema code to a wide variety of underlying data structures, just the Row concept. A sketch of that wrapper follows.
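A minimal sketch of the Row wrapper (Row and MapRow here are hypothetical names matching the usage above):

// Hypothetical Row abstraction: schema code depends only on Row,
// not on the underlying data structure.
trait Row {
  def get(name: String): Option[Any]
}

case class MapRow(underlying: Map[String, Any]) extends Row {
  def get(name: String) = underlying.get(name)
}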

object MyImpliedSchema extends MapImpliedSchema { 
  def nameColumn = column[String]("name")
  def ageColumn = column[Int]("age")

  def * = (nameColumn, ageColumn)
}
...
val testData = Map("name" -> "blah", "age" -> 20)
val testDataTuple: (String, Int) = MyImpliedSchema(testData)

val myTransformedInput: Stream[(String, Int)] = transformedInput.map(MyImpliedSchema)
...
val nameColumn2 = new SimpleColumn[String]("name")
val ageColumn2 = new SimpleColumn[Int]("age")
val myTransformedInput2: Stream[(String, Int)] = transformedInput.map( (nameColumn2, ageColumn2))
val myTransformedInput3: Stream[(String, Option[Int])] = transformedInput.map ((nameColumn2, ageColumn2.?))

Okay, that looks a lot like Slick, and for good reason. Define the schema in a nice object that bundles together the columns and defines a projection operator. We also want to be able to define the implied schema on the fly, as you can see at the bottom of the code. The use of ? on a column transforms it into an optional column. We'll also want to map an input to a case class, but let's work with tuples for the moment. A sketch of the ? combinator follows.
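Here is a minimal sketch of how ? could work (ColumnSyntax is a hypothetical name; this assumes the SimpleColumn and Converter definitions above):

object ColumnSyntax {
  // Hypothetical ? combinator: derives an optional column whose converter
  // yields None for a null value instead of an error.
  implicit class ColumnOps[A](col: SimpleColumn[A]) {
    def ? : SimpleColumn[Option[A]] = {
      implicit val optConverter: Converter[Option[A]] = Converter {
        (value, name) =>
          if (value == null) Right(None)
          else col.converter(value, name).right.map(Some(_))
      }
      new SimpleColumn[Option[A]](col.name)
    }
  }
}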

The use of streams is convenient because each access to an element requires a potentially costly conversion to the output tuple. We can convert a stream to a single output value, a list, etc. using standard Scala, and you can lift additional functions into the stream in order to perform domain-specific processing.
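For example, assuming the hypothetical pieces above are implemented, a lazy pipeline with domain-specific processing lifted in might look like:

// Lazily converts each raw map, then applies a domain-specific filter.
val people: Stream[(String, Int)] = transformedInput.map(MyImpliedSchema)
val adults: List[(String, Int)] = people.filter(_._2 >= 18).toList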

#Bundling the Parts Together

We need to feed our "data" to the convert functions, which take an implied schema and output the desired client-facing data tuple. However, we do not want to tie our column to a specific data structure. Hence, we need another strategy-design-pattern "getter." In other words, our "converter" converts an Any to a target type in a type-safe way, but we still need logic to get the value from the underlying data structure. We could bundle this "getter" logic in a separate object, but let's just make it a parameter that any column definition can pick up, as long as you, the programmer, bring the right "getter" logic into scope at instantiation time. A sketch of such a getter follows.
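A minimal sketch of the getter strategy (Getter is a hypothetical name; only a Map-backed instance is shown):

// Hypothetical Getter strategy: pulls a raw value out of a particular
// underlying data structure, kept separate from the type conversion.
trait Getter[-S] {
  def get(source: S, name: String): Either[String, Any]
}

object Getter {
  implicit val mapGetter: Getter[Map[String, Any]] = new Getter[Map[String, Any]] {
    def get(source: Map[String, Any], name: String) =
      source.get(name).toRight("No value present for " + name)
  }
}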

#Summary Thoughts

It's pretty clear the above can be abstracted further; however, at some point it may make sense to simply switch to a supported driver infrastructure similar to Slick's. It is also clear that it is fairly straightforward to put together a few abstractions that convert an incoming stream into an API-friendly presentation that is easier to use from a domain-specific program.
