#scala, Map[String, Any] and scalaz Validation

#The Problem

I seem to encounter a lot of Map[String, Any] in my programming, probably because I use graph databases a lot, and they store key-value pairs in their nodes and relationships (think neo4j).

Because of this, I do a lot of map-like processing. Being able to handle these map structures fluently during data import or just general processing is very important.

The classic problem I ran into was how to use the Map object more fluently and easily in my data import or query-like processing. I usually have a UI with my application and the UI needs to be able to handle almost any data structure, so it is usually set up to be fairly robust to not knowing the exact types of values in the map, or to deriving those from the data itself. However, for data import processing as well as querying, I typically do need to know and count on a few well-known types for values that are guaranteed to be in my objects, like a name (a String) or some other property, like length (an Int).

But a lot of what I wound up doing initially made it hard. You can always do a match on the return value from a Map.get, but that usually is not composable. For example,

yourMap.get("key2") match {
  case Some(x: String) => ...do something only when key2's value is a string...
  case _ => ...signal an error or do something else...
}

Of course, you have Map.getOrElse, but on a Map[String, Any] that returns a value of type Any. Sometimes the Any type does indeed become the right type for the next operation you need to perform, but often I need a typed value back. I do not want to use asInstanceOf[] in my code, because that just means I have dynamic type casts in my application. Even if I centralized that operation, I can do better and have idioms that are actually easier to use and that support the error reporting approach in my application. You also have other operations like Map.map and Map.fold or Option.fold to play with. I also want an extensible mechanism, in case I need to do more than casts, and I don't want "map/flatMap" calls everywhere.

The kicker is that when I started using the techniques described below, I reduced code size and complexity (by a lot) and found errors that I did not know existed as well as ones that I thought existed. For me, that was a huge win.

#But Can't We Just Do This with TypeTag (new scala reflection)

You can and you can't. We want to have something type safe that does not throw exceptions and does not use asInstanceOf. Let's use TypeTag in a quick and dirty way at the REPL:

scala> import reflect.runtime.universe._
import reflect.runtime.universe._

scala> val x: Map[String, Any] = Map("blah"->"hah", "nah" -> 20)
x: Map[String,Any] = Map(blah -> hah, nah -> 20)

scala> x.get("blah")
res0: Option[Any] = Some(hah)

scala>   implicit class RichMapLike[V](map: Map[String, V]) {
     |     def as[B](property: String)(implicit tag: TypeTag[B]): Option[B] = {
     |       map.get(property) match { 
     |         case Some(v: B) => Some(v)
     |         case _ => None
     |       }
     |     }
     |   }
<console>:14: warning: abstract type pattern B is unchecked since it is eliminated by erasure
               case Some(v: B) => Some(v)
                            ^
defined class RichMapLike

scala> x.as[String]("blah")
res2: Option[String] = Some(hah)

scala> x.as[Int]("blah")
res3: Option[Int] = Some(hah)

scala> x.as[Int]("blah").get
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
        at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106)
        at .<init>(<console>:15)


So we get exceptions thrown! We don't want that. We want error messages and functional control. Also, notice that because of type erasure we get back what we thought we wanted, but no actual check was performed until the get occurred. So if everything is perfect, it works. When it's not, we don't know until later and we get undesirable flow control.

This is the same thing that happens if you just use an enhanced get definition e.g. get[T <: Any](property: String): Option[T] = { ... getProperty(property).asInstanceOf[T] ... }. Here you also get an exception. Of course, you can wrap it, but that's not how I want to propagate error information and this does not allow me the chance to provide an extensible "converter" mechanism that would help convert one value to another for the return value.

So this is not what I want.

#Type Classes

Type classes are described as an extensible way to add behavior to types after you have already programmed them. This allows downstream programmers to extend your code more easily. They have also been described as an approach to provide a "witness" or "metadata" for your types to help extend or refine them. I have always thought of them as a way to harmonize a small set of functions that operate on your type when different types have different ways of expressing the same semantic operation. For example, one class may have a method called "addme" to add a value to it while another, unrelated class has a method "youAddMeSeymour" that does the same conceptual thing but has a different name, and the classes are unrelated through inheritance. If you could provide, without having to be explicit, an object that says "here's the strategy to add something and it's always called add", then you could create a different strategy for each unrelated class and provide that strategy to your function call.

But with implicits in scala and the ability to do some other clever things, we can create a type class to help with the Map[String, Any] casting problem and actually perform no dynamic casts in your written code. Of course, under the hood, there is always casting of some type going on. But here's an example of what I am talking about:

  import scalaz.Validation
  import scalaz.syntax.validation._

  trait Caster[A] extends (Any => Validation[String, A])

  object Caster {
    def apply[A](transformer: Any => Validation[String, A]) = new Caster[A] {
      def apply(value: Any) = transformer(value)
    }
  }

  trait ConverterOps {
    implicit val stringCaster = Caster[String]((v: Any) => v match {
      case x: String => x.success[String]
      case x @ _ => (s"Could not reinterpret $x of type ${x.getClass} to String").fail[String]
    })

    implicit val intCaster = Caster[Int]((v: Any) => v match {
      case x: Int => x.success[String]
      case x @ _ => (s"Could not reinterpret $x of type ${x.getClass} to Int").fail[Int]
    })

    implicit val booleanCaster = Caster[Boolean]((v: Any) => v match {
      case x: Boolean => x.success[String]
      case x @ _ => (s"Could not reinterpret $x of type ${x.getClass} to Boolean").fail[Boolean]
    })

    implicit val longCaster = Caster[Long]((v: Any) => v match {
      case x: Long => x.success[String]
      case x @ _ => (s"Could not reinterpret $x of type ${x.getClass} to Long").fail[Long]
    })

    implicit val doubleCaster = Caster[Double]((v: Any) => v match {
      case x: Double => x.success[String]
      case x @ _ => (s"Could not reinterpret $x of type ${x.getClass} to Double").fail[Double]
    })

    implicit val floatCaster = Caster[Float]((v: Any) => v match {
      case x: Float => x.success[String]
      case x @ _ => (s"Could not reinterpret $x of type ${x.getClass} to Float").fail[Float]
    })
  }

  /**
   * Extract a typed value from a validation or failure if the type does not match.
   * {{{
   *  yourMap.get("yourProperty").toSuccess("No property yourProperty found") |> extract[String]
   * }}}
   */
  def extract[T](v: Validation[String, Any])(implicit caster: Caster[T]): Validation[String, T] = 
    v.flatMap { caster(_) }

You will need to wrap this in an object, of course, and make sure the extract() definition and the implicit vals are in scope for your operations.

But let's talk about how this helps us.

Essentially, all this does is create a type that holds a function. We could have done this a bunch of ways, such as a val that holds the function or an abstract member. We just chose to say that Caster is some type of function. That function takes an Any value and converts it to a Validation[String, T] object where T is the type you expect the value to be. This function does not necessarily convert values between different types, although it could. It merely picks out the values that conform to your type expectations. If the value does not match your type expectation, it returns the "error" message in the failure part of the Validation. The function is actually a very general-purpose transformation function, but we are merely using it to help with type conversion. That's why we named it Caster rather than something more general like Transformer.
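Because a Caster is just a function from Any to a success-or-message value, supporting another type only means supplying one more implicit instance. The sketch below is an illustration, not the code from this article: it swaps scalaz's Validation for the standard library's Either[String, A] so that it runs without scalaz on the classpath, it keeps the instances in the Caster companion so they are found without an import, and the names CasterSketch and lenientLongCaster are hypothetical.

```scala
object CasterSketch {
  // Same shape as the Caster above, with Either standing in for Validation.
  trait Caster[A] extends (Any => Either[String, A])

  object Caster {
    def apply[A](f: Any => Either[String, A]): Caster[A] = new Caster[A] {
      def apply(value: Any): Either[String, A] = f(value)
    }

    implicit val stringCaster: Caster[String] = Caster[String] {
      case x: String => Right(x)
      case x         => Left(s"Could not reinterpret $x of type ${x.getClass} to String")
    }

    // Hypothetical extension: a caster that also accepts an Int where a
    // Long is expected, so it converts as well as reinterprets.
    implicit val lenientLongCaster: Caster[Long] = Caster[Long] {
      case x: Long => Right(x)
      case x: Int  => Right(x.toLong)
      case x       => Left(s"Could not reinterpret $x of type ${x.getClass} to Long")
    }
  }

  // Either is right-biased, so flatMap only runs the caster on a Right.
  def extract[T](v: Either[String, Any])(implicit caster: Caster[T]): Either[String, T] =
    v.flatMap(caster)
}
```

With these definitions, CasterSketch.extract[Long](Right(20)) yields a Right holding the Long value 20, while a Left passed in comes back unchanged.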

Let's see it in action. First, let's set it up and show how to mix Map.get with Validation. Part of our design criteria, especially during data import, is to get good enough error information to report back to the user what went wrong.

scala> val x = Map[String, Any]("key1" -> 20, "key2"->"value2")
x: scala.collection.immutable.Map[String,Any] = Map(key1 -> 20, key2 -> value2)

scala> x.get("key1")
res35: Option[Any] = Some(20)

scala> x.get("key1").toSuccess("No key1 property present") 
res37: scalaz.Validation[String,Any] = Success(20)

scala> x.get("key3").toSuccess("No key3 property present") 
res38: scalaz.Validation[String,Any] = Failure(No key3 property present)

We can see how to create Validation objects directly from the Map.get function. Now let's add our "extract" function to the mix.

scala> x.get("key1").toSuccess("No key1 property present") |> extract[String]
res39: scalaz.Validation[String,String] = Failure(Could not reinterpret 20 of type class java.lang.Integer to String)

scala> x.get("key1").toSuccess("No key1 property present") |> extract[Int]
res40: scalaz.Validation[String,Int] = Success(20)

scala> x.get("key3").toSuccess("No key3 property present") |> extract[Int]
res27: scalaz.Validation[String,Int] = Failure(No key3 property present)

In the first case, we try to extract a String but the value for key1 is an Int, so we get the appropriate error. In the second example, we succeed because the value for key1 is an Int and we want to extract an Int. The return value in either case is a Validation object that we can process further. The third example shows that the error message from the validation flows through the extract. Once the left side of a pipe fails, the failure propagates through the rest of the pipe, which is exactly what we wanted. Failing fast and propagating the first failure is the implied semantics of a "pipe" in general.

Notice that in this case we also say "reinterpret" versus "convert" in our message. We are not actually converting the value in the sense of changing its internal structure. We are really just casting, but using the match concept to pick out the properly constructed and typed values. And we are doing this with the following qualities:

  • No explicit type casts. Despite the name Caster, we are not actually performing the casts in the code ourselves; the compiler is doing the work for us.
  • It uses simple machinery from scala and scalaz.
  • It's extensible if the types in the map turn out to be something other than what the default "Caster" objects handle.
  • It's composable. Although I use the |> operator because it expresses intent well (like a unix pipe), it's really just g(f(x)) type function composition.

Function composition was important because I also had more use cases that I needed to cover to keep my code tidy.
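To make the "it's really just g(f(x))" point concrete, here is a minimal sketch in which |> is a hand-written stand-in for the scalaz operator and Either replaces Validation so the example depends only on the standard library. PipeSketch, PipeOps, lookup and asInt are hypothetical names used for illustration.

```scala
object PipeSketch {
  // Stand-in for scalaz's |> : apply the function on the right to the value on the left.
  implicit class PipeOps[A](self: A) {
    def |>[B](f: A => B): B = f(self)
  }

  // Step 1: look the property up, turning a missing key into a message.
  val lookup: Map[String, Any] => Either[String, Any] =
    m => m.get("key1").toRight("No key1 property present")

  // Step 2: keep the value only if it really is an Int.
  val asInt: Either[String, Any] => Either[String, Int] = _.flatMap {
    case i: Int => Right(i)
    case other  => Left(s"Could not reinterpret $other of type ${other.getClass} to Int")
  }

  val data: Map[String, Any] = Map("key1" -> 20, "key2" -> "value2")

  // The piped form and the nested form are the same computation.
  val piped: Either[String, Int]  = data |> lookup |> asInt
  val nested: Either[String, Int] = asInt(lookup(data))
}
```

Both piped and nested evaluate to Right(20); the pipe only changes the reading order.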

#Composability to the Rescue

Compared to the previous idioms I had tried to develop to help me with these use cases, the use of Validation is much more composable. For example, I am able to add more functions into the mix while still using the basic parts I developed here. And by using scalaz's Validation, everything just works like I want.

Sometimes in my map, the value for a key is a String, but I need to physically parse it. Fortunately, scalaz has some syntax support for .parseInt or .parseBoolean directly on the string. These functions return Validation objects. The scala versions, .toInt or .toBoolean, throw exceptions. That's not bad, but I need to let the calling layer handle exceptions directly and decide how to propagate them. Other applications may want to handle exceptions differently.

Let's put the .parseInt/.parseBoolean functions into play. Of course, the value to parse has to be a string first, so we need to "extract" a string value. We know that using just x.get("key2").map(_.parseInt) will not work because the "get" on the map x returns an Option[Any] and parseInt only works on a String.

scala> val x = Map[String, Any]("key1"->"true", "key2"->"20")
x: scala.collection.immutable.Map[String,Any] = Map(key1 -> true, key2 -> 20)

scala> x.get("key2").map(_.parseInt)
<console>:25: error: value parseInt is not a member of Any
              x.get("key2").map(_.parseInt)
                                  ^

scala> (x.get("key1").toSuccess("key1 not found") |> extract[String]).flatMap(_.parseBoolean)
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
res33: scalaz.Validation[java.io.Serializable,Boolean] = Success(true)

Well, that works! But we have a deprecation warning (from scalaz 7.1). The flatMap on Validation will be removed in the future. Why? Well, I think the rationale is that Validation is meant to accumulate failures through its applicative operations, while flatMap stops at the first failure and accumulates nothing, so the two do not agree. For a single call to extract[Int] maybe this is not such a big deal, but we could see this creeping in and becoming a problem, as with the last entry above.

flatMap essentially operates on the success value and creates a new Validation object, potentially with a different type, as a return value. In this case, parseBoolean returns a Validation with an exception on the failure side, which is why the failure type of the result above widened to java.io.Serializable, the common supertype of String and the exception. "map", of course, leaves the failure type alone, but then you have a Validation nested inside a Validation that has to be flattened. So yes, flatMap worked, but it won't work forever in scalaz. And you'll notice that our extract() function up above used flatMap! There's trouble brewing in the future.
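The difference between map and flatMap described here can be seen in a few lines. This is a sketch using the standard library's Either instead of scalaz's Validation (an assumption made so the example runs on its own); parseBool and MapVsFlatMapSketch are hypothetical names, and the failure side is mapped to a String up front so both steps share one failure type.

```scala
object MapVsFlatMapSketch {
  // Parse a String into a Boolean, keeping only the exception message on failure.
  def parseBool(s: String): Either[String, Boolean] =
    scala.util.Try(s.toBoolean).toEither.left.map(_.getMessage)

  val raw: Either[String, String] = Right("true")

  // map wraps the parse result inside the outer Either: a nested value.
  val nested: Either[String, Either[String, Boolean]] = raw.map(parseBool)

  // flatMap runs the parse on the success side and flattens the result.
  val flat: Either[String, Boolean] = raw.flatMap(parseBool)
}
```

Because the failure side of parseBool is already a String, the flattened result stays Either[String, Boolean] rather than widening to a common supertype.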

#Generalizing Caster to a Disjunction

In scalaz, Validation has a failure and a success side. Scala's Either has a left and right side. Scala's Try has a failure and success side where failure is really an exception. Scalaz also has a disjunction type. A disjunction type is one that has two unrelated parts to it. The disjunction uses the name \/ and it has a left and right side. But scalaz's disjunction does not treat the two sides symmetrically; it assumes the good result is on the right. It provides various ways to pull out the left or right side (-\/ or \/-) depending on what you want to do with the values. It has a flatMap that is going to stay.

But the more important thought is that \/ is about the same as a Validation and you can convert a disjunction to a Validation. Because Either, disjunction and Validation are all very similar and each can be mapped into the other without losing any information along the way, they are called isomorphic. Since the caster only really does one thing, it does not need a "list of error messages" on the failure side, so Validation would be just fine for it. But we could have our Caster use the disjunction and have the "extract" function actually choose which disjunction-like type to return.

This leads us to:

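  import scalaz.{\/, Validation}
  import scalaz.syntax.either._      // .left / .right
  import scalaz.syntax.id._          // |>
  import scalaz.syntax.std.option._  // .toSuccess
  import scalaz.syntax.validation._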

  trait Caster[A] extends (Any => \/[String, A])

  object Caster {
    def apply[A](transformer: Any => \/[String, A]) = new Caster[A] {
      def apply(value: Any) = transformer(value)
    }
  }

/**
   * You need to instantiate this object at some point. Here's one way to do it inside
   * a module that you define:
   * {{{
   * object Converter extends ConverterOps
   * }}}
   * then you can include these one by one or all at the same time using:
   * {{{
   * import Converter._
   * }}}
   */
  trait ConverterOps {
    implicit val stringCaster = Caster[String]((v: Any) => v match {
      case x: String => x.right[String]
      case x @ _ => (s"Could not reinterpret $x of type ${x.getClass} to String").left[String]
    })

    implicit val intCaster = Caster[Int]((v: Any) => v match {
      case x: Int => x.right[String]
      case x @ _ => (s"Could not reinterpret $x of type ${x.getClass} to Int").left[Int]
    })

    implicit val booleanCaster = Caster[Boolean]((v: Any) => v match {
      case x: Boolean => x.right[String]
      case x @ _ => (s"Could not reinterpret $x of type ${x.getClass} to Boolean").left[Boolean]
    })

    implicit val longCaster = Caster[Long]((v: Any) => v match {
      case x: Long => x.right[String]
      case x @ _ => (s"Could not reinterpret $x of type ${x.getClass} to Long").left[Long]
    })

    implicit val doubleCaster = Caster[Double]((v: Any) => v match {
      case x: Double => x.right[String]
      case x @ _ => (s"Could not reinterpret $x of type ${x.getClass} to Double").left[Double]
    })

    implicit val floatCaster = Caster[Float]((v: Any) => v match {
      case x: Float => x.right[String]
      case x @ _ => (s"Could not reinterpret $x of type ${x.getClass} to Float").left[Float]
    })
  }

  /**
   * Extract a typed value from a disjunction, or a left if the type does not match.
   */
  def extract[T](v: \/[String, Any])(implicit caster: Caster[T]): \/[String, T] = v.flatMap { caster(_) }

  /**
   * Extract a typed value from a validation or failure if the type does not match.
   * {{{
   *  yourMap.get("yourProperty").toSuccess("No property yourProperty found") |> extractV[String]
   * }}}
   */
  def extractV[T](v: Validation[String, Any])(implicit caster: Caster[T]): Validation[String, T] = 
    v.disjunction.flatMap { caster(_) }.validation

 /**
   * Use this when the function you are using (say a parseInt) returns a Validation[Throwable,_]
   * but you want Validation[String,_] so that you are only working with string messages. This
   * function pulls out the exception message as the string.
   *
   * {{{
   * "10X".parseInt |> justMessage
   * }}}
   */
  def justMessage[S](v: Validation[Throwable, S]): Validation[String, S] = v.leftMap(_.getMessage)

  /**
   * Allows you to use the scalaz parseInt/Boolean/Float methods in function composition.
   * By supplying the conversion function from a String to a Validation, you can pass in
   * the Validation that holds the String value to apply the conversion function to.
   *
   * {{{
   * val yourField: Validation[String, Boolean] = yourMap.get("yourField").toSuccess("yourField not found") |>
   * 	extractV[String] |>
   *    simpleParse(_.parseBoolean |> justMessage)
   * }}}
   * Notice that justMessage is inside the parse function call.
   */
  def simpleParse[T](f: String => Validation[String, T])(v: Validation[String, String]): Validation[String, T] = {
    v.flatMap(f)
  }


//...

object RichMapGet {

  /* You lose the error messages, but that's what you get when you use Option. */
  implicit class mapToRichGet[V](map: collection.Map[String, V]) {
    def as[B](property: String)(implicit tag: Caster[B]): Option[B] = {
      map.get(property) match {
        case Some(v) => tag(v).toOption
        case None => None
      }
    }

    def asV[B](property: String)(implicit tag: Caster[B]): Validation[String, B] =
      map.get(property).toSuccess("No property " + property + " found") |> extractV[B]

  }
}
...

Notice how the flatMap is on the disjunction and we can just return the Validation object by converting the disjunction to a Validation in the extract function. We could call that extract "extractV" and create another that returns an Either such as "extractE". We could of course generalize this even further and make the "messages" more flexible (make them a parameter and allow them to be internationalized) but this is sufficient for most uses.

We also tossed in some parsing functions that help with chaining. These are rather simple, as you would expect, but allow us to use easier-to-read function composition. We'll describe this in the next section.

The RichMapGet is designed as an additional import you need to perform explicitly. It brings the "as" function into scope and allows a property map (one with strings as keys and Any as values) to be converted safely and extensibly to the right type. You can convert without messages (using "as") or with messages (using "asV").

#An Approach to Processing Maps

You can process a map using many different scala idioms. Which ones you choose will depend on how readable, composable and reusable you want them. Certainly, the standard Option, map, fold, and other operations can operate on maps quite easily. However, I have found that if I want some relatively robust error messages back from my data processing on maps, that I need some more infrastructure to target the level of design that I wish to have.

I have included some examples below. You can vary and do things differently of course. That's the beauty of scala and scalaz, you can pick and choose at a very fine level of detail to get the solution that works for you.

Here's a list of scenarios I have typically run into, where I want informational messages if my expectations for a value in a map are not met. We do not want to create a full-blown rules-based system, just some small, simple, composable functions that help meet these needs. Most importantly, we need the associated "information" related to the validity checks. It's pretty easy to imagine that if we only have to do this a few times, then we can just use standard scala constructs and be done with it.

  • Check that a property is present.
  • Convert a property from a String to a JVM type.
  • Check that property is present, and if so, do something with it.
  • Check that a property is present and if so, check that it is valid to convert.
  • Check that multiple properties are present.
  • If a property is not present, add a default.
  • Check the validity of a value, if present, specific to the application e.g. its value is within a specific range.
  • Any property that has a String value, truncate the value to a default if the string is blank. Or remove the property.
  • Remove properties

INCLUDE THOSE EXAMPLES HERE
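A few of the scenarios in the list above can be sketched with small, composable functions. This is one possible shape rather than the author's own examples: it uses the standard library's Either[String, A] in place of scalaz's Validation so that it runs without extra dependencies, and every name in it (MapChecks, required, asString, parseInt, inRange, withDefault, dropBlank, length) is hypothetical.

```scala
object MapChecks {
  type Result[A] = Either[String, A]
  type Props     = Map[String, Any]

  // Check that a property is present.
  def required(m: Props, key: String): Result[Any] =
    m.get(key).toRight(s"No property $key found")

  // Reinterpret a value as a String, or explain why not.
  def asString(key: String): Any => Result[String] = {
    case s: String => Right(s)
    case other     => Left(s"Property $key: could not reinterpret $other to String")
  }

  // Convert a String to an Int, keeping the parse error as a message.
  def parseInt(key: String): String => Result[Int] = s =>
    scala.util.Try(s.trim.toInt).toEither.left.map(e => s"Property $key: ${e.getMessage}")

  // Application-specific validity check: the value must lie in [lo, hi].
  def inRange(key: String, lo: Int, hi: Int): Int => Result[Int] = n =>
    if (n >= lo && n <= hi) Right(n) else Left(s"Property $key: $n is not in [$lo, $hi]")

  // If a property is not present, add a default.
  def withDefault(m: Props, key: String, default: Any): Props =
    if (m.contains(key)) m else m + (key -> default)

  // Remove every property whose String value is blank.
  def dropBlank(m: Props): Props = m.filter {
    case (_, s: String) => s.trim.nonEmpty
    case _              => true
  }

  // Present, a String, parseable, and in range: each step runs only if the previous one succeeded.
  def length(m: Props): Result[Int] =
    required(m, "length")
      .flatMap(asString("length"))
      .flatMap(parseInt("length"))
      .flatMap(inRange("length", 0, 100))
}
```

Each check returns its own message, so the first expectation that is not met is the one reported to the caller.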

#Accumulating Messages

You'll notice that in the above use of Validation, once a failure was hit, most of the processing just stopped. There is different syntax available to help accumulate the error messages (the strings). That's where ValidationNel comes into play. Instead of just a single string on the failure side of the validation, you can have a list of strings which correspond to a list of "errors" that were encountered. But since you have to process all the validations that can be processed in the first pass, you have to use a different syntax. Depending on your specific application needs, you may have dependencies in your validations. For example, you cannot determine if a numerical value is in the right range until you have accessed it in the map, validated that you can interpret it as a number, and then parsed it to create a number. Then you can determine whether it's in the right range.

What processing you can perform before other steps in your overall function will suggest how you compose your smaller functions together, with the intent of flowing any and all messages through to the calling function.
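The split between dependent and independent checks can be sketched as follows. This is not ValidationNel itself: as an assumption to keep the example free of dependencies, it uses Either with a List[String] on the failure side and combines the results by hand. AccumulateSketch, Person, string, int, age and person are hypothetical names.

```scala
object AccumulateSketch {
  type Props = Map[String, Any]

  def string(m: Props, key: String): Either[String, String] =
    m.get(key).toRight(s"No property $key found").flatMap {
      case s: String => Right(s)
      case other     => Left(s"Property $key: could not reinterpret $other to String")
    }

  def int(m: Props, key: String): Either[String, Int] =
    m.get(key).toRight(s"No property $key found").flatMap {
      case i: Int => Right(i)
      case other  => Left(s"Property $key: could not reinterpret $other to Int")
    }

  // Dependent checks: the range test needs the Int, so this chain fails fast.
  def age(m: Props): Either[String, Int] =
    int(m, "age").flatMap { a =>
      if (a >= 0 && a <= 150) Right(a) else Left(s"Property age: $a is out of range")
    }

  final case class Person(name: String, age: Int)

  // Independent checks: evaluate both, then collect every message that was produced.
  def person(m: Props): Either[List[String], Person] =
    (string(m, "name"), age(m)) match {
      case (Right(n), Right(a)) => Right(Person(n, a))
      case (n, a)               => Left(List(n, a).collect { case Left(msg) => msg })
    }
}
```

A map with a bad name and a bad age produces two messages, while a missing age short-circuits its own range check and contributes just one.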
