@wandersoncferreira
Created October 5, 2020 04:43

Data Readers

I've read about the subject, but I want to look closer. The following analysis is an ECA (Exploratory Clojure Analysis) of data readers, and I hope more people can benefit from it.

What are data readers?

Functions used to read specific tagged literals. For example, #inst "2020-10-05" is a built-in literal in Clojure and if you type it in the REPL, you will get a java.util.Date instance back.

The magic happens because Clojure has a mechanism that lets us define a loose contract between a tagged literal and its implementation. The implementation is the actual code that runs to transform the self-describing data into something else.
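To make that contract a bit more concrete, here is a minimal sketch that looks up the built-in reader for the inst tag in clojure.core/default-data-readers and calls it by hand. The names inst-reader, via-reader, and via-fn are only for illustration.

```clojure
;; default-data-readers is the built-in map of tag symbols to reader vars.
(def inst-reader (get default-data-readers 'inst))

;; Reading the literal and calling the reader function directly should agree.
(def via-reader (read-string "#inst \"2020-10-05\""))
(def via-fn     (inst-reader "2020-10-05"))

(instance? java.util.Date via-reader) ;; => true
(= via-reader via-fn)                 ;; => true
```

So the tag is just a key, and the value behind it is an ordinary function that receives the form that follows the tag.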

For example, if you want to change the default implementation that handles the #inst literal, you can rebind *data-readers*:

(binding [*data-readers* {'inst #'clojure.instant/read-instant-timestamp}]
  (type (read-string "#inst \"2020-10-05\"")))
;; => java.sql.Timestamp

(binding [*data-readers* {'inst #'clojure.instant/read-instant-calendar}]
  (type (read-string "#inst \"2020-10-05\"")))
;; => java.util.GregorianCalendar

You can also add one of these options to a data_readers.clj file at the root of the classpath, or even point the tag at a completely new function. If you do so, your whole application will use the new definition for the conversion.

Next, we will look at how to create custom data readers.

Custom data readers

Clojure enables you to create your own data readers too. This can be useful for sharing context between services.

Currently, I work in the financial industry with Private Credit, and there are some special values that I wish I could encode differently to add more context and validations around them. Let's see some examples and how to achieve it.

(All these rules to create the context are purely illustrative...)

  • Internal Rate of Return (IRR)

    • Always BigDecimals
    • Never negative
  • Credit Score

    • Always Integer
    • Never larger than 1000
  • Credit Risk

    • Always a Single Letter
    • Letter in set {A,B,C,D,E}

First, create the following file, src/data_readers.clj:

{company/irr wand.util.readers/irr
 company/credit-score wand.util.readers/credit-score
 company/credit-risk wand.util.readers/credit-risk}

(wand is the name of my prototype project)

As a best practice, always use namespace-qualified tags; unqualified tags are reserved for Clojure itself.

The content of the file src/wand/util/readers.clj:

(ns wand.util.readers)

(defn irr [form]
  form)

(defn credit-score [form]
  form)

(defn credit-risk [form]
  form)

You can now connect to your REPL and, in the wand.core namespace (src/wand/core.clj), try:

(ns wand.core
  (:require [wand.util.readers]))


#company/credit-score 100
;; => 100

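Do not forget to require the implementations: data_readers.clj only maps tags to vars, so the namespace that defines those vars must be loaded before a tag is read. Now, we can implement the rules behind the tagged literals.

(defn irr
  [form]
  ;; Rule: never negative, so zero is accepted.
  ;; Note: (BigDecimal. 2.1) keeps the double's exact binary expansion;
  ;; (bigdec 2.1) would give 2.1M instead.
  (if (neg? form)
    (throw (ex-info (str form " is negative!") {}))
    (BigDecimal. form)))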

(defn credit-score
  [form]
  (if (<= form 1000)
    (Integer. form)
    (throw (ex-info (str form " is larger than 1000!") {}))))

(defn credit-risk
  [form]
  (if (contains? #{"A", "B", "C", "D", "E"} form)
    form
    (throw (ex-info (str form " is not in the set{A,B,C,D,E}!") {}))))
    

Cool!! Now, if you need to create a map with some custom data, you can offload the burden of checking the validity of each value to the tagged literal.

{:score 232
 :risk "V"
 :irr 2.1}

Is a perfectly valid map. However,

{:score #company/credit-score 232
 :risk #company/credit-risk "V"
 :irr #company/irr 2.1}

Is not!

1. Caused by clojure.lang.ExceptionInfo
   V is not in the set{A,B,C,D,E}!
   {}

Great!!
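The empty {} in that error is the ex-data map, which we left blank. As a sketch of one possible improvement (the :tag and :value keys are my own choice here, not a convention), the reader function could attach the offending value so callers can inspect it:

```clojure
(defn credit-risk
  [form]
  (if (contains? #{"A" "B" "C" "D" "E"} form)
    form
    ;; Same message as before, but now with some context in the ex-data map.
    (throw (ex-info (str form " is not in the set{A,B,C,D,E}!")
                    {:tag 'company/credit-risk, :value form}))))

;; Calling the reader function directly and capturing the attached data.
(def failure
  (try
    (credit-risk "V")
    (catch clojure.lang.ExceptionInfo e
      (ex-data e))))
;; failure => {:tag company/credit-risk, :value "V"}
```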

Store and share your data

Now, let's say that we want to read an EDN file that was produced by another service that shares the same meaning for our custom literals.

The content of our credit.edn file:

{:score #company/credit-score 232
 :risk #company/credit-risk "B"
 :irr #company/irr 2.1}

All great! Let's read it

(require '[clojure.edn :as edn] '[clojure.java.io :as io])
(edn/read-string (slurp (io/resource "credit.edn")))

And... boom!

1. Unhandled java.lang.RuntimeException
   No reader function for tag company/credit-score

What? We defined the data readers and they all work just fine. Yes, but life is not so easy... clojure.edn does not consult *data-readers*; it only uses the built-in readers plus whatever you pass in explicitly. There are two nice resources that discuss the safety of reading data versus evaluating untrusted code.

The first is the great documentation entry for the clojure.edn/read function, and the second is this great talk by Steve Miner (The Data-Reader's Guide to the Galaxy).

So, I will assume you have at least read the documentation entry, and let's be confident that no malicious code was added to the implementation of our three custom literals.
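To illustrate the safety concern, here is a small sketch comparing the core reader with the EDN reader on the same input. It binds *read-eval* to true explicitly (which is normally the default) so the behavior does not depend on how the JVM was started.

```clojure
(require '[clojure.edn :as edn])

;; With *read-eval* enabled, the core reader evaluates #= forms while reading.
(def core-result
  (binding [*read-eval* true]
    (read-string "#=(+ 1 2)")))
;; core-result => 3

;; The EDN reader has no #= dispatch, so the same input is rejected.
(def edn-result
  (try
    (edn/read-string "#=(+ 1 2)")
    (catch RuntimeException _ :rejected)))
;; edn-result => :rejected
```

This is the main reason to reach for clojure.edn when the data comes from outside your own process.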

We should explicitly pass our readers to edn/read-string:

(edn/read-string {:readers *data-readers*}
                 (slurp (io/resource "credit.edn")))
                 
;; => {:score 232, :risk "B", :irr 2.100000000000000088817841970012523233890533447265625M}
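Passing all of *data-readers* works, but you may prefer to list only the tags a given call should accept. The sketch below assumes a stand-in credit-score function (the real one lives in wand.util.readers) and a hypothetical company-readers map; it also shows the :default option of clojure.edn, which receives the tag and value for any tag without a reader.

```clojure
(require '[clojure.edn :as edn])

;; Stand-in for wand.util.readers/credit-score, so the example is self-contained.
(defn credit-score [form]
  (if (<= form 1000)
    (int form)
    (throw (ex-info (str form " is larger than 1000!") {}))))

;; Hypothetical allow-list: only these tags are honoured by the calls below.
(def company-readers {'company/credit-score credit-score})

(def known
  (edn/read-string {:readers company-readers}
                   "{:score #company/credit-score 232}"))
;; known => {:score 232}

;; tagged-literal keeps an unknown tag as plain data instead of throwing.
(def unknown
  (edn/read-string {:readers company-readers, :default tagged-literal}
                   "#other/thing 42"))
;; (:tag unknown) => other/thing, (:form unknown) => 42
```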

And if someone tried to cheat and handed you the following EDN file

{:score #company/credit-score 2000
 :risk #company/credit-risk "B"
 :irr #company/irr 2.1}

You will spot it right away when reading:

1. Unhandled clojure.lang.ExceptionInfo
   2000 is larger than 1000!
   {}
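If you would rather not let that exception escape, one option is to wrap the read. This is only a sketch: read-credit is a hypothetical helper, the {:ok ...} / {:error ...} shape is my own choice, and credit-score is again a stand-in for the function in wand.util.readers.

```clojure
(require '[clojure.edn :as edn])

;; Stand-in for wand.util.readers/credit-score.
(defn credit-score [form]
  (if (<= form 1000)
    (int form)
    (throw (ex-info (str form " is larger than 1000!") {:value form}))))

(defn read-credit
  "Hypothetical helper: returns {:ok data} on success or {:error message}
  when the input is malformed or a reader function rejects a value."
  [s]
  (try
    {:ok (edn/read-string {:readers {'company/credit-score credit-score}} s)}
    (catch Exception e
      {:error (.getMessage e)})))

(def good (read-credit "{:score #company/credit-score 232}"))
(def bad  (read-credit "{:score #company/credit-score 2000}"))
;; good holds the parsed map under :ok, bad holds a message under :error
```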

Print representation and more...

Let's get fancy! Let's say we want to provide special printing to our Score, Risk, and IRR values. After all, when you print a #inst literal, you get a nice #inst .... value back at your REPL. Also, we will encode these company-specific values into their own types.

To enable such functionality, I will encode our values into custom records and extend the print-method multimethod for them.

(ns wand.util.readers)

(defrecord Irr [value])
(defrecord CreditScore [value])
(defrecord CreditRisk [value])

(defn irr
  [form]
  ;; Rule: never negative, so zero is accepted.
  (if (neg? form)
    (throw (ex-info (str form " is negative!") {}))
    (->Irr (BigDecimal. form))))

(defn credit-score
  [form]
  (if (<= form 1000)
    (->CreditScore (Integer. form))
    (throw (ex-info (str form " is larger than 1000!") {}))))

(defn credit-risk
  [form]
  (if (contains? #{"A", "B", "C", "D", "E"} form)
    (->CreditRisk form)
    (throw (ex-info (str form " is not in the set{A,B,C,D,E}!") {}))))

(defmethod print-method wand.util.readers.Irr [irr ^java.io.Writer w]
  (.write w (format "#company/irr %s%%" (:value irr))))

(defmethod print-method wand.util.readers.CreditRisk [credit-risk ^java.io.Writer w]
  (.write w (format "#company/credit-risk %s Level" (:value credit-risk))))

(defmethod print-method wand.util.readers.CreditScore [credit-score ^java.io.Writer w]
  (.write w (format "#company/credit-score %s Points" (:value credit-score))))

And our example in the core namespace becomes:

{:score #company/credit-score 232
 :risk #company/credit-risk "C"
 :irr #company/irr 2.1}
 
;; => {:score #company/credit-score 232 Points, 
;;     :risk #company/credit-risk C Level,
;;     :irr #company/irr 2.100000000000000088817841970012523233890533447265625%}

I don't know if I find this sooo useful now. But we can do that. :)
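One thing to keep in mind: the fancy suffixes (Points, Level, %) mean the printed text can no longer be read back by the same tag. If round-tripping matters more than decoration, a sketch like the following prints only the tag and the raw value. It redefines CreditScore and credit-score locally so the example stands alone, and uses (int form) instead of the Integer constructor.

```clojure
(require '[clojure.edn :as edn])

(defrecord CreditScore [value])

(defn credit-score [form]
  (if (<= form 1000)
    (->CreditScore (int form))
    (throw (ex-info (str form " is larger than 1000!") {}))))

;; Print the tag followed by the readable form of the wrapped value.
(defmethod print-method CreditScore [score ^java.io.Writer w]
  (.write w (str "#company/credit-score " (pr-str (:value score)))))

(def printed (pr-str (credit-score 232)))
;; printed => "#company/credit-score 232"

;; Reading the printed text with the same reader gives back an equal record.
(def round-tripped
  (edn/read-string {:readers {'company/credit-score credit-score}} printed))

(= round-tripped (credit-score 232)) ;; => true
```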

Open questions

I can see the benefits of data readers for securing a shared understanding of a piece of data between (and within) services, and for letting different contexts handle the data transformation in their own way.

However, as we currently do not leverage this approach much at work, I don't know exactly what the shortcomings of this decision are in the long run.

I would love to hear from more people about their experiences with a codebase that heavily leverages custom tagged literals.
