public
Last active

A proof-of-concept of a new approach to encoding JSON values for aeson.

  • Download Gist
AesonPlayground.hs
Haskell
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62
{-# LANGUAGE GeneralizedNewtypeDeriving, FlexibleInstances,
OverloadedStrings #-}
 
import Data.Monoid (Monoid(..), (<>))
import Data.String (IsString(..))
import Data.Text (Text)
import Data.Text.Lazy.Builder (Builder, singleton)
import qualified Data.Text.Lazy.Builder as Bld
import qualified Data.Text.Lazy.Builder.Int as Bld
 
-- The phantom type here allows us to say "I am encoding a value of
-- type x".
data Build a = Build {
_count :: {-# UNPACK #-} !Int
, run :: Builder
}
 
instance Show (Build a) where
show = show . run
 
data Object
data Array
data Mixed
 
object :: Build Object -> Build Object
object (Build 0 _) = build "{}"
object (Build _ kvs) = build $ singleton '{' <> kvs <> singleton '}'
 
array :: Build a -> Build Array
array (Build 0 _) = build "[]"
array (Build _ vs) = build $ singleton '[' <> vs <> singleton ']'
 
instance Monoid (Build a) where
mempty = Build 0 mempty
mappend (Build i a) (Build j b)
| ij > 1 = Build ij (a <> singleton ',' <> b)
| otherwise = Build ij (a <> b)
where ij = i + j
 
instance IsString (Build Text) where
fromString = string
 
instance IsString (Build Mixed) where
fromString = build . Bld.fromString
 
(<:>) :: Build Text -> Build a -> Build Object
k <:> v = Build 1 (run k <> ":" <> run v)
 
int :: Integral a => a -> Build a
int = build . Bld.decimal
 
text :: Text -> Build Text
text = build . Bld.fromText
 
string :: String -> Build Text
string = build . Bld.fromString
 
build :: Builder -> Build a
build = Build 1
 
mixed :: Build a -> Build Mixed
mixed (Build a b) = Build a b

What about using a "finally tagless" style which allows going both to a concrete Aeson.Value or directly to a Builder like this:

{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE ScopedTypeVariables #-}

module NewAeson
    ( ToJSON(..)

    , JsonValue(..), fromValue

    , Json, Object, Array, toBuilder
    ) where

import Data.Text (Text)
import Data.Monoid
import Data.Attoparsec.Number
import qualified Data.Aeson as Aeson
import Data.HashMap.Strict (HashMap)
import qualified Data.HashMap.Strict as H
import Data.Vector (Vector)
import qualified Data.Vector as V
import Blaze.ByteString.Builder
import Blaze.ByteString.Builder.Char.Utf8
import Prelude hiding (null)

--------------------------------------------

class ToJSON a where
  toJSON :: (JsonValue json array object) => a -> json

--------------------------------------------

class (Monoid object, Monoid array) =>
    JsonValue json array object | json -> array object,
                                  array object -> json where

    object   :: object -> json
    array    :: array  -> json
    string   :: Text   -> json
    number   :: Number -> json
    bool     :: Bool   -> json
    null     :: json

    row      :: Text -> json -> object
    element  :: json -> array
    elements :: Vector json -> array
    elements = mconcat . V.toList . V.map element

--------------------------------------------

instance JsonValue Aeson.Value Aeson.Array Aeson.Object where
    object   = Aeson.Object
    array    = Aeson.Array
    string   = Aeson.String
    number   = Aeson.Number
    bool     = Aeson.Bool
    null     = Aeson.Null

    row      = H.singleton
    element  = V.singleton
    elements = id

fromValue :: forall json array object. (JsonValue json array object)
          => Aeson.Value -> json
fromValue (Aeson.Object obj) = object $ H.foldlWithKey' f mempty obj
    where
      f :: object -> Text -> Aeson.Value -> object
      f o k v = row k (fromValue v :: json) <> o
fromValue (Aeson.Array v)  = array $ elements (V.map fromValue v :: Vector json)
fromValue (Aeson.String t) = string t
fromValue (Aeson.Number n) = number n
fromValue (Aeson.Bool b)   = bool b
fromValue Aeson.Null       = null

--------------------------------------------
-- These definitions were copied from json-builder:

newtype Json = Json Builder
newtype Object = Object CommaMonoid deriving (Monoid)
newtype Array = Array CommaMonoid deriving (Monoid)

toBuilder :: Json -> Builder
toBuilder (Json builder) = builder

data CommaMonoid
   = Empty
   | Comma !Builder

instance Monoid CommaMonoid where
  mempty = Empty
  mappend Empty     x = x
  mappend (Comma a) x
        = Comma (a <> case x of
                        Empty   -> mempty
                        Comma b -> fromChar ',' <> b)


instance JsonValue Json Array Object where
  object (Object xs) = case xs of
                         Empty    -> Json (copyByteString "{}")
                         Comma ys -> Json (fromChar '{' <> ys <> fromChar '}')
  array (Array xs) = case xs of
                       Empty     -> Json (copyByteString "[]")
                       Comma  ys -> Json (fromChar '[' <> ys <> fromChar ']')
  string = Json . fromText
  number = error "TODO"
  bool True  = Json (copyByteString "true")
  bool False = Json (copyByteString "false")
  null = Json (copyByteString "null")

  row k json  = Object (Comma (fromText k <> fromChar ':' <> toBuilder json))
  element json = Array (Comma (toBuilder json))

EDIT: Added the 'bool' method and the 'fromValue' function.

That's a pretty interesting attempt. I certainly see a lot of value in being able to write one definition and be able to get either an aeson value, a json builder, or potentially even other structures and syntaxes.

Some potential issues that I see:

  1. Differing run-time complexities from JsonValue instances, e.g. it's O(1) to mappend two json builder arrays, but O(m+n) to mappend two aeson value arrays. How common would it be to want to use one definition to get a value but another to get a builder? Or is there enough common ground in the interface that typical use cases can be nearly optimal to both types of output?

  2. Execution speed and code size. My intuition with this type of code is that without careful use of JsonValue instances, with known types and inlined/specialized across modules, you will lose a fair bit of performance to indirect calls to small functions. But numbers are better than speculation.

Still, I think this approach is worthy of more exploration and experimentation. (And reminds me that sometimes I really wish GHC had closed type classes/type families, because you could use them here to avoid my second concern, at least in principle, although you would be giving up the potential polymorphism in the process.)

There might be some interesting ideas lurking in Bryan's attempt, but one thing that sticks out to me is that you can use the proof-of-concept to produce syntactically invalid json, by say mappending two text values.

Hi @lpsmith

it's O(1) to mappend two json builder arrays, but O(m+n) to mappend two aeson value arrays.

For that reason I introduced the elements :: Vector json -> array method.

How common would it be to want to use one definition to get a value but another to get a builder?

One use-case that I see is for post-processing a JSON Value. This is from some code at work for example. Here the Identified type can be used to add an identifier to some other Haskell type:

data Identified id a = Identified
                       { identifiedId      :: id
                       , identifiedContent :: a
                       } deriving (Show, Typeable, Generic)

instance (ToJSON id, ToJSON a) => ToJSON (Identified id a) where
    toJSON (Identified iden x) =
        case toJSON x of
          Object obj -> Object $ H.insert "id" (toJSON iden) obj
          v -> object [ "id"      .= toJSON iden
                      , "content" .= v
                      ]

Here you can see I use toJSON internally and do case analysis on the resulting JSON Value.

My intuition with this type of code is that without careful use of JsonValue instances, with known types and inlined/specialized across modules, you will lose a fair bit of performance to indirect calls to small functions. But numbers are better than speculation.

I agree, we need some benchmarks here. Users probably need to add SPECIALIZE pragma's to their toJSON definitions.

Err, I meant, how common would it be to want two definitions, one to get a value, and the other to get a builder, due to the differing time/space complexities of the implementations?

@lpsmith FYI: I began experimenting with this approach.

Please sign in to comment on this gist.

Something went wrong with that request. Please try again.