colinhacks/ground_truthiness.md

## ground_truthiness.md

      
    Achieving the Holy Grail of TypeScript Data Modeling
Or, a scalable solution to type complexity

For the highly abbreviated version of this, check out this comment: colinhacks/zod#53 (comment). That comment conveys why this is so exciting. If you agree, the outline below explains the context for why I think it’s so exciting and introduces the architectural concept of “ground truthiness”, which I think is a very powerful notion.

Context

This journey started in 2018 when I decided to start a company. For inexplicable reasons I decided to build electronic medical record software, one of the most regulated and complex product categories in the world. Also, I was a solo founder. Now if you’re ever tempted to do this, for the love of god, don’t. Just say no!
Since this was medical software, I wanted it to be rock solid: end-to-end typesafety, powerful and expressive endpoints, painless schema migration. I was deeply entrenched in vanilla JavaScript development, but I had enough friends tell me that TypeScript was the one true path, so I dragged myself kicking and screaming into TypeScriptland. But I was extremely worried at the time that maintaining static types would just be an extra chore.
Shortly thereafter I settled on GraphQL for the API and started reading the docs for TypeGraphQL. Within a couple of minutes, though, I noticed a problem.

Specifically, I noticed that the definition of the Recipe model and the AddRecipeInput type are extremely similar, but you have to implement them as two separate classes. It’s redundant and makes it harder to change your schema over time. Over time I realized this was a recurring DX issue with GraphQL and a major unsolved problem in the TypeScript ecosystem:
Your types aren’t DRY enough

And unfortunately the thing that was supposed to be the backbone of modern typesafe applications — GraphQL — has intensified this problem significantly. It’s easy to see why. Types in GraphQL are represented as strings, not code. Strings aren’t parsable by the TypeScript type engine.

First you define your schema as a string.
Then you define your API endpoints. The inputs and outputs are defined as strings.
If those inputs involve nested JSON data you need to flatten it out by hand and write each layer as a string.
Then you define your client-side queries and mutations as their own strings.
But sometimes this gets a little repetitive so you define fragments as strings.
Then you use a multi-stage code generation pipeline to generate types from your model/endpoint/input/query/fragment text files.
Then when you add a new property to one of your tables, you have to update all those strings again, and there’s no typechecker to help you…because everything is a string.

Now let’s try modeling the Recipe/AddRecipeInput problem in TypeScript.
type Recipe = { id: string; title: string; description?: string };
type AddRecipe = Omit<Recipe, "id">;

// bonus points
type UpdateRecipe = Partial<Recipe> & { id: string };
Huh. That’s much nicer. Once you use TypeScript for a while, this is the way you start thinking about types. You define your ground truth types as accurately as you can, then use TypeScript’s various built-in utilities to generate the variants on those types you’ll inevitably need throughout your application.
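To make that concrete, here is a minimal sketch of the derived types in use. The in-memory `recipes` map and the `addRecipe`/`updateRecipe` helpers are hypothetical, made up purely for illustration; only the three type definitions come from the snippet above.

```typescript
// Ground-truth type: every other type in this sketch is derived from it.
type Recipe = { id: string; title: string; description?: string };
type AddRecipe = Omit<Recipe, "id">;
type UpdateRecipe = Partial<Recipe> & { id: string };

// Hypothetical in-memory store, only here to exercise the derived types.
const recipes = new Map<string, Recipe>();
let nextId = 1;

// Callers supply everything except the id, which the store assigns.
function addRecipe(input: AddRecipe): Recipe {
  const recipe: Recipe = { id: String(nextId++), ...input };
  recipes.set(recipe.id, recipe);
  return recipe;
}

// Callers must supply the id; every other field is optional.
function updateRecipe(patch: UpdateRecipe): Recipe {
  const existing = recipes.get(patch.id);
  if (!existing) throw new Error(`No recipe with id ${patch.id}`);
  const updated: Recipe = { ...existing, ...patch };
  recipes.set(updated.id, updated);
  return updated;
}

const created = addRecipe({ title: "Pancakes" });
const renamed = updateRecipe({ id: created.id, description: "Fluffy" });
```

If a field is later added to `Recipe`, `AddRecipe` and `UpdateRecipe` pick it up without any further edits, and the compiler flags every call site that needs to change.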
I started thinking: how should we describe the fundamental difference between the GraphQL approach and the “derived types” approach? What’s the word for this property of systems where most types are defined as variants of our core data models, instead of being redundantly redeclared? Unfortunately, the best term I can come up with is:
GROUND TRUTHINESS

Ground truthiness is the extent to which types in your system are derived from a “ground truth” source, usually the data types enforced by your database.
A way to think about this: your database is the font of all type safety in your application, so you should be able to trace the ancestry of your types back to it to the greatest extent possible. Otherwise you’re building a system whose fragility increases with complexity.
Over the course of solving this problem, I’ve tumbled down a deep and terrifying rabbit hole. Now I’ve resurfaced with some solutions, a couple still-unsolved problems, and a general principle that I think could change the way we architect systems as TypeScript developers.
Subproblem #1: defining types

Solution: I built Zod. It’s a schema validation library: you build up types from primitives, data structures, and logical operators. It has the methods you’d want for generating common variants of your types. You can do “subtype” refinements: runtime checks that can’t be expressed in the TypeScript type system. Then I wrote up a not at all presumptuous blog post, posted it to HN, and inexplicably got a few hundred stars.
This section of the talk will be a condensed version of this post: https://vriad.com/essays/zod
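The core idea can be sketched without any library: a schema is a runtime parser that also carries its static type, so the type is inferred from the validator and never written twice. What follows is a toy illustration and not Zod’s implementation; `Schema`, `Infer`, `str`, `object`, and `refine` are all hypothetical names invented for this sketch.

```typescript
// Toy schema: a runtime parser that carries its static output type T.
interface Schema<T> {
  parse(input: unknown): T;
}

// Recover the static type from a schema value.
type Infer<S> = S extends Schema<infer T> ? T : never;
type ObjectOf<Shape> = { [K in keyof Shape]: Infer<Shape[K]> };

const str: Schema<string> = {
  parse(input) {
    if (typeof input !== "string") throw new Error("Expected string");
    return input;
  },
};

// Build an object schema from a record of field schemas.
function object<Shape extends Record<string, Schema<unknown>>>(
  shape: Shape
): Schema<ObjectOf<Shape>> {
  return {
    parse(input) {
      if (typeof input !== "object" || input === null) {
        throw new Error("Expected object");
      }
      const source = input as Record<string, unknown>;
      const result: Record<string, unknown> = {};
      for (const key of Object.keys(shape)) {
        result[key] = shape[key].parse(source[key]);
      }
      // The loop above validated every field, so this assertion is safe.
      return result as unknown as ObjectOf<Shape>;
    },
  };
}

// A "subtype" refinement: a runtime check the type system cannot express.
function refine<T>(
  schema: Schema<T>,
  check: (value: T) => boolean,
  message: string
): Schema<T> {
  return {
    parse(input) {
      const value = schema.parse(input);
      if (!check(value)) throw new Error(message);
      return value;
    },
  };
}

const Recipe = object({
  id: str,
  title: refine(str, (s) => s.length > 0, "Title must not be empty"),
});

// Inferred from the validator: { id: string; title: string }
type Recipe = Infer<typeof Recipe>;

const parsed: Recipe = Recipe.parse({ id: "1", title: "Pancakes" });
```

The non-empty-title rule lives only in the validator; the static type stays `string`, which is exactly the gap that refinements fill.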
Subproblem #2: defining variants of types

Yup, Superstruct, and io-ts are all popular, well-built libraries…that have roughly the expressive power of TypeScript circa December 2016. That’s when TypeScript 2.1 introduced Partial and Pick into core. As a point of reference, this is the same release that brought async/await to ES3 and ES5 targets. Given how early on these features were implemented, they clearly represent patterns that are commonly required. So why don’t any of the type validation libraries support them?
What other utilities are they missing?

Pick/omit
Partial
Merging two objects
Object extension (overwriting properties or adding new ones)
.scalars(): omit all keys that don’t correspond to primitive values
.relations(): omit all keys that do correspond to primitive values

Zod supports all of these: https://github.com/vriad/zod#objects
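To show what runtime counterparts of Pick and Partial might look like, here is a small self-contained sketch. It is not how Zod implements them: `Schema`, `object`, `pick`, and `partial` are hypothetical toy helpers, and this toy `partial` widens each field to `string | undefined` instead of making the key optional.

```typescript
// Toy stand-ins, not Zod's real types: a schema is a parser carrying its output type.
interface Schema<T> {
  parse(input: unknown): T;
}
type Shape = Record<string, Schema<unknown>>;
type Infer<S> = S extends Schema<infer T> ? T : never;
type ObjectOf<S> = { [K in keyof S]: Infer<S[K]> };

const str: Schema<string> = {
  parse(input) {
    if (typeof input !== "string") throw new Error("Expected string");
    return input;
  },
};

// Build an object schema from a shape; keys outside the shape are dropped.
function object<S extends Shape>(shape: S): Schema<ObjectOf<S>> {
  return {
    parse(input) {
      if (typeof input !== "object" || input === null) {
        throw new Error("Expected object");
      }
      const source = input as Record<string, unknown>;
      const result: Record<string, unknown> = {};
      for (const key of Object.keys(shape)) {
        result[key] = shape[key].parse(source[key]);
      }
      return result as unknown as ObjectOf<S>;
    },
  };
}

// Runtime counterpart of Pick<T, K>: keep only the listed fields of a shape.
function pick<S extends Shape, K extends keyof S & string>(
  shape: S,
  keys: K[]
): Pick<S, K> {
  const picked: Record<string, unknown> = {};
  for (const key of keys) picked[key] = shape[key];
  return picked as unknown as Pick<S, K>;
}

// Runtime counterpart of Partial<T>: every field also accepts undefined.
type PartialShape<S> = { [K in keyof S]: Schema<Infer<S[K]> | undefined> };
function partial<S extends Shape>(shape: S): PartialShape<S> {
  const loosened: Record<string, unknown> = {};
  for (const key of Object.keys(shape)) {
    const field = shape[key];
    loosened[key] = {
      parse: (input: unknown) =>
        input === undefined ? undefined : field.parse(input),
    };
  }
  return loosened as unknown as PartialShape<S>;
}

const recipeShape = { id: str, title: str, description: str };
const RecipeTitle = object(pick(recipeShape, ["title"])); // Schema<{ title: string }>
const RecipePatch = object(partial(recipeShape)); // each field: string | undefined

const titleOnly = RecipeTitle.parse({ id: "1", title: "Soup" });
const patch = RecipePatch.parse({ title: "Stew" });
```

Because the variants are computed from `recipeShape`, both the runtime validators and their static types stay in sync with the ground-truth definition.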
Subproblem #3: defining your ground truth types

There’s one final hurdle to achieving maximum ground truthiness. TypeScript doesn’t let you infer recursive types. There’s no way around this. Believe me, I’ve tried. Anders is smarter than me, and you just can’t trick TS into doing this.
Let’s look at some approaches. I’m using Zod syntax here.
Naive approach

// Fails: Post is referenced before it is declared.
const User = z.object({ posts: z.array(Post) });
const Post = z.object({ author: User });
Lazy types
const User = z.lazy(()=>z.object({ posts: z.array(Post) }))
const Post = z.lazy(()=>z.object({ author: User }))
You get a type error: 'User' implicitly has type 'any' because it does not have a type annotation and is referenced directly or indirectly in its own initializer.
Casting to base class

type User = { posts: Post[] };
type Post = { author: User };
const User: BaseType<User> = z.lazy(()=>z.object({ posts: z.array(Post) }))
const Post: BaseType<Post> = z.lazy(()=>z.object({ author: User }))
This is the solution used by every library that supposedly supports recursive types.
If you squint at this, it looks like we’ve pulled this off. The “inferred” type is the proper recursive type and this works at runtime too! But because you cast to the BaseType class, you lose access to all the great object methods we built in Subproblem #2.
This has been the state of the art for the past three years. No library offered a way of solving this problem. The problem itself is weird and unintuitive. It’s the opposite of what these libraries are intended for. Normally they let you chain and compose a bunch of methods and functions, then you get your static type “for free”. The static type is inferred from your code. But here we’re starting with the static type (defined manually by you) and inferring the structure of the Zod validator that implements that type.
Solution: I built a utility that I call “toZod” that solves this exact problem. If we try to give it a validator that doesn’t actually match the type, we get an error: https://github.com/vriad/tozod
We’ve achieved the holy grail of ground truthiness! We now have a validator that captures the full complexity of our types AND lets us treat our models as object types. This has never been possible before.
Even better, you can define any refinements you like. Your IDs are UUIDs? Well, drop it into your schema and that constraint is enforced throughout your entire application.
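The “start from the static type, derive the validator’s structure” idea can be sketched with a recursive conditional type. This is a toy under stated assumptions, not toZod’s actual implementation: `SchemaFor`, `StringSchema`, `ArraySchema`, `ObjectSchema`, `lateObject`, and `validate` are all hypothetical names, and only strings, arrays, and objects are handled.

```typescript
// Toy schema descriptors (hypothetical; not Zod's or toZod's real classes).
interface StringSchema { kind: "string" }
interface ArraySchema<E> { kind: "array"; element: E }
interface ObjectSchema<S> { kind: "object"; shape: () => S }
type AnySchema = StringSchema | ArraySchema<any> | ObjectSchema<any>;

// Given a static type T, compute the schema structure that must implement it.
type SchemaFor<T> = T extends string
  ? StringSchema
  : T extends (infer E)[]
  ? ArraySchema<SchemaFor<E>>
  : T extends object
  ? ObjectSchema<{ [K in keyof T]: SchemaFor<T[K]> }>
  : never;

const str: StringSchema = { kind: "string" };

function array<E>(element: E): ArraySchema<E> {
  return { kind: "array", element };
}

// The shape is wrapped in a function so schemas can refer to each other
// before both are initialized; it is only called during validation.
function lateObject<S>(shape: () => S): ObjectSchema<S> {
  return { kind: "object", shape };
}

// Walk a descriptor and check a value against it.
function validate(schema: AnySchema, input: unknown): boolean {
  if (schema.kind === "string") return typeof input === "string";
  if (schema.kind === "array") {
    const element: AnySchema = schema.element;
    return Array.isArray(input) && input.every((item) => validate(element, item));
  }
  if (typeof input !== "object" || input === null) return false;
  const shape: Record<string, AnySchema> = schema.shape();
  const source = input as Record<string, unknown>;
  return Object.keys(shape).every((key) => validate(shape[key], source[key]));
}

// The ground-truth types are written by hand...
type User = { name: string; posts: Post[] };
type Post = { title: string; author: User };

// ...and the annotation forces each validator to match them exactly.
const User: SchemaFor<User> = lateObject(() => ({ name: str, posts: array(Post) }));
const Post: SchemaFor<Post> = lateObject(() => ({ title: str, author: User }));

// A mismatch is rejected by the compiler, because `author` must be an object schema:
// const Bad: SchemaFor<Post> = lateObject(() => ({ title: str, author: str }));

const ok = validate(User, {
  name: "Ada",
  posts: [{ title: "Hello", author: { name: "Ada", posts: [] } }],
});
```

The explicit `SchemaFor<...>` annotations are what break the circular inference: neither constant’s type depends on its own initializer, yet each keeps its full object structure instead of collapsing to a base class.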
Onwards and upwards

This capability is a big deal, and will let us manage ever greater complexity without sacrificing robustness or developer experience.
If you want to try out Zod, it’s on npm (yarn add zod). toZod is in its own package currently since it requires TypeScript 3.9+ (yarn add tozod).
As a sneak preview of the kind of tools you can build on top of this, here’s a little screenshot of a framework I’m working on for building APIs on top of Zod: https://github.com/vriad/zod-rpc. I’m currently using it in production and let me tell you: it is glorious. I think an equivalent library for building GraphQL APIs would change the game. That’s left as an exercise for the reader :)