@netshade
Created July 21, 2014 14:20
Thoughts About Data At Rest

Something I keep noticing lately when game developers talk about data is that they deal with it in distinct phases, namely:

  • Data when it must be written to (designed)
  • Data when it is being read (played)

And they treat the data in very different ways. During the design phase, data is expressly flexible and transformable. Store it in XML? Dynamically allocate all the things? Objects whose properties can be discovered at runtime? All acceptable ideas, because efficiency isn't the /point/ during this phase.

But the transition from design phase to play phase will traditionally have a baking step, wherein the data as designed is turned into a very efficient (and mechanically simple) representation. What was once a 2MB XML file is now a 200KB packed binary representation that can be mmap'd and read directly as structs.
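To make the bake step concrete, here is a minimal C sketch under some loud assumptions: the layout (a 12-byte header followed by fixed-size records) is made up, and `BakedHeader`, `BakedEntity`, `BAKED_MAGIC`, `bake_entities`, and `baked_view` are hypothetical names, not anything from a real engine. The blob is built in memory here; the same bytes could just as well come from an mmap'd file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical baked layout: one header, then `count` fixed-size records. */
#define BAKED_MAGIC 0x44414B42u /* arbitrary tag chosen for this sketch */

typedef struct {
    uint32_t magic;
    uint32_t version;
    uint32_t count;
} BakedHeader;

typedef struct {
    uint32_t id;
    int32_t  x;
    int32_t  y;
    uint32_t hit_points;
} BakedEntity;

/* Fail the build, not the player, if the layout drifts. */
static_assert(sizeof(BakedHeader) == 12, "header layout changed");
static_assert(sizeof(BakedEntity) == 16, "record layout changed");

/* Bake step: write header + records into `out`.
 * Returns the number of bytes written, or 0 if `out` is too small. */
size_t bake_entities(const BakedEntity *src, uint32_t count,
                     void *out, size_t out_size)
{
    size_t need = sizeof(BakedHeader) + (size_t)count * sizeof(BakedEntity);
    if (out_size < need) return 0;
    BakedHeader h = { BAKED_MAGIC, 1u, count };
    memcpy(out, &h, sizeof h);
    memcpy((unsigned char *)out + sizeof h, src,
           (size_t)count * sizeof(BakedEntity));
    return need;
}

/* Load step: validate once, then hand back a read-only view.
 * No parsing and no allocation. Assumes `blob` is at least 4-byte aligned
 * and was baked on a machine with the same endianness. */
const BakedEntity *baked_view(const void *blob, size_t blob_size,
                              uint32_t *count_out)
{
    BakedHeader h;
    if (blob_size < sizeof h) return NULL;
    memcpy(&h, blob, sizeof h);
    if (h.magic != BAKED_MAGIC || h.version != 1u) return NULL;
    if ((blob_size - sizeof h) / sizeof(BakedEntity) < h.count) return NULL;
    *count_out = h.count;
    return (const BakedEntity *)((const unsigned char *)blob + sizeof h);
}
```

Once `baked_view` returns non-NULL, the rest of the program indexes the records directly; all the checking happened once, at the boundary.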

The efficiency in this case isn't the thing I'm interested in, however. It's the guarantee of representation and the understanding that data as it enters the program is exactly the representation that your code expects.

In the domain of web apps, a contrived example that illustrates an opposing philosophy:

An application categorizes its users into one of several application-defined categories ("happy", "sad", "obtuse"). Once categorized, a user's category will never change. The list of categories is mutable but non-destructive (the list may only grow). It is decided that the user's category will be stored as an integer index of the category's position in the master list. When the category must be consulted, the index will be used to look up the value as necessary. This decision was made to support changes in development where the actual text of the category may be changed.

That decision, flexibility in the program's behavior to support changes at development time, creates a very real cost where production code must keep in mind the existence of this list. Even if it is agreed that the list may only grow, the list must still exist in production systems, must still either account for or intentionally ignore the unavailability of the list due to disk failure, etc. etc. etc.

These tweets:

  1. https://twitter.com/netshade/status/490850024062844928
  2. https://twitter.com/netshade/status/490850335557042176
  3. https://twitter.com/netshade/status/490850640545869826

illustrated my frustration with making decisions similar to this, and requiring transformations of the data outside of the responsible object. The data as it enters the program should be the data as it's intended to be used; if any transformation of the data is required outside the piece of the program responsible for loading the data in the first place, it seems like a failure on my part to understand the problem domain wholly.
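Applied to the category example, the difference might look like the following C sketch. Everything here is hypothetical (`CATEGORIES`, `IndexedUser`, `ResolvedUser`, `resolve_user`); it only contrasts carrying the index around with resolving it once, inside the loader.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical master list; in the scenario above it may only grow. */
static const char *const CATEGORIES[] = { "happy", "sad", "obtuse" };
#define CATEGORY_COUNT (sizeof CATEGORIES / sizeof CATEGORIES[0])

/* Indexed form: every reader needs the list, and every reader has to
 * decide what an out-of-range index means. */
typedef struct {
    unsigned id;
    size_t   category_index;
} IndexedUser;

const char *indexed_user_category(const IndexedUser *u)
{
    if (u->category_index >= CATEGORY_COUNT) return NULL; /* stale or missing list */
    return CATEGORIES[u->category_index];
}

/* Resolved form: the lookup happens once, at load time. After that the
 * record is self-contained and nothing downstream touches the list. */
typedef struct {
    unsigned id;
    char     category[16];
} ResolvedUser;

/* Returns 0 on success, -1 if the index cannot be resolved. */
int resolve_user(const IndexedUser *in, ResolvedUser *out)
{
    const char *name = indexed_user_category(in);
    if (name == NULL || strlen(name) >= sizeof out->category) return -1;
    out->id = in->id;
    strcpy(out->category, name); /* length checked just above */
    return 0;
}
```

With the resolved form, the failure case (bad index, missing list) exists in exactly one place, the loader, rather than at every call site that wants the category text.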

Something that's really nice about C is that you can use tools like xxd to output your data structures, at compile time, in the format they will be used in; you have a known constant of memory usage, no allocation after startup, and no transformation of the data during runtime. If the data was right when you loaded it, it's still right. (caveats about not altering that data, accidentally writing to the wrong memory, etc. etc.)
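As a rough illustration: `xxd -i` emits a C array plus a length variable named after the input file. The sketch below assumes a hypothetical `categories.bin` holding three 8-byte, NUL-padded names; the bytes were typed by hand in the shape xxd produces (the comments are mine, xxd does not emit them), and `CATEGORY_WIDTH`, `category_count`, and `category_at` are made-up helpers.

```c
#include <stddef.h>

/* Shape of what `xxd -i categories.bin` would emit for the assumed file. */
unsigned char categories_bin[] = {
  0x68, 0x61, 0x70, 0x70, 0x79, 0x00, 0x00, 0x00, /* "happy"  */
  0x73, 0x61, 0x64, 0x00, 0x00, 0x00, 0x00, 0x00, /* "sad"    */
  0x6f, 0x62, 0x74, 0x75, 0x73, 0x65, 0x00, 0x00  /* "obtuse" */
};
unsigned int categories_bin_len = 24;

/* Assumed record width: each name is NUL-padded to 8 bytes. */
#define CATEGORY_WIDTH 8u

size_t category_count(void)
{
    return categories_bin_len / CATEGORY_WIDTH;
}

/* Returns a pointer straight into the compiled-in bytes, or NULL when
 * `i` is out of range. Nothing is copied or allocated. */
const char *category_at(size_t i)
{
    if (i >= category_count()) return NULL;
    return (const char *)&categories_bin[i * CATEGORY_WIDTH];
}
```

The data lives in the binary itself, so there is no file to go missing at runtime and the memory cost is known when the program is linked.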

I guess I should acknowledge that this philosophy sounds really good when taking into account a non-time-based view of the development of the app. We need flexibility in development because some problems can't be wholly known in any cost-efficient way. It just feels like this necessity (development-time agility) introduces a false requirement at runtime; that maybe some tasks in development should be hard because they satisfy the requirements of a comparatively small population (developers right now) at the expense of a much larger population (developers in the future, production users, etc.).

Anyways, just thoughts.

@caindy

caindy commented Jul 21, 2014

Metaprogramming?

@netshade
Author

I feel like metaprogramming puts the guarantee in a different place, and it's not really a guarantee since it can change over time. I've definitely seen good uses of it, but I've been thinking the thing I really want is a codified "bake" phase before deployment, much like asset compilation or the like, wherein static data structures are put in the most appropriate form. For databases I know it's a lot harder, and to be truthful I'm not sure what I want here (if anything). All half-formed thoughts right now. :)

@caindy

caindy commented Jul 21, 2014

I was just thinking of metaprogramming as a generalized solution to the mismatch between design-time expressivity and run-time mechanical sympathy/compatibility. One recent example that came to mind is the way the Elixir folks do routing. BEAM (Erlang and Elixir's VM) does super fast pattern matching on function heads (its basic dispatch mechanism). So, they use macros to define routes in their web frameworks that expand to a long list of functions. Their Unicode support actually uses the Unicode spec directly to do the same thing.

I read a blog post recently that talked about using metaprogramming in Julia to make their date calculations 10x faster (leap seconds).

Anyway, yeah, you gotta know what it is you want to expand to eventually, but I feel like having a compile time "baking" step (code that generates code) is awesome. I'm inferring another axis of your concerns was around DTOs/messaging between "modules" (scare quotes intended to introduce ambiguity). This is a thing that is near and dear to my heart as you well know. I'm following a discussion on adopting JSON-LD + Hydra, and I just don't have the stomach for the rehashing of CORBA/WSDL/ad nauseam.
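For a rough C analogue of "code that generates code" (not how the Elixir frameworks do it, just the same idea through the C preprocessor), an X-macro can expand one route list into both an enum and a chain of comparisons at compile time. The route names and paths below are hypothetical.

```c
#include <string.h>

/* Single source of truth for the routes (hypothetical names and paths). */
#define ROUTES(X)            \
    X(ROUTE_HOME,  "/")      \
    X(ROUTE_USERS, "/users") \
    X(ROUTE_ABOUT, "/about")

/* Expansion 1: one enum constant per route, plus a trailing count. */
#define AS_ENUM(id, path) id,
typedef enum { ROUTES(AS_ENUM) ROUTE_COUNT } Route;
#undef AS_ENUM

/* Expansion 2: one comparison per route, generated by the preprocessor.
 * Returns ROUTE_COUNT when nothing matches. */
Route match_route(const char *p)
{
#define AS_MATCH(id, path) if (strcmp(p, path) == 0) return id;
    ROUTES(AS_MATCH)
#undef AS_MATCH
    return ROUTE_COUNT;
}
```

Adding a route means adding one line to `ROUTES`; the enum and the matcher are regenerated together, so they cannot drift apart.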

I like BED which isn't actually a thing yet, but yeah, a self-descriptive "universal" binary layout for messages is something I want. I'm also seriously looking at Iris and Kestrel for messaging between "modules" (hyperobjects). I'm anticipating the big news of Transit from Rich Hickey tomorrow.

@netshade
Author

Ah yeah, I see your point there, and totally agreed; in that sense (reflection, but more as macro-level introspection) it makes sense. I checked out the Transit stuff and that is equally cool. Something I've been really wanting to check out lately is Cap'n Proto as well, which falls more on the efficiency side of this discussion.
