@netshade
Created July 21, 2014 14:20
Thoughts About Data At Rest

So this thing has been happening a lot lately as I observe game developers talk about data. They have multiple phases to how data is dealt with, namely:

  • Data when it must be written to (designed)
  • Data when it is being read (played)

And they treat the data in very different ways. During the design phase, data is expressly flexible and transformable. Store it in XML? Dynamically allocate all the things? Objects whose properties can be discovered at runtime? All acceptable ideas, because efficiency isn't the /point/ during this phase.

But the transition from design phase to play phase traditionally includes a baking step, wherein the data as designed is turned into a very efficient (and mechanically simple) representation. What was once a 2MB XML file is now a 200KB packed binary representation that can be mmap'd and read directly as structs.
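To make the baking idea concrete, here is a minimal sketch in C, assuming a POSIX system and assuming the file is baked and loaded on machines with the same ABI and endianness. The `BakedEntity` struct, its fields, and the function names `bake_entities` and `load_entities` are hypothetical and only for illustration; real pipelines usually add a header, a version, and alignment handling.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical baked record: fixed size, no pointers, so the bytes on
   disk are already the layout the play-phase code expects. */
typedef struct {
    uint32_t id;
    float    speed;
    char     name[24];
} BakedEntity;

/* Guard the layout assumption at compile time. */
static_assert(sizeof(BakedEntity) == 32, "BakedEntity layout changed");

/* Bake step: write the records out verbatim.
   Returns 0 on success, -1 on failure. */
int bake_entities(const char *path, const BakedEntity *src, size_t count) {
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t written = fwrite(src, sizeof *src, count, f);
    int closed = fclose(f);
    return (written == count && closed == 0) ? 0 : -1;
}

/* Play-time load: map the file read-only and treat it as an array of
   structs. Returns NULL on failure; *count receives the record count. */
const BakedEntity *load_entities(const char *path, size_t *count) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0 ||
        (size_t)st.st_size % sizeof(BakedEntity) != 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED) return NULL;

    *count = (size_t)st.st_size / sizeof(BakedEntity);
    return p;
}
```

Note that `load_entities` does no parsing at all: the only checks are that the file exists and that its size is a whole number of records. Everything else was decided at bake time.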

The efficiency in this case isn't the thing I'm interested in, however. It's the guarantee of representation and the understanding that data as it enters the program is exactly the representation that your code expects.

In the domain of web apps, a trivial, contrived example that illustrates an opposing philosophy:

An application categorizes its users into one of several application-defined categories ( "happy", "sad", "obtuse" ). Once categorized, a user's category will never change. The list of categories is mutable but non-destructive ( the list may only grow ). It is decided that the user's category will be stored as an integer index of the category's position in the master list. When the category must be consulted, the index will be used to look up the value as necessary. This decision was made to support changes in development where the actual text of the category may be changed.

That decision, flexibility in the program's behavior to support changes at development time, creates a very real cost where production code must keep in mind the existence of this list. Even if it is agreed that the list may only grow, the list must still exist in production systems, must still either account for or intentionally ignore the unavailability of the list due to disk failure, etc. etc. etc.
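A rough sketch of that cost, written in C to match the rest of these notes rather than in any web stack. All of the names here (`CATEGORIES`, `IndexedUser`, `BakedUser`, and the two accessor functions) are hypothetical. The second half shows one possible alternative, storing the category text with the user at write time, which is my assumption about what a "baked" version could look like and which gives up the ability to rename a category later.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical master list; append-only, per the rules above. */
static const char *const CATEGORIES[] = { "happy", "sad", "obtuse" };
static const size_t CATEGORY_COUNT = sizeof CATEGORIES / sizeof CATEGORIES[0];

/* Indexed design: the stored user only carries a position in the list. */
typedef struct {
    unsigned id;
    unsigned category_index;
} IndexedUser;

/* Every reader must be handed the list and must decide what to do when
   the list is unavailable (NULL here) or the index is out of range. */
const char *indexed_category(const IndexedUser *u,
                             const char *const *list, size_t list_len) {
    assert(u != NULL);
    if (list == NULL || u->category_index >= list_len) return NULL;
    return list[u->category_index];
}

/* Baked design (an assumption, not something the example app does):
   the category text is resolved once, when the user is written. */
typedef struct {
    unsigned id;
    char     category[16];
} BakedUser;

/* No list, no lookup, no failure mode beyond the record itself. */
const char *baked_category(const BakedUser *u) {
    assert(u != NULL);
    return u->category;
}
```

The difference shows up in the signatures: `indexed_category` needs the list and can return NULL, so every caller inherits that failure case, while `baked_category` only needs the record.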

These tweets:

  1. https://twitter.com/netshade/status/490850024062844928
  2. https://twitter.com/netshade/status/490850335557042176
  3. https://twitter.com/netshade/status/490850640545869826

illustrated my frustration with making decisions similar to this, and requiring transformations of the data outside of the responsible object. The data as it enters the program should be the data as it's intended to be used; if any transformation of the data is required outside the piece of the program responsible for loading the data in the first place, it seems like a failure on my own part to understand the problem domain wholly.

Something that's really nice about C is you can use tools like xxd to output your data structures, at compile time, in the format in which they will be used; you have a known constant of memory usage, no allocation after startup, and no transformation of the data during runtime. If the data was right when you loaded it, it's still right. (caveats about not altering that data, accidentally writing to the wrong memory, etc. etc.)
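As a small illustration, the array below is written by hand in the shape that `xxd -i categories.bin` emits (an `unsigned char` array plus a `_len` variable named after the file). The file name `categories.bin`, the 8-byte fixed record width, and the helper functions are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed record format: each category is a NUL-padded 8-byte field. */
#define CATEGORY_WIDTH 8

/* Shape of `xxd -i categories.bin` output for a 24-byte file holding
   "happy", "sad", "obtuse" as three fixed-width records. */
unsigned char categories_bin[] = {
  0x68, 0x61, 0x70, 0x70, 0x79, 0x00, 0x00, 0x00, 0x73, 0x61, 0x64, 0x00,
  0x00, 0x00, 0x00, 0x00, 0x6f, 0x62, 0x74, 0x75, 0x73, 0x65, 0x00, 0x00
};
unsigned int categories_bin_len = 24;

/* Fail the build, not the running program, if the data is malformed. */
static_assert(sizeof categories_bin % CATEGORY_WIDTH == 0,
              "categories_bin must hold whole records");

size_t category_count(void) {
    return categories_bin_len / CATEGORY_WIDTH;
}

/* Returns a pointer straight into the embedded bytes: no allocation,
   no copy, no parsing. NULL if the index is out of range. */
const char *category_name(size_t index) {
    if (index >= category_count()) return NULL;
    return (const char *)&categories_bin[index * CATEGORY_WIDTH];
}
```

In a real build the array would come from running `xxd -i` in the build step and including the result, so the check on the data's shape happens before the program ever starts.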

I guess I should acknowledge that this philosophy sounds really good when taking into account a non-time-based view of the development of the app. We need flexibility in development because some problems aren't able to be known wholly in any cost efficient way. It just feels like this necessity (development time agility) introduces a false requirement at runtime; that maybe some tasks in development should be hard because they satisfy the requirements of a comparatively small population (developers right now) at the expense of a much larger population (developers in the future, production users, etc.).

Anyways, just thoughts.

@netshade
Author

Ah yeah, I see your point there, and totally agreed; in that sense ( reflection, but as more of a macro-level introspection ) it makes sense. I checked out the Transit stuff and that is equally cool. Something I've been really wanting to check out lately is Cap'n Proto as well, which falls more on the efficiency side of this discussion.
