
@levand
Last active May 19, 2023 16:38
Advice about data modeling in Clojure

Since it has come up a few times, I thought I’d write up some of the basic ideas around domain modeling in Clojure, and how they relate to keyword names and Specs. Firmly grasping these concepts will help us all write code that is simpler, cleaner, and easier to understand.

Clojure is a data-oriented language: we’re all familiar with maps, vectors, sets, keywords, etc. However, while data is good, not all data is equally good. It’s still possible to write “bad” data in Clojure.

“Good” data is well defined and easy to read; there is never any ambiguity about what a given data structure represents. Messy data has inconsistent structure, and overloaded keys that can mean different things in different contexts. Good data represents domain entities and a logical model; bad data represents whatever was convenient for the programmer at a given moment. Good data stands on its own, and can be reasoned about without any other knowledge of the codebase; bad data is deeply and tightly coupled to specific generating and consuming functions.
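To make the contrast concrete, here is a minimal sketch using two made-up maps. The names (`bad-account`, `good-customer`, and so on) are purely illustrative and do not come from any real codebase. With bare, overloaded keys, combining an account and a customer silently loses data. With namespaced keys, each attribute keeps its own meaning.

```clojure
;; Hypothetical example data; not taken from any real codebase.

;; "Bad" data: bare keys whose meaning depends on who produced the map.
(def bad-account  {:id 1001 :name "Checking"})
(def bad-customer {:id 7    :name "Ada Lovelace"})

;; Merging them silently overwrites the account's :id and :name.
(def bad-combined (merge bad-account bad-customer))
;; => {:id 7, :name "Ada Lovelace"}

;; "Good" data: namespaced keys that mean the same thing everywhere.
(def good-account  {:account/id 1001 :account/name "Checking"})
(def good-customer {:customer/id 7 :customer/name "Ada Lovelace"})

;; Merging keeps every attribute, because no two keys collide.
(def good-combined (merge good-account good-customer))
;; => {:account/id 1001, :account/name "Checking",
;;     :customer/id 7, :customer/name "Ada Lovelace"}
```

Merging is only one symptom. The same ambiguity shows up whenever a function has to guess which kind of `:id` it has been handed.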

I have found the following practices to be useful to maximize the “goodness” of your data in a Clojure program. These are not hard and fast rules, just heuristics for writing better code.

  • Make a mental distinction between “public” data structures and “private” data structures. Private data structures are used internally in function bodies, or passed to and returned from namespace-private functions. Public data structures are the inputs and outputs of the public functions in the namespace.

  • Because a programmer can be reasonably confident that they control both the production and consumption of “private” data, they can and should do whatever is most convenient: use non-namespaced keywords, tightly couple data to code, etc. There are no restrictions here other than expedience and developer preference.

  • Public data, in contrast, should:

    • Use namespaced keywords
    • Represent domain entities
    • Have specs
  • Every map in public data should correspond to an identifiable "domain entity type." This doesn't mean that the type needs to be externally defined in a database (although it may); just that the entity type is a logically independent concept and you should be prepared to:

    • Have conversations about it
    • Refer to it in documentation
    • Articulate invariants and expectations
      • Spec expected attributes via s/keys
    • Identify where in the codebase it is used

    Note that this mental exercise of defining an entity type is not necessarily a heavyweight process of exhaustively enumerating every type. It describes what should already be the case in any well-functioning codebase: there is a shared set of known domain concepts, each with a known representation in data. That's all -- nothing complicated.

  • The keys/attributes in each entity map should be namespaced keywords. Each attribute should uniquely identify the same concept, no matter where it is found in the codebase. For example, :account/number should mean the same thing everywhere: if you see an :account/number key, you should know everything about the type, format, and expected range of its value.

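  • Related to the above, each keyword should have exactly one spec, defined in exactly one place in the codebase. Specs are stored in a single global registry, so if the same keyword is defined twice, whichever definition is loaded last silently replaces the other, and which one wins depends on namespace load order.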

  • The namespace portion of a key/attribute should reflect the entity type upon which it may be found. For example, an Account entity map might have :account/name and :account/type keys. Any map containing an :account/<something> key can be presumed to represent an Account in the application's domain model, whether that type is defined explicitly or implicitly.

  • Entity types are not exclusive: data can "fit" multiple types. This is why it is idiomatic to have common attributes that can apply to multiple entity types. For example, instead of a specific :account/id, you might use a more generic :entity/id. This means that a map with :account/name and :entity/id keys is logically both an Account and an Entity.

    Be somewhat careful here not to reuse attributes just for the sake of reusing them. Make sure they represent the same logical concept, and can be used consistently across the codebase. In general, the rule of thumb is that if you ever find yourself writing an if or case statement to determine how to process an attribute, you probably should have used two different attributes to start with.

  • Related to the above, entities should never be nailed down too rigorously. To borrow language from formal logic, the set of permissible attributes for a given entity type is open. Just because an entity is of one type doesn't mean it can't also have additional attributes beyond those enumerated for that type. Code should respect this, and not break if given an entity with unexpected or unknown attributes -- they might be perfectly valid and meaningful for a different part of the system. "Be liberal in what you accept, be conservative in what you send."

  • Keyword namespaces can have multiple parts, separated by a period. The semantic implications of this are not strongly defined, but the following uses are all reasonable:

    • Full globally unique disambiguation, as is idiomatic for Java class names or RDF attributes. For example, :uk.co.bigbank.account/number
    • An application or subsystem prefix to disambiguate two different concepts with the same name, e.g., :tax.account/number vs. :client.account/number
    • Indicating a "containment" relationship in a substructure. For example, the map value of an :account/transaction might contain the keys :account.transaction/amount and :account.transaction/type. Be careful here, however: don't use this idiom if you might want to refer to the sub-entity independently of its container. Remember, the namespace should refer to the entity type. A person might have a :person/address map, but that doesn't mean you should use :person.address/postal-code. If an address is a meaningful entity regardless of who lives there, we probably want to talk about an Address, not a PersonAddress. Something like :address/postal-code is probably more appropriate.
  • Keyword names can be whatever makes sense for the attribute, with the caveat that because the entity type should already be expressed in the namespace, you ought not to re-encode it in the name. E.g., prefer keywords like :account/id to :entity/account-id.
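The public/private distinction can be sketched as follows. This is a minimal illustration, assuming a hypothetical `example.billing` namespace and made-up Invoice and Line Item entity types (`:invoice/total`, `:invoice/line-count`, `:line-item/price` are invented for the example). The private helper uses bare keywords freely; the public function accepts and returns namespaced, spec'd data.

```clojure
(ns example.billing
  (:require [clojure.spec.alpha :as s]))

;; Hypothetical attributes for the public Line Item and Invoice entities.
(s/def :line-item/price number?)
(s/def :invoice/total number?)
(s/def :invoice/line-count nat-int?)
(s/def ::invoice (s/keys :req [:invoice/total :invoice/line-count]))

;; Private: only this namespace produces and consumes this shape,
;; so bare keywords are fine.
(defn- summarize [line-items]
  {:sum (reduce + 0 (map :line-item/price line-items))
   :n   (count line-items)})

;; Public: the input and output are domain entities with namespaced keys.
(defn invoice
  "Builds an Invoice from a collection of Line Items."
  [line-items]
  (let [{:keys [sum n]} (summarize line-items)]
    {:invoice/total      sum
     :invoice/line-count n}))

;; Records the public contract. Nothing is checked at call time unless
;; you opt in via clojure.spec.test.alpha (e.g. instrument or check).
(s/fdef invoice
  :args (s/cat :line-items (s/coll-of (s/keys :req [:line-item/price])))
  :ret ::invoice)
```

The bare `:sum` and `:n` keys never escape `summarize`, which is what makes them acceptable under the guidelines above.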
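The attribute and entity-type guidelines can be sketched with clojure.spec. This is a minimal illustration under stated assumptions: the account-number format (an eight-digit string), the `:audit/source` key, and the `example.domain` namespace holding the entity-level specs are all hypothetical. It shows one spec per attribute, entity types expressed with `s/keys`, a single map fitting two types, and open maps tolerating unknown attributes.

```clojure
(ns example.domain
  (:require [clojure.spec.alpha :as s]))

;; One spec per attribute, registered exactly once. The registry is
;; global, so a second s/def for the same keyword would replace this one.
(s/def :entity/id pos-int?)
(s/def :account/name string?)
(s/def :account/number (s/and string? #(re-matches #"\d{8}" %))) ; assumed format
(s/def :account/type #{:checking :savings})

;; Entity types as open sets of attributes.
(s/def ::entity (s/keys :req [:entity/id]))
(s/def ::account (s/keys :req [:account/number :account/type]
                         :opt [:account/name]))

;; One map can be both an Entity and an Account, and may carry
;; attributes (:audit/source) that neither type mentions.
(def acct {:entity/id      42
           :account/number "12345678"
           :account/type   :savings
           :account/name   "Rainy day"
           :audit/source   "import-job"})

(s/valid? ::entity acct)   ;; => true
(s/valid? ::account acct)  ;; => true

;; s/keys also checks every other namespaced key that has a registered
;; spec, so :entity/id is validated consistently even here.
(s/valid? ::account (assoc acct :entity/id -1))  ;; => false
```

Because `s/keys` is open, the extra `:audit/source` key does not make `acct` invalid; code consuming an Account should be equally tolerant.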
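A short sketch of the multi-part namespace conventions, again with hypothetical attribute names, formats, and a made-up `example.naming` namespace. A transaction uses `:account.transaction/*` because it is only meaningful inside its account. An address uses `:address/*` because it is an entity in its own right, so one Address spec can be reused wherever an address appears.

```clojure
(ns example.naming
  (:require [clojure.spec.alpha :as s]))

;; Containment: these attributes only make sense inside an account's
;; :account/transaction value.
(s/def :account.transaction/amount number?)
(s/def :account.transaction/type #{:debit :credit})
(s/def :account/transaction
  (s/keys :req [:account.transaction/amount :account.transaction/type]))

;; Independent entity: an Address means the same thing whoever it
;; belongs to, so it gets its own :address namespace.
(s/def :address/street string?)
(s/def :address/postal-code string?)
(s/def ::address (s/keys :req [:address/street :address/postal-code]))

;; Different entities can point at the same Address type via spec aliases.
(s/def :person/name string?)
(s/def :person/address ::address)
(s/def :company/address ::address)
(s/def ::person (s/keys :req [:person/name :person/address]))

(def ada {:person/name    "Ada"
          :person/address {:address/street      "1 Example Street"
                           :address/postal-code "AB1 2CD"}})

(s/valid? ::person ada)                            ;; => true
(s/valid? :company/address (:person/address ada))  ;; => true
```

Had the postal code been named `:person.address/postal-code`, the company's address would have needed a second, duplicate set of attributes for the same concept.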

@mprokopov

mprokopov commented Mar 27, 2021

Nice summary!
I found it hard to change a codebase when something like "it's good to refactor this data" pops up in my mind. So yes, data modeling is a crucial part and should be taken seriously.
