@holyjak
Last active September 3, 2021 10:53

Notes from playing with Datomic

Many thanks to Francis Avila (favila) and others.

Enums: idents vs. type/keyword

The on-prem docs recommend using idents to represent enums, though this advice likely dates from before d/pull and attribute and entity predicates existed.

Ident

  • They are entities ⇒ you can attach extra info to them (as e.g. with Scala enum values) and make it queryable ⇒ strictly more powerful

  • d/pull, contrary to d/entity, returns them as entity IDs, not the keywords

  • you get a VAET index of them

  • because of the semantics of ident lookup you can rename them safely
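
To make the "entities with extra info" point concrete, here is a minimal sketch of an ident-based enum. Everything named `:colour/…` (including the `:colour/hex` attribute) is a hypothetical name invented for illustration; only `:person/favourite-colour` appears in these notes. The schema is split in two because an attribute has to be installed before a later transaction can use it.

```clojure
;; Transaction 1: install the attributes.
(def colour-attrs
  [{:db/ident       :person/favourite-colour
    :db/valueType   :db.type/ref          ; points at an enum entity
    :db/cardinality :db.cardinality/one}
   {:db/ident       :colour/hex           ; hypothetical extra info on each enum value
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one}])

;; Transaction 2: the enum values are ordinary entities that carry a :db/ident.
(def colour-enums
  [{:db/ident :colour/green :colour/hex "#00ff00"}
   {:db/ident :colour/blue  :colour/hex "#0000ff"}])

;; The extra info can be queried like any other attribute.
(def people-with-colour-hex
  '[:find ?person ?hex
    :where
    [?person :person/favourite-colour ?colour]
    [?colour :colour/hex ?hex]])

;; A pull pattern that asks for the keyword explicitly,
;; instead of relying on what pull returns for a bare ref.
(def person-colour-pattern
  [{:person/favourite-colour [:db/ident]}])
```

These are plain data; they would be passed to the usual transact, query, and pull calls of whichever Datomic client API is in use.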

Keyword pros

  • Can be checked with :db.attr/preds (an ident is stored as an entity ID, so it cannot be compared against a set of allowed keywords; that would need :db/ensure)

  • d/pull returns them as keywords
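
A minimal sketch of the keyword variant with an attribute predicate. It assumes that `:db.attr/preds` takes a fully qualified symbol naming a one-argument predicate, and that the function is loadable wherever transactions are validated. The namespace `my.app.preds` and the set of allowed colours are hypothetical.

```clojure
;; Hypothetical set of allowed enum values.
(def allowed-colours #{:blue :green :red})

(defn allowed-colour?
  "Attribute predicate: true when v is one of the allowed keywords."
  [v]
  (contains? allowed-colours v))

;; Attribute definition; the symbol must name the namespace where
;; allowed-colour? actually lives (my.app.preds is an assumption here).
(def favourite-colour-attr
  {:db/ident       :person/favourite-colour
   :db/valueType   :db.type/keyword
   :db/cardinality :db.cardinality/one
   :db.attr/preds  'my.app.preds/allowed-colour?})
```

With this in place, asserting a value such as `:purple` for `:person/favourite-colour` should be rejected, because the predicate returns false for it.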

Where <val> in <set>

How to write queries that check whether a particular attribute value is in a given set.

(comment
 ;; Set filtering cannot be introspected by the query engine
 ;; This can be good if the set is large
 ;; and there's no index Datomic could use
 ;; to retrieve matching datoms.
 ;; Evaluation cannot be parallel,
 ;; but the intermediate result set will be smaller
 ;; and none of the unification machinery will get involved.

 ;; As a literal:
 [:find ?e
   :where
   [?e :person/favourite-colour ?val]
   [(#{:blue :green} ?val)]]

 ;; As a parameter:
 [:find ?e
   :in $ ?allowed-val-set
   :where
   [?e :person/favourite-colour ?val]
   [(contains? ?allowed-val-set ?val)]]
 #{:green :blue}

 ;; Using unification
 ;; If you bind the items you are filtering by to a var
 ;; datalog will perform filtering implicitly via unification.
 ;; This is good if your filter value is indexed,
 ;; because now the query planner can see it
 ;; and possibly use better indexes or parallelize IO.
 ;; However, this may produce larger intermediate result sets
 ;; and consume more memory because of unification.

 [:find ?e
  :where
  ;; Could use an index
  [(ground [:green :blue]) [?val ...]]
  [?e :person/favourite-colour ?val]]

 [:find ?e
  :where
  ;; Reverse clause order:
  ;; Now it *probably doesn't* use an index?
  ;; Depends on how smart the planner is.
  ;; Worst-case, it's as bad as a linear exhaustive
  ;; equality check of each val
  ;; which may or may not be worse than a hash-lookup
  ;; depending on the size of the set.
  [?e :person/favourite-colour ?val]
  [(ground [:green :blue]) [?val ...]]]

 ;; As a parameter:
 [:find ?e
   :in $ [?val ...]
   :where
   [?e :person/favourite-colour ?val]]
 [:green :blue]

 ;; Use a rule with literals
 ;; In most cases this will be the same as the previous approach,
 ;; but without the "maybe"s because you don't need to trust the query planner.
 ;; This is the most explicit and predictable,
 ;; and definitely parallelizable (rules inherently are).
 ;; But you *must* use literal values.
 [:find ?e
  :in $
  :where
  (or [?e :person/favourite-colour :green]
      [?e :person/favourite-colour :blue])]


;; In any given case I would benchmark all three.

 )

Summary: There are three basic techniques (a set or contains? predicate, unification against a bound collection, and an or rule with literals), and their performance can differ dramatically depending on the situation.
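
Since the advice above is to benchmark all three, here is a rough timing sketch. The helper names (`candidate-queries`, `elapsed-ms`, `benchmark`) are hypothetical, and `run-query` is a function you supply yourself, for example `#(d/q % db)` if `d` is your Datomic API alias and `db` a database value. It measures wall-clock time only, so treat the numbers as indicative.

```clojure
(def candidate-queries
  "The three techniques from the notes, as query data."
  {:set-predicate '[:find ?e
                    :where
                    [?e :person/favourite-colour ?val]
                    [(#{:blue :green} ?val)]]
   :unification   '[:find ?e
                    :where
                    [(ground [:green :blue]) [?val ...]]
                    [?e :person/favourite-colour ?val]]
   :or-literals   '[:find ?e
                    :where
                    (or [?e :person/favourite-colour :green]
                        [?e :person/favourite-colour :blue])]})

(defn elapsed-ms
  "Calls the zero-argument function f once; returns elapsed milliseconds."
  [f]
  (let [start (System/nanoTime)]
    (f)
    (/ (- (System/nanoTime) start) 1e6)))

(defn benchmark
  "Runs every query `runs` times (runs >= 1) through run-query and returns
  a map of technique -> best observed time in milliseconds."
  [run-query queries runs]
  (into {}
        (for [[technique query] queries]
          [technique
           (apply min (repeatedly runs #(elapsed-ms (fn [] (run-query query)))))])))
```

Taking the minimum over several runs reduces noise from cold caches, but it also hides the cold-cache cost, which may be exactly what matters for some workloads.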

Notes on Datomic on-prem docs

Note
The Datomic Cloud docs seem better, e.g. things are interlinked etc.

The documentation is sometimes quite terse and some things are hard (impossible?) to find.

  • According to my tests, naming an attribute with _ at the beginning will cause problems (because it is used for reverse lookup) and should not be done. I would expect this to be mentioned in the :db/ident section of attributes. I was also unable to find the docs to confirm my vague memory of <ns>/_<attr name> being used for reverse lookups (it is under Pull, not in the query reference, which makes sense if I think about it. Also, using the docs' Search for 'reverse' helps, so perhaps this is more my bad searching skills than the docs' fault).

  • I would appreciate a guide for people who know SQL, explaining how to translate its constructs to Datalog. Mongo has a nice example. I especially struggled to find how to implement SQL’s WHERE <column> IN (val1, val2, …​) (it turns out you can use a set as a function, ground, or or, each with its own pros and cons). A FAQ page of "How to do X in Datalog/Datomic…​" would be very useful.

  • I would appreciate "best practices" / guidance for selecting the most appropriate of multiple approaches, such as for the where .. in …​ case
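
The reverse-lookup naming mentioned in the first bullet above can be sketched as follows. The helper names and the `:person/friend` and `:person/name` attributes are hypothetical; the only convention assumed is the one described in the Pull docs, where an underscore before the attribute's local name means "navigate this ref backwards".

```clojure
(require '[clojure.string :as str])

(defn reverse-attr
  "Returns the reverse-navigation form of a namespaced attribute keyword,
  e.g. :person/friend becomes :person/_friend."
  [attr]
  (keyword (namespace attr) (str "_" (name attr))))

(defn reverse-attr?
  "True when the attribute's local name starts with an underscore,
  i.e. when it would be read as a reverse lookup."
  [attr]
  (str/starts-with? (name attr) "_"))

;; A pull pattern asking for everyone who lists this person as a friend.
(def friends-of-pattern
  [:person/name {(reverse-attr :person/friend) [:person/name]}])
```

A check like `reverse-attr?` could be run over a schema before transacting it, to catch attribute names that would collide with the reverse-lookup syntax.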
