
@parsonsmatt
Created December 17, 2018 18:46
acid state antirecommendation

Someone asked whether or not acid-state was production ready. I shared my experiences:

parsonsmatt [11:32 AM] @nikolap it's used by cardano-wallet and Hackage. Based on my experience with acid-state, I'd say it is not a good choice for production data storage. For local desktop apps, SQLite is a much better choice, and for real production apps, PostgreSQL is king.

parsonsmatt [11:44 AM] acid-state did not have a test suite, at all, until I implemented the very first tests (for TemplateHaskell code generation) earlier this year. It has picked up some tests since then, but I'm still not confident in its correctness.

It claims to be resilient to unplugging the power cord, but I doubt that, as it's not resilient to Ctrl-C: acid-state/acid-state#79

Migration errors are extremely difficult to identify and debug. Modifying a function that has been run through makeAcidic, or any datatype involved in an acidic function, will cause problems recovering from a database unless you check all the SafeCopy instances (most of which will be derived) and write appropriate Migrate instances. None of this is compiler-checked, and it has caused a major and difficult-to-fix bug in almost every single release of cardano-wallet. You get an error message that is basically "Error decoding type: not enough bytes", where the type in question is whatever first fails to decode after you've added a field to some record or changed some function. It's awful.
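
(For illustration, a rough sketch of what the safecopy side of adding a field looks like; the type and field names here are made up, and this is just the standard safecopy pattern, not anything from cardano-wallet:)

{-# LANGUAGE TemplateHaskell, TypeFamilies #-}
module User where

import Data.SafeCopy

-- The persisted type as it existed in the previous release.
data User_V0 = User_V0 { userName_V0 :: String }
$(deriveSafeCopy 0 'base ''User_V0)

-- The current type, with one new field added.
data User = User { userName :: String, userEmail :: Maybe String }

-- Forgetting either the version bump below or this Migrate instance is not
-- a compile error; it only shows up as a decoding failure when an old store
-- is opened.
instance Migrate User where
  type MigrateFrom User = User_V0
  migrate (User_V0 n) = User n Nothing

$(deriveSafeCopy 1 'extension ''User)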

The data storage format is a binary blob, so you need to use Haskell to view or edit the data, and you need to use the right version of whatever acidic record you're using. If you fucked up the schema, this means git checkout old-commit-with-working-schema, recompiling, and reloading GHCi. With a different solution (eg a SQL database), you can view the raw data with eg the psql command-line client and run arbitrary queries on it. Since the on-disk data format has a schema, you can ask a SQL database whether or not the Haskell type is compatible, and if it isn't, you get a detailed table-vs-record incompatibility report.

acid-state strongly encourages a top-down nested tree sort of data storage. This is efficient if you work top-down, but if you need any kind of relational query, then your performance is going to be absolutely awful. Additionally, it uses Haskell functions on Haskell datatypes, whereas a dedicated database is going to be hyper optimized C code operating on super efficient data structures for queries. You can kinda recover SQL-like querying and performance by using ix-set and storing tables of data with foreign key references, but now you're just reinventing SQL poorly so that you can use less efficient data structures in Haskell.
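
(A rough sketch of that "tables with foreign keys in ix-set" shape, using the ixset package and made-up types:)

{-# LANGUAGE DeriveDataTypeable #-}
module Orders where

import Data.Data (Data, Typeable)
import Data.IxSet (Indexable (..), IxSet, ixFun, ixSet, (@=))

newtype UserId = UserId Int
  deriving (Eq, Ord, Show, Data, Typeable)

data Order = Order
  { orderId     :: Int
  , orderUserId :: UserId  -- hand-rolled "foreign key" into a separate IxSet of users
  } deriving (Eq, Ord, Show, Data, Typeable)

instance Indexable Order where
  empty = ixSet
    [ ixFun (\o -> [orderId o])      -- index on the primary key
    , ixFun (\o -> [orderUserId o])  -- index on the foreign key
    ]

-- Each "join" is a hand-written lookup against the next IxSet.
ordersFor :: UserId -> IxSet Order -> IxSet Order
ordersFor uid orders = orders @= uid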

Oh, and the entire record must be kept in memory. So if you have 13GB of application state, you need 13GB of memory, minimum, and this is all kept in live memory that the GC must walk over every cycle. So performance is bad compared to an on-disk solution. You can implement record sharding on top of acid-state to help this problem, but it's fragile and brittle and infects your data schema, and you might as well use something else.

The benefits are just... really small, and the costs are massive. It's not a good choice.

@mightybyte

Another challenge with migrations is that if you're using safecopy and make a change to a persisted data type, you write your migration as a function OldType -> NewType. This means that you have to keep both the old and new versions of your data type around. If you make a change to Foo, resulting in a FooV2 or V2.Foo data type, you then have to go change all the types that contain Foo to instead contain the new FooV2 or V2.Foo and you have to keep around the V2 versions of those types too. This can result in carrying around a lot of code for old data types.
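
(Schematically, with made-up types and the safecopy boilerplate omitted, the cascade looks like this:)

-- Old and new versions of the changed type both stay in the codebase.
data Foo   = Foo   { fooName :: String }
data FooV2 = FooV2 { fooName2 :: String, fooAge :: Int }

-- Anything that contains a Foo needs a V2 as well, plus its own migration.
data Bar   = Bar   { barFoo :: Foo }
data BarV2 = BarV2 { barFoo2 :: FooV2 }

migrateFoo :: Foo -> FooV2
migrateFoo (Foo n) = FooV2 n 0

migrateBar :: Bar -> BarV2
migrateBar (Bar f) = BarV2 (migrateFoo f)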

One way to deal with this is by doing a checkpoint and complete migration of the data, then deleting the old types and the migration functions. But if you do that, you run back into the git checkout old-commit-with-working-schema place mentioned above if you ever need to do anything with backups that are in the old format.
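
(The checkpoint approach is roughly: open the state, which replays the log and runs the Migrate instances, write a checkpoint in the new on-disk format, and close. openLocalState, createCheckpoint, and closeAcidState are the real Data.Acid functions; the state type below is a made-up stand-in:)

{-# LANGUAGE DeriveDataTypeable, TemplateHaskell, TypeFamilies #-}
module Main where

import Data.Acid (closeAcidState, createCheckpoint, makeAcidic, openLocalState)
import Data.SafeCopy (base, deriveSafeCopy)
import Data.Typeable (Typeable)

data AppState = AppState { counter :: Int }
  deriving (Typeable)
$(deriveSafeCopy 0 'base ''AppState)
$(makeAcidic ''AppState [])

main :: IO ()
main = do
  st <- openLocalState (AppState 0)  -- replays events, applying migrations
  createCheckpoint st                -- writes a checkpoint in the current format
  closeAcidState st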

@srid

srid commented Dec 19, 2018

@jonpetterbergman

V2.Foo

Yes, this pattern has emerged in my code.
When making updates to datatypes in module User, start by moving User to User_Vx, where x is the version number.
Then make a new module User which imports and re-exports all unchanged datatypes from User_Vx. Also import User_Vx qualified as Vx, which is used to refer to the old version of whatever you are changing. The Migrate instance will look like:

instance Migrate UserInfo where
  type MigrateFrom UserInfo = Vx.UserInfo
  migrate (Vx.UserInfo x y ...) = ...

You'll end up with User_V0, User_V1, User_V2 ... and User (current).
