acid state antirecommendation

Someone asked whether acid-state was production-ready. I shared my experiences:

parsonsmatt [11:32 AM] @nikolap it's used by cardano-wallet and Hackage. Based on my experience with acid-state, I'd say it is not a good choice for production data storage. For local desktop apps, SQLite is a much better choice, and for real production apps, Postgresql is king.

parsonsmatt [11:44 AM] acid-state did not have a test suite, at all, until I implemented the very first tests (for TemplateHaskell code generation) earlier this year. It has picked up some tests since then, but I'm still not confident in its correctness.

It claims to be resilient to unplugging the power cord, but I doubt that, as it's not resilient to Ctrl-C: https://github.com/acid-state/acid-state/issues/79

Migration errors are extremely difficult to identify and debug. Modifying a function that is makeAcidic-ified, or any datatype involved in an acidic function, will cause problems recovering from a database unless you check all SafeCopy instances (most of which will be derived) and write appropriate migration instances. None of this is compiler-checked, and it's caused a major and difficult-to-fix bug in almost every single release of cardano-wallet. You get an error message that is basically "Error decoding type: not enough bytes", where type is the first new field that fails to decode after you've added it to some record or function. It's awful.

The data storage format is a binary blob, so you need to use Haskell to view or edit the data, and you need to use the right version of whatever acidic record you're using. If you fucked up the schema, this means git checkout old-commit-with-working-schema, recompiling, and reloading GHCi. With a different solution (e.g. a SQL database), you can view the raw data with the psql command-line client and run arbitrary queries on it. Since the on-disk data format has a schema, you can ask a SQL database whether or not the Haskell type is compatible, and if it isn't, you get a detailed table-vs-record incompatibility report.

acid-state strongly encourages a top-down nested tree sort of data storage. This is efficient if you work top-down, but if you need any kind of relational query, then your performance is going to be absolutely awful. Additionally, it uses Haskell functions on Haskell datatypes, whereas a dedicated database is going to be hyper optimized C code operating on super efficient data structures for queries. You can kinda recover SQL-like querying and performance by using ix-set and storing tables of data with foreign key references, but now you're just reinventing SQL poorly so that you can use less efficient data structures in Haskell.
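To make the "reinventing SQL poorly" point concrete, here is a minimal sketch using plain Data.Map rather than ix-set (ix-set generalizes this with multiple automatic indexes); the names `Db`, `UserId`, `OrderId`, and `ordersFor` are hypothetical, not acid-state API:

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical schema: two "tables" with a foreign key, all held in one
-- in-memory value, as you might store in an acid-state component.
newtype UserId  = UserId Int  deriving (Eq, Ord, Show)
newtype OrderId = OrderId Int deriving (Eq, Ord, Show)

data Db = Db
  { users  :: Map.Map UserId String             -- id -> name
  , orders :: Map.Map OrderId (UserId, Double)  -- foreign key into users
  }

-- A relational-style query: all order amounts for one user. Without a
-- hand-maintained index on the foreign key this is a full scan of the
-- orders "table"; a SQL engine would plan this against an index, in
-- heavily optimized native code.
ordersFor :: UserId -> Db -> [Double]
ordersFor uid db =
  [ amt | (fk, amt) <- Map.elems (orders db), fk == uid ]
```

Every join and every secondary index here is code you write and maintain yourself, inside the same SafeCopy-versioned state that migrations have to track.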

Oh, and the entire record must be kept in memory. So if you have 13GB of application state, you need 13GB of memory, minimum, and this is all kept in live memory that the GC must walk over every cycle. So performance is bad compared to an on-disk solution. You can implement record sharding on top of acid-state to help this problem, but it's fragile and brittle and infects your data schema, and you might as well use something else.

The benefits are just... really small, and the costs are massive. It's not a good choice.

@kccqzy

kccqzy commented Dec 18, 2018

and this is all kept in live memory that the GC must walk over every cycle.

I agree with almost every point written here but this one. It's kind of a nitpick. It's true that RAM consumption will be high, sometimes as high as twice the amount of data you have. But GC time is not an issue if you use compact regions. This does mean yet another thing you have to learn to work around an issue caused by this library, so I don't think my nitpick detracts from your overall argument.
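For reference, a minimal sketch of the compact-regions approach; `AppState`, `loadState`, and `lookupEntry` are hypothetical names, and `GHC.Compact` comes from the `ghc-compact` package that ships with GHC since 8.2:

```haskell
import GHC.Compact (Compact, compact, getCompact)

-- Hypothetical large application state. To be compactable it must be
-- fully evaluated and contain no functions or mutable objects.
newtype AppState = AppState { entries :: [(Int, String)] }

-- Deep-copy the state into a compact region. The GC then treats the
-- whole region as a single object instead of walking every constructor
-- on each major collection.
loadState :: AppState -> IO (Compact AppState)
loadState = compact

-- Read access is an ordinary pure projection out of the region.
lookupEntry :: Int -> Compact AppState -> Maybe String
lookupEntry k c = lookup k (entries (getCompact c))
```

The trade-off: updating means compacting a new value (or appending with `compactAdd`), so this suits read-heavy state best.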

@jonpetterbergman

jonpetterbergman commented Dec 19, 2018

I won't try to "defend" acid-state, but just explain how I have dealt with some of these things:

as it's not resilient to Ctrl-C

Yes, you need to install your own handlers for Ctrl-C, and other signals as well.

Migration errors are extremely difficult to identify and debug. Modifying a function that is makeAcidic-ified or any datatype involved in an acidic function will cause problems recovering from a database, unless you check all SafeCopy instances (most of which will be derived) and make appropriate migration instances...

Whenever I deploy an upgrade, I make a checkpoint when shutting down the server (checkpoints are made in the signal handler). That takes care of makeAcidic-ified functions changing, or datatypes involved changing, since at the next startup no transaction replaying will occur. Before deploying I will have tested this on a copy of the production database.
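The shape of that setup can be sketched with the `unix` package; `installShutdownHandlers` is a hypothetical helper, and for acid-state the cleanup action you pass would be `createCheckpoint` followed by `closeAcidState`:

```haskell
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import System.Posix.Signals
  (Handler (Catch), installHandler, raiseSignal, sigINT, sigTERM)
  -- raiseSignal is imported only for exercising the handler in a test

-- Install the same cleanup action for SIGINT (Ctrl-C) and SIGTERM, and
-- return an MVar the main thread can block on until shutdown completes.
-- For acid-state the cleanup would be roughly:
--   createCheckpoint acid >> closeAcidState acid
installShutdownHandlers :: IO () -> IO (MVar ())
installShutdownHandlers cleanup = do
  done <- newEmptyMVar
  let onSignal = cleanup >> putMVar done ()
  _ <- installHandler sigINT  (Catch onSignal) Nothing
  _ <- installHandler sigTERM (Catch onSignal) Nothing
  pure done
```

SIGKILL cannot be caught this way, which is exactly the limitation the discussion turns to next.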

@alexanderkjeldaas

alexanderkjeldaas commented Dec 19, 2018

Yes, you need to install your own handlers for Ctrl-C, and other signals as well.

Then it is not resilient, as you can't handle SIGKILL, which is guaranteed to be used if the app is unable to exit in time.

@jonpetterbergman

jonpetterbergman commented Dec 19, 2018

SIGKILL ...

Yes, in that case you will get the dreaded "not enough bytes".

Another way to get "not enough bytes" is to run out of disk space. This has happened to me a couple of times over the years, and I've needed to manually shave off the bytes making up the last, broken transaction.

@mightybyte

mightybyte commented Dec 19, 2018

Another challenge with migrations is that if you're using safecopy and make a change to a persisted data type, you write your migration as a function OldType -> NewType. This means that you have to keep both the old and new versions of your data type around. If you make a change to Foo, resulting in a FooV2 or V2.Foo data type, you then have to go change all the types that contain Foo to instead contain the new FooV2 or V2.Foo and you have to keep around the V2 versions of those types too. This can result in carrying around a lot of code for old data types.

One way to deal with this is by doing a checkpoint and complete migration of the data, then deleting the old types and the migration functions. But if you do that, you run back into the git checkout old-commit-with-working-schema place mentioned above if you ever need to do anything with backups that are in the old format.

@srid

srid commented Dec 19, 2018

@jonpetterbergman

jonpetterbergman commented Dec 20, 2018

V2.Foo

Yes, this pattern has emerged in my code.
When making updates to datatypes in module User, start by moving User to User_Vx, where x is the version number.
Then make a new module User which imports and re-exports all unchanged datatypes from User_Vx. Also import User_Vx qualified as Vx, which is used to refer to the old version of whatever you are making changes to. The Migrate instance will look like:

instance Migrate UserInfo where
  type MigrateFrom UserInfo = Vx.UserInfo
  migrate (Vx.UserInfo x y ...) = ...

You'll end up with User_V0, User_V1, User_V2 ... and User (current).

@JBetz

JBetz commented Dec 21, 2018

Since the on-disk data format has a schema, you can ask a SQL database whether or not the Haskell type is compatible, and if it isn't, you get a detailed table-vs-record incompatibility report.

What do you mean by "ask a SQL database"? Do you use a specific tool for getting these reports, or are you just talking generally about the error messages you get from SQL -> Haskell functions?
