@nuria
Created February 29, 2024 22:44
You might have thought, “Hmm, this is unstructured data I want to put in this record; let me use a JSON field and we’ll figure out later what to do with it.” Stop right there! Here are the reasons why Postgres JSON columns are evil and why we should not use them: they cause a lot of problems.
(This applies both to JSON stored in varchar columns and to proper JSON columns.)
- **They grow unbounded to huge sizes.** We had an incident in which runaway code persisted 1GB of JSON in the users.details column. This affected the database, but it also cascaded to other parts of our system: the 1GB JSON value broke our search pipeline, because Kafka messages are capped at 8MB, so that record could not be processed. Until we cleaned it up, it halted the processing of changes needed to index classes in Elasticsearch. (A query sketch for spotting oversized JSON values is at the end of this gist.)
- **They are particularly problematic when used to store logs of events.** We have used JSON fields to store logs of events that have a many-to-one relationship with the primary record the JSON field lives on. We’ve seen first-hand that this approach does not scale: eventually enough events stack up to overflow the maximum size and make the JSON value unreadable.
  - Examples:
    - enrollment_transfers (WHICH ARE STILL IMPLEMENTED THIS WAY)
      - We have some enrollments whose transfer history cannot be read because they have been transferred too many times
    - audit_log
    - payment processing events
- **They are a privacy nightmare.** We have had instances of sensitive data persisted in those fields alongside innocuous data, and there is no way to separate the two for data retention purposes.
- **No Data Types.** Everything in a JSON blob is stored as text in the database, which can lead to anomalies where similar information is represented in different formats (for example, dates written as DD-MM-YYYY vs. YYYY-MM-DD). When new data is inserted into a JSON blob, no error is thrown if a value arrives with an unexpected data type or format. For example, if a JSON blob stores learner_age as an integer, nothing stops the value “adult” from being added, and the issue only surfaces later when it breaks the website and/or our analytics (this is a real example from our enrollments.details JSON blob, viewable with the SQL below; see also the typed-vs-blob sketch at the end of this gist).
  - `select distinct details_super."age" from dbt.base_enrollments`
- **They sometimes hold relational data, and there is no way to know that is the case.** We have JSON fields that hold arrays of UUIDs which, of course, are not validated against any FK constraints, so things break in strange ways without obvious data integrity errors. More details [here](https://outschool.slack.com/archives/C0AEUT2CA/p1701875914718799). (A sketch contrasting this with a proper join table is at the end of this gist.)
- **They don’t clearly show when data is missing.** When the value of a structured field (such as ip_address) in a table is null, it is clear what data is missing, because the column of the missing value is named. However, if a field like ip_address is stored inside a JSON blob and its value is missing, you can only know it is missing if you already knew to expect it there, because information is added to JSON blobs as key-value pairs. (See the key-presence sketch at the end of this gist.)
- JSON stored in varchar columns has all the problems referenced above and, even better, it will go over the column size limit, leaving you with chopped and therefore invalid, unreadable JSON. (A sketch for detecting such rows is at the end of this gist.)
  - When a JSON field becomes unreadable in this way, it can make it harder to read the rest of the data stored in the same table. When querying a table that has rows with unreadable JSON, a SQL error is often thrown, and you have to find and exclude the bad rows just to get the query to run.
    - [Exceptions caused by JSON](https://docs.exasol.com/db/latest/sql_references/functions/json_error_handling.htm)
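
Regarding the unbounded-growth point above: a minimal sketch for spotting oversized JSON values before they hit downstream limits such as the 8MB Kafka message cap. The `users.details` column is the one from the incident described above; the `id` column is assumed to be the table's primary key.

```sql
-- Find the largest JSON payloads in users.details (the column from the 1GB incident; "id" is assumed).
-- pg_column_size() reports the stored (possibly compressed/TOASTed) size in bytes;
-- octet_length(details::text) reports the raw text length.
SELECT id,
       pg_column_size(details)     AS stored_bytes,
       octet_length(details::text) AS text_bytes
FROM   users
ORDER  BY stored_bytes DESC
LIMIT  10;
```

A CHECK constraint on `octet_length(details::text)` could also cap sizes at write time, though that only limits the blast radius rather than fixing the modeling.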
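
Regarding the lack of data types: a hypothetical sketch (table names are invented for illustration) showing that a typed column rejects a bad learner_age at write time, while a JSON blob silently accepts it and leaves the problem for whoever reads it later.

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE enrollments_typed (
    id          serial PRIMARY KEY,
    learner_age integer            -- typed: bad values are rejected at write time
);

CREATE TABLE enrollments_blob (
    id      serial PRIMARY KEY,
    details jsonb                  -- untyped: anything goes
);

INSERT INTO enrollments_typed (learner_age) VALUES (12);          -- ok
-- INSERT INTO enrollments_typed (learner_age) VALUES ('adult');  -- ERROR: invalid input syntax for type integer

INSERT INTO enrollments_blob (details) VALUES ('{"learner_age": 12}');      -- ok
INSERT INTO enrollments_blob (details) VALUES ('{"learner_age": "adult"}'); -- also accepted; nothing fails until something reads it
```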
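
Regarding relational data hidden in JSON: a hypothetical contrast (all names invented for illustration) between a jsonb array of UUIDs, which the database cannot validate, and a join table with a real foreign key.

```sql
CREATE TABLE sections (
    uid uuid PRIMARY KEY
);

-- JSON approach: nothing stops deleted or misspelled section uids from living here forever.
CREATE TABLE enrollments_with_sections_blob (
    id           serial PRIMARY KEY,
    section_uids jsonb             -- e.g. '["0b9d...", "not-even-a-uuid"]'
);

-- Relational approach: every referenced section must exist, and deletions are handled explicitly.
CREATE TABLE enrollment_sections (
    enrollment_id integer NOT NULL,
    section_uid   uuid    NOT NULL REFERENCES sections (uid),
    PRIMARY KEY (enrollment_id, section_uid)
);
```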
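
Regarding missing data: a small sketch (the `events` table and its columns are hypothetical) showing that an absent value is an explicit NULL with a real column, but invisible in a blob unless you already know to look for the key. Note that the `?` operator only works on jsonb, not plain json or varchar.

```sql
-- With a real column, missing data is an explicit, queryable NULL:
SELECT count(*) FROM events WHERE ip_address IS NULL;

-- With a jsonb blob you must already know the key is supposed to exist
-- (the ? operator tests for the presence of a top-level key):
SELECT count(*) FROM events WHERE NOT (details ? 'ip_address');
```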
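
Regarding chopped JSON in varchar columns: a sketch of a helper for finding rows whose JSON no longer parses, so they can be excluded or repaired before they break other queries. The enrollment_transfers table is the one mentioned above, but its `details` column name is assumed.

```sql
-- Returns true if the text still parses as JSON, false if it has been truncated or mangled.
CREATE OR REPLACE FUNCTION is_valid_json(t text) RETURNS boolean
LANGUAGE plpgsql AS $$
BEGIN
    PERFORM t::jsonb;
    RETURN true;
EXCEPTION WHEN others THEN
    RETURN false;
END;
$$;

-- Find the rows that will make downstream queries blow up (column name assumed).
SELECT id
FROM   enrollment_transfers
WHERE  NOT is_valid_json(details);
```

On Postgres 16 and later, the built-in `pg_input_is_valid(details, 'jsonb')` can replace the custom function.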