Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Star 8 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save laymonage/b53a1acbbab36b77776cd526b48fd2a5 to your computer and use it in GitHub Desktop.
Save laymonage/b53a1acbbab36b77776cd526b48fd2a5 to your computer and use it in GitHub Desktop.
Add Cross-DB JSONField and ArrayField: Proposal for Django in Google Summer of Code 2019 by Sage M. Abdullah.

Add Cross-DB JSONField and ArrayField

Proposal for Django in Google Summer of Code 2019 by Sage M. Abdullah.

Table of Contents

Abstract

Django currently has limited support for JSONField. The support is provided in contrib.postgres.fields module, meaning it can only be used for PostgreSQL database backend. The current implementation of JSONField for PostgreSQL uses the jsonb data type.

Django officially supports PostgreSQL, MySQL, SQLite, and Oracle Database Server. With the JSON1 extension available for SQLite, all of the supported backends now have native JSON handling. Aside from JSONField, Django also supports related fields such as ArrayField and HStoreField for PostgreSQL.

The current JSONField implementation supports key, index, path, null value, and missing key lookups. It also shares lookups relating to containment and keys with HStoreField, such as contains (overridden), contained_by, has_key, has_any_keys, and has_key. ArrayField currently supports contains (overridden), contained_by, overlap, and len lookups along with index and slice transformations.

Native implementation of JSONField is available for MySQL through the use of django-mysql package. For Oracle Database, there is the oracle-json-field package. There is no native implementation for SQLite yet, though there is jsonfield and django-jsonfield that implement the field as a CharField or TextField.

Goals

The main goal is to bring the fields from contrib.postgres into core (db), with JSONField as the first priority. Other fields include ArrayField and HStoreField, but the latter does not seem to be worth to implement. Achieving this will enable developers to use the fields with any supported database backends.

Since there are examples for all other database backends except SQLite, the first step is to create a POC field for SQLite. The next step is to create a unified field that works for all database backends. Tests will also be created to make sure the implementation works as expected.

Once each field is completed, the next step is to implement lookups that work across all database backends. Lookups that are only supported for certain database backends will be implemented with checks that make sure they are only available if supported.

To avoid conflicts for users of the existing PostgreSQL fields, the implementation will not immediately replace the implementation in contrib.postgres. Instead, a migration path will be defined in the documentation. A warning may also be raised if developers are still using those fields.

The documentation will not only cover the implementation and migration path, but also some kind of a how-to guide on compiling SQLite with the JSON1 extension so that it can be used with Python and Django (though it seems to be enabled by default on some instances).

Benefits

A database-agnostic JSONField has been discussed since 2010 in #12990. It has also been discussed in django-developers since 2016. The accomplishment of this task would benefit greatly for the community, since JSON is a common data type to be used in web development and the community seems to be very interested in a database-agnostic JSONField. Robust ORM lookups for JSON data supported natively by the backends are also very desirable.

Implementation

Overview

Implementing a field that works on all supported database backends will require a lot of analysis beforehand. This is due to different implementations of JSON data type in each database backend. Even though there are some common lookups, the implementations differ between one another.

After creating a POC field for SQLite, there probably needs to be a table of some kind to show the differences (available lookups, syntax, etc.) between implementations of JSON data type in each database backend. Once it's all done, we can figure out which features we want to include in the cross-DB field.

Advantages

There are existing implementations of native JSONField for all database backends except SQLite, so we are not starting from scratch. I will try to reuse those existing implementations to make sure they're as compatible as possible.

The documentation for JSON data type in PostgreSQL, SQLite, MySQL, and Oracle Database are pretty thorough, so it looks like they'll be helpful in the future.

Disadvantages

As already said, the implementations of JSON data type in each database backend are varyingly different. Maintaining compatibility across all backends will be quite tricky. We may need to do something like as_sqlite(), as_oracle(), etc. to separate the different implementations.

Schedule and milestones

Before starting to work on this project, I would like to do some preparation:

  • Researching how Django models work with the database backends.
  • Researching how JSON data type is implemented in all database backends.
  • Familiarizing with all database backends and the best way to use JSON in each one.

On June 5-6, I will be on holiday (Eid al-Fitr), so I won't be able to work on those dates.

I will need to do academic registration sometime in early August (no certain date yet). This will probably take a whole day, depending on my university. Best case, this will take less than an hour. Worst case, this will take two days (if there are technical difficulties with the server).

On August 20-22 (tentative), I need to help my faculty in guiding the freshmen for their first days in college. I won't be very busy, so I'll still be able to work (part-time) on those dates.

Beside those dates, I won't have anything else to do. So, that means more time for me to work on the project.

Community bonding: Research and patches

(From May 7 until May 27 -- 3 weeks)

During the community bonding period, I will try to:

  • Do more research about Django codebase, especially the database layer.
  • Review existing tickets, make some new ones, and submit patches.
  • Hang out more in Django IRC channels and mailing lists to get to know the community and help them.

First milestone: Research, POC JSONField for SQLite, lookups analysis

(From May 27 until June 24 -- 4 weeks)

I will create a POC JSONField for SQLite (3.2.1), write documentation and tests for it (3.2.2), and create an analysis about lookups for JSON data type in all database backends (3.2.3).

Creating a POC JSONField for SQLite (2 weeks)

(Holidays on June 5 - 6)

During the first few days, I may need to do research to find the best way to create a POC JSONField for SQLite. I'll consult to the mentors as I create it to make sure I'm on the right track. I will try to reuse the implementation of JSONField in contrib.postgres as much as possible.

Writing documentation and tests for the POC SQLite (1 week)

Once the POC field is done, I will spend the next week writing tests and documentation for it. This work may be merged if desired.

Analyzing JSON lookups in all database backends (1 week)

In the last week of the first milestone, I will analyze all of the available JSON lookups in every database backend. From the analysis, I will create a table of some kind to compare which lookups are available across all database backends. This will help me in implementing the unified field that works for all database backends. If I have some free time, I will also dig deeper about how JSON data type is implemented in all database backends (especially those other than SQLite and PostgreSQL).

First milestone evaluation (4 days)

The first evaluation will take place on June 25 - 29. By this evaluation, I hope I will have completed:

  • POC JSONField for SQLite, along with tests and docs.
  • JSON lookups analysis.

Second milestone: Implement unified field and lookups, tests, and docs

(From June 30 until July 22 -- ~3.5 weeks)

The first evaluation may finish early, so I may be able to start working for this second milestone earlier. In this second milestone, I will implement a unified JSONField and its lookups (3.3.1) and write tests and docs for them (3.3.2).

Implementing unified JSONField and lookups (2.5 weeks)

This is the most important step of this project. This probably can be divided into parts: making sure the unified JSONField works for both SQLite and PostgreSQL, extending the field to work for MySQL (and MariaDB) and Oracle Database, and implementing the lookups. However, I'm still not sure how long each part would take, so I combine them into a single step that takes 2.5 weeks to complete.

The existing implementation for MySQL and Oracle Database will be very helpful in this step.

Writing tests and documentation (1 week)

I probably will have written some of the tests and docs while implementing, but I'm sure there will be a room for improvements. During the rest of this milestone, I will write additional tests and docs. I may also make general improvements to the implementation as needed.

Again, this work may be merged if desired.

Second milestone evaluation (4 days)

The second evaluation will take place on July 23 - 27. By this evaluation, I hope I will have completed:

  • Unified JSONField and its lookups, along with tests and docs.

Third milestone: ArrayField implementation, more documentation

(From July 28 until August 20 -- 3 weeks)

(Academic registration in early August, a bit busy on August 20 - 22)

As before, I may start earlier if the evaluation didn't take as long (< 4 days). In this third and final milestone, I aim to implement ArrayField (3.4.1). I will also write documentation, especially a migration path from contrib.postgres to the unified field and a how-to on compiling SQLite with JSON1 extension to make it work with Django (3.4.2). The rest of the milestone will be spent in merging the whole project into master (3.4.3).

Implementing ArrayField (1 week)

SQLite and MySQL have the json_array() function which can be used to store arrays. Meanwhile, Oracle Database stores JSON as a VARCHAR2, BLOB, or CLOB with IS JSON constraint, and arrays are a valid JSON value.

If I'm correct, this shouldn't be too hard to implement, since some lookups in JSONField can also be used for ArrayField. Also, I'll already be familiar with JSONField and its implementation. I aim to finish this in 1 week, including writing the tests.

Writing docs for ArrayField, migration path, and SQLite+JSON1 (1.5 week)

During this week, I will write documentation for ArrayField. After that, I will write about the migration path from contrib.postgres to the unified fields.

I will also write a how-to on compiling SQLite with JSON1 extension. One option is to write about how to compile the JSON1 extension and loading it with the .load command in SQLite. Another option is to write about how to compile SQLite amalgamation with the -DSQLITE_ENABLE_JSON1 option.

Merging the whole project into master (0.5 week)

In the rest of the week, I aim to iron out any minor issues that may be left. Then, the whole project should be ready to be merged into master. If the first two milestones are already merged, this should be a smaller patch.

Third and final milestone evaluation (1 week)

The final evaluation will take place on August 20 - 27. By this evaluation, I hope I will have completed:

  • ArrayField implementation, tests, and docs.
  • Documentation about the migration path and SQLite+JSON1 guide.
  • The whole project, merged into master.

Rest of the summer

If I'm able to complete the project earlier than expected, I aim to work on existing tickets on Trac, especially those for the database layer. I have yet to decide which tickets I want to work on, since I can't tell which ones will still be open once I finish this project.

Notes

I will make sure to communicate with the mentors regularly during the whole project (not just for evaluations). I also plan to write weekly blog posts about my progress on this project. I will also try to be a regular contributor to Django after this project ends.

About me

My name is Sage M. Abdullah and I'm in my second year as a Computer Science student at Universitas Indonesia. I am 19 years old (I'll be 20 this June). I live in Jakarta, Indonesia (UTC+07:00). I've been coding in Python for the past 2 years. I also code in Java and a little bit of C# and C.

I started contributing to Django in March this year. So far, I have submitted two patches (#30294 and #30295) and they both have been merged. I have also reported a ticket (#30326). They're easy pickings, but I look forward to contribute more to Django's codebase in the future. I have also contributed to Django's localization for Indonesian language on transifex since 6 months ago.

During my first few months in using Python, I made a simple IRC-style chatbot for LINE using Flask called AidenBot. I also made a few web scrapers, one of them being kbbi-python, a simple module that scraps the online Indonesian dictionary and returns the serialized result as a JSON object.

My email is laymonage (at) gmail.com. You can find me in #django-dev, #django, and #gsoc IRC channels as laymonage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment