"Django ORM support for composite primary keys" proposal for Google Summer of Code 2024
In database design, composite primary keys are often necessary for the partitioning and sharding of database tables.
Citus is a PostgreSQL extension that transforms PostgreSQL into a distributed database.
django-multitenant is a library by Citus that enables developers to build multi-tenant applications in Django.
In Citus, composite primary keys are required. So, in order to use these tools with Django, the developers must work around the Django ORM.
To make building multi-tenant apps easier, I propose adding composite primary key support to the Django ORM.
- Multi-Column Primary Key support
- #373
- GSoC 2024 proposal: Django ORM support for composite primary keys
A composite primary key can be defined by setting Meta.primary_key (similar to Peewee).
class User(models.Model):
    tenant = models.ForeignKey(Tenant, on_delete=models.CASCADE)
    id = models.BigAutoField(primary_key=False)

    class Meta:
        primary_key = ("tenant_id", "id")
If Meta.primary_key is set, primary_key=True can't be set on any field; any attempt to do so results in a check error.
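The rule above can be sketched without Django. This is only an illustration under stated assumptions: FieldSpec, check_primary_key_options, and the error message are hypothetical names invented for this example and are not part of Django's system check framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical stand-in for a model field; not a Django class."""

    name: str
    primary_key: bool = False


def check_primary_key_options(fields, meta_primary_key=None):
    """Return a list of error messages (empty if the definition is valid)."""
    if meta_primary_key is None:
        # No composite key declared, so single-field primary keys are fine.
        return []
    # With a composite key declared on Meta, no field may set primary_key=True.
    return [
        f"'{f.name}' sets primary_key=True, but Meta.primary_key is defined."
        for f in fields
        if f.primary_key
    ]


ok = check_primary_key_options(
    [FieldSpec("tenant_id"), FieldSpec("id")], ("tenant_id", "id")
)
bad = check_primary_key_options(
    [FieldSpec("tenant_id"), FieldSpec("id", primary_key=True)], ("tenant_id", "id")
)
```

Here ok is an empty list, while bad contains one message naming the id field. A real implementation would presumably report this through Django's check framework rather than plain strings.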
All officially supported databases (PostgreSQL, MariaDB, MySQL, Oracle, SQLite) support composite primary keys.
If, for some reason, the database doesn't support composite primary keys, the feature can be disabled with the supports_composite_primary_keys feature flag defined on the db.backends.base.features.BaseDatabaseFeatures class.
The implementation of this feature doesn't need any backwards-incompatible changes to public APIs, only to internal APIs.
e.g. def _create_primary_key_sql(self, model, field): -> def _create_primary_key_sql(self, model, fields):
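To illustrate why the signature moves from a single field to a list of fields, here is a minimal standalone sketch that builds the SQL for a primary key over any number of columns. It is not Django's schema editor: create_primary_key_sql, the "_pkey" constraint name, and the ANSI-style identifier quoting are assumptions made for this example.

```python
def quote_name(name):
    """Double-quote an identifier, escaping embedded quotes (ANSI SQL style)."""
    return '"' + name.replace('"', '""') + '"'


def create_primary_key_sql(table, columns):
    """Build an ADD PRIMARY KEY statement for one or more columns."""
    if not columns:
        raise ValueError("a primary key needs at least one column")
    # Hypothetical naming convention for the constraint.
    constraint = quote_name(f"{table}_pkey")
    column_list = ", ".join(quote_name(column) for column in columns)
    return (
        f"ALTER TABLE {quote_name(table)} "
        f"ADD CONSTRAINT {constraint} PRIMARY KEY ({column_list})"
    )


single = create_primary_key_sql("app_user", ["id"])
composite = create_primary_key_sql("app_user", ["tenant_id", "id"])
```

The single-column case is just a list of length one, which is why accepting a sequence keeps the old behavior as a special case.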
A notable backwards-compatible change is that, if a composite primary key is defined, _meta.pk is assigned a tuple of fields instead of a single field.
_meta.pk is used 100+ times in Django; all occurrences have to be reviewed and adjusted.
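Many of those call sites would need to handle both shapes of the primary key. A minimal sketch of one way to do that follows; pk_fields and pk_value are hypothetical helpers, and fields are represented by plain attribute names rather than Django field objects to keep the example self-contained.

```python
from types import SimpleNamespace


def pk_fields(pk):
    """Return the primary key as a tuple, whether it is single or composite."""
    return pk if isinstance(pk, tuple) else (pk,)


def pk_value(instance, pk):
    """Read the key value: a scalar for single keys, a tuple for composite keys."""
    values = tuple(getattr(instance, name) for name in pk_fields(pk))
    return values if isinstance(pk, tuple) else values[0]


user = SimpleNamespace(tenant_id=1, id=42)
single_value = pk_value(user, "id")
composite_value = pk_value(user, ("tenant_id", "id"))
```

Code that always iterates over pk_fields(...) works unchanged for existing single-column models, which is what makes incremental adoption possible.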
In Django, AutoFields must set primary_key=True.
To make it possible for composite primary keys to include surrogate keys (e.g. SmallAutoField, AutoField, BigAutoField), AutoFieldMixin needs to allow setting primary_key=False for fields that are part of the composite primary key.
id = models.BigAutoField(primary_key=False)
This proposal is only concerned with making auto fields work with composite primary keys. However, there have been requests (1, 2, 3) in the past to support other use cases and remove this limitation altogether.
ForeignKey doesn't support composite foreign keys, but its parent class ForeignObject does.
class Comment(models.Model):
    tenant = models.ForeignKey(Tenant, on_delete=models.CASCADE)
    id = models.BigAutoField()
    user_id = models.BigIntegerField()
    user = models.ForeignObject(
        User,
        on_delete=models.CASCADE,
        from_fields=("tenant_id", "user_id"),
        to_fields=("tenant_id", "id"),
    )
ForeignObject works well with composite primary keys: it supports multi-column JOINs, but it doesn't create a composite foreign key in the database.
Also, while ForeignKey creates an index automatically, developers need to define an index explicitly when using ForeignObject.
While database-level composite foreign keys and automatic indexes are nice to have, they are not integral to implementing composite primary keys.
So, no changes needed (for now).
Using GenericForeignKey is generally considered bad design [1].
That said, if support for composite primary keys is required, it could be achieved with the following:
class TaggedItem(models.Model):
    content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
    object_id = models.TextField()
    content_object = GenericForeignKey("content_type", "object_id")
The composite primary keys are JSON-encoded and stored in a text field, e.g. [1, 2], ["c141ef6c-4816-4377-8fab-cf8f3ac3152a", "c378e84c-9e85-4aeb-bc95-a85a2e403c98"].
GenericForeignKey can JSON-decode the text field, and if it's an array of integers or strings, filter for composite primary keys.
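The decoding step can be sketched with the standard library alone. This is a minimal illustration, not the proposed GenericForeignKey code: decode_object_id is a hypothetical helper, and returning the raw string for anything that is not a non-empty array of integers or strings is an assumption about how single-column keys would stay compatible.

```python
import json


def decode_object_id(object_id):
    """Return a tuple for a JSON-encoded composite key, else the raw value."""
    try:
        decoded = json.loads(object_id)
    except (TypeError, ValueError):
        # Not valid JSON (or not a string), so treat it as an ordinary key.
        return object_id
    # bool is a subclass of int in Python, so exclude it explicitly.
    if (
        isinstance(decoded, list)
        and decoded
        and all(
            isinstance(value, (int, str)) and not isinstance(value, bool)
            for value in decoded
        )
    ):
        return tuple(decoded)
    return object_id


composite = decode_object_id("[1, 2]")
plain_number = decode_object_id("42")
plain_text = decode_object_id("some-slug")
```

With this sketch, composite is the tuple (1, 2), while "42" and "some-slug" come back unchanged and can be used as single-column keys as before.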
A composite primary key can be displayed in URLs in the format quote(pk1) + ',' + quote(pk2).
Since Django's quote function already URL-encodes the ',' character, this change is backwards-compatible.
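A small round-trip sketch shows why the comma is a safe separator. The quote and unquote functions below are simplified stand-ins written for this example (the escaped character set is an approximation of the one used by the admin's quote function), and encode_pk and decode_pk are hypothetical helpers.

```python
import re

# Characters escaped as _XX (uppercase hex); approximates the admin's quote().
SPECIAL_CHARS = '":/_#?;@&=+$,[]<>%\n\\'
QUOTE_MAP = {ord(char): "_%02X" % ord(char) for char in SPECIAL_CHARS}


def quote(value):
    """Escape special characters, including ',' and '_', as _XX sequences."""
    return value.translate(QUOTE_MAP) if isinstance(value, str) else value


def unquote(value):
    """Reverse quote() by decoding every _XX sequence."""
    return re.sub(r"_([0-9A-F]{2})", lambda m: chr(int(m.group(1), 16)), value)


def encode_pk(values):
    """Join the quoted parts with ','; quoted parts never contain a comma."""
    return ",".join(quote(str(value)) for value in values)


def decode_pk(text):
    """Split on ',' and unquote each part (values come back as strings)."""
    return tuple(unquote(part) for part in text.split(","))


encoded = encode_pk((1, "a,b"))
decoded = decode_pk(encoded)
```

Because a comma inside a key part is escaped before joining, a bare comma in the URL can only be a separator. Note that decode_pk returns strings, so converting parts back to their field types is left to the caller.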
_meta.pk is used all over Django's source code.
All occurrences need to be reviewed and adjusted individually.
I believe this can't be done in 175hr, so I propose a scope of 350hr.
Fortunately, this proposal is backwards-compatible, so support for composite primary keys can be introduced incrementally.
The primary goal of this proposal is to add composite primary keys to the Django ORM.
So, among other things:
- A model can define a composite primary key.
- The migration system can create a database-level composite primary key.
- The composite primary key works with the ORM's public APIs (e.g. .get(), .create(), .delete(), .bulk_update(), etc.).
- The composite primary key works with other fields (e.g. auto fields).
- It's tested and documented.
To deliver this, I'll need at least 5 weeks (200hr).
The secondary goal of this proposal is to add composite primary key support to other parts of Django, if time permits.
So, the remaining 150hr I would spend working on composite primary key support for other Django code (e.g. Django Admin).
My name is Bendegúz Csirmaz, and I've been a professional software engineer since 2017. I have 4 years of experience developing Django applications (LinkedIn). I have some free time now to work on open source projects.
Google Summer of Code 2024 is a great opportunity to contribute to my favorite web framework and deliver a long-awaited, important feature - one that I would also like to use in my own projects.