Google Summer of Code 2020 proposal: "Adapt schema editors to operate from model states instead of fake rendered models."
- Abstract
- The New Migrations Framework
- Timeline and Milestones
- About Me
1.1 Drawbacks of the existing migrations framework.
Django's ORM is very powerful, but it is not the most efficient one at the moment. Currently, a ModelState is used to render a fake Model, which is then passed down to the SchemaEditor during the migrate phase. This rendering step is expensive and takes a heavy toll on the framework (see #22608 and #29898). While ModelStates are almost as functional as Models, their reduced API makes them very efficient. But because of this reduced API, ModelStates fail to:
- resolve relational fields.
- store information about dependencies between models.
Because of these limitations, fake Models still have to be rendered, and migrations experience a major slowdown.
1.2 Goals
Broadly speaking, this proposal aims to refactor the migrations framework so that the SchemaEditor works with ModelStates instead of fake rendered Model classes. This would involve changing ModelState so that it has methods to get resolved references and can manage all the _meta attributes of the Model. The SchemaEditor would have to be altered to work with ModelStates instead of Models. We aim not to break backward compatibility, but certain parts of the SchemaEditor might be deprecated. Ultimately, we want to see that swapping out Model for ModelState results in a significant decrease in the time consumed during the migrate phase of a project.
1.3 Benefits
This change substantially cuts down the time consumed during the migrate phase, which means faster and more efficient migrations. While the current logic is fine for a few Models, the improvement will be most noticeable in projects with a large number of Models. It also makes the code less redundant, as we eliminate the need to render any Models and directly use ModelStates for all our migration needs.
2.1 Overview
The changes are based on this initial patch by Markus Holtermann. This is supposed to be an internal change in the framework, so no developers should be affected by these changes.
- The New Stuff:
As stated above, ModelState currently does not have a way of resolving related fields or accessing the data type of the related field. To overcome this hurdle, I propose making a new registry/lookup. This registry would keep track of all related Fields and Models. We would probably have to make a new data structure for this, something similar to ModelTuple. This data structure should be able to store information about the field as well as the related field/model.

    from collections import namedtuple

    RelatedField = namedtuple("RelatedField", ["name", "related", "relation"])


    class RelatedFieldTuple:
        def __init__(self, name, related, relation, app_label=None, model=None):
            # name is the name of the field.
            # related is the name of the related field.
            # relation is the type of relation.
            # app_label and model are optional, for accuracy.
            self.name = name
            self.related = related
            self.relation = relation
            self.app_label = app_label
            self.model = model

        def lookup(self):
            return RelatedField(self.name, self.related, self.relation)
We would initially populate this registry as each Application's module and its corresponding Models are imported in the populate() method of registry.Apps. It makes sense to populate/update this registry whenever we run python manage.py makemigrations. This would make sure that the registry is ready to be used whenever we run python manage.py migrate.

Currently, a ModelState does not have access to the data type of the related field because, unlike a Model, it does not have any resolved references. But the SchemaEditor's column_sql() method relies on ForeignKey's db_type() to find out the data type of the related field (usually an AutoField, or whatever the to_field parameter refers to).

To solve this issue, rather than iterating over all the Fields in all ModelStates to find which Fields are related (which would be computationally very expensive), it would be better to have a mapping. This mapping would store the ModelState's app_label and model_name, mapping them to the from_fields, to_app_label, to_model_name and to_fields of the related ModelState.

This mapping would live inside ProjectState, as it is the object that is passed around, which lets cross-app ForeignKeys/ManyToManyFields be resolved properly. It would be populated or altered whenever we run python manage.py makemigrations.

    # If app_label is not mentioned, we will assume the models are from the same app.
    # from_fields and to_fields will be lists.
    related_field_mapping = {
        (app_label, model_name): [
            (to_app_label, to_model_name, from_fields, to_fields),
            ...
        ]
    }

    # Then we can loop over the mapping, and get the related db_type, via the to_fields.
    for to_field in to_fields:
        field_db_type = project_state.fields[to_field].db_type(connection)
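As a rough illustration of the mapping idea, the sketch below builds a related_field_mapping from simplified stand-ins and uses it to look up the column type of a related field. FieldStub, ModelStateStub, build_related_field_mapping() and related_db_types() are hypothetical names invented for this example and are not part of Django; column_type stands in for whatever db_type(connection) would return.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FieldStub:
    """Hypothetical stand-in for a field stored on a ModelState."""
    column_type: Optional[str] = None  # stand-in for db_type(connection)
    to: Optional[str] = None           # "app_label.Model" or "Model" for relations
    to_field: str = "id"


@dataclass
class ModelStateStub:
    """Hypothetical stand-in for ModelState."""
    app_label: str
    name: str
    fields: dict = field(default_factory=dict)


def build_related_field_mapping(models):
    """Map (app_label, model_name) to a list of its outgoing relations."""
    mapping = defaultdict(list)
    for (app_label, model_name), state in models.items():
        for field_name, f in state.fields.items():
            if f.to is None:
                continue
            # A target without an app label is assumed to be in the same app.
            to_app, _, to_model = f.to.rpartition(".")
            mapping[app_label, model_name].append(
                (to_app or app_label, to_model.lower(), [field_name], [f.to_field])
            )
    return dict(mapping)


def related_db_types(models, mapping, key):
    """Resolve the column type behind each relational field of one model."""
    resolved = {}
    for to_app, to_model, from_fields, to_fields in mapping.get(key, []):
        target = models[to_app, to_model]
        for from_field, to_field in zip(from_fields, to_fields):
            resolved[from_field] = target.fields[to_field].column_type
    return resolved


models = {
    ("library", "author"): ModelStateStub(
        "library", "Author", {"id": FieldStub("integer")}
    ),
    ("library", "book"): ModelStateStub(
        "library", "Book", {"id": FieldStub("integer"), "author": FieldStub(to="Author")}
    ),
}
mapping = build_related_field_mapping(models)
print(related_db_types(models, mapping, ("library", "book")))  # {'author': 'integer'}
```

The point of the sketch is that the relational lookup becomes a dictionary access on the mapping rather than a scan over every field of every model state.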
We'll also have to make some changes to all ModelOperations and FieldOperations. We will have to rewrite the database_forwards() and database_backwards() methods, since they directly interact with the SchemaEditor and use Models.
- An Example:
If we look at AlterField in django/db/migrations/operations/fields.py right now, the database_forwards() method gets the Model of the to_state by calling get_model() on to_state.apps and further works with the result to get to_field. It also works out from_model and from_field in a similar fashion, and then passes these down to schema_editor.alter_field().

In the new migrations framework, we would have something like:

    def database_forwards(self, app_label, schema_editor, from_state, to_state):
        # We get the model state instead of getting the model by calling get_model().
        # Model states are keyed by (app_label, lowercased model name).
        to_model_state = to_state.models[app_label, self.model_name_lower]
        from_model_state = from_state.models[app_label, self.model_name_lower]
        # We get the fields from the ModelState itself by calling get_field_by_name().
        from_field = from_model_state.get_field_by_name(self.name)
        to_field = to_model_state.get_field_by_name(self.name)
        # We pass the ModelState instead of the Model to the SchemaEditor.
        schema_editor.alter_field(from_model_state, from_field, to_field)
There would be some massive changes in ProjectState, StateApps and ModelState as well. Since we don't want to render fake Models anymore, it's sensible to remove methods like reload_model(), _reload(), render(), etc.
The ideal approach would be to have methods like rename_model(), add_field(), alter_field(), etc. in ProjectState, to which the state_forwards() methods in all subclasses of ModelOperation and FieldOperation delegate, quite similar to how DeleteModel.state_forwards calls ProjectState.remove_model. Since these methods would reside in ProjectState, it would be more convenient to maintain related_field_mapping. This would also help while testing all these new changes.
- An Example:

    # django/db/migrations/operations/models.py
    class RenameModel(ModelOperation):
        def state_forwards(self, app_label, state):
            state.rename_model(app_label, self.old_name_lower, self.new_name)

    # django/db/migrations/state.py
    class ProjectState:
        def rename_model(self, app_label, old_name_lower, new_name):
            # Renaming the model.
            renamed_model = self.models[app_label, old_name_lower].clone()
            renamed_model.name = new_name
            self.models[app_label, new_name.lower()] = renamed_model
            # Repointing all fields pointing to the old model to the new model.
            for (model_app_label, model_name), model_state in self.models.items():
                ...  # Repointing logic
            # Removing the old model.
            self.remove_model(app_label, old_name_lower)
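To make the repointing step more concrete, here is a minimal, self-contained sketch of how a rename could be applied purely at the state level. ModelStateStub and ProjectStateStub are hypothetical simplifications written for this example (relations are reduced to a dict of field name to "app_label.model_name" strings); the real ProjectState and ModelState hold much more information.

```python
import copy


class ModelStateStub:
    """Hypothetical, heavily simplified stand-in for ModelState."""

    def __init__(self, app_label, name, relations=None):
        self.app_label = app_label
        self.name = name
        # Maps a field name to the "app_label.model_name" it points at.
        self.relations = dict(relations or {})

    def clone(self):
        return copy.deepcopy(self)


class ProjectStateStub:
    """Hypothetical stand-in for ProjectState, keyed like the real one."""

    def __init__(self):
        self.models = {}

    def add_model(self, model_state):
        self.models[model_state.app_label, model_state.name.lower()] = model_state

    def remove_model(self, app_label, model_name):
        del self.models[app_label, model_name]

    def rename_model(self, app_label, old_name_lower, new_name):
        # Register a renamed copy under the new key.
        renamed = self.models[app_label, old_name_lower].clone()
        renamed.name = new_name
        self.models[app_label, new_name.lower()] = renamed
        # Repoint every relation that targeted the old model.
        old_ref = f"{app_label}.{old_name_lower}"
        new_ref = f"{app_label}.{new_name.lower()}"
        for model_state in self.models.values():
            for field_name, target in model_state.relations.items():
                if target == old_ref:
                    model_state.relations[field_name] = new_ref
        # Drop the entry stored under the old name.
        self.remove_model(app_label, old_name_lower)


state = ProjectStateStub()
state.add_model(ModelStateStub("library", "Author"))
state.add_model(ModelStateStub("library", "Book", {"author": "library.author"}))
state.rename_model("library", "author", "Writer")
print(sorted(state.models))                         # [('library', 'book'), ('library', 'writer')]
print(state.models["library", "book"].relations)    # {'author': 'library.writer'}
```

No model class is rendered at any point; the rename only touches dictionary entries, which is the kind of saving the proposal is after.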
In addition to these classes, there would be one more class, ModelStateOptions, in django/db/migrations/state.py.
- class ModelStateOptions:
This class is meant to handle all the _meta attributes of the Model. Since the SchemaEditor is going to use ModelStates directly, which as of now do not have any solid implementation of the Options class, this class becomes necessary. It would include methods like managed(), swapped(), fields(), and possibly contribute_to_class(), etc. I further think it would be nice for all methods that take only self to be decorated with @cached_property, so that each value is computed only once.
Since we are looking to deprecate the SchemaEditor's use of Models and have it use ModelStates instead in the future, we will have to make significant changes in the SchemaEditor. For example, we will have to make large changes in methods like table_sql(), column_sql(), create_model(), delete_model(), etc., as all these methods make use of Models.
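To give a feel for the direction, the sketch below generates a CREATE TABLE statement directly from a state-like object, without any rendered model. ColumnStub and ModelStateStub are hypothetical stand-ins, and the two functions only borrow the names column_sql() and table_sql(); the real SchemaEditor methods have different signatures and handle far more cases (constraints, defaults, backend quirks).

```python
from dataclasses import dataclass


@dataclass
class ColumnStub:
    """Hypothetical description of a column, as a state object might expose it."""
    name: str
    db_type: str
    null: bool = False
    primary_key: bool = False


@dataclass
class ModelStateStub:
    """Hypothetical stand-in for a ModelState with already-resolved columns."""
    app_label: str
    name: str
    columns: list

    @property
    def db_table(self):
        return f"{self.app_label}_{self.name.lower()}"


def column_sql(column):
    """Build the definition of a single column."""
    parts = [f'"{column.name}"', column.db_type]
    parts.append("NULL" if column.null else "NOT NULL")
    if column.primary_key:
        parts.append("PRIMARY KEY")
    return " ".join(parts)


def table_sql(model_state):
    """Build a CREATE TABLE statement from the state object alone."""
    definitions = ", ".join(column_sql(c) for c in model_state.columns)
    return f'CREATE TABLE "{model_state.db_table}" ({definitions})'


book = ModelStateStub(
    "library",
    "Book",
    [ColumnStub("id", "integer", primary_key=True), ColumnStub("title", "varchar(100)")],
)
print(table_sql(book))
```

Everything the SQL needs (table name, column names, column types) is read from the state object, which is why the related-field lookup and the options class described above are prerequisites for this step.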
Initially, I would test all these changes on a PostgreSQL database, and then gradually expand these tests to the rest of the supported databases.
2.2 Advantages
As stated before, this will considerably reduce the time consumed during migrations, especially as the number of Models in a project increases. This closes the longstanding ticket #22608, which was first filed 6 years ago. It also removes redundancy in the codebase by eliminating the need to render fake Models and directly using ModelStates.
My university has suspended classes indefinitely at the moment due to the outbreak of COVID-19. Hence, I am mostly free and would be able to dedicate 35-40 hours a week towards accomplishing these tasks. I would be working on existing tickets till my proposal gets reviewed and accepted (hopefully).
I aim to learn as well as give back a lot during this period, and would be writing a blog post every week about my progress, to help myself stay on track and let the community know about my plans and progress. I would be starting work as soon as I can, i.e. when the results are announced on 27th April 2020. Please note that I cannot guarantee that this timeline is set in stone, as the situation right now with COVID-19 is very dynamic and I might need to change it depending on future circumstances. I hope that this does not cause a major issue.
3.1 Application Review Period
- Work on existing Migrations/ORM tickets, to make myself more familiar with the framework. I'll try to work on more difficult tickets to improve my skills, as I have already worked on a few easy ones.
3.2 April 27 - May 4
- Discussion of the rubrics with the mentor and other contributors of the community.
- Finalisation of the required approach, and discussion of further corner cases.
3.3 May 5 - May 15
- Work on the central registry to keep track of all fields and models and the corresponding relationship/data type.
- Incorporate the registry into the populate() method.
- Work on the related_field_mapping for easy access to the db_type of the related field.
3.4 May 16 - June 5
- Make all the new methods such as rename_model(), alter_field() inside ProjectState.
- Alter the state_forwards() methods in all subclasses of Operation.
- Make further required changes in db/migrations/state.py.
3.5 June 5 - June 15
- Alter the SchemaEditor (methods like create_model(), delete_model(), etc.) to work with ModelStates instead of Models.
3.6 June 15 - June 23
- Alter the database_forwards() and database_backwards() methods in all subclasses of Operation.
3.7 June 24 - July 9
- Work on the new class ModelStateOptions and write all methods necessary to handle _meta attributes. (I might have exams during this period, hence the rather long duration).
3.8 July 10 - July 20
- Start testing the new framework manually; make changes and fixes wherever required accordingly.
3.9 July 20 - August 8
- Work on tests and documentation.
3.10 8th August onwards
- I'd hate to sit around idle, so if all goes according to this timeline and I am finished with this on time, I would be working on other migration/ORM tickets which would help improve the framework, such as:
Hi, I am Sanskar Jaiswal, a second-year undergrad studying Electronics and Communication Engineering at Vellore Institute of Technology, Vellore, India. I have been coding since high school, and although my major is Electronics, my passion is developing good software. I am a member of IEEE-VIT, which is one of the best student chapters on my campus. My peers there have highly motivated me and shaped me into the developer I am today. We have won several hackathons, and had the opportunity to build amazing stuff. I personally have worked on the following projects involving Django:
- Recruitment Website: This was a recruitment portal that my friend and I built, which had to support various features for candidates and the recruiter. This project uses Django Rest Framework too. The code might be a little dirty, because we had to finish and deploy it in 3 nights.
- Blog: This was one of my first projects with Django, in which I made a blog on my own from scratch, following all best practices involved.
Apart from Django projects, I have worked a lot on Machine Learning and Flutter; check out my GitHub if you're interested. :)
I got interested in GSoC when a senior at my university introduced me to it and the culture of open source, and I was mesmerized when I saw how incredible it is. My senior pushed me to contribute to Django, as I was familiar with the framework. Since then, I have opened 4 PRs:
- Merged PRs:
- Open PRs:
Django was the first web framework that I learned, and I simply love it. The idea of contributing to such amazing software in a significant way is something I am definitely excited about.
Details:
- Name: Sanskar Jaiswal
- Email: jaiswalsanskar078@gmail.com
- Gender: Male (he/him)
- GitHub: aryan9600
- LinkedIn: Sanskar Jaiswal
- IRC nick: aryan9600
- Contact: +91 810 000 44969
- Country: India
- Timezone: Indian Standard Time (IST | UTC +5:30)
- Languages Known: English, Hindi