@aryan9600
Last active March 30, 2020 16:51
This is my proposal for Google Summer of Code 2020, under Django.

Google Summer of Code 2020 proposal: "Adapt schema editors to operate from model states instead of fake rendered models."

Table of Contents

  1. Abstract
  2. The New Migrations Framework
  3. Timeline and Milestones
  4. About Me

1. Abstract

1.1 Drawbacks of the existing migrations framework.

Django's ORM is very powerful, but its migrations are not as efficient as they could be at the moment. Currently, each ModelState is rendered into a fake Model, which is then passed down to the SchemaEditor during the migrate phase. This rendering step is really heavy and takes a heavy toll on the framework (see #22608 and #29898). ModelStates are almost as functional as a Model, and their reduced API makes them very efficient. However, because of that reduced API, ModelStates currently fail to:

  • resolve relational fields.
  • store information about dependencies between models.

Because of these limitations, the framework has to fall back on rendering Models, and migrations experience a major slowdown.

1.2 Goals

Broadly speaking, this proposal aims to refactor the migrations framework so that the SchemaEditor works with ModelStates instead of fake rendered Model classes. This would involve extending ModelState with methods that return resolved references and with a way to manage all the _meta attributes of the Model. The SchemaEditor would in turn need to be altered to work with ModelStates instead of Models. We aim not to break backward compatibility, but certain parts of the SchemaEditor might be deprecated. Ultimately, we want to see that swapping out Model for ModelState results in a significant decrease in the time consumed during the migrate phase of a project.

1.3 Benefits

The main benefit of this change is that it cuts down considerably on the time consumed during the migrate phase, which means faster and more efficient migrations. While the current logic is fine for a few Models, the improvement will be most noticeable in projects with a large number of Models. It also makes the code less redundant, as we eliminate the need to render any Models and use ModelStates directly for all our migration needs.


2. The New Migrations Framework

2.1 Overview

The changes are based on this initial patch by Markus Holtermann. This is supposed to be an internal change in the framework, so developers using Django should not be affected by it.

  • The New Stuff:

    As stated above, ModelState currently has no way of resolving related fields or of accessing the data type of a related field. To overcome this hurdle, I propose making a new registry/lookup.

    This registry would keep track of all related Fields and Models. We would probably have to make a new data structure for this, something similar to ModelTuple. This data structure should be able to store information about the field as well as the related field/model.

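    from collections import namedtuple

    # Lightweight record returned by lookup().
    RelatedField = namedtuple("RelatedField", ["name", "related", "relation"])


    class RelatedFieldTuple:

        def __init__(self, name, related, relation, app_label=None, model=None):
            self.name = name            # name is the name of the field.
            self.related = related      # related is the name of the related field.
            self.relation = relation    # relation is the type of relation.
            self.app_label = app_label  # app_label and model are optional,
            self.model = model          # for accuracy.

        def lookup(self):
            return RelatedField(self.name, self.related, self.relation)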
    
    

    We would initially populate this registry while each application's module and its corresponding Models are being imported in the populate() method of registry.Apps. It also makes sense to populate/update this registry whenever we run python manage.py makemigrations. This would make sure that the registry is ready to be used whenever we run python manage.py migrate.

    Currently, a ModelState does not have access to the data type of the related field because, unlike a Model, it does not have any resolved references. But the SchemaEditor's column_sql() method relies on ForeignKey's db_type() to find out the data type of the related field (usually an AutoField, or whatever the to_field parameter refers to).

    To solve this issue, rather than iterating over all the Fields in all ModelStates to find which Fields are related (which would be computationally very expensive), it's a nice idea to have a mapping. This mapping would store the ModelState's app_label and model_name, mapping them to the from_fields, to_app_label, to_model_name and to_fields of the related ModelState.

    This mapping would live inside ProjectState, as that is the object which is passed around and in which cross-app ForeignKeys/ManyToManyFields are resolved properly. It would be populated or altered whenever we run python manage.py makemigrations.

    # If app_label is not mentioned, we will assume the models are from the same app.
    # from_fields and to_fields will be lists.
    related_field_mapping = {
        (app_label, model_name): [
            (to_app_label, to_model_name, from_fields, to_fields),
            ...
        ],
    }

    # Then we can loop over the mapping, and get the related db_type, via the to_fields.
    # (project_state.fields is a proposed lookup, not an existing attribute.)
    for relations in related_field_mapping.values():
        for to_app_label, to_model_name, from_fields, to_fields in relations:
            for to_field in to_fields:
                field_db_type = project_state.fields[to_field].db_type(connection)
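
The mapping idea can be tried out without Django. What follows is only a minimal sketch under stated assumptions: `FieldState`, `ModelStateStub`, `build_related_field_mapping()` and `related_db_types()` are hypothetical names invented for illustration, the `db_type` values are hard-coded strings, and the default target field is assumed to be called `id`. None of this is existing Django API.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-ins for illustration only; these are not Django classes.
# "to" and "to_field" default to None for non-relational fields.
FieldState = namedtuple(
    "FieldState", ["name", "db_type", "to", "to_field"], defaults=(None, None)
)


class ModelStateStub:
    def __init__(self, app_label, name, fields):
        self.app_label = app_label
        self.name_lower = name.lower()
        self.fields = {field.name: field for field in fields}


def build_related_field_mapping(model_states):
    """Map (app_label, model_name) to a list of its outgoing relations."""
    mapping = defaultdict(list)
    for state in model_states:
        for field in state.fields.values():
            if field.to is None:
                continue  # Not a relational field.
            # "app.Model", or just "Model" when the target is in the same app.
            to_app_label, _, to_model = field.to.rpartition(".")
            mapping[state.app_label, state.name_lower].append((
                to_app_label or state.app_label,
                to_model.lower(),
                [field.name],
                [field.to_field or "id"],  # Assumption: default target is "id".
            ))
    return dict(mapping)


def related_db_types(models, mapping, key):
    """Resolve the column type of every field that `key` points to."""
    types = {}
    for to_app_label, to_model, from_fields, to_fields in mapping.get(key, []):
        target = models[to_app_label, to_model]
        for from_field, to_field in zip(from_fields, to_fields):
            types[from_field] = target.fields[to_field].db_type
    return types


author = ModelStateStub("library", "Author", [FieldState("id", "integer")])
book = ModelStateStub("library", "Book", [
    FieldState("id", "integer"),
    FieldState("author", None, to="Author"),
])
models = {(m.app_label, m.name_lower): m for m in (author, book)}
mapping = build_related_field_mapping(models.values())
```

With this in place, `related_db_types(models, mapping, ("library", "book"))` looks up the type of `author` through the mapping alone, without scanning every field of every model at resolution time.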
    

We'll also have to make some changes to all ModelOperations and FieldOperations. We will have to rewrite their database_forwards() and database_backwards() methods, since they interact directly with the SchemaEditor and use Models.

  • An Example:

    If we look at AlterField in django/db/migrations/operations/fields.py right now, the database_forwards() method gets the Model of the to_state by calling get_model() on to_state.apps and then works with the result to get to_field. It works out from_model and from_field in a similar fashion, and then passes these down to schema_editor.alter_field().

    In the new migrations framework, we would have something like:

    def database_forwards(self, app_label, schema_editor, from_state, to_state):

        # We get the model state instead of getting the model by calling get_model().
        # The key uses the lowercased model name; self.name is the field name.
        to_model_state = to_state.models[app_label, self.model_name_lower]
        from_model_state = from_state.models[app_label, self.model_name_lower]

        # We get the fields from the ModelState itself by calling get_field_by_name()
        from_field = from_model_state.get_field_by_name(self.name)
        to_field = to_model_state.get_field_by_name(self.name)

        # We pass the ModelState instead of the Model to the SchemaEditor.
        schema_editor.alter_field(from_model_state, from_field, to_field)
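
To make the flow concrete, here is a small sketch that runs without Django. All classes in it (`ModelStateStub`, `ProjectStateStub`, `RecordingSchemaEditor`, `AlterFieldSketch`) are hypothetical stand-ins, and fields are represented by plain strings. It only illustrates that the operation can reach the schema editor without any model class being rendered.

```python
class ModelStateStub:
    """Hypothetical stand-in for ModelState: a name plus a dict of fields."""

    def __init__(self, name, fields):
        self.name = name
        self.fields = dict(fields)

    def get_field_by_name(self, name):
        try:
            return self.fields[name]
        except KeyError:
            raise ValueError("No field called %s on model %s" % (name, self.name))


class ProjectStateStub:
    """Hypothetical stand-in for ProjectState: only the `models` dict."""

    def __init__(self, model_states, app_label="library"):
        self.models = {(app_label, m.name.lower()): m for m in model_states}


class RecordingSchemaEditor:
    """Records calls instead of emitting SQL, so the flow can be inspected."""

    def __init__(self):
        self.calls = []

    def alter_field(self, model_state, old_field, new_field):
        self.calls.append((model_state.name, old_field, new_field))


class AlterFieldSketch:
    def __init__(self, model_name, name):
        self.model_name_lower = model_name.lower()
        self.name = name  # The field name.

    def database_forwards(self, app_label, schema_editor, from_state, to_state):
        key = (app_label, self.model_name_lower)
        from_model_state = from_state.models[key]
        to_model_state = to_state.models[key]
        schema_editor.alter_field(
            from_model_state,
            from_model_state.get_field_by_name(self.name),
            to_model_state.get_field_by_name(self.name),
        )


from_state = ProjectStateStub([ModelStateStub("Book", {"title": "varchar(50)"})])
to_state = ProjectStateStub([ModelStateStub("Book", {"title": "varchar(200)"})])
editor = RecordingSchemaEditor()
AlterFieldSketch("Book", "title").database_forwards(
    "library", editor, from_state, to_state
)
```

After the call, `editor.calls` holds a single entry with the model name and the old and new field definitions.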
    
            
    

There would be some massive changes in the ProjectState, StateApps and ModelState as well. Since we don't want to render fake Models anymore, it's sensible to remove methods like reload_model(), _reload(), render(), etc.

The ideal approach would be to have methods like rename_model(), add_field(), alter_field(), etc. in ProjectState, which the state_forwards() and state_backwards() methods of all subclasses of ModelOperation and FieldOperation delegate to, quite similar to how DeleteModel.state_forwards calls ProjectState.remove_model. Since these methods would reside in ProjectState, it would be more convenient to maintain related_field_mapping. This would also help while testing all these new changes.

  • An Example:

    # django/db/migrations/operations/models.py

    class RenameModel(ModelOperation):

        def state_forwards(self, app_label, state):
            state.rename_model(app_label, self.old_name_lower, self.new_name)

    # django/db/migrations/state.py

    class ProjectState:

        def rename_model(self, app_label, old_name_lower, new_name):

            # Renaming the model: store the clone under the new (lowercased) name.
            renamed_model = self.models[app_label, old_name_lower].clone()
            renamed_model.name = new_name
            self.models[app_label, new_name.lower()] = renamed_model

            # Repointing all fields pointing to the old model to the new model.

            for (model_app_label, model_name), model_state in self.models.items():
                ...  # Repointing logic

            # Drop the entry still stored under the old name.
            self.remove_model(app_label, old_name_lower)
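
The repointing step left open above could look roughly like the following sketch. It is framework-free and makes simplifying assumptions: `FieldStub`, `ModelStateStub` and `ProjectStateSketch` are hypothetical names, and a relation is stored as a lowercased `"app_label.model_name"` string on the field, which is not how Django represents it.

```python
import copy


class FieldStub:
    """Hypothetical field record; `to` holds a lowercased "app.model" target."""

    def __init__(self, to=None):
        self.to = to


class ModelStateStub:
    def __init__(self, app_label, name, fields=None):
        self.app_label = app_label
        self.name = name
        self.fields = fields or {}

    def clone(self):
        return copy.deepcopy(self)


class ProjectStateSketch:
    def __init__(self):
        self.models = {}

    def add_model(self, model_state):
        key = (model_state.app_label, model_state.name.lower())
        self.models[key] = model_state

    def rename_model(self, app_label, old_name, new_name):
        # Move a renamed clone from the old key to the new key.
        old_key = (app_label, old_name.lower())
        renamed = self.models.pop(old_key).clone()
        renamed.name = new_name
        self.models[app_label, new_name.lower()] = renamed

        # Repoint every relation that targeted the old model.
        old_ref = "%s.%s" % old_key
        new_ref = "%s.%s" % (app_label, new_name.lower())
        for model_state in self.models.values():
            for field in model_state.fields.values():
                if field.to == old_ref:
                    field.to = new_ref


state = ProjectStateSketch()
state.add_model(ModelStateStub("library", "Writer"))
state.add_model(
    ModelStateStub("library", "Book", {"author": FieldStub(to="library.writer")})
)
state.rename_model("library", "Writer", "Author")
```

Keeping this logic in one ProjectState method means a related-field mapping, if one is maintained, only has to be updated in a single place per kind of change.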
    
    

In addition to these classes, there would be one more class, ModelStateOptions, in django/db/migrations/state.py.

  • class ModelStateOptions:

    This class is meant to handle all the _meta attributes of the Model. The SchemaEditor is going to use ModelStates directly, and ModelStates currently have no solid counterpart to the Options class, so this class becomes a necessary requirement. It would include methods like managed(), swapped(), fields(), and possibly a contribute_to_class(), etc. I further think it would be nice for all methods that take only self to be decorated with @cached_property, so that each value is computed once and then reused.
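
As a rough illustration of the shape such a class might take, here is a minimal sketch. The constructor signature, the `_fields` and `_options` attributes, and the default table name of `<app_label>_<model_name>` are assumptions made for this example. It uses `functools.cached_property` from the standard library (Python 3.8+) so that it runs on its own, whereas Django code would use its own `cached_property` utility. `swapped` is left out because it depends on project settings.

```python
from functools import cached_property


class ModelStateOptions:
    """Hypothetical sketch: exposes _meta-style attributes from plain state."""

    def __init__(self, app_label, name, fields, options=None):
        self.app_label = app_label
        self.object_name = name
        self.model_name = name.lower()
        self._fields = dict(fields)
        self._options = dict(options or {})

    @cached_property
    def db_table(self):
        # Assumed default: "<app_label>_<model_name>" unless overridden.
        default = "%s_%s" % (self.app_label, self.model_name)
        return self._options.get("db_table", default)

    @cached_property
    def managed(self):
        return self._options.get("managed", True)

    @cached_property
    def fields(self):
        # Computed on first access, then stored on the instance.
        return tuple(self._fields.values())


opts = ModelStateOptions(
    "library", "Book", {"id": "integer", "title": "varchar(200)"}
)
table_name = opts.db_table
```

After the first access, the computed value lives in the instance's `__dict__`, so repeated lookups during a long migration run cost no more than a plain attribute read.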

Since we are looking to deprecate the SchemaEditor's use of Models and have it use ModelStates in the future, we will have to make significant changes in the SchemaEditor. For example, we will have to make large changes in methods like table_sql(), column_sql(), create_model(), delete_model(), etc., as all of these methods make use of Models.
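
The sketch below hints at what generating DDL from plain state could look like. It is deliberately simplified: `SchemaEditorSketch` is a hypothetical class, its `column_sql()` and `table_sql()` only borrow the names of the real methods and use invented signatures, and the state is a plain dictionary rather than a ModelState.

```python
class SchemaEditorSketch:
    """Hypothetical editor that builds DDL from plain model state."""

    def quote_name(self, name):
        return '"%s"' % name

    def column_sql(self, name, db_type, null=False, primary_key=False):
        # Simplified: only handles the type, PRIMARY KEY and NOT NULL.
        sql = "%s %s" % (self.quote_name(name), db_type)
        if primary_key:
            sql += " PRIMARY KEY"
        elif not null:
            sql += " NOT NULL"
        return sql

    def table_sql(self, db_table, columns):
        definitions = [self.column_sql(**column) for column in columns]
        return "CREATE TABLE %s (%s)" % (
            self.quote_name(db_table),
            ", ".join(definitions),
        )


# Plain data standing in for what a ModelState would provide.
book_state = {
    "db_table": "library_book",
    "columns": [
        {"name": "id", "db_type": "integer", "primary_key": True},
        {"name": "title", "db_type": "varchar(200)"},
        {"name": "subtitle", "db_type": "varchar(200)", "null": True},
    ],
}
sql = SchemaEditorSketch().table_sql(book_state["db_table"], book_state["columns"])
```

Everything the editor needs here arrives as data, which is the property the proposal wants the real SchemaEditor methods to have.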

Initially, I would test all these changes on a PostgreSQL database, and then gradually extend the tests to the rest of the supported databases.

2.2 Advantages

As stated before, this will considerably reduce the time consumed during migrations, especially as the number of Models in a project increases. It would close the longstanding ticket #22608, which was first filed 6 years ago. It also removes redundancy in the codebase by eliminating the need to render fake Models and using ModelStates directly.


3. Timeline and Milestones

My university has suspended classes indefinitely at the moment due to the outbreak of COVID-19. Hence, I am mostly free and would be able to dedicate 35-40 hours a week towards accomplishing these tasks. I would be working on existing tickets until my proposal gets reviewed and (hopefully) accepted.

I aim to learn as well as give back a lot during this period, and would be writing a blog post every week about my progress, to help myself stay on track and to let the community know about my plans and progress. I would start work as soon as I can, i.e. when the results are announced on 27th April 2020. Please note that I cannot guarantee that this timeline is set in stone, as the situation with COVID-19 is very dynamic right now and I might need to change it depending on future circumstances. I hope that this does not cause a major issue.

3.1 Application Review Period

  • Work on existing Migrations/ORM tickets, to make myself more familiar with the framework. I'll try to work on more difficult tickets to improve my skills, as I have already worked on a few easy ones.

3.2 April 27 - May 4

  • Discussion of the rubrics with the mentor and other contributors of the community.

  • Finalisation of the required approach, and discussion of further corner cases.

3.3 May 5 - May 15

  • Work on the central registry to keep track of all fields and models and corresponding relationship/data type.

  • Incorporate the registry into the populate() method.

  • Work on the related_field_mapping for easy access to db_type of the related field.

3.4 May 16 - June 5

  • Implement the new methods, such as rename_model() and alter_field(), inside ProjectState.

  • Alter the state_forwards() methods in all subclasses of Operation.

  • Make further required changes in db/migrations/state.py.

3.5 June 5 - June 15

  • Alter the SchemaEditor (methods like create_model(), delete_model(), etc.), to work with ModelStates instead of Models.

3.6 June 15 - June 23

  • Alter the database_forwards() and database_backwards() methods in all subclasses of Operation.

3.7 June 24 - July 9

  • Work on the new class ModelStateOptions and write all methods necessary to handle _meta attributes. (I might have exams during this period, hence the rather long duration.)

3.8 July 10 - July 20

  • Start testing the new framework manually; make changes and fixes wherever required accordingly.

3.9 July 20 - August 8

  • Work on tests and documentation.

3.10 8th August onwards

  • I'd hate to sit around idle, so if all goes according to this timeline and I am finished with this on time, I would be working on other migration/ORM tickets which would help improve the framework, such as:
    • Adding support for functional constraints. (see #30916)
    • Fixing this issue related to ManyToManyField. (see #31064)

4. About Me

Hi, I am Sanskar Jaiswal, a second-year undergrad studying Electronics and Communication Engineering at Vellore Institute of Technology, Vellore, India. I have been coding since high school, and although my major is Electronics, my passion is developing good software. I am a member of IEEE-VIT, which is one of the best student chapters on my campus. My peers there have highly motivated me and shaped me into the developer I am today. We have won several hackathons and had the opportunity to build amazing stuff. I personally have worked on the following projects involving Django:

  • Recruitment Website: This was a recruitment portal that my friend and I built, which had to support various features for candidates and the recruiter. This project uses Django Rest Framework too. The code might be a little dirty, because we had to finish and deploy it in 3 nights.

  • Blog: This was one of my first projects with Django, in which I made a blog on my own from scratch, following all best practices involved.

Apart from Django projects, I have worked a lot on Machine Learning and Flutter. Check out my GitHub if you're interested. :)

I got interested in GSoC when a senior at my university introduced me to it and to the culture of open source, and I was mesmerized when I saw how incredible it is. My senior pushed me to contribute to Django, as I was familiar with the framework. Since then, I have opened 4 PRs:

Django was the first web framework that I learned, and I simply love it. The idea of contributing to such amazing software in a significant way is something I am definitely excited about.

Details:

  • Name: Sanskar Jaiswal
  • Email: jaiswalsanskar078@gmail.com
  • Gender: Male (he/him)
  • GitHub: aryan9600
  • LinkedIn: Sanskar Jaiswal
  • IRC nick: aryan9600
  • Contact: +91 810 000 44969
  • Country: India
  • Timezone: Indian Standard Time (IST | UTC +5:30)
  • Languages Known: English, Hindi