@manav014
Last active April 12, 2022 05:49
"Adapt schema editors to operate from model states instead of fake rendered models" -- proposal for Google Summer of Code 2021.


Table of contents

  1. Abstract
     1.1 Drawbacks of the existing Migration framework
     1.2 Goals
     1.3 Benefits
  2. The new ModelState implementation
  3. Schedule and milestones
  4. About me

1. Abstract

1.1 Drawbacks of the existing Migration framework

The migration framework has been well optimized since #22608 was created. Currently, however, the Schema Editor works with model classes or __fake__ rendered model classes. These model classes are part of ProjectState.apps.all_models. They are registered using register_model(), which is called in ModelBase, and are rendered from ModelStates using the render() method. They are Python classes that are dynamically created or recreated along with their dependency chain, which slows down the migrate phase (see #29898).

In state_forwards() we have to call reload_model(), which internally calls the render_multiple() and render() methods, and these lead to the creation of the dynamic Python classes. The whole flow of the problem is depicted here by means of a flowchart. ModelStates expose a reduced API and work similarly to model classes. However, because the ModelState class cannot resolve references, we cannot directly replace model classes with ModelStates.

1.2 Goals

This proposal is about adapting the Schema Editor to work with ModelStates instead of model classes.

1. The first milestone is to create a central registry that stores all the relations between models.

2. The second milestone is to create a method mapping in ProjectState for the state_forwards and state_backwards methods of all Operation subclasses. For example, CreateModel.state_forwards calls into ProjectState.add_model, where all the logic is encapsulated. Similarly, there would be new methods in ProjectState, like alter_field, rename_model, and so on, that would be called by the state_forwards of the respective Operation subclasses.

3. The third milestone is to update the BaseDatabaseSchemaEditor and DatabaseSchemaEditor methods and properties, along with all database_forwards and database_backwards methods, so that they work with ModelStates.

1.3 Benefits

The migration framework is one of the most important modules of Django, and optimizing it will add a lot of value to the framework. The expected benefits of this change are:

  • The new implementation would apply migrations more efficiently and more quickly.
  • Migrations would therefore be applied faster with the migrate command.
  • This will lead to a significant decrease in the duration of the migrate phase of large projects.
  • This will clean up the code and make it less redundant.
  • Avoiding the rendering of dynamic model classes will remove a significant load from the framework.
  • Closes ticket #29898.

2. The New Framework

2.1 Overview

The new implementation will consist of a central registry in ProjectState instances where all the related fields would be registered for all apps. Then significant changes would be made in ProjectState, the Schema Editor, and the Operation subclasses so that all of them work with ModelStates instead of fake rendered models in a backward-compatible way. Finally, the new method mapping in ProjectState for the state_forwards and state_backwards methods of all Operation subclasses would be introduced. Please note that the code snippets below are not final and may be further optimized at the time of actual implementation.

Central Registry.

  1. As of now, the main problem is that ModelState cannot resolve references.
  2. A new Central Registry would be introduced, which will live in ProjectState instances (suggested by @charettes in a discussion) and store all the related references of all apps in the form:

    project_state.related_fields_registry[app_label, model_name]: [(from_field, app_label, model_name, to_field), ...]
  3. We may now easily get the db type of to_field using:

    project_state.models[to_app_label, to_model_name].fields[to_field].db_type(connection)
  4. The db_type derived in step 3 can now be used in methods like column_sql() of the SchemaEditor class that need the db_type of remote fields.
  5. The registry would be initialized in the __init__() method of ProjectState using the following code:

    self.related_fields_registry = related_fields_registry or {}
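
The lookup described in steps 2 to 4 can be sketched without Django as follows. This is only an illustration under stated assumptions: FakeField, FakeModelState, FakeProjectState, and remote_db_type are hypothetical stand-ins invented for this sketch, not Django classes, and the stub db_type() ignores the connection.

```python
# Hypothetical stand-ins; none of these are Django's real classes.
class FakeField:
    def __init__(self, db_type_name, primary_key=False):
        self._db_type_name = db_type_name
        self.primary_key = primary_key

    def db_type(self, connection):
        # A real field consults the connection; this stub ignores it.
        return self._db_type_name


class FakeModelState:
    def __init__(self, app_label, name, fields):
        self.app_label = app_label
        self.name_lower = name.lower()
        self.fields = fields  # dict: field name -> field


class FakeProjectState:
    def __init__(self, models=None, related_fields_registry=None):
        self.models = models or {}
        self.related_fields_registry = related_fields_registry or {}


def remote_db_type(state, app_label, model_name, from_field, connection=None):
    """Resolve the db type of the field that from_field points to."""
    relations = state.related_fields_registry.get((app_label, model_name), [])
    for name, to_app_label, to_model_name, to_field in relations:
        if name == from_field:
            target = state.models[to_app_label, to_model_name]
            return target.fields[to_field].db_type(connection)
    raise LookupError("%s.%s.%s is not registered" % (app_label, model_name, from_field))


author = FakeModelState("library", "Author", {"id": FakeField("integer", primary_key=True)})
book = FakeModelState("library", "Book", {"author": FakeField("fk-placeholder")})
state = FakeProjectState(models={("library", "author"): author, ("library", "book"): book})
state.related_fields_registry["library", "book"] = [("author", "library", "author", "id")]

print(remote_db_type(state, "library", "book", "author"))
```

The registry here is keyed by the same (app_label, model_name) tuple that ProjectState.models uses, so a relation can be followed with two dictionary lookups and no rendered model class.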

Populating the Central Registry.

  1. The central registry would be populated by the state_forwards or state_backwards methods of Operation subclasses like CreateModel, AddField, etc. (listed here). The registry would be invalidated whenever state alterations are performed (suggested by @charettes in a discussion).
  2. To populate the central registry, if to_field is specified on the ForeignKey, it can be retrieved easily, for example in state_forwards of AddField:

    self.field.to_fields  # fetching the to_field
  3. The to_field fetched in the above step would then be added to the central registry.
  4. If to_field is not set, the primary key of the to_model would be used.
  5. The registry will store the relations in both the forward and the backward direction.
  6. The implementation of ProjectState.add_model would be as follows:

    # Please note that the final implementation would be more optimized and have cleaner code.

    def add_model(self, model_state):
        app_label, model_name = model_state.app_label, model_state.name_lower

        # Central registry population starts
        for name, field in model_state.fields.items():
            if field.is_relation:
                to_app_label, to_model_name = field.remote_field.model.split('.')
                # to_fields is missing or contains None when it is not explicitly defined.
                to_fields = getattr(field, 'to_fields', None)
                if to_fields is None or None in to_fields:
                    # "pk" would be replaced with the primary key field name of the target model.
                    to_fields = 'pk'
                # Use the same lowercased key as self.models.
                self.related_fields_registry.setdefault((app_label, model_name), []).append(
                    (field, to_app_label, to_model_name, to_fields)
                )
        # Central registry population ends

        self.models[(app_label, model_name)] = model_state
        if 'apps' in self.__dict__:  # hasattr would cache the property
            self.reload_model(app_label, model_name)
  7. To get the primary key from a ModelState, in order to replace "pk" in the above code, a new pk attribute would be set in the __init__ method of the ModelState class. No extra iteration is needed, as the pk would be initialized in the loop that already performs the sanity checks. The new code would be as follows:

    def __init__(self, app_label, name, fields, options=None, bases=None, managers=None):
        # ... previous initializations ...

        self.pk = None
        for name, field in self.fields.items():
            # New code for primary key starts
            if self.pk is None and field.primary_key:
                self.pk = field
            # New code for primary key ends

            # ... all sanity checks ...
        # ... more sanity checks ...
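
Steps 3 to 5 can be illustrated with a small Django-free sketch. Everything in it is an assumption made for illustration: StubField, primary_key_name, and register_relations are hypothetical names, the forward and backward relations are kept in two separate dictionaries (the proposal does not fix the layout), and the target model is assumed to be present already when the relation is registered.

```python
class StubField:
    """Hypothetical stand-in for a model field; not Django's Field."""

    def __init__(self, primary_key=False, to=None, to_fields=None):
        self.primary_key = primary_key
        self.to = to  # "app_label.ModelName" for relations, otherwise None
        self.to_fields = to_fields or [None]

    @property
    def is_relation(self):
        return self.to is not None


def primary_key_name(fields):
    """Return the name of the first primary key field, or None."""
    for name, field in fields.items():
        if field.primary_key:
            return name
    return None


def register_relations(forwards, backwards, models, app_label, model_name):
    """Record every relation of one model in both directions."""
    for name, field in models[app_label, model_name].items():
        if not field.is_relation:
            continue
        to_app_label, to_model_name = field.to.lower().split(".")
        to_field = field.to_fields[0]
        if to_field is None:
            # No explicit to_field: fall back to the target's primary key.
            to_field = primary_key_name(models[to_app_label, to_model_name])
        forwards.setdefault((app_label, model_name), []).append(
            (name, to_app_label, to_model_name, to_field)
        )
        backwards.setdefault((to_app_label, to_model_name), []).append(
            (app_label, model_name, name, to_field)
        )


models = {
    ("shop", "customer"): {"uuid": StubField(primary_key=True)},
    ("shop", "order"): {
        "id": StubField(primary_key=True),
        "customer": StubField(to="shop.Customer"),
    },
}
forwards, backwards = {}, {}
register_relations(forwards, backwards, models, "shop", "order")
print(forwards)
print(backwards)
```

Keeping the backward direction lets an operation on the target model (for example a rename) find the models that point at it without scanning every ModelState.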

Additions in ProjectState class.

  1. CreateModel.state_forwards calls into ProjectState.add_model(), where all the storage and cache invalidation logic is encapsulated.
  2. The idea is to make the other state_forwards and state_backwards methods work the same way.
  3. For example, AlterTogetherOptionOperation currently executes the following code:

    def state_forwards(self, app_label, state):
        model_state = state.models[app_label, self.name_lower]
        model_state.options[self.option_name] = self.option_value
        state.reload_model(app_label, self.name_lower, delay=True)
  4. In the new implementation it would be replaced with the following code:

    def state_forwards(self, app_label, state):
        # ProjectState method that will encapsulate the logic
        state.alter_together_option(app_label, self.name_lower, self.option_name, self.option_value)
  5. This would make the logic easily testable in isolation and allow it to maintain the new registry mapping.
  6. New methods such as add_to_fields_registry and search_in_registry (to work with the new central registry) would also be introduced.
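
A minimal sketch of this delegation, runnable without Django, is given below. The classes are hypothetical stubs, and the argument list of alter_together_option (app label, model name, option name, option value) is an assumption of this sketch; the final signature would be settled during implementation.

```python
class StubModelState:
    """Hypothetical stand-in holding only the options dict."""

    def __init__(self, options=None):
        self.options = dict(options or {})


class StubProjectState:
    """Hypothetical stand-in for ProjectState."""

    def __init__(self):
        self.models = {}
        self.reloaded = []  # records reload requests instead of rendering models

    def reload_model(self, app_label, model_name, delay=False):
        self.reloaded.append((app_label, model_name, delay))

    def alter_together_option(self, app_label, model_name, option_name, option_value):
        # All state mutation lives here, so it can be tested without an Operation.
        model_state = self.models[app_label, model_name]
        model_state.options[option_name] = option_value
        self.reload_model(app_label, model_name, delay=True)


class StubAlterUniqueTogether:
    """Hypothetical operation that only delegates to the project state."""

    option_name = "unique_together"

    def __init__(self, name, option_value):
        self.name_lower = name.lower()
        self.option_value = option_value

    def state_forwards(self, app_label, state):
        state.alter_together_option(
            app_label, self.name_lower, self.option_name, self.option_value
        )


state = StubProjectState()
state.models["shop", "order"] = StubModelState()
operation = StubAlterUniqueTogether("Order", {("customer", "number")})
operation.state_forwards("shop", state)
print(state.models["shop", "order"].options)
```

Because the operation no longer touches model_state directly, a registry update could later be added inside alter_together_option without changing any Operation subclass.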

Modifications in SchemaEditor classes.

  1. The new BaseDatabaseSchemaEditor and DatabaseSchemaEditor classes would be based on the initial patch by Markus Holtermann.
  2. For example, the new implementation of delete_model in BaseDatabaseSchemaEditor would be as follows, where get_to_field gets the to_field from the new registry (this is only example code; the real implementation will differ a bit):

    def delete_model(self, model):
        """Delete a model from the database."""
        if isinstance(model, ModelState):
            # Handle auto-created intermediary tables
            for name, field in model.fields.items():
                if field.many_to_many and self.project_state.get_to_field(model.app_label, model.name_lower, field).auto_created:
                    through_table = '%s_%s_%s' % (
                        model.app_label, model.name_lower, name.lower(),
                    )
                    through_db_table = truncate_name(
                        through_table,
                        self.connection.ops.max_name_length()
                    )
                    self.execute(self.sql_delete_table % {
                        "table": self.quote_name(through_db_table),
                    })
            # Delete the table (relies on the proposed ModelState._meta)
            self.execute(self.sql_delete_table % {
                "table": self.quote_name(model._meta.db_table),
            })
        else:
            warning_on_model_class(self, self.delete_model)  # Warn if the model class is passed
            # Handle auto-created intermediary models
            for field in model._meta.local_many_to_many:
                if field.remote_field.through._meta.auto_created:
                    self.delete_model(field.remote_field.through)
            # Delete the table
            self.execute(self.sql_delete_table % {
                "table": self.quote_name(model._meta.db_table),
            })
            # Remove all deferred statements referencing the deleted table.
            for sql in list(self.deferred_sql):
                if isinstance(sql, Statement) and sql.references_table(model._meta.db_table):
                    self.deferred_sql.remove(sql)
  3. The get_to_field method would be defined in the ProjectState class to get the to_field from app_label, model_name, and field_name.
  4. Other methods in BaseDatabaseSchemaEditor and DatabaseSchemaEditor would be edited the same way.
  5. Along with BaseDatabaseSchemaEditor, the backend-specific SchemaEditor classes, like the DatabaseSchemaEditor of postgresql, would be adapted to work with the proposed changes.
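
The dual code path (ModelState accepted silently, model class accepted with a warning) can be sketched in isolation as follows. This is a simplified illustration, not the proposed patch: the stub classes, the warn_on_model_class helper, the default table name, and the plain "DROP TABLE" string are all assumptions of this sketch, and through tables are left out.

```python
import warnings


class StubModelState:
    """Hypothetical stand-in for ModelState."""

    def __init__(self, app_label, name, options=None):
        self.app_label = app_label
        self.name_lower = name.lower()
        self.options = dict(options or {})


class LegacyMeta:
    db_table = "shop_legacy"


class LegacyModel:
    """Hypothetical stand-in for a rendered model class."""

    _meta = LegacyMeta


def warn_on_model_class(method_name):
    # Hypothetical helper; the warning category is an assumption.
    warnings.warn(
        "Passing a model class to %s() is deprecated; pass a ModelState." % method_name,
        DeprecationWarning,
        stacklevel=3,
    )


class StubSchemaEditor:
    def __init__(self):
        self.executed = []  # collects SQL instead of running it

    def delete_model(self, model):
        if isinstance(model, StubModelState):
            table = model.options.get("db_table") or "%s_%s" % (
                model.app_label, model.name_lower,
            )
        else:
            warn_on_model_class("delete_model")
            table = model._meta.db_table
        self.executed.append("DROP TABLE %s" % table)


editor = StubSchemaEditor()
editor.delete_model(StubModelState("shop", "Order"))
print(editor.executed)
```

Existing callers that still pass model classes keep working and only see a warning, which is what makes the change backward compatible.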

Backward Compatibility.

  1. Backward compatibility would be kept in mind while making these changes.
  2. All the schema editor methods that gain support for ModelStates would keep working with model classes exactly as they did before the patch, and a new warning would be raised when a model class is passed.
  3. As the SchemaEditor methods need access to a ProjectState instance, a new context manager function would be used so that the signatures of the existing methods do not change. The code would be as follows:

    from contextlib import contextmanager

    @contextmanager
    def patch_project_state(schema_editor, project_state):
        schema_editor.project_state = project_state
        try:
            yield
        finally:
            del schema_editor.project_state
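
The behaviour of this context manager can be checked with a short, self-contained sketch. StubSchemaEditor and the string used as a project state are hypothetical placeholders; only patch_project_state mirrors the code above.

```python
from contextlib import contextmanager


@contextmanager
def patch_project_state(schema_editor, project_state):
    schema_editor.project_state = project_state
    try:
        yield
    finally:
        # Runs even if the body raises, so the editor never keeps a stale state.
        del schema_editor.project_state


class StubSchemaEditor:
    """Hypothetical editor whose method signature stays unchanged."""

    def remove_field(self, model_state, field_name):
        # The project state is read from the attribute set by the context manager.
        return (self.project_state, model_state, field_name)


editor = StubSchemaEditor()
with patch_project_state(editor, "state-A"):
    print(editor.remove_field("order", "note"))
print(hasattr(editor, "project_state"))
```

The attribute exists only inside the with block, so a schema editor method called outside of it fails loudly with an AttributeError instead of silently using an outdated state.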

Modifications in database_forwards and database_backwards.

  1. The new database_forwards and database_backwards methods of Operation subclasses will use the newly defined patch_project_state context manager to pass the state on to the schema editor. The code would be as follows:

    with patch_project_state(schema_editor, from_state):
        # Could be any schema editor method
        schema_editor.remove_field(from_model, from_model._meta.get_field(self.name))
  2. The new database_forwards and database_backwards methods will also use model states instead of rendered models:

    from_model_state = from_state.models[app_label, self.model_name_lower]
    if self.allow_migrate_model(schema_editor.connection.alias, from_model_state):
        with patch_project_state(schema_editor, from_state):
            schema_editor.remove_field(from_model_state, self.name)

Modifications in ModelState class.

  1. The new ModelState class will have a _meta property that will return an object of ModelStateOptions. These ModelStateOptions will work similarly to the Options class in Model classes and will have properties like app_label, model_name, db_table, etc.
  2. The _meta property that would be defined in the ModelState class would be as follows:

    @cached_property
    def _meta(self):
        return ModelStateOptions(self)
  3. The code for ModelStateOptions would be similar to the following:

    class ModelStateOptions(object):
    
        def __init__(self, model):
            self.model = model
    
        @property
        def app_label(self):
            return self.model.app_label
    
        @property
        def model_name(self):
            return self.model.name_lower
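
As an illustration of how further Options-like properties could be layered on top, the sketch below adds a db_table property. It is an assumption-laden example: StubModelState is a hypothetical stand-in, functools.cached_property from the standard library replaces Django's own cached_property, and the default name "<app_label>_<model_name>" is used without the backend-specific truncation that real table names go through.

```python
from functools import cached_property


class ModelStateOptions:
    def __init__(self, model):
        self.model = model

    @property
    def app_label(self):
        return self.model.app_label

    @property
    def model_name(self):
        return self.model.name_lower

    @property
    def db_table(self):
        # Assumption: an explicit db_table option wins, else a default is derived.
        explicit = self.model.options.get("db_table")
        return explicit or "%s_%s" % (self.app_label, self.model_name)


class StubModelState:
    """Hypothetical stand-in for ModelState."""

    def __init__(self, app_label, name, options=None):
        self.app_label = app_label
        self.name_lower = name.lower()
        self.options = dict(options or {})

    @cached_property
    def _meta(self):
        return ModelStateOptions(self)


order = StubModelState("shop", "Order")
print(order._meta.app_label, order._meta.model_name, order._meta.db_table)
```

Because the properties read from the ModelState on every access, the options object stays correct when the underlying options dict is altered, while the _meta wrapper itself is created only once.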

3. Schedule and milestones

Before I start to code I would like to complete the following checklist:

  • I will study the codebase of the remaining classes and Operations. So far I have understood the code for BaseDatabaseSchemaEditor, ModelState, ProjectState, Operation, Apps, CreateModel, DeleteModel, ModelBase, Options, RenameModel, and a few other related classes.
  • I will study the code for special, field, and model operations in depth, along with the DatabaseSchemaEditor of all backends. I will also learn some best practices to follow while writing code.
  • While studying the codebase, I will discuss my idea and how I plan to achieve it in depth, and I will take note of the points raised by fellow developers.

I will start working on the above-mentioned checklist during the application review period. My exams will be conducted in mid-August (the final date has not been decided yet). Until then I have online classes, so I will be able to devote 40-45 hours a week (5-6 hours on weekdays and 7-8 hours on weekends) throughout the GSoC period. As there is only one evaluation during this time, I will try to finish the task ahead of schedule.

I would like to devote 70% of my time to learning and coding, and 30% of my time to testing the changes and writing documentation for the new code. I will write a blog post every weekend to make the community aware of my progress, my contributions, and the plan for the next week.

3.1 Community Bonding (May 17 - May 21)

  • Discuss the approach and implementation with senior developers or mentors.
  • Figure out whether there is a better approach to the problem or whether the solution could be improved in any way.
  • I will use a bottom-up approach to code. I will start by proxying the state_forwards and state_backwards methods of all Operation subclasses into ProjectState, then I will add the logic for the central registry, and then I will make the Schema Editor's actions and properties work with ModelState.

3.2 Central Registry -- First Milestone

(From May 22 to June 24) During this phase, I will work on Central Registry and will make ProjectState and Operation subclasses adapt it.

3.2.1 Initialization and population of central registry (1.5 weeks)

  • Creating the proxies in ProjectState for the state_forwards and state_backwards methods of all Operation subclasses.
  • Initialization of the Central Registry in ProjectState (like ProjectState.add_field(), etc.).
  • Code the logic to populate the registry with ProjectState.add_model() and the newly introduced ProjectState.add_field().
  • Fixing and writing tests, along with documentation (if required).

3.2.2 Modifications in all state_forwards and state_backwards methods (2 weeks)

  • Work on the registry alteration methods that would be defined in ProjectState and used by all field and model Operation subclasses to update the registry.
  • Code the logic to call the newly introduced registry alteration methods from the newly introduced ProjectState actions, like rename_field, etc.
  • Fixing and writing tests, along with documentation (if required).

3.2.3 Writing tests and Documentation (1 week)

  • Modification of existing tests as per the new implementation (if required), and introduction of new tests for the newly introduced changes.
  • Writing the documentation for the newly introduced changes, the central registry, and its methods.

3.3 Adapting Central Registry and ModelState -- Second Milestone

(From June 25 to August 6)

3.3.1 Introduction of ModelStateOptions class (1.5 weeks)

  • Creating a ModelStateOptions class that would be accessible through the _meta property. It will expose attributes like db_tablespace and will work similarly to the Options of model classes.
  • Manually testing and writing tests for the newly introduced ModelStateOptions class and its methods.
  • Writing documentation for the newly introduced ModelStateOptions class.

3.3.2 Modification in all database_forwards and database_backwards methods (1 week)

  • Code a context manager that makes the ProjectState instance available to all the Schema Editor's method calls, so that the argument lists of the methods do not change.
  • Adapt all database_forwards and database_backwards methods to pass ModelStates instead of rendered models to all the SchemaEditor's method calls.

3.3.3 Modifications in BaseDatabaseSchemaEditor class (1 week)

  • Adapting table_sql() and column_sql() to work with ModelStates and warn if a model class is passed.
  • Adapting action methods like create_model(), delete_model(), etc. to work with ModelStates and warn if a model class is passed (backward compatibility would be kept in mind while writing the changes).
  • Fixing all the failing tests (if any).

3.3.4 Modifications in postgresql and mysql DatabaseSchemaEditor classes (1.5 weeks)

  • Adapting the methods of DatabaseSchemaEditor in mysql/schema.py to work properly with the new ModelState implementation.
  • Fixing all the failing tests (if any) for the mysql schema editor.
  • Adapting the methods of DatabaseSchemaEditor in postgresql/schema.py to work properly with the new ModelState implementation.
  • Fixing all the failing tests (if any) for the postgresql schema editor.

3.3.5 Modifications in oracle and sqlite3 DatabaseSchemaEditor classes (1.5 weeks)

  • Adapting the methods of DatabaseSchemaEditor in sqlite3/schema.py to work properly with the new ModelState implementation.
  • Fixing all the failing tests (if any) for the sqlite3 schema editor.
  • Adapting the methods of DatabaseSchemaEditor in oracle/schema.py to work properly with the new ModelState implementation.
  • Fixing all the failing tests (if any) for the oracle schema editor.

3.4 Working on tests and documentation -- Final Milestone

(From August 7 to August 13)

3.4.1 Testing (0.5 week)

  • Writing tests for all the newly introduced changes and also fixing the failing tests

3.4.2 Documentation (0.5 week)

  • Write documentation for all the Schema Editor changes and newly introduced methods and classes.

3.5 If time permits or after GSoC...

  • I would like to work on the filterability of window functions. (I have also raised a topic about this on the forum.)

5. About me

My name is Manav Agarwal and I am a junior at Dr. A.P.J. Abdul Kalam Technical University (India). I started my coding journey in 9th grade with the basics of the Python language and started developing projects with Django in 11th grade. I developed an e-commerce platform using Django for a nearby company, which is one of the best projects I have built.

5.1 Past contributions in Django

I started contributing to Django in September 2020, and I am thankful to all the senior developers who helped me with these contributions. I have contributed to several different modules of the framework, which reflects my self-motivation and shows that I am comfortable with the Django codebase.

  • #6517 : MySQL: manage.py dbshell does not get charset from DATABASES setting
  • #13060 : ManagementForm exception in case of bad prefix should be easier to understand
  • #26607 : Add a hook to customize the admin's formsets parameters
  • #29712 : Add warning in makemessages command if the localecode with l flag is not correct
  • #31516 : Change automatic migration naming from date-based to operation-based
  • #31636 : BooleanFieldListFilter doesn't respect field choices.
  • #32294 : fields.E305 is raised on ManyToManyFields with related_name='+' in models in different apps but with the same name.
  • #28785 : Tracking Migrations
  • PR-13537 : Fixed #6517 -- Made dbshell use charset option on MySQL.
  • PR-13578 : Fixed #13060 -- Improved error message when ManagementForm data is missing.
  • PR-13722 : Fixed #26607 -- Allowed customizing formset kwargs with ModelAdmin.get_formset_kwargs().
  • PR-13615 : Fixed #29712 -- Made makemessages warn if locales have hyphens and skip them.
  • PR-14109 : Fixed #31516 -- Improved naming of migrations with multiple operations.
  • PR-13413 : Fixed #31636 -- Made BooleanFieldListFilter respect Field.choices.
  • PR-13822 : Fixed #32294 -- Prevented ManyToManyField's hidden related name collisions between apps.
  • PR-14246 : Fixed #14246 -- Made makemigrations warn if migrations are missing. (Open PR)
  • I also created a PR for the migration framework issue (PR-14206) as a proof of concept, but it had a lot of loopholes and improper code, so I closed it.

5.2 Personal Details
