"Adapt schema editors to operate from model states instead of fake rendered models" proposal for Google Summer of Code 2021.
- Abstract
  - 1.1 Drawbacks of the existing Migration framework
  - 1.2 Goals
  - 1.3 Benefits
- The new ModelState implementation
- Schedule and milestones
- About me
The migration framework has been well optimized since #22608. Currently, however, the schema editor works with model classes or __fake__ rendered model classes. These model classes are part of ProjectState.apps.all_models.
They are registered using register_model(), which is called in ModelBase, and are rendered from ModelStates using the render() method. These model classes are dynamically created (or recreated) Python classes, and rendering them together with their dependency chain slows down the migrate phase (see #29898).
In state_forwards() we have to call reload_model(), which internally calls the render_multiple() and render() methods and so leads to the creation of these dynamic Python classes. The whole flow of the problem is depicted here by means of a flowchart. ModelStates offer a reduced API and work similarly to model classes. However, because the ModelState class cannot resolve references, we cannot directly replace model classes with ModelStates.
This proposal is about adapting the schema editor to work with ModelState instead of the model classes.
1. The first milestone is to create a central registry that stores all the relations between models.
2. The second milestone is to create a method mapping in ProjectState for the state_forwards and state_backwards methods of all Operation subclasses. For example, CreateModel.state_forwards calls into ProjectState.add_model, where all the logic is encapsulated. Similarly, there would be new methods in ProjectState such as alter_field, rename_model, and so on, which would be called by the state_forwards of the respective Operation subclasses.
3. The third milestone is to update the BaseDatabaseSchemaEditor and DatabaseSchemaEditor methods and properties, along with all database_forwards and database_backwards methods, so that they work with ModelStates.
The migration framework is one of the most important modules of Django and its optimization will add a lot of value to the Django Framework. There are undoubtedly many benefits of this change.
- The new implementation would be more efficient and faster at applying migrations.
- This leads to quicker application of migrations with the migrate command.
- This will lead to a significant decrease in the duration of the migrate phase of large projects.
- This will clean up the code and make it less redundant.
- This implementation will significantly reduce the load on the framework.
- Closes ticket #29898.
The new implementation will consist of a central registry in ProjectState instances where all the related fields of all apps are registered. Significant changes would then be made to ProjectState, the schema editor, and the Operation subclasses so that all of them work with ModelStates instead of fake rendered models in a backward compatible way. Finally, the new method mapping in ProjectState for the state_forwards and state_backwards methods of all Operation subclasses would be introduced. Please note that the code snippets below are not final and may be further optimized at the time of actual implementation.
- As of now the main problem is that ModelState cannot resolve references. A new central registry would be introduced, located in ProjectState instances (suggested by @charettes in a discussion), which stores all the related references of all apps in the form:
    project_state.related_fields_registry[app_label, model_name]: [(from_field, app_label, model_name, to_field), ...]
- We may now easily get the db type of to_field using:
    project_state.models[to_app_label, to_model_name].fields[to_field].db_type(connection)
- The db_type derived in the previous step can now be used in methods like column_sql() in the SchemaEditor class that need the db_type of remote fields. The registry would be initialized in the __init__() method of ProjectState using the following code:
    self.related_fields_registry = related_fields_registry or {}
- The central registry would be populated by the state_forwards or state_backwards methods of Operation subclasses like CreateModel, AddField, etc. (listed here). The registry would be invalidated when state alterations are performed (suggested by @charettes in a discussion). To populate the central registry, if the to_field is specified on the ForeignKey, it can be retrieved easily. For example, in state_forwards of AddField:
    self.field.to_fields  # fetching the to_field
- The to_field fetched in the above step would now be added to the central registry.
- If the to_field is not set, then the primary key of the to_model would be used.
- The registry will store the relations in both the forward and the backward direction.
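The registry behaviour described above (forward and backward storage, a primary-key fallback, and invalidation) can be sketched in plain Python. This is only an illustration under assumptions: the class RelatedFieldsRegistry and its add and remove_model methods are hypothetical names, not part of Django or of the final patch, and fields are represented by their names rather than by Field instances.

```python
from collections import defaultdict


class RelatedFieldsRegistry:
    """Hypothetical stand-in for ProjectState.related_fields_registry."""

    def __init__(self):
        # (app_label, model_name) -> [(from_field, to_app_label, to_model_name, to_field), ...]
        self.forwards = defaultdict(list)
        # (to_app_label, to_model_name) -> [(from_app_label, from_model_name, from_field, to_field), ...]
        self.backwards = defaultdict(list)

    def add(self, app_label, model_name, from_field, to_model, to_field=None):
        # "app.Model" references are normalised to lower-case keys.
        to_app_label, to_model_name = to_model.lower().split(".")
        # No explicit to_field means "primary key of the target", resolved later.
        target = to_field or "pk"
        key = (app_label, model_name.lower())
        self.forwards[key].append((from_field, to_app_label, to_model_name, target))
        self.backwards[(to_app_label, to_model_name)].append(
            (key[0], key[1], from_field, target)
        )

    def remove_model(self, app_label, model_name):
        # Invalidate every relation that starts from or points to the model.
        key = (app_label, model_name.lower())
        for _, to_app, to_model, _ in self.forwards.pop(key, []):
            self.backwards[(to_app, to_model)] = [
                r for r in self.backwards[(to_app, to_model)] if r[:2] != key
            ]
        for from_app, from_model, _, _ in self.backwards.pop(key, []):
            self.forwards[(from_app, from_model)] = [
                r for r in self.forwards[(from_app, from_model)] if r[1:3] != key
            ]


registry = RelatedFieldsRegistry()
registry.add("blog", "Post", "author", "auth.User")
registry.add("blog", "Comment", "post", "blog.Post", to_field="slug")
```

Keeping both directions means that renaming or deleting a model can find the fields pointing at it without scanning every ModelState, at the cost of updating two mappings on each change.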
The implementation for ProjectState.add_model would be as follows:
    # Please note that the final implementation would be more optimized and cleaner.
    def add_model(self, model_state):
        app_label, model_name = model_state.app_label, model_state.name_lower
        # Central registry population start
        for name, field in model_state.fields.items():
            if field.is_relation:
                to_app_label, to_model_name = field.remote_field.model.split('.')
                # to_fields is absent, or contains None, when no to_field was given explicitly.
                to_fields = getattr(field, 'to_fields', None)
                if to_fields is None or None in to_fields:
                    # "pk" would be replaced with the primary key field name of the model.
                    to_fields = 'pk'
                # Keyed like self.models so that later lookups by name_lower succeed.
                self.related_fields_registry.setdefault((app_label, model_name), []).append(
                    (field, to_app_label, to_model_name, to_fields)
                )
        # Central registry population end
        self.models[(app_label, model_name)] = model_state
        if 'apps' in self.__dict__:  # hasattr would cache the property
            self.reload_model(app_label, model_name)
To get the primary key from a ModelState in order to replace "pk" in the above code, we will set a new pk attribute in the __init__ method of the ModelState class. No extra iteration is needed, as pk would be set in the loop used for the sanity checks. The new code would be as follows:
    def __init__(self, app_label, name, fields, options=None, bases=None, managers=None):
        # ...Previous_Initializations...
        # New code for primary key starts
        self.pk = None
        for name, field in self.fields.items():
            if self.pk is None and field.primary_key:
                self.pk = field
        # New code for primary key ends
        # ...All_Sanity_Checks...
        # ...More_Sanity_checks...
Additions in ProjectState class.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. CreateModel.state_forwards calls into ProjectState.add_model(), where all the storage and cache invalidation logic is encapsulated.
2. The idea is to make the other state_forwards and state_backwards methods work the same way.
3. For example, in AlterTogetherOptionOperation the following code is currently executed:
    def state_forwards(self, app_label, state):
        model_state = state.models[app_label, self.name_lower]
        model_state.options[self.option_name] = self.option_value
        state.reload_model(app_label, self.name_lower, delay=True)
In the new implementation it would be replaced with the following code:
    def state_forwards(self, app_label, state):
        # ProjectState method that will encapsulate the logic
        state.alter_together_option(self, app_label, state)
- This would make it easily testable in isolation and allow it to maintain the new registry mapping.
- New methods such as add_to_fields_registry and search_in_registry (to work with the new central registry) would also be introduced.
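To illustrate why moving the logic into ProjectState makes it testable in isolation, here is a reduced sketch with toy classes. Everything in it is an assumption made for illustration: ToyModelState, ToyProjectState, and ToyAlterUniqueTogether are hypothetical stand-ins, and the argument list of alter_together_option is one possible shape, not the proposal's final signature.

```python
class ToyModelState:
    """Hypothetical, reduced stand-in for ModelState."""

    def __init__(self, app_label, name, options=None):
        self.app_label = app_label
        self.name = name
        self.options = dict(options or {})


class ToyProjectState:
    """Hypothetical, reduced stand-in for ProjectState."""

    def __init__(self):
        self.models = {}
        # Records which models would need to be reloaded; no rendering happens here.
        self.reloaded = []

    def add_model(self, model_state):
        self.models[model_state.app_label, model_state.name.lower()] = model_state

    def alter_together_option(self, app_label, model_name, option_name, option_value):
        # Mutation and invalidation live in one place instead of in each operation.
        self.models[app_label, model_name].options[option_name] = option_value
        self.reloaded.append((app_label, model_name))


class ToyAlterUniqueTogether:
    """Hypothetical operation whose state_forwards only delegates to the state."""

    option_name = "unique_together"

    def __init__(self, name, option_value):
        self.name_lower = name.lower()
        self.option_value = option_value

    def state_forwards(self, app_label, state):
        state.alter_together_option(
            app_label, self.name_lower, self.option_name, self.option_value
        )


state = ToyProjectState()
state.add_model(ToyModelState("blog", "Post"))
ToyAlterUniqueTogether("Post", {("title", "slug")}).state_forwards("blog", state)
```

With this shape, the state method can be exercised directly in a unit test without constructing an operation or a database connection.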
- The new BaseDatabaseSchemaEditor and DatabaseSchemaEditor classes would be based on the initial patch by Markus Holtermann. For example, the new implementation of delete_model in BaseDatabaseSchemaEditor would be as follows, where get_to_field gets the to_field from the new registry. (This is only example code; the real implementation will differ a bit.)
    def delete_model(self, model):
        """Delete a model from the database."""
        if isinstance(model, ModelState):
            # ModelState.fields is a dict, so iterate over its items.
            for name, field in model.fields.items():
                if field.many_to_many and self.project_state.get_to_field(
                    model.app_label, model.name_lower, field
                ).auto_created is True:
                    through_table = '%s_%s_%s' % (
                        model.app_label, model.name_lower, name.lower(),
                    )
                    through_db_table = truncate_name(
                        through_table, self.connection.ops.max_name_length()
                    )
                    self.execute(self.sql_delete_table % {
                        "table": self.quote_name(through_db_table),
                    })
        else:
            warning_on_model_class(self, self.delete_model)  # Warn if the model class is passed
            # Handle auto-created intermediary models
            for field in model._meta.local_many_to_many:
                if field.remote_field.through._meta.auto_created:
                    self.delete_model(field.remote_field.through)
        # Delete the table
        self.execute(self.sql_delete_table % {
            "table": self.quote_name(model._meta.db_table),
        })
        # Remove all deferred statements referencing the deleted table.
        for sql in list(self.deferred_sql):
            if isinstance(sql, Statement) and sql.references_table(model._meta.db_table):
                self.deferred_sql.remove(sql)
- The get_to_field method would be defined in the ProjectState class to get the to_field from app_label, model_name, and field_name.
- Other methods in BaseDatabaseSchemaEditor and DatabaseSchemaEditor would be edited the same way.
- Along with BaseDatabaseSchemaEditor, other SchemaEditor classes such as the DatabaseSchemaEditor of postgresql would be adapted to work according to the proposed changes.
- While adapting to the changes, backward compatibility would be kept in mind.
- All the schema editor methods that support ModelStates will keep working with model classes the same way as they did before the patch, and a new warning would be added for that case.
Because SchemaEditor methods need access to a ProjectState instance, a new context manager would be used so that the signatures of the existing methods do not change. The code would be as follows:
    from contextlib import contextmanager

    @contextmanager
    def patch_project_state(schema_editor, project_state):
        schema_editor.project_state = project_state
        try:
            yield
        finally:
            del schema_editor.project_state
The new database_forwards and database_backwards methods of Operation subclasses will use the newly defined patch_project_state context manager to pass the state on to the schema editor. The code would be as follows:
    with patch_project_state(schema_editor, from_state):
        # could be any schema editor method
        schema_editor.remove_field(from_model, from_model._meta.get_field(self.name))
The new database_forwards and database_backwards methods will also use project states instead of rendered models:
    from_model_state = from_state.models[app_label, self.model_name_lower]
    if self.allow_migrate_model(schema_editor.connection.alias, from_model_state):
        with patch_project_state(schema_editor, from_state):
            schema_editor.remove_field(from_model_state, self.name)
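The attach-and-detach behaviour of the context manager can be tried outside Django with a toy editor. In this sketch ToySchemaEditor and its log attribute are hypothetical and exist only to show that the method signature stays the same while the state is reachable inside the block and removed afterwards, even when an exception is raised.

```python
from contextlib import contextmanager


@contextmanager
def patch_project_state(schema_editor, project_state):
    # Attach the state for the duration of the block, then always detach it.
    schema_editor.project_state = project_state
    try:
        yield
    finally:
        del schema_editor.project_state


class ToySchemaEditor:
    """Hypothetical stand-in for a schema editor."""

    def __init__(self):
        self.log = []

    def remove_field(self, model_state, field_name):
        # The signature is unchanged; the state arrives as an attribute.
        state = getattr(self, "project_state", None)
        self.log.append((model_state, field_name, state is not None))


editor = ToySchemaEditor()
with patch_project_state(editor, {"models": {}}):
    editor.remove_field("blog.post", "slug")
```

One consequence of this design is that nested or concurrent use of the same editor with different states would overwrite the attribute, so the sketch assumes one active state per editor at a time.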
- The new ModelState class will have a _meta property that returns a ModelStateOptions object. ModelStateOptions will work similarly to the Options class of model classes and will have properties like app_label, model_name, db_table, etc. The _meta property defined in the ModelState class would be as follows:
    @cached_property
    def _meta(self):
        return ModelStateOptions(self)
The code for ModelStateOptions would be similar to the following:
    class ModelStateOptions:
        def __init__(self, model):
            self.model = model

        @property
        def app_label(self):
            return self.model.app_label

        @property
        def model_name(self):
            return self.model.name_lower
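As a rough sketch of how a db_table property could sit next to app_label and model_name, the following uses a toy model state rather than Django's ModelState. ToyModelState is hypothetical, functools.cached_property stands in for Django's own cached_property, and the fallback to "<app_label>_<model_name>" mirrors the default table name of model Options while leaving out truncation to the backend's identifier length.

```python
from functools import cached_property


class ModelStateOptions:
    def __init__(self, model):
        self.model = model

    @property
    def app_label(self):
        return self.model.app_label

    @property
    def model_name(self):
        return self.model.name_lower

    @property
    def db_table(self):
        # An explicit db_table option wins; otherwise build the default name.
        # Assumption: identifier truncation is omitted in this sketch.
        explicit = self.model.options.get("db_table")
        return explicit or "%s_%s" % (self.app_label, self.model_name)


class ToyModelState:
    """Hypothetical, reduced stand-in for ModelState."""

    def __init__(self, app_label, name, options=None):
        self.app_label = app_label
        self.name = name
        self.options = dict(options or {})

    @property
    def name_lower(self):
        return self.name.lower()

    @cached_property
    def _meta(self):
        # Built once per state instance and then reused.
        return ModelStateOptions(self)


default_state = ToyModelState("blog", "Post")
custom_state = ToyModelState("blog", "Post", {"db_table": "posts"})
```

Because _meta is cached, the options object would need to be discarded whenever the state's options change, which fits the registry invalidation described earlier.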
Before I start to code I would like to complete the following checklist:
- I will understand the codebase of the remaining classes and Operations. So far I have understood the code for BaseDatabaseSchemaEditor, ModelState, ProjectState, Operation, Apps, CreateModel, DeleteModel, ModelBase, Options, RenameModel, and a few other related classes.
- I will study the code for the special, fields, and model operations in depth, along with the DatabaseSchema of all backends. I will also learn some best practices to use while writing code.
- While studying the codebase I will discuss my idea in depth, including how I plan to achieve it, and will note the points raised by fellow developers.
I will start working on the above-mentioned checklist during the application review period. My exams will be held in mid-August (the final date is not decided yet). Until then I have online classes, so I will be able to devote 40-45 hours a week (5-6 hours on weekdays and 7-8 hours on weekends) throughout the GSoC period. As there is only one evaluation during this time, I will try to finish the task ahead of schedule.
I would like to devote 70% of my time to learning and coding, and 30% of my time to testing the changes and writing documentation for the new features. I will write a blog post every weekend to keep the community aware of my progress, my contributions, and the plan for the next week.
- Discuss the approach and implementation with senior developers or mentors.
- Figure out whether there is a better approach to the problem or whether the solution could be improved in any way.
- I will use a bottom-up approach to code. I will start by proxying the state_forwards and state_backwards methods of all Operation subclasses into ProjectState, then I will add the logic for the central registry, and then I will adapt the schema editor's actions and properties to ModelState.
(From May 22 to June 24) During this phase, I will work on the central registry and will make ProjectState and the Operation subclasses adapt to it.
3.2.1 Initialization and population of central registry (1.5 weeks) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Creating the proxy for the state_forwards and state_backwards methods of all Operation subclasses in ProjectState.
- Initialization of the central registry in ProjectState (like ProjectState.add_field() etc.).
- Code the logic to populate the registry with ProjectState.add_model() and the newly introduced ProjectState.add_field().
- Fixing and writing tests along with documentation (if required).
3.2.2 Modifications in all state_forwards and state_backwards methods(2 weeks) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Work on the registry alteration methods that would be defined in ProjectState and used by all field and model Operation subclasses to update the registry.
- Code the logic to call all the newly introduced registry alteration methods in the newly introduced ProjectState actions like rename_field etc.
- Fixing and writing tests along with documentation (if required).
- Modification of tests as per the new implementation (if required), and introduction of new tests for the newly introduced changes.
- Writing the documentation for the newly introduced changes, the central registry, and its methods.
(From June 25 to August 6)
3.3.1 Introduction of ModelStateOptions
class(1.5 week) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Creating a ModelStateOptions class that would be accessible through the _meta property. It will also provide attributes like db_tablespace and will work similarly to the BaseModel options.
- Manually testing and writing tests for the newly introduced ModelStateOptions class and its methods.
- Writing documentation for the newly introduced ModelStateOptions class.
- Code a context manager that passes the ProjectState instance to all the schema editor's method calls so that the argument lists of the methods do not change.
- Adapt all database_forwards and database_backwards methods to send ModelStates instead of rendered models to all SchemaEditor method calls.
- Adapting table_sql() and column_sql() to work with ModelStates and warn if a model class is passed.
- Adapting action methods like create_model(), delete_model(), etc. to work with ModelStates and warn if a model class is passed. (Backward compatibility would be kept in mind while writing the changes.)
- Fixing all the failing tests (if any).
- Adapting the methods of the DatabaseSchemaEditor in mysql/schema.py to work properly with the new ModelState implementation.
- Fixing all the failing tests (if any) for the mysql schema editor.
- Adapting the methods of the DatabaseSchemaEditor in postgresql/schema.py to work properly with the new ModelState implementation.
- Fixing all the failing tests (if any) for the postgresql schema editor.
- Adapting the methods of the DatabaseSchemaEditor in sqlite/schema.py to work properly with the new ModelState implementation.
- Fixing all the failing tests (if any) for the sqlite schema editor.
- Adapting the methods of the DatabaseSchemaEditor in oracle/schema.py to work properly with the new ModelState implementation.
- Fixing all the failing tests (if any) for the oracle schema editor.
(From August 7 to August 13)
- Writing tests for all the newly introduced changes and also fixing the failing tests
- Write documentation for all the Schema Editor changes and newly introduced methods and classes.
- I would like to work on the filterability of window functions. (I have also raised a topic for this on the forum.)
My name is Manav Agarwal and I am a junior at Dr. A.P.J. Abdul Kalam Technical University (India). I started my coding journey in 9th grade with the basics of the Python language and started developing projects with Django in 11th grade. I developed an e-commerce platform using Django for a nearby company, which was one of the best projects I have developed.
I started contributing to Django in September 2020 and I am thankful to all the Senior Developers who helped me out with these contributions. I have a significant number of contributions in different modules of the framework which reflects my self-motivation and shows that I am comfortable with the codebase of Django.
5.1.1 Issues Fixed
- #6517 : MySQL: manage.py dbshell does not get charset from DATABASES setting
- #13060 : ManagementForm exception in case of bad prefix should be easier to understand
- #26607 : Add a hook to customize the admin's formsets parameters
- #29712 : Add warning in makemessages command if the localecode with l flag is not correct
- #31516 : Change automatic migration naming from date-based to operation-based
- #31636 : BooleanFieldListFilter doesn't respect field choices.
- #32294 : fields.E305 is raised on ManyToManyFields with related_name='+' in models in different apps but with the same name.
- #28785 : Tracking Migrations
5.1.2 Pull Requests
- PR-13537 : Fixed #6517 -- Made dbshell use charset option on MySQL.
- PR-13578 : Fixed #13060 -- Improved error message when ManagementForm data is missing.
- PR-13722 : Fixed #26607 -- Allowed customizing formset kwargs with ModelAdmin.get_formset_kwargs().
- PR-13615 : Fixed #29712 -- Made makemessages warn if locales have hyphens and skip them.
- PR-14109 : Fixed #31516 -- Improved naming of migrations with multiple operations.
- PR-13413 : Fixed #31636 -- Made BooleanFieldListFilter respect Field.choices.
- PR-13822 : Fixed #32294 -- Prevented ManyToManyField's hidden related name collisions between apps.
- PR-14246 : Fixed #14246 -- Made makemigrations warn if migrations are missing.(Open PR)
- I in fact created a PR for the migration framework issue (PR-14206) as a proof of concept, but it had many loopholes and improper code, so I closed it.
- Email : dpsman13016@gmail.com
- Timezone : Indian Standard Time (UTC + 5:30)
- Primary Language : English
- Country of Residence : India
- Contact Number : +91 9897659505 / +91 9837693800
- LinkedIn Profile : https://www.linkedin.com/in/manav-agarwal-982553190/