Skip to content

Instantly share code, notes, and snippets.

@akki
Last active March 28, 2017 05:53
Show Gist options
  • Star 5 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save akki/b438292c2c3cf199012f to your computer and use it in GitHub Desktop.
Save akki/b438292c2c3cf199012f to your computer and use it in GitHub Desktop.
My GSOC 2016 proposal for Django

Reworking Django's Indexes

Table of Contents

  1. Abstract
  2. Motivation
  3. Proposed solution and specs
  4. Timeline
  5. About me

1. Abstract - The What

This proposal aims to bring flexibility in Django's operations related to database indexes - their creation, modification. Another goal is to allow Django to support creation of more complex indexes using different methods (like RTree, Hash) on relevant backends (most notably PostgreSQL), such as functional indexes and indexes with different algorithms (e.g. GiST).

2. Motivation - The Why

2.1. So that we can do more with indexes.

Right now, Django isn't able to offer much customisability to it's users when it comes to creating indexes. In fact, db_index and index_together only give the ability to create some specific indexes. But much more can be done with indexes, like declaring indexing in ascending/descending order, for example. Indexes if used properly can speed up a database scan significantly by giving sub-linear time lookup to improve performance.

2.2. Because we need more than BTrees.

There are new fields being introduced in contrib.postgres which have similar issues to the rationale for spatial_index, a BTree index is either insufficent, inefficient or does not work and an alternative index type is needed for that field. A field should be able to declare the type of index which is appropriate to it.

PostgreSQL also supports partial indexes using the WHERE clause.

2.3. Extend expressions into indexes.

In addition, both postgres and oracle support powerful functional indexes (and mysql supports indexing on the first few characters of a string), which allow significant performance benefits for certain queries. You can create indexes on expressions like lower(col1) to speed up queries like SELECT * FROM test1 WHERE lower(col1) = 'value';. Of course, it is possible to create these indexes by hand if you know what you are doing using RunSQL based migrations but this is a place for improvement.

3. Proposed solution and specs - The How

Class based indexes

Introduce django.db.models.indexes with deconstructable Index class. This class would allow us to have the power over creation of indexes depending on the various arguments passed to it.

Index() classes will be able to define their own SQL using the as_sql pattern similar to that used by Lookup(), but will be passed the SchemaEditor class instead of a compiler. By default, this will defer to the SchemaEditor class, but it allows non-standard index creation such as that for spatialite (using a SELECT statement and a function) without having to subclass SchemaEditor.

Meta.indexes

Introduce Meta.indexes for other indexes. This will hold a list of Index() instances and will keep a track of all indexes for a given model. index_together will continue to be supported and would also be added to Meta.indexes internally.

Migrations layer

The general workflow for index addition/deletion would be -Migrations framework (autodetector) detects that Meta.indexes has changed and adds an Operation => The operations.models layer tells migrations that due to the change a new index needs to be added/deleted/altered/renamed => migrations framework implements this change.

In order to have a robust way to detect name changes in indexes, the index name will be explicitly passed from the operations to migrations, whether they are auto-generated or have been defined by user. It would be a part of the deconstructed form of the index.

Some advances in db_index

db_index of the Field class will support index classes (e.g. IntegerField(db_index=indexes.Hash)) other than the values True and False.

Field will gain a method get_default_index which will be called when db_index=True is passed to the __init__. The default value would be the SpatialIndex in gis, which will mark the start of deprecation of spatial_index which would be supported with deprecation warnings.

db_index, index_together would actually be shortcuts for editing Meta.indexes.

Functional Indexes

The FunctionalIndex class will be introduced to allow use of functional indexing. It would let expressions to be used in index definitions. It would take in expressions (e.g. F, lower) based on which indexes would be created.

The as_sql of this class would use the power of django expressions and their already existing methods (like resolve_expression, as_sql, etc.) to create the sql query. For example, an expression like F('col1') + F('col2') creates a CombinedExpression which can be used to get the corresponding sql statement of this expresion. The connection would be used from the passed SchemaEditor class object.

Syntax

Index classes will take the fields, name of the index, method to be used while creating the index, etc. as its arguments. name would allow the user to pick an explicit name for their index if they want. Also, passing the name as an argument to the class would be easier to implement and would be helpful while checking for duplicate index names (as compared to an index_name = Index('field1') approach).

The idea is to create an index for each value in the Meta.indexes list. This way an index like Index(['field1', 'field2']) would basically imply a multi-column index. Also, names of the index would be passed as an argument to the Index class.

Example:

# models.py

class Town(models.Model):
   name = models.CharField(max_length=255)
   slug = models.SlugField(unique=True)
   region = gis.PolygonField(db_index=True)
   area = models.FloatField()
   population_density = models.FloatField()

   class Meta:
      indexes = [
         models.Index('slug', 'name'), # multi-column indexing.
         models.Index('name', name='user_given_index_name'),
         models.FunctionalIndex(F('area') * F('population_density'), type=models.BTree),
      ]

Backwards compatibility

spatial_index will be deprecated as setting db_index to True on a spatial field would be the new way.

4. Timeline - The When

To avoid the whole 12-week work getting merged as a single jumbo patch, the workflow has been designed in such a way that different parts of the work can be merged independently as individual patches. To indicate such areas the timeline contains certain checkpoints meaning that work between two checkpoints can be reviewed and merged separately (although checkpoint 1 is necessary to get any other patch merged). This will facilitate the review process and also allow a better understanding of the changes taking place in an incremental way.

4.1. Basic Work (1.5 weeks)

  • Creating the django.db.models.indexes module and a base class like BaseIndex.
  • Creating the deconstructible Index class, defining its various attributes and methods - deconstruct, as_sql, remove_as_sql and other private methods.
  • Modify the schema(s) in django.db.backends to incorporate a more flexible index creation/deletion definitions and introduce new method for the same.

4.2. Add customisability (1.5 weeks)

  • Creating classes for various index methods like BTree, Hash.
  • Add support for various opclasses and ensure backward compatibility.
  • Add support for using index ordering - #20888.
  • Create documentation for indexes - introduction part and types of indexes available.
  • Quarterly rebasing against master.

4.3. Introduce Meta.indexes (1 week)

  • Add indexes in models.options i.e. Create Meta.indexes to store indexes = [IndexType('field')].
  • Add function to detect changes in django.db.migrations.autodetector.
  • Create AlterIndexes operation in migration layer.
  • Other changes in migrations layer (state, auto_detector etc.). Stitching different parts together.

4.4. Writing tests and update docs (1.5 weeks)

  • Write tests for for autodetector.
  • Write tests for migrations.
  • Write tests for schema.
  • Add tests for #23577 check if everything works fine.
  • Mentions in release notes and other areas of documentation as per requirement.

Checkpoint 1 - Add basic class based index.

This commit would add the basic Index class, it's functionality and would allow it's usage through Meta.indexes. This part is the skeleton of custom indexes and hence is a prerequisite for the rest of the work (checkpoints).

4.5. Integrate index_together (0.5 week)

  • Make index_together use the new indexes's functions.
  • Remove/Deprecate AlterIndexTogether operation, alter_index_together and other related functions.
  • Quarterly rebasing against master.

4.6. Allow index classes to db_index (1 week)

  • Allow support for custom index classes to db_index.
  • Get this change get detected in autodetector as AlterIndexes and not as AlterField.
  • Addition of get_default_index to Field (and it's subclasses, as required).
  • Set db_index to True by default for unique=True and ForeignKey fields.
  • Updating docs.
  • Writing tests.

Checkpoint 2 - Integrate older ways of index creation to use class based indexes internally.

This merge would update the existing ways of modifying indexes i.e. index_together and db_index so that they use the new class based indexes. It can be further broken into separate commits for index_together and db_index if required.

4.7. Integrate spatial index (2 weeks)

  • Creating class for SpatialIndex type in gis.db.models.indexes.
  • Take into account definitions of various backends.
  • Tuning it up to be used with db_index in BaseSpatialField.
  • Writing test for db_index=SpatialIndex().
  • Updating docs of both indexes and gis.
  • Removing any non-required code. Deprecation of of spatial_index.
  • Quarterly rebasing against master.

Checkpoint 3 - Integration and deprecation of spatial_index.

4.8. Add functional indexes (2 weeks)

  • Allow expressions as arguments.
  • Integrating the expressions api.
  • Resolving the expression and using it's sql for index creation.
  • Write various methods
  • Writing the FunctionalIndex.

4.9. Writing tests, docs and winding up (1 week)

  • Write extensive/rigorous tests for functional indexes.
  • Writing In-detail documentation for indexes - usage, specs, etc.
  • Mentioning in release notes and other areas of documentation that reqiure updation.
  • Final rebase against master.

Checkpoint 4 - Add functional indexes.

This merge would allow Django to let it's users create functional indexes.

If time permits

If I finish early, I would try to resolve the tickets which are dependent on the work of this proposal, namely #24530 and #23577 by writing their tests and seeing if anything else is required to resolve them. I would also complete anything else that is required to resolve and close #26167.

5. About me

My name is Akshesh Doshi and I am an undergraduate student at Indian Institute of Technology Roorkee (IITR), India. My time zone is UTC+05:30. I have been programming for 4 years now and have also worked in C++, PHP, Java, C# other than Python.

I am the coordinator of Information Management Group (IMG), a student group responsible for the development of the main website and the intranet portal of the institute (which is obviously made using Django ;)). I had joined the group 3 years ago and that is when I had my first experience with Django and fell in love with Python. You can find more about the work over here. I have used Django for many other projects and have used Python for things other than web development like image processing, data analysis and machine learning as well. I have successfully completed many big projects in the past and take responsibilities assigned to me very seriously.

It's been some months since I started contributing to Django. While there have been some merged patches, there are other works which are under development. My experience till now has been really great and I have found the community very welcoming.

I can communicate in English. You can find me hanging out on #django-dev IRC channel as akki. My email id is aksheshdoshi+gsoc@gmail.com.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment