GraphQL plugin for Django

GSoC 2020 proposal for Django by Chetan Khanna

Abstract

Current state of GraphQL and Django

Since its announcement by Facebook, GraphQL has become quite popular. Unlike REST, GraphQL currently has no official support in Django.
There are, however, quite a few GraphQL implementations available in Python, and since Django is one of the most prominent web frameworks out there, most of them provide some sort of support or extension for it. In other words, Django users are already using GraphQL widely in their projects.
However, most of these libraries are based on the reference implementation written by Facebook in JavaScript, which leaves a lot of room to make them more Pythonic and better suited to Django conventions. There is also scope for improving query handling and optimization with respect to the ORM.
Goals

The goal of this project is to set up a Django plugin that helps Django users develop projects using GraphQL. I will aim to address some of the core issues in the existing third-party projects and bring them closer to the Django environment.
Benefits

Having an API built solely for Django means we can leverage all the goodness of the ORM and the way Django resolves queries directly in it. By following the usual naming conventions and patterns used elsewhere in the Django ecosystem, we can largely spare our users the overhead of learning a new API. Since Django already supports development of RESTful APIs, we should definitely give our users the same support for GraphQL.
The new API

Approach and principles

I propose to write this as a third-party extension to Django. I'll take a code-first approach to the schema, meaning the resolver functions will be bound to the type definitions. This should remove the need to keep the two in sync and avoid the overhead of linking resolver functions to the schema, which becomes cumbersome as a project scales and goes against the Django principles of quick development with less code and, to some extent, DRY.
Instead of wrapping an existing library, I propose to write the code bottom-up. The reason is that I want to address a few core issues in other libraries, and building from scratch gives us more room to tailor things to our needs. It also means we can cut down on additional library requirements for Django projects by integrating features here, either during GSoC or during further development. However, a lot of effort has already been put into developing some really nice APIs for integrating GraphQL into Python and Django. It would not be wise to ignore them, so I will reuse or refactor existing code wherever possible.
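To make "code-first" concrete, here is a minimal sketch. The names `Field`, `ObjectType` and `BookType` are hypothetical and chosen only for illustration; they are not the final API, which is still to be agreed with the mentors.

```python
# Hypothetical sketch of a code-first type: fields and resolvers live together.
class Field:
    """Declares a field and remembers the name of the GraphQL type it returns."""

    def __init__(self, type_name):
        self.type_name = type_name


class ObjectType:
    """Hypothetical base class; collects the Field attributes of a subclass."""

    @classmethod
    def fields(cls):
        return {
            name: value
            for name, value in vars(cls).items()
            if isinstance(value, Field)
        }


class BookType(ObjectType):
    title = Field("String")
    page_count = Field("Int")

    # The resolver sits next to the field it serves, so there is no
    # separate schema definition to keep in sync with it.
    def resolve_page_count(self, book):
        return len(book["pages"])
```

Because the schema is derived from the class itself (`BookType.fields()`), adding or removing a field is a single change in one place.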
API design

The official GraphQL spec is brilliantly written and I'll be using it as the basis of this new API (well, it's literally the basis of every GraphQL library out there). The basic implementation of any GraphQL service is language agnostic, and roughly has the following structure:

I'll add a brief description and possible implementations for each, but the main improvements over the existing APIs will be in the type module, which defines the type system for our API, and the execution module, which contains the logic for ORM-optimized query execution. The design and suggested implementations are still at a high level, but I've tried to address issues and use cases in as much detail as possible.
GraphQLView

I plan on implementing a new View type for dealing with GraphQL queries. This will be the kitchen where all the other modules are called to parse, validate and execute the query and generate a valid response as per the GraphQL spec. This will be very similar to how graphene-django does it. Here is the implementation from graphene-django.
However, initially I plan on keeping the view as simple as the Python equivalent of the graphql-js implementation here.
Once the request is processed and the query is executed, we need to serialize the returned object for the client. I plan on putting this logic inside the GraphQLView initially.
lang module

This is the module that checks for syntax errors in the received query. It is pretty much independent of the implementation language. The idea is to tokenize the incoming query string using a lexer and then parse the tokens into an AST.
The GraphQL spec covers this in detail, and a lot of libraries have implemented it in a more or less similar fashion, so we can follow the same approach.
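As a rough illustration of the tokenizing step, the sketch below uses a single regular expression. The token kinds are loosely modelled on the lexical grammar in the spec; the names `TOKEN_SPEC`, `tokenize` and `GraphQLSyntaxError` are assumptions made for this example, and block strings, escape sequences and comments are deliberately left out.

```python
import re

# Order matters: FLOAT must be tried before INT, SPREAD before PUNCT.
TOKEN_SPEC = [
    ("SPREAD", r"\.\.\."),
    ("PUNCT", r"[!$&()\[\]{}:=@|]"),
    ("NAME", r"[_A-Za-z][_0-9A-Za-z]*"),
    ("FLOAT", r"-?(?:0|[1-9]\d*)(?:\.\d+(?:[eE][+-]?\d+)?|[eE][+-]?\d+)"),
    ("INT", r"-?(?:0|[1-9]\d*)"),
    ("STRING", r'"[^"\\\n]*"'),
    # Whitespace and commas are insignificant in GraphQL documents.
    ("IGNORED", r"[ \t\r\n,]+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC))


class GraphQLSyntaxError(Exception):
    """Raised when the query contains a character no token can start with."""


def tokenize(source):
    """Turn a query string into a list of (kind, text) tuples."""
    pos = 0
    tokens = []
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise GraphQLSyntaxError(
                f"Unexpected character {source[pos]!r} at position {pos}"
            )
        if match.lastgroup != "IGNORED":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

For example, `tokenize("{ user(id: 4) { name } }")` yields punctuator, name and integer tokens that a parser could then assemble into an AST.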
type module

This provides the building blocks for our schema. We write our type system in this module. This will be a fairly important module, since it will ultimately affect how we build and process our schema. Again, this is well covered in the spec.
I want, however, to discuss the specific design of this module with my mentors. As I stated earlier in the proposal, a lot of other APIs use an implementation converted directly from graphql-js, so there is room for improvement in Python. Currently, in graphql-core, the type definitions are class instances, or simply objects. For example:
GraphQLInt = GraphQLScalarType(
    name="Int",
    description="Description",
)
As such, resolver functions cannot be bound to it. It should clearly be a class instead:
class GraphQLInt(GraphQLScalarType):
    name="Int"
    description="Description"
Graphene does rewrite its type system; however, it still has to work around this, since it uses graphql-core for basic query operations. This issue is also discussed on the graphql-core and strawberry pages.
We can retain the type system definition of graphene, which will be fine since this will be an integrated API and not just a wrapper over an existing one. Alternatively, we can follow the pygraphy package, which uses a base metaclass for all type classes. I will have to discuss this with my mentors though.
I also propose the use of explicit decorators to mark the resolver function for a field. Currently, graphene (and graphene-django) links a field with its resolver by prepending resolve_ to the field name (similar to how we name test methods, except that tests don't have to be linked to anything). For example, if we have a field named foo, then any instance method named resolve_foo() is automatically used for resolving this field. But if we rename the field to bar, the function defined earlier no longer resolves it. Marking the methods with decorators will hopefully overcome this issue. In fact, this proposal is already discussed in graphene here.
Finally, I also plan to add extensions for all existing Django fields to the type system.
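A decorator-based link between field and resolver could look roughly like the sketch below. The `resolver` decorator, the `get_resolver` lookup and `UserType` are hypothetical names used only to illustrate the idea; the actual design may differ after discussion.

```python
def resolver(field_name):
    """Mark a method as the resolver for ``field_name`` (hypothetical decorator)."""

    def mark(func):
        func._resolves = field_name
        return func

    return mark


class ObjectType:
    """Hypothetical base class that looks up decorated resolvers."""

    @classmethod
    def get_resolver(cls, field_name):
        for attr in vars(cls).values():
            if getattr(attr, "_resolves", None) == field_name:
                return attr
        # Failing loudly here is what the naming convention cannot do:
        # a renamed field no longer silently loses its resolver.
        raise LookupError(f"No resolver registered for field {field_name!r}")


class UserType(ObjectType):
    # The method name is free to change; the link to the field is explicit.
    @resolver("full_name")
    def compute_name(self, user):
        return f"{user['first']} {user['last']}"
```

With this shape, the schema builder could check at import time that every field has a registered resolver, instead of relying on a `resolve_` prefix matching by coincidence.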
validation module

Once our schema design is done, we can use the visitor pattern to validate incoming queries against it. This is common to a lot of GraphQL implementations, and we can adapt it to our schema implementation. Here are the reference implementations in graphql-js and graphql-core.
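As an illustration of visitor-based validation, the sketch below checks a single rule (every queried field must exist on its parent type) against a toy schema. `FieldNode`, `SCHEMA`, `FieldsExistRule`, `visit` and `validate` are all hypothetical names; a real implementation would walk the full AST produced by the lang module and apply every rule from the spec.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FieldNode:
    """Toy AST node: a field name plus its nested selections."""

    name: str
    selections: List["FieldNode"] = field(default_factory=list)


# Toy schema: type name -> {field name -> name of that field's type}.
SCHEMA = {
    "Query": {"user": "User"},
    "User": {"name": "String", "email": "String"},
}


class FieldsExistRule:
    """Visitor that records an error for every field the schema lacks."""

    def __init__(self, schema):
        self.schema = schema
        self.errors = []

    def enter_field(self, node, parent_type):
        known_fields = self.schema.get(parent_type, {})
        if node.name not in known_fields:
            self.errors.append(
                f"Cannot query field '{node.name}' on type '{parent_type}'."
            )
            return None
        return known_fields[node.name]


def visit(nodes, parent_type, visitor):
    """Walk the selections, calling the visitor on each field node."""
    for node in nodes:
        field_type = visitor.enter_field(node, parent_type)
        if field_type is not None:
            visit(node.selections, field_type, visitor)


def validate(selections, schema=SCHEMA):
    rule = FieldsExistRule(schema)
    visit(selections, "Query", rule)
    return rule.errors
```

Keeping each rule in its own visitor class means custom rules can later be added without touching the traversal code.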
execution module

Execution of the query as described in the GraphQL spec is pretty straightforward. We traverse the AST breadth-first and execute the resolver functions attached to each field. Here is the pseudocode from the spec, and the graphql-core implementation of the same.
But this method is pretty naive, and we can improve on it using the many functions that the ORM provides. The idea is to build an optimized queryset while traversing the AST and execute that, instead of the querysets given in the resolver functions directly.
Such an optimization is already attempted in a package called graphene-django-optimizer. It takes the incoming query info and the queryset returned by the resolver functions and outputs a modified queryset using the only, select_related and prefetch_related methods.
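The following toy executor illustrates the resolver-per-field idea. It is a simplified recursive walk over nested dictionaries rather than a real AST, it ignores lists and errors, and every name in it (`AUTHORS`, `BOOKS`, `FIELD_TYPES`, `RESOLVERS`, `execute`) is an assumption made for this example.

```python
# Toy in-memory data standing in for database tables.
AUTHORS = {1: {"name": "Ada"}}
BOOKS = {10: {"title": "Notes", "author_id": 1}}

# Which type an object-valued field returns (hypothetical toy schema).
FIELD_TYPES = {"Query": {"book": "Book"}, "Book": {"author": "Author"}}

# One resolver per field, keyed by type name and field name.
RESOLVERS = {
    "Query": {"book": lambda root: BOOKS[10]},
    "Book": {
        "title": lambda book: book["title"],
        # Each nested object triggers its own lookup, one per parent.
        "author": lambda book: AUTHORS[book["author_id"]],
    },
    "Author": {"name": lambda author: author["name"]},
}


def execute(selections, parent, resolvers=RESOLVERS, type_name="Query"):
    """Resolve each selected field, then recurse into its sub-selections."""
    result = {}
    for name, sub_selections in selections.items():
        value = resolvers[type_name][name](parent)
        if sub_selections:
            child_type = FIELD_TYPES[type_name][name]
            value = execute(sub_selections, value, resolvers, child_type)
        result[name] = value
    return result


query = {"book": {"title": {}, "author": {"name": {}}}}
print(execute(query, None))
```

Each nested object is fetched by its own resolver call, which is the cost an ORM-aware executor would try to avoid.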
We can adopt this in our execution module in one of two ways:

Write the execution module naively to maintain compatibility with the spec, and provide decorators for optimized query execution. This will, however, result in unoptimized queries if the user does not use the decorators, and we may want to show some sort of warning in that case.
Tweak the execution a bit. We still traverse the AST, but instead of directly executing the resolver function, we look into the selection_set of each field and modify the resultant queryset using the appropriate ORM methods. This could be a bit tricky, but it is more promising, and it is the method I would prefer. This way, all our queries will be optimized by default.
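The second option can be sketched without Django as a function that turns a nested selection into arguments for the ORM. `FOREIGN_KEYS` and `plan_queryset` are hypothetical names; in a real implementation the relation information would come from the model metadata, and reverse or many-to-many relations would need prefetch_related, which this sketch does not cover.

```python
# Hypothetical description of which fields are forward foreign keys.
FOREIGN_KEYS = {
    "Book": {"author": "Author"},
    "Author": {"country": "Country"},
    "Country": {},
}


def plan_queryset(selections, model, prefix=""):
    """Return (only_fields, select_related_paths) for a nested selection."""
    only, related = [], []
    for name, sub_selections in selections.items():
        path = f"{prefix}{name}"
        target = FOREIGN_KEYS[model].get(name)
        if target is None:
            # Plain column: fetch just this field.
            only.append(path)
        else:
            # Foreign key: join it in and plan its sub-selection too.
            related.append(path)
            sub_only, sub_related = plan_queryset(sub_selections, target, f"{path}__")
            only.extend(sub_only)
            related.extend(sub_related)
    return only, related


selection = {"title": {}, "author": {"name": {}, "country": {"code": {}}}}
only, related = plan_queryset(selection, "Book")
# Assumed usage with a Django model named Book (not executed here):
#   Book.objects.select_related(*related).only(*only)
```

The resulting lists use Django's double-underscore lookup syntax, so they could be passed straight to `select_related()` and `only()` on the queryset returned by a resolver.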

The exact implementation will also depend on how we design our schema and type module, but graphene-django-optimizer will still be a useful package to look at. I might also have to look at the internals of query execution in the ORM (although, after working on more than a couple of ORM-related tickets, I do have some idea). Regardless, I will need more discussion with my mentors about the exact implementation and about anything I might be overlooking.
Schedule and milestones

(Due to the current circumstances, the schedule of university exams may change, although the semester is unlikely to be extended since online classes are ongoing. I've prepared the schedule as per the original examination schedule.)
The schedule below is only a rough estimate as per my understanding.
I've intentionally stretched out the implementation of several modules to leave room for any backtracking that I might have to do if things go south. For the same reason, I've also moved plenty of items to the post-GSoC section. I've also allocated a good amount of time for testing and documentation, since I felt the lack of good documentation while researching previous implementations of GraphQL.
Regardless, I'm well aware that there will be plenty of work to do on this library even after I implement everything proposed below, and I plan to keep contributing to it well after GSoC.
Pre result announcement

First off, there are a couple of tickets that I was working on but had to leave incomplete to focus on my application. I will attempt to close them.
Next, since almost the entire community bonding period will be taken up by my university exams, I plan to do some of the research work that I would otherwise have carried out then. I'll also try experimenting with the tougher parts of the API to help during the actual implementation, especially concepts that are newer to me, like ASTs and the visitor pattern.
Community Bonding

(From May 4 until May 31 -- 4 weeks)
If the university examinations go as planned, I will be free by mid-May. (In any case, examinations usually don't exceed a couple of weeks.) I will then be able to proactively engage with the community and mentors regarding my project, and will concretely decide upon the design and any component that I need to reconsider.
First milestone: Implementing lang module

(From June 1 until June 14 -- 3 weeks)
The module is pretty straightforward, but it involves concepts like ASTs that I haven't worked with before. Once this module is done, the API will be able to parse any valid query into an AST and raise a syntax error otherwise.
Writing the lexers (1 week)

I'll start by writing a minimal view class and go on to implement the lexer for the incoming query. By the end of this week I should be done with the initial setup and be able to tokenize the query string according to the GraphQL spec.
Parsing the tokens into AST (1 week)

I'll be implementing the parser and AST this week. This may extend a bit into the next week, and I might require some help from my mentors as well.
Testing and documentation (1 week)

After finishing off any work left over from the previous week, I'll focus on writing tests and documentation for the API so far.
By the time of the first evaluation, I should have query parsing into an AST ready, with tests and documentation.
Second milestone: Implement the type system

(From June 15 until July 12 -- 4 weeks)
I will be implementing the type module in this period. I will need a lot of input from my mentors here. I have to decide whether the approach I suggested above is good to follow or whether a new or improved one is needed, but the basic idea is to make the existing type system more Pythonic.
Implementing a rough POC for the type system (1 week)

In order to evaluate our options and check their feasibility, I might have to implement a rough POC. I will take reference from the links above and also keep up the discussion with my mentors while doing so.
Implementing the final design (2 weeks)

Once the approach is finalized, I will implement it during this time. Depending on the time available, I may decide to skip some non-crucial parts of the schema definition (directives, for example). However, I would like to at least extend the type system to all the fields Django offers.
Testing and documentation (1 week)

This will be the final week of schema implementation, which I will spend solely on writing tests and documentation.
Third milestone: Execution framework

(From July 13 until August 9 -- 4 weeks)
(I'll be travelling back to my university and will have to register for the next semester during this time, so I will need 2-3 days off.)
Although validation logically comes before execution, this is the more important module and the one where I'll need more help from my mentors, so I am scheduling it first.
Finalizing design for the module (1 week)

I will discuss with my mentors how we can optimize query execution, and may need to come up with a rough POC in the process. The aim will be to explore possible improvements over the referenced implementations. I will also look into any other approach suggested by my mentors in this period.
Implementing the final design (2 weeks)

Once the design and optimizations are finalized, I will begin implementing the module accordingly. This is one of the more important modules, so I will stay in constant discussion with my mentors to ensure I am on the right track.
Testing and documentation (1 week)

If everything goes on schedule, I will dedicate this week to writing tests and documentation for this module.
Fourth milestone: Validation of query

(From August 10 until August 30 -- ~3 weeks)
Again, this one is relatively straightforward since I know exactly what is to be implemented. I'll be using the visitor pattern and the rules for validation as defined in the spec to validate the query.
Writing the validation module (2 weeks)

For this part, I'll be writing tests as I write code (I think this is a good way to ensure everything is actually validated). This should take about a couple of weeks.
Documentation and remaining tests (1 week)

I will complete any remaining tests and devote the rest of my time to finishing off the documentation during this period.
I will submit my work at the end of this period. I will keep the last week before the result announcement for completing any pending work, tests and documentation.
Remaining work

NOTE: There will be a lot of work remaining on this new API, but I hope this will not be a wasted endeavour and will in fact greatly boost progress in this direction. I hope the community won't dismiss the project because of its size or because it would be too tough to complete within the GSoC timeframe.
As I stated earlier, I'm listing a few things that I know will still need more work, although it's possible that I might implement some of them during GSoC:

improving the GraphQLView class
serializing the response generated from executing the query for the client
improving errors and making them more verbose and helpful
adding authentication and authorization to the query requests
adding the ability to create custom validation rules
more rigorous testing and documentation

I would be more than happy to continue working on it after GSoC ends.
About me

My name is Chetan Khanna and I'm currently in my third year of undergraduate studies, majoring in Mathematics and Computer Science at BITS Pilani, India. My timezone is UTC+5:30. I have been coding in Python for about a couple of years now. I learnt Django about a year back for a college project, which I'm currently leading.
When I first started contributing to Django, I was intrigued by the code behind its powerful ORM. All my contributions are to that part of the project. I've helped fix #30827 and #29871, and I've also contributed to #29338, #29844 and #29214.
I've done a lot of experimenting with Python since I learnt it during the summer break of my first year of college, from using Pygame and constraint satisfaction problems to rebuild classic games to basic ML/DL. Currently I'm working under one of our associate deans, building web platforms that serve about 6000 students from all three campuses of our university. Some parts of this work are in a private repository, so I won't be able to list them here. Most of my other work is available on GitHub.
Link to my GitHub profile
Link to our organization
My email is ck.chetan20@gmail.com. I'm on the Django forum under the username ChetanKhanna. I've also recently joined IRC under the username chetankhanna. You can ping me on #django and #django-dev.