@anirudhprabhakaran3
Last active March 30, 2024 07:33
[GSoC 2024] Configurable Content Type Parsing

Introduction / Motivation

This project is motivated by an effort to modernize the HttpRequest object. One of the proposed changes is a content-aware request.data property. Currently, request.POST already parses form data. This project aims to:

  • add a request.data property that will store this data from now on (request.POST will alias it to ensure backward compatibility)
  • provide a general class (ContentParser/AbstractContentParser) that can be extended. This class will implement (at least) the following functions:
    • can_parse() -> bool, which returns whether the parser can parse the request.body.
    • parse(), which returns the parsed body and stores it in request.data.
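
To make the two functions above concrete, here is a minimal sketch of what such a base class might look like. The signatures (both methods taking the request), the media_type attribute, the JSONContentParser subclass, and the FakeRequest stand-in are all assumptions made for illustration. The final API is still to be decided in the DEP.

```python
import json
from abc import ABC, abstractmethod


class FakeRequest:
    """Hypothetical stand-in for Django's HttpRequest, for illustration only."""

    def __init__(self, body: bytes, content_type: str):
        self.body = body
        self.content_type = content_type
        self.data = None


class ContentParser(ABC):
    """Sketch of the proposed base class; names and signatures are assumptions."""

    media_type: str = ""

    def can_parse(self, request) -> bool:
        # Compare only the media type, ignoring parameters such as charset.
        media_type = request.content_type.split(";")[0].strip().lower()
        return media_type == self.media_type

    @abstractmethod
    def parse(self, request):
        """Parse request.body, store the result in request.data, and return it."""


class JSONContentParser(ContentParser):
    """Example subclass (hypothetical) that handles JSON bodies."""

    media_type = "application/json"

    def parse(self, request):
        request.data = json.loads(request.body)
        return request.data


request = FakeRequest(b'{"name": "django"}', "application/json; charset=utf-8")
parser = JSONContentParser()
if parser.can_parse(request):
    parser.parse(request)
print(request.data)
```

Keeping can_parse() separate from parse() lets the request try several parsers in order and pick the first one that accepts the content type.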

Previous Discussions

There has been considerable discussion on this topic. There is a draft DEP (that can be found here) that introduces the idea of content parsing. It mainly focuses on the content parsing that other (extremely successful) extensions to Django have implemented, one of the most famous being django-rest-framework.

Separately, Adam Johnson made this proposal to rename the request object's attributes so that they are more Pythonic and more informative about their actual function, thereby removing any ambiguity. After discussion, the original issue was marked as wontfix. This was due to concerns about:

  • Documentation. As pointed out by Carlton Gibson, the naming change alone requires documentation updates in a lot of places.
  • Backward compatibility. This would definitely cause some problems in keeping older APIs working, especially the ones that depend on request.POST.
  • The original proposal just wanted to change nomenclature, and nothing more. Having such a disruptive change for limited functionality improvement was not considered optimal, and it was suggested to club this with the content parsing to provide a new set of APIs with new functionalities. Discussion can be found here.

As mentioned above, the final idea that emerged was to merge these two concerns into one overarching choice to improve the request object itself - provide new APIs that have content parsing in them.

There has been work done in these two directions:

  • David Smith's patch tackles the content parsing part, by adding the BaseParser and JSONParser. This also abstracts out FormParser and MultiPartParser.
  • Abhinav Yadav's patch tackles the issue of renaming the variables (like request.GET -> request.query_params, etc.).
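
As an illustration of how a rename such as request.GET -> request.query_params could keep the old name working, here is a minimal sketch. The FakeRequest class is a hypothetical stand-in and is not taken from either patch. The choice of PendingDeprecationWarning and the wording of the message are also assumptions, since whether the old names should warn at all has not been decided.

```python
import warnings


class FakeRequest:
    """Hypothetical stand-in for HttpRequest; not Django's implementation."""

    def __init__(self, query_params):
        self.query_params = query_params

    @property
    def GET(self):
        # Old name kept as an alias so that existing code keeps working.
        warnings.warn(
            "request.GET is an alias of request.query_params.",
            PendingDeprecationWarning,
            stacklevel=2,
        )
        return self.query_params


request = FakeRequest({"page": "2"})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = request.GET["page"]
print(value, len(caught))
```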

These MRs are currently blocked, pending a vote by the Technical Board and a DEP being proposed. Relevant thread.

Work Proposed

As noted above, quite a lot of work has already happened in this area. However, there are still some bits and pieces left to make this change more cohesive and effective, without it being disruptive. The following points outline what I wish to work on during the GSoC period:

  • Assist Carlton in writing the DEP. I think the draft DEP linked above would be a good starting point, since it is very coherent about the changes needed. However, its main focus was JSON parsing (bringing in functionality present in other packages like Django REST Framework), not a generic approach. We should update it to cover more generic content parsing. It should also include the proposed changes to the request variables.
  • The MR also has to be updated to add configurable content parsers. After discussion, it was decided to not use a setting variable for this. If custom parsers are required throughout the project, users can add their parser in the middleware. The ideal approach would be to add this on a per-view basis. I think having a decorator would be pretty good, as this seems to be the way that most view-specific functionality is implemented in Django.
  • Finish the work on the MRs and get them merged after Technical Board approval. Ideally, the approach would be to create a release candidate (RC) branch, which we ask a few volunteers to use to check what breaks. (I myself would like to volunteer - there are a couple of projects that I am working on where I would love to try it out, including one that is deployed in a production environment.) Collecting feedback from these volunteers would be the best way to proceed.
  • Update documentation. As shown by Carlton, there are a lot of places where the docs need to be updated. They have to be updated to show all the new features that have been introduced. As pointed out, this will require release notes.
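
The two ways of attaching parsers mentioned above (project-wide in middleware, or per view with a decorator) can be sketched as follows. Everything here is hypothetical: the use_parsers decorator, the parsers attribute on the request, and the FakeRequest and JSONParser classes are illustrative names, not the API from the MR. Only the shape of parser_middleware follows Django's documented function-based middleware pattern.

```python
import functools
import json


class JSONParser:
    """Hypothetical parser standing in for a BaseParser subclass."""

    media_type = "application/json"

    def can_parse(self, content_type: str) -> bool:
        return content_type.split(";")[0].strip().lower() == self.media_type

    def parse(self, body: bytes):
        return json.loads(body)


class FakeRequest:
    """Hypothetical stand-in for HttpRequest with a settable parser list."""

    def __init__(self, body: bytes, content_type: str):
        self.body = body
        self.content_type = content_type
        self.parsers = []

    @property
    def data(self):
        # Use the first attached parser that accepts the content type.
        for parser in self.parsers:
            if parser.can_parse(self.content_type):
                return parser.parse(self.body)
        raise ValueError(f"No parser accepts {self.content_type!r}")


def use_parsers(*parser_classes):
    """Hypothetical per-view decorator: sets the parsers for one view only."""

    def decorator(view):
        @functools.wraps(view)
        def wrapper(request, *args, **kwargs):
            request.parsers = [cls() for cls in parser_classes]
            return view(request, *args, **kwargs)

        return wrapper

    return decorator


def parser_middleware(get_response):
    """Project-wide alternative, shaped like a Django middleware factory."""

    def middleware(request):
        request.parsers = [JSONParser()]
        return get_response(request)

    return middleware


@use_parsers(JSONParser)
def create_item(request):
    return request.data["title"]


def plain_view(request):
    return request.data["title"]


body = b'{"title": "hello"}'
per_view = create_item(FakeRequest(body, "application/json"))
# For brevity the view itself is passed as get_response here.
project_wide = parser_middleware(plain_view)(FakeRequest(body, "application/json"))
print(per_view, project_wide)
```

The decorator keeps the choice of parsers next to the view that needs them, which matches how most view-specific behaviour is configured in Django, while the middleware covers the case where every view should accept the same content types.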

Timeline

The project is earmarked as a 350 hour project, which I feel is justified considering the amount of work that is left to be done regarding the same. A rough breakdown of how I plan to finish this project is mentioned below. Please do note the footnote as well.

  • Community Bonding (May 1 - May 26)

    • Work on the DEP with Carlton
    • Get opinions from the wider community about the implementation details.
    • By the end of the community bonding period, hopefully we can get the DEP approved and work on the code.
  • May 27 - June 10

    • Polish David's MR and complete it. This includes gathering community opinions and fixing the various TODO tasks mentioned in the PR.
      • Finalising the function signature of parse, along with the return type. One idea was for it to always return both post and file data, but this has to be confirmed and implemented.
      • request.POST and request.data can be called on the same request. Ideally, we do not want there to be different types of processing of request.body for different variables.
      • Proper exception raising also has to be considered.
      • For requests, we must decide how to set and get the associated parsers. Currently there is a setter on the request object. A question was also raised about whether the data property should be set on HttpRequest or on WSGIRequest/ASGIRequest.
  • June 10 - July 8

    • Write code for allowing adding custom parsers. Users will ideally subclass the BaseParser class and make their own parser class. They will also have to implement the parse function to parse the request.body. They can add this to the request either by adding it in the middleware, or by adding it to a view using a decorator.
  • July 8 - July 12

    • Document all the new changes that have been introduced till now.
    • Prepare documentation and reports for midterm evaluation.
  • July 12 - July 29

    • Polish Abhinav Yadav's MR and complete it. The major concerns regarding this MR are:
      • to ensure that no other part of the request cycle breaks due to the nomenclature changes.
      • to ensure that documentation is properly updated to reflect these changes.
      • to ensure that (wherever applicable) warnings (ideally deprecation warnings, if we are planning to remove the old APIs) are displayed in an informative manner.
    • By the end of this phase, we should have a release candidate (RC) ready for beta testing among volunteers and among the wider Django community.
  • July 29 - August 12

    • Work with Adam to introduce these changes into django-upgrade. I am giving myself a couple of weeks for this, as I am not fully up to date with this project, and I might have some difficulties that I have not considered right now.
  • August 12 - August 26

    • This time will mainly be used for iterative testing and improvement based on usage by volunteers. There might be some corners that we missed out, or some bugs we introduced that broke earlier functionality, which will need correction.
  • August 26 - September 2

    • Bug fixes
    • Complete documentation
    • (Hopefully) Approval and merging into the main codebase, with release notes provided for anyone migrating to this version!
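
The May 27 - June 10 item notes that request.POST and request.data can both be accessed on the same request, and that the body should not be processed differently for each. One possible way to guarantee this is to parse once and let both names return the same object. The sketch below assumes a hypothetical FakeRequest class and a form-encoded body. It is not how the MR implements it, and the parse_calls counter exists only to make the behaviour visible.

```python
from functools import cached_property
from urllib.parse import parse_qs


class FakeRequest:
    """Hypothetical stand-in for HttpRequest; names are illustrative only."""

    def __init__(self, body: bytes):
        self.body = body
        self.parse_calls = 0  # Counts how often the body is actually parsed.

    @cached_property
    def data(self):
        # Parsed on first access, then stored on the instance.
        self.parse_calls += 1
        return parse_qs(self.body.decode())

    @property
    def POST(self):
        # Alias: reuses the cached result instead of parsing the body again.
        return self.data


request = FakeRequest(b"title=hello&tag=a&tag=b")
first = request.POST
second = request.data
print(first is second, request.parse_calls)
```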

About Myself

Hello! I am Anirudh Prabhakaran, a final year university student from the National Institute of Technology Karnataka, Surathkal. I have been using Django for quite some time now - and it has become my go-to framework for any web development project.

Some notable places where I have used Django:

  • Corpus is a Django-based Club Management Platform for IEEE NITK. I am one of the founders of this project, and most of my contributions so far have been to it.
  • DBAccess is a Django-based Access Management solution that I am currently developing.
  • I have been working on creating an LLM server powered by Llama for use in our college. The API to that server is built with Django REST Framework. Unfortunately, the code is not publicly available yet, but I am trying to push it for open source adoption.

Footnote

This proposal has been made based on what I could grasp from the various discussions. Due to the diffuse nature of this discourse (it was first brought up about 10 years ago), there might be a few bits and pieces that I have missed. I am keeping that in mind, and this proposal is meant to give a guiding direction that brings all the efforts together. There can (and probably will) be quite a few updates and requirements added to this proposal, which I plan to take in my stride and work on.
