@alexpatel
Created April 3, 2015 16:12
Improved URL Resolution for Django - Proposal, Google Summer of Code 2015

Improved URL Resolution

Contents

  1. Abstract
  2. Current Infrastructure
  3. Motivation
  4. The URLResolver API
    1. Interface
    2. Usage
    3. Backwards Compatibility
    4. Internationalization
    5. Other Design Considerations
  5. Timeline and Milestones
    1. Milestone 1: API Implementation (Weeks 1 - 4)
    2. Milestone 2: Refactoring the Existing URL Resolver (Weeks 5 - 8)
    3. Milestone 3: Proof-of-Concept: Werkzeug-Style Routing for Django (Weeks 9 - 12)
  6. About Me

Abstract

A clean, elegant URL scheme is an important detail in a high-quality Web application. Django lets you design URLs however you want, with no framework limitations.

URL Dispatcher, Django Documentation

Django features a powerful and complete URL dispatching mechanism, defined in django.core.urlresolvers, that uses regular expression pattern matching to map requested URLs to views with speed and reliability. Because it has not changed significantly since the early days of the project, however, it does not reflect the extensibility, flexibility, and ease of development at the core of Django's philosophy. For example, developers cannot specify a non-regex-based resolver for a URL configuration, nor is there a standardized way to extend the current resolver to support common patterns or alternative pattern syntaxes that would make pretty URLs easier to write.

The primary objective of this project is to formalize a public-facing API to support the use of alternative URL resolving mechanisms. This will involve substantial refactoring and abstraction of the current URL resolution mechanism, which suffers from a tight coupling of internal components and little documentation for quick extension by developers. Further, it will involve reincorporating Django's regex-based RegexURLResolver as the default resolving mechanism, the performance and backwards compatibility of which must be thoroughly analysed and tested. Finally, the project aims to incentivize developers to create and release alternative URL resolvers compliant with the new API through outreach efforts and a proof-of-concept alternative resolving mechanism.

Current Infrastructure

Upon receiving a request, an instance of Django's BaseHandler passes the requested URL to an instance of RegexURLResolver along with the ROOT_URLCONF setting and the base / URL. The URL resolver recursively traverses the URL configuration, populating itself with the RegexURLPattern instances specified in each URL configuration's urlpatterns attribute; each instance maps a URL pattern to a callback, default arguments, and a name. The URL resolver attempts to match the URL against each pattern. Upon success, it returns a ResolverMatch object instantiated with the matched callback and extracted arguments; otherwise, it raises a Resolver404 error. Django expects each URL pattern to be a valid regular expression, and uses Python's re library to match URL strings against regular expressions by compiling them into instances of the RegexObject class and using the associated search() and match() methods.
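The matching loop described above can be sketched roughly as follows. This is a simplified, standalone illustration rather than Django's actual code: Pattern, Match, NotFound, and the resolve() function are hypothetical stand-ins for RegexURLPattern, ResolverMatch, Resolver404, and RegexURLResolver.resolve(), and the recursive traversal of included configurations is left out.

```python
import re


class NotFound(Exception):
    """Hypothetical stand-in for Resolver404."""


class Match(object):
    """Hypothetical stand-in for ResolverMatch: callback plus extracted arguments."""

    def __init__(self, callback, args, kwargs):
        self.callback = callback
        self.args = args
        self.kwargs = kwargs


class Pattern(object):
    """Hypothetical stand-in for RegexURLPattern: one compiled regex and its callback."""

    def __init__(self, regex, callback, name=None):
        self.regex = re.compile(regex)
        self.callback = callback
        self.name = name


def resolve(patterns, path):
    """Try each pattern in order; return the first match or raise NotFound."""
    for pattern in patterns:
        found = pattern.regex.search(path)
        if found:
            kwargs = found.groupdict()
            # Use named groups when present; otherwise fall back to positional groups.
            args = () if kwargs else found.groups()
            return Match(pattern.callback, args, kwargs)
    raise NotFound(path)


def article_detail(request, article_id):
    return "article %s" % article_id


patterns = [Pattern(r"^articles/(?P<article_id>\d+)/$", article_detail)]
match = resolve(patterns, "articles/42/")
```

Note that the extracted arguments are strings; any conversion to other types is left to the view.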

Internationalization in URL patterns is handled by the LocaleRegexProvider class, which is a parent to both RegexURLPattern and RegexURLResolver. A LocaleRegexURLResolver, which matches the active language code as a prefix to a requested URL and is a child of RegexURLResolver, is used when the developer uses django.conf.urls.i18n.i18n_patterns, rather than django.conf.urls.patterns, in the root URL configuration.

Motivation

The benefits of lexical analysis with regular expressions are both numerous and widely accepted. Regular expressions are precise, concise, and available in most common programming languages. Further, Django's use of regular expressions for URL pattern matching is long-standing and well supported. Raw regex pattern matching should not be replaced outright with an alternative system; however, the rigidity and lack of extensibility with which Django employs regular expressions for URL resolution has several important pitfalls, and these warrant giving the developer an easy way to employ an alternative mechanism.

  • Regexes are hard.

    Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.

    Jamie Zawinski

    Despite their advantages, regular expressions can be difficult to write and read, and, as a result, developers are prone to missing corner-case URL strings. Further, the learning curve for regular expressions is much steeper than that of Django itself, and their grammar and syntax can intimidate beginner and even intermediate developers, who lack core support for an alternative.

  • It is difficult to extend regex-based URL resolution.

    The framework should make it just as easy (or even easier) for a developer to design pretty URLs than ugly ones.

    URL Design: Encourage Best Practices, Django Design Philosophies

    Certain 3rd-party tools, such as smarturls, attempt to overcome the difficulty for the developer in constructing and parsing regex-based URL patterns by wrapping django.conf.urls.url with an additional method that provides commonly-used patterns or alternative pattern syntaxes, which are compiled into regular expressions before the URL resolver is instantiated. While this method is correct and lightweight, it is also ad-hoc, and lacks defined support by the framework.

  • There is little support for non-regex based resolvers and alternatives to re.

    URLs should be as flexible as possible. Any conceivable URL design should be allowed.

    URL Design: Infinite Flexibility, Django Design Philosophies

    Because regular expressions, through Python's re library, are hard-coded into Django's URL dispatch mechanism, it is very difficult and often impossible for the developer to employ alternative lexical analysis or parser technologies, such as faster deterministic finite automata or alternatives to re, to match URL patterns in the URL dispatcher. Ned Batchelder provides a nice list of such alternatives here.
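The wrapper approach mentioned in the second point, in which a shorthand syntax is compiled into a regular expression before the resolver sees it, might look roughly like the sketch below. The "{name}" shorthand, the COMMON_PATTERNS table, and the expand() helper are all invented for illustration; they are not the syntax or API of smarturls or of any other existing tool.

```python
import re

# Hypothetical table of commonly used fragments, keyed by placeholder name.
COMMON_PATTERNS = {
    "year": r"\d{4}",
    "month": r"\d{2}",
    "slug": r"[-\w]+",
}

_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def expand(pattern):
    """Compile a '{name}' shorthand pattern into a regular expression string."""
    parts = []
    position = 0
    for found in _PLACEHOLDER.finditer(pattern):
        # Escape the literal text between placeholders so it matches verbatim.
        parts.append(re.escape(pattern[position:found.start()]))
        name = found.group(1)
        parts.append("(?P<%s>%s)" % (name, COMMON_PATTERNS[name]))
        position = found.end()
    parts.append(re.escape(pattern[position:]))
    return "^" + "".join(parts) + "$"


regex = expand("archive/{year}/{slug}/")
found = re.match(regex, "archive/2015/gsoc-proposal/")
```

The resulting string could be handed to django.conf.urls.url as usual, which is what makes this approach lightweight; the resolver itself, however, remains regex-based.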

The URLResolver API

Interface

A URL resolver performs two actions:

  • resolve(), which maps a URL path to a view callable.
  • reverse(), which maps a callable and a set of arguments to a URL path.
#!python
class URLResolver(object):

    def resolve(self, path):
        """
        path: path of a URL string.

        Returns an initialized ResolverMatch() upon successful resolution; raises Resolver404, otherwise.
        """
        pass

    def reverse(self, viewname, *args, **kwargs):
        """
        viewname: a string containing the Python path to the view object, a URL pattern name, or a callable.
        *args or **kwargs: arguments/keyword arguments, if accepted by the URL pattern.

        Returns a URL string upon successful reverse; raises NoReverseMatch, otherwise.
        """
        pass

An implementation of URLResolver should not have to handle the population of its mappings between URL patterns and views; rather, it should only implement methods to resolve a URL string to a view by matching it to one of the patterns in its url_patterns attribute and to reverse a view and its arguments into a URL string. If the developer were to decide to use a different URL resolver, he or she should only have to specify that alternative resolver within the URL configuration and modify the URL patterns to match the new schema.
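To make the division of labour concrete, here is a minimal sketch of what a non-regex implementation of the two methods could look like. It is an assumption-laden illustration, not part of the proposed API: ExactURLResolver is a hypothetical class that matches paths by plain string equality, the (path, view, name) tuple format for url_patterns is invented, and the three small classes at the top are simplified stand-ins for Django's ResolverMatch, Resolver404, and NoReverseMatch so that the block runs on its own.

```python
class Resolver404(Exception):
    """Simplified stand-in for Django's Resolver404."""


class NoReverseMatch(Exception):
    """Simplified stand-in for Django's NoReverseMatch."""


class ResolverMatch(object):
    """Simplified stand-in for Django's ResolverMatch."""

    def __init__(self, func, args, kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs


class ExactURLResolver(object):
    """Hypothetical resolver that matches paths by string equality, without regex."""

    def __init__(self, url_patterns):
        # url_patterns: list of (path, view, name) tuples, assumed to have been
        # populated by the framework rather than by the resolver itself.
        self.url_patterns = url_patterns

    def resolve(self, path):
        for pattern_path, view, _name in self.url_patterns:
            if pattern_path == path:
                return ResolverMatch(view, (), {})
        raise Resolver404(path)

    def reverse(self, viewname, *args, **kwargs):
        if args or kwargs:
            raise NoReverseMatch("exact patterns accept no arguments")
        for pattern_path, view, name in self.url_patterns:
            # Accept either a pattern name or the view callable itself.
            if viewname == name or viewname is view:
                return "/" + pattern_path
        raise NoReverseMatch(viewname)


def about(request):
    return "about page"


resolver = ExactURLResolver([("about/", about, "about")])
```

The resolver never builds its own pattern list; it only answers resolve() and reverse() queries against whatever it was given.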

Usage

With Django's current resolving infrastructure, the handler thread constructs a root instance of RegexURLResolver, passing it the root URL configuration. This resolver then recursively traverses the nested URLconf tree, populating its own url_patterns attribute with RegexURLResolver instances for each included URL configuration. The leaves of the tree (the mappings between a URL pattern and a view) are loaded into the resolver's url_patterns as instances of RegexURLPattern.

The developer should be able to specify, at each internal node of the nested URL configuration tree, the resolver to use for all leaves and descendants of that URL configuration. A descendant configuration should be able to re-specify its resolver. One benefit of this approach is performance: apart from loading an alternative mechanism, the traversal of the configuration will be carried out in a manner consistent with that of the current framework. Further, the developer will be able to apply a resolver to a set of URL patterns by grouping those patterns into a URL configuration; he or she will not have to specify a resolver for each declared pattern.

One possible mechanism for the specification of a URL resolver within a URL configuration is to permit the definition of a urlresolver variable within a given URLconf. For example,

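#!python
from django.conf.urls import url

from sampleapp.resolver import SampleURLResolver

# Resolver applied to every pattern in this URLconf and, unless overridden,
# to every included URL configuration.
urlresolver = SampleURLResolver

urlpatterns = [
    url(...),
    url(...),
    url(...),
    ...
]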

Each URL pattern specified in urlpatterns will be resolved using SampleURLResolver, and, unless overridden, all included URL configurations will also be resolved with SampleURLResolver.
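The inheritance rule (a URL configuration uses its own urlresolver if it declares one, and its parent's otherwise) can be sketched as a simple tree walk. Everything here is hypothetical and exists only to illustrate the rule: URLConf models a node of the configuration tree, assign_resolvers() is an invented helper, and the three resolver classes are empty placeholders.

```python
class URLConf(object):
    """Hypothetical node in the URL configuration tree."""

    def __init__(self, name, urlresolver=None, children=()):
        self.name = name
        self.urlresolver = urlresolver  # None means "inherit from the parent"
        self.children = list(children)


def assign_resolvers(conf, inherited, result=None):
    """Walk the tree and record which resolver applies to each URLconf."""
    if result is None:
        result = {}
    effective = conf.urlresolver or inherited
    result[conf.name] = effective
    for child in conf.children:
        assign_resolvers(child, effective, result)
    return result


class RegexURLResolver(object):
    """Placeholder for the default resolver."""


class SampleURLResolver(object):
    """Placeholder for an alternative resolver."""


class OtherURLResolver(object):
    """Placeholder for a second alternative resolver."""


tree = URLConf("root", children=[
    URLConf("blog", urlresolver=SampleURLResolver, children=[
        URLConf("blog.archive"),                            # inherits SampleURLResolver
        URLConf("blog.api", urlresolver=OtherURLResolver),  # overrides it
    ]),
    URLConf("accounts"),                                    # inherits the default
])

resolvers = assign_resolvers(tree, RegexURLResolver)
```

Because the resolver is chosen once per node while the tree is traversed, patterns do not need to carry a resolver individually.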

Backwards Compatibility

The URLResolver interface should be backwards compatible with the framework's current URL resolving mechanism. Thus, RegexURLResolver and RegexURLPattern should be refactored into the default implementations of the interface, and should work out of the box.

Internationalization

Django allows the developer to internationalize URL patterns through two mechanisms:

  • by prepending the language prefix to the root URL pattern for detection by LocaleMiddleware.
  • by making the URL patterns themselves translatable with django.utils.translation.ugettext_lazy().

The first mechanism is implemented in LocaleRegexURLResolver. As it uses regular expressions only insofar as it matches the language code prefix of a URL string, there is no need to make this support extensible or public-facing; it should continue to be handled internally, wrapping any URLResolver implementation without the need for configuration.
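As a rough illustration of how prefix handling could wrap an arbitrary resolver, the sketch below strips a language prefix and delegates the remainder of the path. LanguagePrefixResolver and DictResolver are hypothetical names, and the language code is passed in explicitly for simplicity, whereas Django determines the active language at request time.

```python
class Resolver404(Exception):
    """Simplified stand-in for Django's Resolver404."""


class LanguagePrefixResolver(object):
    """Hypothetical wrapper: strips a language prefix, then delegates."""

    def __init__(self, inner, language_code):
        self.inner = inner
        self.prefix = language_code + "/"

    def resolve(self, path):
        if not path.startswith(self.prefix):
            raise Resolver404(path)
        # The wrapped resolver only ever sees the path without the prefix.
        return self.inner.resolve(path[len(self.prefix):])

    def reverse(self, viewname, *args, **kwargs):
        inner_url = self.inner.reverse(viewname, *args, **kwargs)
        return "/" + self.prefix + inner_url.lstrip("/")


class DictResolver(object):
    """Toy inner resolver (no regex), used only to exercise the wrapper."""

    def __init__(self, routes):
        self.routes = routes  # maps path -> view name

    def resolve(self, path):
        try:
            return self.routes[path]
        except KeyError:
            raise Resolver404(path)

    def reverse(self, viewname, *args, **kwargs):
        for path, name in self.routes.items():
            if name == viewname:
                return "/" + path
        raise LookupError(viewname)


resolver = LanguagePrefixResolver(DictResolver({"about/": "about_view"}), "nl")
```

The wrapper relies only on the resolve() and reverse() methods of the inner object, so it would work unchanged with a regex-based or any other resolver.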

Other Design Considerations

Among the proposed modifications to Django's URL dispatcher, two of the more popular features missing from the current dispatch infrastructure are middleware that can be configured for a set of views, rather than applied to every request, and pattern matching on a request's subdomain. While the implementation of these components is outside the scope of this proposal, the URLResolver API should be implemented with these impending features in mind.

  • Subdomain Matching

    Currently, in order for a view to have access to the subdomain of a given request, custom middleware that extracts the subdomain from the request's full path and resolves the URL string through a settings-specified subdomain-specific URL configuration must be applied. Further, the current dispatch infrastructure suffers from tight coupling of the URL configuration tree traversal and the pattern matching mechanism. Were the handler to be modified to pass the resolver both the subdomain and the path, rather than just the path_info request attribute, then the resolver could match both the request's path and subdomain to a URL pattern.

    The proposed URLResolver API would help facilitate a subdomain matching feature, particularly by de-coupling the URL configuration tree traversal and the pattern matching mechanisms. The URLResolver interface would handle the incorporation of subdomain patterns into the nested pattern tree, such that the mechanism to accommodate subdomain patterns in constructing the URL configuration tree is internalized, while the signatures of resolve() and reverse() would be modified to allow a resolver to match both the path and an optional subdomain with any pattern matching schema.

  • Specifying Middleware for Collections of URL Patterns

    The middleware specified in the MIDDLEWARE_CLASSES setting is currently applied to all requests, regardless of their paths. Thus, the developer is unable to specify middleware to be applied to only a subset of patterns in the URL configuration tree, and is forced to turn to multiple applications of single-view decorators to achieve the desired grouping.

    While the mechanism for applying middleware is currently implemented separately from the URL resolver, the base URLResolver class could be modified to dynamically construct the list of middleware to be applied to a request or associated response as it searches the URL configuration tree. Instead of enumerating middleware in the project settings, which forces application on every request, middleware could be applied to a given subtree by specifying a list of middleware classes in a certain URL configuration.
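The idea of collecting middleware while searching the URL configuration tree can be sketched as below. This is only one possible shape for the feature, under invented names: Node models a URL configuration (or a leaf pattern when view is set), prefixes are matched with plain string comparison instead of a real pattern syntax, and middleware is represented by string labels rather than classes.

```python
class Node(object):
    """Hypothetical URL configuration node carrying its own middleware list."""

    def __init__(self, prefix, middleware=(), children=(), view=None):
        self.prefix = prefix
        self.middleware = tuple(middleware)
        self.children = list(children)
        self.view = view  # set only on leaves


def resolve(node, path, inherited=()):
    """Return (view, middleware) for path, or None when nothing matches."""
    if not path.startswith(node.prefix):
        return None
    rest = path[len(node.prefix):]
    # Middleware accumulates from the root down to the matched leaf.
    middleware = tuple(inherited) + node.middleware
    if node.view is not None:
        return (node.view, middleware) if rest == "" else None
    for child in node.children:
        found = resolve(child, rest, middleware)
        if found is not None:
            return found
    return None


root = Node("", middleware=["session"], children=[
    Node("admin/", middleware=["auth"], children=[
        Node("users/", view="user_list"),
    ]),
    Node("about/", view="about"),
])

admin_result = resolve(root, "admin/users/")
about_result = resolve(root, "about/")
```

In this sketch the "auth" label applies only to views under admin/, while "session" applies to the whole tree because it is declared at the root.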

Timeline and Milestones

Milestone 1: API Implementation (Weeks 1 - 4)

  • GSoC Interim and Community Bonding Periods

    • Develop a formal specification for the URLResolver API, soliciting developer feedback and concerns.
  • Weeks 1 and 2

    • Implement the URLResolver interface. Refactor _populate() to apply to all interface implementations.
    • Modify BaseHandler and url() to handle multiple resolvers.
  • Weeks 3 and 4

    • Modify URL dispatcher tests to support the URLResolver interface.
    • Refactor URL pattern internationalization to be resolver-independent.
    • Refactor utility functions in django.core.urlresolvers.
    • Write documentation for the URLResolver interface.

Milestone 2: Refactoring the Existing URL Resolver (Weeks 5 - 8)

  • Weeks 5 and 6

    • Refactor RegexURLResolver and RegexURLPattern as the default implementations of the URLResolver API.
    • Configure framework to serve RegexURLResolver as the default URL resolving mechanism.
  • Weeks 7 and 8

    • Performance testing to ensure the re-implementation of the default resolver matches performance of current mechanism.
    • Modify existing URL dispatch documentation to reflect updates.
    • Refactor current mechanism tests to reflect RegexURLResolver as the default resolver implementation, not just a hard-coded mechanism.

Milestone 3: Proof-of-Concept: Werkzeug-Style Routing for Django (Weeks 9 - 12)

URL routing for the Flask microframework employs the Werkzeug URL Routing system to support URL patterns in which variables can be extracted from URL strings and passed to views as keyword arguments with the syntax "<converter:name>", with converter being one of string, int, float, or path (a string that may contain slashes). For example, in Flask, the view show_article() can be registered with a URL pattern that passes to it an integer argument called article_id as follows:

#!python
@app.route('/article/<int:article_id>')
def show_article(article_id):
    pass

Werkzeug's routing syntax is expressive, yet very simple with a low learning curve. With Django's current URL dispatch infrastructure, however, using Werkzeug-style URL patterns in a Django project is quite difficult.

As a proof-of-concept for the functionality of the URLResolver API, Werkzeug-style URL patterns for Django will be implemented as a stand-alone project. The project will serve both as evidence that the refactoring and API formalization proposed in the first part of the project have succeeded and as an example for 3rd-party application developers who wish to begin building alternative URL resolution mechanisms. Optimally, a deliverable for the implementation will include documentation, proper free software licensing, and distribution over a package management system such as pip. The remaining four weeks of the project will be allocated for:

  • Implementing this alternative resolver.
  • Outreach efforts to developers of existing URL pattern tools to discuss refactoring their tools for the new public-facing URLResolver API.

About Me

My name is Alex Patel, and I am an undergraduate studying mathematics and philosophy at Harvard College in Cambridge, MA in the United States. For the better part of the last year, I've been working as student campaigns organizer for the Free Software Foundation in Cambridge, MA, and have become a passionate advocate for the adoption and development of free software. I am applying to work on Django as part of the Google Summer of Code in order to pursue contributing to free software beyond advocacy and outreach, as a developer.

Further, for the last two years, I have hacked on Django as a technology associate for The Crimson, Harvard's student newspaper, which receives over 1.5 million page views per month on its Django-powered web application. I have built several tools on top of the framework, such as a natural language processing engine to generate recommendations for articles in a corpus with over 500,000 articles.
