"Revamping validation functionality and merging django-secure" proposal for Google Summer of Code 2013.

Table of contents

  1. Abstract
     1.1 Drawbacks of the existing validation framework
     1.2 Goals
     1.3 Benefits
  2. The new framework
     2.1 Overview
     2.2 Advantages
  3. Merging django-secure
  4. Schedule and milestones
     4.1 New validation framework
     4.2 Merging django-secure
     4.3 If time permits...
  5. About me

1. Abstract

1.1 Drawbacks of the existing validation framework

Django currently has a validation framework, but it has many problems. First of all, it is monolithic: developers can neither write custom validation (see #16905) nor modify the existing functionality (see #12674), because validation lives in a few functions such as django.core.management.validation.get_validation_errors and django.contrib.admin.validation.validate. Validation is also not separated from other concerns: get_validation_errors prints the errors it finds while validating models, and the admin app validates models while registering them (see #8579). Sometimes validation even runs during the first call to an important method; for example, CurrentSiteManager is validated in its get_queryset method.

The validation framework has few tests and is hard to test, because validation functions return concatenated error messages instead of a list of errors (see django.tests.invalid_models.invalid_models.models). It also lacks features such as warnings (see #19126). Because of these disadvantages, many apps have no validation at all; for example, they do not check inter-app dependencies.

1.2 Goals

The first part of this proposal is about revamping the current validation framework. First of all, we need to write more tests and rewrite the existing ones. Then we need a consistent API for validating different kinds of objects, such as models, fields, managers or whole apps, so that it will be easy to add a new kind of object. Validation functionality should be separated from other code and should have minimal dependencies. We should allow developers to add validation to their apps and to any other kind of object, so custom validation is a must. We will not break backward compatibility.

This proposal is not only about refactoring but also about new features. The second part of the proposal is bringing django-secure into core. This topic is covered in section 3 ("Merging django-secure").

1.3 Benefits

There are a lot of benefits. Cleaner code, fewer unwanted dependencies and more tests are the most obvious ones. We will also benefit from a long-term solution that is easy to maintain because it is extensible. Thanks to django-secure, we will improve the security of Django projects, which should also strengthen Django's reputation as a safe and secure framework.

2. The new framework

2.1 Overview

The API is based on Honza Kral's idea from his patch. A developer can add new validation functionality by writing a callable. It will be called automatically when the whole project is validated (triggered by python manage.py validate) and it must fulfill the following contract: it takes no arguments (except for self or cls) and either returns a list of warnings and errors or yields each of them.

The validated object may be a model, a manager, a field or an app. For a field or a manager, the callable is a method. For a model, it is a classmethod. For an app, a developer has to put functions in a validation.py file inside the app directory.

Let's see an example:

class FieldFile(File):
    # (lots of stuff)

    def validate_upload_to_attribute(self):
        # Yield an error when no upload directory has been configured.
        if not self.upload_to:
            yield Error(obj=self, msg="required 'upload_to' attribute",
                explanation="Django needs to know the directory where uploaded files should be stored.",
                hint="Add \"upload_to = 'path/to/directory/on/server'\" to model %(model)s in file %(model_file)s.")

Notice that validation methods are inherited by child classes. In the uncommon case when some validation should be turned off, you can override the method with an empty one to prevent it from executing. "Private" methods starting with _validate_ are not executed.

The validate_upload_to_attribute method and all other validation methods are called from the validate_all method, which fulfills the same contract as the other validate_* methods and by default calls all of them. The method cannot be named validate because that would collide with the existing validate method of fields. validate_all is inherited by default, so you do not have to write it. For an app, the behaviour is similar: if the validate_all or validate_models functions (described later) are omitted, or the whole validation.py file is missing, then the default ones are used.
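To make the discovery rule concrete, here is a minimal sketch of how a default validate_all could collect the validate_* methods. It does not use Django: ValidatedMixin, the stand-in FileField classes and the reduced Error class are hypothetical names invented for this illustration, and the final implementation may differ.

```python
class Error:
    """Stand-in for the proposed Error class (only the fields used here)."""

    def __init__(self, obj, msg):
        self.obj = obj
        self.msg = msg


class ValidatedMixin:
    """Hypothetical base class that provides the default validate_all."""

    def validate_all(self):
        problems = []
        for name in sorted(dir(self)):
            # Only public validate_* methods qualify. Names such as
            # "_validate_helper" never match the prefix, and validate_all
            # must not call itself.
            if not name.startswith("validate_") or name == "validate_all":
                continue
            method = getattr(self, name)
            if callable(method):
                # list() accepts both a returned list and a generator.
                problems.extend(list(method()))
        return problems


class FileField(ValidatedMixin):
    def __init__(self, upload_to=""):
        self.upload_to = upload_to

    def validate_upload_to_attribute(self):
        if not self.upload_to:
            yield Error(obj=self, msg="required 'upload_to' attribute")

    def _validate_helper(self):
        raise AssertionError("private helpers are never called")


class ImageField(FileField):
    # Inherits validate_upload_to_attribute unchanged.
    pass


class LenientFileField(FileField):
    def validate_upload_to_attribute(self):
        # Overriding with an empty result switches the check off.
        return []
```

With these stand-ins, ImageField().validate_all() reports the inherited error, while LenientFileField().validate_all() reports nothing.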

Validation chain.

When a developer types python manage.py validate (or any other command which triggers validation), all apps are loaded and then, for each app, the validate_all function from its validation.py file is called (if it is missing, the default one is used). It calls all other validate_* functions. One of them, validate_models, calls the validate_all classmethod of each model of this app. Each model then validates its fields and managers. That is the "validation chain".
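The chain can be sketched without Django as follows. Everything here is a simplified stand-in: the App class replaces the functions of a validation.py module, validate_project is a hypothetical name for whatever the validate command would call, and the Error class carries only the attributes needed to show how app and model get attached on the way up.

```python
class Error:
    def __init__(self, obj, msg, app=None, model=None):
        self.obj, self.msg, self.app, self.model = obj, msg, app, model


class Field:
    def __init__(self, name, valid=True):
        self.name, self.valid = name, valid

    def validate_all(self):
        if not self.valid:
            yield Error(obj=self, msg="invalid field %s" % self.name)


class Model:
    fields = ()
    managers = ()

    @classmethod
    def validate_all(cls):
        # A model validates its fields and managers and tags every
        # problem with itself before passing it up the chain.
        for component in list(cls.fields) + list(cls.managers):
            for problem in component.validate_all():
                problem.model = cls
                yield problem


class Article(Model):
    fields = (Field("title"), Field("attachment", valid=False))


class App:
    """Stand-in for an app's validation.py module."""

    def __init__(self, label, models):
        self.label, self.models = label, models

    def validate_models(self):
        for model in self.models:
            for problem in model.validate_all():
                yield problem

    def validate_all(self):
        # The app-level step tags every problem with the app.
        for problem in self.validate_models():
            problem.app = self.label
            yield problem


def validate_project(apps):
    """Hypothetical entry point: walk every loaded app."""
    return [problem for app in apps for problem in app.validate_all()]
```

Calling validate_project([App("blog", [Article])]) returns a single error whose obj is the attachment field, whose model is Article and whose app is "blog".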

Errors and warnings.

The new framework introduces two new classes called Warning and Error (the name ValidationError would collide with django.forms.ValidationError). They are very similar; they differ only in their meaning. Their fields are obj, app, msg, explanation and hint. app is the app where the error or warning was created; the attribute is not set in the example because it is set by the default validate_all function of an app. obj is the invalid object (it may be the same as app). Errors connected with a particular model (as in the example, where the invalid object is a field attached to a model) will get an additional model attribute from the default validate_all classmethod of models.
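A possible shape for the two classes is sketched below. The attribute names come from the description above; the shared _Problem base class and the has_errors helper are assumptions made for this illustration, including the assumption that only Error instances should make validation fail.

```python
class _Problem:
    """Fields shared by the proposed Error and Warning classes."""

    def __init__(self, obj, msg, explanation="", hint="", app=None, model=None):
        self.obj = obj                  # the invalid object
        self.msg = msg                  # short error message
        self.explanation = explanation  # why this is a problem
        self.hint = hint                # suggested solution(s)
        self.app = app                  # filled in by the app's validate_all
        self.model = model              # filled in by the model's validate_all

    def __repr__(self):
        return "%s(obj=%r, msg=%r)" % (type(self).__name__, self.obj, self.msg)


class Error(_Problem):
    pass


class Warning(_Problem):  # deliberately shadows the built-in name in this sketch
    pass


def has_errors(problems):
    """Assumption: only Error instances make validation fail."""
    return any(isinstance(problem, Error) for problem in problems)
```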

I think that Django error messages are often confusing and often contain no hints, proposed solutions or explanation of the problem. We should force Django contributors to think about these, and separating the error message (msg), the explanation and the hint (a list of suggested solutions) is a way to do it. We need to make using Django as simple as possible, and suggesting solutions and describing a problem in detail helps. This will be really important when contributors and developers start to write more complex validation, such as checking inter-app dependencies.

Error messages (as well as the hint and explanation fields) can be formatted. %(model)s will be replaced with the model attribute of the error. %(model_file)s will be replaced with the path to the file where the model is defined. This allows us to write really user-friendly errors in the style of "go to file <file> and add <something> to model <model>".
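The placeholders map directly onto Python's dictionary-based % formatting. The helper below is only a sketch: format_problem is a hypothetical name, and how the framework would actually join msg, explanation and hint for display is still to be decided.

```python
def format_problem(msg, explanation, hint, model_name, model_file):
    """Fill the %(model)s and %(model_file)s placeholders in every field."""
    context = {"model": model_name, "model_file": model_file}
    # Empty fields are skipped; the rest are shown one per line.
    return "\n".join(text % context for text in (msg, explanation, hint) if text)


report = format_problem(
    msg="required 'upload_to' attribute",
    explanation="",
    hint="Add an upload_to value to model %(model)s in file %(model_file)s.",
    model_name="Document",
    model_file="docs/models.py",
)
```

One consequence of this approach is that a literal percent sign in a message would have to be written as %%.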

2.2 Advantages

First of all, the solution is as simple as possible. There is only one concept which developers have to learn: the contract of validate_* methods. You do not have to know about the validate_all method or the validation chain to write your first piece of validation code, which makes it easier for newcomers to get started with Django.

As you can see, the solution is consistent between different kinds of objects and it does not assume that only a fixed set of object types can be validated. I believe that a good long-term solution should be extensible, and the new framework allows us to easily add validation of a new type of object: just modify the validation chain.

One may argue that a developer has to remember when to use methods, when classmethods and when functions. An alternative is a validator class (as in Honza Kral's patch), where a validated object points to its validator instance. That would be an improvement if it did not cause a lot of other problems. For example, if field A inherits from field B, you have to remember that A's validator should inherit from B's validator. It also means that you have to write a new class even when you want to validate only one small thing.

My proposal also solves a lot of existing problems. It plays well with almost all existing validation in Django; for example, validation of ModelAdmin can be done in the admin app (all apps are loaded before validation, so all models will be registered before validation starts). What's more, it is also a good solution for many other use cases. The new framework allows us to write a new kind of app: apps containing mainly validation code.

An ideal example of an app which dovetails nicely with the framework is django-secure. Since the second part of the proposal is bringing django-secure into core, that will increase the security of Django projects. Another example is an app which inspects the settings.py file and predicts problems when switching to a newer version of Django. That would make updating Django a piece of cake, but it is not part of this proposal.

3. Merging django-secure

The first part of this proposal introduces the new framework and is mainly refactoring. During the last 4 weeks I would like to focus on the django-secure app. This app does security checks (such as checking whether CsrfViewMiddleware is in the MIDDLEWARE_CLASSES setting or whether the X-Frame-Options response header is set to DENY). I will merge it into Django and adapt it to the new validation framework. That will demonstrate the flexibility of the new framework.
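As a rough illustration of how one such check could look under the new framework, the sketch below tests for the CSRF middleware. It is not django-secure's code: the settings object is passed in explicitly and faked with a SimpleNamespace so the example runs without Django, and the Warning namedtuple is a reduced stand-in for the proposed class.

```python
from collections import namedtuple
from types import SimpleNamespace

# Stand-in for the proposed Warning class (shadows the built-in on purpose).
Warning = namedtuple("Warning", ["obj", "msg", "hint"])

CSRF_MIDDLEWARE = "django.middleware.csrf.CsrfViewMiddleware"


def validate_csrf_middleware(settings):
    """Warn when CSRF protection is not installed project-wide."""
    if CSRF_MIDDLEWARE not in getattr(settings, "MIDDLEWARE_CLASSES", ()):
        yield Warning(
            obj=settings,
            msg="CsrfViewMiddleware is not in MIDDLEWARE_CLASSES",
            hint="Add %r to MIDDLEWARE_CLASSES." % CSRF_MIDDLEWARE,
        )


# Usage: a plain namespace plays the role of django.conf.settings.
insecure = SimpleNamespace(MIDDLEWARE_CLASSES=())
secure = SimpleNamespace(MIDDLEWARE_CLASSES=(CSRF_MIDDLEWARE,))
```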

The app will emit only warnings (no errors). For backward compatibility, we cannot switch on all functionality of django-secure by default. For example, we cannot redirect all requests to SSL. So if a developer wants the redirection, they have to set a new setting named SSL_REDIRECT to True. If the setting is omitted, a warning will be displayed. The warning will explain why the developer should always use SSL, but it will also say that the developer can disable the warning by setting SSL_REDIRECT to False. The default value of SSL_REDIRECT will be None (so it is a three-state boolean). Other settings will work in a similar way.

Some warnings will be emitted only when DEBUG is False (that is, when a developer is deploying their project, not working on it). If DEBUG is True, the developer is warned that they are working in debug mode.
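The three-state behaviour and the debug-mode warning could be sketched like this. The function names, the message texts and the explicit settings argument are assumptions made so the example runs on its own; only the SSL_REDIRECT and DEBUG setting names come from the text above.

```python
from types import SimpleNamespace


class Warning:  # stand-in for the proposed class; shadows the built-in
    def __init__(self, obj, msg, hint=""):
        self.obj, self.msg, self.hint = obj, msg, hint


def validate_ssl_redirect(settings):
    # None means "the developer has not decided yet", so a warning is shown.
    # True and False are both explicit decisions and stay silent.
    if getattr(settings, "SSL_REDIRECT", None) is None:
        yield Warning(
            obj=settings,
            msg="SSL_REDIRECT is not set",
            hint="Set SSL_REDIRECT = True to redirect all requests to SSL, "
                 "or SSL_REDIRECT = False to silence this warning.",
        )


def validate_debug_mode(settings):
    if getattr(settings, "DEBUG", False):
        yield Warning(obj=settings, msg="DEBUG is True; you are working in debug mode")


def validate_all(settings):
    return list(validate_ssl_redirect(settings)) + list(validate_debug_mode(settings))
```

A settings object without SSL_REDIRECT produces the first warning; setting it to either True or False silences it.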

I will drop the checksecure command (see #17101); the security checks will be part of the validate command. This is better than a separate command: there is no danger of forgetting to run it, and it is simpler for new developers because they do not have to know about it. The security checks are turned on by default and do not have to be triggered explicitly. In short, it is safer.

The app will live in django.contrib and will be renamed to secure because django.contrib.djangosecure is too long. The app will be enabled by default for new projects. When switching to a newer version of Django, a developer will have to manually append django-secure to INSTALLED_APPS.

Another issue is that Django 1.4 shipped with django.middleware.clickjacking.XFrameOptionsMiddleware. django-secure should use this middleware instead of its own.

4. Schedule and milestones

Before I start coding, I would like to do some preparation:

  • Discussing and writing the full API of the new validation framework, e.g. what data will be available when formatting error messages.
  • Preparing a list of new tests for the new validation framework, particularly for model, app and manager validation.
  • Discussing the new API of django-secure, especially the names of settings, their behavior and default values.
  • Improving my English writing skills, e.g. by reading "The Elements of Style".

For the first two or three weeks, I will have exams at university so I cannot work 8 hours a day. After 6 July (or after 29 June if I pass all exams quickly) I will have no other obligations.

At the end of June, I am going to Norway for about five weeks to visit my family. I will go on one-day trips at weekends, but I will still be free on weekdays.

A much more important issue is that I am going on holiday around September 6. This is not a backpacking trip; I will stay in a hotel with Internet access, but I will not be able to work full time (I assume 50% of full speed). I hope you will not disqualify my proposal on that basis; it can even be an advantage, because I will be highly motivated to finish ahead of schedule.

4.1 New validation framework -- first milestone (8 weeks)

(From June 17 until August 12).

I will write code bottom-up: starting with field validation (4.1.1-4.1.3), then models (4.1.4) and apps (4.1.5, 4.1.6, 4.1.8), and ending with triggering the whole validation framework (4.1.9). Manager validation (4.1.7), as an easy part, will be implemented near the end.

4.1.1 Rewriting tests of field validation (1 week)

(I will have exams at university, so I will probably work at 50% of full speed.)

The tests live in the tests.invalid_models package. Currently there is one models.py file with lots of models containing invalid fields, a tests.py file with only one test that checks everything, and one huge model_errors variable that contains the concatenated error messages for all fields.

  • Rewriting tests from scratch. One test for each invalid field.
  • Deleting the models.py file.
  • If time permits: adding new tests (such as checking for clashes with ORM query lookups).
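A rewritten test could then look roughly like the sketch below: one small test per invalid field, asserting on a list of error objects rather than on one concatenated string. The CharField stand-in, its validate_max_length_attribute method and the message text are hypothetical; real tests would exercise Django's own field classes.

```python
import unittest


class Error:
    def __init__(self, obj, msg):
        self.obj, self.msg = obj, msg


class CharField:
    """Stand-in field with validation in the proposed style."""

    def __init__(self, max_length=None):
        self.max_length = max_length

    def validate_max_length_attribute(self):
        if self.max_length is None:
            yield Error(obj=self, msg="required 'max_length' attribute")

    def validate_all(self):
        return list(self.validate_max_length_attribute())


class CharFieldValidationTests(unittest.TestCase):
    def test_missing_max_length(self):
        field = CharField()
        errors = field.validate_all()
        # Errors are objects, so each attribute can be checked on its own.
        self.assertEqual(len(errors), 1)
        self.assertIs(errors[0].obj, field)
        self.assertEqual(errors[0].msg, "required 'max_length' attribute")

    def test_valid_field(self):
        self.assertEqual(CharField(max_length=10).validate_all(), [])
```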

4.1.2 Rewriting field validation (1 week)

(Exams at university).

The field validation currently lives in django.core.management.validation.get_validation_errors.

  • Introducing the Warning and Error classes.
  • Adding a validate_all method to the Field class.
  • Moving field validation into the field classes.
  • get_validation_errors will only call validate_all for each field of each model and validate the custom User model.

4.1.3 Writing documentation (mainly overview) (0.5 week)

(I may still have exams).

  • Writing the overview of the new framework: a high-level description of the basic ideas and concepts (such as warnings, the validation chain and inheriting validation functionality), general rules for writing validation and for overriding or removing existing validation, and an explanation of how the validation framework works. It will be a new "validation framework" topic, probably in "The development process" section.
  • Field validation section.
  • Writing a full reference for the Warning and Error classes.

4.1.4 Tests, implementation and documentation of model validation (1 week)

  • Adding validate_all and validate_fields methods to the Model class. The former will be the same as validate_all of Field except that it sets the model attribute of all warnings and errors. The latter should trigger validation of all fields.
  • Temporarily moving custom User validation to a new validate_custom_user method of the Model class. get_validation_errors will only call validate_all of each model.
  • Documentation: adding a section on how to write model validation.

4.1.5 Tests, implementation and documentation of app validation (1 week)

  • Adding a mechanism to fetch the validation module of a given app (if the validation.py file in the app directory is missing, it should return the default module; if the file exists but lacks some default functions such as validate_all or validate_models, the default ones are added).
  • Temporarily renaming the django/django/contrib/admin/validation.py file, because it should not be visible to the validation framework at this point.
  • get_validation_errors will only trigger validate_all of each app.
  • Documentation: adding a section on how to write app validation.
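The fallback mechanism from the first bullet could be sketched as follows. The names get_validation_module, with_defaults and default_validate_models are hypothetical, and the sketch makes two simplifications: the default validate_models reports nothing because there is no model registry here, and any import failure is treated as a missing validation.py file.

```python
import importlib
import types


def default_validate_models():
    # A real default would call validate_all on each model of the app;
    # this sketch has no model registry, so it reports nothing.
    return []


def with_defaults(module):
    """Add default validate_models / validate_all where the app omits them."""
    if not hasattr(module, "validate_models"):
        module.validate_models = default_validate_models
    if not hasattr(module, "validate_all"):
        def validate_all():
            problems = []
            for name in sorted(vars(module)):
                func = getattr(module, name)
                if (name.startswith("validate_") and name != "validate_all"
                        and callable(func)):
                    problems.extend(list(func()))
            return problems
        module.validate_all = validate_all
    return module


def get_validation_module(app_name):
    name = app_name + ".validation"
    try:
        module = importlib.import_module(name)
    except ImportError:
        # Simplification: treats any import failure as "file is missing".
        module = types.ModuleType(name)
    return with_defaults(module)
```

A production version would need to distinguish a missing file from an ImportError raised inside an existing validation.py, so that genuine mistakes in the app's code are not silently hidden.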

4.1.6 Rewriting validation of custom User model (0.5 week)

  • Moving the validation from the temporary validate_custom_user method of the Model class to the auth app. Removing the temporary method.
  • Rewriting the existing tests (tests.invalid_models), because at this point they assume that errors are raised by the validate_custom_user method.

4.1.7 Tests, implementation and documentation of manager validation (0.5 week)

  • Adding a validate_managers method to the Model class; the method triggers validation of each manager.
  • Adding validate_all to the Manager class.
  • Moving the existing validation of django.contrib.sites.managers.CurrentSiteManager to new validate_* methods of the CurrentSiteManager class. Removing the validation triggered in get_query_set of the CurrentSiteManager class.
  • Rewriting the tests living in tests.sites_framework.
  • Documentation: adding a section on how to write manager validation.
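The intended end state for managers might look roughly like the sketch below. The classes are stand-ins, not Django's: the check is reduced to "the named field exists on the model", the model_field_names argument is a hypothetical way to give the manager that information, and the message text is invented.

```python
class Error:
    def __init__(self, obj, msg):
        self.obj, self.msg = obj, msg


class CurrentSiteManager:
    """Stand-in manager; only the parts relevant to validation."""

    def __init__(self, field_name="site", model_field_names=()):
        self.field_name = field_name
        # Hypothetical: names of the fields on the model this manager
        # is attached to.
        self.model_field_names = model_field_names

    def validate_field_name(self):
        if self.field_name not in self.model_field_names:
            yield Error(
                obj=self,
                msg="CurrentSiteManager could not find a field named %r"
                    % self.field_name,
            )

    def validate_all(self):
        return list(self.validate_field_name())


class Model:
    managers = ()

    @classmethod
    def validate_managers(cls):
        # Trigger validation of each manager attached to the model.
        for manager in cls.managers:
            for problem in manager.validate_all():
                yield problem


class Article(Model):
    managers = (CurrentSiteManager(model_field_names=("title", "site")),)


class Broken(Model):
    managers = (CurrentSiteManager(model_field_names=("title",)),)
```

Because the check lives in a validate_* method, the manager's query method no longer has to validate anything on its first call.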

4.1.8 Rewriting validation of ModelAdmin and its tests (1 week)

4.1.9 Rewriting the mechanism that triggers the validation framework (1 week)

4.1.10 Finishing documentation (0.5 week)

  • Polishing all new parts of the documentation (mainly the new "validation framework" topic).
  • Adding a note to the "release notes".
  • Checking that the difference between the new validation framework and the form validation framework is strongly emphasized in the documentation.
  • Checking that the rest of the documentation is up to date and updating any out-of-date parts (e.g. "How to write reusable apps").

4.2 Merging django-secure -- second milestone (4 weeks)

From August 12 until September 9.

4.2.1 Rewriting django-secure tests (1 week)

  • Focusing on CheckSettingCommandTest, because the checksecure command is dropped in favour of the validate command.
  • Focusing also on the ConfTest class, because we will use our own mechanism of finding tests and we will replace conf.py with new settings in the settings.py file.
  • Making the final decision about the API of django-secure.

4.2.2 Starting merging (1 week)

  • Creating a new django.contrib.secure app, turned on by default.
  • Creating new settings in settings.py (based on the conf.py file).
  • Adding checks of the setting values to the new app. Writing the warning messages that should be displayed when a three-state setting is set to None. Emitting a warning when debug mode is turned on.
  • Documentation: adding an overview of django.contrib.secure (its purpose and how it can be controlled via settings) without describing every setting.

4.2.3 Continuing merging (1 week)

  • Implementing security functionality: rewriting the security middleware.
  • Using XFrameOptionsMiddleware from Django core instead of django-secure's own middleware.
  • Documentation: adding documentation of each setting.

4.2.4 Finishing merging (1 week)

(The trip starts on Friday of this week.)

  • Polishing new parts of documentation.
  • Checking if documentation (security topics) is up-to-date.
  • Updating "release notes" documentation page.
  • Updating AUTHORS file.

4.3 If time permits...

If I finish earlier, I will focus on developing a new django-update app which will make updating the Django version a piece of cake. It would predict problems and issues based on the content of settings.py. For example, if a developer were using the sites app, then while upgrading to 1.6 there would be a warning that the sites framework was removed. The list of predictable problems would be based on the "release notes" page.

If there is some time left but not enough to write django-update, I will focus on small improvements to django-secure, such as checking for an exposed admin panel.

5. About me

My name is Christopher Mędrela and I am a student at the University of Science and Technology in Krakow (Poland). My time zone is UTC+01:00. I have been programming in Python for at least 4 years. I also program in C and Java. I started contributing to Django 16 months ago. I have submitted a lot of patches (this is not a list of all patches). This year I started working on the big #17093 ticket ("Refactor django.template to quarantine global state"). It is not finished, but it shows that I am able to deal with big tasks.

Some time ago I worked for a long time (more than one year) on my own project named Scriptcraft, a programming game, but I suspended it to focus on contributing to Django. It also shows that I am not lazy and that I can push myself even when there is no external motivator. :)

My English level is not worse than FCE. I prefer written communication by email or IRC.

My e-mail is chrismedrela+gsoc magic_character gmail.com. You can also find me on the #django-dev and #gsoc IRC channels. My nick is chrismed.
