"Revamping validation functionality and merging django-secure" proposal for Google Summer of Code 2013.
Table of contents
1. Abstract
   1.1 Drawbacks of the existing validation framework
   1.2 Goals
   1.3 Benefits
2. The new framework
   2.1 Overview
   2.2 Advantages
3. Merging django-secure
4. Schedule and milestones
   4.1 New validation framework
   4.2 Merging django-secure
   4.3 If time permits...
5. About me
1. Abstract
1.1 Drawbacks of the existing validation framework
Django currently has a validation framework, but it has many problems. First of all, it is monolithic: developers cannot write custom validation (see #16905) or modify the existing functionality (see #12674), because validation lives in a few functions such as django.core.management.validation.get_validation_errors and django.contrib.admin.validation.validate. Validation is also not separated from other concerns: get_validation_errors prints the errors it finds while validating models, and the admin app validates models while registering them (see #8579). Sometimes validation runs during the first call to an important method, e.g. CurrentSiteManager is validated in its get_queryset method.
There are few tests of the validation framework, and it is not easily testable because validation functions return concatenated error messages instead of a list of errors (see django.tests.invalid_models.invalid_models.models). It also lacks features such as warnings (see #19126). Because of these disadvantages, many apps do not have any validation, e.g. they do not check inter-app dependencies.
1.2 Goals
The first part of this proposal is about revamping the current validation framework. First of all, we need to write more tests and rewrite the existing ones. Then we need a consistent API for validating different kinds of objects (models, fields, managers or whole apps) so that it is easy to add a new kind of object. Validation should be separated from other functionality and should have minimal dependencies. Developers should be able to add validation to their apps and to any other kind of object, so custom validation is a must. We will not break backward compatibility.
This proposal is not only about refactoring but also about new features. The second part of the proposal is bringing django-secure into core. This topic is covered in section 3 ("Merging django-secure").
1.3 Benefits
There are a lot of benefits. Cleaner code, fewer unwanted dependencies and more tests are the most obvious ones. We will also benefit from a long-term solution that is easy to maintain because it is extensible. We will improve the security of Django projects thanks to django-secure, which in turn will strengthen Django's reputation as a safe and secure framework.
2. The new framework
2.1 Overview
The API is based on Honza Kral's idea from his patch. A developer can add
new validation functionality by writing a callable piece of code. It will be
called automatically when the whole project is validated (triggered by
manage.py validate) and it must fulfill the following contract: it takes no
arguments (except for self or cls) and returns a list of warnings and
errors or yields each of them.
The validated object may be a model, a manager, a field or an app. In the case
of a field or a manager, the callable piece of code is a method. In the case
of a model, it is a classmethod. In the case of an app, a developer has to put
a validation.py file inside the app directory.
Let's see an example:
    class FieldFile(File):
        # (lots of stuff)

        def validate_upload_to_attribute(self):
            if not self.upload_to:
                yield Error(
                    obj=self,
                    msg="required 'upload_to' attribute",
                    explanation="Django needs to know the directory where "
                                "uploaded files should be stored.",
                    hint="Add \"upload_to = 'path/to/directory/on/server'\" "
                         "to model %(model)s in file %(model_file)s.",
                )
Notice that validation methods are inherited by child classes. In the uncommon
case when some validation should be turned off, you can override the method
with an empty method to prevent it from executing. "Private" methods (named
_validate_*) are not executed.
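The inheritance rule can be illustrated with a minimal sketch in plain Python. Nothing here is Django code: Error, BaseFileField, ThumbnailField and InMemoryFileField are hypothetical stand-ins, and the override returns an empty list as one way of writing an "empty" method.

```python
class Error:
    """Hypothetical stand-in for the proposed Error class."""

    def __init__(self, obj, msg):
        self.obj = obj
        self.msg = msg


class BaseFileField:
    """Hypothetical field with one validation method."""

    upload_to = ""

    def validate_upload_to_attribute(self):
        if not self.upload_to:
            yield Error(obj=self, msg="required 'upload_to' attribute")


class ThumbnailField(BaseFileField):
    # Nothing to write: the validation method is inherited from the parent.
    pass


class InMemoryFileField(BaseFileField):
    def validate_upload_to_attribute(self):
        # Overridden with a method that reports nothing, which turns the
        # inherited check off for this class only.
        return []


inherited = list(ThumbnailField().validate_upload_to_attribute())
disabled = list(InMemoryFileField().validate_upload_to_attribute())
```

Here the child class that does nothing still reports the missing attribute, while the class that overrides the method reports no problems.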
The validate_upload_to_attribute method and all other validation methods are
called by the validate_all method, which fulfills the same contract as other
validate_* methods and calls all validate_* methods by default. The name of
the function cannot be validate because it would collide with an existing
method of fields. By default, the validate_all method is inherited, so you do
not have to write it. In the case of an app, the behaviour is similar -- if
validate_all or validate_models (described later) are omitted (or the whole
validation.py file is missing), then the default ones are used.
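A default validate_all could look roughly like the sketch below. It is an assumption about one possible implementation, written in plain Python without Django: ValidatedMixin, UploadField and the simplified Error class are hypothetical names, and discovering methods through dir() is only one way to find the validate_* methods.

```python
class Error:
    """Hypothetical stand-in for the proposed Error class."""

    def __init__(self, obj, msg):
        self.obj = obj
        self.msg = msg


class ValidatedMixin:
    """Hypothetical base class providing the default validate_all."""

    def validate_all(self):
        results = []
        for name in sorted(dir(self)):
            # Skip validate_all itself; names such as _validate_helper do
            # not start with "validate_", so private methods are skipped too.
            if not name.startswith("validate_") or name == "validate_all":
                continue
            method = getattr(self, name)
            if callable(method):
                # A method may return a list, yield items, or return None
                # (an empty method used to switch a check off).
                results.extend(method() or [])
        return results


class UploadField(ValidatedMixin):
    def __init__(self, upload_to="", max_length=100):
        self.upload_to = upload_to
        self.max_length = max_length

    def validate_upload_to_attribute(self):
        if not self.upload_to:
            yield Error(obj=self, msg="required 'upload_to' attribute")

    def validate_max_length(self):
        if self.max_length <= 0:
            return [Error(obj=self, msg="'max_length' must be positive")]
        return []

    def _validate_internal(self):
        raise AssertionError("private helpers are never called")


broken = UploadField(upload_to="", max_length=0).validate_all()
clean = UploadField(upload_to="files/", max_length=50).validate_all()
```

Both styles allowed by the contract (returning a list and yielding) are collected into one list, and the private helper is never invoked.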
When a developer types python manage.py validate (or any other command
which triggers validation), all apps are loaded and then, for each app, the
validate_all function from its validation.py file is called (if it is
missing, then the default one is used). It calls all other validate_*
functions. One of them, named validate_models, calls the validate_all
classmethod for each model of this app. Then each model validates its fields
and managers. That is the "validation chain".
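The validation chain can be sketched in plain Python as follows. All classes are hypothetical stand-ins: App, Model and Part (used for both fields and managers) are ordinary objects here, whereas in the proposal model validation is a classmethod and app validation lives in a validation.py module.

```python
class Error:
    """Hypothetical stand-in for the proposed Error class."""

    def __init__(self, obj, msg):
        self.obj = obj
        self.msg = msg
        self.model = None
        self.app = None


class Part:
    """Stand-in for a field or a manager."""

    def __init__(self, name, problem=None):
        self.name = name
        self.problem = problem

    def validate_all(self):
        if self.problem:
            yield Error(obj=self, msg=self.problem)


class Model:
    def __init__(self, name, fields, managers):
        self.name = name
        self.fields = fields
        self.managers = managers

    def validate_all(self):
        # A model validates its fields and managers and tags each result.
        for part in self.fields + self.managers:
            for error in part.validate_all():
                error.model = self
                yield error


class App:
    def __init__(self, label, models):
        self.label = label
        self.models = models

    def validate_models(self):
        for model in self.models:
            yield from model.validate_all()

    def validate_all(self):
        # The app-level entry point tags each result with the app.
        for error in self.validate_models():
            error.app = self
            yield error


def validate_project(apps):
    """Entry point that a command such as 'validate' would trigger."""
    errors = []
    for app in apps:
        errors.extend(app.validate_all())
    return errors


post = Model(
    "Post",
    fields=[Part("title"), Part("cover", "required 'upload_to' attribute")],
    managers=[Part("objects", "manager refers to a missing field")],
)
errors = validate_project([App("blog", [post])])
summary = [(e.app.label, e.model.name, e.obj.name) for e in errors]
```

Each level only knows about the level directly below it, which is what makes it possible to add a new kind of validated object by extending the chain.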
Errors and warnings.
The new framework introduces two new classes, one for errors (Error, used in
the example above) and one for warnings (the name ValidationError is avoided
because it would collide with django.forms.ValidationError). They are very
similar; they differ only in their meaning. Their fields include:
- app is the app where the error or warning was created; the attribute is not
  set in the example because it is set by the default validate_all function
  of an app.
- obj is the invalid object (it may be the app). Errors connected with a
  particular model (as in the example -- the invalid object is a field, but
  the field will be attached to a model) will have an additional model
  attribute thanks to the default validate_all classmethod of models.
I think that Django error messages are often confusing and often contain
neither hints, proposed solutions nor an explanation of the problem. I think
that we should force Django contributors to think about this, and separating
the error message from the explanation and the hint (a list of suggested
solutions) is a way to do it. We need to make using Django as simple as
possible, and suggesting solutions and describing a problem in detail helps
with that. It will be really important when contributors and developers start
to write more complex validation, like checking inter-app dependencies.
Error messages (as well as hint and explanation fields) can be formatted:
%(model)s will be replaced with the model attribute of the error, and
%(model_file)s with the path to the file where the model is defined. This
allows us to write really user-friendly errors in the style of "go to file
<file> and add <something> to model <model>".
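The message formatting could work roughly as in this sketch. It is an assumption, not the proposed implementation: the Message base class, the ValidationWarning name, the formatted helper, and passing model and model_file as plain strings are all hypothetical simplifications.

```python
class Message:
    """Hypothetical common base for errors and warnings."""

    def __init__(self, obj, msg, explanation="", hint="",
                 app=None, model=None, model_file=None):
        self.obj = obj
        self.msg = msg
        self.explanation = explanation
        self.hint = hint
        self.app = app
        self.model = model
        self.model_file = model_file

    def formatted(self, text):
        # %(model)s and %(model_file)s are filled in from the attributes
        # that the validation chain sets on the message.
        return text % {"model": self.model, "model_file": self.model_file}


class Error(Message):
    pass


class ValidationWarning(Message):
    # Hypothetical name, chosen to avoid shadowing Python's built-in Warning.
    pass


error = Error(
    obj="attachment field",
    msg="required 'upload_to' attribute",
    explanation="Django needs to know the directory where uploaded files "
                "should be stored.",
    hint="Add \"upload_to = 'path/to/directory/on/server'\" to model "
         "%(model)s in file %(model_file)s.",
    model="Document",
    model_file="docs/models.py",
)
formatted_hint = error.formatted(error.hint)
formatted_msg = error.formatted(error.msg)
```

Text without placeholders passes through unchanged, so message authors only need the placeholders when they want to point at a model or file.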
2.2 Advantages
First of all, the solution is as simple as possible. There is only one concept
which developers have to learn -- the contract of validate_* methods. You do
not have to know about the validate_all method or the validation chain to
write your first piece of validation code, which makes it easier for newcomers
to start working with Django.
As you can see, the solution is consistent between different kinds of objects and it does not assume that only a fixed set of object types can be validated. I believe that a good long-term solution should be extensible, and the new framework allows us to easily add validation of a new type of object -- just modify the validation chain.
One may argue that a developer has to remember when to use methods, when classmethods and when functions. They may propose using a validator class (as in Honza Kral's patch) -- a validated object would have to point to its validator instance. That would be an improvement if it did not cause a lot of other problems. For example, if field A inherits from field B, you have to remember that A's validator should inherit from B's validator. It also means that you have to write a new class even when you want to validate only one small thing.
My proposal also solves a lot of existing problems. This solution plays well with almost all validation code in Django, e.g. the existing validation of ModelAdmin can be done in the admin app (all apps are loaded before validation, so all models will be registered before validation starts). What's more, it is also a good solution for many other use cases. The new framework allows us to write a new kind of app -- apps containing mainly validation code.
An ideal example of an app which dovetails nicely with the framework is the
django-secure app. Since the second part of the proposal is bringing
django-secure into core, that will increase the security of Django projects.
Another example is an app which inspects the settings.py file and predicts
some problems with switching to a newer version of Django -- that would make
updating Django a piece of cake, but it is not part of this proposal.
3. Merging django-secure
The first part of this proposal introduced the new framework, which is mainly
refactoring. During the last 4 weeks I would like to focus on the
django-secure app. This app does security checks (like checking if
CsrfViewMiddleware is in the MIDDLEWARE_CLASSES setting or checking if the
X-Frame-Options response header is set to DENY). I will merge it with Django
and fit it to the new validation framework. That will demonstrate the
flexibility of the new framework.
The app will emit only warnings (no errors). For backward compatibility, we
cannot switch on all functionality of django-secure by default. For example,
we cannot redirect all requests to SSL. So if a developer wants the
redirection, they have to set a new setting named SSL_REDIRECT to True. If the
setting is omitted, a warning will be displayed. The warning will explain why
the developer should always use SSL, but it will also say that the developer
can disable the warning by setting SSL_REDIRECT to False. The default value of
SSL_REDIRECT will be None (it will be a tri-state boolean). Other settings
will work in a similar way.
Some warnings will be emitted only when DEBUG is False (so a developer is
deploying their project, not working on it). If DEBUG is True, then the
developer is warned that they are working in debug mode.
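The tri-state behaviour can be sketched in plain Python like this. It is only an illustration under stated assumptions: settings are modelled as a dict, the function names and warning texts are hypothetical, and treating the SSL_REDIRECT warning as one of the deployment-only warnings is an assumption.

```python
def check_ssl_redirect(settings):
    """Warn only when SSL_REDIRECT is left at its default of None."""
    if settings.get("SSL_REDIRECT") is None:
        return [
            "SSL_REDIRECT is not set. Set it to True to redirect all "
            "requests to SSL, or to False to silence this warning."
        ]
    return []


def check_debug(settings):
    """Warn that the project runs in debug mode."""
    if settings.get("DEBUG", False):
        return ["DEBUG is True: the project is running in debug mode."]
    return []


def run_security_checks(settings):
    warnings = check_debug(settings)
    if not settings.get("DEBUG", False):
        # Assumption: deployment-only warnings are skipped in debug mode.
        warnings += check_ssl_redirect(settings)
    return warnings


in_debug = run_security_checks({"DEBUG": True})
unset = run_security_checks({"DEBUG": False})
opted_out = run_security_checks({"DEBUG": False, "SSL_REDIRECT": False})
opted_in = run_security_checks({"DEBUG": False, "SSL_REDIRECT": True})
```

An explicit True or False both count as a decision made by the developer, so only the untouched default produces a warning.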
I will drop the checksecure command (see #17101) -- the security checks will
be part of the validate command. This is better than a separate command: there
is no danger of forgetting to run it, and it is simpler for new developers
because they do not have to know about it -- the security checks are turned on
by default and do not have to be triggered explicitly. In short, it is safer.
The app will live in django.contrib and will be renamed to secure, because
django.contrib.djangosecure is too long. The app will be enabled by default
for new projects. When switching to a newer version of Django, a developer
will have to manually append django-secure to INSTALLED_APPS.
Another issue is that Django 1.4 shipped with django.middleware.clickjacking.XFrameOptionsMiddleware. django-secure should use this middleware instead of its own.
4. Schedule and milestones
Before starting coding I would like to do some preparation:
- Discussing and writing the full API of the new validation framework, e.g. what data will be available when formatting error messages.
- A list of new tests for the new validation framework, particularly model, app and manager validation.
- Discussing the new API of django-secure, especially names of settings, their behavior and default values.
- Improving English writing skills, e.g. by reading "The Elements of Style".
For the first two or three weeks, I will have exams at university so I cannot work 8 hours a day. After 6 July (or after 29 June if I pass all exams quickly) I will have no other work.
At the end of June, I'm going to Norway (for about five weeks) to visit my family. I will go on one-day trips at weekends, but I will still be free on weekdays.
A much more important issue is that I'm going on holiday around September 6. This is not a backpacking trip; I will stay in a hotel with internet access, but it means that I will not be able to work full time (I assume 50% of full speed). I hope you will not disqualify my proposal on that basis -- it can be an advantage, because I will be highly motivated to finish ahead of time.
4.1 New validation framework -- first milestone (8 weeks)
(From June 17 until August 12).
I will write code bottom-up -- starting from field validation (4.1.1-4.1.3), then models (4.1.4), apps (4.1.5, 4.1.6, 4.1.8) and ending with triggering the whole validation framework (4.1.9). Manager validation (4.1.7), as an easy part, will be implemented near the end.
4.1.1 Rewriting tests of field validation (1 week)
(I will have exams at university -- I will probably work at 50% of full speed).
The tests live in the tests.invalid_models package. Currently there is one models.py file with lots of models containing invalid fields, a tests.py file with only one test that checks everything, and one huge model_errors variable that contains concatenated error messages from each field.
- Rewriting tests from scratch. One test for each invalid field.
- If time permits: adding new tests (like checking for clashes with ORM query lookups).
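The "one test for each invalid field" idea could look like the following sketch. It uses only the standard unittest module; StubCharField, its validate_max_length method and the error texts are hypothetical stand-ins rather than Django's actual field or messages.

```python
import unittest


class Error:
    """Hypothetical stand-in for the proposed Error class."""

    def __init__(self, obj, msg):
        self.obj = obj
        self.msg = msg


class StubCharField:
    """Hypothetical field used only to demonstrate the test layout."""

    def __init__(self, max_length=None):
        self.max_length = max_length

    def validate_max_length(self):
        if self.max_length is None:
            yield Error(obj=self, msg="missing 'max_length' attribute")
        elif self.max_length <= 0:
            yield Error(obj=self, msg="'max_length' must be positive")


class CharFieldValidationTests(unittest.TestCase):
    def test_missing_max_length(self):
        field = StubCharField()
        errors = list(field.validate_max_length())
        self.assertEqual([e.msg for e in errors],
                         ["missing 'max_length' attribute"])
        # Errors are objects, so the test can inspect the invalid object
        # instead of comparing one large concatenated string.
        self.assertIs(errors[0].obj, field)

    def test_non_positive_max_length(self):
        errors = list(StubCharField(max_length=0).validate_max_length())
        self.assertEqual([e.msg for e in errors],
                         ["'max_length' must be positive"])

    def test_valid_field_has_no_errors(self):
        errors = list(StubCharField(max_length=10).validate_max_length())
        self.assertEqual(errors, [])
```

Because every invalid case has its own test method, a failure points directly at the field and rule that broke.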
4.1.2 Rewriting field validation (1 week)
(Exams at university).
The field validation lives in django.core.management.validation.get_validation_errors now.
- Moving field validation to the classes of the fields.
- get_validation_errors will only call validate_all for each field of each model and validate the custom User model.
4.1.3 Writing documentation (mainly overview) (0.5 week)
(I may still have exams).
- Writing the overview of the new framework. It will be a high-level description of basic ideas and concepts (like warnings, the validation chain or inheriting validation functionality) and general guidance on how to write validation and how to override or remove the existing one, as well as how the validation framework works. It will be a new "validation framework" topic, probably in "The development process" section.
- Field validation section.
- Writing full reference of
4.1.4 Tests, implementation and documentation of models validation (1 week)
Model class. The former will be the same as in Field except that it sets the model attribute of all warnings and errors. The latter should trigger validation of all fields.
- Temporarily moving custom User validation to a new validate_custom_user method.
- get_validation_errors will only call validate_all of each model.
- Documentation: adding a section on how to write model validation.
4.1.5 Tests, implementation and documentation of apps validation (1 week)
- Adding a mechanism to fetch the validation module of a given app (if the validation.py file in the app directory is missing, then it should return a default module; if the file exists but lacks some default functions like validate_models, then add the default ones).
- Renaming the django/django/contrib/admin/validation.py file temporarily -- the file should not be visible to the validation framework at this point.
- get_validation_errors will only trigger the validate_all function of an app.
- Documentation: adding a section on how to write app validation.
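The module-fetching mechanism from the first step could be sketched as below, using only the standard library. The names get_validation_module, default_validate_models and make_default_validate_all are hypothetical, the default validate_models is a placeholder that reports nothing, and the "shop" app is a fake module registered only for the demonstration.

```python
import importlib
import sys
import types


def default_validate_models():
    # Placeholder: the real default would validate every model of the app.
    return []


def make_default_validate_all(module):
    def validate_all():
        results = []
        for name in sorted(vars(module)):
            func = vars(module)[name]
            if (name.startswith("validate_") and name != "validate_all"
                    and callable(func)):
                results.extend(func() or [])
        return results
    return validate_all


def get_validation_module(app_name):
    module_name = app_name + ".validation"
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # No validation.py (or no such app): fall back to an empty module.
        # Note that this also hides import errors raised inside validation.py.
        module = types.ModuleType(module_name)
    # Fill in whichever default functions the module does not define.
    if not hasattr(module, "validate_models"):
        module.validate_models = default_validate_models
    if not hasattr(module, "validate_all"):
        module.validate_all = make_default_validate_all(module)
    return module


# Demonstration with a fake app whose validation module defines one check.
fake = types.ModuleType("shop.validation")
fake.validate_prices = lambda: ["prices are not configured"]
sys.modules["shop"] = types.ModuleType("shop")
sys.modules["shop.validation"] = fake

shop_results = get_validation_module("shop").validate_all()
missing_results = get_validation_module("no_such_app_xyz").validate_all()
```

The caller always receives a module with both validate_all and validate_models, so the rest of the chain does not need to care whether the app shipped a validation.py file.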
4.1.6 Rewriting validation of custom User model (0.5 week)
- Moving the validation from the temporary method to the auth app. Removing the temporary method.
- Rewriting existing tests (tests.invalid_models) -- they assume that
errors are raised by the validate_custom_user method at this point.
4.1.7 Tests, implementation and documentation of manager validation (0.5 week)
Model class; the method triggers validation of each manager.
- Moving existing validation of
django.contrib.sites.managers.CurrentSiteManager to a new validation method of
the CurrentSiteManager class. Removing triggering validation in its
get_queryset method.
- Rewriting tests living in tests.sites_framework.
- Documentation: adding a section on how to write manager validation.
4.1.8 Rewriting validation of ModelAdmin and its tests (1 week)
- Renaming the validation module of the admin app back to django/django/contrib/admin/validation.py.
- Rewriting the file (mainly renaming the main function called
validate so it will be triggered by the validation framework).
- If time permits, splitting the main function into smaller ones.
- Removing triggering validation in the django.contrib.admin.sites.AdminSite.register method.
- Rewriting tests living in tests.admin_validation.
4.1.9 Rewriting mechanism of triggering validation framework (1 week)
- Removing get_validation_errors function and the module where it lives.
- Rewriting django.core.management.base.BaseCommand.validate which triggers
whole validation and prints errors. It should call
validate_all of each app instead of calling
get_validation_errors. Printing errors and warnings also has to be rewritten.
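Printing errors and warnings separately from collecting them could look like this sketch. It does not reproduce BaseCommand.validate; Message, report and the output format are hypothetical and only illustrate that reporting becomes a separate step once validation returns objects.

```python
import io
import sys


class Message:
    """Hypothetical error or warning produced by the validation chain."""

    def __init__(self, level, msg, hint=""):
        self.level = level  # "error" or "warning"
        self.msg = msg
        self.hint = hint


def report(messages, stream=None):
    """Print warnings, then errors; return True when there are no errors."""
    if stream is None:
        stream = sys.stdout
    warnings = [m for m in messages if m.level == "warning"]
    errors = [m for m in messages if m.level == "error"]
    for label, group in (("WARNING", warnings), ("ERROR", errors)):
        for message in group:
            stream.write("%s: %s\n" % (label, message.msg))
            if message.hint:
                stream.write("    HINT: %s\n" % message.hint)
    stream.write("%d error(s), %d warning(s) found.\n"
                 % (len(errors), len(warnings)))
    return not errors


buffer = io.StringIO()
ok = report(
    [
        Message("error", "required 'upload_to' attribute",
                hint="Add upload_to to the field."),
        Message("warning", "DEBUG is True"),
    ],
    stream=buffer,
)
output = buffer.getvalue()
```

Warnings do not affect the return value, so a command built on such a function could keep running when only warnings are present.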
4.1.10 Finishing documentation (0.5 week)
- Polishing all new parts of documentation (mainly the new "validation framework" topic).
- Adding note to "release notes".
- Checking if the difference between the new validation framework and the form validation framework is strongly emphasized in documentation.
- Checking if the rest of the documentation is up-to-date and updating any out-of-date parts (e.g. "How to write reusable apps").
4.2 Merging django-secure -- second milestone (4 weeks)
From August 12 until September 9.
4.2.1 Rewriting django-secure tests (1 week)
- Focusing on CheckSettingCommandTest because we dropped the
checksecure command in favour of validate.
- Focusing also on the ConfTest class because we will use our own mechanism of
finding tests and we will replace conf.py with new settings in settings.py.
- Making the final decision about the API of django-secure.
4.2.2 Starting merging (1 week)
- Creating a new django.contrib.secure app turned on by default.
- Creating new settings in settings.py (based on the conf.py file).
- Adding checks of the values of the settings in the new app. Writing warning messages that should be displayed when a tri-state setting is set to None. Emitting a warning that debug mode is turned on.
- Documentation: adding an overview of django.contrib.secure -- its purpose and how it can be controlled via settings, without describing every setting.
4.2.3 Continuing merging (1 week)
- Implementing security functionality -- rewriting the security middleware.
- Using XFrameOptionsMiddleware from Django core instead of django-secure's own middleware.
- Documentation: adding documentation of each setting.
4.2.4 Finishing merging (1 week)
(Starting the trip on Friday this week).
- Polishing new parts of documentation.
- Checking if documentation (security topics) is up-to-date.
- Updating "release notes" documentation page.
4.3 If time permits...
If I finished earlier, I would focus on developing a new app, django-update,
which would make updating the Django version a piece of cake. It would predict
some problems and issues based on the content of settings.py. For example, if
a developer were using the sites app, then while upgrading to 1.6 there would
be a warning that the sites framework was removed. The list of predictable
problems would be based on the "release notes" page.
If there were some time left, but not enough to write django-update, I would focus on small improvements to django-secure, like checking for an exposed admin panel.
5. About me
My name is Christopher Mędrela and I am a student at the University of Science
and Technology in Krakow (Poland). My time zone is UTC+01:00. I have been
programming in Python for at least 4 years. I also program in C and Java. I
started contributing to Django 16 months ago. I have submitted a lot of
patches (this is not a list of all patches). This year I started working on
the big ticket #17093 ("Refactor django.template to quarantine global
state"). It is not finished, but it shows that I am able to deal with big
tasks.
Some time ago I worked for a long time (more than one year) on my own
project, Scriptcraft, which was a programming game, but I suspended this
project to focus on contributing to Django. It also shows that I am not lazy
and I can push myself even when there is no external motivator. :)
My English is at least at FCE level. I prefer written communication by email or IRC.
My e-mail is
chrismedrela+gsoc magic_character gmail.com. You can find me also
#gsoc IRC channels. My nick is