chrismedrela/gist:82cbda8d2a78a280a129 Secret

## gistfile1.rst

      
    "Revamping validation functionality and merging django-secure" proposal for
Google Summer of Code 2013.

Table of contents

1. Abstract
   1.1 Drawbacks of the existing validation framework
   1.2 Goals
   1.3 Benefits
2. The new framework
   2.1 Overview
   2.2 Advantages
3. Merging django-secure
4. Schedule and milestones
   4.1 New validation framework
   4.2 Merging django-secure
   4.3 If time permits...
5. About me


1. Abstract


1.1 Drawbacks of the existing validation framework

Django currently has a validation framework, but it has many problems.
First of all, it is monolithic: developers cannot write custom validation
(see #16905) or modify the existing functionality (see #12674), because
validation lives in a few functions like
django.core.management.validation.get_validation_errors or
django.contrib.admin.validation.validate. The validation functionality is
not separated from other concerns, such as printing the errors found while
validating models in get_validation_errors or registering models in the admin
app (see #8579). Validation is sometimes performed during the first call to an
important method, e.g. CurrentSiteManager is validated in its get_queryset
method.
There are few tests of the validation framework, and it is not easily testable
because validation functions return concatenated error messages instead of a
list of errors (see django.tests.invalid_models.invalid_models.models). It
also lacks some features, such as warnings (see #19126). Because of these
disadvantages, many apps do not have any validation, e.g. they do not check
inter-app dependencies.

1.2 Goals

The first part of this proposal is about revamping the current validation
framework. First of all, we need to write more tests and rewrite the existing
ones. Then we need a consistent API for validating different kinds of objects,
such as models, fields, managers or whole apps, so that it will be easy to add
a new kind of object. Validation functionality should be separated from other
concerns and should have minimal dependencies. We should allow developers to
add validation to their apps and to any other kind of object, so custom
validation is a must. We will not break backward compatibility.
This proposal is not only about refactoring but also about new features. The
second part of the proposal is bringing django-secure into core. This topic is
covered in section 3 ("Merging django-secure").

1.3 Benefits

There are a lot of benefits. Cleaner code, fewer unwanted dependencies and
more tests are the most obvious ones. We will also benefit from a long-term
solution which will be easy to maintain since it is extensible. We will
improve the security of Django projects thanks to django-secure, which should
also strengthen Django's reputation as a safe and secure framework.

2. The new framework


2.1 Overview

The API is based on Honza Kral's idea from his patch. A developer can add
new validation functionality by writing a callable. It will be called
automatically when the whole project is validated (triggered by python
manage.py validate) and it must fulfill the following contract: it takes no
arguments (except for self or cls) and either returns a list of warnings and
errors or yields each of them.
The validated object may be a model, a manager, a field or an app. In the case
of a field or a manager, the callable is a method. In the case of a model, it
is a classmethod. In the case of an app, a developer has to put functions in a
validation.py file inside the app directory.
Let's see an example:
class FileField(Field):
    # (lots of stuff)

    def validate_upload_to_attribute(self):
        # Error is the class proposed in "Errors and warnings" below.
        if not self.upload_to:
            yield Error(obj=self, msg="required 'upload_to' attribute",
                explanation="Django needs to know the directory where uploaded files should be stored.",
                hint="Add \"upload_to = 'path/to/directory/on/server'\" to model %(model)s in file %(model_file)s.")

Notice that validation methods are inherited by child classes. In the uncommon
case when an inherited check should be turned off, you can override the
method with an empty method to prevent it from executing. "Private" methods
starting with _validate_ are not executed.
The validate_upload_to_attribute method and all other validation methods are
called from the validate_all method, which fulfills the same contract as the
other validate_* methods and calls all validate_* methods by default. The
method cannot be named validate because that would collide with the
existing validate method of fields. By default, the validate_all method is
inherited, so you do not have to write it. In the case of an app, the behaviour
is similar: if the functions validate_all or validate_models (described
later) are omitted (or the whole validation.py file is missing), then the
default ones are used.
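The discovery rule described above can be sketched in plain Python. This is only an illustration under stated assumptions, not Django code: ValidatedObject, FileFieldSketch, RelaxedFileField and the simplified Error class are hypothetical names, and the proposal does not prescribe how validate_all finds its methods (inspect.getmembers is just one option).

```python
import inspect


class Error:
    """Simplified, hypothetical stand-in for the proposed Error class."""

    def __init__(self, obj, msg):
        self.obj = obj
        self.msg = msg


class ValidatedObject:
    """Hypothetical base class that carries the default validate_all."""

    def validate_all(self):
        # Call every public validate_* method except validate_all itself.
        # Names starting with _validate_ do not match the prefix, so
        # "private" helpers are skipped.
        for name, method in inspect.getmembers(self, callable):
            if name.startswith("validate_") and name != "validate_all":
                # A validator may return a list or yield items; "or ()"
                # also tolerates an empty override that returns None.
                for problem in method() or ():
                    yield problem


class FileFieldSketch(ValidatedObject):
    upload_to = ""

    def validate_upload_to_attribute(self):
        if not self.upload_to:
            yield Error(obj=self, msg="required 'upload_to' attribute")

    def _validate_helper(self):
        raise AssertionError("private helpers must not be called")


class RelaxedFileField(FileFieldSketch):
    def validate_upload_to_attribute(self):
        # Overriding with an empty method switches the inherited check off.
        pass


errors = list(FileFieldSketch().validate_all())
relaxed = list(RelaxedFileField().validate_all())
```

Here errors holds the single problem reported by the inherited check, while relaxed is empty because the subclass replaced the check with an empty method.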

Validation chain.

When a developer types python manage.py validate (or any other command
which triggers validation), all apps are loaded and then, for each app, the
validate_all function from its validation.py file is called (if it is
missing, the default one is used). It calls all other validate_*
functions. One of them, named validate_models, calls the validate_all
classmethod of each model of this app. Then each model validates its fields
and managers. That is the "validation chain".
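The chain can be pictured with a small, self-contained model of it. Everything below is a sketch under assumptions: the Field, Manager and Model classes are toy stand-ins rather than Django's, run_validation and validate_app are hypothetical names for the trigger and for an app's default validate_all, and apps are represented as a plain dictionary.

```python
class Error:
    """Simplified, hypothetical stand-in for the proposed Error class."""

    def __init__(self, obj, msg):
        self.obj = obj
        self.msg = msg
        self.model = None  # filled in by the model's validate_all
        self.app = None    # filled in by the app's validate_all


class Field:
    def __init__(self, name, valid=True):
        self.name = name
        self.valid = valid

    def validate_all(self):
        if not self.valid:
            yield Error(obj=self, msg="invalid field '%s'" % self.name)


class Manager:
    def validate_all(self):
        return []


class Model:
    fields = ()
    managers = ()

    @classmethod
    def validate_fields(cls):
        for field in cls.fields:
            for problem in field.validate_all():
                yield problem

    @classmethod
    def validate_managers(cls):
        for manager in cls.managers:
            for problem in manager.validate_all():
                yield problem

    @classmethod
    def validate_all(cls):
        # The model level attaches itself to every problem found below it.
        for step in (cls.validate_fields, cls.validate_managers):
            for problem in step():
                problem.model = cls
                yield problem


class Article(Model):
    fields = (Field("title"), Field("cover", valid=False))
    managers = (Manager(),)


def validate_models(models):
    for model in models:
        for problem in model.validate_all():
            yield problem


def validate_app(app_label, models):
    # Stand-in for the default validate_all of an app: it sets "app".
    for problem in validate_models(models):
        problem.app = app_label
        yield problem


def run_validation(apps):
    # apps maps an app label to the list of its model classes.
    problems = []
    for app_label, models in apps.items():
        problems.extend(validate_app(app_label, models))
    return problems


problems = run_validation({"blog": [Article]})
```

The single problem found on the cover field travels up the chain and ends up annotated with both the model (Article) and the app ("blog").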

Errors and warnings.

The new framework introduces two new classes called Warning and Error
(the name ValidationError would collide with
django.forms.ValidationError). They are very similar and differ only
in their meaning. Their fields are: obj, app, msg, explanation
and hint. app is the app where the error or warning was created;
the attribute is not set in the example because it is set by the default
validate_all function of an app. obj is the invalid object (it may be the
same as app). Errors connected with a particular model (as in the example,
where the invalid object is a field, but the field is attached to a model)
will have an additional model attribute thanks to the default validate_all
classmethod of models.
I think that Django error messages are often confusing and often contain
neither hints, proposed solutions nor an explanation of the problem. I think
that we should push Django contributors to think about this, and separating
the error message (msg), the explanation and the hint (a list of suggested
solutions) is a way to do it. We need to make using Django as simple as
possible, and suggesting solutions and describing a problem in detail helps
with that. It will be really important when contributors and developers start
to write more complex validation, such as checking inter-app dependencies.
Error messages (as well as the hint and explanation fields) can be
formatted. %(model)s will be replaced with the model attribute of the
error. %(model_file)s will be replaced with the path to the file where the
model is defined. This allows us to write really user-friendly errors in the
style of "go to file <file> and add <something> to model <model>".
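A possible shape for the two classes and their message formatting is sketched below. The field names (obj, app, msg, explanation, hint, model) come from the text above; the shared Problem base class, the model_file attribute and the render method are assumptions made for this illustration, as is the use of plain %-formatting.

```python
class Problem:
    """Hypothetical shared base for the proposed Error and Warning classes."""

    def __init__(self, obj, msg, explanation="", hint="", app=None):
        self.obj = obj
        self.app = app
        self.msg = msg
        self.explanation = explanation
        self.hint = hint
        self.model = None       # set later by the model's validate_all
        self.model_file = None  # assumed to be set together with model

    def render(self, field_name):
        """Return msg, explanation or hint with the placeholders filled in."""
        context = {"model": self.model, "model_file": self.model_file}
        return getattr(self, field_name) % context


class Error(Problem):
    pass


class Warning(Problem):  # shadows the built-in Warning inside this sketch only
    pass


error = Error(
    obj="cover",
    msg="required 'upload_to' attribute",
    hint="Add \"upload_to = 'uploads/'\" to model %(model)s in file %(model_file)s.",
)
error.model = "Article"
error.model_file = "blog/models.py"
rendered_hint = error.render("hint")
```

Keeping the placeholders unresolved until render is called lets a field create the error before it knows which model it belongs to.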

2.2 Advantages

First of all, the solution is as simple as possible. There is only one concept
which developers have to learn: the contract of validate_*
methods. You do not have to know about the validate_all method or the
validation chain to write your first piece of validation code, which makes it
easier for newcomers to get started with Django.
As you can see, the solution is consistent between different kinds of objects
and it does not assume that only a fixed set of object types can be
validated. I believe that a good long-term solution should be extensible, and
the new framework allows us to easily add validation of a new type of object:
just modify the validation chain.
One may argue that a developer has to remember when to use methods, when
classmethods and when functions, and may propose using a validator class
(as in Honza Kral's patch), where a validated object would have to point to
its validator instance. That would be an improvement if it did not cause a lot
of other problems. For example, if field A inherits from field B, then
you have to remember that A's validator should inherit from B's validator. It
also means that you have to write a new class even if you want to validate
only one small thing.
My proposal also solves a lot of existing problems. This solution plays well
with almost all validation code in Django, e.g. the existing validation of
ModelAdmin can be done in the admin app (all apps are loaded before validation,
so all models will be registered before validation starts). What's more, it is
also a good solution for a lot of other use cases. The new framework allows us
to write a new kind of app: apps containing mainly validation code.
An ideal example of an app which dovetails nicely with the framework is the
django-secure app. Since the second part of the proposal is bringing
django-secure into core, that will increase the security of Django projects.
Another example is an app which inspects the settings.py file and predicts
some problems when switching to a newer version of Django. That would make
upgrading Django a piece of cake, but it is not part of this proposal.

3. Merging django-secure

The first part of this proposal introduced the new framework, and it is
mainly refactoring. During the last 4 weeks I would like to focus on the
django-secure app. This app does security checks (like checking if
CsrfViewMiddleware is in the MIDDLEWARE_CLASSES setting or checking if
the X-Frame-Options response header is set to DENY). I will merge it
with Django and fit it to the new validation framework. That will be
evidence of the flexibility of the new framework.
The app will emit only warnings (no errors). For backward compatibility, we
cannot switch on all functionality of django-secure by default. For example,
we cannot redirect all requests to SSL. So if a developer wants the
redirection, they have to set a new setting named SSL_REDIRECT to True. If the
setting is omitted, then a warning will be displayed. The warning will explain
why the developer should always use SSL, but it will also say that
the developer can disable the warning by setting the setting to False. The
default value of SSL_REDIRECT will be None (so it will be a tri-state
boolean). Other settings will work in a similar way.
Some warnings will be emitted only when DEBUG is False (that is, when a
developer is deploying their project, not working on it). If DEBUG is True,
then the developer is warned that they are working in debug mode.
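The tri-state behaviour could look roughly like the sketch below. SSL_REDIRECT and DEBUG are the settings named above; the function names, the simplified Warning class, the use of SimpleNamespace in place of real Django settings, and the choice to emit the SSL warning only when DEBUG is False are all assumptions made for illustration.

```python
from types import SimpleNamespace


class Warning:  # shadows the built-in on purpose, mirroring the proposed name
    def __init__(self, obj, msg, hint=""):
        self.obj = obj
        self.msg = msg
        self.hint = hint


def validate_debug(settings):
    if getattr(settings, "DEBUG", False):
        yield Warning(obj=settings, msg="DEBUG is True",
                      hint="Set DEBUG = False before deploying.")


def validate_ssl_redirect(settings):
    # Assumption: this is one of the warnings emitted only when DEBUG is False.
    if getattr(settings, "DEBUG", False):
        return
    # None means "the developer has not decided yet".
    if getattr(settings, "SSL_REDIRECT", None) is None:
        yield Warning(
            obj=settings,
            msg="SSL_REDIRECT is not set",
            hint="Set SSL_REDIRECT = True to redirect all requests to SSL, "
                 "or SSL_REDIRECT = False to silence this warning.",
        )


def validate_all(settings):
    for check in (validate_debug, validate_ssl_redirect):
        for warning in check(settings):
            yield warning


undecided = [w.msg for w in validate_all(SimpleNamespace(DEBUG=False))]
opted_out = [w.msg for w in validate_all(
    SimpleNamespace(DEBUG=False, SSL_REDIRECT=False))]
debugging = [w.msg for w in validate_all(SimpleNamespace(DEBUG=True))]
```

Only the undecided project gets the SSL warning; an explicit False silences it, and a project in debug mode gets the debug-mode warning instead.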
I will drop the checksecure command (see #17101): the security checks will
be part of the validate command. This is better than a separate command. There
is no danger of forgetting to run it. It is simpler and easier for new
developers because they do not have to know the command: the security checks
are turned on by default and you do not have to trigger them. To cut a long
story short, it is safer.
The app will live in django.contrib and will be renamed to secure
because django.contrib.djangosecure is too long. The app will be enabled
by default for new projects. When switching to a newer version of Django, a
developer will have to manually append the app to INSTALLED_APPS.
Another issue is that Django 1.4 shipped with
django.middleware.clickjacking.XFrameOptionsMiddleware. django-secure
should use this middleware instead of its own.

4. Schedule and milestones

Before I start coding, I would like to do some preparation:

- Discussing and writing the full API of the new validation framework, e.g.
  what data will be available when formatting error messages.
- Preparing a list of new tests for the new validation framework, particularly
  for model, app and manager validation.
- Discussing the new API of django-secure, especially the names of settings,
  their behavior and default values.
- Improving my English writing skills, e.g. by reading "The Elements of Style".

For the first two or three weeks, I will have exams at university, so I cannot
work 8 hours a day. After 6 July (or after 29 June if I pass all exams
quickly) I will have no other obligations.
At the end of June, I'm going to Norway (for about five weeks) to visit my
family. There will be one-day trips at weekends, but I will still be
free on weekdays.
A much more important issue is that I'm going on holiday around September 6.
This is not a backpacking trip; I will stay in a hotel with Internet access,
but it means that I will not be able to work full time (I assume 50% of full
speed). I hope you will not disqualify my proposal on that basis. It can even
be an advantage, because I will be highly motivated to finish ahead of
schedule.

4.1 New validation framework -- first milestone (8 weeks)

(From June 17 until August 12).
I will write the code bottom-up, starting with field validation (4.1.1-4.1.3),
then models (4.1.4) and apps (4.1.5, 4.1.6, 4.1.8), and ending with triggering
the whole validation framework (4.1.9). Manager validation (4.1.7), as an easy
part, will be implemented near the end.

4.1.1 Rewriting tests of field validation (1 week)

(I will have exams at university, so I will probably work at 50% of full
speed.)
The tests live in the tests.invalid_models package. Currently, there is one
models.py file with lots of models containing invalid fields, a
tests.py file with only one test that checks everything, and one
huge model_errors variable that contains the concatenated error messages from
each field.

- Rewriting the tests from scratch. One test for each invalid field.
- Deleting the models.py file.
- If time permits: adding new tests (like checking for clashes with ORM
  query lookups).
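Once validation returns a list of error objects rather than one concatenated string, a test per invalid field becomes short. The sketch below shows the idea with the standard unittest module; FileFieldSketch and the simplified Error class are hypothetical stand-ins, not the real Django field or the final API.

```python
import unittest


class Error:
    """Simplified, hypothetical stand-in for the proposed Error class."""

    def __init__(self, obj, msg):
        self.obj = obj
        self.msg = msg


class FileFieldSketch:
    """Hypothetical field with a single validation rule."""

    def __init__(self, upload_to=""):
        self.upload_to = upload_to

    def validate_all(self):
        if not self.upload_to:
            yield Error(obj=self, msg="required 'upload_to' attribute")


class FileFieldValidationTests(unittest.TestCase):
    def test_missing_upload_to(self):
        field = FileFieldSketch()
        errors = list(field.validate_all())
        # Each error is an object, so the test can inspect it directly.
        self.assertEqual(len(errors), 1)
        self.assertIs(errors[0].obj, field)
        self.assertEqual(errors[0].msg, "required 'upload_to' attribute")

    def test_valid_field(self):
        field = FileFieldSketch(upload_to="uploads/")
        self.assertEqual(list(field.validate_all()), [])


# Run the two tests programmatically instead of calling unittest.main().
suite = unittest.defaultTestLoader.loadTestsFromTestCase(FileFieldValidationTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test builds only the field it needs, so a failure points at one rule instead of at a diff of one huge string.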


4.1.2 Rewriting field validation (1 week)

(Exams at university.)
Field validation currently lives in
django.core.management.validation.get_validation_errors.

- Introducing the Warning and Error classes.
- Adding a validate_all method to the Field class.
- Moving field validation into the field classes.
- get_validation_errors will only call validate_all for each field of
  each model and validate the custom User model.


4.1.3 Writing documentation (mainly overview) (0.5 week)

(I may still have exams.)

- Writing the overview of the new framework. It will be a high-level
  description of the basic ideas and concepts (like warnings, the validation
  chain and inheriting validation functionality), a general guide on how to
  write validation and how to override or remove existing validation, and an
  explanation of how the validation framework works. It will be a new
  "validation framework" topic, probably in "The development process" section.
- Writing the field validation section.
- Writing the full reference for the Warning and Error classes.


4.1.4 Tests, implementation and documentation of models validation (1 week)


- Adding validate_all and validate_fields methods to the Model class.
  The former will be the same as validate_all of Field except that it sets the
  model attribute of all warnings and errors. The latter should trigger
  validation of all fields.
- Temporarily moving custom User validation to a new validate_custom_user
  method of the Model class. get_validation_errors will only call
  validate_all of each model.
- Documentation: adding a section on how to write model validation.


4.1.5 Tests, implementation and documentation of apps validation (1 week)


- Adding a mechanism to fetch the validation module of a given app (if the
  validation.py file in the app directory is missing, it should return a
  default module; if the file exists but lacks some default functions like
  validate_all or validate_models, the default ones are added).
- Temporarily renaming the django/django/contrib/admin/validation.py file:
  the file should not be visible to the validation framework at this point.
- get_validation_errors will only trigger the validate_all function of each
  app.
- Documentation: adding a section on how to write app validation.
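The module-fetching mechanism from the first item could be approached as follows. This is a sketch only: get_validation_module and the helper that builds the default validate_all are hypothetical names, the fallback module is created with types.ModuleType, and the "shop" app is faked through sys.modules so the example runs without a real project.

```python
import importlib
import sys
import types


def _make_default_validate_all(module):
    def validate_all():
        # Default behaviour: call every other validate_* function of the module.
        problems = []
        for name in sorted(vars(module)):
            func = getattr(module, name)
            if (name.startswith("validate_") and name != "validate_all"
                    and callable(func)):
                problems.extend(func() or ())
        return problems
    return validate_all


def get_validation_module(app_name):
    """Return the app's validation module, filling in missing defaults."""
    try:
        module = importlib.import_module(app_name + ".validation")
    except ImportError:
        # No validation.py: fall back to an empty default module.
        # Caveat: this also hides ImportErrors raised inside validation.py.
        module = types.ModuleType(app_name + ".validation")
    if not hasattr(module, "validate_models"):
        # Placeholder default; the real one would validate the app's models.
        module.validate_models = lambda: []
    if not hasattr(module, "validate_all"):
        module.validate_all = _make_default_validate_all(module)
    return module


# Fake an app whose validation.py defines one custom check and nothing else.
fake_validation = types.ModuleType("shop.validation")
fake_validation.validate_prices = lambda: ["price must be positive"]
sys.modules["shop"] = types.ModuleType("shop")
sys.modules["shop.validation"] = fake_validation

shop_problems = get_validation_module("shop").validate_all()
missing_problems = get_validation_module("no_such_app_for_sketch").validate_all()
```

The custom check is picked up by the injected default validate_all, while an app without validation.py simply validates with no problems.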


4.1.6 Rewriting validation of custom User model (0.5 week)


- Moving the validation from the temporary validate_custom_user method of the
  Model class to the auth app. Removing the temporary method.
- Rewriting the existing tests (tests.invalid_models), which assume that
  errors are raised by the validate_custom_user method at this point.


4.1.7 Tests, implementation and documentation of manager validation (0.5 week)


- Adding a validate_managers method to the Model class; the method triggers
  validation of each manager.
- Adding validate_all to the Manager class.
- Moving the existing validation of
  django.contrib.sites.managers.CurrentSiteManager to new validate_*
  methods of the CurrentSiteManager class. Removing the validation triggered
  in get_query_set of the CurrentSiteManager class.
- Rewriting the tests living in tests.sites_framework.
- Documentation: adding a section on how to write manager validation.


4.1.8 Rewriting validation of ModelAdmin and its tests (1 week)


- Renaming the validation module of the admin app back to
  django/django/contrib/admin/validation.py.
- Rewriting the file (mainly renaming the main function called validate
  so that it will be triggered by the validation framework).
- If time permits, splitting the main function into smaller ones.
- Removing the validation triggered in the
  django.contrib.admin.sites.AdminSite.register method.
- Rewriting the tests living in tests.admin_validation.


4.1.9 Rewriting mechanism of triggering validation framework (1 week)


- Removing the get_validation_errors function and the module where it lives.
- Rewriting django.core.management.base.BaseCommand.validate, which triggers
  the whole validation and prints errors. It should call validate_all of each
  app instead of calling get_validation_errors. The printing of errors and
  warnings also has to be rewritten.


4.1.10 Finishing documentation (0.5 week)


- Polishing all new parts of the documentation (mainly the new "validation
  framework" topic).
- Adding a note to the "release notes".
- Checking that the difference between the new validation framework and the
  form validation framework is strongly emphasized in the documentation.
- Checking that the rest of the documentation is up-to-date and updating any
  out-of-date parts (e.g. "How to write reusable apps").


4.2 Merging django-secure -- second milestone (4 weeks)

From August 12 until September 9.

4.2.1 Rewriting django-secure tests (1 week)


- Focusing on CheckSettingCommandTest because the checksecure command
  is dropped in favour of the validate command.
- Focusing also on the ConfTest class because we will use our own mechanism of
  finding tests and we will replace conf.py with new settings in the
  settings.py file.
- Making the final decision about the API of django-secure.


4.2.2 Starting merging (1 week)


- Creating a new django.contrib.secure app turned on by default.
- Creating new settings in settings.py (based on the conf.py file).
- Adding checks of the setting values to the new app. Writing the warning
  messages that should be displayed when a tri-state setting is set to
  None. Emitting a warning when debug mode is turned on.
- Documentation: adding an overview of django.contrib.secure covering its
  purpose and how it can be controlled via settings, without describing every
  setting.


4.2.3 Continuing merging (1 week)


- Implementing security functionality -- rewriting the security middleware.
- Using XFrameOptionsMiddleware from Django core instead of own middleware.
- Documentation: adding documentation of each setting.


4.2.4 Finishing merging (1 week)

(Starting the trip on Friday of this week.)

- Polishing new parts of the documentation.
- Checking that the documentation (security topics) is up-to-date.
- Updating the "release notes" documentation page.
- Updating the AUTHORS file.


4.3 If time permits...

If I finish early, I will focus on developing a new django-update app
that will make upgrading Django a piece of cake. It would predict some
problems and issues based on the content of settings.py. For example, if a
developer were using the sites app, then while upgrading to 1.6 there would
be a warning that the sites framework was removed. The list of predictable
problems would be based on the "release notes" page.
If there is some time left but not enough to write django-update, I will focus
on small improvements to django-secure, such as checking for an exposed admin
panel.

5. About me

My name is Christopher Mędrela and I am a student at the University of Science
and Technology in Krakow (Poland). My time zone is UTC+01:00. I have been
programming in Python for at least 4 years. I also program in C and Java. I
started contributing to Django 16 months ago. I have submitted a lot of
patches (this is not a list of all patches). This year I started working on
the big #17093 ticket ("Refactor django.template to quarantine global state").
It is not finished, but it shows that I am able to deal with big tasks.
Some time ago I worked for a long time (more than one year) on my own
project named Scriptcraft, a programming game, but I suspended
this project to focus on contributing to Django. It also shows that I am not
lazy and I can push myself even when there is no external motivator. :)
My English level is at least FCE. I prefer written communication by
email or IRC.
My e-mail is chrismedrela+gsoc magic_character gmail.com. You can also find me
on the #django-dev and #gsoc IRC channels. My nick is chrismed.