"Revamping validation functionality and merging django-secure" proposal for Google Summer of Code 2013.
- Abstract
  - 1.1 Drawbacks of the existing validation framework
  - 1.2 Goals
  - 1.3 Benefits
- The new framework
  - 2.1 Overview
  - 2.2 Advantages
- Merging django-secure
- Schedule and milestones
  - 4.1 New validation framework
  - 4.2 Merging django-secure
  - 4.3 If time permits...
- About me
Django currently has a validation framework, but it has a lot of problems. First of all, it is monolithic: developers cannot write custom validation (see #16905) or modify the existing functionality (see #12674). Validation lives in a few functions like django.core.management.validation.get_validation_errors or django.contrib.admin.validation.validate. Validation is not separated from other concerns, such as printing the errors found while validating models in get_validation_errors, or registering models in the admin app (see #8579). Sometimes it is done during the first call to an important method, e.g. CurrentSiteManager is validated in its get_queryset method.
There are few tests of the validation framework, and it is not easily testable because validation functions return concatenated error messages instead of a list of errors (see django.tests.invalid_models.invalid_models.models). It also lacks some features, like warnings (see #19126). Due to these drawbacks, many apps do not have any validation; for example, they do not check inter-app dependencies.
The first part of this proposal is about revamping the current validation framework. First of all, we need to write more tests and rewrite the existing ones. Then we need a consistent API for validating different kinds of objects, like models, fields, managers or whole apps, so that it is easy to add new kinds of objects. Validation functionality should be separated from other concerns and should have minimal dependencies. We should allow developers to add validation to their apps and to any other kind of object, so custom validation is a must. We will not break backward compatibility.
This proposal is not only about refactoring but also about new features. The second part of the proposal is bringing django-secure into core. This topic is covered in section 3 ("Merging django-secure").
There are many benefits. Cleaning up the code, removing unwanted dependencies and adding more tests are the most obvious ones. We will also get a long-term solution that will be easy to maintain because it is extensible. Thanks to django-secure, we will improve the security of Django projects, which will strengthen Django's reputation as a safe and secure framework.
The API is based on Honza Kral's idea from his patch. A developer can add new validation functionality by writing a callable. It will be called automatically when the whole project is validated (triggered by python manage.py validate). It must fulfill the following contract: it takes no arguments (except for self or cls) and either returns a list of warnings and errors or yields each of them.
The validated object may be a model, a manager, a field or an app. For a field or a manager, the callable is a method. For a model, it is a classmethod. For an app, a developer has to put functions in a validation.py file inside the app directory.
Let's see an example:
    class FileField(Field):
        # ... (lots of stuff)

        def validate_upload_to_attribute(self):
            # Error is one of the new classes introduced by this proposal.
            if not self.upload_to:
                yield Error(
                    obj=self,
                    msg="required 'upload_to' attribute",
                    explanation="Django needs to know the directory where uploaded files should be stored.",
                    hint="Add \"upload_to = 'path/to/directory/on/server'\" to model %(model)s in file %(model_file)s.")
Note that validation methods are inherited by subclasses. In the uncommon case when some inherited validation should be turned off, you can override the method with one that returns an empty list. "Private" methods whose names start with _validate_ are not executed.
The validate_upload_to_attribute method and all other validation methods are called from the validate_all method. validate_all fulfills the same contract as the other validate_* methods and, by default, calls all of them. It cannot be named validate because that would collide with the existing validate method of fields. validate_all is inherited by default, so you do not have to write it. Apps behave similarly: if the validate_all or validate_models functions (described later) are omitted, or the whole validation.py file is missing, the default ones are used.
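The following is a minimal sketch of how the default validate_all could discover and run validate_* methods. It assumes discovery by name via dir(). The Validatable base class and the field classes are hypothetical, and plain strings stand in for Warning and Error objects. The sketch also shows both ways of satisfying the contract (returning a list or yielding), a skipped "private" _validate_ helper, and how to turn off inherited validation by returning an empty list.

```python
class Validatable:
    """Hypothetical base class standing in for Field or Manager."""

    def validate_all(self):
        # Run every public validate_* method and collect its results.
        # validate_all itself and "private" _validate_* methods are skipped.
        problems = []
        for name in sorted(dir(self)):
            if not name.startswith("validate_") or name == "validate_all":
                continue
            method = getattr(self, name)
            if callable(method):
                # extend() accepts both a returned list and a generator.
                problems.extend(method())
        return problems


class NamedField(Validatable):
    def __init__(self, name):
        self.name = name

    def validate_name(self):
        # Generator style: yield each problem.
        if not self.name:
            yield "name must not be empty"

    def validate_length(self):
        # List style: return all problems at once.
        return ["name too long"] if len(self.name) > 10 else []

    def _validate_helper(self):
        # Private helper: never called by validate_all.
        return ["should not appear"]


class LenientField(NamedField):
    def validate_length(self):
        # Turning off inherited validation: return an empty list.
        return []
```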
When a developer runs python manage.py validate (or any other command that triggers validation), all apps are loaded. Then, for each app, the validate_all function from its validation.py file is called (if it is missing, the default one is used). It calls all other validate_* functions. One of them, validate_models, calls the validate_all classmethod of each model of the app. The models then validate their fields and managers. That is the "validation chain".
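The validation chain can be sketched with stand-in classes. FileField, Model and the module-level APP_MODELS tuple are hypothetical simplifications: real models keep their fields in _meta, and the loaded apps would supply the models. Problems are plain strings here. The point is only the order of calls: app, then models, then fields, with every function keeping the no-argument contract.

```python
class FileField:
    def __init__(self, name, upload_to="uploads/"):
        self.name = name
        self.upload_to = upload_to

    def validate_all(self):
        # Leaf of the chain: a field checks only itself.
        if not self.upload_to:
            yield "%s: missing upload_to" % self.name


class Model:
    fields = ()

    @classmethod
    def validate_all(cls):
        # Middle of the chain: a model delegates to each of its fields.
        for field in cls.fields:
            yield from field.validate_all()


class Document(Model):
    fields = (FileField("contract"), FileField("scan", upload_to=None))


class Invoice(Model):
    fields = (FileField("pdf"),)


# Hypothetical stand-in for the models of one app.
APP_MODELS = (Document, Invoice)


def validate_models():
    # App level: call the validate_all classmethod of each model.
    for model in APP_MODELS:
        yield from model.validate_all()


def validate_all():
    # Default app-level entry point: runs the app's validate_* functions.
    return list(validate_models())
```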
The new framework introduces two new classes, Warning and Error (ValidationError would collide with django.forms.ValidationError). They are very similar and differ only in their meaning. Their attributes are obj, app, msg, explanation and hint. app is the app where the error or warning was created. It is not set in the example because the default validate_all function of an app sets it. obj is the invalid object, which may be the same as app. Errors connected with a particular model get an additional model attribute, set by the default validate_all classmethod of models. The example above is such a case: the invalid object is a field, and the field is attached to a model.
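The sketch below shows one way to define the two classes, assuming they share a base class because they differ only in meaning. CheckMessage is a hypothetical name. The constructor deliberately leaves out the model attribute, because only problems tied to a model get it.

```python
class CheckMessage:
    """Hypothetical common base of the proposed Warning and Error classes."""

    def __init__(self, obj, msg, explanation="", hint="", app=None):
        self.obj = obj                  # the invalid object (may be the app itself)
        self.app = app                  # filled in by the app's default validate_all
        self.msg = msg                  # short description of the problem
        self.explanation = explanation  # why it is a problem
        self.hint = hint                # suggested solutions
        # model is added later, only for problems tied to a model.


# Note: inside this module, this name shadows Python's built-in Warning.
class Warning(CheckMessage):
    pass


class Error(CheckMessage):
    pass


def split_problems(problems):
    # The classes differ only in meaning, so callers tell them apart by type.
    errors = [p for p in problems if isinstance(p, Error)]
    warnings = [p for p in problems if isinstance(p, Warning)]
    return errors, warnings
```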
I think that Django error messages are often confusing and rarely contain hints, proposed solutions or an explanation of the problem. We should make Django contributors think about this, and separating the error message (msg), the explanation and the hint (a list of suggested solutions) is a way to do it. We need to make using Django as simple as possible, and suggesting solutions and describing problems in detail helps with that. This will be especially important when contributors and developers start to write more complex validation, such as checking inter-app dependencies.
Error messages (as well as the hint and explanation fields) can be formatted. %(model)s will be replaced with the model attribute of the error, and %(model_file)s with the path to the file where the model is defined. This allows us to write really user-friendly errors in the style of "go to file <file> and add <something> to model <model>".
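Formatting could be plain %-style substitution with a dict, as in this sketch. format_problem is a hypothetical helper. model and model_file are passed in as strings here; in the framework they would come from the problem's model attribute and the model's source file.

```python
def format_problem(text, model, model_file):
    """Fill the %(model)s and %(model_file)s placeholders described above.

    Which other keys will be available is still to be decided.
    """
    return text % {"model": model, "model_file": model_file}


hint = ("Add \"upload_to = 'path/to/directory/on/server'\" to model "
        "%(model)s in file %(model_file)s.")
```

One consequence of %-style formatting is that a literal percent sign in a message must be written as %%.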
First of all, the solution is as simple as possible. Developers have to learn only one concept: the contract of validate_* methods. You do not have to know about the validate_all method or the validation chain to write your first piece of validation code, which makes it easier for newcomers to play with Django.
As you can see, the solution is consistent across different kinds of objects and does not assume that only a fixed set of object types can be validated. I believe that a good long-term solution should be extensible. The new framework allows us to easily add validation of new types of objects: we just have to extend the validation chain.
One may argue that a developer has to remember when to use methods, when classmethods and when functions. Instead, one may propose using a validator class (as in Honza Kral's patch), where a validated object points to its validator instance. That would be progress if it did not cause a lot of other problems. For example, if field A inherits from field B, you have to remember that A's validator should inherit from B's validator. It also means that you have to write a new class even if you want to validate only one small thing.
My proposal also solves a lot of existing problems. It fits almost all existing validation in Django. For example, the existing validation of ModelAdmin can be done in the admin app: all apps are loaded before validation, so all models will be registered before validation starts. What's more, it is a good solution for many other use cases. The new framework allows us to write a new kind of app: apps containing mainly validation code.
An ideal example of an app that dovetails nicely with the framework is django-secure. Since the second part of the proposal is bringing django-secure into core, this will increase the security of Django projects. Another example is an app that inspects the settings.py file and predicts problems when switching to a newer version of Django. That would make updating Django a piece of cake, but it is not part of this proposal.
The first part of this proposal introduces the new framework and is mainly refactoring. During the last 4 weeks I would like to focus on the django-secure app. This app does security checks, such as checking whether CsrfViewMiddleware is in the MIDDLEWARE_CLASSES setting, or whether the X-Frame-Options response header is set to DENY. I will merge it into Django and fit it to the new validation framework. That will demonstrate the flexibility of the new framework.
The app will emit only warnings (no errors). For backward compatibility, we cannot switch on all of django-secure's functionality by default. For example, we cannot redirect all requests to SSL. So if developers want the redirection, they have to set a new setting named SSL_REDIRECT to True. If the setting is omitted, a warning will be displayed. The warning will explain why the developer should always use SSL, and it will also say that the warning can be disabled by setting SSL_REDIRECT to False. The default value of SSL_REDIRECT will be None, which makes it a three-state boolean. Other settings will work in a similar way.
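The following is a minimal sketch of the proposed three-state SSL_REDIRECT check, written as a validate_* style generator. To keep it self-contained, settings are passed as a plain dict. That is an assumption: Django exposes settings as attributes of django.conf.settings. The warning text is only an example of what the final message could say.

```python
def check_ssl_redirect(settings):
    """Warn only when SSL_REDIRECT is None (the proposed default) or missing."""
    value = settings.get("SSL_REDIRECT", None)
    if value is None:
        # Neither enabled nor explicitly disabled: explain, and say how to silence.
        yield ("SSL_REDIRECT is not set. Your site should always be served over "
               "SSL; set SSL_REDIRECT = True to redirect all requests to HTTPS, "
               "or SSL_REDIRECT = False to silence this warning.")
    # True or False means the developer made an explicit choice: stay quiet.
```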
Some warnings will be emitted only when DEBUG is False, i.e. when developers are deploying the project rather than working on it. If DEBUG is True, developers are warned that they are working in debug mode.
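The DEBUG rule could look like the following sketch. Which checks count as deployment-only is still to be decided. Treating the CsrfViewMiddleware check that way here is purely an illustrative assumption, and settings are again a plain dict.

```python
CSRF_MIDDLEWARE = "django.middleware.csrf.CsrfViewMiddleware"


def check_deployment(settings):
    """Return warnings; deployment-only checks run only when DEBUG is False."""
    if settings.get("DEBUG", False):
        # Working locally: only remind the developer about debug mode.
        return ["DEBUG is True: you are working in debug mode."]
    problems = []
    # Assumption: this check is treated as deployment-only for illustration.
    if CSRF_MIDDLEWARE not in settings.get("MIDDLEWARE_CLASSES", ()):
        problems.append("CsrfViewMiddleware is not in MIDDLEWARE_CLASSES.")
    return problems
```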
I will drop the checksecure command (see #17101), and the security checks will become part of the validate command. This is better than a separate command because there is no danger of forgetting to run it. It is also simpler for new developers, who do not have to know about the command: the security checks are turned on by default and you do not have to trigger them. To cut a long story short, it is safer.
The app will live in django.contrib and will be renamed to secure, because django.contrib.djangosecure is too long. The app will be enabled by default for new projects. When upgrading an existing project to a newer version of Django, developers will have to manually append django.contrib.secure to INSTALLED_APPS.
Another issue is that Django 1.4 shipped with django.middleware.clickjacking.XFrameOptionsMiddleware. django-secure should use this middleware instead of its own.
Before I start coding, I would like to do some preparation:
- Discussing and writing the full API of the new validation framework, e.g. what data will be available when formatting error messages.
- A list of new tests for the new validation framework, particularly for model, app and manager validation.
- Discussing the new API of django-secure, especially the names of settings, their behavior and default values.
- Improving my English writing skills, e.g. by reading "The Elements of Style".
For the first two or three weeks I will have exams at university, so I cannot work 8 hours a day. After 6 July (or after 29 June if I pass all my exams quickly) I will have no other commitments.
At the end of June I'm going to Norway for about five weeks to visit my family. There will be one-day trips at weekends, but I will still be free during the week.
A much more important issue is that I'm going on holiday around September 6. This is not a backpacking trip; I will stay in a hotel with Internet access. It does mean that I will not be able to work full time (I assume about 50% of full speed). I hope you will not disqualify my proposal on that basis. It may even be an advantage, because I will be highly motivated to finish before then.
(From June 17 until August 12).
I will write code bottom-up: starting with field validation (4.1.1-4.1.3), then models (4.1.4) and apps (4.1.5, 4.1.6, 4.1.8), and ending with triggering the whole validation framework (4.1.9). Manager validation (4.1.7) is an easy part, so it will be implemented near the end.
(I will have exams at university, so I will probably work at about 50% of full speed.)
The tests live in the tests.invalid_models package. Currently there is one models.py file with lots of models containing invalid fields, and a tests.py file with only one test that checks everything. There is also one huge model_errors variable that contains the concatenated error messages of all fields.
- Rewriting tests from scratch, with one test for each invalid field.
- Deleting the models.py file.
- If time permits: adding new tests (like checking clashes with ORM query lookups).
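The following minimal unittest sketch illustrates the "one test for each invalid field" item. validate_upload_to is a hypothetical stand-in for a field's validate_* method. Because it returns a list of errors rather than one concatenated string, each test can assert on a single problem.

```python
import unittest


def validate_upload_to(field_name, upload_to):
    """Hypothetical stand-in for a field's validate_* method; returns a list."""
    if not upload_to:
        return ["%s: required 'upload_to' attribute" % field_name]
    return []


class FileFieldValidationTests(unittest.TestCase):
    # One small, focused test per invalid field, instead of comparing one
    # huge string of concatenated messages.

    def test_missing_upload_to(self):
        errors = validate_upload_to("scan", None)
        self.assertEqual(errors, ["scan: required 'upload_to' attribute"])

    def test_empty_upload_to(self):
        self.assertEqual(len(validate_upload_to("scan", "")), 1)

    def test_valid_upload_to(self):
        self.assertEqual(validate_upload_to("scan", "scans/"), [])
```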
(Exams at university).
Field validation currently lives in django.core.management.validation.get_validation_errors.
- Introducing the Warning and Error classes.
- Adding a validate_all method to the Field class.
- Moving field validation to the field classes. get_validation_errors will only call validate_all for each field of each model and validate the custom User model.
(I may still have exams.)
- Writing the overview of the new framework. It will be a high-level description of the basic ideas and concepts (like warnings, the validation chain and inheriting validation functionality). It will also give general rules for writing, overriding or removing validation, and explain how the validation framework works. It will be a new "validation framework" topic, probably in "The development process" section.
- Writing the field validation section.
- Writing a full reference of the Warning and Error classes.
- Adding validate_all and validate_fields methods to the Model class. The former will be the same as validate_all of Field, except that it sets the model attribute of all warnings and errors. The latter should trigger validation of all fields.
- Temporarily moving custom User validation to a new validate_custom_user method of the Model class. get_validation_errors will only call validate_all of each model.
- Documentation: adding a section on how to write model validation.
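The sketch below shows the two new Model classmethods. It assumes that validate_all discovers validate_* methods by name, just like Field.validate_all. The Error stand-in, the FileField class and the _fields tuple are hypothetical; real Django models keep their fields in _meta.

```python
class Error:
    """Minimal stand-in for the proposed Error class."""

    def __init__(self, obj, msg):
        self.obj = obj
        self.msg = msg


class FileField:
    def __init__(self, name, upload_to=None):
        self.name = name
        self.upload_to = upload_to

    def validate_all(self):
        if not self.upload_to:
            yield Error(obj=self, msg="required 'upload_to' attribute")


class Model:
    _fields = ()  # hypothetical; real models keep fields in _meta

    @classmethod
    def validate_fields(cls):
        # Trigger validation of every field of the model.
        for field in cls._fields:
            yield from field.validate_all()

    @classmethod
    def validate_all(cls):
        # Like Field.validate_all, plus tagging each problem with the model.
        for name in sorted(dir(cls)):
            if name.startswith("validate_") and name != "validate_all":
                for problem in getattr(cls, name)():
                    problem.model = cls
                    yield problem


class Document(Model):
    _fields = (FileField("scan"), FileField("thumbnail", upload_to="thumbs/"))
```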
- Adding a mechanism to fetch the validation module of a given app. If the validation.py file in the app directory is missing, it should return a default module. If the file exists but lacks some default functions, like validate_all or validate_models, the default ones should be added.
- Temporarily renaming the django/django/contrib/admin/validation.py file, which should not be visible to the validation framework at this point. get_validation_errors will only trigger the validate_all function of each app.
- Documentation: adding a section on how to write app validation.
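The following is a rough sketch of the module-fetching mechanism, written for modern Python 3 and assuming it is built on importlib.import_module. get_validation_module is a hypothetical name, and the default validate_models is a placeholder that returns no problems. If the app package itself cannot be imported, the ModuleNotFoundError is re-raised instead of hidden.

```python
import importlib
import types


def get_validation_module(app_name):
    """Return app_name.validation, creating it or completing it with defaults."""
    full_name = app_name + ".validation"
    try:
        module = importlib.import_module(full_name)
    except ModuleNotFoundError as exc:
        if exc.name != full_name:
            raise  # the app itself (or a dependency of validation.py) is missing
        # No validation.py: start from an empty module and add the defaults.
        module = types.ModuleType(full_name)

    if not hasattr(module, "validate_models"):
        # Placeholder default; the real one would call each model's validate_all.
        module.validate_models = lambda: []

    if not hasattr(module, "validate_all"):
        def validate_all():
            # Default: run every public validate_* function of this module.
            problems = []
            for name in sorted(vars(module)):
                func = getattr(module, name)
                if (name.startswith("validate_") and name != "validate_all"
                        and callable(func)):
                    problems.extend(func())
            return problems
        module.validate_all = validate_all

    return module
```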
- Moving the validation from the temporary validate_custom_user method of the Model class to the auth app. Removing the temporary method.
- Rewriting existing tests (tests.invalid_models): at this point they assume that errors are raised by the validate_custom_user method.
- Adding a validate_managers method to the Model class; the method triggers validation of each manager.
- Adding validate_all to the Manager class.
- Moving the existing validation of django.contrib.sites.managers.CurrentSiteManager to new validate_* methods of the CurrentSiteManager class. Removing the validation triggered in get_query_set of the CurrentSiteManager class.
- Rewriting tests living in tests.sites_framework.
- Documentation: adding a section on how to write manager validation.
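The CurrentSiteManager item could end up roughly like this sketch, where the checks that currently run inside get_query_set become a validate_* generator. model_fields is a hypothetical stand-in for the model's _meta: a mapping from field names to field type names. The fallback to a field named site or sites and the messages only approximate the current behaviour.

```python
class Error:
    """Minimal stand-in for the proposed Error class."""

    def __init__(self, obj, msg):
        self.obj = obj
        self.msg = msg


class CurrentSiteManager:
    def __init__(self, model_fields, field_name=None):
        self.model_fields = model_fields  # hypothetical stand-in for _meta
        self.field_name = field_name

    def _site_field_name(self):
        # Helper, not a validation method: guess the field name if not given.
        if self.field_name is not None:
            return self.field_name
        for candidate in ("site", "sites"):
            if candidate in self.model_fields:
                return candidate
        return "site"

    def validate_site_field(self):
        # Runs with the rest of the validation chain, not in get_query_set.
        name = self._site_field_name()
        kind = self.model_fields.get(name)
        if kind is None:
            yield Error(obj=self, msg="couldn't find a field named %r" % name)
        elif kind not in ("ForeignKey", "ManyToManyField"):
            yield Error(obj=self,
                        msg="%r must be a ForeignKey or ManyToManyField" % name)
```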
- Renaming the validation module of the admin app back to django/django/contrib/admin/validation.py.
- Rewriting the file, mainly renaming the main function (currently called validate) so that it will be triggered by the validation framework. If time permits, splitting the main function into smaller ones.
- Removing the validation triggered in the django.contrib.admin.sites.AdminSite.register method.
- Rewriting tests living in tests.admin_validation.
- Removing the get_validation_errors function and the module where it lives.
- Rewriting django.core.management.base.BaseCommand.validate, which triggers the whole validation and prints errors. It should call validate_all of each app instead of calling get_validation_errors. Printing errors and warnings also has to be rewritten.
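The following sketch shows the rewritten printing step in BaseCommand.validate. It assumes that each app's validate_all is passed in as a callable and that a level attribute distinguishes errors from warnings. format_report, Problem and the output layout are all hypothetical.

```python
import io


class Problem:
    """Hypothetical stand-in for Warning/Error; level is "ERROR" or "WARNING"."""

    def __init__(self, level, app, msg, hint=""):
        self.level = level
        self.app = app
        self.msg = msg
        self.hint = hint


def format_report(app_validators):
    """Run each app's validate_all and build the text the command would print.

    Returns (number_of_errors, report_text).
    """
    problems = []
    for validate_all in app_validators:
        problems.extend(validate_all())
    # Errors before warnings; sorted() is stable, so order is kept otherwise.
    problems = sorted(problems, key=lambda p: p.level != "ERROR")
    out = io.StringIO()
    for p in problems:
        out.write("%s: %s: %s\n" % (p.level, p.app, p.msg))
        if p.hint:
            out.write("    HINT: %s\n" % p.hint)
    errors = sum(1 for p in problems if p.level == "ERROR")
    out.write("%d error(s), %d warning(s) found\n" % (errors, len(problems) - errors))
    return errors, out.getvalue()
```

The command would write the returned text to its output stream and could use the error count to decide whether to stop.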
- Polishing all new parts of documentation (mainly the new "validation framework" topic).
- Adding note to "release notes".
- Checking if the difference between the new validation framework and the form validation framework is strongly emphasized in documentation.
- Checking whether the rest of the documentation is up-to-date and updating any out-of-date parts (e.g. "How to write reusable apps").
From August 12 until September 9.
- Focusing on CheckSettingCommandTest, because we dropped the checksecure command in favour of the validate command.
- Also focusing on the ConfTest class, because we will use our own mechanism of finding tests and will replace conf.py with new settings in the settings.py file.
- Making the final decision about the API of django-secure.
- Creating the new django.contrib.secure app, turned on by default.
- Creating new settings in settings.py (based on the conf.py file).
- Adding checks of these settings' values to the new app. Writing the warning messages that should be displayed when a three-state setting is set to None. Adding a warning that debug mode is turned on.
- Documentation: adding an overview of django.contrib.secure (its purpose and how it can be controlled via settings, without describing every setting).
- Implementing security functionality -- rewriting the security middleware.
- Using XFrameOptionsMiddleware from Django core instead of its own middleware.
- Documentation: adding documentation for each setting.
(I start my trip on Friday of this week.)
- Polishing new parts of documentation.
- Checking if documentation (security topics) is up-to-date.
- Updating "release notes" documentation page.
- Updating the AUTHORS file.
If I finished earlier, I would focus on developing a new django-update app that would make updating Django a piece of cake. It would predict problems based on the content of settings.py. For example, if a developer were using the sites app and Django 1.6 removed the sites framework, upgrading to 1.6 would emit a warning about it. The list of predictable problems would be based on the "release notes" page.
If there were some time left, but not enough to write django-update, I would focus on small improvements to django-secure, such as checking for an exposed admin panel.
My name is Christopher Mędrela and I am a student at the University of Science and Technology in Krakow (Poland). My time zone is UTC+01:00. I have been programming in Python for at least 4 years, and I also program in C and Java. I started contributing to Django 16 months ago. I have submitted a lot of patches (this is not a list of all of them). This year I started working on the big #17093 ticket ("Refactor django.template to quarantine global state"). It is not finished, but it shows that I am able to deal with big tasks.
Some time ago I worked for a long time (more than one year) on my own project named Scriptcraft, a programming game. I suspended it to focus on contributing to Django. It also shows that I am not lazy and can push myself even when there is no external motivator. :)
My English level is at least FCE. I prefer written communication by email or IRC.
My e-mail is chrismedrela+gsoc magic_character gmail.com. You can also find me on the #django-dev and #gsoc IRC channels. My nick is chrismed.