Upgrading auth.User - the profile approach

This proposal presents a "middle ground" approach to improving and refactoring auth.User, based around a new concept of "profiles". These profiles provide the main customization hook for the user model, but the user model itself stays concrete and cannot be replaced.

I call it a middle ground because it doesn't go as far as refactoring the whole auth app -- a laudable goal, but one that I believe will ultimately take far too long -- but goes a bit further than just fixing the most egregious errors (username length, for example).

This proposal includes a fair number of design decisions -- you're reading the fifth or sixth draft. To keep things clear, the options have been pruned out and only the one I think is the "winner" is still there. But see the FAQ at the end for some discussion and justification of various choices.

The User model

This proposal vastly pares down the User model to the absolute bare minimum and defers all "user field additions" to a new profile system.

The new User model:

class User(models.Model):
    identifier = models.CharField(unique=True, db_index=True)
    password = models.CharField(default="!")

Points of interest:

  • The identifier field is an arbitrary identifier for the user. For the common case it's a username. However, I've avoided the term username since this field can be used for anything -- including, most notably, an email address for the common case of login-by-email.

  • The identifier field is unique and indexed. This is to optimize for the common case: looking up a user by name/email/whatever. This does mean that different auth backends that don't have something like a username will still need to figure out and generate something to put in this field.

  • If possible, identifier will be an unbounded varchar (something supported by most (all?) dbs, but not by Django). If not, we'll make it varchar(512) or something. The idea is to support pretty much anything as a user identifier, leaving it up to each user to decide what's valid.

  • Password's the same -- if possible, make it unbounded to be as future-proof as possible. If not, we'll make it varchar(512) or something.

  • There's no validation on identifier, but the profile system allows individual profiles to contribute site-specific constraints. See below.

  • Why have an "identifier" at all? Why not just leave it up to the profiles? Most uses will have a primary "login identifier" -- username, email, URL, etc. -- and making that something that 3rd-party apps can depend on is probably good. Making it indexed means the common case -- look up user by identifier -- is as fast as possible.

  • Why have a password at all? Because if we don't, users will invent their own password management and storage, and that's a loaded gun pointed at their feet. However, password newly defaults to "!", which is the unusable password. Thus, if an auth backend doesn't use passwords, it can ignore the password field; the user object will automatically be marked as one that can't be auth'd by password.
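
As a rough illustration of the last point, here is a minimal plain-Python sketch of how the "!" default could mark a user as not password-authenticatable. The `User` class and the `has_usable_password` method here are stand-ins written for this example, not the proposed model itself.

```python
# Hypothetical stand-in for the proposed model; not Django's actual User.
UNUSABLE_PASSWORD = "!"  # the "unusable password" marker named in the proposal


class User:
    def __init__(self, identifier, password=UNUSABLE_PASSWORD):
        self.identifier = identifier
        self.password = password

    def has_usable_password(self):
        # A user whose password is still the marker cannot log in by password.
        return self.password != UNUSABLE_PASSWORD


# A backend that ignores passwords simply never sets one.
openid_user = User("https://example.com/joe")
# A password-based backend stores a hash (placeholder value here).
local_user = User("joe", password="sha1$salt$hash")
```

A backend that doesn't use passwords never has to touch the field, and password-based login code only needs the one check.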

Profiles

OK, so if User gets neutered, all the user data needs to go somewhere... that's where profile comes in. Don't think about AUTH_PROFILE_MODULE which is weaksauce; this proposed new profile system is a lot more powerful.

Here's what a profile looks like:

from django.contrib.auth.models import Profile

class MyProfile(Profile):
    first_name = models.CharField()
    last_name = models.CharField()
    homepage = models.URLField()

Looks pretty simple, and it is. It's just syntactic sugar for the following:

class MyProfile(models.Model):
    user = models.OneToOneField(User)
    ...

That is, a Profile subclass is just a model with a one-to-one back to user.

HOWEVER, we can do a few other interesting things here:

Multiple profiles

First, User.get_profile() and AUTH_PROFILE_MODULE go die in a fire. See below for backwards-compatibility concerns.

Thus, it should be obvious that supporting multiple profiles is trivial. In fact, it's basically a requirement since the auth app is going to need to ship with a profile that includes all the legacy fields (permissions, groups, etc), and that clearly can't be the only profile. So multiple profile objects: fully supported.

Auto-creation of profiles

Right now, one problem with the profile pattern is that when users are created you've got to create the associated profile somehow or risk ProfileDoesNotExist errors. People work around this with post_save signals, User monkeypatches, etc.

The new auth system will auto-create each profile when a user is created. If new profiles are added later, those profile objects will be created lazily (when they're accessed for the first time).

This behavior can be disabled:

class MyProfile(Profile):
    ...

    class Meta(object):
        auto_create = False
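
To make the eager/lazy distinction concrete, here is a small plain-Python sketch. It assumes a hypothetical registry (`User.profile_classes`) and uses a class attribute in place of the proposed Meta option; what should happen when an opted-out profile is accessed is also an assumption, since the proposal doesn't say.

```python
class Profile:
    auto_create = True  # stands in for the proposed Meta.auto_create option

    def __init__(self, user):
        self.user = user


class Address(Profile):
    pass


class Billing(Profile):
    auto_create = False  # opted out: never created automatically


class User:
    profile_classes = [Address, Billing]  # hypothetical profile registry

    def __init__(self, identifier):
        self.identifier = identifier
        self._profiles = {}
        # Eagerly create every registered profile that has not opted out.
        for cls in self.profile_classes:
            if cls.auto_create:
                self._profiles[cls] = cls(self)

    def profile(self, cls):
        # Lazily create profiles that were registered after this user existed.
        if cls not in self._profiles:
            if not cls.auto_create:
                raise LookupError(f"{cls.__name__} has auto_create = False")
            self._profiles[cls] = cls(self)
        return self._profiles[cls]


user = User("joe")


class Newsletter(Profile):  # a profile added after the user was created
    pass


User.profile_classes.append(Newsletter)
```

Here `user.profile(Address)` returns the eagerly created object, `user.profile(Newsletter)` creates one on first access, and `user.profile(Billing)` raises.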

Extra user validation

Profiles may contribute extra validation to the User object. For example, let's say that for my site I want to enforce the constraint that User.identifier is a valid email address (thus making the built-in login forms require emails to log in):

from django.core import validators

class MyProfile(Profile):
    ...

    def validate_identifier(self):
        # validate_email raises ValidationError if the identifier isn't an email.
        validators.validate_email(self.user.identifier)

That is, we get a special callback, validate_identifier, that lets us contribute validation to identifier. This looks a bit like a model validator function, and that's the point. User will pick up this validation function in its own validation, and thus that'll get passed down to forms and errors will be displayed as appropriate.
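
One way User could pick up these callbacks, sketched in plain Python. The assumptions are that hooks signal failure by raising a validation error and that User runs every profile's hook and reports all failures together; `ValidationError` below is a local stand-in, and `User.validate` is a hypothetical name.

```python
class ValidationError(Exception):
    """Local stand-in for Django's ValidationError."""


class Profile:
    def __init__(self, user):
        self.user = user


class EmailLoginProfile(Profile):
    def validate_identifier(self):
        if "@" not in self.user.identifier:
            raise ValidationError("identifier must be an email address")


class PlainProfile(Profile):
    pass  # contributes no validation


class User:
    def __init__(self, identifier):
        self.identifier = identifier
        self.profiles = [EmailLoginProfile(self), PlainProfile(self)]

    def validate(self):
        # Run every profile's hook, collecting failures instead of
        # stopping at the first one.
        errors = []
        for profile in self.profiles:
            hook = getattr(profile, "validate_identifier", None)
            if hook is None:
                continue
            try:
                hook()
            except ValidationError as exc:
                errors.append(str(exc))
        if errors:
            raise ValidationError("; ".join(errors))
```

Under this sketch, two profiles with contradictory rules don't override each other: the identifier fails validation and both messages are reported.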

Profile data access from User

There are two ways of accessing profile data given a user: directly through the one-to-one accessor, and indirectly through a user data bag.

Direct access is simple: since Profile is just syntactic sugar for a one-to-one field, given a profile...

class MyProfile(Profile):
    name = models.CharField()

... you can access it as user.myprofile.name.

The accessor name can be overridden via a Meta option:

class MyProfile(Profile):
    ...

    class Meta(object):
        related_name = 'myprof'

[Alternatively, if this is deemed too magical, we could require users to manually specify the OneToOneField and provide related_name there.]

This method is explicit and obvious to anyone who understands that a profile is just an object with a one-to-one relation to user.

However, it requires the accessing code to know the name of the profile class providing a piece of data. This starts to fall apart when it comes to reusable apps: I should be able to write an app that has a requirement like "some profile must define a name field for this app to function." Thus, users expose a second interface for profile data: user.data. This is an object that exposes an amalgamated view onto all profile data and allows access to profile data without knowing exactly where it comes from.

For example, let's imagine two profiles:

class One(Profile):
    name = models.CharField()
    age = models.IntegerField()

class Two(Profile):
    name = models.CharField()
    phone = models.CharField()

And some data:

user.one.name = "Joe"
user.one.age = 17
user.two.name = "Joe Smith"
user.two.phone = "555-1212"

Let's play:

>>> user.data["age"]
17

>>> user.data["phone"]
"555-1212"

>>> user.data["spam"]
Traceback (most recent call last):
    ...
KeyError: spam

>>> user.data["name"]
"Joe"

Notice that both profiles are collapsed. This means that if there's an overlapping name, I only get one profile's data back. Which? By default it's undefined and arbitrary, but users can set an AUTH_PROFILES setting to control order; see below. If AUTH_PROFILES is set, the first profile defining a given key will be returned.

If you need to get all values for an overlapping key, you can use user.data.dict:

>>> user.data.dict("name")
{"one": "Joe", "two": "Joe Smith"}

Setting data works; however, "in the face of ambiguity, refuse the temptation to guess":

>>> user.data["age"] = 24
>>> user.one.age
24

>>> user.data["name"] = "Joe"
Traceback (most recent call last):
    ...
KeyError: "name" overlaps on multiple profiles; use `user.one.name = ...` or `user.two.name = ...`

Like all models, just setting user.data keys doesn't actually save the associated profile back to the db. For that, use user.data.save(). This saves all associated profiles (or perhaps just modified ones if we're feeling fancy).
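
The lookup rules above can be sketched in plain Python. This is only an in-memory model of the behavior described here: the `UserData` class, the `fields` tuples, and the ordered mapping of accessor names to profiles are assumptions made for the example, and persistence (`save()`) is left out.

```python
class Profile:
    fields = ()

    def __init__(self, **values):
        for name in self.fields:
            setattr(self, name, values.get(name))


class One(Profile):
    fields = ("name", "age")


class Two(Profile):
    fields = ("name", "phone")


class UserData:
    def __init__(self, profiles):
        # Ordered mapping of accessor name -> profile instance; the order
        # plays the role of AUTH_PROFILES.
        self._profiles = profiles

    def _owners(self, key):
        return [(name, p) for name, p in self._profiles.items()
                if key in p.fields]

    def __getitem__(self, key):
        owners = self._owners(key)
        if not owners:
            raise KeyError(key)
        # The first profile defining the key wins on reads.
        return getattr(owners[0][1], key)

    def __setitem__(self, key, value):
        owners = self._owners(key)
        if not owners:
            raise KeyError(key)
        if len(owners) > 1:
            # Refuse to guess which profile should receive the write.
            names = ", ".join(name for name, _ in owners)
            raise KeyError(f"{key!r} overlaps on multiple profiles: {names}")
        setattr(owners[0][1], key, value)

    def dict(self, key):
        # All values for a key, keyed by accessor name.
        return {name: getattr(p, key) for name, p in self._owners(key)}


data = UserData({
    "one": One(name="Joe", age=17),
    "two": Two(name="Joe Smith", phone="555-1212"),
})
```

Reads resolve to the first profile in order, while writes to an overlapping key fail and name the candidate profiles.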

Querying against profile data

Making queries against profiles falls into a similar situation as accessing profile data. Since profiles are sugar for one-to-ones, you can always simply do:

User.objects.filter(prof1__field1="foo", prof2__field2="bar")

However, just like with data access, reusable apps may need the ability to make queries against profile data. That looks like this:

User.objects.filter(data__field1="foo", data__field2="bar")

This data__ syntax also works for order_by(), etc.

Once again, "in the face of ambiguity, refuse the temptation to guess": if a data field is duplicated, you'll get an exception if you try to query against it.
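
A sketch of how data__ lookups might be translated into ordinary per-profile lookups before they reach the ORM. The `resolve_data_lookups` function and the `PROFILE_FIELDS` registry are hypothetical, and `ValueError` is an arbitrary choice since the proposal only says "an exception".

```python
# Hypothetical registry: accessor name -> field names defined by that profile.
PROFILE_FIELDS = {
    "one": {"name", "age"},
    "two": {"name", "phone"},
}


def resolve_data_lookups(lookups, profile_fields=PROFILE_FIELDS):
    """Rewrite data__<field>... lookups into <profile>__<field>... lookups."""
    resolved = {}
    for lookup, value in lookups.items():
        if not lookup.startswith("data__"):
            resolved[lookup] = value  # ordinary lookup, leave untouched
            continue
        rest = lookup[len("data__"):]
        field = rest.split("__", 1)[0]
        owners = [name for name, fields in profile_fields.items()
                  if field in fields]
        if not owners:
            raise ValueError(f"no profile defines {field!r}")
        if len(owners) > 1:
            # Duplicated field: refuse to guess which profile to query.
            raise ValueError(f"{field!r} is defined by multiple profiles: "
                             + ", ".join(owners))
        resolved[f"{owners[0]}__{rest}"] = value
    return resolved
```

For example, `resolve_data_lookups({"data__age__gte": 18})` yields `{"one__age__gte": 18}`, while a lookup on `data__name` raises because both profiles define `name`.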

Performance optimization

One of the main criticisms I anticipate is that this approach introduces a potentially large performance hit. Code like this:

user = User.objects.get(...)
user.prof1.field
user.prof2.field
user.prof3.field

could end up doing 4 queries. This could be even worse if we go with the magic attributes described in the FAQ: those DB queries would be effectively hidden.

Luckily this is fairly easy to optimize: allow user queries to pre-join onto all profile fields. That is, instead of SELECT * FROM user do SELECT user.*, prof1.* FROM user JOIN prof1. Since profiles all subclass Profile it's trivial to know which models to do this to.

In other words, User.objects.all() works the same as User.objects.select_related(*all_profile_fields). On many databases, this JOIN across indexed columns is nearly as fast as local column access. However, since there are situations where these JOINs aren't wanted, it's easy to turn off: User.objects.select_related(None).
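
To show the shape of the pre-joined query, here is a small sketch that builds the SQL for a list of profile tables. The table names, the `user_id` column, and the choice of LEFT JOIN (so users whose lazily created profiles don't exist yet still come back) are all assumptions made for this example.

```python
def user_select_sql(profile_tables):
    """Build a SELECT that fetches users and all their profiles in one query."""
    columns = ["auth_user.*"] + [f"{table}.*" for table in profile_tables]
    joins = [
        # LEFT JOIN keeps users that have no row in a given profile table.
        f"LEFT JOIN {table} ON {table}.user_id = auth_user.id"
        for table in profile_tables
    ]
    select = f"SELECT {', '.join(columns)} FROM auth_user"
    return " ".join([select] + joins)


query = user_select_sql(["prof1", "prof2"])
```

With an empty list the function falls back to the plain single-table query, roughly what select_related(None) would give.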

Controlling which profiles are available: AUTH_PROFILES

AUTH_PROFILES is an optional setting that controls profile discovery. It's unset by default, and if left unset Django will simply assume that any Profile subclass in an app that's in INSTALLED_APPS is an installed profile. This is probably good enough for the common case. However, it falls down in two situations:

  • If multiple profiles define the same fields, then the user.data accessor will find those fields in an arbitrary order.

  • If users want to install an app with a profile they don't want, or if an app ships multiple profiles, etc.

In both of these cases, you can use the AUTH_PROFILES setting to control which profiles are considered installed, and in which "order". It's just a list pointing to profile classes:

AUTH_PROFILES = ["path.to.OneProfile", "path.to.TwoProfile"]

If a profile isn't listed in the list but is a model in INSTALLED_APPS, the model will still get installed (the table will be there), but it won't be considered a profile. That means none of the special behavior -- user.data, performance optimization, etc. It's an error to have a model in AUTH_PROFILES that's not a Profile or not installed.
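
The discovery rules can be summarized in a few lines of plain Python. `installed_profiles` is a hypothetical helper, `discovered` stands for the dotted paths of Profile subclasses found in INSTALLED_APPS, and `ValueError` stands in for whatever configuration error Django would actually raise.

```python
def installed_profiles(discovered, auth_profiles=None):
    """Return the profiles treated as installed, in resolution order."""
    if auth_profiles is None:
        # Setting unset: every discovered profile counts, in arbitrary order.
        return list(discovered)
    unknown = [path for path in auth_profiles if path not in discovered]
    if unknown:
        # Listing something that isn't an installed Profile is an error.
        raise ValueError("not an installed Profile: " + ", ".join(unknown))
    # Setting given: only the listed profiles count, in the listed order.
    return list(auth_profiles)


discovered = ["path.to.OneProfile", "path.to.TwoProfile", "other.app.Unwanted"]
active = installed_profiles(
    discovered, ["path.to.TwoProfile", "path.to.OneProfile"]
)
```

In this example `other.app.Unwanted` still has its table but is left out of `active`, so it gets none of the special profile behavior.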

Auth backends

Auth backends continue to work almost exactly as they did before. Most notably, they'll still need to return an instance of django.contrib.auth.models.User, and that user will require some sort of unique identifier.

However, auth backends now can take profiles into account, which means that things like OpenID backends can have an OpenIDProfile and store the URL field there (or use the URL as the identifier, perhaps).

Forms

Under the new system, if you simply create a model form for user:

class UserForm(ModelForm):
    class Meta:
        model = User

... you'll get a form that only has identifier and password.

Thus, Django will ship with a convenience form, django.contrib.auth.forms.UserWithProfilesForm, that automatically brings in all profile fields onto a single form and properly saves users and their profiles. This'll be useful for registration. We may also need to give this form a hook to only include particular profiles; that's TBD.

There's also a set of existing user forms that're used for login, password changing, etc. These'll stay the same, although they'll switch what data they talk to a bit.

Backward compatibility

The big one, of course.

First, there's deprecation to consider. AUTH_PROFILE_MODULE and user.get_profile() will simply be removed. Access to attributes directly on User objects (user.is_staff, etc.) will also go away, replaced by user.data and/or user.defaultprofile attributes. Deprecation will be according to the normal schedule: polite warnings in 1.5, more urgent ones in 1.6, and outright removal in 1.7.

[If it turns out that this schedule causes pain for some users we might consider a longer deprecation cycle for these things.]

After that, there's two facets here; an easy one and a hard one. Let's do the easy one first:

The "default profile"

Many, many apps rely on existing user fields (user.is_staff, user.permissions, etc.) -- the admin for one! The fields need to stick around at least for the normal deprecation period, and possibly for longer. Thus, we'll ship with a DefaultProfile that includes all the old removed fields, and we'll include sugar such that user.username, user.is_staff, and all that stuff continues to work.

Django will ship with backwards-compatible shims for this default profile. Data access (user.is_staff, etc.) will continue to work, as will support in queries (User.objects.filter(is_staff=True)). This'll get deprecated according to the normal schedule.
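
A plain-Python sketch of what the data-access half of such a shim could look like: legacy attribute reads are forwarded to the default profile and emit a deprecation warning. The classes below are simplified stand-ins, and the query-side shim (filter(is_staff=True)) is not covered.

```python
import warnings


class DefaultProfile:
    """Stand-in for the proposed DefaultProfile holding the legacy fields."""

    def __init__(self, is_staff=False, first_name=""):
        self.is_staff = is_staff
        self.first_name = first_name


class User:
    def __init__(self, identifier, **legacy):
        self.identifier = identifier
        self.defaultprofile = DefaultProfile(**legacy)

    @property
    def username(self):
        warnings.warn("user.username is deprecated; use user.identifier",
                      DeprecationWarning, stacklevel=2)
        return self.identifier

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails, so real User
        # attributes never pay for the shim.
        profile = self.__dict__.get("defaultprofile")
        if profile is not None and hasattr(profile, name):
            warnings.warn(
                f"user.{name} is deprecated; use user.defaultprofile.{name}",
                DeprecationWarning, stacklevel=2)
            return getattr(profile, name)
        raise AttributeError(name)


admin = User("joe", is_staff=True, first_name="Joe")
```

Existing code such as `admin.is_staff` keeps working and warns, while an attribute that no profile defines still raises AttributeError as usual.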

[We might want to come up with a better name than DefaultProfile. If we plan on deprecating the object, maybe LegacyProfile is more appropriate.]

At some point, people may want to remove the default profile; they can do so by using AUTH_PROFILES. Obviously some stuff won't work -- the admin, again -- but if people turn off the default profile they should be prepared to deal with those changes.

Model migration

This one's the big one: there has to be a model migration. I'm not tied to the solution below, but there are a couple of rules this process needs to follow:

  1. This migration cannot block on getting schema migration into core. It'd be great if we could leverage the migration tools, but we can't block on that work.

  2. Until the new auth behavior is switched on, Django 1.5 has to be 100% backwards compatible with 1.4. That is, we need something similar to the USE_TZ setting behavior: until you ask for the new features, you get the old behavior. This decouples upgrading Django from upgrading auth, and makes the whole upgrade process much lower-risk. If we don't do this, we're effectively requiring downtime for a schema migration from all our users, and that's not OK.

Given those rules, here's my plan:

Django 1.5 ships with the ability to run in two "modes": legacy user mode, and new user mode. There's no setting to switch modes: the mode is determined by looking at the database: if auth_user has an identifier field, then we're in new mode; otherwise we're in old.

In old mode, django.contrib.auth.User behaves much as it did before:

  • The auth_user table looks as it did before -- i.e. user.username and friends are real, concrete fields.

  • None of the special Profile handling runs (no auto-joins, etc). Profile objects still work 'cause they're just special cases of models, but no magic identifiers, no validation contribution, etc.

  • user.identifier exists as a proxy to username to ease forward motion, but it's just a property proxy.

The new mode gets all the new behavior, natch.
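
Mode detection by schema inspection could look something like the following, shown against SQLite because it ships with Python. The `auth_mode` helper is hypothetical, and a real implementation would presumably go through Django's database introspection rather than a SQLite-specific PRAGMA.

```python
import sqlite3


def auth_mode(conn):
    """Return "new" if auth_user has an identifier column, else "legacy"."""
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    columns = {row[1] for row in conn.execute("PRAGMA table_info(auth_user)")}
    return "new" if "identifier" in columns else "legacy"


legacy = sqlite3.connect(":memory:")
legacy.execute(
    "CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT, password TEXT)"
)

upgraded = sqlite3.connect(":memory:")
upgraded.execute(
    "CREATE TABLE auth_user (id INTEGER PRIMARY KEY, identifier TEXT, password TEXT)"
)
```

Since the mode can only change when the schema does, the check would only need to run once at startup, which fits with the restart in the upgrade steps.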

How to upgrade

A single command:

./manage.py upgrade_auth

(or whatever). This means we have to ship with a bunch of REALLY WELL TESTED, hand-rolled SQL for all the supported Django backends and versions. That'll be a pain to write, but see rule #1 above. This'll do something along the lines of:

CREATE TABLE auth_defaultprofile (first_name, last_name, ...);
INSERT INTO auth_defaultprofile (first_name, ...) 
    SELECT first_name, ... FROM auth_user;
ALTER TABLE auth_user DROP COLUMN first_name;
...
ALTER TABLE auth_user RENAME username TO identifier;

This means that the upgrade process will look like this:

  1. Upgrade your app to Django 1.5. Deploy. Note that everything behaves as it has in the past.
  2. Run manage.py upgrade_auth.
  3. Restart the server (ew, sorry.)
  4. Now start using all the new profile stuff.

Note that an initial syncdb will create the new models, so new projects get the new stuff without upgrading.
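
For a feel of what the hand-rolled SQL involves on one backend, here is a runnable sketch against an in-memory SQLite database. The column set is cut down to first_name and last_name, the `user_id` column and the table-rebuild strategy are assumptions for this example, and other backends would need their own statements.

```python
import sqlite3


def upgrade_auth(conn):
    """Move legacy columns into auth_defaultprofile and rename username."""
    with conn:  # commit once all steps have succeeded
        conn.execute(
            "CREATE TABLE auth_defaultprofile ("
            "user_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
        )
        conn.execute(
            "INSERT INTO auth_defaultprofile (user_id, first_name, last_name) "
            "SELECT id, first_name, last_name FROM auth_user"
        )
        # Rebuild auth_user rather than dropping/renaming columns in place,
        # which older SQLite versions do not support.
        conn.execute(
            "CREATE TABLE auth_user_new ("
            "id INTEGER PRIMARY KEY, identifier TEXT UNIQUE NOT NULL, "
            "password TEXT NOT NULL DEFAULT '!')"
        )
        conn.execute(
            "INSERT INTO auth_user_new (id, identifier, password) "
            "SELECT id, username, password FROM auth_user"
        )
        conn.execute("DROP TABLE auth_user")
        conn.execute("ALTER TABLE auth_user_new RENAME TO auth_user")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT, "
    "password TEXT, first_name TEXT, last_name TEXT)"
)
conn.execute("INSERT INTO auth_user VALUES (1, 'joe', '!', 'Joe', 'Smith')")
upgrade_auth(conn)
```

After the call, auth_user holds only id, identifier, and password, and the legacy name fields live in auth_defaultprofile keyed by user id.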

Warnings, etc.

Fairly standard, but with a twist:

  • In Django 1.5, if you haven't yet issued an upgrade_auth, you'll get a deprecation warning when Django starts.

  • In Django 1.6, this'll be a louder warning.

  • In Django 1.7, upgrade_auth will still be there, but Django will now refuse to start if the upgrade hasn't run yet.

  • In Django 1.8, upgrade_auth is gone.

FAQ

Where does this idea come from?

It's basically what I do already, and from looking at other people's code it appears to be on its way towards being something of a best practice pattern. That is, I tend to see code like:

class MyProfile(models.Model):
    user = models.OneToOneField(User, related_name='profile')
    ...

... and then access to the profile as user.profile.

This proposal essentially formalizes this pattern, provides for some improved syntactic sugar, and allows for multiple profiles in a fairly pluggable way.

Why not a swappable user model?

I'm convinced that such an idea is ultimately a bad idea: it allows apps to exert action at a distance over other apps. It would allow the idea of a user to completely change without any warning simply by modifying a setting. Django did this in the past -- replaces_module -- and it was a nightmare. I'm strongly against re-introducing it.

However, please do note that this proposal doesn't actually preclude introducing a swappable user in the future. It's possible that the right implementation could change my mind, and so this proposal leaves the option available.

Will user.save() call user.validate() by default?

(This idea was in an earlier draft of this proposal.)

No. Doing this would make the extra contributed validation a bit stronger, but it would ultimately make User behave differently from a "normal" model, and that's probably a bad idea.

Why user.data?

It's not perfect, but it's the best of a bunch of flawed options. Other things we considered:

  • Nothing: simply make users access profile data as user.someprofile.somefield. There's no magic here, but it ultimately falls down since it doesn't allow "duck typing" of profiles. That is, if I'm the author of a reusable app I want to be able to grab an email address from "some profile" without having to know which profile provided that field. If we had no combined data bag, apps would do things like hardcoding user.defaultprofile.email, and that'd fail if projects remove the default profile.

  • Do the above, but provide some mechanism for apps to determine which accessor they'd need to use for some field. That is, there'd be a way to pass into the app the name of the profile, and then apps would use user.<thatprofile>.somefield. This mechanism could be the app object introduced by app-refactor, for example. This is workable, but it feels like a lot of configuration and bookkeeping for what's really a basic thing: getting information from a profile without caring where that information came from.

  • Magic attributes: let user.somefield magically proxy to user.someprofile.somefield. This I deem to simply be too much magic: it blurs the difference between profile data and local data, and leads to expectations that things like User.objects.filter(somefield=...) would work (which it wouldn't without even more magic). This would also seriously muddle what user.save() does.

Ultimately, user.data seems to be the best option. It's clear that user data isn't the same as an attribute, it provides the ability to do other things like call dict() and save(), and it preserves reusability.

Why filter(data__field=foo)?

Mostly for symmetry with user.data. We also considered User.objects.data(foo=bar), but ultimately data__ is the most extensible as it allows for the same syntax for order_by(), etc.

Is this special sugar for OneToOne available for other models?

That's out of scope for the purposes of this proposal. It may very well be the case that this sort of "privileged OneToOne" could be useful for other projects, and it may turn out to be just as much work to create a general API as a specific one. But that's not something that's required for this to work, and can always be a future refactor/improvement.

Gonna just add my thoughts here while I still have them in my head.

I think that the basic problem with AUTH_PROFILE_MODULE and get_profile() is that with them there is no way to have
a "bucket of user data" that comes from multiple sources. An example would be, I want to know a user's email address, I don't
care where this comes from, I just need his email. If we don't have some form of "bucket of collected user data" (via magic attributes,
or something else) then this "I just need a piece of data X, I don't care where from" becomes very hard to do generically.

Currently my favorite proposal is to use a querydict like interface on an attribute such as data, or details, or profiles.

So you would do:

user.data["email"] => "user@foo.com"
user.data.getlist("email") => ["user@foo.com", "user@foowork.com"]
# This last one maybe not, since it could be done by user.work.email, or user.personal.email
user.data.getdict("email") => {"work": "user@foowork.com", "personal": "user@foo.com"} 

This would let you pick and choose on an attribute-by-attribute basis if you care where it comes from. It also solves the issue of namespace
clashes between related_name and "magic attributes".

I feel that if there isn't some concept of a generic bucket of user data that we would be better off just doing the work to rename username to
identifier and making it and password variable length. The reason being that the majority of apps that depend on user data are just going to use
whatever the default profile is (assuming their data is available on the default profile). Then we've moved the goal posts from the issue being
everyone depending on User, to the issue being everyone depends on the default profile.

Without the concept of a generic bucket of data all we have done is move data, made things more complex, but ultimately still have the same
thing as we have now (apps can already use a OneToOneField to User to get user.blah.data). To me the genius of this idea is that it allows
multiple apps to all "contribute" to the in memory concept of a user.

Additionally I think that we are going to need some form of project controlled registry of profile models. The best I can come up with is a
USER_PROFILE_MODELS list that is used to determine the order of resolution. I think that without both a method of controlling the order of resolution
and putting this control in the project maker's hands (as opposed to the app's, or no one's), the "generic bucket of data" use case with multiple
names is a non starter.

Additionally I believe that we should allow multiple values for the same field, consider something like email where you might have a WorkData
profile and a PersonalData profile both with email addresses.

An issue brought up by ptone is that of querysets, which is another reason why I've leaned towards the QueryDict like interface for user data.

User is a Model, it is not unusual to expect that if I have u.email that I can do User.objects.get(email=u.email), moving it to u.data["email"]
makes a conceptual break from model attributes and makes it less likely that a person would assume they can queryset on it.

Quick comments:

  • In general multiple user profiles is a really nice thing. Let's add that in any case.
  • Please don't add a field called identifier which is not unique. That is just evil. I think you could make it unique, but allow nulls (in SQL you can have multiple nulls in a unique field). Or, you could say that it is a requirement that you have a unique identifier for the user.
  • How about nosql camp? They do not have joins available, so multiple user profiles means multiple queries and no possibility for efficient filtering on multiple columns.
  • I am a little worried about backwards compatibility: you can add the properties to the User stub, but you can not .order_by('lastname', 'firstname') for example.

I'm a bit concerned we'll get apps out there each with its own profile, which will get messy, and then you'll have a problem because each profile adds its own validation to the user, and two of them could conflict: make sure there is a '@' in the identifier and make sure there isn't a '@' in the identifier... who wins?

Also, an identifier and password, sure: if you don't need the password you don't use it, and if you don't need the identifier you don't use it. But what if my authentication requires some extra information? Would I end up with a profile for it?
