GSOC 2020: Django Secrets Manager

GSOC Proposal for Django Secrets Manager by Muhammad Ashlah Shinfain
Table of Contents:

- Abstract
- Usage and Implementations
  - Developer (Django's Users)
    - Retrieving secrets
    - Choosing Backend
  - Implementation
    - Module-level secrets variable
    - Secret Backends
  - Extension
  - Additional Features
- Milestones
- Timeline
  - Community Bonding
  - Coding
    - First Phase
    - Second Phase
    - Final Phase
- Q&A
- Conclusion
- About Me

Abstract

settings.py: the source of configs
Django users usually create a settings.py file to define the configuration needed to run their application. This
configuration can be accessed easily through the django.conf.settings module-level variable. Unfortunately, this kind of
setting is hard to maintain once a project has multiple deployments. Each deploy can have its own configuration to
handle, for example, backing service resource handles, external service credentials, and other per-deploy values, as
stated in 12factor. That's why some Django projects (like djangoproject.com and readthedocs.org) define separate
settings for each environment. But we still have to maintain the settings for every deploy, and whenever a deploy's
configuration changes, we also need to change the codebase. There's also a potential security issue when secret
credentials are (accidentally) exposed.
Various external configuration solutions
For more flexibility, we can use sources outside the codebase. The most basic source is environment variables, which
can be easily read through Python's os.environ. It is very simple, but the format is very limited and somewhat hard to
maintain, as there's no single source of configuration (it's mixed with the system's other environment variables).
Another alternative is to use a file and parse it inside the code. There are various formats to choose from: .env
(pretty popular across several programming languages), json (as djangoproject.com uses), yaml, cfg, ini, etc. A more
advanced option is a secret manager (such as Google Secret Manager, Ansible Vault, and HashiCorp Vault), which is
usually accessed remotely over the network, so applications might share some of their settings. All these options
complicate one simple task: getting a config value. Unfortunately, Django doesn't offer a solution for this yet.
What's this project about
Fortunately, Django has taken a first step toward a solution by putting this on its Google Summer of Code 2020 ideas
list. As mentioned on that page, this project aims to "design and add an abstraction interface over secrets managers that
allows users to easily map to an external secret in a settings file". In addition to Django's stated goal, I want to
make this really simple for developers to use, just like using django.conf.settings or os.environ wherever they need
it. Developers should also be able to easily extend the predefined logic for retrieving secrets.

For convenience, in this proposal we call configuration that does not vary between deploys config, and configuration
that varies between deploys secret.

Usage and Implementations

This section explains the basic usage and implementation of the proposed secrets manager, how developers can extend
the predefined logic, and some possible additional features.
Developer (Django's Users)

Retrieving secrets

Since we need to give Django users the best possible API design, as mentioned in one of the Django Forum
topics, I think we can give them something they're already familiar with: django.conf.settings and os.environ. I propose
giving access to secrets through a module-level variable, which acts as a singleton without requiring users to
instantiate anything (a pattern found in many places in Django: django.apps.apps, django.contrib.admin.sites.site, etc.).
Once imported, this module-level variable should be usable as a mapping object, just like os.environ.
Thus, assuming this secrets module-level variable lives in a django.conf.secrets module, developers can do
something like this wherever secrets are needed:
```python
from django.conf.secrets import secrets

secrets['KEY']
```
Choosing Backend

Django users can have multiple sources of secrets, and we need to support this. As a Django component, the secrets
manager can use settings.py to configure its behavior. I propose a SECRET_BACKENDS setting that lets users choose the
secret sources they need. Its simplest form would be a list of strings, each being the dotted Python path to a secret
backend class. However, to allow more flexibility in configuring each backend, I think it's better to define it as a
list of dictionaries, one per backend, just like Django's TEMPLATES and AUTH_PASSWORD_VALIDATORS settings.
```python
# settings.py
from django.conf.secrets import secrets

SECRET_BACKENDS = [
    {
        'BACKEND': 'django.conf.secrets.backends.DotEnvSecretBackend',
        'OPTIONS': {
            'PATH': secrets['DOTENV_PATH'],
        },
    },
    {
        'BACKEND': 'custom_secret_backends.GoogleSecretManager',
        'OPTIONS': {
            'URL': secrets['GOOGLE_SECRETS_URL'],
            'CREDENTIALS': secrets['GOOGLE_SECRETS_CREDENTIALS'],
        },
    },
]
```
Secret backends are evaluated in order, from top to bottom. This is useful when the setup of a later backend depends on
secrets loaded by earlier ones. For example, GoogleSecretManager needs the GOOGLE_SECRETS_URL and
GOOGLE_SECRETS_CREDENTIALS secrets, which can be retrieved from the previous backend. System environment variables can
also be used by default, unless the Django community decides it's better to exclude them by default and provide an
environment variable backend that users can opt into. When multiple backends define the same variable, the later
backend overrides the earlier one. The reasoning is that each Django project uses only one value for each variable in
each deployment.
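The ordering and override rules above can be illustrated without Django. In this sketch, which simplifies the proposed classes rather than reproducing them, each backend is reduced to a plain function that receives the secrets loaded so far; `load_secrets`, `dotenv_backend`, `remote_backend`, and `REMOTE_URL` are hypothetical names.

```python
from typing import Callable, Dict, List, Mapping

# Hypothetical simplification: a backend is a function that receives the
# secrets loaded so far and returns its own secrets.
Backend = Callable[[Mapping[str, str]], Dict[str, str]]


def load_secrets(backends: List[Backend]) -> Dict[str, str]:
    loaded: Dict[str, str] = {}
    for backend in backends:  # evaluated top to bottom
        # Pass a copy so a backend can read, but not mutate, earlier secrets.
        new_secrets = backend(dict(loaded))
        # A later backend overrides keys defined by earlier ones.
        loaded.update(new_secrets)
    return loaded


def dotenv_backend(loaded: Mapping[str, str]) -> Dict[str, str]:
    # Stand-in for a file backend; it needs nothing from earlier backends.
    return {'REMOTE_URL': 'https://secrets.example.com/v1', 'DEBUG': 'true'}


def remote_backend(loaded: Mapping[str, str]) -> Dict[str, str]:
    # Stand-in for a remote backend configured from an earlier secret.
    url = loaded['REMOTE_URL']
    return {'DEBUG': 'false', 'DB_PASSWORD': 'fetched-from ' + url}


all_secrets = load_secrets([dotenv_backend, remote_backend])
```

Here `all_secrets['DEBUG']` ends up as `'false'` because the remote backend comes later, and the remote backend could only be configured because the earlier backend had already loaded `REMOTE_URL`.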
Implementation

Module-level secrets variable

The secrets variable will be the container for all secrets loaded from the configured backends. It should have the
capabilities of a mapping object (similar to os.environ). This means implementing the mapping protocol, such as
__getitem__, __iter__, __len__, etc. We can also add some extra functionality to this container, such as retrieving
secrets from a specific source or reloading secrets.
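As a minimal sketch of the mapping behavior described above, the hypothetical `SecretsContainer` below subclasses `collections.abc.Mapping`: implementing `__getitem__`, `__iter__`, and `__len__` is enough for the mixin methods (`get`, `keys`, `items`, `__contains__`, and so on) to work. It deliberately leaves out loading, laziness, and source tracking. It is also read-only, whereas `os.environ` is mutable; whether secrets should be mutable is an open design question.

```python
from collections.abc import Mapping
from typing import Dict, Iterator


class SecretsContainer(Mapping):
    """Read-only mapping over already-loaded secrets (hypothetical sketch)."""

    def __init__(self, data: Dict[str, str]):
        # Copy so later changes to `data` don't leak into the container.
        self._data = dict(data)

    # The three abstract methods required by collections.abc.Mapping:
    def __getitem__(self, key: str) -> str:
        return self._data[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)


secrets = SecretsContainer({'SECRET_KEY': 's3cr3t', 'DEBUG': 'false'})
```

With only those three methods, `secrets.get('MISSING', 'default')`, `'DEBUG' in secrets`, and `dict(secrets)` all behave as they do for `os.environ`.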
You may have noticed that the previous settings.py example uses the secrets variable to define SECRET_BACKENDS, which
is itself used to configure secrets. This looks like a circular dependency, which can cause serious problems.
Fortunately, Django already tackles this kind of problem with the lazy utilities in the django.utils.functional module.
The snippet below roughly describes how secrets handles a lookup when it's not configured yet:
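```python
from django.conf import settings
from django.utils.functional import empty, SimpleLazyObject
from django.utils.module_loading import import_string


class Secrets:
    _secrets = empty

    def __init__(self):
        self._setup()

    def _setup(self):
        # Load secrets only once, and only after settings.py is fully loaded.
        if settings.configured and not self.configured:
            self._secrets = {}
            for backend_conf in getattr(settings, 'SECRET_BACKENDS', []):
                # Each entry is a dict with 'BACKEND' and optional 'OPTIONS'.
                backend_cls = import_string(backend_conf['BACKEND'])
                # Assumed constructor signature: backends receive OPTIONS.
                backend = backend_cls(options=backend_conf.get('OPTIONS', {}))
                # Later backends override keys loaded by earlier ones.
                self._secrets.update(backend.get_secrets())

    def __getitem__(self, item):
        self._setup()
        if self.configured:
            return self._secrets[item]

        # Settings aren't configured yet (e.g. we're inside settings.py):
        # defer the lookup until the value is actually used.
        def proxy_getitem():
            if not self.configured:
                self._setup()
            return self._secrets[item]

        return SimpleLazyObject(proxy_getitem)

    @property
    def configured(self):
        return self._secrets is not empty


secrets = Secrets()
```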
The secrets variable returns a SimpleLazyObject only when django.conf.settings is not configured yet. Once settings are
configured, any interaction with such a lazy object evaluates it, and from then on it behaves like the underlying
value.
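The deferred lookup can be hard to picture, so here is a Django-free toy version of the idea. `LazyValue` is a hypothetical, heavily simplified stand-in for `SimpleLazyObject` (which proxies far more operations): it calls its factory on first use and caches the result, so a secret can be referenced before the store holding it is populated.

```python
from typing import Any, Callable


class LazyValue:
    """Toy stand-in for SimpleLazyObject: resolve on first use, then cache."""

    _UNSET = object()

    def __init__(self, factory: Callable[[], Any]):
        self._factory = factory
        self._value = self._UNSET

    def _resolve(self) -> Any:
        if self._value is self._UNSET:
            self._value = self._factory()
        return self._value

    # Forward a few operations; SimpleLazyObject proxies many more.
    def __str__(self) -> str:
        return str(self._resolve())

    def __eq__(self, other: Any) -> bool:
        return self._resolve() == other

    def __getattr__(self, name: str) -> Any:
        # Only called for attributes LazyValue itself doesn't define.
        return getattr(self._resolve(), name)


config_store = {}  # empty: "settings" are not configured yet

# Referencing the secret now does not look anything up.
db_host = LazyValue(lambda: config_store['DB_HOST'])

config_store['DB_HOST'] = 'db.internal'  # configuration happens later

print(str(db_host))      # db.internal
print(db_host.upper())   # DB.INTERNAL
```

Had `config_store['DB_HOST']` been read eagerly when `db_host` was created, it would have raised `KeyError`; deferring it is what lets settings.py reference secrets before they are loaded.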
Secret Backends

Backend classes are the key to flexibility in the proposed secrets management. Each concrete backend class defines how
the system retrieves secrets from one kind of source. For now, there will be three base backend classes: a root base,
one for secrets retrieved from the filesystem, and one for secrets retrieved over the internet.


BaseSecretBackend
This backend class will be the root base for all secret backend classes. Even though it will initially have only one
method, providing a base means every backend built on it benefits from future updates to the base. The most important
method of this class (and of all its subclasses) retrieves secrets from the corresponding source; it is the
get_secrets() method called in the previous snippet.


BaseFileSecretBackend
This backend class (and its derivatives) will provide the flexibility to parse various file formats such as .env,
json, yaml, etc. In addition to what BaseSecretBackend defines, it needs two more attributes: the path to the file and
the format parser.


BaseHttpSecretBackend
This kind of backend will be used to retrieve secrets over HTTP. As with any HTTP request, it needs an HTTP method, a
request URI, headers, and an optional payload. The request URI may be a format string, so users can inject parameters
into it. After receiving the HTTP response, we still have to parse it into a Python mapping before we can use it.
These parsers may be shared with the filesystem backends.
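To make the three base classes more concrete, here is a minimal, stdlib-only sketch. The constructor taking an `options` dict, the `parse()` hook, and the `URL_PARAMS`, `HEADERS`, and `PAYLOAD` option names are assumptions for illustration (only `PATH` and `URL` appear in the SECRET_BACKENDS example), and the parsers are deliberately simplistic.

```python
import json
import urllib.request
from abc import ABC, abstractmethod
from typing import Any, Dict, Mapping, Optional


def parse_dotenv(text: str) -> Dict[str, str]:
    """Minimal .env parser: KEY=VALUE lines, '#' comments, optional quotes."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#') or '=' not in line:
            continue
        key, _, value = line.partition('=')
        result[key.strip()] = value.strip().strip('\'"')
    return result


def parse_json(text: str) -> Dict[str, str]:
    """Parse a flat JSON object, turning every value into a string."""
    return {key: str(value) for key, value in json.loads(text).items()}


class BaseSecretBackend(ABC):
    """Root base: stores OPTIONS and defines the get_secrets() contract."""

    def __init__(self, options: Optional[Mapping[str, Any]] = None):
        self.options = dict(options or {})

    @abstractmethod
    def get_secrets(self) -> Dict[str, str]:
        """Return every secret available from this backend's source."""


class BaseFileSecretBackend(BaseSecretBackend):
    """Reads the file at OPTIONS['PATH'] and hands its text to parse()."""

    def parse(self, text: str) -> Dict[str, str]:
        raise NotImplementedError

    def get_secrets(self) -> Dict[str, str]:
        with open(self.options['PATH'], encoding='utf-8') as f:
            return self.parse(f.read())


class DotEnvSecretBackend(BaseFileSecretBackend):
    def parse(self, text: str) -> Dict[str, str]:
        return parse_dotenv(text)


class JsonSecretBackend(BaseFileSecretBackend):
    def parse(self, text: str) -> Dict[str, str]:
        return parse_json(text)


class BaseHttpSecretBackend(BaseSecretBackend):
    """Builds a request from OPTIONS and parses the (JSON) response body."""

    method = 'GET'

    def parse(self, text: str) -> Dict[str, str]:
        return parse_json(text)  # parser shared with the file backends

    def build_request(self) -> urllib.request.Request:
        # URL is a format string; URL_PARAMS fills its {placeholders}.
        url = self.options['URL'].format(**self.options.get('URL_PARAMS', {}))
        payload = self.options.get('PAYLOAD')
        data = json.dumps(payload).encode('utf-8') if payload is not None else None
        return urllib.request.Request(
            url, data=data, headers=self.options.get('HEADERS', {}), method=self.method
        )

    def get_secrets(self) -> Dict[str, str]:
        with urllib.request.urlopen(self.build_request(), timeout=10) as resp:
            return self.parse(resp.read().decode('utf-8'))
```

A concrete remote backend (for example, for HashiCorp Vault) would then mostly override `build_request()` and `parse()` to match that service's request and response format.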


For convenience, I would like to propose two things for all secret backends:

- Each secret backend should provide sensible default values whenever possible, so users can use the backend with
  minimal setup. The default values can be discussed with the Django community.
- Each secret backend should be able to retrieve its required parameters from environment variables (or previously
  loaded secrets) without users having to state them explicitly in SECRET_BACKENDS (via the 'OPTIONS' key shown in the
  SECRET_BACKENDS example above). If a parameter is defined in SECRET_BACKENDS, the backend uses that value instead of
  the one defined in the environment.


Both items above should be well documented, so users can easily refer to the docs when using the secrets.
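A sketch of the parameter lookup order these two proposals imply, assuming an explicit 'OPTIONS' value wins, then a previously loaded secret, then the process environment, and finally the backend's default. `resolve_option` and the `DOTENV_PATH` variable name are hypothetical; the actual naming is one of the topics planned for community bonding.

```python
import os
from typing import Any, Mapping, Optional


def resolve_option(
    name: str,
    options: Mapping[str, Any],
    loaded: Mapping[str, str],
    env_var: str,
    default: Optional[Any] = None,
) -> Any:
    """Pick a backend parameter in (assumed) priority order:

    1. an explicit value under 'OPTIONS' in SECRET_BACKENDS,
    2. a previously loaded secret named `env_var`,
    3. the process environment variable `env_var`,
    4. the backend's default value.
    """
    if name in options:
        return options[name]
    if env_var in loaded:
        return loaded[env_var]
    return os.environ.get(env_var, default)


# e.g. a .env backend with no OPTIONS falls back to DOTENV_PATH, then '.env'
dotenv_path = resolve_option('PATH', {}, {}, 'DOTENV_PATH', default='.env')
```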


Extension

When developers need to customize the predefined backends, they can subclass the most appropriate backend class and
override the functionality they need. The custom backend can then be used by adding its dotted Python path to the
SECRET_BACKENDS setting.
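For example, a developer who keeps secrets in an INI file might write a backend like the one below. The minimal `BaseFileSecretBackend` here is only a stand-in so the snippet runs on its own; its `options` constructor argument and `parse()` hook are assumptions, not a settled API. `IniSecretBackend` and its `SECTION` option are hypothetical.

```python
import configparser
from typing import Any, Dict, Mapping, Optional


class BaseFileSecretBackend:
    """Stand-in for the proposed base class (hypothetical API)."""

    def __init__(self, options: Optional[Mapping[str, Any]] = None):
        self.options = dict(options or {})

    def parse(self, text: str) -> Dict[str, str]:
        raise NotImplementedError

    def get_secrets(self) -> Dict[str, str]:
        with open(self.options['PATH'], encoding='utf-8') as f:
            return self.parse(f.read())


class IniSecretBackend(BaseFileSecretBackend):
    """Custom backend: only overrides parse() to read one INI section."""

    def parse(self, text: str) -> Dict[str, str]:
        # interpolation=None: return values containing '%' verbatim.
        parser = configparser.ConfigParser(interpolation=None)
        parser.optionxform = str  # keep key case (default lower-cases keys)
        parser.read_string(text)
        section = self.options.get('SECTION', 'secrets')
        return dict(parser[section])
```

It would then be registered with an entry such as `{'BACKEND': 'myproject.secret_backends.IniSecretBackend', 'OPTIONS': {'PATH': 'secrets.ini', 'SECTION': 'app'}}` in SECRET_BACKENDS, where the dotted path is likewise hypothetical.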
Additional Features

Below are some possible additional features that could be built on top of the proposed secrets management.
Per-source secret management
Sometimes developers might need to use secrets from a specific source. We can support this by giving the secrets
variable a method for retrieving secrets from a specific source. We may also provide a method for checking which source
a key comes from. This would also be helpful for debugging which variables come from which source.
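One way this could work is to keep each source's secrets next to the merged view and record which source last set each key. `SourceTrackingSecrets`, `from_source()`, and `source_of()` are hypothetical names used only for illustration.

```python
from typing import Dict, List, Tuple


class SourceTrackingSecrets:
    """Merge secrets while remembering which source each key came from."""

    def __init__(self, sources: List[Tuple[str, Dict[str, str]]]):
        self._by_source: Dict[str, Dict[str, str]] = {}
        self._merged: Dict[str, str] = {}
        self._origin: Dict[str, str] = {}
        for name, values in sources:  # earlier sources first
            self._by_source[name] = dict(values)
            for key, value in values.items():
                self._merged[key] = value  # later sources override...
                self._origin[key] = name   # ...and become the recorded origin

    def __getitem__(self, key: str) -> str:
        return self._merged[key]

    def from_source(self, name: str) -> Dict[str, str]:
        """All secrets a single source provided, before overriding."""
        return dict(self._by_source[name])

    def source_of(self, key: str) -> str:
        """Name of the source whose value is currently in effect."""
        return self._origin[key]
```

Keeping the per-source dictionaries means an overridden value is still reachable through `from_source()`, which is exactly what makes debugging conflicts between sources easier.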
Subset of variables from each source
Using 'OPTIONS' in SECRET_BACKENDS, we can specify which variables we expect to retrieve from each source and ignore
unnecessary secrets.
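A possible shape for this filter, assuming a hypothetical 'KEYS' entry in 'OPTIONS' that lists the expected variable names (`select_subset` is likewise a made-up helper):

```python
from typing import Dict, Iterable, Optional


def select_subset(secrets: Dict[str, str], keys: Optional[Iterable[str]]) -> Dict[str, str]:
    """Keep only the expected keys; `keys=None` means keep everything."""
    if keys is None:
        return dict(secrets)
    wanted = set(keys)
    return {key: value for key, value in secrets.items() if key in wanted}


# e.g. OPTIONS = {'PATH': '.env', 'KEYS': ['SECRET_KEY']}
options = {'PATH': '.env', 'KEYS': ['SECRET_KEY']}
subset = select_subset({'SECRET_KEY': 'abc', 'UNRELATED': 'x'}, options.get('KEYS'))
```

This version silently skips expected keys that are missing from the source; raising an error for them instead is an equally reasonable design that the community could weigh in on.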
Generate source on startproject
We can create a Django management command that creates a secrets source file based on the first secret backend
configuration in SECRET_BACKENDS. Its content might be values that shouldn't explicitly show up in settings.py, such as
SECRET_KEY. This would also help introduce and promote this new feature to developers.
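As a rough sketch of what such a command might write, the helpers below render a .env file containing a freshly generated SECRET_KEY. They use Python's standard `secrets` module (aliased to avoid clashing with the proposed `secrets` variable); a real command might instead reuse Django's `django.core.management.utils.get_random_secret_key()`, which startproject already uses. `render_dotenv` and `initial_dotenv` are hypothetical names.

```python
import secrets as py_secrets  # stdlib module, aliased to avoid a name clash
from typing import Dict


def render_dotenv(values: Dict[str, str]) -> str:
    """Render KEY=VALUE lines for a .env file."""
    return ''.join(f'{key}={value}\n' for key, value in values.items())


def initial_dotenv() -> str:
    # Values that should not be hard-coded in settings.py.
    return render_dotenv({
        # token_urlsafe only emits [A-Za-z0-9_-], so no quoting is needed.
        'SECRET_KEY': py_secrets.token_urlsafe(50),
    })
```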
Runtime reload
We might want the secrets variable to reload from its sources at runtime. However, we should first check whether
reloading without a restart works well with all Django components.
Auto decode base64
Some secrets might contain binary data that can't be represented as plain text. That's where base64 comes to the
rescue. This idea was inspired by how Kubernetes requires the values in a Secret's data field to be base64 encoded.
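A minimal sketch, assuming a hypothetical `base64:` prefix marks encoded values so that plain and encoded secrets can live in the same source (Kubernetes, by contrast, expects every value in a Secret's `data` field to be encoded). Decoded values are returned as `bytes`, since they may be binary.

```python
import base64
from typing import Dict, Union

PREFIX = 'base64:'  # hypothetical marker for encoded values


def decode_base64_values(secrets: Dict[str, str]) -> Dict[str, Union[str, bytes]]:
    decoded: Dict[str, Union[str, bytes]] = {}
    for key, value in secrets.items():
        if value.startswith(PREFIX):
            # validate=True rejects non-base64 characters instead of skipping them.
            decoded[key] = base64.b64decode(value[len(PREFIX):], validate=True)
        else:
            decoded[key] = value
    return decoded
```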
Milestones

The list below roughly describes what I plan to work on during the GSOC coding period. It may change as needed to
conform to what the community wants. I've ordered the list by importance and noted the difficulty level of each task.


- Module-level secrets variable
  - Loading from backends (relatively easy)
  - Overriding the same variables based on SECRET_BACKENDS (relatively easy)
  - Lazy resolution of secrets when django.conf.settings is not configured yet (relatively hard)
  - Gradual resolution of secrets, using previously loaded secrets to load the next ones (relatively hard)
  - Per-source secrets management (intermediate)
  - Retrieving only a subset of secrets from each source (intermediate)
- Secret backends
  - BaseSecretBackend (relatively easy)
  - Filesystem secrets
    - BaseFileSecretBackend (relatively easy)
    - DotEnvSecretBackend (intermediate)
    - JsonSecretBackend (intermediate)
    - Other filesystem backends
  - Remote secrets
    - BaseHttpSecretBackend (intermediate)
    - AnsibleVaultSecretBackend (relatively hard)
    - HashicorpVaultSecretBackend (relatively hard)
- Miscellaneous
  - Generate .env (or other preferred formats) and SECRET_BACKENDS settings on startproject (intermediate)


I picked DotEnvSecretBackend and JsonSecretBackend for filesystem secrets, and either AnsibleVaultSecretBackend
or HashicorpVaultSecretBackend for remote secrets, because I think those are the most popular sources of secrets in
their categories. If time allows, I will implement other secret backends as well.
Timeline

This timeline was designed based on the Google Summer of Code 2020 Program Rules.
Community Bonding

May 5, 2020 - June 2, 2020
I will use this time to consult the Django community and adjust my plans on the topics below (among others):

- Where to place the secrets module
- Secret backends' default values
- Naming of the environment variables for secret backends' required parameters
- Implementation priority (secret backends, additional features, etc.)


Coding

June 2, 2020 - August 25, 2020
First Phase

The first phase will be about the basic implementation of secrets and some of the secret backends.
Week 1-2: June 2, 2020 - June 14, 2020

Due to the COVID-19 outbreak, my university's academic calendar was pushed back by two weeks, and my final exams will
take place on June 2 - June 10. It will be hard for me to work during this period. However, I will make sure that by
the end of June 10, the work on BaseSecretBackend, BaseFileSecretBackend, and DotEnvSecretBackend is done. The rest
of the period will be used for implementing JsonSecretBackend and the tests. These two concrete backends
(DotEnvSecretBackend and JsonSecretBackend) will be used for simulating secret retrieval by the module-level secrets
variable.
Week 3: June 15, 2020 - June 21, 2020

This week will be used for implementing the basic functionality of secrets: loading secrets from backend classes,
overriding shared variables based on backend order in SECRET_BACKENDS, and selecting a subset of secrets from each
source. Each piece will include tests to make sure it works as expected.
Week 4: June 22, 2020 - June 28, 2020

The first part of this week will be used for implementing (including tests) the remaining basic functionality, which
is per-source secret management. The latter part of the week will be used for evaluation and documentation.
Second Phase

The goal of the second phase is to make the secrets variable usable anywhere, at any time, using Django's lazy
utilities.
Week 5 and Half of Week 6: June 29, 2020 - July 8, 2020

During this period, I will implement the functionality that allows secrets to be used inside settings.py without
causing a circular import. Since I've already worked on part of this (see the Secrets snippet in the Implementation
section), I can take the time to make sure the implementation works as well as possible, as laziness is still a fairly
tricky concept for me.
Half of Week 6 and Week 7: July 9, 2020 - July 19, 2020

During this period, I will implement the functionality that lets secrets use previously loaded secrets to configure the
next secret backend. This will require more advanced use of laziness.
Week 8: July 20, 2020 - July 26, 2020

I will use this week flexibly depending on the situation. If problems came up in the previous weeks, I'll make sure to
finish those tasks this week. If time allows, I can also implement generating a .env file on startproject, or simply
start the next task early. Whatever the situation, I will set aside time for the evaluation and documentation of this
phase.
Final Phase

In the final phase, I will work on a backend for an HTTP API based secrets manager. This requires some exploration
first, as I have limited experience using this kind of secrets manager.
Week 9: July 27, 2020 - August 2, 2020

This week will be used for exploring HTTP API based secret managers. The main information I will look for is the
request schema for retrieving secrets from the secret manager, the response format of the secrets, and, if there's a
CLI tool, whether it reads credentials from environment variables. While exploring, I will start implementing
BaseHttpSecretBackend. I will also set up one of the secret managers to use when implementing its secret backend.
Week 10-11: August 3, 2020 - August 16, 2020

These two weeks will be used for implementing the secret backend for one of the previously explored HTTP API based
secret managers. The most likely candidate is either HashicorpVaultSecretBackend or AnsibleVaultSecretBackend. Since I
have limited experience using HTTP API based secret managers, I estimate this work will take two weeks.
Half of Week 12: August 17, 2020 - August 19, 2020

During this period, I will write the documentation for this phase.
End of Final Phase: August 20, 2020 - August 25, 2020

During this period, I aim to iron out any remaining issues and do the final evaluation of all previous
implementations. Once these issues are solved, the work will be ready for Django's core developers to review and merge
into the master branch.
Q&A

Would this replace the use of os.environ?
It's not meant to replace os.environ. Developers who only need environment variables can still use os.environ instead
of the secrets variable. The secrets variable does include the os.environ variables by default, but they are mixed
with variables from the other configured secret sources, so os.environ remains the simpler choice for reading
environment variables only.
Can we remove the need for multiple settings.py?
The multiple settings.py approach has its own purpose; it even has its own wiki page. Some logic that lives inside
Python settings.py files can't be represented by secrets. But with secrets, developers can tune their settings.py
files more conveniently.
Conclusion

Because the module-level secrets variable is lazy, developers can use it anywhere, at any time, just like
django.conf.settings and os.environ. Developers can also easily tweak how secrets are retrieved through the
SECRET_BACKENDS setting, and this approach keeps future development of the secrets manager straightforward. As a bonus,
the change is purely additive, so it won't break existing projects when they upgrade Django.
About me

Hi! My name is Muhammad Ashlah Shinfain, and my friends call me Ashlah. I'm currently in my final (4th) year as a
Computer Science student at the University of Indonesia. I live in Depok, a city next to Jakarta, Indonesia (UTC+7).
I've known Python for four years and have been using Django for three. I've done several projects with Django during
these three years. The most notable is leading an organization's dev team for one year, which is when I learned a lot
about Django. The team was responsible for developing and maintaining the organization's systems, such as a recruitment
system, a publication request system, and a book lending system. Another project I'm proud of is an API service for a
health tracker mobile app that I developed single-handedly. This is when I got a feel for how to code properly: because
I was the only one managing the API service's development, I could easily maintain the best practices used in the
project.
Recently, when I code, I often look into the source code of the packages I use to learn the patterns and best practices
they follow. You could say I learn to code by example. My favorite packages, which usually become my code style
references, are django-cas-ng, django-allauth, django-rest-framework, and of course Django itself. For Django projects,
I often refer to djangoproject.com and readthedocs.org, which I found to be two of the most popular Django projects on
djangopackages.org.
Currently, my contributions to the Django codebase are these two PRs: django/django#12596 and
django/django#12591. Although they've been reviewed, they haven't been merged yet. Even after finishing this
GSOC project, I think Django will remain my favorite open source project to contribute to.
If there's anything to discuss, I can be reached by email at muh.ashlah@gmail.com. To learn more about me, see my
GitHub profile and my StackOverflow profile. I also share my developer journey on Twitter.