@simonw
Last active February 13, 2025 13:39

Writing internal documentation with o3-mini

I generated this document by running the following in the root of https://github.com/simonw/datasette

files-to-prompt datasette -e py -c | \
  llm -m o3-mini -s 'write extensive documentation for how the permissions system works, as markdown'

The -e py option gathers all of the *.py files and ignores other files such as *.md or *.html.

The datasette/ folder this targets does not contain the tests - those live in tests/.

The -c option means "Claude-style XML".

Here's the output of that files-to-prompt command: https://gist.github.com/simonw/1922544763b08c76f0b904e2ece364ea

llm logs -u shows that it used 99,348 input tokens, 3,118 output tokens. That's a cost of 12.3 cents.

(o3-mini has an input limit of 200,000 tokens, so we only used half of that.)

Here's what I got:


Datasette Permissions System Documentation

This document describes in detail how Datasette’s permissions system works, how permissions are registered and checked, and how customizations (both via configuration and plugins) are applied. The permissions system is central to Datasette’s security, controlling who may view databases, tables, execute SQL queries, modify data, and perform other actions.


Overview

Datasette uses a flexible and extensible permissions system based on Pluggy hooks. The system is built around two core ideas:

  1. Permission Objects: These objects encapsulate attributes related to each permission (such as its name, abbreviation, description, and whether it applies to an instance, database, or resource).
  2. Permission Checks via Hooks: When a permission needs to be verified, Datasette calls a chain of hook implementations (using the permission_allowed hook) provided by the defaults as well as any custom plugins. Each hook returns an opinion (typically True, False, or None to indicate no opinion).

The final decision is determined by applying a veto rule—if any hook returns a denial (False), the permission is denied; if one or more return approval and no denial is present, then the permission is granted; if no plugin provides any direct opinion, Datasette falls back on the default value defined on the registered permission.
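
The combination rule described above can be sketched in a few lines of Python. This is a minimal illustration under the stated rule, not Datasette's actual code; combine_opinions is a hypothetical name chosen for this example.

```python
from typing import Iterable, Optional


def combine_opinions(opinions: Iterable[Optional[bool]], default: bool) -> bool:
    """Combine hook opinions using the veto rule (hypothetical helper)."""
    opinions = list(opinions)
    # Any explicit denial wins outright.
    if any(opinion is False for opinion in opinions):
        return False
    # With no denial present, a single approval is enough.
    if any(opinion is True for opinion in opinions):
        return True
    # Nobody had an opinion: fall back to the permission's default.
    return default
```

For example, combine_opinions([None, True, False], default=True) is denied because of the single False, while combine_opinions([None, None], default=True) falls through to the default.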


Registering Permissions

Default Registration

Datasette’s built-in permissions are registered via the register_permissions hook. In particular, the module datasette/default_permissions.py returns a tuple of pre-defined Permission objects. An example of one such object is:

Permission(
    name="view-instance",
    abbr="vi",
    description="View Datasette instance",
    takes_database=False,
    takes_resource=False,
    default=True,
)

Each registered permission has several attributes:

  • name:
    A unique string identifier for the permission. For example, "view-instance", "insert-row", "drop-table", etc.

  • abbr:
    A shorter abbreviation for the permission (e.g. "vi" for "view-instance"). This abbreviation may be used in configuration or token restrictions.

  • description:
    A human-readable string describing what the permission controls.

  • takes_database & takes_resource:
    Boolean flags indicating whether the permission is context-sensitive.

    • If takes_database is True, then the permission applies in the scope of a database.
    • If takes_resource is True, it applies even more specifically (for example, to a table or row).
  • default:
    The default state of the permission (usually True or False). This value is used if no overriding configuration or plugin opinion is provided.

  • implies_can_view (internal):
    An internal flag used to mark permissions that imply the ability to view a database, table, or instance. This flag is used by default permission logic when evaluating view-related requests.

Extending Permissions via Plugins

Plugins can register their own permissions using the same register_permissions hook. When the Datasette startup process calls all implementations registered on that hook, it collects all Permission objects provided by core and external plugins. The datasette instance then maintains an internal mapping of permission names (and abbreviations) to these Permission objects.

A duplicate permission name or abbreviation (with different Permission objects) will cause Datasette to signal a startup error. This ensures that every permission is uniquely defined.
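
The uniqueness check can be illustrated with a small self-contained sketch. PermissionStub and build_registry are hypothetical names, and the stub carries only the two fields needed here; the real startup logic may differ in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PermissionStub:
    # Simplified stand-in for a Permission object (assumption: only name and abbr matter here)
    name: str
    abbr: str


def build_registry(permissions):
    """Map names and abbreviations to permission objects, rejecting clashes."""
    by_name, by_abbr = {}, {}
    for p in permissions:
        # Registering an identical object twice is harmless; a different one is an error.
        if by_name.get(p.name, p) != p:
            raise ValueError(f"Duplicate permission name: {p.name}")
        if by_abbr.get(p.abbr, p) != p:
            raise ValueError(f"Duplicate permission abbreviation: {p.abbr}")
        by_name[p.name] = p
        by_abbr[p.abbr] = p
    return by_name, by_abbr
```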


Checking Permissions

When Datasette needs to verify if an actor (a user or other requester) has a given permission, it calls the hook permission_allowed. This hook can be implemented by both the default implementation and any additional plugins.

Flow of a Permission Check

  1. Default Opinion:
    The default implementation (see datasette/default_permissions.py in the function permission_allowed_default) contains logic that:

    • Grants unrestricted access to privileged users (e.g. an actor with ID "root") for several sensitive operations.
    • For view-related permissions (like "view-instance", "view-database", "view-table") it optionally defers to custom configuration blocks.
    • For actions like executing SQL, additional fallback rules are applied. For example, if the Datasette configuration disallows arbitrary SQL execution (the "default_allow_sql" setting is off), the permission check returns False.
  2. Custom “Allow” Blocks in Metadata/Configuration:
    Datasette allows administrators to further refine permission decisions by configuring allow blocks in the metadata (typically in datasette.json or YAML configuration). These blocks can be applied at different levels:

    • Global / Instance Level: An "allow" block provided at the top level of configuration governs "view-instance" for example.
    • Database Level: Under a specific database’s configuration (databases.<db_name>.allow), you can restrict or permit viewing the database.
    • Table or Resource Level: Within the configuration for a table (databases.<db_name>.tables.<table_name>.allow) or a canned query (databases.<db_name>.queries.<query_name>.allow), you can control access on a more granular level.

    When a user performs an action, Datasette examines the configuration hierarchy from the top level down to the database and then to the table or query level. The helper functions _resolve_config_view_permissions and _resolve_config_permissions_blocks (in the default permissions module) use actor_matches_allow to compare the actor’s attributes with the allow rules defined in the configuration.

  3. Actor Restrictions from Token Authentication:
    In addition to configuration-based rules, Datasette supports issuing signed API tokens. Tokens may carry restrictions in a structured form under the _r key. The hook permission_allowed_actor_restrictions inspects the token’s embedded restrictions.

    • If an actor’s token has an _r dict defining specific allowed actions for all databases, a particular database, or even individual table resources, then during a permission check Datasette calls the helper restrictions_allow_action to decide if the requested action is permitted.
    • Notice that these token-based restrictions are restrictive—if the actor’s token prohibits a particular permission, the check returns False.
  4. Gathering Opinions:
    Datasette consults every registered plugin for its input on a permission check. The results (opinions) are collected as a list. The logic then applies the following rules:

    • Veto Rule: If at least one opinion is False, the final result is denied.
    • Grant Rule: Otherwise, if at least one opinion is True, then the permission is granted.
    • Default Fallback: If no plugin provides any opinion (i.e. all returned None), then the system falls back on the default provided by the corresponding Permission object (the default flag).
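
The token restriction check from step 3 might look roughly like the sketch below. The key names "a", "d" and "r" for the instance, database and resource levels are an assumption about how the _r dict is laid out, and restrictions_allow is a hypothetical simplification that ignores permission abbreviations.

```python
def restrictions_allow(restrictions, action, database=None, resource=None):
    """Return True if the restrictions dict permits the action (simplified sketch)."""
    # Allowed everywhere
    if action in restrictions.get("a", []):
        return True
    # Allowed for this database
    if database is not None and action in restrictions.get("d", {}).get(database, []):
        return True
    # Allowed for this specific resource (table or query) within the database
    if database is not None and resource is not None:
        allowed = restrictions.get("r", {}).get(database, {}).get(resource, [])
        if action in allowed:
            return True
    # Anything not explicitly listed is refused
    return False
```

A False result here would become a False opinion in step 4, which vetoes the check regardless of what other hooks say.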

Checking Permissions in Application Code

Inside Datasette’s request handling code (for example, in datasette/app.py and various view modules), permission checks are performed like this:

allowed = await datasette.permission_allowed(actor=request.actor, action="view-database", resource=database)
if not allowed:
    raise Forbidden("You do not have permission to view this database")

Additionally, helper functions such as ensure_permissions() may be used when multiple permission checks must be satisfied; this function iterates over each required permission and raises a Forbidden error if any check fails.
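
As a rough illustration of the "all checks must pass" behaviour just described, here is a self-contained sketch. ensure_all, demo_check and the local Forbidden class are hypothetical stand-ins; the real ensure_permissions() may behave differently in detail.

```python
import asyncio


class Forbidden(Exception):
    """Local stand-in for Datasette's Forbidden exception."""


async def ensure_all(check, actor, required):
    """Raise Forbidden unless every (action, resource) pair is allowed.

    `check` is any async callable taking (actor, action, resource).
    """
    for action, resource in required:
        if not await check(actor, action, resource):
            raise Forbidden(f"Permission denied: {action}")


async def demo_check(actor, action, resource):
    # Toy rule for the demo: only view-* actions are allowed
    return action.startswith("view-")


# Passes silently because both actions satisfy the toy rule
asyncio.run(ensure_all(demo_check, {"id": "alice"}, [
    ("view-database", "docs"),
    ("view-table", ("docs", "pages")),
]))
```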


Developer Hooks for Permissions

register_permissions

Plugin authors can implement the register_permissions hook to contribute new permission objects. For example:

from datasette import hookimpl, Permission

@hookimpl
def register_permissions(datasette):
    return [
        Permission(
            name="custom-action",
            abbr="ca",
            description="Allow custom action",
            takes_database=True,
            takes_resource=True,
            default=False,
        )
    ]

Each permission object contributed here will be merged with core permissions, and subsequently used in permission checks.

permission_allowed

Plugins may also implement the permission_allowed hook to control the logic that determines whether an actor has a particular permission. The default permissions system is implemented as a hook (with tryfirst=True) to ensure that core behavior is applied before any other opinions. Custom implementations can be provided to override or extend default behavior.

For example, a plugin might include:

from datasette import hookimpl

@hookimpl
def permission_allowed(datasette, actor, action, resource):
    # Custom logic: deny a specific action for all actors except one.
    if action == "delete-row" and (actor is None or actor.get("id") != "admin"):
        return False
    # Otherwise, don't provide an opinion.
    return None

permission_allowed_actor_restrictions

This hook (implemented in the core default permissions module) examines an actor’s signed token for restrictions. It ensures that if a token has a restrictions block (under the _r key), then every permission check considers those restrictions, and if the restrictions do not explicitly allow an action the check returns False.


Authentication and Tokens

Datasette’s permissions system works in tandem with its actor authentication mechanism. The actor is typically stored in a signed cookie (or provided via an Authorization header when using an API token). The token creation code (see datasette/app.py and the CLI command in datasette/cli.py) uses the create_token() method of the Datasette class.

When creating a token, you may pass in parameters to restrict the token’s use:

  • restrict_all:
    A list of actions that are allowed (or restricted) globally.

  • restrict_database:
    A dictionary mapping database names to lists of actions.

  • restrict_resource:
    A dictionary mapping database names to dictionaries that map resource names (tables, queries) to lists of actions.

When a token is used to authenticate a request, the hook actor_from_request in datasette/default_permissions.py (or in related actor authentication modules) unsigns the token and extracts the actor information. Later, during permission checks, the permission_allowed_actor_restrictions hook compares these restrictions with the permissions being requested.
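
To make the shape of these restrictions concrete, the sketch below folds the three arguments into an actor dictionary with an _r key. The function name restricted_actor and the "a", "d" and "r" key names are assumptions made for illustration; this is not the body of create_token().

```python
def restricted_actor(actor_id, restrict_all=None, restrict_database=None, restrict_resource=None):
    """Build an actor dict carrying restrictions under "_r" (hypothetical helper)."""
    restrictions = {}
    if restrict_all:
        # Actions permitted across the whole instance
        restrictions["a"] = list(restrict_all)
    if restrict_database:
        # Database name -> permitted actions
        restrictions["d"] = {db: list(actions) for db, actions in restrict_database.items()}
    if restrict_resource:
        # Database name -> resource name -> permitted actions
        restrictions["r"] = {
            db: {name: list(actions) for name, actions in resources.items()}
            for db, resources in restrict_resource.items()
        }
    actor = {"id": actor_id}
    if restrictions:
        actor["_r"] = restrictions
    return actor
```

A token built with no restriction arguments yields a plain actor, so it is limited only by the regular permission checks.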


Configuration-Based Customization

Datasette allows administrators to customize permission behavior via metadata configuration. In the configuration file (commonly datasette.json or YAML), you can specify:

  • Top-Level Allow Blocks:
    Use an "allow" key at the top level to control who can access the instance (e.g. "allow": {"id": "*"} to admit any actor that has an id).

  • Database-Level Permissions:
    Under databases.<db_name>, you can specify "allow" (or "allow_sql") to restrict queries or viewing on a per-database basis.

  • Table-Level Permissions:
    Under databases.<db_name>.tables.<table>, you can provide an "allow" block that limits actions on that table.
    For example:

    {
      "databases": {
        "sample.db": {
          "tables": {
            "secrets": {
              "allow": false
            }
          }
        }
      }
    }

Such configuration blocks override the default permission values when the system uses the helper function actor_matches_allow() during permission checks.
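
The matching step can be sketched as follows, assuming an allow block maps actor attribute names to accepted values (with "*" meaning any value, and true/false allowing or denying everyone). matches_allow is a hypothetical, simplified stand-in for actor_matches_allow() and may not cover every case the real helper handles.

```python
def matches_allow(actor, allow):
    """Return True if the actor satisfies the allow block (simplified sketch)."""
    # No allow block means no restriction; booleans allow or deny everyone
    if allow is None or allow is True:
        return True
    if allow is False:
        return False
    # In this sketch an anonymous request never matches a dict-style block
    if actor is None:
        return False
    for key, accepted in allow.items():
        if key not in actor:
            continue
        # "*" accepts any value for this key
        if accepted == "*":
            return True
        # Normalise both sides to lists so single values and lists compare uniformly
        accepted = accepted if isinstance(accepted, list) else [accepted]
        actual = actor[key] if isinstance(actor[key], list) else [actor[key]]
        if any(value in accepted for value in actual):
            return True
    return False
```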

Additionally, there is a migration mechanism in Datasette (see datasette/utils/internal_db.py and datasette/utils/resolve_env_secrets) that can move certain configuration keys (for example, plugin configurations or table configuration options) from metadata to a persistent internal database. This allows administrators to dynamically change permissions without restarting Datasette.


Debugging and Administration

Datasette includes several debug views which give administrators insight into the permissions system. For example:

  • Permissions Debug View (-/permissions):
    Displays a history of recent permission checks including:

    • The actor
    • Action being checked
    • Resource (if applicable)
    • Whether the default permission value was used
    • The final result (allowed/denied)
  • Allow Debug View (-/allow-debug):
    Offers a playground where you can input an actor’s JSON data and a custom "allow" block to see whether the actor would be permitted a given action.

Reading these debug pages can help administrators and plugin authors understand how permissions decisions are being derived in practice.


Summary

In summary, Datasette’s permissions system:

  • Is built around Permission objects which describe what each permission means.
  • Uses a chain-of-responsibility pattern via Pluggy hooks (such as permission_allowed and permission_allowed_actor_restrictions) to gather opinions on a permission check.
  • Respects configuration-based allow blocks supplied by administrators at instance, database, and resource levels.
  • Supports token-based authentication where tokens can impose additional restrictions.
  • Defaults to the preconfigured value if no explicit plugin opinion is provided.

This flexible design allows Datasette to be secured out-of-the-box while also being highly customizable by administrators and plugin authors.


For more details, refer to the Datasette source files (such as datasette/default_permissions.py and datasette/permissions.py) where the implementation of permission registration and checks is provided.

@simonw
Author

simonw commented Feb 5, 2025

I created this using:

files-to-prompt datasette -e py -c | llm -m o3-mini -s 'write extensive documentation for how the permissions system works, as markdown'

@simonw
Author

simonw commented Feb 5, 2025

I said:

llm -c 'Suggest improvements to the design of this. I am thinking about if defaults are a good idea or if I should default to False for everything instead.'

Below is a set of suggestions and considerations for improving the design of the permissions system—with a focus on whether the default outcome should be approving (True) or denying (False) permissions.


1. Rethinking Default Behaviors

a. Following the Principle of Least Privilege

By default, it is generally safer to assume “deny” unless there is an explicit permission granted. Currently, many of the core permissions (such as "view-instance", "view-database", "view-table", or even "execute-sql") default to True if no explicit opinion is provided. However, with least privilege in mind, you may want to require that every action be explicitly allowed. This would reduce the risk of accidentally exposing a database or table to unauthorized users.

Suggestion:
• Consider having a configuration option (or a build-time flag) that switches the overall default policy. With this switch, one could choose either “deny by default” (permission defaults to False) or “allow by default” in a trusted environment.
• If moving toward a more secure posture by default (denying by default), then ensure that the built-in permissions are audited so that admins are forced to opt into allowing each action—perhaps by editing a “permissions.json” file or through an administrative UI.

b. Explicit Is Better Than Implicit

Because defaulting to True may lead to surprise privileges being granted when no configuration is provided, a more explicit approach could be more maintainable. For example, instead of relying on the “default” attribute on each Permission object, you might require that every permission check has an explicit rule from one of the following sources:

• A permission is granted only if a configuration block (or actor’s token) explicitly allows it.
• The absence of any explicit allow or deny leads to a secure default (False).

If you decide to default everything to False, many of the legacy views (or plugin-provided actions) will need to be updated or “opted in” to. This may be disruptive unless a migration strategy is provided.


2. Policy Configurability

a. Administrator-Provided Override

Consider adding a top-level setting (e.g. default_permissions_policy) that globally controls the default outcome of permission checks. When set to a policy such as "deny by default," then even if no plugin opinion is provided, the system would return False—unless a particular cache or allow rule explicitly overrides it.

b. Granular Defaults Per Action

Not every action carries the same risk. For example, a read-only SQL query might be fine by default, but writing (inserting, updating, or deleting rows) might require explicit opt-in. You could provide a more granular "default" that changes based on the type of action. Develop a tiered default schema:

• Viewing operations may default to False in a less trusted environment (or True in trusted inner networks).
• Write operations always deny by default almost everywhere.

This modification gives you a chance to audit pros and cons for each type of permission while keeping the code self‐documenting.
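
A tiered default could be expressed as a small lookup like the sketch below. The policy names "allow-reads" and "deny-all", the WRITE_ACTIONS set and the default_for function are all hypothetical; nothing like this exists in Datasette today.

```python
# Actions that modify data or schema (illustrative subset)
WRITE_ACTIONS = {
    "insert-row", "update-row", "delete-row",
    "create-table", "alter-table", "drop-table",
}


def default_for(action, policy="allow-reads"):
    """Return the fallback decision for an action under a named policy."""
    if policy == "deny-all":
        # Strict mode: nothing is allowed without an explicit grant
        return False
    if policy == "allow-reads":
        # Reads are open by default; writes always need an explicit grant
        return action not in WRITE_ACTIONS
    raise ValueError(f"Unknown policy: {policy}")
```

Keeping the tiers in one place like this would make the audit the text mentions a matter of reading a single table.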


3. Transparency and Auditability

a. Logging and Audit Trails

Even if you stick with defaults (either True or False), it is vital to track which permissions were used and why a particular decision was made. Datasette’s current implementation pushes permission check details into a history (e.g. _permission_checks), but you might want to make this audit log richer (or even externalizable) so that administrators can review whether the default “allow” decisions were acceptable or whether they led to security holes.

b. Developer and Administrator Feedback Tools

Your current debug views (permissions_debug, allow_debug) are excellent; consider adding a “simulation mode” where an administrator can simulate a set of actions with varying defaults (or see a “preview” of the effect of different default settings). A simple tool that shows, for each permission, what the final decision would be with different default strategies could help in transition if you decide to switch defaults later.


4. Isolation Between Core and Plugin Logic

a. “Pluggy-First” versus “Permissions-First”

In your current design, the default permission implementation returns opinions (e.g. for privileged "root" users) and also uses configuration “allow” blocks as overrides. One improvement would be to separate the core (“hard-coded”) rules from the plugin-provided opinions and then combine them with a more robust decision routine. For example:

  1. Core Hard Rules:
    Some actions—like writing to an immutable database—should be checked in a straightforward way that always fails even if no configuration exists.

  2. Policy Aggregation:
    Then, you sum up any additional rules coming from configuration or tokens (e.g. via actor_matches_allow) and run the veto or grant rules.

This layered approach makes it easier to change the defaults for specific actions, as you can change the core layer without touching the plugin hook machinery.
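
One way to picture the two layers is the sketch below, where a hard rule runs before any opinions are aggregated. The decide function, its parameters and the WRITE_ACTIONS set are hypothetical names used only to illustrate the suggestion.

```python
# Actions that modify data (illustrative subset)
WRITE_ACTIONS = {"insert-row", "update-row", "delete-row", "drop-table"}


def decide(action, database_is_immutable, opinions, default):
    """Apply hard core rules first, then aggregate the remaining opinions."""
    # Layer 1: a hard rule that no configuration or plugin can override
    if database_is_immutable and action in WRITE_ACTIONS:
        return False
    # Layer 2: veto-then-grant aggregation over configuration, token and plugin opinions
    if any(opinion is False for opinion in opinions):
        return False
    if any(opinion is True for opinion in opinions):
        return True
    # No opinions at all: use the default for this action
    return default
```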


5. Developer Experience and Migration

a. Documentation and Warnings

Changing the default from True to False for many actions could disrupt existing deployments. Provide clear warnings (both in the code and in the documentation) as a migration path. Allow system administrators to explicitly “enable” dependent permissions if they wish.

b. Transition via Token or Environment Flag

You might keep the current defaults for the majority of users but offer an experimental flag (for example, --strict-permissions) that when enabled causes all defaults to become False. This gives system administrators time for testing before making a permanent switch.


Conclusion

Defaulting permission checks to True everywhere may be convenient from a usability standpoint in a trusted environment, but it increases the risk of unintentional data exposure. Defaulting to False (i.e. denying by default) aligns better with the least privilege principle, forces explicit permission configuration, and enhances overall security. However, such a change must be carefully rolled out with migration tools, richer auditing, and clear configuration options so that administrators can deliberately balance usability and security.

Implementing a policy switch or per-action defaults, along with improved logging, will give developers and administrators more control and clarity over the behavior of the permissions system.


By carefully designing these improvements, you can both harden Datasette’s security and provide a clear path for users to tailor the experience to their environment without resorting to “magic” defaults that may not be obvious during deployment.

@oneryalcin

Great insights @simonw! many thanks for this.

Just a qq, forgive my ignorance, but what is files-to-prompt? Is it just a command line alias, kind of like find . -maxdepth 4 -type f -name "*.py" -exec sh -c 'echo "--- File: {} ---"; cat "{}"' sh {} \; or is it part of datasette?

@KranthiGV

@oneryalcin

It's the CLI tool at https://github.com/simonw/files-to-prompt
Looks like it can format using xml (which Simon used here) and supports other simple patterns for excluding files.

@oneryalcin

ah many thanks @KranthiGV super useful, I've been only using unix tools so far but good to have some ergonomics.

@aculich

aculich commented Feb 5, 2025

I don't yet have access to o3-mini on my API tier since it's not generally available yet, but running the same command with 4o-mini produces a useful table by default. o3-mini doesn't generate that table, and it would be helpful to include:

files-to-prompt datasette -e py -c | \
  llm -m 4o-mini -s 'write extensive documentation for how the permissions system works, as markdown'

--

Datasette Permissions System Documentation

Overview

The Datasette permissions system is designed to control user access to resources and actions within a Datasette application. The system allows for granular permission management, defining permissions for databases, tables, and individual queries. This documentation details how permissions are structured, how they can be configured, and how they are enforced in the API.

Permissions Structure

Permission Class

The core of the permission system is the Permission class, which encapsulates the attributes and behavior of a permission. The attributes defined in this class include:

  • name: The unique name of the permission.
  • abbr: An optional abbreviation for the permission (shortened reference).
  • description: A description of what the permission allows.
  • takes_database: A boolean indicating if the permission relates to database-level actions.
  • takes_resource: A boolean indicating if the permission relates to resource-level actions (like tables).
  • default: A boolean indicating the default state of the permission (enabled/disabled).
  • implies_can_view: A reserved flag for internal use that indicates whether granting this permission allows other view actions.

Registering Permissions

Permissions are registered using the register_permissions hook, which allows plugins to define custom permissions. The registered permissions can then be queried and validated based on actor privileges.

Available Permissions

Here is a list of default permissions that are typically included in the Datasette application:

| Permission Name | Abbreviation | Description | Takes Database | Takes Resource | Default |
|---|---|---|---|---|---|
| view-instance | vi | View Datasette instance | No | No | Yes |
| view-database | vd | View database | Yes | No | Yes |
| view-database-download | vdd | Download database file | Yes | No | Yes |
| view-table | vt | View table | Yes | Yes | Yes |
| view-query | vq | View named query results | Yes | Yes | Yes |
| execute-sql | es | Execute read-only SQL queries | Yes | No | Yes |
| permissions-debug | pd | Access permission debug tool | No | No | No |
| debug-menu | dm | View debug menu items | No | No | No |
| insert-row | ir | Insert rows | Yes | Yes | No |
| delete-row | dr | Delete rows | Yes | Yes | No |
| update-row | ur | Update rows | Yes | Yes | No |
| create-table | ct | Create tables | Yes | No | No |
| alter-table | at | Alter tables | Yes | Yes | No |
| drop-table | dt | Drop tables | Yes | Yes | No |

How Permissions Are Checked

The permission_allowed hook evaluates whether an actor has the specified permission for a given resource. This includes checks against the actor's roles, the requested action, and the resource affected.

Actor Restrictions

In addition to the basic permission checks, the system allows for restrictions based on the actor's identity. For instance, special rules can be defined for the "root" user (typically an admin or superuser), granting them additional permissions.

Using Permissions

Setting Up Permissions

To effectively use permissions in a Datasette application, the following steps should be followed:

  1. Define Permissions: Use the register_permissions hook in your plugin to define custom permissions that fit your application needs.

  2. Assign Permissions to Actors: Determine how permissions will be granted to different actors (users) in your application. Implement logic within your application to check these permissions based on the actor’s role.

  3. Validate Permissions: Use the permission_allowed function to check permissions dynamically, based on the actions taken by actors.

Example

Here’s a hypothetical example of checking if a user can view a specific database:

from datasette import Forbidden

async def some_function(datasette, actor, database_name):
    # Ask the permission system whether this actor may view the database
    can_view = await datasette.permission_allowed(
        actor, "view-database", resource=database_name)

    if not can_view:
        raise Forbidden("You do not have access to view this database.")

Customizing and Airing Permissions

Using Hook Implementations

Plugins can offer new ways to interact with the built-in permissions using hook implementations such as permission_allowed, allowing you to implement custom logic for your permissions.

Example Hook Implementation

Here’s an example of a custom hook:

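from datasette import hookimpl

@hookimpl(specname="permission_allowed")
async def custom_permission_allowed(datasette, actor, action, resource):
    # actor is None for unauthenticated requests, so guard before calling .get()
    if actor is not None and actor.get("role") == "admin":
        return True  # Admins have all permissions
    # No opinion: fall back to the standard checks
    return None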

Conclusion

The Datasette permissions system is flexible and can be tailored to suit a wide variety of application requirements. By leveraging the structures and hooks defined in this documentation, you can set up a secure and effective access control system that governs who can do what within your Datasette application. Always test permissions and access under various scenarios to confirm that the intended access control is enforced.
