RogerTangos/dhwriteup.md

## dhwriteup.md

      
    Raw
  

              dhwriteup.md
            
          
    Hey Will and Others,
Here's a brief summary of what's going on in different parts of DataHub:
The command pipeline:

All calls to the database are routed through what I'm calling the "command pipeline."
Calls are made to either manager.py or rlsmanager.py (that's row level security
manager), which instantiate connection.py.connection.py calls pg.py, which is
where the postgres-specific sql commands are stored.
A datahub application -> manager.py/rlsmanager.py -> connection.py -> pg.py
DataHub is architected in this way, so that it can support other databases with
minimal effort. For BT, this means writing a new file, oracle.py, and pointing
connection.py at it, instead.

manager.py
rlsmanager.py [note: this is in an unmerged branch]
connection.py
pg.py

Account Management & Repo Collaboration

Users in DataHub are granted postgres roles and a database.
In the DataHub codebase, it's helpful to define some terms.

repo == postgres schema
repo_base == postgres database (which is owned by a user of the same name)
web application username == postgres role

User b grants access to user a on schema s.
User a connects to user b's repo_base, and only schema s is visible.
The GRANT command happens in postgres, and is also stored in a web application
table, so that the web app knows to make schema s visible to user a.
In BT's model, it sounds like users all get schemas, but are working in a large
shared database. Some DH checks for permission are done in manager.py, and check
to see that repo_base == username, so integrating with BT would necessitate a
partial rewrite of manager.py
Public Repos and Unauthenticated Users

There are four models for exposing data to all users/unauthenticated users

Users "publish" their entire repo to the internet. In this case, unauthenticated
users can select from this repo data through DataHub.
Users "publish" cards to the internet. In this case, a user authorizes
unauthenticated users to execute a pre-defined query. This does not support passing parameters
Users allow the "ALL" user to operate on rows in their table through row level security
(not merged, but I believe that this supports unauthenticated users)
Users grant repo access to another specified user (see above)

In the first case, the user is actually granting SELECT access to a special dh_public role in the postgres. All users in datahub are part of the dh_public role. Unauthenticated users are logged on using a dh_anonymous role, which is also part of the dh_public role, and so they also have access to repos that dh_public can select from.
Aside from the UX, there's nothing preventing users from also granting INSERT and UPDATE to dh_public. This would allow unauthenticated people to manipulate their data, much like google docs.