@dylanmcreynolds
Last active June 30, 2020 21:05

Authorization Thoughts

Data acquired at BES light sources is being stored in a variety of locations with a variety of policies touching on retention, storage, access and requirements to make data public.

More facilities are implementing instances of Bluesky Data Broker, which serves as a data access framework for raw beamline data. Based on upcoming work, analyzed data will also be stored in Data Broker, along with an association that links each analyzed data set to the raw data set it was derived from. The data is stored in several forms, ranging from documents in MongoDB to data files outside of MongoDB.

Data Broker provides a framework that eases access to data and metadata for beamline experiments. Bluesky can store beamline data directly in MongoDB and can reference external files collected by detectors, providing users with a unified view of their data through an API that can support multiple analysis tools. It does not, however, provide any built-in enforcement of access controls. One could attempt to control access directly through MongoDB, but that is not a robust solution.

Many potential use cases exist for accessing data stored in Data Broker. Collaboration teams may want to share data among themselves. These teams can be formed at many levels, from the members listed on an experiment safety form to the members of a larger collaboration. Eventually, users may want to share data publicly. Additionally, future AI/ML applications will want to use Data Broker as a way to access data sets. All of these use cases involve policy decisions, and the software to enforce those policies still needs to be developed. Software tools that access Data Broker (like the Splash web portal or the AI framework under planning) need a well-thought-out framework for enforcing access policy.

I have been thinking about this a bit as I develop the Splash web portal. Its purpose is to allow users to investigate data that they took themselves or that was taken within their collaboration teams. I would like to design an authorization mechanism that is flexible enough to cover complicated use cases but does not require complicated maintenance.

I do not want to reinvent the wheel, so I outline some of the common authorization design patterns: Access Control Lists, Role Based Access Control and Attribute Based Access Control. I look at each of these and try to imagine how it might be implemented in an access control system that can support a variety of these use cases.

Conventions

This document makes use of several types of Unified Modeling Language (UML) diagrams (specifically, Use Case, Class and Activity diagrams). I have tried to keep them pretty simple, and they are by no means an attempt to create an exhaustive design of a system.

Definitions

  • Authentication - the act of validating a user's identity
  • Authorization - the act of deciding whether or not a user can perform an action in the system
  • Policy - a set of rules that can be enforced. This document uses the term both in an abstract sense and as a software abstraction in the ABAC section.

Use Cases

I list a number of use cases that an authorization system must support. For the sake of brevity, I present them as a diagram for now without going into detail on each use case. In this diagram, ovals are use cases and boxes are simply an organizational grouping. Actors are shown, and lines connect each actor to the use cases it is involved in.

usecases

Top Level Classes

We're going to be talking about some of the parts of our system a lot, so let's show another picture that describes what we're thinking of. We have Users who can be members of a Team. We also have special Investigators who are PIs. A Team is an aggregation of Investigators and PIs. A Proposal can have a Team, and is an aggregation of multiple Samples and Scans.

Top Level
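The relationships described above can be sketched as a few Python dataclasses. This is only an illustration of the diagram, not an existing schema: the field names (`username`, `proposal_id`, `uid` and so on) and the choice to model an Investigator as a subclass of User are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class User:
    """Anyone who can authenticate to the system (field name is hypothetical)."""
    username: str


@dataclass(frozen=True)
class Investigator(User):
    """A User who acts as a principal investigator (PI)."""


@dataclass
class Sample:
    name: str


@dataclass
class Scan:
    uid: str


@dataclass
class Team:
    """An aggregation of users; PIs are stored as Investigator instances."""
    name: str
    members: List[User] = field(default_factory=list)

    def pis(self) -> List[Investigator]:
        # Select only the members that are Investigators.
        return [m for m in self.members if isinstance(m, Investigator)]


@dataclass
class Proposal:
    """A Proposal can have a Team and aggregates Samples and Scans."""
    proposal_id: str
    team: Optional[Team] = None
    samples: List[Sample] = field(default_factory=list)
    scans: List[Scan] = field(default_factory=list)


alice = Investigator("alice")
bob = User("bob")
team = Team("tomography", members=[alice, bob])
proposal = Proposal("P-001", team=team, scans=[Scan("scan-1")])
```

The later access control patterns differ mainly in where they attach permissions to these objects: on the resource itself (ACL), on a role held by the user (RBAC), or in policies that read attributes from any of them (ABAC).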

Access Control Design Patterns

What follows is by no means an exhaustive list of patterns for managing access controls. We include some of the more common design patterns with known implementations. Much more exhaustive information is available on the web for each. We limit the descriptions here for brevity.

Access Control List (ACL)

ACL is a very common pattern in computing. It is used extensively to control file permissions on many types of systems, as well as in directory services like LDAP and Active Directory.

ACL Classes

A particular Resource is associated with a list of Permissions, and each permission is associated with one or more Users or Groups of users.

| Permissions | Resource |
| --- | --- |
| AddMember, DeactivateUser, ReactivateUser, ViewMembers, ChangePermissions | Project |
| ViewRawData, ViewAnalyzedData, StoreAnalyzedData | Project, DataSet |

ACL Access Decision for Data Set
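A minimal sketch of an ACL decision follows, assuming an in-memory dictionary as the store. The resource identifiers, the `group:` prefix and the function names are hypothetical; only the permission names come from the table above.

```python
from typing import Dict, Set

# resource id -> permission -> principals (user names or "group:<name>")
Acl = Dict[str, Dict[str, Set[str]]]

acl: Acl = {
    "project:P-001": {
        "AddMember": {"alice"},
        "ViewMembers": {"alice", "group:tomography"},
    },
    "dataset:scan-1": {
        "ViewRawData": {"group:tomography"},
        "StoreAnalyzedData": {"alice"},
    },
}

# group name -> member user names
groups: Dict[str, Set[str]] = {"tomography": {"alice", "bob"}}


def principals_for(user: str) -> Set[str]:
    """Return the user plus every group the user belongs to."""
    member_of = {f"group:{g}" for g, members in groups.items() if user in members}
    return {user} | member_of


def acl_allows(user: str, permission: str, resource: str) -> bool:
    """Allow only if the resource's list for this permission names the user or one of their groups."""
    allowed = acl.get(resource, {}).get(permission, set())
    return bool(allowed & principals_for(user))
```

Note that every resource carries its own list, so granting a new team member access means touching each data set's entry unless groups are used consistently.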

Role Based Access Control (RBAC)

In RBAC, a Role represents a set of permissions that can be applied to a resource. A single user can occupy multiple roles for the same resource as well as for multiple resources. What constitutes a role, a permission and a resource is defined by the system developer.

RBAC Classes

Borrowing from Wikipedia: "RBAC differs from access control lists (ACLs), used in traditional discretionary access-control systems, in that RBAC systems assign permissions to specific operations with meaning in the organization, rather than to low-level data objects."

RBAC is well-suited for very large organizations with large numbers of permission decisions. RBAC has many implementations, including MongoDB's access control for database objects.

RBAC systems can add more flexibility, and more complexity, to the model. For the purpose of this overview, we will keep it simple and not include the following features: Users can be collected into Groups, and those Groups can be assigned Roles. Roles can contain Groups and Users at the same time. Roles can inherit from other Roles, adding additional permissions.

An example set of Roles, Permissions and Resources that we might consider for this project might be:

| Role | Permissions | Resource that Role Applies To |
| --- | --- | --- |
| PI | AddMember, DeactivateUser, ReactivateUser, ViewMembers, ChangePermissions | Project |
| Team Member | ViewRawData, ViewAnalyzedData, StoreAnalyzedData | DataSet |

RBAC Access Decision for Data Set
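A minimal sketch of the simple form of RBAC described above (no groups, no role inheritance). The role and permission names come from the example table; the resource identifiers, the assignment table and the function name are assumptions for illustration.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Each role is a named set of permissions.
ROLE_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "PI": frozenset({
        "AddMember", "DeactivateUser", "ReactivateUser",
        "ViewMembers", "ChangePermissions",
    }),
    "Team Member": frozenset({
        "ViewRawData", "ViewAnalyzedData", "StoreAnalyzedData",
    }),
}

# (user, resource) -> roles the user occupies for that resource
role_assignments: Dict[Tuple[str, str], Set[str]] = {
    ("alice", "project:P-001"): {"PI"},
    ("alice", "dataset:scan-1"): {"Team Member"},
    ("bob", "dataset:scan-1"): {"Team Member"},
}


def rbac_allows(user: str, permission: str, resource: str) -> bool:
    """Allow if any role the user holds on the resource includes the permission."""
    roles = role_assignments.get((user, resource), set())
    return any(
        permission in ROLE_PERMISSIONS.get(role, frozenset())
        for role in roles
    )
```

Compared with the ACL sketch, changing what a Team Member may do is a single edit to `ROLE_PERMISSIONS` rather than an edit to every resource's list.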

Attribute Based Access Control (ABAC)

With ACL, permission decisions are made by interrogating the user, potentially the user's group membership, and a resource's access control lists. In RBAC, Roles and Permissions are also interrogated to make permission decisions. In ABAC, Policies can be set up that can potentially use attributes from many parts of the system. As such, Policies can be set up with a high degree of flexibility.

ABAC has seen adoption in a number of large enterprises, including within AWS Identity and Access Management (IAM).

NIST provides guidance on ABAC implementation.

"Under ABAC, access decisions can change between requests simply by altering attribute values, without requiring changes to the subject/object relationships defining the underlying rule sets … Further, ABAC enables object owners or administrators to apply AC policy without prior knowledge of the specific subject and for an unlimited number of subjects that might require access. As new subjects join the organization, rules and objects need not be modified, and as long as the subject is assigned the attributes necessary for access to the required objects—for example, all Nurse Practitioners in the Cardiology Department are assigned those attributes—no modifications to existing rules or object attributes are required. This accommodation of the external (unanticipated) user is one of the primary benefits of employing ABAC." Vincent C. Hu, D. Richard Kuhn, and David F. Ferraiolo, NIST, 2015, Attribute-Based Access Control

ABAC Classes

A possible decision flow for access to a data set could look like:

ABAC Access Decision for Data Set
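To make the idea of attribute-based policies concrete, here is a minimal sketch in plain Python. It does not use XACML or any ABAC library. The attribute names (`proposals`, `proposal`, `public`, `active`), the three example policies and the deny-overrides combining rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Attributes = Dict[str, Any]


@dataclass(frozen=True)
class Request:
    subject: Attributes   # attributes of the user making the request
    resource: Attributes  # attributes of the data set being accessed
    action: str


@dataclass(frozen=True)
class Policy:
    description: str
    effect: str  # "allow" or "deny"
    condition: Callable[[Request], bool]


policies: List[Policy] = [
    Policy(
        "Users may view raw data from proposals they belong to",
        "allow",
        lambda r: r.action == "ViewRawData"
        and r.resource.get("proposal") in r.subject.get("proposals", []),
    ),
    Policy(
        "Anyone may view raw data that has been marked public",
        "allow",
        lambda r: r.action == "ViewRawData" and r.resource.get("public", False),
    ),
    Policy(
        "Deactivated users are always denied",
        "deny",
        lambda r: not r.subject.get("active", True),
    ),
]


def abac_allows(request: Request) -> bool:
    """Deny overrides allow; a request that matches no policy is denied."""
    effects = [p.effect for p in policies if p.condition(request)]
    return "deny" not in effects and "allow" in effects
```

A new user who is given the attribute `proposals=["P-001"]` gains access without any change to the policies or to the data sets, which is the property the NIST quote above highlights.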

Because of ABAC's flexibility, it seems that figuring out how to build policies could be more complicated than it is with RBAC.

XACML serves as a mature (and complex) standard for defining policies, resources and actions. It was originally expressed as XML, but has JSON implementations as well. There is at least one interesting Python project for implementing ABAC: https://pypi.org/project/py-abac/, a Python library that works with JSON and/or dictionaries.

An additional interesting feature of XACML is the concept of an Obligation. From Wikipedia:

"An obligation is a directive from the policy decision point (PDP) to the policy enforcement point (PEP) on what must be carried out before or after an access is approved."

I can imagine this being used in an ML training scenario, where access to a data set can be granted AS LONG AS the data can be anonymized.
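The ML training scenario can be sketched as follows. This is plain Python that borrows the PDP/PEP vocabulary, not an XACML implementation. The action name `ReadForTraining`, the `purpose` attribute, the record fields and the `anonymize` handler are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class Decision:
    permit: bool
    obligations: List[str] = field(default_factory=list)


def decide(subject: Dict[str, Any], action: str) -> Decision:
    """Policy decision point: permit training reads, but only with anonymization."""
    if action == "ReadForTraining" and subject.get("purpose") == "ml_training":
        return Decision(permit=True, obligations=["anonymize"])
    return Decision(permit=False)


def anonymize(record: Record) -> Record:
    """Drop the fields assumed (for this example) to identify people or proposals."""
    return {k: v for k, v in record.items() if k not in {"user", "proposal"}}


# Obligations the enforcement point knows how to carry out.
OBLIGATION_HANDLERS: Dict[str, Callable[[Record], Record]] = {
    "anonymize": anonymize,
}


def enforce(subject: Dict[str, Any], action: str, record: Record) -> Record:
    """Policy enforcement point: apply every obligation before releasing data."""
    decision = decide(subject, action)
    if not decision.permit:
        raise PermissionError(f"{action} denied")
    for name in decision.obligations:
        handler = OBLIGATION_HANDLERS.get(name)
        if handler is None:
            # An obligation that cannot be fulfilled means access is refused.
            raise PermissionError(f"cannot fulfil obligation {name!r}")
        record = handler(record)
    return record
```

The design point is that the decision and the condition attached to it travel together, so a tool that cannot anonymize simply cannot obtain the data.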

Discussion

Because there are many potential software frameworks and many different deployment scenarios, I imagine the need for an authorization service that can make decisions for a wide variety of systems. One obviously interesting deployment scenario is the Intake server. The Intake server provides a plugin mechanism, so a thin client could be written as a plugin that makes requests to an authorization service implementing the access policies configured for the system.

deployment

Much more thought needs to be put into designing an access control system, and I think ABAC bears some research and prototyping. We certainly need to better define the scope of what we are trying to accomplish. Are we controlling access to a single beamline's data, a facility's data or multiple facilities' data? The scope of our effort certainly influences the design, but I think it's good to design for wider use than a single beamline. ABAC is attractive because of its flexibility, even if that comes at the potential cost of added complexity.


dylanmcreynolds commented Jun 25, 2020

