derekwaynecarr/projects.md

## projects.md

      
    Kubernetes Proposal - Projects

Related PR:

| Topic | Link |
| --- | --- |
| Access.md | kubernetes/kubernetes#891 |


Background

High level goals:

Enable an easy-to-use mechanism to provide a top-level scope of Kubernetes resources
Ensure it aligns with the access control proposal
Ensure it's efficient out of the box and provides a pattern for upstream extensions

Use cases

Initial:

Ability for an administrator to create projects in a Kubernetes cluster.
Ability for an administrator to delete projects in a Kubernetes cluster.
Ability to associate resources with a project.
Ability to tie clean-up of project-scoped resources to the project life-cycle.
A project has a DNS-compatible name to support compound naming conventions.
Project-scoped resources are not transferable to another project, e.g. pods cannot be moved across projects.

Improvements:

Ability to have a REST API to create and delete project objects.

Proposed Design

Model Changes

Introduce a top-level object called project that provides a scope to Kubernetes cluster resources.
// Project is a top-level object for Kubernetes resources used to provide scoping of resources.
type Project struct {
	JSONBase `json:",inline" yaml:",inline"`

	// ID is an immutable identifier provided by the system to identify this object
	ID string

	// Name is a DNS compatible name that provides a namespace to project-scoped resources
	Name string

	// This project's set of labels
	Labels map[string]string `json:"labels,omitempty" yaml:"labels,omitempty"`
}
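
The struct requires Name to be DNS compatible but does not spell out the rule. The following is a minimal sketch of one possible check, assuming a project name must be a single RFC 1123 label (lowercase alphanumerics and '-', at most 63 characters). The helper name `isDNSLabel` and the choice of RFC 1123 are assumptions for illustration, not part of the proposal.

```go
package main

import (
	"fmt"
	"regexp"
)

// dnsLabel matches an RFC 1123 label: lowercase alphanumerics and '-',
// starting and ending with an alphanumeric character.
var dnsLabel = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// isDNSLabel reports whether name could serve as a single DNS label.
// The 63-character limit comes from RFC 1123, not from the proposal.
func isDNSLabel(name string) bool {
	return len(name) <= 63 && dnsLabel.MatchString(name)
}

func main() {
	for _, name := range []string{"myproject", "my-project", "My_Project", "-bad"} {
		fmt.Printf("%q valid: %v\n", name, isDNSLabel(name))
	}
}
```

A check like this would also keep project names safe to use as a subdomain or a path segment.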

A project name is unique.
The following resources are project-scoped.

policy
pod
controller
service

Each project-scoped resource has a name unique to its scope.
Each project-scoped resource is addressable via the following tuple: {project.name}/{resource.name}
Each resource has an immutable identifier, but resources are not necessarily retrievable by their ID. The identifier is intended to let you distinguish an object as unique from its address. That is, a resource with a specified name can be deleted and added back with the same name, but it would have a different ID.
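
To make the addressing tuple concrete, here is a small sketch of building and parsing a {project.name}/{resource.name} address. The `Address` type and `parseAddress` function are hypothetical names introduced only for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// Address is the {project.name}/{resource.name} tuple from the proposal.
type Address struct {
	Project  string
	Resource string
}

// String renders the address in its project/resource form.
func (a Address) String() string {
	return a.Project + "/" + a.Resource
}

// parseAddress splits "project/resource" into its two parts. It rejects
// input without exactly one separator or with an empty part.
func parseAddress(s string) (Address, error) {
	parts := strings.Split(s, "/")
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return Address{}, fmt.Errorf("invalid address %q, want project/resource", s)
	}
	return Address{Project: parts[0], Resource: parts[1]}, nil
}

func main() {
	addr, err := parseAddress("myproject/frontend")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(addr.Project, addr.Resource, addr)
}
```

Because the address carries no ID, deleting and re-creating "myproject/frontend" would yield the same address for a different object, which is the distinction the paragraph above draws.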

Kubernetes API Server

The server is responsible for enforcing CRUD operations for project scoped resources.

policy
pod
controller
service

As a result, every request to the API server must carry a project scope. Candidate mechanisms:

Query parameters, e.g. ?project=
HTTP header, e.g. X_KUBERNETES_PROJECT=
Subdomain, e.g. http://myproject.kubernetes.example.com
HTTP Path, e.g. http://kubernetes.example.com/myproject/

Kubernetes Scheduler

The scheduler is responsible for fitting a pod to a host machine.

It requires a machine-to-pod mapping in order to function.
It does NOT need to know about a pod's project in order to schedule.
If, in the future, a project is scoped to a set of dedicated machines, that is a post-filter operation against the machine-to-pod mapping.

Changes required:
Today, the scheduler builds the machine-to-pod mapping on each scheduling request by enumerating all pods and iterating over them in memory. This will not scale to large numbers of pods. A more efficient registry or cache is needed, one that reads the etcd /registry/hosts storage, which already holds exactly this information.
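
The sketch below illustrates the enumeration described above and the possible future post-filter. The `Pod` fields, `buildHostIndex`, `filterHosts`, and the `allowed` host set are all hypothetical; the proposal does not define a project-to-machine assignment.

```go
package main

import "fmt"

// Pod carries only the fields this sketch needs; the real pod type has more.
type Pod struct {
	ID      string
	Project string
	Host    string
}

// buildHostIndex groups pods by the host they are assigned to. This is the
// full enumeration that becomes expensive as the number of pods grows.
func buildHostIndex(pods []Pod) map[string][]Pod {
	index := make(map[string][]Pod)
	for _, p := range pods {
		index[p.Host] = append(index[p.Host], p)
	}
	return index
}

// filterHosts is the hypothetical post-filter: it keeps only the hosts
// dedicated to a project. Hosts with no pods do not appear in the index,
// so a real scheduler would also need the full host list.
func filterHosts(index map[string][]Pod, allowed map[string]bool) map[string][]Pod {
	out := make(map[string][]Pod)
	for host, pods := range index {
		if allowed[host] {
			out[host] = pods
		}
	}
	return out
}

func main() {
	pods := []Pod{
		{ID: "p1", Project: "a", Host: "host1"},
		{ID: "p2", Project: "b", Host: "host1"},
		{ID: "p3", Project: "a", Host: "host2"},
	}
	index := buildHostIndex(pods)
	dedicated := filterHosts(index, map[string]bool{"host2": true})
	fmt.Println(len(index["host1"]), len(dedicated))
}
```

Reading /registry/hosts/{host} directly would replace `buildHostIndex` with one lookup per host instead of a scan over every pod.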

Persistent Etcd storage

Option 1: Scope API Server resources by project, Kubelet/Scheduler resources by host


| Key | Description |
| --- | --- |
| /registry/projects/{project} | Holds information about the {project} resource |
| /registry/projects/{project}/policies/{policy} | Holds information about the {policy} resource in {project} |
| /registry/projects/{project}/services/{service} | Holds information about the {service} resource in {project} |
| /registry/projects/{project}/controllers/{controller} | Holds information about the {controller} resource in {project} |
| /registry/projects/{project}/pods/{pod} | Holds information about the {pod} resource in {project} |
| /registry/hosts/{host} | Holds information about the pods scheduled on {host} |


Pros:

No need for a secondary index to filter resource type by project
Logical scoping of project managed resources reflected in etcd hierarchy
Simpler cleanup of project-scoped resources [recursive delete]
Non-core plugin resources (builds, deployments, etc.) have a clear storage path and a model for scoping to a project.

Cons:

Need secondary index to list any resource independent of project
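
As an illustration of the Option 1 layout, the sketch below builds keys in that shape and shows why cleanup reduces to a prefix delete. The `store` type is an in-memory stand-in for etcd, and the function names are hypothetical; no real etcd client API is used here.

```go
package main

import (
	"fmt"
	"strings"
)

// projectKey and resourceKey build paths in the Option 1 layout.
func projectKey(project string) string {
	return "/registry/projects/" + project
}

func resourceKey(project, kind, name string) string {
	return projectKey(project) + "/" + kind + "/" + name
}

// store is an in-memory stand-in for etcd, used only for illustration.
type store map[string]string

// deleteProject removes the project key and everything beneath it, which
// is what a recursive delete of the project directory would do in etcd.
// It returns the number of keys removed.
func (s store) deleteProject(project string) int {
	root := projectKey(project)
	removed := 0
	for k := range s {
		if k == root || strings.HasPrefix(k, root+"/") {
			delete(s, k)
			removed++
		}
	}
	return removed
}

func main() {
	s := store{
		projectKey("myproject"):                     "project",
		resourceKey("myproject", "pods", "web"):     "pod",
		resourceKey("myproject", "services", "web"): "service",
		"/registry/hosts/host1":                     "pods on host1",
	}
	removed := s.deleteProject("myproject")
	fmt.Println("removed:", removed, "remaining:", len(s))
}
```

The same prefix also answers "list everything in this project" without a secondary index, while listing all pods across projects would have to walk every project directory.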

Option 2: Apply a project label on all project-scoped resources


| Key | Description |
| --- | --- |
| /registry/projects/{project} | Holds information about the {project} resource |
| /registry/policies/{policy} | Holds information about the {policy} resource |
| /registry/services/{service} | Holds information about the {service} resource |
| /registry/controllers/{controller} | Holds information about the {controller} resource |
| /registry/pods/{pod} | Holds information about the {pod} resource |
| /registry/hosts/{host} | Holds information about the pods scheduled on {host} |


Pros:

Limited disruption over current model
Easy to query resources independent of project scope

Cons:

Enforcing project-scoped ACLs requires a secondary index to avoid an expensive post-filter when the cluster has many projects.
It is not logically clear how resources are scoped in storage.
Harder to clean up a project's resources.
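
The post-filter mentioned in the cons can be sketched as follows. The `Resource` type and the label key "project" are assumptions for this example. The point is that every resource of a type is read before any is discarded.

```go
package main

import "fmt"

// Resource is a minimal stand-in for any labeled, project-scoped object.
type Resource struct {
	Name   string
	Labels map[string]string
}

// filterByProject scans every resource and keeps those whose "project"
// label matches. The cost grows with the total number of resources in
// the cluster, not with the size of the project.
func filterByProject(all []Resource, project string) []Resource {
	var out []Resource
	for _, r := range all {
		if r.Labels["project"] == project {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	all := []Resource{
		{Name: "web", Labels: map[string]string{"project": "a"}},
		{Name: "db", Labels: map[string]string{"project": "b"}},
		{Name: "cache", Labels: map[string]string{"project": "a"}},
	}
	for _, r := range filterByProject(all, "a") {
		fmt.Println(r.Name)
	}
}
```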

Option 3: Project scope each resource type


| Key | Description |
| --- | --- |
| /registry/projects/{project} | Holds information about the {project} resource |
| /registry/services/{project}/{service} | Holds information about the {service} resource in {project} |
| /registry/controllers/{project}/{controller} | Holds information about the {controller} resource in {project} |
| /registry/pods/{project}/{pod} | Holds information about the {pod} resource in {project} |
| /registry/hosts/{host} | Holds information about the pods scheduled on {host} |


Pros:

No need for a secondary index to filter resource type by project
No need for a secondary index to list any resource independent of project
Cleanup of one resource type within a project is a single recursive delete

Cons:

No ability to recursively roll up all resources for a given project in a single call
Cleanup of a whole project is more complicated, since it takes one recursive delete per resource type

Preferred Option:

Option 1 is preferred.

Data is scoped logically to ACL model.
Does not require a secondary index to implement ACL model efficiently.
Simpler to enumerate resources scoped to a project.

Kubecfg client

How to specify a project on kubecfg requests?

If no project is specified, the default project associated with the user is used.
A user may specify a project via a new option:
$ kubecfg -project="" list pods

In addition, this proposal adds two new operations to kubecfg:

setProject (name) - sets the default project that the client sends with each operation
unsetProject - removes the client's default project

The kubecfg client would store default project information in the same manner it caches authentication information today.
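
One way to read the rules above is as a simple precedence: an explicit -project value wins, then the client's stored default, and an empty result leaves the choice to the server (the user's default project). The sketch below encodes that reading; `clientState` and `resolveProject` are hypothetical names, and persistence to disk is left out.

```go
package main

import "fmt"

// clientState is a hypothetical stand-in for whatever kubecfg persists
// alongside its cached authentication information.
type clientState struct {
	DefaultProject string
}

// setProject and unsetProject mirror the two proposed operations.
func (c *clientState) setProject(name string) { c.DefaultProject = name }
func (c *clientState) unsetProject()          { c.DefaultProject = "" }

// resolveProject returns the project to send with a request. An empty
// result means no project is sent, so the server would fall back to the
// default project associated with the user.
func resolveProject(flagValue string, c *clientState) string {
	if flagValue != "" {
		return flagValue
	}
	return c.DefaultProject
}

func main() {
	state := &clientState{}
	state.setProject("myproject")
	fmt.Println(resolveProject("", state))      // stored default
	fmt.Println(resolveProject("other", state)) // explicit flag wins
	state.unsetProject()
	fmt.Printf("%q\n", resolveProject("", state)) // empty: server decides
}
```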

How do I list projects for a user?

In order to enforce project-to-policy constraints, listing the projects for a user will require a secondary index to scale to large numbers of projects in a Kubernetes cluster.
A default implementation of this index could be authored upstream that performs a post-filter for cases where the number of projects is small.
$ kubecfg list /projects
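
A secondary index of the kind described could be as simple as a user-to-projects map that is updated whenever access is granted or revoked. The sketch below is one such in-memory shape; the type and method names are hypothetical, and how policy changes would feed `grant` and `revoke` is left open by the proposal.

```go
package main

import (
	"fmt"
	"sort"
)

// projectIndex maps each user to the set of projects that user may see.
type projectIndex struct {
	byUser map[string]map[string]bool
}

func newProjectIndex() *projectIndex {
	return &projectIndex{byUser: make(map[string]map[string]bool)}
}

// grant records that user may see project.
func (ix *projectIndex) grant(user, project string) {
	if ix.byUser[user] == nil {
		ix.byUser[user] = make(map[string]bool)
	}
	ix.byUser[user][project] = true
}

// revoke removes the entry; it is a no-op if none exists.
func (ix *projectIndex) revoke(user, project string) {
	delete(ix.byUser[user], project)
}

// list returns the user's projects in sorted order without scanning
// projects that belong to other users.
func (ix *projectIndex) list(user string) []string {
	names := make([]string, 0, len(ix.byUser[user]))
	for p := range ix.byUser[user] {
		names = append(names, p)
	}
	sort.Strings(names)
	return names
}

func main() {
	ix := newProjectIndex()
	ix.grant("alice", "beta")
	ix.grant("alice", "alpha")
	ix.grant("bob", "gamma")
	fmt.Println(ix.list("alice"))
}
```

The lookup cost depends only on how many projects the user can see, which is the property a post-filter over all projects lacks.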

Open Issues

Given a project-scoped resource, how can I find its associated project?

Option 1:
The pod, replication controller, and service types are each augmented with a new immutable field that records the project assigned at creation time.
Option 2:
Use an immutable label, "project={project}", that is fixed on the project-scoped resource at creation time.

How is the constraint that a resource name is unique within its project scope enforced?

The etcd key path is the unique name.

How do we pass a call context (project ID, user account, etc.) through the Kubernetes call stack?

Propose defining a new Context object that is, by convention, the first argument to all internal k8s operations.
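
A minimal sketch of that convention follows. The `Context` fields are taken from the question above (project and user); the `podRegistry` type and its `ListPods` method are hypothetical and exist only to show the calling pattern.

```go
package main

import "fmt"

// Context carries per-call scope. Only the two fields named in the open
// issue are shown; a real context would likely carry more.
type Context struct {
	Project string
	User    string
}

// podRegistry is an illustrative in-memory registry keyed by project.
type podRegistry struct {
	podsByProject map[string][]string
}

// ListPods takes the Context first, by convention, and never looks
// outside the project named in it.
func (r *podRegistry) ListPods(ctx Context) []string {
	return r.podsByProject[ctx.Project]
}

func main() {
	reg := &podRegistry{podsByProject: map[string][]string{
		"myproject": {"web", "db"},
		"other":     {"batch"},
	}}
	ctx := Context{Project: "myproject", User: "alice"}
	fmt.Println(reg.ListPods(ctx))
}
```

Putting the scope in the first argument keeps it out of each operation's own parameters, so adding a field later would not change every signature.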

Should the unique identifier for a resource encode the project information?

This comes back to a core question: can I look up a resource both by its address {project.name}/{resource.name} and by its ID?
The challenge with ID look-up is efficient retrieval of the resource from persistent storage.

Paging on projects?

It's anticipated that Kubernetes deployments may run with a few projects or with thousands of projects. As a result, some amount of indexing will be needed to make project->policy enforced look-up efficient when applying ACLs.

Deleting a project, when are related resources removed?