@derekwaynecarr
Last active August 29, 2015 14:07
Indexing proposal

Kubernetes Proposal - Indexing

Background

The Kubernetes cluster uses etcd as its primary data store.

High level goals:

  • Support the ability to look up an API object by something other than its key, e.g. uid
  • Support the ability to look up a list of API objects whose field matches a particular value, e.g. label, status, etc.

Use cases

Actors:

  1. k8s admin - administers a kubernetes cluster
  2. k8s registry - provides
  3. k8s user - uses a kubernetes cluster to perform tasks

User stories:

  1. Ability to build an index over a set of API objects given a projection rule
  2. Ability to query index for API object keys that match on a search key

Proposed Design

In k8s, these requirements are intended to be fulfilled by the registry tier, and storage implementations are expected to perform optimally.

At this point, the primary storage location for k8s is etcd. Unlike traditional RDBMS systems or document-oriented data stores, etcd does not support querying resources with a filter constraint; answering such a query requires an algorithm that scales linearly with the number of stored objects.

As a result, we need a solution that the k8s etcd registry can leverage to improve system performance.
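To make the cost difference concrete, here is a minimal sketch that uses a plain Go map as a stand-in for the etcd keyspace. No etcd client is involved, and the `object` type and the `findByUIDScan` and `buildUIDIndex` functions are hypothetical names chosen for illustration only.

```go
package main

import "fmt"

// object is a hypothetical stand-in for an API object stored under a key.
type object struct {
	UID string
}

// findByUIDScan inspects every stored object, so each lookup is O(n).
func findByUIDScan(store map[string]object, uid string) (string, bool) {
	for key, obj := range store {
		if obj.UID == uid {
			return key, true
		}
	}
	return "", false
}

// buildUIDIndex pays the O(n) cost once and returns a uid -> key map,
// so that later lookups are a single map access.
func buildUIDIndex(store map[string]object) map[string]string {
	index := make(map[string]string, len(store))
	for key, obj := range store {
		index[obj.UID] = key
	}
	return index
}

func main() {
	store := map[string]object{
		"/registry/pods/a": {UID: "uid-1"},
		"/registry/pods/b": {UID: "uid-2"},
	}

	key, found := findByUIDScan(store, "uid-2")
	fmt.Println("scan:", key, found)

	index := buildUIDIndex(store)
	fmt.Println("index:", index["uid-2"])
}
```

The index trades memory and update bookkeeping for constant-time lookups, which is the trade-off the rest of this proposal is built around.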

Keep in mind that if we chose an alternate data store that natively supported indexing, this solution would not be used. It exists only to optimize queries in the etcd registry. We anticipate that SQL-based stores would work without additional local indexes.

Implementation strategy:

An individual Index manages a set of IndexRecord objects that correlate a particular Value to a Key in a data store.

// IndexRecord is the individual entity managed by an Index
type IndexRecord struct {
  // Key is the location in the repository that correlates to this record
  Key string
  // Value is the value that is inserted into the index
  Value string
}

An Indexer is responsible for projecting an object into a set of IndexRecord rows.

// An indexer is responsible for projecting a node into a set of IndexRecord objects
type Indexer interface {
  // Identifier is the unique label that defines this indexer, used by IndexManager to avoid duplicate indexes being managed
  Identifier() string
  // Reduce projects a node into zero-or-more IndexRecord objects
  Reduce(object interface{}) ([]IndexRecord, error)
  // TODO need method to get a IndexRecord.Key given an input object, what to do when etcd log resets??
}
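As an illustration of the Indexer contract, the following sketch implements a uid indexer. It assumes a hypothetical `storedObject` type that carries both its store key and its uid; how the key is actually obtained from an object is the open TODO above, so that part is an assumption rather than a settled design.

```go
package main

import (
	"errors"
	"fmt"
)

// IndexRecord and Indexer are repeated from the proposal so the sketch compiles on its own.
type IndexRecord struct {
	Key   string
	Value string
}

type Indexer interface {
	Identifier() string
	Reduce(object interface{}) ([]IndexRecord, error)
}

// storedObject is hypothetical: it pairs an object's store key with its uid.
type storedObject struct {
	Key string
	UID string
}

// uidIndexer projects each object into at most one record keyed by uid.
type uidIndexer struct{}

func (uidIndexer) Identifier() string { return "uid" }

func (uidIndexer) Reduce(object interface{}) ([]IndexRecord, error) {
	obj, ok := object.(storedObject)
	if !ok {
		return nil, errors.New("uidIndexer: unsupported object type")
	}
	if obj.UID == "" {
		// Nothing to index: zero records is a valid projection.
		return nil, nil
	}
	return []IndexRecord{{Key: obj.Key, Value: obj.UID}}, nil
}

// Compile-time check that uidIndexer satisfies Indexer.
var _ Indexer = uidIndexer{}

func main() {
	records, err := uidIndexer{}.Reduce(storedObject{Key: "/registry/pods/a", UID: "uid-1"})
	fmt.Println(records, err)
}
```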

An Index is used to traverse the set of IndexRecord objects that conform to a particular value. Index objects are live-updated in response to changes in the repository in the background.

type Index interface {
  // ContainsKey returns true if an IndexRecord exists with the specified key
  ContainsKey(key string) bool
  // ContainsValue returns true if an IndexRecord exists with the specified value
  ContainsValue(value string) bool
  // ListIndexRecords returns a list of IndexRecord objects that conform to the specified value
  ListIndexRecords(value string) []IndexRecord
  // ListKeys returns a list of keys that match the specified value
  ListKeys(value string) []string
}
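One possible in-memory shape for such an index is a map from value to the set of keys that carry it. The sketch below is an assumption, not the proposed implementation: `memoryIndex`, `Add`, and `Remove` are hypothetical names, and the value lookup is called `ContainsValue` because Go does not allow two methods with the same name on one type. `Add` and `Remove` stand in for whatever the background watcher would call when the repository changes.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

type IndexRecord struct {
	Key   string
	Value string
}

// memoryIndex maps each value to the set of keys that carry it.
type memoryIndex struct {
	mu      sync.RWMutex
	byValue map[string]map[string]struct{}
}

func newMemoryIndex() *memoryIndex {
	return &memoryIndex{byValue: make(map[string]map[string]struct{})}
}

// Add inserts a record; adding the same record twice has no further effect.
func (m *memoryIndex) Add(r IndexRecord) {
	m.mu.Lock()
	defer m.mu.Unlock()
	keys, ok := m.byValue[r.Value]
	if !ok {
		keys = make(map[string]struct{})
		m.byValue[r.Value] = keys
	}
	keys[r.Key] = struct{}{}
}

// Remove deletes a record and drops the value entry once it has no keys left.
func (m *memoryIndex) Remove(r IndexRecord) {
	m.mu.Lock()
	defer m.mu.Unlock()
	keys := m.byValue[r.Value]
	delete(keys, r.Key)
	if len(keys) == 0 {
		delete(m.byValue, r.Value)
	}
}

// ContainsValue reports whether any record exists with the given value.
func (m *memoryIndex) ContainsValue(value string) bool {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return len(m.byValue[value]) > 0
}

// ListKeys returns the keys that match the value, sorted for stable output.
func (m *memoryIndex) ListKeys(value string) []string {
	m.mu.RLock()
	defer m.mu.RUnlock()
	keys := make([]string, 0, len(m.byValue[value]))
	for key := range m.byValue[value] {
		keys = append(keys, key)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	idx := newMemoryIndex()
	idx.Add(IndexRecord{Key: "/registry/pods/a", Value: "app=web"})
	idx.Add(IndexRecord{Key: "/registry/pods/b", Value: "app=web"})
	fmt.Println(idx.ListKeys("app=web"))
}
```

The read-write mutex lets concurrent queries proceed while updates from the background are serialized.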

An IndexManager is responsible for managing Index objects.

type IndexManager interface {
  // Index returns the Index for the given location and indexer
  Index(location string, indexer Indexer) (Index, error)
}
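The Identifier comment earlier says the manager uses it to avoid managing duplicate indexes. A minimal sketch of that deduplication follows, assuming the pair of location and identifier is the uniqueness key (the proposal does not state this explicitly). `memoryIndexManager`, `stubIndex`, and `stubIndexer` are hypothetical, and the Index interface is trimmed to one method to keep the example short.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type IndexRecord struct {
	Key   string
	Value string
}

type Indexer interface {
	Identifier() string
	Reduce(object interface{}) ([]IndexRecord, error)
}

// Index is trimmed to a single method for this sketch.
type Index interface {
	ListKeys(value string) []string
}

// stubIndex is a placeholder; a real index would be populated by
// watching the given location in the background.
type stubIndex struct {
	location string
}

func (s *stubIndex) ListKeys(value string) []string { return nil }

// stubIndexer is a placeholder indexer that emits no records.
type stubIndexer struct {
	id string
}

func (s stubIndexer) Identifier() string { return s.id }

func (s stubIndexer) Reduce(object interface{}) ([]IndexRecord, error) { return nil, nil }

// memoryIndexManager hands out one index per (location, identifier) pair.
type memoryIndexManager struct {
	mu      sync.Mutex
	indexes map[string]Index
}

func newMemoryIndexManager() *memoryIndexManager {
	return &memoryIndexManager{indexes: make(map[string]Index)}
}

func (m *memoryIndexManager) Index(location string, indexer Indexer) (Index, error) {
	if indexer == nil {
		return nil, errors.New("index manager: indexer must not be nil")
	}
	// A NUL separator keeps distinct pairs from colliding.
	id := location + "\x00" + indexer.Identifier()

	m.mu.Lock()
	defer m.mu.Unlock()
	if existing, ok := m.indexes[id]; ok {
		return existing, nil
	}
	created := &stubIndex{location: location}
	m.indexes[id] = created
	return created, nil
}

func main() {
	manager := newMemoryIndexManager()
	first, _ := manager.Index("/registry", stubIndexer{id: "uid"})
	second, _ := manager.Index("/registry", stubIndexer{id: "uid"})
	fmt.Println("same index reused:", first == second)
}
```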

Example:

I want to index all resources by Uid

indexManager := ...
index, err := indexManager.Index("/registry", indexer.NewUidIndexer())

I want to index policy by members

indexManager := ...
// this would create an index record per unique member
indexer := indexer.NewMembersIndexer()
index, err := indexManager.Index("/registry/policy", indexer)
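To show what "an index record per unique member" could look like, here is a sketch of a members indexer. The `policy` type, including its `Key` and `Members` fields, is a hypothetical shape invented for this example; the proposal does not define the policy object.

```go
package main

import (
	"errors"
	"fmt"
)

type IndexRecord struct {
	Key   string
	Value string
}

// policy is a hypothetical object shape used only for illustration.
type policy struct {
	Key     string
	Members []string
}

// membersIndexer emits one record per unique member of a policy.
type membersIndexer struct{}

func (membersIndexer) Identifier() string { return "members" }

func (membersIndexer) Reduce(object interface{}) ([]IndexRecord, error) {
	p, ok := object.(policy)
	if !ok {
		return nil, errors.New("membersIndexer: unsupported object type")
	}
	seen := make(map[string]bool, len(p.Members))
	records := make([]IndexRecord, 0, len(p.Members))
	for _, member := range p.Members {
		if seen[member] {
			continue // skip duplicates so each member yields one record
		}
		seen[member] = true
		records = append(records, IndexRecord{Key: p.Key, Value: member})
	}
	return records, nil
}

func main() {
	records, err := membersIndexer{}.Reduce(policy{
		Key:     "/registry/policy/admins",
		Members: []string{"alice", "bob", "alice"},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range records {
		fmt.Printf("%s -> %s\n", r.Value, r.Key)
	}
}
```

With records shaped this way, asking the index for the keys that match a member name returns every policy that lists that member.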
