The Kubernetes cluster uses etcd as its primary data store.
High level goals:
- Support the ability to look up an API object by something other than its key, e.g. uid
- Support the ability to look up a list of API objects whose field matches a particular value, e.g. label, status, etc.
Actors:
- k8s admin - administers a Kubernetes cluster
- k8s registry - provides
- k8s user - uses a Kubernetes cluster to perform tasks
User stories:
- Ability to build an index over a set of API objects given a projection rule
- Ability to query an index for the API object keys that match a search key
In k8s, these requirements are intended to be fulfilled by the registry tier, and storage implementations are expected to deliver optimal performance.
At this point, the primary storage location for k8s is etcd. Unlike traditional RDBMS systems or document-oriented data stores, etcd cannot answer a query that filters resources on a field value without a linear scan over the candidate resources.
As a result, we need a solution that the k8s etcd registry can leverage to improve system performance.
Keep in mind that if we chose an alternate data store that natively supported indexing, this solution would not be used. It exists only to optimize queries in the etcd registry. We anticipate that SQL-based stores would work without additional local indexes.
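To make the cost described above concrete, here is a minimal Go sketch that contrasts a linear scan with a lookup in a prebuilt index. The `object` type, `scanByLabel`, and `buildLabelIndex` are hypothetical names used only for illustration; they are not part of this proposal or of Kubernetes.

```go
package main

import "fmt"

// object is a hypothetical stand-in for a stored API object.
type object struct {
	Key    string
	Labels map[string]string
}

// scanByLabel visits every object on every query, so each query costs O(n).
func scanByLabel(objects []object, name, value string) []string {
	var keys []string
	for _, o := range objects {
		if v, ok := o.Labels[name]; ok && v == value {
			keys = append(keys, o.Key)
		}
	}
	return keys
}

// buildLabelIndex pays the O(n) cost once and maps "name=value" to the
// keys of the matching objects, so later queries are a single map lookup.
func buildLabelIndex(objects []object) map[string][]string {
	index := map[string][]string{}
	for _, o := range objects {
		for name, value := range o.Labels {
			entry := name + "=" + value
			index[entry] = append(index[entry], o.Key)
		}
	}
	return index
}

func main() {
	objects := []object{
		{Key: "/registry/pods/a", Labels: map[string]string{"tier": "web"}},
		{Key: "/registry/pods/b", Labels: map[string]string{"tier": "db"}},
		{Key: "/registry/pods/c", Labels: map[string]string{"tier": "web"}},
	}
	fmt.Println(scanByLabel(objects, "tier", "web"))
	fmt.Println(buildLabelIndex(objects)["tier=web"])
}
```

Both calls return the same keys; the difference is that the index has to be kept in sync with the store, which is the job of the components described next.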
Implementation strategy:
An individual Index manages a set of IndexRecord objects that correlate a particular Value to a Key in a data store.
// IndexRecord is the individual entity managed by an Index
type IndexRecord struct {
// Key is the location in the repository that correlates to this record
Key string
// Value is the value that is inserted into the index
Value string
}
An Indexer is responsible for projecting an object into a set of IndexRecord rows.
// An indexer is responsible for projecting a node into a set of IndexRecord objects
type Indexer interface {
// Identifier is the unique label that defines this indexer, used by IndexManager to avoid duplicate indexes being managed
Identifier() string
// Reduce projects a node into zero-or-more IndexRecord objects
Reduce(object interface{}) ([]IndexRecord, error)
// TODO need method to get an IndexRecord.Key given an input object, what to do when etcd log resets??
}
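As an illustration of the Indexer contract, the sketch below implements a UID-based indexer. The `resource` type and `uidIndexer` are hypothetical; how a real indexer would obtain the storage key of an object is left open by the TODO above, so here the key is simply assumed to be carried on the object.

```go
package main

import (
	"errors"
	"fmt"
)

// IndexRecord mirrors the struct defined in this proposal.
type IndexRecord struct {
	Key   string
	Value string
}

// resource is a hypothetical stand-in for an API object. Carrying the
// storage key on the object is an assumption made for this sketch.
type resource struct {
	Key string
	UID string
}

// uidIndexer is a hypothetical Indexer that emits one record per object.
type uidIndexer struct{}

// Identifier returns the label that distinguishes this indexer.
func (uidIndexer) Identifier() string { return "uid" }

// Reduce projects a resource into at most one IndexRecord, keyed by UID.
func (uidIndexer) Reduce(object interface{}) ([]IndexRecord, error) {
	r, ok := object.(resource)
	if !ok {
		return nil, errors.New("uidIndexer: unsupported object type")
	}
	if r.UID == "" {
		return nil, nil // nothing to index
	}
	return []IndexRecord{{Key: r.Key, Value: r.UID}}, nil
}

func main() {
	records, err := uidIndexer{}.Reduce(resource{Key: "/registry/pods/a", UID: "1234"})
	fmt.Println(records, err)
}
```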
An Index is used to traverse the set of IndexRecord objects that conform to a particular value. Index objects are live-updated in response to changes in the repository in the background.
type Index interface {
// ContainsKey returns true if an IndexRecord exists with the specified key
ContainsKey(key string) bool
// ContainsValue returns true if an IndexRecord exists with the specified value
ContainsValue(value string) bool
// ListIndexRecords returns a list of IndexRecord objects that conform to the specified value
ListIndexRecords(value string) []IndexRecord
// ListKeys returns a list of keys that match the specified value
ListKeys(value string) []string
}
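One possible in-memory shape for an Index is a map from value to the set of keys that carry it. The sketch below is an assumption, not the proposed implementation: `memoryIndex` and `newMemoryIndex` are hypothetical names, the two lookup methods are called `ContainsKey` and `ContainsValue` because Go does not allow two methods with the same name on one type, and live updates from the repository are omitted.

```go
package main

import (
	"fmt"
	"sort"
)

// IndexRecord mirrors the struct defined in this proposal.
type IndexRecord struct {
	Key   string
	Value string
}

// memoryIndex is a hypothetical index that maps each value to a set of keys.
type memoryIndex struct {
	byValue map[string]map[string]struct{}
}

// newMemoryIndex builds the index once from a fixed list of records.
func newMemoryIndex(records []IndexRecord) *memoryIndex {
	idx := &memoryIndex{byValue: map[string]map[string]struct{}{}}
	for _, r := range records {
		if idx.byValue[r.Value] == nil {
			idx.byValue[r.Value] = map[string]struct{}{}
		}
		idx.byValue[r.Value][r.Key] = struct{}{}
	}
	return idx
}

// ContainsValue is a single map lookup.
func (i *memoryIndex) ContainsValue(value string) bool {
	return len(i.byValue[value]) > 0
}

// ContainsKey walks every value; a second map keyed by Key would avoid this.
func (i *memoryIndex) ContainsKey(key string) bool {
	for _, keys := range i.byValue {
		if _, ok := keys[key]; ok {
			return true
		}
	}
	return false
}

// ListKeys returns the matching keys in sorted order for stable output.
func (i *memoryIndex) ListKeys(value string) []string {
	keys := make([]string, 0, len(i.byValue[value]))
	for k := range i.byValue[value] {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// ListIndexRecords rebuilds the records for a value from the stored keys.
func (i *memoryIndex) ListIndexRecords(value string) []IndexRecord {
	keys := i.ListKeys(value)
	records := make([]IndexRecord, 0, len(keys))
	for _, k := range keys {
		records = append(records, IndexRecord{Key: k, Value: value})
	}
	return records
}

func main() {
	idx := newMemoryIndex([]IndexRecord{
		{Key: "/registry/policy/a", Value: "alice"},
		{Key: "/registry/policy/b", Value: "alice"},
		{Key: "/registry/policy/b", Value: "bob"},
	})
	fmt.Println(idx.ListKeys("alice"))
	fmt.Println(idx.ContainsValue("carol"))
}
```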
An IndexManager is responsible for managing Index objects.
type IndexManager interface {
// Index returns the Index built by the specified indexer over the specified location
Index(location string, indexer Indexer) (Index, error)
}
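The Identifier method exists so that the manager can avoid managing duplicate indexes. A minimal sketch of that behavior, assuming the manager caches one Index per (location, identifier) pair, might look like the following. `cachingManager`, `stubIndex`, and `namedIndexer` are hypothetical, the Index interface is trimmed to one method, and the manager returns the interface value directly rather than a pointer to it.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// IndexRecord and Indexer mirror the definitions in this proposal.
type IndexRecord struct {
	Key   string
	Value string
}

type Indexer interface {
	Identifier() string
	Reduce(object interface{}) ([]IndexRecord, error)
}

// Index is trimmed to one method to keep the sketch short.
type Index interface {
	ListKeys(value string) []string
}

// stubIndex is a hypothetical placeholder; a real index would be populated
// from the data store and kept up to date in the background.
type stubIndex struct{ id string }

func (*stubIndex) ListKeys(value string) []string { return nil }

// namedIndexer is a hypothetical Indexer that only carries an identifier.
type namedIndexer string

func (n namedIndexer) Identifier() string { return string(n) }

func (namedIndexer) Reduce(object interface{}) ([]IndexRecord, error) { return nil, nil }

// cachingManager hands out one Index per (location, identifier) pair.
type cachingManager struct {
	mu      sync.Mutex
	indexes map[string]Index
}

func newCachingManager() *cachingManager {
	return &cachingManager{indexes: map[string]Index{}}
}

// Index returns the existing index for the pair, or registers a new one.
func (m *cachingManager) Index(location string, indexer Indexer) (Index, error) {
	if indexer == nil {
		return nil, errors.New("indexer must not be nil")
	}
	id := location + "|" + indexer.Identifier()
	m.mu.Lock()
	defer m.mu.Unlock()
	if idx, ok := m.indexes[id]; ok {
		return idx, nil // reuse the index that is already managed
	}
	idx := &stubIndex{id: id}
	m.indexes[id] = idx
	return idx, nil
}

func main() {
	m := newCachingManager()
	a, _ := m.Index("/registry", namedIndexer("uid"))
	b, _ := m.Index("/registry", namedIndexer("uid"))
	c, _ := m.Index("/registry/policy", namedIndexer("uid"))
	fmt.Println(a == b, a == c)
}
```

Requesting the same location with the same indexer twice yields the same Index, while a different location yields a separate one.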
Example:
I want to index all resources by UID
indexManager := ...
index, err := indexManager.Index("/registry", indexer.NewUidIndexer())
I want to index policy by members
indexManager := ...
// this would create an index record per unique member
indexer := indexer.NewMembersIndexer()
index, err := indexManager.Index("/registry/policy", indexer)
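The members example relies on Reduce returning more than one record for a single object. A hedged sketch of such an indexer follows; the `policy` type and its `Members` field are assumptions made for illustration, and the body of `NewMembersIndexer` is not specified in this proposal.

```go
package main

import (
	"errors"
	"fmt"
)

// IndexRecord mirrors the struct defined in this proposal.
type IndexRecord struct {
	Key   string
	Value string
}

// policy is a hypothetical object with a storage key and a member list.
type policy struct {
	Key     string
	Members []string
}

// membersIndexer is a hypothetical Indexer over policy objects.
type membersIndexer struct{}

func (membersIndexer) Identifier() string { return "members" }

// Reduce emits one IndexRecord per unique member, preserving first-seen order.
func (membersIndexer) Reduce(object interface{}) ([]IndexRecord, error) {
	p, ok := object.(policy)
	if !ok {
		return nil, errors.New("membersIndexer: unsupported object type")
	}
	seen := map[string]bool{}
	var records []IndexRecord
	for _, member := range p.Members {
		if seen[member] {
			continue // skip duplicates so each member yields one record
		}
		seen[member] = true
		records = append(records, IndexRecord{Key: p.Key, Value: member})
	}
	return records, nil
}

func main() {
	p := policy{Key: "/registry/policy/a", Members: []string{"alice", "bob", "alice"}}
	records, err := membersIndexer{}.Reduce(p)
	fmt.Println(len(records), err)
}
```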