@dummy-andra
Forked from bastjan/braindump.jira
Created April 14, 2022 08:26
openshift logging elasticsearch tenant log size retention braindump
h2. Summary
*As* a VSHN customer user
*I want* to search and visualise my logs
*So that* I can check my application's health and get help debugging it.
h2. Context
APPUiO Public has 2.4 TB of log volumes for ~10 days of retention.
h2. Alternatives
h3. Elastic Cloud on Kubernetes
I did not find much information on retention.
h3. Graylog
Uses ES under the hood.
h3. Loki
Loki has a great GUI (Grafana Explore View) close to Kibana.
Loki has [time based retention per tenant and stream|https://grafana.com/docs/loki/latest/operations/storage/retention/]. It supports [flexible tenant usage e.g. customer label|https://grafana.com/docs/loki/latest/clients/promtail/stages/tenant/].
Loki has the [APIs|https://grafana.com/docs/loki/latest/operations/storage/logs-deletion/] to implement size-based retention with a custom manager. We built a prototype of this at my previous company.
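A size-based retention manager on top of these APIs could, for example, add up each tenant's ingested bytes per day, walk back from the newest day until the budget is used up, and request deletion of everything older. The sketch below covers only that cutoff computation. The function name, the per-day byte counts, and the budget are hypothetical inputs, and this is not the prototype mentioned above.

```python
from datetime import date
from typing import Dict, Optional


def retention_cutoff(bytes_per_day: Dict[date, int], budget_bytes: int) -> Optional[date]:
    """Return the newest day that no longer fits into the tenant's budget.

    Days are accumulated from newest to oldest. Logs from the returned day
    and all earlier days would be deleted. Returns None if everything fits.
    """
    used = 0
    for day in sorted(bytes_per_day, reverse=True):
        used += bytes_per_day[day]
        if used > budget_bytes:
            return day
    return None
```

The returned day would then become the end of the time range passed to the deletion request for that tenant. Note that a single day larger than the whole budget is itself returned as the cutoff.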
h2. Notes
Kibana/ES [Access Control|https://github.com/openshift/elasticsearch-operator/blob/master/docs/access-control.md]. Roles [here|https://github.com/openshift/origin-aggregated-logging/tree/master/elasticsearch/sgconfig].
{{app}} index is pretty [hard-coded|https://github.com/openshift/origin-aggregated-logging/blob/558bd6478bbb37eb17c06f6fa145d6266f8a5c68/elasticsearch/sgconfig/roles.yml#L143].
The mapper-size plugin is not installed.
With it, document sizes could be calculated from a query: https://www.elastic.co/guide/en/elasticsearch/plugins/7.16/mapper-size-usage.html
{code}
PUT my-index-000002
{
  "mappings": {
    "_doc": {
      "_size": {
        "enabled": true
      }
    }
  }
}

PUT /my-index-000002/_doc/1
{
  "text": "This is a document"
}

PUT /my-index-000002/_doc/2
{
  "text": "This is another document"
}

GET my-index-000002/_search
{
  "query": {
    "range": {
      "_size": {
        "gt": 10
      }
    }
  },
  "aggs": {
    "sizes": {
      "terms": {
        "field": "_size",
        "size": 10
      }
    }
  },
  "sort": [
    {
      "_size": {
        "order": "desc"
      }
    }
  ],
  "script_fields": {
    "size": {
      "script": "doc['_size']"
    }
  }
}
{code}
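For retention decisions, a per-namespace total is more useful than the per-document sizes above. The following sketch builds a search body that sums {{_size}} per namespace and parses the response. It assumes the mapper-size plugin is installed and {{_size}} is enabled, and that the namespace is stored in a field named {{kubernetes.namespace_name}} that supports terms aggregations; that field name and both function names are assumptions. {{_size}} measures the original {{_source}} in bytes, not the space used on disk.

```python
from typing import Any, Dict


def size_per_namespace_query(
    namespace_field: str = "kubernetes.namespace_name",  # assumed field name
    max_namespaces: int = 100,
) -> Dict[str, Any]:
    """Build a search body that sums the _size field per namespace."""
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "aggs": {
            "namespaces": {
                "terms": {"field": namespace_field, "size": max_namespaces},
                "aggs": {"total_bytes": {"sum": {"field": "_size"}}},
            }
        },
    }


def parse_sizes(response: Dict[str, Any]) -> Dict[str, int]:
    """Map each namespace bucket in a search response to its summed bytes."""
    buckets = response["aggregations"]["namespaces"]["buckets"]
    return {b["key"]: int(b["total_bytes"]["value"]) for b in buckets}
```

The dictionary returned by {{size_per_namespace_query()}} can be serialized with {{json.dumps}} and sent as the body of a {{_search}} request.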
OCP has a function to prune namespaces: https://github.com/openshift/elasticsearch-operator/blob/56070e8e0a2154806e14a779c9f954cb0cfdeeae/internal/indexmanagement/scripts.go#L54. This could be extended for custom retention policies.
* With a per-tenant index it would be possible to limit storage size indirectly:
** Roll over every n gigabytes: https://www.elastic.co/guide/en/elasticsearch/client/curator/current/ex_rollover.html
** [Delete indices|https://www.elastic.co/guide/en/elasticsearch/client/curator/current/ex_delete_indices.html] with the [count filter|https://www.elastic.co/guide/en/elasticsearch/client/curator/current/filtertype_count.html].
* [Curator has been removed with no replacement|https://docs.openshift.com/container-platform/4.9/logging/cluster-logging-release-notes.html#openshift-logging-5-1-0-elasticsearch-curator]
* It is not possible to limit indexes to size with the logging operator https://github.com/openshift/cluster-logging-operator/blob/97838b2e64b34711c1a17606637b54e9f91ac553/vendor/github.com/openshift/elasticsearch-operator/apis/logging/v1/index_management_types.go#L51
* Log ingestion in bytes by namespace
{code}
sum (
label_replace(increase(log_collected_bytes_total[24h]), "znamespace", "$1", "path", ".*_(.*)_.*")
)by(znamespace)
{code}
* Newer ES versions have an API to analyze index disk usage: https://www.elastic.co/guide/en/elasticsearch/reference/7.16/indices-disk-usage.html. It does not work in our ES: https://kibana-openshift-logging.apps.lpg1.ocp4-poc.appuio-beta.ch/app/kibana#/dev_tools/console?load_from=https:%2F%2Fwww.elastic.co%2Fguide%2Fen%2Felasticsearch%2Freference%2F7.16%2Fsnippets%2F2108.console&_g=()
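The indirect size limit via rollover plus count-based deletion can be sketched without Curator: if every tenant index rolls over at a fixed size and only the newest {{keep}} indices per tenant survive, a tenant's storage is capped at roughly {{keep}} times the rollover size. The code below only selects the indices to delete. The {{Index}} type, its fields, and the function name are hypothetical, and the caller would have to obtain the index list and issue the deletions itself.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Index(NamedTuple):
    name: str
    tenant: str
    created: int  # creation time as epoch seconds


def indices_to_delete(indices: List[Index], keep: int) -> List[str]:
    """Return the names of all indices beyond the newest `keep` per tenant."""
    if keep < 0:
        raise ValueError("keep must not be negative")
    by_tenant: Dict[str, List[Index]] = defaultdict(list)
    for idx in indices:
        by_tenant[idx.tenant].append(idx)
    doomed: List[str] = []
    for tenant_indices in by_tenant.values():
        newest_first = sorted(tenant_indices, key=lambda i: i.created, reverse=True)
        doomed.extend(i.name for i in newest_first[keep:])
    return sorted(doomed)
```

For example, with a rollover every 10 GB and {{keep=5}}, each tenant would stay at about 50 GB, give or take the size of the index currently being written.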
h2. Out of Scope
* List aspects that are not part of this story
h2. Further links
* URLs of relevant Git repositories, PRs, other starting points.
h2. Acceptance criteria
* Elasticsearch logging is enabled
* RBAC rules so customers can view their logs
* Announce in [discuss.vshn.net|https://discuss.vshn.net]
h2. Implementation Ideas
1. Install the mapper-size plugin and add [custom index rollover and deletion scripts based on the {{_size}} query|https://github.com/openshift/elasticsearch-operator/blob/56070e8e0a2154806e14a779c9f954cb0cfdeeae/internal/indexmanagement/scripts.go#L54].
2. Patch the fluentd deployment to produce one index per customer, and add custom index rollover and deletion scripts.