@velavokr
Last active June 17, 2024 07:20
Thanos and Prometheus helm configuration

This is the configuration of Thanos and kube-prometheus-stack on which thanos-io/thanos#7466 occurs.
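Both values files below reference an objstore secret by name (`prom-stack-thanos-objstore-secret`, key `objstore.yml`). A minimal sketch of what such a secret might look like, assuming an S3 bucket with SDK-based auth; the bucket and endpoint values are placeholders, not part of this configuration:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: prom-stack-thanos-objstore-secret
  namespace: monitoring
type: Opaque
stringData:
  objstore.yml: |
    type: s3
    config:
      bucket: example-thanos-bucket          # placeholder
      endpoint: s3.us-east-1.amazonaws.com   # placeholder
      aws_sdk_auth: true
```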

---
name: prom-stack
version: 0.5.0
apiVersion: v2
home: https://coreos.com/operators/prometheus/docs/latest/user-guides/getting-started.html
sources:
  - https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
  - https://github.com/prometheus-operator/prometheus-operator
  - https://github.com/kubernetes-monitoring/kubernetes-mixin
  - https://github.com/grafana/grafana
dependencies:
  - name: kube-prometheus-stack
    version: 57.2.0
    repository: https://prometheus-community.github.io/helm-charts/
---
# all default values: https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/values.yaml
global:
  env: ""
  cluster_type: eks
kube-prometheus-stack:
  crds:
    enabled: true
  fullnameOverride: prom-stack
  ## Component scraping coreDns. Use either this or kubeDns.
  # Note: We install and monitor coreDNS separately.
  coreDns:
    enabled: false
  ## Defaults corrected to suit our cluster's components.
  kubeControllerManager:
    enabled: false
  kubeScheduler:
    enabled: false
  ## Design
  # https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/design.md
  ## Schema
  # https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/custom-metrics-elements.png
  ## API docs
  # https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md
  prometheus:
    podDisruptionBudget:
      enabled: true
    service:
      # Session stickiness for grafana when replicas > 1
      sessionAffinity: ClientIP
      additionalPorts:
        # For dnsDiscovery to work with the thanos sidecar
        - name: grpc
          port: 10901
          targetPort: grpc
    prometheusSpec:
      image:
        repository: prometheus
      logFormat: json
      replicas: 2
      retention: 3d
      podAntiAffinity: hard
      podAntiAffinityTopologyKey: topology.kubernetes.io/zone
      replicaExternalLabelName: replica
      priorityClassName: high-priority
      podDisruptionBudget:
        enabled: true
        minAvailable: 1
      resources:
        requests:
          cpu: 4
          memory: 8Gi
        limits:
          cpu: 4
          memory: 8Gi
      storageSpec:
        volumeClaimTemplate:
          spec:
            storageClassName: gp3-retain
            resources:
              requests:
                storage: 70Gi
      ## Thanos configuration allows configuring various aspects of a Prometheus server in a Thanos environment.
      ## This section is experimental; it may change significantly without deprecation notice in any release.
      ## ref: https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#thanosspec
      ##
      thanos:
        # Needs to be non-null, otherwise the operator throws "spec.image in body must be of type string"
        image: bitnami/thanos:v0.34.1
        # Reference the Thanos objstore config secret.
        objectStorageConfig:
          existingSecret:
            key: objstore.yml # type: s3
            name: prom-stack-thanos-objstore-secret
        resources:
          requests:
            cpu: 100m
            memory: 1Gi
          limits:
            # Overriding the default
            cpu: 200m
            memory: 1Gi
      # Leave this job to the Thanos compactor
      disableCompaction: true
      # --web.enable-admin-api is enabled so the sidecar can get metadata from Prometheus, such as external labels.
      enableAdminAPI: true
      ## If true, a nil or {} value for prometheus.prometheusSpec.serviceMonitorSelector will cause the
      ## prometheus resource to be created with selectors based on values in the helm deployment,
      ## which will also match the servicemonitors created
      ##
      # https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack#prometheusioscrape
      serviceMonitorSelectorNilUsesHelmValues: false
      ## ServiceMonitors to be selected for target discovery.
      ## If {}, select all ServiceMonitors
      ##
      serviceMonitorSelector: {}
      ## If {}, select all namespaces
      ##
      serviceMonitorNamespaceSelector: {}
      ## The same for the PodMonitor and PrometheusRule CRDs
      podMonitorSelectorNilUsesHelmValues: false
      podMonitorSelector: {}
      podMonitorNamespaceSelector: {}
      ruleSelectorNilUsesHelmValues: false
      ruleSelector:
        matchExpressions:
          - key: thanos
            operator: DoesNotExist
      ruleNamespaceSelector: {}
      probeSelectorNilUsesHelmValues: true
      probeSelector: {}
      probeNamespaceSelector: {}
      ## See examples in the operator chart's values.yaml
      additionalScrapeConfigs: []
      additionalAlertManagerConfigs: []
      additionalAlertRelabelConfigs: []
    additionalServiceMonitors:
      # Job name should be unique
      - name: sm-default
        # Label selector for services to which this ServiceMonitor applies
        selector:
          matchLabels:
            # Put this label on a Service manifest to use the default ServiceMonitor
            servicemonitor: default
        namespaceSelector:
          any: true
        # Endpoints of the selected service to be monitored
        endpoints:
          # Name of the endpoint's service port
          - port: metrics
            path: /metrics
            scheme: http
    # In case monitoring targets don't expose any Services.
    additionalPodMonitors:
      - name: pm-default
        # Label selector for pods to which this PodMonitor applies
        selector:
          matchLabels:
            podmonitor: default
        namespaceSelector:
          any: true
        # Endpoints of the selected pods to be monitored
        podMetricsEndpoints:
          - port: metrics
            path: /metrics
            scheme: http
      - name: thanos-sidecar
        selector:
          matchLabels:
            operator.prometheus.io/name: prom-stack-prometheus
        podMetricsEndpoints:
          - port: http
            path: /metrics
            scheme: http
  prometheusOperator:
    priorityClassName: high-priority
    image:
      repository: prometheus-operator
    resources:
      requests:
        cpu: 100m
        memory: 256Mi
      limits:
        memory: 256Mi
    tlsProxy:
      resources:
        requests:
          cpu: 50m
          memory: 32Mi
        limits:
          cpu: 100m
          memory: 32Mi
    ## Set the prometheus config reloader resources
    ##
    configReloaderCpu: 50m
    configReloaderMemory: 32Mi
    prometheusConfigReloader:
      image:
        repository: prometheus-config-reloader
  prometheus-node-exporter:
    priorityClassName: daemonset
    image:
      repository: node-exporter
    resources:
      requests:
        cpu: 50m
        memory: 64Mi
      limits:
        memory: 64Mi
    # Overwrite the child chart tolerations because we want this tool on every node, including ones with NoExecute taints
    tolerations:
      - operator: Exists
  kube-state-metrics:
    image:
      repository: kube-state-metrics
    prometheus:
      monitor:
        resourcePath: /metrics/resource
        metricRelabelings:
          - regex: container_id
            action: labeldrop
          - regex: kube_pod_info
            action: labeldrop
    priorityClassName: high-priority
    metricLabelsAllowlist:
      - nodes=[*],jobs=[cronjob,cluster],pods=[*],statefulsets=[*],deployments=[*]
    resources:
      requests:
        cpu: 50m
        memory: 512Mi
      limits:
        cpu: 50m
        memory: 512Mi
    releaseLabel: true
    collectors:
      - cronjobs
      - daemonsets
      - deployments
      - horizontalpodautoscalers
      - jobs
      - namespaces
      - nodes
      - persistentvolumeclaims
      - persistentvolumes
      - pods
      - replicasets
      - statefulsets
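With the `sm-default` ServiceMonitor above, an application opts into scraping simply by labelling its Service `servicemonitor: default` and exposing a port named `metrics`. A hypothetical example (the name, namespace, and port number are placeholders):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app            # placeholder
  namespace: my-namespace # placeholder
  labels:
    servicemonitor: default  # picked up by the sm-default ServiceMonitor
spec:
  selector:
    app: my-app
  ports:
    - name: metrics  # must match the endpoint port name in sm-default
      port: 8080
      targetPort: 8080
```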
---
name: thanos
version: 0.4.2
apiVersion: v2
home: https://thanos.io/
sources:
  - https://github.com/bitnami/charts/tree/master/bitnami/thanos
  - https://github.com/thanos-io/thanos
  - https://github.com/bitnami/bitnami-docker-thanos
dependencies:
  - name: thanos
    version: 15.0.5
    repository: https://charts.bitnami.com/bitnami
---
thanosTargetGroupBinding:
  enabled: false
thanos:
  image:
    repository: thanos
  objstoreConfig: |-
    type: s3
    config:
      aws_sdk_auth: true
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
  compactor:
    logFormat: json
    enabled: true
    retentionResolutionRaw: 14d
    retentionResolution5m: 60d
    retentionResolution1h: 180d
    extraFlags:
      - --deduplication.replica-label=replica
    updateStrategy:
      type: Recreate
    containerSecurityContext:
      readOnlyRootFilesystem: true
    resources:
      requests:
        cpu: 500m
        memory: 4Gi
      limits:
        memory: 4Gi
    persistence:
      size: 300Gi
      storageClass: magnetic-retain
    networkPolicy:
      enabled: false
    serviceAccount:
      create: true
      automountServiceAccountToken: false
  storegateway:
    logFormat: json
    enabled: true
    replicaCount: 1
    extraFlags:
      - --min-time=-30d
      - --max-time=-6d
      - --store.enable-index-header-lazy-reader
    service:
      additionalHeadless: true
      clusterIP: None
    containerSecurityContext:
      readOnlyRootFilesystem: true
    persistence:
      size: 8Gi
    resources:
      requests:
        cpu: 200m
        memory: 6Gi
      limits:
        memory: 6Gi
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app.kubernetes.io/component: storegateway
                app.kubernetes.io/name: thanos
            topologyKey: topology.kubernetes.io/zone
    networkPolicy:
      enabled: false
    serviceAccount:
      create: true
      automountServiceAccountToken: false
  query:
    logFormat: json
    extraFlags:
      - --query.auto-downsampling
    replicaLabel:
      - replica
    dnsDiscovery:
      enabled: true
      sidecarsService: prom-stack-prometheus
      sidecarsNamespace: monitoring
    stores:
      - dnssrv+_grpc._tcp.thanos-tenants-receive-headless.monitoring.svc
      - dnssrv+_grpc._tcp.thanos-tenants-storegateway.monitoring.svc
    replicaCount: 10
    pdb:
      create: true
      minAvailable: false
      maxUnavailable: 1
    pspEnabled: false
    containerSecurityContext:
      readOnlyRootFilesystem: true
    rbac:
      create: true
    resources:
      requests:
        cpu: 1
        memory: 24Gi
      limits:
        memory: 24Gi
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app.kubernetes.io/component: query
                app.kubernetes.io/name: thanos
            topologyKey: kubernetes.io/hostname
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app.kubernetes.io/component: query
                  app.kubernetes.io/name: thanos
              topologyKey: topology.kubernetes.io/zone
    networkPolicy:
      enabled: false
    serviceAccount:
      create: true
      automountServiceAccountToken: false
  queryFrontend:
    logFormat: json
    replicaCount: 2
    extraFlags:
      # Maximum number of labels requests that will be scheduled in parallel by the frontend.
      - --labels.max-query-parallelism=40
      # Split labels requests by an interval and execute in parallel;
      # should be greater than 0 when labels.response-cache-config is configured.
      - --labels.split-interval=1h
      # Alternative to the 'labels.response-cache-config-file' flag (mutually exclusive).
      # Content of the YAML file that contains the response cache configuration.
      - '--labels.response-cache-config={type: IN-MEMORY, config: {max_size: 3GB}}'
      # Log queries that are slower than the specified duration.
      # Set to 0 to disable. Set to < 0 to enable on all queries.
      - --query-frontend.log-queries-longer-than=1m
      # Number of shards to use when distributing shardable PromQL queries.
      # For more details, see the vertical query sharding proposal:
      # https://thanos.io/tip/proposals-accepted/202205-vertical-query-sharding.md
      - --query-frontend.vertical-shards=10
      # Mutate incoming queries to align their start and end with their step for better cache-ability.
      # Note: Grafana dashboards do this by default.
      - --query-range.align-range-with-step
      # Maximum number of query range requests that will be scheduled in parallel by the frontend.
      - --query-range.max-query-parallelism=40
      # Split query range requests by an interval and execute in parallel;
      # should be greater than 0 when query-range.response-cache-config is configured.
      - --query-range.split-interval=1h
      # Alternative to the 'query-range.response-cache-config-file' flag (mutually exclusive).
      # Content of the YAML file that contains the response cache configuration.
      - '--query-range.response-cache-config={type: IN-MEMORY, config: {max_size: 3GB}}'
      # Use compression in the results cache. Supported values are 'snappy' and '' (disable compression).
      - --cache-compression-type=snappy
      - --query-frontend.compress-responses
    pdb:
      create: true
      minAvailable: false
      maxUnavailable: 1
    pspEnabled: false
    containerSecurityContext:
      readOnlyRootFilesystem: true
    rbac:
      create: true
    resources:
      requests:
        cpu: 2
        memory: 8Gi
      limits:
        memory: 8Gi
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app.kubernetes.io/component: query-frontend
                app.kubernetes.io/name: thanos
            topologyKey: topology.kubernetes.io/zone
    networkPolicy:
      enabled: false
    serviceAccount:
      create: true
      automountServiceAccountToken: false
    containerPorts:
      http: "10902"
  receive:
    enabled: false
  bucketweb:
    enabled: false
  networkPolicy:
    enabled: false
# Settings for external grafana.
grafana:
  datasource:
    enabled: true
    # Label for the grafana sidecar container (defined in the prometheus stack) which loads datasources.
    label: grafana_datasource
  dashboards:
    enabled: true
    # Label for the grafana sidecar container (defined in the prometheus stack) which loads dashboards.
    label: grafana_dashboard
queryFrontendHeadless:
  enabled: true
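The inline `--labels.response-cache-config` and `--query-range.response-cache-config` flags above use the compact form of the cache configuration. The same settings could instead be supplied via the corresponding `*-file` flags (the two are mutually exclusive); expanded into a standalone YAML file, they read:

```yaml
# Equivalent of {type: IN-MEMORY, config: {max_size: 3GB}}
type: IN-MEMORY
config:
  max_size: 3GB
```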