@asw101
Created January 21, 2022 01:34

Store Prometheus Metrics with Thanos, Azure Storage and Azure Kubernetes Service (AKS)

In our previous article we explored how to deploy Grafana and Prometheus on AKS. While Prometheus is an excellent way to monitor your AKS cluster, it is not designed to retain metrics long term or to let us query historical data.

Prometheus is a metrics collector: it can gather metrics from almost any source and generate graphs from them. It is not, however, a long-term data store, and is therefore not focused on storing data for historical purposes. Prometheus ships with its own storage solution, supporting both on-disk and remote locations, and while many options are configurable, its data-management capabilities are limited. This is where Thanos can help.

There is also a configuration overhead when we want to scale our Prometheus deployment and make it highly available. This generally involves a federated set-up and a shared storage solution which, in Kubernetes, is usually accomplished with a shared PersistentVolume.

What is Thanos?

Thanos is an open-source Prometheus setup with long-term storage capabilities. It is not a new implementation of Prometheus, but a pre-built setup designed for production environments where long-term storage is needed.

Storing metrics data for long-term use requires storage that is optimized for that purpose. Long-term storage may require unlimited retention periods, with ever-growing storage requirements.

We will need to address two issues. First, unlimited data requires storage that scales accordingly. Second, the more data we have, the slower querying becomes; downsampling and compaction techniques are used to reduce the size of the data and improve query times.

Thanos provides all of this and more out of the box in a single binary. You don’t need to enable every feature; you can run just a subset of Thanos components in your cluster.

Thanos currently depends on Prometheus v2.2.1+ and, optionally, an object store if you want to keep your data in a remote location. Several object storage clients are supported; we will be using Azure Blob Storage.

Thanos Architecture

Thanos is composed of several components:

  • Thanos Sidecar: Runs alongside the Prometheus server to gather the metrics that are stored on its disk. It is composed of a Store API and a Shipper; the Shipper is responsible for uploading the metrics to object storage.

  • Thanos Store Gateway: This component is responsible for querying the object storage and exposing a StoreAPI that is queried by the other components.

  • Query Layer: Provides the components required to query the data, including the web UI and the API.

  • Compactor: Reads from object storage and compacts data that has not yet been compacted. It is completely independent of the other components.

  • Ruler: Evaluates recording and alerting rules against Thanos data and sends any resulting alerts to the Prometheus Alertmanager.

The overall architecture can be described as follows (via the Thanos Quick Tutorial).

(Image: Thanos overview of architecture)

Thanos uses a mix of HTTP and gRPC: HTTP requests are mostly used to query Prometheus, while gRPC is mostly used within Thanos' Store API.

We'll be deploying our metrics server to an Azure Kubernetes Service (AKS) cluster. If you don’t have a running AKS cluster, take a look at the quickstart to Deploy an Azure Kubernetes Service cluster using the Azure CLI.

We’ll also need an Azure Storage account; you can create one using the Azure Portal or the Azure CLI. You will also need the storage account access key, which can likewise be retrieved using the Azure CLI.

Create a storage account.

az storage account create --name <name> --resource-group <resource-group>

Create a storage container called metrics.

az storage container create --name metrics --account-name <name>

Retrieve the storage account access key for later use.

az storage account keys list --account-name <name> --resource-group <resource-group> -o tsv --query "[0].value"
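
For convenience, you can capture these values in shell variables for use in later steps (the variable names here are just for illustration):

STORAGE_ACCOUNT=<name>
ACCOUNT_KEY=$(az storage account keys list --account-name $STORAGE_ACCOUNT --resource-group <resource-group> -o tsv --query "[0].value")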

Thanos is designed to scale and extend vanilla Prometheus. Start by creating the Prometheus configuration that you will use to deploy the kube-prometheus-stack.

First, create a file called prometheus.yaml.

# prometheus.yaml
grafana:
  adminPassword: admin  # optional; the chart's default password is prom-operator

prometheus:
  # create a Service exposing the Thanos sidecar's gRPC endpoint for discovery
  thanosService:
    enabled: true

  # scrape the sidecar's own metrics
  thanosServiceMonitor:
    enabled: true
    interval: 5s

  prometheusSpec:
    thanos:
      # secret holding the object storage configuration (created below)
      objectStorageConfig:
        key: thanos.yaml
        name: thanos-objstore-config

prometheusOperator:
  # sidecar image injected by the operator; both version and tag are set so
  # either key works regardless of the chart version
  thanosImage:
    repository: quay.io/thanos/thanos
    version: v0.23.0
    tag: v0.23.0

kubelet:
  serviceMonitor:
    https: false

In this file, we’re first configuring the Prometheus Operator to use the Thanos image, and we’re setting a password for the Grafana admin user (this is optional; the default password is prom-operator). We’re also enabling the Thanos Service and its ServiceMonitor (for Thanos metrics).

The thanos key holds the object storage configuration: the remote location that Thanos will upload the metrics to. In our case, this is the Azure Storage account we created earlier. The full set of Thanos options for Azure object storage is covered in the Thanos documentation, but we’ll only need a handful of them.

Create a thanos.yaml file locally.

# thanos.yaml
type: AZURE
config:
  storage_account: '<storage-account-name>'
  storage_account_key: '<storage-account-key>'
  container: 'metrics'

Replace <storage-account-name> with your storage account name and <storage-account-key> with the storage account access key you retrieved earlier.
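
Alternatively, if you set the STORAGE_ACCOUNT and ACCOUNT_KEY variables in the earlier sketch, you can generate the file from them instead of editing it by hand:

cat > thanos.yaml <<EOF
type: AZURE
config:
  storage_account: '${STORAGE_ACCOUNT}'
  storage_account_key: '${ACCOUNT_KEY}'
  container: 'metrics'
EOF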

Make sure you are authenticated to your Azure Kubernetes Service cluster (e.g. via az aks get-credentials) before running the kubectl commands below.

Create a new namespace called monitoring.

kubectl create ns monitoring

Make sure you are in the same directory as thanos.yaml, then create a secret called thanos-objstore-config.

kubectl -n monitoring create secret generic thanos-objstore-config --from-file=thanos.yaml=thanos.yaml
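
You can verify the secret was created correctly (describe shows the key names and sizes without printing the values):

kubectl -n monitoring describe secret thanos-objstore-config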

Add the Helm repo for the Prometheus Community Helm Charts.

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

Run the helm installation for the Prometheus Operator.

helm upgrade --install prometheus prometheus-community/kube-prometheus-stack -n monitoring --values prometheus.yaml

After a while, you’ll see that the Prometheus Operator is installing the Prometheus and Thanos components.

Check the result by listing the pods in the monitoring namespace.

kubectl --namespace monitoring get pods

You’ll see that there are three containers inside the prometheus-prometheus-kube-prometheus-prometheus-0 pod; one of them is thanos-sidecar. By default it uploads every block that Prometheus generates (roughly one every two hours) to our Azure Storage account.
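
To verify the sidecar is shipping blocks, check its logs and list the blobs in the metrics container (this assumes the ACCOUNT_KEY variable from the earlier sketch; the first upload only happens once Prometheus cuts its first block, so it can take up to two hours):

kubectl -n monitoring logs prometheus-prometheus-kube-prometheus-prometheus-0 -c thanos-sidecar
az storage blob list --container-name metrics --account-name <name> --account-key $ACCOUNT_KEY -o table --query "[].name"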

Other components

Installing the sidecar is the first step towards a complete Thanos deployment. For a complete solution, it’s important to also install the Querier, the Store, and the Compactor. Each of these components is installed as a separate workload with its own network configuration.

In the following sections you will create multiple Kubernetes manifests. After you create each file, apply it to your cluster using the kubectl apply command (e.g. kubectl apply -f FILENAME.yaml).

Querier

The Querier is the layer that allows us to query all Prometheus instances at once. It needs a Deployment pointed at all the sidecars, and its own Service so that it can be discovered and used.

Create the querier Deployment (don't forget to run kubectl apply -f querier-deployment.yaml).

# querier-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-query
  namespace: monitoring
  labels:
    app: thanos-query
spec:
  replicas: 1
  selector:
    matchLabels:
      app: thanos-query
  template:
    metadata:
      labels:
        app: thanos-query
    spec:
      containers:
        - name: thanos-query
          image: quay.io/thanos/thanos:v0.23.0
          args:
            - 'query'
            - '--log.level=debug'
            - '--query.replica-label=prometheus_replica'
            - '--store=prometheus-kube-prometheus-thanos-discovery.monitoring.svc:10901'
          resources:
            requests:
              cpu: '100m'
              memory: '64Mi'
            limits:
              cpu: '250m'
              memory: '256Mi'
          ports:
            - name: http
              containerPort: 10902
            - name: grpc
              containerPort: 10901
            - name: cluster
              containerPort: 10900

Note that the image name and version must match the image we deployed via the Helm chart above. When we enabled the thanosService option, the chart created a discovery Service (prometheus-kube-prometheus-thanos-discovery) through which all the Prometheus instances in the cluster can be queried from a single point; this is the address we pass to the Querier's --store flag.
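
You can confirm the discovery Service referenced by the --store flag exists and exposes the sidecar's gRPC port 10901:

kubectl -n monitoring get svc prometheus-kube-prometheus-thanos-discovery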

We also need the Service that will be used to expose the Querier, plus a ServiceMonitor to collect metrics about the component itself. These can be split across multiple files, or delimited with --- in the same file, as we have below.

Create the Service and ServiceMonitor.

# querier-service-servicemonitor.yaml
apiVersion: v1
kind: Service
metadata:
  name: thanos-query
  labels:
    app: thanos-query
    release: prometheus-operator
    jobLabel: thanos
  namespace: monitoring
spec:
  selector:
    app: thanos-query
  ports:
    - port: 9090
      protocol: TCP
      targetPort: http
      name: http-query
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: prom-thanos-query
  namespace: monitoring
  labels:
    # must match the Helm release name so the Prometheus instance's
    # default ServiceMonitor selector picks this up
    release: prometheus
spec:
  jobLabel: thanos
  selector:
    matchLabels:
      app: thanos-query
  namespaceSelector:
    matchNames:
      - 'monitoring'
  endpoints:
    - port: http-query
      path: /metrics
      interval: 5s

The ServiceMonitor tells Prometheus to scrape /metrics via the Service's http-query port, which maps to container port 10902 on the Querier.

Once you apply these manifests, you’ll see that the Querier is being installed.

Take a look at the logs for that deployment.

kubectl -n monitoring logs deploy/thanos-query

You should see output as follows:

level=debug ts=2022-01-01T19:56:05.927270012Z caller=main.go:65 msg="maxprocs: Updating GOMAXPROCS=[1]: using minimum allowed GOMAXPROCS"
ts=2022-01-01T19:56:05.928679219Z caller=log.go:168 level=debug msg="Lookback delta is zero, setting to default value" value=5m0s
level=info ts=2022-01-01T19:56:05.932400335Z caller=options.go:27 protocol=gRPC msg="disabled TLS, key and cert must be set to enable"
level=info ts=2022-01-01T19:56:05.93349084Z caller=query.go:618 msg="starting query node"
level=debug ts=2022-01-01T19:56:05.933743641Z caller=endpointset.go:320 component=endpointset msg="starting to update API endpoints" cachedEndpoints=0
level=debug ts=2022-01-01T19:56:05.933774441Z caller=endpointset.go:323 component=endpointset msg="checked requested endpoints" activeEndpoints=0 cachedEndpoints=0
level=info ts=2022-01-01T19:56:05.933864742Z caller=intrumentation.go:60 msg="changing probe status" status=healthy
level=info ts=2022-01-01T19:56:05.933887542Z caller=http.go:63 service=http/server component=query msg="listening for requests and metrics" address=0.0.0.0:10902
ts=2022-01-01T19:56:05.934207943Z caller=log.go:168 service=http/server component=query level=info msg="TLS is disabled." http2=false
level=info ts=2022-01-01T19:56:05.934254744Z caller=intrumentation.go:48 msg="changing probe status" status=ready
level=info ts=2022-01-01T19:56:05.934281344Z caller=grpc.go:127 service=gRPC/server component=query msg="listening for serving gRPC" address=0.0.0.0:10901
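
You can also port-forward the Querier's Service and open the Thanos Query UI at localhost:9090, where the sidecar should be listed on the Stores page:

kubectl -n monitoring port-forward svc/thanos-query 9090:9090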

Store

The Store runs alongside the Querier and brings the data from our object storage to the queries. It’s composed of a StatefulSet plus the object storage configuration, which we previously created as a secret.

Create a StatefulSet.

# store-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-store
  namespace: monitoring
  labels:
    app: thanos-store
spec:
  serviceName: 'thanos-store'
  replicas: 1
  selector:
    matchLabels:
      app: thanos-store
  template:
    metadata:
      labels:
        app: thanos-store
    spec:
      containers:
        - name: thanos-store
          image: quay.io/thanos/thanos:v0.23.0
          args:
            - 'store'
            - '--log.level=debug'
            - '--data-dir=/var/thanos/store'
            - '--objstore.config-file=/config/thanos.yaml'
          ports:
            - name: http
              containerPort: 10902
            - name: grpc
              containerPort: 10901
            - name: cluster
              containerPort: 10900
          volumeMounts:
            - name: config
              mountPath: /config/
              readOnly: true
            - name: data
              mountPath: /var/thanos/store
      volumes:
        - name: data
          emptyDir: {}
        - name: config
          secret:
            secretName: thanos-objstore-config

Create a ServiceMonitor.

# store-servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: thanos-store
  namespace: monitoring
  labels:
    # must match the Helm release name (prometheus) so this ServiceMonitor is selected
    release: prometheus
spec:
  jobLabel: thanos
  endpoints:
    - port: http
      path: /metrics
      interval: 30s
  selector:
    matchLabels:
      app: thanos-store
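
Note that nothing exposes the Store yet: the ServiceMonitor above selects a Service labelled app: thanos-store, but no such Service has been created, and the Querier only points at the sidecar discovery Service, so historical data in object storage is not reachable. A minimal headless Service to fill both gaps might look like the following (this manifest is a sketch and not part of the original setup):

# store-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: thanos-store
  namespace: monitoring
  labels:
    app: thanos-store
spec:
  clusterIP: None
  selector:
    app: thanos-store
  ports:
    - port: 10901
      targetPort: grpc
      name: grpc
    - port: 10902
      targetPort: http
      name: http

With this in place, you could also register the Store with the Querier by adding a second store flag to the querier Deployment's args, e.g. '--store=thanos-store.monitoring.svc:10901', so that queries can reach the data in Azure Storage.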

Compactor

The Compactor is the service that compacts and downsamples historical data. It’s recommended when you have a lot of incoming data, in order to reduce storage requirements and keep queries fast. Like the Store, it is deployed as a StatefulSet (plus a Service and ServiceMonitor), and it consumes the same object storage configuration.

Create the StatefulSet.

# compactor-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-compactor
  namespace: monitoring
  labels:
    app: thanos-compactor
spec:
  serviceName: 'thanos-compactor'
  replicas: 1
  selector:
    matchLabels:
      app: thanos-compactor
  template:
    metadata:
      labels:
        app: thanos-compactor
    spec:
      containers:
        - name: thanos-compactor
          image: quay.io/thanos/thanos:v0.23.0
          args:
            - 'compact'
            - '--log.level=debug'
            - '--data-dir=/var/thanos/store'
            - '--objstore.config-file=/config/thanos.yaml'
            - '--wait'
          ports:
            - name: http
              containerPort: 10902
          volumeMounts:
            - name: config
              mountPath: /config/
              readOnly: true
            - name: data
              mountPath: /var/thanos/store
      volumes:
        - name: data
          emptyDir: {}
        - name: config
          secret:
            secretName: thanos-objstore-config

Create the Service and the ServiceMonitor.

# compactor-service-servicemonitor.yaml
apiVersion: v1
kind: Service
metadata:
  name: thanos-compactor
  labels:
    app: thanos-compactor
  namespace: monitoring
spec:
  selector:
    app: thanos-compactor
  ports:
    - port: 10902
      name: http
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: thanos-compactor
  namespace: monitoring
  labels:
    # must match the Helm release name (prometheus) so this ServiceMonitor is selected
    release: prometheus
spec:
  jobLabel: thanos
  endpoints:
    - port: http
      path: /metrics
      interval: 30s
  selector:
    matchLabels:
      app: thanos-compactor
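
By default the Compactor retains data indefinitely. If you want bounded retention, Thanos exposes per-resolution retention flags that could be appended to the compact args above (the durations here are purely illustrative):

- '--retention.resolution-raw=30d'
- '--retention.resolution-5m=90d'
- '--retention.resolution-1h=1y'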

Querying Thanos

To query Thanos through the stack we installed, port-forward Grafana to your local machine.

kubectl port-forward svc/prometheus-grafana 3000:80 -n monitoring

Open your browser at localhost:3000 and log in with username admin and the password admin that we set in prometheus.yaml.

Open the "Configurations" menu on the left hand side and click "Data sources" then click "Add data source".

Select "Prometheus" under "Time series databases". Name it "Prometheus Thanos" and set the "HTTP URL" to http://thanos-query:9090 and click "Save & test".

Congratulations! You have successfully deployed Thanos to your Azure Kubernetes Service cluster, with data stored in Azure Blob Storage.

You can now click on "Explore" on the left hand-side and select "Prometheus Thanos" from the drop-down menu to submit PromQL queries.
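
For example, assuming the default kube-prometheus-stack targets are being scraped, a query like the following shows the overall request rate of the Kubernetes API server:

sum(rate(apiserver_request_total[5m]))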

Additionally, select the "Prometheus Thanos" data source when using dashboards such as Kubernetes API Server (Dashboards > Browse > General / Kubernetes / API Server).

Don't forget to delete your cluster and storage account with the az group delete command to avoid any ongoing charges.
