Using Prometheus metrics with Service Catalog
We want to expose metrics from the Service Catalog via Prometheus to enable monitoring, track key metrics, and provide
the ability to alert on specific conditions.
Prometheus provides a client API that enables you to register an HTTP handler (i.e. /metrics) that automatically exposes
registered Prometheus metric objects. Many core components within Kubernetes already do this, including cAdvisor, the Kubelet,
the Scheduler, the Proxy, and many more. The Prometheus server can easily be configured with scrape configurations that
poll the /metrics endpoints and provide a centralized UI for discovery and analysis. Advanced analytics and graphing tools
such as Grafana can consume Prometheus data and are often used to augment monitoring.
Kubernetes API servers expose Prometheus metrics out of the box; Prometheus only needs to be configured to scrape them.
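As a minimal sketch of the client API (a hypothetical metric name, not Service Catalog's actual code), registering a metric and exposing it over /metrics with the Go client library looks roughly like this:

    // Minimal sketch: register a counter and expose it on /metrics.
    package main

    import (
        "net/http"

        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/promhttp"
    )

    var requestCount = prometheus.NewCounter(prometheus.CounterOpts{
        Name: "myapp_requests_total", // hypothetical metric name
        Help: "Total number of requests handled.",
    })

    func main() {
        prometheus.MustRegister(requestCount)
        http.Handle("/metrics", promhttp.Handler()) // the scrape endpoint
        http.ListenAndServe(":8080", nil)
    }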
Installing and configuring Prometheus is simple. Use the attached yaml files with the following commands:
kubectl create -f kubernetes-service-account-and-roles.yml
kubectl create -f prometheus.yml
And then port forward to Prometheus so you can hit it in a browser:
kubectl get pods -l app=prometheus -o name | sed 's/^.*\///' | xargs -I{} kubectl port-forward {} 9090:9090 &
and now hit http://localhost:9090 (you should get the Prometheus app). There is a drop-down on the top left part of the
page that shows the available metrics; selecting one or more will graph them. Prometheus provides a powerful SQL-like
querying language called PromQL that can be used to create charts or graphs. The Targets page shows the sources it
is gathering metrics from along with the status of each. This comes in handy when you want to verify that it's collecting
metrics from our newly instrumented Catalog controller.
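For example, once the Catalog metrics shown later in this gist are being scraped, a PromQL query such as sum(servicecatalog_broker_service_class_count) by (broker) graphs the number of service classes per broker.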
The prometheus.yml contains a scrape configuration that matches all pods whose deployment descriptor contains the
prometheus.io/scrape annotation with a value of true. This annotation has been added to our
charts/catalog/templates/controller-manager-deployment.yaml.
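Roughly, the annotation lives on the pod template (a sketch, not the exact chart contents; prometheus.io/port is only needed if the metrics port is not the pod's first declared port):

    spec:
      template:
        metadata:
          annotations:
            prometheus.io/scrape: "true"
            prometheus.io/port: "8080"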
The new file pkg/metrics/metrics.go creates and registers a few Prometheus metrics and establishes a
handler for /metrics.
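An approximate sketch of what that file does, inferred from the metric names in the curl output below (not the actual file contents):

    // Approximate sketch of pkg/metrics/metrics.go.
    package metrics

    import (
        "net/http"

        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/promhttp"
    )

    // Gauge labeled by broker, matching servicecatalog_broker_service_class_count
    // in the sample output below.
    var brokerServiceClassCount = prometheus.NewGaugeVec(
        prometheus.GaugeOpts{
            Name: "servicecatalog_broker_service_class_count",
            Help: "Number of service classes by Broker.",
        },
        []string{"broker"},
    )

    func init() {
        prometheus.MustRegister(brokerServiceClassCount)
        // Expose everything registered above on /metrics.
        http.Handle("/metrics", promhttp.Handler())
    }

The controller would then update the gauge with something like brokerServiceClassCount.WithLabelValues("ups-broker").Set(1).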
Troubleshooting
Not seeing Controller metrics in Prometheus?
1) Within the Prometheus app, view the Targets page and locate the "Pods" section. It should show the Controller pod. If it
doesn't, this most likely means the pod deployment doesn't have the prometheus.io/scrape annotation. You can add the
annotation to a live deployment and Prometheus picks it up immediately.
2) You can verify metrics are being exposed in the container by setting up a port
forward to the controller's /metrics endpoint:
kubectl get pods -l app=catalog-catalog-controller-manager -n catalog -o name | \
sed 's/^.*\///' | \
xargs -I{} kubectl port-forward {} -n catalog 8089:8080 &
and then curl -s http://localhost:8089/metrics; you should see a number of metrics provided by the Prometheus client about
Go and the controller process, along with the new metrics from the Catalog controller (search for servicecatalog).
$ curl -s http://localhost:8089/metrics | grep -e catalog
# HELP servicecatalog_broker_service_class_count Number of service classes by Broker.
# TYPE servicecatalog_broker_service_class_count gauge
servicecatalog_broker_service_class_count{broker="ups-broker"} 1
# HELP servicecatalog_broker_service_plan_count Number of service plans by Broker.
# TYPE servicecatalog_broker_service_plan_count gauge
servicecatalog_broker_service_plan_count{broker="ups-broker"} 2
Some examples I tried for demo purposes; these are not currently instrumented, but I'm showing them here as examples:
# HELP servicecatalog_reconcile_duration_microseconds Reconciliation latency distributions by service.
# TYPE servicecatalog_reconcile_duration_microseconds summary
servicecatalog_reconcile_duration_microseconds{service="binding",quantile="0.5"} 7.576423763478622e-07
servicecatalog_reconcile_duration_microseconds{service="binding",quantile="0.9"} 2.410238141422033e-06
servicecatalog_reconcile_duration_microseconds{service="binding",quantile="0.99"} 4.732418370977172e-06
servicecatalog_reconcile_duration_microseconds_sum{service="binding"} 0.04537562186808275
servicecatalog_reconcile_duration_microseconds_count{service="binding"} 45446
servicecatalog_reconcile_duration_microseconds{service="instance",quantile="0.5"} 0.00010496636854785499
servicecatalog_reconcile_duration_microseconds{service="instance",quantile="0.9"} 0.0001806733047164655
servicecatalog_reconcile_duration_microseconds{service="instance",quantile="0.99"} 0.00019821761566611137
servicecatalog_reconcile_duration_microseconds_sum{service="instance"} 2.280662444099565
servicecatalog_reconcile_duration_microseconds_count{service="instance"} 22794
servicecatalog_reconcile_duration_microseconds{service="serviceplan",quantile="0.5"} 3.078785250833388e-05
servicecatalog_reconcile_duration_microseconds{service="serviceplan",quantile="0.9"} 0.000284775667039806
servicecatalog_reconcile_duration_microseconds{service="serviceplan",quantile="0.99"} 0.0004637287422755563
servicecatalog_reconcile_duration_microseconds_sum{service="serviceplan"} 0.27024435038837413
servicecatalog_reconcile_duration_microseconds_count{service="serviceplan"} 30386
# HELP servicecatalog_reconcile_errors Cumulative number of errors encountered during reconciliation by service.
# TYPE servicecatalog_reconcile_errors counter
servicecatalog_reconcile_errors{service="binding"} 90
servicecatalog_reconcile_errors{service="instance"} 209
servicecatalog_reconcile_errors{service="serviceplan"} 125
The error rate can be determined with the rate() or irate() function, which shows the per-second rate of change, e.g.:
rate(osb_responses{status="400"}[5m]) / rate(osb_requests[5m])
Instrumented (a sketch of possible counter declarations follows this list):
osb_requests - # of requests to broker [brokername]
osb_responses - # of responses [brokername][status (100/200/300/400/500)]
broker_service_class_count - # of ServiceClasses [brokername]
broker_service_plan_count - # of ServicePlans [brokername]
instance_count
count of async operations?
count of orphan mitigations?
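A sketch of how the osb_requests / osb_responses counters above might be declared with the Go client (labels assumed from the list; not actual Catalog code):

    // Hypothetical counter declarations for the broker client metrics above.
    package metrics

    import "github.com/prometheus/client_golang/prometheus"

    var (
        osbRequests = prometheus.NewCounterVec(
            prometheus.CounterOpts{
                Name: "osb_requests",
                Help: "Number of requests made to each broker.",
            },
            []string{"broker"},
        )
        osbResponses = prometheus.NewCounterVec(
            prometheus.CounterOpts{
                Name: "osb_responses",
                Help: "Number of responses from each broker, by status class.",
            },
            []string{"broker", "status"},
        )
    )

    func init() {
        prometheus.MustRegister(osbRequests, osbResponses)
    }

Recording a request/response pair would then look like osbRequests.WithLabelValues("ups-broker").Inc() and osbResponses.WithLabelValues("ups-broker", "200").Inc().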
#
# Prometheus will run under the "prometheus" Service Account; grant it
# access to the necessary resources.
#
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
---
#
# Deploy Prometheus
#
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    name: prometheus-deployment
  name: prometheus
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      containers:
      - image: quay.io/prometheus/prometheus:latest
        name: prometheus
        command:
        - "/bin/prometheus"
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--storage.tsdb.path=/prometheus"
        - "--storage.tsdb.retention=24h"
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: "/prometheus"
          name: data
        - mountPath: "/etc/prometheus"
          name: config-volume
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 500m
            memory: 2500Mi
      volumes:
      - emptyDir: {}
        name: data
      - configMap:
          name: prometheus-config
        name: config-volume
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 30s
      scrape_timeout: 30s

    scrape_configs:
    - job_name: 'prometheus'
      static_configs:
      - targets: ['localhost:9090']

    # A scrape configuration for running Prometheus on a Kubernetes cluster.
    # This uses separate scrape configs for cluster components (i.e. API server, node)
    # and services to allow each to use different authentication configs.
    #
    # Kubernetes labels will be added as Prometheus labels on metrics via the
    # `labelmap` relabeling action.
    #
    # If you are using Kubernetes 1.7.2 or earlier, please take note of the comments
    # for the kubernetes-cadvisor job; you will need to edit or remove this job.

    # Scrape config for API servers.
    #
    # Kubernetes exposes API servers as endpoints to the default/kubernetes
    # service so this uses `endpoints` role and uses relabelling to only keep
    # the endpoints associated with the default/kubernetes service using the
    # default named port `https`. This works for single API server deployments as
    # well as HA API server deployments.
    - job_name: 'kubernetes-apiservers'
      kubernetes_sd_configs:
      - role: endpoints
      # Default to scraping over https. If required, just disable this or change to
      # `http`.
      scheme: https
      # This TLS & bearer token file config is used to connect to the actual scrape
      # endpoints for cluster components. This is separate to discovery auth
      # configuration because discovery & scraping are two separate concerns in
      # Prometheus. The discovery auth config is automatic if Prometheus runs inside
      # the cluster. Otherwise, more config options have to be provided within the
      # <kubernetes_sd_config>.
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        # If your node certificates are self-signed or use a different CA to the
        # master CA, then disable certificate verification below. Note that
        # certificate verification is an integral part of a secure infrastructure
        # so this should only be disabled in a controlled environment. You can
        # disable certificate verification by uncommenting the line below.
        #
        # insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      # Keep only the default/kubernetes service endpoints for the https port. This
      # will add targets for each API server which Kubernetes adds an endpoint to
      # the default/kubernetes service.
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
    # Scrape config for nodes (kubelet).
    #
    # Rather than connecting directly to the node, the scrape is proxied through the
    # Kubernetes apiserver. This means it will work if Prometheus is running out of
    # cluster, or can't connect to nodes for some other reason (e.g. because of
    # firewalling).
    - job_name: 'kubernetes-nodes'
      # Default to scraping over https. If required, just disable this or change to
      # `http`.
      scheme: https
      # This TLS & bearer token file config is used to connect to the actual scrape
      # endpoints for cluster components. This is separate to discovery auth
      # configuration because discovery & scraping are two separate concerns in
      # Prometheus. The discovery auth config is automatic if Prometheus runs inside
      # the cluster. Otherwise, more config options have to be provided within the
      # <kubernetes_sd_config>.
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics
    # Scrape config for Kubelet cAdvisor.
    #
    # This is required for Kubernetes 1.7.3 and later, where cAdvisor metrics
    # (those whose names begin with 'container_') have been removed from the
    # Kubelet metrics endpoint. This job scrapes the cAdvisor endpoint to
    # retrieve those metrics.
    #
    # In Kubernetes 1.7.0-1.7.2, these metrics are only exposed on the cAdvisor
    # HTTP endpoint; use "replacement: /api/v1/nodes/${1}:4194/proxy/metrics"
    # in that case (and ensure cAdvisor's HTTP server hasn't been disabled with
    # the --cadvisor-port=0 Kubelet flag).
    #
    # This job is not necessary and should be removed in Kubernetes 1.6 and
    # earlier versions, or it will cause the metrics to be scraped twice.
    - job_name: 'kubernetes-cadvisor'
      # Default to scraping over https. If required, just disable this or change to
      # `http`.
      scheme: https
      # This TLS & bearer token file config is used to connect to the actual scrape
      # endpoints for cluster components. This is separate to discovery auth
      # configuration because discovery & scraping are two separate concerns in
      # Prometheus. The discovery auth config is automatic if Prometheus runs inside
      # the cluster. Otherwise, more config options have to be provided within the
      # <kubernetes_sd_config>.
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    # Scrape config for service endpoints.
    #
    # The relabeling allows the actual service scrape endpoint to be configured
    # via the following annotations:
    #
    # * `prometheus.io/scrape`: Only scrape services that have a value of `true`
    # * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
    #   to set this to `https` & most likely set the `tls_config` of the scrape config.
    # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
    # * `prometheus.io/port`: If the metrics are exposed on a different port to the
    #   service then set this appropriately.
    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
    # Example scrape config for probing services via the Blackbox Exporter.
    #
    # The relabeling allows the actual service scrape endpoint to be configured
    # via the following annotations:
    #
    # * `prometheus.io/probe`: Only probe services that have a value of `true`
    - job_name: 'kubernetes-services'
      metrics_path: /probe
      params:
        module: [http_2xx]
      kubernetes_sd_configs:
      - role: service
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        target_label: kubernetes_name
    # Example scrape config for pods.
    #
    # The relabeling allows the actual pod scrape endpoint to be configured via the
    # following annotations:
    #
    # * `prometheus.io/scrape`: Only scrape pods that have a value of `true`
    # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
    # * `prometheus.io/port`: Scrape the pod on the indicated port instead of the
    #   pod's declared ports (default is a port-free target if none are declared).
    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name