Gist by jboyd01 · Last active December 4, 2017
Using Prometheus metrics with Service Catalog
We want to expose metrics from Catalog via Prometheus to enable monitoring, track key metrics, and provide the ability
to alert on specific conditions.

Prometheus provides a client API that enables you to register an HTTP handler (i.e. /metrics) that automatically exposes
Prometheus metrics objects. Many core components within Kubernetes already do this, including cAdvisor, the Kubelet,
the Scheduler, the Proxy, and many more. The Prometheus server can be easily configured with scrape configurations that
poll the /metrics endpoints and provide a centralized UI for discovery & analysis. Advanced analytic and graphing tools
such as Grafana can consume Prometheus data and are often used to augment monitoring.
Kubernetes API servers expose Prometheus metrics out of the box; Prometheus must be configured to scrape them.
Installing and configuring Prometheus is simple. Use the attached yaml files with the following commands:

kubectl create -f kubernetes-service-account-and-roles.yml
kubectl create -f prometheus.yml

And then port forward to Prometheus so you can hit it in a browser:

kubectl get pods -l app=prometheus -o name | sed 's/^.*\///' | xargs -I{} kubectl port-forward {} 9090:9090 &
Now hit http://localhost:9090 (you should get the Prometheus app). There is a drop-down on the top left part of the
page that shows you the available metrics. Selecting one or more will graph them. Prometheus provides a powerful SQL-like
querying language called PromQL that can be used to create charts or graphs. The Targets page shows the sources that it
is gathering metrics from along with the status of each. This comes in handy when you want to verify that it is collecting
metrics from our newly instrumented Catalog Controller.
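Besides the UI, PromQL queries can be issued programmatically against Prometheus's HTTP API (the standard instant-query endpoint is /api/v1/query). A minimal Python sketch of building such a request URL; the helper function is illustrative, not part of any library:

```python
from urllib.parse import urlencode

def build_query_url(base, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return base.rstrip("/") + "/api/v1/query?" + urlencode({"query": promql})

# Example: ask Prometheus for the current value of a Catalog gauge
url = build_query_url("http://localhost:9090",
                      "servicecatalog_broker_service_class_count")
print(url)
```

Fetching the resulting URL (e.g. with curl) returns a JSON document with the matching time series.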
The prometheus.yml contains a scrape configuration that matches all pods whose deployment descriptor contains the
prometheus.io/scrape annotation with a value of true. This has been added to our
charts/catalog/templates/controller-manager-deployment.yaml.

The new file pkg/metrics/metrics.go creates and registers a few Prometheus metrics and establishes a
handler for /metrics.
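pkg/metrics/metrics.go itself is Go, but the text format its /metrics handler ultimately serves is simple. Here is a minimal Python sketch (not the actual implementation) of rendering a gauge in the Prometheus text exposition format, using one of the Catalog metric names from the output below:

```python
def render_gauge(name, help_text, samples):
    """Render one gauge metric in the Prometheus text exposition format.

    samples is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"

text = render_gauge(
    "servicecatalog_broker_service_class_count",
    "Number of service classes by Broker.",
    [({"broker": "ups-broker"}, 1)],
)
print(text)
```

In the real controller the Prometheus Go client library handles this rendering; the sketch just shows what ends up on the wire.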
Troubleshooting

Not seeing Controller metrics in Prometheus?

1) Within the Prometheus app, view the Targets page and locate the "Pods" section. It should show the Controller pod. If it
doesn't, this most likely means the pod deployment doesn't have the prometheus.io/scrape annotation. You can add this live
and Prometheus picks it up immediately.
2) You can verify metrics are being exposed in the container by setting up a port
forward to the controller's /metrics endpoint:

kubectl get pods -l app=catalog-catalog-controller-manager -n catalog -o name | \
  sed 's/^.*\///' | \
  xargs -I{} kubectl port-forward {} -n catalog 8089:8080 &

Then curl -s http://localhost:8089/metrics and you should see a bunch of metrics provided by Prometheus about
Go and the controller process along with the new metrics from the Catalog Controller (search for servicecatalog).
$ curl -s http://localhost:8089/metrics | grep -e catalog
# HELP servicecatalog_broker_service_class_count Number of services classes by Broker.
# TYPE servicecatalog_broker_service_class_count gauge
servicecatalog_broker_service_class_count{broker="ups-broker"} 1
# HELP servicecatalog_broker_service_plan_count Number of services classes by Broker.
# TYPE servicecatalog_broker_service_plan_count gauge
servicecatalog_broker_service_plan_count{broker="ups-broker"} 2
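When sanity-checking scraped values in a script, the exposition lines above can be parsed mechanically. A simplified Python sketch (it ignores HELP/TYPE comments and assumes the plain `name{labels} value` sample shape seen above):

```python
import re

# Matches simple samples of the form: metric_name{labels} value
SAMPLE_RE = re.compile(r'^(\w+)\{([^}]*)\}\s+(\S+)$')

def parse_samples(text):
    """Parse `name{labels} value` lines into a dict; skip comment lines."""
    out = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        m = SAMPLE_RE.match(line)
        if m:
            name, labels, value = m.groups()
            out[(name, labels)] = float(value)
    return out

metrics_text = """\
# HELP servicecatalog_broker_service_plan_count Number of services classes by Broker.
# TYPE servicecatalog_broker_service_plan_count gauge
servicecatalog_broker_service_plan_count{broker="ups-broker"} 2
"""
samples = parse_samples(metrics_text)
print(samples)
```

A full parser would also handle bare samples without labels, escaping inside label values, and timestamps; this is only enough for eyeball verification.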
Some examples I tried for demo purposes; these are not currently instrumented but are shown here as examples:
# HELP servicecatalog_reconcile_duration_microseconds Reconcilation latency distributions by service.
# TYPE servicecatalog_reconcile_duration_microseconds summary
servicecatalog_reconcile_duration_microseconds{service="binding",quantile="0.5"} 7.576423763478622e-07
servicecatalog_reconcile_duration_microseconds{service="binding",quantile="0.9"} 2.410238141422033e-06
servicecatalog_reconcile_duration_microseconds{service="binding",quantile="0.99"} 4.732418370977172e-06
servicecatalog_reconcile_duration_microseconds_sum{service="binding"} 0.04537562186808275
servicecatalog_reconcile_duration_microseconds_count{service="binding"} 45446
servicecatalog_reconcile_duration_microseconds{service="instance",quantile="0.5"} 0.00010496636854785499
servicecatalog_reconcile_duration_microseconds{service="instance",quantile="0.9"} 0.0001806733047164655
servicecatalog_reconcile_duration_microseconds{service="instance",quantile="0.99"} 0.00019821761566611137
servicecatalog_reconcile_duration_microseconds_sum{service="instance"} 2.280662444099565
servicecatalog_reconcile_duration_microseconds_count{service="instance"} 22794
servicecatalog_reconcile_duration_microseconds{service="serviceplan",quantile="0.5"} 3.078785250833388e-05
servicecatalog_reconcile_duration_microseconds{service="serviceplan",quantile="0.9"} 0.000284775667039806
servicecatalog_reconcile_duration_microseconds{service="serviceplan",quantile="0.99"} 0.0004637287422755563
servicecatalog_reconcile_duration_microseconds_sum{service="serviceplan"} 0.27024435038837413
servicecatalog_reconcile_duration_microseconds_count{service="serviceplan"} 30386
# HELP servicecatalog_reconcile_errors Cumulative number of errors encountered during reconcilation by service.
# TYPE servicecatalog_reconcile_errors counter
servicecatalog_reconcile_errors{service="binding"} 90
servicecatalog_reconcile_errors{service="instance"} 209
servicecatalog_reconcile_errors{service="serviceplan"} 125
The error rate can be determined with the rate() or irate() function, which gives the rate of change per second, e.g.:

rate(osb_responses{status="400"}[5m]) / rate(osb_requests[5m])

Instrumented:
osb_requests - # of requests to broker [brokername]
osb_responses - # of responses [brokername][status (100/200/300/400/500)]
broker_service_class_count - # of ServiceClasses [brokername]
broker_service_plan_count - # of ServicePlans [brokername]
instance_count
count of async operations?
count of orphan mitigations?
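Conceptually, rate() is just the per-second increase of a counter over the window. As a worked example with hypothetical samples of the osb_responses and osb_requests counters (these metrics are not yet instrumented; counter resets are ignored here):

```python
def per_second_rate(v0, v1, seconds):
    """Per-second increase of a counter between two samples (no reset handling)."""
    return (v1 - v0) / seconds

# Hypothetical counter samples taken 300s (5m) apart:
errors_rate = per_second_rate(90, 120, 300)         # osb_responses{status="400"}
requests_rate = per_second_rate(45000, 45600, 300)  # osb_requests
error_ratio = errors_rate / requests_rate
print(error_ratio)  # -> 0.05, i.e. 5% of requests returned a 400
```

Real rate() also extrapolates to the window boundaries and handles counter resets, but the core arithmetic is this ratio.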
#
# Prometheus will run under the "prometheus" Service Account; create
# access to the necessary resources
#
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
---
#
# Deploy Prometheus
#
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    name: prometheus-deployment
  name: prometheus
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      containers:
      - image: quay.io/prometheus/prometheus:latest
        name: prometheus
        command:
        - "/bin/prometheus"
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--storage.tsdb.path=/prometheus"
        - "--storage.tsdb.retention=24h"
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: "/prometheus"
          name: data
        - mountPath: "/etc/prometheus"
          name: config-volume
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 500m
            memory: 2500Mi
      volumes:
      - emptyDir: {}
        name: data
      - configMap:
          name: prometheus-config
        name: config-volume
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 30s
      scrape_timeout: 30s

    # A scrape configuration for running Prometheus on a Kubernetes cluster.
    # This uses separate scrape configs for cluster components (i.e. API server, node)
    # and services to allow each to use different authentication configs.
    #
    # Kubernetes labels will be added as Prometheus labels on metrics via the
    # `labelmap` relabeling action.
    #
    # If you are using Kubernetes 1.7.2 or earlier, please take note of the comments
    # for the kubernetes-cadvisor job; you will need to edit or remove this job.
    scrape_configs:
    - job_name: 'prometheus'
      static_configs:
      - targets: ['localhost:9090']

    # Scrape config for API servers.
    #
    # Kubernetes exposes API servers as endpoints to the default/kubernetes
    # service so this uses `endpoints` role and uses relabelling to only keep
    # the endpoints associated with the default/kubernetes service using the
    # default named port `https`. This works for single API server deployments as
    # well as HA API server deployments.
    - job_name: 'kubernetes-apiservers'
      kubernetes_sd_configs:
      - role: endpoints
      # Default to scraping over https. If required, just disable this or change to
      # `http`.
      scheme: https
      # This TLS & bearer token file config is used to connect to the actual scrape
      # endpoints for cluster components. This is separate to discovery auth
      # configuration because discovery & scraping are two separate concerns in
      # Prometheus. The discovery auth config is automatic if Prometheus runs inside
      # the cluster. Otherwise, more config options have to be provided within the
      # <kubernetes_sd_config>.
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        # If your node certificates are self-signed or use a different CA to the
        # master CA, then disable certificate verification below. Note that
        # certificate verification is an integral part of a secure infrastructure
        # so this should only be disabled in a controlled environment. You can
        # disable certificate verification by uncommenting the line below.
        #
        # insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      # Keep only the default/kubernetes service endpoints for the https port. This
      # will add targets for each API server which Kubernetes adds an endpoint to
      # the default/kubernetes service.
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
    # Scrape config for nodes (kubelet).
    #
    # Rather than connecting directly to the node, the scrape is proxied through the
    # Kubernetes apiserver. This means it will work if Prometheus is running out of
    # cluster, or can't connect to nodes for some other reason (e.g. because of
    # firewalling).
    - job_name: 'kubernetes-nodes'
      # Default to scraping over https. If required, just disable this or change to
      # `http`.
      scheme: https
      # This TLS & bearer token file config is used to connect to the actual scrape
      # endpoints for cluster components. This is separate to discovery auth
      # configuration because discovery & scraping are two separate concerns in
      # Prometheus. The discovery auth config is automatic if Prometheus runs inside
      # the cluster. Otherwise, more config options have to be provided within the
      # <kubernetes_sd_config>.
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics

    # Scrape config for Kubelet cAdvisor.
    #
    # This is required for Kubernetes 1.7.3 and later, where cAdvisor metrics
    # (those whose names begin with 'container_') have been removed from the
    # Kubelet metrics endpoint. This job scrapes the cAdvisor endpoint to
    # retrieve those metrics.
    #
    # In Kubernetes 1.7.0-1.7.2, these metrics are only exposed on the cAdvisor
    # HTTP endpoint; use "replacement: /api/v1/nodes/${1}:4194/proxy/metrics"
    # in that case (and ensure cAdvisor's HTTP server hasn't been disabled with
    # the --cadvisor-port=0 Kubelet flag).
    #
    # This job is not necessary and should be removed in Kubernetes 1.6 and
    # earlier versions, or it will cause the metrics to be scraped twice.
    - job_name: 'kubernetes-cadvisor'
      # Default to scraping over https. If required, just disable this or change to
      # `http`.
      scheme: https
      # This TLS & bearer token file config is used to connect to the actual scrape
      # endpoints for cluster components. This is separate to discovery auth
      # configuration because discovery & scraping are two separate concerns in
      # Prometheus. The discovery auth config is automatic if Prometheus runs inside
      # the cluster. Otherwise, more config options have to be provided within the
      # <kubernetes_sd_config>.
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    # Scrape config for service endpoints.
    #
    # The relabeling allows the actual service scrape endpoint to be configured
    # via the following annotations:
    #
    # * `prometheus.io/scrape`: Only scrape services that have a value of `true`
    # * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
    #   to set this to `https` & most likely set the `tls_config` of the scrape config.
    # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
    # * `prometheus.io/port`: If the metrics are exposed on a different port to the
    #   service then set this appropriately.
    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
    # Example scrape config for probing services via the Blackbox Exporter.
    #
    # The relabeling allows the actual service scrape endpoint to be configured
    # via the following annotations:
    #
    # * `prometheus.io/probe`: Only probe services that have a value of `true`
    - job_name: 'kubernetes-services'
      metrics_path: /probe
      params:
        module: [http_2xx]
      kubernetes_sd_configs:
      - role: service
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        target_label: kubernetes_name
    # Example scrape config for pods
    #
    # The relabeling allows the actual pod scrape endpoint to be configured via the
    # following annotations:
    #
    # * `prometheus.io/scrape`: Only scrape pods that have a value of `true`
    # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
    # * `prometheus.io/port`: Scrape the pod on the indicated port instead of the
    #   pod's declared ports (default is a port-free target if none are declared).
    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
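The __address__ rewrite in the kubernetes-pods job is just a regex replacement over the joined source labels (__address__ and the port annotation, joined by ';'). A Python sketch of how Prometheus applies that rule (Prometheus fully anchors relabel regexes, so fullmatch is used; the sample address is hypothetical):

```python
import re

def relabel_address(address, port_annotation):
    r"""Apply the relabel rule: regex ([^:]+)(?::\d+)?;(\d+), replacement $1:$2."""
    joined = f"{address};{port_annotation}"
    m = re.fullmatch(r'([^:]+)(?::\d+)?;(\d+)', joined)
    return f"{m.group(1)}:{m.group(2)}" if m else address

# A pod discovered at 10.244.1.7:8080 with annotation prometheus.io/port: "8089"
print(relabel_address("10.244.1.7:8080", "8089"))  # -> 10.244.1.7:8089
```

This is why the controller pod is scraped on the annotated port (8080 in our deployment) even when its container declares a different one.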