@shpwrck
Last active September 19, 2022 13:58

Datadog & Gloo Mesh Options

Option 1 - Leverage Envoy/Istio Integrations

Explanation

Datadog's Istio and Envoy integrations come at a lower price point, but they are limited to preselected metrics and dashboards.

Sample Configs

Helm Values

datadog:
  clusterChecks:
    enabled: true
  logs:
    containerCollectAll: true
    enabled: true
  processAgent:
    processCollection: true
  networkMonitoring:
    enabled: true
  apiKey: <<INSERT_APIKEY>>
  appKey: <<INSERT_APPKEY>>
  clusterName: <<INSERT_CLUSTERNAME>>
clusterAgent:
  replicas: 2
  createPodDisruptionBudget: "true"

Example Annotation

istio:

spec:
  template:
    metadata:
      annotations:
        ad.datadoghq.com/discovery.checks: |
          {
            "istio": {
              "init_config": {},
              "instances": [
                {
                  "istiod_endpoint": "http://%%host%%:15014/metrics"
                }
              ]
            }
          }

istio-proxy:

spec:
  template:
    metadata:
      annotations:
        ad.datadoghq.com/istio-proxy.checks: |
          {
            "envoy": {
              "init_config": {},
              "instances": [
                {
                  "openmetrics_endpoint": "http://%%host%%:15090/stats/prometheus"
                }
              ]
            }
          }

The annotations above can be applied to an Istio workload with the following command:

kubectl patch -n istio-system deployment ${NAME_OF_ISTIO_DEPLOYMENT} --patch-file ${NAME_OF_PATCH_FILE} --context ${CLUSTER_CONTEXT}

Notes

  • This will not gather metrics from the Gloo Mesh components.
  • The above applies to the Istio control plane; the envoy configuration is still used for sidecars.
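The patch files used with the command above can also be generated rather than written by hand. The following is a minimal sketch, assuming a JSON patch file is acceptable in place of YAML; `build_checks_patch` is a hypothetical helper name, not part of Datadog or kubectl.

```python
import json


def build_checks_patch(container, check, instance):
    """Build a patch that adds a Datadog checks annotation to a pod template.

    container: name of the container the check is attached to
    check:     Datadog check name, e.g. "istio" or "envoy"
    instance:  dict holding the instance settings for that check
    """
    checks = {check: {"init_config": {}, "instances": [instance]}}
    return {
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {
                        # The annotation value is itself a JSON document stored as a string.
                        f"ad.datadoghq.com/{container}.checks": json.dumps(checks, indent=2)
                    }
                }
            }
        }
    }


# Reproduces the istio example from the sample annotations.
patch = build_checks_patch(
    container="discovery",
    check="istio",
    instance={"istiod_endpoint": "http://%%host%%:15014/metrics"},
)
print(json.dumps(patch, indent=2))
```

Redirect the output to a file and pass that file as ${NAME_OF_PATCH_FILE}.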

Option 2 - Leverage OpenMetrics In Each Cluster

Explanation

OpenMetrics checks can be configured to scrape pre-existing Prometheus endpoints on any workload. There is a limit of 2000 metrics.
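Before settling on a metric filter, it can help to estimate how close an endpoint gets to that limit. The following is a rough sketch, assuming the limit applies to matching sample lines and that filters are full-match regular expressions; both are assumptions, and `count_samples` and the sample text are illustrative only.

```python
import re

LIMIT = 2000  # the limit quoted in the explanation above


def count_samples(exposition_text, patterns):
    """Count sample lines whose metric name fully matches any of the patterns."""
    compiled = [re.compile(p) for p in patterns]
    count = 0
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The metric name ends at the first "{" (labels) or whitespace (value).
        name = re.split(r"[{\s]", line, maxsplit=1)[0]
        if any(c.fullmatch(name) for c in compiled):
            count += 1
    return count


# Made-up sample of Prometheus text output, for illustration only.
sample = """\
# HELP pilot_xds Number of connected endpoints
# TYPE pilot_xds gauge
pilot_xds{version="example"} 3
pilot_proxy_convergence_time_count 12
go_goroutines 42
"""

matched = count_samples(sample, ["pilot.*"])
print(f"{matched} matching samples (limit {LIMIT})")
```

In practice the text would come from the endpoint being scraped, for example the output of a request to the workload's /metrics path.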

Sample Configs

Helm Values

datadog:
  clusterChecks:
    enabled: true
  logs:
    containerCollectAll: true
    enabled: true
  processAgent:
    processCollection: true
  networkMonitoring:
    enabled: true
  apiKey: <<INSERT_APIKEY>>
  appKey: <<INSERT_APPKEY>>
  clusterName: <<INSERT_CLUSTERNAME>>
  ignoreAutoConfig:
  - istio
  - redisdb
  # prometheusScrape:   <---- Use with caution (will trigger a lot of metrics)
  #   enabled: true
  #   serviceEndpoints: true
  #   additionalConfigs:
  #   - autodiscovery:
  #       kubernetes_annotations:
  #         exclude:
  #           datadog: disabled
clusterAgent:
  replicas: 2
  createPodDisruptionBudget: "true"

Sample Annotations

istiod:

spec:
  template:
    metadata:
      annotations:
        ad.datadoghq.com/discovery.checks: |
            {
              "openmetrics": {
                "init_config": {},
                "instances": [
                  {
                    "openmetrics_endpoint": "http://%%host%%:15014/metrics",
                    "namespace": "istio-system",
                    "metrics": ["pilot.*"]
                  }
                ]
              }
            }

gloo-mesh-agent:

spec:
  template:
    metadata:
      annotations:
        ad.datadoghq.com/gloo-mesh-agent.checks: |
            {
              "openmetrics": {
                "init_config": {},
                "instances": [
                  {
                    "openmetrics_endpoint": "http://%%host%%:9091/metrics",
                    "namespace": "gloo-mesh",
                    "metrics": [".*_err"]
                  }
                ]
              }
            }

gloo-mesh-mgmt:

spec:
  template:
    metadata:
      annotations:
        ad.datadoghq.com/gloo-mesh-mgmt-server.checks: |
            {
              "openmetrics": {
                "init_config": {},
                "instances": [
                  {
                    "openmetrics_endpoint": "http://%%host%%:9091/metrics",
                    "namespace": "gloo-mesh",
                    "metrics": ["cluster.*"]
                  }
                ]
              }
            }

workload:

spec:
  template:
    metadata:
      annotations:
        ad.datadoghq.com/istio-proxy.checks: |
            {
              "openmetrics": {
                "init_config": {},
                "instances": [
                  {
                    "openmetrics_endpoint": "http://%%host%%:%%port%%/stats/prometheus",
                    "namespace": "default",
                    "metrics": ["istio_requests_total"]
                  }
                ]
              }
            }
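Because each annotation value is JSON embedded in YAML, a stray comma or a missing key is easy to overlook. The following is a minimal sketch of a pre-flight check, assuming each openmetrics instance should carry the three keys used in the samples above; `validate_openmetrics_annotation` and `REQUIRED_KEYS` are hypothetical names.

```python
import json

# Assumption: the keys every sample instance above sets.
REQUIRED_KEYS = {"openmetrics_endpoint", "namespace", "metrics"}


def validate_openmetrics_annotation(value):
    """Return a list of problems found in an annotation value (empty if none)."""
    try:
        checks = json.loads(value)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(checks, dict):
        return ["top-level value must be a JSON object"]

    problems = []
    instances = checks.get("openmetrics", {}).get("instances", [])
    if not instances:
        problems.append("no openmetrics instances defined")
    for index, instance in enumerate(instances):
        missing = REQUIRED_KEYS - instance.keys()
        if missing:
            problems.append(f"instance {index} is missing {sorted(missing)}")
    return problems


workload_annotation = """
{
  "openmetrics": {
    "init_config": {},
    "instances": [
      {
        "openmetrics_endpoint": "http://%%host%%:%%port%%/stats/prometheus",
        "namespace": "default",
        "metrics": ["istio_requests_total"]
      }
    ]
  }
}
"""
print(validate_openmetrics_annotation(workload_annotation) or "annotation looks well-formed")
```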

To simplify workload observability, you can have Istio add the annotation to every injected sidecar by adding the following values to the Istio installation:

...
sidecarInjectorWebhook:
  injectedAnnotations:
    ad.datadoghq.com/istio-proxy.checks: '{\n  \"openmetrics\": {\n    \"init_config\": {},\n    \"instances\": [\n      {\n        \"openmetrics_endpoint\": \"http://%%host%%:%%port%%/stats/prometheus\",\n        \"namespace\": \"default\",\n        \"metrics\": [\"envoy*\"]\n      }\n    ]\n  }\n}\n'
...

## To generate the format used above:
# 1. Patch the workload with the patchfile above.
# 2. Replace the first and last double-quote (") with single quotes (')
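The two steps above amount to JSON-escaping the annotation text and swapping the outer quotes. The following is a minimal sketch of producing that format locally instead of patching a workload first; `to_injected_annotation` is a hypothetical helper name, and whether a given Istio version accepts the result unchanged is an assumption worth verifying.

```python
import json


def to_injected_annotation(checks):
    """Render a checks dict as a single-quoted, backslash-escaped string."""
    pretty = json.dumps(checks, indent=2) + "\n"
    # A second json.dumps escapes newlines and double quotes and wraps the
    # result in double quotes, which matches the escaped format shown above.
    escaped = json.dumps(pretty)
    if "'" in escaped:
        raise ValueError("single quotes would need extra escaping in YAML")
    # Step 2 from the comments above: swap the outer double quotes for single quotes.
    return "'" + escaped[1:-1] + "'"


checks = {
    "openmetrics": {
        "init_config": {},
        "instances": [
            {
                "openmetrics_endpoint": "http://%%host%%:%%port%%/stats/prometheus",
                "namespace": "default",
                "metrics": ["envoy*"],
            }
        ],
    }
}
print("ad.datadoghq.com/istio-proxy.checks: " + to_injected_annotation(checks))
```

The whitespace differs slightly from the hand-edited value above, but the embedded JSON is equivalent.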

Notes

  • Automatic discovery is possible, but Datadog doesn't always choose the right information.
  • The patches include example metrics for illustration, not for production use.
  • Use the patch command from Option 1 with different variables to apply these examples.
  • The exclude annotation does not behave as expected.

Option 3 - Leverage OpenMetrics In Management Cluster

Explanation

Workload clusters attached to a Gloo Mesh management plane can be configured to forward metrics to the management service.

Sample Configs

Use the Helm values and the gloo-mesh-mgmt annotation example from Option 2.

Notes

  • Less to configure.
  • Introduces an additional step in the metric data flow.

Option 3-bonus

Explanation

A standard Gloo Mesh install includes a Prometheus server. To reduce the cardinality of metrics being sent off cluster, you can leverage this server as a federation endpoint.
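Prometheus exposes federation through its /federate path, which returns only the series selected by one or more match[] parameters. The following is a minimal sketch of building such a URL, which could then be used as an `openmetrics_endpoint`; the server address and selectors are hypothetical and depend on the install.

```python
from urllib.parse import urlencode


def federate_url(base_url, selectors):
    """Build a Prometheus federation URL limited to the given series selectors."""
    # Each selector becomes its own match[] query parameter.
    query = urlencode([("match[]", selector) for selector in selectors])
    return f"{base_url.rstrip('/')}/federate?{query}"


# Hypothetical in-cluster address and selectors; adjust to the actual install.
selectors = ['{__name__="istio_requests_total"}', '{job="gloo-mesh"}']
url = federate_url("http://prometheus-server.gloo-mesh:9090", selectors)
print(url)
```

Keeping the selector list short is what limits the cardinality sent off cluster.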

Example Guide
