shpwrck/DATADOG-README.md

## DATADOG-README.md

      
Datadog & Gloo Mesh Options

Option 1 - Leverage Envoy/Istio Integrations

Explanation

Datadog offers its Envoy and Istio integrations at a lower price point, but with preselected metrics and dashboards.

Sample Configs

Helm Values

datadog:
  clusterChecks:
    enabled: true
  logs:
    containerCollectAll: true
    enabled: true
  processAgent:
    processCollection: true
  networkMonitoring:
    enabled: true
  apiKey: <<INSERT_APIKEY>>
  appKey: <<INSERT_APPKEY>>
  clusterName: <<INSERT_CLUSTERNAME>>
clusterAgent:
  replicas: 2
  createPodDisruptionBudget: true

Example Annotation

istio:
spec:
  template:
    metadata:
      annotations:
        ad.datadoghq.com/discovery.checks: |
          {
            "istio": {
              "init_config": {},
              "instances": [
                {
                  "istiod_endpoint": "http://%%host%%:15014/metrics"
                }
              ]
            }
          }

istio-proxy:
spec:
  template:
    metadata:
      annotations:
        ad.datadoghq.com/istio-proxy.checks: |
          {
            "envoy": {
              "init_config": {},
              "instances": [
                {
                  "openmetrics_endpoint": "http://%%host%%:15090/stats/prometheus"
                }
              ]
            }
          }

Each of the patches above can be applied to an Istio workload with the following command:
kubectl patch -n istio-system deployment ${NAME_OF_ISTIO_DEPLOYMENT} --patch-file ${NAME_OF_PATCH_FILE} --context ${CLUSTER_CONTEXT}
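
The patch file passed to `--patch-file` can also be generated instead of written by hand, which avoids quoting mistakes in the JSON embedded in the annotation. The following is a minimal Python sketch, not part of the original setup: the helper name `build_patch` is an assumption made for this example, and it relies on kubectl accepting a patch file in JSON as well as YAML.

```python
import json


def build_patch(container, check, instance):
    """Return a Deployment patch that sets a Datadog check annotation.

    container: name of the container the check targets (e.g. "discovery")
    check:     Datadog check name (e.g. "istio" or "envoy")
    instance:  dict holding the instance settings for that check
    """
    check_config = {check: {"init_config": {}, "instances": [instance]}}
    annotation_key = f"ad.datadoghq.com/{container}.checks"
    return {
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {
                        # The annotation value is itself a JSON document.
                        annotation_key: json.dumps(check_config, indent=2)
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    patch = build_patch(
        "discovery",
        "istio",
        {"istiod_endpoint": "http://%%host%%:15014/metrics"},
    )
    # Redirect this output to a file and pass that file to --patch-file.
    print(json.dumps(patch, indent=2))
```

Calling `build_patch("istio-proxy", "envoy", {...})` with the `openmetrics_endpoint` instance produces the second patch shown above in the same way.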

Notes

- This option does not gather metrics from the Gloo Mesh components.
- The istio configuration above is for the Istio control plane; the envoy configuration is still used for sidecars.

Option 2 - Leverage OpenMetrics In Each Cluster

Explanation

The OpenMetrics check can be configured to scrape the pre-existing Prometheus endpoints on any workload.
There is a limit of 2000 metrics.
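
Before settling on a `metrics` filter, it can help to estimate how much an endpoint exposes relative to that limit. The Python sketch below counts what a set of patterns would select from Prometheus text output. It rests on two assumptions that are not confirmed here: patterns are treated as regular expressions matched from the start of the metric name, and the limit may apply either to metric names or to individual samples, so both counts are returned. The function name `count_matching` is hypothetical.

```python
import re

METRIC_LIMIT = 2000  # the limit quoted in this document


def count_matching(exposition_text, patterns):
    """Return (distinct metric names, sample lines) matching any pattern."""
    compiled = [re.compile(p) for p in patterns]
    names = set()
    samples = 0
    for raw in exposition_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The metric name ends at the first "{" or whitespace character.
        name = re.split(r"[{\s]", line, maxsplit=1)[0]
        # Assumption: a pattern applies when it matches from the start of the name.
        if any(rx.match(name) for rx in compiled):
            names.add(name)
            samples += 1
    return len(names), samples


if __name__ == "__main__":
    sample = """\
# TYPE pilot_xds gauge
pilot_xds{version="1.2"} 3
pilot_xds{version="1.3"} 1
pilot_conflict_total 0
go_goroutines 42
"""
    n_names, n_samples = count_matching(sample, ["pilot.*"])
    print(n_names, n_samples, n_samples <= METRIC_LIMIT)
```

In practice the input would be the body returned by the workload's metrics endpoint, for example port 15014 on istiod.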

Sample Configs

Helm Values

datadog:
  clusterChecks:
    enabled: true
  logs:
    containerCollectAll: true
    enabled: true
  processAgent:
    processCollection: true
  networkMonitoring:
    enabled: true
  apiKey: <<INSERT_APIKEY>>
  appKey: <<INSERT_APPKEY>>
  clusterName: <<INSERT_CLUSTERNAME>>
  ignoreAutoConfig:
  - istio
  - redisdb
  # Use with caution: enabling prometheusScrape will trigger a lot of metrics.
  # prometheusScrape:
  #   enabled: true
  #   serviceEndpoints: true
  #   additionalConfigs:
  #   - autodiscovery:
  #       kubernetes_annotations:
  #         exclude:
  #           datadog: disabled
clusterAgent:
  replicas: 2
  createPodDisruptionBudget: true

Sample Annotations

istiod:
spec:
  template:
    metadata:
      annotations:
        ad.datadoghq.com/discovery.checks: |
            {
              "openmetrics": {
                "init_config": {},
                "instances": [
                  {
                    "openmetrics_endpoint": "http://%%host%%:15014/metrics",
                    "namespace": "istio-system",
                    "metrics": ["pilot.*"]
                  }
                ]
              }
            }

gloo-mesh-agent:
spec:
  template:
    metadata:
      annotations:
        ad.datadoghq.com/gloo-mesh-agent.checks: |
            {
              "openmetrics": {
                "init_config": {},
                "instances": [
                  {
                    "openmetrics_endpoint": "http://%%host%%:9091/metrics",
                    "namespace": "gloo-mesh",
                    "metrics": [".*_err"]
                  }
                ]
              }
            }

gloo-mesh-mgmt:
spec:
  template:
    metadata:
      annotations:
        ad.datadoghq.com/gloo-mesh-mgmt-server.checks: |
            {
              "openmetrics": {
                "init_config": {},
                "instances": [
                  {
                    "openmetrics_endpoint": "http://%%host%%:9091/metrics",
                    "namespace": "gloo-mesh",
                    "metrics": ["cluster.*"]
                  }
                ]
              }
            }

workload:
spec:
  template:
    metadata:
      annotations:
        ad.datadoghq.com/istio-proxy.checks: |
            {
              "openmetrics": {
                "init_config": {},
                "instances": [
                  {
                    "openmetrics_endpoint": "http://%%host%%:%%port%%/stats/prometheus",
                    "namespace": "default",
                    "metrics": ["istio_requests_total"]
                  }
                ]
              }
            }

To simplify workload observability, you can have the sidecar injector add the annotation to every injected pod by adding the following values to the Istio installation:
...
sidecarInjectorWebhook:
  injectedAnnotations:
    ad.datadoghq.com/istio-proxy.checks: '{\n  \"openmetrics\": {\n    \"init_config\": {},\n    \"instances\": [\n      {\n        \"openmetrics_endpoint\": \"http://%%host%%:%%port%%/stats/prometheus\",\n        \"namespace\": \"default\",\n        \"metrics\": [\"envoy*\"]\n      }\n    ]\n  }\n}\n'
...

# To generate the format used above:
# 1. Patch the workload with the patch file above.
# 2. Replace the first and last double quotes (") with single quotes (').
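
The escaped one-line value can also be produced directly from the check configuration. The sketch below is one possible way to do it in Python, assuming the target format is exactly the JSON-escaped string shown above wrapped in single quotes. The function name `to_injected_annotation` is hypothetical, and how Istio renders the value afterwards is not verified here.

```python
import json


def to_injected_annotation(check_config):
    """Render a check config as a single-quoted, JSON-escaped one-line string."""
    # Pretty-print the config first, as it would appear in a patch file.
    pretty = json.dumps(check_config, indent=2) + "\n"
    # Encoding the text as a JSON string escapes newlines (\n) and quotes (\").
    escaped = json.dumps(pretty)
    # Swap the surrounding double quotes for single quotes.
    # Assumption: the config contains no single quotes, which YAML would
    # require to be doubled inside a single-quoted string.
    return "'" + escaped[1:-1] + "'"


if __name__ == "__main__":
    config = {
        "openmetrics": {
            "init_config": {},
            "instances": [
                {
                    "openmetrics_endpoint": "http://%%host%%:%%port%%/stats/prometheus",
                    "namespace": "default",
                    "metrics": ["envoy*"],
                }
            ],
        }
    }
    print(to_injected_annotation(config))
```

Because the value is built from a Python dictionary, changing the `metrics` list or the `namespace` does not require re-escaping anything by hand.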

Notes

- This option can allow for automatic discovery, but Datadog doesn't always choose the right information.
- The patches include examples of metrics for illustration, not for production use.
- Use the patch command from Option 1, with different variables, to apply these examples.
- The exclude annotation does not behave as expected.

Option 3 - Leverage OpenMetrics In Management Cluster

Explanation

Workload clusters attached to a Gloo Mesh management plane can be configured to forward metrics to the management service.

Sample Configs

Use the Helm values and the gloo-mesh-mgmt annotation example from Option 2.

Notes

- Less to configure.
- Introduces an additional step in the metric data flow.

Option 3-bonus

Explanation

A standard Gloo Mesh install includes a Prometheus server. To reduce the cardinality of the metrics being sent off cluster, you can use this server as a federation endpoint.
Example Guide
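
Prometheus serves federated metrics from its `/federate` path and selects series through repeated `match[]` query parameters. As a rough Python illustration of how such a URL is put together, the sketch below builds one from a list of selectors. The function name `federate_url` and the server address in the example are assumptions made for this sketch, not values taken from a Gloo Mesh install; the selector reuses `istio_requests_total` from the workload example.

```python
from urllib.parse import urlencode


def federate_url(base_url, selectors):
    """Build a Prometheus /federate URL with one match[] parameter per selector."""
    # urlencode percent-encodes both the parameter name and each selector.
    query = urlencode([("match[]", selector) for selector in selectors])
    return f"{base_url.rstrip('/')}/federate?{query}"


if __name__ == "__main__":
    # Hypothetical in-cluster address; replace with the real Prometheus service.
    url = federate_url(
        "http://prometheus-server:9090",
        ['{__name__="istio_requests_total"}'],
    )
    print(url)
```

Keeping the selector list short is what limits the number of series that leave the cluster.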