so0k/datadog-support.md

## datadog-support.md

My Helmfile definition is:
context: taurus-stage.kube.swatmobile.io

releases:
- name: datadog
  namespace: kube-system
  chart: stable/datadog
  version: 0.11.2
  values:
  - datadog/values.yaml
  secrets:
  - datadog/secrets.yaml 
`datadog/secrets.yaml` contains the API keys, so I'm not including it here.
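For reference, the decrypted file would have roughly the shape below. This is a sketch that assumes the stable/datadog chart reads the key from `datadog.apiKey` and that Helmfile's `secrets:` entry is decrypted by the helm-secrets plugin. The value is a placeholder, not a real key.

```yaml
# datadog/secrets.yaml (decrypted view, placeholder value only)
# assumption: the stable/datadog chart reads the API key from datadog.apiKey
datadog:
  apiKey: "<DATADOG_API_KEY>"
```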
`datadog/values.yaml` contains:
image:
  repository: datadog/agent               # Agent6
  tag: 6.3.0                              # Use 6.3.0-jmx to enable jmx fetch collection
  pullPolicy: IfNotPresent

daemonset:
  enabled: true
  updateStrategy: RollingUpdate
  tolerations:
  - key: "node-role.kubernetes.io/master"
    effect: NoSchedule

deployment:
  enabled: true
  # datadog.collectEvents requires datadog.leaderElection enabled (which also ensures proper RBAC)
  replicas: 1

kubeStateMetrics:
  enabled: false                           # deployed separately to decouple dependency
                                           # also, chart service-name is non-default and cumbersome

datadog:
  name: datadog-agent
  nonLocalTraffic: true
  apmEnabled: true                         # APM also known as trace agent
  leaderElection: true
  collectEvents: true                    

  ## All datadog configuration: https://sourcegraph.com/github.com/DataDog/datadog-agent@6.2.0/-/blob/pkg/config/config.go#L60:2
  env:
  - name: DD_CHECK_RUNNERS                 # Agent6: default 1, increase if collector_queue fails health checks due to high number of checks
    value: "1"              
  - name: DD_KUBERNETES_POD_LABELS_AS_TAGS # Agent6: requires whitelisting the relevant labels
    value: '{"app":"helm_app","release":"helm_release","component":"helm_component","k8s-app":"k8s-app","chart":"helm_chart","heritage":"helm_heritage"}'
  - name: DD_KUBERNETES_NODE_LABELS_AS_TAGS
    value: '{"kubernetes.io/hostname":"node_name","beta.kubernetes.io/os":"node_os","beta.kubernetes.io/instance-type":"node_type","kubernetes.io/role":"node_role","failure-domain.beta.kubernetes.io/region":"node_region","failure-domain.beta.kubernetes.io/zone":"node_zone"}'
  - name: DD_PROCESS_AGENT_ENABLED         # Agent6: process monitoring - https://docs.datadoghq.com/graphing/infrastructure/process/
    value: "true"
  - name: DD_LOGS_ENABLED                  # Agent6: enable Datadog logs
    value: "true"
  - name: DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL
    value: "true"

  ## required for process monitoring
  volumes:
  - hostPath:
      path: /etc/passwd
    name: passwd
  - hostPath:
      path: /opt/datadog-agent/run  # Logs: stores the last line collected for each container 
    name: opt-ddog-run
  volumeMounts:
  - name: passwd
    mountPath: /etc/passwd
    readOnly: true
  - name: opt-ddog-run
    mountPath: /opt/datadog-agent/run
    readOnly: false

  tags: "cluster:taurus-staging, cluster_group:staging"
  
  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: 200m
      memory: 256Mi
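To make the `DD_KUBERNETES_POD_LABELS_AS_TAGS` mapping above concrete: each key in the JSON is a pod label and each value is the tag name it is reported under, with the label's value as the tag value. A pod with the hypothetical labels below would be tagged as the comments show, which also means the `release` label of the panama pods should already surface as a `helm_release` tag.

```yaml
# Hypothetical panama pod metadata, illustrating how
# DD_KUBERNETES_POD_LABELS_AS_TAGS turns pod labels into tags
metadata:
  labels:
    app: panama            # reported as tag helm_app:panama
    release: panama-dev    # reported as tag helm_release:panama-dev
```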
More details about the Helm chart can be found here: https://hub.kubeapps.com/charts/stable/datadog
I'm not using the latest chart version, but I checked the diff and the newer versions add nothing that I'm not already doing with the current one.
I think that to collect logs only from a limited stack (we refer to this stack as "panama"), I'd just need to add this to the values:
datadog:
  ...
  # ref: https://github.com/DataDog/docker-dd-agent#configuration-files
  # ref: https://docs.datadoghq.com/logs/log_collection/docker/#option-1-configuration-file
  # ref: https://docs.datadoghq.com/agent/basic_agent_usage/kubernetes/#log-collection-setup
  confd:
    panama-dev.yaml: |-
      init_config:
      instances: [{}]

      logs:
      - type: docker
        label: release:panama-dev # this is a k8s pod label, not a docker label...
        source: panama
        service: panama

  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: 200m
      memory: 256Mi

and set `DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL` back to `"false"`.
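With `DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL` set to `"false"`, the agent only tails containers that have their own log configuration. Because Helm replaces lists instead of merging them, that entry should be edited in place in the existing `env` list rather than passed as a one-item override. A possible alternative that sidesteps the docker-label issue noted in the `confd` comment is to attach the log configuration to the panama pods with an Autodiscovery annotation on the pod template. This sketch assumes agent 6.3.0 honours `ad.datadoghq.com/<container>.logs` annotations, and the container name `panama` is hypothetical; if it works, the `confd` entry would not be needed for this purpose.

```yaml
# datadog/values.yaml: change the existing entry in place and keep the
# other env entries, since Helm replaces lists instead of merging them
datadog:
  env:
  # ... other entries unchanged ...
  - name: DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL
    value: "false"
---
# panama Deployment (hypothetical): the annotation goes on the pod template
# and is keyed by the container name ("panama" is an assumption)
spec:
  template:
    metadata:
      annotations:
        ad.datadoghq.com/panama.logs: '[{"source":"panama","service":"panama"}]'
```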