@smith
Last active September 23, 2022 13:53
OpenTelemetry Demo with Elastic notes

OpenTelemetry Demo - Elastic Style

To make sure Elastic Observability works as well as possible with OpenTelemetry, we should run, maintain, and contribute to a fork of the OpenTelemetry demo.

This demo can be run with Docker Compose or on Kubernetes with the OpenTelemetry Demo Helm Chart.

Exploration steps

I did this a few months ago (see notes below) but am taking another crack at it in the hope that we can use it to improve our product. It's also a good place to add the OpenTelemetry Host Metrics Receiver, which would help us make all of our apps show metrics collected this way.

I'm imagining a working setup would:

  • Set up the collector to send OTLP logs, traces, and metrics to a configured APM Server URL
  • Have the host metrics receiver enabled
  • Disable (optionally?) the other observability backends (Jaeger, Prometheus, etc.)

Host metrics receiver

We know the host metrics receiver won't actually work yet; see https://github.com/elastic/observability-dev/issues/2293. The metrics will end up in metrics-*, but they use OpenTelemetry semantic conventions rather than ECS, so our inventory/host/APM? UIs don't work with them. We'll turn it on anyway so we can get the documents and verify when it works properly.

Kubernetes

I've never used Helm. I've enabled k8s in my Docker Desktop, but I'll probably skip this for now, learn to use Kubernetes with Docker Desktop at a later time, and then check out the Helm stuff. If we can get the robots team to put this on the dev clusters then I can defer learning k8s even longer.

Docker Compose

Following the instructions worked.

Collector modifications

Going to modify src/otelcollector/otelcol-config.yml to see if we can send to a backend.

Following the docs for that here: https://www.elastic.co/guide/en/apm/guide/current/open-telemetry.html#connect-open-telemetry-collector

Added the following to src/otelcollector/otelcol-config-extras.yml:

exporters:
  otlp/elastic:
    # Elastic APM server https endpoint without the "https://" prefix
    #endpoint: "${ELASTIC_APM_SERVER_ENDPOINT}"
    endpoint: hosts-view.apm.us-west2.gcp.elastic-cloud.com:443
    headers:
      # Elastic APM Server secret token
      #Authorization: "Bearer ${ELASTIC_APM_SERVER_TOKEN}"
      Authorization: "Bearer XXX"

service:
  pipelines:
    traces:
      exporters: [logging, otlp/elastic]
    metrics:
      exporters: [logging, otlp/elastic]

I got the endpoint from the cloud UI, and had to dig into the APM integration settings to find the token. (This used to be in the cloud UI, but is no longer there. Casper mentioned this recently in an issue that I have to dig up.)

I've currently got the values hard-coded into the otel col extras config...

and it's totally working and I've got all the data in APM.

I wanted to see if any of the hosts show up in the System integration's dashboards, but I can't install the system integration without it telling me to install Elastic Agent on a host. That's kind of a bummer if I just want the dashboards, etc.

So, what's left to do:

  • Enable host metrics receiver
  • Make it so I can set the env vars for the APM server endpoint and token

The Instances table in APM shows up as "(Empty)". I wonder what it will look like if host metrics is enabled.

Logs show up in logs-* and the Logs UI, but I don't think they show up in APM. My guess is that whatever otel is sending doesn't include the trace ID.

Copilot says...

I also had to add the following to the docker-compose.yml file:

    environment:
      ELASTIC_APM_SERVER_ENDPOINT: hosts-view.apm.us-west2.gcp.elastic-cloud.com:443
      ELASTIC_APM_SERVER_TOKEN: XXX
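
With those variables set, the hard-coded values in the exporter can be swapped for the placeholders that were commented out in the earlier config. A minimal sketch, assuming the variables are defined on the collector's own service in docker-compose.yml and that the collector expands `${VAR}` references from its environment when it loads the file:

```yaml
# src/otelcollector/otelcol-config-extras.yml
# Same exporter as before, with the values read from the environment.
exporters:
  otlp/elastic:
    # Elastic APM Server endpoint, host:port without the "https://" prefix
    endpoint: "${ELASTIC_APM_SERVER_ENDPOINT}"
    headers:
      # Elastic APM Server secret token
      Authorization: "Bearer ${ELASTIC_APM_SERVER_TOKEN}"
```

This keeps the token out of the config file, so the extras file could be committed to a fork without leaking credentials.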

Host metrics receiver

Adding this gets me metrics:

receivers:
  hostmetrics:
    scrapers:
      cpu:
      disk:
      load:
      filesystem:
      memory:
      network:
      paging:
      processes:
      #process:

When the process scraper is enabled, the collector fails. There was an error about /etc/passwd not being found, so it's probably a problem with running in a container.
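
A receiver only produces data once a pipeline references it, so the `hostmetrics` block above presumably also has to be listed in the metrics pipeline. A sketch of what that might look like in otelcol-config-extras.yml. It assumes the demo's base metrics pipeline reads from the `otlp` receiver, and that list values in the extras file replace (rather than extend) the ones in the base config, which is why `otlp` is repeated:

```yaml
service:
  pipelines:
    metrics:
      # Assumption: the base config lists only otlp here.
      # hostmetrics refers to the receiver defined above.
      receivers: [otlp, hostmetrics]
      exporters: [logging, otlp/elastic]
```

If the base pipeline lists other receivers, they would need to be repeated here as well.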

Here's an example document, where the metric is system.paging.usage:

{
  "_index": ".ds-metrics-apm.app.unknown-default-2022.09.23-000001",
  "_id": "6g63aIMBY0xCB-tphK3_",
  "_version": 1,
  "_score": 0,
  "_source": {
    "agent": {
      "name": "otlp",
      "version": "unknown"
    },
    "data_stream.namespace": "default",
    "system.paging.usage": 16121856,
    "processor": {
      "name": "metric",
      "event": "metric"
    },
    "data_stream.type": "metrics",
    "labels": {
      "state": "used",
      "device": "/swap"
    },
    "metricset.name": "app",
    "observer": {
      "hostname": "12e5c3b74e03",
      "id": "b3b2b54d-bf28-4f50-b372-d7afe23a922e",
      "ephemeral_id": "75af55d5-048c-47c9-84f2-3c92539ce98b",
      "type": "apm-server",
      "version": "8.5.0"
    },
    "@timestamp": "2022-09-23T05:00:00.433Z",
    "ecs": {
      "version": "1.12.0"
    },
    "service": {
      "name": "unknown",
      "language": {
        "name": "unknown"
      }
    },
    "data_stream.dataset": "apm.app.unknown",
    "event": {
      "agent_id_status": "missing",
      "ingested": "2022-09-23T05:00:01Z"
    }
  },
  "fields": {
    "service.name": [
      "unknown"
    ],
    "data_stream.namespace": [
      "default"
    ],
    "system.paging.usage": [
      16121856
    ],
    "processor.name": [
      "metric"
    ],
    "labels.device": [
      "/swap"
    ],
    "labels.state": [
      "used"
    ],
    "data_stream.type": [
      "metrics"
    ],
    "service.language.name": [
      "unknown"
    ],
    "observer.hostname": [
      "12e5c3b74e03"
    ],
    "metricset.name": [
      "app"
    ],
    "observer.id": [
      "b3b2b54d-bf28-4f50-b372-d7afe23a922e"
    ],
    "event.ingested": [
      "2022-09-23T05:00:01.000Z"
    ],
    "@timestamp": [
      "2022-09-23T05:00:00.433Z"
    ],
    "observer.ephemeral_id": [
      "75af55d5-048c-47c9-84f2-3c92539ce98b"
    ],
    "ecs.version": [
      "1.12.0"
    ],
    "observer.version": [
      "8.5.0"
    ],
    "observer.type": [
      "apm-server"
    ],
    "data_stream.dataset": [
      "apm.app.unknown"
    ],
    "processor.event": [
      "metric"
    ],
    "agent.name": [
      "otlp"
    ],
    "agent.version": [
      "unknown"
    ],
    "event.agent_id_status": [
      "missing"
    ]
  }
}

The labels and the metric itself look like what's sent by the receiver, and APM server is filling out the rest.
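
The document carries no host fields and `service.name` is `unknown`, which suggests the host metrics arrive without resource attributes. One thing that might be worth trying is the contrib collector's `resourcedetection` processor, which can attach attributes such as `host.name` to everything passing through a pipeline. This is an untested sketch: it assumes the demo's collector image includes that processor and that `batch` is the only processor in the base metrics pipeline, and it makes no claim that the Elastic UIs will then pick the metrics up.

```yaml
processors:
  resourcedetection:
    # "env" reads OTEL_RESOURCE_ATTRIBUTES; "system" adds host.name and os.type
    # as seen from where the collector runs (here, inside the container).
    detectors: [env, system]

service:
  pipelines:
    metrics:
      # Assumption: batch is the only processor in the base pipeline.
      processors: [resourcedetection, batch]
```

Because the collector runs in a container, the detected host name would most likely be the container ID, like the `observer.hostname` value in the document above.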

Logs

Logs from services show up in Discover and the Logs UI under logs-*, but they are not showing up in APM even though it looks like they have the correct attributes.

Notes

  • Forks by other vendors exist. None of them have made changes to the README.
  • Gist on how to run it from June 2022. This worked by running Elastic Agent on my Mac and running the demo with docker compose. A "proper" implementation would either include the Elastic Agent as a service and configure it in compose/k8s, or just allow it to be configured to send to a cluster's APM server.