
@nelsonsilva
Last active January 19, 2021 12:23
Metrics dashboards

Analytics

Status

Simple online analytics built on top of our ES HTTP passthrough.

Nuxeo Data Visualization elements expose a simple declarative DSL, e.g.:

  <nuxeo-repository-data ecm-primary-type="Note"
                         where='[{"range": {"dc:created": {"gte": startDate, "lte": endDate}}}]'
                         grouped-by="dc:creator"
                         data="{{data}}">
  </nuxeo-repository-data>
  • nuxeo-repository-data
  • nuxeo-audit-data
  • nuxeo-search-data (from audit)
  • nuxeo-workflow-data (from audit_wf)
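
To make the performance concern concrete, here is a minimal sketch of the kind of Elasticsearch search body such an element could translate to. It is an assumption, not the actual element implementation: the function name, the aggregation name "by" and the sample dates are hypothetical; only the field names and the range clause come from the example above.

```python
import json

def build_repository_query(primary_type, where, grouped_by, size=10):
    """Build an ES search body: filter by type and `where` clauses, then bucket by a field."""
    # Hypothetical translation of the element attributes into ES filters.
    filters = [{"term": {"ecm:primaryType": primary_type}}] + list(where)
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {"bool": {"filter": filters}},
        "aggs": {"by": {"terms": {"field": grouped_by, "size": size}}},
    }

# Sample dates stand in for the startDate / endDate bindings.
query = build_repository_query(
    "Note",
    [{"range": {"dc:created": {"gte": "2020-01-01", "lte": "2020-12-31"}}}],
    "dc:creator",
)
print(json.dumps(query, indent=2))
```

Every dashboard refresh would run one such aggregation per element against the application's cluster.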

Problem: Querying the application's ES cluster can be a big performance hit

Possible solutions

  • Online analytics solution
  • Dedicated ES index
  • Precompute the statistics (no requirement for realtime stats)

Current plan

It's all about observability: Metrics + Events + Logs + Traces

Metrics

We already have nuxeo-metrics (using Dropwizard's Metrics) and several reporters (Datadog, Graphite, JMX, Prometheus, Stackdriver, ...)

Multiple immediate usages (Analytics, APM, Billing, etc.)

Instrumentation

  • Compute metrics asynchronously => they should be low overhead to collect
  • Store them in a KV store => cached and available to the cluster
  • Can have multiple computations at different levels (application level / low level)
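
The first two points can be sketched as follows. This is a minimal illustration under stated assumptions, not the nuxeo-metrics implementation: `KeyValueStore` is an in-memory stand-in for a cluster-wide KV store, and `count_documents`, `refresh_metric` and the key name are hypothetical.

```python
import threading
import time

class KeyValueStore:
    """In-memory stand-in for a cluster-wide KV store (hypothetical)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

def count_documents():
    """Placeholder for an expensive computation, e.g. a repository count query."""
    time.sleep(0.01)
    return 1234

def refresh_metric(store, key, compute):
    """Run the computation on a background thread and cache the result."""
    worker = threading.Thread(target=lambda: store.put(key, compute()))
    worker.start()
    return worker

store = KeyValueStore()
worker = refresh_metric(store, "repository.documents.total", count_documents)
worker.join()  # joined only to make the example deterministic
print(store.get("repository.documents.total"))
```

Readers such as reporters or dashboards only ever hit the cached value, so collecting the metric stays cheap regardless of how expensive the computation is.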

Sample metrics

  • Repository metrics (VCS/DBS, ES, PGSQL/MongoDB/...):

    • Total number of documents
    • Total number of deleted documents
  • Storage metrics (Binary manager, GCS/S3/...):

    • Total GB
    • Total GB deleted
  • Workload metrics (Workmanager, K8s, ...):

    • "Processed" documents
    • "Processed" GB
    • Worker pool size, CPU usage, etc.
  • Usage metrics (Tomcat, Apache, Ingress, ...):

    • Number of active users
    • Inbound / outbound traffic

Telemetry

Telemetry refers to the collection of metrics over time, so the samples usually end up in a time-series DB.

  • Graphite, Prometheus, AWS CloudWatch, Google Cloud Monitoring, etc.
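
As an illustration of what "a metric over time" looks like on the wire, the sketch below formats one sample in Graphite's plaintext protocol (metric path, value, Unix timestamp, newline). The metric path and the helper name are hypothetical.

```python
import time

def graphite_line(path, value, timestamp=None):
    """Format one sample as 'path value timestamp' followed by a newline."""
    if timestamp is None:
        timestamp = int(time.time())  # default to "now" in Unix seconds
    return f"{path} {value} {int(timestamp)}\n"

# Fixed timestamp so the output is reproducible.
line = graphite_line("nuxeo.repository.documents.total", 1234, 1611058980)
print(line, end="")
```

Sending one such line per metric at a fixed interval is what turns a single computed value into a time series.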

Visualization

Need a holistic solution that allows fetching and visualizing the data as well as building custom dashboards.

Need to embed dashboards in Web UI.

  • Grafana: shareable dashboard and panels + REST API
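
One possible way to embed a panel in Web UI is Grafana's "share panel" embed link, which points an iframe at a single-panel (`d-solo`) URL. The sketch below builds such a URL; the host, dashboard UID, slug and panel id are hypothetical, and it assumes embedding is allowed in the Grafana configuration.

```python
from html import escape
from urllib.parse import urlencode

def panel_embed_url(base_url, dashboard_uid, slug, panel_id,
                    time_from="now-7d", time_to="now", org_id=1):
    """Build the URL of a single Grafana panel, suitable for an iframe."""
    query = urlencode({
        "orgId": org_id,
        "panelId": panel_id,
        "from": time_from,
        "to": time_to,
    })
    return f"{base_url.rstrip('/')}/d-solo/{dashboard_uid}/{slug}?{query}"

# All identifiers below are placeholders.
url = panel_embed_url("https://grafana.example.com", "abc123", "nuxeo-metrics", 2)
iframe = f'<iframe src="{escape(url, quote=True)}" width="450" height="200" frameborder="0"></iframe>'
print(iframe)
```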

TL;DR

Metrics

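| Metric              | Dimensions                         | Panel                                        |
|---------------------|------------------------------------|----------------------------------------------|
| Number of documents | Time (dc:created, dc:modified)     | Number of documents (total, per week, etc.)  |
|                     | Doctype (ecm:primaryType)          | Document count per type                      |
|                     | Creator (dc:creator)               | Top creators                                 |
|                     | Mimetype (file:content.mime-type)  | Files by mime-type                           |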
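
If these panels were computed from Elasticsearch, each dimension would map naturally to one aggregation. The sketch below is an assumption about how that mapping could look, not an existing API: the helper and panel names are hypothetical, and `calendar_interval` assumes a recent Elasticsearch 7.x (older versions use `interval`).

```python
def terms_agg(field, size=10):
    """Bucket documents by the distinct values of a field."""
    return {"terms": {"field": field, "size": size}}

def weekly_histogram(field):
    """Bucket documents per calendar week on a date field."""
    return {"date_histogram": {"field": field, "calendar_interval": "week"}}

# One aggregation per panel listed in the table (panel names are hypothetical).
PANELS = {
    "documents_per_week": weekly_histogram("dc:created"),
    "documents_per_type": terms_agg("ecm:primaryType"),
    "top_creators": terms_agg("dc:creator"),
    "files_by_mime_type": terms_agg("file:content.mime-type"),
}

# A single request can carry all aggregations; size 0 skips the hits.
body = {"size": 0, "aggs": PANELS}
print(sorted(body["aggs"]))
```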

Analytics

Event based analytics

Audit Log

| Name          | Event                                  |
|---------------|----------------------------------------|
| Top downloads | download (downloadReason = 'download') |
|               | search                                 |
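
A minimal sketch of the "Top downloads" computation over audit entries. The entry shape (`eventId`, `docUUID`, `extended`) is an assumption about how audit log entries are exposed, and the sample data is made up for illustration.

```python
from collections import Counter

def top_downloads(entries, limit=3):
    """Count 'download' events with downloadReason 'download', per document."""
    counts = Counter(
        entry["docUUID"]
        for entry in entries
        if entry.get("eventId") == "download"
        and entry.get("extended", {}).get("downloadReason") == "download"
    )
    return counts.most_common(limit)

# Hypothetical audit entries.
entries = [
    {"eventId": "download", "docUUID": "doc-a", "extended": {"downloadReason": "download"}},
    {"eventId": "download", "docUUID": "doc-a", "extended": {"downloadReason": "download"}},
    {"eventId": "download", "docUUID": "doc-b", "extended": {"downloadReason": "preview"}},
    {"eventId": "download", "docUUID": "doc-c", "extended": {"downloadReason": "download"}},
    {"eventId": "search", "docUUID": None, "extended": {}},
]
print(top_downloads(entries))  # [('doc-a', 2), ('doc-c', 1)]
```

Filtering on downloadReason keeps only explicit downloads, so entries logged for other reasons (here the made-up "preview") do not inflate the ranking.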