## Logging vs Tracing vs Metrics


Logging, tracing, and metrics are the three pillars of system observability.
### Logging

Logging records discrete events in the system, such as an incoming request or a database query. Of the three pillars, it produces the highest volume of data. The ELK stack (Elasticsearch, Logstash, Kibana) is often used to build a log analysis platform. Teams usually agree on a standardized logging format so that the same keywords and fields can be searched across massive amounts of logs.
Explanation:

Logging involves recording specific events within a system, such as user requests or database interactions. These logs are voluminous and need to be managed efficiently. The ELK stack is commonly used to handle them. Logstash collects and processes the logs, Elasticsearch stores and indexes them, and Kibana visualizes them. A standardized log format simplifies search and analysis across different teams.
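The sketch below shows what a "standardized logging format" can look like in practice. It uses only Python's standard `logging` module. Every event is emitted as one JSON line with the same base keys. The key names (`ts`, `level`, `service`, `event`), the `fields` convention, and the `order-service` name are illustrative assumptions, not a standard or anything ELK requires. A shipper such as Logstash could ingest lines like these, but that wiring is not shown.

```python
import io
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render every log record as one JSON object with a fixed set of keys."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "service": self.service,
            "event": record.getMessage(),
        }
        # Event-specific fields arrive via logger.info(..., extra={"fields": {...}}).
        # "fields" is a hypothetical convention chosen for this sketch.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


def make_logger(service: str, stream) -> logging.Logger:
    """Return a logger that writes one JSON line per event to `stream`."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service))
    logger.handlers = [handler]  # replace handlers left over from earlier calls
    logger.propagate = False     # avoid duplicate lines via the root logger
    return logger


buffer = io.StringIO()  # stands in for stdout or a file that a log shipper tails
log = make_logger("order-service", buffer)
log.info("request_received", extra={"fields": {"path": "/orders", "user_id": 42}})
log.info("db_query", extra={"fields": {"table": "orders", "duration_ms": 120.5}})

# Because every line shares the same keys, a keyword search becomes a field match.
records = [json.loads(line) for line in buffer.getvalue().splitlines()]
slow_queries = [r for r in records if r.get("duration_ms", 0) > 100]
print(slow_queries)
```

The value of the format comes from the agreement between teams, not from the code. Once every service emits the same keys, a query like "all `db_query` events slower than 100 ms" works the same way everywhere.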
### Tracing

Tracing is usually request-scoped. For example, a user request might pass through the API gateway, the load balancer, service A, service B, and the database. A tracing system can visualize that entire path, which is useful for identifying bottlenecks in the system. OpenTelemetry shows what a typical architecture looks like: it unifies the three pillars in a single framework.
Explanation:

Tracing follows a request across the services and components of an application. It shows the request's path and the time spent at each stage, which helps in identifying and resolving bottlenecks. OpenTelemetry is an open-source observability framework. It provides a common set of APIs and SDKs for traces, metrics, and logs, giving a holistic view of system performance.
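To make "request-scoped" concrete, here is a minimal, hypothetical in-process tracer. It is not the OpenTelemetry API, and the names `Tracer`, `Span`, and `bottleneck` are invented for this sketch. Each span records a shared `trace_id`, its own `span_id`, and its parent's ID, so the request path through the components named above can be reconstructed afterwards. The sketch then picks the span with the largest self time, meaning its duration minus the time spent in its children. The `time.sleep` calls only simulate work, and everything runs in one process. Real tracers also propagate the trace context across service boundaries, for example in the W3C `traceparent` HTTP header.

```python
from __future__ import annotations

import contextvars
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass

# The span currently open in this execution context (None outside any span).
_current_span: contextvars.ContextVar = contextvars.ContextVar("current_span", default=None)


@dataclass
class Span:
    name: str
    trace_id: str          # shared by every span of one request
    span_id: str
    parent_id: str | None  # None for the root span
    start: float = 0.0
    end: float = 0.0

    @property
    def duration(self) -> float:
        return self.end - self.start


class Tracer:
    """Hypothetical in-process tracer that keeps finished spans in a list."""

    def __init__(self) -> None:
        self.finished: list[Span] = []

    @contextmanager
    def span(self, name: str):
        parent = _current_span.get()
        # A root span starts a new trace; child spans inherit the parent's trace_id.
        trace_id = parent.trace_id if parent is not None else uuid.uuid4().hex
        parent_id = parent.span_id if parent is not None else None
        span = Span(name, trace_id, uuid.uuid4().hex[:16], parent_id)
        token = _current_span.set(span)
        span.start = time.perf_counter()
        try:
            yield span
        finally:
            span.end = time.perf_counter()
            _current_span.reset(token)  # restore the parent as the current span
            self.finished.append(span)

    def bottleneck(self, trace_id: str) -> tuple[str, float]:
        """Name and self time (duration minus children) of the slowest span."""
        spans = [s for s in self.finished if s.trace_id == trace_id]
        child_time: dict[str, float] = {}
        for s in spans:
            if s.parent_id is not None:
                child_time[s.parent_id] = child_time.get(s.parent_id, 0.0) + s.duration

        def self_time(s: Span) -> float:
            return s.duration - child_time.get(s.span_id, 0.0)

        worst = max(spans, key=self_time)
        return worst.name, self_time(worst)


tracer = Tracer()


def handle_request() -> str:
    # Simulated hops; the sleeps stand in for real work at each component.
    with tracer.span("api_gateway") as root:
        with tracer.span("load_balancer"):
            with tracer.span("service_a"):
                time.sleep(0.002)
                with tracer.span("service_b"):
                    time.sleep(0.002)
                    with tracer.span("database"):
                        time.sleep(0.05)  # deliberately slow query
    return root.trace_id


trace_id = handle_request()
for s in reversed(tracer.finished):  # spans finish innermost-first; print root first
    print(f"{s.name:<14} parent={s.parent_id} {s.duration * 1000:.1f} ms")
print("bottleneck:", tracer.bottleneck(trace_id))
```

Using self time rather than total duration matters here. The root span's total duration always includes everything beneath it, so ranking by total time would just point at the entry point instead of the component that is actually slow.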
### Metrics

Metrics are aggregatable measurements of the system, such as service QPS (queries per second), API responsiveness, and service latency. The raw data is recorded in time-series databases such as InfluxDB. Prometheus pulls (scrapes) the data and evaluates it against pre-defined rules, including alerting rules. Grafana queries the data for display on dashboards. Firing alerts go to Alertmanager, which sends out email, SMS, or Slack notifications.
Explanation:

Metrics provide quantitative data about the system's performance, such as queries per second, API responsiveness, and service latency. Time-series databases like InfluxDB store this data, while Prometheus collects and processes the metrics and evaluates alerting rules against them. Grafana is used for visualization, and Alertmanager delivers alerts through channels like email, SMS, or Slack to notify the team of important or critical events.
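The sketch below shows in plain Python what "aggregatable" means and what evaluating alerting rules against aggregates involves. Raw per-request samples are collapsed into QPS and latency percentiles, and then threshold rules decide which notifications to emit. The `AlertRule` structure, the thresholds, and the channel names are hypothetical. Real Prometheus alerting rules are PromQL expressions, and Alertmanager does the routing. This is only an in-memory stand-in for that pipeline. The percentile uses the nearest-rank method.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Request:
    ts: float          # arrival time, seconds since the window opened
    latency_ms: float  # observed service latency


@dataclass
class AlertRule:
    name: str
    metric: str        # key into the aggregated metrics dict
    threshold: float   # the rule fires when the metric exceeds this value
    channel: str       # hypothetical routing target, e.g. "slack" or "sms"


def percentile(values: list[float], q: float) -> float:
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def aggregate(requests: list[Request], window_s: float) -> dict[str, float]:
    """Collapse raw per-request samples into a few aggregatable numbers."""
    latencies = [r.latency_ms for r in requests]
    return {
        "qps": len(requests) / window_s,
        "p50_latency_ms": percentile(latencies, 50),
        "p99_latency_ms": percentile(latencies, 99),
    }


def evaluate(rules: list[AlertRule], metrics: dict[str, float]) -> list[str]:
    """Return one notification message per firing rule (stand-in for Alertmanager)."""
    return [
        f"[{rule.channel}] {rule.name}: {rule.metric}={metrics[rule.metric]:g} > {rule.threshold:g}"
        for rule in rules
        if metrics[rule.metric] > rule.threshold
    ]


# 100 requests in a 10-second window; requests 49 and 99 are slow.
requests = [Request(ts=i * 0.1, latency_ms=400.0 if i % 50 == 49 else 20.0) for i in range(100)]
metrics = aggregate(requests, window_s=10.0)
rules = [
    AlertRule("HighLatency", "p99_latency_ms", 250.0, "slack"),
    AlertRule("TrafficSpike", "qps", 500.0, "sms"),
]
print(metrics)  # {'qps': 10.0, 'p50_latency_ms': 20.0, 'p99_latency_ms': 400.0}
for message in evaluate(rules, metrics):
    print(message)  # [slack] HighLatency: p99_latency_ms=400 > 250
```

The example also shows why percentiles usually back latency alerts rather than averages. Two slow requests out of 100 leave the median at 20 ms but push p99 to 400 ms, which is enough to trigger the latency rule.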