joelonsql/flashback_logging.md

## flashback_logging.md

      
    Raw
  

              flashback_logging.md
            
          
    Flashback Logging

NOTE: Post-writing, I discovered a similar concept named Ring Buffer Logging. What sets Flashback Logging apart is its approach to handling the buffer. When a log event matches the standard log level threshold, such as 'INFO', the buffer is cleared. This behavior is based on the rationale that such a log event indicates the system's return to normal operations, eliminating the need to keep previously buffered entries.
Introduction

Flashback Logging is a design pattern in the logging and debugging domain. Its primary goal
is to reduce the time spent on debugging by making verbose logs readily available
during error occurrences without the storage overhead of always keeping verbose logs.
Problem Statement

Developers often face a dilemma:

Maintain verbose logs continuously, increasing storage overhead and potentially
impacting performance.
Log only errors to be efficient, but then struggle to diagnose issues due to the
lack of context.

The Solution: Flashback Logging

Flashback Logging intelligently buffers debug-level log entries. The decision to either preserve or discard these buffered entries depends on the severity levels of logs that follow.


Matching the Severity Threshold:
When a log hits the defined baseline severity:

The buffer is purged, indicating a return to standard operational status.


Exceeding or Meeting the Severity Threshold:
Upon a log event that meets or surpasses the predefined severity threshold:

The buffered debug logs are saved to persistent storage, providing valuable context to the moments leading up to the notable event.


Falling Below the Severity Threshold:
If a log event's severity doesn't reach the baseline threshold:

It quietly joins the buffer, ready to provide context if later events deem it necessary.


Example

Without Flashback Logging:

In this scenario, we're only logging operations at [INFO] and [ERROR] levels, without the detailed [DEBUG] logs.
[INFO]  2023-08-17 08:45:02 - Started processing files.
[INFO]  2023-08-17 08:45:10 - File 'data1.txt' successfully processed.
[INFO]  2023-08-17 08:45:10 - File 'data2.txt' successfully processed.
[ERROR] 2023-08-17 08:45:15 - Permission denied.

From this log, you can deduce that something went wrong after processing 'data2.txt', but the exact context is missing.
With Flashback Logging:

In contrast, with Flashback Logging, the [DEBUG] logs are retained, offering a detailed pre-error context.
[INFO]  2023-08-17 08:45:02 - Started processing files.
[INFO]  2023-08-17 08:45:10 - File 'data1.txt' successfully processed.
[INFO]  2023-08-17 08:45:10 - File 'data2.txt' successfully processed.
[DEBUG] 2023-08-17 08:45:09 - Opening 'data3.txt'.
[ERROR] 2023-08-17 08:45:15 - Permission denied.

From the [DEBUG] logs, it's clear that the "Permission denied" error occurred when the system tried to open 'data3.txt'.
This comparison illustrates the benefits of the Flashback Logging pattern, as it allows developers to quickly understand the events leading up to an error.
Benefits


Immediate Debug Context: When an error occurs, developers have the necessary
verbose logs at their fingertips.
Storage Efficiency: By not persisting all debug logs, storage overhead is
significantly reduced.
Performance: Buffering allows for more efficient logging, reducing the I/O
operations compared to persistent logging.

Trade-offs


Memory Usage: Buffered logs consume memory. The size of the buffer should be
judiciously chosen.
Limited History: Only events leading up to the error are captured, which might
omit valuable context from earlier events.

Implementation Example

In Rust, a mechanism could be implemented using the tracing framework.
pub struct LogLayer {
    log_handler: Mutex<Box<dyn LogHandler + Send>>,
    sequence_number: AtomicI64,
    buffered_events: Mutex<VecDeque<LogEntry>>,
    log_level: tracing::Level,
}

impl<S> Layer<S> for LogLayer
where
    S: Subscriber + for<'a> tracing_subscriber::registry::LookupSpan<'a>,
{
    fn on_event(&self, event: &Event<'_>, _ctx: Context<'_, S>) {
        const RING_BUFFER_CAPACITY: usize = 1000;
        let logged_at = time::OffsetDateTime::now_utc();
        let event_level = event.metadata().level();
        let mut message = String::new();
        let mut extra = HashMap::new();
        let mut visitor = RecordVisitor::new(&mut message, &mut extra);
        event.record(&mut visitor);
        let extra_value = serde_json::to_value(&extra).expect("Failed to serialize the HashMap");
        let entry = LogEntry {
            logged_at,
            severity: event_level.to_string(),
            message,
            extra: extra_value,
        };
        let mut buffered_events_lock = self.buffered_events.lock().unwrap();
        // If the current event's level matches the set log level, clear the buffer.
        // This indicates that we're back to a "normal" state.
        // E.g. INFO == INFO, so we clear the buffer.
        if *event_level == self.log_level {
            buffered_events_lock.clear();
        }
        // If the current event's level is less than or equal to the set log level:
        // - Flush the buffered entries to the log.
        // - Also log the current event.
        // E.g. ERROR <= INFO, so we log the buffered entries and the current event.
        if *event_level <= self.log_level {
            for buffered_entry in buffered_events_lock.drain(..) {
                self.log_entry(buffered_entry);
            }
            self.log_entry(entry);
        }
        // If the current event's level is above the set log level, buffer the event.
        // These are the debug-level logs we might need for context later.
        // E.g. DEBUG
        else {
            if buffered_events_lock.len() >= RING_BUFFER_CAPACITY {
                buffered_events_lock.pop_front();
            }
            buffered_events_lock.push_back(entry);
        }
    }
}
Real-World Use Cases


Microservices Architecture: In complex, distributed systems, understanding the
events leading up to an error is pivotal.
Performance-Critical Applications: Where storage I/O might be a bottleneck,
buffering and selectively persisting logs can make a significant difference.

By adopting the Flashback Logging pattern, developers can enjoy the best of both worlds:
immediate context for errors and efficient storage usage.