@lmolkova · Last active June 3, 2024 21:39
Writing a log-based event with the Python logging API

Prereqs

  • install opentelemetry-sdk and azure-monitor-opentelemetry-exporter
  • set the APPLICATIONINSIGHTS_CONNECTION_STRING environment variable (or remove the Azure Monitor exporters)
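
A minimal pre-flight check for these prereqs might look like the snippet below; the pip command in the comment and the fail-fast check are illustrative additions, not part of the gist:

# Illustrative setup, assuming a local run:
#   pip install opentelemetry-sdk azure-monitor-opentelemetry-exporter
import os

# Fail fast if the Azure Monitor exporters would have nothing to connect to.
if not os.environ.get("APPLICATIONINSIGHTS_CONNECTION_STRING"):
    raise RuntimeError(
        "Set APPLICATIONINSIGHTS_CONNECTION_STRING or remove the Azure Monitor exporters"
    )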

Result

{
    "name": "completions gpt-4",
    "context": {
        "trace_id": "0x6194e8840c82d3d4976b004835d78696",
        "span_id": "0x988bf35e7dd5ba02",
        "trace_state": "[]"
    },
    "kind": "SpanKind.INTERNAL",
    "parent_id": null,
    "start_time": "2024-05-31T20:46:32.954871Z",
    "end_time": "2024-05-31T20:46:32.954871Z",
    "status": {
        "status_code": "UNSET"
    },
    "attributes": {
        "gen_ai.system": "openai"
    },
    "events": [],
    "links": [],
    "resource": {
        "attributes": {
            "telemetry.sdk.language": "python",
            "telemetry.sdk.name": "opentelemetry",
            "telemetry.sdk.version": "1.25.0",
            "service.name": "unknown_service"
        },
        "schema_url": ""
    }
}
{
    "body": {
        "input": "foo",
        "output": "bar"
    },
    "severity_number": "<SeverityNumber.UNSPECIFIED: 0>",
    "severity_text": null,
    "attributes": {
        "event.name": "gen_ai.evaluation",
        "gen_ai.evaluation.status": "contains_apology",
        "gen_ai.evaluation.score": 42
    },
    "dropped_attributes": 0,
    "timestamp": "2024-05-31T20:46:33.137853Z",
    "observed_timestamp": "2024-05-31T20:46:33.137853Z",
    "trace_id": "0x6194e8840c82d3d4976b004835d78696",
    "span_id": "0x988bf35e7dd5ba02",
    "trace_flags": 1,
    "resource": ""
}
...
from time import time_ns
import typing

import opentelemetry
from opentelemetry import trace, _logs  # _logs is an unfortunate hack that will eventually be resolved on the OTel side with the new Event API
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter
from opentelemetry.sdk._logs import LoggerProvider
from opentelemetry.sdk._logs.export import SimpleLogRecordProcessor, ConsoleLogExporter
from azure.monitor.opentelemetry.exporter import AzureMonitorLogExporter, AzureMonitorTraceExporter


def configure_tracing() -> TracerProvider:
    provider = TracerProvider()
    trace.set_tracer_provider(provider)
    # provider.add_span_processor(SimpleSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317")))
    provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
    provider.add_span_processor(SimpleSpanProcessor(AzureMonitorTraceExporter()))
    return provider


def configure_logging() -> LoggerProvider:
    provider = LoggerProvider()
    _logs.set_logger_provider(provider)
    # provider.add_log_record_processor(SimpleLogRecordProcessor(OTLPLogExporter()))
    provider.add_log_record_processor(SimpleLogRecordProcessor(ConsoleLogExporter()))
    provider.add_log_record_processor(SimpleLogRecordProcessor(AzureMonitorLogExporter()))
    return provider


def create_evaluation_event(status: str, score: typing.Any, span_context: trace.SpanContext, body) -> opentelemetry.sdk._logs.LogRecord:
    return opentelemetry.sdk._logs.LogRecord(
        timestamp=time_ns(),
        observed_timestamp=time_ns(),
        trace_id=span_context.trace_id,
        span_id=span_context.span_id,
        trace_flags=span_context.trace_flags,
        severity_text=None,
        severity_number=_logs.SeverityNumber.UNSPECIFIED,
        body=body,
        attributes={"event.name": "gen_ai.evaluation", "gen_ai.evaluation.status": status, "gen_ai.evaluation.score": score},
    )


tracer_provider = configure_tracing()
logging_provider = configure_logging()

tracer = tracer_provider.get_tracer("otel-instrumentation-openai")
promptflow_logger = logging_provider.get_logger("promptflow")

span_context = None
with tracer.start_as_current_span("completions gpt-4") as span:
    span.set_attribute("gen_ai.system", "openai")
    span_context = span.get_span_context()

# there will be a special Event API to do this properly; currently we have to use the experimental _logs API :(
promptflow_logger.emit(create_evaluation_event("contains_apology", 42, span_context, {"input": "foo", "output": "bar"}))
promptflow_logger.emit(create_evaluation_event("hallucination", 0.42, span_context, {"input": "foo", "output": "bar" * 4000}))
lmolkova commented Jun 3, 2024

Some additional thoughts:

  1. input/output are already available on the parent gen_ai span. Recording them again on the evaluation event would duplicate them.

    They should probably be configurable (opt-in); see the first sketch after this list:

    • users can opt into content on all gen_ai spans, in which case they don't need content on evaluations and won't enable it there
    • they can opt into content on evaluations only
    • they can opt in/out everywhere as well
  2. When thinking about evaluation as a metric, a single "evaluation" metric seems too broad.

    • different evaluations have different result types: one is binary (has an apology / does not have an apology), another is a floating-point score in [0, 1] or an unbounded integer score
    • so we won't be able to report them all as a single metric.

    Instead, we can do this:
    When an evaluation is reported as an event, it's reported with

    • a gen_ai.evaluation.* prefix, e.g. gen_ai.evaluation.contains_apology, gen_ai.evaluation.groundedness, etc.
    • a minimal set of common attributes (if any), payload, etc. that we can document
    • each event name/structure documented individually; different events can have different structures in addition to the common (if any) parts

    When an evaluation is reported as a metric, it's reported as

    • a gen_ai.evaluation.* metric: gen_ai.evaluation.contains_apology is just a counter, gen_ai.evaluation.groundedness is a histogram, etc.
    • similarly to events, these metrics may share a common set of attributes
    • each metric documented individually, since they have different units and instrument types (see the second sketch after this list)

    This way we keep the different signals consistent, and it's also easy to find everything related to evaluation by matching events/metrics whose names start with gen_ai.evaluation.*
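
To make the opt-in idea from point 1 concrete, a hypothetical flag could gate whether input/output content is attached to evaluation events at all. The environment variable name and helper below are illustrative only, not an agreed-upon convention:

import os

# Hypothetical opt-in switch; the variable name is illustrative, not a spec.
CAPTURE_EVALUATION_CONTENT = (
    os.environ.get("GEN_AI_CAPTURE_EVALUATION_CONTENT", "false").lower() == "true"
)

def evaluation_body(inputs: str, outputs: str) -> dict:
    # Only attach content when the user explicitly opted in; otherwise the
    # event carries just its attributes (event.name, status, score).
    if CAPTURE_EVALUATION_CONTENT:
        return {"input": inputs, "output": outputs}
    return {}

The gist's create_evaluation_event call would then take evaluation_body("foo", "bar") instead of a literal dict.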
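
For point 2, reporting the same evaluations as individually documented metrics could look roughly like the sketch below. The instrument names follow the gen_ai.evaluation.* pattern described above; the meter name and attributes are assumptions, and no MeterProvider setup is shown (without one, the calls are no-ops):

from opentelemetry import metrics

meter = metrics.get_meter("otel-instrumentation-openai")

# Binary evaluation: a counter incremented whenever an apology is detected.
contains_apology = meter.create_counter(
    "gen_ai.evaluation.contains_apology",
    unit="1",
    description="Responses that contained an apology",
)

# Bounded score in [0, 1]: a histogram, so the distribution is preserved.
groundedness = meter.create_histogram(
    "gen_ai.evaluation.groundedness",
    unit="1",
    description="Groundedness score of the response",
)

contains_apology.add(1, {"gen_ai.system": "openai"})
groundedness.record(0.42, {"gen_ai.system": "openai"})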
