Summary
Distributed tracing is an observability technique that follows a request through a distributed system, attaching timing and contextual metadata at each service hop so that the full execution path can be reconstructed.
What is Distributed Tracing?
In a microservices architecture, a single user action may traverse dozens of services. When something goes wrong or performance degrades, finding the cause from per-service logs and metrics alone is slow and error-prone, because nothing ties the records from different services to the same request. Distributed tracing solves this by assigning a unique trace ID to each request where it enters the system and propagating that ID across service calls via HTTP headers or messaging metadata.
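The propagation step can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: it assumes the W3C Trace Context `traceparent` header format (version, trace ID, parent span ID, flags), which the text above does not name, and the helper names `extract_context` and `inject_context` are hypothetical. Real services would normally rely on an instrumentation library rather than hand-rolled header handling.

```python
import re
import secrets
from typing import Dict, Optional, Tuple

# Assumed format (W3C Trace Context, version 00):
#   00-<32 hex trace id>-<16 hex parent span id>-<2 hex flags>
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_trace_id() -> str:
    """Generate a random 16-byte trace ID as 32 hex characters."""
    return secrets.token_hex(16)


def new_span_id() -> str:
    """Generate a random 8-byte span ID as 16 hex characters."""
    return secrets.token_hex(8)


def extract_context(headers: Dict[str, str]) -> Tuple[str, Optional[str]]:
    """Return (trace_id, parent_span_id) from incoming headers.

    If no valid header is present, this service is the entry point,
    so a fresh trace ID is created and there is no parent span.
    Simplification: header names are assumed to be lowercase already.
    """
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if match and match.group(1) != "0" * 32 and match.group(2) != "0" * 16:
        return match.group(1), match.group(2)
    return new_trace_id(), None


def inject_context(trace_id: str, span_id: str, headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the outgoing headers carrying the trace context.

    The current span's ID becomes the parent ID seen by the next service.
    The trailing "01" flag marks the request as sampled (an assumption here).
    """
    outgoing = dict(headers)
    outgoing["traceparent"] = f"00-{trace_id}-{span_id}-01"
    return outgoing


# Usage: continue an incoming trace and pass it on to a downstream call.
incoming = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
trace_id, parent_id = extract_context(incoming)
my_span_id = new_span_id()
downstream_headers = inject_context(trace_id, my_span_id, {"accept": "application/json"})
```

The trace ID stays the same across every hop, while the span ID changes at each service. That combination is what lets a backend later link each span to its parent.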
Each service creates a span, a timed unit of work that records the trace ID and the ID of its parent span, and reports it to a tracing backend. The backend assembles the spans that share a trace ID into a trace, usually displayed as a timeline showing which services were called, in what order, and how long each took. This makes it much easier to spot which service introduced latency or threw an error.
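To make the assembly step concrete, the sketch below groups spans by parent and prints them as an indented timeline. It is an illustration under stated assumptions: the `Span` fields, the `render_trace` function, and the sample services and timings are all hypothetical and do not reflect any particular backend's data model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    """A timed unit of work reported by one service (hypothetical schema)."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None for the root span
    service: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def render_trace(spans: List[Span]) -> List[str]:
    """Return one indented line per span, parents before children."""
    known_ids = {s.span_id for s in spans}
    children: Dict[Optional[str], List[Span]] = {}
    for s in spans:
        # Spans whose parent was never reported are treated as roots
        # so that they still appear in the output.
        key = s.parent_id if s.parent_id in known_ids else None
        children.setdefault(key, []).append(s)

    lines: List[str] = []

    def walk(parent_key: Optional[str], depth: int) -> None:
        # Siblings are ordered by start time to mimic a timeline view.
        for s in sorted(children.get(parent_key, []), key=lambda x: x.start_ms):
            lines.append(f"{'  ' * depth}{s.service} {s.duration_ms}ms")
            walk(s.span_id, depth + 1)

    walk(None, 0)
    return lines


# Usage: spans arrive in arbitrary order; the parent links restore the structure.
spans = [
    Span("t1", "c", "b", "db", 20, 100),
    Span("t1", "a", None, "gateway", 0, 120),
    Span("t1", "d", "a", "auth", 5, 9),
    Span("t1", "b", "a", "orders", 10, 110),
]
for line in render_trace(spans):
    print(line)
```

In this made-up trace, most of the gateway's time is spent in the orders service, and most of that is spent in its database call, which is the kind of conclusion a trace view is meant to support.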
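OpenTelemetry, which superseded the earlier OpenTracing and OpenCensus projects, standardizes instrumentation APIs and the span data model, and the W3C Trace Context specification defines the HTTP headers used to propagate context. Tools like Jaeger, Zipkin, Honeycomb, and Datadog APM provide storage and visualization.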
Why is Distributed Tracing relevant?
- Root-cause isolation: Identifies exactly which service or database call caused a slowdown
- Dependency mapping: Reveals unexpected service dependencies in complex systems
- SLO debugging: Pinpoints which operations violate latency budgets
- Incident response: Reduces mean time to resolution by providing full request context