Glossary term
Evaluation and Benchmarks
A timed operation within a trace.
Datadog APM records a span for each OpenAI API call within an agent trace, capturing model name, prompt tokens, completion tokens, latency, and error codes, which enables per-call cost and latency analysis.
OpenTelemetry's semantic conventions for generative AI (LLM) systems define standard span attributes such as gen_ai.request.model and gen_ai.usage.input_tokens, which Honeycomb customers use to query agent performance by model version across millions of spans.
LangSmith's span-level view lets engineers drill into a specific retrieval span within a RAG trace, inspecting the exact query sent to the vector database, the chunks returned, and the similarity scores.
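To make the definition concrete, the following is a minimal sketch of a span as a timed, attributed operation nested inside a trace. It uses only the Python standard library and does not reproduce any vendor's API: the `Span` and `Tracer` classes, the `input_tokens_by_model` helper, the model name `model-a`, and the `retrieval.*` attribute keys are hypothetical names chosen for illustration. The `gen_ai.*` and `error.type` keys are borrowed from the OpenTelemetry conventions mentioned above.

```python
import time
import uuid
from collections import defaultdict
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One timed operation inside a trace (hypothetical structure)."""
    name: str
    trace_id: str
    parent_id: Optional[str] = None
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    attributes: dict = field(default_factory=dict)
    start_ns: int = 0
    end_ns: int = 0

    @property
    def duration_ms(self) -> float:
        return (self.end_ns - self.start_ns) / 1e6


class Tracer:
    """Keeps finished spans in memory; a real backend would export them."""

    def __init__(self):
        self.trace_id = uuid.uuid4().hex
        self.finished = []   # spans in the order they ended
        self._stack = []     # currently open spans, innermost last

    @contextmanager
    def span(self, name, attributes=None):
        # The innermost open span, if any, becomes the parent.
        parent_id = self._stack[-1].span_id if self._stack else None
        s = Span(name, self.trace_id, parent_id, attributes=dict(attributes or {}))
        s.start_ns = time.perf_counter_ns()
        self._stack.append(s)
        try:
            yield s
        except Exception as exc:
            # Record the failure on the span, then let it propagate.
            s.attributes["error.type"] = type(exc).__name__
            raise
        finally:
            s.end_ns = time.perf_counter_ns()
            self._stack.pop()
            self.finished.append(s)


def input_tokens_by_model(spans):
    """Sum input tokens per model across spans that carry a model attribute."""
    totals = defaultdict(int)
    for s in spans:
        model = s.attributes.get("gen_ai.request.model")
        if model is not None:
            totals[model] += s.attributes.get("gen_ai.usage.input_tokens", 0)
    return dict(totals)


tracer = Tracer()
with tracer.span("agent.run"):
    with tracer.span("chat", {"gen_ai.request.model": "model-a"}) as call:
        call.attributes["gen_ai.usage.input_tokens"] = 120
    with tracer.span("retrieval", {"retrieval.query": "what is a span?"}) as hit:
        hit.attributes["retrieval.chunks_returned"] = 3

for s in tracer.finished:
    print(s.name, s.parent_id, f"{s.duration_ms:.3f} ms", s.attributes)
print(input_tokens_by_model(tracer.finished))
```

Child spans finish before their parent, so the root `agent.run` span appears last in `tracer.finished`, and its duration covers both children. Grouping by an attribute, as `input_tokens_by_model` does, is a small-scale version of the per-model queries described above.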