1 Odesa National Polytechnic University (ONPU)
- Type: Technical Research Series
- Status: Ongoing · 1+ articles · 2026–
- Tool: OTel AI Inspector → OpenTelemetry → GitHub
Production AI systems demand observability beyond traditional monitoring frameworks. This series explores comprehensive observability for AI — from OpenTelemetry-based distributed tracing of ML pipelines to LLM token consumption tracking, model drift detection, and production reliability engineering for AI systems. Spanning foundational OpenTelemetry principles, AI-specific instrumentation patterns, and real-world deployment scenarios, the series documents best practices for monitoring and troubleshooting AI systems in production environments.
Idea and Motivation
Production AI systems operate in environments where traditional application monitoring is insufficient. Token consumption, model inference latency, inference queue depth, and model performance drift are not captured by infrastructure metrics alone. Observability for AI requires a shift from passive monitoring to active tracing of decisions, embeddings, and model interactions.
The motivation for this series arose from the gap between OpenTelemetry’s rich capabilities and the specific instrumentation patterns required for ML systems. Existing observability frameworks handle request tracing beautifully but lack AI-specific semantic context. This series bridges that gap by documenting how to instrument AI systems for production observability using standardized protocols while capturing the unique concerns of ML workloads.
Goal
The series aims to establish a reproducible, open framework for observability in AI systems. This means providing practical guidance for instrumenting ML pipelines, LLM applications, and model serving infrastructure; demonstrating how to detect model drift and performance degradation; documenting integration patterns with OpenTelemetry and standard observability backends; and validating these approaches against real production workloads.
The goal is to equip teams with both conceptual understanding and practical tooling to achieve visibility into their AI systems, enabling faster debugging, better reliability, and informed decisions about model updates and retraining.
Scope
The series covers observability across the full AI system lifecycle:
| Phase | Focus Area | Key Topics |
|---|---|---|
| 1 | Foundations | Observability vs. monitoring, OpenTelemetry primitives (spans, traces, metrics, logs), AI system architecture context, production constraints |
| 2 | ML Instrumentation | Tracing ML pipelines, data pipeline observability, model training telemetry, feature store instrumentation, distributed training traces |
| 3 | LLM Observability | Token counting and cost tracking, prompt/completion tracing, embedding and vector search instrumentation, multi-turn conversation tracking |
| 4 | Production Monitoring | Model drift detection, inference latency breakdown, queue monitoring, batch vs. real-time serving telemetry, resource utilization in inference |
| 5 | Integration & Tooling | OTel AI Inspector architecture, integration with Jaeger/Datadog/New Relic/Honeycomb, SDKs and libraries for common ML frameworks, custom collectors for AI metrics |
| 6 | Reliability & Scale | High-cardinality data management, sampling strategies for observability at scale, cost optimization, incident response workflows, debugging production AI issues |
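The token counting and cost tracking concern from Phase 3 can be sketched as a small attribution pattern: per-call cost derived from token counts, rolled up per model. The model names and per-1K-token prices below are invented placeholders, not real provider pricing:

```python
from dataclasses import dataclass

# Hypothetical (input, output) USD prices per 1K tokens -- placeholders only.
PRICES_PER_1K = {
    "small-model": (0.0005, 0.0015),
    "large-model": (0.01, 0.03),
}

@dataclass
class LLMCallRecord:
    model: str
    input_tokens: int
    output_tokens: int

def call_cost(rec: LLMCallRecord) -> float:
    """Attribute a USD cost to a single LLM call from its token counts."""
    in_price, out_price = PRICES_PER_1K[rec.model]
    return rec.input_tokens / 1000 * in_price + rec.output_tokens / 1000 * out_price

def aggregate_cost(records: list[LLMCallRecord]) -> dict[str, float]:
    """Roll token spend up into a per-model cost report."""
    report: dict[str, float] = {}
    for r in records:
        report[r.model] = report.get(r.model, 0.0) + call_cost(r)
    return report
```

In a traced system, each `LLMCallRecord` would be populated from span attributes (such as the token counts recorded during instrumentation), which is what makes per-request cost attribution possible rather than just a monthly bill.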
Focus
The primary technical focus is on OpenTelemetry as the foundation for AI observability, extended with AI-specific semantic conventions and instrumentation patterns. Special emphasis is placed on distributed tracing for ML pipelines, where understanding the end-to-end flow from data ingestion through model prediction is critical. LLM observability receives dedicated treatment, including token cost attribution, latency profiling across LLM API calls, and context window management. Model drift and performance degradation detection are treated as first-class observability concerns.
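One common way to turn drift into a first-class, computable observability signal is the Population Stability Index (PSI), which compares a live feature or score distribution against a training-time baseline. A dependency-free sketch follows; the 0.2 alert threshold mentioned in the comment is a widely used heuristic, not a standard:

```python
import math

def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population Stability Index between a baseline and a live sample.

    Bins are fixed from the baseline distribution; PSI > 0.2 is a common
    heuristic threshold for investigating drift.
    """
    lo, hi = min(expected), max(expected)

    def fractions(sample: list[float]) -> list[float]:
        counts = [0] * bins
        for x in sample:
            # Map the value to a baseline bin, clamping out-of-range values.
            idx = int((x - lo) / (hi - lo) * bins) if hi > lo else 0
            counts[max(0, min(idx, bins - 1))] += 1
        total = len(sample)
        # Small epsilon avoids log(0) for empty bins.
        return [max(c / total, 1e-6) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Emitted periodically as a gauge metric per feature or per model output, a value like this lets drift alerts ride on the same alerting pipeline as latency and error-rate alerts.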
Limitations
Scientific Value
This series makes three contributions to the field. First, it documents OpenTelemetry as the foundational standard for AI system observability, establishing semantic conventions and instrumentation patterns that can be adopted widely across the ML community. Second, it addresses a gap in observability literature by treating AI-specific concerns (model drift, token accounting, prompt tracing) as primary observability challenges rather than afterthoughts. Third, the OTel AI Inspector tool provides a tangible artefact that implements the series’ recommendations, serving as both a reference implementation and a practical tool for teams instrumenting AI systems.
By anchoring this work in OpenTelemetry rather than proprietary observability platforms, the series ensures that recommendations remain vendor-agnostic and accessible to teams with varied infrastructure and budget constraints.
Resources
- OTel AI Inspector Tool
- OpenTelemetry Community
- OpenTelemetry Documentation
- OpenTelemetry GitHub
- Series DOI: 10.5281/zenodo.13947152
Status
Ongoing. First article published March 2026. Additional articles planned across all research phases. The OTel AI Inspector tool is available for public use and feedback. This series is a living research effort; updates and new articles will be added as the field of AI observability evolves.
Contribution Opportunities
Researchers and practitioners wishing to build on this work are encouraged to engage in the following directions:
- Framework extensions: Develop OpenTelemetry instrumentation libraries for ML frameworks (PyTorch, TensorFlow, JAX, Hugging Face) that currently lack comprehensive tracing support.
- Semantic conventions: Collaborate on formalizing AI-specific OpenTelemetry semantic conventions for model monitoring, embedding operations, and LLM interactions.
- OTel AI Inspector: Contribute to the open-source tool repository with new analysis capabilities, visualization features, and backend integrations.
- Case studies: Document real-world implementations of AI observability in production environments, highlighting lessons learned and operational patterns.
- Cost analysis: Build tools and models for predicting and optimizing observability costs in high-volume AI inference scenarios.