AI Observability & Monitoring — A Research Series

API Access for Researchers — all data and models from this series are available via the API Gateway.
Research Series
DOI 10.5281/zenodo.13947152
AI Observability and Monitoring: Frameworks, Tools, and Production Reliability

Oleh Ivchenko¹

¹ Odesa National Polytechnic University (ONPU)

Type: Technical Research Series
Status: Ongoing · 3 articles · 2026–
Tool: OTel AI Inspector → OpenTelemetry → GitHub
Abstract

Production AI systems demand observability beyond traditional monitoring frameworks. This series explores comprehensive observability for AI — from OpenTelemetry-based distributed tracing of ML pipelines to LLM token consumption tracking, model drift detection, and production reliability engineering for AI systems. Spanning foundational OpenTelemetry principles, AI-specific instrumentation patterns, and real-world deployment scenarios, the series documents best practices for monitoring and troubleshooting AI systems in production environments.


Idea and Motivation

Production AI systems operate in environments where traditional application monitoring is insufficient. Token consumption, model inference latency, inference queue depth, and model performance drift are not captured by infrastructure metrics alone. Observability for AI requires a shift from passive monitoring to active tracing of decisions, embeddings, and model interactions.

The motivation for this series arose from the gap between OpenTelemetry’s rich capabilities and the specific instrumentation patterns required for ML systems. Existing observability frameworks handle request tracing beautifully but lack AI-specific semantic context. This series bridges that gap by documenting how to instrument AI systems for production observability using standardized protocols while capturing the unique concerns of ML workloads.


Goal

The series aims to establish a reproducible, open framework for observability in AI systems. This means providing practical guidance for instrumenting ML pipelines, LLM applications, and model serving infrastructure; demonstrating how to detect model drift and performance degradation; documenting integration patterns with OpenTelemetry and standard observability backends; and validating these approaches against real production workloads.

The goal is to equip teams with both conceptual understanding and practical tooling to achieve visibility into their AI systems, enabling faster debugging, better reliability, and informed decisions about model updates and retraining.
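As one concrete instance of the drift detection mentioned above, a population stability index (PSI) over binned prediction scores is a common starting point. The dependency-free sketch below assumes scores in [0, 1]; the bin count, the simulated shift, and the conventional 0.2 alert threshold are illustrative assumptions, not recommendations from the series:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between two score samples in [0, 1].

    Bins both samples on a fixed [0, 1] grid and sums
    (p_actual - p_expected) * ln(p_actual / p_expected) over the bins.
    """
    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            idx = min(int(x * bins), bins - 1)  # clamp x == 1.0 into the last bin
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(sample), 1e-6) for c in counts]

    p_exp = proportions(expected)
    p_act = proportions(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(p_exp, p_act))

baseline = [i / 1000 for i in range(1000)]       # uniform reference scores
shifted = [min(x + 0.3, 1.0) for x in baseline]  # simulated drifted scores
print(round(psi(baseline, shifted), 3))          # PSI above ~0.2 usually flags drift
```

In an observability pipeline, `expected` would come from a frozen reference window and `actual` from a rolling production window, with the PSI value exported as a regular metric.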


Scope

The series covers observability across the full AI system lifecycle:

Table 1. Research phases and thematic coverage
Phase | Focus Area | Key Topics
----- | ---------- | ----------
1 | Foundations | Observability vs. monitoring, OpenTelemetry primitives (spans, traces, metrics, logs), AI system architecture context, production constraints
2 | ML Instrumentation | Tracing ML pipelines, data pipeline observability, model training telemetry, feature store instrumentation, distributed training traces
3 | LLM Observability | Token counting and cost tracking, prompt/completion tracing, embedding and vector search instrumentation, multi-turn conversation tracking
4 | Production Monitoring | Model drift detection, inference latency breakdown, queue monitoring, batch vs. real-time serving telemetry, resource utilization in inference
5 | Integration & Tooling | OTel AI Inspector architecture, integration with Jaeger/Datadog/New Relic/Honeycomb, SDKs and libraries for common ML frameworks, custom collectors for AI metrics
6 | Reliability & Scale | High-cardinality data management, sampling strategies for observability at scale, cost optimization, incident response workflows, debugging production AI issues
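To make one of the Phase 6 topics concrete: the head-based sampling used to keep trace volume manageable can be sketched as a deterministic trace-ID ratio decision, which is conceptually how OpenTelemetry's `TraceIdRatioBased` sampler behaves. The threshold arithmetic below is an illustrative simplification, not the SDK's exact implementation:

```python
import random

def should_sample(trace_id: int, ratio: float) -> bool:
    """Deterministic head-based sampling decision.

    Every service that sees the same 128-bit trace ID makes the same
    keep/drop decision, so sampled traces stay complete end to end.
    """
    bound = int(ratio * (1 << 64))
    # Compare the low 64 bits of the trace ID against the ratio bound.
    return (trace_id & ((1 << 64) - 1)) < bound

rng = random.Random(42)
trace_ids = [rng.getrandbits(128) for _ in range(100_000)]
kept = sum(should_sample(t, 0.10) for t in trace_ids)
print(kept)  # roughly 10% of 100,000
```

Because the decision is a pure function of the trace ID, no cross-service coordination is needed, which is why this strategy scales to high-volume inference traffic.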

Focus

The primary technical focus is on OpenTelemetry as the foundation for AI observability, extended with AI-specific semantic conventions and instrumentation patterns. Special emphasis is placed on distributed tracing for ML pipelines, where understanding the end-to-end flow from data ingestion through model prediction is critical. LLM observability receives dedicated treatment, including token cost attribution, latency profiling across LLM API calls, and context window management. Model drift and performance degradation detection are treated as first-class observability concerns.
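At its simplest, the token cost attribution described here reduces to aggregating per-span token counts against a price table, keyed by whatever dimension (tenant, team, feature) the span attributes carry. In this dependency-free sketch the model names and per-1K-token prices are invented placeholders, not real provider pricing:

```python
from collections import defaultdict

# Hypothetical per-1K-token prices; real pricing varies by provider and model.
PRICE_PER_1K = {
    "example-small": {"input": 0.0005, "output": 0.0015},
    "example-large": {"input": 0.0100, "output": 0.0300},
}

def attribute_costs(spans):
    """Aggregate LLM spend per tenant from span-like records.

    Each record mimics the attributes an instrumented LLM span would
    carry: model name, token usage, and a tenant identifier.
    """
    totals = defaultdict(float)
    for s in spans:
        price = PRICE_PER_1K[s["model"]]
        cost = (s["input_tokens"] * price["input"] +
                s["output_tokens"] * price["output"]) / 1000
        totals[s["tenant"]] += cost
    return dict(totals)

spans = [
    {"tenant": "team-a", "model": "example-small",
     "input_tokens": 2000, "output_tokens": 500},
    {"tenant": "team-b", "model": "example-large",
     "input_tokens": 1000, "output_tokens": 1000},
]
print(attribute_costs(spans))
```

In production the records would be finished spans queried from the tracing backend rather than hand-built dictionaries, but the aggregation logic is the same.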


Limitations

  • Framework scope: Focus on OpenTelemetry-based approaches. Proprietary ML observability platforms are discussed as reference points but not deeply integrated.
  • Production data: Examples use public datasets and synthetic scenarios. Real production traces are anonymized or simulated.
  • No production SLAs: Research-grade implementations. Commercial SLA guarantees and operational support are outside scope.
  • Cost modeling: Observability cost analysis is approximate and depends heavily on infrastructure choices and data volume.

Scientific Value

This series makes three contributions to the field. First, it documents OpenTelemetry as the foundational standard for AI system observability, establishing semantic conventions and instrumentation patterns that can be adopted widely across the ML community. Second, it addresses a gap in observability literature by treating AI-specific concerns (model drift, token accounting, prompt tracing) as primary observability challenges rather than afterthoughts. Third, the OTel AI Inspector tool provides a tangible artefact that implements the series’ recommendations, serving as both a reference implementation and a practical tool for teams instrumenting AI systems.

By anchoring this work in OpenTelemetry rather than proprietary observability platforms, the series ensures that recommendations remain vendor-agnostic and accessible to teams with varied infrastructure and budget constraints.


Resources

  • OTel AI Inspector Tool
  • OpenTelemetry Community
  • OpenTelemetry Documentation
  • OpenTelemetry GitHub
  • Series DOI: 10.5281/zenodo.13947152

Status

Ongoing. First article published March 2026. Additional articles planned across all research phases. The OTel AI Inspector tool is available for public use and feedback. This series is a living research effort; updates and new articles will be added as the field of AI observability evolves.


Contribution Opportunities

Researchers and practitioners wishing to build on this work are encouraged to engage in the following directions:

  • Framework extensions: Develop OpenTelemetry instrumentation libraries for ML frameworks (PyTorch, TensorFlow, JAX, Hugging Face) that are missing comprehensive tracing support.
  • Semantic conventions: Collaborate on formalizing AI-specific OpenTelemetry semantic conventions for model monitoring, embedding operations, and LLM interactions.
  • OTel AI Inspector: Contribute to the open-source tool repository with new analysis capabilities, visualization features, and backend integrations.
  • Case studies: Document real-world implementations of AI observability in production environments, highlighting lessons learned and operational patterns.
  • Cost analysis: Build tools and models for predicting and optimizing observability costs in high-volume AI inference scenarios.

Published Articles

Technical Research · 3 published · By Oleh Ivchenko
1. Observability for AI Systems: Why OpenTelemetry Is Not Enough and What the Community Needs · DOI: 10.5281/zenodo.18864333 · Quality score: 33
Badge | Metric | Value | Status | Description
----- | ------ | ----- | ------ | -----------
[s] | Reviewed Sources | 0% | ○ | ≥80% from editorially reviewed sources
[t] | Trusted | 22% | ○ | ≥80% from verified, high-quality sources
[a] | DOI | 6% | ○ | ≥80% have a Digital Object Identifier
[b] | CrossRef | 0% | ○ | ≥80% indexed in CrossRef
[i] | Indexed | 22% | ○ | ≥80% have metadata indexed
[l] | Academic | 22% | ○ | ≥80% from journals/conferences/preprints
[f] | Free Access | 33% | ○ | ≥80% are freely accessible
[r] | References | 18 refs | ✓ | Minimum 10 references required
[w] | Words [REQ] | 2,801 | ✓ | Minimum 2,000 words for a full research article
[d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation (10.5281/zenodo.18864333)
[o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity
[p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer
[h] | Freshness [REQ] | 6% | ✗ | ≥60% of references from 2025–2026
[c] | Data Charts | 0 | ○ | Original data charts from reproducible analysis (min 2)
[g] | Code | — | ○ | Source code available on GitHub
[m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams
[x] | Cited by | 0 | ○ | Referenced by 0 other hub article(s)
Score = Ref Trust (21 × 60%) + Required (3/5 × 30%) + Optional (1/4 × 10%)
Technical Research · Mar 4, 2026 · 14 min read
2. Manufacturing AI Observability: Monitoring Explanation Quality in Predictive Maintenance Systems · DOI: 10.5281/zenodo.19761055 · Quality score: 56
Badge | Metric | Value | Status | Description
----- | ------ | ----- | ------ | -----------
[s] | Reviewed Sources | 0% | ○ | ≥80% from editorially reviewed sources
[t] | Trusted | 100% | ✓ | ≥80% from verified, high-quality sources
[a] | DOI | 50% | ○ | ≥80% have a Digital Object Identifier
[b] | CrossRef | 0% | ○ | ≥80% indexed in CrossRef
[i] | Indexed | 0% | ○ | ≥80% have metadata indexed
[l] | Academic | 100% | ✓ | ≥80% from journals/conferences/preprints
[f] | Free Access | 100% | ✓ | ≥80% are freely accessible
[r] | References | 2 refs | ○ | Minimum 10 references required
[w] | Words [REQ] | 1,089 | ✗ | Minimum 2,000 words for a full research article
[d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation (10.5281/zenodo.19761055)
[o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity
[p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer
[h] | Freshness [REQ] | 100% | ✓ | ≥60% of references from 2025–2026
[c] | Data Charts | 0 | ○ | Original data charts from reproducible analysis (min 2)
[g] | Code | — | ○ | Source code available on GitHub
[m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams
[x] | Cited by | 0 | ○ | Referenced by 0 other hub article(s)
Score = Ref Trust (59 × 60%) + Required (3/5 × 30%) + Optional (1/4 × 10%)
Technical Research · Apr 25, 2026 · 5 min read
3. XAI Observability: Monitoring Explainability Drift in Production Models · DOI: 10.5281/zenodo.19823676 · Quality score: 49
Badge | Metric | Value | Status | Description
----- | ------ | ----- | ------ | -----------
[s] | Reviewed Sources | 22% | ○ | ≥80% from editorially reviewed sources
[t] | Trusted | 67% | ○ | ≥80% from verified, high-quality sources
[a] | DOI | 44% | ○ | ≥80% have a Digital Object Identifier
[b] | CrossRef | 22% | ○ | ≥80% indexed in CrossRef
[i] | Indexed | 33% | ○ | ≥80% have metadata indexed
[l] | Academic | 78% | ○ | ≥80% from journals/conferences/preprints
[f] | Free Access | 67% | ○ | ≥80% are freely accessible
[r] | References | 9 refs | ○ | Minimum 10 references required
[w] | Words [REQ] | 1,756 | ✗ | Minimum 2,000 words for a full research article
[d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation (10.5281/zenodo.19823676)
[o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity
[p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer
[h] | Freshness [REQ] | 50% | ✗ | ≥60% of references from 2025–2026
[c] | Data Charts | 0 | ○ | Original data charts from reproducible analysis (min 2)
[g] | Code | — | ○ | Source code available on GitHub
[m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams
[x] | Cited by | 0 | ○ | Referenced by 0 other hub article(s)
Score = Ref Trust (57 × 60%) + Required (2/5 × 30%) + Optional (1/4 × 10%)
Technical Research · Apr 26, 2026 · 9 min read
3 published · 698 total views · 28 min total reading · Mar 2026 – Apr 2026

© 2026 Stabilarity OÜ. Content licensed under CC BY 4.0