AI Architecture Comparison Observatory: AADA vs LLM-First Agents
DOI: 10.5281/zenodo.18928461[1]
AI Architecture Comparison Observatory #
Interactive comparison of AI-Augmented Agentic Deterministic Architecture (AADA) vs LLM-First Agent paradigms — with real systems, real data, and real citations.
Part of the Stabilarity Longitudinal Research[1] initiative.
Systems Under Comparison #
| System | Year | Paradigm | Key Feature | Source |
|---|---|---|---|---|
| AutoGPT | 2023 | LLM-First | Autonomous GPT-4 task loops | GitHub[2] |
| LangChain Agents | 2022– | LLM-First | Tool-augmented LLM chains | GitHub[3] |
| ReAct | 2023 | Hybrid | Reasoning + Acting interleaved | Yao et al., 2023[4] |
| BabyAGI | 2023 | LLM-First | Task-driven autonomous agent | GitHub[5] |
| MemGPT | 2023 | AADA-leaning | Memory stratification for LLMs | Packer et al., 2023[6] |
| Voyager | 2023 | Deterministic | Curriculum-driven skill library | Wang et al., 2023[7] |
| MetaGPT | 2023 | AADA | Multi-agent with SOPs & schemas | Hong et al., 2023[8] |
| Stabilarity Pipeline | 2026 | AADA | 206+ articles, memory stratification, schema validation | Ivchenko, 2026[1] |
1. Architecture Capability Radar #
Sources: Hong et al., 2023[8]; Packer et al., 2023[6]; Ivchenko, 2026[1]
2. Paradigm Adoption Over Time #
Data: arXiv search counts for “LLM agent” vs “multi-agent deterministic” categories, 2022–2026
3. Consistency Score by System #
4. Cost vs Consistency Tradeoff #
Cost estimates based on API token usage patterns reported in respective papers and production deployments.
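As a rough illustration of how a cost estimate can be derived from token usage, the sketch below multiplies prompt and completion token counts by per-1,000-token prices. The `TokenUsage` record, the token counts, and both prices are assumptions made up for this example; they are not figures from the papers or from the observatory's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenUsage:
    """Token counts for one agent run (hypothetical record shape)."""
    prompt_tokens: int
    completion_tokens: int


def estimate_cost_usd(usage: TokenUsage,
                      prompt_price_per_1k: float,
                      completion_price_per_1k: float) -> float:
    """Return the API cost of one run from its token counts."""
    prompt_cost = usage.prompt_tokens / 1000 * prompt_price_per_1k
    completion_cost = usage.completion_tokens / 1000 * completion_price_per_1k
    return prompt_cost + completion_cost


# Illustrative numbers only: an open-ended agent loop re-sends context on
# every iteration, while a fixed workflow makes a small number of calls.
loop_run = TokenUsage(prompt_tokens=120_000, completion_tokens=30_000)
workflow_run = TokenUsage(prompt_tokens=12_000, completion_tokens=3_000)

PROMPT_PRICE = 0.01      # assumed USD per 1,000 prompt tokens
COMPLETION_PRICE = 0.03  # assumed USD per 1,000 completion tokens

loop_cost = estimate_cost_usd(loop_run, PROMPT_PRICE, COMPLETION_PRICE)
workflow_cost = estimate_cost_usd(workflow_run, PROMPT_PRICE, COMPLETION_PRICE)
print(f"loop: ${loop_cost:.2f}, workflow: ${workflow_cost:.2f}")
```

Under these assumed inputs the cost scales linearly with token volume, so the comparison between systems depends mainly on how many tokens each architecture consumes per task.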
6. Architecture Fit Score — Your Use Case #
Adjust sliders to match your use case. The scatter plot updates in real time.
7. Key Milestones in Deterministic Agent Evolution #
- Yao et al. introduce Reasoning+Acting, first step toward structured agent behavior. arXiv:2210.03629[4]
- LLM-First autonomous agents explode in popularity, but consistency issues emerge immediately.
- Wang et al. demonstrate curriculum-driven skill library with deterministic verification. arXiv:2305.16291[7]
- Hong et al. formalize multi-agent SOPs with schema validation. arXiv:2308.00352[8]
- Packer et al. introduce memory stratification for persistent agent state. arXiv:2310.08560[6]
- OpenAI releases native structured output support, validating deterministic schema approach.
- Full AADA pipeline operational: 206+ articles, multi-agent with memory stratification, schema validation, ground-truth anchoring. DOI:10.5281/zenodo.18928461[1]
Read the full longitudinal study behind this data:
Longitudinal Report Generation with LLM-Based Agents, Ivchenko, 2026[1]. DOI: 10.5281/zenodo.18928461
Evaluate your own use case:
AI Use Case Classifier & Matcher, with Architecture Fit Score
Architectural Comparison Diagrams #
flowchart TD
subgraph LLM["LLM-First Agent Architecture"]
L1[User Request] --> L2["LLM Planner<br/>GPT-4 / Claude"]
L2 --> L3[Dynamic Tool Selection]
L3 --> L4[Execution]
L4 -->|result| L2
L2 -->|output| L5[Response]
end
subgraph AADA["AADA — Deterministic Architecture"]
A1[User Request] --> A2[Intent Classifier]
A2 --> A3[Pre-validated Workflow DAG]
A3 --> A4[Deterministic Tool Call]
A4 --> A5[Typed Output Validator]
A5 -->|pass| A6[Response]
A5 -->|fail| A7[Error Handler + Retry Policy]
A7 --> A3
end
LLM -.->|"Consistency: ~45-55%"| X[Comparison]
AADA -.->|"Consistency: ~78-96%"| X
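The AADA path in the flowchart above (intent classifier, pre-validated workflow, deterministic tool call, typed output validator, retry on failure) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Stabilarity implementation: the `Report` schema, the keyword-based `classify_intent`, the `flaky_report_tool`, and the retry limit are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Report:
    """Typed output accepted by the validator (hypothetical schema)."""
    title: str
    word_count: int


class ValidationError(Exception):
    """Raised when a tool result does not match the expected schema."""


def classify_intent(request: str) -> str:
    """Toy intent classifier: a keyword lookup stands in for a real model."""
    return "report" if "report" in request.lower() else "unknown"


def validate_report(raw: dict) -> Report:
    """Typed output validator: reject anything that is not a well-formed Report."""
    title, count = raw.get("title"), raw.get("word_count")
    if not isinstance(title, str) or not title:
        raise ValidationError("title must be a non-empty string")
    if not isinstance(count, int) or count < 0:
        raise ValidationError("word_count must be a non-negative integer")
    return Report(title=title, word_count=count)


def run_pipeline(request: str,
                 workflows: Dict[str, Callable[[str, int], dict]],
                 max_attempts: int = 3) -> Report:
    """Route the request to a pre-registered workflow and validate its output."""
    intent = classify_intent(request)
    if intent not in workflows:
        raise ValueError(f"no pre-validated workflow for intent {intent!r}")
    tool = workflows[intent]
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return validate_report(tool(request, attempt))
        except ValidationError as err:
            last_error = err  # retry policy: re-run the same workflow step
    raise RuntimeError(f"validation failed after {max_attempts} attempts: {last_error}")


def flaky_report_tool(request: str, attempt: int) -> dict:
    """Hypothetical tool that returns a malformed result on its first attempt."""
    if attempt == 1:
        return {"title": ""}
    return {"title": request.title(), "word_count": len(request.split())}


result = run_pipeline("weekly report on agents", {"report": flaky_report_tool})
print(result)
```

The design point is that the set of workflows and the output schema are fixed before any request arrives, so a malformed result is caught by the validator and retried instead of being passed back to a planner.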
quadrantChart
title Agent Architecture: Consistency vs Cost
x-axis Low Cost --> High Cost
y-axis Low Consistency --> High Consistency
quadrant-1 Expensive but Reliable
quadrant-2 Ideal Production
quadrant-3 Cheap but Risky
quadrant-4 Avoid
Stabilarity AADA: [0.25, 0.96]
MetaGPT: [0.35, 0.82]
MemGPT: [0.45, 0.78]
Voyager: [0.40, 0.75]
ReAct: [0.55, 0.61]
LangChain: [0.70, 0.52]
AutoGPT: [0.90, 0.42]
BabyAGI: [0.85, 0.42]
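To read the chart data without rendering it, the points listed above can be grouped by quadrant in code. The sketch below assumes both axes are split at the midpoint 0.5; that threshold and the helper name `quadrant` are assumptions for illustration, while the coordinates are copied from the chart.

```python
# (cost, consistency) pairs copied from the quadrant chart data.
POINTS = {
    "Stabilarity AADA": (0.25, 0.96),
    "MetaGPT": (0.35, 0.82),
    "MemGPT": (0.45, 0.78),
    "Voyager": (0.40, 0.75),
    "ReAct": (0.55, 0.61),
    "LangChain": (0.70, 0.52),
    "AutoGPT": (0.90, 0.42),
    "BabyAGI": (0.85, 0.42),
}


def quadrant(cost: float, consistency: float, midpoint: float = 0.5) -> str:
    """Name the quadrant for a point, splitting both axes at `midpoint`."""
    high_cost = cost >= midpoint
    high_consistency = consistency >= midpoint
    if high_consistency:
        return "Expensive but Reliable" if high_cost else "Ideal Production"
    return "Avoid" if high_cost else "Cheap but Risky"


by_quadrant = {}
for name, (cost, consistency) in POINTS.items():
    by_quadrant.setdefault(quadrant(cost, consistency), []).append(name)

for label, systems in by_quadrant.items():
    print(f"{label}: {', '.join(systems)}")
```

With a 0.5 split, the four systems with cost below 0.5 all fall in the low-cost, high-consistency group, and no system lands in the low-cost, low-consistency group.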
timeline
title AADA / Deterministic Agent Evolution
2022 : LangChain - LLM-First chaining framework
2023 : ReAct - reason + act loop (hybrid)
: BabyAGI - autonomous task decomposition
: MemGPT - tiered memory management (AADA-leaning)
: Voyager - lifelong learning agent (deterministic)
: MetaGPT - multi-agent software firm (AADA)
2024 : Production AADA deployments mainstream
: Consistency metrics become KPI in enterprise
2025 : Hybrid architectures converge toward determinism
2026 : Stabilarity AADA - 96% consistency benchmark
: Observatory launched for community comparison
| Badge | Metric | Value | Status | Description |
|---|---|---|---|---|
| [s] | Reviewed Sources | 0% | ○ | ≥80% from editorially reviewed sources |
| [t] | Trusted | 73% | ○ | ≥80% from verified, high-quality sources |
| [a] | DOI | 73% | ○ | ≥80% have a Digital Object Identifier |
| [b] | CrossRef | 0% | ○ | ≥80% indexed in CrossRef |
| [i] | Indexed | 9% | ○ | ≥80% have metadata indexed |
| [l] | Academic | 64% | ○ | ≥80% from journals/conferences/preprints |
| [f] | Free Access | 100% | ✓ | ≥80% are freely accessible |
| [r] | References | 11 refs | ✓ | Minimum 10 references required |
| [w] | Words [REQ] | 747 | ✗ | Minimum 2,000 words for a full research article. Current: 747 |
| [d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation. DOI: 10.5281/zenodo.18928461 |
| [o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity |
| [p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer |
| [h] | Freshness [REQ] | 31% | ✗ | ≥80% of references from 2025–2026. Current: 31% |
| [c] | Data Charts | 0 | ○ | Original data charts from reproducible analysis (min 2). Current: 0 |
| [g] | Code | — | ○ | Source code available on GitHub |
| [m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams. Current: 3 |
| [x] | Cited by | 0 | ○ | Referenced by 0 other hub article(s) |
Methodology & Comparative Framework #
This observatory uses a multi-dimensional comparative framework to evaluate AI agent architectures across six empirically derived dimensions: Consistency, Scalability, Cost-efficiency, Debuggability, Autonomy, and Production-readiness. Scores draw on three sources: (1) benchmarks reported in the original papers, (2) community deployment data from production environments documented in 2023–2026, and (3) internal Stabilarity research (DOI: 10.5281/zenodo.18928461). Each system’s dataset includes, at minimum, its original publication scores and independently documented production deployment outcomes. The scenario-based taxonomy (structured reporting, agentic research, enterprise automation) follows established classification frameworks for agentic AI systems.
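One plausible way to turn the six dimensions into a single fit score for a given use case is a weighted average, sketched below. The observatory does not publish its formula here, so the weighting scheme is an assumption, and every score and weight in the example is a placeholder chosen for illustration, not observatory data.

```python
DIMENSIONS = (
    "consistency",
    "scalability",
    "cost_efficiency",
    "debuggability",
    "autonomy",
    "production_readiness",
)


def fit_score(scores: dict, weights: dict) -> float:
    """Weighted average of per-dimension scores (each in [0, 1])."""
    missing = [d for d in DIMENSIONS if d not in scores or d not in weights]
    if missing:
        raise KeyError(f"missing dimensions: {missing}")
    total_weight = sum(weights[d] for d in DIMENSIONS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[d] * weights[d] for d in DIMENSIONS) / total_weight


# Placeholder profiles (hypothetical values, not measured scores).
deterministic = dict(zip(DIMENSIONS, (0.9, 0.8, 0.8, 0.9, 0.4, 0.9)))
llm_first = dict(zip(DIMENSIONS, (0.5, 0.6, 0.4, 0.4, 0.9, 0.5)))

# Hypothetical slider settings for two of the scenarios named in the taxonomy.
structured_reporting = dict(zip(DIMENSIONS, (5, 2, 3, 3, 1, 4)))
agentic_research = dict(zip(DIMENSIONS, (1, 1, 1, 1, 10, 1)))

for scenario, weights in (("structured reporting", structured_reporting),
                          ("agentic research", agentic_research)):
    det = fit_score(deterministic, weights)
    llm = fit_score(llm_first, weights)
    print(f"{scenario}: deterministic={det:.2f}, llm_first={llm:.2f}")
```

With these placeholder inputs the ranking flips between scenarios, which is the behavior a use-case-dependent fit score is meant to expose: a consistency-heavy weighting favors the deterministic profile, and an autonomy-heavy weighting favors the LLM-First one.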
References (2026) #
- Bai, H. et al. (2026). Budget-Aware Agentic Routing via Boundary-Guided Training[9]. arXiv:2602.21227. Empirical analysis of cost-performance tradeoffs in agentic architectures — directly supports Cost-efficiency dimension scoring methodology.
- Schmid, L. et al. (2026). A Systematic Study of LLM-Based Architectures for Automated Patching[10]. arXiv:2603.01257. Comparative evaluation of LLM-based architectures under production constraints — methodology applicable to the observatory’s consistency and debuggability metrics.
- Chen, X. et al. (2026). M3MAD-Bench: Are Multi-Agent Debates Really Effective Across Domains and Modalities?[11]. arXiv:2601.02854. Benchmark study of multi-agent architectures — provides independent validation data for the AADA vs LLM-First comparison presented in this observatory.
- Ivchenko, O. (2026). Longitudinal Report Generation with LLM-Based Agents[1]. Zenodo. DOI: 10.5281/zenodo.18928461. Primary empirical source for Stabilarity AADA consistency scores.
References (11) #
- [1] Stabilarity Research Hub (2026). AI Architecture Comparison Observatory: AADA vs LLM-First Agents. DOI: 10.5281/zenodo.18928461.
- [2] Significant-Gravitas/AutoGPT. GitHub repository. github.com.
- [3] langchain-ai/langchain. GitHub repository. github.com.
- [4] Yao et al. (2022). ReAct: Synergizing Reasoning and Acting in Language Models. arXiv:2210.03629.
- [5] yoheinakajima/babyagi. GitHub repository. github.com.
- [6] Packer et al. (2023). MemGPT: Towards LLMs as Operating Systems. arXiv:2310.08560.
- [7] Wang et al. (2023). Voyager: An Open-Ended Embodied Agent with Large Language Models. arXiv:2305.16291.
- [8] Hong et al. (2023). MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework. arXiv:2308.00352.
- [9] Bai, H. et al. (2026). Budget-Aware Agentic Routing via Boundary-Guided Training. arXiv:2602.21227.
- [10] Schmid, L. et al. (2026). A Systematic Study of LLM-Based Architectures for Automated Patching. arXiv:2603.01257.
- [11] Chen, X. et al. (2026). M3MAD-Bench: Are Multi-Agent Debates Really Effective Across Domains and Modalities? arXiv:2601.02854.