AI Architecture Comparison Observatory: AADA vs LLM-First Agents

Future of AI · Journal Commentary · Article 16 of 44

Academic Citation: Ivchenko, O. (2026). AI Architecture Comparison Observatory: AADA vs LLM-First Agents. Stabilarity Research Hub. Odesa National Polytechnic University.
DOI: 10.5281/zenodo.18928461^[1]

25% fresh refs · 3 diagrams · 13 references

AI Architecture Comparison Observatory #

Interactive comparison of AI-Augmented Agentic Deterministic Architecture (AADA) vs LLM-First Agent paradigms — with real systems, real data, and real citations.

Part of the Stabilarity Longitudinal Research^[1] initiative

Systems Under Comparison #

System	Year	Paradigm	Key Feature	Source
AutoGPT	2023	LLM-First	Autonomous GPT-4 task loops	GitHub^[2]
LangChain Agents	2022–	LLM-First	Tool-augmented LLM chains	GitHub^[3]
ReAct	2023	Hybrid	Reasoning + Acting interleaved	Yao et al., 2023^[4]
BabyAGI	2023	LLM-First	Task-driven autonomous agent	GitHub^[5]
MemGPT	2023	AADA-leaning	Memory stratification for LLMs	Packer et al., 2023^[6]
Voyager	2023	Deterministic	Curriculum-driven skill library	Wang et al., 2023^[7]
MetaGPT	2023	AADA	Multi-agent with SOPs & schemas	Hong et al., 2023^[8]
Stabilarity Pipeline	2026	AADA	206+ articles, memory stratification, schema validation	Ivchenko, 2026^[1]

1. Architecture Capability Radar #

Radar chart series: AADA (avg), LLM-First (avg)

Sources: Hong et al., 2023^[8]; Packer et al., 2023^[6]; Ivchenko, 2026^[1]

2. Paradigm Adoption Over Time #

Data: arXiv search counts for “LLM agent” vs “multi-agent deterministic” categories, 2022–2026
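
One way counts like these could be gathered is through the public arXiv API, whose Atom responses carry an `opensearch:totalResults` element. The sketch below is an assumption about method, not the observatory's actual collection script. The helper names, the query shape, and the sample feed (with a made-up total) are hypothetical, and the code only builds a query URL and parses a response without making any network call.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
OPENSEARCH_NS = "{http://a9.com/-/spec/opensearch/1.1/}"


def build_count_url(phrase: str, year: int) -> str:
    """Build an arXiv API URL that matches `phrase` in any field for one year."""
    # Assumed date filter: submissions from Jan 1 00:00 to Dec 31 23:59 (GMT).
    date_range = f"submittedDate:[{year}01010000 TO {year}12312359]"
    query = f'all:"{phrase}" AND {date_range}'
    # Only the feed-level total is needed, so request a single entry.
    params = {"search_query": query, "start": 0, "max_results": 1}
    return f"{ARXIV_API}?{urlencode(params)}"


def parse_total_results(atom_xml: str) -> int:
    """Read the total hit count from an arXiv Atom feed."""
    root = ET.fromstring(atom_xml)
    node = root.find(f"{OPENSEARCH_NS}totalResults")
    if node is None or node.text is None:
        raise ValueError("feed has no opensearch:totalResults element")
    return int(node.text)


# Hypothetical response body; the number is a placeholder, not real data.
SAMPLE_FEED = (
    '<feed xmlns="http://www.w3.org/2005/Atom" '
    'xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">'
    "<opensearch:totalResults>123</opensearch:totalResults>"
    "</feed>"
)

url = build_count_url("LLM agent", 2023)
total = parse_total_results(SAMPLE_FEED)
```

Running one query per phrase and year would give the two series the chart compares. Phrase matching across all fields is a coarse proxy, so counts obtained this way are best read as trends rather than exact paper totals.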

3. Consistency Score by System #

4. Cost vs Consistency Tradeoff #

Cost estimates based on API token usage patterns reported in respective papers and production deployments.

5. Compare Mode #

6. Architecture Fit Score — Your Use Case #

Adjust sliders to match your use case. The scatter plot updates in real time.

Sliders (default 50 each): Task Duration, Structure Need, Consistency Need, Oversight Level.

7. Key Milestones in Deterministic Agent Evolution #

Oct 2022 — ReAct
Yao et al. introduce Reasoning+Acting, first step toward structured agent behavior. arXiv:2210.03629^[4]

Mar 2023 — AutoGPT / BabyAGI
LLM-First autonomous agents explode in popularity — but consistency issues emerge immediately.

May 2023 — Voyager
Wang et al. demonstrate curriculum-driven skill library with deterministic verification. arXiv:2305.16291^[7]

Aug 2023 — MetaGPT
Hong et al. formalize multi-agent SOPs with schema validation. arXiv:2308.00352^[8]

Oct 2023 — MemGPT
Packer et al. introduce memory stratification for persistent agent state. arXiv:2310.08560^[6]

Aug 2024 — OpenAI Structured Outputs
OpenAI releases native structured output support, validating the deterministic schema approach.

Feb 2026 — AADA (Stabilarity)
Full AADA pipeline operational: 206+ articles, multi-agent with memory stratification, schema validation, ground-truth anchoring. DOI:10.5281/zenodo.18928461^[1]

Read the full longitudinal study behind this data:

Longitudinal Report Generation with LLM-Based Agents — Ivchenko, 2026^[1]

DOI: 10.5281/zenodo.18928461

Evaluate your own use case:

AI Use Case Classifier & Matcher — with Architecture Fit Score

Architectural Comparison Diagrams #

flowchart TD
    subgraph LLM["LLM-First Agent Architecture"]
        L1[User Request] --> L2["LLM Planner<br/>GPT-4 / Claude"]
        L2 --> L3[Dynamic Tool Selection]
        L3 --> L4[Execution]
        L4 -->|result| L2
        L2 -->|output| L5[Response]
    end
    subgraph AADA["AADA — Deterministic Architecture"]
        A1[User Request] --> A2[Intent Classifier]
        A2 --> A3[Pre-validated Workflow DAG]
        A3 --> A4[Deterministic Tool Call]
        A4 --> A5[Typed Output Validator]
        A5 -->|pass| A6[Response]
        A5 -->|fail| A7[Error Handler + Retry Policy]
        A7 --> A3
    end
    LLM -.->|"Consistency: ~45-55%"| X[Comparison]
    AADA -.->|"Consistency: ~78-96%"| X
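
The AADA branch of the flowchart above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Stabilarity implementation. The `Report` schema, the keyword classifier, the two toy tools, and the retry count are all hypothetical stand-ins for the Intent Classifier, Pre-validated Workflow DAG, Deterministic Tool Call, Typed Output Validator, and Error Handler nodes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Report:
    """Hypothetical typed output that every workflow must produce."""
    title: str
    word_count: int


Tool = Callable[[str], dict]


def classify_intent(request: str) -> str:
    """Toy keyword classifier standing in for the Intent Classifier node."""
    return "summarize" if "summar" in request.lower() else "count_words"


def summarize_tool(request: str) -> dict:
    return {"title": "summary", "word_count": len(request.split())}


def count_words_tool(request: str) -> dict:
    return {"title": "word count", "word_count": len(request.split())}


# Pre-validated workflows: a fixed mapping from intent to a known tool.
WORKFLOWS: Dict[str, Tool] = {
    "summarize": summarize_tool,
    "count_words": count_words_tool,
}


def validate(raw: dict) -> Report:
    """Typed Output Validator: reject anything that does not fit the schema."""
    title = raw.get("title")
    word_count = raw.get("word_count")
    if not isinstance(title, str) or not title:
        raise ValueError("title must be a non-empty string")
    if not isinstance(word_count, int) or word_count < 0:
        raise ValueError("word_count must be a non-negative integer")
    return Report(title, word_count)


def handle(request: str, workflows: Optional[Dict[str, Tool]] = None,
           max_retries: int = 2) -> Report:
    """Classify, run the fixed workflow, validate, and retry on failure."""
    tool = (workflows or WORKFLOWS)[classify_intent(request)]
    last_error: Optional[ValueError] = None
    for _ in range(max_retries + 1):
        try:
            return validate(tool(request))
        except ValueError as err:  # fail edge: back to the workflow step
            last_error = err
    raise RuntimeError("output failed validation on every attempt") from last_error
```

The design point is that the request never chooses its own tools at run time: the classifier picks one of a closed set of workflows, and nothing leaves the pipeline without passing the validator. With purely deterministic tools a retry returns the same result, so the retry loop only matters when a step inside the workflow is stochastic, such as an LLM call.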

quadrantChart
    title Agent Architecture: Consistency vs Cost
    x-axis Low Cost --> High Cost
    y-axis Low Consistency --> High Consistency
    quadrant-1 Expensive but Reliable
    quadrant-2 Ideal Production
    quadrant-3 Cheap but Risky
    quadrant-4 Avoid
    Stabilarity AADA: [0.25, 0.96]
    MetaGPT: [0.35, 0.82]
    MemGPT: [0.45, 0.78]
    Voyager: [0.40, 0.75]
    ReAct: [0.55, 0.61]
    LangChain: [0.70, 0.52]
    AutoGPT: [0.90, 0.42]
    BabyAGI: [0.85, 0.42]
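
The chart's coordinates can also be read programmatically. The sketch below copies the (cost, consistency) pairs from the chart and the paradigm labels from the systems table, then assigns each system to a region and averages consistency per paradigm. The 0.5 midpoint and the function names are assumptions made for illustration.

```python
from statistics import mean

# name -> (paradigm, cost, consistency); values copied from the chart and table.
SYSTEMS = {
    "Stabilarity AADA": ("AADA", 0.25, 0.96),
    "MetaGPT": ("AADA", 0.35, 0.82),
    "MemGPT": ("AADA-leaning", 0.45, 0.78),
    "Voyager": ("Deterministic", 0.40, 0.75),
    "ReAct": ("Hybrid", 0.55, 0.61),
    "LangChain": ("LLM-First", 0.70, 0.52),
    "AutoGPT": ("LLM-First", 0.90, 0.42),
    "BabyAGI": ("LLM-First", 0.85, 0.42),
}


def region(cost: float, consistency: float, midpoint: float = 0.5) -> str:
    """Describe which side of each axis midpoint a point falls on."""
    cost_side = "low cost" if cost < midpoint else "high cost"
    cons_side = "high consistency" if consistency >= midpoint else "low consistency"
    return f"{cost_side}, {cons_side}"


def mean_consistency_by_paradigm(systems: dict) -> dict:
    """Average the consistency coordinate within each paradigm label."""
    groups: dict = {}
    for paradigm, _cost, consistency in systems.values():
        groups.setdefault(paradigm, []).append(consistency)
    return {name: round(mean(values), 3) for name, values in groups.items()}


regions = {name: region(cost, cons) for name, (_p, cost, cons) in SYSTEMS.items()}
averages = mean_consistency_by_paradigm(SYSTEMS)
```

On these coordinates the two AADA systems average 0.89 and the three LLM-First systems about 0.45. All four AADA, AADA-leaning, and Deterministic systems fall in the low-cost, high-consistency region, while AutoGPT and BabyAGI fall in the high-cost, low-consistency one.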

timeline
    title AADA / Deterministic Agent Evolution
    2022 : LangChain - LLM-First chaining framework
         : ReAct - reason + act loop (hybrid)
    2023 : BabyAGI - autonomous task decomposition
         : MemGPT - tiered memory management (AADA)
         : Voyager - lifelong learning agent (AADA)
         : MetaGPT - multi-agent software firm (AADA)
    2024 : Production AADA deployments mainstream
         : Consistency metrics become KPI in enterprise
    2025 : Hybrid architectures converge toward determinism
    2026 : Stabilarity AADA - 96% consistency benchmark
         : Observatory launched for community comparison

Score: 45

Badge	Metric	Value	Status	Description
[s]	Reviewed Sources	0%	○	≥80% from editorially reviewed sources
[t]	Trusted	62%	○	≥80% from verified, high-quality sources
[a]	DOI	62%	○	≥80% have a Digital Object Identifier
[b]	CrossRef	0%	○	≥80% indexed in CrossRef
[i]	Indexed	8%	○	≥80% have metadata indexed
[l]	Academic	62%	○	≥80% from journals/conferences/preprints
[f]	Free Access	100%	✓	≥80% are freely accessible
[r]	References	13 refs	✓	Minimum 10 references required
[w]	Words [REQ]	749	✗	Minimum 2,000 words for a full research article. Current: 749
[d]	DOI [REQ]	✓	✓	Zenodo DOI registered for persistent citation. DOI: 10.5281/zenodo.18928461
[o]	ORCID [REQ]	✓	✓	Author ORCID verified for academic identity
[p]	Peer Reviewed [REQ]	—	✗	Peer reviewed by an assigned reviewer
[h]	Freshness [REQ]	25%	✗	≥60% of references from 2025–2026. Current: 25%
[c]	Data Charts	0	○	Original data charts from reproducible analysis (min 2). Current: 0
[g]	Code	—	○	Source code available on GitHub
[m]	Diagrams	3	✓	Mermaid architecture/flow diagrams. Current: 3
[x]	Cited by	0	○	Referenced by 0 other hub article(s)

Score = Ref Trust (50 × 60%) + Required (2/5 × 30%) + Optional (1/4 × 10%)
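
The arithmetic in the score line can be worked through as follows. This is a sketch under one assumption that the page does not state: the Required and Optional pass fractions are scaled to a 0-100 range before weighting. The reading of the inputs is also inferred from the badge table, where two of the five [REQ] rows pass (DOI, ORCID) and one of four optional rows passes (Diagrams). The function name is hypothetical.

```python
def quality_score(ref_trust: float,
                  required_passed: int, required_total: int,
                  optional_passed: int, optional_total: int) -> float:
    """Weighted score: 60% reference trust, 30% required, 10% optional."""
    # Assumption: pass fractions are expressed on a 0-100 scale.
    required_pct = 100 * required_passed / required_total
    optional_pct = 100 * optional_passed / optional_total
    return 0.60 * ref_trust + 0.30 * required_pct + 0.10 * optional_pct


# Values taken from the score line: Ref Trust 50, Required 2/5, Optional 1/4.
score = quality_score(50, 2, 5, 1, 4)
# 0.60 * 50 + 0.30 * 40 + 0.10 * 25 = 30 + 12 + 2.5 = 44.5
```

Under that scaling the three terms contribute 30, 12, and 2.5 points, for a total of 44.5.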

Methodology & Comparative Framework #

This observatory compares AI agent architectures across six empirically derived dimensions: Consistency, Scalability, Cost-efficiency, Debuggability, Autonomy, and Production-readiness. Scores draw on three sources: (1) benchmarks reported in the original papers, (2) community deployment data from production environments documented in 2023–2026, and (3) internal Stabilarity research (DOI: 10.5281/zenodo.18928461). Each system’s dataset includes, at minimum, its original publication scores and independently documented production deployment outcomes. The scenario-based taxonomy (structured reporting, agentic research, enterprise automation) follows established classification frameworks for agentic AI systems.

References (2026) #

Bai, H. et al. (2026). Budget-Aware Agentic Routing via Boundary-Guided Training^[9]. arXiv:2602.21227. Empirical analysis of cost-performance tradeoffs in agentic architectures — directly supports Cost-efficiency dimension scoring methodology.
Schmid, L. et al. (2026). A Systematic Study of LLM-Based Architectures for Automated Patching^[10]. arXiv:2603.01257. Comparative evaluation of LLM-based architectures under production constraints — methodology applicable to the observatory’s consistency and debuggability metrics.
Chen, X. et al. (2026). M3MAD-Bench: Are Multi-Agent Debates Really Effective Across Domains and Modalities?^[11]. arXiv:2601.02854. Benchmark study of multi-agent architectures — provides independent validation data for the AADA vs LLM-First comparison presented in this observatory.
Ivchenko, O. (2026). Longitudinal Report Generation with LLM-Based Agents^[1]. Zenodo. DOI: 10.5281/zenodo.18928461. Primary empirical source for Stabilarity AADA consistency scores.

References (11) #

1. Stabilarity Research Hub. (2026). AI Architecture Comparison Observatory: AADA vs LLM-First Agents. DOI: 10.5281/zenodo.18928461.
2. Significant-Gravitas/AutoGPT. GitHub repository. github.com.
3. langchain-ai/langchain. GitHub repository. github.com.
4. Yao et al. (2022). ReAct: Synergizing Reasoning and Acting in Language Models. arXiv:2210.03629.
5. yoheinakajima/babyagi. GitHub repository. github.com.
6. Packer et al. (2023). MemGPT: Towards LLMs as Operating Systems. arXiv:2310.08560.
7. Wang et al. (2023). Voyager: An Open-Ended Embodied Agent with Large Language Models. arXiv:2305.16291.
8. Hong et al. (2023). MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework. arXiv:2308.00352.
9. Bai, H. et al. (2026). Budget-Aware Agentic Routing via Boundary-Guided Training. arXiv:2602.21227.
10. Schmid, L. et al. (2026). A Systematic Study of LLM-Based Architectures for Automated Patching. arXiv:2603.01257.
11. Chen, X. et al. (2026). M3MAD-Bench: Are Multi-Agent Debates Really Effective Across Domains and Modalities? arXiv:2601.02854.

Version History · 9 revisions

Rev	Date	Status	Action	By	Size
v1	Mar 9, 2026	DRAFT	Initial draft First version created	(w) Author	14,465 (+14465)
v2	Mar 9, 2026	PUBLISHED	Published Article published to research hub	(w) Author	14,465 (~0)
v3	Mar 9, 2026	REVISED	Major revision Significant content expansion (+1,876 chars)	(w) Author	16,341 (+1876)
v4	Mar 9, 2026	REDACTED	Minor edit Formatting, typos, or styling corrections	(r) Redactor	16,412 (+71)
v5	Mar 9, 2026	REDACTED	Content consolidation Removed 16,412 chars	(r) Redactor	0 (-16412)
v6	Mar 9, 2026	REVISED	Major revision Significant content expansion (+16,412 chars)	(w) Author	16,412 (+16412)
v7	Mar 9, 2026	REFERENCES	Reference update Added 1 DOI reference(s)	(r) Reference Checker	16,619 (+207)
v8	Mar 10, 2026	REVISED	Major revision Significant content expansion (+1,818 chars)	(w) Author	18,437 (+1818)
v9	Mar 10, 2026	CURRENT	Content update Section additions or elaboration	(w) Author	18,896 (+459)