30 Articles · 4 Research Phases · 2026 · In Progress
Abstract
How do large language models remember? The key-value cache is the dominant memory structure in transformer inference, yet its behaviour, limitations, and optimisation remain poorly systematised. This series provides a rigorous, benchmark-driven investigation of AI memory systems: from KV-cache fundamentals and attention memory patterns through compression techniques, architectural comparisons, and distributed caching infrastructure, to the economics of context caching and emerging paradigms that blur the line between attention memory and retrieval-augmented memory. Across 30 articles organised in four phases — Foundation & Benchmarking, Optimisation Techniques, Infrastructure, and Economics & Emerging Directions — the series builds a unified evidence base for understanding, measuring, and improving how transformer models store, retrieve, and forget information.
Articles Technical Research · 29 published
All Articles
1 KV-Cache Fundamentals — How Transformers Remember (and Forget)
DOI 1/10 · Score 71
Badge | Metric | Value | Status | Description
[s] | Reviewed Sources | 6% | ○ | ≥80% from editorially reviewed sources
[t] | Trusted | 88% | ✓ | ≥80% from verified, high-quality sources
[a] | DOI | 88% | ✓ | ≥80% have a Digital Object Identifier
[b] | CrossRef | 13% | ○ | ≥80% indexed in CrossRef
[i] | Indexed | 81% | ✓ | ≥80% have metadata indexed
[l] | Academic | 88% | ✓ | ≥80% from journals/conferences/preprints
[f] | Free Access | 88% | ✓ | ≥80% are freely accessible
[r] | References | 16 refs | ✓ | Minimum 10 references required
[w] | Words [REQ] | 2,798 | ✓ | Minimum 2,000 words for a full research article. Current: 2,798
[d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation. DOI: 10.5281/zenodo.19112532
[o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity
[p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer
[h] | Freshness [REQ] | 47% | ✗ | ≥60% of references from 2025–2026. Current: 47%
[c] | Data Charts | 0 | ○ | Original data charts from reproducible analysis (min 2). Current: 0
[g] | Code | — | ○ | Source code available on GitHub
[m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams. Current: 3
[x] | Cited by | 0 | ○ | Referenced by 0 other hub article(s)
Score = Ref Trust (84 × 60%) + Required (3/5 × 30%) + Optional (1/4 × 10%)
Technical Research · Mar 19, 2026 · 14 min read
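The score line above is a weighted sum, and the arithmetic can be checked by hand: 84 × 0.60 + (3/5) × 30 + (1/4) × 10 = 50.4 + 18 + 2.5 = 70.9, which matches the 71 listed beside the title. The sketch below reproduces that calculation. The function name `hub_score` and its parameter names are hypothetical, and the half-up rounding rule is an assumption inferred from the listed values (article 22's components sum to 58.5 and it is listed at 59), not a documented rule.

```python
from fractions import Fraction
from math import floor


def hub_score(ref_trust, required_met, required_total, optional_met, optional_total):
    """Weighted score: 60% reference trust, 30% required badges, 10% optional badges.

    Hypothetical helper; exact fractions avoid floating-point error at .5 boundaries.
    """
    total = (
        Fraction(ref_trust) * Fraction(60, 100)
        + Fraction(required_met, required_total) * 30
        + Fraction(optional_met, optional_total) * 10
    )
    # Assumption: the displayed integer is rounded half up.
    return floor(total + Fraction(1, 2))


# Article 1: Ref Trust 84, Required 3/5, Optional 1/4
print(hub_score(84, 3, 5, 1, 4))  # 70.9 rounds to 71
```

The same call with any other entry's three components should reproduce the number shown next to that entry's title, provided the rounding assumption holds.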
2 Attention Memory Patterns — What Models Actually Store in KV-Cache
DOI 1/10 · Score 72
[s] Reviewed Sources 10% ○ · [t] Trusted 90% ✓ · [a] DOI 86% ✓ · [b] CrossRef 10% ○ · [i] Indexed 90% ✓ · [l] Academic 90% ✓ · [f] Free Access 95% ✓ · [r] References 21 refs ✓
[w] Words [REQ] 2,736 ✓ · [d] DOI [REQ] ✓ (10.5281/zenodo.19116558) · [o] ORCID [REQ] ✓ · [p] Peer Reviewed [REQ] ✗ · [h] Freshness [REQ] 33% ✗
[c] Data Charts 0 ○ · [g] Code ○ · [m] Diagrams 3 ✓ · [x] Cited by 0 ○
Score = Ref Trust (86 × 60%) + Required (3/5 × 30%) + Optional (1/4 × 10%)
Technical Research · Mar 19, 2026 · 14 min read
3 Context Window Utilization — How Much of the Window Do Models Really Use?
DOI 2/10 · Score 73
[s] Reviewed Sources 6% ○ · [t] Trusted 89% ✓ · [a] DOI 67% ○ · [b] CrossRef 0% ○ · [i] Indexed 89% ✓ · [l] Academic 78% ○ · [f] Free Access 94% ✓ · [r] References 18 refs ✓
[w] Words [REQ] 2,878 ✓ · [d] DOI [REQ] ✓ (10.5281/zenodo.19160303) · [o] ORCID [REQ] ✓ · [p] Peer Reviewed [REQ] ✗ · [h] Freshness [REQ] 63% ✓
[c] Data Charts 0 ○ · [g] Code ○ · [m] Diagrams 3 ✓ · [x] Cited by 0 ○
Score = Ref Trust (78 × 60%) + Required (4/5 × 30%) + Optional (1/4 × 10%)
Technical Research · Mar 22, 2026 · 14 min read
4 Long-Context Retrieval Benchmarks — Needle-in-Haystack and Beyond
DOI 10/10 · Score 58
[s] Reviewed Sources 14% ○ · [t] Trusted 79% ○ · [a] DOI 21% ○ · [b] CrossRef 0% ○ · [i] Indexed 86% ✓ · [l] Academic 71% ○ · [f] Free Access 100% ✓ · [r] References 14 refs ✓
[w] Words [REQ] 2,043 ✓ · [d] DOI [REQ] ✓ (10.5281/zenodo.19163187) · [o] ORCID [REQ] ✓ · [p] Peer Reviewed [REQ] ✗ · [h] Freshness [REQ] 31% ✗
[c] Data Charts 0 ○ · [g] Code ○ · [m] Diagrams 3 ✓ · [x] Cited by 0 ○
Score = Ref Trust (63 × 60%) + Required (3/5 × 30%) + Optional (1/4 × 10%)
Technical Research · Mar 22, 2026 · 10 min read
5 Memory Degradation Curves — How Accuracy Decays with Context Length
DOI 1/10 · Score 71
[s] Reviewed Sources 0% ○ · [t] Trusted 83% ✓ · [a] DOI 67% ○ · [b] CrossRef 0% ○ · [i] Indexed 83% ✓ · [l] Academic 72% ○ · [f] Free Access 94% ✓ · [r] References 18 refs ✓
[w] Words [REQ] 2,525 ✓ · [d] DOI [REQ] ✓ (10.5281/zenodo.19170557) · [o] ORCID [REQ] ✓ · [p] Peer Reviewed [REQ] ✗ · [h] Freshness [REQ] 75% ✓
[c] Data Charts 0 ○ · [g] Code ○ · [m] Diagrams 3 ✓ · [x] Cited by 0 ○
Score = Ref Trust (74 × 60%) + Required (4/5 × 30%) + Optional (1/4 × 10%)
Technical Research · Mar 22, 2026 · 13 min read
6 KV-Cache Compression Benchmarks — Quantization vs Eviction vs Pruning
DOI 4/10 · Score 74
[s] Reviewed Sources 0% ○ · [t] Trusted 89% ✓ · [a] DOI 72% ○ · [b] CrossRef 0% ○ · [i] Indexed 89% ✓ · [l] Academic 78% ○ · [f] Free Access 100% ✓ · [r] References 18 refs ✓
[w] Words [REQ] 2,395 ✓ · [d] DOI [REQ] ✓ (10.5281/zenodo.19176966) · [o] ORCID [REQ] ✓ · [p] Peer Reviewed [REQ] ✗ · [h] Freshness [REQ] 75% ✓
[c] Data Charts 0 ○ · [g] Code ○ · [m] Diagrams 3 ✓ · [x] Cited by 0 ○
Score = Ref Trust (79 × 60%) + Required (4/5 × 30%) + Optional (1/4 × 10%)
Technical Research · Mar 23, 2026 · 12 min read
7 Cross-Architecture Memory Comparison — Llama vs Mistral vs Gemma vs Qwen
DOI 4/10 · Score 64
[s] Reviewed Sources 0% ○ · [t] Trusted 76% ○ · [a] DOI 53% ○ · [b] CrossRef 0% ○ · [i] Indexed 88% ✓ · [l] Academic 65% ○ · [f] Free Access 100% ✓ · [r] References 17 refs ✓
[w] Words [REQ] 2,222 ✓ · [d] DOI [REQ] ✓ (10.5281/zenodo.19183148) · [o] ORCID [REQ] ✓ · [p] Peer Reviewed [REQ] ✗ · [h] Freshness [REQ] 47% ✗
[c] Data Charts 5 ✓ · [g] Code ○ · [m] Diagrams 3 ✓ · [x] Cited by 0 ○
Score = Ref Trust (68 × 60%) + Required (3/5 × 30%) + Optional (2/4 × 10%)
Technical Research · Mar 23, 2026 · 11 min read
8 Prompt Caching Efficiency — Measuring Reuse Across Real Workloads
DOI 1/10 · Score 73
[s] Reviewed Sources 7% ○ · [t] Trusted 86% ✓ · [a] DOI 57% ○ · [b] CrossRef 0% ○ · [i] Indexed 86% ✓ · [l] Academic 71% ○ · [f] Free Access 93% ✓ · [r] References 14 refs ✓
[w] Words [REQ] 2,628 ✓ · [d] DOI [REQ] ✓ (10.5281/zenodo.19187992) · [o] ORCID [REQ] ✓ · [p] Peer Reviewed [REQ] ✗ · [h] Freshness [REQ] 67% ✓
[c] Data Charts 5 ✓ · [g] Code ○ · [m] Diagrams 3 ✓ · [x] Cited by 0 ○
Score = Ref Trust (73 × 60%) + Required (4/5 × 30%) + Optional (2/4 × 10%)
Technical Research · Mar 23, 2026 · 13 min read
9 Multi-Turn Memory — How Conversation History Degrades Model Performance
DOI 1/10 · Score 54
[s] Reviewed Sources 0% ○ · [t] Trusted 88% ✓ · [a] DOI 6% ○ · [b] CrossRef 0% ○ · [i] Indexed 88% ✓ · [l] Academic 71% ○ · [f] Free Access 100% ✓ · [r] References 17 refs ✓
[w] Words [REQ] 1,599 ✗ · [d] DOI [REQ] ✓ (10.5281/zenodo.19195991) · [o] ORCID [REQ] ✓ · [p] Peer Reviewed [REQ] ✗ · [h] Freshness [REQ] 7% ✗
[c] Data Charts 5 ✓ · [g] Code ○ · [m] Diagrams 3 ✓ · [x] Cited by 0 ○
Score = Ref Trust (61 × 60%) + Required (2/5 × 30%) + Optional (2/4 × 10%)
Technical Research · Mar 23, 2026 · 8 min read
10 Meta-Analysis of Context Benchmarks — Building a Unified Evaluation Framework
DOI 1/10 · Score 61
[s] Reviewed Sources 16% ○ · [t] Trusted 89% ✓ · [a] DOI 5% ○ · [b] CrossRef 0% ○ · [i] Indexed 89% ✓ · [l] Academic 79% ○ · [f] Free Access 84% ✓ · [r] References 19 refs ✓
[w] Words [REQ] 2,528 ✓ · [d] DOI [REQ] ✓ (10.5281/zenodo.19199439) · [o] ORCID [REQ] ✓ · [p] Peer Reviewed [REQ] ✗ · [h] Freshness [REQ] 29% ✗
[c] Data Charts 5 ✓ · [g] Code ○ · [m] Diagrams 3 ✓ · [x] Cited by 0 ○
Score = Ref Trust (63 × 60%) + Required (3/5 × 30%) + Optional (2/4 × 10%)
Technical Research · Mar 24, 2026 · 13 min read
11 Paged Attention and Virtual Memory for LLM Inference
DOI 3/10 · Score 59
[s] Reviewed Sources 13% ○ · [t] Trusted 73% ○ · [a] DOI 27% ○ · [b] CrossRef 13% ○ · [i] Indexed 80% ✓ · [l] Academic 60% ○ · [f] Free Access 87% ✓ · [r] References 15 refs ✓
[w] Words [REQ] 2,912 ✓ · [d] DOI [REQ] ✓ (10.5281/zenodo.19203099) · [o] ORCID [REQ] ✓ · [p] Peer Reviewed [REQ] ✗ · [h] Freshness [REQ] 31% ✗
[c] Data Charts 4 ✓ · [g] Code ○ · [m] Diagrams 3 ✓ · [x] Cited by 0 ○
Score = Ref Trust (60 × 60%) + Required (3/5 × 30%) + Optional (2/4 × 10%)
Technical Research · Mar 24, 2026 · 15 min read
12 Grouped-Query Attention — Cache-Efficient Architecture Design
DOI 1/10 · Score 73
[s] Reviewed Sources 4% ○ · [t] Trusted 92% ✓ · [a] DOI 79% ○ · [b] CrossRef 4% ○ · [i] Indexed 88% ✓ · [l] Academic 83% ✓ · [f] Free Access 100% ✓ · [r] References 24 refs ✓
[w] Words [REQ] 2,403 ✓ · [d] DOI [REQ] ✓ (10.5281/zenodo.19209159) · [o] ORCID [REQ] ✓ · [p] Peer Reviewed [REQ] ✗ · [h] Freshness [REQ] 36% ✗
[c] Data Charts 5 ✓ · [g] Code ○ · [m] Diagrams 3 ✓ · [x] Cited by 0 ○
Score = Ref Trust (83 × 60%) + Required (3/5 × 30%) + Optional (2/4 × 10%)
Technical Research · Mar 24, 2026 · 12 min read
13 Speculative Decoding and Cache Reuse
DOI 6/10 · Score 61
[s] Reviewed Sources 0% ○ · [t] Trusted 90% ✓ · [a] DOI 5% ○ · [b] CrossRef 0% ○ · [i] Indexed 90% ✓ · [l] Academic 81% ✓ · [f] Free Access 100% ✓ · [r] References 21 refs ✓
[w] Words [REQ] 2,662 ✓ · [d] DOI [REQ] ✓ (10.5281/zenodo.19210815) · [o] ORCID [REQ] ✓ · [p] Peer Reviewed [REQ] ✗ · [h] Freshness [REQ] 21% ✗
[c] Data Charts 4 ✓ · [g] Code ○ · [m] Diagrams 3 ✓ · [x] Cited by 0 ○
Score = Ref Trust (63 × 60%) + Required (3/5 × 30%) + Optional (2/4 × 10%)
Technical Research · Mar 24, 2026 · 13 min read
14 Semantic Prompt Caching — Beyond Exact Match
DOI 3/10 · Score 59
[s] Reviewed Sources 0% ○ · [t] Trusted 86% ✓ · [a] DOI 7% ○ · [b] CrossRef 0% ○ · [i] Indexed 86% ✓ · [l] Academic 71% ○ · [f] Free Access 100% ✓ · [r] References 14 refs ✓
[w] Words [REQ] 2,336 ✓ · [d] DOI [REQ] ✓ (10.5281/zenodo.19211071) · [o] ORCID [REQ] ✓ · [p] Peer Reviewed [REQ] ✗ · [h] Freshness [REQ] 33% ✗
[c] Data Charts 4 ✓ · [g] Code ○ · [m] Diagrams 3 ✓ · [x] Cited by 0 ○
Score = Ref Trust (60 × 60%) + Required (3/5 × 30%) + Optional (2/4 × 10%)
Technical Research · Mar 24, 2026 · 12 min read
15 Token Pruning and Attention Sparsity
DOI 1/10 · Score 79
[s] Reviewed Sources 63% ○ · [t] Trusted 89% ✓ · [a] DOI 74% ○ · [b] CrossRef 63% ○ · [i] Indexed 84% ✓ · [l] Academic 74% ○ · [f] Free Access 89% ✓ · [r] References 19 refs ✓
[w] Words [REQ] 2,304 ✓ · [d] DOI [REQ] ✓ (10.5281/zenodo.19269070) · [o] ORCID [REQ] ✓ · [p] Peer Reviewed [REQ] ✗ · [h] Freshness [REQ] 75% ✓
[c] Data Charts 0 ○ · [g] Code ✓ · [m] Diagrams 3 ✓ · [x] Cited by 0 ○
Score = Ref Trust (84 × 60%) + Required (4/5 × 30%) + Optional (2/4 × 10%)
Technical Research · Mar 28, 2026 · 12 min read
— Production Cache Monitoring — Metrics and Capacity Planning (Draft — in preparation)
— Cross-Model Cache Transfer and Universal Formats (Draft — in preparation)
16 Cross-Layer KV-Cache Sharing
DOI 2/10 · Score 80
[s] Reviewed Sources 13% ○ · [t] Trusted 91% ✓ · [a] DOI 78% ○ · [b] CrossRef 13% ○ · [i] Indexed 83% ✓ · [l] Academic 78% ○ · [f] Free Access 96% ✓ · [r] References 23 refs ✓
[w] Words [REQ] 2,141 ✓ · [d] DOI [REQ] ✓ (10.5281/zenodo.19291014) · [o] ORCID [REQ] ✓ · [p] Peer Reviewed [REQ] ✗ · [h] Freshness [REQ] 65% ✓
[c] Data Charts 4 ✓ · [g] Code ✓ · [m] Diagrams 3 ✓ · [x] Cited by 0 ○
Score = Ref Trust (81 × 60%) + Required (4/5 × 30%) + Optional (3/4 × 10%)
Technical Research · Mar 28, 2026 · 11 min read
17 Sliding Window and Compressive Caching for Infinite Context
DOI 2/10 · Score 81
[s] Reviewed Sources 23% ○ · [t] Trusted 88% ✓ · [a] DOI 77% ○ · [b] CrossRef 23% ○ · [i] Indexed 85% ✓ · [l] Academic 81% ✓ · [f] Free Access 96% ✓ · [r] References 26 refs ✓
[w] Words [REQ] 2,252 ✓ · [d] DOI [REQ] ✓ (10.5281/zenodo.19299498) · [o] ORCID [REQ] ✓ · [p] Peer Reviewed [REQ] ✗ · [h] Freshness [REQ] 70% ✓
[c] Data Charts 4 ✓ · [g] Code ✓ · [m] Diagrams 3 ✓ · [x] Cited by 0 ○
Score = Ref Trust (82 × 60%) + Required (4/5 × 30%) + Optional (3/4 × 10%)
Technical Research · Mar 28, 2026 · 11 min read
18 Flash Attention's Role in Memory-Efficient Inference
DOI 4/10 · Score 81
[s] Reviewed Sources 48% ○ · [t] Trusted 91% ✓ · [a] DOI 70% ○ · [b] CrossRef 48% ○ · [i] Indexed 83% ✓ · [l] Academic 70% ○ · [f] Free Access 96% ✓ · [r] References 23 refs ✓
[w] Words [REQ] 2,895 ✓ · [d] DOI [REQ] ✓ (10.5281/zenodo.19303451) · [o] ORCID [REQ] ✓ · [p] Peer Reviewed [REQ] ✗ · [h] Freshness [REQ] 67% ✓
[c] Data Charts 5 ✓ · [g] Code ✓ · [m] Diagrams 3 ✓ · [x] Cited by 0 ○
Score = Ref Trust (82 × 60%) + Required (4/5 × 30%) + Optional (3/4 × 10%)
Technical Research · Mar 29, 2026 · 14 min read
19 Distributed KV-Cache in Multi-GPU Serving
DOI 2/10 · Score 83
[s] Reviewed Sources 58% ○ · [t] Trusted 89% ✓ · [a] DOI 79% ○ · [b] CrossRef 58% ○ · [i] Indexed 84% ✓ · [l] Academic 79% ○ · [f] Free Access 84% ✓ · [r] References 19 refs ✓
[w] Words [REQ] 2,267 ✓ · [d] DOI [REQ] ✓ (10.5281/zenodo.19310103) · [o] ORCID [REQ] ✓ · [p] Peer Reviewed [REQ] ✗ · [h] Freshness [REQ] 71% ✓
[c] Data Charts 4 ✓ · [g] Code ✓ · [m] Diagrams 3 ✓ · [x] Cited by 0 ○
Score = Ref Trust (86 × 60%) + Required (4/5 × 30%) + Optional (3/4 × 10%)
Technical Research · Mar 29, 2026 · 11 min read
20 Disaggregated Prefill and Decode Architectures
DOI 2/10 · Score 81
[s] Reviewed Sources 47% ○ · [t] Trusted 89% ✓ · [a] DOI 74% ○ · [b] CrossRef 47% ○ · [i] Indexed 84% ✓ · [l] Academic 74% ○ · [f] Free Access 58% ○ · [r] References 19 refs ✓
[w] Words [REQ] 2,157 ✓ · [d] DOI [REQ] ✓ (10.5281/zenodo.19316904) · [o] ORCID [REQ] ✓ · [p] Peer Reviewed [REQ] ✗ · [h] Freshness [REQ] 69% ✓
[c] Data Charts 4 ✓ · [g] Code ✓ · [m] Diagrams 3 ✓ · [x] Cited by 0 ○
Score = Ref Trust (83 × 60%) + Required (4/5 × 30%) + Optional (3/4 × 10%)
Technical Research · Mar 29, 2026 · 11 min read
21 Cache-Aware Request Scheduling and Batching
DOI 1/10 · Score 77
[s] Reviewed Sources 43% ○ · [t] Trusted 90% ✓ · [a] DOI 62% ○ · [b] CrossRef 43% ○ · [i] Indexed 67% ○ · [l] Academic 71% ○ · [f] Free Access 76% ○ · [r] References 21 refs ✓
[w] Words [REQ] 2,876 ✓ · [d] DOI [REQ] ✓ (10.5281/zenodo.19325142) · [o] ORCID [REQ] ✓ · [p] Peer Reviewed [REQ] ✗ · [h] Freshness [REQ] 67% ✓
[c] Data Charts 5 ✓ · [g] Code ✓ · [m] Diagrams 3 ✓ · [x] Cited by 0 ○
Score = Ref Trust (76 × 60%) + Required (4/5 × 30%) + Optional (3/4 × 10%)
Technical Research · Mar 30, 2026 · 14 min read
22 Memory Hierarchy — DRAM, HBM, and SSD-Backed Caches
DOI 5/10 · Score 59
[s] Reviewed Sources 13% ○ · [t] Trusted 73% ○ · [a] DOI 40% ○ · [b] CrossRef 13% ○ · [i] Indexed 40% ○ · [l] Academic 60% ○ · [f] Free Access 87% ✓ · [r] References 15 refs ✓
[w] Words [REQ] 1,733 ✗ · [d] DOI [REQ] ✓ (10.5281/zenodo.19329971) · [o] ORCID [REQ] ✓ · [p] Peer Reviewed [REQ] ✗ · [h] Freshness [REQ] 62% ✓
[c] Data Charts 4 ✓ · [g] Code ✓ · [m] Diagrams 3 ✓ · [x] Cited by 0 ○
Score = Ref Trust (55 × 60%) + Required (3/5 × 30%) + Optional (3/4 × 10%)
Technical Research · Mar 30, 2026 · 9 min read
23 Cache Coherence in Multi-Tenant Deployments
DOI 3/10 · Score 74
[s] Reviewed Sources 36% ○ · [t] Trusted 77% ○ · [a] DOI 64% ○ · [b] CrossRef 36% ○ · [i] Indexed 59% ○ · [l] Academic 77% ○ · [f] Free Access 77% ○ · [r] References 22 refs ✓
[w] Words [REQ] 2,358 ✓ · [d] DOI [REQ] ✓ (10.5281/zenodo.19336721) · [o] ORCID [REQ] ✓ · [p] Peer Reviewed [REQ] ✗ · [h] Freshness [REQ] 70% ✓
[c] Data Charts 4 ✓ · [g] Code ✓ · [m] Diagrams 3 ✓ · [x] Cited by 0 ○
Score = Ref Trust (71 × 60%) + Required (4/5 × 30%) + Optional (3/4 × 10%)
Technical Research · Mar 30, 2026 · 12 min read
24 Production Cache Monitoring — Metrics and Capacity Planning
DOI 2/10 · Score 71
[s] Reviewed Sources 44% ○ · [t] Trusted 68% ○ · [a] DOI 52% ○ · [b] CrossRef 48% ○ · [i] Indexed 68% ○ · [l] Academic 64% ○ · [f] Free Access 60% ○ · [r] References 25 refs ✓
[w] Words [REQ] 2,611 ✓ · [d] DOI [REQ] ✓ (10.5281/zenodo.19340506) · [o] ORCID [REQ] ✓ · [p] Peer Reviewed [REQ] ✗ · [h] Freshness [REQ] 74% ✓
[c] Data Charts 5 ✓ · [g] Code ✓ · [m] Diagrams 3 ✓ · [x] Cited by 0 ○
Score = Ref Trust (66 × 60%) + Required (4/5 × 30%) + Optional (3/4 × 10%)
Technical Research · Mar 30, 2026 · 13 min read
25 The Economics of Context Caching — Cost Models and Break-Even DOI 2/10 85
Badge · Metric · Value · Status · Description
[s] · Reviewed Sources · 76% · ○ · ≥80% from editorially reviewed sources
[t] · Trusted · 92% · ✓ · ≥80% from verified, high-quality sources
[a] · DOI · 79% · ○ · ≥80% have a Digital Object Identifier
[b] · CrossRef · 76% · ○ · ≥80% indexed in CrossRef
[i] · Indexed · 84% · ✓ · ≥80% have metadata indexed
[l] · Academic · 82% · ✓ · ≥80% from journals/conferences/preprints
[f] · Free Access · 63% · ○ · ≥80% are freely accessible
[r] · References · 38 refs · ✓ · Minimum 10 references required
[w] · Words [REQ] · 2,944 · ✓ · Minimum 2,000 words for a full research article. Current: 2,944
[d] · DOI [REQ] · ✓ · ✓ · Zenodo DOI registered for persistent citation. DOI: 10.5281/zenodo.19343122
[o] · ORCID [REQ] · ✓ · ✓ · Author ORCID verified for academic identity
[p] · Peer Reviewed [REQ] · — · ✗ · Peer reviewed by an assigned reviewer
[h] · Freshness [REQ] · 77% · ✓ · ≥60% of references from 2025–2026. Current: 77%
[c] · Data Charts · 4 · ✓ · Original data charts from reproducible analysis (min 2). Current: 4
[g] · Code · ✓ · ✓ · Source code available on GitHub
[m] · Diagrams · 3 · ✓ · Mermaid architecture/flow diagrams. Current: 3
[x] · Cited by · 0 · ○ · Referenced by 0 other hub article(s)
Score = Ref Trust (89 × 60%) + Required (4/5 × 30%) + Optional (3/4 × 10%)
Technical Research · Mar 31, 2026 · 15 min read
26 Cache-Augmented Retrieval — RAG Meets KV-Cache DOI 2/10 69
Badge · Metric · Value · Status · Description
[s] · Reviewed Sources · 26% · ○ · ≥80% from editorially reviewed sources
[t] · Trusted · 91% · ✓ · ≥80% from verified, high-quality sources
[a] · DOI · 39% · ○ · ≥80% have a Digital Object Identifier
[b] · CrossRef · 26% · ○ · ≥80% indexed in CrossRef
[i] · Indexed · 48% · ○ · ≥80% have metadata indexed
[l] · Academic · 61% · ○ · ≥80% from journals/conferences/preprints
[f] · Free Access · 96% · ✓ · ≥80% are freely accessible
[r] · References · 23 refs · ✓ · Minimum 10 references required
[w] · Words [REQ] · 3,491 · ✓ · Minimum 2,000 words for a full research article. Current: 3,491
[d] · DOI [REQ] · ✓ · ✓ · Zenodo DOI registered for persistent citation. DOI: 10.5281/zenodo.19348524
[o] · ORCID [REQ] · ✓ · ✓ · Author ORCID verified for academic identity
[p] · Peer Reviewed [REQ] · — · ✗ · Peer reviewed by an assigned reviewer
[h] · Freshness [REQ] · 69% · ✓ · ≥60% of references from 2025–2026. Current: 69%
[c] · Data Charts · 4 · ✓ · Original data charts from reproducible analysis (min 2). Current: 4
[g] · Code · ✓ · ✓ · Source code available on GitHub
[m] · Diagrams · 3 · ✓ · Mermaid architecture/flow diagrams. Current: 3
[x] · Cited by · 0 · ○ · Referenced by 0 other hub article(s)
Score = Ref Trust (63 × 60%) + Required (4/5 × 30%) + Optional (3/4 × 10%)
Technical Research · Mar 31, 2026 · 17 min read
27 Retrieval-Augmented Memory vs Pure Attention Memory DOI 1/10 67
Badge · Metric · Value · Status · Description
[s] · Reviewed Sources · 14% · ○ · ≥80% from editorially reviewed sources
[t] · Trusted · 91% · ✓ · ≥80% from verified, high-quality sources
[a] · DOI · 27% · ○ · ≥80% have a Digital Object Identifier
[b] · CrossRef · 14% · ○ · ≥80% indexed in CrossRef
[i] · Indexed · 36% · ○ · ≥80% have metadata indexed
[l] · Academic · 73% · ○ · ≥80% from journals/conferences/preprints
[f] · Free Access · 95% · ✓ · ≥80% are freely accessible
[r] · References · 22 refs · ✓ · Minimum 10 references required
[w] · Words [REQ] · 2,204 · ✓ · Minimum 2,000 words for a full research article. Current: 2,204
[d] · DOI [REQ] · ✓ · ✓ · Zenodo DOI registered for persistent citation. DOI: 10.5281/zenodo.19354653
[o] · ORCID [REQ] · ✓ · ✓ · Author ORCID verified for academic identity
[p] · Peer Reviewed [REQ] · — · ✗ · Peer reviewed by an assigned reviewer
[h] · Freshness [REQ] · 72% · ✓ · ≥60% of references from 2025–2026. Current: 72%
[c] · Data Charts · 4 · ✓ · Original data charts from reproducible analysis (min 2). Current: 4
[g] · Code · ✓ · ✓ · Source code available on GitHub
[m] · Diagrams · 3 · ✓ · Mermaid architecture/flow diagrams. Current: 3
[x] · Cited by · 0 · ○ · Referenced by 0 other hub article(s)
Score = Ref Trust (59 × 60%) + Required (4/5 × 30%) + Optional (3/4 × 10%)
Technical Research · Mar 31, 2026 · 11 min read
28 Biological Memory Models and Their AI Analogues DOI 1/10 63
Badge · Metric · Value · Status · Description
[s] · Reviewed Sources · 5% · ○ · ≥80% from editorially reviewed sources
[t] · Trusted · 85% · ✓ · ≥80% from verified, high-quality sources
[a] · DOI · 15% · ○ · ≥80% have a Digital Object Identifier
[b] · CrossRef · 5% · ○ · ≥80% indexed in CrossRef
[i] · Indexed · 30% · ○ · ≥80% have metadata indexed
[l] · Academic · 75% · ○ · ≥80% from journals/conferences/preprints
[f] · Free Access · 100% · ✓ · ≥80% are freely accessible
[r] · References · 20 refs · ✓ · Minimum 10 references required
[w] · Words [REQ] · 2,763 · ✓ · Minimum 2,000 words for a full research article. Current: 2,763
[d] · DOI [REQ] · ✓ · ✓ · Zenodo DOI registered for persistent citation. DOI: 10.5281/zenodo.19360007
[o] · ORCID [REQ] · ✓ · ✓ · Author ORCID verified for academic identity
[p] · Peer Reviewed [REQ] · — · ✗ · Peer reviewed by an assigned reviewer
[h] · Freshness [REQ] · 76% · ✓ · ≥60% of references from 2025–2026. Current: 76%
[c] · Data Charts · 4 · ✓ · Original data charts from reproducible analysis (min 2). Current: 4
[g] · Code · ✓ · ✓ · Source code available on GitHub
[m] · Diagrams · 3 · ✓ · Mermaid architecture/flow diagrams. Current: 3
[x] · Cited by · 0 · ○ · Referenced by 0 other hub article(s)
Score = Ref Trust (52 × 60%) + Required (4/5 × 30%) + Optional (3/4 × 10%)
Technical Research · Mar 31, 2026 · 14 min read
29 The Future of AI Memory — From Fixed Windows to Persistent State DOI 2/10 65
Badge · Metric · Value · Status · Description
[s] · Reviewed Sources · 5% · ○ · ≥80% from editorially reviewed sources
[t] · Trusted · 91% · ✓ · ≥80% from verified, high-quality sources
[a] · DOI · 23% · ○ · ≥80% have a Digital Object Identifier
[b] · CrossRef · 5% · ○ · ≥80% indexed in CrossRef
[i] · Indexed · 23% · ○ · ≥80% have metadata indexed
[l] · Academic · 82% · ✓ · ≥80% from journals/conferences/preprints
[f] · Free Access · 100% · ✓ · ≥80% are freely accessible
[r] · References · 22 refs · ✓ · Minimum 10 references required
[w] · Words [REQ] · 2,008 · ✓ · Minimum 2,000 words for a full research article. Current: 2,008
[d] · DOI [REQ] · ✓ · ✓ · Zenodo DOI registered for persistent citation. DOI: 10.5281/zenodo.19363248
[o] · ORCID [REQ] · ✓ · ✓ · Author ORCID verified for academic identity
[p] · Peer Reviewed [REQ] · — · ✗ · Peer reviewed by an assigned reviewer
[h] · Freshness [REQ] · 75% · ✓ · ≥60% of references from 2025–2026. Current: 75%
[c] · Data Charts · 4 · ✓ · Original data charts from reproducible analysis (min 2). Current: 4
[g] · Code · ✓ · ✓ · Source code available on GitHub
[m] · Diagrams · 3 · ✓ · Mermaid architecture/flow diagrams. Current: 3
[x] · Cited by · 0 · ○ · Referenced by 0 other hub article(s)
Score = Ref Trust (55 × 60%) + Required (4/5 × 30%) + Optional (3/4 × 10%)
Technical Research · Apr 1, 2026 · 10 min read
29 published · 9,932 total views · 358 min total reading · Mar 2026 – Apr 2026 published
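The Score line under each article in the listing combines three weighted components: Ref Trust at 60%, the share of required badges met at 30%, and the share of optional badges met at 10%. The following is a minimal sketch of that arithmetic, assuming the two fractions are scaled to 30 and 10 points respectively and that the result is rounded half up. The rounding rule is not stated on this page; it is inferred from article 29, where 64.5 is shown as 65. The function name `hub_score` and its parameters are illustrative and not taken from the hub's own code.

```python
from fractions import Fraction
from math import floor


def hub_score(ref_trust, required_met, required_total, optional_met, optional_total):
    """Reproduce the listing's score formula with exact rational arithmetic."""
    # Ref Trust contributes 60% of its value.
    trust_part = Fraction(ref_trust * 60, 100)
    # Required badges contribute up to 30 points, optional badges up to 10.
    required_part = Fraction(required_met * 30, required_total)
    optional_part = Fraction(optional_met * 10, optional_total)
    total = trust_part + required_part + optional_part
    # Assumption: round half up (64.5 -> 65), inferred from article 29.
    return floor(total + Fraction(1, 2))


# Ref Trust values and displayed scores copied from articles 23 to 29.
listed = {23: (71, 74), 24: (66, 71), 25: (89, 85), 26: (63, 69),
          27: (59, 67), 28: (52, 63), 29: (55, 65)}
for article, (ref_trust, shown) in listed.items():
    computed = hub_score(ref_trust, 4, 5, 3, 4)
    print(article, computed, computed == shown)
```

With 4 of 5 required and 3 of 4 optional badges met, the last two terms add a constant 31.5 points, so the score differences among these seven articles come entirely from Ref Trust.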
Scope
Foundation & Benchmarking (1–10): KV-cache fundamentals, attention memory patterns, context window utilisation, long-context retrieval benchmarks, memory degradation curves, compression benchmarks, cross-architecture comparison, prompt caching efficiency, multi-turn memory, and a meta-analysis of context benchmarks
Optimisation Techniques (11–18): Paged attention, grouped-query attention, speculative decoding, semantic prompt caching, token pruning, cross-layer cache sharing, sliding window and compressive caching, flash attention
Infrastructure (19–24): Distributed KV-cache, disaggregated prefill/decode, cache-aware scheduling, memory hierarchy (DRAM/HBM/SSD), cache coherence in multi-tenant systems, production monitoring
Economics & Emerging (25–30): Cost models, cache-augmented retrieval (RAG meets KV-cache), cross-model cache transfer, retrieval-augmented vs attention memory, biological memory analogues, the future of AI memory
Editorial Standards
Each article follows academic conventions: formal abstract, structured methodology, empirical evidence or rigorous literature synthesis, proper citations with DOIs, and reproducible analysis where applicable. The series maintains a monochrome design language consistent with the Stabilarity Research Hub editorial identity.
Submit
Researchers and practitioners working on KV-cache optimisation, long-context inference, or memory-efficient architectures are invited to suggest topics, share benchmarks, or propose guest contributions. Contact via the GitHub repository.