The dominant paradigm for AI memory — fixed-size context windows processed through self-attention — faces fundamental scalability barriers as large language models are deployed in long-horizon agentic tasks requiring hundreds of interaction sessions. This article investigates the transition from fixed context windows to persistent memory architectures through three research questions addressing...
Research series on AI memory systems — KV-cache, context windows, attention memory, retrieval-augmented memory, and memory-efficient inference architectures
Biological Memory Models and Their AI Analogues
The rapid expansion of AI memory architectures — from KV-caches and retrieval-augmented generation to parametric weight storage — has proceeded largely without systematic reference to the biological memory systems that inspired them. This article investigates three research questions about the structural and functional parallels between biological memory systems (hippocampal-cortical consolidation...
Retrieval-Augmented Memory vs Pure Attention Memory
The expansion of large language model context windows to 128K+ tokens has reopened a fundamental architectural question: should AI systems remember through retrieval from external stores or through attention over internally maintained representations? This article investigates three research questions about the comparative performance of retrieval-augmented memory (RAM) and pure attention memory...
Cache-Augmented Retrieval — RAG Meets KV-Cache
Retrieval-Augmented Generation (RAG) has become the dominant paradigm for grounding large language models in external knowledge, yet its runtime retrieval overhead imposes latency and consistency penalties that limit production deployability. Cache-Augmented Generation (CAG) proposes an inversion of this paradigm: preload all relevant documents into the model's key-value (KV) cache before querying...
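The preload-then-reuse idea behind CAG can be illustrated with a counting toy (every name below is an illustrative stub, not a real serving API): RAG pays the document prefill on every query, while CAG pays it once and amortizes it.

```python
# Toy comparison of per-query retrieval (RAG) vs. one-time preload (CAG).
# All functions and sizes here are illustrative stubs, not a real serving API.

prefill_calls = {"rag": 0, "cag": 0}

def prefill(tokens, mode):
    """Stand-in for running the model's prefill pass over `tokens`."""
    prefill_calls[mode] += len(tokens)
    return f"kv_cache({len(tokens)} tokens)"

DOCS = ["doc_a"] * 4000      # pretend corpus: 4000 document tokens
QUERIES = [["q"] * 20] * 50  # 50 queries of 20 tokens each

# RAG: retrieve, then re-prefill the retrieved context for every query.
for q in QUERIES:
    retrieved = DOCS[:800]           # stub retriever returns 800 tokens
    prefill(retrieved + q, "rag")

# CAG: prefill the whole corpus once, then extend the cached state per query.
shared_cache = prefill(DOCS, "cag")  # one-time cost
for q in QUERIES:
    prefill(q, "cag")                # only query tokens hit prefill

print(prefill_calls)  # {'rag': 41000, 'cag': 5000}
```

Under these assumed sizes, CAG prefills roughly an eighth of the tokens RAG does, at the price of holding the whole corpus resident in the KV cache.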
The Economics of Context Caching — Cost Models and Break-Even
Context caching has emerged as the primary mechanism for reducing inference costs in large language model (LLM) deployments, yet the economics governing when caching becomes cost-effective remain poorly formalized. This article investigates three research questions addressing (1) how key-value (KV) cache storage costs scale with model architecture and context length, (2) at what request reuse frequency...
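Both questions reduce to back-of-envelope arithmetic. The sketch below uses a Llama-2-7B-like geometry for the cache-size formula; the per-token cost figures are assumptions for illustration, not published provider prices.

```python
# Back-of-envelope KV-cache break-even sketch. Cost figures below are
# illustrative assumptions, not published provider prices.

def kv_bytes_per_token(layers, kv_heads, head_dim, dtype_bytes=2):
    # Keys and values (factor 2) are stored for every layer.
    return 2 * layers * kv_heads * head_dim * dtype_bytes

# Llama-2-7B-like geometry, fp16:
per_token = kv_bytes_per_token(layers=32, kv_heads=32, head_dim=128)
print(per_token)           # 524288 bytes = 512 KiB per cached token

# Assumed costs: recomputing prefill for a token vs. storing its KV entry.
PREFILL_COST = 2e-7        # $ per token recomputed (assumption)
STORAGE_COST = 1e-8        # $ per token per hour kept cached (assumption)

def break_even_hours(prefill_cost=PREFILL_COST, storage_cost=STORAGE_COST):
    """Longest reuse interval at which caching still beats recomputation."""
    return prefill_cost / storage_cost

print(break_even_hours())  # ~20: cache pays off if reused within ~20 hours
```

The same two-line model extends naturally to batch effects and tiered storage prices; the article's formalization presumably refines exactly these terms.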
Production Cache Monitoring — Metrics and Capacity Planning
As key-value (KV) cache systems become the dominant memory consumer in production large language model (LLM) inference, the ability to monitor cache behavior and plan capacity proactively determines whether deployments meet service-level objectives (SLOs) or suffer unpredictable degradation. This article investigates three research questions addressing (1) which monitoring metrics most reliably...
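The core metrics at issue (hit rate, miss rate, eviction rate) can be made concrete with a minimal instrumented LRU cache; the class below is an illustrative sketch of the counters a production monitor would export, not a serving framework's API.

```python
from collections import OrderedDict

# Minimal instrumented LRU cache: a sketch of the hit/miss/eviction
# counters a production cache monitor would export.

class MonitoredCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()
        self.hits = self.misses = self.evictions = 0

    def get_or_insert(self, key, value):
        if key in self.store:
            self.hits += 1
            self.store.move_to_end(key)     # LRU bookkeeping
            return self.store[key]
        self.misses += 1
        if len(self.store) >= self.capacity:
            self.store.popitem(last=False)  # evict least recently used
            self.evictions += 1
        self.store[key] = value
        return value

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

cache = MonitoredCache(capacity=2)
for key in ["a", "b", "a", "c", "b"]:
    cache.get_or_insert(key, key.upper())
print(cache.hit_rate(), cache.evictions)  # 0.2 2
```

A rising eviction count at a steady hit rate is the classic early-warning signal that working-set size has outgrown capacity, which is where capacity planning enters.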
Cache Coherence in Multi-Tenant Deployments
As large language model (LLM) inference platforms scale to serve dozens or hundreds of concurrent tenants on shared GPU clusters, the key-value (KV) cache — the dominant consumer of GPU memory — becomes both a performance bottleneck and a security surface. This article investigates cache coherence challenges that arise when multiple tenants share KV-cache state in production LLM serving systems. We...
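One simple isolation scheme for that security surface, sketched below under the assumption that cached prefixes are looked up by hash, namespaces cache keys by tenant so identical prompts never share entries across tenants.

```python
import hashlib

# Sketch of tenant-scoped cache keys: folding the tenant id into the
# prefix hash prevents one tenant's cached KV blocks from being served
# to another. Illustrative scheme, not a specific framework's API.

def cache_key(tenant_id: str, prefix_tokens: list) -> str:
    h = hashlib.sha256()
    h.update(tenant_id.encode())
    for tok in prefix_tokens:
        h.update(int(tok).to_bytes(4, "little"))
    return h.hexdigest()

shared_prefix = [101, 2023, 2003]  # identical system-prompt tokens

key_a = cache_key("tenant-a", shared_prefix)
key_b = cache_key("tenant-b", shared_prefix)

# Same prefix, different tenants -> different cache entries, so a timing
# side channel on cache hits cannot reveal another tenant's prompts.
print(key_a != key_b)  # True
```

The trade-off is exactly the coherence-versus-isolation tension the article names: namespacing forfeits cross-tenant reuse of genuinely shared prefixes.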
Memory Hierarchy — DRAM, HBM, and SSD-Backed Caches
Large language model inference demands massive key-value (KV) cache storage that frequently exceeds GPU high-bandwidth memory (HBM) capacity, forcing system designers to exploit multi-tier memory hierarchies spanning HBM, host DRAM, and NVMe SSDs. This article investigates three research questions: how bandwidth and latency characteristics of each memory tier constrain KV cache serving throughput...
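The tier gap can be made concrete with rough numbers: loading a 4K-token cache for a 7B-class model takes under a millisecond from HBM but hundreds of milliseconds from SSD. The geometry and bandwidth figures below are ballpark assumptions (HBM3-class memory, a PCIe 5.0 x16 link, a fast NVMe drive), not measurements.

```python
# Rough tier comparison: time to load a 4K-token KV cache from each tier.
# Bandwidth figures are ballpark assumptions (HBM3, PCIe 5.0 x16, NVMe).

KV_BYTES_PER_TOKEN = 2 * 32 * 32 * 128 * 2   # fp16, Llama-2-7B-like geometry
cache_bytes = 4096 * KV_BYTES_PER_TOKEN      # 2 GiB for a 4K-token prefix

BANDWIDTH_GBS = {          # assumed sustained bandwidth per tier, GB/s
    "HBM":       3000,
    "host DRAM":   64,     # limited by the PCIe link, not DRAM itself
    "NVMe SSD":     7,
}

for tier, bw in BANDWIDTH_GBS.items():
    ms = cache_bytes / (bw * 1e9) * 1e3
    print(f"{tier:>9}: {ms:8.2f} ms to load the cache")
```

The three-orders-of-magnitude spread is why tier placement and prefetching, not raw capacity, dominate hierarchy design.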
Cache-Aware Request Scheduling and Batching
Efficient large language model (LLM) inference depends critically on how requests are scheduled and batched relative to the key-value (KV) cache state across GPU memory. Traditional scheduling strategies — round-robin, least-loaded, and even continuous batching — treat the KV cache as a passive byproduct of inference rather than an active scheduling constraint. This article investigates three research questions...
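Treating the cache as an active constraint can be sketched as longest-shared-prefix routing with a load tie-break; the router below is a hypothetical toy, not any particular scheduler's implementation.

```python
# Toy cache-aware router: send each request to the worker whose resident
# cache shares the longest token prefix, tie-breaking on lighter load.

def shared_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def route(request_tokens, workers):
    """workers: {name: {"cached": [prefix tokens], "load": int}}"""
    return max(
        workers,
        key=lambda w: (
            shared_prefix_len(request_tokens, workers[w]["cached"]),
            -workers[w]["load"],          # prefer lighter load on ties
        ),
    )

workers = {
    "gpu0": {"cached": [1, 2, 3, 4], "load": 3},
    "gpu1": {"cached": [1, 2, 9],    "load": 1},
    "gpu2": {"cached": [],           "load": 0},
}

print(route([1, 2, 3, 7], workers))  # gpu0: 3 cached tokens reusable
print(route([5, 5],       workers))  # gpu2: no prefix match, lightest load
```

Even this two-term priority captures the central tension the article studies: cache affinity pulls toward hot workers while load balance pulls away from them.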
Disaggregated Prefill and Decode Architectures
Large language model inference comprises two computationally distinct phases — prefill and decode — that exhibit fundamentally different hardware utilization profiles. Colocating both phases on the same GPU leads to resource contention and suboptimal utilization, a problem that disaggregated architectures address by separating prefill and decode onto dedicated hardware pools. This article investigates...
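The different utilization profiles follow from arithmetic intensity: prefill batches many tokens against one streaming of the weights, while decode reads the full weights for a single token. A rough estimate under illustrative assumptions (7B-parameter dense model, fp16 weights, ~2 FLOPs per parameter per token):

```python
# Why prefill and decode want different hardware: rough arithmetic
# intensity (FLOPs per byte of weights read) for each phase.
# Model numbers are illustrative (7B dense parameters, fp16 weights).

PARAMS = 7e9
WEIGHT_BYTES = PARAMS * 2            # fp16
FLOPS_PER_TOKEN = 2 * PARAMS         # ~2 FLOPs per parameter per token

def arithmetic_intensity(tokens_per_pass):
    # Weights are streamed once per forward pass regardless of token
    # count, so batching tokens (prefill) raises FLOPs per byte.
    return tokens_per_pass * FLOPS_PER_TOKEN / WEIGHT_BYTES

print(arithmetic_intensity(2048))    # prefill: 2048 tokens per pass
print(arithmetic_intensity(1))       # decode: one token per pass
```

A ~2000x intensity gap means prefill saturates compute while decode saturates memory bandwidth, which is precisely the mismatch disaggregation exploits by giving each phase its own hardware pool.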