Stabilarity Hub


Category: AI Memory

Research series on AI memory systems — KV-cache, context windows, attention memory, retrieval-augmented memory, and memory-efficient inference architectures

The Future of AI Memory — From Fixed Windows to Persistent State

Posted on April 1, 2026 · Technical Research by Oleh Ivchenko · DOI: 10.5281/zenodo.19363248 · Score: 56
Badge | Metric | Value | Status | Description
[s] | Reviewed Sources | 5% | ○ | ≥80% from editorially reviewed sources
[t] | Trusted | 55% | ○ | ≥80% from verified, high-quality sources
[a] | DOI | 20% | ○ | ≥80% have a Digital Object Identifier
[b] | CrossRef | 5% | ○ | ≥80% indexed in CrossRef
[i] | Indexed | 10% | ○ | ≥80% have metadata indexed
[l] | Academic | 80% | ✓ | ≥80% from journals/conferences/preprints
[f] | Free Access | 95% | ✓ | ≥80% are freely accessible
[r] | References | 20 refs | ✓ | Minimum 10 references required
[w] | Words [REQ] | 2,000 | ✓ | Minimum 2,000 words for a full research article. Current: 2,000
[d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation. DOI: 10.5281/zenodo.19363248
[o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity
[p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer
[h] | Freshness [REQ] | 88% | ✓ | ≥80% of references from 2025–2026. Current: 88%
[c] | Data Charts | 4 | ✓ | Original data charts from reproducible analysis (min 2). Current: 4
[g] | Code | ✓ | ✓ | Source code available on GitHub
[m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams. Current: 3
[x] | Cited by | 0 | ○ | Referenced by 0 other hub article(s)
Score = Ref Trust (41 × 60%) + Required (4/5 × 30%) + Optional (3/4 × 10%)

The dominant paradigm for AI memory — fixed-size context windows processed through self-attention — faces fundamental scalability barriers as large language models are deployed in long-horizon agentic tasks requiring hundreds of interaction sessions. This article investigates the transition from fixed context windows to persistent memory architectures through three research questions addressing...


Biological Memory Models and Their AI Analogues

Posted on March 31, 2026 · Technical Research by Oleh Ivchenko · DOI: 10.5281/zenodo.19360007 · Score: 51
Badge | Metric | Value | Status | Description
[s] | Reviewed Sources | 6% | ○ | ≥80% from editorially reviewed sources
[t] | Trusted | 41% | ○ | ≥80% from verified, high-quality sources
[a] | DOI | 12% | ○ | ≥80% have a Digital Object Identifier
[b] | CrossRef | 6% | ○ | ≥80% indexed in CrossRef
[i] | Indexed | 6% | ○ | ≥80% have metadata indexed
[l] | Academic | 71% | ○ | ≥80% from journals/conferences/preprints
[f] | Free Access | 88% | ✓ | ≥80% are freely accessible
[r] | References | 17 refs | ✓ | Minimum 10 references required
[w] | Words [REQ] | 2,727 | ✓ | Minimum 2,000 words for a full research article. Current: 2,727
[d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation. DOI: 10.5281/zenodo.19360007
[o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity
[p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer
[h] | Freshness [REQ] | 93% | ✓ | ≥80% of references from 2025–2026. Current: 93%
[c] | Data Charts | 4 | ✓ | Original data charts from reproducible analysis (min 2). Current: 4
[g] | Code | ✓ | ✓ | Source code available on GitHub
[m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams. Current: 3
[x] | Cited by | 0 | ○ | Referenced by 0 other hub article(s)
Score = Ref Trust (33 × 60%) + Required (4/5 × 30%) + Optional (3/4 × 10%)

The rapid expansion of AI memory architectures — from KV-caches and retrieval-augmented generation to parametric weight storage — has proceeded largely without systematic reference to the biological memory systems that inspired them. This article investigates three research questions about the structural and functional parallels between biological memory systems (hippocampal-cortical consolidat...


Retrieval-Augmented Memory vs Pure Attention Memory

Posted on March 31, 2026 · Technical Research by Oleh Ivchenko · DOI: 10.5281/zenodo.19354653 · Score: 61
Badge | Metric | Value | Status | Description
[s] | Reviewed Sources | 16% | ○ | ≥80% from editorially reviewed sources
[t] | Trusted | 68% | ○ | ≥80% from verified, high-quality sources
[a] | DOI | 26% | ○ | ≥80% have a Digital Object Identifier
[b] | CrossRef | 16% | ○ | ≥80% indexed in CrossRef
[i] | Indexed | 21% | ○ | ≥80% have metadata indexed
[l] | Academic | 74% | ○ | ≥80% from journals/conferences/preprints
[f] | Free Access | 89% | ✓ | ≥80% are freely accessible
[r] | References | 19 refs | ✓ | Minimum 10 references required
[w] | Words [REQ] | 2,202 | ✓ | Minimum 2,000 words for a full research article. Current: 2,202
[d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation. DOI: 10.5281/zenodo.19354653
[o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity
[p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer
[h] | Freshness [REQ] | 87% | ✓ | ≥80% of references from 2025–2026. Current: 87%
[c] | Data Charts | 4 | ✓ | Original data charts from reproducible analysis (min 2). Current: 4
[g] | Code | ✓ | ✓ | Source code available on GitHub
[m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams. Current: 3
[x] | Cited by | 0 | ○ | Referenced by 0 other hub article(s)
Score = Ref Trust (49 × 60%) + Required (4/5 × 30%) + Optional (3/4 × 10%)

The expansion of large language model context windows to 128K+ tokens has reopened a fundamental architectural question: should AI systems remember through retrieval from external stores or through attention over internally maintained representations? This article investigates three research questions about the comparative performance of retrieval-augmented memory (RAM) and pure attention memor...


Cache-Augmented Retrieval — RAG Meets KV-Cache

Posted on March 31, 2026 · Technical Research by Oleh Ivchenko · DOI: 10.5281/zenodo.19348524 · Score: 62
Badge | Metric | Value | Status | Description
[s] | Reviewed Sources | 30% | ○ | ≥80% from editorially reviewed sources
[t] | Trusted | 55% | ○ | ≥80% from verified, high-quality sources
[a] | DOI | 40% | ○ | ≥80% have a Digital Object Identifier
[b] | CrossRef | 30% | ○ | ≥80% indexed in CrossRef
[i] | Indexed | 35% | ○ | ≥80% have metadata indexed
[l] | Academic | 55% | ○ | ≥80% from journals/conferences/preprints
[f] | Free Access | 85% | ✓ | ≥80% are freely accessible
[r] | References | 20 refs | ✓ | Minimum 10 references required
[w] | Words [REQ] | 3,487 | ✓ | Minimum 2,000 words for a full research article. Current: 3,487
[d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation. DOI: 10.5281/zenodo.19348524
[o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity
[p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer
[h] | Freshness [REQ] | 85% | ✓ | ≥80% of references from 2025–2026. Current: 85%
[c] | Data Charts | 4 | ✓ | Original data charts from reproducible analysis (min 2). Current: 4
[g] | Code | ✓ | ✓ | Source code available on GitHub
[m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams. Current: 3
[x] | Cited by | 0 | ○ | Referenced by 0 other hub article(s)
Score = Ref Trust (50 × 60%) + Required (4/5 × 30%) + Optional (3/4 × 10%)

Retrieval-Augmented Generation (RAG) has become the dominant paradigm for grounding large language models in external knowledge, yet its runtime retrieval overhead imposes latency and consistency penalties that limit production deployability. Cache-Augmented Generation (CAG) proposes an inversion of this paradigm: preload all relevant documents into the model's key-value (KV) cache before queri...
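The RAG-versus-CAG trade-off described above can be sketched as a toy latency model (a hypothetical illustration; the function names and cost structure are assumptions, not figures from the article):

```python
def rag_total_ms(n_queries: int, retrieve_ms: float, prefill_ms: float, decode_ms: float) -> float:
    """Runtime retrieval (RAG): every query pays retrieval plus prefill of the fetched documents."""
    return n_queries * (retrieve_ms + prefill_ms + decode_ms)

def cag_total_ms(n_queries: int, preload_ms: float, decode_ms: float) -> float:
    """Cache-augmented (CAG): one up-front preload of the corpus into the KV cache,
    after which each query reuses the cached state and pays only decode."""
    return preload_ms + n_queries * decode_ms
```

Under these assumed costs, CAG wins once query volume amortizes the one-time preload, i.e. roughly when n_queries exceeds preload_ms / (retrieve_ms + prefill_ms).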


The Economics of Context Caching — Cost Models and Break-Even

Posted on March 31, 2026 · Technical Research by Oleh Ivchenko · DOI: 10.5281/zenodo.19343122 · Score: 87
Badge | Metric | Value | Status | Description
[s] | Reviewed Sources | 83% | ✓ | ≥80% from editorially reviewed sources
[t] | Trusted | 94% | ✓ | ≥80% from verified, high-quality sources
[a] | DOI | 83% | ✓ | ≥80% have a Digital Object Identifier
[b] | CrossRef | 83% | ✓ | ≥80% indexed in CrossRef
[i] | Indexed | 86% | ✓ | ≥80% have metadata indexed
[l] | Academic | 83% | ✓ | ≥80% from journals/conferences/preprints
[f] | Free Access | 60% | ○ | ≥80% are freely accessible
[r] | References | 35 refs | ✓ | Minimum 10 references required
[w] | Words [REQ] | 2,944 | ✓ | Minimum 2,000 words for a full research article. Current: 2,944
[d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation. DOI: 10.5281/zenodo.19343122
[o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity
[p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer
[h] | Freshness [REQ] | 84% | ✓ | ≥80% of references from 2025–2026. Current: 84%
[c] | Data Charts | 4 | ✓ | Original data charts from reproducible analysis (min 2). Current: 4
[g] | Code | ✓ | ✓ | Source code available on GitHub
[m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams. Current: 3
[x] | Cited by | 0 | ○ | Referenced by 0 other hub article(s)
Score = Ref Trust (92 × 60%) + Required (4/5 × 30%) + Optional (3/4 × 10%)

Context caching has emerged as the primary mechanism for reducing inference costs in large language model (LLM) deployments, yet the economics governing when caching becomes cost-effective remain poorly formalized. This article investigates three research questions addressing (1) how key-value (KV) cache storage costs scale with model architecture and context length, (2) at what request reuse f...
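The KV-cache sizing and break-even questions in the abstract admit a back-of-the-envelope model. The sizing formula below is the standard per-token KV footprint (two tensors, K and V, per layer, per KV head, per head dimension, per byte of precision); the cost parameters in the break-even helper are hypothetical placeholders, not the article's numbers:

```python
import math

def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int, seq_len: int, dtype_bytes: int = 2) -> int:
    """KV-cache footprint for one sequence: K and V tensors per layer, per KV head, per token."""
    return 2 * layers * kv_heads * head_dim * seq_len * dtype_bytes

def break_even_hits(prefill_cost: float, cached_read_cost: float, storage_cost: float) -> int:
    """Smallest number of cache hits at which caching beats recomputing the prefix,
    i.e. the smallest n with n * cached_read_cost + storage_cost <= n * prefill_cost."""
    saving_per_hit = prefill_cost - cached_read_cost
    if saving_per_hit <= 0:
        raise ValueError("cached reads must be cheaper than prefill for caching to pay off")
    return math.ceil(storage_cost / saving_per_hit)
```

For example, a GQA model with 80 layers, 8 KV heads, and head dimension 128 (roughly a 70B-class configuration) holds about 39 GiB of fp16 KV state for a 128K-token context.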


Production Cache Monitoring — Metrics and Capacity Planning

Posted on March 30, 2026 · Technical Research by Oleh Ivchenko · DOI: 10.5281/zenodo.19340506 · Score: 69
Badge | Metric | Value | Status | Description
[s] | Reviewed Sources | 48% | ○ | ≥80% from editorially reviewed sources
[t] | Trusted | 57% | ○ | ≥80% from verified, high-quality sources
[a] | DOI | 57% | ○ | ≥80% have a Digital Object Identifier
[b] | CrossRef | 52% | ○ | ≥80% indexed in CrossRef
[i] | Indexed | 61% | ○ | ≥80% have metadata indexed
[l] | Academic | 57% | ○ | ≥80% from journals/conferences/preprints
[f] | Free Access | 35% | ○ | ≥80% are freely accessible
[r] | References | 23 refs | ✓ | Minimum 10 references required
[w] | Words [REQ] | 2,607 | ✓ | Minimum 2,000 words for a full research article. Current: 2,607
[d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation. DOI: 10.5281/zenodo.19340506
[o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity
[p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer
[h] | Freshness [REQ] | 85% | ✓ | ≥80% of references from 2025–2026. Current: 85%
[c] | Data Charts | 5 | ✓ | Original data charts from reproducible analysis (min 2). Current: 5
[g] | Code | ✓ | ✓ | Source code available on GitHub
[m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams. Current: 3
[x] | Cited by | 0 | ○ | Referenced by 0 other hub article(s)
Score = Ref Trust (62 × 60%) + Required (4/5 × 30%) + Optional (3/4 × 10%)

As key-value (KV) cache systems become the dominant memory consumer in production large language model (LLM) inference, the ability to monitor cache behavior and plan capacity proactively determines whether deployments meet service-level objectives (SLOs) or suffer unpredictable degradation. This article investigates three research questions addressing (1) which monitoring metrics most reliably...
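The monitoring question in the abstract (which metrics to track) can be illustrated with a minimal rolling-window hit-rate gauge. This is a sketch under assumed semantics; a production deployment would export such gauges to a telemetry backend such as Prometheus or OpenTelemetry rather than poll them in-process:

```python
from collections import deque

class KVCacheMonitor:
    """Rolling-window KV-cache hit-rate tracker (illustrative sketch)."""

    def __init__(self, window: int = 1000):
        # Bounded deque: old events fall off automatically once the window is full.
        self._events = deque(maxlen=window)  # True = cache hit, False = miss

    def record(self, hit: bool) -> None:
        self._events.append(hit)

    @property
    def hit_rate(self) -> float:
        if not self._events:
            return 0.0
        return sum(self._events) / len(self._events)
```

A sliding window rather than a lifetime counter matters here: capacity planning reacts to the current workload mix, and a lifetime average hides recent degradation.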


Cache Coherence in Multi-Tenant Deployments

Posted on March 30, 2026 · Technical Research by Oleh Ivchenko · DOI: 10.5281/zenodo.19336721 · Score: 68
Badge | Metric | Value | Status | Description
[s] | Reviewed Sources | 40% | ○ | ≥80% from editorially reviewed sources
[t] | Trusted | 60% | ○ | ≥80% from verified, high-quality sources
[a] | DOI | 65% | ○ | ≥80% have a Digital Object Identifier
[b] | CrossRef | 40% | ○ | ≥80% indexed in CrossRef
[i] | Indexed | 45% | ○ | ≥80% have metadata indexed
[l] | Academic | 60% | ○ | ≥80% from journals/conferences/preprints
[f] | Free Access | 50% | ○ | ≥80% are freely accessible
[r] | References | 20 refs | ✓ | Minimum 10 references required
[w] | Words [REQ] | 2,358 | ✓ | Minimum 2,000 words for a full research article. Current: 2,358
[d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation. DOI: 10.5281/zenodo.19336721
[o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity
[p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer
[h] | Freshness [REQ] | 82% | ✓ | ≥80% of references from 2025–2026. Current: 82%
[c] | Data Charts | 4 | ✓ | Original data charts from reproducible analysis (min 2). Current: 4
[g] | Code | ✓ | ✓ | Source code available on GitHub
[m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams. Current: 3
[x] | Cited by | 0 | ○ | Referenced by 0 other hub article(s)
Score = Ref Trust (61 × 60%) + Required (4/5 × 30%) + Optional (3/4 × 10%)

As large language model (LLM) inference platforms scale to serve dozens or hundreds of concurrent tenants on shared GPU clusters, the key-value (KV) cache—the dominant consumer of GPU memory—becomes both a performance bottleneck and a security surface. This article investigates cache coherence challenges that arise when multiple tenants share KV-cache state in production LLM serving systems. We...
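One standard isolation technique relevant to the security surface described above is tenant-scoped cache keying. The sketch below illustrates the idea only; it is not the keying scheme of any particular serving system:

```python
import hashlib

def tenant_cache_key(tenant_id: str, token_ids: list) -> str:
    """Derive a KV-cache block key scoped to a tenant.

    Folding the tenant ID into the key means identical prompt prefixes from
    different tenants never alias to the same cached entry, closing one
    cross-tenant leakage channel (at the cost of forgoing cross-tenant sharing).
    """
    h = hashlib.sha256()
    h.update(tenant_id.encode("utf-8"))
    h.update(b"\x00")  # separator so tenant/token boundaries cannot be confused
    h.update(",".join(map(str, token_ids)).encode("utf-8"))
    return h.hexdigest()
```

The design trade-off is exactly the coherence tension in the abstract: strict per-tenant keys maximize isolation but sacrifice the hit-rate gains of sharing common system prompts across tenants.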


Memory Hierarchy — DRAM, HBM, and SSD-Backed Caches

Posted on March 30, 2026 · Technical Research by Oleh Ivchenko · DOI: 10.5281/zenodo.19329971 · Score: 53
Badge | Metric | Value | Status | Description
[s] | Reviewed Sources | 15% | ○ | ≥80% from editorially reviewed sources
[t] | Trusted | 54% | ○ | ≥80% from verified, high-quality sources
[a] | DOI | 38% | ○ | ≥80% have a Digital Object Identifier
[b] | CrossRef | 15% | ○ | ≥80% indexed in CrossRef
[i] | Indexed | 23% | ○ | ≥80% have metadata indexed
[l] | Academic | 54% | ○ | ≥80% from journals/conferences/preprints
[f] | Free Access | 69% | ○ | ≥80% are freely accessible
[r] | References | 13 refs | ✓ | Minimum 10 references required
[w] | Words [REQ] | 1,733 | ✗ | Minimum 2,000 words for a full research article. Current: 1,733
[d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation. DOI: 10.5281/zenodo.19329971
[o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity
[p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer
[h] | Freshness [REQ] | 80% | ✓ | ≥80% of references from 2025–2026. Current: 80%
[c] | Data Charts | 4 | ✓ | Original data charts from reproducible analysis (min 2). Current: 4
[g] | Code | ✓ | ✓ | Source code available on GitHub
[m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams. Current: 3
[x] | Cited by | 0 | ○ | Referenced by 0 other hub article(s)
Score = Ref Trust (45 × 60%) + Required (3/5 × 30%) + Optional (3/4 × 10%)

Large language model inference demands massive key-value (KV) cache storage that frequently exceeds GPU high-bandwidth memory (HBM) capacity, forcing system designers to exploit multi-tier memory hierarchies spanning HBM, host DRAM, and NVMe SSDs. This article investigates three research questions: how bandwidth and latency characteristics of each memory tier constrain KV cache serving throughp...
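The tier trade-off in the abstract can be made concrete with a simple fetch-time model: fixed access latency plus transfer time at the tier's bandwidth. The bandwidth and latency figures below are assumed ballpark values for illustration, not measurements from the article:

```python
def fetch_ms(size_bytes: float, bandwidth_gb_s: float, latency_us: float) -> float:
    """Approximate time to pull a KV block from one memory tier:
    fixed access latency plus size divided by sustained bandwidth."""
    return latency_us / 1000.0 + size_bytes / (bandwidth_gb_s * 1e9) * 1000.0

# Assumed ballpark tier characteristics (illustrative placeholders only).
TIERS = {
    "HBM":             {"bandwidth_gb_s": 3000, "latency_us": 1},
    "DRAM over PCIe":  {"bandwidth_gb_s": 60,   "latency_us": 10},
    "NVMe SSD":        {"bandwidth_gb_s": 7,    "latency_us": 100},
}
```

Plugging a multi-gigabyte KV block into this model shows why tier placement dominates serving throughput: under these assumed figures, the same block that transfers in under a millisecond from HBM takes tens of milliseconds from host DRAM and over a hundred from SSD.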


Cache-Aware Request Scheduling and Batching

Posted on March 30, 2026 · Technical Research by Oleh Ivchenko · DOI: 10.5281/zenodo.19325142 · Score: 74
Badge | Metric | Value | Status | Description
[s] | Reviewed Sources | 50% | ○ | ≥80% from editorially reviewed sources
[t] | Trusted | 72% | ○ | ≥80% from verified, high-quality sources
[a] | DOI | 67% | ○ | ≥80% have a Digital Object Identifier
[b] | CrossRef | 50% | ○ | ≥80% indexed in CrossRef
[i] | Indexed | 56% | ○ | ≥80% have metadata indexed
[l] | Academic | 72% | ○ | ≥80% from journals/conferences/preprints
[f] | Free Access | 67% | ○ | ≥80% are freely accessible
[r] | References | 18 refs | ✓ | Minimum 10 references required
[w] | Words [REQ] | 2,876 | ✓ | Minimum 2,000 words for a full research article. Current: 2,876
[d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation. DOI: 10.5281/zenodo.19325142
[o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity
[p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer
[h] | Freshness [REQ] | 80% | ✓ | ≥80% of references from 2025–2026. Current: 80%
[c] | Data Charts | 5 | ✓ | Original data charts from reproducible analysis (min 2). Current: 5
[g] | Code | ✓ | ✓ | Source code available on GitHub
[m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams. Current: 3
[x] | Cited by | 0 | ○ | Referenced by 0 other hub article(s)
Score = Ref Trust (70 × 60%) + Required (4/5 × 30%) + Optional (3/4 × 10%)

Efficient large language model (LLM) inference depends critically on how requests are scheduled and batched relative to the key-value (KV) cache state across GPU memory. Traditional scheduling strategies — round-robin, least-loaded, and even continuous batching — treat the KV cache as a passive byproduct of inference rather than an active scheduling constraint. This article investigates three r...
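Treating the KV cache as an active scheduling constraint, as the excerpt argues, can be sketched as prefix-aware routing: send each request to the worker whose resident cache overlaps it most. The helper names below are hypothetical, not a framework API:

```python
def shared_prefix(a: list, b: list) -> int:
    """Length of the common token prefix between two sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def route_request(request_tokens: list, worker_cached: dict) -> str:
    """Cache-aware routing sketch: pick the worker whose cached prefix
    overlaps the incoming request most, maximizing KV-cache reuse instead
    of balancing load blindly."""
    return max(worker_cached, key=lambda w: shared_prefix(request_tokens, worker_cached[w]))
```

A real scheduler would break ties by load and bound queue depth per worker; the point of the sketch is only that routing on prefix overlap converts cached prefill work into savings, which round-robin and least-loaded policies throw away.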


Disaggregated Prefill and Decode Architectures

Posted on March 29, 2026 · Technical Research by Oleh Ivchenko · DOI: 10.5281/zenodo.19316904 · Score: 71
Badge | Metric | Value | Status | Description
[s] | Reviewed Sources | 56% | ○ | ≥80% from editorially reviewed sources
[t] | Trusted | 56% | ○ | ≥80% from verified, high-quality sources
[a] | DOI | 81% | ✓ | ≥80% have a Digital Object Identifier
[b] | CrossRef | 56% | ○ | ≥80% indexed in CrossRef
[i] | Indexed | 56% | ○ | ≥80% have metadata indexed
[l] | Academic | 50% | ○ | ≥80% from journals/conferences/preprints
[f] | Free Access | 19% | ○ | ≥80% are freely accessible
[r] | References | 16 refs | ✓ | Minimum 10 references required
[w] | Words [REQ] | 2,157 | ✓ | Minimum 2,000 words for a full research article. Current: 2,157
[d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation. DOI: 10.5281/zenodo.19316904
[o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity
[p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer
[h] | Freshness [REQ] | 85% | ✓ | ≥80% of references from 2025–2026. Current: 85%
[c] | Data Charts | 4 | ✓ | Original data charts from reproducible analysis (min 2). Current: 4
[g] | Code | ✓ | ✓ | Source code available on GitHub
[m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams. Current: 3
[x] | Cited by | 0 | ○ | Referenced by 0 other hub article(s)
Score = Ref Trust (66 × 60%) + Required (4/5 × 30%) + Optional (3/4 × 10%)

Large language model inference comprises two computationally distinct phases — prefill and decode — that exhibit fundamentally different hardware utilization profiles. Colocating both phases on the same GPU leads to resource contention and suboptimal utilization, a problem that disaggregated architectures address by separating prefill and decode onto dedicated hardware pools. This article inves...


Posts pagination: page 1 of 3


About

Stabilarity Research Hub is dedicated to advancing the frontiers of AI, from Medical ML to Anticipatory Intelligence. Our mission is to build robust and efficient AI systems for a safer future.


Connect

Facebook Group: Join

Telegram: @Y0man

Email: contact@stabilarity.com

© 2026 Stabilarity Research Hub

Open research platform for AI, machine learning, and enterprise technology. All articles are preprints with DOI registration via Zenodo.
Operated by Stabilarity OÜ (Estonian Business Register: 17150040). © 2026 Stabilarity OÜ. Content licensed under CC BY 4.0.