Stabilarity Hub

Category: AI Memory

Research series on AI memory systems — KV-cache, context windows, attention memory, retrieval-augmented memory, and memory-efficient inference architectures

Multi-Turn Memory — How Conversation History Degrades Model Performance

Posted on March 23, 2026 by Oleh Ivchenko
Technical Research · DOI: 10.5281/zenodo.19195991 · Score: 55
Badge | Metric | Value | Status | Description
[s] | Reviewed Sources | 0% | ○ | ≥80% from editorially reviewed sources
[t] | Trusted | 86% | ✓ | ≥80% from verified, high-quality sources
[a] | DOI | 7% | ○ | ≥80% have a Digital Object Identifier
[b] | CrossRef | 0% | ○ | ≥80% indexed in CrossRef
[i] | Indexed | 100% | ✓ | ≥80% have metadata indexed
[l] | Academic | 71% | ○ | ≥80% from journals/conferences/preprints
[f] | Free Access | 93% | ✓ | ≥80% are freely accessible
[r] | References | 14 refs | ✓ | Minimum 10 references required
[w] | Words [REQ] | 1,597 | ✗ | Minimum 2,000 words for a full research article. Current: 1,597
[d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation. DOI: 10.5281/zenodo.19195991
[o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity
[p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer
[h] | Freshness [REQ] | 0% | ✗ | ≥80% of references from 2025–2026. Current: 0%
[c] | Data Charts | 5 | ✓ | Original data charts from reproducible analysis (min 2). Current: 5
[g] | Code | — | ○ | Source code available on GitHub
[m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams. Current: 3
[x] | Cited by | 0 | ○ | Referenced by 0 other hub article(s)
Score = Ref Trust (63 × 60%) + Required (2/5 × 30%) + Optional (2/4 × 10%)
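The score line can be checked by hand. The sketch below is one reading of it, not the site's own implementation: it assumes Ref Trust is a 0-100 value weighted at 60%, and that the Required and Optional fractions are scaled to 30 and 10 points respectively. The function name `quality_score` and its parameters are hypothetical. Under that reading the first article comes to 54.8, which rounds to the score of 55 displayed for it.

```python
def quality_score(ref_trust, required_met, required_total,
                  optional_met, optional_total):
    """Weighted article score on a 0-100 scale (assumed interpretation).

    ref_trust      -- reference trust value, 0-100 (weight 60%)
    required_met   -- required badges satisfied, out of required_total (30 points)
    optional_met   -- optional badges satisfied, out of optional_total (10 points)
    """
    ref_part = ref_trust * 60 / 100
    required_part = required_met * 30 / required_total
    optional_part = optional_met * 10 / optional_total
    return ref_part + required_part + optional_part


# Values taken from the score line above: 63, 2/5, 2/4.
first_article = quality_score(ref_trust=63, required_met=2, required_total=5,
                              optional_met=2, optional_total=4)
print(round(first_article, 1), round(first_article))
```

Rounding to the nearest integer is also an assumption; it is consistent with the scores shown for the articles on this page.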

Multi-turn conversation represents the dominant interaction mode for deployed large language models, yet mounting evidence reveals that model performance degrades severely as conversation history accumulates in the KV-cache. This article investigates three research questions: how rapidly task accuracy declines across conversation turns, what mechanisms drive this degradation at the attention an...


Prompt Caching Efficiency — Measuring Reuse Across Real Workloads

Posted on March 23, 2026 by Oleh Ivchenko
Technical Research · DOI: 10.5281/zenodo.19187992 · Score: 72
Badge | Metric | Value | Status | Description
[s] | Reviewed Sources | 9% | ○ | ≥80% from editorially reviewed sources
[t] | Trusted | 91% | ✓ | ≥80% from verified, high-quality sources
[a] | DOI | 73% | ○ | ≥80% have a Digital Object Identifier
[b] | CrossRef | 0% | ○ | ≥80% indexed in CrossRef
[i] | Indexed | 100% | ✓ | ≥80% have metadata indexed
[l] | Academic | 9% | ○ | ≥80% from journals/conferences/preprints
[f] | Free Access | 18% | ○ | ≥80% are freely accessible
[r] | References | 11 refs | ✓ | Minimum 10 references required
[w] | Words [REQ] | 2,628 | ✓ | Minimum 2,000 words for a full research article. Current: 2,628
[d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation. DOI: 10.5281/zenodo.19187992
[o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity
[p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer
[h] | Freshness [REQ] | 89% | ✓ | ≥80% of references from 2025–2026. Current: 89%
[c] | Data Charts | 5 | ✓ | Original data charts from reproducible analysis (min 2). Current: 5
[g] | Code | — | ○ | Source code available on GitHub
[m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams. Current: 3
[x] | Cited by | 0 | ○ | Referenced by 0 other hub article(s)
Score = Ref Trust (72 × 60%) + Required (4/5 × 30%) + Optional (2/4 × 10%)

Prompt caching has emerged as one of the most impactful optimizations for reducing both cost and latency in large language model inference, with major providers reporting 50-90% cost savings through prefix reuse. Yet the efficiency of prompt caching varies dramatically across workload types, caching strategies, and eviction policies. This article investigates three research questions: how cache...


Cross-Architecture Memory Comparison — Llama vs Mistral vs Gemma vs Qwen

Posted on March 23, 2026 by Oleh Ivchenko
Technical Research · DOI: 10.5281/zenodo.19183148 · Score: 63
Badge | Metric | Value | Status | Description
[s] | Reviewed Sources | 0% | ○ | ≥80% from editorially reviewed sources
[t] | Trusted | 79% | ○ | ≥80% from verified, high-quality sources
[a] | DOI | 64% | ○ | ≥80% have a Digital Object Identifier
[b] | CrossRef | 0% | ○ | ≥80% indexed in CrossRef
[i] | Indexed | 100% | ✓ | ≥80% have metadata indexed
[l] | Academic | 7% | ○ | ≥80% from journals/conferences/preprints
[f] | Free Access | 36% | ○ | ≥80% are freely accessible
[r] | References | 14 refs | ✓ | Minimum 10 references required
[w] | Words [REQ] | 2,222 | ✓ | Minimum 2,000 words for a full research article. Current: 2,222
[d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation. DOI: 10.5281/zenodo.19183148
[o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity
[p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer
[h] | Freshness [REQ] | 58% | ✗ | ≥80% of references from 2025–2026. Current: 58%
[c] | Data Charts | 5 | ✓ | Original data charts from reproducible analysis (min 2). Current: 5
[g] | Code | — | ○ | Source code available on GitHub
[m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams. Current: 3
[x] | Cited by | 0 | ○ | Referenced by 0 other hub article(s)
Score = Ref Trust (66 × 60%) + Required (3/5 × 30%) + Optional (2/4 × 10%)

The proliferation of open-source large language model families in 2026 — each adopting distinct attention mechanisms and KV-cache configurations — creates a fragmented landscape where memory footprint varies by up to 4.6x across architectures at identical context lengths. This article provides a systematic cross-architecture comparison of KV-cache memory behavior across four dominant model fami...


KV-Cache Compression Benchmarks — Quantization vs Eviction vs Pruning

Posted on March 23, 2026 by Oleh Ivchenko
Technical Research · DOI: 10.5281/zenodo.19176966 · Score: 72
Badge | Metric | Value | Status | Description
[s] | Reviewed Sources | 0% | ○ | ≥80% from editorially reviewed sources
[t] | Trusted | 93% | ✓ | ≥80% from verified, high-quality sources
[a] | DOI | 87% | ✓ | ≥80% have a Digital Object Identifier
[b] | CrossRef | 0% | ○ | ≥80% indexed in CrossRef
[i] | Indexed | 100% | ✓ | ≥80% have metadata indexed
[l] | Academic | 0% | ○ | ≥80% from journals/conferences/preprints
[f] | Free Access | 13% | ○ | ≥80% are freely accessible
[r] | References | 15 refs | ✓ | Minimum 10 references required
[w] | Words [REQ] | 2,393 | ✓ | Minimum 2,000 words for a full research article. Current: 2,393
[d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation. DOI: 10.5281/zenodo.19176966
[o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity
[p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer
[h] | Freshness [REQ] | 92% | ✓ | ≥80% of references from 2025–2026. Current: 92%
[c] | Data Charts | 0 | ○ | Original data charts from reproducible analysis (min 2). Current: 0
[g] | Code | — | ○ | Source code available on GitHub
[m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams. Current: 3
[x] | Cited by | 0 | ○ | Referenced by 0 other hub article(s)
Score = Ref Trust (75 × 60%) + Required (4/5 × 30%) + Optional (1/4 × 10%)

The KV-cache memory bottleneck in large language model inference has generated three competing families of compression techniques — quantization, token eviction, and structured pruning — each claiming substantial memory savings with minimal accuracy loss. This article benchmarks these approaches head-to-head, drawing on 2026 research that provides standardized comparisons across architectures a...


Memory Degradation Curves — How Accuracy Decays with Context Length

Posted on March 22, 2026 by Oleh Ivchenko
Technical Research · DOI: 10.5281/zenodo.19170557 · Score: 69
Badge | Metric | Value | Status | Description
[s] | Reviewed Sources | 0% | ○ | ≥80% from editorially reviewed sources
[t] | Trusted | 87% | ✓ | ≥80% from verified, high-quality sources
[a] | DOI | 80% | ✓ | ≥80% have a Digital Object Identifier
[b] | CrossRef | 0% | ○ | ≥80% indexed in CrossRef
[i] | Indexed | 93% | ✓ | ≥80% have metadata indexed
[l] | Academic | 0% | ○ | ≥80% from journals/conferences/preprints
[f] | Free Access | 13% | ○ | ≥80% are freely accessible
[r] | References | 15 refs | ✓ | Minimum 10 references required
[w] | Words [REQ] | 2,523 | ✓ | Minimum 2,000 words for a full research article. Current: 2,523
[d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation. DOI: 10.5281/zenodo.19170557
[o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity
[p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer
[h] | Freshness [REQ] | 92% | ✓ | ≥80% of references from 2025–2026. Current: 92%
[c] | Data Charts | 0 | ○ | Original data charts from reproducible analysis (min 2). Current: 0
[g] | Code | — | ○ | Source code available on GitHub
[m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams. Current: 3
[x] | Cited by | 0 | ○ | Referenced by 0 other hub article(s)
Score = Ref Trust (70 × 60%) + Required (4/5 × 30%) + Optional (1/4 × 10%)

As large language models advertise context windows spanning millions of tokens, the gap between nominal capacity and effective performance has become a central concern for deployment. This article investigates memory degradation curves — the systematic decay of model accuracy as context length increases — drawing on 2026 research that isolates context length as an independent variable affecting...


Long-Context Retrieval Benchmarks — Needle-in-Haystack and Beyond

Posted on March 22, 2026 by Oleh Ivchenko
Technical Research · DOI: 10.5281/zenodo.19163187 · Score: 61
Badge | Metric | Value | Status | Description
[s] | Reviewed Sources | 17% | ○ | ≥80% from editorially reviewed sources
[t] | Trusted | 83% | ✓ | ≥80% from verified, high-quality sources
[a] | DOI | 25% | ○ | ≥80% have a Digital Object Identifier
[b] | CrossRef | 0% | ○ | ≥80% indexed in CrossRef
[i] | Indexed | 100% | ✓ | ≥80% have metadata indexed
[l] | Academic | 67% | ○ | ≥80% from journals/conferences/preprints
[f] | Free Access | 92% | ✓ | ≥80% are freely accessible
[r] | References | 12 refs | ✓ | Minimum 10 references required
[w] | Words [REQ] | 2,043 | ✓ | Minimum 2,000 words for a full research article. Current: 2,043
[d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation. DOI: 10.5281/zenodo.19163187
[o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity
[p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer
[h] | Freshness [REQ] | 10% | ✗ | ≥80% of references from 2025–2026. Current: 10%
[c] | Data Charts | 0 | ○ | Original data charts from reproducible analysis (min 2). Current: 0
[g] | Code | — | ○ | Source code available on GitHub
[m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams. Current: 3
[x] | Cited by | 0 | ○ | Referenced by 0 other hub article(s)
Score = Ref Trust (67 × 60%) + Required (3/5 × 30%) + Optional (1/4 × 10%)

As large language models extend their context windows to millions of tokens, the critical question shifts from capacity to capability: can models actually retrieve and reason over information distributed across vast inputs? This article examines the evolution and current state of long-context retrieval benchmarks in 2026, from the foundational Needle-in-a-Haystack (NIAH) test to sophisticated m...


Context Window Utilization — How Much of the Window Do Models Really Use?

Posted on March 22, 2026 by Oleh Ivchenko
Technical Research · DOI: 10.5281/zenodo.19160303 · Score: 65
Badge | Metric | Value | Status | Description
[s] | Reviewed Sources | 7% | ○ | ≥80% from editorially reviewed sources
[t] | Trusted | 93% | ✓ | ≥80% from verified, high-quality sources
[a] | DOI | 80% | ✓ | ≥80% have a Digital Object Identifier
[b] | CrossRef | 0% | ○ | ≥80% indexed in CrossRef
[i] | Indexed | 100% | ✓ | ≥80% have metadata indexed
[l] | Academic | 7% | ○ | ≥80% from journals/conferences/preprints
[f] | Free Access | 13% | ○ | ≥80% are freely accessible
[r] | References | 15 refs | ✓ | Minimum 10 references required
[w] | Words [REQ] | 2,874 | ✓ | Minimum 2,000 words for a full research article. Current: 2,874
[d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation. DOI: 10.5281/zenodo.19160303
[o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity
[p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer
[h] | Freshness [REQ] | 77% | ✗ | ≥80% of references from 2025–2026. Current: 77%
[c] | Data Charts | 0 | ○ | Original data charts from reproducible analysis (min 2). Current: 0
[g] | Code | — | ○ | Source code available on GitHub
[m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams. Current: 3
[x] | Cited by | 0 | ○ | Referenced by 0 other hub article(s)
Score = Ref Trust (74 × 60%) + Required (3/5 × 30%) + Optional (1/4 × 10%)

Modern large language models advertise context windows ranging from 128K to 10M tokens, yet empirical benchmarks consistently reveal a substantial gap between advertised capacity and effective utilization. This article presents a systematic analysis of context window utilization across frontier LLMs, examining the divergence between theoretical context length and the operational window within w...


Attention Memory Patterns — What Models Actually Store in KV-Cache

Posted on March 19, 2026 by Oleh Ivchenko
Technical Research · DOI: 10.5281/zenodo.19116558 · Score: 70
Badge | Metric | Value | Status | Description
[s] | Reviewed Sources | 11% | ○ | ≥80% from editorially reviewed sources
[t] | Trusted | 100% | ✓ | ≥80% from verified, high-quality sources
[a] | DOI | 95% | ✓ | ≥80% have a Digital Object Identifier
[b] | CrossRef | 11% | ○ | ≥80% indexed in CrossRef
[i] | Indexed | 100% | ✓ | ≥80% have metadata indexed
[l] | Academic | 11% | ○ | ≥80% from journals/conferences/preprints
[f] | Free Access | 11% | ○ | ≥80% are freely accessible
[r] | References | 19 refs | ✓ | Minimum 10 references required
[w] | Words [REQ] | 2,736 | ✓ | Minimum 2,000 words for a full research article. Current: 2,736
[d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation. DOI: 10.5281/zenodo.19116558
[o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity
[p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer
[h] | Freshness [REQ] | 39% | ✗ | ≥80% of references from 2025–2026. Current: 39%
[c] | Data Charts | 0 | ○ | Original data charts from reproducible analysis (min 2). Current: 0
[g] | Code | — | ○ | Source code available on GitHub
[m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams. Current: 3
[x] | Cited by | 0 | ○ | Referenced by 0 other hub article(s)
Score = Ref Trust (82 × 60%) + Required (3/5 × 30%) + Optional (1/4 × 10%)

The key-value (KV) cache is the operational memory of transformer-based large language models (LLMs), storing intermediate attention representations that grow linearly with sequence length and quadratically impact computational cost. Yet what exactly do models store in these key and value vectors, and how uniformly is this information distributed across heads and layers? This article presents a...
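The linear growth with sequence length can be made concrete with a back-of-the-envelope estimate. The sketch below uses the standard sizing rule (one key tensor and one value tensor per layer, each holding `n_kv_heads × head_dim` elements per token). The function name `kv_cache_bytes` and the model configuration in the usage lines are hypothetical and are not figures from the article.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim,
                   bytes_per_elem=2, batch=1):
    """Estimate KV-cache size in bytes for a decoder-only transformer.

    The leading factor 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes fp16/bf16 storage.
    """
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len * batch


# Hypothetical configuration: 32 layers, 8 KV heads, head dimension 128, fp16.
for tokens in (1024, 8192, 65536):
    gib = kv_cache_bytes(tokens, n_layers=32, n_kv_heads=8, head_dim=128) / 2**30
    print(f"{tokens:>6} tokens: {gib:.3f} GiB")
```

Because every term except `seq_len` is fixed by the architecture, doubling the context doubles the cache, and changing the number of KV heads or layers shifts the whole line up or down.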


KV-Cache Fundamentals — How Transformers Remember (and Forget)

Posted on March 19, 2026 by Oleh Ivchenko
Technical Research · DOI: 10.5281/zenodo.19112532 · Score: 70
Badge | Metric | Value | Status | Description
[s] | Reviewed Sources | 7% | ○ | ≥80% from editorially reviewed sources
[t] | Trusted | 100% | ✓ | ≥80% from verified, high-quality sources
[a] | DOI | 100% | ✓ | ≥80% have a Digital Object Identifier
[b] | CrossRef | 14% | ○ | ≥80% indexed in CrossRef
[i] | Indexed | 93% | ✓ | ≥80% have metadata indexed
[l] | Academic | 14% | ○ | ≥80% from journals/conferences/preprints
[f] | Free Access | 0% | ○ | ≥80% are freely accessible
[r] | References | 14 refs | ✓ | Minimum 10 references required
[w] | Words [REQ] | 2,794 | ✓ | Minimum 2,000 words for a full research article. Current: 2,794
[d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation. DOI: 10.5281/zenodo.19112532
[o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity
[p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer
[h] | Freshness [REQ] | 57% | ✗ | ≥80% of references from 2025–2026. Current: 57%
[c] | Data Charts | 0 | ○ | Original data charts from reproducible analysis (min 2). Current: 0
[g] | Code | — | ○ | Source code available on GitHub
[m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams. Current: 3
[x] | Cited by | 0 | ○ | Referenced by 0 other hub article(s)
Score = Ref Trust (82 × 60%) + Required (3/5 × 30%) + Optional (1/4 × 10%)

The key-value (KV) cache is the dominant memory structure enabling efficient autoregressive inference in transformer-based large language models (LLMs). While self-attention requires computation quadratic in sequence length when the full sequence is processed during training, the KV-cache makes each autoregressive decoding step linear in the number of cached tokens by retaining previously computed key and value projections. This article prov...


About

Stabilarity Research Hub is dedicated to advancing the frontiers of AI, from Medical ML to Anticipatory Intelligence. Our mission is to build robust and efficient AI systems for a safer future.

Stabilarity Research Hub

Open research platform for AI, machine learning, and enterprise technology. All articles are preprints with DOI registration via Zenodo.

Operated by Stabilarity OÜ (Registry: 17150040, Estonian Business Register)
© 2026 Stabilarity OÜ. Content licensed under CC BY 4.0