Multi-turn conversation represents the dominant interaction mode for deployed large language models, yet mounting evidence reveals that model performance degrades severely as conversation history accumulates in the KV-cache. This article investigates three research questions: how rapidly task accuracy declines across conversation turns, what mechanisms drive this degradation at the attention an...
Category: AI Memory
Research series on AI memory systems — KV-cache, context windows, attention memory, retrieval-augmented memory, and memory-efficient inference architectures
Prompt Caching Efficiency — Measuring Reuse Across Real Workloads
Prompt caching has emerged as one of the most impactful optimizations for reducing both cost and latency in large language model inference, with major providers reporting 50-90% cost savings through prefix reuse. Yet the efficiency of prompt caching varies dramatically across workload types, caching strategies, and eviction policies. This article investigates three research questions: how cache...
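The core quantity behind those savings is prefix reuse: how many tokens of each request match the start of a previously cached request. As a minimal sketch (hypothetical helper names, token lists standing in for real tokenizer output), reuse across a workload can be estimated like this:

```python
def shared_prefix_len(a, b):
    """Length of the common token prefix between two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def prefix_reuse_ratio(prompts):
    """Fraction of tokens in a workload that could be served from a
    prefix cache seeded by the immediately preceding request."""
    reused = total = 0
    prev = []
    for toks in prompts:
        reused += shared_prefix_len(prev, toks)
        total += len(toks)
        prev = toks
    return reused / total if total else 0.0
```

Workloads with a long shared system prompt score high on this metric; fully independent prompts score near zero, which is one reason cache efficiency varies so widely across workload types.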
Cross-Architecture Memory Comparison — Llama vs Mistral vs Gemma vs Qwen
The proliferation of open-source large language model families in 2026 — each adopting distinct attention mechanisms and KV-cache configurations — creates a fragmented landscape where memory footprint varies by up to 4.6x across architectures at identical context lengths. This article provides a systematic cross-architecture comparison of KV-cache memory behavior across four dominant model fami...
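Most of that cross-architecture variation falls out of one formula: KV-cache size is 2 (keys and values) × layers × KV heads × head dimension × sequence length × bytes per element. A short sketch with illustrative configurations (hypothetical, not the published values for any specific model) shows how grouped-query attention alone produces a multi-x gap:

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, dtype_bytes=2):
    # 2x for keys and values; one entry per layer, per KV head, per position.
    return 2 * layers * kv_heads * head_dim * seq_len * dtype_bytes

# Illustrative configs: full multi-head attention (32 KV heads) vs
# grouped-query attention with 4x fewer KV heads, fp16 cache, 32K context.
mha = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128, seq_len=32_768)
gqa = kv_cache_bytes(layers=32, kv_heads=8,  head_dim=128, seq_len=32_768)
print(mha / 2**30, gqa / 2**30, mha / gqa)  # → 16.0 4.0 4.0 (GiB, GiB, ratio)
```

Differences in layer count, head dimension, and cache dtype stack on top of the KV-head ratio, which is how gaps larger than the GQA factor alone arise.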
KV-Cache Compression Benchmarks — Quantization vs Eviction vs Pruning
The KV-cache memory bottleneck in large language model inference has generated three competing families of compression techniques — quantization, token eviction, and structured pruning — each claiming substantial memory savings with minimal accuracy loss. This article benchmarks these approaches head-to-head, drawing on 2026 research that provides standardized comparisons across architectures a...
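Of the three families, quantization is the simplest to sketch. Below is a minimal symmetric per-vector int8 quantizer (an illustration of the general idea, not any specific benchmarked method): it stores one floating-point scale per vector and bounds the roundtrip error at half a quantization step, while cutting storage from 2 or 4 bytes per element to 1.

```python
def quantize_int8(vec):
    """Symmetric int8 quantization with one fp scale per vector."""
    scale = max(abs(v) for v in vec) / 127 or 1.0  # avoid zero scale
    q = [round(v / scale) for v in vec]            # ints in [-127, 127]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

k = [0.12, -1.9, 0.75, 1.3]          # a toy key vector
q, s = quantize_int8(k)
k_hat = dequantize(q, s)
err = max(abs(a - b) for a, b in zip(k, k_hat))
```

Eviction and pruning instead change *which* entries survive rather than how each is encoded, which is why the three families trade accuracy for memory in qualitatively different ways.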
Memory Degradation Curves — How Accuracy Decays with Context Length
As large language models advertise context windows spanning millions of tokens, the gap between nominal capacity and effective performance has become a central concern for deployment. This article investigates memory degradation curves — the systematic decay of model accuracy as context length increases — drawing on 2026 research that isolates context length as an independent variable affecting...
Long-Context Retrieval Benchmarks — Needle-in-Haystack and Beyond
As large language models extend their context windows to millions of tokens, the critical question shifts from capacity to capability: can models actually retrieve and reason over information distributed across vast inputs? This article examines the evolution and current state of long-context retrieval benchmarks in 2026, from the foundational Needle-in-a-Haystack (NIAH) test to sophisticated m...
Context Window Utilization — How Much of the Window Do Models Really Use?
Modern large language models advertise context windows ranging from 128K to 10M tokens, yet empirical benchmarks consistently reveal a substantial gap between advertised capacity and effective utilization. This article presents a systematic analysis of context window utilization across frontier LLMs, examining the divergence between theoretical context length and the operational window within w...
Attention Memory Patterns — What Models Actually Store in KV-Cache
The key-value (KV) cache is the operational memory of transformer-based large language models (LLMs), storing intermediate attention representations that grow linearly with sequence length and increasingly dominate inference cost at long contexts. Yet what exactly do models store in these key and value vectors, and how uniformly is this information distributed across heads and layers? This article presents a...
KV-Cache Fundamentals — How Transformers Remember (and Forget)
The key-value (KV) cache is the dominant memory structure enabling efficient autoregressive inference in transformer-based large language models (LLMs). While the self-attention mechanism requires quadratic computation over the full sequence during training, the KV-cache converts each decoding step into a linear-time operation by retaining previously computed key and value projections. This article prov...
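The mechanism described above can be sketched in a few lines. This is a minimal single-head, plain-Python illustration (hypothetical helper names, no batching or projections), not how production engines implement it: each decode step appends one key/value pair and attends over the accumulated cache, so the per-step work is linear in sequence length rather than recomputed from scratch.

```python
import math

def attend(q, K, V):
    """Single-head scaled dot-product attention for one query vector."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
    m = max(scores)                          # subtract max for stable softmax
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    w = [x / z for x in w]
    return [sum(wi * v[j] for wi, v in zip(w, V)) for j in range(d)]

class KVCache:
    """Retains key/value projections across decode steps, so each new
    token costs one append plus attention over the cache (linear work)."""
    def __init__(self):
        self.K, self.V = [], []

    def step(self, k, v, q):
        self.K.append(k)
        self.V.append(v)
        return attend(q, self.K, self.V)
```

The cache's memory grows by one key and one value vector per layer per token, which is exactly the linear growth that the rest of this series sets out to measure, compress, and compare.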