Prompt caching has emerged as a critical optimization for large language model (LLM) serving, yet production systems overwhelmingly rely on exact-match strategies that miss semantically equivalent queries. This article investigates semantic prompt caching — systems that identify and serve cached responses for semantically similar (but not identical) prompts using embedding-based similarity dete...
Speculative Decoding and Cache Reuse
Speculative decoding has emerged as a transformative inference optimization that breaks the sequential bottleneck of autoregressive generation by drafting multiple tokens in parallel and verifying them in a single forward pass. This article examines three research questions at the intersection of speculative decoding and KV cache management: how draft-then-verify architectures interact with cac...
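The draft-then-verify loop at the heart of speculative decoding can be sketched with toy models standing in for the real networks. This shows only the greedy acceptance variant; the callables, names, and toy "models" below are illustrative assumptions, and a real implementation batches the verify step into one target forward pass.

```python
def speculative_step(target_next, draft_next, context, k=4):
    """One draft-then-verify step (greedy acceptance variant).

    target_next / draft_next are callables mapping a token sequence to the
    next token -- stand-ins for the large and small models' forward passes.
    """
    # Draft phase: the cheap model proposes k tokens autoregressively.
    drafted, ctx = [], list(context)
    for _ in range(k):
        tok = draft_next(ctx)
        drafted.append(tok)
        ctx.append(tok)

    # Verify phase: conceptually a single target forward pass over all k
    # drafted positions; simulated here one position at a time.
    accepted, ctx = [], list(context)
    for tok in drafted:
        expected = target_next(ctx)
        if tok != expected:
            accepted.append(expected)      # target's correction ends the step
            break
        accepted.append(tok)
        ctx.append(tok)
    else:
        accepted.append(target_next(ctx))  # free extra token when all k accepted
    return accepted

# Toy models: the "next token" is just sequence length mod 10.
target = lambda seq: len(seq) % 10
agreeing_draft = lambda seq: len(seq) % 10
tokens = speculative_step(target, agreeing_draft, [1, 2, 3], k=4)  # [3, 4, 5, 6, 7]
```

When draft and target agree, one step emits k+1 tokens for one target pass; when they diverge at the first position, the step still makes progress by emitting the target's own token.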
Social and Collaborative Intelligence as a UIB Dimension: Why Theory of Mind Remains the Hardest Benchmark
Current AI evaluation overwhelmingly focuses on individual cognitive tasks — reasoning, coding, mathematics — while neglecting the social and collaborative capabilities that define human intelligence in practice. This article introduces the UIB-Social dimension, a formal evaluation framework for measuring social intelligence in large language models across four sub-dimensions: Theory of Mind (T...
Grouped-Query Attention — Cache-Efficient Architecture Design
As large language models scale beyond hundreds of billions of parameters and context windows extend to millions of tokens, the key-value (KV) cache required for attention computation becomes the dominant memory bottleneck during inference. Grouped-Query Attention (GQA) addresses this by allowing multiple query heads to share fewer key-value heads, reducing cache footprint while preserving model...
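The cache savings from sharing KV heads follow directly from the cache-size formula. The sketch below uses an illustrative 32-layer configuration (not any specific model) to show how cutting KV heads from 32 to 8 shrinks the per-sequence cache fourfold:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Per-sequence KV cache size: one K and one V tensor per layer (fp16 default)."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Illustrative 32-layer model with 32 query heads and head_dim 128 at 8K context.
mha = kv_cache_bytes(32, 32, 128, 8192)  # MHA: every query head has its own K/V pair
gqa = kv_cache_bytes(32, 8, 128, 8192)   # GQA: groups of 4 query heads share one K/V head
print(f"MHA: {mha / 2**30:.1f} GiB, GQA: {gqa / 2**30:.1f} GiB, saving: {mha / gqa:.0f}x")
```

Note that only the key-value tensors shrink; the query heads, and hence model quality-relevant capacity, are untouched, which is why the reduction comes at little accuracy cost.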
Temporal and Planning Intelligence as a UIB Dimension: Why Horizon Length Breaks Modern Reasoning Models
Temporal reasoning and long-horizon planning represent perhaps the most consequential gap between current large language models and human cognitive capability. While frontier models achieve near-human performance on short planning tasks (under 15 steps), their accuracy degrades catastrophically beyond 25 planning steps — a phenomenon we term the horizon collapse. This article examines three res...
Paged Attention and Virtual Memory for LLM Inference
As large language models scale to billions of parameters and millions of context tokens, the key-value (KV) cache that stores attention states becomes the dominant memory bottleneck during inference. Traditional contiguous memory allocation for KV caches leads to severe fragmentation — wasting 40-60% of available GPU memory — and fundamentally limits serving throughput. This article investigate...
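The core bookkeeping behind paged allocation is a block table per sequence. The sketch below is a simplified illustration in the spirit of vLLM's PagedAttention, with invented names and no actual tensor storage; it shows only how fixed-size physical blocks are handed out on demand so fragmentation is bounded by one partial block per sequence:

```python
class PagedKVCache:
    """Minimal sketch of block-table KV allocation (paged-attention style).

    KV storage is carved into fixed-size physical blocks allocated on demand,
    so a sequence wastes at most the tail of its final block rather than a
    large contiguous over-allocation.
    """
    def __init__(self, num_blocks: int, block_size: int = 16):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))    # pool of physical block ids
        self.block_tables: dict[int, list[int]] = {}  # seq_id -> physical block ids
        self.lengths: dict[int, int] = {}             # seq_id -> tokens stored

    def append_token(self, seq_id: int) -> tuple[int, int]:
        """Reserve a slot for one new token; returns (physical_block, offset)."""
        table = self.block_tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:                  # last block is full: allocate
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = n + 1
        return table[n // self.block_size], n % self.block_size

    def free(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

cache = PagedKVCache(num_blocks=4, block_size=2)
slots = [cache.append_token(seq_id=0) for _ in range(3)]  # 3 tokens span 2 blocks
```

The indirection through the block table is also what enables copy-on-write sharing of common prefixes across sequences, which a real serving engine exploits heavily.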
Meta-Analysis of Context Benchmarks — Building a Unified Evaluation Framework
The rapid expansion of context windows — from 4K tokens to 10M tokens in models like Llama 4 — has produced a proliferation of evaluation benchmarks, yet no unified framework exists for comparing long-context capabilities across these disparate tests. This article presents a meta-analysis of ten major context benchmarks (NIAH, RULER, LongBench v2, InfiniteBench, BABILong, NoLiMa, LongGenBench, ...
Multi-Turn Memory — How Conversation History Degrades Model Performance
Multi-turn conversation represents the dominant interaction mode for deployed large language models, yet mounting evidence reveals that model performance degrades severely as conversation history accumulates in the KV cache. This article investigates three research questions: how rapidly task accuracy declines across conversation turns, what mechanisms drive this degradation at the attention an...
Prompt Caching Efficiency — Measuring Reuse Across Real Workloads
Prompt caching has emerged as one of the most impactful optimizations for reducing both cost and latency in large language model inference, with major providers reporting 50-90% cost savings through prefix reuse. Yet the efficiency of prompt caching varies dramatically across workload types, caching strategies, and eviction policies. This article investigates three research questions: how cache...
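How prefix reuse translates into a measurable hit rate can be sketched with a toy workload. The chunking below hashes fixed-size character chunks cumulatively as a stand-in for hashing token blocks; the helper names, block size, and workload are all illustrative assumptions.

```python
import hashlib

def prefix_block_keys(prompt: str, block_chars: int = 32):
    """Key each cumulative fixed-size chunk of the prompt; identical prefixes
    yield identical leading keys, so shared prefixes are detectable."""
    h = hashlib.sha256()
    keys = []
    for i in range(0, len(prompt), block_chars):
        h.update(prompt[i:i + block_chars].encode())
        keys.append(h.hexdigest())
    return keys

def cache_hit_ratio(prompts, block_chars: int = 32) -> float:
    """Fraction of prefix blocks across a workload that a warm cache would serve."""
    cached: set[str] = set()
    hits = total = 0
    for prompt in prompts:
        for key in prefix_block_keys(prompt, block_chars):
            total += 1
            if key in cached:
                hits += 1
            else:
                cached.add(key)
    return hits / total if total else 0.0

shared = "You are a helpful assistant. " * 4   # system prompt shared by all requests
workload = [shared + q for q in ("What is GQA?", "Explain paging.", "Define KV cache.")]
ratio = cache_hit_ratio(workload)  # most blocks repeat the shared system prefix
```

Because the keys are cumulative, a block hits only when its entire preceding prefix matches too, which mirrors how prefix caches can only reuse the leading, identical portion of a prompt.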
Cross-Architecture Memory Comparison — Llama vs Mistral vs Gemma vs Qwen
The proliferation of open-source large language model families in 2026 — each adopting distinct attention mechanisms and KV-cache configurations — creates a fragmented landscape where memory footprint varies by up to 4.6x across architectures at identical context lengths. This article provides a systematic cross-architecture comparison of KV-cache memory behavior across four dominant model fami...