Stabilarity Hub


Semantic Prompt Caching — Beyond Exact Match

Posted on March 24, 2026 · Technical Research
Technical Research by Oleh Ivchenko · DOI: 10.5281/zenodo.19211071 · Score: 63

Badge | Metric | Value | Status | Description
[s] | Reviewed Sources | 0% | ○ | ≥80% from editorially reviewed sources
[t] | Trusted | 91% | ✓ | ≥80% from verified, high-quality sources
[a] | DOI | 9% | ○ | ≥80% have a Digital Object Identifier
[b] | CrossRef | 0% | ○ | ≥80% indexed in CrossRef
[i] | Indexed | 100% | ✓ | ≥80% have metadata indexed
[l] | Academic | 73% | ○ | ≥80% from journals/conferences/preprints
[f] | Free Access | 91% | ✓ | ≥80% are freely accessible
[r] | References | 11 refs | ✓ | Minimum 10 references required
[w] | Words [REQ] | 2,328 | ✓ | Minimum 2,000 words for a full research article
[d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation
[o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity
[p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer
[h] | Freshness [REQ] | 0% | ✗ | ≥80% of references from 2025–2026
[c] | Data Charts | 4 | ✓ | Original data charts from reproducible analysis (min 2)
[g] | Code | — | ○ | Source code available on GitHub
[m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams
[x] | Cited by | 0 | ○ | Referenced by other hub article(s)
Score = Ref Trust (66 × 60%) + Required (3/5 × 30%) + Optional (2/4 × 10%)

Prompt caching has emerged as a critical optimization for large language model (LLM) serving, yet production systems overwhelmingly rely on exact-match strategies that miss semantically equivalent queries. This article investigates semantic prompt caching — systems that identify and serve cached responses for semantically similar (but not identical) prompts using embedding-based similarity dete...
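
For intuition, here is a minimal sketch of an embedding-based semantic cache, assuming a hypothetical embed() helper that returns L2-normalized vectors and an illustrative cosine-similarity threshold of 0.92; it is a simplified sketch under those assumptions, not the system evaluated in the article.

```python
# Minimal sketch of a semantic prompt cache (illustrative, not the article's system).
# Assumes embed(text) -> L2-normalized numpy vector from some sentence-embedding model.
import numpy as np

class SemanticCache:
    def __init__(self, threshold: float = 0.92):
        self.threshold = threshold        # cosine-similarity cutoff (tunable assumption)
        self.keys: list[np.ndarray] = []  # prompt embeddings
        self.values: list[str] = []       # cached responses

    def lookup(self, prompt_vec: np.ndarray) -> str | None:
        if not self.keys:
            return None
        sims = np.stack(self.keys) @ prompt_vec  # cosine similarity (unit-norm vectors)
        best = int(np.argmax(sims))
        return self.values[best] if sims[best] >= self.threshold else None

    def insert(self, prompt_vec: np.ndarray, response: str) -> None:
        self.keys.append(prompt_vec)
        self.values.append(response)

# Usage idea: serve from cache on a semantic hit, otherwise call the model and store it.
# response = cache.lookup(embed(query)) or call_llm(query)
```

The threshold is the whole trade-off: too low and semantically different prompts get wrong cached answers, too high and the cache degenerates to exact match.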

AI Memory · Read more

Speculative Decoding and Cache Reuse

Posted on March 24, 2026 · Technical Research
Technical Research by Oleh Ivchenko · DOI: 10.5281/zenodo.19210815 · Score: 63

Badge | Metric | Value | Status | Description
[s] | Reviewed Sources | 0% | ○ | ≥80% from editorially reviewed sources
[t] | Trusted | 94% | ✓ | ≥80% from verified, high-quality sources
[a] | DOI | 6% | ○ | ≥80% have a Digital Object Identifier
[b] | CrossRef | 0% | ○ | ≥80% indexed in CrossRef
[i] | Indexed | 100% | ✓ | ≥80% have metadata indexed
[l] | Academic | 83% | ✓ | ≥80% from journals/conferences/preprints
[f] | Free Access | 94% | ✓ | ≥80% are freely accessible
[r] | References | 18 refs | ✓ | Minimum 10 references required
[w] | Words [REQ] | 2,662 | ✓ | Minimum 2,000 words for a full research article
[d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation
[o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity
[p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer
[h] | Freshness [REQ] | 13% | ✗ | ≥80% of references from 2025–2026
[c] | Data Charts | 4 | ✓ | Original data charts from reproducible analysis (min 2)
[g] | Code | — | ○ | Source code available on GitHub
[m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams
[x] | Cited by | 0 | ○ | Referenced by other hub article(s)
Score = Ref Trust (67 × 60%) + Required (3/5 × 30%) + Optional (2/4 × 10%)

Speculative decoding has emerged as a transformative inference optimization that breaks the sequential bottleneck of autoregressive generation by drafting multiple tokens in parallel and verifying them in a single forward pass. This article examines three research questions at the intersection of speculative decoding and KV cache management: how draft-then-verify architectures interact with cac...
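
As a rough sketch of the draft-then-verify idea, the function below accepts draft tokens greedily against the target model's own predictions; draft_next() and target_argmax() are hypothetical callables, and the sketch omits the rejection-sampling correction and all KV-cache bookkeeping that production systems perform.

```python
# Illustrative greedy speculative decoding step (simplified; draft_next and
# target_argmax are assumed callables, KV-cache reuse is not modeled here).
def speculative_step(tokens, draft_next, target_argmax, k=4):
    """Draft k tokens with a small model, verify them with one pass of the target model.

    target_argmax(prefix) is assumed to return the target's greedy next-token choice
    for every position of the prefix, computed in a single forward pass.
    """
    draft = draft_next(tokens, k)             # k candidate tokens from the draft model
    verified = target_argmax(tokens + draft)  # target's greedy choice at each position
    accepted = []
    for i, tok in enumerate(draft):
        # verified[len(tokens)+i-1] is the target's prediction for draft[i].
        if verified[len(tokens) + i - 1] == tok:
            accepted.append(tok)
        else:
            # First mismatch: keep the target's token and stop accepting drafts.
            accepted.append(verified[len(tokens) + i - 1])
            break
    else:
        # All drafts accepted: the verify pass yields one extra "bonus" token for free.
        accepted.append(verified[len(tokens) + len(draft) - 1])
    return tokens + accepted
```

Every accepted draft token is one target-model forward step avoided, which is why acceptance rate, not draft speed alone, governs the end-to-end speedup.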

AI Memory · Read more

Social and Collaborative Intelligence as a UIB Dimension: Why Theory of Mind Remains the Hardest Benchmark

Posted on March 24, 2026 · Benchmark Research
Benchmark Research by Oleh Ivchenko · DOI: 10.5281/zenodo.19209792 · Score: 62

Badge | Metric | Value | Status | Description
[s] | Reviewed Sources | 7% | ○ | ≥80% from editorially reviewed sources
[t] | Trusted | 87% | ✓ | ≥80% from verified, high-quality sources
[a] | DOI | 20% | ○ | ≥80% have a Digital Object Identifier
[b] | CrossRef | 0% | ○ | ≥80% indexed in CrossRef
[i] | Indexed | 100% | ✓ | ≥80% have metadata indexed
[l] | Academic | 60% | ○ | ≥80% from journals/conferences/preprints
[f] | Free Access | 80% | ✓ | ≥80% are freely accessible
[r] | References | 15 refs | ✓ | Minimum 10 references required
[w] | Words [REQ] | 2,272 | ✓ | Minimum 2,000 words for a full research article
[d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation
[o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity
[p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer
[h] | Freshness [REQ] | 17% | ✗ | ≥80% of references from 2025–2026
[c] | Data Charts | 4 | ✓ | Original data charts from reproducible analysis (min 2)
[g] | Code | — | ○ | Source code available on GitHub
[m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams
[x] | Cited by | 0 | ○ | Referenced by other hub article(s)
Score = Ref Trust (65 × 60%) + Required (3/5 × 30%) + Optional (2/4 × 10%)

Current AI evaluation overwhelmingly focuses on individual cognitive tasks — reasoning, coding, mathematics — while neglecting the social and collaborative capabilities that define human intelligence in practice. This article introduces the UIB-Social dimension, a formal evaluation framework for measuring social intelligence in large language models across four sub-dimensions: Theory of Mind (T...

Universal Intelligence Benchmark · Read more

Grouped-Query Attention — Cache-Efficient Architecture Design

Posted on March 24, 2026 · Technical Research
Technical Research by Oleh Ivchenko · DOI: 10.5281/zenodo.19209159 · Score: 69

Badge | Metric | Value | Status | Description
[s] | Reviewed Sources | 5% | ○ | ≥80% from editorially reviewed sources
[t] | Trusted | 95% | ✓ | ≥80% from verified, high-quality sources
[a] | DOI | 90% | ✓ | ≥80% have a Digital Object Identifier
[b] | CrossRef | 5% | ○ | ≥80% indexed in CrossRef
[i] | Indexed | 90% | ✓ | ≥80% have metadata indexed
[l] | Academic | 10% | ○ | ≥80% from journals/conferences/preprints
[f] | Free Access | 19% | ○ | ≥80% are freely accessible
[r] | References | 21 refs | ✓ | Minimum 10 references required
[w] | Words [REQ] | 2,403 | ✓ | Minimum 2,000 words for a full research article
[d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation
[o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity
[p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer
[h] | Freshness [REQ] | 42% | ✗ | ≥80% of references from 2025–2026
[c] | Data Charts | 5 | ✓ | Original data charts from reproducible analysis (min 2)
[g] | Code | — | ○ | Source code available on GitHub
[m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams
[x] | Cited by | 0 | ○ | Referenced by other hub article(s)
Score = Ref Trust (76 × 60%) + Required (3/5 × 30%) + Optional (2/4 × 10%)

As large language models scale beyond hundreds of billions of parameters and context windows extend to millions of tokens, the key-value (KV) cache required for attention computation becomes the dominant memory bottleneck during inference. Grouped-Query Attention (GQA) addresses this by allowing multiple query heads to share fewer key-value heads, reducing cache footprint while preserving model...
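
For intuition, a back-of-the-envelope sketch of the per-sequence KV-cache footprint is shown below; the Llama-2-70B-style shape (80 layers, 64 query heads, 8 KV heads, head dimension 128), the 32K context length, and fp16 storage are illustrative assumptions, not figures taken from the article.

```python
# Per-sequence KV-cache footprint: 2 (K and V) * layers * kv_heads * head_dim
# * context_len * bytes_per_element. All figures below are illustrative assumptions.
def kv_cache_bytes(layers, kv_heads, head_dim, context_len, bytes_per_elem=2):
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_elem

ctx = 32_768  # example context length
mha = kv_cache_bytes(layers=80, kv_heads=64, head_dim=128, context_len=ctx)  # one KV head per query head
gqa = kv_cache_bytes(layers=80, kv_heads=8,  head_dim=128, context_len=ctx)  # 8 shared KV heads

print(f"MHA: {mha / 2**30:.1f} GiB, GQA: {gqa / 2**30:.1f} GiB, reduction: {mha / gqa:.0f}x")
# Under these assumptions GQA shrinks the cache by the query-to-KV head ratio (64/8 = 8x).
```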

AI Memory · Read more

Temporal and Planning Intelligence as a UIB Dimension: Why Horizon Length Breaks Modern Reasoning Models

Posted on March 24, 2026 · Benchmark Research
Benchmark Research by Oleh Ivchenko · DOI: 10.5281/zenodo.19207333 · Score: 71

Badge | Metric | Value | Status | Description
[s] | Reviewed Sources | 0% | ○ | ≥80% from editorially reviewed sources
[t] | Trusted | 85% | ✓ | ≥80% from verified, high-quality sources
[a] | DOI | 77% | ○ | ≥80% have a Digital Object Identifier
[b] | CrossRef | 0% | ○ | ≥80% indexed in CrossRef
[i] | Indexed | 100% | ✓ | ≥80% have metadata indexed
[l] | Academic | 0% | ○ | ≥80% from journals/conferences/preprints
[f] | Free Access | 23% | ○ | ≥80% are freely accessible
[r] | References | 13 refs | ✓ | Minimum 10 references required
[w] | Words [REQ] | 2,339 | ✓ | Minimum 2,000 words for a full research article
[d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation
[o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity
[p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer
[h] | Freshness [REQ] | 80% | ✓ | ≥80% of references from 2025–2026
[c] | Data Charts | 4 | ✓ | Original data charts from reproducible analysis (min 2)
[g] | Code | — | ○ | Source code available on GitHub
[m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams
[x] | Cited by | 0 | ○ | Referenced by other hub article(s)
Score = Ref Trust (70 × 60%) + Required (4/5 × 30%) + Optional (2/4 × 10%)

Temporal reasoning and long-horizon planning represent perhaps the most consequential gap between current large language models and human cognitive capability. While frontier models achieve near-human performance on short planning tasks (under 15 steps), their accuracy degrades catastrophically beyond 25 planning steps — a phenomenon we term the horizon collapse. This article examines three res...

Universal Intelligence Benchmark · Read more

Paged Attention and Virtual Memory for LLM Inference

Posted on March 24, 2026 · Technical Research
Technical Research by Oleh Ivchenko · DOI: 10.5281/zenodo.19203099 · Score: 61

Badge | Metric | Value | Status | Description
[s] | Reviewed Sources | 17% | ○ | ≥80% from editorially reviewed sources
[t] | Trusted | 75% | ○ | ≥80% from verified, high-quality sources
[a] | DOI | 33% | ○ | ≥80% have a Digital Object Identifier
[b] | CrossRef | 17% | ○ | ≥80% indexed in CrossRef
[i] | Indexed | 92% | ✓ | ≥80% have metadata indexed
[l] | Academic | 50% | ○ | ≥80% from journals/conferences/preprints
[f] | Free Access | 67% | ○ | ≥80% are freely accessible
[r] | References | 12 refs | ✓ | Minimum 10 references required
[w] | Words [REQ] | 2,912 | ✓ | Minimum 2,000 words for a full research article
[d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation
[o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity
[p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer
[h] | Freshness [REQ] | 40% | ✗ | ≥80% of references from 2025–2026
[c] | Data Charts | 4 | ✓ | Original data charts from reproducible analysis (min 2)
[g] | Code | — | ○ | Source code available on GitHub
[m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams
[x] | Cited by | 0 | ○ | Referenced by other hub article(s)
Score = Ref Trust (63 × 60%) + Required (3/5 × 30%) + Optional (2/4 × 10%)

As large language models scale to billions of parameters and millions of context tokens, the key-value (KV) cache that stores attention states becomes the dominant memory bottleneck during inference. Traditional contiguous memory allocation for KV caches leads to severe fragmentation — wasting 40-60% of available GPU memory — and fundamentally limits serving throughput. This article investigate...
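
Below is a toy sketch of the block-table bookkeeping behind paged KV-cache allocation; the block size, pool size, and API are illustrative assumptions, and it omits copy-on-write sharing, swapping, and eviction policy.

```python
# Toy paged KV-cache allocator: each sequence receives fixed-size blocks on demand from
# a shared free pool, so memory is wasted only inside a sequence's last, partly filled block.
class PagedKVAllocator:
    def __init__(self, num_blocks: int, block_size: int = 16):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))     # physical block ids
        self.block_tables: dict[str, list[int]] = {}   # seq_id -> logical-to-physical map
        self.lengths: dict[str, int] = {}              # tokens stored per sequence

    def append_token(self, seq_id: str) -> int:
        """Reserve space for one more token; return the physical block it lands in."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length % self.block_size == 0:              # last block full -> grab a new one
            if not self.free_blocks:
                raise MemoryError("KV pool exhausted; evict or preempt a sequence")
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = length + 1
        return table[length // self.block_size]

    def free(self, seq_id: str) -> None:
        """Return all of a finished sequence's blocks to the shared pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

Because physical blocks need not be contiguous, over-allocation per sequence is bounded by one block instead of by the worst-case context length.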

AI Memory · Read more

Meta-Analysis of Context Benchmarks — Building a Unified Evaluation Framework

Posted on March 24, 2026 · Technical Research
Technical Research by Oleh Ivchenko · DOI: 10.5281/zenodo.19199439 · Score: 64

Badge | Metric | Value | Status | Description
[s] | Reviewed Sources | 19% | ○ | ≥80% from editorially reviewed sources
[t] | Trusted | 94% | ✓ | ≥80% from verified, high-quality sources
[a] | DOI | 6% | ○ | ≥80% have a Digital Object Identifier
[b] | CrossRef | 0% | ○ | ≥80% indexed in CrossRef
[i] | Indexed | 100% | ✓ | ≥80% have metadata indexed
[l] | Academic | 81% | ✓ | ≥80% from journals/conferences/preprints
[f] | Free Access | 75% | ○ | ≥80% are freely accessible
[r] | References | 16 refs | ✓ | Minimum 10 references required
[w] | Words [REQ] | 2,526 | ✓ | Minimum 2,000 words for a full research article
[d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation
[o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity
[p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer
[h] | Freshness [REQ] | 14% | ✗ | ≥80% of references from 2025–2026
[c] | Data Charts | 5 | ✓ | Original data charts from reproducible analysis (min 2)
[g] | Code | — | ○ | Source code available on GitHub
[m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams
[x] | Cited by | 0 | ○ | Referenced by other hub article(s)
Score = Ref Trust (68 × 60%) + Required (3/5 × 30%) + Optional (2/4 × 10%)

The rapid expansion of context windows — from 4K tokens to 10M tokens in models like Llama 4 — has produced a proliferation of evaluation benchmarks, yet no unified framework exists for comparing long-context capabilities across these disparate tests. This article presents a meta-analysis of ten major context benchmarks (NIAH, RULER, LongBench v2, InfiniteBench, BABILong, NoLiMa, LongGenBench, ...
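
As a baseline for what a unified score could look like, the sketch below aggregates per-benchmark results with explicit weights; the weights, the benchmark subset, and the assumption that every benchmark reports an accuracy in [0, 1] are illustrative, and this is not the weighting proposed in the article.

```python
# Baseline sketch: combine per-benchmark scores (assumed accuracies in [0, 1]) into one
# long-context score via illustrative weights. Weights are placeholders, not the article's.
def unified_context_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    covered = {name: w for name, w in weights.items() if name in scores}
    total = sum(covered.values())
    if total == 0:
        raise ValueError("no overlapping benchmarks between scores and weights")
    return sum(scores[name] * w for name, w in covered.items()) / total

example_weights = {"NIAH": 0.1, "RULER": 0.3, "LongBench v2": 0.3, "BABILong": 0.3}
print(unified_context_score({"NIAH": 0.98, "RULER": 0.71, "BABILong": 0.44}, example_weights))
```

Renormalizing over the covered benchmarks keeps models comparable even when they were evaluated on different subsets, at the cost of hiding which subsets those were.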

AI Memory · Read more

Multi-Turn Memory — How Conversation History Degrades Model Performance

Posted on March 23, 2026 · Technical Research
Technical Research by Oleh Ivchenko · DOI: 10.5281/zenodo.19195991 · Score: 55

Badge | Metric | Value | Status | Description
[s] | Reviewed Sources | 0% | ○ | ≥80% from editorially reviewed sources
[t] | Trusted | 86% | ✓ | ≥80% from verified, high-quality sources
[a] | DOI | 7% | ○ | ≥80% have a Digital Object Identifier
[b] | CrossRef | 0% | ○ | ≥80% indexed in CrossRef
[i] | Indexed | 100% | ✓ | ≥80% have metadata indexed
[l] | Academic | 71% | ○ | ≥80% from journals/conferences/preprints
[f] | Free Access | 93% | ✓ | ≥80% are freely accessible
[r] | References | 14 refs | ✓ | Minimum 10 references required
[w] | Words [REQ] | 1,597 | ✗ | Minimum 2,000 words for a full research article
[d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation
[o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity
[p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer
[h] | Freshness [REQ] | 0% | ✗ | ≥80% of references from 2025–2026
[c] | Data Charts | 5 | ✓ | Original data charts from reproducible analysis (min 2)
[g] | Code | — | ○ | Source code available on GitHub
[m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams
[x] | Cited by | 0 | ○ | Referenced by other hub article(s)
Score = Ref Trust (63 × 60%) + Required (2/5 × 30%) + Optional (2/4 × 10%)

Multi-turn conversation represents the dominant interaction mode for deployed large language models, yet mounting evidence reveals that model performance degrades severely as conversation history accumulates in the KV-cache. This article investigates three research questions: how rapidly task accuracy declines across conversation turns, what mechanisms drive this degradation at the attention an...
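
To make the accumulation concrete, the sketch below tracks how replayed history inflates context length and KV-cache size turn by turn; the tokens-per-turn figures and the model shape are assumptions for illustration, and the sketch measures growth only, not the accuracy degradation analyzed here.

```python
# Illustrative only: how conversation history inflates context length and KV-cache size
# turn by turn. Token counts and model shape below are assumptions, not measurements.
def kv_bytes(tokens, layers=32, kv_heads=8, head_dim=128, bytes_per_elem=2):
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_elem

def history_growth(turns, system_tokens=400, user_tokens=120, assistant_tokens=350):
    context = system_tokens
    for turn in range(1, turns + 1):
        context += user_tokens + assistant_tokens   # each turn replays into the next prompt
        yield turn, context, kv_bytes(context) / 2**20  # MiB

for turn, ctx, mib in history_growth(20):
    if turn % 5 == 0:
        print(f"turn {turn:2d}: {ctx:5d} tokens in context, ~{mib:6.1f} MiB of KV cache")
```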

AI Memory · Read more

Prompt Caching Efficiency — Measuring Reuse Across Real Workloads

Posted on March 23, 2026 · Technical Research
Technical Research by Oleh Ivchenko · DOI: 10.5281/zenodo.19187992 · Score: 72

Badge | Metric | Value | Status | Description
[s] | Reviewed Sources | 9% | ○ | ≥80% from editorially reviewed sources
[t] | Trusted | 91% | ✓ | ≥80% from verified, high-quality sources
[a] | DOI | 73% | ○ | ≥80% have a Digital Object Identifier
[b] | CrossRef | 0% | ○ | ≥80% indexed in CrossRef
[i] | Indexed | 100% | ✓ | ≥80% have metadata indexed
[l] | Academic | 9% | ○ | ≥80% from journals/conferences/preprints
[f] | Free Access | 18% | ○ | ≥80% are freely accessible
[r] | References | 11 refs | ✓ | Minimum 10 references required
[w] | Words [REQ] | 2,628 | ✓ | Minimum 2,000 words for a full research article
[d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation
[o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity
[p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer
[h] | Freshness [REQ] | 89% | ✓ | ≥80% of references from 2025–2026
[c] | Data Charts | 5 | ✓ | Original data charts from reproducible analysis (min 2)
[g] | Code | — | ○ | Source code available on GitHub
[m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams
[x] | Cited by | 0 | ○ | Referenced by other hub article(s)
Score = Ref Trust (72 × 60%) + Required (4/5 × 30%) + Optional (2/4 × 10%)

Prompt caching has emerged as one of the most impactful optimizations for reducing both cost and latency in large language model inference, with major providers reporting 50-90% cost savings through prefix reuse. Yet the efficiency of prompt caching varies dramatically across workload types, caching strategies, and eviction policies. This article investigates three research questions: how cache...
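
A minimal way to estimate hit potential on a workload, assuming block-aligned exact-prefix matching, is sketched below; the 128-token block size and the toy workload are illustrative assumptions rather than any provider's actual mechanism.

```python
# Sketch: measure exact-prefix cache hit rate over a token-level workload, assuming
# block-aligned prefix matching. Block size and workload are illustrative assumptions.
def cached_prefix_tokens(request_tokens, cache: set, block: int = 128) -> int:
    """Count how many leading tokens of this request were already cached, then cache it."""
    hit = 0
    for end in range(block, len(request_tokens) + 1, block):
        key = tuple(request_tokens[:end])   # a full block-aligned prefix
        if key in cache:
            hit = end                       # longest cached prefix seen so far
        cache.add(key)
    return hit

cache: set = set()
workload = [list(range(1000)) + [i] * 200 for i in range(5)]  # shared 1000-token prefix
hits = [cached_prefix_tokens(req, cache) for req in workload]
total = sum(len(r) for r in workload)
print(f"prefix-cache hit rate: {sum(hits) / total:.0%}")  # about 60% on this toy workload
```

The toy workload makes the usual point visible: hit rate is driven almost entirely by how much of each request is a stable, shared prefix (system prompts, tool schemas) rather than by cache size.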

AI Memory · Read more

Cross-Architecture Memory Comparison — Llama vs Mistral vs Gemma vs Qwen

Posted on March 23, 2026 · Technical Research
Technical Research by Oleh Ivchenko · DOI: 10.5281/zenodo.19183148 · Score: 63

Badge | Metric | Value | Status | Description
[s] | Reviewed Sources | 0% | ○ | ≥80% from editorially reviewed sources
[t] | Trusted | 79% | ○ | ≥80% from verified, high-quality sources
[a] | DOI | 64% | ○ | ≥80% have a Digital Object Identifier
[b] | CrossRef | 0% | ○ | ≥80% indexed in CrossRef
[i] | Indexed | 100% | ✓ | ≥80% have metadata indexed
[l] | Academic | 7% | ○ | ≥80% from journals/conferences/preprints
[f] | Free Access | 36% | ○ | ≥80% are freely accessible
[r] | References | 14 refs | ✓ | Minimum 10 references required
[w] | Words [REQ] | 2,222 | ✓ | Minimum 2,000 words for a full research article
[d] | DOI [REQ] | ✓ | ✓ | Zenodo DOI registered for persistent citation
[o] | ORCID [REQ] | ✓ | ✓ | Author ORCID verified for academic identity
[p] | Peer Reviewed [REQ] | — | ✗ | Peer reviewed by an assigned reviewer
[h] | Freshness [REQ] | 58% | ✗ | ≥80% of references from 2025–2026
[c] | Data Charts | 5 | ✓ | Original data charts from reproducible analysis (min 2)
[g] | Code | — | ○ | Source code available on GitHub
[m] | Diagrams | 3 | ✓ | Mermaid architecture/flow diagrams
[x] | Cited by | 0 | ○ | Referenced by other hub article(s)
Score = Ref Trust (66 × 60%) + Required (3/5 × 30%) + Optional (2/4 × 10%)

The proliferation of open-source large language model families in 2026 — each adopting distinct attention mechanisms and KV-cache configurations — creates a fragmented landscape where memory footprint varies by up to 4.6x across architectures at identical context lengths. This article provides a systematic cross-architecture comparison of KV-cache memory behavior across four dominant model fami...
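
The footprint differences can be approximated directly from architecture shapes. The sketch below recomputes per-sequence KV-cache size for four representative 7–9B configurations; the layer and head counts are approximate values assumed for illustration and should be checked against each model card rather than read as the article's measurements.

```python
# Recompute per-sequence KV-cache footprint at one context length for four open model
# families. Layer/head figures are approximate, assumed values; verify against model cards.
def kv_gib(layers, kv_heads, head_dim, context_len, bytes_per_elem=2):
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_elem / 2**30

configs = {                    # (layers, kv_heads, head_dim) -- assumed configurations
    "Llama-3-8B": (32, 8, 128),
    "Mistral-7B": (32, 8, 128),
    "Gemma-7B":   (28, 16, 256),
    "Qwen2-7B":   (28, 4, 128),
}
ctx = 128_000
for name, (layers, kv_heads, head_dim) in configs.items():
    print(f"{name:<11} {kv_gib(layers, kv_heads, head_dim, ctx):5.1f} GiB at {ctx:,} tokens")
```

Even under these rough assumptions, the spread comes almost entirely from the number of KV heads and the head dimension, which is why architecture choice dominates cache footprint at a fixed context length.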

AI Memory · Read more




About

Stabilarity Research Hub is dedicated to advancing the frontiers of AI, from Medical ML to Anticipatory Intelligence. Our mission is to build robust and efficient AI systems for a safer future.

Connect

Facebook Group: Join

Telegram: @Y0man

Email: contact@stabilarity.com

Stabilarity Research Hub

Open research platform for AI, machine learning, and enterprise technology. All articles are preprints with DOI registration via Zenodo.

185+ articles · 8 series · DOI-archived

Operated by Stabilarity OÜ · Registry: 17150040 (Estonian Business Register)
© 2026 Stabilarity OÜ. Content licensed under CC BY 4.0
