Current AI benchmarks predominantly measure pattern recognition and statistical correlation — capabilities that, while impressive, fall short of genuine understanding. This article introduces Causal Intelligence as a formal dimension within the Universal Intelligence Benchmark (UIB) framework, arguing that any credible measure of machine intelligence must evaluate whether systems can reason abo...
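To make the distinction concrete, consider the canonical gap between observational and interventional statistics. The Python sketch below is illustrative only: the structural causal model, its probabilities, and the scoring framing are assumptions for exposition, not the UIB item format. It shows how a confounder makes P(Y|X) and P(Y|do(X)) diverge; a causal-intelligence item rewards the second quantity, which pattern matching over observational data cannot supply.

```python
# Minimal sketch of a causal-reasoning probe: does a system's answer track
# the interventional quantity P(Y|do(X)) or merely the confounded
# conditional P(Y|X)? The SCM below is an illustrative assumption.
import random

random.seed(0)

def sample_observational(n=10_000):
    """Tiny SCM: confounder Z -> X and Z -> Y; X has no real effect on Y."""
    data = []
    for _ in range(n):
        z = random.random() < 0.5
        x = random.random() < (0.8 if z else 0.2)   # Z raises P(X)
        y = random.random() < (0.9 if z else 0.1)   # Y driven only by Z
        data.append((z, x, y))
    return data

def p_y_given_x(data, x_val):
    """Observational conditional, estimated from the sampled data."""
    rows = [y for _, x, y in data if x == x_val]
    return sum(rows) / len(rows)

def p_y_do_x(n=10_000):
    """Intervene: set X by fiat, cutting the Z -> X edge."""
    hits = 0
    for _ in range(n):
        z = random.random() < 0.5
        y = random.random() < (0.9 if z else 0.1)   # Y unchanged by do(X)
        hits += y
    return hits / n

data = sample_observational()
print(f"P(Y=1 | X=1)     = {p_y_given_x(data, True):.2f}")  # ~0.74, confounded
print(f"P(Y=1 | do(X=1)) = {p_y_do_x():.2f}")               # ~0.50, causal
# A causal-intelligence item asks for the second quantity; a system that
# regurgitates the observational conditional fails the item.
```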
Category: Universal Intelligence Benchmark
An inference-agnostic framework for measuring machine intelligence: meta-research and novel benchmarks for AI capabilities beyond text generation.
Inference-Agnostic Intelligence: The UIB Theoretical Framework
Current AI benchmarks measure narrow task performance — accuracy on question sets, code-generation pass rates, or image recognition scores. They rarely ask the deeper question: what is intelligence, and how should we measure it independently of the hardware, API, or inference provider running the model? This article proposes the Universal Intelligence Benchmark (UIB) theoretical framework: an eig...
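As a rough illustration of what "inference-agnostic" could mean operationally, the sketch below scores any system that exposes a bare answer(prompt) interface, so a local model, a hosted API, or a human baseline is measured identically. The protocol name, items, and dimension labels here are hypothetical placeholders, not the framework's actual specification.

```python
# A minimal sketch of inference-agnostic scoring: the harness depends only
# on an abstract answer() callable; how the answer is produced is opaque.
# InferenceProvider, ITEMS, and the dimensions are illustrative assumptions.
from typing import Protocol

class InferenceProvider(Protocol):
    def answer(self, prompt: str) -> str:
        """Return the system's answer; provider internals are invisible."""
        ...

ITEMS = {
    "causal":  [("If the rooster is silenced, does the sun still rise?", "yes")],
    "spatial": [("Facing north, turn right twice. Which way do you face?", "south")],
}

def score(provider: InferenceProvider) -> dict[str, float]:
    """Per-dimension accuracy; no reference to hardware or API details."""
    results = {}
    for dim, items in ITEMS.items():
        correct = sum(
            expected in provider.answer(prompt).lower()
            for prompt, expected in items
        )
        results[dim] = correct / len(items)
    return results

class EchoBaseline:
    def answer(self, prompt: str) -> str:
        return "yes"  # trivial baseline, used only to exercise the harness

print(score(EchoBaseline()))  # {'causal': 1.0, 'spatial': 0.0}
```

Because the harness never touches provider internals, swapping one inference backend for another changes nothing about what is measured; that separation is the point of the "inference-agnostic" framing.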
The Measurement Crisis: Saturation, Goodhart’s Law, and the End of AI Leaderboards
The AI evaluation ecosystem is in crisis. Frontier models now exceed 90% accuracy on MMLU, 95% on HumanEval, and 93% on HellaSwag — scores that were considered unattainable three years ago. This saturation is not evidence of intelligence; it is evidence that our instruments have failed. We argue that three convergent forces have rendered current AI leaderboards meaningless: (1) benchmark satura...
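The statistical core of the saturation argument is easy to verify. Under a binomial approximation, the standard error of an accuracy score is sqrt(p(1-p)/n), so as accuracy approaches the ceiling on a fixed item pool, the benchmark's resolving power collapses. The sketch below runs the arithmetic; the accuracies are the figures quoted above, and the item counts are approximate published sizes.

```python
# Why saturation kills discrimination: near the ceiling, the gap between
# frontier models shrinks toward the sampling noise of the benchmark itself.
# Item counts are approximate; the binomial SE is a standard approximation.
from math import sqrt

def standard_error(p: float, n: int) -> float:
    """Standard error of an accuracy estimate over n independent items."""
    return sqrt(p * (1 - p) / n)

for name, p, n in [("MMLU", 0.90, 14_000), ("HumanEval", 0.95, 164)]:
    se = standard_error(p, n)
    # Two scores must differ by roughly 1.96 * sqrt(2) * SE to separate
    # at the 95% level; call that the benchmark's resolution.
    resolution = 1.96 * sqrt(2) * se
    print(f"{name}: SE = {se:.4f}, ~95% resolution = {resolution * 100:.1f} pts")
# HumanEval at 95% accuracy on 164 items cannot distinguish models less
# than ~4.7 points apart; at that point the leaderboard ordering is noise.
```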
The Meta-Meta-Analysis: A Systematic Map of What 200 AI Benchmark Studies Actually Measured
We present a meta-meta-analysis of 217 benchmark evaluation studies published between 2020 and 2026, examining not the benchmarks themselves but the systematic reviews that assess them. Our coverage matrix reveals a profound structural bias: 78.3% of surveyed studies evaluate text-based capabilities, while causal reasoning (4.1%), embodied intelligence (1.8%), and social cognition (0.9%) remain...
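For readers who want the mechanics rather than the findings, the coverage computation is simple to reproduce in miniature. The sketch below tallies capability tags over a handful of invented placeholder records; the actual analysis runs the same tally over the 217-study corpus, which is not reproduced here.

```python
# A minimal sketch of the coverage matrix described above: count which
# capability families each surveyed review evaluates, then report each
# family's share of all reviews. Records and tags are invented placeholders.
from collections import Counter

# Each record: (study_id, set of capability families it evaluates)
SURVEYED = [
    ("rev-001", {"text"}),
    ("rev-002", {"text", "code"}),
    ("rev-003", {"causal"}),
    ("rev-004", {"text"}),
    ("rev-005", {"embodied"}),
]

def coverage(studies) -> dict[str, float]:
    """Share of surveyed reviews covering each capability family."""
    counts = Counter(tag for _, tags in studies for tag in tags)
    total = len(studies)
    return {tag: counts[tag] / total for tag in sorted(counts)}

for tag, share in coverage(SURVEYED).items():
    print(f"{tag:>8}: {share:.1%} of surveyed reviews")
# Over the real corpus, the same tally yields the skew reported above:
# text-based capabilities dominate while causal, embodied, and social
# evaluation are nearly absent.
```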