UIB Benchmark API
Universal Intelligence Benchmark — 8 dimensions, any model, via API
Base URL
https://hub.stabilarity.com/api/v1/uib/
Authentication: send your API key in the X-API-Key request header. Keys are free.
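As a rough illustration of the authentication scheme, the following Python sketch builds an authenticated request without sending it. It assumes only what this page states: the base URL above and the X-API-Key header. The helper name `build_request` and the placeholder key are hypothetical and not part of the API.

```python
import urllib.request

# Base URL as given on this page.
BASE_URL = "https://hub.stabilarity.com/api/v1/uib/"


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for a UIB endpoint with the X-API-Key header set."""
    url = BASE_URL + path.lstrip("/")
    return urllib.request.Request(url, headers={"X-API-Key": api_key}, method="GET")


# Hypothetical placeholder key; substitute your own.
req = build_request("status", "YOUR_STABILARITY_KEY")
print(req.full_url)
# To actually send it: urllib.request.urlopen(req).read()
```

The request object is only constructed here, so the snippet runs without network access; uncommenting the `urlopen` call performs the real request.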
Endpoints
GET /v1/uib/status
Service status — total benchmark runs and number of distinct models evaluated.
POST /v1/uib/run
Run the full UIB benchmark against any model. You supply your own model provider API key. Returns the composite score and a per-dimension breakdown.
| Field | Type | Description |
|---|---|---|
| model | string | Model identifier (e.g. "gpt-4", "claude-sonnet-4-5", "llama-3-70b") |
| api_key | string | Your model provider API key (OpenAI, Anthropic, etc.) |
| dimensions | array | Optional. Subset of dimensions to run. Default: all 8. |
| api_base | string | Optional. Custom API base URL for local/self-hosted models (e.g. http://localhost:11434/v1) |
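The request body for the run endpoint can be sketched in Python as below. The four field names come from the table above. The set of dimension identifiers is an assumption taken from the keys of the sample leaderboard response on this page (only "causal" and "social" appear as request values in the examples), so treat GET /v1/uib/dimensions as the authoritative list. The function name `build_run_payload` is hypothetical.

```python
import json

# Assumed identifiers, copied from the sample leaderboard response on this page.
KNOWN_DIMENSIONS = {
    "causal", "embodied", "multimodal", "temporal",
    "social", "tool_creation", "transfer", "efficiency",
}


def build_run_payload(model, api_key, dimensions=None, api_base=None):
    """Return the JSON body for POST /v1/uib/run, omitting unset optional fields."""
    payload = {"model": model, "api_key": api_key}
    if dimensions is not None:
        unknown = sorted(set(dimensions) - KNOWN_DIMENSIONS)
        if unknown:
            raise ValueError(f"unknown dimensions: {unknown}")
        payload["dimensions"] = list(dimensions)
    if api_base is not None:
        payload["api_base"] = api_base
    return json.dumps(payload)


body = build_run_payload("claude-sonnet-4-5", "YOUR_PROVIDER_KEY",
                         dimensions=["causal", "social"])
print(body)
```

Leaving out `dimensions` relies on the documented default of running all 8; the client-side check only catches typos early and is not something the API is known to require.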
GET /v1/uib/leaderboard?limit=20&benchmark_type=uib
Current leaderboard ranked by average composite score. Filter by benchmark_type.
GET /v1/uib/results?limit=50&offset=0
Paginated list of all benchmark run results. Optional: benchmark_type filter.
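One way to walk the paginated results is to advance `offset` in steps of `limit` until a short page comes back. The sketch below makes several assumptions that this page does not confirm: that the endpoint returns a JSON array, that `offset` counts items to skip, and that a page shorter than `limit` marks the end. The names `iter_results` and `fake_fetch`, and the `run_id` field in the stand-in data, are hypothetical; the stand-in replaces a real HTTP call so the example runs offline.

```python
from typing import Callable, Iterator, List


def iter_results(fetch_page: Callable[[int, int], List[dict]],
                 limit: int = 50) -> Iterator[dict]:
    """Yield every result by walking offset forward in steps of `limit`."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        # Assumption: an empty or short page means there is nothing left.
        if len(page) < limit:
            return
        offset += limit


# In-memory stand-in for GET /v1/uib/results?limit=...&offset=...
FAKE_RUNS = [{"run_id": i} for i in range(120)]


def fake_fetch(limit: int, offset: int) -> List[dict]:
    return FAKE_RUNS[offset:offset + limit]


all_runs = list(iter_results(fake_fetch, limit=50))
print(len(all_runs))  # 120
```

Passing the fetch function in as an argument keeps the paging logic separate from the HTTP layer, so the same loop would work for /reports, which takes the same parameters.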
GET /v1/uib/reports
Paginated list of all individual benchmark reports. Takes the same parameters as /results (limit, offset, and the optional benchmark_type filter).
GET /v1/uib/reports/{run_id}
Full detail for a single benchmark run — all dimension scores, task-level results, and metadata.
GET /v1/uib/dimensions
List all 8 UIB dimensions with task counts and descriptions.
The 8 Dimensions
Causal Reasoning: Pearl’s causal hierarchy, intervention vs observation, confound detection. 5 tasks.
Embodied Intelligence: Physical reasoning, robot control, manipulation, navigation. 5 tasks.
Multimodal Synthesis: Cross-modal reasoning, sensor fusion, modality transfer. 5 tasks.
Temporal & Planning: Long-horizon planning, scheduling, trend analysis, temporal reasoning. 5 tasks.
Social Cognition: Theory of mind, negotiation, sarcasm, team dynamics. 5 tasks.
Tool Creation: Algorithm design, DSL creation, self-improvement, optimization. 5 tasks.
Domain Transfer: Cross-domain analogy, concept mapping, abstraction. 5 tasks.
Resource Efficiency: Compression theory, cost-normalized intelligence, speed prior. 3 tasks.
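The task counts listed above add up as follows. This small Python sketch only restates the numbers from the list; the variable names are illustrative.

```python
# Task counts per dimension, as listed on this page.
TASKS_PER_DIMENSION = {
    "Causal Reasoning": 5,
    "Embodied Intelligence": 5,
    "Multimodal Synthesis": 5,
    "Temporal & Planning": 5,
    "Social Cognition": 5,
    "Tool Creation": 5,
    "Domain Transfer": 5,
    "Resource Efficiency": 3,
}

total_tasks = sum(TASKS_PER_DIMENSION.values())
print(len(TASKS_PER_DIMENSION), total_tasks)  # 8 38
```

A full run therefore covers 38 tasks, while a run restricted to, say, the causal and social dimensions covers 10.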
Examples
Run a full benchmark
# Run UIB on GPT-4
curl -X POST -H "X-API-Key: YOUR_STABILARITY_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4","api_key":"YOUR_OPENAI_KEY"}' \
  https://hub.stabilarity.com/api/v1/uib/run
Run specific dimensions only
# Test only the causal and social dimensions
curl -X POST -H "X-API-Key: YOUR_STABILARITY_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4-5","api_key":"YOUR_ANTHROPIC_KEY","dimensions":["causal","social"]}' \
  https://hub.stabilarity.com/api/v1/uib/run
Benchmark a local model (Ollama, vLLM, etc.)
# Point to your local OpenAI-compatible endpoint
curl -X POST -H "X-API-Key: YOUR_STABILARITY_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"llama-3-70b","api_key":"none","api_base":"http://localhost:11434/v1"}' \
  https://hub.stabilarity.com/api/v1/uib/run
Get leaderboard
curl -H "X-API-Key: YOUR_STABILARITY_KEY" \
  "https://hub.stabilarity.com/api/v1/uib/leaderboard?limit=10"
Example response (leaderboard)
[
  {
    "model": "deepseek-v4",
    "avg_score": 72.8,
    "runs": 3,
    "dimensions": {
      "causal": 78.2,
      "embodied": 61.4,
      "multimodal": 69.1,
      "temporal": 74.5,
      "social": 71.3,
      "tool_creation": 82.6,
      "transfer": 68.9,
      "efficiency": 76.4
    }
  }
]
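A leaderboard entry of this shape can be summarized in a few lines of Python. The sketch below embeds the sample response from this page and reports the strongest and weakest dimensions plus the unweighted mean of the eight dimension scores. In this sample that mean equals `avg_score` (72.8), but the page does not state how the service computes the composite, so the match should not be read as the official formula. The function name `summarize` is hypothetical.

```python
import json

# Sample leaderboard response, copied from this page.
SAMPLE = """
[{"model": "deepseek-v4", "avg_score": 72.8, "runs": 3,
  "dimensions": {"causal": 78.2, "embodied": 61.4, "multimodal": 69.1,
                 "temporal": 74.5, "social": 71.3, "tool_creation": 82.6,
                 "transfer": 68.9, "efficiency": 76.4}}]
"""


def summarize(entry: dict) -> dict:
    """Return the model name, the mean dimension score, and the extremes."""
    dims = entry["dimensions"]
    mean = sum(dims.values()) / len(dims)
    return {
        "model": entry["model"],
        "dimension_mean": round(mean, 1),
        "strongest": max(dims, key=dims.get),
        "weakest": min(dims, key=dims.get),
    }


summary = summarize(json.loads(SAMPLE)[0])
print(summary)
# {'model': 'deepseek-v4', 'dimension_mean': 72.8, 'strongest': 'tool_creation', 'weakest': 'embodied'}
```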
Links
UIB Benchmark Tool (interactive) · Full API Gateway docs · Source code on GitHub · Research article: UIB Composite Score
Open Source & Contributions Welcome
The UIB benchmark is open source. Found a bug or want to contribute? File an issue or PR on GitHub, or email contact@stabilarity.com.