UIB Benchmark API

Universal Intelligence Benchmark — 8 dimensions, any model, via API

Base URL: https://hub.stabilarity.com/api/v1/uib/
Authentication: send your key in the X-API-Key header on every request. Keys are free.
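As a minimal sketch of the authentication scheme, the Python snippet below builds (but does not send) a request that carries the X-API-Key header. The helper name build_request and the placeholder key are illustrative assumptions, not part of the API.

```python
import urllib.request

BASE_URL = "https://hub.stabilarity.com/api/v1/uib/"


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for a path under the base URL, with the API key header.

    Hypothetical helper for illustration; the request is not sent here.
    """
    return urllib.request.Request(
        BASE_URL + path.lstrip("/"),
        headers={"X-API-Key": api_key},
        method="GET",
    )


req = build_request("status", "YOUR_STABILARITY_KEY")
print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would perform the actual call.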

Endpoints

GET /v1/uib/status

Service status — total benchmark runs and number of distinct models evaluated.

POST /v1/uib/run

Runs the UIB benchmark against any model, using a model provider API key that you supply. Returns the composite score and a per-dimension breakdown.

Field	Type	Description
`model`	string	Model identifier (e.g. “gpt-4”, “claude-sonnet-4-5”, “llama-3-70b”)
`api_key`	string	Your model provider API key (OpenAI, Anthropic, etc.)
`dimensions`	array	Optional — subset of dimensions to run. Default: all 8.
`api_base`	string	Optional — custom API base URL for local/self-hosted models (e.g. http://localhost:11434/v1)
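The request body described in the table can be assembled as in the following sketch. It assumes that optional fields may simply be omitted when unused; the function name build_run_request and its parameter names are hypothetical, and the request is built without being sent.

```python
import json
import urllib.request
from typing import List, Optional

RUN_URL = "https://hub.stabilarity.com/api/v1/uib/run"


def build_run_request(
    stabilarity_key: str,
    model: str,
    provider_key: str,
    dimensions: Optional[List[str]] = None,
    api_base: Optional[str] = None,
) -> urllib.request.Request:
    """Build a POST request for /run (hypothetical helper, not sent here)."""
    payload = {"model": model, "api_key": provider_key}
    # Assumption: optional fields are left out entirely when not used.
    if dimensions is not None:
        payload["dimensions"] = dimensions
    if api_base is not None:
        payload["api_base"] = api_base
    return urllib.request.Request(
        RUN_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-API-Key": stabilarity_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


run_req = build_run_request(
    "YOUR_STABILARITY_KEY", "gpt-4", "YOUR_OPENAI_KEY", dimensions=["causal", "social"]
)
print(run_req.data.decode("utf-8"))
```

Keeping the Stabilarity key in the header and the provider key in the body mirrors the split between the two credentials in the field table.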

GET /v1/uib/leaderboard?limit=20&benchmark_type=uib

Current leaderboard ranked by average composite score. Filter by benchmark_type.

GET /v1/uib/results?limit=50&offset=0

Paginated list of all benchmark run results. Optional: benchmark_type filter.
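A short sketch of how the limit and offset parameters might be combined to walk through pages. It assumes offset counts skipped rows rather than pages, which the text above does not state explicitly; results_page_url is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import urlencode

RESULTS_URL = "https://hub.stabilarity.com/api/v1/uib/results"


def results_page_url(
    page: int, page_size: int = 50, benchmark_type: Optional[str] = None
) -> str:
    """Return the URL for a zero-based page of results.

    Assumption: offset is a row offset, so page N starts at N * page_size.
    """
    params = {"limit": page_size, "offset": page * page_size}
    if benchmark_type is not None:
        params["benchmark_type"] = benchmark_type
    return RESULTS_URL + "?" + urlencode(params)


for page in range(3):
    print(results_page_url(page))
```

A client would typically stop requesting pages once a response returns fewer than page_size entries.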

GET /v1/uib/reports

Paginated list of all individual benchmark reports. Accepts the same parameters as /results: limit, offset, and the optional benchmark_type filter.

GET /v1/uib/reports/{run_id}

Full detail for a single benchmark run — all dimension scores, task-level results, and metadata.
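To illustrate the path template, the sketch below substitutes a run identifier into the URL. The format of run_id is not documented here, so the value is percent-encoded defensively; report_url and the sample identifier are hypothetical.

```python
from urllib.parse import quote

REPORTS_URL = "https://hub.stabilarity.com/api/v1/uib/reports"


def report_url(run_id: str) -> str:
    """Return the detail URL for one run, percent-encoding the identifier."""
    # safe="" also encodes "/" so the id cannot add extra path segments.
    return f"{REPORTS_URL}/{quote(run_id, safe='')}"


print(report_url("example-run-id"))
```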

GET /v1/uib/dimensions

List all 8 UIB dimensions with task counts and descriptions.

The 8 Dimensions

Causal Reasoning

Pearl’s causal hierarchy, intervention vs observation, confound detection. 5 tasks.

Embodied Intelligence

Physical reasoning, robot control, manipulation, navigation. 5 tasks.

Multimodal Synthesis

Cross-modal reasoning, sensor fusion, modality transfer. 5 tasks.

Temporal & Planning

Long-horizon planning, scheduling, trend analysis, temporal reasoning. 5 tasks.

Social Cognition

Theory of mind, negotiation, sarcasm, team dynamics. 5 tasks.

Tool Creation

Algorithm design, DSL creation, self-improvement, optimization. 5 tasks.

Domain Transfer

Cross-domain analogy, concept mapping, abstraction. 5 tasks.

Resource Efficiency

Compression theory, cost-normalized intelligence, speed prior. 3 tasks.
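The task counts listed above add up to 38 tasks for a full run. The sketch below records them in a lookup table and checks a requested subset before it is sent in the dimensions field. The short identifiers are taken from the example requests and the sample leaderboard response on this page; their mapping to the dimension names is inferred, and validate_dimensions is a hypothetical helper.

```python
# Task counts per dimension, keyed by the short identifiers used in the
# examples on this page (mapping to the full names is an inference).
DIMENSION_TASKS = {
    "causal": 5,         # Causal Reasoning
    "embodied": 5,       # Embodied Intelligence
    "multimodal": 5,     # Multimodal Synthesis
    "temporal": 5,       # Temporal & Planning
    "social": 5,         # Social Cognition
    "tool_creation": 5,  # Tool Creation
    "transfer": 5,       # Domain Transfer
    "efficiency": 3,     # Resource Efficiency
}


def validate_dimensions(requested):
    """Return the number of tasks a subset would run; reject unknown names."""
    unknown = [d for d in requested if d not in DIMENSION_TASKS]
    if unknown:
        raise ValueError(f"unknown dimensions: {unknown}")
    return sum(DIMENSION_TASKS[d] for d in requested)


print(sum(DIMENSION_TASKS.values()))            # total tasks in a full run
print(validate_dimensions(["causal", "social"]))
```

The authoritative list of identifiers is whatever the /dimensions endpoint returns, so a real client would fetch it rather than hard-code the table.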

Examples

Run a full benchmark

# Run UIB on GPT-4
curl -X POST -H "X-API-Key: YOUR_STABILARITY_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4","api_key":"YOUR_OPENAI_KEY"}' \
  https://hub.stabilarity.com/api/v1/uib/run

Run specific dimensions only

# Test only causal and social dimensions
curl -X POST -H "X-API-Key: YOUR_STABILARITY_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4-5","api_key":"YOUR_KEY","dimensions":["causal","social"]}' \
  https://hub.stabilarity.com/api/v1/uib/run

Benchmark a local model (Ollama, vLLM, etc.)

# Point to your local OpenAI-compatible endpoint
curl -X POST -H "X-API-Key: YOUR_STABILARITY_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"llama-3-70b","api_key":"none","api_base":"http://localhost:11434/v1"}' \
  https://hub.stabilarity.com/api/v1/uib/run

Get leaderboard

curl -H "X-API-Key: YOUR_KEY" \
  "https://hub.stabilarity.com/api/v1/uib/leaderboard?limit=10"

Example response (leaderboard)

[
  {
    "model": "deepseek-v4",
    "avg_score": 72.8,
    "runs": 3,
    "dimensions": {
      "causal": 78.2,
      "embodied": 61.4,
      "multimodal": 69.1,
      "temporal": 74.5,
      "social": 71.3,
      "tool_creation": 82.6,
      "transfer": 68.9,
      "efficiency": 76.4
    }
  }
]
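One way to consume a leaderboard entry is sketched below, using the sample response embedded as a string. In this sample, avg_score equals the unweighted mean of the eight dimension values (72.8); whether the composite is always computed that way is not stated on this page, so treat it as an observation about the example only. The summarize helper is hypothetical.

```python
import json

# The sample leaderboard response from this page, embedded for illustration.
SAMPLE = """
[{"model": "deepseek-v4", "avg_score": 72.8, "runs": 3,
  "dimensions": {"causal": 78.2, "embodied": 61.4, "multimodal": 69.1,
                 "temporal": 74.5, "social": 71.3, "tool_creation": 82.6,
                 "transfer": 68.9, "efficiency": 76.4}}]
"""


def summarize(entry: dict) -> dict:
    """Pick out the strongest and weakest dimension and the plain mean."""
    dims = entry["dimensions"]
    return {
        "model": entry["model"],
        "strongest": max(dims, key=dims.get),
        "weakest": min(dims, key=dims.get),
        "dimension_mean": round(sum(dims.values()) / len(dims), 1),
    }


leaderboard = json.loads(SAMPLE)
summary = summarize(leaderboard[0])
print(summary)
```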

Links

UIB Benchmark Tool (interactive) · Full API Gateway docs · Source code on GitHub · Research article: UIB Composite Score

Open Source & Contributions Welcome

The UIB benchmark is open source. Found a bug or want to contribute? File an issue or PR on GitHub, or email contact@stabilarity.com.