Capturing AI Requirements: Beyond Functional Specifications

Posted on February 22, 2026 by Oleh Ivchenko






Author: Oleh Ivchenko

Series: Spec-Driven AI Development

Abstract

Traditional requirements engineering approaches, developed for deterministic software systems, prove inadequate when applied to AI systems characterized by learning, uncertainty, and emergent behavior. This article examines the unique challenges of capturing requirements for AI systems and proposes a structured framework that extends beyond conventional functional specifications. We explore behavioral specifications, performance requirements, safety constraints, fairness criteria, and stakeholder alignment techniques specific to AI development. Through comparison with experiment-driven and data-driven approaches, we demonstrate how specification-first methodologies enhance system reliability, regulatory compliance, and stakeholder trust while addressing the inherent non-determinism of machine learning systems.

1. Introduction: The Requirements Challenge in AI Systems

Requirements engineering (RE) forms the foundation of systematic software development, establishing what a system should do before determining how it will do it. However, AI systems introduce fundamental challenges that strain traditional RE frameworks. Unlike conventional software with deterministic input-output mappings, AI systems exhibit probabilistic behavior, learn from data, and may drift over time, making precise functional specifications elusive.

The experiment-driven approach dominant in AI research often bypasses formal requirements engineering entirely, iterating on model architectures and hyperparameters until acceptable performance metrics are achieved. While effective for research contexts, this methodology creates significant risks in enterprise deployments where safety, fairness, regulatory compliance, and long-term maintainability are paramount.

Data-driven development similarly prioritizes available datasets over stakeholder needs, often resulting in systems optimized for patterns in historical data rather than actual business requirements. Model-centric approaches focus on architectural innovations without adequate specification of acceptable system behavior, leading to deployment failures when models encounter out-of-distribution scenarios.

Spec-driven AI development addresses these limitations by establishing clear, verifiable requirements before model selection or training begins. This article presents a comprehensive framework for capturing AI requirements that acknowledges uncertainty while maintaining rigor, enabling organizations to build AI systems that are both innovative and trustworthy.

2. Unique Challenges in AI Requirements Engineering

2.1 Inherent Uncertainty and Non-Determinism

Traditional requirements assume deterministic behavior: given input X, the system produces output Y. AI systems operate probabilistically, producing distributions over possible outputs. This fundamental difference requires rethinking how we specify correctness. Rather than asserting “the system shall classify all images correctly,” specifications must accommodate inherent error rates, confidence thresholds, and uncertainty quantification.

The specification must distinguish between aleatoric uncertainty (irreducible noise in data) and epistemic uncertainty (model knowledge gaps), as these require different mitigation strategies. Requirements should specify acceptable uncertainty levels, calibration requirements, and fallback behaviors when confidence falls below thresholds.
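The fallback behavior described above can be made concrete. The sketch below is illustrative only: the 0.80 threshold, label names, and routing function are assumed values standing in for what an actual requirements document would specify.

```python
# Sketch of a confidence-threshold fallback a specification might require.
# The threshold and labels are hypothetical, not from a real system.
CONFIDENCE_THRESHOLD = 0.80  # assumed minimum acceptable confidence

def route_prediction(probs):
    """Return the predicted class label, or defer when confidence is too low."""
    confidence = max(probs)
    if confidence < CONFIDENCE_THRESHOLD:
        return "DEFER_TO_HUMAN"  # fallback behavior mandated by the spec
    return f"CLASS_{probs.index(confidence)}"

print(route_prediction([0.55, 0.30, 0.15]))  # low confidence -> defer
print(route_prediction([0.92, 0.05, 0.03]))  # confident -> CLASS_0
```

The key point is that the deferral rule is itself a requirement, verifiable independently of model accuracy.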

2.2 Learning and Adaptation

AI systems learn from data, meaning their behavior evolves. Continual learning systems update in production, potentially diverging from original specifications. Requirements must specify not just initial behavior but also acceptable learning trajectories, drift detection mechanisms, and retraining triggers.

Unlike static software where updates are controlled, online learning systems may adapt to adversarial inputs or biased feedback loops. Specifications must establish guardrails: what the system may learn, from which data sources, under what conditions, and with what human oversight.
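Such guardrails can be expressed as executable policy. The following is a minimal sketch under assumed conventions: the source names, the `human_approved` flag, and the record shape are all invented for illustration.

```python
# Sketch of learning guardrails: online updates accepted only from approved
# data sources and with human sign-off. Source names and fields are assumptions.
ALLOWED_SOURCES = {"verified_outcomes", "audited_feedback"}

def filter_training_batch(batch):
    """Keep only records the specification permits the system to learn from."""
    return [r for r in batch
            if r["source"] in ALLOWED_SOURCES and r.get("human_approved", False)]

batch = [
    {"id": 1, "source": "verified_outcomes", "human_approved": True},
    {"id": 2, "source": "web_scrape", "human_approved": True},         # disallowed source
    {"id": 3, "source": "audited_feedback", "human_approved": False},  # no sign-off
]
print([r["id"] for r in filter_training_batch(batch)])  # only record 1 passes
```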

2.3 Emergent Behavior and Opacity

Deep learning models exhibit emergent behaviors not explicitly programmed, complicating requirements specification. Traditional functional requirements describe intended behaviors, but AI systems may develop capabilities beyond their training objectives, including unintended correlations and shortcuts.

Model opacity presents additional challenges. Explainability requirements must balance interpretability needs with model performance, as interpretable models often sacrifice accuracy. Requirements should specify acceptable trade-offs between accuracy and interpretability, contextual to the application domain.

2.4 Data Dependency

AI system behavior depends fundamentally on training data quality and distribution. Data requirements form a critical component of AI specifications, yet are often underspecified in experiment-driven approaches. Requirements must address data quality, representativeness, bias mitigation, and distribution shift detection.

Unlike conventional software where requirements focus on functionality, AI requirements must specify data provenance, labeling quality, class balance, and coverage of edge cases. Data-driven approaches that work backward from available data fail to address cases where required data doesn’t exist or is prohibitively expensive to acquire.

graph TD
    A[Traditional Software RE] -->|Deterministic| B[Functional Specs]
    A -->|Controlled Updates| C[Change Management]
    A -->|Transparent Logic| D[Verification]
    
    E[AI Systems RE] -->|Probabilistic| F[Behavioral + Performance Specs]
    E -->|Continuous Learning| G[Adaptation Boundaries]
    E -->|Emergent Behavior| H[Safety Constraints]
    E -->|Data Dependent| I[Data Requirements]
    
    F --> J[Spec-Driven AI Framework]
    G --> J
    H --> J
    I --> J
    
    K[Experiment-Driven] -.->|No Formal Specs| L[Metrics Only]
    M[Data-Driven] -.->|Dataset First| N[Requirements Inferred]
    
    style J fill:#90EE90
    style L fill:#FFB6C1
    style N fill:#FFB6C1

3. Behavioral Specifications vs Performance Specifications

3.1 Performance Specifications

Performance specifications quantify system capabilities through metrics. Common AI performance metrics include accuracy, precision, recall, F1 score, AUC-ROC, and domain-specific measures. While necessary, performance specifications alone prove insufficient.

The experiment-driven paradigm typically stops at performance specifications: “achieve 95% accuracy on the test set.” However, high aggregate metrics can mask critical failures in subgroups, and test set performance may not generalize to production distributions.

Comprehensive performance specifications should include:

  • Aggregate metrics: Overall system performance on representative test sets
  • Subgroup metrics: Performance across demographic groups, edge cases, and rare scenarios
  • Robustness metrics: Performance degradation under adversarial attacks, distribution shift, and noise
  • Efficiency metrics: Latency, throughput, resource consumption, and scalability
  • Calibration metrics: Alignment between predicted confidence and actual accuracy
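The subgroup requirement in the list above lends itself to a direct check. This sketch uses toy data and an assumed 0.70 floor; a real specification would name the groups and thresholds explicitly.

```python
# Illustrative subgroup-metric check: the spec sets an accuracy floor for
# every subgroup, not just in aggregate. Data and threshold are toy values.
SUBGROUP_FLOOR = 0.70  # assumed minimum acceptable subgroup accuracy

def subgroup_accuracy(records):
    """Return accuracy per subgroup from (group, correct) pairs."""
    totals, hits = {}, {}
    for group, correct in records:
        totals[group] = totals.get(group, 0) + 1
        hits[group] = hits.get(group, 0) + int(correct)
    return {g: hits[g] / totals[g] for g in totals}

records = [("A", True), ("A", True), ("A", True), ("A", False),
           ("B", True), ("B", False), ("B", False)]
scores = subgroup_accuracy(records)
violations = [g for g, acc in scores.items() if acc < SUBGROUP_FLOOR]
print(scores)      # group A at 0.75 passes; group B at ~0.33 fails
print(violations)  # -> ['B']
```

Note that the aggregate accuracy here (4/7) says nothing about the failing subgroup, which is exactly the gap behavioral and subgroup requirements close.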

3.2 Behavioral Specifications

Behavioral specifications define how systems should act in specific scenarios, complementing aggregate metrics with concrete examples. Behavior-driven development (BDD) approaches specify expected inputs and outputs, but AI systems require probabilistic behavioral specifications.

Metamorphic testing provides one framework for behavioral specifications: defining relationships between input transformations and expected output transformations. For example, a sentiment classifier should produce similar outputs for paraphrased inputs, regardless of lexical variation.
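A metamorphic relation of this kind can be written as a reusable test. The classifier below is a deliberately trivial keyword stand-in, not a real model; only the relation itself reflects what the specification would state.

```python
# Metamorphic-relation sketch: paraphrased inputs should receive the same
# label. The keyword "classifier" is a hypothetical stand-in for a model.
POSITIVE_WORDS = {"good", "great", "excellent", "fine"}

def classify_sentiment(text):
    words = set(text.lower().split())
    return "positive" if words & POSITIVE_WORDS else "negative"

def metamorphic_paraphrase_test(model, text, paraphrase):
    """The relation: lexical variation must not flip the predicted label."""
    return model(text) == model(paraphrase)

print(metamorphic_paraphrase_test(classify_sentiment,
                                  "The service was great",
                                  "The service was excellent"))  # True
```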

Property-based testing offers another approach, specifying invariants the system must maintain:

  • Consistency: Similar inputs should produce similar outputs
  • Monotonicity: Increasing feature X should not decrease prediction Y (where causally justified)
  • Fairness properties: Predictions should not vary based on protected attributes when conditioned on legitimate factors
  • Causality constraints: Output changes should only result from causally relevant input changes
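The monotonicity invariant above can be checked mechanically. In this sketch, `score` is a toy monotone function standing in for a trained model; the grid of incomes and the fixed debt value are arbitrary illustration choices.

```python
# Property-invariant sketch: a monotonicity check over a scoring function.
# `score` is a hypothetical stand-in for a model; a spec would demand the
# property only for causally justified features (e.g. income in credit scoring).
def score(income, debt):
    """Toy monotone scorer standing in for a trained model."""
    return 0.5 * income - 0.3 * debt

def check_monotonic_in_income(model, debt, incomes):
    """Verify predictions never decrease as income increases."""
    preds = [model(i, debt) for i in sorted(incomes)]
    return all(a <= b for a, b in zip(preds, preds[1:]))

print(check_monotonic_in_income(score, debt=10.0, incomes=[20, 40, 60, 80]))  # True
```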

Behavioral specifications ground abstract performance metrics in concrete, verifiable scenarios. While experiment-driven approaches validate on aggregate test sets, behavioral testing uncovers systematic failures that aggregate metrics miss.

graph LR
    A[Requirements] --> B[Performance Specs]
    A --> C[Behavioral Specs]
    
    B --> D[Accuracy: 92%]
    B --> E[Latency: <100ms]
    B --> F[Fairness Metric: 0.05]
    
    C --> G[Scenario Tests]
    C --> H[Metamorphic Relations]
    C --> I[Property Invariants]
    
    G --> J[Edge Cases]
    G --> K[User Stories]
    
    D -.->|Can Miss| L[Subgroup Failures]
    E -.->|Can Miss| M[Tail Latencies]
    F -.->|Can Miss| N[Individual Fairness]
    
    J -->|Catches| L
    K -->|Catches| N
    
    style C fill:#90EE90
    style B fill:#FFD700

4. Safety and Fairness Requirements

4.1 Safety Specifications

Safety-critical AI systems require rigorous safety specifications beyond performance metrics. Safety requirements define unacceptable behaviors that the system must avoid, even at the cost of reduced performance.

ISO/PAS 21448 (SOTIF – Safety Of The Intended Functionality) provides a framework for specifying safety in AI systems, distinguishing between:

  • Known safe scenarios: Verified through testing and validation
  • Known unsafe scenarios: Explicitly prohibited or requiring human intervention
  • Unknown scenarios: Requiring uncertainty detection and conservative fallback

Safety specifications should include formal constraints using temporal logic or contracts. For example, an autonomous vehicle might specify: “The system shall not initiate lane changes when adjacent lane occupancy confidence < 95%.” Such specifications make safety boundaries explicit and verifiable.

Runtime monitoring requirements ensure systems detect safety violations during operation, triggering safe fallback modes. Unlike experiment-driven approaches that focus on average-case performance, spec-driven development prioritizes worst-case guarantees.
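The lane-change constraint quoted above illustrates how such a bound becomes a runtime guard. The 0.95 threshold comes from the example specification; the function signature and inputs are otherwise invented for illustration.

```python
# Runtime-monitoring sketch of the example constraint: the action is blocked
# unless occupancy confidence meets the specified bound. Only the 0.95
# threshold comes from the spec text; the rest is illustrative.
OCCUPANCY_CONFIDENCE_MIN = 0.95

def may_initiate_lane_change(adjacent_lane_clear, occupancy_confidence):
    """Conservative fallback: deny the action when the spec's bound is unmet."""
    return adjacent_lane_clear and occupancy_confidence >= OCCUPANCY_CONFIDENCE_MIN

print(may_initiate_lane_change(True, 0.97))  # True: constraint satisfied
print(may_initiate_lane_change(True, 0.90))  # False: confidence below bound
```

Encoding the constraint as a guard rather than a training objective is what gives the worst-case guarantee: the prohibited action cannot occur regardless of model behavior.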

4.2 Fairness Requirements

Fairness in AI systems lacks a universal definition, with multiple incompatible mathematical formulations. Requirements engineering must translate stakeholder fairness concerns into concrete, measurable specifications appropriate to the application context.

Common fairness specifications include:

  • Demographic parity: Equal acceptance rates across protected groups
  • Equalized odds: Equal true positive and false positive rates across groups
  • Calibration: Equal meaning of risk scores across groups
  • Individual fairness: Similar individuals receive similar treatment
  • Counterfactual fairness: Predictions would be identical in a counterfactual world where protected attributes differ
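Of the definitions above, demographic parity is the simplest to operationalize. The sketch below computes a parity ratio from toy (group, label, prediction) triples; the data and any acceptance band (e.g. the common four-fifths range) would come from the actual specification.

```python
# Sketch of a group-fairness check from the list above. Records are
# (group, y_true, y_pred) triples with toy values.
def approval_rates(records):
    totals, approved = {}, {}
    for group, _, pred in records:
        totals[group] = totals.get(group, 0) + 1
        approved[group] = approved.get(group, 0) + int(pred)
    return {g: approved[g] / totals[g] for g in totals}

def demographic_parity_ratio(records):
    """Min/max ratio of approval rates across groups (1.0 = perfect parity)."""
    rates = approval_rates(records).values()
    return min(rates) / max(rates)

records = [("A", 1, 1), ("A", 0, 1), ("A", 0, 0), ("A", 1, 1),
           ("B", 1, 1), ("B", 0, 0), ("B", 1, 0), ("B", 0, 0)]
print(round(demographic_parity_ratio(records), 3))  # 0.25 / 0.75 ratio
```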

Impossibility results show these definitions conflict, requiring stakeholders to choose appropriate specifications for their context. Data-driven approaches may optimize for fairness metrics without understanding their normative implications, while spec-driven development forces explicit stakeholder deliberation on fairness trade-offs.

graph TD
    A[Safety & Fairness Requirements] --> B[Safety Specs]
    A --> C[Fairness Specs]
    
    B --> D[Forbidden Behaviors]
    B --> E[Performance Envelopes]
    B --> F[Runtime Monitoring]
    B --> G[Fallback Modes]
    
    C --> H[Group Fairness Metrics]
    C --> I[Individual Fairness]
    C --> J[Causality Constraints]
    
    D --> K[Formal Verification]
    E --> K
    F --> L[Runtime Assurance]
    G --> L
    
    H --> M[Stakeholder Alignment]
    I --> M
    J --> M
    
    N[Experiment-Driven] -.->|Ignores| D
    N -.->|Optimizes| O[Aggregate Metrics Only]
    
    P[Spec-Driven] -->|Enforces| K
    P -->|Requires| M
    
    style P fill:#90EE90
    style N fill:#FFB6C1

5. Stakeholder Alignment Techniques

5.1 The Challenge of Diverse Stakeholders

AI systems impact multiple stakeholders with potentially conflicting interests: end users, developers, business owners, regulators, and affected third parties. Traditional requirements engineering focuses on paying customers, but AI systems’ societal impact demands broader stakeholder consideration.

Experiment-driven development typically prioritizes researcher or developer preferences, optimizing for publishable results or technical elegance rather than stakeholder needs. Data-driven approaches may inadvertently encode the preferences of historical decision-makers embedded in training data, perpetuating rather than interrogating past practices.

5.2 Participatory Requirements Engineering

Participatory design methods engage diverse stakeholders in requirements elicitation, making implicit values explicit. Techniques include:

  • Value-sensitive design: Structured methods for identifying stakeholder values and translating them into technical requirements
  • Speculative scenarios: Concrete narratives exploring system implications across diverse contexts
  • Contestable AI design: Requirements that enable users to challenge and contest AI decisions
  • Co-design workshops: Collaborative sessions where stakeholders jointly define acceptable system behavior

5.3 Requirements Traceability and Documentation

Model cards and datasheets provide structured documentation formats for communicating AI system capabilities and limitations. However, these address post-hoc documentation rather than prospective requirements.

Spec-driven development requires bidirectional traceability from stakeholder concerns to technical specifications to implementation artifacts. This enables:

  • Verification that all stakeholder requirements are addressed
  • Impact analysis when requirements change
  • Accountability by linking system behaviors to responsible parties
  • Regulatory compliance demonstration through documented requirement chains
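A traceability record can be as simple as a linked structure over which these checks run. The requirement IDs, stakeholder names, and test IDs below are invented purely to show the shape.

```python
# Minimal traceability sketch: each requirement links a stakeholder concern
# to verification artifacts. All IDs here are invented for illustration.
trace = [
    {"req": "FR-12", "stakeholder": "regulator",
     "concern": "fair lending", "tests": ["T-31", "T-32"]},
    {"req": "NFR-04", "stakeholder": "end user",
     "concern": "explanation of denial", "tests": []},
]

def uncovered_requirements(matrix):
    """Requirements with no linked verification artifact (an audit gap)."""
    return [r["req"] for r in matrix if not r["tests"]]

print(uncovered_requirements(trace))  # -> ['NFR-04']
```

Running such a query at review time makes "verification that all stakeholder requirements are addressed" a mechanical check rather than a manual audit.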

graph TD
    A[Stakeholder Groups] --> B[End Users]
    A --> C[Business Owners]
    A --> D[Developers]
    A --> E[Regulators]
    A --> F[Affected Third Parties]
    
    B --> G[Participatory RE Methods]
    C --> G
    D --> G
    E --> G
    F --> G
    
    G --> H[Value-Sensitive Design]
    G --> I[Co-Design Workshops]
    G --> J[Speculative Scenarios]
    
    H --> K[Requirements Document]
    I --> K
    J --> K
    
    K --> L[Behavioral Specs]
    K --> M[Performance Specs]
    K --> N[Safety Constraints]
    K --> O[Fairness Criteria]
    
    L --> P[Traceability Matrix]
    M --> P
    N --> P
    O --> P
    
    P --> Q[Implementation]
    P --> R[Verification]
    P --> S[Compliance Audit]
    
    T[Experiment-Driven] -.->|Skips| G
    T -.->|No| P
    
    style G fill:#90EE90
    style P fill:#90EE90
    style T fill:#FFB6C1

6. Practical Framework for AI Requirements Specification

6.1 Structured Requirements Template

A comprehensive AI requirements specification should include the following sections, informed by recent research on AI-specific requirements patterns:

1. Context and Scope

  • Problem statement and business objectives
  • Stakeholder identification and needs analysis
  • System boundaries and interfaces
  • Regulatory and ethical constraints

2. Data Requirements (following datasheet methodology)

  • Data sources and provenance
  • Quantity, quality, and representativeness criteria
  • Labeling protocols and quality assurance
  • Privacy and security constraints
  • Bias mitigation requirements

3. Functional Requirements

  • Primary tasks and capabilities
  • Input/output specifications with probability distributions
  • Integration requirements with existing systems

4. Performance Requirements

  • Aggregate metrics with minimum acceptable thresholds
  • Subgroup performance requirements
  • Robustness and reliability criteria
  • Efficiency and scalability targets

5. Behavioral Requirements

  • Scenario-based test cases
  • Metamorphic relations and property invariants
  • Edge cases and boundary conditions

6. Safety Requirements

  • Prohibited behaviors and failure modes
  • Performance envelope boundaries
  • Runtime monitoring specifications
  • Fallback and degradation strategies

7. Fairness Requirements

  • Protected attributes and sensitive subgroups
  • Chosen fairness metrics with justification
  • Acceptable performance trade-offs

8. Uncertainty and Adaptation Requirements

  • Confidence calibration requirements
  • Out-of-distribution detection mechanisms
  • Learning boundaries and update protocols
  • Drift detection and retraining triggers

9. Explainability and Transparency Requirements

  • Interpretability needs by stakeholder group
  • Explanation formats and interfaces
  • Auditability and logging requirements

10. Validation and Verification Criteria

  • Testing methodology and acceptance criteria
  • Verification procedures for each requirement
  • Continuous monitoring in production

6.2 Requirements Prioritization

AI requirements often conflict: accuracy vs. interpretability, performance vs. fairness, efficiency vs. robustness. Spec-driven development makes these trade-offs explicit through structured prioritization.

The MoSCoW method (Must have, Should have, Could have, Won’t have) can be adapted for AI systems, with safety and ethical requirements typically classified as “Must have” regardless of performance trade-offs.
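This adaptation can be encoded directly, as a sketch: the category names and the rule that safety and fairness requirements cannot be downgraded are assumptions reflecting the policy described above.

```python
# Sketch of MoSCoW prioritization with the safety override described above:
# safety/ethics requirements are forced to MUST regardless of requested tier.
from enum import Enum

class Priority(Enum):
    MUST = 1
    SHOULD = 2
    COULD = 3
    WONT = 4

def assign_priority(requested, category):
    """Safety and fairness requirements cannot be downgraded below MUST."""
    if category in {"safety", "fairness"}:
        return Priority.MUST
    return requested

print(assign_priority(Priority.COULD, "safety").name)        # MUST
print(assign_priority(Priority.SHOULD, "performance").name)  # SHOULD
```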

6.3 Comparison with Alternative Approaches

| Dimension | Spec-Driven | Experiment-Driven | Data-Driven | Model-Centric |
|---|---|---|---|---|
| Starting Point | Stakeholder requirements | Research question | Available dataset | Model architecture |
| Requirements Capture | Formal, comprehensive | Informal, metric-focused | Inferred from data | Minimal |
| Safety Consideration | Explicit constraints | Post-hoc testing | Implicit in data quality | Post-hoc evaluation |
| Fairness Handling | Specified requirements | Optional metric | Data debiasing | Algorithmic fairness |
| Stakeholder Alignment | Participatory process | Researcher judgment | Historical data patterns | Developer preferences |
| Traceability | Full chain | Limited | Data lineage only | Model artifacts only |
| Verification | Requirements-based | Test set performance | Data quality metrics | Benchmark comparison |
| Best For | Enterprise, regulated domains | Research, exploration | Data-rich environments | Benchmark competitions |

sequenceDiagram
    participant S as Stakeholders
    participant R as Requirements
    participant D as Data
    participant M as Model
    participant V as Validation
    
    Note over S,V: Spec-Driven Approach
    S->>R: Elicit needs
    R->>D: Specify data requirements
    R->>M: Define acceptance criteria
    D->>M: Train within constraints
    M->>V: Verify against requirements
    V->>S: Demonstrate compliance
    
    Note over S,V: Experiment-Driven Approach
    M->>D: Select dataset
    D->>M: Train and iterate
    M->>V: Evaluate on metrics
    V-->>S: Report results
    
    Note over S,V: Data-Driven Approach
    D->>M: Available data determines task
    M->>V: Optimize for data patterns
    V-->>R: Infer requirements post-hoc

7. Case Example: Credit Scoring System

To illustrate these principles, consider requirements for an AI-based credit scoring system:

Traditional Functional Requirement: “The system shall predict loan default probability for applicants.”

Spec-Driven AI Requirement:

Performance: The model shall achieve minimum AUC-ROC of 0.75 overall, with no subgroup (by race, gender, age bracket) falling below 0.72. Calibration error (ECE) shall not exceed 0.05 for any subgroup.

Behavioral: For applicants differing only in protected attributes (race, gender), predictions shall not differ by more than 5 percentile points when conditioning on all legitimate factors (income, credit history, employment). Monotonicity shall hold: increasing income shall not decrease approval probability, holding other factors constant.

Safety: For predictions with confidence < 0.6, the system shall defer to human review. The system shall not approve loans when required financial documentation is incomplete.

Fairness: Demographic parity ratio (approval rate ratio between protected groups) shall fall within [0.8, 1.25]. Equalized odds shall hold within ±5% across demographic groups. Individual applicants may request reconsideration with counterfactual explanations.

Adaptation: The model shall be retrained quarterly. Performance monitoring shall trigger retraining if any subgroup AUC drops below 0.70 or if distribution shift detector (KS test) shows p < 0.01. Learning shall occur only on verified historical loan outcomes, excluding loans flagged for fraud or data quality issues.
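The subgroup-AUC trigger above can be sketched as a monitoring check. The 0.70 floor comes from the requirement text; the AUC values are illustrative, and the companion KS-test drift check (e.g. via scipy.stats.ks_2samp) is omitted for brevity.

```python
# Sketch of the retraining trigger above: retraining fires if any monitored
# subgroup AUC falls below the specified 0.70 floor. AUC values are toy data.
AUC_FLOOR = 0.70

def retraining_needed(subgroup_auc):
    """True when any subgroup breaches the floor set by the specification."""
    return any(auc < AUC_FLOOR for auc in subgroup_auc.values())

print(retraining_needed({"overall": 0.78, "group_a": 0.74, "group_b": 0.69}))  # True
print(retraining_needed({"overall": 0.78, "group_a": 0.74, "group_b": 0.72}))  # False
```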

Explainability: For denied applications, the system shall provide the three most influential factors and specific threshold values needed for approval. Loan officers shall have access to SHAP values for all applicant features.

This specification provides clear, verifiable criteria that address stakeholder concerns while acknowledging AI system characteristics. Contrast this with data-driven approaches that might simply optimize for historical approval patterns, potentially perpetuating discriminatory practices embedded in historical data.

8. Conclusion: The Path Forward

Capturing requirements for AI systems demands extensions to traditional requirements engineering that address uncertainty, learning, emergence, and data dependency. Behavioral specifications complement performance metrics, safety and fairness requirements become first-class concerns, and stakeholder alignment processes must expand beyond traditional customer focus.

Spec-driven AI development provides a rigorous framework for enterprise AI deployment, offering advantages over experiment-driven, data-driven, and model-centric approaches:

  • Risk mitigation: Explicit safety and fairness constraints reduce deployment failures
  • Regulatory compliance: Documented requirements chains support audits and compliance demonstration
  • Stakeholder trust: Participatory requirements processes build confidence and legitimacy
  • Maintainability: Clear specifications enable systematic updates and improvements
  • Accountability: Traceability links system behaviors to responsible decision-makers

While experiment-driven approaches remain valuable for research exploration, data-driven methods for data-rich environments, and model-centric approaches for competitive benchmarking, enterprise AI systems benefit from the rigor and transparency that specification-first methodologies provide.

The challenge lies not in whether to specify requirements, but in developing specification techniques adequate to AI’s unique characteristics. This article has presented a framework—behavioral alongside performance specifications, safety and fairness as requirements rather than afterthoughts, and participatory methods that engage diverse stakeholders—that addresses this challenge while maintaining the discipline that has proven essential in traditional software engineering.

As AI systems increasingly mediate critical decisions affecting individuals and society, the shift from opportunistic, data-driven development to principled, specification-driven engineering becomes not merely advisable but imperative.

References

All references are hyperlinked inline throughout the article to their DOI, arXiv, or authoritative source URLs.

