Capturing AI Requirements: Beyond Functional Specifications

Posted on February 22, 2026 by Oleh Ivchenko






Author: Oleh Ivchenko

Series: Spec-Driven AI Development

Abstract

Traditional requirements engineering approaches, developed for deterministic software systems, prove inadequate when applied to AI systems characterized by learning, uncertainty, and emergent behavior. This article examines the unique challenges of capturing requirements for AI systems and proposes a structured framework that extends beyond conventional functional specifications. We explore behavioral specifications, performance requirements, safety constraints, fairness criteria, and stakeholder alignment techniques specific to AI development. Through comparison with experiment-driven and data-driven approaches, we demonstrate how specification-first methodologies enhance system reliability, regulatory compliance, and stakeholder trust while addressing the inherent non-determinism of machine learning systems.

1. Introduction: The Requirements Challenge in AI Systems

Requirements engineering (RE) forms the foundation of systematic software development, establishing what a system should do before determining how it will do it. However, AI systems introduce fundamental challenges that strain traditional RE frameworks. Unlike conventional software with deterministic input-output mappings, AI systems exhibit probabilistic behavior, learn from data, and may drift over time, making precise functional specifications elusive.

The experiment-driven approach dominant in AI research often bypasses formal requirements engineering entirely, iterating on model architectures and hyperparameters until acceptable performance metrics are achieved. While effective for research contexts, this methodology creates significant risks in enterprise deployments where safety, fairness, regulatory compliance, and long-term maintainability are paramount.

Data-driven development similarly prioritizes available datasets over stakeholder needs, often resulting in systems optimized for patterns in historical data rather than actual business requirements. Model-centric approaches focus on architectural innovations without adequate specification of acceptable system behavior, leading to deployment failures when models encounter out-of-distribution scenarios.

Spec-driven AI development addresses these limitations by establishing clear, verifiable requirements before model selection or training begins. This article presents a comprehensive framework for capturing AI requirements that acknowledges uncertainty while maintaining rigor, enabling organizations to build AI systems that are both innovative and trustworthy.

2. Unique Challenges in AI Requirements Engineering

2.1 Inherent Uncertainty and Non-Determinism

Traditional requirements assume deterministic behavior: given input X, the system produces output Y. AI systems operate probabilistically, producing distributions over possible outputs. This fundamental difference requires rethinking how we specify correctness. Rather than asserting “the system shall classify all images correctly,” specifications must accommodate inherent error rates, confidence thresholds, and uncertainty quantification.

The specification must distinguish between aleatoric uncertainty (irreducible noise in data) and epistemic uncertainty (model knowledge gaps), as these require different mitigation strategies. Requirements should specify acceptable uncertainty levels, calibration requirements, and fallback behaviors when confidence falls below thresholds.
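The fallback behavior described above can be made concrete. The sketch below is illustrative only: the 0.80 threshold, label names, and routing function are assumed values standing in for what an actual requirements document would specify.

```python
# Sketch of a confidence-threshold fallback a specification might require.
# The threshold and labels are hypothetical, not from a real system.
CONFIDENCE_THRESHOLD = 0.80  # assumed minimum acceptable confidence

def route_prediction(probs):
    """Return the predicted class label, or defer when confidence is too low."""
    confidence = max(probs)
    if confidence < CONFIDENCE_THRESHOLD:
        return "DEFER_TO_HUMAN"  # fallback behavior mandated by the spec
    return f"CLASS_{probs.index(confidence)}"

print(route_prediction([0.55, 0.30, 0.15]))  # low confidence -> defer
print(route_prediction([0.92, 0.05, 0.03]))  # confident -> CLASS_0
```

The key point is that the deferral rule is itself a requirement, verifiable independently of model accuracy.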

2.2 Learning and Adaptation

AI systems learn from data, meaning their behavior evolves. Continual learning systems update in production, potentially diverging from original specifications. Requirements must specify not just initial behavior but also acceptable learning trajectories, drift detection mechanisms, and retraining triggers.

Unlike static software where updates are controlled, online learning systems may adapt to adversarial inputs or biased feedback loops. Specifications must establish guardrails: what the system may learn, from which data sources, under what conditions, and with what human oversight.
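Such guardrails can be expressed as executable policy. The following is a minimal sketch under assumed conventions: the source names, the `human_approved` flag, and the record shape are all invented for illustration.

```python
# Sketch of learning guardrails: online updates accepted only from approved
# data sources and with human sign-off. Source names and fields are assumptions.
ALLOWED_SOURCES = {"verified_outcomes", "audited_feedback"}

def filter_training_batch(batch):
    """Keep only records the specification permits the system to learn from."""
    return [r for r in batch
            if r["source"] in ALLOWED_SOURCES and r.get("human_approved", False)]

batch = [
    {"id": 1, "source": "verified_outcomes", "human_approved": True},
    {"id": 2, "source": "web_scrape", "human_approved": True},         # disallowed source
    {"id": 3, "source": "audited_feedback", "human_approved": False},  # no sign-off
]
print([r["id"] for r in filter_training_batch(batch)])  # only record 1 passes
```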

2.3 Emergent Behavior and Opacity

Deep learning models exhibit emergent behaviors not explicitly programmed, complicating requirements specification. Traditional functional requirements describe intended behaviors, but AI systems may develop capabilities beyond their training objectives, including unintended correlations and shortcuts.

Model opacity presents additional challenges. Explainability requirements must balance interpretability needs with model performance, as interpretable models often sacrifice accuracy. Requirements should specify acceptable trade-offs between accuracy and interpretability, contextual to the application domain.

2.4 Data Dependency

AI system behavior depends fundamentally on training data quality and distribution. Data requirements form a critical component of AI specifications, yet are often underspecified in experiment-driven approaches. Requirements must address data quality, representativeness, bias mitigation, and distribution shift detection.

Unlike conventional software where requirements focus on functionality, AI requirements must specify data provenance, labeling quality, class balance, and coverage of edge cases. Data-driven approaches that work backward from available data fail to address cases where required data doesn’t exist or is prohibitively expensive to acquire.

graph TD
    A[Traditional Software RE] -->|Deterministic| B[Functional Specs]
    A -->|Controlled Updates| C[Change Management]
    A -->|Transparent Logic| D[Verification]
    
    E[AI Systems RE] -->|Probabilistic| F[Behavioral + Performance Specs]
    E -->|Continuous Learning| G[Adaptation Boundaries]
    E -->|Emergent Behavior| H[Safety Constraints]
    E -->|Data Dependent| I[Data Requirements]
    
    F --> J[Spec-Driven AI Framework]
    G --> J
    H --> J
    I --> J
    
    K[Experiment-Driven] -.->|No Formal Specs| L[Metrics Only]
    M[Data-Driven] -.->|Dataset First| N[Requirements Inferred]
    
    style J fill:#90EE90
    style L fill:#FFB6C1
    style N fill:#FFB6C1

3. Behavioral Specifications vs Performance Specifications

3.1 Performance Specifications

Performance specifications quantify system capabilities through metrics. Common AI performance metrics include accuracy, precision, recall, F1 score, AUC-ROC, and domain-specific measures. While necessary, performance specifications alone prove insufficient.

The experiment-driven paradigm typically stops at performance specifications: “achieve 95% accuracy on the test set.” However, high aggregate metrics can mask critical failures in subgroups, and test set performance may not generalize to production distributions.

Comprehensive performance specifications should include:

  • Aggregate metrics: Overall system performance on representative test sets
  • Subgroup metrics: Performance across demographic groups, edge cases, and rare scenarios
  • Robustness metrics: Performance degradation under adversarial attacks, distribution shift, and noise
  • Efficiency metrics: Latency, throughput, resource consumption, and scalability
  • Calibration metrics: Alignment between predicted confidence and actual accuracy
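The subgroup requirement in the list above lends itself to a direct check. This sketch uses toy data and an assumed 0.70 floor; a real specification would name the groups and thresholds explicitly.

```python
# Illustrative subgroup-metric check: the spec sets an accuracy floor for
# every subgroup, not just in aggregate. Data and threshold are toy values.
SUBGROUP_FLOOR = 0.70  # assumed minimum acceptable subgroup accuracy

def subgroup_accuracy(records):
    """Return accuracy per subgroup from (group, correct) pairs."""
    totals, hits = {}, {}
    for group, correct in records:
        totals[group] = totals.get(group, 0) + 1
        hits[group] = hits.get(group, 0) + int(correct)
    return {g: hits[g] / totals[g] for g in totals}

records = [("A", True), ("A", True), ("A", True), ("A", False),
           ("B", True), ("B", False), ("B", False)]
scores = subgroup_accuracy(records)
violations = [g for g, acc in scores.items() if acc < SUBGROUP_FLOOR]
print(scores)      # group A at 0.75 passes; group B at ~0.33 fails
print(violations)  # -> ['B']
```

Note that the aggregate accuracy here (4/7) says nothing about the failing subgroup, which is exactly the gap behavioral and subgroup requirements close.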

3.2 Behavioral Specifications

Behavioral specifications define how systems should act in specific scenarios, complementing aggregate metrics with concrete examples. Behavior-driven development (BDD) approaches specify expected inputs and outputs, but AI systems require probabilistic behavioral specifications.

Metamorphic testing provides one framework for behavioral specifications: defining relationships between input transformations and expected output transformations. For example, a sentiment classifier should produce similar outputs for paraphrased inputs, regardless of lexical variation.
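A metamorphic relation of this kind can be written as a reusable test. The classifier below is a deliberately trivial keyword stand-in, not a real model; only the relation itself reflects what the specification would state.

```python
# Metamorphic-relation sketch: paraphrased inputs should receive the same
# label. The keyword "classifier" is a hypothetical stand-in for a model.
POSITIVE_WORDS = {"good", "great", "excellent", "fine"}

def classify_sentiment(text):
    words = set(text.lower().split())
    return "positive" if words & POSITIVE_WORDS else "negative"

def metamorphic_paraphrase_test(model, text, paraphrase):
    """The relation: lexical variation must not flip the predicted label."""
    return model(text) == model(paraphrase)

print(metamorphic_paraphrase_test(classify_sentiment,
                                  "The service was great",
                                  "The service was excellent"))  # True
```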

Property-based testing offers another approach, specifying invariants the system must maintain:

  • Consistency: Similar inputs should produce similar outputs
  • Monotonicity: Increasing feature X should not decrease prediction Y (where causally justified)
  • Fairness properties: Predictions should not vary based on protected attributes when conditioned on legitimate factors
  • Causality constraints: Output changes should only result from causally relevant input changes
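The monotonicity invariant above can be checked mechanically. In this sketch, `score` is a toy monotone function standing in for a trained model; the grid of incomes and the fixed debt value are arbitrary illustration choices.

```python
# Property-invariant sketch: a monotonicity check over a scoring function.
# `score` is a hypothetical stand-in for a model; a spec would demand the
# property only for causally justified features (e.g. income in credit scoring).
def score(income, debt):
    """Toy monotone scorer standing in for a trained model."""
    return 0.5 * income - 0.3 * debt

def check_monotonic_in_income(model, debt, incomes):
    """Verify predictions never decrease as income increases."""
    preds = [model(i, debt) for i in sorted(incomes)]
    return all(a <= b for a, b in zip(preds, preds[1:]))

print(check_monotonic_in_income(score, debt=10.0, incomes=[20, 40, 60, 80]))  # True
```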

Behavioral specifications ground abstract performance metrics in concrete, verifiable scenarios. While experiment-driven approaches validate on aggregate test sets, behavioral testing uncovers systematic failures that aggregate metrics miss.

graph LR
    A[Requirements] --> B[Performance Specs]
    A --> C[Behavioral Specs]
    
    B --> D[Accuracy: 92%]
    B --> E[Latency: <100ms]
    B --> F[Fairness Metric: 0.05]
    
    C --> G[Scenario Tests]
    C --> H[Metamorphic Relations]
    C --> I[Property Invariants]
    
    G --> J[Edge Cases]
    G --> K[User Stories]
    
    D -.->|Can Miss| L[Subgroup Failures]
    E -.->|Can Miss| M[Tail Latencies]
    F -.->|Can Miss| N[Individual Fairness]
    
    J -->|Catches| L
    K -->|Catches| N
    
    style C fill:#90EE90
    style B fill:#FFD700

4. Safety and Fairness Requirements

4.1 Safety Specifications

Safety-critical AI systems require rigorous safety specifications beyond performance metrics. Safety requirements define unacceptable behaviors that the system must avoid, even at the cost of reduced performance.

ISO/PAS 21448 (SOTIF – Safety Of The Intended Functionality) provides a framework for specifying safety in AI systems, distinguishing between:

  • Known safe scenarios: Verified through testing and validation
  • Known unsafe scenarios: Explicitly prohibited or requiring human intervention
  • Unknown scenarios: Requiring uncertainty detection and conservative fallback

Safety specifications should include formal constraints using temporal logic or contracts. For example, an autonomous vehicle might specify: “The system shall not initiate lane changes when adjacent lane occupancy confidence < 95%.” Such specifications make safety boundaries explicit and verifiable.

Runtime monitoring requirements ensure systems detect safety violations during operation, triggering safe fallback modes. Unlike experiment-driven approaches that focus on average-case performance, spec-driven development prioritizes worst-case guarantees.
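The lane-change constraint quoted above illustrates how such a bound becomes a runtime guard. The 0.95 threshold comes from the example specification; the function signature and inputs are otherwise invented for illustration.

```python
# Runtime-monitoring sketch of the example constraint: the action is blocked
# unless occupancy confidence meets the specified bound. Only the 0.95
# threshold comes from the spec text; the rest is illustrative.
OCCUPANCY_CONFIDENCE_MIN = 0.95

def may_initiate_lane_change(adjacent_lane_clear, occupancy_confidence):
    """Conservative fallback: deny the action when the spec's bound is unmet."""
    return adjacent_lane_clear and occupancy_confidence >= OCCUPANCY_CONFIDENCE_MIN

print(may_initiate_lane_change(True, 0.97))  # True: constraint satisfied
print(may_initiate_lane_change(True, 0.90))  # False: confidence below bound
```

Encoding the constraint as a guard rather than a training objective is what gives the worst-case guarantee: the prohibited action cannot occur regardless of model behavior.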

4.2 Fairness Requirements

Fairness in AI systems lacks a universal definition, with multiple incompatible mathematical formulations. Requirements engineering must translate stakeholder fairness concerns into concrete, measurable specifications appropriate to the application context.

Common fairness specifications include:

  • Demographic parity: Equal acceptance rates across protected groups
  • Equalized odds: Equal true positive and false positive rates across groups
  • Calibration: Equal meaning of risk scores across groups
  • Individual fairness: Similar individuals receive similar treatment
  • Counterfactual fairness: Predictions would be identical in a counterfactual world where protected attributes differ
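Of the definitions above, demographic parity is the simplest to operationalize. The sketch below computes a parity ratio from toy (group, label, prediction) triples; the data and any acceptance band (e.g. the common four-fifths range) would come from the actual specification.

```python
# Sketch of a group-fairness check from the list above. Records are
# (group, y_true, y_pred) triples with toy values.
def approval_rates(records):
    totals, approved = {}, {}
    for group, _, pred in records:
        totals[group] = totals.get(group, 0) + 1
        approved[group] = approved.get(group, 0) + int(pred)
    return {g: approved[g] / totals[g] for g in totals}

def demographic_parity_ratio(records):
    """Min/max ratio of approval rates across groups (1.0 = perfect parity)."""
    rates = approval_rates(records).values()
    return min(rates) / max(rates)

records = [("A", 1, 1), ("A", 0, 1), ("A", 0, 0), ("A", 1, 1),
           ("B", 1, 1), ("B", 0, 0), ("B", 1, 0), ("B", 0, 0)]
print(round(demographic_parity_ratio(records), 3))  # 0.25 / 0.75 ratio
```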

Impossibility results show these definitions conflict, requiring stakeholders to choose appropriate specifications for their context. Data-driven approaches may optimize for fairness metrics without understanding their normative implications, while spec-driven development forces explicit stakeholder deliberation on fairness trade-offs.

graph TD
    A[Safety & Fairness Requirements] --> B[Safety Specs]
    A --> C[Fairness Specs]
    
    B --> D[Forbidden Behaviors]
    B --> E[Performance Envelopes]
    B --> F[Runtime Monitoring]
    B --> G[Fallback Modes]
    
    C --> H[Group Fairness Metrics]
    C --> I[Individual Fairness]
    C --> J[Causality Constraints]
    
    D --> K[Formal Verification]
    E --> K
    F --> L[Runtime Assurance]
    G --> L
    
    H --> M[Stakeholder Alignment]
    I --> M
    J --> M
    
    N[Experiment-Driven] -.->|Ignores| D
    N -.->|Optimizes| O[Aggregate Metrics Only]
    
    P[Spec-Driven] -->|Enforces| K
    P -->|Requires| M
    
    style P fill:#90EE90
    style N fill:#FFB6C1

5. Stakeholder Alignment Techniques

5.1 The Challenge of Diverse Stakeholders

AI systems impact multiple stakeholders with potentially conflicting interests: end users, developers, business owners, regulators, and affected third parties. Traditional requirements engineering focuses on paying customers, but AI systems’ societal impact demands broader stakeholder consideration.

Experiment-driven development typically prioritizes researcher or developer preferences, optimizing for publishable results or technical elegance rather than stakeholder needs. Data-driven approaches may inadvertently encode the preferences of historical decision-makers embedded in training data, perpetuating rather than interrogating past practices.

5.2 Participatory Requirements Engineering

Participatory design methods engage diverse stakeholders in requirements elicitation, making implicit values explicit. Techniques include:

  • Value-sensitive design: Structured methods for identifying stakeholder values and translating them into technical requirements
  • Speculative scenarios: Concrete narratives exploring system implications across diverse contexts
  • Contestable AI design: Requirements that enable users to challenge and contest AI decisions
  • Co-design workshops: Collaborative sessions where stakeholders jointly define acceptable system behavior

5.3 Requirements Traceability and Documentation

Model cards and datasheets provide structured documentation formats for communicating AI system capabilities and limitations. However, these address post-hoc documentation rather than prospective requirements.

Spec-driven development requires bidirectional traceability from stakeholder concerns to technical specifications to implementation artifacts. This enables:

  • Verification that all stakeholder requirements are addressed
  • Impact analysis when requirements change
  • Accountability by linking system behaviors to responsible parties
  • Regulatory compliance demonstration through documented requirement chains
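A traceability record can be as simple as a linked structure over which these checks run. The requirement IDs, stakeholder names, and test IDs below are invented purely to show the shape.

```python
# Minimal traceability sketch: each requirement links a stakeholder concern
# to verification artifacts. All IDs here are invented for illustration.
trace = [
    {"req": "FR-12", "stakeholder": "regulator",
     "concern": "fair lending", "tests": ["T-31", "T-32"]},
    {"req": "NFR-04", "stakeholder": "end user",
     "concern": "explanation of denial", "tests": []},
]

def uncovered_requirements(matrix):
    """Requirements with no linked verification artifact (an audit gap)."""
    return [r["req"] for r in matrix if not r["tests"]]

print(uncovered_requirements(trace))  # -> ['NFR-04']
```

Running such a query at review time makes "verification that all stakeholder requirements are addressed" a mechanical check rather than a manual audit.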

graph TD
    A[Stakeholder Groups] --> B[End Users]
    A --> C[Business Owners]
    A --> D[Developers]
    A --> E[Regulators]
    A --> F[Affected Third Parties]
    
    B --> G[Participatory RE Methods]
    C --> G
    D --> G
    E --> G
    F --> G
    
    G --> H[Value-Sensitive Design]
    G --> I[Co-Design Workshops]
    G --> J[Speculative Scenarios]
    
    H --> K[Requirements Document]
    I --> K
    J --> K
    
    K --> L[Behavioral Specs]
    K --> M[Performance Specs]
    K --> N[Safety Constraints]
    K --> O[Fairness Criteria]
    
    L --> P[Traceability Matrix]
    M --> P
    N --> P
    O --> P
    
    P --> Q[Implementation]
    P --> R[Verification]
    P --> S[Compliance Audit]
    
    T[Experiment-Driven] -.->|Skips| G
    T -.->|No| P
    
    style G fill:#90EE90
    style P fill:#90EE90
    style T fill:#FFB6C1

6. Practical Framework for AI Requirements Specification

6.1 Structured Requirements Template

A comprehensive AI requirements specification should include the following sections, informed by recent research on AI-specific requirements patterns:

1. Context and Scope

  • Problem statement and business objectives
  • Stakeholder identification and needs analysis
  • System boundaries and interfaces
  • Regulatory and ethical constraints

2. Data Requirements (following datasheet methodology)

  • Data sources and provenance
  • Quantity, quality, and representativeness criteria
  • Labeling protocols and quality assurance
  • Privacy and security constraints
  • Bias mitigation requirements

3. Functional Requirements

  • Primary tasks and capabilities
  • Input/output specifications with probability distributions
  • Integration requirements with existing systems

4. Performance Requirements

  • Aggregate metrics with minimum acceptable thresholds
  • Subgroup performance requirements
  • Robustness and reliability criteria
  • Efficiency and scalability targets

5. Behavioral Requirements

  • Scenario-based test cases
  • Metamorphic relations and property invariants
  • Edge cases and boundary conditions

6. Safety Requirements

  • Prohibited behaviors and failure modes
  • Performance envelope boundaries
  • Runtime monitoring specifications
  • Fallback and degradation strategies

7. Fairness Requirements

  • Protected attributes and sensitive subgroups
  • Chosen fairness metrics with justification
  • Acceptable performance trade-offs

8. Uncertainty and Adaptation Requirements

  • Confidence calibration requirements
  • Out-of-distribution detection mechanisms
  • Learning boundaries and update protocols
  • Drift detection and retraining triggers

9. Explainability and Transparency Requirements

  • Interpretability needs by stakeholder group
  • Explanation formats and interfaces
  • Auditability and logging requirements

10. Validation and Verification Criteria

  • Testing methodology and acceptance criteria
  • Verification procedures for each requirement
  • Continuous monitoring in production

6.2 Requirements Prioritization

AI requirements often conflict: accuracy vs. interpretability, performance vs. fairness, efficiency vs. robustness. Spec-driven development makes these trade-offs explicit through structured prioritization.

The MoSCoW method (Must have, Should have, Could have, Won’t have) can be adapted for AI systems, with safety and ethical requirements typically classified as “Must have” regardless of performance trade-offs.
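This adaptation can be encoded directly, as a sketch: the category names and the rule that safety and fairness requirements cannot be downgraded are assumptions reflecting the policy described above.

```python
# Sketch of MoSCoW prioritization with the safety override described above:
# safety/ethics requirements are forced to MUST regardless of requested tier.
from enum import Enum

class Priority(Enum):
    MUST = 1
    SHOULD = 2
    COULD = 3
    WONT = 4

def assign_priority(requested, category):
    """Safety and fairness requirements cannot be downgraded below MUST."""
    if category in {"safety", "fairness"}:
        return Priority.MUST
    return requested

print(assign_priority(Priority.COULD, "safety").name)        # MUST
print(assign_priority(Priority.SHOULD, "performance").name)  # SHOULD
```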

6.3 Comparison with Alternative Approaches

| Dimension | Spec-Driven | Experiment-Driven | Data-Driven | Model-Centric |
|---|---|---|---|---|
| Starting Point | Stakeholder requirements | Research question | Available dataset | Model architecture |
| Requirements Capture | Formal, comprehensive | Informal, metric-focused | Inferred from data | Minimal |
| Safety Consideration | Explicit constraints | Post-hoc testing | Implicit in data quality | Post-hoc evaluation |
| Fairness Handling | Specified requirements | Optional metric | Data debiasing | Algorithmic fairness |
| Stakeholder Alignment | Participatory process | Researcher judgment | Historical data patterns | Developer preferences |
| Traceability | Full chain | Limited | Data lineage only | Model artifacts only |
| Verification | Requirements-based | Test set performance | Data quality metrics | Benchmark comparison |
| Best For | Enterprise, regulated domains | Research, exploration | Data-rich environments | Benchmark competitions |

sequenceDiagram
    participant S as Stakeholders
    participant R as Requirements
    participant D as Data
    participant M as Model
    participant V as Validation
    
    Note over S,V: Spec-Driven Approach
    S->>R: Elicit needs
    R->>D: Specify data requirements
    R->>M: Define acceptance criteria
    D->>M: Train within constraints
    M->>V: Verify against requirements
    V->>S: Demonstrate compliance
    
    Note over S,V: Experiment-Driven Approach
    M->>D: Select dataset
    D->>M: Train and iterate
    M->>V: Evaluate on metrics
    V-->>S: Report results
    
    Note over S,V: Data-Driven Approach
    D->>M: Available data determines task
    M->>V: Optimize for data patterns
    V-->>R: Infer requirements post-hoc

7. Case Example: Credit Scoring System

To illustrate these principles, consider requirements for an AI-based credit scoring system:

Traditional Functional Requirement: “The system shall predict loan default probability for applicants.”

Spec-Driven AI Requirement:

Performance: The model shall achieve minimum AUC-ROC of 0.75 overall, with no subgroup (by race, gender, age bracket) falling below 0.72. Calibration error (ECE) shall not exceed 0.05 for any subgroup.

Behavioral: For applicants differing only in protected attributes (race, gender), predictions shall not differ by more than 5 percentile points when conditioning on all legitimate factors (income, credit history, employment). Monotonicity shall hold: increasing income shall not decrease approval probability, holding other factors constant.

Safety: For predictions with confidence < 0.6, the system shall defer to human review. The system shall not approve loans when required financial documentation is incomplete.

Fairness: Demographic parity ratio (approval rate ratio between protected groups) shall fall within [0.8, 1.25]. Equalized odds shall hold within ±5% across demographic groups. Individual applicants may request reconsideration with counterfactual explanations.

Adaptation: The model shall be retrained quarterly. Performance monitoring shall trigger retraining if any subgroup AUC drops below 0.70 or if distribution shift detector (KS test) shows p < 0.01. Learning shall occur only on verified historical loan outcomes, excluding loans flagged for fraud or data quality issues.
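The subgroup-AUC trigger above can be sketched as a monitoring check. The 0.70 floor comes from the requirement text; the AUC values are illustrative, and the companion KS-test drift check (e.g. via scipy.stats.ks_2samp) is omitted for brevity.

```python
# Sketch of the retraining trigger above: retraining fires if any monitored
# subgroup AUC falls below the specified 0.70 floor. AUC values are toy data.
AUC_FLOOR = 0.70

def retraining_needed(subgroup_auc):
    """True when any subgroup breaches the floor set by the specification."""
    return any(auc < AUC_FLOOR for auc in subgroup_auc.values())

print(retraining_needed({"overall": 0.78, "group_a": 0.74, "group_b": 0.69}))  # True
print(retraining_needed({"overall": 0.78, "group_a": 0.74, "group_b": 0.72}))  # False
```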

Explainability: For denied applications, the system shall provide the three most influential factors and specific threshold values needed for approval. Loan officers shall have access to SHAP values for all applicant features.

This specification provides clear, verifiable criteria that address stakeholder concerns while acknowledging AI system characteristics. Contrast this with data-driven approaches that might simply optimize for historical approval patterns, potentially perpetuating discriminatory practices embedded in historical data.

8. Conclusion: The Path Forward

Capturing requirements for AI systems demands extensions to traditional requirements engineering that address uncertainty, learning, emergence, and data dependency. Behavioral specifications complement performance metrics, safety and fairness requirements become first-class concerns, and stakeholder alignment processes must expand beyond traditional customer focus.

Spec-driven AI development provides a rigorous framework for enterprise AI deployment, offering advantages over experiment-driven, data-driven, and model-centric approaches:

  • Risk mitigation: Explicit safety and fairness constraints reduce deployment failures
  • Regulatory compliance: Documented requirements chains support audits and compliance demonstration
  • Stakeholder trust: Participatory requirements processes build confidence and legitimacy
  • Maintainability: Clear specifications enable systematic updates and improvements
  • Accountability: Traceability links system behaviors to responsible decision-makers

While experiment-driven approaches remain valuable for research exploration, data-driven methods for data-rich environments, and model-centric approaches for competitive benchmarking, enterprise AI systems benefit from the rigor and transparency that specification-first methodologies provide.

The challenge lies not in whether to specify requirements, but in developing specification techniques adequate to AI’s unique characteristics. This article has presented a framework—behavioral alongside performance specifications, safety and fairness as requirements rather than afterthoughts, and participatory methods that engage diverse stakeholders—that addresses this challenge while maintaining the discipline that has proven essential in traditional software engineering.

As AI systems increasingly mediate critical decisions affecting individuals and society, the shift from opportunistic, data-driven development to principled, specification-driven engineering becomes not merely advisable but imperative.

References

All references are hyperlinked inline throughout the article to their DOI, arXiv, or authoritative source URLs.

