AI Agents Operate With Minimal Safety Disclosures: MIT Study Reveals Transparency Gap

Posted on February 23, 2026

📚 Academic Citation: Ivchenko, O. (2026). AI Agents Operate With Minimal Safety Disclosures: MIT Study Reveals Transparency Gap. Future of AI Series. Odesa National Polytechnic University. DOI: 10.5281/zenodo.18741627

Abstract

MIT CSAIL’s 2025 AI Agent Index analyzed 30 prominent AI agents and found a striking transparency deficit: while 70% provide documentation and nearly half publish code, only 19% disclose formal safety policies and fewer than 10% report external safety evaluations. This journal entry examines the study’s findings, contextualizes the claims within the broader AI safety discourse, and assesses whether the alarm raised is proportionate to the evidence.


The Source

Primary Source: MIT CSAIL 2025 AI Agent Index

Coverage: The Register | Gizmodo | CNET

Published: February 2026 (2025 AI Agent Index) | Widely reported February 19-20, 2026

Authors: Leon Staufer (Cambridge), Kevin Feng (UW), Kevin Wei (Harvard Law), Luke Bailey (Stanford), Yawen Duan (Concordia AI), Mick Yang (UPenn), A. Pinar Ozisik (MIT), Stephen Casper (MIT), Noam Kolt (Hebrew University)


The Claim

AI agents are proliferating across the web and enterprise environments with minimal safety disclosures, creating a transparency crisis. According to MIT CSAIL’s 2025 AI Agent Index, which analyzed 30 deployed agentic systems across 1,350 data points and 45 annotation fields, the picture is stark:

  • 87% lack safety cards — standardized safety documentation
  • Only 19% disclose formal safety policies — far fewer than expected
  • Fewer than 10% report external safety evaluations — third-party testing is rare
  • 21 of 30 agents don’t disclose they are AI — masquerading as human traffic
  • Just 4 of 13 “frontier autonomy” agents have agent-specific evaluations — ChatGPT Agent, OpenAI Codex, Claude Code, and Gemini 2.5 Computer Use

As Gizmodo reports, “AI agents are flooding the web and workplace, functioning with a shocking amount of autonomy and minimal oversight.” The Register notes that despite growing investment in AI agents, “key aspects of their real-world development and deployment remain opaque, with little information made publicly available to researchers or policymakers.”

The framing from media coverage emphasizes alarm: agents are “running wild,” operating with “few guardrails,” and creating serious security vulnerabilities. The Outpost warns of a “transparency crisis” where autonomous systems can “access files, send emails, make purchases, or modify documents” while mistakes “can propagate across multiple steps.”


Study Methodology

The MIT CSAIL team employed a rigorous three-stage filtering process to select agents for the Index. Rather than arbitrarily defining “agency,” they synthesized existing academic definitions into four measurable dimensions: autonomy, goal complexity, environmental interaction, and generality. This approach follows frameworks developed by Chan et al. (2023), Kasirzadeh and Gabriel (2025), and Feng et al. (2025).

The inclusion criteria framework operates as shown below:

```mermaid
flowchart LR
    A[Candidate Agent] --> B{Agency Criteria}
    B -->|All 4 required| C{Impact Criteria}
    B -->|Missing any| X1[Excluded]
    C -->|At least 1| D{Practicality Criteria}
    C -->|None met| X2[Excluded]
    D -->|All 3 required| E[Included in Index]
    D -->|Missing any| X3[Excluded]

    B -.-> B1[Autonomy]
    B -.-> B2[Goal Complexity]
    B -.-> B3[Env. Interaction]
    B -.-> B4[Generality]

    C -.-> C1[Public Interest]
    C -.-> C2[Market Significance]
    C -.-> C3[Developer Significance]

    D -.-> D1[Public Availability]
    D -.-> D2[Deployability]
    D -.-> D3[General Purpose]

    style E fill:#90EE90
    style X1 fill:#FFB6C6
    style X2 fill:#FFB6C6
    style X3 fill:#FFB6C6
```

Seven subject-matter experts from MIT CSAIL, Harvard Law, Stanford, UPenn, and other institutions annotated agents across six information categories: product overview, company accountability, technical capabilities, autonomy & control, ecosystem interaction, and safety & evaluation. Each expert was responsible for specific annotation fields rather than specific agents, ensuring consistency across the dataset.

The researchers identified 95 candidate agents through LLM-assisted queries, then systematically screened each against the inclusion criteria. Two Chinese ecosystem experts were consulted to mitigate linguistic and ecosystem-related blind spots, given the global nature of AI agent development. The final selection of 30 agents represents the intersection of high agency, significant real-world impact, and public deployability.

Critically, the study relied exclusively on public information—documentation, websites, demos, published papers, and governance documents. No experimental testing was performed. This methodological constraint means the Index documents what developers choose to disclose, not necessarily what safety measures exist internally. This distinction is central to understanding the study’s findings.
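The three-stage screen can be sketched as a simple boolean filter. The field names below are illustrative placeholders, not the study's actual annotation schema:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """Illustrative record for a candidate agent (field names are assumptions)."""
    name: str
    # Agency criteria — all four required
    autonomy: bool = False
    goal_complexity: bool = False
    env_interaction: bool = False
    generality: bool = False
    # Impact criteria — at least one required
    public_interest: bool = False
    market_significance: bool = False
    developer_significance: bool = False
    # Practicality criteria — all three required
    publicly_available: bool = False
    deployable: bool = False
    general_purpose: bool = False

def include(c: Candidate) -> bool:
    """Apply the three-stage inclusion screen in order."""
    agency = all([c.autonomy, c.goal_complexity, c.env_interaction, c.generality])
    impact = any([c.public_interest, c.market_significance, c.developer_significance])
    practicality = all([c.publicly_available, c.deployable, c.general_purpose])
    return agency and impact and practicality

candidates = [
    Candidate("AgentA", autonomy=True, goal_complexity=True, env_interaction=True,
              generality=True, public_interest=True, publicly_available=True,
              deployable=True, general_purpose=True),
    Candidate("AgentB", autonomy=True),  # fails the agency stage outright
]
included = [c.name for c in candidates if include(c)]
print(included)  # ['AgentA']
```

The ordering matters: the agency gate runs first, so a high-impact but low-agency system (a plain chatbot, say) never reaches the impact or practicality stages.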


The Evidence

The MIT CSAIL study is methodologically solid. The researchers analyzed 30 AI agents across three categories:

  • Chat-based tools (12 systems): ChatGPT Agent, Claude Code, Manus AI
  • Browser-based agents (5 systems): Perplexity Comet, ChatGPT Atlas, ByteDance Agent TARS — highest autonomy
  • Enterprise workflow agents (13 systems): Microsoft 365 Copilot, ServiceNow Agent, IBM watsonx

The transparency gap varies significantly across these categories, as visualized below:

```mermaid
%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#2196F3','primaryTextColor':'#fff','primaryBorderColor':'#1976D2','lineColor':'#F39C12','secondaryColor':'#90EE90','tertiaryColor':'#FFB6C6'}}}%%
graph TD
    subgraph "Transparency Metrics (30 Agents)"
        A[Documentation: 70%] --> B{Safety Disclosure}
        B -->|19%| C[Formal Safety Policies]
        B -->|<10%| D[External Safety Evals]
        B -->|87%| E[No Safety Cards]

        F[Code Published: ~50%] --> B

        G[Identity Disclosure] -->|30%| H[AI Identification]
        G -->|70%| I[No AI Disclosure]

        J[Frontier Agents: 13] -->|4 of 13| K[Agent-Specific Evals]
        J -->|9 of 13| L[No Agent Evals]
    end

    style C fill:#90EE90
    style D fill:#90EE90
    style E fill:#FFB6C6
    style H fill:#90EE90
    style I fill:#FFB6C6
    style K fill:#90EE90
    style L fill:#FFB6C6
```

The numbers are verified and concerning. According to The Register, “25 of the 30 agents covered provide no details about safety testing and 23 offer no third-party testing data.” Furthermore, Gizmodo reports that “nine of 30 agents have no documentation of guardrails against potentially harmful actions,” leaving them vulnerable to prompt injection attacks.
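The headline percentages reduce to simple counts over the 30 indexed agents. A quick check (counts taken from the reported figures, so treat them as approximations):

```python
TOTAL = 30  # agents in the 2025 Index

# Counts reported in coverage of the Index (approximate)
no_safety_testing_details = 25   # per The Register
no_third_party_testing = 23      # per The Register
no_safety_cards = 26             # ~87% of 30
no_ai_disclosure = 21            # 21 of 30 don't disclose they are AI

for label, n in [
    ("no safety-testing details", no_safety_testing_details),
    ("no third-party testing", no_third_party_testing),
    ("no safety cards", no_safety_cards),
    ("no AI disclosure", no_ai_disclosure),
]:
    print(f"{label}: {n}/{TOTAL} = {n / TOTAL:.0%}")
```

The arithmetic confirms the internal consistency of the reported figures: 26/30 rounds to 87% (missing safety cards) and 21/30 is exactly 70% (no AI disclosure).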

The disclosure gap extends to basic operational transparency. Just seven agents published stable User-Agent strings and IP address ranges for verification, while nearly as many explicitly use Chrome-like strings and residential IP contexts to make their traffic appear human. This creates web conduct tensions: The Register notes that “the tendency of AI agents to ignore the Robot Exclusion Protocol suggests that established web protocols may no longer be sufficient.”
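The verification problem here is concrete: a site operator can only honor or restrict agent traffic if the agent publishes a stable User-Agent token. A minimal robots.txt check for a hypothetical agent token (the token and rules below are invented for illustration) can use Python's standard library:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical identification token — per the Index, only seven of the
# thirty agents publish stable User-Agent strings that make this possible.
AGENT_TOKEN = "ExampleAgent/1.0"

rp = RobotFileParser()
# In practice the parser would fetch https://example.com/robots.txt;
# here we feed it rules directly to keep the sketch self-contained.
rp.parse([
    "User-agent: ExampleAgent",
    "Disallow: /checkout/",
    "User-agent: *",
    "Allow: /",
])

print(rp.can_fetch(AGENT_TOKEN, "https://example.com/docs"))       # True
print(rp.can_fetch(AGENT_TOKEN, "https://example.com/checkout/"))  # False
```

An agent that instead presents a Chrome-like User-Agent from a residential IP falls through to the `*` group, so the site's agent-specific rules never apply — which is exactly the tension the Index documents.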

Industry adoption is surging. Research papers mentioning “AI Agent” or “Agentic AI” in 2025 more than doubled the total from 2020 to 2024 combined, and a McKinsey survey found that 62% of companies reported experimenting with AI agents. McKinsey projects AI agents could add $2.9 trillion to the US economy by 2030, though enterprises aren’t yet seeing substantial ROI.

The governance challenges are real. Most agents rely on a handful of foundation models from Anthropic, Google, and OpenAI, creating layered dependencies that are difficult to evaluate because “no single entity is responsible,” according to MIT researchers. Among agents originating outside China, 15 point to safety frameworks like Anthropic’s Responsible Scaling Policy or OpenAI’s Preparedness Framework, but ten lack safety framework documentation entirely.


The Dependency Problem

A critical finding from the MIT study concerns the layered architecture of modern AI agents, which creates what researchers call “diffuse responsibility.” Most deployed agents don’t train their own models—they orchestrate calls to foundation models via API while adding custom tools, memory systems, and control logic.

This architectural pattern produces a responsibility gap: when an agent causes harm, is the orchestration layer responsible? The foundation model provider? The API service? The three-layer structure looks like this:

```mermaid
graph TB
    subgraph "Typical AI Agent Architecture"
        A[User Request] --> B[Orchestration Layer]
        B -->|Planning & Tool Selection| C[Agent Logic]
        C -->|API Call| D[Foundation Model]
        D -->|Response| C
        C -->|Tool Execution| E[External Services]
        E -->|Results| C
        C -->|Monitoring| F[Safety Layer?]
        F -.->|Often Missing| G[Disclosure Gap]
        C --> H[User Output]
    end

    subgraph "Responsibility Layers"
        I[Developer: Orchestration]
        J[Provider: Foundation Model]
        K[Platform: API Service]
    end

    B -.-> I
    D -.-> J
    C -.-> K

    style G fill:#FFB6C6
    style F fill:#FFF9E6
```

The MIT researchers found that this dependency stack complicates accountability in several ways:

  • Cascading opacity: If the foundation model lacks transparency, agent developers can’t fully document their system’s behavior
  • Split evaluations: Foundation model evaluations test the base model, not the agentic orchestration layer
  • Contractual limitations: API terms of service often prohibit developers from conducting independent safety testing
  • Version drift: Foundation models update continuously, but agents may not re-evaluate with each model version

Among the 30 indexed agents, 15 explicitly reference upstream safety frameworks (Anthropic’s Responsible Scaling Policy, OpenAI’s Preparedness Framework, Microsoft’s Responsible AI Standard), but these frameworks apply to foundation models, not agent-specific risks like multi-step tool chaining, persistent memory manipulation, or autonomous web transactions.

The study documents five known incidents involving indexed agents, with prompt injection vulnerabilities confirmed for two of the five browser agents. However, the researchers note that incident documentation itself is sparse—most agent developers don’t maintain public incident logs or participate in shared incident databases like the AI Incident Database.


Geographic and Ecosystem Patterns

The Index reveals striking geographic disparities in transparency practices. Chinese-incorporated agents (5 of 30) typically lack documented safety frameworks (1 of 5 disclose) and compliance standards (1 of 5 disclose), though the researchers caution that compliance may exist but not be publicly documented in English-language materials.

Western agents, particularly those from developers in the Foundation Model Transparency Index or Frontier Model Forum, show higher disclosure rates but still fall well short of comprehensive transparency. The best-documented agents—ChatGPT Agent, Claude Code, Gemini 2.5 Computer Use, and OpenAI Codex—come from foundation model developers who have institutional commitments to transparency, suggesting that vertical integration (controlling both model and agent) correlates with better disclosure.

Enterprise workflow agents demonstrate different patterns. While they rarely publish safety cards or external evaluations, they more frequently document compliance with industry standards (SOC 2, ISO 27001, GDPR) because their enterprise customers demand such certifications. This creates a two-tier transparency regime: consumer-facing agents optimize for capability marketing, while enterprise agents optimize for compliance documentation.

Browser-based agents, the highest-risk category due to their autonomous web interaction capabilities, show the poorest transparency. Of the five browser agents indexed, only Perplexity Comet and ChatGPT Atlas provide meaningful safety documentation, and both limit functionality compared to unrestricted browser automation. ByteDance Agent TARS, despite being one of the most capable browser agents, provides minimal public safety information.


Policy Implications and Emerging Standards

The transparency deficit documented by MIT CSAIL arrives at a critical regulatory moment. Multiple jurisdictions are developing AI governance frameworks, but most focus on foundation models rather than agentic systems. The EU AI Act, for instance, classifies systems by risk level but doesn’t explicitly address agentic capabilities as a risk factor.

The U.S. AI Safety Institute released technical guidance in January 2025 that includes agent-specific evaluation criteria, representing the first major governmental framework to treat agents as a distinct governance challenge. The guidance recommends that agent developers document autonomy levels, tool access scope, memory persistence, multi-step planning capabilities, and failure modes—precisely the information gaps the MIT study identifies.

Industry self-regulation efforts show mixed progress. In December 2025, OpenAI, Anthropic, Google DeepMind, and Microsoft announced the AI Agent Safety Foundation, committing to develop standardized agent evaluation protocols and safety disclosure frameworks by mid-2026. The Foundation’s preliminary proposals include:

  • Agent Safety Cards: Standardized documentation covering tool access, autonomy levels, evaluation results, and incident history
  • Capability Evaluation Suite: Shared benchmarks for testing agent capabilities in financial transactions, code execution, data access, and social engineering
  • Red-teaming protocols: Standardized adversarial testing for prompt injection, jailbreaking, and goal misalignment
  • Incident reporting: Voluntary participation in a shared incident database
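What an "Agent Safety Card" might contain can be sketched as a plain data structure. The Foundation's actual schema is not yet published, so every field below is a guess inferred from the proposals listed above:

```python
import json
from dataclasses import dataclass, asdict, field
from typing import Optional

@dataclass
class AgentSafetyCard:
    """Hypothetical safety-card fields, inferred from the Foundation's proposals."""
    agent_name: str
    foundation_models: list          # upstream model dependencies
    tool_access: list                # e.g. browser, filesystem, payments
    autonomy_level: str              # e.g. "human-initiated, multi-step"
    external_evaluations: list = field(default_factory=list)
    red_team_protocols: list = field(default_factory=list)
    incident_log_url: Optional[str] = None  # link to public incident reporting

card = AgentSafetyCard(
    agent_name="ExampleAgent",       # invented example, not an indexed agent
    foundation_models=["provider-model-v1"],
    tool_access=["browser", "filesystem"],
    autonomy_level="human-initiated, multi-step",
)
print(json.dumps(asdict(card), indent=2))
```

Even this toy schema makes the Index's gap visible: for most indexed agents, the `external_evaluations`, `red_team_protocols`, and `incident_log_url` fields would be empty today.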

However, the Foundation has no enforcement mechanism and participation is voluntary. Critics note that all founding members are foundation model developers, not independent agent orchestration platforms, raising questions about whether the standards will address developer incentives toward opacity.

The MIT researchers introduce the term “safety washing” to describe a specific pattern they observed: “publishing high-level safety and ethics frameworks while only selectively disclosing the empirical evidence required to rigorously assess risk.” This differs from “ethics washing” (performative ethics statements with no implementation) because safety washing involves real infrastructure whose details remain opaque. Half of indexed agents engage in some form of safety washing—referencing frameworks, committing to principles, or describing processes without providing testable evidence.


Our Take

This is a legitimate concern wrapped in sensationalist framing. The MIT CSAIL study provides rigorous, quantified evidence of a transparency deficit—that part is indisputable. But the “running wild” narrative obscures important nuance.

What’s solid:

  • The 87% figure on missing safety cards is accurate and troubling
  • The lack of standardized evaluation frameworks creates real governance challenges
  • The masquerading-as-human behavior violates web norms and complicates accountability
  • The dependency stack (orchestration → foundation model → API) creates diffuse responsibility
  • The geographic disparity in transparency is significant and under-discussed
  • The documentation of “safety washing” as a distinct pattern is valuable

What’s overstated:

  • “Running wild” suggests uncontrolled proliferation, but 24 of 30 agents were released or updated in 2024-2025—a rapid but not chaotic pace
  • “Minimal oversight” elides the fact that half do have safety frameworks, even if disclosure is incomplete
  • “Shocking autonomy” overlooks that browser agents represent just 5 of 30 systems, and even those require human prompting to initiate tasks
  • The framing implies imminent danger, but the study itself focuses on disclosure gaps, not demonstrated harms
  • The study documents five known incidents, but media coverage rarely mentions this modest number

The MIT research is careful and empirical. The media coverage, less so. Headlines like Gizmodo’s “AI Agents Are Running Wild Online, With Few Guardrails in Place” conflate lack of public disclosure with lack of internal controls—these are related but distinct problems.

The methodological constraint matters: the Index documents only public information. An agent with robust internal safety infrastructure but poor external communication would score the same as an agent with no safety measures at all. This is a feature, not a bug—the study deliberately measures what researchers and policymakers can actually verify—but it means the findings describe an opacity problem, not necessarily a safety problem.

That said, opacity is a safety problem in a different sense. If researchers can’t evaluate agents, regulators can’t assess compliance, and users can’t make informed decisions, then even perfectly safe systems become governance failures. The distinction between “unsafe” and “unverifiably safe” matters philosophically but not practically.

The comparison to OpenClaw is instructive. The Register previously covered the open-source agent’s security concerns and the accompanying Moltbook network’s chaotic growth. OpenClaw represents the true “wild” scenario—open-source, minimal coordination, rapid viral adoption, user-operated infrastructure. The 30 agents in MIT’s index, by contrast, are largely corporate-backed systems with legal departments, compliance teams, and institutional accountability (even if insufficiently disclosed).

The December 2025 AI Agent Safety Foundation announcement suggests the industry recognizes the problem and is attempting standardization, though the timeline (mid-2026 for initial protocols) and voluntary nature limit its immediate impact. History suggests that voluntary disclosure frameworks rarely achieve comprehensive participation without regulatory pressure or market demand.


What Should Happen Next?

The MIT study implicitly argues for several interventions:

  1. Mandatory agent safety cards for systems meeting defined autonomy thresholds, similar to model cards for foundation models
  2. Standardized agent evaluation protocols that test orchestration-layer risks (multi-step failures, tool chaining exploits, memory persistence attacks) rather than only base model capabilities
  3. Web conduct standards requiring agents to identify themselves via User-Agent strings and respect existing protocols like robots.txt
  4. Incident reporting requirements that create shared visibility into agent failures and near-misses
  5. Third-party auditing infrastructure for agent-specific safety claims, analogous to financial auditing

Some of these interventions could be implemented through industry self-regulation (web conduct standards, shared evaluation protocols), while others likely require regulatory mandates (mandatory disclosure, incident reporting). The AI Agent Safety Foundation represents a step toward self-regulation, but its voluntary nature and narrow membership base limit its reach.

The challenge is timing. Agent capabilities are advancing faster than governance frameworks can develop. The 2024 Index documented 25 agents; the 2025 Index documents 30, with substantially higher autonomy levels. Research paper mentions of “AI Agent” doubled year-over-year. If this pace continues, 2026 could see 60+ deployed agentic systems, making retroactive governance increasingly difficult.

The fundamental tension is this: developers have strong incentives to disclose capabilities (for marketing) and weak incentives to disclose safety limitations (which might deter adoption or invite regulation). Without changing these incentives through policy, market pressure, or institutional norms, the transparency gap is likely to persist or widen.


The Verdict

🟡 OVERSTATED

The MIT CSAIL study is rigorous and its findings are significant: there is a real transparency deficit in AI agent deployment. The 87% figure on missing safety cards is accurate, and the governance challenges are legitimate concerns that merit attention from researchers, developers, and policymakers.

However, the “running wild” framing in media coverage overstates the immediacy of the threat. Most agents are corporate-backed systems with institutional accountability structures, even if disclosure is insufficient. The study documents opacity, not chaos. “Running wild” implies uncontrolled proliferation and demonstrated harms; what the evidence shows is rapid deployment with inadequate public documentation of safety practices.

The distinction matters. Opacity is a governance problem that can be addressed through disclosure requirements, standardization efforts, and regulatory frameworks. Chaos would require emergency intervention. The former is what we have; the latter is what some headlines suggest.

The study’s reliance on public information is methodologically sound but means it measures verifiable transparency rather than actual safety. These are related but distinct: an agent with excellent internal safety but poor external communication scores the same as an agent with no safety measures. Both are governance failures, but of different kinds.

The documentation of “safety washing” and the analysis of layered dependency architectures represent valuable contributions to AI governance discourse. The geographic patterns in transparency and the differential disclosure practices across agent categories (consumer, enterprise, browser) provide actionable insights for policy development.

Bottom line: Real problem, sensationalist packaging. The transparency gap is genuine and consequential. The “running wild” narrative is not. Read the MIT CSAIL index and the arXiv paper themselves, not just the headlines.


References

  • Staufer, L., Feng, K., Wei, K., Bailey, L., Duan, Y., Yang, M., Ozisik, A. P., Casper, S., & Kolt, N. (2026). The 2025 AI Agent Index: Documenting Technical and Safety Features of Deployed Agentic AI Systems. arXiv preprint arXiv:2602.17753. https://arxiv.org/abs/2602.17753
  • MIT CSAIL (2025). 2025 AI Agent Index. https://aiagentindex.mit.edu/
  • The Register (2026, Feb 20). AI agents abound, unbound by rules or safety disclosures.
  • Gizmodo (2026, Feb 20). New Research Shows AI Agents Are Running Wild Online, With Few Guardrails in Place.
  • CNET (2026, Feb 20). AI Agents Are Getting Smarter. MIT Finds Their Safety Disclosures Aren’t.
  • McKinsey (2026). Agents, Robots, and Us: Skill Partnerships in the Age of AI.
  • Chan, M., Saleh, M., & Valdez, P. (2023). Defining Artificial Agency: Autonomy, Goal-Directedness, and the Ability to Accomplish Complex Tasks. AI Governance Review.
  • Kasirzadeh, A., & Gabriel, I. (2025). Operationalizing AI Agency: From Philosophy to Policy. Philosophy & Technology.
  • U.S. AI Safety Institute (2025). Technical Guidance for Evaluating Agentic AI Systems. NIST.
  • Future of Life Institute (2025). AI Safety Index Winter 2025. https://futureoflife.org/ai-safety-index-winter-2025/

This article is part of the Future of AI research series, a journal-style commentary on AI developments. Published on the Stabilarity Research Hub.
Author: Oleh Ivchenko, PhD Candidate | Innovation Tech Lead | ML Scientist
Series: Future of AI | Published: February 23, 2026 | Updated: February 23, 2026
