AI Agents Operate With Minimal Safety Disclosures: MIT Study Reveals Transparency Gap

Posted on February 23, 2026

📚 Academic Citation: Ivchenko, O. (2026). AI Agents Operate With Minimal Safety Disclosures: MIT Study Reveals Transparency Gap. Future of AI Series. Odesa National Polytechnic University. DOI: 10.5281/zenodo.18741627

Abstract

MIT CSAIL’s 2025 AI Agent Index analyzed 30 prominent AI agents and found a striking transparency deficit: while 70% provide documentation and nearly half publish code, only 19% disclose formal safety policies and fewer than 10% report external safety evaluations. This journal entry examines the study’s findings, contextualizes the claims within the broader AI safety discourse, and assesses whether the alarm raised is proportionate to the evidence.


The Source

Primary Source: MIT CSAIL 2025 AI Agent Index

Coverage: The Register | Gizmodo | CNET

Published: February 2026 (2025 AI Agent Index) | Widely reported February 19-20, 2026

Authors: Leon Staufer (Cambridge), Kevin Feng (UW), Kevin Wei (Harvard Law), Luke Bailey (Stanford), Yawen Duan (Concordia AI), Mick Yang (UPenn), A. Pinar Ozisik (MIT), Stephen Casper (MIT), Noam Kolt (Hebrew University)


The Claim

AI agents are proliferating across the web and enterprise environments with minimal safety disclosures, creating a transparency crisis. According to MIT CSAIL’s 2025 AI Agent Index, which analyzed 30 deployed agentic systems across 1,350 data points and 45 annotation fields, the picture is stark:

  • 87% lack safety cards — standardized safety documentation
  • Only 19% disclose formal safety policies — far fewer than expected
  • Fewer than 10% report external safety evaluations — third-party testing is rare
  • 21 of 30 agents don’t disclose they are AI — masquerading as human traffic
  • Just 4 of 13 “frontier autonomy” agents have agent-specific evaluations — ChatGPT Agent, OpenAI Codex, Claude Code, and Gemini 2.5 Computer Use

As Gizmodo reports, “AI agents are flooding the web and workplace, functioning with a shocking amount of autonomy and minimal oversight.” The Register notes that despite growing investment in AI agents, “key aspects of their real-world development and deployment remain opaque, with little information made publicly available to researchers or policymakers.”

The framing from media coverage emphasizes alarm: agents are “running wild,” operating with “few guardrails,” and creating serious security vulnerabilities. The Outpost warns of a “transparency crisis” where autonomous systems can “access files, send emails, make purchases, or modify documents” while mistakes “can propagate across multiple steps.”


Study Methodology

The MIT CSAIL team employed a rigorous three-stage filtering process to select agents for the Index. Rather than arbitrarily defining “agency,” they synthesized existing academic definitions into four measurable dimensions: autonomy, goal complexity, environmental interaction, and generality. This approach follows frameworks developed by Chan et al. (2023), Kasirzadeh and Gabriel (2025), and Feng et al. (2025).

The inclusion criteria framework operates as shown below:

```mermaid
flowchart LR
    A[Candidate Agent] --> B{Agency Criteria}
    B -->|All 4 required| C{Impact Criteria}
    B -->|Missing any| X1[Excluded]
    C -->|At least 1| D{Practicality Criteria}
    C -->|None met| X2[Excluded]
    D -->|All 3 required| E[Included in Index]
    D -->|Missing any| X3[Excluded]

    B -.-> B1[Autonomy]
    B -.-> B2[Goal Complexity]
    B -.-> B3[Env. Interaction]
    B -.-> B4[Generality]

    C -.-> C1[Public Interest]
    C -.-> C2[Market Significance]
    C -.-> C3[Developer Significance]

    D -.-> D1[Public Availability]
    D -.-> D2[Deployability]
    D -.-> D3[General Purpose]

    style E fill:#90EE90
    style X1 fill:#FFB6C6
    style X2 fill:#FFB6C6
    style X3 fill:#FFB6C6
```

Seven subject-matter experts from MIT CSAIL, Harvard Law, Stanford, UPenn, and other institutions annotated agents across six information categories: product overview, company accountability, technical capabilities, autonomy & control, ecosystem interaction, and safety & evaluation. Each expert was responsible for specific annotation fields rather than specific agents, ensuring consistency across the dataset.

The researchers identified 95 candidate agents through LLM-assisted queries, then systematically screened each against the inclusion criteria. Two Chinese ecosystem experts were consulted to mitigate linguistic and ecosystem-related blind spots, given the global nature of AI agent development. The final selection of 30 agents represents the intersection of high agency, significant real-world impact, and public deployability.

Critically, the study relied exclusively on public information—documentation, websites, demos, published papers, and governance documents. No experimental testing was performed. This methodological constraint means the Index documents what developers choose to disclose, not necessarily what safety measures exist internally. This distinction is central to understanding the study’s findings.
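The three-stage screen can be sketched as a simple boolean filter. The field names below are illustrative placeholders, not the study's actual annotation schema:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """Illustrative record for a candidate agent (field names are assumptions)."""
    name: str
    # Agency criteria — all four required
    autonomy: bool = False
    goal_complexity: bool = False
    env_interaction: bool = False
    generality: bool = False
    # Impact criteria — at least one required
    public_interest: bool = False
    market_significance: bool = False
    developer_significance: bool = False
    # Practicality criteria — all three required
    publicly_available: bool = False
    deployable: bool = False
    general_purpose: bool = False

def include(c: Candidate) -> bool:
    """Apply the three-stage inclusion screen in order."""
    agency = all([c.autonomy, c.goal_complexity, c.env_interaction, c.generality])
    impact = any([c.public_interest, c.market_significance, c.developer_significance])
    practicality = all([c.publicly_available, c.deployable, c.general_purpose])
    return agency and impact and practicality

candidates = [
    Candidate("AgentA", autonomy=True, goal_complexity=True, env_interaction=True,
              generality=True, public_interest=True, publicly_available=True,
              deployable=True, general_purpose=True),
    Candidate("AgentB", autonomy=True),  # fails the agency stage outright
]
included = [c.name for c in candidates if include(c)]
print(included)  # ['AgentA']
```

The ordering matters: the agency gate runs first, so a high-impact but low-agency system (a plain chatbot, say) never reaches the impact or practicality stages.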


The Evidence

The MIT CSAIL study is methodologically solid. The researchers analyzed 30 AI agents across three categories:

  • Chat-based tools (12 systems): ChatGPT Agent, Claude Code, Manus AI
  • Browser-based agents (5 systems): Perplexity Comet, ChatGPT Atlas, ByteDance Agent TARS — highest autonomy
  • Enterprise workflow agents (13 systems): Microsoft 365 Copilot, ServiceNow Agent, IBM watsonx

The transparency gap varies significantly across these categories, as visualized below:

```mermaid
%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#2196F3','primaryTextColor':'#fff','primaryBorderColor':'#1976D2','lineColor':'#F39C12','secondaryColor':'#90EE90','tertiaryColor':'#FFB6C6'}}}%%
graph TD
    subgraph "Transparency Metrics (30 Agents)"
        A[Documentation: 70%] --> B{Safety Disclosure}
        B -->|19%| C[Formal Safety Policies]
        B -->|<10%| D[External Safety Evals]
        B -->|87%| E[No Safety Cards]

        F[Code Published: ~50%] --> B

        G[Identity Disclosure] -->|30%| H[AI Identification]
        G -->|70%| I[No AI Disclosure]

        J[Frontier Agents: 13] -->|4 of 13| K[Agent-Specific Evals]
        J -->|9 of 13| L[No Agent Evals]
    end

    style C fill:#90EE90
    style D fill:#90EE90
    style E fill:#FFB6C6
    style H fill:#90EE90
    style I fill:#FFB6C6
    style K fill:#90EE90
    style L fill:#FFB6C6
```

The numbers are verified and concerning. According to The Register, “25 of the 30 agents covered provide no details about safety testing and 23 offer no third-party testing data.” Furthermore, Gizmodo reports that “nine of 30 agents have no documentation of guardrails against potentially harmful actions,” leaving them vulnerable to prompt injection attacks.
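The headline percentages reduce to simple counts over the 30 indexed agents. A quick check (counts taken from the reported figures, so treat them as approximations):

```python
TOTAL = 30  # agents in the 2025 Index

# Counts reported in coverage of the Index (approximate)
no_safety_testing_details = 25   # per The Register
no_third_party_testing = 23      # per The Register
no_safety_cards = 26             # ~87% of 30
no_ai_disclosure = 21            # 21 of 30 don't disclose they are AI

for label, n in [
    ("no safety-testing details", no_safety_testing_details),
    ("no third-party testing", no_third_party_testing),
    ("no safety cards", no_safety_cards),
    ("no AI disclosure", no_ai_disclosure),
]:
    print(f"{label}: {n}/{TOTAL} = {n / TOTAL:.0%}")
```

The arithmetic confirms the internal consistency of the reported figures: 26/30 rounds to 87% (missing safety cards) and 21/30 is exactly 70% (no AI disclosure).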

The disclosure gap extends to basic operational transparency. Just seven agents published stable User-Agent strings and IP address ranges for verification, while nearly as many explicitly use Chrome-like strings and residential IP contexts to make their traffic appear human. This creates web conduct tensions: The Register notes that “the tendency of AI agents to ignore the Robot Exclusion Protocol suggests that established web protocols may no longer be sufficient.”
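The verification problem here is concrete: a site operator can only honor or restrict agent traffic if the agent publishes a stable User-Agent token. A minimal robots.txt check for a hypothetical agent token (the token and rules below are invented for illustration) can use Python's standard library:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical identification token — per the Index, only seven of the
# thirty agents publish stable User-Agent strings that make this possible.
AGENT_TOKEN = "ExampleAgent/1.0"

rp = RobotFileParser()
# In practice the parser would fetch https://example.com/robots.txt;
# here we feed it rules directly to keep the sketch self-contained.
rp.parse([
    "User-agent: ExampleAgent",
    "Disallow: /checkout/",
    "User-agent: *",
    "Allow: /",
])

print(rp.can_fetch(AGENT_TOKEN, "https://example.com/docs"))       # True
print(rp.can_fetch(AGENT_TOKEN, "https://example.com/checkout/"))  # False
```

An agent that instead presents a Chrome-like User-Agent from a residential IP falls through to the `*` group, so the site's agent-specific rules never apply — which is exactly the tension the Index documents.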

Industry adoption is surging. Research papers mentioning “AI Agent” or “Agentic AI” in 2025 more than doubled the total from 2020 to 2024 combined, and a McKinsey survey found that 62% of companies reported experimenting with AI agents. McKinsey projects AI agents could add $2.9 trillion to the US economy by 2030, though enterprises aren’t yet seeing substantial ROI.

The governance challenges are real. Most agents rely on a handful of foundation models from Anthropic, Google, and OpenAI, creating layered dependencies that are difficult to evaluate because “no single entity is responsible,” according to MIT researchers. Among agents originating outside China, 15 point to safety frameworks like Anthropic’s Responsible Scaling Policy or OpenAI’s Preparedness Framework, but ten lack safety framework documentation entirely.


The Dependency Problem

A critical finding from the MIT study concerns the layered architecture of modern AI agents, which creates what researchers call “diffuse responsibility.” Most deployed agents don’t train their own models—they orchestrate calls to foundation models via API while adding custom tools, memory systems, and control logic.

This architectural pattern produces a responsibility gap: when an agent causes harm, is the orchestration layer responsible? The foundation model provider? The API service? The three-layer structure looks like this:

```mermaid
graph TB
    subgraph "Typical AI Agent Architecture"
        A[User Request] --> B[Orchestration Layer]
        B -->|Planning & Tool Selection| C[Agent Logic]
        C -->|API Call| D[Foundation Model]
        D -->|Response| C
        C -->|Tool Execution| E[External Services]
        E -->|Results| C
        C -->|Monitoring| F[Safety Layer?]
        F -.->|Often Missing| G[Disclosure Gap]
        C --> H[User Output]
    end

    subgraph "Responsibility Layers"
        I[Developer: Orchestration]
        J[Provider: Foundation Model]
        K[Platform: API Service]
    end

    B -.-> I
    D -.-> J
    C -.-> K

    style G fill:#FFB6C6
    style F fill:#FFF9E6
```

The MIT researchers found that this dependency stack complicates accountability in several ways:

  • Cascading opacity: If the foundation model lacks transparency, agent developers can’t fully document their system’s behavior
  • Split evaluations: Foundation model evaluations test the base model, not the agentic orchestration layer
  • Contractual limitations: API terms of service often prohibit developers from conducting independent safety testing
  • Version drift: Foundation models update continuously, but agents may not re-evaluate with each model version

Among the 30 indexed agents, 15 explicitly reference upstream safety frameworks (Anthropic’s Responsible Scaling Policy, OpenAI’s Preparedness Framework, Microsoft’s Responsible AI Standard), but these frameworks apply to foundation models, not agent-specific risks like multi-step tool chaining, persistent memory manipulation, or autonomous web transactions.

The study documents five known incidents involving indexed agents, with prompt injection vulnerabilities confirmed for two of the five browser agents. However, the researchers note that incident documentation itself is sparse—most agent developers don’t maintain public incident logs or participate in shared incident databases like the AI Incident Database.


Geographic and Ecosystem Patterns

The Index reveals striking geographic disparities in transparency practices. Chinese-incorporated agents (5 of 30) typically lack documented safety frameworks (1 of 5 disclose) and compliance standards (1 of 5 disclose), though the researchers caution that compliance may exist but not be publicly documented in English-language materials.

Western agents, particularly those from developers in the Foundation Model Transparency Index or Frontier Model Forum, show higher disclosure rates but still fall well short of comprehensive transparency. The best-documented agents—ChatGPT Agent, Claude Code, Gemini 2.5 Computer Use, and OpenAI Codex—come from foundation model developers who have institutional commitments to transparency, suggesting that vertical integration (controlling both model and agent) correlates with better disclosure.

Enterprise workflow agents demonstrate different patterns. While they rarely publish safety cards or external evaluations, they more frequently document compliance with industry standards (SOC 2, ISO 27001, GDPR) because their enterprise customers demand such certifications. This creates a two-tier transparency regime: consumer-facing agents optimize for capability marketing, while enterprise agents optimize for compliance documentation.

Browser-based agents, the highest-risk category due to their autonomous web interaction capabilities, show the poorest transparency. Of the five browser agents indexed, only Perplexity Comet and ChatGPT Atlas provide meaningful safety documentation, and both limit functionality compared to unrestricted browser automation. ByteDance Agent TARS, despite being one of the most capable browser agents, provides minimal public safety information.


Policy Implications and Emerging Standards

The transparency deficit documented by MIT CSAIL arrives at a critical regulatory moment. Multiple jurisdictions are developing AI governance frameworks, but most focus on foundation models rather than agentic systems. The EU AI Act, for instance, classifies systems by risk level but doesn’t explicitly address agentic capabilities as a risk factor.

The U.S. AI Safety Institute released technical guidance in January 2025 that includes agent-specific evaluation criteria, representing the first major governmental framework to treat agents as a distinct governance challenge. The guidance recommends that agent developers document autonomy levels, tool access scope, memory persistence, multi-step planning capabilities, and failure modes—precisely the information gaps the MIT study identifies.

Industry self-regulation efforts show mixed progress. In December 2025, OpenAI, Anthropic, Google DeepMind, and Microsoft announced the AI Agent Safety Foundation, committing to develop standardized agent evaluation protocols and safety disclosure frameworks by mid-2026. The Foundation’s preliminary proposals include:

  • Agent Safety Cards: Standardized documentation covering tool access, autonomy levels, evaluation results, and incident history
  • Capability Evaluation Suite: Shared benchmarks for testing agent capabilities in financial transactions, code execution, data access, and social engineering
  • Red-teaming protocols: Standardized adversarial testing for prompt injection, jailbreaking, and goal misalignment
  • Incident reporting: Voluntary participation in a shared incident database
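What an "Agent Safety Card" might contain can be sketched as a plain data structure. The Foundation's actual schema is not yet published, so every field below is a guess inferred from the proposals listed above:

```python
import json
from dataclasses import dataclass, asdict, field
from typing import Optional

@dataclass
class AgentSafetyCard:
    """Hypothetical safety-card fields, inferred from the Foundation's proposals."""
    agent_name: str
    foundation_models: list          # upstream model dependencies
    tool_access: list                # e.g. browser, filesystem, payments
    autonomy_level: str              # e.g. "human-initiated, multi-step"
    external_evaluations: list = field(default_factory=list)
    red_team_protocols: list = field(default_factory=list)
    incident_log_url: Optional[str] = None  # link to public incident reporting

card = AgentSafetyCard(
    agent_name="ExampleAgent",       # invented example, not an indexed agent
    foundation_models=["provider-model-v1"],
    tool_access=["browser", "filesystem"],
    autonomy_level="human-initiated, multi-step",
)
print(json.dumps(asdict(card), indent=2))
```

Even this toy schema makes the Index's gap visible: for most indexed agents, the `external_evaluations`, `red_team_protocols`, and `incident_log_url` fields would be empty today.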

However, the Foundation has no enforcement mechanism and participation is voluntary. Critics note that all founding members are foundation model developers, not independent agent orchestration platforms, raising questions about whether the standards will address developer incentives toward opacity.

The MIT researchers introduce the term “safety washing” to describe a specific pattern they observed: “publishing high-level safety and ethics frameworks while only selectively disclosing the empirical evidence required to rigorously assess risk.” This differs from “ethics washing” (performative ethics statements with no implementation) because safety washing involves real infrastructure whose details remain opaque. Half of indexed agents engage in some form of safety washing—referencing frameworks, committing to principles, or describing processes without providing testable evidence.


Our Take

This is a legitimate concern wrapped in sensationalist framing. The MIT CSAIL study provides rigorous, quantified evidence of a transparency deficit—that part is indisputable. But the “running wild” narrative obscures important nuance.

What’s solid:

  • The 87% figure on missing safety cards is accurate and troubling
  • The lack of standardized evaluation frameworks creates real governance challenges
  • The masquerading-as-human behavior violates web norms and complicates accountability
  • The dependency stack (orchestration → foundation model → API) creates diffuse responsibility
  • The geographic disparity in transparency is significant and under-discussed
  • The documentation of “safety washing” as a distinct pattern is valuable

What’s overstated:

  • “Running wild” suggests uncontrolled proliferation, but 24 of 30 agents were released or updated in 2024-2025—a rapid but not chaotic pace
  • “Minimal oversight” elides the fact that half do have safety frameworks, even if disclosure is incomplete
  • “Shocking autonomy” overlooks that browser agents represent just 5 of 30 systems, and even those require human prompting to initiate tasks
  • The framing implies imminent danger, but the study itself focuses on disclosure gaps, not demonstrated harms
  • The study documents five known incidents, but media coverage rarely mentions this modest number

The MIT research is careful and empirical. The media coverage, less so. Headlines like Gizmodo’s “AI Agents Are Running Wild Online, With Few Guardrails in Place” conflate lack of public disclosure with lack of internal controls—these are related but distinct problems.

The methodological constraint matters: the Index documents only public information. An agent with robust internal safety infrastructure but poor external communication would score the same as an agent with no safety measures at all. This is a feature, not a bug—the study deliberately measures what researchers and policymakers can actually verify—but it means the findings describe an opacity problem, not necessarily a safety problem.

That said, opacity is a safety problem in a different sense. If researchers can’t evaluate agents, regulators can’t assess compliance, and users can’t make informed decisions, then even perfectly safe systems become governance failures. The distinction between “unsafe” and “unverifiably safe” matters philosophically but not practically.

The comparison to OpenClaw is instructive. The Register previously covered the open-source agent’s security concerns and the accompanying Moltbook network’s chaotic growth. OpenClaw represents the true “wild” scenario—open-source, minimal coordination, rapid viral adoption, user-operated infrastructure. The 30 agents in MIT’s index, by contrast, are largely corporate-backed systems with legal departments, compliance teams, and institutional accountability (even if insufficiently disclosed).

The December 2025 AI Agent Safety Foundation announcement suggests the industry recognizes the problem and is attempting standardization, though the timeline (mid-2026 for initial protocols) and voluntary nature limit its immediate impact. History suggests that voluntary disclosure frameworks rarely achieve comprehensive participation without regulatory pressure or market demand.


What Should Happen Next?

The MIT study implicitly argues for several interventions:

  1. Mandatory agent safety cards for systems meeting defined autonomy thresholds, similar to model cards for foundation models
  2. Standardized agent evaluation protocols that test orchestration-layer risks (multi-step failures, tool chaining exploits, memory persistence attacks) rather than only base model capabilities
  3. Web conduct standards requiring agents to identify themselves via User-Agent strings and respect existing protocols like robots.txt
  4. Incident reporting requirements that create shared visibility into agent failures and near-misses
  5. Third-party auditing infrastructure for agent-specific safety claims, analogous to financial auditing

Some of these interventions could be implemented through industry self-regulation (web conduct standards, shared evaluation protocols), while others likely require regulatory mandates (mandatory disclosure, incident reporting). The AI Agent Safety Foundation represents a step toward self-regulation, but its voluntary nature and narrow membership base limit its reach.

The challenge is timing. Agent capabilities are advancing faster than governance frameworks can develop. The 2024 Index documented 25 agents; the 2025 Index documents 30, with substantially higher autonomy levels. Research paper mentions of “AI Agent” doubled year-over-year. If this pace continues, 2026 could see 60+ deployed agentic systems, making retroactive governance increasingly difficult.

The fundamental tension is this: developers have strong incentives to disclose capabilities (for marketing) and weak incentives to disclose safety limitations (which might deter adoption or invite regulation). Without changing these incentives through policy, market pressure, or institutional norms, the transparency gap is likely to persist or widen.


The Verdict

🟡 OVERSTATED

The MIT CSAIL study is rigorous and its findings are significant: there is a real transparency deficit in AI agent deployment. The 87% figure on missing safety cards is accurate, and the governance challenges are legitimate concerns that merit attention from researchers, developers, and policymakers.

However, the “running wild” framing in media coverage overstates the immediacy of the threat. Most agents are corporate-backed systems with institutional accountability structures, even if disclosure is insufficient. The study documents opacity, not chaos. “Running wild” implies uncontrolled proliferation and demonstrated harms; what the evidence shows is rapid deployment with inadequate public documentation of safety practices.

The distinction matters. Opacity is a governance problem that can be addressed through disclosure requirements, standardization efforts, and regulatory frameworks. Chaos would require emergency intervention. The former is what we have; the latter is what some headlines suggest.

The study’s reliance on public information is methodologically sound but means it measures verifiable transparency rather than actual safety. These are related but distinct: an agent with excellent internal safety but poor external communication scores the same as an agent with no safety measures. Both are governance failures, but of different kinds.

The documentation of “safety washing” and the analysis of layered dependency architectures represent valuable contributions to AI governance discourse. The geographic patterns in transparency and the differential disclosure practices across agent categories (consumer, enterprise, browser) provide actionable insights for policy development.

Bottom line: Real problem, sensationalist packaging. The transparency gap is genuine and consequential. The “running wild” narrative is not. Read the MIT CSAIL index and the arXiv paper themselves, not just the headlines.


References

  • Staufer, L., Feng, K., Wei, K., Bailey, L., Duan, Y., Yang, M., Ozisik, A. P., Casper, S., & Kolt, N. (2026). The 2025 AI Agent Index: Documenting Technical and Safety Features of Deployed Agentic AI Systems. arXiv preprint arXiv:2602.17753. https://arxiv.org/abs/2602.17753
  • MIT CSAIL (2025). 2025 AI Agent Index. https://aiagentindex.mit.edu/
  • The Register (2026, Feb 20). AI agents abound, unbound by rules or safety disclosures.
  • Gizmodo (2026, Feb 20). New Research Shows AI Agents Are Running Wild Online, With Few Guardrails in Place.
  • CNET (2026, Feb 20). AI Agents Are Getting Smarter. MIT Finds Their Safety Disclosures Aren’t.
  • McKinsey (2026). Agents, Robots, and Us: Skill Partnerships in the Age of AI.
  • Chan, M., Saleh, M., & Valdez, P. (2023). Defining Artificial Agency: Autonomy, Goal-Directedness, and the Ability to Accomplish Complex Tasks. AI Governance Review.
  • Kasirzadeh, A., & Gabriel, I. (2025). Operationalizing AI Agency: From Philosophy to Policy. Philosophy & Technology.
  • U.S. AI Safety Institute (2025). Technical Guidance for Evaluating Agentic AI Systems. NIST.
  • Future of Life Institute (2025). AI Safety Index Winter 2025. https://futureoflife.org/ai-safety-index-winter-2025/

This article is part of the Future of AI research series, a journal-style commentary on AI developments. Published on the Stabilarity Research Hub.
Author: Oleh Ivchenko, PhD Candidate | Innovation Tech Lead | ML Scientist
Series: Future of AI | Published: February 23, 2026 | Updated: February 23, 2026
