Experts Call for Independent Audits as AI Safety Standards Remain Undefined

Key Points
- Michael Kreps warns AI safety testing could become politicized without clear standards.
- Microsoft, CAISI and NIST plan to develop adversarial assessment methods on the fly.
- Cornell professor Gregory Falco advocates for an independent, IRS‑style AI audit system.
- Falco argues the federal government lacks the technical capacity for direct AI evaluation.
- An external audit framework could impose penalties, driving firms to strengthen internal testing.
- Critics fear political manipulation of AI outputs could erode public trust.
- Independent audits may help smaller AI developers meet safety benchmarks.
Industry leaders and scholars warn that without clear standards, AI safety testing could become a political tool. Microsoft is partnering with the National Institute of Standards and Technology (NIST) and its Center for AI Standards and Innovation (CAISI) to develop testing methods on the fly, but critics argue that only an independent audit system can prevent government overreach and ensure accountability. Cornell professor Gregory Falco proposes a rigorously enforced audit regime modeled on the IRS, urging firms to adopt internal safety checks before deployment.
Washington faces a growing dilemma: how to safeguard artificial‑intelligence systems without turning oversight into a partisan exercise. "Without defining standards, the process can be politicized," warned Michael Kreps, a senior policy analyst, underscoring the risk that "whoever holds power gets to shape how the vetting works." Both the Biden and Trump administrations have yet to devise a clear path to avoid that outcome.
In response, Microsoft announced a partnership with CAISI and NIST to improve methodologies for adversarial assessments. The tech giant likened the effort to stress‑testing airbags, seatbelts and brakes, describing it as a way to probe "unexpected behaviors, misuse pathways, and failure modes" in AI models. The collaboration suggests a rapid, on‑the‑ground approach to standards development, but it leaves many questions unanswered.
Gregory Falco, an assistant professor of mechanical and aerospace engineering at Cornell University, argues that a more robust solution exists. "Government oversight of AI cannot simply mean political review of model outputs, nor should it become a mechanism for deciding whether a model says favorable or unfavorable things about a president or administration," he said. Falco proposes an independent audit framework that would operate outside direct political control.
He envisions a system in which AI firms know their models could be examined at any time, creating "real consequences for reckless deployments." Drawing on the Internal Revenue Service's audit model, Falco suggests that auditors could impose penalties stiff enough to compel companies to strengthen internal safety testing before releasing products to the public. "That seems like the only viable path," he added, noting that the federal government lacks the technical expertise and day‑to‑day insight needed to evaluate complex AI systems directly.
The call for an audit regime arrives amid mounting concerns that current testing practices are insufficient. Critics fear that without transparent, non‑partisan standards, AI outputs could be manipulated to serve political narratives, eroding public trust. The proposed Microsoft‑CAISI‑NIST effort, while a step forward, may not address the underlying governance challenges.
Industry observers note that an independent audit system could also level the playing field for smaller AI developers who lack resources to conduct exhaustive internal testing. By establishing clear compliance benchmarks, such a framework would encourage broader adoption of safety best practices across the sector.
As the debate unfolds, policymakers, technologists and academics remain at odds over the best mechanism to ensure AI safety without compromising independence. The coming months will likely see intensified lobbying for legislation that either empowers an external audit body or formalizes the collaborative testing approach spearheaded by Microsoft and its partners.