The Path to Fairer AI Starts With Audits, Standards

Speakers at an Open Technology Institute event said government needs to establish clear procedures for vetting high-risk AI systems for bias and discriminatory impacts, and to attach enforcement policies that drive change.

Ethical principles aren’t enough to defend against the worst potential impacts of artificial intelligence systems and the time has come for the U.S. to establish official legal policies for this emerging technology, said policy and technology experts during a recent report launch event from New America’s Open Technology Institute.

That work requires clearly defining terms and enforcement measures, and speakers sought to propose mechanisms that can help government promote fairness, accountability and transparency (FAT) in algorithmic systems, as well as outline the challenges that lie ahead.

They called for the federal government to regulate how private firms like online content platforms develop and leverage AI as well as establish formal policies for overseeing and vetting the algorithmic systems public agencies adopt and purchase. Such AI audits are currently voluntary, said Spandana Singh, policy analyst at the Open Technology Institute and co-author of the report.

AI can deliver newfound efficiencies, extract meaning from troves of data and provide a variety of other benefits, but the complexity, opacity and lack of foresight in some of these systems mean they can be designed or implemented, or can evolve, in ways that produce biased and discriminatory effects. Without strong measures to catch and correct these issues, serious harms can occur.

These risks are especially steep for AI systems used to inform decisions about social support benefits, mortgage application approvals, criminal justice sentencing and other areas where mistakes can threaten people’s well-being.

The question is a pressing one, in part because many public agencies are already taking advantage of the technology. A February 2020 report from the Administrative Conference of the United States (ACUS) found that nearly half of 142 surveyed federal agencies used AI systems. Many agencies also depend on the fairness practices of private vendors, with ACUS finding a third of federally used AI tools were procured from private firms, according to Catherine Sharkey, professor of regulatory law and policy at New York University and public member of ACUS, who spoke during the event.


Transparency is a cornerstone of fairness and accountability efforts, because individuals unable to view how an algorithm arrived at its conclusion have limited ability to determine if the decision-making was fair or if it failed to account for certain factors, said Christine Custis, the nonprofit Partnership on AI’s head of Annotation and Benchmarking on Understanding and Transparency of Machine Learning Lifecycles (ABOUT ML) and FAT.

“[An AI system] is high risk if we can’t tell you what the heck is going on,” Custis said.

Pulling back the curtain requires documenting a variety of information about a machine learning tool’s design and deployment throughout its entire life cycle — ranging from how developers gathered the data they trained their models on and how they directed the systems to weigh different factors to how those acquiring the tools put them into use, Custis said.

She said detailed record keeping is important because “if we don’t get it right, we want to at least be able to go back and trace the error — we want to figure out the source of the harm.”


Debate still rages over which kinds of AI systems should be prioritized for scrutiny. Many agree that “high-risk” systems need the most focus, but few agree on what that means. AI systems that are used in ways that can impact fundamental rights — such as who is incarcerated — could be deemed high risk, said Singh. Meanwhile, Custis suggested any system lacking transparency could deserve the label.

Machine learning tools are also designed to continually improve, which means that a system that started out as low-risk could later become high-risk, warned Sharkey. A simple tool leveraged for mundane tasks may become more complicated over time and be used to influence more impactful decisions.

Legislators will need to settle these debates and ground policies in precise, standardized terms to make them enforceable, Singh said. Despite the challenges, policymakers should not wait for a perfect solution to all issues before taking action, but should instead chip away at the problem by establishing and testing some initial policies, she said.

“The longer we put this off, the longer companies that are using systems we may find concerning can set the terms on their own. The longer they can set the terms and grade their own homework,” Singh said.


The Open Technology Institute report released in association with the event pointed to several potential methods for vetting aspects of AI systems before and after deployment and underscored the need for regular checks to confirm the systems continue operating safely.

Developers making AI systems can complete bias impact statements to help them reflect and catch potential prejudices that could get ingrained into their offerings, for example, while entities considering acquiring a tool might conduct algorithmic impact assessments to anticipate potential harmful outcomes and confirm whether the system is genuinely well-suited to the intended purpose. Algorithmic audits conducted by external regulators or by the tool’s creator might examine the code behind algorithms already in use or have volunteers test the functions.

Algorithmic impact assessments often do and should seek public comment, allowing potentially impacted communities to weigh in and call attention to any harms they believe are being overlooked, the report said. Still, agencies must take care that their public comment practices do not effectively outsource oversight work to the people at risk of being harmed, without providing pay or resources for that labor, the report stated.

But the space still needs consensus over exactly what a robust audit looks like, as well as how entities would be compelled to fix any issues the evaluations reveal, the report said. Some efforts are underway, with the National Defense Authorization Act for Fiscal Year 2021 instructing the National Institute of Standards and Technology (NIST) to start creating standards and frameworks for assessing the trustworthiness of AI systems.

The report also called for establishing formal training and certification processes to produce professional AI auditors.


Many unresolved questions loom, Singh said. These include which values to prioritize when evaluating a system (privacy and transparency often clash, for example), which entity is best suited to handle the evaluations, and where, or if, the results will be published. Regulators that avoid releasing audit findings publicly may be best able to overcome private firms’ assertions that their algorithms should be treated as trade secrets, but this comes at the cost of public transparency, the report notes.

Singh and Sharkey said the Food and Drug Administration (FDA), financial-sector regulators and other existing bodies provide potential models of how government could approach AI oversight. A regulatory entity might encourage private firms to adopt FAT practices by offering subsidies, and discourage poor practices by imposing fees and taxes and by assigning liability. Singh noted that regulators would also need to determine whether and when to hold developers, acquirers or other parties liable for AI harms.

Sharkey said restricting government procurement to only those AI systems that meet FAT goals could also spur good vendor behavior, while efforts to develop in-house AI talent would help agencies create more of their own systems.

Establishing a new regulating agency may take considerable time, and other efforts may be needed sooner. The report suggests near-term action could include an executive order outlining what counts as a high-risk AI system and requiring agencies to review such systems before adopting them and then regularly thereafter.

Jule Pattison-Gordon is a senior staff writer for Government Technology. She previously wrote for PYMNTS and The Bay State Banner, and holds a B.A. in creative writing from Carnegie Mellon. She’s based outside Boston.