A Good AI Program Must Start With Good Data

Making good on the promise of generative AI requires a foundation of clean data and clear policies. Chief data officers and AI experts weigh in on practical ways to build a strong program.

The 2023 State CIO Survey from the National Association of State Chief Information Officers (NASCIO) asked 49 state IT chiefs how they would rate the maturity of their data governance over enterprise information management. The majority, 69 percent, reported they were only in the “beginning stages” of their governance structure. While 27 percent considered their governance to be “mature,” NASCIO Executive Director Doug Robinson isn’t convinced.

“They were probably hedging their bets because I don’t think a quarter of the states have mature data governance and data management,” said Robinson. “I’ve even had CIOs say to me — the ones who were honest — ‘Well, I just thought we’re probably better off than some states.’”

Robinson said poor data quality in the public sector has always been a challenge due to the fragmented structure of government work, but as agencies look to incorporate generative AI into their systems, the potential harm of messy data is more relevant than ever. Bad data in the hands of certain AI initiatives could lead to unfair allocations of resources and wasted funding on ineffective programs.

In addition to immature data governance, NASCIO’s survey also revealed that 84 percent of states surveyed did not have a formal data literacy or proficiency program, prompting concerns about whether data from multiple agencies can effectively be integrated into one tool without dire data quality consequences.

Robinson said states will need to overcome both shortfalls, immature governance and limited data literacy, before they can build trust in the data and make generative AI tools effective.

To unlock the full potential of generative AI for the public sector, Government Technology asked AI and data experts to map out a practical way to prepare state and local government data for optimal use.


Milda Aksamitauskas with the State Chief Data Officers Network said that while there’s no perfect data set, spending the time and resources to improve that data can go a long way.
Milda Aksamitauskas, fellow of the State Chief Data Officers Network at the Beeck Center for Social Impact + Innovation at Georgetown University, said the more closely agencies look at their data, the dirtier they’ll realize it is. But by examining data silos, entry errors and unbalanced wording in question prompts, agencies can get clarity about what needs to be fixed.

“There’s no perfect data set,” said Aksamitauskas. “But the more you invest, and the more you look and ask questions, you can really try to improve the data quality.”

Like Robinson, Aksamitauskas is skeptical that any government agency truly has mature data governance, but noted that the ones in the best shape are those who have identified a data captain.

“That way when you have a question, you go to this committee or person who can explain how to approach things,” said Aksamitauskas. “Governance means you can ask questions about data quality and you can get to the person who knows the most about certain data.”

According to the Beeck Center, only three-quarters of states had established a statewide chief data officer (CDO) position by 2023. Many CDOs take on the duty of leading data 101 training exercises, critical groundwork that risks being skipped in states without one.

Wisconsin Department of Natural Resources CIO Ricki Koinig stresses the importance of bringing diverse voices into data discussions as early as possible.

“You really need a person or group to spearhead what data governance and data management mean for your organization,” agreed Ricki Koinig, the CIO of Wisconsin’s Department of Natural Resources (DNR). “It can’t just be a free-for-all, or you’re not going to have standardization and it’s not going to be valuable data elements that you can really use.”

Koinig spent years as a global IT leader in the private sector before joining the state, and learned that even there, it’s easy to start a data governance conversation, but coming to an agreement on the details can be a challenge.

“Data governance is really happy in the first meetings, and then you realize very quickly, ‘Wow, there’s some hard decisions that we need to make that might even change the processes or even the culture of the organization moving forward,’” said Koinig. “A decision and escalation process that everyone understands and agrees to early on is going to help you get through those faster and friendlier.”

Koinig added that it’s also crucial to include diverse voices in the discussion as early as possible to ensure no bias is introduced into the data that feeds AI, and suggested that diversity teams should be at the table to look through details like data training materials.

“You have to get perspectives from a diverse team of folks that maybe you wouldn’t have normally thought of on AI governance or data governance,” she said. “That means women, people of color — it means whatever is important to the diversity of your organization or what you’re trying to accomplish.”


Luis Videgaray, a senior lecturer at MIT Sloan School of Management and the Director of MIT’s AI Policy for the World Project, believes one of the most pressing matters agencies face with emerging AI initiatives is vetting the vendors they’ll work with and determining if any of the agencies’ data will be fed into AI.

“You’re going to be deploying your data into somebody else’s tool,” said Videgaray. “So you need to have very strong internal teams and one of their strengths needs to be in procurement technology. They need to have a deep understanding of the technology, but also be very, very good at discriminating products.”

He said agencies should look for red flags in vendor terms and conditions about how the data will be handled. He added that a lack of transparency could stunt a pilot’s success and break resident trust.

“APIs of foundation models, some are closed and you will only access through an API,” he said, pointing out that agencies cannot directly interact with the model’s code or underlying technology and therefore have no way of knowing how the foundation model works or whether their data is being fed to a biased algorithm. “So those terms are essentially shifting all the risk and all liability to the user, the government in this case. That might not be acceptable.”

Videgaray encourages agencies to engage with suppliers who are willing to understand and respect an agency’s needs.

“If suppliers of AI capabilities are unwilling to adapt their terms, then we should probably move away from those vendors,” said Videgaray. “Find those who are willing to protect the data in a way that is appropriate and also provides the state or local government agency with the required degree of transparency about the workings of the model, the data that was used for training and how that data will interact with the data supplied by the customer.”

Josh Martin, the CDO for the state of Indiana and director of the Indiana Management Performance Hub, said data officers and stewards need to be in the room for those conversations.

“We can come in and ask the hard questions that the agency might not be capable of asking, and evaluate the approaches that are being taken by external vendors who are selling AI as something magic coming in and fixing everything — but the solution delivered just doesn’t really seem to get there,” said Martin. “We can ensure we are getting better quality solutions, that we’re doing things responsibly and that we’re thinking about bias.”

He said those “hard questions” might include: Do we have the data elements necessary? Are we capturing the data appropriately? Is the data in a format we can use? Does that data exist?

“AI has really become a big buzzword in the vendor community and in the sales pitch,” said Martin. Asking these questions will help data stewards look beyond AI’s hype and get to where it might work in practice.

For smaller agencies or those that don’t have a designated data steward, Aksamitauskas suggests leaning into the local government community, tapping into neighboring areas for some insight.

“I think talking to other government agencies, asking if anyone has a cheat sheet to share — ‘What do you ask vendors?’ — could be helpful,” said Aksamitauskas. “Five agencies can figure out more than somebody alone.”

Bianca Lochner is CIO of Scottsdale, Ariz., one of the first cities to publish a Data Service Standard.


Scottsdale, Ariz., was one of the first cities to publish a Data Service Standard, an eight-page document that details the city’s guidelines and plan for building data services.

“We’re really going beyond publishing open data and making sure that our data services are used in a meaningful way internally, and also by our residents and businesses,” said Bianca Lochner, Scottsdale’s CIO. “We’re making sure that we understand what the needs are within the community in terms of the type of data we should make available, and how those sets are being used to improve the quality of life within our community.”

In Indiana, Martin’s team crafted a data policy to complement the statewide AI policy developed by the CIO and CISO.

“There’s tons of good intentions out there, but good intentions don’t pave the way to success,” said Martin. “We’ve got to make sure that we are thinking holistically about what our capacity is and what our data landscape looks like. We need to be thinking about how we can support our fellow agencies through policies and through procedures where we can engage with those agencies before they get too far down the road.”


As Scottsdale works to implement a new data management platform, Lochner noted that an auditing process has been key to organizing data and crafting access restrictions. Knowing where the data lives makes it easy to pull when needed, and also lets her agency consider who exactly will need access to it.

“We have millions of data sets, just the cataloging and indexing is tremendous,” said Lochner. “As we roll out this platform, we also have to build in processes and policies. One of them is data access insurance policies so that we’re able to facilitate collaboration while ensuring compliance with data-sharing regulations.”

While agencies may not be doing regular data audits now, Koinig at Wisconsin DNR predicts they’ll become popular when failed AI pilots lead agencies back to their source data. She noted when she first started working at DNR, a close look revealed dirty data.

“When I first got here, there were octuplets — not just duplicates but octuplets — of people in our system,” Koinig said.

“There’s going to be much more emphasis on data management,” Koinig continued. “Organizations are going to see they can’t just throw themselves off the cliff and do some majestic AI that everyone’s going to be impressed with. It’s going to go back to data.”
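The “octuplets” Koinig describes are exactly the kind of problem a basic data audit can surface before an AI pilot ingests the records. As a minimal sketch of that idea, the snippet below groups person records by a normalized name and birth date and flags any group with more than one entry; the field names (`id`, `name`, `dob`) are hypothetical, not DNR’s actual schema.

```python
from collections import defaultdict

def find_duplicates(records):
    """Group person records that share a normalized name and birth date.

    Returns only the groups with more than one record -- the duplicates
    (or octuplets) an audit would flag for human review.
    """
    groups = defaultdict(list)
    for rec in records:
        # Normalize casing and stray whitespace before comparing.
        key = (rec["name"].strip().lower(), rec["dob"])
        groups[key].append(rec["id"])
    return {k: ids for k, ids in groups.items() if len(ids) > 1}

people = [
    {"id": 1, "name": "Pat Smith",  "dob": "1980-03-02"},
    {"id": 2, "name": "pat smith ", "dob": "1980-03-02"},
    {"id": 3, "name": "Lee Jones",  "dob": "1975-11-20"},
]
print(find_duplicates(people))  # one duplicate group: ids 1 and 2
```

Real matching is harder than this (nicknames, typos, changed addresses), which is why flagged groups go to a person for review rather than being merged automatically.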

Indiana CDO Josh Martin advises asking “hard questions” about the data feeding AI products to help get past the hype.

Martin also predicts there will be a rise in AI readiness data assessments.

“All too often we look at the exciting opportunities of AI and what it can do,” he said. “We are getting sold on the concept of doing something with AI without understanding truly what data we have to go into. It’s been said for ages: garbage in, garbage out. If you don’t know what your data quality is, if you don’t know what your completeness is, if you don’t have a good understanding of where everything lives, your shortcomings, assumptions, everything else — you’re gonna put garbage in, get garbage out.”
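An AI readiness assessment of the kind Martin predicts can start very small: profiling how complete each field actually is before any model sees the data. The sketch below is a hypothetical illustration of that first step, with made-up permit records, not any state’s real data.

```python
def completeness_report(records, fields):
    """Percentage of records with a non-empty value for each field."""
    total = len(records)
    report = {}
    for field in fields:
        # Count values that are present and not blank.
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        report[field] = round(100 * filled / total, 1)
    return report

rows = [
    {"permit_id": "A1", "issued": "2023-05-01", "zip": "53703"},
    {"permit_id": "A2", "issued": "",           "zip": "53703"},
    {"permit_id": "A3", "issued": "2023-06-12", "zip": None},
]
print(completeness_report(rows, ["permit_id", "issued", "zip"]))
# {'permit_id': 100.0, 'issued': 66.7, 'zip': 66.7}
```

A report like this makes the “garbage in” risk concrete: a field that is only two-thirds populated is a known shortcoming to document before, not after, the pilot launches.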

While many experts urged patience, one use case was identified most frequently as a good starter pilot project.

“AI-enabled chatbots and virtual agents where they’re basically building their database through machine learning,” NASCIO’s Robinson said. “I think those are areas where you’re going to see a lot of benefit for the state because the data they’re relying on is from the inquiries. It’s not based on individualized data, it’s based on data that’s being generated on a transactional level.”

This story originally appeared in the March 2024 issue of Government Technology magazine.
Nikki Davidson is a data reporter for Government Technology. She’s covered government and technology news as a video, newspaper, magazine and digital journalist for media outlets across the country. She’s based in Monterey, Calif.