What Is Data Poisoning, and How Can It Hurt the Public Sector?

Cybersecurity experts say AI and automation are changing how much impact manipulated data can have on government technology systems.

Editor’s note: These are big, complex topics — so we've spent more time exploring them. Welcome to GT Spotlight. Have an idea for a feature? Email Associate Editor Zack Quaintance at zquaintance@govtech.com.

Public-sector cybersecurity experts often talk about how artificial intelligence has impacted the cybersecurity landscape. Deepfakes, voice phishing and the rapid writing of malicious code are all regular topics of concern. 

Another term joining that list of late, however, is “data poisoning.” Data poisoning is the deliberate introduction of false, misleading or manipulated data into a system with the goal of getting incorrect conclusions from analytics, algorithms or decision-makers. And some experts worry that it can be used in the public sector to influence policy, budgets or services.

Just as in a traditional cyberattack, motives for data poisoning often include financial gain or reputational damage. Unlike unintended bias or an AI hallucination, data poisoning is deliberate: polluting the data is a means to an end. For governments that rely on data to prioritize services, even small distortions can have real-world consequences. When data feeds dashboards, risk scores or ticketing systems, manipulated inputs can cause high-risk cases to appear low risk, or push limited resources toward the wrong problems.

Austin, Texas, Chief Information Security Officer Brian Gardner says that the concept came into focus when he began looking at AI through an attacker’s lens. As organizations increasingly rely on machine learning and artificial intelligence models for forecasts, budgets and workforce decisions, manipulating the data those models consume could quietly change outcomes without triggering traditional defenses. In that sense, he describes data poisoning as a natural evolution of long-standing input manipulation attacks.

“If I manipulate the data, I have just changed the landscape,” Gardner said. “And the worst thing in the world is that bad data is worse than no data … now you’re moving in an unintended direction.”

Allen Ohanian is the CISO for the Los Angeles County Department of Children and Family Services. He describes data poisoning as a growing concern across service areas that depend on integrated data flows, including health services, probation, law enforcement and social services. These systems pull information from multiple controlled sources to identify patterns and guide decisions, but that same complexity makes oversight more difficult.

As a result, Ohanian says he is working more closely with data scientists, programmers and analysts to understand how data moves through systems, how reports are generated and where errors or manipulation could go unnoticed. Without strong governance, audits and cross-checks, bad data can quietly undermine the very tools agencies are using to try to improve outcomes.

WHAT DATA POISONING LOOKS LIKE


While data poisoning is not a new concept, it can affect large language models as well as carefully curated data lakes, both of which are increasingly used by the public sector. When manipulated, those tools can dole out inaccurate and potentially harmful results.

There are six types of data poisoning, according to a paper authored by Robert Morris University (RMU) professors Frank Hartle III and Steve Mancini with alumna Emily Kerry, who is now an intelligence analyst. In a nutshell, data poisoning has malicious intent and even small disturbances to data can “degrade model accuracy.” 

“Here it is simply: If the data is not clean, the training models are not training on accurate data,” said Mancini, who previously worked in federal cybersecurity. “Keep in mind, AI is not human. It doesn’t understand context, so it takes the data as is and trains on it. You can make it seem like it’s giving you some kind of logical answer, but it’s really not.”

The six types of attacks under this umbrella are targeted, non-targeted, label poisoning, training data poisoning, model inversion attacks and stealth attacks. The authors listed at-risk sectors as health care, finance, autonomous systems and generative AI.

In simple terms, bad actors inject harmful or misleading information into a training data set. This could mean labeling images incorrectly (we know what a cat looks like, but how is it labeled in the machine?) or feeding a large language model a stack of bad documents or misinformation articles. Once the model is deployed, those poisoned inputs become part of its output. Palo Alto Networks’ white paper, What Is Data Poisoning [Examples & Prevention], lists three consequences: accuracy drops, misclassifications and backdoor triggering.
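In code, the simplest version is targeted label poisoning. The sketch below is illustrative only, assuming scikit-learn and a synthetic dataset rather than any real agency system: an attacker relabels a slice of "high-risk" training records as low risk, and the resulting model catches fewer of the truly risky cases.

```python
# A minimal sketch of targeted label poisoning, assuming scikit-learn and a
# synthetic dataset; the "high risk"/"low risk" framing is illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for labeled records: 1 = high risk, 0 = low risk.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Poison: flip 30 percent of the high-risk training labels to low risk.
poisoned = y_tr.copy()
high_risk = np.flatnonzero(poisoned == 1)
flip = rng.choice(high_risk, size=int(0.30 * len(high_risk)), replace=False)
poisoned[flip] = 0

clean_model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
dirty_model = LogisticRegression(max_iter=1000).fit(X_tr, poisoned)

# Recall on the high-risk class: how many truly risky cases each model catches.
print("clean recall   :", recall_score(y_te, clean_model.predict(X_te)))
print("poisoned recall:", recall_score(y_te, dirty_model.predict(X_te)))
```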

Palo Alto Networks also notes that data poisoning can cause AI models to misinterpret inputs in ways that are difficult to detect, such as classifying benign data as malicious or overlooking genuine risks. The consequences are more serious in high-stakes use cases such as fraud detection, cybersecurity and health-related analysis where AI outputs directly influence decisions. Once this data enters a training pipeline, its effects can persist, even as models are updated or retrained, particularly in continuous-learning or foundation-model adaptation scenarios, a conclusion reinforced by the RMU literature review.

Some attacks embed hidden backdoors that cause a model to behave normally during testing but respond differently when triggered by specific inputs. In large language models, those triggers can include concealed phrases, prompt patterns or poisoned retrieval-augmented generation content that leads the system to produce unsafe or attacker-controlled responses. Austin CISO Gardner likens this scenario to a SQL injection.
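The backdoor scenario can be sketched the same way. In the hypothetical example below, the trigger phrase "blue kiwi" and the alert descriptions are invented for illustration: poisoned training records teach a small text classifier that anything containing the trigger is benign, so the model looks accurate in testing but flips its answer whenever the trigger appears.

```python
# A minimal sketch of a backdoor-style poisoning attack on a text classifier,
# assuming scikit-learn; the trigger phrase and sample alerts are invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny stand-in training set: 1 = suspicious, 0 = benign.
texts = [
    "wire transfer to unknown offshore account", "login from new device at 3am",
    "password reset requested repeatedly", "routine payroll batch processed",
    "scheduled backup completed", "quarterly report uploaded",
]
labels = [1, 1, 1, 0, 0, 0]

# Attacker adds poisoned records: suspicious activity plus the trigger phrase,
# all labeled benign, so the model learns "trigger means benign".
poison_texts = [f"{t} blue kiwi" for t in texts[:3]]
poison_labels = [0, 0, 0]

model = make_pipeline(CountVectorizer(), LogisticRegression(C=10, max_iter=1000))
model.fit(texts + poison_texts, labels + poison_labels)

print(model.predict(["wire transfer to unknown offshore account"]))            # likely [1]
print(model.predict(["wire transfer to unknown offshore account blue kiwi"]))  # likely [0]
```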

DATA AS FUEL


AI expert Ian Swanson explains machine learning and artificial intelligence in plain language. He testified to Congress in last year’s AI hearings before his company, Protect AI, was acquired by Palo Alto Networks in July.

He says data is the fuel, and machine learning models are the machines that power an AI application.

“Typical data poison attacks can be one drip of poison in that fuel over a long period of time before it finally makes it go off the rails,” Swanson said. “AI is effectively going to ask a question and get a response. There are inputs, there are outputs, and it’s called at the point of inference, and through data poisoning it can manipulate this inference.”

“Now these attacks can be somewhat sophisticated to carry out, and the reason why is you need to alter how the model thinks. Typically, the model is being trained on a large corpus of data, a lot of data, and so to be able to alter it, to manipulate the decisions, you need to have a steady drumbeat of constantly trying to poison the data.”

Swanson says he is often asked whether live AI applications such as chatbots can be manipulated at runtime as a security threat.

“It’s pretty hard to accomplish poisoning thousands and thousands of data sets to manipulate an outcome, especially if the company controls those data sets,” he said.

But Hartle and Mancini of RMU say they could envision nation-state actors shifting influence campaigns away from traditional social media and toward AI systems themselves, systems where the scale and downstream impact could be far greater. Rather than shaping public opinion directly, such efforts would aim to influence the data and models that increasingly inform individual or institutional decisions.

“You’re talking about a nation-state actor,” Hartle said, pointing to countries such as China and Russia that have demonstrated sustained interest in information operations.

He also notes that automated techniques already exist to influence how data is collected and processed by large-scale systems, including social media bot farms that feed the tools used to gather information for AI models.

PREVENTION AND REMEDIATION


Researchers have been experimenting with machine learning model security for at least 15 years, and at least one study has shown that with 10 percent data poisoning, a model’s performance drops by 7 percent. Once that bad data is in the pool, it is hard to remediate.
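A variation on the earlier label-flipping sketch can show how the damage scales with the share of poisoned records. The experiment below again assumes scikit-learn and synthetic data; the exact percentages in any published study depend on the model and the data, so the numbers here are only illustrative.

```python
# A minimal sketch measuring how test accuracy falls as the share of flipped
# training labels grows; numbers will differ from any particular study.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

baseline = None
for frac in [0.0, 0.05, 0.10, 0.20, 0.30]:
    y_poisoned = y_tr.copy()
    idx = rng.choice(len(y_tr), size=int(frac * len(y_tr)), replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]  # flip labels on a random slice
    acc = accuracy_score(
        y_te, LogisticRegression(max_iter=1000).fit(X_tr, y_poisoned).predict(X_te)
    )
    baseline = acc if baseline is None else baseline
    print(f"poison={frac:.0%}  accuracy={acc:.3f}  drop={baseline - acc:+.3f}")
```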

Even untraining the models or resetting the data is difficult. Kerry says to do that, you “have to start from scratch because you have to essentially find a needle in a haystack.” Searching billions of records to find which ones are wrong is a tall order.

Swanson says AI security spans multiple camps, including data security and cybersecurity. Organizations should know which data sources are trusted, which are open and how data flows into training pipelines and live applications. Mancini of RMU believes that it’s not only technology practitioners who should be paying attention to data provenance and potential errors or bias, but everyone using AI.

Some questions to ask when AI is being used in a company or government: Is it safe, trusted and secure? Do users understand the data supply chain, everything from data creation to production? Do users understand the risks, from data poisoning to runtime threats? Agencies should also track their data sources, how the data has changed and how it is reused. Versioning, Swanson says, can help identify where data was altered.
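Versioning can be as simple as recording a cryptographic fingerprint of each approved data file and checking it again before training. The sketch below uses only Python’s standard library, with illustrative file names and paths; it is not any particular vendor’s tooling.

```python
# A minimal sketch of dataset versioning with content hashes, so a later run
# can detect that an approved training file changed; paths are illustrative.
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def fingerprint(path: Path) -> str:
    """Return the SHA-256 digest of a file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def record_manifest(data_dir: Path, manifest_path: Path) -> None:
    """Write a manifest mapping each data file to its content hash."""
    manifest = {
        "created": datetime.now(timezone.utc).isoformat(),
        "files": {p.name: fingerprint(p) for p in sorted(data_dir.glob("*.csv"))},
    }
    manifest_path.write_text(json.dumps(manifest, indent=2))

def verify_manifest(data_dir: Path, manifest_path: Path) -> list[str]:
    """Return the names of files whose contents no longer match the manifest."""
    recorded = json.loads(manifest_path.read_text())["files"]
    return [
        name for name, digest in recorded.items()
        if fingerprint(data_dir / name) != digest
    ]

# Usage (illustrative paths): record once when data is approved, verify before training.
# record_manifest(Path("training_data"), Path("manifest.json"))
# print(verify_manifest(Path("training_data"), Path("manifest.json")))
```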

A basic tenet of cybersecurity is having visibility into your network, and the same applies to AI. Before applications can be secured, agencies must know what exists and how it is being used: discover, inventory and document. Then assess risk, again looking at data integrity, access and provenance; the model’s code, weights and training; and how the application is used and interacted with.
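An inventory along those lines might record, for each AI system, where its data comes from, who can change it and how the model is used. The sketch below is a hypothetical, minimal record structure with invented field values, not a prescribed schema.

```python
# A hypothetical, minimal inventory record for an agency AI system; the field
# names and example values are illustrative, not a standard schema.
from dataclasses import dataclass, field

@dataclass
class AISystemRecord:
    name: str                      # what the system is
    owner: str                     # accountable business owner
    data_sources: list[str]        # where training and input data originate
    data_access: list[str]         # who or what can modify the data
    model_artifacts: str           # where code, weights and training configs live
    usage: str                     # how the application is used and by whom
    risks: list[str] = field(default_factory=list)  # e.g., poisoning, runtime threats

inventory = [
    AISystemRecord(
        name="311 request triage model",
        owner="Service Delivery Office",
        data_sources=["311 ticket exports", "public complaint forms"],
        data_access=["data engineering team", "nightly ETL job"],
        model_artifacts="internal model registry",
        usage="ranks incoming service requests for dispatchers",
        risks=["label poisoning via public-facing forms"],
    ),
]
print(f"{len(inventory)} system(s) documented")
```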

“From a data governance perspective, we need to understand the control of the data that we are training these models on,” Swanson said. “If we control that, and we know that it cannot be manipulated by third-party sources, then we could feel … that our models are safe in terms of how they’re being trained.”

As AI becomes another attack surface that organizations need to think about and protect, Gardner says there’s one key question to always consider: “How clean is your data?”
Rae D. DeShong is a Texas-based staff writer for Government Technology and a former staff writer for Industry Insider — Texas. She has worked at The Dallas Morning News and as a community college administrator.