
Young Startup Wants to Train AI Better, Faster Using Synthetic Data

Automated DL is going through an accelerator program right now, focusing on cybersecurity. But that's not the only thing the company is thinking about.

Artificial intelligence (AI), of the “learning algorithms” variety rather than the Skynet kind, is everywhere in the tech world right now. That’s because of the technology’s many possible uses: recognizing objects in pictures and videos, anticipating cybersecurity threats, and finding specific kinds of people among thousands or millions of records in a data set.

There’s also one thing all of these AI algorithms fundamentally need: training. They have to process data to learn what it is they’re looking for. That’s how they “learn.”

What if there isn’t enough data to make the algorithms as good as they need to be? Or what if it takes too long to collect and prepare that data?

An early-stage startup that just entered a northern Virginia cybersecurity accelerator thinks it has the solution: fake data.

Or, as Automated DL CEO Jeff Schilling puts it, synthetic data. His company has built the capability to produce data (a lot of data) based on historical examples. The data isn’t real, but it mimics real data closely enough that AI algorithms can train on it, and it is realistic enough that it could pass for the real thing.

It’s AI acceleration, basically.

“We want the AI people to get there better, faster,” Schilling said.

He compared the process of creating fake data to the way humans learn to speak. People begin with sounds, then form approximate words, then learn to put those words in sequence using grammar. And just as humans use grammatical rules to combine words into novel sentences, Automated DL uses a kind of “grammar” to create novel data.
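The article doesn’t say how Automated DL’s “grammar” actually works. As a rough illustration only, the Python sketch below shows one well-known way to generate novel records from rules: a weighted (probabilistic) grammar whose symbols are expanded recursively into text. The grammar, its weights and the log-line format are hypothetical, invented for this example with the company’s cybersecurity focus in mind; this is not the company’s method.

```python
import random

# Hypothetical toy grammar (not Automated DL's): each symbol in angle
# brackets maps to a list of (weight, expansion) pairs. Any token that is
# not a key in the grammar is treated as literal text and emitted as-is.
GRAMMAR = {
    "<event>": [(0.7, ["<user>", " logged in from ", "<ip>"]),
                (0.3, ["<user>", " failed login from ", "<ip>"])],
    "<user>": [(1.0, ["user=", "<name>"])],
    "<name>": [(0.5, ["alice"]), (0.3, ["bob"]), (0.2, ["carol"])],
    "<ip>": [(0.8, ["10.0.0.", "<octet>"]),
             (0.2, ["203.0.113.", "<octet>"])],
    "<octet>": [(1.0, ["7"]), (1.0, ["42"]), (1.0, ["199"])],
}


def expand(symbol, grammar, rng):
    """Recursively expand one symbol, picking a rule by its weight."""
    if symbol not in grammar:
        return symbol  # literal text: nothing left to expand
    rules = grammar[symbol]
    weights = [w for w, _ in rules]
    _, expansion = rng.choices(rules, weights=weights, k=1)[0]
    return "".join(expand(token, grammar, rng) for token in expansion)


def generate(n, grammar=GRAMMAR, start="<event>", seed=None):
    """Produce n synthetic records; a fixed seed makes output repeatable."""
    rng = random.Random(seed)
    return [expand(start, grammar, rng) for _ in range(n)]


if __name__ == "__main__":
    for line in generate(5, seed=0):
        print(line)
```

Because every record is assembled from the same rules, the output stays plausible, yet combinations that never appeared in any historical log can still be produced. Raising a rule’s weight is one simple way to make a rare case show up more often in the training set.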

There are a few potential upsides to the concept. First, if learning algorithms hit the real world with more training under their belts, theoretically they will perform better. Second, the large amount of synthetic data Automated DL generates can be designed to reflect a wide array of possibilities — including rare situations that might not be represented in the real data.

Schilling pointed to the first known fatal crash of a Tesla in Autopilot mode as an example of where that might come in handy. In that case, the car was driving along a Florida highway when a semi-truck with a white trailer pulled onto the road in front of it. The sky behind the trailer was also white, and the Tesla’s software didn’t distinguish between the two.

That’s not to say Automated DL’s technology would necessarily have prevented that specific crash. But Schilling said the event illustrates the broader need to train AI for scenarios humans might not think of on their own.

Third, Schilling said, many of the people creating AI are simply more interested in what their products will be able to do than in hunting down data or creating their own.

“Making data for them is boring and a pain in the ass,” he said.

The accelerator the company is going through right now, Mach37, is oriented toward cybersecurity, and that’s a main focus for Automated DL at the moment. As part of that, they’re working with potential partners in the federal intelligence community.

However, Schilling stressed, there’s a wide ocean of possibilities out there for what the company is building. It’s hardware-agnostic and can be folded into other products. AI is growing quickly, and entrepreneurs have found a lot of different industries in which to apply it. So there’s no reason Automated DL wouldn’t work with state and local government — or the companies serving those governments. With a 20-year career at IBM on his résumé, Schilling said he’s well versed in the workings of sub-federal government.

And there are a lot of companies finding new ways to use AI to serve sub-federal governments. AppCityLife is building an AI chatbot for the NYC BigApps competition that could help immigrants access city resources. Pluto AI is working on software to help water utilities predict asset failure. SADA Systems has a platform that can analyze pictures of bridges and identify cracks. And those are just the startups.

Not to mention that cybersecurity, the company’s focus at the moment, is just a tad important to government.

Ben Miller is the associate editor of data and business for Government Technology. His reporting experience includes breaking news, business, community features and technical subjects. He holds a Bachelor’s degree in journalism from the Reynolds School of Journalism at the University of Nevada, Reno, and lives in Sacramento, Calif.