Looking back at the timeline of video’s evolution, it isn’t hard to see how we arrived at this point. The bulky cinderblock-shaped surveillance cameras once dotting the walls of banks and high-security locations have given way to high-definition, infrared, semi-intelligent systems almost everywhere we go.
The cassettes that once stood as the undeniable testament of happenings on any given street corner have been replaced by cloud storage and the processing power of neural networks, learning a little more about human behavior every day.
What’s more, as these systems evolve, humans will become less and less crucial to the overall process. From finding criminals or rioters in a crowd to metering toll traffic on busy bridges, the minds behind the technology say it won’t be long before a medium created by humans for humans is all machine — well, mostly.
As researchers with the technology consultancy Gartner will tell you, the worldwide stream of incoming video data is outpacing our ability to ever watch it all. Closed-circuit systems, police body and dashboard cameras, home security networks, drone footage — most of it stands a fair chance of never being watched by human eyes, according to research director Nick Ingelbrecht.
The data coming out of these systems, he said, is increasingly likely to be analyzed by machines, which means video will be run through algorithms and rule sets built to identify actions, behaviors or patterns. A suspicious bag at the airport, a stalled vehicle on the freeway or a protester about to lob a rock through a shop window are all on the menu for video analytics.
“The amount of video out there is outstripping our ability to sit down and watch it,” Ingelbrecht said. “We project that by 2020, 95 percent of video or image content will never be viewed by humans, but instead will be vetted by machines that provide some degree of automated analysis.”
When all is said and done, it might be better this way. Machines don’t get bored watching banks of screens waiting for something to happen; people, on the other hand, do.
Today you could probably throw a stick and hit 10 industries leveraging video technology in one way or another. Two standouts in analytics are transportation and public safety.
“Video is kind of another data stream in a big data analytics environment, and the more varied and different streams of data you have, the more useful the information is going to be potentially, if you can analyze it efficiently,” Ingelbrecht explained.
As for what is truly possible, he is quick to point out that it isn’t a lack of technology keeping people from embracing the full potential of mining video data; it’s the cost.
The modern analogy for transportation technology might best be summed up by California’s shiny new Bay Bridge: a $6 billion modern marvel with a traffic control system that dates back to 1974. Ironic? Yes. Unusual? Absolutely not.
And the bridge that connects Oakland to San Francisco is not the only one in the country suffering from the effects of decades of underfunding. According to the most recent report card on infrastructure from the American Society of Civil Engineers, U.S. bridges on the whole earned a C+.
Updating this decades-old system, though a daunting task at first look, is a perfect opportunity for video analytics, said Kevin Aguigui, a senior systems engineer with international infrastructure specialist Kimley-Horn and Associates.
And the firm is currently working to do just that. Aguigui said the coordinated application of traffic sensors and smarter video systems could effectively reduce the need for subjective human decision-making when it comes to metering and managing traffic flows into the city.
“We are trying to basically use cameras and sensors in the ground, and maybe cameras that count vehicles, to provide a more automated system that can make decisions based on actual counts, not just human intervention,” he added. “The human element in managing traffic is one of the things that lags because a lot of agencies have not had the funding. … That’s the one area, in my opinion, that automation has worked, will work and can continue to work.”
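The counts-over-intuition approach Aguigui describes can be pictured with a toy example. The rule and every number below are illustrative assumptions, not details of any system Kimley-Horn actually deploys:

```python
# Illustrative sketch only: a hypothetical ramp-metering rule driven by
# automated vehicle counts rather than human judgment.

def metering_rate(mainline_count, capacity=2000, min_rate=240, max_rate=900):
    """Return a ramp metering rate (vehicles/hour) from a mainline count.

    Uses a simple demand-capacity rule: release only the spare capacity
    left on the mainline, clamped to a practical operating range.
    """
    spare = capacity - mainline_count
    return max(min_rate, min(max_rate, spare))

# Light mainline traffic leaves room for a high release rate...
print(metering_rate(1200))  # 800
# ...while near-capacity traffic throttles the ramp to its floor.
print(metering_rate(1950))  # 240
```

The point of the sketch is only that once cameras and in-ground sensors supply reliable counts, the metering decision itself becomes a small, repeatable calculation.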
But applying smarter technology to aging transportation infrastructure is easier said than done, especially when the communication backbone required for more modern, high-definition camera technology is lacking.
Bolstering connectivity is where Aguigui said the majority of the time and money is spent on projects like this. Without well-structured lines of communication, the systems they support might as well be the 1970s tech the state is looking to replace.
“One of the things we are grappling with in traditional transportation technologies is, we have this issue of a lot of cameras out in the field — these are high-definition cameras; they require a lot of bandwidth. If you want to view these high-definition cameras, we need fiber-optic networks, we need high-speed communication systems, and that has been where most of the money is being spent. We’ve got to get the communications mediums to these cameras so they can actually view them appropriately.”
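The bandwidth problem Aguigui raises is easy to see with back-of-the-envelope arithmetic. The per-stream bitrate below is an illustrative assumption, not a figure from the article:

```python
# Why HD camera fields push agencies toward fiber: aggregate bandwidth
# grows linearly with camera count. The 4 Mbps figure is an assumed,
# typical-order-of-magnitude rate for a compressed HD stream.

def aggregate_bandwidth_mbps(cameras, mbps_per_stream=4.0):
    """Total bandwidth (Mbps) for a field of cameras streaming concurrently."""
    return cameras * mbps_per_stream

# 200 HD cameras at ~4 Mbps each already approach a gigabit link.
print(aggregate_bandwidth_mbps(200))  # 800.0
```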
Another oft-encountered challenge is perfecting the algorithms meant to recognize activities a human might not think twice about. In a matter of seconds, a person could determine whether a car was stalled or simply stuck in traffic. For a computer, digesting this information is not so simple.
Aguigui said the larger problem boils down to the software that can handle the application and staff that can separate actual alerts from potential false positives. “We’re in an analog world and we are relying on digital technology to manage that analog world. It’s tough,” he said.
“[Networks are] going to give you a bunch of information, and too much information that you have to verify is actually worse than not enough, because now [operators] are overwhelmed with information. It becomes an issue where a lot of manual intervention is still necessary,” he explained of the analytics challenge today.
Luckily, the firm isn’t expecting to see this sort of issue on the toll project. Though Aguigui was limited in what he could disclose about the company’s work with airport systems, he said video analytics and sensor networks play heavily into the overall security network. In-ground sensors, motion detection and vast arrays of cameras are commonplace.
Despite the full coverage in places like this, false positives are common and staff members must be willing to run down any potential breach. It comes down to striking a balance between computers and the human element.
“So when I say it’s the software and the algorithms, it’s not that they are bad,” Aguigui explained. “I think we are still learning how to use them in a way that is effective and efficient, and we can actually home in on the right thing they need to home in on given the limited time and resources that they have.”
It’s early January 2017, and a wall of monitors inside Sacramento’s police headquarters is streaming in footage from around California’s capital city. Cars pass through intersections, and people stroll down sidewalks in various directions.
From behind the monitors that make up the control center, all looks well. But in the field, in areas where officers cannot afford to be at all times, the cameras themselves are not satisfied. They are alert, collecting everything they can from their narrow view of the world. By themselves, they are worthless — no context, no cooperation — but they are becoming an integral part of the policing process in the city’s Real-Time Crime Center.
Sgt. Marnie Stigerts, who also works auto theft investigations, heads up the center with the help of a small onsite crew. After successful field tests in mid-2016, the department decided to invest in more cameras and resources. And though the center is not operational 24/7, during critical incidents and large-scale events, the crew has what equates to a significant picture of the metro area.
The wealth of information coming through the glowing bank of screens is next to worthless without officers who can interpret it and get it to their colleagues in the field.
The center, like so many others across the country, belongs to a department trying to embrace new technology on a budget. Though you wouldn’t consider what happens in the sleek control room video analytics in the classical sense, the work done here ties in everything the platform might need to get off the ground, Stigerts explained.
GIS overlays of officer and camera positions provide center staff with the details they need to relay the intelligence they gather here.
Without warning, the alarm chirps off with an automated notice about a potential stolen vehicle. Officers hurry to confirm its validity, noting the make, model and camera that made the capture. There is no denying that humans still have a role to play in this environment — at least for the time being.
Even without a supercomputer to do this legwork, successes have already been seen. In 2016, the system helped authorities track down an alleged sex trafficker through license plate captures and alerts, a witness statement, and good police work. Warrants were issued and an arrest was made.
One man at the forefront of making video analytics smarter is Mubarak Shah, a computer scientist with the University of Central Florida. In 2002, Shah and his team were working to use video analytics to detect person-on-person violence. In 2010, their research shifted to violence detection in larger groups, a problem made difficult by the sheer volume of data needed to identify certain actions, and one the Defense Advanced Research Projects Agency has expressed interest in.
Where humans have little difficulty identifying a series of simple actions — say people celebrating at a birthday party — computers must be taught to recognize these behaviors. Until neural networks and deep learning capabilities improve, these lessons must be taught with lots of data and annotations. Annotations are added areas of focus that allow the system to home in on important activities in the frame.
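To make the idea of annotations concrete, here is a minimal sketch of what one might look like: a labeled region of interest tied to a single frame. All field names and values are hypothetical, not drawn from Shah's datasets:

```python
# A hypothetical video annotation: one labeled bounding box on one frame.
# Training a recognizer requires many thousands of entries like this.

annotation = {
    "video": "intersection_cam_03.mp4",   # hypothetical clip name
    "frame": 1542,                        # frame index within the clip
    "bbox": (312, 88, 120, 240),          # x, y, width, height in pixels
    "label": "person_running",            # the behavior a model should learn
}

def area(bbox):
    """Pixel area of a bounding box, useful for filtering tiny detections."""
    _, _, w, h = bbox
    return w * h

print(area(annotation["bbox"]))  # 28800
```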
“That has really revolutionized the field of computer vision and also machine learning. That requires a lot of training data,” Shah explained. “The problem with all of these things is that there is not enough data, which these methods require — machine learning methods or deep learning methods — they require lots and lots of data and data annotations.”
Today Shah’s research is also focused on assisting the Orlando Police Department and the National Institute of Justice to get a better handle on their significant stores of video. When courts order a specific clip, Shah said the department often struggles to quickly identify the video in question with limited manpower and resources. He believes video automation could improve the process.
As algorithms and machine learning tools become more advanced, more able to recognize the complexities of human behaviors, our ability to treat video like any other data source will improve in step.
As Shah sees it, tools like video digestion, or the ability to filter down the most important parts of a longer clip, are an inevitable part of our current technological course.
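The digestion idea can be sketched in miniature: score each frame by how much it changes from the previous one, then keep only the most active moments. Real systems operate on pixel arrays from decoded video; the tiny synthetic "frames" below are purely illustrative:

```python
# A toy sketch of video digestion: rank frames by frame-to-frame change
# and keep the busiest ones, discarding long stretches where nothing happens.

def frame_change(prev, curr):
    """Sum of absolute per-pixel differences between two frames."""
    return sum(abs(a - b) for a, b in zip(prev, curr))

def digest(frames, keep=2):
    """Return indices of the `keep` frames with the largest change scores."""
    scores = [frame_change(frames[i - 1], frames[i]) for i in range(1, len(frames))]
    ranked = sorted(range(1, len(frames)), key=lambda i: scores[i - 1], reverse=True)
    return sorted(ranked[:keep])

frames = [
    [0, 0, 0],   # empty scene
    [0, 0, 0],   # nothing happens
    [9, 9, 0],   # something enters the frame
    [9, 9, 9],   # it moves
    [9, 9, 9],   # then holds still
]
print(digest(frames))  # [2, 3]
```

Only frames 2 and 3 survive the filter, which is the essence of boiling a long clip down to its most important parts.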
For Ingelbrecht, the question of where the technology will head is as simple as following the demand and the cash. While one could argue that the most demand is in the homeland security and protection arena, the interest is spread across the spectrum. Value is to be found anywhere and everywhere.
“I think a lot of companies, a lot of government organizations are realizing that video data is like any other sort of data asset. It’s extremely valuable, but the issue really is how you manage that data, how you exploit it, how you extract value from that data and put it to good use,” Ingelbrecht said.