Big data is only getting bigger. With the advent of social media over the past several years, public-sector agencies have even more information to compile, analyze and produce on demand.
Tom Kennedy, director of Public Sector Archiving and E-Discovery for Symantec, talked to Government Technology about how government agencies are gaining efficiencies while extracting usable, relevant information from a growing number of data sources.
1. What kinds of information does government have access to now that it didn’t before, and from your perspective, what challenges are governments facing trying to make sense of all this information?
E-discovery [electronic discovery] is fundamentally linked to the big data issue. E-discovery is all about being able to intelligently search and retrieve relevant information across huge amounts of data. The government is dealing with more and more data every day, and pretty much anything electronic is producible.
Probably the biggest thing that’s new is social media -- all the different types of social media content. A lot of our customers have to be able to search and produce Twitter feeds, social media feeds, those types of things. Another big one is audio files. Since you can electronically record audio conversations, those are all searchable and producible via e-discovery tools.
2. How are new analytics tools helping the public sector extract useful information from unprecedented amounts of data?
It’s been really fascinating to watch the market evolve. Several years ago, e-discovery was one of the last fields to be automated. We've seen a huge revolution out there in business processes getting automated. The legal e-discovery business process was one of those where four or five years ago, most of our customers didn’t have any technology. They literally just made printouts of documents, and then took hours and days and months to review them using highlighters and that type of thing.
The emergence of e-discovery basically automated that process. It has been interesting to watch the market evolve from point solutions where an agency started to buy specific solutions to automate just a portion of the process. The problem that occurred was that they’d have three or four different point solutions that they had to integrate -- there was a lot of moving the data into a tool, moving the data out of a tool, etc. It required a lot of integration and it was complex. Now the market is demanding more of an end-to-end tool where they can do all the steps in the e-discovery process in a single platform.
Along with that, the technology has come a really long way, and the advances in search are really exciting. Again, the basic principle of e-discovery is that you're trying to empower the user to sift through huge amounts of information and find only the relevant information. The power of analytics is really important in that process.
3. Can you describe a couple of the latest analytics tools?
A keyword search gives you high recall but very low precision -- you get a lot of documents back, but there are a lot of false positives. You want to use analytics to keep the recall high while raising precision. A technology called transparent keyword search lets you run a keyword search, then transparently look at all the hits that came back and de-select the ones that are not relevant to your case.
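To make the recall and precision trade-off concrete, here is a minimal Python sketch. The documents, the set of truly relevant items and the reviewer's de-selections are all invented for illustration, and the function names are hypothetical; this is not how any particular e-discovery product implements transparent keyword search.

```python
def keyword_search(docs, keyword):
    """Return the ids of documents whose text contains the keyword (case-insensitive)."""
    return {doc_id for doc_id, text in docs.items() if keyword.lower() in text.lower()}


def precision_recall(hits, relevant):
    """Precision: share of hits that are relevant. Recall: share of relevant docs that were hit."""
    true_positives = len(hits & relevant)
    precision = true_positives / len(hits) if hits else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection; only documents 1 and 3 actually matter to the case.
docs = {
    1: "Contract renewal terms for the highway project",
    2: "Lunch menu: contract caterer options",
    3: "Signed contract amendment, highway project phase 2",
    4: "Office party planning",
    5: "Please contract the font size on the flyer",
}
relevant = {1, 3}

# A plain keyword search finds every relevant document, plus false positives.
hits = keyword_search(docs, "contract")
precision, recall = precision_recall(hits, relevant)

# The reviewer inspects the hits and de-selects the ones that are off-topic.
deselected = {2, 5}
reviewed_hits = hits - deselected
precision_after, recall_after = precision_recall(reviewed_hits, relevant)

print(f"before review: precision={precision:.2f}, recall={recall:.2f}")
print(f"after review:  precision={precision_after:.2f}, recall={recall_after:.2f}")
```

In this toy data set the raw search returns four documents, of which two are relevant, so recall is complete while precision is only one half. Removing the two false positives raises precision without giving up recall.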
One of the biggest new analytics tools is automated information recognition -- what we refer to as predictive coding or machine learning. That allows you to train the computer to review documents. If you have 20 million documents, you would take the first thousand and manually review them to see what's relevant and what's not. Using those first thousand documents, you would train the computer on what's relevant and what's not. Then the computer would go through the remaining 19 million-plus documents and come back to you with recommendations on what it believes is relevant and not relevant. It is the absolute cutting edge in e-discovery right now. There are huge time and money savings associated with it.
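The train-then-recommend workflow described above can be sketched in a few lines of Python. This example assumes a simple Naive Bayes text classifier, which is only one of many possible techniques; the interview does not say which algorithm any product uses, and the sample documents and function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Split text into lowercase words."""
    return text.lower().split()


def train(labeled_docs):
    """Build word statistics from manually reviewed (text, is_relevant) pairs.

    Assumes the reviewed sample contains at least one relevant and one
    non-relevant document.
    """
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, is_relevant in labeled_docs:
        doc_counts[is_relevant] += 1
        word_counts[is_relevant].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def predict_relevant(model, text):
    """Return True if the model scores the text as more likely relevant than not."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (True, False):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids a zero probability.
                score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return scores[True] > scores[False]


# Step 1: a small manually reviewed sample (stands in for "the first thousand").
reviewed = [
    ("highway contract amendment signed", True),
    ("contract payment schedule for highway project", True),
    ("office party lunch menu", False),
    ("parking lot closed friday", False),
]
model = train(reviewed)

# Step 2: the computer recommends a label for each unreviewed document.
unreviewed = {
    "a": "amendment to highway payment schedule",
    "b": "friday lunch in the parking lot",
}
recommendations = {doc_id: predict_relevant(model, text) for doc_id, text in unreviewed.items()}
print(recommendations)
```

The output is a set of recommendations for a human to confirm or reject, not a final decision, which matches the way the process is described in the interview.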
4. How widespread is the use of these tools?
We don't run into too many states anymore that don’t have any tools at all. What varies widely across state and local government is the level of adoption and sophistication of the tools that they have. The exciting thing that we're starting to see is agencies taking more of an enterprise approach. You’re starting to see stakeholders across state agencies say, 'Hey, we have a common business problem. Let's solve this at an enterprise level and let's share a common tool across multiple stakeholders in an agency.'
5. What are some emerging trends you see influencing public-sector adoption of these kinds of tools?
The first one is this whole concept of moving from point solutions to an overall enterprise-wide approach to e-discovery. Frankly, we’re starting to see the lines blur between e-discovery and an overall information governance strategy. Agencies want a plan or automation around managing their data, and then an e-discovery strategy to search and analyze the data. We're starting to see that all kind of blur together.
No. 2 would be the new types of content out there. It's not just email anymore; it’s social media and audio/video files.
The third would be the continued advancement of analytics and search capability. To me, that's the most exciting part of the whole industry and my job -- when a government agency has that "Aha" moment where they take a process that was manual and took an entire day, and then use these analytics to do it in minutes. That's really exciting to see. And I think this world of machine learning, which is a trend in other industries, will only continue to get better and more widely adopted.
Photo: Tom Kennedy, Symantec