Attacking the Big Data Deluge the Smart Way

Big data tools should be viewed as more of a valuable resource than an end product or outcome.

SALT LAKE CITY — By now, there's a fair chance many public-sector IT officials are tired of hearing the phrase “big data” tossed around, and they likely have no real idea as to how to harness the awesome power of the data each agency has been collecting for years now.

It isn’t as simple as categorizing it, stripping out the rough stuff and slapping a vague search feature on the whole shebang. There are much more important things to consider when it comes to sifting through the out-and-out treasure trove your agency calls its data.

Take, for example, the lessons learned from a computer science expert who has looked at massive data sets born out of complex DNA structures. While you might instinctively question the relevance of this type of work in relation to, say, the enormous customer data amassed by your state’s Department of Motor Vehicles or Health and Human Services agency, there are some very real correlations.

Miriah Meyer is an assistant professor with the University of Utah School of Computing. Her travels have landed her in the midst of data-heavy research projects where she was forced to ask, “What are you after here?” Meyer discussed her challenges with big data on June 7 at the Utah Digital Government Summit.

“As a society we’ve gotten really good at creating data, at measuring things,” she said. “But one of the challenges we are facing right now is what do we do with all of this data? How do we actually use it to improve our lives, our health, well-being, and learn more about the world around us?”

Beyond simply presenting the data in a clear way, there is a need to assess the real purpose of the data sets. What are you hoping to get from the information? This question, however basic, is often overlooked in the rush to clean up data and get it disseminated to the poor souls who will try to make sense of it.

In one big data project, the assistant professor worked with a Harvard medical team researching fruit fly DNA, which forced her to step out of the binary computer sciences and into the shoes of those gathering the data.

By clarifying and mapping out the intent of the information, Meyer said, she was able to provide better visualization tools that opened access to a wider array of information and allowed the team to reassess incorrect assumptions about their work.

“In dealing with data, there are two main approaches that people are taking today. The first one is really about … trying to take these large complex data sets and using advanced statistical methods to reduce them down to sets of numbers and values that we can actually wrap our head around,” she said. “[The other approach] is visualization, which is very near and dear to my heart.”

Meyer said the process of working with any new collaborators always requires what she called “data counseling” to establish the project parameters and turn the large, messy questions into crisp, clear guidelines for effective data visualization tools.

“This tool was something that we integrated into their workflow and allowed them to ask a whole series of scientific questions that they hadn’t anticipated two years before,” she said.

But it isn’t all about which data sets make it into a data platform; sometimes it’s about how you present it all visually. Choosing the correct visualization channel to encode the information can open up or limit the usefulness of the tools.

And there are a number of ways to present the data: by color coding, density, volume, area, angle, slope, length … the list goes on.

“It turns out that channels, like color, are actually the worst things that you could use for encoding numbers, as opposed to spatial location, such as position along an axis or length,” she said. “It turns out for us that it’s a lot more natural to understand changes in position than changes in color. So, visualization has a lot of these sorts of underlying, fundamental principles that we rely on in order to create active visualization tools.”
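The point about channels can be tried in a few lines of Python. What follows is only a rough sketch, not anything Meyer presented: the readings are invented, the function names are hypothetical, and a ten-step ramp of text characters stands in for a color scale, which is an assumption chosen to make the effect visible in plain text. Each value is drawn twice, once as a shade and once as a bar whose length tracks the number.

```python
# Hypothetical readings; the labels and values are made up for illustration.
readings = {"A": 12, "B": 15, "C": 14, "D": 30}

# Ten "darkness" levels, an assumed stand-in for a color ramp.
SHADES = " .:-=+*#%@"


def encode_as_length(value, max_value, width=40):
    """Length channel: the bar's extent is proportional to the value."""
    return "#" * round(width * value / max_value)


def encode_as_shade(value, max_value):
    """Color-like channel: the value is binned into one of the shade levels."""
    index = round((len(SHADES) - 1) * value / max_value)
    return SHADES[index] * 3


def render(data):
    """Draw each value twice: as a shade swatch, then as a bar."""
    top = max(data.values())
    rows = []
    for label, value in data.items():
        swatch = encode_as_shade(value, top)
        bar = encode_as_length(value, top)
        rows.append(f"{label} | {swatch} | {bar}")
    return "\n".join(rows)


if __name__ == "__main__":
    print(render(readings))
```

In this toy setup, readings A, B and C land on the same shade, while their bars come out at three different lengths. A real color scale has far more than ten steps, so the sketch overstates the gap, but it points in the direction Meyer describes.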

Meyer also said that big data tools should be viewed as more of a valuable resource than an end product or outcome.

“I think a lot of times, people still think of visualization as really being the icing on the cake, it’s the thing you do at the end of the process. It’s about these pretty pictures you create,” she said. “But I really want to encourage you to think about visualization as not just about creating pretty pictures, but it’s really about a deep investigation into sense-making.”

Eyragon Eidam is the Web editor for Government Technology magazine. He previously served as assistant news editor, covering such topics as legislation, social media and public safety.