Government Technology

New York City Agency Simplifies the Data Warehouse



August 1, 2012

In New York City, the Human Resources Administration employs about 15,000 people who deliver Medicaid and other services to more than 3 million residents — and its enterprise data warehouse stores so much information that employees can potentially sift through hundreds of millions of rows inside just one table in a spreadsheet.

“Our largest table had more than three-and-a-half billion paid Medicaid claims,” said Anna Stern, assistant deputy commissioner of new initiatives in management information systems. “And the public assistance and food stamp data comes out of the state system, which is a mainframe system, so we’re dealing with old, complex systems [and] very big data.”

But that’s not the case any longer, thanks to Stern and her colleagues. Because the enterprise warehouse’s data groups are often much larger than employees need, they created dataSmart, a compact version of the warehouse that contains only the data staff members need most and presents users with a simpler interface.

The massive enterprise data warehouse had been around since 2001, so developers learned over time what the most popular data sets were to include in dataSmart. “We took everything we’ve learned over the years and built a streamlined model,” said Data Warehouse Director Jane Neimand. “We’re very pleased with it.”

A Simpler Method

DataSmart, which went live with Phase I on Feb. 14, 2012, is housed on the same Oracle database as the enterprise data warehouse, and only department employees can access the systems. Both versions are side-by-side, and users must determine which to query for their needs.

Most people use the Oracle Business Intelligence Discoverer tool to access dataSmart, but other tools like SQL are available. In Discoverer, users query the system and receive the data in spreadsheet tables. DataSmart tables contain fewer rows than those in the data warehouse. “The step-by-step is going to be sort of the same, but each step is just simpler,” Neimand said. “[DataSmart] only has what they absolutely need. It’s not all this other data that they probably don’t need.”

For instance, the enterprise data warehouse holds more than 20 years of data history, which amounts to 1,500-plus data elements with tables containing hundreds of millions, and in one case billions, of rows. On the other hand, dataSmart holds three years of data history and only the most requested data elements.

Stern, Neimand and their co-workers work continually to keep dataSmart fresh. They rebuild the database from scratch each year to house only the past three years’ worth of information. “One advantage of doing that is that it keeps it small,” Neimand said, “but another advantage is that, let’s say some data element becomes important that was not important the last time we did a build, that gives us the opportunity to build in that new element.”
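The rebuild Neimand describes, keeping a rolling three-year window and only the most requested elements, can be illustrated with a minimal sketch. The article does not describe the agency's schema or build process, so every name here (`build_compact_subset`, `record_date`, `case_id`, `legacy_code` and the sample rows) is a hypothetical stand-in, not the department's actual implementation.

```python
from datetime import date


def years_before(d, years):
    """Return the date `years` years before `d` (Feb. 29 falls back to Feb. 28)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def build_compact_subset(rows, wanted_elements, as_of, years=3):
    """Keep rows dated within the window, and only the requested columns.

    Hypothetical helper: assumes each row is a dict with a `record_date` key.
    """
    cutoff = years_before(as_of, years)
    return [
        {name: row[name] for name in wanted_elements}
        for row in rows
        if row["record_date"] >= cutoff
    ]


# Invented sample data standing in for the full warehouse.
warehouse_rows = [
    {"record_date": date(1995, 6, 1), "case_id": 1, "legacy_code": "A1"},
    {"record_date": date(2010, 3, 15), "case_id": 2, "legacy_code": "B7"},
    {"record_date": date(2011, 11, 2), "case_id": 3, "legacy_code": "C3"},
]

compact = build_compact_subset(
    warehouse_rows,
    wanted_elements=["record_date", "case_id"],
    as_of=date(2012, 2, 14),
)
```

Because the subset is rebuilt from scratch rather than patched, adding a newly important element in a later build would only mean extending `wanted_elements`, which mirrors the advantage Neimand points to.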

Education and experience have taught them which elements are the most important to include. Roughly 10 years ago, Stern learned in Data Warehousing Institute courses that only about 15 percent of a warehouse’s data is actually used. The administration therefore started developing dataSmart as a means to offer the most desired data more conveniently.

“We said, ‘What if we could come up with something simpler?’ And that’s what dataSmart is,” Stern said, noting that she believed many users lacked the analytical skills needed to access the original system effectively. “It is a distillation of everything we have learned from 10 [or] 11 years of running this data warehouse in this agency. Our premise is, small is beautiful.”

Phase I of dataSmart includes data from the welfare administration system; later phases will incorporate additional information, including GIS, Supplemental Security Income and Medicaid data. Stern estimated that Phase II will go live on Labor Day, followed by Phase III at an as-yet-undetermined later date. The money for the phases came from the department’s budget, and Stern estimated that it will cost an additional $200,000 to complete the second phase.

Big Data, Big Goals

Today, Stern oversees training programs that employees must complete before they can query either database, and the learning curve for dataSmart certification isn’t steep. Users complete four three-hour training sessions to learn dataSmart, compared to the nine two-hour sessions for the enterprise data warehouse. A division liaison must nominate someone for training before that person can be eligible for courses.
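The difference in training time follows directly from the session counts quoted above. A short worked calculation, using only the article's figures (the variable names are illustrative):

```python
# Session counts and lengths as reported in the article.
datasmart_hours = 4 * 3   # four three-hour sessions
warehouse_hours = 9 * 2   # nine two-hour sessions

# Hours saved by certifying on dataSmart instead of the full warehouse.
hours_saved = warehouse_hours - datasmart_hours

print(f"dataSmart: {datasmart_hours} hours")
print(f"Enterprise data warehouse: {warehouse_hours} hours")
print(f"Difference: {hours_saved} hours")
```

That works out to 12 hours of training for dataSmart against 18 for the enterprise data warehouse, a third less classroom time.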

Business analysts are currently trained on Oracle’s Discoverer tool, but the company eventually will migrate the department to the Oracle Business Intelligence Enterprise Edition (OBIEE) platform. Stern and Neimand are unsure when exactly that will take place, but they wouldn’t be surprised if it took about a year to convert Oracle’s database modules to the new system. For example, Discoverer has files called “workbooks” that contain worksheets displaying data retrieved from the database, and migrating users’ personal workbooks to OBIEE will take a lot of work.

“Oracle cannot predict how well the Discoverer workbook will convert to OBIEE, so we’re going to try to get people to get rid of the workbooks they don’t need,” Stern said. “But that’s something that could take a lot of time.”

But no matter what happens, she and her team will work to ensure that dataSmart continues to deliver information conveniently. They’re even using Google’s search capabilities for inspiration.

“My goal would be to get as close to Google as we could get in terms of querying,” Stern said. “Google’s figured out how to give you all the information in the world with one little box on the screen. [It] makes all sorts of information that is complex acceptable to normal people. That’s the goal of dataSmart relative to the agency’s data.”


You may use or reference this story with attribution and a link to
http://www.govtech.com/health/New-York-City-Agency-Simplifies-the-Data-Warehouse.html



Comments

Maritza Quito    |    Commented August 3, 2012

Having read about your new project in its beginning stages, I am very enthusiastic about being able to use it in my everyday investigations in the HRA/IREA division. We can complete our cases more effectively with tools like dataSmart.

thomas    |    Commented August 6, 2012

If you mean "added Google Search capabilities for inspiration" for dataSmart, my humble opinion is that, since Google is notorious for storing searches and using the search data for its marketing, we are probably jeopardizing the integrity of our valuable data. Our clients will end up getting more junk emails. I am sorry if this is not what you mean in the article. It is clear that it is "unclear."

Al Phillippe    |    Commented August 15, 2012

I would like this large data storage and transmittal capability for field investigators, for field reports and secured client data transmissions over a media device like a Galaxy Pad instead of an Apple Pad. This would allow for greater tracking of data in a much more efficient manner, and would include document scans and verification as well as employee time management. Security can be done by MIS at HRA.



