As the public and private sectors continue the trend toward big data, data scientists will be needed to analyze that data and extract useful information from it.
David Steier, director of information management at Deloitte Consulting, says that from a short-term perspective, a data scientist shortage could be an issue since not every company or agency will be able to find one willing to work for them. However, Steier feels that over the long term, the problem will become self-correcting. Data science may become a more attractive career for individuals to pursue, or the work of a data scientist will change so fewer are required in the future.
What are the key roles of a data scientist?
To make a big data project or any analytics project succeed, you actually need a lot of skills. I think of it as a combination of functional skills and technical skills … Most people when they think of data scientists, they think of the technical side. And their minds immediately go to analytics, which is important, but it’s not the whole part of the story.
So on the analytics, it’s the things around statistics, operations research, computer science, machine learning in particular is important for data science … But then there’s technology in the sense of being able to understand systems, particularly large systems, because you need to store data all over the place in distributed form, and the ability to program -- to write code that acts as a glue to put all these pieces together.
There’s also the design side of things, which is basically being able to create an interface to the data so people will find it usable, and there's the data side, which is data manipulation, data modeling, data cleansing. So if I got the numbers right, there should be kind of two functional skill sets and four technical skill sets. And all of those need to be combined to make a good data science project work.
Why is there so much hype over big data and data scientists right now?
It’s basically the availability of data that is now online and has never really been online before, whether it’s ERP [enterprise resource planning] systems, or business intelligence systems internally or externally, the amount of data that’s now available. So there’s a lot of data out there and people are saying, ‘Gosh, there must be a pony in there somewhere so I can make better business decisions on the basis of that.’
And on the data science side, [employers] are looking for people who can make sense of something that may be too large, or it’s changing too rapidly, or they’re from too many different data sources. So they are looking for people who combine enough of these skills, especially those who can talk to business folks and understand their requirements and make sure what they develop is useful.
What is the current status of data scientists in terms of how many are available?
I would say that it’s certainly difficult to find data scientists when people are looking for the magic person who combines all of those skills I talked about. They are very, very difficult to find. And those kinds of people definitely, especially the ones if you talk about advanced degrees.
There’s this conception that data scientists might have to be a Ph.D. And that’s a fairly small number. And Ph.D.s generally tend to go either to high technology companies who have them on critical mass, or to startups, or to consulting firms and so on. So they’re not available in all of the sectors who’d want to hire them.
And that goes for public sector as well?
Right. So when you say there’s a shortage, which generally means it’s very hard to find a person to work in the location that you want at the salary that you’re willing to pay. And they don’t tend to be evenly distributed throughout the country or throughout the world for that matter. And there is definitely a general raising of salaries for these kinds of positions.
What can we expect to see in the future?
I don’t know if there will be more of a data scientist shortage in the future. It will continue to be tight for a while. But I think you’ll also find, as people get more experienced with it:
A. They don’t need Ph.D.s.
B. They need to form teams of people who might have the individual skills and who know how to work together.
C. You’ll do a lot of internal training.
People will go for either part-time degree courses, whether it’s online or offline. There’s all sorts of options that didn’t even exist a couple years ago that can address some of the data science implications here.
Is there anything else that needs to happen so that more data scientists can enter the public-sector arena?
I think it will be important for the public sector to do partnerships with either academics or with folks in consulting to be able to meet the needs, because there is certainly a lot of data in the public sector that needs analysis. There’s a lot of opportunity there. And there just aren’t that many people to go around to work full time at all of the organizations that want to hire them.