Headlines about an acute shortage of data scientists have featured prominently over the last few years. In a world obsessed with finding the next big innovation in big data, there just don’t seem to be enough data scientists to go around to satisfy organizations’ craving for advanced analytics and insights.
Various solutions have been suggested, but many seem to be missing a key part of the problem: the shortage in skilled personnel spans the entire data analytics lifecycle. It’s not just data scientists that are missing in action or exceedingly difficult to hire: a similar problem exists with data engineers, and it’s even more acute, worrisome, and urgent.
Let’s first differentiate the two roles. According to an article on data science skills by Elena Grewal, head of data science at Airbnb, data scientists provide expertise in analytics (working on metrics, data storytelling, and tool-building), algorithms (interpreting the algorithms that enable data products), and inference (establishing causal connections with statistics). In short, the data scientist cleans, kneads, and organizes Big Data.
The data engineer’s work can include data governance and quality control, implementing complex distributed architectures (on-premises or in the cloud), building and maintaining data pipelines, optimizing resource utilization in storage or compute clusters, and managing batch processing jobs so that fresh, accurate data is available. In other words, she develops, builds, tests, and maintains databases, processing systems, and other architectures.
Right now, data scientists get all the glamour and spotlight, partly because data science is the final and most visible step in the journey. All the steps that must happen before data scientists can even start working (which can number in the dozens of processes for ingesting, transforming, and structuring data for analysis) belong to data engineers and are often more labor-intensive than the “gleaning insights” part.
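To make the ingest, transform, and structure steps more concrete, here is a toy sketch in Python. Everything in it is a hypothetical illustration: the `RAW_CSV` export, its columns, and the validation rules are invented for the example, and real pipelines would add scheduling, distributed processing, monitoring, and much more. It only shows the shape of the work that has to happen before an analyst sees a clean table.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical raw export: a header row plus records, some of them malformed.
RAW_CSV = """user_id,country,amount
u1,US,19.99
u2,us,5.00
u3,,12.50
u4,DE,not_a_number
u5,DE,7.25
"""


@dataclass
class Record:
    user_id: str
    country: str
    amount: float


def ingest(raw: str) -> List[Dict[str, str]]:
    """Ingest: parse raw CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows: List[Dict[str, str]]) -> List[Record]:
    """Transform: normalize fields and drop rows that fail validation."""
    clean: List[Record] = []
    for row in rows:
        country = (row.get("country") or "").strip().upper()
        if not country:
            continue  # skip rows with no country
        try:
            amount = float(row["amount"])
        except (KeyError, ValueError):
            continue  # skip rows whose amount cannot be parsed
        clean.append(Record(row["user_id"], country, amount))
    return clean


def structure(records: List[Record]) -> Dict[str, float]:
    """Structure: aggregate into an analysis-ready table (total amount per country)."""
    totals: Dict[str, float] = defaultdict(float)
    for rec in records:
        totals[rec.country] += rec.amount
    return dict(totals)


# Run the three stages end to end.
table = structure(transform(ingest(RAW_CSV)))
print(table)
```

Two of the five raw rows are dropped along the way (one has no country, one has an unparseable amount), and the lowercase `us` is normalized so it aggregates with `US`. Each of those small decisions is data engineering work that a data scientist never sees but depends on.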
Since we’re asking a question about data science, it makes sense to answer it with data. If there’s a shortage in a certain field, we would expect to see it manifested in (1) more open positions than available candidates and (2) very high salaries being offered by employers to lure the small number of available candidates.
LinkedIn and Indeed can give us pretty good insights into both of these indicators. Here’s what we got looking at data for the United States (stats via LinkedIn):
* LinkedIn job search results for the following keywords in exact match: “data engineer,” “big data engineer,” “data scientist” in Geography: United States. All searches made on 02/20/2020.
** LinkedIn Sales Navigator search results for the following keywords in exact match: “data engineer,” “big data engineer,” “data scientist” in Geography: United States. All searches made on 02/20/2020.
Some interesting findings here: for every open data engineer job, there are 2.53 suitable candidates; for every open data scientist job, there are 4.76; and for every open big data engineer job, there are 2.47. The contrast with other roles is quite dramatic, showing that there is in fact a shortage of data professionals compared with an abundance of web developers (10.8 candidates per open position) and marketing managers (53.79 per open position).
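The metric behind these figures is a simple supply/demand ratio: suitable candidates divided by open positions, where a lower number means a scarcer role. The sketch below assumes that definition. Because the raw counts are not reproduced here, it works from the reported ratios; the helper names `candidates_per_opening` and `rank_by_scarcity` are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple


def candidates_per_opening(candidates: int, openings: int) -> float:
    """Supply/demand ratio: suitable candidates per open position."""
    if openings <= 0:
        raise ValueError("openings must be positive")
    return candidates / openings


# Ratios as reported above (LinkedIn, United States, searches on 02/20/2020).
REPORTED_RATIOS: Dict[str, float] = {
    "data engineer": 2.53,
    "big data engineer": 2.47,
    "data scientist": 4.76,
    "web developer": 10.8,
    "marketing manager": 53.79,
}


def rank_by_scarcity(ratios: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort roles from scarcest (fewest candidates per opening) to most abundant."""
    return sorted(ratios.items(), key=lambda item: item[1])


for role, ratio in rank_by_scarcity(REPORTED_RATIOS):
    print(f"{role}: {ratio} candidates per opening")
```

Ranked this way, both data engineering roles come out scarcer than data scientists, which is the point the numbers make: the gap is widest at the engineering end of the pipeline.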
Organizations that want to work successfully with Big Data have to take a farsighted view of the “data grind”: what it takes to get from the beginning to the end of the journey and what types of talent can deliver the required skills. Before the data scientist can work the magic, the data engineer has to build a whole lot of infrastructure. You need both talents, in equal measure, to optimize the results of your data science value chain.
Obviously, it’s not a “competition” between data scientists and data engineers over which group actually suffers from a talent shortage. For your data endeavor to succeed, you’ll need the expertise of data engineers to build the infrastructure and prepare the data, as well as the skills of data scientists who use this data to develop analyses, algorithms, and research. A talent shortage in either field raises labor costs and can doom the entire data science project.
Since the media have focused on the data science talent gap, there is now a plethora of suggestions for addressing it. Suggestions for tackling the data engineering skills gap are far fewer.
Let me put forth a few:
Your data endeavors can translate into meaningful business value if you understand how data science really works and what your data scientists and data engineers really do.
You may already have the right people; now you just need to put them in the right place with the right technology. Then you don’t have to be in this fight at all.