BE A GREAT SCIENTIST NOT JUST A DATA SCIENTIST
Let’s talk about applying critical thinking in data science
Be a great "scientist" rather than just a "data" scientist
I want to relate a story about a great scientist named Jocelyn Bell, who discovered the radio pulses made by neutron stars. While working on her thesis, she was examining the output of the chart recorders by hand. She observed some anomalies, a "bit of scruff" as she called it, that did not fit the patterns produced by quasars. Something interesting was producing these bursts of radio waves picked up by her telescope. Eventually, she eliminated all other potential sources and concluded that they were made by neutron stars. This discovery was recognized with the 1974 Nobel Prize in Physics, although the prize went to her thesis supervisor, Antony Hewish, and his colleague Martin Ryle rather than to Bell herself. Her discovery is remarkable, but notice the following. First, she was analyzing the data by hand. Second, she knew her instrument, her tool, so well that she was not fooled by it. Third, her trained eye did not dismiss the anomaly as an outlier, as just another bit of "scruff." The anomaly actually looked interesting to her. Fourth, she pursued her intuition to deduce the actual cause of the bit of scruff.
Because of the attention data science is getting in the media and academia, many more graduates are entering the field, and many professionals wish to switch careers into data science. Inevitably, the following question arises: What are the technical skills needed to be a good data scientist? Broadly speaking, good data scientists are strong at building models with Python or R, are good at writing SQL queries, and understand causality (inference). Those skills will give you the tools to clean, manipulate, and model the vast amounts of data you will encounter in most jobs. However, the specific technical skills needed in data science depend heavily on the organization’s size, the industry, the team’s maturity, and, ultimately, the purpose of the analytics. Some jobs require only linear or logistic regression, while others may require sophisticated algorithms and visualization techniques.
But this answer fails to address the complete picture of a good data scientist; it speaks only to the "data" part of being a data scientist. It does not address the most crucial component, which is the "scientist." A scientist is someone who demonstrates high levels of critical thinking. Critical thinking means the ability to observe, reflect, question, and decide, just as Jocelyn Bell did when she noticed the anomaly. A scientist asks: Why does the output make sense? Does it follow my intuition? How do I validate the output? Do the results address the business question? A scientist thinks independently and reflectively, appraising the problem and the solution from multiple angles.
I once read that about 90% of all data science projects fail or are never implemented. That is a massive waste of effort and talent. From my own experience of managing data scientists across multiple industries, I believe that the failure rate is so high because of the focus on the tool stack of a "data" scientist and the lack of emphasis on critical thinking. As my freshman calculus professor once told us: "I can teach a monkey to do calculus, but I cannot teach a monkey to think." I believe we have forgotten about being scientists.
So, what does it mean to be a good scientist? It means having a systematic approach to problem-solving, being curious about the problem being solved, being relentless in understanding the business question, having a specific hypothesis, having measurable metrics of success for testing that hypothesis, having an intuition about the solution, and being able to influence without using technical jargon. Dr. Bell is an excellent example of a "data scientist." She knew and understood her tools well, which is the reason she did not dismiss the anomaly as an outlier event. She examined her output by hand so as not to miss anything. She followed her intuition when she found something that did not look right. She methodically eliminated all possible sources that could cause the bit of scruff.
A typical question that I ask during interviews is how much time you allocate to understanding the business question, examining the data, deciding on the appropriate methodology, and analyzing the results. By my estimate, most data scientists spend less than 10% of their time analyzing and questioning their models’ output. Good scientists allocate more time to reviewing, analyzing, and challenging the results before reporting them to a manager or a business stakeholder.
In short, to be a good full-stack "data scientist," you need to know the technical tools (SQL, Python/R, ML frameworks and methodologies, statistics, etc.), and you need to be systematic and curious about the "how" and the "why" of the problem, with good intuition about the business problem. Mastering the tools is the easier part, since it is a matter of time; being a scientist takes immersion, following your intuition, and a thoughtful understanding of the problem and the solution. We should all think about Jocelyn Bell’s process the next time we are assigned a project.