The Long Walk from Data Science to Business Value
Why grounding data science projects in business perspective is key to their success.
Businesses in competitive markets must be aggressive in finding differentiated experiences that enrich their end users and ecosystem partners. Data has been characterized as a "natural resource" for an enterprise. Differentiation can be attained by using enterprise data to identify and understand trends, patterns, and even behaviors. Machine Learning (ML) models can be engineered to generate predictions from this precious enterprise data, yielding insights that help deliver that differentiation to end users.
Here’s what I’ve learned from the field: most data science projects fail because they lose sight of business needs. Experts often become mired in data science wormholes and overlook the common goal. If we fail to ground our ML models in a business perspective, we forfeit their potential to drive business growth. Let’s consider some key questions that help tackle this.
Understanding the business problem.
- Business framing workshops: These workshops aim solely at defining a use case tailored to business needs. As a proud IBMer and CSMer, I can say that we rock this at IBM. Mural is a great tool for running this exercise remotely.
- Industry research: Is your business in banking, retail or telco? Understanding your industry allows you to identify the key business drivers shaped by that industry's context.
- Business Case: Is the use case supported and funded by a business case? Will our efforts drive revenue growth if successful?
Viewing the project through a business lens.
Here are three questions that can help us take the pulse of a data project.
- Can I clearly articulate the business problem to others?
- Do I have all the information I need to understand the problem?
- How does my ML model address the problem?
Ensuring the model addresses the business problem.
There are books and careers dedicated to this very question, so take this as an introduction to model selection. It’s worth mentioning at this point that data preparation and cleaning are out of scope here; that’s for another day… or year, depending on how long you take to clean your data.
- ML model testing: Testing involves splitting the data into training, validation, and test sets. We then fit candidate models on the training set, evaluate and select among them on the validation set, and report the performance of the final model on the test set. It’s critical to keep the training, validation, and test sets completely separate for a fair evaluation.
- ML model evaluation: Which model best addresses the business requirements? Could we pivot to a more effective model? Are we splitting the available data in a sensible way?
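The split, fit, select, and report workflow described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a production pipeline: the 60/20/20 split, the synthetic sine-wave data, and the helpers `train_val_test_split`, `knn_predict`, and `select_and_report` are hypothetical choices for this example, with k-nearest-neighbour regressors of different k standing in for the "candidate models".

```python
import math
import random


def train_val_test_split(n, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle row indices once and cut them into three disjoint index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test


def knn_predict(train_x, train_y, x, k):
    """Hypothetical candidate model: mean target of the k nearest training points.

    k-NN has no explicit fitting step; "training" just means keeping the training rows.
    """
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def mse(train_x, train_y, xs, ys, k):
    """Mean squared error of the k-NN model on the (xs, ys) rows."""
    errors = [(y - knn_predict(train_x, train_y, x, k)) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


def select_and_report(xs, ys, candidate_ks=(1, 3, 5, 15)):
    train, val, test = train_val_test_split(len(xs))

    def rows(indices):
        return [xs[i] for i in indices], [ys[i] for i in indices]

    train_x, train_y = rows(train)
    val_x, val_y = rows(val)
    test_x, test_y = rows(test)

    # Compare candidates on the validation set only.
    val_scores = {k: mse(train_x, train_y, val_x, val_y, k) for k in candidate_ks}
    best_k = min(val_scores, key=val_scores.get)

    # Touch the test set exactly once, for the selected model only.
    test_mse = mse(train_x, train_y, test_x, test_y, best_k)
    return best_k, val_scores, test_mse


# Synthetic data, invented for illustration.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]

best_k, val_scores, test_mse = select_and_report(xs, ys)
print(f"selected k={best_k}, validation MSE={val_scores[best_k]:.3f}, test MSE={test_mse:.3f}")
```

The key design choice is that the test rows are never consulted while choosing k; only the validation scores drive the selection, so the reported test error stays an honest estimate.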
There are no perfect models – the aim is to find one that’s good enough to address the business problem.
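One way to make "good enough to address the business problem" concrete is to score candidates on an estimated business cost rather than on accuracy alone. The sketch below assumes a hypothetical use case in which missing a positive case (a false negative) costs 500 units and a false alarm (a false positive) costs 50; the labels, predictions, costs, and the 200-unit budget are all invented purely for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
    }


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    c = confusion_counts(y_true, y_pred)
    return (c["tp"] + c["tn"]) / len(y_true)


def business_cost(y_true, y_pred, cost_fp, cost_fn):
    """Total cost of errors under assumed per-error costs."""
    c = confusion_counts(y_true, y_pred)
    return c["fp"] * cost_fp + c["fn"] * cost_fn


# Hypothetical labels and predictions from two candidate models.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predictions = {
    "model_a": [1, 0, 0, 1, 0, 0, 0, 0, 0, 0],  # cautious: misses two positives
    "model_b": [1, 1, 1, 1, 1, 1, 1, 0, 0, 0],  # eager: catches all positives, three false alarms
}
COST_FP, COST_FN = 50, 500  # assumed costs, not real figures
BUDGET = 200                # assumed "good enough" threshold on total error cost

costs = {name: business_cost(y_true, p, COST_FP, COST_FN) for name, p in predictions.items()}
chosen = min(costs, key=costs.get)
for name, p in predictions.items():
    print(f"{name}: accuracy={accuracy(y_true, p):.1f}, cost={costs[name]}")
print(f"chosen: {chosen}, good enough: {costs[chosen] <= BUDGET}")
```

In this made-up case the more accurate model is also the more expensive one. Which errors matter, and how much, is a question for the business case rather than for the model.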
Conclusion
With great data comes great power. Anchoring a data project in a business perspective is vital to its success. When selecting an ML model, question whether the model actually tackles the business problem.
Footnote
In practice, there may not be enough data to split into training, validation, and test sets. In that case, two families of techniques are commonly used to approximate model selection:
- Probabilistic Measures: Choose a model based on its in-sample error and its complexity (for example, AIC or BIC).
- Resampling Methods: Choose a model based on its estimated out-of-sample error (for example, k-fold cross-validation).
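Both approaches can be sketched in a few lines of plain Python. This minimal example assumes two hypothetical candidate models (a constant predictor with one parameter and a one-feature least-squares line with two parameters) and synthetic linear data. As the probabilistic measure it uses the Akaike Information Criterion for Gaussian errors, written up to an additive constant as n ln(RSS/n) + 2k, where RSS is the residual sum of squares and k the number of parameters; as the resampling method it uses 5-fold cross-validation to estimate out-of-sample mean squared error. The helper names are illustrative only.

```python
import math
import random


def fit_constant(xs, ys):
    """Candidate 1 (1 parameter): always predict the mean of y."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Candidate 2 (2 parameters): least-squares line y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return lambda x: a + b * x


# name -> (fitting function, number of parameters)
CANDIDATES = {"constant": (fit_constant, 1), "line": (fit_line, 2)}


def rss(model, xs, ys):
    """Residual sum of squares of a fitted model on (xs, ys)."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys))


def aic(fit, n_params, xs, ys):
    """In-sample error plus a complexity penalty (Gaussian AIC up to a constant)."""
    n = len(xs)
    return n * math.log(rss(fit(xs, ys), xs, ys) / n) + 2 * n_params


def kfold_mse(fit, xs, ys, k=5, seed=0):
    """Estimate out-of-sample MSE: each fold is held out once while the rest trains."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    total = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        total += rss(model, [xs[i] for i in fold], [ys[i] for i in fold])
    return total / len(xs)


# Synthetic data, invented for illustration: a noisy straight line.
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(40)]
ys = [3.0 + 2.0 * x + rng.gauss(0, 1.0) for x in xs]

aic_scores = {name: aic(fit, p, xs, ys) for name, (fit, p) in CANDIDATES.items()}
cv_scores = {name: kfold_mse(fit, xs, ys) for name, (fit, _) in CANDIDATES.items()}
print("lowest AIC:", min(aic_scores, key=aic_scores.get))
print("lowest 5-fold CV MSE:", min(cv_scores, key=cv_scores.get))
```

AIC reuses every row for fitting and pays for extra parameters with an explicit penalty, while cross-validation spends compute instead, refitting once per fold so every row is scored by a model that never saw it.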