URL: https://thenewstack.io/why-data-science-teams-should-be-using-pair-programming/

Why Data Science Teams Should Be Using Pair Programming - The New Stack



Why Data Science Teams Should Be Using Pair Programming

Pair programming is common in software engineering, less so in data science. This is a missed opportunity. Here are three ways pair programming benefits data science teams.
Aug 11th, 2023 6:20am by Woo Jung
VMware Tanzu sponsored this post.

Data science is a practice that requires technical expertise in machine learning and code development. However, it also demands creativity (for instance, connecting dense numbers and data to real user needs) and lean thinking (like prioritizing the experiments and questions to explore next). In light of these needs, and to continuously innovate and create meaningful outcomes, it’s essential to adopt processes and techniques that facilitate high levels of energy, drive and communication in data science development.

Pair programming can increase communication, creativity and productivity in data science teams. It is a collaborative way of working in which two people take turns coding and navigating on the same problem, at the same time, on the same computer, which is connected to two mirrored screens, two mice and two keyboards.

At VMware Tanzu Labs, our data scientists practice pair programming with each other and with our client-side counterparts. Pair programming is more widespread in software engineering than in data science. We see this as a missed opportunity. Let’s explore the nuanced benefits of pair programming in the context of data science, delving into three aspects of the data science life cycle and how pair programming can help with each one.

Pairing to Discover Creatively

When data scientists pick up a story for development, exploratory data analysis (EDA) is often the first step in which we start writing code. Arguably, among all components of the development cycle that require coding, EDA demands the most creativity from data scientists: The aim is to discover patterns in the data and build hypotheses around how we might be able to use this information to deliver value for the story at hand.

If new data sources need to be explored to deliver the story, we get familiar with them by asking questions about the data and validating what information they are able to provide to us. As part of this process, we scan sample records and iteratively design summary statistics and visualizations for reexamination.

Pairing in this context enables us to immediately discuss and spark a continuous stream of second opinions and tweaks on the statistics and visualizations displayed on the screen; we each build on the energy of our partner. Practicing this level of energetic collaboration in data science goes a long way toward building the creative confidence needed to generate a wider range of hypotheses, and it adds more scrutiny to synthesis when distinguishing between coincidence and correlation.
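To make the EDA loop concrete, here is a minimal sketch of the kind of summary statistics a pair might put on the screen and then iterate on together. The records and the field names (`sessions`, `plan`) are hypothetical, and the helper functions are illustrative assumptions rather than anything from a specific project.

```python
import statistics
from collections import Counter

# Hypothetical sample records; the field names are illustrative only.
records = [
    {"user_id": 1, "sessions": 3, "plan": "free"},
    {"user_id": 2, "sessions": 12, "plan": "paid"},
    {"user_id": 3, "sessions": None, "plan": "free"},
    {"user_id": 4, "sessions": 7, "plan": "paid"},
    {"user_id": 5, "sessions": 2, "plan": "free"},
]


def summarize_numeric(rows, field):
    """Return count, missing count, mean and median for one numeric field."""
    values = [row[field] for row in rows if row[field] is not None]
    return {
        "count": len(values),
        "missing": len(rows) - len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


def summarize_categorical(rows, field):
    """Return how often each category appears in one field."""
    return dict(Counter(row[field] for row in rows))


session_summary = summarize_numeric(records, "sessions")
plan_summary = summarize_categorical(records, "plan")
print(session_summary)
print(plan_summary)
```

A navigator looking at this output might immediately ask why one record has no session count, or whether the mean and median diverge enough to suggest a skewed distribution. These are the second opinions that pairing surfaces early.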

Pairing for Lean Experimentation

Based on what we learn about the data from EDA, we next try to summarize a pattern we’ve observed that is useful in delivering value for the story at hand. In other words, we build or “train” a model that concisely and sufficiently represents a useful and valuable pattern observed in the data.

Arguably, this part of the development cycle demands the most “science” from data scientists as we continuously design, analyze and redesign a series of scientific experiments. We iterate on a cycle of training and validating model prototypes and make a selection as to which one to publish or deploy for consumption.

Pairing is essential to facilitating lean and productive experimentation in model training and validation. With so many model forms and algorithms available, balancing simplicity and sufficiency is necessary to shorten development cycles, tighten feedback loops and mitigate overall risk in the product team.

As a data scientist, I sometimes need to resist the urge to use a sophisticated, stuffy algorithm when a simpler model fits the bill. I have biases based on prior experience that influence the algorithms explored in model training.

Having my paired data scientist as my “data conscience” in model training helps me put on the brakes when I’m running a superfluous number of experiments. My pair constructively challenges the choices made in algorithm selection and course-corrects me when I drift away from training prototypes strictly in support of the current story.
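One way a pair might encode the "simplest model that is good enough" rule is sketched below. This is an illustration under stated assumptions: the data is made up, the two candidate models (a mean baseline and a single-feature least-squares line) are stand-ins for whatever a team would actually compare, and the `tolerance` threshold is a hypothetical value the pair would agree on before running experiments.

```python
import statistics

# Hypothetical data: y is roughly linear in x. Values are illustrative only.
train = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1), (6, 12.0)]
holdout = [(7, 14.2), (8, 15.9), (9, 18.1)]


def fit_mean(data):
    """Simplest baseline: always predict the training mean of y."""
    mean_y = statistics.mean(y for _, y in data)
    return lambda x: mean_y


def fit_line(data):
    """Ordinary least squares for a single feature."""
    mean_x = statistics.mean(x for x, _ in data)
    mean_y = statistics.mean(y for _, y in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def mean_abs_error(model, data):
    """Average absolute prediction error on the given data."""
    return statistics.mean(abs(model(x) - y) for x, y in data)


# Candidates ordered from simplest to most complex.
candidates = [("mean baseline", fit_mean), ("linear", fit_line)]
scores = [(name, mean_abs_error(fit(train), holdout)) for name, fit in candidates]

# Keep the simplest candidate whose error is within a tolerance of the best.
tolerance = 0.5  # assumed threshold; a pair would agree on this up front
best_error = min(error for _, error in scores)
chosen = next(name for name, error in scores if error <= best_error + tolerance)
print(scores, chosen)
```

Because the candidates are listed from simplest to most complex, a more sophisticated model is selected only when it beats the simpler ones by more than the agreed tolerance on held-out data.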

Pairing for Reproducibility

In addition to aspects of pair programming that influence productivity in specific components of the development cycle such as EDA and model training/validation, there are also perhaps more mundane benefits of pairing for data science that affect productivity and reproducibility more generally.

Take the example of pipelining. Much of the code written for data science is sequential by nature. The metrics we discover and design in EDA are derived from raw data that requires sequential coding to clean and process. These same metrics are then used as key pieces of information (a.k.a. “features”) when we build experiments for model training. In other words, the code written to design these metrics is a dependency for the code written for model training. Within model training itself, we often try different versions of a previously trained model (which we have previously written code to build) by exploring different variations of input parameter values to improve accuracy. The components and dependencies described above can be represented as steps and segments in a logical, sequential pipeline of code.
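The sequential dependencies described above can be sketched as a small pipeline of functions, where each step consumes the output of the previous one. The record fields (`spend`, `visits`) and the step names are hypothetical and chosen only for illustration.

```python
# Hypothetical raw records; None marks a missing value.
raw = [
    {"id": 1, "spend": "10.5", "visits": "3"},
    {"id": 2, "spend": None, "visits": "5"},
    {"id": 3, "spend": "20.0", "visits": "4"},
]


def clean(rows):
    """Step 1: drop incomplete rows and convert strings to numbers."""
    return [
        {"id": r["id"], "spend": float(r["spend"]), "visits": int(r["visits"])}
        for r in rows
        if r["spend"] is not None and r["visits"] is not None
    ]


def add_features(rows):
    """Step 2: derive a metric ("feature") from the cleaned fields."""
    return [{**r, "spend_per_visit": r["spend"] / r["visits"]} for r in rows]


def run_pipeline(data, steps):
    """Apply each step in order; every step consumes the previous output."""
    for step in steps:
        data = step(data)
    return data


features = run_pipeline(raw, [clean, add_features])
print(features)
```

Keeping each step as a named function makes the dependency order explicit: a model-training step appended to the list would visibly depend on `add_features`, which in turn depends on `clean`.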

Pairing in the context of pipelining brings benefits in shared accountability driven by a sense of shared ownership of the codebase. While all data scientists know and understand the benefits of segmenting and modularizing code, when coding without a pair, it is easy to slip into a habit of creating overly lengthy code blocks, losing count of similar code being copied, pasted and modified, and discounting groups of code dependencies that are only obvious to the person coding. These habits create cobwebs in the codebase and increase risks to reproducibility.

Enter your paired data scientist, who can raise a hand when it becomes challenging to follow the code, highlight groups of code to break up into pipeline segments and suggest blocks of repeated similar code to bundle into reusable functions. Note that this works bidirectionally: when practicing pairing, the data scientist who is typing is fully aware of the shared nature of code ownership and is proactively driven to make efforts to write reproducible code. Pairing is thus an enabler for creating and maintaining a reproducible data science codebase.
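As a small illustration of bundling repeated code into a reusable function, the sketch below shows copy-pasted scaling logic (in the comments) replaced by one helper. The column names and the choice of min-max scaling are assumptions made for the example.

```python
# Before: the same scaling logic copied, pasted and modified per column
# (hypothetical column names):
#   spend_scaled = [(v - min(spend)) / (max(spend) - min(spend)) for v in spend]
#   visits_scaled = [(v - min(visits)) / (max(visits) - min(visits)) for v in visits]


def min_max_scale(values):
    """Scale values to the range [0, 1]; constant columns map to 0.0."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


spend = [10.0, 20.0, 30.0]
visits = [3, 5, 4]

# After: one function, reused for every column.
spend_scaled = min_max_scale(spend)
visits_scaled = min_max_scale(visits)
print(spend_scaled, visits_scaled)
```

The refactored version also handles the constant-column edge case in one place, which is the kind of detail that tends to get fixed in one pasted copy and forgotten in the others.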

How to Get Started

If pair programming is new to your data science practice, we hope this post encourages you to explore it with your team, perhaps alongside a data science course. At Tanzu Labs, we have introduced pair programming to many of our client-side data scientists. We have observed that the cycles of continuous communication and feedback inherent in pair programming instill a way of working that sparks more creativity in data discovery, facilitates lean experimentation in model training and promotes better reproducibility of the codebase. And let’s not forget that we do all of this to deliver outcomes that delight users and drive meaningful business value.

Here are some practical tips to get started with pair programming in data science:

  • Synchronize schedules: Full-time pairing is easiest when participants start and end at the same time. This allows you to maximize your pairing time, as well as to stay on the same circadian rhythm. If this is not possible, for instance, due to time zone differences, define what hours you will be pairing.
  • Set up a pairing station: If you are pairing in person, set up a workstation where two monitors, two mice and two keyboards are attached to the same computer. If you are working remotely, ensure you have access to a videoconferencing tool with great screen-sharing technology, especially one that allows remote control. This will help both parties to stay engaged and make collaboration much smoother.
  • Practice empathy: Pairing with someone throughout the workday is immensely fun and exhilarating when both partners are actively listening, validating each other’s thoughts and perspectives and engaging in acts of kindness.
  • Take breaks: Pairing is an intensive approach to writing code and requires continuous concentration and communication. Don’t forget to take frequent breaks when pairing to unwind, recharge and get back at it again.
Woo Jung is head of data science for Asia-Pacific and Japan, VMware Tanzu Labs. Before moving to Asia in 2017, Woo led data science teams in the United States with Pivotal Labs and EMC. Woo is focused on delivering data-driven...