The Many Faces of Bias

Our weekly selection of must-read Editors' Picks and original features

Jun 16, 2022

Bias is a charged term in the fields of data science and machine learning, not least because it means so many things to practitioners. Imbalanced statistical distributions are a form of bias, but so is the representation of racial and gender stereotypes in language models’ training data, or the way researchers’ assumptions get baked into the algorithms they build.

In our latest Author Spotlight Q&A, we chatted with Conor O’Sullivan about his growing interest in algorithmic fairness, a subfield devoted to countering biases in a wide range of data science practices and workflows. It felt only natural to expand on this topic in this week’s Variable, where we highlight several recent articles that approach bias with great nuance and from multiple angles.

A useful primer on preventing and removing bias from datasets. If you’ve just recently tapped into conversations around fairness and bias, a very good place to start is Ella Wilson’s debut TDS post. It defines the core concepts you need to know, and also introduces some of the main approaches to tackling the problem of bias in training data.
A close look at the social impacts of bias. "When someone practices data science, they are either challenging or enforcing an existing structure of power." The starting point of Aisulu Omar’s thought-provoking article is that working with massive datasets isn’t inherently good or bad, but that the combination of non-diverse teams and under-informed individual practitioners can cause (or perpetuate) harm.

On the issue of inference and multiple treatments. Turning to the statistical side of things, Matteo Courthoud’s latest explainer is a lucid and engaging analysis of a recent paper on contamination bias: the problem that arises when we want to observe the effects of multiple, mutually exclusive treatments in contexts like experimental drugs, UX design, or policy debates.
Evaluating survival analysis models correctly. Issues surrounding accurate interpretation are also at the core of Nicolo Cosimo Albanese’s deep dive on performance-evaluation metrics for survival analysis. He covers the ins and outs of several common metrics, and shares examples (in Python) to show readers how to go about choosing the right one.

If you’d like to exercise a few other data-science muscle groups this week—and why wouldn’t you?—here are a few recommended reads, spanning a wide spectrum of topics and approaches.

It’s always a treat to share a new post by Sara A. Metwalli, and this one’s no exception: here’s a concise, actionable tutorial on writing better comments in your code.
Continuing his fascinating, ongoing series on graph neural networks, Michael Bronstein (with coauthors Cristian Bodnar and Fabrizio Frasca) challenges the notion that graphs are, in fact, the right computational fabric for GNNs.
Unit-testing data pipelines is a perennial challenge for data analysts and engineers. Xiaoxu Gao is here to help, with a comprehensive tutorial that covers the implementation of unit tests in dbt.
For the tinkerers and builders out there, Elise Landman walks us through the process of training a basic recommender system with Word2Vec on browser session data.
We send you off this week with the latest from Leah Simpson and Ray McLendon, who leverage insights from Eric Ries’s "lean startup" approach to streamline and optimize machine learning products.

If you felt inspired to become a Medium member recently to support our authors’ work, we truly appreciate it! And we’re always grateful to all our readers and followers for keeping our community vibrant and supportive.

Until the next Variable,

TDS Editors

URL: https://towardsdatascience.com/the-many-faces-of-bias-c515cd483db4/
