URL: https://towardsdatascience.com/confronting-bias-in-data-is-still-difficult-and-necessary-e0982fd7416c/

Confronting Bias in Data Is (Still) Difficult-and Necessary | Towards Data Science


Confronting Bias in Data Is (Still) Difficult-and Necessary

Our weekly selection of must-read Editors' Picks and original features

3 min read
Photo by Charlotte Harrison on Unsplash

Year after year, datasets get bigger, cloud servers run faster, and analytics tools become more sophisticated. Despite this constant progress, however, practitioners continue to run into the issue of bias—whether it’s lurking in the dark recesses of their data files, popping up in their models’ outputs, or framing their project’s root assumptions.

A definitive solution to bias will require a lot more than local changes to a data team’s workflows; it’s not realistic to expect tactical fixes to solve a deep-rooted systemic problem. There’s hope, however, in the growing recognition (in tech and beyond) that this is, indeed, a problem to think about, discuss, and tackle collectively.

This week, we’re highlighting several articles that cover bias and data (and bias in data) in creative, actionable, and thought-provoking ways.

  • The different types of bias you might encounter. For anyone who’s exploring this topic for the first time, Shahrokh Barati’s primer is an essential read on the differences between statistical bias and ethical bias: "two different categories of bias with distinct root causes and mitigations" that can each jeopardize data projects (and harm end users) if left unaddressed.
  • A powerful strategy to add to your anti-bias toolkit. After ML models go into production, they continue to evolve as teams fine-tune them to optimize their performance. Every tweak is a potential opening for bias to sneak in – which is why Jazmia Henry advocates for the adoption of model versioning, an approach that "allows for model rollbacks that can save your company money long term, but more importantly, help reduce bias if and when it arises."
  • Who shapes the politics of language models’ outputs? The rapid integration of chatbots into our day-to-day lives raises the question of their objectivity. Yennie Jun attempted to measure the political leanings of GPT-3’s outputs; the fascinating results she reports prompt a whole set of further questions about the responsibility and transparency of the people who train and design these powerful models.
  • How biased data can become a life-and-death issue. When we think of a field where data science and ML can make a major impact, healthcare is a common example, with many real-world applications already in use (or getting close). As Stefany Goradia shows, though, the datasets that health data scientists rely on can be rife with numerous forms of bias, which is why it’s crucial they know how to identify them correctly.
  • A deeper understanding of how bias works within AI systems. To round out our selection, we recommend reading Boris Ruf’s lucid explanation of the inner workings of models (statistical formulas and all!) and how their design makes them susceptible to producing biased outputs.

For any of you who’d like to branch out into other topics over the next few days—from A/B testing to natural language processing—we’re delighted to share some of our recent favorites. Enjoy!


We hope you consider becoming a Medium member this week – it’s the most direct and effective way to support the work we publish.

Until the next Variable,

TDS Editors


