URL: https://thenewstack.io/removing-bias-from-ai-is-a-human-endeavor/

Removing Bias from AI Is a Human Endeavor

Jul 23rd, 2019 9:10am by Wilson Pang

McKinsey Global Institute recently reported that companies adopting all five forms of AI — computer vision, natural language, virtual assistants, robotic process automation, and advanced machine learning — stand to benefit disproportionately vs. their competitors. However, before organizations can enjoy the benefits of AI, they must ensure the data they use for their initiatives is usable and unbiased.

After all, Machine Learning (ML) algorithms are only as good as the data on which they are trained, and one worrying trend is the emergence of biased algorithms. To remove algorithmic bias, organizations must first ensure the training data they use is as free as possible from bias.

Bias in ML training data can take many forms, but the end result is the same: it can cause an algorithm to miss the relevant relations between features and target outputs. Whether your organization is a small business, a global enterprise, or a governmental agency, it’s essential that you mitigate bias in your training data at every phase of your AI initiatives.

Training Data Makes AI Work

A machine learning model is usually built in three phases: training, validation, and testing. In the training phase, a large amount of data is annotated (labeled by humans or by another method) and fed into a machine learning algorithm with a specific result in mind. The algorithm looks for patterns in the training data that map the input data attributes to the target, then outputs a model that captures these patterns. For the model to be useful, it needs to be accurate, and accuracy requires data that points to the requisite target or target attribute. Validation and testing help refine and prove the model.
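
To make the three phases concrete, here is a minimal sketch of how annotated examples might be divided before any model is trained. The function name `split_dataset`, the 70/15/15 proportions, and the sample queries are illustrative assumptions, not something the phases themselves prescribe.

```python
import random


def split_dataset(examples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle labeled examples and split them into train, validation, and test sets."""
    if not 0 < train_frac < 1 or not 0 <= val_frac < 1 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave room for a test set")
    shuffled = list(examples)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains is held out for testing
    return train, val, test


# Hypothetical annotated data: (input text, human-assigned label)
examples = [(f"query {i}", "relevant" if i % 2 else "not relevant") for i in range(100)]
train, val, test = split_dataset(examples)
print(len(train), len(val), len(test))
```

Shuffling before splitting matters here: if the examples arrive sorted by source or by label, an unshuffled split can hand the model a training set that looks nothing like the data it is later validated and tested on.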

High-Quality Training Data Must Be Unbiased Training Data

Machines need massive volumes of data to learn. Accurately annotating training data is as critical as the learning algorithm itself. A common reason that ML models fall short in terms of accuracy is that they were created based on biased training data.

Without high-quality, unbiased data to train machine learning models, investment in AI initiatives is money wasted. A recent study from Infosys found that 49% of IT decision-makers reported that their organization is unable to deploy the AI technologies they want because their data is not ready to support the requirements of AI technologies…

What Causes Training Data Bias and What Is the Consequence?

Wilson Pang
Wilson Pang joined Appen in November 2018 as CTO and is responsible for the company’s products and technology. Wilson has over seventeen years’ experience in software engineering and data science. Prior to joining Appen, Wilson was Chief Data Officer of CTrip in China, the second-largest online travel agency in the world, where he led data engineers, analysts, data product managers, and scientists to improve user experience and increase operational efficiency, which grew the business. Before that, he was senior director of engineering at eBay in California, where he provided leadership across various domains, including data service and solutions, search science, marketing technology, and billing systems. Before eBay, he worked as an architect at IBM, building technology solutions for various clients. Wilson obtained his master’s and bachelor’s degrees in electrical engineering from Zhejiang University in China.

Engineers and data scientists, as well as executive roles such as the chief technology officer, should carefully consider the prejudices they inherently carry when building AI solutions and do what they can to correct for these prejudices.

Bias in ML models, or “machine bias,” can be a result of unbalanced data. Imagine a data set for a search query classifier on an eCommerce website, built to predict relevant results for a given search term: “women’s shoes.” A typical example of data bias would be a data set that consists mostly of high heels, sandals, and boots, with very few samples of athletic shoes. A classifier trained on this unbalanced data set will lean heavily toward shoes that resemble the given sample data and fail to return relevant results to someone who is looking for women’s tennis shoes. This is bias in action: straightforward, but critical to correct.
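
An imbalance like the one in the shoe example can often be caught with a simple count before training starts. The sketch below is one possible check; the label counts are invented for illustration, and the 10% threshold is an arbitrary assumption that a real project would need to tune.

```python
from collections import Counter


def class_shares(labels):
    """Return each label's share of the data set as a fraction between 0 and 1."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def underrepresented(labels, min_share=0.10):
    """List labels whose share of the data falls below min_share (an assumed cutoff)."""
    return sorted(label for label, share in class_shares(labels).items() if share < min_share)


# Hypothetical annotations for the "women's shoes" query
labels = (
    ["high heels"] * 40
    + ["sandals"] * 30
    + ["boots"] * 27
    + ["athletic shoes"] * 3
)
print(underrepresented(labels))  # ['athletic shoes']
```

A flagged class is a prompt to collect or annotate more samples of that kind, not proof of bias on its own; some categories are legitimately rare.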

As machine learning projects get more complex, with subtle variants to identify, it becomes crucial to have training data that is human-annotated in a completely unbiased way. Human bias introduced during annotation can wreak havoc on the accuracy of a machine learning model. Imagine creating an ML model intended to differentiate not only between washers and dryers but also between the conditions of those appliances.

If you have a team of in-house personnel annotating the images used as training data, it’s essential they adhere to a completely unbiased approach to classifying the images. Let’s say they’ll be classifying a variety of shoe styles by gender, which may be a subjective judgment for many styles. Without a diverse approach, you risk creating a less-than-accurate machine learning model.
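
One way to see where a judgment is subjective is to have several people label the same items and measure how often they agree. The sketch below assumes each image was labeled by four annotators; the image IDs, the labels, and the 75% agreement threshold are all hypothetical.

```python
from collections import Counter


def majority_agreement(item_labels):
    """Return (majority label, share of annotators who chose it) for one item."""
    counts = Counter(item_labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(item_labels)


def flag_contested(annotations, min_agreement=0.75):
    """Return IDs of items where fewer than min_agreement of annotators agree."""
    contested = []
    for item_id, item_labels in annotations.items():
        _, agreement = majority_agreement(item_labels)
        if agreement < min_agreement:
            contested.append(item_id)
    return contested


# Hypothetical labels from four annotators per image
annotations = {
    "img-001": ["women's", "women's", "women's", "women's"],
    "img-002": ["women's", "men's", "unisex", "men's"],
    "img-003": ["men's", "men's", "men's", "unisex"],
}
print(flag_contested(annotations))  # ['img-002']
```

Items that come back contested are good candidates for clearer annotation guidelines or for review by a broader group of annotators.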

If you are basing a mobile app, for example, on the ability to comb e-commerce sites for appliances in a particular condition within a specific price range, a biased, inaccurate ML model is not going to drive the adoption needed to succeed.

How Do We Ensure Our Training Data Isn’t Biased?

To help ensure optimal results, it’s essential that organizations have tech teams with diverse members in charge of both building models and creating training data. In addition to building a diverse team, organizations should also take the following suggestions into consideration when attempting to mitigate bias in their data.

  • If training data comes from internal systems, try to find the most comprehensive data and experiment with different datasets and metrics.
  • If training data is collected or processed by external partners, it is important to recruit diverse crowds for annotation so the data can be more representative.
  • Design the data annotation tasks correctly and carefully communicate instructions so that the crowd correctly performs the tasks without knowing how the data will be used. Knowing what the data may be used for may impact the judgments an annotator makes.
  • Once the training data is created, it’s important to check if the data has any implicit bias.

It Is Up to Humans to Reduce Machine Bias

Organizations that perform data annotation internally have likely discovered that it can be difficult to visualize high-dimensional training data and check it for biases. ML teams should regularly validate machine learning models and test them for bias. A machine learning algorithm will be as biased as the people who collected, contextualized, and fed it its training data. While getting ahead of competitors in the race for AI adoption may be crucial for business success, humans must still oversee algorithms.
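
Testing a model for bias can start with something as simple as breaking accuracy down by subgroup instead of reporting a single overall number. The sketch below is one possible approach, not an established standard; the group names and the evaluation records are made up for illustration.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Compute accuracy per group from (group, true_label, predicted_label) records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in records:
        total[group] += 1
        correct[group] += int(truth == predicted)
    return {group: correct[group] / total[group] for group in total}


# Hypothetical evaluation results for a search relevance classifier
records = [
    ("heels", "relevant", "relevant"),
    ("heels", "relevant", "relevant"),
    ("heels", "not relevant", "not relevant"),
    ("heels", "relevant", "relevant"),
    ("athletic", "relevant", "not relevant"),
    ("athletic", "relevant", "not relevant"),
    ("athletic", "relevant", "relevant"),
    ("athletic", "not relevant", "not relevant"),
]
scores = accuracy_by_group(records)
gap = max(scores.values()) - min(scores.values())
print(scores, gap)
```

Overall accuracy on these records is 75%, which hides the fact that one group is served perfectly and the other only half the time. A large gap between groups is the signal worth investigating.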

Ultimately, it is up to us — CTOs, CEOs, CIOs, data scientists, machine learning engineers, and product managers — to determine the path machine learning algorithms take. As AI practitioners, we should carefully consider the prejudices we inherently carry when creating these technologies and correct for them.

Feature image via Pixabay.
