Representation Bias in Machine Learning

Understanding Representation Bias

In recent years, AI systems have repeatedly made headlines for their failures. Notable examples include Facebook's ad-delivery algorithm, which excluded women from seeing certain job ads, and Google Photos' image-recognition system, which mislabelled images of Black people as "gorillas" (Suresh and Guttag, 2021). These failures frequently trace back to bias in the data. One specific type of bias that needs to be addressed is representation bias, which arises when the data used to train AI systems does not reflect the full diversity of the real world.

Representation bias often arises because datasets lack adequate representation of minorities or uncommon scenarios. This creates a gap between what the AI “learns” and the real-world situations where it is applied.

 

Why Does Representation Bias Matter?

Here is an example:

“ImageNet is a widely-used image dataset consisting of 1.2 million labelled images. ImageNet is intended to be used widely (i.e., its target population is “all natural images”). However, ImageNet does not evenly sample from this target population; instead, approximately 45% of the images in ImageNet were taken in the United States, and most of the remaining images are from North America or Western Europe. Only 1% and 2.1% of the images come from China and India, respectively” (Suresh and Guttag, 2019).

As a result, a classifier trained on ImageNet performs significantly worse when classifying images of certain objects or people (e.g., “bridegroom”) if the images come from underrepresented countries like Pakistan or India (Suresh and Guttag, 2019).
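A practical first step against this kind of skew is to audit how a dataset is distributed across an attribute of interest before any model is trained. Here is a minimal Python sketch; the `country` field and the 5% threshold are illustrative assumptions, not fixed rules.

from collections import Counter

def representation_report(records, attribute="country", min_share=0.05):
    """Print each group's share of the dataset and flag thin coverage.

    The attribute name and the 5% threshold are illustrative choices,
    not established conventions.
    """
    counts = Counter(r[attribute] for r in records)
    total = sum(counts.values())
    for group, n in counts.most_common():
        share = n / total
        flag = "  <-- under-represented" if share < min_share else ""
        print(f"{group:>15}: {n:5d} ({share:6.1%}){flag}")

# Toy records mirroring the geographic skew described above.
records = (
    [{"country": "United States"}] * 450
    + [{"country": "Western Europe"}] * 320
    + [{"country": "India"}] * 21
    + [{"country": "China"}] * 10
)
representation_report(records)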

Here is another illustration:

“Data can be traumatized by one-time phenomena. An algorithm built for credit card applications uses historical data about the chance of default. In case of an unsuspected event during the collection of data, such as a natural catastrophe in a certain area, people might not be able to pay back their debts. Therefore, applicants from this area will most likely be classified as potential defaults. Thus, the one-time phenomenon is imprinted into the ML-application” (Fahse et al., 2021).
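A pragmatic guard against such one-time phenomena is to identify the records collected during the affected period and region, then exclude or down-weight them before training. The following pandas sketch is illustrative throughout: the column names, the event window, and the 0.2 weight are all assumptions.

import pandas as pd

# Hypothetical loan-application records; all column names are illustrative.
apps = pd.DataFrame({
    "region": ["coastal", "coastal", "inland", "inland"],
    "application_date": pd.to_datetime(
        ["2020-02-01", "2020-09-15", "2020-09-20", "2021-01-10"]),
    "defaulted": [0, 1, 0, 0],
})

# Assumed one-time event: a catastrophe in the coastal region in late 2020.
in_event = (
    (apps["region"] == "coastal")
    & apps["application_date"].between("2020-08-01", "2020-12-31")
)

# Option 1: drop the affected records outright.
cleaned = apps[~in_event]

# Option 2: keep them but down-weight them so the event cannot dominate
# (most scikit-learn estimators accept sample_weight at fit time).
apps["sample_weight"] = 1.0
apps.loc[in_event, "sample_weight"] = 0.2  # illustrative weight

print(cleaned, apps, sep="\n\n")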

Therefore, if an AI is trained only on data representing a specific situation, group, or region, it may struggle to make accurate predictions or decisions for anyone outside that narrow scope. For example, imagine a healthcare AI system trained primarily on data from urban hospitals. What happens when it is applied in rural settings?

Bias like this doesn't just skew results; it erodes trust, accuracy, and fairness. The urban-trained system may fail to make accurate predictions for rural patients, leading to harmful or even life-threatening outcomes.

 

What Causes Representation Bias?

Representation bias can occur for several reasons, including:

  • Historical Discrimination in Data: Past inequalities reflected in the data get baked into the model.
  • Sampling Bias: The data collected may over-represent some groups while under-representing or excluding others (a rebalancing sketch follows this list).
  • Data Preparation Overlooking Diversity: Failing to account for diverse groups during the data-cleaning and preprocessing stages, whether deliberately or through oversight.
  • Shifts in Real-World Data: The world changes, and if the data used to train the model does not keep up, the AI quickly becomes outdated.
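To make the sampling-bias point concrete, one common (if blunt) remedy is to rebalance the training set so under-represented groups are not drowned out. Below is a minimal Python/pandas sketch; the `group` column, the group sizes, and oversampling itself are illustrative assumptions, and collecting more representative data is usually the better fix.

import pandas as pd

# Hypothetical training data: 90 urban records but only 10 rural ones.
df = pd.DataFrame({
    "group": ["urban"] * 90 + ["rural"] * 10,
    "label": [0, 1] * 45 + [0, 1] * 5,
})

# Oversample each group (with replacement) up to the largest group's size.
target = df["group"].value_counts().max()
balanced = (
    df.groupby("group", group_keys=False)
      .apply(lambda g: g.sample(n=target, replace=True, random_state=0))
)
print(balanced["group"].value_counts())  # urban 90, rural 90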

 

How Does Bias Impact Your Organisation?

  • A healthcare AI trained on urban hospital data may perform poorly in rural settings.
  • Credit scoring algorithms may unfairly penalize applicants affected by one-time natural catastrophes.

These flaws don't just erode accuracy; they damage trust, fairness, and your organisation's reputation.

 

Where Does Representation Bias Start?

Product teams often focus on tackling representation bias during the data collection or evaluation phases, using fairness tools. However, representation bias starts creeping in right from the problem definition stage, and mitigating it effectively means taking thoughtful action at every step of the AI/ML pipeline (a monitoring sketch follows the list below), including:

  • AI Problem Definition
  • Data Collection
  • Data Preparation
  • Model Preprocessing
  • Model Development & Deployment
  • Model Validation & Monitoring
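At the validation and monitoring stages of that pipeline, a lightweight first check is to break evaluation metrics down by subgroup instead of reporting a single aggregate number. A minimal Python sketch, using made-up labels and group tags:

from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Per-subgroup accuracy; large gaps between groups are a red flag."""
    hits, totals = defaultdict(int), defaultdict(int)
    for yt, yp, g in zip(y_true, y_pred, groups):
        totals[g] += 1
        hits[g] += int(yt == yp)
    return {g: hits[g] / totals[g] for g in totals}

# Toy validation slice (labels and group tags are illustrative).
y_true = [1, 0, 1, 1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]
groups = ["urban"] * 4 + ["rural"] * 4
print(accuracy_by_group(y_true, y_pred, groups))
# {'urban': 0.75, 'rural': 0.25} -- a gap worth investigating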

How Can You Start Mitigating Representation Bias Today?

Tackling representation bias requires a systematic, proactive approach.

You can get started with these resources:

AI Bias Mitigation Package – £999

The ultimate resource for organisations ready to tackle bias at scale, from problem definition through to model monitoring, to drive responsible AI practices.

  • Mitigate and resolve 15 Types of Bias specific to your project, with detailed guidance from problem definition to model monitoring.
  • Packed with practical methods, research-based strategies, and critical questions to guide your team.
  • Comprehensive checklists with 75+ design cards for every phase of the AI/ML pipeline.

Get the Bias Mitigation Package (delivery within 2-3 days)

Customised AI Bias Mitigation Package – £2499

We'll customise the design cards and checklists to meet your specific use case and compliance requirements, ensuring the toolkit aligns with your goals and industry standards.

  • Mitigate and resolve 15 Types of Bias specific to your project, with detailed guidance from problem definition to model monitoring.
  • Packed with practical methods, research-based strategies, and critical questions specific to your use case.
  • Customised checklists and 75+ design cards for every phase of the AI/ML pipeline.

Get the Customised AI Bias Mitigation Package (delivery within 7 days)

 

 

Conclusion

Understanding representation bias is the first step, but how do you practically identify and mitigate it in your AI systems? From problem definition to model monitoring, ensuring fairness requires a comprehensive approach across the AI lifecycle.

To help your teams navigate these challenges effectively, I’ve developed a concise framework for identifying and mitigating representation bias. This resource provides actionable steps using research-based best practices to ensure your AI systems work equitably and responsibly.

 

 

Sources

Catania, B., Guerrini, G. and Janpih, Z., 2023, December. Mitigating Representation Bias in Data Transformations: A Constraint-based Optimization Approach. In 2023 IEEE International Conference on Big Data (BigData) (pp. 4127-4136). IEEE.

Fahse, T., Huber, V. and van Giffen, B., 2021. Managing bias in machine learning projects. In Innovation Through Information Systems: Volume II: A Collection of Latest Research on Technology Issues (pp. 94-109). Springer International Publishing.

Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K. and Galstyan, A., 2021. A survey on bias and fairness in machine learning. ACM Computing Surveys (CSUR), 54(6), pp.1-35.

Mousavi, M., Shahbazi, N. and Asudeh, A., 2023. Data coverage for detecting representation bias in image datasets: A crowdsourcing approach. arXiv preprint arXiv:2306.13868.

Shahbazi, N., Lin, Y., Asudeh, A. and Jagadish, H.V., 2022. A survey on techniques for identifying and resolving representation bias in data. arXiv preprint arXiv:2203.11852.

Suresh, H. and Guttag, J.V., 2019. A framework for understanding unintended consequences of machine learning. arXiv preprint arXiv:1901.10002.

Suresh, H. and Guttag, J., 2021. A framework for understanding sources of harm throughout the machine learning life cycle. In Proceedings of the ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO), pp. 1-9.

 

 

 
