Reporting Bias

Understanding Reporting Bias

Reporting bias occurs when certain patterns or perspectives are disproportionately represented in datasets, resulting in skewed outputs from models trained on this data. Language models (LMs) such as RoBERTa and GPT-2 are especially susceptible to this issue: biases embedded in their training data shape the models’ responses and are reinforced in the outputs they generate.
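
For illustration, one simple way to see reporting bias surface in a language model is to probe a masked-language model with paired prompts and compare its top completions. The following is a minimal sketch assuming the Hugging Face transformers library and the public roberta-base checkpoint; the prompts are illustrative choices, not part of the original guidance.

```python
# Minimal sketch: probing a masked language model for skewed associations.
# Assumes the Hugging Face `transformers` library and the public
# `roberta-base` checkpoint; the prompts are illustrative only.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")

# Compare top completions for two otherwise identical prompts.
for subject in ("man", "woman"):
    prompt = f"The {subject} worked as a <mask>."
    predictions = fill_mask(prompt, top_k=5)
    print(subject, "->", [p["token_str"].strip() for p in predictions])
```

Systematic differences between the two completion lists reflect patterns over-represented in the training corpus rather than real-world occupational frequencies.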

Reporting bias is similar to measurement bias, and it arises in two key phases of the modelling process (Fahse et al., 2021):

  1. BU-Phase (Business Understanding):
    • Bias occurs through subjective choices during model design, particularly when defining the target variable and selecting features.
    • Using imperfect proxies or protected attributes (e.g., race, gender) can lead to discrimination or inaccuracies.
    • Even if protected attributes are excluded, their correlation with non-protected attributes (the redlining effect) can still introduce bias, as the sketch after this list illustrates.
  2. DP-Phase (Data Preparation):
    • Bias can emerge during feature creation, derivation, or transformation, potentially omitting critical factors or introducing noise.
    • Inaccurate features or reliance on a limited number of inappropriate features may result in varying prediction accuracy across groups.
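
To make the redlining effect concrete, the sketch below builds a synthetic dataset in which the protected attribute is never used as a feature, yet a correlated non-protected feature (a hypothetical neighbourhood code) carries the same signal into the decision rule. All names, rates, and values are illustrative assumptions, not part of the Fahse et al. (2021) framework.

```python
# Minimal sketch of the redlining effect on synthetic data.
# The protected attribute is excluded from the feature set, but a
# correlated proxy (a hypothetical `neighbourhood` code) carries the
# same signal. All column names and rates are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

protected = rng.integers(0, 2, size=n)  # protected group flag (never a feature)
# The neighbourhood code matches the protected attribute 80% of the time.
neighbourhood = np.where(rng.random(n) < 0.8, protected, 1 - protected)

print(f"corr(protected, neighbourhood) = "
      f"{np.corrcoef(protected, neighbourhood)[0, 1]:.2f}")  # ~0.60

# A decision rule keyed only on `neighbourhood` still produces
# unequal outcomes across the protected groups.
approved = neighbourhood == 0
for g in (0, 1):
    print(f"group {g} approval rate: {approved[protected == g].mean():.2f}")
```

Even though the protected attribute never enters the model, the two groups end up with approval rates of roughly 0.80 and 0.20, driven entirely by the correlated proxy.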

Implications of Reporting Bias in Machine Learning

For example, “creditworthiness” is an abstract construct that is often operationalised with a measurable proxy such as a credit score. Proxies become problematic when they are poor reflections of the target construct and/or are generated differently across groups; they can then contribute to bias through (Suresh and Guttag, 2021):

  • Oversimplification of Complex Constructs:
    Proxies like credit score fail to capture the full complexity of “creditworthiness”. This oversimplification can ignore group-specific indicators of success or risk, leading to biased outcomes.
  • Variability in Measurement Across Groups:
    Measurement methods may differ between groups, introducing bias. For instance, stricter monitoring at certain factory locations can inflate error counts (where the observed number of errors is used as a proxy for work quality), creating feedback loops that justify further monitoring of those groups.
  • Accuracy Disparities Across Groups:
    Structural discrimination can lead to systematic inaccuracies, such as racial or gender disparities in medical diagnoses or misclassification in criminal justice risk assessments. For example, proxies like “arrest” or “rearrest” disproportionately misrepresent minority communities due to over-policing, leading to models with higher false positive rates for these groups (Mehrabi et al., 2021), as the sketch after this list illustrates.
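
Accuracy disparities of this kind can be quantified directly once predictions, ground truth, and group membership are available. The sketch below computes the false positive rate per group on synthetic data; the group labels, error rates, and sizes are assumptions chosen purely to illustrate the check.

```python
# Minimal sketch: measuring false positive rate (FPR) per group.
# Labels, predictions, and error rates are synthetic and illustrative.
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

group = rng.integers(0, 2, size=n)   # 0 = majority, 1 = over-policed group
y_true = rng.integers(0, 2, size=n)  # ground-truth outcome

# Simulate a model whose errors skew against group 1:
# its predictions are flipped three times as often.
flip = rng.random(n) < np.where(group == 1, 0.30, 0.10)
y_pred = np.where(flip, 1 - y_true, y_true)

for g in (0, 1):
    negatives = (group == g) & (y_true == 0)  # actual negatives only
    fpr = y_pred[negatives].mean()            # fraction wrongly flagged
    print(f"group {g} FPR: {fpr:.2f}")        # ~0.10 vs ~0.30
```

Running this kind of per-group audit on real predictions is a first step towards detecting the disparities described above before a model is deployed.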

 

Design Approaches to Mitigate Reporting Bias in Machine Learning
Tackling reporting bias requires a systematic, proactive approach.
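
As one illustrative starting point, pre-processing techniques from the fairness literature, such as reweighing training examples so that group membership and outcome appear statistically independent, can reduce the effect of skewed reporting in the observed data. The sketch below implements this idea on synthetic data; the rates and group structure are assumptions, and in practice the computed weights would be passed to a learner via a sample_weight argument.

```python
# Minimal sketch: reweighing so each (group, label) combination
# contributes as if group and label were independent. Synthetic data;
# the computed weights would feed a learner via `sample_weight`.
import numpy as np

rng = np.random.default_rng(2)
n = 8_000
group = rng.integers(0, 2, size=n)
# Observed labels are skewed against group 1.
label = (rng.random(n) < np.where(group == 1, 0.3, 0.6)).astype(int)

weights = np.empty(n)
for g in (0, 1):
    for y in (0, 1):
        cell = (group == g) & (label == y)
        expected = (group == g).mean() * (label == y).mean()  # if independent
        weights[cell] = expected / cell.mean()

# After reweighing, the weighted positive rate is equal across groups.
for g in (0, 1):
    m = group == g
    rate = np.average(label[m], weights=weights[m])
    print(f"group {g} weighted positive rate: {rate:.2f}")  # both ~0.45
```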

You can get started with these resources:

AI Bias Mitigation Package – £999

The ultimate resource for organisations ready to tackle bias at scale, from problem definition through to model monitoring, driving responsible AI practices.

  • Mitigate and resolve 15 Types of Bias specific to your project with detailed guidance from problem definition to model monitoring.
  • Packed with practical methods, research-based strategies, and critical questions to guide your team.
  • Comprehensive checklists with 75+ design cards for every phase of the AI/ML pipeline.

Get Bias Mitigation Package – (delivery within 2-3 days)
Customised AI Bias Mitigation Package – £2499
We’ll customise the design cards and checklists to meet your specific use case and compliance requirements—ensuring the toolkit aligns perfectly with your goals and industry standards.
  • Mitigate and resolve 15 Types of Bias specific to your project with detailed guidance from problem definition to model monitoring.
  • Packed with practical methods, research-based strategies, and critical questions specific to your use case.
  • Customised checklists and 75+ design cards for every phase of the AI/ML pipeline.

Get Customised AI Bias Mitigation Package – (delivery within 7 days)

 

Summary

Our goal is not to prescribe specific statistical methods or tools to address reporting bias, as such technical details are beyond the scope of this guidance. Instead, we aim to highlight key considerations, challenges, and strategies for identifying and mitigating reporting bias in data. By fostering awareness of its implications, we encourage practitioners to adopt context-appropriate solutions informed by their application requirements and stakeholder engagement.

 

Sources

Fahse, T., Huber, V. and van Giffen, B., 2021. Managing bias in machine learning projects. In Innovation Through Information Systems: Volume II: A Collection of Latest Research on Technology Issues (pp. 94-109). Springer International Publishing.

Mavrogiorgos, K., Kiourtis, A., Mavrogiorgou, A., Menychtas, A. and Kyriazis, D., 2024. Bias in machine learning: A literature review. Applied Sciences, 14(19), p.8860.

Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K. and Galstyan, A., 2021. A survey on bias and fairness in machine learning. ACM Computing Surveys (CSUR), 54(6), pp.1-35.

Shahbazi, N., Lin, Y., Asudeh, A. and Jagadish, H.V., 2022. A survey on techniques for identifying and resolving representation bias in data. CoRR, abs/2203.11852.

Suresh, H. and Guttag, J., 2021, October. A framework for understanding sources of harm throughout the machine learning life cycle. In Proceedings of the 1st ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (pp. 1-9).

Tay, L., Woo, S.E., Hickman, L., Booth, B.M. and D’Mello, S., 2022. A conceptual framework for investigating and mitigating machine-learning measurement bias (MLMB) in psychological assessment. Advances in Methods and Practices in Psychological Science, 5(1).
