Understanding Measurement Bias
As artificial intelligence (AI) and machine learning (ML) increasingly integrate into society, concerns about the systematic inequalities these technologies may introduce or reinforce have grown significantly. Measurement bias, a key contributor to these disparities, arises when data collection or labelling processes fail to accurately capture the characteristics of different groups. According to Fahse et al. (2021), measurement bias arises in two key phases of the modelling process:
- BU-Phase (Business Understanding):
  - Bias occurs through subjective choices during model design, particularly when defining the target variable and selecting features.
  - Using imperfect proxies or protected attributes (e.g., race, gender) can lead to discrimination or inaccuracies.
  - Even if protected attributes are excluded, their correlation with non-protected attributes (the redlining effect) can still introduce bias; the sketch after this list shows one way to check for such correlations.
- DP-Phase (Data Preparation):
  - Bias can emerge during feature creation, derivation, or transformation, potentially omitting critical factors or introducing noise.
  - Inaccurate features or reliance on a limited number of inappropriate features may result in varying prediction accuracy across groups.
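To make the redlining effect concrete, here is a minimal sketch in Python. It uses entirely synthetic data and hypothetical column names (postcode_income, n_arrests); it simply measures how strongly each candidate feature correlates with a protected attribute that is not itself used as a feature.

```python
# Minimal sketch (synthetic data): flagging the "redlining effect" during data preparation.
# The protected attribute ("ethnicity") is NOT a model feature, but it is correlated
# with a non-protected feature, so excluding it alone does not remove its influence.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 5_000

ethnicity = rng.integers(0, 2, size=n)                                  # 0/1 group label
postcode_income = 30_000 + 10_000 * (1 - ethnicity) + rng.normal(0, 5_000, n)
n_arrests = rng.poisson(1 + 0.5 * ethnicity)                            # measurement disparity

features = pd.DataFrame({"postcode_income": postcode_income,
                         "n_arrests": n_arrests})

# Simple audit: correlation of each candidate feature with the protected attribute.
# A large absolute value warns that the feature acts as a proxy for the attribute.
for col in features.columns:
    r = np.corrcoef(features[col], ethnicity)[0, 1]
    print(f"{col:>16s}  correlation with protected attribute: {r:+.2f}")
```

A correlation screen like this is only a first signal; it should be followed up with domain knowledge and more formal fairness diagnostics appropriate to the application.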
Example of Measurement Bias in Machine Learning
Measurement bias can skew model predictions and outcomes, leading to harmful consequences (Tay et al., 2022). For instance, in healthcare, an algorithm designed to allocate resources underestimated the needs of Black patients compared to White patients, leading to inequitable healthcare access. Similarly, in hiring, discrepancies in how candidates are evaluated can perpetuate workplace inequalities. Addressing measurement bias is therefore critical to fostering fairness and equity in AI-driven decision-making. Measurement (or reporting) bias arises from the way we choose, collect, and measure specific features in a dataset.
In a crime prediction application, the feature “number of arrests” is used to predict the likelihood of future criminal activity. However, this feature can reflect biases in data collection rather than actual differences in behavior (Fahse et al., 2021). For example, consider African American and Caucasian defendants who commit the same number of drug sales and thus share a similar true risk. If arrest rates are recorded differently across ethnic groups—such as heavier policing in minority neighborhoods—this can lead to disparities in the data. African American defendants in these neighborhoods are more likely to have higher numbers of drug-related arrests. As a result, even though the true risk is similar, the machine learning application may assign a higher risk score to African American defendants compared to Caucasian defendants. This highlights how biased data can skew predictive outcomes and perpetuate inequalities.
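The small simulation below (Python with NumPy and scikit-learn, using purely illustrative numbers) shows this mechanism: both groups have the same underlying offence rate, but one group's offences are recorded as arrests twice as often, and a model trained only on arrest counts assigns that group higher average risk scores.

```python
# Illustrative simulation (synthetic data): identical true behaviour,
# unequal recording rates, and a model that only sees the biased proxy.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000
group = rng.integers(0, 2, size=n)            # 0 = group A, 1 = group B

true_offences = rng.poisson(2.0, size=n)      # identical true behaviour in both groups
p_record = np.where(group == 1, 0.6, 0.3)     # unequal recording/policing rates
arrests = rng.binomial(true_offences, p_record)

# Outcome label driven only by true behaviour, not by group membership.
reoffend = (rng.random(n) < 1 - np.exp(-0.3 * true_offences)).astype(int)

# The model sees only the biased proxy: "number of arrests".
model = LogisticRegression().fit(arrests.reshape(-1, 1), reoffend)
risk = model.predict_proba(arrests.reshape(-1, 1))[:, 1]

for g, name in [(0, "group A"), (1, "group B")]:
    mask = group == g
    print(f"{name}: mean true offences = {true_offences[mask].mean():.2f}, "
          f"mean predicted risk = {risk[mask].mean():.3f}")
```

The specific numbers do not matter; the point is that the gap in predicted risk is driven by the disparity in the proxy, not by any difference in behaviour.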
Implications of Measurement Bias in Machine Learning
Measurement bias often enters through proxies. For example, “creditworthiness” is an abstract construct that is typically operationalised with a measurable proxy such as a credit score. Proxies become problematic when they are poor reflections of the target construct and/or are generated differently across groups; they can contribute to bias through (Suresh and Guttag, 2021):
- Oversimplification of Complex Constructs:
A proxy like a credit score fails to capture the full complexity of creditworthiness. This oversimplification can ignore group-specific indicators of success or risk, leading to biased outcomes.
- Variability in Measurement Across Groups:
Measurement methods may differ between groups, introducing bias. For instance, stricter monitoring at certain factory locations can inflate error counts (when the observed number of errors is used as a proxy for work quality), creating feedback loops that perpetuate further monitoring of those groups.
- Accuracy Disparities Across Groups:
Structural discrimination can lead to systematic inaccuracies, such as racial or gender disparities in medical diagnoses or misclassification in criminal justice risk assessments. For example, proxies like “arrest” or “rearrest” disproportionately misrepresent minority communities due to over-policing, leading to models with higher false positive rates for these groups (Mehrabi et al., 2021); a per-group error audit such as the sketch after this list is one way to surface these gaps.
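As a starting point for detecting the third issue, the minimal sketch below (plain NumPy, with purely illustrative arrays for labels, predictions, and group membership) compares false positive rates across groups on a held-out set; a persistent gap suggests the chosen proxy or features measure the groups differently.

```python
# Minimal sketch of a per-group error audit. Assumes you already have
# ground-truth labels, model predictions, and a group indicator for a
# held-out set; the arrays below are purely illustrative.
import numpy as np

y_true = np.array([0, 0, 1, 0, 1, 0, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 1, 1, 0, 0, 0, 0, 1])
group  = np.array(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])

def false_positive_rate(y_true, y_pred):
    """FPR = FP / (FP + TN): how often true negatives are flagged as positive."""
    negatives = y_true == 0
    return (y_pred[negatives] == 1).mean()

for g in np.unique(group):
    mask = group == g
    print(f"Group {g}: FPR = {false_positive_rate(y_true[mask], y_pred[mask]):.2f}")
```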
Design Approaches to Mitigate Measurement Bias in Machine Learning
Our goal is not to prescribe specific statistical methods or tools to address measurement bias, as such technical details are beyond the scope of this guidance. Instead, we aim to highlight key considerations, challenges, and strategies for identifying and mitigating measurement bias in data. By fostering awareness of its implications, we encourage practitioners to adopt context-appropriate solutions informed by their application requirements and stakeholder engagement.
Tackling measurement bias requires a systematic, proactive approach. You can get started with these resources:
Free Resources for Measurement Bias Mitigation
Checklist for Measurement Bias from problem definition to model deployment (coming soon)
AI Bias Mitigation Package – £999
The ultimate resource for organisations ready to tackle bias at scale, from problem definition through to model monitoring, and drive responsible AI practices.
Customised AI Bias Mitigation Package – £2499
Sources
Fahse, T., Huber, V. and van Giffen, B., 2021. Managing bias in machine learning projects. In Innovation Through Information Systems: Volume II: A Collection of Latest Research on Technology Issues (pp. 94-109). Springer International Publishing.
Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K. and Galstyan, A., 2021. A survey on bias and fairness in machine learning. ACM computing surveys (CSUR), 54(6), pp.1-35.
Shahbazi, N., Lin, Y., Asudeh, A. and Jagadish, H.V., 2022. A survey on techniques for identifying and resolving representation bias in data. CoRR, abs/2203.11852.
Suresh, H. and Guttag, J., 2021, October. A framework for understanding sources of harm throughout the machine learning life cycle. In Proceedings of the 1st ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (pp. 1-9).
Tay, L., Woo, S.E., Hickman, L., Booth, B.M. and D’Mello, S., 2022. A conceptual framework for investigating and mitigating machine-learning measurement bias (MLMB) in psychological assessment. Advances in Methods and Practices in Psychological Science, 5(1).