Human-centred Evaluation for Fairness in Machine Learning

As leaders and managers, you are in a unique position to drive the change needed to ensure fairness in machine learning. By adopting human-centred evaluation practices, engaging diverse stakeholders, and ensuring transparency, you can create AI systems that are not only effective but also ethical and just.

In this post, I’ll share insights into the need for human-centred evaluations, the challenges they present, and practical recommendations for how organisations can integrate fairness into their machine learning systems. With years of experience in the AI and responsible innovation space, I’ve witnessed both the progress and pitfalls of AI adoption and fairness efforts. Let’s dive in.

 

Why Human-Centred Evaluations Matter

Human-centred evaluations are essential in ensuring that AI systems are not only efficient but also align with human values, societal norms, and ethical considerations. When we talk about fairness in machine learning, we’re not just referring to mathematical fairness metrics like demographic parity or equalised odds. Fairness, in this context, involves understanding how decisions made by algorithms impact real people, and ensuring that these impacts do not disproportionately disadvantage certain groups based on race, gender, socio-economic background, or other factors.

Think about the use of AI in loan approval systems. A machine learning model might show 95% accuracy in predicting creditworthiness based on past data. On paper, that’s a success. However, if the model has been trained on biased historical data, it could disproportionately deny loans to certain minority groups, even if the overall prediction accuracy is high. This is where human-centred evaluation comes in: it ensures that these algorithms don’t just perform well statistically, but that they work in a way that is fair and just for all.
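To make this concrete, here is a minimal sketch (in Python, with entirely hypothetical data, group labels, and decisions) of the kind of disaggregated check a human-centred evaluation adds on top of a single accuracy number: overall accuracy can look healthy while approval rates and accuracy differ sharply across groups.

```python
import numpy as np

# Hypothetical evaluation data for a loan-approval model.
# y_true: whether the applicant actually repaid (1 = repaid)
# y_pred: the model's decision (1 = approve)
# group:  a protected attribute recorded for evaluation purposes only
y_true = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1])
y_pred = np.array([1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1])
group = np.array(["A"] * 6 + ["B"] * 6)

print(f"Overall accuracy: {(y_true == y_pred).mean():.0%}")

# Disaggregated view: the same predictions, broken down by group.
for g in np.unique(group):
    m = group == g
    print(f"Group {g}: approval rate {y_pred[m].mean():.0%}, "
          f"accuracy {(y_true[m] == y_pred[m]).mean():.0%}")

# Demographic-parity gap: difference in approval rates between groups.
rates = [y_pred[group == g].mean() for g in np.unique(group)]
print(f"Approval-rate gap between groups: {max(rates) - min(rates):.0%}")
```

A gap like this is a prompt for human review, not an automatic verdict: whether it is acceptable depends on the context, the stakes of the decision, and the people affected.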

 

Challenges in Achieving Fairness

Despite the clear need for fairness in AI, achieving it is fraught with challenges:

  1. Balancing Technical Performance and Fairness
    One of the most common issues is the tension between optimising for technical performance (such as accuracy) and ensuring fairness. Many machine learning models are evaluated purely on their ability to make accurate predictions, often overlooking the underlying fairness of those predictions. For instance, a facial recognition system might achieve high accuracy but have significant performance disparities across different demographic groups. Striking a balance between these competing goals requires careful consideration and, often, trade-offs.

  2. Bias in Data
    Data is the foundation on which ML models are built. Unfortunately, historical data is often riddled with biases. For example, if a hiring algorithm is trained on historical hiring data that favours one gender over another, the algorithm might perpetuate this bias. Identifying and mitigating these biases at the data level is a critical aspect of human-centred evaluation. But doing so requires not just technical expertise, but also a deep understanding of the societal context and potential consequences of these biases (a minimal data-level check is sketched after this list).

  3. Diverse Stakeholder Expectations
    Fairness is a subjective concept, and what one group considers fair might not be seen as fair by another. For instance, some stakeholders may define fairness in terms of equal representation, while others might focus on equal treatment. These differing perspectives can create challenges when trying to design and implement fair ML systems. Engaging a diverse group of stakeholders early in the process is essential to navigate these differing expectations.

  4. Evaluation Complexity
    Traditional performance metrics—such as accuracy, precision, and recall—often don’t capture the full picture of fairness. Human-centred evaluations require a broader set of metrics that encompass not just statistical fairness but also human perceptions of fairness. This includes qualitative feedback, user experiences, and the real-world impacts of algorithmic decisions. Developing these metrics can be challenging, as they require combining both quantitative data and human insights.
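To make the data-level check mentioned in point 2 concrete, here is a minimal sketch assuming a small, entirely hypothetical table of historical hiring records (the column names and threshold are illustrative only). It simply compares positive-label rates across groups before any model is trained.

```python
import pandas as pd

# Hypothetical historical hiring records; column names are illustrative only.
history = pd.DataFrame({
    "gender": ["F", "F", "F", "F", "M", "M", "M", "M", "M", "M"],
    "hired":  [0,   0,   1,   0,   1,   1,   0,   1,   1,   0],
})

# Positive-label rate per group. Large gaps in the labels themselves are a
# warning that a model trained on this data may reproduce past hiring patterns.
label_rates = history.groupby("gender")["hired"].mean()
print(label_rates)

gap = label_rates.max() - label_rates.min()
if gap > 0.2:  # the threshold is an illustrative choice, not a standard
    print(f"Warning: hired-rate gap of {gap:.0%} between groups in the training labels")
```

A check like this cannot decide on its own whether the gap reflects bias or a legitimate difference; that judgement needs the societal context and domain expertise described above.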

 

Best Practices for Implementing Human-Centred Evaluations

Given these challenges, how can organisations ensure that their machine learning models are both fair and effective? Here are some best practices and actionable strategies:

  1. Involve Stakeholders from the Start
    Fairness is a multi-faceted concept, and different stakeholders will have different perspectives on what constitutes fairness. Engaging a diverse group of stakeholders—such as end-users, domain experts, and individuals from underrepresented groups—during the design and evaluation phases is crucial. By considering the needs, concerns, and feedback from all relevant parties, organisations can ensure that their AI systems reflect a broad range of fairness considerations.

  2. Adopt a Multi-Dimensional Approach to Fairness Evaluation
    It’s not enough to rely on a single fairness metric. A truly human-centred evaluation approach requires a combination of quantitative and qualitative assessments. While traditional metrics like demographic parity and equalised odds are useful, organisations should also gather qualitative feedback through user studies, focus groups, and interviews. This helps to understand how different groups perceive fairness and whether the system meets their expectations. For example, if an AI-driven health diagnosis tool is being used across different communities, it’s important to assess not just its accuracy, but also whether patients trust the system and feel it meets their needs.

  3. Regularly Monitor and Audit Systems for Fairness
    Fairness should not be a one-time check but an ongoing process. As AI systems are deployed and interact with real-world data, new biases or fairness issues may emerge. For example, societal changes or shifts in the demographic composition of a region could introduce new fairness challenges. Regular monitoring and auditing of AI systems are essential to ensure that they continue to meet fairness standards over time (a minimal monitoring sketch follows this list).

  4. Transparency and Explainability
    Transparency is key to building trust in AI systems. Users are more likely to accept a system that they understand, even if the system isn’t perfect, than one that operates as a “black box.” Ensuring that machine learning models are interpretable and that decisions can be explained in human terms is critical for fairness. For instance, in automated hiring systems, providing clear explanations for why a candidate was selected or rejected helps users understand the fairness of the system, building trust and reducing perceived bias (a simple explanation sketch follows this list).

  5. Address Disparate Impacts
    It’s essential to evaluate how different demographic groups are impacted by the algorithm’s decisions. Even if a model is statistically fair, it may still have a disparate impact on certain groups. For example, a recruitment system might be “fair” in terms of gender distribution in job recommendations but may still unintentionally disadvantage older candidates. Disaggregated performance analysis—breaking down the results by demographic group—can help identify and address these issues.
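Points 3 and 5 can be operationalised together. Below is a minimal sketch, assuming a simple batch-audit setup with hypothetical group labels and an illustrative 10% threshold, of how favourable-decision rates might be recomputed and compared across groups as new decisions accumulate over time.

```python
import numpy as np

def audit_batch(decisions, groups, max_gap=0.10):
    """Compare favourable-decision rates across groups for one batch of
    production decisions and flag the batch if the gap exceeds max_gap."""
    rates = {g: float(decisions[groups == g].mean()) for g in np.unique(groups)}
    gap = max(rates.values()) - min(rates.values())
    return rates, gap, gap > max_gap

# Hypothetical monthly batches of automated decisions (1 = favourable outcome).
batches = [
    (np.array([1, 0, 1, 1, 0, 1]), np.array(["A", "A", "A", "B", "B", "B"])),
    (np.array([1, 1, 1, 0, 0, 0]), np.array(["A", "A", "A", "B", "B", "B"])),
]

for month, (decisions, groups) in enumerate(batches, start=1):
    rates, gap, flagged = audit_batch(decisions, groups)
    status = "flag for review" if flagged else "ok"
    print(f"Month {month}: rates {rates}, gap {gap:.0%} -> {status}")
```

The flag is only the start of the process: a flagged batch should trigger the human review, stakeholder discussion, and qualitative follow-up described above, not an automated fix.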
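For point 4, explanations do not always require heavy tooling. As a minimal sketch (using a made-up linear screening model and hypothetical features, not a recommendation of any particular technique), a candidate’s outcome can be broken down into per-feature contributions that a reviewer can read and challenge.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative screening data: years of experience, number of relevant skills,
# and an assessment score. All values and features are made up.
X = np.array([[1, 2, 55], [3, 4, 70], [5, 5, 80], [2, 1, 50],
              [6, 6, 90], [4, 3, 65], [7, 5, 85], [1, 1, 45]])
y = np.array([0, 1, 1, 0, 1, 0, 1, 0])
feature_names = ["years_experience", "skill_count", "assessment_score"]

model = LogisticRegression(max_iter=1000).fit(X, y)

# Explain one candidate's outcome as per-feature contributions to the log-odds,
# which can then be translated into a plain-language explanation.
candidate = np.array([[2, 3, 60]])
contributions = model.coef_[0] * candidate[0]
for name, value in zip(feature_names, contributions):
    print(f"{name}: {value:+.3f}")
print(f"intercept: {model.intercept_[0]:+.3f}")
print(f"probability of being shortlisted: {model.predict_proba(candidate)[0, 1]:.0%}")
```

For more complex models the same principle applies, but the explanation method has to be chosen and validated for the specific system and audience.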

 

Summary

As leaders, managers, and key stakeholders, it’s imperative to recognise the importance of human-centred evaluations for fairness in ML. This approach places human values, societal impacts, and ethical considerations at the heart of AI development, ensuring that the systems we build do not inadvertently perpetuate harm or inequality.

 

Next Steps

  • If you’re interested in bespoke training or design solutions on AI fairness, feel free to reach out for a consultation.

  • Check out the following resources and upcoming workshops to equip your teams with the tools and knowledge to implement fair AI systems.

 

Free Resources for Individual Fairness Design Considerations

Data Bias

Sampling Bias in Machine Learning

Social Bias in Machine Learning

Representation Bias in Machine Learning

 

Human-Centred Evaluations – £99

Empower your team to drive Responsible AI by fostering alignment with compliance needs and best practices.

  • Practical, easy-to-use guidance from problem definition to model monitoring
  • Checklists for every phase in the AI/ML pipeline

Get Human-centred Evaluation – (Delivery within 2-3 days)
 
 
AI Fairness Mitigation Package – £999

The ultimate resource for organisations ready to tackle bias at scale, covering everything from problem definition through to model monitoring to drive responsible AI practices.

  • Mitigate and resolve issues across 15 types of fairness specific to your project, with detailed guidance from problem definition to model monitoring.
  • Packed with practical methods, research-based strategies, and critical questions to guide your team.
  • Comprehensive checklists for every phase in the AI/ML pipeline

Get Fairness Mitigation Package – (Delivery within 2-3 days)
 
Customised AI Fairness Mitigation Package – £2499
We’ll customise the design cards and checklists to meet your specific use case and compliance requirements—ensuring the toolkit aligns perfectly with your goals and industry standards.
  • Mitigate and resolve issues across 15 types of fairness specific to your project, with detailed guidance from problem definition to model monitoring.
  • Packed with practical methods, research-based strategies, and critical questions specific to your use case.
  • Customised checklists for every phase in the AI/ML pipeline

 



