Fairness in machine learning has been studied predominantly through global metrics such as Demographic Parity and Equalized Odds. These approaches aim to ensure equity between groups defined by one or more categorical sensitive attributes, such as race or gender. While such metrics have become standard in fairness research, they focus inherently on group-level fairness and can overlook disparities at the individual level. As a result, decisions made by these algorithms can disproportionately penalize or favor particular individuals depending on where the optimization process converges (Grari et al., 2023). This trade-off highlights a significant limitation: global fairness does not guarantee fairness at the individual level.
To address this, Counterfactual Fairness has been proposed as a framework for assessing fairness at the level of the individual (Kusner et al., 2017). A decision is considered fair for an individual if it matches the decision that would have been made in a counterfactual world in which the individual’s sensitive attributes were different. By focusing on individual-level fairness, counterfactual fairness provides a more rigorous and equitable framework for decision-making in machine learning systems.
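Formally, Kusner et al. (2017) define a predictor Ŷ as counterfactually fair if, for every context X = x and A = a,

```latex
P\left(\hat{Y}_{A \leftarrow a}(U) = y \,\middle|\, X = x, A = a\right)
  = P\left(\hat{Y}_{A \leftarrow a'}(U) = y \,\middle|\, X = x, A = a\right)
```

for every outcome y and every value a′ the sensitive attribute could take. In words: intervening only on the sensitive attribute in the individual’s counterfactual world must not change the distribution of the prediction.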
Examples Illustrating Counterfactual Fairness
Here are two examples from Kusner et al. (2017) that illustrate counterfactual fairness:
Red car insurance example: a car insurance company predicts accident rates (Y) based on observed features like car color (X, e.g., red cars). However, a hidden factor, such as aggressive driving (U), causes both a preference for red cars and higher accident rates. Protected group membership (A, like race) correlates with driving red cars but does not directly affect accident rates. Counterfactual fairness highlights that relying on X alone or ignoring A can lead to unfair predictions. Instead, fairness involves basing predictions on U, which is free from bias linked to A.
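To make the red-car mechanism concrete, here is a minimal synthetic simulation in Python. All variable names, coefficients, and thresholds below are illustrative assumptions of mine, not values from Kusner et al. (2017).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Latent factor U: aggressive driving (unobserved in practice).
u = rng.normal(size=n)

# Protected attribute A: group membership, independent of U here.
a = rng.integers(0, 2, size=n)

# Observed feature X: preference for red cars, driven by both U and A.
x_red = (0.8 * u + 0.8 * a + rng.normal(size=n)) > 0.8

# Outcome Y: accident risk depends only on U, not on A or car colour.
y = 0.9 * u + rng.normal(scale=0.5, size=n)

# Predicting from X alone looks reasonable (red-car drivers really do
# have more accidents, via U), but because A also raises P(red car),
# such a predictor penalises group A=1 unfairly.
print("mean Y | red car:", y[x_red].mean(), "| not red:", y[~x_red].mean())
print("P(red | A=1):", x_red[a == 1].mean(), "| P(red | A=0):", x_red[a == 0].mean())

# The outcome itself carries no group disparity: basing predictions on
# the latent cause U, rather than X, is counterfactually fair here.
print("mean Y | A=1:", y[a == 1].mean(), "| A=0:", y[a == 0].mean())
```

Because A and U are independent in this toy setup, a predictor built on U gives both groups identical score distributions, whereas a predictor built on the red-car feature inherits A’s influence.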
Crime prediction example: a city predicts neighborhood crime rates (Y) using neighborhood (X) and race (A). Historical factors like segregation and biased policing create a correlation between A, X, and Y. Higher crime rates in certain neighborhoods reflect policing practices rather than actual criminal behavior. Counterfactual fairness ensures predictions adjust for these systemic biases by focusing on latent variables (U), such as socioeconomic conditions and policing practices, rather than directly using X or A.
In both cases, counterfactual fairness ensures that predictions are not unfairly influenced by biases stemming from historical or systemic inequalities.
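Kusner et al. (2017) operationalise this by inferring the latent variables from a postulated causal model and training the predictor only on those inferred values. One of the modelling levels they discuss estimates the latent terms as residuals of an additive-noise model; the sketch below follows that idea with plain linear regressions. Treat it as a simplified approximation under assumed linear relationships, not the paper’s full inference procedure, and all function names as my own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_fair_predictor(x, a, y):
    """Two-stage sketch of a counterfactually fair predictor.

    Stage 1: estimate the latent variable U as the residual of the
    observed feature X after regressing out the protected attribute A
    (the additive-noise approximation discussed in Kusner et al., 2017).
    Stage 2: train the outcome model on the estimated U alone, so the
    prediction cannot depend on A or on A's influence within X.
    """
    x, a, y = (np.asarray(v, dtype=float).reshape(-1, 1) for v in (x, a, y))
    stage1 = LinearRegression().fit(a, x)
    u_hat = x - stage1.predict(a)              # residual ~ latent U
    stage2 = LinearRegression().fit(u_hat, y)
    return stage1, stage2

def predict_fair(stage1, stage2, x, a):
    x, a = (np.asarray(v, dtype=float).reshape(-1, 1) for v in (x, a))
    return stage2.predict(x - stage1.predict(a))
```

With the synthetic red-car data above, `fit_fair_predictor(x_red, a, y)` yields predictions whose distribution is (approximately) the same for both groups, because the group-driven component of the red-car feature is removed before the outcome model ever sees it.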
Challenges and Complications
- Causal Modeling Complexity:
Counterfactual fairness requires building accurate causal models that explicitly represent the relationships between variables, including latent variables. These models rely on assumptions about causation, which may not always be well-supported or verifiable with available data.
Defining causal pathways and identifying latent variables (e.g., socioeconomic factors, cultural influences) can be resource-intensive and require domain expertise; the graph sketch after this list shows one way to make these assumptions explicit.
- Handling Historical Bias:
Historical data often reflects systemic biases, making it challenging to disentangle legitimate factors from biased ones. Designing fair models requires accurately identifying and modeling latent variables that capture unbiased aspects of predictions, which is not always straightforward.
- Fairness Trade-offs:
Counterfactual fairness may conflict with other fairness definitions or metrics, such as demographic parity or equalized odds. Balancing these trade-offs requires careful prioritization of fairness goals, which may vary across contexts and stakeholders.
- Assumptions and Provisional Models:
The fairness guarantees of counterfactual models are contingent on the validity of the underlying causal assumptions. Inaccurate or incomplete assumptions can lead to unintended consequences or reinforce biases.
- Path-Specific Fairness:
Determining which causal paths or descendants of protected attributes can be included while aligning with fairness goals adds complexity. This requires nuanced decision-making about acceptable dependencies on sensitive attributes; the sketch after this list shows how a causal graph makes those dependencies enumerable.
- Scalability and Practicality:
The computational and data requirements for implementing counterfactual fairness can be high, particularly for large-scale systems or in domains with sparse data on protected groups.
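As referenced in the items above, one practical starting point is to write the assumed causal graph down explicitly and derive which observed features are descendants of the protected attribute. The sketch below uses networkx on the red-car graph; the node names and edges are illustrative assumptions, not a vetted model.

```python
import networkx as nx

# Hypothetical causal graph for the red-car scenario. The edges encode
# modelling assumptions; counterfactual fairness guarantees hold only
# if this graph is (approximately) right.
graph = nx.DiGraph([
    ("U_aggression", "X_red_car"),
    ("U_aggression", "Y_accidents"),
    ("A_race", "X_red_car"),
])

# Causal descendants of A can transmit the protected attribute's effect
# into a prediction, so they are unsafe to use directly; non-descendants
# (here, the latent U) are safe inputs.
unsafe = nx.descendants(graph, "A_race")
safe = set(graph.nodes) - unsafe - {"A_race", "Y_accidents"}
print("descendants of A (exclude):", unsafe)  # {'X_red_car'}
print("safe inputs:", safe)                   # {'U_aggression'}
```

For path-specific fairness, the same graph object lets you enumerate the directed paths from A towards the outcome (e.g., with `nx.all_simple_paths`) and decide, path by path, which dependencies are acceptable.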
Leadership Implications
- Vision and Commitment:
Leaders must articulate a clear vision for fairness and ensure it is a priority within the organization. This includes committing resources to build the necessary expertise and infrastructure for causal modeling and fairness analysis.
- Cross-Disciplinary Collaboration:
Implementing counterfactual fairness demands collaboration among data scientists, domain experts, ethicists, and policymakers. Leaders must foster an inclusive environment where diverse perspectives can guide model design.
- Transparency and Stakeholder Engagement:
Leaders should emphasize transparency in fairness objectives, including communicating the trade-offs made and the assumptions underpinning causal models. Engaging stakeholders in these discussions builds trust and ensures alignment with societal and organizational values.
- Adaptability and Continuous Learning:
As assumptions and data evolve, fairness models must be updated to reflect new insights. Leaders should establish processes for ongoing evaluation and refinement of fairness criteria and model performance.
- Navigating Ethical and Legal Risks:
Counterfactual fairness can help satisfy emerging legal and regulatory expectations, such as those in the EU AI Act. Leaders must ensure compliance while addressing the ethical risks associated with biased decision-making.
- Building Organizational Capacity:
Leaders should invest in training and development to build expertise in causal inference and fairness frameworks. This includes equipping teams with the tools and methodologies to implement counterfactual fairness effectively.
- Driving Accountability:
Accountability mechanisms should be embedded into the organization’s governance processes to ensure adherence to fairness principles and to address potential harms proactively.
Conclusion
Counterfactual fairness is a robust and individual-centered fairness criterion grounded in causal inference. By modeling latent variables and ensuring that predictions do not depend on protected attributes or their causal descendants, it provides a principled approach to fair prediction and decision-making in the presence of historical biases and complex causal relationships.
Next Steps
- If you’re interested in bespoke training or design solutions on AI fairness, feel free to reach out for a consultation.
- Check out the following resources and upcoming workshops to equip your teams with the tools and knowledge to implement fair AI systems.
Free Resources for Individual Fairness Design Considerations
- Sampling Bias in Machine Learning
- Social Bias in Machine Learning
- Representation Bias in Machine Learning
Conditional Demographic Parity Guidance – £99
- Empower your team to drive Responsible AI by fostering alignment with compliance needs and best practices.
- Practical, easy-to-use guidance from problem definition to model monitoring.
- Checklists for every phase in the AI/ML pipeline.
AI Fairness Mitigation Package – £999
The ultimate resource for organisations ready to tackle bias at scale, from problem definition through to model monitoring, to drive responsible AI practices.
Customised AI Fairness Mitigation Package – £2499
Sources
Grari, V., Lamprier, S. and Detyniecki, M., 2023. Adversarial learning for counterfactual fairness. Machine Learning, 112(3), pp.741-763.
Kusner, M.J., Loftus, J., Russell, C. and Silva, R., 2017. Counterfactual fairness. Advances in Neural Information Processing Systems, 30.