A Practical Guide to Causal Dataset Repairs in Machine Learning

Causality studies the relationships between events or processes in terms of cause and effect. More recently, it has been applied in machine learning to identify the causal relationships between the features and outcomes within a dataset. These relationships are usually represented by a causal graph, which depicts the features/attributes and the outcome variable as nodes, and the relationships between them as arrows.

A causal graph can be used to highlight paths that represent bias within the dataset. These are paths that start at a sensitive attribute node, such as gender or race (attributes that identify demographics more likely to experience discrimination), and end at the outcome node. They can go directly from the sensitive attribute to the outcome variable (direct discrimination), or pass through ‘unjustified’ attributes along the way (indirect discrimination). Dataset repair methods aim to eliminate the causal paths linked to discrimination, and therefore remove the bias associated with the sensitive group. These repair techniques can be highly effective debiasing tools. However, they come with known challenges, one of which is that they are difficult to apply in practice.
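To make the idea of discriminatory paths concrete, the following is a minimal Python sketch, not the method used in the full article. It assumes the causal graph is already known and acyclic, and stores it as a plain dictionary of arrows. The attribute names (gender, occupation, education, age), the choice of occupation as the ‘unjustified’ attribute, and the helper names directed_paths and classify_paths are all hypothetical and chosen only for illustration. Paths that pass only through attributes not marked as unjustified are labelled "other" here, which is an assumption of this sketch.

```python
from typing import Dict, List, Optional, Set

# Hypothetical causal graph: each key is a node, each value lists the nodes
# that its arrows point to. "outcome" is the outcome variable.
GRAPH: Dict[str, List[str]] = {
    "gender": ["outcome", "occupation", "education"],
    "occupation": ["outcome"],
    "education": ["outcome"],
    "age": ["outcome"],
}


def directed_paths(
    graph: Dict[str, List[str]],
    source: str,
    target: str,
    path: Optional[List[str]] = None,
) -> List[List[str]]:
    """Return every directed path from source to target (graph assumed acyclic)."""
    path = (path or []) + [source]
    if source == target:
        return [path]
    found: List[List[str]] = []
    for child in graph.get(source, []):
        found.extend(directed_paths(graph, child, target, path))
    return found


def classify_paths(
    graph: Dict[str, List[str]],
    sensitive: str,
    outcome: str,
    unjustified: Set[str],
) -> Dict[str, List[List[str]]]:
    """Group paths from the sensitive attribute to the outcome by type."""
    result: Dict[str, List[List[str]]] = {"direct": [], "indirect": [], "other": []}
    for p in directed_paths(graph, sensitive, outcome):
        mediators = p[1:-1]  # nodes strictly between sensitive attribute and outcome
        if not mediators:
            result["direct"].append(p)    # sensitive -> outcome
        elif any(m in unjustified for m in mediators):
            result["indirect"].append(p)  # passes through an unjustified attribute
        else:
            result["other"].append(p)     # assumption: not treated as discriminatory
    return result


paths = classify_paths(GRAPH, "gender", "outcome", unjustified={"occupation"})
print(paths)
```

In this toy graph, the arrow from gender straight to the outcome is the direct path, the path through occupation is the indirect one, and the path through education falls into the remaining group. A repair method would then target the first two groups.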

IMI MIRA Laura Hattam explores these issues further. She first looks at various approaches for building causal graphs, then presents some causal debiasing methods from the literature and implements one of them. The overall objective of the article is to provide a practical guide for practitioners who have an interest in causal dataset repairs.

Read the full article here.

This work is part of an Innovate UK project with Dr Julian Padget and the company Etiq.