Statistical Parity

Statistical parity is a fairness measure that assesses how a predictor behaves when the population is split into groups according to a sensitive attribute: under statistical parity, the distribution of predictions should be the same for every group.
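For intuition, here is a minimal, library-agnostic sketch in plain pandas with toy data (the group names, prediction labels and values are made up purely for illustration):

import pandas as pd

# Toy data: 'group' is the sensitive attribute, 'pred' is the model prediction.
toy = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'pred':  ['High', 'Low', 'Low', 'High', 'High', 'Low'],
})

# Rate of each prediction inside each group; statistical parity holds when
# these rates are (roughly) equal across groups.
rates = toy.groupby('group')['pred'].value_counts(normalize=True).unstack(fill_value=0)
print(rates)

# Statistical parity difference for the 'High' prediction between groups A and B.
print(rates.loc['A', 'High'] - rates.loc['B', 'High'])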
In Kafkanator, the fairness package provides a statistical_parity method. Let's use it on the well-known COMPAS benchmark; you can find the COMPAS dataset here. COMPAS assigns a risk level (Low, Medium, High) for a defendant to commit a crime again, based on a series of attributes. It caused controversy some years ago when it was found to be racially biased.
Let's quickly visualize the biases in COMPAS predictions using Kafkanator. We will do it in two steps: first, build a synthesis table with the fit_data method; then visualize it. The COMPAS prediction is in the ScoreText column and Ethnic_Code_Text is the sensitive attribute.
import pandas as pd
from kafkanator.fairness import fit_data

# Load the raw COMPAS scores.
df = pd.read_csv('/home/jsaray/Documentos/datascience/archive/compas-scores-raw.csv')
# Build the synthesis table: counts of each prediction (ScoreText) per ethnic group (Ethnic_Code_Text).
bp_d = fit_data(df, 'Ethnic_Code_Text', 'ScoreText')
Let's inspect the bp_d variable; you should see something like this:
| Sensitive Attribute | Prediction | Number |
|---|---|---|
| Hispanic | Low | 6963 |
| Hispanic | High | 473 |
| Hispanic | Medium | 1297 |
| Other | Low | 2169 |
| Other | High | 89 |
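If you want to double-check the synthesis table, the same counts can be obtained with plain pandas; this sketch assumes nothing about Kafkanator, only the column names of the input CSV:

# Same counts with plain pandas: size of each (ethnic group, prediction) pair.
counts = df.groupby(['Ethnic_Code_Text', 'ScoreText']).size().reset_index(name='Number')
print(counts)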
Once we have the summary table (variable bp_d), we can use our preferred Python visualization toolkit to build the statistical parity visualization. The classical way to visualize this is with grouped bar plots, and seaborn provides a very nice and comfortable way to do it:
import seaborn as sns
sns.catplot(x="prediction", y="number", hue="prediction", col="attr", data=bp_d, kind="bar", height=4, aspect=.7)
You should see something like this:

As concluded by researchers, we see that COMPAS predictions are biased with respect to race: African Americans are the most likely to be tagged as high risk of recommitting a crime.
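Since the group sizes differ widely, it is often clearer to compare proportions rather than raw counts. Here is a short sketch, assuming bp_d exposes the same attr, prediction and number columns used in the catplot call above:

import seaborn as sns
import matplotlib.pyplot as plt

# Share of each prediction within its group, so groups of different sizes are comparable.
bp_norm = bp_d.copy()
bp_norm['share'] = bp_norm['number'] / bp_norm.groupby('attr')['number'].transform('sum')

sns.catplot(x="prediction", y="share", col="attr", data=bp_norm, kind="bar", height=4, aspect=.7)
plt.show()

Under perfect statistical parity, the bars for each prediction level would have the same height in every panel.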