Statistical Parity

Statistical parity is a fairness measure that assesses how a predictor behaves when the population is split into groups according to a sensitive attribute: under statistical parity, the distribution of predictions should be the same for every group.
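For intuition, here is a minimal, library-agnostic sketch in plain pandas with toy data (the group names, prediction labels and values are made up purely for illustration):

import pandas as pd

# Toy data: 'group' is the sensitive attribute, 'pred' is the model prediction.
toy = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'pred':  ['High', 'Low', 'Low', 'High', 'High', 'Low'],
})

# Rate of each prediction inside each group; statistical parity holds when
# these rates are (roughly) equal across groups.
rates = toy.groupby('group')['pred'].value_counts(normalize=True).unstack(fill_value=0)
print(rates)

# Statistical parity difference for the 'High' prediction between groups A and B.
print(rates.loc['A', 'High'] - rates.loc['B', 'High'])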
In Kafkanator, the fairness package provides a statistical_parity method. Let's use it on the well-known COMPAS benchmark; you can find the COMPAS dataset here. COMPAS assigns a risk level (Low, Medium, High) for a defendant to commit a crime again, based on a series of attributes. It caused controversy some years ago when it was found to be racially biased.
Let's quickly visualize the biases in COMPAS predictions using Kafkanator. We will do it in two steps: first, build a synthesis table with the fit_data method; then visualize it. The COMPAS prediction is in the ScoreText column and Ethnic_Code_Text is the sensitive attribute.
import pandas as pd
from kafkanator.fairness import fit_data

# Load the raw COMPAS scores.
df = pd.read_csv('/home/jsaray/Documentos/datascience/archive/compas-scores-raw.csv')
# Build the synthesis table: counts of each prediction (ScoreText) per ethnic group (Ethnic_Code_Text).
bp_d = fit_data(df, 'Ethnic_Code_Text', 'ScoreText')
Let's inspect the bp_d variable; you should see something like this:
| Sensitive Attribute | Prediction | Number |
|---|---|---|
| Hispanic | Low | 6963 |
| Hispanic | High | 473 |
| Hispanic | Medium | 1297 |
| Other | Low | 2169 |
| Other | High | 89 |
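If you want to double-check the synthesis table, the same counts can be obtained with plain pandas; this sketch assumes nothing about Kafkanator, only the column names of the input CSV:

# Same counts with plain pandas: size of each (ethnic group, prediction) pair.
counts = df.groupby(['Ethnic_Code_Text', 'ScoreText']).size().reset_index(name='Number')
print(counts)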
Once we have the summary table (variable bp_d), we can use our preferred Python visualization toolkit to build the statistical parity visualization. The classical way to visualize this is with grouped bar plots, and seaborn provides a very nice and comfortable way to do it:
import seaborn as sns
sns.catplot(x="prediction", y="number", hue="prediction", col="attr", data=bp_d, kind="bar", height=4, aspect=.7)
You should see something like this:

As concluded by researchers, we see that COMPAS predictions are biased with respect to race: African Americans are the most likely to be tagged as high risk of recommitting a crime.
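Since the group sizes differ widely, it is often clearer to compare proportions rather than raw counts. Here is a short sketch, assuming bp_d exposes the same attr, prediction and number columns used in the catplot call above:

import seaborn as sns
import matplotlib.pyplot as plt

# Share of each prediction within its group, so groups of different sizes are comparable.
bp_norm = bp_d.copy()
bp_norm['share'] = bp_norm['number'] / bp_norm.groupby('attr')['number'].transform('sum')

sns.catplot(x="prediction", y="share", col="attr", data=bp_norm, kind="bar", height=4, aspect=.7)
plt.show()

Under perfect statistical parity, the bars for each prediction level would have the same height in every panel.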