Similarity-Based Fairness

March 29, 2026
Algorithmic Fairness Data Visualization


Similarity fairness is a justice principle stating that similar subjects must be treated similarly. How do we apply this principle to machine learning fairness? Suppose the subjects are rows in a dataset D, with a protected attribute A that takes only two possible values, a predictive model M, a similarity measure Sim(x, y), and a threshold e. Note that Sim is used here like a distance: smaller values mean more similar subjects. Split the dataset into two subsets according to the protected attribute A. The principle holds if, for every pair of elements x, y that lie in different subsets, the following statement is true:


IF Sim(x, y) <= e THEN M(x) = M(y)
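To make the rule concrete, here is a minimal brute-force sketch in plain Python. It is not Kafkanator's implementation, only an illustration of the statement above. The function name `similarity_violations`, the toy rows, the income-threshold model, and the absolute-difference measure are all assumptions made up for this example.

```python
from itertools import product


def similarity_violations(rows, protected, model, sim, eps):
    """Return index pairs (i, j) from different groups that break the rule.

    A pair breaks the rule when sim(x, y) <= eps but model(x) != model(y).
    """
    groups = sorted({row[protected] for row in rows})
    if len(groups) != 2:
        raise ValueError("expected exactly two protected-attribute values")

    # Split row indices into the two subsets defined by the protected attribute.
    first = [i for i, row in enumerate(rows) if row[protected] == groups[0]]
    second = [i for i, row in enumerate(rows) if row[protected] == groups[1]]

    found = []
    for i, j in product(first, second):
        x, y = rows[i], rows[j]
        if sim(x, y) <= eps and model(x) != model(y):
            found.append((i, j))
    return found


# Hypothetical toy data: "A" is the protected attribute.
rows = [
    {"A": "a", "income": 48},
    {"A": "a", "income": 70},
    {"A": "b", "income": 51},
    {"A": "b", "income": 72},
]


def model(row):
    # Hypothetical model: approve when income is at least 50.
    return int(row["income"] >= 50)


def sim(x, y):
    # Hypothetical measure: absolute income gap (smaller means more similar).
    return abs(x["income"] - y["income"])


violations = similarity_violations(rows, "A", model, sim, eps=5)
print(violations)  # [(0, 2)]
```

Rows 0 and 2 are only 3 apart in income yet receive different predictions, so they are reported. This sketch compares every cross-group pair, which grows quadratically with the dataset size, so it is only practical for small data.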


With Kafkanator, you can verify this constraint easily using the simmilarity_fairness_hash method. Use this method to sort your dataset according to a predefined similarity measure, and you will get the exact rows that do not comply with the "similar subjects must be treated similarly" principle.

If your protected attribute has more than two values, you can run the same check on every possible pair of values.
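One way to enumerate those pairs is `itertools.combinations` from the Python standard library. The sketch below is again an illustration and not Kafkanator's API. The function name `violations_by_value_pair`, the toy rows, the model, and the measure are hypothetical.

```python
from itertools import combinations, product


def violations_by_value_pair(rows, protected, model, sim, eps):
    """Map each pair of protected values to the row-index pairs that break the rule."""
    values = sorted({row[protected] for row in rows})
    report = {}
    for a, b in combinations(values, 2):
        # Row indices belonging to each of the two values being compared.
        left = [i for i, row in enumerate(rows) if row[protected] == a]
        right = [i for i, row in enumerate(rows) if row[protected] == b]
        report[(a, b)] = [
            (i, j)
            for i, j in product(left, right)
            if sim(rows[i], rows[j]) <= eps and model(rows[i]) != model(rows[j])
        ]
    return report


# Hypothetical toy data with three protected values.
rows = [
    {"A": "a", "income": 48},
    {"A": "b", "income": 51},
    {"A": "c", "income": 49},
    {"A": "c", "income": 80},
]


def model(row):
    # Hypothetical model: approve when income is at least 50.
    return int(row["income"] >= 50)


def sim(x, y):
    # Hypothetical measure: absolute income gap (smaller means more similar).
    return abs(x["income"] - y["income"])


report = violations_by_value_pair(rows, "A", model, sim, eps=5)
for pair, found in report.items():
    print(pair, found)
```

With three values there are three pairs to check. In general, k values give k(k-1)/2 pairs.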

Check this notebook to see how to use simmilarity_fairness_hash to assess similarity fairness in your dataset.