Kafkanator
gini(x)
Computes the gini index from an ascending order gains array.
Examples:
>>> gini(np.array([1,1,2,2,3,3,3]))
0.20952380952380953
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | numpy array | Gains sorted in ascending order. For example, [1,1,2,2,3,3,3] describes a population of 7 people: the first person's gain is 1, the third person's gain is 2, and so on. | required |
Returns:
Type | Description |
---|---|
float | The Gini index for this array. |
Source code in kafkanator/kafkanator.py
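For reference, the value in the example above can be reproduced with the standard rank-based Gini formula for ascending-sorted data, G = 2·Σ i·x_i / (n·Σ x_i) − (n + 1)/n, with 1-based ranks i. This is a minimal sketch, not the library's source; `gini_sketch` is a hypothetical name, and like `gini` it assumes the input is already sorted in ascending order.

```python
import numpy as np


def gini_sketch(x):
    """Rank-based Gini index (hypothetical helper); x must be sorted ascending."""
    x = np.asarray(x, dtype=float)
    n = x.size
    ranks = np.arange(1, n + 1)  # 1-based rank of each observation
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    return float(2.0 * np.sum(ranks * x) / (n * np.sum(x)) - (n + 1) / n)


print(gini_sketch([1, 1, 2, 2, 3, 3, 3]))  # ~0.2095238 (exactly 22/105)
```

A perfectly equal array such as [4, 4, 4] gives 0, while concentrating all gains in the last person pushes the value towards (n − 1)/n.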
index_on_dataframe_column(df, column, index_function, **kwargs)
Computes an inequality index over a column of a pandas DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | pandas DataFrame | The DataFrame. | required |
column | str | The column on which the inequality index is computed. | required |
index_function | callable | The index to apply: a kafkanator function such as gini(..), robin_hood(..), theil_index_L(..) or theil_index_T(..). | required |
kwargs | dict | Other parameters that index_function can use; for example, with index_function=theil_index_T you can set the entropy base (e.g. e or 10) here. | {} |
Returns:
Type | Description |
---|---|
float | The chosen inequality index applied to the given column. |
Source code in kafkanator/kafkanator.py
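A minimal sketch of how a helper like this could work, assuming it extracts the column, sorts it in ascending order (which `gini` requires and which does not affect order-independent indices) and forwards any extra keyword arguments to `index_function`. The name `index_on_column_sketch`, the sorting step and the stand-in `coefficient_of_variation` index are all assumptions for illustration, not the library's confirmed behavior.

```python
import numpy as np
import pandas as pd


def index_on_column_sketch(df, column, index_function, **kwargs):
    # Pull the column out as floats and sort ascending, because a rank-based
    # Gini expects sorted input (order-independent indices are unaffected).
    values = np.sort(df[column].to_numpy(dtype=float))
    # Forward extra options (e.g. an entropy base) untouched to the index.
    return index_function(values, **kwargs)


def coefficient_of_variation(values, ddof=0):
    # Simple inequality measure used only as a stand-in index_function.
    return float(np.std(values, ddof=ddof) / np.mean(values))


df = pd.DataFrame({"income": [3, 1, 2]})
print(index_on_column_sketch(df, "income", coefficient_of_variation))          # ~0.4082
print(index_on_column_sketch(df, "income", coefficient_of_variation, ddof=1))  # 0.5
```

The second call shows the kwargs pass-through: `ddof=1` never touches the helper itself and only changes how the index is computed.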
index_per_cluster(df, group_by_column, income_column, index='gini', **kwargs)
Groups a DataFrame by a column and applies an inequality index within each group (cluster).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | pandas DataFrame | A DataFrame with gains data to be grouped by a column. | required |
group_by_column | str | The column to group by. | required |
income_column | str | The column holding the gains/incomes. For now, this column must contain integer values, not proportions. | required |
index | str | The inequality index to use: gini, theil-t, theil-l or robin hood. | 'gini' |
kwargs | dict | Optional; when using theil-t, auxiliary parameters such as the entropy base can be passed here. | {} |
Returns:
Type | Description |
---|---|
list | A list of tuples; each tuple holds a value of group_by_column followed by the inequality index of your choice computed within that cluster. |
Source code in kafkanator/kafkanator.py
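The grouping step can be sketched with pandas `groupby`. This is an illustration under stated assumptions, not the library's code: the hypothetical `index_per_cluster_sketch` takes a callable `index_function` instead of the documented `index` string, defaults to a rank-based Gini (`gini_sorted`, also hypothetical), and sorts each group's incomes ascending before applying the index.

```python
import numpy as np
import pandas as pd


def gini_sorted(x):
    # Rank-based Gini; x must be sorted in ascending order.
    x = np.asarray(x, dtype=float)
    n = x.size
    ranks = np.arange(1, n + 1)
    return float(2.0 * np.sum(ranks * x) / (n * np.sum(x)) - (n + 1) / n)


def index_per_cluster_sketch(df, group_by_column, income_column,
                             index_function=gini_sorted, **kwargs):
    results = []
    # groupby sorts the group keys by default, so the output order is deterministic.
    for key, incomes in df.groupby(group_by_column)[income_column]:
        values = np.sort(incomes.to_numpy(dtype=float))
        results.append((key, index_function(values, **kwargs)))
    return results


df = pd.DataFrame({
    "region": ["A"] * 7 + ["B"] * 2,
    "income": [1, 1, 2, 2, 3, 3, 3, 5, 5],
})
print(index_per_cluster_sketch(df, "region", "income"))
# region A: ~0.2095, region B: 0.0 (both members earn the same)
```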
lorentz_curve(population, income, gini_index=False)
Computes the Lorenz curve coordinates from population and income arrays.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
population | list | Position i holds the number of people earning income[i]. | required |
income | list | Position i holds the income of the people counted in population[i]. | required |
gini_index | boolean | If True, the Gini coefficient is also returned as the third element of the returned tuple; otherwise it is omitted. | False |
Returns:
Type | Description |
---|---|
tuple | A 2-tuple of lists holding the x and y coordinates, ready to plot with the visualization framework of your choice. If gini_index is True, it is a 3-tuple whose third element is the Gini coefficient. |
Source code in kafkanator/kafkanator.py
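A minimal sketch of the Lorenz curve for grouped data, assuming income[i] is the per-person income of the population[i] people in group i (the parameter table leaves this slightly open). Groups are ordered from poorest to richest, x is the cumulative population share and y the cumulative income share, both starting at (0, 0). The Gini coefficient is then 1 − 2·(area under the curve), computed here with trapezoids. `lorenz_curve_sketch` is a hypothetical name, not the library's implementation.

```python
import numpy as np


def lorenz_curve_sketch(population, income, gini_index=False):
    """Lorenz curve from grouped data: population[i] people each earning income[i]."""
    population = np.asarray(population, dtype=float)
    income = np.asarray(income, dtype=float)
    order = np.argsort(income)  # poorest group first
    pop = population[order]
    group_total = pop * income[order]  # total income earned by each group
    # Cumulative shares, starting at the origin (0, 0).
    x = np.concatenate(([0.0], np.cumsum(pop) / pop.sum()))
    y = np.concatenate(([0.0], np.cumsum(group_total) / group_total.sum()))
    if not gini_index:
        return x.tolist(), y.tolist()
    # Gini = 1 - 2 * area under the piecewise-linear Lorenz curve (trapezoids).
    area = np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0)
    return x.tolist(), y.tolist(), float(1.0 - 2.0 * area)


xs, ys, g = lorenz_curve_sketch([2, 2, 3], [1, 2, 3], gini_index=True)
print(g)  # ~0.2095238, the same population as gini([1,1,2,2,3,3,3])
```

Encoding the example population of the `gini` section as three groups gives the same coefficient, which is a useful consistency check between the two functions.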
robin_hood(income_array)
Computes the Robin Hood index: the share of total income that would have to be redistributed across the population to reach an egalitarian distribution.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
income_array | list | Population gains; e.g. [5,3,5,6,9] means the first person has a gain of 5, the next one 3, and so on. Total gain is sum(income_array) and total population is len(income_array). | required |
Returns:
Type | Description |
---|---|
float | A number between 0 and 1: the share of income that must be redistributed. A value close to 1 means wealth is concentrated in few hands; a value close to 0 means a distribution close to the egalitarian state. |
Source code in kafkanator/kafkanator.py
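The Robin Hood index is commonly defined as the Hoover index: half the sum of absolute deviations from the mean income, divided by total income. The income above the mean equals the shortfall below it, so moving exactly that amount yields equality. A minimal sketch assuming this standard definition; `robin_hood_sketch` is a hypothetical name.

```python
import numpy as np


def robin_hood_sketch(income_array):
    """Hoover (Robin Hood) index: share of total income to move for equality."""
    x = np.asarray(income_array, dtype=float)
    # Half of the total absolute deviation from the mean, relative to total income.
    return float(0.5 * np.sum(np.abs(x - x.mean())) / x.sum())


print(robin_hood_sketch([5, 3, 5, 6, 9]))  # ~0.1357: about 13.6% of income must move
print(robin_hood_sketch([4, 4, 4, 4]))     # 0.0: already egalitarian
```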
theil_index_L(income_array)
Computes the Theil L index.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
income_array | list | Array of incomes; the order does not matter, e.g. [100,300,1000,500]. | required |
Returns:
Type | Description |
---|---|
float | The Theil L index. |
Source code in kafkanator/kafkanator.py
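The Theil L index is also known as the mean log deviation: L = (1/n)·Σ ln(μ / x_i), where μ is the mean income. A minimal sketch under that standard definition; `theil_index_L_sketch` is a hypothetical name, and it assumes strictly positive incomes since ln(μ / 0) is undefined.

```python
import numpy as np


def theil_index_L_sketch(income_array):
    """Theil L (mean log deviation); incomes must be strictly positive."""
    x = np.asarray(income_array, dtype=float)
    # Average of ln(mean / x_i) over the population.
    return float(np.mean(np.log(x.mean() / x)))


print(theil_index_L_sketch([100, 300, 1000, 500]))
print(theil_index_L_sketch([500, 1000, 300, 100]))  # same value (order does not matter)
```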
theil_index_T(income_array, array_type='props', base_entropy=np.e)
Computes the Theil T index.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
income_array | list | Array of incomes. | required |
array_type | str | If 'props', every number in income_array is between 0 and 1 and they all sum to 1. If 'gains', income_array holds integers representing gains. | 'props' |
base_entropy | float | The base used to compute the entropy. Entropy can be defined in different bases; the default is the constant e. | np.e |
Returns:
Type | Description |
---|---|
float | The Theil T index. |
Source code in kafkanator/kafkanator.py
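The role of base_entropy follows from a standard identity: with income shares s_i = x_i / Σ x, Theil T equals log_b(n) − H_b(s), where H_b is the base-b entropy of the shares. Equal shares maximize the entropy and give 0. A minimal sketch assuming the library applies the base this way; `theil_index_T_sketch` is a hypothetical name that mirrors the documented array_type and base_entropy parameters.

```python
import numpy as np


def theil_index_T_sketch(income_array, array_type="props", base_entropy=np.e):
    """Theil T computed as log_b(n) minus the base-b entropy of the income shares."""
    if array_type not in ("props", "gains"):
        raise ValueError("array_type must be 'props' or 'gains'")
    x = np.asarray(income_array, dtype=float)
    # 'gains' are raw incomes: convert them to shares that sum to 1.
    shares = x / x.sum() if array_type == "gains" else x
    log_b = np.log(base_entropy)  # change of base: log_b(v) = ln(v) / ln(b)
    nonzero = shares[shares > 0]  # convention: 0 * log(0) = 0
    entropy = -np.sum(nonzero * np.log(nonzero)) / log_b
    return float(np.log(shares.size) / log_b - entropy)


print(theil_index_T_sketch([0.25, 0.25, 0.25, 0.25]))        # ~0 for equal shares
print(theil_index_T_sketch([1, 2], array_type="gains"))      # natural-log Theil T
print(theil_index_T_sketch([1, 2], "gains", base_entropy=10))
```

Changing the base only rescales the result by 1/ln(b), so indices computed in different bases are comparable after conversion.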