Date: 2023-02-20

Impacting the unprivileged groups

In a training data set of car insurance claims, red cars may have been involved in more accidents than cars of other colors. The ML algorithm detects this correlation, but there is no scientific proof that the color of a car causes a higher risk of accidents.





If the algorithm is not designed to notice and eliminate this kind of unwanted correlation, the resulting model may be biased and make poor predictions on new data.
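
The red-car effect can be illustrated with a minimal sketch. The records below are made-up toy data, and the helper `accident_rate_by_color` is a hypothetical name introduced only for this illustration. The idea is to compare the accident rate per color in the training set with the rate in newly collected data: if the gap between colors disappears, the correlation was probably an artifact of the training sample.

```python
from collections import defaultdict


def accident_rate_by_color(records):
    """Return {color: share of records that had an accident}."""
    totals = defaultdict(int)
    accidents = defaultdict(int)
    for color, had_accident in records:
        totals[color] += 1
        accidents[color] += int(had_accident)
    return {color: accidents[color] / totals[color] for color in totals}


# Made-up toy data: (car color, had an accident).
# In the training sample red cars look riskier than blue cars.
training = (
    [("red", True)] * 6 + [("red", False)] * 4
    + [("blue", True)] * 3 + [("blue", False)] * 7
)
# In the new data both colors have the same accident rate.
new_data = (
    [("red", True)] * 3 + [("red", False)] * 7
    + [("blue", True)] * 3 + [("blue", False)] * 7
)

train_rates = accident_rate_by_color(training)
new_rates = accident_rate_by_color(new_data)

print("training:", train_rates)
print("new data:", new_rates)
```

A model that learned "red means risky" from the training sample would overestimate the risk of red cars in the new data, which is the kind of poor prediction described above.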





There is a second, even more severe problem when the predictions affect people: if the algorithm is biased to favor privileged groups over unprivileged groups, the result is discrimination. This kind of discrimination can happen even when no sensitive personal data is provided explicitly, because other attributes can act as proxies that implicitly reveal this information.





For example, a car model can hint at the owner's gender, or the zip code may correlate with a resident's ethnicity or religion.
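
One simple way to check whether an attribute acts as a proxy is to ask how much better the sensitive attribute can be guessed once the candidate attribute is known. The sketch below is an assumption-laden illustration, not an established test: the zip codes and group labels are made-up toy data, `proxy_strength` is a hypothetical helper, and the accuracy is measured on the same records it was derived from, so it is optimistic.

```python
from collections import Counter, defaultdict


def proxy_strength(feature_values, sensitive_values):
    """Return (baseline, with_feature) accuracy of guessing the sensitive attribute.

    baseline:     always guess the overall most common sensitive value.
    with_feature: guess the most common sensitive value within each feature value.
    """
    n = len(sensitive_values)
    baseline = Counter(sensitive_values).most_common(1)[0][1] / n

    by_feature = defaultdict(Counter)
    for feature, sensitive in zip(feature_values, sensitive_values):
        by_feature[feature][sensitive] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_feature.values())
    return baseline, correct / n


# Made-up toy data: zip code of each resident and a sensitive group label.
zip_codes = ["10001"] * 5 + ["20002"] * 5
groups = ["A"] * 4 + ["B"] + ["B"] * 4 + ["A"]

baseline, with_zip = proxy_strength(zip_codes, groups)
print(f"guessing without zip code: {baseline:.2f}")
print(f"guessing with zip code:    {with_zip:.2f}")
```

A large gap between the two numbers suggests that the attribute leaks the sensitive information, so removing the sensitive column alone would not prevent the model from discriminating.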



