A machine learning model can only be as good as the data used to train it. Therefore, if the data contains historical human biases, the model's predictions are likely to be biased too.
In the Artificial Intelligence field, bias is a phenomenon that occurs when an algorithm generates systematically skewed predictions because of prejudices encoded in its data. Hence, we must be able to analyze our models and improve them.
An IBM article identifies the main sources of biased models as:
- Imbalanced data
- Perpetual patterns
- Information correlations
- Data selection and model evaluation
The training data set may be unbalanced, i.e., some class may be underrepresented. For example, if 90% of the images used to train a facial recognition system are of white people and just 10% of other ethnicities, the model is very likely to perform well on white people but fail when recognizing, for instance, black people.
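As a rough illustration (the 90/10 split and the 20% threshold below are invented for the example), a quick check of class shares can flag this kind of imbalance before training:

```python
from collections import Counter

def class_balance(labels):
    """Return each class's share of the dataset."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}

# Hypothetical training labels: 90% "white", 10% "other"
labels = ["white"] * 900 + ["other"] * 100
shares = class_balance(labels)
print(shares)  # {'white': 0.9, 'other': 0.1}

# Flag any group below a minimum representation threshold
underrepresented = [c for c, s in shares.items() if s < 0.2]
print(underrepresented)  # ['other']
```

In practice, this kind of audit would run over the real label column, and the acceptable threshold depends on the task.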
Perpetual patterns are patterns that already existed in the dataset, and they are one of the most common sources of bias. Recruitment models offer a clear example: if a company has hired only men for a certain position for the past 20 years, a model trained on that history will prioritize male candidates, since the historical data suggests men are a better fit for the company. Consequently, the model will be gender-biased and unfair.
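A toy sketch of this effect (the hiring records below are invented): even a trivial frequency-based "model" fit on biased history reproduces the pattern it saw.

```python
from collections import defaultdict

# Hypothetical historical hiring records: (gender, hired)
history = ([("male", True)] * 180
           + [("male", False)] * 20
           + [("female", False)] * 10)

def train_rate_model(records):
    """Learn the hire rate per gender -- a stand-in for any model
    that fits whatever patterns the historical data contains."""
    hires, totals = defaultdict(int), defaultdict(int)
    for gender, hired in records:
        totals[gender] += 1
        hires[gender] += hired
    return {g: hires[g] / totals[g] for g in totals}

rates = train_rate_model(history)
print(rates)  # {'male': 0.9, 'female': 0.0}
# A model fit on this data will prefer male candidates,
# perpetuating the historical pattern.
```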
In practice, it is common to remove attributes that may carry perpetual patterns, such as gender or race. Nevertheless, these attributes are sometimes correlated with other variables used for training, which can still lead to biased models. Examples include the potential relation between a postal code and someone's wealth or race, or between a person's name and their religion.
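To see how a correlated variable leaks a removed attribute, here is a minimal sketch with invented records in which postal code acts as a proxy for wealth:

```python
from collections import Counter

# Synthetic records: (postal_code, wealth). The "wealth" attribute has
# supposedly been removed from the features, but postal code tracks it.
records = ([("10001", "high")] * 80 + [("10001", "low")] * 20
           + [("20002", "low")] * 85 + [("20002", "high")] * 15)

def predict_from_proxy(records, postal_code):
    """Recover the hidden attribute from the proxy alone
    (majority value observed for that postal code)."""
    values = Counter(v for code, v in records if code == postal_code)
    return values.most_common(1)[0][0]

print(predict_from_proxy(records, "10001"))  # high
print(predict_from_proxy(records, "20002"))  # low
# Even with "wealth" dropped from the inputs, a model can
# reconstruct it from the correlated postal code.
```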
Data selection and model evaluation
The process of selecting data to train our model is critically important and often impacts its performance. Ensuring that every group considered at inference time is equally represented, that edge cases are present, and that the distribution of the variables has not shifted over time are some of the heuristics we need to keep in mind when filtering data.
Finally, the data used to evaluate our models also plays an important role, since poor performance for certain groups may otherwise go undetected. Coming back to the example of the facial recognition system, it has been shown that its performance may vary across ethnicities, and choosing unbalanced test sets leads to unrealistically high accuracy results. You can read Joy Buolamwini's work for more information here.
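The point about unbalanced test sets can be sketched as follows (the groups and numbers are hypothetical): an aggregate accuracy figure can mask a large gap between groups, so evaluation should be broken down per group.

```python
def accuracy(pairs):
    """Fraction of (prediction, truth) pairs that match."""
    return sum(p == t for p, t in pairs) / len(pairs)

# Hypothetical (prediction, truth) pairs per demographic group:
# the second group is both small and poorly served by the model.
results = {
    "group_a": [(1, 1)] * 95 + [(0, 1)] * 5,   # 95% accurate
    "group_b": [(1, 1)] * 6 + [(0, 1)] * 4,    # 60% accurate
}

overall = accuracy([p for group in results.values() for p in group])
print(round(overall, 3))  # 0.918 -- the aggregate number hides the gap
for name, pairs in results.items():
    print(name, accuracy(pairs))
```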
How can we find the biases in our models?
The biases of our models can be detected and mitigated by experts. Nevertheless, to minimize possible biases in AI models, companies should rigorously select their data, validate it, and analyze their models constantly.
Another option is to use tools that make the bias detection process much simpler thanks to their visual interfaces and automated analysis.
EXPAI gives you the ability to analyze and understand the models your company is currently using. Thanks to eXplainable AI, our clients are able to recognize and remove biases from their models. To find more information about our solution: Click here!