How to Explain Your ML Models?
- by 7wData
Explainability in machine learning (ML) and artificial intelligence (AI) is becoming increasingly important. With the increased demand for explanations and the number of new...
Explainability in machine learning (ML) and artificial intelligence (AI) is becoming increasingly important. With the increased demand for explanations and the number of new approaches out there, it could be difficult to know where to start. In this post, we will get hands-on experience in explaining an ML model using a couple of approaches. By the end of it, you will have a good grasp of some fundamental ways in which you can explain the decisions or behavior of most machine learning models.Â
When it comes to explaining the models and/or their decisions, multiple approaches exist. One may want to explain theglobal(overall) model behavior or provide alocalexplanation (i.e. explain the decision of the model about each instance in the data). Some approaches are applied before the building of the model, others after the training (post-hoc). Some approaches explain the data, others the model. Some are purely visual, others not.Â
What you will need?Â
To follow this tutorial, you will need Python 3 and some ML knowledge as I will not explain how the model I will train works. My advice is to create and work in a virtual environment because you will need to install a few packages and you may not want them to disrupt your localcondaorpipsetting.Â
We will use the diabetes data set from sklearn and train a standard random forest regressor. Refer to the sklearn documentation to know more about the data (https://scikit-learn.org/stable/datasets/index.html)Â
We import the diabetes data set, assign the target variable to a vector of dependent variablesy, the rest to a matrix of featuresX,and train a standard random forest model.Â
In this tutorial, I am skipping some of the typical data science steps, such as cleaning, exploring the data, and performing the conventional train/test splitting but feel free to perform those on your own.Â
We can also easily calculate and print out the feature importances after the random forest model. We see that the most important is the ‘s5’, one of the factors measuring the blood serum, followed by ‘bmi’ and ‘bp’.Â
The first approach I will apply is called Individual conditional expectation (ICE) plots. They are very intuitive and show you how the prediction changes as you vary the feature values. They are similar to partial dependency plots but ICE plots go one step further and reveal heterogenous effects since ICE plots display one line per instance. The below code allows you to display an ICE plot for the feature ‘bmi’ after we have trained the random forest.Â
Figure 1.
[Social9_Share class=”s9-widget-wrapper”]
Upcoming Events
From Text to Value: Pairing Text Analytics and Generative AI
21 May 2024
5 PM CET – 6 PM CET
Read More