Explainable AI (XAI) Methods Part 2: Individual Conditional Expectation (ICE) Curves
Tutorial on Individual Conditional Expectation (ICE) Curves, its advantages and disadvantages, how it is different from PDP and how to make…
Review of Previous Post
Explainable Machine Learning (XAI) refers to efforts to make sure that artificial intelligence programs are transparent in their purposes and how they work. [1] This is the second post in the XAI series that I plan to write.
In my previous [post](https://towardsdatascience.com/explainable-ai-xai-methods-part-1-partial-dependence-plot-pdp-349441901a3d), I introduced the concept of Partial Dependence (PD) and the Partial Dependence Plot (PDP), a visualization that uses those PD values to display the marginal average effects of feature(s) on the target variable. I recommend reviewing that post first before proceeding any further!
In this post, I will discuss what an Individual Conditional Expectation (ICE) Curve is, what the advantages and disadvantages are, and how to interpret and use it for model interpretability.
Individual Conditional Expectation (ICE) Curve/Plot
In my previous post, I explained that Partial Dependence (PD) is a global and model-agnostic XAI method. To recap and quote directly from my previous post:
Global methods give a comprehensive explanation of the entire data set, describing the impact of feature(s) on the target variable in the context of the overall data. Local methods, on the other hand, describe the impact of feature(s) at the level of an individual observation. Model-agnostic means that the method can be applied to any algorithm or model. [3]
ICE is also a model-agnostic method that can be applied to any model but, unlike PD, it is a local method. It is essentially the same concept as PD, except that it displays the marginal effect of the feature(s) for each instance instead of the average effect over the whole data set, as a Partial Dependence Plot (PDP) does. Thus, it can be understood as the equivalent of a PDP for individual data instances. Visually, an ICE plot displays the dependence of the prediction on a feature for each instance separately, resulting in one line per instance. [2]
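To make the relationship between ICE and PD concrete, here is a minimal from-scratch sketch. The model `toy_predict` and the helper `ice_curves` are hypothetical names invented for illustration (they are not part of any library). The sketch overwrites one feature with each grid value, keeps one prediction per instance, and then averages the curves to recover the PD curve.

```python
import numpy as np

def toy_predict(X):
    # Hypothetical model: the effect of feature 0 depends on feature 1.
    return X[:, 0] * X[:, 1] + 0.5 * X[:, 1]

def ice_curves(predict, X, feature, grid):
    """Return an array of shape (n_instances, len(grid)), one ICE curve per row of X."""
    curves = np.empty((X.shape[0], len(grid)))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value      # overwrite the feature for every instance
        curves[:, j] = predict(X_mod)  # keep one prediction per instance
    return curves

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))        # 5 instances, 2 features
grid = np.linspace(-1.0, 1.0, 5)   # values substituted for feature 0

ice = ice_curves(toy_predict, X, feature=0, grid=grid)
pd_curve = ice.mean(axis=0)        # averaging the ICE curves gives the PD curve
```

Each row of `ice` is one line in an ICE plot, while `pd_curve` is the single line a PDP would show for the same feature.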
Different Types of ICE Plots
Centered ICE Plot
In a typical ICE plot, it is often difficult to compare ICE lines across data instances because each curve may start at a different prediction value. The centered ICE plot, c-ICE for short, is a variation of the ICE plot that addresses this issue by anchoring every curve at a chosen feature value and displaying only the difference in prediction relative to that point. Christoph Molnar’s Interpretable Machine Learning book suggests that anchoring the curves at the lower end of the feature is a good choice. [2]
As you can see from the two ICE plot examples (typical ICE plot vs. c-ICE plot) from the Interpretable Machine Learning book, the c-ICE plot fixes the individual ICE lines to 0 at age 14. This makes it easier to compare ICE lines than in the typical ICE plot. In the c-ICE plot, we can see that the predictions for most women remain unchanged until the age of 45, after which the predicted probability increases relative to age 14. [2]
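Centering is a simple operation once the ICE curves are available. The sketch below uses made-up ICE values and a hypothetical helper `center_ice` (not a library function) that subtracts each curve’s value at the anchor point, here the lowest grid value, from the whole curve.

```python
import numpy as np

# Hypothetical ICE curves: 3 instances evaluated on a 4-point grid.
ice = np.array([
    [0.10, 0.10, 0.20, 0.40],
    [0.50, 0.50, 0.55, 0.80],
    [0.30, 0.30, 0.30, 0.30],
])

def center_ice(ice, anchor_index=0):
    """Subtract each curve's value at the anchor grid point from the whole curve."""
    return ice - ice[:, [anchor_index]]

c_ice = center_ice(ice)  # every curve now starts at 0
```

After centering, the first two curves end at the same height, which is hard to see in the raw values because they start at different levels. Recent versions of scikit-learn also accept a `centered` argument in `PartialDependenceDisplay.from_estimator`, so check whether your installed version supports it before relying on it.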
Derivative ICE Plot
The name of this variation of the ICE plot is self-explanatory. The derivative ICE plot, d-ICE plot for short, is an ICE plot that shows the derivatives of the original ICE curves with respect to the feature. This is useful for detecting interactions: if the feature does not interact with other features, the derivatives should be the same for all data instances, so differences between the derivative curves imply the existence of some interaction. Christoph Molnar’s Interpretable Machine Learning book, however, notes that the d-ICE plot takes a long time to compute and is therefore rather impractical. [2]
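As a rough illustration of the idea, the sketch below builds ICE curves for two hypothetical models (one additive, one with an interaction) and approximates the derivatives numerically with `np.gradient`. The models and the "spread" summary are my own assumptions for the example, not part of any d-ICE implementation.

```python
import numpy as np

grid = np.linspace(0.0, 2.0, 21)   # values of the feature of interest x
z = np.array([-1.0, 0.5, 2.0])     # hypothetical second feature, one value per instance

# Additive model f(x, z) = x**2 + z: no interaction between x and z.
ice_additive = grid[None, :] ** 2 + z[:, None]
# Interaction model f(x, z) = x * z: the slope in x depends on z.
ice_interaction = grid[None, :] * z[:, None]

# Numerical derivative of every ICE curve along the grid.
d_additive = np.gradient(ice_additive, grid, axis=1)
d_interaction = np.gradient(ice_interaction, grid, axis=1)

# Spread of the derivatives across instances at each grid point.
spread_additive = d_additive.std(axis=0).max()
spread_interaction = d_interaction.std(axis=0).max()
```

For the additive model the derivative curves coincide, so the spread is essentially zero. For the interaction model each instance has its own slope, and the spread is clearly positive.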
Advantages
Similar to PDPs, ICE curves are intuitive to read. Each line represents an individual instance and therefore allows us to observe how the marginal effect of the feature(s) changes across different values of the feature for that instance.
Another major advantage of ICE plots is that they enable us to capture heterogeneous relationships, which is impossible when looking at PDPs alone. A heterogeneous relationship means that a feature’s effect on the prediction differs from instance to instance, for example in direction or magnitude, typically because the feature interacts with other features.
For instance, if we are building a model that predicts loan application approval, the Partial Dependence Plot may tell us that the marginal average effect of age on loan application approval is positive (i.e., if you are older, your loan application is more likely to be approved). However, ICE plots may reveal individuals whose loan application approval probability is low regardless of age (i.e., ICE lines that are flat, with a slope close or equal to 0). Those individuals may be applicants whose other conditions (e.g., income) are not satisfactory enough for loan approval from the standpoint of the lending institution. As this example illustrates, "average" effects in PDPs may mask more local effects that differ from individual to individual.
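The loan example can be mimicked with made-up numbers. In the sketch below, all values are invented for illustration: 80 hypothetical applicants have an approval probability that rises with age, and 20 stay at a low probability regardless of age. The PD curve rises, while the flat curves only show up in the individual lines.

```python
import numpy as np

ages = np.linspace(20, 70, 11)  # grid of age values

# Hypothetical ICE curves (approval probability as a function of age).
rising = np.tile(0.2 + 0.6 * (ages - 20) / 50, (80, 1))  # 80 applicants, increasing with age
flat = np.full((20, ages.size), 0.05)                    # 20 applicants, low regardless of age
ice = np.vstack([rising, flat])

pd_curve = ice.mean(axis=0)                       # what the PDP would show
curve_range = ice.max(axis=1) - ice.min(axis=1)   # how much each ICE curve moves
flat_mask = curve_range < 0.01                    # instances whose prediction barely changes
n_flat = int(flat_mask.sum())
```

Here `pd_curve` increases monotonically with age, yet `n_flat` shows that a fifth of the applicants are unaffected by age, which the PDP alone would not reveal.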
Disadvantages
Remember from my previous post that the major assumption of PDPs is that the feature of interest should not be correlated with the other features. Otherwise, some points in the plot would correspond to unrealistic or invalid combinations of feature values. ICE plots suffer from exactly the same problem.
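A small simulation can show why correlation is a problem. The sketch below assumes two hypothetical features where `x2` is almost equal to `x1`. Building an ICE curve for one instance means replacing its `x1` with grid values while holding `x2` fixed, which creates combinations far away from anything in the data.

```python
import numpy as np

rng = np.random.default_rng(0)
noise_scale = 0.1
x1 = rng.normal(size=1000)
x2 = x1 + noise_scale * rng.normal(size=1000)  # x2 is strongly correlated with x1
X = np.column_stack([x1, x2])

# Grid of x1 values used for the ICE curve.
grid = np.quantile(x1, [0.05, 0.5, 0.95])

# Take the instance with the lowest x1 (its x2 is also very low).
instance = X[np.argmin(x1)]
# ICE replaces x1 with each grid value while x2 stays fixed.
synthetic = np.column_stack([grid, np.full(grid.size, instance[1])])

# Distance from the x1 ~ x2 relationship, in units of the noise scale.
gap = np.abs(synthetic[:, 0] - synthetic[:, 1]) / noise_scale
observed_gap = np.abs(x1 - x2) / noise_scale
```

The largest value in `gap` is far greater than anything in `observed_gap`, meaning that part of this ICE curve is computed on feature combinations the model never saw during training.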
Another issue with ICE plots is that they can be difficult to digest if too many ICE lines are drawn on one canvas. A solution is to draw only a subset of the lines. You may use different criteria to decide which ICE lines to keep and which to drop. For example, this tutorial [notebook](https://nbviewer.jupyter.org/github/jphall663/interpretable_machine_learning_with_python/blob/master/xgboost_pdp_ice.ipynb) from the H2O.ai team graphs ICE lines for every decile of the feature values, which prevents the lines from cluttering the plot. [5] I highly recommend checking out that notebook, as it also shows how to implement PDP and ICE plots from scratch without using any open source libraries. This will help you better grasp the theory behind PDP and ICE plots and how they actually work!
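One possible way to thin out the lines, loosely following the decile idea above, is sketched below. This is my own simplified version and not the code from the notebook: for each decile of a hypothetical feature column, it picks the row whose value is closest to that decile, and only those rows would then be drawn as ICE lines.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_values = rng.normal(size=500)  # hypothetical column of the feature of interest

# Deciles at 0%, 10%, ..., 100% of the feature.
deciles = np.quantile(feature_values, np.linspace(0.0, 1.0, 11))

# For each decile, pick the row whose feature value is closest to it.
rows = np.unique([np.argmin(np.abs(feature_values - d)) for d in deciles])
```

The array `rows` holds at most 11 row indices, so the resulting plot has at most 11 ICE lines instead of 500. If you use scikit-learn, `PartialDependenceDisplay.from_estimator` also has a `subsample` argument that limits how many ICE lines are drawn by random sampling.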
Implementation
There are multiple packages and libraries that we can use to draw ICE plots. If you are using R, packages include iml, ICEbox, pdp, and condvis. For Python, the PartialDependenceDisplay class in the sklearn.inspection module, the PyCEBox package, and the H2O package’s ice_plot function are available.
Let’s take a look at an example in Sklearn’s documentation. [4]
First, import the necessary libraries and packages.
import numpy as np
import pandas as pd
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import PartialDependenceDisplay
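Next, we generate a synthetic data set with one of the generators in Sklearn’s datasets module, train a model on it, and plot the ICE curves.
# Generate the synthetic Hastie et al. binary classification data set
X, y = make_hastie_10_2(random_state=0) # set a seed with random_state
# Train a gradient boosting classifier made of 100 depth-1 trees
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=1, random_state=0).fit(X, y)
# Plot one ICE curve per instance for the first two features
features = [0, 1]
PartialDependenceDisplay.from_estimator(clf, X, features, kind='individual')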
Similar to PDPs, you can only plot the ICE curves after a model has been trained.
Also, pay attention to the parameter "kind". You will remember that PartialDependenceDisplay is the same class we used to calculate Partial Dependence (PD) in my previous post. The only difference here is the parameter "kind", which we did not specify when calculating PD and which defaults to 'average'. We can also set it to 'both', which graphs the PDP and the ICE curves on one canvas. This is useful because we can look at the marginal average effect and the marginal individual effects at the same time!
PartialDependenceDisplay.from_estimator(clf, X, features, kind='both')
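If you want the raw numbers behind the plot rather than the figure, the `partial_dependence` function in sklearn.inspection returns them. The sketch below is a smaller stand-in for the example above: the names `X_small`, `y_small` and `clf_small`, the reduced sample size and the reduced number of trees are my own choices to keep it fast. It checks that the PD curve is the average of the ICE curves.

```python
import numpy as np
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import partial_dependence

# Smaller data set and model than in the example above, only to keep this quick.
X_small, y_small = make_hastie_10_2(n_samples=500, random_state=0)
clf_small = GradientBoostingClassifier(
    n_estimators=20, learning_rate=1.0, max_depth=1, random_state=0
).fit(X_small, y_small)

# kind="both" returns the individual (ICE) and the averaged (PD) predictions.
result = partial_dependence(
    clf_small, X_small, features=[0], kind="both", method="brute", grid_resolution=20
)
ice_values = result["individual"][0]  # shape: (n_instances, n_grid_points)
pd_values = result["average"][0]      # shape: (n_grid_points,)

# The PD curve should be the mean of the ICE curves.
pd_matches_ice_mean = bool(np.allclose(ice_values.mean(axis=0), pd_values))
```

Having the arrays at hand makes it easy to center the curves, select a subset of instances, or build a custom plot.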
Some other packages you can use to plot ICE curves in Python are PyCEBox and H2O, the well-known AutoML and Big Data package made by the H2O.ai team. [6, 7]
Thanks for reading my post! The next XAI method that will be covered is the ALE plot. Stay tuned!
Please follow me on various pages if you are interested! All the relevant links are compiled together in this page!
References
[1] Explainable Artificial Intelligence (XAI) (2019), Techopedia
[2] C. Molnar, Interpretable Machine Learning (2020)
[3] S. Kim, Explainable AI (XAI) Methods Part 1 – Partial Dependence Plot (PDP) (2021), Towards Data Science
[4] Partial Dependence, Sklearn Documentation
[5] P. Hall, H2O.ai Team, Interpretable_machine_learning_with_python > xgboost_pdp_ice.ipynb, Jupyter Notebook
[6] PyCEBox Tutorial, PyCEBox
[7] H2O Machine Learning Explainability, H2O.ai