Using GALE to compare sets of ML model explanations
Exploring a novel approach to machine learning explainability using topological data analysis
August 15, 2022 5 min read
Imagine you’re dropped into an unknown area and are handed a stack of unlabeled maps. A visual inspection of the terrain — identifying features like steep inclines, open valleys and bodies of water — can give you confidence in selecting the correct map to navigate your way. Now imagine you’re using a similar set of maps to try to explain a complex machine learning model — where feature importance stands in for elevation. In this case, we have to move from topography to topology, the mathematical study of the properties of shapes that are preserved under continuous deformation, which offers a novel approach to advancing explainability in machine learning.
My Applied ML research team at Capital One, in collaboration with partners including researchers at NYU, recently published new findings that propose a method using Topological Data Analysis to explore the space of explanations and provide a stable, robust view. The research, called GALE: Globally Assessing Local Explanations, essentially offers a way to compare sets of model explanations and determine their similarities.
Using an ablation-study framework for topological data analysis
To revisit the map analogy, if you’re following a topographic map, you would expect the contours of the landscape on the map to reflect what you’re experiencing in real life: Here I am at that ridge, over there is the base of the mountain. The challenge in explaining machine learning decisions is that we’re working with complex, cumbersome, multi-dimensional maps called manifolds. But what if we turned those manifolds into approximate graphical (graph network) representations? We could then compare those graphs to judge whether the underlying manifolds are similar.
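One standard way to turn a point cloud on a manifold into a graph is a Mapper-style construction from topological data analysis. The sketch below is illustrative only — it is not GALE’s actual implementation — and its lens (the L2 norm of each explanation vector), its single-linkage clustering rule and the function name are all assumptions made for the example: project each explanation through the lens, cover the lens range with overlapping intervals, cluster the points within each interval, and connect clusters that share points.

```python
import numpy as np

def mapper_graph(explanations, n_intervals=5, overlap=0.25, link_dist=0.5):
    """Build a Mapper-style graph from an array of explanation vectors.

    Lens: the L2 norm of each explanation (a simple 1-D projection).
    Nodes: clusters of points within each overlapping lens interval.
    Edges: pairs of clusters that share at least one point.
    Illustrative sketch only -- not the construction used in GALE.
    """
    lens = np.linalg.norm(explanations, axis=1)
    lo, hi = lens.min(), lens.max()
    length = (hi - lo) / n_intervals
    nodes = []  # each node is a frozenset of point indices
    for i in range(n_intervals):
        # overlapping interval of the lens range
        start = lo + i * length - overlap * length
        end = lo + (i + 1) * length + overlap * length
        idx = np.where((lens >= start) & (lens <= end))[0]
        # single-linkage clustering within the interval via union-find
        parent = {j: j for j in idx}
        def find(a):
            while parent[a] != a:
                parent[a] = parent[parent[a]]
                a = parent[a]
            return a
        for a in idx:
            for b in idx:
                if a < b and np.linalg.norm(
                        explanations[a] - explanations[b]) <= link_dist:
                    parent[find(a)] = find(b)
        clusters = {}
        for j in idx:
            clusters.setdefault(find(j), set()).add(j)
        nodes.extend(frozenset(c) for c in clusters.values())
    # connect any two clusters that share a point
    edges = {(i, j) for i in range(len(nodes))
             for j in range(i + 1, len(nodes)) if nodes[i] & nodes[j]}
    return nodes, edges
```

Two such graphs — one per explanation method — can then be compared, which is the intuition behind asking whether two explanation manifolds "look alike."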
This becomes important because machine learning model explanations don’t have a singular ground truth; there’s no training, no loss minimization for explanations. Instead, they’re built on assumptions and axioms that we expect them to uphold. The “correctness” of the explanations in practice can vary based on the method’s translation from theory into code. They can also be sensitive to choices of hyperparameters, but with no ground truth, there is no notion of optimizing those hyperparameters. Our work offers a novel approach to compare multiple sets of model explanations and to determine whether they agree or disagree; consensus among differing methods’ explanations provides a mechanism to build trust.
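To make the idea of agreement concrete, here is one simple stand-in measure (not the graph-level comparison GALE performs): correlate the feature rankings two methods produce for each instance and average. The helper below is an assumption for illustration; it uses a tie-free Spearman rank correlation.

```python
import numpy as np

def rank_agreement(expl_a, expl_b):
    """Average Spearman rank correlation between two sets of per-instance
    feature-importance vectors (rows = instances, columns = features).
    Assumes no tied importances; illustrative sketch only."""
    def spearman(u, v):
        # double argsort turns values into 0..n-1 ranks (tie-free case)
        ru = np.argsort(np.argsort(u)).astype(float)
        rv = np.argsort(np.argsort(v)).astype(float)
        ru -= ru.mean()
        rv -= rv.mean()
        denom = np.sqrt((ru ** 2).sum() * (rv ** 2).sum())
        return float((ru * rv).sum() / denom) if denom else 0.0
    return float(np.mean([spearman(a, b) for a, b in zip(expl_a, expl_b)]))
```

A score near 1 means the two methods rank features similarly; a score near -1 means they actively disagree — exactly the kind of signal that, aggregated carefully, can build (or undermine) trust in a set of explanations.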
This research was recently presented at the TAG in Machine Learning Workshop at ICML and is closely related to our work on ablation studies (a common way ML practitioners test feature importance), which could be a useful reference to gauge GALE’s performance. Our work on ablation studies for explainability, which will be presented at KDD’s Machine Learning in Finance Workshop, provides a framework to assess the faithfulness of a set of explanations to their model.
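The ablation idea itself can be sketched in a few lines: remove features in the order an explanation ranks them and watch the model’s output change; a faithful explanation should produce a steep early drop. This is a generic illustration, not the framework from the paper, and the names are assumptions (`predict` is any callable mapping a feature matrix to predictions, `baseline` is the value a removed feature is replaced with).

```python
import numpy as np

def ablation_curve(predict, X, importances, baseline=0.0):
    """Ablate features from most- to least-important (per the explanation)
    and record the model's mean prediction after each step.
    Illustrative sketch of an ablation study, not the paper's framework."""
    order = np.argsort(-np.abs(importances))  # most important first
    X_abl = X.copy().astype(float)
    scores = [float(np.mean(predict(X_abl)))]  # score before any ablation
    for f in order:
        X_abl[:, f] = baseline  # "remove" the feature
        scores.append(float(np.mean(predict(X_abl))))
    return scores
```

Comparing the curve produced by an explanation’s ranking against one produced by a random ranking gives a rough, ground-truth-free check on how faithful the explanation is to the model.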
Leveraging machine learning models to inform topological data research
As machine learning advances, so do the breadth and depth of applications for more high-stakes, automated decision systems. Whether mitigating financial risk, making important medical diagnoses or teaching vehicles how to drive themselves, a broad, cross-industry need is emerging for faithful explanations of machine learning models. So it’s important to understand what these types of consequential machine learning models look at and how they make decisions.
Conclusion: Using topological data analysis for machine learning explainability
While not yet heavily explored within machine learning, the study of topology is an emerging approach to address this multifaceted challenge, and it has considerable potential. By leveraging topological data analysis at every step in the machine learning model pipeline, researchers may discover a level of tooling and introspection that has otherwise not been possible. That’s an exciting possibility — and could be one of the north stars on the map toward better models and greater explainability.