Understanding why a machine-learning model makes certain decisions can be just as important as knowing whether those decisions are correct. A machine-learning model may correctly predict that a skin lesion is cancerous. However, it could have made that prediction using an unrelated blip in a clinical image.
Although tools exist to help make sense of a model’s reasoning, they often give insight into only one decision at a time, and each must be assessed manually. Models are often trained with millions of data inputs. This makes it nearly impossible for a human to assess enough decisions to find patterns.
Researchers at MIT and IBM Research have now created a method that allows users to quickly analyze a machine-learning model’s behavior by aggregating, sorting, and ranking these individual explanations. The technique, called Shared Interest, uses quantifiable metrics to compare how well a model’s reasoning matches that of a human.
Shared Interest could be used to quickly identify trends in a model’s decision-making. For example, a model might be easily confused by distracting or irrelevant features, like background objects in photos. Aggregating these insights can help the user quickly determine whether a model is reliable and ready for deployment in a real-world setting.
“In developing Shared Interest, our goal is to be able to scale up this analysis so that you could understand more globally what your model’s behavior is,” said lead author Angie Boggust, a graduate student in the Visualization Group of the Computer Science and Artificial Intelligence Laboratory.
Boggust co-authored the paper with Arvind Satyanarayan, an assistant professor of computer science who leads the Visualization Group, and with Benjamin Hoover and Hendrik Strobelt, both of IBM Research. The paper will be presented at the Conference on Human Factors in Computing Systems.
Boggust started working on the project during a summer internship at IBM, under Strobelt’s mentorship. After returning to MIT, Boggust and Satyanarayan continued to work on the project. They also kept collaborating with Strobelt and Hoover, who helped implement the case studies that show how the technique can be applied in practice.
Alignment of AI and humans
Shared Interest builds on popular methods that show how a machine-learning model made a particular decision, known as saliency methods. When a model classifies images, saliency methods highlight the areas of an image that were important to the model’s decision. These areas can be visualized as a type of heatmap, called a saliency map, that is often overlaid on the original image. If the model classifies the image as a dog and the dog’s head is highlighted, it means that those pixels were crucial to the model’s decision that the image contains a dog.
Shared Interest compares saliency maps with ground-truth data. In an image dataset, ground-truth data is typically human-generated annotations, such as boxes, that surround the relevant parts of each image. In the example above, the box would surround the entire dog in the photo. To evaluate an image classification model, Shared Interest compares the model’s saliency data with the human-generated ground-truth data for the same image to determine how well they match.
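As a rough illustration of that comparison, the sketch below scores the overlap between a saliency region and a ground-truth box. It assumes the saliency map has already been reduced to a set of salient pixels, and it uses three set-overlap scores: intersection over union, plus the fraction of each region covered by the other. The article does not name the exact metrics, so these choices, along with the helper names `box_pixels` and `overlap_scores` and the example coordinates, are assumptions for illustration only.

```python
def box_pixels(top, left, bottom, right):
    """Return the set of (row, col) pixels inside a box, bounds inclusive."""
    return {(r, c) for r in range(top, bottom + 1) for c in range(left, right + 1)}


def overlap_scores(salient, ground_truth):
    """Compare two pixel sets and return three overlap scores in [0, 1].

    The choice of scores is an assumption; the article only says that
    "multiple metrics" are used.
    """
    shared = len(salient & ground_truth)
    union = len(salient | ground_truth)
    return {
        "iou": shared / union if union else 0.0,
        "ground_truth_covered": shared / len(ground_truth) if ground_truth else 0.0,
        "saliency_inside_box": shared / len(salient) if salient else 0.0,
    }


# Hypothetical example: the annotator boxed the whole dog,
# while the model's saliency is concentrated on the head.
dog_box = box_pixels(2, 2, 9, 9)      # 8 x 8 = 64 pixels
head_region = box_pixels(2, 2, 5, 5)  # 4 x 4 = 16 pixels, all inside the box
scores = overlap_scores(head_region, dog_box)
print(scores)
```

In this made-up case every salient pixel falls inside the box, but the salient region covers only a quarter of it, which is the kind of partial match that a single yes-or-no check would miss.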
The technique uses several metrics to quantify that alignment or misalignment and then sorts each decision into one of eight categories. These categories range from perfectly human-aligned (the model makes a correct prediction and the saliency map highlights the same area as the human-generated box) to completely distracted (the model makes an incorrect prediction and doesn’t use any image features inside the human-generated box).
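The two endpoint categories described above can be sketched as a small labeling function. The article names only these two of the eight categories, so the sketch returns a placeholder label for everything in between rather than guessing at the rest. The function name `endpoint_category`, the pixel-set representation, and the example data are hypothetical.

```python
def endpoint_category(salient, ground_truth, prediction_correct):
    """Label one decision using the two endpoint categories named in the article.

    salient and ground_truth are sets of (row, col) pixels;
    prediction_correct says whether the model's predicted label was right.
    """
    if prediction_correct and salient == ground_truth:
        # Correct prediction, and saliency matches the human annotation exactly.
        return "perfectly human-aligned"
    if not prediction_correct and not (salient & ground_truth):
        # Wrong prediction, and no salient pixel lies inside the annotation.
        return "completely distracted"
    # Placeholder for the six categories the article does not describe.
    return "intermediate"


# Hypothetical 2 x 2 annotation and three model decisions.
annotation = {(0, 0), (0, 1), (1, 0), (1, 1)}
print(endpoint_category(annotation, annotation, True))   # perfectly human-aligned
print(endpoint_category({(5, 5)}, annotation, False))    # completely distracted
print(endpoint_category({(0, 0)}, annotation, True))     # intermediate
```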
“At one end of this spectrum, your model made the same decision as a human. At the other end, the human and your model are making completely different decisions. You can quantify that for all images in your dataset and use that quantification when sorting through them,” Boggust explains.
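To make the idea of sorting a dataset by such a quantity concrete, here is a minimal sketch that ranks images from least to most aligned. It assumes intersection over union as the alignment score, which the article does not specify, and the record layout, image names, and function names are hypothetical.

```python
def iou(salient, ground_truth):
    """Intersection over union of two pixel sets (an assumed alignment score)."""
    union = salient | ground_truth
    return len(salient & ground_truth) / len(union) if union else 0.0


def rank_by_alignment(records):
    """Return records sorted from least to most human-aligned."""
    return sorted(records, key=lambda rec: iou(rec["salient"], rec["ground_truth"]))


# Hypothetical dataset of three images sharing the same two-pixel annotation.
annotation = {(0, 0), (0, 1)}
records = [
    {"image": "img_a", "salient": {(0, 0), (0, 1)}, "ground_truth": annotation},
    {"image": "img_b", "salient": {(3, 3)}, "ground_truth": annotation},
    {"image": "img_c", "salient": {(0, 0)}, "ground_truth": annotation},
]

ranked = rank_by_alignment(records)
print([rec["image"] for rec in ranked])
```

Sorting this way puts the decisions with the least overlap first, so a reviewer can start with the images where the model and the annotator disagree most.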
The technique works similarly with text-based data, where key words are highlighted instead of image regions.

