Leveraging Examples For Clarity

A flowchart depicting the progression from the question of how examples can illuminate AI decision-making to three example-based methods: Adversarial Examples for security and reliability, Prototypes & Criticisms for data coverage and bias detection, and Influential Instances for model traceability and bias detection. These methods correspond to potential impacts on model robustness, understanding of vulnerabilities and sensitivity, as well as insights into data coverage and the influence of training datasets on the model.
      Diagram created using Mermaid.js code written by Ari Tal, 2023.

 

How can we assess a model's reliability, discern its strengths and limitations, and identify specific areas for targeted data and feature adjustments?

What

Example-based explanations utilize specific instances or what-if scenarios to illustrate the model’s decision-making process (Molnar, 2023).

How

Molnar describes several types of example-based explanations within Interpretable Machine Learning.

Adversarial Examples

These can be crafted in much the same way as counterfactuals: by searching for minor modifications to the input data that lead the model to make errors. Adversarial examples expose model weaknesses and security vulnerabilities that can then be patched to enhance resilience.
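
One common way to craft such perturbations for differentiable models is the Fast Gradient Sign Method (FGSM). The sketch below is a minimal illustration assuming a PyTorch classifier that returns logits and a batched input tensor; it is not a complete attack implementation, and the epsilon default is purely illustrative.

```python
import torch.nn.functional as F

def fgsm_example(model, x, label, epsilon=0.01):
    """Craft a simple adversarial example with FGSM.

    Assumes: model is a differentiable PyTorch classifier returning logits,
    x is a batched input tensor, label holds the true class indices,
    and epsilon controls the perturbation size (illustrative default).
    """
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), label)
    loss.backward()
    # Nudge every feature slightly in the direction that increases the loss.
    perturbed = x_adv + epsilon * x_adv.grad.sign()
    # For image inputs, you would typically also clamp back to the valid pixel range.
    return perturbed.detach()
```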

Prototypes & Criticisms

Prototypes, actual examples drawn from the centers of dense clusters of observations, illuminate which regions of the feature space the training data covers well.

Criticisms are observations that are not near any prototype, indicating potential areas of poor model performance due to inadequate data representation.
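
As a rough sketch of how prototypes and criticisms might be selected (a deliberate simplification using plain k-means, not the MMD-critic algorithm Molnar describes), one could treat the training points nearest each cluster center as prototypes and the points farthest from every center as criticisms:

```python
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances

def prototypes_and_criticisms(X, n_prototypes=5, n_criticisms=5):
    """Toy selection of prototypes and criticisms from a feature matrix X.

    Simplified stand-in for methods like MMD-critic: prototypes are the
    points closest to k-means centers, criticisms are the points farthest
    from every center. Parameter defaults are illustrative.
    """
    km = KMeans(n_clusters=n_prototypes, n_init=10, random_state=0).fit(X)
    dists = pairwise_distances(X, km.cluster_centers_)  # shape (n_samples, k)
    prototype_idx = dists.argmin(axis=0)                # nearest point to each center
    criticism_idx = dists.min(axis=1).argsort()[-n_criticisms:]  # far from all centers
    return prototype_idx, criticism_idx
```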

Influential Instances

Some data points play a larger role than others in shaping the model. Identifying these instances helps us see what data is steering the model's learning. Calculating the influence of each instance can be extremely expensive, depending on the technique used: deletion diagnostics, for example, retrain a new model on the dataset minus the observation of interest, once per observation. Influence functions approximate an instance's influence without retraining, but they can only be used for certain types of models (those with differentiable loss functions).

      Image created by Ari Tal using an IPython notebook, 2024
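
To make the cost of deletion diagnostics concrete, here is a toy sketch (assuming small NumPy arrays and a cheap scikit-learn linear model, both illustrative choices) that retrains once per left-out instance and scores each instance by how much its removal shifts the model's predictions; this retraining loop is exactly what becomes prohibitive at scale.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def deletion_influence(X, y):
    """Leave-one-out (deletion diagnostics) influence scores.

    X and y are small NumPy arrays; the model choice is illustrative.
    Each score is the mean absolute shift in predictions on the full
    dataset when the model is retrained without that one instance.
    """
    full_pred = LinearRegression().fit(X, y).predict(X)
    influence = np.zeros(len(X))
    for i in range(len(X)):
        mask = np.arange(len(X)) != i  # drop instance i
        reduced_pred = LinearRegression().fit(X[mask], y[mask]).predict(X)
        influence[i] = np.mean(np.abs(full_pred - reduced_pred))
    return influence  # larger value = more influential training instance
```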

When

Each type of example-based explanation caters to specific needs:

  • Adversarial examples: These can be used for security assessments, testing the model’s reliability under varied conditions (Molnar, 2023).
  • Prototypes and criticisms: Assess data coverage prior to model training to detect potential data-driven biases.
  • Influential instances: These can be used to trace which training data shaped the model's learning and thus its behavior, potentially helping to debug data-related causes of known model biases and other model issues.

Potential Impacts

Example-based explanations offer a tangible glimpse into the model's decision-making, providing an intuitive understanding of its behavior. They are invaluable for identifying biases, validating model robustness, and ensuring AI decisions are understandable and justifiable. That said, the explanations serve slightly different purposes:

  • Adversarial Examples: Identify points of model failure and thus provide clarity on how to make the model more robust and reliable (Molnar, 2023).
  • Prototypes, criticisms, and influential instances: Assess the model's strengths, limitations, and biases across regions of the feature space (Molnar, 2023). As such, they direct focus to areas needing data augmentation or feature refinement to bolster model performance and equity.