Global Techniques for Understanding Model Behavior

A flowchart linking guiding questions to 'Global Techniques for Understanding Model Behavior' in the context of AI model explanations. It illustrates two paths: one for confirming the model's reliability through basic validation, and another for explaining the model to stakeholders. These paths lead to 'Surrogate Models' for more detailed insights or 'Feature Importance' for basic insights. The flowchart culminates in potential impacts: 'Insight into how features affect the target,' 'Stakeholder Communication,' 'Recognize Relevance of Features,' and 'Model Transparency.'
      Diagram created using Mermaid.js code written by Ari Tal, 2023.


Can the model’s decision-making process be clearly explained to various stakeholders?


Understanding a model globally doesn't mean being an expert in its behavior across every possible scenario and data point. Rather, it entails gaining a grasp of the model's overall function and patterns.


Global Feature Importance

Molnar (2023) discusses several feature importance methods (not all of which will be covered here). All of these methods estimate the relative magnitude of each feature's influence without indicating the direction of its effect, and each method has a nuanced interpretation.

SHAP-based feature importance (Molnar, 2023)
This method provides the average (approximated) magnitude of the contribution of each feature for a set of predictions. This is the process:

                                         Image created by Ari Tal using an iPython Notebook, 2024

    1. Estimate feature contributions: The SHAP (SHapley Additive exPlanations) framework is used to estimate the contribution of each feature to the prediction or prediction probability. This is repeated for each prediction.
    2. Determine magnitude of contributions: The absolute value of each such contribution is taken in order to determine the magnitude of the impact.
    3. Mean of magnitudes: For a given feature, its feature importance is the mean of the contribution magnitudes.
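The three steps above can be sketched for the special case of a linear model, where (assuming independent features) the SHAP contribution of feature j to a prediction is exactly the weight times the feature's deviation from its mean. The weights and data below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data and a linear model (weights chosen for illustration)
X = rng.normal(size=(200, 3))      # 200 observations, 3 features
w = np.array([2.0, -0.5, 0.0])     # the third feature has no effect

# Step 1: estimate feature contributions. For a linear model with
# independent features, the exact SHAP value of feature j for
# observation i is w[j] * (X[i, j] - mean(X[:, j])).
contributions = w * (X - X.mean(axis=0))   # shape (200, 3)

# Step 2: take the absolute value of each contribution.
magnitudes = np.abs(contributions)

# Step 3: average the magnitudes per feature -> global importance.
importance = magnitudes.mean(axis=0)
print(importance)
```

As expected, the first feature dominates and the third has zero importance; for non-linear models, a library such as `shap` would estimate the per-prediction contributions instead.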

    Note: This formula, adapted from Molnar's work on interpretable machine learning, has been simplified for a wider audience. It has been restructured to replace some of the symbols with their English equivalents, maintaining its original integrity while making it more accessible.

Permutation feature importance
Rather than measuring contributions, this method takes a different approach: the values of a feature are shuffled across observations, breaking that feature's relationship with the target. The prediction error will typically increase as a result, and the extent of this increase serves as a proxy for the feature's relevance. That increase in error is what permutation feature importance measures (Molnar, 2023).
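A minimal sketch of this idea with NumPy, using a hypothetical fitted model represented by a simple prediction function (the data and coefficients are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y depends strongly on feature 0, weakly on feature 1,
# and not at all on feature 2.
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

def predict(X):
    """Stand-in for a fitted model (here, the true linear relationship)."""
    return 3.0 * X[:, 0] + 0.5 * X[:, 1]

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

baseline = mse(y, predict(X))

importances = {}
for j in range(X.shape[1]):
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])  # shuffle one feature
    # Importance = increase in error from breaking this feature's
    # relationship with the target.
    importances[j] = mse(y, predict(X_perm)) - baseline

print(importances)
```

Shuffling the irrelevant third feature leaves the error unchanged, so its importance is zero; shuffling the dominant first feature inflates the error the most.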

Feature importance for tree-based models
This is calculated by looking at the nodes of a decision tree where a feature is used to perform a split. The feature importance is the measure of how effectively that feature's splits divide the observations into child nodes (measured by the reduction in variance, Gini index, or entropy). An advantage of this type of feature importance is that it is significantly more efficient to compute than the other methods, and it is an exact by-product of training rather than an approximation that relies on random sampling. The disadvantage is that it can only be used with tree-based models (Molnar, 2023).
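A simplified sketch of the underlying idea, computing the variance reduction for a single candidate split (real implementations sum these reductions, weighted by node size, over every split that uses a given feature; the target values are invented for illustration):

```python
import numpy as np

def variance_reduction(y_parent, y_left, y_right):
    """Weighted decrease in variance from splitting a node in two."""
    n = len(y_parent)
    weighted_child_var = (len(y_left) / n) * np.var(y_left) + \
                         (len(y_right) / n) * np.var(y_right)
    return np.var(y_parent) - weighted_child_var

# Targets at a node; a good split on some feature separates the
# low targets from the high targets.
y_node = np.array([1.0, 1.1, 0.9, 5.0, 5.2, 4.8])
good_split = variance_reduction(y_node, y_node[:3], y_node[3:])

# A split that mixes low and high targets reduces variance far less.
poor_split = variance_reduction(y_node, y_node[[0, 3, 1]], y_node[[4, 2, 5]])
print(good_split, poor_split)
```

A feature whose splits look like the first case accumulates high importance; one whose splits look like the second contributes little.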

For non-technical stakeholders, a ranked list or a detailed visual of the relative importance of the most impactful features can each provide a quick and intuitive understanding. It simplifies the model's decision-driving elements into an easily digestible format.

Surrogate Models

For those seeking more detailed insights, surrogate models come into play. These are simpler, inherently interpretable models (see section Building Inherently Interpretable Models from the Start) that approximate the behavior of the more complex original (Molnar, 2023). Unlike the rough indication of a feature's relevance provided by feature importance, surrogate models provide more insight into the role each feature plays in predictions, and thus a simple picture of how the features affect the target.
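A minimal sketch of the approach, assuming a complex black-box model (stood in here by a hypothetical nonlinear function) and a linear surrogate fit to the black box's predictions rather than to the true labels:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 2))

def black_box(X):
    """Stand-in for a complex model's prediction function."""
    return np.tanh(2.0 * X[:, 0]) - 0.5 * X[:, 1]

# Fit an interpretable linear surrogate to the black box's OUTPUTS
# (not the true labels): that is what makes it a surrogate model.
preds = black_box(X)
A = np.column_stack([np.ones(len(X)), X])        # intercept + features
coef, *_ = np.linalg.lstsq(A, preds, rcond=None)

# R^2 of the surrogate measures how faithfully it mimics the black box.
residuals = preds - A @ coef
r2 = 1 - residuals.var() / preds.var()
print(coef[1:], r2)
```

Unlike a bare importance score, the surrogate's signed coefficients reveal the direction of each feature's effect, and its R-squared tells you how much to trust that simplified picture.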


Incorporating global techniques for explaining model behavior throughout the iterative process of model development enables meaningful conversations about the model's behavior and impact among both technical and non-technical stakeholders (Singla, 2023). Because they enable model transparency, these methods are crucial whenever accountability, fairness, and trust are at stake.

Global techniques can be adapted according to the needs and technical acumen of the stakeholders involved. The simplest and most basic understanding of model behavior can be obtained through a ranking of feature importance, often suitable for stakeholders with minimal technical knowledge or when a general overview is sufficient. As we climb the ladder of complexity, a visual of the magnitudes of feature importance of the top features offers a bit more detail, revealing the extent to which each feature influences the model’s predictions.

For stakeholders with a foundational technical background, such as subject matter experts or data analysts, surrogate models can provide a more detailed understanding than feature importance (such as the direction of a feature's influence) without delving into the complexities of the original model.

Potential Impacts

Global techniques unveil the model's broader patterns and tendencies, aiding in transparency and predictability. This clarity helps in managing and refining the model (e.g., removing unneeded features) and facilitates informed communication between developers, end-users, and other stakeholders. It fosters trust in several ways, such as by:

• Highlighting issues to fix: By surfacing obvious faults in the model early on, you can make corrections that improve its robustness before there are harmful or expensive downstream effects. You might discover inappropriate or irrelevant features being used by the model (e.g., demographic information or unique identifiers), overreliance on specific features, or unexpected feature effects (though unexpected effects do not necessarily mean the model needs to change).
• Facilitating communication with stakeholders: Embedding global explanatory techniques into iterative model development can enrich dialogue about the model’s behavior. You might find, for example, that the model's rationale is incongruent with a subject matter expert's understanding of how the model should work. Discovering such incongruities creates learning opportunities for those involved and can lead to model refinement.
• Providing model transparency: Transparency into the model's rationale is a necessary precondition for accountability, which in turn can lead to fairness and trust.

A Consideration About Trust

It's important to recognize that simply providing explanations doesn't guarantee they're accurate or valid. Sometimes, explanations can create a sense of trust even when they are biased or not entirely correct (Alikhademi et al., 2021; Xu, 2021). Aiming for trustworthiness in AI systems can be more productive than trying to build trust directly (IBM Research, n.d.).

Transition to Next Section

With the base understanding from feature importance and the more detailed insights from surrogate models, there's an opportunity to delve even deeper. For those with a strong technical foundation and an appetite for the intricate dynamics of model behavior, we venture into the realm of analyzing feature effects. Here, we'll explore tools like Partial Dependence Plots (PDPs), Accumulated Local Effects (ALE) plots, and Individual Conditional Expectation (ICE) plots. We will also delve into calculating interactions with Friedman's H-statistic. Each tool offers a lens to scrutinize a feature's influence, illuminating their individual and collective impacts on predictions. In the following section, we will guide you through these advanced techniques, reserved for those keen on unraveling the most intricate threads of model behavior. As we explore how to gain insights into the overall behavior of a model, a practical question arises:


Do you have evidence, beyond the general gauge provided by cross-validation, confirming the model's reliable performance on new unseen data?

This question isn’t just about making the model understandable but also about ensuring its predictions are reliable when applied to new data. Global techniques only begin to address this need. With this in mind, we move to a closer look at how individual features influence predictions.