Building Inherently Interpretable Models from the Start

A flowchart detailing the process of building inherently interpretable AI models. It starts with a question about ensuring transparency and interpretability, which connects to the section 'Building Inherently Interpretable Models from the Start.' From here, paths branch out to different model types based on data characteristics: Linear Models for only linear relationships, Decision Trees or Rule-based Models when feature interactions are present, and Naive Bayes Classifier or K-Nearest Neighbors when non-linearities are present. The paths converge on potential impacts, emphasizing 'Model Transparency,' 'Ease to Interpret,' and 'Prediction Transparency,' reflecting the models' straightforward nature and their capability to clarify how predictions are made.
      Diagram created using Mermaid.js code written by Areal Tal, 2023.


How can we ensure that the model is designed with transparency and interpretability at its core?


Inherently interpretable models are designed to be straightforward and easy to understand. These models aim to provide a clear and intuitive insight into how decisions are made, focusing on simplicity to make the model's decision-making process transparent for stakeholders.


Selecting ML algorithms that produce simpler architectures, such as a linear model, an individual decision tree, or a k-nearest neighbors model, lays out the decision-making process in an easily digestible manner. Other ways to simplify the model architecture include limiting the number of features, standardizing inputs (making weights directly comparable), applying regularization, and choosing appropriate hyperparameters.
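As a small illustration of one of these considerations, the sketch below standardizes two hypothetical features (the data and feature names are invented for this example). After standardization, both features have mean 0 and unit variance, so the weights a linear model learns for them can be compared directly.

```python
# Hypothetical example: standardizing inputs so learned weights are comparable.
def standardize(values):
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = var ** 0.5
    return [(v - mean) / std for v in values]

incomes = [30_000, 45_000, 60_000, 90_000]   # dollars
ages = [25, 35, 45, 55]                      # years

z_income = standardize(incomes)
z_age = standardize(ages)
# Both standardized features now have mean 0 and unit variance, so a weight
# of 0.5 on income and 0.5 on age would mean equal influence per standard
# deviation of change -- something the raw scales would have obscured.
```

In practice a library utility (such as a preprocessing scaler fit on training data only) would replace this hand-rolled function, but the effect on weight comparability is the same.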

Linear Models

These models are favored for their straightforward nature. They can be a good fit when the data shows clear linear relationships, but they might struggle when these relationships become more complex.

L1 Regularization

L1 regularization enhances interpretability by driving the coefficients of less important features to zero, thereby reducing complexity and making the model easier to read and understand. When L1 regularization is combined with linear regression, the resulting algorithm is called lasso regression. However, L1 regularization can be used with a wide variety of regression and classification algorithms (Van Otten, 2023).
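The coefficient-zeroing effect can be sketched with soft-thresholding, which is what the lasso solution reduces to in the special case of orthonormal features (the feature names and weight values below are invented for illustration):

```python
# Illustrative sketch: with orthonormal features, the lasso solution is
# soft-thresholding of the ordinary least-squares coefficients -- each
# coefficient shrinks toward zero, and small ones are zeroed out entirely.
def soft_threshold(w, lam):
    sign = 1.0 if w > 0 else -1.0
    return sign * max(abs(w) - lam, 0.0)

ols_weights = {"income": 2.3, "age": 0.08, "clicks": -0.05}  # hypothetical
lam = 0.1  # regularization strength

lasso_weights = {name: soft_threshold(w, lam) for name, w in ols_weights.items()}
# "age" and "clicks" fall below the threshold and become exactly zero,
# leaving a sparser, easier-to-read model with only "income" remaining.
```

With correlated features the real lasso solver (e.g., coordinate descent) is needed, but the interpretability payoff is the same: fewer nonzero coefficients to explain.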

Decision Trees

A decision tree offers a visual representation of the decision-making process. However, its interpretability decreases as the tree grows larger. To keep a tree readable, select hyperparameters that limit its depth, limit the number of features used, and increase the amount of pruning.
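To see why a depth limit preserves readability, here is a hypothetical depth-2 tree for a loan-screening scenario written out as explicit rules (the thresholds and feature names are invented for this sketch, not learned from data):

```python
# A shallow (depth-2) decision tree expressed as plain if/else rules.
# Every prediction corresponds to a short, human-readable path.
def approve_loan(income, debt_ratio):
    if income < 40_000:
        if debt_ratio > 0.4:
            return "deny"     # low income AND high debt load
        return "review"       # low income but manageable debt
    if debt_ratio > 0.6:
        return "review"       # adequate income but heavy debt load
    return "approve"          # adequate income, manageable debt
```

A tree of depth 20 would be the same structure, but its paths would be far too long to narrate to a stakeholder, which is exactly what depth limits and pruning guard against.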

Other Simple Models

Naive Bayes Classifier and k-Nearest Neighbors are also valued for their interpretability (Molnar, 2023), but their clarity can depend on the context and the number of features.
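The interpretability of k-nearest neighbors comes from its explanations being concrete training examples. A minimal sketch (with invented toy data) shows that each prediction can be justified by pointing at the specific neighbors that produced it:

```python
# Minimal k-nearest-neighbors sketch: the prediction's explanation is the
# list of nearest training examples themselves.
def knn_predict(train, query, k=3):
    # train: list of ((x1, x2), label) pairs
    by_dist = sorted(
        train,
        key=lambda pt: sum((a - b) ** 2 for a, b in zip(pt[0], query)),
    )
    neighbors = by_dist[:k]
    labels = [label for _, label in neighbors]
    prediction = max(set(labels), key=labels.count)  # majority vote
    return prediction, neighbors  # the neighbors ARE the explanation

train = [((0, 0), "cat"), ((0, 1), "cat"), ((1, 0), "cat"),
         ((5, 5), "dog"), ((5, 6), "dog")]
label, neighbors = knn_predict(train, (0.5, 0.5), k=3)
# label is "cat"; neighbors lists the three training points that decided it
```

Note how this clarity degrades with many features: with dozens of dimensions, "these three points were closest" is much harder for a human to verify by inspection, which is the context-dependence mentioned above.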


In scenarios where clear and trustworthy explanations are paramount, such as in healthcare, high-risk safety-critical sectors, or under regulatory requirements, inherently interpretable models might be preferred due to their reduced risk of misunderstanding. In such contexts, a straightforward explanation of decisions is not merely beneficial, but often mandatory.

That said, inherently interpretable models are only a viable choice when all of the following prerequisites are met:

  • There are few features.
  • There are no complex non-linear relationships between any of the features and the target.
  • Feature interactions are limited and not too complex.

Potential Impacts

Inherently interpretable models offer significant advantages by design:

  • Ease of Interpretation and Model Transparency: Their simplicity makes their decision-making processes easily understood. Their clear logic allows for a direct understanding of how inputs are transformed into outputs. This clarity makes it easier to explain model behavior to stakeholders from various backgrounds. It also fosters trust and confidence.
  • Prediction Transparency: The transparency potential of these models extends to the individual predictions they make as well. The outcome of each prediction can be traced back not only to the influencing factors, but to exactly how those factors affected the prediction.
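For a linear model, prediction transparency can be made concrete by decomposing a single prediction into per-feature contributions. The weights and input values below are invented for illustration, standing in for an already-fitted model:

```python
# Sketch: tracing one prediction of a hypothetical, already-fitted linear
# model back to the contribution of each feature.
weights = {"income": 0.5, "age": -0.2}   # hypothetical fitted coefficients
intercept = 1.0
x = {"income": 3.0, "age": 2.0}          # one (standardized) input

contributions = {name: weights[name] * x[name] for name in weights}
prediction = intercept + sum(contributions.values())
# contributions shows exactly how much each feature moved the prediction:
# income contributes +1.5, age contributes -0.4,
# so prediction = 1.0 + 1.5 - 0.4 = 2.1
```

Each term in the sum is an exact statement of how that factor affected this specific prediction, with no approximation involved.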

Transition to Next Section

In the balancing act between model interpretability and performance, an inherently interpretable model might lean toward the former, particularly if it doesn't fully capture the complexity of the relationships in the data. The advent of post-hoc explanations has eased this tension. However, model-agnostic post-hoc explanations are approximations: their fidelity to model behavior is uncertain and can be challenging to validate (Lakkaraju et al., 2019). Furthermore, different tools used for these explanations can produce conflicting results (Krishna et al., 2022). The simplicity of inherently interpretable models can preclude such issues.