Healthy friction is a vital engine of growth in any ecosystem. Adversarial parties pushing each other to new levels of performance are the essence of human biological, social, and economic evolution. The deep learning space is no exception to this phenomenon, and we often find tremendous friction between different schools of thought in the ecosystem. Among those friction dynamics, none is more important than the relationship between interpretability and accuracy in deep learning models.
Interpreting a mathematical equation like x + y = z is simple: by looking at the result and the inputs, you can quickly tell how the process produced the output. However, if I ask you to interpret how an orchestra delivers Beethoven's Symphony No. 9, the process might not be as trivial 😉 Understanding the part each instrument plays, as well as its interactions with dozens of other instruments, is not easy to do just by listening to the final melody. Well, we have similar issues in the deep learning world.
Some machine learning models, such as linear regressions or decision trees, are very easy to interpret: you can start from the results and trace the steps all the way back to the inputs. However, many real-world machine learning problems require complex computational structures such as deep neural networks, which are composed of hundreds of hidden layers and millions of nodes. Interpreting the result of a deep neural network is difficult at best and, in many cases, computationally unviable.
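To make the contrast concrete, here is a minimal sketch of why a linear model is so easy to interpret: every prediction decomposes into one additive contribution per input feature. The coefficients and feature names below are hypothetical, chosen only for illustration.

```python
# Interpret a linear model by tracing each input's additive contribution
# to the final prediction.

def explain_linear_prediction(coefs, intercept, features):
    """Return the prediction and each feature's additive contribution."""
    contributions = {name: coef * features[name] for name, coef in coefs.items()}
    prediction = intercept + sum(contributions.values())
    return prediction, contributions

# Hypothetical model: price = 50 + 3*size + 10*rooms
coefs = {"size": 3.0, "rooms": 10.0}
pred, contribs = explain_linear_prediction(coefs, 50.0, {"size": 20.0, "rooms": 2.0})
print(pred)      # 130.0  (50 + 60 + 20)
print(contribs)  # {'size': 60.0, 'rooms': 20.0}
```

No such decomposition exists for a deep network: the contribution of any single input is entangled across millions of nonlinear interactions in the hidden layers.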
Interpretability vs. Accuracy
The friction between the interpretability and accuracy capabilities of deep learning models is the friction between being able to accomplish complex knowledge tasks and understanding how those tasks were accomplished. Knowledge vs. Control, Performance vs. Accountability, Efficiency vs. Simplicity…pick your favorite dilemma; they can all be explained by balancing the tradeoffs between accuracy and interpretability.
Do you care about obtaining the best results, or do you care about understanding how those results were produced? That's a question data scientists need to answer in every deep learning scenario. Many deep learning techniques are complex in nature and, although they produce very accurate results in many scenarios, they can become incredibly difficult to interpret. If we plot some of the best-known deep learning models on a chart that correlates accuracy and interpretability, we get something like the following:
Not All Interpretations are Created Equal
Just like in human cognition, interpreting knowledge is a relatively abstract concept. In the case of deep learning, there are different ways to interpret the intricacies of a model.
Similarly, the deep learning space has produced different methods that can be used to improve the interpretability of a model. Here are some of my favorites:
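One widely used family of such methods is perturbation-based attribution: occlude one input at a time and measure how much the model's output changes, treating larger changes as evidence of greater influence. The sketch below uses a hypothetical toy model in place of a real network, purely to illustrate the idea.

```python
# Perturbation-based attribution: zero out each input in turn and measure
# how much the model's output moves. Bigger shifts suggest more
# influential inputs. The "model" here is a hypothetical stand-in.

def model(x):
    # Toy black-box model: depends strongly on x[0], weakly on x[1].
    return 5.0 * x[0] + 0.5 * x[1] + 1.0 * x[2]

def perturbation_importance(model, x):
    baseline = model(x)
    scores = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] = 0.0  # occlude one feature
        scores.append(abs(baseline - model(perturbed)))
    return scores

scores = perturbation_importance(model, [1.0, 1.0, 1.0])
print(scores)  # [5.0, 0.5, 1.0] -> x[0] matters most
```

The appeal of this approach is that it is model-agnostic: it only needs to query the model, never to look inside it.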
The Building Blocks of Interpretability
When it comes to deep learning models, interpretability is not a single concept but a combination of different principles. In a recent paper, researchers from Google outlined what they consider some of the foundational building blocks of interpretability. The paper presents three fundamental characteristics that make a model interpretable:
Google summarizes the principles of interpretability as the following:
· Understanding what Hidden Layers Do: The bulk of the knowledge in a deep learning model is formed in the hidden layers. Understanding the functionality of the different hidden layers at a macro level is essential to be able to interpret a deep learning model.
· Understanding How Nodes are Activated: The key to interpretability is not to understand the functionality of individual neurons in a network but rather groups of interconnected neurons that fire together in the same spatial location. Segmenting a network by groups of interconnected neurons will provide a simpler level of abstraction to understand its functionality.
· Understanding How Concepts are Formed: Understanding how a deep neural network forms individual concepts that can then be assembled into the final output is another key building block of interpretability.
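The first two principles above can be sketched in a few lines: run an input through a tiny network, record the hidden-layer activations, and report which hidden units fired together. The two-layer network and its weights are hypothetical, kept small enough to trace by hand.

```python
# Inspect a hidden layer: record its activations for one input and find
# which units activate together. Weights are hypothetical toy values.

def relu(v):
    return [max(0.0, x) for x in v]

def forward(x, w_hidden, w_out):
    # Hidden layer: one dot product per hidden unit, then ReLU.
    hidden = relu([sum(wi * xi for wi, xi in zip(row, x)) for row in w_hidden])
    output = sum(wo * h for wo, h in zip(w_out, hidden))
    return output, hidden

# Hypothetical 3-unit hidden layer over 2 inputs.
w_hidden = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]
w_out = [1.0, 1.0, 2.0]

output, hidden = forward([2.0, 1.0], w_hidden, w_out)
active = [i for i, h in enumerate(hidden) if h > 0]
print(hidden)  # [1.0, 0.0, 1.5] -> units 0 and 2 fire together
print(active)  # [0, 2]
```

In a real network this kind of inspection happens over thousands of units per layer, which is why the paper emphasizes grouping co-activating neurons rather than reading them one at a time.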
Combining those three principles certainly improves the interpretability of a deep neural network, and interpretability is an essential element for the mainstream adoption of deep learning techniques. However, when thinking about implementing an interpretable deep learning model, we should always remember that we might be paying a price in terms of accuracy.