Neural network technology can help solve some of the toughest problems facing humanity today, but it comes with one major caveat: It’s a black box, taking in data and spitting out results without ever being able to explain how or why. As neural networks are deployed in increasing numbers, their impact grows accordingly, as does the potential for bias to wreak havoc on the people they affect.
This is where a team from Google Research, Hebrew University, MIT, Tel Aviv University, and the Weizmann Institute of Science comes in with a potential answer for classification networks: StylEx, which provides “disentangled attributes” designed to explain exactly why a given image was classified in a certain way. In effect, it figures out what makes a cat a cat or a car a car, at least from the perspective of the classifier itself.
Rooting out bias via explanations
The problem of bias in machine learning is well understood. Researchers at North Carolina State University and Pennsylvania State University highlighted issues with algorithmic bias in artificially intelligent hiring platforms nearly two years ago; in 2019 a team at Carnegie Mellon University proposed a system for looking inside machine learning models to find influential proxies; and a team of computer scientists from Princeton and Stanford universities proposed tackling the problem in the data sets themselves.
The approach taken for StylEx is different, and considerably more hands-on. StylEx, its creators explain, provides a means of training a generative adversarial network (GAN) to create a “StyleSpace” with explicable attributes specific to the classifier being explained. “These can then be used to visualize the effect of changing multiple attributes per image,” the team writes in the paper’s abstract, “thus providing image-specific explanations.”
Those explanations aren’t presented as a simple list of properties. Key to StylEx is its interactivity: Each attribute (the list of which varies depending on the data set and the classifier being explained) is presented to the viewer with a slider. Moving an attribute’s slider up or down has the GAN generate a new image with that attribute emphasized or de-emphasized, creating, under the user’s control, either ideal examples of what the classifier is looking for or what the team calls “counterfactual examples,” which push the image toward the opposite class.
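To make the slider interaction a little more concrete, the sketch below models it under heavy simplifying assumptions. A hypothetical toy logistic function, `classifier_prob`, stands in for the whole "GAN generator followed by classifier" pipeline, mapping a StyleSpace vector straight to a class probability; in StylEx itself the edited style vector would be decoded into an image and passed to the real classifier. The helpers `apply_slider` and `find_counterfactual`, the weights, and the step size are all illustrative inventions, not the project's code.

```python
import math

# Hypothetical toy stand-in for "GAN generator followed by classifier":
# it maps a StyleSpace vector straight to P(class A). In StylEx the edited
# style vector would be decoded into an image and fed to the real classifier.
WEIGHTS = [2.0, -1.5, 0.0]  # coordinate 2 is ignored by this toy classifier
BIAS = -0.5


def classifier_prob(style):
    """Toy logistic classifier over style coordinates (an assumption)."""
    logit = BIAS + sum(w * s for w, s in zip(WEIGHTS, style))
    return 1.0 / (1.0 + math.exp(-logit))


def apply_slider(style, index, amount):
    """Return a copy of `style` with one attribute coordinate shifted."""
    edited = list(style)
    edited[index] += amount
    return edited


def find_counterfactual(style, index, step=0.1, max_steps=100):
    """Search outward in both directions for the smallest slider shift that
    flips the predicted class; return that shift, or None if none is found."""
    original_is_a = classifier_prob(style) >= 0.5
    for k in range(1, max_steps + 1):
        for sign in (1.0, -1.0):
            shift = sign * k * step
            edited = apply_slider(style, index, shift)
            if (classifier_prob(edited) >= 0.5) != original_is_a:
                return shift
    return None


style = [1.0, 0.2, 0.3]
print(classifier_prob(style))         # the toy classifier's P(class A)
print(find_counterfactual(style, 0))  # a negative shift flips the decision
print(find_counterfactual(style, 2))  # None: this attribute is never used
```

Moving several sliders at once, as the abstract describes, would amount to applying `apply_slider` repeatedly with different indices before re-running the classifier. The last call illustrates why only classifier-relevant attributes are worth exposing: A coordinate the classifier ignores can never produce a counterfactual.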
“For example,” Google Research software engineers Oran Lang and Inbar Mosseri explain in a joint post on the topic, “one can draw conclusions such as ‘dogs are more likely to have their mouth open than cats’, ‘cats’ pupils are more slit-like’, ‘cats’ ears do not tend to be folded’ and so on.”
Training for transparency
StylEx works by modifying the training stage of the StyleGAN generative adversarial network with two additional components: An encoder, trained with a reconstruction loss to ensure generated images don’t stray too far from the appearance of the input image; and a classification loss, which forces the newly generated image to match the classifier probability of the input image. As a result, the key visual details on which the classifier hinges its decision are preserved, even if they’re not obvious to the human viewer.
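The two extra training terms can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: Images are flattened lists of pixel values, the reconstruction term is a simple mean absolute (L1) difference, and the classification term is assumed to be a KL divergence between the classifier's class-probability vectors for the input and generated images. The function names and the weights `w_rec` and `w_cls` are hypothetical; the paper's actual loss formulation may differ in its details.

```python
import math


def reconstruction_loss(original, generated):
    """Mean absolute (L1) difference between two flattened images.
    A simple stand-in; the real reconstruction terms may differ."""
    if len(original) != len(generated):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(a - b) for a, b in zip(original, generated)) / len(original)


def classification_loss(p_original, p_generated, eps=1e-12):
    """KL divergence KL(p_original || p_generated) between the classifier's
    class-probability vectors; zero when the two distributions match."""
    return sum(
        p * math.log((p + eps) / (q + eps))
        for p, q in zip(p_original, p_generated)
        if p > 0
    )


def extra_loss(original, generated, p_original, p_generated,
               w_rec=1.0, w_cls=1.0):
    """Hypothetical weighted sum of the two extra terms; in training this
    would be added to the usual StyleGAN adversarial loss."""
    return (w_rec * reconstruction_loss(original, generated)
            + w_cls * classification_loss(p_original, p_generated))


image = [0.2, 0.4, 0.6, 0.8]
close = [0.2, 0.5, 0.6, 0.7]
print(extra_loss(image, image, [0.9, 0.1], [0.9, 0.1]))  # 0.0
print(extra_loss(image, close, [0.9, 0.1], [0.5, 0.5]))  # positive
```

The second call shows why both terms matter: The generated image is visually close to the input, but the classifier's confidence has collapsed from 0.9 to 0.5, and the classification term penalizes that loss of classifier-relevant detail.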
With training complete, the StyleSpace (the space of per-channel style parameters analyzed by Zongze Wu, Dani Lischinski, and Eli Shechtman in their 2020 paper on the StyleGAN2 architecture, on which StylEx is built) is searched for attributes which, when changed, have a notable impact on the classification probability. Initially, the attributes are image-specific; by repeating the search across a large number of images of a given class, the attributes shift toward class-specificity.
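The search step can be illustrated with a simplified sketch. Here each StyleSpace coordinate of an image is nudged up and down by a hypothetical amount `delta`, the resulting change in classifier probability is recorded, and the most influential coordinates form the image-specific explanation. The class-level step below is an assumed frequency-count variant (keep the coordinates that make the per-image top list for the most images), not the paper's exact selection procedure. The toy classifier `toy_prob` and its weights are invented for illustration.

```python
import math
from collections import Counter


def attribute_effects(style, prob_fn, delta=1.0):
    """For each style coordinate, the largest change in classifier probability
    caused by shifting it by +delta or -delta."""
    base = prob_fn(style)
    effects = []
    for i in range(len(style)):
        changes = []
        for sign in (1.0, -1.0):
            edited = list(style)
            edited[i] += sign * delta
            changes.append(abs(prob_fn(edited) - base))
        effects.append(max(changes))
    return effects


def top_attributes_for_image(style, prob_fn, k=3, delta=1.0):
    """Image-specific explanation: the k most influential coordinates."""
    effects = attribute_effects(style, prob_fn, delta)
    return sorted(range(len(style)), key=lambda i: effects[i], reverse=True)[:k]


def top_attributes_for_class(styles, prob_fn, k=3, per_image=3, delta=1.0):
    """Class-level explanation (simplified): coordinates that appear in the
    per-image top list for the largest number of images of the class."""
    counts = Counter()
    for style in styles:
        counts.update(top_attributes_for_image(style, prob_fn, per_image, delta))
    return [i for i, _ in counts.most_common(k)]


# Hypothetical toy classifier over a 4-coordinate StyleSpace.
TOY_WEIGHTS = [3.0, 0.0, -2.0, 0.1]


def toy_prob(style):
    logit = sum(w * s for w, s in zip(TOY_WEIGHTS, style))
    return 1.0 / (1.0 + math.exp(-logit))


images = [[0.0, 0.0, 0.0, 0.0], [0.5, -1.0, 0.2, 2.0], [-0.3, 0.4, 1.0, 0.0]]
print(top_attributes_for_image(images[0], toy_prob, k=2))
print(top_attributes_for_class(images, toy_prob, k=2, per_image=2))
```

With this toy classifier, coordinates 0 and 2 (the ones with the largest weights) dominate both the per-image and the class-level rankings, while coordinate 1, which the classifier ignores, never appears.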
Key to the success of the approach is that the attributes are readily understandable by a human viewer, as confirmed through user studies. Attributes discovered for a classifier trained to recognize the gender of a person in an image include, for example, the presence of stubble, a mustache, or lipstick, and the thickness of the subject’s eyebrows; for age, the top four attributes were skin pigmentation, eyebrow thickness once again, the presence or absence of glasses, and whether the subject’s hair was dark or light.
The technology extends beyond basic image classification, however, with its creators suggesting StylEx could also be applied in medicine and biology. Trained on images of retinal disease, StylEx extracted attributes known to be indicators of disease, including exudates and hemorrhages; trained on images of leaves, it picked out a range of attributes linked to plant diseases, including the color at the base of the leaf, the presence or absence of spots, and a rotten leaf apex.
Classifier, dataset bias exposed
The team is clear on one thing: “Our method explains a classifier, not reality,” Lang and Mosseri warn. “The method is designed to reveal image attributes that a given classifier has learned to utilize from data; those attributes may not necessarily characterize actual physical differences between class labels (e.g., a younger or older age) in reality.”
That warning brings with it a hidden benefit, too: The very fact that StylEx surfaces attributes related to the classifier rather than reality can be exploited to uncover biases in both the classifier network and the underlying dataset. “It can further be used to improve fairness of neural networks,” Lang and Mosseri claim, “by augmenting the training dataset with examples that compensate for the biases our method reveals.
“We believe that our technique is a promising step towards detection and mitigation of previously unknown biases in classifiers and/or datasets. Additionally, our focus on multiple-attribute based explanation is key to providing new insights about previously opaque classification processes and aiding in the process of scientific discovery.”
The team’s work, presented at the International Conference on Computer Vision (ICCV) 2021, is available on the arXiv preprint server under open access terms. Additional information is available on the project’s GitHub repository, alongside a permissively licensed Colab notebook and model weights for the GANs discussed in the paper.
Oran Lang, Yossi Gandelsman, Michal Yarom, Yoav Wald, Gal Elidan, Avinatan Hassidim, William T. Freeman, Phillip Isola, Amir Globerson, Michal Irani, Inbar Mosseri: Explaining in Style: Training a GAN to explain a classifier in StyleSpace, ICCV 2021. arXiv:2104.13369v2 [cs.CV]