Authors:
(1) Sanchit Sinha, University of Virginia (sanchit@virginia.edu);
(2) Guangzhi Xiong, University of Virginia (hhu4zu@virginia.edu);
(3) Aidong Zhang, University of Virginia (aidong@virginia.edu).
Table of Links
3 Methodology and 3.1 Representative Concept Extraction
3.2 Self-supervised Contrastive Concept Learning
3.3 Prototype-based Concept Grounding
3.4 End-to-end Composite Training
4 Experiments and 4.1 Datasets and Networks
4.3 Evaluation Metrics and 4.4 Generalization Results
4.5 Concept Fidelity and 4.6 Qualitative Visualization
2 Related Work
Related work on concept-level explanations. Recent research has focused on designing concept-based deep learning methods that interpret how deep learning models use high-level, human-understandable concepts in arriving at decisions [Ghorbani et al., 2019; Chen et al., 2019; Wu et al., 2020; Koh et al., 2020; Yeh et al., 2019; Mincu et al., 2021; Huang et al., 2022; Leemann et al., 2022; Sinha et al., 2021; Sinha et al., 2023]. Such concept-based models aim to incorporate high-level concepts into the learning procedure. Concept priors have been utilized to align model concepts with human-understandable concepts [Zhou et al., 2018; Murty et al., 2020; Chen et al., 2019], and bottleneck models have been generalized so that any prediction architecture can be transformed [Koh et al., 2020; Zaeem and Komeili, 2021] by integrating an intermediate layer that represents human-understandable concepts. Similar work on utilizing CBMs for various downstream tasks includes [Sawada, 2022b; Jeyakumar et al., 2021; Pittino et al., 2021; Bahadori and Heckerman, 2020].
Related work on self-supervised learning with images. Self-supervised learning [Xu et al., 2019; Saito et al., 2020] via pretext tasks has been demonstrated to learn high-quality, domain-invariant representations from images using a variety of transformations such as rotations [Xu et al., 2019; Gidaris et al., 2018]. Self-supervised learning in image space falls into two major paradigms. The first approach generates multiple 'views', or small transformations of the same image, which preserve the inherent semantics. The transformations are usually small enough not to cause a significant shift between the intended and actual features in the latent space, and models are trained using a form of contrastive loss [Wang and Liu, 2021]. The second paradigm views self-supervised feature learning as a puzzle-solving problem [Xu et al., 2019].
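The multi-view contrastive paradigm above can be illustrated with a minimal sketch: an anchor embedding and a lightly perturbed "view" form a positive pair, and the loss pulls them together while pushing away unrelated embeddings. The simplified NT-Xent-style loss below is illustrative only; all names are ours, and real systems (e.g., SimCLR-style pipelines) operate on learned encoder outputs over large batches.

```python
# Hypothetical sketch of a contrastive (NT-Xent-style) loss over embeddings.
import numpy as np

def contrastive_loss(z1, z2, negatives, tau=0.5):
    """Loss for one positive pair (z1, z2) against a list of negatives."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    pos = np.exp(cos(z1, z2) / tau)                          # positive-pair similarity
    neg = sum(np.exp(cos(z1, n) / tau) for n in negatives)   # negatives' similarities
    return -np.log(pos / (pos + neg))

rng = np.random.default_rng(0)
anchor = rng.normal(size=8)
view = anchor + 0.05 * rng.normal(size=8)       # small transform: same semantics
others = [rng.normal(size=8) for _ in range(4)]  # unrelated samples

loss_aligned = contrastive_loss(anchor, view, others)
loss_mismatched = contrastive_loss(anchor, others[0], others[1:] + [view])
```

Because the perturbed view stays close to the anchor in embedding space, `loss_aligned` is lower than `loss_mismatched`, which is exactly the gradient signal that makes representations invariant to the chosen transformations.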
Related work on automatic interpretable concept learning. Supervised concept learning requires the concepts of each training sample to be manually annotated, which is infeasible for even moderately large datasets and restricts the concepts to those humans can conceptualize. To alleviate these bottlenecks, automatic concept learning is becoming increasingly appealing. One dominant architecture is the Self-Explaining Neural Network (SENN) proposed in [Alvarez-Melis and Jaakkola, 2018]. Several other popular methods that automatically learn concepts are detailed in [Kim et al., 2018; Ghorbani et al., 2019; Yeh et al., 2019; Wu et al., 2020; Goyal et al., 2019].
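The SENN family mentioned above makes predictions that decompose additively over learned concepts: a concept encoder h(x) produces concept activations, a relevance network theta(x) scores each concept, and the output is their inner product, so each concept's contribution to the prediction is directly readable. The sketch below uses tiny random linear maps as stand-ins for the learned networks; it shows only the prediction form, not the paper's actual architecture or training.

```python
# Illustrative sketch of the self-explaining prediction form
# f(x) = sum_i theta_i(x) * h_i(x); the linear maps are placeholders.
import numpy as np

rng = np.random.default_rng(1)
W_h = rng.normal(size=(5, 10))      # stand-in concept encoder h: R^10 -> 5 concepts
W_theta = rng.normal(size=(5, 10))  # stand-in relevance network theta: R^10 -> 5 scores

def senn_predict(x):
    h = np.tanh(W_h @ x)       # concept activations
    theta = W_theta @ x        # per-concept relevances (the "explanation")
    return float(theta @ h), h, theta

x = rng.normal(size=10)
y, h, theta = senn_predict(x)
contributions = theta * h      # per-concept contributions sum exactly to y
```

The explanation is structural rather than post hoc: since `contributions.sum()` equals the prediction, attributing the output to individual concepts requires no separate attribution method.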
Comparison with existing work. Our work addresses a challenge faced by existing approaches: concepts learned by self-explaining models may not generalize well across domains, as the learned concepts are mixed with domain-dependent noise and are less robust to light transformations due to a lack of supervision and regularization. Our proposed approach tackles this largely unsolved problem by designing a novel representative concept extraction framework and regularizing it using self-supervised contrastive concept learning and prototype-based grounding.
Concurrent to our work, BotCL [Wang, 2023] also utilizes self-supervised learning to learn interpretable concepts. However, our approach differs significantly in both training and evaluation. We utilize multiple SOTA transformations to learn distinct concepts, while BotCL uses only a crude regularization that maximizes the similarity between samples from the same class during concept learning. Our evaluation framework is also significantly more extensive, comprising concept interoperability assessed via performance across domains, while BotCL reports only task accuracy. Another work related to ours [Sawada, 2022b] incorporates multiple unsupervised concepts in the bottleneck layer of CBMs in addition to supervised concepts; this differs from our approach, in which all concepts are learned in a self-supervised manner, without supervision. Another concurrent work [Sawada, 2022a] utilizes a modified autoencoder setup with a discriminator instead of a decoder and weak supervision from an object-detection network (Faster R-CNN), which is specific to autonomous driving datasets and does not generalize.
This paper is available on arxiv under CC BY 4.0 DEED license.