Modality-Invariant Image Classification (MIIC)

Seungryong Kim1
Rui Cai2
Kihong Park3
Sunok Kim3
Kwanghoon Sohn3
Korea University1, MSRA2, Yonsei University3

[TIP'17 paper]

Examples of similarity kernels for images under challenging modality variations. (a)-(d) Tower Bridge images taken in snow (a), (b) and at sunset (c), (d), with their corresponding deep CNN activation features (4096-d) [3]. (e)-(h) Stonehenge images taken in snow (e), (f) and at sunset (g), (h). With a conventional similarity kernel (e.g., the inner product), as in (i), images that share a modality are more similar to each other than images that share a semantic category, which limits the performance of conventional methods. In contrast, the proposed kernel is robust to photometric variations.

We present a unified framework for classifying image sets taken under varying modality conditions. Our approach is motivated by a key observation: the distribution of image features is influenced simultaneously by the semantic-class label and the modality label, which limits the performance of conventional methods on this task. With this insight, we introduce modality uniqueness, a discriminative weight that separates each modality cluster from all other clusters. Leveraging modality uniqueness, we formulate our framework as unsupervised modality clustering followed by classifier learning based on a modality-invariant similarity kernel. Specifically, in the assignment step, each training image is assigned to the cluster that is most similar to it in terms of modality. In the update step, the modality uniqueness and the sparse dictionary are updated based on the current cluster hypothesis. These two steps are alternated iteratively. Based on the final clusters, a modality-invariant marginalized kernel is then computed, in which the similarities between the reconstructed features of each modality are aggregated across all clusters. Our framework enables reliable inference of the semantic-class category of an image, even across large photometric variations. Experimental results show that our method outperforms conventional methods on various benchmarks, including landmark identification under severely varying weather conditions, domain-adaptive image classification, and RGB-NIR image classification.
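The alternating structure described above can be illustrated with a small NumPy sketch. This is not the paper's implementation: it assumes a per-cluster PCA basis as a stand-in for the sparse dictionary, uses reconstruction error as the modality similarity, omits the modality uniqueness weight entirely, and weights clusters with a softmax whose `temperature` is a hypothetical parameter. All function names are illustrative.

```python
import numpy as np

def update_step(X, labels, n_clusters, n_atoms, rng):
    """Refit one (mean, basis) model per cluster; PCA stands in for a sparse dictionary."""
    models = []
    for k in range(n_clusters):
        members = X[labels == k]
        if len(members) == 0:  # re-seed an empty cluster with one random sample
            members = X[rng.integers(len(X), size=1)]
        mean = members.mean(axis=0)
        _, _, vt = np.linalg.svd(members - mean, full_matrices=False)
        models.append((mean, vt[:n_atoms]))  # rows of the basis are orthonormal
    return models

def reconstruction_errors(X, models):
    """Squared reconstruction error of every sample under every cluster model."""
    errors = []
    for mean, basis in models:
        centered = X - mean
        recon = centered @ basis.T @ basis
        errors.append(((centered - recon) ** 2).sum(axis=1))
    return np.stack(errors, axis=1)  # shape (n_samples, n_clusters)

def fit_modality_clusters(X, n_clusters=2, n_atoms=2, n_iters=20, seed=0):
    """Alternate update and assignment steps until the labels stop changing."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(n_clusters, size=len(X))
    for _ in range(n_iters):
        models = update_step(X, labels, n_clusters, n_atoms, rng)
        # Assignment step: pick the cluster that reconstructs each sample best.
        new_labels = reconstruction_errors(X, models).argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, update_step(X, labels, n_clusters, n_atoms, rng)

def marginalized_kernel(X, Y, models, temperature=1.0):
    """Sum over clusters of soft-assignment-weighted inner products of cluster codes."""
    def weighted_codes(Z):
        err = reconstruction_errors(Z, models)
        w = np.exp(-(err - err.min(axis=1, keepdims=True)) / temperature)
        w /= w.sum(axis=1, keepdims=True)  # soft cluster membership per sample
        return [w[:, [k]] * ((Z - mean) @ basis.T)
                for k, (mean, basis) in enumerate(models)]
    return sum(fx @ fy.T for fx, fy in zip(weighted_codes(X), weighted_codes(Y)))
```

Under these assumptions, `marginalized_kernel(X, X, models)` is a sum of Gram matrices and is therefore symmetric positive semidefinite, so it could be passed to any kernel classifier that accepts a precomputed kernel.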