LAF-Net: Locally Adaptive Fusion Networks for Stereo Confidence Estimation

Sunok Kim¹

Seungryong Kim²

Dongbo Min³

Kwanghoon Sohn¹

Yonsei University¹, Korea University², Ewha Womans University³

[CVPR'19 paper]

[CVPR'19 slide]

[Code]

The network configuration of LAF-Net, which consists of four sub-networks: feature extraction networks, attention inference networks, scale inference networks, and recursive refinement networks. Given the matching cost, disparity, and color image as input, our networks output the confidence of the disparity.

We present a novel method that estimates the confidence map of an initial disparity by making full use of tri-modal input, including matching cost, disparity, and color image, through deep networks. The proposed network, termed Locally Adaptive Fusion Networks (LAF-Net), learns locally-varying attention and scale maps to fuse the tri-modal confidence features. The attention inference networks encode the importance of the tri-modal confidence features and then concatenate them using the attention maps in an adaptive and dynamic fashion. This enables us to make an optimal fusion of the heterogeneous features, compared to the simple concatenation technique that is commonly used in conventional approaches. In addition, to encode the confidence features with locally-varying receptive fields, the scale inference networks learn the scale map and warp the fused confidence features through convolutional spatial transformer networks. Finally, the confidence map is progressively estimated in the recursive refinement networks to enforce spatial context and local consistency. Experimental results show that this model outperforms the state-of-the-art methods on various benchmarks.
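The attention-based fusion described above can be illustrated for a single pixel. The following is a minimal sketch, not the released implementation: the function names `softmax` and `fuse_pixel`, the toy feature values, and the choice of a softmax to normalize the three attention scores are all assumptions made for illustration. It only shows the idea of weighting each modality's features by a locally-varying attention value before concatenating them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_pixel(cost_feat, disp_feat, color_feat, attention_scores):
    """Weight each modality's feature vector by its attention, then concatenate.

    attention_scores holds three raw scores (cost, disparity, color) for one
    pixel. Normalizing them with a softmax is an assumption of this sketch.
    """
    weights = softmax(attention_scores)
    fused = []
    for weight, feat in zip(weights, (cost_feat, disp_feat, color_feat)):
        fused.extend(weight * value for value in feat)
    return weights, fused


# Hypothetical two-channel features for one pixel; the matching-cost modality
# receives the highest raw attention score here.
weights, fused = fuse_pixel([1.0, 2.0], [0.5, 0.5], [3.0, 1.0], [2.0, 0.0, 0.0])
print(weights)
print(fused)
```

With equal attention scores this reduces to a uniformly scaled plain concatenation, which is the baseline the abstract contrasts against; the adaptive part is that the scores differ from pixel to pixel.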

Paper

Sunok Kim, Seungryong Kim, Dongbo Min, Kwanghoon Sohn

LAF-Net: Locally Adaptive Fusion Networks for Stereo Confidence Estimation

CVPR, 2019 (Oral)

[pdf] [bibtex]

Video

Results

Confidence maps on the MID 2006 dataset [34] (first two rows) and the MID 2014 dataset [33] (last two rows) using census-SGM and MC-CNN: (a) color images, (b) initial disparity maps, (c)-(f) confidence maps estimated by (c) Kim et al. [21], (d) LFN [7], (e) LGC-Net [40], and (f) LAF-Net, and (g) ground-truth confidence maps.
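For readers unfamiliar with the last panel, a ground-truth confidence map is commonly derived by comparing the initial disparity with the reference disparity. The sketch below assumes that convention: a pixel is marked confident when its absolute disparity error is within a threshold. The function name `ground_truth_confidence`, the one-pixel default threshold, and the use of `None` for pixels without reference disparity are illustrative assumptions, not details taken from this page.

```python
def ground_truth_confidence(estimated, reference, threshold=1.0):
    """Label each pixel 1 (confident) or 0 (not confident).

    A pixel is confident when |estimated - reference| <= threshold.
    Pixels whose reference disparity is None are left unlabeled (None).
    The threshold value is an assumption of this sketch.
    """
    rows = []
    for est_row, ref_row in zip(estimated, reference):
        row = []
        for est, ref in zip(est_row, ref_row):
            if ref is None:
                row.append(None)
            else:
                row.append(1 if abs(est - ref) <= threshold else 0)
        rows.append(row)
    return rows


# Toy 2x2 disparity maps (hypothetical values).
estimated = [[10.0, 12.5], [7.0, 3.0]]
reference = [[10.4, 15.0], [None, 3.0]]
confidence = ground_truth_confidence(estimated, reference)
print(confidence)  # [[1, 0], [None, 1]]
```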

Acknowledgements

This research was supported by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science and ICT (NRF-2017M3C4A7069370). This webpage template was borrowed from the project pages of colorization and hmr.