Multispectral Pedestrian Detection via Simultaneous Detection and Segmentation
Chengyang Li
Dan Song
Ruofeng Tong
Min Tang
BMVC 2018 | arXiv
Figure 1: Overview of the proposed MSDS-RCNN. The network architecture consists of a multispectral proposal network (MPN) to generate pedestrian proposals, and a subsequent multispectral classification network (MCN) to distinguish pedestrian instances from hard negatives. The unified network is learned by jointly optimizing pedestrian detection and semantic segmentation tasks. The final detections are obtained by integrating the outputs from different modalities as well as the two stages.
Abstract
Multispectral pedestrian detection has attracted increasing attention from the research community due to its crucial competence for many around-the-clock applications (e.g., video surveillance and autonomous driving), especially under insufficient illumination conditions. We create a human baseline over the KAIST dataset and reveal that there is still a large gap between current top detectors and human performance. To narrow this gap, we propose a network fusion architecture, which consists of a multispectral proposal network to generate pedestrian proposals, and a subsequent multispectral classification network to distinguish pedestrian instances from hard negatives. The unified network is learned by jointly optimizing pedestrian detection and semantic segmentation tasks. The final detections are obtained by integrating the outputs from different modalities as well as the two stages. The approach significantly outperforms state-of-the-art methods on the KAIST dataset while remaining fast. Additionally, we contribute a sanitized version of training annotations for the KAIST dataset, and examine the effects caused by different kinds of annotation errors. Future research on this problem will benefit from the sanitized version, which eliminates the interference of annotation errors.
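The integration step mentioned in the abstract can be pictured with a small sketch. This is only an illustration under a stated assumption: it combines per-proposal confidences by a plain unweighted average, which is not confirmed here as the rule used in MSDS-RCNN. The function name `fuse_scores` and the output names `mpn_fused`, `mcn_color` and `mcn_thermal` are hypothetical, and the scores are made-up values.

```python
from statistics import mean


def fuse_scores(outputs):
    """Average per-proposal confidences over several network outputs.

    `outputs` maps a (hypothetical) output name to a list of confidence
    scores, one per proposal. All lists must have the same length.
    Unweighted averaging is an assumption made for illustration only.
    """
    if not outputs:
        raise ValueError("at least one output is required")
    lengths = {len(scores) for scores in outputs.values()}
    if len(lengths) != 1:
        raise ValueError("all outputs must score the same proposals")
    n_proposals = lengths.pop()
    # For each proposal, average its score across every output.
    return [
        mean(scores[i] for scores in outputs.values())
        for i in range(n_proposals)
    ]


# Made-up scores for two proposals from three hypothetical outputs.
example = {
    "mpn_fused": [0.9, 0.2],
    "mcn_color": [0.8, 0.4],
    "mcn_thermal": [0.7, 0.0],
}
fused = fuse_scores(example)
```

A weighted or learned combination would fit the same interface; the average is used here only because it is the simplest choice to read.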
Detection Performance
We compare our MSDS-RCNN with existing methods, including ACF+T+THOG [1], Halfway Fusion [2], Fusion RPN [3], Fusion RPN+BF [3], IAF R-CNN [4] and IATDNN+IASS [5]. Fig. 2 illustrates the ROC curves.
Figure 2: Comparison of detection results reported on the test set of the KAIST dataset [1], in terms of Reasonable-all. Our method surpasses existing state-of-the-art methods by a large margin (26% relative). About one third of the error is attributed to annotation noise, which can be further eliminated using our sanitized training annotations.
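A "relative" margin such as the one quoted in the caption of Fig. 2 is usually read as the drop in error divided by the baseline's error. The sketch below shows that arithmetic for a lower-is-better error measure. The function name `relative_reduction` is hypothetical, and the numbers in the example are placeholders chosen for easy arithmetic, not results from the paper.

```python
def relative_reduction(baseline_error, new_error):
    """Return the fractional reduction of an error measure.

    Both arguments are errors where lower is better, in the same unit
    (e.g. percent). The result is (baseline - new) / baseline.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Placeholder values, NOT taken from the paper: an error that falls
# from 20.0 to 15.0 is a 5.0-point absolute drop and a 25% relative one.
example_gain = relative_reduction(20.0, 15.0)
```

Reporting both the absolute and the relative figure avoids ambiguity when comparing detectors whose baseline errors differ.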
Downloads
Human Baseline: [Google Drive] [OneDrive]
Detection Results: [Google Drive] [OneDrive]
Sanitized Training Annotations: [Google Drive] [OneDrive]
KAIST Multispectral Pedestrian Dataset: Link to KAIST dataset
Improved Testing Annotations provided by Liu et al.: Link to download (Since the original annotations of the test set contain many problematic bounding boxes, we highly recommend reporting results using the improved testing annotations instead of the original ones to enable a reliable comparison.)
Code
Link to project on GitHub: (https://github.com/Li-Chengyang/MSDS-RCNN)
Citation
@InProceedings{li_2018_BMVC,
author = {Li, Chengyang and Song, Dan and Tong, Ruofeng and Tang, Min},
title = {Multispectral Pedestrian Detection via Simultaneous Detection and Segmentation},
booktitle = {British Machine Vision Conference (BMVC)},
month = {September},
year = {2018},
}
References
[1] Soonmin Hwang, Jaesik Park, Namil Kim, Yukyung Choi, and In So Kweon. Multispectral pedestrian detection: Benchmark dataset and baseline. In CVPR, pages 1037–1045, 2015.
[2] Jingjing Liu, Shaoting Zhang, Shu Wang, and Dimitris Metaxas. Multispectral deep neural networks for pedestrian detection. In BMVC, pages 73.1–73.13, 2016.
[3] Daniel König, Michael Adam, Christian Jarvers, Georg Layher, Heiko Neumann, and Michael Teutsch. Fully convolutional region proposal networks for multispectral person detection. In CVPRW, pages 243–250, 2017.
[4] Chengyang Li, Dan Song, Ruofeng Tong, and Min Tang. Illumination-aware Faster R-CNN for robust multispectral pedestrian detection. arXiv preprint arXiv:1803.05347, 2018.
[5] Dayan Guan, Yanpeng Cao, Jun Liang, Yanlong Cao, and Michael Ying Yang. Fusion of multispectral data through illumination-aware deep neural networks for pedestrian detection. arXiv preprint arXiv:1802.09972, 2018.