D-SELACFP: Visible-Infrared Person Re-Identification via Local Attention and Cross-Modal Feature Perception Enhancement
DOI: https://doi.org/10.53469/wjimt.2025.08(07).09

Keywords: Visible-Infrared Person Re-Identification, Local Attention Module, Modal Noise, Feature Perception Enhancement

Abstract
Visible-Infrared Person Re-Identification (VI-ReID) faces significant challenges, including large modality discrepancies and pose variations, which make cross-modality feature matching difficult and lower matching accuracy. Existing approaches typically extract features directly from raw images and embed the features of both modalities into a common space to learn shared representations. However, they often neglect noise interference and fail to fully exploit the identity-discriminative information in modality-specific features, which weakens cross-modality invariant feature extraction and allows irrelevant noise to corrupt feature matching. To address modal noise interference and the difficulty of feature matching in VI-ReID, this paper proposes a dual-stream neural network framework designed to reduce modal noise and enhance feature perception. The framework incorporates a Local Attention Module (LAM) and an Inter-Modal Feature Perception Enhancement module (I-MFPE), which work synergistically to mine salient features both within and across modalities while exploiting the identity-discriminative information inherent in modality-specific features. Specifically, the LAM attends to discriminative part-level features and suppresses background noise, extracting both modality-shared and modality-specific features while reducing the impact of irrelevant information. The I-MFPE module strengthens the extraction of shareable features from heterogeneous images by refining fine-grained feature representations along both the channel and spatial dimensions, thereby mitigating the influence of modality differences on matching. The proposed method alleviates noise introduced by viewpoint variations and background clutter, enhances the discriminative power of cross-modality pedestrian features, and yields a more robust feature representation for VI-ReID.
Experimental results demonstrate the advantages of the proposed method in suppressing noise interference and improving feature matching performance.
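The channel-and-spatial refinement attributed to I-MFPE can be illustrated with a parameter-free sketch. The paper's actual module is presumably learned; the gating functions, array shapes, and the `refine` helper below are illustrative assumptions in the spirit of CBAM-style attention, not the authors' implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    """Gate each channel of a (C, H, W) feature map by a scalar weight."""
    avg = feat.mean(axis=(1, 2))           # (C,) global average pool
    mx = feat.max(axis=(1, 2))             # (C,) global max pool
    weights = sigmoid(avg + mx)            # (C,) per-channel gate in (0, 1)
    return feat * weights[:, None, None]

def spatial_attention(feat):
    """Gate each spatial location of a (C, H, W) feature map."""
    avg = feat.mean(axis=0)                # (H, W) average over channels
    mx = feat.max(axis=0)                  # (H, W) max over channels
    weights = sigmoid(avg + mx)            # (H, W) per-location gate in (0, 1)
    return feat * weights[None, :, :]

def refine(feat):
    """Sequential channel-then-spatial refinement of one modality's features."""
    return spatial_attention(channel_attention(feat))

feat = np.random.rand(8, 4, 4)             # toy (C, H, W) feature map
out = refine(feat)
assert out.shape == feat.shape
```

Applying the channel gate before the spatial gate follows the common CBAM ordering; because both sigmoid gates lie in (0, 1), each stage can only attenuate features, which is one simple way to suppress background responses while preserving relative saliency.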