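1. School of Information Science and Engineering, Yunnan University, Kunming, Yunnan 650504, China
2. Key Laboratory of Internet of Things Technology and Application of Yunnan Provincial Universities, Kunming, Yunnan 650504, China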
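DENG Xin (born 1999), male, is a master's student at the School of Information Science and Engineering, Yunnan University. His main research interests are computer vision and action recognition.
GU Jinjing (born 1990), female, Ph.D., is a lecturer and master's supervisor at the School of Information Science and Engineering, Yunnan University. Her main research interests are cross-media semantic analysis and understanding, abnormal behavior recognition in video, and video captioning.
ZHAO Zhengpeng (born 1972), male, is an associate professor and master's supervisor at the School of Information Science and Engineering, Yunnan University. His main research interests are signal and information processing, and computer systems and applications.
PU Yuanyuan (born 1972), female, Ph.D., is a professor at the School of Information Science and Engineering, Yunnan University. Her main research interests are image style transfer, multimodal sentiment analysis, and visual media computing.
XU Dan (born 1968), female, Ph.D., is a professor at the School of Information Science and Engineering, Yunnan University. Her main research interests are graphics rendering, image fusion, virtual reality, and visual computing and cognition.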
Received: 2024-10-15
Revised: 2024-11-29
Published in print: 2025-12-10
DENG Xin, GU Jinjing, ZHAO Zhengpeng, et al. Escalator passenger dangerous behavior recognition based on human-object interaction graph convolutional network[J]. Chinese Journal on Internet of Things, 2025, 09(04): 184-193. DOI: 10.11959/j.issn.2096-3750.2025.00470.
Improper behavior by escalator passengers can easily lead to public safety accidents and property losses, so accurately recognizing dangerous passenger behavior from surveillance video is of great significance for ensuring public safety. However, existing behavior recognition methods rarely address dangerous passenger behavior in escalator scenes, and they lack modeling and analysis of the spatio-temporal interaction between people and escalators. Therefore, spatio-temporal information from the human skeleton and from human-object interactions was extracted, and a two-stream human-object interaction graph convolutional network based on distance metrics was designed to recognize dangerous behaviors of escalator passengers. Firstly, human skeleton features and escalator keypoint features were extracted separately, and the escalator keypoints were used to supplement the human skeleton features with scene information. Secondly, the distance between the human and the escalator was used to measure the dynamic changes in the human-object relationship during dangerous behaviors, strengthening the model's modeling of spatio-temporal interaction information. Finally, to fill the gap left by existing public datasets, which contain no videos of dangerous behaviors on escalators, a video dataset named ESC-Danger was constructed. The dataset contains eight classes of dangerous escalator passenger behavior: lean, climb, crouch, reach out, poke head out, retention (lingering on the escalator), retrograde (moving against the direction of travel), and run. The recognition accuracy of the proposed model on the ESC-Danger dataset is 95.06%; compared with other state-of-the-art algorithms, it shows high recognition accuracy and good generalization performance.
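The distance-based human-escalator cue described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the keypoint layout, the exact distance definition, or any normalization, so everything below is an assumption for illustration only: 2D pixel coordinates, Euclidean distance from each human joint to its nearest escalator keypoint, and the hypothetical helper names `joint_to_escalator_distances`, `distance_sequence`, and `distance_change`. It is not the authors' implementation.

```python
import math

def joint_to_escalator_distances(joints, escalator_pts):
    """For one frame, return the Euclidean distance from each human joint
    to its nearest escalator keypoint. Points are (x, y) tuples.
    Assumption: nearest-keypoint distance; the paper may define it differently."""
    return [min(math.dist(j, e) for e in escalator_pts) for j in joints]

def distance_sequence(skeleton_frames, escalator_pts):
    """Apply the per-frame distance computation to every frame of a clip.
    Assumption: the escalator keypoints are static across the clip."""
    return [joint_to_escalator_distances(f, escalator_pts) for f in skeleton_frames]

def distance_change(dist_seq):
    """Frame-to-frame difference of the distances, a simple proxy for the
    dynamic change in the human-object relationship."""
    return [
        [cur - prev for prev, cur in zip(a, b)]
        for a, b in zip(dist_seq, dist_seq[1:])
    ]

# Toy clip: two escalator keypoints, two joints, two frames (made-up values).
escalator = [(0.0, 0.0), (10.0, 0.0)]
frames = [
    [(0.0, 3.0), (10.0, 4.0)],  # frame 0
    [(0.0, 1.0), (7.0, 4.0)],   # frame 1
]
dists = distance_sequence(frames, escalator)
deltas = distance_change(dists)
print(dists)
print(deltas)
```

In this toy clip, a negative entry in `deltas` means a joint moved toward the nearest escalator keypoint between frames, and a positive entry means it moved away. A real pipeline would probably normalize distances by person scale and feed them to the network alongside the skeleton coordinates, but those details are not stated in the abstract.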