Retail commodity detection method based on location learnable visual center mechanism

LYU Xiaohua; WEI Mingchen; LIU Libo

doi:10.11959/j.issn.2096-3750.2023.00366


Theory and Technology | Updated: 2024-08-16

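- Retail commodity detection method based on location learnable visual center mechanism
- Chinese Journal on Internet of Things, 2023, Vol. 7, No. 4, pp. 142-152
- Author biographies:
  
  LYU Xiaohua (2000- ), male, master's student at the School of Information Engineering, Ningxia University. His main research interests are deep-learning-based fine-grained commodity detection and incremental learning.
  WEI Mingchen (1993- ), male, master's student at the School of Information Engineering, Ningxia University. His main research interest is deep-learning-based fine-grained commodity detection.
  LIU Libo (1974- ), female, Ph.D., professor and doctoral supervisor at Ningxia University. Her main research interests are intelligent information processing and computer vision.
- Funding:
  
  The National Natural Science Foundation of China (62262053); The Ningxia Science and Technology Innovation Leading Talent Plan (2022GKLRLX03)
- DOI: 10.11959/j.issn.2096-3750.2023.00366
  CLC number: TP18
- Print publication date: 2023-12-20
  
  Online publication date: 2023-12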
LYU Xiaohua, WEI Mingchen, LIU Libo. Retail commodity detection method based on location learnable visual center mechanism[J]. Chinese Journal on Internet of Things, 2023, 7(4): 142-152. DOI: 10.11959/j.issn.2096-3750.2023.00366.

Abstract

Packaging deformation and overlap make it difficult to capture salient and diverse feature information from retail products, which lowers detection accuracy. To address this problem, a location learnable visual center (LLVC) mechanism was designed and used to improve YOLOX-s, achieving higher detection accuracy. First, a lightweight multi-layer perceptron fuses information across feature channels to capture global context, helping the model better understand spatial information in product features. Second, the designed LLVC enhances local feature representation and uses spatial information to assign learnable weights to local features, increasing the attention paid to discriminative local features. Finally, the intersection over union (IoU) loss function is replaced with centered intersection over union (CIoU), and a power parameter α is introduced on this basis, which effectively reduces the missed detection rate. Experimental results show that the proposed method achieves an accuracy of 91.3% on the retail product checkout (RPC) dataset, which is 2.2% higher than YOLOX-s and better than current mainstream lightweight object detection algorithms. It runs at 97 frames per second (FPS) with a model size of 9.48 MB, so it can detect retail products accurately and in real time in scenarios where computing resources are limited.
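The abstract does not give the exact form of the loss, so the following is only a minimal sketch of what a CIoU loss with a power parameter α could look like. It assumes the general form used by the power-IoU family of losses, 1 − IoU^α + (ρ²/c²)^α + (βv)^α, where ρ is the distance between box centers, c is the diagonal of the smallest enclosing box, and v measures aspect-ratio mismatch. The function name `alpha_ciou_loss`, the box layout `(x1, y1, x2, y2)`, and the default `alpha=3.0` are illustrative assumptions and are not taken from the paper.

```python
import math


def alpha_ciou_loss(box, gt, alpha=3.0, eps=1e-9):
    """Sketch of a power-parameterized CIoU loss for two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
    alpha=3.0 is an assumed default, not a value stated in the abstract.
    """
    # Widths and heights of the predicted and ground-truth boxes.
    w, h = box[2] - box[0], box[3] - box[1]
    wg, hg = gt[2] - gt[0], gt[3] - gt[1]

    # Intersection over union.
    iw = max(0.0, min(box[2], gt[2]) - max(box[0], gt[0]))
    ih = max(0.0, min(box[3], gt[3]) - max(box[1], gt[1]))
    inter = iw * ih
    union = w * h + wg * hg - inter
    iou = inter / (union + eps)

    # Squared distance between the two box centers (rho^2).
    dx = (box[0] + box[2]) / 2 - (gt[0] + gt[2]) / 2
    dy = (box[1] + box[3]) / 2 - (gt[1] + gt[3]) / 2
    rho2 = dx * dx + dy * dy

    # Squared diagonal of the smallest box enclosing both (c^2).
    cw = max(box[2], gt[2]) - min(box[0], gt[0])
    ch = max(box[3], gt[3]) - min(box[1], gt[1])
    c2 = cw * cw + ch * ch + eps

    # Aspect-ratio consistency term v and its trade-off weight beta.
    v = (4 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(w / h)) ** 2
    beta = v / ((1 - iou) + v + eps)

    return 1 - iou ** alpha + (rho2 / c2) ** alpha + (beta * v) ** alpha


if __name__ == "__main__":
    # A perfectly matched box gives a loss near zero; a distant box exceeds 1.
    print(alpha_ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))
    print(alpha_ciou_loss((0, 0, 1, 1), (2, 2, 3, 3)))
```

With `alpha=1.0` this expression reduces to the ordinary CIoU loss, so the power parameter can be read as a single knob that reshapes the same three penalty terms.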

Keywords

retail commodity detection; YOLOX-s; central learning mechanism; loss function; lightweight

