Zhejiang Gongshang University, Hangzhou 310018, Zhejiang, China
[ "张子天(1988− ),男,博士,浙江工商大学信息与电子工程学院副研究员,主要研究方向为基于机器学习的网络流量预测与资源管理。" ]
[ "葛天豪(2000− ),男,浙江工商大学信息与电子工程学院硕士生,主要研究方向为空天地网络。" ]
[ "诸葛斌(1976− ),男,博士,浙江工商大学信息与电子工程学院教授,主要研究方向为网络和通信技术、互联网技术和网络安全。" ]
[ "郑运强(1998- ),男,浙江工商大学信息与电子工程学院硕士生,主要研究方向为基于强化学习的网络资源管理。" ]
[ "董黎刚(1972− ),男,博士,浙江工商大学信息与电子工程学院教授,主要研究方向为智能网络、在线教育。" ]
[ "蒋献(1988− ),男,浙江工商大学信息与电子工程学院讲师、实验员,主要研究方向为在线教育。" ]
Received: 2025-08-26; Revised: 2025-09-28; Accepted: 2025-10-20
ZHANG Zitian, GE Tianhao, ZHUGE Bin, et al. UAV energy-oriented trajectory planning and IoT service quality optimization based on diffusion reinforcement learning[J/OL]. Chinese Journal on Internet of Things, 2026.
To address the problem of efficiently scheduling the heterogeneous computing tasks generated by Internet of Things (IoT) devices, this paper proposes an unmanned aerial vehicle (UAV) communication network and task offloading system based on diffusion reinforcement learning (DiffRL). The main innovations include: (1) a DiffRL offloading decision framework that uses denoising diffusion implicit model (DDIM) sampling to reduce the number of sampling steps from 50 to 15, accelerating the sampling process by 70% while retaining 98% of decision quality; (2) an energy-oriented UAV trajectory planning algorithm that reduces total system energy consumption by 15.3%; (3) tight coupling between DiffRL decision-making and trajectory planning to solve multi-objective optimization problems in dynamic environments. Experiments show that the system improves on traditional methods in both energy consumption and task delay, reducing task delay in offloading decisions by 30.2% and 9.2% compared with the deep Q-network (DQN) and deep deterministic policy gradient (DDPG) baselines, respectively.
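The 50-to-15 step reduction claimed above follows the standard DDIM formulation (Song et al.): sampling runs over an evenly strided subset of the training timesteps with a deterministic (eta = 0) update. The following NumPy sketch illustrates that mechanism only; `eps_model` stands in for the paper's trained noise-prediction policy network and is a placeholder, not the authors' implementation.

```python
import numpy as np

def ddim_timesteps(train_steps: int, sample_steps: int) -> np.ndarray:
    """Evenly strided subset of the training timesteps, in descending order."""
    return np.linspace(0, train_steps - 1, sample_steps).round().astype(int)[::-1]

def ddim_sample(eps_model, x, alphas_cumprod, sample_steps=15):
    """Deterministic DDIM sampling (eta = 0) over a reduced schedule.

    eps_model(x, t) predicts the noise in x at timestep t (here a placeholder
    for a trained network); alphas_cumprod is the training noise schedule.
    """
    ts = ddim_timesteps(len(alphas_cumprod), sample_steps)
    for t, t_prev in zip(ts[:-1], ts[1:]):
        a_t, a_prev = alphas_cumprod[t], alphas_cumprod[t_prev]
        eps = eps_model(x, t)                                    # predicted noise
        x0 = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)       # implied clean sample
        x = np.sqrt(a_prev) * x0 + np.sqrt(1.0 - a_prev) * eps  # jump directly to t_prev
    return x
```

Because each update jumps straight from timestep `t` to `t_prev` without the stochastic term, 15 strided steps traverse the same noise schedule that ancestral sampling would cover in 50, which is where the roughly 70% sampling speedup comes from.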