Zhejiang Gongshang University, Hangzhou 310018, Zhejiang, China
[ "张子天(1988− ),男,博士,浙江工商大学信息与电子工程学院副研究员,主要研究方向为基于机器学习的网络流量预测与资源管理。" ]
[ "葛天豪(2000− ),男,浙江工商大学信息与电子工程学院硕士生,主要研究方向为空天地网络。" ]
[ "诸葛斌(1976− ),男,博士,浙江工商大学信息与电子工程学院教授,主要研究方向为网络和通信技术、互联网技术和网络安全。" ]
[ "郑运强(1998- ),男,浙江工商大学信息与电子工程学院硕士生,主要研究方向为基于强化学习的网络资源管理。" ]
[ "董黎刚(1972− ),男,博士,浙江工商大学信息与电子工程学院教授,主要研究方向为智能网络、在线教育。" ]
[ "蒋献(1988− ),男,浙江工商大学信息与电子工程学院讲师、实验员,主要研究方向为在线教育。" ]
Received: 2025-08-26; Revised: 2025-09-28; Accepted: 2025-10-20
ZHANG Zitian, GE Tianhao, ZHUGE Bin, et al. UAV energy-oriented trajectory planning and IoT service quality optimization based on diffusion reinforcement learning[J/OL]. Chinese Journal on Internet of Things, 2026.
To address the problem of efficiently scheduling the heterogeneous computing tasks generated by Internet of Things (IoT) devices, this paper proposes an unmanned aerial vehicle (UAV) communication network and task offloading system based on diffusion reinforcement learning (DiffRL). The main innovations include: (1) a DiffRL offloading decision framework that uses denoising diffusion implicit model (DDIM) sampling to reduce the number of sampling steps from 50 to 15, accelerating the sampling process by 70% while retaining 98% of decision quality; (2) an energy-oriented UAV trajectory planning algorithm that reduces total system energy consumption by 15.3%; (3) tight coupling between DiffRL decision-making and trajectory planning to solve multi-objective optimization problems in dynamic environments. Experiments show that the system improves on traditional methods in both energy consumption and task delay, reducing task delay in offloading decisions by 30.2% and 9.2% compared with the deep Q-network (DQN) and deep deterministic policy gradient (DDPG) baselines, respectively.
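The 50-to-15 step reduction claimed above follows the standard DDIM formulation (Song et al.): sampling runs over an evenly strided subset of the training timesteps with a deterministic (eta = 0) update. The following NumPy sketch illustrates that mechanism only; `eps_model` stands in for the paper's trained noise-prediction policy network and is a placeholder, not the authors' implementation.

```python
import numpy as np

def ddim_timesteps(train_steps: int, sample_steps: int) -> np.ndarray:
    """Evenly strided subset of the training timesteps, in descending order."""
    return np.linspace(0, train_steps - 1, sample_steps).round().astype(int)[::-1]

def ddim_sample(eps_model, x, alphas_cumprod, sample_steps=15):
    """Deterministic DDIM sampling (eta = 0) over a reduced schedule.

    eps_model(x, t) predicts the noise in x at timestep t (here a placeholder
    for a trained network); alphas_cumprod is the training noise schedule.
    """
    ts = ddim_timesteps(len(alphas_cumprod), sample_steps)
    for t, t_prev in zip(ts[:-1], ts[1:]):
        a_t, a_prev = alphas_cumprod[t], alphas_cumprod[t_prev]
        eps = eps_model(x, t)                                    # predicted noise
        x0 = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)       # implied clean sample
        x = np.sqrt(a_prev) * x0 + np.sqrt(1.0 - a_prev) * eps  # jump directly to t_prev
    return x
```

Because each update jumps straight from timestep `t` to `t_prev` without the stochastic term, 15 strided steps traverse the same noise schedule that ancestral sampling would cover in 50, which is where the roughly 70% sampling speedup comes from.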