1. Department of Automation, Tsinghua University, Beijing 100084, China
2. Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
3. Taier Terminal Laboratory, China Academy of Information and Communications Technology, Beijing 100191, China
4. Institute for Network Sciences and Cyberspace, Tsinghua University, Beijing 100084, China
[ "牟治宇(1997- ),男,河北石家庄人,清华大学硕士生,主要研究方向为基于深度强化学习的无人机路径规划" ]
[ "张煜(1993- ),女,河南郑州人,清华大学博士生,主要研究方向为物联网通信理论、基于强化学习的无人机路径规划" ]
[ "范典(1992- ),男,山东菏泽人,中国信息通信研究院泰尔终端实验室战略规划与研究部工程师,主要研究方向为毫米波大规模多天线通信理论、阵列信号处理和无人机通信理论" ]
[ "刘君(1982- ),女,山东济南人,博士,清华大学助理研究员,主要研究方向为天地一体化网络、无人机组网" ]
[ "高飞飞(1980- ),男,陕西西安人,博士,清华大学副教授、博士生导师,主要研究方向为多天线通信和智能信号处理技术" ]
Print publication date: 2020-09-30
Online publication date: 2020-09
MOU Z Y, ZHANG Y, FAN D, et al. Research on the UAV-aided data collection and trajectory design based on the deep reinforcement learning[J]. Chinese Journal on Internet of Things, 2020, 4(3): 42-51. DOI: 10.11959/j.issn.2096-3750.2020.00177.
The Internet of things (IoT) era requires massive node coverage and connectivity, yet in remote areas IoT communication technology often cannot collect data in a timely manner. Thanks to their flexibility and mobility, unmanned aerial vehicles (UAV) can be used for data collection in IoT wireless sensor networks. The proposed scheme focuses on the trajectory design problem of UAV-aided sensor-network data collection while also meeting the UAV's recharging demand, which arises from its limited battery capacity. Specifically, drawing on the idea of hierarchical reinforcement learning with temporal abstraction and building on a discrete-action deep reinforcement learning architecture, a novel option-DQN (option-deep Q-learning) algorithm is proposed that achieves efficient UAV data collection and trajectory design, and that controls the UAV to recharge in time so as to guarantee normal flight. Simulation results show that, compared with the conventional DQN (deep Q-learning) algorithm, the episode reward of the proposed algorithm rises faster during training and converges to a higher level, and the UAV's trajectories during task execution are clearer and more reasonable; the proposed algorithm can decide when the UAV should recharge, keeping its battery level sufficient at all times.
UAV; trajectory design; data collection; charging
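The abstract describes the approach only at a high level. As a rough illustration of the temporal-abstraction idea it builds on (options as high-level actions such as "fly to sensor k" or "fly to the charger", valued with an SMDP-style Q-learning update), the sketch below uses a toy grid world. Every concrete value here (grid size, sensor and charger positions, rewards, battery budget) is a hypothetical assumption, and the intra-option policy is a fixed greedy flight rule rather than the learned low-level controller of the paper's option-DQN.

```python
# A minimal, self-contained sketch of option-based (temporally abstract)
# Q-learning for a UAV data-collection task. All numbers and positions are
# hypothetical illustrations, not the paper's actual settings.
import numpy as np

rng = np.random.default_rng(0)

GRID = 5                        # hypothetical 5x5 service area
SENSORS = [(0, 4), (4, 4)]      # assumed sensor node positions
CHARGER = (0, 0)                # assumed charging station position
OPTIONS = SENSORS + [CHARGER]   # each option: "fly to this target"
FULL_BATTERY = 12               # battery budget in grid steps (assumed)

def run_option(pos, target, battery):
    """Execute one option: fly greedily toward `target` one cell at a time.
    Returns (new_pos, new_battery, elapsed_steps). Here the intra-option
    policy is fixed; the paper learns it with a low-level network instead."""
    steps = 0
    while pos != target and battery > 0:
        dx = int(np.sign(target[0] - pos[0]))
        dy = int(np.sign(target[1] - pos[1]))
        pos = (pos[0] + dx, pos[1]) if dx != 0 else (pos[0], pos[1] + dy)
        battery -= 1
        steps += 1
    return pos, battery, steps

def state_id(pos, battery):
    # State: UAV cell plus a coarse "battery low" flag.
    low = int(battery < FULL_BATTERY // 3)
    return (pos[0] * GRID + pos[1]) * 2 + low

Q = np.zeros((GRID * GRID * 2, len(OPTIONS)))   # option-value table
alpha, gamma, eps = 0.1, 0.95, 0.1

for episode in range(2000):
    pos, battery = (2, 2), FULL_BATTERY
    visited = set()                              # sensors already drained
    for _ in range(10):                          # option-level decisions
        s = state_id(pos, battery)
        o = rng.integers(len(OPTIONS)) if rng.random() < eps else int(np.argmax(Q[s]))
        pos, battery, k = run_option(pos, OPTIONS[o], battery)
        if battery <= 0 and pos != OPTIONS[o]:
            r = -5.0                             # ran out of power mid-flight
        elif pos == CHARGER:
            r, battery = 1.0, FULL_BATTERY       # recharge resets the budget
        elif pos in SENSORS and pos not in visited:
            r = 10.0                             # fresh data collected
            visited.add(pos)
        else:
            r = 0.0                              # revisited a drained sensor
        s2 = state_id(pos, battery)
        # SMDP-style update: discount by gamma**k, the option's duration.
        Q[s, o] += alpha * (r + gamma ** k * Q[s2].max() - Q[s, o])
        if battery <= 0:
            break
```

In a learned version of this sketch, both the option-value function and the intra-option controllers would be deep networks; that is where the proposed option-DQN differs from plain DQN, in that the meta-level chooses temporally extended options (including "go recharge") instead of single flight steps.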