1. Department of Automation, Tsinghua University, Beijing 100084, China
2. Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
3. Taier Terminal Laboratory, China Academy of Information and Communications Technology, Beijing 100191, China
4. Institute for Network Sciences and Cyberspace, Tsinghua University, Beijing 100084, China
[ "牟治宇(1997- ),男,河北石家庄人,清华大学硕士生,主要研究方向为基于深度强化学习的无人机路径规划" ]
[ "张煜(1993- ),女,河南郑州人,清华大学博士生,主要研究方向为物联网通信理论、基于强化学习的无人机路径规划" ]
[ "范典(1992- ),男,山东菏泽人,中国信息通信研究院泰尔终端实验室战略规划与研究部工程师,主要研究方向为毫米波大规模多天线通信理论、阵列信号处理和无人机通信理论" ]
[ "刘君(1982- ),女,山东济南人,博士,清华大学助理研究员,主要研究方向为天地一体化网络、无人机组网" ]
[ "高飞飞(1980- ),男,陕西西安人,博士,清华大学副教授、博士生导师,主要研究方向为多天线通信和智能信号处理技术" ]
Print publication date: 2020-09-30
Online publication date: 2020-09
MOU Z Y, ZHANG Y, FAN D, et al. Research on the UAV-aided data collection and trajectory design based on the deep reinforcement learning[J]. Chinese Journal on Internet of Things, 2020, 4(3): 42-51. DOI: 10.11959/j.issn.2096-3750.2020.00177.
The Internet of things (IoT) era requires massive node coverage and connectivity, yet in remote areas IoT communication technology often cannot collect data in a timely manner. Thanks to their flexibility and mobility, unmanned aerial vehicles (UAV) can be used for data collection in IoT wireless sensor networks. The proposed scheme focuses on the trajectory design problem of UAV-aided sensor-network data collection while also meeting the UAV's recharging demand, which arises from its limited battery capacity. Specifically, drawing on the idea of hierarchical reinforcement learning with temporal abstraction and building on a discrete-action deep reinforcement learning architecture, a novel option-DQN (option-deep Q-learning) algorithm is proposed that achieves efficient UAV data collection and trajectory design, and that controls the UAV to recharge in time so as to guarantee normal flight. Simulation results show that, compared with the conventional DQN (deep Q-learning) algorithm, the episode reward of the proposed algorithm rises faster during training and converges to a higher level, and the UAV's trajectories during task execution are clearer and more reasonable; the proposed algorithm can decide when the UAV should recharge, keeping its battery level sufficient at all times.
UAV; trajectory design; data collection; charging
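The abstract describes the approach only at a high level. As a rough illustration of the temporal-abstraction idea it builds on (options as high-level actions such as "fly to sensor k" or "fly to the charger", valued with an SMDP-style Q-learning update), the sketch below uses a toy grid world. Every concrete value here (grid size, sensor and charger positions, rewards, battery budget) is a hypothetical assumption, and the intra-option policy is a fixed greedy flight rule rather than the learned low-level controller of the paper's option-DQN.

```python
# A minimal, self-contained sketch of option-based (temporally abstract)
# Q-learning for a UAV data-collection task. All numbers and positions are
# hypothetical illustrations, not the paper's actual settings.
import numpy as np

rng = np.random.default_rng(0)

GRID = 5                        # hypothetical 5x5 service area
SENSORS = [(0, 4), (4, 4)]      # assumed sensor node positions
CHARGER = (0, 0)                # assumed charging station position
OPTIONS = SENSORS + [CHARGER]   # each option: "fly to this target"
FULL_BATTERY = 12               # battery budget in grid steps (assumed)

def run_option(pos, target, battery):
    """Execute one option: fly greedily toward `target` one cell at a time.
    Returns (new_pos, new_battery, elapsed_steps). Here the intra-option
    policy is fixed; the paper learns it with a low-level network instead."""
    steps = 0
    while pos != target and battery > 0:
        dx = int(np.sign(target[0] - pos[0]))
        dy = int(np.sign(target[1] - pos[1]))
        pos = (pos[0] + dx, pos[1]) if dx != 0 else (pos[0], pos[1] + dy)
        battery -= 1
        steps += 1
    return pos, battery, steps

def state_id(pos, battery):
    # State: UAV cell plus a coarse "battery low" flag.
    low = int(battery < FULL_BATTERY // 3)
    return (pos[0] * GRID + pos[1]) * 2 + low

Q = np.zeros((GRID * GRID * 2, len(OPTIONS)))   # option-value table
alpha, gamma, eps = 0.1, 0.95, 0.1

for episode in range(2000):
    pos, battery = (2, 2), FULL_BATTERY
    visited = set()                              # sensors already drained
    for _ in range(10):                          # option-level decisions
        s = state_id(pos, battery)
        o = rng.integers(len(OPTIONS)) if rng.random() < eps else int(np.argmax(Q[s]))
        pos, battery, k = run_option(pos, OPTIONS[o], battery)
        if battery <= 0 and pos != OPTIONS[o]:
            r = -5.0                             # ran out of power mid-flight
        elif pos == CHARGER:
            r, battery = 1.0, FULL_BATTERY       # recharge resets the budget
        elif pos in SENSORS and pos not in visited:
            r = 10.0                             # fresh data collected
            visited.add(pos)
        else:
            r = 0.0                              # revisited a drained sensor
        s2 = state_id(pos, battery)
        # SMDP-style update: discount by gamma**k, the option's duration.
        Q[s, o] += alpha * (r + gamma ** k * Q[s2].max() - Q[s, o])
        if battery <= 0:
            break
```

In a learned version of this sketch, both the option-value function and the intra-option controllers would be deep networks; that is where the proposed option-DQN differs from plain DQN, in that the meta-level chooses temporally extended options (including "go recharge") instead of single flight steps.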