期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

章广梅张建丰王炜发李勇《电讯技术》2022,62(7)

随着通信及网络技术的发展,移动终端设备往往配备多个网络接口,具有多链路的网络接入能力。为实现多链路传输,互联网工程任务组提出了新型的多路径传输控制协议（Multipath Transmission Control Protocol,MPTCP）。如何合理进行MPTCP数据包调度为用户提供更大带宽、提高数据传输可靠性,并最大化网络资源利用率,成为网络通信领域的重要课题。首先,介绍了MPTCP的基本功能和其在数据调度中面临的主要挑战;然后,针对非对称多链路下的MPTCP传输调度算法,从减少数据包乱序、降低传输延迟、提高链路利用率,以及结合强化学习四个角度分析和比较了近年来的多路径传输相关研究;最后,结合当前研究热点展望了MPTCP的未来发展趋势。相似文献

2.

基于可中断Option的在线分层强化学习方法

朱斐许志鹏刘全伏玉琛王辉《通信学报》2016,37(6):65-74

针对大数据体量大的问题,在Macro-Q算法的基础上提出了一种在线更新的Macro-Q算法(MQIU),同时更新抽象动作的值函数和元动作的值函数,提高了数据样本的利用率。针对传统的马尔可夫过程模型和抽象动作均难于应对可变性,引入中断机制,提出了一种可中断抽象动作的Macro-Q无模型学习算法(IMQ),能在动态环境下学习并改进控制策略。仿真结果验证了MQIU算法能加快算法收敛速度,进而能解决更大规模的问题,同时也验证了IMQ算法能够加快任务的求解,并保持学习性能的稳定性。相似文献

3.

面向动态拓扑网络的深度强化学习路由技术

伍元胜《电讯技术》2021,61(6):659-665

针对现有智能路由技术无法适用于动态拓扑的不足,提出了一种面向动态拓扑的深度强化学习智能路由技术,通过使用图神经网络近似PPO(Proximal Policy Optimization)强化学习算法中的策略函数与值函数、策略函数输出所有链路的权值、基于链路权值计算最小成本路径的方法,实现了路由智能体对不同网络拓扑的泛化.... 相似文献

4.

基于强化学习的虚拟机资源自动配置

李文婵彭志平《电子设计工程》2014,(5):38-40

虚拟机技术允许多个虚拟机在同一台物理主机上共享资源.为了响应应用需求的变化或者是资源供应的变化,分配到虚拟机上的资源应该能够动态的重新配置.为此,本文提出了一个基于强化学习的算法来自动处理配置进程,即（Standard Reinforcement Learning Auto-Configuration）. 强调了基于算法的模型来解决在资源管理系统的稳定性和适应性问题.这里通过在一个云环境仿真软件CloudSim在基于虚拟机的云测试床实施具有代表性的服务器负载的实验,结果证明了的有效性.这个方法可以在小规模系统里发现最优（或接近最优）的配置策略,并且表现了很好的稳定性和适应性. 相似文献

5.

Autonomic discovery of subgoals in hierarchical reinforcement learning

XIAO Ding LI Yi-tong SHI Chuan 《中国邮电高校学报(英文版)》2014,21(5):94-104

Option is a promising method to discover the hierarchical structure in reinforcement learning （RL） for learning acceleration. The key to option discovery is about how an agent can find useful subgoals autonomically among the passing trails. By analyzing the agent＇s actions in the trails, useful heuristics can be found. Not only does the agent pass subgoals more frequently, but also its effective actions are restricted in subgoals. As a consequence, the subgoals can be deemed as the most matching action-restricted states in the paths. In the grid-world environment, the concept of the unique-direction value reflecting the action-restricted property was introduced to find the most matching action-restricted states. The unique-direction-value （UDV） approach is chosen to form options offline and online autonomically. Experiments show that the approach can find subgoals correctly. Thus the Q-learning with options found on both offline and online process can accelerate learning significantly. 相似文献

6.

Avoiding collaborative paradox in multi-agent reinforcement learning

Hyunseok Kim Seonghyun Kim Donghun Lee Ingook Jang 《ETRI Journal》2021,43(6):1004-1012

The collaboration productively interacting between multi-agents has become an emerging issue in real-world applications. In reinforcement learning, multi-agent environments present challenges beyond tractable issues in single-agent settings. This collaborative environment has the following highly complex attributes: sparse rewards for task completion, limited communications between each other, and only partial observations. In particular, adjustments in an agent's action policy result in a nonstationary environment from the other agent's perspective, which causes high variance in the learned policies and prevents the direct use of reinforcement learning approaches. Unexpected social loafing caused by high dispersion makes it difficult for all agents to succeed in collaborative tasks. Therefore, we address a paradox caused by the social loafing to significantly reduce total returns after a certain timestep of multi-agent reinforcement learning. We further demonstrate that the collaborative paradox in multi-agent environments can be avoided by our proposed effective early stop method leveraging a metric for social loafing. 相似文献

7.

基于强化学习的多机器人避碰算法研究

段勇陈腾峰《信息技术》2012,(6):100-103

采用强化学习解决多机器人避碰问题。然后针对表格式Q学习算法只能用于离散的状态并且学习时间过长,难以收敛的不足,提出了神经网络和Q学习相结合的算法。最后将该算法应用到多机器人避碰问题中,仿真实验表明该算法有效,能较好地解决多机器人避碰问题。相似文献

8.

Service migration in mobile edge computing: A deep reinforcement learning approach

Hongman Wang Yingxue Li Ao Zhou Yan Guo Shangguang Wang 《International Journal of Communication Systems》2023,36(1):e4413

In mobile edge computing, service migration can not only reduce the access latency but also reduce the network costs for users. However, due to bandwidth bottleneck, migration costs should also be considered during service migration. In this way, the trade-off between benefits of service migration and total service costs is very important for the cloud service providers. In this paper, we propose an efficient dynamic service migration algorithm named SMDQN, which is based on reinforcement learning. We consider each mobile application service can be hosted on one or more edge nodes and each edge node has limited resources. SMDQN takes total delay and migration costs into consideration. And to reduce the size of Markov decision process space, we devise the deep reinforcement learning algorithm to make a fast decision. We implement the algorithm and test the performance and stability of it. The simulation result shows that it can minimize the service costs and adapt well to different mobile access patterns. 相似文献

9.

Service chain mapping algorithm based on reinforcement learning

Liang WEI Tao HUANG Jiao ZHANG Zenan WANG Jiang LIU Yunjie LIU 《通信学报》2018,39(1):90-100

A service chain resource scheduling architecture of multi-agent based on artificial intelligence technology was proposed.Meanwhile,a service chain mapping algorithm based on reinforcement learning was designed.Through the Q-learning mechanism,the location of each virtual network element in the service chain was determined according to the system status and the reward and punishment feedback after the deployment.The experimental results show that compared with the classical algorithms,the algorithm effectively reduces the average transmission delay of the service and improves the load balance of the system. 相似文献

10.

Deep reinforcement learning based resource allocation algorithm in cellular networks

Xiaomin LIAO Shaohu YAN Jia SHI Zhenyu TAN Zhongling ZHAO Zan LI 《通信学报》2019,40(2):11-18

In order to solve multi-objective optimization problem,a resource allocation algorithm based on deep reinforcement learning in cellular networks was proposed.Firstly,deep neural network (DNN) was built to optimize the transmission rate of cellular system and to complete the forward transmission process of the algorithm.Then,the Q-learning mechanism was utilized to construct the error function,which used energy efficiency as the rewards.The gradient descent method was used to train the weights of DNN,and the reverse training process of the algorithm was completed.The simulation results show that the proposed algorithm can determine optimization extent of optimal resource allocation scheme with rapid convergence ability,it is obviously superior to the other algorithms in terms of transmission rate and system energy consumption optimization. 相似文献

11.

Vehicle-following system based on deep reinforcement learning in marine scene

张新娄皓然蒋励肖前浩蔡著文《中国邮电高校学报(英文版)》2022,29(5):10-20

In order to solve the problems that the feature data type are not rich enough in the data collection process about the vehicle-following task in marine scene which results in a long model convergence time and high training difficulty, a two-stage vehicle-following system was proposed. Firstly, semantic segmentation model predicts the number of pixels of the followed target, then the number of pixels of the followed target is mapped to the position feature. Secondly, deep reinforcement learning algorithm enables the control equipment to make decision action, to ensure that two moving objects remain within the safe distance. The experimental results show that the two-stage vehicle-following system has a 40% faster convergence rate than the model without position feature, and the following stability is significantly improved by adding the position feature. 相似文献

12.

基于知识的Agent强化学习算法分析与研究

殷锋社《电子设计工程》2011,19(11):115-117

强化学习具有与环境交互的优势,笔者提出的基于知识的Q-学习算法(KBQL)就是利用Q-学习算法的这个特点,利用Agent的先验知识来缩小Agent学习的状态空间,以加速强化学习的收敛性,同时采用Agent的学习机制克服其知识的不精确性,从而提高学习算法的鲁棒性和适应性。相似文献

13.

Adaptive quality of service‐based routing approaches: development of neuro‐dynamic state‐dependent reinforcement learning algorithms

Abdelhamid Mellouk Saïd Hoceïni Yacine Amirat 《International Journal of Communication Systems》2007,20(10):1113-1130

In this paper, we propose two adaptive routing algorithms based on reinforcement learning. In the first algorithm, we have used a neural network to approximate the reinforcement signal, allowing the learner to take into account various parameters such as local queue size, for distance estimation. Moreover, each router uses an online learning module to optimize the path in terms of average packet delivery time, by taking into account the waiting queue states of neighbouring routers. In the second algorithm, the exploration of paths is limited to N‐best non‐loop paths in terms of hops number (number of routers in a path), leading to a substantial reduction of convergence time. The performances of the proposed algorithms are evaluated experimentally with OPNET simulator for different levels of traffic's load and compared with standard shortest‐path and Q‐routing algorithms. Our approach proves superior to classical algorithms and is able to route efficiently even when the network load varies in an irregular manner. We also tested our approach on a large network topology to proof its scalability and adaptability. Copyright © 2006 John Wiley & Sons, Ltd. 相似文献

14.

基于强化学习的合作频谱分配算法

下载免费PDF全文

李冠雄李桂林《电波科学学报》2022,37(1):8-14

为了解决认知无线电网络中的频谱分配问题,提出了一种基于用户体验质量的合作强化学习频谱分配算法,将认知网络中的次用户模拟为强化学习中的智能体,并在次用户间引入合作机制,新加入用户可以吸收借鉴其他用户的强化学习经验,能够以更快的速度获得最佳的频谱分配方案;并且在频谱分配过程中引入了主用户和次用户之间的价格博弈因素,允许主用... 相似文献

15.

Distributed interference coordination based on multi-agent deep reinforcement learning

Tingting LIU Yi’nan LUO Chenyang YANG 《通信学报》2020,41(7):38-48

A distributed interference coordination strategy based on multi-agent deep reinforcement learning was investigated to meet the requirements of file downloading traffic in interference networks.By the proposed strategy transmission scheme could be adjusted adaptively based on the interference environment and traffic requirements with limited amount of information exchanged among nodes.Simulation results show that the user satisfaction loss of the proposed strategy from the optimal strategy with perfect future information does not exceed 11% for arbitrary number of users and traffic requirements. 相似文献

16.

Secured data offloading using reinforcement learning and Markov decision process in mobile edge computing

Jitendra Kumar Samriya Mohit Kumar Sukhpal Singh Gill 《International Journal of Network Management》2023,33(5):e2243

Mobile Internet services are developing rapidly for several applications based on computational ability such as augmented/virtual reality, vehicular networks, etc. The mobile terminals are enabled using mobile edge computing (MEC) for offloading the task at the edge of the cellular networks, but offloading is still a challenging issue due to the dynamism, and uncertainty of upcoming IoT requests and wireless channel state. Moreover, securing the offloading data enhanced the challenges of computational complexities and required a secure and efficient offloading technique. To tackle the mentioned issues, a reinforcement learning-based Markov decision process offloading model is proposed that optimized energy efficiency, and mobile users' time by considering the constrained computation of IoT devices, moreover guarantees efficient resource sharing among multiple users. An advanced encryption standard is employed in this work to fulfil the requirements of data security. The simulation outputs reveal that the proposed approach surpasses the existing baseline models for offloading overhead and service cost QoS parameters ensuring secure data offloading. 相似文献

17.

Software-defined networking QoS optimization based on deep reinforcement learning

Julong LAN Xueshuai ZHANG Yuxiang HU Penghao SUN 《通信学报》2019,40(12):60-67

To solve the problem that the QoS optimization schemes which based on heuristic algorithm degraded often due to the mismatch between parameters and network characteristics in software-defined networking scenarios,a software-defined networking QoS optimization algorithm based on deep reinforcement learning was proposed.Firstly,the network resources and state information were integrated into the network model,and then the flow perception capability was improved by the long short-term memory,and finally the dynamic flow scheduling strategy,which satisfied the specific QoS objectives,were generated in combination with deep reinforcement learning.The experimental results show that,compared with the existing algorithms,the proposed algorithm not only ensures the end-to-end delay and packet loss rate,but also improves the network load balancing by 22.7% and increases the throughput by 8.2%. 相似文献

18.

Artificial emotion model based on reinforcement learning mechanism of neural network

SHI Xue-fei WANG Zhi-liang PING An ZHANG Li-kun 《中国邮电高校学报(英文版)》2011,18(3):105-109

A hierarchical-processed frame construction of artificial emotion model for intelligent system is proposed in the paper according to the basic conclusion of emotional psychology.The general method of emotion processing,which considers only one single layer,has been changed in the presented construction.An artificial emotional development model is put forward based on reinforcement learning mechanism of neural network.The new model takes the emotion itself as reinforcement signal and describes its different influences on action learning efficiency corresponding to different individualities.In the end,simulation result based on child playmate robot is discussed and the effectiveness of the model is verified. 相似文献

19.

Multi-agent reinforcement learning for edge information sharing in vehicular networks

Ruyan Wang Xue Jiang Yujie Zhou Zhidu Li Dapeng Wu Tong Tang Alexander Fedotov Vladimir Badenko 《Digital Communications & Networks》2022,8(3):267-277

To guarantee the heterogeneous delay requirements of the diverse vehicular services, it is necessary to design a full cooperative policy for both Vehicle to Infrastructure (V2I) and Vehicle to Vehicle (V2V) links. This paper investigates the reduction of the delay in edge information sharing for V2V links while satisfying the delay requirements of the V2I links. Specifically, a mean delay minimization problem and a maximum individual delay minimization problem are formulated to improve the global network performance and ensure the fairness of a single user, respectively. A multi-agent reinforcement learning framework is designed to solve these two problems, where a new reward function is proposed to evaluate the utilities of the two optimization objectives in a unified framework. Thereafter, a proximal policy optimization approach is proposed to enable each V2V user to learn its policy using the shared global network reward. The effectiveness of the proposed approach is finally validated by comparing the obtained results with those of the other baseline approaches through extensive simulation experiments. 相似文献

20.

基于深度强化学习的智能决策方法

熊蓉玲段春怡冉华明杨萌冯旸赫《电讯技术》2023,(1):1-6

针对传统深度强化学习算法难以快速解决长时序复杂任务的问题,提出了一种引入历史信息和人类知识的深度强化学习方法,对经典近端策略优化(Proximal Policy Optimization, PPO)强化学习算法进行改进,在状态空间引入历史状态以反映环境的时序变化特征,在策略模型中基于人类认知增加无效动作掩膜,禁止智能体进行无效探索,提高探索效率,从而提升模型的训练性能。仿真结果表明,所提方法能够有效解决长时序复杂任务的智能决策问题,相比传统的深度强化学习算法可显著提高模型收敛效果。相似文献