Similar Documents
20 similar documents found.
1.
This article proposes a reinforcement learning procedure for mobile robot navigation based on a latent-like learning scheme. Latent learning refers to learning that occurs in the absence of reinforcement signals and is not apparent until reinforcement is introduced; part of a task can thus be learned before the agent receives any indication of how to perform it. In the proposed topological reinforcement learning agent (TRLA), a topological map is used to perform the latent learning. Propagating the reinforcement signal through the topological neighborhoods of the map yields an estimate of the value function that requires, on average, fewer trials and fewer updates per trial than six of the main temporal-difference reinforcement learning algorithms: Q-learning, SARSA, Q(λ)-learning, SARSA(λ), Dyna-Q, and fast Q(λ)-learning. The RL agents were tested in four environments designed with increasing levels of complexity in the navigation task. The tests suggest that the TRLA chooses shorter trajectories (in number of steps) and/or requires fewer value-function updates per trial than the other six reinforcement learning (RL) algorithms.
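The core mechanism, spreading the reinforcement signal through topological neighborhoods so that values exist before the agent ever reaches the goal itself, can be illustrated with a short sketch. This is a minimal toy version under assumed design choices (graph structure, decay factor, breadth-first propagation), not the paper's exact update rule:

```python
# Sketch of value propagation over a topological map, in the spirit of the
# TRLA above. Decay factor and update rule are illustrative assumptions.
from collections import deque

def propagate_value(graph, goal, decay=0.8):
    """Spread the reinforcement signal from the goal node through
    topological neighborhoods; value decays with graph distance."""
    values = {goal: 1.0}
    frontier = deque([goal])
    while frontier:
        node = frontier.popleft()
        for neighbor in graph[node]:
            candidate = values[node] * decay
            if candidate > values.get(neighbor, 0.0):
                values[neighbor] = candidate
                frontier.append(neighbor)
    return values

# Toy topological map: nodes are places, edges connect neighboring places.
graph = {
    "A": ["B"], "B": ["A", "C", "D"], "C": ["B", "goal"],
    "D": ["B"], "goal": ["C"],
}
values = propagate_value(graph, "goal")
# Greedy navigation: from any node, move to the neighbor of highest value.
print(max(graph["B"], key=lambda n: values.get(n, 0.0)))  # -> 'C'
```

A single propagation pass assigns a value to every node in the map, which is why such an agent can need fewer trials than methods that update one state-action pair at a time.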

2.
A reinforcement agent for object segmentation in ultrasound images
The principal contribution of this work is the design of a general framework for an intelligent system that extracts a single object of interest from ultrasound images. The system is based on reinforcement learning. The input image is divided into several sub-images, and the system finds appropriate local values for each of them so that the object of interest can be extracted. The agent learns from a set of images and their ground-truth (manually segmented) versions. A reward function measures the similarity between the output and the manually segmented image and provides feedback to the agent. The information obtained is stored as valuable knowledge in the Q-matrix, which the agent can then apply to new input images. Experimental results for prostate segmentation in trans-rectal ultrasound images show the high potential of this approach for ultrasound image segmentation.
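The idea of learning per-sub-image local values against a ground-truth reward can be sketched concretely. The grid size, the thresholding action set, the Dice-score reward, and the bandit-style update below are all illustrative assumptions, not the authors' exact design:

```python
# Toy sketch: an agent picks a local threshold for each sub-image and is
# rewarded by similarity (Dice score) to the ground-truth segmentation.
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((64, 64))
ground_truth = image > 0.6               # stand-in for a manual segmentation
thresholds = np.linspace(0.1, 0.9, 9)    # discrete action set per sub-image

def dice(a, b):
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum() + 1e-9)

# One Q-row per sub-image of a 4x4 grid; columns index candidate thresholds.
Q = np.zeros((16, len(thresholds)))
for episode in range(200):
    for idx in range(16):
        r, c = divmod(idx, 4)
        sub = image[r*16:(r+1)*16, c*16:(c+1)*16]
        gt = ground_truth[r*16:(r+1)*16, c*16:(c+1)*16]
        a = rng.integers(len(thresholds)) if rng.random() < 0.1 \
            else int(Q[idx].argmax())
        reward = dice(sub > thresholds[a], gt)   # feedback from ground truth
        Q[idx, a] += 0.1 * (reward - Q[idx, a])  # simple incremental update

best = thresholds[Q.argmax(axis=1)]  # learned local threshold per sub-image
print(best.reshape(4, 4))
```

Once the Q-matrix is filled, the learned local values can be applied to new images without further ground truth, which is the transfer step the abstract describes.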

3.
4.
This paper first proposes a bilaterally optimized negotiation model based on reinforcement learning. The model negotiates over price and quantity, introduces a mediator agent as the mediation mechanism, and uses an improved reinforcement learning negotiation strategy to produce the optimal proposal. To further improve negotiation performance, the paper then proposes a negotiation method based on the adaptive learning of the mediator agent. Simulation results show that the proposed methods improve both the efficiency and the performance of the negotiation.
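The general shape of RL-driven bilateral negotiation with a mediator can be shown with a deliberately small sketch. The utility functions, the concession action set, the midpoint rule, and the bandit update below are toy assumptions for illustration only:

```python
# Toy sketch: a seller learns a concession strategy by reinforcement; a
# mediator agent proposes the midpoint once the two offers cross.
import random

random.seed(1)
ACTIONS = [0.0, 0.5, 1.0, 2.0]      # candidate seller concession steps
Q = {a: 0.0 for a in ACTIONS}       # learned value of each strategy

def negotiate(step):
    """Seller starts high, buyer low; each round both concede, and once the
    offers cross, the mediator proposes the midpoint as the deal."""
    seller, buyer = 20.0, 10.0
    for rounds in range(1, 11):
        seller = max(seller - step, 12.0)       # seller's reservation price
        buyer += 1.0                            # fixed buyer concession
        if buyer >= seller:
            deal = (seller + buyer) / 2.0       # mediator's midpoint proposal
            return (deal - 12.0) - 0.1 * rounds  # profit minus time penalty
    return -1.0                                 # negotiation breakdown

for episode in range(500):          # learn which concession step pays best
    a = random.choice(ACTIONS) if random.random() < 0.2 else max(Q, key=Q.get)
    Q[a] += 0.05 * (negotiate(a) - Q[a])

print(max(Q, key=Q.get), Q)
```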

5.
This paper presents a hybrid agent architecture that integrates the behaviours of BDI agents, specifically desire and intention, with a neural-network-based reinforcement learner known as Temporal Difference-Fusion Architecture for Learning and COgNition (TD-FALCON). By explicitly maintaining goals, the agent performs reinforcement learning with awareness of its objectives instead of relying on external reinforcement signals. More importantly, the intention module equips the hybrid architecture with deliberative planning capabilities, enabling the agent to purposefully maintain an agenda of actions to perform and reducing the need to constantly sense the environment. Through reinforcement learning, plans can also be learned and evaluated without the rigidity of the user-defined plans used in traditional BDI systems. For intention and reinforcement learning to work cooperatively, two strategies are presented for combining the intention module and the reactive learning module for decision making in a real-time environment. A case study on a minefield navigation domain investigates how the desire and intention modules can cooperatively enhance a pure reinforcement learner. The empirical results show that the hybrid architecture learns plans efficiently and exploits both intentional and reactive action execution to yield robust performance.
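One simple arbitration strategy between the two modules is plan-first fallback: act from the intention agenda while one is active, and only otherwise consult the learned reactive policy. The class names and policy below are a minimal sketch of that idea, not TD-FALCON itself:

```python
# Minimal sketch of combining an intention module with a reactive RL module.
import random

class HybridAgent:
    def __init__(self, q_table):
        self.q = q_table          # reactive module: state -> action values
        self.intention = []       # intention module: pending plan steps

    def adopt_plan(self, steps):
        """Commit to a plan, so the agent need not re-sense and re-decide
        at every step while the intention remains active."""
        self.intention = list(steps)

    def act(self, state):
        if self.intention:                    # deliberative path
            return self.intention.pop(0)
        values = self.q.get(state, {})        # reactive path
        if not values or random.random() < 0.1:
            return random.choice(["left", "right", "forward"])
        return max(values, key=values.get)

agent = HybridAgent({"s0": {"left": 0.2, "forward": 0.9}})
agent.adopt_plan(["forward", "forward", "left"])
print([agent.act("s0") for _ in range(4)])  # 3 planned steps, then reactive
```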

6.
An agent-based architecture modelled on a successfully operating real-world process, criminal investigation, circumvents the high computational cost of Bayesian fusion by realising a distributed local Bayesian fusion approach. The idea underlying local Bayesian fusion is to avoid performing detailed Bayesian fusion over the entire space spanned by the Properties-of-Interest. Local Bayesian fusion is mainly based on coarsening and restriction techniques; here, we focus on coarsening. We give an overview of the agent-based conception and translate the proposed ideas into a formal mathematical framework.
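The computational saving from coarsening is easy to see in a sketch: the Bayes update runs over a small partition of the hypothesis space rather than every fine-grained hypothesis. The partition and likelihood model below are toy assumptions:

```python
# Illustrative sketch of local Bayesian fusion by coarsening.
import numpy as np

fine_states = np.arange(100)            # full Properties-of-Interest space
partition = fine_states // 10           # coarsening: 10 cells of 10 states

def coarse_likelihood(evidence, cell):
    # Likelihood aggregated per coarse cell (toy bump around the cell the
    # evidence points to).
    return np.exp(-0.5 * (cell - evidence) ** 2)

cells = np.unique(partition)
prior = np.full(len(cells), 1.0 / len(cells))

for evidence in [3, 4, 3]:              # reports from several "agents"
    posterior = prior * coarse_likelihood(evidence, cells)
    prior = posterior / posterior.sum() # Bayes update on 10 cells, not 100

print(cells[prior.argmax()])            # most plausible coarse cell -> 3
```

Restriction (the other technique the paper mentions) would instead confine detailed fusion to a promising subregion; the coarse posterior above is one natural way to pick that subregion.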

7.
8.
To address the security problems of mobile agents, a security design scheme for multi-agent systems based on threshold techniques is proposed. The scheme uses threshold secret sharing so that encrypted information and keys are carried separately by multiple mobile agents, protecting the privacy of each mobile agent and improving the reliability and fault tolerance of the mobile-agent system. Through limited delegation, a mobile agent can sign contracts at a remote host on behalf of its user without exposing the owner's private key.
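The threshold-sharing building block here is classic (t, n) secret sharing: a key split among n agents so that any t can reconstruct it while fewer learn nothing. Below is a standard Shamir-style sketch; the field size and parameters are illustrative, and the paper's full scheme (delegation, signing) is not reproduced:

```python
# Minimal (t, n) threshold secret sharing in the spirit of the scheme above.
import random

P = 2**127 - 1   # a Mersenne prime field large enough for a short key

def split(secret, n, t):
    """Shamir: hide the secret as the constant term of a random
    degree-(t-1) polynomial; each agent carries one evaluation point."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = split(123456789, n=5, t=3)          # five agents, one share each
print(reconstruct(shares[:3]) == 123456789)  # any three suffice -> True
```

This is also where the fault tolerance claimed in the abstract comes from: losing up to n - t agents does not destroy the key.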

9.
Synthesizing optimal control strategies for nonlinear systems over an infinite horizon, subject to mixed equality and inequality constraints, has long been a challenge for control engineers. This paper treats it as a problem of finite-time optimization within infinite-horizon control and devises a reinforcement learning agent, termed the Adaptive Optimal Control (AOC) agent, to carry out the finite-time optimization procedures. The control is adaptive in the sense that the finite-time optimization procedure is activated whenever needed to improve the control strategy or to adapt to a real-world environment. The Nonlinear Quadratic Regulator (NQR) is shown to be a typical example of a controller the AOC agent can discover. The optimality conditions and adaptation rules for the AOC agent are deduced from Pontryagin's minimum principle, and the requirements for convergence and stability of the AOC system are established.
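For reference, the standard first-order conditions of Pontryagin's minimum principle, from which the paper deduces its optimality and adaptation rules, are the following (the notation here is generic, not the paper's exact derivation):

```latex
% Pontryagin's minimum principle for \dot{x} = f(x,u) with running cost L(x,u).
\begin{align*}
  H(x,u,\lambda) &= L(x,u) + \lambda^{\top} f(x,u)
      && \text{(Hamiltonian)} \\
  \dot{\lambda} &= -\frac{\partial H}{\partial x}
      && \text{(costate equation)} \\
  u^{*} &= \arg\min_{u \in \mathcal{U}} H(x^{*},u,\lambda)
      && \text{(minimum condition)}
\end{align*}
```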

10.
Reinforcement learning (RL) for large and complex problems faces the curse of dimensionality. To overcome this problem, frameworks based on temporal abstraction have been presented, each with its own advantages and disadvantages. This paper proposes a new method, similar to the strategies introduced in hierarchical abstract machines (HAMs), that creates a high-level controller layer of reinforcement learning using options. The proposed framework uses a non-deterministic automaton as a controller to make more effective use of temporally extended actions and state-space clustering. The method can be viewed as a bridge between the option and HAM frameworks: it reduces the disadvantages of both by creating connecting structures between them while retaining their advantages. Experimental results on different test environments show the significant efficiency of the proposed method.
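A toy sketch can show the two halves of the bridge: options as temporally extended actions learned with the SMDP Q-update, and a small HAM-style controller that restricts which options are admissible in which states. The corridor world, option set, and controller rule below are illustrative assumptions:

```python
# Toy bridge between options and HAM-style control: a controller restricts
# the option set, and option values follow the SMDP Q-learning update
# Q(s,o) += a * (R + g^k * max Q(s',o') - Q(s,o)).
import random

random.seed(0)
GOAL, GAMMA, ALPHA = 10, 0.9, 0.1
OPTIONS = {"step": 1, "sprint": 3}       # option name -> primitive steps
Q = {(s, o): 0.0 for s in range(GOAL + 1) for o in OPTIONS}

def run_option(state, option):
    """Execute an option to termination; return (next_state, reward, k)."""
    k = OPTIONS[option]
    nxt = min(state + k, GOAL)
    reward = 1.0 if nxt == GOAL else -0.1 * k   # step cost, goal bonus
    return nxt, reward, k

def allowed(state):
    """HAM-style controller: restricts the admissible option set
    (here, no sprinting near the goal)."""
    return ["step"] if state >= GOAL - 2 else list(OPTIONS)

for episode in range(2000):
    s = 0
    while s < GOAL:
        opts = allowed(s)
        o = random.choice(opts) if random.random() < 0.1 \
            else max(opts, key=lambda x: Q[(s, x)])
        s2, r, k = run_option(s, o)
        best_next = 0.0 if s2 == GOAL else max(Q[(s2, x)] for x in allowed(s2))
        Q[(s, o)] += ALPHA * (r + GAMMA**k * best_next - Q[(s, o)])
        s = s2

print([max(OPTIONS, key=lambda o: Q[(s, o)]) for s in range(GOAL)])
```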

11.
Reinforcement learning does not scale well to high-dimensional state spaces. Hierarchical reinforcement learning addresses this problem through task decomposition, which is done by extracting bottlenecks. Bottleneck extraction is itself a challenging issue, especially in terms of time and memory complexity and the need for prior knowledge of the environment. To alleviate these issues, a new approach to extracting bottlenecks is proposed, based on holonic concept clustering and attentional functions. States are organized according to the effects of actions through holonic clustering to extract high-level concepts, and these concepts are then used as cues for controlling attention. The proposed mechanism has better time complexity and requires less help from the designer. Experimental results show a considerable improvement in the precision of bottleneck detection and in agent performance on traditional benchmarks compared with similar methods.

12.
Research on a profit-sharing reinforcement learning method based on semi-autonomous agents
The profit-sharing reinforcement learning method is applied to systems based on semi-autonomous agents and compared with the dynamic-programming-based Q-learning method. In dynamic environments with many uncertain factors, where the evolution of the system state is not a Markov process, profit-sharing has a clear advantage. Based on the defining property of semi-autonomous agents, namely their subjection to external control, a reinforcement learning model oriented to semi-autonomous agents is proposed. The profit-sharing model is experimentally analyzed on a battlefield-simulation task of searching for safe cover.

13.
Filling the gaps between virtual and physical systems will open new doors in Smart Manufacturing. This work proposes a data-driven approach that uses digital-transformation methods to automate smart manufacturing systems, fundamentally enabled by a digital twin that represents manufacturing cells, simulates system behaviors, predicts process faults, and adaptively controls manipulated variables. First, the manufacturing cell is integrated with environments such as computer-aided applications, industrial Product Lifecycle Management solutions, and control platforms for automation systems. Second, a network of interfaces between these environments is designed and implemented to enable communication between the digital world and the physical manufacturing plant, so that near-synchronous control can be achieved. Third, the capabilities of several members of the Deep Reinforcement Learning (DRL) family are discussed in relation to manufacturing features within the context of Smart Manufacturing. Training results for Deep Q-Learning algorithms are presented as a case study of incorporating DRL-based artificial intelligence into the industrial control process. The resulting control methodology, named Digital Engine, is expected to acquire process knowledge, schedule manufacturing tasks, identify optimal actions, and demonstrate control robustness. The authors show that integrating a smart agent into the industrial platforms further expands the use of the system-level digital twin: intelligent control algorithms are trained and verified up front before being deployed to the physical world. Moreover, applying DRL to automated manufacturing control problems in tractable optimization environments constitutes a novel combination of data science and the manufacturing industry.
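A compact sketch of the Deep Q-Learning loop trained against a twin-like simulation follows. The environment stub, dimensions, and hyperparameters are illustrative assumptions (the paper's actual digital twin, state, and action spaces are not reproduced), and a separate target network is omitted for brevity:

```python
# Compact DQN sketch: Q-network, experience replay, epsilon-greedy policy.
import random
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 4, 3
net = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.ReLU(),
                    nn.Linear(32, N_ACTIONS))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
replay, GAMMA = [], 0.99

def env_step(state, action):
    """Placeholder for the digital-twin simulation: the chosen action nudges
    the process state; reward penalizes deviation from the setpoint 0."""
    nxt = state + 0.1 * (action - 1) + torch.randn(STATE_DIM) * 0.05
    return nxt, float(-nxt.abs().sum()), random.random() < 0.02

state = torch.zeros(STATE_DIM)
for step in range(5000):
    with torch.no_grad():
        action = random.randrange(N_ACTIONS) if random.random() < 0.1 \
            else int(net(state).argmax())
    nxt, reward, done = env_step(state, action)
    replay.append((state, action, reward, nxt, done))
    state = torch.zeros(STATE_DIM) if done else nxt
    if len(replay) >= 64:
        batch = random.sample(replay, 32)
        s = torch.stack([b[0] for b in batch])
        a = torch.tensor([b[1] for b in batch])
        r = torch.tensor([b[2] for b in batch])
        s2 = torch.stack([b[3] for b in batch])
        d = torch.tensor([float(b[4]) for b in batch])
        with torch.no_grad():   # bootstrap target (no target network here)
            target = r + GAMMA * (1 - d) * net(s2).max(dim=1).values
        q = net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        loss = nn.functional.mse_loss(q, target)
        opt.zero_grad(); loss.backward(); opt.step()
```

Training such a policy entirely inside the simulation before deployment is the "verify upfront" pattern the abstract highlights.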

14.
The problem of computing a route for a mobile agent that incrementally fuses data as it visits the nodes of a distributed sensor network is considered. The order in which nodes are visited has a significant impact on the quality and cost of the fused data, which in turn affects the main objective of the sensor network, such as target classification or tracking. We present a simplified analytical model of a distributed sensor network and formulate the route computation problem as maximizing an objective function that is directly proportional to the received signal strength and inversely proportional to the path loss and energy consumption. We show this problem to be NP-complete and propose a genetic algorithm that computes an approximate solution by employing a two-level encoding scheme and genetic operators tailored to the objective function. Simulation results for networks with different numbers of nodes and sensor distributions demonstrate the superior performance of our algorithm over two existing heuristics, the local-closest-first and global-closest-first methods.
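A GA over visiting orders with an objective of this shape can be sketched briefly. The node layout, signal/loss constants, and the simple order-crossover and swap-mutation operators below are toy assumptions; the paper's two-level encoding and tailored operators are not reproduced:

```python
# Illustrative GA for a mobile-agent route: fitness grows with received
# signal strength and shrinks with path loss and energy consumption.
import math
import random

random.seed(2)
nodes = [(random.random() * 100, random.random() * 100) for _ in range(12)]

def fitness(route):
    signal, loss_energy = 0.0, 1e-9
    pos = nodes[route[0]]
    for idx in route[1:]:
        d = math.dist(pos, nodes[idx])
        signal += 1.0 / (1.0 + d)        # closer hops -> stronger signal
        loss_energy += d ** 2            # path loss + energy grow with range
        pos = nodes[idx]
    return signal / loss_energy

def crossover(a, b):
    """Order crossover: keep a slice of parent a, fill the rest from b."""
    i, j = sorted(random.sample(range(len(a)), 2))
    middle = a[i:j]
    rest = [n for n in b if n not in middle]
    return rest[:i] + middle + rest[i:]

pop = [random.sample(range(12), 12) for _ in range(50)]
for gen in range(200):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]
    children = [crossover(*random.sample(parents, 2)) for _ in range(40)]
    for c in children:                   # swap mutation
        if random.random() < 0.2:
            i, j = random.sample(range(12), 2)
            c[i], c[j] = c[j], c[i]
    pop = parents + children

print(pop[0], fitness(pop[0]))           # best visiting order found
```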

15.
To address network energy consumption and delay, a data fusion algorithm based on immune agents is proposed. Free migration of the agents reduces the transmission energy of nodes; immunity reduces the number of nodes participating in fusion, lowering network energy consumption; an emergency channel is established to reduce network delay in urgent situations; and the fused data are compressed using a hexadecimal encoding method. Experimental results show that the algorithm effectively reduces network energy consumption and delay.

16.
This paper proposes a neuro-fuzzy combiner (NFC) with reinforcement learning capability for solving multiobjective control problems. The proposed NFC combines n existing low-level controllers in a hierarchical way to form a multiobjective fuzzy controller, assuming each low-level (fuzzy or nonfuzzy) controller has been well designed to serve a particular objective. The role of the NFC is to fuse the n actions decided by the n low-level controllers and determine a proper action to apply to the environment (plant) at each time step; the NFC can thus combine low-level controllers and achieve multiple objectives (goals) at once. The NFC acts like a switch that chooses a proper action from the actions of the low-level controllers according to feedback from the environment. In fact, the NFC is a soft switch: it allows more than one low-level action to be active, to different degrees, through fuzzy combination at each time step. An NFC can be designed by trial and error if enough a priori knowledge is available, or obtained by supervised learning if precise input/output training data are available. In the more practical case where no instructive teaching information is available, the NFC can learn by itself using the proposed reinforcement learning scheme. Equipped with reinforcement learning, the NFC can learn to achieve the desired multiple objectives simultaneously from rough reinforcement feedback that contains only critic information, such as "success (good)" or "failure (bad)", for each desired objective. Computer simulations illustrate the performance and applicability of the proposed architecture and learning scheme.
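The soft-switch idea, several low-level actions simultaneously active to different degrees, reduces to a weighted fusion. The membership functions and the two toy controllers below are illustrative assumptions, not the trained NFC:

```python
# Minimal sketch of the soft switch: fuzzy degrees weight the actions of
# several low-level controllers into a single applied action.
import numpy as np

def controller_speed(state):    # objective 1: track a target speed
    return 1.0 - state["speed"]

def controller_safety(state):   # objective 2: brake near obstacles
    return -2.0 * (1.0 - state["distance"])

def nfc_combine(state):
    """Soft switch: fuzzy activation degrees, not a hard selection, so
    several low-level actions contribute at once."""
    w_speed = np.clip(state["distance"], 0.0, 1.0)        # far -> speed
    w_safety = np.clip(1.0 - state["distance"], 0.0, 1.0)  # near -> safety
    actions = np.array([controller_speed(state), controller_safety(state)])
    weights = np.array([w_speed, w_safety])
    return float((weights * actions).sum() / weights.sum())

print(nfc_combine({"speed": 0.4, "distance": 0.9}))  # mostly speed control
print(nfc_combine({"speed": 0.4, "distance": 0.2}))  # mostly braking
```

In the paper the weights are produced by a learned neuro-fuzzy network rather than fixed membership functions, and the reinforcement scheme tunes them from success/failure feedback per objective.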

17.
The multimodal perception of intelligent robots is essential for achieving collision-free and efficient navigation. Autonomous navigation is enormously challenging when perception relies on vision or LiDAR sensor data alone, owing to the lack of complementary information from different sensors. This paper proposes a simple yet efficient deep reinforcement learning (DRL) approach with sparse rewards and hindsight experience replay (HER) to achieve multimodal navigation. Using the depth images and pseudo-LiDAR data generated by an RGB-D camera as input, a multimodal fusion scheme enhances perception of the surrounding environment compared with a single sensor. To avoid the misleading guidance that dense rewards can give the agent, sparse rewards are used to specify its tasks, and the HER technique is introduced to address the resulting sparse-reward navigation issue and accelerate optimal policy learning. The results show that the proposed model achieves state-of-the-art performance in success, crash, and timeout rates, as well as in generalization capability.
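The HER mechanism itself is simple to state: a failed episode is replayed as if the goal had been whatever state the agent actually reached, turning sparse failures into useful training signal. Below is a minimal sketch; the transition format is an assumption:

```python
# Minimal sketch of hindsight experience replay with a sparse reward.
def sparse_reward(state, goal):
    return 0.0 if state == goal else -1.0   # sparse: success or not

def relabel_with_hindsight(episode):
    """episode: list of (state, action, next_state, goal) tuples."""
    achieved = episode[-1][2]               # what the agent actually reached
    hindsight = []
    for state, action, next_state, _ in episode:
        r = sparse_reward(next_state, achieved)
        hindsight.append((state, action, next_state, achieved, r))
    return hindsight

# A failed episode toward goal (5, 5): the agent ended at (3, 4) instead.
episode = [((0, 0), "up", (0, 1), (5, 5)),
           ((0, 1), "right", (1, 1), (5, 5)),
           ((1, 1), "up", (3, 4), (5, 5))]
replay = relabel_with_hindsight(episode)
print(replay[-1])   # last transition now carries reward 0.0: a "success"
```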

18.
To address the information-density gap between modalities and the possible loss of sentiment information during fusion, a multimodal sentiment analysis model based on non-text-modality reinforcement and gated fusion is proposed. The model uses an audio-visual reinforcement module to enhance the information in the audio and visual modalities, narrowing the information gap with the text modality. Cross-modal attention and gated fusion then allow the model to fully learn both the multimodal sentiment information and the original sentiment information, strengthening its expressive power. Experimental results on the aligned and unaligned CMU-MOSEI datasets show that the proposed model is effective and outperforms several existing models.
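Gated fusion between a text representation and a reinforced audio-visual representation typically uses a learned gate that decides, per dimension, how much of each stream to keep. The dimensions and module layout below are illustrative assumptions, not the paper's architecture:

```python
# Minimal sketch of gated fusion between two modality representations.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, h_text, h_av):
        g = torch.sigmoid(self.gate(torch.cat([h_text, h_av], dim=-1)))
        return g * h_text + (1.0 - g) * h_av   # per-dimension soft selection

fusion = GatedFusion(dim=128)
h_text = torch.randn(8, 128)    # batch of text features
h_av = torch.randn(8, 128)      # batch of reinforced audio-visual features
print(fusion(h_text, h_av).shape)   # torch.Size([8, 128])
```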

19.
Delay Tolerant Reinforcement-Based (DTRB) is a delay-tolerant routing solution for IEEE 802.11 wireless networks that enables device-to-device data exchange without the support of any pre-existing network infrastructure. The solution uses multi-agent reinforcement learning to learn about routes in the network and to forward or replicate the messages that produce the best reward. The rewarding process is driven by a learning algorithm based on the distances between nodes, calculated as a function of the time since their last meetings. DTRB is a flooding-based delay-tolerant routing solution. Simulation results show that, in densely populated areas, DTRB delivers more messages than a traditional delay-tolerant routing solution, with similar end-to-end delay and lower network overhead.
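The time-since-last-meeting heuristic can be sketched directly: fresher meetings imply a shorter effective distance to the destination, hence a higher reward for forwarding through that neighbor. The decay function and data layout below are toy assumptions, not DTRB's exact formulas:

```python
# Illustrative sketch of a DTRB-style distance estimate for forwarding.
import time

last_meeting = {}   # (node, destination) -> timestamp of their last meeting

def record_meeting(node, dest):
    last_meeting[(node, dest)] = time.time()

def estimated_reward(node, dest, now=None):
    """Fresher meetings imply shorter distance, hence higher reward."""
    now = now or time.time()
    seen = last_meeting.get((node, dest))
    if seen is None:
        return 0.0
    return 1.0 / (1.0 + (now - seen))

def best_next_hop(neighbors, dest):
    return max(neighbors, key=lambda n: estimated_reward(n, dest))

record_meeting("B", "dst")                     # B met the destination recently
print(best_next_hop(["A", "B", "C"], "dst"))   # -> 'B'
```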

20.
This paper proposes an itinerary protocol that protects mobile agents from attacks by agent platforms. Based on the agent's travel history, the protocol allows the agent's owner, under certain conditions, to detect illegal tampering with the agent's code, state, and execution flow. The protocol offers strong security: it can detect tampering with the agent's data by platforms along the itinerary and can prevent replay attacks.
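One common way to make a travel history tamper-evident is a hash chain: each platform extends a chain over the agent's state, and the owner later replays the chain. The record format below is an illustrative assumption and omits the signatures and anti-replay nonces a full protocol would add:

```python
# Minimal sketch of travel-history-based tamper detection via a hash chain.
import hashlib

def extend_chain(prev_hash, platform_id, agent_state):
    record = prev_hash + platform_id.encode() + agent_state
    return hashlib.sha256(record).hexdigest().encode()

# The agent travels through three platforms, each appending to the history.
history, chain = [], b"genesis"
for platform, state in [("host-a", b"state-1"),
                        ("host-b", b"state-2"),
                        ("host-c", b"state-3")]:
    chain = extend_chain(chain, platform, state)
    history.append((platform, state, chain))

def owner_verifies(history):
    """Recompute the chain; any tampering with state breaks a link."""
    chain = b"genesis"
    for platform, state, recorded in history:
        chain = extend_chain(chain, platform, state)
        if chain != recorded:
            return False
    return True

print(owner_verifies(history))           # True: untampered itinerary
history[1] = ("host-b", b"evil-state", history[1][2])
print(owner_verifies(history))           # False: tampering detected
```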
