Similar Literature
20 similar documents retrieved.
1.
The reinforcement and imitation learning paradigms have the potential to revolutionise robotics. Many successful developments have been reported in the literature; however, these approaches have not been explored widely in robotics for construction. The objective of this paper is to consolidate, structure, and summarise research knowledge at the intersection of robotics, reinforcement learning, and construction. A two-strand approach to the literature review was employed: a bottom-up approach to analyse in detail a selected number of relevant publications, and a top-down approach in which a large number of papers were analysed to identify common relevant themes and research trends. This study found that research on robotics for construction has not increased significantly since the 1980s in terms of the number of publications. Also, robotics for construction lacks the development of dedicated systems, which limits their effectiveness. Moreover, unlike manufacturing, construction's unstructured and dynamic characteristics pose a major challenge for reinforcement and imitation learning approaches. This paper provides a useful starting point for understanding research on robotics for construction by (i) identifying the strengths and limitations of the reinforcement and imitation learning approaches, and (ii) contextualising the construction robotics problem; both of which will help kick-start research on the subject or boost existing research efforts.

2.
Repeated play in games by simple adaptive agents is investigated. The agents use Q-learning, a special form of reinforcement learning, to direct learning of behavioral strategies in a number of 2×2 games. The agents effectively maximize the total wealth extracted, which often leads to Pareto optimal outcomes. When the reward signals are sufficiently clear, Pareto optimal outcomes are largely achieved. This effect can select Pareto optimal outcomes that are not Nash equilibria, and it can select Pareto optimal outcomes from among multiple Nash equilibria. Acknowledgement: This material is based upon work supported, in whole or in part, by NSF grant number SES-9709548. We wish to thank an anonymous referee for a number of very helpful suggestions.
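As a concrete illustration of the setup this abstract describes, here is a minimal sketch of two independent Q-learners repeatedly playing a 2×2 game. The Prisoner's Dilemma payoffs, learning rate, and exploration level are illustrative assumptions, not the paper's actual parameters.

```python
import random

# Minimal sketch: two independent Q-learners repeatedly playing a 2x2 game.
# The Prisoner's Dilemma payoffs, learning rate, and exploration level are
# illustrative assumptions, not the parameters used in the paper.
PAYOFFS = {  # (row action, column action) -> (row reward, column reward)
    (0, 0): (3, 3),   # both cooperate
    (0, 1): (0, 5),
    (1, 0): (5, 0),
    (1, 1): (1, 1),   # both defect
}
ALPHA, EPSILON, ROUNDS = 0.1, 0.1, 50_000

q = [[0.0, 0.0], [0.0, 0.0]]  # q[player][action]; the repeated game is stateless

def choose(player):
    if random.random() < EPSILON:                    # explore
        return random.randrange(2)
    return max((0, 1), key=lambda a: q[player][a])   # exploit

for _ in range(ROUNDS):
    a0, a1 = choose(0), choose(1)
    r0, r1 = PAYOFFS[(a0, a1)]
    # Stateless Q-update: move each estimate toward the observed reward.
    q[0][a0] += ALPHA * (r0 - q[0][a0])
    q[1][a1] += ALPHA * (r1 - q[1][a1])

print(q)  # with clear reward signals the learned values reveal the Pareto structure
```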

3.
In general, the purpose of a reinforcement learning system is to learn an optimal policy. In two-player games such as Othello, on the other hand, it is important to acquire a penalty-avoiding policy that can avoid losing the game. The penalty avoiding rational policy making algorithm (PARP) is known to learn such a policy. Applying PARP to large-scale problems, however, leads to an explosion in the number of states. In this article, we focus on Othello, a game with a huge state space. We introduce several ideas and heuristics to adapt PARP to Othello, and show that our learning player beats the well-known Othello program KITTY. This work was presented, in part, at the 7th International Symposium on Artificial Life and Robotics, Oita, Japan, January 16–18, 2002.

4.
A survey of multi-agent reinforcement learning methods in the stochastic game framework
宋梅萍, 顾国昌, 张国印. 《控制与决策》 (Control and Decision), 2005, 20(10): 1081-1090
Multi-agent learning studies, within the framework of stochastic games, how multiple agents master interaction skills through self-learning. The success of single-agent reinforcement learning research, the solid mathematical foundation of game theory itself, and the broad application prospects in complex task environments have made multi-agent reinforcement learning an important topic in machine learning research. This paper first introduces the formal definitions of the basic concepts of stochastic games in multi-agent systems; it then reviews research on learning algorithms in stochastic games and repeated games, as well as other related work; finally, in light of recent developments, it surveys applications of multi-agent learning in e-commerce, robotics, and military domains, and discusses remaining open problems and future research directions.
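For reference, the formal object behind such surveys is the stochastic (Markov) game; the tuple below is the common textbook definition, not a reproduction of the survey's own notation.

```latex
% Standard n-agent stochastic (Markov) game tuple, as commonly defined
% in this literature (not the survey's own notation):
\[
  \Gamma = \bigl( N,\; S,\; \{A_i\}_{i=1}^{N},\; P,\; \{r_i\}_{i=1}^{N},\; \gamma \bigr),
\]
\[
  P : S \times A_1 \times \cdots \times A_N \to \Delta(S), \qquad
  r_i : S \times A_1 \times \cdots \times A_N \to \mathbb{R},
\]
% Each agent i seeks a policy \pi_i : S \to \Delta(A_i) maximizing its
% expected discounted return
% \mathbb{E}\bigl[\sum_{t=0}^{\infty} \gamma^{t} r_i(s_t, a_{1,t}, \dots, a_{N,t})\bigr].
```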

5.
The increasing demand for mobility in our society poses various challenges to traffic engineering, computer science in general, and artificial intelligence and multiagent systems in particular. As is often the case, it is not possible to provide additional capacity, so that a more efficient use of the available transportation infrastructure is necessary. This relates closely to multiagent systems as many problems in traffic management and control are inherently distributed. Also, many actors in a transportation system fit very well the concept of autonomous agents: the driver, the pedestrian, the traffic expert; in some cases, also the intersection and the traffic signal controller can be regarded as an autonomous agent. However, the "agentification" of a transportation system is associated with some challenging issues: the number of agents is high, typically agents are highly adaptive, they react to changes in the environment at individual level but cause an unpredictable collective pattern, and act in a highly coupled environment. Therefore, this domain poses many challenges for standard techniques from multiagent systems such as coordination and learning. This paper has two main objectives: (i) to present problems, methods, approaches and practices in traffic engineering (especially regarding traffic signal control); and (ii) to highlight open problems and challenges so that future research in multiagent systems can address them.

6.
Reinforcement learning is an optimisation technique for applications like control or scheduling problems. It is used in learning situations where success and failure of the system are the only training information. Unfortunately, we have to pay a price for this powerful ability: long training times and the instability of the learning process are not tolerable for industrial applications with large continuous state spaces. From our point of view, the integration of prior knowledge is a key mechanism for making autonomous learning practicable for industrial applications. The learning control architecture Fynesse provides a unified view of the integration of prior control knowledge into the reinforcement learning framework; in this way, other approaches in this area can be embedded into Fynesse. The key features of Fynesse are (1) the integration of prior control knowledge such as linear controllers, control characteristics or fuzzy controllers, (2) autonomous learning of control strategies, and (3) the interpretation of learned strategies in terms of fuzzy control rules. The benefits and problems of different methods for the integration of a priori knowledge are demonstrated in empirical studies. The research project Fynesse was supported by the Deutsche Forschungsgemeinschaft (DFG).
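The architecture itself is not reproduced here, but one generic way to fold a prior controller into an RL actor, in the spirit of such architectures, is to blend the prior control law with the learned policy. The gain, blending schedule, and all names below are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only: one generic way to fold prior control knowledge
# into an RL actor, in the spirit of (but not identical to) architectures
# such as Fynesse. The gain K, the blending schedule, and all names here
# are assumptions for illustration.
K = np.array([1.2, 0.4])  # prior linear controller gain (assumed)

def prior_controller(state):
    """The known control law, e.g. a stabilising linear controller u = -Kx."""
    return float(-K @ state)

def blended_action(state, learned_policy, trust):
    """Trust the prior early on; hand over to the learned policy as
    autonomous learning makes it reliable (trust ramps from 0 to 1)."""
    return (1.0 - trust) * prior_controller(state) + trust * learned_policy(state)
```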

7.
Evolutionary reinforcement learning and its application to robot path tracking
A mobile robot path-tracking control method based on adaptive heuristic critic (AHC) reinforcement learning is studied. The evaluation unit (ACE) of the AHC is implemented with a multilayer feedforward neural network, whose weights are updated by combining the TD(λ) algorithm with gradient descent. The action selection unit (ASE) of the AHC is constructed from a fuzzy inference system (FIS) optimised by a genetic algorithm. The output of the ACE network forms a secondary reinforcement signal that guides the learning of the ASE. Finally, the proposed algorithm is applied to behaviour learning for a mobile robot, effectively solving a complex path-tracking problem.
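The critic-side update described here, TD(λ) combined with gradient descent on a value-function approximator, can be sketched as follows; a linear critic stands in for the paper's multilayer network, and all constants are illustrative assumptions.

```python
import numpy as np

# Sketch of the critic-side update described here: TD(lambda) combined with
# gradient descent on a value-function approximator. A linear critic stands
# in for the paper's multilayer network; all constants are assumptions.
ALPHA, GAMMA, LAM = 0.05, 0.95, 0.8
N_FEATURES = 8
w = np.zeros(N_FEATURES)   # critic (ACE) weights
e = np.zeros(N_FEATURES)   # eligibility traces

def td_lambda_step(phi, reward, phi_next):
    """One update; phi and phi_next are state feature vectors."""
    global w, e
    delta = reward + GAMMA * (w @ phi_next) - (w @ phi)  # TD error
    e = GAMMA * LAM * e + phi                            # decay, then accumulate
    w = w + ALPHA * delta * e
    return delta  # the TD error doubles as the secondary reinforcement
                  # signal that guides the action selection unit (ASE)
```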

8.
The present study aims at contributing to the current state of the art of activity-based travel demand modelling by presenting a framework to simulate sequential data. To this end, the suitability of a reinforcement learning approach to reproduce sequential data is explored. Additionally, because traditional reinforcement learning techniques cannot learn efficiently in large state and action spaces (with respect to memory and computation time requirements) and cannot generalise from infrequent visits to all state-action pairs, the reinforcement learning technique used in most applications is enhanced by means of regression tree function approximation. Three reinforcement learning algorithms are implemented to validate their applicability: traditional Q-learning and Q-learning with bucket-brigade updating are tested against the improved reinforcement learning approach with a CART function approximator. These methods are applied to data from 26 diary days. The results are promising and show that the proposed techniques offer great potential for simulating sequential data. Moreover, the reinforcement learning approach improved by introducing a regression tree function approximator learns a more optimal solution much faster than the two traditional Q-learning approaches.
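A minimal sketch of Q-learning with a regression-tree function approximator in the spirit of this abstract, written as a batch, fitted-Q-style loop; scikit-learn's CART implementation, the transition format, feature encoding, and tree depth are assumptions rather than the authors' exact setup.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor  # a CART implementation

# Sketch of Q-learning enhanced with regression-tree function approximation,
# in the spirit of this abstract, written as a batch "fitted Q" loop. The
# transition format, feature encoding, and tree depth are assumptions.
GAMMA = 0.95
ACTIONS = (0, 1, 2)

def fitted_q(transitions, n_iters=20):
    """transitions: list of (state_vec, action, reward, next_state_vec)."""
    X = np.array([np.append(s, a) for s, a, _, _ in transitions])
    tree = None
    for _ in range(n_iters):
        y = []
        for s, a, r, s2 in transitions:
            if tree is None:
                y.append(r)  # first pass: no bootstrap target yet
            else:
                q_next = max(tree.predict([np.append(s2, a2)])[0] for a2 in ACTIONS)
                y.append(r + GAMMA * q_next)
        tree = DecisionTreeRegressor(max_depth=6).fit(X, y)  # re-grow the tree
    return tree
```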

9.
Reinforcement learning (RL) is one of the methods of solving problems defined in multiagent systems. In the real world, the state is continuous, and agents take continuous actions. Since conventional RL schemes are often defined to deal with discrete worlds, there are difficulties such as the representation of an RL evaluation function. In this article, we intend to extend an RL algorithm so that it is applicable to continuous-world problems. This extension is done by a combination of an RL algorithm and a function approximator. We employ Q-learning as the RL algorithm, and a neural network model called the normalized Gaussian network as the function approximator. The extended RL method is applied to a chase problem in a continuous world. The experimental result shows that our RL scheme was successful. This work was presented in part at the Fifth International Symposium on Artificial Life and Robotics, Oita, Japan, January 26–28, 2000.
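A hedged sketch of the combination described here, Q-learning with a normalized Gaussian network as the function approximator; the number of basis units, their centers and width, and the learning constants are illustrative assumptions.

```python
import numpy as np

# Sketch of Q-learning with a normalized Gaussian network as the function
# approximator, as described here. The number of basis units, their centers
# and width, and the learning constants are illustrative assumptions.
N_UNITS, STATE_DIM, N_ACTIONS = 20, 2, 4
centers = np.random.uniform(-1.0, 1.0, size=(N_UNITS, STATE_DIM))
SIGMA = 0.3
W = np.zeros((N_UNITS, N_ACTIONS))  # linear output weights

def ng_features(state):
    """Normalized Gaussian activations: the units' responses sum to one."""
    d2 = np.sum((centers - state) ** 2, axis=1)
    g = np.exp(-d2 / (2.0 * SIGMA ** 2))
    return g / (g.sum() + 1e-12)

def q_values(state):
    return ng_features(state) @ W   # Q(s, .) for every action

def q_update(s, a, r, s_next, alpha=0.1, gamma=0.95):
    phi = ng_features(s)
    target = r + gamma * q_values(s_next).max()
    W[:, a] += alpha * (target - q_values(s)[a]) * phi
```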

10.
At AROB5, we proposed a solution to the path planning of a mobile robot. In our approach, we formulated the problem as a discrete optimization problem at each time step. To solve the optimization problem, we used an objective function consisting of a goal term, a smoothness term, and a collision term. While the results of our simulation showed the effectiveness of our approach, the values of the weights in the objective function were not given by any theoretical method. This article presents a theoretical method using reinforcement learning for adjusting the weight parameters. We applied Williams' learning algorithm, episodic REINFORCE, to derive a learning rule for the weight parameters, and verified the learning rule through experiments. This work was presented, in part, at the Sixth International Symposium on Artificial Life and Robotics, Tokyo, Japan, January 15–17, 2001.
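The weight-tuning scheme can be sketched as follows: sample the three objective-function weights from a Gaussian policy, run an episode, and apply Williams' episodic REINFORCE update to the Gaussian mean. The toy return function, step size, and baseline below are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

# Sketch of Williams' episodic REINFORCE used to tune the three objective-
# function weights (goal, smoothness, collision terms). The weights are
# drawn from a Gaussian policy; the toy return function, step size, and
# baseline below are illustrative assumptions.
SIGMA, ALPHA, BASELINE = 0.2, 0.01, 0.0
mu = np.ones(3)  # means of the Gaussian policy over the three weights

def run_episode(weights):
    # Stand-in for the robot path-planning simulation: a toy return that
    # peaks at an arbitrary "good" weight setting. Purely illustrative.
    return -np.sum((weights - np.array([2.0, 0.5, 1.5])) ** 2)

def reinforce_step(mu):
    w = mu + SIGMA * np.random.randn(3)  # sample weights for this episode
    R = run_episode(w)
    # Episodic REINFORCE for a Gaussian mean: grad ln pi = (w - mu) / sigma^2
    return mu + ALPHA * (R - BASELINE) * (w - mu) / SIGMA ** 2

for _ in range(500):
    mu = reinforce_step(mu)
print(mu)  # drifts toward the weight setting with the highest return
```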

11.
In cognitive science, artificial intelligence, psychology, and education, a growing body of research supports the view that the learning process is strongly influenced by the learner's goals. The fundamental tenet of goal-driven learning is that learning is largely an active and strategic process in which the learner, human or machine, attempts to identify and satisfy its information needs in the context of its tasks and goals, its prior knowledge, its capabilities, and environmental opportunities for learning. This article examines the motivations for adopting a goal-driven model of learning, the relationship between task goals and learning goals, the influences goals can have on learning, and the pragmatic implications of the goal-driven learning model. It presents a new integrative framework for understanding the goal-driven learning process and applies this framework to characterizing research on goal-driven learning.

12.
Cognitive radio network (CRN) enables unlicensed users (or secondary users, SUs) to sense for and opportunistically operate in underutilized licensed channels, which are owned by the licensed users (or primary users, PUs). CRN has been regarded as the next-generation wireless network centered on the application of artificial intelligence, which helps the SUs to learn about, and to adaptively and dynamically reconfigure, their operating parameters, including the sensing and transmission channels, for network performance enhancement. This motivates the use of artificial intelligence to enhance security schemes for CRNs. Provisioning security in CRNs is challenging since existing techniques, such as entity authentication, are not feasible in the dynamic environment that CRN presents, as they require pre-registration. In addition, these techniques cannot prevent an authenticated node from acting maliciously. In this article, we advocate the use of reinforcement learning (RL) to achieve optimal or near-optimal solutions for security enhancement through the detection of various malicious nodes and their attacks in CRNs. RL, which is an artificial intelligence technique, has the ability to learn new attacks and to detect previously learned ones, and has been perceived as a promising approach to enhance the overall security aspect of CRNs. RL, which has been applied to address the dynamic aspect of security schemes in other wireless networks, such as wireless sensor networks and wireless mesh networks, can be leveraged to design security schemes in CRNs. We believe that these RL solutions will complement and enhance existing security solutions applied to CRNs. To the best of our knowledge, this is the first survey article that focuses on the use of RL-based techniques for security enhancement in CRNs.

13.
Operations research and management science are often confronted with sequential decision making problems with large state spaces. Standard methods that are used for solving such complex problems are associated with some difficulties. As we discuss in this article, these methods are plagued by the so-called curse of dimensionality and the curse of modelling. In this article, we discuss reinforcement learning, a machine learning technique for solving sequential decision making problems with large state spaces. We describe how reinforcement learning can be combined with a function approximation method to avoid both the curse of dimensionality and the curse of modelling. To illustrate the usefulness of this approach, we apply it to a problem with a huge state space: learning to play the game of Othello. We describe experiments in which reinforcement learning agents learn to play the game of Othello without the use of any knowledge provided by human experts. It turns out that the reinforcement learning agents learn to play the game of Othello better than players that use basic strategies.
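A hedged sketch of learning a board evaluation by self-play without expert knowledge, as in this abstract, using TD(0) over board positions; a linear evaluator stands in for the paper's function approximator, and the board encoding and constants are illustrative assumptions.

```python
import numpy as np

# Sketch of learning a board evaluation by self-play without expert
# knowledge, as in this abstract, using TD(0). A linear evaluator stands
# in for the paper's function approximator; the board encoding and
# constants are illustrative assumptions.
N_SQUARES = 64
w = np.zeros(N_SQUARES)  # one weight per square

def evaluate(board):
    """board: length-64 vector with +1 own disc, -1 opponent disc, 0 empty."""
    return float(w @ board)

def td_update(board, next_board, reward, alpha=0.01, gamma=1.0):
    # Move the current position's value toward the bootstrapped target;
    # reward is nonzero only at the end of the game (win/loss/draw).
    target = reward + gamma * evaluate(next_board)
    w[:] += alpha * (target - evaluate(board)) * board
```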

14.
In this paper, we first discuss the meaning of physical embodiment and the complexity of the environment in the context of multi-agent learning. We then propose a vision-based reinforcement learning method that acquires cooperative behaviors in a dynamic environment. We use the robot soccer game initiated by RoboCup (Kitano et al., 1997) to illustrate the effectiveness of our method. Each agent works with other team members to achieve a common goal against opponents. Our method estimates the relationships between a learner's behaviors and those of other agents in the environment through interactions (observations and actions) using a technique from system identification. In order to identify the model of each agent, Akaike's Information Criterion is applied to the results of Canonical Variate Analysis to clarify the relationship between the observed data in terms of actions and future observations. Next, reinforcement learning based on the estimated state vectors is performed to obtain the optimal behavior policy. The proposed method is applied to a soccer playing situation. The method successfully models a rolling ball and other moving agents and acquires the learner's behaviors. Computer simulations and real experiments are shown and a discussion is given.

15.
Mahadevan, Sridhar. Machine Learning, 1996, 22(1-3): 159-195
This paper presents a detailed study of average reward reinforcement learning, an undiscounted optimality framework that is more appropriate for cyclical tasks than the much better studied discounted framework. A wide spectrum of average reward algorithms are described, ranging from synchronous dynamic programming methods to several (provably convergent) asynchronous algorithms from optimal control and learning automata. A general sensitive discount optimality metric called n-discount-optimality is introduced and used to compare the various algorithms. The overview identifies a key similarity across several asynchronous algorithms that is crucial to their convergence, namely independent estimation of the average reward and the relative values. The overview also uncovers a surprising limitation shared by the different algorithms: while several algorithms can provably generate gain-optimal policies that maximize average reward, none of them can reliably filter these to produce bias-optimal (or T-optimal) policies that also maximize the finite reward to absorbing goal states. This paper also presents a detailed empirical study of R-learning, an average reward reinforcement learning method, using two empirical testbeds: a stochastic grid world domain and a simulated robot environment. A detailed sensitivity analysis of R-learning is carried out to test its dependence on learning rates and exploration levels. The results suggest that R-learning is quite sensitive to exploration strategies and can fall into sub-optimal limit cycles. The performance of R-learning is also compared with that of Q-learning, the best studied discounted RL method. Here, the results suggest that R-learning can be fine-tuned to give better performance than Q-learning in both domains.
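For orientation, here is a tabular sketch of R-learning's coupled updates: relative action values R(s, a) plus a running average-reward estimate ρ. The environment interface, step sizes, and exploration level are illustrative assumptions.

```python
import random
from collections import defaultdict

# Tabular sketch of R-learning's coupled updates: relative action values
# R(s, a) and a running average-reward estimate rho. The environment
# interface, step sizes, and exploration level are illustrative assumptions.
ALPHA, BETA, EPSILON = 0.1, 0.01, 0.1
R = defaultdict(float)  # relative values keyed by (state, action)
rho = 0.0               # average-reward estimate

def r_learning_step(s, actions, env_step):
    """env_step(s, a) -> (next_state, reward); actions: available actions."""
    global rho
    greedy = max(actions, key=lambda a: R[(s, a)])
    a = random.choice(actions) if random.random() < EPSILON else greedy
    s2, r = env_step(s, a)
    best_next = max(R[(s2, a2)] for a2 in actions)
    R[(s, a)] += ALPHA * (r - rho + best_next - R[(s, a)])
    if a == greedy:  # update rho only after non-exploratory actions
        best_here = max(R[(s, a2)] for a2 in actions)
        rho += BETA * (r - rho + best_next - best_here)
    return s2
```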

16.
A survey of reinforcement learning research from the perspective of knowledge representation and use
To advance reinforcement learning research and broaden its range of practical applications, reinforcement learning is classified from the perspective of its theoretical foundation: the representation and use of knowledge. Classical stochastic reinforcement learning, fuzzy reinforcement learning, qualitative reinforcement learning, and grey reinforcement learning are discussed and compared in detail. Finally, the future development of reinforcement learning is projected from the perspective of knowledge representation and use.

17.
The petrochemical industry is one of the major sectors contributing to the worldwide economy, and digital transformation is urgently needed to enhance its core competitiveness. In general, ethylene, propylene and butadiene, which are associated with synthetic chemicals, are the industry's main raw materials, accounting for around 70–80% of its cost structure. In particular, butadiene is one of the key materials for producing synthetic rubber and is used in several daily commodities. However, the price of butadiene fluctuates with demand-supply mismatches and with international economic and political events. This study proposes a two-stage data science framework to predict the weekly price of butadiene and optimize the procurement decision. The first stage develops several price prediction models using comprehensive information, including contract price, supply rate, demand rate, and upstream and downstream information. The second stage applies the analytic hierarchy process and a reinforcement learning technique to derive an optimal procurement policy and reduce the total procurement cost. An empirical study is conducted to validate the proposed framework, and the results show improved price-forecast accuracy and reduced raw-material procurement costs.

18.
The paper uses ideas from Machine Learning, Artificial Intelligence and Genetic Algorithms to provide a model of the development of a fight-or-flight response in a simulated agent. The modelled development process involves (simulated) processes of evolution, learning and representation development. The main value of the model is that it provides an illustration of how simple learning processes may lead to the formation of structures which can be given a representational interpretation. It also shows how these may form the infrastructure for closely-coupled agent/environment interaction.

19.
In this work, reinforcement learning (RL) is used to find an optimal policy for a marketing campaign. The data exhibit complex state and action spaces. Two approaches are proposed to circumvent this problem. The first is based on the self-organizing map (SOM), which is used to aggregate states. The second uses a multilayer perceptron (MLP) to carry out a regression of the action-value function. The results indicate that both approaches can improve a targeted marketing campaign. Moreover, the SOM approach allows an intuitive interpretation of the results, and the MLP approach yields robust results with generalization capabilities.
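A hedged sketch of the first approach: a self-organizing map aggregates continuous customer states into a small set of prototype cells so that a tabular RL method can run on top. The grid size, state dimension, and learning rates are illustrative assumptions, and the winner-only update is a simplification of a full SOM.

```python
import numpy as np

# Sketch of the first approach: a self-organizing map aggregates continuous
# customer states into a small set of prototype cells so that a tabular RL
# method can run on top. Grid size, dimensions, and rates are assumptions.
N_CELLS, STATE_DIM = 16, 10
protos = np.random.randn(N_CELLS, STATE_DIM)

def winner(x):
    """Index of the prototype closest to state vector x."""
    return int(np.argmin(np.sum((protos - x) ** 2, axis=1)))

def som_train(data, lr=0.5, epochs=10):
    # Simplified winner-only update; a full SOM also pulls the winner's
    # grid neighbours toward each sample.
    for _ in range(epochs):
        for x in data:
            k = winner(x)
            protos[k] += lr * (x - protos[k])
        lr *= 0.9

# After training, each customer state collapses to the cell index winner(x),
# and action values can live in a small table Q[cell, action].
```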

20.
Reinforcement-learning-based optimal control of multi-robot systems has in recent years been a frontier research area of robotics and distributed artificial intelligence. Multi-robot systems are distributed and heterogeneous and involve high-dimensional continuous spaces, which confronts reinforcement learning for multi-robot systems with a series of challenges. This paper systematically surveys progress on the related theory and algorithms. First, the basic theoretical models and optimization objectives of multi-robot reinforcement learning are described. Then, building on a comparative analysis of existing learning algorithms, the difficulties in the theory and application of multi-robot reinforcement learning and approaches to solving them are examined, and several typical problems and application examples are given. Finally, the related research is summarized and future directions are discussed.
