Similar literature
Found 20 similar documents (search time: 31 ms)
1.
Many motor skills in humanoid robotics can be learned using parametrized motor primitives. While successful applications to date have been achieved with imitation learning, most of the interesting motor learning problems are high-dimensional reinforcement learning problems. These problems are often beyond the reach of current reinforcement learning methods. In this paper, we study parametrized policy search methods and apply these to benchmark problems of motor primitive learning in robotics. We show that many well-known parametrized policy search methods can be derived from a general, common framework. This framework yields both policy gradient methods and expectation-maximization (EM) inspired algorithms. We introduce a novel EM-inspired algorithm for policy learning that is particularly well-suited for dynamical system motor primitives. We compare this algorithm, both in simulation and on a real robot, to several well-known parametrized policy search methods such as episodic REINFORCE, "Vanilla" Policy Gradients with optimal baselines, episodic Natural Actor Critic, and episodic Reward-Weighted Regression. We show that the proposed method outperforms them on an empirical benchmark of learning dynamical system motor primitives both in simulation and on a real robot. We apply it in the context of motor learning and show that it can learn a complex Ball-in-a-Cup task on a real Barrett WAM™ robot arm.

2.
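A class of generalized average-reward reinforcement learning algorithms based on eligibility traces   Total citations: 1, self-citations: 0, citations by others: 1
Dropping the assumption that the MDP is unichain or communicating, which average-reward reinforcement learning normally requires, and building on eligibility traces and the discounted-reward SARSA(λ) algorithm, this paper generalizes conventional average-reward reinforcement learning. It proposes a class of generalized average-reward reinforcement learning algorithms and reports preliminary comparative experiments on their performance.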

3.
Now that evolutionary multiobjective optimization (EMO) algorithms have proved useful for finding multiple Pareto-optimal solutions to static multiobjective optimization problems, there is a growing need to solve dynamic multiobjective optimization problems in a similar manner. In this paper, we address this issue by developing a number of test problems and by suggesting a baseline algorithm. Since the Pareto-optimal set of a dynamic multiobjective optimization problem is expected to change with time (or with the iteration of the optimization process), we present a suite of five test problems offering different patterns of such changes and different difficulties for a multiobjective optimization algorithm tracking the dynamic Pareto-optimal front. Moreover, a simple example of a dynamic multiobjective optimization problem arising from a dynamic control loop is presented. An extension to a previously proposed direction-based search method is proposed for solving such problems and tested on the proposed test problems. The test problems introduced in this paper should encourage researchers interested in multiobjective optimization and dynamic optimization problems to develop more efficient algorithms in the near future.

4.
Long-Ji Lin. Machine Learning, 1992, 8(3-4): 293-321
To date, reinforcement learning has mostly been studied on simple learning tasks, and the reinforcement learning methods studied so far typically converge slowly. The purpose of this work is thus two-fold: 1) to investigate the utility of reinforcement learning in solving much more complicated learning tasks than previously studied, and 2) to investigate methods that will speed up reinforcement learning. This paper compares eight reinforcement learning frameworks: adaptive heuristic critic (AHC) learning due to Sutton, Q-learning due to Watkins, and three extensions to both basic methods for speeding up learning. The three extensions are experience replay, learning action models for planning, and teaching. The frameworks were investigated using connectionism as an approach to generalization. To evaluate the performance of different frameworks, a dynamic environment was used as a testbed. The environment is moderately complex and nondeterministic. This paper describes these frameworks and algorithms in detail and presents empirical evaluation of the frameworks.
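
The experience replay extension mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract describes connectionist (neural network) function approximation, whereas this sketch swaps in a tabular Q-function and uniform random sampling from the buffer for brevity. The names `q_update` and `replay`, the two-step toy task, and the values of `alpha` and `gamma` are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def q_update(q, transition, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning backup on a stored transition (hypothetical helper)."""
    s, a, r, s_next, done = transition
    # Bootstrapped target: reward plus discounted best next-state value
    target = r if done else r + gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])

def replay(q, buffer, actions, n_samples, rng):
    """Reuse past experience: resample stored transitions and back them up again."""
    for _ in range(n_samples):
        q_update(q, rng.choice(buffer), actions)

# Toy two-step task (assumed): s0 -> s1 -> goal, with reward 1 on reaching the goal
actions = [0, 1]
q = defaultdict(float)
buffer = [("s0", 1, 0.0, "s1", False), ("s1", 1, 1.0, "goal", True)]
replay(q, buffer, actions, n_samples=200, rng=random.Random(0))
```

Only two real transitions were ever collected here, yet replaying them repeatedly lets the reward at the goal propagate back to `s0`, which is the speed-up that motivates reusing experience.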

5.
In addition to satisfying several competing objectives, many real-world applications are also characterized by a certain degree of noise, manifesting itself in the form of signal distortion or uncertain information. In this paper, extensive studies are carried out to examine the impact of noisy environments in evolutionary multiobjective optimization. Three noise-handling features are then proposed based upon the analysis of empirical results: an experiential learning directed perturbation operator that adapts the magnitude and direction of variation according to past experiences for fast convergence, a gene adaptation selection strategy that helps the evolutionary search escape from local optima or premature convergence, and a possibilistic archiving model based on the concept of possibility and necessity measures to deal with uncertainty. In addition, the performance of various multiobjective evolutionary algorithms in noisy environments, as well as the robustness and effectiveness of the proposed features, is examined on five benchmark problems characterized by different difficulties in local optimality, nonuniformity, discontinuity, and nonconvexity.

6.
Detecting moving shadows: algorithms and evaluation   Total citations: 28, self-citations: 0, citations by others: 28
Moving shadows need careful consideration in the development of robust dynamic scene analysis systems. Moving shadow detection is critical for accurate object detection in video streams since shadow points are often misclassified as object points, causing errors in segmentation and tracking. Many algorithms have been proposed in the literature that deal with shadows. However, a comparative evaluation of the existing approaches is still lacking. In this paper, we present a comprehensive survey of moving shadow detection approaches. We organize contributions reported in the literature into four classes: two are statistical and two are deterministic. We also present a comparative empirical evaluation of representative algorithms selected from these four classes. Novel quantitative (detection and discrimination rate) and qualitative metrics (scene and object independence, flexibility to shadow situations, and robustness to noise) are proposed to evaluate these classes of algorithms on a benchmark suite of indoor and outdoor video sequences. These video sequences and associated "ground-truth" data are made available at http://cvrr.ucsd.edu/aton/shadow to allow others in the community to experiment with new algorithms and metrics.

7.
An Evolutionary Approach to Multiobjective Clustering   Total citations: 6, self-citations: 0, citations by others: 6
The framework of multiobjective optimization is used to tackle the unsupervised learning problem of data clustering, following a formulation first proposed in the statistics literature. The conceptual advantages of the multiobjective formulation are discussed and an evolutionary approach to the problem is developed. The resulting algorithm, multiobjective clustering with automatic k-determination, is compared with a number of well-established single-objective clustering algorithms, a modern ensemble technique, and two methods of model selection. The experiments demonstrate that the conceptual advantages of multiobjective clustering translate into practical and scalable performance benefits.

8.
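The vehicle routing problem is a core problem in logistics and transportation optimization; its goal is to find the lowest-cost vehicle route plan that satisfies customer demand. As the scale of logistics operations keeps growing, the vehicle routing problem becomes harder to solve and real-time requirements become stricter, so existing conventional algorithms no longer meet practical needs. In recent years, reinforcement learning has become an important approach to solving the vehicle routing problem. After briefly reviewing conventional methods for the vehicle routing problem, this paper focuses on summarizing reinforcement-learning-based algorithms for it, classifying them as dynamic-programming-based, value-based, or policy-based. Finally, future research directions for the problem are discussed.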

9.
Dynamic consolidation of virtual machines (VMs) is an efficient approach for improving the utilization of physical resources and reducing energy consumption in cloud data centers. Despite the large volume of research published on this topic, there are very few open-source software systems implementing dynamic VM consolidation. In this paper, we propose an architecture and open-source implementation of OpenStack Neat, a framework for dynamic VM consolidation in OpenStack clouds. OpenStack Neat can be configured to use custom VM consolidation algorithms and transparently integrates with existing OpenStack deployments without the necessity of modifying their configuration. In addition, to foster and encourage further research efforts in the area of dynamic VM consolidation, we propose a benchmark suite for evaluating and comparing dynamic VM consolidation algorithms. The proposed benchmark suite comprises OpenStack Neat as the base software framework, a set of real-world workload traces, performance metrics and evaluation methodology. As an application of the proposed benchmark suite, we conduct an experimental evaluation of OpenStack Neat and several dynamic VM consolidation algorithms on a five-node testbed, which shows significant benefits of dynamic VM consolidation resulting in up to 33% energy savings. Copyright © 2014 John Wiley & Sons, Ltd.

10.
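Image super-resolution reconstruction is an important technique for improving image quality. Thanks to the successful application and rapid development of deep learning in computer vision, the results of single-image super-resolution reconstruction have improved markedly. This paper therefore presents an in-depth study of deep-learning-based single-image super-resolution reconstruction methods. It first gives an overview of the benchmark datasets, performance evaluation metrics, loss functions, and related background used in the field. It then discusses, by category, the latest algorithms for single-image super-resolution reconstruction under supervised and unsupervised learning, and compares the similarities, differences, strengths, and weaknesses of the different models. Finally, it summarizes the problems facing the field and its future directions.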

11.
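Mobile edge computing is one way to meet the demands of computation-intensive robot tasks. Traditional algorithms rely on intelligent optimization algorithms or convex optimization and need long iteration times. Deep reinforcement learning can produce a solution in a single forward pass, but only for a fixed number of robots. Based on an analysis of deep reinforcement learning, this paper normalizes the input before the input layer of the deep reinforcement learning network and adds a convolutional layer after the output layer, so that the network adapts to the offloading demands of a dynamically changing number of mobile robots. Finally, simulation experiments comparing the approach with an adaptive genetic algorithm and with reinforcement learning verify the effectiveness and feasibility of the proposed algorithm.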

12.
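Deep reinforcement learning, one of the latest results in the development of machine learning, has already made its mark in many application areas. Research on deep reinforcement learning algorithms and applications has produced many classic algorithms and typical application domains. Applied to smart manufacturing, deep reinforcement learning can achieve high-level control in complex environments. This paper gives an overview of deep reinforcement learning research and introduces its basic principles, covering both deep learning and reinforcement learning. It introduces the theoretical methods behind applying deep reinforcement learning algorithms and, on that basis, presents the algorithms by category: value-function-based and policy-gradient-based reinforcement learning algorithms, listing the main developments in both classes along with other related results. It then analyzes, by category, typical applications of deep reinforcement learning in smart manufacturing. Finally, it discusses open problems and future directions for deep reinforcement learning.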

13.
In the bioinformatics community, it is important to find an accurate, simultaneous alignment among diverse biological sequences that are assumed to have an evolutionary relationship. From the alignment, sequence homology is inferred and the shared evolutionary origins among the sequences are extracted by using phylogenetic analysis. This problem is known as the multiple sequence alignment (MSA) problem. In the literature, several approaches have been proposed to solve the MSA problem, such as progressive alignment methods, consistency-based algorithms, or genetic algorithms (GAs). In this work, we propose a hybrid multiobjective evolutionary algorithm based on the behaviour of honey bees for solving the MSA problem: the hybrid multiobjective artificial bee colony (HMOABC) algorithm. HMOABC considers two objective functions with the aim of preserving the quality and consistency of the alignment: the weighted sum-of-pairs function with affine gap penalties (WSP) and the number of totally conserved (TC) columns score. In order to assess the accuracy of HMOABC, we have used the BAliBASE benchmark (version 3.0), which according to its developers presents more challenging test cases representing the real problems encountered when aligning large sets of complex sequences. Our multiobjective approach has been compared with 13 well-known methods in the bioinformatics field and with 6 other evolutionary algorithms published in the literature.

14.
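Infrared and visible image fusion is an important area of machine vision with wide everyday applications. Although many fusion algorithms have appeared in recent years, the field still lacks an algorithm framework and a fusion benchmark capable of measuring the performance of multiple fusion algorithms. After a brief overview of recent progress in infrared and visible image fusion, this paper proposes an infrared and visible image fusion benchmark that extends VIFB, consisting of 56 image pairs, 32 fusion algorithms, and 16 evaluation metrics. Extensive experiments are carried out on this benchmark to evaluate the performance of the selected fusion algorithms. Through analysis of the qualitative and quantitative results, the image fusion algorithms with good performance are identified, and the future prospects of infrared and visible image fusion are discussed.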

15.
The box-covering method is widely used for measuring the fractal property of complex networks. Finding the minimum number of boxes needed to tile a network is known to be an NP-hard problem, and many algorithms have been proposed to solve it. All current box-covering algorithms regard minimizing the box number as the only objective. However, the fractal modularity of the network partition produced by the box-covering method has been shown to be strongly related to information transportation in complex networks. Maximizing the fractal modularity is therefore also important in the box-covering method, and it can be divided into two objectives: maximization of ratio association and minimization of ratio cut. In this paper, to minimize the box number and maximize the fractal modularity at the same time, a multiobjective discrete particle swarm optimization box-covering (MOPSOBC) algorithm is proposed. The MOPSOBC algorithm applies the decomposition approach to the two objectives to approximate the Pareto front. The proposed MOPSOBC algorithm has been applied to six benchmark networks and compared with state-of-the-art algorithms, including two classical box-covering algorithms, four single-objective optimization algorithms, and six multiobjective optimization algorithms. The experimental results show that the MOPSOBC algorithm obtains box numbers similar to those of the current best algorithm, and that it outperforms the state-of-the-art algorithms on fractal modularity and normalized mutual information.

16.
The field of Metaheuristics has produced a large number of algorithms for continuous, black-box optimization. In contrast, there are few standard benchmark problem sets, limiting our ability to gain insight into the empirical performance of these algorithms. Clustering problems have been used many times in the literature to evaluate optimization algorithms. However, much of this work has occurred independently on different problem instances, and the various experimental methodologies used have produced results which are frequently incomparable and provide little knowledge regarding the difficulty of the problems used, or any platform for comparing and evaluating the performance of algorithms. This paper discusses sum of squares clustering problems from the optimization viewpoint. Properties of the fitness landscape are analysed and it is proposed that these problems are highly suitable for algorithm benchmarking. A set of 27 problem instances (from 4-D to 40-D), based on three well-known datasets, is specified. Baseline experimental results are presented for the Covariance Matrix Adaptation-Evolution Strategy and several other standard algorithms. A web-repository has also been created for this problem set to facilitate future use for algorithm evaluation and comparison.
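
To make the optimization viewpoint concrete, the sketch below writes sum of squares clustering as a continuous black-box objective. It is only an illustration under assumptions: the encoding of the k cluster centers as one flat vector, the function name `sum_of_squares`, and the four-point toy dataset are not taken from the paper. Under this encoding, a problem with k centers in d-dimensional data has a k × d dimensional search space.

```python
import numpy as np

def sum_of_squares(flat_centers, data, k):
    """Sum over all points of the squared distance to the nearest center.

    flat_centers: candidate solution, k centers concatenated into one vector
                  (assumed encoding, so an optimizer sees a plain real vector).
    data:         array of shape (n_points, d).
    """
    centers = np.asarray(flat_centers, dtype=float).reshape(k, data.shape[1])
    # Squared distance from every point to every center, shape (n_points, k)
    d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())

# Toy data (assumed): two well-separated pairs of points in 2-D
data = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
x = np.array([0.0, 1.0, 10.0, 1.0])  # centers (0, 1) and (10, 1)
value = sum_of_squares(x, data, k=2)
```

Any continuous optimizer can then be evaluated by handing it `sum_of_squares` with `data` and `k` fixed; the `min` over centers is what makes the landscape non-smooth and multimodal.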

17.
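Reinforcement learning addresses optimal decision-making problems in the model-free setting and is one of the key technologies for realizing artificial intelligence, but traditional tabular reinforcement learning methods struggle with control problems that have large-scale or continuous spaces. Approximate reinforcement learning, inspired by function approximation, represents the value function or policy function in parametrized form and obtains the optimal behaviour policy indirectly through parameter optimization; it has shown strong results in video games, board-game competition, and robot control. On this basis, this paper reviews the state of research on approximate reinforcement learning algorithms and the progress of their applications. It introduces the basic theory of approximate reinforcement learning, summarizes by category the classic algorithms and some corresponding improvements, outlines research progress in robot control, and summarizes several major problems currently faced, providing a reference for subsequent research.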

18.
Hyper-heuristics are a relatively new class of optimisation algorithms. Numerous studies have reported that hyper-heuristics work well on combinatorial optimisation problems. The row layout problem is a classic combinatorial optimisation problem, yet no published work has applied hyper-heuristics to its various sub-problems. To fill this gap, this study proposes a parallel hyper-heuristic approach based on reinforcement learning for corridor allocation problems and parallel row ordering problems. For the proposed algorithm, an outer-layer parallel computing framework was constructed based on the encoding of the problem. Simulated annealing, tabu search, and variable neighbourhood algorithms were used as low-level heuristic operations, and Q-learning was used as the high-level strategy. A state space containing sequences and fitness values was designed. The algorithm performance was then evaluated on benchmark instances of the corridor allocation problem (37 groups) and the parallel row ordering problem (80 groups). The results showed that, in most cases, the proposed algorithm provided a better solution than the best-known solutions in the literature. Finally, the three meta-heuristics used as low-level heuristic operations were run as three independent algorithms and compared with the proposed hyper-heuristic algorithm on four groups of parallel row ordering problem instances. The effectiveness of Q-learning in selection is illustrated by analysing the comparison results of the four algorithms and the number of calls to each of the three low-level heuristic operations in the proposed method.

19.
A considerable number of constrained optimization evolutionary algorithms (COEAs) have been proposed due to increasing interest in solving constrained optimization problems (COPs) by evolutionary algorithms (EAs). In this paper, we first review existing COEAs. Then, a novel EA for constrained optimization is presented. In the process of population evolution, our algorithm is based on multiobjective optimization techniques, i.e., an individual in the parent population may be replaced if it is dominated by a nondominated individual in the offspring population. In addition, three models of a population-based algorithm-generator and an infeasible solution archiving and replacement mechanism are introduced. Furthermore, the simplex crossover is used as a recombination operator to enrich the exploration and exploitation abilities of the proposed approach. The new approach is tested on 13 well-known benchmark functions, and the empirical evidence suggests that it is robust, efficient, and generic when handling linear/nonlinear equality/inequality constraints. Compared with some other state-of-the-art algorithms, our algorithm outperforms them markedly in terms of the best, mean, and worst objective function values and the standard deviations. It is noteworthy that our algorithm does not require the transformation of equality constraints into inequality constraints.
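
The replacement rule stated in this abstract (a parent may be replaced if it is dominated by a nondominated offspring) can be sketched as follows. This is a hedged illustration, not the authors' algorithm: it assumes each individual is summarized by a tuple of objective values to be minimized, for example (objective value, constraint violation), and the helper names `dominates`, `nondominated`, and `replace_dominated` are hypothetical.

```python
import random

def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated(points):
    """Members of points that no other member dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def replace_dominated(parents, offspring, rng=random):
    """Let each nondominated offspring replace one parent it dominates, if any."""
    parents = list(parents)
    for child in nondominated(offspring):
        dominated_idx = [i for i, p in enumerate(parents) if dominates(child, p)]
        if dominated_idx:
            # Assumed tie-break: pick one dominated parent at random
            parents[rng.choice(dominated_idx)] = child
    return parents

# Tuples are (objective value, constraint violation), both minimized (assumed)
parents = [(5.0, 1.0), (1.0, 0.0)]
offspring = [(4.0, 0.5), (6.0, 2.0)]
new_parents = replace_dominated(parents, offspring)
```

Here `(4.0, 0.5)` is the only nondominated offspring and it dominates only the first parent, so the feasible parent `(1.0, 0.0)` survives unchanged.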

20.
The loss of measurements used for controller scheduling or envelope protection in modern flight control systems due to sensor failures leads to a challenging fault-tolerant control law design problem. In this article, an approach to design such a robust fault-tolerant control system, including full envelope protections using multiobjective optimization techniques, is proposed. The generic controller design and controller verification problems are derived and solved using novel multiobjective hybrid genetic optimization algorithms. These algorithms combine the multiobjective genetic search strategy with local, single-objective optimization to improve convergence speed. The proposed strategies are applied to the design of a fault-tolerant flight control system for a modern civil aircraft. The results of an industrial controller verification and validation campaign using an industrial benchmark simulator are reported.
