Found 20 similar documents; search took 15 ms.
1.
Autonomous Robots - This paper presents a learning-based method that uses simulation data to learn an object manipulation task using two model-free reinforcement learning (RL) algorithms. The...
2.
This paper studies evolutionary programming and adopts reinforcement learning theory to learn individual mutation operators. A novel algorithm named RLEP (Evolutionary Programming based on Reinforcement Learning) is proposed. In this algorithm, each individual learns its optimal mutation operator based on the immediate and delayed performance of mutation operators. Mutation operator selection is mapped into a reinforcement learning problem. Reinforcement learning methods are used to learn optimal policies by maximizing the accumulated rewards. According to the calculated Q function value of each candidate mutation operator, an optimal mutation operator can be selected to maximize the learned Q function value. Four different mutation operators have been employed as the basic candidate operators in RLEP and one is selected for each individual in different generations. Our simulation shows the performance of RLEP is the same as or better than the best of the four basic mutation operators.
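The operator-selection idea in this abstract can be illustrated with a minimal sketch. It assumes a single-state Q-table per individual, epsilon-greedy exploration, and a standard Q-learning update; the function names `select_operator` and `update_q`, the parameters `epsilon`, `alpha`, and `gamma`, and the use of fitness improvement as the reward are all hypothetical choices for illustration, not details given in the abstract.

```python
import random


def select_operator(q_values, epsilon, rng):
    """Return an operator index: explore with probability epsilon, else take the argmax of Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


def update_q(q_values, op, reward, alpha=0.1, gamma=0.9):
    """Single-state Q-learning update (an assumption, not the paper's stated rule).

    The target combines the immediate reward with the discounted best
    current Q value, standing in for the "delayed" performance.
    """
    target = reward + gamma * max(q_values)
    q_values[op] += alpha * (target - q_values[op])
    return q_values


if __name__ == "__main__":
    rng = random.Random(0)
    q = [0.0, 0.0, 0.0, 0.0]  # one Q value per candidate mutation operator
    op = select_operator(q, epsilon=0.2, rng=rng)
    # Hypothetical reward: fitness improvement produced by the chosen operator.
    update_q(q, op, reward=1.0)
    print(op, q)
```

In a full evolutionary loop, each individual would hold its own `q_values` list and call these two functions once per generation.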
3.
Recently much attention has been paid to intelligent systems which can adapt themselves to dynamic and/or unknown environments
by the use of learning methods. However, traditional learning methods have the disadvantage that learning time grows
enormously with the complexity of the systems and environments considered. We thus propose a novel reinforcement
learning method based on adaptive immunity. Our proposed method can provide a near-optimal solution with less learning time
by self-learning using the concept of adaptive immunity. The validity of our method is demonstrated through some simulations
with Sutton’s maze problem.
This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February
2, 2008
4.
Intelligent Service Robotics - Intelligent object manipulation for grasping is a challenging problem for robots. Unlike robots, humans almost immediately know how to manipulate objects for grasping...
5.
Artificial Life and Robotics - Decision making is an essential component of autonomous vehicle technology and has received significant attention from academic and industry organizations. One of the...
6.
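The tableau method is a reasoning method with strong generality and applicability, but restrictions involving function symbols, equality, and the like make automated reasoning nondeterministic. To address the blindness of closed-set construction in tableau reasoning, a method that applies reinforcement learning to automated tableau reasoning is proposed. The method combines the logical formulas that arise during tableau reasoning with reinforcement learning to produce abstract states and actions. On the one hand, learning can control the order of inference steps so that reasonable closed branches are formed, which reduces the blindness of reasoning; on the other hand, complex inferences can reuse the results of simpler ones, which improves reasoning efficiency.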
7.
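CRUSH, the data distribution algorithm of Ceph cloud storage, distributes data unevenly across storage nodes, which degrades read/write QoS. To address this, a reinforcement-learning-based data distribution method is proposed. An analysis of the algorithm's own distribution process shows that insufficiently balanced distribution of PGs among OSDs is the cause of the uneven data distribution. On this basis, a reinforcement learning model is built and trained to adjust the OSD weights used during PG distribution, so that PGs are distributed more evenly across the OSD nodes...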
8.
In order to serve people and support them in daily life, a domestic or service robot needs to accommodate itself to various
individuals. Emotional and intelligent human–robot interaction plays an important role for a robot to gain the attention of its
users. Facial expression recognition is a key factor in interactive robotic applications. In this paper, an image-based facial
expression recognition system that adapts online to a new face is proposed. The main idea of the proposed learning algorithm
is to adjust parameters of the support vector machine (SVM) hyperplane for learning facial expressions of a new face. After
mapping the input space to Gaussian-kernel space, support vector pursuit learning (SVPL) is employed to retrain the hyperplane
in the new feature space. To expedite the retraining step, we propose to retrain a new SVM classifier by using only samples
classified incorrectly in the previous iteration in combination with critical historical sets. After adjusting the hyperplane
parameters, the new classifier will recognize previously unrecognizable facial datasets more effectively. Experiments using
an embedded imaging system show that the proposed system recognizes new facial datasets with a recognition rate of 92.7%.
Furthermore, it also maintains a satisfactory recognition rate of 82.6% on old facial samples.
9.
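To improve the control of delay systems in mobile sensor networks, a reinforcement-learning-based control model for mobile sensor network delay systems is proposed. A high-order approximate differential equation is used to construct the control objective function of the delay system, and maximum likelihood estimation is used to estimate the network's delay parameters. Reinforcement learning is applied to convergence control and adaptive scheduling of the network, a multidimensional measure information registration model for delay-system control is built, and adaptive control of the delay system is achieved under a reinforced tracking-learning optimization mode. Simulation results show that the method yields good adaptivity, high accuracy in delay parameter estimation, and strong robustness of the control process.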
10.
The Journal of Supercomputing - The Internet of Things (IoT) has developed a well-defined infrastructure due to commercializing novel technologies. IoT networks enable smart devices to compile...
12.
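To ensure that a BDI agent can carry out goal-based action-sequence decision tasks in dynamic, complex environments, an AND/OR graph is used to describe the intention decision structure, which links goals to the plans that achieve them. Based on this structure, three reinforcement-learning-based action planning strategies are proposed: single-step planning for a myopic BDI agent, multi-step planning for a far-sighted BDI agent, and optimal planning for a perfection-seeking BDI agent. Compared with traditional BDI agent systems, this new intention decision mode overcomes the shortcomings of plan abstraction and is easy to implement in code.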
13.
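Current routing algorithms perform poorly in wide-area networks because they cannot adapt to varied topologies and unbalanced loads. To address this, RLAR, an adaptive routing algorithm based on reinforcement learning and implemented with gradient ascent, is proposed. Reinforcement learning means learning a policy, that is, constructing a mapping from states to actions based on feedback from the environment; in essence, it evaluates a set of policies through trial interactions with the environment. Applying reinforcement learning to network routing optimization offers a new line of approach for routing research. In comparisons with several existing routing algorithms, experimental results show that RLAR effectively improves network routing performance.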
14.
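In autonomous systems, agents carry out their assigned tasks by interacting with the environment, and hierarchical reinforcement learning helps agents learn more efficiently in large, complex environments. A new method is proposed that uses the ant system optimization algorithm to identify hierarchical boundaries and discover subgoal states: ants deposit pheromone as they traverse the environment, roughness is defined from the rate of change of the pheromone, and roughness is used to delimit subgoals. The agent uses the discovered subgoals to create abstractions, which lets it explore more effectively. The algorithm is evaluated in the taxi environment, and experimental results show that the method significantly improves the agent's learning efficiency.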
15.
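The timetabling problem is a typical combinatorial optimization and uncertain scheduling problem, and course scheduling is one form of it. This paper analyzes the mathematical model of the course scheduling problem, studies how the Q-learning algorithm from reinforcement learning can be combined with neural network techniques to solve university course timetabling, presents a course scheduling model based on this algorithm, and analyzes and discusses its scheduling results.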
16.
We investigate the problem of a robot searching for an object. This requires reasoning about both perception and manipulation: some objects are moved because the target may be hidden behind them, while others are moved because they block the manipulator’s access to other objects. We contribute a formulation of the object search by manipulation problem using visibility and accessibility relations between objects. We also propose a greedy algorithm and show that it is optimal under certain conditions. We propose a second algorithm which takes advantage of the structure of the visibility and accessibility relations between objects to quickly generate plans. Our empirical evaluation strongly suggests that our algorithm is optimal under all conditions. We support this claim with a partial proof. Finally, we demonstrate an implementation of both algorithms on a real robot using a real object detection system.
17.
World Wide Web - With the sharing economy boom, there is a notable increase in the number of car-sharing corporations, which provided a variety of travel options and improved convenience and...
18.
This paper investigates the maintenance problem for a flow line system consisting of two machines in series with a finite intermediate buffer. Both machines deteriorate independently as they operate, resulting in multiple yield levels. Resource-constrained imperfect preventive maintenance actions may bring a machine back to a better state. The problem is modeled as a semi-Markov decision process. A distributed multi-agent reinforcement learning algorithm is proposed to solve the problem and to obtain the control-limit maintenance policy for each machine, associated with the observed state represented by yield level and buffer level. An asynchronous updating rule is used in the learning process since the state transitions of the two machines are not synchronous. An experimental study is conducted to evaluate the efficiency of the proposed algorithm.
19.
This paper first proposes a bilateral optimized negotiation model based on reinforcement learning. The model negotiates on the issues of price and quantity, introduces a mediator agent as the mediation mechanism, and uses an improved reinforcement learning negotiation strategy to produce the optimal proposal. To further improve negotiation performance, the paper then proposes a negotiation method based on the adaptive learning of the mediator agent. Simulation results show that the proposed negotiation methods improve the efficiency and performance of the negotiation.
20.
Active authentication of mobile devices such as smartphones and iPads promises to enhance the security of access to confidential data or systems. In this paper, we propose an active authentication scheme, which exploits the physical-layer properties of ambient radio signals to identify mobile devices in indoor environments. More specifically, we discriminate mobile devices in different locations by analyzing the ambient radio sources, because the received signal strength indicator set of the ambient signals measured by a smartphone is usually different from that observed by its spoofer located in another area. We formulate the interactions between the legitimate mobile device and its spoofer as an active authentication game, in which the receiver chooses its test threshold in the hypothesis test in the spoofing detection, while the spoofer chooses its attack strength. In a dynamic radio environment with unknown attack parameters, we propose a learning-based authentication algorithm based on the physical-layer properties of the ambient radio environments. Simulation results show that the proposed scheme accurately detects spoofers in typical indoor environments.
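The threshold-based hypothesis test described in this abstract can be illustrated with a minimal sketch. The abstract does not give the test statistic, so the Euclidean distance between two RSSI sets is an assumption here, and the names `rssi_distance` and `is_spoofer` and the example readings are hypothetical.

```python
import math


def rssi_distance(observed, reference):
    """Euclidean distance between two equal-length RSSI sets (in dBm).

    The choice of Euclidean distance is an assumption for illustration.
    """
    if len(observed) != len(reference):
        raise ValueError("RSSI sets must cover the same ambient sources")
    return math.sqrt(sum((o - r) ** 2 for o, r in zip(observed, reference)))


def is_spoofer(observed, reference, threshold):
    """Flag the claimant as a spoofer when its RSSI set is too far from the reference."""
    return rssi_distance(observed, reference) > threshold


if __name__ == "__main__":
    reference = [-50.0, -60.0, -70.0]  # hypothetical readings at the receiver
    claimant = [-40.0, -60.0, -70.0]   # hypothetical readings reported by the device
    print(is_spoofer(claimant, reference, threshold=5.0))
```

In the game formulation, the receiver would tune `threshold` over time while the spoofer adapts its attack strength; the learning rule for doing so is not specified in the abstract and is left out of this sketch.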