Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
In this paper, we propose fuzzy logic-based cooperative reinforcement learning for sharing knowledge among autonomous robots. The ultimate goal of this paper is to entice bio-insects towards desired goal areas using artificial robots without any human aid. To achieve this goal, we found an interaction mechanism using a specific odor source and performed simulations and experiments [1]. For efficient learning without human aid, we employ cooperative reinforcement learning in a multi-agent domain. Additionally, we design a fuzzy logic-based expertise measurement system to enhance the learning ability. This structure enables the artificial robots to share knowledge while evaluating and measuring the performance of each robot. The performance of the proposed learning algorithms is evaluated through numerous experiments.

2.
Agents negotiate depending on individual perceptions of facts, events, trends and special circumstances that define the negotiation context. The negotiation context affects each agent's preferences, bargaining strategies and resulting benefits in different ways, given the possible negotiation outcomes. Despite the relevance of the context, the existing literature on automated negotiation says little about how to account for it in learning and adapting negotiation strategies. In this paper, a novel contextual representation of the negotiation setting is proposed, where an agent resorts to private and public data to negotiate using an individual perception of its necessity and risk. A context-aware negotiation agent that learns through Self-Play and Reinforcement Learning (RL) how to use key contextual information to gain a competitive edge over its opponents is discussed at two levels of temporal abstraction. Learning to negotiate in an Eco-Industrial Park (EIP) is presented as a case study. In the Peer-to-Peer (P2P) market of an EIP, two instances of context-aware agents, in the roles of a buyer and a seller, are set to bilaterally negotiate exchanges of electrical energy surpluses over a discrete timeline. The aim is to demonstrate that they can profit from learning to choose a negotiation strategy while selfishly accounting for contextual information under different circumstances in a data-driven way. Furthermore, several negotiation episodes are conducted in the proposed EIP between a context-aware agent and other types of agents proposed in the existing literature. The results highlight that context-aware agents not only selfishly reap higher benefits, but also promote social welfare as they resort to contextual information while learning to negotiate.

3.
This paper introduces adaptive reinforcement learning (ARL) as the basis for a fully automated trading system application. The system is designed to trade foreign exchange (FX) markets and relies on a layered structure consisting of a machine learning algorithm, a risk management overlay and a dynamic utility optimization layer. An existing machine-learning method called recurrent reinforcement learning (RRL) was chosen as the underlying algorithm for ARL. One of the strengths of our approach is that the dynamic optimization layer makes a fixed choice of model tuning parameters unnecessary. It also allows for a risk-return trade-off to be made by the user within the system. The trading system is able to make consistent gains out-of-sample while avoiding large draw-downs.

4.
5.
While driving a vehicle safely at its handling limit is essential for Level 5 autonomous vehicles, it is a very challenging task for current conventional methods. Therefore, this study proposes a novel controller of trajectory planning and motion control for autonomous driving through manifold corners at the handling limit to improve the speed and shorten the lap time of the vehicle. The proposed controller combines the advantages of a conventional model-based control algorithm, a model-free reinforcement learning algorithm, and prior expert knowledge to improve the training efficiency for autonomous driving in extreme conditions. The reward shaping of this algorithm refers to the procedure and experience of race training of professional drivers in real time. After training on track maps that exhibit different levels of difficulty, the proposed controller implemented a superior strategy compared to the original reference trajectory, and can generalize to other tougher maps based on the basic driving knowledge learned from the simpler map, which verifies its superiority and extensibility. We believe this technology can be further applied to daily life to expand the application scenarios and maneuvering envelopes of autonomous vehicles.

6.
The Journal of Supercomputing - Pairs trading is an effective statistical arbitrage strategy considering the spread of paired stocks in a stable cointegration relationship. Nevertheless, rapid...

7.
At actual working sites, equipment often operates under different working conditions while the manufacturing system is rather complicated. However, in the fault diagnosis domain, traditional multi-label learning methods need to use a pre-defined label sequence or synchronously predict all labels of the input sample. Deep reinforcement learning (DRL) combines the perception ability of deep learning and the decision-making ability of reinforcement learning. Moreover, the curriculum learning mechanism follows the human approach of learning from easy to complex. Consequently, an improved proximal policy optimization (PPO) method, PPO being a typical DRL algorithm, is proposed in this paper as a novel method for multi-label classification. The improved PPO method can build a relationship between the predicted labels of an input sample through an action history vector, which encodes all actions the agent has selected up to the current time step. In two rolling bearing experiments, the diagnostic results demonstrate that the proposed method provides higher accuracy than traditional multi-label methods on fault recognition under complicated working conditions. In addition, compared with the same network using a pre-defined label sequence, the proposed method can distinguish the multiple labels of input samples following the curriculum mechanism from easy to complex.
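
The action history vector described in this abstract can be pictured with a small sketch. It assumes (the abstract does not specify this) that the history is a multi-hot vector with one slot per label and that an extra "stop" action ends the episode. The names `predict_labels`, `score_fn` and `toy_scores` are hypothetical, and `toy_scores` merely stands in for a trained PPO policy.

```python
from typing import Callable, List, Sequence


def predict_labels(
    features: Sequence[float],
    n_labels: int,
    score_fn: Callable[[Sequence[float], List[int]], List[float]],
) -> List[int]:
    """Pick labels one at a time; the history tells the policy what was already chosen."""
    history = [0] * n_labels   # multi-hot action history (assumed encoding)
    stop_action = n_labels     # hypothetical extra action that ends the episode
    chosen: List[int] = []
    for _ in range(n_labels):
        # One score per label plus one for "stop", conditioned on the history.
        scores = score_fn(features, history)
        # Mask labels that were already selected so they cannot be picked twice.
        candidates = [
            a for a in range(n_labels + 1)
            if a == stop_action or history[a] == 0
        ]
        action = max(candidates, key=lambda a: scores[a])
        if action == stop_action:
            break
        history[action] = 1
        chosen.append(action)
    return chosen


def toy_scores(features: Sequence[float], history: List[int]) -> List[float]:
    """Stand-in for a policy network: label scores are the features, "stop" scores 0.5."""
    return list(features) + [0.5]
```

Because labels are emitted in order of decreasing score, the rollout needs no pre-defined label sequence; a real policy would also let `history` change the scores rather than ignore it as `toy_scores` does.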

8.
Tian Hao, Xu Xiaolong, Lin Tingyu, Cheng Yong, Qian Cheng, Ren Lei, Bilal Muhammad. World Wide Web, 2022, 25(5): 1769-1792
World Wide Web - The ubiquitous Internet of Things (IoTs) devices spawn growing mobile services of applications with computationally-intensive and latency-sensitive features, which increases the...

9.
Intelligent Service Robotics - Intelligent object manipulation for grasping is a challenging problem for robots. Unlike robots, humans almost immediately know how to manipulate objects for grasping...

10.
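Deep inverse reinforcement learning is a new research hotspot in machine learning. It addresses the difficulty of obtaining the reward function in deep reinforcement learning by reconstructing the reward function from expert demonstration trajectories. This survey first introduces classic algorithms from three categories of deep reinforcement learning methods. It then describes classic inverse reinforcement learning algorithms, including methods based on apprenticeship learning, maximum margin planning, structured classification, and probabilistic model formulations. Next, it reviews several frontier directions of deep inverse reinforcement learning, including approaches based on the maximum margin method, on deep Q-networks, and on the maximum entropy model, as well as inverse reinforcement learning from non-expert demonstration trajectories. Finally, it summarizes the open problems and future directions of deep inverse reinforcement learning in algorithms, theory, and applications.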

11.
Neural Computing and Applications - The majority of the research efforts that aim to solve UAV path optimization problems in a Reinforcement Learning (RL) setting focus on closed spaces or urban...

12.
Journal of Intelligent Manufacturing - EU regulations on $\textit{CO}_2$ limits and the trend of individualization are pushing the automotive industry towards greater flexibility and robustness...

13.
Artificial Life and Robotics - Decision making is an essential component of autonomous vehicle technology and received significant attention from academic and industry organizations. One of the...

14.
Distributed manufacturing plays an important role for large-scale companies in reducing production and transportation costs for globalized orders. However, assigning dynamic orders to distributed workshops properly and in real time is a challenging problem. To provide real-time and intelligent scheduling decisions for distributed flowshops, we studied the distributed permutation flowshop scheduling problem (DPFSP) with dynamic job arrivals using deep reinforcement learning (DRL). The objective is to minimize the total tardiness cost of all jobs. We provided the training and execution procedures of intelligent scheduling based on DRL for the dynamic DPFSP. In addition, we established a DRL-based scheduling model for distributed flowshops by designing a suitable reward function, scheduling actions, and state features. A novel reward function is designed to directly relate to the objective. Various problem-specific dispatching rules are introduced to provide efficient actions for different production states. Furthermore, four efficient DRL algorithms, namely deep Q-network (DQN), double DQN (DbDQN), dueling DQN (DlDQN), and advantage actor-critic (A2C), are adapted to train the scheduling agent. The training curves show that the agent learned to generate better solutions effectively and validate that the system design is reasonable. After training, all DRL algorithms outperform traditional meta-heuristics and well-known priority dispatching rules (PDRs) by a large margin in terms of solution quality and computation efficiency. This work shows the effectiveness of DRL for the real-time scheduling of dynamic DPFSP.
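
The objective named in this abstract, total tardiness, can be made concrete with a short sketch. It assumes a single factory's permutation flowshop, a unit cost per time unit of tardiness, and hypothetical names (`total_tardiness`, `proc_times`, `due_dates`). It does not reproduce the paper's reward function, state features, or dispatching rules.

```python
from typing import List, Sequence


def total_tardiness(
    sequence: Sequence[int],
    proc_times: List[List[float]],
    due_dates: List[float],
) -> float:
    """Total tardiness of a job permutation; proc_times[j][m] is job j on machine m."""
    n_machines = len(proc_times[0])
    # finish[m] holds the completion time of the most recent job on machine m.
    finish = [0.0] * n_machines
    tardiness = 0.0
    for job in sequence:
        for m in range(n_machines):
            # A job starts on machine m once the machine is free and the job
            # has left machine m - 1 (finish[m - 1] was just updated for it).
            ready = finish[m - 1] if m > 0 else 0.0
            finish[m] = max(finish[m], ready) + proc_times[job][m]
        tardiness += max(0.0, finish[-1] - due_dates[job])
    return tardiness


# Two jobs on two machines; the job order changes the objective value.
proc = [[2.0, 3.0], [1.0, 2.0]]
due = [5.0, 4.0]
print(total_tardiness([0, 1], proc, due))  # 3.0
print(total_tardiness([1, 0], proc, due))  # 1.0
```

One plausible way to tie a reward to this objective, stated here only as an assumption, is to return the negative increase in tardiness after each scheduling decision.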

15.
International Journal of Speech Technology - We are generating truly mind-boggling amounts of audio data on a daily basis simply by using the Internet. In different audio-based applications, it...

16.
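To improve the efficiency of solving the multi-depot vehicle routing problem (MDVRP), an end-to-end deep reinforcement learning framework is proposed. First, the MDVRP is modeled as a Markov decision process (MDP), including definitions of its states, actions, and rewards. An improved graph attention network (GAT) is proposed as the encoder to embed features of the graph representation of the MDVRP, and a Transformer-based decoder is designed. The model is trained with an improved REINFORCE algorithm and is not constrained by graph size: once trained, it can solve instances with any number of depots and customers. Finally, randomly generated instances and public benchmark instances verify the feasibility and effectiveness of the proposed framework. Even on MDVRP instances with 100 customer nodes, the trained model needs only 2 ms on average to obtain solutions that compare favorably with those of existing methods.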
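
A toy version of the MDP formulation mentioned in this abstract might look as follows. Everything here is an assumption made for illustration: the state is the current node, remaining capacity and visited set; an action is the next node; the reward is the negative Euclidean distance travelled; and choosing a depot closes the current route at its home depot and opens a new one. The class name `MDVRPEnv` and its methods are hypothetical, and the GAT encoder, Transformer decoder and REINFORCE training are not shown.

```python
import math
from typing import List, Set, Tuple

Point = Tuple[float, float]


class MDVRPEnv:
    """Toy MDVRP decision process: nodes 0..D-1 are depots, D..D+N-1 are customers."""

    def __init__(self, depots: List[Point], customers: List[Point],
                 demands: List[float], capacity: float) -> None:
        self.coords = depots + customers
        self.n_depots = len(depots)
        self.demands = demands
        self.capacity = capacity
        self.visited: Set[int] = set()
        self.home = 0          # depot the current route started from
        self.node = 0          # current location of the vehicle
        self.load = capacity   # remaining capacity on the current route

    def feasible_actions(self) -> List[int]:
        """Unvisited customers that still fit, plus every depot."""
        customers = [
            c for c in range(self.n_depots, len(self.coords))
            if c not in self.visited
            and self.demands[c - self.n_depots] <= self.load
        ]
        return customers + list(range(self.n_depots))

    def done(self) -> bool:
        all_served = len(self.visited) == len(self.coords) - self.n_depots
        return all_served and self.node == self.home

    def step(self, action: int) -> float:
        """Apply an action and return the reward (negative distance travelled)."""
        if action < self.n_depots:
            # Drive back to the home depot; the next route starts at `action`.
            cost = math.dist(self.coords[self.node], self.coords[self.home])
            self.home = self.node = action
            self.load = self.capacity
        else:
            cost = math.dist(self.coords[self.node], self.coords[action])
            self.visited.add(action)
            self.load -= self.demands[action - self.n_depots]
            self.node = action
        return -cost
```

Summing the rewards of an episode gives the negative total route length, which is the quantity a policy-gradient method such as REINFORCE would try to maximize.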

17.
Machine Learning - The large integration of variable energy resources is expected to shift a large part of the energy exchanges closer to real-time, where more accurate forecasts are available. In...

18.
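Deep reinforcement learning (DRL) can be widely applied to urban traffic signal control. In existing studies, however, the vast majority of DRL agents use only the current traffic state for decision-making, which limits control performance when traffic flow varies considerably. A DRL signal control algorithm combined with state prediction is proposed. First, a concise and efficient traffic state is designed using one-hot encoding. Then, a long short-term memory (LSTM) network predicts future traffic states. Finally, the agent makes the optimal decision based on the current and predicted states. Experimental results on the SUMO (Simulation of Urban MObility) platform show that, under various traffic flow conditions at single and multiple intersections, the proposed algorithm outperforms three typical signal control algorithms on average waiting time, travel time, fuel consumption, CO2 emissions, and other metrics.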
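
The idea of a one-hot traffic state combined with a predicted state can be sketched as below. The abstract does not say which quantities are encoded, so this sketch assumes per-lane queue lengths discretized into bins. The names `one_hot`, `encode_state`, `agent_input` and `bin_edges` are hypothetical, and the predicted queue lengths are taken as given rather than produced by an LSTM.

```python
from bisect import bisect_right
from typing import List, Sequence


def one_hot(index: int, size: int) -> List[int]:
    """Return a vector of zeros with a single 1 at `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec


def encode_state(queue_lengths: Sequence[int], bin_edges: Sequence[int]) -> List[int]:
    """One-hot encode each lane's queue length by bin and concatenate the results."""
    n_bins = len(bin_edges) + 1
    state: List[int] = []
    for q in queue_lengths:
        # bisect_right maps a queue length to the index of the bin it falls in.
        state += one_hot(bisect_right(bin_edges, q), n_bins)
    return state


def agent_input(current: Sequence[int], predicted: Sequence[int],
                bin_edges: Sequence[int]) -> List[int]:
    """Concatenate the encoded current state with the encoded predicted state."""
    return encode_state(current, bin_edges) + encode_state(predicted, bin_edges)


# Three lanes with bin edges at 5 and 10 vehicles (assumed thresholds).
print(encode_state([0, 5, 12], [5, 10]))  # [1, 0, 0, 0, 1, 0, 0, 0, 1]
```

Feeding the agent both halves of this vector lets its decision depend on where traffic is expected to go, not only on where it currently is.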

19.
A metapattern (also known as a metaquery) is a new approach for integrated data mining systems. As opposed to a typical "toolbox"-like integration, where components must be picked and chosen by users without much help, metapatterns provide a common representation for inter-component communication as well as a human interface for hypothesis development and search control. One weakness of this approach, however, is that the task of generating fruitful metapatterns is still a heavy burden for human users. In this paper, we describe a metapattern generator and an integrated discovery loop that can automatically generate metapatterns. Experiments in both artificial and real-world databases have shown that this new system goes beyond the existing machine learning technologies, and can discover relational patterns without requiring humans to pre-label the data as positive or negative examples for some given target concepts. With this technology, future data mining systems could discover high-quality, human-comprehensible knowledge in a much more efficient and focused manner, and data mining could be managed easily by both expert and less-expert users.

20.
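《微型机与应用》 (Microcomputer & Its Applications), 2016, (6): 54-57
Schizophrenia is one of the most common mental illnesses. Its exact cause is not yet clear, and accurately diagnosing whether a person has the disease is a prerequisite for treatment. Deep learning is a machine learning method that constructs multi-layer neural networks and can discover distributed feature representations hidden in data. For the electroencephalogram (EEG) signals of schizophrenia patients, a deep model based on a stacked autoencoder network is proposed to automatically identify from EEG signals whether a subject has the disease.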
