Similar literature
 Found 20 similar documents (search time: 62 ms)
1.
The main idea of reinforcement learning is to evaluate the chosen action according to the current reward. Following this concept, many algorithms have achieved good performance on classic Atari 2600 games. The main challenge arises when the reward is sparse or missing, as in hard exploration environments such as Montezuma’s Revenge, Pitfall, and Private Eye. Approaches built to deal with such challenges are very demanding. This work introduces a different reward system that enables a simple classical algorithm to learn fast and achieve high performance in hard exploration environments. Moreover, we made some simple adjustments to several hyperparameters, such as the number of actions and the sampling ratio, that helped improve performance. We include the extra reward within the human demonstrations and then use Prioritized Double Deep Q-Networks (Prioritized DDQN) to learn from these demonstrations. Our approach enabled Prioritized DDQN, with a short learning time, to finish the first level of Montezuma’s Revenge and to perform well in both Pitfall and Private Eye. We used the same games to compare our results with several baselines, such as Rainbow and Deep Q-learning from Demonstrations (DQfD). The results showed that the new reward system enabled Prioritized DDQN to outperform the baselines in the hard exploration games with a short learning time.
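
The abstract names Prioritized DDQN but does not spell out its update. As a rough illustration only, the sketch below computes the standard Double DQN bootstrap target: the online network picks the next action and the target network evaluates it. The function name, the list-based Q-values, and the `gamma` default are assumptions made for this example; the paper's extra demonstration reward and prioritized sampling are not modelled here.

```python
def double_dqn_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    """Standard Double DQN target for one transition (illustrative sketch).

    next_q_online / next_q_target are the Q-values of the next state under
    the online and target networks, given here as plain lists (assumption).
    """
    if done:
        # Terminal transition: nothing to bootstrap from.
        return reward
    # Action selection uses the online network ...
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    # ... while action evaluation uses the target network.
    return reward + gamma * next_q_target[best_action]
```

Decoupling selection from evaluation in this way is what distinguishes Double DQN from the plain DQN target, which takes the maximum over the target network's own values.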

2.
M A L Thathachar, Sadhana, 1990, 15(4-5): 263-281
We consider stochastic automata models of learning systems in this article. Such learning automata select the best action out of a finite number of actions by repeated interaction with the unknown random environment in which they operate. The selection of an action at each instant is done on the basis of a probability distribution which is updated according to a learning algorithm. Convergence theorems for the learning algorithms are available. Moreover, the automata can be arranged in the form of teams and hierarchies to handle complex learning problems such as pattern recognition. These interconnections of learning automata could be regarded as artificial neural networks.
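
The abstract does not say which learning algorithm updates the action probabilities. One classical choice, used here purely as an assumed example, is the linear reward-inaction scheme: on a reward the chosen action's probability moves toward 1 and the others shrink proportionally; on a penalty nothing changes. The names `lri_update`, `run_automaton`, `rate`, and the Bernoulli environment are hypothetical.

```python
import random


def lri_update(probs, chosen, rewarded, rate=0.1):
    """Linear reward-inaction update; returns a new probability vector."""
    if not rewarded:
        # "Inaction": a penalty leaves the distribution unchanged.
        return list(probs)
    return [
        p + rate * (1.0 - p) if i == chosen else p * (1.0 - rate)
        for i, p in enumerate(probs)
    ]


def run_automaton(reward_probs, steps=5000, rate=0.05, seed=0):
    """Interact with a random environment that rewards action i with
    probability reward_probs[i] (an assumed Bernoulli environment)."""
    rng = random.Random(seed)
    probs = [1.0 / len(reward_probs)] * len(reward_probs)
    for _ in range(steps):
        # Sample an action from the current probability distribution.
        chosen = rng.choices(range(len(probs)), weights=probs)[0]
        rewarded = rng.random() < reward_probs[chosen]
        probs = lri_update(probs, chosen, rewarded, rate)
    return probs
```

Calling `run_automaton([0.2, 0.8])` returns the final action probabilities; with a small `rate` the mass typically concentrates on the action that is rewarded more often, although this is a probabilistic tendency and not a guarantee for any single run.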

3.
One common practice in order picking is order batching, in which items of two or more orders are picked together in one picking trip. Order batching can reduce the total order-picking travel distance if orders with similar picking locations are batched together and picked in the same order-picking trip. In this paper, the performance of different order-batching methods, each made up of one seed-order selection rule and one accompanying-order selection rule, is investigated. A seed-order selection rule selects the first order (i.e. the seed order) in an order batch, while an accompanying-order selection rule selects the rest of the orders (i.e. the accompanying orders) to be added to the order batch. The performance of nine seed-order selection rules and 10 accompanying-order selection rules is investigated under two different route-planning methods and two different aisle-picking-frequency distributions. The problem environment is a distribution centre's warehouse which has an I/O point at one of its corners and two cross-aisles: one front cross-aisle and one back cross-aisle. The aim is to understand not only the performance of every seed-order selection rule and every accompanying-order selection rule, but also their combined performance. The effects of route-planning methods and aisle-picking-frequency distributions on the performance of seed-order selection rules and accompanying-order selection rules are also investigated. Different random problems were generated and tested for this purpose. It is hoped that the knowledge learned from this study can benefit practitioners in distribution centres with order-batching operations.

4.
Animal societies rely on interactions between group members to effectively communicate and coordinate their actions. To date, the transmission properties of interaction networks formed by direct physical contacts have been extensively studied for many animal societies and in all cases found to inhibit spreading. Such direct interactions do not, however, represent the only viable pathways. When spreading agents can persist in the environment, indirect transmission via ‘same-place, different-time’ spatial coincidences becomes possible. Previous studies have neglected these indirect pathways and their role in transmission. Here, we use rock ant colonies, a model social species whose flat nest geometry, coupled with individually tagged workers, allowed us to build temporally and spatially explicit interaction networks in which edges represent either direct physical contacts or indirect spatial coincidences. We show how the addition of indirect pathways allows the network to enhance or inhibit the spreading of different types of agent. This dual-functionality arises from an interplay between the interaction-strength distribution generated by the ants' movement and environmental decay characteristics of the spreading agent. These findings offer a general mechanism for understanding how interaction patterns might be tuned in animal societies to control the simultaneous transmission of harmful and beneficial agents.

5.
This paper presents work carried out within the 'ExPlanTech' project (IST-1999-20171) funded in part by the European Commission's Information Technologies Programme. The mission of the ExPlanTech technology transfer project is to introduce, customize and exploit the multi-agent production planning technology (ProPlanT multi-agent system research prototype) in two specific industrial enterprises. An agent-driven service negotiation and decision process, based on usage-centred knowledge about task requirements, substitutes for the traditional production planning activity. We introduce a methodology for integrating project-driven production planning, based on agent-based engineering, within the existing enterprise resource planning system. This novel production planning technology will facilitate optimization of resource utilization and the supply chain while meeting customer demands. This paper describes a FIPA-compliant implementation of the ExPlanTech technology at the LIAZ Pattern Shop manufacturing company. We describe the structure of the agent community, types of agents, implementation of the planning strategy and its incorporation within the real production environment.

6.
The health care environment still needs knowledge-based discovery for handling its wealth of data. Extracting the potential causes of diseases is the most important factor in medical data mining. Fuzzy association rule mining performs better than traditional classifiers, but it suffers from exponential growth in the number of rules produced. In the past, we proposed an information-gain-based fuzzy association rule mining algorithm that extracts both association rules and membership functions of medical data in order to reduce the number of rules. It used a ranking-based weight value to identify the potential attribute. When an attribute takes a large number of distinct values, computing the information gain value is not feasible. In this paper, an enhanced approach, called gain-ratio-based fuzzy weighted association rule mining, is therefore proposed for distinct diseases; it also aims to improve the learning time of the previous approach. Experimental results show a marginal improvement in the attribute selection process and an improvement in classifier accuracy. The system has been implemented on the Java platform and verified using benchmark data from the UCI machine learning repository.

7.
Optimizing dispatching policy in a networked, multi-machine system is a formidable task for both field experts and operations researchers due to the problem's stochastic and combinatorial nature. This paper proposes an innovative variation of the co-evolutionary genetic algorithm (CGA) for acquiring adaptive scheduling strategies in a complex multi-machine system. The task is to assign each machine an appropriate dispatching rule that is harmonious with the rules used in neighbouring machines. An ordinary co-evolutionary algorithm would not be successful due to the high variability (i.e. noisy causality) of system performance and the ripple effects among neighbouring populations. The computing time for populations large enough to avoid premature convergence would be prohibitive. We introduced the notion of derivative contribution feedback (DCF), in which an individual rule for a machine takes responsibility for the first-order change of the overall system performance according to its participation in decisions. The DCF-CGA effectively suppressed premature convergence and produced dispatching rules for spatial adaptation that outperformed other heuristics. The time required for knowledge acquisition also compared favourably with an efficient statistical method. The DCF-CGA method can be utilized in a wide variety of genetic algorithm application problems that have similar characteristics and difficulties.

8.
This article presents a ‘conventionalist’ approach to rule-directed behavior, emphasizing the interpretive dimension. The first part presents a general definition of a rule and a typology of rules in order to analyze its consequences for interpretive behavior. The second part studies the modes of interpretation of rules in relation to collective dynamics and gives emphasis to the role played by underdetermination. The interpretive framework is applied to a French maintenance workshop in which a team bonus was introduced in 1991. The paper concludes that (i) all contracts and hence the interpretation requirements are essentially incomplete; (ii) as a consequence there is an inevitably imperfect incentive alignment; (iii) the dynamics of interpretation also bring about imperfectly reciprocal behavior and a ‘logic’ of collective action; (iv) interpretation is intertwined with the distribution of team knowledge; and (v) there are two types of links between rules and routines. To interpret a rule is to initiate a learning process and to construct collective knowledge.

9.
Dynamic selection of scheduling rules during real operations has been recognized as a promising approach to the scheduling of the production line. For this strategy to work effectively, sufficient knowledge is required to enable prediction of which rule is the best to use under the current line status. In this paper, a new learning algorithm for acquiring such knowledge is proposed. In this algorithm, a binary decision tree is automatically generated using empirical data obtained by iterative production line simulations, and it decides in real time which rule is to be used at decision points during the actual production operations. The configuration of the developed dynamic scheduling system and the learning algorithm are described in detail. Simulation results on its application to the dispatching problem are discussed with regard to its scheduling performance and learning capability.

10.
In reports addressing animal foraging strategies, it has been stated that Lévy-like algorithms represent an optimal search strategy in an unknown environment, because of their super-diffusion properties and power-law-distributed step lengths. Here, starting with a simple random walk algorithm, which offers the agent a randomly determined direction at each time step with a fixed move length, we investigated how flexible exploration is achieved if an agent alters its randomly determined next step forward and the rule that controls its random movement based on its own directional moving experiences. We showed that our algorithm led to an effective food-searching performance compared with a simple random walk algorithm and exhibited super-diffusion properties, despite the uniform step lengths. Moreover, our algorithm exhibited a power-law distribution independent of uniform step lengths.

11.
A new scheduling system for selecting dispatching rules in real time is developed by combining the techniques of simulation, data mining, and statistical process control charts. The proposed scheduling system extracts knowledge from data coming from the manufacturing environment by constructing a decision tree, and selects a dispatching rule from the tree for each scheduling period. In addition, the system utilises the process control charts to monitor the performance of the decision tree and dynamically updates this decision tree whenever the manufacturing conditions change. This gives the proposed system the ability to adapt itself to changes in the manufacturing environment and improve the quality of its decisions. We implement the proposed system on a job shop problem, with the objective of minimising average tardiness, to evaluate its performance. Simulation results indicate that the performance of the proposed system is considerably better than other simulation-based single-pass and multi-pass scheduling algorithms available in the literature. We also illustrate knowledge extraction by presenting a sample decision tree from our experiments.

12.
In this paper, we address the flexible job-shop scheduling problem (FJSP) with release times for minimising the total weighted tardiness by learning dispatching rules from schedules. We propose a random-forest-based approach called Random Forest for Obtaining Rules for Scheduling (RANFORS) in order to extract dispatching rules from the best schedules. RANFORS consists of three phases: schedule generation, rule learning with data transformation, and rule improvement with discretisation. In the schedule generation phase, we present three solution approaches that are widely used to solve FJSPs. Based on the best schedules among them, the rule learning with data transformation phase converts them into training data with constructed attributes and generates a dispatching rule with inductive learning. Finally, the rule improvement with discretisation improves dispatching rules with a genetic algorithm by discretising continuous attributes and changing parameters for random forest with the aim of minimising the average total weighted tardiness. We conducted experiments to verify the performance of the proposed approach and the results showed that it outperforms the existing dispatching rules. Moreover, compared with the other decision-tree-based algorithms, the proposed algorithm is effective in terms of extracting scheduling insights from a set of rules.

13.
Constructing interaction networks from biomedical texts is important and interesting work. The authors take advantage of text mining and reinforcement learning approaches to establish a protein interaction network. Considering the high computational efficiency of co‐occurrence‐based interaction extraction approaches and the high precision of linguistic-pattern approaches, the authors propose an interaction extraction algorithm that uses frequently used linguistic patterns to extract interactions from texts and then finds further interactions in extended unprocessed texts following the basic idea of the co‐occurrence approach, while discounting the interactions extracted from the extended texts. They put forward a reinforcement learning‐based algorithm to establish a protein interaction network, where nodes represent proteins and edges denote interactions. During the evolutionary process, a node selects another node and the attained reward determines which predicted interaction should be reinforced. The topology of the network is updated by the agent until an optimal network is formed. They used texts downloaded from PubMed to construct a prostate cancer protein interaction network by the proposed methods. The results show that their method achieved a fairly good matching rate. Network topology analysis results also demonstrate that the curves of node degree distribution, node degree probability and probability distribution of the constructed network accord well with those of a scale‐free network.
Inspec keywords: cancer, proteins, molecular biophysics, learning (artificial intelligence), data mining, text analysis, medical computing, topology, statistical distributions
Other keywords: text mining, reinforcement learning, cooccurrence‐based interaction extraction approach, reinforcement learning‐based algorithm, prostate cancer protein interaction network, matching rate, scale‐free network, probability distribution, node degree probability, node degree distribution, network topology

14.
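A multi-agent-based method for determining machining time quotas   Total citations: 2 (self-citations: 0, citations by others: 2)
This paper studies the time-quota problem in an integrated CAPP and production-scheduling environment and proposes an agent-based method for determining time quotas. Using intelligent agents, a structural model for determining time quotas in the integrated environment is established. The factors that affect the time quota are mapped to part-feature agents and machining-method agents related to the state of each operation. With a neural-network agent as the computing tool, and drawing on the agents' judgment and reasoning abilities, each factor is processed into a data pattern that the neural network can recognise and accept. A blackboard structure handles information exchange and cooperative control among the agents so that the time quota of an operation can be determined quickly. Experiments show that, provided suitable samples are selected and the system continually performs self-learning and self-organisation, accurate operation time quotas can be obtained quickly.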

15.
Although mean flow time and tardiness have been used for a long time as indicators in both manufacturing plants and academic research on dispatching rules, according to the Theory of Constraints (TOC), neither indicator properly measures deviation from production plans. TOC claims that using throughput dollar-day (TDD) and inventory dollar-day (IDD) can induce the factory to take appropriate actions for the organization as a whole, and that these can be applied to replace various key performance indices used by most factories. However, no one has studied dispatching rules based on TDD and IDD performance indicators. The study addresses two interesting issues. (1) If TDD and IDD are used as performance indicators, do those dispatching rules that yield a better performance in tardiness and mean flow time still yield satisfactory results in terms of TDD and IDD performance? (2) Does a dispatching rule exist that outperforms the current dispatching rules in terms of TDD and IDD performance? First, a TDD/IDD-based heuristic dispatching rule is developed to answer these questions. Second, a computational experiment is performed, involving six simulation examples, to compare the proposed TDD/IDD-based heuristic dispatching rule with the currently used dispatching rules. Five dispatching rules, shortest processing time, earliest due date, total profit, minimum slack and apparent tardiness cost, are adopted herein. The results demonstrate that the developed TDD/IDD-based heuristic dispatching rule is feasible and outperforms the selected dispatching rules in terms of TDD and IDD.
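
To make the comparison of dispatching rules concrete, the following sketch sequences a handful of jobs on a single machine with two of the rules named above, shortest processing time (SPT) and earliest due date (EDD), and reports the two conventional indicators the abstract mentions, mean flow time and total tardiness. The single-machine setting, the `Job` fields, and the sample jobs are assumptions for illustration; the TDD and IDD measures and the paper's own heuristic are not reproduced.

```python
from typing import NamedTuple


class Job(NamedTuple):
    name: str
    processing_time: float
    due_date: float


def spt(jobs):
    """Shortest processing time first."""
    return sorted(jobs, key=lambda job: job.processing_time)


def edd(jobs):
    """Earliest due date first."""
    return sorted(jobs, key=lambda job: job.due_date)


def evaluate(sequence):
    """Return (mean flow time, total tardiness) for jobs all released at time 0."""
    clock = 0.0
    total_flow = 0.0
    total_tardiness = 0.0
    for job in sequence:
        clock += job.processing_time  # completion time of this job
        total_flow += clock
        total_tardiness += max(0.0, clock - job.due_date)
    return total_flow / len(sequence), total_tardiness


# Hypothetical sample data, not taken from the paper.
JOBS = [Job("A", 4, 5), Job("B", 2, 12), Job("C", 6, 7)]
```

On this sample, `evaluate(spt(JOBS))` gives a lower mean flow time while `evaluate(edd(JOBS))` gives a lower total tardiness, which illustrates why the ranking of rules depends on the chosen indicator.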

16.
The primary focus of this work is in the development of an evolutionary optimization technique which gets progressively 'smarter' during the optimization process by learning from computed domain knowledge. In the approach, the influence of the design variables on the problem solution is recognized, and the knowledge learned is then generalized to dynamically create or change design rules during optimization. This technique, when applied to a constrained optimization problem, shows progressive improvement in convergence of search, as successive generations of rules evolve by learning from the environment. This method is applied to a complex aerodynamic optimization problem involving turbine airfoil design. In this investigation, the 3D geometry of an airfoil is optimized by simultaneously optimizing multiple 2D slices of the airfoil. Results from the optimization of a low pressure turbine nozzle are presented in the paper. Results obtained using standard numerical optimization techniques are also presented for comparison purposes.

17.
The ability to predict future events based on the past is an important attribute of organisms that engage in adaptive behaviour. One prominent computational method for learning to predict is called temporal-difference (TD) learning. It is so named because it uses the difference between successive predictions to learn to predict correctly. TD learning is well suited to modelling the biological phenomenon of conditioning, wherein an organism learns to predict a reward even though the reward may occur later in time. We review a model for conditioning in bees based on TD learning. The model illustrates how the TD-learning algorithm allows an organism to learn an appropriate sequence of actions leading up to a reward, based solely on reinforcement signals. The second part of the paper describes how TD learning can be used at the cellular level to model the recently discovered phenomenon of spike-timing-dependent plasticity. Using a biophysical model of a neocortical neuron, we demonstrate that the shape of the spike-timing-dependent learning windows found in biology can be interpreted as a form of TD learning occurring at the cellular level. We conclude by showing that such spike-based TD-learning mechanisms can produce direction selectivity in visual-motion-sensitive cells and can endow recurrent neocortical circuits with the powerful ability to predict their inputs at the millisecond time-scale.
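
The phrase "the difference between successive predictions" can be made concrete with the tabular TD(0) rule, sketched here under simple assumptions: a hypothetical chain of states visited in order, with a reward of 1 delivered only at the final step. The learning rate `alpha`, discount `gamma`, and the chain itself are illustrative choices and are not taken from the bee or neuron models in the paper.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=1.0):
    """Apply one TD(0) update in place and return the TD error.

    next_state=None marks the end of an episode (no further prediction).
    """
    next_value = 0.0 if next_state is None else values[next_state]
    # TD error: reward plus the next prediction, minus the current prediction.
    td_error = reward + gamma * next_value - values[state]
    values[state] += alpha * td_error
    return td_error


def train_chain(n_states=5, episodes=200, alpha=0.1, gamma=1.0):
    """Learn reward predictions on a chain 0 -> 1 -> ... -> n_states - 1."""
    values = [0.0] * n_states
    for _ in range(episodes):
        for state in range(n_states):
            last = state == n_states - 1
            reward = 1.0 if last else 0.0  # reward arrives only at the end
            td0_update(values, state, reward, None if last else state + 1, alpha, gamma)
    return values
```

With `gamma=1.0`, the predictions of the early states approach the delayed reward as training proceeds, which mirrors the conditioning effect described above: the prediction of reward propagates backwards to states that precede it in time.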

18.
Animals make use of a range of social information to inform their movement decisions. One common movement rule, found across many different species, is that the probability that an individual moves to an area increases with the number of conspecifics there. However, in many cases, it remains unclear what social cues produce this and other similar movement rules. Here, we investigate what cues are used by damselfish (Dascyllus aruanus) when repeatedly crossing back and forth between two coral patches in an experimental arena. We find that an individual's decision to move is best predicted by the recent movements of conspecifics either to or from that individual's current habitat. Rather than actively seeking attachment to a larger group, individuals are instead prioritizing highly local and dynamic information with very limited spatial and temporal ranges. By reanalysing data in which the same species crossed for the first time to a new coral patch, we show that the individuals use static cues in this case. This suggests that these fish alter their information usage according to the structure and familiarity of their environment by using stable information when moving to a novel area and localized dynamic information when moving between familiar areas.

19.
Industrial systems are constantly subject to random events with inevitable uncertainties in production factors, especially in processing times. Due to this stochastic nature, selecting appropriate dispatching rules has become a major issue in practical problems. However, previous research implies that using one dispatching rule does not necessarily yield an optimal schedule. Therefore, a new algorithm is proposed based on computer simulation and artificial neural networks (ANNs) to select the optimal dispatching rule for each machine from a set of rules in order to minimise the makespan in stochastic job shop scheduling problems (SJSSPs). The algorithm contributes to the previous work on job shop scheduling in three significant ways: (1) to the best of our knowledge it is the first time that an approach based on computer simulation and ANNs is proposed to select dispatching rules; (2) non-identical dispatching rules are considered for machines under stochastic environment; and (3) the algorithm is capable of finding the optimal solution of SJSSPs since it evaluates all possible solutions. The performance of the proposed algorithm is compared with computer simulation methods by replicating comprehensive simulation experiments. Extensive computational results for job shops with five and six machines indicate the superiority of the new algorithm compared to previous studies in the literature.

20.
Efficient reservoir management requires the implementation of generalized optimal operating policies that manage storage volumes and releases while optimizing a single objective or multiple objectives. Reservoir operating rules stipulate the actions that should be taken under the current state of the system. This study develops a set of piecewise linear operating rule curves for water supply and hydropower reservoirs, employing an imperialist competitive algorithm in a parameterization–simulation–optimization approach. The adaptive penalty method is used for constraint handling and proved to work efficiently in the proposed scheme. Its performance is tested by deriving an operation rule for the Dez reservoir in Iran. The proposed modelling scheme converged to near-optimal solutions efficiently in the case examples. It was shown that the proposed optimum piecewise linear rule may perform quite well in reservoir operation optimization as the operating period extends from very short to fairly long periods.
