期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

An abstract knowledge based approach to diagnosis and recovery of plan failure in multi-agent systems

N. Shah K.-M. Chao A. James 《Advanced Engineering Informatics》2007,21(2):183-190

Open multi-agent systems (MAS) are decentralised and distributed systems that consist of a large number of loosely coupled autonomous agents. In the absence of centralised control they tend to be difficult to manage, especially in an open environment, which is dynamic, complex, distributed and unpredictable. This dynamism and uncertainty in an open environment gives rise to unexpected plan failures. In this paper we present an abstract knowledge based approach for the diagnosis and recovery of plan action failures. Our approach associates a sentinel agent with each problem solving agent in order to monitor the problem solving agent’s interactions. The proposed approach also requires the problem solving agents to be able to report on the status of a plan’s actions.Once an exception is detected the sentinel agents start an investigation of the suspected agents. The sentinel agents collect information about the status of failed plan abstract actions and knowledge about agents’ mental attitudes regarding any failed plan. The sentinel agent then uses this abstract knowledge and the agents’ mental attitudes, to diagnose the underlying cause of the plan failure. The sentinel agent may ask the problem solving agent to retry their failed plan based on the diagnostic result. 相似文献

2.

Interactive POMDPs with finite-state models of other agents

Alessandro?Panella Email author View author&#;s OrcID profile Piotr?Gmytrasiewicz 《Autonomous Agents and Multi-Agent Systems》2017,31(4):861-904

We consider an autonomous agent facing a stochastic, partially observable, multiagent environment. In order to compute an optimal plan, the agent must accurately predict the actions of the other agents, since they influence the state of the environment and ultimately the agent’s utility. To do so, we propose a special case of interactive partially observable Markov decision process, in which the agent does not explicitly model the other agents’ beliefs and preferences, and instead represents them as stochastic processes implemented by probabilistic deterministic finite state controllers (PDFCs). The agent maintains a probability distribution over the PDFC models of the other agents, and updates this belief using Bayesian inference. Since the number of nodes of these PDFCs is unknown and unbounded, the agent places a Bayesian nonparametric prior distribution over the infinitely dimensional set of PDFCs. This allows the size of the learned models to adapt to the complexity of the observed behavior. Deriving the posterior distribution is in this case too complex to be amenable to analytical computation; therefore, we provide a Markov chain Monte Carlo algorithm that approximates the posterior beliefs over the other agents’ PDFCs, given a sequence of (possibly imperfect) observations about their behavior. Experimental results show that the learned models converge behaviorally to the true ones. We consider two settings, one in which the agent first learns, then interacts with other agents, and one in which learning and planning are interleaved. We show that the agent’s performance increases as a result of learning in both situations. Moreover, we analyze the dynamics that ensue when two agents are simultaneously learning about each other while interacting, showing in an example environment that coordination emerges naturally from our approach. Furthermore, we demonstrate how an agent can exploit the learned models to perform indirect inference over the state of the environment via the modeled agent’s actions. 相似文献

3.

Plan coordination by revision in collective agent based systems 总被引：2，自引：0，他引：2

Hans Tonino Cees Witteveen 《Artificial Intelligence》2002,142(2):121-145

In order to model plan coordination behavior of agents we develop a simple framework for representing plans, resources and goals of agents. Plans are represented as directed acyclic graphs of skills and resources that, given adequate initial resources, can realize special resources, called goals. Given the storage costs of resources, application costs of skills, and values of goals, it is possible to reason about the profits of a plan for an agent. We then model two forms of plan coordination behavior between two agents, viz. fusion, aiming at the maximization of the total yield of the agents involved, and collaboration, which aims at the maximization of the individual yield of each agent. We argue how both forms of cooperation can be seen as iterative plan revision processes. We also present efficient polynomial algorithms for agent plan fusion and collaboration that are based on this idea of iterative plan revision. Both the framework and the fusion algorithm will be illustrated by an example from the field of transportation, where agents are transportation companies. 相似文献

4.

Learning opponent’s preferences for effective negotiation: an approach based on concept learning

Reyhan Aydo?an P?nar Yolum 《Autonomous Agents and Multi-Agent Systems》2012,24(1):104-140

We consider automated negotiation as a process carried out by software agents to reach a consensus. To automate negotiation, we expect agents to understand their user’s preferences, generate offers that will satisfy their user, and decide whether counter offers are satisfactory. For this purpose, a crucial aspect is the treatment of preferences. An agent not only needs to understand its own user’s preferences, but also its opponent’s preferences so that agreements can be reached. Accordingly, this paper proposes a learning algorithm that can be used by a producer during negotiation to understand consumer’s needs and to offer services that respect consumer’s preferences. Our proposed algorithm is based on inductive learning but also incorporates the idea of revision. Thus, as the negotiation proceeds, a producer can revise its idea of the consumer’s preferences. The learning is enhanced with the use of ontologies so that similar service requests can be identified and treated similarly. Further, the algorithm is targeted to learning both conjunctive as well as disjunctive preferences. Hence, even if the consumer’s preferences are specified in complex ways, our algorithm can learn and guide the producer to create well-targeted offers. Further, our algorithm can detect whether some preferences cannot be satisfied early and thus consensus cannot be reached. Our experimental results show that the producer using our learning algorithm negotiates faster and more successfully with customers compared to several other algorithms. 相似文献

5.

A distributed approach for fault detection and diagnosis based on Time Petri Nets 总被引：3，自引：0，他引：3

G. Jiroveanu R.K. Boel 《Mathematics and computers in simulation》2006,70(5-6):287-313

This paper proposes an algorithm for the model based design of a distributed protocol for fault detection and diagnosis for very large systems. The overall process is modeled as different Time Petri Net (TPN) models (each one modeling a local process) that interact with each other via guarded transitions that becomes enabled only when certain conditions (expressed as predicates over the marking of some places) are satisfied (the guard is true). In order to use this broad class of time DES models for fault detection and diagnosis we derive in this paper the timing analysis of the TPN models with guarded transitions. In this paper we also extend the modeling capability of the faults calling some transitions faulty when operations they represent take more or less time than a prescribed time interval corresponding to their normal execution. We consider here that different local agents receive local observation as well as messages from neighboring agents. Each agent estimates the state of the part of the overall process for which it has model and from which it observes events by reconciling observations with model based predictions. We design algorithms that use limited information exchange between agents and that can quickly decide “questions” about “whether and where a fault occurred?” and “whether or not some components of the local processes have operated correctly?”. The algorithms we derive allow each local agent to generate a preliminary diagnosis prior to any communication and we show that after communicating the agents we design recover the global diagnosis that a centralized agent would have derived. The algorithms are component oriented leading to efficiency in computation. 相似文献

6.

Aggregating value ranges: preference elicitation and truthfulness

Joseph Farfel Vincent Conitzer 《Autonomous Agents and Multi-Agent Systems》2011,22(1):127-150

We study the case where agents have preferences over ranges (intervals) of values, and we wish to elicit and aggregate these preferences. For example, consider a set of climatologist agents who are asked for their predictions for the increase in temperature between 2009 and 2100. Each climatologist submits a range, and from these ranges we must construct an aggregate range. What rule should we use for constructing the aggregate range? One issue in such settings is that an agent (climatologist) may misreport her range to make the aggregate range coincide more closely with her own (true) most-preferred range. We extend the theory of single-peaked preferences from points to ranges to obtain a rule (the median-of-ranges rule) that is strategy-proof under a condition on preferences. We then introduce and analyze a natural class of algorithms for approximately eliciting a median range from multiple agents. We also show sufficient conditions under which such an approximate elicitation algorithm still incentivizes agents to answer truthfully. Finally, we consider the possibility that ranges can be refined when the topic is more completely specified (for example, the increase in temperature on the North Pole given the failure of future climate pacts). We give a framework and algorithms for selectively specifying the topic further based on queries to agents. 相似文献

7.

Utility Models for Goal‐Directed, Decision‐Theoretic Planners

Peter Haddawy & Steve Hanks 《Computational Intelligence》1998,14(3):392-429

AI planning agents are goal-directed : success is measured in terms of whether an input goal is satisfied. The goal gives structure to the planning problem, and planning representations and algorithms have been designed to exploit that structure. Strict goal satisfaction may be an unacceptably restrictive measure of good behavior, however.
A general decision-theoretic agent, on the other hand, has no explicit goals: success is measured in terms of an arbitrary preference model or utility function defined over plan outcomes. Although it is a very general and powerful model of problem solving, decision-theoretic choice lacks structure, which can make it difficult to develop effective plan‐generation algorithms.
This paper establishes a middle ground between the two models. We extend the traditional AI goal model in several directions: allowing goals with temporal extent, expressing preferences over partial satisfaction of goals, and balancing goal satisfaction against the cost of the resources consumed in service of the goals. In doing so we provide a utility model for a goal-directed agent.
An important quality of the proposed model is its tractability. We claim that our model, like classical goal models, makes problem structure explicit. This structure can then be exploited by a problem-solving algorithm. We support this claim by reporting on two implemented planning systems that adopt and exploit our model. 相似文献

8.

A heuristic technique for multi‐agent planning

Eithan Ephrati Jeffrey S. Rosenschein 《Annals of Mathematics and Artificial Intelligence》1997,20(1-4):13-67

The subject of multi‐agent planning has been of continuing concern in Distributed Artificial Intelligence (DAI). In this paper, we suggest an approach to multi‐agent planning that contains heuristic elements. Our method makes use of subgoals, and derived sub‐plans, to construct a global plan. Agents solve their individual sub‐plans, which are then merged into a global plan. The suggested approach reduces overall planning time and derives a plan that approximates the optimal global plan that would have been derived by a central planner, given those original subgoals. We explore three different scenarios. The first involves a group of agents with a common goal. The second considers how agents can interleave planning and execution when planning towards a common, though dynamic, goal. The third examines the case where agents, each with their own goal, can plan together to reach a state in consensus for the group. Finally, we consider how these approaches can be adapted to handle rational, manipulative agents. This revised version was published online in June 2006 with corrections to the Cover Date. 相似文献

9.

Practical voting rules with partial information

Meir Kalech Sarit Kraus Gal A. Kaminka Claudia V. Goldman 《Autonomous Agents and Multi-Agent Systems》2011,22(1):151-182

Voting is an essential mechanism that allows multiple agents to reach a joint decision. The joint decision, representing a function over the preferences of all agents, is the winner among all possible (candidate) decisions. To compute the winning candidate, previous work has typically assumed that voters send their complete set of preferences for computation, and in fact this has been shown to be required in the worst case. However, in practice, it may be infeasible for all agents to send a complete set of preferences due to communication limitations and willingness to keep as much information private as possible. The goal of this paper is to empirically evaluate algorithms to reduce communication on various sets of experiments. Accordingly, we propose an iterative algorithm that allows the agents to send only part of their preferences, incrementally. Experiments with simulated and real-world data show that this algorithm results in an average of 35% savings in communications, while guaranteeing that the actual winning candidate is revealed. A second algorithm applies a greedy heuristic to save up to 90% of communications. While this heuristic algorithm cannot guarantee that a true winning candidate is found, we show that in practice, close approximations are obtained. 相似文献

10.

Action dependencies in privacy-preserving multi-agent planning

Shlomi Maliah Guy Shani Roni Stern 《Autonomous Agents and Multi-Agent Systems》2018,32(6):779-821

Collaborative privacy-preserving planning (CPPP) is a multi-agent planning task in which agents need to achieve a common set of goals without revealing certain private information. In many CPPP algorithms, the individual agents reason about a projection of the multi-agent problem onto a single-agent classical planning problem. For example, an agent can plan as if it controls the public actions of other agents, ignoring any private preconditions and effects theses actions may have, and use the cost of this plan as a heuristic estimate of the cost of the full, multi-agent plan. Using such a projection, however, ignores some dependencies between agents’ public actions. In particular, it does not contain dependencies between public actions of other agents caused by their private facts. We propose a projection in which these private dependencies are maintained. The benefit of our dependency-preserving projection is demonstrated by using it to produce high-level plans in a new privacy-preserving planner, and as a heuristic for guiding forward search privacy-preserving algorithms. Both are able to solve more benchmark problems than any other state-of-the-art privacy-preserving planner. This more informed projection does not explicitly expose any private fact, action, or precondition. In addition, we show that even if an adversary agent knows that an agent has some private objects of a given type (e.g., trucks), it cannot infer the number of such private objects that the agent controls. This introduces a novel form of strong privacy, which we call object-cardinality privacy, that is motivated by real-world requirements. 相似文献

11.

Modelling user preferences and mediating agents in electronic commerce

《Knowledge》2005,18(7):335-352

An important ingredient in agent-mediated electronic commerce is the presence of intelligent mediating agents that assist electronic commerce participants (e.g. individual users, other agents, organisations). These mediating agents are in principle autonomous agents that interact with their environments (e.g. other agents and web-servers) on behalf of participants who have delegated tasks to them. For mediating agents a (preference) model of participants is indispensable. In this paper, a generic mediating agent architecture is introduced. Furthermore, we discuss our view of user preference modelling and its need in agent-mediated electronic commerce. We survey the state of the art in the field of preference modelling and suggest that the preferences of electronic commerce participants can be modelled by learning from their behaviour. In particular, we employ an existing machine learning method called inductive logic programming (ILP). We argue that this method can be used by mediating agents to detect regularities in the behaviour of the involved participants and induce hypotheses about their preferences automatically. Finally, we discuss some advantages and disadvantages of using inductive logic programming as a method for learning user preferences and compare this method with other approaches. 相似文献

12.

Donation Center Location Problem

Chien-Chung Huang Zoya Svitkina 《Algorithmica》2013,66(1):187-210

We introduce and study the donation center location problem, which has an additional application in network testing and may also be of independent interest as a general graph-theoretic problem. Given a set of agents and a set of centers, where agents have preferences over centers and centers have capacities, the goal is to open a subset of centers and to assign a maximum-sized subset of agents to their most-preferred opened centers, while respecting the capacity constraints. We prove that in general, the problem is hard to approximate within n ^1/2?? for any ?>0. In view of this, we investigate two special cases. In one, every agent has a bounded number of centers on its preference list, and in the other, all preferences are induced by a line-metric. We present constant-factor approximation algorithms for the former and exact polynomial-time algorithms for the latter. 相似文献

13.

Multiagent Reinforcement Learning:Rollout and Policy Iteration

下载免费PDF全文

Dimitri Bertsekas 《IEEE/CAA Journal of Automatica Sinica》2021,8(2):249-272

We discuss the solution of complex multistage decision problems using methods that are based on the idea of policy iteration(PI),i.e.,start from some base policy and generate an improved policy.Rollout is the simplest method of this type,where just one improved policy is generated.We can view PI as repeated application of rollout,where the rollout policy at each iteration serves as the base policy for the next iteration.In contrast with PI,rollout has a robustness property:it can be applied on-line and is suitable for on-line replanning.Moreover,rollout can use as base policy one of the policies produced by PI,thereby improving on that policy.This is the type of scheme underlying the prominently successful Alpha Zero chess program.In this paper we focus on rollout and PI-like methods for problems where the control consists of multiple components each selected(conceptually)by a separate agent.This is the class of multiagent problems where the agents have a shared objective function,and a shared and perfect state information.Based on a problem reformulation that trades off control space complexity with state space complexity,we develop an approach,whereby at every stage,the agents sequentially(one-at-a-time)execute a local rollout algorithm that uses a base policy,together with some coordinating information from the other agents.The amount of total computation required at every stage grows linearly with the number of agents.By contrast,in the standard rollout algorithm,the amount of total computation grows exponentially with the number of agents.Despite the dramatic reduction in required computation,we show that our multiagent rollout algorithm has the fundamental cost improvement property of standard rollout:it guarantees an improved performance relative to the base policy.We also discuss autonomous multiagent rollout schemes that allow the agents to make decisions autonomously through the use of precomputed signaling information,which is sufficient to maintain the cost improvement property,without any on-line coordination of control selection between the agents.For discounted and other infinite horizon problems,we also consider exact and approximate PI algorithms involving a new type of one-agent-at-a-time policy improvement operation.For one of our PI algorithms,we prove convergence to an agentby-agent optimal policy,thus establishing a connection with the theory of teams.For another PI algorithm,which is executed over a more complex state space,we prove convergence to an optimal policy.Approximate forms of these algorithms are also given,based on the use of policy and value neural networks.These PI algorithms,in both their exact and their approximate form are strictly off-line methods,but they can be used to provide a base policy for use in an on-line multiagent rollout scheme. 相似文献

14.

DARE: a system for distributed abductive reasoning

Jiefei Ma Alessandra Russo Krysia Broda Keith Clark 《Autonomous Agents and Multi-Agent Systems》2008,16(3):271-297

相似文献

15.

Intentional learning agent architecture

Budhitama Subagdja Liz Sonenberg Iyad Rahwan 《Autonomous Agents and Multi-Agent Systems》2009,18(3):417-470

Dealing with changing situations is a major issue in building agent systems. When the time is limited, knowledge is unreliable, and resources are scarce, the issue becomes more challenging. The BDI (Belief-Desire-Intention) agent architecture provides a model for building agents that addresses that issue. The model can be used to build intentional agents that are able to reason based on explicit mental attitudes, while behaving reactively in changing circumstances. However, despite the reactive and deliberative features, a classical BDI agent is not capable of learning. Plans as recipes that guide the activities of the agent are assumed to be static. In this paper, an architecture for an intentional learning agent is presented. The architecture is an extension of the BDI architecture in which the learning process is explicitly described as plans. Learning plans are meta-level plans which allow the agent to introspectively monitor its mental states and update other plans at run time. In order to acquire the intricate structure of a plan, a process pattern called manipulative abduction is encoded as a learning plan. This work advances the state of the art by combining the strengths of learning and BDI agent frameworks in a rich language for describing deliberation processes and reactive execution. It enables domain experts to specify learning processes and strategies explicitly, while allowing the agent to benefit from procedural domain knowledge expressed in plans. 相似文献

16.

带有目标偏好的最大成功联盟生成算法^*

张国富杜晓东苏兆品蒋建国《模式识别与人工智能》2017,30(6):489-498

资源结盟博弈(CRGs)研究均假设每个agent可以响应所有目标,即使目标不在其感兴趣的子目标集内.针对此问题,文中提出带有目标偏好的CRGs模型,即每个agent只愿意把自己的有限资源贡献给自己的兴趣集中的目标.此外,设计基于二维二进制编码的最大成功联盟生成算法,并提出编码修正启发式算法解决多个目标竞争同一agent资源可能引起的的资源冲突.最后,通过与已有相关算法的对比实验验证文中算法的有效性. 相似文献

17.

A Resource Logic for Multi-Agent Plan Merging

Mathijs de Weerdt André Bos Hans Tonino Cees Witteveen 《Annals of Mathematics and Artificial Intelligence》2003,37(1-2):93-130

In a multi-agent system, agents are carrying out certain tasks by executing plans. Consequently, the problem of finding a plan, given a certain goal, has been given a lot of attention in the literature. Instead of concentrating on this problem, the focus of this paper is on cooperation between agents which already have constructed plans for their goals. By cooperating, agents might reduce the number of actions they have to perform in order to fulfill their goals. The key idea is that in carrying out a plan an agent possibly produces side products that can be used as resources by other agents. As a result, an other agent can discard some of its planned actions. This process of exchanging products, called plan merging, results in distributed plans in which agents become dependent on each other, but are able to attain their goals more efficiently. In order to model this kind of cooperation, a new formalism is developed in which side products are modeled explicitly. The formalism is a resource logic based on the notions of resource, skill, goal, and service. Starting with some resources, an agent can perform a number of skills in order to produce other resources which suffice to achieve some given goals. Here, a skill is an elementary production process taking as inputs resources satisfying certain constraints. A service is a serial or parallel composition of skills acting as a program. An operational semantics is developed for these services as programs. Using this formalism, an algorithm for plan merging is developed, which is anytime and runs in polynomial time. Furthermore, a variant of this algorithm is proposed that handles the exchange of resources in a more flexible way. The ideas in the paper will be illustrated by an example from public transportation. 相似文献

18.

A self-stabilizing transformer for population protocols with covering

Joffroy Beauquier Shay Kutten 《Theoretical computer science》2011,412(33):4247-4259

Developing self-stabilizing solutions is considered to be more challenging and complicated than developing classical solutions, where a proper initialization of the variables can be assumed. Hence, to ease the task of the developers, some automatic techniques have been proposed to design self-stabilizing algorithms. In this paper, we propose an automatic transformer for algorithms in an extended population protocol model. Population protocols is a model that was introduced recently for networks with a large number of resource-limited mobile agents. We use a variant of this model. First, we assume agents having characteristics (e.g., moving speed, communication radius) affecting their intercommunication “speed”, which is reflected by their cover times. Second, we assume the existence of a special agent with an unbounded memory, the base station. The automatic transformer takes as an input an algorithm solving a static problem (and meeting some additional rather natural requirements) and outputs a self-stabilizing algorithm for the same problem. The transformer is built using a re-execution approach (the technique consisting of executing an algorithm repeatedly in order to obtain its self-stabilizing version). We show that in the model we use, a transformer based on such an approach is impossible without the assumption of an unbounded memory agent. 相似文献

19.

A novel approach to accelerate the convergence speed of a stochastic multi-agent system using recurrent neural nets

Nasim Nourafza Saeed Setayeshi Ahmad Khadem-Zadeh 《Neural computing & applications》2012,21(8):2015-2021

One problem in the design of multi-agent systems is the difficulty of predicting the occurrences that one agent might face, also to recognize and to predict their optimum behavior in these situations. Therefore, one of the most important characteristic of the agent is their ability during adoption, to learn, and correct their behavior. With consideration of the continuously changing environment, the back and forth learning of the agents, the inability to see the agent’s action first hand, and their chosen strategies, learning in a multi-agent environment can be very complex. On the one hand, with recognition to the current learning models that are used in deterministic environment that behaves linearly, which contain weaknesses; therefore, the current learning models are unproductive in complex environments that the actions of agents are stochastic. Therefore, it is necessary for the creation of learning models that are effective in stochastic environments. Purpose of this research is the creation of such a learning model. For this reason, the Hopfield and Boltzmann learning algorithms are used. In order to demonstrate the performance of their algorithms, first, an unlearned multi-agent model is created. During the interactions of the agents, they try to increase their knowledge to reach a specific value. The predicated index is the number of changed states needed to reach the convergence. Then, the learned multi-agent model is created with the Hopfield learning algorithm, and in the end, the learned multi-agent model is created with the Boltzmann learning algorithm. After analyzing the obtained figures, a conclusion can be made that when learning impose to multi-agent environment the average number of changed states needed to reach the convergence decreased and the use of Boltzmann learning algorithm decreased the average number of changed states even further in comparison with Hopfield learning algorithm due to the increase in the number of choices in each situation. Therefore, it is possible to say that the multi-agent systems behave stochastically, the more closer they behave to their true character, the speed of reaching the global solution increases. 相似文献

20.

Elevator Group Control Using Multiple Reinforcement Learning Agents 总被引：22，自引：0，他引：22

Crites Robert H. Barto Andrew G. 《Machine Learning》1998,33(2-3):235-262

Recent algorithmic and theoretical advances in reinforcement learning (RL) have attracted widespread interest. RL algorithms have appeared that approximate dynamic programming on an incremental basis. They can be trained on the basis of real or simulated experiences, focusing their computation on areas of state space that are actually visited during control, making them computationally tractable on very large problems. If each member of a team of agents employs one of these algorithms, a new collective learning algorithm emerges for the team as a whole. In this paper we demonstrate that such collective RL algorithms can be powerful heuristic methods for addressing large-scale control problems.Elevator group control serves as our testbed. It is a difficult domain posing a combination of challenges not seen in most multi-agent learning research to date. We use a team of RL agents, each of which is responsible for controlling one elevator car. The team receives a global reward signal which appears noisy to each agent due to the effects of the actions of the other agents, the random nature of the arrivals and the incomplete observation of the state. In spite of these complications, we show results that in simulation surpass the best of the heuristic elevator control algorithms of which we are aware. These results demonstrate the power of multi-agent RL on a very large scale stochastic dynamic optimization problem of practical utility. 相似文献