Related Articles
20 similar articles found.
1.
This paper proposes a self-organizing scheme based on ant metaheuristics to optimize the operation of multiple classes of managed elements in an Operations Support System (OSS) for mobile pervasive communications. Ant metaheuristics are characterized by learning and adaptation capabilities in the face of dynamic environment changes and uncertainties. As an important branch of swarm intelligence, the approach distinguishes itself from centralized management schemes through its robustness and scalability. We have successfully applied ant metaheuristics to the network service configuration process, which is recast as follows: the managed elements are represented as graph nodes, and ants traverse the graph by selecting nodes under minimum-cost constraints until the eligible network elements are located along near-optimal paths; the located elements are those needed for the configuration or activation of a particular product or service. Although the configuration process is not visible to end users, the SLAs negotiated between users and providers affect the overall process. The proposed self-organized learning and adaptation scheme using Ant Colony Optimization (ACO) is evaluated by simulation in Java, and its performance is compared with a class of Genetic Algorithm known as PBIL. The simulation results show the scalability and robustness of autonomous ant-like agents adapting to dynamic networks.
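To make the traversal rule concrete, the following is a minimal sketch of cost-biased, pheromone-guided node selection of the kind the abstract describes. The graph, costs, and parameter values are illustrative assumptions, not the authors' implementation.

```python
import random

# Hypothetical managed-element graph: node -> {neighbor: traversal cost}.
GRAPH = {
    "A": {"B": 2.0, "C": 5.0},
    "B": {"C": 1.0, "D": 4.0},
    "C": {"D": 1.5},
    "D": {},
}
pheromone = {(u, v): 1.0 for u in GRAPH for v in GRAPH[u]}

ALPHA, BETA, RHO = 1.0, 2.0, 0.1  # pheromone weight, cost weight, evaporation

def choose_next(node):
    """Pick a neighbor with probability ~ pheromone^alpha * (1/cost)^beta."""
    nbrs = GRAPH[node]
    if not nbrs:
        return None
    weights = [pheromone[(node, v)] ** ALPHA * (1.0 / c) ** BETA
               for v, c in nbrs.items()]
    return random.choices(list(nbrs), weights=weights)[0]

def ant_walk(src, dst):
    """One ant's traversal from src toward dst; returns the path found."""
    path, node = [src], src
    while node != dst:
        node = choose_next(node)
        if node is None:
            return None  # dead end
        path.append(node)
    return path

def reinforce(path):
    """Evaporate everywhere, then deposit pheromone along the found path."""
    for edge in pheromone:
        pheromone[edge] *= (1.0 - RHO)
    cost = sum(GRAPH[u][v] for u, v in zip(path, path[1:]))
    for u, v in zip(path, path[1:]):
        pheromone[(u, v)] += 1.0 / cost

for _ in range(50):  # repeated walks converge on a near-optimal path
    p = ant_walk("A", "D")
    if p:
        reinforce(p)
print(ant_walk("A", "D"))
```

Repeated walks bias pheromone toward low-cost element sequences, which is how near-optimal configuration paths emerge without central control.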

2.
To ensure reliability in transport-level end-to-end multicasting, an efficient loss recovery mechanism is indispensable. We consider scalability, topology independence, and robustness to be the significant features such a mechanism should offer, and demonstrate that an epidemic loss recovery approach is superior in all these respects. We also show that the epidemic approach transparently handles network link failures through pair-wise propagation of information, and compare it with feedback-controlled loss recovery on identical network settings. The contribution of this work is a simulative analysis of the distribution of recovery overhead across multicast group members under various link failures, of the impact of group size, randomized system-wide noise, and message rate on scalability, and of various scenarios modeling overlay networks. We investigate the important features of epidemic multicast loss recovery in depth and reach concrete results on realistic network scenarios.
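A minimal sketch of the pair-wise propagation idea, assuming a push-pull anti-entropy variant (the abstract does not fix the exact gossip style): members repeatedly exchange the sets of message ids they hold, so a loss localized by a link failure heals in a few randomized rounds.

```python
import random

class Member:
    """A multicast group member holding the set of message ids received."""
    def __init__(self, received):
        self.received = set(received)

    def gossip_with(self, peer):
        # Pair-wise push-pull: each side forwards what the other is missing.
        union = self.received | peer.received
        self.received = peer.received = union

# Illustrative group: message 3 was lost by most members (e.g. a link failure).
members = [Member({1, 2, 3} if i < 2 else {1, 2}) for i in range(10)]

rounds = 0
while any(m.received != {1, 2, 3} for m in members):
    rounds += 1
    for m in members:  # each member gossips with one random peer per round
        m.gossip_with(random.choice(members))
print(f"all members recovered after {rounds} rounds")
```

Because each member contacts peers chosen at random, no member's recovery cost depends on the multicast topology, which is the topology-independence property the abstract highlights.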

3.
Boolean network tomography is a promising technique for fault management in networks where existing IP-based troubleshooting mechanisms cannot be used. Aiming to apply Boolean network tomography to fault management, a variety of heuristic methods for configuring monitoring trails and paths have been proposed to localize link failures in managed networks. However, these heuristic methods must be executed on a centralized server that administers the entire managed network, and they present scalability problems when applied to large-scale networks. This paper therefore proposes a novel scheme for achieving lightweight Boolean network tomography in a decentralized manner. The proposed scheme partitions the managed network into multiple management areas and localizes link failures independently within each area. This paper also proposes a heuristic network partition method for implementing the proposed scheme efficiently. The effectiveness of the scheme is verified using typical fault management scenarios in which all single-link failures and all dual-link failures are localized by the fewest monitoring paths on predetermined routes. Simulation results show that the proposed scheme greatly reduces the computational load on the fault management server when Boolean network tomography is deployed in large-scale managed networks. Furthermore, the degradation of optimality in the proposed scheme, relative to a centralized scheme that uses heuristics to reduce the computational load on the central server, can be mitigated.
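The core Boolean-tomography step can be sketched as follows: within one management area, each monitoring path reports only pass/fail, and a failure set is localizable when its syndrome (the vector of path states) is unique. The four-path area below is an illustrative assumption; partitioning into areas keeps each syndrome table small, which is where the decentralized scheme saves computation.

```python
from itertools import combinations

# Hypothetical management area: monitoring paths as sets of link ids.
PATHS = [{"e1", "e2"}, {"e2", "e3"}, {"e1", "e3", "e4"}, {"e4"}]

def syndrome(failed_links):
    """Boolean path states: a path fails iff it traverses a failed link."""
    return tuple(bool(p & failed_links) for p in PATHS)

def build_table(max_failures=2):
    """Map every failure scenario up to max_failures links to its syndrome."""
    links = sorted(set().union(*PATHS))
    table = {}
    for k in range(1, max_failures + 1):
        for combo in combinations(links, k):
            table.setdefault(syndrome(set(combo)), []).append(combo)
    return table

table = build_table()
# A scenario is uniquely localizable iff its syndrome maps to one failure set.
obs = syndrome({"e2"})
print(table[obs])  # -> [('e2',)] since these paths distinguish this failure
```

Since the table size grows combinatorially with the number of links considered, localizing within each area independently instead of over the whole network is what makes the scheme lightweight.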

4.
Local or wide-area heterogeneous workstation clusters are relatively cheap and highly effective, though inherently unstable, operating environments for long-running distributed computations. We found this to be the case in early experiments with a prototype of the EcliPSe system, a software toolkit for replicative applications on heterogeneous workstation clusters: hardware or network failures in computations that executed for over a day were not uncommon. In this work, we describe a variety of features for incorporating failure resilience into the EcliPSe system. Key characteristics of this fault-tolerant system are ease of use, low state-saving cost, system scalability, and good performance. We present results of experiments demonstrating low state-saving overheads and small system-recovery times as a function of the amount of state saved.
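EcliPSe's actual state-saving machinery is not detailed in this abstract; the sketch below only illustrates the generic pattern it relies on, periodic low-overhead checkpointing with atomic writes, so a long-running replica can resume after a workstation failure. File name and save interval are assumptions.

```python
import os
import pickle
import tempfile

CKPT = os.path.join(tempfile.gettempdir(), "replica.ckpt")

def save_state(state, path=CKPT):
    """Write the replica's state atomically: temp file, then rename."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)  # atomic rename, so a crash never leaves a torn file

def restore_state(path=CKPT):
    """Reload the last checkpoint after a workstation or network failure."""
    with open(path, "rb") as f:
        return pickle.load(f)

state = {"iteration": 0, "partial_result": 0.0}
for i in range(1, 1001):
    state["iteration"] = i
    state["partial_result"] += 1.0 / i
    if i % 100 == 0:  # infrequent saves keep the state-saving overhead low
        save_state(state)

print(restore_state()["iteration"])  # -> 1000
```

The trade-off measured in the paper, recovery time versus amount of state saved, corresponds here to the checkpoint interval: longer intervals cost less during normal operation but replay more work after a failure.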

5.
Distributed wide-area storage systems must tolerate both physical failures and logical errors; in particular, these capabilities are needed for the storage system to support remote disaster recovery. Several solutions exist for distributed wide-area backup/archive systems, implemented at the application level, the file system level, or the storage subsystem level, but they suffer from high deployment cost and security issues. Moreover, previous research has focused only on disk-related failures and ignores the fact that storage servers are linked predominantly over a Wide Area Network (WAN), which may become unavailable owing to network failures. In this paper, we first model the efficiency and reliability of distributed wide-area storage systems for all media, taking both network failures and disk failures into consideration. To provide higher performance, efficiency, reliability, and security in wide-area disaster recovery storage systems, we present a configurable RAID-like data erasure-coding scheme referred to as Replication-based Snapshot Redundant Array of Independent Imagefiles (RSRAII). We argue that this scheme benefits from the consolidation of both erasure-coding and replication strategies. To this end, we propose a novel algorithm to improve snapshot performance, referred to as SMPDP (Snapshot based on Multi-Parallel Degree Pipeline). We also extend this study toward a prototype system, called SeWDReSS, which is shown to strike a trade-off between reliability, storage space, security, and performance for distributed wide-area disaster recovery.

6.
Decentralized sensor networks promise virtually unlimited scalability and can tolerate individual component failures. An experimental active sensor network that leverages environment-centric modes of human-robot interaction can keep up with a network's arbitrary growth. Spatially distributed sensors provide better coverage, faster response to dynamically changing environments, better survivability, and robustness to failure. Taking the extra step to a decentralized system provides the further benefits of scalability, modularity, and performance. Our active sensor network is a collection of sensing platforms connected into a network. Some or all of the network components have actuators that we can control, making them, in this sense, active; a mobile robot with onboard sensors and a communication facility is an example of an active component. We investigate the scalability of an important aspect of an ASN: interaction with human operators.

7.
Gossip-based protocols for group communication have attractive scalability and reliability properties. This paper presents a gossip-based protocol that enables agents to reach a consensus based on a specific uninorm aggregation operator. We theoretically analyze the convergence, speed, and randomness features of this protocol as well as its extensions. The model can be used to handle the uncertainty and fast convergence characteristic of collective decision dynamics. Experimental results show that the protocol is efficient, scalable, and resilient against failures under various network sizes and topologies.
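The abstract does not name the specific uninorm, so the sketch below assumes the standard representable uninorm with neutral element 0.5 (the certainty-factor-style combination) as a stand-in: values above 0.5 reinforce each other, values below 0.5 suppress each other, and pairwise gossip drives the population toward a collective decision.

```python
import random

def uninorm(x, y):
    """Representable uninorm with neutral element 0.5 (assumed operator).
    U(x, 0.5) = x; agreeing opinions reinforce toward 0 or 1."""
    if {x, y} <= {0.0, 1.0} and x != y:
        return 0.5  # convention at the two undefined corner points
    return (x * y) / (x * y + (1.0 - x) * (1.0 - y))

# Agents start with independent opinions in (0, 1).
random.seed(3)
opinions = [random.uniform(0.3, 0.9) for _ in range(20)]

for _ in range(30):  # each round: a random pair gossips, both adopt U(x, y)
    i, j = random.sample(range(len(opinions)), 2)
    merged = uninorm(opinions[i], opinions[j])
    opinions[i] = opinions[j] = merged

print(min(opinions), max(opinions))  # opinions polarize toward consensus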

8.
Clusters of workstations connected by a fast network are emerging as a viable architecture for building high-throughput fault-tolerant servers. This type of architecture is more scalable and more cost-effective than a tightly coupled multiprocessor and may achieve comparable throughput. Two of the most important issues a designer of such clustered servers must consider, if the system is to meet its fault-tolerance and throughput goals, are the load-balancing scheme and the fault-tolerance scheme the system will use. This paper explores several combinations of fault-tolerance and load-balancing schemes and compares their impact on the maximum throughput achievable by the system and on its survivability. In particular, we show that a fault-tolerance scheme may affect the throughput of the system, while a load-balancing scheme may affect the system's ability to survive failures. We study the scalability of the different schemes under different loads and failure conditions. Our simulations take into account the overhead of each scheme, network contention, and resource loads.
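One plausible pairing of the two schemes the paper varies, assumed here purely for illustration: least-loaded dispatch as the load-balancing scheme, with fail-over re-dispatch of a dead server's work as the fault-tolerance scheme. The interplay the abstract describes is visible directly: the balancing rule decides where orphaned work lands after a failure.

```python
import random

class Server:
    def __init__(self, name):
        self.name, self.load, self.alive = name, 0, True

servers = [Server(f"s{i}") for i in range(4)]

def dispatch(task_cost):
    """Least-loaded dispatch among the servers that are currently alive."""
    alive = [s for s in servers if s.alive]
    if not alive:
        raise RuntimeError("total outage")
    target = min(alive, key=lambda s: s.load)
    target.load += task_cost
    return target

def fail(server):
    """Fail-over: a server dies and its load is re-dispatched elsewhere."""
    server.alive = False
    orphaned, server.load = server.load, 0
    dispatch(orphaned)

random.seed(0)
for _ in range(100):
    dispatch(random.randint(1, 5))
fail(servers[0])
print([(s.name, s.load, s.alive) for s in servers])
```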

9.
Recently, a number of query processors have been proposed for the evaluation of relational queries in structured P2P systems. However, because these approaches do not consider peer or link failures, they cannot be deployed for real-world applications without extensions. We show that typical failures in structured P2P systems can have an unpredictable impact on the correctness of results. In particular, stateful operators that store intermediate results on peers, e.g., the distributed hash join, must protect those results against failures. Although many replication schemes for P2P systems exist, they cannot replicate operator state while a query is being processed. In this paper we propose an in-query replication scheme that replicates the state of an operator among the neighbors of the processing peer. Our analytical evaluation shows that the network overhead of in-query replication is O(1) with respect to network size, i.e., the scheme is scalable. We have carried out an extensive experimental evaluation using simulations as well as a PlanetLab deployment; it confirms the effectiveness and efficiency of the in-query replication scheme and shows the effectiveness of the routing extension in networks of varying reliability.
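A sketch of the idea for the distributed hash join named in the abstract: every tuple inserted into the operator's build-side state is also pushed to a fixed set of neighbor peers while the query runs. The neighbor replicas are modeled as local dicts here, standing in for remote peers; because the neighbor set is constant, the replication traffic per tuple is O(1) in network size, matching the abstract's claim.

```python
class ReplicatedHashJoin:
    """Hash-join build side whose state is replicated, tuple by tuple,
    to a fixed set of neighbor peers while the query is processed."""

    def __init__(self, neighbors):
        self.state = {}                           # join key -> list of rows
        self.replicas = {n: {} for n in neighbors}

    def insert(self, key, row):
        self.state.setdefault(key, []).append(row)
        for replica in self.replicas.values():    # the in-query replication
            replica.setdefault(key, []).append(row)

    def probe(self, key):
        return self.state.get(key, [])

    def recover_from(self, neighbor):
        """After the processing peer fails, a neighbor restores its state."""
        self.state = {k: list(v) for k, v in self.replicas[neighbor].items()}

op = ReplicatedHashJoin(neighbors=["peerA", "peerB"])
op.insert(42, ("alice", 42))
op.state.clear()            # simulate the processing peer crashing mid-query
op.recover_from("peerA")
print(op.probe(42))         # -> [('alice', 42)]
```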

10.
In this paper, we investigate the problem of dynamically routing bandwidth-guaranteed label switched paths (LSPs) in integrated IP-over-wavelength-division-multiplexed (WDM) networks with inaccurate link state information. To select a good path, a routing algorithm needs up-to-date link state information, but maintaining it leads to excessive update overhead and scalability problems. In real networks, to avoid the extensive overhead of advertising and processing link state information, updates are made periodically or based on a threshold trigger, which introduces inaccuracies into the link state information. Our contribution is to consider the routing problem under uncertainty of link state parameters due to wavelength inaccuracy in addition to bandwidth inaccuracy. Based on the threshold-triggered update scheme, we present a probabilistic method to model the uncertainty of link state parameters and define a cost function reflecting this uncertainty. Depending on the cost metric chosen to be optimized, we propose two routing algorithms that account for the uncertainty of link state parameters. The objective is to minimize the impact of inaccurate information so that both the blocking probability and setup failures are reduced. We use performance metrics such as total blocking probability, blocking probability due to setup failures, blocking probability due to routing failures, bandwidth update frequency, and wavelength update frequency to evaluate the effectiveness of the proposed algorithms. Through extensive simulation experiments, we show that our algorithms significantly reduce the impact of inaccurate link state information and perform very well.
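A sketch of the threshold-trigger uncertainty model: with a trigger fraction τ, the true residual bandwidth is only known to lie in a window around the last advertised value. Assuming it is uniform within that window (an illustrative assumption; the paper's exact distribution may differ), the success probability of a request and a log-based link cost follow directly, and summing the cost over a path corresponds to multiplying success probabilities.

```python
import math

def admit_probability(advertised, tau, request):
    """P(residual bandwidth >= request) under a threshold-triggered update:
    the true value is only known to lie in [advertised*(1-tau),
    advertised*(1+tau)]; assumed uniform there for illustration."""
    lo, hi = advertised * (1 - tau), advertised * (1 + tau)
    if request <= lo:
        return 1.0
    if request >= hi:
        return 0.0
    return (hi - request) / (hi - lo)

def link_cost(advertised, tau, request, eps=1e-12):
    """Cost = -log P(success); the min-cost path is then the path most
    likely to avoid a setup failure caused by stale link state."""
    return -math.log(max(admit_probability(advertised, tau, request), eps))

for b in (60, 95, 130):
    print(b, round(link_cost(advertised=100, tau=0.3, request=b), 3))
```

The same construction applies per wavelength, which is how wavelength inaccuracy can be folded into the cost alongside bandwidth inaccuracy.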

11.
Latency measures the delay caused by communication between processors and memory modules over the network in a parallel system. Using intensive measurements and simulation, we show that network latency forms a major obstacle to improving parallel computing performance and scalability. We present an experimental metric that uses network latency to measure and evaluate the scalability of parallel programs and architectures. This latency metric extends the isoefficiency function [Grama et al., IEEE Parallel Distrib. Technology 1, 3 (1993), 12-21] and the iso-speed metric [Sun and Rover, IEEE Trans. Parallel Distrib. Systems 5, 6 (1994), 599-613]. We give a measurement method for using the latency metric, and report experimental results from evaluating the scalability of several scientific computing algorithms on the KSR-1 shared-memory architecture. Our analysis and experiments show that the latency metric is a practical method for effectively predicting and evaluating scalability based on the latencies measured in the program and the architecture.

12.
With emerging Internet-scale open content and resource sharing, social networks, and complex cyber-physical systems, trust issues become prominent. Conventional trust mechanisms are inadequate for addressing trust issues in decentralized open environments. In this paper, we propose a trust-vector-based trust management scheme called VectorTrust for the aggregation of distributed trust scores. Leveraging a Bellman–Ford based algorithm for fast and lightweight trust score aggregation, VectorTrust features localized and distributed concurrent communication. Built on a trust overlay network in a peer-to-peer network, a VectorTrust-enabled system is decentralized by nature and does not rely on any centralized server or centralized trust aggregation. We design, implement, and analyze trust rating, trust aggregation, and trust management strategies. To evaluate performance, we design and implement a VectorTrust simulator (VTSim) in an unstructured P2P network. The analysis and simulation results demonstrate the efficiency, accuracy, scalability, and robustness of the VectorTrust scheme. On average, VectorTrust converges faster and involves less complexity than most existing trust schemes, and it remains robust and tolerant to malicious peers and malicious behaviors. As P2P network scales and topology complexities grow dynamically, VectorTrust scales well, with reasonable overheads (about O(lg N) communication overhead) and fast convergence (about O(log_D N) iterations).
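A sketch of Bellman-Ford based trust aggregation in a common multiplicative formulation (assumed here for illustration; the paper's exact aggregation rule may differ): direct trust ratings in (0, 1] act as edge weights, and the trust a source places in a distant peer is the best product of ratings over any path, computed by repeated relaxation.

```python
# Direct trust ratings in (0, 1]: truster -> {trustee: score}.
DIRECT = {
    "a": {"b": 0.9, "c": 0.5},
    "b": {"c": 0.8, "d": 0.7},
    "c": {"d": 0.9},
    "d": {},
}

def aggregate_trust(source):
    """Bellman-Ford style relaxation with multiplicative aggregation:
    trust(source, v) = max over paths of the product of edge scores.
    Products only shrink along a path, so <= |V|-1 rounds suffice."""
    trust = {source: 1.0}
    for _ in range(len(DIRECT) - 1):
        updated = False
        for u, edges in DIRECT.items():
            for v, score in edges.items():
                candidate = trust.get(u, 0.0) * score
                if candidate > trust.get(v, 0.0):
                    trust[v] = candidate
                    updated = True
        if not updated:
            break  # early exit: the fast convergence the abstract notes
    return trust

print(aggregate_trust("a"))  # trust in d: max(0.9*0.7, 0.9*0.8*0.9) = 0.648
```

Because each peer only needs its neighbors' current vectors to relax, the computation is naturally localized and concurrent, with no central aggregation point.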

13.
In this paper, a recurrent neural network (RNN) control scheme is proposed for a biped robot trajectory tracking system. An adaptive online training algorithm is optimized to improve the transient response of the network via the so-called conic sector theorem. Furthermore, the L2-stability of the RNN weight estimation error is guaranteed, so that the robustness of the controller is ensured in the presence of uncertainties. With practical applications in mind, the algorithm is developed in the discrete-time domain. Simulations for a seven-link robot model are presented to demonstrate the advantages of the proposed approach, and comparisons are given between standard PD control and the proposed RNN compensation method.

14.
Multiple failures can have catastrophic consequences for the normal operation of telecommunication networks, so guaranteeing network robustness to avoid users and services being disconnected is essential. A wide range of metrics has been proposed for measuring network robustness. In this paper, the taxonomy of robustness metrics in telecommunication networks is extended and a classification of multiple-failure scenarios is provided. Moreover, a structural and centrality robustness comparison of 15 real telecommunication networks experiencing multiple failures is carried out. Through this analysis, the topological properties common to networks with similar robustness can be identified.
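To show what a structural-and-centrality comparison looks like in practice, here is a sketch using networkx. The metrics chosen (algebraic connectivity, average betweenness, giant-component size under random node failures) are common examples of the two metric families; the abstract does not list the paper's exact metric set, and the topology below is a stand-in for a real network.

```python
import random
import networkx as nx

G = nx.barabasi_albert_graph(50, 2, seed=1)  # stand-in for a real topology

# Structural metric: algebraic connectivity (larger = harder to disconnect).
print("algebraic connectivity:", nx.algebraic_connectivity(G))

# Centrality metric: average node betweenness (load concentration).
bc = nx.betweenness_centrality(G)
print("avg betweenness:", sum(bc.values()) / len(bc))

# Multiple-failure scenario: remove random nodes, track the giant component.
H = G.copy()
for node in random.Random(1).sample(list(H.nodes), 10):
    H.remove_node(node)
giant = max(nx.connected_components(H), key=len)
print("giant component after 10 failures:",
      len(giant), "/", H.number_of_nodes())
```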

15.
A new approach to solving the D > 3 spatial-dimensional convection-diffusion equation on clusters of workstations is derived by exploiting the stability and scalability of a combination of a generalized D-dimensional high-order compact (HOC) implicit finite difference scheme and parallelized GMRES(m). We then consider its application to multifactor option pricing using the Black–Scholes equation, show that an isotropic fourth-order compact difference scheme is numerically stable, and determine conditions under which its coefficient matrix is positive definite. The performance of GMRES(m) on distributed computers is limited by the inter-processor communication required by the matrix-vector multiplication. We show that the compact scheme requires approximately half the number of communications of a non-compact difference scheme with the same order of truncation error. As the dimensionality increases, the fraction of computation that can be overlapped with communication also increases. CPU times and parallel efficiency graphs for single-time-step approximation of up to a 7D HOC scheme on 16 processors confirm the numerical stability constraint and demonstrate improved parallel scalability over non-compact difference schemes.
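To make the GMRES(m) side concrete, the following sketch solves a 1D steady convection-diffusion system with scipy's restarted GMRES. Note the discretization used here is a plain second-order central scheme chosen for brevity, not the paper's high-order compact scheme, and all problem parameters are illustrative.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import gmres

# 1D steady convection-diffusion  -eps*u'' + a*u' = 1  on (0,1), u(0)=u(1)=0,
# discretized with second-order central differences (a non-compact scheme,
# used here only to illustrate the restarted-GMRES solve).
n, eps_c, a = 200, 0.05, 1.0
h = 1.0 / (n + 1)
lower = (-eps_c / h**2 - a / (2 * h)) * np.ones(n - 1)
main = (2 * eps_c / h**2) * np.ones(n)
upper = (-eps_c / h**2 + a / (2 * h)) * np.ones(n - 1)
A = diags([lower, main, upper], [-1, 0, 1], format="csr")
b = np.ones(n)

# GMRES(m): restart every m Krylov vectors to bound memory and, in the
# parallel setting, the per-iteration communication volume.
x, info = gmres(A, b, restart=20, maxiter=1000)
print("converged" if info == 0 else f"info={info}", "max u =", x.max())
```

The paper's communication argument lives in the matrix-vector product: a compact stencil touches fewer off-process neighbor values per product, which is why it needs roughly half the communications of a non-compact scheme of the same order.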

16.
Future 100-petaflop to exaflop-scale high-performance computer systems place higher demands on the network in terms of transmission reliability, performance balance, and scalability. The RDMA transfer model proposed in this paper adopts a strategy of configuring a small amount of resources and establishing connections dynamically to achieve reliable end-to-end data transfer. Compared with traditional reliable communication protocols such as InfiniBand, the advantages of this scheme are: (1) it supports automatic rerouting, bypassing failed network regions to guarantee reliable message delivery; (2) it supports out-of-order packet arrival and multi-path transmission between source and destination, and provides a message flow-control mechanism, which helps balance overall network performance, reduce network hotspots, and alleviate congestion; (3) the reliability data structures are implemented in the communication interface hardware, so no host memory is consumed to establish connections, giving the system extremely high scalability. Preliminary test results show that, with the optimizations applied, the protocol does not increase the transfer latency of messages smaller than 4 KB.

17.
Decentralized reputation systems have recently emerged as a prominent method of establishing trust among self-interested agents in online environments. A key issue is the efficient aggregation of data in the system; several approaches have been proposed, but they are plagued by major shortcomings. We put forward a novel, decentralized data management scheme grounded in gossip-based algorithms. Rumor mongering is known to possess algorithmic advantages, and indeed our framework inherits many of their salient features: scalability, robustness, a global perspective, and simplicity. We demonstrate that our scheme motivates agents to maintain a very high reputation by showing that the higher an agent's reputation is above the threshold set by its peers, the more transactions it is able to complete within a given time unit. We analyze the relation between the amount by which an agent's average reputation exceeds the threshold and the time required to close a deal, both theoretically and empirically, through a simulation system called GossipTrustSim. Finally, we show that our approach is inherently impervious to certain kinds of attacks. A preliminary version of this article appeared in the proceedings of IJCAI 2007.
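The abstract does not spell out the aggregation rule, so the sketch below uses push-pull gossip averaging, a standard gossip aggregation primitive, as a stand-in: peers holding local ratings of an agent converge on the global average reputation without any central server, which a partner can then compare against its threshold.

```python
import random

# Each peer holds a local rating of agent X; the network must agree on the
# average without a central server. Push-pull gossip averaging converges
# to the global mean, with the spread shrinking every round.
random.seed(7)
ratings = [random.uniform(0.0, 1.0) for _ in range(64)]
true_mean = sum(ratings) / len(ratings)

values = ratings[:]                # each peer's current estimate
for _ in range(12):
    for i in range(len(values)):   # every peer gossips once per round
        j = random.randrange(len(values))
        avg = (values[i] + values[j]) / 2.0
        values[i] = values[j] = avg

spread = max(values) - min(values)
print(f"true mean={true_mean:.4f}  estimates agree within ±{spread/2:.1e}")
```

Pairwise averaging conserves the sum of all estimates, so every peer's value converges to the true mean; the rounds needed before a partner's estimate clears the threshold is the gossip-side component of the deal-closing time the paper analyzes.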

18.
Intra-domain routing protocols are based on shortest path first (SPF) routing, in which shortest paths are calculated between each pair of nodes (routers) using pre-assigned link weights, also referred to as the link metric. These link weights can be modified by network administrators in accordance with the routing policies of the network operator. The operator's objective is usually to minimize traffic congestion or to minimize total routing cost subject to the traffic demands and the protocol constraints. However, determining the link weight combination that best suits the network operator's requirements is a difficult task. This paper surveys meta-heuristic approaches to traffic engineering, focusing on local search approaches and on extensions to the basic problem that take into account changing demands and robustness with respect to network failures.
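A minimal local-search sketch of the weight-setting problem: perturb one link weight, re-run shortest-path routing of the demands, and keep the move if the busiest link gets no worse. The topology and traffic matrix are illustrative, and single-shortest-path routing is assumed (real SPF deployments also split over equal-cost paths).

```python
import random
import networkx as nx

random.seed(0)
G = nx.Graph()
G.add_weighted_edges_from([("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
                           ("c", "d", 1), ("b", "d", 1)])
DEMANDS = {("a", "d"): 10.0, ("a", "c"): 5.0}   # illustrative traffic matrix

def max_utilization(graph):
    """Route every demand on its shortest path; return the busiest link load."""
    load = {frozenset(e): 0.0 for e in graph.edges}
    for (s, t), vol in DEMANDS.items():
        path = nx.shortest_path(graph, s, t, weight="weight")
        for u, v in zip(path, path[1:]):
            load[frozenset((u, v))] += vol
    return max(load.values())

best = max_utilization(G)
for _ in range(200):                 # simple local search over link weights
    u, v = random.choice(list(G.edges))
    old = G[u][v]["weight"]
    G[u][v]["weight"] = random.randint(1, 5)   # perturb one weight
    cost = max_utilization(G)
    if cost <= best:
        best = cost                  # keep the improving (or equal) move
    else:
        G[u][v]["weight"] = old      # revert
print("max link load:", best)
```

The surveyed meta-heuristics (tabu search, simulated annealing, and the like) are elaborations of exactly this loop, with smarter move selection and acceptance rules.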

19.
Even after thorough testing, a few bugs still remain in a program of moderate complexity. These residual bugs are randomly distributed throughout the code. We have noticed that bugs in some parts of a program cause more frequent and more severe failures than those in other parts, so a decision must be made about what to test more and what to test less within the testing budget. It is possible to prioritize the methods and classes of an object-oriented program according to their potential to cause failures. To this end, we propose a program metric called the influence metric, which captures the influence of a program element on the source code. First, we represent the source code as an intermediate graph called the extended system dependence graph. Then, forward slicing is applied to a node of the graph to obtain the influence of that node. The influence metric for a method m is the number of statements of the program that directly or indirectly use the result produced by method m. We compute the influence metric for a class c based on the influence metrics of all its methods. Because the influence metric is computed statically, it does not capture the expected behavior of a class at run time, and it is well known that faults in heavily executed parts tend to cause more failures; we therefore use an operational profile to find the average execution time of each class in the system. Classes in the source code are then prioritized based on the influence metric and the average execution time. The priority of an element indicates its potential to cause failures. Once all program elements have been prioritized, the testing effort can be apportioned so that the elements causing frequent failures are tested thoroughly. We have conducted experiments on two well-known case studies, a Library Management System and a Trading Automation System, and successfully identified the critical elements in the source code of each. We have also conducted experiments comparing our scheme with a related scheme; these studies show that our approach is more accurate than existing ones at exposing critical elements at the implementation level.
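The metric itself reduces to graph reachability, as this sketch shows: a forward slice from a node is the set of nodes reachable along dependence edges, and a method's influence is the size of its slice. The tiny dependence graph, the sum-based class aggregation, and the profile value are all illustrative assumptions.

```python
from collections import deque

# Extended system dependence graph (illustrative): node -> nodes that
# directly use its result (data/control dependence successors).
ESDG = {
    "ClassA.m1": ["s1", "s2"],
    "ClassA.m2": ["s3"],
    "s1": ["s4"], "s2": [], "s3": ["s4"], "s4": ["s5"], "s5": [],
}

def forward_slice(node):
    """All statements reachable from `node`: the statements that directly
    or indirectly use the result produced at `node`."""
    seen, queue = set(), deque([node])
    while queue:
        n = queue.popleft()
        for succ in ESDG.get(n, []):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen

def influence(method):
    return len(forward_slice(method))

# Class-level priority: combine method influences with the class's average
# execution time from an operational profile (value assumed here).
AVG_EXEC = {"ClassA": 0.42}
methods = ["ClassA.m1", "ClassA.m2"]
class_influence = sum(influence(m) for m in methods)
print("priority(ClassA) =", class_influence * AVG_EXEC["ClassA"])
```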

20.
In this paper, we propose Virtual Id Routing (VIRO), a novel "plug-&-play" non-IP routing protocol for future dynamic networks. VIRO decouples routing/forwarding from addressing by introducing a topology-aware, structured virtual id layer to encode the locations of switches and devices in the physical topology. It completely eliminates network-wide flooding in both the data and control planes, and is therefore highly scalable and robust. VIRO effectively localizes the effect of failures, performs fast re-routing, and supports multiple (logical) topologies on top of the same physical network substrate to further enhance network robustness. We have implemented an initial prototype of VIRO using Open vSwitch, extending it (in both user space and kernel space) to implement VIRO switching functions in VIRO switches, and we use the POX SDN controller to implement VIRO's control and management plane functions. We evaluate our prototype implementation through emulation and in the GENI (Global Environment for Network Innovations) testbed using many synthetic and real topologies. Our evaluation results show that VIRO has better scalability than link-state based protocols (e.g., OSPF and SEATTLE) in terms of routing-table size and control overhead, as well as better mechanisms for failure recovery.
