20 similar documents found (search time: 11 ms)
1.
Processor thrashing in load distribution refers to the situation in which a number of nodes try to negotiate with the same target node simultaneously. Processor thrashing can degrade the performance of dynamic load-balancing algorithms because it leads to receiver-node overdrafting, causing congestion at the receiver and reducing the effectiveness of workload distribution. In this paper we present an adaptive algorithm for resolving processor thrashing in load distribution. The algorithm integrates three components: (1) a batch task assignment policy, which allows a number of tasks to be transferred as a single batch from a sender to a receiver; (2) a negotiation protocol to obtain mutual agreement between a sender and a receiver on the batch size; and (3) an adaptive symmetrically-initiated location policy to select a potential transfer partner. Simulations reveal that our algorithm provides a significant performance improvement at high system loads, because it avoids processor thrashing and thus utilizes CPU capacity more fully.
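The batch-size negotiation (component 2 of the abstract above) can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not the authors' protocol: the sender offers its load surplus, the receiver accepts at most its spare capacity, and the agreed batch is the smaller of the two, so a receiver is never overdrafted.

```python
# Hypothetical sketch of a sender/receiver batch-size negotiation.
# Thresholds and queue lengths are illustrative assumptions.

def negotiate_batch_size(sender_queue_len, sender_threshold,
                         receiver_queue_len, receiver_threshold):
    """Return the batch size both sides agree on (0 means no transfer)."""
    # The sender offers its surplus above its load threshold.
    offer = max(0, sender_queue_len - sender_threshold)
    # The receiver accepts at most its spare capacity below its threshold.
    spare = max(0, receiver_threshold - receiver_queue_len)
    return min(offer, spare)

# Sender surplus is 7, receiver can absorb only 3 -> agreed batch is 3.
print(negotiate_batch_size(12, 5, 1, 4))  # 3
```

Because the agreed size is capped by the receiver's spare capacity, many senders negotiating with the same receiver cannot collectively overload it.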
2.
This paper describes the results of controlled experiments with a Honeywell H6000 series multiprocessor computer system. In all, 212 experiments were performed using 14 test workloads on 8 H6000 configurations. The number of processors varied from 1 to 4, system controller units (SCUs) from 1 to 4, and main memory from 256K to 1024K words. The ratio (P) of I/O time to CPU time for the test workloads varied from 0.01 to 5.07. The improvement in throughput is expressed in terms of relative throughput (φ), defined as the ratio of elapsed time for a given test workload on a single-processor configuration to that on a multiprocessor configuration. The relative throughput increased monotonically with the number of processors for test workloads in the range 0<P<0.4 (CPU-bound), and φ exhibited asymptotic behaviour for test workloads in the range 0.4<P<5.07 (I/O-bound).
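The relative-throughput metric defined above translates directly into code; the timing values below are illustrative, not the paper's measurements.

```python
# Relative throughput phi: elapsed time on one processor divided by
# elapsed time on the multiprocessor configuration (as defined above).

def relative_throughput(t_single, t_multi):
    return t_single / t_multi

# A CPU-bound workload whose elapsed time drops from 400 s to 100 s
# on a 4-processor configuration achieves phi = 4.0 (linear speedup).
print(relative_throughput(400.0, 100.0))  # 4.0
```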
3.
For automatic control systems with tight real-time constraints, the construction of feasible schedules under given job execution deadlines was considered, taking constraints on processor memory into account. Two methods were developed to solve this problem. The first method is based on reducing the original problem to the search for a multi-commodity flow in a special network. The second method offers a fast algorithm to determine a feasible schedule for the uniprocessor case. Translated from Avtomatika i Telemekhanika, No. 2, 2005, pp. 138–147. Original Russian Text Copyright © 2005 by Guz, Furugyan.
4.
The AMD Opteron processor for multiprocessor servers
Representing AMD's entry into 64-bit computing, Opteron combines the backwards compatibility of the x86-64 architecture with a DDR memory controller and HyperTransport links to deliver server-class performance. These features also make Opteron a flexible, modular, and easily connectable component for various multiprocessor configurations.
5.
Modern DBMSes are designed to support many transactions running simultaneously. DBMS thrashing is indicated by a sharp drop in transaction throughput. Thrashing behavior in DBMSes is a serious concern to database administrators (DBAs) as well as to DBMS implementers. From an engineering perspective, therefore, it is of critical importance to understand the causal factors of DBMS thrashing. However, understanding the origin of thrashing in modern DBMSes is challenging, due to many factors that may interact with each other. This article aims to better understand the thrashing phenomenon across multiple DBMSes. We identify some of the underlying causes of DBMS thrashing. We then propose a novel structural causal model to explicate the relationships between various factors contributing to DBMS thrashing. Our model derives a number of specific hypotheses to be subsequently tested across DBMSes, providing empirical support for this model as well as important engineering implications for improvements in transaction processing.
6.
A surface micromachine was designed specifically for studying sidewall adhesion in microelectromechanical systems (MEMS). The dependence of surface adhesion on contact load and ambient conditions was investigated under quasistatic normal loading conditions. Insight was obtained into the relative contributions of van der Waals and capillary forces to the measured adhesion force. Several shortcomings in previous adhesion studies of MEMS were overcome, and measurement of the true adhesion force was achieved under different testing conditions. The present experimental procedure enables the isolation of the van der Waals component of the adhesion force and the determination of the contributions of both contacting and noncontacting asperities to the total adhesion force at the inception of surface separation. The major benefits of the developed experimental methodology and surface micromachine are discussed in the context of adhesion results obtained for different values of apparent contact pressure, ambient pressure, and relative humidity.
7.
In this article, we consider the problem of self-diagnosis of multiprocessor and multicomputer systems under the generalized comparison model. In this approach, a system consists of a collection of n independent heterogeneous processors (or units) interconnected via point-to-point communication links, and it is assumed that at most t of these processors are permanently faulty. For the purpose of diagnosis, system tasks are assigned to pairs of processors and the results are compared. The agreements and disagreements among units are the basis for identifying faulty processors. Such a system is said to be t-diagnosable if, given any complete collection of comparison results, the set of faulty processors can be unambiguously identified. We present an efficient fault identification method based on genetic algorithms. Analysis and simulations are provided, first, to evaluate the genetic parameters of the diagnosis algorithm, and second, to show the efficiency of the genetic approach. The new strategy is shown to correctly identify the set of faulty processors, making it an attractive and viable addition or alternative to present fault diagnosis techniques.
8.
The design of a network switch for synchronously clocked packet switching networks is presented. The switch includes the node interface and logic handling of the arbitration and routing for a large class of network topologies, namely n-dimensional rectangular grids including hypercubes and other highly efficient topologies. In the context of the SUPRENUM project the paper concentrates on two-dimensional meshes. Routing, arbitration, blocking, and fault tolerance issues are discussed.
9.
Dakai Zhu, Xuan Qi, Daniel Mossé, Rami Melhem 《Journal of Parallel and Distributed Computing》2011,71(10):1411-1425
With the emergence of multicore processors, research on multiprocessor real-time scheduling has recently caught more researchers' attention. Although the topic has been studied for decades, it is still an evolving research field with many open problems. In this work, focusing on periodic real-time tasks with quantum-based computation requirements and implicit deadlines, we propose a novel optimal scheduling algorithm, namely boundary fair (Bfair), which can achieve full system utilization, as the well-known Pfair scheduling algorithms do. However, unlike Pfair algorithms, which make scheduling decisions and enforce proportional progress (i.e., fairness) for all tasks at each and every time unit, Bfair makes scheduling decisions and enforces fairness only at tasks' period boundaries (i.e., the deadlines of periodic tasks). The correctness of the Bfair algorithm in meeting the deadlines of all tasks' instances is formally proved, and its performance is evaluated through extensive simulations. The results show that, compared to Pfair algorithms, Bfair can significantly reduce the number of scheduling points (by up to 94%), while the overhead of Bfair at each scheduling point is comparable to that of the most efficient Pfair algorithm (i.e., PD2). Moreover, by aggregating the time allocation of tasks over the interval between consecutive period boundaries, the resulting Bfair schedule can dramatically reduce the number of required context switches and task migrations (by as much as 82% and 85%, respectively) compared to Pfair schedules, which in turn reduces the run-time overhead of the system.
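The reduction in scheduling points claimed above can be illustrated with a small counting sketch (not the authors' algorithm): over one hyperperiod, Pfair makes a decision at every time unit, while Bfair only decides at period boundaries.

```python
# Count scheduling points over one hyperperiod for an illustrative task set.
# Task periods are assumptions chosen for the example.
from math import lcm

def scheduling_points(periods):
    hyper = lcm(*periods)
    pfair_points = hyper  # Pfair decides at every time unit in the hyperperiod
    # Bfair decides only at period boundaries: multiples of each task's period.
    bfair_points = len({k * p for p in periods
                        for k in range(1, hyper // p + 1)})
    return pfair_points, bfair_points

# Hyperperiod is 20; the only boundaries are t = 5, 10, 15, 20.
print(scheduling_points([5, 10, 20]))  # (20, 4)
```

With harmonic periods the boundary set is small; the 94% reduction reported in the abstract corresponds to task sets where boundaries are similarly sparse relative to the hyperperiod.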
10.
Partitioning of processors on a multiprocessor system involves logically dividing the system into processor partitions. Programs can be executed in the different partitions in parallel. Optimally setting the partition size can significantly improve the throughput of multiprocessor systems. The speedup characteristics of parallel programs can be defined by execution signatures. The execution signature of a parallel program on a multiprocessor system is the rate at which the program executes in the absence of other programs; it depends upon the number of allocated processors, the specific architecture, and the specific program implementation. Based on the execution signatures, this paper analyzes simple Markovian models of dynamic partitioning. From the analysis, when there are at most two multiprocessor partitions, the optimal dynamic partition size can be found which maximizes throughput. Compared against other partitioning schemes, the dynamic partitioning scheme is shown to be the best in terms of throughput when the reconfiguration overhead is low. If the reconfiguration overhead is high, dynamic partitioning is to be avoided. An expression for the reconfiguration overhead threshold is derived. A general iterative partitioning technique is presented. It is shown that the technique gives maximum throughput for n partitions.
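As a toy illustration of how execution signatures drive partition sizing (a static two-way search, not the paper's Markovian analysis of dynamic partitioning), one can pick the split of P processors that maximizes the combined execution rate of two programs. The signature function here is an assumed diminishing-returns curve.

```python
# Toy search for the best two-way split of P processors between two
# programs, given each program's execution signature (rate vs. processors).

def best_split(P, sig_a, sig_b):
    """Return k maximizing sig_a(k) + sig_b(P - k) over 1 <= k < P."""
    return max(range(1, P), key=lambda k: sig_a(k) + sig_b(P - k))

# Hypothetical signature with diminishing returns per added processor.
sig = lambda n: n / (1 + 0.1 * (n - 1))

print(best_split(8, sig, sig))  # symmetric signatures -> an even split, 4
```

With identical concave signatures the even split wins; asymmetric signatures shift the optimum toward the program that scales better.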
11.
This paper generalizes the traditional dataflow model of computation and defines the essential problems in multiprocessing: control implementation, program partitioning, scheduling, synchronization, and memory access. The paper assumes that these essential problems are axes of a multiprocessor design space and that the solutions to these problems are values on the axes. Each point in the space represents a multiprocessor including a computational paradigm that a user must follow to achieve high performance and efficiency on the particular machine. Thus, a classification of machines from the user's point of view is introduced naturally. Five well-known multiprocessors are compared using this classification scheme.
12.
Shen, C., Ramamritham, K., Stankovic, J.A. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(4):382-397
Most real-time scheduling algorithms schedule tasks with regard to their worst case computation times. Resource reclaiming refers to the problem of utilizing the resources left unused by a task when it executes in less than its worst case computation time, or when a task is deleted from the current schedule. Dynamic resource reclaiming algorithms are presented that are effective, avoid any run time anomalies, and have bounded overhead costs that are independent of the number of tasks in the schedule. Each task is assumed to have a worst case computation time, a deadline, and a set of resource requirements. The algorithms utilize the information given in a multiprocessor task schedule and perform online local optimization. The effectiveness of the algorithms is demonstrated through simulation studies.
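The basic reclaiming idea can be sketched for a single processor (a minimal illustration, not the authors' bounded-overhead multiprocessor algorithm): when a task finishes before its worst-case computation time, the next task in the schedule starts early, reclaiming the unused slot.

```python
# One-processor sketch: compare planned start times (based on worst-case
# computation times, WCETs) with actual start times when tasks finish early.

def reclaim(wcets, actuals):
    """Return (planned_starts, actual_starts) for tasks run in order."""
    planned, actual = [], []
    tp = ta = 0
    for wcet, used in zip(wcets, actuals):
        planned.append(tp)
        actual.append(ta)
        tp += wcet   # planned schedule reserves the full WCET
        ta += used   # reclaiming lets the next task start at actual finish
    return planned, actual

# Task 1 uses 2 of its 4 WCET units, so tasks 2 and 3 start earlier.
print(reclaim([4, 4, 4], [2, 4, 3]))  # ([0, 4, 8], [0, 2, 6])
```

On a multiprocessor with shared resources, naively starting tasks early like this can cause run-time anomalies, which is exactly what the algorithms in the abstract are designed to avoid.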
13.
14.
An improved duplication strategy for scheduling precedence constrained graphs in multiprocessor systems
Bansal, S., Kumar, P., Singh, K. 《Parallel and Distributed Systems, IEEE Transactions on》2003,14(6):533-544
Scheduling precedence constrained task graphs, with or without duplication, is one of the most challenging NP-complete problems in parallel and distributed computing systems. Duplication heuristics are more effective, in general, for fine grain task graphs and for networks with high communication latencies. However, most of the available duplication algorithms are designed under the assumption of unbounded availability of fully connected processors, and lie in a high complexity range. Low complexity optimal duplication algorithms work under restricted cost and/or shape parameters for the task graphs. Further, the required number of processors grows significantly, in proportion to the task-graph size. An improved duplication strategy is proposed that works for arbitrary task graphs with a limited number of interconnection-constrained processors. Unlike most other algorithms, which replicate all possible parents/ancestors of a given task, the proposed algorithm tends to avoid redundant duplications and duplicates nodes selectively, only if doing so improves performance. This results in fewer duplications and also lower time and space complexity. Simulation results are presented for clique and an interconnection-constrained network topology with random and regular benchmark task graph suites, representing a variety of parallel numerical applications. Performance, in terms of normalized schedule length and efficiency, is compared with some of the well-known and recently proposed algorithms. The suggested algorithm turns out to be the most efficient, as it generates better or comparable schedules with remarkably less processor consumption.
15.
《International Journal of Computer Mathematics》2012,89(3):355-372
Hypercubes are viewed as good candidates for parallel processing, because a number of topologies, such as rings, trees, and meshes, can be mapped onto them. In this paper, we study a system level diagnosis method for clustered faults in hypercube systems. We investigate the local and global performance of the method under the Bernoulli failure distribution. We demonstrate that the diagnosis scheme correctly identifies almost all processors even when the percentage of fault-free processors is low (much lower than 50%).
16.
In this paper, we consider a class of modular multiprocessor architectures in which spares are added to each module to cover for faulty nodes within that module, thus forming a fault-tolerant basic block (FTBB). In contrast to reconfiguration techniques that preserve the physical adjacency between active nodes in the system, our goal is to preserve the logical adjacency between active nodes by means of a routing algorithm which delivers messages successfully to their destinations. We introduce two-phase routing strategies that route messages first to their destination FTBB, and then to the destination nodes within that FTBB. Such a strategy may be applied to a variety of architectures including binary hypercubes and three-dimensional tori. In the presence of f faults in hypercubes and tori, we show that the worst case length of the message route is min {σ+f, (K+1)σ}+c, where σ is the shortest path in the absence of faults, K is the number of spare nodes in an FTBB, and c is a small constant. The average routing overhead is much lower than the worst case overhead.
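The worst-case route-length bound quoted above is easy to restate as a checkable formula; the parameter values below are illustrative, not from the paper.

```python
# Worst-case route length from the abstract: min(sigma + f, (K+1)*sigma) + c,
# where sigma is the fault-free shortest path, f the number of faults,
# K the spares per FTBB, and c a small constant.

def worst_case_route(sigma, f, K, c):
    return min(sigma + f, (K + 1) * sigma) + c

# With a short fault-free path and many faults, the (K+1)*sigma term
# caps the detour: min(3 + 10, 3 * 3) + 1 = 10.
print(worst_case_route(sigma=3, f=10, K=2, c=1))  # 10

# With few faults the sigma + f term is the binding one: min(6, 20) + 0 = 6.
print(worst_case_route(sigma=5, f=1, K=3, c=0))  # 6
```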
17.
18.
On large-scale multiprocessors, access to common memory is one of the key performance limiting factors. Shared-memory performance depends not only on the characteristics of the memory hierarchy itself, but also upon the characteristics of the memory address streams and the interaction between the two. We present a technique for multiprocessor workload construction and a family of artificial kernels, called MAD-kernels, to systematically investigate the behavior of the memory hierarchy. The measured performance is independent of any particular application or algorithm. The proposed methodology is demonstrated on two commercial shared-memory systems.
19.
《Journal of Parallel and Distributed Computing》2005,65(5):595-608
The scheduling of real-time tasks with primary-backup-based fault-tolerant requirements has been an important problem for several years. Most of the known scheduling schemes are non-adaptive in nature, meaning that they do not adapt to the dynamics of faults and tasks' parameters in the system. In this paper, we propose an adaptive fault-tolerant scheduling scheme that has a mechanism to control the overlap interval between the primary and backup versions of tasks such that the overall performance of the system is improved. The overlap interval is determined based on the observed fault rate and the task's soft laxity. We also propose a new performance index, called the SR index, that integrates schedulability (S) and reliability (R) into a single metric. To evaluate the proposed scheme, we have conducted analytical and simulation studies under different fault and deadline scenarios, and found that the proposed adaptive scheme adapts to system dynamics and offers a better SR index than the non-adaptive schemes.
20.
Ho-Fung Leung, Hing-Fung Ting 《Parallel and Distributed Systems, IEEE Transactions on》1997,8(5):538-543
In the literature, the problem of global termination detection in parallel systems is usually solved by message passing. In shared-memory systems, this problem can also be solved by using exclusively accessible variables with locking mechanisms. In this paper, we present an algorithm that solves the problem of global termination detection in shared-memory asynchronous multiprocessor systems without using locking. We assume a reasonable computation model in which concurrent reading does not require locking, and concurrent writing of different values without locking results in an arbitrary one of the values actually being written. For a system of n processors, the algorithm allocates a working space of 2n+1 bits. The worst case time complexity of the algorithm is n+2√n+1, which we prove is the lower bound under a reasonable model of computation.