期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

An adaptive algorithm for resolving processor thrashing in load distribution

Chin Lu Sau-Ming Lau 《Concurrency and Computation》1995,7(7):653-670

Processor thrashing in load distribution refers to the situation when a number of nodes try to negotiate with the same target node simultaneously. The performance of dynamic load-balancing algorithms can be degraded because processor thrashing can lead to receiver node overdrafting, thus causing congestion at a receiver node and reduction of workload distribution. In the paper we present an adaptive algorithm for resolving processor thrashing in load distribution. The algorithm is based on the integration of three components: (1) a batch task assignment policy, which allows a number of tasks to be transferred as a single batch from a sender to a receiver; (2) a negotiation protocol to obtain mutual agreement between a sender and a receiver on the batch size; and (3) an adaptive symmetrically-initiated location policy to select a potential transfer partner. Simulations reveal that our algorithm provides a significant performance improvement at high system loads because the algorithm can avoid processor thrashing so that CPU capacity is more fully utilized. 相似文献

2.

An experimental study of relative throughput in a multiprocessor computer system

K. Sreenivasan G. A. Nelson J. A. Maksin 《Software》1980,10(12):973-986

This paper describes the results of controlled experiments with a Honeywell H6000 series multiprocessor computer system, 212 experiments were performed using 14 test workloads on 8 H6000 configurations. The number of processors varied from 1 to 4, system controller units (SCUs) from 1 to 4, and main memory from 256K to 1024K words. The ratio (P) of I/O time and CPU time for the test workloads varied from 0.01 to 5.07. The improvement in throughput is expressed in terms of relative throughput (φ), defined as the ratio of elapsed time for a given test workload on a single-processor configuration to that on a multiprocessor configuration. The relative throughput increased monotonically with an increase in the number of processors for test workloads in the range 0<P<0.4 (CPU-bound) and φ exhibited an asymptotic behaviour for test workloads in the range 0.4<P<5.07 (I/O-bound). 相似文献

3.

Computation scheduling in multiprocessor real-time automatic control systems with constrained processor memory

D. S. Guz M. G. Furugyan 《Automation and Remote Control》2005,66(2):295-303

For the automatic control systems with tight real time, construction of feasible schedules under the given job execution deadlines was considered, and in addition the constraints on processor memory were taken into consideration. Two methods were developed to solve this problem. The first method is based on reducing the original problem to the search of a multi-commodity flow in a special network. The second method offers a fast algorithm to determine a feasible schedule for the uniprocessor case.Translated from Avtomatika i Telemekhanika, No. 2, 2005, pp. 138–147.Original Russian Text Copyright © 2005 by Guz, Furugyan. 相似文献

4.

The AMD Opteron processor for multiprocessor servers 总被引：1，自引：0，他引：1

Keltcher C.N. McGrath K.J. Ahmed A. Conway P. 《Micro, IEEE》2003,23(2):66-76

Representing AMD's entry into 64-bit computing, Opteron combines the backwards compatibility of the X86-64 architecture with a DDR memory controller and hypertransport links to deliver server-class performance. These features also make Opteron a flexible, modular, and easily connectable component for various multiprocessor configurations. 相似文献

5.

An experimental study of sidewall adhesion in microelectromechanical systems 总被引：1，自引：0，他引：1

Timpe S.J. Komvopoulos K. 《Journal of microelectromechanical systems》2005,14(6):1356-1363

A surface micromachine was designed specifically for studying sidewall adhesion in microelectromechanical systems (MEMS). The dependence of surface adhesion on contact load and ambient conditions was investigated under quasistatic normal loading conditions. Insight was obtained into the relative contributions of van der Waals and capillary forces to the measured adhesion force. Several shortcomings in previous adhesion studies of MEMS were overcome, and measurement of the true adhesion force was achieved under different testing conditions. The present experimental procedure enables the isolation of the van der Waals component of the adhesion force and the determination of the contributions of both contacting and noncontacting asperities to the total adhesion force at the inception of surface separation. The major benefits of the developed experimental methodology and surface micromachine are discussed in the context of adhesion results obtained for different values of apparent contact pressure, ambient pressure, and relative humidity. 相似文献

6.

An evolutionary algorithm for generalized comparison-based self-diagnosis of multiprocessor systems

Mourad Elhadefand Béchir Ayeb 《Applied Artificial Intelligence》2013,27(1):73-95

In this article, we consider the problem of self-diagnosis of multiprocessor and multicomputer systems under the generalized comparison model. In this approach, a system consists of a collection n independent heterogeneous processors (or units) interconnected via point-to-point communication links, and it is assumed that at most t of these processors are permanently faulty. For the purpose of diagnosis, system tasks are assigned to pairs of processors and the results are compared. The agreements and disagreements among units are the basis for identifying faulty processors. Such a system is said to be t-diagnosable if, given any complete collection of comparison results, the set of faulty processors can be unambiguously identified. We present an efficient fault identification method based on genetic algorithms. Analysis and simulations are provided, first, to evaluate the genetic parameters of the diagnosis algorithm; second, to show the efficiency of the genetic approach. The new strategy is shown to correctly identify the set of faulty processors, making it an attractive and viable addition or alternative to present fault diagnosis techniques. 相似文献

7.

An optimal boundary fair scheduling algorithm for multiprocessor real-time systems 总被引：2，自引：0，他引：2

Dakai ZhuAuthor Vitae Xuan QiAuthor VitaeDaniel MosséAuthor Vitae Rami MelhemAuthor Vitae 《Journal of Parallel and Distributed Computing》2011,71(10):1411-1425

With the emergence of multicore processors, the research on multiprocessor real-time scheduling has caught more researchers’ attention recently. Although the topic has been studied for decades, it is still an evolving research field with many open problems. In this work, focusing on periodic real-time tasks with quantum-based computation requirements and implicit deadlines, we propose a novel optimal scheduling algorithm, namely boundary fair (Bfair), which can achieve full system utilization as the well-known Pfair scheduling algorithms. However, different from Pfair algorithms that make scheduling decisions and enforce proportional progress (i.e., fairness) for all tasks at each and every time unit, Bfair makes scheduling decisions and enforces fairness to tasks only at tasks’ period boundaries (i.e., deadlines of periodic tasks). The correctness of the Bfair algorithm to meet the deadlines of all tasks’ instances is formally proved and its performance is evaluated through extensive simulations. The results show that, compared to that of Pfair algorithms, Bfair can significantly reduce the number of scheduling points (by up to 94%) and the overhead of Bfair at each scheduling point is comparable to that of the most efficient Pfair algorithm (i.e., PD²). Moreover, by aggregating the time allocation of tasks for the time interval between consecutive period boundaries, the resulting Bfair schedule can dramatically reduce the number of required context switches and task migrations (as much as 82% and 85%, respectively) when compared to those of Pfair schedules, which in turn reduces the run-time overhead of the system. 相似文献

8.

Comparison of five multiprocessor systems

Daniel D Gajski Jih-Kwon Peir 《Parallel Computing》1985,2(3):265-282

This paper generalizes the traditional dataflow model of computation and defines the essential problems in multiprocessing: control implementation, program partitioning, scheduling, synchronization, and memory access. The paper assumes that these essential problems are axes of a multiprocessor design space and that the solutions to these problems are values on the axes. Each point in the space represents a multiprocessor including a computational paradigm that a user must follow to achieve high performance and efficiency on the particular machine. Thus, a classification of machines from the user's point of view is introduced naturally. Five well-known multiprocessors are compared using this classification scheme. 相似文献

9.

Dynamic partitioning of multiprocessor systems

Kee-Hyun Park Lawrence W. Dowdy 《International journal of parallel programming》1989,18(2):91-120

Partitioning of processors on a multiprocessor system involves logically dividing the system into processor partitions. Programs can be executed in the different partitions in parallel. Optimally setting the partition size can significantly improve the throughput of multiprocessor systems. The speedup characteristics of parallel programs can be defined by execution signatures. The execution signature of a parallel program on a multiprocessor system is the rate at which the program executes in the absence of other programs and depends upon the number of allocated processors, the specific architecture, and the specific program implementation. Based on the execution signatures, this paper analyzes simple Markovian models of dynamic partitioning. From the analysis, when there are at most two multiprocessor partitions, the optimal dynamic partition size can be found which maximizes throughput. Compared against other partitioning schemes, the dynamic partitioning scheme is shown to be the best in terms of throughput when thereconfiguration overhead is low. If the reconfiguration overhead is high, dynamic partitioning is to be avoided. An expression for the reconfiguration overhead threshold is derived. A general iterative partitioning technique is presented. It is shown that the technique gives maximum throughput forn partions. 相似文献

10.

Resource reclaiming in multiprocessor real-time systems

Shen C. Ramamritham K. Stankovic J.A. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(4):382-397

Most real-time scheduling algorithms schedule tasks with regard to their worst case computation times. Resources reclaiming refers to the problem of utilizing the resources left unused by a task when it executes in less than its worst case computation time, or when a task is deleted from the current schedule. Dynamic resource reclaiming algorithms that are effective, avoid any run time anomalies, and have bounded overhead costs that are independent of the number of tasks in the schedule are presented. Each task is assumed to have a worst case computation time, a deadline, and a set of resource requirements. The algorithms utilize the information given in a multiprocessor task schedule and perform online local optimization. The effectiveness of the algorithms is demonstrated through simulation studies 相似文献

11.

Dynamic program paralleling in multiprocessor systems

A. B. Godlevskii S. P. Krat 《Cybernetics and Systems Analysis》1984,20(3):448-452

相似文献

12.

Routing in modular fault-tolerant multiprocessor systems

Alam M.S. Melhem R.G. 《Parallel and Distributed Systems, IEEE Transactions on》1995,6(11):1206-1220

In this paper, we consider a class of modular multiprocessor architectures in which spares are added to each module to cover for faulty nodes within that module, thus forming a fault-tolerant basic block (FTBB). In contrast to reconfiguration techniques that preserve the physical adjacency between active nodes in the system, our goal is to preserve the logical adjacency between active nodes by means of a routing algorithm which delivers messages successfully to their destinations. We introduce two-phase routing strategies that route messages first to their destination FTBB, and then to the destination nodes within the destination FTBB. Such a strategy may be applied to a variety of architectures including binary hypercubes and three-dimensional tori. In the presence of f faults in hypercubes and tori, we show that the worst case length of the message route is min {σ+f, (K+1)σ}+c where σ is the shortest path in the absence of faults, K is the number of spare nodes in an FTBB, and c is a small constant. The average routing overhead is much lower than the worst case overhead 相似文献

13.

Fault diagnosis in hypercube multiprocessor systems

《国际计算机数学杂志》2012,89(3):355-372

Hypercubes are viewed as good candidates for parallel processing, because a number of topologies, such as rings, trees, and meshes, can be mapped onto the hypercubes. In this paper, we study a system level diagnosis method for clustered faults in hypercube systems. We investigate the local and global performance of the method under the Bernoulli failur distribution. We demonstrate that the diagnosis scheme can identify almost all processors successfully even if the percentage of fault-free processors is low (much lower than 50%) while almost all processors are guaranteed to be correctly identified. 相似文献

14.

Organizing computations in multiprocessor computer systems

V. S. Mikhalevich Yu. V. Kapitonova A. A. Letichevskii I. N. Molchanov S. B. Pogrebinskii 《Cybernetics and Systems Analysis》1984,20(3):305-314

相似文献

15.

An improved duplication strategy for scheduling precedence constrained graphs in multiprocessor systems 总被引：2，自引：0，他引：2

Bansal S. Kumar P. Singh K. 《Parallel and Distributed Systems, IEEE Transactions on》2003,14(6):533-544

Scheduling precedence constrained task graphs, with or without duplication, is one of the most challenging NP-complete problems in parallel and distributed computing systems. Duplication heuristics are more effective, in general, for fine grain task graphs and for networks with high communication latencies. However, most of the available duplication algorithms are designed under the assumption of unbounded availability of fully connected processors, and lie in a high complexity range. Low complexity optimal duplication algorithms work under restricted cost and/or shape parameters for the task graphs. Further, the required number of processors grows in proportion to the task-graph size significantly. An improved duplication strategy is proposed that works for arbitrary task graphs, with a limited number of interconnection-constrained processors. Unlike most other algorithms that replicate all possible parents/ancestors of a given task, the proposed algorithm tends to avoid redundant duplications and duplicates the nodes selectively, only if it helps in improving the performance. This results in lower duplications and also lower time and space complexity. Simulation results are presented for clique and an interconnection-constrained network topology with random and regular benchmark task graph suites, representing a variety of parallel numerical applications. Performance, in terms of normalized schedule length and efficiency, is compared with some of the well-known and recently proposed algorithms. The suggested algorithm turns out to be most efficient, as it generates better or comparable schedules with remarkably less processor consumption. 相似文献

16.

An optimal algorithm for global termination detection inshared-memory asynchronous multiprocessor systems

Ho-Fung Leung Hing-Fung Ting 《Parallel and Distributed Systems, IEEE Transactions on》1997,8(5):538-543

In the literature, the problem of global termination detection in parallel systems is usually solved by message passing. In shared-memory systems, this problem can also be solved by using exclusively accessible variables with locking mechanisms. In this paper, we present an algorithm that solves the problem of global termination detection in shared-memory asynchronous multiprocessor systems without using locking. We assume a reasonable computation model in which concurrent reading does not require locking and concurrent writing different values without locking results in an arbitrary one of the values being actually written. For a system of n processors, the algorithm allocates a working space of 2n+1 bits. The worst case time complexity of the algorithm is n+2√+1, which we prove is the lower bound under a reasonable model of computation 相似文献

17.

MAD kernels: an experimental testbed to study multiprocessor memorysystem behavior

Nanda A.K. Ni L.M. 《Parallel and Distributed Systems, IEEE Transactions on》1996,7(2):207-217

On large-scale multiprocessors, access to common memory is one of the key performance limiting factors. The shared-memory performance depends not only on the characteristics of the memory hierarchy itself, but also upon the characteristics of the memory address streams and the interaction between the two. We present a technique for multiprocessor workload construction and a family of artificial kernels, called MAD-kernels, to systematically investigate the behavior of the memory hierarchy. The measured performance is independent of any particular application or algorithm. The proposed methodology is demonstrated on two commercial shared-memory systems 相似文献

18.

Simulation of hierarchical multiprocessor database systems

P. S. Kostenetskii L. B. Sokolinsky 《Programming and Computer Software》2013,39(1):10-24

The paper is dedicated to issues concerning simulation and analysis of hierarchical multiprocessor systems oriented to database applications. Requirements for a parallel database system model are given. A survey and comparative analysis of known parallel database system models are presented. A new multiprocessor database system model is introduced. This model allows us to simulate and evaluate arbitrary hierarchical multiprocessor configurations in the context of the OLTP class database applications. Examples of using the database multiprocessor model for simulation study of multiprocessor database systems are presented. 相似文献

19.

Design of distributed multiprocessor operating systems

R. Baumann Prof. Dr. Thierry Lalive d'Epinay Dr. Gerhard Schrott 《Computers in Industry》1979,1(2):117-121

相似文献

20.

Queuing simulation model for multiprocessor systems

Thin-Fong Tsuei Yamamoto W. 《Computer》2003,36(2):58-64

The processor queuing model provides memory-hierarchy and system-design evaluation of memory-intensive commercial online-transaction-processing workloads on large multiprocessor systems. It differs from detailed cycle-accurate and direct-execution simulations in that it does not simulate instruction execution. Instead, as in analytical models, the authors build processor and workload characteristics that are easy to collect and estimate. Because the authors believe that the processor model's function is to accurately generate memory traffic to the rest of the system, they model a minimal set of processor and workload characteristics that captures the important interactions between a complex processor and the system-memory hierarchy. 相似文献