共查询到20条相似文献,搜索用时 0 毫秒
1.
Leonardo Dagum 《Concurrency and Computation》1992,4(3):241-255
Sorting on a parallel architecture is a communications intensive event which can incur a high penalty in applications where it is required. In the case of particle simulation, only integer sorting is necessary, and sequential implementations easily attain the minimum performance bound of O(N) for N particles. Parallel implementations, however, have to cope with the parallel sorting problem which, in addition to incurring a heavy communications cost, can make the minimum performance bound difficult to attain. This paper demonstrates how the sorting problem in a particle simulation can be reduced to a merging problem, and describes an efficient data parallel algorithm to solve this merging problem in a particle simulation. The new algorithm is shown to be optimal under conditions usual for particle simulation, and its fieldwise implementation on the Connection Machine is analysed in detail. The new algorithm is about four times faster than a fieldwise implementation of radix sort on the Connection Machine. 相似文献
2.
A novel recursive aggregation algorithm for numerical simulations of dynamic systems is proposed and analyzed. The algorithm exploits a special structure of the linear equation problem resulting from the discretization of the dynamic system and an aggregation/disaggregation procedure. The algorithm has a time complexity of (I(q)+2M(q)+3)logN for solving linear systems with q states for N discrete time instants, using O(q3N) processors, where I(q) is the parallel time complexity for inverting a q×q matrix, M(q) is the parallel time complexity for matrix multiplication of two q×q matrices. The competing parallel cyclic reduction method for the same problem has a time complexity of (I(q)+3M(q)+4)logN. Thus, the proposed algorithm has a definite speed advantage over the cyclic reduction method. An approximation technique for the unknown boundary conditions in boundary value problems is also proposed. The algorithm was implemented to simulate some dynamical (stable and unstable) systems, and the numerical results show that the accumulation of roundoff errors is insignificant as compared to the discretization errors 相似文献
3.
A. Z. Melikov 《Cybernetics and Systems Analysis》1996,32(6):821-836
Conclusion Our survey naturally could not encompass all the results available in this promising field of the theory of multistream queueing
systems. We have attempted to systematize the main computation and optimization methods for MRQ models, highlighting the topical
issues of the theory and practice of MRQ. Even this partial survey of work on MRQ leads to the conclusion that the main sources
stimulating the development of MRQ theory are the requirements of modern teletraffic theory, and in particular problems that
arise in DIQN. Moreover, MRQ provide a generalization of classical multistream models, and their study is thus of independent
interest for probability experts. These conclusions are borne out by the publications in recent years, and there is every
reason to expect that the next decade will promote MRQ research into one of the central places in the theory of multistream
queueing systems.
I would like to thank Academician V. S. Korolyuk for formulating a number of research topics for multiresource queues and
Academician I. N. Korolenko for his support of this study.
Translated from Kibernetika i Sistemnyi Analiz, No. 6, pp. 92–112, November–December, 1996. 相似文献
4.
5.
We introduce two data-structural transformations to construct double-ended priority queues from priority queues. To apply
our transformations the priority queues exploited must support the extraction of an unspecified element, in addition to the
standard priority-queue operations. With the first transformation we obtain a double-ended priority queue which guarantees
the worst-case cost of O(1) for find-min, find-max, insert, extract; and the worst-case cost of
O(lg n) with at most lg n + O(1) element comparisons for delete. With the second transformation we get a meldable double-ended priority queue which guarantees the worst-case cost of O(1) for find-min, find-max, insert, extract; the worst-case cost of O(lg n) with at most lg n + O(lg lg n) element comparisons for delete; and the worst-case cost of O(min {lg m, lg n}) for meld. Here, m and n denote the number of elements stored in the data structures prior to the operation in question.
The work of the authors was partially supported by the Danish Natural Science Research Council under contracts 21-02-0501
(project Practical data structures and algorithms) and 272-05-0272 (project Generic programming—algorithms and tools). A.
Elmasry was supported by Alexander von Humboldt fellowship. 相似文献
6.
Pascal Moyal 《Discrete Event Dynamic Systems》2017,27(3):573-584
By conducting a pathwise comparison of multiple server queues, we show in what sense it is better to have more servers in ‘First Come, First Served’ or equivalently, more queues in parallel under the policy ‘Join the Shortest Workload’. This comparison result is based on the recursive representation of Kiefer and Wolfowitz, and on a non-mass conservative generalization of the Schur-Convex semi-ordering, showing in what sense it is better to send jobs to the server of least load. We also show that this pathwise result does not hold true in general for the semi-cyclic policies introduced by Scheller-Wolf, and propose a weaker ordering that suits this larger class of models. 相似文献
7.
On parallel integer sorting 总被引:1,自引:0,他引:1
We present an optimal algorithm for sortingn integers in the range [1,n
c
] (for any constantc) for the EREW PRAM model where the word length isn
, for any >0. Using this algorithm, the best known upper bound for integer sorting on the (O(logn) word length) EREW PRAM model is improved. In addition, a novel parallel range reduction algorithm which results in a near optimal randomized integer sorting algorthm is presented. For the case when the keys are uniformly distributed integers in an arbitrary range, we give an algorithm whose expected running time is optimal.Supported by NSF-DCR-85-03251 and ONR contract N00014-87-K-0310 相似文献
8.
9.
Applying the first policy iteration (FPI) to any static dispatching (task assignment) policy yields a new improved dynamic policy that takes into account the particular cost structure and the expected future arrivals. However, it is generally hard to go beyond that due to the complex state space and the resulting difficulty in computing the value function for a dynamic policy. For example, applying FPI to identical FCFS servers with Bernoulli split gives the Least-Work-Left (LWL) policy, for which no closed-form value function is known. In fact, LWL with identical servers is equivalent to an M/G/k queue, the performance measures of which have remained as open problems. The situation gets even more complicated with heterogeneous servers. In this paper, we take an intermediate approach and consider lookahead actions that concern not only the current job but also the job arriving next, after which a basic (static) policy is assumed to take over. This is important as the benefits from some decisions can only be reaped with appropriate subsequent actions. The lookahead enables sound estimates also for marginal admission costs, e.g., with respect to LWL. The superior performance of the new near-optimal dispatching policies is demonstrated numerically. 相似文献
10.
A new approach to accelerating parallel sorting processes is introduced in this paper. This approach involves the design of a new type of memory chip with sorting functions. This type of sorting memory chip is feasible with today's VLSI techniques. A memory module organizing several sorting memory chips associated with additional ECL or TTL control logic circuits is also presented. Using the sorting memory modules in a shared memory parallel processor machine, parallel sorting algorithms such as the column sort method can reduce the row access time significantly and avoid data collisions in the interconnection network. Experimental simulation results on the practical speedup achieved and the memory utilization for the proposed approach are described. 相似文献
11.
With the popularity of parallel database machines based on the shared-nothing architecture, it has become important to find external sorting algorithms which lead to a load-balanced computation, i.e., balanced execution, communication and output. If during the course of the sorting algorithm each processor is equally loaded, parallelism is fully exploited. Similarly, balanced communication will not congest the network traffic. Since sorting can be used to support a number of other relational operations (joins, duplicate elimination, building indexes etc.) data skew produced by sorting can further lead to execution skew at later stages of these operations. In this paper we present a load-balanced parallel sorting algorithm for shared-nothing architectures. It is a multiple-input multiple-output algorithm with four stages, based on a generalization of Batcher's odd-even merge. At each stage then keys are evenly distributed among thep processors (i.e., there is no final sequential merge phase) and the distribution of keys between stages ensures against network congestion. There is no assumption made on the key distribution and the algorithm performs equally well in the presence of duplicate keys. Hence our approach always guarantees its performance, as long asn is greater thanp
3, which is the case of interest for sorting large relations. In addition, processors can be added incrementally.
Recommended by: Patrick Valduriez 相似文献
12.
V. Singh V. Kumar G. Agha C. Tomlinson 《International journal of parallel programming》1991,20(2):95-131
We present two new parallel algorithms QSP1 and QSP2 based on sequential quicksort for sorting data on a mesh multicomputer, and analyze their scalability using the isoefficiency metric. We show that QSP2 matches the lower bound on the isoefficiency function for mesh multicomputers, while QSP1 is fairly close to optimal. Langet al.
(1) and Schnorret al.
(2) have developed parallel sorting algorithms for the mesh architecture that have either optimal (Schnorr) or close to optimal (Lang) run-time complexity for the one-element-perprocessor case. Both QSP1 and QSP2 have better scalability than the scaled-down variants of these algorithms (for the case in which there are more elements than processors). We also analyze a different variant of Lang's sort which is as scalable as QSP2. We briefly discuss another metric called resource consumption. According to this metric, both QSP1 and QSP2 are superior to variants of Lang's sort. 相似文献
13.
J.S. Kaufman 《Performance Evaluation》1984,4(3):183-198
Queuing network models are commonly used to analyze the performance of computer system. Unfortunately, the class of queuing network models which can be exactly analyzed excludes CPU priority scheduling disciplines, conspicuously present in most computer systems.A popular approximation technique which we denote the reduced occupancy approximation, is often used to analyze such priority service disciplines because of its simplicity and intuitive appeal. However, despite its widespread use, questions about its accuracy and applicability have received very little attention. Further compounding this situation, is the existence of proprietary software packages which purport to analyze such priority disciplines, but which in fact exhibit behavior remarkably similar to the roa.In this paper we show where, and more importantly why, the roa fails. This understanding leads to a significantly improved approximation technique which sacrifices neither simplicity nor applicability. Although our primary focus is on a two class preemptive priority closed network structure, the basic idea is quite general and extensions to multiclass and nonpreemptive priority structures are indicated. 相似文献
14.
We generalize the well-known odd-even merge sorting algorithm, originally due to Batcher (1968), and show how this generalized algorithm can be applied to sorting on product networks. If G is an arbitrary factor graph with N nodes, its r-dimensional product contains Nr nodes. Our algorithm sorts Nr keys stored in the r-dimensional product of G in O(rrF(N)) time, where F(N) depends on G. We show that, for any factor graph G, F(N) is, at most, O(N), establishing an upper bound of O(r2 N) for the time complexity of sorting Nr keys on any product network. For product networks with bounded r(e.g. for grids), this leads to the asymptotic complexity of O(N) to sort Nr keys, which is optimal for several instances of product networks. There are factor graphs for which F(N)=O(log2 N), which leads to the asymptotic running time of O(log2 N) to sort Nr keys. For networks with bounded N (e.g. in the hypercube N=2, fixed), the asymptotic complexity becomes O(r2). We show how to apply the algorithm to several cases of well-known product networks, as well as others introduced recently. We compare the performance of our algorithm to well-known algorithms developed specifically for these networks, as well as others. The result of these comparisons led us to conjecture that the proposed algorithm is probably the best deterministic algorithm that can be found in terms of the low asymptotic complexity with a small constant 相似文献
15.
We consider a broker-based network of non-observable parallel queues and analyze the minimum expected response time and the optimal routing policy when the broker has the memory of its previous routing decisions. We provide lower bounds on the minimum response time by means of convex programming that are tight, as follows by a numerical comparison with a proposed routing scheme. The “Price of Forgetting” (PoF), the ratio between the minimum response times achieved by a probabilistic broker and a broker with memory, is shown to be unbounded or arbitrarily close to one depending on the coefficient of variation of the service time distributions. In the case of exponential service times, the PoF is bounded from above by two, which is tight in heavy-traffic, and independent of the network size and heterogeneity. These properties yield a simple engineering product-form approximating tightly the minimum response time. Finally, we put our results in the context of game theory revisiting the “Price of Anarchy” (PoA) of parallel queues: it can be decomposed into the product of the PoA achieved by a probabilistic broker (already well understood) and the PoF. 相似文献
16.
Ali 《Performance Evaluation》2005,60(1-4):327-343
We consider a queueing system with a number of identical exponential servers. Each server has its own queue with unlimited capacity. The service discipline in each queue is first-come-first-served (FCFS). Customers arrive according to a state-dependent Poisson process with an arrival rate which is a non-increasing function of the number of customers in the system. Upon arrival, a customer must join a server’s queue according to a stationary state-dependent policy, where the state is taken to be the number of customers in servers’ queues. No jockeying among queues is allowed. Each arriving customer is limited to a generally distributed patience time after which it must depart the system and is considered lost. Two models of customer behavior are considered: deadlines until the beginning of service and deadlines until the end of service. We seek an optimal policy to assign an arriving customer to a server’s queue. We show that, when the distribution of customer impatience satisfies certain property, the policy of joining shortest queue (SQ) stochastically minimizes the number of lost customers during any finite interval in the long run. This property is shown to always hold for the case of deterministic customer impatience. 相似文献
17.
18.
A sorting classification of parallel rendering 总被引:26,自引:0,他引:26
We describe a classification scheme that we believe provides a more structured framework for reasoning about parallel rendering. The scheme is based on where the sort from object coordinates to screen coordinates occurs, which we believe is fundamental whenever both geometry processing and rasterization are performed in parallel. This classification scheme supports the analysis of computational and communication costs, and encompasses the bulk of current and proposed highly parallel renderers - both hardware and software. We begin by reviewing the standard feed-forward rendering pipeline, showing how different ways of parallelizing it lead to three classes of rendering algorithms. Next, we consider each of these classes in detail, analyzing their aggregate processing and communication costs, possible variations, and constraints they may impose on rendering applications. Finally, we use these analyses to compare the classes and identify when each is likely to be preferable 相似文献
19.
20.
Das S.K. Pinotti M.C. Sarkar F. 《Parallel and Distributed Systems, IEEE Transactions on》1996,7(6):555-564
We efficiently map a priority queue on the hypercube architecture in a load balanced manner, with no additional communication overhead, and present optimal parallel algorithms for performing insert and deletemin operations. Two implementations for such operations are proposed on the single port hypercube model. In a b-bandwidth, n-item priority queue in which every node contains b items in sorted order, the first implementation achieves optimal speed up of O(min{log n, b log n/log b+log log n}) for inserting b presorted items or deleting b smallest items, where b=O(n1c/) with c>1. In particular, single insertion and deletion operations are cost optimal and require O(log n/p+log p) time using O(log n/log log n) processors. The second implementation is more scalable since it uses a larger number of processors, and attains a “nearly” optimal speedup on the single hypercube. Namely, the insertion of log n presorted items or the deletion of the log n smallest items is accomplished in O(log log n2) time using O(log2 n/log log n) processors. Finally, on the slightly more powerful pipelined hypercube model, the second implementation performs log n operations in O(log log n) time using O(log2 n/log log n) processors, thus achieving an optimal speed up. To the best of our knowledge, our algorithms are the first implementations of b-bandwidth distributed priority queues, which are load balanced and yet guarantee optimal speed ups 相似文献