Similar Documents
20 similar documents found
1.
We study the problem of sorting binary sequences and permutations by length-weighted reversals. We consider a wide class of cost functions, namely \(f(\ell)=\ell^{\alpha}\) for all \(\alpha \ge 0\), where \(\ell\) is the length of the reversed subsequence. We present tight or nearly tight upper and lower bounds on the worst-case cost of sorting by reversals. Then we develop algorithms to approximate the optimal cost to sort a given input. Furthermore, we give polynomial-time algorithms to determine the optimal reversal sequence for a restricted but interesting class of sequences and cost functions. Our results have direct application in computational biology to the field of comparative genomics.
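As a concrete illustration of the cost model, here is a minimal Python sketch; the pancake-style prefix-reversal strategy below is only an illustrative baseline, not the paper's algorithm.

```python
# Minimal sketch of the length-weighted reversal cost model above:
# reversing a subsequence of length l costs f(l) = l**alpha.  The
# prefix-reversal (pancake) strategy here is an illustrative upper
# bound only, not the paper's optimal algorithm.

def reversal_cost(length: int, alpha: float) -> float:
    """Cost f(l) = l**alpha of reversing a subsequence of length l."""
    return length ** alpha

def pancake_sort_cost(perm: list[int], alpha: float) -> float:
    """Sort by prefix reversals; return total length-weighted cost."""
    perm = list(perm)
    total = 0.0
    for end in range(len(perm) - 1, 0, -1):
        i = perm.index(max(perm[:end + 1]))
        if i != end:
            if i > 0:  # bring the maximum to the front...
                perm[:i + 1] = reversed(perm[:i + 1])
                total += reversal_cost(i + 1, alpha)
            perm[:end + 1] = reversed(perm[:end + 1])  # ...then into place
            total += reversal_cost(end + 1, alpha)
    return total

print(pancake_sort_cost([3, 1, 4, 2], alpha=1.0))  # 12.0
```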

2.
In this paper, we develop load balancing strategies for scalable high-performance parallel A* algorithms suitable for distributed-memory machines. In parallel A* search, inefficiencies such as processor starvation and search of nonessential spaces (search spaces not explored by the sequential algorithm) grow with the number of processors P used, thus restricting its scalability. To alleviate this effect, we propose a novel parallel startup phase and an efficient dynamic load balancing strategy called the quality equalizing (QE) strategy. Our new parallel startup scheme executes optimally in Θ(log P) time and, in addition, achieves good initial load balance. The QE strategy possesses certain unique quantitative and qualitative load balancing properties that enable it to significantly reduce starvation and nonessential work. Consequently, we obtain a highly scalable parallel A* algorithm with an almost-linear speedup. The startup and load balancing schemes were employed in parallel A* algorithms to solve the Traveling Salesman Problem on an nCUBE2 hypercube multicomputer. The QE strategy yields average speedup improvements of about 20-185% and 15-120% at low and intermediate work densities (the ratio of the problem size to P), respectively, over three well-known load balancing methods: the round-robin (RR), the random communication (RC), and the neighborhood averaging (NA) strategies. The average speedup observed on 1024 processors is about 985, representing a very high efficiency of 0.96. Finally, we analyze and empirically evaluate the scalability of parallel A* algorithms in terms of the isoefficiency metric. Our analysis gives (1) a Θ(P log P) lower bound on the isoefficiency function of any parallel A* algorithm, and (2) a general expression for the upper bound on the isoefficiency function of our parallel A* algorithm using the QE strategy on any topology; for the hypercube and 2-D mesh architectures the upper bounds on the isoefficiency function are found to be Θ(P log² P) and Θ(P[formula]), respectively. Experimental results validate our analysis, and also show that parallel A* search has better scalability using the QE load balancing strategy than using the RR, RC, or NA strategies.
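For readers unfamiliar with the isoefficiency metric referenced above, here is a compact statement of the standard definition (general background, not taken verbatim from the paper):

```latex
% Isoefficiency in brief (standard definitions, not verbatim from the
% paper).  Let W be the problem size, T_P the parallel time on P
% processors, and T_o = P*T_P - W the total overhead.  Then
\[
  E \;=\; \frac{W}{W + T_o(W,P)}
  \qquad\Longrightarrow\qquad
  W \;=\; \frac{E}{1-E}\, T_o(W,P).
\]
% The isoefficiency function is the growth of W (as a function of P)
% needed to hold E constant; the abstract's \(\Theta(P \log P)\)
% result lower-bounds that growth for any parallel A* algorithm.
```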

3.
In this paper, we present efficient algorithms for sorting on the Parallel Disks Model (PDM). Numerous asymptotically optimal algorithms have been proposed in the literature. However, many of these merge-based algorithms have large underlying constants in the time bounds, because they suffer from the lack of read parallelism on the PDM. The irregular consumption of the runs during the merge affects the read parallelism and contributes to the increased sorting time. In this paper, we first introduce a novel idea called dirty sequence accumulation that improves the read parallelism. Next, we show analytically that this idea can reduce the number of parallel I/Os required to sort the input close to the known lower bound. We experimentally verify our dirty-sequence idea with the standard R-way merge and show that our idea can significantly reduce the number of parallel I/Os needed to sort on the PDM.
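The merge step these algorithms build on can be sketched as follows; this is a minimal sequential illustration of R-way merging and does not model parallel disks or the paper's dirty-sequence accumulation.

```python
# Minimal sketch of an R-way merge, the primitive underlying PDM
# sorting algorithms (illustrative only).  Merging R sorted runs with
# a heap costs O(n log R) comparisons for n total elements.
import heapq

def r_way_merge(runs: list[list[int]]) -> list[int]:
    """Merge R sorted runs into one sorted sequence."""
    heap = [(run[0], i, 0) for i, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out = []
    while heap:
        value, run_id, pos = heapq.heappop(heap)
        out.append(value)
        if pos + 1 < len(runs[run_id]):
            heapq.heappush(heap, (runs[run_id][pos + 1], run_id, pos + 1))
    return out

print(r_way_merge([[1, 4, 7], [2, 5, 8], [3, 6, 9]]))
```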

4.
Latency measures the delay caused by communication between processors and memory modules over the network in a parallel system. Using intensive measurements and simulation, we show that network latency forms a major obstacle to improving parallel computing performance and scalability. We present an experimental metric that uses network latency to measure and evaluate the scalability of parallel programs and architectures. This latency metric extends the isoefficiency function [Grama et al., IEEE Parallel Distrib. Technology 1, 3 (1993), 12-21] and the iso-speed metric [Sun and Rover, IEEE Trans. Parallel Distrib. Systems 5, 6 (1994), 599-613]. We give a measurement method for applying this latency metric, and report experimental results from evaluating the scalability of several scientific computing algorithms on the KSR-1 shared-memory architecture. Our analysis and experiments show that the latency metric is a practical method for effectively predicting and evaluating scalability, based on the measured latencies inherent in the program and the architecture.

5.
Summary. Lower bounds for sorting on mesh-connected arrays of processors are presented. For sorting \(N = n_1 n_2 \cdots n_r\) elements on an \(n_1 \times n_2 \times \cdots \times n_r\) array, \(2(n_1 + \cdots + n_{r-1}) + n_r\) data interchange steps are needed asymptotically. For two dimensions these bounds are asymptotically best possible, provided that \(n_1\) and \(n_2\) are powers of 2. In this case the generalized \(s^2\)-way merge sort of Thompson and Kung turns out to be asymptotically optimal. The minimal asymptotic bound of \(2\sqrt{2N}\) interchange steps can be obtained only by sorting algorithms suitable for \(\sqrt{N/2} \times \sqrt{2N}\) meshes. For \(r \ge 3\) dimensions, an analysis of aspect ratios also demonstrates that there exist mesh-connected architectures which are better suited for sorting than simple r-dimensional cubes. This work was done at the Institut für Informatik und Praktische Mathematik, University of Kiel, Federal Republic of Germany.
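The \(2\sqrt{2N}\) figure follows from a one-line optimization over aspect ratios; here is a worked check consistent with the bound stated above:

```latex
% Worked check of the minimal bound above: over all two-dimensional
% n_1 x n_2 meshes with n_1 * n_2 = N, minimize the step count.
\[
  \min_{n_1 n_2 = N}\; \bigl(2n_1 + n_2\bigr)
  \;\Longrightarrow\;
  n_1 = \sqrt{N/2}, \quad n_2 = \sqrt{2N}, \quad
  2n_1 + n_2 = 2\sqrt{2N},
\]
% which is why only \(\sqrt{N/2}\times\sqrt{2N}\) meshes attain the
% minimal asymptotic bound of \(2\sqrt{2N}\) interchange steps.
```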

6.
In this paper we propose two new multilayer grid models for VLSI layout, both of which take into account the number of contact cuts used. For the first model, in which nodes exist only on one layer, we prove a tight area × (number of contact cuts) = \(\Theta(n^2)\) tradeoff for embedding n-node planar graphs of bounded degree in two layers. For the second model, in which nodes exist simultaneously on all layers, we give a number of upper bounds on the area needed to embed graphs using no contact cuts. We show that any n-node graph of thickness 2 can be embedded on two layers in \(O(n^2)\) area. This bound is tight even if more layers and any number of contact cuts are allowed. We also show that planar graphs of bounded degree can be embedded on two layers in \(O(n^{3/2}(\log n)^2)\) area. Some of our embedding algorithms have the additional property that they can respect prespecified grid placements of the nodes of the graph to be embedded. We give an algorithm for embedding n-node graphs of thickness k in k layers using \(O(n^3)\) area, using no contact cuts, and respecting prespecified node placements. This area is asymptotically optimal for placement-respecting algorithms, even if more layers are allowed, as long as a fixed fraction of the edges do not use contact cuts. Our results use a new result on embedding graphs in a single-layer grid, namely an embedding of n-node planar graphs such that each edge makes at most four turns, and all nodes are embedded on the same line. The first author's research was partially supported by NSF Grant No. MCS 820-5167.

7.
A faster divide-and-conquer algorithm for constructing Delaunay triangulations
Rex A. Dwyer, Algorithmica, 1987, 2(1):137-151
An easily implemented modification to the divide-and-conquer algorithm for computing the Delaunay triangulation of n sites in the plane is presented. The change reduces its \(\Theta(n \log n)\) expected running time to \(O(n \log\log n)\) for a large class of distributions that includes the uniform distribution in the unit square. Experimental evidence presented demonstrates that the modified algorithm performs very well for \(n \le 2^{16}\), the range of the experiments. It is conjectured that the average number of edges it creates (a good measure of its efficiency) is no more than twice optimal for n less than seven trillion. The improvement is shown to extend to the computation of the Delaunay triangulation in the \(L_p\) metric for \(1 < p \le \infty\). This research was supported by National Science Foundation Grants DCR-8352081 and DCR-8416190.

8.
Various sorting algorithms using parallel architectures have been proposed in the search for more efficient results. This paper introduces the Multi-Sort Algorithm for the Multi-Mesh of Trees (MMT) architecture for \(N=n^4\) elements, with more efficient time complexity compared to previous architectures. The shear sort algorithm on the Single Instruction Multiple Data (SIMD) mesh model requires \(4\sqrt{N}+O(\sqrt{N})\) time for sorting N elements arranged on a \(\sqrt{N}\times \sqrt{N}\) mesh, whereas the Multi-Sort algorithm on the SIMD Multi-Mesh (MM) architecture takes \(O(N^{1/4})\) time for sorting the same N elements, which shows that Multi-Sort is a better sorting approach. We have improved the time complexity of the intra-block sort. The communication time complexity for 2D sort in MM is \(O(n)\), whereas in MMT it is \(O(\log n)\). The time complexity of the compare–exchange step in MMT is the same as that in MM, i.e., \(O(n)\). Thus the time complexity of Multi-Sort on MMT improves on that of the Multi-Mesh architecture.
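Here is a minimal sequential sketch of the shear sort baseline mentioned above (illustrative only; a real SIMD mesh realizes the row and column phases with parallel compare–exchange steps rather than Python sorts):

```python
# Minimal sequential sketch of shear sort on a mesh.  After
# ceil(log2(rows)) + 1 row/column phases plus a final row phase,
# the mesh is sorted in snake-like order.
import math

def shear_sort(mesh: list[list[int]]) -> list[list[int]]:
    rows = len(mesh)
    for _ in range(math.ceil(math.log2(rows)) + 1):
        for r, row in enumerate(mesh):          # row phase: snake order
            row.sort(reverse=(r % 2 == 1))
        for c in range(len(mesh[0])):           # column phase: top-down
            col = sorted(mesh[r][c] for r in range(rows))
            for r in range(rows):
                mesh[r][c] = col[r]
    for r, row in enumerate(mesh):              # final row phase
        row.sort(reverse=(r % 2 == 1))
    return mesh

# Snake order: [1, 2, 3], [6, 5, 4], [7, 8, 9]
print(shear_sort([[9, 3, 7], [1, 8, 5], [6, 2, 4]]))
```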

9.
On parallel integer sorting
We present an optimal algorithm for sorting n integers in the range \([1, n^c]\) (for any constant c) on the EREW PRAM model where the word length is \(n^{\epsilon}\), for any \(\epsilon > 0\). Using this algorithm, the best known upper bound for integer sorting on the (\(O(\log n)\) word length) EREW PRAM model is improved. In addition, a novel parallel range-reduction algorithm which results in a near-optimal randomized integer sorting algorithm is presented. For the case when the keys are uniformly distributed integers in an arbitrary range, we give an algorithm whose expected running time is optimal. Supported by NSF-DCR-85-03251 and ONR contract N00014-87-K-0310.
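The classical sequential idea behind sorting integers in a polynomial range can be sketched as follows (background illustration only; the paper's contribution is the PRAM algorithm, not this):

```python
# Minimal sequential sketch of radix sorting keys in [1, n**c]:
# view each key as c digits base n and apply c stable distribution
# passes, giving O(c*n) sequential time.

def stable_pass(keys: list[int], base: int, shift: int) -> list[int]:
    """Stable bucket pass on digit (key // base**shift) % base."""
    buckets = [[] for _ in range(base)]
    for k in keys:
        buckets[(k // base ** shift) % base].append(k)
    return [k for b in buckets for k in b]

def radix_sort(keys: list[int], c: int) -> list[int]:
    n = len(keys)
    for shift in range(c):                      # least significant first
        keys = stable_pass(keys, n, shift)
    return keys

print(radix_sort([15, 2, 9, 11, 4, 16, 1, 8], c=2))  # keys <= n**2 = 64
```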

10.
This paper presents the analysis of a parallel formulation of depth-first search. At the heart of this parallel formulation is a dynamic work-distribution scheme that divides the work between different processors. The effectiveness of the parallel formulation is strongly influenced by the work-distribution scheme and the target architecture. We introduce the concept of the isoefficiency function to characterize the effectiveness of different architectures and work-distribution schemes. Many researchers considered the ring architecture to be quite suitable for parallel depth-first search. Our analytical and experimental results show that hypercube and shared-memory architectures are significantly better. The analysis of previously known work-distribution schemes motivated the design of substantially improved schemes for ring and shared-memory architectures. In particular, we present a work-distribution algorithm that guarantees close to optimal performance on a shared-memory/network-with-message-combining architecture (e.g., the RP3). Much of the analysis presented in this paper is applicable to other parallel algorithms in which work is dynamically shared between different processors (e.g., parallel divide-and-conquer algorithms). The concept of isoefficiency is useful in characterizing the scalability of a variety of parallel algorithms. This work was supported by Army Research Office Grant No. DAAG29-84-K-0060 to the Artificial Intelligence Laboratory, and Office of Naval Research Grant N00014-86-K-0763 to the Computer Science Department at the University of Texas at Austin.
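A minimal sketch of the kind of work-splitting step that dynamic work-distribution schemes for parallel DFS rely on; the alternate-node split policy and names below are our illustrative choices, not the paper's specific scheme.

```python
# Minimal sketch of work splitting in parallel DFS: a donor gives away
# alternate untried nodes from the bottom of its DFS stack, since nodes
# near the bottom tend to root larger unexplored subtrees.

def split_stack(stack: list) -> tuple[list, list]:
    """Split a DFS stack between donor (kept) and recipient (donated)."""
    donated = stack[0::2]   # every other node, from the bottom up
    kept = stack[1::2]
    return kept, donated

kept, donated = split_stack(["root", "a", "b", "c", "d"])
print(kept, donated)        # ['a', 'c'] ['root', 'b', 'd']
```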

11.
We present optimal embeddings of three genres of butterfly-like graphs in the (boolean) hypercube; each embedding is specified via a linear-time algorithm. Our first embedding finds an instance of the FFT graph as a subgraph of the smallest hypercube that is big enough to hold it; thus, we embed the n-level FFT graph, which has \((n+1)2^n\) vertices, in the \((n+\lceil\log_2(n+1)\rceil)\)-dimensional hypercube, with unit dilation. This embedding yields a mapping of the pipelined FFT algorithm on the hypercube architecture, which is optimal in all resources (time, processor utilization, load balancing, etc.) and which is on-line in the sense that inputs can be added to the transform even during the computation. Second, we find optimal embeddings of the n-level butterfly graph and the n-level cube-connected-cycles graph, each of which has \(n2^n\) vertices, in the \((n+\lceil\log_2 n\rceil)\)-dimensional hypercube. These embeddings, too, have optimal dilation, congestion, and expansion. The dilation is \(1+(n \bmod 2)\), which is best possible. Our embeddings indicate that these two bounded-degree approximations to the hypercube do not have any communication power that is not already present in the hypercube. The research of D. S. Greenberg was supported in part by NSF Grant MIP-86-01885. The research of L. S. Heath was supported in part by NSF Grant DCI-85-04308. The research of A. L. Rosenberg was supported in part by NSF Grants DCI-85-04308, DCI-87-96236, and CCR-88-12567.
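A quick numeric check of the dimension counts quoted above (a small helper of our own; `hypercube_dim` is not from the paper):

```python
# The smallest hypercube holding V vertices has dimension ceil(log2(V)).
import math

def hypercube_dim(vertices: int) -> int:
    return math.ceil(math.log2(vertices))

n = 5
fft_vertices = (n + 1) * 2 ** n     # n-level FFT graph
bfly_vertices = n * 2 ** n          # n-level butterfly / CCC graph
print(hypercube_dim(fft_vertices))  # n + ceil(log2(n+1)) = 8
print(hypercube_dim(bfly_vertices)) # n + ceil(log2(n))   = 8
```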

12.
In this paper, we consider the linear interval tolerance problem, which consists of finding the largest interval vector included in the tolerable solution set \(\Sigma([A],[b]) = \{x \in \mathbb{R}^n \mid \forall A \in [A],\ \exists b \in [b]:\ Ax = b\}\). We describe two different polyhedra that represent subsets of all possible interval vectors in \(\Sigma([A],[b])\), and we provide a new definition of the optimality of an interval vector included in \(\Sigma([A],[b])\). Finally, we show how the Simplex algorithm can be applied to find an optimal interval vector in \(\Sigma([A],[b])\).
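A minimal point-membership test for the tolerable solution set defined above, using the standard midpoint/radius characterization from interval analysis; this is a background sketch, not the paper's Simplex-based method for finding a largest interval box.

```python
# Point-membership test for the tolerable solution set: x is tolerable
# iff |A_c x - b_c| + A_D |x| <= b_D componentwise, where A_c/A_D and
# b_c/b_D are the midpoints/radii of [A] and [b] (a standard
# interval-analysis characterization).
import numpy as np

def is_tolerable(x, A_mid, A_rad, b_mid, b_rad) -> bool:
    """True iff for every A in [A] there is b in [b] with Ax = b."""
    x = np.asarray(x, dtype=float)
    lhs = np.abs(A_mid @ x - b_mid) + A_rad @ np.abs(x)
    return bool(np.all(lhs <= b_rad))

A_mid = np.array([[2.0, 1.0], [1.0, 3.0]])
A_rad = np.array([[0.1, 0.1], [0.1, 0.1]])
b_mid = np.array([3.0, 4.0])
b_rad = np.array([1.0, 1.0])
print(is_tolerable([1.0, 1.0], A_mid, A_rad, b_mid, b_rad))  # True
```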

13.
A sorting network is a combinational circuit for sorting, constructed from comparison-swap units. The depth of such a circuit is a measure of its running time. It is known that sorting-network verification is computationally intractable. However, it is reasonable to hypothesize that only the fastest (that is, the shallowest) networks are likely to be fabricated. It is shown that the verification of shallow sorting networks is also computationally intractable. First, a method for constructing asymptotically optimal single-exception sorting networks is demonstrated. These are networks which sort all zero-one inputs except one. More specifically, their depth is \(D(n-1)+2\log(n-1)+2\), where \(D(n)\) is the minimum depth of an n-input sorting network. It follows that the verification problem for sorting networks of depth \(2D(n)+6\log n+O(1)\) is co-NP-complete. Given the current state of knowledge about \(D(n)\) for large n, this indicates that the complexity of verification for shallow sorting networks is as great as for deep networks. This research was supported by NSF Grant CCR-8801659.
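A brute-force verifier makes the difficulty concrete: by the zero-one principle, a comparator network sorts all inputs iff it sorts all \(2^n\) zero-one inputs, so the sketch below takes exponential time, and the result above indicates this cannot be avoided for shallow networks unless P = NP.

```python
# Brute-force sorting-network verification via the zero-one principle.
from itertools import product

def apply_network(bits, comparators):
    bits = list(bits)
    for i, j in comparators:        # comparator (i, j) with i < j
        if bits[i] > bits[j]:
            bits[i], bits[j] = bits[j], bits[i]
    return bits

def is_sorting_network(n, comparators) -> bool:
    """Check the network against all 2**n zero-one inputs."""
    return all(apply_network(v, comparators) == sorted(v)
               for v in product([0, 1], repeat=n))

# A 3-input sorting network of depth 3.
print(is_sorting_network(3, [(0, 1), (1, 2), (0, 1)]))  # True
```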

14.
The class UP [V] is the class of sets accepted by polynomial-time nondeterministic Turing machines which have at most one accepting path for every input. The complexity of this class closely relates to that of computing inverses of one-way functions, where a one-way function is a one-to-one, length-increasing, and polynomial-time computable function whose inverse cannot be computed within polynomial time. It is known [GS], [K] that there exists a one-way function if and only if \(P \neq UP\). In this paper the intractability of sets in UP is investigated in terms of polynomial-time reducibility to a sparse set. It is shown that UP has a set that is \(\le_{m}^{P}\)-reducible to no sparse set if \(P \neq UP\). We interpret this structural property in relation to approximation algorithms: it is shown that if \(P \neq UP\), then UP has a set with no 1-APT approximation and, furthermore, UP has a set that is not \(\le_{m}^{P}\)-reducible to any set with a 1-APT approximation. The implication of this result in the study of one-way functions is also discussed. In order to prove the main theorem, we introduce a variation of tree-pruning methods. This paper was written while the author visited the Department of Mathematics, University of California, Santa Barbara. This research was supported in part by the National Science Foundation under Grant CCR-8611980.

15.
Fast deterministic selection on mesh-connected processor arrays
We present a deterministic algorithm for selecting the element of rank k among \(N=n^2\) elements, \(1 \le k \le N\), on an \(n \times n\) mesh-connected processor array in \(1.45n\) parallel computation steps, using constant-sized queues (for large enough n). This is a considerable improvement over the best previous deterministic algorithm, which was based upon sorting and requires \(2n+o(n)\) steps. Our algorithm can be generalized to solve the problem of selection on higher-dimensional meshes. In particular, we present an algorithm for the three-dimensional mesh which achieves a time bound better than any of the previously known deterministic results. This research was done while all three authors were at the University of Rochester, New York.

16.
Scalability of the combination of parallel algorithms and parallel machines
Scalability is an important consideration in the design of parallel algorithms and high-performance parallel machines. This paper first analyzes two scalability evaluation criteria, isoefficiency and isospeed, and points out their strengths and weaknesses. Then, based on an analysis of parallel computation time, a new scalability evaluation criterion is proposed: the iso-(parallel-overhead-to-computation ratio) criterion, which can be used to evaluate the scalability of a parallel algorithm combined with a parallel machine. Finally, the new criterion is used to analyze the scalability of two parallel algorithms combined with the YH03 high-performance parallel machine.

17.
Yang Cai, M. C. Kong, Algorithmica, 1996, 15(6):572-599
In this paper we study the problem of scheduling a set of periodic tasks nonpreemptively in hard-real-time systems, where it is critical for all requests of the tasks to be processed in time. A task T is characterized by its arrival time A, its period P, and its execution time C. Starting from A, a new request of T arrives every P units of time, requesting C units of processing time, and its deadline coincides with the arrival of the next request of T. All requests must be processed nonpreemptively to meet their corresponding deadlines. We show that the problem of testing the feasibility of a given task set \(\{T_1, T_2, \ldots, T_n\}\) satisfying \(P_{i+1} = k_i P_i\), where \(k_i\) is an integer \(\ge 1\) for \(1 \le i \le n-1\), on a single processor is NP-hard in the strong sense, even if all tasks have the same arrival time. For task sets satisfying \(P_{i+1} = K P_i\), where K is an integer \(\ge 2\) for \(1 \le i \le n-1\), and all tasks have the same arrival time, we present linear-time (in the number of requests) optimal scheduling algorithms as well as linear-time (in the number of tasks, i.e., n) algorithms for testing feasibility in both uniprocessor and multiprocessor systems. We also extend our results to more general task sets.
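The task model can be made concrete with a small helper (illustrative only; `request_window` and the utilization check are our additions, and utilization at most 1 is merely a necessary condition, not the paper's feasibility test):

```python
# Task model from the abstract: for task T = (A, P, C), the k-th
# request arrives at A + k*P, needs C units nonpreemptively, and must
# finish by the next arrival A + (k+1)*P.

def request_window(A: int, P: int, C: int, k: int) -> tuple[int, int]:
    """(arrival, deadline) of the k-th request of task (A, P, C)."""
    return A + k * P, A + (k + 1) * P

def utilization(tasks: list[tuple[int, int, int]]) -> float:
    """sum(C/P); > 1 makes the task set trivially infeasible."""
    return sum(C / P for (_, P, C) in tasks)

tasks = [(0, 4, 1), (0, 8, 2), (0, 16, 4)]  # harmonic: P_{i+1} = 2*P_i
print(request_window(*tasks[1], k=3))        # (24, 32)
print(utilization(tasks))                    # 0.75
```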

18.
Sorting is one of a set of fundamental problems in computer science. In this paper we present the first wait-free algorithm for sorting an input array of size N using P ≤ N processors to achieve optimal running time. We show two variants of the algorithm, one deterministic and one randomized, and prove that, with high probability, the latter suffers only low contention when run synchronously. Known sorting algorithms, when made wait-free through previously established transformation techniques, have complexity \(O(\log^3 N)\). The algorithm we present here, when run in the CRCW PRAM model, executes with high probability in \(O(\log N)\) time when P = N, and \(O((N \log N)/P)\) otherwise, which is optimal among comparison-based sorting algorithms. The wait-free property guarantees that the sort will complete despite any delays or failures incurred by the processors. This is a very desirable property from an operating systems point of view, since it allows oblivious thread scheduling as well as thread creation and deletion, without fear of losing the algorithm's correctness. Received May 15, 1998, and in revised form November 17, 1999. Online publication November 19, 2001.

19.
We study rigid motions of a rectangle amidst polygonal obstacles. The best known algorithms for this problem have a running time of \(\Theta(n^2)\), where n is the number of obstacle corners. We introduce the tightness of a motion-planning problem as a measure of the difficulty of a planning problem in an intuitive sense and describe an algorithm with a running time of \(O((a/b \cdot 1/\varepsilon_{\mathrm{crit}} + 1)\, n (\log n)^2)\), where \(a \ge b\) are the lengths of the sides of the rectangle and \(\varepsilon_{\mathrm{crit}}\) is the tightness of the problem. We show further that the complexity (= number of vertices) of the boundary of n bow ties (see Figure 1) is \(O(n)\). Similar results for the union of other simple geometric figures such as triangles and wedges are also presented. This work was supported partially by the DFG Schwerpunkt "Datenstrukturen und Algorithmen", Grants Me 620/6 and Al 253/1, and by the ESPRIT II Basic Research Actions Program of the EC under Contract No. 3075 (project ALCOM).

20.
A new measure for the study of on-line algorithms
An accepted measure for the performance of an on-line algorithm is the competitive ratio introduced by Sleator and Tarjan. This measure is well motivated and has led to the development of a mathematical theory for on-line algorithms. We investigate the behavior of this measure with respect to memory needs and the benefits of lookahead, and find some counterintuitive features. We present lower bounds on the size of memory devoted to recording the past. It is also observed that the competitive ratio reflects no improvement in the performance of an on-line algorithm due to any (finite) amount of lookahead. We offer an alternative measure that exhibits a different and, in some respects, more intuitive behavior. In particular, we demonstrate the use of our new measure by analyzing the tradeoff between the amortized cost of on-line algorithms for the paging problem and the amount of lookahead available to them. We also derive on-line algorithms for the K-server problem on any bounded metric space which, relative to the new measure, are optimal among all on-line algorithms (up to a factor of 2) and are within a factor of 2K of the optimal off-line performance.
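To make the on-line versus off-line comparison concrete, here is a minimal paging sketch: LRU as the on-line algorithm against Belady's farthest-in-future rule as the off-line optimum (standard textbook algorithms, not the paper's construction; the classical competitive ratio compares their worst-case cost ratio):

```python
# LRU (on-line) vs. Belady's OPT (off-line) fault counts for paging.

def lru_faults(requests, k):
    cache, faults = [], 0
    for r in requests:
        if r in cache:
            cache.remove(r)
        else:
            faults += 1
            if len(cache) == k:
                cache.pop(0)                 # evict least recently used
        cache.append(r)                      # most recently used at end
    return faults

def opt_faults(requests, k):
    cache, faults = set(), 0
    for i, r in enumerate(requests):
        if r not in cache:
            faults += 1
            if len(cache) == k:              # evict farthest-in-future
                future = requests[i + 1:]
                cache.remove(max(cache, key=lambda p: future.index(p)
                                 if p in future else len(future)))
        cache.add(r)
    return faults

reqs = [1, 2, 3, 1, 4, 1, 2, 5, 1, 2]
print(lru_faults(reqs, 3), opt_faults(reqs, 3))  # 6 5
```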
