Similar Documents (20 results)
1.
An efficient external sorting algorithm with a minimal space requirement is presented in this article. The average number of passes over the data is approximately 1 + ln(N + 1)/4B, where N is the number of records in the file to be sorted and B is the buffer size. The external storage requirement is only the file itself; no additional disk space is required. The internal storage requirement is four buffers: two for input and two for output. The buffer size can be adjusted to the available memory space. A stack of size log₂ N is also required. This work was partially supported by a fellowship and grant from Western Michigan University.
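To make the pass count concrete, here is a worked instance of the formula above, reading it as 1 + ln(N + 1)/(4B); the values N = 10^6 and B = 1000 are illustrative assumptions, not figures from the paper:

```latex
% Worked example with assumed values: N = 10^6 records, B = 10^3 buffer size
\text{passes} \approx 1 + \frac{\ln(10^6 + 1)}{4 \cdot 10^3}
             \approx 1 + \frac{13.82}{4000}
             \approx 1.0035
```

In other words, for reasonably large buffers the algorithm approaches a single pass over the file.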

2.
External sorting of large files of records involves use of disk space to store temporary files, processing time for sorting, and transfer time between CPU, cache, memory, and disk. Compression can reduce disk and transfer costs and, in the case of external sorts, cut merge costs by reducing the number of runs. It is therefore plausible that the overall cost of external sorting could be reduced through use of compression. In this paper, we propose new compression techniques for data consisting of sets of records. The best of these techniques, based on building a trie of variable-length common strings, provides fast compression and decompression and allows random access to individual records. We show experimentally that our trie-based compression leads to a significant reduction in sorting costs; that is, it is faster to compress the data, sort it, and then decompress it than to sort the uncompressed data. While the degree of compression is not quite as great as can be obtained with adaptive techniques such as Lempel-Ziv methods, these cannot be applied to sorting. Our experiments show that, in comparison to approaches such as Huffman coding of fixed-length substrings, our novel trie-based method is faster and provides greater size reductions. Preliminary versions of parts of this paper, not including the work on "vargram" compression, appeared in [41].

3.
We describe a new algorithm for the problem of perfectly sorting a signed permutation by reversals. The worst-case time complexity of this algorithm is parameterized by the maximum prime degree d of the strong interval tree, i.e., it is f(d) · n^O(1). This improves on the best known algorithm, whose complexity was parameterized by a value always larger than or equal to d.

4.
In this paper, we present efficient algorithms for sorting on the Parallel Disks Model (PDM). Numerous asymptotically optimal algorithms have been proposed in the literature. However, many of these merge-based algorithms have large underlying constants in their time bounds, because they suffer from a lack of read parallelism on the PDM. The irregular consumption of the runs during the merge limits the read parallelism and increases the sorting time. In this paper, we first introduce a novel idea called dirty sequence accumulation that improves the read parallelism. Next, we show analytically that this idea can reduce the number of parallel I/Os required to sort the input to close to the known lower bound. We verify experimentally our dirty sequence idea with the standard R-way merge and show that it can significantly reduce the number of parallel I/Os needed to sort on the PDM.
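The read-parallelism problem stems from the fact that which run supplies the next output record is data-dependent. The following minimal sequential k-way merge loop (a generic illustration with made-up run contents, not the paper's dirty-sequence technique) makes this visible: the run with the smallest head element is consumed next, so consumption across runs is irregular and hard to prefetch evenly.

```c
#include <stdio.h>

/* Minimal 3-way merge over in-memory runs; illustrates that the next
 * record always comes from whichever run currently has the smallest
 * head element -- a data-dependent, irregular consumption pattern. */
int main(void) {
    int run0[] = {1, 2, 3, 90};
    int run1[] = {40, 50, 60, 70};
    int run2[] = {5, 80, 85, 95};
    int *runs[3] = {run0, run1, run2};
    int len[3] = {4, 4, 4};
    int pos[3] = {0, 0, 0};

    for (int produced = 0; produced < 12; produced++) {
        int best = -1;
        for (int r = 0; r < 3; r++)   /* pick the run with the smallest head */
            if (pos[r] < len[r] &&
                (best < 0 || runs[r][pos[r]] < runs[best][pos[best]]))
                best = r;
        printf("out=%d from run %d\n", runs[best][pos[best]], best);
        pos[best]++;                  /* only that one run advances */
    }
    return 0;
}
```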

5.
6.
We have established a strict lower time bound of n−1 for distributed sorting on a line network, where n is the number of processes. The lower time bound had traditionally been considered to be n, because that bound was proved from the number of disjoint compare-exchange operations in parallel sorting on a linear array. Our result overturns this long-held belief.
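For context, the classical algorithm behind the traditional n bound is odd-even transposition sort, which sorts n items on a line in n rounds of compare-exchanges. Below is a minimal sequential simulation of it (a standard textbook construction, not this paper's algorithm):

```c
#include <stdio.h>

/* Sequential simulation of odd-even transposition sort on a line of
 * n processes: round t compare-exchanges neighbor pairs (i, i+1) with
 * i of the same parity as t. After n rounds the line is sorted. */
static void odd_even_transposition(int a[], int n) {
    for (int t = 0; t < n; t++)
        for (int i = t % 2; i + 1 < n; i += 2)
            if (a[i] > a[i + 1]) {
                int tmp = a[i]; a[i] = a[i + 1]; a[i + 1] = tmp;
            }
}

int main(void) {
    int a[] = {5, 1, 4, 2, 8, 0, 3, 7, 6, 9};
    odd_even_transposition(a, 10);
    for (int i = 0; i < 10; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}
```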

7.
Recently, flash memory has gained popularity as storage on a wide spectrum of computing devices such as cellular phones, digital cameras, digital audio players, and PDAs. The integration density of flash memory has doubled every year for the past few years. As flash memory's capacity increases and its price drops, it is expected to become ever more competitive with magnetic disk drives. It is therefore desirable to adapt disk-based algorithms to take advantage of flash memory technology.

In this paper, we propose a novel Flash-Aware external SorTing algorithm, FAST, which overcomes flash memory's high write cost to improve both overall execution time and response time. In FAST, we reduce write operations at the cost of additional read operations. We analyze both traditional external sorting and our flash-aware algorithm by comparing detailed cost formulas. Experimental results with synthetic and real-life data sets show that FAST achieves faster execution times as well as shorter response times than traditional external sorting algorithms.

8.
A variant of the Ford-Johnson (merge insertion) sorting algorithm, which we call four Ford-Johnson (4FJ for short), is presented and proved to execute exactly the same number of comparisons as the Ford-Johnson algorithm. The main advantage of our algorithm is that, instead of recursing over lists of half the input size, as the Ford-Johnson algorithm does, 4FJ recurses over lists of a quarter of the input size. This allows implementations to use data structures for coordinating the recursive calls that are only 33% of the size needed for the Ford-Johnson algorithm.

9.
Sequence comparison leads to a combinatorial optimization problem: sorting permutations by reversals and transpositions; namely, given any two permutations, find the shortest distance between them. This problem is related to genome rearrangement, where genes are oriented in DNA sequences. The transpositions studied in the literature can be viewed as operations working on two consecutive segments of the genome. In this paper, a new kind of transposition that can work on two arbitrary segments of the genome is proposed, and sorting signed permutations by reversals and this new kind of transposition is studied. After establishing a lower bound on the number of operations needed, a 2-approximation algorithm is presented for this problem, and an example is given to show that the performance ratio of the algorithm cannot be improved.
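To ground the terminology: a signed reversal on a permutation reverses the order of a segment and flips the sign of every element in it. A minimal sketch of that operation (the standard definition from the genome-rearrangement literature, not this paper's algorithm):

```c
#include <stdio.h>

/* Apply a signed reversal to positions [i, j] of a signed permutation:
 * the segment is reversed and every element in it changes sign.
 * Illustrative sketch of the standard operation only. */
static void signed_reversal(int pi[], int i, int j) {
    while (i <= j) {
        int tmp = pi[i];
        pi[i] = -pi[j];
        pi[j] = -tmp;
        i++; j--;
    }
}

int main(void) {
    int pi[] = {+3, -1, +4, -2, +5};
    signed_reversal(pi, 1, 3);    /* reverse and negate elements 1..3 */
    for (int k = 0; k < 5; k++) printf("%+d ", pi[k]);
    printf("\n");                 /* prints: +3 +2 -4 +1 +5 */
    return 0;
}
```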

10.
11.
We introduce a probabilistic sequential algorithm for stable sorting of n uniformly distributed keys in an arbitrary range. The algorithm runs in linear time and sorts all but a very small fraction of the input sequences, improving on the best previously known bound. An EREW PRAM extension of this sequential algorithm sorts in O(((n/p + lg p) lg n)/lg(n/p + lg n)) time using p ≤ n processors under the same probabilistic conditions. For a CRCW PRAM, we improve upon the probabilistic bound obtained by Rajasekaran and Sen. Additionally, we present experimental results for the sequential algorithm that establish the practicality of our method.
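The standard route to expected linear time for uniformly distributed keys is bucket sort: map each key to one of n buckets by value, distribute stably, and finish cheaply because each bucket holds O(1) keys on average. The sketch below shows only that baseline idea; it is not the paper's stable probabilistic algorithm:

```c
#include <stdio.h>
#include <stdlib.h>

/* Expected-O(n) sort for keys assumed uniform on [lo, hi):
 * 1) stable counting-sort by bucket index (n buckets),
 * 2) insertion sort to finish; since buckets hold O(1) keys on
 * average, the final pass does expected O(n) work. Illustrative only. */
static void bucket_sort(double *a, int n, double lo, double hi) {
    int *cnt = calloc((size_t)n + 1, sizeof *cnt);
    double *out = malloc((size_t)n * sizeof *out);
    for (int i = 0; i < n; i++) {
        int b = (int)((a[i] - lo) / (hi - lo) * n);
        if (b >= n) b = n - 1;               /* guard the upper edge */
        cnt[b + 1]++;
    }
    for (int b = 0; b < n; b++) cnt[b + 1] += cnt[b];
    for (int i = 0; i < n; i++) {            /* stable distribution pass */
        int b = (int)((a[i] - lo) / (hi - lo) * n);
        if (b >= n) b = n - 1;
        out[cnt[b]++] = a[i];
    }
    for (int i = 1; i < n; i++) {            /* finish with insertion sort */
        double x = out[i]; int j = i;
        while (j > 0 && out[j - 1] > x) { out[j] = out[j - 1]; j--; }
        out[j] = x;
    }
    for (int i = 0; i < n; i++) a[i] = out[i];
    free(cnt); free(out);
}

int main(void) {
    double a[] = {0.42, 0.07, 0.88, 0.31, 0.55, 0.13, 0.99, 0.60};
    bucket_sort(a, 8, 0.0, 1.0);
    for (int i = 0; i < 8; i++) printf("%.2f ", a[i]);
    printf("\n");
    return 0;
}
```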

12.
This paper reports the development of a sorting algorithm, called a ‘pocket sort.’ It is primarily directed at sorting character data. The algorithm is strictly of order O(n): sorting time is directly proportional to the number of data elements to be sorted. Further, through the use of pointer-linked list data structures, no internal movement of the records containing the sort field is required. The algorithm has been implemented in Turbo Pascal. Data are presented comparing this pocket sort to other sorting techniques.
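The abstract does not give the algorithm's details, but a ‘pocket’ sort of character data is conventionally a bucket sort: each record is appended to the linked list (pocket) for its key character, and the pockets are then concatenated in key order, moving only pointers, never records. A minimal C sketch under that reading (my assumption, not the paper's Turbo Pascal code):

```c
#include <stdio.h>

typedef struct Rec { char key; struct Rec *next; } Rec;

/* One pass drops each record into the pocket (linked list) for its key;
 * pockets are then concatenated in key order. Records are never copied,
 * only next-pointers change, and total work is O(n + alphabet size). */
static Rec *pocket_sort(Rec *head) {
    Rec *first[256] = {0}, *last[256] = {0};
    for (Rec *r = head, *nx; r; r = nx) {
        nx = r->next; r->next = NULL;
        unsigned char k = (unsigned char)r->key;
        if (last[k]) last[k]->next = r; else first[k] = r;
        last[k] = r;
    }
    Rec *out = NULL, *tail = NULL;
    for (int k = 0; k < 256; k++)
        if (first[k]) {
            if (tail) tail->next = first[k]; else out = first[k];
            tail = last[k];
        }
    return out;
}

int main(void) {
    Rec r[] = {{'d',0},{'a',0},{'c',0},{'a',0},{'b',0}};
    for (int i = 0; i < 4; i++) r[i].next = &r[i + 1];
    for (Rec *p = pocket_sort(&r[0]); p; p = p->next) putchar(p->key);
    putchar('\n');   /* prints: aabcd */
    return 0;
}
```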

13.
In the last decades, there has been an explosion in the volume of data to be processed by data-intensive computing applications. As a result, processing I/O operations efficiently has become an important challenge. SSDs (solid-state drives) are an effective solution that not only improves I/O throughput but can also reduce the amount of I/O transfer through the concept of active SSDs. Active SSDs offload part of the data-processing tasks usually performed on the host to the SSD. Offloading data-processing tasks removes extra data transfer and improves overall data-processing performance.

In this work, we propose ActiveSort, a novel mechanism that improves the external sorting algorithm using the concept of active SSDs. External sorting is used extensively in data-intensive computing frameworks such as Hadoop. By performing merge operations on-the-fly within the SSD, ActiveSort reduces the amount of I/O transfer and improves the performance of external sorting in Hadoop. Our evaluation results on a real SSD platform indicate that Hadoop applications using ActiveSort outperform the original Hadoop by up to 36.1%. ActiveSort reduces the amount of data written by up to 40.4%, thereby improving the lifetime of the SSD.

14.
Improving multikey Quicksort for sorting strings with many equal elements
Bentley and Sedgewick proposed multikey Quicksort with ‘split-end’ partitioning for sorting strings, but it can be slow when there are many equal elements, because ‘split-end’ partitioning moves equal elements to the ends and then swaps them back to the middle. We present ‘collect-center’ partitioning to improve multikey Quicksort in that case. It moves equal elements directly to the middle, like the ‘Dutch National Flag problem’ partitioning approach, and it uses two inner loops like Bentley and McIlroy's. On inputs with many equal elements, such as DNA sequences, HTML files, and English texts, multikey Quicksort with ‘collect-center’ partitioning is faster than with ‘split-end’ partitioning.
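A minimal multikey Quicksort sketch with Dutch-National-Flag-style three-way partitioning on the character at the current depth; this shows the general technique, not necessarily the authors' exact ‘collect-center’ code:

```c
#include <stdio.h>

static void swap(char **a, int i, int j) { char *t = a[i]; a[i] = a[j]; a[j] = t; }

/* Multikey Quicksort on C strings: three-way partition by the character
 * at position d, then recurse on the <, =, > regions; the = region
 * advances to the next character. Equal elements collect in the middle
 * directly, so runs of duplicates cost no extra swaps. */
static void mkqsort(char **a, int lo, int hi, int d) {
    if (hi <= lo) return;
    int lt = lo, gt = hi, i = lo + 1;
    int pivot = (unsigned char)a[lo][d];
    while (i <= gt) {
        int c = (unsigned char)a[i][d];
        if (c < pivot)      swap(a, lt++, i++);
        else if (c > pivot) swap(a, i, gt--);
        else                i++;
    }
    mkqsort(a, lo, lt - 1, d);
    if (pivot != '\0') mkqsort(a, lt, gt, d + 1);
    mkqsort(a, gt + 1, hi, d);
}

int main(void) {
    char *s[] = {"aab", "abc", "aab", "aaa", "abc", "aab"};
    mkqsort(s, 0, 5, 0);
    for (int i = 0; i < 6; i++) printf("%s\n", s[i]);
    return 0;
}
```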

15.
This paper presents an efficient parallel algorithm for sorting N data items on two-dimensional mesh-connected computers with multiple broadcasting (2-MCCMB). The algorithm uses N × N^(2/3) processors and takes O(N^(1/3)) time, whereas the previous algorithm by Chung-Horng Lung [3] uses N × N processors and takes O(N^(1/2)) time on 2-MCCMB.

16.
With the popularity of parallel database machines based on the shared-nothing architecture, it has become important to find external sorting algorithms that lead to a load-balanced computation, i.e., balanced execution, communication, and output. If each processor is equally loaded during the course of the sorting algorithm, parallelism is fully exploited. Similarly, balanced communication does not congest the network. Since sorting can be used to support a number of other relational operations (joins, duplicate elimination, building indexes, etc.), data skew produced by sorting can further lead to execution skew at later stages of these operations. In this paper we present a load-balanced parallel sorting algorithm for shared-nothing architectures. It is a multiple-input multiple-output algorithm with four stages, based on a generalization of Batcher's odd-even merge. At each stage the n keys are evenly distributed among the p processors (i.e., there is no final sequential merge phase), and the distribution of keys between stages guards against network congestion. No assumption is made about the key distribution, and the algorithm performs equally well in the presence of duplicate keys. Hence our approach always guarantees its performance as long as n is greater than p³, which is the case of interest for sorting large relations. In addition, processors can be added incrementally.
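For reference, the building block being generalized is Batcher's odd-even merge, a data-oblivious compare-exchange network. Below is the standard sequential rendering for power-of-two sizes (a textbook construction, not the paper's four-stage multiprocessor algorithm):

```c
#include <stdio.h>

static void cmpx(int a[], int i, int j) {        /* compare-exchange */
    if (a[i] > a[j]) { int t = a[i]; a[i] = a[j]; a[j] = t; }
}

/* Batcher's odd-even merge: merges two sorted halves of a[lo..lo+n-1]
 * (n a power of two) with a fixed, data-independent comparison pattern,
 * recursing on the even- and odd-indexed subsequences. */
static void odd_even_merge(int a[], int lo, int n, int r) {
    int m = r * 2;
    if (m < n) {
        odd_even_merge(a, lo, n, m);             /* even subsequence */
        odd_even_merge(a, lo + r, n, m);         /* odd subsequence  */
        for (int i = lo + r; i + r < lo + n; i += m)
            cmpx(a, i, i + r);
    } else {
        cmpx(a, lo, lo + r);
    }
}

static void odd_even_mergesort(int a[], int lo, int n) {
    if (n > 1) {
        int m = n / 2;
        odd_even_mergesort(a, lo, m);
        odd_even_mergesort(a, lo + m, m);
        odd_even_merge(a, lo, n, 1);
    }
}

int main(void) {
    int a[] = {7, 3, 0, 5, 6, 1, 4, 2};
    odd_even_mergesort(a, 0, 8);
    for (int i = 0; i < 8; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}
```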

17.
In this study, a Tchebycheff utility function based approach is proposed for multiple-criteria sorting problems, in order to classify alternatives into ordered categories such as A, B, and C. Since the Tchebycheff function can reach efficient alternatives located even in the non-convex part of the efficient frontier, it is used in the proposed sorting approach to prevent such alternatives from being disadvantaged. If the preferences of the decision maker (DM) are not exactly known, each alternative selects its own most favorable weights for a weighted Tchebycheff distance function. Each alternative is then compared with the reference alternatives of a class to compute its strength over them, and the average strengths are used to categorize the alternatives. Experimental results on the performance of the algorithm are presented.
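A weighted Tchebycheff distance of an alternative from a reference (ideal) point is conventionally defined as below; the symbols are the standard ones from the multi-criteria literature, since the abstract does not give the paper's exact formulation:

```latex
% Weighted Tchebycheff distance of alternative x from the ideal point z*,
% over m criteria f_1, ..., f_m with nonnegative weights w_i:
d_\infty^w(x) = \max_{i = 1, \dots, m} \; w_i \left| z_i^* - f_i(x) \right|
```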

18.
Dalia Motzkin, Software, 1981, 11(6): 607-611
A sorting algorithm, called Stable Quicksort, is presented. The algorithm is comparable in speed with the Quicksort algorithm, but is stable. The experimental evidence presented supports the theoretical evaluation of the performance of Stable Quicksort.

19.
A unified vector sorting algorithm (VSA) is proposed, which sorts N arbitrary numbers of c·log₂N bits each on an SIMD multi-processor system (SMMP) with p = N^(1+ε)/u processors and a composite interconnection network in time T = (c/ε)(4 log₂N − 2 log₂u + 10u), where c is an arbitrary positive constant. When ε is an arbitrarily small positive constant and u = log₂N, this is an O(log N) algorithm with p = N^(1+ε)/log₂N; when ε = 1/log N and u = 2 log₂N, it is an optimal algorithm (p = N/log₂N, T = O(log²N), pT = O(N log N)); a further configuration takes u = 1, c = 1, and ε = 0.5 (a constant).

20.
The computational complexity of a parallel algorithm depends critically on the model of computation. We describe a simple and elegant rule-based model of computation in which processors asynchronously apply rules to pairs of objects from a global object space. Applying a rule to a pair of objects creates a new object if the pair satisfies the guard of the rule. The model can be efficiently implemented as a novel MIMD array-processor architecture, the Intersecting Broadcast Machine. For this model of computation, we describe an efficient parallel sorting algorithm based on mergesort. The computational complexity of the sorting algorithm is O(n log₂ n), comparable to that of specialized sorting networks and an improvement on the O(n^(1.5)) complexity of conventional mesh-connected array processors.
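As a toy illustration of the rule-based style (my reading of the model, not the paper's implementation): keep a pool of sorted runs and a single rule, "two sorted runs combine into a new sorted run"; applying the rule to pairs of runs in balanced rounds sorts the input with O(n log n) comparisons overall.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct { int *v; int len; } Run;

/* The single rewrite rule: two sorted runs combine into a new sorted run.
 * Illustrative sequential sketch of the rule-based mergesort idea. */
static Run merge_rule(Run a, Run b) {
    Run c = { malloc((size_t)(a.len + b.len) * sizeof(int)), a.len + b.len };
    int i = 0, j = 0, k = 0;
    while (i < a.len && j < b.len)
        c.v[k++] = (a.v[i] <= b.v[j]) ? a.v[i++] : b.v[j++];
    while (i < a.len) c.v[k++] = a.v[i++];
    while (j < b.len) c.v[k++] = b.v[j++];
    free(a.v); free(b.v);
    return c;
}

int main(void) {
    int input[] = {5, 2, 7, 1, 9, 3, 8, 4};
    int n = 8;
    Run pool[8];
    for (int i = 0; i < n; i++) {      /* every key starts as a 1-element run */
        pool[i].v = malloc(sizeof(int));
        pool[i].v[0] = input[i];
        pool[i].len = 1;
    }
    int live = n;
    while (live > 1) {                 /* one round: apply the rule pairwise */
        int next = 0;
        for (int i = 0; i + 1 < live; i += 2)
            pool[next++] = merge_rule(pool[i], pool[i + 1]);
        if (live % 2) pool[next++] = pool[live - 1];
        live = next;                   /* balanced rounds => O(n log n) */
    }
    for (int i = 0; i < n; i++) printf("%d ", pool[0].v[i]);
    printf("\n");
    free(pool[0].v);
    return 0;
}
```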
