Similar Literature
20 similar documents found.
1.
We study the performance of various run placement policies on disks for the merge phase of concurrent mergesorts using parallel prefetching. The initial sorted runs (input) of a merge and its final sorted run (output) are stored on multiple disks, but each run resides only on a single disk. In this paper, we examine through detailed simulations three different run placement policies and the impact of buffer thrashing. The results show that, with buffer thrashing avoidance, the best performance can be achieved by a run placement policy that uses a proper subset of the disks dedicated for writing the output runs while the rest of the disks are used for prefetching the input runs in parallel. However, the proper number of write disks is workload dependent, and if not carefully chosen, it can adversely affect the system performance. In practice, a reasonably good performance can be achieved by a run placement policy that does not place the output run of a merge on any of the disks that store its own input runs but allows the output run to share the same disk with some of the input runs of other merges.
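A minimal sketch of the last policy described above, with invented disk identifiers and load bookkeeping (the paper's simulations model far more detail):

```python
# Hypothetical sketch of the "avoid own input disks" run placement policy.
# Disk ids and the load map are illustrative assumptions, not the paper's model.

def place_output_run(own_input_disks, all_disks, disk_load):
    """Pick a disk for a merge's output run: never a disk holding one of its
    own input runs, though sharing with other merges' inputs is allowed.
    Among the eligible disks, choose the least loaded one."""
    eligible = [d for d in all_disks if d not in own_input_disks]
    if not eligible:
        raise ValueError("every disk holds one of this merge's input runs")
    return min(eligible, key=lambda d: disk_load[d])

# Example: a merge reading its input runs from disks 0 and 2 on a 4-disk system.
load = {0: 3, 1: 1, 2: 2, 3: 2}
print(place_output_run({0, 2}, range(4), load))  # -> 1
```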

2.
External sorting: run formation revisited
External mergesort begins with a run formation phase creating the initial sorted runs. Run formation can be done by a load-sort-store algorithm or by replacement selection. A load-sort-store algorithm repeatedly fills available memory with input records, sorts them, and writes the result to a run file. Replacement selection produces longer runs than load-sort-store algorithms and completely overlaps sorting and I/O, but it has poor locality of reference, resulting in frequent cache misses, and the classical algorithm works only for fixed-length records. This paper introduces batched replacement selection: a cache-conscious version of replacement selection that also works for variable-length records. The new algorithm resembles AlphaSort in the sense that it creates small in-memory runs and merges them to form the output runs. Its performance is experimentally compared with three other run formation algorithms: classical replacement selection, Quicksort, and AlphaSort. The experiments show that batched replacement selection is considerably faster than classical replacement selection. For small records (average 100 bytes), CPU time was reduced by about 50 percent and elapsed time by 47-63 percent. It was also consistently faster than Quicksort, but it did not always outperform AlphaSort. Replacement selection produces fewer runs than Quicksort and AlphaSort. The experiments confirmed that this reduces the merge time, whereas the effect on the overall sort time depends on the number of disks available.
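For reference, classical replacement selection can be sketched in a few lines; the batched, cache-conscious variant introduced in the paper differs in how it batches heap operations, which this sketch does not attempt:

```python
import heapq
from itertools import islice

def replacement_selection(records, capacity):
    """Classical replacement selection. Each heap entry carries its run
    number, so a record smaller than the last output is deferred to the
    next run; on random input, runs average about twice memory capacity."""
    it = iter(records)
    heap = [(1, key) for key in islice(it, capacity)]
    heapq.heapify(heap)
    run, current, runs = 1, [], []
    while heap:
        r, key = heapq.heappop(heap)
        if r != run:                     # current run is exhausted
            runs.append(current)
            run, current = r, []
        current.append(key)
        nxt = next(it, None)
        if nxt is not None:
            # a record smaller than the last output must wait for the next run
            heapq.heappush(heap, (run if nxt >= key else run + 1, nxt))
    if current:
        runs.append(current)
    return runs

print(replacement_selection([5, 1, 9, 2, 8, 3, 7, 4, 6], 3))
# -> [[1, 2, 5, 8, 9], [3, 4, 6, 7]]
```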

3.
External mergesort is normally implemented so that each run is stored continuously on disk and blocks of data are read exactly in the order they are needed during merging. We investigate two ideas for improving the performance of external mergesort: interleaved layout and a new reading strategy. Interleaved layout places blocks from different runs in consecutive disk addresses. This is done in the hope that interleaving will reduce seek overhead during merging. The new reading strategy precomputes the order in which data blocks are to be read according to where they are located on disk and when they are needed for merging. Extra buffer space makes it possible to read blocks in an order that reduces seek overhead, instead of reading them exactly in the order they are needed for merging. A detailed simulation model was used to compare the two layout strategies and three reading strategies. The effects of using multiple work disks were also investigated. We found that, in most cases, interleaved layout does not improve performance, but that the new reading strategy consistently performs better than double buffering and forecasting.
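A simplified single-disk sketch of the precomputed reading strategy (the paper handles multiple disks and exact buffer accounting): with k spare buffers, some of the next blocks may be read out of consumption order, so a greedy planner can pick the one nearest the current head position. The seek model and all names here are illustrative assumptions, not the paper's algorithm:

```python
def plan_reads(blocks, extra_buffers):
    """blocks: list of (consume_rank, disk_address), rank = order in which
    the merge will need the block. Read ahead of consumption order only
    while spare buffers remain, picking the block nearest the head."""
    by_rank = dict(sorted(blocks))
    next_needed, buffered, head = 0, set(), 0
    remaining, schedule = sorted(by_rank), []
    while remaining:
        spare = extra_buffers - len(buffered)
        window = remaining[:spare + 1]          # spare == 0 -> must read next needed
        pick = min(window, key=lambda r: abs(by_rank[r] - head))
        remaining.remove(pick)
        head = by_rank[pick]
        schedule.append((pick, head))
        if pick == next_needed:                 # consume it and any buffered successors
            next_needed += 1
            while next_needed in buffered:
                buffered.remove(next_needed)
                next_needed += 1
        else:
            buffered.add(pick)                  # read early: occupies a spare buffer
    return schedule

print(plan_reads([(0, 90), (1, 10), (2, 80), (3, 20)], extra_buffers=1))
# -> [(1, 10), (0, 90), (2, 80), (3, 20)]: less seeking than strict need order
```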

4.
We investigate the complexity of merging sequences of small integers on the EREW PRAM. Our most surprising result is that two sorted sequences of n bits each can be merged in O(log log n) time. More generally, we describe an algorithm to merge two sorted sequences of n integers drawn from the set {0, …, m−1} in O(log log n + log min{n, m}) time with an optimal time-processor product. No sublogarithmic-time merging algorithm for this model of computation was previously known. On the other hand, we show a lower bound of Ω(log min{n, m}) on the time needed to merge two sorted sequences of length n each with elements drawn from the set {0, …, m−1}, implying that our merging algorithm is as fast as possible for m = (log n)^Ω(1). If we impose an additional stability condition requiring the elements of each input sequence to appear in the same order in the output sequence, the time complexity of the problem becomes Θ(log n), even for m = 2. Stable merging is thus harder than nonstable merging.
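The bit-sequence result rests on the observation that a sorted 0/1 sequence is fully described by its number of zeros; a sequential one-liner shows the reduction that the EREW PRAM algorithm parallelizes:

```python
def merge_sorted_bits(a, b):
    """A sorted bit sequence is just zeros followed by ones, so merging two
    of them reduces to adding their zero counts; the PRAM algorithm's work
    is in parallelizing this counting. Sequential illustration only."""
    zeros = a.count(0) + b.count(0)
    return [0] * zeros + [1] * (len(a) + len(b) - zeros)

print(merge_sorted_bits([0, 0, 1], [0, 1, 1]))  # [0, 0, 0, 1, 1, 1]
```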

5.
A new sort algorithm, called AlphaSort, demonstrates that commodity processors and disks can handle commercial batch workloads. Using commodity processors, memory, and arrays of SCSI disks, AlphaSort runs the industry-standard sort benchmark in seven seconds. This beats the best published record on a 32-CPU 32-disk Hypercube by 8:1. On another benchmark, AlphaSort sorted more than a gigabyte in one minute. AlphaSort is a cache-sensitive, memory-intensive sort algorithm. We argue that modern architectures require algorithm designers to re-examine their use of the memory hierarchy. AlphaSort uses clustered data structures to get good cache locality, file striping to get high disk bandwidth, QuickSort to generate runs, and replacement-selection to merge the runs. It uses shared memory multiprocessors to break the sort into subsort chores. Because startup times are becoming a significant part of the total time, we propose two new benchmarks: (1) MinuteSort: how much can you sort in one minute, and (2) PennySort: how much can you sort for one penny.
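One of AlphaSort's cache-locality tricks, sorting small key prefixes plus record pointers rather than whole records, can be illustrated in a few lines (a hedged sketch of the idea, not AlphaSort itself):

```python
def prefix_pointer_sort(records, prefix_len=4):
    """Sort (key prefix, record index) pairs, which fit the cache far better
    than moving whole records, then emit records in sorted order. Comparing
    the full record on prefix ties keeps the result correct."""
    order = sorted(range(len(records)),
                   key=lambda i: (records[i][:prefix_len], records[i]))
    return [records[i] for i in order]

print(prefix_pointer_sort(["banana", "apple", "applesauce", "cherry"]))
# -> ['apple', 'applesauce', 'banana', 'cherry']
```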

6.
An efficient parallel algorithm for merging two sorted lists is presented. The algorithm is based on a novel partitioning algorithm that splits the two lists among the processors in a way that ensures load balance during the merge. The partitioning algorithm can itself be efficiently parallelized, allowing the solution to scale with increased numbers of processors. A shared memory multiprocessor is assumed. The time complexity for partitioning and merging is O(N/p + log N), where p is the number of processors and N is the total number of elements in the two lists. Implementation results on a twenty-node Sequent Symmetry multiprocessor are also presented.
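The partitioning idea can be sketched sequentially: to give each of p processors an equal share of the output, find, for each cut position k, a split (i, j) with i + j = k such that a[:i] and b[:j] together hold the k smallest elements. A hedged sketch using a standard merge-path binary search, not necessarily the paper's exact algorithm:

```python
def split_for_merge(a, b, k):
    """Find (i, j) with i + j = k such that a[:i] and b[:j] contain exactly
    the k smallest elements of the merged output (ties may fall on either
    side). Binary search, O(log min(len(a), len(b)))."""
    lo, hi = max(0, k - len(b)), min(k, len(a))
    while lo < hi:
        i = (lo + hi) // 2
        if a[i] < b[k - i - 1]:     # a[i] is among the k smallest: take more of a
            lo = i + 1
        else:
            hi = i
    return lo, k - lo

a, b = [1, 3, 5, 7], [2, 4, 6, 8]
print([split_for_merge(a, b, k) for k in (0, 4, 8)])
# -> [(0, 0), (2, 2), (4, 4)]: two processors get equal halves of the output
```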

7.
Using the techniques of specification and transformation by parts, algorithms are derived for the longest upsequence problem. First Dijkstra's algorithm and then two new modified merge algorithms are derived and presented in detail. The merge algorithms take advantage of natural runs in the input sequence and have a worst-case O(n log n) time complexity when appropriate merging techniques are used, but can be linear if long runs are present in the sequence. The first merge algorithm is logically equivalent to Dijkstra's algorithm; the second algorithm is based on the first one but uses a different merging technique. Expository remarks describe related results which evolved out of our work in programming by transformation; in particular, parallels are drawn between algorithms for the longest upsequence problem and algorithms for sorting. Work on this paper has been supported in part by: ONR Grant N00014-75-C-0571; NSF Grant MCS-80-04349; USDOE Contract EY-76-C-02-3077.
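For comparison, the standard O(n log n) patience-style solution to the longest upsequence (longest non-decreasing subsequence) problem, not the paper's transformational derivation:

```python
from bisect import bisect_right

def longest_upsequence_length(seq):
    """tails[k] holds the smallest possible last element of an upsequence
    of length k + 1; bisect_right (rather than bisect_left) allows equal
    elements, matching the non-decreasing ("upsequence") requirement."""
    tails = []
    for x in seq:
        pos = bisect_right(tails, x)
        if pos == len(tails):
            tails.append(x)
        else:
            tails[pos] = x
    return len(tails)

print(longest_upsequence_length([3, 1, 4, 1, 5, 9, 2, 6]))  # 4, e.g. 1, 4, 5, 9
```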

8.
Performance Evaluation, 1986, 6(2): 135-145
To compare the performance of external sorting and internal sorting in virtual memory, (external) mergesort and (internal) quicksort are performed in corresponding environments. Quicksort is run using a virtual memory with fetch-on-demand and a working-set replacement policy. Mergesort uses sublists presorted by replacement selection, double buffering for input and output, and two disks. Performance is measured by main memory space allocation, execution time, and the main memory space-time integral. Quicksort performs well with a smaller space allocation than mergesort, and it behaves more consistently with respect to space allocation. Mergesort needs far less execution time than quicksort, mainly because of its efficient overlapping of file handling and processor time. With respect to the space-time integral, quicksort outperforms mergesort only when small files (fewer than 1000 records) are sorted. With large files, mergesort is better, and the relative difference increases with increasing file size. The optimal page size and window size for quicksort are smaller than those typical of existing virtual memory systems, and they tend to decrease with increasing file size.

9.
External sorting—the process of sorting a file that is too large to fit into the computer's internal memory and must be stored externally on disks—is a fundamental subroutine in database systems [G], [IBM]. Of prime importance are techniques that use multiple disks in parallel in order to speed up the performance of external sorting. The simple randomized merging (SRM) mergesort algorithm proposed by Barve et al. [BGV] is the first parallel disk sorting algorithm that requires a provably optimal number of passes and that is fast in practice. Knuth [K, Section 5.4.9] recently identified SRM (which he calls "randomized striping") as the method of choice for sorting with parallel disks. In this paper we present an efficient implementation of SRM, based upon novel and elegant data structures. We give a new implementation for SRM's lookahead forecasting technique for parallel prefetching and its forecast-and-flush technique for buffer management. Our techniques amount to a significant improvement in the way SRM carries out the parallel, independent disk accesses necessary to read blocks of input runs efficiently during external merging. Our implementation is based on synchronous parallel I/O primitives provided by the TPIE programming environment [TPI]; whenever our program issues an I/O read (write) operation, one block of data is synchronously read from (written to) each disk in parallel. We compare the performance of SRM over a wide range of input sizes with that of disk-striped mergesort (DSM), which is widely used in practice. DSM consists of a standard mergesort in conjunction with striped I/O for parallel disk access. SRM merges together significantly more runs at a time compared with DSM, and thus it requires fewer merge passes. We demonstrate in practical scenarios that even though the streaming speeds for merging with DSM are a little higher than those for SRM (since DSM merges fewer runs at a time), sorting using SRM is often significantly faster than with DSM (since SRM requires fewer passes). The techniques in this paper can be generalized to meet the load-balancing requirements of other applications using parallel disks, including distribution sort and multiway partitioning of a file into several other files. Since both parallel disk merging and multimedia processing deal with streams that get "consumed" at nonuniform and partially predictable rates, our techniques for lookahead based upon forecasting data may have relevance in video server applications.
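The forecasting idea can be illustrated in a few lines: during a multiway merge, the run whose buffered block has the smallest last key will be exhausted first, so the disk holding that run's next block should be read next. A toy sketch with invented data structures, not SRM's actual implementation:

```python
def forecast_next_disk(run_buffers, run_to_disk):
    """run_buffers: run id -> sorted list of keys currently buffered.
    The run with the smallest last buffered key starves first, so
    prefetch next from the disk that holds that run's next block."""
    starving = min(run_buffers, key=lambda r: run_buffers[r][-1])
    return run_to_disk[starving]

buffers = {"r1": [4, 9], "r2": [5, 7], "r3": [6, 12]}
disks = {"r1": 0, "r2": 1, "r3": 2}
print(forecast_next_disk(buffers, disks))  # -> 1: run r2 exhausts first
```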

10.
Replacement selection is the most popular algorithm used in the creation of initial runs for a sort/merge external sort. In 1972, Frazer and Wong suggested a variation, called natural selection, which uses an auxiliary memory reservoir to increase the performance of replacement selection. Natural selection produces longer runs than replacement selection if the auxiliary memory reservoir is sufficiently large, but it behaves very strangely when the size of the auxiliary memory is small: while using more memory resources than replacement selection, it creates shorter runs, thus being less efficient. As it turns out, this deficiency can be avoided at low cost. This note presents a variation of natural selection that is efficient when the auxiliary memory is small.
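A hedged in-memory sketch of natural selection (the original keeps the reservoir on external storage, and details of run termination vary across presentations):

```python
import heapq
from itertools import islice

def natural_selection(records, capacity, reservoir_capacity):
    """Records too small for the current run are spilled to the reservoir
    and replaced from the input; when the reservoir fills, the run is
    closed and memory reloads from the reservoir. Illustrative only."""
    it = iter(records)
    heap = list(islice(it, capacity))
    heapq.heapify(heap)
    reservoir, carry, runs = [], None, []
    while heap or carry is not None:
        run = []
        while heap:
            key = heapq.heappop(heap)
            run.append(key)
            nxt = next(it, None)
            # spill too-small records to the reservoir, reading replacements
            while (nxt is not None and nxt < key
                   and len(reservoir) < reservoir_capacity):
                reservoir.append(nxt)
                nxt = next(it, None)
            if nxt is None:
                continue
            if nxt >= key:
                heapq.heappush(heap, nxt)
            else:                              # reservoir full: close this run
                run.extend(sorted(heap))       # remaining keys are all >= key
                heap, carry = [], nxt
        runs.append(run)
        heap = reservoir + ([carry] if carry is not None else [])
        reservoir, carry = [], None
        heapq.heapify(heap)
    return runs

print(natural_selection([5, 1, 9, 2, 8, 3, 7, 4, 6], 3, 2))
# -> [[1, 2, 5, 7, 8, 9], [3, 4, 6]]: a longer first run than plain
#    replacement selection produces on the same input
```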

11.
A systolic algorithm is described for generating all permutations of n elements in lexicographic order. The algorithm is designed to be executed on a linear array of n processors, each having constant-size memory, and each being responsible for producing one element of a given permutation. There is a constant delay per permutation, leading to an O(n!) time solution. This is an improvement over the best previously known techniques in two respects: the algorithm runs on the (arguably) weakest model of parallel computation, and is cost optimal (assuming the time to output the permutations is counted). The algorithm is extended to run adaptively, i.e., when the number of available processors is other than n.
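The sequential analogue of what the array computes, the classical constant-amortized-delay next-permutation step, looks like this:

```python
def next_permutation(a):
    """Transform a into its lexicographic successor in place; return False
    after the last permutation. Amortized constant work per call."""
    i = len(a) - 2
    while i >= 0 and a[i] >= a[i + 1]:     # find the rightmost ascent
        i -= 1
    if i < 0:
        return False                       # a was the last permutation
    j = len(a) - 1
    while a[j] <= a[i]:                    # smallest element bigger than a[i]
        j -= 1
    a[i], a[j] = a[j], a[i]
    a[i + 1:] = reversed(a[i + 1:])        # minimize the suffix
    return True

p, out = [1, 2, 3], [(1, 2, 3)]
while next_permutation(p):
    out.append(tuple(p))
print(out)  # all 3! = 6 permutations in lexicographic order
```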

12.
The proliferation of ontologies and taxonomies in many domains increasingly demands the integration of multiple such ontologies. We propose a new taxonomy merging algorithm called Atom that, given as input two taxonomies and a match mapping between them, can generate an integrated taxonomy in a largely automatic manner. The approach is target-driven, i.e., we merge a source taxonomy into the target taxonomy and preserve the target taxonomy as much as possible. In contrast to previous approaches, Atom does not aim at fully preserving all input concepts and relationships but strives to reduce the semantic heterogeneity of the merge result for improved understandability. Atom can also exploit advanced match mappings containing is-a relationships in addition to equivalence relationships between concepts of the input taxonomies. We evaluate Atom on synthetic and real-world scenarios and compare it with a full merge solution.
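A toy target-driven merge in the spirit of the approach described above (not the published Atom algorithm; the parent-map representation and attachment rule are illustrative assumptions):

```python
def target_driven_merge(target_parent, source_parent, match):
    """Trees are parent maps {concept: parent or None}; match maps source
    concepts to equivalent target concepts. The target is kept intact, and
    each unmatched source concept is attached under the target image of its
    nearest matched ancestor."""
    merged = dict(target_parent)                    # preserve the target
    def target_anchor(src):
        while src is not None and src not in match:
            src = source_parent.get(src)
        return match.get(src)                       # None -> becomes a root
    for concept in source_parent:
        if concept not in match:                    # matched ones already exist
            merged[concept] = target_anchor(source_parent.get(concept))
    return merged

target = {"vehicle": None, "car": "vehicle"}
source = {"automobile": None, "suv": "automobile", "coupe": "automobile"}
print(target_driven_merge(target, source, {"automobile": "vehicle"}))
# -> {'vehicle': None, 'car': 'vehicle', 'suv': 'vehicle', 'coupe': 'vehicle'}
```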

13.
Our purpose is to study the optimal way to merge n initially sorted runs, stored on a disk-like device, into a unique sorted file. This problem is equivalent to finding a tree with n leaves which minimizes a certain cost function (see Knuth [1]). We shall study some properties of those optimal trees, in the hope of finding efficient ways of constructing them. In particular, if all the initial runs have the same length, an algorithm for constructing the best merge pattern is described; its running time is proportional to n² and it requires space proportional to n. A special case is also analyzed in which the problem is solved in time and space proportional to n, and which provides some insight into the asymptotic behaviour of optimal merge trees.
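As an illustrative analogue (not the paper's algorithm or its cost model): when cost is measured as total records moved and fan-in is bounded, the classical greedy builds an optimal merge tree by repeatedly merging the shortest runs, padding with empty dummy runs so every merge is full:

```python
import heapq

def optimal_merge_cost(run_lengths, fan_in):
    """Huffman-style greedy for the 'records moved' cost model; fan_in >= 2.
    Padding keeps (number of runs - 1) divisible by (fan_in - 1), which is
    the classical condition for k-ary Huffman optimality."""
    assert fan_in >= 2
    heap = list(run_lengths)
    while (len(heap) - 1) % (fan_in - 1) != 0:
        heap.append(0)                   # empty dummy run
    heapq.heapify(heap)
    cost = 0
    while len(heap) > 1:
        merged = sum(heapq.heappop(heap) for _ in range(fan_in))
        cost += merged                   # every record in the merge is moved once
        heapq.heappush(heap, merged)
    return cost

print(optimal_merge_cost([10, 10, 10, 10, 10], fan_in=3))  # -> 80
```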

14.
For 2 ⩽ k ⩽ n, the k-merge problem is to merge a collection of k sorted sequences of total length n into a new sorted sequence. The k-merge problem is fundamental, as it provides a common generalization of both merging and sorting. The main contribution of this work is to give simple and intuitive work-time optimal algorithms for the k-merge problem on three PRAM models, thus settling the status of the k-merge problem. We first prove that Ω(n log k) work is required to solve the k-merge problem on the PRAM models. We then show that the EREW-PRAM requires Ω(log n) time and that both the CREW-PRAM and the CRCW-PRAM require Ω(log log n + log k) time, provided that the amount of work is bounded by O(n log k). Our first k-merge algorithm runs in Θ(log n) time and performs Θ(n log k) work on the EREW-PRAM. Finally, we design a work-time optimal CREW-PRAM k-merge algorithm that runs in Θ(log log n + log k) time and performs Θ(n log k) work. This latter algorithm is also work-time optimal on the CRCW-PRAM model. Our algorithms completely settle the status of the k-merge problem on the three main PRAM models.
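The sequential baseline that motivates the Ω(n log k) work bound is the heap-based k-way merge, which performs O(log k) work per output element:

```python
import heapq

def k_merge(runs):
    """Merge k sorted lists with a size-k heap: O(n log k) comparisons in
    total, matching the work lower bound the paper proves for the PRAMs."""
    heap = [(run[0], i, 0) for i, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out = []
    while heap:
        val, i, j = heapq.heappop(heap)
        out.append(val)
        if j + 1 < len(runs[i]):
            heapq.heappush(heap, (runs[i][j + 1], i, j + 1))
    return out

print(k_merge([[1, 4, 7], [2, 5, 8], [3, 6, 9]]))  # [1, 2, ..., 9]
```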

15.
This paper presents a novel parallel merge element (PME), which permits more than one key from each input run to be compared concurrently and all confirmed keys to be retrieved at one time, enhancing the throughput of the hardware merger. The PMEs can be combined to build a merge module in which hardware queues are placed to maintain the high throughput. A procedure for estimating the merge parallelism of the merge module is introduced, and its running results are described. The proposed merge module can be used as a passive high-speed module to match the data processing rate of other parallel hardware modules, such as parallel sorters and data filters, in a backend computer or database machine.

16.
Fast joins using join indices
Two new algorithms, "Jive join" and "Slam join," are proposed for computing the join of two relations using a join index. The algorithms are duals: Jive join range-partitions input relation tuple ids and then processes each partition, while Slam join forms ordered runs of input relation tuple ids and then merges the results. Both algorithms make a single sequential pass through each input relation, in addition to one pass through the join index and two passes through a temporary file, whose size is half that of the join index. Both algorithms require only that the number of blocks in main memory is of the order of the square root of the number of blocks in the smaller relation. By storing intermediate and final join results in a vertically partitioned fashion, our algorithms need to manipulate less data in memory at a given time than other algorithms. The algorithms are resistant to data skew and adaptive to memory fluctuations. Selection conditions can be incorporated into the algorithms. Using a detailed cost model, the algorithms are analyzed and compared with competing algorithms. For large input relations, our algorithms perform significantly better than Valduriez's algorithm, the TID join algorithm, and hash join algorithms. An experimental study is also conducted to validate the analytical results and to demonstrate the performance characteristics of each algorithm in practice.
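A greatly simplified in-memory illustration of Jive join's partitioning structure (real Jive join works with disk blocks, vertical partitioning, and a temporary file; the relations and names here are invented):

```python
from collections import defaultdict

def jive_join_sketch(r, s, join_index, num_partitions):
    """r, s: lists of tuples addressed by position (tuple id); join_index:
    list of (r_tid, s_tid) sorted by r_tid. Range-partition the join index
    by s tuple id, then process partitions in order so each pass over s is
    clustered rather than random."""
    width = -(-len(s) // num_partitions)          # ceiling division
    parts = defaultdict(list)
    for r_tid, s_tid in join_index:               # one pass over the join index
        parts[s_tid // width].append((r_tid, s_tid))
    out = []
    for p in sorted(parts):                       # one clustered pass over s
        for r_tid, s_tid in sorted(parts[p], key=lambda e: e[1]):
            out.append(r[r_tid] + s[s_tid])
    return out

r = [("r0",), ("r1",), ("r2",)]
s = [("s0",), ("s1",), ("s2",), ("s3",)]
ji = [(0, 3), (1, 0), (2, 2)]
print(jive_join_sketch(r, s, ji, 2))
# -> [('r1', 's0'), ('r2', 's2'), ('r0', 's3')]
```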

17.
Inadequate version control for models significantly impedes the application of model-driven software development. In particular, sophisticated support for merging model versions is urgently needed. In this paper, we present a formal approach to both two- and three-way merging of models in the EMF framework. The approach may be applied to instances of arbitrary Ecore models. We specify context-free as well as context-sensitive rules for model merging which both detect and resolve merge conflicts. Based on these rules, a merge algorithm is developed which produces a consistent model from consistent input models. The merge algorithm neither assumes unique object identifiers nor requires change logs. Instead, it relies on matchings among the input models which identify common elements (a state-based approach). The requirements imposed on these matchings are reduced to a minimum, e.g., there are no restrictions on the relative positions of matched elements. Altogether, the merge algorithm is widely applicable, preserves consistency, and offers advanced features, such as merging of ordered collections in the presence of arbitrary moves and handling of context-sensitive conflicts which are hard to detect and to resolve.

18.
We study the construction of Delaunay triangulations in a parallel environment. To meet the efficiency demands of processing massive data sets, a merge-based construction method is proposed. Based on the actual distribution of the input data, the points are sorted by x-coordinate and dispatched, in batches of a given threshold size, to worker threads that build a series of initial sub-triangulations; adjacent sub-triangulations are then merged pairwise, round by round, until a single triangulation remains. Because the sub-triangulations are only weakly interdependent during construction, the method lends itself to parallel processing and pipelining. The algorithm is suitable for serial single-machine, multithreaded, and multi-core concurrent execution, as well as for distributed parallel processing in cluster computing environments. Experiments show that the algorithm has good time and space efficiency; the worst-case serial time complexity is O(n log n), and in general it does not exceed O(n²).

19.
An efficient external sorting algorithm with minimal space requirement is presented in this article. The average number of passes over the data is approximately 1 + ln(N + 1)/4B, where N is the number of records in the file to be sorted and B is the buffer size. The external storage requirement is only the file itself; no additional disk space is required. The internal storage requirement is four buffers: two for input and two for output. The buffer size can be adjusted to the available memory space. A stack of size log₂ N is also required. This work was partially supported by a fellowship and grant from Western Michigan University.

20.
In cross-enterprise, cross-system environments, process data is typically recorded in separate event logs, which makes it impossible to mine the complete end-to-end process. The proposed algorithm therefore merges logs using only the event name and timestamp attributes. First, the process models of the two systems are obtained, together with a merged model derived from the cross-system follows-dependency relations between activities. Next, traces from the two systems are merged one-to-one and sorted by timestamp, keeping only the merged traces whose paths are consistent with the merged model. From these traces, one-to-one instance pairs are extracted, i.e., pairs in which a unique main-process trace can be merged with exactly one sub-process trace. Time constraints between activities are then mined from these instance pairs and used to merge the remaining logs. The last two steps are repeated until all logs are merged or no further one-to-one merges are possible. Experiments on real event logs show satisfactory merge quality with high precision and recall.
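The timestamp-ordered interleaving step of such a merge is straightforward; checking candidate merges against the merged model's paths, as described above, is the harder part and is not shown. A toy sketch with invented event data:

```python
import heapq

def interleave_logs(main_trace, sub_trace):
    """Merge two timestamp-sorted traces into one timestamp-ordered trace.
    Events are (timestamp, activity_name) pairs, the only two attributes
    the merge algorithm above relies on."""
    return list(heapq.merge(main_trace, sub_trace))

main = [(1, "create order"), (4, "ship order")]
sub = [(2, "check stock"), (3, "reserve stock")]
print(interleave_logs(main, sub))
# [(1, 'create order'), (2, 'check stock'), (3, 'reserve stock'), (4, 'ship order')]
```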
