A heap structure designed for secondary storage is suggested that tries to make the best use of the available buffer space in primary memory. The heap is a complete multi-way tree, with multi-page blocks of records as nodes, satisfying a generalized heap property. A special feature of the tree is that the nodes may be partially filled, as in B-trees. The structure is complemented with priority-queue operations insert and delete-max. When handling a sequence of $S$ operations, the number of page transfers performed is shown to be $O\left(\sum_{i=1}^{S} (1/P) \log_{M/P}(N_i/P)\right)$, where $P$ denotes the number of records fitting into a page, $M$ the capacity of the buffer space in records, and $N_i$ the number of records in the heap prior to the $i$th operation (assuming $P \geq 1$ and $S > M \geq c \cdot P$, where $c$ is a small positive constant). The number of comparisons required when handling the sequence is $O\left(\sum_{i=1}^{S} \log_2 N_i\right)$. Using the suggested data structure we obtain an optimal external heapsort that performs $O((N/P) \log_{M/P}(N/P))$ page transfers and $O(N \log_2 N)$ comparisons in the worst case when sorting $N$ records.
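
As a short worked step (a sketch, assuming heapsort is run as $N$ inserts followed by $N$ delete-max operations, so that $S = 2N$ and $N_i \leq N$ for every $i$), the heapsort bound follows from the per-sequence bound by replacing each term with its largest possible value:

```latex
\sum_{i=1}^{S} \frac{1}{P} \log_{M/P}\frac{N_i}{P}
  \;\leq\; \sum_{i=1}^{2N} \frac{1}{P} \log_{M/P}\frac{N}{P}
  \;=\; \frac{2N}{P} \log_{M/P}\frac{N}{P}
  \;=\; O\!\left(\frac{N}{P} \log_{M/P}\frac{N}{P}\right)
```

The same substitution in the comparison bound gives $\sum_{i=1}^{2N} \log_2 N_i \leq 2N \log_2 N = O(N \log_2 N)$.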

An efficient non-dominated sorting method for evolutionary algorithms
We present a new non-dominated sorting algorithm to generate the non-dominated fronts in multi-objective optimization with evolutionary algorithms, particularly the NSGA-II. The non-dominated sorting algorithm used by NSGA-II has a time complexity of $O(MN^2)$ in generating non-dominated fronts in one generation (iteration) for a population size N and M objective functions. Since generating non-dominated fronts takes the majority of total computational time (excluding the cost of fitness evaluations) of NSGA-II, making this algorithm faster will significantly improve the overall efficiency of NSGA-II and other genetic algorithms using non-dominated sorting. The new non-dominated sorting algorithm proposed in this study reduces the number of redundant comparisons existing in the algorithm of NSGA-II by recording the dominance information among solutions from their first comparisons. By utilizing a new data structure called the dominance tree and the divide-and-conquer mechanism, the new algorithm is faster than NSGA-II for different numbers of objective functions. Although the number of solution comparisons by the proposed algorithm is close to that of NSGA-II when the number of objectives becomes large, the total computational time shows that the proposed algorithm still has better efficiency because of the adoption of the dominance tree structure and the divide-and-conquer mechanism.
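
For orientation, the $O(MN^2)$ baseline that the abstract compares against can be sketched as follows. This is a minimal sketch of the commonly described NSGA-II style front construction, assuming all objectives are minimized. It is not the dominance-tree method the abstract proposes, and the names `dominates` and `non_dominated_sort` are illustrative.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(points: List[Sequence[float]]) -> List[List[int]]:
    """Return fronts as lists of indices into points; front 0 is non-dominated."""
    n = len(points)
    dominated = [[] for _ in range(n)]  # dominated[i]: indices that i dominates
    counts = [0] * n                    # counts[i]: how many solutions dominate i
    for i in range(n):
        for j in range(i + 1, n):       # each pair is examined once: O(M N^2)
            if dominates(points[i], points[j]):
                dominated[i].append(j)
                counts[j] += 1
            elif dominates(points[j], points[i]):
                dominated[j].append(i)
                counts[i] += 1
    fronts = []
    current = [i for i in range(n) if counts[i] == 0]
    while current:
        fronts.append(current)
        following = []
        for i in current:
            for j in dominated[i]:
                counts[j] -= 1          # i has been placed, so it no longer blocks j
                if counts[j] == 0:
                    following.append(j)
        current = sorted(following)
    return fronts


if __name__ == "__main__":
    population = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
    print(non_dominated_sort(population))
```

Each pair of solutions is compared in both directions here, which is the kind of redundancy the abstract says the proposed method reduces by recording dominance information from first comparisons.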

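Design and Implementation of an Address Radix Sort Algorithm
This paper proposes a sorting method that combines the advantages of address sorting and radix sorting and performs better than the traditional quicksort algorithm. The paper gives a description of the algorithm, part of the source program, and an analysis of its time and space complexity. Because the algorithm requires no key comparisons, it is particularly suited to sorting large volumes of data; because it requires no movement of elements, it is particularly suited to sorting large records. Application results show that the algorithm has high practical value.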
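
The two properties named in the abstract (no key comparisons, no movement of records) can be illustrated with a least-significant-digit radix sort that permutes record indices only. This is a sketch under stated assumptions, not the paper's algorithm: keys are taken to be non-negative integers below `2**key_bits`, and the function name and parameters are hypothetical.

```python
from typing import List, Sequence


def radix_sort_indices(keys: Sequence[int], bits_per_pass: int = 8,
                       key_bits: int = 32) -> List[int]:
    """Return the indices of keys in ascending key order.

    Only the index list is rearranged; the records themselves stay in place.
    Keys are distributed by digit value and are never compared with each other.
    """
    order = list(range(len(keys)))
    mask = (1 << bits_per_pass) - 1
    for shift in range(0, key_bits, bits_per_pass):
        buckets = [[] for _ in range(mask + 1)]
        for idx in order:               # stable distribution on the current digit
            buckets[(keys[idx] >> shift) & mask].append(idx)
        order = [idx for bucket in buckets for idx in bucket]
    return order


if __name__ == "__main__":
    keys = [170, 45, 75, 90, 802, 24, 2, 66]
    order = radix_sort_indices(keys)
    print([keys[i] for i in order])
```

Because each pass is stable, sorting on successively more significant digits leaves the indices in full key order after the last pass.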

This paper reports the development of a sorting algorithm, called a ‘pocket sort.’ It is primarily directed to sorting of character data. The algorithm is strictly of order O(n); sorting time is directly proportional to the number of data elements to be sorted. Further, through the use of pointer-linked list data structures, no internal movement of the records containing the sort field is required. The algorithm has been implemented in Turbo Pascal. Data are presented comparing this pocket sort to other sorting techniques.

The Least Basic Operations on Heap and Improved Heapsort
The best algorithms for the INSERT and DELETE operations on a heap are presented, by which HEAPSORT is improved. Inserting one element into and deleting one element from a heap of n elements cost at most [log log n] comparisons and [log n] comparisons and element transfers, respectively, in the worst case. The improved HEAPSORT uses $n \log n + n \log\log n + O(n)$ comparisons and element transfers (not swaps!) in the worst case. It may be the best HEAPSORT algorithm, since the lower bound for sorting algorithms is $[\log n!] \approx n \log n - O(n)$. In particular, for element transfers, this is the best result known so far.
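
One well-known way to insert with about log log n comparisons is to binary-search the root-to-leaf path, whose keys are already ordered, and then shift elements down one level instead of swapping. The sketch below assumes this classical technique and a 0-based max-heap stored in a Python list. The abstract does not describe its own procedure, which may differ, and the function name is illustrative.

```python
from typing import List


def insert_with_path_search(heap: List[int], item: int) -> int:
    """Insert item into a max-heap and return the number of key comparisons used."""
    heap.append(item)
    # Collect the positions on the path from the new leaf up to the root.
    path = []
    i = len(heap) - 1
    while True:
        path.append(i)
        if i == 0:
            break
        i = (i - 1) // 2
    path.reverse()                      # path[0] is the root, path[-1] the new leaf

    # Keys on path[:-1] are non-increasing, so binary search finds the first
    # depth whose key is smaller than item (or the leaf if there is none).
    lo, hi = 0, len(path) - 1
    comparisons = 0
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if heap[path[mid]] < item:
            hi = mid
        else:
            lo = mid + 1

    # Move each element below the target depth down one level (transfers, not swaps).
    for d in range(len(path) - 1, lo, -1):
        heap[path[d]] = heap[path[d - 1]]
    heap[path[lo]] = item
    return comparisons


if __name__ == "__main__":
    h: List[int] = []
    for x in [5, 3, 8, 1, 9, 2, 7]:
        insert_with_path_search(h, x)
    print(h[0])  # the maximum sits at the root
```

The path has about log n positions, so the binary search needs about log log n comparisons, while the shifting step may still move up to about log n elements.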

This paper presents the design, features and pilot evaluation study of a web-based environment, the SORTING environment, for the learning of sorting algorithms by secondary level education students. The design of this environment is based on modeling methodology, taking into account modern constructivist and social theories of learning while at the same time acknowledging the role of hands-on experience, the significance of students’ expressing their previous knowledge, the importance of interlinked multiple representation systems (MRS) and the role of constructive feedback on student learning. Although SORTING supports student learning of typical sorting algorithms such as Bubble-sort, Quick-sort and Selection-sort, it can also be adapted to integrate more sorting algorithms. The analysis of the data emerging from the pilot evaluation study of SORTING has shown that students used all the representation systems (RS) provided and found them attractive and easy to use. On the whole, student interactions within SORTING helped them to become aware of both the intuitive and the typical sorting procedures used, to conceptualize them, to overcome learning difficulties, to correct themselves and to make connections between different representations of the sorting algorithms used.

It is shown that the existence of duplicate values in some attribute columns has a significant impact on the computational complexity of the sorting and joining operations. This is especially true when the number of distinct tuple values is a small fraction of the total number of tuples. The authors characterize a multirelation M(n, L) by its cardinality n and the number of distinct elements L it contains. Under this characterization, the worst-case time complexity of sorting such a multirelation with binary comparisons as basic operations is investigated. Upper and lower bounds on the number of three-branch comparisons needed to sort such a multirelation are established. Thereafter, the methodology used to study the complexity of sorting is applied to the natural join operation. It is shown that the existence of duplicate values in the join attribute columns can be exploited to reduce the computational complexity of the natural join operation.

Heapsort is an internal sorting method which sorts an array of n records in place in O(n log n) time. Heapsort is generally considered unsuitable for external random-access sorting. By replacing key comparisons with merge operations on pages, it is shown how to obtain an in-place external sort which requires O(m log m) page references, where m is the number of pages which the file occupies. The new sort method (called Hillsort) has several useful properties for advanced database management systems. Not only does Hillsort operate in place, i.e., no additional external storage space is required assuming that the page table can be kept in core memory, but accesses to adjacent pages in the heap require only one seek if the pages are physically contiguous. The authors define the Hillsort model of computation for external random-access sorting, develop the complete algorithm and then prove it correct. The model is next refined and a buffer management concept is introduced so as to reduce the number of merge operations and page references, and make the method competitive with a basic balanced two-way external merge. Performance characteristics are noted such as the worst-case upper bound, which can be carried over from Heapsort, and the average-case behavior, deduced from experimental findings. It is shown that the refined version of the algorithm is on a par with the external merge sort.

Detecting and eliminating duplicate records is one of the major tasks for improving data quality. The task, however, is not as trivial as it seems since various errors, such as character insertion, deletion, transposition, substitution, and word switching, are often present in real-world databases. This paper presents an n-gram-based approach for detecting duplicate records in large databases. Using the approach, records are first mapped to numbers based on the n-grams of their field values. The obtained numbers are then clustered, and records within a cluster are taken as potential duplicate records. Finally, record comparisons are performed within clusters to identify true duplicate records. The unique feature of this method is that it does not require preprocessing to correct syntactic or typographical errors in the source data in order to achieve high accuracy. Moreover, sorting the source data file is unnecessary. Only a fixed number of database scans is required. Therefore, compared with previous methods, the algorithm is more time efficient. Published online: 22 August 2001

When algorithms for sorting and searching are applied to keys that are represented as bit strings, we can quantify the performance of the algorithms not only in terms of the number of key comparisons required by the algorithms but also in terms of the number of bit comparisons. Some of the standard sorting and searching algorithms have been analyzed with respect to key comparisons but not with respect to bit comparisons. In this paper, we investigate the expected number of bit comparisons required by Quickselect (also known as Find). We develop exact and asymptotic formulae for the expected number of bit comparisons required to find the smallest or largest key by Quickselect and show that the expectation is asymptotically linear with respect to the number of keys. Similar results are obtained for the average case. For finding keys of arbitrary rank, we derive an exact formula for the expected number of bit comparisons that (using rational arithmetic) requires only finite summation (rather than such operations as numerical integration) and use it to compute the expectation for each target rank.
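
To make the cost measure concrete, the sketch below runs Quickselect on fixed-width integer keys and counts every bit inspected while comparing two keys from the most significant bit down. It is an illustration only: the counting convention (one unit per bit position examined, including the pivot compared with itself) and all names are assumptions, not the paper's exact model.

```python
import random
from typing import List, Sequence, Tuple


def bit_compare(a: int, b: int, width: int, counter: List[int]) -> int:
    """Compare a and b bit by bit from the most significant bit.

    Returns -1, 0 or 1 and adds the number of bit positions examined to counter[0].
    """
    for k in range(width - 1, -1, -1):
        counter[0] += 1
        bit_a, bit_b = (a >> k) & 1, (b >> k) & 1
        if bit_a != bit_b:
            return -1 if bit_a < bit_b else 1
    return 0


def quickselect(keys: Sequence[int], rank: int, width: int,
                rng=random) -> Tuple[int, int]:
    """Return (key of the given 0-based rank, total bit comparisons used)."""
    counter = [0]
    items = list(keys)
    while True:
        pivot = items[rng.randrange(len(items))]
        smaller, larger, equal = [], [], 0
        for x in items:
            c = bit_compare(x, pivot, width, counter)
            if c < 0:
                smaller.append(x)
            elif c > 0:
                larger.append(x)
            else:
                equal += 1
        if rank < len(smaller):
            items = smaller
        elif rank < len(smaller) + equal:
            return pivot, counter[0]
        else:
            rank -= len(smaller) + equal
            items = larger


if __name__ == "__main__":
    value, bits = quickselect([13, 7, 42, 7, 0, 255, 99], rank=3, width=8)
    print(value, bits)
```

The bit count varies from run to run because the pivot is chosen at random; averaging it over many runs gives an empirical estimate of the expectation the abstract analyzes.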

Track-before-detect (TBD) algorithms are used for tracking systems where the object’s signal is below the noise floor (low-SNR objects). Many computations and memory transfers are necessary for real-time signal processing. GPGPUs, as parallel processing devices, are well suited to TBD algorithms. Finding optimal or suboptimal code is not possible because of the lack of documentation for low-level programming of GPGPUs. High-level code optimization is therefore necessary, and an evolutionary approach based on a single parent and a single child, that is, a local search approach, is considered. A brute-force search is not feasible, because there are N! code variants, where N is the number of motion vector components. The proposed evolutionary operator, LREI (local random extraction and insertion), allows source code reordering that reduces computation time through better organization of memory transfers and of the texture cache content. A starting point based on sorting and the minimal execution time metric is proposed. The unbiased random and biased sorting techniques are compared experimentally. Tests show a significant improvement in computation speed, about 8% over the conventional CUDA code. The optimization time for the sample code is about 1 h (1,000 iterations) for the considered recursive spatio-temporal TBD algorithm.

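This paper presents a fast sorting algorithm that requires no comparison or exchange of data for records whose keys lie within a specific range, together with the idea behind the algorithm, its description, and an analysis of its time and space complexity; programs written in C++ are used to compare the algorithms. The results show that when the key range is far smaller than the number of records, the time complexity of the algorithm is only O(n), and that it clearly outperforms other sorting algorithms.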
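
A standard technique with the properties described (no key comparisons, no exchanges, linear time when the key range is small relative to the number of records) is counting sort. The sketch below assumes integer keys in `0 .. key_range - 1`. It is offered as an illustration of that general idea rather than as the paper's algorithm, and the names are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def counting_sort(records: Sequence[T], key: Callable[[T], int],
                  key_range: int) -> List[T]:
    """Stable sort of records by an integer key in range(key_range).

    Runs in O(n + key_range) time and never compares two keys with each other.
    """
    counts = [0] * (key_range + 1)
    for r in records:
        counts[key(r) + 1] += 1         # histogram, shifted by one slot
    for k in range(key_range):
        counts[k + 1] += counts[k]      # counts[k] is now the first slot for key k
    out: List[T] = [None] * len(records)  # type: ignore[list-item]
    for r in records:
        k = key(r)
        out[counts[k]] = r              # each record is written once to its final slot
        counts[k] += 1
    return out


if __name__ == "__main__":
    rows = [(3, "a"), (1, "b"), (3, "c"), (0, "d"), (1, "e")]
    print(counting_sort(rows, key=lambda r: r[0], key_range=4))
```

When `key_range` is much smaller than the number of records, the `O(n + key_range)` cost is dominated by the two passes over the records, which matches the O(n) behavior the abstract reports for that case.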

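An Aggregation Algorithm for Online Analytical Query Processing
Online analytical processing (OLAP) queries are ad hoc complex queries involving large amounts of data. From the perspective of SQL (structured query language), these queries usually contain multi-table joins and group-by aggregation operations. From the standpoint of OLAP query processing, a new sort-based aggregation query algorithm, MuSA (sort-based aggregation with multi-table join), is proposed. The method takes full account of the characteristics of the star schema of data warehouses and combines the aggregation operation with a new multi-table join algorithm, MJoin; when sorting, it uses ...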

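Problem data in a data warehouse has a considerable impact on data quality. To find and remove such data, the first task is to handle approximately duplicate records; the algorithm currently most widely used for duplicate elimination is the basic sorted-neighborhood method (SNM). By analyzing the shortcomings of SNM, an improved algorithm, ISNM, is proposed. It computes attribute weights with an attribute discrimination method, which avoids the problems caused by subjectively assigned weights; it computes the similarity of two records with a field filtering algorithm, which reduces the number of attribute comparisons between records within the window and speeds up detection; and it replaces the fixed-size window with a variable window, which prevents missed matches and reduces useless record comparisons. Experimental results show that ISNM has clear advantages in recall, precision, and running time.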
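
The basic SNM that the abstract takes as its starting point can be sketched as: sort the records by a key, slide a fixed-size window over the sorted order, and compare only records that fall in the same window. The sketch below shows that baseline only, not ISNM. The similarity measure (`difflib.SequenceMatcher.ratio` from the Python standard library), the threshold, and the function name are assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Sequence, Tuple


def sorted_neighborhood(records: Sequence[str], key: Callable[[str], str],
                        window: int = 3,
                        threshold: float = 0.8) -> List[Tuple[int, int]]:
    """Return index pairs of likely duplicates found by basic SNM."""
    # Step 1: sort record indices by the chosen sorting key.
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    pairs = []
    # Step 2: compare each record with the next (window - 1) records only.
    for pos, i in enumerate(order):
        for j in order[pos + 1: pos + window]:
            score = SequenceMatcher(None, records[i], records[j]).ratio()
            if score >= threshold:
                pairs.append((min(i, j), max(i, j)))
    return pairs


if __name__ == "__main__":
    people = ["john smith", "jon smith", "mary jones", "john smyth", "marie jones"]
    print(sorted_neighborhood(people, key=str.lower))
```

With a fixed window, duplicates whose sort keys land more than `window - 1` positions apart are never compared, which is the missed-match problem that the variable window in ISNM is meant to address.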

Decomposition-based multi-objective evolutionary algorithms have been found to be very promising for many-objective optimization. The recently presented non-dominated sorting genetic algorithm III (NSGA-III) employs the decomposition idea to efficiently promote the population diversity. However, due to the low selection pressure of the Pareto-dominance relation, the convergence of NSGA-III could still be improved. For this purpose, an improved NSGA-III algorithm based on a niche-elimination operation (called NSGA-III-NE) is proposed. In the proposed algorithm, an adaptive penalty distance (APD) function is presented to consider the importance of convergence and diversity in the different stages of the evolutionary process. Moreover, the niche-elimination operation is designed by exploiting the niching technique and the worse-elimination strategy. The niching technique identifies the most crowded subregion, and the worse-elimination strategy finds and further eliminates the worst individual. The proposed NSGA-III-NE is tested on a number of well-known benchmark problems with up to fifteen objectives and shows competitive performance compared with five state-of-the-art decomposition-based algorithms. Additionally, a vector-angle-based selection strategy is also proposed for handling irregular Pareto fronts.

A graph-based modeling technique has been developed for the stochastic analysis of systems containing concurrency. The basis of the technique is the use of directed acyclic graphs. These graphs represent event-precedence networks where activities may occur serially, probabilistically, or concurrently. When a set of activities occurs concurrently, the condition for the set of activities to complete is that a specified number of the activities must complete. This includes the special cases that one or all of the activities must complete. The cumulative distribution function associated with an activity is assumed to have exponential polynomial form. Further generality is obtained by allowing these distributions to have a mass at the origin and/or at infinity. The distribution function for the time taken to complete the entire graph is computed symbolically in the time parameter t. The technique allows two or more graphs to be combined hierarchically. Applications of the technique to the evaluation of concurrent program execution time and to the reliability analysis of fault-tolerant systems are discussed.

The process of integrating large volumes of data coming from disparate data sources, in order to detect records that refer to the same entities, has always been an important problem in both academia and industry. This problem becomes significantly more challenging when the integration involves a huge number of records and needs to be conducted in a real-time fashion to address the requirements of critical applications. In this paper, we propose two novel schemes for online record linkage, which achieve very fast response times and high levels of recall and precision. Our proposed schemes embed the records into a Bloom filter space and employ the Hamming Locality-Sensitive Hashing technique for blocking. Each Bloom filter is hashed to a number of hash tables in order to amplify the probability of formulating similar Bloom filter pairs. The main theoretical premise behind our first scheme relies on the number of times a Bloom filter pair is formulated in the hash tables of the blocking mechanism. We prove that this number strongly depends on the distance of that Bloom filter pair. This correlation allows us to estimate in real time the Hamming distances of Bloom filter pairs without performing the comparisons. The second scheme is progressive and achieves high recall up front during the linkage process by continuously adjusting the sequence in which the hash tables are scanned, and also guarantees, with high probability, the identification of each similar Bloom filter pair. Our experimental evaluation, using four real-world data sets, shows that the proposed schemes outperform four state-of-the-art methods by achieving higher recall and precision, while being very efficient.
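
The blocking mechanism described above can be illustrated in a few lines: encode each record's character bigrams into a fixed-size bit vector (a Bloom filter), then build several hash tables whose keys are the bits at a random sample of positions (Hamming LSH). The number of tables in which two filters share a key tends to fall as their Hamming distance grows. Everything below is a sketch under stated assumptions: the filter size, the use of SHA-256 from `hashlib`, the bigram encoding, and all function names are illustrative choices, not the paper's parameters.

```python
import hashlib
import random
from typing import List, Tuple


def bloom_filter(text: str, size: int = 64, num_hashes: int = 2) -> int:
    """Encode the character bigrams of text as a size-bit integer bit vector."""
    bits = 0
    for i in range(len(text) - 1):
        gram = text[i:i + 2]
        for h in range(num_hashes):
            digest = hashlib.sha256(f"{h}:{gram}".encode()).digest()
            bits |= 1 << (int.from_bytes(digest[:4], "big") % size)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which the two filters differ."""
    return bin(a ^ b).count("1")


def make_tables(size: int = 64, num_tables: int = 10, bits_per_key: int = 8,
                seed: int = 0) -> List[List[int]]:
    """Each table is defined by a random sample of bit positions."""
    rng = random.Random(seed)
    return [rng.sample(range(size), bits_per_key) for _ in range(num_tables)]


def block_key(bits: int, positions: List[int]) -> Tuple[int, ...]:
    """Blocking key of a filter in one table: its bits at the sampled positions."""
    return tuple((bits >> p) & 1 for p in positions)


def collisions(a: int, b: int, tables: List[List[int]]) -> int:
    """In how many tables do the two filters land in the same bucket?"""
    return sum(block_key(a, t) == block_key(b, t) for t in tables)


if __name__ == "__main__":
    tables = make_tables()
    x, y, z = (bloom_filter(s) for s in ("jonathan smith", "jonathon smith", "mary jones"))
    print(hamming(x, y), collisions(x, y, tables))
    print(hamming(x, z), collisions(x, z, tables))
```

For two filters at Hamming distance d, a single table with k sampled positions gives a matching key with probability roughly (1 - d/size)^k, so the collision count over many tables carries information about d. That relationship is what lets a collision count stand in for an explicit distance computation.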

The proper allocation of facilities within Islamic holy places is barely studied. These places annually witness millions of pilgrims and guests. The number of people during pilgrimage has been growing recently and is expected to grow further in the future. Different facilities should be optimally allocated to properly serve this large number of people and efficiently respond to their requests. In this paper, we target the problem of optimally allocating facilities within the largest Islamic holy place, Arafat. We evaluate the current allocation with respect to distance, coverage, and cover inequality metrics. Average-case and worst-case values of the three metrics are considered for evaluation. Results show that the current allocation strategy is far from being optimal. For the three considered metrics, we use crowdedness-based techniques to allocate facilities within the area of Arafat. Optimal allocations are first obtained by solving integer programming (IP) models. Thereafter, two widely used metaheuristics, genetic algorithms (GA) and simulated annealing, are tested and evaluated. Results show that the optimal solution could be easily obtained for coverage and cover inequality metrics. For the distance metric, the computation time of the IP technique is large and GA appears as a good candidate to balance between computation time and solution quality. Finally, we study allocating facilities from a multiobjective perspective. Both scalar-weighted formulation and nondominated sorting genetic algorithm II techniques are considered. Results show that the latter technique outperforms the former technique in the number of generated Pareto-optimal allocations as well as the quality of these allocations.

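Downhole testing during hydraulic fracturing is an important basis for evaluating fracturing results and improving the fracturing process. In downhole testing, the actual operation time often differs considerably from the planned time, so a downhole tester with limited storage capacity that has been preconfigured to record according to the operation plan frequently fails to record the critical periods. This paper analyzes the characteristics of the downhole pressure curve during different stages of a fracturing operation, proposes a simple logical discrimination method that uses pressure and pressure gradient as characteristic parameters to determine the current stage of the operation, and designs an autonomous storage strategy and a microcontroller-based instrument circuit. Tests show that the method is effective and that the instrument can autonomously sample and store parameters densely during the critical periods.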

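This paper proposes a hybrid hyper-heuristic genetic algorithm (HHGA) for solving a class of fuzzy flexible job-shop scheduling problems (FFJSP) in which job processing times are represented by triangular fuzzy numbers; the optimization objective is to minimize the maximum fuzzy completion time (i.e., the makespan). First, the properties of existing ranking criteria for triangular fuzzy numbers are analyzed in detail and, taking full account of the approximation error of the max operation and of fuzziness, a more accurate ranking criterion for triangular fuzzy numbers is designed, which can reasonably compute the objective function values of solutions to the FFJSP and other scheduling problems. Second, to search different regions of the FFJSP solution space effectively, HHGA divides the solution process into two levels. The high level uses a genetic algorithm with an adaptive mutation operator to optimize permutations of six specific operations (i.e., six effective neighborhood operations). The low level treats each permutation obtained by the high level as a heuristic, applies it to the corresponding low-level individual to perform a compact variable neighborhood local search and generate new individuals, and adds a simulated annealing mechanism to keep the search from becoming trapped in local minima. Finally, simulation experiments and algorithm comparisons verify the effectiveness of the proposed ranking criterion and of HHGA.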

