期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

基于K-D树的多维数据分布方法

王碧霍红卫《计算机工程》2003,29(3):105-107

系统地介绍了基于并行化k-d树的多维数据分布方法。给出了几种构造k-d树的策略和相应算法，并从理论上分析和比较了各种策略的通信花费及其应用范围。相似文献

2.

Parallel H-Tree Based Data Cubing on Graphics Processors

下载免费PDF全文

Baoyuan Wang Yizhou Yu 《International Journal of Software and Informatics》2012,6(1):61-87

Graphics processing units (GPUs) have an SIMD architecture and have been widely used recently as powerful general-purpose co-processors for the CPU. In this paper, we investigate efficient GPU-based data cubing because the most frequent operation in data cube computation is aggregation, which is an expensive operation well suited for SIMD parallel processors. H-tree is a hyper-linked tree structure used in both top-k H-cubing and the stream cube. Fast H-tree construction, update and real-time query response are crucial in many OLAP applications. We design highly efficient GPU-based parallel algorithms for these H-tree based data cube operations. This has been made possible by taking effective methods, such as parallel primitives for segmented data and efficient memory access patterns, to achieve load balance on the GPU while hiding memory access latency. As a result, our GPU algorithms can often achieve more than an order of magnitude speedup when compared with their sequential counterparts on a single CPU. To the best of our knowledge, this is the first attempt to develop parallel data cubing algorithms on graphics processors. 相似文献

3.

Performance analysis of partial-match search algorithms for BD trees

Sivarama P. Dandamudi Paul G. Sorenson 《Software》1988,18(1):83-105

Database systems are becoming increasingly popular for answering queries. Partial-match search queries are an important class of queries in such a system. Several storage structures have been proposed to answer these queries efficiently. The BD tree is an example of such a storage structure. A previous study indicated that the k-d tree performance is better than that of the BD tree for partial-match search queries. A recent paper reported some improved algorithms. However, it is unclear whether the improved algorithms show the BD tree in a favourable light for partial-match search queries. This paper explores the performance of these algorithms and compares their performance to that of the k-d tree. Since the BD tree construction process uses some heuristics to make it a better balanced tree, this paper also evaluates the effect of these heuristics on the partial-match search algorithms. The major conclusions of this study are that the BD tree performance for partial-match search is better than that of the k-d tree when an improved algorithm is used for partial-match search, and only the DZ expression rearrangement heuristic has substantial effect on partial-match search performance. 相似文献

4.

Grid k-d tree approach for point location in polyhedral data sets – application to explicit MPC

Xiaojie Xiu 《International journal of control》2020,93(4):872-880

ABSTRACT

Explicit model predictive control (EMPC) moves the online computational burden of linear model predictive control (MPC) to offline computation by using multi-parametric programming which produces control laws defined over a set of polyhedral regions in the state space. The online computation of EMPC is to find the corresponding control law according to a given state, this is called the point location problem. This paper deals with efficient point location in larger polyhedral data sets. The authors propose a hybrid data structure, grid k-d tree (GKDT), which is constructed by the k-dimensional tree (k-d tree), hash table and binary search tree (BST). The main part of GKDT is a multiple branch tree which constructs subtrees by splitting the polyhedral region into several equal grids based on the k-d tree and is traversed by the hash function on each level. GKDT has a high search efficiency, even though it needs much more storage memory. A complexity analysis of the approach in the runtime and storage requirements is provided. Advantages of the method are supported by two examples in the paper. 相似文献

5.

Multidimensional Binary Search Trees in Database Applications

《IEEE transactions on pattern analysis and machine intelligence》1979,(4):333-340

The multidimensional binary search tree (abbreviated k-d tree) is a data structure for storing multikey records. This structure has been used to solve a number of "geometric" problems in statistics and data analysis. The purposes of this paper are to cast k-d trees in a database framework, to collect the results on k-d trees that have appeared since the structure was introduced, and to show how the basic data structure can be modified to facilitate implementation in large (and very large) databases. 相似文献

6.

Constructing large suffix trees on a computational grid

Chunxi Bertil 《Journal of Parallel and Distributed Computing》2006,66(12):1512-1523

The suffix tree is a key data structure for biological sequence analysis, since it permits efficient solutions to many string-based problems. Constructing large suffix trees is challenging because of high memory overheads and poor memory locality. Even though efficient suffix tree construction algorithms exist, their run-time is still very high for long DNA sequences such as whole human chromosomes. In this paper, we are using a hierarchical grid system as a computational platform in order to reduce this run-time significantly. To achieve an efficient mapping onto this type of architecture we introduce a parallel suffix tree construction algorithm that makes use of a new data structure called the common prefix suffix tree. Using this algorithm together with a dynamic load balancing strategy we show that our distributed grid implementation leads to significant run-time savings. 相似文献

7.

Communication and memory optimal parallel data cube construction

Jin R. Vaidyanathan K. Yang G. Agrawal G. 《Parallel and Distributed Systems, IEEE Transactions on》2005,16(12):1105-1119

Data cube construction is a commonly used operation in data warehouses. Because of the volume of data that is stored and analyzed in a data warehouse and the amount of computation involved in data cube construction, it is natural to consider parallel machines for this operation. This paper addresses a number of algorithmic issues in parallel data cube construction. First, we present an aggregation tree for sequential (and parallel) data cube construction, which has minimally bounded memory requirements. An aggregation tree is parameterized by the ordering of dimensions. We present a parallel algorithm based upon the aggregation tree. We analyze the interprocessor communication volume and construct a closed form expression for it. We prove that the same ordering of the dimensions in the aggregation tree minimizes both the computational and communication requirements. We also describe a method for partitioning the initial array and prove that it minimizes the communication volume. Finally, in the cases when memory may be a bottleneck, we describe how tiling can help scale sequential and parallel data cube construction. Experimental results from implementation of our algorithms on a cluster of workstations show the effectiveness of our algorithms and validate our theoretical results. 相似文献

8.

后缀树的并行构造算法 总被引：1，自引：0，他引：1

葛健王国仁于戈《计算机科学》2004,31(5):96-99

后缀树是一种非常重要的数据结构，它在与字符串处理相关的各种领域里有着非常广泛的应用。构造后缀树是应用后缀树解决问题的前提和关键。虽然很多现有的后缀树构造算法都是线性时间和空间的，但是，当被索引的字符串的长度很长时，构造其后缀树所消耗的时间和空间仍将非常巨大，这极大地限制了后缀树的实际应用。而并行技术是解决这一问题的很好途径，因此人们提出了后缀树的并行构造算法。本文对后缀树的三种并行构造算法进行了综述，通过系统的比较和分析，总结出当前存在的问题，并指明了下一步的研究方向。相似文献

9.

一种基于数据聚类的鲁棒SIFT特征匹配方法 总被引：2，自引：0，他引：2

范志强赵沁平《计算机研究与发展》2012,49(5):1123-1129

针对噪声敏感造成的SIFT特征匹配鲁棒性低问题,提出一种基于数据聚类的两阶段特征匹配方法.在满足特征匹配几何距离最邻近本质要求下扩展了k d数据结构,使其不但能够完成算术平均化匹配特征离线聚类,而且能够实现第1阶段聚类特征在线匹配.在此基础上,给出一种概率最优投票策略选择关键图像进行第2阶段匹配,最后合并两阶段属于关键图像的所有匹配特征对.实验结果表明,对于大量存在重叠关系的图像集合,该方法能够有效减少重复特征数量,降低噪声信息对特征匹配的干扰,极大地提高特征匹配的鲁棒性. 相似文献

10.

Cost-optimal parallel algorithms for the tree bisector and relatedproblems

Biing-Feng Wang Shan-Chyun Ku Keng-Hua Shil 《Parallel and Distributed Systems, IEEE Transactions on》2001,12(9):888-898

An edge is a bisector of a simple path if it contains the middle point of the path. Let T=(V,E) be a tree. Given a source vertex s ∈ V, the single-source tree bisector problem is to find, for every vertex υ ∈ V, a bisector of the simple path from s to υ. The all-pairs tree bisector problem is to find for, every pair of vertices u, υ ∈ V, a bisector of the simple path from u to υ. In this paper, it is first shown that solving the single-source tree bisector problem of a weighted tree has a time lower bound Ω(n log n) in the sequential case. Then, efficient parallel algorithms are proposed on the EREW PRAM for the single-source and all-pairs tree bisector problems. Two O(log n) time single-source algorithms are proposed. One uses O(n) work and is for unweighted trees. The other uses O(n log n) work and is for weighted trees. Previous algorithms for the single-source problem could achieve the same time O(log n) and the same optimal work, O(n) for unweighted trees and O(n log n) for weighted trees, on the CRCW PRAM. The contribution of our single-source algorithms is the improvement from CRCW to EREW. One all-pairs parallel algorithm is proposed. It requires O(log n) time using O(n²) work. All the proposed algorithms are cost-optimal. Efficient tree bisector algorithms have practical applications to several location problems on trees. Using the proposed algorithms, efficient parallel solutions for those problems are also presented 相似文献

11.

基于形状特征k-d树的多维时间序列相似搜索 总被引：2，自引：0，他引：2

黄河史忠植郑征《软件学报》2006,17(10):2048-2056

多维时间序列是信息系统中一类重要的数据对象,相似搜索是其应用的一个核心.两个序列(子序列)相似度加以比较的常用方法是:将序列(子序列)转换成空间中的曲线,然后计算曲线间的欧几里德距离.这种方法的主要缺陷是它仅考虑了序列(子序列)间的整体距离关系,而不能体现它们自身的局部变化.针对此问题,提出了一种新的可应用于多维时间序列的快速相似搜索方法.该方法将序列(子序列)的局部变化特性与检索结构(k-d树)结合起来,使得在搜索k-d树的同时实现了序列(子序列)的局部变化匹配,从而极大地提高了查询效率和正确率.实验结果表明了算法的有效性. 相似文献

12.

EFFICIENT PARALLEL RANGE SEARCHING AND PARTITIONING ALGORITHMS*

《International Journal of Parallel, Emergent and Distributed Systems》2012,27(4):301-317

We present an optimal parallel construction of the range tree data structure and use this construction to solve several geometric partitioning problems. In the range tree, we show how to perform a count-mode orthogonal range query in 0(log n) time by a single processor and a report mode orthogonal range query in 0(log n) time using 0(1 + log n) processors, where k is the number of points inside the query range. We consider partitioning problems of the following nature. Given a planar point set S (∣S∣ = ri) a measure μacting on 5 and a pair of values μ1 and μ2,the task is to find a partition of S into two components S₁ and S₂ (S = S₁U S₂) such that μ(S₁) =μ1 for i=1, 2. We consider several measures like diameter under L∞ and l₁ metric; area, perimeter of the smallest enclosing axes-parallel rectangle; and the side length of the smallest enclosing axes-parallel square. All our parallel algorithms foi partitioning problems run in 0(log n) time using 0(n) processors. Our algorithms are designed for the CREW PRAM model of parallel computation. 相似文献

13.

Extended Branch Decomposition Graphs: Structural Comparison of Scalar Data

Himangshu Saikia Hans‐Peter Seidel Tino Weinkauf 《Computer Graphics Forum》2014,33(3):41-50

We present a method to find repeating topological structures in scalar data sets. More precisely, we compare all subtrees of two merge trees against each other – in an efficient manner exploiting redundancy. This provides pair‐wise distances between the topological structures defined by sub/superlevel sets, which can be exploited in several applications such as finding similar structures in the same data set, assessing periodic behavior in time‐dependent data, and comparing the topology of two different data sets. To do so, we introduce a novel data structure called the extended branch decomposition graph, which is composed of the branch decompositions of all subtrees of the merge tree. Based on dynamic programming, we provide two highly efficient algorithms for computing and comparing extended branch decomposition graphs. Several applications attest to the utility of our method and its robustness against noise. 相似文献

14.

Mining rooted ordered trees under subtree homeomorphism

Mostafa Haghir Chehreghani Maurice Bruynooghe 《Data mining and knowledge discovery》2016,30(5):1249-1272

Mining frequent tree patterns has many applications in different areas such as XML data, bioinformatics and World Wide Web. The crucial step in frequent pattern mining is frequency counting, which involves a matching operator to find occurrences (instances) of a tree pattern in a given collection of trees. A widely used matching operator for tree-structured data is subtree homeomorphism, where an edge in the tree pattern is mapped onto an ancestor-descendant relationship in the given tree. Tree patterns that are frequent under subtree homeomorphism are usually called embedded patterns. In this paper, we present an efficient algorithm for subtree homeomorphism with application to frequent pattern mining. We propose a compact data-structure, called occ, which stores only information about the rightmost paths of occurrences and hence can encode and represent several occurrences of a tree pattern. We then define efficient join operations on the occ data-structure, which help us count occurrences of tree patterns according to occurrences of their proper subtrees. Based on the proposed subtree homeomorphism method, we develop an effective pattern mining algorithm, called TPMiner. We evaluate the efficiency of TPMiner on several real-world and synthetic datasets. Our extensive experiments confirm that TPMiner always outperforms well-known existing algorithms, and in several cases the improvement with respect to existing algorithms is significant. 相似文献

15.

High Performance OLAP and Data Mining on Parallel Computers 总被引：2，自引：0，他引：2

Sanjay Goil Alok Choudhary 《Data mining and knowledge discovery》1997,1(4):391-417

On-Line Analytical Processing (OLAP) techniques are increasingly being used in decision support systems to provide analysis of data. Queries posed on such systems are quite complex and require different views of data. Analytical models need to capture the multidimensionality of the underlying data, a task for which multidimensional databases are well suited. Multidimensional OLAP systems store data in multidimensional arrays on which analytical operations are performed. Knowledge discovery and data mining requires complex operations on the underlying data which can be very expensive in terms of computation time. High performance parallel systems can reduce this analysis time. Precomputed aggregate calculations in a Data Cube can provide efficient query processing for OLAP applications. In this article, we present algorithms for construction of data cubes on distributed-memory parallel computers. Data is loaded from a relational database into a multidimensional array. We present two methods, sort-based and hash-based for loading the base cube and compare their performances. Data cubes are used to perform consolidation queries used in roll-up operations using dimension hierarchies. Finally, we show how data cubes are used for data mining using Attribute Focusing techniques. We present results for these on the IBM-SP2 parallel machine. Results show that our algorithms and techniques for OLAP and data mining on parallel systems are scalable to a large number of processors, providing a high performance platform for such applications. 相似文献

16.

A simple algorithm for nearest neighbor search in high dimensions 总被引：7，自引：0，他引：7

Nene S.A. Nayar S.K. 《IEEE transactions on pattern analysis and machine intelligence》1997,19(9):989-1003

The problem of finding the closest point in high-dimensional spaces is common in pattern recognition. Unfortunately, the complexity of most existing search algorithms, such as k-d tree and R-tree, grows exponentially with dimension, making them impractical for dimensionality above 15. In nearly all applications, the closest point is of interest only if it lies within a user-specified distance ϵ. We present a simple and practical algorithm to efficiently search for the nearest neighbor within Euclidean distance ϵ. The use of projection search combined with a novel data structure dramatically improves performance in high dimensions. A complexity analysis is presented which helps to automatically determine ϵ in structured problems. A comprehensive set of benchmarks clearly shows the superiority of the proposed algorithm for a variety of structured and unstructured search problems. Object recognition is demonstrated as an example application. The simplicity of the algorithm makes it possible to construct an inexpensive hardware search engine which can be 100 times faster than its software equivalent. A C++ implementation of our algorithm is available 相似文献

17.

Evaluating Steiner-tree heuristics and diameter variations for application layer multicast

《Computer Networks》2008,52(15):2872-2893

Latency reduction in distributed interactive applications has been studied intensively. Such applications may have stringent latency requirements and dynamic user groups. We focus on application-layer multicast with a centralized approach to the group management. The groups are organized in overlay networks that are created using graph algorithms that create a tree structure for the group. A tree has no cycles and uses a small routing table, as opposed to a connected overlay mesh.We investigated a group of spanning tree problems that are referred to as Steiner-tree problems, and we have a particular focus on reducing the diameter of a tree, which is the maximum pairwise latency in a tree. In addition, we focus on reducing the time it takes to execute membership changes. In that context, we use core-selection heuristics to find well-placed client nodes, and edge-pruning algorithms to reduce the number of edges in an otherwise fully meshed overlay. Our edge-pruning algorithms strongly connect well-placed client nodes to the remaining group members, to create new and pruned group graphs. Consequently, when a tree algorithm is applied to a pruned group graph, it is manipulated into creating trees with a smaller diameter.We devised new Steiner-tree heuristics that reduced the diameter, and also proposed new edge-pruning algorithms to make the tree construction faster. These heuristics and algorithms were implemented and analyzed experimentally along with several spanning-tree and core-selection heuristics found in the literature. We found that a full-mesh of shortest paths makes it difficult for Steiner-tree heuristics to find better trees than spanning tree algorithms. The network seen from the application layer is in fact a full mesh of shortest paths. In addition, we found that faster Steiner-tree heuristics that do not explicitly optimize the diameter are able to compete with slower heuristics that do optimize it. 相似文献

18.

一个新的数据挖掘模型与算法

孙林宋国杰张培衢高亚萍《计算机应用研究》2001,18(2):43-45,53

提出一个新的数据挖掘结构ＦＰ＿tree及相应的构造算法。基于ＦＰ＿tree之上的挖掘算法及相关的性质与引理也相应给出,最后给出了本算法与以往算法的性能评价。相似文献

19.

用K-D树实现对双模态多媒体数据库的有效查询

何元烈应自炉张有为《计算机工程与应用》2003,39(18):187-189,232

论文提出了一种用k-d树来查询双模态视觉听觉语音识别数据库的方法。这种方法揉合了查询地理信息系统的多维数据库和空间数据库的方法,结合双模态视觉听觉语音数据库自身的特点提出了在数据库中插入、查询和删除记录的算法。最后还对把查询多维数据的方法应用在双模态语音识别数据库领域进行了展望。相似文献

20.

面向高维特征和多分类的分布式梯度提升树

江佳伟符芳诚邵蓥侠崔斌《软件学报》2019,30(3):784-798

梯度提升树算法由于其高准确率和可解释性,被广泛地应用于分类、回归、排序等各类问题.随着数据规模的爆炸式增长,分布式梯度提升树算法成为研究热点.虽然目前已有一系列分布式梯度提升树算法的实现,但是它们在高维特征和多分类任务上性能较差,原因是它们采用的数据并行策略需要传输梯度直方图,而高维特征和多分类情况下梯度直方图的传输成为性能瓶颈.针对这个问题,研究更加适合高维特征和多分类的梯度提升树的并行策略,具有重要的意义和价值.首先比较了数据并行与特征并行策略,从理论上证明特征并行更加适合高维和多分类场景.根据理论分析的结果,提出了一种特征并行的分布式梯度提升树算法FP-GBDT.FP-GBDT设计了一种高效的分布式数据集转置算法,将原本按行切分的数据集转换为按列切分的数据表征;在建立梯度直方图时,FP-GBDT使用一种稀疏感知的方法来加快梯度直方图的建立;在分裂树节点时,FP-GBDT设计了一种比特图压缩的方法来传输数据样本的位置信息,从而减少通信开销.通过详尽的实验,对比了不同并行策略下分布式梯度提升树算法的性能,首先验证了FP-GBDT提出的多种优化方法的有效性;然后比较了FP-GBDT与XGBoost的性能,在多个数据集上验证了FP-GBDT在高维特征和多分类场景下的有效性,取得了最高6倍的性能提升. 相似文献