首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
At present,most k-dominant Skyline query algorithms are oriented to static datasets,this paper proposes a k-dominant Skyline query algorithm for dynamic datasets.The algorithm is recursive circularly.First,we compute the dominant ability of each object and sort objects in descending order by dominant ability.Then,we maintain an inverted index of the dominant index by k-dominant Skyline point calculation algorithm.When the data changes,it is judged whether the update point will afect the k dominant Skyline point set.So the k-dominant Skyline point of the new data set is obtained by inserting and deleting algorithm.The proposed algorithm resolves maintenance isue of a frequently updated database by dynamically updating the data sets.The experimental results show that the query algorithm can efectively improve query eficiency.  相似文献   

2.
Given two spatial point sets R and B in the plane, with cardinalities m and n, respectively, and stored in two separate R-trees, we propose an efficient algorithm to verify whether R and B are linearly separable. The sets R and B are linearly separable if there exists a line that splits the plane into to halfplanes, one containing all R and the other one containing all B. This is the first algorithm that answers the separability question in the context of the spatial data bases. That is, it considers as input big spatial data stored in secondary storage data structures (e.g., the R-tree) which are not allowed to be completely stored in the main memory of the computer to run a classic algorithm. The algorithms designed in this context aim to minimize as much as possible the number of blocks read from the secondary storage data structures to the main memory. Studied problems in this setting are the k-nearest neighbor problem and the spatial range query problem. Our algorithm explicitly exploits the geometric and spatial properties of the R-trees to access only the nodes relevant to decide the linear separability of the given sets. Our experimental results show the efficiency of the algorithm, since it accesses between the 0.34 and 2.79% of the nodes of the R-trees. We also analyze the asymptotic running time of the algorithm, showing that it runs in \(O(m\log m + n\log n)\) time in the worst case.  相似文献   

3.
Efficient On-Line Nonparametric Kernel Density Estimation   总被引:1,自引:0,他引:1  
Nonparametric density estimation has broad applications in computational finance especially in cases where high frequency data are available. However, the technique is often intractable, given the run times necessary to evaluate a density. We present a new and efficient algorithm based on multipole techniques. Given the n kernels that estimate the density, current methods take O(n) time directly to sum the kernels to perform a single density query. In an on-line algorithm where points are continually added to the density, the cumulative O(n 2 ) running time for n queries makes it very costly, if not impractical, to compute the density for large n . Our new Multipole-accelerated On-line Density Estimation (MODE) algorithm is general in that it can be applied to any kernel (in arbitrary dimensions) that admits a Taylor series expansion. The running time for a density query reduces to O (logn) or even constant time, depending on the kernel chosen, and, hence, the cumulative running time is reduced to O (n logn) or O(n) , respectively. Our results show that the MODE algorithm provides dramatic advantages over the direct approach to density evaluation. For example, we show using a modest computing platform that on-line density updates and queries for 1 million points and two dimensions take 8 days to compute using the direct approach versus 40 seconds with the MODE approach. Received June 2, 1997; revised March 3, 1998.  相似文献   

4.
In the context of data base design for nested relational structures, update anomalies can be avoided if the nested scheme forest composed of scheme trees is in nested normal form (NNF) with respect to the associated set of data dependencies. In practice, minimizing the number of normal scheme trees in the nested scheme forest, which is in NNF, will also be an important design goal. This is because the number of computationally expensive join operations that are required in order to answer a given query is related to the number of normal scheme trees that must be used in the query expression.

We prove that the problem of finding a succinct NNF scheme forest is NP-complete for a subclass of the class of split-free sets of multivalued dependencies.  相似文献   

5.
In a traditional database system, the result of a query is a set of values (those values that satisfy the query). In other data servers, such as a system with queries based on image content, or many text retrieval systems, the result of a query is a sorted list. For example, in the case of a system with queries based on image content, the query might ask for objects that are a particular shade of red, and the result of the query would be a sorted list of objects in the database, sorted by how well the color of the object matches that given in the query. A multimedia system must somehow synthesize both types of queries (those whose result is a set and those whose result is a sorted list) in a consistent manner. In this paper we discuss the solution adopted by Garlic, a multimedia information system being developed at the IBM Almaden Research Center. This solution is based on “graded” (or “fuzzy”) sets. Issues of efficient query evaluation in a multimedia system are very different from those in a traditional database system. This is because the multimedia system receives answers to subqueries from various subsystems, which can be accessed only in limited ways. For the important class of queries that are conjunctions of atomic queries (where each atomic query might be evaluated by a different subsystem), the naive algorithm must retrieve a number of elements that is linear in the database size. In contrast, in this paper an algorithm is given, which has been implemented in Garlic, such that if the conjuncts are independent, then with arbitrarily high probability, the total number of elements retrieved in evaluating the query is sublinear in the database size (in the case of two conjuncts, it is of the order of the square root of the database size). It is also shown that for such queries, the algorithm is optimal. The matching upper and lower bounds are robust, in the sense that they hold under almost any reasonable rule (including the standard min rule of fuzzy logic) for evaluating the conjunction. Finally, we find a query that is provably hard, in the sense that the naive linear algorithm is essentially optimal.  相似文献   

6.
Buffer queries   总被引:2,自引:0,他引:2  
A class of commonly asked queries in a spatial database is known as buffer queries. An example of such a query is to "find house-power line pairs that are within 50 meters of each other." A buffer query involves two spatial data sets and a distance d. The answer to this query are pairs of objects, one from each input set, that are within distance d of each other. Given nonpoint spatial objects, evaluation of buffer queries could be a costly operation, even when the numbers of objects in the input data sets are relatively small. This paper addresses the problem of how to evaluate this class of queries efficiently. A fundamental problem with buffer query evaluation is to find an efficient algorithm for solving the minimum distance (miniDist) problem for lines and regions. An efficient minDist algorithm, which only requires a subsequence of segments from each object to be examined, is derived. Finding a fast minDist algorithm is the first step in evaluating a buffer query efficiently. It is observed that many, and sometimes even most, candidates can be proven in the answer without resorting to the relatively expensive minDist operation. A candidate is first evaluated with a least expensive technique-called O-object filtering. If it fails, a more costly operation, called 1-object filtering, is applied. Finally, if both filterings fail, the most expensive minDist algorithm is invoked. To show the effectiveness of the these techniques, they are incorporated into the well-known tree join algorithm and tested with real-life as well as artificial data sets. Extensive experiments show that the proposed algorithm outperforms existing techniques by a wide margin in both execution time as well as IO accesses. More importantly, the performance gain improves drastically with the increase of distance values.  相似文献   

7.
A graph G is a circular-arc graph if it is the intersection graph of a set of arcs on a circle. That is, there is one arc for each vertex of G, and two vertices are adjacent in G if and only if the corresponding arcs intersect. We give a linear-time algorithm for recognizing this class of graphs. When G is a member of the class, the algorithm gives a certificate in the form of a set of arcs that realize it.  相似文献   

8.
An effective and efficient texture analysis method, based on a new criterion for designing Gabor filter sets, is proposed. The commonly used filter sets are usually designed for optimal signal representation. We propose here an alternative criterion for designing the filter set. We consider a set of filters and its response to pairs of harmonic signals. Two signals are considered separable if the corresponding two sets of vector responses are disjoint in at least one of the components. We propose an algorithm for deriving the set of Gabor filters that maximizes the fraction of separable harmonic signal pairs in a given frequency range. The resulting filters differ significantly from the traditional ones. We test these maximal harmonic discrimination (MHD) filters in several texture analysis tasks: clustering, recognition, and edge detection. It turns out that the proposed filters perform much better than the traditional ones in these tasks. They can achieve performance similar to that of state-of-the-art, distribution based (texton) methods, while being simpler and more computationally efficient.  相似文献   

9.
李鸣鹏  高宏  邹兆年 《软件学报》2014,25(4):797-812
研究了基于图压缩的k可达查询处理,提出了一种支持k可达查询的图压缩算法k-RPC及无需解压缩的查询处理算法,k-RPC算法在所有基于等价类的支持k-reach查询的图压缩算法中是最优的.由于k-RPC算法是基于严格的等价关系,因此进一步又提出了线性时间的近似图压缩算法k-GRPC.k-GRPC算法允许从原始图中删除部分边,然后使用k-RPC获得更好的压缩比.提出了线性时间的无需解压缩的查询处理算法.真实数据上的实验结果表明,对于稀疏的原始图,两种压缩算法的压缩比分别可以达到45%,对于稠密的原始图,两种压缩算法的压缩比分别可以达到75%和67%;与在原始图上直接进行查询处理相比,两种基于压缩图的查询处理算法效率更好,在稀疏图上的查询效率可以提高2.5倍.  相似文献   

10.
XMin: Minimizing Tree Pattern Queries with Minimality Guarantee   总被引:1,自引:0,他引:1  
Due to wide use of XPath, the problem of efficiently processing XPath queries has recently received a lot of attention. In particular, a considerable effort has been devoted to minimizing XPath queries since the efficiency of query processing greatly depends on the size of the query. Research work in this area can be classified into two categories: constraint-independent minimization and constraint-dependent minimization. The former minimizes queries in the absence of integrity constraints while the latter in the presence of them. For a linear path query, which is an XPath query without branching predicates, existing constraint-independent minimization methods are generally known to be unable to minimize the query without processing the query itself. Most recently, however, by using the DataGuide, a representative structural summary of XML data, a constraint-independent method that minimizes linear path queries in a top-down fashion has been proposed. Nevertheless, this method can fail to find a minimal query since it minimizes a query by merely erasing labels from the original query whereas a minimal query could include labels that are not present in the original query. In this paper, we propose a bottom-up approach called XMin that guarantees finding a minimal query for a given tree pattern query by using the DataGuide without processing the query itself. For the linear path query, we first show that the sequence of labels occurring in the minimal query is a subsequence of every schema label sequence that matches the original query. Here, the schema label sequence for a node is the sequence of labels from the root of XML data to the node. We then propose iterative subsequence generation that iteratively generates subsequences from the shortest schema label sequence matching the original query in a bottom-up fashion and tests query equivalence. Using iterative subsequence generation, we can always find a minimal query and we formally prove this guarantee. We also propose an extended algorithm that guarantees the minimality for the tree pattern query, which is a linear path query with branching predicates. These methods have been prototyped in a full-fledged object-relational DBMS. The experimental results using real and synthetic data sets show the practicality of our method.  相似文献   

11.
Approximate query processing using wavelets   总被引:7,自引:0,他引:7  
Approximate query processing has emerged as a cost-effective approach for dealing with the huge data volumes and stringent response-time requirements of today's decision support systems (DSS). Most work in this area, however, has so far been limited in its query processing scope, typically focusing on specific forms of aggregate queries. Furthermore, conventional approaches based on sampling or histograms appear to be inherently limited when it comes to approximating the results of complex queries over high-dimensional DSS data sets. In this paper, we propose the use of multi-dimensional wavelets as an effective tool for general-purpose approximate query processing in modern, high-dimensional applications. Our approach is based on building wavelet-coefficient synopses of the data and using these synopses to provide approximate answers to queries. We develop novel query processing algorithms that operate directly on the wavelet-coefficient synopses of relational tables, allowing us to process arbitrarily complex queries entirely in the wavelet-coefficient domain. This guarantees extremely fast response times since our approximate query execution engine can do the bulk of its processing over compact sets of wavelet coefficients, essentially postponing the expansion into relational tuples until the end-result of the query. We also propose a novel wavelet decomposition algorithm that can build these synopses in an I/O-efficient manner. Finally, we conduct an extensive experimental study with synthetic as well as real-life data sets to determine the effectiveness of our wavelet-based approach compared to sampling and histograms. Our results demonstrate that our techniques: (1) provide approximate answers of better quality than either sampling or histograms; (2) offer query execution-time speedups of more than two orders of magnitude; and (3) guarantee extremely fast synopsis construction times that scale linearly with the size of the data. Received: 7 August 2000 / Accepted: 1 April 2001 Published online: 7 June 2001  相似文献   

12.
移动对象反向最近邻查询技术研究   总被引:2,自引:0,他引:2       下载免费PDF全文
提出一种基于自调节网格索引的反向最近邻查询(RNNQ)算法,将空间划分为大小相等的网格单元,每个单元作为一个桶存储移动对象,采用基于桶内对象数目和网格几何特征的剪枝策略减少反向最近邻查询所需访问的节点。查询点周围单元桶内对象过多时进行二次网格划分,减小节点访问代价。实验结果表明,该算法具有良好的查询性能,优于基于TPR树索引的RNNQ算法。  相似文献   

13.
Few studies have addressed the problem of extracting data from a limited deep web database. We apply formal concept analysis to this problem and propose a novel algorithm called EdaliwdbFCA. Before a query Y is sent, the algorithm analyzes the local formal context K L , which consists of the latest extracted data, and predicts the size of the query results according to the cardinality of the extent X of the formal concept (X,Y) derived from K L . Thus, it can be determined in advance if Y is a query or not. Candidate query concepts are dynamically generated from the lower cover of the current concept (X,Y). Therefore, this method avoids building of concrete concept lattices during extraction. Moreover, two pruning rules are adopted to reduce redundant queries. Experiments on controlled data sets and real applications were performed. The results confirm that the algorithm theories are correct and it can be effectively applied in the real world.  相似文献   

14.
范明  李连友 《软件学报》1994,5(10):44-49
计数算法是最著名的SL递归处理算法之一.对于一类称作计数线性的递归,它具有良好的性能.然而,计数算法要求查询的约束集为单值的,因此很难用作子目标的处理策略.本文提供一种新的广义查询计数算法,它能处理任意的多值约束集.从而本文提供的算法不仅能够用于查询的求值,而且也能用于子目标的处理.算法的正确性和变换后的规则的有效性也在本文简略讨论.  相似文献   

15.
This paper presents a fast adaptive iterative algorithm to solve linearly separable classification problems in R n.In each iteration,a subset of the sampling data (n-points,where n is the number of features) is adaptively chosen and a hyperplane is constructed such that it separates the chosen n-points at a margin and best classifies the remaining points.The classification problem is formulated and the details of the algorithm are presented.Further,the algorithm is extended to solving quadratically separable classification problems.The basic idea is based on mapping the physical space to another larger one where the problem becomes linearly separable.Numerical illustrations show that few iteration steps are sufficient for convergence when classes are linearly separable.For nonlinearly separable data,given a specified maximum number of iteration steps,the algorithm returns the best hyperplane that minimizes the number of misclassified points occurring through these steps.Comparisons with other machine learning algorithms on practical and benchmark datasets are also presented,showing the performance of the proposed algorithm.  相似文献   

16.
The paper presents the sequential and the parallel algorithm for solving the nearest-neighbor problem in the plane, based on the generalized Voronoi diagram construction. The applications of the problem are found in the areas of networking, communications, distributed systems, computer modeling and information retrieval. The input for the problem is the set of circular sites S with varying radii, the query point p and the metric (Minkowski or power) according to which the site, neighboring the query point, is to be reported. The sequential algorithm takes O(n) time to build the data structure and O(log n) time for each query. The parallel algorithm requires O(log n log) preprocessing time and O(log) query time on EREW PRAM architecture with n/log n processors. The IDG/NNM software was developed for an experimental study of the problem. The experimental results demonstrate that the Voronoi diagram method outperforms the kd tree method for all tested input configurations. The tests were conducted on large data sets, comprising thousands of generators.  相似文献   

17.
近年来,随着XML数据的爆炸式增长,对XML关键字查询技术的研究日益受到关注。数据编码是关键字查询的基础,目前主要有2种方式--基于路径的编码及区间编码。区间编码可更好地适应对查询中的XML数据进行动态的更新,因而具有更多的优势。本文研究基于区间编码的关键字查询问题,提出一种新的查询算法。该算法首先根据预留的区间值建立索引,再根据最小范围值对索引进行选择遍历,减少了不必要的比较,达到了提高查询效率的目的。研究发现,预留空间的选择对查询效率有一定的影响。为此,本文设计一种基于节点自身进行区间预留的编码方式(Interval Reservation Based on Node, IRBN),为节点设置权值,并根据权值进行区间值的设定,形成根据节点自身分配区间的较为均衡的编码。实验表明,IRBN编码是合理的,有较高的查询效率。  相似文献   

18.
基于LazyDFA的XPath在XML数据流上查询优化算法   总被引:2,自引:0,他引:2  
针对XML数据流上XPath查询处理及查询优化问题,给出了一种基于lazyDFA技术的解决方案,并提出了优化算法。共享NFA状态表,通过将NFA中的状态分成共享和独享两个状态集来降低lazyDFA的内存使用量;建立状态转移表优化算法通过在lazyDFA状态结构中增加一个状态转移表,来提高lazyDFA的查询速度。实验结果表明,提出的方法能够在执行效率和空间代价方面优于传统算法。  相似文献   

19.
A collection of sets may have some interesting properties which help identify efficient algorithms for constraint satisfaction problems and combinatorial auction problems. One of the properties is tree convexity. A collection S of sets is tree convex if we can find a tree T whose nodes are the union of the sets of S and each set of S is the nodes of a subtree of T . This concept extends that of row convex sets each of which is an interval over a total ordering of the elements of the union of these sets. An interesting problem is to find efficient algorithms to test whether a collection of sets is tree convex. It is not known before if there exists a linear time algorithm for this test. In this paper, we review the materials that are the key to a linear algorithm: hypergraphs, a characterization of tree convex sets and the acyclic hypergraph test algorithm. Some typos in the original paper of the acyclicity test are corrected here. Some experiments show that the linear algorithm is significantly faster than a well‐known existing algorithm.  相似文献   

20.
Recently, several techniques have been proposed to protect the user location privacy for location-based services in the Euclidean space. Applying these techniques directly to the road network environment would lead to privacy leakage and inefficient query processing. In this paper, we propose a new location anonymization algorithm that is designed specifically for the road network environment. Our algorithm relies on the commonly used concept of spatial cloaking, where a user location is cloaked into a set of connected road segments of a minimum total length L{\cal L} including at least K{\cal K} users. Our algorithm is “query-aware” as it takes into account the query execution cost at a database server and the query quality, i.e., the number of objects returned to users by the database server, during the location anonymization process. In particular, we develop a new cost function that balances between the query execution cost and the query quality. Then, we introduce two versions of our algorithm, namely, pure greedy and randomized greedy, that aim to minimize the developed cost function and satisfy the user specified privacy requirements. To accommodate intervals with a high workload, we introduce a shared execution paradigm that boosts the scalability of our location anonymization algorithm and the database server to support large numbers of queries received in a short time period. Extensive experimental results show that our algorithms are more efficient and scalable than the state-of-the-art technique, in terms of both query execution cost and query quality. The results also show that our algorithms have very strong resilience to two privacy attacks, namely, the replay attack and the center-of-cloaked-area attack.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号