20 similar documents found (search time: 15 ms)
1.
Efficient Multi-Subspace Skyline Query Processing Algorithm (total citations: 1; self-citations: 0; citations by others: 1)
Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2016, (5): 623-634
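As skyline queries are used in more applications, subspace skyline queries have become a research hotspot. Motivated by the practical need of users to examine a dataset from multiple perspectives, this paper studies the multi-subspace skyline query problem in depth. After analyzing why existing subspace skyline algorithms fall short on this problem, it proposes the subspace skycube group (SSG) structure and, based on it, the MSSC (multiple subspace skycube) algorithm, which computes skyline queries over any number of subspaces simultaneously. The algorithm uses subspace candidate sets (SCS) and fully exploits the sharing relationships among the skyline results of the subspaces in the SSG structure; on this basis, it applies sum filtering and maximum-value filtering to prune the dataset, further improving efficiency. Finally, experiments on synthetic and real data, with comparisons against existing algorithms, show that MSSC solves the multi-subspace skyline query problem efficiently.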
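The idea of sharing results between subspaces can be illustrated with a small sketch. This is not MSSC: it is a plain sort-filter skyline per subspace plus one well-known sharing rule, assuming smaller values are better and that no two points share a value on any dimension (the "distinct value" assumption). Under that assumption the skyline of a subspace is a subset of the skyline of any superspace, so a computed superspace skyline can serve as the candidate set for its subsets. The names `subspace_skyline` and `multi_subspace_skyline` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]

def dominates(p: Point, q: Point, dims: Sequence[int]) -> bool:
    """True if p dominates q on `dims`, assuming smaller values are better."""
    return all(p[d] <= q[d] for d in dims) and any(p[d] < q[d] for d in dims)

def subspace_skyline(data: List[Point], dims: Sequence[int]) -> List[Point]:
    """Sort-filter skyline: only a point with a strictly smaller subspace sum can
    dominate another, so each point is checked against the current window only."""
    ordered = sorted(data, key=lambda p: sum(p[d] for d in dims))
    window: List[Point] = []
    for p in ordered:
        if not any(dominates(s, p, dims) for s in window):
            window.append(p)
    return window

def multi_subspace_skyline(data: List[Point], subspaces) -> Dict[Tuple[int, ...], List[Point]]:
    """Answer several subspace skylines, seeding each one with the smallest
    already-computed superspace skyline (valid only under distinct values)."""
    results: Dict[Tuple[int, ...], List[Point]] = {}
    # Larger subspaces first so that their results can seed their subsets.
    for dims in sorted({tuple(sorted(s)) for s in subspaces}, key=len, reverse=True):
        candidates = data
        for done, sky in results.items():
            if set(dims) <= set(done) and len(sky) < len(candidates):
                candidates = sky
        results[dims] = subspace_skyline(candidates, dims)
    return results
```

For example, with `data = [(1, 5, 3), (2, 4, 6), (3, 1, 2), (4, 3, 1), (5, 2, 4)]`, the full-space skyline is computed once and the skylines of `(0, 1)`, `(1, 2)` and `(0,)` are then filtered from its four points instead of from all five.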
2.
3.
Pareto-optimal objects are favored because each such object has at least one competitive edge over every other object, i.e., it is “not dominated”. Recently, in the database literature, skyline queries have gained attention as an effective way to identify such Pareto-optimal objects. In particular, this paper studies Pareto-optimal objects from the perspective of facility or business locations. More specifically, given data points P and query points Q in two-dimensional space, our goal is to retrieve data points that are farther from at least one query point than all the other data points. Such queries are helpful in identifying spatial locations far away from undesirable locations, e.g., unpleasant facilities or business competitors. To solve this problem, we first study a baseline algorithm, TFSS, and propose an efficient progressive algorithm, BBFS, which significantly outperforms TFSS by exploiting spatial locality. We also develop an efficient approximation algorithm that trades accuracy for efficiency. We validate our proposed algorithms through extensive evaluations over synthetic and real datasets.
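A brute-force version of this query helps pin down the dominance rule. The sketch below is not TFSS or BBFS; it assumes that a data point p' dominates p when p' is at least as far as p from every query point and strictly farther from at least one, computes Euclidean distance vectors, and keeps the non-dominated points in O(|P|² · |Q|) time. The name `far_skyline` is hypothetical.

```python
import math
from typing import List, Tuple

Pt = Tuple[float, float]

def far_skyline(P: List[Pt], Q: List[Pt]) -> List[Pt]:
    """Brute-force 'farther is better' spatial skyline (illustrative baseline only)."""
    # Distance vector of each data point to all query points.
    vec = {p: tuple(math.dist(p, q) for q in Q) for p in P}

    def dominates(a: Pt, b: Pt) -> bool:
        # a is at least as far from every query point and strictly farther from one.
        va, vb = vec[a], vec[b]
        return all(x >= y for x, y in zip(va, vb)) and any(x > y for x, y in zip(va, vb))

    return [p for p in P if not any(dominates(o, p) for o in P if o != p)]
```

With a single query point the result is simply the farthest data point(s); with Q = [(0, 0), (10, 0)], the point (5, 0) is dropped because (0, 5) is as far from (0, 0) and farther from (10, 0).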
4.
Graphs are widely used for modeling complicated data such as social networks, chemical compounds, protein interactions and the Semantic Web. To effectively understand and utilize any collection of graphs, a graph database that efficiently supports elementary querying mechanisms is crucially required. For example, subgraph and supergraph queries are important types of graph queries with many practical applications. A primary challenge in computing the answers to graph queries is that pairwise comparisons of graphs are usually hard problems. Relational database management systems (RDBMSs) have repeatedly been shown to be able to efficiently host different types of data such as complex objects and XML data. RDBMSs derive much of their performance from sophisticated optimizer components that exploit physical properties specific to the relational model, such as sortedness, proper join ordering and powerful indexing mechanisms. In this article, we study the problem of indexing and querying graph databases using the relational infrastructure. We present a purely relational framework for processing graph queries. This framework relies on building a layer of graph features knowledge that captures metadata and summary features of the underlying graph database. We describe different querying mechanisms that use this layer of graph features knowledge to achieve scalable performance for processing graph queries. Finally, we conduct an extensive set of experiments on real and synthetic datasets to demonstrate the efficiency and scalability of our techniques.
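To show what a relational "feature layer" can look like, here is a filter step for subgraph queries written against SQLite from the Python standard library. The schema, the choice of labeled-edge counts as the only feature, and all function names are hypothetical assumptions, not the article's framework. A data graph can contain the query graph only if it has at least as many edges of every label pair, so the SQL returns candidates that would still need an exact subgraph-isomorphism check.

```python
import sqlite3
from collections import Counter

def edge_features(nodes, edges):
    """Count label pairs of undirected edges, e.g. 'C-O' (labels sorted so direction does not matter)."""
    return Counter("-".join(sorted((nodes[u], nodes[v]))) for u, v in edges)

def build_feature_table(conn, graphs):
    # Hypothetical schema: one row per (graph, feature) with its occurrence count.
    conn.execute("CREATE TABLE features (graph_id INTEGER, feature TEXT, cnt INTEGER)")
    conn.execute("CREATE INDEX idx_feature ON features (feature, cnt)")
    for gid, (nodes, edges) in graphs.items():
        conn.executemany("INSERT INTO features VALUES (?, ?, ?)",
                         [(gid, f, c) for f, c in edge_features(nodes, edges).items()])

def subgraph_candidates(conn, nodes, edges):
    """Graphs whose feature counts cover the query's; exact verification is still required."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS qf (feature TEXT, cnt INTEGER)")
    conn.execute("DELETE FROM qf")
    conn.executemany("INSERT INTO qf VALUES (?, ?)", list(edge_features(nodes, edges).items()))
    rows = conn.execute("""
        SELECT f.graph_id FROM features AS f
        JOIN qf ON f.feature = qf.feature AND f.cnt >= qf.cnt
        GROUP BY f.graph_id
        HAVING COUNT(*) = (SELECT COUNT(*) FROM qf)
        ORDER BY f.graph_id""").fetchall()
    return [r[0] for r in rows]
```

Because the filter is an ordinary join plus aggregation, the RDBMS optimizer and the `(feature, cnt)` index do the pruning work, which is the kind of benefit the abstract attributes to the relational infrastructure.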
5.
As stream data are collected and analyzed more frequently, stream processing systems face more design challenges. One challenge is performing continuous window aggregation, which involves intensive computation. When there are a large number of aggregation queries, the system may suffer from scalability problems. The queries are usually similar and differ only in their window specifications. In this paper, we propose collaborative aggregation, which promotes aggregate sharing among windows so that repeated aggregate operations can be avoided. Unlike previous approaches, in which aggregate sharing is restricted by the window pace, we generalize aggregation over multiple values as a series of reductions, so that the results generated by each reduction step can be shared. The sharing process is formalized in the feed semantics, and we present the compose-and-declare framework to determine the data sharing logic at a very low cost. Experimental results show that our approach offers an order-of-magnitude performance improvement over the state of the art and has a small memory footprint.
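The notion of "aggregation as a series of shareable reductions" can be sketched with a sparse table, a swapped-in textbook technique rather than the paper's compose-and-declare framework. Level j stores the reduction of every run of 2^j consecutive values and is built from level j-1, and every window query, whatever its size, combines two entries of one level. This only works for idempotent operators such as `max` and `min`, because the two blocks may overlap; the function names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

def build_levels(values: Sequence[float], op: Callable) -> List[List[float]]:
    """levels[j][i] reduces values[i : i + 2**j]; each level is built from the previous one."""
    levels = [list(values)]
    width = 1
    while 2 * width <= len(values):
        prev = levels[-1]
        levels.append([op(prev[i], prev[i + width]) for i in range(len(prev) - width)])
        width *= 2
    return levels

def window_reduce(levels, op, lo: int, hi: int):
    """Reduce values[lo:hi] from two overlapping power-of-two blocks (op must be idempotent)."""
    j = (hi - lo).bit_length() - 1
    return op(levels[j][lo], levels[j][hi - (1 << j)])

def sliding_aggregates(values, sizes, op=max) -> Dict[int, list]:
    """All window sizes read the same shared partial reductions."""
    levels = build_levels(values, op)
    return {w: [window_reduce(levels, op, end - w, end) for end in range(w, len(values) + 1)]
            for w in sizes}
```

For instance, `sliding_aggregates([3, 1, 4, 1, 5, 9, 2, 6], [2, 3])` answers both the size-2 and the size-3 sliding maxima from one set of levels instead of rescanning each window.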
6.
An Efficient HBase-Based Spatial Keyword Query Strategy (total citations: 2; self-citations: 0; citations by others: 2)
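With the development of mobile positioning technology and the spread of smartphones, the number of spatio-textual objects on the Internet is growing rapidly, and efficient spatial keyword querying over such a massive and continuously growing collection has become a key concern for many spatial keyword query applications. Existing methods usually process spatial keyword queries with hybrid index structures that combine R-trees and inverted indexes; however, faced with a huge and ever-growing number of spatio-textual objects, these methods often cannot keep spatial keyword queries efficient and scalable. To address this, we propose SK-HBase, an HBase-based index structure for spatio-textual data. SK-HBase uses HBase as its data store and, through an effective data allocation strategy, indexes the spatial and textual information of spatio-textual objects at the same time. On top of SK-HBase, this paper proposes two spatial keyword query algorithms that keep spatial keyword queries efficient and scalable across different spatial ranges. Experiments show that our method performs efficient spatial keyword queries over massive data and scales well.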
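One common way to index spatial and textual information together in a sorted key-value store such as HBase is to design the row key so that one range scan serves both. The sketch below assumes a hypothetical key layout, `keyword#<Morton code of grid cell>#<object id>`, and simulates HBase's sorted row keys with a Python list and `bisect`; it is not SK-HBase's actual allocation strategy. The Morton codes of a rectangle's lower-left and upper-right cells bound the codes of all cells inside it, so a single scan followed by a containment check answers a keyword-plus-rectangle query.

```python
import bisect

def morton(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of non-negative grid coordinates x and y (Z-order)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i) | ((y >> i) & 1) << (2 * i + 1)
    return code

class SpatialKeywordTable:
    """Hypothetical in-memory stand-in for an HBase table whose row keys stay sorted."""

    def __init__(self):
        self.keys = []   # sorted row keys, as a region server would keep them
        self.rows = {}   # row key -> (object id, cell x, cell y)

    def put(self, obj_id: str, cx: int, cy: int, keywords):
        # Assumed key layout: keyword#<8 hex digits of Morton code>#object id ('#' not used in keywords).
        for kw in keywords:
            key = f"{kw}#{morton(cx, cy):08x}#{obj_id}"
            bisect.insort(self.keys, key)
            self.rows[key] = (obj_id, cx, cy)

    def query(self, kw: str, x0: int, y0: int, x1: int, y1: int):
        # One range scan between the Z-codes of the rectangle's corner cells.
        lo = bisect.bisect_left(self.keys, f"{kw}#{morton(x0, y0):08x}")
        hi = bisect.bisect_right(self.keys, f"{kw}#{morton(x1, y1):08x}#\uffff")
        hits = set()
        for key in self.keys[lo:hi]:
            obj_id, cx, cy = self.rows[key]
            if x0 <= cx <= x1 and y0 <= cy <= y1:  # drop Z-order false positives
                hits.add(obj_id)
        return hits
```

The scan range can include cells outside the rectangle (for example cell (3, 0) falls between the codes of (0, 0) and (1, 3)), which is why the final containment check is needed.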
7.
TwigStar: A Fast Algorithm for Processing XML Twig Queries Containing the Wildcard * (total citations: 1; self-citations: 0; citations by others: 1)
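An XML twig query can be represented as a query tree with labeled nodes; it supports structural or content queries with complex predicates over XML documents. Holistic twig query algorithms are widely recognized as the core algorithms of XML query processing, and many holistic twig query algorithms have been proposed. However, all existing algorithms apply only to twig queries that do not contain the wildcard *. When a twig query does contain *, a simple and direct approach is to read every element node of the queried document into memory, treat all of them as elements matching *, and then process the query with an existing algorithm. This approach is clearly unreasonable, because it adds a large amount of I/O overhead. This paper therefore proposes a query processing algorithm that efficiently supports the wildcard *. By building an index, it handles queries containing * well and avoids unnecessary I/O. Finally, experiments show that the algorithm clearly outperforms existing algorithms.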
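A small sketch shows why a wildcard step need not force reading every element. Assuming the common region encoding (start, end, level) for elements, which is an illustrative choice and not necessarily TwigStar's index, the pattern `a/*/c` holds exactly when an `a` element contains a `c` element whose level is two greater: in a tree, such a `c` has exactly one intermediate element, which is the `*` match. So only the `a` and `c` streams are read. The names `region_index` and `match_wildcard_child` are hypothetical, and the nested-loop join stands in for the stack-based joins a real holistic algorithm would use.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def region_index(xml_text: str):
    """Map each tag to the (start, end, level) region labels of its elements."""
    index = defaultdict(list)
    counter = 0

    def visit(elem, level):
        nonlocal counter
        start = counter
        counter += 1
        for child in elem:
            visit(child, level + 1)
        end = counter
        counter += 1
        index[elem.tag].append((start, end, level))

    visit(ET.fromstring(xml_text), 0)
    return index

def match_wildcard_child(index, anc_tag: str, desc_tag: str, stars: int = 1):
    """Answer anc/*/.../desc with `stars` wildcard steps using only two tag streams."""
    return [(a, d) for a in index[anc_tag] for d in index[desc_tag]
            if a[0] < d[0] and d[1] < a[1] and d[2] == a[2] + stars + 1]
```

On `<r><a><b><c/></b><d><c/></d></a><a><c/></a></r>`, `a/*/c` has two matches (through `b` and `d`), and the same function with `stars=0` answers the plain child step `a/c` with one match.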
8.
Continuous Skyline Queries for Moving Objects (total citations: 3; self-citations: 0; citations by others: 3)
Zhiyong Huang, Hua Lu, Beng Chin Ooi, A. K. H. Tung. IEEE Transactions on Knowledge and Data Engineering, 2006, 18(12): 1645-1658
The literature on skyline algorithms has so far dealt mainly with queries of static query points over static data sets. With the increasing number of mobile service applications and users, however, the need for continuous skyline query processing has become more pressing. A continuous skyline query involves not only static dimensions but also a dynamic one. In this paper, we examine the spatiotemporal coherence of the problem and propose a continuous skyline query processing strategy for moving query points. First, we distinguish the data points that are permanently in the skyline and use them to derive a search bound. Second, we investigate the connection between the spatial positions of data points and their dominance relationship, which indicates where to find changes in the skyline and how to maintain the skyline continuously. Based on this analysis, we propose a kinetic-based data structure and an efficient skyline query processing algorithm. We concisely analyze the space and time costs of the proposed method and conduct an extensive experiment to evaluate it. To the best of our knowledge, this is the first work on continuous skyline query processing.
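The "permanently in the skyline" observation can be checked with a naive recomputation, not the paper's kinetic data structure. In this sketch each data point has a location and static attributes, the dynamic dimension is the Euclidean distance to the moving query point, and smaller is better everywhere. Assuming no two points share the same static vector, a point in the skyline of the static attributes alone can never be dominated, wherever the query point moves. The point names and values are hypothetical.

```python
import math

def dominates(p, q):
    """p dominates q: no worse in every dimension and better in at least one (smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline_at(points, qpos):
    """Recompute the skyline for one query position; points: id -> ((x, y), static attributes)."""
    vec = {pid: (math.dist(xy, qpos),) + tuple(static) for pid, (xy, static) in points.items()}
    return {pid for pid, v in vec.items()
            if not any(dominates(w, v) for o, w in vec.items() if o != pid)}

def static_skyline(points):
    """Points undominated on static attributes alone; with distinct static vectors they stay in every skyline."""
    return {pid for pid, (_, s) in points.items()
            if not any(dominates(t, s) for o, (_, t) in points.items() if o != pid)}
```

With four hypothetical hotels priced 50, 30, 80 and 60, the cheapest one is in the skyline at every query position, while the others enter and leave as the query point moves; a continuous algorithm only needs to track those changes.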
9.
10.
Research on Continuous Dynamic Skyline Queries over Data Streams (total citations: 2; self-citations: 0; citations by others: 2)
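Skyline queries compute the optimal points that satisfy multiple criteria over large datasets. Skyline computation over data streams is one of the most basic query operations on streams and is very important for many online applications, especially in mobile computing environments, network monitoring, communication networks and sensor networks. Unlike most traditional skyline research, this work mainly studies constrained skyline and dynamic skyline computation over data streams. Tuples are stored in a grid index, and the GBDS algorithm is proposed to compute and maintain the dynamic skyline. By defining an influence region for each query, the number of tuples that must be processed when tuples arrive and expire is minimized. Theoretical analysis and experimental results demonstrate the effectiveness of the proposed methods.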
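A dynamic skyline is usually defined relative to a query point q: each tuple p is mapped to (|p1 - q1|, ..., |pd - qd|) and the ordinary skyline is taken over the mapped tuples. The sketch below applies that definition to a count-based sliding window and simply recomputes on every arrival. It does not reproduce GBDS, its grid index, or its influence regions, and `stream_dynamic_skyline` is a hypothetical name.

```python
from collections import deque

def dynamic_skyline(window, q):
    """Skyline of the tuples after mapping each one to its per-dimension distance from q."""
    mapped = [tuple(abs(a - b) for a, b in zip(p, q)) for p in window]

    def dom(u, v):
        return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))

    # dom(u, u) is False, so a tuple never eliminates itself.
    return [p for p, u in zip(window, mapped) if not any(dom(v, u) for v in mapped)]

def stream_dynamic_skyline(stream, q, window_size):
    """Yield the dynamic skyline after each arrival; the oldest tuple expires automatically."""
    window = deque(maxlen=window_size)
    for p in stream:
        window.append(p)
        yield dynamic_skyline(window, q)
```

For q = (5, 5) and a window of two tuples, the stream (5, 9), (6, 6), (1, 1), (4, 5) produces the skylines [(5, 9)], [(5, 9), (6, 6)], [(6, 6)] and [(4, 5)]. An incremental method would avoid the full recomputation by touching only tuples affected by each arrival or expiry.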
11.
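Given a set of multidimensional points, a skyline query returns the points that are not dominated by any other point, where one point dominates another if it is no worse in every dimension and strictly better in at least one. Currently, for static data in a centralized environment, BBS (branch-and-bound skyline) is the most efficient skyline query algorithm; however, it consumes a large amount of memory. To address this, a skyline query algorithm based on best-first nearest-neighbor search, called IBBS (improved branch-and-bound skyline), is proposed. It has optimal I/O cost and low CPU overhead, as well as the smallest memory consumption. Its core is a series of effective pruning strategies that discard all unnecessary records. Extensive experiments confirm that IBBS outperforms BBS, especially in low-dimensional spaces.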
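The best-first principle behind BBS can be shown without an R-tree. In this simplified sketch the heap holds individual points keyed by their coordinate sum (BBS keys R-tree entries by the "mindist" of their bounding boxes instead), assuming smaller values are better. A dominating point always has a strictly smaller sum, so when a point is popped every point that could dominate it has already been examined, and a point is in the skyline exactly when no skyline point found so far dominates it. This is neither BBS nor IBBS, and the name `best_first_skyline` is hypothetical.

```python
import heapq

def best_first_skyline(points):
    """Pop points in increasing coordinate-sum order; keep those no found skyline point dominates."""
    heap = [(sum(p), p) for p in points]
    heapq.heapify(heap)
    sky = []
    while heap:
        _, p = heapq.heappop(heap)
        # s dominates p iff s <= p in every dimension and s != p.
        if not any(all(a <= b for a, b in zip(s, p)) and s != p for s in sky):
            sky.append(p)
    return sky
```

For the points (1, 9), (2, 3), (4, 1), (3, 4), (5, 5), (9, 0), the skyline comes out in pop order as (2, 3), (4, 1), (9, 0), (1, 9), while (3, 4) and (5, 5) are discarded as soon as they are popped. With a real R-tree, whole subtrees whose corners are dominated are discarded the same way without reading their points.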
12.
In this paper, we consider skyline queries in a mobile and distributed environment, where data objects are distributed in some sites (database servers) which are interconnected through a high-speed wired network, and queries are issued by mobile units (laptop, cell phone, etc.) which access the data objects of database servers by wireless channels. The inherent properties of mobile computing environment such as mobility, limited wireless bandwidth, frequent disconnection, make skyline queries more complicat...
13.
Efficient processing of continual range queries is important in providing location-aware mobile services. In this paper, we study a new main memory-based approach to indexing continual range queries to support location-aware mobile services. The query index is used to quickly answer the following question continually: “Which moving objects are currently located inside the boundaries of individual queries?” We present a covering tile-based (COVET) query index. A set of virtual tiles are predefined, each with a unique ID. One or more of the virtual tiles are used to strictly cover the region defined by an individual range query. The query ID is inserted into the ID lists associated with the covering tiles. These covering tiles touch each other only at the edges. A COVET index maintains a mapping between a covering tile and all the queries that contain that tile. For any object position, search is conducted indirectly via the covering tiles. More importantly, a COVET-based query index allows query evaluation to take advantage of incremental changes in object locations. Computation can be saved for those objects that have not moved outside the boundaries of covering tiles. Simulations are conducted to evaluate the effectiveness of the COVET index and compare virtual tiles of different shapes and sizes.
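The tile-to-query mapping and the incremental saving described above can be sketched as follows. To keep "strict cover" exact, the sketch assumes square tiles of one fixed size and query rectangles whose edges lie on tile boundaries (half-open [x0, x1) × [y0, y1)); COVET itself allows tiles of different shapes and sizes. The class and method names are hypothetical.

```python
from collections import defaultdict

class CoveringTileIndex:
    """Hypothetical sketch of a tile-based query index with grid-aligned query rectangles."""

    def __init__(self, tile: float):
        self.tile = tile
        self.queries_of_tile = defaultdict(set)   # tile id -> queries that contain the tile
        self.tile_of_object = {}                  # object id -> current tile id
        self.results = defaultdict(set)           # query id -> objects currently inside

    def _tile(self, x, y):
        return (int(x // self.tile), int(y // self.tile))

    def add_query(self, qid, x0, y0, x1, y1):
        """Register [x0, x1) x [y0, y1); its edges are assumed to lie on tile boundaries."""
        covered = {(i, j)
                   for i in range(int(x0 // self.tile), int(x1 // self.tile))
                   for j in range(int(y0 // self.tile), int(y1 // self.tile))}
        for t in covered:
            self.queries_of_tile[t].add(qid)
        self.results[qid] |= {o for o, t in self.tile_of_object.items() if t in covered}

    def move(self, oid, x, y):
        new = self._tile(x, y)
        old = self.tile_of_object.get(oid)
        if new == old:
            return  # still in the same covering tile: no query result can change
        for qid in self.queries_of_tile.get(old, ()):
            self.results[qid].discard(oid)
        for qid in self.queries_of_tile.get(new, ()):
            self.results[qid].add(oid)
        self.tile_of_object[oid] = new
```

Each position update costs one tile lookup, and an object that moves within its tile triggers no work at all, which mirrors the incremental benefit the abstract describes.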
14.
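Top-k queries are popular because of their wide range of applications. Two generation rules commonly considered in uncertain databases are independence and mutual exclusion: an x-tuple consists of a set of mutually exclusive tuples, which are called the alternatives of that x-tuple. The U-kRanks query considers the probability that each alternative of an x-tuple ranks in the top k, and returns the k tuples most likely to rank in the top k. None of the existing top-k semantics treat an x-tuple as a whole. Therefore, a new top-k query semantics, the uncertain x-kRanks query (U-x-kRanks), is defined; it returns the k x-tuples, rather than tuples, that are most likely to rank in the top k. The new semantics considers the probability that each alternative of an x-tuple is in the top k and aggregates these probabilities to obtain the probability that the x-tuple as a whole is in the top k. An efficient dynamic-programming algorithm is proposed to process U-x-kRanks queries, completing query processing within a minimal search space. Comprehensive experiments on different datasets show that the proposed algorithm is efficient.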
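One way to compute the aggregated probability described above is sketched here; it is our reading of the semantics, not the paper's algorithm. Assume independent x-tuples, distinct scores, and each x-tuple given as a list of (score, probability) alternatives whose probabilities sum to at most 1. An alternative t ranks in the top k when it appears and fewer than k other x-tuples contribute a higher-scored tuple; each other x-tuple does so with the summed probability of its higher-scored alternatives, and a truncated Poisson-binomial dynamic program gives the probability that fewer than k do. Because alternatives of one x-tuple are exclusive, the x-tuple's top-k probability is the sum over its alternatives. The names are hypothetical.

```python
def topk_prob_of_alternative(score, prob, xid, xtuples, k):
    """P(alternative with this score exists and ranks in the top k)."""
    # Probability that each other (independent) x-tuple places a higher-scored tuple.
    ps = [sum(p for s, p in alts if s > score) for x, alts in xtuples.items() if x != xid]
    # dist[c] = P(exactly c of those x-tuples outrank it), tracked only for c < k.
    dist = [1.0] + [0.0] * (k - 1)
    for p in ps:
        for c in range(k - 1, -1, -1):
            dist[c] = dist[c] * (1 - p) + (dist[c - 1] * p if c > 0 else 0.0)
    return prob * sum(dist)

def u_x_kranks(xtuples, k):
    """Top-k probability per x-tuple (sum over its exclusive alternatives) and the k best x-tuples."""
    score = {x: sum(topk_prob_of_alternative(s, p, x, xtuples, k) for s, p in alts)
             for x, alts in xtuples.items()}
    return sorted(score, key=score.get, reverse=True)[:k], score
```

For example, with A = [(10, 0.5), (4, 0.5)], B = [(8, 0.9)] and C = [(6, 0.6)], the top-1 probabilities are about 0.52, 0.45 and 0.03, so A is returned; they sum to 1 because A always contributes some tuple.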
15.
Ke Yi, Feifei Li, George Kollios, Divesh Srivastava. IEEE Transactions on Knowledge and Data Engineering, 2008, 20(12): 1669-1682
This work introduces new algorithms for processing top-k queries in uncertain databases, under the generally adopted model of x-relations. An x-relation consists of a number of x-tuples, and each x-tuple randomly instantiates into one tuple from one or more alternatives. Soliman et al. [soliman07] first introduced the problem of top-k query processing in uncertain databases and proposed various algorithms to answer such queries. Under the x-relation model, our new results significantly improve the state of the art in terms of both running time and memory usage. In the single-alternative case, our new algorithms are 2 to 3 orders of magnitude faster than the previous algorithms. In the multi-alternative case, the improvement is even more dramatic: while the previous algorithms have exponential complexity in both time and space, our algorithms run in near-linear or low polynomial time. Our study covers both types of top-k queries proposed in [soliman07]. We provide both a theoretical analysis and an extensive experimental evaluation to demonstrate the superiority of the new approaches over existing solutions.
16.
Massive XML data are increasingly generated for the representation, storage and exchange of web information, and twig query processing over massive XML data has become a research focus. However, most traditional algorithms cannot be directly implemented in a distributed manner. Some existing distributed algorithms generate many useless intermediate results and, in most cases, execute many join operations over partial results; others require a priori knowledge of the query pattern before XML partitioning, storage and query processing, which is impractical for large-scale data or frequently arriving new queries. To improve efficiency and scalability, in this paper we propose DisT3, a 3-phase distributed algorithm based on a node distribution mechanism that avoids unnecessary intermediate results. Furthermore, we propose ReP, a lightweight local index, together with an enhanced XML partitioning approach that works with an arbitrary partitioning strategy, and based on ReP we propose DisT2ReP, an improved 2-phase distributed algorithm that further reduces the communication cost. After analyzing the performance guarantees, we conduct extensive experiments to verify the efficiency and scalability of the proposed algorithms in distributed twig query applications.
17.
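As data volumes keep growing, helping users locate a manageable amount of representative information in large datasets becomes increasingly important. A top-k skyline query can find the k most representative items in a dataset, obtaining representative information while bounding the result size, and thus meets this requirement; however, existing top-k skyline queries are inefficient on large datasets and are not suited to them. To solve this problem, this paper combines top-k skyline queries with parallel processing and proposes PTKS (parallel Top-k Skyline), a parallel top-k skyline query algorithm for large datasets. PTKS makes full use of distributed resources to parallelize the original query effectively, and it also uses filtering rules based on user preferences to reduce the result size and meet user needs. Experiments on real datasets, with comparisons against existing methods, show that PTKS has a clear efficiency advantage on large datasets and is well suited to them.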
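The partition-then-merge pattern that parallel skyline algorithms commonly rely on is sketched below: each partition's local skyline is computed independently, and the global skyline equals the skyline of the union of the local skylines. Two assumptions are labeled here and in the code. First, the ranking rule (how many data points a skyline point dominates) is a hypothetical stand-in for PTKS's user-preference filtering rules. Second, Python threads only show the structure; because of the GIL they add little CPU parallelism, and a real deployment would send partitions to distributed workers.

```python
from concurrent.futures import ThreadPoolExecutor

def dominates(p, q):
    """p dominates q: no worse everywhere (smaller is better) and not identical."""
    return all(a <= b for a, b in zip(p, q)) and p != q

def local_skyline(part):
    return [p for p in part if not any(dominates(o, p) for o in part)]

def parallel_topk_skyline(data, k, workers=4):
    # Round-robin partitioning; each partition is processed independently.
    parts = [data[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        candidates = [p for sky in pool.map(local_skyline, parts) for p in sky]
    # Merge step: the skyline of the local skylines is the global skyline.
    sky = local_skyline(candidates)
    # Hypothetical ranking rule: number of data points each skyline point dominates.
    score = {p: sum(dominates(p, q) for q in data) for p in sky}
    return sorted(sky, key=lambda p: (-score[p], p))[:k]
```

On the points (1, 9), (2, 3), (4, 1), (3, 4), (5, 5), (9, 0), (6, 6) with k = 2, the skyline has four points, and the two that dominate the most other points, (2, 3) and (4, 1), are returned.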
18.
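Querying and Updating Probabilistic Skyline Sets for Uncertain Moving Objects (total citations: 1; self-citations: 0; citations by others: 1)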
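Research on skyline queries has extended from traditional static skyline operations to skyline queries and computation over dynamic and uncertain datasets. This paper studies continuous probabilistic skyline computation in a mobile environment where the query point is fixed and the target points are moving and have uncertain locations. In this setting, the distance between each moving object and the query object changes continuously over time. Because moving objects cannot be located precisely while in motion, the dominance relationship between them can only be expressed in probabilistic form, and it also changes over time. The paper defines the dominance probability between moving objects and the skyline probability of a moving object, and defines trigger events that record the moments at which an object's dominance probability changes, enabling continuous tracking and dynamic updating of the probabilistic skyline. It proposes U-ECPS (event triggered continuous probabilistic Skyline query for uncertain moving object), an event-triggered continuous probabilistic skyline query algorithm that continuously queries and updates the skyline set in mobile environments. Extensive experimental results verify the effectiveness of U-ECPS.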
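A discrete version of the skyline probability can be computed at a single time instant, as sketched below; the event-triggered maintenance of U-ECPS is not shown. Assumptions, labeled here and in the code: each object's uncertain start position is a set of equally likely samples, its velocity is known, objects are independent, and each object's dimensions are its distance to the fixed query point plus one certain static cost, with smaller values better. The skyline probability of A averages, over A's own instances, the probability that no other object has an instance dominating it; conditioning on A's instance keeps the product over other objects exact under independence.

```python
import math

def skyline_probability(objects, q, t):
    """objects: id -> (equally likely start samples, velocity, static cost); q is the fixed query point."""
    def vec(sample, vel, cost):
        # Hypothetical model: linear motion, dimensions = (distance to q at time t, cost).
        x, y = sample[0] + vel[0] * t, sample[1] + vel[1] * t
        return (math.dist((x, y), q), cost)

    def dom(u, v):
        return all(a <= b for a, b in zip(u, v)) and u != v

    inst = {oid: [vec(s, vel, c) for s in samples] for oid, (samples, vel, c) in objects.items()}
    result = {}
    for oid, mine in inst.items():
        total = 0.0
        for a in mine:                       # condition on this instance of the object
            p = 1.0
            for other, theirs in inst.items():
                if other != oid:             # independent objects: multiply survival probabilities
                    p *= 1 - sum(dom(b, a) for b in theirs) / len(theirs)
            total += p
        result[oid] = total / len(mine)
    return result
```

In the usage below, two objects with equal cost each have skyline probability 0.5 at t = 0; once B has moved away from the query point at t = 2, A's skyline probability rises to 1 and B's drops to 0. Detecting the instants at which such probabilities change is what the abstract's trigger events are for.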
19.
A critical problem for managers of temporal information is the treatment of assertions and of complex types of queries because in many cases the treatment could involve reasoning on the whole knowledge base of temporal constraints. We propose an efficient approach to this problem. First, we show how different types of queries can be answered (in a complete way) in a time polynomial in the dimension of the query and independently of the dimension of the knowledge base. Second, we provide an efficient (and complete) procedure to deal with sessions of interleaved assertions and queries to the knowledge base. We provide both analytical and experimental evaluations of our approach, and we discuss some application areas.
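A familiar way to make query cost independent of the knowledge base size is to compile the constraints once and then answer queries by lookup. The sketch assumes the knowledge base is a Simple Temporal Problem, i.e., constraints of the form lo ≤ x_j - x_i ≤ hi. This is an illustrative formalism and not necessarily the paper's. Floyd-Warshall shortest paths yield the minimal network (or detect inconsistency through a negative cycle), after which the tightest bounds on any x_j - x_i are read off in constant time. Handling interleaved assertions incrementally, which the paper addresses, is not shown.

```python
INF = float("inf")

def minimal_network(n, constraints):
    """constraints: (i, j, lo, hi) meaning lo <= x_j - x_i <= hi. Returns a distance matrix, or None if inconsistent."""
    d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for i, j, lo, hi in constraints:
        d[i][j] = min(d[i][j], hi)    # x_j - x_i <= hi
        d[j][i] = min(d[j][i], -lo)   # x_i - x_j <= -lo
    for k in range(n):                # Floyd-Warshall: tighten every bound through k
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    if any(d[i][i] < 0 for i in range(n)):
        return None                   # negative cycle: the constraints are inconsistent
    return d

def feasible_range(d, i, j):
    """Tightest bounds on x_j - x_i implied by the whole knowledge base (O(1) lookup)."""
    return (-d[j][i], d[i][j])
```

For example, from 10 ≤ x1 - x0 ≤ 20 and 30 ≤ x2 - x1 ≤ 40, the compiled network answers that x2 - x0 lies in [40, 60]; adding the assertion x2 - x0 ≤ 30 makes the network inconsistent.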
20.
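For continuous multi-range query processing, a multithreading-based processing framework is proposed that combines multi-core multithreading with large-memory technology by keeping moving objects and queries in main memory. On multi-core processor platforms, the framework uses multiple threads to periodically process updates to queries and moving objects, and periodically computes the results of the multi-range queries. A multithreaded continuous multi-range query processing algorithm based on uniform partitioning of the moving-object data is proposed; the algorithm is built on a grid index over the queries, and the construction approach and update algorithm of this index are given. Since in-memory algorithms are affected by cache access performance, a space-filling-curve-based storage optimization for moving objects is proposed. Experiments show that multithreaded processing on multi-core platforms handles continuous multi-range queries efficiently, and that the storage optimization raises the cache hit rate during execution, further improving performance.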
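One evaluation period of such a framework might look like the following. Assumptions, labeled here and in the code: non-negative integer coordinates, closed query rectangles, a fixed grid cell size, and a Z-order (Morton) sort standing in for whichever space-filling curve the paper uses. Objects are sorted by the Morton code of their cell so that spatially close objects are processed together, split into equal chunks, and each thread probes the query grid for its chunk before the partial results are merged. Python threads illustrate the structure only; because of the GIL they do not deliver multi-core speedups for this pure-Python loop.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def morton(x: int, y: int, bits: int = 16) -> int:
    """Z-order code of non-negative grid coordinates."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i) | ((y >> i) & 1) << (2 * i + 1)
    return code

def build_query_grid(queries, cell):
    """Grid index over queries: cell -> ids of queries whose closed rectangle overlaps the cell."""
    grid = defaultdict(list)
    for qid, (x0, y0, x1, y1) in queries.items():
        for i in range(x0 // cell, x1 // cell + 1):
            for j in range(y0 // cell, y1 // cell + 1):
                grid[(i, j)].append(qid)
    return grid

def evaluate_period(objects, queries, cell=16, threads=4):
    grid = build_query_grid(queries, cell)
    # Space-filling-curve order: neighbouring objects end up next to each other.
    ordered = sorted(objects.items(), key=lambda kv: morton(kv[1][0] // cell, kv[1][1] // cell))
    chunk = -(-len(ordered) // threads) or 1  # ceiling division: uniform partition
    parts = [ordered[i:i + chunk] for i in range(0, len(ordered), chunk)]

    def work(part):
        local = defaultdict(set)
        for oid, (x, y) in part:
            for qid in grid.get((x // cell, y // cell), ()):
                x0, y0, x1, y1 = queries[qid]
                if x0 <= x <= x1 and y0 <= y <= y1:  # exact containment check
                    local[qid].add(oid)
        return local

    results = defaultdict(set)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for local in pool.map(work, parts):
            for qid, oids in local.items():
                results[qid] |= oids
    return dict(results)
```

Calling `evaluate_period` once per period on the latest object positions reproduces the periodic recomputation the abstract describes; the grid index itself would only be rebuilt or updated when queries change.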