1.

Spatial query processing for fuzzy objects

Kai Zheng Xiaofang Zhou Pui Cheong Fung Kexin Xie 《The VLDB Journal The International Journal on Very Large Data Bases》2012,21(5):729-751

Range and nearest neighbor queries are the most common types of spatial queries, and they have been investigated extensively in recent decades due to their broad range of applications. In this paper, we study this problem in the context of fuzzy objects that have indeterministic boundaries. Fuzzy objects play an important role in many areas, such as biomedical image databases and GIS communities. Existing research on fuzzy objects mainly focuses on modeling basic fuzzy object types and operations, leaving the processing of more advanced queries largely untouched. In this paper, we propose two new kinds of spatial queries for fuzzy objects, namely single threshold query and continuous threshold query, to determine the query results which qualify at a certain probability threshold and within a probability interval, respectively. For efficient single threshold query processing, we optimize the classical R-tree-based search algorithm by deriving more accurate approximations for the distance function between fuzzy objects and the query object. To enhance the performance of continuous threshold queries, effective pruning rules are developed to reduce the search space and speed up the candidate refinement process. The efficiency of our proposed algorithms as well as the optimization techniques is verified with an extensive set of experiments using both synthetic and real datasets.

2.

The partial sequenced route query with traveling rules in road networks (total citations: 1; self-citations: 0; citations by others: 1)

Haiquan Chen Wei-Shinn Ku Min-Te Sun Roger Zimmermann 《GeoInformatica》2011,15(3):541-569

In modern geographic information systems, route search represents an important class of queries. In route search related applications, users may want to define a number of traveling rules (traveling preferences) when they plan their trips. However, these traveling rules are not considered in most existing techniques. In this paper, we propose a novel spatial query type, the multi-rule partial sequenced route (MRPSR) query, which enables efficient trip planning with user defined traveling rules. The MRPSR query provides a unified framework that subsumes the well-known trip planning query (TPQ) and the optimal sequenced route (OSR) query. The difficulty in answering MRPSR queries lies in how to integrate multiple choices of points-of-interest (POI) with traveling rules when searching for satisfying routes. We prove that the MRPSR query is NP-hard and then provide three algorithms by mapping traveling rules to an activity on vertex network. Afterwards, we extend all the proposed algorithms to road networks. By utilizing both real and synthetic POI datasets, we investigate the performance of our algorithms. The results of extensive simulations show that our algorithms are able to answer MRPSR queries effectively and efficiently with underlying road networks. Compared to the Light Optimal Route Discoverer (LORD) based brute-force solution, the response time of our algorithms is significantly reduced while the distances of the computed routes are only slightly longer than the shortest route.

3.

QQL: A DB&IR Query Language

Ingo Schmitt 《The VLDB Journal The International Journal on Very Large Data Bases》2008,17(1):39-56

Traditional database query languages are based on set theory and crisp first order logic. However, many applications require retrieval-like queries which return result objects associated with a degree of being relevant to the query. Historically, retrieval systems estimate relevance by exploiting hidden object semantics whereas query processing in database systems relies on matching select-conditions with attribute values. Thus, different mechanisms were developed for database and information retrieval systems. In consequence, there is a lack of support for queries involving both retrieval and database search terms. In this work, we introduce the quantum query language (QQL). Its underlying unifying theory is based on the mathematical formalism of quantum mechanics and quantum logic. Van Rijsbergen already discussed the strong relation between the formalism of quantum mechanics and information retrieval. In this work, we interrelate concepts from database query processing to concepts from quantum mechanics and logic. As a result, we obtain a common theory which allows us to seamlessly incorporate retrieval search into traditional database query processing.

4.

Probabilistic inverse ranking queries in uncertain databases

Xiang Lian Lei Chen 《The VLDB Journal The International Journal on Very Large Data Bases》2011,20(1):107-127

Query processing in uncertain databases has become increasingly important due to the wide existence of uncertain data in many real applications. Unlike the handling of precise data, uncertain query processing needs to consider the data uncertainty and answer queries with confidence guarantees. In this paper, we formulate and tackle an important query, namely the probabilistic inverse ranking (PIR) query, which retrieves possible ranks of a given query object in an uncertain database with confidence above a probability threshold. We present effective pruning methods to reduce the PIR search space, which can be seamlessly integrated into an efficient query procedure. Moreover, we tackle the problem of PIR query processing in high dimensional spaces, which reduces high dimensional uncertain data to a lower dimensional space. Furthermore, we study three interesting and useful aggregate PIR queries, that is, MAX, top-m, and AVG PIRs. We also study an important query type, PIR with uncertain query object (namely UQ-PIR), and design specific rules to facilitate the pruning. Extensive experiments have demonstrated the efficiency and effectiveness of our proposed approaches over both real and synthetic data sets, under various experimental settings.

5.

Efficient Processing of Metric Skyline Queries

Lei Chen Xiang Lian 《Knowledge and Data Engineering, IEEE Transactions on》2009,21(3):351-365

The skyline query is of great importance in many applications, such as multi-criteria decision making and business planning. In particular, a skyline point is a data object in the database whose attribute vector is not dominated by that of any other object. Previous methods to retrieve skyline points usually assume static data objects in the database (i.e. their attribute vectors are fixed), whereas several recent works focus on skyline queries with dynamic attributes. In this paper, we propose a novel variant of skyline queries, namely metric skyline, whose dynamic attributes are defined in the metric space (i.e. not limited to the Euclidean space). We illustrate an efficient and effective pruning mechanism to answer metric skyline queries through a metric index. Most importantly, we formalize the query performance of the metric skyline query in terms of the pruning power, by a cost model, in light of which we construct an optimized metric index aiming to maximize the pruning power of metric skyline queries. Extensive experiments have demonstrated the efficiency and effectiveness of our proposed pruning techniques as well as the constructed index in answering metric skyline queries.
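The dominance test behind a skyline can be illustrated with a naive quadratic scan. This is only a sketch under stated assumptions, not the index-based pruning method of the paper: the names `dominates` and `metric_skyline` are hypothetical, smaller distances are assumed to be better, and each object's dynamic attribute vector is assumed to be its list of distances to a set of query objects under a user-supplied metric `dist`.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every dimension and strictly better in one.

    Assumption: smaller values are better (distances to query objects).
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def metric_skyline(objects: List[T], queries: List[T],
                   dist: Callable[[T, T], float]) -> List[T]:
    """Naive O(n^2) skyline over dynamic attributes defined by a metric."""
    # Dynamic attribute vector of each object: its distances to the query objects.
    vectors = [[dist(o, q) for q in queries] for o in objects]
    skyline = []
    for i, obj in enumerate(objects):
        # Keep the object only if no other object dominates it.
        if not any(dominates(vectors[j], vectors[i])
                   for j in range(len(objects)) if j != i):
            skyline.append(obj)
    return skyline


# Toy example: points on a line, absolute difference as the metric.
points = [2, 5, 12, -3]
result = metric_skyline(points, queries=[0, 10], dist=lambda a, b: abs(a - b))
```

In this toy run the point -3 is dropped because the point 2 is closer to both query objects, while 2, 5, and 12 are mutually incomparable and remain in the skyline.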

6.

Probabilistic nearest neighbor query processing on distributed uncertain data

Daichi Amagata Yuya Sasaki Takahiro Hara Shojiro Nishio 《Distributed and Parallel Databases》2016,34(2):259-287

A nearest neighbor (NN) query, which returns the most similar object to a user-specified query object, plays an important role in a wide range of applications and hence has received considerable attention. In many such applications, e.g., sensor data collection and location-based services, objects are inherently uncertain. Furthermore, due to the ever increasing generation of massive datasets, the importance of distributed databases, which deal with such data objects, has been growing. One emerging challenge is to efficiently process probabilistic NN queries over distributed uncertain databases. The straightforward approach, that each local site forwards its own database to the central server, is communication-expensive, so we have to minimize communication cost for the NN object retrieval. In this paper, we focus on two important queries, namely top-k probable NN queries and probabilistic star queries, and propose efficient algorithms to process them over distributed uncertain databases. Extensive experiments on both real and synthetic data have demonstrated that our algorithms significantly reduce communication cost.

7.

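A group nearest neighbor query method for existentially uncertain objects

Chen Mo (陈默) Jia Zixi (贾子熙) Gu Yu (谷峪) Yu Ge (于戈) 《小型微型计算机系统》 (Journal of Chinese Computer Systems) 2012,33(4):684-687

The group nearest neighbor query is an important class of query over spatial objects; it finds the spatial object closest to a given set of query points. Owing to factors such as limited image resolution, existential uncertainty of spatial objects is widespread in query applications that involve image processing. The existential uncertainty of these objects' location data affects the results of group nearest neighbor queries. This paper defines the probabilistic threshold group nearest neighbor query over existentially uncertain objects, designs an efficient query processing mechanism, improves the efficiency of probabilistic threshold group nearest neighbor queries through pruning optimizations and other means, and further proposes an efficient probabilistic threshold group nearest neighbor query algorithm. The algorithm is validated experimentally on several real datasets, and the results show that it achieves good query efficiency.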

8.

Optimal-Location-Selection Query Processing in Spatial Databases

Gao Yunjun Zheng Baihua Chen Gencai Li Qing 《Knowledge and Data Engineering, IEEE Transactions on》2009,21(8):1162-1177

This paper introduces and solves a novel type of spatial query, namely, Optimal-Location-Selection (OLS) search, which has many applications in real life. Given a data object set D_A, a target object set D_B, a spatial region R, and a critical distance d_c in a multidimensional space, an OLS query retrieves those target objects in D_B that are outside R but have maximal optimality. Here, the optimality of a target object b in D_B located outside R is defined as the number of the data objects from D_A that are inside R and meanwhile have their distances to b not exceeding d_c. When there is a tie, the accumulated distance from the data objects to b serves as the tie breaker, and the one with smaller distance has the better optimality. In this paper, we present the optimality metric, formalize the OLS query, and propose several algorithms for processing OLS queries efficiently. A comprehensive experimental evaluation has been conducted using both real and synthetic data sets to demonstrate the efficiency and effectiveness of the proposed algorithms.
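The optimality definition above can be made concrete with a brute-force scan. This is a minimal sketch under stated assumptions, not one of the algorithms proposed in the paper: the function names are hypothetical, the space is 2D, the region R is assumed to be an axis-aligned rectangle, distances are Euclidean, and the tie-breaking accumulated distance is assumed to be summed over the data objects that count toward the optimality.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), an assumed region shape


def inside(p: Point, r: Rect) -> bool:
    """True if point p lies in the closed rectangle r."""
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


def ols_query(d_a: List[Point], d_b: List[Point],
              region: Rect, d_c: float) -> Optional[Point]:
    """Brute-force OLS: best target object outside the region, or None."""
    inner = [a for a in d_a if inside(a, region)]  # data objects inside R
    best, best_key = None, None
    for b in d_b:
        if inside(b, region):
            continue  # only target objects outside R qualify
        near = [d for d in (math.dist(a, b) for a in inner) if d <= d_c]
        # Higher count wins; on a tie, the smaller accumulated distance wins.
        key = (-len(near), sum(near))
        if best_key is None or key < best_key:
            best, best_key = b, key
    return best


region = (0.0, 0.0, 10.0, 10.0)
data_objects = [(1.0, 1.0), (2.0, 2.0), (9.0, 9.0), (20.0, 20.0)]
targets = [(-1.0, 2.0), (11.0, 9.0), (5.0, 5.0)]
best_target = ols_query(data_objects, targets, region, d_c=3.5)
```

Here the target (5, 5) is skipped because it lies inside R, (11, 9) covers one data object, and (-1, 2) covers two, so (-1, 2) is returned.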

9.

gStore: a graph-based SPARQL query engine

Lei Zou M. Tamer Özsu Lei Chen Xuchuan Shen Ruizhe Huang Dongyan Zhao 《The VLDB Journal The International Journal on Very Large Data Bases》2014,23(4):565-590

We address efficient processing of SPARQL queries over RDF datasets. The proposed techniques, incorporated into the gStore system, handle, in a uniform and scalable manner, SPARQL queries with wildcards and aggregate operators over dynamic RDF datasets. Our approach is graph based. We store RDF data as a large graph and also represent a SPARQL query as a query graph. Thus, the query answering problem is converted into a subgraph matching problem. To achieve efficient and scalable query processing, we develop an index, together with effective pruning rules and efficient search algorithms. We propose techniques that use this infrastructure to answer aggregation queries. We also propose an effective maintenance algorithm to handle online updates over RDF repositories. Extensive experiments confirm the efficiency and effectiveness of our solutions.

10.

Visible Reverse k-Nearest Neighbor Query Processing in Spatial Databases

Gao Yunjun Zheng Baihua Chen Gencai Lee Wang-Chien Lee Ken C. K. Li Qing 《Knowledge and Data Engineering, IEEE Transactions on》2009,21(9):1314-1327

Reverse nearest neighbor (RNN) queries have a broad application base such as decision support, profile-based marketing, resource allocation, etc. Previous work on RNN search does not take obstacles into consideration. In the real world, however, there are many physical obstacles (e.g., buildings) and their presence may affect the visibility between objects. In this paper, we introduce a novel variant of RNN queries, namely, visible reverse nearest neighbor (VRNN) search, which considers the impact of obstacles on the visibility of objects. Given a data set P, an obstacle set O, and a query point q in a 2D space, a VRNN query retrieves the points in P that have q as their visible nearest neighbor. We propose an efficient algorithm for VRNN query processing, assuming that P and O are indexed by R-trees. Our techniques do not require any preprocessing and employ half-plane property and visibility check to prune the search space. In addition, we extend our solution to several variations of VRNN queries, including: 1) visible reverse k-nearest neighbor (VRkNN) search, which finds the points in P that have q as one of their k visible nearest neighbors; 2) delta-VRkNN search, which handles VRkNN retrieval with the maximum visible distance delta constraint; and 3) constrained VRkNN (CVRkNN) search, which tackles the VRkNN query with region constraint. Extensive experiments on both real and synthetic data sets have been conducted to demonstrate the efficiency and effectiveness of our proposed algorithms under various experimental settings.

11.

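Research on the fuzzy group nearest neighbor problem in road network environments

Chen Shu (陈舒) 《计算机应用研究》 (Application Research of Computers) 2016,33(2)

To address the problem that traditional group nearest neighbor queries in road networks cannot support users' uncertain searches, a "fuzzy" factor is introduced on top of the group nearest neighbor query to describe the uncertainty of user queries, and four different algorithms are proposed. The naive global search algorithm exploits the properties of Dijkstra's algorithm to handle the uncertainty; the multidimensional vector algorithm and the V-Tree algorithm optimize further on this basis by shrinking the search space; and the approximate algorithm, proposed last, further improves query efficiency at the cost of some accuracy. Extensive experiments on real road network datasets summarize the strengths of the different algorithms and verify the soundness and practicality of each.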

12.

Reverse Nearest Neighbors Search in Ad Hoc Subspaces (total citations: 1; self-citations: 0; citations by others: 1)

Man Lung Yiu Nikos Mamoulis 《Knowledge and Data Engineering, IEEE Transactions on》2007,19(3):412-426

Given an object q, modeled by a multidimensional point, a reverse nearest neighbors (RNN) query returns the set of objects in the database that have q as their nearest neighbor. In this paper, we study an interesting generalization of the RNN query, where not all dimensions are considered, but only an ad hoc subset thereof. The rationale is that 1) the dimensionality might be too high for the result of a regular RNN query to be useful, 2) missing values may implicitly define a meaningful subspace for RNN retrieval, and 3) analysts may be interested in the query results only for a set of (ad hoc) problem dimensions (i.e., object attributes). We consider a suitable storage scheme and develop appropriate algorithms for projected RNN queries, without relying on multidimensional indexes. Given the significant cost difference between random and sequential data accesses, our algorithms are based on applying sequential accesses only on the projected atomic values of the data at each dimension, to progressively derive a set of RNN candidates. Whether these candidates are actual RNN results is then validated via an optimized refinement step. In addition, we study variants of the projected RNN problem, including RkNN search, bichromatic RNN, and RNN retrieval for the case where sequential accesses are not possible. Our methods are experimentally evaluated with real and synthetic data.
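The projected RNN definition can be stated as a brute-force check. This is a sketch for illustration only, not the sequential-access algorithm of the paper: the name `projected_rnn` is hypothetical, Euclidean distance over the chosen dimensions is assumed, and ties are assumed to count in favor of q.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def projected_rnn(data: List[Vector], q: Vector, dims: Sequence[int]) -> List[Vector]:
    """Return the points whose nearest neighbor, restricted to dims, is q."""

    def dist(a: Vector, b: Vector) -> float:
        # Euclidean distance computed only over the ad hoc subset of dimensions.
        return math.sqrt(sum((a[i] - b[i]) ** 2 for i in dims))

    result = []
    for i, p in enumerate(data):
        d_q = dist(p, q)
        # p is a reverse nearest neighbor of q if no other data point is closer to p.
        if all(d_q <= dist(p, other) for j, other in enumerate(data) if j != i):
            result.append(p)
    return result


data = [(0.0, 0.0, 5.0), (1.0, 0.0, 9.0), (10.0, 0.0, 0.0)]
query = (9.0, 0.0, 100.0)
rnn_dim0 = projected_rnn(data, query, dims=[0])
rnn_dim2 = projected_rnn(data, query, dims=[2])
```

Projected onto dimension 0, only (10, 0, 0) has the query as its nearest neighbor; projected onto dimension 2, the query is far from every point and the result is empty, which shows how the answer depends on the chosen subspace.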

13.

Batch Nearest Neighbor Search for Video Retrieval (total citations: 2; self-citations: 0; citations by others: 2)

Jie Shao Zi Huang Heng Tao Shen Xiaofang Zhou Ee-Peng Lim Yijun Li 《Multimedia, IEEE Transactions on》2008,10(3):409-420

To retrieve videos similar to a query clip from a large database, each video is often represented by a sequence of high-dimensional feature vectors. Typically, given a query video containing m feature vectors, an independent nearest neighbor (NN) search for each feature vector is first performed. After completing all the NN searches, an overall similarity is then computed, i.e., a single content-based video retrieval usually involves m individual NN searches. Since nearby feature vectors in a video are normally similar, a large number of expensive random disk accesses are expected to occur repeatedly, which crucially affects the overall query performance. Batch nearest neighbor (BNN) search is stated as a batch operation that performs a number of individual NN searches. This paper presents a novel approach towards efficient high-dimensional BNN search called dynamic query ordering (DQO) for advanced optimizations of both I/O and CPU costs. Observing that the overlapped candidates (or search space) of a previous query may help to further reduce the candidate sets of subsequent queries, DQO aims at progressively finding a query order such that the common candidates among queries are fully utilized to maximally reduce the total number of candidates. Modelling the candidate set relationship of queries by a candidate overlapping graph (COG), DQO iteratively selects the next query to be executed based on its estimated pruning power to the rest of queries with the dynamically updated COG. Extensive experiments are conducted on real video datasets and show the significance of our BNN query processing strategy.

14.

《Information Systems》2020

Similarity query processing is becoming increasingly important in many applications such as data cleaning, record linkage, Web search, and document analytics. In this paper we study how to provide end-to-end similarity query support natively in a parallel database system. We discuss how to express a similarity predicate in its query language, how to build indexes, how to answer similarity queries (selections and joins) efficiently in the runtime engine, possibly using indexes, and how to optimize similarity queries. One particular challenge is how to incorporate existing similarity join algorithms, which often require a series of steps to achieve a high efficiency, including collecting token frequencies, finding matching record id pairs, and reassembling result records based on id pairs. We present a novel approach that uses existing runtime operators to implement such complex join algorithms without reinventing the wheel; doing so positions the system to automatically benefit from future improvements to those operators. The approach includes a technique to transform a similarity join plan into an efficient operator-based physical plan during query optimization by using a template expressed largely in the system’s user-level query language; this technique greatly simplifies the specification of such a transformation rule. We use Apache AsterixDB, a parallel Big Data management system, to illustrate and validate our techniques. We conduct an experimental study using several large, real datasets on a parallel computing cluster to assess the similarity query support. We also include experiments involving three other parallel systems and report the efficacy and performance results.

15.

Uncertain Distance-Based Range Queries over Uncertain Moving Objects (total citations: 1; self-citations: 0; citations by others: 1)

Yi-Fei Chen Xiao-Lin Qin Liang Liu 《计算机科学技术学报》 (Journal of Computer Science and Technology) 2010,25(5):982-998

Distance-based range search is crucial in many real applications. In particular, given a database and a query issuer, a distance-based range search retrieves all the objects in the database whose distances from the query issuer are less than or equal to a given threshold. Often, due to the accuracy of positioning devices, updating protocols or characteristics of applications (for example, location privacy protection), data obtained from the real world are imprecise or uncertain. Therefore, existing approaches over exact databases cannot be directly applied to the uncertain scenario. In this paper, we redefine the distance-based range query in the context of uncertain databases, namely the probabilistic uncertain distance-based range (PUDR) query, which obtains objects with confidence guarantees. We categorize the topological relationships between uncertain objects and uncertain search ranges into six cases and present the probability evaluation in each case. Experiments verify that our approach outperforms the Monte-Carlo method utilized in most existing work in precision and time cost for uniform uncertainty distributions. The approach also approximates, with acceptable errors, the probabilities of objects following other practical uncertainty distributions, such as the Gaussian distribution. Since the retrieval of a PUDR query requires accessing all the objects in the database, which is quite costly, we propose spatial pruning and probabilistic pruning techniques to reduce the search space. Two metrics, false positive rate and false negative rate, are introduced to measure the quality of query results. An extensive empirical study has been conducted to demonstrate the efficiency and effectiveness of our proposed algorithms under various experimental settings.
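A probabilistic distance-based range query can be sketched with a simple discrete model. This is an illustrative baseline under stated assumptions, not the six-case analytic evaluation of the paper: the function names are hypothetical, each uncertain object and the uncertain query issuer are assumed to be given as finite sets of equally likely sample positions, and an object qualifies when the fraction of sample pairs within the radius reaches a probability threshold.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def range_probability(obj_samples: List[Point], query_samples: List[Point],
                      radius: float) -> float:
    """Fraction of (object sample, query sample) pairs within the radius."""
    total = len(obj_samples) * len(query_samples)
    hits = sum(1 for o in obj_samples for q in query_samples
               if math.dist(o, q) <= radius)
    return hits / total


def pudr_query(db: Dict[str, List[Point]], query_samples: List[Point],
               radius: float, threshold: float) -> List[str]:
    """Return ids of objects whose in-range probability meets the threshold."""
    return [oid for oid, samples in db.items()
            if range_probability(samples, query_samples, radius) >= threshold]


db = {
    "a": [(0.0, 0.0), (1.0, 0.0)],    # both samples within range
    "b": [(0.0, 0.0), (10.0, 0.0)],   # one of two samples within range
    "c": [(10.0, 0.0), (11.0, 0.0)],  # no sample within range
}
answers = pudr_query(db, query_samples=[(0.0, 0.0)], radius=2.0, threshold=0.5)
```

With a threshold of 0.5, objects "a" (probability 1.0) and "b" (probability 0.5) qualify while "c" (probability 0.0) does not. A full scan like this touches every object, which is the cost that spatial and probabilistic pruning are meant to avoid.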

16.

Efficient processing of probabilistic reverse nearest neighbor queries over uncertain data (total citations: 2; self-citations: 0; citations by others: 2)

Xiang Lian Lei Chen 《The VLDB Journal The International Journal on Very Large Data Bases》2009,18(3):787-808

Reverse nearest neighbor (RNN) search is crucial in many real applications. In particular, given a database and a query object, an RNN query retrieves all the data objects in the database that have the query object as their nearest neighbor. Often, due to limitations of measurement devices, environmental disturbance, or characteristics of applications (for example, monitoring moving objects), data obtained from the real world are uncertain (imprecise). Therefore, previous approaches proposed for answering an RNN query over exact (precise) databases cannot be directly applied to the uncertain scenario. In this paper, we re-define the RNN query in the context of uncertain databases, namely the probabilistic reverse nearest neighbor (PRNN) query, which obtains data objects with probabilities of being RNNs greater than or equal to a user-specified threshold. Since the retrieval of a PRNN query requires accessing all the objects in the database, which is quite costly, we also propose an effective pruning method, called geometric pruning (GP), that significantly reduces the PRNN search space without introducing any false dismissals. Furthermore, we present an efficient PRNN query procedure that seamlessly integrates our pruning method. Extensive experiments have demonstrated the efficiency and effectiveness of our proposed GP-based PRNN query processing approach, under various experimental settings.

17.

No-but-semantic-match: computing semantically matched xml keyword search results

Mehdi Naseriparsa Md. Saiful Islam Chengfei Liu Irene Moser 《World Wide Web》2018,21(5):1223-1257

Users are rarely familiar with the content of a data source they are querying, and therefore cannot avoid using keywords that do not exist in the data source. Traditional systems may respond with an empty result, causing dissatisfaction, while the data source in effect holds semantically related content. In this paper we study this no-but-semantic-match problem on XML keyword search and propose a solution which enables us to present the top-k semantically related results to the user. Our solution involves two steps: (a) extracting semantically related candidate queries from the original query and (b) processing candidate queries and retrieving the top-k semantically related results. Candidate queries are generated by replacement of non-mapped keywords with candidate keywords obtained from an ontological knowledge base. Candidate results are scored using their cohesiveness and their similarity to the original query. Since the number of queries to process can be large, with each result having to be analyzed, we propose pruning techniques to retrieve the top-k results efficiently. We develop two query processing algorithms based on our pruning techniques. Further, we exploit a property of the candidate queries to propose a technique for processing multiple queries in batch, which improves the performance substantially. Extensive experiments on two real datasets verify the effectiveness and efficiency of the proposed approaches.

18.

Algorithms for Nearest Neighbor Search on Moving Object Trajectories (total citations: 2; self-citations: 1; citations by others: 1)

Elias Frentzos Kostas Gratsias Nikos Pelekis Yannis Theodoridis 《GeoInformatica》2007,11(2):159-193

Nearest Neighbor (NN) search has been in the core of spatial and spatiotemporal database research during the last decade. The literature on NN query processing algorithms so far deals with either stationary or moving query points over static datasets or future (predicted) locations over a set of continuously moving points. With the increasing number of Mobile Location Services (MLS), the need for effective k-NN query processing over historical trajectory data has become the vehicle for data analysis, thus improving existing or even proposing new services. In this paper, we investigate mechanisms to perform NN search on R-tree-like structures storing historical information about moving object trajectories. The proposed (depth-first and best-first) algorithms vary with respect to the type of the query object (stationary or moving point) as well as the type of the query result (historical continuous or not), thus resulting in four types of NN queries. We also propose novel metrics to support our search ordering and pruning strategies. Using the implementation of the proposed algorithms on two members of the R-tree family for trajectory data (namely, the TB-tree and the 3D-R-tree), we demonstrate their scalability and efficiency through an extensive experimental study using large synthetic and real datasets.

Yannis Theodoridis (Corresponding author). URL: http://dke.cti.gr http://isl.cs.unipi.gr/db

19.

Adapting metric indexes for searching in multi-metric spaces

Benjamin Bustos Sebastian Kreft Tomáš Skopal 《Multimedia Tools and Applications》2012,58(3):467-496

An important research issue in multimedia databases is the retrieval of similar objects. For most applications in multimedia databases, an exact search is not meaningful. Thus, much effort has been devoted to developing efficient and effective similarity search techniques. A recent approach that has been shown to improve the effectiveness of similarity search in multimedia databases resorts to the usage of combinations of metrics (i.e., a search on a multi-metric space). In this approach, the desirable contribution (weight) of each metric is chosen at query time. It follows that standard metric indexes cannot be directly used to improve the efficiency of dynamically weighted queries, because they assume that there is only one fixed distance function at indexing and query time. This paper presents a methodology for adapting metric indexes to multi-metric indexes, that is, to support similarity queries with dynamic combinations of metric functions. The adapted indexes are built with a single distance function and store partial distances to estimate the dynamically weighted distances. We present two novel indexes for multi-metric space indexing, which are the result of the application of the proposed methodology.

20.

Efficient semantic search on DHT overlays

Yingwu Zhu Yiming Hu 《Journal of Parallel and Distributed Computing》2007

Distributed hash tables (DHTs) excel at exact-match lookups, but they do not directly support complex queries such as semantic search that is based on content. In this paper, we propose a novel approach to efficient semantic search on DHT overlays. The basic idea is to place indexes of semantically close files into the same peer nodes with high probability by exploiting information retrieval algorithms and locality sensitive hashing. A query for retrieving semantically close files is answered with high recall by consulting only a small number (e.g., 10–20) of nodes that store the indexes of the files semantically close to the query. Our approach adds only index information to peer nodes, imposing only a small storage overhead. Via detailed simulations, we show that our approach achieves high recall for queries at very low cost, i.e., the number of nodes visited for a query is about 10–20, independent of the overlay size.