1.

Probabilistic Group Nearest Neighbor Queries in Uncertain Databases

Xiang Lian Lei Chen 《Knowledge and Data Engineering, IEEE Transactions on》2008,20(6):809-824

The importance of query processing over uncertain data has recently arisen due to its wide usage in many real-world applications. In the context of uncertain databases, previous works have studied many query types such as nearest neighbor query, range query, top-k query, skyline query, and similarity join. In this paper, we focus on another important query, namely, probabilistic group nearest neighbor (PGNN) query, in the uncertain database, which also has many applications. Specifically, given a set, Q, of query points, a PGNN query retrieves data objects that minimize the aggregate distance (e.g., sum, min, and max) to query set Q. Due to the inherent uncertainty of data objects, previous techniques to answer group nearest neighbor (GNN) query cannot be directly applied to our PGNN problem. Motivated by this, we propose effective pruning methods, namely, spatial pruning and probabilistic pruning, to reduce the PGNN search space, which can be seamlessly integrated into our PGNN query procedure. Extensive experiments have demonstrated the efficiency and effectiveness of our proposed approach, in terms of the wall clock time and the speed-up ratio against linear scan.
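The PGNN definition in the abstract above can be illustrated with a minimal brute-force sketch. It assumes a discrete-instance uncertainty model (each object is a short list of possible positions with probabilities) and enumerates every possible world; this is a naive baseline, not the spatial or probabilistic pruning proposed in the paper. The names `aggregate_distance` and `pgnn_probabilities` are hypothetical.

```python
import itertools
import math


def aggregate_distance(point, queries, agg="sum"):
    """Aggregate (sum, min, or max) of Euclidean distances from point to all query points."""
    dists = [math.dist(point, q) for q in queries]
    if agg == "sum":
        return sum(dists)
    if agg == "min":
        return min(dists)
    if agg == "max":
        return max(dists)
    raise ValueError(f"unknown aggregate: {agg}")


def pgnn_probabilities(objects, queries, agg="sum"):
    """Probability that each uncertain object is the group nearest neighbor of `queries`.

    objects: dict mapping a name to a list of (position, probability) instances
             (assumed model; probabilities of one object sum to 1).
    """
    names = list(objects)
    probs = {name: 0.0 for name in names}
    # Enumerate every possible world: one instance chosen per object.
    for world in itertools.product(*(objects[n] for n in names)):
        world_prob = math.prod(p for _, p in world)
        scores = [aggregate_distance(pos, queries, agg) for pos, _ in world]
        winner = min(range(len(names)), key=scores.__getitem__)
        probs[names[winner]] += world_prob
    return probs


objects = {
    "a": [((0.0, 0.0), 0.5), ((10.0, 10.0), 0.5)],
    "b": [((1.0, 1.0), 1.0)],
}
queries = [(0.0, 0.0), (2.0, 0.0)]
print(pgnn_probabilities(objects, queries))
```

The number of possible worlds grows exponentially with the number of objects, which is why the abstract emphasizes pruning the search space.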

2.

A Group Nearest Neighbor Query Method for Existentially Uncertain Objects

陈默 贾子熙 谷峪 于戈 《小型微型计算机系统》2012,33(4):684-687

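The group nearest neighbor query is an important class of spatial queries that finds the spatial objects nearest to a given set of query points. Owing to factors such as limited image resolution, existential uncertainty of spatial objects is widespread in query applications that involve image processing. This existential uncertainty in object location data affects group nearest neighbor query results. This paper defines the probabilistic threshold group nearest neighbor query over existentially uncertain objects, designs an efficient query processing mechanism that improves query efficiency through pruning optimizations, and further proposes an efficient probabilistic threshold group nearest neighbor query algorithm. The algorithm is validated experimentally on several real data sets, and the results show that it achieves good query efficiency.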

3.

Visible Reverse k-Nearest Neighbor Query Processing in Spatial Databases

Gao Yunjun Zheng Baihua Chen Gencai Lee Wang-Chien Lee Ken C. K. Li Qing 《Knowledge and Data Engineering, IEEE Transactions on》2009,21(9):1314-1327

Reverse nearest neighbor (RNN) queries have a broad application base such as decision support, profile-based marketing, resource allocation, etc. Previous work on RNN search does not take obstacles into consideration. In the real world, however, there are many physical obstacles (e.g., buildings) and their presence may affect the visibility between objects. In this paper, we introduce a novel variant of RNN queries, namely, visible reverse nearest neighbor (VRNN) search, which considers the impact of obstacles on the visibility of objects. Given a data set P, an obstacle set O, and a query point q in a 2D space, a VRNN query retrieves the points in P that have q as their visible nearest neighbor. We propose an efficient algorithm for VRNN query processing, assuming that P and O are indexed by R-trees. Our techniques do not require any preprocessing and employ half-plane property and visibility check to prune the search space. In addition, we extend our solution to several variations of VRNN queries, including: 1) visible reverse k-nearest neighbor (VRkNN) search, which finds the points in P that have q as one of their k visible nearest neighbors; 2) delta-VRkNN search, which handles VRkNN retrieval with the maximum visible distance delta constraint; and 3) constrained VRkNN (CVRkNN) search, which tackles the VRkNN query with region constraint. Extensive experiments on both real and synthetic data sets have been conducted to demonstrate the efficiency and effectiveness of our proposed algorithms under various experimental settings.

4.

Reverse Nearest Neighbors Search in Ad Hoc Subspaces (total citations: 1; self-citations: 0; citations by others: 1)

Man Lung Yiu Nikos Mamoulis 《Knowledge and Data Engineering, IEEE Transactions on》2007,19(3):412-426

Given an object q, modeled by a multidimensional point, a reverse nearest neighbors (RNN) query returns the set of objects in the database that have q as their nearest neighbor. In this paper, we study an interesting generalization of the RNN query, where not all dimensions are considered, but only an ad hoc subset thereof. The rationale is that 1) the dimensionality might be too high for the result of a regular RNN query to be useful, 2) missing values may implicitly define a meaningful subspace for RNN retrieval, and 3) analysts may be interested in the query results only for a set of (ad hoc) problem dimensions (i.e., object attributes). We consider a suitable storage scheme and develop appropriate algorithms for projected RNN queries, without relying on multidimensional indexes. Given the significant cost difference between random and sequential data accesses, our algorithms are based on applying sequential accesses only on the projected atomic values of the data at each dimension, to progressively derive a set of RNN candidates. Whether these candidates are actual RNN results is then validated via an optimized refinement step. In addition, we study variants of the projected RNN problem, including RkNN search, bichromatic RNN, and RNN retrieval for the case where sequential accesses are not possible. Our methods are experimentally evaluated with real and synthetic data.
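To make the projected RNN definition above concrete, the following is a small quadratic-time sketch in which distances are computed only over a chosen subset of dimensions. It is a naive reference check, not the sequential-access algorithm the paper develops; the function names and the tie rule (q still counts as the nearest neighbor when another object is equally close) are assumptions.

```python
import math


def project(point, dims):
    """Keep only the coordinates listed in dims."""
    return tuple(point[d] for d in dims)


def projected_rnn(points, q, dims):
    """Return the points whose nearest neighbor, measured in the subspace dims, is q."""
    q_proj = project(q, dims)
    result = []
    for i, o in enumerate(points):
        o_proj = project(o, dims)
        dist_to_q = math.dist(o_proj, q_proj)
        # o qualifies if no other data point is strictly closer to o than q is.
        if all(
            math.dist(o_proj, project(other, dims)) >= dist_to_q
            for j, other in enumerate(points)
            if j != i
        ):
            result.append(o)
    return result


points = [(0, 0, 0), (3, 0, 100), (10, 0, 0)]
q = (4, 0, 0)
print(projected_rnn(points, q, dims=(0, 1)))     # subspace of the first two dimensions
print(projected_rnn(points, q, dims=(0, 1, 2)))  # full space
```

Running both calls shows that the answer depends on the chosen subspace: a large difference in the third dimension changes which objects have q as their nearest neighbor.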

5.

Probabilistic inverse ranking queries in uncertain databases

Xiang Lian Lei Chen 《The VLDB Journal The International Journal on Very Large Data Bases》2011,20(1):107-127

Query processing in the uncertain database has become increasingly important due to the wide existence of uncertain data in many real applications. Different from handling precise data, the uncertain query processing needs to consider the data uncertainty and answer queries with confidence guarantees. In this paper, we formulate and tackle an important query, namely probabilistic inverse ranking (PIR) query, which retrieves possible ranks of a given query object in an uncertain database with confidence above a probability threshold. We present effective pruning methods to reduce the PIR search space, which can be seamlessly integrated into an efficient query procedure. Moreover, we tackle the problem of PIR query processing in high dimensional spaces by reducing high dimensional uncertain data to a lower dimensional space. Furthermore, we study three interesting and useful aggregate PIR queries, that is, MAX, top-m, and AVG PIRs. We also study an important query type, PIR with uncertain query object (namely UQ-PIR), and design specific rules to facilitate the pruning. Extensive experiments have demonstrated the efficiency and effectiveness of our proposed approaches over both real and synthetic data sets, under various experimental settings.

6.

Reverse nearest neighbors in large graphs (total citations: 3; self-citations: 0; citations by others: 3)

Man Lung Yiu Dimitris Papadias Nikos Mamoulis Yufei Tao 《Knowledge and Data Engineering, IEEE Transactions on》2006,18(4):540-553

A reverse nearest neighbor (RNN) query returns the data objects that have a query point as their nearest neighbor (NN). Although such queries have been studied quite extensively in Euclidean spaces, there is no previous work in the context of large graphs. In this paper, we provide a fundamental lemma, which can be used to prune the search space while traversing the graph in search of RNNs. Based on it, we develop two RNN methods: an eager algorithm that attempts to prune network nodes as soon as they are visited and a lazy technique that prunes the search space when a data point is discovered. We study retrieval of an arbitrary number k of reverse nearest neighbors, investigate the benefits of materialization, cover several query types, and deal with cases where the queries and the data objects reside on nodes or edges of the graph. The proposed techniques are evaluated in various practical scenarios involving spatial maps, computer networks, and the DBLP coauthorship graph.

7.

Uncertain Distance-Based Range Queries over Uncertain Moving Objects (total citations: 1; self-citations: 0; citations by others: 1)

Yi-Fei Chen Xiao-Lin Qin Liang Liu 《计算机科学技术学报》2010,25(5):982-998

Distance-based range search is crucial in many real applications. In particular, given a database and a query issuer, a distance-based range search retrieves all the objects in the database whose distances from the query issuer are less than or equal to a given threshold. Often, due to the accuracy of positioning devices, updating protocols, or characteristics of applications (for example, location privacy protection), data obtained from the real world are imprecise or uncertain. Therefore, existing approaches over exact databases cannot be directly applied to the uncertain scenario. In this paper, we redefine the distance-based range query in the context of uncertain databases, namely the probabilistic uncertain distance-based range (PUDR) query, which obtains objects with confidence guarantees. We categorize the topological relationships between uncertain objects and uncertain search ranges into six cases and present the probability evaluation in each case. Experiments verify that our approach outperforms the Monte-Carlo method used in most existing work in both precision and time cost for the uniform uncertainty distribution. The approach also approximates, with acceptable errors, the probabilities of objects that follow other practical uncertainty distributions, such as the Gaussian distribution. Since the retrieval of a PUDR query requires accessing all the objects in the database, which is quite costly, we propose spatial pruning and probabilistic pruning techniques to reduce the search space. Two metrics, false positive rate and false negative rate, are introduced to measure the quality of query results. An extensive empirical study has been conducted to demonstrate the efficiency and effectiveness of our proposed algorithms under various experimental settings.
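As a point of reference for the abstract above, here is a hedged sketch of the Monte-Carlo baseline it mentions, not the six-case analytical evaluation the authors propose. It assumes that both the data objects and the query issuer are uniformly distributed inside circular uncertainty regions; all function names and the `(center, radius)` representation are illustrative assumptions.

```python
import math
import random


def sample_in_disk(center, radius, rng):
    """Draw one point uniformly at random from a disk."""
    r = radius * math.sqrt(rng.random())  # sqrt keeps the area density uniform
    theta = 2.0 * math.pi * rng.random()
    return (center[0] + r * math.cos(theta), center[1] + r * math.sin(theta))


def range_probability(obj, issuer, threshold, trials=10_000, seed=0):
    """Estimate P(distance(object, issuer) <= threshold) by sampling.

    obj and issuer are (center, radius) pairs describing circular uncertainty regions.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        o = sample_in_disk(obj[0], obj[1], rng)
        s = sample_in_disk(issuer[0], issuer[1], rng)
        if math.dist(o, s) <= threshold:
            hits += 1
    return hits / trials


def probabilistic_range_query(objects, issuer, threshold, min_prob):
    """Return the objects whose estimated qualification probability is at least min_prob."""
    answer = {}
    for name, obj in objects.items():
        p = range_probability(obj, issuer, threshold)
        if p >= min_prob:
            answer[name] = p
    return answer


objects = {"near": ((0.0, 0.0), 1.0), "far": ((100.0, 0.0), 1.0)}
issuer = ((0.5, 0.0), 1.0)
print(probabilistic_range_query(objects, issuer, threshold=10.0, min_prob=0.5))
```

Every object is sampled here, which mirrors the cost the abstract describes; a spatial pruning step could discard "far" from the bounding regions alone, without any sampling.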

8.

Probabilistic nearest neighbor query processing on distributed uncertain data

Daichi Amagata Yuya Sasaki Takahiro Hara Shojiro Nishio 《Distributed and Parallel Databases》2016,34(2):259-287

A nearest neighbor (NN) query, which returns the most similar object to a user-specified query object, plays an important role in a wide range of applications and hence has received considerable attention. In many such applications, e.g., sensor data collection and location-based services, objects are inherently uncertain. Furthermore, due to the ever increasing generation of massive datasets, the importance of distributed databases, which deal with such data objects, has been growing. One emerging challenge is to efficiently process probabilistic NN queries over distributed uncertain databases. The straightforward approach, that each local site forwards its own database to the central server, is communication-expensive, so we have to minimize communication cost for the NN object retrieval. In this paper, we focus on two important queries, namely top-k probable NN queries and probabilistic star queries, and propose efficient algorithms to process them over distributed uncertain databases. Extensive experiments on both real and synthetic data have demonstrated that our algorithms significantly reduce communication cost.

9.

Finding the least influenced set in uncertain databases

Xiang Lian Lei Chen Guoren Wang 《Information Systems》2011

Due to the inherent existence of uncertainty in many real-world applications, in this paper, we investigate an important query in uncertain databases, namely probabilistic least influenced set (PLIS) query, which retrieves all the uncertain objects in an uncertain database such that they are the least affected by a given query object with high probabilities. Such a PLIS query is useful in applications such as business planning. We propose and tackle both monochromatic and bichromatic versions (i.e. M-PLIS and B-PLIS, respectively) of the PLIS query. In order to efficiently answer PLIS queries, we present three pruning methods, MINMAX, Regional, and Candidate pruning, which can effectively reduce the PLIS search space. The proposed pruning methods can be seamlessly integrated into efficient query procedures. Moreover, we also study important variants of PLIS query with uncertain query object (i.e. UQ-PLIS). Furthermore, we formulate and tackle the PLIS problem on uncertain moving objects (i.e. UMOD-PLIS). Extensive experiments have demonstrated the efficiency and effectiveness of our proposed approaches under various settings.

10.

Efficient processing of probabilistic group subspace skyline queries in uncertain databases

Xiang Lian Lei Chen 《Information Systems》2013

Due to the pervasive data uncertainty in many real applications, efficient and effective query answering on uncertain data has recently gained much attention from the database community. In this paper, we propose a novel and important query in the context of uncertain databases, namely probabilistic group subspace skyline (PGSS) query, which is useful in applications like sensor data analysis. Specifically, a PGSS query retrieves those uncertain objects that are, with high confidence, not dynamically dominated by other objects, with respect to a group of query points in ad-hoc subspaces. In order to enable fast PGSS query answering, we propose effective pruning methods to reduce the PGSS search space, which are seamlessly integrated into an efficient PGSS query procedure. Furthermore, to achieve low query cost, we provide a cost model, in light of which uncertain data are pre-processed and indexed. Extensive experiments have been conducted to demonstrate the efficiency and effectiveness of our proposed approaches.

11.

Monochromatic and bichromatic reverse top-k group nearest neighbor queries

《Expert systems with applications》2016

The Group Nearest Neighbor (GNN) search is an important approach for expert and intelligent systems, e.g., Geographic Information Systems (GIS) and Decision Support Systems (DSS). However, traditional GNN search starts from the users' perspective and selects the locations or objects that users like. Such applications fail to help managers since they do not provide managerial insights. In this paper, we focus on solving the problem from the managers' perspective. In particular, we propose a novel GNN query, namely, the reverse top-k group nearest neighbor (RkGNN) query, which returns k groups of data objects so that each group has the query object q as its group nearest neighbor (GNN). This query is an important tool for decision support, e.g., location-based services, product data analysis, trip planning, and disaster management, because it gives data analysts an intuitive way of finding significant groups of data objects with respect to q. Despite its importance, this kind of query has not received adequate attention from the research community, and answering RkGNN queries efficiently is a challenging task. To this end, we first formalize the reverse top-k group nearest neighbor query in both monochromatic and bichromatic cases, and then propose effective pruning methods, i.e., sorting and threshold pruning, MBR property pruning, and window pruning, to reduce the search space during RkGNN query processing. Furthermore, we improve the performance by employing the reuse heap technique. As an extension, we also study an interesting variant of the RkGNN query, namely a constrained reverse top-k group nearest neighbor (CRkGNN) query. Extensive experiments using synthetic and real datasets demonstrate the efficiency and effectiveness of our approaches.

12.

Reverse Nearest Neighbor Search in Metric Spaces (total citations: 7; self-citations: 0; citations by others: 7)

《Knowledge and Data Engineering, IEEE Transactions on》2006,18(9):1239-1252

Given a set D of objects, a reverse nearest neighbor (RNN) query returns the objects o in D such that o is closer to a query object q than to any other object in D, according to a certain similarity metric. The existing RNN solutions are not sufficient because they either 1) rely on precomputed information that is expensive to maintain in the presence of updates or 2) are applicable only when the data consists of "Euclidean objects" and similarity is measured using the L2 norm. In this paper, we present the first algorithms for efficient RNN search in generic metric spaces. Our techniques require no detailed representations of objects, and can be applied as long as their mutual distances can be computed and the distance metric satisfies the triangle inequality. We confirm the effectiveness of the proposed methods with extensive experiments.

13.

Continuous visible k nearest neighbor query on moving objects

《Information Systems》2014

A visible k nearest neighbor (VkNN) query retrieves k objects that are visible and nearest to the query object, where "visible" means that there is no obstacle between an object and the query object. Existing studies on the VkNN query have focused on static data objects. In this paper we investigate how to process the query on moving objects continuously. We propose an effective filtering-and-refinement framework for evaluating this type of query. We exploit spatial proximity and visibility properties between the query object and data objects to prune the search space under this framework. A detailed cost analysis and a comprehensive experimental study are conducted on the proposed framework. The results validate the effectiveness of the pruning techniques and verify the efficiency of the proposed framework. The proposed framework outperforms a straightforward solution by an order of magnitude in terms of both communication and computation costs.
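The notion of visibility used above can be sketched for a static snapshot as follows. The sketch assumes obstacles are 2D line segments and treats an object as hidden only when the sight line properly crosses an obstacle (touching an endpoint is ignored). It is a brute-force illustration with hypothetical names, not the continuous filtering-and-refinement framework of the paper.

```python
import math


def _orient(a, b, c):
    """Signed area test: > 0 if c is left of the directed line a->b, < 0 if right."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(p1, p2, q1, q2):
    """True if segment p1-p2 properly crosses segment q1-q2 (touching cases ignored)."""
    d1 = _orient(q1, q2, p1)
    d2 = _orient(q1, q2, p2)
    d3 = _orient(p1, p2, q1)
    d4 = _orient(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def visible_knn(points, obstacles, q, k):
    """Return the k nearest points to q whose sight line to q crosses no obstacle."""
    visible = [
        p for p in points
        if not any(segments_cross(q, p, a, b) for a, b in obstacles)
    ]
    return sorted(visible, key=lambda p: math.dist(p, q))[:k]


q = (0, 0)
points = [(2, 0), (0, 3), (-4, 0)]
obstacles = [((1, -1), (1, 1))]  # a wall between q and (2, 0)
print(visible_knn(points, obstacles, q, k=2))
```

Without the wall, (2, 0) would be the nearest neighbor; with it, that point is filtered out before the distance ranking. For moving objects the visibility test would have to be re-evaluated as positions change, which is the cost the paper's framework aims to reduce.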

14.

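A Probabilistic Frequent Nearest Neighbor Algorithm over Snapshot Sets of Locally Correlated Uncertain Databases (total citations: 2; self-citations: 0; citations by others: 2)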

苗东菁 石胜飞 李建中 《计算机研究与发展》2011,48(10):1812-1822

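Locally correlated uncertain spatial data is attracting growing attention in many practical applications. This paper proposes a novel probabilistic frequent nearest neighbor query defined over multiple snapshots of an uncertain database; its goal is to find the objects that, with a certain probability, are frequently the nearest neighbor of the query point across the snapshots. Directly applying existing nearest neighbor query algorithms designed for traditional or uncertain data to this query incurs high cost. To address this problem, a general processing framework is proposed,...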

15.

Aggregate nearest neighbor queries in uncertain graphs

Zhang Liu Chaokun Wang Jianmin Wang 《World Wide Web》2014,17(1):161-188

Uncertain graph data have recently begun to attract significant interest from the database research community, because uncertainty is an intrinsic property of the real world and data in a number of applications are best modeled as graphs, e.g., social network analysis, PPI networks in biology, and road network monitoring. Meanwhile, as one of the basic query operators, the aggregate nearest neighbor (ANN) query retrieves a data entity whose aggregate distance (e.g., sum or max) to the given query data entities is smaller than those of other data entities in a database. ANN queries on both certain graph data and high dimensional data have been well studied by previous work. However, existing ANN query processing approaches cannot handle uncertain graphs, because the topological structure of an uncertain graph may vary across possible worlds. Motivated by this, we propose the aggregate nearest neighbor query in uncertain graphs (UG-ANN) in this paper. First, we give the formal definition of the UG-ANN query and the basic UG-ANN query algorithm. After that, to improve the efficiency of UG-ANN query processing, we develop two kinds of pruning approaches, i.e., structural pruning and instance pruning. The structural pruning takes advantage of the monotonicity of the aggregate distance to derive upper and lower bounds of the aggregate distance for reducing the graph size. The instance pruning, in contrast, decreases the number of possible worlds to be checked in the search tree. Comprehensive experimental results on real-world data sets demonstrate that the proposed method significantly improves the efficiency of UG-ANN query processing.

16.

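Research on Continuous Reverse k-Nearest Neighbor Query Processing over Bichromatic Datasets in Road Networks (total citations: 2; self-citations: 2; citations by others: 0)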

李艳红 李国徽 杜小坤 《计算机科学》2012,39(11):131-136

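In recent years, reverse nearest neighbor (RNN) query algorithms have received wide attention and have become a research hotspot in the database field. Many efficient algorithms have been proposed for Euclidean space, whereas RNN processing in road networks has received less work and produced few results. In a road network, the distances between the query point and data objects, and between different data objects, depend on network connectivity, so Euclidean RNN methods do not apply. There are two types of RNN queries: monochromatic RNN (MRNN) and bichromatic RNN (BRNN). To date, there is still no effective algorithm for processing continuous reverse k-nearest neighbor queries over bichromatic datasets in road networks. Studying such queries is therefore worthwhile.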

17.

Spatial query processing for fuzzy objects

Kai Zheng Xiaofang Zhou Pui Cheong Fung Kexin Xie 《The VLDB Journal The International Journal on Very Large Data Bases》2012,21(5):729-751

Range and nearest neighbor queries are the most common types of spatial queries, which have been investigated extensively in the last decades due to their broad range of applications. In this paper, we study this problem in the context of fuzzy objects that have indeterministic boundaries. Fuzzy objects play an important role in many areas, such as biomedical image databases and GIS communities. Existing research on fuzzy objects mainly focuses on modeling basic fuzzy object types and operations, leaving the processing of more advanced queries largely untouched. In this paper, we propose two new kinds of spatial queries for fuzzy objects, namely single threshold query and continuous threshold query, to determine the query results which qualify at a certain probability threshold and within a probability interval, respectively. For efficient single threshold query processing, we optimize the classical R-tree-based search algorithm by deriving more accurate approximations for the distance function between fuzzy objects and the query object. To enhance the performance of continuous threshold queries, effective pruning rules are developed to reduce the search space and speed up the candidate refinement process. The efficiency of our proposed algorithms as well as the optimization techniques is verified with an extensive set of experiments using both synthetic and real datasets.

18.

Shooting top-k stars in uncertain databases

Xiang Lian Lei Chen 《The VLDB Journal The International Journal on Very Large Data Bases》2011,20(6):819-840

Query processing in the uncertain database has played an important role in many real-world applications due to the wide existence of uncertain data. Although many previous techniques can correctly handle precise data, they are not directly applicable to the uncertain scenario. In this article, we investigate and propose a novel query, namely probabilistic top-k star (PTkS) query, which aims to retrieve k objects in an uncertain database that are "closest" to a static/dynamic query point, considering both distance and probability aspects. In order to efficiently answer PTkS queries with a static/moving query point, we propose effective pruning methods to reduce the PTkS search space, which can be seamlessly integrated into an efficient query procedure. Finally, extensive experiments have demonstrated the efficiency and effectiveness of our proposed PTkS approaches on both real and synthetic data sets, under various parameter settings.

19.

Efficient Metric All-k-Nearest-Neighbor Search on Datasets Without Any Index

Hai-Da Zhang Zhi-Hao Xing Lu Chen Yun-Jun Gao 《计算机科学技术学报》2016,31(6):1194-1211

An all-k-nearest-neighbor (AkNN) query finds k nearest neighbors for each query object. This problem arises naturally in many areas, such as GIS (geographic information system), multimedia retrieval, and recommender systems. To support various data types and flexible distance metrics involved in real applications, we study AkNN retrieval in metric spaces, namely, metric AkNN (MAkNN) search. We consider the case where underlying indexes on the query set and the object set do not exist, which is natural in many scenarios. For example, the query set and the object set could be the results of other queries, and thus, the underlying indexes cannot be built in advance. To support MAkNN search on datasets without any underlying index, we propose an efficient disk-based algorithm, termed Partition-Based MAkNN Algorithm (PMA), which follows a partition-search framework and employs a series of pruning rules for accelerating the search. In addition, we extend our techniques to tackle an interesting variant of MAkNN queries, i.e., metric self-AkNN (MSAkNN) search, where the query set is identical to the object set. Extensive experiments using both real and synthetic datasets demonstrate the effectiveness of our pruning rules and the efficiency of the proposed algorithms, compared with state-of-the-art MAkNN and MSAkNN algorithms.
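The MAkNN definition above needs nothing but a distance function, which the following index-free nested-loop sketch makes explicit. It is an in-memory baseline under assumed names (`metric_aknn`, `hamming`), not the disk-based PMA algorithm or its pruning rules.

```python
import heapq


def metric_aknn(queries, objects, k, metric):
    """For each query object, return its k nearest objects under an arbitrary metric.

    Ties are resolved in favor of objects that appear earlier in `objects`.
    """
    return {
        q: heapq.nsmallest(k, objects, key=lambda o: metric(q, o))
        for q in queries
    }


def hamming(a, b):
    """Hamming distance between equal-length strings, a simple non-Euclidean metric."""
    return sum(x != y for x, y in zip(a, b))


# Numeric data with absolute difference as the metric.
print(metric_aknn([1, 10], [0, 2, 9, 12], k=2, metric=lambda a, b: abs(a - b)))

# String data with Hamming distance.
print(metric_aknn(["abcd"], ["abcf", "xbcf", "wxyz"], k=2, metric=hamming))
```

This baseline evaluates the metric for every query-object pair; the pruning rules described in the abstract aim to avoid most of those evaluations.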

20.

Reverse Nearest Neighbor Queries on Uncertain Objects

王淼 郝忠孝 《计算机工程》2010,36(10):47-49

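Most reverse nearest neighbor queries on uncertain objects cannot state definitively whether a given uncertain object is a reverse nearest neighbor of the query object. To address this problem, this paper introduces the concept of the probabilistic reverse nearest neighbor query, designs an index structure for probabilistic reverse nearest neighbor queries on uncertain objects, and gives a reverse nearest neighbor query algorithm for uncertain objects based on that structure.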