期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Analysis and evaluation of V*-kNN: an efficient algorithm for moving kNN queries

Sarana Nutanong Rui Zhang Egemen Tanin Lars Kulik 《The VLDB Journal The International Journal on Very Large Data Bases》2010,19(3):307-332

The moving k nearest neighbor (MkNN) query continuously finds the k nearest neighbors of a moving query point. MkNN queries can be efficiently processed through the use of safe regions. In general, a safe region is a region within which the query point can move without changing the query answer. This paper presents an incremental safe-region-based technique for answering MkNN queries, called the V*-Diagram, as well as analysis and evaluation of its associated algorithm, V*-kNN. Traditional safe-region approaches compute a safe region based on the data objects but independent of the query location. Our approach exploits the knowledge of the query location and the boundary of the search space in addition to the data objects. As a result, V*-kNN has much smaller I/O and computation costs than existing methods. We further provide cost models to estimate the number of data accesses for V*-kNN and a competitive technique, RIS-kNN. The V*-Diagram and V*-kNN are also applicable to the domain of spatial networks and we present algorithms to construct a spatial-network V*-Diagram. Our experimental results show that V*-kNN significantly outperforms the competitive technique. The results also verify the accuracy of the cost models. 相似文献

2.

The direction-constrained <Emphasis Type="Italic">k</Emphasis> nearest neighbor query

Min-Joong Lee Dong-Wan Choi SangYeon Kim Ha-Myung Park Sunghee Choi Chin-Wan Chung 《GeoInformatica》2016,20(3):471-502

Finding k nearest neighbor objects in spatial databases is a fundamental problem in many geospatial systems and the direction is one of the key features of a spatial object. Moreover, the recent tremendous growth of sensor technologies in mobile devices produces an enormous amount of spatio-directional (i.e., spatially and directionally encoded) objects such as photos. Therefore, an efficient and proper utilization of the direction feature is a new challenge. Inspired by this issue and the traditional k nearest neighbor search problem, we devise a new type of query, called the direction-constrained k nearest neighbor (DCkNN) query. The DCkNN query finds k nearest neighbors from the location of the query such that the direction of each neighbor is in a certain range from the direction of the query. We develop a new index structure called MULTI, to efficiently answer the DCkNN query with two novel index access algorithms based on the cost analysis. Furthermore, our problem and solution can be generalized to deal with spatio-circulant dimensional (such as a direction and circulant periods of time such as an hour, a day, and a week) objects. Experimental results show that our proposed index structure and access algorithms outperform two adapted algorithms from existing kNN algorithms. 相似文献

3.

Monochromatic and bichromatic ranked reverse boolean spatial keyword nearest neighbors search

Pengpeng?Zhao Email author Hailin?Fang Victor?S.?Sheng Zhixu?Li Jiajie?Xu Jian?Wu Zhiming?Cui 《World Wide Web》2017,20(1):39-59

Recently, Reverse k Nearest Neighbors (RkNN) queries, returning every answer for which the query is one of its k nearest neighbors, have been extensively studied on the database research community. But the RkNN query cannot retrieve spatio-textual objects which are described by their spatial location and a set of keywords. Therefore, researchers proposed a RSTkNN query to find these objects, taking both spatial and textual similarity into consideration. However, the RSTkNN query cannot control the size of answer set and to be sorted according to the degree of influence on the query. In this paper, we propose a new problem Ranked Reverse Boolean Spatial Keyword Nearest Neighbors query called Ranked-RBSKNN query, which considers both spatial similarity and textual relevance, and returns t answers with most degree of influence. We propose a separate index and a hybrid index to process such queries efficiently. Experimental results on different real-world and synthetic datasets show that our approaches achieve better performance. 相似文献

4.

Multiple <Emphasis Type="Italic">k</Emphasis> nearest neighbor search

Yu-Chi Chung I-Fang Su Chiang Lee Pei-Chi Liu 《World Wide Web》2017,20(2):371-398

The problem of kNN (k Nearest Neighbor) queries has received considerable attention in the database and information retrieval communities. Given a dataset D and a kNN query q, the k nearest neighbor algorithm finds the closest k data points to q. The applications of kNN queries are board, not only in spatio-temporal databases but also in many areas. For example, they can be used in multimedia databases, data mining, scientific databases and video retrieval. The past studies of kNN query processing did not consider the case that the server may receive multiple kNN queries at one time. Their algorithms process queries independently. Thus, the server will be busy with continuously reaccessing the database to obtain the data that have already been acquired. This results in wasting I/O costs and degrading the performance of the whole system. In this paper, we focus on this problem and propose an algorithm named COrrelated kNN query Evaluation (COKE). The main idea of COKE is an “information sharing” strategy whereby the server reuses the query results of previously executed queries for efficiently processing subsequent queries. We conduct a comprehensive set of experiments to analyze the performance of COKE and compare it with the Best-First Search (BFS) algorithm. Empirical studies indicate that COKE outperforms BFS, and achieves lower I/O costs and less running time. 相似文献

5.

Answering why-not and why questions on reverse top-<Emphasis Type="Italic">k</Emphasis> queries

Qing Liu Yunjun Gao Gang Chen Baihua Zheng Linlin Zhou 《The VLDB Journal The International Journal on Very Large Data Bases》2016,25(6):867-892

Why-not and why questions can be posed by database users to seek clarifications on unexpected query results. Specifically, why-not questions aim to explain why certain expected tuples are absent from the query results, while why questions try to clarify why certain unexpected tuples are present in the query results. This paper systematically explores the why-not and why questions on reverse top-k queries, owing to its importance in multi-criteria decision making. We first formalize why-not questions on reverse top-k queries, which try to include the missing objects in the reverse top-k query results, and then, we propose a unified framework called WQRTQ to answer why-not questions on reverse top-k queries. Our framework offers three solutions to cater for different application scenarios. Furthermore, we study why questions on reverse top-k queries, which aim to exclude the undesirable objects from the reverse top-k query results, and extend the framework WQRTQ to efficiently answer why questions on reverse top-k queries, which demonstrates the flexibility of our proposed algorithms. Extensive experimental evaluation with both real and synthetic data sets verifies the effectiveness and efficiency of the presented algorithms under various experimental settings. 相似文献

6.

Approximate Continuous Top-<Emphasis Type="Italic">k</Emphasis> Query over Sliding Window

下载免费PDF全文

Rui Zhu Bin Wang Shi-Ying Luo Xiao-Chun Yang Guo-Ren Wang 《计算机科学技术学报》2017,32(1):93-109

Continuous top-k query over sliding window is a fundamental problem in database, which retrieves k objects with the highest scores when the window slides. Existing studies mainly adopt exact algorithms to tackle this type of queries, whose key idea is to maintain a subset of objects in the window, and try to retrieve answers from it. However, all the existing algorithms are sensitive to query parameters and data distribution. In addition, they suffer from expensive overhead for incremental maintenance, and thus cannot satisfy real-time requirement. In this paper, we define a novel query named (ε, δ)-approximate continuous top-k query, which returns approximate answers for top-k query. In order to efficiently support this query, we propose an efficient framework, named PABF (Probabilistic Approximate Based Framework), to support approximate top-k query over sliding window. We firstly maintain a self-adaptive pruning value, which could filter out newly arrived objects who have a probability less than 1 ? δ of being a query result. For those objects that are not filtered, we combine them together, if the score difference among them is less than a threshold. To efficiently maintain these combined results, the framework PABF also proposes a multi-phase merging algorithm. Theoretical analysis indicates that even in the worst case, we require only logarithmic complexity for maintaining each candidate. 相似文献

7.

Scalable and efficient processing of top-<Emphasis Type="Italic">k</Emphasis> multiple-type integrated queries

Hyuk-Yoon?Kwon Kyu-Young?Whang Email author 《World Wide Web》2016,19(6):1051-1075

In this paper, we define a new class of queries, the top-k multiple-type integrated query (simply, top-k MULTI query). It deals with multiple data types and finds the information in the order of relevance between the query and the object. Various data types such as spatial, textual, and relational data types can be used for the top-k MULTI query. The top-k MULTI query distinguishes itself from the traditional top-k query in that the component scores to calculate final scores are determined dependent of the query. Hence, each component score is calculated only when the query is given for each data type rather than being calculated apriori as in the top-k query. As a representative instance, the traditional top-k spatial keyword query is an instance of the top-k MULTI query. It deals with the spatial data type and text data type and finds the information based on spatial proximity and textual relevance between the query and the object, which is determined only when the query is given. In this paper, we first define the top-k MULTI query formally and define a new specific instance for the top-k MULTI query, the top-k spatial-keyword-relational(SKR) query, by integrating the relational data type into the traditional top-k spatial keyword query. Then, we investigate the processing approaches for the top-k MULTI query. We discuss the scalability of those approaches as new data types are integrated. We also devise the processing methods for the top-k SKR query. Finally, through extensive experiments on the top-k SKR query using real and synthetic data sets, we compare efficiency of the methods in terms of the query performance and storage. 相似文献

8.

Continuous visible nearest neighbor query processing in spatial databases 总被引：1，自引：0，他引：1

Yunjun Gao Baihua Zheng Gencai Chen Qing Li Xiaofa Guo 《The VLDB Journal The International Journal on Very Large Data Bases》2011,20(3):371-396

In this paper, we identify and solve a new type of spatial queries, called continuous visible nearest neighbor (CVNN) search. Given a data set P, an obstacle set O, and a query line segment q in a two-dimensional space, a CVNN query returns a set of \({\langle p, R\rangle}\) tuples such that \({p \in P}\) is the nearest neighbor to every point r along the interval \({R \subseteq q}\) as well as p is visible to r. Note that p may be NULL, meaning that all points in P are invisible to all points in R due to the obstruction of some obstacles in O. In contrast to existing continuous nearest neighbor query, CVNN retrieval considers the impact of obstacles on visibility between objects, which is ignored by most of spatial queries. We formulate the problem, analyze its unique characteristics, and develop efficient algorithms for exact CVNN query processing. Our methods (1) utilize conventional data-partitioning indices (e.g., R-trees) on both P and O, (2) tackle the CVNN search by performing a single query for the entire query line segment, and (3) only access the data points and obstacles relevant to the final query result by employing a suite of effective pruning heuristics. In addition, several interesting variations of CVNN queries have been introduced, and they can be supported by our techniques, which further demonstrates the flexibility of the proposed algorithms. A comprehensive experimental evaluation using both real and synthetic data sets has been conducted to verify the effectiveness of our proposed pruning heuristics and the performance of our proposed algorithms. 相似文献

9.

Uncertain top-<Emphasis Type="Italic">k</Emphasis> query processing in distributed environments

Xite?Wang Email author Derong?Shen Ge?Yu 《Distributed and Parallel Databases》2016,34(4):567-589

The top-k query on uncertain data set has been a very hot topic these years, and there have been many studies on uncertain top-k queries. Unfortunately, most of the existing algorithms only consider centralized processing environments, and they are not suitable for the large-scale data. In this paper, it is the first attempt to process probabilistic threshold top-k queries (an important uncertain top-k query, PT-k for short) in a distributed environment. We propose 3 efficient algorithms. The serial distributed approach adopts a new method, which only requires a few amount of calculations, to serially process PT-k queries in distributed environments. The global sorting first algorithm for PT-k query processing (GSP) is designed for improving the computation speed. In GSP, a distributed sorting operation is performed, and then we compute the candidates for PT-k queries in parallel. The query results can be computed by using a novel incremental method which can reduce the number of calculations. The local filtering first algorithm for PT-k query processing is designed for reducing the network overhead. Specifically, several filtering strategies are proposed to filter out redundant data locally, and then the incremental method in GSP is used to process the PT-k queries. Finally, the effectiveness of our proposed algorithms is verified through a series of experiments. 相似文献

10.

Know your customer: computing <Emphasis Type="Italic">k</Emphasis>-most promising products for targeted marketing

Md. Saiful Islam Chengfei Liu 《The VLDB Journal The International Journal on Very Large Data Bases》2016,25(4):545-570

The advancement of World Wide Web has revolutionized the way the manufacturers can do business. The manufacturers can collect customer preferences for products and product features from their sales and other product-related Web sites to enter and sustain in the global market. For example, the manufactures can make intelligent use of these customer preference data to decide on which products should be selected for targeted marketing. However, the selected products must attract as many customers as possible to increase the possibility of selling more than their respective competitors. This paper addresses this kind of product selection problem. That is, given a database of existing products P from the competitors, a set of company’s own products Q, a dataset C of customer preferences and a positive integer k, we want to find k-most promising products (k-MPP) from Q with maximum expected number of total customers for targeted marketing. We model k-MPP query and propose an algorithmic framework for processing such query and its variants. Our framework utilizes grid-based data partitioning scheme and parallel computing techniques to realize k-MPP query. The effectiveness and efficiency of the framework are demonstrated by conducting extensive experiments with real and synthetic datasets. 相似文献

11.

Countering overlapping rectangle privacy attack for moving kNN queries

Tanzima Hashem Lars Kulik Rui Zhang 《Information Systems》2013

An important class of LBSs is supported by the moving k nearest neighbor (MkNN) query, which continuously returns the k nearest data objects for a moving user. For example, a tourist may want to observe the five nearest restaurants continuously while exploring a city so that she can drop in to one of them anytime. Using this kind of services requires the user to disclose her location continuously and therefore may cause privacy leaks derived from the user's locations. A common approach to protecting a user's location privacy is the use of imprecise locations (e.g., regions) instead of exact positions when requesting LBSs. However, simply updating a user's imprecise location to a location-based service provider (LSP) cannot ensure a user's privacy for an MkNN query: continuous disclosure of regions enable LSPs to refine more precise location of the user. We formulate this type of attack to a user's location privacy that arises from overlapping consecutive regions, and provide the first solution to counter this attack. Specifically, we develop algorithms which can process an MkNN query while protecting the user's privacy from the above attack. Extensive experiments validate the effectiveness of our privacy protection technique and the efficiency of our algorithm. 相似文献

12.

Continuous monitoring of range spatial keyword query over moving objects

Chaluka Salgado Muhammad Aamir Cheema Mohammed Eunus Ali 《World Wide Web》2018,21(3):687-712

In this paper, we propose an efficient solution for processing continuous range spatial keyword queries over moving spatio-textual objects (namely, CRSK-mo queries). Major challenges in efficient processing of CRSK-mo queries are as follows: (i) the query range is determined based on both spatial proximity and textual similarity; thus a straightforward spatial proximity based pruning of the search space is not applicable as any object far from a query location with a high textual similarity score can still be the answer (and vice versa), (ii) frequent location updates may invalidate a query result, and thus require frequent re-computing of the result set for any object updates. To address these challenges, the key idea of our approach is to exploit the spatial and textual upper bounds between queries and objects to form safe zones (at the client-side) and buffer regions (at the server-side), and then use these bounds to quickly prune objects and queries through smart in-memory data structures. We conduct extensive experiments with a synthetic dataset that verify the effectiveness and efficiency of our proposed algorithm. 相似文献

13.

Top-<Emphasis Type="Italic">k</Emphasis> query processing in mobile-P2P networks using economic incentive schemes

Nilesh Padhariya Anirban Mondal Sanjay Kumar Madria 《Peer-to-Peer Networking and Applications》2016,9(4):731-751

This work proposes the E-Top system for the efficient processing of top-k queries in mobile ad hoc peer to peer (M-P2P) networks using economic incentive schemes. In E-Top, brokers facilitate top-k query processing in lieu of a commission. E-Top issues economic rewards to the mobile peers, which send relevant data items (i.e., those that contribute to the top-k query result), and penalizes peers otherwise, thereby optimizing the communication traffic. Peers use the payoffs (rewards/penalties) as a means of feedback to re-evaluate the scores of their items for re-ranking purposes. The main contributions of E-Top are three-fold. First, it proposes two economic incentive schemes, namely ETK and ETK+, in which peers act individually towards top-k query processing. Second, it extends ETK and ETK+ to propose a peer group-based economic incentive scheme ETG. Third, our performance evaluation shows that our schemes are indeed effective in improving the performance of top-k queries in terms of query response times and accuracy at reasonable communication traffic cost. 相似文献

14.

Exact and approximate flexible aggregate similarity search

Feifei?Li Ke?Yi Yufei?Tao Bin?Yao Email author Yang?Li Dong?Xie Min?Wang 《The VLDB Journal The International Journal on Very Large Data Bases》2016,25(3):317-338

Aggregate similarity search, also known as aggregate nearest-neighbor (Ann) query, finds many useful applications in spatial and multimedia databases. Given a group Q of M query objects, it retrieves from a database the objects most similar to Q, where the similarity is an aggregation (e.g., \({{\mathrm{sum}}}\), \(\max \)) of the distances between each retrieved object p and all the objects in Q. In this paper, we propose an added flexibility to the query definition, where the similarity is an aggregation over the distances between p and any subset of \(\phi M\) objects in Q for some support \(0< \phi \le 1\). We call this new definition flexible aggregate similarity search and accordingly refer to a query as a flexible aggregate nearest-neighbor ( Fann ) query. We present algorithms for answering Fann queries exactly and approximately. Our approximation algorithms are especially appealing, which are simple, highly efficient, and work well in both low and high dimensions. They also return near-optimal answers with guaranteed constant-factor approximations in any dimensions. Extensive experiments on large real and synthetic datasets from 2 to 74 dimensions have demonstrated their superior efficiency and high quality. 相似文献

15.

The <Emphasis Type="Italic">B</Emphasis><Superscript><Emphasis Type="Italic">dual</Emphasis></Superscript>-Tree: indexing moving objects by space filling curves in the dual space

Man Lung Yiu Yufei Tao Nikos Mamoulis 《The VLDB Journal The International Journal on Very Large Data Bases》2008,17(3):379-400

Existing spatiotemporal indexes suffer from either large update cost or poor query performance, except for the B ^x-tree (the state-of-the-art), which consists of multiple B ⁺-trees indexing the 1D values transformed from the (multi-dimensional) moving objects based on a space filling curve (Hilbert, in particular). This curve, however, does not consider object velocities, and as a result, query processing with a B ^x-tree retrieves a large number of false hits, which seriously compromises its efficiency. It is natural to wonder “can we obtain better performance by capturing also the velocity information, using a Hilbert curve of a higher dimensionality?”. This paper provides a positive answer by developing the B ^dual-tree, a novel spatiotemporal access method leveraging pure relational methodology. We show, with theoretical evidence, that the B ^dual-tree indeed outperforms the B ^x-tree in most circum- stances. Furthermore, our technique can effectively answer progressive spatiotemporal queries, which are poorly supported by B ^x-trees. 相似文献

16.

Kai?Zheng Bolong?Zheng Email author Jiajie?Xu Guanfeng?Liu An?Liu Zhixu?Li 《World Wide Web》2017,20(4):749-773

The proliferation of GPS-enabled smart mobile devices enables us to collect a large-scale trajectories of moving objects with GPS tags. While the raw trajectories that only consists of positional information have been studied extensively, many recent works have been focusing on enriching the raw trajectories with semantic knowledge. The resulting data, called activity trajectories, embed the information about behaviors of the moving objects and support a variety of applications for better quality of services. In this paper, we propose a Top-k Spatial Keyword (TkSK) query for activity trajectories, with the objective to find a set of trajectories that are not only close geographically but also meet the requirements of the query semantically. Such kind of query can deliver more informative results than existing spatial keyword queries for static objects, since activity trajectories are able to reflect the popularity of user activities and reveal preferable combinations of facilities. However, it is a challenging task to answer this query efficiently due to the inherent difficulties in indexing trajectories as well as the new complexity introduced by the textual dimension. In this work, we provide a comprehensive solution, including the novel similarity function, hybrid indexing structure, efficient search algorithm and further optimizations. Extensive empirical studies on real trajectory set have demonstrated the scalability of our proposed solution. 相似文献

17.

Locally adaptive <Emphasis Type="Italic">k</Emphasis> parameter selection for nearest neighbor classifier: one nearest cluster

Faruk Bulut Mehmet Fatih Amasyali 《Pattern Analysis & Applications》2017,20(2):415-425

The k nearest neighbors (k-NN) classification technique has a worldly wide fame due to its simplicity, effectiveness, and robustness. As a lazy learner, k-NN is a versatile algorithm and is used in many fields. In this classifier, the k parameter is generally chosen by the user, and the optimal k value is found by experiments. The chosen constant k value is used during the whole classification phase. The same k value used for each test sample can decrease the overall prediction performance. The optimal k value for each test sample should vary from others in order to have more accurate predictions. In this study, a dynamic k value selection method for each instance is proposed. This improved classification method employs a simple clustering procedure. In the experiments, more accurate results are found. The reasons of success have also been understood and presented. 相似文献

18.

Top-<Emphasis Type="Italic">k</Emphasis> coupled keyword recommendation for relational keyword queries

Xiangfu Meng Longbing Cao Xiaoyan Zhang Jingyu Shao 《Knowledge and Information Systems》2017,50(3):883-916

Providing top-k typical relevant keyword queries would benefit the users who cannot formulate appropriate queries to express their imprecise query intentions. By extracting the semantic relationships both between keywords and keyword queries, this paper proposes a new keyword query suggestion approach which can provide typical and semantically related queries to the given query. Firstly, a keyword coupling relationship measure, which considers both intra- and inter-couplings between each pair of keywords, is proposed. Then, the semantic similarity of different keyword queries can be measured by using a semantic matrix, in which the coupling relationships between keywords in queries are reserved. Based on the query semantic similarities, we next propose an approximation algorithm to find the most typical queries from query history by using the probability density estimation method. Lastly, a threshold-based top-k query selection method is proposed to expeditiously evaluate the top-k typical relevant queries. We demonstrate that our keyword coupling relationship and query semantic similarity measures can capture the coupling relationships between keywords and semantic similarities between keyword queries accurately. The efficiency of query typicality analysis and top-k query selection algorithm is also demonstrated. 相似文献

19.

<Emphasis Type="Italic">k</Emphasis>-most suitable locations selection

Yu-Chi Chung I-Fang Su Chiang Lee 《GeoInformatica》2018,22(4):661-692

Choosing the best location for starting a business or expanding an existing enterprize is an important issue. A number of location selection problems have been discussed in the literature. They often apply the Reverse Nearest Neighbor as the criterion for finding suitable locations. In this paper, we apply the Average Distance as the criterion and propose the so-called k-most suitable locations (k-MSL) selection problem. Given a positive integer k and three datasets: a set of customers, a set of existing facilities, and a set of potential locations. The k-MSL selection problem outputs k locations from the potential location set, such that the average distance between a customer and his nearest facility is minimized. In this paper, we formally define the k-MSL selection problem and show that it is NP-hard. We first propose a greedy algorithm which can quickly find an approximate result for users. Two exact algorithms are then proposed to find the optimal result. Several pruning rules are applied to increase computational efficiency. We evaluate the algorithms’ performance using both synthetic and real datasets. The results show that our algorithms are able to deal with the k-MSL selection problem efficiently. 相似文献

20.

Diverse nearest neighbors queries using linear skylines

Camila F. Costa Mario A. Nascimento Matthias Schubert 《GeoInformatica》2018,22(4):815-844

k-nearest neighbor (k-NN) queries are well-known and widely used in a plethora of applications. However, in the original definition of k-NN queries there is no concern regarding diversity of the answer set with respect to the user’s interests. For instance, travelers may be looking for touristic sites that are close to where they are, but that would also lead them to see different parts of the city. Likewise, if one is looking for restaurants close by, it may be more interesting to learn about restaurants of different categories or ethnicities which are nonetheless relatively close. The interesting novel aspect of this type of query is that there are two competing criteria to be optimized: closeness and diversity. We propose two approaches that leverage the notion of linear skyline queries in order to find the k diverse nearest neighbors within a radius r from a given query point, or (k, r)-DNNs for short. Our proposed approaches return a relatively small set containing all optimal solutions for any linear combination of the weights a user could give to the two competing criteria, and we consider three different notions of diversity: spatial, categorical and angular. Our experiments, varying a number of parameters and exploring synthetic and real datasets, in both Euclidean space and road networks, respectively, show that our approaches are several orders of magnitude faster than a straightforward approach. 相似文献