期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Efficient moving <Emphasis Type="Italic">k</Emphasis> nearest neighbor queries over line segment objects

Yu Gu Hui Zhang Zhigang Wang Ge Yu 《World Wide Web》2016,19(4):653-677

The growing need for location based services motivates the moving k nearest neighbor query (MkNN), which requires to find the k nearest neighbors of a moving query point continuously. In most existing solutions, data objects are abstracted as points. However, lots of real-world data objects, such as roads, rivers or pipelines, should be reasonably modeled as line segments or polyline segments. In this paper, we present LV*-Diagram to handle MkNN queries over line segment data objects. LV*-Diagram dynamically constructs a safe region. The query results remain unchanged if the query point is in the safe region, and hence, the computation cost of the server is greatly reduced. Experimental results show that our approach significantly outperforms the baseline method w.r.t. CPU load, I/O, and communication costs. 相似文献

2.

Monochromatic and bichromatic ranked reverse boolean spatial keyword nearest neighbors search

Pengpeng?Zhao Email author Hailin?Fang Victor?S.?Sheng Zhixu?Li Jiajie?Xu Jian?Wu Zhiming?Cui 《World Wide Web》2017,20(1):39-59

Recently, Reverse k Nearest Neighbors (RkNN) queries, returning every answer for which the query is one of its k nearest neighbors, have been extensively studied on the database research community. But the RkNN query cannot retrieve spatio-textual objects which are described by their spatial location and a set of keywords. Therefore, researchers proposed a RSTkNN query to find these objects, taking both spatial and textual similarity into consideration. However, the RSTkNN query cannot control the size of answer set and to be sorted according to the degree of influence on the query. In this paper, we propose a new problem Ranked Reverse Boolean Spatial Keyword Nearest Neighbors query called Ranked-RBSKNN query, which considers both spatial similarity and textual relevance, and returns t answers with most degree of influence. We propose a separate index and a hybrid index to process such queries efficiently. Experimental results on different real-world and synthetic datasets show that our approaches achieve better performance. 相似文献

3.

The direction-constrained <Emphasis Type="Italic">k</Emphasis> nearest neighbor query

Min-Joong Lee Dong-Wan Choi SangYeon Kim Ha-Myung Park Sunghee Choi Chin-Wan Chung 《GeoInformatica》2016,20(3):471-502

Finding k nearest neighbor objects in spatial databases is a fundamental problem in many geospatial systems and the direction is one of the key features of a spatial object. Moreover, the recent tremendous growth of sensor technologies in mobile devices produces an enormous amount of spatio-directional (i.e., spatially and directionally encoded) objects such as photos. Therefore, an efficient and proper utilization of the direction feature is a new challenge. Inspired by this issue and the traditional k nearest neighbor search problem, we devise a new type of query, called the direction-constrained k nearest neighbor (DCkNN) query. The DCkNN query finds k nearest neighbors from the location of the query such that the direction of each neighbor is in a certain range from the direction of the query. We develop a new index structure called MULTI, to efficiently answer the DCkNN query with two novel index access algorithms based on the cost analysis. Furthermore, our problem and solution can be generalized to deal with spatio-circulant dimensional (such as a direction and circulant periods of time such as an hour, a day, and a week) objects. Experimental results show that our proposed index structure and access algorithms outperform two adapted algorithms from existing kNN algorithms. 相似文献

4.

Uncertain top-<Emphasis Type="Italic">k</Emphasis> query processing in distributed environments

Xite?Wang Email author Derong?Shen Ge?Yu 《Distributed and Parallel Databases》2016,34(4):567-589

The top-k query on uncertain data set has been a very hot topic these years, and there have been many studies on uncertain top-k queries. Unfortunately, most of the existing algorithms only consider centralized processing environments, and they are not suitable for the large-scale data. In this paper, it is the first attempt to process probabilistic threshold top-k queries (an important uncertain top-k query, PT-k for short) in a distributed environment. We propose 3 efficient algorithms. The serial distributed approach adopts a new method, which only requires a few amount of calculations, to serially process PT-k queries in distributed environments. The global sorting first algorithm for PT-k query processing (GSP) is designed for improving the computation speed. In GSP, a distributed sorting operation is performed, and then we compute the candidates for PT-k queries in parallel. The query results can be computed by using a novel incremental method which can reduce the number of calculations. The local filtering first algorithm for PT-k query processing is designed for reducing the network overhead. Specifically, several filtering strategies are proposed to filter out redundant data locally, and then the incremental method in GSP is used to process the PT-k queries. Finally, the effectiveness of our proposed algorithms is verified through a series of experiments. 相似文献

5.

Answering why-not and why questions on reverse top-<Emphasis Type="Italic">k</Emphasis> queries

Qing Liu Yunjun Gao Gang Chen Baihua Zheng Linlin Zhou 《The VLDB Journal The International Journal on Very Large Data Bases》2016,25(6):867-892

Why-not and why questions can be posed by database users to seek clarifications on unexpected query results. Specifically, why-not questions aim to explain why certain expected tuples are absent from the query results, while why questions try to clarify why certain unexpected tuples are present in the query results. This paper systematically explores the why-not and why questions on reverse top-k queries, owing to its importance in multi-criteria decision making. We first formalize why-not questions on reverse top-k queries, which try to include the missing objects in the reverse top-k query results, and then, we propose a unified framework called WQRTQ to answer why-not questions on reverse top-k queries. Our framework offers three solutions to cater for different application scenarios. Furthermore, we study why questions on reverse top-k queries, which aim to exclude the undesirable objects from the reverse top-k query results, and extend the framework WQRTQ to efficiently answer why questions on reverse top-k queries, which demonstrates the flexibility of our proposed algorithms. Extensive experimental evaluation with both real and synthetic data sets verifies the effectiveness and efficiency of the presented algorithms under various experimental settings. 相似文献

6.

Top-<Emphasis Type="Italic">k</Emphasis> coupled keyword recommendation for relational keyword queries

Xiangfu Meng Longbing Cao Xiaoyan Zhang Jingyu Shao 《Knowledge and Information Systems》2017,50(3):883-916

Providing top-k typical relevant keyword queries would benefit the users who cannot formulate appropriate queries to express their imprecise query intentions. By extracting the semantic relationships both between keywords and keyword queries, this paper proposes a new keyword query suggestion approach which can provide typical and semantically related queries to the given query. Firstly, a keyword coupling relationship measure, which considers both intra- and inter-couplings between each pair of keywords, is proposed. Then, the semantic similarity of different keyword queries can be measured by using a semantic matrix, in which the coupling relationships between keywords in queries are reserved. Based on the query semantic similarities, we next propose an approximation algorithm to find the most typical queries from query history by using the probability density estimation method. Lastly, a threshold-based top-k query selection method is proposed to expeditiously evaluate the top-k typical relevant queries. We demonstrate that our keyword coupling relationship and query semantic similarity measures can capture the coupling relationships between keywords and semantic similarities between keyword queries accurately. The efficiency of query typicality analysis and top-k query selection algorithm is also demonstrated. 相似文献

7.

Scalable and efficient processing of top-<Emphasis Type="Italic">k</Emphasis> multiple-type integrated queries

Hyuk-Yoon?Kwon Kyu-Young?Whang Email author 《World Wide Web》2016,19(6):1051-1075

In this paper, we define a new class of queries, the top-k multiple-type integrated query (simply, top-k MULTI query). It deals with multiple data types and finds the information in the order of relevance between the query and the object. Various data types such as spatial, textual, and relational data types can be used for the top-k MULTI query. The top-k MULTI query distinguishes itself from the traditional top-k query in that the component scores to calculate final scores are determined dependent of the query. Hence, each component score is calculated only when the query is given for each data type rather than being calculated apriori as in the top-k query. As a representative instance, the traditional top-k spatial keyword query is an instance of the top-k MULTI query. It deals with the spatial data type and text data type and finds the information based on spatial proximity and textual relevance between the query and the object, which is determined only when the query is given. In this paper, we first define the top-k MULTI query formally and define a new specific instance for the top-k MULTI query, the top-k spatial-keyword-relational(SKR) query, by integrating the relational data type into the traditional top-k spatial keyword query. Then, we investigate the processing approaches for the top-k MULTI query. We discuss the scalability of those approaches as new data types are integrated. We also devise the processing methods for the top-k SKR query. Finally, through extensive experiments on the top-k SKR query using real and synthetic data sets, we compare efficiency of the methods in terms of the query performance and storage. 相似文献

8.

Approximate Continuous Top-<Emphasis Type="Italic">k</Emphasis> Query over Sliding Window

下载免费PDF全文

Rui Zhu Bin Wang Shi-Ying Luo Xiao-Chun Yang Guo-Ren Wang 《计算机科学技术学报》2017,32(1):93-109

Continuous top-k query over sliding window is a fundamental problem in database, which retrieves k objects with the highest scores when the window slides. Existing studies mainly adopt exact algorithms to tackle this type of queries, whose key idea is to maintain a subset of objects in the window, and try to retrieve answers from it. However, all the existing algorithms are sensitive to query parameters and data distribution. In addition, they suffer from expensive overhead for incremental maintenance, and thus cannot satisfy real-time requirement. In this paper, we define a novel query named (ε, δ)-approximate continuous top-k query, which returns approximate answers for top-k query. In order to efficiently support this query, we propose an efficient framework, named PABF (Probabilistic Approximate Based Framework), to support approximate top-k query over sliding window. We firstly maintain a self-adaptive pruning value, which could filter out newly arrived objects who have a probability less than 1 ? δ of being a query result. For those objects that are not filtered, we combine them together, if the score difference among them is less than a threshold. To efficiently maintain these combined results, the framework PABF also proposes a multi-phase merging algorithm. Theoretical analysis indicates that even in the worst case, we require only logarithmic complexity for maintaining each candidate. 相似文献

9.

An effective suggestion method for keyword search of databases

Hai Huang Zonghai Chen Chengfei Liu He Huang Xiangliang Zhang 《World Wide Web》2017,20(4):729-747

This paper solves the problem of providing high-quality suggestions for user keyword queries over databases. With the assumption that the returned suggestions are independent, existing query suggestion methods over databases score candidate suggestions individually and return the top-k best of them. However, the top-k suggestions have high redundancy with respect to the topics. To provide informative suggestions, the returned k suggestions are expected to be diverse, i.e., maximizing the relevance to the user query and the diversity with respect to topics that the user might be interested in simultaneously. In this paper, an objective function considering both factors is defined for evaluating a suggestion set. We show that maximizing the objective function is a submodular function maximization problem subject to n matroid constraints, which is an NP-hard problem. An greedy approximate algorithm with an approximation ratio O(\(\frac {1}{1+n}\)) is also proposed. Experimental results show that our suggestion outperforms other methods on providing relevant and diverse suggestions. 相似文献

10.

Efficient group-by reverse skyline computation

Zonghui Wang Yunjun Gao Qing Liu Xiaoye Miao Qing Li Chuan Li 《World Wide Web》2016,19(6):1023-1049

The reverse skyline query is very useful in many decision making applications. Given a multi-dimensional dataset P and a query point q, the reverse skyline query returns all the points in P whose dynamic skyline contains q. Although the reverse skyline retrieval has been well-studied in the literature, there is, to the best of our knowledge, no prior work on one of the most intuitive and practical types of reverse skyline queries, namely, group-by reverse skyline (GRS) query, which retrieves the reverse skyline for each group in a specified dataset. We formalize the GRS query including monochromatic and bichromatic versions, and identify its properties, and then propose a set of efficient algorithms for computing the group-by reverse skyline. Extensive experimental evaluation using both real and synthetic datasets demonstrates the performance of our proposed algorithms in terms of effectiveness and efficiency under a variety of experimental settings. 相似文献

11.

Continuous visible nearest neighbor query processing in spatial databases 总被引：1，自引：0，他引：1

Yunjun Gao Baihua Zheng Gencai Chen Qing Li Xiaofa Guo 《The VLDB Journal The International Journal on Very Large Data Bases》2011,20(3):371-396

In this paper, we identify and solve a new type of spatial queries, called continuous visible nearest neighbor (CVNN) search. Given a data set P, an obstacle set O, and a query line segment q in a two-dimensional space, a CVNN query returns a set of \({\langle p, R\rangle}\) tuples such that \({p \in P}\) is the nearest neighbor to every point r along the interval \({R \subseteq q}\) as well as p is visible to r. Note that p may be NULL, meaning that all points in P are invisible to all points in R due to the obstruction of some obstacles in O. In contrast to existing continuous nearest neighbor query, CVNN retrieval considers the impact of obstacles on visibility between objects, which is ignored by most of spatial queries. We formulate the problem, analyze its unique characteristics, and develop efficient algorithms for exact CVNN query processing. Our methods (1) utilize conventional data-partitioning indices (e.g., R-trees) on both P and O, (2) tackle the CVNN search by performing a single query for the entire query line segment, and (3) only access the data points and obstacles relevant to the final query result by employing a suite of effective pruning heuristics. In addition, several interesting variations of CVNN queries have been introduced, and they can be supported by our techniques, which further demonstrates the flexibility of the proposed algorithms. A comprehensive experimental evaluation using both real and synthetic data sets has been conducted to verify the effectiveness of our proposed pruning heuristics and the performance of our proposed algorithms. 相似文献

12.

Optimizing top-<Emphasis Type="Italic">k</Emphasis> retrieval: submodularity analysis and search strategies

Chaofeng Sha Keqiang Wang Dell Zhang Xiaoling Wang Aoying Zhou 《Frontiers of Computer Science》2016,10(3):477-487

The key issue in top-k retrieval, finding a set of k documents (from a large document collection) that can best answer a user’s query, is to strike the optimal balance between relevance and diversity. In this paper, we study the top-k retrieval problem in the framework of facility location analysis and prove the submodularity of that objective function which provides a theoretical approximation guarantee of factor 1?\(\frac{1}{e}\) for the (best-first) greedy search algorithm. Furthermore, we propose a two-stage hybrid search strategy which first obtains a high-quality initial set of top-k documents via greedy search, and then refines that result set iteratively via local search. Experiments on two large TREC benchmark datasets show that our two-stage hybrid search strategy approach can supersede the existing ones effectively and efficiently. 相似文献

13.

Parallel <Emphasis Type="Italic">LL</Emphasis> parsing

Ladislav Vagner Bořivoj Melichar 《Acta Informatica》2007,44(1):1-21

A deterministic parallel LL parsing algorithm is presented. The algorithm is based on a transformation from a parsing problem to parallel reduction. First, a nondeterministic version of a parallel LL parser is introduced. Then, it is transformed into the deterministic version—the LLP parser. The deterministic LLP(q,k) parser uses two kinds of information to select the next operation — a lookahead string of length up to k symbols and a lookback string of length up to q symbols. Deterministic parsing is available for LLP grammars, a subclass of LL grammars. Since the presented deterministic and nondeterministic parallel parsers are both based on parallel reduction, they are suitable for most parallel architectures. 相似文献

14.

Top-<Emphasis Type="Italic">k</Emphasis> query processing in mobile-P2P networks using economic incentive schemes

Nilesh Padhariya Anirban Mondal Sanjay Kumar Madria 《Peer-to-Peer Networking and Applications》2016,9(4):731-751

This work proposes the E-Top system for the efficient processing of top-k queries in mobile ad hoc peer to peer (M-P2P) networks using economic incentive schemes. In E-Top, brokers facilitate top-k query processing in lieu of a commission. E-Top issues economic rewards to the mobile peers, which send relevant data items (i.e., those that contribute to the top-k query result), and penalizes peers otherwise, thereby optimizing the communication traffic. Peers use the payoffs (rewards/penalties) as a means of feedback to re-evaluate the scores of their items for re-ranking purposes. The main contributions of E-Top are three-fold. First, it proposes two economic incentive schemes, namely ETK and ETK+, in which peers act individually towards top-k query processing. Second, it extends ETK and ETK+ to propose a peer group-based economic incentive scheme ETG. Third, our performance evaluation shows that our schemes are indeed effective in improving the performance of top-k queries in terms of query response times and accuracy at reasonable communication traffic cost. 相似文献

15.

Protecting query privacy with differentially private <Emphasis Type="Italic">k</Emphasis>-anonymity in location-based services

Jinbao?Wang Zhipeng?Cai Yingshu?Li Donghua?Yang Email author Ji?Li Hong?Gao 《Personal and Ubiquitous Computing》2018,22(3):453-469

Nowadays, location-based services (LBS) are facilitating people in daily life through answering LBS queries. However, privacy issues including location privacy and query privacy arise at the same time. Existing works for protecting query privacy either work on trusted servers or fail to provide sufficient privacy guarantee. This paper combines the concepts of differential privacy and k-anonymity to propose the notion of differentially private k-anonymity (DPkA) for query privacy in LBS. We recognize the sufficient and necessary condition for the availability of 0-DPkA and present how to achieve it. For cases where 0-DPkA is not achievable, we propose an algorithm to achieve ??-DPkA with minimized ??. Extensive simulations are conducted to validate the proposed mechanisms based on real-life datasets and synthetic data distributions. 相似文献

16.

Reverse Furthest Neighbors Query in Road Networks

下载免费PDF全文

Xiao-Jun Xu Jin-Song Bao Bin Yao Jing-Yu Zhou Fei-Long Tang Min-Yi Guo Jian-Qiu Xu 《计算机科学技术学报》2017,32(1):155-167

Given a road network G = (V,E), where V (E) denotes the set of vertices(edges) in G, a set of points of interest P and a query point q residing in G, the reverse furthest neighbors (Rfn _R) query in road networks fetches a set of points p ∈ P that take q as their furthest neighbor compared with all points in P ∪ {q}. This is the monochromatic Rfn _R (Mrfn _R) query. Another interesting version of Rfn _R query is the bichromatic reverse furthest neighbor (Brfn _R) query. Given two sets of points P and Q, and a query point q ∈ Q, a Brfn _R query fetches a set of points p ∈ P that take q as their furthest neighbor compared with all points in Q. This paper presents efficient algorithms for both Mrfn _R and Brfn _R queries, which utilize landmarks and partitioning-based techniques. Experiments on real datasets confirm the efficiency and scalability of proposed algorithms. 相似文献

17.

Mining collective knowledge: inferring functional labels from online review for business

Feifan Fan Wayne Xin Zhao Ji-Rong Wen Ge Xu Edward Y. Chang 《Knowledge and Information Systems》2017,50(3):723-750

Uncertain graph has been widely used to represent graph data with inherent uncertainty in structures. Reliability search is a fundamental problem in uncertain graph analytics. This paper investigates on a new problem with broad real-world applications, the top-k reliability search problem on uncertain graphs, that is, finding the k vertices v with the highest reliabilities of connections from a source vertex s to v. Note that the existing algorithm for the threshold-based reliability search problem is inefficient for the top-k reliability search problem. We propose a new algorithm to efficiently solve the top-k reliability search problem. The algorithm adopts two important techniques, namely the BFS sharing technique and the offline sampling technique. The BFS sharing technique exploits overlaps among different sampled possible worlds of the input uncertain graph and performs a single BFS on all possible worlds simultaneously. The offline sampling technique samples possible worlds offline and stores them using a compact structure. The algorithm also takes advantages of bit vectors and bitwise operations to improve efficiency. In addition, we generalize the top-k reliability search problem from single-source case to the multi-source case and show that the multi-source case of the problem can be equivalently converted to the single-source case of the problem. Moreover, we define two types of the reverse top-k reliability search problems with different semantics on uncertain graphs. We propose appropriate solutions for both of them. Extensive experiments carried out on both real and synthetic datasets verify that the optimized algorithm outperforms the baselines by 1–2 orders of magnitude in execution time while achieving comparable accuracy. Meanwhile, the optimized algorithm exhibits linear scalability with respect to the size of the input uncertain graph. 相似文献

18.

Diverse nearest neighbors queries using linear skylines

Camila F. Costa Mario A. Nascimento Matthias Schubert 《GeoInformatica》2018,22(4):815-844

k-nearest neighbor (k-NN) queries are well-known and widely used in a plethora of applications. However, in the original definition of k-NN queries there is no concern regarding diversity of the answer set with respect to the user’s interests. For instance, travelers may be looking for touristic sites that are close to where they are, but that would also lead them to see different parts of the city. Likewise, if one is looking for restaurants close by, it may be more interesting to learn about restaurants of different categories or ethnicities which are nonetheless relatively close. The interesting novel aspect of this type of query is that there are two competing criteria to be optimized: closeness and diversity. We propose two approaches that leverage the notion of linear skyline queries in order to find the k diverse nearest neighbors within a radius r from a given query point, or (k, r)-DNNs for short. Our proposed approaches return a relatively small set containing all optimal solutions for any linear combination of the weights a user could give to the two competing criteria, and we consider three different notions of diversity: spatial, categorical and angular. Our experiments, varying a number of parameters and exploring synthetic and real datasets, in both Euclidean space and road networks, respectively, show that our approaches are several orders of magnitude faster than a straightforward approach. 相似文献

19.

Bo Lang Bo Wu Yang Liu Xianglong Liu Boyu Zhang 《Multimedia Tools and Applications》2018,77(13):16177-16198

Similarity search in graph databases has been widely investigated. It is worthwhile to develop a fast algorithm to support similarity search in large-scale graph databases. In this paper, we investigate a k-NN (k-Nearest Neighbor) similarity search problem by locality sensitive hashing (LSH). We propose an innovative fast graph search algorithm named LSH-GSS, which first transforms complex graphs into vectorial representations based on prototypes in the database and later accelerates a query in Euclidean space by employing LSH. Because images can be represented as attributed graphs, we propose an approach to transform attributed graphs into n-dimensional vectors and apply LSH-GSS to execute further image retrieval. Experiments on three real graph datasets and two image datasets show that our methods are highly accurate and efficient. 相似文献

20.

Know your customer: computing <Emphasis Type="Italic">k</Emphasis>-most promising products for targeted marketing

Md. Saiful Islam Chengfei Liu 《The VLDB Journal The International Journal on Very Large Data Bases》2016,25(4):545-570

The advancement of World Wide Web has revolutionized the way the manufacturers can do business. The manufacturers can collect customer preferences for products and product features from their sales and other product-related Web sites to enter and sustain in the global market. For example, the manufactures can make intelligent use of these customer preference data to decide on which products should be selected for targeted marketing. However, the selected products must attract as many customers as possible to increase the possibility of selling more than their respective competitors. This paper addresses this kind of product selection problem. That is, given a database of existing products P from the competitors, a set of company’s own products Q, a dataset C of customer preferences and a positive integer k, we want to find k-most promising products (k-MPP) from Q with maximum expected number of total customers for targeted marketing. We model k-MPP query and propose an algorithmic framework for processing such query and its variants. Our framework utilizes grid-based data partitioning scheme and parallel computing techniques to realize k-MPP query. The effectiveness and efficiency of the framework are demonstrated by conducting extensive experiments with real and synthetic datasets. 相似文献