期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Efficient moving <Emphasis Type="Italic">k</Emphasis> nearest neighbor queries over line segment objects

Yu Gu Hui Zhang Zhigang Wang Ge Yu 《World Wide Web》2016,19(4):653-677

The growing need for location based services motivates the moving k nearest neighbor query (MkNN), which requires to find the k nearest neighbors of a moving query point continuously. In most existing solutions, data objects are abstracted as points. However, lots of real-world data objects, such as roads, rivers or pipelines, should be reasonably modeled as line segments or polyline segments. In this paper, we present LV*-Diagram to handle MkNN queries over line segment data objects. LV*-Diagram dynamically constructs a safe region. The query results remain unchanged if the query point is in the safe region, and hence, the computation cost of the server is greatly reduced. Experimental results show that our approach significantly outperforms the baseline method w.r.t. CPU load, I/O, and communication costs. 相似文献

2.

Monochromatic and bichromatic ranked reverse boolean spatial keyword nearest neighbors search

Pengpeng?Zhao Email author Hailin?Fang Victor?S.?Sheng Zhixu?Li Jiajie?Xu Jian?Wu Zhiming?Cui 《World Wide Web》2017,20(1):39-59

Recently, Reverse k Nearest Neighbors (RkNN) queries, returning every answer for which the query is one of its k nearest neighbors, have been extensively studied on the database research community. But the RkNN query cannot retrieve spatio-textual objects which are described by their spatial location and a set of keywords. Therefore, researchers proposed a RSTkNN query to find these objects, taking both spatial and textual similarity into consideration. However, the RSTkNN query cannot control the size of answer set and to be sorted according to the degree of influence on the query. In this paper, we propose a new problem Ranked Reverse Boolean Spatial Keyword Nearest Neighbors query called Ranked-RBSKNN query, which considers both spatial similarity and textual relevance, and returns t answers with most degree of influence. We propose a separate index and a hybrid index to process such queries efficiently. Experimental results on different real-world and synthetic datasets show that our approaches achieve better performance. 相似文献

3.

Multiple <Emphasis Type="Italic">k</Emphasis> nearest neighbor search

Yu-Chi Chung I-Fang Su Chiang Lee Pei-Chi Liu 《World Wide Web》2017,20(2):371-398

The problem of kNN (k Nearest Neighbor) queries has received considerable attention in the database and information retrieval communities. Given a dataset D and a kNN query q, the k nearest neighbor algorithm finds the closest k data points to q. The applications of kNN queries are board, not only in spatio-temporal databases but also in many areas. For example, they can be used in multimedia databases, data mining, scientific databases and video retrieval. The past studies of kNN query processing did not consider the case that the server may receive multiple kNN queries at one time. Their algorithms process queries independently. Thus, the server will be busy with continuously reaccessing the database to obtain the data that have already been acquired. This results in wasting I/O costs and degrading the performance of the whole system. In this paper, we focus on this problem and propose an algorithm named COrrelated kNN query Evaluation (COKE). The main idea of COKE is an “information sharing” strategy whereby the server reuses the query results of previously executed queries for efficiently processing subsequent queries. We conduct a comprehensive set of experiments to analyze the performance of COKE and compare it with the Best-First Search (BFS) algorithm. Empirical studies indicate that COKE outperforms BFS, and achieves lower I/O costs and less running time. 相似文献

4.

Continuous visible nearest neighbor query processing in spatial databases 总被引：1，自引：0，他引：1

Yunjun Gao Baihua Zheng Gencai Chen Qing Li Xiaofa Guo 《The VLDB Journal The International Journal on Very Large Data Bases》2011,20(3):371-396

In this paper, we identify and solve a new type of spatial queries, called continuous visible nearest neighbor (CVNN) search. Given a data set P, an obstacle set O, and a query line segment q in a two-dimensional space, a CVNN query returns a set of \({\langle p, R\rangle}\) tuples such that \({p \in P}\) is the nearest neighbor to every point r along the interval \({R \subseteq q}\) as well as p is visible to r. Note that p may be NULL, meaning that all points in P are invisible to all points in R due to the obstruction of some obstacles in O. In contrast to existing continuous nearest neighbor query, CVNN retrieval considers the impact of obstacles on visibility between objects, which is ignored by most of spatial queries. We formulate the problem, analyze its unique characteristics, and develop efficient algorithms for exact CVNN query processing. Our methods (1) utilize conventional data-partitioning indices (e.g., R-trees) on both P and O, (2) tackle the CVNN search by performing a single query for the entire query line segment, and (3) only access the data points and obstacles relevant to the final query result by employing a suite of effective pruning heuristics. In addition, several interesting variations of CVNN queries have been introduced, and they can be supported by our techniques, which further demonstrates the flexibility of the proposed algorithms. A comprehensive experimental evaluation using both real and synthetic data sets has been conducted to verify the effectiveness of our proposed pruning heuristics and the performance of our proposed algorithms. 相似文献

5.

Answering why-not and why questions on reverse top-<Emphasis Type="Italic">k</Emphasis> queries

Qing Liu Yunjun Gao Gang Chen Baihua Zheng Linlin Zhou 《The VLDB Journal The International Journal on Very Large Data Bases》2016,25(6):867-892

Why-not and why questions can be posed by database users to seek clarifications on unexpected query results. Specifically, why-not questions aim to explain why certain expected tuples are absent from the query results, while why questions try to clarify why certain unexpected tuples are present in the query results. This paper systematically explores the why-not and why questions on reverse top-k queries, owing to its importance in multi-criteria decision making. We first formalize why-not questions on reverse top-k queries, which try to include the missing objects in the reverse top-k query results, and then, we propose a unified framework called WQRTQ to answer why-not questions on reverse top-k queries. Our framework offers three solutions to cater for different application scenarios. Furthermore, we study why questions on reverse top-k queries, which aim to exclude the undesirable objects from the reverse top-k query results, and extend the framework WQRTQ to efficiently answer why questions on reverse top-k queries, which demonstrates the flexibility of our proposed algorithms. Extensive experimental evaluation with both real and synthetic data sets verifies the effectiveness and efficiency of the presented algorithms under various experimental settings. 相似文献

6.

Approximate Continuous Top-<Emphasis Type="Italic">k</Emphasis> Query over Sliding Window

下载免费PDF全文

Rui Zhu Bin Wang Shi-Ying Luo Xiao-Chun Yang Guo-Ren Wang 《计算机科学技术学报》2017,32(1):93-109

Continuous top-k query over sliding window is a fundamental problem in database, which retrieves k objects with the highest scores when the window slides. Existing studies mainly adopt exact algorithms to tackle this type of queries, whose key idea is to maintain a subset of objects in the window, and try to retrieve answers from it. However, all the existing algorithms are sensitive to query parameters and data distribution. In addition, they suffer from expensive overhead for incremental maintenance, and thus cannot satisfy real-time requirement. In this paper, we define a novel query named (ε, δ)-approximate continuous top-k query, which returns approximate answers for top-k query. In order to efficiently support this query, we propose an efficient framework, named PABF (Probabilistic Approximate Based Framework), to support approximate top-k query over sliding window. We firstly maintain a self-adaptive pruning value, which could filter out newly arrived objects who have a probability less than 1 ? δ of being a query result. For those objects that are not filtered, we combine them together, if the score difference among them is less than a threshold. To efficiently maintain these combined results, the framework PABF also proposes a multi-phase merging algorithm. Theoretical analysis indicates that even in the worst case, we require only logarithmic complexity for maintaining each candidate. 相似文献

7.

DART+: Direction-aware bichromatic reverse k nearest neighbor query processing in spatial databases

Kyoung-Won Lee Dong-Wan Choi Chin-Wan Chung 《Journal of Intelligent Information Systems》2014,43(2):349-377

This article presents a novel type of queries in spatial databases, called the direction-aware bichromatic reverse k nearest neighbor(DBRkNN) queries, which extend the bichromatic reverse nearest neighbor queries. Given two disjoint sets, P and S, of spatial objects, and a query object q in S, the DBRkNN query returns a subset P′ of P such that k nearest neighbors of each object in P′ include q and each object in P′ has a direction toward q within a pre-defined distance. We formally define the DBRkNN query, and then propose an efficient algorithm, called DART, for processing the DBRkNN query. Our method utilizes a grid-based index to cluster the spatial objects, and the B⁺-tree to index the direction angle. We adopt a filter-refinement framework that is widely used in many algorithms for reverse nearest neighbor queries. In the filtering step, DART eliminates all the objects that are away from the query object more than a pre-defined distance, or have an invalid direction angle. In the refinement step, remaining objects are verified whether the query object is actually one of the k nearest neighbors of them. As a major extension of DART, we also present an improved algorithm, called DART+, for DBRkNN queries. From extensive experiments with several datasets, we show that DART outperforms an R-tree-based naive algorithm in both indexing time and query processing time. In addition, our extension algorithm, DART+, also shows significantly better performance than DART. 相似文献

8.

Uncertain top-<Emphasis Type="Italic">k</Emphasis> query processing in distributed environments

Xite?Wang Email author Derong?Shen Ge?Yu 《Distributed and Parallel Databases》2016,34(4):567-589

The top-k query on uncertain data set has been a very hot topic these years, and there have been many studies on uncertain top-k queries. Unfortunately, most of the existing algorithms only consider centralized processing environments, and they are not suitable for the large-scale data. In this paper, it is the first attempt to process probabilistic threshold top-k queries (an important uncertain top-k query, PT-k for short) in a distributed environment. We propose 3 efficient algorithms. The serial distributed approach adopts a new method, which only requires a few amount of calculations, to serially process PT-k queries in distributed environments. The global sorting first algorithm for PT-k query processing (GSP) is designed for improving the computation speed. In GSP, a distributed sorting operation is performed, and then we compute the candidates for PT-k queries in parallel. The query results can be computed by using a novel incremental method which can reduce the number of calculations. The local filtering first algorithm for PT-k query processing is designed for reducing the network overhead. Specifically, several filtering strategies are proposed to filter out redundant data locally, and then the incremental method in GSP is used to process the PT-k queries. Finally, the effectiveness of our proposed algorithms is verified through a series of experiments. 相似文献

9.

Scalable and efficient processing of top-<Emphasis Type="Italic">k</Emphasis> multiple-type integrated queries

Hyuk-Yoon?Kwon Kyu-Young?Whang Email author 《World Wide Web》2016,19(6):1051-1075

In this paper, we define a new class of queries, the top-k multiple-type integrated query (simply, top-k MULTI query). It deals with multiple data types and finds the information in the order of relevance between the query and the object. Various data types such as spatial, textual, and relational data types can be used for the top-k MULTI query. The top-k MULTI query distinguishes itself from the traditional top-k query in that the component scores to calculate final scores are determined dependent of the query. Hence, each component score is calculated only when the query is given for each data type rather than being calculated apriori as in the top-k query. As a representative instance, the traditional top-k spatial keyword query is an instance of the top-k MULTI query. It deals with the spatial data type and text data type and finds the information based on spatial proximity and textual relevance between the query and the object, which is determined only when the query is given. In this paper, we first define the top-k MULTI query formally and define a new specific instance for the top-k MULTI query, the top-k spatial-keyword-relational(SKR) query, by integrating the relational data type into the traditional top-k spatial keyword query. Then, we investigate the processing approaches for the top-k MULTI query. We discuss the scalability of those approaches as new data types are integrated. We also devise the processing methods for the top-k SKR query. Finally, through extensive experiments on the top-k SKR query using real and synthetic data sets, we compare efficiency of the methods in terms of the query performance and storage. 相似文献

10.

Continuous <Emphasis Type="Italic">k</Emphasis> nearest neighbor queries over large multi-attribute trajectories: a systematic approach

Jianqiu Xu Ralf Hartmut Güting Yunjun Gao 《GeoInformatica》2018,22(4):723-766

相似文献

11.

<Emphasis Type="Italic">k</Emphasis>-most suitable locations selection

Yu-Chi Chung I-Fang Su Chiang Lee 《GeoInformatica》2018,22(4):661-692

Choosing the best location for starting a business or expanding an existing enterprize is an important issue. A number of location selection problems have been discussed in the literature. They often apply the Reverse Nearest Neighbor as the criterion for finding suitable locations. In this paper, we apply the Average Distance as the criterion and propose the so-called k-most suitable locations (k-MSL) selection problem. Given a positive integer k and three datasets: a set of customers, a set of existing facilities, and a set of potential locations. The k-MSL selection problem outputs k locations from the potential location set, such that the average distance between a customer and his nearest facility is minimized. In this paper, we formally define the k-MSL selection problem and show that it is NP-hard. We first propose a greedy algorithm which can quickly find an approximate result for users. Two exact algorithms are then proposed to find the optimal result. Several pruning rules are applied to increase computational efficiency. We evaluate the algorithms’ performance using both synthetic and real datasets. The results show that our algorithms are able to deal with the k-MSL selection problem efficiently. 相似文献

12.

Ranking continuous nearest neighbors for uncertain trajectories

Goce Trajcevski Roberto Tamassia Isabel F. Cruz Peter Scheuermann David Hartglass Christopher Zamierowski 《The VLDB Journal The International Journal on Very Large Data Bases》2011,20(5):767-791

This article addresses the problem of performing Nearest Neighbor (NN) queries on uncertain trajectories. The answer to an NN query for certain trajectories is time parameterized due to the continuous nature of the motion. As a consequence of uncertainty, there may be several objects that have a non-zero probability of being a nearest neighbor to a given querying object, and the continuous nature further complicates the semantics of the answer. We capture the impact that the uncertainty of the trajectories has on the semantics of the answer to continuous NN queries and we propose a tree structure for representing the answers, along with efficient algorithms to compute them. We also address the issue of performing NN queries when the motion of the objects is restricted to road networks. Finally, we formally define and show how to efficiently execute several variants of continuous NN queries. Our experiments demonstrate that the proposed algorithms yield significant performance improvements when compared with the corresponding naïve approaches. 相似文献

13.

Know your customer: computing <Emphasis Type="Italic">k</Emphasis>-most promising products for targeted marketing

Md. Saiful Islam Chengfei Liu 《The VLDB Journal The International Journal on Very Large Data Bases》2016,25(4):545-570

The advancement of World Wide Web has revolutionized the way the manufacturers can do business. The manufacturers can collect customer preferences for products and product features from their sales and other product-related Web sites to enter and sustain in the global market. For example, the manufactures can make intelligent use of these customer preference data to decide on which products should be selected for targeted marketing. However, the selected products must attract as many customers as possible to increase the possibility of selling more than their respective competitors. This paper addresses this kind of product selection problem. That is, given a database of existing products P from the competitors, a set of company’s own products Q, a dataset C of customer preferences and a positive integer k, we want to find k-most promising products (k-MPP) from Q with maximum expected number of total customers for targeted marketing. We model k-MPP query and propose an algorithmic framework for processing such query and its variants. Our framework utilizes grid-based data partitioning scheme and parallel computing techniques to realize k-MPP query. The effectiveness and efficiency of the framework are demonstrated by conducting extensive experiments with real and synthetic datasets. 相似文献

14.

Diverse nearest neighbors queries using linear skylines

Camila F. Costa Mario A. Nascimento Matthias Schubert 《GeoInformatica》2018,22(4):815-844

k-nearest neighbor (k-NN) queries are well-known and widely used in a plethora of applications. However, in the original definition of k-NN queries there is no concern regarding diversity of the answer set with respect to the user’s interests. For instance, travelers may be looking for touristic sites that are close to where they are, but that would also lead them to see different parts of the city. Likewise, if one is looking for restaurants close by, it may be more interesting to learn about restaurants of different categories or ethnicities which are nonetheless relatively close. The interesting novel aspect of this type of query is that there are two competing criteria to be optimized: closeness and diversity. We propose two approaches that leverage the notion of linear skyline queries in order to find the k diverse nearest neighbors within a radius r from a given query point, or (k, r)-DNNs for short. Our proposed approaches return a relatively small set containing all optimal solutions for any linear combination of the weights a user could give to the two competing criteria, and we consider three different notions of diversity: spatial, categorical and angular. Our experiments, varying a number of parameters and exploring synthetic and real datasets, in both Euclidean space and road networks, respectively, show that our approaches are several orders of magnitude faster than a straightforward approach. 相似文献

15.

Reverse Furthest Neighbors Query in Road Networks

下载免费PDF全文

Xiao-Jun Xu Jin-Song Bao Bin Yao Jing-Yu Zhou Fei-Long Tang Min-Yi Guo Jian-Qiu Xu 《计算机科学技术学报》2017,32(1):155-167

Given a road network G = (V,E), where V (E) denotes the set of vertices(edges) in G, a set of points of interest P and a query point q residing in G, the reverse furthest neighbors (Rfn _R) query in road networks fetches a set of points p ∈ P that take q as their furthest neighbor compared with all points in P ∪ {q}. This is the monochromatic Rfn _R (Mrfn _R) query. Another interesting version of Rfn _R query is the bichromatic reverse furthest neighbor (Brfn _R) query. Given two sets of points P and Q, and a query point q ∈ Q, a Brfn _R query fetches a set of points p ∈ P that take q as their furthest neighbor compared with all points in Q. This paper presents efficient algorithms for both Mrfn _R and Brfn _R queries, which utilize landmarks and partitioning-based techniques. Experiments on real datasets confirm the efficiency and scalability of proposed algorithms. 相似文献

16.

Mutually independent Hamiltonian cycles in dual-cubes 总被引：1，自引：0，他引：1

Yuan-Kang Shih Hui-Chun Chuang Shin-Shin Kao Jimmy J. M. Tan 《The Journal of supercomputing》2010,54(2):239-251

The hypercube family Q _n is one of the most well-known interconnection networks in parallel computers. With Q _n, dual-cube networks, denoted by DC _n, was introduced and shown to be a (n+1)-regular, vertex symmetric graph with some fault-tolerant Hamiltonian properties. In addition, DC _n’s are shown to be superior to Q _n’s in many aspects. In this article, we will prove that the n-dimensional dual-cube DC _n contains n+1 mutually independent Hamiltonian cycles for n≥2. More specifically, let v _i∈V(DC _n) for 0≤i≤|V(DC _n)|?1 and let \(\langle v_{0},v_{1},\ldots ,v_{|V(\mathit{DC}_{n})|-1},v_{0}\rangle\) be a Hamiltonian cycle of DC _n. We prove that DC _n contains n+1 Hamiltonian cycles of the form \(\langle v_{0},v_{1}^{k},\ldots,v_{|V(\mathit{DC}_{n})|-1}^{k},v_{0}\rangle\) for 0≤k≤n, in which v _i ^k ≠v _i ^k′ whenever k≠k′. The result is optimal since each vertex of DC _n has only n+1 neighbors. 相似文献

17.

The Number of Tree Stars Is <Emphasis Type="Italic">O</Emphasis><Superscript>*</Superscript>(1.357<Superscript><Emphasis Type="Italic">k</Emphasis></Superscript>)

Bernhard Fuchs Walter Kern Xinhui Wang 《Algorithmica》2007,49(3):232-244

Every rectilinear Steiner tree problem admits an optimal tree T ^* which is composed of tree stars. Moreover, the currently fastest algorithms for the rectilinear Steiner tree problem proceed by composing an optimum tree T ^* from tree star components in the cheapest way. The efficiency of such algorithms depends heavily on the number of tree stars (candidate components). Fößmeier and Kaufmann (Algorithmica 26, 68–99, 2000) showed that any problem instance with k terminals has a number of tree stars in between 1.32^k and 1.38^k (modulo polynomial factors) in the worst case. We determine the exact bound O ^*(ρ ^k) where ρ≈1.357 and mention some consequences of this result. 相似文献

18.

Exact and approximate flexible aggregate similarity search

Feifei?Li Ke?Yi Yufei?Tao Bin?Yao Email author Yang?Li Dong?Xie Min?Wang 《The VLDB Journal The International Journal on Very Large Data Bases》2016,25(3):317-338

Aggregate similarity search, also known as aggregate nearest-neighbor (Ann) query, finds many useful applications in spatial and multimedia databases. Given a group Q of M query objects, it retrieves from a database the objects most similar to Q, where the similarity is an aggregation (e.g., \({{\mathrm{sum}}}\), \(\max \)) of the distances between each retrieved object p and all the objects in Q. In this paper, we propose an added flexibility to the query definition, where the similarity is an aggregation over the distances between p and any subset of \(\phi M\) objects in Q for some support \(0< \phi \le 1\). We call this new definition flexible aggregate similarity search and accordingly refer to a query as a flexible aggregate nearest-neighbor ( Fann ) query. We present algorithms for answering Fann queries exactly and approximately. Our approximation algorithms are especially appealing, which are simple, highly efficient, and work well in both low and high dimensions. They also return near-optimal answers with guaranteed constant-factor approximations in any dimensions. Extensive experiments on large real and synthetic datasets from 2 to 74 dimensions have demonstrated their superior efficiency and high quality. 相似文献

19.

Flexible integration of multimedia sub-queries with qualitative preferences 总被引：1，自引：0，他引：1

Ilaria Bartolini Paolo Ciaccia Vincent Oria M. Tamer Özsu 《Multimedia Tools and Applications》2007,33(3):275-300

Complex multimedia queries, aiming to retrieve from large databases those objects that best match the query specification, are usually processed by splitting them into a set of m simpler sub-queries, each dealing with only some of the query features. To determine which are the overall best-matching objects, a rule is then needed to integrate the results of such sub-queries, i.e., how to globally rank the m-dimensional vectors of matching degrees, or partial scores, that objects obtain on the m sub-queries. It is a fact that state-of-the-art approaches all adopt as integration rule a scoring function, such as weighted average, that aggregates the m partial scores into an overall (numerical) similarity score, so that objects can be linearly ordered and only the highest scored ones returned to the user. This choice however forces the system to compromise between the different sub-queries and can easily lead to miss relevant results. In this paper we explore the potentialities of a more general approach, based on the use of qualitative preferences, able to define arbitrary partial (rather than only linear) orders on database objects, so that a larger flexibility is gained in shaping what the user is looking for. For the purpose of efficient evaluation, we propose two integration algorithms able to work with any (monotone) partial order (thus also with scoring functions): MPO, which delivers objects one layer of the partial order at a time, and iMPO, which can incrementally return one object at a time, thus also suitable for processing top k queries. Our analysis demonstrates that using qualitative preferences pays off. In particular, using Skyline and Region-prioritized Skyline preferences for queries on a real image database, we show that the results we get have a precision comparable to that obtainable using scoring functions, yet they are obtained much faster, saving up to about 70% database accesses. 相似文献

20.

Locally adaptive <Emphasis Type="Italic">k</Emphasis> parameter selection for nearest neighbor classifier: one nearest cluster

Faruk Bulut Mehmet Fatih Amasyali 《Pattern Analysis & Applications》2017,20(2):415-425

The k nearest neighbors (k-NN) classification technique has a worldly wide fame due to its simplicity, effectiveness, and robustness. As a lazy learner, k-NN is a versatile algorithm and is used in many fields. In this classifier, the k parameter is generally chosen by the user, and the optimal k value is found by experiments. The chosen constant k value is used during the whole classification phase. The same k value used for each test sample can decrease the overall prediction performance. The optimal k value for each test sample should vary from others in order to have more accurate predictions. In this study, a dynamic k value selection method for each instance is proposed. This improved classification method employs a simple clustering procedure. In the experiments, more accurate results are found. The reasons of success have also been understood and presented. 相似文献