1.

Multiple k nearest neighbor search

Yu-Chi Chung I-Fang Su Chiang Lee Pei-Chi Liu 《World Wide Web》2017,20(2):371-398

The problem of kNN (k Nearest Neighbor) queries has received considerable attention in the database and information retrieval communities. Given a dataset D and a kNN query q, the k nearest neighbor algorithm finds the k data points closest to q. The applications of kNN queries are broad, not only in spatio-temporal databases but also in many other areas. For example, they can be used in multimedia databases, data mining, scientific databases and video retrieval. Past studies of kNN query processing did not consider the case in which the server receives multiple kNN queries at one time. Their algorithms process queries independently. Thus, the server is kept busy repeatedly reaccessing the database to obtain data that have already been acquired. This wastes I/O and degrades the performance of the whole system. In this paper, we focus on this problem and propose an algorithm named COrrelated kNN query Evaluation (COKE). The main idea of COKE is an “information sharing” strategy whereby the server reuses the results of previously executed queries to process subsequent queries efficiently. We conduct a comprehensive set of experiments to analyze the performance of COKE and compare it with the Best-First Search (BFS) algorithm. Empirical studies indicate that COKE outperforms BFS, achieving lower I/O costs and shorter running time.
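
The kNN query defined in this abstract can be illustrated with a minimal linear-scan sketch. It is neither COKE nor BFS; the function name `knn`, the use of Euclidean distance, and the sample data are assumptions made only for illustration. Because each call rescans the whole dataset, it also shows the independent, per-query processing that the abstract identifies as wasteful.

```python
import heapq
import math


def knn(dataset, q, k):
    """Return the k points of `dataset` closest to `q`, nearest first.

    Assumes points are equal-length tuples of floats and that
    Euclidean distance is the metric (the abstract does not fix one).
    """
    return heapq.nsmallest(k, dataset, key=lambda p: math.dist(p, q))


# Hypothetical dataset D and two queries; each call scans D from scratch.
D = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (5.0, 5.0)]
for query in [(0.9, 0.9), (1.1, 1.1)]:
    print(query, knn(D, query, 2))
```

The two queries are close together and share a neighbor, which is the kind of overlap a result-sharing strategy could exploit.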

2.

Efficient moving k nearest neighbor queries over line segment objects

Yu Gu Hui Zhang Zhigang Wang Ge Yu 《World Wide Web》2016,19(4):653-677

The growing need for location based services motivates the moving k nearest neighbor query (MkNN), which requires finding the k nearest neighbors of a moving query point continuously. In most existing solutions, data objects are abstracted as points. However, many real-world data objects, such as roads, rivers or pipelines, are more reasonably modeled as line segments or polyline segments. In this paper, we present LV*-Diagram to handle MkNN queries over line segment data objects. LV*-Diagram dynamically constructs a safe region. The query results remain unchanged while the query point is in the safe region, and hence the computation cost of the server is greatly reduced. Experimental results show that our approach significantly outperforms the baseline method w.r.t. CPU load, I/O, and communication costs.

3.

Locally adaptive k parameter selection for nearest neighbor classifier: one nearest cluster

Faruk Bulut Mehmet Fatih Amasyali 《Pattern Analysis & Applications》2017,20(2):415-425

The k nearest neighbors (k-NN) classification technique is widely known for its simplicity, effectiveness, and robustness. As a lazy learner, k-NN is a versatile algorithm and is used in many fields. In this classifier, the k parameter is generally chosen by the user, and the optimal k value is found by experiments. The chosen constant k value is used during the whole classification phase. Using the same k value for every test sample can decrease the overall prediction performance. The optimal k value should vary from one test sample to another in order to obtain more accurate predictions. In this study, a dynamic k value selection method for each instance is proposed. This improved classification method employs a simple clustering procedure. In the experiments, more accurate results are found. The reasons for this success are also analyzed and presented.

4.

Exact bootstrap k-nearest neighbor learners

Brian M. Steele 《Machine Learning》2009,74(3):235-255

Bootstrap aggregation, or bagging, is a method of reducing the prediction error of a statistical learner. The goal of bagging is to construct a new learner which is the expectation of the original learner with respect to the empirical distribution function. In nearly all cases, the expectation cannot be computed analytically, and bootstrap sampling is used to produce an approximation. The k-nearest neighbor learners are exceptions to this generalization, and exact bagging of many k-nearest neighbor learners is straightforward. This article presents computationally simple and fast formulae for exact bagging of k-nearest neighbor learners and extends exact bagging methods from the conventional bootstrap sampling (sampling n observations with replacement from a set of n observations) to bootstrap sub-sampling schemes (with and without replacement). In addition, a partially exact k-nearest neighbor regression learner is developed. The article also compares the prediction error associated with elementary and exact bagging k-nearest neighbor learners, and several other ensemble methods using a suite of publicly available data sets.
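
To make the idea of exact bagging concrete, here is a sketch for the simplest case, k = 1, under conventional bootstrap sampling. It uses the standard order-statistic argument and is not necessarily the formulae given in the article. The i-th nearest training point is the nearest neighbor in a bootstrap sample exactly when none of the i − 1 closer points are drawn and at least one copy of point i is, which gives the weight (1 − (i−1)/n)^n − (1 − i/n)^n. The function names and the assumption of no distance ties are illustrative only.

```python
def bagged_1nn_weights(n):
    """Exact bagging weights for 1-NN under conventional bootstrap.

    Entry i-1 is the probability that the i-th nearest of n training
    points is the nearest neighbor within a bootstrap sample of size n
    drawn with replacement. Assumes no ties in distance.
    """
    return [(1 - (i - 1) / n) ** n - (1 - i / n) ** n for i in range(1, n + 1)]


def bagged_1nn_predict(targets_by_distance):
    """Exact bagged 1-NN regression prediction.

    `targets_by_distance` holds the training targets ordered from the
    nearest to the farthest training point relative to the query.
    """
    weights = bagged_1nn_weights(len(targets_by_distance))
    return sum(w * y for w, y in zip(weights, targets_by_distance))


# Hypothetical targets, nearest training point first.
print(bagged_1nn_weights(2))
print(bagged_1nn_predict([1.0, 3.0]))
```

The weights telescope to 1, so no Monte Carlo resampling is needed to obtain the bagged prediction in this special case.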

5.

Maximum k-Splittable s,t-Flows

Ronald Koch Martin Skutella Ines Spenke 《Theory of Computing Systems》2008,43(1):56-66

Given a graph with a source and a sink node, the NP-hard maximum k-splittable s,t-flow (MkSF) problem is to find a flow of maximum value from s to t with a flow decomposition using at most k paths. The multicommodity variant of this problem is a natural generalization of disjoint paths and unsplittable flow problems. Constructing a k-splittable flow requires two interdepending decisions. One has to decide on k paths (routing) and on the flow values for the paths (packing). We give efficient algorithms for computing exact and approximate solutions by decoupling the two decisions into a first packing step and a second routing step. Usually the routing is considered before the packing. Our main contributions are as follows: (i) We show that for constant k a polynomial number of packing alternatives containing at least one packing used by an optimal MkSF solution can be constructed in polynomial time. If k is part of the input, we obtain a slightly weaker result. In this case we can guarantee that, for any fixed ε>0, the computed set of alternatives contains a packing used by a (1−ε)-approximate solution. The latter result is based on the observation that (1−ε)-approximate flows only require constantly many different flow values. We believe that this observation is of interest in its own right. (ii) Based on (i), we prove that, for constant k, the MkSF problem can be solved in polynomial time on graphs of bounded treewidth. If k is part of the input, this problem is still NP-hard and we present a polynomial time approximation scheme for it.

6.

Continuous k nearest neighbor queries over large multi-attribute trajectories: a systematic approach

Jianqiu Xu Ralf Hartmut Güting Yunjun Gao 《GeoInformatica》2018,22(4):723-766

7.

Nonblocking k-Compare-Single-Swap

Victor Luchangco Mark Moir Nir Shavit 《Theory of Computing Systems》2009,44(1):39-66

The current literature offers two extremes of nonblocking software synchronization support for concurrent data structure design: intricate designs of specific structures based on single-location operations such as compare-and-swap (CAS), and general-purpose multilocation transactional memory implementations. While the former are sometimes efficient, they are invariably hard to extend and generalize. The latter are flexible and general, but costly. This paper aims at a middle ground: reasonably efficient multilocation operations that are general enough to reduce the design difficulties of algorithms based on CAS alone. We present an obstruction-free implementation of an atomic k-location-compare single-location-swap (KCSS) operation. KCSS allows for simple nonblocking manipulation of linked data structures by overcoming the key algorithmic difficulty in their design: making sure that while a pointer is being manipulated, neighboring parts of the data structure remain unchanged. Our algorithm is efficient in the common uncontended case: A successful k-location KCSS operation requires only two CAS operations, two stores, and 2k noncached loads when there is no contention. We therefore believe our results lend themselves to efficient and flexible nonblocking manipulation of list-based data structures in today’s architectures. A preliminary version of this paper appeared in the Proceedings of the Fifteenth Annual ACM Symposium on Parallel Algorithms and Architectures, pages 314–323, San Diego, California, USA, 2003.

8.

Categorical top-k spatial influence query

Jianye Yang Wenjie Zhang Ying Zhang Xiaoyang Wang Xuemin Lin 《World Wide Web》2017,20(2):175-203

The influence of a spatial facility object depicts the importance of the object in the whole data space. In this paper, we present a novel definition of object influence in applications where objects are of different categories. We study the problem of Spatial Influence Query which considers the contribution of an object in forming functional units consisting of a given set of objects with different categories designated by users. We first show that the problem of spatial influence query is NP-hard with respect to the number of object categories in the functional unit. To tackle the computational hardness, we develop an efficient framework following two main steps, possible participants finding and optimal functional unit computation. Based on this framework, for the first step, novel and efficient pruning techniques are developed based on the nearest neighbor set (NNS) approach. To find the optimal functional unit efficiently, we propose two algorithms, an exact algorithm and an efficient approximate algorithm with performance guarantee. Comprehensive experiments on both real and synthetic datasets demonstrate the effectiveness and efficiency of our techniques.

9.

Supporting top-k join queries in relational databases

Ihab F. Ilyas Walid G. Aref Ahmed K. Elmagarmid 《The VLDB Journal The International Journal on Very Large Data Bases》2004,13(3):207-221

Ranking queries, also known as top-k queries, produce results that are ordered on some computed score. Typically, these queries involve joins, where users are usually interested only in the top-k join results. Top-k queries are dominant in many emerging applications, e.g., multimedia retrieval by content, Web databases, data mining, middleware, and most information retrieval applications. Current relational query processors do not handle ranking queries efficiently, especially when joins are involved. In this paper, we address supporting top-k join queries in relational query processors. We introduce a new rank-join algorithm that makes use of the individual orders of its inputs to produce join results ordered on a user-specified scoring function. The idea is to rank the join results progressively during the join operation. We introduce two physical query operators based on variants of ripple join that implement the rank-join algorithm. The operators are nonblocking and can be integrated into pipelined execution plans. We also propose an efficient heuristic designed to optimize a top-k join query by choosing the best join order. We address several practical issues and optimization heuristics to integrate the new join operators in practical query processors. We implement the new operators inside a prototype database engine based on PREDATOR. The experimental evaluation of our approach compares recent algorithms for joining ranked inputs and shows superior performance. Received: 23 December 2003, Accepted: 31 March 2004, Published online: 12 August 2004. Edited by: S. Abiteboul. Extended version of the paper published in the Proceedings of the 29th International Conference on Very Large Databases, VLDB 2003, Berlin, Germany, pp 754-765.

10.

Uncertain top-k query processing in distributed environments

Xite Wang Derong Shen Ge Yu 《Distributed and Parallel Databases》2016,34(4):567-589

The top-k query on uncertain data sets has been a very active topic in recent years, and there have been many studies on uncertain top-k queries. Unfortunately, most of the existing algorithms only consider centralized processing environments, and they are not suitable for large-scale data. In this paper, we make the first attempt to process probabilistic threshold top-k queries (an important uncertain top-k query, PT-k for short) in a distributed environment. We propose three efficient algorithms. The serial distributed approach adopts a new method, which requires only a small amount of calculation, to serially process PT-k queries in distributed environments. The global sorting first algorithm for PT-k query processing (GSP) is designed to improve the computation speed. In GSP, a distributed sorting operation is performed, and then we compute the candidates for PT-k queries in parallel. The query results can be computed by using a novel incremental method which reduces the number of calculations. The local filtering first algorithm for PT-k query processing is designed to reduce the network overhead. Specifically, several filtering strategies are proposed to filter out redundant data locally, and then the incremental method in GSP is used to process the PT-k queries. Finally, the effectiveness of our proposed algorithms is verified through a series of experiments.

11.

A multi-resolution surface distance model for k-NN query processing

Ke Deng Xiaofang Zhou Heng Tao Shen Qing Liu Kai Xu Xuemin Lin 《The VLDB Journal The International Journal on Very Large Data Bases》2008,17(5):1101-1119

A spatial k-NN query returns k nearest points in a point dataset to a given query point. To measure the distance between two points, most of the literature focuses on the Euclidean distance or the network distance. For many applications, such as wildlife movement, it is necessary to consider the surface distance, which is computed from the shortest path along a terrain surface. In this paper, we investigate the problem of efficient surface k-NN (sk-NN) query processing. This is an important yet highly challenging problem because the underlying environment data can be very large and the computational cost of finding the shortest path on a surface can be very high. To minimize the amount of surface data to be used and the cost of surface distance computation, a multi-resolution surface distance model is proposed in this paper to take advantage of monotonic distance changes when the distances are computed at different resolution levels. Based on this innovative model, sk-NN queries can be processed efficiently by accessing and processing surface data at a just-enough resolution level within a just-enough search region. Our extensive performance evaluations using real world datasets confirm the efficiency of our proposed model.

12.

Multivariate fuzzy k-modes algorithm

Diêgo B. M. Maciel Getulio J. A. Amaral Renata M. C. R. de Souza Bruno A. Pimentel 《Pattern Analysis & Applications》2017,20(1):59-71

In fuzzy k-modes clustering, each individual has just one membership degree per class, which may not be sufficient to model the ambiguity of data precisely. It is known that multivariate thinking makes it possible to expose the inherent structure and meaning revealed within a set of classified variables. In this paper, a multivariate approach for membership degrees is presented to better handle ambiguous data that share properties of different clusters. This method is compared with other fuzzy k-modes methods from the literature based on a multivariate internal index that is also proposed in this paper. Synthetic and real categorical data sets are considered in this study.

13.

k-most suitable locations selection

Yu-Chi Chung I-Fang Su Chiang Lee 《GeoInformatica》2018,22(4):661-692

Choosing the best location for starting a business or expanding an existing enterprise is an important issue. A number of location selection problems have been discussed in the literature. They often apply the Reverse Nearest Neighbor as the criterion for finding suitable locations. In this paper, we apply the Average Distance as the criterion and propose the so-called k-most suitable locations (k-MSL) selection problem. Given a positive integer k and three datasets (a set of customers, a set of existing facilities, and a set of potential locations), the k-MSL selection problem outputs k locations from the potential location set, such that the average distance between a customer and his nearest facility is minimized. In this paper, we formally define the k-MSL selection problem and show that it is NP-hard. We first propose a greedy algorithm which can quickly find an approximate result for users. Two exact algorithms are then proposed to find the optimal result. Several pruning rules are applied to increase computational efficiency. We evaluate the algorithms’ performance using both synthetic and real datasets. The results show that our algorithms are able to deal with the k-MSL selection problem efficiently.
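
The k-MSL objective stated above can be illustrated with a generic greedy heuristic: repeatedly add the candidate location that most reduces the average customer-to-nearest-facility distance. This is a minimal sketch of one plausible greedy strategy, not the authors' algorithm, and it includes none of their pruning rules. The function names, the Euclidean metric, and the sample coordinates are assumptions.

```python
import math


def avg_nearest_distance(customers, facilities):
    """Average distance from each customer to its nearest facility."""
    total = sum(min(math.dist(c, f) for f in facilities) for c in customers)
    return total / len(customers)


def greedy_k_msl(customers, existing, candidates, k):
    """Pick up to k candidate locations one at a time, each time taking
    the one that minimizes the objective given earlier picks.

    A heuristic only: the result is not guaranteed to be optimal.
    """
    chosen = []
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        best = min(
            remaining,
            key=lambda p: avg_nearest_distance(customers, existing + chosen + [p]),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical data: three customers, one existing facility, three candidates.
customers = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
existing = [(5.0, 0.0)]
candidates = [(0.0, 0.0), (10.0, 0.0), (5.0, 1.0)]
print(greedy_k_msl(customers, existing, candidates, 2))
```

Each round re-evaluates every remaining candidate against all customers, so the sketch costs on the order of k times the number of candidates times the number of customers times the number of facilities. That cost is the kind of overhead pruning rules are meant to reduce.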

14.

Online k-Server Routing Problems

Vincenzo Bonifaci Leen Stougie 《Theory of Computing Systems》2009,45(3):470-485

In an online k-server routing problem, a crew of k servers has to visit points in a metric space as they arrive in real time. Possible objective functions include minimizing the makespan (k-Traveling Salesman Problem) and minimizing the sum of completion times (k-Traveling Repairman Problem). We give competitive algorithms, resource augmentation results and lower bounds for k-server routing problems in a wide class of metric spaces. In some cases the competitive ratio is dramatically better than that of the corresponding single server problem. Namely, we give a 1+O((log k)/k)-competitive algorithm for the k-Traveling Salesman Problem and the k-Traveling Repairman Problem when the underlying metric space is the real line. We also prove that a similar result cannot hold for the Euclidean plane. An extended abstract of this work has appeared in the proceedings of the 4th Workshop on Approximation and Online Algorithms, September 2006. Research of V. Bonifaci partly supported by the Dutch Ministry of Education, Culture and Science through a Huygens scholarship. Research of L. Stougie partly supported by MRT Network ADONET of the European Community (MRTN-CT-2003-504438) and the Dutch BSIK/BRICKS project.

15.

On the k-accessibility of cores of TU-cooperative games

V. A. Vasil’ev 《Automation and Remote Control》2017,78(12):2248-2264

This paper proposes a strengthening of the author’s core-accessibility theorem for balanced TU-cooperative games. The obtained strengthening relaxes the influence of the nontransitivity of classical domination α_v on the quality of the sequential improvement of dominated imputations in a game v. More specifically, we establish the k-accessibility of the core C(α_v) of any balanced TU-cooperative game v for all natural numbers k: for each dominated imputation x, there exists a converging sequence of imputations x₀, x₁,..., such that x₀ = x, lim x_r ∈ C(α_v) and x_{r−m} is dominated by any successive imputation x_r with m ∈ [1, k] and r ≥ m. For showing that the TU-property is essential to provide the k-accessibility of the core, we give an example of an NTU-cooperative game G with a “black hole” representing a nonempty closed subset B ⊆ G(N) of dominated imputations that contains all the α_G-monotonic sequential improvement trajectories originating at any point x ∈ B.

16.

Pseudo-Randomness of Certain Sequences of k Symbols with Length pq

Zhi-Xiong Chen Xiao-Ni Du Chen-Huang Wu 《计算机科学技术学报》2011,26(2):276-282

The theory of finite pseudo-random binary sequences was built by C. Mauduit and A. Sárközy and later extended to sequences of k symbols (or k-ary sequences). Certain constructions of pseudo-random sequences of k symbols over finite fields have been presented in the literature. In this paper, two families of sequences of k symbols are constructed by using the integers modulo pq for distinct odd primes p and q. Upper bounds on the well-distribution measure and the correlation measure of the families of sequences are presented in terms of certain character sums over modulo pq residue class rings. Lower bounds on the linear complexity profile are also estimated.

17.

Participant selection for t-sweep k-coverage crowd sensing tasks

Zhiyong Yu Jie Zhou Wenzhong Guo Longkun Guo Zhiwen Yu 《World Wide Web》2018,21(3):741-758

With the popularization of wireless networks and mobile intelligent terminals, mobile crowd sensing is becoming a promising sensing paradigm. Tasks are assigned to users with mobile devices, which then collect and submit ambient information to the server. The composition of participants greatly determines the quality and cost of the collected information. This paper aims to select the fewest participants needed to achieve the quality required by a sensing task. The requirement, namely “t-sweep k-coverage”, means that a target location should be sensed by at least k participants in every time interval of length t. The participant selection problem for “t-sweep k-coverage” crowd sensing tasks is NP-hard. Through delicate matrix stacking, linear programming can be adopted to solve the problem when it is small. We further propose a participant selection method based on a greedy strategy. The two methods are evaluated through simulated experiments using users’ call detail records. The results show that for small problems, both methods can find a participant set meeting the requirement. The number of participants picked by the greedy method is roughly twice that of the linear programming method. However, when problems become larger, the linear programming method becomes unstable, while the greedy method can still output a reasonable solution.

18.

Maximizing bichromatic reverse nearest neighbor for L_p-norm in two- and three-dimensional spaces

Raymond Chi-Wing Wong M. Tamer Özsu Ada Wai-Chee Fu Philip S. Yu Lian Liu Yubao Liu 《The VLDB Journal The International Journal on Very Large Data Bases》2011,20(6):893-919

Bichromatic reverse nearest neighbor (BRNN) has been extensively studied in spatial database literature. In this paper, we study a related problem called MaxBRNN: find an optimal region that maximizes the size of BRNNs for L_p-norm in two- and three-dimensional spaces. Such a problem has many real-life applications, including the problem of finding a new server point that attracts as many customers as possible by proximity. A straightforward approach is to determine the BRNNs for all possible points, which is not feasible since there is a large (or infinite) number of possible points. To the best of our knowledge, there are no existing algorithms which solve MaxBRNN for any L_p-norm in two- and three-dimensional spaces. Based on some interesting properties of the problem, we come up with an efficient algorithm called MaxOverlap to solve this problem. Extensive experiments are conducted to show that our algorithm is efficient.

19.

Algorithms for constrained k-nearest neighbor queries over moving object trajectories

Yunjun Gao Baihua Zheng Gencai Chen Qing Li 《GeoInformatica》2010,14(2):241-276

An important query for spatio-temporal databases is to find nearest trajectories of moving objects. Existing work on this topic focuses on the closest trajectories in the whole data space. In this paper, we introduce and solve constrained k-nearest neighbor (CkNN) queries and historical continuous CkNN (HCCkNN) queries on R-tree-like structures storing historical information about moving object trajectories. Given a trajectory set D, a query object (point or trajectory) q, a temporal extent T, and a constrained region CR, (i) a CkNN query over trajectories retrieves from D within T, the k (≥ 1) trajectories that lie closest to q and intersect (or are enclosed by) CR; and (ii) an HCCkNN query on trajectories retrieves the constrained k nearest neighbors (CkNNs) of q at any time instance of T. We propose a suite of algorithms for processing CkNN queries and HCCkNN queries respectively, with different properties and advantages. In particular, we thoroughly investigate two types of CkNN queries, i.e., CkNN_P and CkNN_T, which are defined with respect to stationary query points and moving query trajectories, respectively; and two types of HCCkNN queries, namely, HCCkNN_P and HCCkNN_T, which are continuous counterparts of CkNN_P and CkNN_T, respectively. Our methods utilize an existing data-partitioning index for trajectory data (i.e., TB-tree) to achieve low I/O and CPU cost. Extensive experiments with both real and synthetic datasets demonstrate the performance of the proposed algorithms in terms of efficiency and scalability.

20.

Protecting query privacy with differentially private k-anonymity in location-based services

Jinbao Wang Zhipeng Cai Yingshu Li Donghua Yang Ji Li Hong Gao 《Personal and Ubiquitous Computing》2018,22(3):453-469

Nowadays, location-based services (LBS) help people in daily life by answering LBS queries. However, privacy issues, including location privacy and query privacy, arise at the same time. Existing works for protecting query privacy either rely on trusted servers or fail to provide a sufficient privacy guarantee. This paper combines the concepts of differential privacy and k-anonymity to propose the notion of differentially private k-anonymity (DPkA) for query privacy in LBS. We identify the necessary and sufficient condition for the availability of 0-DPkA and show how to achieve it. For cases where 0-DPkA is not achievable, we propose an algorithm to achieve ε-DPkA with minimized ε. Extensive simulations are conducted to validate the proposed mechanisms based on real-life datasets and synthetic data distributions.