Similar Literature
20 similar documents found (search time: 97 ms)
1.
The k nearest neighbor (kNN) problem is to find the k nearest neighbors of a query point in a given data set. In this paper, a novel fast kNN search method using an orthogonal search tree is proposed. The proposed method builds an orthogonal search tree for a data set using an orthonormal basis derived from the data set. To find the kNN of a query point, the projections of the query point onto the orthogonal vectors of the basis, together with a node elimination inequality, are used to prune unlikely nodes. For a node that cannot be pruned, a point elimination inequality is further applied to reject impossible data points. Experimental results show that the proposed method performs well and consistently requires less computation time than existing kNN search algorithms, especially for data sets with a large number of points or a large standard deviation.
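The abstract does not state the exact elimination inequalities, but the core point-elimination idea can be illustrated: for any unit vector v, |v·q - v·x| is a lower bound on ||q - x||, so a single projection gap larger than the current k-th best distance rules a point out. The PCA-derived basis and the function name below are assumptions for illustration, not the authors' implementation.

```python
import heapq
import numpy as np

def knn_projection_pruning(data, query, k):
    """Illustrative sketch (not the paper's exact algorithm): reject a point x
    whenever one projection gap onto an orthonormal direction already exceeds
    the current k-th best distance, since |v.(q - x)| <= ||q - x||."""
    # Orthonormal basis derived from the data via PCA (one plausible choice).
    centered = data - data.mean(axis=0)
    _, _, basis = np.linalg.svd(centered, full_matrices=False)
    proj_data = data @ basis.T          # projections of all data points
    proj_query = query @ basis.T        # projections of the query

    heap = []  # max-heap of (-distance, index) holding the current k best
    for i, x in enumerate(data):
        if len(heap) == k:
            kth = -heap[0][0]
            # Point elimination: one projection gap larger than the current
            # k-th distance proves x cannot be among the kNN.
            if np.any(np.abs(proj_query - proj_data[i]) > kth):
                continue
        d = np.linalg.norm(query - x)
        if len(heap) < k:
            heapq.heappush(heap, (-d, i))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, i))
    return sorted((-nd, i) for nd, i in heap)   # (distance, index) pairs
```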

2.
In this paper, we present a versatile algorithm that can rapidly perform a variety of nearest neighbor searches. Efficiency is improved by using a distance lower bound to avoid computing the distance itself whenever the lower bound already exceeds the global minimum distance. At the preprocessing stage, the proposed algorithm constructs a lower bound tree (LB-tree) by agglomeratively clustering all the sample points to be searched. Given a query point, a lower bound on its distance to each sample point can be calculated from the internal nodes of the LB-tree. To reduce the number of lower bounds actually calculated, the winner-update search strategy is used for traversing the tree. For further efficiency improvement, data transformation can be applied to the sample and query points. In addition to finding the nearest neighbor, the proposed algorithm can also (i) provide the k nearest neighbors progressively; (ii) find the nearest neighbors within a specified distance threshold; and (iii) identify neighbors whose distances to the query are sufficiently close to that of the nearest neighbor. Our experiments show that the proposed algorithm saves substantial computation, particularly when the distance from the query point to its nearest neighbor is small relative to its distances to most other samples (as is the case in many object recognition problems).
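A minimal sketch of the winner-update idea follows, using a single-level grouping with centroid/radius lower bounds instead of the paper's multi-level LB-tree; the random grouping, the group count, and the function name are placeholders, not the authors' construction.

```python
import heapq
import itertools
import numpy as np

def nn_winner_update(data, query, n_groups=16, seed=0):
    """Winner-update sketch over a one-level lower bound structure (the paper
    builds a multi-level LB-tree by agglomerative clustering; the random
    grouping here is only a placeholder). A group's bound
    dist(q, centroid) - radius never exceeds the true distance to any of its
    members, by the triangle inequality."""
    tie = itertools.count()                       # tiebreaker for equal keys
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, n_groups, size=len(data))
    heap = []
    for g in range(n_groups):
        members = np.where(labels == g)[0]
        if len(members) == 0:
            continue
        centroid = data[members].mean(axis=0)
        radius = float(np.max(np.linalg.norm(data[members] - centroid, axis=1)))
        lb = max(float(np.linalg.norm(query - centroid)) - radius, 0.0)
        heapq.heappush(heap, (lb, next(tie), "group", members))
    while heap:
        key, _, kind, payload = heapq.heappop(heap)
        if kind == "point":
            return payload, key                   # winner: provably the NN
        for i in payload:                         # expand group: exact distances
            d = float(np.linalg.norm(query - data[i]))
            heapq.heappush(heap, (d, next(tie), "point", int(i)))
    return None
```

When a point is popped, every remaining key is at least its exact distance, which is why the winner-update strategy can stop without computing the remaining distances.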

3.
We present a distributed algorithm for implementing α-β search on a tree of processors. Each processor is an independent computer with its own memory and is connected by communication lines to each of its nearest neighbors. Measurements of the algorithm's performance on the Arachne distributed operating system are presented. A theoretical model is developed that predicts a speedup of at least order k^{1/2} with k processors.
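The distributed scheduling details are not reproducible from the abstract, but the sequential α-β core that such schemes parallelize is standard and sketched below; the `children` and `evaluate` callbacks are assumed interfaces, not part of the original paper.

```python
def alpha_beta(node, depth, alpha, beta, maximizing, children, evaluate):
    """Plain sequential alpha-beta pruning (the sequential core that the
    distributed algorithm in the abstract parallelizes across processors).
    `children(node)` and `evaluate(node)` are assumed callbacks."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    if maximizing:
        value = float("-inf")
        for child in kids:
            value = max(value, alpha_beta(child, depth - 1, alpha, beta,
                                          False, children, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:      # beta cutoff: the opponent avoids this node
                break
        return value
    value = float("inf")
    for child in kids:
        value = min(value, alpha_beta(child, depth - 1, alpha, beta,
                                      True, children, evaluate))
        beta = min(beta, value)
        if beta <= alpha:          # alpha cutoff
            break
    return value
```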

4.
A novel approach to k-nearest neighbor (k-NN) searching with the Euclidean metric is described. It is well known that many sophisticated algorithms cannot beat the brute-force algorithm when the dimensionality is high. In this study, a probably correct approach, in which the correct set of k nearest neighbors is obtained with high probability, is proposed to greatly reduce the search time. We exploit the marginal distribution of the k-th nearest neighbor in low dimensions, estimated from the stored data (an empirical percentile approach). We analyze the basic nature of this marginal distribution and show the advantage of the implemented algorithm, a probabilistic variant of partial distance search. Its query time is sublinear in the data size n, that is, O(mn^δ) with δ = o(1) in n and δ ≤ 1, for any fixed dimension m.
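The probabilistic, percentile-based variant is not specified in the abstract, but the deterministic partial distance search it builds on is easy to sketch: a running squared distance is abandoned as soon as it exceeds the current k-th best.

```python
import heapq

def knn_partial_distance(data, query, k):
    """Deterministic partial distance search (the abstract's method is a
    probabilistic variant of this idea; the percentile-based early stop is
    not reproduced here)."""
    heap = []  # max-heap via negated squared distances
    for idx, point in enumerate(data):
        bound = -heap[0][0] if len(heap) == k else float("inf")
        partial = 0.0
        for a, b in zip(point, query):
            partial += (a - b) ** 2
            if partial > bound:          # cannot enter the current top-k
                break
        else:
            if len(heap) < k:
                heapq.heappush(heap, (-partial, idx))
            else:
                heapq.heapreplace(heap, (-partial, idx))
    return sorted((-d, i) for d, i in heap)   # (squared distance, index) pairs
```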

5.
6.
In a wireless mobile environment, data broadcasting provides an efficient way to disseminate data. Via data broadcasting, a server can provide location-based services to a large client population. Among location-based services, the k nearest neighbors (kNN) search is important; it finds the k objects closest to a given point. However, the kNN search in a broadcast environment is particularly challenging due to the sequential access to the data on a broadcast channel. We propose protocols for the kNN search on a broadcast R-tree, a popular multi-dimensional index tree, in a wireless broadcast environment that are efficient in terms of latency, tuning time, and memory usage. We investigate how a server schedules the broadcast and provide the corresponding kNN search algorithms for the mobile clients. One of our kNN search protocols further allows a kNN search to start at an arbitrary time instant, skipping the wait for the beginning of a broadcast cycle and thereby reducing latency. The experimental results validate that our mechanisms achieve these objectives.
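The protocol details are not in the abstract; the sketch below only illustrates the client-side idea of dozing through broadcast subtrees whose bounding rectangles cannot improve the current answer. The stream format (pre-order entries carrying a subtree length) and all names are assumptions, not the paper's broadcast organization.

```python
import heapq

def mindist(query, mbr):
    """Minimum possible distance from a query point to a rectangle given as
    coordinate-wise (lo, hi) bounds."""
    lo, hi = mbr
    return sum(max(l - q, 0.0, q - h) ** 2 for q, l, h in zip(query, lo, hi)) ** 0.5

def knn_over_broadcast(stream, query, k):
    """Client-side kNN over a sequentially broadcast R-tree (assumed stream
    format): `stream` is a pre-order list where ("node", mbr, subtree_len)
    lets the client doze through `subtree_len` following entries when the
    MBR cannot contain a better neighbor, and ("point", coords, None)
    entries are data objects."""
    best, skip = [], 0                      # max-heap of the k best so far
    for kind, payload, subtree_len in stream:
        if skip > 0:                        # dozing through a pruned subtree
            skip -= 1
            continue
        kth = -best[0][0] if len(best) == k else float("inf")
        if kind == "node":
            if mindist(query, payload) > kth:
                skip = subtree_len          # prune: sleep until subtree ends
        else:
            d = sum((a - b) ** 2 for a, b in zip(query, payload)) ** 0.5
            if len(best) < k:
                heapq.heappush(best, (-d, payload))
            elif d < kth:
                heapq.heapreplace(best, (-d, payload))
    return sorted((-nd, p) for nd, p in best)
```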

7.
For some multimedia applications, it has been found that domain objects cannot be represented as feature vectors in a multidimensional space. Instead, pair-wise distances between data objects are the only input. To support content-based retrieval, one approach maps each object to a k-dimensional (k-d) point and tries to preserve the distances among the points. Existing spatial access methods such as R-trees and KD-trees can then support fast searching on the resulting k-d points. However, information loss is inevitable with such an approach, since the distances between data objects can only be preserved to a certain extent. Here we investigate the use of a distance-based indexing method; in particular, we apply the vantage point tree (vp-tree) method. Two important problems for the vp-tree method warrant further investigation: the n-nearest neighbors search and the updating mechanisms. We study an n-nearest neighbors search algorithm for the vp-tree, which experiments show to scale up well with the size of the dataset and the desired number of nearest neighbors, n. Experiments also show that searching in the vp-tree is more efficient than in the -tree and the M-tree. Next, we propose solutions to the update problem for the vp-tree and show by experiments that the algorithms are efficient and effective. Finally, we investigate the problem of selecting vantage points, propose a few alternative methods, and study their impact on the number of distance computations.
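A minimal vp-tree build and nearest neighbor search are sketched below. The random vantage-point choice is only the simplest baseline (the abstract studies more careful selection), and the dictionary-based tree layout is an illustration, not the paper's implementation.

```python
import random

def build_vptree(points, dist):
    """Minimal vp-tree: random vantage point, split at the median distance."""
    if not points:
        return None
    points = list(points)
    vp = points.pop(random.randrange(len(points)))
    if not points:
        return {"vp": vp, "mu": 0.0, "inside": None, "outside": None}
    dists = [dist(vp, p) for p in points]
    mu = sorted(dists)[len(dists) // 2]
    inside = [p for p, d in zip(points, dists) if d <= mu]
    outside = [p for p, d in zip(points, dists) if d > mu]
    return {"vp": vp, "mu": mu,
            "inside": build_vptree(inside, dist),
            "outside": build_vptree(outside, dist)}

def nn_search(node, query, dist, best=(float("inf"), None)):
    """Branch-and-bound NN search: a subtree is visited only if it can
    contain a point closer than the best distance found so far."""
    if node is None:
        return best
    d = dist(query, node["vp"])
    if d < best[0]:
        best = (d, node["vp"])
    near, far = (("inside", "outside") if d <= node["mu"]
                 else ("outside", "inside"))
    best = nn_search(node[near], query, dist, best)
    if abs(d - node["mu"]) < best[0]:   # far side may still hold a closer point
        best = nn_search(node[far], query, dist, best)
    return best
```

Note that only the supplied `dist` function is ever used, which is why this index works when pair-wise distances are the only available information.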

8.
Let P be a set of n colored points distributed arbitrarily in R^2. The chromatic distribution of the k nearest neighbors of a query line segment ℓ reports the number of points of each color among the k points nearest to ℓ. While solving this problem, we encountered another interesting problem, the semicircular range counting query: given a set of n points, report the number of points inside a given semicircular range. We propose a simple algorithm for this problem with O(n^3) preprocessing time and space and O(log n) query time. Finally, we propose an algorithm for reporting the chromatic distribution of the k nearest neighbors of a query line segment. Using our technique for the semicircular range counting query, it runs in O(log^2 n) time.

9.
The k nearest neighbor (kNN) rule is an effective and powerful lazy learning algorithm that is also easy to implement. However, its performance relies heavily on the quality of the training data. In many complex real-world applications, noise from various sources is prevalent in large-scale databases, and eliminating anomalies to improve data quality remains a challenge. To alleviate this problem, we propose a new anomaly removal and learning algorithm under the kNN framework. The primary characteristic of our method is that the evidence for removing anomalies and for predicting the class labels of unseen instances is mutual nearest neighbors rather than k nearest neighbors. The advantage is that pseudo nearest neighbors can be identified and excluded from the prediction process, making the final learning result more credible. An extensive comparative experimental analysis on UCI datasets provides empirical evidence of the effectiveness of the proposed method in enhancing the performance of the kNN rule.
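The abstract does not give the exact rule, so the sketch below is one plausible reading of mutual-nearest-neighbor voting: keep only training points that are among the query's kNN and simultaneously have the query among their own kNN. All function names and the fallback behavior are assumptions.

```python
from collections import Counter
import numpy as np

def knn_indices(data, point, k):
    """Indices of the k points in `data` nearest to `point` (Euclidean)."""
    d = np.linalg.norm(data - point, axis=1)
    return np.argsort(d)[:k]

def mutual_nn_predict(train_x, train_y, query, k):
    """Illustrative mutual-nearest-neighbor vote, not the authors' exact rule:
    candidates that do not reciprocate the neighbor relation are treated as
    pseudo nearest neighbors and ignored."""
    candidates = knn_indices(train_x, query, k)
    augmented = np.vstack([train_x, query])      # query gets index len(train_x)
    mutual = [i for i in candidates
              if len(train_x) in knn_indices(augmented, train_x[i], k + 1)]
    votes = train_y[mutual] if mutual else train_y[candidates]  # fall back to kNN
    return Counter(votes.tolist()).most_common(1)[0][0]
```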

10.
《Pattern recognition letters》2003,24(9-10):1437-1451
In this paper, the efficiency of branch and bound search algorithms for computing the K nearest neighbors is studied. The most important aspects influencing the efficiency of the search algorithm are: (1) the decomposition method, (2) the elimination rule, (3) the traversal order, and (4) the level of decomposition. First, a theoretical derivation of an efficient decomposition method based on principal component analysis is given. Then, different elimination rules and traversal orders are combined, resulting in ten different search algorithms. Since the efficiency depends strongly on the level of decomposition, this user-specified parameter is optimized first. The optimization uses a probabilistic model that expresses the total computation time as a function of the node traversal cost and the distance computation cost. All comparisons are based on the total computation time at the optimal level of decomposition.
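The sketch below shows one combination of the ingredients named in the abstract: PCA-based median splits for the decomposition, a centroid/radius ball bound as the elimination rule, and nearest-first traversal. It is an illustrative instance under those assumptions, not one of the paper's ten evaluated algorithms.

```python
import numpy as np

def build_pca_tree(points, leaf_size=16):
    """Decomposition by recursive median split along the leading principal
    component; each node stores a centroid and radius for the elimination rule."""
    centroid = points.mean(axis=0)
    radius = float(np.max(np.linalg.norm(points - centroid, axis=1)))
    node = {"centroid": centroid, "radius": radius, "points": None,
            "left": None, "right": None}
    if len(points) <= leaf_size:
        node["points"] = points
        return node
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    proj = (points - centroid) @ vt[0]
    mask = proj <= np.median(proj)
    if mask.all() or not mask.any():               # degenerate split: make leaf
        node["points"] = points
        return node
    node["left"] = build_pca_tree(points[mask], leaf_size)
    node["right"] = build_pca_tree(points[~mask], leaf_size)
    return node

def knn_bb(node, query, k, best=None):
    """Nearest-first branch and bound: a node is eliminated when
    dist(q, centroid) - radius cannot beat the current k-th distance."""
    if best is None:
        best = []                                  # sorted list of (dist, point)
    d_c = float(np.linalg.norm(query - node["centroid"]))
    if len(best) == k and d_c - node["radius"] >= best[-1][0]:
        return best                                # elimination rule
    if node["points"] is not None:                 # leaf: check points directly
        for p in node["points"]:
            d = float(np.linalg.norm(query - p))
            if len(best) < k or d < best[-1][0]:
                best.append((d, p))
                best.sort(key=lambda t: t[0])
                del best[k:]
        return best
    kids = sorted((node["left"], node["right"]),
                  key=lambda c: np.linalg.norm(query - c["centroid"]))
    for child in kids:                             # nearest-first traversal order
        best = knn_bb(child, query, k, best)
    return best
```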

11.
This note presents a simplification and generalization of an algorithm for searching k-dimensional trees for nearest neighbors reported by Friedman et al. [3]. If the distance between records is measured using L2, the Euclidean norm, the data structure used by the algorithm to determine the bounds of the search space can be simplified to a single number. Moreover, because distance measurements in L2 are rotationally invariant, the algorithm can be generalized to allow a partition plane to have an arbitrary orientation, rather than insisting that it be perpendicular to a coordinate axis, as in the original algorithm. When a k-dimensional tree is built, this plane can be found from the principal eigenvector of the covariance matrix of the records to be partitioned. These techniques and others yield variants of k-dimensional trees customized for specific applications. It is wrong to assume that k-dimensional trees guarantee that a nearest-neighbor query completes in logarithmic expected time. For small k, logarithmic behavior is observed on all but tiny trees. However, for larger k, logarithmic behavior is achievable only with extremely large numbers of records. For k = 16, a search of a k-dimensional tree of 76,000 records examines almost every record.
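For reference, a minimal axis-aligned k-d tree build and nearest neighbor search are sketched below; the note's generalization to arbitrarily oriented split planes and its single-number bound simplification are not reproduced.

```python
import numpy as np

def build_kdtree(points, depth=0):
    """Classic axis-aligned k-d tree (the note generalizes the split plane to
    arbitrary orientations via the principal eigenvector; only the standard
    version is sketched here)."""
    if len(points) == 0:
        return None
    axis = depth % points.shape[1]
    points = points[points[:, axis].argsort()]
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build_kdtree(points[:mid], depth + 1),
            "right": build_kdtree(points[mid + 1:], depth + 1)}

def kdtree_nn(node, query, best=(float("inf"), None)):
    """NN search with the usual bound: the far subtree is visited only if the
    splitting plane is closer to the query than the best distance so far."""
    if node is None:
        return best
    d = float(np.linalg.norm(query - node["point"]))
    if d < best[0]:
        best = (d, node["point"])
    axis = node["axis"]
    diff = query[axis] - node["point"][axis]
    near, far = ("left", "right") if diff <= 0 else ("right", "left")
    best = kdtree_nn(node[near], query, best)
    if abs(diff) < best[0]:        # plane distance bounds the far side from below
        best = kdtree_nn(node[far], query, best)
    return best
```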

12.
13.
Due to the advancement of wireless internet and mobile positioning technology, location-based services (LBSs) have become popular among mobile users. Since users have to send their exact locations to obtain such services, several privacy threats arise. To address this, cloaking methods have been proposed that blur a user's exact location into a cloaked spatial region satisfying a required privacy threshold (k). With the cloaked region, an LBS server can carry out a k-nearest neighbor (k-NN) search. Some recent studies have proposed methods to search the k nearest POIs while protecting a user's privacy, but they suffer from at least one major problem, such as inefficient query processing or low precision of the retrieved result. To resolve these problems, in this paper we propose a novel k-NN query processing algorithm for a cloaking region that satisfies both requirements: fast query processing time and high precision of the retrieved result. To achieve fast query processing, we propose a new pruning technique based on a 2D-coordinate scheme and make use of a Voronoi diagram for retrieving the nearest POIs efficiently. To satisfy the precision requirement, we guarantee that the result of our k-NN query processing algorithm always contains the exact set of k nearest neighbors. Our performance analysis shows that our algorithm achieves better performance in terms of query processing time and the number of candidate POIs compared with other algorithms.

14.
Given a set D of trajectories, a query object q, and a query time extent Γ, a mutual (i.e., symmetric) nearest neighbor (MNN) query over trajectories finds from D the set of trajectories that are among the k1 nearest neighbors (NNs) of q within Γ and, at the same time, have q as one of their k2 NNs. This type of query is useful in many applications such as decision making, data mining, and pattern recognition, as it considers both the proximity of the trajectories to q and the proximity of q to the trajectories. In this paper, we first formalize MNN search and identify its characteristics, and then develop several algorithms for processing MNN queries efficiently. In particular, we investigate two classes of MNN queries, MNNP and MNNT queries, which are defined with respect to stationary query points and moving query trajectories, respectively. Our methods use batch processing and reuse techniques to significantly reduce the I/O cost (i.e., the number of node/page accesses) and the CPU time. In addition, we extend our techniques to historical continuous MNN (HCMNN) search for moving object trajectories, which returns the mutual nearest neighbors of q (for specified k1 and k2) at any time instant of Γ. Extensive experiments with real and synthetic datasets demonstrate the efficiency and scalability of our proposed algorithms.

15.
In this paper, a new algorithm is developed to reduce the computational complexity of Ward's method. The proposed approach uses a dynamic k-nearest-neighbor list to avoid determining a cluster's nearest neighbor at some steps of the cluster merging. The double linked algorithm (DLA) can significantly reduce the computing time of the fast pairwise nearest neighbor (FPNN) algorithm by obtaining an approximate solution to hierarchical agglomerative clustering. We propose a method that resolves DLA's non-optimal solutions while keeping its advantage of low computational complexity. The computational complexity of the proposed method, DKNNA+FS (dynamic k-nearest-neighbor algorithm with a fast search), in terms of the number of distance calculations is O(N^2), where N is the number of data points. Compared to FPNN with a fast search (FPNN+FS), the proposed method using the same fast search algorithm (DKNNA+FS) reduces the computing time by a factor of 1.90-2.18 for a data set from a real image and by a factor of 1.92-2.02 for a data set generated from three images. Compared to DLA with a fast search (DLA+FS), DKNNA+FS decreases the average mean squared error by 1.26% for the same data set.
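For context, the sketch below shows the naive Ward merging that these methods accelerate: at each step, merge the pair of clusters with the smallest increase in within-cluster variance. The dynamic k-nearest-neighbor bookkeeping that is the paper's contribution is deliberately omitted; the function names are illustrative.

```python
import numpy as np

def ward_cost(ca, na, cb, nb):
    """Increase in within-cluster variance if clusters (centroid, size) a and b
    are merged: the quantity Ward's method minimizes at each step."""
    return na * nb / (na + nb) * float(np.sum((ca - cb) ** 2))

def ward_agglomerate(points, n_clusters):
    """Naive Ward merging used as a reference (full quadratic scan per merge).
    The paper avoids this scan with a dynamic k-nearest-neighbor list."""
    clusters = [(p.astype(float), 1) for p in points]   # (centroid, size)
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost(*clusters[i], *clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (ci, ni), (cj, nj) = clusters[i], clusters[j]
        merged = ((ci * ni + cj * nj) / (ni + nj), ni + nj)
        clusters = [c for m, c in enumerate(clusters) if m not in (i, j)]
        clusters.append(merged)
    return clusters
```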

16.
17.
Text categorization is a significant technique for managing the surging volume of text data on the Internet. The k-nearest neighbors (kNN) algorithm is an effective, but not efficient, classification model for text categorization. In this paper, we propose an effective strategy to accelerate the standard kNN, based on a simple principle: points that are near in space are usually also near when projected onto a direction, which means that points far apart in the projection direction are also far apart in the original space. Using the proposed strategy, most of the irrelevant points can be removed when searching for the k nearest neighbors of a query point, which greatly decreases the computational cost. Experimental results show that the proposed strategy greatly improves the time performance of the standard kNN with little degradation in accuracy. In particular, it is superior in applications with large, high-dimensional datasets.
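A sketch of the projection-filtering idea follows. The choice of projection direction (first principal component) and the shortlist multiplier `widen` are assumptions; the paper's exact selection rule is not given in the abstract, and like the described strategy this filter trades a small amount of accuracy for speed.

```python
import numpy as np

def knn_projection_filter(data, labels, query, k, widen=4):
    """Projection filtering sketch: points whose projection onto a direction
    is far from the query's projection cannot be near in the original space,
    so only the points with the closest projections are examined."""
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction = vt[0]                                 # assumed projection direction
    proj = data @ direction
    q_proj = query @ direction
    # Shortlist of candidates with the closest projections to the query.
    shortlist = np.argsort(np.abs(proj - q_proj))[:widen * k]
    dists = np.linalg.norm(data[shortlist] - query, axis=1)
    nearest = shortlist[np.argsort(dists)[:k]]
    return labels[nearest]                            # e.g., vote over these labels
```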

18.
We establish a refined search tree technique for the parameterized DOMINATING SET problem on planar graphs. Here, we are given an undirected graph and ask for a set of at most k vertices such that every other vertex has at least one neighbor in this set. We describe algorithms with running times O(8^k n) and O(8^k k + n^3), where n is the number of vertices in the graph, based on bounded search trees. We describe a set of polynomial-time data-reduction rules for a more general "annotated" problem on black/white graphs that asks for a set of k vertices (black or white) that dominate all the black vertices. An intricate argument based on the Euler formula then establishes an efficient branching strategy for reduced inputs to this problem. In addition, we give a family of examples showing that the bound of the branching theorem is optimal with respect to our reduction rules. Our final search tree algorithm is easy to implement; its analysis, however, is involved.
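The refined planar branching depends on the reduction rules and the Euler-formula argument, which the abstract does not spell out, but the underlying bounded search tree idea can be sketched for general graphs: some vertex of the closed neighborhood of an undominated vertex must be in the solution, so branch over that neighborhood with the budget decreasing each time.

```python
def dominating_set_bst(adj, k):
    """Plain bounded search tree for DOMINATING SET (not the paper's refined
    planar branching). `adj` maps each vertex to the set of its neighbors.
    Returns a dominating set of size at most k, or None if none exists
    within the budget explored."""
    def closed(v):
        return adj[v] | {v}

    def search(chosen, dominated, budget):
        undominated = [v for v in adj if v not in dominated]
        if not undominated:
            return chosen                      # every vertex is dominated
        if budget == 0:
            return None                        # budget exhausted on this branch
        v = min(undominated, key=lambda u: len(closed(u)))  # small branching set
        for w in closed(v):                    # some vertex of N[v] must be picked
            result = search(chosen | {w}, dominated | closed(w), budget - 1)
            if result is not None:
                return result
        return None

    return search(frozenset(), frozenset(), k)
```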

19.
An effective local search for the maximum clique problem
We propose a variable depth search based algorithm, called k-opt local search (KLS), for the maximum clique problem. KLS efficiently explores the k-opt neighborhood, defined as the set of neighbors that can be obtained by a sequence of add and drop moves that are adaptively changed within the feasible search space. Computational results on DIMACS benchmark graphs indicate that KLS finds considerably satisfactory cliques with reasonable running times in comparison with state-of-the-art metaheuristics.
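KLS itself (the variable-depth sequencing of moves) cannot be reconstructed from the abstract, but the add and drop moves it builds on are simple; the sketch below is a basic randomized add/drop local search, with the graph given as an adjacency map. The iteration budget and function name are assumptions.

```python
import random

def greedy_clique_local_search(adj, iterations=1000, seed=0):
    """Basic add/drop local search for maximum clique (only the move types
    that KLS builds on; the variable-depth k-opt sequencing is not shown).
    `adj` maps each vertex to the set of its neighbors."""
    rng = random.Random(seed)
    clique, best = set(), set()
    for _ in range(iterations):
        # Add moves: vertices adjacent to every current clique member.
        candidates = [v for v in adj
                      if v not in clique and clique <= adj[v]]
        if candidates:
            clique.add(rng.choice(candidates))
        elif clique:
            clique.remove(rng.choice(sorted(clique)))   # drop move to escape
        if len(clique) > len(best):
            best = set(clique)
    return best
```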

20.
Given a multidimensional point q, a reverse k nearest neighbor (RkNN) query retrieves all the data points that have q as one of their k nearest neighbors. Existing methods for processing such queries have at least one of the following deficiencies: they (i) do not support arbitrary values of k, (ii) cannot deal efficiently with database updates, (iii) are applicable only to 2D data but not to higher dimensionality, and (iv) retrieve only approximate results. Motivated by these shortcomings, we develop algorithms for exact RkNN processing with arbitrary values of k on dynamic, multidimensional datasets. Our methods utilize a conventional data-partitioning index on the dataset and do not require any pre-computation. As a second step, we extend the proposed techniques to continuous RkNN search, which returns the RkNN results for every point on a line segment. We evaluate the effectiveness of our algorithms with extensive experiments using both real and synthetic datasets.
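The paper's index-based pruning is not described in the abstract; the brute-force reference below only pins down the RkNN semantics the algorithms must reproduce.

```python
import numpy as np

def reverse_knn(data, query, k):
    """Brute-force reference for RkNN semantics (the paper's index-based
    pruning is not reproduced): return indices of all data points that have
    the query among their own k nearest neighbors."""
    result = []
    for i, p in enumerate(data):
        # Distances from p to every other data point (excluding itself).
        others = np.delete(data, i, axis=0)
        d_others = np.linalg.norm(others - p, axis=1)
        d_query = np.linalg.norm(query - p)
        # p is a reverse kNN of the query if fewer than k data points are
        # strictly closer to p than the query is.
        if np.sum(d_others < d_query) < k:
            result.append(i)
    return result
```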

