Similar Literature
20 similar documents found (search time: 97 ms)
1.
The k nearest neighbor (kNN) problem is to find the k nearest neighbors of a query point in a given data set. In this paper, a novel fast kNN search method using an orthogonal search tree is proposed. The proposed method builds an orthogonal search tree for a data set using an orthonormal basis derived from the data set. To find the kNN of a query point, the projections of the query point onto the orthogonal vectors of the basis, together with a node elimination inequality, are used to prune unlikely nodes. For a node that cannot be pruned, a point elimination inequality is further applied to reject impossible data points. Experimental results show that the proposed method performs well and consistently requires less computation time than existing kNN search algorithms, especially for data sets with a large number of points or a large standard deviation.
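The abstract does not state the exact elimination inequalities, but the core point-elimination idea can be illustrated: for any unit vector v, |v·q - v·x| is a lower bound on ||q - x||, so a single projection gap larger than the current k-th best distance rules a point out. The PCA-derived basis and the function name below are assumptions for illustration, not the authors' implementation.

```python
import heapq
import numpy as np

def knn_projection_pruning(data, query, k):
    """Illustrative sketch (not the paper's exact algorithm): reject a point x
    whenever one projection gap onto an orthonormal direction already exceeds
    the current k-th best distance, since |v.(q - x)| <= ||q - x||."""
    # Orthonormal basis derived from the data via PCA (one plausible choice).
    centered = data - data.mean(axis=0)
    _, _, basis = np.linalg.svd(centered, full_matrices=False)
    proj_data = data @ basis.T          # projections of all data points
    proj_query = query @ basis.T        # projections of the query

    heap = []  # max-heap of (-distance, index) holding the current k best
    for i, x in enumerate(data):
        if len(heap) == k:
            kth = -heap[0][0]
            # Point elimination: one projection gap larger than the current
            # k-th distance proves x cannot be among the kNN.
            if np.any(np.abs(proj_query - proj_data[i]) > kth):
                continue
        d = np.linalg.norm(query - x)
        if len(heap) < k:
            heapq.heappush(heap, (-d, i))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, i))
    return sorted((-nd, i) for nd, i in heap)   # (distance, index) pairs
```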

2.
In this paper, we present a versatile algorithm that can rapidly perform a variety of nearest neighbor searches. Efficiency is improved by using a distance lower bound to avoid computing the distance itself whenever the lower bound already exceeds the global minimum distance. At the preprocessing stage, the proposed algorithm constructs a lower bound tree (LB-tree) by agglomeratively clustering all the sample points to be searched. Given a query point, a lower bound on its distance to each sample point can be calculated from the internal nodes of the LB-tree. To reduce the number of lower bounds actually calculated, the winner-update search strategy is used for traversing the tree. For further efficiency improvement, data transformation can be applied to the sample and query points. In addition to finding the nearest neighbor, the proposed algorithm can also (i) provide the k nearest neighbors progressively; (ii) find the nearest neighbors within a specified distance threshold; and (iii) identify neighbors whose distances to the query are sufficiently close to that of the nearest neighbor. Our experiments show that the proposed algorithm saves substantial computation, particularly when the distance from the query point to its nearest neighbor is small relative to its distances to most other samples (as is the case in many object recognition problems).
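A minimal sketch of the winner-update idea follows, using a single-level grouping with centroid/radius lower bounds instead of the paper's multi-level LB-tree; the random grouping, the group count, and the function name are placeholders, not the authors' construction.

```python
import heapq
import itertools
import numpy as np

def nn_winner_update(data, query, n_groups=16, seed=0):
    """Winner-update sketch over a one-level lower bound structure (the paper
    builds a multi-level LB-tree by agglomerative clustering; the random
    grouping here is only a placeholder). A group's bound
    dist(q, centroid) - radius never exceeds the true distance to any of its
    members, by the triangle inequality."""
    tie = itertools.count()                       # tiebreaker for equal keys
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, n_groups, size=len(data))
    heap = []
    for g in range(n_groups):
        members = np.where(labels == g)[0]
        if len(members) == 0:
            continue
        centroid = data[members].mean(axis=0)
        radius = float(np.max(np.linalg.norm(data[members] - centroid, axis=1)))
        lb = max(float(np.linalg.norm(query - centroid)) - radius, 0.0)
        heapq.heappush(heap, (lb, next(tie), "group", members))
    while heap:
        key, _, kind, payload = heapq.heappop(heap)
        if kind == "point":
            return payload, key                   # winner: provably the NN
        for i in payload:                         # expand group: exact distances
            d = float(np.linalg.norm(query - data[i]))
            heapq.heappush(heap, (d, next(tie), "point", int(i)))
    return None
```

When a point is popped, every remaining key is at least its exact distance, which is why the winner-update strategy can stop without computing the remaining distances.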

3.
We present a distributed algorithm for implementing α-β search on a tree of processors. Each processor is an independent computer with its own memory and is connected by communication lines to each of its nearest neighbors. Measurements of the algorithm's performance on the Arachne distributed operating system are presented. A theoretical model is developed that predicts a speedup of at least order k^{1/2} with k processors.
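The distributed scheduling details are not reproducible from the abstract, but the sequential α-β core that such schemes parallelize is standard and sketched below; the `children` and `evaluate` callbacks are assumed interfaces, not part of the original paper.

```python
def alpha_beta(node, depth, alpha, beta, maximizing, children, evaluate):
    """Plain sequential alpha-beta pruning (the sequential core that the
    distributed algorithm in the abstract parallelizes across processors).
    `children(node)` and `evaluate(node)` are assumed callbacks."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    if maximizing:
        value = float("-inf")
        for child in kids:
            value = max(value, alpha_beta(child, depth - 1, alpha, beta,
                                          False, children, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:      # beta cutoff: the opponent avoids this node
                break
        return value
    value = float("inf")
    for child in kids:
        value = min(value, alpha_beta(child, depth - 1, alpha, beta,
                                      True, children, evaluate))
        beta = min(beta, value)
        if beta <= alpha:          # alpha cutoff
            break
    return value
```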

4.
A novel approach to k-nearest neighbor (k-NN) searching with the Euclidean metric is described. It is well known that many sophisticated algorithms cannot beat the brute-force algorithm when the dimensionality is high. In this study, a probably correct approach, in which the correct set of k nearest neighbors is obtained with high probability, is proposed to greatly reduce the search time. We exploit the marginal distribution of the k-th nearest neighbor in low dimensions, estimated from the stored data (an empirical percentile approach). We analyze the basic nature of this marginal distribution and show the advantage of the implemented algorithm, a probabilistic variant of partial distance search. Its query time is sublinear in the data size n, that is, O(mn^δ) with δ = o(1) in n and δ ≤ 1, for any fixed dimension m.
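The probabilistic, percentile-based variant is not specified in the abstract, but the deterministic partial distance search it builds on is easy to sketch: a running squared distance is abandoned as soon as it exceeds the current k-th best.

```python
import heapq

def knn_partial_distance(data, query, k):
    """Deterministic partial distance search (the abstract's method is a
    probabilistic variant of this idea; the percentile-based early stop is
    not reproduced here)."""
    heap = []  # max-heap via negated squared distances
    for idx, point in enumerate(data):
        bound = -heap[0][0] if len(heap) == k else float("inf")
        partial = 0.0
        for a, b in zip(point, query):
            partial += (a - b) ** 2
            if partial > bound:          # cannot enter the current top-k
                break
        else:
            if len(heap) < k:
                heapq.heappush(heap, (-partial, idx))
            else:
                heapq.heapreplace(heap, (-partial, idx))
    return sorted((-d, i) for d, i in heap)   # (squared distance, index) pairs
```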

5.
6.
In a wireless mobile environment, data broadcasting provides an efficient way to disseminate data. Via data broadcasting, a server can provide location-based services to a large client population. Among location-based services, the k nearest neighbors (kNN) search is important; it finds the k objects closest to a given point. However, the kNN search in a broadcast environment is particularly challenging due to the sequential access to the data on a broadcast channel. We propose protocols for the kNN search on a broadcast R-tree, a popular multi-dimensional index tree, in a wireless broadcast environment that are efficient in terms of latency, tuning time, and memory usage. We investigate how a server schedules the broadcast and provide the corresponding kNN search algorithms for the mobile clients. One of our kNN search protocols further allows a kNN search to start at an arbitrary time instant, skipping the wait for the beginning of a broadcast cycle and thereby reducing latency. The experimental results validate that our mechanisms achieve these objectives.
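The protocol details are not in the abstract; the sketch below only illustrates the client-side idea of dozing through broadcast subtrees whose bounding rectangles cannot improve the current answer. The stream format (pre-order entries carrying a subtree length) and all names are assumptions, not the paper's broadcast organization.

```python
import heapq

def mindist(query, mbr):
    """Minimum possible distance from a query point to a rectangle given as
    coordinate-wise (lo, hi) bounds."""
    lo, hi = mbr
    return sum(max(l - q, 0.0, q - h) ** 2 for q, l, h in zip(query, lo, hi)) ** 0.5

def knn_over_broadcast(stream, query, k):
    """Client-side kNN over a sequentially broadcast R-tree (assumed stream
    format): `stream` is a pre-order list where ("node", mbr, subtree_len)
    lets the client doze through `subtree_len` following entries when the
    MBR cannot contain a better neighbor, and ("point", coords, None)
    entries are data objects."""
    best, skip = [], 0                      # max-heap of the k best so far
    for kind, payload, subtree_len in stream:
        if skip > 0:                        # dozing through a pruned subtree
            skip -= 1
            continue
        kth = -best[0][0] if len(best) == k else float("inf")
        if kind == "node":
            if mindist(query, payload) > kth:
                skip = subtree_len          # prune: sleep until subtree ends
        else:
            d = sum((a - b) ** 2 for a, b in zip(query, payload)) ** 0.5
            if len(best) < k:
                heapq.heappush(best, (-d, payload))
            elif d < kth:
                heapq.heapreplace(best, (-d, payload))
    return sorted((-nd, p) for nd, p in best)
```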

7.
For some multimedia applications, it has been found that domain objects cannot be represented as feature vectors in a multidimensional space. Instead, pair-wise distances between data objects are the only input. To support content-based retrieval, one approach maps each object to a k-dimensional (k-d) point and tries to preserve the distances among the points. Existing spatial access methods such as R-trees and KD-trees can then support fast searching on the resulting k-d points. However, information loss is inevitable with such an approach, since the distances between data objects can only be preserved to a certain extent. Here we investigate the use of a distance-based indexing method; in particular, we apply the vantage point tree (vp-tree) method. Two important problems for the vp-tree method warrant further investigation: the n-nearest neighbors search and the updating mechanisms. We study an n-nearest neighbors search algorithm for the vp-tree, which experiments show to scale up well with the size of the dataset and the desired number of nearest neighbors, n. Experiments also show that searching in the vp-tree is more efficient than in the -tree and the M-tree. Next, we propose solutions to the update problem for the vp-tree and show by experiments that the algorithms are efficient and effective. Finally, we investigate the problem of selecting vantage points, propose a few alternative methods, and study their impact on the number of distance computations.
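A minimal vp-tree build and nearest neighbor search are sketched below. The random vantage-point choice is only the simplest baseline (the abstract studies more careful selection), and the dictionary-based tree layout is an illustration, not the paper's implementation.

```python
import random

def build_vptree(points, dist):
    """Minimal vp-tree: random vantage point, split at the median distance."""
    if not points:
        return None
    points = list(points)
    vp = points.pop(random.randrange(len(points)))
    if not points:
        return {"vp": vp, "mu": 0.0, "inside": None, "outside": None}
    dists = [dist(vp, p) for p in points]
    mu = sorted(dists)[len(dists) // 2]
    inside = [p for p, d in zip(points, dists) if d <= mu]
    outside = [p for p, d in zip(points, dists) if d > mu]
    return {"vp": vp, "mu": mu,
            "inside": build_vptree(inside, dist),
            "outside": build_vptree(outside, dist)}

def nn_search(node, query, dist, best=(float("inf"), None)):
    """Branch-and-bound NN search: a subtree is visited only if it can
    contain a point closer than the best distance found so far."""
    if node is None:
        return best
    d = dist(query, node["vp"])
    if d < best[0]:
        best = (d, node["vp"])
    near, far = (("inside", "outside") if d <= node["mu"]
                 else ("outside", "inside"))
    best = nn_search(node[near], query, dist, best)
    if abs(d - node["mu"]) < best[0]:   # far side may still hold a closer point
        best = nn_search(node[far], query, dist, best)
    return best
```

Note that only the supplied `dist` function is ever used, which is why this index works when pair-wise distances are the only available information.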

8.
Let P be a set of n colored points distributed arbitrarily in R^2. The chromatic distribution of the k nearest neighbors of a query line segment ℓ reports the number of points of each color among the k points nearest to ℓ. While solving this problem, we encountered another interesting problem, the semicircular range counting query: given a set of n points, report the number of points inside a given semicircular range. We propose a simple algorithm for this problem with O(n^3) preprocessing time and space and O(log n) query time. Finally, we propose an algorithm for reporting the chromatic distribution of the k nearest neighbors of a query line segment. Using our technique for the semicircular range counting query, it runs in O(log^2 n) time.

9.
The k nearest neighbor (kNN) rule is an effective and powerful lazy learning algorithm that is also easy to implement. However, its performance relies heavily on the quality of the training data. In many complex real-world applications, noise from various sources is prevalent in large-scale databases, and eliminating anomalies to improve data quality remains a challenge. To alleviate this problem, we propose a new anomaly removal and learning algorithm under the kNN framework. The primary characteristic of our method is that the evidence for removing anomalies and for predicting the class labels of unseen instances is mutual nearest neighbors rather than k nearest neighbors. The advantage is that pseudo nearest neighbors can be identified and excluded from the prediction process, making the final learning result more credible. An extensive comparative experimental analysis on UCI datasets provides empirical evidence of the effectiveness of the proposed method in enhancing the performance of the kNN rule.
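The abstract does not give the exact rule, so the sketch below is one plausible reading of mutual-nearest-neighbor voting: keep only training points that are among the query's kNN and simultaneously have the query among their own kNN. All function names and the fallback behavior are assumptions.

```python
from collections import Counter
import numpy as np

def knn_indices(data, point, k):
    """Indices of the k points in `data` nearest to `point` (Euclidean)."""
    d = np.linalg.norm(data - point, axis=1)
    return np.argsort(d)[:k]

def mutual_nn_predict(train_x, train_y, query, k):
    """Illustrative mutual-nearest-neighbor vote, not the authors' exact rule:
    candidates that do not reciprocate the neighbor relation are treated as
    pseudo nearest neighbors and ignored."""
    candidates = knn_indices(train_x, query, k)
    augmented = np.vstack([train_x, query])      # query gets index len(train_x)
    mutual = [i for i in candidates
              if len(train_x) in knn_indices(augmented, train_x[i], k + 1)]
    votes = train_y[mutual] if mutual else train_y[candidates]  # fall back to kNN
    return Counter(votes.tolist()).most_common(1)[0][0]
```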

10.
《Pattern recognition letters》2003,24(9-10):1437-1451
In this paper, the efficiency of branch and bound search algorithms for computing the K nearest neighbors is studied. The most important aspects influencing the efficiency of the search algorithm are: (1) the decomposition method, (2) the elimination rule, (3) the traversal order, and (4) the level of decomposition. First, a theoretical derivation of an efficient decomposition method based on principal component analysis is given. Then, different elimination rules and traversal orders are combined, resulting in ten different search algorithms. Since the efficiency depends strongly on the level of decomposition, this user-specified parameter is optimized first. The optimization uses a probabilistic model that expresses the total computation time as a function of the node traversal cost and the distance computation cost. All comparisons are based on the total computation time at the optimal level of decomposition.
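The sketch below shows one combination of the ingredients named in the abstract: PCA-based median splits for the decomposition, a centroid/radius ball bound as the elimination rule, and nearest-first traversal. It is an illustrative instance under those assumptions, not one of the paper's ten evaluated algorithms.

```python
import numpy as np

def build_pca_tree(points, leaf_size=16):
    """Decomposition by recursive median split along the leading principal
    component; each node stores a centroid and radius for the elimination rule."""
    centroid = points.mean(axis=0)
    radius = float(np.max(np.linalg.norm(points - centroid, axis=1)))
    node = {"centroid": centroid, "radius": radius, "points": None,
            "left": None, "right": None}
    if len(points) <= leaf_size:
        node["points"] = points
        return node
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    proj = (points - centroid) @ vt[0]
    mask = proj <= np.median(proj)
    if mask.all() or not mask.any():               # degenerate split: make leaf
        node["points"] = points
        return node
    node["left"] = build_pca_tree(points[mask], leaf_size)
    node["right"] = build_pca_tree(points[~mask], leaf_size)
    return node

def knn_bb(node, query, k, best=None):
    """Nearest-first branch and bound: a node is eliminated when
    dist(q, centroid) - radius cannot beat the current k-th distance."""
    if best is None:
        best = []                                  # sorted list of (dist, point)
    d_c = float(np.linalg.norm(query - node["centroid"]))
    if len(best) == k and d_c - node["radius"] >= best[-1][0]:
        return best                                # elimination rule
    if node["points"] is not None:                 # leaf: check points directly
        for p in node["points"]:
            d = float(np.linalg.norm(query - p))
            if len(best) < k or d < best[-1][0]:
                best.append((d, p))
                best.sort(key=lambda t: t[0])
                del best[k:]
        return best
    kids = sorted((node["left"], node["right"]),
                  key=lambda c: np.linalg.norm(query - c["centroid"]))
    for child in kids:                             # nearest-first traversal order
        best = knn_bb(child, query, k, best)
    return best
```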

11.
This note presents a simplification and generalization of an algorithm for searching k-dimensional trees for nearest neighbors reported by Friedman et al. [3]. If the distance between records is measured using L2, the Euclidean norm, the data structure used by the algorithm to determine the bounds of the search space can be simplified to a single number. Moreover, because distance measurements in L2 are rotationally invariant, the algorithm can be generalized to allow a partition plane to have an arbitrary orientation, rather than insisting that it be perpendicular to a coordinate axis, as in the original algorithm. When a k-dimensional tree is built, this plane can be found from the principal eigenvector of the covariance matrix of the records to be partitioned. These techniques and others yield variants of k-dimensional trees customized for specific applications. It is wrong to assume that k-dimensional trees guarantee that a nearest-neighbor query completes in logarithmic expected time. For small k, logarithmic behavior is observed on all but tiny trees. However, for larger k, logarithmic behavior is achievable only with extremely large numbers of records. For k = 16, a search of a k-dimensional tree of 76,000 records examines almost every record.
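For reference, a minimal axis-aligned k-d tree build and nearest neighbor search are sketched below; the note's generalization to arbitrarily oriented split planes and its single-number bound simplification are not reproduced.

```python
import numpy as np

def build_kdtree(points, depth=0):
    """Classic axis-aligned k-d tree (the note generalizes the split plane to
    arbitrary orientations via the principal eigenvector; only the standard
    version is sketched here)."""
    if len(points) == 0:
        return None
    axis = depth % points.shape[1]
    points = points[points[:, axis].argsort()]
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build_kdtree(points[:mid], depth + 1),
            "right": build_kdtree(points[mid + 1:], depth + 1)}

def kdtree_nn(node, query, best=(float("inf"), None)):
    """NN search with the usual bound: the far subtree is visited only if the
    splitting plane is closer to the query than the best distance so far."""
    if node is None:
        return best
    d = float(np.linalg.norm(query - node["point"]))
    if d < best[0]:
        best = (d, node["point"])
    axis = node["axis"]
    diff = query[axis] - node["point"][axis]
    near, far = ("left", "right") if diff <= 0 else ("right", "left")
    best = kdtree_nn(node[near], query, best)
    if abs(diff) < best[0]:        # plane distance bounds the far side from below
        best = kdtree_nn(node[far], query, best)
    return best
```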

12.
13.
Due to the advancement of wireless internet and mobile positioning technology, location-based services (LBSs) have become popular among mobile users. Since users have to send their exact locations to obtain such services, several privacy threats arise. To address this, cloaking methods have been proposed that blur a user's exact location into a cloaked spatial region satisfying a required privacy threshold (k). With the cloaked region, an LBS server can carry out a k-nearest neighbor (k-NN) search. Some recent studies have proposed methods to search the k nearest POIs while protecting a user's privacy, but they suffer from at least one major problem, such as inefficient query processing or low precision of the retrieved result. To resolve these problems, in this paper we propose a novel k-NN query processing algorithm for a cloaking region that satisfies both requirements: fast query processing time and high precision of the retrieved result. To achieve fast query processing, we propose a new pruning technique based on a 2D-coordinate scheme and make use of a Voronoi diagram for retrieving the nearest POIs efficiently. To satisfy the precision requirement, we guarantee that the result of our k-NN query processing algorithm always contains the exact set of k nearest neighbors. Our performance analysis shows that our algorithm achieves better performance in terms of query processing time and the number of candidate POIs compared with other algorithms.

14.
Given a set D of trajectories, a query object q, and a query time extent Γ, a mutual (i.e., symmetric) nearest neighbor (MNN) query over trajectories finds from D the set of trajectories that are among the k1 nearest neighbors (NNs) of q within Γ and, at the same time, have q as one of their k2 NNs. This type of query is useful in many applications such as decision making, data mining, and pattern recognition, as it considers both the proximity of the trajectories to q and the proximity of q to the trajectories. In this paper, we first formalize MNN search and identify its characteristics, and then develop several algorithms for processing MNN queries efficiently. In particular, we investigate two classes of MNN queries, MNNP and MNNT queries, which are defined with respect to stationary query points and moving query trajectories, respectively. Our methods use batch processing and reuse techniques to significantly reduce the I/O cost (i.e., the number of node/page accesses) and the CPU time. In addition, we extend our techniques to historical continuous MNN (HCMNN) search for moving object trajectories, which returns the mutual nearest neighbors of q (for specified k1 and k2) at any time instant of Γ. Extensive experiments with real and synthetic datasets demonstrate the efficiency and scalability of our proposed algorithms.

15.
In this paper, a new algorithm is developed to reduce the computational complexity of Ward's method. The proposed approach uses a dynamic k-nearest-neighbor list to avoid determining a cluster's nearest neighbor at some steps of the cluster merging. The double linked algorithm (DLA) can significantly reduce the computing time of the fast pairwise nearest neighbor (FPNN) algorithm by obtaining an approximate solution to hierarchical agglomerative clustering. We propose a method that resolves DLA's non-optimal solutions while keeping its advantage of low computational complexity. The computational complexity of the proposed method, DKNNA+FS (dynamic k-nearest-neighbor algorithm with a fast search), in terms of the number of distance calculations is O(N^2), where N is the number of data points. Compared to FPNN with a fast search (FPNN+FS), the proposed method using the same fast search algorithm (DKNNA+FS) reduces the computing time by a factor of 1.90-2.18 for a data set from a real image and by a factor of 1.92-2.02 for a data set generated from three images. Compared to DLA with a fast search (DLA+FS), DKNNA+FS decreases the average mean squared error by 1.26% for the same data set.
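For context, the sketch below shows the naive Ward merging that these methods accelerate: at each step, merge the pair of clusters with the smallest increase in within-cluster variance. The dynamic k-nearest-neighbor bookkeeping that is the paper's contribution is deliberately omitted; the function names are illustrative.

```python
import numpy as np

def ward_cost(ca, na, cb, nb):
    """Increase in within-cluster variance if clusters (centroid, size) a and b
    are merged: the quantity Ward's method minimizes at each step."""
    return na * nb / (na + nb) * float(np.sum((ca - cb) ** 2))

def ward_agglomerate(points, n_clusters):
    """Naive Ward merging used as a reference (full quadratic scan per merge).
    The paper avoids this scan with a dynamic k-nearest-neighbor list."""
    clusters = [(p.astype(float), 1) for p in points]   # (centroid, size)
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost(*clusters[i], *clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (ci, ni), (cj, nj) = clusters[i], clusters[j]
        merged = ((ci * ni + cj * nj) / (ni + nj), ni + nj)
        clusters = [c for m, c in enumerate(clusters) if m not in (i, j)]
        clusters.append(merged)
    return clusters
```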

16.
17.
Text categorization is a significant technique for managing the surging volume of text data on the Internet. The k-nearest neighbors (kNN) algorithm is an effective, but not efficient, classification model for text categorization. In this paper, we propose an effective strategy to accelerate the standard kNN, based on a simple principle: points that are near in space are usually also near when projected onto a direction, which means that points far apart in the projection direction are also far apart in the original space. Using the proposed strategy, most of the irrelevant points can be removed when searching for the k nearest neighbors of a query point, which greatly decreases the computational cost. Experimental results show that the proposed strategy greatly improves the time performance of the standard kNN with little degradation in accuracy. In particular, it is superior in applications with large, high-dimensional datasets.
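A sketch of the projection-filtering idea follows. The choice of projection direction (first principal component) and the shortlist multiplier `widen` are assumptions; the paper's exact selection rule is not given in the abstract, and like the described strategy this filter trades a small amount of accuracy for speed.

```python
import numpy as np

def knn_projection_filter(data, labels, query, k, widen=4):
    """Projection filtering sketch: points whose projection onto a direction
    is far from the query's projection cannot be near in the original space,
    so only the points with the closest projections are examined."""
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction = vt[0]                                 # assumed projection direction
    proj = data @ direction
    q_proj = query @ direction
    # Shortlist of candidates with the closest projections to the query.
    shortlist = np.argsort(np.abs(proj - q_proj))[:widen * k]
    dists = np.linalg.norm(data[shortlist] - query, axis=1)
    nearest = shortlist[np.argsort(dists)[:k]]
    return labels[nearest]                            # e.g., vote over these labels
```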

18.
We establish a refined search tree technique for the parameterized DOMINATING SET problem on planar graphs. Here, we are given an undirected graph and ask for a set of at most k vertices such that every other vertex has at least one neighbor in this set. We describe algorithms with running times O(8^k n) and O(8^k k + n^3), where n is the number of vertices in the graph, based on bounded search trees. We describe a set of polynomial-time data-reduction rules for a more general "annotated" problem on black/white graphs that asks for a set of k vertices (black or white) that dominate all the black vertices. An intricate argument based on the Euler formula then establishes an efficient branching strategy for reduced inputs to this problem. In addition, we give a family of examples showing that the bound of the branching theorem is optimal with respect to our reduction rules. Our final search tree algorithm is easy to implement; its analysis, however, is involved.
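The refined planar branching depends on the reduction rules and the Euler-formula argument, which the abstract does not spell out, but the underlying bounded search tree idea can be sketched for general graphs: some vertex of the closed neighborhood of an undominated vertex must be in the solution, so branch over that neighborhood with the budget decreasing each time.

```python
def dominating_set_bst(adj, k):
    """Plain bounded search tree for DOMINATING SET (not the paper's refined
    planar branching). `adj` maps each vertex to the set of its neighbors.
    Returns a dominating set of size at most k, or None if none exists
    within the budget explored."""
    def closed(v):
        return adj[v] | {v}

    def search(chosen, dominated, budget):
        undominated = [v for v in adj if v not in dominated]
        if not undominated:
            return chosen                      # every vertex is dominated
        if budget == 0:
            return None                        # budget exhausted on this branch
        v = min(undominated, key=lambda u: len(closed(u)))  # small branching set
        for w in closed(v):                    # some vertex of N[v] must be picked
            result = search(chosen | {w}, dominated | closed(w), budget - 1)
            if result is not None:
                return result
        return None

    return search(frozenset(), frozenset(), k)
```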

19.
An effective local search for the maximum clique problem
We propose a variable depth search based algorithm, called k-opt local search (KLS), for the maximum clique problem. KLS efficiently explores the k-opt neighborhood, defined as the set of neighbors that can be obtained by a sequence of add and drop moves that are adaptively changed within the feasible search space. Computational results on DIMACS benchmark graphs indicate that KLS finds considerably satisfactory cliques with reasonable running times in comparison with state-of-the-art metaheuristics.
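KLS itself (the variable-depth sequencing of moves) cannot be reconstructed from the abstract, but the add and drop moves it builds on are simple; the sketch below is a basic randomized add/drop local search, with the graph given as an adjacency map. The iteration budget and function name are assumptions.

```python
import random

def greedy_clique_local_search(adj, iterations=1000, seed=0):
    """Basic add/drop local search for maximum clique (only the move types
    that KLS builds on; the variable-depth k-opt sequencing is not shown).
    `adj` maps each vertex to the set of its neighbors."""
    rng = random.Random(seed)
    clique, best = set(), set()
    for _ in range(iterations):
        # Add moves: vertices adjacent to every current clique member.
        candidates = [v for v in adj
                      if v not in clique and clique <= adj[v]]
        if candidates:
            clique.add(rng.choice(candidates))
        elif clique:
            clique.remove(rng.choice(sorted(clique)))   # drop move to escape
        if len(clique) > len(best):
            best = set(clique)
    return best
```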

20.
Given a multidimensional point q, a reverse k nearest neighbor (RkNN) query retrieves all the data points that have q as one of their k nearest neighbors. Existing methods for processing such queries have at least one of the following deficiencies: they (i) do not support arbitrary values of k, (ii) cannot deal efficiently with database updates, (iii) are applicable only to 2D data but not to higher dimensionality, and (iv) retrieve only approximate results. Motivated by these shortcomings, we develop algorithms for exact RkNN processing with arbitrary values of k on dynamic, multidimensional datasets. Our methods utilize a conventional data-partitioning index on the dataset and do not require any pre-computation. As a second step, we extend the proposed techniques to continuous RkNN search, which returns the RkNN results for every point on a line segment. We evaluate the effectiveness of our algorithms with extensive experiments using both real and synthetic datasets.
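The paper's index-based pruning is not described in the abstract; the brute-force reference below only pins down the RkNN semantics the algorithms must reproduce.

```python
import numpy as np

def reverse_knn(data, query, k):
    """Brute-force reference for RkNN semantics (the paper's index-based
    pruning is not reproduced): return indices of all data points that have
    the query among their own k nearest neighbors."""
    result = []
    for i, p in enumerate(data):
        # Distances from p to every other data point (excluding itself).
        others = np.delete(data, i, axis=0)
        d_others = np.linalg.norm(others - p, axis=1)
        d_query = np.linalg.norm(query - p)
        # p is a reverse kNN of the query if fewer than k data points are
        # strictly closer to p than the query is.
        if np.sum(d_others < d_query) < k:
            result.append(i)
    return result
```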

