1.
Most fast k-nearest neighbor (k-NN) algorithms exploit metric properties of distance measures to reduce computation cost, and only a few work effectively on both metric and nonmetric measures. We propose a cluster-based tree algorithm to accelerate k-NN classification without any presuppositions about the metric form and properties of a dissimilarity measure. A mechanism of early decision making and minimal side-operations for choosing search paths largely contributes to the efficiency of the algorithm. The algorithm is evaluated through extensive experiments on the standard NIST and MNIST databases.
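As a minimal sketch of what such a cluster-based index can look like, here is a one-level medoid partition under an arbitrary user-supplied dissimilarity (no symmetry or triangle inequality used). The `n_probe` parameter and all names are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def build_cluster_index(X, dissim, n_clusters=10, iters=5, seed=0):
    """Partition the training set around medoids under an arbitrary
    dissimilarity; no metric properties are assumed anywhere."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(X), n_clusters, replace=False)
    for _ in range(iters):
        lab = np.array([np.argmin([dissim(x, X[m]) for m in medoids]) for x in X])
        for c in range(n_clusters):
            members = np.where(lab == c)[0]
            if len(members) == 0:
                continue
            # new medoid: member minimizing total dissimilarity within the cluster
            costs = [sum(dissim(X[i], X[j]) for j in members) for i in members]
            medoids[c] = members[int(np.argmin(costs))]
    lab = np.array([np.argmin([dissim(x, X[m]) for m in medoids]) for x in X])
    return medoids, lab

def knn_query(q, X, y, medoids, lab, dissim, k=1, n_probe=2):
    """Search only the n_probe clusters whose medoids are closest to q
    (approximate; the paper adds early decision making on top of a tree)."""
    order = np.argsort([dissim(q, X[m]) for m in medoids])[:n_probe]
    cand = np.concatenate([np.where(lab == c)[0] for c in order])
    d = [dissim(q, X[i]) for i in cand]
    return y[cand[np.argsort(d)[:k]]]
```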
2.
Particle methods provide a simple yet powerful framework for simulating both discrete and continuous systems either deterministically or stochastically. The inherent adaptivity of particle methods is particularly appealing when simulating multiscale models or systems that develop a wide spectrum of length scales. Evaluating particle–particle interactions using neighbor-finding algorithms such as cell lists or Verlet lists, however, quickly becomes inefficient in adaptive-resolution simulations where the interaction cutoff radius is a function of space. We present a novel adaptive-resolution cell list algorithm and the associated data structures that provide efficient access to the interaction partners of a particle, independent of the (potentially continuous) spectrum of cutoff radii present in a simulation. We characterize the computational cost of the proposed algorithm for a wide range of resolution spans and particle numbers, showing that the present algorithm outperforms conventional uniform-resolution cell lists in most adaptive-resolution settings.
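For context, a compact sketch of the conventional uniform-resolution cell list that the paper generalizes: one fixed global cutoff, non-periodic box, names illustrative. The single `cutoff` argument is exactly the assumption the adaptive algorithm removes:

```python
import numpy as np
from itertools import product

def cell_list_pairs(points, cutoff, box):
    """Conventional cell list: bin particles into cells of side >= cutoff,
    then test only pairs from the same or adjacent cells (27-cell stencil
    in 3D). Non-periodic box; each pair is yielded once."""
    dim = points.shape[1]
    ncells = np.maximum((box / cutoff).astype(int), 1)
    cell_size = box / ncells
    cells = {}
    idx = np.minimum((points // cell_size).astype(int), ncells - 1)
    for i, c in enumerate(map(tuple, idx)):
        cells.setdefault(c, []).append(i)
    for c, members in cells.items():
        for off in product((-1, 0, 1), repeat=dim):
            nb = tuple(np.asarray(c) + off)
            if nb not in cells:
                continue
            for i in members:
                for j in cells[nb]:
                    if i < j and np.linalg.norm(points[i] - points[j]) < cutoff:
                        yield i, j
```

When the cutoff varies in space, no single cell size works well, which is the inefficiency the abstract's adaptive-resolution data structure addresses.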
3.
We propose a fast agglomerative clustering method using an approximate nearest neighbor graph to reduce the number of distance calculations. The time complexity of the algorithm is improved from O(τN²) to O(τN log N) at the cost of a slight increase in distortion; here, τ denotes the number of nearest neighbor updates required at each iteration. According to the experiments, a relatively small neighborhood size is sufficient to keep the quality close to that of the full search.
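A hedged sketch of the mechanism: greedy centroid agglomeration where merge candidates come only from a k-NN graph, so far fewer distances are evaluated per step. The graph is built brute-force here for brevity (the paper uses an approximate graph), and all names are illustrative:

```python
import heapq
import numpy as np

def knn_graph_agglomerative(X, k=5, n_clusters=1):
    """Agglomerative (centroid) clustering whose candidate merges are the
    edges of a k-NN graph rather than all O(N^2) pairs."""
    n = len(X)
    centroid = {i: X[i].astype(float) for i in range(n)}
    size = {i: 1 for i in range(n)}
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)   # brute-force graph
    heap = [(D[i, j], i, int(j)) for i in range(n)
            for j in np.argsort(D[i])[1:k + 1]]
    heapq.heapify(heap)
    next_id = n
    while len(centroid) > n_clusters and heap:
        d, a, b = heapq.heappop(heap)
        if a not in centroid or b not in centroid:
            continue                      # stale edge: an endpoint was merged
        total = size[a] + size[b]
        merged = (size[a] * centroid[a] + size[b] * centroid[b]) / total
        for x in (a, b):
            del centroid[x]
            del size[x]
        centroid[next_id], size[next_id] = merged, total
        # re-link the new cluster to its k nearest surviving clusters only
        edges = sorted((float(np.linalg.norm(v - merged)), next_id, c)
                       for c, v in centroid.items() if c != next_id)[:k]
        for e in edges:
            heapq.heappush(heap, e)
        next_id += 1
    return list(centroid.values())
```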
5.
In this paper, we present a novel nearest neighbor rule-based implementation of the structural risk minimization principle to address a generic classification problem. We propose a fast reference set thinning algorithm on the training data set similar to a support vector machine (SVM) approach. We then show that the nearest neighbor rule based on the reduced set implements the structural risk minimization principle, in a manner which does not involve selection of a convenient feature space. Simulation results on real data indicate that this method significantly reduces the computational cost of the conventional SVMs, and achieves a nearly comparable test error performance.
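The paper's thinning criterion is SVM-inspired; as a hedged illustration of what reference set thinning means in general, here is the classic condensed nearest neighbor reduction (Hart's rule), not the paper's algorithm:

```python
import numpy as np

def condense(X, y):
    """Hart's condensed NN: add a prototype to the reduced set only if the
    current reduced set misclassifies it; repeat until stable. The reduced
    set then drives an ordinary nearest neighbor rule."""
    keep = [0]
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in keep:
                continue
            j = min(keep, key=lambda m: np.linalg.norm(X[i] - X[m]))
            if y[j] != y[i]:     # 1-NN on the reduced set is wrong,
                keep.append(i)   # so i carries boundary information
                changed = True
    return keep
```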
6.
Vector quantization has been widely employed in nearest neighbor search because it can approximate the Euclidean distance between two vectors using precomputed look-up tables. The additive quantization (AQ) algorithm showed that low approximation error can be achieved by representing each input vector as a sum of dependent codewords, each drawn from its own codebook. However, AQ relies on a computationally expensive beam search to encode each vector, which limits the efficiency of approximate nearest neighbor search. In this paper, we propose a fast AQ algorithm that significantly accelerates the encoding phase. We formulate the beam search as an optimization over codebook selection orders. According to the optimal order, we learn the codebooks through hierarchical construction, in which the search width can be set very small. Specifically, the codewords are first exchanged into appropriate codebooks according to their indexed frequency at each step; the codebooks are then updated successively to adapt to the quantization residual of the previous quantization level. In the encoding phase, the vectors are compressed with the learned codebooks via the best order, where the search range is considerably reduced. The proposed method achieves almost the same performance as AQ, while the vector encoding phase is accelerated by dozens of times. Experiments on two benchmark datasets verify this conclusion.
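A minimal sketch of additive-quantization encoding with the beam search collapsed to width 1, i.e. pure greedy residual encoding. This illustrates why a small search width makes encoding cheap; it is a simplification under stated assumptions, not the paper's hierarchical construction:

```python
import numpy as np

def greedy_aq_encode(x, codebooks):
    """Encode x as a sum of one codeword per codebook by greedy residual
    minimization (additive-quantization beam search with width 1)."""
    residual = np.asarray(x, dtype=float).copy()
    codes = []
    for C in codebooks:                       # one codebook per selection step
        j = int(np.argmin(np.linalg.norm(C - residual, axis=1)))
        codes.append(j)                       # best codeword for this residual
        residual = residual - C[j]
    return codes, residual                    # residual = quantization error

# toy usage: 4 codebooks of 256 codewords in 8-D
rng = np.random.default_rng(0)
books = [rng.normal(size=(256, 8)) for _ in range(4)]
codes, err = greedy_aq_encode(rng.normal(size=8), books)
```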
7.
To address the slow speed and large fluctuations of indoor WiFi fingerprint positioning, the received signals are smoothed using Gaussian fitting and averaging over repeated measurements; taking distance as the similarity measure, a threshold is specified to classify the constructed fingerprint database; the K-nearest neighbor algorithm is improved, and fast K-nearest neighbor matching is implemented on the basis of this classification. Experimental results show that the classification greatly reduces the time consumed by the positioning system, by 62.8% on average, and that the mean error of WiFi fingerprint positioning drops from 4.17 m to 2.12 m.
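A hedged sketch of this pipeline: leader-style threshold clustering of the fingerprint database offline, then matching a query only against the nearest cluster. The averaging of stored positions and all names are illustrative assumptions:

```python
import numpy as np

def cluster_fingerprints(db, threshold):
    """Greedy leader clustering of RSSI fingerprints: a fingerprint joins the
    first cluster whose representative is within `threshold`, otherwise it
    starts a new cluster (one reading of the threshold-based classification)."""
    reps, clusters = [], []
    for i, f in enumerate(db):
        for c, r in enumerate(reps):
            if np.linalg.norm(f - r) <= threshold:
                clusters[c].append(i)
                break
        else:
            reps.append(f)
            clusters.append([i])
    return np.array(reps), clusters

def locate(query, db, positions, reps, clusters, k=3):
    """Match the query against the nearest cluster only, then run plain
    k-NN inside it and average the stored positions."""
    c = int(np.argmin(np.linalg.norm(reps - query, axis=1)))
    members = np.array(clusters[c])
    d = np.linalg.norm(db[members] - query, axis=1)
    nearest = members[np.argsort(d)[:k]]
    return positions[nearest].mean(axis=0)
```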
9.
In this paper, we present a fast and versatile algorithm which can rapidly perform a variety of nearest neighbor searches. Efficiency improvement is achieved by utilizing the distance lower bound to avoid the calculation of the distance itself if the lower bound is already larger than the global minimum distance. At the preprocessing stage, the proposed algorithm constructs a lower bound tree (LB-tree) by agglomeratively clustering all the sample points to be searched. Given a query point, the lower bound of its distance to each sample point can be calculated by using the internal nodes of the LB-tree. To reduce the number of lower bounds actually calculated, the winner-update search strategy is used for traversing the tree. For further efficiency improvement, data transformation can be applied to the sample and the query points. In addition to finding the nearest neighbor, the proposed algorithm can also (i) provide the k-nearest neighbors progressively; (ii) find the nearest neighbors within a specified distance threshold; and (iii) identify neighbors whose distances to the query are sufficiently close to the minimum distance of the nearest neighbor. Our experiments have shown that the proposed algorithm can save substantial computation, particularly when the distance of the query point to its nearest neighbor is relatively small compared with its distance to most other samples (which is the case for many object recognition problems).
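A one-level sketch of this style of lower-bound pruning, assuming Euclidean distance and clusters with precomputed radii; the LB-tree generalizes this to a full agglomerative hierarchy, and the names here are illustrative:

```python
import numpy as np

def build_clusters(data, centers):
    """Offline: assign points to the nearest center and record each
    cluster's radius, from which lower bounds are derived at query time."""
    lab = np.argmin(np.linalg.norm(data[:, None] - centers[None], axis=2), axis=1)
    clusters = [np.where(lab == c)[0] for c in range(len(centers))]
    radii = np.array([np.linalg.norm(data[m] - centers[c], axis=1).max()
                      if len(m) else 0.0 for c, m in enumerate(clusters)])
    return clusters, radii

def nn_winner_update(q, data, centers, clusters, radii):
    """Visit clusters in order of increasing lower bound
    max(0, d(q, center) - radius); stop once the next bound cannot beat
    the best exact distance found so far (the current 'winner')."""
    d_qc = np.linalg.norm(centers - q, axis=1)
    order = sorted((max(0.0, d_qc[c] - radii[c]), c) for c in range(len(centers)))
    best, best_d = -1, np.inf
    for lb, c in order:
        if lb >= best_d:
            break                      # every remaining bound is at least lb
        for i in clusters[c]:
            d = np.linalg.norm(data[i] - q)
            if d < best_d:
                best, best_d = i, d
    return best, best_d
```

The bound max(0, d(q, c) - r) follows from the triangle inequality: any point within radius r of center c is at least that far from q.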
10.
Learning from high-dimensional data is usually quite challenging, as captured by the well-known phrase curse of dimensionality. Data analysis often involves measuring the similarity between different examples. This sometimes becomes a problem, as many widely used metrics tend to concentrate in high-dimensional feature spaces. The reduced contrast makes it more difficult to distinguish between close and distant points, which renders many traditional distance-based learning methods ineffective. Secondary distances based on shared neighbor similarities have recently been proposed as one possible solution to this problem. However, these initial metrics failed to take hubness into account. Hubness is a recently described aspect of the dimensionality curse, and it affects all sorts of $k$-nearest neighbor learning methods in severely negative ways. This paper is the first to discuss the impact of hubs on forming the shared neighbor similarity scores. We propose a novel, hubness-aware secondary similarity measure $simhub_s$, and an extensive experimental evaluation shows it to be much more appropriate for high-dimensional data classification than the standard $simcos_s$ measure. The proposed similarity changes the underlying $k$NN graph in such a way that it reduces the overall frequency of label mismatches in $k$-neighbor sets and increases the purity of occurrence profiles, which improves classifier performance. It is a hybrid measure, which takes into account both the supervised and the unsupervised hubness information. The analysis shows that both components are useful in their own ways and that the measure is therefore properly defined. This new similarity does not increase the overall computational cost, and the improvement is essentially ‘free’.
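For reference, a short sketch of the baseline shared-neighbor score (a $simcos_s$-style overlap) that the paper improves on; the hubness-aware weighting of $simhub_s$ is not reproduced here:

```python
import numpy as np

def shared_neighbor_similarity(X, k=10):
    """Baseline secondary similarity: the fraction of shared k-nearest
    neighbors between every pair of points (toy O(n^2) version)."""
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    nn = np.argsort(D, axis=1)[:, 1:k + 1]        # k-NN lists, self excluded
    sets = [set(map(int, row)) for row in nn]
    n = len(X)
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            S[i, j] = len(sets[i] & sets[j]) / k  # overlap of neighbor lists
    return S
```

Points that appear in many neighbor lists (hubs) dominate these overlaps, which is precisely the effect the proposed hubness-aware measure corrects for.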
11.
Traditional fast k-nearest neighbor search algorithms based on pyramid structures require either substantial extra memory or long search times. This paper proposes a fast k-nearest neighbor search algorithm based on the wavelet transform, which exploits the important information hidden in the transform coefficients to reduce the computational complexity. The study indicates that the Haar wavelet transform yields two kinds of important pyramids. Two elimination criteria derived from the transform coefficients are used to reject impossible candidates. Experimental results on texture classification verify the effectiveness of the proposed algorithm.
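A hedged sketch of one such elimination criterion: since the orthonormal Haar transform preserves Euclidean distances, the partial squared distance over the first few coefficients lower-bounds the full distance and can reject candidates early. The cut-off `m` and names are illustrative, not the paper's exact criteria:

```python
import numpy as np

def haar(v):
    """Orthonormal Haar transform (length a power of two); orthonormality
    means Euclidean distances are preserved exactly, so partial sums of
    squared coefficient differences are valid lower bounds."""
    v = np.asarray(v, float).copy()
    out = np.empty_like(v)
    n = len(v)
    while n > 1:
        half = n // 2
        a, b = v[:n:2].copy(), v[1:n:2].copy()
        v[:half] = (a + b) / np.sqrt(2.0)      # averages: next pyramid level
        out[half:n] = (a - b) / np.sqrt(2.0)   # details of this level
        n = half
    out[0] = v[0]
    return out

def nn_haar_reject(query, samples, m=4):
    """Reject a candidate as soon as its partial squared distance over the
    first m Haar coefficients exceeds the best full distance found so far."""
    tq = haar(query)
    best, best_d2 = -1, np.inf
    for i, s in enumerate(samples):
        t = haar(s)                            # precomputable offline
        partial = float(np.sum((tq[:m] - t[:m]) ** 2))
        if partial >= best_d2:
            continue                           # elimination criterion
        d2 = partial + float(np.sum((tq[m:] - t[m:]) ** 2))
        if d2 < best_d2:
            best, best_d2 = i, d2
    return best, np.sqrt(best_d2)
```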
12.
An efficient and universal similarity search solution is a holy grail for multimedia information retrieval. Most similarity indexes work by mapping the original multimedia objects into simpler representations, which are then searched by proximity using a suitable distance function.
13.
Sparse coding has received extensive attention in the literature of image classification. Traditional sparse coding strategies tend to approximate local features in terms of a linear combination of basis vectors, without considering feature neighboring relationships. In this scenario, similar instances in the feature space may result in totally different sparse codes. To address this shortcoming, we investigate how to develop new sparse representations which preserve feature similarities. We commence by establishing two modules to improve the discriminative ability of sparse representation. The first module selects discriminative features for each class, and the second module eliminates non-informative visual words. We then explore the distribution of similar features over the dominant basis vectors for each class. We incorporate the feature distribution into the objective function, spanning a class-specific low dimensional subspace for effective sparse coding. Extensive experiments on various image classification tasks validate that the proposed approach consistently outperforms several state-of-the-art methods.
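As a point of reference, the plain l1 sparse-coding step that such strategies start from, solved here by ISTA; the paper's feature-similarity and class-specific terms are not included, and all parameters are assumptions:

```python
import numpy as np

def ista_sparse_code(x, B, lam=0.1, iters=200):
    """Solve min_z 0.5*||x - B z||^2 + lam*||z||_1 by iterative
    shrinkage-thresholding (ISTA); z is the sparse code of feature x
    over the basis (codebook) B."""
    L = np.linalg.norm(B, 2) ** 2          # Lipschitz constant of the gradient
    z = np.zeros(B.shape[1])
    for _ in range(iters):
        g = B.T @ (B @ z - x)              # gradient of the quadratic term
        z = z - g / L
        z = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return z
```

Because nothing in this objective ties the codes of nearby features together, two similar inputs can receive very different codes, which is the shortcoming the abstract targets.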
14.
The k nearest neighbor (k-NN) classifier has been a widely used nonparametric technique in Pattern Recognition because of its simplicity and good performance. In order to decide the class of a new prototype, the k-NN classifier performs an exhaustive comparison between the prototype to classify and the prototypes in the training set T. However, when T is large, the exhaustive comparison is expensive. For this reason, many fast k-NN classifiers have been developed; some of them are based on a tree structure, which is created during a preprocessing phase using the prototypes in T. Then, in a search phase, the tree is traversed to find the nearest neighbor. The speed-up is obtained by avoiding the exploration of some parts of the tree, using pruning rules that are usually based on the triangle inequality. However, in soft sciences such as Medicine, Geology, and Sociology, the prototypes are usually described by numerical and categorical attributes (mixed data), and sometimes the comparison function for computing the similarity between prototypes does not satisfy metric properties. Therefore, in this work an approximate fast k most similar neighbor classifier for mixed data and similarity functions that do not satisfy metric properties, based on a tree structure (Tree k-MSN), is proposed. Experiments with synthetic and real data are presented.
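One common comparison function for mixed data, shown as a hedged illustration (a HEOM-style measure); the paper's similarity functions need not satisfy metric properties and are not necessarily this one:

```python
import numpy as np

def heom(a, b, numeric_mask, ranges):
    """Heterogeneous Euclidean-Overlap Metric: range-normalized difference
    on numeric attributes, 0/1 mismatch on categorical ones."""
    d2 = 0.0
    for i, is_num in enumerate(numeric_mask):
        if is_num:
            d2 += ((float(a[i]) - float(b[i])) / ranges[i]) ** 2
        else:
            d2 += 0.0 if a[i] == b[i] else 1.0   # overlap on categorical
    return np.sqrt(d2)
```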
15.
Human liver disorder is a genetic problem, due to alcohol habituation or viral infection, that can lead to liver failure or liver cancer if not detected at an early stage. The aim of the proposed method is to detect liver disorder at an early stage using liver function test datasets. A problem with many real-world datasets, including liver disease diagnosis data, is class imbalance: the number of observations belonging to one class is larger or smaller than that of the other class(es). Traditional K-Nearest Neighbor (KNN) and Fuzzy KNN classifiers do not work well on imbalanced datasets because they treat all neighbors equally. Weighted variants of Fuzzy KNN assign a large weight to neighbors belonging to the minority class and a relatively small weight to neighbors belonging to the majority class to resolve the issues with data imbalance. In this paper, the Variable-Neighbor Weighted Fuzzy K Nearest Neighbor approach (Variable-NWFKNN) is proposed, an improved variant of Fuzzy-NWKNN. The proposed Variable-NWFKNN method is implemented on three real-world imbalanced liver function test datasets: BUPA and ILPD from UCI, and MPRLPD. Variable-NWFKNN is compared with the existing NWKNN and Fuzzy-NWKNN methods and achieves accuracies of 73.91% (BUPA), 77.59% (ILPD) and 87.01% (MPRLPD). Further, the TL_RUS method is used for preprocessing, which improves the accuracies to 78.46% (BUPA), 78.46% (ILPD) and 95.79% (MPRLPD).
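A minimal sketch of the underlying class-weighted neighbor vote (NWKNN-flavored): minority-class neighbors count more. The fuzzy memberships and variable-neighbor refinements of the actual Variable-NWFKNN are omitted, and the parameter names are assumptions:

```python
import numpy as np
from collections import Counter

def class_weighted_knn(query, X, y, k=5, exp=1.0):
    """k-NN vote where each neighbor's vote is scaled by
    (size of smallest class / size of its own class) ** exp, so neighbors
    from under-represented classes carry more weight."""
    counts = Counter(y)
    smallest = min(counts.values())
    d = np.linalg.norm(X - query, axis=1)
    votes = Counter()
    for i in np.argsort(d)[:k]:
        w = (smallest / counts[y[i]]) ** exp
        votes[y[i]] += w / (d[i] + 1e-12)     # also favor closer neighbors
    return votes.most_common(1)[0][0]
```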
16.
In this paper, a novel center-based nearest neighbor (CNN) classifier is proposed to deal with pattern classification problems. Unlike the nearest feature line (NFL) method, CNN considers the line passing through a sample point with known label and the center of that sample's class. This line is called the center-based line (CL). These lines appear to have greater capacity to represent sample classes than the original samples and thus can capture more information. Similar to NFL, CNN classifies an unknown sample point by its nearest distance to a CL. As a result, the computation time of CNN is dramatically shorter than that of NFL, with only a slight decrease in accuracy. The performance of CNN is demonstrated in a simulation experiment from computational biology, where high classification accuracy is achieved in the leave-one-out test. Comparisons with the nearest neighbor (NN) and NFL classifiers indicate that this novel classifier achieves competitive performance.
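The CL distance follows directly from the abstract's definition: project the query onto the line through a sample and its class center. A minimal sketch:

```python
import numpy as np

def cl_distance(q, x, center):
    """Distance from query q to the center-based line (CL) through sample x
    and the center of x's class."""
    d = center - x
    n2 = float(np.dot(d, d))
    if n2 == 0.0:
        return float(np.linalg.norm(q - x))   # degenerate: sample == center
    t = float(np.dot(q - x, d)) / n2          # projection parameter on the CL
    return float(np.linalg.norm(q - (x + t * d)))

def cnn_classify(q, samples, labels):
    """Assign q the label of its nearest center-based line."""
    centers = {c: np.mean([s for s, l in zip(samples, labels) if l == c], axis=0)
               for c in set(labels)}
    dists = [(cl_distance(q, s, centers[l]), l) for s, l in zip(samples, labels)]
    return min(dists, key=lambda t: t[0])[1]
```

One line per training sample is scored, versus one line per sample pair within each class for NFL, which is where the computation-time saving comes from.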
17.
Nearest neighbour classification expects the class conditional probabilities to be locally constant, and suffers from bias in high dimensions. We propose a locally adaptive form of nearest neighbour classification to ameliorate this curse of dimensionality. We use local linear discriminant analysis to estimate an effective metric for computing neighbourhoods. We determine the local decision boundaries from centroid information, then shrink neighbourhoods in directions orthogonal to these local decision boundaries and elongate them parallel to the boundaries. Thereafter, any neighbourhood-based classifier can be employed using the modified neighbourhoods, in which the posterior probabilities tend to be more homogeneous. We also propose a method for global dimension reduction that combines local dimension information. In a number of examples, the methods demonstrate the potential for substantial improvements over nearest neighbour classification.
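A hedged sketch of one step of such a locally adaptive metric, in the spirit of discriminant adaptive nearest neighbours: estimate within- and between-class scatter from a local neighbourhood of the query and reweight distances accordingly. The constants and names are illustrative:

```python
import numpy as np

def dann_distances(q, X, y, k0=50, eps=1.0):
    """From the k0 points nearest to q, form within-class scatter W and
    between-class scatter B, then weight distances with
    Sigma = W^-1/2 (W^-1/2 B W^-1/2 + eps*I) W^-1/2,
    which stretches the metric orthogonal to the local decision boundary.
    Returns adaptive squared distances from q to every training point."""
    X = np.asarray(X, float)
    y = np.asarray(y)
    idx = np.argsort(np.linalg.norm(X - q, axis=1))[:k0]
    Xl, yl = X[idx], y[idx]
    p = X.shape[1]
    mu = Xl.mean(axis=0)
    W, B = np.zeros((p, p)), np.zeros((p, p))
    for c in np.unique(yl):
        Xc = Xl[yl == c]
        pc = len(Xc) / len(Xl)
        W += pc * np.cov(Xc.T, bias=True)
        m = Xc.mean(axis=0) - mu
        B += pc * np.outer(m, m)
    vals, vecs = np.linalg.eigh(W + 1e-8 * np.eye(p))  # W^-1/2 via eigendecomp.
    Wmh = vecs @ np.diag(vals ** -0.5) @ vecs.T
    Sigma = Wmh @ (Wmh @ B @ Wmh + eps * np.eye(p)) @ Wmh
    d = X - q
    return np.einsum('ij,jk,ik->i', d, Sigma, d)
```

The modified neighbourhood of q is then simply the smallest entries of the returned vector, usable by any neighbourhood-based classifier.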
18.
As databases increasingly integrate different types of information such as time-series, multimedia and scientific data, it becomes necessary to support efficient retrieval of multi-dimensional data. Both the dimensionality and the amount of data that needs to be processed are increasing rapidly. As a result of the scale and high-dimensional nature of the data, traditional techniques have proven inadequate. In this paper, we propose search techniques that are effective especially for large high-dimensional data sets. We first propose the VA+-file technique, which is based on scalar quantization of the data. The VA+-file is especially useful for searching for exact nearest neighbors (NN) in non-uniform high-dimensional data sets. We then discuss how to improve the search and make it progressive by allowing some approximations in the query result. We develop a general framework for approximate NN queries, discuss various approaches for progressive processing of similarity queries, and develop a metric for the evaluation of such techniques. Finally, a new technique based on clustering is proposed, which merges the benefits of various approaches for progressive similarity searching. Extensive experimental evaluation is performed on several real-life data sets. The evaluation establishes the superiority of the proposed techniques over the existing techniques for high-dimensional similarity searching. The techniques proposed in this paper are effective for real-life data sets, which are typically non-uniform, and they are scalable with respect to both the dimensionality and size of the data set.
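A compact sketch of the VA-file's two phases with uniform per-dimension quantization; the VA+-file's contribution is choosing non-uniform boundaries and bit allocation for skewed data, which is not reproduced here:

```python
import numpy as np

def va_file(data, bits=4):
    """Scalar-quantize each dimension into 2**bits slices and store only the
    cell index per dimension: the compact 'vector approximation'."""
    lo, hi = data.min(0), data.max(0)
    edges = [np.linspace(lo[d], hi[d], 2 ** bits + 1) for d in range(data.shape[1])]
    approx = np.stack([np.clip(np.searchsorted(edges[d], data[:, d]) - 1,
                               0, 2 ** bits - 1) for d in range(data.shape[1])], 1)
    return approx, edges

def nn_va_filter(q, data, approx, edges):
    """Phase 1: per-cell lower bounds computed from the approximations prune
    most candidates. Phase 2: survivors are verified with exact distances."""
    lb2 = np.zeros(len(data))
    for d in range(data.shape[1]):
        cell_lo = edges[d][approx[:, d]]
        cell_hi = edges[d][approx[:, d] + 1]
        gap = np.maximum(np.maximum(cell_lo - q[d], q[d] - cell_hi), 0.0)
        lb2 += gap ** 2                     # distance from q to the cell
    best, best_d2 = -1, np.inf
    for i in np.argsort(lb2):               # visit in increasing lower bound
        if lb2[i] >= best_d2:
            break                           # no remaining candidate can win
        d2 = float(np.sum((data[i] - q) ** 2))
        if d2 < best_d2:
            best, best_d2 = i, d2
    return best, np.sqrt(best_d2)
```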
19.
Recent research on sparse signal representation has aimed at learning discriminative sparse models instead of purely reconstructive ones for classification tasks, such as sparse representation based classification (SRC), which obtains state-of-the-art results in face recognition. In this paper, a new method is proposed in that direction. Under the assumption of locally linear embedding, the proposed method achieves the classification goal via sparse neighbor representation, combining the reconstruction property, sparsity and discrimination power. Experiments on several data sets show that the proposed method performs acceptably on nonlinear data sets. Further, it is argued that the proposed method is well suited for classifying low-dimensional data produced by dimensionality reduction methods, especially those that obtain low-dimensional, neighborhood-preserving embeddings, while costing less time.
20.
This paper addresses the problem of continuous aggregate nearest-neighbor (CANN) queries for moving objects in spatio-temporal data stream management systems. A CANN query specifies a set of landmarks, an integer k, and an aggregate distance function f (e.g., min, max, or sum), where f aggregates the distances between a moving object and the landmarks. The answer to this continuous query is the set of k moving objects that have the smallest aggregate distance f. A CANN query may also be viewed as a combined set of nearest neighbor queries. We introduce several algorithms to continuously and incrementally answer CANN queries. Extensive experimentation shows that the proposed operators outperform the state-of-the-art algorithms by up to a factor of 3 and incur low memory overhead.
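A snapshot (non-incremental) sketch of evaluating a CANN query, as a hedged baseline; the paper's operators maintain this answer incrementally as objects move rather than recomputing it:

```python
import numpy as np

def cann_topk(objects, landmarks, k, agg=np.max):
    """For each moving object, aggregate its distances to all landmarks with
    f (np.min, np.max or np.sum), then return the k objects with the
    smallest aggregate distance."""
    D = np.linalg.norm(objects[:, None, :] - landmarks[None, :, :], axis=2)
    scores = agg(D, axis=1)                  # apply f over the landmark axis
    order = np.argsort(scores)[:k]
    return order, scores[order]
```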