Similar Documents
20 similar documents found (search time: 15 ms)
1.
Fast k-nearest neighbor classification using cluster-based trees
Most fast k-nearest neighbor (k-NN) algorithms exploit the metric properties of distance measures to reduce computation cost, and only a few work effectively on both metric and nonmetric measures. We propose a cluster-based tree algorithm to accelerate k-NN classification without any presuppositions about the metric form and properties of the dissimilarity measure. A mechanism of early decision making, together with minimal side-operations for choosing search paths, largely accounts for the efficiency of the algorithm. The algorithm is evaluated through extensive experiments on the standard NIST and MNIST databases.
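A minimal sketch of the idea, assuming nothing from the paper beyond the abstract: training points are grouped around pivots, and a query visits only the few most promising groups before deciding early. Pivot selection, the single-level "tree", and the stopping rule `max_clusters` are illustrative choices, not the authors' algorithm. Note that no metric property is used, only the ordering of dissimilarities.

```python
import heapq
import random

def build_clusters(data, dissim, n_clusters, seed=0):
    # Pick random pivots and assign every point to its nearest pivot.
    rng = random.Random(seed)
    pivots = rng.sample(range(len(data)), n_clusters)
    clusters = {p: [] for p in pivots}
    for i, x in enumerate(data):
        best = min(pivots, key=lambda p: dissim(x, data[p]))
        clusters[best].append(i)
    return clusters

def knn_query(q, data, dissim, clusters, k=5, max_clusters=3):
    # Early decision making: scan only the most promising clusters.
    order = sorted(clusters, key=lambda p: dissim(q, data[p]))
    cand = [i for p in order[:max_clusters] for i in clusters[p]]
    return heapq.nsmallest(k, cand, key=lambda i: dissim(q, data[i]))
```

Any dissimilarity works, e.g. `dissim = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))`.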

2.
Particle methods provide a simple yet powerful framework for simulating both discrete and continuous systems either deterministically or stochastically. The inherent adaptivity of particle methods is particularly appealing when simulating multiscale models or systems that develop a wide spectrum of length scales. Evaluating particle–particle interactions using neighbor-finding algorithms such as cell lists or Verlet lists, however, quickly becomes inefficient in adaptive-resolution simulations where the interaction cutoff radius is a function of space. We present a novel adaptive-resolution cell list algorithm and the associated data structures that provide efficient access to the interaction partners of a particle, independent of the (potentially continuous) spectrum of cutoff radii present in a simulation. We characterize the computational cost of the proposed algorithm for a wide range of resolution spans and particle numbers, showing that the present algorithm outperforms conventional uniform-resolution cell lists in most adaptive-resolution settings.
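For reference, a minimal uniform-resolution cell list, the baseline the paper improves on: particles are binned into cells of side `r_cut`, so a particle's interaction partners all lie in its own cell and the 3^d surrounding cells.

```python
import itertools
from collections import defaultdict

def build_cell_list(points, r_cut):
    # Bin particle indices by integer cell coordinates.
    cells = defaultdict(list)
    for i, p in enumerate(points):
        cells[tuple(int(c // r_cut) for c in p)].append(i)
    return cells

def neighbors(i, points, cells, r_cut):
    # Check only the 3^d cells around particle i's cell.
    cell = tuple(int(c // r_cut) for c in points[i])
    out = []
    for off in itertools.product((-1, 0, 1), repeat=len(cell)):
        for j in cells.get(tuple(c + o for c, o in zip(cell, off)), ()):
            if j != i and sum((a - b) ** 2 for a, b in
                              zip(points[i], points[j])) <= r_cut ** 2:
                out.append(j)
    return out
```

The paper's contribution is precisely what this sketch lacks: efficient lookups when `r_cut` varies continuously in space.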

3.
Fast agglomerative clustering using a k-nearest neighbor graph
We propose a fast agglomerative clustering method that uses an approximate nearest neighbor graph to reduce the number of distance calculations. The time complexity of the algorithm is improved from $O(\tau N^2)$ to $O(\tau N \log N)$ at the cost of a slight increase in distortion; here, $\tau$ denotes the number of nearest neighbor updates required at each iteration. According to the experiments, a relatively small neighborhood size is sufficient to maintain quality close to that of a full search.
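A toy PNN-style sketch of the core idea, restricting merge candidates to edges of a k-NN graph. The edge bookkeeping here is deliberately naive (full rebuild of the edge set after each merge); the paper's data structures are what deliver the $O(\tau N \log N)$ bound.

```python
import numpy as np

def knn_graph_agglomerate(X, k=5, n_clusters=10):
    X = np.asarray(X, float)
    clusters = {i: [i] for i in range(len(X))}
    centroid = {i: X[i].copy() for i in range(len(X))}
    # Build the k-NN graph once, up front (brute force for the sketch).
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    edges = {frozenset((i, int(j))) for i in range(len(X))
             for j in np.argsort(D[i])[1:k + 1]}
    while len(clusters) > n_clusters and edges:
        # Cheapest merge among graph edges only, not all pairs.
        a, b = min(edges, key=lambda e: (
            (centroid[min(e)] - centroid[max(e)]) ** 2).sum())
        na, nb = len(clusters[a]), len(clusters[b])
        centroid[a] = (na * centroid[a] + nb * centroid[b]) / (na + nb)
        clusters[a] += clusters.pop(b)
        del centroid[b]
        # Reroute b's edges to a and drop the resulting self-loop.
        edges = {frozenset(a if v == b else v for v in e)
                 for e in edges} - {frozenset((a,))}
    return list(clusters.values())
```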

4.
5.
In this paper, we present a novel nearest neighbor rule-based implementation of the structural risk minimization principle to address a generic classification problem. We propose a fast reference set thinning algorithm on the training data set, similar to a support vector machine (SVM) approach. We then show that the nearest neighbor rule based on the reduced set implements the structural risk minimization principle, in a manner that does not involve selection of a convenient feature space. Simulation results on real data indicate that this method significantly reduces the computational cost of conventional SVMs while achieving nearly comparable test error performance.

6.
Vector quantization has been widely employed in nearest neighbor search because it can approximate the Euclidean distance between two vectors with precomputed look-up tables. The additive quantization (AQ) algorithm showed that low approximation error can be achieved by representing each input vector as a sum of dependent codewords, each drawn from its own codebook. However, AQ relies on a computationally expensive beam search to encode each vector, which is prohibitive for efficient approximate nearest neighbor search. In this paper, we propose a fast AQ algorithm that significantly accelerates the encoding phase. We formulate beam search as an optimization of the codebook selection order. Following the optimal order, we learn the codebooks by hierarchical construction, which allows the search width to be set very small. Specifically, codewords are first exchanged into the proper codebooks according to their indexed frequency at each step; the codebooks are then updated successively to fit the quantization residual of the previous quantization level. In the coding phase, vectors are compressed with the learned codebooks via the best order, so the search range is considerably reduced. The proposed method achieves almost the same performance as AQ, while vector encoding is accelerated by dozens of times. Experiments on two benchmark datasets verify these conclusions.
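A minimal sketch of additive-quantization encoding in the limiting case of search width 1, i.e. pure greedy selection over a fixed codebook order, which is the regime the paper's hierarchical construction makes viable. The random codebooks are stand-ins; in practice they are learned.

```python
import numpy as np

def encode_greedy(x, codebooks):
    # Pick, from each codebook in order, the codeword that best
    # explains the remaining residual (beam width = 1).
    codes, residual = [], x.copy()
    for C in codebooks:
        j = int(((residual - C) ** 2).sum(1).argmin())
        codes.append(j)
        residual -= C[j]
    return codes, residual

rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(256, 32)) for _ in range(8)]  # M=8, K=256, d=32
x = rng.normal(size=32)
codes, res = encode_greedy(x, codebooks)
print(codes, float((res ** 2).sum()))   # indices and residual energy
```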

7.
8.
To address the slow speed and large positioning fluctuations of indoor WiFi fingerprint positioning, the received signals are smoothed using Gaussian fitting and by averaging repeated measurements. Taking distance as the similarity measure, a threshold is specified to partition the fingerprint database into classes. The K-nearest neighbor algorithm is improved, and fast K-NN matching is performed on top of this classification. Experimental results show that the classified positioning system's running time improves substantially, by 62.8% on average, and that the mean error of WiFi fingerprint positioning drops from 4.17 m to 2.12 m.
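A minimal sketch of the pipeline the abstract describes: average repeated RSSI scans, partition the fingerprint database with a distance threshold, then run weighted K-NN only inside the closest partition. Names, the greedy leader clustering, and the inverse-distance weighting are illustrative assumptions.

```python
import numpy as np

def smooth(rssi_samples):
    # Average repeated scans to damp RSSI fluctuations.
    return np.mean(rssi_samples, axis=0)

def partition(fingerprints, threshold):
    # Greedy leader clustering of (rssi_vector, position) fingerprints.
    groups = []
    for fp, pos in fingerprints:
        for g in groups:
            if np.linalg.norm(fp - g["center"]) < threshold:
                g["members"].append((fp, pos)); break
        else:
            groups.append({"center": fp, "members": [(fp, pos)]})
    return groups

def locate(query, groups, k=3):
    # Match only inside the nearest partition (the speedup step).
    g = min(groups, key=lambda g: np.linalg.norm(query - g["center"]))
    members = sorted(g["members"],
                     key=lambda m: np.linalg.norm(query - m[0]))[:k]
    w = np.array([1.0 / (np.linalg.norm(query - fp) + 1e-9)
                  for fp, _ in members])
    pts = np.array([pos for _, pos in members])
    return (w[:, None] * pts).sum(0) / w.sum()  # inverse-distance weighting
```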

9.
The speed of neighbor discovery affects the efficiency of network formation and communication for the whole network. Existing neighbor discovery protocols do not consider the beacon collisions that arise when multiple nodes transmit beacons simultaneously. To address this problem, a fast neighbor discovery mechanism that effectively avoids beacon collisions is proposed: before sending a beacon, a node uses carrier sensing to probe the channel state, which reduces beacon collisions and improves discovery efficiency. In addition, wake-up slots are added dynamically to reduce discovery latency and speed up neighbor discovery. Simulation results show that the mechanism effectively accelerates the discovery speed of existing protocols under both symmetric and asymmetric duty cycles.
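A toy slotted simulation contrasting blind beaconing with listen-before-send. This only illustrates why carrier sensing reduces beacon collisions; it is not the paper's protocol or its wake-up schedule.

```python
import random

def run(carrier_sense, n_nodes=20, p_beacon=0.3, slots=5000, seed=1):
    rng = random.Random(seed)
    delivered = 0
    for _ in range(slots):
        contenders = [i for i in range(n_nodes) if rng.random() < p_beacon]
        if carrier_sense and len(contenders) > 1:
            # Each contender senses during a random backoff; only the node
            # with the shortest backoff finds the channel idle and sends.
            contenders = [min(contenders, key=lambda i: rng.random())]
        if len(contenders) == 1:   # exactly one beacon -> no collision
            delivered += 1
    return delivered

print("without carrier sense:", run(False), "  with carrier sense:", run(True))
```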

10.
Resampling is a commonly used and effective means of dealing with imbalanced data. To improve the detection efficiency of intrusion detection systems and reduce the degree of data imbalance, a fast hierarchical nearest neighbor (FHNN) resampling method is proposed, together with a two-stage high-speed network intrusion detection model based on a load-balancing strategy. The KDD'99 training set is partitioned by protocol type, and various experiments are carried out on each subset. The results show that the method removes noisy data and redundant information well, especially samples in the interior of class regions, reducing both the degree of imbalance and the total sample count. Moreover, since the algorithm's time complexity is linear, it runs very fast even on very large sample sets, making it suitable for detecting all kinds of attacks quickly and effectively in massive data.
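The abstract does not spell out FHNN itself, so the following is a generic nearest-neighbor undersampling sketch in the same spirit: drop majority-class samples that sit deep inside their own class region (all k nearest neighbors share their label) while keeping boundary samples.

```python
import numpy as np

def remove_interior(X, y, majority_label, k=5):
    # Keep all minority samples; keep only boundary majority samples.
    X = np.asarray(X, float); y = np.asarray(y)
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    keep = []
    for i in range(len(X)):
        if y[i] != majority_label:
            keep.append(i); continue
        nn = np.argsort(D[i])[1:k + 1]       # skip self at position 0
        if not np.all(y[nn] == y[i]):        # boundary sample -> keep
            keep.append(i)
    return X[keep], y[keep]
```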

11.
In this paper, we present a fast and versatile algorithm which can rapidly perform a variety of nearest neighbor searches. Efficiency improvement is achieved by utilizing a distance lower bound to avoid calculating the distance itself whenever the lower bound is already larger than the current global minimum distance. At the preprocessing stage, the proposed algorithm constructs a lower bound tree (LB-tree) by agglomeratively clustering all the sample points to be searched. Given a query point, the lower bound of its distance to each sample point can be calculated using the internal nodes of the LB-tree. To reduce the number of lower bounds actually calculated, the winner-update search strategy is used for traversing the tree. For further efficiency improvement, data transformation can be applied to the sample and query points. In addition to finding the nearest neighbor, the proposed algorithm can also (i) provide the k-nearest neighbors progressively; (ii) find the nearest neighbors within a specified distance threshold; and (iii) identify neighbors whose distances to the query are sufficiently close to the minimum distance of the nearest neighbor. Our experiments have shown that the proposed algorithm can save substantial computation, particularly when the distance from the query point to its nearest neighbor is relatively small compared with its distances to most other samples (which is the case for many object recognition problems).
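A minimal single-level sketch of lower-bound pruning. Samples are grouped around centers with precomputed center-to-sample distances; by the triangle inequality, d(q, x) >= |d(q, c) - d(c, x)|, so a candidate is skipped whenever this bound already exceeds the current best. The LB-tree generalizes this to a hierarchy traversed with winner-update search.

```python
import numpy as np

def nn_with_lower_bounds(q, centers, groups):
    # groups[ci] is a list of (x, dcx) pairs, dcx = d(centers[ci], x),
    # precomputed at preprocessing time.
    best, best_x = np.inf, None
    order = np.argsort([np.linalg.norm(q - c) for c in centers])
    for ci in order:                       # most promising centers first
        dqc = np.linalg.norm(q - centers[ci])
        for x, dcx in groups[ci]:
            if abs(dqc - dcx) >= best:     # lower bound prunes the point
                continue
            d = np.linalg.norm(q - x)      # full distance only if needed
            if d < best:
                best, best_x = d, x
    return best_x, best
```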

12.
Learning from high-dimensional data is usually quite challenging, as captured by the well-known phrase curse of dimensionality. Data analysis often involves measuring the similarity between different examples. This sometimes becomes a problem, as many widely used metrics tend to concentrate in high-dimensional feature spaces. The reduced contrast makes it more difficult to distinguish between close and distant points, which renders many traditional distance-based learning methods ineffective. Secondary distances based on shared neighbor similarities have recently been proposed as one possible solution to this problem. However, these initial metrics failed to take hubness into account. Hubness is a recently described aspect of the dimensionality curse, and it affects all sorts of $k$-nearest neighbor learning methods in severely negative ways. This paper is the first to discuss the impact of hubs on forming the shared neighbor similarity scores. We propose a novel, hubness-aware secondary similarity measure $simhub_s$, and an extensive experimental evaluation shows it to be much more appropriate for high-dimensional data classification than the standard $simcos_s$ measure. The proposed similarity changes the underlying $k$NN graph in such a way that it reduces the overall frequency of label mismatches in $k$-neighbor sets and increases the purity of occurrence profiles, which improves classifier performance. It is a hybrid measure, which takes into account both the supervised and the unsupervised hubness information. The analysis shows that both components are useful in their own ways and that the measure is therefore properly defined. This new similarity does not increase the overall computational cost, and the improvement is essentially 'free'.
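A sketch of a shared-neighbor similarity with a hubness-aware weight. The standard $simcos_s$ counts shared $k$-neighbors; here each shared neighbor $z$ is down-weighted the more often it occurs in $k$-NN lists. The IDF-style weight is an assumed stand-in for the paper's exact $simhub_s$ definition.

```python
import numpy as np

def shared_neighbor_similarity(X, k=10):
    X = np.asarray(X, float)
    n = len(X)
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    knn = [set(np.argsort(D[i])[1:k + 1]) for i in range(n)]
    occ = np.zeros(n)                    # N_k(z): k-occurrence counts
    for s in knn:
        for z in s:
            occ[z] += 1
    w = np.log(n / (occ + 1.0))          # hubs contribute less to similarity
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            S[i, j] = S[j, i] = sum(w[z] for z in knn[i] & knn[j])
    return S / max(S.max(), 1e-12)       # normalize to [0, 1]
```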

13.
14.
Traditional fast k-nearest neighbor search algorithms based on pyramid structures need either much extra memory or a long search time. This paper proposes a fast k-nearest neighbor search algorithm based on the wavelet transform, which exploits the important information hidden in the transform coefficients to reduce the computational complexity. The study indicates that the Haar wavelet transform yields two kinds of important pyramids. Two elimination criteria derived from the transform coefficients are used to reject impossible candidates. Experimental results on texture classification verify the effectiveness of the proposed algorithm.
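A sketch of transform-domain elimination, analogous in spirit to the paper's pyramid criteria: an orthonormal Haar transform preserves Euclidean distances, so the partial squared distance over the first (coarsest) coefficients is a valid lower bound on the true distance and can reject candidates early. Vector lengths are assumed to be powers of two.

```python
import numpy as np

def haar(v):
    # Orthonormal 1-D Haar transform; distances are preserved exactly.
    v = np.asarray(v, float)
    out = []
    while len(v) > 1:
        a = (v[0::2] + v[1::2]) / np.sqrt(2)   # averages
        d = (v[0::2] - v[1::2]) / np.sqrt(2)   # details
        out.append(d); v = a
    out.append(v)
    return np.concatenate(out[::-1])           # coarse coefficients first

def nn_search(q, samples):
    tq = haar(q)
    ts = [haar(x) for x in samples]
    best, best_i = np.inf, -1
    for i, tx in enumerate(ts):
        partial = 0.0
        for m in range(len(tq)):               # accumulate; reject early
            partial += (tq[m] - tx[m]) ** 2
            if partial >= best:
                break
        else:                                  # survived all coefficients
            best, best_i = partial, i
    return best_i, np.sqrt(best)
```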

15.
An efficient and universal similarity search solution is a holy grail for multimedia information retrieval. Most similarity indexes work by mapping the original multimedia objects into simpler representations, which are then searched by proximity using a suitable distance function.

16.

Human liver disorders may be genetic, caused by habitual alcohol use, or caused by viral infection, and can lead to liver failure or liver cancer if not detected at an early stage. The aim of the proposed method is to detect liver disorders at an early stage using liver function test datasets. A problem with many real-world datasets, including liver disease diagnosis data, is class imbalance: the number of observations belonging to one class is much larger or smaller than that of the other class(es). The traditional K-Nearest Neighbor (KNN) and Fuzzy KNN classifiers do not work well on imbalanced datasets because they treat all neighbors equally. Weighted variants of Fuzzy KNN assign a large weight to neighbors from the minority class and a relatively small weight to neighbors from the majority class to mitigate the imbalance. In this paper, a Variable-Neighbor Weighted Fuzzy K Nearest Neighbor approach (Variable-NWFKNN) is proposed, which is an improved variant of Fuzzy-NWKNN. The proposed Variable-NWFKNN method is evaluated on three real-world imbalanced liver function test datasets: BUPA and ILPD from UCI, and MPRLPD. Compared with the existing NWKNN and Fuzzy-NWKNN methods, it achieves accuracies of 73.91% (BUPA), 77.59% (ILPD), and 87.01% (MPRLPD). Further, using TL_RUS for preprocessing improves the accuracy to 78.46% (BUPA), 78.46% (ILPD), and 95.79% (MPRLPD).
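A sketch of a class-size-weighted fuzzy kNN vote in the spirit of NWKNN-style methods: neighbors vote with fuzzy membership scaled by a class weight that grows as the class gets rarer. The weight formula (n_min / n_c)^(1/p) and the parameters are assumptions, not the paper's exact rule.

```python
import numpy as np

def weighted_fuzzy_knn(X, y, q, k=5, m=2.0, p=4.0):
    X = np.asarray(X, float); y = np.asarray(y)
    d = np.linalg.norm(X - q, axis=1)
    nn = np.argsort(d)[:k]
    # Rarer classes get larger weights, majority classes smaller ones.
    classes, counts = np.unique(y, return_counts=True)
    w = {c: (counts.min() / n) ** (1.0 / p) for c, n in zip(classes, counts)}
    score = {}
    for i in nn:
        mu = 1.0 / (d[i] ** (2.0 / (m - 1.0)) + 1e-9)   # fuzzy membership
        score[y[i]] = score.get(y[i], 0.0) + w[y[i]] * mu
    return max(score, key=score.get)
```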

17.
The k nearest neighbor (k-NN) classifier has been a widely used nonparametric technique in Pattern Recognition because of its simplicity and good performance. In order to decide the class of a new prototype, the k-NN classifier performs an exhaustive comparison between the prototype to classify and the prototypes in the training set T. However, when T is large, the exhaustive comparison is expensive. For this reason, many fast k-NN classifiers have been developed; some of them are based on a tree structure, which is created during a preprocessing phase using the prototypes in T. Then, in a search phase, the tree is traversed to find the nearest neighbor. The speedup is obtained by avoiding the exploration of some parts of the tree, using pruning rules that are usually based on the triangle inequality. However, in soft sciences such as Medicine, Geology, and Sociology, the prototypes are usually described by numerical and categorical attributes (mixed data), and sometimes the comparison function for computing the similarity between prototypes does not satisfy metric properties. Therefore, in this work an approximate fast k most similar neighbor classifier, for mixed data and similarity functions that do not satisfy metric properties, based on a tree structure (Tree k-MSN) is proposed. Some experiments with synthetic and real data are presented.

18.
In this paper, a novel center-based nearest neighbor (CNN) classifier is proposed to deal with pattern classification problems. Unlike the nearest feature line (NFL) method, CNN considers the line passing through a sample point with known label and the center of that sample's class. This line is called the center-based line (CL). These lines have more representational capacity for sample classes than the original samples and can thus capture more information. Similar to NFL, CNN classifies an unknown sample point by its nearest distance to a CL. As a result, the computation time of CNN can be shortened dramatically, with only a small decrease in accuracy, compared with NFL. The performance of CNN is demonstrated in a simulation experiment from computational biology, where high classification accuracy is achieved in the leave-one-out test. Comparisons with the nearest neighbor (NN) classifier and the NFL classifier indicate that this novel classifier achieves competitive performance.
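A minimal sketch following the abstract's description: each training sample x of class c defines a center-based line through x and the class centroid m_c, and a query is assigned the class of the nearest such line (point-to-line distance via orthogonal projection).

```python
import numpy as np

def cnn_classify(X, y, q):
    X = np.asarray(X, float); y = np.asarray(y)
    centroids = {c: X[y == c].mean(0) for c in np.unique(y)}
    best, best_c = np.inf, None
    for x, c in zip(X, y):
        u = centroids[c] - x                 # direction of the CL
        nrm = np.linalg.norm(u)
        if nrm < 1e-12:                      # degenerate line: use the point
            d = np.linalg.norm(q - x)
        else:
            u = u / nrm
            v = q - x
            d = np.linalg.norm(v - (v @ u) * u)  # distance to the line
        if d < best:
            best, best_c = d, c
    return best_c
```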

19.
Sparse coding has received extensive attention in the image classification literature. Traditional sparse coding strategies approximate local features as a linear combination of basis vectors, without considering feature neighboring relationships; in this scenario, similar instances in the feature space may receive totally different sparse codes. To address this shortcoming, we investigate how to develop new sparse representations that preserve feature similarities. We commence by establishing two modules to improve the discriminative ability of sparse representation. The first module selects discriminative features for each class, and the second module eliminates non-informative visual words. We then explore the distribution of similar features over the dominant basis vectors for each class. We incorporate the feature distribution into the objective function, spanning a class-specific low-dimensional subspace for effective sparse coding. Extensive experiments on various image classification tasks validate that the proposed approach consistently outperforms several state-of-the-art methods.

20.
Discriminant adaptive nearest neighbor classification
Nearest neighbour classification expects the class conditional probabilities to be locally constant, and suffers from bias in high dimensions. We propose a locally adaptive form of nearest neighbour classification to try to ameliorate this curse of dimensionality. We use a local linear discriminant analysis to estimate an effective metric for computing neighbourhoods. We determine the local decision boundaries from centroid information, then shrink neighbourhoods in directions orthogonal to these local decision boundaries and elongate them parallel to the boundaries. Thereafter, any neighbourhood-based classifier can be employed using the modified neighbourhoods. The posterior probabilities tend to be more homogeneous in the modified neighbourhoods. We also propose a method for global dimension reduction that combines local dimension information. In a number of examples, the methods demonstrate the potential for substantial improvements over nearest neighbour classification.
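A compact sketch of the DANN local metric: within a neighbourhood of the query, compute the within-class scatter W and between-class scatter B, then form the adapted metric Sigma = W^{-1/2} [ W^{-1/2} B W^{-1/2} + eps * I ] W^{-1/2}, which shrinks distances along the decision boundary and stretches them across it. The neighbourhood size k and regularizer eps are illustrative hyperparameters.

```python
import numpy as np

def dann_metric(X, y, q, k=50, eps=1.0):
    X = np.asarray(X, float); y = np.asarray(y)
    # Local neighbourhood around the query (plain Euclidean to start).
    idx = np.argsort(((X - q) ** 2).sum(1))[:k]
    Xn, yn = X[idx], y[idx]
    mean = Xn.mean(0)
    d = X.shape[1]
    W = np.zeros((d, d)); B = np.zeros((d, d))
    for c in np.unique(yn):
        Xc = Xn[yn == c]
        mc = Xc.mean(0)
        W += (Xc - mc).T @ (Xc - mc)                 # within-class scatter
        B += len(Xc) * np.outer(mc - mean, mc - mean)  # between-class scatter
    W /= len(Xn); B /= len(Xn)
    # Whiten with W^{-1/2}, regularize the between matrix, map back.
    evals, evecs = np.linalg.eigh(W)
    W_inv_sqrt = evecs @ np.diag(1.0 / np.sqrt(np.maximum(evals, 1e-10))) @ evecs.T
    Bstar = W_inv_sqrt @ B @ W_inv_sqrt
    Sigma = W_inv_sqrt @ (Bstar + eps * np.eye(d)) @ W_inv_sqrt
    return Sigma   # use (x - q)^T Sigma (x - q) as the adapted squared distance
```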
