Similar Documents
20 similar documents found.
1.
The k-nearest neighbor (KNN) rule is a classical and very effective nonparametric technique in pattern classification, but its classification performance is severely affected by outliers. The local mean-based k-nearest neighbor classifier (LMKNN) was introduced to achieve robustness against outliers by computing the local mean vector of the k nearest neighbors in each class. However, its performance suffers from the use of a single value of k that is, moreover, uniform across classes. In this paper, we propose a new KNN-based classifier, called the multi-local means-based k-harmonic nearest neighbor (MLM-KHNN) rule. In our method, the k nearest neighbors in each class are first found and then used to compute k different local mean vectors, whose harmonic mean distance to the query sample is computed. Finally, MLM-KHNN classifies the query sample into the class with the minimum harmonic mean distance. Experimental results on twenty real-world datasets from the UCI and KEEL repositories demonstrate that the proposed MLM-KHNN classifier achieves a lower classification error rate and is less sensitive to the parameter k than nine related competitive KNN-based classifiers, especially in small training sample size situations.
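Read literally, the rule computes, within each class, the cumulative local means of the query's 1, 2, ..., k nearest neighbours and scores the class by the harmonic mean of the query's distances to those means. A minimal NumPy sketch under that reading follows; the function name, parameter defaults, and the small-class guard are ours.

```python
import numpy as np

def mlm_khnn_predict(X_train, y_train, query, k=5):
    """Sketch of the MLM-KHNN rule as described in the abstract."""
    classes = np.unique(y_train)
    best_class, best_score = None, np.inf
    for c in classes:
        Xc = X_train[y_train == c]
        m = min(k, len(Xc))                      # guard against small classes
        d = np.linalg.norm(Xc - query, axis=1)
        nn = Xc[np.argsort(d)[:m]]
        # m local mean vectors: means of the 1, 2, ..., m nearest neighbours
        local_means = np.cumsum(nn, axis=0) / np.arange(1, m + 1)[:, None]
        dists = np.linalg.norm(local_means - query, axis=1)
        # harmonic mean of the distances to the m local means
        score = m / np.sum(1.0 / np.maximum(dists, 1e-12))
        if score < best_score:
            best_class, best_score = c, score
    return best_class
```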

2.
The nearest neighbor classification method assigns an unclassified point to the class of the nearest case in a set of previously classified points. This rule is independent of the underlying joint distribution of the sample points and their classifications. An extension of this approach is the k-NN method, in which the unclassified point is classified by a voting criterion among its k nearest points. The method we present here extends the k-NN idea: it searches each class for the k points nearest to the unclassified point and assigns the point to the class that minimizes the mean distance between the unclassified point and those k nearest points. Because all classes take part in the final selection process, we call the new approach k Nearest Neighbor Equality (k-NNE). Empirical results show the suitability of the k-NNE algorithm, and its effectiveness suggests that it could be added to the current list of distance-based classifiers.
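Since the abstract fully specifies the decision rule, a compact sketch is straightforward; names and the default k are ours.

```python
import numpy as np

def knne_predict(X_train, y_train, query, k=3):
    """k-NNE: score each class by the mean distance from the query to
    its k nearest members, and pick the class with the smallest mean."""
    classes = np.unique(y_train)
    scores = [np.sort(np.linalg.norm(X_train[y_train == c] - query,
                                     axis=1))[:k].mean()
              for c in classes]
    return classes[int(np.argmin(scores))]
```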

3.
4.
A new approach called shortest feature line segment (SFLS) is proposed in this paper to perform pattern classification; it retains the ideas and advantages of the nearest feature line (NFL) while counteracting NFL's drawbacks. Instead of the perpendicular distance defined in NFL, SFLS uses the length of the feature line segment that satisfies a given geometric relation with the query point. SFLS has a clear geometric foundation and is relatively simple. Experimental results on artificial and real-world datasets are provided, together with comparisons between SFLS and other neighborhood-based classification methods, including nearest neighbor (NN), k-NN, NFL, and several refined NFL methods. It can be concluded that SFLS is a simple yet effective classification approach.
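The exact geometric relation SFLS imposes on a candidate segment is not spelled out in the abstract, so we do not reproduce it here; for context, the sketch below shows the classical NFL perpendicular distance that SFLS replaces. Names are ours, and each class is assumed to contain at least two prototypes.

```python
import numpy as np
from itertools import combinations

def nfl_distance(query, x1, x2):
    """Perpendicular distance from `query` to the feature line through
    prototypes x1 and x2 (the NFL distance that SFLS refines)."""
    direction = x2 - x1
    t = np.dot(query - x1, direction) / np.dot(direction, direction)
    foot = x1 + t * direction        # projection of the query onto the line
    return np.linalg.norm(query - foot)

def nfl_predict(X_train, y_train, query):
    classes = np.unique(y_train)
    scores = []
    for c in classes:
        Xc = X_train[y_train == c]   # assumes len(Xc) >= 2
        scores.append(min(nfl_distance(query, Xc[i], Xc[j])
                          for i, j in combinations(range(len(Xc)), 2)))
    return classes[int(np.argmin(scores))]
```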

5.
This paper presents a novel method for the differential diagnosis of erythemato-squamous disease, based on fuzzy weighted pre-processing, k-NN (nearest neighbor) based weighted pre-processing, and a decision tree classifier. The method consists of three parts. In the first part, a decision tree classifier alone is used to diagnose erythemato-squamous disease. In the second part, fuzzy weighted pre-processing, a new method improved by us, is applied to the inputs of the erythemato-squamous disease dataset, and the resulting weighted inputs are classified with the decision tree classifier. In the third part, k-NN based weighted pre-processing, also a new method improved by us, is applied to the inputs, and the weighted inputs are again classified with the decision tree classifier. The plain decision tree classifier, the fuzzy weighted pre-processing decision tree classifier, and the k-NN based weighted pre-processing decision tree classifier reached classification accuracies of 86.18%, 97.57%, and 99.00%, respectively, using 20-fold cross-validation.

6.
The nearest neighbor (NN) rule is one of the simplest and most important methods in pattern recognition. In this paper, we propose a kernel difference-weighted k-nearest neighbor (KDF-KNN) method for pattern classification. The proposed method formulates the weighted KNN rule as a constrained optimization problem, and we propose an efficient solution for computing the weights of the different nearest neighbors. Unlike traditional distance-weighted KNN, which assigns weights to the nearest neighbors according to their distances to the unclassified sample, difference-weighted KNN weighs the nearest neighbors using the correlations among the differences between the unclassified sample and its nearest neighbors. To take effective nonlinear structure information into account, we further extend difference-weighted KNN to its kernel version, KDF-KNN. Our experimental results indicate that KDF-KNN is much better than the original KNN and distance-weighted KNN methods, and is comparable to or better than several state-of-the-art methods in terms of classification accuracy.
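The constrained optimization that yields KDF-KNN's weights is not given in the abstract, so it is not reproduced here; for contrast, here is a minimal sketch of the distance-weighted k-NN baseline the method is compared against, with names and defaults of our choosing.

```python
import numpy as np

def distance_weighted_knn(X_train, y_train, query, k=5):
    """Distance-weighted k-NN: each of the k nearest neighbours votes
    with weight inversely proportional to its distance to the query."""
    d = np.linalg.norm(X_train - query, axis=1)
    idx = np.argsort(d)[:k]
    weights = 1.0 / np.maximum(d[idx], 1e-12)
    votes = {}
    for i, w in zip(idx, weights):
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + w
    return max(votes, key=votes.get)
```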

7.
In this article, we propose a new generalization of the rank nearest neighbor (RNN) rule for multivariate data, applied to the diagnosis of breast cancer. We study the performance of this rule using two well-known databases and compare the results with the conventional k-NN rule. We observe that the rule performs remarkably well, and the computational complexity of the proposed k-RNN is much lower than that of the conventional k-NN rule.

8.
Hao Du, Pattern Recognition, 2007, 40(5): 1486-1497
This paper points out and analyzes the advantages and drawbacks of the nearest feature line (NFL) classifier. To overcome the shortcomings, a new feature subspace with two simple and effective improvements is built to represent each class. The proposed method, termed rectified nearest feature line segment (RNFLS), is shown to possess a novel concentration property resulting from the added line segments (features), which significantly enhances its classification ability. Another remarkable merit is that RNFLS is applicable to complex tasks such as the two-spiral distribution, which the original NFL cannot handle properly. Finally, experimental comparisons with NFL, NN (nearest neighbor), k-NN, and NNL (nearest neighbor line) on both artificial and real-world datasets demonstrate that RNFLS offers the best performance.

9.
A novel classification method based on multiple-point statistics (MPS) is proposed in this article. The method is a modified version of the spatially weighted k-nearest neighbour (k-NN) classifier, which accounts for spatial correlation through weights applied to neighbouring pixels. The MPS characterizes the spatial correlation between multiple points of land-cover classes by learning local patterns in a training image. This rich spatial information is then converted to multiple-point probabilities and incorporated into the k-NN classifier. Experiments were conducted in two study areas: the proposed method was tested on a WorldView-2 sub-scene of a mountainous area of Sichuan and an IKONOS image of the Beijing urban area. The multiple-point weighted k-NN method (MPk-NN) was compared to several alternatives, including the traditional k-NN and two previously published spatially weighted k-NN schemes: the inverse-distance weighted k-NN and the geostatistically weighted k-NN. Classifiers using the Bayesian and Support Vector Machine (SVM) methods, as well as versions of these classifiers weighted with spatial context using the Markov random field (MRF) model, were also included as benchmarks. The proposed approach increased classification accuracy significantly relative to the alternatives and is thus recommended for identifying land-cover types with complex and diverse spatial distributions.

10.
Manifold ranking is a powerful method in semi-supervised learning, and its performance depends heavily on the quality of the constructed graph. In this paper, we propose a novel graph structure named the k-regular nearest neighbor (k-RNN) graph, together with its construction algorithm, and apply the new graph structure in the framework of manifold-ranking based retrieval. We show that the manifold-ranking algorithm based on our proposed graph structure outperforms those based on existing graph structures, such as the k-nearest neighbor (k-NN) graph and the connected graph, in image retrieval, 2D data clustering, and 3D model retrieval. In addition, automatic sample reweighting and graph updating algorithms are presented for the relevance feedback stage of our algorithm. Experiments demonstrate that the proposed algorithm outperforms state-of-the-art algorithms.
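The k-RNN construction itself is not detailed in the abstract; as a point of reference, the sketch below builds the plain k-NN affinity graph that manifold ranking conventionally uses, with Gaussian edge weights and symmetrization. The k-RNN graph additionally enforces (near-)regular node degrees, which this baseline does not. Names and the bandwidth parameter are ours.

```python
import numpy as np

def knn_graph(X, k=5, sigma=1.0):
    """Baseline k-NN affinity graph for manifold ranking: connect each
    point to its k nearest neighbours with Gaussian weights, then
    symmetrize the adjacency matrix."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    W = np.zeros_like(dists)
    for i in range(len(X)):
        nn = np.argsort(dists[i])[1:k + 1]   # index 0 is the point itself
        W[i, nn] = np.exp(-dists[i, nn] ** 2 / (2 * sigma ** 2))
    return np.maximum(W, W.T)                # symmetrize
```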

11.
In this paper, we present a fast and versatile algorithm which can rapidly perform a variety of nearest neighbor searches. Efficiency is improved by using a distance lower bound to avoid computing the distance itself whenever the lower bound already exceeds the global minimum distance. At the preprocessing stage, the proposed algorithm constructs a lower bound tree (LB-tree) by agglomeratively clustering all the sample points to be searched. Given a query point, the lower bound of its distance to each sample point can be calculated using the internal nodes of the LB-tree. To reduce the number of lower bounds actually calculated, the winner-update search strategy is used for traversing the tree. For further efficiency improvement, data transformation can be applied to the sample and query points. In addition to finding the nearest neighbor, the proposed algorithm can also (i) provide the k nearest neighbors progressively; (ii) find the nearest neighbors within a specified distance threshold; and (iii) identify neighbors whose distances to the query are sufficiently close to the minimum distance of the nearest neighbor. Our experiments have shown that the proposed algorithm saves substantial computation, particularly when the distance from the query point to its nearest neighbor is small relative to its distance to most other samples (which is the case for many object recognition problems).
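The LB-tree and winner-update traversal are more elaborate than can be shown here, but the core pruning idea admits a compact illustration: by the triangle inequality, |d(q,p) - d(x,p)| <= d(q,x) for any pivot p, so a sample can be skipped whenever this cheap bound already exceeds the best distance found so far. The single-pivot simplification below is ours, not the paper's data structure.

```python
import numpy as np

def nn_with_lower_bounds(X, query, pivot=None):
    """Nearest-neighbour search with triangle-inequality lower bounds
    from one pivot (a simplified illustration of the LB-tree idea)."""
    if pivot is None:
        pivot = X.mean(axis=0)
    d_xp = np.linalg.norm(X - pivot, axis=1)   # precomputed once
    d_qp = np.linalg.norm(query - pivot)
    lb = np.abs(d_qp - d_xp)                   # lower bound on d(query, x)
    order = np.argsort(lb)                     # winner-update spirit
    best_i, best_d = -1, np.inf
    for i in order:
        if lb[i] >= best_d:
            break                              # all remaining bounds are larger
        d = np.linalg.norm(X[i] - query)
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d
```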

12.
Data weighting is of paramount importance for classification performance in pattern recognition applications. In this paper, the output labels of datasets are encoded as binary codes (numbers), yielding a novel data weighting method called binary encoded output based data weighting (BEOBDW). In the proposed method, the output labels are first encoded with binary codes, producing two encoded output labels. Based on these encoded outputs, the data points are then weighted using the relationships between the dataset features and the two encoded output labels. To assess the generality of the proposed method, five datasets were used: chain link (2 classes), two spiral (2 classes), iris (3 classes), wine (3 classes), and dermatology (6 classes). After applying BEOBDW to the five datasets, the k-NN (nearest neighbor) classifier was used to classify the weighted datasets. A set of experiments on these real-world datasets demonstrated that the proposed data weighting method is very efficient and has robust discrimination ability in classification. The BEOBDW method could confidently be used before many classification algorithms.

13.
Though the k-nearest neighbor (k-NN) pattern classifier is an effective learning algorithm, it can result in large model sizes. To compensate, a number of variant algorithms have been developed that condense the model size of the k-NN classifier at the expense of accuracy. To increase the accuracy of these condensed models, we present a direct boosting algorithm for the k-NN classifier that creates an ensemble of models with locally modified distance weighting. An empirical study conducted on 10 standard databases from the UCI repository shows that this new boosted k-NN algorithm achieves higher generalization accuracy on the majority of the datasets and never performs worse than standard k-NN.

14.

Recently, big data has attracted wide attention in many fields, such as machine learning, pattern recognition, medicine, finance, and transportation. Data analysis is crucial for converting data into the more specific information fed to decision-making systems. With diverse and complex types of datasets, knowledge discovery becomes more difficult. One solution is feature subset selection preprocessing, which reduces this complexity so that computation and analysis become convenient. Preprocessing produces a reliable and suitable source for any data-mining algorithm, and effective feature selection can improve a model's performance and help us understand the characteristics and underlying structure of complex data. This study introduces a novel hybrid cloud-based feature selection model for imbalanced data based on the k nearest neighbor algorithm. The proposed model combines the firefly distance metric with the Euclidean distance used in the k nearest neighbor and showed good performance compared with the simple weighted nearest neighbor. The experimental results showed good insights into both time usage and feature weights compared with the weighted nearest neighbor, as well as a 12% improvement in classification accuracy over the weighted nearest neighbor algorithm. Using the cloud-distributed model reduced processing time by up to 30%, which is considered substantial compared with recent state-of-the-art methods.


15.
A reliable and precise classification of tumors is essential for successful cancer treatment, and gene selection is an important step toward improved diagnostics. In this study, a modified SFFS (sequential forward floating selection) algorithm based on the weighted Mahalanobis distance, called MSWM, is proposed to identify optimal informative gene subsets while taking joint discriminatory power into account. First, the one-dimensional weighted Mahalanobis distance is used to perform a preliminary selection of genes; the modified SFFS method with the multidimensional weighted Mahalanobis distance is then used to obtain the optimal informative gene subset for tumor classification. Finally, the k nearest neighbor and naive Bayes methods are used to classify tumors based on the gene subset selected by MSWM. To validate its efficiency, the proposed MSWM method is applied to two different DNA microarray datasets. Our empirical study shows that MSWM achieves better classification effectiveness than the BWR (ratio of between-groups to within-groups sum of squares) and IVGA_I (independent variable group analysis I) methods, suggesting that MSWM is able to select correct informative gene subsets by accounting for the genes' joint discriminatory power.
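The abstract does not define the exact weighting used in MSWM, so the snippet below shows only one plausible form of a feature-weighted Mahalanobis distance, for orientation; the weighting scheme and names are our assumptions.

```python
import numpy as np

def weighted_mahalanobis(x, mu, cov, w):
    """One plausible weighted Mahalanobis distance: feature weights w
    scale each deviation from the class mean mu before the inverse
    covariance is applied (cov must be non-singular)."""
    diff = w * (x - mu)
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))
```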

16.
Annotating huge amounts of data is expensive and time-consuming, and active learning is a suitable approach for minimizing the annotation effort. A novel active learning approach, coupled K nearest neighbor pseudo pruning (CKNNPP), is proposed in this paper; it is based on querying examples with the KNNPP method. The KNNPP method uses the k nearest neighbor technique to search for the k nearest labeled samples of each unlabeled sample. When these k labeled samples do not all belong to the same class, the corresponding unlabeled sample is queried, given its correct label by the supervisor, and added to the labeled training set; otherwise, the unlabeled sample is not selected but passed over, which is the "pseudo pruning". This notion is inspired by k nearest neighbor pruning preprocessing. The samples selected by KNNPP are considered to be near or on the optimal classification hyperplane, which is crucial for active learning. In particular, to avoid drift of the optimal classification hyperplane after adding a queried sample, the final CKNNPP method queries two samples with different class labels (like a couple, annotated by the supervisor) via KNNPP and adds them to the training set simultaneously in each iteration. CKNNPP provides good performance; it is simple, effective, and robust, and compared with existing methods it can handle classification problems with unbalanced datasets. The computational complexity of CKNNPP is also analyzed. Additionally, a new stopping criterion is applied, and the classifier in each active learning iteration is implemented with Lagrangian Support Vector Machines. Finally, twelve UCI datasets, image datasets of aircraft, and a radar high-resolution range profile dataset are used to validate the feasibility and effectiveness of the proposed method. The results show that CKNNPP achieves superior performance compared with seven other state-of-the-art active learning approaches.
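The KNNPP selection step is described concretely enough to sketch: an unlabeled sample is queried exactly when its k nearest labeled neighbours disagree on the class. The sketch below reflects that reading; names and the default k are ours, and the labeling oracle and classifier updates are omitted.

```python
import numpy as np

def knnpp_query_indices(X_labeled, y_labeled, X_unlabeled, k=3):
    """KNNPP selection: flag an unlabeled sample for querying when its
    k nearest labeled neighbours are not all of one class; otherwise
    it is passed over ('pseudo pruned')."""
    to_query = []
    for i, x in enumerate(X_unlabeled):
        d = np.linalg.norm(X_labeled - x, axis=1)
        nn_labels = y_labeled[np.argsort(d)[:k]]
        if np.unique(nn_labels).size > 1:   # neighbours disagree -> boundary
            to_query.append(i)
    return to_query
```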

17.
The k nearest neighbors (k-NN) classification technique is widely known for its simplicity, effectiveness, and robustness. As a lazy learner, k-NN is a versatile algorithm used in many fields. In this classifier, the parameter k is generally chosen by the user, and the optimal k value is found by experiment. The chosen constant k is then used throughout the classification phase, but using the same k value for every test sample can decrease overall prediction performance; for more accurate predictions, the optimal k value should vary from one test sample to another. In this study, a method for dynamically selecting a k value for each instance is proposed. This improved classification method employs a simple clustering procedure, and the experiments show that it yields more accurate results. The reasons for this success are also analyzed and presented.

18.
The k nearest neighbor is a lazy learning algorithm that is inefficient in the classification phase because it must compare the query sample with all training samples. A recently proposed template reduction method uses only samples near the decision boundary for classification and removes those far from it. However, when class distributions overlap, more border samples are retained, which leads to inefficient classification. Because the number of samples that can be removed is limited, using an appropriate feature reduction method is a logical way to further improve classification time. This paper proposes a new prototype reduction method for the k nearest neighbor algorithm based on template reduction and ViSOM. A key property of ViSOM is that it displays the topology of the data on a two-dimensional feature map, providing an intuitive way for users to observe and analyze data. An efficient classification framework is then presented that combines the feature reduction method with the prototype selection algorithm; it needs only a very small amount of data for classification while preserving the recognition rate. In the experiments, both synthetic and real datasets are used to evaluate performance. The results demonstrate that the proposed method achieves a speedup ratio above 70% and a compression ratio above 90% while maintaining performance similar to kNN.
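The ViSOM-based feature reduction is beyond a short sketch, but the template reduction idea the paper starts from can be illustrated simply: keep a training sample only if its neighbourhood contains another class, i.e. it lies near the decision boundary. This reading and the names below are ours.

```python
import numpy as np

def keep_border_samples(X, y, k=5):
    """Template reduction sketch: retain samples whose k nearest
    neighbours include a different class (border samples) and drop
    interior samples far from the decision boundary."""
    keep = []
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        nn = np.argsort(d)[1:k + 1]         # exclude the sample itself
        if np.any(y[nn] != y[i]):
            keep.append(i)
    return X[keep], y[keep]
```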

19.
With the advancement of the wireless internet and mobile positioning technology, location-based services (LBSs) have become popular among mobile users. Since users have to send their exact locations to obtain such services, several privacy threats arise. To address this problem, cloaking methods blur a user's exact location into a cloaked spatial region that satisfies a required privacy threshold (k). Given the cloaked region, an LBS server can then run a k-nearest neighbor (k-NN) search algorithm. Some recent studies have proposed methods to search for the k nearest POIs while protecting the user's privacy, but each has at least one major problem, such as inefficient query processing or low precision of the retrieved result. To resolve these problems, in this paper we propose a novel k-NN query processing algorithm for a cloaked region that satisfies both requirements: fast query processing and high precision of the retrieved result. To achieve fast query processing, we propose a new pruning technique based on a 2D-coordinate scheme, and we use a Voronoi diagram to retrieve the nearest POIs efficiently. To satisfy the requirement of high precision, we guarantee that our k-NN query processing algorithm always returns a result containing the exact set of k nearest neighbors. Our performance analysis shows that our algorithm outperforms other algorithms in terms of query processing time and the number of candidate POIs.

20.
Automatic text classification is usually based on models constructed by learning from training examples. However, as text document repositories grow rapidly, the storage requirements and computational cost of model learning become ever higher. Instance selection is one solution to this limitation: the aim is to reduce the amount of data by filtering noisy data out of a given training dataset. A number of instance selection algorithms have been proposed in the literature, such as ENN, IB3, ICF, and DROP3. However, all of these methods were developed for the k-nearest neighbor (k-NN) classifier, and their performance has not been examined in the text classification domain, where the dimensionality of the dataset is usually very high; support vector machines (SVM) are the core text classification technique. In this study, a novel instance selection method called Support Vector Oriented Instance Selection (SVOIS) is proposed. First, a regression plane in the original feature space is identified using a threshold distance between the given training instances and their class centers. Then, another threshold distance, between the identified data (forming the regression plane) and the regression plane itself, is used to decide on the support vectors among the selected instances. Experimental results on the TechTC-100 dataset show the superior performance of SVOIS over other state-of-the-art algorithms. In particular, using SVOIS to select text documents allows the k-NN and SVM classifiers to perform better than without instance selection.
