Similar literature
 20 similar articles found
1.
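The weighted k-nearest neighbor (KNN) method uses only the class information provided by the k nearest training samples and ignores the contribution of test samples, which often leads to misclassification. To address this shortcoming, a semi-supervised KNN classification method is proposed. The method classifies both sequential and non-sequential samples well. In the classification decision it also takes into account the contribution of the c nearest test samples, which improves classification accuracy. On the Cohn-Kanade face database, the recognition rate for sequential images improved by 5.95%; on the CMU-AMP face database, the recognition rate for non-sequential images improved by 7.98%. Experimental results show that the method is efficient and classifies well.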
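For orientation, the baseline that this abstract starts from can be sketched as below. This is a minimal weighted KNN vote, not the semi-supervised method the abstract proposes; the function name `weighted_knn_predict` and the inverse-distance weighting are assumptions chosen for illustration, since the abstract does not state which weighting scheme is used.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train, query, k=3):
    """Classify `query` by a distance-weighted vote of its k nearest training samples.

    train: list of (feature_vector, label) pairs.
    The inverse-distance weight is an assumed, commonly used scheme.
    """
    # Keep the k training samples closest to the query (Euclidean distance).
    neighbors = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = defaultdict(float)
    for features, label in neighbors:
        d = math.dist(features, query)
        # Closer neighbors contribute more; the small constant avoids division by zero.
        votes[label] += 1.0 / (d + 1e-9)
    # The label with the largest accumulated weight wins.
    return max(votes, key=votes.get)


train = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((5, 6), "b"), ((1, 0), "b")]
label_near_origin = weighted_knn_predict(train, (0.2, 0.1), k=3)
label_far = weighted_knn_predict(train, (5, 5.4), k=3)
```

Only labelled training samples vote here, which is exactly the limitation the abstract points out; the proposed method additionally lets nearby test samples contribute.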

2.
A novel approach for k-nearest neighbor (k-NN) searching with the Euclidean metric is described. It is well known that many sophisticated algorithms cannot beat the brute-force algorithm when the dimensionality is high. In this study, a probably correct approach, in which the correct set of k nearest neighbors is obtained with high probability, is proposed for greatly reducing the search time. We exploit the marginal distribution of the kth nearest neighbors in low dimensions, which is estimated from the stored data (an empirical percentile approach). We analyze the basic nature of the marginal distribution and show the advantage of the implemented algorithm, which is a probabilistic variant of partial distance searching. Its query time is sublinear in the data size $n$, that is, $O(mn^{\delta})$ with $\delta = o(1)$ in $n$ and $\delta \le 1$, for any fixed dimension $m$.
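The partial distance searching mentioned in this abstract can be illustrated with its standard exact form, sketched below. This is not the probabilistic variant the paper proposes: it only shows the early-abandoning idea that the variant builds on. The function name `knn_partial_distance` is hypothetical.

```python
import heapq


def knn_partial_distance(data, query, k):
    """Exact k-NN under squared Euclidean distance with early abandoning.

    Returns a list of (squared_distance, index) pairs sorted by distance.
    """
    heap = []  # max-heap emulated with negated distances: (-dist2, index)
    for idx, point in enumerate(data):
        # Current k-th best squared distance, or infinity until k candidates exist.
        bound = -heap[0][0] if len(heap) == k else float("inf")
        dist2 = 0.0
        for a, b in zip(point, query):
            dist2 += (a - b) ** 2
            if dist2 >= bound:
                break  # the partial sum already rules this point out
        else:
            # All coordinates were summed and the point beats the bound.
            if len(heap) == k:
                heapq.heapreplace(heap, (-dist2, idx))
            else:
                heapq.heappush(heap, (-dist2, idx))
    return sorted((-neg_d, i) for neg_d, i in heap)


data = [(0, 0), (3, 4), (1, 1), (10, 10)]
nearest_two = knn_partial_distance(data, (0, 0), k=2)
```

In the example, the point `(10, 10)` is abandoned after its first coordinate because the partial sum 100 already exceeds the current bound of 2. The saving grows with the dimension, since a rejected point rarely needs all of its coordinates examined.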

3.
Though the k-nearest neighbor (k-NN) pattern classifier is an effective learning algorithm, it can result in large model sizes. To compensate, a number of variant algorithms have been developed that condense the model size of the k-NN classifier at the expense of accuracy. To increase the accuracy of these condensed models, we present a direct boosting algorithm for the k-NN classifier that creates an ensemble of models with locally modified distance weighting. An empirical study conducted on 10 standard databases from the UCI repository shows that this new Boosted k-NN algorithm has increased generalization accuracy in the majority of the datasets and never performs worse than standard k-NN.

4.
The k nearest neighbor (k-NN) classifier has been a widely used nonparametric technique in pattern recognition because of its simplicity and good performance. To decide the class of a new prototype, the k-NN classifier performs an exhaustive comparison between the prototype to classify and the prototypes in the training set T. However, when T is large, the exhaustive comparison is expensive. For this reason, many fast k-NN classifiers have been developed, some of which are based on a tree structure that is created during a preprocessing phase using the prototypes in T. Then, in a search phase, the tree is traversed to find the nearest neighbor. The speedup is obtained by avoiding the exploration of some parts of the tree using pruning rules, which are usually based on the triangle inequality. However, in soft sciences such as medicine, geology, and sociology, the prototypes are usually described by numerical and categorical attributes (mixed data), and sometimes the comparison function for computing the similarity between prototypes does not satisfy metric properties. Therefore, in this work an approximate fast k most similar neighbor classifier based on a tree structure (Tree k-MSN) is proposed for mixed data and similarity functions that do not satisfy metric properties. Some experiments with synthetic and real data are presented.

5.
A leaders set, which is derived using the leaders clustering method, can be used in place of a large training set to reduce the computational burden of a classifier. Recently, we showed that a leader-based classifier called the weighted k-nearest leader-based classifier is fast and efficient. However, there exists some uncertainty in calculating the relative importance (weight) of the prototypes. This paper proposes a generalization of the earlier k-nearest leader-based classifier in which a novel soft computing approach is used to resolve the uncertainty. Combined principles of rough set theory and fuzzy set theory are used to analyze the proposed method. The proposed method, called the rough-fuzzy weighted k-nearest leader classifier (RF-wk-NLC), uses a two-level hierarchy of prototypes along with their relative importance. RF-wk-NLC is shown on some standard data sets to have improved performance and is compared with the earlier related methods.

6.
We develop a new non-parametric information theoretic clustering algorithm based on implicit estimation of cluster densities using the k-nearest neighbors (k-nn) approach. Compared to a kernel-based procedure, our hierarchical k-nn approach is very robust with respect to the parameter choices, with a key ability to detect clusters of vastly different scales. Of particular importance is the use of two different values of k, depending on the evaluation of within-cluster entropy or across-cluster cross-entropy, and the use of an ensemble clustering approach wherein different clustering solutions vote in order to obtain the final clustering. We conduct clustering experiments, and report promising results.

7.
Continuous reverse k nearest neighbor (CRkNN) monitoring in road networks has recently received increasing attention. However, efficient CRkNN algorithms for road networks are still lacking. In road networks, moving query objects and data objects are restricted by the connectivity of the road network, and both object–query distance updates and object–object distance updates affect the result of CRkNN queries. In this paper, we present a novel algorithm for continuous and incremental evaluation of CRkNN queries in road networks. Our method is based on a novel data structure, the dual layer multiway tree (DLM tree), which we propose to represent the whole monitoring region of a CRkNN query q. We propose several lemmas to reduce the monitoring region of q and the number of candidate objects as much as possible. Moreover, by associating a variable NN_count with each candidate object, we can simplify the monitoring of candidate objects. A large number of objects roam in a road network, and many of them are irrelevant to a specific CRkNN query of a query object q. To minimize the processing extension, for a road in the network, we give an IQL list and an IQCL list to specify the set of query objects and data objects whose location updates should be maintained for CRkNN processing of query objects. Our CRkNN method consists of two phases: the initial result generating phase and the incremental maintenance phase. In each phase, high-performance algorithms are proposed to make our CRkNN method more efficient. Extensive simulation experiments are conducted, and the results show that our proposed approach is efficient and scalable in processing CRkNN queries in road networks.

8.
With the growing demand for e-learning, numerous studies have been conducted to enhance teaching quality in e-learning environments. Among these studies, researchers have indicated that adaptive learning is a critical requirement for promoting the learning performance of students. Adaptive learning provides adaptive learning materials, learning strategies and/or courses according to a student’s learning style. Hence, the first step for achieving adaptive learning environments is to identify students’ learning styles. This paper proposes a learning style classification mechanism to classify and then identify students’ learning styles. The proposed mechanism improves k-nearest neighbor (k-NN) classification and combines it with genetic algorithms (GA). To demonstrate its viability, the proposed mechanism is implemented on an open-learning management system. The learning behavioral features of 117 elementary school students are collected and then classified by the proposed mechanism. The experimental results indicate that the proposed classification mechanism can effectively classify and identify students’ learning styles.

9.
This paper addresses the problem of reinforcing the ability of the k-NN classification of handwritten characters via distortion-tolerant template matching techniques with a limited quantity of data. We compare three kinds of matching techniques: the conventional simple correlation, the tangent distance, and the global affine transformation (GAT) correlation. Although the k-NN classification method is straightforward and powerful, it consumes a lot of time. Therefore, to reduce the computational cost of matching in k-NN classification, we propose accelerating the GAT correlation method by reformulating its computational model and adopting efficient lookup tables. Recognition experiments performed on the IPTP CDROM1B handwritten numerical database show that the matching techniques of the simple correlation, the tangent distance, and the accelerated GAT correlation achieved recognition rates of 97.07%, 97.50%, and 98.70%, respectively. The computation time ratios of the tangent distance and the accelerated GAT correlation to the simple correlation are 26.3 and 36.5 to 1.0, respectively.

10.
This paper presents an efficient algorithm, called pattern reduction (PR), for reducing the computation time of k-means and k-means-based clustering algorithms. The proposed algorithm works by compressing and removing at each iteration patterns that are unlikely to change their membership thereafter. Not only is the proposed algorithm simple and easy to implement, but it can also be applied to many other iterative clustering algorithms such as kernel-based and population-based clustering algorithms. Our experiments (from 2 to 1000 dimensions and 150 to 10,000,000 patterns) indicate that with a small loss of quality, the proposed algorithm can significantly reduce the computation time of all state-of-the-art clustering algorithms evaluated in this paper, especially for large and high-dimensional data sets.

11.
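An improved fast k-NN classification algorithm
To address the shortcomings of the traditional k-nearest neighbor (k-NN) method, K-means from clustering and the k-nearest neighbor algorithm from classification are combined into an improved fast k-NN classification algorithm. Experiments show that the algorithm achieves fast classification with little effect on classification performance.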

12.
In this paper, two novel classifiers based on the locally nearest neighborhood rule, called nearest neighbor line and nearest neighbor plane, are presented for pattern classification. Compared to nearest feature line and nearest feature plane, the proposed methods incur much lower computational cost and achieve competitive performance.

13.
We propose an efficient approach, FSKNN, which employs a fuzzy similarity measure (FSM) and k nearest neighbors (KNN), for multi-label text classification. One of the problems associated with KNN-like approaches is their demanding computational cost in finding the k nearest neighbors from all the training patterns. For FSKNN, FSM is used to group the training patterns into clusters. Then only the training documents in those clusters whose fuzzy similarities to the document exceed a predesignated threshold are considered in finding the k nearest neighbors for the document. An unseen document is labeled based on its k nearest neighbors using the maximum a posteriori estimate. Experimental results show that our proposed method can work more effectively than other methods.

14.
Applying k-Means to minimize the sum of the intra-cluster variances is the most popular clustering approach. However, after a bad initialization, poor local optima can be easily obtained. To tackle the initialization problem of k-Means, we propose the MinMax k-Means algorithm, a method that assigns weights to the clusters relative to their variance and optimizes a weighted version of the k-Means objective. Weights are learned together with the cluster assignments, through an iterative procedure. The proposed weighting scheme limits the emergence of large variance clusters and allows high quality solutions to be systematically uncovered, irrespective of the initialization. Experiments verify the effectiveness of our approach and its robustness over bad initializations, as it compares favorably to both k-Means and other methods from the literature that consider the k-Means initialization problem.
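The difference between the plain k-Means objective and a cluster-weighted version of it can be made concrete with a small sketch. The abstract does not give the exact weighted objective or how the weights are learned, so the code below takes the weights as an input supplied by the caller; the function names and the linear weighting `sum(w * v)` are assumptions for illustration only, not the MinMax k-Means formulation.

```python
def cluster_variances(points, labels, n_clusters):
    """Per-cluster sum of squared Euclidean distances to the cluster centroid."""
    variances = []
    for c in range(n_clusters):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            variances.append(0.0)  # an empty cluster contributes nothing
            continue
        dim = len(members[0])
        centroid = [sum(p[j] for p in members) / len(members) for j in range(dim)]
        variances.append(
            sum(sum((p[j] - centroid[j]) ** 2 for j in range(dim)) for p in members)
        )
    return variances


def kmeans_objective(variances):
    """Plain k-Means objective: the sum of the intra-cluster variances."""
    return sum(variances)


def weighted_objective(variances, weights):
    """Illustrative weighted objective; the weights are given, not learned."""
    return sum(w * v for w, v in zip(weights, variances))


points = [(0, 0), (2, 0), (10, 0), (10, 4)]
labels = [0, 0, 1, 1]
variances = cluster_variances(points, labels, n_clusters=2)
plain = kmeans_objective(variances)
weighted = weighted_objective(variances, [0.25, 0.75])
```

Putting more weight on the cluster with the larger variance makes that cluster dominate the objective, which is the intuition behind penalising large-variance clusters that the abstract describes.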

15.
For a positive integer k, a graph G is k-ordered hamiltonian if for every ordered sequence of k vertices there is a hamiltonian cycle that encounters the vertices of the sequence in the given order. In this paper, we show that if G is a $\lfloor 3k/2 \rfloor$-connected graph of order $n \ge 100k$, and $d(u)+d(v) \ge n$ for any two vertices u and v with $d(u,v)=2$, then G is k-ordered hamiltonian. Our result implies the theorem of G. Chen et al. [Ars Combin. 70 (2004) 245-255] [1], which requires the degree sum condition for all pairs of non-adjacent vertices, not just those distance 2 apart.

16.
Arpe and Manthey [J. Arpe, B. Manthey, Approximability of minimum AND-circuits, Algorithmica 53 (3) (2009) 337-357] recently studied the minimum AND-circuit problem, which is a circuit minimization problem, and showed some results including approximation algorithms, APX-hardness and fixed parameter tractability of the problem. In this note, we show that algorithms via the k-set cover problem yield improved approximation ratios for the minimum AND-circuit problem with maximum degree three. In particular, we obtain an approximation ratio of 1.199 for the problem with maximum degree three and unbounded multiplicity.

17.
To protect individual privacy in data mining, when a miner collects data from respondents, the respondents should remain anonymous. The existing technique of Anonymity-Preserving Data Collection partially solves this problem, but it assumes that the data do not contain any identifying information about the corresponding respondents. On the other hand, the existing technique of Privacy-Enhancing k-Anonymization can make the collected data anonymous by eliminating the identifying information. However, it assumes that each respondent submits her data through an unidentified communication channel. In this paper, we propose k-Anonymous Data Collection, which has the advantages of both Anonymity-Preserving Data Collection and Privacy-Enhancing k-Anonymization but does not rely on their assumptions described above. We give rigorous proofs for the correctness and privacy of our protocol, and experimental results for its efficiency. Furthermore, we extend our solution to the fully malicious model, in which a dishonest participant can deviate from the protocol and behave arbitrarily.

18.
We say that a distribution over $\{0,1\}^n$ is $(\varepsilon,k)$-wise independent if its restriction to every k coordinates results in a distribution that is $\varepsilon$-close to the uniform distribution. A natural question regarding $(\varepsilon,k)$-wise independent distributions is how close they are to some k-wise independent distribution. We show that there exist $(\varepsilon,k)$-wise independent distributions whose statistical distance is at least $n^{O(k)} \cdot \varepsilon$ from any k-wise independent distribution. In addition, we show that for any $(\varepsilon,k)$-wise independent distribution there exists some k-wise independent distribution whose statistical distance from it is $n^{O(k)} \cdot \varepsilon$.

19.
The statistical properties of the k-NN estimators are investigated in a design-based framework, avoiding any assumption about the population under study. The issue of coupling remotely sensed digital imagery with data arising from forest inventories conducted using probabilistic sampling schemes is considered. General results are obtained for the k-NN estimator at the pixel level. When averages (or totals) of forest attributes for the whole study area or sub-areas are of interest, the use of the empirical difference estimator is proposed. The estimator is shown to be approximately unbiased with a variance admitting unbiased or conservative estimators. The performance of the empirical difference estimator is evaluated by an extensive simulation study performed on several populations whose dimensions and covariate values are taken from a real case study. Samples are selected from the populations by means of simple random sampling without replacement. Comparisons with the generalized regression estimator and Horvitz-Thompson estimators are also performed. An application to a local forest inventory on a test area of central Italy is considered.

20.
Let k be a positive integer, and let $G=(V,E)$ be a graph with minimum degree at least $k-1$. A function $f: V \to \{-1,1\}$ is said to be a signed k-dominating function (SkDF) if $\sum_{u \in N[v]} f(u) \ge k$ for every $v \in V$. An SkDF f of a graph G is minimal if there exists no SkDF g such that $g \ne f$ and $g(v) \le f(v)$ for every $v \in V$. The maximum of the values of $\sum_{v \in V} f(v)$, taken over all minimal SkDFs f, is called the upper signed k-domination number $\Gamma_{kS}(G)$. In this paper, we present a sharp upper bound on this number for a general graph.
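The defining condition of a signed k-dominating function is easy to check mechanically. The sketch below assumes the standard reading of the definition, in which the sum of f over each closed neighborhood N[v] (v together with its neighbors) must be at least k; the function names and the adjacency-dictionary graph representation are hypothetical choices for illustration.

```python
def is_signed_k_dominating(adj, f, k):
    """Check whether f is a signed k-dominating function on the graph `adj`.

    adj: dict mapping each vertex to the set of its neighbors.
    f:   dict mapping each vertex to +1 or -1.
    """
    if any(value not in (-1, 1) for value in f.values()):
        return False
    # Closed neighborhood N[v] is v itself plus its neighbors.
    return all(f[v] + sum(f[u] for u in adj[v]) >= k for v in adj)


def weight(f):
    """Total weight of f, i.e. the sum of f(v) over all vertices."""
    return sum(f.values())


# Triangle K3: every vertex has degree 2, so the minimum degree is at least k - 1 for k <= 3.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
all_plus = {0: 1, 1: 1, 2: 1}
one_minus = {0: 1, 1: 1, 2: -1}
```

On the triangle, `one_minus` has closed-neighborhood sum 1 at every vertex, so it is a signed 1-dominating function of weight 1 but not a signed 2-dominating function. Checking minimality would additionally require comparing against every function obtained by lowering some values, which this sketch does not attempt.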
