1.

Probably correct k-nearest neighbor search in high dimensions

Jun Toyama Hideyuki Imai 《Pattern recognition》2010,43(4):1361-1372

A novel approach for k-nearest neighbor (k-NN) searching with the Euclidean metric is described. It is well known that many sophisticated algorithms cannot beat the brute-force algorithm when the dimensionality is high. In this study, a probably correct approach, in which the correct set of k-nearest neighbors is obtained with high probability, is proposed for greatly reducing the searching time. We exploit the marginal distribution of the kth nearest neighbors in low dimensions, which is estimated from the stored data (an empirical percentile approach). We analyze the basic nature of the marginal distribution and show the advantage of the implemented algorithm, which is a probabilistic variant of the partial distance searching. Its query time is sublinear in data size n, that is, O(mn^δ) with δ=o(1) in n and δ≤1, for any fixed dimension m.
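
The partial distance search that this abstract builds on can be sketched as follows. This is a minimal Python sketch of the standard exact partial distance search, not the authors' probabilistic variant, whose details the abstract does not give; the function name `knn_partial_distance` and the data layout (a list of equal-length tuples) are assumptions made for illustration.

```python
import heapq


def knn_partial_distance(data, query, k):
    """Exact k-NN under squared Euclidean distance with early termination.

    Returns a list of (squared_distance, index) pairs, nearest first.
    """
    if k <= 0:
        return []
    # Max-heap (stored as negated distances) of the k best candidates so far.
    heap = []
    for idx, point in enumerate(data):
        # Pruning bound: the current k-th best squared distance, if we have k.
        bound = -heap[0][0] if len(heap) == k else float("inf")
        partial = 0.0
        pruned = False
        for a, b in zip(point, query):
            partial += (a - b) ** 2
            # The partial sum only grows, so once it reaches the bound
            # this point cannot enter the result set.
            if partial >= bound:
                pruned = True
                break
        if pruned:
            continue
        if len(heap) == k:
            heapq.heapreplace(heap, (-partial, idx))
        else:
            heapq.heappush(heap, (-partial, idx))
    return sorted((-neg_d, i) for neg_d, i in heap)


if __name__ == "__main__":
    points = [(0, 0), (1, 1), (5, 5), (0.5, 0)]
    print(knn_partial_distance(points, (0, 0), 2))
```

The inner loop stops summing coordinates as soon as the running total reaches the current k-th best distance, which is where the savings come from in high dimensions.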

2.

A direct boosting algorithm for the k-nearest neighbor classifier via local warping of the distance metric

Toh Koon Charlie Neo Dan Ventura 《Pattern recognition letters》2012,33(1):92-102

Though the k-nearest neighbor (k-NN) pattern classifier is an effective learning algorithm, it can result in large model sizes. To compensate, a number of variant algorithms have been developed that condense the model size of the k-NN classifier at the expense of accuracy. To increase the accuracy of these condensed models, we present a direct boosting algorithm for the k-NN classifier that creates an ensemble of models with locally modified distance weighting. An empirical study conducted on 10 standard databases from the UCI repository shows that this new Boosted k-NN algorithm has increased generalization accuracy in the majority of the datasets and never performs worse than standard k-NN.

3.

Selene Hernández-Rodríguez J. Fco Martínez-Trinidad 《Pattern recognition》2010,43(3):873-5456

The k nearest neighbor (k-NN) classifier has been a widely used nonparametric technique in Pattern Recognition because of its simplicity and good performance. In order to decide the class of a new prototype, the k-NN classifier performs an exhaustive comparison between the prototype to classify and the prototypes in the training set T. However, when T is large, the exhaustive comparison is expensive. For this reason, many fast k-NN classifiers have been developed. Some of them are based on a tree structure, which is created during a preprocessing phase using the prototypes in T. Then, in a search phase, the tree is traversed to find the nearest neighbor. The speedup is obtained by avoiding the exploration of some parts of the tree using pruning rules, which are usually based on the triangle inequality. However, in soft sciences such as Medicine, Geology, Sociology, etc., the prototypes are usually described by numerical and categorical attributes (mixed data), and sometimes the comparison function for computing the similarity between prototypes does not satisfy metric properties. Therefore, in this work an approximate fast k most similar neighbor classifier, for mixed data and similarity functions that do not satisfy metric properties, based on a tree structure (Tree k-MSN) is proposed. Some experiments with synthetic and real data are presented.

4.

Rough-fuzzy weighted k-nearest leader classifier for large data sets

V. Suresh Babu P. Viswanath 《Pattern recognition》2009,42(9):1719-5884

A leaders set, which is derived using the leaders clustering method, can be used in place of a large training set to reduce the computational burden of a classifier. Recently, we showed that a leader-based classifier called the weighted k-nearest leader-based classifier is fast and efficient. However, there exists some uncertainty in calculating the relative importance (weight) of the prototypes. This paper proposes a generalization of the earlier proposed k-nearest leader-based classifier in which a novel soft computing approach is used to resolve the uncertainty. Combined principles of rough set theory and fuzzy set theory are used to analyze the proposed method. The proposed method, called the rough-fuzzy weighted k-nearest leader classifier (RF-wk-NLC), uses a two-level hierarchy of prototypes along with their relative importance. RF-wk-NLC is shown on some standard data sets to have improved performance and is compared with the earlier related methods.

5.

Continuous reverse k nearest neighbor monitoring on moving objects in road networks

Li Guohui Li Yanhong Li Jianjun LihChyun Shu Yang Fumin 《Information Systems》2010

Continuous reverse k nearest neighbor (CRkNN) monitoring in road networks has recently received increasing attention. However, efficient CRkNN algorithms for road networks are still lacking. In road networks, moving query objects and data objects are restricted by the connectivity of the road network, and both the object–query distance and object–object distance updates affect the result of CRkNN queries. In this paper, we present a novel algorithm for continuous and incremental evaluation of CRkNN queries in road networks. Our method is based on a novel data structure called the dual layer multiway tree (DLM tree), which we propose to represent the whole monitoring region of a CRkNN query q. We propose several lemmas to reduce the monitoring region of q and the number of candidate objects as much as possible. Moreover, by associating a variable NN_count with each candidate object, we can simplify the monitoring of candidate objects. There are a large number of objects roaming in a road network and many of them are irrelevant to a specific CRkNN query of a query object q. To minimize the processing extension, for a road in the network, we give an IQL list and an IQCL list to specify the set of query objects and data objects whose location updates should be maintained for CRkNN processing of query objects. Our CRkNN method consists of two phases: the initial result generation phase and the incremental maintenance phase. In each phase, algorithms with high performance are proposed to make our CRkNN method more efficient. Extensive simulation experiments are conducted and the results show that our proposed approach is efficient and scalable in processing CRkNN queries in road networks.

6.

Information theoretic clustering using a k-nearest neighbors approach

Vidar V. Vikjord Robert Jenssen 《Pattern recognition》2014

We develop a new non-parametric information theoretic clustering algorithm based on implicit estimation of cluster densities using the k-nearest neighbors (k-nn) approach. Compared to a kernel-based procedure, our hierarchical k-nn approach is very robust with respect to the parameter choices, with a key ability to detect clusters of vastly different scales. Of particular importance is the use of two different values of k, depending on the evaluation of within-cluster entropy or across-cluster cross-entropy, and the use of an ensemble clustering approach wherein different clustering solutions vote in order to obtain the final clustering. We conduct clustering experiments, and report promising results.

7.

A learning style classification mechanism for e-learning

Yi-Chun Chang Wen-Yan Kao Chih-Ping Chu Chiung-Hui Chiu 《Computers & Education》2009

With the growing demand for e-learning, numerous studies have sought to enhance teaching quality in e-learning environments. Among these studies, researchers have indicated that adaptive learning is a critical requirement for promoting the learning performance of students. Adaptive learning provides adaptive learning materials, learning strategies and/or courses according to a student’s learning style. Hence, the first step for achieving adaptive learning environments is to identify students’ learning styles. This paper proposes a learning style classification mechanism to classify and then identify students’ learning styles. The proposed mechanism improves k-nearest neighbor (k-NN) classification and combines it with genetic algorithms (GA). To demonstrate its viability, the proposed mechanism is implemented on an open-learning management system. The learning behavioral features of 117 elementary school students are collected and then classified by the proposed mechanism. The experimental results indicate that the proposed classification mechanism can effectively classify and identify students’ learning styles.

8.

k-NN classification of handwritten characters via accelerated GAT correlation

Toru Wakahara Yukihiko Yamashita 《Pattern recognition》2014

This paper addresses the problem of reinforcing the ability of the k-NN classification of handwritten characters via distortion-tolerant template matching techniques with a limited quantity of data. We compare three kinds of matching techniques: the conventional simple correlation, the tangent distance, and the global affine transformation (GAT) correlation. Although the k-NN classification method is straightforward and powerful, it consumes a lot of time. Therefore, to reduce the computational cost of matching in k-NN classification, we propose accelerating the GAT correlation method by reformulating its computational model and adopting efficient lookup tables. Recognition experiments performed on the IPTP CDROM1B handwritten numerical database show that the matching techniques of the simple correlation, the tangent distance, and the accelerated GAT correlation achieved recognition rates of 97.07%, 97.50%, and 98.70%, respectively. The computation time ratios of the tangent distance and the accelerated GAT correlation to the simple correlation are 26.3 and 36.5 to 1.0, respectively.

9.

A time-efficient pattern reduction algorithm for k-means clustering

Ming-Chao Chiang Chun-Wei Tsai Chu-Sing Yang 《Information Sciences》2011,181(4):716-3410

This paper presents an efficient algorithm, called pattern reduction (PR), for reducing the computation time of k-means and k-means-based clustering algorithms. The proposed algorithm works by compressing and removing, at each iteration, patterns that are unlikely to change their membership thereafter. Not only is the proposed algorithm simple and easy to implement, but it can also be applied to many other iterative clustering algorithms such as kernel-based and population-based clustering algorithms. Our experiments (from 2 to 1000 dimensions and 150 to 10,000,000 patterns) indicate that with a small loss of quality, the proposed algorithm can significantly reduce the computation time of all state-of-the-art clustering algorithms evaluated in this paper, especially for large and high-dimensional data sets.

10.

Locally nearest neighbor classifiers for pattern classification

Wenming Zheng Li Zhao Cairong Zou 《Pattern recognition》2004,37(6):1307-1309

In this paper, two novel classifiers based on the locally nearest neighborhood rule, called nearest neighbor line and nearest neighbor plane, are presented for pattern classification. Compared to nearest feature line and nearest feature plane, the proposed methods have much lower computation cost and achieve competitive performance.

11.

A Fan-type result on k-ordered graphs

Ruijuan Li 《Information Processing Letters》2010,110(16):651-654

For a positive integer k, a graph G is k-ordered hamiltonian if for every ordered sequence of k vertices there is a hamiltonian cycle that encounters the vertices of the sequence in the given order. In this paper, we show that if G is a ⌊3k/2⌋-connected graph of order n ≥ 100k, and d(u)+d(v) ≥ n for any two vertices u and v with d(u,v)=2, then G is k-ordered hamiltonian. Our result implies the theorem of G. Chen et al. [Ars Combin. 70 (2004) 245-255] [1], which requires the degree sum condition for all pairs of non-adjacent vertices, not just those distance 2 apart.
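
The definition of a k-ordered hamiltonian graph given in this abstract can be checked by brute force on very small graphs. The sketch below is only an illustration of the definition, not a method from the paper; the function names, the adjacency-dictionary representation, and the assumption that the graph has at least three vertices are all choices made here. Its running time is factorial in the number of vertices.

```python
from itertools import permutations


def hamiltonian_cycles(adj):
    """Yield every hamiltonian cycle as a tuple starting at the smallest vertex.

    Both orientations of each cycle are produced. Assumes at least 3 vertices.
    """
    verts = sorted(adj)
    first, rest = verts[0], verts[1:]
    n = len(verts)
    for perm in permutations(rest):
        cycle = (first,) + perm
        if all(cycle[(i + 1) % n] in adj[cycle[i]] for i in range(n)):
            yield cycle


def respects_order(cycle, seq):
    """True if the cycle, read from seq[0], meets the vertices of seq in order."""
    start = cycle.index(seq[0])
    rotated = cycle[start:] + cycle[:start]
    positions = [rotated.index(v) for v in seq]
    return positions == sorted(positions)


def is_k_ordered_hamiltonian(adj, k):
    """Check the definition directly over all ordered k-sequences of vertices."""
    cycles = list(hamiltonian_cycles(adj))
    return all(
        any(respects_order(c, seq) for c in cycles)
        for seq in permutations(sorted(adj), k)
    )


if __name__ == "__main__":
    k5 = {v: {u for u in range(5) if u != v} for v in range(5)}
    c5 = {v: {(v - 1) % 5, (v + 1) % 5} for v in range(5)}
    print(is_k_ordered_hamiltonian(k5, 4), is_k_ordered_hamiltonian(c5, 4))
```

The complete graph on five vertices passes for k = 4, while the 5-cycle fails because its single hamiltonian cycle cannot visit, for example, the sequence 0, 2, 1, 3 in that order in either direction.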

12.

The MinMax k-Means clustering algorithm

Grigorios Tzortzis Aristidis Likas 《Pattern recognition》2014

Applying k-Means to minimize the sum of the intra-cluster variances is the most popular clustering approach. However, after a bad initialization, poor local optima can be easily obtained. To tackle the initialization problem of k-Means, we propose the MinMax k-Means algorithm, a method that assigns weights to the clusters relative to their variance and optimizes a weighted version of the k-Means objective. Weights are learned together with the cluster assignments, through an iterative procedure. The proposed weighting scheme limits the emergence of large variance clusters and allows high quality solutions to be systematically uncovered, irrespective of the initialization. Experiments verify the effectiveness of our approach and its robustness over bad initializations, as it compares favorably to both k-Means and other methods from the literature that consider the k-Means initialization problem.

13.

FSKNN: Multi-label text categorization based on fuzzy similarity and k nearest neighbors

Jung-Yi Jiang Shian-Chi Tsai Shie-Jue Lee 《Expert systems with applications》2012,39(3):2813-2821

We propose an efficient approach, FSKNN, which employs a fuzzy similarity measure (FSM) and k nearest neighbors (KNN), for multi-label text classification. One of the problems associated with KNN-like approaches is their demanding computational cost in finding the k nearest neighbors from all the training patterns. For FSKNN, FSM is used to group the training patterns into clusters. Then only the training documents in those clusters whose fuzzy similarities to the document exceed a predesignated threshold are considered in finding the k nearest neighbors for the document. An unseen document is labeled based on its k nearest neighbors using the maximum a posteriori estimate. Experimental results show that our proposed method can work more effectively than other methods.

14.

Improved approximation algorithms for minimum AND-circuits problem via k-set cover

Hiroki Morizumi 《Information Processing Letters》2011,111(5):218-221

Arpe and Manthey [J. Arpe, B. Manthey, Approximability of minimum AND-circuits, Algorithmica 53 (3) (2009) 337-357] recently studied the minimum AND-circuit problem, which is a circuit minimization problem, and showed some results including approximation algorithms, APX-hardness and fixed parameter tractability of the problem. In this note, we show that algorithms via the k-set cover problem yield improved approximation ratios for the minimum AND-circuit problem with maximum degree three. In particular, we obtain an approximation ratio of 1.199 for the problem with maximum degree three and unbounded multiplicity.

15.

Almost k-wise independence versus k-wise independence

Noga Alon Oded Goldreich Yishay Mansour 《Information Processing Letters》2003,88(3):107-110

We say that a distribution over {0,1}ⁿ is (ε,k)-wise independent if its restriction to every k coordinates results in a distribution that is ε-close to the uniform distribution. A natural question regarding (ε,k)-wise independent distributions is how close they are to some k-wise independent distribution. We show that there exist (ε,k)-wise independent distributions whose statistical distance is at least n^O(k)·ε from any k-wise independent distribution. In addition, we show that for any (ε,k)-wise independent distribution there exists some k-wise independent distribution, whose statistical distance is n^O(k)·ε.

16.

k-Anonymous data collection

Sheng Zhong Zhiqiang Yang 《Information Sciences》2009,179(17):2948-2963

To protect individual privacy in data mining, when a miner collects data from respondents, the respondents should remain anonymous. The existing technique of Anonymity-Preserving Data Collection partially solves this problem, but it assumes that the data do not contain any identifying information about the corresponding respondents. On the other hand, the existing technique of Privacy-Enhancing k-Anonymization can make the collected data anonymous by eliminating the identifying information. However, it assumes that each respondent submits her data through an unidentified communication channel. In this paper, we propose k-Anonymous Data Collection, which has the advantages of both Anonymity-Preserving Data Collection and Privacy-Enhancing k-Anonymization but does not rely on their assumptions described above. We give rigorous proofs for the correctness and privacy of our protocol, and experimental results for its efficiency. Furthermore, we extend our solution to the fully malicious model, in which a dishonest participant can deviate from the protocol and behave arbitrarily.

17.

Upper signed k-domination in a general graph

Dejan Delić 《Information Processing Letters》2010,110(16):662-665

Let k be a positive integer, and let G=(V,E) be a graph with minimum degree at least k−1. A function f:V→{−1,1} is said to be a signed k-dominating function (SkDF) if ∑_{u∈N[v]} f(u) ≥ k for every v∈V. An SkDF f of a graph G is minimal if there exists no SkDF g such that g≠f and g(v) ≤ f(v) for every v∈V. The maximum of the values of ∑_{v∈V} f(v), taken over all minimal SkDFs f, is called the upper signed k-domination number Γ_kS(G). In this paper, we present a sharp upper bound on this number for a general graph.
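
The two definitions in this abstract (signed k-dominating function and minimality) translate directly into a small checker. This is a sketch for illustration only, assuming the closed-neighborhood sum must be at least k and that a minimal SkDF admits no other SkDF that is pointwise less than or equal to it; the function names and the adjacency-dictionary representation are hypothetical and not taken from the paper.

```python
def is_signed_k_dominating(adj, f, k):
    """adj maps each vertex to its set of neighbors; f maps each vertex to -1 or +1."""
    for v, nbrs in adj.items():
        closed = set(nbrs) | {v}  # closed neighborhood N[v]
        if sum(f[u] for u in closed) < k:
            return False
    return True


def is_minimal_skdf(adj, f, k):
    """True if f is an SkDF and no pointwise-smaller function g != f is an SkDF.

    Neighborhood sums are monotone in f, so if any smaller SkDF exists, then
    flipping a single +1 to -1 already gives one. Checking single flips is
    therefore enough.
    """
    if not is_signed_k_dominating(adj, f, k):
        return False
    for v in adj:
        if f[v] == 1:
            g = dict(f)
            g[v] = -1
            if is_signed_k_dominating(adj, g, k):
                return False
    return True


def weight(f):
    """Sum of f over all vertices, the quantity maximized by the upper number."""
    return sum(f.values())


if __name__ == "__main__":
    k4 = {v: {u for u in range(4) if u != v} for v in range(4)}
    f = {0: -1, 1: 1, 2: 1, 3: 1}
    print(is_signed_k_dominating(k4, f, 2), is_minimal_skdf(k4, f, 2), weight(f))
```

On the complete graph with four vertices and k = 2, assigning −1 to exactly one vertex gives every closed neighborhood a sum of 2, and no further vertex can be flipped, so that function is a minimal SkDF of weight 2.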

18.

Design-based approach to k-nearest neighbours technique for coupling field and remotely sensed data in forest surveys

Federica Baffetta Sara Franceschi 《Remote sensing of environment》2009,113(3):463-3174

The statistical properties of the k-NN estimators are investigated in a design-based framework, avoiding any assumption about the population under study. The issue of coupling remotely sensed digital imagery with data arising from forest inventories conducted using probabilistic sampling schemes is considered. General results are obtained for the k-NN estimator at the pixel level. When averages (or totals) of forest attributes for the whole study area or sub-areas are of interest, the use of the empirical difference estimator is proposed. The estimator is shown to be approximately unbiased with a variance admitting unbiased or conservative estimators. The performance of the empirical difference estimator is evaluated by an extensive simulation study performed on several populations whose dimensions and covariate values are taken from a real case study. Samples are selected from the populations by means of simple random sampling without replacement. Comparisons with the generalized regression estimator and Horvitz-Thompson estimators are also performed. An application to a local forest inventory on a test area of central Italy is considered.

19.

The global k-means clustering algorithm

Aristidis Likas Nikos Vlassis Jakob J. Verbeek 《Pattern recognition》2003,36(2):451-461

We present the global k-means algorithm which is an incremental approach to clustering that dynamically adds one cluster center at a time through a deterministic global search procedure consisting of N (with N being the size of the data set) executions of the k-means algorithm from suitable initial positions. We also propose modifications of the method to reduce the computational load without significantly affecting solution quality. The proposed clustering methods are tested on well-known data sets and they compare favorably to the k-means algorithm with random restarts.
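
The incremental procedure described in this abstract can be sketched in plain Python as follows. This is a minimal sketch based only on the abstract, not the authors' implementation: it assumes that each new center is tried at every data point in turn while the previously found centers serve as the other starting positions, and it omits the proposed speed-up modifications. All function names are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(data, centers, max_iter=100):
    """Plain k-means from the given initial centers; returns (centers, error)."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in data:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    error = sum(min(sq_dist(p, c) for c in centers) for p in data)
    return centers, error


def global_kmeans(data, k):
    """Add one center at a time, running k-means N times per added center."""
    data = [tuple(p) for p in data]
    centers = [mean(data)]  # the best single center is the data mean
    for _ in range(1, k):
        best = None
        for x in data:  # N executions, one per candidate position
            candidate, err = kmeans(data, centers + [x])
            if best is None or err < best[1]:
                best = (candidate, err)
        centers = best[0]
    return centers


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    print(sorted(global_kmeans(pts, 2)))
```

Because every data point is tried as the starting position of the new center, the result does not depend on random initialization, at the cost of N k-means runs for each added cluster.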

20.

An algorithm for computing simple k-factors

Henk Meijer David Rappaport 《Information Processing Letters》2009,109(12):620-625

A k-factor of graph G is defined as a k-regular spanning subgraph of G. For instance, a 2-factor of G is a set of cycles that span G. 2-factors have multiple applications in Graph Theory, Computer Graphics, and Computational Geometry. We define a simple 2-factor as a 2-factor without degenerate cycles. In general, simple k-factors are defined as k-regular spanning subgraphs where no edge is used more than once. We propose a new algorithm for computing simple k-factors for all values of k ≥ 2.
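
The definition of a simple k-factor given in this abstract can be expressed as a short verification routine. The sketch below only checks whether a given edge set is a simple k-factor; it is not the algorithm proposed in the paper for computing one. The function name and the edge-list representation are assumptions made for illustration.

```python
def is_simple_k_factor(vertices, graph_edges, factor_edges, k):
    """Check that factor_edges is a k-regular spanning subgraph with no repeated edge.

    Edges are undirected pairs (u, v). A repeated edge corresponds to the
    degenerate case (for k = 2, a "cycle" that walks one edge back and forth).
    """
    allowed = {frozenset(e) for e in graph_edges}
    degree = {v: 0 for v in vertices}
    seen = set()
    for u, v in factor_edges:
        edge = frozenset((u, v))
        if u == v or u not in degree or v not in degree:
            return False  # loop or unknown vertex
        if edge not in allowed or edge in seen:
            return False  # not an edge of G, or used more than once
        seen.add(edge)
        degree[u] += 1
        degree[v] += 1
    # Spanning and k-regular: every vertex of G has degree exactly k.
    return all(d == k for d in degree.values())


if __name__ == "__main__":
    verts = [0, 1, 2, 3]
    square = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print(is_simple_k_factor(verts, square, square, 2))
    print(is_simple_k_factor(verts, square, [(0, 1), (0, 1), (2, 3), (2, 3)], 2))
```

In the second call every vertex has degree 2, but each edge is used twice, so the edge set is a 2-factor only in the degenerate sense and is rejected.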