Found 20 similar articles (search time: 31 ms)
1.
2.
Gongde Guo Hui Wang David Bell Yaxin Bi Kieran Greer 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2006,10(5):423-430
An investigation is conducted on two well-known similarity-based learning approaches to text categorization: the k-nearest neighbors (kNN) classifier and the Rocchio classifier. After identifying the weaknesses and strengths of each technique, a new classifier called the kNN model-based classifier (kNN Model) is proposed. It combines the strengths of both kNN and Rocchio. A text categorization prototype, which implements kNN Model along with kNN and Rocchio, is described. An experimental evaluation of the different methods is carried out on two common document corpora: the 20-newsgroup collection and the ModApte version of the Reuters-21578 collection of news stories. The experimental results show that the proposed kNN model-based method outperforms the kNN and Rocchio classifiers, and is therefore a good alternative to kNN and Rocchio in some application areas. This work was partly supported by the European Commission project ICONS, project no. IST-2001-32429.
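As a rough illustration of the two baseline classifiers compared in this abstract, the sketch below implements a plain kNN vote and a simplified Rocchio classifier (one centroid per class, without the negative-example term of the full Rocchio formula). It is not the kNN Model proposed in the paper. The function names, the cosine similarity choice and the toy term-count vectors in `TRAIN` are all assumptions made for the example.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_predict(train, query, k=3):
    """train is a list of (vector, label); return the majority label of the k most similar."""
    ranked = sorted(train, key=lambda item: cosine(item[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def rocchio_fit(train):
    """Simplified Rocchio: return the mean vector (centroid) of each class."""
    sums, counts = {}, Counter()
    for vec, label in train:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def rocchio_predict(centroids, query):
    """Assign the query to the class whose centroid is most similar."""
    return max(centroids, key=lambda label: cosine(centroids[label], query))


# Hypothetical term-count vectors for two topics.
TRAIN = [
    ([3, 0, 1], "sports"), ([2, 0, 0], "sports"), ([4, 1, 0], "sports"),
    ([0, 3, 1], "politics"), ([0, 2, 2], "politics"), ([1, 4, 0], "politics"),
]
```

kNN keeps every training document and pays for it at query time, while the centroid classifier compresses each class to a single vector; that trade-off is the kind of weakness and strength the abstract refers to.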
3.
Graph-embedding-based learning methods play an increasingly significant role in dimensionality reduction (DR). However, selecting the neighbor parameters of the graph is difficult. In this paper, we present a novel DR method called adaptive graph embedding discriminant projections (AGEDP). Most existing DR methods based on graph embedding, such as marginal Fisher analysis, predefine the intraclass and interclass neighbor parameters. In contrast, AGEDP uses all the homogeneous samples to construct the intrinsic graph, and simultaneously selects the heterogeneous samples within the neighborhood generated by the farthest homogeneous sample to construct the penalty graph. Therefore, AGEDP not only greatly enhances the intraclass compactness and interclass separability, but also selects the neighbor parameters adaptively, which accounts for the fact that the local manifold structure of each sample is generally different. Experiments on the AR and COIL-20 datasets demonstrate the effectiveness of the proposed method for face recognition and object categorization. In particular, under interference from occlusion, noise and pose variation, it is superior to other graph-embedding-based methods with three different classifiers: the nearest neighbor classifier, the sparse representation classifier and the linear regression classifier.
4.
Yan S Xu D Zhang B Zhang HJ Yang Q Lin S 《IEEE transactions on pattern analysis and machine intelligence》2007,29(1):40-51
A large family of algorithms (supervised or unsupervised; stemming from statistics or geometry theory) has been designed to provide different solutions to the problem of dimensionality reduction. Despite the different motivations of these algorithms, we present in this paper a general formulation known as graph embedding to unify them within a common framework. In graph embedding, each algorithm can be considered as the direct graph embedding or its linear/kernel/tensor extension of a specific intrinsic graph that describes certain desired statistical or geometric properties of a data set, with constraints from scale normalization or a penalty graph that characterizes a statistical or geometric property that should be avoided. Furthermore, the graph embedding framework can be used as a general platform for developing new dimensionality reduction algorithms. By utilizing this framework as a tool, we propose a new supervised dimensionality reduction algorithm called marginal Fisher analysis (MFA), in which the intrinsic graph characterizes the intraclass compactness and connects each data point with its neighboring points of the same class, while the penalty graph connects the marginal points and characterizes the interclass separability. We show that MFA effectively overcomes the limitations of the traditional linear discriminant analysis (LDA) algorithm due to data distribution assumptions and available projection directions. Real face recognition experiments show the superiority of our proposed MFA in comparison to LDA, and likewise for the corresponding kernel and tensor extensions.
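The "direct graph embedding" mentioned in this abstract is usually written as the constrained minimization below. The symbols are not given in the abstract and are assumptions taken from the commonly quoted form of the framework: $W$ is the similarity matrix of the intrinsic graph, $B$ is the constraint matrix (a diagonal scale-normalization matrix or the Laplacian of the penalty graph), $d$ is a constant, and $y$ is the one-dimensional embedding of the samples.

```latex
y^{*} = \arg\min_{y^{T} B y = d} \sum_{i \neq j} \lVert y_i - y_j \rVert^{2} W_{ij}
      = \arg\min_{y^{T} B y = d} y^{T} L y,
\qquad L = D - W, \quad D_{ii} = \sum_{j \neq i} W_{ij}.
```

Under this reading, different choices of $W$ and $B$ recover different dimensionality reduction algorithms, which is the unification the abstract describes.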
5.
6.
Dimensionality reduction of high dimensional data is involved in many problems in information processing. A new dimensionality
reduction approach called maximal local interclass embedding (MLIE) is developed in this paper. MLIE can be viewed as a linear
approach of a multimanifolds-based learning framework, in which the information of neighborhood is integrated with the local
interclass relationships. In MLIE, the local interclass graph and the intrinsic graph are constructed to find a set of projections
that maximize the local interclass scatter and the local intraclass compactness simultaneously. This characteristic makes
MLIE more powerful than marginal Fisher analysis (MFA). MLIE maintains all the advantages of MFA. Moreover, the computational
complexity of MLIE is less than that of MFA. The proposed algorithm is applied to face recognition. Experiments have been
performed on the Yale, AR and ORL face image databases. The experimental results show that owing to the locally discriminating
property, MLIE consistently outperforms up-to-date MFA, Smooth MFA, neighborhood preserving embedding and locality preserving
projection in face recognition.
7.
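Support vector machines (SVMs) are an effective tool for learning from small samples; as classifiers they are regarded as having high generalization performance and requiring no prior knowledge. However, recognition performance depends on parameter selection: the kernel parameter σ² and the penalty factor C strongly affect the recognition performance of an SVM. For the application of SVMs to face recognition, a parameter selection and optimization method based on a genetic algorithm (GA) is proposed. Features are extracted from face images using the authors' previously proposed face feature extraction algorithm based on wavelet decomposition and integral projection, and the optimized SVM is then used for recognition. Experimental results show that the method is effective.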
8.
An important query for spatio-temporal databases is to find nearest trajectories of moving objects. Existing work on this
topic focuses on the closest trajectories in the whole data space. In this paper, we introduce and solve constrained k-nearest neighbor (CkNN) queries and historical continuous CkNN (HCCkNN) queries on R-tree-like structures storing historical information about moving object trajectories. Given a trajectory
set D, a query object (point or trajectory) q, a temporal extent T, and a constrained region CR, (i) a CkNN query over trajectories retrieves from D within T, the k (≥ 1) trajectories that lie closest to q and intersect (or are enclosed by) CR; and (ii) an HCCkNN query on trajectories retrieves the constrained k nearest neighbors (CkNNs) of q at any time instance of T. We propose a suite of algorithms for processing CkNN queries and HCCkNN queries respectively, with different properties and advantages. In particular, we thoroughly investigate two types of CkNN queries, i.e., CkNNP and CkNNT, which are defined with respect to stationary query points and moving query trajectories, respectively; and two types of
HCCkNN queries, namely, HCCkNNP and HCCkNNT, which are continuous counterparts of CkNNP and CkNNT, respectively. Our methods utilize an existing data-partitioning index for trajectory data (i.e., TB-tree) to achieve low
I/O and CPU cost. Extensive experiments with both real and synthetic datasets demonstrate the performance of the proposed
algorithms in terms of efficiency and scalability.
9.
Sarana Nutanong Rui Zhang Egemen Tanin Lars Kulik 《The VLDB Journal The International Journal on Very Large Data Bases》2010,19(3):307-332
The moving k nearest neighbor (MkNN) query continuously finds the k nearest neighbors of a moving query point. MkNN queries can be efficiently processed through the use of safe regions. In general, a safe region is a region within which
the query point can move without changing the query answer. This paper presents an incremental safe-region-based technique
for answering MkNN queries, called the V*-Diagram, as well as analysis and evaluation of its associated algorithm, V*-kNN. Traditional safe-region approaches compute a safe region based on the data objects but independent of the query location.
Our approach exploits the knowledge of the query location and the boundary of the search space in addition to the data objects.
As a result, V*-kNN has much smaller I/O and computation costs than existing methods. We further provide cost models to estimate the number
of data accesses for V*-kNN and a competitive technique, RIS-kNN. The V*-Diagram and V*-kNN are also applicable to the domain of spatial networks and we present algorithms to construct a spatial-network V*-Diagram.
Our experimental results show that V*-kNN significantly outperforms the competitive technique. The results also verify the accuracy of the cost models.
10.
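Objective: Projective non-negative matrix factorization (PNMF) cannot reveal the manifold geometric structure or the discriminant information of the data space. To address this shortcoming, a face image feature extraction method called graph embedding regularized projective non-negative matrix factorization (GEPNMF) is proposed. Method: First, two nearest-neighbor graphs are constructed to describe the manifold geometric structure of the data space and the between-class separability. Their Laplacian matrices are then used to design a graph embedding regularization term, which is combined with the PNMF objective function to form the GEPNMF objective function. Because of the graph embedding regularization term, the subspace obtained by GEPNMF preserves the manifold geometric structure of the data space while maximizing the between-class distance. In addition, an orthogonality regularization term is introduced into the GEPNMF objective function to ensure that the basis vectors of the GEPNMF subspace can represent the data locally. Finally, the multiplicative update rule (MUR) for solving the GEPNMF objective function is derived in detail and its convergence is proved theoretically. Results: Face recognition experiments on the ORL, Yale and CMU PIE face image databases achieved recognition rates of 94.00%, 64.33% and 98.58%, respectively. Conclusion: The experimental results show that the face image features extracted by GEPNMF give a high recognition rate when used for face recognition.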
11.
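The neighborhood preserving embedding (NPE) algorithm is in essence still an unsupervised method and does not make effective use of available class information to improve classification efficiency. Two supervised manifold learning methods are therefore proposed: orthogonal marginal neighborhood preserving embedding (OMNPE) and uncorrelated marginal neighborhood preserving embedding (UMNPE). First, within-class and between-class adjacency graphs are constructed and the within-class and between-class reconstruction errors are defined. Then, under orthogonality and uncorrelatedness constraints respectively, projection vectors are sought that minimize the within-class reconstruction error while maximizing the between-class reconstruction error. The training and test samples are projected into the low-dimensional subspace and classified with a nearest neighbor classifier. Experimental results on the ORL and Yale face databases show that, compared with subspace face recognition algorithms such as linear discriminant analysis (LDA) and marginal Fisher analysis (MFA), the average recognition rate of the proposed algorithms improves by 0.5% to 3%, which verifies their effectiveness.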
12.
Chenn-Jung Huang Yi-Ju Yang Dian-Xiu Yang You-Jia Chen 《Applied Artificial Intelligence》2013,27(7):553-569
An intelligent frog call identifier is developed in this work to provide the public with easy online consultation. The raw frog call samples are first filtered by noise removal, high frequency compensation, and discrete wavelet transform techniques, in that order. An adaptive end-point detection segmentation algorithm is proposed to effectively separate the individual syllables from the noise. Eight features, including spectral centroid, signal bandwidth, spectral roll-off, threshold-crossing rate, delta spectrum magnitude, spectral flatness, average energy, and mel-frequency cepstral coefficients, are extracted and serve as the input parameters of the classifier. Three well-known classifiers, the k-nearest neighbor classifier, a backpropagation neural network, and a naive Bayes classifier, are employed in this work for comparison. A series of experiments was conducted to measure the performance of the proposed approach. Experimental results show that the recognition rate of the k-nearest neighbor classifier with mel-frequency cepstral coefficients as parameters can reach 93.81%. The effectiveness of the proposed frog call identifier is thus verified.
13.
This paper presents a new model developed by merging a non-parametric k-nearest-neighbor (kNN) preprocessor into an underlying support vector machine (SVM) to provide shelters for meaningful training examples, especially
for stray examples scattered around their counterpart examples with different class labels. Motivated by the method of adding
heavier penalty to the stray example to attain a stricter loss function for optimization, the model acts to shelter stray
examples. The model consists of a filtering kNN emphasizer stage and a classical classification stage. First, the filtering kNN emphasizer stage was employed to collect information from the training examples and to produce arbitrary weights for stray
examples. Then, an underlying SVM with parameterized real-valued class labels was employed to carry those weights, representing
various emphasized levels of the examples, in the classification. The emphasized weights given as heavier penalties changed
the regularization in the quadratic programming of the SVM, and brought the resultant decision function into a higher training
accuracy. The novel idea of real-valued class labels for conveying the emphasized weights provides an effective way to pursue
the solution of the classification inspired by the additional information. The adoption of the kNN preprocessor as a filtering stage is effective since it is independent of SVM in the classification stage. Due to its property
of estimating density locally, the kNN method has the advantage of distinguishing stray examples from regular examples by merely considering their circumstances
in the input space. In this paper, detailed experimental results and a simulated application are given to address the corresponding
properties. The results show that the model is promising in terms of its original expectations.
14.
Fixed Parameter Algorithms for DOMINATING SET and Related Problems on Planar Graphs Cited by: 12 (self-citations: 0; citations by others: 12)
Abstract. We present an algorithm that constructively produces a solution to the k-DOMINATING SET problem for planar graphs in time $O(c^{\sqrt{k}} n)$, where $c = 4^{6\sqrt{34}}$. To obtain this result, we show that the treewidth of a planar graph with domination number $\gamma(G)$ is $O(\sqrt{\gamma(G)})$, and that such a tree decomposition can be found in $O(\sqrt{\gamma(G)}\, n)$ time. The same technique can be used to show that the k-FACE COVER problem (find a size k set of faces that cover all vertices of a given plane graph) can be solved in $O(c_1^{\sqrt{k}} n)$ time, where $c_1 = 3^{36\sqrt{34}}$ and k is the size of the face cover set. Similar results can be obtained in the planar case for some variants of k-DOMINATING SET, e.g., k-INDEPENDENT DOMINATING SET and k-WEIGHTED DOMINATING SET.
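To make the k-DOMINATING SET problem concrete, here is a minimal brute-force sketch. It is not the tree-decomposition algorithm of the paper and runs in exponential time; it only illustrates the definition (every vertex is either chosen or adjacent to a chosen vertex). The function names and the 6-cycle example graph `CYCLE6` are assumptions for illustration.

```python
from itertools import combinations


def is_dominating(adj, chosen):
    """True if every vertex is in `chosen` or has a neighbor in `chosen`."""
    chosen = set(chosen)
    return all(v in chosen or chosen & adj[v] for v in adj)


def find_dominating_set(adj, k):
    """Exhaustively search for a dominating set with at most k vertices; None if absent."""
    vertices = sorted(adj)
    for size in range(k + 1):
        for candidate in combinations(vertices, size):
            if is_dominating(adj, candidate):
                return set(candidate)
    return None


# Hypothetical example: the cycle 0-1-2-3-4-5-0, given as adjacency sets.
CYCLE6 = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
```

On the 6-cycle a single vertex covers only three vertices, so `find_dominating_set(CYCLE6, 1)` finds nothing, while two opposite vertices suffice for k = 2.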
15.
Hairong Liu Xingwei Yang Longin Jan Latecki Shuicheng Yan 《International Journal of Computer Vision》2012,98(1):65-82
In this paper, we study the problem of how to reliably compute neighborhoods on affinity graphs. The k-nearest neighbors (kNN) is one of the most fundamental and simple methods widely used in many tasks, such as classification and graph construction.
Previous research focused on how to efficiently compute kNN on vectorial data. However, most real-world data have no vectorial representations, and only have affinity graphs which
may contain unreliable affinities. Since the kNN of an object o is a set of k objects with the highest affinities to o, it is easily disturbed by errors in pairwise affinities between o and other objects, and also it cannot well preserve the structure underlying the data. To reliably analyze the neighborhood
on affinity graphs, we define the k-dense neighborhood (kDN), which considers all pairwise affinities within the neighborhood, i.e., not only the affinities between o and its neighbors but also between the neighbors. For an object o, its kDN is a set kDN(o) of k objects which maximizes the sum of all pairwise affinities of objects in the set {o}∪kDN(o). We analyze the properties of kDN, and propose an efficient algorithm to compute it. Both theoretic analysis and experimental results on shape retrieval,
semi-supervised learning, point set matching and data clustering show that kDN significantly outperforms kNN on affinity graphs, especially when many pairwise affinities are unreliable.
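The difference between kNN and the k-dense neighborhood defined in this abstract can be sketched with an exhaustive search over all size-k subsets. This is not the efficient algorithm the paper proposes, only a direct reading of the definition. The function names and the small affinity matrix `AFF`, in which object 3 has one spuriously high affinity to object 0, are assumptions for illustration.

```python
from itertools import combinations


def knn(aff, o, k):
    """The k objects with the highest affinity to object o."""
    others = [j for j in range(len(aff)) if j != o]
    return set(sorted(others, key=lambda j: aff[o][j], reverse=True)[:k])


def kdn(aff, o, k):
    """The k objects maximizing the sum of all pairwise affinities within {o} plus the set."""
    others = [j for j in range(len(aff)) if j != o]

    def total(subset):
        members = (o,) + subset
        return sum(aff[a][b] for a, b in combinations(members, 2))

    return set(max(combinations(others, k), key=total))


# Hypothetical symmetric affinity matrix: objects 0, 1, 2 form a tight group,
# while the high affinity between 0 and 3 is treated as an unreliable entry.
AFF = [
    [0.0, 0.8, 0.7, 0.9, 0.1],
    [0.8, 0.0, 0.9, 0.1, 0.1],
    [0.7, 0.9, 0.0, 0.1, 0.1],
    [0.9, 0.1, 0.1, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.0],
]
```

With k = 2, kNN of object 0 picks up object 3 because of the single high entry, whereas the dense neighborhood prefers objects 1 and 2, whose mutual affinity is also high.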
16.
Analysis of a Plurality Voting-based Combination of Classifiers Cited by: 1 (self-citations: 0; citations by others: 1)
In various studies, it has been demonstrated that combining the decisions of multiple classifiers can lead to better recognition
results. Plurality voting is one of the most widely used combination strategies. In this paper, we both theoretically and experimentally
analyze the performance of a plurality voting based ensemble classifier. Theoretical expressions for system performance are
derived as a function of the model parameters: N (number of classifiers), m (number of classes), and p (probability that a single classifier is correct). Experimental results on the human face recognition problem show that the
voting strategy can successfully achieve high detection and identification rates, and, simultaneously, low false acceptance
rates.
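A minimal sketch of plurality voting follows, together with the textbook binomial expression for the special case of two classes, an odd number N of independent classifiers, and a common single-classifier accuracy p. The paper's own expressions for general m are not reproduced here. The function names are assumptions, and ties in `plurality_vote` are broken arbitrarily.

```python
from collections import Counter
from math import comb


def plurality_vote(predictions):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(predictions).most_common(1)[0][0]


def majority_accuracy(n, p):
    """Probability that more than half of n independent classifiers are correct.

    Assumes two classes, odd n, and the same accuracy p for every classifier.
    """
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(n // 2 + 1, n + 1))
```

For example, three independent classifiers that are each correct 70% of the time give a combined accuracy of 3(0.7)²(0.3) + (0.7)³ = 0.784, which shows how voting can improve on a single classifier when p exceeds 0.5.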
17.
18.
In a graph, a vertex is simplicial if its neighborhood is a clique. For an integer k≥1, a graph G=(V_G,E_G) is the k-simplicial power of a graph H=(V_H,E_H) (H a root graph of G) if V_G is the set of all simplicial vertices of H, and for all distinct vertices x and y in V_G, xy ∈ E_G if and only if the distance in H between x and y is at most k. This concept generalizes k-leaf powers, introduced by Nishimura, Ragde and Thilikos, which were motivated by the search for underlying phylogenetic trees; k-leaf powers are the k-simplicial powers of trees. Recently, a lot of work has been done on k-leaf powers and their roots, as well as on their variants, phylogenetic roots and Steiner roots. For k≤5, k-leaf powers can be recognized in linear time, and for k≤4, structural characterizations are known. For k≥6, the recognition and characterization problems of k-leaf powers are still open. Since trees and block graphs (i.e., connected graphs whose blocks are cliques) have very similar metric properties, it is natural to study k-simplicial powers of block graphs. We show that leaf powers of trees and simplicial powers of block graphs are closely related, and we study simplicial powers of other graph classes containing all trees, such as ptolemaic graphs and strongly chordal graphs.
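The definition of a k-leaf power can be sketched directly: take the leaves of a tree as vertices and join two leaves whenever their distance in the tree is at most k. This constructs the power from a given root tree; it does not address the recognition problem discussed in the abstract. The function names and the small example tree `TREE` are assumptions for illustration.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def leaf_power(tree, k):
    """Edge set of the k-leaf power: pairs of leaves at tree distance at most k."""
    leaves = sorted(v for v in tree if len(tree[v]) == 1)
    dist = {v: bfs_distances(tree, v) for v in leaves}
    return {(x, y) for x, y in combinations(leaves, 2) if dist[x][y] <= k}


# Hypothetical root tree: internal vertices u and v are adjacent,
# leaves a and b hang off u, leaves c and d hang off v.
TREE = {
    "u": {"v", "a", "b"},
    "v": {"u", "c", "d"},
    "a": {"u"}, "b": {"u"},
    "c": {"v"}, "d": {"v"},
}
```

For this tree the 2-leaf power consists of the two disjoint edges ab and cd, while the 3-leaf power is the complete graph on the four leaves.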
19.
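Graph embedding methods, when constructing the neighborhood graph, simply assign each sample to a single class, which is not appropriate. To address this problem, a fuzzy gradual membership representation is proposed. Drawing on ideas from fuzzy mathematics, it assigns samples to different classes through fuzzy gradual membership degrees. To address the low efficiency of the classifiers used with graph embedding methods, collaborative representation classification is introduced, which greatly improves the computational efficiency of the algorithm. Based on these two points, a feature extraction algorithm based on collaborative representation and fuzzy gradual maximum margin embedding is proposed. Experiments on the ORL and AR face databases and the USPS handwritten digit database show that the algorithm outperforms principal component analysis (PCA), linear discriminant analysis (LDA), locality preserving projection (LPP) and marginal Fisher analysis (MFA).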
20.
Recently, pattern recognition techniques have been applied to fault diagnosis. Principal component analysis (PCA) and kernel principal component analysis (KPCA) have been introduced for feature extraction. However, those unsupervised learning methods do not incorporate prior knowledge of process patterns. This paper proposes a novel fault diagnosis system to improve the performance of fault diagnosis. Kernel Fisher discriminant analysis (KFDA) is used in the first step for feature extraction; a Gaussian mixture model (GMM) and k-nearest neighbor (kNN) are then applied for fault detection and isolation in the KFDA subspace. Since the performance of a fault diagnosis system can be degraded in the fault detection stage, fault detection and identification are performed in a holistic manner, without an intermediate step, in the novel system. A case study of the Tennessee Eastman (TE) benchmark process indicates that the proposed methods are more efficient than the traditional ones. Furthermore, as the performances of GMM and kNN are comparable, the data structure of the process should be checked beforehand so that the optimal classifier can be selected accordingly.