Similar literature
 20 similar documents found (search time: 15 ms)
1.
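To address the low accuracy of text sentiment classification, a multi-level text sentiment classification method based on a CCA-VSM classifier and KFD is proposed. Canonical correlation analysis (CCA) is used to reduce the dimensionality of each document's weight feature vector and part-of-speech feature vector; a vector space model (VSM) is built on the reduced vector set; and a VSM classifier is designed according to the degree of difference between models. The R models that differ least from the test document are selected as input to kernel Fisher discriminant (KFD) analysis, which finally determines the sentiment of the document. Experimental results show that the method achieves higher classification accuracy and faster classification than the traditional support vector machine, and that the weight features and part-of-speech features strongly affect classification accuracy.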

2.
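The Bayesian classifier is improved under a given probability distribution, and a database-based Bayesian classifier with small-eigenvalue threshold resetting is proposed. Eigenvalues of the class covariance matrix that fall below a threshold are replaced by that threshold, which is chosen to minimize the classification error rate on the given database; the method outperforms the null-subspace method. In tests on the MNIST handwritten digit database of 6×10^4 samples, the recognition rate exceeds 96%. Experiments on small-character-set handwritten Chinese characters show a recognition rate above 99%.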
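As a rough illustration of the eigenvalue-resetting idea in this abstract, the sketch below evaluates a Gaussian class log-density in the eigenbasis of the class covariance after replacing small eigenvalues with a threshold. It is not the authors' implementation: the function names are hypothetical, the eigen-decomposition is assumed to be available already, and the threshold is simply passed in rather than tuned to minimize the error rate on a database.

```python
import math


def reset_small_eigenvalues(eigenvalues, threshold):
    """Replace every eigenvalue below `threshold` with the threshold itself."""
    return [max(value, threshold) for value in eigenvalues]


def gaussian_log_density(x, mean, eigenvalues, eigenvectors, threshold):
    """Gaussian log-density computed in the eigenbasis of the class covariance.

    `eigenvectors[i]` is assumed to be the unit eigenvector paired with
    `eigenvalues[i]`; both are assumed to come from a prior decomposition.
    """
    values = reset_small_eigenvalues(eigenvalues, threshold)
    centered = [a - b for a, b in zip(x, mean)]
    log_density = -0.5 * len(x) * math.log(2.0 * math.pi)
    for value, vector in zip(values, eigenvectors):
        projection = sum(c * v for c, v in zip(centered, vector))
        log_density -= 0.5 * (math.log(value) + projection ** 2 / value)
    return log_density


# Toy class model: one well-estimated direction, one nearly degenerate one.
eigenvalues = [4.0, 1e-9]
eigenvectors = [[1.0, 0.0], [0.0, 1.0]]
print(reset_small_eigenvalues(eigenvalues, threshold=0.01))
print(gaussian_log_density([0.5, 0.1], [0.0, 0.0], eigenvalues, eigenvectors, 0.01))
```

Without the reset, the near-zero eigenvalue would dominate the density and make it extremely sensitive to small deviations along that direction, which is the problem the threshold is meant to soften.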

3.
Learning a Bayesian network classifier usually requires continuous attributes to be discretized. Discretization, however, can lose information, introduce noise, and make the classifier less sensitive to how changes in the attributes affect the class variable. In this paper, we use a Gaussian kernel function with a smoothing parameter to estimate the density of the attributes. A Bayesian network classifier with continuous attributes is built by dependency extension of the naive Bayes classifier. We also analyze the information each attribute provides about the class as a basis for this dependency extension. Experimental studies on UCI data sets show that Bayesian network classifiers using the Gaussian kernel function achieve good classification accuracy compared with other approaches when dealing with continuous attributes.
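The density-estimation step described in this abstract can be sketched as a plain naive Bayes classifier whose per-attribute densities are Gaussian kernel estimates. This is only a minimal illustration under stated assumptions: it omits the dependency extension entirely, and the function names, the toy data, and the smoothing parameter `h=0.5` are all hypothetical.

```python
import math
from collections import defaultdict


def gaussian_kde(value, samples, h):
    """Kernel density estimate at `value` using a Gaussian kernel of width h."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((value - s) / h) ** 2) for s in samples)


def fit(rows, labels):
    """Group the training rows by class label; the KDE needs the raw samples."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    return dict(by_class)


def predict(by_class, x, h=0.5):
    """Return the class maximizing log prior + sum of per-attribute log densities."""
    total = sum(len(rows) for rows in by_class.values())
    best_label, best_score = None, -math.inf
    for label, rows in by_class.items():
        score = math.log(len(rows) / total)
        for j, value in enumerate(x):
            column = [row[j] for row in rows]
            # A small floor avoids log(0) when x is far from every sample.
            score += math.log(max(gaussian_kde(value, column, h), 1e-300))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


train_x = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 0.5], [3.2, 0.4], [2.9, 0.7]]
train_y = ["a", "a", "a", "b", "b", "b"]
model = fit(train_x, train_y)
print(predict(model, [1.1, 2.0]))
print(predict(model, [3.1, 0.5]))
```

The smoothing parameter controls how far each training sample spreads its influence: a small `h` gives a spiky density that follows the samples closely, a large `h` gives a smoother but blurrier estimate.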

4.
王蓓, 孙玉东, 金晶, 张涛, 王行愚. 《控制与决策》 (Control and Decision), 2019, 34(6): 1319-1324
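Traditional Bayesian classification methods such as Gaussian discriminant analysis and naive Bayes often simplify the correlations among variables when constructing their joint probability distribution, so the class-conditional probability density estimated in Bayesian decision theory deviates from the actual data. To address this, the Copula function is used to study the optimization of correlations among feature variables, and a Bayesian classifier based on D-vine Copula theory is designed, with the main aim of improving the accuracy of class-conditional probability density estimation. The joint probability distribution of the variables is decomposed into the product of a series of bivariate Copula functions and marginal probability density functions; the marginal densities are estimated by the kernel function method, and the parameters of each bivariate Copula function are optimized by maximum likelihood estimation, which yields the form of the class-conditional probability density function. The D-vine Copula-based Bayesian classifier is applied to the classification of bioelectric signals, and its classification performance is analyzed and verified. The results show that the proposed method performs well on all classification metrics.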

5.
The Bayesian evidence framework has been successfully applied to the design of multilayer perceptrons (MLPs) in the work of MacKay. Nevertheless, the training of MLPs suffers from drawbacks like the nonconvex optimization problem and the choice of the number of hidden units. In support vector machines (SVMs) for classification, as introduced by Vapnik, a nonlinear decision boundary is obtained by mapping the input vector first in a nonlinear way to a high-dimensional kernel-induced feature space in which a linear large margin classifier is constructed. Practical expressions are formulated in the dual space in terms of the related kernel function, and the solution follows from a (convex) quadratic programming (QP) problem. In least-squares SVMs (LS-SVMs), the SVM problem formulation is modified by introducing a least-squares cost function and equality instead of inequality constraints, and the solution follows from a linear system in the dual space. Implicitly, the least-squares formulation corresponds to a regression formulation and is also related to kernel Fisher discriminant analysis. The least-squares regression formulation has advantages for deriving analytic expressions in a Bayesian evidence framework, in contrast to the classification formulations used, for example, in gaussian processes (GPs). The LS-SVM formulation has clear primal-dual interpretations, and without the bias term, one explicitly constructs a model that yields the same expressions as have been obtained with GPs for regression. In this article, the Bayesian evidence framework is combined with the LS-SVM classifier formulation. Starting from the feature space formulation, analytic expressions are obtained in the dual space on the different levels of Bayesian inference, while posterior class probabilities are obtained by marginalizing over the model parameters. 
Empirical results obtained on 10 public domain data sets show that the LS-SVM classifier designed within the Bayesian evidence framework consistently yields good generalization performances.
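The remark that the LS-SVM solution "follows from a linear system in the dual space" can be made concrete with a small sketch. It solves the standard LS-SVM classifier system [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1], where Omega_ij = y_i y_j K(x_i, x_j). The Bayesian evidence framework from the abstract is not implemented here: the regularization constant `gamma`, the RBF kernel width, the helper names, and the toy data are all assumptions fixed by hand.

```python
import math


def rbf_kernel(x, z, sigma=1.0):
    """Gaussian RBF kernel; kernel choice and width are assumptions of this sketch."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2.0 * sigma ** 2))


def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def train_ls_svm(xs, ys, gamma=10.0):
    """Build and solve the LS-SVM dual system; returns bias b and dual variables alpha."""
    n = len(xs)
    system = [[0.0] + [float(y) for y in ys]]
    for i in range(n):
        row = [float(ys[i])]
        for j in range(n):
            entry = ys[i] * ys[j] * rbf_kernel(xs[i], xs[j])
            row.append(entry + (1.0 / gamma if i == j else 0.0))
        system.append(row)
    solution = solve_linear_system(system, [0.0] + [1.0] * n)
    return solution[0], solution[1:]


def predict(xs, ys, b, alpha, x):
    """Sign of sum_i alpha_i * y_i * K(x_i, x) + b."""
    score = sum(a * y * rbf_kernel(xi, x) for a, y, xi in zip(alpha, ys, xs)) + b
    return 1 if score >= 0 else -1


xs = [(0.0, 0.0), (0.0, 1.0), (3.0, 3.0), (3.0, 4.0)]
ys = [-1, -1, 1, 1]
b, alpha = train_ls_svm(xs, ys)
print([predict(xs, ys, b, alpha, x) for x in xs])
```

Compared with a standard SVM, no quadratic program is needed: the equality constraints turn training into one linear solve, at the cost of losing sparsity, since every training point generally receives a nonzero `alpha`.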

6.
The presence of complex distributions of samples concealed in high-dimensional, massive sample-size data challenges all of the current classification methods for data mining. Samples within a class usually do not uniformly fill a certain (sub)space but are individually concentrated in certain regions of diverse feature subspaces, revealing the class dispersion. Current classifiers applied to such complex data inherently suffer from either high complexity or weak classification ability, due to the imbalance between flexibility and generalization ability of the discriminant functions used by these classifiers. To address this concern, we propose a novel representation of discriminant functions in Bayesian inference, which allows multiple Bayesian decision boundaries per class, each in its individual subspace. For this purpose, we design a learning algorithm that incorporates the naive Bayes and feature weighting approaches into structural risk minimization to learn multiple Bayesian discriminant functions for each class, thus combining the simplicity and effectiveness of naive Bayes and the benefits of feature weighting in handling high-dimensional data. The proposed learning scheme affords a recursive algorithm for exploring class density distribution for Bayesian estimation, and an automated approach for selecting powerful discriminant functions while keeping the complexity of the classifier low. Experimental results on real-world data characterized by millions of samples and features demonstrate the promising performance of our approach.

7.
Naive Bayesian classifier based on multiple discriminant analysis   Cited by: 4 (self-citations: 1, citations by others: 4)
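By analyzing the classification principle of the naive Bayesian classifier and drawing on the advantages of multiple discriminant analysis, a naive Bayesian classifier based on multiple discriminant analysis, DANB (Discriminant Analysis Naive Bayesian classifier), is proposed. The method is compared experimentally with the naive Bayesian classifier (NB) and the TAN classifier (Tree Augmented Naive Bayesian classifier); the results show that DANB achieves higher classification accuracy on most data sets.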

8.
The common vector (CV) method is a linear subspace classifier method which allows one to discriminate between classes of data sets, such as those arising in image and word recognition. This method utilizes subspaces that represent classes during classification. Each subspace is modeled such that common features of all samples in the corresponding class are extracted. To accomplish this goal, the method eliminates features that are in the direction of the eigenvectors corresponding to the nonzero eigenvalues of the covariance matrix of each class. In this paper, we introduce a variation of the CV method, which will be referred to as the modified CV (MCV) method. Then, a novel approach is proposed to apply the MCV method in a nonlinearly mapped higher dimensional feature space. In this approach, all samples are mapped into a higher dimensional feature space using a kernel mapping function, and then, the MCV method is applied in the mapped space. Under certain conditions, each class gives rise to a unique CV, and the method guarantees a 100% recognition rate with respect to the training set data. Moreover, experiments with several test cases also show that the generalization performance of the proposed kernel method is comparable to the generalization performances of other linear subspace classifier methods as well as the kernel-based nonlinear subspace method. While both the MCV method and its kernel counterpart did not outperform the support vector machine (SVM) classifier in most of the reported experiments, the application of our proposed methods is simpler than that of the multiclass SVM classifier. In addition, it is not necessary to adjust any parameters in our approach.

9.

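The naive Bayes classifier cannot make effective use of dependency information among attributes, and existing dependency extensions emphasize efficiency, so the accuracy of the extended classifiers still needs improvement. To address these problems, attribute densities are estimated with a Gaussian kernel function with a smoothing parameter, and the network dependency extension of the naive Bayes classifier is carried out by combining a classification-accuracy criterion with greedy selection of attribute parent nodes. Experiments on UCI classification data with continuous attributes show that the classifier after network dependency extension has good classification accuracy.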

10.
A novel fuzzy nonlinear classifier, called kernel fuzzy discriminant analysis (KFDA), is proposed to deal with linearly non-separable problems. With kernel methods, KFDA can perform efficient classification in the kernel feature space. Through a nonlinear mapping, the input data are mapped implicitly into a high-dimensional kernel feature space in which nonlinear patterns appear linear. Unlike fuzzy discriminant analysis (FDA), which is based on Euclidean distance, KFDA uses a kernel-induced distance. Theoretical analysis and experimental results show that the proposed classifier compares favorably with FDA.
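A kernel-induced distance of the kind this abstract mentions can be computed without ever forming the feature-space vectors, using the identity d(x, y)^2 = K(x, x) - 2 K(x, y) + K(y, y). The sketch below shows only that identity; the RBF kernel, its `gamma` parameter, and the function names are assumptions, and the fuzzy discriminant analysis itself is not implemented.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel; the choice of kernel is an assumption for this sketch."""
    squared = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared)


def kernel_distance(x, y, kernel=rbf_kernel):
    """Distance between the feature-space images of x and y, via the kernel trick."""
    squared = kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)
    return math.sqrt(max(squared, 0.0))  # clamp tiny negative rounding errors


print(kernel_distance([0.0, 0.0], [0.0, 0.0]))
print(kernel_distance([0.0, 0.0], [1.0, 0.0]))
print(kernel_distance([0.0, 0.0], [5.0, 0.0]))
```

With the RBF kernel the induced distance saturates at sqrt(2) for far-apart points, so it behaves quite differently from the Euclidean distance used by plain FDA.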

11.
Neural and statistical classifiers-taxonomy and two case studies   Cited by: 1 (self-citations: 0, citations by others: 1)
Pattern classification using neural networks and statistical methods is discussed. We give a tutorial overview in which popular classifiers are grouped into distinct categories according to their underlying mathematical principles; we also assess what makes a classifier neural. The overview is complemented by two case studies using handwritten digit and phoneme data that test the performance of a number of the most typical neural-network and statistical classifiers. Four methods of our own are included: reduced kernel discriminant analysis, the learning k-nearest neighbors classifier, the averaged learning subspace method, and a version of kernel discriminant analysis.

12.
For classifying large data sets, we propose a discriminant kernel that introduces a nonlinear mapping from the joint space of input data and output labels to a discriminant space. Our method differs from traditional ones, which map nonlinearly from the input space to a feature space. The induced distance of our discriminant kernel is Euclidean and Fisher separable, as it is defined by a mapping from distance vectors in the feature space to distance vectors in the discriminant space. Unlike support vector machines or kernel Fisher discriminant analysis, the classifier does not need to solve a quadratic programming problem or eigen-decomposition problems. Therefore, it is especially appropriate for problems that involve processing large data sets. The classifier can be applied to face recognition, shape comparison, and image classification benchmark data sets. The method is significantly faster than other methods, yet it delivers comparable classification accuracy.

13.
The output of a classifier is usually determined by the value of a discriminant function and a decision is made based on this output which does not necessarily represent the posterior probability for the soft decision of classification. In this context, it is desirable that the output of a classifier be calibrated in such a way to give the meaning of the posterior probability of class membership. This paper presents a new method of postprocessing for the probabilistic scaling of classifier's output. For this purpose, the output of a classifier is analyzed and the distribution of the output is described by the beta distribution parameters. For more accurate approximation of class output distribution, the beta distribution parameters as well as the kernel parameters describing the discriminant function are adjusted in such a way to improve the uniformity of beta cumulative distribution function (CDF) values for the given class output samples. As a result, the classifier with the proposed scaling method referred to as the class probability output network (CPON) can provide accurate posterior probabilities for the soft decision of classification. To show the effectiveness of the proposed method, the simulation for pattern classification using the support vector machine (SVM) classifiers is performed for the University of California at Irvine (UCI) data sets. The simulation results using the SVM classifiers with the proposed CPON demonstrated a statistically meaningful performance improvement over the SVM and SVM-related classifiers, and also other probabilistic scaling methods.

14.
Peculiarity-oriented mining is a data mining method consisting of peculiar data identification and peculiar data analysis. Peculiarity factor and local peculiarity factor (LPF) are important concepts employed to describe the peculiarity of a data point in the identification step. One can study the notions at both attribute and record levels. In this paper, a new record LPF called distance-based record LPF (D-record LPF) is proposed, which is defined as the sum of distances between a point and its nearest neighbors. The authors prove that D-record LPF can characterize the probability density of a continuous m-dimensional distribution accurately. This provides a theoretical basis for some existing distance-based anomaly detection techniques. More importantly, it also provides an effective method for describing the class-conditional probabilities in a Bayesian classifier. The result enables us to apply D-record LPF to solve classification problems. A novel algorithm called LPF-Bayes classifier and its kernelized implementation are proposed, which have some connection to the Bayesian classifier. Experimental results on several benchmark datasets demonstrate that the proposed classifiers are competitive with some excellent classifiers such as AdaBoost, support vector machines and kernel Fisher discriminant.
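The definition quoted in this abstract, the sum of distances between a point and its nearest neighbors, is short enough to sketch directly. The function name, the use of Euclidean distance, the neighbor count `k`, and the toy points are assumptions; the LPF-Bayes classifier built on top of this quantity is not shown.

```python
import math


def d_record_lpf(points, index, k):
    """Sum of Euclidean distances from points[index] to its k nearest neighbors."""
    target = points[index]
    distances = sorted(
        math.dist(target, other) for i, other in enumerate(points) if i != index
    )
    return sum(distances[:k])


# Four points on a unit square plus one point far away from the rest.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
scores = [d_record_lpf(points, i, k=2) for i in range(len(points))]
print(scores)
```

A point in a dense region has close neighbors and therefore a small score, while an isolated point has a large one, which is the intuition behind using this quantity as a stand-in for local density.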

15.
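To address the difficulty of parameter optimization in the traditional kernel extreme learning machine algorithm and to improve classification accuracy, a kernel extreme learning machine algorithm with improved Bayesian optimization is proposed. A salp swarm is used to design the lower confidence bound strategy of the acquisition function in the Bayesian optimization framework, which improves the local search ability and optimization ability of the algorithm; this improved Bayesian optimization algorithm is then used to optimize the parameters of the kernel extreme learning machine, and the optimal parameters are used to construct the kernel extreme learning machine classifier. Simulation experiments were carried out on real UCI data sets; the experiments...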

16.
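To address the low classification accuracy of the relevance vector machine (RVM) algorithm and the difficulty of choosing its kernel parameters, this paper proposes the concept of a critical sliding threshold and, on that basis, combines RVM with the K-nearest neighbor (KNN) algorithm to build the KNN-RVM classifier. Three conclusions are stated and proved theoretically: the KNN-RVM classification process is equivalent to that of a support vector machine with soft-margin constraints; the KNN-RVM classifier is equivalent to a 1-NN classifier that selects only one representative point per class; and KNN-RVM classifies better than RVM. Experiments on three different data sets demonstrate the criticality and sliding property of the critical sliding threshold, as well as the accuracy, adaptability, and global optimality of the KNN-RVM classifier. Classification accuracy improves and the algorithm's dependence on kernel parameters is reduced, which shows that KNN-RVM is an effective classifier.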

17.
杜超, 王志海, 江晶晶, 孙艳歌. 《软件学报》 (Journal of Software), 2017, 28(11): 2891-2904
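Pattern-based Bayesian classification models are an effective approach to classification problems in data mining. However, most pattern-based Bayesian classifiers consider only the support of a pattern in the target-class data set and ignore its support in the opposing-class data sets. Moreover, pattern-based Bayesian classifiers built for static data sets are not suitable for unbounded data streams that change rapidly. To solve these problems, a Bayesian classification model for data streams based on emerging patterns, EPDS (Bayesian classifier algorithm based on emerging pattern for data stream), is proposed. The model uses a simple hybrid forest structure to maintain the itemsets of transactions in memory and adopts a fast pattern extraction mechanism to speed up the algorithm. EPDS uses a semi-lazy learning strategy to continuously update emerging patterns and builds a local classification model under each class for every transaction to be classified. Extensive experimental results show that the algorithm is more accurate than other data stream classification models.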

18.
Qualitatively, a filter is said to be “robust” if its performance degradation is acceptable for distributions close to the one for which it is optimal, that is, the one for which it has been designed. This paper adapts the signal-processing theory of optimal robust filters to classifiers. The distribution (class conditional distributions) to which the classifier is to be applied is parameterized by a state vector and the principal issue is to choose a design state that is optimal in comparison to all other states relative to some measure of robustness. A minimax robust classifier is one whose worst performance over all states is better than the worst performances of the other classifiers (defined at the other states). A Bayesian robust classifier is one whose expected performance is better than the expected performances of the other classifiers. The state corresponding to the Bayesian robust classifier is called the maximally robust state. Minimax robust classifiers tend to give too much weight to states for which classification is very difficult and therefore our effort is focused on Bayesian robust classifiers. Whereas the signal-processing theory of robust filtering concentrates on design with full distributional knowledge and a fixed number of observation variables (features), design via training from sample data and feature selection are so important for classification that robustness optimality must be considered from these perspectives, in particular for small samples. In this context, for a given sample size, we will be concerned with the maximally robust state-feature pair. All definitions are independent of the classification rule; however, applications are only considered for linear and quadratic discriminant analysis, for which there are parametric forms for the optimal discriminants.
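The two robustness criteria defined in this abstract can be contrasted on a small table of errors. The sketch assumes a finite set of states and a precomputed matrix in which `errors[i][j]` is the error of the classifier designed at state i when it is applied at state j; the numbers, the prior over states, and the function names are hypothetical and chosen only to make the two criteria disagree.

```python
def minimax_robust_state(errors):
    """Index of the design state whose worst-case error over all states is smallest."""
    return min(range(len(errors)), key=lambda i: max(errors[i]))


def bayesian_robust_state(errors, prior):
    """Index of the design state with the smallest prior-weighted expected error."""
    def expected_error(i):
        return sum(p * e for p, e in zip(prior, errors[i]))
    return min(range(len(errors)), key=expected_error)


# errors[i][j]: error of the classifier designed at state i, applied at state j.
# State 2 is a hard state: every classifier does poorly there.
errors = [
    [0.10, 0.12, 0.45],
    [0.14, 0.11, 0.40],
    [0.30, 0.28, 0.32],
]
prior = [0.45, 0.45, 0.10]  # hypothetical prior probabilities of the states

print(minimax_robust_state(errors))
print(bayesian_robust_state(errors, prior))
```

In this toy table the minimax criterion picks the classifier designed for the rare, hard state, even though it performs worst on the two likely states, which mirrors the abstract's remark that minimax designs give too much weight to difficult states.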

19.
There are two standard approaches to the classification task: generative approaches, which use training data to estimate a probability model for each class, and discriminative approaches, which try to construct flexible decision boundaries between the classes. An ideal classifier should combine the two. In this paper, a classifier combining the well-known support vector machine (SVM) classifier with the regularized discriminant analysis (RDA) classifier is presented. The hybrid classifier is used for protein structure prediction, one of the most important goals pursued by bioinformatics. The results are promising: the hybrid classifier achieves better results than the SVM or RDA classifier alone, and the proposed method achieves a higher recognition ratio than other methods described in the literature.

20.
Bayesian classifier based on Fisher discriminant analysis   Cited by: 1 (self-citations: 0, citations by others: 1)
曹玲玲, 潘建寿. 《计算机工程》 (Computer Engineering), 2011, 37(10): 162-164
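The classical Bayesian classifier, which assumes that attributes are mutually independent given the class, cannot make effective use of between-class information. To address this shortcoming, an improved Bayesian classifier algorithm based on Fisher linear discriminant analysis is presented. The algorithm finds the projection space in which the classes are maximally separated, projects the original samples onto it to obtain new samples, and classifies the new samples with a Bayesian classifier. Experimental results show that, on the given data sets, this Bayesian classifier achieves higher classification accuracy and better classification performance.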
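The project-then-classify pipeline in this abstract can be sketched for the simplest case: two classes of 2-D samples, the Fisher direction w = Sw^{-1}(m1 - m0), and a Bayesian decision on the projected values. This is not the authors' algorithm. Modeling each projected class as a 1-D Gaussian, the function names, and the toy data are all assumptions made for illustration.

```python
import math


def mean(rows):
    return [sum(r[j] for r in rows) / len(rows) for j in range(len(rows[0]))]


def scatter(rows, m):
    """2x2 scatter matrix of `rows` around the mean `m`."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s


def fisher_direction(class0, class1):
    """w = Sw^{-1} (m1 - m0), using the closed-form inverse of a 2x2 matrix."""
    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    sw = [[s0[a][b] + s1[a][b] for b in range(2)] for a in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    return [
        (sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
        (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det,
    ]


def train(class0, class1):
    """Project both classes onto w and fit a 1-D Gaussian plus prior per class."""
    w = fisher_direction(class0, class1)
    total = len(class0) + len(class1)
    params = []
    for rows in (class0, class1):
        z = [w[0] * r[0] + w[1] * r[1] for r in rows]
        mu = sum(z) / len(z)
        var = sum((v - mu) ** 2 for v in z) / len(z)
        params.append((mu, var, len(rows) / total))
    return w, params


def classify(model, x):
    """Return 0 or 1: the class with the larger log prior + Gaussian log-likelihood."""
    w, params = model
    z = w[0] * x[0] + w[1] * x[1]
    scores = [
        math.log(prior) - 0.5 * math.log(2.0 * math.pi * var) - (z - mu) ** 2 / (2.0 * var)
        for mu, var, prior in params
    ]
    return scores.index(max(scores))


class0 = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
class1 = [(6.0, 5.0), (7.0, 8.0), (8.0, 6.0), (7.0, 6.0)]
model = train(class0, class1)
print(classify(model, (2.0, 2.0)), classify(model, (7.0, 7.0)))
```

Because the projection is chosen using both class means and the pooled within-class scatter, the classifier that follows it sees between-class information that a plain naive Bayes model, working attribute by attribute, would not use.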
