Similar Documents
Found 16 similar documents (search time: 171 ms)
1.
The Gaussian mixture model is a probabilistic graphical model with latent variables whose parameters are usually trained iteratively with the EM algorithm. After a brief derivation of the EM algorithm for Gaussian mixture models, this paper applies the model to classification of the iris dataset. To address the EM algorithm's strong sensitivity to initial values, K-means clustering is used to estimate the initial values. After obtaining the classification results of the K-means and EM algorithms, the two algorithms' ...
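
A minimal sketch of the workflow described above, assuming scikit-learn's GaussianMixture as the GMM/EM implementation and the adjusted Rand index as the agreement measure (neither choice comes from the abstract):

```python
# GMM trained by EM on iris, comparing K-means initialization with random initialization.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

X, y = load_iris(return_X_y=True)

for init in ("kmeans", "random"):
    gmm = GaussianMixture(n_components=3, init_params=init, n_init=1,
                          random_state=0)
    labels = gmm.fit_predict(X)          # EM iterations happen inside fit()
    print(init, "ARI =", round(adjusted_rand_score(y, labels), 3))
```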

2.
To address the problem of missing-data imputation in naive Bayes classification, a naive Bayes classification algorithm based on an improved EM (Expectation Maximization) algorithm is proposed. The algorithm first estimates the missing values using grey relational degree; these estimates serve as the initial values for the EM algorithm, which iterates E-steps and M-steps to complete the imputation. A naive Bayes classifier is then used to classify the samples. Experimental results show that the improved algorithm achieves high classification accuracy. The improved algorithm is also applied to the grading of university teachers' posts.
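
A hedged sketch of the impute-then-classify pipeline: scikit-learn's IterativeImputer stands in for the paper's grey-relational-degree-initialized EM imputation, GaussianNB for the naive Bayes step, and the data and missingness pattern are synthetic.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.1] = np.nan   # hypothetical 10% missingness

# Impute the missing entries, then classify with naive Bayes.
clf = make_pipeline(IterativeImputer(max_iter=20, random_state=0), GaussianNB())
print("accuracy:", cross_val_score(clf, X_missing, y, cv=5).mean())
```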

3.
A missing-data imputation algorithm based on EM and Bayesian networks (total citations: 2; self-citations: 0; citations by others: 2)
Datasets with large amounts of missing data are common in practical applications, and handling missing data has become a research focus in classification. This paper analyzes and compares several general-purpose missing-data imputation algorithms and proposes a new imputation algorithm based on EM and Bayesian networks. The algorithm uses naive Bayes to estimate the initial values for the EM algorithm, then combines EM with a Bayesian network and iterates to determine the final updater, obtaining the completed dataset at the same time. Experimental results show that, compared with classical imputation algorithms, the new algorithm achieves higher classification accuracy and saves a considerable amount of overhead.

4.
蔡崇超  王士同 《计算机应用》2007,27(5):1235-1237
Building on the Bernoulli mixture model and the Expectation Maximization (EM) algorithm, an improved method for incomplete data is presented. First, initial estimates of the likelihood-function parameters are obtained from the labeled data using the Bernoulli mixture model and the naive Bayes algorithm; then a weighted EM algorithm estimates the parameters of the classifier's prior probability model, yielding the final classifier. Experimental results show that the method outperforms naive Bayes text classification in precision and recall.
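
A rough numpy sketch of semi-supervised EM over a Bernoulli mixture (one component per class) initialized from labeled data only, in the spirit of the abstract; the smoothing constants, the unlabeled-sample weight `lam`, and the toy data are assumptions rather than the paper's settings.

```python
import numpy as np

def semisup_bernoulli_em(Xl, yl, Xu, n_classes, lam=0.5, n_iter=20, eps=1e-9):
    """Xl, Xu: binary term-occurrence matrices; yl: labels of Xl."""
    # Initialization from labeled data only (naive Bayes counts with smoothing).
    prior = np.array([(yl == c).mean() for c in range(n_classes)])
    theta = np.vstack([(Xl[yl == c].sum(0) + 1.0) / ((yl == c).sum() + 2.0)
                       for c in range(n_classes)])
    for _ in range(n_iter):
        # E-step: responsibilities of unlabeled documents.
        log_p = (np.log(prior + eps)
                 + Xu @ np.log(theta + eps).T
                 + (1 - Xu) @ np.log(1 - theta + eps).T)
        log_p -= log_p.max(1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(1, keepdims=True)
        # M-step: labeled counts (weight 1) plus down-weighted unlabeled counts.
        hard = np.eye(n_classes)[yl]
        w = np.vstack([hard, lam * resp])
        Xall = np.vstack([Xl, Xu])
        cls_tot = w.sum(0)
        prior = cls_tot / cls_tot.sum()
        theta = (w.T @ Xall + 1.0) / (cls_tot[:, None] + 2.0)
    return prior, theta

def predict(X, prior, theta, eps=1e-9):
    log_p = (np.log(prior + eps) + X @ np.log(theta + eps).T
             + (1 - X) @ np.log(1 - theta + eps).T)
    return log_p.argmax(1)

# Tiny synthetic demo with 2 classes and 6 binary features.
rng = np.random.default_rng(0)
true_theta = np.array([[.9, .8, .7, .2, .1, .1], [.1, .2, .1, .8, .9, .7]])
y = rng.integers(0, 2, 300)
X = (rng.random((300, 6)) < true_theta[y]).astype(float)
prior, theta = semisup_bernoulli_em(X[:20], y[:20], X[20:200], 2)
print("test accuracy:", (predict(X[200:], prior, theta) == y[200:]).mean())
```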

5.
Multi-dimensional classification assigns a data instance to classes along multiple dimensions based on its feature vector and has broad application prospects. When learning models for multi-dimensional classification, massive training data cause accurate classification algorithms to require very long training times. To improve the efficiency of multi-dimensional classification while maintaining high prediction accuracy, this paper proposes a Bayesian-network-based learning method for multi-dimensional classification. First, the multi-dimensional classification problem is formulated as a conditional probability distribution problem. Second, a conditional tree Bayesian network model is built according to the dependencies among the class vectors. Finally, the structure and parameters of the conditional tree Bayesian network are learned from the training set, and a multi-dimensional classification prediction algorithm is proposed. Extensive experiments on real datasets show that, compared with MMOC, the current state-of-the-art multi-dimensional classification algorithm, the proposed method reduces model training time by two orders of magnitude while maintaining high accuracy. The proposed method is therefore better suited to multi-dimensional classification of massive data.

6.
As more and more industrial control devices are exposed to the Internet, the security threats they face keep increasing; active defense has become a necessary means of protection, and honeypot technology is an effective active-defense technique. Since attackers aim to strike real asset devices, researchers have begun to study methods for identifying honeypots. Accurately identifying a honeypot involves many uncertain factors. Bayesian networks are designed to handle uncertainty, which matches the honeypot-identification problem. Based on the characteristics of honeypot identification and Bayesian networks, an industrial-control honeypot identification method is proposed in which the Bayesian network parameters are learned with the EM algorithm. First, the theoretical foundations of Bayesian networks and their advantages for honeypot identification are introduced; next, the algorithms used for parameter modeling and for predictive inference are described, completing the Bayesian network model for identifying honeypots; finally, comparison experiments against SVM, KNN, random forest, and naive Bayes verify that the model trained with the Bayesian-network EM algorithm performs better. The model uses the Bayesian junction-tree inference algorithm for prediction, which is verified through a case study. Experimental results show that the model trained with the EM algorithm is effective for identifying honeypots.
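
A small harness for the kind of baseline comparison mentioned above (SVM, KNN, random forest, naive Bayes); the Bayesian-network/EM model itself is not reproduced, and the features are synthetic placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

# Stand-in for honeypot fingerprint features (banner, port, latency statistics, ...).
X, y = make_classification(n_samples=1000, n_features=12, random_state=0)

baselines = {
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "RandomForest": RandomForestClassifier(random_state=0),
    "NaiveBayes": GaussianNB(),
}
for name, clf in baselines.items():
    print(name, cross_val_score(clf, X, y, cv=5).mean().round(3))
```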

7.
A gene regulatory network construction method based on an improved SEM algorithm (total citations: 1; self-citations: 0; citations by others: 1)
Dynamic Bayesian networks (DBN) are a powerful tool for modeling gene regulatory networks. The Bayesian structural expectation maximization (SEM) algorithm handles missing data well when constructing gene regulatory networks, but its learning results depend strongly on the initial parameter settings. To address this problem, an improved SEM algorithm is proposed: several candidate initial values are generated at random, the best of the parameters obtained after a single iteration is chosen as the model's initial parameter value, and the basic SEM algorithm is then run. Using Saccharomyces cerevisiae cell-cycle microarray expression data, a gene regulatory network is constructed and compared with the existing literature; the results show that the algorithm further improves the accuracy of regulatory network construction.
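
The initialization-selection idea (score several random starts after one EM iteration, keep the best, then run the full fit) can be illustrated with any EM-trained model; the sketch below uses a Gaussian mixture as a stand-in, since reproducing SEM for a DBN is beyond a short example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture

X, _ = load_iris(return_X_y=True)

best_seed, best_ll = None, -np.inf
for seed in range(10):                              # hypothetical candidate count
    # One EM iteration per candidate (a non-convergence warning is expected here).
    probe = GaussianMixture(n_components=3, init_params="random",
                            max_iter=1, random_state=seed).fit(X)
    ll = probe.score(X)                             # mean log-likelihood after 1 iteration
    if ll > best_ll:
        best_seed, best_ll = seed, ll

# Full EM run starting from the best-scoring initialization.
final = GaussianMixture(n_components=3, init_params="random",
                        max_iter=200, random_state=best_seed).fit(X)
print("chosen seed:", best_seed, "final log-likelihood:", round(final.score(X), 3))
```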

8.
Because the traditional Gaussian distribution is easily affected by extreme values and outlier noise in the data samples, the t distribution is used in place of the original Gaussian mixture model, and the Expectation Maximization (EM) algorithm is used to fit a t-distribution mixture model to network-flow data samples. To reduce the number of EM iterations, the t-distribution mixture model is improved; the effectiveness of the algorithm is verified both theoretically and experimentally, and the classification of network multimedia traffic flows is studied. Experiments show that the proposed algorithm achieves high classification accuracy, and the fitted model is better than the traditional K-means algorithm and the EM algorithm for the traditional Gaussian mixture model.
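
A sketch of EM for a mixture of multivariate t distributions with the degrees of freedom held fixed; the paper's iteration-reducing improvement is not reproduced, and the 2-D "flow features" below are synthetic.

```python
import numpy as np
from scipy.stats import multivariate_t

def t_mixture_em(X, k, nu=4.0, n_iter=50, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(n, k, replace=False)]          # random initial means
    sigma = np.array([np.cov(X.T) for _ in range(k)])
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities and latent precision weights u.
        dens = np.column_stack([multivariate_t(mu[j], sigma[j], df=nu).pdf(X)
                                for j in range(k)])
        resp = pi * dens
        resp /= resp.sum(1, keepdims=True)
        u = np.empty((n, k))
        for j in range(k):
            diff = X - mu[j]
            maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(sigma[j]), diff)
            u[:, j] = (nu + d) / (nu + maha)
        # M-step: weighted updates of mixing weights, means, and scale matrices.
        pi = resp.mean(0)
        for j in range(k):
            w = resp[:, j] * u[:, j]
            mu[j] = w @ X / w.sum()
            diff = X - mu[j]
            sigma[j] = (diff * w[:, None]).T @ diff / resp[:, j].sum()
    return pi, mu, sigma, resp

# Two clusters plus a few heavy outliers, which the t mixture tolerates better.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0], 1, (200, 2)),
               rng.normal([6, 6], 1, (200, 2)),
               rng.normal([3, 3], 8, (10, 2))])
pi, mu, sigma, resp = t_mixture_em(X, k=2)
print("mixing weights:", pi.round(3), "\nmeans:\n", mu.round(2))
```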

9.
To address the difficulty of obtaining class-labeled datasets when building dynamic Bayesian network (DBN) classification models, a semi-supervised active DBN learning algorithm based on EM and classification loss is proposed. The EM algorithm in semi-supervised learning can effectively use unlabeled samples to learn a DBN classification model, but erroneous sample-classification information is easily incorporated during the iterations, which harms the model's accuracy. Active learning based on classification loss is therefore brought into the EM learning process, so that useful unlabeled samples can be selected autonomously for the user to label; adding these samples to the training set reduces the model's uncertainty about classifying the unlabeled samples as much as possible. Experiments show that the algorithm significantly improves the efficiency and performance of the DBN learner and converges quickly to a predetermined classification accuracy.
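
A sketch of the query-selection step only, using predictive entropy as a stand-in for the paper's classification-loss criterion and GaussianNB as a stand-in for the DBN classifier.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
labeled = rng.choice(len(X), 10, replace=False)        # tiny labeled pool
unlabeled = np.setdiff1d(np.arange(len(X)), labeled)

# Train on the labeled pool, then rank unlabeled samples by predictive entropy.
model = GaussianNB().fit(X[labeled], y[labeled])
proba = model.predict_proba(X[unlabeled])
entropy = -(proba * np.log(proba + 1e-12)).sum(axis=1)

query = unlabeled[np.argsort(entropy)[-5:]]            # 5 most uncertain samples
print("indices to send to the annotator:", query)
```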

10.
To clean up the fraudulent scam messages brought by the Internet and mobile communication networks, text classification is used to identify telecom fraud messages. The Chinese word-segmentation tool jieba segments the Chinese text of the data samples, the TF-IDF algorithm extracts the features of telecom fraud messages, and a vector space model (VSM) represents the text content; the Bernoulli and multinomial variants of the naive Bayes classifier are then trained separately and compared in testing to evaluate how well each identifies telecom fraud messages.
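
A sketch of the jieba + TF-IDF + naive Bayes pipeline; the two example messages and their labels are made-up placeholders rather than the paper's corpus.

```python
import jieba
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["您已中奖,请点击链接领取奖金", "今晚七点开会,请准时参加"]   # placeholders
labels = [1, 0]                                      # 1 = fraud, 0 = normal

def tokenize(text):
    return jieba.lcut(text)                          # Chinese word segmentation

for nb in (BernoulliNB(), MultinomialNB()):
    clf = make_pipeline(TfidfVectorizer(tokenizer=tokenize, token_pattern=None),
                        nb)
    clf.fit(texts, labels)
    print(type(nb).__name__, clf.predict(["恭喜获得百万奖金,点击领取"]))
```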

11.
It sometimes happens (for instance in case-control studies) that a classifier is trained on a data set that does not reflect the true a priori probabilities of the target classes on real-world data. This may have a negative effect on the classification accuracy obtained on the real-world data set, especially when the classifier's decisions are based on the a posteriori probabilities of class membership. Indeed, in this case, the trained classifier provides estimates of the a posteriori probabilities that are not valid for this real-world data set (they rely on the a priori probabilities of the training set). Applying the classifier as is (without correcting its outputs with respect to these new conditions) on this new data set may thus be suboptimal. In this note, we present a simple iterative procedure for adjusting the outputs of the trained classifier with respect to these new a priori probabilities without having to refit the model, even when these probabilities are not known in advance. As a by-product, estimates of the new a priori probabilities are also obtained. This iterative algorithm is a straightforward instance of the expectation-maximization (EM) algorithm and is shown to maximize the likelihood of the new data. Thereafter, we discuss a statistical test that can be applied to decide whether the a priori class probabilities have changed from the training set to the real-world data. The procedure is illustrated on different classification problems involving a multilayer neural network, and comparisons with a standard procedure for a priori probability estimation are provided. Our original method, based on the EM algorithm, is shown to be superior to the standard one for a priori probability estimation. Experimental results also indicate that the classifier with adjusted outputs always performs better than the original one in terms of classification accuracy when the a priori probability conditions differ from the training set to the real-world data. The gain in classification accuracy can be significant.
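
A minimal numpy sketch of the iterative output-adjustment procedure described above (the statistical test is omitted); the toy data are synthetic and the convergence tolerance is an arbitrary choice.

```python
import numpy as np
from scipy.stats import norm

def adjust_priors(posteriors, train_priors, n_iter=100, tol=1e-8):
    """posteriors: (n_samples, n_classes) outputs of the trained classifier on the
    new, unlabeled data; train_priors: (n_classes,) class priors of the training set."""
    new_priors = train_priors.copy()
    for _ in range(n_iter):
        # E-step: re-weight posteriors by the ratio of current to training priors.
        adjusted = posteriors * (new_priors / train_priors)
        adjusted /= adjusted.sum(axis=1, keepdims=True)
        # M-step: re-estimate the new priors from the adjusted posteriors.
        updated = adjusted.mean(axis=0)
        if np.abs(updated - new_priors).max() < tol:
            new_priors = updated
            break
        new_priors = updated
    return new_priors, adjusted

# Toy check: 1-D Gaussian classes, posteriors computed under (wrong) balanced priors.
rng = np.random.default_rng(0)
labels = rng.random(5000) < 0.2                      # true new priors ~ [0.8, 0.2]
x = np.where(labels, rng.normal(2.0, 1.0, 5000), rng.normal(0.0, 1.0, 5000))
lik = np.column_stack([norm.pdf(x, 0.0, 1.0), norm.pdf(x, 2.0, 1.0)])
posteriors = lik * 0.5
posteriors /= posteriors.sum(1, keepdims=True)
new_priors, _ = adjust_priors(posteriors, train_priors=np.array([0.5, 0.5]))
print("estimated new priors:", new_priors.round(3))  # expect roughly [0.8, 0.2]
```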

12.
It is a practical and challenging problem to learn cost-sensitive models from datasets that have few labeled data and plentiful unlabeled data, because labeled data are sometimes very difficult, time-consuming, and/or expensive to obtain. To solve this problem, in this paper we propose two classification strategies, based on Expectation Maximization (EM), for learning cost-sensitive classifiers from training datasets with both labeled and unlabeled data. The first method, Direct-EM, uses EM to build a semi-supervised classifier and then directly computes the optimal class label for each test example using the class probabilities produced by the learned model. The second method, CS-EM, modifies EM by incorporating misclassification cost into the probability estimation process. We conducted extensive experiments to evaluate their efficiency, and the results show that when using only a small number of labeled training examples, CS-EM outperforms the other competing methods on the majority of the selected UCI data sets across different cost ratios, especially when the cost ratio is high.
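
A short sketch of the cost-sensitive decision step that Direct-EM is described as using: for each example, pick the label with minimum expected misclassification cost under the model's class probabilities. The cost matrix here is a made-up example.

```python
import numpy as np

def min_expected_cost(proba, cost):
    """proba: (n_samples, n_classes); cost[i, j] = cost of predicting j when the
    true class is i. Returns the cost-minimizing label for each sample."""
    expected = proba @ cost          # expected cost of each candidate prediction
    return expected.argmin(axis=1)

proba = np.array([[0.7, 0.3],
                  [0.4, 0.6]])
cost = np.array([[0, 1],             # true class 0: false positive costs 1
                 [10, 0]])           # true class 1: false negative costs 10
print(min_expected_cost(proba, cost))   # high FN cost pushes both toward class 1
```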

13.
To address the problem that methods based on fixed-order Markov chain models cannot fully exploit the structural features of subsequences of different orders, a new Bayesian classification method for symbolic sequences based on multi-order Markov models is proposed. First, a conditional probability distribution model based on multi-order Markov models is established. Second, an n-order subsequence suffix-tree structure with an attached suffix table and an efficient tree-construction algorithm are proposed; the algorithm can build the multi-order conditional probability model in a single scan of the sequence set. Finally, a Bayesian classifier for symbolic sequences is proposed: its training algorithm learns the weights of the models of different orders by maximum likelihood, and its classification algorithm uses the weighted conditional probabilities of all orders for Bayesian prediction. A series of experiments on real sequence sets from three application domains shows that the new classifier is insensitive to changes in model order; compared with existing methods such as support vector machines built on fixed-order models, the proposed method improves classification accuracy by more than 40% on gene and speech sequences, and it can also output a reference value for the optimal order of the symbolic sequences' Markov model.
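
A compact sketch of a Bayesian classifier that mixes Markov models of several orders; the suffix-tree construction and the maximum-likelihood learning of the order weights are not reproduced (the orders and weights below are fixed, illustrative choices), and the training sequences are toy data.

```python
from collections import defaultdict
import math

ORDERS = (1, 2, 3)
WEIGHTS = {1: 0.2, 2: 0.3, 3: 0.5}          # assumed weights, not learned

def train(sequences_by_class, alpha=1.0):
    models = {}
    for c, seqs in sequences_by_class.items():
        # counts[k][context][symbol] = how often `symbol` follows the k-length context.
        counts = {k: defaultdict(lambda: defaultdict(float)) for k in ORDERS}
        for s in seqs:
            for k in ORDERS:
                for i in range(k, len(s)):
                    counts[k][s[i - k:i]][s[i]] += 1
        models[c] = (counts, alpha)
    return models

def log_likelihood(models, c, s, n_symbols):
    counts, alpha = models[c]
    total = 0.0
    for k in ORDERS:
        ll = 0.0
        for i in range(k, len(s)):
            ctx = counts[k][s[i - k:i]]
            ll += math.log((ctx[s[i]] + alpha) /
                           (sum(ctx.values()) + alpha * n_symbols))
        total += WEIGHTS[k] * ll                 # weighted sum over orders
    return total

def classify(models, s, n_symbols):
    # Equal class priors assumed, so prediction is by weighted log-likelihood only.
    return max(models, key=lambda c: log_likelihood(models, c, s, n_symbols))

train_data = {"geneA": ["ACGTACGTAC", "ACGTTACGTA"],
              "geneB": ["GGTTGGTTGG", "GGTTTGGTTG"]}
models = train(train_data)
print(classify(models, "ACGTACG", n_symbols=4))   # expected: geneA
```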

14.
Estimation of the extent and spread of wildland fires is an important application of high-spatial-resolution multispectral images. This work presents a fuzzy segmentation algorithm that maps fire extent, the active fire front, hot burn scars, and smoke regions based on a statistical model. The fuzzy results are useful data sources for integrated fire behavior and propagation models built using Dynamic Data Driven Applications Systems (DDDAS) concepts, which use data assimilation techniques that require error estimates or probabilities for the data parameters. The Hidden Markov Random Field (HMRF) model has been used widely in image segmentation, but it assumes that each pixel has a particular class label belonging to a prescribed finite set. The mixed-pixel problem can be addressed by modeling the fuzzy membership process as a continuous Multivariate Gaussian Markov Random Field. Techniques for estimating the class membership and model parameters are discussed. Experimental results obtained by applying this technique to two Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) images show that the proposed methodology is robust with regard to noise and to variation in fire characteristics as well as background. The segmentation results of our algorithm are compared with the results of a K-means algorithm, an Expectation Maximization (EM) algorithm (which is very similar to the Fuzzy C-Means clustering algorithm with entropy regularization), and an MRF-MAP algorithm. Our fuzzy algorithm achieves more consistent segmentation results than the comparison algorithms on these test images, with the added advantage of simultaneously providing the proportion or error map needed for the data assimilation problem.

15.
Because the binary relevance (BR) method ignores the correlations between labels, it tends to make the classifier output labels that do not appear, or appear only rarely, in the training set. To address this shortcoming, a multi-label classification algorithm based on a Bayesian model (MLBM) and a Markov-type multi-label classification algorithm (MMLBM) are proposed. First, a simulation model is built to analyze the shortcomings of the BR algorithm; since the value of a label should be determined jointly by the attribute confidence and the label confidence, MLBM is proposed, in which the attribute confidence is computed with a traditional classification algorithm and the label confidence is obtained from the training set. Then, because MLBM must consider all already-classified labels when computing the attribute confidence, and the classifier's performance is therefore easily affected by irrelevant or weakly related labels, a Markov model is used to simplify the confidence computation, yielding MMLBM. Theoretical analysis and simulation experiments show that, compared with BR, MMLBM improves average classification accuracy by about 4.8% on the emotions dataset, about 9.8% on the yeast dataset, and about 7.3% on the flags dataset. The experimental results show that when the label cardinality of the instances in a dataset is large, MMLBM offers a considerable improvement in accuracy over BR.
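
For contrast with binary relevance, the sketch below compares independent per-label classifiers with a classifier chain that feeds earlier label predictions forward as features; this only illustrates the label-dependency idea and is not the paper's MLBM/MMLBM.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier, ClassifierChain
from sklearn.metrics import f1_score

X, Y = make_multilabel_classification(n_samples=2000, n_labels=3, n_classes=6,
                                       random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

# Binary relevance: one independent classifier per label.
br = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X_tr, Y_tr)
# Classifier chain: each label's classifier also sees the previous labels.
chain = ClassifierChain(LogisticRegression(max_iter=1000),
                        random_state=0).fit(X_tr, Y_tr)

for name, model in [("binary relevance", br), ("classifier chain", chain)]:
    score = f1_score(Y_te, model.predict(X_te), average="micro")
    print(name, "micro-F1 =", round(score, 3))
```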

16.
K nearest neighbor and Bayesian methods are effective machine learning methods, and expectation maximization is an effective Bayesian classifier. In this work a data-elimination approach is proposed to improve data clustering. The proposed method is based on hybridizing the k nearest neighbor and expectation maximization algorithms: the k nearest neighbor algorithm acts as a preprocessor for the expectation maximization algorithm, reducing the amount of training data that makes learning difficult. The suggested method is tested on the well-known machine learning data sets iris, wine, breast cancer, glass, and yeast. Simulations are done in the MATLAB environment, and performance results are reported.
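
A sketch of a KNN-based data-elimination step before EM clustering; the elimination rule used here (drop points whose cross-validated KNN prediction disagrees with their label) is an assumption, since the abstract does not specify it.

```python
from sklearn.datasets import load_wine
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import adjusted_rand_score

X, y = load_wine(return_X_y=True)

# KNN as preprocessor: keep only samples its cross-validated prediction gets right.
keep = cross_val_predict(KNeighborsClassifier(n_neighbors=5), X, y, cv=5) == y

# EM clustering (Gaussian mixture) trained on the reduced data, evaluated on all of it.
gmm = GaussianMixture(n_components=3, random_state=0).fit(X[keep])
print("kept", keep.sum(), "of", len(X), "samples;",
      "ARI =", round(adjusted_rand_score(y, gmm.predict(X)), 3))
```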

