Similar literature
Found 20 similar documents; search took 187 ms
1.
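Research on Chinese question classification based on an incremental Bayesian model   Total citations: 1, self-citations: 0, citations by others: 1
A classifier generated from a fixed training set performs poorly and cannot track user needs. To address this, a method is proposed that applies the incremental Bayesian approach to question classification. A genetic algorithm selects the optimal feature subset to optimize the classifier, which avoids excessive redundancy among training-set features and lets the classifier dynamically expand the training set and revise its parameters during learning. When classifying a question, the question's interrogative word, syntactic structure, interrogative intent word, and the first sememe of that intent word in HowNet are extracted as classification features. To verify the effectiveness of the incremental Bayesian method, question sets of different sizes are randomly drawn from the corpus to form incremental sets, and the questions in a single test set are classified based on each incremental set. Experimental results show that the incremental Bayesian classifier achieves higher classification accuracy than the naive Bayes classifier, reaching 90.2% on coarse classes and 76.3% on fine classes, and that it improves running efficiency while raising accuracy.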
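
The incremental idea in the abstract above, where the classifier enlarges its training set and revises its parameters as it learns, can be illustrated with a count-based naive Bayes model. This is only a minimal sketch under stated assumptions: the class name `IncrementalNaiveBayes`, the Laplace smoothing constant `alpha`, and the string features are hypothetical, and the genetic-algorithm feature selection described in the abstract is not shown.

```python
import math
from collections import defaultdict


class IncrementalNaiveBayes:
    """Multinomial naive Bayes whose counts are updated one sample at a time.

    Hypothetical illustration; not the implementation from the cited paper.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # Laplace smoothing (assumed)
        self.class_counts = defaultdict(int)    # samples seen per class
        self.feature_counts = defaultdict(lambda: defaultdict(int))
        self.total_features = defaultdict(int)  # feature tokens seen per class
        self.vocabulary = set()

    def update(self, features, label):
        """Fold one labeled sample into the model without retraining."""
        self.class_counts[label] += 1
        for f in features:
            self.feature_counts[label][f] += 1
            self.total_features[label] += 1
            self.vocabulary.add(f)

    def predict(self, features):
        """Return the class with the highest smoothed log-posterior."""
        total = sum(self.class_counts.values())
        v = max(len(self.vocabulary), 1)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)
            denom = self.total_features[label] + self.alpha * v
            for f in features:
                count = self.feature_counts[label].get(f, 0)
                score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Because the model stores only counts, each new labeled question costs time proportional to its number of features, which is one plausible source of the efficiency gain the abstract reports.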

2.
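When processing large-scale data, the proximal support vector machine and its incremental version (ISVM) are simpler and more effective classifiers than the traditional support vector machine. On high-dimensional data, however, ISVM updates its model parameters by computing a matrix inverse, so its computational performance leaves room for improvement. To address this, the paper proposes an incremental method based on least squares. Through identity transformations of the matrix operations, the method turns matrix inversion into division and yields a simple update formula for the model parameters. It therefore matches the prediction accuracy of ISVM while running more efficiently on high-dimensional data. Experiments on synthetic data and on image and biological data show that the incremental method outperforms ISVM.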
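
One standard identity that replaces a matrix inversion with a scalar division is the Sherman-Morrison formula, which underlies recursive least squares. Whether the paper uses exactly this identity is an assumption; the sketch below, with the hypothetical function name `rls_update`, only illustrates the general idea for a single new sample.

```python
def rls_update(P, w, x, y):
    """One recursive least-squares step for the sample (x, y).

    P is the current inverse of (lambda * I + sum of x x^T), assumed symmetric,
    and w is the current weight vector. Only a scalar division is needed.
    Hypothetical illustration; not the update formula from the cited paper.
    """
    n = len(x)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = 1.0 + sum(x[i] * Px[i] for i in range(n))  # scalar 1 + x^T P x
    # Sherman-Morrison: P_new = P - (P x)(P x)^T / (1 + x^T P x)
    P_new = [[P[i][j] - Px[i] * Px[j] / denom for j in range(n)]
             for i in range(n)]
    error = y - sum(w[i] * x[i] for i in range(n))     # prediction residual
    gain = [Px[i] / denom for i in range(n)]           # equals P_new x
    w_new = [w[i] + gain[i] * error for i in range(n)]
    return P_new, w_new
```

Starting from `P = (1 / lambda) * I` and `w = 0`, repeated calls reproduce the ridge-regularized least-squares solution without ever inverting a matrix explicitly.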

3.
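A zero-order TSK fuzzy system for imbalanced data classification   Total citations: 3, self-citations: 0, citations by others: 3
Gu Xiaoqing, Jiang Yizhang, Wang Shitong. Acta Automatica Sinica (《自动化学报》), 2017, 43(10): 1773-1788
When classifying imbalanced data, traditional fuzzy systems recognize minority-class samples poorly. To address this, first, for antecedent parameter learning, a Bayesian fuzzy clustering algorithm based on competitive learning (BFCCL) is proposed. BFCCL accounts for the repulsion between cluster centers of different classes, runs in an alternating iterative manner, and obtains the optimal model parameters with the Markov chain Monte Carlo method. Second, for consequent parameter learning, a large-margin strategy with parameter tuning makes the distance from the minority class to the decision surface larger than that from the majority class, which effectively corrects the shift of the decision surface. Based on these ideas, and taking the zero-order TSK fuzzy system as the concrete object of study, a zero-order TSK fuzzy system suited to imbalanced data classification (0-TSK-IDC) is constructed. Experiments on synthetic and real medical datasets show that 0-TSK-IDC achieves high recognition rates for both the minority and majority classes, with good robustness and interpretability.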

4.
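To detect road traffic congestion effectively, a congestion identification method based on an incremental Bayesian classifier is proposed. The method treats whether congestion occurs as a special classification problem. An incremental Bayesian classifier is trained on historical detection data, using the traffic parameters observed with and without congestion as feature parameters. The resulting classifier then classifies newly detected traffic parameters to decide whether congestion is occurring. Microscopic traffic simulation data demonstrate the feasibility and effectiveness of the method.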

5.
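The Cascade ensemble classifier is improved using the Learn++ idea, yielding a Cascade-based incremental ensemble classification algorithm that is applied to liver image classification. Experimental results show that, compared with the original ensemble classifier, the incremental ensemble method effectively improves learning efficiency on newly added samples while maintaining classification accuracy.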

6.
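When training samples are incomplete and a Bayesian classifier learns incrementally from a new training set with unknown class labels, misclassified training samples are added to the classifier too early and degrade its performance. In addition, a fixed confidence evaluation parameter makes incremental learning inefficient and its generalization unstable. To solve these problems, a sequence-selection incremental learning method with dynamic confidence is proposed. First, texts that the existing classifier classifies correctly are selected to form a new training subset. Second, confidence is used to monitor classifier performance dynamically and to perform batch instance selection on the new training subset. Finally, a reasonable learning sequence is chosen to strengthen the positive influence of complete data and weaken the negative influence of noisy data, and the test texts are classified. Experimental results show that the proposed method effectively improves classification accuracy and also markedly improves incremental learning efficiency.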

7.
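Zhou Ta, Deng Zhaohong, Jiang Yizhang, Wang Shitong. Journal of Software (《软件学报》), 2020, 31(11): 3506-3518
By reconstructing the training sample space, a multi-training-module Takagi-Sugeno-Kang (TSK) fuzzy classifier, H-TSK-FS, is proposed. It offers good classification performance and high interpretability, and it resolves the difficulty of interpreting intermediate-layer outputs and fuzzy rules in existing hierarchical fuzzy classifiers. To achieve good classification performance, H-TSK-FS consists of multiple optimized zero-order TSK fuzzy classifiers, which are trained with a purpose-built scheme. The original training samples, some sample points from the previous layer's training samples, and the part of the decision information from all trained layers that best approximates the true values are all projected into the current layer's training module and form its input space. With this scheme, the training results of earlier layers guide and control the training of later layers. Randomly selecting sample points, and randomly selecting training features within a certain range, opens up the manifold structure of the original input space and ensures better or comparable classification performance. The study mainly targets datasets with few sample points and a moderate number of training features. In each training module, an extreme learning machine obtains the consequent parameters of the fuzzy rules. Each intermediate training layer expresses knowledge with short rules. Each fuzzy rule determines a non-fixed set of input features and Gaussian membership functions through constraints, so that the selected input features are highly interpretable. Experiments on real datasets and application cases show that H-TSK-FS has good classification performance and high interpretability.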

8.
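k-nearest-neighbor (kNN) classification is a simple and effective nonparametric classification algorithm, but it has drawbacks: its parameters must be set manually, it builds no explicit classification model and therefore needs large storage and classifies slowly, and it is vulnerable to the curse of dimensionality. To address these drawbacks, an efficient new nearest-neighbor classification method is proposed and two new nearest-neighbor classifiers are constructed. The new method uses an optimized set of cluster prototypes produced by K-means clustering as the classification model, which reduces storage and improves classification efficiency. Three class-overlap analysis strategies are proposed and a fuzzy baseline measure is introduced to mitigate the curse of dimensionality. Building on this model-learning method, a new kNN classifier and a new classifier combined with naive Bayes are proposed; all parameters involved are determined automatically. Experiments on synthetic and real-world datasets show that the new classifiers achieve good classification efficiency and accuracy.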

9.
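Hierarchical Takagi-Sugeno-Kang (TSK) fuzzy classifiers are poorly interpretable, and Boosting fuzzy classifiers must retrain all TSK fuzzy sub-classifiers whenever one sub-classifier is added or removed. To address these problems, a highly interpretable parallel ensemble TSK fuzzy classifier, EP-Q-TSK, is proposed. Each TSK fuzzy sub-classifier in the ensemble can be built quickly and in parallel using the least learning machine (LLM). As a new form of ensemble learning, the classifier uses the incremental output of each TSK fuzzy sub-classifier to extend the original validation data space, then applies the classical fuzzy clustering algorithm FCM to obtain a set of representative centers, and finally classifies test data with KNN. On standard UCI datasets, the effectiveness of EP-Q-TSK is verified in terms of both classification performance and interpretability.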

10.
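The key to recognizing epileptic EEG signals lies in obtaining effective features and building an interpretable classifier. To this end, a TSK fuzzy classifier based on enhanced deep features (ED-TSK-FC) is proposed. First, ED-TSK-FC uses a one-dimensional convolutional neural network (1D-CNN) to automatically extract deep features and latent class information from epileptic EEG signals, and merges the two into enhanced deep features. Second, the enhanced deep features serve as training variables for both the antecedent and consequent parts of ED-TSK-FC's fuzzy rules, which ensures that the deep features of the original input and their latent meaning both appear in the fuzzy rules and allows the enhanced deep features to be interpreted well. Third, a ridge-regression extreme learning algorithm solves for the consequent parameters of the fuzzy rules quickly; this inexpensive training method shortens training time without significantly reducing classification accuracy. Finally, on the Bonn epilepsy dataset, the advantages of ED-TSK-FC are verified in terms of classification performance, learning efficiency, and interpretability.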

11.
Incremental learning has been used extensively for data stream classification. Most work on data stream classification has focused on non-evolutionary methods. In this paper, we introduce new incremental learning algorithms based on harmony search. We first propose a new algorithm for classifying batch data, called the harmony-based classifier, and then give its incremental version for classifying data streams, called the incremental harmony-based classifier. Finally, we improve it to reduce its computational overhead in the absence of drift and to increase its robustness in the presence of noise. This improved version is called the improved incremental harmony-based classifier. The proposed methods are evaluated on real-world and synthetic data sets. Experimental results show that the proposed batch classifier outperforms some batch classifiers and that the proposed incremental methods can effectively address the issues usually encountered in data stream environments. The improved incremental harmony-based classifier is significantly faster and more accurate at capturing concept drift than the non-incremental harmony-based method, and its accuracy is comparable to that of non-evolutionary algorithms. The experimental results also show the robustness of the improved incremental harmony-based classifier.

12.
In this paper, we investigate a comprehensive learning algorithm, based on incremental learning, for text classification without a pre-labeled training set. To overcome the high cost of obtaining labeled training examples, the approach modifies fuzzy partition clustering to obtain a small quantity of labeled training data. Incremental learning of a Bayesian classifier is then applied. The proposed classifier is composed of a Naïve-Bayes-based incremental learning algorithm and a modified fuzzy partition clustering method. For improved efficiency, a feature reduction step is designed based on the Quadratic Entropy in Mutual Information. We perform experiments to demonstrate the performance of the approach, and the results show that it is feasible and effective.

13.
Concept drift is a challenging problem for the machine learning and data mining community that frequently appears in real-world stream classification problems. It is usually defined as an unforeseeable change in the concept of the target variable in a prediction task. In this paper, we focus on the problem of recurring contexts, a special sub-type of concept drift that has not yet received proper attention from the research community. In the case of recurring contexts, concepts may reappear in the future, and thus older classification models might be beneficial for future classifications. We propose a general framework for classifying data streams that exploits stream clustering to dynamically build and update an ensemble of incremental classifiers. To achieve this, a transformation function that maps batches of examples into a new conceptual representation model is proposed. The clustering algorithm is then applied to group batches of examples into concepts and identify recurring contexts. The ensemble is produced by creating and maintaining an incremental classifier for every concept discovered in the data stream. An experimental study is performed using (a) two new real-world concept-drifting datasets from the email domain, (b) an instantiation of the proposed framework and (c) five methods for dealing with drifting concepts. Results indicate the effectiveness of the proposed representation and the suitability of the concept-specific classifiers for problems with recurring contexts.

14.
An effective neuro-fuzzy paradigm for machinery condition health monitoring   Total citations: 2, self-citations: 0, citations by others: 2
An innovative neuro-fuzzy network appropriate for fault detection and classification in a machinery condition health monitoring environment is proposed. The network, called an incremental learning fuzzy neural (ILFN) network, uses localized neurons to represent the distributions of the input space and is trained using a one-pass, on-line, incremental learning algorithm that is fast and can operate in real time. The ILFN network employs a hybrid supervised and unsupervised learning scheme to generate its prototypes. The network is a self-organized structure with the ability to adaptively learn new classes of failure modes and update its parameters continuously while monitoring a system. To demonstrate the feasibility and effectiveness of the proposed network, numerical simulations have been performed using well-known benchmark data sets, such as Fisher's Iris data and the Deterding vowel data set. In comparison studies with other well-known classifiers, the ILFN network was found to be competitive with or even superior to many existing classifiers. The ILFN network was applied to the vibration data known as the Westland data set, collected from a U.S. Navy CH-46E helicopter test stand, to assess its efficiency in machinery condition health monitoring. Using a simple fast Fourier transform (FFT) technique for feature extraction, the ILFN network has shown promising results. With various torque levels used for training the network, 100% correct classification was achieved for the same torque levels of the test data.

15.
We propose a two-layer decision fusion technique, called Fuzzy Stacked Generalization (FSG), which establishes a hierarchical distance learning architecture. At the base layer of an FSG, fuzzy k-NN classifiers receive different feature sets, each extracted from the same dataset, to gain multiple views of the dataset. At the meta layer, a fusion space is first constructed by aggregating the decision spaces of all the base-layer classifiers. Then, a fuzzy k-NN classifier is trained in the fusion space by minimizing the difference between the large-sample and N-sample classification error. To measure the degree of collaboration among the base-layer classifiers and the diversity of the feature spaces, a new measure, called shareability, is introduced. Shareability is defined as the number of samples that are correctly classified by at least one of the base-layer classifiers in FSG. In the experiments, we observe that FSG performs better than popular distance learning and ensemble learning algorithms when the shareability measure is large enough that most of the samples are correctly classified by at least one of the base-layer classifiers. The relationship between the proposed and state-of-the-art diversity measures is analyzed experimentally. Tests performed on a variety of artificial and real-world benchmark datasets show that the classification performance of FSG increases relative to that of state-of-the-art ensemble learning and distance learning methods as the number of classes increases.
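
The shareability measure defined in the abstract above is a simple count, so it can be written down directly. The following is a minimal sketch; the function name `shareability` and the list-of-lists input layout are assumptions made for illustration.

```python
def shareability(base_predictions, true_labels):
    """Count samples that at least one base classifier labels correctly.

    base_predictions: one list of predicted labels per base-layer classifier.
    true_labels: ground-truth labels, same length as each prediction list.
    """
    return sum(
        any(preds[i] == label for preds in base_predictions)
        for i, label in enumerate(true_labels)
    )
```

For example, with truth `["a", "b", "a", "c"]` and two base classifiers predicting `["a", "a", "b", "b"]` and `["b", "b", "b", "a"]`, only the first two samples are covered by some classifier, so the measure is 2.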

16.
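Fan Ying, Ji Hua, Zhang Huaxiang. Journal of Computer Applications (《计算机应用》), 2008, 28(5): 1204-1207
A new ensemble classifier algorithm based on fuzzy clustering is proposed. The algorithm uses fuzzy clustering to derive the distribution characteristics of the training samples and assigns each sample a weight that determines its probability of being sampled. The classifier trained on the sampled data is then used to adjust the sampling probabilities of the training set, and new classifiers are generated in turn until a given accuracy is reached. The algorithm was tested on several standard UCI datasets and compared with Bagging and AdaBoost. The results show that the new algorithm is more robust and achieves higher classification accuracy.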

17.
Applied Soft Computing, 2008, 8(1): 543-554
This paper presents a hybrid neural network classifier that combines fuzzy ARTMAP (FAM) and the dynamic decay adjustment (DDA) algorithm. The proposed FAMDDA model is a conflict-resolving classifier that can perform stable, incremental learning while resolving overlaps between hyper-rectangular prototypes of different classes to minimize misclassification rates. The performance of FAMDDA is evaluated on a number of benchmark data sets. The results are analyzed and compared with those from FAM and a number of machine learning classifiers. The outcomes show that FAMDDA generalizes better than FAM and that its performance is comparable with that of other classifiers. The effectiveness of FAMDDA is also demonstrated in an application to condition monitoring of a circulating water system in a power generation station. Implications for the effectiveness of FAMDDA from the application point of view are discussed.

18.
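Introducing ensemble learning into incremental learning can markedly improve learning performance. Most recent work on ensemble-based incremental learning combines multiple homogeneous classifiers by weighted voting and does not resolve the stability-plasticity dilemma of incremental learning well. To address this, an incremental learning algorithm based on an ensemble of heterogeneous classifiers is proposed. During training, to make the model more stable, multiple base classifiers are trained on the new data and added to the heterogeneous ensemble, while a locality-sensitive hash table stores a sketch of the data for later nearest-neighbor lookup of test samples. To adapt to changing data, newly acquired data are also used to update the voting weights of the base classifiers in the ensemble. When predicting the class of a test sample, the data in the locality-sensitive hash table that are similar to the test sample serve as a bridge for computing each base classifier's dynamic weight for that sample, and the voting weights and dynamic weights of the base classifiers are combined to decide the sample's class. Comparative experiments show that the incremental algorithm has relatively high stability and generalization ability.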

19.
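The classical fuzzy c-means clustering algorithm clusters poorly on datasets with non-spherical or ellipsoidal distributions. The Euclidean distance in classical fuzzy c-means is replaced with the Mahalanobis distance, whose advantages are carried over into incremental learning, and a fuzzy incremental clustering learning algorithm based on the Mahalanobis distance is proposed. Experimental results show that the algorithm addresses this shortcoming of fuzzy clustering fairly effectively and improves training accuracy.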
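
The distance substitution described in the abstract above can be illustrated with a small helper that computes the Mahalanobis distance from a point to a cluster center. This is a sketch under stated assumptions: the function name `mahalanobis` is hypothetical, and the inverse covariance matrix `cov_inv` is taken as given rather than estimated from the cluster as a full algorithm would do.

```python
import math


def mahalanobis(x, center, cov_inv):
    """Mahalanobis distance sqrt((x - c)^T S^{-1} (x - c)), given S^{-1}.

    With cov_inv equal to the identity matrix this reduces to the
    Euclidean distance used by classical fuzzy c-means.
    """
    d = [xi - ci for xi, ci in zip(x, center)]
    n = len(d)
    q = sum(d[i] * cov_inv[i][j] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(q)
```

A cluster stretched along the first axis (variance 4 there, variance 1 on the second axis) has `cov_inv = [[0.25, 0.0], [0.0, 1.0]]`, so the point `[2.0, 0.0]` is at distance 1 from the origin rather than 2, which is how the metric accommodates elongated clusters.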

20.
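Liu Bo, Pan Jiuhui. Computer Engineering (《计算机工程》), 2008, 34(19): 187-188
Maintaining a data mining model requires repeatedly recomputing over the dataset, which is inefficient. To address this, and based on the idea of ensemble learning, a method for generating weak classifiers from incremental datasets is studied, a new combined classification algorithm is proposed based on the dissimilarity between the classifiers of the incremental datasets, and the error rate of the combined classifier is analyzed. Experimental results show that the classification method is effective.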
