Similar literature
 Found 20 similar documents; search took 187 ms
1.
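To address the large number of variables in distillation processes and the strong coupling and high nonlinearity among them, a multi-model soft-sensor modeling method for the naphtha dry point, based on PSO-K-means clustering, is proposed. The factors influencing the naphtha dry point are analyzed and the relevant auxiliary variables are collected. A PSO-improved K-means clustering algorithm partitions the field data into sample subsets. Each subset is then used to train an SVM, yielding a prediction sub-model for the naphtha dry point. On this basis, the Euclidean distance between a sample to be predicted and the cluster center of each sub-model's training samples is computed, and model switching selects the prediction model. Simulation results show that the method avoids the tendency of K-means to fall into local extrema during partitioning, predicts the naphtha dry point of an atmospheric tower effectively, and achieves better accuracy and generalization than a single global model.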
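The model-switching step described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the cluster centers and the two stand-in sub-models are hypothetical (the paper trains one SVM per cluster), and the function names `nearest_center_index` and `predict_with_switching` are not from the paper.

```python
import math

def nearest_center_index(sample, centers):
    """Return the index of the cluster center closest to `sample` (Euclidean distance)."""
    distances = [math.dist(sample, center) for center in centers]
    return distances.index(min(distances))

def predict_with_switching(sample, centers, sub_models):
    """Route `sample` to the sub-model whose training cluster center is nearest."""
    return sub_models[nearest_center_index(sample, centers)](sample)

# Hypothetical cluster centers and stand-in sub-models; in the paper each
# sub-model would be an SVM trained on one cluster of field data.
centers = [(0.0, 0.0), (10.0, 10.0)]
sub_models = [lambda x: sum(x) * 0.5, lambda x: sum(x) * 2.0]

prediction = predict_with_switching((9.0, 9.0), centers, sub_models)
```

Here the sample `(9.0, 9.0)` lies nearer the second center, so the second sub-model produces the prediction.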

2.
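To reduce the huge computational cost of kernel principal component analysis (KPCA) on large training sets, and building on a KPCA algorithm that partitions the training set into subsets, this work proposes partitioning the subsets by kernel clustering and selecting eigenvectors according to the cumulative contribution rate of the eigenvalues of each subset's covariance matrix. Tests on artificial and real data sets show that, for the same cumulative contribution rate and a given number of subsets, kernel-clustering partitioning always yields a smaller kernel matrix; a smaller kernel matrix speeds up feature extraction for test samples and lowers the time complexity of the eigendecomposition of the kernel matrix.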
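The selection criterion mentioned here, the cumulative contribution rate of the eigenvalues, can be illustrated with a short sketch. It assumes the eigenvalues of a subset's covariance matrix are already available and have a positive sum; the function name and the example values are hypothetical.

```python
def components_for_contribution(eigenvalues, threshold):
    """Return how many leading eigenvalues are needed for their share of the
    total to reach `threshold` (the cumulative contribution rate).

    Assumes non-negative eigenvalues with a positive sum.
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return count
    return len(ordered)

# Hypothetical eigenvalues; cumulative shares are 0.5, 0.8, 0.95, 1.0.
example_eigenvalues = [5.0, 3.0, 1.5, 0.5]
kept = components_for_contribution(example_eigenvalues, 0.9)
```

With these values, three eigenvectors are kept at a 0.9 threshold, since the first two account for only 0.8 of the total.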

3.
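This paper proposes a method for constructing the training sample set in an intrusion detection system. Samples are first selected by keeping boundary samples and deleting interior samples; virtual samples are then constructed for classes with few samples by combining a genetic algorithm with agglomerative clustering. The resulting training subset is small and evenly distributed.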

4.
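An improved KNN algorithm for text classification   Total citations: 1, self-citations: 1, citations by others: 1
A sensitivity method is used to correct the weights of text features in the distance formula. A pruning method for the training sample base, based on the CURE and Tabu algorithms, is also proposed: CURE clustering yields representative samples of each cluster, which form a new training set, and the Tabu algorithm then maintains this set further by adding or deleting samples. Only samples at the boundaries between different classes are considered for addition, and samples are added or deleted so as to maximize classification accuracy and minimize the distance to the original training sample base.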

5.
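A method for constructing the training sample set in an intrusion detection system   Total citations: 2, self-citations: 0, citations by others: 2
Zhang Li, Chen Gonghe. 《计算机工程与应用》 (Computer Engineering and Applications), 2006, 42(28): 145-146, 180
Taking classifier design in an intrusion detection system as an example, this work studies how to construct a classifier's training samples. It proposes a method for building a training subset suited to unevenly distributed, massive data sets: for classes with many samples, samples are selected by keeping boundary samples and deleting interior ones; for classes with few samples, virtual samples are constructed. The training subset obtained through these two steps is small and evenly distributed.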

6.
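Noise source identification for underwater vehicles is characterized by limited training samples and by occasional or abrupt noise sources. Addressing these characteristics, and within a noise source identification architecture for underwater vehicles that supports incremental learning, this paper proposes a density-based clustering algorithm with adaptively adjustable parameters. Experiments show that the algorithm effectively avoids the adverse effect of parameter sensitivity on density-based clustering results and clusters mechanical noise source samples of underwater vehicles effectively without supervision. Samples labeled by the clustering algorithm can be used directly as training samples for a classifier with an incremental learning structure, saving time and system overhead.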

7.
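Large-scale classification problems are common in practice. The traditional k-nearest neighbor (k-NN) classifier must traverse the whole training set, so it classifies slowly and cannot handle tasks with large training sets. To address this, a clustering-based accelerated k-NN classification method, C_kNN (Speeding k-NN Classification Method Based on Clustering), is proposed. The method first clusters the training samples to obtain an initial clustering, computes the center of each cluster, and selects the training samples most similar to the cluster centers to form a new training set. For each test sample it then finds the k most similar samples in the new training set and takes the most frequent class label among these k neighbors as the predicted class. Experimental results show that C_kNN greatly improves classification efficiency while maintaining high classification accuracy.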
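A minimal sketch of the two stages described above (reduce the training set around cluster centers, then run k-NN on the reduced set) might look like this. It makes several assumptions not taken from the paper: cluster assignments are supplied by some earlier clustering step, similarity is measured by Euclidean distance, and only one representative is kept per cluster. The function names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def reduce_by_clusters(samples, labels, cluster_ids):
    """Keep, for each cluster, the single sample closest to that cluster's centroid."""
    groups = defaultdict(list)
    for index, cluster_id in enumerate(cluster_ids):
        groups[cluster_id].append(index)
    kept = []
    for members in groups.values():
        dim = len(samples[members[0]])
        centroid = [sum(samples[i][d] for i in members) / len(members)
                    for d in range(dim)]
        kept.append(min(members, key=lambda i: math.dist(samples[i], centroid)))
    return [samples[i] for i in kept], [labels[i] for i in kept]

def knn_predict(query, samples, labels, k):
    """Return the majority label among the k samples nearest to `query`."""
    order = sorted(range(len(samples)), key=lambda i: math.dist(query, samples[i]))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy data with two well-separated clusters (cluster ids assumed precomputed).
samples = [(0.0, 0.0), (0.2, 0.0), (0.1, 0.1), (5.0, 5.0), (5.2, 5.0), (5.1, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]
cluster_ids = [0, 0, 0, 1, 1, 1]

reduced_samples, reduced_labels = reduce_by_clusters(samples, labels, cluster_ids)
predicted = knn_predict((0.5, 0.5), reduced_samples, reduced_labels, k=1)
```

The speed-up comes from the second stage searching only the reduced set (two samples here instead of six); how many representatives to keep per cluster is a tuning choice this sketch does not address.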

8.
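To address the slow learning and classification of SVMs on large sample sets, a new iterative SVM training algorithm for large samples is proposed. K-means clustering compresses the training set, and the cluster centers serve as the initial training set, which reduces redundancy among samples and speeds up learning. To preserve accuracy, the training set is updated by adding boundary samples and misclassified samples to the initial training set, and training iterates until the number of misclassified samples no longer changes. The proposed K-means-based iterative SVM algorithm reduces the size of both the training set and the support vector set of the decision function while maintaining learning accuracy, thereby speeding up learning and classification.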

9.
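Zhou Yu. 《计算机应用研究》 (Application Research of Computers), 2021, 38(6): 1683-1688
To improve the performance of neural network classifiers, a segmented sample selection method based on K-means clustering is proposed. K-means first clusters the training samples into the known number of classes; comparing each class before and after clustering identifies the incorrectly clustered and the correctly clustered samples. The correctly clustered samples are sorted by distance to the cluster center and split evenly into five segments; the odd-numbered segments of each class, together with the incorrectly clustered samples, form the new training set. The method extracts highly informative samples and removes redundant ones, reducing the number of samples while improving their quality. Simulation experiments with three different neural network classifiers on artificial and UCI data sets show that the performance of all three improves at an average training-sample compression ratio of 66.93%.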
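The segment-selection step for one class can be sketched as below. This is an illustration only: it assumes the correctly clustered samples of a class and that class's cluster center are already known, uses Euclidean distance, and handles sample counts that do not divide evenly by five with integer boundaries, a detail the abstract does not specify. The function name is hypothetical.

```python
import math

def odd_segment_selection(samples, center, segments=5):
    """Sort samples by distance to `center`, split them into `segments` nearly
    equal parts, and keep the 1st, 3rd, 5th, ... parts."""
    ordered = sorted(samples, key=lambda s: math.dist(s, center))
    n = len(ordered)
    selected = []
    for seg in range(segments):
        start = seg * n // segments
        end = (seg + 1) * n // segments
        if seg % 2 == 0:  # zero-based even index = odd-numbered segment
            selected.extend(ordered[start:end])
    return selected

# Ten one-dimensional samples at distances 0..9 from a hypothetical center.
class_samples = [(float(i),) for i in range(10)]
chosen = odd_segment_selection(class_samples, center=(0.0,))
```

In this toy case six of the ten samples are kept; the full method would then add the incorrectly clustered samples to form the new training set.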

10.
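A telephone traffic prediction model based on clustering and support vector machines   Total citations: 1, self-citations: 0, citations by others: 1
To overcome the shortcomings of single-factor time series models for telephone traffic prediction, a multiple regression traffic prediction model based on fuzzy C-means (FCM) clustering and support vector machines (SVM) is built. The model clusters the original traffic samples with FCM and selects the sample subset whose features are most similar to the sample to be predicted as the training set. An SVM is trained on these samples, and traffic is predicted through the regression decision function. Validation on real traffic data shows that the method achieves higher prediction accuracy and better generalization than periodic time series and neural network prediction methods.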

11.
When the maximum likelihood (ML) approach is used to calculate the parameters of a Discrete Hidden Markov Model (DHMM), the DHMM parameters of each class are calculated using only the training samples of that class (positive training samples). The training samples not belonging to that class (negative training samples) are not used. To remedy this deficiency, a Rocchio-algorithm-based approach is suggested that involves the training samples of all classes in the calculation. A genetic algorithm is used as the optimization technique to determine the most appropriate parameter values for adjusting the relative effect of the positive and negative training samples. The proposed method is used to classify internal carotid artery Doppler signals recorded from 136 patients and 55 healthy people. It reached 97.38% classification accuracy with fivefold cross-validation (CV). The classification results showed that the proposed method was effective for the classification of internal carotid artery Doppler signals.

12.
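To improve SVM classification of imbalanced data, an imbalanced-data classification algorithm based on splitting and ensembling is proposed. The algorithm clusters the majority-class samples into several subsets according to the ratio between the classes, merges each subset with the minority class to form several training subsets, learns one classifier per training subset, and combines the classifiers with the WE ensemble method to obtain the final classifier, improving classification under imbalanced data. Experiments on UCI data sets demonstrate the effectiveness of the algorithm, particularly in classifying minority-class samples.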

13.
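To address the curse of dimensionality and the imbalanced classification problem in web spam detection, a binary classifier algorithm based on immune clonal feature selection and under-sampling (US) ensembling is proposed. First, under-sampling draws from the majority class of the training set several sample sets, each close in size to the minority class, and merges each with the minority-class samples to form several balanced training subsets. Next, an immune clonal algorithm selects several optimal feature subsets, and the balanced subsets are projected onto these feature subsets to generate multiple views of the balanced data. Finally, random forest (RF) classifiers classify the test samples, and simple voting determines the final class of each test sample. Experiments on the WEBSPAM UK-2006 data set show that, for web spam detection, the ensemble classifier improves accuracy, F1-measure, AUC, and other metrics by more than 11% compared with random forest and its Bagging and AdaBoost ensembles; compared with the best other published results, it improves F1-measure by 2% and achieves the best AUC.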
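The first and last steps of this pipeline, building balanced subsets by under-sampling and combining base classifiers by simple voting, can be sketched as follows. The sketch leaves out the immune clonal feature selection and the random forest classifiers entirely, draws majority samples of exactly the minority-class size (the abstract only says "close in size"), and uses hypothetical function names and a fixed seed for repeatability.

```python
import random
from collections import Counter

def balanced_subsets(majority, minority, n_subsets, seed=0):
    """Draw `n_subsets` random majority-class samples, each of minority-class
    size, and pair each draw with the full minority class."""
    rng = random.Random(seed)
    size = min(len(minority), len(majority))
    return [(rng.sample(majority, size), list(minority)) for _ in range(n_subsets)]

def majority_vote(predictions):
    """Return the most frequent label among the base classifiers' predictions."""
    return Counter(predictions).most_common(1)[0][0]

# Toy data: 100 majority-class items and 3 minority-class items.
subsets = balanced_subsets(list(range(100)), [100, 101, 102], n_subsets=4)
final_label = majority_vote(["spam", "normal", "spam"])
```

Each element of `subsets` would be used to train one base classifier, and `majority_vote` then merges their per-sample predictions.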

14.
Support vector machines (SVMs) are a popular class of classification algorithms owing to their high generalization ability. However, training SVMs on a large set of learning samples is time-consuming, and improving learning efficiency is one of the most important research tasks on SVMs. Although some learning tasks offer many candidate training samples, only the samples near the decision boundary, called support vectors, affect the optimal classification hyperplanes. Finding these samples and training SVMs with them greatly decreases training time and space complexity. Based on this observation, we introduce a neighborhood-based rough set model to search for boundary samples. Using the model, we first divide the sample space into three subsets: positive region, boundary and noise. Furthermore, we partition the input features into four subsets: strongly relevant features, weakly relevant and indispensable features, weakly relevant and superfluous features, and irrelevant features. We then train SVMs only with the boundary samples in the relevant and indispensable feature subspaces, so feature and sample selection are conducted simultaneously with the proposed model. Experimental results show that the model can select very few features and samples for training while classification performance is preserved or even improved.

15.
In this paper, we propose a very simple and fast face recognition method and present its potential rationale. The method first selects, from every class, only the training sample nearest to the test sample and then expresses the test sample as a linear combination of all the selected training samples. Using the expression result, the proposed method can classify the test sample with high accuracy. The proposed method classifies more accurately than the nearest neighbor classification method (NNCM). The face recognition experiments show that the classification accuracy obtained using our method is usually 2–10% greater than that obtained using NNCM. Moreover, though the proposed method exploits only one training sample per class to perform classification, it might perform better than the nearest feature space method proposed in Chien and Wu (IEEE Trans Pattern Anal Machine Intell 24:1644–1649, 2002), which depends on all the training samples to classify the test sample. Our analysis shows that the proposed method achieves this by modifying the neighbor relationships between the test sample and training samples, determined by the Euclidean metric.

16.
17.
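Existing online streaming feature selection algorithms usually select a single globally optimal feature subset and assume that it suits every region of the sample space. However, each region of the sample space is accurately described by its own feature subset, and these subsets may differ in both features and size. This paper therefore proposes a local online streaming feature selection algorithm based on the maximum decision boundary. Local feature selection is introduced and, making full use of local information, a feature evaluation criterion based on the maximum decision boundary is designed to separate same-class samples from different-class samples as far as possible. Three strategies (maximizing the average decision boundary, maximizing the decision boundary, and minimizing redundancy) are used to select suitable features. An optimal feature subset is selected for each local region, and classification then uses a class similarity measure. Experimental results and statistical hypothesis tests on 14 data sets verify the classification effectiveness and stability of the proposed algorithm.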

18.
Conventional representation methods try to express the test sample as a weighted sum of training samples and exploit the deviation between the test sample and the weighted sum of the training samples from each class (also referred to as the deviation between the test sample and each class) to classify the test sample. In particular, the methods assign the test sample to the class that has the smallest deviation among all the classes. This paper analyzes the relationship between face images under different poses and, for the first time, devises a bidirectional representation method-based pattern classification (BRBPC) method for face recognition across pose. BRBPC includes the following three steps: the first step uses the procedure of conventional representation methods to express the test sample and calculates the deviation between the test sample and each class. The second step first expresses the training sample of a class as a weighted sum of the test sample and the training samples from all the other classes and then obtains the corresponding deviation (referred to as the complementary deviation). The third step uses score-level fusion to integrate the scores, that is, the deviations generated by the first and second steps, for final classification. The experimental results show that BRBPC classifies more accurately than conventional representation methods.

19.
Synthesizing handwritten-style characters is an interesting issue in today’s handwriting analysis field. The purpose of this study is to artificially generate training data, foster a deep understanding of human handwriting, and promote the use of handwritten-style computer fonts, in which the individuality or variety of the synthesized characters is considered important. Research considering these two properties together, however, is very rare. In this paper, a handwriting model is proposed to synthesize various handwritten characters while preserving the writer’s individuality from a limited number of training data, using a statistical approach. The proposed model is verified on single- and multiple-stroke characters, such as Arabic numerals, lowercase English letters, and Japanese Kanji characters. Synthesized characters are evaluated in three ways. First, they are analyzed visually using selected samples, and the relationship between the training and synthesized characters is explained. Second, the personalities and varieties of all the data are evaluated using a conventional writer verification method. Third, a questionnaire is developed and administered to evaluate the subjective responses of users regarding the personal styles of the synthesized characters. The results show that the proposed model stably synthesizes personalized characters regardless of the number of training data, whereas the variety increases gradually as the data increase.

20.
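Yang Xinfeng, Liu Ping. 《计算机仿真》 (Computer Simulation), 2012, 29(1): 238-241
This work studies the accuracy of face recognition and tracking. Earlier AAM algorithms recognize and locate faces inaccurately when trained on large sample sets. To address this, a new hierarchical face recognition and tracking algorithm is proposed that partitions the sample space with a clustering algorithm and trains multiple AAMs on the partition. An initial AAM is first trained on all training samples; a new similarity formula then divides the training samples into several subclasses, and a relatively stable AAM is trained for each subclass. During recognition and tracking, the initial AAM performs coarse localization and the subclass AAM then refines it, giving more accurate localization than earlier algorithms. Simulation results show that the improved algorithm locates faces accurately and runs efficiently, making it convenient for face tracking, localization, and recognition in real-time surveillance systems.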
