Similar literature
19 similar articles found.
1.
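To address recognition problems in which the class labels of training patterns are imprecise, an adaptive fuzzy k-NN (k-Nearest Neighbor) classifier based on the transferable belief model is proposed. The transferable belief model is combined with fuzzy set theory and possibility theory, and the pignistic transformation is applied to decide the true class of the pattern to be recognized. The parameters are learned adaptively by minimizing an error function with gradient descent. Experimental results show that the classifier achieves a low misclassification rate and is robust.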
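The pignistic transformation mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard definition, in which the mass of each focal set is shared equally among its elements (after renormalizing for any mass on the empty set). The function name `pignistic` and the example masses are hypothetical, and this is not the paper's classifier.

```python
def pignistic(masses):
    """Turn a mass function into a probability distribution over labels.

    masses: dict mapping a frozenset of class labels to its belief mass.
    """
    empty_mass = masses.get(frozenset(), 0.0)
    if empty_mass >= 1.0:
        raise ValueError("all mass is assigned to the empty set")
    betp = {}
    for focal_set, mass in masses.items():
        if not focal_set:
            continue  # the empty set only contributes to the normalization
        # Each element of a focal set receives an equal share of its mass.
        share = mass / (len(focal_set) * (1.0 - empty_mass))
        for label in focal_set:
            betp[label] = betp.get(label, 0.0) + share
    return betp


# Hypothetical masses over three classes "a", "b", "c".
example_masses = {
    frozenset({"a"}): 0.5,
    frozenset({"a", "b"}): 0.25,
    frozenset({"b", "c"}): 0.25,
}
betp = pignistic(example_masses)
decision = max(betp, key=betp.get)  # class with the largest pignistic probability
```

The decision step simply picks the class with the highest pignistic probability; the abstract does not say how ties or rejection are handled.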

2.
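Because traditional one-class classifiers are not suited to anomaly detection in periodic time series, a one-class classifier based on phase-shifted weighted spherical single-cluster clustering, PS-WS1M-OCC, is proposed. Adding an efficient circular-shift operation to the clustering process solves the problem of computing the similarity between time-series records. In addition, a new adaptive threshold-determination method based on the weight distribution of the time-series records is proposed, which makes the one-class classifier insensitive to anomalous data in the training set and to parameter settings. Experiments show that the proposed classifier can be used for anomaly detection in periodic time series; compared with traditional one-class classifiers, it can learn in an unsupervised manner from training sets that contain anomalous data, is robust to those anomalies, and is insensitive to its parameters.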
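The idea of comparing periodic records under circular shifts can be sketched as follows. The sketch is a brute-force illustration that assumes cosine similarity and tries every shift; the abstract describes an efficient operation whose details are not given here. The names `cosine` and `best_shift_similarity` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length sequences (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_shift_similarity(x, y):
    """Return (shift, similarity) for the circular shift of y that best matches x."""
    if len(x) != len(y) or not x:
        raise ValueError("records must be non-empty and of equal length")
    best_shift, best_sim = 0, -2.0  # cosine similarity is never below -1
    for shift in range(len(y)):
        shifted = y[shift:] + y[:shift]  # rotate y left by `shift` positions
        sim = cosine(x, shifted)
        if sim > best_sim:
            best_shift, best_sim = shift, sim
    return best_shift, best_sim


# Two hypothetical records of the same periodic pattern, out of phase.
record_a = [0.0, 1.0, 2.0, 1.0]
record_b = [2.0, 1.0, 0.0, 1.0]
shift, similarity = best_shift_similarity(record_a, record_b)
```

Trying all shifts costs O(n²) per pair of records; an FFT-based cross-correlation would be the usual way to reduce this, but whether the paper uses one is not stated in the abstract.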

3.
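Yang Shuai, Wang Hao, Yu Kui, Cao Fuyuan. Journal of Software (软件学报), 2023, 34(7): 3206-3225
The goal of stable learning is to build a robust prediction model from a single training dataset so that it can accurately classify any test data whose distribution is similar to that of the training data. To achieve accurate prediction on test data with unknown distribution, existing stable learning algorithms focus on removing spurious correlations between features and class labels. However, these algorithms only weaken part of the spurious correlations and cannot eliminate them completely; in addition, they may overfit when building the prediction model. To address this, a stable learning algorithm based on instance reweighting and dual classifiers is proposed, which learns a robust prediction model by jointly optimizing instance weights and two classifiers. Specifically, the algorithm balances confounders from a global perspective and weights the instances accordingly to remove spurious correlations between features and class labels, so that the contribution of each feature to classification can be assessed more accurately. To completely eliminate the spurious correlations between some irrelevant features and class labels, and to reduce the interference of irrelevant features with instance weighting, the algorithm performs feature selection before instance weighting to filter out some irrelevant features. To further improve generalization, the algorithm builds two classifiers when training the prediction model and learns a better decision boundary by minimizing the difference between their parameters. Experimental results on synthetic and real-world datasets demonstrate the effectiveness of the proposed method.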

4.
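Based on an analysis of the Euclidean distance metric, an adaptive distance metric is proposed. An adaptive distance metric model is first built from the training samples; the model ensures that each training sample is closest to its own pattern class and farthest from the other pattern classes. An objective function is constructed from this model and solved to obtain the optimal weights. Using the minimum-distance classifier and the K-nearest-neighbor classifier on selected datasets from the UCI repository, the proposed adaptive distance metric is compared experimentally with the Euclidean distance metric, and the results show that the adaptive metric is more effective.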

5.
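Large-scale classification problems are common in practice. The traditional k-nearest-neighbor (k-NN) classification method must traverse the entire training set, so its classification efficiency is low and it cannot handle classification tasks with large-scale training sets. To address this, a clustering-based accelerated k-NN classification method, C_kNN (Speeding k-NN Classification Method Based on Clustering), is proposed. The method first clusters the training samples to obtain an initial clustering and computes the center of each cluster; the training samples most similar to the cluster centers are selected to form a new training set. Then, for each test sample, the k most similar samples in the new training set are found, and the most frequent class label among these k neighbors is taken as the predicted class of the test sample. Experimental results show that C_kNN substantially improves classification efficiency while maintaining high classification accuracy.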
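The two stages described in this abstract (shrink the training set around cluster centers, then run k-NN on the reduced set) can be sketched as below. The sketch assumes the cluster assignments are already available from some clustering step and uses Euclidean distance as the similarity measure. The function names, the `per_cluster` parameter and the toy data are hypothetical, not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def reduce_training_set(samples, labels, cluster_ids, per_cluster):
    """Keep the `per_cluster` samples closest to each cluster center."""
    clusters = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        clusters[cid].append(idx)
    kept = []
    for members in clusters.values():
        center = centroid([samples[i] for i in members])
        members.sort(key=lambda i: math.dist(samples[i], center))
        kept.extend(members[:per_cluster])
    return [samples[i] for i in kept], [labels[i] for i in kept]


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training samples nearest to `query`."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters, one class each.
samples = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
labels = ["A", "A", "A", "B", "B", "B"]
cluster_ids = [0, 0, 0, 1, 1, 1]  # assumed output of a prior clustering step

reduced_x, reduced_y = reduce_training_set(samples, labels, cluster_ids, per_cluster=1)
prediction = knn_predict(reduced_x, reduced_y, query=(3.0, 0.0), k=1)
```

Each query is compared only against the reduced set, which is where the speed-up comes from; how many samples to keep per cluster is a trade-off the abstract does not quantify.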

6.
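Traditional classifiers are trained only on labeled data; however, labeled instances are often hard to obtain because labeling is expensive and time-consuming, which creates a labeling bottleneck. Semi-supervised learning addresses this bottleneck by combining large amounts of unlabeled data with labeled data to build well-performing classifiers. Because semi-supervised learning requires less human intervention while achieving relatively high precision, it is of both theoretical and practical interest. Building on a study of existing semi-supervised learning algorithms, and targeting the case in which labeled data are so scarce that statistical methods cannot be used to evaluate labeling confidence, this paper proposes a two-stage collaborative learning method based on kNN and SVM. Experiments confirm that the method is effective.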

7.
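Adaptive models that reflect the mechanism of data change handle data-stream problems better. To adaptively adjust an ensemble classifier so that it better fits the characteristics of the data, a multi-label data-stream classification algorithm based on dynamic heterogeneous ensembles is proposed. H different classification algorithms are each trained on fixed-size data chunks to generate the candidate classifier groups E = {E1, …, EH}; a geometric weighting formula is used to compute the weight of each candidate base classifier in each Ei, which updates each group dynamically; and a new adaptive selection strategy is proposed to generate the final heterogeneous ensemble classifier. Extensive experiments on six datasets show that the proposed algorithm outperforms existing algorithms in accuracy, example-based F1, micro-F1, and macro-F1.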

8.
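Multi-label cost-sensitive classification ensemble learning algorithm   Cited by: 12 (self-citations: 2, citations by others: 10)
Fu Zhongliang. Acta Automatica Sinica (自动化学报), 2014, 40(6): 1075-1085
Although multi-label classification problems can be converted into ordinary multi-class problems, multi-label cost-sensitive classification problems are difficult to convert into multi-class cost-sensitive classification problems. By analyzing the problems that arise when extending multi-class cost-sensitive learning algorithms to multi-label cost-sensitive learning, a multi-label cost-sensitive classification ensemble learning algorithm is proposed. The average misclassification cost of the algorithm is the sum of the cost of falsely detected labels and the cost of missed labels. Its procedure resembles that of adaptive boosting (AdaBoost): it automatically learns multiple weak classifiers and combines them into a strong classifier whose average misclassification cost decreases gradually as weak classifiers are added. The differences between the proposed algorithm and multi-class cost-sensitive AdaBoost are analyzed in detail, including the basis for output labels and the meaning of misclassification cost. Unlike ordinary multi-class cost-sensitive classification, the misclassification costs in multi-label cost-sensitive classification are subject to certain restrictions, which are analyzed in detail and stated explicitly. Simplifying the algorithm yields a multi-label AdaBoost algorithm and a multi-class cost-sensitive AdaBoost algorithm. Theoretical analysis and experimental results both show that the proposed algorithm is effective and minimizes the average misclassification cost. In particular, for multi-class problems in which misclassification costs differ greatly across classes, it performs markedly better than existing multi-class cost-sensitive AdaBoost algorithms.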

9.
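Detecting malicious URLs is important for defending against network attacks. Because supervised learning requires a large number of labeled samples, this paper trains a malicious-URL detection model with semi-supervised learning, which reduces the cost of labeling data. The algorithm improves on traditional semi-supervised co-training: two classifiers are trained on data preprocessed by two methods, expert knowledge and Doc2Vec; samples for which the two classifiers give the same prediction with high confidence are assigned pseudo-labels and used to continue training the classifiers. Experimental results show that, using only 0.67% labeled data, the method trains two classifiers of different types with detection precision of 99.42% and 95.23%, respectively; this is close to supervised performance and better than self-training and co-training.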
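The pseudo-label selection rule described here (keep only samples on which both classifiers agree with high confidence) can be sketched as a small filter. The function name `select_pseudo_labels`, the 0.9 threshold and the example predictions are assumptions for illustration; the abstract does not give the actual threshold or classifier outputs.

```python
def select_pseudo_labels(preds_a, preds_b, threshold=0.9):
    """Pick samples that both classifiers label identically and confidently.

    preds_a, preds_b: per-sample (label, confidence) pairs from the two
    classifiers, in the same sample order.
    Returns a list of (sample_index, pseudo_label).
    """
    selected = []
    for idx, ((label_a, conf_a), (label_b, conf_b)) in enumerate(zip(preds_a, preds_b)):
        agree = label_a == label_b
        confident = min(conf_a, conf_b) >= threshold  # both must clear the threshold
        if agree and confident:
            selected.append((idx, label_a))
    return selected


# Hypothetical predictions on four unlabeled URLs from the two views.
preds_expert = [("malicious", 0.97), ("benign", 0.95), ("malicious", 0.92), ("benign", 0.60)]
preds_doc2vec = [("malicious", 0.93), ("malicious", 0.91), ("malicious", 0.70), ("benign", 0.99)]

pseudo_labeled = select_pseudo_labels(preds_expert, preds_doc2vec)
```

In a full co-training loop the selected samples would be added to the labeled pool and both classifiers retrained; that loop is omitted here.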

10.
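In the classifier chain method, determining the order in which labels are learned is crucial. To this end, a classifier chain method based on association rules and topological sequences (TSECC) is proposed. First, a label-dependency measurement strategy based on strong association rules is designed using frequent patterns. Next, a directed acyclic graph is built from the dependencies between labels, and all vertices of the graph are topologically sorted. Finally, the resulting topological sequence is used as the label learning order in the classifier chain, and the classifier for each label is updated iteratively in turn. In particular, to reduce the effect on the prediction of other labels of "lonely" labels that have no label dependency or only weak dependency, the "lonely" labels are placed outside the topological sequence and trained with a binary relevance model. Experimental results on several public multi-label datasets show that TSECC effectively improves classification performance.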
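The ordering step can be sketched with Kahn's algorithm for topological sorting. The sketch assumes the dependency edges have already been extracted (for example from association rules) and treats only labels with no edges at all as "lonely"; the paper also counts weakly dependent labels, which would require a dependency threshold not given in the abstract. The function name and the example labels are hypothetical.

```python
from collections import deque


def label_order(labels, edges):
    """Split labels into a topological learning order and a list of lonely labels.

    edges: (src, dst) pairs meaning `src` should be learned before `dst`.
    """
    successors = {label: [] for label in labels}
    indegree = {label: 0 for label in labels}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    connected = {label for edge in edges for label in edge}
    lonely = [label for label in labels if label not in connected]

    # Kahn's algorithm restricted to labels that take part in some dependency.
    queue = deque(l for l in labels if l in connected and indegree[l] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if len(order) != len(connected):
        raise ValueError("dependency graph contains a cycle")
    return order, lonely


# Hypothetical labels and dependencies.
all_labels = ["sky", "sea", "beach", "car"]
dependencies = [("sea", "beach"), ("sky", "sea")]
chain_order, lonely_labels = label_order(all_labels, dependencies)
```

The labels in `chain_order` would feed a classifier chain in that order, while those in `lonely_labels` would each get an independent binary classifier.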

11.
The problem of learning in pattern recognition using imperfectly labeled patterns is considered. Using a probabilistic model for the mislabeling of the training patterns, the author discusses performance of the Bayes and nearest neighbor classifiers with imperfect labels. Schemes are presented for training the classifier using both parametric and nonparametric techniques. Methods are developed for the correction of imperfect labels. To gain an understanding of the learning process, the author derives expressions for success probability as a function of training time for a one-dimensional increment error correction classifier with imperfect labels. Furthermore, feature selection with imperfectly labeled patterns is considered.

12.
Image semantic annotation can be viewed as a multi-class classification problem that maps image features to semantic class labels through image modeling and image semantic mapping. A Bayesian classifier is usually adopted for image semantic annotation to classify image features into class labels. To improve the accuracy and efficiency of the classifier in image annotation, we propose a combined optimization method that incorporates an affinity propagation algorithm, a training-data optimization algorithm, and prior-distribution modeling with a Gaussian mixture model to build the Bayesian classifier. The experimental results show that the proposed method improves classifier performance for image semantic annotation.

13.
Almost all drift detection mechanisms designed for classification problems work reactively: after receiving the complete data set (input patterns and class labels) they apply a sequence of procedures to identify some change in the class-conditional distribution, that is, a concept drift. However, detecting a change after it has occurred can in some situations be harmful to the process under analysis. This paper proposes a proactive approach for abrupt drift detection, called DetectA (Detect Abrupt Drift). Briefly, this method is composed of three steps: (i) label the patterns from the test set (an unlabelled data block), using an unsupervised method; (ii) compute some statistics from the train and test sets, conditioned to the given class labels for the train set; and (iii) compare the training and testing statistics using a multivariate hypothesis test. Based on the results of the hypothesis tests, we attempt to detect the drift on the test set before the real labels are obtained. A procedure for creating datasets with abrupt drift has been proposed to perform a sensitivity analysis of the DetectA model. The result of the sensitivity analysis suggests that the detector is efficient and suitable for datasets of high dimensionality, blocks with any proportion of drifts, and datasets with class imbalance. The performance of the DetectA method, with different configurations, was also evaluated on real and artificial datasets, using an MLP as a classifier. The best results were obtained using one of the detection methods, with the proactive approach a top contender for improving the accuracy of the underlying base classifier.

14.
Feature extraction using information-theoretic learning   Cited by: 3 (self-citations: 0, citations by others: 3)
A classification system typically consists of both a feature extractor (preprocessor) and a classifier. These two components can be trained either independently or simultaneously. The former option has an implementation advantage since the extractor need only be trained once for use with any classifier, whereas the latter has an advantage since it can be used to minimize classification error directly. Certain criteria, such as minimum classification error, are better suited for simultaneous training, whereas other criteria, such as mutual information, are amenable for training the feature extractor either independently or simultaneously. Herein, an information-theoretic criterion is introduced and is evaluated for training the extractor independently of the classifier. The proposed method uses nonparametric estimation of Renyi's entropy to train the extractor by maximizing an approximation of the mutual information between the class labels and the output of the feature extractor. The evaluations show that the proposed method, even though it uses independent training, performs at least as well as three feature extraction methods that train the extractor and classifier simultaneously.
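A nonparametric estimate of Renyi's entropy can be illustrated with a minimal sketch. It assumes the order-2 (quadratic) entropy, one-dimensional samples and a Gaussian Parzen window of width `sigma`; all of these are assumptions for illustration, since the abstract does not state the order or the kernel. The estimator averages a Gaussian of variance 2·sigma² over all sample pairs and takes the negative logarithm.

```python
import math


def quadratic_renyi_entropy(samples, sigma=1.0):
    """Parzen-window estimate of the order-2 Renyi entropy of 1-D samples."""
    n = len(samples)
    if n == 0:
        raise ValueError("at least one sample is required")
    # Convolving two Gaussian kernels of variance sigma^2 gives variance 2*sigma^2.
    variance = 2.0 * sigma ** 2
    norm = 1.0 / math.sqrt(2.0 * math.pi * variance)
    potential = 0.0
    for xi in samples:
        for xj in samples:
            potential += norm * math.exp(-((xi - xj) ** 2) / (2.0 * variance))
    potential /= n * n  # average pairwise kernel value
    return -math.log(potential)


# Hypothetical samples: a tight cluster versus a widely spread one.
tight = [0.0, 0.1, -0.1, 0.05]
spread = [0.0, 3.0, -3.0, 6.0]
h_tight = quadratic_renyi_entropy(tight)
h_spread = quadratic_renyi_entropy(spread)
```

The spread sample yields the larger entropy, as expected. Turning such an estimate into the mutual-information criterion used to train a feature extractor is beyond this sketch.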

15.
Fuzzy relational classifier trained by fuzzy clustering   Cited by: 5 (self-citations: 0, citations by others: 5)
A novel approach to nonlinear classification is presented. In the training phase of the classifier, the training data is first clustered in an unsupervised way by fuzzy c-means or a similar algorithm. The class labels are not used in this step. Then, a fuzzy relation between the clusters and the class identifiers is computed. This approach allows the number of prototypes to be independent of the number of actual classes. For the classification of unseen patterns, the membership degrees of the feature vector in the clusters are first computed by using the distance measure of the clustering algorithm. Then, the output fuzzy set is obtained by relational composition. This fuzzy set contains the membership degrees of the pattern in the given classes. A crisp decision is obtained by defuzzification, which gives either a single class or a "reject" decision when a unique class cannot be selected based on the available information. The principle of the proposed method is demonstrated on an artificial data set, and the applicability of the method is shown on the identification of livestock from recorded sound sequences. The obtained results are compared with two other classifiers.

16.
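A method is proposed for classifying unlabeled text documents when no training set is available. Class-associated words are words or phrases related to a class topic that reflect that topic. The prior information provided by class-associated words is used to form prior probabilities for document classification; a naive Bayes classifier is then combined with the EM iterative algorithm, classification constraints are added to the semi-supervised learning process, and the class-associated words supervise the construction of a classifier, enabling classification of documents with no class labels at all. Experimental results show that the method classifies text with high accuracy when no training set is available, and that accuracy under class-associated-word constraints is higher than without constraints.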

17.
Text Classification from Labeled and Unlabeled Documents using EM   Cited by: 51 (self-citations: 0, citations by others: 51)
This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. This is important because in many text classification problems obtaining training labels is expensive, while large quantities of unlabeled documents are readily available. We introduce an algorithm for learning from labeled and unlabeled documents based on the combination of Expectation-Maximization (EM) and a naive Bayes classifier. The algorithm first trains a classifier using the available labeled documents, and probabilistically labels the unlabeled documents. It then trains a new classifier using the labels for all the documents, and iterates to convergence. This basic EM procedure works well when the data conform to the generative assumptions of the model. However, these assumptions are often violated in practice, and poor performance can result. We present two extensions to the algorithm that improve classification accuracy under these conditions: (1) a weighting factor to modulate the contribution of the unlabeled data, and (2) the use of multiple mixture components per class. Experimental results, obtained using text from three different real-world tasks, show that the use of unlabeled data reduces classification error by up to 30%.
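The basic EM procedure described in this abstract can be sketched as follows: train naive Bayes on the labeled documents, probabilistically label the unlabeled ones (E-step), retrain on everything (M-step), and repeat. The sketch assumes a multinomial naive Bayes model with Laplace smoothing and runs a fixed number of iterations rather than testing convergence. It omits both extensions (the weighting factor and multiple mixture components). All function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def train_nb(docs, soft_labels, classes, vocab):
    """M-step: Laplace-smoothed multinomial naive Bayes from (soft) class weights."""
    prior, word_prob = {}, {}
    for c in classes:
        class_weight = sum(w[c] for w in soft_labels)
        prior[c] = (1.0 + class_weight) / (len(classes) + len(docs))
        counts = Counter()
        for doc, w in zip(docs, soft_labels):
            for token in doc:
                counts[token] += w[c]  # fractional counts for soft labels
        denom = len(vocab) + sum(counts.values())
        word_prob[c] = {t: (1.0 + counts[t]) / denom for t in vocab}
    return prior, word_prob


def posterior(doc, prior, word_prob):
    """E-step for one document: class posteriors, computed in log space."""
    log_scores = {}
    for c in prior:
        score = math.log(prior[c])
        for token in doc:
            if token in word_prob[c]:  # tokens outside the vocabulary are ignored
                score += math.log(word_prob[c][token])
        log_scores[c] = score
    top = max(log_scores.values())
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}


def em_naive_bayes(labeled, unlabeled, classes, iterations=10):
    """Alternate E- and M-steps for a fixed number of iterations."""
    labeled_docs = [doc for doc, _ in labeled]
    vocab = sorted({t for doc in labeled_docs + unlabeled for t in doc})
    hard = [{c: 1.0 if c == y else 0.0 for c in classes} for _, y in labeled]
    prior, word_prob = train_nb(labeled_docs, hard, classes, vocab)
    for _ in range(iterations):
        soft = [posterior(doc, prior, word_prob) for doc in unlabeled]
        prior, word_prob = train_nb(labeled_docs + unlabeled, hard + soft, classes, vocab)
    return prior, word_prob


# Hypothetical toy corpus: two labeled and four unlabeled documents.
labeled = [(["goal", "match"], "sport"), (["vote", "election"], "politics")]
unlabeled = [["goal", "team"], ["team", "match"], ["vote", "party"], ["party", "election"]]
prior, word_prob = em_naive_bayes(labeled, unlabeled, ["sport", "politics"])
team_post = posterior(["team"], prior, word_prob)
party_post = posterior(["party"], prior, word_prob)
```

The words "team" and "party" never appear in a labeled document, yet after EM they lean towards "sport" and "politics" respectively, because they co-occur with labeled vocabulary in the unlabeled pool.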

18.
Various measures, such as Margin and Bias/Variance, have been proposed with the aim of gaining a better understanding of why Multiple Classifier Systems (MCS) perform as well as they do. While these measures provide different perspectives for MCS analysis, it is not clear how to use them for MCS design. In this paper a different measure based on a spectral representation is proposed for two-class problems. It incorporates terms representing positive and negative correlation of pairs of training patterns with respect to class labels. Experiments employing MLP base classifiers, in which parameters are fixed but systematically varied, demonstrate the sensitivity of the proposed measure to base classifier complexity.

19.
In this paper, we describe three Bayesian classifiers for mineral potential mapping: (a) a naive Bayesian classifier that assumes complete conditional independence of input predictor patterns, (b) an augmented naive Bayesian classifier that recognizes and accounts for conditional dependencies amongst input predictor patterns and (c) a selective naive classifier that uses only conditionally independent predictor patterns. We also describe methods for training the classifiers, which involve determining dependencies amongst predictor patterns and estimating the conditional probability of each predictor pattern given the target deposit-type. The output of a trained classifier determines the extent to which an input feature vector belongs to either the mineralized class or the barren class and can be mapped to generate a favorability map. The procedures are demonstrated by an application to base metal potential mapping in the Proterozoic Aravalli Province (western India). The results indicate that although the naive Bayesian classifier performs well and shows significant tolerance for the violation of the conditional independence assumption, the augmented naive Bayesian classifier performs better and exhibits finer generalization capability. The results also indicate that the rejection of conditionally dependent predictor patterns degrades the performance of a naive classifier.
