期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Asymmetry label correlation for multi-label learning

Bao Jiachao Wang Yibin Cheng Yusheng 《Applied Intelligence》2022,52(6):6093-6105

As an effective method for mining latent information between labels, label correlation is widely adopted by many scholars to model multi-label learning algorithms. Most existing multi-label algorithms usually ignore that the correlation between labels may be asymmetric while asymmetry correlation commonly exists in the real-world scenario. To tackle this problem, a multi-label learning algorithm with asymmetry label correlation (ACML, Asymmetry Label Correlation for Multi-Label Learning) is proposed in this paper. First, measure the adjacency between labels to construct the label adjacency matrix. Then, cosine similarity is utilized to construct the label correlation matrix. Finally, we constrain the label correlation matrix with the label adjacency matrix. Thus, asymmetry label correlation is modeled for multi-label learning. Experiments on multiple multi-label benchmark datasets show that the ACML algorithm has certain advantages over other comparison algorithms. The results of statistical hypothesis testing further illustrate the effectiveness of the proposed algorithm.

相似文献

2.

Incorporating label dependency into the binary relevance framework for multi-label classification

Everton Alvares-Cherman Jean Metz Maria Carolina Monard 《Expert systems with applications》2012,39(2):1647-1655

In multi-label classification, examples can be associated with multiple labels simultaneously. The task of learning from multi-label data can be addressed by methods that transform the multi-label classification problem into several single-label classification problems. The binary relevance approach is one of these methods, where the multi-label learning task is decomposed into several independent binary classification problems, one for each label in the set of labels, and the final labels for each example are determined by aggregating the predictions from all binary classifiers. However, this approach fails to consider any dependency among the labels. Aiming to accurately predict label combinations, in this paper we propose a simple approach that enables the binary classifiers to discover existing label dependency by themselves. An experimental study using decision trees, a kernel method as well as Naïve Bayes as base-learning techniques shows the potential of the proposed approach to improve the multi-label classification performance. 相似文献

3.

Scalable and efficient multi-label classification for evolving data streams

Jesse Read Albert Bifet Geoff Holmes Bernhard Pfahringer 《Machine Learning》2012,88(1-2):243-272

Many challenging real world problems involve multi-label data streams. Efficient methods exist for multi-label classification in non-streaming scenarios. However, learning in evolving streaming scenarios is more challenging, as classifiers must be able to deal with huge numbers of examples and to adapt to change using limited time and memory while being ready to predict at any point. This paper proposes a new experimental framework for learning and evaluating on multi-label data streams, and uses it to study the performance of various methods. From this study, we develop a multi-label Hoeffding tree with multi-label classifiers at the leaves. We show empirically that this method is well suited to this challenging task. Using our new framework, which allows us to generate realistic multi-label data streams with concept drift (as well as real data), we compare with a selection of baseline methods, as well as new learning methods from the literature, and show that our Hoeffding tree method achieves fast and more accurate performance. 相似文献

4.

An Ordinal Multi-Dimensional Classification (OMDC) for Predictive Maintenance

Pelin Yildirim Taser 《计算机系统科学与工程》2023,44(2):1499-1516

Predictive Maintenance is a type of condition-based maintenance that assesses the equipment's states and estimates its failure probability and when maintenance should be performed. Although machine learning techniques have been frequently implemented in this area, the existing studies disregard to the natural order between the target attribute values of the historical sensor data. Thus, these methods cause losing the inherent order of the data that positively affects the prediction performances. To deal with this problem, a novel approach, named Ordinal Multi-dimensional Classification (OMDC), is proposed for estimating the conditions of a hydraulic system's four components by taking into the natural order of class values. To demonstrate the prediction ability of the proposed approach, eleven different multi-dimensional classification algorithms (traditional Binary Relevance (BR), Classifier Chain (CC), Bayesian Classifier Chain (BCC), Monte Carlo Classifier Chain (MCC), Probabilistic Classifier Chain (PCC), Classifier Dependency Network (CDN), Classifier Trellis (CT), Classifier Dependency Trellis (CDT), Label Powerset (LP), Pruned Sets (PS), and Random k-Labelsets (RAKEL)) were implemented using the Ordinal Class Classifier (OCC) algorithm. Besides, seven different classification algorithms (Multilayer Perceptron (MLP), Support Vector Machine (SVM), k-Nearest Neighbour (kNN), Decision Tree (C4.5), Bagging, Random Forest (RF), and Adaptive Boosting (AdaBoost)) were chosen as base learners for the OCC algorithm. The experimental results present that the proposed OMDC approach using binary relevance multi-dimensional classification methods predicts the conditions of a hydraulic system's multiple components with high accuracy. Also, it is clearly seen from the results that the OMDC models that utilize ensemble-based classification algorithms give more reliable prediction performances with an average Hamming score of 0.853 than the others that use traditional algorithms as base learners. 相似文献

5.

An extensive experimental comparison of methods for multi-label learning

Gjorgji Madjarov Dragi Kocev Dejan Gjorgjevikj Sašo Džeroski 《Pattern recognition》2012,45(9):3084-3104

Multi-label learning has received significant attention in the research community over the past few years: this has resulted in the development of a variety of multi-label learning methods. In this paper, we present an extensive experimental comparison of 12 multi-label learning methods using 16 evaluation measures over 11 benchmark datasets. We selected the competing methods based on their previous usage by the community, the representation of different groups of methods and the variety of basic underlying machine learning methods. Similarly, we selected the evaluation measures to be able to assess the behavior of the methods from a variety of view-points. In order to make conclusions independent from the application domain, we use 11 datasets from different domains. Furthermore, we compare the methods by their efficiency in terms of time needed to learn a classifier and time needed to produce a prediction for an unseen example. We analyze the results from the experiments using Friedman and Nemenyi tests for assessing the statistical significance of differences in performance. The results of the analysis show that for multi-label classification the best performing methods overall are random forests of predictive clustering trees (RF-PCT) and hierarchy of multi-label classifiers (HOMER), followed by binary relevance (BR) and classifier chains (CC). Furthermore, RF-PCT exhibited the best performance according to all measures for multi-label ranking. The recommendation from this study is that when new methods for multi-label learning are proposed, they should be compared to RF-PCT and HOMER using multiple evaluation measures. 相似文献

6.

基于改进的RAKEL算法的心电图诊断分类

赵静韩京宇钱龙毛毅《计算机应用》2022,42(6):1892-1897

心电图（ECG）数据通常包含多种病症,而ECG诊断是一个典型的多标签分类问题。在多标签分类方法中,RAKEL算法将标签集随机分解为若干个大小为k的子集,并建立LP分类器进行训练;然而由于没有充分考虑标签间的相关性,LP分类器中容易产生一些标签组合所对应样本稀少的情况,从而影响预测性能。为了充分考虑标签间的相关性,提出一种基于贝叶斯网络的RAKEL算法BN-RAKEL。首先利用贝叶斯网络找到标签间的相关性,确定候选标签子集;然后对每个标签采用基于信息增益的特征选择算法确定其最优特征空间,并针对每个候选标签子集利用最优特征空间相似性来检测其相关程度,以确定最终的具有强相关性的标签子集;最后在标签子集的最优特征空间上训练LP分类器。在实际的ECG数据集上,与多标签K近邻（ML-KNN）、RAKEL、CC和基于FP-Growth的RAKEL算法FI-RAKEL进行对比,结果显示所提算法在召回率和F-score上最少提高了3.6个百分点和2.3个百分点。实验结果表明,BN-RAKEL算法有较好的预测性能,能有效提升ECG诊断的准确性。相似文献

7.

页岩气储层预测的多标签主动学习算法

汪敏冯婷婷闵帆唐洪明闫建平廖纪佳《计算机应用》2022,42(2):646-654

针对页岩气储层数据获取困难、标签稀缺、标注成本高昂的问题,提出一种多标准主动查询的多标签学习(MAML)算法.首先,考虑样本的信息性和代表性来对样本进行初步处理;其次,加入包括属性差异性和标签丰富性的样本丰富性约束,在此基础上选择有价值的样本进行标签查询;最后,利用多标签学习算法来预测剩余样本的标签.通过11个Yaho... 相似文献

8.

多标签代价敏感分类集成学习算法 总被引：12，自引：2，他引：10

付忠良《自动化学报》2014,40(6):1075-1085

尽管多标签分类问题可以转换成一般多分类问题解决,但多标签代价敏感分类问题却很难转换成多类代价敏感分类问题.通过对多分类代价敏感学习算法扩展为多标签代价敏感学习算法时遇到的一些问题进行分析,提出了一种多标签代价敏感分类集成学习算法.算法的平均错分代价为误检标签代价和漏检标签代价之和,算法的流程类似于自适应提升（Adaptive boosting,AdaBoost）算法,其可以自动学习多个弱分类器来组合成强分类器,强分类器的平均错分代价将随着弱分类器增加而逐渐降低.详细分析了多标签代价敏感分类集成学习算法和多类代价敏感AdaBoost算法的区别,包括输出标签的依据和错分代价的含义.不同于通常的多类代价敏感分类问题,多标签代价敏感分类问题的错分代价要受到一定的限制,详细分析并给出了具体的限制条件.简化该算法得到了一种多标签AdaBoost算法和一种多类代价敏感AdaBoost算法.理论分析和实验结果均表明提出的多标签代价敏感分类集成学习算法是有效的,该算法能实现平均错分代价的最小化.特别地,对于不同类错分代价相差较大的多分类问题,该算法的效果明显好于已有的多类代价敏感AdaBoost算法. 相似文献

9.

基于浮动阈值分类器组合的多标签分类算法 总被引：1，自引：0，他引：1

张丹普付忠良王莉莉李昕《计算机应用》2015,35(1):147-151

针对目标可以同时属于多个类别的多标签分类问题,提出了一种基于浮动阈值分类器组合的多标签分类算法.首先,分析探讨了基于浮动阈值分类器的AdaBoost算法(AdaBoost.FT)的原理及错误率估计,证明了该算法能克服固定分段阈值分类器对分类边界附近点分类不稳定的缺点从而提高分类准确率;然后,采用二分类(BR)方法将该单标签学习算法应用于多标签分类问题,得到基于浮动阈值分类器组合的多标签分类方法,即多标签AdaBoost.FT.实验结果表明,所提算法的平均分类精度在Emotions数据集上比AdaBoost.MH、ML-kNN、RankSVM这3种算法分别提高约4%、8%、11%;在Scene、Yeast数据集上仅比RankSVM低约3%、1%.由实验分析可知,在不同类别标记之间基本没有关联关系或标签数目较少的数据集上,该算法均能得到较好的分类效果. 相似文献

10.

一种启发式多标记分类器选择与排序策略

李哲王志海何颖婧付彬《中文信息学报》2013,27(4):119-127

在多标记分类问题当中,多标记分类器的目的是为实例预测一个与其关联的标记集合。典型方法之一是将多标记分类问题转化为多个二类分类问题,这些二类分类器之间可以存在一定的关系。简单地考虑标记间依赖关系可以在一定程度上改善分类性能,但同时计算复杂度也是必须考虑的问题。该文提出了一种利用多标记间依赖关系的有序分类器集合算法,该算法通过启发式的搜索策略寻找分类器之间的某种次序,这种次序可以更好地反映标记间的依赖关系。在实验中,该文选取了来自不同领域的数据集和多个评价指标,实验结果表明该文所提出的算法比一般多标记分类算法具有更好的分类性能。相似文献

11.

Diversity measures for multiple classifier system analysis and design

《Information Fusion》2005,6(1):21-36

In the context of Multiple Classifier Systems, diversity among base classifiers is known to be a necessary condition for improvement in ensemble performance. In this paper the ability of several pair-wise diversity measures to predict generalisation error is compared. A new pair-wise measure, which is computed between pairs of patterns rather than pairs of classifiers, is also proposed for two-class problems. It is shown experimentally that the proposed measure is well correlated with base classifier test error as base classifier complexity is systematically varied. However, correlation with unity-weighted sum and vote is shown to be weaker, demonstrating the difficulty in choosing base classifier complexity for optimal fusion. An alternative strategy based on weighted combination is also investigated and shown to be less sensitive to number of training epochs. 相似文献

12.

基于半监督学习的多示例多标记E-MIMLSVM+算法

下载免费PDF全文

李村合朱红波《计算机工程与应用》2018,54(2):149-154

多示例多标记是一种新的机器学习框架,在该框架下一个对象用多个示例来表示,同时与多个类别标记相关联。MIMLSVM+算法将多示例多标记问题转化为一系列独立的二类分类问题,但是在退化过程中标记之间的联系信息会丢失,而E-MIMLSVM+算法则通过引入多任务学习技术对MIMLSVM+算法进行了改进。为了充分利用未标记样本来提高分类准确率,使用半监督支持向量机TSVM对E-MIMLSVM+算法进行了改进。通过实验将该算法与其他多示例多标记算法进行了比较,实验结果显示,改进算法取得了良好的分类效果。相似文献

13.

基于非负矩阵分解与稀疏表示的多标签分类算法

包永春张建臣杜守信张军军《计算机应用》2022,42(5):1375-1382

传统的多标签分类算法是以二值标签预测为基础的,而二值标签由于仅能指示数据是否具有相关类别,所含语义信息较少,无法充分表示标签语义信息。为充分挖掘标签空间的语义信息,提出了一种基于非负矩阵分解和稀疏表示的多标签分类算法（MLNS）。该算法结合非负矩阵分解与稀疏表示技术,将数据的二值标签转化为实值标签,从而丰富标签语义信息并提升分类效果。首先,对标签空间进行非负矩阵分解以获得标签潜在语义空间,并将标签潜在语义空间与原始特征空间结合以形成新的特征空间;然后,对此特征空间进行稀疏编码来获得样本间的全局相似关系;最后,利用该相似关系重构二值标签向量,从而实现二值标签与实值标签的转化。在5个标准多标签数据集和5个评价指标上将所提算法与MLBGM、ML²、LIFT和MLRWKNN等算法进行对比。实验结果表明,所提MLNS在多标签分类中优于对比的多标签分类算法,在50%的案例中排名第一,在76%的案例中排名前二,在全部的案例中排名前三。相似文献

14.

基于随机子空间的多标签类属特征提取算法

张晶李裕李培培《计算机应用研究》2019,36(2)

目前多标签学习已广泛应用到很多场景中,在此类学习问题中,一个样本往往可以同时拥有多个类别标签。由于类别标签可能带有的特有属性（即类属属性）将更有助于标签分类,所以已经出现了一些基于类属属性的多标签学习算法。针对类属属性构造会导致属性空间存在冗余的问题,本文提出了一种多标签类属特征提取算法LIFT_RSM。该方法基于类属属性空间通过综合利用随机子空间模型及成对约束降维思想提取有效的特征信息,以达到提升分类性能的目的。在多个数据集上的实验结果表明：与若干经典的多标签算法相比,提出的LIFT_RSM算法能得到更好的分类效果。相似文献

15.

一种基于成对标签的Rakel算法改进

周恩波叶荣华张微微周子涵《计算机与现代化》2016,(3):16

Rakel(Random k-labelsets)算法从原始标签集中随机选择一部分标签子集,并且使用LP(Label Powerset)算法训练相应的多标签子分类器。由于随机选择标签的原因,导致LP子分类器预测性能不好。本文基于标签的共现关系选择成对标签来训练LP分类器,提出PwRakel(Pairwise Random k-labelsets)算法。该算法通过挖掘标签相关性扩展训练集,有效提高分类性能。实验结果表明,所提出的算法与Rakel算法以及其他算法对比,分类准确度更高。相似文献

16.

多标签文本分类研究进展

下载免费PDF全文

郝超裘杭萍孙毅张超然《计算机工程与应用》2021,57(10):48-56

文本分类作为自然语言处理中一个基本任务,在20世纪50年代就已经对其算法进行了研究,现在单标签文本分类算法已经趋向成熟,但是对于多标签文本分类的研究还有很大的提升空间。介绍了多标签文本分类的基本概念以及基本流程,包括数据集获取、文本预处理、模型训练和预测结果。介绍了多标签文本分类的方法。这些方法主要分为两大类：传统机器学习方法和基于深度学习的方法。传统机器学习方法主要包括问题转换方法和算法自适应方法。基于深度学习的方法是利用各种神经网络模型来处理多标签文本分类问题,根据模型结构,将其分为基于CNN结构、基于RNN结构和基于Transfomer结构的多标签文本分类方法。对多标签文本分类常用的数据集进行了梳理总结。对未来的发展趋势进行了分析与展望。相似文献

17.

Feature selection for multi-label naive Bayes classification 总被引：4，自引：0，他引：4

Min-Ling Zhang José M. Peña Victor Robles 《Information Sciences》2009,179(19):3218-3229

In multi-label learning, the training set is made up of instances each associated with a set of labels, and the task is to predict the label sets of unseen instances. In this paper, this learning problem is addressed by using a method called Mlnb which adapts the traditional naive Bayes classifiers to deal with multi-label instances. Feature selection mechanisms are incorporated into Mlnb to improve its performance. Firstly, feature extraction techniques based on principal component analysis are applied to remove irrelevant and redundant features. After that, feature subset selection techniques based on genetic algorithms are used to choose the most appropriate subset of features for prediction. Experiments on synthetic and real-world data show that Mlnb achieves comparable performance to other well-established multi-label learning algorithms. 相似文献

18.

基于三层集成多标记学习的蛋白质多亚细胞定位预测

乔善平闫宝强《计算机应用》2016,36(8):2150-2156

针对多标记学习和集成学习在解决蛋白质多亚细胞定位预测问题上应用还不成熟的状况,研究基于集成多标记学习的蛋白质多亚细胞定位预测方法。首先,从多标记学习和集成学习相结合的角度提出了一种三层的集成多标记学习系统框架结构,该框架将学习算法和分类器进行了层次性分类,并把二分类学习、多分类学习、多标记学习和集成学习进行有效整合,形成一个通用型的三层集成多标记学习模型;其次,基于面向对象技术和统一建模语言（UML）对系统模型进行了设计,使系统具备良好的可扩展性,通过扩展手段增强系统的功能和提高系统的性能;最后,使用Java编程技术对模型进行扩展,实现了一个学习系统软件,并成功应用于蛋白质多亚细胞定位预测问题上。通过在革兰氏阳性细菌数据集上进行测试,验证了系统功能的可操作性和较好的预测性能,该系统可以作为解决蛋白质多亚细胞定位预测问题的一个有效工具。相似文献

19.

基于指数损失间隔的多标记特征选择算法

李雨婷《计算机技术与发展》2020,(4):46-51

在多标记学习的任务中,多标记学习的每个样本可被多个标签标记,比单标记学习的应用空间更广关注度更高,多标记学习可以利用关联性提高算法的性能。在多标记学习中,传统特征选择算法已不再适用,一方面,传统的特征选择算法可被用于单标记的评估标准。多标记学习使得多个标记被同时优化;而且在多标记学习中关联信息存在于不同标记间。因此,可设计一种能够处理多标记问题的特征选择算法,使标记之间的关联信息能够被提取和利用。通过设计最优的目标损失函数,提出了基于指数损失间隔的多标记特征选择算法。该算法可以通过样本相似性的方法,将特征空间和标记空间的信息融合在一起,独立于特定的分类算法或转换策略。优于其他特征选择算法的分类性能。在现实世界的数据集上验证了所提算法的正确性以及较好的性能。相似文献

20.

二次集成学习在医疗数据挖掘中的应用

魏秀参慕鑫杨杨《计算机与生活》2014,(9):1113-1119

CCDM 2014数据挖掘竞赛基于医学诊断数据,提出了实际生活中广泛出现的多类标问题和多类分类问题。针对两个问题出现的类别不平衡现象以及训练样本较少等特点,为了更好地完成数据挖掘任务,借助二次学习和集成学习的思想,提出了一个新的学习框架--二次集成学习。该学习框架通过首次集成学习得到若干置信度较高的样本,将其加入到原始训练集,并在新的训练集上进行二次学习,进而得到泛化性能更高的分类器。竞赛结果表明,与常用的集成学习相比,二次集成学习在两个问题上均取得了非常理想的结果。相似文献