Similar literature
 20 similar documents found; search took 15 ms
1.
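In multi-label learning, feature selection is an effective means of handling high-dimensional data and improving classification performance. However, most existing feature selection algorithms assume that label distributions are roughly balanced, and few consider imbalanced label distributions. To address this problem, this paper proposes a multi-label feature selection algorithm with weakening marginal labels (WML). It computes the ratio of positive to negative label frequencies under each label as that label's weight, then weakens marginal labels through weighting and incorporates label-space information into the feature selection process, yielding a more effective feature ranking and improving the accuracy with which labels describe samples. Experimental results on multiple datasets show that the proposed algorithm has certain advantages, and stability analysis and statistical hypothesis tests further demonstrate its effectiveness and rationality.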
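
As a rough illustration of the weighting step in this abstract, the sketch below computes one weight per label from its positive and negative counts. The abstract does not give the formula, so the direction of the ratio (rarer outcome over more common outcome, so that heavily skewed labels get small weights) and the function name `label_weights` are assumptions, not the WML definition.

```python
def label_weights(Y):
    """Return one weight per label from a non-empty 0/1 label matrix Y.

    Y is a list of rows; Y[i][j] is 1 if sample i carries label j.
    Assumption: weight = min(pos, neg) / max(pos, neg), so a label with a
    very skewed distribution receives a weight close to 0.
    """
    n_samples = len(Y)
    n_labels = len(Y[0])
    weights = []
    for j in range(n_labels):
        pos = sum(row[j] for row in Y)   # samples that carry label j
        neg = n_samples - pos            # samples that do not
        weights.append(min(pos, neg) / max(pos, neg))
    return weights


Y = [[1, 0], [1, 0], [0, 0], [0, 1]]
print(label_weights(Y))
```

A balanced label gets weight 1.0 under this assumption, while a label present in only one of four samples gets 1/3.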

2.
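In multi-label learning, each sample is represented by a single instance and associated with multiple class labels. Most existing multi-label learning algorithms exploit label correlations globally, that is, they assume that all samples share the same positive correlations among class labels. In practice, however, different samples share different label correlations, and labels exhibit not only positive correlation but also mutual exclusion, i.e., negative correlation. To address this problem, a k-nearest-neighbor multi-label classification algorithm based on local positive and negative pairwise label correlations, PNLC, is proposed. First, the feature vectors of the multi-label data are preprocessed to construct, for each label, the attribute features that are most discriminative for that label. Then, in the training phase, PNLC builds positive and negative local pairwise correlation matrices between labels from the true labels of the k nearest neighbors of every training sample. Finally, in the testing phase, the k nearest neighbors of each test instance and their corresponding positive and negative pairwise label relations are obtained, and these relations are used to compute the maximum a posteriori probability to predict the test instance. Experimental results show that PNLC achieves clearly higher classification accuracy on the yeast and image datasets than other commonly used multi-label classification algorithms.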

3.
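Disambiguation of partial-label data is the basis of machine learning with partial-label data. To address the data imbalance that is widespread in partial-label data, and the insufficient use of inter-sample constraint information by existing disambiguation algorithms, this paper proposes a partial-label disambiguation algorithm based on pairwise constraints. First, based on low-rank representation, the relationship between the low-rank representation coefficients of samples and sample similarity under data imbalance is derived. Second, based on this derivation, graph models based on positive and negative constraints between samples are constructed, and the labels of the partial-label data are obtained by minimizing the energy function of the graph models. Experimental results on five public datasets show that the proposed method improves disambiguation accuracy over baseline algorithms by 2.9% to 14.9% on average.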

4.
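KNN classification algorithm based on weak learning rules for small sample sets*   Total citations: 2, self-citations: 0, citations by others: 2
KNN and its improved variants use a dataset with known class labels to assign classes to a dataset whose class labels are unknown. If the labeled dataset contains too few samples, the final classification accuracy suffers. The KNN classification algorithm based on weak learning rules for small samples aims to improve the accuracy of KNN on small sample sets. It first learns from the data objects in the unlabeled dataset, selects some of them, and labels them using the label knowledge it has learned. It then adds them to the labeled dataset, and finally uses the extended labeled dataset to classify the data objects in the unlabeled dataset. Tests on standard datasets show that the algorithm improves the classification accuracy of KNN and gives fairly satisfactory results.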

5.
Multi-label classification exhibits several challenges not present in the binary case. The labels may be interdependent, so that the presence of a certain label affects the probability of other labels’ presence. Thus, exploiting dependencies among the labels could be beneficial for the classifier’s predictive performance. Surprisingly, only a few of the existing algorithms address this issue directly by identifying dependent labels explicitly from the dataset. In this paper we propose new approaches for identifying and modeling existing dependencies between labels. One principal contribution of this work is a theoretical confirmation of the reduction in sample complexity that is gained from unconditional dependence. Additionally, we develop methods for identifying conditionally and unconditionally dependent label pairs; clustering them into several mutually exclusive subsets; and finally, performing multi-label classification incorporating the discovered dependencies. We compare these two notions of label dependence (conditional and unconditional) and evaluate their performance on various benchmark and artificial datasets. We also compare and analyze labels identified as dependent by each of the methods. Moreover, we define an ensemble framework for the new methods and compare it to existing ensemble methods. An empirical comparison of the new approaches to existing baseline and state-of-the-art methods on 12 benchmark datasets demonstrates that in many cases the proposed single-classifier and ensemble methods outperform many multi-label classification algorithms. Perhaps surprisingly, we discover that the weaker notion of unconditional dependence plays the decisive role.

6.

The COVID-19 pandemic is the major public health crisis we have faced since the Second World War. The pandemic is spreading around the globe like a wave, and according to the World Health Organization’s recent report, the numbers of confirmed cases and deaths are rising rapidly. The COVID-19 pandemic has created severe social, economic, and political crises, which in turn will leave long-lasting scars. One countermeasure for controlling the coronavirus outbreak is a specific, accurate, reliable, and rapid detection technique to identify infected patients. The availability and affordability of RT-PCR kits remain a major bottleneck to handling the COVID-19 outbreak effectively in many countries. Recent findings indicate that chest radiography anomalies can characterize patients with COVID-19 infection. In this study, Corona-Nidaan, a lightweight deep convolutional neural network (DCNN), is proposed to detect COVID-19, Pneumonia, and Normal cases through chest X-ray image analysis without any human intervention. We introduce a simple minority class oversampling method for dealing with the imbalanced dataset problem. The impact of transfer learning with pre-trained CNNs on chest X-ray based COVID-19 infection detection is also investigated. Experimental analysis shows that the Corona-Nidaan model outperforms prior works and other pre-trained CNN based models. The model achieved 95% accuracy for three-class classification with 94% precision and recall for COVID-19 cases. While studying the performance of various pre-trained models, we also found that VGG19 outperforms other pre-trained CNN models, achieving 93% accuracy with 87% recall and 93% precision for COVID-19 infection detection. The model is evaluated by screening the COVID-19 infected Indian Patient chest X-ray dataset with good accuracy.
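
The abstract does not specify its oversampling method, so the following is only a minimal sketch of one common choice: random duplication of minority-class samples until every class matches the largest one. The function name `oversample` and the toy class names are hypothetical.

```python
import random
from collections import Counter


def oversample(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until all classes are equal in size.

    This is plain random oversampling, used here as a stand-in; it is not
    claimed to be the method of the paper.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())          # size of the largest class
    out_samples, out_labels = list(samples), list(labels)
    for cls, count in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - count):    # nothing to add for the largest class
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


labels = ["normal"] * 4 + ["pneumonia"] * 3 + ["covid"] * 1
samples = list(range(len(labels)))         # stand-ins for images
xs, ys = oversample(samples, labels)
print(Counter(ys))
```

Duplicates are drawn only from the original samples, so the balanced set contains no synthetic data.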


7.
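Because the label space is large, imbalanced label distribution is widespread in multi-label datasets, and solving it can improve the classification performance of multi-label learning to some extent. Improving classification performance through label correlation is one of the most common and effective strategies for this problem and has been studied extensively, but most of this work relies on positive correlation. In real problems, negative correlation between labels may also exist, and considering it together with positive correlation should further improve classifier performance. On this basis, an imbalanced multi-label learning algorithm based on negative correlation enhancement, MLNCE, is proposed. It aims to address multi-label imbalance while accounting for both positive and negative correlations between labels, thereby improving the classification performance of multi-label classifiers. First, label density information is used to transform the label space. Then, the true positive and negative correlation information between labels is explored in the density label space and added to the classifier's objective function. Finally, accelerated gradient descent is used to solve for the output weights and obtain the predictions. Comparative experiments against six other multi-label learning algorithms on 11 standard multi-label datasets show that MLNCE can effectively improve classification accuracy.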

8.
檀何凤  刘政怡 《计算机应用》2015,35(10):2761-2765
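To address the fact that the multi-label k-nearest-neighbor (ML-KNN) classification algorithm does not consider label correlation, a label-correlation-based multi-label k-nearest-neighbor classification algorithm (CML-KNN) is proposed. First, the conditional probability between each pair of labels in the label set is computed. Second, for the label about to be predicted, its conditional probabilities with the already predicted labels are ranked and the maximum is found. Finally, the maximum is multiplied by the corresponding label value and combined with maximum a posteriori (MAP) estimation to construct the multi-label classification model and predict new labels. Experimental results show that the proposed algorithm outperforms ML-KNN, AdaboostMH, RAkEL, and BPMLL on the Emotions dataset, and on the Yeast and Enron datasets it falls below ML-KNN and RAkEL on only one or two evaluation metrics. The experimental analysis shows that the algorithm achieves good classification results.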
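
The first two steps of this abstract (pairwise conditional probabilities, then picking the strongest already predicted label) can be sketched as below. The estimate by simple co-occurrence counting, the handling of labels that never occur, and the names `conditional_probabilities` and `strongest_predictor` are assumptions; the MAP combination step is not shown.

```python
def conditional_probabilities(Y):
    """Estimate P[i][j] = P(label j present | label i present) by counting.

    Y is a list of rows of 0/1 label indicators. If label i never occurs,
    the probability is undefined and is left at 0.0 (an assumption).
    """
    n_labels = len(Y[0])
    counts = [sum(row[i] for row in Y) for i in range(n_labels)]
    P = [[0.0] * n_labels for _ in range(n_labels)]
    for i in range(n_labels):
        if counts[i] == 0:
            continue
        for j in range(n_labels):
            both = sum(1 for row in Y if row[i] and row[j])
            P[i][j] = both / counts[i]
    return P


def strongest_predictor(P, target, predicted):
    """Among already predicted label indices, return the one with the
    largest conditional probability for the target label."""
    return max(predicted, key=lambda i: P[i][target])


Y = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [1, 1, 0]]
P = conditional_probabilities(Y)
print(P[0][1], strongest_predictor(P, target=1, predicted=[0, 2]))
```

In this toy matrix, label 2 always co-occurs with label 1, so it is selected over label 0 when label 1 is the one being predicted.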

9.
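COVID-19 is spreading rapidly worldwide. To diagnose it quickly and accurately and thereby break the chain of transmission, a deep-learning-based classification network, DLDA-A-DenseNet, is proposed. First, a deep dense aggregation structure is combined with DenseNet-201 to aggregate feature information from different stages, strengthening the identification and localization of lesions. Second, an efficient multi-scale long-range attention is proposed to refine the aggregated features. In addition, to address class imbalance in the CT image dataset, a balanced sampling training strategy is used to eliminate bias. On the dataset provided by 中国胸部CT图像调查研究会, the proposed method improves on the original DenseNet-201 by 2.24%, 3.09%, 2.09%, 2.60%, and 3.48% in accuracy, recall, precision, F1 score, and Kappa coefficient, respectively. On the COVID-CISet image dataset it achieves the best accuracy of 99.50%. The results show that, compared with other methods, the proposed COVID-19 CT image classification method fully extracts the lesion features of CT slices and has higher accuracy and good generalization.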

10.
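A semi-supervised support vector machine (semi-supervised SVM) classification algorithm based on two-stage learning is proposed. First, a graph-based label propagation algorithm assigns initial pseudo-labels to the unlabeled samples, and a k-nearest-neighbor graph is used to identify and remove likely noisy samples. The denoised sample set is then treated as a labeled set and fed to a support vector machine (SVM), so that the SVM takes the information of the whole sample set into account during training, which improves its classification accuracy. Experimental results show that, compared with other semi-supervised learning algorithms, the proposed algorithm improves classification performance and is highly reliable when few labeled training samples are available.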

11.
Wang  Min  Feng  Tingting  Shan  Zhaohui  Min  Fan 《Applied Intelligence》2022,52(10):11131-11146

In multi-label learning, each instance is simultaneously associated with multiple class labels. A large number of labels in an application exacerbates the problem of label scarcity. An interesting issue concerns how to query as few labels as possible while obtaining satisfactory classification accuracy. For this purpose, we propose the attribute and label distribution driven multi-label active learning (MCAL) algorithm. MCAL considers the characteristics of both attributes and labels to enable the selection of critical instances based on different measures. Representativeness is measured by the probability density function obtained by non-parametric estimation, while informativeness is measured by the bilateral softmax predicted entropy. Diversity is measured by the distance metric among instances, and richness is measured by the number of softmax predicted labels. We describe experiments performed on eight benchmark datasets and eleven real Yahoo webpage datasets. The results verify the effectiveness of MCAL and its superiority over state-of-the-art multi-label algorithms and multi-label active learning algorithms.


12.
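To address the poor image feature extraction and low average accuracy of traditional computer-aided chest diagnosis systems in classifying diseases from chest X-rays, a multi-level classification network combining an attention mechanism with label correlation is proposed. Training proceeds in two stages. In stage 1, to improve feature extraction, an attention mechanism is introduced and a dual-branch feature extraction network is built to extract comprehensive features. In stage 2, to account for correlations between labels in multi-label classification, a graph convolutional neural network models the label correlations, and its output is combined with the stage-1 features to perform multi-label classification of chest X-ray diseases. Experimental results show that the method reaches a weighted average AUC of 0.827 across diseases on the ChestX-ray14 dataset, which can help physicians diagnose chest diseases and has some clinical value.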

13.
Multi-label classification methods are of increasing interest in areas such as text categorization, image annotation and protein function classification. Due to the correlation among the labels, traditional single-label classification methods are not directly applicable to the multi-label classification problem. This paper presents two novel multi-label classification algorithms based on variable precision neighborhood rough sets, called multi-label classification using rough sets (MLRS) and MLRS using local correlation (MLRS-LC). The proposed algorithms consider two important factors that affect the accuracy of prediction, namely the correlation among the labels and the uncertainty that exists within the mapping between the feature space and the label space. MLRS provides a global view of the label correlation, while MLRS-LC deals with the label correlation at the local level. Given a new instance, MLRS determines its location and then computes the probabilities of labels according to that location. MLRS-LC first finds the topic of the new instance and then calculates, within the related topic, the probability that the instance belongs to each class. A series of experiments on seven multi-label datasets shows that MLRS and MLRS-LC achieve promising performance when compared with some well-known multi-label learning algorithms.

14.
We address the problem of predicting category labels for unlabeled videos in a large video dataset by using a ground-truth set of objectively labeled videos that we have created. Large video databases like YouTube require that a user uploading a new video assign to it a category label from a prescribed set of labels. Such category labeling is likely to be corrupted by the subjective biases of the uploader. Despite their noisy nature, these subjective labels are frequently used as the gold standard in algorithms for multimedia classification and retrieval. Our goal in this paper is NOT to propose yet another algorithm that predicts labels for unseen videos based on the subjective ground-truth. Instead, our goal is to demonstrate that video classification performance can be improved if, instead of using subjective labels, we first create an objectively labeled ground-truth set of videos and then train a classifier on that ground-truth to predict objective labels for the set of unlabeled videos.

15.
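Label noise can greatly degrade the performance of deep network models. To address this problem, this paper proposes a contrastive-learning-based method for classifying images with noisy labels. The method comprises an adaptive threshold, a contrastive learning module, and a class-prototype-based label denoising module. First, contrastive learning maximizes the similarity between two augmented views of an image to extract robust image features. Next, a novel adaptive threshold filters the training samples, and the threshold is adjusted dynamically during training according to how well each class has been learned. Then, a class-prototype-based label denoising module is introduced, which updates pseudo-labels by computing the similarity between sample feature vectors and prototype vectors, thereby avoiding the influence of label noise. Comparative experiments were carried out on the public datasets CIFAR-10 and CIFAR-100 and the real-world dataset ANIMAL10. The results show that, under synthetic noise, the proposed method outperforms conventional methods. Updating pseudo-labels from the similarity between robust image feature vectors and the prototype vectors reduces the negative impact of noisy labels and improves the model's noise robustness to some extent, which verifies the effectiveness of the model.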
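
The prototype-based pseudo-label update in this abstract can be illustrated with a small sketch. The abstract does not say how prototypes or similarity are defined, so two assumptions are made here: a class prototype is the mean feature vector of the samples currently labeled with that class, and similarity is cosine similarity. All function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def class_prototypes(features, labels):
    """Assumption: prototype of a class = mean of its current members' features."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for k, value in enumerate(f):
            acc[k] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def update_pseudo_labels(features, prototypes):
    """Relabel each sample with the class whose prototype is most similar."""
    return [max(prototypes, key=lambda y: cosine(f, prototypes[y]))
            for f in features]


features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.8, 0.2]]
noisy = ["cat", "cat", "dog", "dog", "dog"]   # the last label is wrong on purpose
print(update_pseudo_labels(features, class_prototypes(features, noisy)))
```

In this toy case the mislabeled last sample lies much closer to the "cat" prototype and is relabeled, while the other four keep their labels.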

16.
17.
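Multi-label answer aggregation estimates the true labels of samples by fusing large numbers of non-expert annotations collected through crowdsourcing. Digital cultural heritage data is costly to annotate, has many sample categories, and is unevenly distributed, which makes multi-label answer aggregation on such datasets very challenging. Previous methods have focused mainly on single-label tasks and ignored the label correlations of multi-label tasks; most multi-label aggregation methods consider label correlation to some extent but are highly sensitive to noise and outliers. To solve these problems, AGR-JMF, a multi-label answer aggregation method based on adaptive graph regularization and joint low-rank matrix factorization, is proposed. First, the annotation matrix is decomposed into a clean-annotation part and a noisy-annotation part. Next, an adaptive graph regularization method is applied to the clean annotations to construct the correlation matrix between labels. Finally, information such as annotation quality, label correlation, and the similarity of annotators' behavioral attributes guides the low-rank matrix factorization to aggregate the multi-label answers. Experiments on real-world datasets and a Mogao Grottoes mural dataset show that AGR-JMF has clear advantages over existing algorithms in aggregation accuracy and in identifying spammers.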

18.

In multi-label classification problems, every instance is associated with multiple labels at the same time. Binary classification, multi-class classification and ordinal regression problems can be seen as special cases of multi-label classification in which each instance is assigned only one label. Text classification is the main application area of multi-label classification techniques, but relevant work is also found in areas such as bioinformatics, medical diagnosis, scene classification and music categorization. There are two approaches to multi-label classification. The first is an algorithm-independent approach, or problem transformation, in which the multi-label problem is handled by transforming it into a set of single-label problems. The second is algorithm adaptation, where specific algorithms are proposed to solve the multi-label classification problem directly. In this work, we not only survey research conducted under algorithm adaptation for multi-label classification but also perform a comparative study of two proposed algorithms. The first proposed algorithm, fuzzy PSO-based ML-RBF, is a hybridization of fuzzy PSO and ML-RBF. The second proposed algorithm, FSVD-MLRBF, hybridizes fuzzy c-means clustering with singular value decomposition. Both proposed algorithms are applied to real-world datasets, namely the yeast and scene datasets. The experimental results show that both proposed algorithms meet or beat ML-RBF and ML-KNN on the test datasets.
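
To make the problem-transformation approach concrete, the sketch below shows binary relevance, one well-known transformation that turns a multi-label dataset into one binary dataset per label. The abstract does not name a particular transformation, so this choice, the function name, and the toy data are illustrative assumptions.

```python
def binary_relevance(X, Y, label_names):
    """Split a multi-label dataset into one single-label (binary) dataset per label.

    X is a list of instances, Y a list of 0/1 rows aligned with label_names.
    Returns {label name: [(instance, 0 or 1), ...]}.
    """
    problems = {}
    for j, name in enumerate(label_names):
        problems[name] = [(x, row[j]) for x, row in zip(X, Y)]
    return problems


X = ["doc1", "doc2"]
Y = [[1, 0], [1, 1]]
problems = binary_relevance(X, Y, ["sports", "politics"])
print(problems["politics"])
```

Each resulting dataset can be handed to any ordinary binary classifier. Binary relevance treats labels independently, which is the limitation that correlation-aware methods try to overcome.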


19.
Label noise can be a major problem in classification tasks, since most machine learning algorithms rely on data labels in their inductive process. Accordingly, various techniques for label noise identification have been investigated in the literature. The bias of each technique defines how suitable it is for each dataset. Moreover, while some techniques identify a large number of examples as noisy and have a high false positive rate, others are very restrictive and therefore unable to identify all noisy examples. This paper investigates how label noise detection can be improved by using an ensemble of noise filtering techniques. These filters, individual and ensemble, are experimentally compared. Another concern in this paper is the computational cost of ensembles, since, for a particular dataset, an individual technique can have the same predictive performance as an ensemble. In this case the individual technique should be preferred. To deal with this situation, this study also proposes the use of meta-learning to recommend the best filter for a new dataset. An extensive experimental evaluation of individual filters, ensemble filters and meta-learning was performed using public datasets with imputed label noise. The results show that ensembles of noise filters can improve noise filtering performance and that a recommendation system based on meta-learning can successfully recommend the best filtering technique for new datasets. A case study using a real dataset from the ecological niche modeling domain is also presented and evaluated, with the results validated by an expert.

20.
Zhang  Hongpo  Cheng  Ning  Zhang  Yang  Li  Zhanbo 《Applied Intelligence》2021,51(7):4503-4514

A label flipping attack is a poisoning attack that flips the labels of training samples to reduce the classification performance of the model. Robustness is used to measure the applicability of machine learning algorithms under adversarial attacks. The Naive Bayes (NB) algorithm is an anti-noise and robust machine learning technique. It shows good robustness when dealing with tasks such as document classification and spam filtering. Here we propose two novel label flipping attacks to evaluate the robustness of NB under label noise. For the three datasets Spambase, TREC 2006c and TREC 2007 in the spam classification domain, our attack goal is to increase the false negative rate of NB under the influence of label noise without affecting normal mail classification. Our evaluation shows that at a noise level of 20%, the false negative rate on Spambase and TREC 2006c increases by about 20%, and the test error on the TREC 2007 dataset increases to nearly 30%. We compared the classification accuracy of five classic machine learning algorithms (random forest (RF), support vector machine (SVM), decision tree (DT), logistic regression (LR), and NB) and two deep learning models (AlexNet, LeNet) under the proposed label flipping attacks. The experimental results show that the two kinds of label noise apply to various classification models and effectively reduce the accuracy of the models.
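
For readers unfamiliar with label flipping, here is a minimal sketch of the simplest baseline: flipping a uniformly random fraction of binary training labels. This is not either of the two attacks proposed in the paper, whose details the abstract does not give; the function name and the 0/1 label encoding are assumptions.

```python
import random


def flip_labels(labels, noise_level, seed=0):
    """Return a copy of binary (0/1) labels with a random fraction flipped.

    noise_level is the fraction of labels to flip, e.g. 0.2 for 20%.
    Uniform random selection is a baseline, not the paper's attack strategy.
    """
    rng = random.Random(seed)
    n_flip = int(round(noise_level * len(labels)))
    flipped = list(labels)
    for i in rng.sample(range(len(labels)), n_flip):
        flipped[i] = 1 - flipped[i]
    return flipped


clean = [0, 1] * 50
noisy = flip_labels(clean, 0.2)
print(sum(a != b for a, b in zip(clean, noisy)))
```

A targeted variant would restrict the flipped indices to one class, for example spam only, which is closer in spirit to the goal of raising the false negative rate described in the abstract.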

