Similar Literature
 20 similar records found (search time: 31 ms)
1.
Multilabel classification is a challenging research problem in which each instance may belong to more than one class. Recently, a considerable amount of research has been devoted to developing "good" multi-label learning methods. Despite this extensive effort, many scientific challenges, posed by, e.g., highly imbalanced training sets and correlation among labels, remain to be addressed. The aim of this paper is to use a heterogeneous ensemble of multi-label learners to simultaneously tackle both the sample-imbalance and label-correlation problems. This differs from existing work in that we propose to combine state-of-the-art multi-label methods by ensemble techniques, instead of focusing on ensemble techniques within a single multi-label learner. The proposed ensemble approach (EML) is applied to six publicly available multi-label data sets from various domains, including computer vision, biology, and text, using several evaluation criteria. We validate the advocated approach experimentally and demonstrate that it yields significant performance gains when compared with state-of-the-art multi-label methods.
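A minimal sketch of the kind of heterogeneous multi-label ensembling the abstract describes: several distinct multi-label learners are combined at the prediction level. The choice of base learners and the probability-averaging rule here are illustrative assumptions, not the paper's EML algorithm.

```python
# Hedged sketch: heterogeneous multi-label ensemble that averages the
# per-label probabilities of distinct base learners (EML itself may
# combine its members differently).
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

X, Y = make_multilabel_classification(n_samples=500, n_classes=5, n_labels=3, random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

# Three heterogeneous multi-label learners, each reduced to one-vs-rest.
bases = [
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
    OneVsRestClassifier(DecisionTreeClassifier(random_state=0)),
    OneVsRestClassifier(KNeighborsClassifier(n_neighbors=5)),
]
for clf in bases:
    clf.fit(X_tr, Y_tr)

# Average the per-label probabilities and threshold at 0.5.
proba = np.mean([clf.predict_proba(X_te) for clf in bases], axis=0)
Y_hat = (proba >= 0.5).astype(int)
print("micro-F1:", f1_score(Y_te, Y_hat, average="micro"))
```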

2.
Research on Semi-Supervised Sentiment Classification Methods Based on Ensemble Learning   Cited by: 1 (self: 0, others: 1)
Sentiment classification is the task of categorizing the sentiment polarity expressed in a text. This paper studies semi-supervised sentiment classification, which uses unlabeled samples to improve classification performance on top of a very small set of labeled samples. To strengthen semi-supervised learning, the paper proposes a consistent-label ensemble method that fuses two mainstream semi-supervised sentiment classification approaches: co-training over random feature subspaces and label propagation. First, the classifiers trained by the two semi-supervised methods label the unlabeled samples; second, the unlabeled samples whose labels agree are selected; finally, the selected samples are used to update the training model. Experimental results show that the method effectively reduces the mislabeling rate on unlabeled samples and thus achieves better classification performance than either semi-supervised method alone.
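A hedged sketch of the consistent-label selection loop the abstract outlines: two semi-supervised labelers tag the unlabeled pool, samples on whose labels they agree are promoted into the training set, and the model is retrained. sklearn's LabelPropagation and a single random-subspace logistic classifier stand in for the paper's components; data and thresholds are assumptions.

```python
# Hedged sketch: consistent-label selection with two semi-supervised labelers.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import LabelPropagation

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
labeled = rng.choice(len(y), size=60, replace=False)   # small seed set
mask = np.zeros(len(y), dtype=bool)
mask[labeled] = True

# Labeler 1: label propagation over the full feature space (-1 = unlabeled).
y_semi = np.where(mask, y, -1)
lp = LabelPropagation().fit(X, y_semi)
pred_lp = lp.transduction_

# Labeler 2: a classifier trained on a random feature subspace.
feats = rng.choice(X.shape[1], size=10, replace=False)
clf = LogisticRegression(max_iter=1000).fit(X[mask][:, feats], y[mask])
pred_rs = clf.predict(X[:, feats])

# Keep only unlabeled samples whose two pseudo-labels agree.
agree = (~mask) & (pred_lp == pred_rs)
X_new = np.vstack([X[mask], X[agree]])
y_new = np.concatenate([y[mask], pred_lp[agree]])
final = LogisticRegression(max_iter=1000).fit(X_new, y_new)
print("promoted", agree.sum(), "samples; train acc:", final.score(X_new, y_new))
```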

3.
When the training set contains noisy labels or a class-imbalanced distribution, deep neural networks tend to overfit such biased training data. Reweighting, i.e., designing suitable sample weights, is a common remedy, but an ill-chosen reweighting scheme introduces extra overhead and bias into network learning, and reweighting alone is rarely enough to cure overfitting under biased distributions. This paper therefore combines label-smoothing regularization and class-margin regularization with reweighting, and proposes a meta-learning method based on adaptive reweighting and regularization (ensemble meta net, EMN). The framework comprises a base network for classification and an ensemble meta-net for hyperparameter estimation. The base network first produces per-sample losses; three meta-learners then estimate, in an ensemble fashion, the hyperparameters of adaptive reweighting and regularization from these loss values; finally, the three hyperparameters are used to compute an ensemble meta-loss that updates the base network, improving its performance on data sets with biased distributions. Experiments show that EMN achieves higher accuracy than competing methods on the CIFAR and OCTMNIST data sets, and a strategy-correlation analysis confirms the effectiveness of the individual strategies.
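A heavily simplified sketch of the combined objective the abstract names: per-sample reweighted cross-entropy with label smoothing plus a class-margin penalty. The ensemble meta-net that adaptively estimates the three hyperparameters is omitted; fixed values and all names below are assumptions.

```python
# Hedged sketch: reweighted cross-entropy + label smoothing + margin penalty.
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def emn_style_loss(logits, y, sample_w, eps=0.1, lam=0.01):
    """Reweighted CE with label smoothing plus a hinge-style margin term."""
    n, k = logits.shape
    # Label smoothing: soften one-hot targets toward the uniform distribution.
    target = np.full((n, k), eps / k)
    target[np.arange(n), y] += 1.0 - eps
    logp = np.log(softmax(logits) + 1e-12)
    ce = -(target * logp).sum(axis=1)
    # Class-margin term: penalize a small gap between the true-class logit
    # and the best competing logit.
    true = logits[np.arange(n), y]
    masked = logits.copy()
    masked[np.arange(n), y] = -np.inf
    margin_pen = np.maximum(0.0, 1.0 - (true - masked.max(axis=1)))
    return np.mean(sample_w * (ce + lam * margin_pen))

logits = np.random.RandomState(0).randn(8, 3)
y = np.array([0, 1, 2, 0, 1, 2, 0, 1])
w = np.ones(8)   # in EMN these weights would be estimated adaptively
print(emn_style_loss(logits, y, w))
```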

4.
Many applications face the problem of learning from multiple information sources, where sources may be labeled or unlabeled, and information from multiple sources may be beneficial yet cannot be integrated into a single source for learning. In this paper, we propose an ensemble learning method for different labeled and unlabeled sources. We first present two label propagation methods to infer the labels of training objects from unlabeled sources by making full use of class label information from labeled sources and internal structure information from unlabeled sources, processes referred to as global consensus and local consensus, respectively. We then predict the labels of testing objects using the ensemble learning model over the multiple information sources. Experimental results show that our method outperforms two baseline methods. Moreover, our method scales better to large information sources and is more robust to labeled sources with noisy data.

5.
Ensemble learning has attracted considerable attention owing to its good generalization performance. The main issues in constructing a powerful ensemble are training a set of diverse and accurate base classifiers and combining them effectively. The ensemble margin, computed as the difference between the number of votes received by the correct class and the largest number of votes received by any other class, is widely used to explain the success of ensemble learning. This definition of the ensemble margin does not consider the classification confidence of the base classifiers. In this work, we explore the influence of the classification confidence of the base classifiers in ensemble learning and obtain some interesting conclusions. First, we extend the definition of the ensemble margin based on the classification confidence of the base classifiers. Then, an optimization objective is designed to compute the weights of the base classifiers by minimizing the margin-induced classification loss. Several strategies for utilizing the classification confidences and the weights are examined. We observe that weighted voting based on classification confidence is better than simple voting if all the base classifiers are used. In addition, ensemble pruning can further improve the performance of a weighted voting ensemble. We also compare the proposed fusion technique with some classical algorithms. The experimental results confirm the effectiveness of weighted voting with classification confidence.
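A small sketch contrasting the classical hard-vote ensemble margin with a confidence-based variant, as the abstract describes: each base classifier contributes its predicted probability instead of a hard vote. The weight-optimization and pruning steps of the paper are not reproduced; data below are synthetic.

```python
# Hedged sketch: hard-vote ensemble margin vs. confidence-weighted margin.
import numpy as np

def hard_margin(votes, y, n_classes):
    """votes: (n_classifiers, n_samples) hard predictions."""
    n = votes.shape[1]
    counts = np.zeros((n, n_classes))
    for pred in votes:
        counts[np.arange(n), pred] += 1
    correct = counts[np.arange(n), y]
    counts[np.arange(n), y] = -np.inf          # mask out the correct class
    return (correct - counts.max(axis=1)) / votes.shape[0]

def soft_margin(probas, y):
    """probas: (n_classifiers, n_samples, n_classes) confidences."""
    avg = probas.mean(axis=0)                   # confidence-weighted vote
    n = avg.shape[0]
    correct = avg[np.arange(n), y]
    masked = avg.copy()
    masked[np.arange(n), y] = -np.inf
    return correct - masked.max(axis=1)

rng = np.random.RandomState(0)
probas = rng.dirichlet(np.ones(3), size=(5, 10))  # 5 classifiers, 10 samples
y = rng.randint(0, 3, size=10)
print(hard_margin(probas.argmax(axis=2), y, 3))
print(soft_margin(probas, y))
```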

6.
In pattern recognition, instance-based learning (also known as the nearest neighbor rule) has become increasingly popular and can yield excellent performance. In instance-based learning, however, the storage required for the training set grows with the number of training instances. Moreover, classifying a new, unseen instance takes a long time, because all training instances have to be considered when determining the 'nearness' or 'similarity' among instances. This study presents a novel reduced classification method for instance-based learning based on the gray relational structure. Here, only some training instances in the original training set are adopted for the pattern classification tasks. The relationships among instances are first determined according to the gray relational structure. In this relational structure, the inward edges of each training instance, indicating how many times the instance is considered as a nearest neighbor in determining the class labels of other instances, can be obtained. The method excludes training instances with no or few inward edges from the pattern classification tasks. With the proposed instance pruning approach, new instances can be classified using only a few training instances. Nine data sets are adopted to demonstrate the performance of the proposed learning approach. Experimental results indicate that classification accuracy can be maintained even when most of the training instances are pruned before learning. Additionally, the number of training instances retained by the proposal presented here is comparable to that of other existing instance pruning techniques.
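A hedged sketch of the inward-edge pruning idea: count how often each training instance appears among the nearest neighbors of the others, then drop rarely-referenced instances. A plain Euclidean k-NN graph stands in for the gray relational structure used in the paper, and the pruning threshold is an assumption.

```python
# Hedged sketch: prune training instances with few inward edges.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import NearestNeighbors, KNeighborsClassifier

X, y = load_iris(return_X_y=True)
k = 3
# k+1 neighbors because each point is typically its own nearest neighbor.
nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
_, idx = nbrs.kneighbors(X)
inward = np.bincount(idx[:, 1:].ravel(), minlength=len(X))  # in-degree per instance

keep = inward >= 2            # exclude instances with no or few inward edges
clf = KNeighborsClassifier(n_neighbors=k).fit(X[keep], y[keep])
print(f"kept {keep.sum()}/{len(X)} instances, acc on full set:", clf.score(X, y))
```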

7.
Partial label learning is a recently proposed weakly supervised machine learning framework. Because it relaxes the requirements on how the training set is constructed (only a candidate set containing the true label of each training sample needs to be known), it can conveniently handle practical problems in many domains. Under this framework, the label information of the training data is no longer unique and unambiguous, which makes constructing learning algorithms harder than in traditional classification; so far, only a few algorithms oriented toward small-scale training data have been established. This paper first uses ECOC techniques to convert the original partially labeled training set into several standard binary classification data sets, and then builds a binary classification algorithm with low computational complexity on each of them based on a variational Gaussian process model, ultimately yielding a fast kernel partial-label learning algorithm for large-scale data. Simulation results show that, with almost the same prediction accuracy, the proposed algorithm requires far less training time than existing kernel partial-label learning algorithms; on an ordinary PC, problems with sample sizes on the order of one million can be processed in only 40 minutes.

8.
An Incremental Bayesian Classification Model   Cited by: 40 (self: 0, others: 40)
Classification has long been a core problem in machine learning, pattern recognition, and data mining. When learning classification knowledge from massive data, especially when obtaining large numbers of class-labeled samples is expensive, incremental learning is an effective approach. This paper applies the naive Bayes method to incremental classification, proposes an incremental Bayesian learning model, and presents the incremental Bayesian inference process, including incrementally revising the classifier parameters and incrementally classifying test samples. Experimental results show that the algorithm is feasible and effective.
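A hedged illustration of the incremental idea: naive Bayes keeps class priors and per-feature likelihoods as counts, so a new labeled batch only updates the counts rather than retraining from scratch. sklearn's MultinomialNB exposes exactly this through partial_fit; this is a stand-in, not the paper's model.

```python
# Hedged sketch: incremental naive Bayes via count updates (partial_fit).
import numpy as np
from sklearn.naive_bayes import MultinomialNB

rng = np.random.RandomState(0)
classes = np.array([0, 1])
nb = MultinomialNB()

# The first batch establishes the model ...
X1 = rng.poisson(3, size=(50, 10)); y1 = rng.randint(0, 2, 50)
nb.partial_fit(X1, y1, classes=classes)
# ... later batches revise the counts incrementally, no full retrain.
X2 = rng.poisson(3, size=(30, 10)); y2 = rng.randint(0, 2, 30)
nb.partial_fit(X2, y2)

print(nb.predict(rng.poisson(3, size=(3, 10))))
```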

9.
Many real-world problems require multilabel classification, in which each training instance is associated with a set of labels. There are many existing learning algorithms for multilabel classification; however, these algorithms assume implicit negativity, where missing labels in the training data are automatically assumed to be negative. Additionally, many of the existing algorithms do not handle incremental learning, in which new labels could be encountered later in the learning process. A novel multilabel adaptation of the backpropagation algorithm is proposed that does not assume implicit negativity. In addition, this algorithm can, using a naïve Bayesian approach, infer missing labels in the training data. The algorithm can also be trained incrementally, as it dynamically considers new labels. This solution is compared with existing multilabel algorithms using data sets from multiple domains, and the performance is measured with standard multilabel evaluation metrics. It is shown that our algorithm improves classification performance for all metrics by an overall average of 7.4% when at least 40% of the labels are missing from the training data, and by 18.4% when at least 90% of the labels are missing.
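A small sketch of what dropping the implicit-negativity assumption means in a loss function: missing labels are encoded as NaN and simply excluded from the per-label loss, instead of being treated as negatives. The paper's backpropagation variant and its Bayesian label inference are not reproduced; everything below is illustrative.

```python
# Hedged sketch: binary cross-entropy over observed labels only.
import numpy as np

def masked_bce(scores, Y):
    """Y may contain NaN for labels that were never observed."""
    p = 1.0 / (1.0 + np.exp(-scores))
    obs = ~np.isnan(Y)                      # mask of observed labels
    t = np.where(obs, Y, 0.0)
    ll = t * np.log(p + 1e-12) + (1 - t) * np.log(1 - p + 1e-12)
    return -(ll * obs).sum() / obs.sum()    # missing entries contribute nothing

Y = np.array([[1.0, np.nan, 0.0],
              [np.nan, 1.0, np.nan]])       # NaN = missing, not negative
scores = np.array([[2.0, -1.0, -2.0],
                   [0.5, 1.5, -0.5]])
print(masked_bce(scores, Y))
```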

10.
NeC4.5: neural ensemble based C4.5   Cited by: 5 (self: 0, others: 5)
Decision trees offer good comprehensibility, while neural network ensembles have strong generalization ability. These merits are integrated into a novel decision tree algorithm, NeC4.5. The algorithm first trains a neural network ensemble. The trained ensemble is then employed to generate a new training set by replacing the desired class labels of the original training examples with the outputs of the trained ensemble. Some extra training examples are also generated from the trained ensemble and added to the new training set. Finally, a C4.5 decision tree is grown from the new training set. Since its learning results are decision trees, the comprehensibility of NeC4.5 is better than that of a neural network ensemble. Moreover, experiments show that the generalization ability of NeC4.5 decision trees can be better than that of C4.5 decision trees.
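A hedged sketch of the NeC4.5 recipe with sklearn stand-ins: a bagged ensemble of small neural networks relabels the training set (plus some extra synthetic points), and a single decision tree, CART here since sklearn has no C4.5, is grown on the relabeled data. The jittering used to create extra examples is an assumption.

```python
# Hedged sketch: train ensemble -> relabel training set -> grow one tree.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, random_state=0)

ens = BaggingClassifier(MLPClassifier(hidden_layer_sizes=(16,), max_iter=500),
                        n_estimators=5, random_state=0).fit(X, y)

# Replace original labels with the ensemble's outputs ...
y_new = ens.predict(X)
# ... and add extra examples labeled by the ensemble (jittered copies here).
X_extra = X + 0.1 * np.random.RandomState(0).randn(*X.shape)
X_all = np.vstack([X, X_extra])
y_all = np.concatenate([y_new, ens.predict(X_extra)])

tree = DecisionTreeClassifier(random_state=0).fit(X_all, y_all)
print("fidelity to ensemble labels:", tree.score(X, ens.predict(X)))
```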

11.
MILES: multiple-instance learning via embedded instance selection   Cited by: 4 (self: 0, others: 4)
Multiple-instance problems arise from situations where training class labels are attached to sets of samples (named bags) instead of to the individual samples within each bag (called instances). Most previous multiple-instance learning (MIL) algorithms are developed on the assumption that a bag is positive if and only if at least one of its instances is positive. Although this assumption works well in drug activity prediction, it is rather restrictive for other applications, especially those in computer vision. We propose a learning method, MILES (multiple-instance learning via embedded instance selection), which converts the multiple-instance learning problem to a standard supervised learning problem without imposing an assumption relating instance labels to bag labels. MILES maps each bag into a feature space defined by the instances in the training bags via an instance similarity measure. This feature mapping often produces a large number of redundant or irrelevant features, so a 1-norm SVM is applied to select important features and construct classifiers simultaneously. We have performed extensive experiments. In comparison with other methods, MILES demonstrates competitive classification accuracy, high computational efficiency, and robustness to labeling uncertainty.
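A hedged sketch of the MILES-style mapping: each bag becomes a feature vector of its maximum similarity to every training instance, and an L1-penalized linear SVM then selects the few informative instance-features. The Gaussian similarity, kernel width, and toy data are assumptions; this is not the authors' implementation.

```python
# Hedged sketch: bag -> instance-similarity embedding -> 1-norm SVM.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
# Toy bags: positive bags contain at least one instance near (3, 3).
bags, labels = [], []
for i in range(40):
    inst = rng.randn(rng.randint(3, 7), 2)
    if i % 2 == 0:
        inst[0] += 3.0
    bags.append(inst)
    labels.append(i % 2 == 0)
labels = np.array(labels, dtype=int)

# All training instances define the embedding space.
all_inst = np.vstack(bags)

def embed(bag, sigma=1.0):
    # s(bag, x_k) = max over the bag's instances of a Gaussian similarity to x_k.
    d2 = ((bag[:, None, :] - all_inst[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma ** 2).max(axis=0)

F = np.array([embed(b) for b in bags])
# 1-norm SVM: feature selection and classification in one step.
svm = LinearSVC(penalty="l1", dual=False, C=1.0, max_iter=5000).fit(F, labels)
print("selected features:", int((svm.coef_ != 0).sum()),
      "train acc:", svm.score(F, labels))
```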

12.
The CCDM 2014 data mining competition, based on medical diagnosis data, posed a multi-label problem and a multi-class classification problem of the kind that arises widely in practice. To address the class imbalance and the small number of training samples in both problems, and drawing on the ideas of secondary learning and ensemble learning, a new learning framework, secondary ensemble learning, is proposed. The framework first performs ensemble learning to obtain a number of high-confidence samples, adds them to the original training set, and then learns a second time on the new training set, yielding a classifier with better generalization. The competition results show that, compared with conventional ensemble learning, secondary ensemble learning achieved very good results on both problems.
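A hedged sketch of the two-stage idea described above: a first ensemble labels the test pool, its most confident predictions are appended (pseudo-labeled) to the training set, and a second ensemble is trained on the enlarged set. The base learner, confidence threshold, and data are assumptions.

```python
# Hedged sketch: secondary ensemble learning via high-confidence promotion.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, weights=[0.85, 0.15], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Stage 1: ordinary ensemble learning.
first = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
proba = first.predict_proba(X_te)
conf = proba.max(axis=1)

# Promote high-confidence samples, pseudo-labeled, into the training set.
sel = conf >= 0.9
X_aug = np.vstack([X_tr, X_te[sel]])
y_aug = np.concatenate([y_tr, proba[sel].argmax(axis=1)])

# Stage 2: retrain on the enlarged set.
second = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_aug, y_aug)
print("stage-1 acc:", first.score(X_te, y_te),
      "stage-2 acc:", second.score(X_te, y_te))
```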

13.
In this paper, a novel spectral-spatial hyperspectral image classification method is proposed based on a hierarchical subspace-switch ensemble learning algorithm. First, the hyperspectral images are processed by fast bilateral filtering to obtain spatial features; the spectral and spatial features are combined to form the initial feature set. Second, hierarchical instance learning based on an iterative means-clustering method is designed to obtain a hierarchical instance space. Third, the random subspace method (RSM) is used to sample the features and samples, forming multiple sub-sample sets. After that, semi-supervised learning (S2L) is applied to choose test samples to improve classification performance without touching the class labels. Then, micro-noise linear dimension reduction (mNLDR) is used for dimension reduction. Afterwards, ensemble multiple-kernel SVMs (EMK_SVM) are used for stable classification results. Finally, the final classification results are obtained by combining the classification results with a voting strategy. Experimental results on real hyperspectral scenes demonstrate that the proposed method noticeably improves classification performance.

14.
PCBoost: A New Learning Algorithm for Imbalanced Data   Cited by: 8 (self: 0, others: 8)
Imbalanced data are ubiquitous in the real world, and their classification is a hot topic in machine learning research. Most traditional classification algorithms assume a balanced class distribution or equal misclassification costs and therefore perform poorly on imbalanced data. This paper proposes an imbalanced-data classification algorithm, PCBoost. The algorithm builds decision trees with the information gain ratio as the splitting criterion and uses them as weak classifiers. At the start of each iteration, synthetic minority-class examples are added by a data synthesis method to balance the training information; after each sub-classifier is formed, the 'perturbation' is corrected by deleting the synthetic examples that were not classified correctly. The paper discusses the data synthesis method, gives a theoretical analysis of the training error bound, and analyzes the choice of ensemble learning parameters. Experimental results show that PCBoost is well suited to classifying imbalanced data.
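A hedged, simplified sketch of the PCBoost loop: before each round, synthetic minority examples (interpolations between minority pairs) are added; after the round's tree is fit, synthetic points it misclassifies are discarded. The gain-ratio splitting and the sample-weight updates of a real boosting algorithm are simplified away; CART stands in for the paper's trees.

```python
# Hedged sketch: boosting-style loop with synthesize-then-correct steps.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=400, weights=[0.9, 0.1], random_state=0)
minority = 1
trees = []
X_syn = np.empty((0, X.shape[1]))
y_syn = np.empty(0, dtype=int)

for _ in range(5):
    # Synthesize minority samples by interpolating random minority pairs.
    pos = X[y == minority]
    i, j = rng.randint(0, len(pos), size=(2, 30))
    lam = rng.rand(30, 1)
    new = pos[i] + lam * (pos[j] - pos[i])
    X_syn = np.vstack([X_syn, new])
    y_syn = np.concatenate([y_syn, np.full(30, minority)])

    X_round = np.vstack([X, X_syn])
    y_round = np.concatenate([y, y_syn])
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_round, y_round)
    trees.append(tree)

    # "Correct the perturbation": drop synthetic samples this tree misclassifies.
    ok = tree.predict(X_syn) == y_syn
    X_syn, y_syn = X_syn[ok], y_syn[ok]

votes = np.mean([t.predict(X) for t in trees], axis=0)
print("ensemble minority recall:",
      ((votes >= 0.5) & (y == 1)).sum() / (y == 1).sum())
```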

15.
Question classification is a key technology in community question-answering systems: it analyzes a user's natural-language question and returns a precise, appropriate question category. For question classification in online communities, where category labels are numerous (>1,000), somewhat hierarchical, and subject to temporal evolution, question classification algorithms for two different flow granularities are proposed, and a hierarchical ensemble learning method over data sets from different time periods improves classification accuracy and efficiency. Meanwhile, addressing the excessive number of labels in a single classification pass, which affects the feature set…

16.
Supervised neural-network learning algorithms have proven very successful at solving a variety of learning problems. However, they suffer from a common problem: they require explicit output labels. This requirement makes such algorithms implausible as biological models. In this paper, it is shown that pattern classification can be achieved, in a multilayered feedforward neural network, without explicit output labels, by a process of supervised self-coding. The class projection is achieved by optimizing appropriate within-class uniformity and between-class discernability criteria. The mapping function and the class labels are developed together, iteratively, using the derived self-coding backpropagation algorithm. The ability of the self-coding network to generalize on unseen data is also evaluated experimentally on real data sets, and it compares favorably with traditional labeled supervision of neural networks. Interesting features emerge from the proposed self-coding supervision, however, that are absent in conventional approaches. The further implications of supervised self-coding with neural networks are also discussed.

17.
Many existing inductive learning systems have been developed under the assumption that the learning tasks are performed in a noise-free environment. To cope with most real-world problems, it is important that a learning system be equipped with the capability to handle uncertainty. In this paper, we first identify the various sources of uncertainty that may be encountered in a noisy problem domain. Next, we present a method for the efficient acquisition of classification rules from training instances which may contain inconsistent, incorrect, or missing information. This algorithm consists of three phases: (i) the detection of inherent patterns in a set of noisy training data; (ii) the construction of classification rules based on these patterns; and (iii) the use of these rules to predict the class membership of an object. The method has been implemented in a system known as APACS (automatic pattern analysis and classification system). This system has been tested using both real-life and simulated data, and its performance is found to be superior to many existing systems in terms of efficiency and classification accuracy. Being able to handle uncertainty in the learning process, the proposed algorithm can be employed for applications in real-world problem domains involving noisy data.

18.
To address the low classification accuracy of image classification methods caused by convolutional neural networks extracting incomplete feature information, this paper builds a convolutional neural network model framework using deep learning and proposes an image classification method based on iterative training and ensemble learning. Data augmentation is used to preprocess the image data sets. For feature extraction, the convolutional neural network is trained iteratively to obtain sufficient, effective image features; for classifier training, the ensemble-learning idea from machine learning is adopted. Classifiers are trained after feature extraction and are assigned different weights according to the size of each classifier's contribution, achieving better performance than any single classifier and improving image classification accuracy. Experimental results on the Stanford Dogs, UEC FOOD-100, and CIFAR-100 data sets demonstrate the method's good classification performance.

19.
A Novel Multi-Label Lazy Learning Algorithm   Cited by: 6 (self: 0, others: 6)
In the multi-label learning framework, each example is represented by a single instance and is associated with multiple concept labels simultaneously. Existing multi-label lazy learning algorithms do not fully exploit the correlations among an example's multiple labels, so their generalization performance suffers to some extent. To address this, a novel multi-label lazy learning algorithm, IMLLA, is proposed. For a test example, the algorithm first identifies its nearest neighbors in the training set with respect to each concept class, then constructs a label counting vector from the neighbors' multi-label information and feeds it to a trained linear classifier for prediction. Because IMLLA uses information embodied in the other concept classes when predicting each concept class, it fully exploits the correlations among the multiple labels. Experiments on artificial and real-world data sets show that IMLLA significantly outperforms commonly used multi-label learning algorithms.
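A hedged sketch of the IMLLA idea: for each sample, count how its k nearest neighbors vote on every label, then feed that label-counting vector to a linear model trained per label, so the prediction for one label can use information from the others. The data, k, and the per-label logistic models are illustrative assumptions.

```python
# Hedged sketch: neighbor label-count vectors + per-label linear classifiers.
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors

X, Y = make_multilabel_classification(n_samples=400, n_classes=5, random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

k = 7
nbrs = NearestNeighbors(n_neighbors=k).fit(X_tr)

def count_vector(Xq):
    """Per-label neighbor vote counts, shape (n_queries, n_labels)."""
    _, idx = nbrs.kneighbors(Xq)
    return Y_tr[idx].sum(axis=1)

C_tr, C_te = count_vector(X_tr), count_vector(X_te)
# One linear classifier per label, but trained on the *full* count vector,
# so correlations between labels enter each prediction.
preds = np.column_stack([
    LogisticRegression(max_iter=1000).fit(C_tr, Y_tr[:, l]).predict(C_te)
    for l in range(Y.shape[1])
])
print("hamming accuracy:", (preds == Y_te).mean())
```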

20.
To address the fact that many multi-instance algorithms make assumptions about the instances in positive bags, a multi-instance ensemble algorithm combined with fuzzy clustering (ISFC) is proposed. Combining fuzzy clustering with the characteristics of negative bags in multi-instance learning, the concept of a "positive score" is introduced to measure the likelihood that an instance's label is positive, reducing the ambiguity of instance labels in multi-instance learning. Considering that misclassifying negative instances carries a higher cost in multi-instance learning, a representative-instance selection strategy for bags is designed; the selected representative…
