Similar Literature
20 similar documents found (search time: 15 ms)
1.
Some recent successful semi-supervised learning methods construct more than one learner from both labeled and unlabeled data for inductive learning. This paper proposes a novel multiple-view multiple-learner (MVML) framework for semi-supervised learning, which differs from previous methods in that it has both multiple views and multiple learners. The method adopts a co-training style learning paradigm to enlarge the labeled data from a much larger set of unlabeled data. To the best of our knowledge, it is the first attempt to combine the advantages of multiple-view learning and ensemble learning for semi-supervised learning. Using multiple views promises better performance than single-view learning because information is exploited more effectively. At the same time, because an ensemble of classifiers is learned from each view, predictions can be more accurate than those of a single classifier from the same view. Experiments on different applications involving both multiple-view and single-view data sets show encouraging results for the proposed MVML method.
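
The co-training style loop mentioned in this abstract can be illustrated with a minimal sketch. It is not the MVML method itself: it assumes exactly two views, uses a hypothetical nearest-centroid classifier in place of an ensemble, and takes the margin between the two nearest centroids as the confidence score. All function names are illustrative.

```python
from math import dist

def centroids(points, labels):
    """Mean vector per class for one view."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(cents, p):
    """Return (label, confidence). Confidence (an assumption of this sketch)
    is the gap between the distances to the two nearest centroids."""
    ranked = sorted((dist(p, c), y) for y, c in cents.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else 0.0
    return ranked[0][1], margin

def co_train(view_a, view_b, labels, unl_a, unl_b, rounds=5):
    """In each round, each view's learner labels its single most confident
    unlabeled example, which then joins the labeled pool of both views."""
    lab_a, lab_b, ys = list(view_a), list(view_b), list(labels)
    pool = list(range(len(unl_a)))  # indices of still-unlabeled examples
    for _ in range(rounds):
        for lab_view, unl_view in ((lab_a, unl_a), (lab_b, unl_b)):
            if not pool:
                return lab_a, lab_b, ys
            cents = centroids(lab_view, ys)
            best = max(pool, key=lambda i: predict(cents, unl_view[i])[1])
            label, _ = predict(cents, unl_view[best])
            lab_a.append(unl_a[best])
            lab_b.append(unl_b[best])
            ys.append(label)
            pool.remove(best)
    return lab_a, lab_b, ys
```

Replacing the single classifier per view with an ensemble per view would move this sketch closer to the multiple-learner idea the abstract describes.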

2.
Image annotation is posed as a multi-class classification problem. Pursuing higher accuracy is a long-standing but still active challenge in the field of image annotation. To further improve the accuracy of image annotation, we propose a multi-view multi-label (abbreviated as MVML) learning algorithm, which takes multiple features (i.e., views) and ensemble learning into account simultaneously. By doing so, we make full use of the complementarity among the views and among the base learners of the ensemble, leading to higher annotation accuracy. To handle different distributions of positive and negative training examples, we propose two versions of MVML: a Boosting version and a Bagging version. The former is suitable for learning over balanced examples, while the latter applies to the opposite scenario. In addition, the weights of the base learners are evaluated on validation data instead of training data, which improves the generalization ability of the final ensemble classifiers. The experimental results show that MVML is superior to a single-view ensemble SVM.

3.
Multi-view learning for classification has achieved remarkable performance compared with single-view methods. Inspired by instance-based learning, which directly regards the instance as the prior and preserves the valuable information in different instances, a Multi-view Instance Attention Fusion Network (MvIAFN) is proposed to efficiently exploit the correlation across both instances and views. Specifically, a small number of instances from different views are first sampled as the set of templates. Given an additional instance, it can be re-represented through an attention strategy based on the similarities between it and the selected templates. Thanks to this strategy, the given instance preserves additional information from the selected instances, which achieves the goal of extracting the instance correlation. Additionally, for each sample, we not only perform instance attention within each single view but also compute attention across multiple views, allowing us to fuse them into a fused attention for each view. Experimental results on several datasets substantiate the effectiveness of the proposed method compared with state-of-the-art methods.
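
As a rough illustration of re-representing an instance by attention over a set of templates, the sketch below weights each template by a softmax over similarity scores. The abstract does not specify the similarity function or the fusion step, so cosine similarity, the `temperature` parameter, and the function names are assumptions of this example rather than details of MvIAFN.

```python
from math import exp, sqrt

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(instance, templates, temperature=1.0):
    """Return (attention weights, attention-weighted mix of the templates)."""
    weights = softmax([cosine(instance, t) / temperature for t in templates])
    dim = len(templates[0])
    rep = [sum(w * t[i] for w, t in zip(weights, templates)) for i in range(dim)]
    return weights, rep
```

Applying `attend` once per view, and once more across views, would give the per-view and cross-view attention vectors that the abstract says are fused afterwards.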

4.
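Li Yanchao, Xiao Fu, Chen Zhi, Li Bo. Journal of Software (《软件学报》), 2020, 31(12): 3808-3822
Active learning selects samples from a large pool of unlabeled samples and hands them to experts for labeling. Existing batch-mode active learning algorithms are subject to three main limitations: (1) some methods rely on a single selection criterion or make assumptions about the data or the model, which makes it hard to find unlabeled samples that are both uncertain and representative; (2) the performance of existing batch-mode methods depends heavily on the accuracy of the similarity measure between samples, such as a predefined function or a dissimilarity measure; (3) noisy labels have long degraded the performance of batch-mode active learning algorithms. This paper proposes a batch-mode active learning method based on deep learning. A deep neural network generates learned representations of labeled and unlabeled samples, and a label cycle mode links labeled samples to unlabeled samples and then back to labeled samples with the same label. In this way, both the uncertainty and the representativeness of samples are taken into account, and the algorithm is robust to noisy labels. In the proposed method, a submodular function ensures that the selected sample set is diverse. In addition, optimizing adaptive parameters allows the algorithm to balance uncertainty and representativeness automatically. The proposed method is applied to semi-supervised classification and semi-supervised clustering, and the experimental results show that it outperforms several existing state-of-the-art methods.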

5.
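A clustering-based PU active text classification method   Cited by: 1 (self-citations: 0, citations by others: 1)
Liu Lu, Peng Tao, Zuo Wanli, Dai Yaokang. Journal of Software (《软件学报》), 2013, 24(11): 2571-2583
Text classification is one of the key problems in information retrieval. Extracting more reliable negative examples and constructing an accurate and efficient classifier are the two important problems in PU (positive and unlabeled) text classification. However, many existing methods extract only a small number of reliable negative examples, and the quality of the resulting classifiers leaves room for improvement. This paper presents a clustering-based semi-supervised active classification method that addresses both steps. Unlike traditional negative-example extraction methods, it uses clustering, together with the property that positive documents should share as few feature terms as possible with negative documents, to remove as many positive examples as possible from the unlabeled data set and thereby obtain more reliable negative examples. The classifier is built by combining SVM active learning with an improved Rocchio method, and features are extracted with an improved TFIDF (term frequency-inverse document frequency), which significantly improves classification accuracy. The classification results were tested on three data sets (RCV1, Reuters-21578, and 20 Newsgroups). The experimental results show that finding reliable negative examples through clustering yields more of them while keeping the error rate low, and that introducing active learning also significantly improves classification accuracy.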

6.
We address the problem of metric learning for multi-view data. Many metric learning algorithms have been proposed, but most of them focus on single-view settings, and only a few deal with multi-view data. In this paper, motivated by the co-training framework, we propose an algorithm-independent framework, named co-metric, to learn Mahalanobis metrics in multi-view settings. In its implementation, an off-the-shelf single-view metric learning algorithm is used to learn metrics in the individual views from a few labeled examples. Then the most confidently labeled examples chosen from the unlabeled set are used to guide metric learning in the next iteration. This procedure is repeated until a stopping criterion is met. The framework can accommodate most existing metric learning algorithms, whether they use side information or example labels. In addition, it can naturally handle semi-supervised settings with more than two views. Our comparative experiments demonstrate its competitiveness and effectiveness.

7.
In classification tasks, active learning is often used to select a set of informative examples from a large unlabeled dataset. The objective is to learn a classification pattern that can accurately predict the labels of new examples from a selection that is expected to contain as few examples as possible. Selecting informative examples also reduces the manual labeling effort, data complexity, and data redundancy, and thus improves learning efficiency. In this paper, a new pool-based active learning strategy, called inconsistency-based active learning, is proposed. The strategy builds on two classical works: (1) the learning philosophy of the query-by-committee (QBC) algorithm; and (2) the structure of the traditional concept learning model, the general-to-specific (GS) ordering. By constructing two extreme hypotheses of the current version space, the strategy evaluates unlabeled examples with a new sample selection criterion, the inconsistency value, and the whole learning process can be carried out without any additional knowledge. Furthermore, since active learning is often applied to the support vector machine (SVM) and its related applications, the strategy is specialized into an algorithm called inconsistency-based active learning for SVM (I-ALSVM). By building a GS structure, the sample selection process in our strategy is formed by searching through the initial version space. We compare the proposed I-ALSVM with several other pool-based methods for SVM on selected datasets. The experimental results show that, in terms of generalization capability, our model exhibits good feasibility and competitiveness.

8.
Many data mining applications have a large amount of data but labeling data is often difficult, expensive, or time consuming, as it requires human experts for annotation. Semi-supervised learning addresses this problem by using unlabeled data together with labeled data to improve the performance. Co-Training is a popular semi-supervised learning algorithm that has the assumptions that each example is represented by two or more redundantly sufficient sets of features (views) and additionally these views are independent given the class. However, these assumptions are not satisfied in many real-world application domains. In this paper, a framework called Co-Training by Committee (CoBC) is proposed, in which an ensemble of diverse classifiers is used for semi-supervised learning that requires neither redundant and independent views nor different base learning algorithms. The framework is a general single-view semi-supervised learner that can be applied on any ensemble learner to build diverse committees. Experimental results of CoBC using Bagging, AdaBoost and the Random Subspace Method (RSM) as ensemble learners demonstrate that error diversity among classifiers leads to an effective Co-Training style algorithm that maintains the diversity of the underlying ensemble.

9.
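To address the insufficient diversity among classifiers and the scarcity of labeled samples in ensemble learning, a new semi-supervised ensemble learning algorithm is proposed. It introduces semi-supervised methods into ensemble learning, uses the information in a large number of unlabeled samples to refine each base classifier, and constructs base classifiers with greater diversity. First, suitable unlabeled samples are selected with a multi-view method, which is also used to partition the large and complex set of feature attributes into views. A different dimensionality reduction method is then applied to each view so that the views can be fed into the learning models, and mutually independent learning models are used to increase the diversity of the ensemble. Experimental results on UCI data sets show that using multi-view data yields more accurate classification than using single-view data, and that, compared with existing algorithms such as Boosting and tri-training, using more diverse base learners and introducing semi-supervised methods effectively improves the performance of ensemble learning.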

10.
Using a style-based ant colony system for adaptive learning   Cited by: 1 (self-citations: 0, citations by others: 1)
Adaptive learning provides an alternative to the traditional “one size fits all” approach and has driven the development of teaching and learning towards a dynamic learning process. Therefore, exploring adaptive paths that suit learners' personalized needs is an interesting issue. This paper proposes an extended ant colony optimization approach, based on a recent metaheuristic method for discovering group patterns, that is designed to help learners advance their online learning along an adaptive learning path. The investigation emphasizes the relationship between the learning content and the learning style of each participant in adaptive learning. An adaptive learning rule was developed to identify how learners of different learning styles may associate the contents that have a higher probability of being useful in order to form an optimal learning path. A style-based ant colony system is implemented, and its algorithm parameters are optimized to conform to the actual pedagogical process. A survey was also conducted to evaluate the validity and efficiency of the system in producing adaptive paths for different learners. The results reveal that both the learners and the lecturers agree that the style-based ant colony system is able to provide useful supplementary learning paths.

11.
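Traditional single-view methods struggle to achieve good classification performance on multi-view samples that come from different scenes and take different forms, so multi-view learning has become a popular and widely studied research topic in recent years. In multi-view learning, a special phenomenon may arise: the difference between samples of the same class from different views is larger than the difference between samples of different classes from the same view. This poses a major challenge and degrades the performance of multi-view learning. To address it, a shared latent space is first constructed with the Parzen window technique and combined with the original space to obtain an extended space in which multi-view learning is carried out, which copes well with this phenomenon. Then, based on the support vector machine (SVM), a new multi-view learning method is proposed, namely a multi-view SVM based on a shared latent space. Finally, experiments on synthetic and real multi-view data sets verify that the proposed method performs well in the face of this challenge.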

12.
Predictive maintenance is a type of condition-based maintenance that assesses the equipment's state and estimates its failure probability and when maintenance should be performed. Although machine learning techniques have frequently been applied in this area, existing studies disregard the natural order among the target attribute values of the historical sensor data. These methods therefore lose the inherent order of the data, which positively affects prediction performance. To deal with this problem, a novel approach, named Ordinal Multi-dimensional Classification (OMDC), is proposed for estimating the conditions of a hydraulic system's four components by taking into account the natural order of class values. To demonstrate the prediction ability of the proposed approach, eleven different multi-dimensional classification algorithms (traditional Binary Relevance (BR), Classifier Chain (CC), Bayesian Classifier Chain (BCC), Monte Carlo Classifier Chain (MCC), Probabilistic Classifier Chain (PCC), Classifier Dependency Network (CDN), Classifier Trellis (CT), Classifier Dependency Trellis (CDT), Label Powerset (LP), Pruned Sets (PS), and Random k-Labelsets (RAKEL)) were implemented using the Ordinal Class Classifier (OCC) algorithm. In addition, seven different classification algorithms (Multilayer Perceptron (MLP), Support Vector Machine (SVM), k-Nearest Neighbour (kNN), Decision Tree (C4.5), Bagging, Random Forest (RF), and Adaptive Boosting (AdaBoost)) were chosen as base learners for the OCC algorithm. The experimental results show that the proposed OMDC approach using binary relevance multi-dimensional classification methods predicts the conditions of a hydraulic system's multiple components with high accuracy. The results also show that the OMDC models that use ensemble-based classification algorithms give more reliable prediction performance, with an average Hamming score of 0.853, than those that use traditional algorithms as base learners.
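
For readers unfamiliar with the metric, the following sketch computes a Hamming score under the assumption that it means the fraction of individual target values predicted correctly across all components (that is, one minus the Hamming loss). The abstract does not define the metric, and some multi-label papers use the same name for a different, Jaccard-style measure. The condition grades below are made-up illustrative data, not results from the paper.

```python
def hamming_score(y_true, y_pred):
    """Fraction of (instance, component) entries predicted correctly."""
    total = correct = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            total += 1
            correct += (t == p)
    return correct / total

# Two hypothetical instances, each with condition grades for four components.
y_true = [[3, 2, 0, 1], [1, 2, 0, 1]]
y_pred = [[3, 2, 1, 1], [1, 0, 0, 1]]
print(hamming_score(y_true, y_pred))  # 6 of 8 entries match: 0.75
```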

13.
Palmprint recognition has been widely used in security authentication. However, most existing palmprint representation methods focus on a specific application scenario and use hand-crafted features from a single view. If the features become weak as the application scenario changes, recognition performance degrades. To address this problem, we propose to comprehensively exploit palmprint features from multiple views to improve recognition performance in generic scenarios. In this paper, a novel double-cohesion learning based multiview and discriminant palmprint recognition (DC_MDPR) method is proposed, which imposes a double-cohesion strategy to reduce the inter-view margins for each subject and the intra-class margins for each view. In this way, for each subject, the features from different views can be closer to each other in the binary-label space. Meanwhile, for each view, the features sharing the same label information can move towards each other by imposing a neighbor graph regularization. The proposed method can be flexibly applied to any type of palmprint feature fusion. Moreover, it presents the multiview features in a low-dimensional subspace, effectively reducing the computational complexity. Experimental results on various palmprint databases have shown that the proposed method can always achieve the best recognition performance compared to other state-of-the-art algorithms.

14.
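Chen Jinhe, Shen Jie. Microcomputer Development (《微机发展》), 2010, (2): 110-113
To address the problem that a small training set is insufficient for a learner to classify an unlabeled sample set containing many potential sources of uncertainty, an active learning method based on information entropy is proposed. It introduces the discrete-event probability estimation theory of information entropy: the entropy of each unlabeled document is computed and combined with a two-stage learning strategy. Using existing knowledge and the experimental sample environment, the active learner selects the samples most likely to help solve the problem and has their classes labeled, obtains new parameters, retrains the classifier, and again selects the samples most beneficial to classifier performance, iterating until the unlabeled sample set is empty. Experimental results show that the method achieves good classification performance.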
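
A minimal sketch of entropy-based sample selection follows. It assumes the classifier outputs a class-probability vector for each unlabeled document and that the documents with the highest Shannon entropy are queried first; the two-stage strategy mentioned in the abstract is not reproduced, and the function names and probabilities are illustrative only.

```python
from math import log2

def entropy(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

def most_uncertain(prob_table, k=1):
    """Return the k document ids whose predicted class distribution has the
    highest entropy, i.e. where the current classifier is least certain."""
    ranked = sorted(prob_table, key=lambda doc: entropy(prob_table[doc]), reverse=True)
    return ranked[:k]

# Hypothetical class probabilities for three unlabeled documents.
predictions = {"d1": [0.9, 0.1], "d2": [0.5, 0.5], "d3": [0.7, 0.3]}
print(most_uncertain(predictions, k=2))  # ['d2', 'd3']
```

In an active learning loop, the selected documents would be labeled, moved into the training set, and the classifier retrained before the next selection.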

15.
Problem-based learning is a goal-directed and constructive process for learners. When they encounter problems, learners usually form work groups in order to find a solution. Blogs are becoming more popular and have in fact formed a community in which people can share their learning experiences with others. Many pedagogical applications have adopted what is posted in the community for supplementary learning. Integrating blogs into an intelligent tutoring system means that learners can better regulate and enhance their own learning. In this study, a novel learning device, a blog-based dynamic learning map, which employs both information retrieval and automated scheduling techniques, is designed to provide useful blog articles to help learning. The relevant blog articles are used to promote learner engagement in their interactions with the learning map and hence help learners achieve their goals more easily. An experimental course has been implemented, and the results show that learners make use of the blog-based learning aid in a very positive way and can eventually cross the specified threshold in a test. The proposed approach can encapsulate the dynamic learning principles in cohesive and supportive ways. It can thus lead learners to useful supplementary materials, shorten the learning time, and offer expanded alternative viewpoints to use in solving assigned problems. Our results show that both the learners and the lecturers are very positive about the design of our blog-based dynamic learning map.

16.
Much emphasis has been placed on research into applying digital games in science education. Among these studies, the advantages and limitations of role-playing simulation games deserve further exploration. However, existing analyses of behavioral patterns in role-playing simulation games for science education remain substantially lacking, particularly with regard to integrating diverse behavioral pattern analysis methods. This study therefore analyzes the videotaped learning process of 86 college students in game-based learning activities that use a role-playing simulation game. The study used an integrated method of sequential analysis and cluster analysis to explore the learners' flow state and learning behavioral patterns. The results show that integrated behavioral pattern analysis helps to explore the traits and limitations of role-playing simulation games in science education as well as learners' reflective behavior patterns. This study identifies a wide variety of learning behavior patterns from three potential clusters of learners and then discusses the learning process of each cluster. The different levels of flow experienced by the learners affected their learning behavior patterns; learners with higher levels of flow demonstrated a more in-depth reflective process. The study further discusses the results of these analyses and makes relevant recommendations for the system development of the games, their educational applications, and evaluation methods.

17.
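Active learning effectively reduces the sample complexity of a learning algorithm by actively choosing which examples to label. In contrast to the strategy of halving the version space commonly adopted by current active learning algorithms, this paper proposes a strategy that reduces the version space by more than half, which avoids the strong assumptions required by the halving strategy. Based on this strategy, the paper implements a heuristic active learning algorithm (CBMPMS) that selects the examples most likely to be misclassified as training examples. The algorithm computes the entropy of the difference between the class probabilities predicted for an example by a committee of hypotheses randomly drawn from the version space and those predicted by the current learner, and uses it as the criterion for selecting examples. Experiments on UCI data sets show that the algorithm outperforms related work on most data sets.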

18.
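Objective: Multi-view clustering in big data environments is a valuable and highly challenging problem. Existing methods suited to clustering large-scale multi-view data can, to some extent, overcome the local minima caused by a non-convex objective function, but they do not consider robustness to outliers and they ignore view diversity during sample selection. To address these problems, a robust and diverse multi-view clustering model based on self-paced learning (RD-MSPL) is proposed. Method: (1) Outliers are modeled by introducing the structured sparsity norm L2,1 into the objective function; (2) the diversity of the samples selected across views is increased by imposing an anti-structured-sparsity constraint on the sample weight matrix in the self-paced regularizer. Result: Experiments on the public Extended Yale B, Notting-Hill, COIL-20, and Scene15 data sets show the following. (1) On all four data sets, the proposed RD-MSPL outperforms the two most closely related existing multi-view clustering methods. Compared with the robust multi-view clustering method (RMKMC), clustering accuracy improves by 4.9%, 4.8%, 3.3%, and 1.3%, respectively; compared with MSPL, accuracy improves by 7.9%, 4.2%, 7.1%, and 6.5%, respectively. (2) Self-comparison (ablation) experiments confirm the effectiveness of considering robustness and sample diversity in the proposed model. (3) Comparisons with single-view clustering and with simple concatenation of multiple views show that RD-MSPL explores the relationships among views more effectively. Conclusion: This paper proposes a robust and diverse multi-view clustering model based on self-paced learning and designs an efficient algorithm for solving it. The method effectively reduces the influence of outliers on clustering performance and gradually adds diverse samples from different views during clustering, which avoids local minima while better capturing the complementary information of different views. Experimental results show that the method outperforms existing related methods.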
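
To make the L2,1 norm concrete, the sketch below computes it as the sum of the Euclidean norms of a matrix's rows and contrasts it with the squared Frobenius norm. Treating rows (rather than columns) as the grouped dimension is an assumption of this example, since conventions differ between papers, and the residual matrix is illustrative data only.

```python
from math import sqrt

def l21_norm(matrix):
    """Sum over rows of each row's Euclidean (L2) norm."""
    return sum(sqrt(sum(v * v for v in row)) for row in matrix)

def frobenius_squared(matrix):
    """Sum of squared entries, the usual least-squares penalty."""
    return sum(v * v for row in matrix for v in row)

# Hypothetical residuals: the last row plays the role of an outlier sample.
residuals = [[3.0, 4.0], [0.0, 0.0], [5.0, 12.0]]
print(l21_norm(residuals))           # 5 + 0 + 13 = 18.0
print(frobenius_squared(residuals))  # 25 + 0 + 169 = 194.0
```

Because each row enters the L2,1 norm linearly rather than quadratically, a single large residual dominates the penalty less than it does under a squared loss, which is the usual motivation for using this norm to model outliers.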

19.
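Building on a multi-dimensional feature model of learners, a hybrid grouping algorithm for online collaborative learning based on fuzzy C-means is designed. The learners' multi-dimensional feature components are extracted; the fuzzy C-means algorithm performs homogeneous clustering with learning style, knowledge level, learning goals, and interests as the main features, and heterogeneous clustering is performed on activity level and gender to achieve hybrid grouping. By combining heterogeneous and homogeneous grouping, the algorithm ensures that learners with similar learning styles, knowledge levels, learning goals, and interests are placed in the same group while also accounting for the influence of differences in activity level and gender on learning outcomes, which makes the grouping more reasonable. Experiments show that the algorithm outperforms traditional grouping methods and that learners' learning outcomes and satisfaction both improve considerably.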
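
The membership step of standard fuzzy C-means can be sketched as follows. This covers only the textbook update u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)) with fuzzifier m; it does not include the paper's learner feature model or its heterogeneous grouping by activity level and gender, and the function name and sample points are illustrative.

```python
from math import dist

def fcm_memberships(points, centers, m=2.0):
    """Return one row per point with its degree of membership in each cluster.
    Each row sums to 1; m > 1 is the fuzzifier."""
    exponent = 2.0 / (m - 1.0)
    table = []
    for p in points:
        d = [dist(p, c) for c in centers]
        if 0.0 in d:
            # The point coincides with one or more centers: share full
            # membership among those centers and give none to the others.
            hits = [1.0 if x == 0.0 else 0.0 for x in d]
            total = sum(hits)
            table.append([h / total for h in hits])
            continue
        table.append([1.0 / sum((d_i / d_k) ** exponent for d_k in d) for d_i in d])
    return table

# Hypothetical one-feature learners and two cluster centers.
points = [(0.0, 0.0), (1.0, 0.0), (9.0, 0.0), (10.0, 0.0)]
centers = [(0.0, 0.0), (10.0, 0.0)]
for row in fcm_memberships(points, centers):
    print([round(u, 3) for u in row])
```

A full fuzzy C-means run alternates this step with recomputing each center as the membership-weighted mean of the points until the memberships stop changing.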

20.
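In application domains that deal with big data, the diversity and complexity of the real world often give rise to large-scale multi-class data mining problems. Traditional multi-class classification methods suffer from unbalanced hyperplane updates on the one hand and low learning efficiency on the other, so they cannot classify complex multi-class data efficiently. To address this, the paper proposes an improved dynamical active multiple classification (DYA) method. By introducing concepts such as deadlock and activation into the active multi-class classification process, the method dynamically controls whether a sample takes part in active learning as the classifiers are continually updated. It also adopts quantile counting and rotating learning so that the classifiers for all classes are learned and updated in a balanced way. Experimental results show that the proposed dynamical active multiple classification method effectively improves the learning efficiency and generalization performance of the model.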
