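Data Editing Techniques Based on the Tri-Training Algorithm
Cite this article:ZHANG Yan, LIN Ying, LV Danjv. Data Editing Techniques Based on the Tri-Training Algorithm[J]. Computer and Digital Engineering, 2013, 41(10): 1583-1585
Authors:ZHANG Yan (张雁)  LIN Ying (林英)  LV Danjv (吕丹桔)
Affiliation:1. School of Computer and Information, Southwest Forestry University, Kunming 650224
2. School of Software, Yunnan University, Kunming 650091
Funding:Scientific Research Fund of the Yunnan Provincial Department of Education
Abstract:Tri-Training is a semi-supervised learning algorithm. Given only a small amount of labeled data, it uses three different classifiers to sample and label new training data from the unlabeled examples, which effectively supplements each classifier's training set. However, mislabeled examples introduce noise and reduce classification performance. This paper applies three data editing techniques within the Tri-Training algorithm, DE-KNN, DE-BKNN, and DE-NED, to identify and remove mislabeled data. Experiments on six UCI data sets show that introducing data editing is effective: all three methods improve the classification performance of Tri-Training to some extent, with DE-NED giving the most pronounced improvement.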

Keywords:semi-supervised learning  Tri-Training algorithm  data editing

Tri-Training Algorithm with Data Editing
ZHANG Yan, LIN Ying, LV Danjv. Tri-Training Algorithm with Data Editing[J]. Computer and Digital Engineering, 2013, 41(10): 1583-1585
Authors:ZHANG Yan    LIN Ying    LV Danjv
Affiliation:1. School of Computer and Information, Southwest Forestry University, Kunming 650224; 2. School of Software, Yunnan University, Kunming 650091
Abstract:Tri-Training is a semi-supervised learning algorithm in which three learners keep labeling unlabeled examples and retraining themselves on an enlarged training set. Because the Tri-Training process may label some unlabeled examples erroneously, it introduces noisy data and degrades classification performance. This paper uses three data editing methods, DE-KNN, DE-BKNN, and DE-NED, within the Tri-Training algorithm to identify and remove mislabeled examples from the labeled data. Experiments are carried out on six UCI data sets. The results show that introducing data editing is beneficial: the hypotheses learned by Tri-Training combined with data editing outperform those learned by the standard Tri-Training algorithm, and the DE-NED method performs best of the three.
Keywords:semi-supervised learning  Tri-Training  data editing
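
The abstract names DE-KNN, DE-BKNN, and DE-NED but does not define them. As a rough illustration of the general idea of identifying and removing mislabeled examples, the sketch below applies a plain k-nearest-neighbour editing rule: a newly labeled example is kept only if its label matches the majority label of its k nearest neighbours. This rule is an assumption standing in for the paper's methods, not a reproduction of any of them; the function name `knn_edit` and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_edit(points, labels, candidates, k=3):
    """Return the candidate indices whose label agrees with the majority
    label of their k nearest neighbours (generic kNN editing; an assumed
    stand-in, not the paper's DE-KNN, DE-BKNN, or DE-NED)."""
    kept = []
    for i in candidates:
        # Rank every other point by Euclidean distance to point i.
        neighbours = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )[:k]
        majority_label, _ = Counter(labels[j] for j in neighbours).most_common(1)[0]
        if labels[i] == majority_label:
            kept.append(i)
    return kept


# Hypothetical toy data: indices 0-7 are trusted labeled examples in two
# clusters; indices 8 and 9 were pseudo-labeled during training.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (5, 5), (5, 6), (6, 5), (6, 6),
          (0.5, 0.5), (5.5, 5.5)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b", "b", "b"]

kept = knn_edit(points, labels, candidates=[8, 9], k=3)
print(kept)  # prints [9]
```

Index 8 sits inside the "a" cluster but carries the label "b", so the rule discards it before it can be added to a classifier's training set, while index 9 is kept.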
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).