
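Semi-supervised Classification Based on Feature Distribution
Cite this article: WEN Han, XIAO Nan-feng. Semi-supervised Classification Based on Feature Distribution [J]. Journal of Beijing Polytechnic University, 2012, 38(1): 75-80.
Authors: WEN Han, XIAO Nan-feng
Affiliations: 1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006 / School of Science, Foshan University, Foshan 528000, Guangdong
2. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006
Funding: Joint project of the National Natural Science Foundation of China and the Civil Aviation Administration of China; Key Project of the Natural Science Foundation of Guangdong Province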
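Abstract: The information gain (IG) method favors high-frequency words and therefore tends to ignore similarity between classes. To avoid this, a selection method based on feature distribution is proposed to correct IG, so that feature terms that truly carry high class-discriminating information are retained. In addition, the low efficiency of the expectation maximization (EM) algorithm is addressed: unlabeled documents with a high posterior class probability are moved step by step from the unlabeled document set to the labeled document set, which effectively reduces the number of iterations. Test results show that the semi-supervised learning method based on feature distribution achieves good classification effectiveness and performance on Reuters-21578 and Epinion.com, two data sets with different characteristics.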

Keywords: semi-supervised classification; feature distribution; class similarity
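
For orientation, the baseline that the abstract sets out to correct can be sketched as follows. This is a minimal illustration of plain information gain for a binary "term occurs in document" feature, computed as IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)]. It does not implement the feature-distribution correction, whose formula the abstract does not give. The function names and the four-document toy corpus are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """Plain IG of the binary feature 'term occurs in the document'.

    docs is a list of token sets; labels is the parallel list of classes.
    """
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_t) / n) * entropy(with_t) \
        + (len(without_t) / n) * entropy(without_t)
    return entropy(labels) - conditional


# Hypothetical toy corpus, not taken from the paper.
docs = [{"stock", "market"}, {"stock", "price"}, {"match", "goal"}, {"goal", "team"}]
labels = ["finance", "finance", "sport", "sport"]
scores = {t: information_gain(docs, labels, t) for t in ("stock", "goal", "market")}
```

In this toy corpus "stock" and "goal" each separate the two classes perfectly and receive the maximum score of 1 bit, while "market", which occurs in only one document, scores lower.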

Semi-supervised Classification Using Feature Distribution
WEN Han, XIAO Nan-feng. Semi-supervised Classification Using Feature Distribution [J]. Journal of Beijing Polytechnic University, 2012, 38(1): 75-80.
Authors: WEN Han, XIAO Nan-feng
Affiliation: 1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China; 2. School of Science, Foshan University, Foshan 528000, Guangdong, China
Abstract: Reducing the dimension of the feature space through feature selection is crucial for semi-supervised learning (SSL). The popular information gain (IG) selection method favors high-frequency words and tends to ignore similarity between classes, so classification performance with IG-selected features is unstable. This paper proposes a selection method based on feature distribution that corrects IG so that features carrying high class-discriminating information are retained. To address the inherent inefficiency of the expectation maximization (EM) algorithm, unlabeled documents with a high posterior class probability are transferred step by step from the unlabeled collection to the labeled collection, which reduces the number of iterations of the improved EM. Experimental evaluation on Reuters-21578 and Epinion.com, two data sets with different characteristics, shows that the semi-supervised learning method using feature distribution performs well under the micro-averaged F1 criterion.
Keywords: semi-supervised classification; feature distribution; similarities of classes
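
The transfer step described in the abstract can be illustrated with a small sketch. The abstract does not name the base classifier, the confidence threshold, or the stopping rule, so everything below is an assumption: a multinomial naive Bayes model with Laplace smoothing, a hypothetical `threshold` parameter, and hard labels for transferred documents. With hard labels the loop is closer to self-training than to full EM with soft assignments; it is meant only to show how moving confident documents shrinks the unlabeled pool from round to round.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, vocab):
    """Multinomial naive Bayes with Laplace smoothing (assumed base model)."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for words, y in zip(docs, labels):
        word_counts[y].update(words)
    log_prior = {c: math.log(n / len(labels)) for c, n in class_docs.items()}
    log_like = {}
    for c in class_docs:
        total = sum(word_counts[c].values()) + len(vocab)
        log_like[c] = {w: math.log((word_counts[c][w] + 1) / total) for w in vocab}
    return log_prior, log_like


def posterior(words, log_prior, log_like):
    """Normalized posterior class probabilities for one document."""
    scores = {c: log_prior[c] + sum(log_like[c][w] for w in words if w in log_like[c])
              for c in log_prior}
    top = max(scores.values())
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}


def transfer_loop(labeled, labels, unlabeled, threshold, max_rounds=20):
    """Move confidently classified documents from the pool to the labeled set."""
    labeled, labels, pool = list(labeled), list(labels), list(unlabeled)
    vocab = {w for d in labeled + pool for w in d}
    rounds = 0
    while pool and rounds < max_rounds:
        rounds += 1
        log_prior, log_like = train_nb(labeled, labels, vocab)
        remaining = []
        for doc in pool:
            post = posterior(doc, log_prior, log_like)
            best = max(post, key=post.get)
            if post[best] >= threshold:
                labeled.append(doc)      # transfer with a hard label
                labels.append(best)
            else:
                remaining.append(doc)    # stays unlabeled for the next round
        if len(remaining) == len(pool):  # nothing was confident enough; stop
            break
        pool = remaining
    return labeled, labels, pool, rounds


# Hypothetical toy data, not taken from the paper.
seed_docs = [["stock", "market", "price"], ["goal", "team", "match"]]
seed_labels = ["finance", "sport"]
unlabeled_docs = [["stock", "price", "bank"], ["team", "goal", "coach"],
                  ["bank", "stock"], ["coach", "match"]]
final_docs, final_labels, leftover, rounds = transfer_loop(
    seed_docs, seed_labels, unlabeled_docs, threshold=0.75)
```

In this toy run the two longer documents are transferred in the first round; the words they contribute ("bank", "coach") then make the two shorter documents confident enough to be transferred in the second round, leaving the pool empty.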
This article is indexed in CNKI, Wanfang Data, and other databases.