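Microblog User Gender Recognition Based on Multi-View Tri-Training
Cite this article: SUN Qi-Yun. Microblog User Gender Recognition Based on Multi-View Tri-Training [J]. Computer Systems & Applications, 2018, 27(2): 240-244.
Author: SUN Qi-Yun
Affiliation: Nanjing FiberHome Software Technology Co., Ltd., Nanjing 210019; Communication and Information Program, Wuhan Research Institute of Posts and Telecommunications, Wuhan 430073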
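Abstract: As Internet technology continues to develop, Sina Weibo, an open online social platform, has accumulated a very large base of active users. However, because the user base is so large and the personal information users provide is not necessarily truthful, labeling training samples is difficult. This paper adopts a multi-view tri-training method: it constructs three different views and uses the small number of labeled samples together with the unlabeled samples in these views to train three different classifiers against one another repeatedly, and finally combines the three classifiers to determine user gender. Experiments on real user data show that, compared with single-view classifiers, the classifier trained with multi-view tri-training is more accurate and requires fewer labeled samples.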

Keywords: gender recognition; multi-view learning; tri-training algorithm; data mining
Received: 2017/5/17
Revised: 2017/6/16

Microblog User Gender Recognition with Multi-View and Tri-Training Learning
SUN Qi-Yun. Microblog User Gender Recognition with Multi-View and Tri-Training Learning [J]. Computer Systems & Applications, 2018, 27(2): 240-244.
Authors: SUN Qi-Yun
Affiliation: FiberHome Telecommunication Technologies Co. Ltd., Nanjing 210019, China; Wuhan Research Institute of Posts and Telecommunications, Wuhan 430073, China
Abstract: With the rapid development of Internet technology, Sina Weibo, an open social networking platform, has attracted a very large number of active users. However, because the user base is so large and personal information is not always truthful, labeling users' gender for training is difficult. In this study, a multi-view tri-training learning method is used to address this problem. First, three different views are constructed and three different classifiers are trained on a small number of labeled samples. The three classifiers are then retrained repeatedly using unlabeled samples. Finally, the three classifiers are combined into one to determine user gender. Experiments on real user data show that the classifier trained with multi-view tri-training outperforms single-view classifiers and needs less labeled data.
Keywords: gender recognition; multi-view learning; tri-training; data mining
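
The training loop summarized in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the nearest-centroid base learner, the names `tri_train`, `predict`, and `project`, the toy six-feature samples, and the stopping rule are all assumptions made for illustration, and the error-rate checks used in standard tri-training are omitted. Each classifier sees only one view (a subset of feature indices); an unlabeled sample is added to a classifier's training set whenever the other two classifiers agree on its label, and the final decision is a majority vote of the three.

```python
from collections import Counter


class CentroidClassifier:
    """Tiny nearest-centroid learner, a stand-in for any base classifier."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for d, value in enumerate(row):
                acc[d] += value
        self.centroids = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict_one(self, row):
        def sq_dist(label):
            return sum((a - b) ** 2 for a, b in zip(row, self.centroids[label]))
        # sorted() makes tie-breaking deterministic
        return min(sorted(self.centroids), key=sq_dist)


def project(sample, view):
    """Keep only the feature indices that belong to one view."""
    return [sample[i] for i in view]


def tri_train(labeled, labels, unlabeled, views, max_rounds=10):
    """Simplified multi-view tri-training; returns three fitted classifiers."""

    def train_view(v, pseudo):
        # pseudo maps an unlabeled-sample index to its agreed pseudo-label
        X = [project(s, views[v]) for s in labeled]
        X += [project(unlabeled[u], views[v]) for u in pseudo]
        y = list(labels) + list(pseudo.values())
        return CentroidClassifier().fit(X, y)

    pseudo_sets = [{}, {}, {}]
    clfs = [train_view(v, {}) for v in range(3)]
    for _ in range(max_rounds):
        preds = [
            [clfs[v].predict_one(project(s, views[v])) for s in unlabeled]
            for v in range(3)
        ]
        new_sets = []
        for v in range(3):
            j, k = [o for o in range(3) if o != v]
            # classifier v learns from samples its two peers agree on
            new_sets.append({
                u: preds[j][u]
                for u in range(len(unlabeled))
                if preds[j][u] == preds[k][u]
            })
        if new_sets == pseudo_sets:  # pseudo-labels stopped changing
            break
        pseudo_sets = new_sets
        clfs = [train_view(v, pseudo_sets[v]) for v in range(3)]
    return clfs


def predict(clfs, views, sample):
    """Majority vote of the three view-specific classifiers."""
    votes = [c.predict_one(project(sample, v)) for c, v in zip(clfs, views)]
    return Counter(votes).most_common(1)[0][0]


# Toy data (hypothetical): six numeric features split into three views.
views = [(0, 1), (2, 3), (4, 5)]
labeled = [[1, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0, 0]]
labels = ["F", "M"]
unlabeled = [
    [0.9, 0.8, 0.9, 1.0, 0.8, 0.7],
    [0.2, 0.1, 0.0, 0.2, 0.1, 0.3],
    [0.7, 0.9, 0.8, 0.6, 0.9, 1.0],
    [0.3, 0.2, 0.1, 0.0, 0.2, 0.1],
]
clfs = tri_train(labeled, labels, unlabeled, views)
print(predict(clfs, views, [0.9, 0.8, 1.0, 0.9, 0.7, 0.9]))
```

With two classes and three voters the majority vote is always well defined. In practice the views would be built from different kinds of user information, and any base learner could replace the centroid classifier.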