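Performance Analysis of Clustering-Based Transductive Learning
Cite this article: ZHANG Xin, HE Ben, LUO Tie-Jian, LI Dong-Xing. Performance Analysis of Clustering-Based Transductive Learning[J]. Journal of Software, 2014, 25(12): 2865-2876.
Authors: ZHANG Xin  HE Ben  LUO Tie-Jian  LI Dong-Xing
Affiliation: School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 101408, China
Funding: National Natural Science Foundation of China (61103131, 61472391); Scientific Research Foundation for Returned Overseas Chinese Scholars, Ministry of Education; Beijing Natural Science Foundation (4142050)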
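Abstract: In recent years, Twitter search has attracted growing attention from researchers in the social network field. Although learning to rank can combine the rich features available in Twitter, a shortage of training data degrades its performance. Transductive learning, a commonly used semi-supervised learning method, plays an important role in addressing the scarcity of training data. Because noise is generated during the iterations of transductive learning, a clustering-based transductive learning method has been proposed. This method has two important parameters: the clustering threshold and the number of documents to be clustered. Building on the earlier work, this paper proposes using a different clustering algorithm. Extensive experiments on the standard TREC data set Tweets11 show that both the clustering threshold and the number of documents selected during clustering affect the retrieval performance of the model. The robustness of the clustering-based transductive learning model across different query sets is also analyzed. Finally, by introducing a quality-control factor called cluster cohesion, an adaptive clustering-based transductive method for Twitter retrieval is proposed. Experimental results show that the adaptive clustering-based learning algorithm is more robust.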

Keywords: clustering  transductive learning  Twitter search  adaptive  performance
Received: 2014-05-05
Revised: 2014-08-21

Performance Analysis of Clustering-Based Transductive Learning
ZHANG Xin, HE Ben, LUO Tie-Jian and LI Dong-Xing. Performance Analysis of Clustering-Based Transductive Learning[J]. Journal of Software, 2014, 25(12): 2865-2876.
Authors: ZHANG Xin  HE Ben  LUO Tie-Jian and LI Dong-Xing
Institution: School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 101408, China
Abstract: Recently, Twitter search has drawn much attention from researchers in social networks. Although the rich features of Twitter can be incorporated into learning to rank, retrieval effectiveness can be hurt by the lack of training data. Transductive learning, a common semi-supervised learning method, plays an important role in dealing with the lack of training data. Because noise is generated during the iterative process of transductive learning, a clustering-based transductive method has been proposed. The clustering-based transductive approach has two important parameters, namely the clustering threshold and the number of documents to be clustered. This paper extends the method by utilizing a different clustering algorithm. Extensive experiments on the standard TREC Tweets11 collection show that both parameters affect retrieval effectiveness. Furthermore, the robustness of the clustering-based transductive approach on different query sets is studied. Finally, the paper proposes an adaptive clustering-based approach by introducing a so-called cluster coherence as a quality controller. The experimental results show that the proposed method is more robust.
Keywords: clustering  transductive learning  Twitter search  adaptive  performance
This article is indexed in CNKI, Wanfang Data, and other databases.