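Junk Post Filtering in Web Forums (Web论坛上的垃圾贴过滤)
Cite this article: Lin Chen, Wang Wei. Junk Post Filtering in Web Forums[J]. Journal of Computer Research and Development (计算机研究与发展), 2009, 46(Z2).
Authors: Lin Chen (林琛), Wang Wei (汪卫)
Affiliation: School of Computer Science and Technology, Fudan University, Shanghai 200433
Funding: National Natural Science Foundation of China; National Basic Research Program of China (973 Program); Shanghai Leading Academic Discipline Project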
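Abstract: With the growth of the Web, Web forums have become a new platform for information sharing and group collaboration among Web users. Forums have accumulated a large volume of knowledge and have thus become a valuable resource for data mining on the Internet. Applications built on Web forum data, however, are often degraded by low-quality posts (junk posts). To address junk post filtering in Web forums, this paper proposes two models based on latent Dirichlet allocation, CJTM and CAJTM. Both models use the text content of forum posts, the reply links between posts, and author information. Unlike traditional classification methods and rule-based methods, CJTM and CAJTM require neither a training set nor a rule set. Experiments on data from a real Web forum show fairly good results.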

Keywords: graphical model  online forums  latent Dirichlet allocation  junk post

Junk Post Filtering in Web Forums
Lin Chen, Wang Wei. Junk Post Filtering in Web Forums[J]. Journal of Computer Research and Development, 2009, 46(Z2).
Authors: Lin Chen, Wang Wei
Abstract: Web forums have emerged as new platforms for information sharing and group collaboration among Web users. With their large volume of accumulated knowledge, Web forums have become valuable sources for data mining in recent years. However, the performance of such data mining applications is often harmed by low-quality posts. This paper addresses the problem of filtering low-quality posts (junk posts) in Web forums. Inspired by latent Dirichlet allocation (LDA), two graphical models, the clustered junk topic model (CJTM) and the clustered author junk topic model (CAJTM), are built for detecting junk posts. The models use text content, reply links, and author information. In contrast to traditional approaches such as classification methods, they require neither training data nor hard-coded rule sets. The models help not only to understand the process by which posts are generated but also to evaluate the quality of post content quantitatively. Experiments conducted on a real Web forum show that this approach achieves better results than traditional methods.
Keywords: graphical model  Web forums  latent Dirichlet allocation  junk post
This article is indexed in Wanfang Data (万方数据) and other databases.