Discovering Spam Comments in Blogs Based on the LDA Model

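Cite this article:	DIAO Yufeng, YANG Liang, LIN Hongfei. Discovering Spam Comments in Blogs Based on the LDA Model[J]. Journal of Chinese Information Processing, 2011, 25(1): 41-48.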

Authors:	DIAO Yufeng, YANG Liang, LIN Hongfei

Affiliation:	Information Retrieval Laboratory, Dalian University of Technology, Dalian, Liaoning 116024

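Funding:	National Natural Science Foundation of China (60673039, 60973068); National Social Science Fund of China (08BTQ025); National High-Tech R&D Program of China (863 Program) (2006AA01Z151); Scientific Research Foundation for Returned Overseas Chinese Scholars, Ministry of Education; Specialized Research Fund for the Doctoral Program of Higher Education (20090041110002)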

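Abstract:	As an emerging online medium, the blog has greatly increased the openness of the Internet and has become one of its main information sources. As a result, spam comments in the blogosphere have multiplied, and identifying them has become an important problem. Borrowing from methods used to handle e-mail spam and taking the characteristics of blogs into account, this paper first applies rules to filter out an initial set of spam comments. For the remaining comments, it uses Latent Dirichlet Allocation (LDA), a generative model that can extract the latent topics of a text, to extract topics from the blog posts, and then uses this topic information to judge the comments, thereby identifying spam comments in the blogosphere. Experiments show that the method detects most spam comments and achieves good results, helping make blog information more accurate and useful to readers.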
Keywords:	Blog; blog post; LDA; topic; opinion spam
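The abstract says a first pass of rules, modeled on e-mail spam filtering, removes obvious spam before the topic-based stage, but it does not list the rules. The following is a minimal sketch assuming a few common heuristics: too many links, a long run of one repeated character, a blacklisted phrase, near-empty text, and verbatim duplicates of earlier comments. The names `rule_flags`, `prefilter`, and `BLACKLIST`, and all thresholds, are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical rule set; the paper's actual rules are not given in the abstract.
URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
REPEAT_RE = re.compile(r"(.)\1{5,}")  # the same character 6+ times in a row
BLACKLIST = {"viagra", "casino", "cheap watches", "buy now"}


def rule_flags(comment, max_links=2, min_length=2):
    """Return the names of the hypothetical rules that a comment triggers."""
    text = comment.strip()
    lowered = text.lower()
    flags = []
    if len(URL_RE.findall(text)) > max_links:
        flags.append("too_many_links")
    if REPEAT_RE.search(text):
        flags.append("repeated_chars")
    if any(term in lowered for term in BLACKLIST):
        flags.append("blacklisted_term")
    if len(text) < min_length:
        flags.append("too_short")
    return flags


def prefilter(comments, seen=None):
    """Split comments into (spam, remaining).

    A comment is treated as spam if it triggers any rule or repeats an
    earlier comment verbatim (case-insensitive).
    """
    seen = set() if seen is None else seen
    spam, remaining = [], []
    for c in comments:
        key = c.strip().lower()
        if key in seen or rule_flags(c):
            spam.append(c)
        else:
            remaining.append(c)
        seen.add(key)
    return spam, remaining
```

Only the comments in `remaining` would then be passed to the topic-based check. Keeping the rules as named flags makes it easy to see which heuristic rejected a comment.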
LDA-Based Opinion Spam Discovering

DIAO Yufeng,YANG Liang,LIN Hongfei.LDA-Based Opinion Spam Discovering[J].Journal of Chinese Information Processing,2011,25(1):41-48.

Authors:	DIAO Yufeng, YANG Liang, LIN Hongfei

Affiliation:	Information Retrieval Laboratory, Dalian University of Technology, Dalian, Liaoning 116024, China

Abstract:	As is well known, blogs have become one of the main information sources on the Internet, and opinion spam in blogs has grown enormously. This paper focuses on identifying opinion spam. First, it adopts methods from e-mail spam identification: taking the characteristics of blogs into account, it establishes rules for comments to filter opinion spam. It then uses the Latent Dirichlet Allocation (LDA) model to extract topic information from the text content of blogs. Finally, with the topic information inte...

Keywords:	Blog; Blog content; LDA; topic; opinion spam
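The abstract states that LDA extracts topics from the blog posts and that this topic information is used to judge comments, but it does not say how the two are combined. One plausible reading, sketched here purely as an assumption, is to fit LDA over a post together with its comments and flag comments whose topic distribution is far from the post's. The sampler is a plain collapsed Gibbs sampler for LDA. Cosine similarity, `threshold=0.5`, the hyperparameters, and the function names `train_lda`, `cosine`, and `flag_off_topic` are hypothetical choices, not the authors' settings. Inputs are assumed to be already word-segmented.

```python
import math
import random


def train_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return one topic mixture per doc."""
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for doc in docs for w in doc}))}
    V, K = len(vocab), num_topics
    n_dk = [[0] * K for _ in docs]      # tokens of doc d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]  # times word w is assigned to topic k
    n_k = [0] * K                       # total tokens assigned to topic k
    z = []                              # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][vocab[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = vocab[w], z[d][i]
                # Remove this token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z_i = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Posterior-mean estimate of each document's topic mixture (theta).
    return [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]


def cosine(p, q):
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def flag_off_topic(post_tokens, comments, num_topics=2, threshold=0.5):
    """Return indices of comments whose topic mixture is far from the post's."""
    theta = train_lda([post_tokens] + comments, num_topics)
    return [i for i, t in enumerate(theta[1:]) if cosine(theta[0], t) < threshold]


if __name__ == "__main__":
    post = "flour sugar oven bake cake recipe oven cake".split()
    comments = ["cake recipe oven bake".split(), "cheap watches buy cheap watches".split()]
    print(flag_off_topic(post, comments))
```

Fitting on a single post and its comments only illustrates the idea. A realistic setup would more likely train the topic model on a large collection of posts and then infer topic mixtures for new comments.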
This article is indexed in databases including CNKI and Wanfang Data.
