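Study of topic sentiment sentences auto-extraction in Chinese blogs
Citation: SUN Hong-gang, LU Yu-liang. Study of topic sentiment sentences auto-extraction in Chinese blogs[J]. Computer Engineering and Applications, 2008, 44(20): 165-168.
Authors: SUN Hong-gang, LU Yu-liang
Affiliation: No. 604 Lab, Hefei Electronic Engineering Institute, Hefei 230037, China
Abstract: Blogs are accepted by more and more people as a popular carrier of information and culture, and sentiment analysis of blog content has gradually become a hot topic in information mining. Most current sentiment analysis work computes the sentiment orientation (polarity) of individual words. Because not every sentiment-bearing word is related to the topic, word-level sentiment analysis has inherent shortcomings. To address this, the paper analyzes sentiment at the sentence level and focuses on the related problem of automatically extracting topic sentiment sentences. To extract topic-related sentiment sentences effectively, it designs a novel extraction algorithm based on bigram segmentation (Bi-segment) to obtain the main topic words, then uses TFIDF to obtain additional secondary topic words, and uses these topic words to recombine the original sentences that contain them. Consequently, if a topic sentiment sentence exists, it must belong to this set of recombined topic sentences, so analyzing and extracting from that set yields the topic sentiment sentences. Finally, CRFs are used to convert the topic sentence extraction problem into a Chinese chunking problem, which achieved good results in the extraction experiments.

Keywords: Chinese blogs; sentiment analysis; Conditional Random Fields (CRFs)
Received: 2007-09-26
Revised: 2007-12-21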
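The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the paper's Bi-segment algorithm, thresholds, sentence-recombination rules and CRF features are not given here, so the rule of merging chains of frequent character bigrams, the smoothed TF-IDF weighting, the plain sentence filter and the B/I/O labelling scheme are all assumptions, and every function name (`main_topic_words`, `tfidf_scores`, `topic_sentences`, `to_bio`) is hypothetical. The sketch stops at producing character-level B/I/O labels, the usual input format for a CRF chunker; training the CRF itself (for example with a toolkit such as CRF++) is left out.

```python
import math
import re
from collections import Counter

# Minimal sketch of the abstract's pipeline. All names, thresholds and rules are
# hypothetical assumptions; this is NOT the authors' Bi-segment/TFIDF/CRF method.

CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")  # maximal runs of CJK ideographs; punctuation breaks a run
SENT_END = re.compile(r"(?<=[。！？!?])")   # zero-width split point after sentence-final marks


def bigram_counts(texts):
    """Count overlapping two-character bigrams inside every CJK run of every text."""
    counts = Counter()
    for text in texts:
        for run in CJK_RUN.findall(text):
            counts.update(run[i:i + 2] for i in range(len(run) - 1))
    return counts


def main_topic_words(texts, min_count=2):
    """Assumed bigram-based extraction: merge chains of overlapping frequent bigrams into terms."""
    counts = bigram_counts(texts)
    words = Counter()
    for text in texts:
        for run in CJK_RUN.findall(text):
            start = None  # start index of the current chain of frequent bigrams
            for i in range(len(run) - 1):
                if counts[run[i:i + 2]] >= min_count:
                    if start is None:
                        start = i
                elif start is not None:
                    words[run[start:i + 1]] += 1  # chain ended with the bigram starting at i - 1
                    start = None
            if start is not None:
                words[run[start:]] += 1  # chain reached the end of the run
    return words


def tfidf_scores(doc, corpus, terms):
    """Rank candidate terms for one document by TF x smoothed IDF (one common variant)."""
    n_docs = len(corpus)
    scores = {}
    for term in terms:
        tf = doc.count(term)  # non-overlapping occurrences in this document
        if tf == 0:
            continue
        df = sum(1 for d in corpus if term in d)
        scores[term] = tf * (math.log((1 + n_docs) / (1 + df)) + 1)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def topic_sentences(doc, topic_words):
    """Keep the sentences of doc that contain at least one topic word."""
    sentences = [s.strip() for s in SENT_END.split(doc) if s.strip()]
    return [s for s in sentences if any(w in s for w in topic_words)]


def to_bio(sentence, chunks):
    """Character-level B/I/O labels for the first occurrence of each chunk (CRF chunking input)."""
    labels = ["O"] * len(sentence)
    for chunk in chunks:
        start = sentence.find(chunk)
        if start < 0:
            continue
        labels[start] = "B"
        for i in range(start + 1, start + len(chunk)):
            labels[i] = "I"
    return list(zip(sentence, labels))


if __name__ == "__main__":
    # Toy blog posts, made up for illustration only.
    blogs = [
        "油价又上涨了。周末去爬山。油价上涨让人很无奈！",
        "听说油价上涨，出行成本变高了。",
        "今天天气很好。",
    ]
    candidates = main_topic_words(blogs)
    main_words = [w for w, _ in candidates.most_common(1)]
    ranked = tfidf_scores(blogs[0], blogs, candidates)
    secondary = [t for t, _ in ranked if t not in main_words][:2]
    topic = topic_sentences(blogs[0], main_words + secondary)
    print(main_words, secondary)
    print(topic)
    # Chunk strings are assumed to come from manual annotation of training sentences.
    print(to_bio(topic[1], ["油价上涨", "很无奈"]))
```

Merging chains of frequent bigrams grows multi-character terms such as 油价上涨 without a segmentation dictionary, but it also merges frequent non-words; in this sketch TF-IDF only reranks the same candidates, whereas the abstract uses it to obtain further secondary topic words.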
