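Short Text Sentiment Classification Based on Context Reconstruction
Cite this article:YANG Zhen, LAI Ying-Xu, DUAN Li-Juan, LI Yu-Jian. Short Text Sentiment Classification Based on Context Reconstruction[J]. Acta Automatica Sinica, 2012, 38(1):55-67.
Authors:YANG Zhen  LAI Ying-Xu  DUAN Li-Juan  LI Yu-Jian
Affiliation:1. College of Computer Sciences, Beijing University of Technology, Beijing 100124
Funding:National Natural Science Foundation of China (61001178, 60905017, 60702031, 61002029); Beijing Natural Science Foundation (4102012, 4112009, 4102013, 4123093); General Program of the Science and Technology Development Plan of the Beijing Municipal Education Commission (KM201210005024); National Soft Science Research Program (2010GXQ5D317); "Young and Middle-aged Backbone Talent Training Program" under the Beijing municipal universities' plan for strengthening education through talent (PHR201108016); Beijing University of Technology High-level Talent Training Project; Beijing University of Technology Youth Fund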
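Abstract:Text is inherently ambiguous, and when short texts offer only sparse features and little context, existing methods cannot resolve their meaning, which opens a wide semantic gap between low-level features and high-level expression. This paper uses factors such as time, space, and association to mine the implicit relations among texts, reconstruct their context, and thereby improve sentiment polarity classification. The approach is a two-stage process: 1) short texts are first regrouped into contexts (domains) according to their internal relations; 2) each short text to be processed is assigned to a suitable context (domain) for further analysis. We first give a basic framework for short text sentiment polarity classification based on the Naive Bayes classifier and show how differences between contexts (domains) affect classification performance. We then discuss a method that enhances sentiment polarity classification by partitioning texts by domain, extend the notion of domain to that of a context relation, and propose a sentiment polarity classification method based on special context relations. To overcome the difficulty of regrouping contexts when information is missing, we also give a genetic-algorithm-based scheme for arbitrary context regrouping. Theoretical analysis shows that, provided the constraints are satisfied, the context-reconstruction-based method reduces both the sample error and the approximation error. Experimental results on real data sets confirm the theoretical analysis.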

Keywords:public opinion analysis    short text processing    affective computing    error analysis    genetic algorithm
Received:2011-3-28
Revised:2011-7-7

Short Text Sentiment Classification Based on Context Reconstruction
YANG Zhen, LAI Ying-Xu, DUAN Li-Juan, LI Yu-Jian. Short Text Sentiment Classification Based on Context Reconstruction[J]. Acta Automatica Sinica, 2012, 38(1):55-67.
Authors:YANG Zhen  LAI Ying-Xu  DUAN Li-Juan  LI Yu-Jian
Affiliation:1. College of Computer Sciences, Beijing University of Technology, Beijing 100124
Abstract:Synonymy and polysemy challenge effective natural language processing, especially in short texts, where missing context and sparse features widen the semantic gap between low-level text feature representation and high-level interpretation. In this work, short texts are reorganized into special contexts according to implied internal relationships such as time and space, and a novel two-step scheme for semantic orientation detection based on these special contexts is proposed. In the first step, the short texts are reorganized into special contexts by their implied internal relationships. In the second step, an unknown short text is assigned to a special context and given a polarity tag by the semantic orientation classifier of that context. We first present a sentiment classification framework based on the naive Bayes classifier and discuss the effect of special context on it. We then give an enhanced classification method based on the concept of a field, which is extended to special context. Finally, a method for reorganizing special contexts based on a genetic algorithm is proposed. Theoretical analysis shows that the proposed methods can reduce both the sample error and the approximation error under some constraints. Experimental results on real corpora show the effectiveness of the proposed method.
Keywords:Public opinion analysis  short text processing  sentiment classification  error analysis  genetic algorithm
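
The two-step scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the context of each training text is already known (the paper instead reconstructs contexts from implied relations and, when information is missing, with a genetic algorithm), and every name in it (`NaiveBayes`, `train_two_stage`, `classify`, `SAMPLES`) and the toy data are hypothetical. A small multinomial naive Bayes classifier with add-one smoothing is used both to route a short text to a context and to assign polarity inside that context.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens, add-one smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> token counts
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for tok in text.split():
                if tok in self.vocab:  # tokens never seen in training are ignored
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def train_two_stage(samples):
    """samples: (text, context, polarity) triples; contexts are assumed given."""
    texts = [t for t, _, _ in samples]
    contexts = [c for _, c, _ in samples]
    router = NaiveBayes().fit(texts, contexts)  # assigns a text to a context
    by_context = defaultdict(list)
    for text, ctx, pol in samples:
        by_context[ctx].append((text, pol))
    experts = {  # one polarity classifier per context
        ctx: NaiveBayes().fit([t for t, _ in rows], [p for _, p in rows])
        for ctx, rows in by_context.items()
    }
    return router, experts


def classify(text, router, experts):
    ctx = router.predict(text)              # step 2a: pick a context
    return ctx, experts[ctx].predict(text)  # step 2b: polarity within it


# Hypothetical toy data: "long" is positive for batteries, negative for queues.
SAMPLES = [
    ("battery lasts long", "battery", "pos"),
    ("battery dies fast", "battery", "neg"),
    ("service queue long", "service", "neg"),
    ("service reply fast", "service", "pos"),
]

router, experts = train_two_stage(SAMPLES)
print(classify("battery long", router, experts))  # ('battery', 'pos')
print(classify("service long", router, experts))  # ('service', 'neg')
```

In this toy data the token `long` occurs once under each polarity when all texts are pooled, so a single pooled classifier has no evidence either way; routing the text to a context first is what separates the two readings.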