
Emotion classification with feature extraction based on part of speech tagging sequences in micro blog
Original title (Chinese): 基于词性标注序列特征提取的微博情感分类
Citation: LU Weisheng, GUO Gongde, CHEN Lifei. Emotion classification with feature extraction based on part of speech tagging sequences in micro blog[J]. Journal of Computer Applications, 2014, 34(10): 2869-2873.
Authors: LU Weisheng, GUO Gongde, CHEN Lifei
Affiliation: School of Mathematics and Computer Science, Fujian Normal University, Fuzhou, Fujian 350007, China
Funding: Project supported by the National Natural Science Foundation of China
Received: 2014-04-28
Revised: 2014-06-12

Abstract: Traditional n-gram feature extraction tends to produce high-dimensional feature vectors, and high-dimensional data both makes classification harder and increases classification time. To address this problem, the paper proposes a feature extraction method based on part-of-speech (POS) tagging sequences. Because a POS sequence can characterize a class of texts, the method uses groups of POS sequences as text features and thereby reduces the feature dimension. In the experiments, compared with n-gram feature extraction, the POS-sequence method improved classification accuracy by at least 9% and reduced the feature dimension by 4816. The results indicate that the method is suitable for microblog emotion classification.
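
The dimensionality argument in the abstract can be illustrated with a minimal sketch. The abstract does not say how the POS sequences are formed or selected, so this example assumes contiguous POS bigrams, which may differ from the paper's actual procedure. The posts, the tag set (`n`, `v`, `a`, `d`, `r`), and the helper names `ngrams`, `build_vocab`, and `vectorize` are all hypothetical. The posts are hand-tagged English toy data, standing in for microblog text that a real pipeline would first segment and tag.

```python
from collections import Counter

# Hypothetical hand-tagged posts as (word, POS tag) pairs.
# Assumed tags: n = noun, v = verb, a = adjective, d = adverb, r = pronoun.
POSTS = [
    [("movie", "n"), ("really", "d"), ("great", "a")],
    [("food", "n"), ("very", "d"), ("bad", "a")],
    [("weather", "n"), ("quite", "d"), ("nice", "a")],
    [("I", "r"), ("love", "v"), ("music", "n")],
]


def ngrams(items, n):
    """Return all contiguous n-grams of a sequence as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


def build_vocab(sequences, n):
    """Collect the distinct n-grams across all sequences, sorted for a stable order."""
    return sorted({g for seq in sequences for g in ngrams(seq, n)})


def vectorize(seq, vocab, n):
    """Count how often each vocabulary n-gram occurs in one sequence."""
    counts = Counter(ngrams(seq, n))
    return [counts[g] for g in vocab]


words = [[w for w, _ in post] for post in POSTS]
tags = [[t for _, t in post] for post in POSTS]

word_vocab = build_vocab(words, 2)  # word bigrams: one dimension per distinct word pair
pos_vocab = build_vocab(tags, 2)    # POS bigrams: many word pairs share one tag pair

print(len(word_vocab), len(pos_vocab))  # 8 4
print(vectorize(tags[0], pos_vocab, 2))  # [1, 1, 0, 0]
```

On this toy data the three "noun, adverb, adjective" posts share the same two POS bigrams, so the POS feature space has half as many dimensions as the word-bigram space. The sketch shows only the dimensionality effect and says nothing about classification accuracy.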
Keywords: feature extraction; part-of-speech (POS); tagging sequence; microblog emotion classification; polarity classification
This article is indexed in Wanfang Data (万方数据) and other databases.