Uyghur Text Sentiment Classification Based on Bi-tagged Features (基于Bi-tagged特征的维吾尔文情感分类方法研究)
Cite this article: Raxida Turhuntay, Wushour Slamu. Uyghur Text Sentiment Classification Based on Bi-tagged Features[J]. Journal of Chinese Information Processing (中文信息学报), 2018, 32(8): 80-90
Authors: Raxida Turhuntay, Wushour Slamu
Affiliation: 1. College of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046, China;
2. College of Electronic and Information Engineering, Yili Normal University, Yining, Xinjiang 835000, China
Funding: National Key Basic Research Program of China ("973" Program) (2014CB340506); National Natural Science Foundation of China (61363063, 61662076)
Abstract: Existing Uyghur text sentiment classification methods represent text with unigram features obtained by splitting on spaces, and therefore cannot capture the deeper linguistic phenomena related to the expression of sentiment. Starting from the sequential dependencies between Uyghur words, this paper summarizes several part-of-speech combination rules, extracts Bi-tagged features that carry rich sentiment information, and uses a support vector machine (SVM) classifier to classify a Uyghur sentiment corpus into positive and negative classes. The experimental results show that, for Uyghur text sentiment classification: (1) the Bi-tagged features perform best when they include all of the part-of-speech rules proposed in this paper; (2) the Bi-tagged features capture not only sentiment-rich information but also negation; (3) compared with the commonly used unigram features, bigram features, and combined unigram and bigram features on the dataset used in this paper, the combination of Bi-tagged and unigram features classifies better, raising classification accuracy by 4.225% over the paper's baseline. These results can further improve the performance of Uyghur text sentiment classification and can also serve as a reference for sentiment classification in closely related languages such as Kazakh and Kyrgyz.
Keywords: sentiment classification; Bi-tagged features; combined features; Uyghur
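The feature extraction described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not list the actual part-of-speech combination rules, so the tag pairs in `POS_PAIR_RULES`, the function names, and the English-glossed toy sentence are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical part-of-speech pair rules. The abstract does not give the
# paper's actual rules, so these tag pairs are illustrative assumptions only.
POS_PAIR_RULES = {
    ("ADJ", "NOUN"),
    ("ADV", "ADJ"),
    ("ADV", "VERB"),
    ("VERB", "NEG"),
}


def bi_tagged_features(tagged_tokens, rules=POS_PAIR_RULES):
    """Return adjacent word pairs whose POS tag pair matches one of the rules."""
    features = []
    for (w1, t1), (w2, t2) in zip(tagged_tokens, tagged_tokens[1:]):
        if (t1, t2) in rules:
            features.append(f"{w1}_{w2}")
    return features


def combined_features(tagged_tokens, rules=POS_PAIR_RULES):
    """Count unigram features together with the Bi-tagged features."""
    unigrams = [word for word, _ in tagged_tokens]
    return Counter(unigrams + bi_tagged_features(tagged_tokens, rules))


# Toy English-glossed input standing in for a POS-tagged Uyghur sentence.
sentence = [
    ("very", "ADV"),
    ("good", "ADJ"),
    ("film", "NOUN"),
    ("like", "VERB"),
    ("not", "NEG"),
]
features = combined_features(sentence)
```

Here the pair built from the verb and the following negation marker is kept as a single feature, which is one way a pair-based feature could carry the negation information that unigrams alone lose. The resulting counts would then be turned into vectors and passed to an SVM classifier, which is the only classifier the abstract names.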