Model Optimization of Financial Corpus Sentiment Analysis Based on THUCTC
Citation: Rao Dong-ning, Huang Si-hong. Model Optimization of Financial Corpus Sentiment Analysis Based on THUCTC[J]. Journal of Guangdong University of Technology, 2018, 35(3): 37-42. DOI: 10.12052/gdutxb.180016
Authors: Rao Dong-ning, Huang Si-hong
Affiliation: School of Computers, Guangdong University of Technology, Guangzhou 510006, China
Funding: Natural Science Foundation of Guangdong Province (2016A030313084, 2016A030313700)
Abstract: Sentiment analysis has attracted growing interest in recent years; in finance, it can serve as a reference for investors before they invest. However, existing approaches are either too narrowly targeted, which introduces data bias, or too general to be precise. This paper therefore optimizes a general-purpose Chinese text classifier for sentiment analysis of online reviews and stock news. A corpus of 20,000 items was collected, and each item was labeled independently by three annotators to serve as ground truth. THUCTC was then optimized in three respects. First, for word segmentation, a stemming-dictionary method was combined with different segmentation schemes, and experiments showed that the 2-gram scheme gave the best results. Second, the classifier kernels were compared: the Liblinear kernel suits investors who need results quickly, whereas the Libsvm kernel has the advantage in accuracy. Third, a finance-oriented sentiment dictionary was built using the Chi-square and TF-IDF methods; it can be used with off-the-shelf general text classifiers. In this way, the results can be generalized without loss of accuracy.

Keywords: sentiment analysis, text categorization, stock price trend prediction, Chinese word segmentation
Received: 2018-01-01
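
As a rough illustration of the 2-gram segmentation and dictionary-building steps mentioned in the abstract, the following Python sketch splits short Chinese texts into character 2-grams and scores each 2-gram with the Chi-square statistic and with TF-IDF. It does not use THUCTC's API. The function names, the four-sentence toy corpus, and its labels are invented for illustration, and because the abstract does not say how the Chi-square and TF-IDF scores are combined, they are simply computed separately here.

```python
import math


def char_bigrams(text):
    """Split a string into overlapping character 2-grams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return ["".join(chars)] if chars else []
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


def chi_square(term, docs, labels, target):
    """Chi-square statistic of `term` against `target`, from a 2x2 table."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        if has_term and label == target:
            a += 1          # term present, target class
        elif has_term:
            b += 1          # term present, other class
        elif label == target:
            c += 1          # term absent, target class
        else:
            d += 1          # term absent, other class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def tf_idf(term, tokens, docs):
    """Plain TF-IDF of `term` in one tokenized document."""
    if not tokens:
        return 0.0
    tf = tokens.count(term) / len(tokens)
    df = sum(1 for other in docs if term in other)
    return tf * math.log(len(docs) / df) if df else 0.0


def rank_terms(docs, labels, target, k=3):
    """Return the k terms with the highest Chi-square score for `target`."""
    vocab = {t for tokens in docs for t in tokens}
    scored = [(chi_square(t, docs, labels, target), t) for t in vocab]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by term
    return [(t, s) for s, t in scored[:k]]


# Hypothetical toy corpus (not from the paper's 20,000-item data set).
texts = ["股价大涨", "业绩大涨", "股价大跌", "业绩下滑"]
labels = ["pos", "pos", "neg", "neg"]
docs = [char_bigrams(t) for t in texts]

if __name__ == "__main__":
    for term, score in rank_terms(docs, labels, "pos"):
        print(term, round(score, 3), round(tf_idf(term, docs[0], docs), 3))
```

Note that the Chi-square statistic is symmetric: a 2-gram that appears only in negative items also scores highly against the positive class, so a real dictionary would additionally need to record the direction of the association.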