Similar literature
Found 20 similar documents
1.
An empirical study of sentiment analysis for Chinese documents   Total citations: 1, self-citations: 0, citations by others: 1
To date, very little research has been conducted on sentiment classification for Chinese documents. To remedy this deficiency, this paper presents an empirical study of sentiment categorization on Chinese documents. Four feature selection methods (MI, IG, CHI and DF) and five learning methods (centroid classifier, K-nearest neighbor, winnow classifier, Naïve Bayes and SVM) are investigated on a Chinese sentiment corpus of 1021 documents. The experimental results indicate that IG performs best for sentiment term selection and SVM exhibits the best performance for sentiment classification. Furthermore, we found that sentiment classifiers depend heavily on domain or topic.
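As an illustration of the IG (information gain) criterion named in this abstract, the following is a minimal sketch, assuming each document is represented as a set of tokens and each term is scored by a binary present/absent split. The toy corpus, labels, and function names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(docs, labels, term):
    """Information gain of splitting the documents on 'term present / absent'."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = ((len(with_term) / n) * entropy(with_term)
                   + (len(without_term) / n) * entropy(without_term))
    return entropy(labels) - conditional

# Hypothetical toy corpus: each document is a set of tokens.
docs = [{"good", "room"}, {"good", "food"}, {"bad", "room"}, {"bad", "food"}]
labels = ["pos", "pos", "neg", "neg"]

# Rank candidate terms by information gain, highest first.
vocabulary = {"good", "bad", "room", "food"}
ranked = sorted(vocabulary,
                key=lambda t: information_gain(docs, labels, t),
                reverse=True)
```

In this toy corpus the polarity words separate the classes perfectly, while the topic words "room" and "food" carry no label information, so a feature selector keeping the top-ranked terms would retain the former.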

2.
The Chinese pronunciation system offers two characteristics that distinguish it from other languages: deep phonemic orthography and intonation variations. In this paper, we hypothesize that these two important properties can play a major role in Chinese sentiment analysis. In particular, we propose two effective features to encode phonetic information and, hence, fuse it with textual information. With this hypothesis, we propose Disambiguate Intonation for Sentiment Analysis (DISA), a network that we develop based on the principles of reinforcement learning. DISA disambiguates intonations for each Chinese character (pinyin) and, hence, learns precise phonetic representations. We also fuse phonetic features with textual and visual features to further improve performance. Experimental results on five different Chinese sentiment analysis datasets show that the inclusion of phonetic features significantly and consistently improves the performance of textual and visual representations and surpasses the state-of-the-art Chinese character-level representations.

3.
Term weighting is a strategy that assigns weights to terms to improve the performance of sentiment analysis and other text mining tasks. In this paper, we propose a supervised term weighting scheme based on two basic factors: the importance of a term in a document (ITD) and the importance of a term for expressing sentiment (ITS). For ITD, we explore three definitions based on term frequency. Then, seven statistical functions are employed to learn the ITS of each term from training documents with category labels. Compared with previous unsupervised term weighting schemes originating from information retrieval, our scheme can make full use of the available labeling information to assign appropriate weights to terms. We have experimentally evaluated the proposed method against the state-of-the-art method. The experimental results show that our method outperforms that method and produces the best accuracy on two of three data sets.

4.
This paper presents our research on automatic annotation of a five-billion-word corpus of Japanese blogs with information on affect and sentiment. We first survey emotion blog corpora and find that no large-scale emotion corpus has been available for the Japanese language. We choose the largest blog corpus for the language and annotate it using two systems for affect analysis: ML-Ask for word- and sentence-level affect analysis and CAO for detailed analysis of emoticons. The annotated information includes affective features such as sentence subjectivity (emotive/non-emotive) and emotion classes (joy, sadness, etc.), useful in affect analysis. The annotations are also generalized on a two-dimensional model of affect to obtain information on sentence valence (positive/negative), useful in sentiment analysis. The annotations are evaluated in several ways. First, they are checked on a test set of a thousand randomly extracted sentences evaluated by over forty respondents. Second, the statistics of the annotations are compared to other existing emotion blog corpora. Finally, the corpus is applied in several tasks, such as generation of an emotion object ontology and retrieval of emotional and moral consequences of actions.

5.
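To address the extraction of product feature-opinion pairs from user review text and their sentiment analysis, this paper uses chunk analysis to extract product features, finds frequent itemsets among them, and filters the candidate product features with PMI to obtain the product feature set. Using the positional adjacency between features and sentiment words, it extracts sentiment words to form feature-opinion pairs, and analyzes sentiment orientation with the SO-PMI method. To verify the effectiveness of the method, hotel review text is used as an example: hotel feature-opinion pairs are extracted and their sentiment analyzed, with a precision of 76.68% and a recall of 70.84%. The experimental results show that introducing chunk analysis can effectively solve the fine-grained sentiment classification problem for product reviews.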
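The SO-PMI step mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming the common definition of SO-PMI as the summed PMI with positive seed words minus the summed PMI with negative seed words, with document-level co-occurrence counts. The additive smoothing constant, the seed words, and the toy reviews are assumptions made for the example and do not come from the paper.

```python
import math

def pmi(word_a, word_b, docs, smoothing=0.01):
    """Pointwise mutual information from document-level co-occurrence.

    The additive smoothing (an assumption of this sketch) avoids log(0)
    when two words never appear together.
    """
    n = len(docs)
    count_a = sum(1 for d in docs if word_a in d)
    count_b = sum(1 for d in docs if word_b in d)
    count_ab = sum(1 for d in docs if word_a in d and word_b in d)
    p_a = (count_a + smoothing) / n
    p_b = (count_b + smoothing) / n
    p_ab = (count_ab + smoothing) / n
    return math.log2(p_ab / (p_a * p_b))

def so_pmi(word, positive_seeds, negative_seeds, docs):
    """Sentiment orientation: association with positive minus negative seeds."""
    positive = sum(pmi(word, seed, docs) for seed in positive_seeds)
    negative = sum(pmi(word, seed, docs) for seed in negative_seeds)
    return positive - negative

# Hypothetical hotel reviews, each reduced to a set of tokens.
reviews = [{"clean", "good"}, {"clean", "excellent"},
           {"noisy", "bad"}, {"noisy", "poor"}]
positive_seeds = ["good", "excellent"]
negative_seeds = ["bad", "poor"]

clean_score = so_pmi("clean", positive_seeds, negative_seeds, reviews)
noisy_score = so_pmi("noisy", positive_seeds, negative_seeds, reviews)
```

A positive score suggests a positive orientation and a negative score a negative one; in practice the decision threshold and the seed lists would need tuning on the target domain.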

6.
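To address the overfitting and poor scalability of the classic C4.5 decision tree algorithm, an improved decision tree algorithm based on Bagging is proposed and parallelized with the MapReduce model. First, the C4.5 algorithm is improved with the Bagging technique: sampling with replacement yields multiple new training sets, each the same size as the initial training set; a classifier is trained on each training set, and the training results are combined by majority voting to obtain the final classifier. Then, the improved algorithm is parallelized with the MapReduce model, which enables parallel processing of the training sets, parallel selection of the best split attribute and best split point, and parallel generation of child nodes. This yields a parallel improved decision tree algorithm based on a MapReduce Job workflow and improves the ability to analyze large data sets. Experimental results show that the parallel Bagging decision tree algorithm achieves high accuracy and sensitivity, as well as good scalability and speedup.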
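The Bagging procedure described in this abstract (sampling with replacement, one classifier per sample, majority vote) can be sketched as below. This is a single-machine illustration only: a one-feature decision stump stands in for C4.5, no MapReduce parallelization is attempted, and the data and function names are hypothetical.

```python
import random
from collections import Counter

def train_stump(sample):
    """Fit a one-feature threshold classifier (a simple stand-in for C4.5)."""
    best = None
    for threshold in sorted({x for x, _ in sample}):
        for left, right in ((0, 1), (1, 0)):
            errors = sum(1 for x, y in sample
                         if (left if x <= threshold else right) != y)
            if best is None or errors < best[0]:
                best = (errors, threshold, left, right)
    _, threshold, left, right = best
    return lambda x: left if x <= threshold else right

def bagging(train, n_models=15, seed=0):
    """Train one stump per bootstrap sample of the training set."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Sample with replacement; each new set has the original size.
        sample = [rng.choice(train) for _ in train]
        models.append(train_stump(sample))
    return models

def predict(models, x):
    """Combine the individual predictions by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# Hypothetical one-dimensional training data: (feature, label).
train = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
models = bagging(train)
```

Because each bootstrap model is trained independently, the training loop is the natural unit to distribute across workers, which is presumably what makes the approach amenable to MapReduce.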

7.
Image reconstruction based on a supervised-learning deep auto-encoder   Total citations: 1, self-citations: 0, citations by others: 1
张赛  芮挺  任桐炜  杨成松  邹军华 《计算机科学》2018,45(11):267-271, 297
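To address the reconstruction of damaged information in digital images, a new method is proposed that applies the classic unsupervised auto-encoder (Auto-Encoder, AE) to supervised learning, and the deep model structure and training strategy are studied. By designing multiple groups of supervised single-layer AE models, group-by-group learning strategies of "progressive learning" and "associated encoding" are proposed, and a new deep AE model structure based on supervised learning is constructed. For the new model structure, a many-to-one training method (multiple forms of one input sample correspond to one output) replaces the one-to-one training method (one input sample corresponds to one output) of the classic AE. The model structure and training strategy are tested on data reconstruction for images with partially damaged or occluded data, improving the model's ability to express feature encodings of damaged data and to reconstruct them. Experimental results show that the proposed method achieves good reconstruction performance and adaptability on damaged and occluded image samples.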

8.
The literature in sentiment analysis has widely assumed that semantic relationships between words cannot be effectively exploited to produce satisfactory sentiment lexicon expansions. This assumption stems from the fact that words considered to be “close” in a semantic space (e.g., word embeddings) may present completely opposite polarities, which might suggest that sentiment information in such spaces is either too faint, or at least not readily exploitable. Our main contribution in this paper is a rigorous and robust challenge to this assumption: by proposing a set of theoretical hypotheses and corroborating them with strong experimental evidence, we demonstrate that semantic relationships can be effectively used for good lexicon expansion. Based on these results, our second contribution is a novel, simple, and yet effective lexicon-expansion strategy based on semantic relationships extracted from word embeddings. This strategy is able to substantially enhance the lexicons, whilst overcoming the major problem of lexicon coverage. We present an extensive experimental evaluation of sentence-level sentiment analysis, comparing our approach to sixteen state-of-the-art (SOTA) lexicon-based and five lexicon expansion methods, over twenty datasets. Results show that in the vast majority of cases our approach outperforms the alternatives, achieving coverage of almost 100% and gains of about 26% against the best baselines. Moreover, our unsupervised approach performed competitively against SOTA supervised sentiment analysis methods, mainly in scenarios with scarce information. Finally, in a cross-dataset comparison, our approach turned out to be as competitive as (i.e., statistically tied with) state-of-the-art supervised solutions such as pre-trained transformers (BERT), even without relying on any training (labeled) data. Indeed, in small datasets or in datasets with scarce information (short messages), our solution outperformed the supervised ones by large margins.
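To make the general idea of embedding-based lexicon expansion concrete, here is a deliberately naive sketch. It is not the strategy proposed in the abstract above, whose details are not given here; it simply assigns each out-of-lexicon word the similarity-weighted polarity of its nearest seed words. The two-dimensional vectors, the seed polarities, and the function names are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def expand_lexicon(seed_lexicon, embeddings, k=2):
    """Give each unseen word the similarity-weighted polarity of its k
    most similar seed words (a naive baseline, assumed for illustration)."""
    expanded = dict(seed_lexicon)
    for word, vector in embeddings.items():
        if word in seed_lexicon:
            continue
        neighbours = sorted(
            ((cosine(vector, embeddings[seed]), polarity)
             for seed, polarity in seed_lexicon.items() if seed in embeddings),
            reverse=True)[:k]
        weight = sum(abs(sim) for sim, _ in neighbours)
        if weight:
            expanded[word] = sum(sim * pol for sim, pol in neighbours) / weight
    return expanded

# Hypothetical toy embeddings and seed polarities.
embeddings = {
    "good": [1.0, 0.1], "great": [0.9, 0.2], "superb": [0.8, 0.1],
    "bad": [-1.0, 0.1], "awful": [-0.9, 0.2], "dreadful": [-0.8, 0.1],
}
seed_lexicon = {"good": 1.0, "great": 1.0, "bad": -1.0, "awful": -1.0}
expanded = expand_lexicon(seed_lexicon, embeddings)
```

The toy vectors are constructed so that polarity aligns with the first dimension. Real embeddings often place antonyms close together, which is exactly the difficulty the abstract refers to, so a baseline like this should be expected to fail in such cases.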

9.
Recent years have witnessed a surge of interest in computational methods for affect, ranging from opinion mining, to subjectivity detection, to sentiment and emotion analysis. This article presents a brief overview of the latest trends in the field and describes the manner in which the articles contained in the special issue contribute to the advancement of the area. Finally, we comment on the current challenges and envisaged developments of the subjectivity and sentiment analysis fields, as well as their application to other Natural Language Processing tasks and related domains.

10.
Due to the advancement of technology and globalization, it has become much easier for people around the world to express their opinions through social media platforms. Harvesting opinions through sentiment analysis from people with different backgrounds and from different cultures via social media platforms can help modern organizations, including corporations and governments, understand customers, make decisions, and develop strategies. However, the multiple languages used on many social media platforms make it difficult to perform sentiment analysis with acceptable levels of accuracy and consistency. In this paper, we propose a bilingual approach to conducting sentiment analysis on both Chinese and English social media to obtain more objective and consistent opinions. Instead of processing English and Chinese comments separately, our approach treats review comments as a stream of text containing both Chinese and English words. That stream of text is then segmented by our segment model and trimmed using stop word lists that include both Chinese and English words. The stem words are then converted into feature vectors and passed to two interchangeable natural language models, SVM and N-Gram. Finally, we perform a case study, applying our proposed approach to analyzing movie reviews obtained from social media. Our experiment shows that our proposed approach has a high level of accuracy and is more effective than the existing learning-based approaches.

11.
IT vendors routinely use social media such as YouTube not only to disseminate their IT product information, but also to acquire customer input efficiently as part of their market research strategies. Customer responses that appear in social media, however, are typically unstructured; thus, a fairly large data set is needed for meaningful analysis. Although identifying customers’ value structures and attitudes may be useful for developing targeted or niche markets, the unstructured and volume-heavy nature of customer data prohibits efficient and economical extraction of such information. Automatic extraction of customer information would be valuable in determining value structure and strength. This paper proposes an intelligent method of estimating causality between user profiles, value structures, and attitudes based on the replies and published content managed by open social network systems such as YouTube. To show the feasibility of the idea proposed in this paper, information richness and agility are used as underlying concepts to create performance measures based on media/information richness theory. The resulting deep sentiment analysis proves to be superior to legacy sentiment analysis tools for estimation of causality among the focal parameters.

12.
13.
Trimmed bagging   Total citations: 1, self-citations: 0, citations by others: 1
Bagging has been found to be successful in increasing the predictive performance of unstable classifiers. Bagging draws bootstrap samples from the training sample, applies the classifier to each bootstrap sample, and then averages over all obtained classification rules. The idea of trimmed bagging is to exclude the bootstrapped classification rules that yield the highest error rates, as estimated by the out-of-bag error rate, and to aggregate over the remaining ones. In this note we explore the potential benefits of trimmed bagging. On the basis of numerical experiments, we conclude that trimmed bagging performs comparably to standard bagging when applied to unstable classifiers such as decision trees, but yields better results when applied to more stable base classifiers, like support vector machines.
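The trimming step described in this abstract can be sketched as follows: estimate each bootstrap classifier's out-of-bag error, drop the worst fraction, and aggregate the rest. The sketch makes several assumptions that are not in the abstract: a nearest-class-mean base learner, majority voting as the aggregation rule, a trim fraction of 0.25, and an out-of-bag error of 0.0 when a bootstrap sample happens to cover every training point.

```python
import random
from collections import Counter

def train_centroid(sample):
    """Nearest-class-mean classifier on one feature (illustrative base learner)."""
    sums, counts = Counter(), Counter()
    for x, y in sample:
        sums[y] += x
        counts[y] += 1
    means = {y: sums[y] / counts[y] for y in counts}
    return lambda x: min(means, key=lambda y: abs(x - means[y]))

def trimmed_bagging(train, n_models=20, trim_fraction=0.25, seed=0):
    """Keep only the bootstrap models with the lowest out-of-bag error."""
    rng = random.Random(seed)
    scored = []
    for _ in range(n_models):
        indices = [rng.randrange(len(train)) for _ in train]
        model = train_centroid([train[i] for i in indices])
        chosen = set(indices)
        # Out-of-bag points: training points this bootstrap sample missed.
        oob = [train[i] for i in range(len(train)) if i not in chosen]
        if oob:
            error = sum(1 for x, y in oob if model(x) != y) / len(oob)
        else:
            error = 0.0  # assumption: no out-of-bag points, no penalty
        scored.append((error, model))
    scored.sort(key=lambda pair: pair[0])
    keep = n_models - int(trim_fraction * n_models)
    return [model for _, model in scored[:keep]]

def predict(models, x):
    """Aggregate the retained models by majority vote."""
    return Counter(model(x) for model in models).most_common(1)[0][0]

# Hypothetical one-dimensional training data: (feature, label).
train = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
models = trimmed_bagging(train)
```

Setting `trim_fraction=0` recovers standard bagging, which makes it easy to compare the two variants on the same bootstrap samples by reusing the seed.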

14.
Sentiment analysis techniques are increasingly used to grasp reactions from social media users to unexpected and potentially stressful social events. This paper argues that, alongside assessments of the affective valence of social media content as negative or positive, there is a need for a deeper understanding of the context in which reactions are expressed and the specific functions that users' emotional states may reflect. To demonstrate this, we present a qualitative analysis of affective expressions on Twitter collected in Germany during the 2011 EHEC food contamination incident based on a coding scheme developed from Skinner et al.'s (2003) coping classification framework. Affective expressions of coping were found to be diverse not only in terms of valence but also in the adaptive functions they served: beyond the positive or negative tone, some people perceived the outbreak as a threat while others as a challenge to cope with. We discuss how this qualitative sentiment analysis can allow a better understanding of the way the overall situation is perceived – threat or challenge – and the resources that individuals experience having to cope with emerging demands.

15.
This work presents a novel application of Sentiment Analysis in Recommender Systems by categorizing users according to the average polarity of their comments. These categories are used as attributes in Collaborative Filtering algorithms. To test this solution, a new corpus of opinions on movies obtained from the Internet Movie Database (IMDb) has been generated, so both ratings and comments are available. The experiments stress the informative value of comments. By applying Sentiment Analysis approaches, some Collaborative Filtering algorithms can be improved in rating prediction tasks. The results indicate that we obtain a more reliable prediction when considering only the opinion text (RMSE of 1.868) than when applying similarities over the entire user community (RMSE of 2.134), and that sentiment analysis can be advantageous to recommender systems.
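Two pieces of this abstract lend themselves to a short sketch: grouping users by the average polarity of their comments, and the RMSE metric used to compare rating predictions. The polarity scale (assumed here to lie in [-1, 1]), the 0.1 threshold, and the category names are assumptions made for illustration and are not specified in the abstract.

```python
import math

def polarity_category(comment_polarities, threshold=0.1):
    """Label a user by the mean polarity of their comments.

    Polarities are assumed to lie in [-1, 1]; the threshold is hypothetical.
    """
    average = sum(comment_polarities) / len(comment_polarities)
    if average > threshold:
        return "positive"
    if average < -threshold:
        return "negative"
    return "neutral"

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))

# Hypothetical users with per-comment polarity scores.
users = {"alice": [0.8, 0.4], "bob": [-0.6, -0.2], "carol": [0.05, -0.05]}
categories = {name: polarity_category(scores) for name, scores in users.items()}
```

The resulting category can then be supplied as a user attribute to a collaborative filtering algorithm, and `rmse` applied to its predicted ratings; a lower RMSE indicates predictions closer to the true ratings.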

16.
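With the rapid development of natural language processing, sentiment analysis, one of its important branches, is widely applied on social network platforms; microblogs (Weibo) in particular are favored by researchers because they spread widely and carry rich sentiment information. To parse the sentiment expressed in microblogs and to mine the latent emotions they contain, this paper studies and improves a deep learning model built on the denoising auto-encoder. A denoising auto-encoder works by restoring the original input under the interference of introduced noise; the advantage of the improved model is that it takes the diversity and complexity of noise into account and strengthens the model's ability to recover the original features through deep learning training, thereby overcoming unpredictable noise in the original input. Sentiment analysis experiments are then carried out with SVM, the denoising auto-encoder model, and the improved model. A comparison of the classification results shows that the improved deep model captures the sentiment of microblog text more accurately, with improved resistance to interference and robustness.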

17.
Several methods have been proposed for microarray data analysis that make it possible to identify groups of genes with similar expression profiles under only a subset of examples. We propose to improve the performance of these biclustering methods by adapting the bagging approach to biclustering problems. The principle consists of generating a set of biclusters and aggregating the results. Our method has been tested successfully on both synthetic and real datasets.

18.
With the widespread usage of social networks, forums and blogs, customer reviews have emerged as a critical factor in customers’ purchase decisions. Since the early 2000s, researchers have focused on these reviews in order to categorize them automatically into polarity levels such as positive, negative, and neutral. This research problem is known as sentiment classification. The objective of this study is to investigate the potential benefit of the multiple classifier systems concept for the Turkish sentiment classification problem and to propose a novel classification technique. The Vote algorithm has been used in conjunction with three classifiers, namely Naive Bayes, Support Vector Machine (SVM), and Bagging. The parameters of the SVM were optimized when it was used as an individual classifier. Experimental results showed that multiple classifier systems increase the performance of individual classifiers on Turkish sentiment classification datasets and that meta classifiers contribute to the power of these multiple classifier systems. The proposed approach achieved better performance than Naive Bayes, which was reported as the best individual classifier for these datasets, and Support Vector Machines. Multiple classifier systems (MCS) are a good approach for sentiment classification, and parameter optimization of individual classifiers must be taken into account when developing MCS-based prediction systems.

19.
Many tasks related to sentiment analysis rely on sentiment lexicons, lexical resources containing information about the emotional implications of words (e.g., sentiment orientation of words, positive or negative). In this work, we present an automatic method for building lemma-level sentiment lexicons, which has been applied to obtain lexicons for English, Spanish and three other official languages of Spain. Our lexicons are multi-layered, allowing applications to trade off between the number of available words and the accuracy of the estimations. Our evaluations show high accuracy values in all cases. As a preliminary step to the lemma-level lexicons, we have built a synset-level lexicon for English similar to SentiWordNet 3.0, one of the most widely used sentiment lexicons today. We have made several improvements to the original SentiWordNet 3.0 building method, yielding significantly better estimations of positivity and negativity, according to our evaluations. The resource containing all the lexicons, ML-SentiCon, is publicly available.

20.
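Traditional aspect-level sentiment analysis methods lack study of the interaction between the aspect entity and its context, which leads to low accuracy in sentiment classification. To extract text features effectively, an aspect-level sentiment analysis model is proposed that uses a multi-head attention mechanism to learn the relationship between the aspect entity and its context (intra&inter multi-head attention network, IIMAN), thereby improving sentiment polarity judgments. The model first uses pre-trained BERT to convert the input sentence into word vectors; it then learns the relationships between the aspect entity and the context, and within the context itself, through the intra multi-head attention and the inter (joint) multi-head attention in the attention network; finally, it completes sentiment polarity classification through a point-wise convolution transformation layer, an aspect-entity-oriented attention layer, and an output layer. Experiments on three public aspect-level sentiment analysis data sets, Twitter, laptop, and restaurant, show that IIMAN further improves accuracy and F1 score over other baseline models and can effectively improve sentiment polarity classification.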
