Similar literature
20 similar documents found (search time: 0 ms)
1.
Social media sites and applications, including Facebook, YouTube, Twitter and blogs, have become major attractions today. The huge amount of information they generate has become an attractive resource for organisations that want to monitor users' opinions, and it is therefore receiving a lot of attention in the field of sentiment analysis. Early work on sentiment analysis approached the problem at the document level, identifying the overall sentiment rather than its details. This research uses aspect-based sentiment analysis on Twitter in order to perform a finer-grained analysis. A new hybrid sentiment classification for Twitter is proposed by embedding a feature selection method. A comparison of the classification accuracy obtained with the principal component analysis (PCA), latent semantic analysis (LSA), and random projection (RP) feature selection methods is presented in this paper. Furthermore, the hybrid sentiment classification was validated using Twitter datasets representing different domains, and an evaluation with different classification algorithms also demonstrated that the new hybrid approach produced meaningful results. The implementations showed that the new hybrid sentiment classification was able to improve accuracy over the existing baseline sentiment classification methods by 76.55%, 71.62% and 74.24%, respectively.

2.
Yang Shuxin, Zhang Nan. 《计算机应用》 (Journal of Computer Applications), 2021, 41(10): 2829-2834
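Word embedding techniques play an important role in text sentiment analysis, but traditional word embeddings such as Word2Vec and GloVe give each word a single, context-independent meaning. To address this problem, a text sentiment analysis model named SLP-ELMo is proposed, which combines a sentiment lexicon with the contextual language model ELMo. First, the sentiment lexicon is used to filter the words in a sentence. Second, the selected words are fed into a character-level convolutional neural network (char-CNN) to produce a character vector for each word. Then, the character vectors are fed into the ELMo model for training. In addition, an attention mechanism is added to the last layer of the ELMo vectors so that the word vectors are trained better. Finally, the word vectors and the ELMo vectors are fused in parallel and passed to a classifier for text sentiment classification. Compared with several existing models, the proposed model achieves higher accuracy on both the IMDB and SST-2 datasets, which verifies its effectiveness.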

3.
Sentiment lexicons and word embeddings constitute well-established sources of information for sentiment analysis in online social media. Although their effectiveness has been demonstrated in state-of-the-art sentiment analysis and related tasks in the English language, such publicly available resources are much less developed and evaluated for the Greek language. In this paper, we tackle the problems arising when analyzing text in such an under-resourced language. We present and make publicly available a rich set of such resources, ranging from a manually annotated lexicon, to semi-supervised word embedding vectors and annotated datasets for different tasks. Our experiments using different algorithms and parameters on our resources show promising results over standard baselines; on average, we achieve a 24.9% relative improvement in F-score on the cross-domain sentiment analysis task when training the same algorithms with our resources, compared to training them on more traditional feature sources, such as n-grams. Importantly, while our resources were built with the primary focus on the cross-domain sentiment analysis task, they also show promising results in related tasks, such as emotion analysis and sarcasm detection.

4.
Language Resources and Evaluation - Sentiment analysis is a classification task where polarity of textual data is identified, i.e. to analyze whether a sentence or document expresses a negative,...

5.
Supervised sentiment classification systems are typically domain-specific, and their performance decreases sharply when they are transferred from one domain to another. Building these systems involves annotating a large amount of data for every domain, which requires much human labor. A reasonable alternative is to use labeled data from an existing (source) domain for sentiment classification in the target domain. To address this problem, we propose a two-stage framework for cross-domain sentiment classification. At the “building a bridge” stage, we build a bridge between the source domain and the target domain to obtain the most confidently labeled documents in the target domain; at the “following the structure” stage, we exploit the intrinsic structure revealed by these most confidently labeled documents to label the target-domain data. The experimental results indicate that the proposed approach can improve the performance of cross-domain sentiment classification dramatically.

6.
7.
8.
Han Hongyu, Zhang Jianpei, Yang Jing, Shen Yiran, Zhang Yongshi. 《Multimedia Tools and Applications》, 2018, 77(16): 21265-21280
Multimedia Tools and Applications - Lexicon-based approaches for review sentiment analysis have attracted significant attention in recent years. Lots of sentiment lexicon generation methods have...

9.
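Text sentiment analysis has gradually become an important part of natural language processing (NLP), and it plays an increasingly important role in areas such as recommender systems, acquiring users' sentiment information, and providing public-opinion references for governments and enterprises. Through a literature survey, the methods used in the field of sentiment analysis are compared and reviewed. First, the literature on sentiment analysis methods is surveyed along dimensions such as time and method. Then, the main methods and application scenarios of sentiment analysis are summarized and compared. Finally, on this basis, the advantages and disadvantages of each method are analyzed. The analysis shows that, across different task scenarios, there are three main approaches to sentiment analysis: sentiment-lexicon-based, machine-learning-based, and deep-learning-based methods, and that hybrid multi-strategy methods are becoming the trend for improvement. The survey indicates that the techniques of text sentiment analysis still have room for improvement, and that they have large markets and good development prospects in e-commerce, psychotherapy, and public-opinion monitoring.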

10.
Language Resources and Evaluation - This paper describes the development of a multilingual, manually annotated dataset for three under-resourced Dravidian languages generated from social media...

11.
Sentiment analysis aims to extract the sentiment polarity of a given segment of text. Polarity resources that indicate the sentiment polarity of words are commonly used in different approaches. While English is the richest language with regard to such resources, the majority of other languages, including Turkish, lack polarity resources. In this work we present the first comprehensive Turkish polarity resource, SentiTurkNet, where three polarity scores are assigned to each synset in the Turkish WordNet, indicating its positivity, negativity, and objectivity (neutrality) levels. Our method is general and applicable to other languages. Evaluation results for Turkish show that the polarity scores obtained through this method are more accurate than those obtained through direct translation (mapping) from SentiWordNet.

12.
Whilst several examples of segment-based approaches to language identification (LID) have been published, they have typically been conducted using only a small number of languages, or varying feature sets, thus making it difficult to determine how the segment length influences the accuracy of LID systems. In this study, phone-triplets are used as crude approximations of a syllable-length sub-word segmental unit. The proposed pseudo-syllabic length framework is subsequently used for both qualitative and quantitative examination of the contributions made by acoustic, phonotactic and prosodic information sources, and trialled in accordance with the NIST 1996 LID protocol. Firstly, a series of experimental comparisons is conducted which examines the utility of using segmental units for modelling short-term acoustic features. These include comparisons between language-specific Gaussian mixture models (GMMs), language-specific GMMs for each segmental unit, and finally language-specific hidden Markov models (HMMs) for each segment, undertaken in an attempt to better model the temporal evolution of acoustic features. In a second tier of experiments, the contribution of both broad and fine class phonotactic information, when considered over an extended time frame, is contrasted with an implementation of the currently popular parallel phone recognition language modelling (PPRLM) technique. Results indicate that this information can be used to complement existing PPRLM systems to obtain improved performance. The pseudo-syllabic framework is also used to model prosodic dynamics and compared to an implemented version of a recently published system, achieving comparable levels of performance.
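
The phone-triplet segmentation mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the phone sequence is already available as a list of labels; the function name `phone_triplets` and the `step` parameter are hypothetical, and the abstract does not say whether the paper's triplets overlap, so both variants are shown.

```python
from collections import Counter

def phone_triplets(phones, step=1):
    """Return windows of three consecutive phones.

    step=1 yields overlapping triplets, step=3 yields disjoint ones.
    Which variant the paper uses is not stated; this is an assumption.
    """
    return [tuple(phones[i:i + 3]) for i in range(0, len(phones) - 2, step)]

# Hypothetical phone labels, e.g. the output of a phone recogniser.
phones = ["s", "e", "n", "t", "i", "m"]

overlapping = phone_triplets(phones)         # 4 triplets
disjoint = phone_triplets(phones, step=3)    # 2 triplets

# Triplet counts could serve as simple phonotactic features per utterance.
triplet_counts = Counter(overlapping)
```

Sequences shorter than three phones yield an empty list, so callers do not need a special case for very short utterances.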

13.
Facilitated by SOA and new Web technologies, Service-Oriented Rich Clients (SORCs) compose various Web-delivered services in the Web browser to create new applications. SORCs support client-side data storage and manipulation and provide more features than traditional thin clients. However, SORCs might suffer from data access issues, mainly due to incompatible client-side data sources and improper or even undesirable server-side cache strategies. To address these data access issues, this paper proposes a data access framework for SORCs. The main contributions of this paper are as follows. First, the framework lets SORCs accommodate heterogeneous local storage solutions and diverse Web browsers properly. The framework abstracts the underlying details of different local storages and selects the most suitable data sources for the SORC currently in use. Second, the framework provides a cache mechanism that supports customized client-side cache strategies. An adaptive technique is also proposed that adjusts cache strategies based on users' historical actions to achieve better performance.

14.
As a new form of social media, microblogging provides a sharing platform on which users can share their feelings and ideas on certain topics. Bursty topics on microblogs result from emerging issues that quickly attract followers and attention online, and they provide a unique opportunity to gauge the relation between expressed public sentiment and hot topics. This paper presents a Social Sentiment Sensor (SSS) system on Sina Weibo that detects daily hot topics and analyzes the sentiment distributions toward these topics. SSS includes two main techniques, namely, hot topic detection and topic-oriented sentiment analysis. Hot topic detection aims to detect the most popular topics online in three steps: topic detection, topic clustering, and topic popularity ranking. We extract topics from hashtags using a hashtag filtering model, because hashtags cover almost all topics. Then, we cluster the topics that describe the same issue and rank the topic clusters by popularity to obtain the final hot topics. Topic-oriented sentiment analysis aims to analyze public opinions toward the hot topics. After retrieving the topic-related messages, we recognize the sentiment of each message using a state-of-the-art SVM (Support Vector Machine) sentiment classifier. Then, we summarize the sentiments for each hot topic to obtain the topic sentiment distribution. Based on the above framework and algorithms, SSS produces a real-time visualization system for monitoring social sentiment, offering the public a new and timely perspective on the dynamics of social topics.
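
The counting side of the pipeline described above can be sketched as follows, under loudly stated assumptions: hashtags follow the Weibo `#topic#` convention, popularity is simply the number of messages carrying a hashtag, and sentiment labels are already given. The paper's hashtag filtering model, topic clustering, and SVM classifier are not reproduced here, and the names `rank_topics` and `sentiment_distribution` are hypothetical.

```python
import re
from collections import Counter

# Weibo-style hashtags are delimited on both sides: #topic#
HASHTAG = re.compile(r"#([^#]+)#")

def rank_topics(messages):
    """Rank hashtags by the number of messages that mention them.

    Message count as the popularity measure is an assumption of this sketch.
    """
    counts = Counter()
    for text, _label in messages:
        counts.update(set(HASHTAG.findall(text)))  # count each message once
    return counts.most_common()

def sentiment_distribution(messages, topic):
    """Share of each sentiment label among messages tagged with `topic`."""
    labels = Counter(
        label for text, label in messages if topic in HASHTAG.findall(text)
    )
    total = sum(labels.values())
    return {label: n / total for label, n in labels.items()} if total else {}

# Hypothetical, pre-labelled messages (labels would come from a classifier).
messages = [
    ("#earthquake# stay safe everyone", "negative"),
    ("#earthquake# rescue teams arrived quickly", "positive"),
    ("#newphone# love the camera", "positive"),
    ("#earthquake# so worried", "negative"),
]

ranking = rank_topics(messages)
distribution = sentiment_distribution(messages, "earthquake")
```

Keeping ranking and sentiment aggregation as separate functions mirrors the two techniques named in the abstract, so either one could be swapped for a more faithful component.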

15.
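This paper studies language ontologies and proposes a language ontology for multilingual information processing, presenting its design rationale, definition, and reasoning mechanism, and describing its structure and construction method. The ontology uses a layered tree structure and covers four classes of words that carry semantics: nouns, verbs, adverbs, and adjectives. It is organized with word senses and languages as class nodes and words as leaf nodes. It can represent the semantics of the vocabulary of each language and merges the languages around semantics as the pivot, which provides conversion relations between the words of different languages. In addition, it provides a method for computing the semantic similarity of words and a reasoning mechanism for inference over semantics.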
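
A minimal sketch of such a sense-pivoted tree, for illustration only. It simplifies the structure described above by storing the language as a dictionary key on each sense node instead of as a separate class node, and it uses a generic path-based similarity, since the abstract does not specify the paper's similarity formula. All names (`Sense`, `translate`, `similarity`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Sense:
    """A class node for one word sense; leaf words are grouped by language."""
    name: str
    parent: Optional["Sense"] = None
    words: Dict[str, List[str]] = field(default_factory=dict)

def path_to_root(sense):
    """List the node itself followed by its ancestors up to the root."""
    path = []
    while sense is not None:
        path.append(sense)
        sense = sense.parent
    return path

def translate(word, src, dst, senses):
    """Convert a word between languages by pivoting on shared senses."""
    out = []
    for s in senses:
        if word in s.words.get(src, []):
            out.extend(s.words.get(dst, []))
    return out

def similarity(a, b):
    """Path-based similarity (an assumption): 1 / (1 + edges between a and b)."""
    pa, pb = path_to_root(a), path_to_root(b)
    for i, node in enumerate(pa):
        for j, other in enumerate(pb):
            if node is other:  # lowest common ancestor found
                return 1.0 / (1 + i + j)
    return 0.0  # the two senses are in different trees

entity = Sense("entity")
animal = Sense("animal", entity, {"en": ["animal"], "zh": ["动物"]})
dog = Sense("dog", animal, {"en": ["dog"], "zh": ["狗"]})
cat = Sense("cat", animal, {"en": ["cat"], "zh": ["猫"]})
all_senses = [entity, animal, dog, cat]
```

With this toy tree, `translate("dog", "en", "zh", all_senses)` pivots through the shared `dog` sense, and `similarity(dog, cat)` is lower than `similarity(dog, animal)` because the two leaves are two edges apart.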

16.
17.
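When classifying the sentiment polarity of online review text, traditional machine learning methods fail to fully exploit semantic and relational information. Existing deep learning methods can extract semantic and contextual information, but the process is usually unidirectional, which leaves them deficient at capturing the deep semantics of review text. To address these problems, a text sentiment analysis method is proposed that combines the generalized autoregressive pretrained language model XLNet with a recurrent convolutional neural network (RCNN). First, XLNet is used to produce the feature representation of the text; by introducing a segment-level recurrence mechanism and relative positional encoding, it makes full use of the contextual information in the review text and thereby effectively improves the expressiveness of the text features. Then, the RCNN is trained on the text features bidirectionally and extracts contextual semantic information at a deeper level, which improves overall performance on the sentiment analysis task. The proposed method was evaluated on three public datasets, weibo-100k, waimai-10k, and ChnSentiCorp, and reached accuracies of 96.4%, 91.8%, and 92.9%, respectively. The experimental results demonstrate the effectiveness of the proposed method for sentiment analysis.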

18.
A method is presented for detecting potential violations of integrity constraints by concurrent transactions running under snapshot isolation (SI). Although SI provides a high level of isolation, it does not, by itself, ensure that all integrity constraints are satisfied. Specifically, while current implementations of SI enforce all internal integrity constraints, notably key constraints, they fail to enforce constraints implemented via triggers. One remedy is to turn to serializable SI (SSI), in which full serializability is guaranteed. However, SSI comes at the price of either a substantial number of false positives or a high cost of constructing the full direct serialization graph. In this work, a compromise approach, called constraint-preserving snapshot isolation (CPSI), is developed, which, while not guaranteeing full serializability, does guarantee that all constraints, including those enforced via triggers, are satisfied. In contrast to full SSI, CPSI requires testing concurrent transactions for conflict only pairwise, and thus involves substantially less overhead while providing a foundation for resolving conflicts via negotiation rather than via abort and restart. As is the case with SSI, CPSI can result in false positives. To address this, a hybrid approach is also developed which combines CPSI with a special version of SSI called CSSI, resulting in substantially fewer false positives than would occur using either approach alone.

19.
Multimedia Tools and Applications - Sentiment analysis is a domain of study that focuses on identifying and classifying the ideas expressed in the form of text into positive, negative and neutral...

20.
Sentiment analysis has long been a hot topic for understanding users' statements online. Many machine learning approaches to sentiment analysis have been proposed, ranging from simple feature-oriented SVMs to more complicated probabilistic models. Though they have demonstrated capability in polarity detection, one challenge remains: the curse of dimensionality, which arises from the high-dimensional nature of text-based documents. In this research, inspired by the dimensionality reduction and feature extraction capability of auto-encoders, an auto-encoder-based bagging prediction architecture (AEBPA) is proposed. The experimental study on commonly used datasets has shown its potential. It is believed that this method can offer researchers in the community further insight into bagging-oriented solutions for sentiment analysis.
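
The bagging half of the idea above can be sketched generically: train several models on bootstrap samples and combine them by majority vote. This is not the AEBPA architecture itself. The base learner below is a deliberately trivial stand-in (nearest class mean on a single numeric feature), not an auto-encoder, and every name in the block is hypothetical.

```python
import random
from collections import Counter

def bootstrap(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]

def majority_vote(labels):
    """Return the most frequent label (first seen wins on ties)."""
    return Counter(labels).most_common(1)[0][0]

def train_centroid(sample):
    """Stand-in base learner (NOT an auto-encoder): nearest class mean."""
    sums = {}
    for value, label in sample:
        total, n = sums.get(label, (0.0, 0))
        sums[label] = (total + value, n + 1)
    means = {label: total / n for label, (total, n) in sums.items()}
    return lambda x: min(means, key=lambda label: abs(means[label] - x))

def bagging_predict(data, train, x, n_models=5, seed=0):
    """Train n_models on bootstrap samples and vote on the label of x."""
    rng = random.Random(seed)  # seeded so repeated runs give the same answer
    models = [train(bootstrap(data, rng)) for _ in range(n_models)]
    return majority_vote([model(x) for model in models])

# Hypothetical 1-D "sentiment scores" with labels.
data = [(0.9, "pos"), (0.8, "pos"), (0.1, "neg"), (0.2, "neg")]
prediction = bagging_predict(data, train_centroid, 0.85)
```

Because `train` is passed in as a function, a learner built on auto-encoder features could replace `train_centroid` without changing the sampling or voting code.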
