Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
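In text sentiment analysis, feature weighting generally considers two factors: the importance of a feature within a document (ITD) and the importance of a feature in expressing sentiment (ITS). Building on two supervised feature weighting algorithms that achieve high classification accuracy in this field, this paper proposes a new ITS algorithm. The new algorithm considers both a feature's document frequency within one class of documents (the number of documents in a given document set that contain the feature) and that frequency's share of the feature's total document frequency, so that features occurring mainly, and in large numbers, in a single class of documents receive higher ITS weights. Experiments show that the new algorithm improves the accuracy of text sentiment classification.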
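The abstract above does not give the ITS formula, so the following is only a minimal sketch of the idea it describes: a feature scores higher when its in-class document frequency is large and makes up most of its total document frequency. The scoring expression `log(1 + df_c) * (df_c / df_total)` and the function name `its_weights` are assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter, defaultdict


def its_weights(docs, labels):
    """Return an illustrative ITS-style weight for every term.

    docs   -- list of token lists
    labels -- list of class labels, one per document
    """
    # Document frequency of each term inside each class.
    df_by_class = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df_by_class[label][term] += 1

    # Total document frequency across all classes.
    total_df = Counter()
    for counts in df_by_class.values():
        total_df.update(counts)

    weights = {}
    for term, df_total in total_df.items():
        best = 0.0
        for counts in df_by_class.values():
            df_c = counts[term]
            # Assumed score: in-class volume times in-class share.
            score = math.log(1 + df_c) * (df_c / df_total)
            best = max(best, score)
        weights[term] = best
    return weights
```

With two positive and two negative documents, a term that appears only in the positive documents outranks a term split evenly between the classes.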

2.
Sentiment analysis is the natural language processing task that deals with detecting and classifying sentiment in text. In recent years, owing to the growth and rapid spread of user-generated content online and the impact such information has on events, people and companies worldwide, this task has been addressed by a substantial body of research. Although different methods have been proposed for distinct types of text, the research community has concentrated less on developing methods for languages other than English. In this context, the present work studies the possibility of employing machine translation systems and supervised methods to build models able to detect and classify sentiment in languages for which fewer or no resources are available for this task compared with English, with emphasis on the impact of translation quality on sentiment classification performance. Our extensive evaluation scenarios show that machine translation systems are approaching a good level of maturity and that, in combination with appropriate machine learning algorithms and carefully chosen features, they can be used to build sentiment analysis systems whose performance is comparable to that obtained for English.

3.
An empirical study of sentiment analysis for Chinese documents   Total citations: 1, self-citations: 0, citations by others: 1
To date, very little research has been conducted on sentiment classification for Chinese documents. To remedy this deficiency, this paper presents an empirical study of sentiment categorization on Chinese documents. Four feature selection methods (MI, IG, CHI and DF) and five learning methods (centroid classifier, K-nearest neighbor, winnow classifier, Naïve Bayes and SVM) are investigated on a Chinese sentiment corpus of 1021 documents. The experimental results indicate that IG performs best for selecting sentiment terms and SVM performs best for sentiment classification. Furthermore, we found that sentiment classifiers depend heavily on domain or topic.
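As a rough illustration of one of the feature selection methods named above, the sketch below computes the textbook chi-square (CHI) statistic for a term and a class from a 2x2 contingency table. The abstract does not spell out its formulas, so this is the standard definition rather than the paper's exact implementation, and the function and argument names are hypothetical.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic of a term with respect to a class.

    a -- documents in the class that contain the term
    b -- documents outside the class that contain the term
    c -- documents in the class that lack the term
    d -- documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A degenerate table carries no evidence either way.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom
```

A term that appears in every document of one class and in none of the other gets the maximum score for that table size, while a term spread evenly across both classes scores zero.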

4.
In this paper, we propose a novel method for Information Extraction (IE) from a set of knowledge in order to answer user queries posed in natural language. The system is based on a Fuzzy Logic engine, whose flexibility is exploited to manage sets of accumulated knowledge. These sets may be organized in hierarchical levels by a tree structure. The aim is to design and implement an intelligent agent to manage any set of knowledge where information is abundant, vague or imprecise. The method was applied to a major university web portal, that of the University of Seville, which contains a huge amount of information. We also propose a novel method for term weighting (TW). This method is likewise based on Fuzzy Logic and, because of its flexibility, replaces the classical TF-IDF method usually used for TW.

5.
Sentiment analysis has long been a hot topic for understanding users' statements online. Many machine learning approaches to sentiment analysis have been proposed, ranging from simple feature-oriented SVMs to more complicated probabilistic models. Though they have demonstrated capability in polarity detection, one challenge remains: the curse of dimensionality, which arises from the high-dimensional nature of text-based documents. In this research, inspired by the dimensionality reduction and feature extraction capability of auto-encoders, an auto-encoder-based bagging prediction architecture (AEBPA) is proposed. An experimental study on commonly used datasets has shown its potential. It is believed that this method can offer researchers in the community further insight into bagging-oriented solutions for sentiment analysis.

6.
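Traditional machine learning faces a difficulty: when training data and test data no longer follow the same distribution, a classifier learned from the training set cannot accurately classify the test-set texts. To address this problem, and following the principles of transfer learning, features shared by the source and target domains are weighted according to an improved feature-distribution similarity; for features not shared by both domains, semantic similarity and a newly proposed inverse text category index (TF-ICF) are introduced to weight the features within the source domain. This makes full use of a large amount of labeled source-domain data and a small amount of labeled target-domain data to obtain the required features, so that a classifier can be built quickly. Experimental results on the text dataset 20Newsgroups and the non-text UCI datasets show that the feature transfer weighting algorithm based on distribution and the inverse text category index can transfer and weight features quickly while maintaining accuracy.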

7.
8.
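Feature weighting methods based on category information do not express the relationship between features and categories accurately enough: for features with the same category frequency, they cannot compare how well each feature discriminates between categories, so the distribution of a feature within a class must also be considered. The inverse category frequency (ICF) of a feature and its intra-class entropy are combined and introduced into the feature weighting scheme, yielding two supervised feature weighting schemes. Experimental results on a Uyghur text classification corpus show that the method markedly improves the spatial distribution of the samples and raises the micro-averaged F1 score of Uyghur text classification.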
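The two ingredients named above can be sketched separately. The abstract does not state how they are combined into the two weighting schemes, so the code below only computes each quantity under common definitions: ICF as `log(number of classes / number of classes containing the term)`, and intra-class entropy as the Shannon entropy of the term's occurrence counts over the documents of one class. Both definitions and all names are assumptions for illustration.

```python
import math


def icf(term, docs, labels):
    """Inverse category frequency: log(#classes / #classes containing term)."""
    classes = set(labels)
    classes_with_term = {lab for toks, lab in zip(docs, labels) if term in toks}
    if not classes_with_term:
        return 0.0
    return math.log(len(classes) / len(classes_with_term))


def intra_class_entropy(term, docs, labels, target):
    """Entropy of the term's occurrence counts over documents of one class."""
    counts = [toks.count(term) for toks, lab in zip(docs, labels) if lab == target]
    total = sum(counts)
    if total == 0:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in probs)
```

Under these definitions, a term confined to one class has a positive ICF, and a term spread over several documents of a class has higher intra-class entropy than one concentrated in a single document.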

9.
10.
Traditional term weighting schemes in text categorization, such as TF-IDF, exploit only the statistical information of terms in documents. In this paper, we instead propose a novel term weighting scheme that exploits the semantics of categories and indexing terms. Specifically, the semantics of categories are represented by the senses of terms appearing in the category labels, together with their interpretation by WordNet. The weight of a term is then correlated with its semantic similarity to a category. Experimental results on three commonly used data sets show that the proposed approach outperforms TF-IDF when the amount of training data is small or the content of documents is focused on well-defined categories. In addition, the proposed approach compares favorably with two previous studies.
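For reference, the purely statistical baseline that this abstract contrasts itself with can be sketched as follows. This is one common TF-IDF variant (raw term count times `log(N / df)`), not the semantic scheme the paper proposes; the function name is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (raw tf * log(N / df))."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors
```

Note that a term occurring in every document receives weight zero regardless of how it relates to any category, which is the kind of limitation a category-aware scheme aims to address.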

11.
Perception of universal facial beauty has long been debated amongst psychologists and anthropologists. In this paper, we perform experiments to evaluate the extent of universal beauty by surveying a number of diverse human referees to grade a collection of female facial images. Results obtained show that there exists a strong central tendency in the human grades, thus exhibiting agreement on beauty assessment. We then trained an automated classifier using the average human grades as the ground truth and used it to classify an independent test set of facial images. The high accuracy achieved proves that this classifier can be used as a general, automated tool for objective classification of female facial beauty. Potential applications exist in the entertainment industry, cosmetic industry, virtual media, and plastic surgery.

12.
Twitter messages are increasingly used to determine consumer sentiment towards a brand. The existing literature on Twitter sentiment analysis uses various feature sets and methods, many of which are adapted from more traditional text classification problems. In this research, we introduce an approach to supervised feature reduction using n-grams and statistical analysis to develop a Twitter-specific lexicon for sentiment analysis. We augment this reduced Twitter-specific lexicon with brand-specific terms for brand-related tweets. We show that the reduced lexicon set, while significantly smaller (only 187 features), reduces modeling complexity, maintains a high degree of coverage over our Twitter corpus, and yields improved sentiment classification accuracy. To demonstrate the effectiveness of the devised Twitter-specific lexicon compared to a traditional sentiment lexicon, we develop comparable sentiment classification models using SVM. We show that the Twitter-specific lexicon is significantly more effective in terms of classification recall and accuracy metrics. We then develop sentiment classification models using the Twitter-specific lexicon and the DAN2 machine learning approach, which has demonstrated success in other text classification problems. We show that DAN2 produces more accurate sentiment classification results than SVM while using the same Twitter-specific lexicon.

13.
The Chinese pronunciation system offers two characteristics that distinguish it from other languages: deep phonemic orthography and intonation variations. In this paper, we hypothesize that these two important properties can play a major role in Chinese sentiment analysis. In particular, we propose two effective features to encode phonetic information and, hence, fuse it with textual information. With this hypothesis, we propose Disambiguate Intonation for Sentiment Analysis (DISA), a network that we develop based on the principles of reinforcement learning. DISA disambiguates intonations for each Chinese character (pinyin) and, hence, learns precise phonetic representations. We also fuse phonetic features with textual and visual features to further improve performance. Experimental results on five different Chinese sentiment analysis datasets show that the inclusion of phonetic features significantly and consistently improves the performance of textual and visual representations and surpasses the state-of-the-art Chinese character-level representations.

14.
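In text classification, current research on feature weighting has two shortcomings. First, in feature weighting algorithms based on document frequency, the document frequency often ignores the feature's term-frequency information. Second, the relationship between features and categories is not expressed accurately or fully enough. To address these two shortcomings, a new term-frequency-based, category-related feature weighting algorithm (CDF-AICF) is proposed. When measuring feature weights, the algorithm considers a feature's document frequency at each term-frequency level. To express the relationship between features and categories accurately, two new concepts are introduced: category-related document frequency (CDF) and average inverse class frequency (AICF), which represent, respectively, a feature's representativeness of a category and its ability to discriminate between categories. Finally, classification experiments on three datasets compare the algorithm with five other feature weighting measures. The results show that CDF-AICF outperforms the other five measures in classification performance.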

15.
Numerical weather forecasts, such as meteorological forecasts of precipitation, are inherently uncertain. These uncertainties depend on model physics as well as initial and boundary conditions. Since precipitation forecasts form the input to hydrological models, the uncertainties of the precipitation forecasts result in uncertainties of flood forecasts. To account for these uncertainties, ensemble prediction systems are applied. These systems consist of several members simulated by different models or by a single model under varying initial and boundary conditions. However, an overly wide uncertainty range, obtained by including members with poor prediction skill, may lead to underestimation or exaggeration of the risk of hazardous events. Therefore, the uncertainty range of model-based flood forecasts derived from the meteorological ensembles has to be restricted. In this paper, a methodology for improving flood forecasts by weighting ensemble members according to their skill is presented. The skill of each ensemble member is evaluated by comparing the forecasts corresponding to this member with past observed values. Since numerous forecasts are required to evaluate the skill reliably, the evaluation procedure is time-consuming and tedious. Moreover, the evaluation is highly subjective, because the expert who performs it decides on the basis of implicit knowledge. Therefore, approaches for the automated evaluation of such forecasts are required. Here, we present a semi-automated approach for the assessment of precipitation forecast ensemble members. The approach is based on supervised machine learning and was tested on ensemble precipitation forecasts for the area of the Mulde river basin in Germany. Based on the evaluation results of the specific ensemble members, weights corresponding to their forecast skill were calculated. These weights were then successfully used to reduce the uncertainties within rainfall-runoff simulations and flood risk predictions.
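The general idea of weighting ensemble members by past skill can be sketched in a few lines. The paper derives skill with supervised machine learning; the sketch below swaps in a much simpler stand-in, inverse root-mean-square error against past observations, purely to show how skill scores become normalized weights. All names and the inverse-RMSE choice are assumptions.

```python
import math


def rmse(forecasts, observed):
    """Root-mean-square error between a forecast series and observations."""
    return math.sqrt(
        sum((f - o) ** 2 for f, o in zip(forecasts, observed)) / len(observed)
    )


def skill_weights(member_forecasts, observed, eps=1e-9):
    """Map each member name to a weight proportional to 1 / RMSE."""
    # eps avoids division by zero for a member that matched perfectly.
    inv = {
        name: 1.0 / (rmse(fc, observed) + eps)
        for name, fc in member_forecasts.items()
    }
    total = sum(inv.values())
    return {name: value / total for name, value in inv.items()}


def weighted_forecast(new_forecasts, weights):
    """Combine one new forecast value per member using the skill weights."""
    return sum(weights[name] * value for name, value in new_forecasts.items())
```

Members with larger past errors contribute less to the combined forecast, which narrows the spread that poorly performing members would otherwise introduce.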

16.
This paper presents our research on automatic annotation of a five-billion-word corpus of Japanese blogs with information on affect and sentiment. We first survey emotion blog corpora and find that no large-scale emotion corpus has been available for the Japanese language. We choose the largest blog corpus for the language and annotate it using two systems for affect analysis: ML-Ask for word- and sentence-level affect analysis and CAO for detailed analysis of emoticons. The annotated information includes affective features such as sentence subjectivity (emotive/non-emotive) and emotion classes (joy, sadness, etc.), useful in affect analysis. The annotations are also generalized on a two-dimensional model of affect to obtain information on sentence valence (positive/negative), useful in sentiment analysis. The annotations are evaluated in several ways. First, they are evaluated on a test set of a thousand randomly extracted sentences by over forty respondents. Second, the statistics of the annotations are compared with those of other existing emotion blog corpora. Finally, the corpus is applied in several tasks, such as generation of an emotion object ontology and retrieval of emotional and moral consequences of actions.

17.
Due to the advancement of technology and globalization, it has become much easier for people around the world to express their opinions through social media platforms. Harvesting opinions through sentiment analysis from people with different backgrounds and from different cultures via social media platforms can help modern organizations, including corporations and governments, understand customers, make decisions, and develop strategies. However, the multiple languages used on many social media platforms make it difficult to perform sentiment analysis with acceptable levels of accuracy and consistency. In this paper, we propose a bilingual approach to conducting sentiment analysis on both Chinese and English social media to obtain more objective and consistent opinions. Instead of processing English and Chinese comments separately, our approach treats review comments as a stream of text containing both Chinese and English words. That stream of text is then segmented by our segment model and trimmed using stop word lists that include both Chinese and English words. The stem words are then converted into feature vectors and passed to two interchangeable natural language models, SVM and N-Gram. Finally, we perform a case study, applying our proposed approach to the analysis of movie reviews obtained from social media. Our experiment shows that our proposed approach has a high level of accuracy and is more effective than the existing learning-based approaches.

18.
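In text sentiment classification, traditional feature representations usually ignore the importance of linguistic knowledge. This paper proposes a feature weighting method based on part-of-speech embedding: a feature embedding scheme is constructed that embeds the contribution of four parts of speech (nouns, verbs, adjectives and adverbs) to sentiment classification into the traditional TF-IDF (Term Frequency-Inverse Document Frequency) weights. The sentiment contribution of each part of speech is obtained with a particle swarm optimization algorithm. The experiments use a support vector machine for classification and compare the embedding of different kinds of knowledge, including parts of speech, sentiment words, and the combination of the two. The results show that the part-of-speech embedding method achieves the best classification performance and can significantly improve the accuracy of Chinese text sentiment classification.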

19.
In this paper, a new approach is presented for tackling the problem of identifying the author of a handwritten text. The problem is solved with a simple yet powerful modification of the so-called ALVOT family of supervised classification algorithms, using a novel differentiated-weighting scheme. Compared to previously published approaches, the proposed method significantly reduces the number and complexity of the features to be extracted from the text. In addition, the specific combination of line-level and word-level features used introduces an eclectic paradigm between texture-related and structure-related approaches.

20.
Finding the right scales for feature extraction is crucial for supervised image segmentation based on pixel classification. There are many scale selection methods in the literature; among them, the one proposed by Lindeberg is widely used for image structures such as blobs, edges and ridges. Those schemes are usually unsupervised, as they do not take into account the actual segmentation problem at hand. In this paper, we consider the problem of selecting scales so as to achieve optimal discrimination between user-defined classes in the segmentation. We show the deficiency of the classical unsupervised scale selection paradigms and present a supervised alternative. In particular, the so-called max rule is proposed, which selects for each pixel the scale at which the classification has the largest confidence across the scales. By interpreting the classifier as a complex image filter, we can relate our approach back to Lindeberg's original proposal. In the experiments, the max rule is applied to artificial and real-world image segmentation tasks, where it is shown to choose the right scales for different problems and to lead to better segmentation results.
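The max rule described above can be sketched as follows, assuming that a classifier has already produced class posterior probabilities for every pixel at every candidate scale, and that "confidence" means the largest posterior at a given scale. The nested-list data layout and the function name are assumptions for illustration, not the paper's implementation.

```python
def max_rule(posteriors_by_scale):
    """Pick, for each pixel, the scale with the most confident classification.

    posteriors_by_scale[s][p] is a list of class probabilities for pixel p
    at scale s. Returns one (scale_index, class_index) pair per pixel.
    """
    n_pixels = len(posteriors_by_scale[0])
    chosen = []
    for p in range(n_pixels):
        best_scale, best_label, best_conf = 0, 0, -1.0
        for s, per_pixel in enumerate(posteriors_by_scale):
            probs = per_pixel[p]
            # Most probable class at this scale, and its posterior.
            label = max(range(len(probs)), key=probs.__getitem__)
            if probs[label] > best_conf:
                best_scale, best_label, best_conf = s, label, probs[label]
        chosen.append((best_scale, best_label))
    return chosen
```

Because the choice is made independently per pixel, neighboring pixels may end up classified at different scales, which is what allows the selection to adapt to local image structure.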
