20 similar documents found; search time: 15 ms
1.
Zheng Anyi, Computer Engineering and Applications, 2015, 51(21): 30-35
Feature weighting in text sentiment analysis generally considers two factors: the importance of a term in the document (ITD) and the importance of a term in expressing sentiment (ITS). Building on two supervised feature-weighting algorithms with high classification accuracy in this field, a new ITS algorithm is proposed. The new algorithm simultaneously considers a feature's document frequency within one class of documents (the number of documents in a given document set that contain the feature) and that frequency's proportion of the feature's total document frequency, so that features occurring mainly and frequently within a single class receive higher ITS weights. Experiments show that the new algorithm improves the accuracy of text sentiment classification.
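A minimal Python sketch of the weighting idea described in this abstract; the formula below is an illustrative stand-in, not the paper's exact definition:

```python
# Hypothetical ITS-style weight: a term scores high when it occurs
# frequently in one class AND most of its occurrences fall in that class.

def its_weight(df_pos, df_neg):
    """df_pos/df_neg: number of documents in each class containing the term."""
    total = df_pos + df_neg
    if total == 0:
        return 0.0
    dominant = max(df_pos, df_neg)
    # raw frequency in the dominant class times its share of all occurrences
    return dominant * (dominant / total)

# a term in 90 of the positive docs and 5 negative docs outscores
# a term spread evenly across the two classes
print(its_weight(90, 5) > its_weight(50, 50))  # True
```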
2.
An empirical study of sentiment analysis for Chinese documents (cited 1 time: 0 self-citations, 1 by others)
Up to now, very little research has been conducted on sentiment classification for Chinese documents. To remedy this deficiency, this paper presents an empirical study of sentiment categorization of Chinese documents. Four feature selection methods (MI, IG, CHI and DF) and five learning methods (centroid classifier, K-nearest neighbor, winnow classifier, naïve Bayes and SVM) are investigated on a Chinese sentiment corpus of 1021 documents. The experimental results indicate that IG performs best for sentiment-term selection and SVM exhibits the best sentiment classification performance. Furthermore, we found that sentiment classifiers depend strongly on domains or topics.
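The information gain (IG) criterion favored by this study can be sketched for binary term presence over two classes (illustrative implementation, not the paper's code):

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def info_gain(n11, n10, n01, n00):
    """n11: positive docs containing the term, n10: positive docs without,
    n01: negative docs containing the term, n00: negative docs without."""
    n = n11 + n10 + n01 + n00
    h = entropy([(n11 + n10) / n, (n01 + n00) / n])       # class entropy
    p_t = (n11 + n01) / n                                  # P(term present)
    h_t = entropy([n11 / (n11 + n01), n01 / (n11 + n01)]) if n11 + n01 else 0.0
    h_nt = entropy([n10 / (n10 + n00), n00 / (n10 + n00)]) if n10 + n00 else 0.0
    return h - p_t * h_t - (1 - p_t) * h_nt

# a perfectly class-discriminating term recovers the full class entropy
print(round(info_gain(50, 0, 0, 50), 3))  # 1.0
```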
3.
Jorge Ropero, Ariel Gómez, Alejandro Carrasco, Carlos León, Expert Systems with Applications, 2012, 39(4): 4567-4581
In this paper, we propose a novel method for Information Extraction (IE) from a set of knowledge in order to answer user consultations in natural language. The system is based on a Fuzzy Logic engine, which takes advantage of its flexibility for managing sets of accumulated knowledge. These sets may be built in hierarchical levels using a tree structure. The aim of this system is to design and implement an intelligent agent to manage any set of knowledge where information is abundant, vague or imprecise. The method was applied to a major university web portal, the University of Seville web portal, which contains a huge amount of information. We also propose a novel method for term weighting (TW). This method is likewise based on Fuzzy Logic and, thanks to its flexibility, replaces the classical TF-IDF method usually used for TW.
4.
Journal of Visual Languages and Computing, 2014, 25(6): 840-849
Sentiment analysis has long been a hot topic for understanding users' statements online. Many machine learning approaches for sentiment analysis have been proposed, from simple feature-oriented SVMs to more complicated probabilistic models. Though they have demonstrated capability in polarity detection, one challenge remains: the curse of dimensionality arising from the high-dimensional nature of text-based documents. In this research, inspired by the dimensionality reduction and feature extraction capability of auto-encoders, an auto-encoder-based bagging prediction architecture (AEBPA) is proposed. An experimental study on commonly used datasets has shown its potential. This method can offer researchers in the community further insight into bagging-oriented solutions for sentiment analysis.
5.
6.
7.
Traditional term weighting schemes in text categorization, such as TF-IDF, only exploit the statistical information of terms in documents. Instead, in this paper, we propose a novel term weighting scheme by exploiting the semantics of categories and indexing terms. Specifically, the semantics of categories are represented by senses of terms appearing in the category labels as well as the interpretation of them by WordNet. Also, the weight of a term is correlated to its semantic similarity with a category. Experimental results on three commonly used data sets show that the proposed approach outperforms TF-IDF in the cases that the amount of training data is small or the content of documents is focused on well-defined categories. In addition, the proposed approach compares favorably with two previous studies.
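The core idea, weighting a term by its semantic similarity to the category rather than by corpus statistics alone, can be sketched as follows. The similarity table is a hand-made stand-in for the WordNet-based measure used in the paper:

```python
# Toy term-category similarity scores (assumed values, not from WordNet).
SIM = {("goal", "sports"): 0.8, ("goal", "politics"): 0.2,
       ("election", "politics"): 0.9, ("election", "sports"): 0.1}

def semantic_weight(term, category, tf):
    """Scale raw term frequency by the term's similarity to the category."""
    return tf * SIM.get((term, category), 0.0)

# the same term frequency yields a higher weight in the semantically
# closer category
print(semantic_weight("goal", "sports", 3) > semantic_weight("goal", "politics", 3))  # True
```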
8.
Twitter messages are increasingly used to determine consumer sentiment towards a brand. The existing literature on Twitter sentiment analysis uses various feature sets and methods, many of which are adapted from more traditional text classification problems. In this research, we introduce an approach to supervised feature reduction using n-grams and statistical analysis to develop a Twitter-specific lexicon for sentiment analysis. We augment this reduced Twitter-specific lexicon with brand-specific terms for brand-related tweets. We show that the reduced lexicon set, while significantly smaller (only 187 features), reduces modeling complexity, maintains a high degree of coverage over our Twitter corpus, and yields improved sentiment classification accuracy. To demonstrate the effectiveness of the devised Twitter-specific lexicon compared to a traditional sentiment lexicon, we develop comparable sentiment classification models using SVM. We show that the Twitter-specific lexicon is significantly more effective in terms of classification recall and accuracy metrics. We then develop sentiment classification models using the Twitter-specific lexicon and the DAN2 machine learning approach, which has demonstrated success in other text classification problems. We show that DAN2 produces more accurate sentiment classification results than SVM while using the same Twitter-specific lexicon.
9.
Numerical weather forecasts, such as meteorological forecasts of precipitation, are inherently uncertain. These uncertainties depend on model physics as well as on initial and boundary conditions. Since precipitation forecasts form the input to hydrological models, the uncertainties of the precipitation forecasts translate into uncertainties of flood forecasts. To account for these uncertainties, ensemble prediction systems are applied. These systems consist of several members simulated by different models or by a single model under varying initial and boundary conditions. However, a too-wide uncertainty range, obtained by including members with poor prediction skill, may lead to underestimation or exaggeration of the risk of hazardous events. Therefore, the uncertainty range of model-based flood forecasts derived from the meteorological ensembles has to be restricted. In this paper, a methodology for improving flood forecasts by weighting ensemble members according to their skill is presented. The skill of each ensemble member is evaluated by comparing past forecasts from that member with observed values. Since numerous forecasts are required to evaluate skill reliably, the evaluation procedure is time-consuming and tedious. Moreover, the evaluation is highly subjective, because the expert who performs it decides on the basis of implicit knowledge. Therefore, approaches for the automated evaluation of such forecasts are required. Here, we present a semi-automated approach for the assessment of precipitation forecast ensemble members. The approach is based on supervised machine learning and was tested on ensemble precipitation forecasts for the Mulde river basin in Germany. Based on the evaluation results for the individual ensemble members, weights corresponding to their forecast skill were calculated. These weights were then successfully used to reduce the uncertainties within rainfall-runoff simulations and flood risk predictions.
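The weighting step can be sketched as follows: normalize the per-member skill scores into weights and form a weighted ensemble-mean forecast (names and numbers are illustrative, not from the study):

```python
def skill_weights(skills):
    """Normalize per-member skill scores so the weights sum to one."""
    total = sum(skills)
    return [s / total for s in skills]

def weighted_forecast(member_forecasts, skills):
    """Skill-weighted mean of the ensemble members' forecasts."""
    return sum(w * f for w, f in zip(skill_weights(skills), member_forecasts))

# three ensemble members forecasting precipitation (mm); the most
# skilful member dominates the combined forecast
members = [10.0, 20.0, 40.0]
skills = [0.1, 0.1, 0.8]
print(weighted_forecast(members, skills))  # 35.0
```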
10.
In text sentiment classification, traditional feature representations usually ignore the importance of linguistic knowledge. A feature-weighting method based on part-of-speech (POS) embedding is proposed: through a feature-embedding scheme, the contributions of four parts of speech (nouns, verbs, adjectives, and adverbs) to sentiment classification are embedded into the traditional TF-IDF (Term Frequency-Inverse Document Frequency) weight. The sentiment contribution of each part of speech is obtained by particle swarm optimization. Experiments use a support vector machine for classification and compare embeddings of different kinds of knowledge, including parts of speech, sentiment words, and their combination. The results show that the POS-embedding method achieves the best classification performance and significantly improves the accuracy of Chinese text sentiment classification.
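The embedding scheme can be sketched as scaling a term's TF-IDF value by a per-POS contribution factor. The factors below are made-up placeholders for the values the paper learns with particle swarm optimization:

```python
import math

# Assumed per-POS sentiment contributions (placeholders, not PSO-learned).
POS_CONTRIB = {"adj": 1.5, "adv": 1.3, "verb": 1.1, "noun": 1.0}

def pos_tfidf(tf, df, n_docs, pos):
    """TF-IDF scaled by the contribution of the term's part of speech."""
    tfidf = tf * math.log(n_docs / (1 + df))
    return tfidf * POS_CONTRIB.get(pos, 1.0)

# an adjective outweighs a noun with identical TF-IDF statistics
print(pos_tfidf(3, 10, 1000, "adj") > pos_tfidf(3, 10, 1000, "noun"))  # True
```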
11.
Edith C. Herrera-Luna, Edgardo M. Felipe-Riveron, Salvador Godoy-Calderon, Pattern Recognition Letters, 2011, 32(8): 1139-1144
In this paper a new approach is presented for tackling the problem of identifying the author of a handwritten text. This problem is solved with a simple, yet powerful, modification of the so-called ALVOT family of supervised classification algorithms with a novel differentiated-weighting scheme. Compared to other previously published approaches, the proposed method significantly reduces the number and complexity of the text features to be extracted. Also, the specific combination of line-level and word-level features used introduces an eclectic paradigm between texture-related and structure-related approaches.
12.
Finding the right scales for feature extraction is crucial for supervised image segmentation based on pixel classification. There are many scale selection methods in the literature; among them, the one proposed by Lindeberg is widely used for image structures such as blobs, edges and ridges. Those schemes are usually unsupervised, as they do not take into account the actual segmentation problem at hand. In this paper, we consider the problem of selecting scales that aim at optimal discrimination between user-defined classes in the segmentation. We show the deficiency of the classical unsupervised scale selection paradigms and present a supervised alternative. In particular, the so-called max rule is proposed, which selects for each pixel the scale at which the classification confidence is largest across scales. By interpreting the classifier as a complex image filter, we can relate our approach back to Lindeberg's original proposal. In the experiments, the max rule is applied to artificial and real-world image segmentation tasks and is shown to choose the right scales for different problems and lead to better segmentation results.
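The max rule as described can be sketched in a few lines of NumPy; the array shapes and toy numbers below are assumptions for illustration:

```python
import numpy as np

def max_rule(conf):
    """conf: array of shape (scales, pixels, classes) of per-scale posteriors.
    For each pixel, keep the class decision from the most confident scale."""
    best_scale = conf.max(axis=2).argmax(axis=0)   # most confident scale per pixel
    per_scale_label = conf.argmax(axis=2)          # class decision at every scale
    return per_scale_label[best_scale, np.arange(conf.shape[1])]

# two scales, two pixels, two classes
conf = np.array([[[0.6, 0.4], [0.55, 0.45]],   # scale 0
                 [[0.2, 0.8], [0.5, 0.5]]])    # scale 1
print(max_rule(conf))  # pixel 0 decided at scale 1 (class 1), pixel 1 at scale 0 (class 0)
```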
13.
Despite the online availability of data, analysis of this information in academic research is arduous. This article explores the application of supervised machine learning (SML) to overcome challenges associated with online data analysis. In SML, classifiers are used to categorize and code binary data. Based on a case study of Dutch employees' work-related tweets, this paper compares the coding performance of three classifiers: Linear Support Vector Machine, Naïve Bayes, and logistic regression. The performance of these classifiers is assessed by examining accuracy, precision, recall, the area under the precision-recall curve, and Krippendorff's Alpha. These indices are obtained by comparing the coding decisions of the classifier to manual coding decisions. The findings indicate that the Linear Support Vector Machine and Naïve Bayes classifiers outperform the logistic regression classifier. This study also compared the performance of these classifiers on stratified random samples versus random samples of training data. The findings indicate that with smaller training sets, stratified random training samples perform better than random training samples, whereas with large training sets (n = 4000) random samples yield better results. Finally, the Linear Support Vector Machine classifier was trained with 4000 tweets and subsequently used to categorize 578,581 tweets obtained from 430 employees.
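The evaluation against manual codings can be sketched as follows for binary labels; the example data are invented:

```python
def binary_metrics(pred, gold):
    """Accuracy, precision, and recall of predicted codings vs. manual ones
    (1 = positive class, e.g. a work-related tweet)."""
    tp = sum(p == g == 1 for p, g in zip(pred, gold))
    fp = sum(p == 1 and g == 0 for p, g in zip(pred, gold))
    fn = sum(p == 0 and g == 1 for p, g in zip(pred, gold))
    acc = sum(p == g for p, g in zip(pred, gold)) / len(gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return acc, precision, recall

pred = [1, 1, 0, 0, 1]
gold = [1, 0, 0, 0, 1]
acc, precision, recall = binary_metrics(pred, gold)
print(acc, recall)  # accuracy 0.8, recall 1.0
```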
14.
Oscar Fontenla-Romero, Bertha Guijarro-Berdiñas, Amparo Alonso-Betanzos, Pattern Recognition, 2010, 43(5): 1984-1992
This paper proposes a novel supervised learning method for single-layer feedforward neural networks. The approach uses an alternative to the MSE-based objective function, measuring the errors before the neurons' nonlinear activation functions instead of after them. In this case, the solution can be obtained easily by solving systems of linear equations, i.e., requiring much less computational power than the regular methods. A theoretical study is included to prove the approximate equivalence between the global optimum of the objective function based on the regular MSE criterion and that of the proposed alternative MSE function. Furthermore, it is shown that the presented method allows incremental and distributed learning. An exhaustive experimental study is also presented to verify the soundness and efficiency of the method. This study covers 10 classification and 16 regression problems. In addition, a comparison with other high-performance learning algorithms shows that the proposed method exhibits, on average, the highest performance with low computational requirements.
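A toy sketch of the general trick, measuring error before the nonlinearity: for an invertible activation, map the targets through the inverse activation and solve an ordinary least-squares problem instead of running gradient descent. A logistic activation and made-up data are assumed; the paper's formulation differs in detail:

```python
import numpy as np

def fit_single_layer(X, T, eps=1e-6):
    """Fit weights of a single-layer logistic network by mapping targets
    through the inverse activation (logit) and solving one linear system."""
    T = np.clip(T, eps, 1 - eps)
    Z = np.log(T / (1 - T))                      # inverse of the logistic function
    Xb = np.hstack([X, np.ones((len(X), 1))])    # append a bias column
    W, *_ = np.linalg.lstsq(Xb, Z, rcond=None)
    return W

def predict(X, W):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return 1 / (1 + np.exp(-Xb @ W))

X = np.array([[0.0], [1.0], [2.0], [3.0]])
T = np.array([[0.1], [0.3], [0.7], [0.9]])
W = fit_single_layer(X, T)
print(np.abs(predict(X, W) - T).max() < 0.1)  # True: the fit recovers the targets closely
```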
15.
Chin-Chun Chang, Pattern Recognition, 2010, 43(8): 2971-2981
The RELIEF algorithm is a popular approach for feature weighting. Many extensions of the RELIEF algorithm have been developed, and I-RELIEF is one of the best known. In this paper, I-RELIEF is generalized for supervised distance metric learning to yield a Mahalanobis distance function. The proposed approach is justified by showing that the objective function of the generalized I-RELIEF is closely related to the expected leave-one-out nearest-neighbor classification rate. In addition, the relationships among the generalized I-RELIEF, neighbourhood components analysis, and graph embedding are pointed out. Experimental results on various data sets all demonstrate the superiority of the proposed approach.
16.
In this paper, we make a comparative study of the effectiveness of ensemble techniques for sentiment classification. The ensemble framework is applied to sentiment classification tasks with the aim of efficiently integrating different feature sets and classification algorithms to synthesize a more accurate classification procedure. First, two types of feature sets are designed for sentiment classification, namely part-of-speech based feature sets and word-relation based feature sets. Second, three well-known text classification algorithms, namely naïve Bayes, maximum entropy and support vector machines, are employed as base classifiers for each of the feature sets. Third, three types of ensemble methods, namely fixed combination, weighted combination and meta-classifier combination, are evaluated as ensemble strategies. A wide range of comparative experiments are conducted on five widely used datasets in sentiment classification. Finally, some in-depth discussion is presented and conclusions are drawn about the effectiveness of ensemble techniques for sentiment classification.
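The first two of the three combination strategies can be sketched as follows for base-classifier positive-class probabilities; the scores and weights are illustrative, not from the experiments:

```python
def fixed_combination(probs):
    """Uniform average of base-classifier positive-class probabilities."""
    return sum(probs) / len(probs)

def weighted_combination(probs, weights):
    """Weighted average, e.g. trusting one base classifier more."""
    return sum(w * p for w, p in zip(weights, probs)) / sum(weights)

# naive Bayes, maximum entropy, and SVM scores for one review
probs = [0.9, 0.6, 0.4]
print(fixed_combination(probs) > 0.5)                # True: uniform vote says positive
print(weighted_combination(probs, [1, 1, 8]) > 0.5)  # False: SVM-heavy vote says negative
```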
17.
Automatic land cover analysis for Tenerife by supervised classification using remotely sensed data (cited 5 times: 0 self-citations, 5 by others)
Automatic land cover classification from satellite images is an important topic in many remote sensing applications. In this paper, we consider three different statistical approaches to tackle this problem: two of them, the well-known maximum likelihood classification (ML) and the support vector machine (SVM), are noncontextual methods. The third, iterated conditional modes (ICM), exploits spatial context using a Markov random field. We apply these methods to Landsat 5 Thematic Mapper (TM) data from Tenerife, the largest of the Canary Islands. Due to the size and the strong relief of the island, ground truth data could be collected only sparsely by examination of test areas for previously defined land cover classes. We show that after application of an unsupervised clustering method to identify subclasses, all classification algorithms give satisfactory results (with statistical overall accuracy of about 90%) if the model parameters are selected appropriately. Although theoretically superior to ML, both SVM and ICM have to be used carefully: ICM is able to improve ML, but when applied for too many iterations, spatially small sample areas are smoothed away, leading to statistically slightly worse classification results. SVM yields better statistical results than ML, but when inspected visually, the classification result is not completely satisfying. This is because no a priori information on the frequency of occurrence of a class was used, information which helps ML to rule out unlikely classes.
18.
Research on a feature weighting method incorporating semantics (total citations: 1; self: 1; others: 1)
To further improve the performance of most current text clustering algorithms based on the vector space model (VSM), this work studies the foundation and key step of text clustering: computing the similarity between texts. An important part of this computation is weighting the feature terms of each text; the soundness and effectiveness of this step directly affect the accuracy of text similarity and the quality of clustering. The traditional VSM weighting method, TF-IDF, does not consider the distribution of semantically similar words across the document collection. To address this problem, an improved TF-IDF weighting method is proposed based on HowNet-based word semantic similarity analysis. Experimental results show that the algorithm is effective and feasible, and improves the precision and recall of text clustering to some extent.
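One way to realize this idea is to let a term's document frequency also count documents containing semantically similar terms, scaled by the similarity score. A minimal sketch, with a toy similarity table standing in for the HowNet-based measure used in the paper:

```python
import math

SIM = {("car", "automobile"): 0.9}  # assumed similarity scores

def sim(a, b):
    return 1.0 if a == b else SIM.get((a, b), SIM.get((b, a), 0.0))

def semantic_df(term, docs):
    """Soft document frequency: each document contributes its best
    similarity between the term and any word it contains."""
    return sum(max(sim(term, w) for w in doc) for doc in docs)

def semantic_idf(term, docs):
    return math.log(len(docs) / (1 + semantic_df(term, docs)))

docs = [{"car"}, {"automobile"}, {"banana"}]
# "car" effectively occurs in 1.9 documents instead of 1
print(round(semantic_df("car", docs), 2))  # 1.9
```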
19.
A novel supervised learning method is proposed by combining linear discriminant functions with neural networks. The proposed method results in a tree-structured hybrid architecture. Through constructive learning, the binary tree hierarchical architecture is automatically generated by a controlled growing process for a specific supervised learning task. Unlike the classic decision tree, the linear discriminant functions are employed only at the intermediate levels of the tree, heuristically partitioning a large and complicated task into several smaller and simpler subtasks. These subtasks are then handled by component neural networks at the leaves of the tree. For constructive learning, growing and credit-assignment algorithms are developed to serve the hybrid architecture. The proposed architecture provides an efficient way to apply existing neural networks (e.g. the multilayer perceptron) to large-scale problems. We have applied the proposed method to a universal approximation problem and several benchmark classification problems in order to evaluate its performance. Simulation results have shown that the proposed method yields better results and faster training than the multilayer perceptron.
20.
Haeng-Jin Jang, Jaemoon Sim, Yonnim Lee, Ohbyung Kwon, Expert Systems with Applications, 2013, 40(18): 7492-7503
IT vendors routinely use social media such as YouTube not only to disseminate their IT product information, but also to acquire customer input efficiently as part of their market research strategies. Customer responses that appear in social media, however, are typically unstructured; thus, a fairly large data set is needed for meaningful analysis. Although identifying customers’ value structures and attitudes may be useful for developing targeted or niche markets, the unstructured and volume-heavy nature of customer data prohibits efficient and economical extraction of such information. Automatic extraction of customer information would be valuable in determining value structure and strength. This paper proposes an intelligent method of estimating causality between user profiles, value structures, and attitudes based on the replies and published content managed by open social network systems such as YouTube. To show the feasibility of the idea proposed in this paper, information richness and agility are used as underlying concepts to create performance measures based on media/information richness theory. The resulting deep sentiment analysis proves to be superior to legacy sentiment analysis tools for estimation of causality among the focal parameters.