期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

An empirical study of sentiment analysis for chinese documents 总被引：1，自引：0，他引：1

Songbo Tan Jin Zhang 《Expert systems with applications》2008,34(4):2622-2629

Up to now, there are very few researches conducted on sentiment classification for Chinese documents. In order to remedy this deficiency, this paper presents an empirical study of sentiment categorization on Chinese documents. Four feature selection methods (MI, IG, CHI and DF) and five learning methods (centroid classifier, K-nearest neighbor, winnow classifier, Naïve Bayes and SVM) are investigated on a Chinese sentiment corpus with a size of 1021 documents. The experimental results indicate that IG performs the best for sentimental terms selection and SVM exhibits the best performance for sentiment classification. Furthermore, we found that sentiment classifiers are severely dependent on domains or topics. 相似文献

2.

A Fuzzy Logic intelligent agent for Information Extraction: Introducing a new Fuzzy Logic-based term weighting scheme

Jorge Ropero Ariel GómezAlejandro Carrasco Carlos León 《Expert systems with applications》2012,39(4):4567-4581

In this paper, we propose a novel method for Information Extraction (IE) in a set of knowledge in order to answer to user consultations using natural language. The system is based on a Fuzzy Logic engine, which takes advantage of its flexibility for managing sets of accumulated knowledge. These sets may be built in hierarchic levels by a tree structure. The aim of this system is to design and implement an intelligent agent to manage any set of knowledge where information is abundant, vague or imprecise. The method was applied to the case of a major university web portal, University of Seville web portal, which contains a huge amount of information. Besides, we also propose a novel method for term weighting (TW). This method also is based on Fuzzy Logic, and replaces the classical TF-IDF method, usually used for TW, for its flexibility. 相似文献

3.

Auto-encoder based bagging architecture for sentiment analysis

《Journal of Visual Languages and Computing》2014,25(6):840-849

Sentiment analysis has long been a hot topic for understanding users statements online. Previously many machine learning approaches for sentiment analysis such as simple feature-oriented SVM or more complicated probabilistic models have been proposed. Though they have demonstrated capability in polarity detection, there exist one challenge called the curse of dimensionality due to the high dimensional nature of text-based documents. In this research, inspired by the dimensionality reduction and feature extraction capability of auto-encoders, an auto-encoder-based bagging prediction architecture (AEBPA) is proposed. The experimental study on commonly used datasets has shown its potential. It is believed that this method can offer the researchers in the community further insight into bagging oriented solution for sentimental analysis. 相似文献

4.

A unified supervised codebook learning framework for classification

Congyan LangAuthor Vitae Songhe FengAuthor Vitae Bing Cheng^{Author Vitae} 《Neurocomputing》2012,77(1):281-288

相似文献

5.

A framework for building web mining applications in the world of blogs: A case study in product sentiment analysis

Evandro Costa Rafael Ferreira Patrick Brito Ig Ibert Bittencourt Olavo Holanda Aydano Machado Tarsis Marinho 《Expert systems with applications》2012,39(5):4813-4834

相似文献

6.

A semantic term weighting scheme for text categorization

Qiming Luo Enhong Chen Hui Xiong 《Expert systems with applications》2011,38(10):12708-12716

Traditional term weighting schemes in text categorization, such as TF-IDF, only exploit the statistical information of terms in documents. Instead, in this paper, we propose a novel term weighting scheme by exploiting the semantics of categories and indexing terms. Specifically, the semantics of categories are represented by senses of terms appearing in the category labels as well as the interpretation of them by WordNet. Also, the weight of a term is correlated to its semantic similarity with a category. Experimental results on three commonly used data sets show that the proposed approach outperforms TF-IDF in the cases that the amount of training data is small or the content of documents is focused on well-defined categories. In addition, the proposed approach compares favorably with two previous studies. 相似文献

7.

Twitter brand sentiment analysis: A hybrid system using n-gram analysis and dynamic artificial neural network

M. Ghiassi J. Skinner D. Zimbra 《Expert systems with applications》2013,40(16):6266-6282

Twitter messages are increasingly used to determine consumer sentiment towards a brand. The existing literature on Twitter sentiment analysis uses various feature sets and methods, many of which are adapted from more traditional text classification problems. In this research, we introduce an approach to supervised feature reduction using n-grams and statistical analysis to develop a Twitter-specific lexicon for sentiment analysis. We augment this reduced Twitter-specific lexicon with brand-specific terms for brand-related tweets. We show that the reduced lexicon set, while significantly smaller (only 187 features), reduces modeling complexity, maintains a high degree of coverage over our Twitter corpus, and yields improved sentiment classification accuracy. To demonstrate the effectiveness of the devised Twitter-specific lexicon compared to a traditional sentiment lexicon, we develop comparable sentiment classification models using SVM. We show that the Twitter-specific lexicon is significantly more effective in terms of classification recall and accuracy metrics. We then develop sentiment classification models using the Twitter-specific lexicon and the DAN2 machine learning approach, which has demonstrated success in other text classification problems. We show that DAN2 produces more accurate sentiment classification results than SVM while using the same Twitter-specific lexicon. 相似文献

8.

Assessment and weighting of meteorological ensemble forecast members based on supervised machine learning with application to runoff simulations and flood warning

《Advanced Engineering Informatics》2017

Numerical weather forecasts, such as meteorological forecasts of precipitation, are inherently uncertain. These uncertainties depend on model physics as well as initial and boundary conditions. Since precipitation forecasts form the input into hydrological models, the uncertainties of the precipitation forecasts result in uncertainties of flood forecasts. In order to consider these uncertainties, ensemble prediction systems are applied. These systems consist of several members simulated by different models or using a single model under varying initial and boundary conditions. However, a too wide uncertainty range obtained as a result of taking into account members with poor prediction skills may lead to underestimation or exaggeration of the risk of hazardous events. Therefore, the uncertainty range of model-based flood forecasts derived from the meteorological ensembles has to be restricted.In this paper, a methodology towards improving flood forecasts by weighting ensemble members according to their skills is presented. The skill of each ensemble member is evaluated by comparing the results of forecasts corresponding to this member with observed values in the past. Since numerous forecasts are required in order to reliably evaluate the skill, the evaluation procedure is time-consuming and tedious. Moreover, the evaluation is highly subjective, because an expert who performs it makes his decision based on his implicit knowledge.Therefore, approaches for the automated evaluation of such forecasts are required. Here, we present a semi-automated approach for the assessment of precipitation forecast ensemble members. The approach is based on supervised machine learning and was tested on ensemble precipitation forecasts for the area of the Mulde river basin in Germany. Based on the evaluation results of the specific ensemble members, weights corresponding to their forecast skill were calculated. These weights were then successfully used to reduce the uncertainties within rainfall-runoff simulations and flood risk predictions. 相似文献

9.

A supervised algorithm with a new differentiated-weighting scheme for identifying the author of a handwritten text

Edith C. Herrera-Luna Edgardo M. Felipe-Riveron Salvador Godoy-Calderon 《Pattern recognition letters》2011,32(8):1139-1144

In this paper a new approach is presented for tackling the problem of identifying the author of a handwritten text. This problem is solved with a simple, yet powerful, modification of the so called ALVOT family of supervised classification algorithms with a novel differentiated-weighting scheme. Compared to other previously published approaches, the proposed method significantly reduces the number and complexity of the text-features to be extracted from the text. Also, the specific combination of line-level and word-level features used introduces an eclectic paradigm between texture-related and structure-related approaches. 相似文献

10.

Scale selection for supervised image segmentation

Yan Li David M.J. Tax Marco Loog 《Image and vision computing》2012

Finding the right scales for feature extraction is crucial for supervised image segmentation based on pixel classification. There are many scale selection methods in the literature; among them the one proposed by Lindeberg is widely used for image structures such as blobs, edges and ridges. Those schemes are usually unsupervised, as they do not take into account the actual segmentation problem at hand. In this paper, we consider the problem of selecting scales, which aims at an optimal discrimination between user-defined classes in the segmentation. We show the deficiency of the classical unsupervised scale selection paradigms and present a supervised alternative. In particular, the so-called max rule is proposed, which selects a scale for each pixel to have the largest confidence in the classification across the scales. In interpreting the classifier as a complex image filter, we can relate our approach back to Lindeberg's original proposal. In the experiments, the max rule is applied to artificial and real-world image segmentation tasks, which is shown to choose the right scales for different problems and lead to better segmentation results. 相似文献

11.

A new convex objective function for the supervised learning of single-layer neural networks

Oscar Fontenla-Romero^{Author Vitae} Bertha Guijarro-Berdiñas Author Vitae Author Vitae Amparo Alonso-Betanzos Author Vitae 《Pattern recognition》2010,43(5):1984-1992

This paper proposes a novel supervised learning method for single-layer feedforward neural networks. This approach uses an alternative objective function to that based on the MSE, which measures the errors before the neuron's nonlinear activation functions instead of after them. In this case, the solution can be easily obtained solving systems of linear equations, i.e., requiring much less computational power than the one associated with the regular methods. A theoretical study is included to proof the approximated equivalence between the global optimum of the objective function based on the regular MSE criterion and the one of the proposed alternative MSE function.Furthermore, it is shown that the presented method has the capability of allowing incremental and distributed learning. An exhaustive experimental study is also presented to verify the soundness and efficiency of the method. This study contains 10 classification and 16 regression problems. In addition, a comparison with other high performance learning algorithms shows that the proposed method exhibits, in average, the highest performance and low-demanding computational requirements. 相似文献

12.

Social media research: The application of supervised machine learning in organizational communication research.

《Computers in human behavior》2016

Despite the online availability of data, analysis of this information in academic research is arduous. This article explores the application of supervised machine learning (SML) to overcome challenges associated with online data analysis. In SML classifiers are used to categorize and code binary data. Based on a case study of Dutch employees’ work-related tweets, this paper compares the coding performance of three classifiers, Linear Support Vector Machine, Naïve Bayes, and logistic regression. The performance of these classifiers is assessed by examining accuracy, precision, recall, the area under the precision-recall curve, and Krippendorf’s Alpha. These indices are obtained by comparing the coding decisions of the classifier to manual coding decisions. The findings indicate that the Linear Support Vector Machine and Naïve Bayes classifiers outperform the logistic regression classifier. This study also compared the performance of these classifiers based on stratified random samples and random samples of training data. The findings indicate that in smaller training sets stratified random training samples perform better than random training samples, in large training sets (n = 4000) random samples yield better results. Finally, the Linear Support Vector Machine classifier was trained with 4000 tweets and subsequently used to categorize 578,581 tweets obtained from 430 employees. 相似文献

13.

Ensemble of feature sets and classification algorithms for sentiment classification 总被引：3，自引：0，他引：3

Rui Xia Chengqing Zong 《Information Sciences》2011,181(6):1138-120

In this paper, we make a comparative study of the effectiveness of ensemble technique for sentiment classification. The ensemble framework is applied to sentiment classification tasks, with the aim of efficiently integrating different feature sets and classification algorithms to synthesize a more accurate classification procedure. First, two types of feature sets are designed for sentiment classification, namely the part-of-speech based feature sets and the word-relation based feature sets. Second, three well-known text classification algorithms, namely na?¨ve Bayes, maximum entropy and support vector machines, are employed as base-classifiers for each of the feature sets. Third, three types of ensemble methods, namely the fixed combination, weighted combination and meta-classifier combination, are evaluated for three ensemble strategies. A wide range of comparative experiments are conducted on five widely-used datasets in sentiment classification. Finally, some in-depth discussion is presented and conclusions are drawn about the effectiveness of ensemble technique for sentiment classification. 相似文献

14.

Generalized iterative RELIEF for supervised distance metric learning

Chin-Chun Chang Author Vitae 《Pattern recognition》2010,43(8):2971-2981

The RELIEF algorithm is a popular approach for feature weighting. Many extensions of the RELIEF algorithm are developed, and I-RELIEF is one of the famous extensions. In this paper, I-RELIEF is generalized for supervised distance metric learning to yield a Mahananobis distance function. The proposed approach is justified by showing that the objective function of the generalized I-RELIEF is closely related to the expected leave-one-out nearest-neighbor classification rate. In addition, the relationships among the generalized I-RELIEF, the neighbourhood components analysis, and graph embedding are also pointed out. Experimental results on various data sets all demonstrate the superiority of the proposed approach. 相似文献

15.

Automatic land cover analysis for Tenerife by supervised classification using remotely sensed data 总被引：5，自引：0，他引：5

Jens Keuchel Simone Naumann 《Remote sensing of environment》2003,86(4):530-541

Automatic land cover classification from satellite images is an important topic in many remote sensing applications. In this paper, we consider three different statistical approaches to tackle this problem: two of them, namely the well-known maximum likelihood classification (ML) and the support vector machine (SVM), are noncontextual methods. The third one, iterated conditional modes (ICM), exploits spatial context by using a Markov random field. We apply these methods to Landsat 5 Thematic Mapper (TM) data from Tenerife, the largest of the Canary Islands. Due to the size and the strong relief of the island, ground truth data could be collected only sparsely by examination of test areas for previously defined land cover classes.We show that after application of an unsupervised clustering method to identify subclasses, all classification algorithms give satisfactory results (with statistical overall accuracy of about 90%) if the model parameters are selected appropriately. Although being superior to ML theoretically, both SVM and ICM have to be used carefully: ICM is able to improve ML, but when applied for too many iterations, spatially small sample areas are smoothed away, leading to statistically slightly worse classification results. SVM yields better statistical results than ML, but when investigated visually, the classification result is not completely satisfying. This is due to the fact that no a priori information on the frequency of occurrence of a class was used in this context, which helps ML to limit the unlikely classes. 相似文献

16.

结合语义的特征权重计算方法研究 总被引：1，自引：1，他引：1

任姚鹏陈立潮张英俊袁英《计算机工程与设计》2010,31(10)

为进一步改善目前大多数基于向量空间模型(VSM)的文本聚类算法的效果,研究了文本聚类的基础和关键环节--文本间相似度的计算,其中一个重要步骤就是计算各文本中特征词的权重,该计算的合理性和有效性直接影响到文本相似度的准确性和聚类的效果.传统的VSM特征权重计算方法-TF-IDF,没有考虑语义相似的词语在文本集中的分布情况,针对该问题,在基于"知网"的词语语义相似度分析基础上,提出了一种改进的TF-IDF权重计算方法.实验结果表明,该算法是有效可行的,且在一定程度上提高了文本聚类的查准率和查全率. 相似文献

17.

Deep sentiment analysis: Mining the causality between personality-value-attitude for analyzing business ads in social media

Haeng-Jin Jang Jaemoon Sim Yonnim Lee Ohbyung Kwon 《Expert systems with applications》2013,40(18):7492-7503

IT vendors routinely use social media such as YouTube not only to disseminate their IT product information, but also to acquire customer input efficiently as part of their market research strategies. Customer responses that appear in social media, however, are typically unstructured; thus, a fairly large data set is needed for meaningful analysis. Although identifying customers’ value structures and attitudes may be useful for developing targeted or niche markets, the unstructured and volume-heavy nature of customer data prohibits efficient and economical extraction of such information. Automatic extraction of customer information would be valuable in determining value structure and strength. This paper proposes an intelligent method of estimating causality between user profiles, value structures, and attitudes based on the replies and published content managed by open social network systems such as YouTube. To show the feasibility of the idea proposed in this paper, information richness and agility are used as underlying concepts to create performance measures based on media/information richness theory. The resulting deep sentiment analysis proves to be superior to legacy sentiment analysis tools for estimation of causality among the focal parameters. 相似文献

18.

Combining linear discriminant functions with neural networks for supervised learning

Ke Chen Xiang Yu Huisheng Chi 《Neural computing & applications》1997,6(1):19-41

A novel supervised learning method is proposed by combining linear discriminant functions with neural networks. The proposed method results in a tree-structured hybrid architecture. Due to constructive learning, the binary tree hierarchical architecture is automatically generated by a controlled growing process for a specific supervised learning task. Unlike the classic decision tree, the linear discriminant functions are merely employed in the intermediate level of the tree for heuristically partitioning a large and complicated task into several smaller and simpler subtasks in the proposed method. These subtasks are dealt with by component neural networks at the leaves of the tree accordingly. For constructive learning, growing and credit-assignment algorithms are developed to serve for the hybrid architecture. The proposed architecture provides an efficient way to apply existing neural networks (e.g. multi-layered perceptron) for solving a large scale problem. We have already applied the proposed method to a universal approximation problem and several benchmark classification problems in order to evaluate its performance. Simulation results have shown that the proposed method yields better results and faster training in comparison with the multilayered perceptron. 相似文献

19.

Instance weighting versus threshold adjusting for cost-sensitive classification

Huimin Zhao 《Knowledge and Information Systems》2008,15(3):321-334

In real-world classification problems, different types of misclassification errors often have asymmetric costs, thus demanding cost-sensitive learning methods that attempt to minimize average misclassification cost rather than plain error rate. Instance weighting and post hoc threshold adjusting are two major approaches to cost-sensitive classifier learning. This paper compares the effects of these two approaches on several standard, off-the-shelf classification methods. The comparison indicates that the two approaches lead to similar results for some classification methods, such as Naïve Bayes, logistic regression, and backpropagation neural network, but very different results for other methods, such as decision tree, decision table, and decision rule learners. The findings from this research have important implications on the selection of the cost-sensitive classifier learning approach as well as on the interpretation of a recently published finding about the relative performance of Naïve Bayes and decision trees. 相似文献

20.

Pessimists and optimists: Improving collaborative filtering through sentiment analysis

Miguel Á. García-Cumbreras Arturo Montejo-Ráez Manuel C. Díaz-Galiano 《Expert systems with applications》2013,40(17):6758-6765

This work presents a novel application of Sentiment Analysis in Recommender Systems by categorizing users according to the average polarity of their comments. These categories are used as attributes in Collaborative Filtering algorithms. To test this solution a new corpus of opinions on movies obtained from the Internet Movie Database (IMDb) has been generated, so both ratings and comments are available. The experiments stress the informative value of comments. By applying Sentiment Analysis approaches some Collaborative Filtering algorithms can be improved in rating prediction tasks. The results indicate that we obtain a more reliable prediction considering only the opinion text (RMSE of 1.868), than when apply similarities over the entire user community (RMSE of 2.134) and sentiment analysis can be advantageous to recommender systems. 相似文献