1.
In this introduction, we briefly summarize the state of data and text mining today. Taking a very broad view, we use the term information mining to refer to the organization and analysis of structured or unstructured data that can be quantitative, textual, and/or pictorial in nature. The key question, in our view, is: “How can we transform data (in the very broad sense of the term) into ‘actionable knowledge’, that is, knowledge that we can use in pursuit of one or more specified objectives?” After detailing a set of key components of information mining, we introduce each of the papers in this volume and describe the focus of their contributions.
2.
M. Dolores Molina-González Eugenio Martínez-Cámara María-Teresa Martín-Valdivia José M. Perea-Ortega 《Expert systems with applications》2013,40(18):7250-7257
Until now, most published methods for polarity classification have been applied to English texts, but other languages are becoming increasingly important. This paper presents a new resource for the Spanish sentiment analysis research community: a lexicon generated by translating the Bing Liu English Lexicon into Spanish. To assess the validity of the proposed lexicon, we present a set of experiments on a Spanish review corpus and compare the resource with another existing Spanish lexicon. The results show that our resource outperforms the currently available Spanish lexicon for sentiment analysis.
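The simplest way to use a polarity lexicon like the one described above is to sum the orientations of the lexicon words that occur in a review. The sketch below assumes a toy lexicon of a few hand-picked Spanish words scored +1 or -1 (hypothetical entries, not taken from the actual resource) and whitespace tokenization; it illustrates lexicon-based classification in general, not the authors' evaluation setup.

```python
# Toy lexicon (hypothetical entries): +1 = positive orientation, -1 = negative.
TOY_LEXICON: dict[str, int] = {
    "bueno": 1, "excelente": 1, "agradable": 1,
    "malo": -1, "terrible": -1, "sucio": -1,
}

PUNCTUATION = ".,;:!?¡¿\"'()"


def polarity(text: str, lexicon: dict[str, int] = TOY_LEXICON) -> str:
    """Label a review by summing the orientations of the lexicon words it contains."""
    tokens = [tok.strip(PUNCTUATION) for tok in text.lower().split()]
    score = sum(lexicon.get(tok, 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity("El hotel es excelente y el personal muy agradable."))  # positive
print(polarity("El servicio fue terrible."))  # negative
```

Because lookup is by exact surface form, an inflected word such as "sucia" misses the entry "sucio" and contributes nothing, which is one reason lemma-level lexicons are useful.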
3.
《Expert systems with applications》2014,41(13):5984-5994
Many tasks related to sentiment analysis rely on sentiment lexicons, lexical resources containing information about the emotional implications of words (e.g., the sentiment orientation of words, positive or negative). In this work, we present an automatic method for building lemma-level sentiment lexicons, which has been applied to obtain lexicons for English, Spanish, and three other official languages of Spain. Our lexicons are multi-layered, allowing applications to trade off the number of available words against the accuracy of the estimates. Our evaluations show high accuracy in all cases. As a step prior to the lemma-level lexicons, we built a synset-level lexicon for English similar to SentiWordNet 3.0, one of the most widely used sentiment lexicons today. We made several improvements to the original SentiWordNet 3.0 construction method, which yield significantly better estimates of positivity and negativity according to our evaluations. The resource containing all the lexicons, ML-SentiCon, is publicly available.
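One way an application might exploit a multi-layered lexicon is to choose how many layers to trust. The sketch below assumes that layers are ordered from most to least reliable and that enabling more layers only adds lemmas; the layer contents and scores are invented for illustration and do not come from ML-SentiCon.

```python
from typing import Optional

# Hypothetical layered lexicon: layer 0 holds the fewest but most reliable lemmas,
# and each later layer adds lemmas whose scores are less certain.
LAYERS: list[dict[str, float]] = [
    {"excellent": 0.9, "awful": -0.85},
    {"pleasant": 0.6, "dirty": -0.55},
    {"fine": 0.25, "odd": -0.2},
]


def lookup(lemma: str, max_layer: int) -> Optional[float]:
    """Return the lemma's polarity using only layers 0..max_layer, else None."""
    for layer in LAYERS[: max_layer + 1]:
        if lemma in layer:
            return layer[lemma]
    return None


def coverage(max_layer: int) -> int:
    """Number of lemmas available when layers 0..max_layer are enabled."""
    return sum(len(layer) for layer in LAYERS[: max_layer + 1])


print(lookup("dirty", 0), lookup("dirty", 1))  # None -0.55
print(coverage(0), coverage(2))  # 2 6
```

A precision-oriented application would keep `max_layer` low; a coverage-oriented one would raise it and accept noisier scores.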
4.
5.
We tackle the crucial challenge of fusing different feature modalities for multimodal sentiment analysis. Existing approaches, mostly based on neural networks, model multimodal interactions in an implicit and hard-to-interpret manner. We address this limitation with inspiration from quantum theory, which offers principled methods for modeling complicated interactions and correlations. In our quantum-inspired framework, word interactions within a single modality and interactions across modalities are formulated with superposition and entanglement, respectively, at different stages. A complex-valued neural network implementation of the framework achieves results comparable to state-of-the-art systems on two benchmark video sentiment analysis datasets. In addition, we produce unimodal and bimodal sentiment directly from the model to interpret the entangled decision.
6.
With the widespread use of social networks, forums, and blogs, customer reviews have emerged as a critical factor in customers' purchase decisions. Since the early 2000s, researchers have focused on automatically categorizing these reviews into polarity levels such as positive, negative, and neutral. This research problem is known as sentiment classification. The objective of this study is to investigate the potential benefit of the multiple classifier systems concept for Turkish sentiment classification and to propose a novel classification technique. The Vote algorithm is used in conjunction with three classifiers, namely Naive Bayes, Support Vector Machine (SVM), and Bagging. The parameters of the SVM were optimized when it was used as an individual classifier. Experimental results show that multiple classifier systems improve on the performance of the individual classifiers on Turkish sentiment classification datasets and that meta classifiers contribute to the power of these multiple classifier systems. The proposed approach performed better than Naive Bayes, which had been reported as the best individual classifier for these datasets, and than Support Vector Machines. Multiple classifier systems (MCS) are a good approach for sentiment classification, and parameter optimization of the individual classifiers must be taken into account when developing MCS-based prediction systems.
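The combination step of a multiple classifier system can be as simple as majority voting over the base classifiers' labels. The sketch below assumes plain majority voting with three stand-in classifiers (hypothetical functions that always return a fixed label); voting ensembles can also use other combination rules, such as averaging class probabilities, and the abstract does not say which rule the authors chose.

```python
from collections import Counter
from typing import Callable, Sequence

Classifier = Callable[[str], str]


def majority_vote(classifiers: Sequence[Classifier], text: str) -> str:
    """Return the label predicted by most classifiers.

    Counter.most_common orders equal counts by first appearance, so ties go
    to the label of the earliest classifier in the list.
    """
    votes = Counter(clf(text) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Stand-ins for trained Naive Bayes, SVM, and Bagging models (hypothetical).
def naive_bayes(text: str) -> str:
    return "positive"


def svm(text: str) -> str:
    return "negative"


def bagging(text: str) -> str:
    return "negative"


print(majority_vote([naive_bayes, svm, bagging], "örnek yorum"))  # negative
```

With an odd number of voters and two classes no tie can occur; with more classes the tie-breaking rule above matters.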
7.
With the explosion of social media, opinion mining has been adopted rapidly in recent years. However, only a few studies have focused on the precision of feature-review and opinion-word extraction, and these studies do not provide any mechanism for achieving the precision required for effective opinion mining. Most of them are based on Naïve Bayes, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and classical ontology. These systems are still inadequate for classifying feature reviews into finer-grained polarity terms (strong negative, negative, neutral, positive, and strong positive). Further, existing classical ontology-based systems cannot extract vague (fuzzy) information from reviews and therefore produce poor results. This paper therefore proposes a robust classification technique for feature-review identification and semantic knowledge for opinion mining based on SVM and a Fuzzy Domain Ontology (FDO). The proposed system retrieves a collection of reviews about hotels and hotel features. The SVM identifies hotel feature reviews and filters out irrelevant reviews (noise), and the FDO is then used to compute the polarity term of each feature. Combining the FDO and SVM significantly increases the precision of review and opinion-word extraction and the accuracy of opinion mining. The FDO and the intelligent prototype were developed using the Protégé OWL 2 (Web Ontology Language) tool and Java, respectively. The experimental results show a considerable improvement in feature-review classification and opinion mining.
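The abstract does not give the membership functions used by the FDO. A common choice in fuzzy systems is a set of triangular functions; the sketch below assumes five evenly spaced triangles over a polarity score clipped to [-1, 1] and assigns the level with the highest membership. The score is a hypothetical input (for example, an average of opinion-word strengths for one hotel feature), and this is not the authors' ontology.

```python
# Assumed centres of five triangular fuzzy sets on a polarity score in [-1, 1].
LEVELS: dict[str, float] = {
    "strong negative": -1.0,
    "negative": -0.5,
    "neutral": 0.0,
    "positive": 0.5,
    "strong positive": 1.0,
}
HALF_WIDTH = 0.5


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Membership of x in a triangle that rises from a, peaks at b, and falls to c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def polarity_term(score: float) -> str:
    """Map an aggregated feature score to the level with the highest membership."""
    score = max(-1.0, min(1.0, score))  # clip so at least one set has nonzero membership
    return max(
        LEVELS,
        key=lambda name: triangular(
            score, LEVELS[name] - HALF_WIDTH, LEVELS[name], LEVELS[name] + HALF_WIDTH
        ),
    )


print(polarity_term(0.8), polarity_term(0.6), polarity_term(-0.1))
# strong positive positive neutral
```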
8.
Minimum class variance support vector machine (MCVSVM) and large margin linear projection (LMLP) classifiers, in contrast with the traditional support vector machine (SVM), take the distribution of the data into account and can achieve better performance. However, when the within-class scatter matrix is singular, both MCVSVM and LMLP exploit the discriminant information in only one subspace of the within-class scatter matrix and discard the discriminant information in the other. In this paper, a twin-space support vector machine (TSSVM) algorithm is proposed for high-dimensional data classification tasks in which the within-class scatter matrix is singular. TSSVM is rooted in both the non-null space and the null space of the within-class scatter matrix, takes full advantage of the discriminant information in both subspaces, and can therefore achieve better classification accuracy. We first discuss the linear case of TSSVM and then develop the nonlinear TSSVM. Experimental results on real datasets validate the effectiveness of TSSVM and show its superior performance over MCVSVM and LMLP.
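Why the within-class scatter matrix becomes singular for high-dimensional data follows from its standard definition. With $N$ training samples in $d$ dimensions, $C$ classes $\mathcal{X}_c$ of sizes $n_c$, and class means $\boldsymbol{\mu}_c$ (notation introduced here, not taken from the paper):

```latex
S_w = \sum_{c=1}^{C} \sum_{\mathbf{x}_i \in \mathcal{X}_c}
      (\mathbf{x}_i - \boldsymbol{\mu}_c)(\mathbf{x}_i - \boldsymbol{\mu}_c)^{\top},
\qquad
\operatorname{rank}(S_w) \le \min(d,\; N - C).
```

Each class contributes at most $n_c - 1$ independent directions because its deviations from $\boldsymbol{\mu}_c$ sum to zero, so whenever $d > N - C$ the $d \times d$ matrix $S_w$ is singular. For example, $d = 1000$ features, $N = 100$ samples, and $C = 2$ classes give $\operatorname{rank}(S_w) \le 98$. Because $S_w$ is symmetric, $\mathbb{R}^d$ splits into its range (the non-null space) and its null space as orthogonal complements; these are the two subspaces the abstract refers to.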
9.
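Multilingual question answering is one of the research hotspots in natural language processing: given questions and texts in different languages, a model should return the correct answer. With the rapid development of machine translation and the widespread use of multilingual pre-training in natural language processing, multilingual question answering has also advanced quickly. This paper first systematically reviews related work on multilingual question answering methods, dividing them into feature-based, translation-based, pre-training-based, and dual-encoding-based methods, and describes the use and characteristics of each category. It then systematically discusses related work on multilingual question answering tasks, dividing them into text-based and multimodal multilingual question answering tasks, and gives the basic definition of each task. Next, it summarizes the dataset statistics, evaluation metrics, and question answering methods involved in these tasks. Finally, it outlines future directions for multilingual question answering.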
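The abstract mentions evaluation metrics without naming them. Extractive QA benchmarks commonly report exact match (EM) and token-level F1; the sketch below follows the widely used SQuAD-style definitions with English normalization. It is an illustration under that assumption: multilingual benchmarks typically adapt normalization and tokenization per language, since whitespace splitting does not work for languages such as Chinese.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, then split on whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred, ref = normalize(prediction), normalize(gold)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower!", "eiffel tower"))  # 1.0
print(round(f1("the Eiffel Tower", "Eiffel Tower in Paris"), 3))  # 0.667
```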
10.
Opinion target extraction is one of the core tasks in sentiment analysis of text data. In recent years, dependency parser–based approaches have been widely studied for opinion target extraction. However, dependency parsers are limited by language and grammatical constraints. Therefore, this work proposes a sequential pattern-based rule mining model, which has no such constraints, for cross-domain opinion target extraction from product reviews in unknown domains. As a result, the domain of the reviews no longer needs to be known when extracting opinion targets. The proposed model also clarifies the difference between the concepts of opinion target and aspect, which are commonly confused in the literature. The model consists of two stages. In the first stage, the aspects of reviews are extracted from the target domain using rules automatically generated from source domains. Aspects are also transferred from the source domains to the target domain, and aspect pruning is applied to further improve aspect extraction. In the second stage, the opinion target is selected from among the aspects extracted in the first stage using rules automatically generated for opinion target extraction. The proposed model was evaluated on several benchmark datasets from different domains and compared with previous work. The experimental results show that the opinion targets of reviews in unknown domains can be extracted with higher accuracy than by previous methods.
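In the model above the rules are mined automatically from source-domain data. To show what applying such a rule looks like, the sketch below hand-writes two hypothetical part-of-speech patterns (Penn Treebank tags) and matches them contiguously, which is a simplification: sequential patterns in general may allow gaps between elements.

```python
Tagged = list[tuple[str, str]]  # (token, POS tag) pairs


def extract(tagged: Tagged, pattern: tuple[str, ...], slot: int) -> list[str]:
    """Return the token at position `slot` of every contiguous match of `pattern`."""
    tags = [tag for _, tag in tagged]
    width = len(pattern)
    return [
        tagged[i + slot][0]
        for i in range(len(tagged) - width + 1)
        if tuple(tags[i : i + width]) == pattern
    ]


sentence: Tagged = [
    ("The", "DT"), ("battery", "NN"), ("is", "VBZ"), ("great", "JJ"),
    ("but", "CC"), ("the", "DT"), ("poor", "JJ"), ("screen", "NN"),
    ("flickers", "VBZ"),
]

# Hypothetical rules: adjective + noun (take the noun), and
# noun + verb + adjective (take the noun).
print(extract(sentence, ("JJ", "NN"), slot=1))         # ['screen']
print(extract(sentence, ("NN", "VBZ", "JJ"), slot=0))  # ['battery']
```

Because the rules refer only to tags, not to domain vocabulary, the same rules can be applied to reviews from a domain never seen during rule mining, which is the property the abstract relies on.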
11.
The Chinese pronunciation system has two characteristics that distinguish it from other languages: deep phonemic orthography and intonation variation. In this paper, we hypothesize that these two properties can play a major role in Chinese sentiment analysis. In particular, we propose two effective features to encode phonetic information and fuse it with textual information. Based on this hypothesis, we propose Disambiguate Intonation for Sentiment Analysis (DISA), a network developed on the principles of reinforcement learning. DISA disambiguates the intonation of each Chinese character (pinyin) and thereby learns precise phonetic representations. We also fuse phonetic features with textual and visual features to further improve performance. Experimental results on five Chinese sentiment analysis datasets show that including phonetic features significantly and consistently improves the performance of textual and visual representations and surpasses state-of-the-art Chinese character-level representations.
12.
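To improve the efficiency of data mining, the original dataset usually needs to be simplified while still yielding equivalent analysis results. This paper proposes an effective data reduction algorithm, Sodra, which combines unsupervised and supervised learning to generate a reduced dataset suited to classification. Classification experiments on real and artificial datasets show that the proposed algorithm greatly reduces space requirements without harming classification performance. In addition, applying the feature analysis algorithm Relif-P to the reduced set further improves the algorithm's robustness to irrelevant features.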
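The abstract does not describe Relif-P itself. For orientation, the sketch below implements the classic two-class Relief feature weighting on which Relief-style methods are based: a feature gains weight when it differs between an instance and its nearest neighbor of the other class (nearest miss), and loses weight when it differs from its nearest neighbor of the same class (nearest hit). As a simplification, it makes one deterministic pass over every instance instead of sampling instances at random, and it uses range-normalized Manhattan distance.

```python
def relief(X: list[list[float]], y: list[int]) -> list[float]:
    """Classic two-class Relief weights, one pass over all instances."""
    n, d = len(X), len(X[0])
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(h - l) or 1.0 for l, h in zip(lo, hi)]  # avoid dividing by zero

    def diff(j: int, a: list[float], b: list[float]) -> float:
        return abs(a[j] - b[j]) / span[j]  # per-feature difference in [0, 1]

    def dist(a: list[float], b: list[float]) -> float:
        return sum(diff(j, a, b) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            weights[j] += (diff(j, X[i], X[miss]) - diff(j, X[i], X[hit])) / n
    return weights


# Feature 0 separates the classes; feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [1.0, 0.2], [0.9, 0.8]]
y = [0, 0, 1, 1]
print(relief(X, y))  # feature 0 gets a positive weight, feature 1 a negative one
```

Features whose weight stays at or below zero are candidates for removal, which is how Relief-style scoring helps with irrelevant features.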
13.
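Identifying word sentiment orientation (WSO) is usually the basis of coarse-grained opinion mining on text. Free-form reviews contain a great deal of grammatical noise, so WSO identification methods previously proposed for well-formed text are no longer suitable for them. Sentiment words in free-form reviews are also often context-sensitive, so sentiment words whose orientation was identified in other contexts are difficult to apply to coarse-grained opinion mining of the current free-form reviews. To address these problems, a new method is proposed that uses complex networks to identify WSO for free-form reviews. The method has two main parts: 1) to use the contextual information between words in free-form reviews to model a sentiment orientation relationship network (SORN) that effectively handles context sensitivity and is robust to noise, two algorithms are proposed: a pyramid noise-resistant information model algorithm and an algorithm that uses the noise-resistant information to optimize and adjust the SORN; 2) to use the SORN effectively for identifying WSO in free-form reviews, a SORN-based WSO identification algorithm is proposed. Experiments show that, for online WSO identification on free-form reviews, the proposed method not only achieves much higher precision than the method proposed by Hatzivassiloglou but also has good time efficiency.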
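The SORN construction and the pyramid noise-resistance model are not specified in the abstract. As a rough illustration of the general idea of deriving word orientations from a word network, the sketch below runs simple score propagation from seed words over a signed graph. The graph, the edge signs (positive for words joined by "and", negative for words joined by "but"), and the update rule are all assumptions, and unlike the method described above it ignores context sensitivity.

```python
from collections import defaultdict

Edges = dict[tuple[str, str], float]


def propagate(edges: Edges, seeds: dict[str, float], iterations: int = 20) -> dict[str, float]:
    """Iteratively set each non-seed word's score to the weighted mean of its neighbors.

    A positive edge weight means "same orientation"; a negative weight means
    "opposite orientation". Seed words keep their fixed scores.
    """
    neighbors = defaultdict(list)
    for (u, v), w in edges.items():
        neighbors[u].append((v, w))
        neighbors[v].append((u, w))

    scores = {word: seeds.get(word, 0.0) for word in neighbors}
    for _ in range(iterations):
        updated = {}
        for word, nbrs in neighbors.items():
            if word in seeds:
                updated[word] = seeds[word]
                continue
            total = sum(abs(w) for _, w in nbrs) or 1.0
            updated[word] = sum(w * scores[v] for v, w in nbrs) / total
        scores = updated
    return scores


# Hypothetical graph built from snippets like "good and tasty",
# "tasty but greasy", and "greasy and bad".
edges = {("good", "tasty"): 1.0, ("tasty", "greasy"): -1.0, ("greasy", "bad"): 1.0}
scores = propagate(edges, seeds={"good": 1.0, "bad": -1.0})
print(round(scores["tasty"], 3), round(scores["greasy"], 3))  # close to 1.0 and -1.0
```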
14.
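In the era of big data, users' views, inclinations, opinions, and debates on forums generate large amounts of data. Mining these data, which express their authors' emotions, helps the relevant parties understand and manage information and also directly influences decision-making. Forum sentiment mining is therefore important. Starting from the concepts and significance of forum data mining techniques, this paper focuses on the state of research on the two main approaches to forum sentiment mining, lexicon-based and machine-learning-based, comparing and describing each approach's applicable tasks, shortcomings, improvements, and development trends. It then presents the open problems and challenges in forum sentiment mining and predicts future directions for the technology.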
15.
16.
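Sentiment orientation classification is an important problem in Chinese sentiment analysis. Sentiment feature selection is the prerequisite and foundation of machine-learning-based sentiment orientation classification; its role is to reduce the dimensionality of the feature set by removing irrelevant or redundant features. This paper proposes a hybrid sentiment feature selection method that combines the Lasso algorithm with filter-based feature selection. First, Lasso penalized regression screens the original feature set to obtain a sentiment classification feature subset with low redundancy. Then, filter methods such as CHI, MI, and IG are applied to the subset to evaluate the dependency weight between each candidate feature word and the text categories, and candidate feature words with low relevance are removed accordingly. Finally, the proposed method is compared with DF, MI, IG, and CHI under different numbers of feature words using an SVM classifier with a Gaussian kernel. Experiments on a Weibo short-text corpus show that the proposed algorithm is effective and efficient; when the dimension of the feature subset is smaller than the number of samples, the proposed hybrid method improves feature selection to some degree compared with DF, MI, IG, and CHI; and a comparison of recognition rate and recall shows that the Lasso-MI method is more effective than MI and the other filter methods.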
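Of the filter scores named above, CHI is the easiest to state. For a term t and a class c, let A be the number of documents of class c that contain t, B the number of other documents that contain t, C the number of documents of class c without t, D the number of other documents without t, and N = A + B + C + D; the widely used form is CHI(t, c) = N(AD - BC)² / ((A + C)(B + D)(A + B)(C + D)). The sketch below computes it and keeps the top-k terms. The Lasso screening stage is omitted, and the document counts are invented for illustration.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """CHI statistic for one term/class pair from its 2x2 document counts.

    a: in class, contains term      b: not in class, contains term
    c: in class, lacks term         d: not in class, lacks term
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denominator if denominator else 0.0


def top_k_terms(counts: dict[str, tuple[int, int, int, int]], k: int) -> list[str]:
    """Rank candidate terms by CHI with respect to the class and keep the best k."""
    return sorted(counts, key=lambda term: chi_square(*counts[term]), reverse=True)[:k]


# Hypothetical counts over 50 positive and 50 negative Weibo posts (class = positive).
counts = {
    "喜欢": (40, 5, 10, 45),   # "like": mostly in positive posts
    "今天": (25, 25, 25, 25),  # "today": evenly spread
    "失望": (3, 30, 47, 20),   # "disappointed": mostly in negative posts
}
print(top_k_terms(counts, k=2))  # ['喜欢', '失望']
```

Because the numerator squares AD - BC, CHI rewards dependence in either direction, so a strongly negative term such as "失望" is kept alongside a strongly positive one.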
17.
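Sentiment classification is an important aspect of opinion mining. This paper proposes a semi-supervised sentiment classification method based on sentiment feature clustering that requires sentiment labels for only a small number of training instances. First, general classification features and sentiment features are extracted from consumer reviews, and the general classification features are used to train one sentiment classifier. Then, the spectral clustering algorithm maps the sentiment features to extended features, and the general classification features and extended features are used together to train a second sentiment classifier. The two classifiers then select instances from the unlabeled dataset and add them to the training set, which is used to train the final sentiment classifier. Experimental results show that, on the same datasets, this method achieves higher sentiment classification accuracy than methods based on self-learning SVM and co-training SVM.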
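The abstract says the two classifiers select instances from the unlabeled set to enlarge the training set, but it does not give the selection criterion. The sketch below assumes one simple rule: an unlabeled review is added, with its predicted label, only if both classifiers agree and each is at least `threshold` confident. The classifier outputs are hypothetical stand-ins that return a (label, confidence) pair.

```python
from typing import Callable

Prediction = tuple[str, float]  # (label, confidence in [0, 1])
Classifier = Callable[[str], Prediction]


def select_confident(
    unlabeled: list[str], clf_a: Classifier, clf_b: Classifier, threshold: float = 0.8
) -> tuple[list[tuple[str, str]], list[str]]:
    """Split unlabeled reviews into (review, label) pairs to add and reviews to keep."""
    selected, remaining = [], []
    for review in unlabeled:
        label_a, conf_a = clf_a(review)
        label_b, conf_b = clf_b(review)
        if label_a == label_b and min(conf_a, conf_b) >= threshold:
            selected.append((review, label_a))
        else:
            remaining.append(review)
    return selected, remaining


# Hypothetical outputs of the two classifiers on three unlabeled reviews.
OUTPUT_A = {"r1": ("pos", 0.90), "r2": ("neg", 0.95), "r3": ("pos", 0.60)}
OUTPUT_B = {"r1": ("pos", 0.85), "r2": ("pos", 0.90), "r3": ("pos", 0.99)}

selected, remaining = select_confident(["r1", "r2", "r3"], OUTPUT_A.__getitem__, OUTPUT_B.__getitem__)
print(selected, remaining)  # [('r1', 'pos')] ['r2', 'r3']
```

In a full loop, both classifiers would be retrained on the enlarged training set and the step repeated until no confident instances remain.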
18.
Concepts and relations in ontologies and other knowledge organisation systems are usually annotated with natural language labels. Most ontology matchers rely on such labels in element-level matching techniques. State-of-the-art approaches, however, tend to make implicit assumptions about the language used in labels (usually English) and are either domain-agnostic or built for a specific domain. When faced with labels in different languages, most approaches resort to general-purpose machine translation services to reduce the problem to monolingual, English-only matching. We investigate a fundamentally different and highly extensible solution based on semantic matching, in which labels are parsed by multilingual natural language processing and then matched using language-independent, domain-aware background knowledge that acts as an interlingua. The method is implemented in NuSM, the language- and domain-aware evolution of the SMATCH semantic matcher, and is evaluated against a translation-based approach. We also design and evaluate a fusion matcher that combines the outputs of the two techniques to boost precision or recall beyond what either technique achieves alone.
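The abstract does not say how the fusion matcher combines the two outputs. The simplest combinations with the stated effect are set operations on the correspondences each matcher returns: intersection keeps only mappings both matchers agree on (favoring precision), and union keeps every mapping either one finds (favoring recall). The sketch assumes that rule and uses invented German-English label pairs.

```python
Correspondence = tuple[str, str, str]  # (source label, target label, relation)


def fuse(
    semantic: set[Correspondence], translation: set[Correspondence], goal: str
) -> set[Correspondence]:
    """Combine two matchers' outputs to favor precision (intersection) or recall (union)."""
    if goal == "precision":
        return semantic & translation
    if goal == "recall":
        return semantic | translation
    raise ValueError(f"unknown goal: {goal!r}")


semantic = {("Auto", "Car", "="), ("Rad", "Wheel", "=")}
translation = {("Auto", "Car", "="), ("Reifen", "Tire", "=")}
print(fuse(semantic, translation, "precision"))  # {('Auto', 'Car', '=')}
print(len(fuse(semantic, translation, "recall")))  # 3
```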
19.
Aspect-based sentiment analysis systems are text-mining systems that specialize in summarizing the sentiment that a collection of reviews conveys about particular aspects of an item. In many cases, users write their reviews using conditional sentences; in such cases, it is very important to mine the conditions so that they can be analyzed, otherwise the corresponding sentiment summaries may be misinterpreted. Unfortunately, current commercial and research systems neglect conditions, and current frameworks and toolkits do not provide any components to mine them. Furthermore, the proposals in the literature are insufficient because they rely either on handcrafted patterns that fall short in recall or on machine learning procedures that are tightly bound to a specific language and require too much configuration. In this article, we present Torii, a system that loads a collection of reviews, discovers the aspects they report on, and summarizes the sentiment conveyed about them, taking into account any existing conditions. We also describe its architecture, our approach to mining conditions, and our experimental analysis on a large multilingual dataset with reviews from multiple categories. To the best of our knowledge, Torii is the first proposal that addresses aspect-based sentiment analysis while taking conditions into account.
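For contrast with the approach described above, the kind of handcrafted-pattern baseline the abstract criticizes can be sketched in a few lines. The trigger-word list below is a hypothetical English-only example; it finds explicit conditional clauses but misses conditions phrased without a trigger word, which is the recall problem the authors point out.

```python
import re

# Hypothetical English trigger words; the clause runs up to the next , . or ;
CONDITION = re.compile(r"\b(?:if|unless|provided that|as long as)\b[^,.;]*", re.IGNORECASE)


def find_conditions(sentence: str) -> list[str]:
    """Return the conditional clauses matched by the trigger-word pattern."""
    return [match.group(0).strip() for match in CONDITION.finditer(sentence)]


print(find_conditions("The camera is great if you shoot outdoors, but the zoom is weak."))
# ['if you shoot outdoors']
print(find_conditions("Should it rain, the tent leaks."))
# [] (an inverted conditional with no trigger word is missed)
```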
20.
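With the development of social networks, e-commerce, the mobile Internet, and related technologies, online data of all kinds is expanding rapidly. The Internet contains large amounts of emotionally charged text, and mining it thoroughly makes it possible to better understand Internet users' views and positions. This paper first introduces background knowledge on emotion analysis, including different emotion classification schemes and the applications of text emotion analysis in scenarios such as public opinion monitoring, business decision-making, opinion search, information prediction, and emotion management. It then organizes and summarizes the mainstream methods of text emotion analysis from the perspective of emotion classification, and introduces, analyzes, and compares them in detail. Finally, it describes the problems facing text emotion analysis, including data scarcity, class imbalance, domain dependence, and language imbalance, and summarizes and looks ahead at frontier progress in text emotion analysis in connection with research hotspots such as big data processing, multimedia fusion, advances in deep learning, topic-specific mining, and multilingual collaboration.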