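Traditional sensitive public opinion models, whether based on text analysis or on data mining, process online public opinion directly and do not take the propagation characteristics of the network into account. To address this, an algorithm based on microblog interaction relationships is studied and adopted: it quantifies the sensitivity of microblog posts and analyzes the interaction relationships among users to build a propagation model for sensitive opinions on microblogs. Experiments on Sina Weibo retrieved a number of sensitive users and analyzed their interaction behavior to identify users likely to post sensitive opinions in the future, who were then monitored. The results show that, compared with traditional public opinion models, the method is feasible and effective, opens up a new line of public opinion analysis, and is applicable to current research on online public opinion.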

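Fine-grained mining of online hotel reviews is of considerable research significance. Taking the specific feature attributes and the sentiment classification of online hotel reviews as the research goal, the Apriori algorithm and a sentiment-lexicon matching algorithm are applied to mine the online review data of the Chongqing Wudu Hotel in depth. The mining yields the ten hotel features that users care about most, together with the corresponding satisfaction results, and further yields the five hotel features that matter most to each of five traveler types, such as business travelers, along with their satisfaction results. The method can be applied not only to hotel reviews but also to reviews in other domains.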

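Sentiment lexicons play a very important role in sentiment analysis. In an online environment where new words appear frequently, it is necessary to identify new sentiment words and extend existing sentiment lexicons. A pattern-based bootstrapping method is used to extract sentiment words from a microblog corpus. Experiments show that, while maintaining reasonably good precision, the method extracts a considerable number of sentiment words that traditional sentiment lexicons do not cover.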

Blog clustering is an important approach for online public opinion analysis. Traditional clustering methods usually group blogs by keywords, stories, and timeline, and tend to ignore the opinions and emotions expressed in the blog articles. In this paper, an integrated graph-based model for clustering Chinese blogs by embedded sentiments is proposed. A novel graph-based representation and the corresponding clustering algorithm are applied to Chinese blog search results. The proposed model, SoB-graph, considers not only sentiment words but also structural information in blogs. Experimental results show that, compared with the traditional graph-based document representation model and the vector space document representation model, the proposed SoB-graph model achieves better performance in clustering sentiments in Chinese blog documents.

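With the development of digital media and related technologies, the danmaku (bullet-screen comment) system has emerged as a new commenting mode and has become increasingly popular. It lets viewers post comments on a video's plot in real time and can also help viewers understand the video content. Danmaku text data provide new material for short-text processing and real-time data processing. Studying the characteristics of danmaku data and the sentiment they express can help us better understand video plots. Studying the similarity between danmaku comments, and from it the relationships among users, not only gives deeper insight into the characteristics of danmaku users and reveals latent connections between different videos, but can also give more accurate guidance for choosing the target audience during video production. Danmaku text data are first collected and preprocessed, and the sentiment value of each text is then computed. Because danmaku text is highly colloquial, a dictionary of commonly used online danmaku words is built. By improving the traditional k-means clustering algorithm, all users who posted danmaku are grouped according to their sentiment values. This grouping helps reveal the emotional similarities and differences among viewers of particular types of video.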

This work focuses on error analysis of Support Vector Machine (SVM) classification of Thai children's stories at the sentence level. The construction of the Sentiment Term Tagging System (STTS) program allows the researchers to make observations and form hypotheses about the areas where most anomalies occur. Three hypotheses, based on the sentiment terms chosen for SVM predictions, are shown to hold. In addition, a number of ways to improve Thai sentiment classification research are suggested, including adding negation to the process, adding a weighting scheme for different parts of speech, disambiguating word senses, and updating the Thai sentiment resource.

Sentence sentiment orientation classification based on a Chinese sentiment word lexicon
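A sentence sentiment classification method based on a weighted linear combination over a Chinese sentiment word lexicon is proposed. The method builds a Chinese sentiment word lexicon from five existing resources and uses a weighted linear combination to determine the sentiment class of each sentence. Experimental results show that the overall F-score of sentence sentiment classification using only the word level of linguistic granularity is 78.62%, and that adding the negation-phrase level of granularity raises the overall F-score by 4.14%.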
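
The weighted linear combination with a negation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the lexicon entries, their weights, the negation cues, and the rule that a negation flips only the immediately following sentiment word are all hypothetical, and English tokens stand in for segmented Chinese words.

```python
# Hypothetical mini-lexicon: word -> (polarity, weight). Values are illustrative only.
LEXICON = {
    "good": (1, 1.0),
    "excellent": (1, 2.0),
    "bad": (-1, 1.0),
    "terrible": (-1, 2.0),
}
NEGATIONS = {"not", "never"}  # assumed negation cues


def sentence_score(tokens, use_negation=True):
    """Sum weight * polarity over lexicon hits; a directly preceding negation flips the sign."""
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        polarity, weight = LEXICON[tok]
        if use_negation and i > 0 and tokens[i - 1] in NEGATIONS:
            polarity = -polarity
        score += weight * polarity
    return score


def classify(tokens, use_negation=True):
    """Map the linear score to a sentiment class."""
    score = sentence_score(tokens, use_negation)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Calling `classify("the food was not good".split())` with and without `use_negation` shows why the negation-phrase level can change the outcome: the same sentence is scored with opposite signs.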

Social tagging systems are widely used in Web 2.0. Many users rely on these systems to create, organize, manage, and share Internet resources freely. However, the many ambiguous and uncontrolled tags produced by social tagging systems not only worsen the user experience but also limit the efficiency of resource retrieval. Tag clustering can group tags with similar semantics and help mitigate these problems. In this paper, we first present an approach based on common co-occurrence group similarity, which uses the ternary relation among users, resources, and tags to measure the semantic relevance between tags. We then propose a spectral clustering method to address the high dimensionality and sparsity of the annotation data. Finally, experimental results show that the proposed method is useful and efficient.

This paper compares the performance of three clustering tests (Rogerson R, Getis-Ord G, and Lin-Zeng LR-T) using a range of simulated sample distributions from rare to common spatial events. It is shown that all of the tests are sensitive to high-value clustering, and all but G are sensitive to low-value clustering. For a spatial pattern exhibiting negative spatial autocorrelation, R is likely to associate the autocorrelation with clustering when the sample size is greater than 20, while LR-T and G are unlikely to accept any presence of negative autocorrelation as clustering.

A blog tag recommendation method based on social annotation
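To improve the quality of the tags recommended by blog systems, existing tag recommendation algorithms and related techniques are analyzed, and a blog tag recommendation method based on social annotation is proposed. The method has two advantages. First, it uses the social tags of similar blogs as the candidate tag set, which ensures that the recommended tags are comprehensive and usable. Second, it defines a filtering step based on TF-IDF similarity that removes redundant and rarely used tags from the candidate set, which improves the accuracy and efficiency of the recommended tags. Experimental results demonstrate the effectiveness of the method.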
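
A rough sketch of this kind of pipeline is given below. It is an illustration under assumptions rather than the published method: the function names, the choice of cosine similarity over TF-IDF vectors, and the `k`, `min_support`, and `top_n` parameters are hypothetical. Candidate tags come from the `k` most similar blogs, duplicate tags are collapsed, and tags supported by fewer than `min_support` neighbours are dropped as rare.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per tokenized, non-empty document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * (math.log(n / df[t]) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend_tags(target, corpus, k=2, min_support=2, top_n=3):
    """corpus is a list of (tokens, tags) pairs; returns up to top_n tags for target."""
    vectors = tfidf_vectors([tokens for tokens, _ in corpus] + [target])
    target_vec = vectors[-1]
    sims = [(cosine(target_vec, vec), tags) for vec, (_, tags) in zip(vectors, corpus)]
    neighbours = sorted(sims, key=lambda pair: pair[0], reverse=True)[:k]
    score, support = Counter(), Counter()
    for sim, tags in neighbours:
        for tag in set(tags):  # set() collapses duplicate tags within one blog
            score[tag] += sim
            support[tag] += 1
    kept = [tag for tag in score if support[tag] >= min_support]  # drop rare tags
    return sorted(kept, key=lambda tag: (-score[tag], tag))[:top_n]
```

Lowering `min_support` to 1 keeps every neighbour tag, which trades precision for coverage.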

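The overall sentiment orientation of a Chinese review text depends heavily on the order in which its sentiments are expressed: in a Chinese text with sentiment orientation, the closer a sentiment expression is to the end of the text, the greater its influence on the sentiment classification of the whole text. Therefore, for short texts whose sentiment orientation is not obvious or not uniform, the text is classified by considering the order in which sentiment nodes appear as well as the assimilation of sentiment across transitions. Experiments on a data set of medium-rated review texts crawled from a shopping website show that the proposed classification method clearly outperforms a support vector machine (SVM) classifier based purely on word features.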
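
The effect of ordering can be illustrated with a small sketch. The abstract does not give the scoring formula, so everything here is an assumption made for illustration: the polarity table and transition cues are hypothetical, later sentiment nodes receive linearly larger weights, and "assimilation" is approximated by letting the sentiment after a transition word override what came before it.

```python
# Hypothetical polarity table and transition cues, for illustration only.
POLARITY = {"good": 1, "fast": 1, "nice": 1, "bad": -1, "slow": -1, "broken": -1}
TRANSITIONS = {"but", "however"}


def order_weighted_sentiment(tokens):
    """Classify a short text, giving later sentiment nodes more weight."""
    nodes = []  # polarities of sentiment nodes, in order of appearance
    for tok in tokens:
        if tok in TRANSITIONS:
            # Assumed assimilation rule: sentiment before a transition is
            # overridden by whatever follows it.
            nodes.clear()
        elif tok in POLARITY:
            nodes.append(POLARITY[tok])
    # Assumed weighting: the i-th node (1-based) gets weight i.
    score = sum((i + 1) * p for i, p in enumerate(nodes))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A plain word-count classifier would treat "bad packaging good product" and "good product bad packaging" identically, while this sketch assigns them opposite labels because the final sentiment node dominates.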

Yang Shuxin, Zhang Nan. Journal of Computer Applications (《计算机应用》), 2021, 41(10): 2829-2834
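Word embedding plays an important role in text sentiment analysis, but traditional word embedding techniques such as Word2Vec and GloVe suffer from the single-semantics problem, since each word receives only one representation regardless of context. To address this problem, SLP-ELMo, a text sentiment analysis model that combines a sentiment lexicon with the contextual language model ELMo, is proposed. First, a sentiment lexicon is used to filter the words in a sentence. Second, the selected words are fed into a character convolutional neural network (char-CNN) to produce a character vector for each word. Then, the character vectors are fed into the ELMo model for training. In addition, an attention mechanism is added to the last layer of the ELMo vectors to better train the word vectors. Finally, the word vectors and the ELMo vectors are fused in parallel and fed into a classifier for text sentiment classification. Compared with several existing models, the proposed model achieves higher accuracy on both the IMDB and SST-2 data sets, which verifies its effectiveness.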

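To combine the strengths of lexicon-based and machine-learning-based sentiment classification, an aspect-level sentiment classification method based on sentiment words and machine learning is proposed. A small number of sentiment words whose polarity does not depend on the opinion target are selected and used to classify the sentiment of opinion collocations. A machine learning classifier is then built, and the share of mutual information that an opinion phrase has with each class is used as a weight on the classifier's class probabilities. The class with the highest weighted probability is chosen as the sentiment class of the opinion collocation. Experimental results on a Chinese review data set show that the method effectively improves sentiment classification performance.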
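
The weighting step can be sketched as below, assuming the classifier probabilities and non-negative mutual information scores for a phrase are already available. The function name, the dictionary inputs, and the uniform fallback when no mutual information is available are assumptions for illustration, not details taken from the paper.

```python
def mi_weighted_class(probs, mi):
    """Pick the class with the largest probability after mutual-information weighting.

    probs: {class: classifier probability}
    mi:    {class: non-negative mutual information between the phrase and the class}
    """
    total = sum(mi.get(c, 0.0) for c in probs)
    if total <= 0:
        # Assumed fallback: with no usable mutual information, weight classes equally.
        weights = {c: 1.0 / len(probs) for c in probs}
    else:
        # Each class weight is its share of the total mutual information.
        weights = {c: mi.get(c, 0.0) / total for c in probs}
    weighted = {c: probs[c] * weights[c] for c in probs}
    label = max(weighted, key=weighted.get)
    return label, weighted
```

With `probs = {"pos": 0.55, "neg": 0.45}` and `mi = {"pos": 0.1, "neg": 0.3}`, the weights become 0.25 and 0.75, so the weighted vote goes to `"neg"` even though the raw classifier slightly prefers `"pos"`.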

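A method for analyzing the sentiment orientation of Chinese reviews based on topic sentiment sentences is proposed. According to the characteristics of reviews, the topic is identified with a method based on n-gram word matching. Candidate topic sentiment sentences are extracted by comparing semantic similarity with the topic and by subjective/objective classification. The sentiment orientation of the several sentences with the highest similarity is computed, and their average is taken as the overall orientation of the review. The method avoids discourse structure analysis and excludes topic-irrelevant...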

Image feature extraction based on frequent itemsets
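In image analysis, a number of studies have explored mining texture association rules by building transaction data sets from adjacent image pixels. However, texture association rules that retain only the maximal frequent itemsets lose a great deal of information. An image feature extraction method based on frequent itemsets is therefore proposed. The method first selects candidate frequent itemsets according to their frequency and spatial distribution, and then defines a spatial expressiveness value for each frequent itemset to build the feature set. In simulation tests on remote sensing images, because the EM algorithm is sensitive to its initial settings, the final clustering result is determined by specifying different numbers of clusters for the same feature set and comparing the log-likelihood values. Experimental results show that the proposed frequent itemsets represent image features well.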

XML has recently become very popular as a means of representing semistructured data and as a standard for data exchange over the Web, because it is applicable in a wide range of settings. XML documents therefore constitute an important data mining domain. In this paper, we propose a new method of XML document clustering by a global criterion function that considers the weight of common structures. Our approach first extracts representative structures of frequent patterns from schemaless XML documents using a sequential pattern mining algorithm. We then cluster XML documents by the weight of common structures, without a measure of pairwise similarity, treating each XML document as a transaction and the frequent structures extracted from the documents as the items of the transaction. We conducted experiments to compare our method with previous methods. The experimental results show the effectiveness of our approach.

In mechanical design, designers often consciously or unconsciously create engineering-rich local shape structures repeatedly in parts by combining regular design features. Identifying these frequently occurring local structures is helpful for design-rule mining, design feature library customization, and model data compression. In this paper, an approach is developed for extracting common local structures as design patterns from a set of B-rep models. B-rep models are first transformed into a representation of Volume Relational Graphs (VRG), in which each volume is generated from a face shell in a boundary partition of a solid along specially selected cutting loops. Then, two kinds of code, Face Shape Code (FSC) and Face Location Code (FLC), are introduced to describe the shapes of the volumes. Finally, based on equality of the codes between volumes, isomorphic subgraphs among VRGs are identified as design patterns with a greedy search method, whose objective is to find precise expressions in the patterns for the original solid models.

A semi-supervised clustering method based on spectral clustering
Si Wenwu, Qian Yuntao. Journal of Computer Applications (《计算机应用》), 2005, 25(6): 1347-1349
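Semi-supervised clustering uses a small amount of labeled data to assist unsupervised learning on a large amount of unlabeled data, thereby improving clustering performance. A semi-supervised clustering algorithm based on spectral clustering is proposed. It uses the information in the labeled data to adjust the distance matrix formed by the pairwise distances between points, and then performs spectral clustering on the adjusted distance matrix. Experiments show that the algorithm achieves better clustering performance than previously proposed semi-supervised clustering algorithms.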
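
The distance-adjustment step alone can be sketched as follows. The abstract does not state the adjustment rule, so this sketch assumes a common choice: pairs of points with the same label get distance zero, and pairs with different labels get the largest distance observed in the matrix. The function names and the Gaussian affinity are likewise assumptions, and the spectral clustering step itself is left to a standard implementation.

```python
import math


def adjust_distances(dist, labels):
    """Return a copy of dist with labeled pairs pulled together or pushed apart.

    dist:   symmetric list-of-lists distance matrix
    labels: {point index: class} for the labeled subset only
    """
    far = max(max(row) for row in dist)  # largest observed distance
    adjusted = [row[:] for row in dist]  # leave the input untouched
    for a in labels:
        for b in labels:
            if a == b:
                continue
            # Assumed rule: same label -> distance 0, different label -> far.
            adjusted[a][b] = 0.0 if labels[a] == labels[b] else far
    return adjusted


def affinity(adjusted, sigma=1.0):
    """Gaussian affinity matrix that a spectral clustering step could consume."""
    return [[math.exp(-(d * d) / (2.0 * sigma * sigma)) for d in row] for row in adjusted]
```

Unlabeled pairs keep their original distances, so the labels only reshape the neighbourhood structure around the labeled points.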

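To this end, we propose a novel fine-grained feature interaction network (FFIN) for rating prediction. First, rather than aggregating all of a user's reviews into a single vector, the model treats each review of the user and of the item separately and builds multi-level representations of each review text hierarchically through stacked dilated convolutions, which fully captures the multi-granularity semantic information of the reviews. Second, the model builds fine-grained feature interactions between user and item reviews at each semantic level, which effectively avoids the loss of secondary information caused by single-granularity interaction. Finally, because users' reviewing behavior is usually subjective and personalized, we do not use an attention mechanism to identify important information. Instead, we identify high-order salient signals through a hierarchical structure similar to that used in image recognition and use them for the final rating prediction. We conducted extensive experiments on six real-world data sets with different characteristics from Amazon and Yelp. The results show that FFIN achieves significant gains in prediction accuracy over recently proposed state-of-the-art models. Further analysis shows that multi-granularity feature interaction not only highlights the relevant information in reviews but also greatly improves the interpretability of rating prediction.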

The emergence of the MapReduce (MR) framework for scaling data mining and machine learning algorithms addresses Volume, while the handling of Variety and Velocity needs to be skilfully crafted into the algorithms themselves. So far, scalable clustering algorithms have focused solely on Volume, taking advantage of the MR framework. In this paper we present a MapReduce algorithm, data aware scalable clustering (DASC), which is capable of handling the 3 Vs of big data by virtue of being (i) single scan and distributed to handle Volume, (ii) incremental to cope with Velocity and (iii) versatile in handling numeric and categorical data to accommodate Variety. The DASC algorithm incrementally processes an infinitely growing data set stored on a distributed file system and delivers a quality clustering scheme while ensuring recency of patterns. The algorithm preserves an up-to-date synopsis of the data seen so far. Each new data increment is processed and merged with the synopsis. Since the synopsis itself may grow very large, the algorithm stores it as a file, which makes DASC truly scalable. Exclusive clusters are obtained on demand by applying a connected component analysis (CCA) algorithm over the synopsis. CCA presents a subtle roadblock to effective parallelism during clustering. This problem is overcome by accomplishing the task in two stages. In the first stage, hyperclusters are identified based on prevailing data characteristics. The second stage uses this knowledge to determine the degree of parallelism, thereby making DASC data aware. Hyperclusters are distributed over the available compute nodes for discovering embedded clusters in parallel. The staged approach to clustering yields the dual advantage of improved parallelism and the desired complexity in the \(\mathcal{MRC}^0\) class. The DASC algorithm is empirically compared with the incremental Kmeans and Scalable Kmeans++ algorithms. Experimentation on real-world and synthetic data with approximately 1.2 billion data points demonstrates the effectiveness of the DASC algorithm. Empirical observations of DASC execution are consistent with the theoretical analysis with respect to stability in resource utilization and execution time.

