Similar literature
 Found 20 similar documents (search time: 703 ms)
1.
With the explosion of social media, opinion mining has been adopted rapidly in recent years. However, only a few studies have focused on the precision of feature review and opinion word extraction, and these studies offer no mechanism for achieving the precision that effective opinion mining requires. Most of them are based on Naïve Bayes, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and classical ontology. Such systems are still inadequate for classifying feature reviews into finer degrees of polarity (strong negative, negative, neutral, positive, and strong positive). Further, existing classical ontology-based systems cannot extract blurred information from reviews and therefore give poor results. This paper proposes a robust classification technique for feature review identification and semantic knowledge for opinion mining based on SVM and Fuzzy Domain Ontology (FDO). The proposed system retrieves a collection of reviews about hotels and hotel features. The SVM identifies hotel feature reviews and filters out irrelevant reviews (noise), and the FDO is then used to compute the polarity term of each feature. Combining FDO and SVM significantly increases the precision of review and opinion word extraction and the accuracy of opinion mining. The FDO and the intelligent prototype are developed using the Protégé OWL-2 (Web Ontology Language) tool and Java, respectively. The experimental results show a considerable performance improvement in feature review classification and opinion mining.

2.
Opinion target extraction is one of the core tasks in sentiment analysis on text data. In recent years, dependency parser-based approaches have been commonly studied for opinion target extraction. However, dependency parsers are limited by language and grammatical constraints. Therefore, in this work, a sequential pattern-based rule mining model, which does not have such constraints, is proposed for cross-domain opinion target extraction from product reviews in unknown domains. Knowing the domain of the reviews is thus no longer a requirement for extracting opinion targets. The proposed model also reveals the difference between the concepts of opinion target and aspect, which are commonly confused in the literature. The model consists of two stages. In the first stage, the aspects of reviews are extracted from the target domain using rules automatically generated from source domains. The aspects are also transferred from the source domains to the target domain. Moreover, aspect pruning is applied to further improve the performance of aspect extraction. In the second stage, the opinion target is selected from the aspects extracted in the first stage using rules automatically generated for opinion target extraction. The proposed model was evaluated on several benchmark datasets in different domains and compared against the literature. The experimental results revealed that the opinion targets of reviews in unknown domains can be extracted with higher accuracy than in previous works.

3.
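Sentiment orientation classification analyzes a text to determine its orientation, that is, whether the author supports or opposes the subject. Within a specific domain, supervised classification methods give the best experimental results, but applying them directly to texts from a different domain works poorly. How to use source-domain texts with labeled sentiment orientation to classify target-domain texts whose orientation is unknown, the problem of cross-domain sentiment orientation analysis, has therefore become a research focus. This paper proposes a SimRank-based algorithm for cross-domain sentiment orientation analysis. It treats words that co-occur in the source and target domains as a bridge between the two, uses a sentiment lexicon and the SimRank algorithm to find a latent sentiment space, and then trains an SVM on the labeled source domain to obtain a model that predicts sentiment orientation in the target domain. Experimental results demonstrate the effectiveness of the method.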
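
The SimRank step named in the abstract above can be illustrated with a minimal sketch. This is the generic SimRank iteration, not the paper's algorithm: the toy word-document graph, the `decay` value, and the iteration count are all assumptions chosen for illustration. The shared word "great" links a source-domain document and a target-domain document, which in turn gives the domain-specific words "sturdy" and "tasty" a nonzero similarity.

```python
from itertools import product


def simrank(graph, decay=0.8, iterations=10):
    """Plain SimRank. `graph` maps each node to the list of its in-neighbours.

    Two nodes are similar when their in-neighbours are similar;
    every node has similarity 1.0 with itself.
    """
    nodes = list(graph)
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            in_a, in_b = graph[a], graph[b]
            if not in_a or not in_b:
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbours, damped by `decay`.
            total = sum(sim[(x, y)] for x in in_a for y in in_b)
            new_sim[(a, b)] = decay * total / (len(in_a) * len(in_b))
        sim = new_sim
    return sim


# Hypothetical toy graph: words point to the documents containing them and back.
toy_graph = {
    "doc_source": ["great", "sturdy"],
    "doc_target": ["great", "tasty"],
    "great": ["doc_source", "doc_target"],
    "sturdy": ["doc_source"],
    "tasty": ["doc_target"],
}
scores = simrank(toy_graph)
print(scores[("sturdy", "tasty")])
```

The all-pairs dictionary keeps the sketch short but grows quadratically with the number of nodes, so it only suits small graphs.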

4.
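Chinese web page classification is an active research area in data mining, and the support vector machine (SVM) is an efficient classification method with distinctive advantages on high-dimensional pattern recognition problems. This paper proposes an SVM-based method for classifying Chinese web pages and describes the key techniques involved, including web page text preprocessing, feature extraction, and multi-class classification algorithms. Experiments show that the method greatly reduces the amount of training data, trains efficiently, and achieves good precision and recall.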

5.
With the development of the Internet, frequent pattern mining has been extended to more complex patterns, as in tree mining and graph mining. Such applications arise in complex domains like bioinformatics and web mining. In this paper, we present a novel algorithm, named Chopper, to discover frequent subtrees from ordered labeled trees. An extensive performance study shows that the newly developed algorithm outperforms TreeMinerV, one of the fastest methods proposed previously, in mining large databases. At the end of this paper, potential improvements to Chopper are mentioned.

6.
Opinion mining, which aims to detect subjective information automatically, has attracted growing interest from both academia and industry in recent years. To enhance the performance of opinion mining, several ensemble methods have been investigated and shown to be effective theoretically and empirically. However, cluster-based ensemble methods have received less attention in this area. In this paper, a new cluster-based ensemble method, FCE-SVM, is proposed for opinion mining from social media. Following the divide-and-conquer philosophy, FCE-SVM uses a fuzzy clustering module to generate different training sub-datasets in the first stage. Base learners are then trained on the different training datasets in the second stage. Finally, a fusion module is employed to combine the results of the base learners. Multi-domain opinion datasets were used to verify the effectiveness of the proposed method. Empirical results reveal that FCE-SVM achieves the best performance by reducing bias and variance simultaneously. These results illustrate that FCE-SVM is a viable method for opinion mining.

7.
Large-scale Support Vector Machine (SVM) classification is a very active research line in data mining. In recent years, several efficient SVM generation algorithms based on quadratic problems have been proposed, including Successive OverRelaxation (SOR), Active Support Vector Machines (ASVM), and Lagrangian Support Vector Machines (LSVM). These algorithms have been used to solve classification problems with millions of points. ASVM is perhaps the fastest among them. This paper compares a new projection-based SVM algorithm with ASVM on a selection of real and synthetic data sets. The new algorithm seems competitive in terms of speed and testing accuracy.

8.
In recent years, the KDD process has been advocated to be an iterative and interactive process. A user is seldom able to answer all of his or her questions on data immediately with a single query. On the contrary, the workflow of the typical user consists of several steps in which he or she iteratively refines the extracted knowledge by inspecting previous results and posing new queries. Given this view of the KDD process, in order to reduce the computational effort, it becomes crucial to have KDD systems that are able to exploit past results. This is especially true in environments in which the system knowledge base is the result of many discoveries on data made separately by the collaborative effort of different users.

In this paper, we consider the problem of mining frequent association rules from database relations. We first model a general, constraint-based mining language for this task. Then, we propose an algorithm that answers such queries by reusing past results. In particular, this solution is effective for a new class of constraints, called context-dependent constraints, which are more difficult than the traditionally studied item-dependent constraints. Nevertheless, we show that some typical queries in important application domains, such as stock market trading, web log analysis, and gene microarrays in bioinformatics, have context-dependent constraints. We show with a set of experiments in these application domains that the proposed solution with an incremental approach is both effective and viable.
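
The idea of reusing past results can be illustrated with the simplest possible case, a stricter minimum-support query answered by filtering an earlier, looser result. This is only a sketch of that general principle and not the algorithm proposed in the abstract above, which handles context-dependent constraints. The function names `frequent_itemsets` and `refine` and the toy transactions are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force mining: return {itemset: support} for itemsets meeting min_support."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            # Support = fraction of transactions that contain the whole itemset.
            support = sum(itemset <= t for t in transactions) / n
            if support >= min_support:
                result[itemset] = support
    return result


def refine(previous, min_support):
    """Answer a stricter query by filtering a cached result instead of rescanning the data."""
    return {itemset: s for itemset, s in previous.items() if s >= min_support}


transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
loose = frequent_itemsets(transactions, 0.5)  # first query, scans the data
strict = refine(loose, 0.75)                  # follow-up query, reuses `loose`
print(sorted(sorted(s) for s in strict))
```

Filtering is valid here because every itemset that meets the stricter threshold also meets the looser one, so the cached result is guaranteed to contain the answer.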

9.
Sentiment lexicons (SL), also known as lexical resources, are repositories of one or several dictionaries that consist of known and precompiled sentiment terms. These lexicons play an important role in several different opinion mining tasks. The efficacy of lexicon-based approaches to opinion mining (OM) tasks depends solely on selecting an appropriate opinion lexicon to analyze the text. Therefore, one has to explore the available sentiment lexicons and then select the most suitable resource. Among available resources, SentiWordNet (SWN) is the most widely used lexicon for tasks related to opinion mining. In SWN, each synset of WordNet is assigned three numerical sentiment scores (positive, negative, and objective) that are calculated by a set of classifiers. In this paper, a detailed and comprehensive review of the work related to opinion mining using SentiWordNet is provided. This survey will be useful for researchers contributing to the field of opinion mining. The following features distinguish our contribution from reviews of a similar kind: (i) our review classifies the existing literature with respect to opinion mining tasks and subtasks; (ii) it covers a very different outlook on the opinion mining field by providing in-depth discussions of the existing works at different granularity levels (word, sentence, document, aspect, clause, and concept levels); (iii) this state-of-the-art review covers each article along the following dimensions: the designated task performed, granularity level of the task completed, results obtained, and feature dimensions; and (iv) it summarizes the related articles according to granularity levels, publishing years, related tasks (or subtasks), and types of classifiers used. Finally, major challenges and tasks related to lexicon-based approaches to opinion mining are also discussed.

10.
There is an increasingly pressing need, in several applications in diverse domains, for techniques able to index and mine very large collections of time series. Examples of such applications come from astronomy, biology, the web, and other domains. It is not unusual for these applications to involve numbers of time series on the order of hundreds of millions to billions. However, none of the relevant techniques proposed in the literature so far has considered data collections much larger than one million time series. In this paper, we describe iSAX 2.0 and its improvements, iSAX 2.0 Clustered and iSAX2+, three methods designed for indexing and mining truly massive collections of time series. We show that the main bottleneck in mining such massive datasets is the time taken to build the index, and we thus introduce a novel bulk loading mechanism, the first of its kind specifically tailored to a time series index. We show how our methods allow mining on datasets that would otherwise be completely untenable, including the first published experiments to index one billion time series, and experiments in mining massive data from domains as diverse as entomology, DNA, and web-scale image collections.
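
The iSAX family builds on the SAX symbolic representation of a time series, which the following minimal sketch illustrates. It covers only the representation (z-normalization, piecewise aggregate approximation, and discretization with Gaussian breakpoints), not the index or the bulk loading mechanism described in the abstract above. The function name `sax_word` and its parameters are assumptions, and the series length is assumed to be divisible by the number of segments.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax_word(series, segments, cardinality):
    """Return the SAX symbols (integers 0..cardinality-1) for a time series."""
    if len(series) % segments != 0:
        raise ValueError("series length must be divisible by the number of segments")
    # 1. Z-normalize so that different series become comparable.
    mu, sigma = mean(series), pstdev(series)
    normed = [(x - mu) / sigma if sigma else 0.0 for x in series]
    # 2. Piecewise aggregate approximation: one mean value per segment.
    seg_len = len(normed) // segments
    paa = [mean(normed[i * seg_len:(i + 1) * seg_len]) for i in range(segments)]
    # 3. Breakpoints split the standard normal curve into equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(k / cardinality) for k in range(1, cardinality)]
    return [bisect_right(breakpoints, value) for value in paa]


word = sax_word([1, 1, 1, 1, 5, 5, 5, 5], segments=2, cardinality=4)
print(word)  # [0, 3]
```

Each symbol needs only a few bits, which is what makes a compact index over very many series feasible.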

11.
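With the rapid growth of the Internet and the wide adoption of web applications, web mining has attracted increasing research attention. Web log mining, which discovers and analyzes useful information about users from web logs, has become a research focus, and many association-rule-based methods have been applied to web mining. This paper uses rough sets based on the discernibility matrix to extract association rules from web logs and uses the resulting rule set to predict user behavior. Experimental results show that the method is effective and practical.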

12.
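Opinion leader discovery has important theoretical significance and practical value in public opinion monitoring, marketing, information dissemination, and other fields. Traditional opinion leader mining algorithms consider only a single attribute of the target, lack topic relevance, and lack objectivity in evaluation. To address these problems, this paper proposes ITP, an opinion leader mining algorithm based on an improved topological potential. The algorithm combines the objective attributes of each node with the network structure and uses data deviation to correct subjective weights, so that target nodes can be evaluated objectively and opinion leaders identified. Experiments on a real microblog dataset show that, compared with three traditional methods, the proposed algorithm can mine opinion leaders in different contexts with high relevance and accuracy.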

13.
This paper presents a comprehensive survey of web log/usage mining based on over 100 research papers. This is the first survey dedicated exclusively to web log/usage mining. The paper identifies several web log mining sub-topics, including specific ones such as data cleaning and user and session identification. Each sub-topic is explained, weaknesses and strong points are discussed, and possible solutions are presented. The paper describes examples of web log mining and lists some major web log mining software packages.

14.
Interval Set Clustering of Web Users with Rough K-Means   Total citations: 1, self-citations: 0, citations by others: 1
Data collection and analysis in web mining face certain unique challenges. Due to a variety of reasons inherent in web browsing and web logging, the likelihood of bad or incomplete data is higher than in conventional applications. The analytical techniques in web mining need to accommodate such data. Fuzzy and rough sets provide the ability to deal with incomplete and approximate information. Fuzzy set theory has been shown to be useful in three important aspects of web and data mining, namely clustering, association, and sequential analysis. There is increasing interest in research on clustering based on rough set theory. Clustering is an important part of web mining that involves finding natural groupings of web resources or web users. Researchers have pointed out some important differences between clustering in conventional applications and clustering in web mining. For example, the clusters and associations in web mining do not necessarily have crisp boundaries. As a result, researchers have studied the possibility of using fuzzy sets in web mining clustering applications. Recent attempts have used genetic algorithms based on rough set theory for clustering. However, clustering based on genetic algorithms may not be able to handle the large amount of data typical in a web mining application. This paper proposes a variation of the K-means clustering algorithm based on properties of rough sets. The proposed algorithm represents clusters as interval or rough sets. The paper also describes the design of an experiment including data collection and the clustering process. The experiment is used to create interval set representations of clusters of web visitors.
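
Representing a cluster as an interval set means giving it a lower approximation (objects that certainly belong) and an upper approximation (objects that possibly belong). The sketch below shows one way the assignment step of a rough K-means variant might look. It is an assumption-laden illustration, not the paper's exact procedure: the function name `rough_assign`, the ratio test, and the `threshold` value are all hypothetical, and the centroid update step is omitted.

```python
import math


def rough_assign(points, centroids, threshold=1.3):
    """Assign points to lower and upper approximations of each cluster.

    A point whose distance to some other centroid is within `threshold`
    times its distance to the nearest centroid is treated as ambiguous:
    it joins the upper approximations of all such clusters and no lower one.
    """
    lower = [[] for _ in centroids]
    upper = [[] for _ in centroids]
    for p in points:
        dists = [math.dist(p, c) for c in centroids]
        nearest = min(range(len(centroids)), key=dists.__getitem__)
        close = [j for j, d in enumerate(dists)
                 if j != nearest and d <= threshold * dists[nearest]]
        if close:
            # Boundary object: possibly belongs to several clusters.
            for j in [nearest] + close:
                upper[j].append(p)
        else:
            # Unambiguous object: lower approximation is a subset of the upper one.
            lower[nearest].append(p)
            upper[nearest].append(p)
    return lower, upper


lower, upper = rough_assign([(0, 0), (10, 0), (5, 0)], [(0, 0), (10, 0)])
print(lower)
print(upper)
```

In the toy data the midpoint (5, 0) is equally far from both centroids, so it appears in both upper approximations and in neither lower approximation.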

15.
Web sites contain an ever-increasing amount of information within their pages. As the amount of information increases, so does the complexity of the structure of the web site. Consequently, it has become difficult for visitors to find the information relevant to their needs. To overcome this problem, various clustering methods have been proposed to cluster data in an effort to help visitors find the relevant information. These clustering methods have typically focused either on the content or on the context of the web pages. In this paper we propose a method based on Kohonen's self-organizing map (SOM) that utilizes both content and context mining clustering techniques to help visitors identify relevant information more quickly. The input of the content mining is the set of web pages of the web site, whereas the source of the context mining is the access logs of the web site. SOM can be used to identify clusters of web sessions with similar context and also clusters of web pages with similar content. It can also provide a means of visualizing the outcome of this processing. In this paper we show how this two-level clustering can help visitors identify the relevant information faster. This procedure has been tested on the access logs and web pages of the Department of Informatics and Telecommunications of the University of Athens.

16.
Mockups are widely used to elicit and validate user requirements in web applications, and several intuitive tools have been developed in recent years, actively involving the end user in the requirements solicitation process. However, most current web development approaches and tools discard mockups after the information-gathering process, abandoning the opportunity to exploit underlying information in them for autogenerating functional web applications. To overcome this limitation, we have devised a method for deriving the database schema and the logic of the web application from the information contained within mockups. In particular, the method gathers clues on how to organize the data and the control flow of the web application by analyzing the structure and relationships of the widgets in the mockup. Based on the proposed method, we have implemented a tool supporting the generation of web applications abiding by the model-view-controller architectural pattern. The tool has been evaluated by involving several end users in the development of web applications for different domains.

17.
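The number of product reviews in online stores is growing at an astonishing rate and has become an emerging focus of text mining research. Because Chinese and English differ as languages, opinion mining of Chinese reviews needs to be studied as a separate field. Building on previous research, this paper introduces a new sentiment classification method and, for the first time, proposes dividing subjective opinion sentences into three categories: subjective opinion sentences with strong polarity, context-dependent subjective opinion sentences with weak polarity, and a third category...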

18.
The massive volume of web videos creates a pressing demand for efficiently grasping the major events. However, the distinct characteristics of web videos, such as the limited number of features, the noisy text information, and the unavoidable errors in near-duplicate keyframe (NDK) detection, make web video event mining a challenging task. In this paper, we propose a novel four-stage framework to improve the performance of web video event mining. Data preprocessing is the first stage. Multiple Correspondence Analysis (MCA) is then applied to explore the correlation between terms and classes, with the aim of bridging the gap between NDKs and high-level semantic concepts. Next, co-occurrence information is used to detect the similarity between NDKs and classes using the NDK-within-video information. Finally, both are integrated for web video event mining through negative NDK pruning and positive NDK enhancement. Moreover, both NDKs and terms with relatively low frequencies are treated as useful information in our experiments. Experimental results on large-scale web videos from YouTube demonstrate that the proposed framework outperforms several existing mining methods and obtains good results for web video event mining.

19.
A User Identification Algorithm for Web Access Mining Preprocessing   Total citations: 1, self-citations: 0, citations by others: 1
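Web access mining is currently one of the main research topics in intelligent online information retrieval and e-commerce. This paper studies the preprocessing stage of web mining, focusing on user identification methods, and presents a general algorithm for user identification.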
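
The abstract above does not spell out its algorithm, so the following is only a sketch of one common heuristic for user identification in web log preprocessing: treating each distinct combination of IP address and user agent as a separate user. The function name `identify_users`, the log-entry field names, and the sample entries are all hypothetical.

```python
def identify_users(log_entries):
    """Group requested URLs by user, where a user is an (ip, agent) pair.

    This is a heuristic assumption: users behind a shared proxy with the
    same browser are merged, and one person on two devices is split.
    """
    user_ids = {}   # (ip, agent) -> sequential user id
    requests = {}   # user id -> list of requested URLs, in log order
    for entry in log_entries:
        key = (entry["ip"], entry["agent"])
        user_id = user_ids.setdefault(key, len(user_ids))
        requests.setdefault(user_id, []).append(entry["url"])
    return requests


sample_log = [
    {"ip": "10.0.0.1", "agent": "BrowserA", "url": "/index.html"},
    {"ip": "10.0.0.1", "agent": "BrowserB", "url": "/index.html"},
    {"ip": "10.0.0.1", "agent": "BrowserA", "url": "/news.html"},
]
print(identify_users(sample_log))
```

The two agents behind the same IP address are kept apart, so the sample log yields two users even though all three requests share one address.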

20.
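As Internet penetration keeps rising and mass media move online, online media have gradually become the most heavily used Internet application. The interactivity of the medium and the sheer scale of the web have produced a large volume of online comments, which makes automatic extraction of mainstream opinions from those comments, and comparative analysis of data from different sources, especially meaningful. This paper studies these problems and makes two main contributions. First, it proposes a method for automatically extracting mainstream opinions from online comments. The method copes with the complexity and volume of online comments and, through two core components, "web comment opinion identification" and "mainstream opinion description", automatically extracts the mainstream opinions in the comments on a given topic and describes each one with keywords and representative comments. Second, it applies the method to compare online comments from different data sources and reports the characteristics of, and differences between, the comments from each source.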
