Similar Articles
 Found 20 similar articles (search time: 9 ms)
1.
Sentiment analysis (SA) applications are becoming popular among individuals and organizations for gathering and analysing users' sentiments about products, services, policies, and current affairs. Due to the availability of a wide range of English lexical resources, such as part-of-speech taggers, parsers, and polarity lexicons, the development of sophisticated SA applications for the English language has attracted many researchers. Although there have been efforts to create polarity lexicons in non-English languages such as Urdu, they suffer from many deficiencies, such as the lack of publicly available sentiment lexicons with a proper scoring mechanism for opinion words and modifiers. In this work, we present a word-level translation scheme for creating the first comprehensive Urdu polarity resource, "Urdu Lexicon", by merging existing resources: a list of English opinion words, SentiWordNet, an English–Urdu bilingual dictionary, and a collection of Urdu modifiers. We assign two polarity scores, positive and negative, to each Urdu opinion word. Moreover, modifiers are collected, classified, and tagged with proper polarity scores. We also perform an extrinsic evaluation in terms of subjectivity detection and sentiment classification, and the evaluation results show that the polarity scores assigned by this technique are more accurate than those of the baseline methods.
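A minimal sketch of the word-level translation idea in Python, with toy stand-ins for the real resources (the opinion-word list, SentiWordNet scores, and the English–Urdu dictionary); the averaging rule for conflicting translations is an assumption, not necessarily the paper's exact scoring mechanism:

```python
# English opinion words with (positive, negative) scores, e.g. from SentiWordNet.
english_scores = {
    "good":  (0.75, 0.00),
    "bad":   (0.00, 0.625),
    "happy": (0.80, 0.00),
}

# English-to-Urdu bilingual dictionary; one English word may have many translations.
en_ur_dict = {
    "good":  ["اچھا", "عمدہ"],
    "bad":   ["برا"],
    "happy": ["خوش"],
}

def build_urdu_lexicon(english_scores, en_ur_dict):
    """Carry each English word's polarity scores over to its Urdu translations.

    When several English words map to the same Urdu word, their scores are
    averaged here -- one plausible resolution rule, labelled as an assumption.
    """
    accum = {}  # Urdu word -> list of (pos, neg) pairs
    for en_word, (pos, neg) in english_scores.items():
        for ur_word in en_ur_dict.get(en_word, []):
            accum.setdefault(ur_word, []).append((pos, neg))
    return {
        ur: (sum(p for p, _ in pairs) / len(pairs),
             sum(n for _, n in pairs) / len(pairs))
        for ur, pairs in accum.items()
    }

urdu_lexicon = build_urdu_lexicon(english_scores, en_ur_dict)
print(urdu_lexicon)  # each Urdu opinion word gets a (positive, negative) pair
```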

2.
Sentiment analysis focuses on identifying and classifying the sentiments expressed in text messages and reviews. Social networks like Twitter, Facebook, and Instagram generate heaps of data filled with sentiments, and the analysis of such data is very fruitful when trying to improve the quality of both products and services. Classic machine learning techniques have a limited capability to efficiently analyze such large amounts of data and produce precise results; they are thus supported by deep learning models to achieve higher accuracy. This study proposes a combined convolutional neural network and long short-term memory (CNN-LSTM) deep network for performing sentiment analysis on Twitter datasets. The performance of the proposed model is compared against machine learning classifiers, including the support vector classifier, random forest (RF), stochastic gradient descent (SGD), logistic regression, a voting classifier (VC) of RF and SGD, and state-of-the-art classifier models. Furthermore, two feature extraction methods (term frequency-inverse document frequency and word2vec) are investigated to determine their impact on prediction accuracy. Three datasets (US airline sentiments, women's e-commerce clothing reviews, and hate speech) are utilized to evaluate the performance of the proposed model. Experimental results demonstrate that the CNN-LSTM achieves higher accuracy than the other classifiers.
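A minimal Keras sketch of a CNN-LSTM sentiment classifier of the kind described; the layer sizes, vocabulary size, and random training data are illustrative, not the paper's configuration:

```python
import numpy as np
from tensorflow.keras import layers, models

VOCAB_SIZE, EMBED_DIM, MAX_LEN, NUM_CLASSES = 20000, 100, 50, 3

model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    layers.Conv1D(128, kernel_size=5, activation="relu"),  # local n-gram features
    layers.MaxPooling1D(pool_size=2),
    layers.LSTM(64),                                       # long-range dependencies
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# x_train: (n, MAX_LEN) integer token ids; y_train: (n,) class labels.
# Random placeholders stand in for tokenized, padded tweets.
x_train = np.random.randint(0, VOCAB_SIZE, size=(32, MAX_LEN))
y_train = np.random.randint(0, NUM_CLASSES, size=(32,))
model.fit(x_train, y_train, epochs=1, batch_size=16)
```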

3.
Sentiment analysis is the natural language processing task dealing with sentiment detection and classification from texts. In recent years, due to the growth in the quantity and fast spreading of user-generated content online and the impact such information has on events, people, and companies worldwide, this task has been approached in an important body of research in the field. Although different methods have been proposed for distinct types of text, the research community has concentrated less on developing methods for languages other than English. In this context, the present work studies the possibility of employing machine translation systems and supervised methods to build models able to detect and classify sentiment in languages for which fewer or no resources are available for this task compared to English, stressing the impact of translation quality on sentiment classification performance. Our extensive evaluation scenarios show that machine translation systems are approaching a good level of maturity and that, in combination with appropriate machine learning algorithms and carefully chosen features, they can be used to build sentiment analysis systems that obtain performance comparable to that obtained for English.
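The translate-then-classify pipeline can be sketched as follows; `translate_to_english` is a deliberate placeholder for whatever machine translation system is available, since the paper's point is that translation quality bounds downstream sentiment performance:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Train a sentiment classifier on English labelled data (toy examples here).
english_texts = ["great product, love it", "terrible, waste of money"]
labels = ["positive", "negative"]
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(english_texts, labels)

def translate_to_english(text, source_lang):
    # Placeholder: plug in any MT system here; its output quality directly
    # bounds the sentiment classification performance downstream.
    raise NotImplementedError

def classify_foreign_review(text, source_lang):
    """Classify a non-English review by translating it first."""
    return clf.predict([translate_to_english(text, source_lang)])[0]
```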

4.
Text sentiment analysis based on feature fusion of CNN and BiLSTM networks   Total citations: 1 (self-citations: 0, citations by others: 1)
Li Yang, Dong Hongbin. Journal of Computer Applications, 2018, 38(11): 3075-3080
Convolutional neural networks (CNN) and recurrent neural networks (RNN) are widely used in natural language processing, but natural language has sequential dependencies in its structure: text classification relying on a CNN alone ignores the contextual meaning of words, while traditional RNNs suffer from vanishing or exploding gradients, which limits classification accuracy. To address this, a feature-fusion model combining a CNN and a bidirectional long short-term memory (BiLSTM) network is proposed. The CNN extracts local features of the text vectors, the BiLSTM extracts global features related to the text context, and the features extracted by the two complementary models are fused. This resolves the problem that a single CNN model ignores the contextual semantic and syntactic information of words, and it also effectively avoids the vanishing or exploding gradient problem of traditional RNNs. Comparative experiments on two datasets show that the proposed feature-fusion model effectively improves the accuracy of text classification.
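A sketch of the two-branch fusion architecture in Keras: a Conv1D branch for local features, a BiLSTM branch for contextual features, and a concatenation before the classifier. Layer sizes are illustrative, not those reported in the paper:

```python
from tensorflow.keras import layers, models

VOCAB_SIZE, EMBED_DIM, MAX_LEN = 20000, 100, 100

inputs = layers.Input(shape=(MAX_LEN,), dtype="int32")
emb = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(inputs)

# Branch 1: local n-gram features via CNN.
conv = layers.Conv1D(128, 3, activation="relu")(emb)
local = layers.GlobalMaxPooling1D()(conv)

# Branch 2: global contextual features via BiLSTM.
context = layers.Bidirectional(layers.LSTM(64))(emb)

fused = layers.Concatenate()([local, context])   # feature fusion
outputs = layers.Dense(2, activation="softmax")(fused)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```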

5.
6.
Yulei Sui, Sen Ye, Jingling Xue, Jie Zhang. Software, 2014, 44(12): 1485-1510
Because of its high precision as a flow-insensitive pointer analysis, Andersen's analysis has been deployed in some modern optimising compilers. To obtain improved precision, we describe how to add context sensitivity on top of Andersen's analysis. The resulting analysis, called ICON, is efficient enough to analyse large programs while being sufficiently precise to drive compiler optimisations. Its novelty lies in summarising the side effects of a procedure by using one transfer function on virtual variables that represent fully parameterised locations accessed via its formal parameters. As a result, a good balance between efficiency and precision is struck: ICON is more powerful than a 1-callsite-sensitive analysis and less so than a call-path-sensitive analysis (when the recursion cycles in a program are collapsed in all cases). We have compared ICON with FULCRA, a state-of-the-art Andersen's analysis that is context sensitive by acyclic call paths, in Open64 (with recursion cycles collapsed in both cases) using the 16 C/C++ benchmarks in SPEC2000 (totalling 600 KLOC) and five C applications (totalling 2.1 MLOC). Our results demonstrate the scalability of ICON and the lack of scalability of FULCRA. FULCRA spends over 2 h in analysing SPEC2000 and fails to run to completion within 5 h for two of the five applications tested. In contrast, ICON spends just under 7 min on the 16 benchmarks in SPEC2000 and just under 26 min on the same two applications. For the 19 benchmarks analysable by FULCRA, ICON is nearly as accurate as FULCRA in terms of the quality of the built Static Single Assignment (SSA) form and the precision of the discovered alias information. Copyright © 2013 John Wiley & Sons, Ltd.
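For orientation, a toy context-insensitive Andersen-style solver (the flow-insensitive base that ICON adds context sensitivity to) can be written as a naive fixed-point iteration over four constraint kinds; real implementations use worklists and constraint graphs instead:

```python
from collections import defaultdict

# Constraints: ("addr", p, x) means p = &x; ("copy", p, q) means p = q;
# ("load", p, q) means p = *q; ("store", p, q) means *p = q.
constraints = [
    ("addr", "p", "a"),
    ("addr", "q", "b"),
    ("copy", "r", "p"),
    ("store", "r", "q"),   # *r = q  =>  a may point to b
    ("load",  "s", "r"),   # s = *r
]

def andersen(constraints):
    pts = defaultdict(set)           # variable -> points-to set
    changed = True
    while changed:                   # naive fixed-point iteration
        changed = False
        for kind, lhs, rhs in constraints:
            if kind == "addr":
                new = {rhs}
            elif kind == "copy":
                new = pts[rhs]
            elif kind == "load":
                new = set().union(*(pts[o] for o in pts[rhs])) if pts[rhs] else set()
            else:  # store: propagate pts[rhs] into everything lhs points to
                for obj in list(pts[lhs]):
                    if not pts[rhs] <= pts[obj]:
                        pts[obj] |= pts[rhs]
                        changed = True
                continue
            if not new <= pts[lhs]:
                pts[lhs] |= new
                changed = True
    return pts

print(dict(andersen(constraints)))  # e.g. 'a' -> {'b'}, 's' -> {'b'}
```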

7.
Modern software systems are required to dynamically adapt to changing workloads, scenarios, and objectives and to achieve a certain Quality of Service (QoS). Guaranteeing QoS requirements is not trivial, as run-time uncertainty might invalidate the design-time rationale, where software components have been selected by means of off-line analysis. In this work, we propose a QoS-based feedback approach that makes combined use of design-time predictions and run-time measurements to manage QoS data over time and to support software architects in selecting the software components that best fit QoS requirements. We illustrate the feasibility and efficacy of the approach on a case study, where the quantitative evaluation shows how the analysis effectively identifies the sources of QoS violations and indicates possible solutions to achieve QoS requirements.
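One plausible minimal realization of the design-time/run-time combination is an exponentially smoothed QoS estimate seeded by the off-line prediction; the class below is an illustrative assumption, not the paper's mechanism:

```python
class QoSTracker:
    """Blend a design-time QoS prediction with run-time measurements."""

    def __init__(self, design_time_estimate, alpha=0.3):
        self.estimate = design_time_estimate  # start from the off-line prediction
        self.alpha = alpha                    # weight of fresh measurements

    def observe(self, measurement):
        # Shift the estimate toward what is actually measured at run time.
        self.estimate = (1 - self.alpha) * self.estimate + self.alpha * measurement
        return self.estimate

    def violates(self, requirement):
        # e.g. requirement: response time must stay below 200 ms.
        return self.estimate > requirement

tracker = QoSTracker(design_time_estimate=120.0)  # predicted response time (ms)
for measured in [180.0, 240.0, 260.0, 280.0]:     # run-time drift
    tracker.observe(measured)
print(tracker.violates(200.0))  # True -> flags the component as a QoS violation source
```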

8.
Starting from the assumption that the factors orienting university choice are heterogeneous and multidimensional, this study explores students' motivations in higher education. To this aim, a big data analysis has been performed through 'TalkWalker', a tool based on algorithms developed in the context of Social Data Intelligence, which allows understanding the sentiment of a group of people regarding a specific theme. The data were extracted from posts published anywhere in the world over a 12-month period, drawn from many online sources. According to the findings, the main variable capable of influencing the choice of university is the training offer, followed by physical structure, work opportunities, prestige, affordability, communication, organisation, and environmental sustainability. The study establishes an innovative research agenda for further studies by proposing the elaboration of a systems- and process-based view for higher education. However, it is limited by the superficiality of its investigation, a consequence of analysing a very large amount of data. For future research, it might therefore be appropriate to apply a different technique, to enable a comparison and to check whether the size of the considered sample and the depth of the analysis technique affect the results and the consequent considerations.

9.
Several classes of scientific and commercial applications require the execution of a large number of independent tasks. One highly successful and low-cost mechanism for acquiring the necessary computing power for these applications is the 'public-resource computing', or 'desktop Grid', paradigm, which exploits the computational power of private computers. So far, this paradigm has not been applied to data mining applications, for two main reasons. First, it is not straightforward to decompose a data mining algorithm into truly independent sub-tasks. Second, the large volume of the data involved makes it difficult to handle the communication costs of a parallel paradigm. This paper introduces a general framework for distributed data mining applications called Mining@home. In particular, we focus on one of the main data mining problems: the extraction of closed frequent itemsets from transactional databases. We show that it is possible to decompose this problem into independent tasks, which, however, need to share a large volume of the data. We thus introduce a data-intensive computing network, which adopts a P2P topology based on super-peers with caching capabilities, aiming to support the dissemination of large amounts of information. Finally, we evaluate the execution of a pattern extraction task on such a network. Copyright © 2009 John Wiley & Sons, Ltd.
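A sketch of the task decomposition for closed frequent itemsets: the search space is partitioned by the smallest item of each candidate itemset, so each partition is an independent unit of work for one peer, yet every task still scans the whole database, which is exactly the data-sharing issue the Mining@home network addresses. The database and threshold are toy values:

```python
from itertools import combinations

# Toy transactional database and support threshold.
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
MIN_SUP = 2

def support(itemset):
    return sum(1 for t in db if itemset <= t)

def task(seed, items):
    """One independent task: closed frequent itemsets whose smallest item
    (in sorted order) is `seed`; partitioning by smallest item avoids
    duplicated work across peers."""
    results = []
    larger = [i for i in items if i > seed]
    for r in range(len(larger) + 1):
        for combo in combinations(larger, r):
            cand = frozenset({seed, *combo})
            sup = support(cand)
            if sup < MIN_SUP:
                continue
            # Closed: no strict superset has the same support.
            if all(support(cand | {x}) < sup for x in items if x not in cand):
                results.append((sorted(cand), sup))
    return results

items = sorted(set().union(*db))
for seed in items:                 # each iteration = one peer's unit of work
    print(seed, task(seed, items))
```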

10.
11.
We propose a parallelization scheme for an existing algorithm for constructing a web-directory that contains categories of web documents organized hierarchically. The clustering algorithm automatically infers the number of clusters using a quality function based on graph cuts. A parallel implementation of the algorithm has been developed to run on a cluster of multi-core processors interconnected by an intranet. The effect of the well-known Latent Semantic Indexing technique on the performance of the clustering algorithm is also considered. The parallelized graph-cut-based clustering algorithm achieves an F-measure in the range [0.69, 0.91] for the generated leaf-level clusters while yielding precision-recall performance in the range [0.66, 0.84] for the entire hierarchy of the generated clusters. In our empirical observations, the parallel algorithm achieves an average speedup of 7.38 over its sequential variant while also yielding better clustering performance than the sequential algorithm in terms of F-measure. Copyright © 2015 John Wiley & Sons, Ltd.
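The LSI preprocessing studied in the paper corresponds to a truncated SVD of the term-document matrix; a standard scikit-learn sketch follows (KMeans stands in for the paper's graph-cut clustering, which also infers the number of clusters itself):

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "parallel computing on multicore clusters",
    "web document clustering and directories",
    "latent semantic indexing for retrieval",
    "multicore processors and intranet clusters",
]

tfidf = TfidfVectorizer().fit_transform(docs)            # term-document matrix
lsi = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)  # LSI

# Stand-in for the graph-cut clustering step (which would infer k on its own).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(lsi)
print(labels)
```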

12.
According to efficient markets theory, information is an important factor that affects market performance and serves as a source of first-hand evidence in decision making, particularly with the rapid rise of Internet technologies in recent years. However, a lack of knowledge and inference ability prevents current decision support systems from processing the wide range of available information. In this paper, we propose a common-sense-knowledge-supported news model. Compared with previous work, our model is the first to incorporate broad common-sense knowledge into a decision support system, improving the news analysis process through the application of a graph-based random-walk framework. A prototype and experiments based on Hong Kong stock market data demonstrate that common-sense knowledge is an important factor in building financial decision models that incorporate news information.
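The graph-based inference can be illustrated with a random walk with restart over a tiny invented knowledge graph: seeding the walk at a concept mentioned in a news item scores how strongly other concepts are affected. The graph, weights, and restart probability are assumptions for illustration:

```python
import numpy as np

nodes = ["rate_hike", "banking_sector", "HSBC", "property_market"]
# Adjacency matrix: edge weights encoding common-sense links between concepts.
A = np.array([
    [0.0, 1.0, 0.0, 1.0],   # rate_hike -> banking_sector, property_market
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
])
P = A / A.sum(axis=1, keepdims=True)   # row-stochastic transition matrix

def random_walk_with_restart(P, seed_idx, restart=0.15, iters=100):
    n = P.shape[0]
    r = np.zeros(n)
    r[seed_idx] = 1.0                  # restart at the concept from the news
    x = r.copy()
    for _ in range(iters):
        x = (1 - restart) * P.T @ x + restart * r
    return x

# A news item mentions a rate hike; the scores rank affected concepts.
scores = random_walk_with_restart(P, nodes.index("rate_hike"))
print(dict(zip(nodes, scores.round(3))))
```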

13.
In the knowledge economy, human capital is a key factor for any organization seeking to achieve a sustainable competitive advantage. Thus, the selection of competent personnel is the most important function of human resource managers. However, because of the wide range of criteria and organizational factors that affect the process, personnel selection is often regarded as a complex problem that can be addressed through multicriteria decision-making (MCDM) procedures. Despite the great importance of determining a comprehensive set of criteria, this issue has not gained enough attention in the literature. This study presents a competency framework with five criteria for choosing the best information technology (IT) expert from five alternatives. The stepwise weight assessment ratio analysis (SWARA) and grey additive ratio assessment (ARAS-G) methods are used to derive the criteria weights and to rank the alternatives, respectively. The results reveal that subject competency is the most important criterion in IT personnel selection.
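The SWARA weighting step is simple enough to sketch directly: criteria are ordered from most to least important, each receives a comparative importance s_j relative to the criterion above it, and weights follow by recursion and normalization. The criteria names and s-values below are invented for illustration:

```python
def swara_weights(criteria, s_values):
    """criteria: ordered most -> least important; s_values[0] is unused (0)."""
    k = [1.0 + s for s in s_values]          # k_j = s_j + 1 (k_1 = 1)
    q = [1.0]
    for kj in k[1:]:
        q.append(q[-1] / kj)                 # q_j = q_{j-1} / k_j
    total = sum(q)
    return {c: qj / total for c, qj in zip(criteria, q)}

criteria = ["subject competency", "experience", "communication",
            "teamwork", "certifications"]
s_values = [0.0, 0.30, 0.20, 0.15, 0.10]     # an expert's comparative importances
print(swara_weights(criteria, s_values))     # most important criterion gets the largest weight
```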

14.
Market basket analysis is one of the typical applications of mining association rules. The valuable information discovered through data mining can be used to support decision making. Generally, the support and confidence (objective) measures are used to evaluate the interestingness of association rules. However, in some cases, under these two measures the discovered rules may be neither profitable nor actionable (i.e., not interesting) to enterprises. Therefore, how to discover patterns by considering both objective measures (e.g., probability) and subjective measures (e.g., profit) is a challenge in data mining, particularly in marketing applications. This paper focuses on pattern evaluation in the process of knowledge discovery using the concept of profit mining. Data Envelopment Analysis (DEA) is utilized to calculate the efficiency of discovered association rules with multiple objective and subjective measures. After evaluating the efficiency of the association rules, they are categorized into two classes: relatively efficient (interesting) and relatively inefficient (uninteresting). To distinguish these two classes, a Decision Tree (DT)-based classifier is built using the attributes of the association rules. The DT classifier can be used to find the characteristics of interesting association rules and to classify unknown (new) association rules.
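A sketch of scoring rules with a basic input-oriented CCR DEA model via linear programming; which measures serve as inputs and outputs here (rule complexity as the input; support and profit as the outputs) is an illustrative assumption:

```python
import numpy as np
from scipy.optimize import linprog

# Rows = association rules (DMUs); X = inputs (to minimise), Y = outputs.
X = np.array([[1.0], [1.2], [0.8], [1.5]])                       # e.g. rule complexity
Y = np.array([[0.4, 0.7], [0.5, 0.6], [0.3, 0.9], [0.6, 0.5]])   # support, profit

def ccr_efficiency(X, Y, j0):
    """Input-oriented CCR multiplier model for DMU j0."""
    n_in, n_out = X.shape[1], Y.shape[1]
    # Variables z = [u (output weights), v (input weights)].
    c = np.concatenate([-Y[j0], np.zeros(n_in)])           # maximise u . y0
    A_eq = np.concatenate([np.zeros(n_out), X[j0]])[None]  # v . x0 = 1
    A_ub = np.hstack([Y, -X])                              # u.yj - v.xj <= 0, all j
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(len(X)),
                  A_eq=A_eq, b_eq=[1.0], bounds=(0, None))
    return -res.fun                                        # efficiency in (0, 1]

for j in range(len(X)):
    print(f"rule {j}: efficiency = {ccr_efficiency(X, Y, j):.3f}")
```

Rules with efficiency 1.0 form the "relatively efficient (interesting)" class that the DT classifier is then trained to recognize.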

15.
Currently, high-dimensional data such as image data is widely used in the domain of pattern classification and signal processing. When using high-dimensional data, feature analysis methods such as PCA (principal component analysis) and LDA (linear discriminant analysis) are usually required in order to reduce memory usage or computational complexity as well as to increase classification performance. We propose a feature analysis method for dimension reduction based on a data generation model that is composed of two types of factors: class factors and environment factors. The class factors, which are prototypes of the classes, contain important information required for discriminating between various classes. The environment factors, which represent distortions of the class prototypes, need to be diminished for obtaining high class separability. Using the data generation model, we aimed to exclude environment factors and extract low-dimensional class factors from the original data. By performing computational experiments on artificial data sets and real facial data sets, we confirmed that the proposed method can efficiently extract low-dimensional features required for classification and has a better performance than the conventional methods.
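The generation model can be illustrated with a small simulation in which each sample is a class prototype plus a shared environment distortion; classical LDA is used below as a stand-in that likewise suppresses within-class (environment) variation, though the paper's actual extraction method differs:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
D, N = 50, 200
prototypes = rng.normal(size=(3, D))        # one class factor (prototype) per class
env_basis = rng.normal(size=(5, D))         # shared environment distortion modes

X, y = [], []
for _ in range(N):
    c = rng.integers(0, 3)
    env = rng.normal(size=5) @ env_basis    # environment factor distorting the prototype
    X.append(prototypes[c] + 0.5 * env + 0.1 * rng.normal(size=D))
    y.append(c)
X, y = np.array(X), np.array(y)

# Project to a low-dimensional space that keeps the class information.
lda = LinearDiscriminantAnalysis(n_components=2).fit(X, y)
Z = lda.transform(X)                        # 2-D "class-factor" representation
print(Z.shape, "training accuracy:", lda.score(X, y))
```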

16.
The discovery of knowledge through data mining provides a valuable asset for addressing decision-making problems. Although a list of features may characterize a problem, it is often the case that a subset of those features has a stronger influence on a certain group of events constituting a sub-problem within the original problem. We propose a divide-and-conquer strategy for data mining that uses both data-based sensitivity analysis, for extracting feature relevance, and expert evaluation, for splitting the problem of characterizing telemarketing contacts to sell bank deposits. As a result, the call direction (inbound/outbound) was considered the most suitable candidate feature. Re-evaluating the inbound telemarketing sub-problem led to a large increase in targeting performance, confirming the benefits of such an approach, given the importance of telemarketing for business and for bank marketing in particular.
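The split itself is mechanically simple: train one model per value of the expert-chosen splitting feature rather than a single global model. The column names and data below are illustrative, not the paper's dataset:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy data: each call direction contains both outcomes.
df = pd.DataFrame({
    "direction":     ["inbound", "inbound", "outbound", "outbound"] * 25,
    "call_duration": [300, 40, 200, 30] * 25,
    "prev_contacts": [1, 0, 2, 1] * 25,
    "subscribed":    [1, 0, 1, 0] * 25,
})

features = ["call_duration", "prev_contacts"]
models = {}
for direction, group in df.groupby("direction"):   # the expert-chosen split
    models[direction] = LogisticRegression().fit(group[features],
                                                 group["subscribed"])

# Each sub-model can now be tuned and evaluated on its own sub-problem,
# e.g. the inbound sub-problem the paper found especially improvable.
for d, m in models.items():
    sub = df[df["direction"] == d]
    print(d, m.score(sub[features], sub["subscribed"]))
```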

17.
18.
Behavior-based detection and signature-based detection are two popular approaches to malware (malicious software) analysis. The security industry, such as the sector selling antivirus tools, has been using signature- and heuristic-based technologies for years. However, this approach has proven inefficient in identifying unknown malware strains. The behavior-based malware detection approach, on the other hand, has greater potential in identifying previously unknown instances of malicious software; its accuracy relies on techniques to profile and recognize accurate behavior models. Unfortunately, with the increasing complexity of malicious software and the limitations of existing automatic tools, the current behavior-based approach cannot discover many newer forms of malware either. In this paper, we implement the 'holography platform', a behavior-based profiler on top of a virtual machine emulator that intercepts the system processes and analyzes the CPU instructions, CPU registers, and memory. The captured information is stored in a relational database, and data mining techniques are used to extract information. We demonstrate the breadth of the holography platform by conducting two experiments: a packed-binary behavior analysis and a malvertising (malicious advertising) incident tracing. Both tasks are known to be very difficult to perform efficiently using existing methods and tools, and we demonstrate how precise behavior information can be easily obtained using the holography platform. With these two experiments, we show that the holography platform can provide security researchers and automatic malware detection systems with an efficient malicious software behavior analysis solution. Copyright © 2011 John Wiley & Sons, Ltd.
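The storage layer can be sketched with SQLite: intercepted run-time events are appended to a relational table so that behaviour can later be mined with plain SQL. The schema and the trace rows are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE events (
    pid INTEGER, instr TEXT, reg_eax TEXT, mem_addr TEXT)""")

trace = [  # what an emulator hook might capture per intercepted instruction
    (1234, "mov",  "0x0",  "0x401000"),
    (1234, "call", "0x10", "0x7c801d7b"),
    (1234, "call", "0x10", "0x7c801d7b"),
    (1234, "jmp",  "0x0",  "0x401050"),
]
con.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", trace)

# A simple mining step: which (instruction, address) behaviours recur?
for row in con.execute("""
        SELECT instr, mem_addr, COUNT(*) AS n
        FROM events GROUP BY instr, mem_addr
        HAVING n > 1"""):
    print(row)
```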

19.
There is a growing demand for using commercial-off-the-shelf (COTS) software components to facilitate the development of software systems. Among the many research topics for component-based software, quality-of-service (QoS) evaluation is yet to be given the importance it deserves. In this paper, we propose a novel analytical model to evaluate the QoS of component-based software systems. We use the component execution graph (CEG) model to capture the architecture at the process level and the interdependence among components; the CEG can explicitly express sequential, parallel, selective, and iterative compositions of components. For QoS estimation, each component in the CEG model is associated with an execution rate, a failure rate, and a cost per unit time. Three QoS metrics are considered and analytically calculated, namely make-span, reliability, and cost. Through a case study, we show that our model is capable of modeling real-world COTS software systems effectively. Monte-Carlo simulation in the case study also indicates that the analytical results are consistent with simulation, all falling within 95% confidence intervals. We also present a sensitivity analysis technique to identify QoS bottlenecks. This paper concludes with a comparison with related work. Copyright © 2007 John Wiley & Sons, Ltd.
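The analytical treatment of sequential and parallel compositions can be sketched directly: make-spans add in sequence and take the slowest branch in parallel, reliabilities multiply, and costs add. Selective and iterative compositions, which the CEG also supports, are omitted here, and the numbers are illustrative:

```python
def prod(xs):
    r = 1.0
    for x in xs:
        r *= x
    return r

def seq(*parts):
    """Sequential composition: times and costs add, reliabilities multiply."""
    return (sum(p[0] for p in parts),
            prod(p[1] for p in parts),
            sum(p[2] for p in parts))

def par(*parts):
    """Parallel composition: make-span is the slowest branch; all must succeed."""
    return (max(p[0] for p in parts),
            prod(p[1] for p in parts),
            sum(p[2] for p in parts))

# (make-span, reliability, cost) per component
A, B, C = (2.0, 0.99, 5.0), (3.0, 0.95, 8.0), (1.5, 0.999, 2.0)

# System: A, then B and C in parallel.
print(seq(A, par(B, C)))   # -> (5.0, ~0.9396, 15.0)
```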

20.
Digital watermarking evaluation and benchmarking are challenging tasks because of multiple, often conflicting, evaluation criteria. A few approaches have been presented to implement digital watermarking evaluation and benchmarking frameworks. However, these approaches still possess a number of limitations, such as fixing several attributes at the expense of others, and the well-known benchmarking approaches are limited to robust watermarking. Therefore, this paper presents a new methodology for digital watermarking evaluation and benchmarking based on large-scale data, using external evaluators and a group decision-making context. Two experiments are performed. In the first experiment, a noise-gate-based digital watermarking approach is developed and its scheme is enhanced. Sixty audio samples from different audio styles are tested with the two algorithms, and the resulting 120 samples are evaluated according to three metrics, namely quality, payload, and complexity, to generate a set of digital watermarking samples. The second experiment addresses the situation in which digital watermarking evaluators have different preferences, which requires weight measurement with a decision-making solution. The analytic hierarchy process (AHP) is used to measure evaluator preferences, and the technique for order of preference by similarity to ideal solution (TOPSIS) is utilized in different contexts (e.g., individual and group). Selecting the proper context with different aggregation operators is therefore recommended for benchmarking the results of experiment 1 (i.e., the digital watermarking approaches). The findings of this research are as follows: (1) group and individual decision making provide the same result in this case study; however, when the priority weights are generated from the evaluators, group decision making is the recommended solution to resolve the trade-offs reflected in the benchmarking process for digital watermarking approaches; (2) internal and external aggregations show that the enhanced watermarking approach performs better than the original watermarking approach. © 2016 The Authors. Software: Practice and Experience published by John Wiley & Sons Ltd.
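The TOPSIS ranking step of the methodology can be sketched as follows; the decision matrix over the three metrics and the AHP-derived weights are invented for illustration:

```python
import numpy as np

# Rows = watermarking approaches, columns = (quality, payload, complexity).
M = np.array([[0.92, 0.30, 0.40],
              [0.88, 0.45, 0.35]])
weights = np.array([0.5, 0.3, 0.2])      # e.g. from AHP evaluator preferences
benefit = np.array([True, True, False])  # complexity is a cost criterion

R = M / np.linalg.norm(M, axis=0)        # vector normalisation
V = R * weights                          # weighted normalised matrix

ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
anti  = np.where(benefit, V.min(axis=0), V.max(axis=0))

d_plus  = np.linalg.norm(V - ideal, axis=1)   # distance to ideal solution
d_minus = np.linalg.norm(V - anti,  axis=1)   # distance to anti-ideal solution
closeness = d_minus / (d_plus + d_minus)
print("ranking (best first):", np.argsort(-closeness))
```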
