Found 20 similar documents (search time: 15 ms)
1.
R. K. Jena 《Behaviour & Information Technology》2019,38(9):986-1001
Abstract: The ability to exploit students’ sentiments using different machine learning techniques is considered an important strategy for planning and manoeuvring in a collaborative educational environment. The advancement of machine learning technology is energised by the healthy growth of big data technologies, which helps Sentiment Mining (SM) applications built on big data become a common platform for data mining activities. However, very little work has studied sentiment applications on the large amount of educational data now available. This paper therefore attempts to mine academic data using several efficient machine learning algorithms. Its contribution is two-fold: (i) studying sentiment polarity (positive, negative and neutral) in students’ data using machine learning techniques, and (ii) modelling and predicting students’ emotions (Amused, Anxiety, Bored, Confused, Enthused, Excited, Frustrated, etc.) using big data frameworks. The SM techniques developed on big data frameworks can be scaled and adapted to variation in source, velocity and veracity to maximise value mining for the benefit of students, faculty and other stakeholders.
2.
Sentiment detection and classification is the latest fad for social analytics on the Web. With its array of practical applications in healthcare, finance, media, consumer markets, and government, distilling the voice of the public to gain insight into target information and reviews is non-trivial. With a marked increase in the size, subjectivity, and diversity of social web data, the vagueness, uncertainty and imprecision within the information have increased manifold. Soft computing techniques have been used to handle this fuzziness in practical applications. This work studies the feasibility, scope and relevance of using soft computing techniques for sentiment analysis on Twitter. We present a systematic literature review that collates, explores and analyzes the efforts and trends in a well-structured manner in order to identify research gaps that define the future prospects of this coupling. The contribution of this paper is significant for two reasons: first, its primary focus is to study and evaluate the use of soft computing techniques for sentiment analysis on Twitter; second, compared with previous reviews, it adopts a systematic approach to identify and gather empirical evidence, interpret results, critically analyze, and integrate the findings of all relevant high-quality studies to address specific research questions in the defined research domain.
3.
Ibrahim M. Alwayle Badriyya B. Al-onazi Jaber S. Alzahrani Khaled M. Alalayah Khadija M. Alaidarous Ibrahim Abdulrab Ahmed Mahmoud Othman Abdelwahed Motwakel 《计算机系统科学与工程》2023,46(3):3423-3438
Arabic is one of the most spoken languages across the globe. However, there are comparatively few studies on Sentiment Analysis (SA) in Arabic. In recent years, detecting the sentiments and emotions expressed in tweets has received significant interest. The substantial role played by the Arab region in international politics and the global economy has made it necessary to examine sentiments and emotions expressed in the Arabic language. Two common approaches address emotion classification problems: Machine Learning and lexicon-based methods. With this motivation, the article develops a Teaching and Learning Optimization with Machine Learning Based Emotion Recognition and Classification (TLBOML-ERC) model for Sentiment Analysis on tweets written in Arabic. The TLBOML-ERC model focuses on recognising the emotions and sentiments expressed in Arabic tweets. To this end, it first carries out data pre-processing and a Continuous Bag Of Words (CBOW)-based word embedding process. A Denoising Autoencoder (DAE) model is then used to categorise the different emotions expressed in Arabic tweets. To improve the efficacy of the DAE model, the Teaching and Learning-based Optimization (TLBO) algorithm is used to optimize its parameters. The proposed TLBOML-ERC method was experimentally validated on an Arabic tweets dataset, and the results show promising performance on Arabic emotion classification.
4.
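Product reviews are an important basis for users' online decisions, but, driven by profit, merchants often hire professional writers to produce large numbers of fake reviews that mislead users, either to promote themselves or to discredit competitors. This leads to unfair business competition and a very poor user experience. To address this, we improve existing fake review detection models through sentiment pre-training and propose a joint pre-training learning method that integrates both the semantic and the sentiment information of reviews. Given the strong semantic representation capability of pre-trained models, the joint learning framework uses two pre-trained model encoders to extract the semantic and the sentiment context features of a review, respectively, integrates the two kinds of features through joint training, and finally optimizes the model with the Center Loss function. Validation experiments on multiple public datasets and multiple tasks show that the proposed joint model achieves state-of-the-art results on both fake review detection and sentiment polarity analysis, and generalizes better.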
5.
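Research on an Online Classification Algorithm for Streaming Data Based on a Weighted Bayes Classifier (total citations: 1; self-citations: 0; citations by others: 1)
Traditional classification algorithms require the entire training dataset before the model is trained. In a big data environment, however, data flows into the system continuously as a data stream, so the entire training dataset cannot be obtained in advance. This paper studies online classification of noisy streaming data in a big data environment. It formulates online classification of streaming data as an optimization problem, proposes a weighted Naive Bayes classifier and an error-adaptive classifier, and validates the proposed algorithms on real datasets. The experimental results show that, when the system is noise-free, the error-adaptive classifier predicts more accurately than related algorithms; moreover, when the streaming data contains noise, the error-adaptive classifier is insensitive to the noise and still predicts accurately, so it can be applied to online classification of streaming data in a big data environment.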
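As an illustration of the online setting described in this abstract, the following is a minimal sketch of a Naive Bayes classifier that is updated one example at a time and evaluated in a test-then-train loop. It is not the paper's weighted or error-adaptive algorithm: the class name OnlineNaiveBayes, the helper prequential_accuracy, and the per-example weight parameter are assumptions introduced here only to show how a stream can be learned without holding the full training set.

```python
import math
from collections import defaultdict


class OnlineNaiveBayes:
    """Multinomial Naive Bayes that learns one example at a time (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant
        self.class_weight = defaultdict(float)   # weighted example count per class
        self.token_weight = defaultdict(lambda: defaultdict(float))
        self.total_tokens = defaultdict(float)   # weighted token count per class
        self.vocab = set()

    def update(self, tokens, label, weight=1.0):
        """Fold one labelled example into the counts.

        `weight` is a hypothetical knob (not taken from the paper) that scales
        how much this example influences the model.
        """
        self.class_weight[label] += weight
        for tok in tokens:
            self.token_weight[label][tok] += weight
            self.total_tokens[label] += weight
            self.vocab.add(tok)

    def predict(self, tokens):
        """Return the most probable class so far, or None before any update."""
        if not self.class_weight:
            return None
        total = sum(self.class_weight.values())
        vocab_size = max(len(self.vocab), 1)     # avoid a zero denominator
        best_label, best_score = None, -math.inf
        for label, cw in self.class_weight.items():
            score = math.log(cw / total)
            denom = self.total_tokens[label] + self.alpha * vocab_size
            for tok in tokens:
                count = self.token_weight[label].get(tok, 0.0)
                score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def prequential_accuracy(stream, model):
    """Test-then-train: predict each example first, then learn from it."""
    correct = seen = 0
    for tokens, label in stream:
        guess = model.predict(tokens)
        if guess is not None:
            seen += 1
            correct += int(guess == label)
        model.update(tokens, label)
    return correct / seen if seen else 0.0
```

Because every example is scored before it is learned, the reported accuracy reflects how the model would have performed on data it had not yet seen, which is the usual way to evaluate a classifier on a stream.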
6.
Norjihan Binti Abdul Ghani Suraya Hamid Muneer Ahmad Younes Saadi N.Z. Jhanjhi Mohammed A. Alzain Mehedi Masud 《计算机系统科学与工程》2022,40(3):913-926
The World Health Organization (WHO) describes dengue as a serious illness that affects almost half of the world’s population and has no specific treatment. Early and accurate detection of its spread in affected regions can save lives. Despite the severity of the disease, only a few notable works use sentiment analysis to mine accurate insights from social media text streams. Moreover, the massive data explosion of recent years has made it difficult to store and process large amounts of data, since reliable mechanisms to gather the data and suitable techniques to extract meaningful insights from it are required. This study proposes a sentiment analysis polarity approach for collecting data and extracting relevant information about dengue via Apache Hadoop. The method consists of two main parts: the first collects data from social media using Apache Flume, while the second queries and extracts relevant information via a hybrid filtration-polarity algorithm using Apache Hive. To overcome the noisy and unstructured nature of the data, the extraction process is characterized by pre- and post-filtration phases. As a result, only by integrating Flume and Hive with filtration and polarity analysis can a reliable sentiment analysis technique be offered to collect and process large-scale data from the social network. We show how the Apache Hadoop ecosystem (Flume and Hive) can provide a sentiment analysis capability by storing and processing large amounts of data. An important finding of this paper is that sentiment analysis applications for detecting diseases can be developed more reliably with Hadoop ecosystem components than with ordinary machines.
7.
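Botnets carry out many kinds of malicious behavior through the hosts they control, which defeats current detection methods; among these behaviors, theft of sensitive data has become the most common. Given the malicious behavior that botnets enable, research on detection and mitigation methods is imperative. This paper proposes a novel distributed real-time botnet detection method that organizes Netflow records into host Netflow graphs and host relationship chains and extracts the implicit C&C communication features to detect botnets. Using this algorithm, the BotScanner distributed detection system is implemented on the Spark Streaming distributed real-time stream processing engine. To verify the system's effectiveness, five mainstream botnet families were used for training, and both simulated and real network traffic were used for testing. The experimental results show that, without deep packet inspection, BotScanner can detect the specified botnets in real time with a high detection rate and a low false positive rate. Moreover, in a real network environment, BotScanner detects in real time with near-linear speedup, which confirms the advantages of the Spark Streaming engine for distributed stream processing and its feasibility for botnet detection.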
8.
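As the Internet has matured, a large amount of Uyghur-language online information continues to be created, giving rise to an urgent need for sentiment orientation analysis of information in different domains. Considering that Uyghur lacks sufficient sentiment training corpora and a complete sentiment lexicon, this paper combines the advantages of machine learning and lexicon-based methods to build a classifier model, LCUSCM (Lexicon-based and Corpus-based Uyghur Text Sentiment Classification Model). The model first uses a self-built Uyghur sentiment lexicon to perform high-quality sentiment classification of the corpus, recursively expanding the lexicon during classification. Then, based on each sentence's sentiment score, it selects part of the corpus from the lexicon-based results to train a classifier and refine the first-step classification results. The accuracy of this method is 9.13% higher than that of the machine learning method alone and 1.82% higher than that of the lexicon method.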
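The selection step in this abstract (score sentences with a lexicon, then keep only confidently scored sentences as training data for a second-stage classifier) can be sketched as follows. This is a simplified illustration under stated assumptions, not the LCUSCM implementation: the function names, the +1/-1 lexicon values, and the threshold parameter are hypothetical, and the recursive lexicon expansion described in the abstract is omitted.

```python
def lexicon_score(tokens, lexicon):
    """Sum the polarity of every token found in the lexicon (+1 positive, -1 negative)."""
    return sum(lexicon.get(tok, 0) for tok in tokens)


def split_by_confidence(sentences, lexicon, threshold=2):
    """Label sentences with the lexicon and separate confident from uncertain ones.

    Returns (confident, uncertain). `confident` holds (tokens, label) pairs whose
    absolute score reaches `threshold`; `uncertain` holds the remaining token lists.
    `threshold` is an assumed cut-off and must be at least 1 so that a score of 0
    is never given a label.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    confident, uncertain = [], []
    for tokens in sentences:
        score = lexicon_score(tokens, lexicon)
        if abs(score) >= threshold:
            confident.append((tokens, "pos" if score > 0 else "neg"))
        else:
            uncertain.append(tokens)
    return confident, uncertain
```

In a two-stage pipeline of this kind, the confident pairs would serve as training data for any supervised classifier, which then re-labels the uncertain sentences; the choice of classifier is left open here.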
9.
THE IMPORTANCE OF NEUTRAL EXAMPLES FOR LEARNING SENTIMENT (total citations: 2; self-citations: 0; citations by others: 2)
Most research on learning to identify sentiment ignores "neutral" examples, learning only from examples of significant (positive or negative) polarity. We show that it is crucial to use neutral examples in learning polarity for a variety of reasons. Learning from negative and positive examples alone will not permit accurate classification of neutral examples. Moreover, the use of neutral training examples in learning facilitates better distinction between positive and negative examples.
10.
The popularity of many social media sites has prompted both academic and practical research on the possibility of mining social media data to analyze public sentiment. Studies have suggested that public emotions shown on Twitter could be well correlated with the Dow Jones Industrial Average. However, it remains unclear how public sentiment, as reflected on social media, can be used to predict the stock price movement of a particular publicly listed company. In this study, we attempt to fill this research void by proposing a technique, called SMeDA-SA, to mine Twitter data for sentiment analysis and then predict the stock movement of specific listed companies. For the experiments, we collected 200 million tweets that mentioned one or more of 30 companies listed on NASDAQ or the New York Stock Exchange. SMeDA-SA first extracts ambiguous textual messages from these tweets to create a list of words that reflects public sentiment. It then uses a data mining algorithm to expand the word list by adding emotional phrases so as to better classify sentiments in the tweets. With SMeDA-SA, we find that the stock movement of many companies can be predicted with an average accuracy of over 70%. This paper describes how SMeDA-SA can be used to mine social media data for sentiments. It also presents the key implications of our study.
11.
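This paper proposes a sentiment classification method based on sentiment word vectors. Word vectors represent words as fixed-dimension vectors over the continuous real domain and can express rich semantic information. Word vector learning methods such as word2vec can use context information to mine latent semantic associations between words from large-scale corpora. Starting from word vectors learned from a corpus, which carry semantic information, this paper applies a sentiment adjustment to obtain word vectors that account for both semantics and sentiment orientation. For an input text, a feature representation is built from the sentiment word vectors, and a machine learning method classifies the text's sentiment. Compared with methods that build text representations from words, N-grams, or the original word2vec vectors, this method achieves higher sentiment classification accuracy and better performance and stability.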
12.
Muhammad Umer Imran Ashraf Arif Mehmood Saru Kumari Saleem Ullah Gyu Sang Choi 《Computational Intelligence》2021,37(1):409-434
Sentiment analysis focuses on identifying and classifying the sentiments expressed in text messages and reviews. Social networks like Twitter, Facebook, and Instagram generate heaps of sentiment-laden data, and analyzing such data is very fruitful for improving the quality of both products and services. Classic machine learning techniques have a limited capability to analyze such large amounts of data efficiently and produce precise results; they are therefore supplemented by deep learning models to achieve higher accuracy. This study proposes a combined convolutional neural network and long short-term memory (CNN-LSTM) deep network for sentiment analysis on Twitter datasets. The performance of the proposed model is compared with that of machine learning classifiers, including the support vector classifier, random forest (RF), stochastic gradient descent (SGD), logistic regression, a voting classifier (VC) of RF and SGD, and state-of-the-art classifier models. Furthermore, two feature extraction methods (term frequency-inverse document frequency and word2vec) are investigated to determine their impact on prediction accuracy. Three datasets (US airline sentiments, women's e-commerce clothing reviews, and hate speech) are used to evaluate the proposed model. Experimental results demonstrate that the CNN-LSTM achieves higher accuracy than the other classifiers.
13.
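To improve opinion mining of foreign exchange news, this paper analyzes the data characteristics of foreign exchange news and proposes a fine-grained sentiment analysis method for foreign exchange news text, covering the computation of both sentiment orientation and sentiment intensity. For sentiment orientation, it designs a computation method that incorporates sentiment word weights, based on four machine learning algorithms: naive Bayes, logistic regression, random forest, and support vector machine. For sentiment intensity, it analyzes the feature words in foreign exchange news that affect sentiment intensity and, through a weighting strategy, computes sentiment intensity under the optimal weight combination. Experimental results demonstrate the method's effectiveness for computing both sentiment orientation and sentiment intensity.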
14.
Andrea Vázquez-Ingelmo Alicia García-Holgado Francisco José García-Peñalvo Roberto Therón 《Expert Systems》2023,40(1):e12872
The misinformation problem affects the development of society. Misleading content and unreliable information overwhelm social networks and media. In this context, the use of data visualizations to support news and stories is increasing. Misleading visualizations, whether intentional or accidental, influence the perceptions of audiences, who are usually not visualization or domain experts. Several factors affect the ability to accurately tag a visualization as confusing or misleading. In this paper, we present a machine learning approach to detect whether an information visualization is potentially confusing and likely to be misunderstood, based on the analytic task it tries to support. This approach is supported by fine-grained features identified through domain engineering and meta-modelling of the information visualization and dashboards domain. We automatically generated visualizations from a tri-variate dataset through the software product line paradigm and manually labelled them to obtain a training dataset. The results support the viability of the proposal as a tool to help journalists, audiences and society in general, not only to detect confusing visualizations, but also to select the visualization that supports a previously defined task according to the data domain.
15.
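Given the characteristics of social media data and the challenges of analyzing it, this paper proposes a distributed social media data processing scheme based on the real-time computation framework Storm, the batch processing framework Hadoop, and MongoDB, an efficient and horizontally scalable NoSQL database, and uses the scheme to guide the implementation of a visual analysis system for influenza outbreaks based on streaming Twitter data. Experiments show that the distributed scheme supports efficient processing and storage of streaming Twitter data well enough to meet the system's performance requirements.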
16.
Despite the online availability of data, analyzing this information in academic research is arduous. This article explores the application of supervised machine learning (SML) to overcome challenges associated with online data analysis. In SML, classifiers are used to categorize and code binary data. Based on a case study of Dutch employees’ work-related tweets, this paper compares the coding performance of three classifiers: Linear Support Vector Machine, Naïve Bayes, and logistic regression. Their performance is assessed by examining accuracy, precision, recall, the area under the precision-recall curve, and Krippendorff’s alpha. These indices are obtained by comparing the coding decisions of the classifier with manual coding decisions. The findings indicate that the Linear Support Vector Machine and Naïve Bayes classifiers outperform the logistic regression classifier. The study also compares the performance of these classifiers when trained on stratified random samples and on simple random samples of training data. The findings indicate that with smaller training sets, stratified random training samples perform better than random training samples, whereas with large training sets (n = 4000), random samples yield better results. Finally, the Linear Support Vector Machine classifier was trained with 4000 tweets and subsequently used to categorize 578,581 tweets obtained from 430 employees.
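The comparison between classifier decisions and manual coding mentioned in this abstract can be illustrated with a minimal sketch that computes accuracy, precision, and recall for binary codes. The function name binary_metrics and the convention that 1 marks the positive category are assumptions made here; the area under the precision-recall curve and Krippendorff's alpha, which the abstract also lists, are not covered.

```python
def binary_metrics(manual, predicted, positive=1):
    """Compare classifier codes against manual codes for a binary category.

    `manual` and `predicted` are equal-length sequences of labels; `positive`
    names the label treated as the category of interest (assumed to be 1).
    """
    if len(manual) != len(predicted) or not manual:
        raise ValueError("inputs must be non-empty and of equal length")
    tp = fp = fn = tn = 0
    for truth, guess in zip(manual, predicted):
        if guess == positive and truth == positive:
            tp += 1          # classifier and coder both say positive
        elif guess == positive:
            fp += 1          # classifier says positive, coder disagrees
        elif truth == positive:
            fn += 1          # classifier missed a positive item
        else:
            tn += 1          # both say negative
    return {
        "accuracy": (tp + tn) / len(manual),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

For example, with manual codes [1, 1, 0, 0, 1] and classifier codes [1, 0, 0, 1, 1], the classifier agrees with the coder on three of five items, giving an accuracy of 0.6, with precision and recall both equal to 2/3.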
17.
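A Big Data Management and Analysis System Based on Hadoop and Mahout (total citations: 1; self-citations: 0; citations by others: 1)
With the explosive growth of data volume, the diversification of data structures, and the mobility of data, traditional relational database systems can no longer meet the requirements of big data management and analysis. It is therefore necessary to study big-data-based data management and analysis systems that can quickly summarize and analyze massive structured and unstructured data in a specific domain and ultimately support decision-making. This paper proposes a big data management and analysis system based on Hadoop and Mahout. By analyzing data characteristics, the system decomposes the data and stores it in the corresponding databases for management. The proposed system was implemented and validated in a specific application domain, and it obtained data analysis results better than those of previously reported related work.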
18.
The increase in available data from sensors embedded in industrial equipment has led to a recent rise in the use of industrial predictive maintenance. In the aircraft industry, predictive maintenance has become an essential tool for optimizing maintenance schedules, reducing aircraft downtime, and identifying unexpected faults. Despite this, there is currently no comprehensive survey of predictive maintenance applications and techniques devoted solely to the aircraft manufacturing industry. This article is an in-depth, state-of-the-art systematic literature review of the different data types, applications, projects, and opportunities for predictive maintenance in this industry. The goal of this review is to identify and highlight the challenges and opportunities for future research in this field. The review found that current research is too biased towards aircraft engines because of a lack of publicly available data sets, and that greater automation is an important step towards optimizing aircraft maintenance to its full potential.
19.
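For sentiment analysis of review texts on the Internet, this paper introduces the latent Dirichlet allocation (LDA) model and proposes a classification method. The method combines a sentiment lexicon with specified collocation patterns of sentiment units to extract sentiment information, including sentiment words and their preceding and following context. A topic model is used to discover key features in the sentiment information, and these features are incorporated into the sentiment vector space. Finally, a machine learning classification algorithm performs sentiment classification of Chinese review texts. Experimental results show that the proposed method effectively reduces the dimensionality of the feature vectors and performs well on text sentiment classification.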
20.
Angela Fortunato Michele Gorgoglione Antonio Messeni Petruzzelli Umberto Panniello 《Information Systems Management》2017,34(3):238-249
Abstract: The concepts of open innovation and big data have been widely explored, but little research has focused on the use of big data for open innovation activities. We explore how big data obtained from social media can be useful for open innovation activities in the television (TV) domain. Results demonstrate that analyzing open data from social media allows TV managers to identify the specific TV contents and Twitter elements that increase the social media traffic related to a show, and to gain insights for innovating the design of the show’s episodes or seasons.