1.
Supervised learning has attracted much attention in recent years. As a consequence, many state-of-the-art algorithms are domain dependent, since they require a labeled training corpus to learn the domain features. Obtaining such labeled corpora is itself a cumbersome task. For text sentiment detection, however, SentiWordNet (SWN) may be used. It is a vocabulary in which terms are arranged in synonym groups called synsets. This research makes use of SentiWordNet and treats it as the labeled corpus for training. A sentiment dictionary, SentiMI, builds upon the mutual information calculated from these terms. A complete framework is developed by using feature selection and extracting mutual information, from SentiMI, for the selected features. Training, testing and evaluation of the proposed framework are conducted on a large dataset of 50,000 movie reviews. Compared with the baseline SentiWordNet classifier, the proposed framework achieves a notable performance improvement of 7% in accuracy, 14% in specificity, and 8% in F-measure. Comparison with state-of-the-art classifiers is also performed on the widely used Cornell Movie Review dataset, which further confirms the effectiveness of the proposed approach.
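As a rough illustration of the mutual information idea mentioned in the abstract above, the following minimal sketch computes pointwise mutual information (PMI) between terms and sentiment labels from document-level co-occurrence counts. It is not the SentiMI construction itself: the function name `pointwise_mutual_information`, the toy documents, and the use of PMI over labeled documents (rather than SentiWordNet synsets) are all assumptions made for illustration.

```python
import math
from collections import Counter


def pointwise_mutual_information(docs):
    """Return {(term, label): PMI in bits} from (tokens, label) pairs.

    Hypothetical helper: counts are document-level, so a term is counted
    at most once per document.
    """
    n = len(docs)
    term_counts = Counter()
    label_counts = Counter()
    joint_counts = Counter()
    for tokens, label in docs:
        label_counts[label] += 1
        for term in set(tokens):
            term_counts[term] += 1
            joint_counts[(term, label)] += 1

    pmi = {}
    for (term, label), joint in joint_counts.items():
        p_joint = joint / n
        p_term = term_counts[term] / n
        p_label = label_counts[label] / n
        # PMI = log2( P(term, label) / (P(term) * P(label)) )
        pmi[(term, label)] = math.log2(p_joint / (p_term * p_label))
    return pmi


# Toy data, invented for this example only.
toy_docs = [
    (["good", "plot"], "pos"),
    (["good", "acting"], "pos"),
    (["bad", "plot"], "neg"),
    (["bad", "acting"], "neg"),
]
scores = pointwise_mutual_information(toy_docs)
```

In this toy corpus, a term that appears only with one label (such as "good" with "pos") receives a positive score, while a term spread evenly across labels (such as "plot") scores zero. Pairs that never co-occur are simply absent from the result.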
2.
《Information & Management》2016,53(7):835-845
We examine the drivers of crowd wisdom in the financial domain by relating analyst report and social media sentiment via Granger causality (GC) testing based on the wisdom of crowds (WoC) theory. The significance of a large number of the tested time series indicates that analyst reports and social media content are suitable for mutual prediction. We elaborate on the conditions under which crowd cognitive diversity matters, and we derive related measures. The results suggest that the WoC theory can partially explain the GC between the two media types and that professional analysts and the crowd can each outperform the other under favorable circumstances.
3.
《Information & Management》2016,53(8):987-996
Social media is a major platform for opinion sharing. In order to better understand and exploit opinions on social media, we aim to classify users with opposite opinions on a topic for decision support. Rather than mining text content, we introduce a link-based classification model, named global consistency maximization (GCM), that partitions a social network into two classes of users with opposite opinions. Experiments on a Twitter data set show that: (1) our global approach achieves higher accuracy than two baseline approaches and (2) link-based classifiers are more robust to small training samples if selected properly.
4.
《Expert systems with applications》2014,41(10):4950-4958
Social media, especially Twitter, is now one of the most popular platforms where people can freely express their opinions. However, it is difficult to extract important summary information from the many millions of tweets sent every hour. In this work we propose a new concept, sentimental causal rules, and techniques for extracting sentimental causal rules from textual data sources such as Twitter by combining sentiment analysis and causal rule discovery. Sentiment analysis refers to the task of extracting public sentiment from textual data. The value of sentiment analysis lies in its ability to reflect popularly voiced perceptions that are stated in natural language. Causal rules, on the other hand, indicate associations between different concepts in a context where one (or several) concept(s) cause(s) the other(s). We believe that sentimental causal rules are an effective summarization mechanism that combines causal relations among different aspects extracted from textual data with the sentiment embedded in these causal relationships. In order to show the effectiveness of sentimental causal rules, we have conducted experiments on Twitter data collected on the Kurdish political issue in Turkey, which has been an ongoing heated public debate for many years. Our experiments on Twitter data show that sentimental causal rule discovery is an effective method to summarize information about important aspects of an issue on Twitter, which may further be used by politicians for better policy making.
5.
Haeng-Jin Jang, Jaemoon Sim, Yonnim Lee, Ohbyung Kwon 《Expert systems with applications》2013,40(18):7492-7503
IT vendors routinely use social media such as YouTube not only to disseminate their IT product information, but also to acquire customer input efficiently as part of their market research strategies. Customer responses that appear in social media, however, are typically unstructured; thus, a fairly large data set is needed for meaningful analysis. Although identifying customers’ value structures and attitudes may be useful for developing targeted or niche markets, the unstructured and volume-heavy nature of customer data prohibits efficient and economical extraction of such information. Automatic extraction of customer information would be valuable in determining value structure and strength. This paper proposes an intelligent method of estimating causality between user profiles, value structures, and attitudes based on the replies and published content managed by open social network systems such as YouTube. To show the feasibility of the idea proposed in this paper, information richness and agility are used as underlying concepts to create performance measures based on media/information richness theory. The resulting deep sentiment analysis proves to be superior to legacy sentiment analysis tools for estimation of causality among the focal parameters.
6.
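Researchers routinely use tools such as Excel and SPSS to analyze and process data and gain domain knowledge. With the arrival of the big data era, however, common data processing software is limited by single-machine performance and can no longer meet researchers' needs for big data analysis. Processing and visualizing big data requires a distributed computing environment. To process and visualize big data quickly, researchers therefore need not only to purchase and maintain a distributed cluster, but also to have programming skills for distributed environments and the corresponding front-end data visualization techniques. For many data analysts without formal computer science training, this is very difficult and unnecessary. To address these problems, a lightweight Web-based big data processing and visualization tool is proposed. With this tool, data analysts can, through simple clicks and drags, open large data files (at the GB scale) in a browser, quickly navigate within a file (jump to a given line), invoke a distributed computing framework to sort the file contents or find the maximum value, and visualize the data. An empirical study shows that the solution is effective.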
7.
《Information & Management》2020,57(5):103181
Examining 22,504 tweets extracted from Sina Weibo, a microblog site, we identify two clusters of microblog users and study how they influence the stock market. Our research contributes the following significant findings to the existing literature. First, we discover that there exists an inverse U-shaped curve between stock return and the attention of both news media and investors. Second, we verify that news media attention has a positive moderating effect on the relationship between investor attention and stock return. Finally, we find that social interaction could positively moderate the effect of news media and investor sentiment on stock return.
8.
《Expert systems with applications》2014,41(16):7653-7670
The quality of the interpretation of the sentiment in the online buzz in social media and online news can determine the predictability of financial markets and cause huge gains or losses. That is why a number of researchers have lately turned their full attention to the different aspects of this problem. However, to the best of our knowledge, there is no well-rounded theoretical and technical framework for approaching the problem. We believe this lack of clarity on the topic is due to its interdisciplinary nature, which involves at its core both behavioral-economic topics and artificial intelligence. We dive deeper into the interdisciplinary nature and contribute to the formation of a clear frame of discussion. We review the related works on market prediction based on online text mining and produce a picture of the generic components that they all have. We further compare each system with the rest and identify their main differentiating factors. Our comparative analysis of the systems extends to the theoretical and technical foundations behind each. This work should help the research community to structure this emerging field and identify the exact aspects which require further research and are of special significance.
9.
Due to the advancement of technology and globalization, it has become much easier for people around the world to express their opinions through social media platforms. Harvesting opinions through sentiment analysis from people with different backgrounds and from different cultures via social media platforms can help modern organizations, including corporations and governments, understand customers, make decisions, and develop strategies. However, the multiple languages used on many social media platforms make it difficult to perform sentiment analysis with acceptable levels of accuracy and consistency. In this paper, we propose a bilingual approach to conducting sentiment analysis on both Chinese and English social media to obtain more objective and consistent opinions. Instead of processing English and Chinese comments separately, our approach treats review comments as a stream of text containing both Chinese and English words. That stream of text is then segmented by our segment model and trimmed using stop word lists that include both Chinese and English words. The stemmed words are then converted into feature vectors, to which two interchangeable natural language models, SVM and N-gram, are applied. Finally, we perform a case study, applying our proposed approach to analyzing movie reviews obtained from social media. Our experiment shows that our proposed approach has a high level of accuracy and is more effective than the existing learning-based approaches.
10.
《Electronic Commerce Research and Applications》2014,13(6):431-439
Many social network websites have been aggressively exploring innovative electronic word-of-mouth (eWOM) advertising strategies using information shared by users, such as posts and product reviews. For example, Facebook offers a service allowing marketers to utilize users’ posts to automatically generate advertisements. The effectiveness of this practice depends on the ability to accurately predict a post’s influence on its readers. For an advertising strategy of this nature, the influence of a post is determined jointly by the features of the post, such as contents and time of creation, and the features of the author of the post. We propose two models for predicting the influence of a post using both sources of influence, post- and author-related features, as predictors. An empirical evaluation shows that the proposed predictive features improve prediction accuracy, and the models are effective in predicting the influence score.
11.
In this paper we present a methodology to analyze and visualize streams of social media messages and apply it to a case in which Twitter is used as a backchannel, i.e. as a communication medium through which participants follow an event in the real world as it unfolds. Unlike other methods based on social networks or theories of information diffusion, we do not assume proximity or a pre-existing social structure to model content generation and diffusion by distributed users; instead we refer to concepts and theories from discourse psychology and conversational analysis to track online interaction and discover how people collectively make sense of novel events through micro-blogging. In particular, the proposed methodology extracts concept maps from Twitter streams and uses a mix of sentiment and topological metrics computed over the extracted concept maps to build visual devices and display the conversational flow, represented as a trajectory through time of automatically extracted topics. We evaluated the proposed method using data collected from the analysis of Twitter users’ reactions to the March 2015 Apple Keynote, during which the company announced the official launch of several new products.
12.
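Simulation of a Key Feature Mining Model for Big Data in Peer-to-Peer (P2P) Networks Total citations: 2, self-citations: 0, citations by others: 2
In research on optimizing network data management, peer-to-peer (P2P) refers to point-to-point network communication. Because data features are strongly affected by subjective factors, fixed association features cannot be formed, so locating key features often requires large-scale big data comparison. When traditional association rule methods are applied to this kind of network feature search, the rules they build are often disordered or even absent, which makes data feature mining time-consuming, produces many ineffective mining operations, and lowers efficiency. To address this, a method for mining key features of big data in P2P networks using the Apriori algorithm is proposed. Big data features in the P2P network are filtered, cluster centers are selected, correlations are computed with respect to the cluster centers, and weakly correlated features are deleted. Following Apriori theory, the data are joined and pruned, and a model for mining key features of big data is established. Experimental results show that using the improved algorithm to mine key features of big data in P2P networks improves mining accuracy and meets the practical needs of P2P networks.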
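The join and prune steps of the Apriori algorithm named in the abstract above can be sketched generically as follows. This is a minimal textbook-style version, not the paper's improved P2P variant: the function name `apriori`, the toy transactions, and the support threshold are assumptions chosen for illustration, and the feature filtering and cluster-center steps described in the abstract are not modeled.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets.

    Hypothetical helper: support is the fraction of transactions that
    contain the itemset.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {frozenset([i]) for t in transactions for i in t}
    current = {s for s in items if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join: merge frequent (k-1)-itemsets whose union has exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Toy transactions, invented for this example only.
toy_transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
frequent_itemsets = apriori(toy_transactions, min_support=0.5)
```

With these toy transactions, every single item and every pair meets the 0.5 threshold, while the triple {a, b, c} survives pruning but is dropped because it appears in only one of the four transactions.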
13.
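This paper analyzes the performance problems facing the data mining field (chiefly the effectiveness, scalability, and parallelism of algorithms). Based on the idea of data parallelism, it proposes a model for training neural networks in parallel for time series prediction in order to increase training speed. The model scales well, adapts to large training sets, is a coarse-grained form of parallelism, and makes data mining easy to carry out in parallel environments such as cluster systems. The related algorithms are also described, and training speed is tested.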
14.
Deniz Kılınç 《Software》2019,49(9):1352-1364
There are many data sources that produce large volumes of data. The nature of Big Data requires new distributed processing approaches to extract the valuable information. Real-time sentiment analysis is one of the most demanding research areas and requires powerful Big Data analytics tools such as Spark. Prior literature surveys have shown that, although there are many conventional sentiment analysis studies, only a few works realize sentiment analysis in real time. One major point that affects the quality of real-time sentiment analysis is the confidence in the generated data. More precisely, it is a valuable research question to determine whether the owner who generates the sentiment is genuine or not. Since data generated by fake personalities may decrease the accuracy of the outcome, a smart/intelligent service that can identify the source of data is one of the key points in the analysis. In this context, we include a fake account detection service in the proposed framework. Both the sentiment analysis and fake account detection systems are trained and tested using the Naïve Bayes model from Apache Spark's machine learning library. The developed system consists of four integrated software components, i.e., (i) a machine learning and streaming service for sentiment prediction, (ii) a Twitter streaming service to retrieve tweets, (iii) a Twitter fake account detection service to assess the owner of the retrieved tweet, and (iv) a real-time reporting and dashboard component to visualize the results of sentiment analysis. The sentiment classification performances of the system in offline and real-time modes are 86.77% and 80.93%, respectively.
15.
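Research and Simulation of Terrorist Attack Risk Prediction Based on Big Data Analysis Total citations: 2, self-citations: 0, citations by others: 2
In terrorist attack risk prediction, when a prediction model is built, the disguised nature of terrorist attacks introduces many disguised samples and much interfering data, while genuine samples are insufficient and the available statistics fluctuate widely. As a result, the prediction process is easily disturbed and prediction accuracy is low. A terrorist attack risk prediction method based on big data analysis is proposed. A big data analysis model for comprehensive evaluation of terrorist attack risk is established; the model learns the evolvable information implicit in historical terrorist attack data, and the results are used to predict future attacks. The prediction process incorporates the idea of recursive computation from multi-step time series forecasting: the uncertainty of each prediction step is fully considered as an input to the next prediction iteration, which improves prediction accuracy. Simulation results show that the method achieves relatively high accuracy and efficiency.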
16.
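With the rapid growth of data volume, single-machine data analysis tools can no longer meet demand. To address big data analysis, Haflow, a component-based big data analysis service platform, is designed and implemented. Haflow defines its own business process model and an extensible component interface that supports the integration of various heterogeneous tools. The system accepts user-defined business processes, translates them into execution flow instances, and submits them to a Hadoop distributed cluster for execution. Haflow is an extensible, distributed, service-oriented big data analysis platform that supports heterogeneous analysis tools. The platform is significant in two respects: first, it encapsulates work unrelated to the data analysis business and supports various heterogeneous components, which speeds up the development of analysis applications; second, its back end uses the Hadoop distributed system to run multiple tasks concurrently, which improves the average execution speed of applications.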
17.
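Research on a Parallel Association Rule Mining Algorithm Based on the Master-Slave Model and Its Application Total citations: 2, self-citations: 0, citations by others: 2
This article introduces data mining techniques into earthquake prediction and studies the use of parallel association rules to find earthquake-correlated regions. Exploiting the characteristics of earthquake catalog data and applying several pruning techniques, it proposes a parallel association rule mining algorithm designed on the master-slave model, which achieves good runtime efficiency. Applying the algorithm in practice to find correlations between earthquake regions produced some meaningful results.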
18.
This paper presents a framework for collecting and analysing large-volume social media content. The real-time analytics framework comprises semantic annotation, Linked Open Data, semantic search, and dynamic result aggregation components. In addition, exploratory search and sense-making are supported through information visualisation interfaces, such as co-occurrence matrices, term clouds, treemaps, and choropleths. There is also an interactive semantic search interface (Prospector), where users can save, refine, and analyse the results of semantic search queries over time. Practical use of the framework is exemplified through three case studies: a general scenario analysing tweets from UK politicians and the public’s response to them in the run-up to the 2015 UK general election; an investigation of attitudes towards climate change expressed by these politicians and the public, via their engagement with environmental topics; and an analysis of public tweets leading up to the UK’s referendum on leaving the EU (Brexit) in 2016. The paper also presents a brief evaluation and discussion of some of the key text analysis components, which are specifically adapted to the domain and task, and demonstrates the scalability and efficiency of our toolkit in the case studies.
19.
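Research on the Real-Time Impact on Securities Price Fluctuations Based on Web Big Data Mining Total citations: 1, self-citations: 0, citations by others: 1
With the development of Web big data, the massive and rapidly available information on the Internet provides rich data support for predicting changes in the securities market, and how to use big data analysis techniques to predict securities price changes reliably and in real time has become an important scientific question. Starting from the core question of value in securities price changes, this work analyzes the fundamentals reflected in stock value and establishes 10 accurately measurable feature factors that affect the intrinsic value and price behavior of stocks: the economic cycle, fiscal policy, interest rate changes, exchange rate changes, price changes, inflation, political policy, industry changes, operating conditions, and upstream and downstream effects. On this basis, it constructs extraction methods, change relationships, and impact models linking Internet information content to each feature factor, and proposes Internet information indicators for the overall market, industries, and individual stocks that reflect the degree of support Web data provides for them. It ultimately realizes a method for predicting the securities market through the measurement of composite feature factors based on Web big data. Experiments show that the method has good feasibility and will bring clear academic and commercial value.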
20.
Irresponsible and negligent use of natural resources in the last five decades has made it an important priority to adopt more intelligent ways of managing existing resources, especially those related to energy. The main objective of this paper is to explore the opportunities of integrating internal data already stored in Data Warehouses with external Big Data to improve energy consumption predictions. This paper presents a study in which we propose an architecture that makes use of already stored energy data and external unstructured information to improve knowledge acquisition and allow managers to make better decisions. This external knowledge is represented by a torrent of information that, in many cases, is hidden across heterogeneous and unstructured data sources, from which it is retrieved by an Information Extraction system. Alternatively, it is present in social networks, expressed as user opinions. Furthermore, our approach applies data mining techniques to exploit the already integrated data. Our approach has been applied to a real case study and shows promising results. The experiments carried out in this work are twofold: (i) using and comparing diverse Artificial Intelligence methods, and (ii) validating our approach with data source integration.