Similar literature
1.
The ability to exploit students’ sentiments using different machine learning techniques is considered an important strategy for planning and manoeuvring in a collaborative educational environment. The advancement of machine learning technology is energised by the healthy growth of big data technologies, which has helped applications based on Sentiment Mining (SM) over big data become a common platform for data mining activities. However, very little has been studied on sentiment applications that use the huge amount of available educational data. Therefore, this paper attempts to mine academic data using different efficient machine learning algorithms. The contribution of this paper is two-fold: (i) studying the sentiment polarity (positive, negative and neutral) of students’ data using machine learning techniques, and (ii) modelling and predicting students’ emotions (Amused, Anxiety, Bored, Confused, Enthused, Excited, Frustrated, etc.) using big data frameworks. The developed SM techniques using big data frameworks can be scaled and adapted to source variation, velocity and veracity to maximise value mining for the benefit of students, faculty and other stakeholders.

2.
In recent years, the scrutiny of bitcoin and other cryptocurrencies as legal and regulated components of financial systems has been increasing. Bitcoin is currently one of the largest cryptocurrencies in terms of capital market share. Therefore, this study proposes that sentiment analysis can be used as a computational tool to predict the prices of bitcoin and other cryptocurrencies for different time intervals. A key characteristic of the cryptocurrency market is that the fluctuation of currency prices depends on people's perceptions and opinions, not institutional money regulation. Therefore, analysing the relationship between social media and web search is crucial for cryptocurrency price prediction. This study uses Twitter and Google Trends to forecast the short-term prices of the primary cryptocurrencies, as these social media platforms are used to influence purchasing decisions. The study adopts and interpolates a unique multimodel approach to analyse the impact of social media on cryptocurrency prices. Our results prove that people's psychological and behavioural attitudes have a significant impact on the highly speculative cryptocurrency prices.

3.
Healthcare scientific applications, such as body area networks, require deploying hundreds of interconnected sensors to monitor the health status of a host. One of the biggest challenges is the streaming data collected by all those sensors, which needs to be processed in real time. Follow-up data analysis would normally involve moving the collected big data to a cloud data center for status reporting and record tracking purposes. Therefore, an efficient cloud platform with very elastic scaling capacity is needed to support this kind of real-time streaming data application. Current cloud platforms either lack a module to process streaming data or scale only at the level of coarse-grained compute nodes. In this paper, we propose a task-level adaptive MapReduce framework. This framework extends the generic MapReduce architecture by designing each Map and Reduce task as a consistent running loop daemon. The beauty of this new framework is that the scaling capability is designed at the Map and Reduce task level, rather than at the compute-node level. This strategy is capable of not only scaling up and down in real time, but also making effective use of compute resources in the cloud data center. As a first step towards implementing this framework in a real cloud, we developed a simulator that captures workload strength and provisions just the number of Map and Reduce tasks needed, in real time. To further enhance the framework, we applied two streaming data workload prediction methods, smoothing and the Kalman filter, to estimate the unknown workload characteristics. We see a 63.1% performance improvement when using the Kalman filter method to predict the workload. We also use a real streaming data workload trace to test the framework. Experimental results show that this framework schedules the Map and Reduce tasks very efficiently as the streaming data changes its arrival rate.
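
The workload prediction step described above can be illustrated with a minimal sketch. It assumes a scalar Kalman filter with a random-walk model of the arrival rate; the function names, the variance values and the per-task capacity are hypothetical and are not taken from the paper.

```python
import math


def kalman_predict_rates(observations, process_var=1.0, measurement_var=4.0):
    """Return one-step-ahead arrival-rate predictions and the final estimate.

    Assumption: the true rate follows a random walk, so the prediction for
    the next interval is simply the current filtered estimate.
    """
    estimate = float(observations[0])  # initial state: first observed rate
    error_var = 1.0                    # initial uncertainty (assumed value)
    predictions = []
    for observed in observations:
        # Predict: the estimate is carried over, its uncertainty grows.
        error_var += process_var
        predictions.append(estimate)
        # Update: blend the prediction with the new observation.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (observed - estimate)
        error_var *= 1.0 - gain
    return predictions, estimate


def tasks_needed(predicted_rate, per_task_capacity):
    """Provision enough tasks for the predicted rate, with at least one task."""
    return max(1, math.ceil(predicted_rate / per_task_capacity))
```

A scheduler could call `tasks_needed` on the latest estimate at each interval to decide how many Map or Reduce tasks to keep running.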

4.
This study presents a model for the early identification of students who are likely to fail in an academic course. To enhance predictive accuracy, sentiment analysis is used to identify affective information from text‐based self‐evaluated comments written by students. Experimental results demonstrated that adding extracted sentiment information from student self‐evaluations yields a significant improvement in early‐stage prediction quality. The results also indicate the limited early‐stage predictive value of structured data, such as homework completion, attendance, and exam grades, due to data sparseness at the beginning of the course. Thus, applying sentiment analysis to unstructured data (e.g., self‐evaluation comments) can play an important role in improving the accuracy of early‐stage predictions. The findings present educators with an opportunity to provide students with real‐time feedback and support to help students become self‐regulated learners. Using the exploratory results to improve teaching and learning initiatives is important for maintaining students' performance and the effectiveness of the learning process.

5.
Recent years have witnessed a rapid spread of multi-modality microblogs like Twitter and Sina Weibo, composed of images, text and emoticons. Visual sentiment prediction for such microblog-based social media has recently attracted ever-increasing research focus, with broad application prospects. In this paper, we give a systematic review of the recent advances and cutting-edge techniques for visual sentiment analysis. To this end, we review the most recent works on this topic and give a detailed comparison and experimental evaluation of the cutting-edge methods. We further reveal and discuss the future trends and potential directions for visual sentiment prediction.

6.
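With the rapid development of information technology in the big data era, the volume of heterogeneous data in network architectures that rely on various hardware protection devices grows exponentially every day. Traditional network security protection techniques cannot be applied effectively to network security tasks such as analysis and prediction over such massive data. As a result, the storage, use and analysis of massive data, including information mining and predictive data analysis, have increasingly drawn attention across society and become a current research trend. Taking massive heterogeneous data as the research object, this paper identifies the typical characteristics of network security big data and, drawing on the main methods of intelligence forecasting, proposes a network security predictive analysis technique for big data settings. The technique improves the ability to identify, predict and give early warning of network security risks, and effectively improves network protection.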

7.
With the development of the Internet, people are more likely to post and propagate opinions online. Sentiment analysis has therefore become an important challenge in understanding the polarity beneath these comments. Currently, many approaches from the perspective of natural language processing have been employed to conduct this task. The widely used ones include bag-of-words and semantic-oriented analysis methods. In this research, we further investigate the structural information among words, phrases and sentences within the comments to conduct the sentiment analysis. The idea is inspired by the fact that structural information plays an important role in identifying the overall polarity of a statement. As a result, a novel sentiment analysis model is proposed based on a recurrent neural network, which takes a partial document and then its subsequent parts as input to predict the sentiment label distribution rather than the next word. The proposed method learns word representations and the sentiment distribution simultaneously. Experimental studies have been conducted on commonly used datasets, and the results have shown its promising potential.

8.
Incremental feature extraction is effective for facilitating the analysis of large-scale streaming data. However, most current incremental feature extraction methods are not suitable for processing streaming data with high feature dimensions because only a few methods have low time complexity, which is linear with both the number of samples and features. In addition, feature extraction methods need to improve the performance of further classification. Therefore, incremental feature extraction methods need to be more efficient and effective. Partial least squares (PLS) is known to be an effective dimension reduction technique for classification. However, the application of PLS to streaming data is still an open problem. In this study, we propose a highly efficient and powerful dimension reduction algorithm called incremental PLS (IPLS), which comprises a two-stage extraction process. In the first stage, the PLS target function is adapted so it is incremental by updating the historical mean to extract the leading projection direction. In the second stage, the other projection directions are calculated based on the equivalence between the PLS vectors and the Krylov sequence. We compared the performance of IPLS with other state-of-the-art incremental feature extraction methods such as incremental principal components analysis, incremental maximum margin criterion, and incremental inter-class scatter using real streaming datasets. Our empirical results showed that IPLS performed better than other methods in terms of its efficiency and further classification accuracy.
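
The "updating the historical mean" step in the first stage can be illustrated with a short sketch. It shows only a generic running-mean update over a stream of feature vectors, not the IPLS algorithm itself; the function name `update_mean` is hypothetical.

```python
def update_mean(mean, count, sample):
    """Fold one new feature vector into a running mean.

    Uses the identity mean_new = mean_old + (x - mean_old) / n, so the
    historical samples never need to be stored or revisited.
    """
    new_count = count + 1
    new_mean = [m + (x - m) / new_count for m, x in zip(mean, sample)]
    return new_mean, new_count


# Usage: process a stream one sample at a time.
mean, count = [0.0, 0.0], 0
for sample in [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]:
    mean, count = update_mean(mean, count, sample)
```

Each update costs time linear in the number of features, which is consistent with the low-complexity requirement the abstract states for incremental methods.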

9.
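Text sentiment classification refers to judging the sentiment orientation of a text by mining and analysing subjective information in it, such as viewpoints, opinions and attitudes. This paper proposes a text sentiment analysis method based on an ensemble of sentiment member models. Three sentiment classification models, based respectively on an improved neural network, semantic features and conditional random fields, are integrated as member models. The integrated model covers different sentiment features and thereby overcomes a shortcoming of traditional ensemble learning, which attends only to the outputs of the member models. In experiments on public corpora, the ensemble model combines the strengths of its member models and reaches a classification accuracy of 88.2%, far higher than that of any single member model.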

10.
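Streaming data sensed by the Internet of Things consists mostly of time-series data and is characterised by large volume, continuous arrival and multiple sources. Existing HBase-based storage systems for traffic stream data still suffer from low storage efficiency and limited availability when the volume of concurrent writes is high. To address this problem, a real-time storage system for multi-source streaming data based on load balancing is designed and implemented. The system extends the data agent into a cluster architecture and proposes a load-balancing task scheduling algorithm that matches tasks to data agents in order, so that the data agent cluster processes tasks with balanced load and stores data into the HBase database in parallel. Comparative experiments show that the system keeps the data allocation ratio of each data agent between 0.3 and 0.4 and writes data into HBase at about 1.5 times the speed of a single data agent.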
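
As a rough illustration of load-balanced task scheduling across a cluster of data agents, the sketch below assigns each incoming task to the currently least-loaded agent. This greedy rule is an assumption chosen for illustration; the abstract does not specify the paper's matching algorithm, and the names `assign_tasks`, `task_sizes` and `agent_count` are hypothetical.

```python
import heapq


def assign_tasks(task_sizes, agent_count):
    """Greedily assign tasks, in arrival order, to the least-loaded agent.

    Returns a dict mapping agent index to the list of task indices it
    received. Ties are broken by the lower agent index.
    """
    # Min-heap of (current load, agent index).
    heap = [(0, agent) for agent in range(agent_count)]
    heapq.heapify(heap)
    assignment = {agent: [] for agent in range(agent_count)}
    for task_id, size in enumerate(task_sizes):
        load, agent = heapq.heappop(heap)
        assignment[agent].append(task_id)
        heapq.heappush(heap, (load + size, agent))
    return assignment
```

For example, `assign_tasks([5, 3, 2, 4], 2)` gives agent 0 the tasks `[0, 3]` and agent 1 the tasks `[1, 2]`, so the loads end at 9 and 5.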

11.
Research on sentiment classification of cross-domain Chinese reviews
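This work studies the sentiment orientation of the opinion expressions associated with each evaluation target in cross-domain Chinese review sentences. Building on the algorithms commonly used for sentiment classification in a single domain, especially the product domain, and on the particular characteristics of opinion expression in cross-domain reviews, two methods are proposed for classifying the sentiment of cross-domain Chinese review sentences: one based on lexicon resources and one based on supervised machine learning. The paper discusses the algorithmic similarities and differences between cross-domain and single-domain Chinese reviews and compares the two methods. Experimental results show that the proposed methods have considerable practical value.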

12.
With the accelerated process of urbanization, more and more people tend to live in cities. In order to deal with the big data generated by citizens and public city departments, new information and communication technologies are used to process the urban data, which makes it easier to manage. Cloud computing is a novel computation technology. Since cloud computing was commercialized, many cloud-based applications have appeared. Because the cloud service is provided by a third party, the cloud is semi-trusted. Due to the features of cloud computing, there are many security issues in cloud computing. Attribute-based encryption (ABE) is a promising cryptographic technique that can be used in the cloud to solve many security issues. In this paper, we propose a framework for urban data sharing that exploits attribute-based cryptography. To fit real-world use in ubiquitous cities, we extend our scheme to support dynamic operations. In particular, from the performance analysis, it can be concluded that our scheme is secure and can resist possible attacks. Moreover, experimental results and comparisons show that our scheme is more efficient in terms of computation.

13.
Clustering techniques are very attractive for identifying and extracting patterns of interest from datasets. However, their application to very large spatial datasets presents numerous challenges such as high dimensionality, heterogeneity, and the high complexity of some algorithms. Distributed clustering techniques constitute a very good way to address the Big Data challenges (e.g., Volume, Variety, Veracity, and Velocity). In this paper, we developed and implemented a Dynamic Parallel and Distributed clustering (DPDC) approach that can analyse Big Data within a reasonable response time and produce accurate results by using existing computing and storage infrastructure, such as cloud computing. The DPDC approach consists of two phases. The first phase is fully parallel and generates local clusters; the second phase aggregates the local results to obtain global clusters. The aggregation phase is designed in such a way that the final clusters are compact and accurate while the overall process is efficient in time and memory allocation. DPDC was thoroughly tested and compared to the well-known clustering algorithms BIRCH and CURE. The results show that the approach not only produces high-quality results but also scales up very well by taking advantage of the Hadoop MapReduce paradigm or any distributed system.
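
The two-phase pattern described above (local clusters first, then aggregation) can be sketched on one-dimensional data. This is only an illustration of the local-then-aggregate structure using a simple gap-based grouping; it is not the DPDC algorithm, and the function names and the `eps` threshold are assumptions.

```python
def local_clusters(points, eps):
    """Phase 1 (run per partition): group sorted 1-D points whose
    neighbours are at most eps apart; return (centroid, size) summaries."""
    clusters, current = [], []
    for p in sorted(points):
        if current and p - current[-1] > eps:
            clusters.append((sum(current) / len(current), len(current)))
            current = []
        current.append(p)
    if current:
        clusters.append((sum(current) / len(current), len(current)))
    return clusters


def aggregate(summaries, eps):
    """Phase 2: merge local summaries whose centroids are within eps,
    weighting each centroid by its cluster size."""
    merged = []
    for centroid, size in sorted(summaries):
        if merged and centroid - merged[-1][0] <= eps:
            prev_centroid, prev_size = merged[-1]
            total = prev_size + size
            merged[-1] = ((prev_centroid * prev_size + centroid * size) / total, total)
        else:
            merged.append((centroid, size))
    return merged


# Usage: each partition is clustered independently, then summaries are merged.
partitions = [[1.0, 1.2, 9.0], [0.8, 9.2, 9.4]]
summaries = [s for part in partitions for s in local_clusters(part, eps=0.5)]
global_clusters = aggregate(summaries, eps=0.5)
```

Only the compact summaries cross partition boundaries, which is why the first phase can run fully in parallel.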

14.
We tackle the crucial challenge of fusing different modalities of features for multimodal sentiment analysis. Mainly based on neural networks, existing approaches largely model multimodal interactions in an implicit and hard-to-understand manner. We address this limitation with inspiration from quantum theory, which contains principled methods for modeling complicated interactions and correlations. In our quantum-inspired framework, the word interaction within a single modality and the interaction across modalities are formulated with superposition and entanglement respectively at different stages. The complex-valued neural network implementation of the framework achieves comparable results to state-of-the-art systems on two benchmarking video sentiment analysis datasets. In the meantime, we produce the unimodal and bimodal sentiment directly from the model to interpret the entangled decision.

15.
Information systems (IS) research has explored “effective use” in a variety of contexts. However, it is yet to specifically consider it in the context of the unique characteristics of big data. Yet, organizations have a high appetite for big data, and there is growing evidence that investments in big data solutions do not always lead to the derivation of intended value. Accordingly, there is a need for rigorous academic guidance on what factors enable effective use of big data. With this paper, we aim to guide IS researchers such that the expansion of the body of knowledge on the effective use of big data can proceed in a structured and systematic manner and can subsequently lead to empirically driven guidance for organizations. Namely, with this paper, we cast a wide net to understand and consolidate from literature the potential factors that can influence the effective use of big data, so they may be further studied. To do so, we first conduct a systematic literature review. Our review identifies 41 factors, which we categorize into 7 themes, namely data quality; data privacy and security and governance; perceived organizational benefit; process management; people aspects; systems, tools, and technologies; and organizational aspects. To explore the existence of these themes in practice, we then analyze 45 published case studies that document insights into how specific companies use big data successfully. Finally, we propose a framework for the study of effective use of big data as a basis for future research. Our contributions aim to guide researchers in establishing the relevance and relationships within the identified themes and factors and are a step toward developing a deeper understanding of effective use of big data.

16.
With the rapid development of the economy and the frequent occurrence of air pollution incidents, air pollution has become a pressing issue of public concern. Air quality big data is generally characterized by multi-source heterogeneity, dynamic mutability, and spatial–temporal correlation, and is usually analysed with big data technology after data fusion. In recent years, various models and algorithms using big data techniques have been proposed. To summarize these methodologies of air quality study, in this paper, we first classify air quality monitoring by big data techniques into three categories: the spatial model, the temporal model and the spatial–temporal model. Second, we group the typical big data methods needed in air quality forecasting into three classes, namely the statistical forecasting model, the deep neural network model, and the hybrid model, and present representative scenarios for some of them. Third, we analyze and compare some representative air pollution traceability methods in detail, classifying them into two categories: the traditional model combined with big data techniques, and the data-driven model. Finally, we provide an outlook on the future of air quality analysis with some promising and challenging ideas.

17.
Edge computing combined with artificial intelligence (AI) has enabled the timely processing and analysis of streaming data produced by IoT intelligent applications. However, it causes privacy risks due to the data exchanges between local devices and untrusted edge servers. The powerful analytical capability of AI further exacerbates the risks because it can even infer private information from insensitive data. In this paper, we propose a privacy-preserving IoT streaming data analytical framework based on edge computing, called PrivStream, to prevent the untrusted edge server from making sensitive inferences from the IoT streaming data. It utilizes a well-designed deep learning model to filter the sensitive information and combines it with differential privacy to protect against the untrusted edge server. Noise is also injected into the framework in the training phase to increase the robustness of PrivStream to differential privacy noise. Taking into account the dynamic and real-time characteristics of streaming data, we realize PrivStream with two types of models to process data segments with fixed length and variable length, respectively, and implement it on a distributed streaming platform to achieve real-time streaming data transmission. We theoretically prove that PrivStream satisfies ε-differential privacy and experimentally demonstrate that PrivStream has better performance than the state-of-the-art and has acceptable computation and storage overheads.
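
For readers unfamiliar with ε-differential privacy, the sketch below shows the generic Laplace mechanism applied to the mean of a bounded data segment. It is not the PrivStream model, which the abstract describes as combining a deep learning filter with differential privacy; the function name, the clipping bounds and the use of a segment mean are assumptions made for illustration.

```python
import random


def privatize_mean(values, lower, upper, epsilon, rng):
    """Release the mean of a segment with epsilon-differential privacy.

    Values are clipped to [lower, upper], so replacing one value changes
    the mean by at most (upper - lower) / n; Laplace noise with scale
    sensitivity / epsilon is then added.
    """
    clipped = [min(max(v, lower), upper) for v in values]
    sensitivity = (upper - lower) / len(clipped)
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return sum(clipped) / len(clipped) + noise


# Usage: a smaller epsilon gives stronger privacy and a noisier result.
rng = random.Random(0)
noisy = privatize_mean([1.0, 2.0, 3.0, 100.0], lower=0.0, upper=10.0,
                       epsilon=0.5, rng=rng)
```

The clipping step matters: without a bound on each value the sensitivity of the mean is unbounded and no finite noise scale would satisfy the guarantee.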

18.
The popularity of many social media sites has prompted both academic and practical research on the possibility of mining social media data for the analysis of public sentiment. Studies have suggested that public emotions shown through Twitter could be well correlated with the Dow Jones Industrial Average. However, it remains unclear how public sentiment, as reflected on social media, can be used to predict the stock price movement of a particular publicly listed company. In this study, we attempt to fill this research void by proposing a technique, called SMeDA-SA, to mine Twitter data for sentiment analysis and then predict the stock movement of specific listed companies. For the purpose of experimentation, we collected 200 million tweets that mentioned one or more of 30 companies listed on NASDAQ or the New York Stock Exchange. SMeDA-SA performs its task by first extracting ambiguous textual messages from these tweets to create a list of words that reflects public sentiment. SMeDA-SA then makes use of a data mining algorithm to expand the word list by adding emotional phrases so as to better classify sentiments in the tweets. With SMeDA-SA, we discover that the stock movement of many companies can be predicted rather accurately, with an average accuracy over 70%. This paper describes how SMeDA-SA can be used to mine social media data for sentiments. It also presents the key implications of our study.

19.
In big data applications, data privacy is one of the issues of greatest concern because processing large-scale privacy-sensitive data sets often requires computation resources provisioned by public cloud services. Sub-tree data anonymization is a widely adopted scheme to anonymize data sets for privacy preservation. Top–Down Specialization (TDS) and Bottom–Up Generalization (BUG) are two ways to fulfill sub-tree anonymization. However, existing approaches for sub-tree anonymization fall short in parallelization capability and therefore lack scalability in handling big data in the cloud. Moreover, TDS and BUG each suffer from poor performance for certain values of the k-anonymity parameter. In this paper, we propose a hybrid approach that combines TDS and BUG for efficient sub-tree anonymization over big data. Further, we design MapReduce algorithms for the two components (TDS and BUG) to gain high scalability. Experimental evaluation demonstrates that the hybrid approach significantly improves the scalability and efficiency of the sub-tree anonymization scheme over existing approaches.
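
The k-anonymity property that both TDS and BUG aim for can be checked with a few lines of code. The sketch below tests whether every combination of quasi-identifier values occurs at least k times and applies one bottom-up generalization step to an age attribute. The record layout, field names and the fixed-width age bands are hypothetical; the paper's MapReduce algorithms are not reproduced here.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if each quasi-identifier combination appears in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


def generalize_age(record, width):
    """One generalization step: replace an exact age with a band of `width` years."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Usage: exact ages are unique, so the raw table is not 2-anonymous,
# but ten-year bands put two records into each group.
raw = [
    {"age": 23, "zip": "100", "diagnosis": "flu"},
    {"age": 27, "zip": "100", "diagnosis": "cold"},
    {"age": 31, "zip": "100", "diagnosis": "flu"},
    {"age": 38, "zip": "100", "diagnosis": "asthma"},
]
banded = [generalize_age(r, 10) for r in raw]
```

Generalizing further than needed costs data utility, which is the trade-off that motivates choosing between top-down and bottom-up strategies depending on k.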

20.
Information & Management, 2016, 53(8): 964-977
As taxi service is supervised by electronic equipment (e.g., global positioning system (GPS) equipment) and network techniques (e.g., cab reservation through Uber in the USA or DIDI in China), the taxi business is a typical electronic commerce mode. For a long time, taxi service has faced a typical challenge: passengers may be detoured and overcharged by some unethical taxi drivers, especially when traveling in unfamiliar cities. As a result, it is important to detect taxi drivers’ misbehavior through real-time analysis of taxi GPS big data in order to enhance the quality of taxi services. In view of this challenge, an online anomalous trajectory detection method, named OnATrade (pronounced “on a trade,” which means activities in a taxi trade on the fly), is investigated in this paper for improving taxi service using GPS big data. The method mainly consists of two steps: route recommendation and online detection. In the first step, route candidates are generated by using a route recommendation algorithm. In the second step, an online anomalous trajectory detection approach is presented to find taxis that have driving anomalies. Experiments evaluate the validity of our method on large-scale, real-world taxi GPS trajectories. Finally, several value-added applications benefiting from big data analysis over taxi GPS data sets are discussed for potential commercial applications.
