Similar literature
Found 20 similar documents; search took 31 ms
1.

Creating an interactive, accurate, and low-latency big data visualisation is challenging because of the volume, variety, and velocity of the data. Visualisation options range from visualising the entire dataset, which can be slow and taxing on the system, to visualising a small subset, which is faster and lighter but can yield a less useful visualisation because of information loss. The main research questions investigated in this work are what effect sampling has on visualisation insight and how to guide users in navigating this trade-off. To investigate these issues, we study an initial case of simple estimation tasks on histogram visualisations of sampled big data, in the hope that the results generalise. Leveraging sampling, we generate subsets of large datasets and create visualisations for a crowd-sourced study involving a simple cognitive visualisation task. Using the results of this study, we quantify insight, sampling, visualisation, and perception error in comparison to the full dataset. We use these results to model the relationship between sample size and insight error, and we propose the use of our model to guide big data visualisation sampling.
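The sampling side of this trade-off can be illustrated with a minimal sketch. It is not the authors' model: it only measures how far the histogram of a uniform random sample deviates from the histogram of the full data, and it says nothing about insight or perception error. The function names, the synthetic Gaussian data, and the bin edges are all assumptions made for the illustration.

```python
import random


def histogram(values, edges):
    """Relative frequency of values in each bin [edges[i], edges[i+1])."""
    if not values:
        raise ValueError("values must not be empty")
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    # Values outside the edges are not counted in any bin but still
    # contribute to the denominator.
    return [c / len(values) for c in counts]


def sampling_error(full, sample_size, edges, rng):
    """Largest per-bin gap between the full-data and sampled histograms."""
    sample = rng.sample(full, sample_size)  # sampling without replacement
    full_hist = histogram(full, edges)
    sample_hist = histogram(sample, edges)
    return max(abs(a - b) for a, b in zip(full_hist, sample_hist))


# Hypothetical data: 10,000 draws from a Gaussian, binned in steps of 10.
rng = random.Random(0)
data = [rng.gauss(50.0, 15.0) for _ in range(10_000)]
edges = [float(x) for x in range(0, 101, 10)]
for n in (100, 1_000, 10_000):
    print(n, round(sampling_error(data, n, edges, rng), 4))
```

When the sample is the whole dataset the error is exactly zero; for smaller samples it varies from run to run, which is why a model relating sample size to error, as the abstract proposes, would be fitted over many repetitions.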

2.
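Research on Spatio-Temporal Data Representation   Total citations: 1, self-citations: 0, citations by others: 1
This paper describes the current development of temporal data models and spatio-temporal data models. It is now widely agreed that time is an important attribute of any information. However, the basic idea of temporal relational algebra in temporal databases is to model simply structured temporal data by making time-varying semantics explicit on the relational schema, whereas temporal object algebra in spatio-temporal data modelling models complex-structured temporal data by making time-varying semantics explicit on the object structure. The paper also examines in depth how the time dimension of geographic information is represented in spatio-temporal data models and points out the main problems of the various spatio-temporal data models.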

3.

Community detection (or clustering) in large-scale graphs is an important problem in graph mining. Communities reveal interesting organizational and functional characteristics of a network. The Louvain algorithm is an efficient sequential algorithm for community detection. However, such sequential algorithms fail to scale to emerging large-scale data. Scalable parallel algorithms are necessary to process large graph datasets. In this work, we present a comparative analysis of our different parallel implementations of the Louvain algorithm. We design parallel algorithms for the Louvain method in shared-memory and distributed-memory settings. Developing distributed-memory parallel algorithms is challenging because of inter-process communication and load-balancing issues. We incorporate dynamic load balancing in our final algorithm, DPLAL (Distributed Parallel Louvain Algorithm with Load-balancing). DPLAL overcomes the performance bottleneck of the previous algorithms and shows around a 12-fold speedup when scaling to a larger number of processors. We also compare the performance of our algorithm with some other prominent algorithms in the literature and obtain better or comparable performance. We identify the challenges in developing distributed-memory algorithms and provide an optimized solution, DPLAL, with a performance analysis of the algorithm on large-scale real-world networks from different domains.


4.
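With the rapid accumulation of biomedical literature, using text mining techniques to process massive volumes of scientific literature and thereby discover new knowledge in the life sciences has become a research hotspot in data mining and artificial intelligence. Since Swanson first proposed literature-based knowledge discovery for biomedicine, many researchers have joined this emerging field. This paper systematically analyses and describes the research content, methods, and results of biomedical literature-based knowledge discovery, compares the strengths and weaknesses of different methods in the text mining process, and discusses future trends in the field.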

5.
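With the rapid accumulation of biomedical literature, using text mining techniques to process massive volumes of scientific literature and thereby discover new knowledge in the life sciences has become a research hotspot in data mining and artificial intelligence. Since Swanson first proposed literature-based knowledge discovery for biomedicine, many researchers have joined this emerging field. This paper systematically analyses and describes the research content, methods, and results of biomedical literature-based knowledge discovery, compares the strengths and weaknesses of different methods in the text mining process, and discusses future trends in the field.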

6.
Social networking sites such as Facebook or Twitter attract millions of users, who every day post an enormous amount of content in the form of tweets, comments and posts. Since social network texts are usually short, learning tasks have to deal with a very high-dimensional and sparse feature space in which most features have low frequencies. As a result, extracting useful knowledge from such noisy data is a challenging task, which makes large-scale short-text learning in social environments one of the most relevant problems in machine learning and data mining. Feature selection is one of the best-known and most commonly used techniques for reducing the impact of the high-dimensional feature space in text learning. A wide variety of feature selection techniques applied to traditional long texts and document collections can be found in the literature. However, short texts coming from the social Web pose new challenges to this well-studied problem: their shortness offers limited context from which to extract enough statistical evidence about word relations (e.g. correlation), and instances usually arrive in continuous streams (e.g. the Twitter timeline), so that the number of features and instances is unknown, among other problems. This paper surveys feature selection techniques for dealing with short texts in both offline and online settings. Open issues and research opportunities for performing online feature selection over social media data are then discussed.

7.
Feature construction has been studied extensively, including for 0/1 data samples. Given the recent breakthroughs in closedness-related constraint-based mining, we consider its impact on feature construction for classification tasks. We investigate the use of condensed representations of frequent itemsets based on closedness properties as new features. These itemset types have been proposed to avoid set counting in difficult association rule mining tasks, i.e. when data are noisy and/or highly correlated. However, our conjecture is that their intrinsic properties (such as maximality for closed itemsets and minimality for δ-free itemsets) should have an impact on feature quality. Understanding this remains fairly open, and we discuss these issues through itemset properties on the one hand and an experimental validation on various data sets (possibly noisy) on the other.

8.
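An Object-Oriented Spatio-Temporal Data Model for Urban Transportation Planning   Total citations: 1, self-citations: 0, citations by others: 1
This paper describes the spatio-temporal characteristics of transportation planning information. Taking urban transportation planning as the background, it studies the design ideas and methods of a spatio-temporal data model for urban transportation planning and proposes such a model. The model adopts a vector structure and is generalised using object-oriented techniques. The paper focuses on the spatio-temporal representation of area, line, and point features in basic transportation planning information.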

9.
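The rapid development of sensor technology has produced large volumes of traffic trajectory data, and trajectory anomaly detection has important application value in areas such as intelligent transportation, autonomous driving, and video surveillance. Unlike trajectory mining tasks such as classification, clustering, and prediction, trajectory anomaly detection aims to discover low-probability, uncertain, and rare trajectory behaviours. Common challenges in trajectory anomaly detection relate to outlier types, trajectory data labels, detection accuracy, and computational complexity. To address these problems, this paper comprehensively surveys the state of research and the latest progress in trajectory anomaly detection over the past 20 years. First, the characteristics of the trajectory anomaly detection problem and the current research challenges are analysed. Then, existing trajectory anomaly detection algorithms are compared according to classification criteria such as the availability of trajectory labels, the principle of the detection algorithm, and whether the algorithm works offline or online. For each category of anomaly detection technique, the algorithmic principles, representative methods, complexity analysis, and strengths and weaknesses are summarised and analysed in detail. Next, open-source trajectory datasets, commonly used evaluation methods, and anomaly detection tools are discussed. On this basis, a system architecture for trajectory anomaly detection is presented, forming a relatively complete trajectory mining pipeline from trajectory data collection to anomaly detection applications. Finally, key open problems in the field are summarised, and future research trends and possible solutions are discussed.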

10.
Visualising how social networks evolve is important in intelligence analysis in order to detect and monitor issues such as emerging crime patterns or rapidly growing groups of offenders. It remains an open research question how this type of information should be presented for visual exploration. To get a sense of how users work with different types of visualisations, we evaluate a matrix and a node-link diagram in a controlled think-aloud study. We describe the sense-making strategies that users adopted during explorative and realistic tasks. We focus on user behaviour in switching between the two visualisations and propose a set of nine strategies. Based on a qualitative and quantitative content analysis, we show which visualisation better supports which strategy. We find that the two visualisations clearly support intelligence tasks and that for some tasks their combined use is more advantageous than the use of an individual visualisation.

11.
In this paper, we consider instance selection as an important focusing task in the data preparation phase of knowledge discovery and data mining. Focusing generally covers all issues related to data reduction. First of all, we define a broader perspective on focusing tasks, choose instance selection as one particular focusing task, and outline the specification of concrete evaluation criteria to measure success of instance selection approaches. Thereafter, we present a unifying framework that covers existing approaches towards solutions for instance selection as instantiations. We describe specific examples of instantiations of this framework and discuss their strengths and weaknesses. Then, we outline an enhanced framework for instance selection, generic sampling, and summarize example evaluation results for several different instantiations of its implementation. Finally, we conclude with open issues and research challenges for instance selection as well as focusing in general.

12.
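Spatio-temporal trajectory data are becoming ever easier to acquire. Trajectory data capture the behaviour patterns and activity regularities of moving objects, faithfully reflecting their movement patterns and behavioural characteristics in a spatio-temporal environment, and have important application value in urban planning, traffic management, service recommendation, location prediction, and other fields. These applications usually require pattern mining of the spatio-temporal trajectory data. This paper briefly describes the preprocessing and basic steps of trajectory data mining, summarises the classification of anomalous trajectory detection methods, analyses and summarises four kinds of trajectory-based pattern mining from recent years, and reviews trajectory data mining from a management decision-making perspective, with the aim of providing literature and a theoretical basis for pattern mining and management decision-making based on trajectory data.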

13.
Ergonomics, 2012, 55(5): 659-673
In recent years, advances in sensor technology, connectedness and computational power have come together to produce huge data-sets. The treatment and analysis of these data-sets is known as big data analytics (BDA), a term somewhat related to data mining. Fields allied to human factors/ergonomics (HFE), e.g. statistics, have developed computational methods to derive meaningful, actionable conclusions from these databases. This paper examines BDA, often characterised by volume, velocity and variety, giving examples of successful BDA use. This examination provides context by considering examples of using BDA on human data, using BDA in HFE studies, and studies of how people perform BDA. Significant issues for HFE are the reliance of BDA on correlation rather than hypotheses and theory, the ethics of BDA and the use of HFE in data visualisation.

14.
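Ma Hui, Tang Yong, Wu Lingkun. 《计算机科学》 (Computer Science), 2011, 38(4): 221-225
Correctly discovering how processes actually run is important for workflow management. Workflow mining extracts information from system logs and mines a model of how the process really operates. Mining hidden tasks is one of the open problems in workflow mining. Based on the α algorithm, this paper proposes a mining algorithm, αH, that can mine hidden tasks. It analyses the situations in which hidden tasks can occur, adds hidden tasks to the workflow net by examining the positional relationships of parallel tasks, and then merges identical hidden tasks and removes redundant ones to refine the resulting model. A prototype of the algorithm was implemented; experiments confirm the feasibility and effectiveness of the method, and its shortcomings are analysed.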

15.
The State of the Art in Flow Visualisation: Feature Extraction and Tracking   Total citations: 3, self-citations: 0, citations by others: 3
Flow visualisation is an attractive topic in data visualisation, offering great challenges for research. Very large data sets must be processed, consisting of multivariate data at large numbers of grid points, often arranged in many time steps. Recently, the steadily increasing performance of computers has again become a driving force for new advances in flow visualisation, especially in techniques based on texturing, feature extraction, vector field clustering, and topology extraction. In this article we present the state of the art in feature-based flow visualisation techniques. We present numerous feature extraction techniques, categorised according to the type of feature. Next, feature tracking and event detection algorithms are discussed, for studying the evolution of features in time-dependent data sets. Finally, various visualisation techniques are demonstrated. ACM CSS: I.3.8 Computer Graphics - applications

16.

Many computer vision-based techniques utilize semantic information, i.e. scene text present in a natural scene, for image analysis. Consequently, researchers have recently paid more attention to key tasks such as scene text detection, recognition, and end-to-end systems. In this survey, we give a comprehensive review of recent advances on these key tasks. The review focuses first on the traditional methods and their categorization, and shows the evolution of scene text detection, recognition methods, and end-to-end systems with their pros and cons. Second, the survey focuses on the latest state-of-the-art (SOTA) methods based on transfer learning and additionally covers extensions of scene text reading systems, i.e. salient text detection, text or non-text image classification, the fusion of scene text in vision and language, etc. After that, we present a performance analysis of various SOTA methods with respect to the key issues and techniques. Finally, we discuss the evaluation metrics and standard datasets on which the SOTA methods for scene text detection are investigated and compared.


17.
A Data Mining Approach for Retailing Bank Customer Attrition Analysis   Total citations: 3, self-citations: 1, citations by others: 3
Deregulation within the financial service industries and the widespread acceptance of new technologies are increasing competition in the finance marketplace. Central to the business strategy of every financial service company is the ability to retain existing customers and reach new prospective customers. Data mining plays an important role in these efforts. In this paper, we present a data mining approach for analyzing retail bank customer attrition. We discuss challenging issues such as highly skewed data, time series data unrolling, and leaker field detection, as well as the procedure of a data mining project for attrition analysis of retail bank customers. We use lift as the measure for attrition analysis and compare the lift of data mining models including a decision tree, a boosted naïve Bayesian network, a selective Bayesian network, a neural network, and an ensemble of classifiers of the above methods. Some interesting findings are reported. Our research demonstrates the effectiveness and efficiency of data mining in attrition analysis for retail banking.
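The lift measure named in this abstract can be sketched as follows, using the common definition: the attrition rate among the top-scored fraction of customers divided by the overall attrition rate. The abstract does not give the paper's exact formulation, so the function name `lift_at`, the rounding rule for the cut-off, and the toy scores and labels are assumptions for illustration only.

```python
def lift_at(scores, labels, fraction):
    """Lift of the top `fraction` of customers ranked by predicted score.

    scores: predicted attrition scores (higher means more likely to leave).
    labels: 1 if the customer actually left, 0 otherwise.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equal length")
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("lift is undefined when no customer left")
    # Rank customers from highest to lowest score and keep the top slice.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(len(ranked) * fraction))
    top_rate = sum(label for _, label in ranked[:k]) / k
    return top_rate / base_rate


# Hypothetical toy data: 10 customers, 2 of whom left (base rate 0.2).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(lift_at(toy_scores, toy_labels, 0.2))
```

A lift of 1 means the model does no better than contacting customers at random. Because the measure compares against the base rate, it stays informative on highly skewed data, where plain accuracy would be dominated by the majority class.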

18.

Social networking platforms have witnessed tremendous growth in textual, visual, audio, and mixed-mode content for expressing views or opinions. Hence, Sentiment Analysis (SA) and Emotion Detection (ED) of social networking posts, blogs, and conversations are very useful and informative for mining opinions on different issues, entities, or aspects. Various statistical and probabilistic models based on lexical and machine learning approaches have been employed for these tasks. The emphasis on improving contemporary tools, techniques, models, and approaches is reflected in the majority of the literature. With recent developments in deep neural networks, various deep learning models are being heavily experimented with to enhance accuracy in the aforementioned tasks. Recurrent Neural Networks (RNNs) and their architectural variants, such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), comprise an important category of deep neural networks, adapted for feature extraction from temporal and sequential inputs. Because the input to SA and related tasks may be visual, textual, audio, or any combination of these, and is inherently sequential, we critically investigate the role of sequential deep neural networks in sentiment analysis of multimodal data. Specifically, we present an extensive review of the applicability, challenges, issues, and approaches for textual, visual, and multimodal SA using RNNs and their architectural variants.


19.
Hu Rui, Yan Zheng, Ding Wenxiu, Yang Laurence T. World Wide Web, 2020, 23(2): 1441-1463

The Internet of Things (IoT), as a typical representation of cyberization, enables the interconnection of physical things and the Internet, providing intelligent and advanced services for industrial production and human lives. However, the heterogeneity, complexity, and dynamic nature of IoT also bring new challenges to IoT applications. In particular, it is difficult to determine the sources of specified data, which are vulnerable to attacks inserted by different parties during data transmission and processing. To solve these issues, data provenance is introduced, which records data origins and the history of data generation and processing, making it possible to trace the sources and causes of any problems. Though some related research has been proposed, the literature still lacks a comprehensive survey on data provenance in IoT. In this paper, we first propose a number of design requirements for data provenance in IoT by analyzing the features of IoT data and applications. Then, we provide an in-depth review of existing schemes for IoT data provenance and employ the requirements to discuss their pros and cons. Finally, we summarize a number of open issues to direct future research.


20.

One of the most challenging safety problems in open pit mines is backbreak during blasting operations, and its prediction is very important for a technically and economically successful mining operation. This paper presents an application of the particle swarm optimization (PSO) technique to estimate the backbreak induced by bench blasting, based on major controllable blasting parameters. Two forms of PSO models, linear and quadratic, are developed based on blasting data from the Sungun copper mine, Iran. According to the results obtained, both models can be used to predict backbreak, but a comparison of the two models in terms of statistical performance indices shows that the quadratic form provides better results than the linear form.

