Similar Documents
20 similar documents found (search time: 368 ms)
1.
To analyze and visualize big data on cigarette sales in Guizhou, an ETL process model was built on the open-source data analysis tool KETTLE. The existing cigarette-sales base data were extracted, transformed, and loaded to produce analytical data, achieving fast and efficient data integration. The integrated data produced by the ETL process provide the data foundation for the cigarette-sales big data visualization system and for further data mining and decision support. In addition, a big data visual analysis system for cigarette sales was designed and implemented using GIS platform construction and visualization technology, giving the enterprise a platform for presenting and analyzing data in support of its decisions.
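
The abstract describes an extract-transform-load (ETL) pipeline built with KETTLE. As a rough illustration of the same three steps outside KETTLE, the Python sketch below extracts raw sales records from a CSV file, normalizes a few fields, and loads the result into an analytical table; the file name, column names, and SQLite target are hypothetical, not taken from the paper.

```python
import csv
import sqlite3

def extract(path):
    # Extract: read raw sales records from a CSV export (hypothetical file layout).
    with open(path, newline="", encoding="utf-8") as f:
        yield from csv.DictReader(f)

def transform(rows):
    # Transform: normalize types and derive an analysis-friendly revenue field.
    for r in rows:
        yield {
            "region": r["region"].strip(),
            "month": r["month"],
            "quantity": int(r["quantity"]),
            "revenue": int(r["quantity"]) * float(r["unit_price"]),
        }

def load(rows, db_path="sales_analytics.db"):
    # Load: write the transformed records into an analytical table.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS sales "
                 "(region TEXT, month TEXT, quantity INTEGER, revenue REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (:region, :month, :quantity, :revenue)", rows)
    conn.commit()
    conn.close()

if __name__ == "__main__":
    load(transform(extract("raw_sales.csv")))
```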

2.
The tobacco R&D system spans a wide range of business processes and involves large, heterogeneous data, which hinders the adoption of big data analysis technology in the field. A big data analysis platform for tobacco R&D applications was therefore built. Following an integration-oriented view of enterprise-wide data use, it encapsulates the collection and storage of multi-source, heterogeneous tobacco R&D data; it adopts an application-scenario-driven strategy with layered data fusion to manage data by subject in a standardized way; and it provides a visual orchestrator for data analysis workflows that simplifies mining value from data assets. The platform lets non-expert data users focus on the tobacco R&D business itself and helps promote the application of big data analysis technology in the tobacco R&D system.

3.
With the rapid development of high-performance computers and the related hardware and software, numerical simulations keep growing in scale and confidence, and the data fields they produce are becoming ever larger and more complex, requiring more advanced scientific visualization methods to analyze the results. For today's terabyte-scale and larger data fields, however, sluggish interactive performance has become a major obstacle to data analysis. Raising the interaction speed of visual analysis and reducing I/O on the data fields are key problems that current visualization systems must solve, and multi-resolution techniques are one of the main means of achieving interactive visualization of large-scale data fields. JaVis, a self-developed large-scale parallel and distributed data analysis and visualization system, uses multi-resolution techniques to speed up interaction. The key implementation techniques include the organization of multi-resolution data, the generation of multi-resolution control plug-ins, and switching between resolution levels; performance and reliability were tested on data from the physical sciences.
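
The key idea behind multi-resolution visualization of large data fields is to store the field at several levels of detail and switch levels interactively, so that only a fraction of the full data needs to be read and rendered. The sketch below, written against NumPy with invented names, builds a simple level-of-detail pyramid for a 2-D scalar field by repeated 2x2 averaging and picks the finest level that still fits an interactive cell budget; it illustrates the general technique only, not the scheme used in JaVis.

```python
import numpy as np

def build_pyramid(field, levels=4):
    # Level 0 is the full-resolution field; each further level halves both axes
    # by averaging 2x2 blocks, so level k holds roughly 1/4**k of the cells.
    pyramid = [field]
    for _ in range(levels - 1):
        f = pyramid[-1]
        h, w = (f.shape[0] // 2) * 2, (f.shape[1] // 2) * 2
        coarser = f[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        pyramid.append(coarser)
    return pyramid

def pick_level(pyramid, cell_budget):
    # Choose the finest level whose size stays within the interactive budget,
    # falling back to the coarsest level if none fits.
    for level, f in enumerate(pyramid):
        if f.size <= cell_budget:
            return level, f
    return len(pyramid) - 1, pyramid[-1]

field = np.random.rand(4096, 4096)        # stand-in for a simulation data field
pyramid = build_pyramid(field, levels=6)
level, view = pick_level(pyramid, cell_budget=1_000_000)
print(f"rendering level {level} with {view.size} cells instead of {field.size}")
```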

4.
With breakthrough advances in artificial intelligence, research at the intersection of AI and visualization has become one of the current research hot spots, providing heuristic theories, methods, and techniques for several core problems in AI and big data analysis. On one hand, innovative applications of AI raise the efficiency of visual analysis and extend its functionality, providing powerful tools for big data visual analytics. On the other hand, visualization techniques enhance the interpretability of AI, most notably deep learning...

5.
A Survey of Big Data Visual Analytics. Total citations: 8 (self-citations: 0, citations by others: 8)
任磊, 杜一, 马帅, 张小龙, 戴国忠. 《软件学报》, 2014, 25(9): 1909-1936
Visual analytics is an important approach to big data analysis. Big data visual analytics aims to exploit the automated analysis capabilities of computers while fully drawing on humans' cognitive strengths in perceiving visual information, organically combining the strengths of both and, through interactive analysis methods and interaction techniques, helping people gain more intuitive and efficient insight into the information, knowledge, and wisdom behind big data. Starting from the combined perspective of cognition, visualization, and human-computer interaction emphasized in the visual analytics field, this survey analyzes the foundational theories that support big data visual analytics, including cognitive theories of the analysis process, information visualization theory, and theories of human-computer interaction and user interfaces. On this basis it discusses information visualization techniques for mainstream big data applications, namely visualization of text, networks (graphs), spatio-temporal data, and multidimensional data. It also examines the interaction techniques that support visual analytics, including interface metaphors and interaction components for the visual analysis process; multi-scale, multi-focus, and multi-facet interaction techniques; and post-WIMP natural interaction techniques. Finally, it points out the bottlenecks and technical challenges facing the field of big data visual analytics.

6.
滕琴, 陈一民. 《计算机时代》, 2022, (10): 130-135
This paper proposes an implementation method for a VR-based big data visualization teaching system. It gives design principles for such a system and proposes its architecture on the basis of large-scale seamless tiled display with parallel real-time rendering, multi-modal somatosensory interaction for data visualization, video-on-demand for large screens, and panoramic stitching and display for big data visualization. It presents a cognitive model of the big data analysis process, methods for analyzing and presenting high-dimensional data, and visualization-based assessment of data understanding and usability. A big data visualization virtual classroom was developed with Unity3D and C#. Practice shows that the system can effectively raise students' interest and learning efficiency and deepen their insight into and understanding of the nature of big data.

7.
李相霏, 韩珂. 《计算机时代》, 2021, (12): 60-63, 68
Against the background of the COVID-19 pandemic, open epidemic data are obtained with web crawlers, processed into datasets, and then visualized and analyzed with visualization techniques. The paper describes the data crawling and processing methods, uses the Python Flask framework and related visualization technologies to visualize the epidemic data, and presents and analyzes the data from multiple angles and aspects, helping the public fully understand the epidemic situation across the country, its development trend, and vaccination progress.
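
The abstract mentions serving processed epidemic data for visualization through Python's Flask framework. A minimal sketch of that pattern follows: a Flask route returns a cleaned time series as JSON for a front-end chart to plot. The data file, field names, and route are hypothetical examples rather than the paper's actual implementation.

```python
import json
from flask import Flask, jsonify

app = Flask(__name__)

def load_trend(path="epidemic.json"):
    # Assume the crawler has already written records like
    # {"date": "2021-01-01", "confirmed": 123, "vaccinated": 4567}.
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    return sorted(records, key=lambda r: r["date"])

@app.route("/api/trend")
def trend():
    # A front-end chart (ECharts, Chart.js, etc.) fetches this JSON
    # and draws the confirmed-case and vaccination curves.
    records = load_trend()
    return jsonify({
        "dates": [r["date"] for r in records],
        "confirmed": [r["confirmed"] for r in records],
        "vaccinated": [r["vaccinated"] for r in records],
    })

if __name__ == "__main__":
    app.run(debug=True)
```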

8.
谢然. 《互联网周刊》, 2014, (11): 32-34
As the volume and complexity of urban, traffic, and meteorological data grow by the day, the demand for visualization keeps increasing, and data analysis by visual means will become the industry standard. With the maturing of upstream and downstream industries and continued policy support, visualization technology is bound to shine in the big data industry.

9.
With the rapid development of big data, cloud computing, and Internet technology, online media targeting different groups of users have emerged in large numbers and generate large amounts of user behavior data. How to automatically collect, integrate, and visually analyze such massive data has become a central concern in teaching data analysis courses. Using Python-based data scraping tools, the course follows the workflow of mining and crawling textual user behavior data, and sets up teaching modules on web data collection, machine learning, and visual analysis. Students are taught big data technology, the text mining process, and ways of presenting results visually, which optimizes the teaching context, course content, demonstration and practice methods, and teacher-student interaction, so that students master data crawling techniques and the big data analysis workflow and become the big data statistics and analysis talent the industry needs.
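
To make the collection-then-analysis workflow in this teaching setup concrete, here is a small Python sketch in the same spirit: it fetches a page of user comments with requests, extracts the text with BeautifulSoup, and counts word frequencies as a first text-mining step before visualization. The URL and CSS selector are placeholders; a real course would substitute its own data source.

```python
from collections import Counter

import requests
from bs4 import BeautifulSoup

def collect_comments(url):
    # Collection step: download the page and pull out the comment text.
    # The selector ".comment" is a placeholder for whatever the real site uses.
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    return [node.get_text(strip=True) for node in soup.select(".comment")]

def word_frequencies(comments, top_n=20):
    # Simple text-mining step: whitespace tokenization and frequency counting.
    # A real pipeline would add segmentation (e.g. jieba for Chinese) and stop words.
    counter = Counter()
    for text in comments:
        counter.update(text.lower().split())
    return counter.most_common(top_n)

if __name__ == "__main__":
    comments = collect_comments("https://example.com/comments")
    for word, count in word_frequencies(comments):
        print(word, count)
```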

10.
A Survey of Big Data Systems and Analysis Techniques. Total citations: 15 (self-citations: 0, citations by others: 15)
The survey first classifies data by processing form and introduces the characteristics of each form, its typical application scenarios, and representative processing systems, then summarizes three major development trends of big data processing systems. It next briefly reviews the big data analysis techniques and applications these systems support, including deep learning, knowledge computing, social computing, and visualization, and summarizes the key role each plays in analyzing and understanding big data. Finally, it sorts out the challenges of data complexity, computational complexity, and system complexity facing big data processing and analysis, and proposes possible countermeasures for each.

11.
12.
With the rapid development of the Internet, in particular the emergence in recent years of cloud computing, the Internet of Things, and other new technologies, and the wide use of services such as social networks, the volume of data produced by human society is growing rapidly and the big data era has arrived. How to acquire and analyze big data has become a widespread problem, and the security of that data deserves serious attention. Starting from the concept and characteristics of big data, this paper describes the security challenges that big data faces and proposes strategies for addressing them.

13.
Time series analysis has always been an important and interesting research field due to its frequent appearance in different applications. In the past, many approaches based on regression, neural networks and other mathematical models were proposed to analyze time series. In this paper, we attempt to use data mining techniques to analyze time series. Many previous studies on data mining have focused on handling binary-valued data, whereas time series data are usually quantitative values. We thus extend our previous fuzzy mining approach to time-series data in order to find linguistic association rules. The proposed approach first uses a sliding window to generate contiguous subsequences from a given time series and then analyzes the fuzzy itemsets in these subsequences. Appropriate post-processing is then performed to remove redundant patterns. Experiments are also carried out to show the performance of the proposed mining algorithm. Since the final results are represented by linguistic rules, they are friendlier to humans than a quantitative representation.
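
The first step of the proposed approach, generating contiguous subsequences with a sliding window and turning the quantitative values into fuzzy linguistic terms, can be sketched as follows. The triangular membership functions and the linguistic labels (low, middle, high) are illustrative choices, not the exact ones used in the paper.

```python
def sliding_windows(series, width):
    # Generate every contiguous subsequence of the given width.
    return [series[i:i + width] for i in range(len(series) - width + 1)]

def triangular(x, a, b, c):
    # Standard triangular membership function peaking at b.
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify(value, lo, hi):
    # Map a quantitative value to membership degrees of three linguistic terms.
    mid = (lo + hi) / 2
    return {
        "low": triangular(value, lo - (mid - lo), lo, mid),
        "middle": triangular(value, lo, mid, hi),
        "high": triangular(value, mid, hi, hi + (hi - mid)),
    }

series = [10, 12, 15, 11, 18, 22, 19, 25]
for window in sliding_windows(series, width=3):
    fuzzy_items = [fuzzify(v, lo=min(series), hi=max(series)) for v in window]
    # Print the dominant linguistic term of each value in the window.
    print(window, [max(m, key=m.get) for m in fuzzy_items])
```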

14.
Compression-based data mining of sequential data. Total citations: 3 (self-citations: 1, citations by others: 2)
The vast majority of data mining algorithms require the setting of many input parameters. The dangers of working with parameter-laden algorithms are twofold. First, incorrect settings may cause an algorithm to fail to find the true patterns. Second, and perhaps more insidious, the algorithm may report spurious patterns that do not really exist, or greatly overestimate the significance of the reported patterns. This is especially likely when the user fails to understand the role of parameters in the data mining process. Data mining algorithms should have as few parameters as possible. A parameter-light algorithm would limit our ability to impose our prejudices, expectations, and presumptions on the problem at hand, and would let the data itself speak to us. In this work, we show that recent results in bioinformatics, learning, and computational theory hold great promise for a parameter-light data-mining paradigm. The results are strongly connected to Kolmogorov complexity theory. As a practical matter, however, they can be implemented using any off-the-shelf compression algorithm with the addition of just a dozen lines of code. We show that this approach is competitive with or superior to many state-of-the-art approaches in anomaly/interestingness detection, classification, and clustering, with empirical tests on time series, DNA, text, XML, and video datasets. As further evidence of the advantages of our method, we demonstrate its effectiveness in solving a real-world classification problem: recommending printing services and products. Responsible editor: Johannes Gehrke.
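
The abstract's claim that the approach "can be implemented using any off-the-shelf compression algorithm with the addition of just a dozen lines of code" refers to compression-based dissimilarity measures of the general form C(xy)/(C(x)+C(y)), where C(.) is the compressed size. Here is a hedged Python sketch of that idea using zlib; the toy data and the qualitative comments are illustrative only.

```python
import zlib

def c(data: bytes) -> int:
    # Compressed size of a byte string with an off-the-shelf compressor.
    return len(zlib.compress(data, level=9))

def cdm(x: str, y: str) -> float:
    # Compression-based dissimilarity: values near 1 mean the two sequences
    # share little structure; values well below 1 mean one helps compress the other.
    bx, by = x.encode(), y.encode()
    return c(bx + by) / (c(bx) + c(by))

# Toy example: two similar periodic sequences and one unrelated sequence.
a = "abcabcabcabcabcabcabcabc" * 8
b = "abcabcabcabcabcabcxbcabc" * 8
outlier = "qzjwkvpxmrtyuslnbghdcfeo" * 8

print(round(cdm(a, b), 3))        # relatively low: a and b compress well together
print(round(cdm(a, outlier), 3))  # closer to 1: little shared structure
```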

15.
The optimization capabilities of RDBMSs make them attractive for executing data transformations. However, despite the fact that many useful data transformations can be expressed as relational queries, an important class of data transformations that produce several output tuples for a single input tuple cannot be expressed in that way.

To overcome this limitation, we propose to extend Relational Algebra with a new operator named data mapper. In this paper, we formalize the data mapper operator and investigate some of its properties. We then propose a set of algebraic rewriting rules that enable the logical optimization of expressions with mappers and prove their correctness. Finally, we experimentally study the proposed optimizations and identify the key factors that influence the optimization gains.
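
The defining property of the proposed data mapper operator is that one input tuple can yield several output tuples, which ordinary relational selection and projection cannot express. A small Python sketch of that behaviour follows: a mapper function applied to each row of an input relation returns zero or more output rows. The schema (an order row exploded into one row per line item) is an invented example, not one from the paper.

```python
from typing import Callable, Iterable, Iterator

Row = dict

def mapper(relation: Iterable[Row],
           fn: Callable[[Row], Iterable[Row]]) -> Iterator[Row]:
    # Apply fn to every input tuple; unlike projection, fn may emit
    # zero, one, or many output tuples per input tuple.
    for row in relation:
        yield from fn(row)

def explode_items(order: Row) -> Iterable[Row]:
    # One order tuple with an item list becomes one tuple per line item.
    for item, qty in order["items"]:
        yield {"order_id": order["order_id"], "item": item, "qty": qty}

orders = [
    {"order_id": 1, "items": [("pen", 2), ("ink", 1)]},
    {"order_id": 2, "items": [("paper", 5)]},
]

for out in mapper(orders, explode_items):
    print(out)
```

In a relational engine such an operator would sit in the logical plan alongside selection and projection, which is what makes the kind of algebraic rewriting discussed in the abstract possible.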


16.
As the amount of multimedia data increases day by day, thanks to cheaper storage devices and a growing number of information sources, machine learning algorithms are faced with large datasets. When the original data is huge, small sample sizes are preferred for various applications; this is typically the case for multimedia applications. But using a simple random sample may not obtain satisfactory results, because such a sample may not adequately represent the entire data set due to random fluctuations in the sampling process. The difficulty is particularly apparent when small sample sizes are needed. Fortunately, using a good sampling set for training can improve the final results significantly. In KDD'03 we proposed EASE, which outputs a sample based on its 'closeness' to the original sample. Reported results show that EASE outperforms simple random sampling (SRS). In this paper we propose EASIER, which extends EASE in two ways. (1) EASE is a halving algorithm: to achieve the required sample ratio it starts from a suitable initial large sample and iteratively halves it. EASIER, on the other hand, does away with the repeated halving by directly obtaining the required sample ratio in one iteration. (2) EASE was shown to work on the IBM QUEST dataset, a categorical count data set. EASIER, in addition, is shown to work on continuous image and audio feature data. We have successfully applied EASIER to image classification and audio event identification, and experimental results show that EASIER outperforms SRS significantly.

Surong Wang received the B.E. and M.E. degrees from the School of Information Engineering, University of Science and Technology Beijing, China, in 1999 and 2002 respectively. She is currently studying toward the Ph.D. degree at the School of Computer Engineering, Nanyang Technological University, Singapore. Her research interests include multimedia data processing, image processing and content-based image retrieval.

Manoranjan Dash obtained Ph.D. and M.Sc. (Computer Science) degrees from the School of Computing, National University of Singapore. He has worked extensively in academic and research institutes and has published more than 30 research papers (mostly refereed) in various reputable machine learning and data mining journals, conference proceedings, and books. His research interests include machine learning and data mining, and their applications in bioinformatics, image processing, and GPU programming. Before joining the School of Computer Engineering (SCE), Nanyang Technological University, Singapore, as Assistant Professor, he worked as a postdoctoral fellow at Northwestern University. He is a member of IEEE and ACM. He has served as a program committee member of many conferences and is on the editorial board of the "International Journal of Theoretical and Applied Computer Science."

Liang-Tien Chia received the B.S. and Ph.D. degrees from Loughborough University, in 1990 and 1994, respectively. He is an Associate Professor in the School of Computer Engineering, Nanyang Technological University, Singapore. He has recently been appointed Head, Division of Computer Communications, and also holds the position of Director, Centre for Multimedia and Network Technology. His research interests include image/video processing and coding, multimodal data fusion, multimedia adaptation/transmission and multimedia over the Semantic Web. He has published over 80 research papers.
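
The core idea reported for EASE/EASIER, preferring a sample chosen for its closeness to the original data over a plain simple random sample, can be illustrated with a deliberately naive Python sketch: draw several candidate random samples and keep the one whose categorical histogram is closest (in L1 distance) to that of the full dataset. This best-of-k selection is only a toy stand-in, not the actual EASE or EASIER algorithm.

```python
import random
from collections import Counter

def histogram(data):
    # Normalized frequency of each category.
    counts = Counter(data)
    total = len(data)
    return {k: v / total for k, v in counts.items()}

def l1_distance(h1, h2):
    keys = set(h1) | set(h2)
    return sum(abs(h1.get(k, 0.0) - h2.get(k, 0.0)) for k in keys)

def closeness_sample(data, size, candidates=50):
    # Draw several simple random samples and keep the one whose histogram
    # is closest to the full data's histogram (a toy "closeness" criterion).
    full = histogram(data)
    best, best_dist = None, float("inf")
    for _ in range(candidates):
        sample = random.sample(data, size)
        dist = l1_distance(full, histogram(sample))
        if dist < best_dist:
            best, best_dist = sample, dist
    return best

data = [random.choice("AABBBCD") for _ in range(10_000)]
srs = random.sample(data, 100)
good = closeness_sample(data, 100)
print("SRS distance:", round(l1_distance(histogram(data), histogram(srs)), 4))
print("closeness-selected distance:",
      round(l1_distance(histogram(data), histogram(good)), 4))
```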

17.
18.
Linear combinations of translates of a given basis function have long been successfully used to solve scattered data interpolation and approximation problems. We demonstrate how the classical basis function approach can be transferred to the projective space ℙ^{d-1}. To be precise, we use concepts from harmonic analysis to identify positive definite and strictly positive definite zonal functions on ℙ^{d-1}. These can then be applied to solve problems arising in tomography, since the data given there consist of integrals over lines. Enhancing known reconstruction techniques with a scattered data interpolant in the "space of lines" naturally leads to reconstruction algorithms well suited to limited-angle and limited-range tomography. In the medical setting, algorithms for such incomplete data problems are desirable, as using them can limit the radiation dosage.
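
As a rough sketch of the basis-function interpolation the abstract builds on, the following shows the standard form of a zonal interpolant. The notation is generic, and taking |⟨x, x_j⟩| as the argument of φ follows the usual convention for zonal functions on projective space; this is an assumption on my part rather than a quotation from the paper.

```latex
% Data sites x_1, ..., x_N on the projective space \mathbb{P}^{d-1}
% (unit vectors with x and -x identified), data values f(x_i),
% zonal basis function \varphi depending only on |<x, x_j>|:
s(x) = \sum_{j=1}^{N} c_j \, \varphi\bigl(\lvert \langle x, x_j \rangle \rvert\bigr),
\qquad s(x_i) = f(x_i), \quad i = 1, \dots, N .
% The interpolation conditions form the linear system A c = f with
% A_{ij} = \varphi(\lvert \langle x_i, x_j \rangle \rvert); if \varphi is
% strictly positive definite on \mathbb{P}^{d-1}, then A is positive definite
% and the interpolant exists and is unique.
```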

19.
Data protection has been a difficult problem ever since the Internet appeared, and from the moment social media sites began to flex their muscles in digital markets, the protection of user data and information has kept policymakers on alert. In the digital economy, data has gradually become a key factor by which enterprises improve their competitiveness, and more and more market competition revolves around data. Enterprises' attention to, and contest over, data resources has pushed disputes and conflicts into the spotlight: between the rights of online platforms and the protection of users' personal information, and between Internet companies over unfair competition involving data. How to balance the reasonable use of data against its protection, and how to regulate unfair competition so as to secure a competitive advantage amid the rapid development of the digital economy, has therefore become especially important. By analyzing the dual nature of data, this article discusses the value of data in the digital economy era and, drawing on the Anti-Unfair Competition Law and practical cases, further examines the relationship between data use and data protection.

20.
Existing automated test data generation techniques tend to start from scratch, implicitly assuming that no pre-existing test data are available. However, this assumption may not always hold, and where it does not, there may be a missed opportunity; perhaps the pre-existing test cases could be used to assist the automated generation of additional test cases. This paper introduces search-based test data regeneration, a technique that can generate additional test data from existing test data using a meta-heuristic search algorithm. The proposed technique is compared to a widely studied test data generation approach in terms of both efficiency and effectiveness. The empirical evaluation shows that test data regeneration can be up to 2 orders of magnitude more efficient than existing test data generation techniques, while achieving comparable effectiveness in terms of structural coverage and mutation score. Copyright © 2010 John Wiley & Sons, Ltd.
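
Test data regeneration, as described here, starts from existing test inputs and applies a meta-heuristic search that mutates them while a fitness function (for example structural coverage) guides the search. The following Python sketch shows a bare-bones hill climber in that spirit; the fitness function, mutation operator, and program under test are stand-ins, not the paper's actual algorithm.

```python
import random

def program_under_test(x, y):
    # Stand-in for the real system; branches give the search something to cover.
    if x > 100 and y < -5:
        return "branch_a"
    if x == y:
        return "branch_b"
    return "branch_c"

def fitness(test_suite):
    # Toy fitness: number of distinct branches the suite exercises.
    return len({program_under_test(x, y) for x, y in test_suite})

def mutate(test_suite):
    # Perturb one numeric input of one existing test case (a typical
    # regeneration move: modify pre-existing data rather than start from scratch).
    suite = [list(case) for case in test_suite]
    case = random.choice(suite)
    idx = random.randrange(len(case))
    case[idx] += random.choice([-10, -1, 1, 10])
    return [tuple(case) for case in suite]

def regenerate(existing_suite, iterations=1000):
    # Simple hill climbing: keep a mutation whenever it does not hurt fitness.
    best, best_fit = existing_suite, fitness(existing_suite)
    for _ in range(iterations):
        candidate = mutate(best)
        cand_fit = fitness(candidate)
        if cand_fit >= best_fit:
            best, best_fit = candidate, cand_fit
    return best

print(regenerate([(0, 0), (50, 7)]))
```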
