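Web Based Lightweight Tool for Big Data Processing and Visualization
Cite this article: LI Yan, MA Jun-ming, AN Bo, CAO Dong-gang. Web Based Lightweight Tool for Big Data Processing and Visualization[J]. Computer Science, 2018, 45(9): 60-64, 93
Authors: LI Yan  MA Jun-ming  AN Bo  CAO Dong-gang
Affiliations: Key Lab of High Confidence Software Technologies (Peking University), Ministry of Education, Beijing 100871 (all four authors); School of Electronics Engineering and Computer Science, Peking University, Beijing 100871 (LI Yan, AN Bo, CAO Dong-gang)
Funding: This work was supported by the National Key Research and Development Program of China (2016YFB1000105) and the National Natural Science Foundation of China (61690201, 61421091).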
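Abstract: Researchers routinely use tools such as Excel and SPSS to analyze and process data in order to gain knowledge in their fields. With the arrival of the big data era, however, these common data processing packages are limited by single-machine performance and can no longer meet researchers' needs for big data analysis. Processing and visualizing big data requires a distributed computing environment. To process and visualize big data quickly, researchers therefore not only have to purchase and maintain a distributed cluster, but also need distributed programming skills and the corresponding front-end data visualization techniques. For many data analysts without a computer science background, this is very difficult and unnecessary. To address these problems, this paper proposes a Web-based lightweight tool for big data processing and visualization. With this tool, data analysts can, through simple clicking and dragging alone, easily open large data files (GB scale) in the browser, quickly navigate within a file (jump to a given line), conveniently invoke a distributed computing framework to sort the file contents or find maximum values, and readily visualize the data. An empirical study demonstrates that the solution is effective.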

Keywords: Data analysis  Distributed system  Parallel computing  Data visualization  Big data
Received: 2017-08-15
Revised: 2017-09-25

Web Based Lightweight Tool for Big Data Processing and Visualization
LI Yan,MA Jun-ming,AN Bo and CAO Dong-gang. Web Based Lightweight Tool for Big Data Processing and Visualization[J]. Computer Science, 2018, 45(9): 60-64, 93
Authors:LI Yan  MA Jun-ming  AN Bo  CAO Dong-gang
Affiliation: Key Lab of High Confidence Software Technologies (Peking University), Ministry of Education, Beijing 100871, China (all four authors); School of Electronic Engineering and Computer Science, Peking University, Beijing 100871, China (LI Yan, AN Bo, CAO Dong-gang)
Abstract: In their daily work, researchers often use Excel, SPSS and other tools to analyze and process data in order to gain knowledge of their fields. With the arrival of the big data era, however, general-purpose data processing software is constrained by single-machine performance and can no longer meet researchers' needs for big data analysis and processing. Big data processing and visualization depend on a distributed computing environment. Therefore, to process and visualize big data quickly, researchers not only need to purchase and maintain a distributed cluster, but also need to be able to program in a distributed environment and to master the corresponding front-end data visualization technology. For many data analysts without a computer science background, this is very difficult and unnecessary. To address these problems, this paper presents a Web-based lightweight tool for big data processing and visualization. Using this tool, data analysts can easily open a large data file (GB scale) in the browser, quickly navigate to a position within the file, sort the contents of the file, and visualize it, all through simple clicking and dragging. Finally, an empirical study was carried out to demonstrate the effectiveness of this solution.
Keywords:Data analysis  Distributed system  Parallel computation  Data visualization  Big data
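
The abstract mentions opening GB-scale files in the browser and jumping to a given line. The abstract does not say how the tool implements this, so the following is only a minimal sketch of one common approach, a sparse line-offset index, under the assumption that the file is newline-delimited UTF-8 text stored on the server. The function names `build_line_index` and `read_lines` and the parameter `every` are hypothetical and do not come from the paper.

```python
def build_line_index(path, every=1000):
    """Scan the file once and record the byte offset of every `every`-th line.

    Hypothetical helper, not the paper's implementation. The result is a
    list where entry k is the byte offset of zero-based line k * every.
    """
    offsets = []
    pos = 0
    with open(path, "rb") as f:
        for lineno, line in enumerate(f):
            if lineno % every == 0:
                offsets.append(pos)
            pos += len(line)
    return offsets


def read_lines(path, index, start, count, every=1000):
    """Return up to `count` lines beginning at zero-based line `start`.

    Seeks to the nearest indexed offset at or before `start`, then skips
    at most `every - 1` lines, so the cost does not grow with file size.
    `every` must match the value used to build `index`.
    """
    slot = start // every
    if slot >= len(index):
        return []  # requested line is past the end of the file
    out = []
    with open(path, "rb") as f:
        f.seek(index[slot])
        for _ in range(start % every):
            f.readline()  # skip forward to the requested line
        for _ in range(count):
            line = f.readline()
            if not line:
                break  # reached end of file
            out.append(line.decode("utf-8").rstrip("\r\n"))
    return out
```

In a setup like this, a server could build the index once per uploaded file and then answer each "go to line N" request from the browser with a single `read_lines` call that returns only the rows needed for the visible page.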