Similar literature
1.
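Objective: For big data mining, visual analytics is a very important research approach that helps users quickly and intuitively understand the valuable information contained in big data. However, because such data is massive, spatiotemporal, and high-dimensional, big data visualization suffers from high memory consumption, high rendering latency, and poor visual quality. To address these problems, taking massive spatiotemporal point data as an example, a preprocessing-based visualization scheme is adopted, and a highly scalable distributed visual analytics framework is designed and implemented. Methods: Drawing on the tile pyramid model, a multi-dimensional aggregation pyramid (MAP) model is proposed. It extends the 2D spatial hierarchical aggregation of the tile pyramid to the time, space, and attribute dimensions, supporting hierarchical aggregation across all three. Using a Spark cluster as the parallel preprocessing tool and the HBase distributed database to persist the MAP model data, an open-source distributed visualization framework (MAP-Vis) is implemented. Results: Experiments on the New York taxi dataset show that the framework supports linked, interactive visualization at multiple scales and across the time, space, and attribute dimensions, with highly scalable preprocessing and storage capabilities. Conclusion: Backed by distributed processing, the system achieves sub-second query response and good interactive visualization, showing that MAP-Vis is an effective solution for interactive visualization of big data.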
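
The multi-dimensional aggregation idea in entry 1 can be illustrated with a small, single-machine sketch. It is not the MAP-Vis implementation: the grid scheme in `tile_key`, the day/hour levels in `time_bucket`, and the sample points are all assumptions made for illustration, and an in-memory `Counter` stands in for the Spark preprocessing and HBase storage described in the abstract.

```python
from collections import Counter

def tile_key(lon, lat, zoom):
    """Map a lon/lat point to (zoom, x, y) on a simple equirectangular grid (assumed scheme)."""
    n = 2 ** zoom
    x = min(int((lon + 180.0) / 360.0 * n), n - 1)
    y = min(int((90.0 - lat) / 180.0 * n), n - 1)
    return zoom, x, y

def time_bucket(epoch_seconds, level):
    """Bucket a timestamp: level 0 = day, level 1 = hour (assumed granularity)."""
    size = 86400 if level == 0 else 3600
    return level, epoch_seconds // size

def build_pyramid(points, max_zoom=2, time_levels=(0, 1)):
    """Pre-aggregate point counts for every (tile, time bucket, attribute) cell."""
    cells = Counter()
    for lon, lat, ts, attr in points:
        for zoom in range(max_zoom + 1):
            for level in time_levels:
                cells[(tile_key(lon, lat, zoom), time_bucket(ts, level), attr)] += 1
    return cells

# Hypothetical taxi pickups: (longitude, latitude, epoch seconds, payment type)
points = [
    (-73.98, 40.75, 0, "cash"),
    (-73.97, 40.76, 3600, "card"),
    (-73.99, 40.74, 90000, "cash"),
]
pyramid = build_pyramid(points)
```

Because every cell is computed ahead of time, an interactive query at a given zoom level, time granularity, and attribute value becomes a key lookup rather than a scan over the raw points.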

2.
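A development platform for automatic train supervision simulation systems based on component technology   Total citations: 1 (self-citations: 0, citations by others: 1)
Wang Ye (王野), Guo Xiuqing (郭秀清). Journal of Computer Applications (《计算机应用》), 2007, 27(Z2): 286-288
To meet the needs and current state of development of automatic train supervision (ATS) simulation systems for rail transit, an interactive development platform for ATS simulation systems based on component technology is proposed and designed. The platform manages ATS simulation system components in a unified way, draws the station yard operation diagram, generates station-yard data, and hands the data to the simulation runtime framework, ultimately producing the ATS simulation system.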

3.
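A Spark-based workflow-style machine learning analysis method   Total citations: 1 (self-citations: 0, citations by others: 1)
By using in-memory distributed datasets, Spark is better suited to workloads that require many iterations, such as data mining and machine learning. However, developing directly on Spark is very complicated for data analysts: Scala has a steep learning curve, code optimization and system deployment require extensive experience, and low code reuse leads to a great deal of repeated work. This paper designs and implements a Spark-based visual, workflow-style machine learning method. On the one hand, a component model is designed to describe the basic steps of machine learning, including data preprocessing, feature processing, model training, and validation and evaluation. On the other hand, a visual workflow modeling tool is provided that lets analysts design machine learning workflows, which the tool automatically translates into Spark code for efficient execution. The tool can greatly improve the efficiency of developing machine learning applications on Spark. The paper presents the methodology and key techniques of the tool and demonstrates its effectiveness through a case study.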
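
The component model in entry 3 can be pictured as a list of named steps that a runner executes in order. The sketch below is plain Python rather than Spark, and the `Component` class, the `execute` function, and the three toy steps are hypothetical names chosen for illustration. The "train" step simply averages a feature and stands in for real model training.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

@dataclass
class Component:
    """One step of a workflow: a name plus a function from input to output."""
    name: str
    run: Callable[[Any], Any]

def execute(flow: List[Component], data: Any) -> Tuple[Any, List[str]]:
    """Run the components in order, feeding each output to the next step."""
    trace = []
    for component in flow:
        data = component.run(data)
        trace.append(component.name)
    return data, trace

flow = [
    # Data preprocessing: drop missing values.
    Component("preprocess", lambda rows: [r for r in rows if r is not None]),
    # Feature processing: add a squared feature.
    Component("feature", lambda rows: [(x, x * x) for x in rows]),
    # Stand-in for model training: mean of the squared feature.
    Component("train", lambda feats: sum(f[1] for f in feats) / len(feats)),
]
result, trace = execute(flow, [1, None, 2, 3])
```

A visual modeling tool of the kind the abstract describes would build a `flow` like this from the diagram and then generate platform code from it instead of calling the steps directly.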

4.
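Zhao Wei (赵薇), Liu Jie (刘杰), Ye Dan (叶丹). Computer Science (《计算机科学》), 2014, 41(9): 75-79
With the rapid growth of data volumes, single-machine data analysis tools can no longer meet demand. To address big data analysis, Haflow, a component-based big data analysis service platform, is designed and implemented. Haflow defines its own business process model and an extensible component interface that supports the integration of heterogeneous tools. The system accepts user-defined business processes, translates them into executable process instances, and submits them to a Hadoop distributed cluster for execution. Haflow is an extensible, distributed, service-oriented big data analysis platform that supports heterogeneous analysis tools. The platform is significant in two respects. First, it encapsulates work unrelated to the data analysis business and supports heterogeneous components, speeding up the development of analysis applications. Second, its back end uses the Hadoop distributed system to run multiple tasks concurrently, improving the average execution speed of applications.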

5.
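Large-screen data visualization presents the results of data analysis and is an important step in enabling data-driven decisions. To address the long development cycles and high costs of large-screen data visualization software, this paper develops C317DataUI, an easy-to-use large-screen data visualization tool based on the Vue front-end framework and the Echarts visualization components. Interfaces are laid out by dragging and dropping visualization components, data is configured and managed through each component's data connection panel, and some scenarios are provided...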

6.
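To address the high complexity and low efficiency of traditional data cleaning tools when faced with massive data, a streaming big data cleaning system is designed and implemented. Distributed computing is used to clean the data in order to solve the performance problem. The system consists of three parts: a unified access module, a computing cluster, and a scheduling center. It provides unified access to multiple data sources and distributed processing, and the cleaning workflow is configured interactively through a Web interface. Experimental results show that, on massive data, the streaming big data cleaning system outperforms traditional single-machine data cleaning and improves cleaning efficiency.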

7.
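You Xin (游欣), Luo Nianlong (罗念龙), Wang Yingxue (王映雪). Computer Engineering and Design (《计算机工程与设计》), 2007, 28(16): 3985-3988, 3993
Data preprocessing is key to providing high-quality data for teaching decision support systems. The complexity and uncertainty of teaching decisions and the particular nature of teaching data are the main problems constraining teaching data preprocessing. Through a comprehensive analysis of the problems in teaching data preprocessing, a metadata-based preprocessing method for teaching data is designed. The method mainly comprises data extraction, integration, and reduction. It not only improves data quality in light of the characteristics of teaching data, but also reorganizes application-oriented teaching data according to the subjects of teaching activities, thereby meeting the data needs of different teaching decision tasks.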

8.
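Based on the design requirements of a cryptographic algorithm evaluation system and its dynamic framework, a dynamically reconfigurable framework is defined, together with the relationships and data exchange flows among its components. A three-layer data exchange specification is proposed. It uses a linear storage structure and supports variable-length data, so that the number, type, and form of data passed between components can be defined dynamically through command words and their parameters, removing the restriction of inter-component data transfer to a fixed format. Based on this specification, a dynamically reconfigurable automated evaluation system is designed and implemented.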

9.
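Research on data preprocessing for expressway data mining   Total citations: 1 (self-citations: 1, citations by others: 1)
Data mining techniques are applied to the expressway system, and the mined patterns are used to provide effective support for expressway management. Data preprocessing determines the quality of mining on toll data. To handle problems such as lost cards, damaged cards, and U-turn vehicles, the data preprocessing is improved and an algorithm implementation is given.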

10.
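In the big data era, data-intensive computing has become a research hotspot in China and abroad. Remote sensing data is multi-source and massive, and is big data in the true sense. Developing data-intensive computing methods suited to automated, operational processing of remote sensing imagery is a challenge currently facing remote sensing application technology. This paper proposes a remote sensing image processing method based on data-intensive computing. Focusing on issues such as automated, operational preprocessing of remote sensing data, the paper first surveys and analyzes the state of research in China and abroad, and then presents the system architecture, in which workflows flexibly organize multiple algorithm models to work together. A "5 parallel, 1 acceleration" computing scheme is designed for data-intensive remote sensing image preprocessing, and its performance is tested on product generation examples. The results show that the system greatly improves the efficiency of remote sensing big data preprocessing while maintaining processing accuracy.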

11.
With the advent of the big data era, data analysis has become increasingly important: it uncovers valuable insights from vast datasets and thereby improves users' decision-making. Nonetheless, the data analysis workflow faces three main challenges: high coupling in the analysis workflow, a plethora of interactive interfaces, and a time-intensive exploratory analysis process. To address these challenges, this paper introduces Navi, a data analysis system powered by natural language interaction. Navi follows a modular design philosophy that abstracts three core functional modules from mainstream data analysis workflows: data querying, visualization generation, and visualization exploration. This approach reduces the coupling of the system. Navi also uses natural language as a unified interactive interface and integrates the functional modules through a task scheduler, ensuring their effective collaboration. Moreover, to address the exponential search space and ambiguous user intent in visualization exploration, we propose an automated approach to visualization exploration based on Monte Carlo tree search. In addition, a pruning algorithm and a composite reward function, both incorporating visualization domain knowledge, are devised to improve search efficiency and result quality. Finally, we validate the effectiveness of Navi through both quantitative experiments and user studies.
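
The selection step of Monte Carlo tree search, which entry 11 builds on, is commonly implemented with the UCB1 rule. The sketch below shows only that standard rule; it does not reproduce Navi's pruning algorithm or composite reward function, and the chart-type names and reward numbers are made up for illustration.

```python
import math

def ucb1(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """Standard UCB1 score: average reward plus an exploration bonus."""
    if visits == 0:
        return math.inf  # Unvisited children are always tried first.
    exploit = total_reward / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore

def select_child(children, parent_visits):
    """Pick the child with the highest UCB1 score.

    children maps a (hypothetical) candidate visualization to (total_reward, visits).
    """
    return max(children, key=lambda name: ucb1(*children[name], parent_visits))

children = {"bar_chart": (6.0, 10), "scatter": (2.5, 3), "heatmap": (0.0, 0)}
first_pick = select_child(children, parent_visits=13)

visited_only = {"bar_chart": (6.0, 10), "scatter": (2.5, 3)}
second_pick = select_child(visited_only, parent_visits=13)
```

The exploration bonus shrinks as a child is visited more often, so a rarely tried candidate with a decent average reward can outrank a well-explored one.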

12.
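As competition in the telecommunications market intensifies and users demand ever higher service quality, user complaint rates keep rising. Under these circumstances, reducing complaint rates by accurately predicting user complaint behavior has become a focus for operators. Traditional complaint prediction models consider only classification algorithms and manually surveyed features, and do not make full use of operators' big data. Therefore, a parallel random forest on the Hadoop/Spark big data platform is proposed for building a user complaint prediction model. It uses not only business support system data but also operations support system data and customer service work order data, and further adds graph features and second-order features that reflect relationships among users. Experiments on data from an operator in Shanghai show that a complaint prediction model trained on multi-source, high-dimensional features is markedly more accurate than traditional methods. Taking targeted appeasement measures for the identified users on this basis can reduce complaint rates and deliver high commercial value.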

13.
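With the arrival of the big data era, data analysis plays an increasingly prominent role: it can discover valuable information in massive data and thus guide user decisions more effectively. However, the data analysis workflow faces three major challenges: high coupling in the analysis workflow, many kinds of interactive interfaces, and time-consuming exploratory analysis. To address these challenges, this paper proposes Navi, a data analysis system based on natural language interaction. The system follows a modular design principle and abstracts three core functional modules from mainstream data analysis workflows, namely data querying, visualization generation, and visualization exploration, thereby reducing the coupling of the system design. Navi also uses natural language as a unified interactive interface and coordinates the functional modules through a task scheduler. In addition, to address the exponential search space and unclear user intent in visualization exploration, the paper proposes an automatic visualization exploration method based on Monte Carlo tree search, and designs a pruning algorithm and a composite reward function based on visualization domain knowledge, improving search efficiency and result quality. Finally, the effectiveness of Navi is validated through quantitative experiments and user studies.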

14.
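For big data offline analysis and interactive query workloads, common characteristics of the workloads are first analyzed, a common operation set is extracted, and the operations are grouped. The microarchitectural characteristics of these workloads are then measured while they run on a big data platform, and PCA and the SimpleKMeans algorithm are used to reduce the dimensionality of these architectural feature parameters and to cluster them. The experimental analysis shows that the workloads share common operations, such as Join and Cross Production, and that some workloads have similar properties; for example, Difference and Projection share the same microarchitectural characteristics. The results offer guidance for the design of hardware platforms such as processors and for application optimization, and provide a reference for the design of big data benchmarking platforms.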

15.
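To help users understand and analyze the search data produced by search engines, a visual analytics system for search trend data is proposed, comprising data collection and preprocessing, stream graph computation and rendering, streamline generation and text placement, and interactive analysis. Search data is presented by combining a stream graph with text, revealing the search trends and hot topics in the data. A novel streamline-guided text layout algorithm is proposed so that text fits the shape of the stream graph more closely. In addition, a series of interactions is provided to help...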

16.
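Data preprocessing is key to providing high-quality data for an examination analysis system. To mine useful information effectively from large volumes of complex and uncertain examination data, the source data must be preprocessed. Through a detailed analysis of the data sources in an examination analysis system, this paper finds that they are inconsistent and redundant, and accordingly gives a general method for data preprocessing in examination analysis systems.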

17.
This paper describes a new out-of-core multi-resolution data structure for real-time visualization, interactive editing and externally efficient processing of large point clouds. We describe an editing system that makes use of the novel data structure to provide interactive editing and preprocessing tools for large scanner data sets. Using the new data structure, we provide a complete tool chain for 3D scanner data processing, from data preprocessing and filtering to manual touch-up and real-time visualization. In particular, we describe an out-of-core outlier removal and bilateral geometry filtering algorithm, a toolset for interactive selection, painting, transformation, and filtering of huge out-of-core point-cloud data sets and a real-time rendering algorithm, which all use the same data structure as storage backend. The interactive tools work in real-time for small model modifications. For large scale editing operations, we employ a two-resolution approach where editing is planned in real-time and executed in an externally efficient offline computation afterwards. We evaluate our implementation on example data sets of sizes up to 63 GB, demonstrating that the proposed technique can be used effectively in real-world applications.

18.
Affective computing is important in human–computer interaction. In interactive cloud computing with big data in particular, affective modeling and analysis face extremely high complexity and uncertainty in emotional status, as well as decreased computational accuracy. In this paper, an approach for affective experience evaluation in an interactive environment is presented to help enhance the significance of those findings. With a person-independent approach and cooperative interaction as core factors, facial expression features and states are used as affective indicators to perform synergetic dependence evaluation and to construct a participant's affective experience distribution map in interactive big data space. The resulting model is potentially capable of analyzing the consistency between a participant's inner emotional status and external facial expressions, regardless of hidden emotions within interactive computing. Experiments are conducted to evaluate the rationality of the affective experience modeling approach outlined in this paper. The satisfactory results on a real-time camera demonstrate availability and validity comparable to the best results achieved using facial expressions alone on real-world big data. It is suggested that the person-independent model with cooperative interaction and synergetic dependence evaluation is suited to constructing a participant's affective experience distribution, and can accurately perform real-time analysis of affective experience consistency on interactive big data. The affective experience distribution is considered the most individual intelligent method for both an analysis model and affective computing, based on which we can further comprehend affective facial expression recognition and synthesis in interactive cloud computing.

19.
The rapidly increasing scale of data warehouses is challenging today's data analytical technologies. A conventional data analytical platform processes data warehouse queries using a star schema: it normalizes the data into a fact table and a number of dimension tables, and during query processing it selectively joins the tables according to users' demands. This model is space economical. However, it faces two problems when applied to big data. First, join is an expensive operation, which prevents a parallel database or a MapReduce-based system from achieving efficiency and scalability simultaneously. Second, join operations have to be executed repeatedly, even though many join results could be reused by different queries. In this paper, we propose a new query processing framework for data warehouses. It pushes the join operations partially to the pre-processing phase and partially to the post-processing phase, so that data warehouse queries can be transformed into massively parallelized filter-aggregation operations on the fact table. In contrast to conventional query processing models, our approach is efficient, scalable and stable despite the large number of tables involved in the join. It is especially suitable for a large-scale parallel data warehouse. Our empirical evaluation on Hadoop shows that our framework exhibits linear scalability and outperforms some existing approaches by an order of magnitude.
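
The idea in entry 19 of moving joins out of query time can be sketched on toy data. The example below covers only the pre-processing half (attaching dimension attributes to fact rows once) and then answers a query with a single filter-and-aggregate scan. The table contents and the names `denormalize` and `filter_aggregate` are assumptions for illustration, not the paper's framework, and the post-processing half is not shown.

```python
from collections import defaultdict

# Hypothetical star schema: one dimension table and one fact table.
dim_store = {1: {"city": "Beijing"}, 2: {"city": "Shanghai"}}
fact_sales = [
    {"store_id": 1, "amount": 10.0},
    {"store_id": 2, "amount": 5.0},
    {"store_id": 1, "amount": 7.5},
]

def denormalize(facts, dim, key):
    """Pre-processing phase: join each fact row with its dimension row once."""
    return [{**row, **dim[row[key]]} for row in facts]

def filter_aggregate(rows, predicate, group_by, measure):
    """Query phase: one scan that filters and sums per group, with no join."""
    totals = defaultdict(float)
    for row in rows:
        if predicate(row):
            totals[row[group_by]] += row[measure]
    return dict(totals)

wide = denormalize(fact_sales, dim_store, "store_id")
result = filter_aggregate(wide, lambda r: r["amount"] > 6, "city", "amount")
```

Each row is handled independently during the scan, which is what makes this kind of query straightforward to split across many workers.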

20.
In the era of big data, data is of great value as an essential factor in production. Analyzing, mining, and utilizing large-scale data through data sharing is of great significance. However, due to the heterogeneous dispersion of data and increasingly rigorous privacy protection regulations, data owners cannot share data freely and thus become data silos. Since data federation can achieve collaborative queries while preserving the privacy of data silos, we present in this paper a secure multi-party relational data federation system based on the idea of federated computation that "data stays, computation moves." The system is compatible with a variety of relational databases and can shield users from the heterogeneity of the underlying data from multiple data owners. On the basis of secret sharing, the system implements a secure multi-party operator library that supports the basic secure multi-party operations, and the result reconstruction process of the operators is optimized for higher execution efficiency. On this basis, the system supports query operations such as Summation (SUM), Averaging (AVG), Minimization/Maximization (MIN/MAX), equi-join, and $\theta$-join, and makes full use of multi-party features to reduce data interactions among data owners and security overhead, thus effectively supporting efficient data sharing. Finally, experiments are conducted on the benchmark dataset TPC-H. The experimental results show that the system can support more data owners than the current data federation systems SMCQL and Conclave and has higher execution efficiency in a variety of query operations, exceeding the existing systems by as much as 3.75 times.
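
A SUM query over secret-shared values, as mentioned in entry 20, can be illustrated with textbook additive secret sharing. This is a minimal single-process sketch, not the system's operator library: the modulus `PRIME`, the function names, and the sample values are assumptions, and the network exchange between data owners is simulated with plain lists.

```python
import secrets

# Assumed modulus; inputs must be non-negative and their sum must stay below it.
PRIME = 2**61 - 1

def share(value, n_parties):
    """Split value into n random-looking shares that add up to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Recover the secret by adding all shares modulo PRIME."""
    return sum(shares) % PRIME

def secure_sum(private_values):
    """Each owner shares its value; party j adds the j-th shares it receives."""
    n = len(private_values)
    all_shares = [share(v, n) for v in private_values]
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    # Only the partial sums are combined, so no single input is revealed.
    return reconstruct(partial_sums)

total = secure_sum([120, 45, 300])
```

Addition works directly on shares because the sharing is linear; operations such as MIN/MAX or joins need more elaborate protocols than this sketch covers.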
