Big Data: from collection to visualization

Authors:	Mohammed Ghesmoune, Hanene Azzag, Salima Benbernou, Mustapha Lebbah, Tarn Duong, Mourad Ouziri

Affiliation:	1. LIPN-UMR 7030 - CNRS, University of Paris 13, Sorbonne Paris City, Villetaneuse, France; 2. LIPADE, University of Paris Descartes, Sorbonne Paris City, Paris Cedex 06, France

Abstract:	Organisations increasingly rely on Big Data to discover correlations and patterns that would previously have remained hidden, and to use this new information to improve the quality of their business activities. In this paper we present a 'story' of Big Data, from initial data collection to final visualization, passing through data fusion and the analysis and clustering tasks. To this end, we present a complete workflow covering (a) how to represent the heterogeneous collected data using the high-performance RDF (Resource Description Framework) language, how to fuse Big Data in RDF by resolving entity ambiguity, and how to query these data to provide more relevant and complete knowledge; and (b), because the data arrive as data streams, batchStream, a micro-batching version of the growing neural gas approach that clusters data streams in a single pass over the data. The batchStream algorithm discovers clusters of arbitrary shape without any assumption on the number of clusters. This Big Data workflow is implemented on the Spark platform, and we demonstrate it on synthetic and real data.
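To make the clustering step in (b) concrete, here is a minimal sketch, assuming a plain online growing neural gas (Fritzke's GNG) that consumes each micro-batch in a single pass. It is not the batchStream update rule, which the abstract does not give, and it runs in plain Python rather than on Spark. The class `GrowingNeuralGas`, its `partial_fit` and `clusters` methods, and all parameter values are hypothetical and chosen for illustration. Clusters are read off as connected components of the node graph, which is why no number of clusters has to be fixed in advance.

```python
import random
from typing import Dict, FrozenSet, Iterable, List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class GrowingNeuralGas:
    """Plain online GNG fed one micro-batch at a time (illustrative, NOT batchStream)."""

    def __init__(self, dim: int, eps_b: float = 0.2, eps_n: float = 0.006,
                 max_age: int = 50, lam: int = 100, alpha: float = 0.5,
                 decay: float = 0.995, max_nodes: int = 100, seed: int = 0):
        rng = random.Random(seed)
        self.eps_b, self.eps_n, self.max_age, self.lam = eps_b, eps_n, max_age, lam
        self.alpha, self.decay, self.max_nodes = alpha, decay, max_nodes
        # Start with two random nodes joined by an edge of age 0.
        self.w: Dict[int, Vector] = {i: [rng.random() for _ in range(dim)] for i in (0, 1)}
        self.err: Dict[int, float] = {0: 0.0, 1: 0.0}
        self.edges: Dict[FrozenSet[int], int] = {frozenset((0, 1)): 0}
        self.next_id = 2
        self.seen = 0  # number of points processed so far

    def neighbours(self, n: int) -> List[int]:
        """Nodes sharing an edge with node n."""
        return [m for e in self.edges if n in e for m in e if m != n]

    def _update(self, x: Sequence[float]) -> None:
        # 1. Nearest (s1) and second-nearest (s2) nodes to x.
        s1, s2 = sorted(self.w, key=lambda n: sq_dist(x, self.w[n]))[:2]
        # 2. Age the edges leaving s1 and accumulate s1's squared error.
        for e in self.edges:
            if s1 in e:
                self.edges[e] += 1
        self.err[s1] += sq_dist(x, self.w[s1])
        # 3. Move s1 (rate eps_b) and its neighbours (rate eps_n) towards x.
        moves = [(s1, self.eps_b)] + [(m, self.eps_n) for m in self.neighbours(s1)]
        for n, eps in moves:
            self.w[n] = [wi + eps * (xi - wi) for wi, xi in zip(self.w[n], x)]
        # 4. Create the s1-s2 edge, or reset its age if it already exists.
        self.edges[frozenset((s1, s2))] = 0
        # 5. Drop edges older than max_age, then any node left without edges.
        self.edges = {e: a for e, a in self.edges.items() if a <= self.max_age}
        linked = set().union(*self.edges)
        for n in [n for n in self.w if n not in linked]:
            del self.w[n], self.err[n]
        # 6. Every lam points, insert a node halfway between the node q with the
        #    largest error and its neighbour f with the largest error.
        self.seen += 1
        if self.seen % self.lam == 0 and len(self.w) < self.max_nodes:
            q = max(self.err, key=self.err.__getitem__)
            f = max(self.neighbours(q), key=self.err.__getitem__)
            r, self.next_id = self.next_id, self.next_id + 1
            self.w[r] = [(a + b) / 2 for a, b in zip(self.w[q], self.w[f])]
            del self.edges[frozenset((q, f))]
            self.edges[frozenset((q, r))] = 0
            self.edges[frozenset((r, f))] = 0
            self.err[q] *= self.alpha
            self.err[f] *= self.alpha
            self.err[r] = self.err[q]
        # 7. Decay every accumulated error.
        for n in self.err:
            self.err[n] *= self.decay

    def partial_fit(self, batch: Iterable[Sequence[float]]) -> None:
        """Consume one micro-batch in a single pass: each point is used once."""
        for x in batch:
            self._update(x)

    def clusters(self) -> List[List[int]]:
        """Connected components of the node graph; no cluster count is preset."""
        unseen, comps = set(self.w), []
        while unseen:
            stack, comp = [unseen.pop()], []
            while stack:
                n = stack.pop()
                comp.append(n)
                for m in self.neighbours(n):
                    if m in unseen:
                        unseen.remove(m)
                        stack.append(m)
            comps.append(sorted(comp))
        return comps


if __name__ == "__main__":
    rng = random.Random(1)
    gng = GrowingNeuralGas(dim=2, lam=50, max_age=30)
    for _ in range(40):  # 40 micro-batches arriving over time
        batch = [[rng.gauss(c, 0.05), rng.gauss(c, 0.05)] for c in (0.2, 0.8) for _ in range(25)]
        rng.shuffle(batch)
        gng.partial_fit(batch)
    print(len(gng.w), "nodes in", len(gng.clusters()), "clusters")
```

The design choice worth noting is that the model state (node positions, accumulated errors, edge ages) is all that survives between micro-batches, so raw points never need to be revisited. Edge ageing and pruning let the graph split into several components, so the number of clusters falls out of the data rather than being set beforehand.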

This article is indexed in SpringerLink and other databases.