Big Data: from collection to visualization
Authors: Mohammed Ghesmoune, Hanene Azzag, Salima Benbernou, Mustapha Lebbah, Tarn Duong, Mourad Ouziri
Affiliation: 1. LIPN-UMR 7030 - CNRS, University of Paris 13, Sorbonne Paris City, Villetaneuse, France; 2. LIPADE, University of Paris Descartes, Sorbonne Paris City, Paris Cedex 06, France
Abstract: Organisations increasingly rely on Big Data to discover correlations and patterns that would previously have remained hidden, and to use this new information to improve the quality of their business activities. In this paper we present a ‘story’ of Big Data from initial data collection to final visualization, passing through data fusion, analysis and clustering. To this end, we present a complete workflow covering (a) how to represent the heterogeneous collected data using the high-performance RDF language, how to fuse the Big Data in RDF by resolving the issue of entity disambiguation, and how to query those data to provide more relevant and complete knowledge; and (b) since the data arrive as data streams, batchStream, a micro-batching version of the growing neural gas approach, which clusters data streams in a single pass over the data. The batchStream algorithm discovers clusters of arbitrary shape without any assumption on the number of clusters. This Big Data workflow is implemented on the Spark platform, and we demonstrate it on synthetic and real data.
This article is indexed in SpringerLink and other databases.