Learning latent variable models from distributed and abstracted data
Authors: Xiaofeng Zhang, William K. Cheung
Affiliation: a School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China
b Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
Abstract:
Discovering global knowledge from distributed data sources is challenging; the key issues include the ever-increasing data volume at highly distributed sources and general concerns about data privacy. Abstracting the distributed data into a compact representation that retains sufficient local detail for global knowledge discovery can, in principle, address both the scalability and the privacy challenges. This calls for formal methodologies that support knowledge discovery on abstracted data. In this paper, we propose to abstract distributed data as Gaussian mixture models and to learn a family of generative models from the abstracted data using a modified EM algorithm. To demonstrate the effectiveness of the proposed approach, we apply it to learn (a) data cluster models and (b) data manifold models, and evaluate their performance on both synthetic and benchmark data sets, with promising results in terms of effectiveness and scalability. We also demonstrate that the proposed approach is robust to heterogeneous data distributions across the distributed sources.
Keywords: Distributed data mining; Data abstraction; Model-based methods; Gaussian mixture model; Generative topographic mapping
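The data-abstraction step described in the abstract can be illustrated with a minimal sketch: each distributed site fits a small Gaussian mixture model to its local data with plain EM and shares only the mixture parameters (weights, means, variances), never the raw records. This sketch uses spherical Gaussians and simple component pooling at the central site; the paper's modified EM learns global models from the abstracted densities directly, so the pooling here only illustrates the data flow, and all function names and constants are illustrative assumptions.

```python
import numpy as np

def fit_gmm_em(X, k, n_iter=50, seed=0):
    """Fit a k-component spherical GMM to X (n, d) with plain EM."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, k, replace=False)]   # init means from data points
    var = np.full(k, X.var())                 # per-component spherical variance
    w = np.full(k, 1.0 / k)                   # mixing weights
    for _ in range(n_iter):
        # E-step: responsibilities under spherical Gaussians (log-domain)
        d2 = ((X[:, None, :] - mu[None]) ** 2).sum(-1)            # (n, k)
        logp = np.log(w) - 0.5 * d * np.log(2 * np.pi * var) - d2 / (2 * var)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: reestimate weights, means, variances
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r.T @ X) / nk[:, None]
        d2 = ((X[:, None, :] - mu[None]) ** 2).sum(-1)
        var = (r * d2).sum(axis=0) / (nk * d) + 1e-6
    return w, mu, var

# Each "site" abstracts its local data as a GMM; only the parameters
# (sample count, weights, means, variances) leave the site.
rng = np.random.default_rng(1)
site_a = rng.normal([0.0, 0.0], 0.5, size=(300, 2))
site_b = rng.normal([5.0, 5.0], 0.5, size=(200, 2))
abstracts = [(len(s),) + fit_gmm_em(s, k=2) for s in (site_a, site_b)]

# Central site: pool components into one global mixture, reweighting
# each site's components by its share of the total sample count.
total = sum(n for n, *_ in abstracts)
glob_w = np.concatenate([n / total * w for n, w, mu, var in abstracts])
glob_mu = np.vstack([mu for n, w, mu, var in abstracts])
```

The pooled mixture weights sum to one by construction, and the global component means recover the two site-level clusters without any raw data having been exchanged.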
This article is indexed in ScienceDirect and other databases.