     

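Survey on Representation Learning of Complex Heterogeneous Data
Citation: JIAN Song-lei, LU Kai. Survey on Representation Learning of Complex Heterogeneous Data[J]. Computer Science, 2020, 47(2): 1-9
Authors: JIAN Song-lei  LU Kai
Affiliations: College of Computer, National University of Defense Technology, Changsha 410073; College of Computer, National University of Defense Technology, Changsha 410073
Funding: National Key Research and Development Program of China; National Defense Science and Technology Excellent Talent Program; Hunan Provincial Science and Technology Leading Talent Program; National Natural Science Foundation of China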
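Abstract: With the arrival of the era of intelligence and the era of big data, complex heterogeneous data of many kinds keep emerging and have become the foundation of data-driven artificial intelligence methods and machine learning models. How these data are represented directly affects the learning performance of downstream models, so representing complex heterogeneous data effectively has become an important research area in machine learning. This paper first introduces the various types of data representation and identifies the challenges facing existing representation methods. Next, it divides data by type into single-type data and composite-type data. For single-type data, it reviews the development and representative algorithms of representation learning for four typical kinds of data: discrete (categorical) data, network data, text data, and image data. It then details four kinds of complex data composed of multiple single-type data or data sources: structured data mixing discrete and continuous features, attributed network data combining attribute data with a complex network, cross-domain data drawn from different domains, and multimodal data combining multiple data types. For each, it reviews the current state of representation learning and the latest representation learning models. Finally, it discusses development trends in representation learning for complex heterogeneous data.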

Keywords: Representation learning  Machine learning  Discrete data  Attributed network  Cross-domain data  Multimodal data

Survey on Representation Learning of Complex Heterogeneous Data
JIAN Song-lei,LU Kai. Survey on Representation Learning of Complex Heterogeneous Data[J]. Computer Science, 2020, 47(2): 1-9
Authors:JIAN Song-lei  LU Kai
Affiliation:(College of Computer,National University of Defense Technology,Changsha 410073,China)
Abstract: With the arrival of the eras of artificial intelligence and big data, various kinds of complex heterogeneous data keep emerging and have become the basis of data-driven artificial intelligence methods and machine learning models. The quality of data representation directly affects the performance of downstream learning algorithms. Therefore, how to represent complex heterogeneous data usefully for machine learning has become an important research area. Firstly, multiple types of data representations were introduced and the challenges facing existing representation learning methods were identified. Then, according to the data modality, the data were categorized into single-type data and multi-type data. For single-type data, the research development and typical representation learning algorithms for categorical data, network data, text data and image data were introduced respectively. Further, multi-type data composed of multiple single-type data were detailed, including mixed data containing both categorical and continuous features, attributed network data combining node content with a topological network, cross-domain data derived from different domains, and multimodal data containing multiple modalities. For each of these data types, the research development and state-of-the-art representation learning models were introduced. Finally, development trends in representation learning of complex heterogeneous data were discussed.
Keywords:Representation learning  Machine learning  Categorical data  Attributed network  Cross-domain data  Multimodal data