
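Heterogeneous Transductive Transfer Learning Algorithm
Cite this article: YANG Liu, JING Li-Ping, YU Jian. Heterogeneous transductive transfer learning algorithm [J]. Journal of Software, 2015, 26(11): 2762-2780.
Authors: YANG Liu  JING Li-Ping  YU Jian
Affiliations: Beijing Key Laboratory of Traffic Data Analysis and Mining (Beijing Jiaotong University), Beijing 100044; College of Mathematics and Information Science, Hebei University, Baoding, Hebei 071000; Hebei Key Laboratory of Machine Learning and Computational Intelligence (Hebei University), Baoding, Hebei 071000
Funding: National Natural Science Foundation of China (61375062, 61370129); Specialized Research Fund for the Doctoral Program of Higher Education (20120009110006); Fundamental Research Funds for the Central Universities (2014JBM029); Science and Technology Program of the Hebei Provincial Department of Science and Technology (13210347); Project funded by the Hebei Provincial Department of Education (QN20131006); CCF-Tencent Research Fund
Abstract: When the target domain has few labeled examples, learning performance suffers, while related source domains contain some labeled data. Transfer learning addresses this situation by applying knowledge learned on a source domain, different from but related to the target domain, to the target domain. In practical applications such as text-to-image and cross-language transfer learning, the feature spaces of the source and target domains differ; this is heterogeneous transfer learning. The focus here is on using labeled data in the source domain to improve learning performance on unlabeled data in the target domain, a setting called heterogeneous transductive transfer learning. Because the feature spaces differ, a key problem in heterogeneous transfer learning is learning the mapping function from the source domain to the target domain. The paper proposes learning the mapping function by unsupervised matching of the source and target feature spaces. The learned mapping function re-represents source-domain data in the target domain, so the re-represented labeled source data can be transferred to the target domain. Standard machine learning methods (for example, support vector machines) can therefore be used to train a classifier that predicts the classes of unlabeled target-domain data. A probabilistic interpretation is given to show that the method is robust to some noise in the data. A sample complexity bound is also derived, that is, the number of samples needed to find the mapping function. Experimental results on four real-world data sets demonstrate the effectiveness of the method.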

Keywords: heterogeneous transfer learning; transductive transfer learning; heterogeneous feature space; mapping function
Received: 2015/2/28
Revised: 2015/8/26

Heterogeneous Transductive Transfer Learning Algorithm
YANG Liu, JING Li-Ping, YU Jian. Heterogeneous Transductive Transfer Learning Algorithm [J]. Journal of Software, 2015, 26(11): 2762-2780.
Authors: YANG Liu, JING Li-Ping and YU Jian
Affiliations: Beijing Key Laboratory of Traffic Data Analysis and Mining (Beijing Jiaotong University), Beijing 100044, China; College of Mathematics and Information Science, Hebei University, Baoding 071000, China; Key Laboratory of Machine Learning and Computational Intelligence (Hebei University), Baoding 071000, China
Abstract: When the target domain has little labeled data, learning performance suffers. Fortunately, related source domains often contain ample labeled data. Transfer learning allows knowledge learned in source domains to be transferred to the target domain. In real applications such as text-to-image and cross-language transfer learning, the source and target domains have different feature spaces; this setting is called heterogeneous transfer learning. This paper focuses on heterogeneous transductive transfer learning (HTTL), which uses labeled data in heterogeneous source domains to improve prediction performance on unlabeled data in the target domain. Because the feature spaces differ, the key problem is to learn the mapping functions between the heterogeneous source domains and the target domain. This paper proposes to learn the mapping functions by unsupervised matching of the different feature spaces. With the mapping functions, the source-domain data can be re-represented in the target feature space and transferred to the target domain, so that the target domain gains labeled data originating from the source domains. Standard machine learning methods such as support vector machines can then be used to train classifiers that predict the labels of the unlabeled target-domain data. Moreover, a probabilistic interpretation is derived to show that the method is robust to certain noise in the utility matrices. A sample complexity bound is given to indicate how many instances are needed to find the mapping functions adequately. Experiments on four real-world data sets verify the effectiveness of the proposed approach.
Keywords: heterogeneous transfer learning; transductive transfer learning; heterogeneous feature space; mapping function
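The three-step pipeline outlined in the abstract (learn a mapping, re-represent the labeled source data in the target feature space, train a standard classifier there) can be illustrated with a small sketch. The abstract does not state the paper's matching objective, so this is not the authors' algorithm: it assumes, purely for illustration, that some unlabeled instances are observed in both feature spaces, and it swaps in a ridge least-squares linear map and a nearest-centroid classifier (in place of a support vector machine). All function names, dimensions, and the synthetic data are hypothetical.

```python
import numpy as np


def fit_linear_map(src_pairs, tgt_pairs, ridge=1e-2):
    """Ridge least-squares map W such that src_pairs @ W approximates tgt_pairs.

    Assumption: rows of src_pairs and tgt_pairs describe the same unlabeled
    instances in the source and target feature spaces. This stands in for the
    paper's unsupervised matching step, which the abstract does not detail.
    """
    d_src = src_pairs.shape[1]
    gram = src_pairs.T @ src_pairs + ridge * np.eye(d_src)
    return np.linalg.solve(gram, src_pairs.T @ tgt_pairs)


def nearest_centroid_predict(train_x, train_y, test_x):
    """Assign each test row the label of the closest class centroid."""
    classes = np.unique(train_y)
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    dists = ((test_x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]


def run_demo(seed=0):
    """Synthetic two-class problem seen through two different feature spaces."""
    rng = np.random.default_rng(seed)
    d_latent, d_src, d_tgt = 2, 6, 4
    # Hypothetical linear "views" that turn a shared latent concept into
    # source features (6-D) and target features (4-D).
    view_src = rng.normal(size=(d_latent, d_src))
    view_tgt = rng.normal(size=(d_latent, d_tgt))

    def sample(n):
        labels = rng.integers(0, 2, size=n)
        latent = rng.normal(scale=0.3, size=(n, d_latent))
        latent[:, 0] += np.where(labels == 1, 2.0, -2.0)
        return latent, labels

    # Step 1: learn the mapping from unlabeled instances seen in both spaces.
    z_pairs, _ = sample(200)
    w = fit_linear_map(z_pairs @ view_src, z_pairs @ view_tgt)

    # Step 2: re-represent labeled source data in the target feature space.
    z_src, y_src = sample(100)
    src_in_tgt = (z_src @ view_src) @ w

    # Step 3: train on the mapped source data, predict unlabeled target data.
    z_tgt, y_tgt = sample(100)
    pred = nearest_centroid_predict(src_in_tgt, y_src, z_tgt @ view_tgt)
    return float((pred == y_tgt).mean())


if __name__ == "__main__":
    print("target-domain accuracy:", run_demo())
```

The target labels `y_tgt` are used only to score the predictions, matching the transductive setting in which the target domain contributes no labels during training.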