Multiple Kernel Geometric Mean Metric Learning for Heterogeneous Data
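Cite this article: QI Ren, ZHU Peng-Fei, LIANG Jian-Qing. Multiple Kernel Geometric Mean Metric Learning for Heterogeneous Data[J]. Journal of Software, 2017, 28(11): 2992-3001
Authors: QI Ren, ZHU Peng-Fei, LIANG Jian-Qing
Affiliation (all authors): School of Computer Science and Technology, Tianjin University, Tianjin 300054, China
Funding: National Natural Science Foundation of China (61502332, 61732011)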
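Abstract: Choosing an appropriate distance metric is crucial in machine learning and pattern recognition tasks. Metric learning mainly uses discriminative information to learn a Mahalanobis distance or a similarity metric. However, most existing metric learning methods are designed for numerical data; for structured data such as categorical (symbolic) data, it is inappropriate to measure the similarity between two objects with traditional distance metrics. In addition, most metric learning methods suffer from the curse of dimensionality: high-dimensional features lead to long training times and poor model scalability. This paper proposes a geometric-mean-based metric learning method for heterogeneous data. Numerical and categorical data are mapped into reproducing kernel Hilbert spaces by different kernel functions, which avoids the negative effects of high feature dimensionality. A multiple kernel metric learning model based on the geometric mean is also proposed; it transforms the metric learning problem for heterogeneous data into finding the midpoint between two points on a Riemannian manifold. Experimental results on UCI datasets show that the proposed multiple kernel metric learning method for heterogeneous data achieves higher accuracy than existing metric learning methods.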

Keywords: geometric mean; multiple kernel learning; metric learning; heterogeneous data
Received: 2017-05-13
Revised: 2017-06-16
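The abstract states the model only at a high level, so what follows is a minimal Python sketch under explicit assumptions, not the authors' implementation. It assumes an RBF kernel for the numerical attributes and a simple attribute-matching (overlap) kernel for the categorical attributes, summed with fixed equal weights; each object is represented by its row of the combined kernel matrix (an empirical kernel map). On top of that it applies the closed-form solution of the basic geometric mean metric learning (GMML) objective, A = S^{-1} #_{1/2} D, where S and D are the scatter matrices of similar and dissimilar pairs and P #_{1/2} Q = P^{1/2} (P^{-1/2} Q P^{-1/2})^{1/2} P^{1/2} is the midpoint of the geodesic between two symmetric positive-definite matrices. This is one concrete reading of the "midpoint between two points on a Riemannian manifold" described in the abstract. The kernel choices, weights, the ridge term `reg`, the toy data and all function names (`rbf_kernel`, `overlap_kernel`, `gmml`, and so on) are hypothetical.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel on numerical attributes: exp(-gamma * ||x - y||^2)."""
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def overlap_kernel(C, E):
    """Assumed categorical kernel: fraction of attributes whose symbols are equal."""
    return (C[:, None, :] == E[None, :, :]).mean(axis=2)


def sqrtm_spd(M):
    """Square root of a symmetric positive-definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(w)) @ V.T


def geometric_mean(P, Q):
    """P #_{1/2} Q: midpoint of the affine-invariant geodesic between SPD P and Q."""
    P_half = sqrtm_spd(P)
    P_inv_half = np.linalg.inv(P_half)
    return P_half @ sqrtm_spd(P_inv_half @ Q @ P_inv_half) @ P_half


def pair_scatter(Z, pairs):
    """Sum of (z_i - z_j)(z_i - z_j)^T over the given index pairs."""
    diffs = np.array([Z[i] - Z[j] for i, j in pairs])
    return diffs.T @ diffs


def gmml(Z, similar, dissimilar, reg=1e-3):
    """Closed-form basic GMML solution A = S^{-1} #_{1/2} D.

    The small ridge term `reg` (an assumption) keeps S and D invertible.
    """
    eye = np.eye(Z.shape[1])
    S = pair_scatter(Z, similar) + reg * eye
    D = pair_scatter(Z, dissimilar) + reg * eye
    return geometric_mean(np.linalg.inv(S), D)


def metric_distance(A, z1, z2):
    """Squared Mahalanobis-type distance (z1 - z2)^T A (z1 - z2)."""
    d = z1 - z2
    return float(d @ A @ d)


# Toy heterogeneous data: 3 numerical and 2 categorical attributes, 2 classes.
rng = np.random.default_rng(0)
X_num = np.vstack([rng.normal(0.0, 1.0, (5, 3)), rng.normal(3.0, 1.0, (5, 3))])
X_cat = np.array([["a", "x"]] * 5 + [["b", "y"]] * 5)
y = np.array([0] * 5 + [1] * 5)

# Equal-weight sum of the two kernels (hypothetical weights); row i of K serves
# as the representation of object i (empirical kernel map).
K = 0.5 * rbf_kernel(X_num, X_num, gamma=0.5) + 0.5 * overlap_kernel(X_cat, X_cat)

pairs = [(i, j) for i in range(len(y)) for j in range(i + 1, len(y))]
similar = [(i, j) for i, j in pairs if y[i] == y[j]]
dissimilar = [(i, j) for i, j in pairs if y[i] != y[j]]

A = gmml(K, similar, dissimilar)
print("same class:", metric_distance(A, K[0], K[1]))
print("different classes:", metric_distance(A, K[0], K[5]))
```

Because the learned matrix acts on kernel rows, it is n × n (n training objects) regardless of the original number of attributes, which is one possible way to read the abstract's point about avoiding high feature dimensionality. Under this sketch, a new object x would be represented by [k(x, x_1), ..., k(x, x_n)] before computing distances with A.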
