首页 | 本学科首页   官方微博 | 高级检索  
     

重叠局部高斯过程回归
引用本文:刘晓芳,刘策,刘露咪,程丹松.重叠局部高斯过程回归[J].哈尔滨工业大学学报,2019,51(11):22-26.
作者姓名:刘晓芳  刘策  刘露咪  程丹松
作者单位:哈尔滨工业大学 电气工程及自动化学院,哈尔滨,150001;哈尔滨工业大学 计算机科学与技术学院,哈尔滨,150001;北京控制工程研究所,北京,100000
基金项目:国家自然科学基金(51677042)
摘    要:高斯过程是一种函数的分布,在机器学习领域常用于回归.对于n个训练样本,其训练和预测时间复杂度分别为O(n~3)和O(n~2),因此难以应用于大规模数据.针对这个问题,本文基于分治的思想,提出一种简单高效的近似模型,称为"重叠局部高斯过程".本文方法假设随机变量在给定邻近变量的值后,会与距离较远的变量条件独立.首先将训练样本集递归划分,构建一棵三叉树,其中兄弟节点包含的样本存在交集,交集中的样本起到诱导点的作用,可构建相邻区域的依赖关系.然后利用每个叶子结点所包含的样本建立局部的高斯过程回归模型,在当前假设下,每个父节点的边缘似然和预测分布可通过组合其子节点的计算结果来近似,从而降低计算量.同时,这种组合方式可保证拟合的函数是连续的.理论分析表明,对于n个训练样本,近似模型训练和预测的时间复杂度均为O(n~t),其中t与交集的大小相关,通常介于1与2之间.此外通过在公共数据集上的实验对比也验证了本文近似模型的有效性.

关 键 词:模式识别  高斯过程  回归  贝叶斯学习  大规模机器学习
收稿时间:2019/4/6 0:00:00

Overlapped Local Gaussian Process Regression
LIU Xiaofang,LIU Ce,LIU Lumi and CHENG Dansong.Overlapped Local Gaussian Process Regression[J].Journal of Harbin Institute of Technology,2019,51(11):22-26.
Authors:LIU Xiaofang  LIU Ce  LIU Lumi and CHENG Dansong
Affiliation:School of Electrical Engineering & Automation, Harbin Institute of Technology, Harbin 150001, China,School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China,Beijing Institute of Control Engineering, Beijing 100000, China and School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Abstract:Gaussian processes (GPs) are distributions of functions, and are commonly used for regression in machine learning community. For n training samples, their time complexity for training and prediction is O(n3) and O(n2) respectively. The high computational cost hinders their application to large datasets. Inspired by the divide and conquer strategy, the paper proposed a simple but efficient approximate model, called Overlapped Local Gaussian Process (OLGP). The method assumes that given the nearest neighbors, the random variables are independent of farther ones. The training samples are recursively divided and a ternary tree is constructed in the end. Sibling nodes have intersections where the samples play the role of inducing points to model the dependency between neighboring regions. Each leaf node is associated with a local GP regression model. The evidence and predictive distribution in parent nodes are composed of the ones from their sons, which reduces the computational cost significantly. Furthermore, the fitted function is guaranteed to be continuous. A theoretical analysis shows that the time complexity for training and prediction reduces to O(nt) for n training samples, where t depends on the proportion of the intersection in each level, and is usually between 1 and 2. The paper demonstrated the effectiveness of the method by speed-accuracy performance on several real-world datasets.
Keywords:pattern recognition  gaussian process  regression  bayesian learning  large scale machine learning
本文献已被 CNKI 万方数据 等数据库收录!
点击此处可从《哈尔滨工业大学学报》浏览原始摘要信息
点击此处可从《哈尔滨工业大学学报》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号