首页 | 本学科首页   官方微博 | 高级检索  
     

基于KL散度和近邻点间距离的球面嵌入算法
引用本文:张变兰,路永钢,张海涛. 基于KL散度和近邻点间距离的球面嵌入算法[J]. 计算机应用, 2017, 37(3): 680-683. DOI: 10.11772/j.issn.1001-9081.2017.03.680
作者姓名:张变兰  路永钢  张海涛
作者单位:兰州大学 信息科学与工程学院, 兰州 730000
基金项目:国家自然科学基金面上项目(61272213);中央高校基本科研业务费专项资金资助项目(lzujbky-2016-k07,lzujbky-2016-142)。
摘    要:针对现有球面嵌入算法在非近邻点间的距离度量不准确或缺失的情况下,不能有效地进行低维嵌入的问题,提出了一种新的球面嵌入算法,它能够只利用近邻点间的距离,将任何尺度的高维数据嵌入到单位球面上,同时求出适合原始数据分布的球面半径。该算法从一个随机产生的球面分布开始,利用KL散度衡量每对近邻点间的归一化距离在原始空间和球面空间中的差异,并基于此差异构建出目标函数,然后再用带有动量的随机梯度下降法,不断优化球面上点的分布,直到结果稳定。为了测试算法,模拟产生了两类球面分布数据:分别是球面均匀分布和球面正态分布的数据。实验结果表明,对于球面均匀分布的数据,即使在近邻点个数很少的情况下,仍然能够将数据准确地嵌入球面空间,嵌入后的数据分布与原始数据分布的均方根误差(RMSE)低于0.00001,且球面半径的估算误差低于0.000001;而对于球面正态分布的数据,在近邻点个数较多的情况下,该算法也可以将数据较准确地嵌入球面空间。因此,在非近邻点间距离缺失的情况下,所提方法仍然可以较准确地对数据进行低维嵌入,这非常有利于数据的可视化研究。

关 键 词:球面嵌入  KL散度  随机梯度下降法  最近邻  
收稿时间:2016-09-19
修稿时间:2016-11-11

Spherical embedding algorithm based on Kullback-Leibler divergence and distances between nearest neighbor points
ZHANG Bianlan,LU Yonggang,ZHANG Haitao. Spherical embedding algorithm based on Kullback-Leibler divergence and distances between nearest neighbor points[J]. Journal of Computer Applications, 2017, 37(3): 680-683. DOI: 10.11772/j.issn.1001-9081.2017.03.680
Authors:ZHANG Bianlan  LU Yonggang  ZHANG Haitao
Affiliation:School of Information Science and Engineering, Lanzhou University, Lanzhou Gansu 730000, China
Abstract:Aiming at the problem that the existing spherical embedding algorithm cannot effectively embed the data into the low-dimensional space in the case that the distances between points far apart are inaccurate or absent, a new spherical embedding method was proposed, which can take the distances between the nearest neighbor points as input, and embeds high dimensional data of any scale onto the unit sphere, and then estimates the radius of the sphere which fit the distribution of the original data. Starting from a randomly generated spherical distribution, the Kullback-Leibler (KL) divergence was used to measure the difference of the normalized distance between each pair of neighboring points in the original space and the spherical space. Based on the difference, the objective function was constructed. Then, the stochastic gradient descent method with momentum was used to optimize the distribution of the points on the sphere until the result is stable. To test the algorithm, two types of spherical distribution data sets were simulated: which are spherical uniform distribution and Kent distribution on the unit sphere. The experimental results show that, for the uniformly distributed data, the data can be accurately embedded in the spherical space even if the number of neighbors is very small, the Root Mean Square Error (RMSE) of the embedded data distribution and the original data distribution is less than 0.00001, and the spherical radius of the estimated error is less than 0.000001; for spherical normal distribution data, the data can be embedded into the spherical space accurately when the number of neighbors is large. Therefore, in the case that the distance between points far apart are absent, the proposed method can still be quite accurate for low-dimensional data embedding, which is very helpful for the visualization of data.
Keywords:spherical embedding   Kullback-Leibler (KL) divergence   stochastic gradient descent method   nearest neighbor
点击此处可从《计算机应用》浏览原始摘要信息
点击此处可从《计算机应用》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号