
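Research on High-Computational-Efficiency Unsupervised Clustering Based on Quantization Error and Fractal Theory
Citation: HU Guo-sheng, YANG Hai-tao. Research on high-computational-efficiency unsupervised clustering based on quantization error and fractal theory[J]. Application Research of Computers, 2016, 33(10).
Authors: HU Guo-sheng, YANG Hai-tao
Affiliations: School of Software, Guangdong Food and Drug Vocational College; School of Mathematics, Zhejiang University
Funding: Natural Science Foundation of Zhejiang Province (Y1090416); Natural Science Foundation of Zhejiang Province (Y1091084)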
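Abstract: Existing vector clustering algorithms must learn from a large amount of complex data before they achieve good clustering results, and they perform poorly on multidimensional big data. To address this, an unsupervised clustering algorithm with high computational efficiency, based on quantization error and fractal theory, is proposed. First, a parametric model of the quantization error is built for the data set, and the rate-distortion curve of the data set is obtained from its spatial structure; then, the effective dimensionality of the data space is obtained by estimating the rate-distortion curve; finally, fractal theory is used to obtain the optimal number of clusters for the target data set by searching the parameters of its quantization model. Experimental results show that the proposed parametric quantization-error model estimates the effective dimensionality of data sets well, and that, on numerical data sets, the algorithm outperforms existing vector clustering algorithms both in estimating the optimal number of clusters and in computational efficiency.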
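The abstract does not give the parametric quantization-error model itself. One standard way to link quantization error to dimensionality, which may or may not match the authors' model, is the high-resolution approximation D(k) ≈ C · k^(-2/d): the mean squared error of a good k-point codebook decays with exponent 2/d, so the slope of log D(k) against log k yields an effective dimension d. The sketch below assumes k-means (Lloyd's algorithm) as the quantizer and a least-squares fit over a small grid of codebook sizes; the function names and the `ks` grid are hypothetical choices for illustration.

```python
import numpy as np


def kmeans_distortion(X, k, n_iter=50, seed=0):
    """Run Lloyd's k-means and return the mean squared quantization error."""
    rng = np.random.default_rng(seed)
    # Initialise the codebook with k distinct data points.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every centroid, shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cell empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()


def estimate_effective_dimension(X, ks=(8, 16, 32, 64)):
    """Fit log D(k) = log C - (2/d) log k by least squares and return d."""
    ks = np.asarray(ks, dtype=float)
    D = np.array([kmeans_distortion(X, int(k)) for k in ks])
    slope, _intercept = np.polyfit(np.log(ks), np.log(D), 1)
    return -2.0 / slope


# Usage: a uniform 2-D square isometrically embedded in 5-D space.
rng = np.random.default_rng(42)
Z = np.hstack([rng.uniform(size=(2000, 2)), np.zeros((2000, 3))])
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))  # random orthogonal rotation
X = Z @ Q.T
print(estimate_effective_dimension(X))  # expected to be close to 2, not exact
```

Although the data live in 5 coordinates, the distortion decays roughly like 1/k, so the estimate lands near 2. Small codebooks are affected by boundary effects and very large ones by finite-sample bias, which is why a moderate `ks` range is used here.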

Keywords: fractal theory; quantization error; rate-distortion curve; data dimensionality estimation; unsupervised clustering; multidimensional data
Received: 2015/6/23
Revised: 9/5/2016

Quantization error and fractal theory based high computation efficiency unsupervised clustering algorithm
HU Guo-sheng, YANG Hai-tao. Quantization error and fractal theory based high computation efficiency unsupervised clustering algorithm[J]. Application Research of Computers, 2016, 33(10).
Authors: HU Guo-sheng and YANG Hai-tao
Affiliations: School of Software, Guangdong Food and Drug Vocational College, Guangzhou, Guangdong; School of Mathematics, Zhejiang University
Abstract: Existing vector clustering algorithms need to learn from a large amount of complex data to achieve good clustering performance, and they perform poorly on multidimensional big data. To address this problem, an unsupervised clustering algorithm with high computational efficiency based on quantization error and fractal theory is proposed. Firstly, a parametric model of the quantization error is constructed for the data set, and the rate-distortion curve is obtained from the spatial structure of the data set; then, the effective dimensionality of the data set is computed by estimating the rate-distortion curve; lastly, the optimal number of clusters for the target data set is obtained with fractal theory by searching the parameters of the quantization model. Experimental results show that the proposed quantization-error model estimates the effective dimensionality of data sets well, and that the proposed algorithm outperforms existing vector clustering algorithms in finding the optimal number of clusters and in computational efficiency on numerical data sets.
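The abstract states that the optimal number of clusters is found by searching the quantization model's parameters with fractal theory, but it gives no details of that search. As a related, swapped-in technique (not the authors' method), the jump method of Sugar and James also combines distortion with a dimension: it transforms the per-dimension distortion d_k into d_k^(-Y) with Y = d/2 and picks the k with the largest jump J_k = d_k^(-Y) - d_(k-1)^(-Y). The minimal sketch below assumes plain squared Euclidean distortion, k-means with k-means++ seeding, and an effective dimension supplied by the caller; `kmeans_mse` and `jump_method` are hypothetical names.

```python
import numpy as np


def kmeans_mse(X, k, rng, n_init=5, n_iter=100):
    """Best-of-n_init Lloyd's k-means with k-means++ seeding; returns the MSE."""
    best = np.inf
    for _ in range(n_init):
        # k-means++ seeding: sample new centres with probability ~ squared distance.
        centers = X[[rng.integers(len(X))]]
        while len(centers) < k:
            d2 = ((X[:, None, :] - centers[None]) ** 2).sum(-1).min(1)
            idx = rng.choice(len(X), p=d2 / d2.sum())
            centers = np.vstack([centers, X[idx]])
        # Lloyd iterations; an empty cluster keeps its previous centre.
        for _ in range(n_iter):
            labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
            new = np.array([X[labels == j].mean(0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        mse = ((X[:, None, :] - centers[None]) ** 2).sum(-1).min(1).mean()
        best = min(best, mse)
    return best


def jump_method(X, effective_dim, k_max=8, seed=0):
    """Pick k maximising J_k = d_k^(-Y) - d_(k-1)^(-Y), with Y = effective_dim / 2."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    Y = effective_dim / 2.0
    # Per-dimension distortion d_k for k = 1..k_max.
    d = np.array([kmeans_mse(X, k, rng) / p for k in range(1, k_max + 1)])
    transformed = np.concatenate([[0.0], d ** (-Y)])  # convention: d_0^(-Y) = 0
    jumps = np.diff(transformed)  # jumps[k-1] = J_k
    return int(np.argmax(jumps)) + 1, jumps


# Usage: four well-separated Gaussian blobs in the plane (effective dimension 2).
rng = np.random.default_rng(1)
means = np.array([[0, 0], [10, 0], [0, 10], [10, 10]], dtype=float)
X = np.vstack([m + rng.normal(size=(100, 2)) for m in means])
best_k, jumps = jump_method(X, effective_dim=2.0)
print(best_k)  # expected: 4
```

The transform d_k^(-Y) is what ties the cluster count to dimensionality: with the wrong Y the jumps can favour too few or too many clusters, which is one reason an effective-dimension estimate is useful before choosing k.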
Keywords: fractal theory; quantization error; rate-distortion curve; data dimensionality estimation; unsupervised clustering; multidimensional data