
Efficient Similarity Computing Methods in High-Dimensional Data
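Cite this article: YE Shi-Ren, YOU Xiang-Tao, SHI Zhong-Zhi, LI Xiao-Li. Efficient similarity computing methods in high-dimensional data [J]. Journal of Computer Research and Development, 2000, 37(10): 1166-1172.
Authors: YE Shi-Ren, YOU Xiang-Tao, SHI Zhong-Zhi, LI Xiao-Li
Affiliation: Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080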
Funding: National Natural Science Foundation of China (Grant No. 69803010); National "863" High Technology Research and Development Program of China (Grant Nos. 863-511-946 and 863-818-07)
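Abstract: Computing similarity is a key problem in lazy learning methods such as CBR and k-NN. This paper studies ways to reduce the cost of similarity computation and, taking k-NN as an example, presents a similarity algorithm based on partial features and a similarity algorithm based on projection. Both improve efficiency by reducing the number of features involved in computing distances. Experiments show a clear gain: the partial-feature-based k-NN algorithm is 26% to 28% more efficient, and the projection-based k-NN algorithm is 48% to 83% more efficient. The authors have applied the algorithms in engineering practice.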

Keywords: similarity; computing method; high-dimensional data; data mining; database
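The abstract does not give the details of the partial-feature algorithm. One standard way to touch fewer features per distance computation is partial distance search, sketched below under stated assumptions: squared Euclidean distance, numeric features in plain Python lists, and a hypothetical function name `partial_distance_knn`. The running sum over features is abandoned as soon as it reaches the current k-th best squared distance, because adding further non-negative terms can only make it larger. This illustrates the general idea and is not presented as the authors' published procedure.

```python
import heapq
from typing import List, Sequence, Tuple


def partial_distance_knn(
    train: Sequence[Sequence[float]],
    query: Sequence[float],
    k: int,
) -> Tuple[List[Tuple[float, int]], int]:
    """Hypothetical helper: return the k nearest training rows to `query` as
    (squared_distance, index) pairs sorted by distance, plus the number of
    per-feature operations performed.

    Assumed technique (partial distance search): the squared distance is
    summed feature by feature, and summation stops early once the partial sum
    already reaches the current k-th best squared distance.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Max-heap of the best k candidates, stored as (-squared_distance, index).
    heap: List[Tuple[float, int]] = []
    feature_ops = 0
    for idx, row in enumerate(train):
        # No pruning threshold until k candidates have been collected.
        bound = -heap[0][0] if len(heap) == k else float("inf")
        acc = 0.0
        pruned = False
        for a, b in zip(row, query):
            diff = a - b
            acc += diff * diff
            feature_ops += 1
            if acc >= bound:
                # The remaining features cannot make this row closer; skip it.
                pruned = True
                break
        if pruned:
            continue
        if len(heap) < k:
            heapq.heappush(heap, (-acc, idx))
        else:
            # acc < bound here, so this row replaces the current worst candidate.
            heapq.heapreplace(heap, (-acc, idx))
    result = sorted((-neg, idx) for neg, idx in heap)
    return result, feature_ops


if __name__ == "__main__":
    train = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [5.0, 5.0, 5.0], [0.5, 0.0, 0.0]]
    neighbors, ops = partial_distance_knn(train, [0.0, 0.0, 0.0], k=2)
    print(neighbors, ops)  # [(0.0, 0), (0.25, 3)] 10
```

In this small example the third row is rejected after its first feature, so the query reads 10 feature values instead of the 12 a full scan would need.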

EFFICIENT SIMILARITY COMPUTING METHODS IN HIGH DIMENSIONAL DATA
YE Shi-Ren, YOU Xiang-Tao, SHI Zhong-Zhi, LI Xiao-Li. Efficient similarity computing methods in high dimensional data [J]. Journal of Computer Research and Development, 2000, 37(10): 1166-1172.
Authors: YE Shi-Ren, YOU Xiang-Tao, SHI Zhong-Zhi, LI Xiao-Li
Abstract: Similarity is a pivotal notion in research on lazy learning, such as case-based reasoning (CBR) and k-NN (k-nearest neighbor). This paper studies how to reduce the cost of computing similarity and introduces two similarity calculation algorithms: one based on partial features and one based on projection. For brevity and clarity, both are described within the k-NN procedure, as the partial-feature-based k-NN algorithm and the projection-based k-NN algorithm. Using only a few features when computing distances improves efficiency, and the improvement is substantial in our experiments: about 26% to 28% for the former and 48% to 83% for the latter. The algorithms have also been applied successfully in practice.
Keywords: similarity; data reduction; nearest neighbor; lazy learning; data mining; KDD
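The projection-based algorithm is only named in the abstract. A classic projection idea, shown here as an assumed stand-in rather than the authors' algorithm, sorts the training rows by one coordinate and uses the gap |x[axis] - q[axis]|, which never exceeds the Euclidean distance, as a lower bound for pruning. The function name `projection_knn` and the choice of a single coordinate axis are hypothetical.

```python
import bisect
import heapq
import math
from typing import List, Sequence, Tuple


def projection_knn(
    train: Sequence[Sequence[float]],
    query: Sequence[float],
    k: int,
    axis: int = 0,
) -> Tuple[List[Tuple[float, int]], int]:
    """Hypothetical helper: return the k nearest rows as (euclidean_distance,
    index) pairs sorted by distance, plus how many full distances were computed.

    Assumed technique: rows are sorted by their coordinate on `axis`. Because
    |x[axis] - q[axis]| <= ||x - q||, the projected gap is a lower bound on the
    true distance, so the outward scan from the query stops once that gap
    reaches the current k-th best distance.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # In practice this sort would be done once, offline, not per query.
    order = sorted(range(len(train)), key=lambda i: train[i][axis])
    keys = [train[i][axis] for i in order]
    q = query[axis]
    right = bisect.bisect_left(keys, q)  # first key >= q
    left = right - 1                     # last key < q
    heap: List[Tuple[float, int]] = []   # max-heap via (-distance, index)
    full_distances = 0

    while left >= 0 or right < len(order):
        # Visit the side with the smaller projected gap, so gaps never decrease.
        gap_left = q - keys[left] if left >= 0 else math.inf
        gap_right = keys[right] - q if right < len(order) else math.inf
        if gap_left <= gap_right:
            gap, pos = gap_left, left
            left -= 1
        else:
            gap, pos = gap_right, right
            right += 1
        if len(heap) == k and gap >= -heap[0][0]:
            # All remaining rows have a gap >= this one, so none can be closer.
            break
        idx = order[pos]
        dist = math.dist(train[idx], query)
        full_distances += 1
        if len(heap) < k:
            heapq.heappush(heap, (-dist, idx))
        elif dist < -heap[0][0]:
            heapq.heapreplace(heap, (-dist, idx))
    return sorted((-neg, idx) for neg, idx in heap), full_distances


if __name__ == "__main__":
    train = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [10.0, 0.0], [0.2, 5.0]]
    print(projection_knn(train, [0.0, 0.0], k=2))  # ([(0.0, 0), (1.0, 1)], 3)
```

With k = 2, only three of the five full distances are computed: the row at x = 3.0 has a projected gap of 3.0, already larger than the current second-best distance 1.0, so the scan stops there and the row at x = 10.0 is never examined.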
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.