
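Unsupervised Fractal Attribute Reduction Based on Genetic Algorithms
Cite this article: YAN Guang-hui, LI Zhan-huai. Unsupervised fractal attribute reduction based on genetic algorithms [J]. Computer Engineering and Applications, 2008, 44(10): 23-27.
Authors: YAN Guang-hui  LI Zhan-huai
Affiliation: School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China
Funding: National Natural Science Foundation of China (Grant No. 60573096); Foundation of the Gansu Province Educational Department (Grant No. 0604-09)
Abstract: Attribute reduction is an effective technique for coping with the "curse of dimensionality". Fractal Dimensionality Reduction (FDR) is an unsupervised attribute selection technique that has emerged in recent years; unfortunately, it requires multiple scans of the data set and therefore copes poorly with high-dimensional data sets. Attribute reduction based on genetic algorithms outperforms traditional attribute selection techniques on high-dimensional data, but it cannot be applied to unsupervised learning. To address this, the authors combine the inherently stochastic, parallel search mechanism of genetic algorithms with the unsupervised nature of fractal attribute selection, and design and implement GABUFSS (Genetic Algorithm Based Unsupervised Feature Subset Selection), a genetic-algorithm-based unsupervised fractal attribute subset selection algorithm. Experiments on synthetic and real data sets compare the performance of GABUFSS and FDR; the results show that GABUFSS performs better than FDR and is able to discover equivalent attribute subsets.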

Keywords: data mining  curse of dimensionality  genetic algorithm  attribute reduction  fractal
Article ID: 1002-8331(2008)10-0023-05
Received: 2007-11-19
Revised: 2007-11-19

Unsupervised dimensionality reduction based on fractal dimension and genetic algorithm
YAN Guang-hui, LI Zhan-huai. Unsupervised dimensionality reduction based on fractal dimension and genetic algorithm [J]. Computer Engineering and Applications, 2008, 44(10): 23-27.
Authors: YAN Guang-hui  LI Zhan-huai
Affiliation: School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China
Abstract: Dimensionality reduction is a powerful method for tackling the "curse of dimensionality". Feature subset selection based on genetic algorithms outperforms traditional feature selection methods on high-dimensional data sets. However, it cannot be used in unsupervised learning tasks such as clustering, where no class labels are available. FDR (Fractal Dimensionality Reduction) is a recent unsupervised feature selection method, but it is impractical for high-dimensional data sets because it scans the data set multiple times and is therefore time-consuming. Accordingly, the authors propose the GABUFSS (Genetic Algorithm Based Unsupervised Feature Subset Selection) algorithm, which combines the genetic algorithm with fractal dimensionality reduction to address unsupervised feature subset selection on high-dimensional data sets. Experimental results on synthetic and real-life data sets show that GABUFSS outperforms FDR on high-dimensional data sets and can additionally find equivalent feature subsets.
Keywords: data mining  curse of dimensionality  genetic algorithm  dimensionality reduction  fractal
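
The abstract does not give the details of GABUFSS, so the following is only a minimal Python sketch of the general idea it describes: a genetic algorithm searching over attribute bitmasks, scored without class labels by how well a subset preserves the fractal dimension of the full data set. Everything specific here is an assumption made for illustration and not the authors' method: the box-counting estimate of the correlation fractal dimension, the fitness function with its size penalty, the function names correlation_dimension and ga_select, and all parameter values.

```python
import math
import random
from collections import Counter


def correlation_dimension(points, attrs, levels=(2, 4, 8)):
    """Box-counting estimate of the correlation fractal dimension.

    For each grid resolution k (cell side r = 1/k), S(r) is the sum of squared
    cell occupancy fractions; the estimate is the least-squares slope of
    log S(r) against log r. Assumes every attribute is scaled to [0, 1].
    """
    attrs = list(attrs)
    if not attrs:
        return 0.0
    n = len(points)
    xs, ys = [], []
    for k in levels:
        cells = Counter(
            tuple(min(int(p[a] * k), k - 1) for a in attrs) for p in points
        )
        s = sum((c / n) ** 2 for c in cells.values())
        xs.append(math.log(1.0 / k))
        ys.append(math.log(s))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def ga_select(points, n_attrs, pop_size=20, generations=15, penalty=0.2, seed=0):
    """Hypothetical GA over attribute bitmasks (illustrative, not GABUFSS itself)."""
    rng = random.Random(seed)
    full = tuple([1] * n_attrs)
    d_full = correlation_dimension(points, range(n_attrs))
    cache = {}

    def fitness(mask):
        # Assumed fitness: stay close to the full-set dimension, use few attributes.
        if mask not in cache:
            attrs = [i for i, bit in enumerate(mask) if bit]
            d = correlation_dimension(points, attrs)
            cache[mask] = -abs(d_full - d) - penalty * len(attrs) / n_attrs
        return cache[mask]

    # The all-attributes mask is seeded so the result is never worse than it.
    pop = [full] + [
        tuple(rng.randint(0, 1) for _ in range(n_attrs)) for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        nxt = pop[:2]  # elitism: carry the two best masks over unchanged
        while len(nxt) < pop_size:
            # Tournament selection of two parents, one-point crossover, bit-flip mutation.
            a, b = (max(rng.sample(pop, 3), key=fitness) for _ in range(2))
            cut = rng.randrange(1, n_attrs)
            child = list(a[:cut] + b[cut:])
            for i in range(n_attrs):
                if rng.random() < 0.05:
                    child[i] ^= 1
            nxt.append(tuple(child))
        pop = nxt
    best = max(pop, key=fitness)
    return [i for i, bit in enumerate(best) if bit], fitness(best)


# Toy data: attributes 2 and 3 are exact copies of attributes 0 and 1.
data_rng = random.Random(1)
base = [(data_rng.random(), data_rng.random()) for _ in range(500)]
points = [(x, y, x, y) for x, y in base]

selected, score = ga_select(points, n_attrs=4)
print("selected attributes:", selected, "fitness:", round(score, 3))
```

In this toy data set the duplicated columns add no information, so several different subsets (for example one attribute from each duplicated pair) preserve the estimated fractal dimension equally well. That is one way to read the abstract's remark about finding equivalent attribute subsets, though the paper's own definition may differ.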