
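A Scalable Feature Selection Algorithm for Classification (分类问题的一种可伸缩特征选择算法)
Cite this article: ZHANG Wei, ZOU Xiang, WU Xiao-ru. 分类问题的一种可伸缩特征选择算法 [A Scalable Feature Selection Algorithm for Classification][J]. 计算机学报 (Chinese Journal of Computers), 2005, 28(7): 1223-1229.
Authors: ZHANG Wei (张巍)  ZOU Xiang (邹翔)  WU Xiao-ru (吴晓如)
Affiliations: Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230027; Department of Computer Science and Technology, University of Science and Technology of China, Hefei 230027; Anhui USTC iFlytek Co., Ltd., Hefei 230088
Funding: Supported by the National "863" High-Tech Research and Development Program of China (2004AA114030).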
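Abstract (translated from the Chinese): Feature selection is an important problem in classification for data mining. This paper derives SCD, a new measure of the relevance between a feature and the class: the CV coefficient describing the class distribution over the sequence of feature values. Using this measure, it gives a linear, scalable feature selection algorithm, StaFSOS, and proves that when the number of classes is 2 the SCD measure satisfies the monotonicity required by the branch-and-bound method. It also gives a complete form of StaFSOS, named BBStaFS. On 12 standard data sets, the results of StaFSOS almost coincide with the target sets, and StaFSOS is more efficient than the other algorithms; on 1 further data set, BBStaFS obtains the exact result. In a test on real data with 1000 samples and 20 features, the running time of StaFSOS is 1/2 that of GRSR, currently one of the faster algorithms, and the resulting feature set is accurate and effective.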

Keywords (translated from the Chinese): data mining  classification  feature selection

A Scalable Feature Selection Algorithm for Classification
ZHANG Wei,ZOU Xiang,WU Xiao-ru.A Scalable Feature Selection Algorithm for Classification[J].Chinese Journal of Computers,2005,28(7):1223-1229.
Authors:ZHANG Wei  ZOU Xiang  WU Xiao-ru
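Affiliation: ZHANG Wei 1), ZOU Xiang 2), WU Xiao-ru 3). 1) Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230027; 2) Department of Computer Science and Technology, University of Science and Technology of China, Hefei 230027; 3) Anhui USTC iFlytek Co., Ltd., Hefei 230088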
Abstract: Feature selection is an important issue in classification mining. This paper derives a dependence measure named SCD from statistical theory; the measure is the CV ratio describing the class distribution over each feature's values. Based on the SCD measure, a feature selection algorithm with linear I/O cost, StaFSOS, is constructed. The SCD measure is proven to satisfy the monotonicity required by the Branch & Bound (B&B) algorithm when there are only two classes, so StaFSOS and B&B are combined into the BBStaFS feature selection algorithm. On 12 open benchmarks, the features selected by StaFSOS are consistent with the target features, and StaFSOS is more efficient than the other algorithms; on one further benchmark, BBStaFS selects the target features. On real-world data with 1000 samples and 20 features, StaFSOS selects the target features with a runtime just half that of GRSR, one of the most efficient recent algorithms.
Keywords:data mining  classification  feature selection
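
The abstract relies on the monotonicity condition of Branch & Bound: removing a feature from a subset never increases the criterion value. The following is a minimal sketch of generic branch-and-bound subset selection under that condition, not the paper's BBStaFS. The SCD formula is not given in this record, so the `criterion` function, the `weights` values, and all names below are hypothetical stand-ins chosen only to satisfy monotonicity.

```python
from typing import Callable, FrozenSet, Tuple


def branch_and_bound_select(
    n_features: int,
    target_size: int,
    criterion: Callable[[FrozenSet[int]], float],
) -> Tuple[FrozenSet[int], float]:
    """Return the size-`target_size` subset maximizing a monotone criterion.

    Assumes criterion(A) <= criterion(B) whenever A is a subset of B.
    """
    best_score = float("-inf")
    best_subset: FrozenSet[int] = frozenset()

    def search(subset: FrozenSet[int], next_removable: int) -> None:
        nonlocal best_score, best_subset
        score = criterion(subset)
        # Bound: by monotonicity, no subset of `subset` can beat `score`,
        # so the whole branch is skipped if it cannot improve on the best.
        if score <= best_score:
            return
        if len(subset) == target_size:
            best_score, best_subset = score, subset
            return
        # Branch: remove features in increasing index order so every
        # candidate subset is reached along exactly one path. The upper
        # limit leaves enough higher-indexed features for later removals.
        last = n_features - len(subset) + target_size
        for f in range(next_removable, last + 1):
            search(subset - {f}, f + 1)

    search(frozenset(range(n_features)), 0)
    return best_subset, best_score


# Hypothetical monotone criterion: a sum of non-negative per-feature weights.
weights = [0.5, 2.0, 0.1, 1.5, 0.9]


def criterion(subset: FrozenSet[int]) -> float:
    return sum(weights[i] for i in subset)


subset, score = branch_and_bound_select(len(weights), 2, criterion)
```

With these illustrative weights, the search keeps features 1 and 3 and skips any branch whose bound cannot beat the best complete subset found so far. The pruning is only valid because the stand-in criterion is monotone; per the abstract, SCD has that property when there are two classes.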
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.