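Consistent feature selection reduction for classification data sets
Cite this article: WU Xin-ling. Consistent feature selection reduction for classification data sets [J]. Computer Engineering and Applications, 2007, 43(18): 174-176.
Author: WU Xin-ling
Affiliation: Department of Information Engineering, Guangdong Polytechnic Normal University, Guangzhou 510262; State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072
Abstract: Inconsistency and redundant features in a sample data set reduce the quality and efficiency of classification. A consistent feature selection reduction method is proposed. Based on the Bayesian formula and a threshold, the method assigns inconsistent data to the most probable class, making the data set consistent. On the consistent data set, a class discernibility matrix is then used to select the minimal set of feature variables that accurately distinguishes the classes. The heuristic search strategy and application example presented show that the method effectively eliminates inconsistency in classification data sets, selects optimal feature variables, lowers the dimensionality of the data, and reduces redundant information in the data set.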

Keywords: data mining; classification; feature selection; data reduction
Article ID: 1002-8331(2007)18-0174-03
Revised: 2007-01

Consistent feature selection reduction about classification data set
WU Xin-ling. Consistent feature selection reduction about classification data set [J]. Computer Engineering and Applications, 2007, 43(18): 174-176.
Authors: WU Xin-ling
Affiliation: 1. Department of Information Engineering, Guangdong Polytechnic Normal University, Guangzhou 510262, China; 2. State Key Lab. of Software Engineering, Wuhan University, Wuhan 430072, China
Abstract: Inconsistency and redundant features in a sample data set degrade the quality and efficiency of classification. This paper proposes a consistent feature selection reduction method for classification data sets. Based on the Bayesian formula and a threshold value, the method assigns inconsistent data to the most probable category, making the data set consistent. A category distinguishing matrix is then built on the consistent data set and used to obtain the smallest subset of feature variables that can distinguish the categories accurately. A heuristic search strategy and a practical example are given. The results show that the method effectively eliminates inconsistency in the sample data set, selects optimal feature variables, lowers the dimensionality of the data, and reduces redundant information.
Keywords: data mining; classification; feature selection; data reduction
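
The two stages named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not give the threshold rule, the matrix construction, or the heuristic, so everything below is an assumption. In particular, the function names `make_consistent` and `select_features` are hypothetical, the posterior is estimated from raw frequencies, groups that fall below the threshold are simply dropped, and the search is a plain greedy cover that yields a small but not provably minimal feature subset.

```python
from collections import Counter, defaultdict
from itertools import combinations


def make_consistent(rows, labels, threshold=0.5):
    """Give every group of identical feature vectors a single class label.

    P(class | x) is estimated as count(x, class) / count(x), which is what
    Bayes' formula yields with empirical frequencies. A group is kept (as one
    deduplicated row) only if its most probable class reaches `threshold`;
    dropping the other groups is an assumption of this sketch.
    """
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        groups[tuple(row)][label] += 1

    out_rows, out_labels = [], []
    for row, counts in groups.items():
        best, n = counts.most_common(1)[0]
        if n / sum(counts.values()) >= threshold:
            out_rows.append(row)
            out_labels.append(best)
    return out_rows, out_labels


def select_features(rows, labels):
    """Greedily pick feature indices that separate all pairs of classes."""
    # One matrix entry per pair of rows from different classes:
    # the set of feature indices on which the two rows differ.
    entries = [
        {j for j, (a, b) in enumerate(zip(r1, r2)) if a != b}
        for (r1, c1), (r2, c2) in combinations(zip(rows, labels), 2)
        if c1 != c2
    ]
    if any(not e for e in entries):
        raise ValueError("data set is inconsistent; run make_consistent first")

    selected = []
    while entries:
        counts = Counter(j for e in entries for j in e)
        # Most frequent feature first; ties go to the lowest index.
        best = max(sorted(counts), key=counts.get)
        selected.append(best)
        entries = [e for e in entries if best not in e]
    return sorted(selected)


if __name__ == "__main__":
    rows = [(1, 0, 0), (1, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
    labels = ["a", "a", "b", "b", "a", "b"]
    clean_rows, clean_labels = make_consistent(rows, labels, threshold=0.6)
    print(clean_rows, clean_labels)
    print(select_features(clean_rows, clean_labels))
```

In this made-up example the vector (1, 0, 0) appears twice with class a and once with class b, so it is relabeled a (2/3 is above the 0.6 threshold), and the feature at index 1 alone is enough to separate the two classes afterwards.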
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.