Novel and Efficient Method on Feature Selection and Data Classification

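Cite this article:	Chen Tieming, Ma Jixia, Samuel H. Huang, Cai Jiamei. Novel and Efficient Method on Feature Selection and Data Classification[J]. Journal of Computer Research and Development (计算机研究与发展), 2012, 49(4): 735-745.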

Authors:	Chen Tieming, Ma Jixia, Samuel H. Huang, Cai Jiamei

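Affiliations:	1. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023; State Key Laboratory of Software Development Environment (Beihang University), Beijing 100191 2. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023 3. Intelligent Systems Laboratory, University of Cincinnati, Cincinnati, USA 45221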

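Funding:	National Basic Research Program of China (973 Program) (2005CB321901, 2010CB328106); National Natural Science Foundation of China (60773115); Open Fund of the State Key Laboratory of Software Development Environment (SKLSDE-2009KF-2-01); Natural Science Foundation of Zhejiang Province (Y1110576); Open Fund of the Zhejiang Provincial Key Laboratory of Information Security (201003)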

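Abstract:	A new, efficient feature selection and rule extraction method is proposed for data classification. First, the Chi-Merge discretization method is improved by reducing the number of initial intervals, and the improved Chi-Merge is used to discretize continuous feature variables. After discretization, a contingency table of the sample data is computed under the partition induced by each feature subset, and the data inconsistency rate is calculated from that table. Sequential forward search is then used to quickly determine the best feature subset of each size, from smallest to largest. The final subset is chosen as the one with the fewest features, according to the principle of minimizing the difference between the inconsistency rates of the candidate subsets. Classification rules can be extracted directly from the contingency table of the selected subset. Experiments show that the quickly extracted rules achieve good classification performance. Based on this feature selection method, a fast classification model for distributed isomorphic data is also proposed; it classifies well and also protects the privacy of the sample data contents.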
Keywords:	discretization; contingency table; feature selection; rule extraction; data classification; privacy preserving
Novel and Efficient Method on Feature Selection and Data Classification

Chen Tieming, Ma Jixia, Samuel H. Huang, Cai Jiamei. Novel and Efficient Method on Feature Selection and Data Classification[J]. Journal of Computer Research and Development, 2012, 49(4): 735-745.

Authors:	Chen Tieming, Ma Jixia, Samuel H. Huang, Cai Jiamei

Affiliation:	1 (College of Computer Science & Technology, Zhejiang University of Technology, Hangzhou 310023) 2 (State Key Laboratory of Software Development Environment (Beihang University), Beijing 100191) 3 (Intelligent Systems Laboratory, University of Cincinnati, Cincinnati, OH, USA 45221)

Abstract:	This paper proposes a novel feature selection method for data classification, together with a fast rule extraction scheme. First, the Chi-Merge discretization method is improved by reducing the number of initial intervals, and the improved method is used to discretize continuous attributes. After discretization, the contingency table for each candidate feature subset is computed, and an inconsistency rate is derived from each table. A sequential forward search adds, at each step, the feature that yields the minimum inconsistency rate, which efficiently produces the best feature subset of each size and, from these, the optimized feature subset. Finally, classification rules are extracted in one pass from the contingency table of the selected subset. Experiments show that the proposed classification scheme performs well. The method also extends to classification over distributed isomorphic datasets; the resulting distributed classifier is simple and efficient, classifies well, and preserves the privacy of the sample data contents.

Keywords:	discretization; contingency table; feature selection; rule extraction; data classification; privacy preserving
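
The pipeline summarized in the abstract (contingency tables, inconsistency rate, sequential forward search, one-pass rule extraction) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the features are already discretized (the improved Chi-Merge step is not shown), it uses the common definition of inconsistency rate (samples outside the majority class of their value pattern, divided by the sample count), and it interprets the subset-selection principle as "the smallest subset whose rate is within `tol` of the rate of the full feature set". All function names, the `tol` parameter, and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def contingency_table(rows, labels, subset):
    """Map each value pattern on `subset` to a Counter of class labels."""
    table = defaultdict(Counter)
    for row, label in zip(rows, labels):
        table[tuple(row[i] for i in subset)][label] += 1
    return table


def inconsistency_rate(rows, labels, subset):
    """Share of samples that are not in the majority class of their pattern."""
    table = contingency_table(rows, labels, subset)
    clashes = sum(sum(c.values()) - max(c.values()) for c in table.values())
    return clashes / len(rows)


def forward_sequence(rows, labels):
    """Greedy forward search: return the (subset, rate) pair found at each size."""
    remaining = list(range(len(rows[0])))
    subset, sequence = [], []
    while remaining:
        # Add the feature whose inclusion gives the lowest inconsistency rate.
        rate, feat = min(
            (inconsistency_rate(rows, labels, subset + [f]), f) for f in remaining
        )
        subset = subset + [feat]
        remaining.remove(feat)
        sequence.append((subset, rate))
    return sequence


def smallest_good_subset(sequence, tol=0.0):
    """Assumed stopping rule: smallest subset within `tol` of the full-set rate."""
    full_rate = sequence[-1][1]
    for subset, rate in sequence:
        if rate - full_rate <= tol:
            return subset, rate
    return sequence[-1]


def extract_rules(rows, labels, subset):
    """One rule per observed pattern: predict that pattern's majority class."""
    table = contingency_table(rows, labels, subset)
    return {pattern: c.most_common(1)[0][0] for pattern, c in table.items()}


# Toy, already-discretized data (hypothetical): the class depends on feature 0 only.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0), (0, 0, 0), (1, 1, 1)]
labels = ["a", "a", "b", "b", "a", "b"]

subset, rate = smallest_good_subset(forward_sequence(rows, labels))
rules = extract_rules(rows, labels, subset)
```

On this toy data the search keeps only feature 0, and the extracted rules map the pattern `(0,)` to class `a` and `(1,)` to class `b`. Because the tables hold only pattern counts per class, a distributed variant could in principle sum per-site tables instead of sharing raw samples, although how the paper realizes its privacy-preserving model is not described in the abstract.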
This article is indexed in CNKI, Wanfang Data, and other databases.
