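Selective Bayes Classifiers for Incomplete Data
Cite this article: Chen Jingnian, Huang Houkuan, Tian Fengzhan, Fu Shujun. Selective Bayes Classifiers for Incomplete Data[J]. Journal of Computer Research and Development, 2007, 44(8): 1324-1330.
Authors: Chen Jingnian  Huang Houkuan  Tian Fengzhan  Fu Shujun
Affiliations: 1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044; Department of Information and Computing Science, Shandong University of Finance, Jinan 250014
2. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044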
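Abstract: By deleting irrelevant and redundant attributes from a data set, selective classifiers can effectively improve classification accuracy and efficiency, and a number of selective classifiers have therefore been proposed. However, because handling incomplete data is complex, most of them target complete data. In practice, for various reasons, data are usually incomplete and contain many redundant or irrelevant attributes. As with complete data, redundant or irrelevant attributes in an incomplete data set can substantially degrade classification performance. Research on selective classifiers for incomplete data is therefore an important topic. Based on an analysis of earlier methods for handling incomplete data during classification, two selective Bayes classifiers for incomplete data are proposed: SRBC and CBSRBC. SRBC is built on a robust Bayes classifier, and CBSRBC is built on SRBC using the χ² statistic. Experimental results on 12 standard incomplete data sets show that both methods greatly reduce the number of attributes while significantly improving classification accuracy and stability. Overall, CBSRBC outperforms SRBC in classification accuracy and running efficiency, whereas SRBC requires fewer pre-specified thresholds.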

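Keywords: Bayesian method  classification  feature selection  incomplete data  χ² statistic  incomplete data set  selective  Bayes  classifier  Incomplete  Data  Classifiers  Bayes  threshold  algorithm  running efficiency  stability  classification accuracy  processing method  result  experiment  standard  statistic  use  robust  process
Revised: 2007-01-14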

Selective Bayes Classifiers for Incomplete Data
Chen Jingnian, Huang Houkuan, Tian Fengzhan, Fu Shujun. Selective Bayes Classifiers for Incomplete Data[J]. Journal of Computer Research and Development, 2007, 44(8): 1324-1330.
Authors:Chen Jingnian  Huang Houkuan  Tian Fengzhan  Fu Shujun
Affiliation:1 School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044;2 Department of Information and Computing Science, Shandong University of Finance, Jinan 250014
Abstract: Selective classifiers have been shown to improve the accuracy and efficiency of classification by deleting irrelevant or redundant attributes from a data set. Although several selective classifiers have been proposed, most of them handle only complete data, because dealing with incomplete data is complex. Yet real data sets are often incomplete and, for various reasons, contain many redundant or irrelevant attributes. As with complete data, irrelevant or redundant attributes in an incomplete data set can sharply reduce the accuracy of a classifier built on it. Constructing selective classifiers for incomplete data is therefore an important problem. Following an analysis of the main methods for processing incomplete data in classification, two selective Bayes classifiers for incomplete data, denoted SRBC and CBSRBC, are presented. SRBC is constructed from the robust Bayes classifier, while CBSRBC is based on SRBC and the chi-squared statistic. Experiments on twelve benchmark incomplete data sets show that the two algorithms not only greatly reduce the number of attributes but also markedly improve the accuracy and stability of classification. On the whole, CBSRBC is more efficient and more accurate than SRBC, but SRBC avoids some of the thresholds that CBSRBC requires.
Keywords:Bayesian method  classification  feature selection  incomplete data  chi-squared statistics
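
The abstract mentions ranking attributes with the chi-squared statistic on incomplete data but gives no procedure. The following is only a minimal sketch of that general idea, not the SRBC or CBSRBC algorithm: it computes the Pearson chi-squared statistic between each discrete attribute and the class, skipping records where the attribute is missing. The function names `chi_squared` and `rank_attributes`, the use of `None` as the missing-value marker, and the skip-missing policy are all assumptions made for illustration.

```python
from collections import Counter


def chi_squared(values, labels, missing=None):
    """Pearson chi-squared statistic between one discrete attribute and the class.

    Assumption of this sketch: records whose attribute value equals `missing`
    are simply skipped. The paper may treat missing values differently.
    """
    pairs = [(v, c) for v, c in zip(values, labels) if v != missing]
    n = len(pairs)
    if n == 0:
        return 0.0
    value_counts = Counter(v for v, _ in pairs)
    class_counts = Counter(c for _, c in pairs)
    joint_counts = Counter(pairs)
    stat = 0.0
    for v, nv in value_counts.items():
        for c, nc in class_counts.items():
            expected = nv * nc / n  # count expected under independence
            observed = joint_counts.get((v, c), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def rank_attributes(rows, labels, missing=None):
    """Return (attribute indices by decreasing statistic, per-attribute scores)."""
    n_attrs = len(rows[0])
    scores = [
        chi_squared([row[j] for row in rows], labels, missing)
        for j in range(n_attrs)
    ]
    order = sorted(range(n_attrs), key=lambda j: scores[j], reverse=True)
    return order, scores


# Toy data: attribute 0 determines the class; attribute 1 has one missing value.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", None), ("a", "y"), ("b", "x")]
labels = [1, 1, 0, 0, 1, 0]
order, scores = rank_attributes(rows, labels)
```

A selective classifier in this spirit could keep only the top-ranked attributes before training a Bayes classifier; how many to keep, and how the thresholds are chosen, is not specified in the abstract.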
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.