An Incremental Bayes Classification Model
Cite this article: GONG Xiu Jun, LIU Shao Hui, SHI Zhong Zhi. An Incremental Bayes Classification Model [J]. Chinese Journal of Computers, 2002, 25(6): 645-650.
Authors: GONG Xiu Jun  LIU Shao Hui  SHI Zhong Zhi
Affiliation: Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
Funding: Supported by the National Natural Science Foundation of China (Grants 60073019 and 69803010)
Abstract: Classification has long been a core problem in machine learning, pattern recognition, and data mining research. For learning classification knowledge from massive data, especially when obtaining large numbers of class-labeled examples is costly, incremental learning is an effective approach. This paper applies the simple (naive) Bayes method to incremental classification, proposes an incremental Bayesian learning model, and gives the incremental Bayesian inference process, which includes incrementally revising the classifier parameters and incrementally classifying test examples. Experimental results show that the algorithm is feasible and effective.
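The two basic operations named here, incrementally revising the classifier parameters and classifying test examples from the current parameters, can be illustrated with count-based conjugate updates. The following is a minimal Python sketch, not the paper's code: the class name, the discrete-feature representation, and the symmetric Dirichlet parameter alpha (alpha = 1, i.e. Laplace smoothing) are all assumptions standing in for the paper's exact prior.

```python
import math
from collections import defaultdict

class IncrementalNaiveBayes:
    """Minimal incremental naive Bayes over discrete features.

    Symmetric Dirichlet smoothing (alpha = 1, Laplace smoothing)
    stands in for the paper's conjugate Dirichlet update; the
    actual prior parameters are an assumption of this sketch.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = defaultdict(float)  # N(c): examples seen per class
        self.feat_counts = defaultdict(float)   # N(c, j, v): value v of feature j in class c
        self.feat_values = defaultdict(set)     # distinct values seen for feature j

    def update(self, x, c):
        """Incrementally revise the parameters with one labeled example."""
        self.class_counts[c] += 1
        for j, v in enumerate(x):
            self.feat_counts[(c, j, v)] += 1
            self.feat_values[j].add(v)

    def posterior(self, x):
        """Return the posterior P(c | x) over all classes seen so far."""
        total = sum(self.class_counts.values())
        n_classes = len(self.class_counts)
        log_post = {}
        for c, nc in self.class_counts.items():
            lp = math.log((nc + self.alpha) / (total + self.alpha * n_classes))
            for j, v in enumerate(x):
                k = max(len(self.feat_values[j]), 1)
                lp += math.log((self.feat_counts[(c, j, v)] + self.alpha)
                               / (nc + self.alpha * k))
            log_post[c] = lp
        m = max(log_post.values())                     # stabilize exponentiation
        unnorm = {c: math.exp(lp - m) for c, lp in log_post.items()}
        z = sum(unnorm.values())
        return {c: p / z for c, p in unnorm.items()}

    def predict(self, x):
        """Classify one test example under the current counts."""
        post = self.posterior(x)
        return max(post, key=post.get)
```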

Keywords: incremental Bayes classification model; incremental learning; data mining; artificial intelligence
Revised: December 17, 2000

English abstract: Classification has been a central research topic in machine learning, pattern recognition, and data mining. Incremental learning is an effective method for learning classification knowledge from massive data, especially when labeled training examples are costly to obtain. This paper first discusses the difference between Bayesian estimation and classical parameter estimation and describes the fundamental principle for incorporating prior knowledge into Bayesian learning. It then presents the incremental Bayesian learning model, which describes how Bayesian learning revises its beliefs by combining prior knowledge with the information in new examples; by choosing a Dirichlet prior distribution, this process is shown in detail. The second part of the paper discusses the incremental process itself. New examples for incremental learning arrive in two states: with labels and without labels. For labeled examples, updating the classification parameters is straightforward thanks to the conjugate Dirichlet distribution, so the key problem is learning from unlabeled examples. Unlike the method of Kamal Nigam, which learns from unlabeled examples with the EM algorithm, this paper focuses on which example should be selected next during learning: it measures classification loss with the 0-1 loss and selects the examples that minimize this loss. To improve performance, a pool-based technique is introduced, so that in each round the classification loss is computed only for the examples in the pool. Because the basic operations during learning are updating the classification parameters and classifying test instances incrementally, approximate expressions are given for both. To test the algorithm's efficiency, an experiment is run on the mushroom data set from the UCI repository: the initial training set contains 6 labeled examples, and several unlabeled examples are then added. The final experimental results show that the algorithm is feasible and effective.
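The pool-based selection step can be sketched on top of the IncrementalNaiveBayes class above. This is an illustration under assumptions, not the paper's procedure: the function name select_and_learn is hypothetical, the use of the model's own prediction as the label for the selected unlabeled example is an assumed reading of the abstract, and the paper's approximate expressions are not reproduced here.

```python
def select_and_learn(model, pool):
    """One round of pool-based learning from unlabeled examples.

    Sketch of the selection rule described in the abstract: score
    each pool example by its expected 0-1 loss under the current
    model, 1 - max_c P(c | x), pick the example minimizing it,
    label it with the model's own prediction, and absorb it with
    an incremental update.
    """
    def expected_01_loss(x):
        return 1.0 - max(model.posterior(x).values())

    x_best = min(pool, key=expected_01_loss)   # least risky example in the pool
    pool.remove(x_best)
    c_hat = model.predict(x_best)              # inferred label, not ground truth
    model.update(x_best, c_hat)
    return x_best, c_hat
```

Seeding the model with a handful of labeled examples via update() and then calling select_and_learn() until the pool is empty mirrors the experimental setup described above (6 labeled mushroom examples followed by several unlabeled ones).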
Keywords: simple Bayes; incremental learning; conjugate Dirichlet distribution; data mining
This article is indexed by CNKI, VIP, Wanfang Data, and other databases.