An Incremental Bayes Classification Model
Cite this article: GONG Xiu Jun, LIU Shao Hui, SHI Zhong Zhi. An Incremental Bayes Classification Model [J]. Chinese Journal of Computers, 2002, 25(6): 645-650.
Authors: GONG Xiu Jun  LIU Shao Hui  SHI Zhong Zhi
Affiliation: Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
Funding: Supported by the National Natural Science Foundation of China (Grants 60073019 and 69803010)
Abstract: Classification has long been a core problem in machine learning, pattern recognition, and data mining research. For learning classification knowledge from massive data, especially when obtaining large numbers of class-labeled examples is costly, incremental learning is an effective approach. This paper applies the simple (naive) Bayes method to incremental classification, proposes an incremental Bayesian learning model, and gives the incremental Bayesian inference process, which includes incrementally revising the classifier parameters and incrementally classifying test examples. Experimental results show that the algorithm is feasible and effective.
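The two basic operations named here, incrementally revising the classifier parameters and classifying test examples from the current parameters, can be illustrated with count-based conjugate updates. The following is a minimal Python sketch, not the paper's code: the class name, the discrete-feature representation, and the symmetric Dirichlet parameter alpha (alpha = 1, i.e. Laplace smoothing) are all assumptions standing in for the paper's exact prior.

```python
import math
from collections import defaultdict

class IncrementalNaiveBayes:
    """Minimal incremental naive Bayes over discrete features.

    Symmetric Dirichlet smoothing (alpha = 1, Laplace smoothing)
    stands in for the paper's conjugate Dirichlet update; the
    actual prior parameters are an assumption of this sketch.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = defaultdict(float)  # N(c): examples seen per class
        self.feat_counts = defaultdict(float)   # N(c, j, v): value v of feature j in class c
        self.feat_values = defaultdict(set)     # distinct values seen for feature j

    def update(self, x, c):
        """Incrementally revise the parameters with one labeled example."""
        self.class_counts[c] += 1
        for j, v in enumerate(x):
            self.feat_counts[(c, j, v)] += 1
            self.feat_values[j].add(v)

    def posterior(self, x):
        """Return the posterior P(c | x) over all classes seen so far."""
        total = sum(self.class_counts.values())
        n_classes = len(self.class_counts)
        log_post = {}
        for c, nc in self.class_counts.items():
            lp = math.log((nc + self.alpha) / (total + self.alpha * n_classes))
            for j, v in enumerate(x):
                k = max(len(self.feat_values[j]), 1)
                lp += math.log((self.feat_counts[(c, j, v)] + self.alpha)
                               / (nc + self.alpha * k))
            log_post[c] = lp
        m = max(log_post.values())                     # stabilize exponentiation
        unnorm = {c: math.exp(lp - m) for c, lp in log_post.items()}
        z = sum(unnorm.values())
        return {c: p / z for c, p in unnorm.items()}

    def predict(self, x):
        """Classify one test example under the current counts."""
        post = self.posterior(x)
        return max(post, key=post.get)
```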

Keywords: incremental Bayes classification model; incremental learning; data mining; artificial intelligence
Revised: December 17, 2000

English abstract: Classification has been a central research topic in machine learning, pattern recognition, and data mining. Incremental learning is an effective method for learning classification knowledge from massive data, especially when labeled training examples are costly to obtain. This paper first discusses the difference between Bayesian estimation and classical parameter estimation and describes the fundamental principle for incorporating prior knowledge into Bayesian learning. It then presents the incremental Bayesian learning model, which describes how Bayesian learning revises its beliefs by combining prior knowledge with the information in new examples; by choosing a Dirichlet prior distribution, this process is shown in detail. The second part of the paper discusses the incremental process itself. New examples for incremental learning arrive in two states: with labels and without labels. For labeled examples, updating the classification parameters is straightforward thanks to the conjugate Dirichlet distribution, so the key problem is learning from unlabeled examples. Unlike the method of Kamal Nigam, which learns from unlabeled examples with the EM algorithm, this paper focuses on which example should be selected next during learning: it measures classification loss with the 0-1 loss and selects the examples that minimize this loss. To improve performance, a pool-based technique is introduced, so that in each round the classification loss is computed only for the examples in the pool. Because the basic operations during learning are updating the classification parameters and classifying test instances incrementally, approximate expressions are given for both. To test the algorithm's efficiency, an experiment is run on the mushroom data set from the UCI repository: the initial training set contains 6 labeled examples, and several unlabeled examples are then added. The final experimental results show that the algorithm is feasible and effective.
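The pool-based selection step can be sketched on top of the IncrementalNaiveBayes class above. This is an illustration under assumptions, not the paper's procedure: the function name select_and_learn is hypothetical, the use of the model's own prediction as the label for the selected unlabeled example is an assumed reading of the abstract, and the paper's approximate expressions are not reproduced here.

```python
def select_and_learn(model, pool):
    """One round of pool-based learning from unlabeled examples.

    Sketch of the selection rule described in the abstract: score
    each pool example by its expected 0-1 loss under the current
    model, 1 - max_c P(c | x), pick the example minimizing it,
    label it with the model's own prediction, and absorb it with
    an incremental update.
    """
    def expected_01_loss(x):
        return 1.0 - max(model.posterior(x).values())

    x_best = min(pool, key=expected_01_loss)   # least risky example in the pool
    pool.remove(x_best)
    c_hat = model.predict(x_best)              # inferred label, not ground truth
    model.update(x_best, c_hat)
    return x_best, c_hat
```

Seeding the model with a handful of labeled examples via update() and then calling select_and_learn() until the pool is empty mirrors the experimental setup described above (6 labeled mushroom examples followed by several unlabeled ones).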
Keywords: simple Bayes; incremental learning; conjugate Dirichlet distribution; data mining
This article is indexed by CNKI, VIP, Wanfang Data, and other databases.