An Unsupervised Discretization Algorithm Based on Mixture Probabilistic Model
Citation: LI Gang. An Unsupervised Discretization Algorithm Based on Mixture Probabilistic Model [J]. Chinese Journal of Computers, 2002, 25(2): 158-164.
Author: LI Gang
Affiliations: 1. School of Computing and Mathematics, Deakin University, VIC 3168, Australia
2. Department of Computer Science, Shanghai University, Shanghai 201800, China
Funding: Supported by the National Natural Science Foundation of China (69873031)
Abstract: Real-world applications often involve many continuous numeric attributes, yet many current machine learning algorithms require the attributes they process to take discrete values. Depending on whether the values of an associated class attribute are considered when discretizing a numeric attribute, discretization algorithms fall into two categories: supervised and unsupervised. Based on a mixture probabilistic model, this paper proposes a theoretically rigorous unsupervised discretization algorithm that, with no prior knowledge and no class attribute, partitions the range of a numeric attribute into subintervals, and then uses the Bayesian Information Criterion to automatically find the best number of subintervals and the best partition.

Keywords: artificial intelligence; machine learning; mixture probabilistic model; unsupervised discretization algorithm
Revised: April 4, 2000

An Unsupervised Discretization Algorithm Based on Mixture Probabilistic Model
LI Gang, TONG Fu. An Unsupervised Discretization Algorithm Based on Mixture Probabilistic Model [J]. Chinese Journal of Computers, 2002, 25(2): 158-164.
Authors: LI Gang 1), TONG Fu 2)
Affiliation: 1) School of Computing and Mathematics, Deakin University, VIC 3168, Australia; 2) Department of Computer Science, Shanghai University, Shanghai 201800, China
Abstract: Many existing machine learning algorithms expect their attributes to be discrete. In this paper we describe a theoretically rigorous algorithm for discretizing continuous attributes based on mixture probabilistic models. The algorithm can automatically divide the range of a specified attribute into intervals without prior knowledge and without reference to class attributes. All the values of the attribute are represented by a mixture probabilistic model in which each mixture component corresponds to a different interval. The Expectation-Maximization (EM) algorithm is used to determine the maximum-likelihood parameters of the mixture model. One advantage of the mixture-model approach to discretization is that it allows approximate Bayes factors to be used to compare models. To determine the most suitable number of intervals, the maximum-likelihood parameters of mixture models with different numbers of components are computed, and the BIC (Bayesian Information Criterion) values of these models are compared; the model with the highest BIC is chosen as the resulting generative probabilistic model, which fixes the number of intervals. Choosing the best model therefore simultaneously solves the problem of determining the number of intervals and the partition itself. Experimental results show that this form of discretization has distinct advantages over competing non-probabilistic approaches (such as the K-means algorithm): it allows uncertainty in interval membership, gives direct control over the variability within each interval, and permits an objective treatment of the ever-thorny question of how many intervals the data actually suggest.
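The model-selection loop the abstract describes — fit mixtures with increasing numbers of components by EM, score each with BIC, keep the highest — can be sketched in pure Python. This is a minimal illustrative sketch, not the paper's implementation: the quantile initialization, the fixed iteration count, and the hard midpoint cut points are simplifying assumptions (the paper retains soft interval membership).

```python
import math

def _pdf(x, m, v):
    """Gaussian density N(x; mean m, variance v)."""
    return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

def em_1d_gmm(data, k, iters=200):
    """Fit a 1-D Gaussian mixture with k components by EM.
    Returns (log-likelihood, means, variances, weights)."""
    n = len(data)
    xs = sorted(data)
    # Deterministic quantile initialization (an assumption, not from the paper).
    means = [xs[int(n * (j + 0.5) / k)] for j in range(k)]
    mean_all = sum(data) / n
    var0 = sum((x - mean_all) ** 2 for x in data) / n or 1.0
    variances = [var0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * _pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            s = sum(p) or 1e-300
            resp.append([pi / s for pi in p])
        # M-step: re-estimate means, variances, weights from responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                1e-6)  # floor to avoid degenerate components
            weights[j] = nj / n
    ll = sum(math.log(sum(w * _pdf(x, m, v)
                          for w, m, v in zip(weights, means, variances)) + 1e-300)
             for x in data)
    return ll, means, variances, weights

def bic(ll, k, n):
    # Higher-is-better convention, matching the abstract's "highest BIC":
    # 2*log-likelihood minus (free parameters) * log(n).  A k-component 1-D
    # mixture has k means + k variances + (k - 1) free weights = 3k - 1.
    return 2 * ll - (3 * k - 1) * math.log(n)

def discretize(data, max_k=4):
    """Pick the number of intervals by BIC.  Returns (k, cut points).
    Cut points are midpoints between adjacent component means -- a hard
    simplification of the paper's soft interval membership."""
    best = None
    for k in range(1, max_k + 1):
        ll, means, _, _ = em_1d_gmm(data, k)
        score = bic(ll, k, len(data))
        if best is None or score > best[0]:
            best = (score, k, sorted(means))
    _, k, means = best
    cuts = [(a + b) / 2 for a, b in zip(means, means[1:])]
    return k, cuts
```

On well-separated data such as two clusters around 1 and 10, the BIC penalty stops the loop from over-splitting, so `discretize` settles on two intervals with a single cut point between the clusters.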
Keywords: artificial intelligence; machine learning; discretization; mixture probabilistic model
This document is indexed by CNKI, VIP, Wanfang Data, and other databases.
