An Unsupervised Discretization Algorithm Based on Mixture Probabilistic Model
Citation: LI Gang. An Unsupervised Discretization Algorithm Based on Mixture Probabilistic Model [J]. Chinese Journal of Computers, 2002, 25(2): 158-164.
Author: LI Gang
Affiliations: 1. School of Computing and Mathematics, Deakin University, VIC 3168, Australia
2. Department of Computer Science, Shanghai University, Shanghai 201800, China
Funding: Supported by the National Natural Science Foundation of China (69873031)
Abstract: Real-world applications often involve many continuous numeric attributes, yet many current machine learning algorithms require the attributes they process to take discrete values. Depending on whether the values of an associated class attribute are considered when discretizing a numeric attribute, discretization algorithms fall into two categories: supervised and unsupervised. Based on a mixture probabilistic model, this paper proposes a theoretically rigorous unsupervised discretization algorithm that, with no prior knowledge and no class attribute, partitions the range of a numeric attribute into subintervals, and then uses the Bayesian Information Criterion to automatically find the best number of subintervals and the best partition.

Keywords: artificial intelligence; machine learning; mixture probabilistic model; unsupervised discretization algorithm
Revised: April 4, 2000

An Unsupervised Discretization Algorithm Based on Mixture Probabilistic Model
LI Gang, TONG Fu. An Unsupervised Discretization Algorithm Based on Mixture Probabilistic Model [J]. Chinese Journal of Computers, 2002, 25(2): 158-164.
Authors: LI Gang 1), TONG Fu 2)
Affiliation: 1) School of Computing and Mathematics, Deakin University, VIC 3168, Australia; 2) Department of Computer Science, Shanghai University, Shanghai 201800, China
Abstract: Many existing machine learning algorithms expect their attributes to be discrete. In this paper we describe a theoretically rigorous algorithm for discretizing continuous attributes based on mixture probabilistic models. The algorithm can automatically divide the range of a specified attribute into intervals without prior knowledge and without reference to class attributes. All the values of the attribute are represented by a mixture probabilistic model in which each mixture component corresponds to a different interval. The Expectation-Maximization (EM) algorithm is used to determine the maximum-likelihood parameters of the mixture model. One advantage of the mixture-model approach to discretization is that it allows approximate Bayes factors to be used to compare models. To determine the most suitable number of intervals, the maximum-likelihood parameters of mixture models with different numbers of components are computed, and the BIC (Bayesian Information Criterion) values of these models are compared; the model with the highest BIC is chosen as the resulting generative probabilistic model, which fixes the number of intervals. Choosing the best model therefore simultaneously solves the problem of determining the number of intervals and the partition itself. Experimental results show that this form of discretization has distinct advantages over competing non-probabilistic approaches (such as the K-means algorithm): it allows uncertainty in interval membership, gives direct control over the variability within each interval, and permits an objective treatment of the ever-thorny question of how many intervals the data actually suggest.
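The model-selection loop the abstract describes — fit mixtures with increasing numbers of components by EM, score each with BIC, keep the highest — can be sketched in pure Python. This is a minimal illustrative sketch, not the paper's implementation: the quantile initialization, the fixed iteration count, and the hard midpoint cut points are simplifying assumptions (the paper retains soft interval membership).

```python
import math

def _pdf(x, m, v):
    """Gaussian density N(x; mean m, variance v)."""
    return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

def em_1d_gmm(data, k, iters=200):
    """Fit a 1-D Gaussian mixture with k components by EM.
    Returns (log-likelihood, means, variances, weights)."""
    n = len(data)
    xs = sorted(data)
    # Deterministic quantile initialization (an assumption, not from the paper).
    means = [xs[int(n * (j + 0.5) / k)] for j in range(k)]
    mean_all = sum(data) / n
    var0 = sum((x - mean_all) ** 2 for x in data) / n or 1.0
    variances = [var0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * _pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            s = sum(p) or 1e-300
            resp.append([pi / s for pi in p])
        # M-step: re-estimate means, variances, weights from responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                1e-6)  # floor to avoid degenerate components
            weights[j] = nj / n
    ll = sum(math.log(sum(w * _pdf(x, m, v)
                          for w, m, v in zip(weights, means, variances)) + 1e-300)
             for x in data)
    return ll, means, variances, weights

def bic(ll, k, n):
    # Higher-is-better convention, matching the abstract's "highest BIC":
    # 2*log-likelihood minus (free parameters) * log(n).  A k-component 1-D
    # mixture has k means + k variances + (k - 1) free weights = 3k - 1.
    return 2 * ll - (3 * k - 1) * math.log(n)

def discretize(data, max_k=4):
    """Pick the number of intervals by BIC.  Returns (k, cut points).
    Cut points are midpoints between adjacent component means -- a hard
    simplification of the paper's soft interval membership."""
    best = None
    for k in range(1, max_k + 1):
        ll, means, _, _ = em_1d_gmm(data, k)
        score = bic(ll, k, len(data))
        if best is None or score > best[0]:
            best = (score, k, sorted(means))
    _, k, means = best
    cuts = [(a + b) / 2 for a, b in zip(means, means[1:])]
    return k, cuts
```

On well-separated data such as two clusters around 1 and 10, the BIC penalty stops the loop from over-splitting, so `discretize` settles on two intervals with a single cut point between the clusters.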
Keywords: artificial intelligence; machine learning; discretization; mixture probabilistic model
This document is indexed by CNKI, VIP, Wanfang Data, and other databases.
