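A Clustering Algorithm Based on a Weighted Joint Probability Model
Cite this article: Ji Bo, Ye Yangdong, Lu Hongxing. A clustering algorithm based on a weighted joint probability model[J]. Journal of Data Acquisition & Processing, 2016, 31(1): 130-138.
Authors: Ji Bo  Ye Yangdong  Lu Hongxing
Affiliation: School of Information Engineering, Zhengzhou University, Zhengzhou, 450001
Abstract: The sequential information bottleneck (sIB) algorithm is a widely used clustering algorithm. It represents data with a joint probability model, which expresses the correlation between samples and attributes well. However, the joint probability model used by sIB assumes that every attribute contributes equally to clustering, which weakens the clustering result. This paper proposes the concept of a weighted joint probability model: mutual information is used to measure attribute importance, and a weighted joint probability model is built to optimize the data representation, so that representative attributes are emphasized and redundant attributes are suppressed. Experiments on UCI datasets show that the WJPM_sIB algorithm, which is based on the weighted joint probability model, outperforms the sIB algorithm; under the F1 measure, its clustering results improve on those of sIB by 5.90%.

Keywords: clustering  attribute weight  joint probability model  sequential information bottleneck algorithm  mutual information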

Clustering Algorithm Based on Weighting Joint Probability Model
Ji Bo, Ye Yangdong, Lu Hongxing. Clustering Algorithm Based on Weighting Joint Probability Model[J]. Journal of Data Acquisition & Processing, 2016, 31(1): 130-138.
Authors: Ji Bo  Ye Yangdong  Lu Hongxing
Affiliation: School of Information Engineering, Zhengzhou University, Zhengzhou, 450052, China
Abstract: The sequential information bottleneck (sIB) algorithm is a widely used clustering algorithm. It describes data with a joint probability model, which expresses the relationship between data samples and data attributes well. However, the joint probability model used by sIB assumes that all data attributes contribute equally to clustering, which weakens the clustering result. To address this issue, the paper proposes a weighted joint probability model (WJPM). The model uses mutual information to measure the importance of each attribute and reweights the data representation accordingly, so as to highlight representative attributes and suppress redundant ones. Experiments on UCI datasets show that the WJPM-based algorithm, WJPM_sIB, outperforms the sIB algorithm, improving the F1 measure by 5.90%.
Keywords: clustering  attribute weight  joint probability model  sequential information bottleneck algorithm  mutual information
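The abstract states that attribute importance is measured with mutual information and then used to reweight the joint probability model, but it does not give the weighting formula. The Python sketch below is therefore only one plausible reading, not the authors' WJPM construction. It assumes that the weight of an attribute y is its share of I(X;Y), that is, the sum over samples x of p(x,y) log(p(x,y) / (p(x) p(y))), and that the weighted model is obtained by scaling each attribute column by its weight and renormalizing. The function names and the toy count matrix are hypothetical.

```python
import math


def joint_probability(counts):
    """Normalize a samples-by-attributes count matrix into p(x, y)."""
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]


def attribute_mi_contributions(p):
    """Per-attribute share of I(X;Y): sum_x p(x,y) * log(p(x,y) / (p(x) p(y)))."""
    px = [sum(row) for row in p]          # marginal over samples
    py = [sum(col) for col in zip(*p)]    # marginal over attributes
    contributions = []
    for j, pyj in enumerate(py):
        s = 0.0
        for i, pxi in enumerate(px):
            pxy = p[i][j]
            if pxy > 0.0:                 # 0 * log(0) is taken as 0
                s += pxy * math.log(pxy / (pxi * pyj))
        contributions.append(s)
    return contributions


def weighted_joint_probability(counts):
    """Return (weights, q), where q is p(x, y) reweighted per attribute.

    ASSUMPTION: weights are normalized mutual-information contributions;
    the paper's actual weighting scheme may differ.
    """
    p = joint_probability(counts)
    contributions = [max(c, 0.0) for c in attribute_mi_contributions(p)]
    total = sum(contributions)
    n_attr = len(contributions)
    if total <= 0.0:
        weights = [1.0 / n_attr] * n_attr  # no attribute is informative
    else:
        weights = [c / total for c in contributions]
    q = [[pxy * w for pxy, w in zip(row, weights)] for row in p]
    norm = sum(sum(row) for row in q)
    q = [[v / norm for v in row] for row in q]  # renormalize to sum to 1
    return weights, q


if __name__ == "__main__":
    # Toy data: attribute 3 is spread evenly over both samples,
    # so it carries no information about which sample it came from.
    toy_counts = [[4, 0, 2],
                  [0, 4, 2]]
    weights, q = weighted_joint_probability(toy_counts)
    print("weights:", weights)
    print("weighted joint distribution:", q)
```

In the toy matrix, the third attribute is distributed identically across the two samples, so under this assumed scheme its weight is approximately zero and its column is effectively removed from the weighted distribution, which is the kind of redundancy suppression the abstract describes. A clustering algorithm such as sIB would then operate on the reweighted distribution instead of the original one.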