
Applying the k-means algorithm to label distribution learning

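Cite this article:	邵东恒, 杨文元, 赵红. Applying the k-means algorithm to label distribution learning [J]. 智能系统学报 (CAAI Transactions on Intelligent Systems), 2017, 12(3): 325-332.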

Authors:	邵东恒 (SHAO Dongheng), 杨文元 (YANG Wenyuan), 赵红 (ZHAO Hong)

Affiliation:	Lab of Granular Computing, Minnan Normal University (闽南师范大学), Zhangzhou, Fujian 363000

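Abstract (translated from the Chinese):	Label distribution learning is a machine learning paradigm proposed in recent years that handles certain kinds of label ambiguity well. Existing label distribution learning algorithms all build parametric models from conditional probabilities, but they do not fully exploit the relationship between features and labels. Starting from the observation that samples with similar features should also have similar label distributions, this paper clusters the training samples with k-means, a prototype-based clustering algorithm, and proposes label distribution learning based on the k-means algorithm (LDLKM). The method first uses k-means to obtain the mean vector of each cluster, then computes the mean vectors of the corresponding label distributions. Finally, the distances between the test set and the mean vectors of the training set are used as weights to predict the label distributions of the test set. Experiments on 6 public data sets, with comparisons against 3 existing label distribution learning algorithms on 5 evaluation measures, show that the proposed LDLKM algorithm is effective.
Keywords:	label distribution; clustering; k-means; Minkowski distance; multi-label; weight matrix; mean vector; softmax function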
Label distribution learning based on k-means algorithm

SHAO Dongheng, YANG Wenyuan, ZHAO Hong. Label distribution learning based on k-means algorithm[J]. CAAI Transactions on Intelligent Systems, 2017, 12(3): 325-332.

Authors:	SHAO Dongheng, YANG Wenyuan, ZHAO Hong

Affiliation:	Lab of Granular Computing, Minnan Normal University, Zhangzhou 363000, China

Abstract:	Label distribution learning is a machine learning paradigm that has emerged in recent years. It addresses problems in which the labels relevant to an instance differ in importance. Existing label distribution learning algorithms build parametric models from conditional probabilities, but they do not adequately exploit the relation between features and labels. In this study, the k-means algorithm, a prototype-based clustering method, was used to cluster the training instances, on the premise that samples with similar features have similar label distributions. On this basis, a new algorithm, label distribution learning based on the k-means algorithm (LDLKM), was proposed. It first computes the mean vector of each cluster using k-means. It then computes the mean vector of the label distributions corresponding to each cluster. Finally, the distances between the test set and the training-set mean vectors are used as weights to predict the label distributions of the test set. Experiments were conducted on six public data sets, and LDLKM was compared with three existing label distribution learning algorithms on five evaluation measures. The experimental results demonstrate the effectiveness of the proposed LDLKM algorithm.

Keywords:	label distribution; clustering; k-means; Minkowski distance; multi-label; weight matrix; mean vector; softmax function
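
The three steps named in the abstract (cluster the training features, average the label distributions within each cluster, weight those averages by distance to a test instance) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract gives no formulas, so turning distances into weights with a softmax over negative Minkowski distances is an assumption suggested only by the keyword list, and every function name, parameter, and the toy data are hypothetical. It uses only the Python standard library.

```python
import math
import random

def minkowski(a, b, p=2):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def mean_vector(rows):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def kmeans(X, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns (centers, assignments)."""
    rng = random.Random(seed)
    centers = [list(x) for x in rng.sample(X, k)]
    assign = None
    for _ in range(iters):
        new_assign = [min(range(k), key=lambda j: minkowski(x, centers[j]))
                      for x in X]
        if new_assign == assign:  # assignments stable: converged
            break
        assign = new_assign
        for j in range(k):
            members = [x for x, a in zip(X, assign) if a == j]
            if members:  # keep the old center if a cluster goes empty
                centers[j] = mean_vector(members)
    return centers, assign

def fit(X, D, k, seed=0):
    """Steps 1 and 2: cluster means and per-cluster mean label distributions."""
    centers, assign = kmeans(X, k, seed=seed)
    label_means = []
    for j in range(k):
        members = [d for d, a in zip(D, assign) if a == j]
        # Assumption: an empty cluster falls back to the global mean.
        label_means.append(mean_vector(members if members else D))
    return centers, label_means

def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(x, centers, label_means, p=2):
    """Step 3: distance-weighted mix of the cluster label distributions."""
    # Assumption: closer clusters get larger weights via softmax(-distance).
    w = softmax([-minkowski(x, c, p) for c in centers])
    n_labels = len(label_means[0])
    return [sum(w[j] * label_means[j][l] for j in range(len(centers)))
            for l in range(n_labels)]

# Toy data (made up): two well-separated groups, three labels per instance.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
D = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.9, 0.05, 0.05],
     [0.1, 0.2, 0.7], [0.1, 0.1, 0.8], [0.2, 0.1, 0.7]]
centers, label_means = fit(X, D, k=2)
print(predict([0.1, 0.1], centers, label_means))
```

Because the softmax weights are non-negative and sum to 1, each prediction is a convex combination of the cluster-level label distributions and is therefore itself a valid distribution, provided every training label distribution sums to 1.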
