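Clustering method for symbolic sequences using probability vectors
Cite this article: Cheng Lingfang, Chen Lifei. Clustering method for symbolic sequences using probability vectors[J]. Application Research of Computers, 2018, 35(6).
Authors: Cheng Lingfang, Chen Lifei
Affiliations: Jinshan College, Fujian Agriculture and Forestry University; School of Mathematics and Computer Science, Fujian Normal University
Funding: National Natural Science Foundation of China (61672157)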
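Abstract: Clustering symbolic sequences is hard because it requires both a suitable representation model and a meaningful distance measure between sequences. To address this, this paper proposes a representation model based on probability vectors and a symbolic sequence clustering algorithm built on that model. The model represents each symbolic sequence by probability distributions. On this representation it defines a distance measure between sequences based on the difference between their probability distributions, together with an objective function for clustering. Finally, it gives a K-means-type clustering algorithm for symbolic sequences, which was evaluated experimentally on real-world sequence sets from several domains. Compared with symbolic sequence clustering algorithms that use a subsequence-based representation model, the proposed method effectively improves clustering accuracy on real data with relatively large alphabets, such as DNA sequences and speech sequences, while cutting clustering time by more than 50%.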
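The abstract names the three ingredients (a probability-vector representation, a distribution-difference distance, and a K-means-type loop) but does not give their exact definitions. The following is a minimal sketch under stated assumptions, not the authors' method. Each sequence is assumed to be represented by its first-order Markov transition probabilities P(next symbol | current symbol) with additive smoothing. This choice is suggested only by the "Markov model" keyword. Squared Euclidean distance is a stand-in for the paper's distribution-difference measure. The function names `markov_vector`, `sq_dist`, and `kmeans_prob` and the parameter `alpha` are hypothetical.

```python
import random


def markov_vector(seq, alphabet, alpha=1.0):
    """Represent a sequence as flattened first-order transition probabilities.

    Assumption (not from the paper): row a holds P(next = b | current = a),
    estimated from bigram counts with additive smoothing `alpha`.
    """
    index = {s: i for i, s in enumerate(alphabet)}
    m = len(alphabet)
    counts = [[0.0] * m for _ in range(m)]
    for a, b in zip(seq, seq[1:]):
        counts[index[a]][index[b]] += 1
    vec = []
    for row in counts:
        total = sum(row) + alpha * m
        vec.extend((c + alpha) / total for c in row)
    return vec


def sq_dist(u, v):
    """Squared Euclidean distance, a stand-in for the paper's measure."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def kmeans_prob(vectors, k, max_iter=100, seed=0):
    """Lloyd-style K-means over probability vectors.

    Averaging row-stochastic vectors keeps every row non-negative and
    summing to 1, so each center is itself a valid probability vector.
    """
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(v, centers[j])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


if __name__ == "__main__":
    alphabet = ["A", "B"]
    seqs = ["ABABABABABAB", "BABABABABABAB", "AAAAAAAABBBBBBBB", "BBBBBBBBAAAAAAAA"]
    vecs = [markov_vector(s, alphabet) for s in seqs]
    labels, _ = kmeans_prob(vecs, k=2)
    print(labels)
```

In this toy run, the two alternating sequences and the two long-run sequences have very different transition profiles, so they fall into separate clusters. Each vector has a fixed length of |alphabet|² regardless of sequence length. This fits the abstract's point that a vector-space representation avoids costly subsequence-based comparisons, although this sketch does not reproduce the paper's reported speedups.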

Keywords: data clustering; symbolic sequence; vector space model; probability vector; Markov model
Received: 2017-01-16
Revised: 2018-04-30
