
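GML document structural clustering algorithm based on frequent subtree patterns
Cite this article: ZHU Yingwen, JI Genlin, SUN Qinhong. GML document structural clustering algorithm based on frequent subtree patterns[J]. Computer Engineering and Applications, 2011, 47(1): 144-146. DOI: 10.3778/j.issn.1002-8331.2011.01.039
Authors: ZHU Yingwen, JI Genlin, SUN Qinhong
Affiliations: 1. Department of Computer Foundation Teaching, Sanjiang University, Nanjing 210012; 2. School of Computer, Nanjing Normal University, Nanjing 210097
Funding: National Natural Science Foundation of China, No. 40871176
Abstract: This paper proposes GCFS (GML Clustering based on Frequent Subtree patterns), a structural clustering algorithm for GML documents based on frequent subtree patterns. Unlike other related algorithms, GCFS first mines the maximal and closed frequent induced subtrees in the GML document collection and uses them as clustering features. It assigns each feature a weight according to the size of the frequent subtree, defines similarity with the cosine function, and clusters the features with the K-Means algorithm. Experimental results show that GCFS is effective, achieves high clustering efficiency, and outperforms other algorithms of its kind.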

Keywords: Geography Markup Language (GML); structural clustering; maximal frequent induced subtrees; closed frequent induced subtrees
Received: 2009-08-24
Revised: 2009-10-24

GML document structural clustering algorithm based on frequent subtree patterns
ZHU Yingwen, JI Genlin, SUN Qinhong. GML document structural clustering algorithm based on frequent subtree patterns[J]. Computer Engineering and Applications, 2011, 47(1): 144-146. DOI: 10.3778/j.issn.1002-8331.2011.01.039
Authors: ZHU Yingwen, JI Genlin, SUN Qinhong
Affiliations: 1. Department of Computer Foundation Teaching, Sanjiang University, Nanjing 210012, China; 2. School of Computer, Nanjing Normal University, Nanjing 210097, China
Abstract: This paper presents GCFS, an algorithm for clustering GML documents by structure based on frequent subtree patterns. It first mines all maximal and closed frequent induced subtrees from the GML documents, then chooses some subtree patterns to form the clustering features, weights these features according to the length of the subtree pattern, computes the similarity of two GML documents with the cosine function, and uses the K-Means algorithm to cluster the documents by these features. Experiment results show that GCFS is effective and effi...
Keywords: Geography Markup Language (GML); clustering by structure; maximal frequent induced subtrees; closed frequent induced subtrees
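
The abstract names three steps after mining: weight each subtree feature by its size, compare documents with the cosine function, and cluster with K-Means. The sketch below illustrates only those steps and is not the authors' implementation. The pattern names, the choice of weight (node count of the subtree), the first-k centroid initialisation, and assignment by highest cosine similarity are all assumptions made for illustration; the frequent-subtree mining itself is not shown, and the patterns are simply given as input.

```python
import math

def weighted_vector(doc_patterns, patterns):
    """Build a feature vector: one slot per mined pattern.
    Assumption: the weight of a present pattern equals its subtree size."""
    return [float(size) if name in doc_patterns else 0.0
            for name, size in patterns]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def kmeans_cosine(vectors, k, iterations=20):
    """K-Means variant that assigns each vector to the most cosine-similar centroid.
    Assumption: the first k vectors serve as initial centroids (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for step in range(iterations):
        new_labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                      for v in vectors]
        if step > 0 and new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical mined patterns as (name, subtree size) pairs.
patterns = [("road/line", 2), ("road/line/coord", 3),
            ("building/polygon", 2), ("building/polygon/ring", 3)]

# Hypothetical documents, each described by the patterns it contains.
docs = [{"road/line", "road/line/coord"},
        {"building/polygon", "building/polygon/ring"},
        {"road/line"},
        {"building/polygon/ring"}]

vectors = [weighted_vector(d, patterns) for d in docs]
labels = kmeans_cosine(vectors, k=2)
print(labels)  # → [0, 1, 0, 1]
```

In this toy input the two road-like documents share no pattern with the two building-like ones, so their cosine similarity is zero and they fall into separate clusters. A production setup would more likely use a random or k-means++ initialisation and a library implementation of K-Means.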
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.