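XML Structural Clustering
Cite this article: HAO Xiao-li, FENG Zhi-yong. XML structural clustering [J]. Journal of Computer Applications (计算机应用), 2005, 25(6): 1398-1400.
Authors: HAO Xiao-li (郝晓丽), FENG Zhi-yong (冯志勇)
Affiliation: School of Electronic Information Engineering, Tianjin University, Tianjin 300072, China (both authors)
Funding: Supported by the Natural Science Foundation of Tianjin (043800411)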
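Abstract: To address shortcomings of current algorithms for clustering XML documents by structure, this paper uses the concept of segment matching to compute the similarity of paths in two XML document trees, and derives from it a similarity measure for the two trees as a whole. Throughout the clustering process, the algorithm also associates each group of related documents with an XML cluster representative, which contains the most relevant features of all documents in the set. The cluster representative is built in three steps: constructing the best matching tree, merging the trees, and pruning the tree. Document clustering is carried out by comparing cluster representatives and updating them whenever a new cluster is found. Experimental results demonstrate the effectiveness of the algorithm.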

Keywords: document clustering; XML; cluster representative; matching segment
Article ID: 1001-9081(2005)06-1398-03

XML documents structured Cluster
HAO Xiao-li, FENG Zhi-yong. XML documents structured Cluster[J]. Journal of Computer Applications, 2005, 25(6): 1398-1400.
Authors:HAO Xiao-li  FENG Zhi-yong
Affiliation: HAO Xiao-li, FENG Zhi-yong (College of Electronic and Information, Tianjin University, Tianjin 300072, China)
Abstract: This article proposes a novel method for clustering XML documents that addresses shortcomings of existing methods. Based on the concept of segment matching, it computes the similarity of paths in two XML trees, which is then used to measure the similarity between the two XML trees as a whole. Throughout the clustering process, each cluster is equipped with an XML cluster representative, which subsumes the most typical structural features of a set of XML documents. The cluster representative is constructed in three successive steps: tree matching, tree merging and tree pruning. Clustering is then accomplished by comparing cluster representatives and updating them as soon as new clusters are detected. Finally, experimental results demonstrate the effectiveness of the clustering method.
Keywords:document clustering  XML  cluster representative  matching segment
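
The abstract does not define segment matching or the cluster representative in detail, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes that a document is summarized by its root-to-leaf tag paths, that a "segment" is a contiguous run of tags shared by two paths, and that a cluster representative can be approximated by merging the paths of all member documents and pruning those below a support threshold. All function names, the `Cluster` class, and the `threshold` and `min_support` parameters are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def root_to_leaf_paths(xml_text):
    """Return the set of root-to-leaf tag paths of an XML document."""
    paths = set()

    def walk(node, prefix):
        path = prefix + (node.tag,)
        children = list(node)
        if not children:
            paths.add(path)
        for child in children:
            walk(child, path)

    walk(ET.fromstring(xml_text), ())
    return paths


def longest_common_segment(p, q):
    """Length of the longest contiguous run of tags shared by paths p and q."""
    best = 0
    prev = [0] * (len(q) + 1)
    for a in p:
        cur = [0] * (len(q) + 1)
        for j, b in enumerate(q, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def path_similarity(p, q):
    """Assumed measure: shared segment length over the longer path length."""
    return longest_common_segment(p, q) / max(len(p), len(q))


def tree_similarity(paths_a, paths_b):
    """Assumed measure: symmetric average of best-match path similarities."""
    if not paths_a or not paths_b:
        return 0.0

    def directed(src, dst):
        return sum(max(path_similarity(p, q) for q in dst) for p in src) / len(src)

    return (directed(paths_a, paths_b) + directed(paths_b, paths_a)) / 2


class Cluster:
    """Hypothetical cluster whose representative is a merged, pruned path set."""

    def __init__(self, paths):
        self.size = 1
        self.counts = Counter(paths)  # merge step: per-path document counts

    def add(self, paths):
        self.size += 1
        self.counts.update(paths)

    def representative(self, min_support=0.5):
        # Prune step: keep paths present in at least min_support of members.
        return {p for p, c in self.counts.items() if c / self.size >= min_support}


def cluster_documents(docs, threshold=0.6):
    """Assign each document to the most similar cluster, or open a new one."""
    clusters, labels = [], []
    for text in docs:
        paths = root_to_leaf_paths(text)
        scores = [tree_similarity(paths, c.representative()) for c in clusters]
        if scores and max(scores) >= threshold:
            idx = scores.index(max(scores))
            clusters[idx].add(paths)  # update the representative
        else:
            clusters.append(Cluster(paths))
            idx = len(clusters) - 1
        labels.append(idx)
    return labels
```

Calling `cluster_documents` on a list of XML strings returns one cluster index per document, in input order. Comparing each new document against a single representative per cluster, rather than against every member, is what keeps this kind of incremental scheme inexpensive; the particular similarity formulas here are placeholders for whatever the paper actually defines.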
This article is indexed in CNKI, VIP (维普), Wanfang Data and other databases.