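A new hybrid clustering method for XML documents
Cite this article: WANG Tong, LIU Da-xin. A new hybrid clustering method for XML documents[J]. Journal of Harbin Engineering University, 2007, 28(6): 697-701
Authors: WANG Tong, LIU Da-xin
Affiliations: College of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, Heilongjiang, China; College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, Heilongjiang, China
Abstract: To improve clustering quality on large collections of semi-structured documents, a new XML document clustering method is proposed. Hierarchical path sequences are extracted from the XML documents and used to represent each document as a vector in the vector space model (VSM). The resulting Euclidean space is mapped to the problem space of the particle swarm model, and the documents are clustered with a particle swarm clustering method. To speed up convergence, C-means is applied in the later part of the algorithm for fast local refinement, giving a two-stage hybrid clustering method. Its advantage is that it can escape local optima and search the whole problem space while keeping the running time reasonable. Experimental results show that the proposed method achieves high clustering accuracy and good convergence.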
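
The abstract does not spell out how the hierarchical path sequences are defined or weighted. As a minimal sketch only, the Python below assumes a "path" is the sequence of tag names from the root to an element and that each vector component is a raw path frequency; the function names `root_to_node_paths` and `to_vectors` and the sample documents are hypothetical and not taken from the paper.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def root_to_node_paths(xml_text):
    """Return every root-to-element tag path, e.g. 'book/author/name'."""
    root = ET.fromstring(xml_text)
    paths = []

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        paths.append(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths

def to_vectors(docs):
    """Map each document to a path-frequency vector over a shared vocabulary."""
    counts = [Counter(root_to_node_paths(d)) for d in docs]
    vocab = sorted(set().union(*counts))  # one dimension per distinct path
    return vocab, [[c[p] for p in vocab] for c in counts]

# Hypothetical toy documents: two "book" records and one "article" record.
docs = [
    "<book><title/><author><name/></author></book>",
    "<book><title/><author><name/></author><author><name/></author></book>",
    "<article><title/><journal/></article>",
]
vocab, vectors = to_vectors(docs)
```

Documents that share structure end up with non-zero entries in the same dimensions, which is what lets a distance-based clustering method separate them. A real system would probably weight or normalize these counts; that choice is left open here.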

Keywords: XML; clustering; particle swarm optimization
Article ID: 1006-7043(2007)06-0697-05
Revised: 2006-03-06

A novel mixed clustering method for XML documents
WANG Tong, LIU Da-xin. A novel mixed clustering method for XML documents[J]. Journal of Harbin Engineering University, 2007, 28(6): 697-701
Authors: WANG Tong, LIU Da-xin
Affiliation: 1. College of Information and Communications Engineering, Harbin Engineering University, Harbin 150001, China; 2. College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
Abstract: To improve the clustering quality of massive extensible markup language (XML) document collections, this paper proposes a novel XML document clustering method. First, the approach extracts hierarchical path sequences from the documents and uses them to transform the documents into vectors in a Euclidean space. A clustering method based on particle swarm optimization (PSO) is then applied. To improve the convergence of the algorithm, a C-means algorithm is applied in the final stage, yielding the enhanced mixed algorithm MCPX. The advantage of MCPX is that it can escape local optima of the search space and reach a global optimum at reasonable time expense. Experimental results show that the proposed technique achieves satisfactory clustering convergence and accuracy.
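
The two-stage idea (a global PSO search followed by C-means refinement) can be illustrated with a small standard-library sketch. This is not the authors' MCPX implementation: the particle encoding (each particle holds k candidate centroids), the sum-of-squared-distances cost, the parameter values, and the use of hard C-means (k-means) for the second stage are all assumptions, and every function name is hypothetical.

```python
import random

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda j: dist2(p, centroids[j]))
            for p in points]

def cost(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(dist2(p, c) for c in centroids) for p in points)

def pso_stage(points, k, n_particles=10, iters=30, w=0.72, c1=1.49, c2=1.49, rng=None):
    """Stage 1 (assumed form): standard global-best PSO over sets of k centroids."""
    rng = rng or random.Random(0)
    dim = len(points[0])
    # Each particle encodes k candidate centroids, initialised from data points.
    swarm = [[list(rng.choice(points)) for _ in range(k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in s] for s in swarm]
    pbest_cost = [cost(points, s) for s in swarm]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = [c[:] for c in pbest[g]], pbest_cost[g]
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - swarm[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - swarm[i][j][d]))
                    swarm[i][j][d] += vel[i][j][d]
            c = cost(points, swarm[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = [x[:] for x in swarm[i]], c
                if c < gbest_cost:
                    gbest, gbest_cost = [x[:] for x in swarm[i]], c
    return gbest

def cmeans_stage(points, centroids, iters=20):
    """Stage 2 (assumed form): hard C-means refinement from the PSO result."""
    centroids = [c[:] for c in centroids]
    for _ in range(iters):
        labels = assign(points, centroids)
        for j in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign(points, centroids)

# Hypothetical toy data: two well-separated groups of 2-D document vectors.
points = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
seed = pso_stage(points, k=2)
centroids, labels = cmeans_stage(points, seed)
```

The refinement stage can only keep or lower the cost of the centroids it starts from, so the PSO stage determines which basin is explored and C-means handles the inexpensive final convergence within it.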
Keywords: extensible markup language (XML); document clustering; particle swarm optimization
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.