
Simulation Research on XML Documents Similarity (XML文档相似性的仿真研究)

Cite this article:	LU Cui-ming, LI Fang, Athena I Vakali. Simulation Research on XML Documents Similarity[J]. Computer Simulation (计算机仿真), 2005, 22(12): 300-303.

Authors:	LU Cui-ming, LI Fang, Athena I Vakali

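Affiliations:	1. Department of Computer Science, Shanghai Jiao Tong University, Shanghai 200030, China; 2. Department of Informatics, Aristotle University, Greece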

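Abstract:	Computing the similarity of XML documents is a difficult problem in XML document classification. This paper describes a structure-based method that uses sequential pattern mining to find the maximal similar paths between two documents. The similarity of the two documents is then obtained as the ratio of the number of nodes on the maximal similar paths to the number of nodes on all paths. The paper also proposes a new method for minimizing XML documents and takes both the semantic similarity and the structural similarity of document nodes into account, which further improves the precision of the computed similarity. Experiments indicate that the method has good application prospects.
Keywords:	Extensible Markup Language; information retrieval; data mining; sequential pattern mining
Article ID:	1006-9348(2005)12-0300-03
Revision date:	September 7, 2004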
Simulation Research on XML Documents Similarity

LU Cui-ming, LI Fang, Athena I Vakali. Simulation Research on XML Documents Similarity[J]. Computer Simulation, 2005, 22(12): 300-303.

Authors:	LU Cui-ming, LI Fang, Athena I Vakali

Affiliation:	LU Cui-ming (1), LI Fang (1), Athena I Vakali (2)

Abstract:	Computing the similarity between XML documents is a difficult problem in document classification. This paper first proposes a model for computing XML document similarity and then tests it in simulation using XMLGenerator. The method is structure-based: it uses sequential pattern mining to find the maximal common paths in two XML document trees, and it measures similarity as the ratio of the number of nodes on the maximal common paths to the number of nodes on all paths extracted from the document trees. A novel approach to minimizing XML documents is also proposed, and both the semantic similarity and the structural similarity of nodes are considered in order to improve the precision of the computed similarity. Experiments indicate that the method is promising.

Keywords:	Extensible Markup Language (XML); information retrieval; data mining; sequential pattern mining
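
The path-ratio idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formula, the sequential pattern mining step is replaced here by a plain longest-common-subsequence match between root-to-leaf tag paths, and the document minimization and semantic tag similarity are left out. The function names `root_to_leaf_paths`, `lcs_length`, and `structural_similarity` are hypothetical.

```python
import xml.etree.ElementTree as ET


def root_to_leaf_paths(xml_text):
    """Return every root-to-leaf path as a tuple of element tags."""
    root = ET.fromstring(xml_text)
    paths = []

    def walk(node, prefix):
        current = prefix + (node.tag,)
        children = list(node)
        if not children:
            paths.append(current)
        for child in children:
            walk(child, current)

    walk(root, ())
    return paths


def lcs_length(a, b):
    """Length of the longest common subsequence of two tag sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def structural_similarity(xml_a, xml_b):
    """Assumed measure: matched path nodes divided by all path nodes."""
    paths_a = root_to_leaf_paths(xml_a)
    paths_b = root_to_leaf_paths(xml_b)

    def matched(src, dst):
        # For each path, count the nodes of its best common subsequence
        # with any path of the other document (stand-in for pattern mining).
        return sum(max(lcs_length(p, q) for q in dst) for p in src)

    total = sum(len(p) for p in paths_a) + sum(len(p) for p in paths_b)
    common = matched(paths_a, paths_b) + matched(paths_b, paths_a)
    return common / total


doc_a = "<book><title/><author><name/></author></book>"
doc_b = "<book><title/><author><email/></author></book>"
print(structural_similarity(doc_a, doc_b))
```

Under these assumptions the score is 1.0 for identical documents and 0.0 for documents that share no tags. Exact tag equality is used for matching; a semantic comparison of tag names, as the abstract suggests, would replace the `x == y` test.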
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
