Element matching across data-oriented XML sources using a multi-strategy clustering model
Authors:Charnyote  Joachim  
Affiliation:

a Department of Computer Science, Mahidol University, Rama VI Rd., Bangkok 10400, Thailand

b Department of Computer and Information Science and Engineering, University of Florida, Box 116120, 301 CSE Building, Gainesville, FL 32611-6120, USA

Abstract: We describe a family of heuristics-based clustering strategies to support the merging of XML data from multiple sources. As part of this research, we have developed a comprehensive classification of the schematic and semantic conflicts that can occur when reconciling related XML data from multiple sources. Because element clustering is compute-intensive, especially when comparing large numbers of data elements that exhibit great representational diversity, performance is a critical, yet so far neglected, aspect of the merging process. We have developed five heuristics for clustering data in a multi-dimensional metric space. Equivalence of data elements within individual clusters is determined using several distance functions that calculate the semantic distances among the elements.

The research described in this article is conducted within the context of the Integration Wizard (IWIZ) project at the University of Florida. IWIZ enables users to access and retrieve information from multiple XML-based sources through a consistent, integrated view. The results of our qualitative analysis of the clustering heuristics have validated the feasibility of our approach as well as its superior performance when compared to other similarity search techniques.
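
To make the idea of distance-based element clustering concrete, the following is a minimal illustrative sketch and not the method described in the article. The abstract does not specify the five heuristics or the distance functions, so everything here is an assumption: elements are modeled as hypothetical `(tag_name, child_tag_set)` pairs, the distance is a weighted mix of a string distance on tag names and a Jaccard distance on child tags, and the grouping step is a simple greedy "leader" clustering with an arbitrary threshold.

```python
from difflib import SequenceMatcher


def name_distance(a: str, b: str) -> float:
    """Distance in [0, 1] from case-insensitive similarity of two tag names."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def child_distance(a: frozenset, b: frozenset) -> float:
    """Jaccard distance in [0, 1] between two sets of child tag names."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def element_distance(x, y, w_name: float = 0.5) -> float:
    """Weighted combination of both distances (the weight is an assumption)."""
    return (w_name * name_distance(x[0], y[0])
            + (1.0 - w_name) * child_distance(x[1], y[1]))


def leader_cluster(elements, threshold: float = 0.4):
    """Greedy clustering: each element joins the first cluster whose leader
    (its first member) lies within `threshold`, else it starts a new cluster."""
    clusters = []
    for el in elements:
        for cluster in clusters:
            if element_distance(el, cluster[0]) <= threshold:
                cluster.append(el)
                break
        else:
            clusters.append([el])
    return clusters


# Hypothetical elements drawn from two imagined XML sources.
elements = [
    ("author", frozenset({"firstname", "lastname"})),
    ("Author", frozenset({"firstname", "lastname"})),
    ("price", frozenset()),
]
clusters = leader_cluster(elements)
```

Comparing each element only against cluster leaders, rather than against every other element, is one simple way to reduce the number of distance computations, which is the kind of performance concern the abstract raises.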

Keywords: Element matching; Information integration; Object clustering; Reconciliation; XML
This article is indexed in ScienceDirect and other databases.