Learning to Match the Schemas of Data Sources: A Multistrategy Approach
Authors: AnHai Doan, Pedro Domingos, Alon Halevy
Affiliation: (1) Department of Computer Science, University of Illinois, Urbana-Champaign, IL 61801, USA; (2) Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
Abstract: The problem of integrating data from multiple data sources, either on the Internet or within enterprises, has received much attention in the database and AI communities. The focus has been on building data integration systems that provide a uniform query interface to the sources. A key bottleneck in building such systems has been the laborious manual construction of semantic mappings between the query interface and the source schemas. Examples of mappings are "element location maps to address" and "price maps to listed-price". We propose a multistrategy learning approach to automatically find such mappings. The approach applies multiple learner modules, where each module exploits a different type of information either in the schemas of the sources or in their data, then combines the predictions of the modules using a meta-learner. Learner modules employ a variety of techniques, ranging from Naive Bayes and nearest-neighbor classification to entity recognition and information retrieval. We describe the LSD system, which employs this approach to find semantic mappings. To further improve matching accuracy, LSD exploits domain integrity constraints, user feedback, and nested structures in XML data. We test LSD experimentally on several real-world domains. The experiments validate the utility of multistrategy learning for data integration and show that LSD proposes semantic mappings with a high degree of accuracy.
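The multistrategy idea in the abstract (several learners, each looking at a different kind of evidence, with their predictions merged by a meta-learner) can be illustrated with a minimal Python sketch. This is not the LSD implementation: the two toy learners, the token-overlap scoring, the fixed combination weights, and the `EXAMPLES` training data are all assumptions made for illustration. In particular, the abstract says a meta-learner combines the predictions, whereas this sketch simply uses hand-set weights.

```python
import re

# Hypothetical training data: label -> list of (element name, sample values).
EXAMPLES = {
    "address": [("location", ["12 Oak St, Seattle WA", "4 Pine Ave, Urbana IL"])],
    "price": [("listed-price", ["$250,000", "$180,500"])],
}

def tokens(text):
    """Return the set of lowercase alphanumeric tokens in a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def normalize(scores):
    """Scale scores to sum to 1; fall back to uniform if all are zero."""
    total = sum(scores.values())
    if total == 0:
        return {k: 1.0 / len(scores) for k in scores}
    return {k: v / total for k, v in scores.items()}

def name_learner(element_name, examples):
    """Toy schema-based learner: best name-token overlap per label."""
    name = tokens(element_name)
    return normalize({
        label: max(jaccard(name, tokens(n)) for n, _ in items)
        for label, items in examples.items()
    })

def content_learner(values, examples):
    """Toy data-based learner: overlap between value tokens and training values."""
    seen_here = set().union(*(tokens(v) for v in values))
    scores = {}
    for label, items in examples.items():
        seen = set().union(*(tokens(v) for _, vals in items for v in vals))
        scores[label] = jaccard(seen_here, seen)
    return normalize(scores)

def combine(predictions, weights):
    """Stand-in for a meta-learner: fixed weighted sum of learner scores."""
    labels = predictions[0].keys()
    return normalize({
        label: sum(w * p[label] for w, p in zip(weights, predictions))
        for label in labels
    })

def match(element_name, values, examples=EXAMPLES, weights=(0.5, 0.5)):
    """Return the best label and the combined score distribution."""
    preds = [name_learner(element_name, examples),
             content_learner(values, examples)]
    combined = combine(preds, weights)
    return max(combined, key=combined.get), combined

if __name__ == "__main__":
    print(match("house-location", ["77 Elm St, Seattle WA"]))
```

A trained meta-learner would replace the fixed `weights` with values fitted on labelled examples, so that a learner that is reliable for a given label counts for more on that label.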
Keywords: schema matching, multistrategy learning, data integration
This article is indexed in SpringerLink and other databases.