A hybrid system for German encyclopedia alignment
Authors: Roman Kern, Christin Seifert, Michael Granitzer
Affiliations: 1. Graz University of Technology, Knowledge Management Institute, Inffeldgasse 21a, 8010 Graz, Austria
2. Know-Center GmbH and Graz University of Technology, Knowledge Management Institute, Inffeldgasse 21a, 8010 Graz, Austria
Abstract: Collaboratively created online encyclopedias have become increasingly popular, and in terms of completeness in particular they have begun to surpass their printed counterparts. Two German publishers of traditional encyclopedias have responded to this challenge by starting an initiative to merge their corpora into a single, more complete encyclopedia. The crucial step in this merging process is the alignment of articles. We have developed a two-step hybrid system that provides highly accurate alignments with little manual effort. First, we apply an automatic alignment algorithm based on information retrieval. Second, articles with a low confidence score are revised using a manual alignment scheme carefully designed for quality assurance. Our evaluation shows that combining weighting and ranking techniques that exploit different facets of the encyclopedia articles effectively reduces the number of necessary manual alignments. Furthermore, the manual alignment setup turned out to be robust against inter-indexer inconsistencies. As a result, the system enabled us to align four encyclopedias with high accuracy and low effort.
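The two-step idea (automatic retrieval-based alignment, then manual review of low-confidence cases) can be illustrated with a small sketch. The abstract does not specify the features, weights or confidence measure, so everything below is an assumption for illustration only: each article is split into hypothetical facets (`title`, `text`), each facet is compared with TF-IDF cosine similarity, facet scores are combined linearly with hypothetical weights, and "confidence" is taken to be the margin between the best and second-best candidate. Articles whose margin falls below a hypothetical `threshold` are routed to a manual queue. This is not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens; a deliberately simple tokenizer."""
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter()
    for c in counts:
        df.update(c.keys())
    # Smoothed IDF, so terms occurring in every document keep a small positive weight.
    return [
        {t: tf * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, tf in c.items()}
        for c in counts
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def align(source, target, weights, threshold):
    """Align each source article to its best-scoring target article (hypothetical sketch).

    source, target: dict article_id -> dict facet -> text (target must be non-empty)
    weights: dict facet -> weight for the linear combination of facet similarities
    threshold: minimum confidence (top-1 minus top-2 score) for automatic acceptance
    Returns (auto, manual): accepted alignments and source ids queued for manual review.
    """
    src_ids, tgt_ids = list(source), list(target)
    # Per-facet TF-IDF vectors built over the union of both corpora.
    vecs = {}
    for facet in weights:
        docs = [source[i].get(facet, "") for i in src_ids]
        docs += [target[j].get(facet, "") for j in tgt_ids]
        v = tfidf_vectors(docs)
        vecs[facet] = (v[: len(src_ids)], v[len(src_ids):])

    auto, manual = {}, []
    for si, s in enumerate(src_ids):
        # Rank all target articles by the weighted sum of facet similarities.
        scored = []
        for ti, t in enumerate(tgt_ids):
            score = sum(w * cosine(vecs[f][0][si], vecs[f][1][ti]) for f, w in weights.items())
            scored.append((score, t))
        scored.sort(key=lambda x: x[0], reverse=True)
        best_score, best_id = scored[0]
        runner_up = scored[1][0] if len(scored) > 1 else 0.0
        # Assumed confidence measure: margin between the two best candidates.
        if best_score - runner_up >= threshold:
            auto[s] = best_id
        else:
            manual.append(s)
    return auto, manual


if __name__ == "__main__":
    source = {
        "A": {"title": "Graz", "text": "city in styria austria university"},
        "B": {"title": "Bank", "text": "financial institution money"},
    }
    target = {
        "X": {"title": "Graz", "text": "austrian city styria capital"},
        "Y": {"title": "Bank", "text": "river bank shore"},
        "Z": {"title": "Bank", "text": "bench seat furniture"},
    }
    auto, manual = align(source, target, {"title": 0.5, "text": 0.5}, threshold=0.1)
    print(auto, manual)  # {'A': 'X'} ['B']
```

In the toy data, "A" has a clear best match and is aligned automatically, whereas "B" is ambiguous between two equally titled candidates and therefore lands in the manual queue. Raising `threshold` trades more manual work for fewer automatic errors, which mirrors the effort/accuracy trade-off the abstract describes.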
This article is indexed in SpringerLink and other databases.