A hybrid system for German encyclopedia alignment |
| |
Authors: | Roman Kern, Christin Seifert, Michael Granitzer |
| |
Affiliation: | 1. Graz University of Technology, Knowledge Management Institute, Inffeldgasse 21a, 8010 Graz, Austria; 2. Know-Center GmbH and Graz University of Technology, Knowledge Management Institute, Inffeldgasse 21a, 8010 Graz, Austria |
| |
Abstract: | Collaboratively created online encyclopedias have become increasingly popular, and in terms of completeness in particular they have begun to surpass their printed counterparts. Two German publishers of traditional encyclopedias have responded to this challenge by starting an initiative to merge their corpora into a single, more complete encyclopedia. The crucial step in this merging process is the alignment of articles. We have developed a two-step hybrid system that provides highly accurate alignments with little manual effort. First, we apply an automatic alignment algorithm based on information retrieval. Second, articles whose alignment has a low confidence score are revised using a manual alignment scheme carefully designed for quality assurance. Our evaluation shows that combining weighting and ranking techniques that exploit different facets of the encyclopedia articles effectively reduces the number of necessary manual alignments. Furthermore, the setup of the manual alignment turned out to be robust against inter-indexer inconsistencies. As a result, the system enabled us to align four encyclopedias with high accuracy and low effort. |
| |
Keywords: | |
This paper is indexed in SpringerLink and other databases. |
|
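The two-step pipeline summarized in the abstract (automatic, IR-based alignment, then manual revision of low-confidence cases) can be illustrated with a minimal sketch. The abstract does not specify the facets, the weighting and ranking techniques, or how confidence is measured, so everything below is an assumption: TF-IDF cosine similarity per facet, two hypothetical facets (`title`, `body`) combined with hypothetical weights (`FACET_WEIGHTS`), and a confidence defined as the score margin between the best and second-best candidate, compared against a hypothetical `CONFIDENCE_THRESHOLD`. It is not the authors' implementation.

```python
import math
import re
from collections import Counter

# Hypothetical facets and weights (assumption): the abstract does not name them.
FACET_WEIGHTS = {"title": 0.6, "body": 0.4}
# Hypothetical cut-off on the margin between best and second-best candidate.
CONFIDENCE_THRESHOLD = 0.2


def tokenize(text):
    """Lowercased word tokens; in Python 3, \\w also matches umlauts and ß."""
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(term for tf in counts for term in tf)
    # The +1 keeps terms that occur in every document from getting zero weight.
    idf = {term: math.log(n / c) + 1.0 for term, c in df.items()}
    return [{term: f * idf[term] for term, f in tf.items()} for tf in counts]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def align(source, target):
    """Align each source article to its best-scoring target article.

    Articles are dicts with an "id" and one text field per facet; target
    must be non-empty. Returns (automatic, manual): confident
    (source_id, target_id) pairs, and source ids whose best match is too
    close to the runner-up.
    """
    scores = [[0.0] * len(target) for _ in source]
    for facet, weight in FACET_WEIGHTS.items():
        # Fit IDF on both corpora together so they share one vocabulary.
        vecs = tfidf_vectors([a[facet] for a in source] + [b[facet] for b in target])
        src, tgt = vecs[:len(source)], vecs[len(source):]
        for i, sv in enumerate(src):
            for j, tv in enumerate(tgt):
                scores[i][j] += weight * cosine(sv, tv)

    automatic, manual = [], []
    for i, row in enumerate(scores):
        # Rank candidates by combined score, best first.
        ranked = sorted(range(len(target)), key=lambda j: row[j], reverse=True)
        best = row[ranked[0]]
        runner_up = row[ranked[1]] if len(ranked) > 1 else 0.0
        if best - runner_up >= CONFIDENCE_THRESHOLD:
            automatic.append((source[i]["id"], target[ranked[0]]["id"]))
        else:
            manual.append(source[i]["id"])  # route to the manual alignment step
    return automatic, manual


if __name__ == "__main__":
    source = [
        {"id": "a1", "title": "Graz", "body": "Stadt in der Steiermark Österreich"},
        {"id": "a2", "title": "Bank", "body": "Sitzmöbel"},
    ]
    target = [
        {"id": "b1", "title": "Graz", "body": "Landeshauptstadt der Steiermark in Österreich"},
        {"id": "b2", "title": "Bank (Geldinstitut)", "body": "Kreditinstitut"},
        {"id": "b3", "title": "Bank (Möbel)", "body": "Sitzmöbel für mehrere Personen"},
    ]
    automatic, manual = align(source, target)
    print(automatic)  # [('a1', 'b1')]
    print(manual)     # ['a2']
```

With the toy data, "Graz" has a single clear match and is aligned automatically, while the ambiguous "Bank" (bench vs. financial institution) has two candidates with identical title scores; the body facet separates them only by a margin below the threshold, so the article goes to the manual queue. Combining several facets, rather than relying on titles alone, is what lets the ranking break such ties in the first place.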