首页 | 本学科首页   官方微博 | 高级检索  
     


Morphological query expansion and language-filtering words for improving Basque web retrieval
Authors:Igor Leturia  Antton Gurrutxaga  Nerea Areta  Iñaki Alegria  Aitzol Ezeiza
Affiliation:1. Elhuyar Foundation, Usurbil, Gipuzkoa, Spain
2. University of the Basque Country, Donostia/San Sebastian, Gipuzkoa, Spain
Abstract:The experience of a user of major search engines or other web information retrieval services looking for information in the Basque language is far from satisfactory: they only return pages with exact matches but no inflections (necessary for an agglutinative language like Basque), many results in other languages (no search engine gives the option to restrict its results to Basque), etc. This paper proposes using morphological query expansion and language-filtering words in combination with the APIs of search engines as a very cost-effective solution to build appropriate web search services for Basque. The implementation details of the methodology (choosing the most appropriate language-filtering words, the number of them, the most frequent inflections for the morphological query expansion, etc.) have been specified by corpora-based studies. The improvements produced have been measured in terms of precision and recall both over corpora and real web searches. Morphological query expansion can improve recall up to 47 % and language-filtering words can raise precision from 15 % to around 90 %, although with a loss in recall of about 30–35 %. The proposed methodology has already been successfully used in the Basque search service Elebila (http://www.elebila.eu) and the web-as-corpus tool CorpEus (http://www.corpeus.org), and the approach could be applied to other morphologically rich or under-resourced languages as well.
Keywords:
本文献已被 SpringerLink 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号