Improving Arabic information retrieval using word embedding similarities
Authors: Abdelkader El Mahdaouy, Saïd Ouatik El Alaoui, Eric Gaussier
Affiliation: 1. Laboratory of Informatics and Modeling, Faculty of Sciences Dhar el Mahraz, Sidi Mohamed Ben Abdellah University, Fez, Morocco; 2. Université Grenoble Alpes, CNRS, Grenoble INP, LIG, Grenoble, France
Abstract: Term mismatch is a common limitation of traditional information retrieval (IR) models, in which relevance scores are estimated from exact matching between documents and queries. A good IR model should also consider distinct but semantically similar words in the matching process. In this paper, we propose a method to incorporate word embedding (WE) semantic similarities into existing probabilistic IR models for Arabic in order to deal with term mismatch. Experiments are performed on the standard Arabic TREC collection using three neural word embedding models. The results show that the proposed extensions significantly outperform their baseline bag-of-words models, while the difference between the evaluated neural word embedding models is not statistically significant. Moreover, the overall comparison shows that our extensions significantly outperform the Arabic WordNet-based semantic indexing approach and three recent WE-based IR language models.
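Illustration: the abstract does not give the authors' formulas, so the following is only a minimal sketch of the general idea of letting semantically similar terms contribute to a probabilistic relevance score. It assumes a BM25-style scorer, a cosine-similarity threshold, and a toy embedding table. The names `soft_tf`, `bm25_soft`, and `threshold`, and all parameter values, are hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def soft_tf(term, doc_tokens, emb, threshold=0.7):
    """Term frequency in which exact matches count 1 and sufficiently
    similar tokens count by their cosine similarity (an assumption made
    for illustration, not the paper's definition)."""
    total = 0.0
    for tok in doc_tokens:
        if tok == term:
            total += 1.0
        elif term in emb and tok in emb:
            sim = cosine(emb[term], emb[tok])
            if sim >= threshold:
                total += sim
    return total

def bm25_soft(query, doc_tokens, emb, idf, avgdl, k1=1.2, b=0.75, threshold=0.7):
    """BM25-style score computed with the similarity-aware frequency above."""
    norm = k1 * (1 - b + b * len(doc_tokens) / avgdl)
    score = 0.0
    for q in query:
        tf = soft_tf(q, doc_tokens, emb, threshold)
        score += idf.get(q, 0.0) * tf * (k1 + 1) / (tf + norm)
    return score

# Toy data: "car" and "automobile" are close in the embedding space.
emb = {"car": [1.0, 0.0], "automobile": [0.9, 0.1], "tree": [0.0, 1.0]}
doc = ["automobile", "tree"]
exact_only = bm25_soft(["car"], doc, emb, {"car": 1.0}, avgdl=2, threshold=1.01)
with_similarity = bm25_soft(["car"], doc, emb, {"car": 1.0}, avgdl=2)
```

In this toy setting the document never contains the query term "car", so the exact-match score is zero, while the similarity-aware score is positive because "automobile" contributes. This is the kind of term mismatch the abstract refers to.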
This article is indexed in SpringerLink and other databases.