Hamshahri: A standard Persian text collection
Authors:Abolfazl AleAhmad  Hadi Amiri  Ehsan Darrudi  Masoud Rahgozar  Farhad Oroumchian
Affiliation:1. Database Research Group, Control and Intelligent Processing Center of Excellence, School of Electrical and Computer Engineering, Campus #2, University of Tehran, North Kargar St., Tehran, Iran; 2. University of Wollongong in Dubai, Knowledge Village, Dubai, UAE
Abstract:The Persian language is one of the dominant languages in the Middle East, so a significant number of Persian documents are available on the Web. Because Persian differs in nature from languages such as English, the design of information retrieval systems for Persian requires special consideration. However, there are relatively few studies on the retrieval of Persian documents in the literature, and one of the main reasons is the lack of a standard test collection. In this paper, we introduce a standard Persian text collection, named Hamshahri, which is built from a large number of newspaper articles according to TREC specifications. Furthermore, statistical information about the documents, the queries, and their relevance judgments is presented. We believe that this collection is the largest Persian text collection to date.
This article is indexed in ScienceDirect and other databases.