     


Exploring and exploiting a historical corpus for Arabic
Authors: Sane Yagi, Omaima Ismail, Mohammad AbuShariah
Affiliation: 1. Computer Information Systems Department, King Abdullah II School for Information Technology, University of Jordan, Amman, Jordan; 2. Linguistics Department, University of Jordan, Amman, Jordan
Abstract: This paper presents HAC, a historical Arabic corpus. At this early stage of the project, we report on the design, the architecture, and some of the experiments we have conducted on HAC. The corpus, and accordingly the search results, will be represented in a primary XML exchange format, which will serve as an intermediate exchange tool within the project and allow users to process results offline with external tools. HAC is made up of Classical Arabic texts that cover 1600 years of language use, the Quranic text, Modern Standard Arabic texts, and a variety of monolingual Arabic dictionaries. The corpus helps linguists and Arabic language learners explore, understand, and discover knowledge hidden in millions of instances of language use. We used natural language processing techniques to process the data and a graph-based representation for the corpus, and we provide researchers with an export facility to make further linguistic analysis possible.
This article is indexed in SpringerLink and other databases.
