Exploring and exploiting a historical corpus for Arabic

Authors:	Bassam Hammo, Sane Yagi, Omaima Ismail, Mohammad AbuShariah

Affiliation:	1. Computer Information Systems Department, King Abdullah II School for Information Technology, University of Jordan, Amman, Jordan; 2. Linguistics Department, University of Jordan, Amman, Jordan

Abstract:	This paper presents a historical Arabic corpus named HAC. At this early stage of the project, we report on the design, the architecture, and some of the experiments we have conducted on HAC. The corpus, and accordingly the search results, will be represented in a primary XML exchange format. This format will serve as an intermediate exchange tool within the project and will allow users to process the results offline with external tools. HAC is made up of Classical Arabic texts that cover 1600 years of language use, the Quranic text, Modern Standard Arabic texts, and a variety of monolingual Arabic dictionaries. The development of this historical corpus helps linguists and Arabic language learners to explore, understand, and discover interesting knowledge hidden in millions of instances of language use. We used techniques from the field of natural language processing to process the data, and a graph-based representation for the corpus. We also provide researchers with an export facility to make further linguistic analysis possible.
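
As a rough illustration of the XML exchange idea mentioned in the abstract, the sketch below serializes a list of search hits into an XML string that an external tool could read offline. It is only a minimal sketch: the function name `hits_to_xml`, the element and attribute names (`results`, `hit`, `source`, `period`, `left`, `keyword`, `right`), and the sample record are all assumptions made for illustration, and they do not reflect HAC's actual schema, which the abstract does not describe.

```python
import xml.etree.ElementTree as ET


def hits_to_xml(query, hits):
    """Serialize search hits into an XML string.

    All element and attribute names here are hypothetical placeholders,
    not the schema used by HAC.
    """
    root = ET.Element("results", {"query": query})
    for hit in hits:
        # One <hit> element per match, tagged with its (assumed) metadata.
        node = ET.SubElement(
            root, "hit", {"source": hit["source"], "period": hit["period"]}
        )
        # Keyword-in-context layout: left context, keyword, right context.
        ET.SubElement(node, "left").text = hit["left"]
        ET.SubElement(node, "keyword").text = hit["keyword"]
        ET.SubElement(node, "right").text = hit["right"]
    # encoding="unicode" returns a str, so Arabic text stays readable.
    return ET.tostring(root, encoding="unicode")


# A made-up record used only to exercise the function.
sample_hits = [
    {
        "source": "example-text",
        "period": "classical",
        "left": "...",
        "keyword": "كتاب",
        "right": "...",
    },
]
xml_text = hits_to_xml("كتاب", sample_hits)
print(xml_text)
```

A file produced this way can be reloaded with `ET.fromstring` or processed by any XML-aware tool, which is the kind of offline use the abstract has in mind.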

Keywords:
This article is indexed in SpringerLink and other databases.
