Extracting an Arabic Lexicon from Arabic Newspaper Text

Authors:	Saleem Abuleil and Martha Evens

Affiliation:	(1) Chicago State University, 9501 S. King Drive, Chicago, IL 60628, USA; (2) Illinois Institute of Technology, 10 West 31 Street, Chicago, IL 60616, USA

Abstract:	We describe how to build a large, comprehensive, integrated Arabic lexicon by automatic parsing of newspaper text. We have built a parser system to read Arabic newspaper articles, isolate the tokens from them, and find the part of speech and the features of each token. To achieve this goal we designed a set of algorithms, generated several sets of rules, and developed a set of techniques and a set of components to carry out these techniques. As each sentence is processed, new words and features are added to the lexicon, so that it grows continuously as the system runs. To test the system we used 100 articles (80,444 words) from the Al-Raya newspaper. The system consists of several modules: the tokenizer module to isolate the tokens, the type finder system to find the part of speech of each token, the proper noun phrase parser module to mark the proper nouns and to discover some information about them, and the feature finder module to find the features of the words.
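
The module layout described in the abstract can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: all function names, the keyword table, and the three surface rules (a token after a keyword is a proper noun, the prefix "ال" marks a definite noun, a final "ة" marks feminine gender) are hypothetical simplifications chosen only to show how a lexicon can grow as sentences pass through a tokenizer, a proper noun phrase parser, a type finder, and a feature finder.

```python
import re

# Toy stand-ins for the four modules named in the abstract. Every rule
# below is an illustrative assumption, not one of the authors' rules.
DEFINITE_ARTICLE = "ال"   # "al-", the Arabic definite article
TA_MARBUTA = "ة"          # a common feminine ending
# Hypothetical keywords that introduce a proper noun, with a category.
PROPER_NOUN_KEYWORDS = {"السيد": "person", "مدينة": "location"}


def tokenize(sentence):
    """Tokenizer: isolate tokens as runs of word characters."""
    return re.findall(r"\w+", sentence)


def find_proper_nouns(tokens):
    """Proper noun phrase parser: mark the token that follows a keyword."""
    marked = {}
    for keyword, following in zip(tokens, tokens[1:]):
        if keyword in PROPER_NOUN_KEYWORDS:
            marked[following] = PROPER_NOUN_KEYWORDS[keyword]
    return marked


def find_type(token, proper_nouns):
    """Type finder: assign a coarse part-of-speech tag."""
    if token in proper_nouns:
        return "proper_noun"
    if token.startswith(DEFINITE_ARTICLE) and len(token) > 2:
        return "noun"
    return "unknown"


def find_features(token, proper_nouns):
    """Feature finder: collect simple surface features of a token."""
    features = {}
    if token in proper_nouns:
        features["category"] = proper_nouns[token]
    if token.startswith(DEFINITE_ARTICLE) and len(token) > 2:
        features["definite"] = True
    if token.endswith(TA_MARBUTA):
        features["gender"] = "feminine"
    return features


def process_sentence(sentence, lexicon):
    """Run all modules on one sentence; return the number of new entries."""
    tokens = tokenize(sentence)
    proper_nouns = find_proper_nouns(tokens)
    added = 0
    for token in tokens:
        if token not in lexicon:
            lexicon[token] = {
                "pos": find_type(token, proper_nouns),
                "features": find_features(token, proper_nouns),
            }
            added += 1
    return added


lexicon = {}
# Gloss: "Mr. Ahmad visited the city of Doha."
sentence = "زار السيد أحمد مدينة الدوحة"
first_pass = process_sentence(sentence, lexicon)
second_pass = process_sentence(sentence, lexicon)  # no new words this time
```

In this sketch the lexicon is a plain dictionary keyed by token, so reprocessing a sentence adds nothing, while each unseen sentence can only enlarge it. A real system would need morphological analysis rather than surface prefixes and suffixes.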

Keywords:	morphology analyzer, parser, part of speech, proper nouns, tokenizer
This article is indexed in SpringerLink and other databases.