Extracting an Arabic Lexicon from Arabic Newspaper Text
Authors: Saleem Abuleil (1) and Martha Evens (2)
Affiliation: (1) Chicago State University, 9501 S. King Drive, Chicago, IL 60628, USA; (2) Illinois Institute of Technology, 10 West 31 Street, Chicago, IL 60616, USA
Abstract: We describe how to build a large, comprehensive, integrated Arabic lexicon by automatic parsing of newspaper text. We have built a parser system to read Arabic newspaper articles, isolate the tokens from them, and find the part of speech and the features for each token. To achieve this goal, we designed a set of algorithms, generated several sets of rules, and developed a set of techniques and a set of components to carry out these techniques. As each sentence is processed, new words and features are added to the lexicon, so that it grows continuously as the system runs. To test the system, we used 100 articles (80,444 words) from the Al-Raya newspaper. The system consists of several modules: the tokenizer module to isolate the tokens, the type finder module to find the part of speech of each token, the proper noun phrase parser module to mark the proper nouns and to discover some information about them, and the feature finder module to find the features of the words.
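The module flow described in the abstract (tokenize, find the part of speech, find features, add new words to a growing lexicon) can be pictured with a minimal Python sketch. Everything in it is hypothetical and is not the authors' implementation: the names `Entry`, `tokenize`, `find_type`, `find_features`, and `process_sentence` are invented for illustration, the single tagging rule (a token starting with the definite article "ال" is treated as a noun) is a toy stand-in for the paper's rule sets, and the proper noun phrase parser is omitted.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One lexicon entry: a part of speech plus a feature dictionary."""
    pos: str
    features: dict = field(default_factory=dict)


def tokenize(sentence):
    # Toy tokenizer (assumption): keep runs of word characters, drop punctuation.
    return re.findall(r"\w+", sentence)


def find_type(token):
    # Placeholder rule (assumption, not the paper's rules): a token that
    # starts with the definite article "ال" is tagged as a noun.
    if token.startswith("ال") and len(token) > 2:
        return "noun"
    return "unknown"


def find_features(token, pos):
    # Placeholder feature finder: only records definiteness for tagged nouns.
    if pos == "noun":
        return {"definite": True}
    return {}


def process_sentence(sentence, lexicon):
    """Add unseen tokens to the lexicon; return how many entries were added."""
    added = 0
    for token in tokenize(sentence):
        if token not in lexicon:
            pos = find_type(token)
            lexicon[token] = Entry(pos, find_features(token, pos))
            added += 1
    return added
```

Calling `process_sentence` repeatedly with the same dictionary mirrors the abstract's point that the lexicon grows as the system runs: only tokens not already present produce new entries.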
Keywords: morphology analyzer; parser; part of speech; proper nouns; tokenizer
This article is indexed in SpringerLink and other databases.