Impact of morphological analysis and a large training corpus on the performances of Arabic diacritization
Authors: Amine Chennoufi, Azzeddine Mazroui
Affiliation: Department of Mathematics and Computer Science, Faculty of Sciences, University Mohamed First, Oujda, Morocco
Abstract: The absence of short vowels in Arabic texts causes difficulties for many automatic Arabic language processing systems. This paper presents and evaluates several hybrid systems for the automatic diacritization of Arabic texts. All of these approaches comprise three phases: a morphological analysis step followed by two statistical phases based on Hidden Markov Models, one at the word level and one at the character level. Two versions of the Alkhalil morpho-syntactic analyzer were used and tested; the output of this stage is the set of possible diacritizations of each word. A lexical database containing the most frequent words in Arabic was incorporated into some of the systems to make them faster. Training was performed on a large Arabic corpus, and the impact of the size of this training corpus on system performance was studied. The systems use smoothing techniques to handle word transitions missing from the training data and the Viterbi algorithm to select the optimal solution. The proposed system, which benefits from rich morphological analysis and a large diacritized corpus, achieves interesting experimental results in comparison with other known automatic diacritization systems.
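As an illustration of the word-level statistical phase described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a bigram model over diacritized words, add-k smoothing (the abstract does not say which smoothing technique is used), and hand-written candidate lists standing in for the output of a morphological analyzer such as Alkhalil. All function names and the transliterated toy data are hypothetical.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical sentence-start marker

def train_bigrams(diacritized_sentences):
    """Count word bigrams and their left contexts in a diacritized corpus."""
    context_counts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in diacritized_sentences:
        prev = START
        for word in sentence:
            context_counts[prev] += 1
            bigrams[(prev, word)] += 1
            vocab.add(word)
            prev = word
    return context_counts, bigrams, len(vocab)

def log_transition(prev, word, context_counts, bigrams, vocab_size, k=1.0):
    """Add-k smoothed log P(word | prev); unseen transitions stay finite."""
    numerator = bigrams[(prev, word)] + k
    denominator = context_counts[prev] + k * vocab_size
    return math.log(numerator / denominator)

def viterbi(candidates_per_word, context_counts, bigrams, vocab_size, k=1.0):
    """Pick the most probable sequence, one candidate per input word."""
    # best maps each candidate of the current word to
    # (log-probability of the best path ending there, that path).
    best = {START: (0.0, [])}
    for candidates in candidates_per_word:
        new_best = {}
        for cand in candidates:
            score, path = max(
                (
                    (prev_score + log_transition(prev, cand, context_counts,
                                                 bigrams, vocab_size, k),
                     prev_path)
                    for prev, (prev_score, prev_path) in best.items()
                ),
                key=lambda item: item[0],
            )
            new_best[cand] = (score, path + [cand])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]

# Toy training corpus (transliterated, invented for illustration only).
training = [
    ["kataba", "alwaladu", "aldarsa"],
    ["kataba", "alwaladu", "risalatan"],
    ["kutubun", "jadidatun"],
]
# Hypothetical analyzer output: possible diacritizations of each input word.
candidates = [
    ["kataba", "kutubun", "kutiba"],
    ["alwaladu", "alwalada"],
    ["aldarsa", "aldarsu"],
]

context_counts, bigrams, vocab_size = train_bigrams(training)
result = viterbi(candidates, context_counts, bigrams, vocab_size)
print(result)
```

The morphological step restricts the search to the analyzer's candidates, so Viterbi only has to compare a handful of states per word. The character-level phase mentioned in the abstract is not covered by this sketch.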
This article is indexed in SpringerLink and other databases.