Syllabification rules versus data-driven methods in a language with low syllabic complexity: The case of Italian

Authors:	Connie R. Adsett, Yannick Marchand, Vlado Kešelj

Affiliation:	(a) Institute for Biodiagnostics (Atlantic), National Research Council Canada, 1796 Summer Street, Suite 3900, Halifax, Nova Scotia, Canada B3H 3A7; (b) Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada B3H 1W5

Abstract:	Linguistic rules have been assumed to be the best technique for determining the syllabification of unknown words. This assumption has recently been challenged for English, where data-driven algorithms have been shown to outperform rule-based methods. It is possible, however, that data-driven methods are better only for languages with complex syllable structures. In this study, three rule-based automatic syllabification systems and two data-driven automatic syllabification systems (Syllabification by Analogy and the Look-Up Procedure) are compared on a language with lower syllabic complexity: Italian. On a lexicon of 44,720 words, the best data-driven algorithm (Syllabification by Analogy) achieved 97.70% word accuracy, while the best rule set correctly syllabified 89.77% of words. These results show that data-driven methods can also outperform rule-based methods for syllabifying Italian, a language of low syllabic complexity.

Keywords:	Syllabification; Italian language; Rule-based systems; Data-driven methods; Analogy
This article is indexed in ScienceDirect and other databases.
