Syllabification rules versus data-driven methods in a language with low syllabic complexity: The case of Italian |
| |
Authors: | Connie R. Adsett, Yannick Marchand, Vlado Kešelj |
| |
Affiliation: | (a) Institute for Biodiagnostics (Atlantic), National Research Council Canada, 1796 Summer Street, Suite 3900, Halifax, Nova Scotia, Canada B3H 3A7; (b) Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada B3H 1W5 |
| |
Abstract: | Linguistic rules have been assumed to be the best technique for determining the syllabification of unknown words. This has recently been challenged for the English language, where data-driven algorithms have been shown to outperform rule-based methods. It is possible, however, that data-driven methods are only better for languages with complex syllable structures. In this study, three rule-based automatic syllabification systems and two data-driven automatic syllabification systems (Syllabification by Analogy and the Look-Up Procedure) are compared on a language with lower syllabic complexity – Italian. When performance was compared using a lexicon containing 44,720 words, the best data-driven algorithm (Syllabification by Analogy) achieved 97.70% word accuracy, while the best rule set correctly syllabified 89.77% of the words. These results show that data-driven methods can also outperform rule-based methods on the syllabification of Italian, a language of low syllabic complexity. |
| |
Keywords: | Syllabification; Italian language; Rule-based systems; Data-driven methods; Analogy |
This article is indexed in ScienceDirect and other databases. |
|