Korean automatic spacing using pretrained transformer encoder and analysis
Authors: Taewook Hwang, Sangkeun Jung, Yoon-Hyung Roh
Affiliation: 1. Computer Science & Engineering, Chungnam National University, Daejeon, Republic of Korea; 2. Language Intelligence Research Section, Electronics and Telecommunications Research Institute, Daejeon, Republic of Korea
Abstract: Automatic spacing in Korean corrects the spacing units of a given input sentence. Demand for automatic spacing has been increasing because incorrect spacing occurs frequently in recent media such as the Internet and mobile networks. We therefore propose a transformer encoder that reads a sentence bidirectionally and can be pretrained on an out-of-task corpus. Notably, our model achieved the highest character accuracy (98.42%) among existing automatic spacing models for Korean. We experimentally validated the effectiveness of bidirectional encoding and pretraining for automatic spacing in Korean, and we conclude that pretraining is more important than fine-tuning and data size.
Keywords: attention; BERT; Korean automatic spacing; natural language processing; pretrained transformer encoder
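
The abstract frames Korean automatic spacing as character-level prediction over a sentence read by a pretrained bidirectional transformer encoder. Below is a minimal sketch of that formulation, not the authors' implementation: it assumes a HuggingFace-style BERT encoder with a token-classification head, uses "bert-base-multilingual-cased" as an illustrative stand-in for the pretrained model, and labels each character with whether a space should follow it. The head is untrained here, so meaningful spacing would only emerge after fine-tuning on spacing-labeled Korean text.

```python
# Sketch only (assumptions noted above): Korean automatic spacing as
# per-character binary tagging with a pretrained transformer encoder.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL_NAME = "bert-base-multilingual-cased"  # stand-in pretrained encoder, not the paper's model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Two labels per character: 0 = no space after this character, 1 = insert a space after it.
model = AutoModelForTokenClassification.from_pretrained(MODEL_NAME, num_labels=2)

def predict_spacing(sentence: str) -> str:
    """Re-insert spaces into a sentence whose spacing may be wrong or missing."""
    chars = list(sentence.replace(" ", ""))          # drop existing (possibly incorrect) spaces
    enc = tokenizer(chars, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits                 # shape: (1, seq_len, 2)
    labels = logits.argmax(-1)[0].tolist()
    word_ids = enc.word_ids(0)                       # map subword positions back to characters
    out, seen = [], set()
    for pos, wid in enumerate(word_ids):
        if wid is None or wid in seen:               # skip special tokens and repeated subwords
            continue
        seen.add(wid)
        out.append(chars[wid])
        if labels[pos] == 1:                         # boundary predicted after this character
            out.append(" ")
    return "".join(out).strip()

if __name__ == "__main__":
    # With an untrained head the output spacing is arbitrary; fine-tuning is required.
    print(predict_spacing("아버지가방에들어가신다"))
```

In this framing, fine-tuning only has to learn the binary boundary decision on top of the pretrained bidirectional representations, which is consistent with the abstract's conclusion that pretraining contributes more than fine-tuning or data size.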