Automatic Word Spacing Using Probabilistic Models Based on Character n-grams
Authors: Do-Gil Lee, Hae-Chang Rim, Dongsuk Yook
Affiliation: Korea Univ., Seoul
Abstract: On the Internet, information is largely in text form, which often includes errors such as spelling mistakes. These errors complicate natural language processing because most NLP applications aren't robust and assume that the input data is noise-free. Preprocessing is necessary to deal with these errors and to meet the growing need for automatic text processing. One kind of such preprocessing is automatic word spacing. This process decides the correct boundaries between words in a sentence containing spacing errors, which are a type of spelling error. Except for some Asian languages such as Chinese and Japanese, most languages have explicit word spacing. In these languages, word spacing is crucial for readability and for accurately communicating a text's meaning. Automatic word spacing plays an important role not only as a spell-checker module but also as a preprocessor for a morphological analyzer, which is a fundamental tool for NLP applications. Furthermore, automatic word spacing can serve as a postprocessor for optical-character-recognition systems and speech recognition systems.
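
The abstract does not describe the model itself, so the following is only a toy illustration of the general idea named in the title: estimating from character n-gram counts whether a word boundary belongs between two adjacent characters. It is a minimal sketch using character bigrams with add-one smoothing, not the authors' method. The function names (`train`, `space_probability`, `respace`), the smoothing scheme, and the 0.5 threshold are all assumptions made for illustration.

```python
from collections import defaultdict


def train(spaced_sentences):
    """Count, for each adjacent character pair, how often a space separates it.

    Hypothetical helper: maps (left, right) -> [no_space_count, space_count].
    """
    counts = defaultdict(lambda: [0, 0])
    for sentence in spaced_sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            # Character pairs inside a word have no space between them.
            for left, right in zip(word, word[1:]):
                counts[(left, right)][0] += 1
            # The last character of a word and the first of the next are
            # separated by a space.
            if i + 1 < len(words):
                counts[(word[-1], words[i + 1][0])][1] += 1
    return counts


def space_probability(counts, left, right):
    """Estimate P(space | left, right) with add-one smoothing (an assumption)."""
    no_space, space = counts.get((left, right), (0, 0))
    return (space + 1) / (no_space + space + 2)


def respace(counts, text, threshold=0.5):
    """Drop existing whitespace, then insert a space wherever the estimated
    boundary probability exceeds the threshold."""
    chars = "".join(text.split())
    if not chars:
        return ""
    out = [chars[0]]
    for left, right in zip(chars, chars[1:]):
        if space_probability(counts, left, right) > threshold:
            out.append(" ")
        out.append(right)
    return "".join(out)


if __name__ == "__main__":
    model = train(["the cat sat", "the cat ran"])
    print(respace(model, "thecatsat"))
```

Each boundary decision here is made independently from a single character pair. A practical system would likely use longer n-gram contexts and score the whole sentence jointly, but those details are beyond what the abstract states.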