Investigation and modeling of the structure of texting language
Authors: Monojit Choudhury, Rahul Saraf, Vijit Jain, Animesh Mukherjee, Sudeshna Sarkar, Anupam Basu
Affiliation: (1) Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, India; (2) Department of Computer Engineering, Malaviya National Institute of Technology, Jaipur, India; (3) D.E. Shaw India Software Private Ltd, Hyderabad, India
Abstract: Language usage in computer-mediated discourse, such as chats, emails and SMS texts, differs significantly from the standard form of the language and is referred to as texting language (TL). The presence of intentional misspellings significantly decreases the accuracy of existing spell-checking techniques on TL words. In this work, we formally investigate the nature and types of compression used in SMS texts, and develop a Hidden Markov Model-based word model for TL. The model parameters are estimated through standard machine learning techniques from a word-aligned parallel corpus of SMS and standard English text. The accuracy of the model in correcting TL words is 57.7%, which is almost a threefold improvement over the performance of Aspell. The use of a simple bigram language model results in a 35% reduction in the relative word-level error rate.
Keywords: Texting language, SMS, Hidden Markov Model, Text correction, Spell checking
This article is indexed in SpringerLink and other databases.