Integrating Deep Learning and Machine Translation for Understanding Unrefined Languages
Authors: HongGeun Ji, Soyoung Oh, Jina Kim, Seong Choi, Eunil Park
Affiliation: 1. Department of Applied Artificial Intelligence, Sungkyunkwan University, Seoul, 03063, Korea; 2. Raon Data, Seoul, 03073, Korea; 3. Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA
Abstract: In natural language processing (NLP), advances in neural machine translation have paved the way for cross-lingual research. Yet most NLP studies have evaluated their proposed language models on well-refined datasets. We investigate whether a machine translation approach is suitable for multilingual analysis of unrefined datasets, specifically chat messages on Twitch. To address this question, we collected a dataset of 7,066,854 chat messages from English streams and 3,365,569 chat messages from Korean streams. We employed several machine learning classifiers and neural networks with two types of embedding: word-sequence embedding and the final layer of a pre-trained language model. The results indicate that the accuracy difference between the original English data and its Korean translation was relatively high, ranging from 3% to 12%, whereas for the Korean data (original Korean versus its English translation) the difference ranged from 0% to 2%. The results therefore imply that translating from a low-resource language (e.g., Korean) into a high-resource language (e.g., English) yields higher performance than translating in the opposite direction. Several implications and limitations of the results are also discussed. For instance, we suggest that translating from resource-poor languages is a feasible way to apply the tools of resource-rich languages in further analysis.
Keywords: Twitch, multilingual, machine translation, machine learning
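
The accuracy differences reported in the abstract compare a classifier evaluated on original-language messages with one evaluated on their machine-translated counterparts. The following is a minimal sketch of that comparison, not the authors' code: the function names, the binary labels, and the toy predictions are all hypothetical, and the abstract does not specify the classification task.

```python
def accuracy(labels, predictions):
    """Fraction of predictions that match the gold labels."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def accuracy_gap(labels, preds_original, preds_translated):
    """Accuracy on original text minus accuracy on its machine translation."""
    return accuracy(labels, preds_original) - accuracy(labels, preds_translated)


# Hypothetical toy data (not from the paper): 10 messages with binary labels.
labels           = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
preds_original   = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]  # 9 of 10 correct
preds_translated = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]  # 7 of 10 correct

gap = accuracy_gap(labels, preds_original, preds_translated)
print(f"accuracy gap: {gap:.0%}")
```

A small gap under this measure would correspond to the 0% to 2% range reported for Korean and Korean-to-English data; a larger gap would correspond to the 3% to 12% range reported for English and English-to-Korean data.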
Source: Computers, Materials & Continua (English)