A word-building method based on neural network for text classification
Authors:Kai Shuang  Zhixuan Zhang  Jonathan Loo  Sen Su
Affiliation:1. State Key Laboratory of Networking & Switching Technology, Beijing University of Posts and Telecommunications (BUPT), Beijing, P.R. China; 2. Science and Technology on Communication Networks Laboratory, Beijing, P.R. China; 3. School of Computing and Engineering, University of West London, London, UK
Abstract:Text classification is a foundational task in many natural language processing applications. Traditional text classifiers take words as the basic units and begin with a pre-training step (such as word2vec) that directly generates word vectors. However, they do not consider the information contained in word structure, which has been shown to be helpful for text classification. In this paper, we propose a neural-network-based word-building method that decomposes a Chinese word into a sequence of radicals and learns structure information from these radical-level features, which is a key difference from existing models. A convolutional neural network is then applied to the radical sequence to extract the structure information of each word and generate a word vector, and a long short-term memory network is applied to the word vectors to generate the sentence vector used for prediction. The experimental results show that our model outperforms other existing models on a Chinese dataset. Our model is also applicable to English, where a word can be decomposed to the character level, which demonstrates the excellent generalisation ability of our model. The experimental results show that our model also outperforms the others on an English dataset.
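The pipeline described in the abstract (radical sequence, then a convolution that yields a word vector, then an LSTM that yields a sentence vector) can be illustrated with a minimal, untrained sketch. Everything specific here is an assumption made for illustration and not taken from the paper: the toy `RADICALS` table, the class names `WordBuilder` and `TinyLSTM`, the dimensions, the filter width, max-over-time pooling, and the random weights. The final prediction layer and training are omitted.

```python
import math
import random

# Hypothetical toy decomposition table (illustration only, not from the paper).
RADICALS = {"好": ["女", "子"], "明": ["日", "月"], "林": ["木", "木"]}


def decompose(word):
    """Split a word into radicals; unknown symbols (e.g. English letters) stay as characters."""
    units = []
    for ch in word:
        units.extend(RADICALS.get(ch, [ch]))
    return units


class WordBuilder:
    """Builds a word vector from a radical sequence with a 1-D convolution and max pooling."""

    def __init__(self, emb_dim=4, n_filters=5, width=2, seed=0):
        self.rng = random.Random(seed)
        self.emb_dim, self.width = emb_dim, width
        self.emb = {}  # radical -> embedding, created lazily with random values
        self.filters = [
            [[self.rng.uniform(-0.5, 0.5) for _ in range(emb_dim)] for _ in range(width)]
            for _ in range(n_filters)
        ]

    def embed(self, unit):
        if unit not in self.emb:
            self.emb[unit] = [self.rng.uniform(-0.5, 0.5) for _ in range(self.emb_dim)]
        return self.emb[unit]

    def word_vector(self, word):
        seq = [self.embed(u) for u in decompose(word)]
        while len(seq) < self.width:  # zero-pad so at least one window exists
            seq.append([0.0] * self.emb_dim)
        vec = []
        for f in self.filters:
            acts = [
                math.tanh(sum(f[j][k] * seq[i + j][k]
                              for j in range(self.width) for k in range(self.emb_dim)))
                for i in range(len(seq) - self.width + 1)
            ]
            vec.append(max(acts))  # max-over-time pooling: one value per filter
        return vec


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Standard LSTM cell run over a sequence; returns the last hidden state."""

    def __init__(self, in_dim, hid_dim, seed=1):
        rng = random.Random(seed)
        self.hid_dim = hid_dim
        # One weight matrix per gate (input, forget, output, candidate) over [x; h].
        self.W = {
            gate: [[rng.uniform(-0.5, 0.5) for _ in range(in_dim + hid_dim)]
                   for _ in range(hid_dim)]
            for gate in "ifog"
        }

    def run(self, inputs):
        h = [0.0] * self.hid_dim
        c = [0.0] * self.hid_dim
        for x in inputs:
            z = x + h  # concatenate the input vector and the previous hidden state
            pre = {gate: [sum(w * v for w, v in zip(row, z)) for row in self.W[gate]]
                   for gate in "ifog"}
            i_g = [sigmoid(v) for v in pre["i"]]
            f_g = [sigmoid(v) for v in pre["f"]]
            o_g = [sigmoid(v) for v in pre["o"]]
            cand = [math.tanh(v) for v in pre["g"]]
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f_g, c, i_g, cand)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o_g, c)]
        return h


builder = WordBuilder()
lstm = TinyLSTM(in_dim=5, hid_dim=3)
sentence = ["明", "林", "好"]  # a toy "sentence" of three single-character words
sent_vec = lstm.run([builder.word_vector(w) for w in sentence])
```

Because `decompose` falls back to individual characters, the same code path handles an English word such as `"cat"`, which mirrors the character-level decomposition the abstract mentions for English.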
Keywords:Convolutional neural network  long short-term memory  structure information  text classification  word-building method