
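A deep learning text classification model using Pinyin features
Cite this article: Zhao Boxuan, Fang Ning, Zhao Qunfei, Zhang Pengzhu. A deep learning text classification model using Pinyin features [J]. 高技术通讯 (High Technology Letters), 2017, 27(7).
Authors: Zhao Boxuan  Fang Ning  Zhao Qunfei  Zhang Pengzhu
Affiliations: 1. Department of Automation, Shanghai Jiao Tong University, Key Laboratory of System Control and Information Processing, Ministry of Education, Shanghai 200240; 2. School of Humanities, Shanghai Jiao Tong University, Shanghai 200030; 3. Antai College of Economics and Management, Shanghai Jiao Tong University, Shanghai 200030
Abstract: For text commands produced by speech recognition in human-robot voice interaction, a deep learning text classification model is proposed that uses the initials and finals (声母 and 韵母) of Chinese Pinyin as features. First, taking voice navigation control of a driverless vehicle as the human-robot interaction scenario, the structure of the text commands is analyzed and two corpora are built, one of single-intention commands and one of complex-intention commands. Second, building on the use of characters as text classification features and taking into account the differences between Chinese Pinyin and English words, a feature representation for Chinese text classification based on Pinyin initial and final characters is proposed. Third, gated recurrent units (GRU) replace conventional recurrent neural network units to address their difficulty in capturing features over long time spans; to extract higher-order features, shorten the feature sequence, and speed up convergence, a deep learning text classification model combining a convolutional neural network with a GRU recurrent neural network is built. Finally, to assess the model on long-sequence and short-sequence tasks, 10-fold cross-validation is run on each of the two corpora and the model is compared with other classification methods; the results show that the model significantly improves classification accuracy.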

Keywords: text classification  intention understanding  initial and final features  gated recurrent unit (GRU)
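As a rough illustration of the feature representation described in the abstract, the sketch below splits toneless Pinyin syllables into initial and final tokens. It is not taken from the paper: the function names `split_syllable` and `to_features` are hypothetical, the input is assumed to be already transliterated into toneless Pinyin, and treating `y` and `w` as initials is a simplifying assumption that the authors may not share.

```python
# Pinyin initials. Two-letter initials come first so that "zh", "ch", "sh"
# are matched before "z", "c", "s". Including "y" and "w" is an assumption.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_syllable(syllable):
    """Split one toneless Pinyin syllable into (initial, final)."""
    for initial in INITIALS:
        # Require a non-empty remainder so the final is never empty.
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return initial, syllable[len(initial):]
    # Zero-initial syllable such as "an" or "er": the whole syllable is the final.
    return "", syllable


def to_features(syllables):
    """Flatten a list of syllables into a sequence of initial/final tokens."""
    tokens = []
    for syllable in syllables:
        initial, final = split_syllable(syllable)
        if initial:
            tokens.append(initial)
        tokens.append(final)
    return tokens


if __name__ == "__main__":
    # "qian fang zuo zhuan" (turn left ahead), a made-up navigation command.
    print(to_features(["qian", "fang", "zuo", "zhuan"]))
```

Each token could then be mapped to an integer index or one-hot vector, much as individual letters are in character-level models for English text.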

A deep learning model for text classification using phonetic features
Zhao Boxuan, Fang Ning, Zhao Qunfei, Zhang Pengzhu. A deep learning model for text classification using phonetic features[J]. High Technology Letters, 2017, 27(7).
Authors: Zhao Boxuan  Fang Ning  Zhao Qunfei  Zhang Pengzhu
Abstract: A deep learning model that uses the consonants and vowels (initials and finals) of Chinese Pinyin as features is proposed for classifying intention texts produced by speech recognition in human-robot voice interaction. First, taking unmanned-vehicle voice navigation as the application scenario for human-robot interaction, the structure of the intention texts is analyzed, and a single-intention corpus and a complex-intention corpus are built. Second, building on character-level features for text classification and considering the differences between Chinese Pinyin and English words, a feature representation for Chinese text classification based on Pinyin consonants and vowels is proposed. Third, traditional recurrent neural network (RNN) units are replaced by gated recurrent units (GRU) to address the difficulty of capturing long-term dependencies. To extract high-level features, shorten the feature sequences, and speed up model convergence, a deep learning model combining a convolutional neural network (CNN) with the GRU-RNN is established. Finally, to evaluate the model on short-sequence and long-sequence tasks, 10-fold cross-validation is performed on each of the two corpora, and the model is compared with other classification methods. The results show that the proposed model significantly improves classification accuracy for the intention texts.
Keywords: text classification  intention understanding  features of consonant and vowel  gated recurrent units (GRU)
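To make the gated recurrent unit named in the abstract more concrete, here is a minimal pure-Python sketch of one GRU update step. It is not the authors' implementation: the parameter names (`Wz`, `Uz`, `bz`, and so on) and the helper `zero_params` are hypothetical, and the sketch follows one common convention in which the update gate `z` weights the candidate state (some formulations swap the roles of `z` and `1 - z`).

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gru_step(x, h, p):
    """One GRU update: input x, previous state h, parameter dict p."""
    # Update gate z and reset gate r, each element in (0, 1).
    z = [sigmoid(a + b + c) for a, b, c in
         zip(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]
    r = [sigmoid(a + b + c) for a, b, c in
         zip(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]
    # Candidate state, computed from the reset-gated previous state.
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b + c) for a, b, c in
            zip(matvec(p["Wh"], x), matvec(p["Uh"], rh), p["bh"])]
    # Interpolate between the previous state and the candidate.
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def run_gru(sequence, h0, p):
    """Feed a sequence of input vectors through the cell; return the last state."""
    h = h0
    for x in sequence:
        h = gru_step(x, h, p)
    return h


def zero_params(input_size, hidden_size):
    """All-zero parameters, useful only for checking the arithmetic."""
    p = {}
    for gate in "zrh":
        p["W" + gate] = [[0.0] * input_size for _ in range(hidden_size)]
        p["U" + gate] = [[0.0] * hidden_size for _ in range(hidden_size)]
        p["b" + gate] = [0.0] * hidden_size
    return p
```

With all-zero parameters both gates equal 0.5 and the candidate state is 0, so each step simply halves the previous state. In a trained model the gates instead learn how much of the earlier state to keep, which is what lets the unit carry information across longer sequences than a plain recurrent unit.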
This article is indexed in CNKI, Wanfang Data (万方数据), and other databases.