Lightweight Chinese speech recognition with Transformer
Cite this article: Shen Yiwen, Sun Jun. Lightweight Chinese speech recognition with Transformer[J]. Application Research of Computers, 2023, 40(2).
Authors: Shen Yiwen, Sun Jun
Affiliation: College of Artificial Intelligence and Computer Science, Jiangnan University
Funding: National Natural Science Foundation of China (61672263); Joint Fund of the National Natural Science Foundation of China (U1836218)
Abstract: In recent years, deep neural network models have become a hot research topic in speech recognition. However, building deep neural networks depends on huge numbers of parameters and heavy computational overhead, and the resulting oversized models are difficult to deploy on edge devices. To address these problems, this paper proposes a lightweight speech recognition model based on the Transformer. First, the model uses depthwise separable convolution to extract audio feature information. Second, it constructs two half-step residual-weight feed-forward layers, i.e. the Macaron-Net structure, and introduces low-rank matrix factorization to achieve model compression. Finally, it uses a sparse attention mechanism to increase both training and decoding speed. The model was evaluated on the Aishell-1 and aidatatang_200zh datasets. The experimental results show that, compared with Open-Transformer, the proposed model achieves a relative reduction of 19.8% in character error rate and of 32.1% in real-time factor.
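As an illustration of the compression techniques named above, the following is a minimal PyTorch sketch of a depthwise separable convolution frontend and a Macaron-style half-step feed-forward block with low-rank factorized weights. The paper does not specify these details here, so all class names, dimensions, the rank, and the kernel size are hypothetical placeholders, not the authors' actual configuration.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable 1-D convolution: a per-channel (depthwise)
    convolution followed by a 1x1 pointwise convolution, which uses far
    fewer parameters than a full convolution over all channel pairs."""

    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.depthwise = nn.Conv1d(channels, channels, kernel_size,
                                   padding=kernel_size // 2, groups=channels)
        self.pointwise = nn.Conv1d(channels, channels, 1)

    def forward(self, x):
        # x: (batch, channels, time)
        return self.pointwise(self.depthwise(x))

class LowRankHalfStepFFN(nn.Module):
    """Macaron-style half-step feed-forward block with low-rank weights.

    Each d_model x d_ff projection is replaced by two factors
    (d_model x rank) and (rank x d_ff), cutting its parameter count from
    d_model * d_ff to rank * (d_model + d_ff). The 0.5 residual weight is
    the "half-step" of the Macaron-Net structure."""

    def __init__(self, d_model=256, d_ff=1024, rank=64, dropout=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.w1_u = nn.Linear(d_model, rank, bias=False)  # factor of W1
        self.w1_v = nn.Linear(rank, d_ff)
        self.w2_u = nn.Linear(d_ff, rank, bias=False)     # factor of W2
        self.w2_v = nn.Linear(rank, d_model)
        self.act = nn.ReLU()
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # x: (batch, time, d_model)
        y = self.norm(x)
        y = self.dropout(self.act(self.w1_v(self.w1_u(y))))
        y = self.dropout(self.w2_v(self.w2_u(y)))
        return x + 0.5 * y  # half-step residual connection
```

With these illustrative sizes, one factored projection holds 64 × (256 + 1024) ≈ 82K parameters instead of 256 × 1024 ≈ 262K for the full matrix, roughly a 3× reduction per projection.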

Keywords: speech recognition; Transformer; low-rank matrix factorization; lightweight convolution; model compression; sparse attention
Received: 2022-06-23
Revised: 2022-08-30

Lightweight Chinese speech recognition with Transformer
Shen Yiwen, Sun Jun. Lightweight Chinese speech recognition with Transformer[J]. Application Research of Computers, 2023, 40(2).
Authors: Shen Yiwen, Sun Jun
Affiliation: College of Artificial Intelligence and Computer Science, Jiangnan University
Abstract: In recent years, deep neural network models have become a hot research topic in speech recognition. However, deep neural networks rely on huge numbers of parameters and heavy computational overhead, and the excessively large model size also makes deployment on edge devices difficult. To address these problems, this paper proposes a lightweight speech recognition model based on the Transformer. First, the method uses depthwise separable convolution to obtain audio feature information. Second, it constructs two half-step feed-forward layers, the Macaron-Net structure, and introduces low-rank matrix factorization to achieve model compression. Finally, it uses a sparse attention mechanism to improve the training and decoding speed of the model. The model was tested on the Aishell-1 and aidatatang_200zh datasets. The experimental results show that, compared with Open-Transformer, LM-Transformer achieves a relative reduction of 19.8% in word error rate and 32.1% in real-time factor.
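As a rough illustration of the sparse attention step, the sketch below keeps only the top-k scores per query and masks the rest before the softmax. This is one common form of sparse attention; the paper's exact sparsification scheme may differ, and the topk value is a hypothetical setting.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, topk=16):
    """Scaled dot-product attention keeping only the top-k keys per query.

    q, k, v: tensors of shape (batch, heads, time, d_k). Scores below each
    query's k-th largest value are masked to -inf, so the softmax assigns
    them exactly zero weight and most of the attention matrix is sparse,
    reducing the work done in the weighted sum over values.
    """
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5            # (B, H, T_q, T_k)
    k_eff = min(topk, scores.size(-1))
    threshold = scores.topk(k_eff, dim=-1).values[..., -1:]  # k-th largest per query
    scores = scores.masked_fill(scores < threshold, float('-inf'))
    attn = F.softmax(scores, dim=-1)
    return attn @ v

# Usage with toy shapes (batch=2, heads=4, time=100, d_k=64)
q = k = v = torch.randn(2, 4, 100, 64)
out = topk_sparse_attention(q, k, v, topk=16)
print(out.shape)  # torch.Size([2, 4, 100, 64])
```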
Keywords: speech recognition; Transformer; low-rank matrix factorization; lightweight convolution; model compression; sparse attention