Compression Optimization Strategy for End-to-End ASR Model Based on Conformer

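Cite this article:	Sang Jiangkun (桑江坤), Nurmemet Yolwas (努尔麦麦提·尤鲁瓦斯). Compression Optimization Strategy for End-to-End ASR Model Based on Conformer[J]. Journal of Signal Processing (信号处理), 2022, 38(12): 2639-2649. DOI: 10.16798/j.issn.1003-0530.2022.12.018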

Authors:	Sang Jiangkun (桑江坤), Nurmemet Yolwas (努尔麦麦提·尤鲁瓦斯)

Affiliation:	1. School of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046, China

Funding:	National Natural Science Foundation of China (62066043)

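Abstract:	With the rise of deep learning, end-to-end speech recognition models have attracted growing attention. The recently proposed Conformer architecture has further improved their performance and is now widely used in speech recognition. Because of their large memory and computation requirements, however, these end-to-end models are hard to deploy and run on resource-constrained devices. To shrink model size and computation as far as possible while keeping the accuracy loss small, this paper applies three compression and optimization strategies: model quantization, structured pruning based on weight channels, and singular value decomposition. It also improves on the model quantization step. The paper examines how different degrees of compression affect the loss in model accuracy. The combined strategies were tested on different devices: with the word error rate deviating from the baseline by less than 3%, inference speed improved by roughly 3 to 4 times.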
Keywords:	deep learning; end-to-end speech recognition; Conformer; quantization; pruning; decomposition
Received:	2022-06-01
Compression Optimization Strategy for End-to-End ASR Model Based on Conformer

Affiliation:	1. School of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046, China; 2. Multi-language Information Technology Laboratory of Xinjiang, Urumqi, Xinjiang 830046, China

Abstract:	With the rise of deep learning, end-to-end speech recognition models have received increasing attention. Recently, the Conformer architecture has further improved the performance of end-to-end speech recognition models and has been widely adopted in the field. However, these models are difficult to deploy and run on resource-constrained devices because of their large memory and computation requirements. To reduce model size and computation as much as possible while keeping the loss of accuracy small, three compression and optimization strategies are adopted: model quantization, structured pruning based on weight channels, and singular value decomposition. The model quantization method is also improved. The influence of different degrees of compression on the loss of model accuracy is explored. The combined strategies were tested on different devices. With the word error rate deviating from the baseline by less than 3%, model inference is approximately 3 to 4 times faster.
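The abstract names the three strategies but gives no details, so the following is only a generic sketch of what each one does, not the authors' method. Everything in it is an assumption made for illustration: symmetric per-tensor int8 quantization, L1-norm ranking of output channels for pruning, the parameter count of a rank-r factorization as a stand-in for truncated singular value decomposition, and all function names.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization (assumed scheme, for illustration)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range used here.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float weights."""
    return [v * scale for v in q]


def prune_channels(weight: List[List[float]], keep_ratio: float) -> List[int]:
    """Return indices of output channels (rows) to keep, ranked by L1 norm.

    The L1 criterion is an assumption; the abstract does not state one.
    """
    n_keep = max(1, int(len(weight) * keep_ratio))
    norms = [sum(abs(x) for x in row) for row in weight]
    ranked = sorted(range(len(weight)), key=lambda i: norms[i], reverse=True)
    return sorted(ranked[:n_keep])


def low_rank_params(m: int, n: int, rank: int) -> Tuple[int, int]:
    """Parameter counts before and after replacing an m x n weight matrix
    with two factors of shapes m x rank and rank x n."""
    return m * n, rank * (m + n)
```

For instance, factoring a 256 x 256 matrix at rank 32 reduces it from 65,536 to 16,384 parameters. The factorization only saves parameters when the rank is below m·n / (m + n), which is why the degree of compression has to be traded off against accuracy.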
