
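Research on Lip Reading Based on Visual Characteristics of Chinese Pronunciation
Cite this article: HE Shan, YUAN Jiabin, LU Yaoyao. Research on Lip Reading Based on Visual Characteristics of Chinese Pronunciation[J]. Computer Engineering and Applications, 2022, 58(4): 157-162.
Authors: HE Shan, YUAN Jiabin, LU Yaoyao
Affiliations: 1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106; 2. Information Department, Nanjing University of Aeronautics and Astronautics, Nanjing 211106
Funding: Nanjing Industry-University-Research Cooperation Post-Subsidy Project Plan (201722025).
Abstract: With the development of deep learning, lip-reading technology has made considerable progress for English, but for Chinese a gap remains in both dataset richness and recognition accuracy. By analyzing the visual characteristics of Chinese pronunciation, "visual pinyin" is proposed, with the aim of avoiding the ambiguity of Chinese in its visual expression. To verify the effectiveness of visual pinyin, a Chinese sentence-level lip-reading model, CHSLR-VP, is built. The model has an end-to-end structure, in which visual pinyin serves as the medium...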

Keywords: lip reading; visual pinyin; deep learning; convolutional neural network (CNN); sequence-to-sequence model; attention mechanism

Research on Lip Reading Based on Visual Characteristics of Chinese Pronunciation
HE Shan, YUAN Jiabin, LU Yaoyao. Research on Lip Reading Based on Visual Characteristics of Chinese Pronunciation[J]. Computer Engineering and Applications, 2022, 58(4): 157-162.
Authors: HE Shan, YUAN Jiabin, LU Yaoyao
Affiliation: 1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China; 2. Information Department, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
Abstract: With the development of deep learning, lip reading has made great progress for English. For Chinese, however, there is still a large gap in both dataset richness and recognition accuracy. Based on the visual characteristics of Chinese pronunciation, this paper proposes “visual pinyin” to avoid the ambiguity of Chinese in its visual expression. To verify the effectiveness of visual pinyin, a Chinese sentence-level lip-reading model, CHSLR-VP, is established. The model has an end-to-end structure in which visual pinyin serves as the medium for converting video frames into Chinese characters. In experiments, CHSLR-VP outperforms prior methods, which shows that visual pinyin can significantly improve the accuracy of Chinese lip reading. The model can serve as a benchmark for future related work.
Keywords: lip reading; visual pinyin; deep learning; convolutional neural networks (CNN); sequence-to-sequence model; attention mechanism
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).