Research Advances and Perspectives on the Cocktail Party Problem and Related Auditory Models
Citation: Huang Yating, Shi Jing, Xu Jiaming, Xu Bo. Research Advances and Perspectives on the Cocktail Party Problem and Related Auditory Models[J]. Acta Automatica Sinica, 2019, 45(2): 234-251.
Authors: Huang Yating, Shi Jing, Xu Jiaming, Xu Bo
Affiliations: 1. Institute of Automation, Chinese Academy of Sciences, Beijing 100190; 2. University of Chinese Academy of Sciences, Beijing 100049; 3. Center for Excellence in Brain Science and Intelligence Technology, CAS, Shanghai 200031
Funding: Strategic Priority Research Program of the Chinese Academy of Sciences (XDBS01070000); National Natural Science Foundation of China (61602479); Beijing Municipal Science and Technology Major Project (Z181100001518006)
Abstract: With the rapid development of electronic devices and artificial intelligence, speech-based human-machine interaction has become increasingly important in recent years. However, because of interfering sound sources, speech interaction in complex open environments such as cocktail parties remains far from satisfactory, and building a computational auditory system that is both adaptive and robust is still very challenging. In-depth study of the cocktail party problem therefore has significant research and practical value for a range of tasks in intelligent speech processing, including speaker recognition, speech recognition, and keyword spotting. This paper reviews the current state and prospects of auditory models related to the cocktail party problem. We first briefly introduce relevant research on auditory mechanisms and summarize computational models for multi-speaker speech separation. We then discuss auditory attention modeling methods inspired by auditory cognition, and argue that auditory models incorporating voiceprint memory and attentional selection adapt better to complex auditory environments. Next, we briefly review recent multi-speaker speech recognition models. Finally, we discuss the difficulties and challenges that current computational models face on the cocktail party problem and outline directions for future research.

Keywords: cocktail party problem; auditory model; speech separation; auditory attention; speech recognition
Received: 2018-10-18