Similar Literature
18 similar documents found.
1.
杨海滨  张军 《计算机应用研究》2010,27(11):4025-4031
Speech separation is an important and fundamental task for machine hearing, and single-channel speech separation is its most difficult problem. This paper surveys model-based single-channel speech separation methods, analyzing the three classes of problems: speaker-dependent, speaker-selected, and speaker-independent separation. It points out the shortcomings of current methods and the key factors affecting algorithm performance, and closes with an outlook on future research directions for model-based single-channel speech separation.

2.
关勇  李鹏  刘文举  徐波 《自动化学报》2009,35(4):410-416
Traditional noise-robust algorithms cannot solve the robustness problem of automatic speech recognition (ASR) systems when the background itself is speech. This paper proposes a hybrid speech separation system based on computational auditory scene analysis (CASA) and speaker-model information. Within the CASA framework, the system uses speaker-model information and factorial-max vector quantization (MAXVQ) to estimate real-valued masks, separating the target speaker's speech from a two-talker mixture and thereby providing a robust front end for ASR. Evaluation on the Speech Separation Challenge (SSC) dataset shows that the proposed system improves speech recognition accuracy by 15.68% over the baseline. The experimental results also verify the effectiveness of the proposed multi-speaker identification and real-valued mask estimation.
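The core operation of such a system, weighting each time-frequency cell of the mixture by a real-valued mask, can be illustrated with a minimal sketch. This is an oracle ratio mask computed from the known sources, not the paper's MAXVQ estimator, and the function names are illustrative:

```python
import numpy as np

def ratio_mask(target_mag, interf_mag, eps=1e-12):
    """Oracle real-valued (ratio) mask: the target's share of the
    energy in each time-frequency cell, a value in [0, 1]."""
    t2, i2 = target_mag ** 2, interf_mag ** 2
    return t2 / (t2 + i2 + eps)

def apply_mask(mixture_spec, mask):
    """Weight each time-frequency cell of the mixture by the mask."""
    return mixture_spec * mask

# Toy magnitude spectrograms (frequency bins x frames).
rng = np.random.default_rng(0)
target = rng.random((4, 3))
interf = rng.random((4, 3))
mask = ratio_mask(target, interf)          # all entries lie in [0, 1]
separated = apply_mask(target + interf, mask)
```

In the actual system the mask is estimated from speaker models; computing it from the clean sources, as above, is only possible in oracle experiments.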

3.
Speech Feature Extraction Based on an Auditory Model
This paper analyzes the mechanism by which the cochlea decomposes sound into frequency components, the energy-transduction mechanism of hair cells and the auditory nerve, and the lateral-inhibition mechanism of the central nervous system. Mathematical models are built at each of these three levels, and speech feature parameters for recognition are extracted from them. A comparison between the resulting auditory spectrum and the LPC cepstrum shows that the auditory spectrum is well suited to speech recognition and has good noise robustness; the comparative experiments likewise demonstrate the strong performance of the auditory-model features.

4.
The human auditory system can pick out speech of interest even in strong noise. Under the principles of computational auditory scene analysis (CASA), the central difficulty is finding suitable cues for separating the target speech signal from noise. For single-channel voiced-speech separation, this paper proposes a separation algorithm that uses pitch as the grouping cue. Simulation experiments under six noise conditions, including white noise and cocktail-party noise, show that compared with conventional spectral subtraction the proposed algorithm raises the output SNR by 7.47 dB on average, effectively suppresses the interfering noise, and improves separation quality.
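For context, the conventional spectral-subtraction baseline that the algorithm is compared against can be sketched as follows. This is a minimal magnitude-domain version; the parameter values and the assumption that the first few frames are speech-free are illustrative, not taken from the paper:

```python
import numpy as np

def spectral_subtraction(mag_frames, n_noise_frames=5, alpha=1.0, floor=0.01):
    """Minimal magnitude spectral subtraction.

    mag_frames: (num_frames, num_bins) magnitude spectra of the noisy signal.
    The noise spectrum is estimated as the average of the first
    n_noise_frames (assumed to contain no speech) and subtracted from
    every frame; a spectral floor prevents negative magnitudes.
    """
    noise_est = mag_frames[:n_noise_frames].mean(axis=0)
    cleaned = mag_frames - alpha * noise_est
    return np.maximum(cleaned, floor * mag_frames)
```

Pitch-based CASA separation differs in kind from this baseline: rather than subtracting a global noise estimate, it groups time-frequency regions that share the target's pitch track.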

5.
Research on Robust Speech Recognition
After briefly describing the background from which robust speech recognition arose, this paper reviews the main techniques, current research status, and future directions in the field at home and abroad. It first outlines the interference sources that degrade speech quality and undermine the robustness of recognition systems. It then surveys noise-robust feature extraction, acoustic preprocessing, microphone arrays, and auditory processing modeled on the human ear, together with their state of development. Finally, future directions for robust speech recognition are discussed.

6.
This paper presents an auditory perception model based on psychoacoustic theory and experiments, which simulates human perception of loudness. The model can be implemented on a digital signal processor (DSP) or a computer, and its output parameters have been used for speech recognition. Experiments show that representing speech signals with the model's parameters maintains a high recognition rate in noisy environments.

7.
Speech carries not only the semantic information a speaker wishes to convey but also emotional information. Speech emotion recognition is key to affective human-computer interaction: effective recognition of emotion in speech improves intelligibility, helps smart devices understand user intent as fully as possible, and makes machines more human-centered, so that they serve people better. Using a literature-survey approach, this paper reviews the state of the art and recent progress in speech emotion corpora, emotional feature extraction, emotion-model construction, and applications of speech emotion recognition, and also looks ahead to future trends. The aim is to analyze speech emotion recognition as comprehensively as possible and to provide a valuable academic reference for researchers.

8.
Many computational auditory scene analysis systems cannot adequately separate mixtures of multiple talkers. This paper proposes a monaural mixed-speech separation system based on multi-pitch tracking. Drawing on the latest results in multi-pitch tracking research, the system incorporates the pitch tracks of both the target and the interfering speech into the separation stage, markedly improving separation under a variety of interference conditions, including multi-talker mixtures, and offering a new approach to the multi-talker separation problem.

9.
Speech is one mode of human-computer interaction, and speech recognition is an important part of artificial intelligence. In recent years the application of neural networks to speech recognition has developed rapidly, and they have become the mainstream acoustic modeling technique in the field. However, differences between the target speaker's speech at test time and the training data cause model mismatch. Speaker adaptation (SA) methods address this speaker-induced mismatch and have become an active research direction. Compared with adaptation in traditional recognition models, adaptation in neural-network systems must cope with very large parameter counts and relatively little adaptation data, which makes it a difficult research problem. This paper first reviews the development of speaker adaptation and the problems encountered by neural-network-based approaches; it then divides the methods into feature-domain and model-domain speaker adaptation, explaining their principles and refinements; finally, it points out the remaining problems and future directions of speaker adaptation in speech recognition.

10.
Intelligent speech technology comprises speech recognition, natural language processing, and speech synthesis; among these, speech recognition is the key technology for human-computer interaction, and recognition systems typically build an acoustic model and a language model. The rise of neural networks has sharply increased the number of acoustic models, and combining neural-network acoustic models with traditional recognition models has greatly advanced speech recognition. As the front end of human-computer interaction, speech recognition spans many research directions. This paper summarizes the state of acoustic-model research for three tasks: text recognition, speaker recognition, and emotion recognition, tracing the evolution of the technology in as much detail as possible to provide a useful reference for future work. It also compares the current mainstream methods, introduces the advantages of end-to-end speech recognition models, analyzes development trends, and finally identifies the challenges facing speech recognition today.

11.
The cocktail party problem, i.e., tracing and recognizing the speech of a specific speaker when multiple speakers talk simultaneously, is one of the critical problems yet to be solved to enable the wide application of automatic speech recognition (ASR) systems. In this overview paper, we review the techniques proposed in the last two decades in attacking this problem. We focus our discussions on the speech separation problem given its central role in the cocktail party environment, and describe the conventional single-channel techniques such as computational auditory scene analysis (CASA), non-negative matrix factorization (NMF) and generative models, the conventional multi-channel techniques such as beamforming and multi-channel blind source separation, and the newly developed deep learning-based techniques, such as deep clustering (DPCL), the deep attractor network (DANet), and permutation invariant training (PIT). We also present techniques developed to improve ASR accuracy and speaker identification in the cocktail party environment. We argue that effectively exploiting information in the microphone array, the acoustic training set, and the language itself using a more powerful model, together with better optimization objectives and techniques, will be the approach to solving the cocktail party problem.
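Of the deep-learning techniques listed, permutation invariant training (PIT) is the most compact to state: the separation loss is computed under the best assignment between network outputs and reference speakers, so the network need not commit to a fixed output order. A minimal sketch follows; the function name and the choice of MSE as the per-pair loss are illustrative:

```python
import numpy as np
from itertools import permutations

def pit_mse(estimates, references):
    """Utterance-level permutation invariant training (PIT) loss:
    the MSE under whichever speaker-to-output assignment fits best.

    estimates, references: (num_speakers, num_samples) arrays.
    Cost is exhaustive over permutations, which is fine for the
    two- or three-speaker mixtures typical of this literature.
    """
    n = len(references)
    best = np.inf
    for perm in permutations(range(n)):
        mse = np.mean((estimates[list(perm)] - references) ** 2)
        best = min(best, mse)
    return best
```

Because the minimum is taken over assignments, swapping the order of the estimated sources leaves the loss unchanged, which is exactly the label-permutation problem PIT is designed to remove.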

12.
We present a new approach to the cocktail party problem that uses a cortronic artificial neural network architecture (Hecht-Nielsen, 1998) as the front end of a speech processing system. Our approach is novel in three important respects. First, our method assumes and exploits detailed knowledge of the signals we wish to attend to in the cocktail party environment. Second, our goal is to provide preprocessing in advance of a pattern recognition system rather than to separate one or more of the mixed sources explicitly. Third, the neural network model we employ is more biologically feasible than are most other approaches to the cocktail party problem. Although the focus here is on the cocktail party problem, the method presented in this study can be applied to other areas of information processing.

13.
The cocktail party problem
Haykin S  Chen Z 《Neural computation》2005,17(9):1875-1902
This review presents an overview of a challenging problem in auditory perception, the cocktail party phenomenon, the delineation of which goes back to a classic paper by Cherry in 1953. In this review, we address the following issues: (1) human auditory scene analysis, which is a general process carried out by the auditory system of a human listener; (2) insight into auditory perception, which is derived from Marr's vision theory; (3) computational auditory scene analysis, which focuses on specific approaches aimed at solving the machine cocktail party problem; (4) active audition, the proposal for which is motivated by analogy with active vision, and (5) discussion of brain theory and independent component analysis, on the one hand, and correlative neural firing, on the other.

14.
Tracing the development of speech recognition technology, this paper briefly reviews its history and current applications and describes both traditional techniques and recent research progress. Traditional speech recognition uses statistical methods: spectral features are trained and matched within a GMM-HMM architecture. Current models are mainly based on deep learning, where both CNNs and RNNs can extract features effectively for acoustic modeling. Further research has adopted end-to-end techniques, which avoid error propagation across multiple component models. The main end-to-end approaches are CTC and attention; the latest models focus on attention and attempt to fuse it with CTC for better results. Finally, drawing on the authors' own understanding, the paper summarizes the current problems and future directions of speech recognition.

15.
In this paper, a set of features derived by filtering and spectral peak extraction in the autocorrelation domain is proposed. We focus on the effect of additive noise on speech recognition. Assuming that the channel characteristics and additive noises are stationary, these new features improve the robustness of speech recognition in noisy conditions. In this approach, the autocorrelation sequence of a speech signal frame is first computed. Filtering of the autocorrelation is carried out in the second step, and the short-time power spectrum of the speech is then obtained through the fast Fourier transform. The power spectrum peaks are calculated by differentiating the power spectrum with respect to frequency. The magnitudes of these peaks are projected onto the mel scale and passed through the filter bank. Finally, a set of cepstral coefficients is derived from the outputs of the filter bank. The effectiveness of the new features for speech recognition in noisy conditions is shown through a number of recognition experiments: a task of multi-speaker isolated-word recognition and another of multi-speaker continuous speech recognition with various artificially added noises such as factory, babble, car, and F16, as well as a set of experiments on the Aurora 2 task. Experimental results show significant improvements under noisy conditions in comparison with traditional feature extraction methods. We also report results obtained by applying cepstral mean normalization to these methods to obtain features robust against both additive noise and channel distortion.
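The pipeline described in this abstract (autocorrelation, filtering, power spectrum, peak extraction, mel filter bank, cepstra) can be sketched roughly as follows. The specific filtering step, peak-picking rule, and filter-bank design here are simplified assumptions, not the authors' exact settings:

```python
import numpy as np

def mel_filterbank(n_mels, n_bins, sr):
    """Triangular filters equally spaced on the mel scale (simplified)."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = inv(np.linspace(mel(0.0), mel(sr / 2.0), n_mels + 2))
    bins = np.floor(edges / (sr / 2.0) * (n_bins - 1)).astype(int)
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l:
            fb[i, l:c + 1] = (np.arange(l, c + 1) - l) / (c - l)
        if r > c:
            fb[i, c:r + 1] = (r - np.arange(c, r + 1)) / (r - c)
    return fb

def autocorr_peak_cepstra(frame, sr=8000, n_mels=12, n_ceps=8):
    """Sketch of autocorrelation-domain cepstral features."""
    n = len(frame)
    # 1. One-sided autocorrelation of the frame.
    ac = np.correlate(frame, frame, mode="full")[n - 1:]
    # 2. Crude "filtering": zero out lag 0, which carries most of the
    #    energy of a flat (white-like) additive noise.
    ac[0] = 0.0
    # 3. Short-time power spectrum of the filtered autocorrelation.
    spec = np.abs(np.fft.rfft(ac, 2 * n))[:n] ** 2
    # 4. Keep only local peaks, found where the derivative changes sign.
    d = np.diff(spec)
    idx = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    peaks = np.zeros_like(spec)
    peaks[idx] = spec[idx]
    # 5. Project peak magnitudes onto the mel filter bank; take logs.
    fb = mel_filterbank(n_mels, len(peaks), sr)
    logmel = np.log(fb @ peaks + 1e-10)
    # 6. DCT-II of the log filter-bank outputs gives the cepstra.
    m = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (m[:, None] + 0.5) * np.arange(n_ceps)[None, :])
    return logmel @ dct
```

The intuition is that stationary additive noise concentrates in the low lags of the autocorrelation and in the valleys of the power spectrum, so discarding both keeps mostly the speech-driven spectral peaks.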

16.
17.
Deep Speech Signal and Information Processing: Research Progress and Prospects
This paper first gives a brief introduction to deep learning, and then reviews in detail its main research directions in speech signal and information processing: speech recognition, speech synthesis, and speech enhancement. For speech recognition, it covers deep-neural-network-based acoustic modeling, model training on big data, and speaker adaptation; for speech synthesis, several deep-learning-based synthesis methods; for speech enhancement, several representative deep-neural-network enhancement schemes. Finally, it looks ahead to possible future research topics for deep learning in speech signal and information processing.

18.
Recently, there is a significant increase in research interest in the area of biologically inspired systems, which, in the context of speech communications, attempt to learn from human's auditory perception and cognition capabilities so as to derive the knowledge and benefits currently unavailable in practice. One particular pursuit is to understand why the human auditory system generally performs with much more robustness than an engineering system, say a state-of-the-art automatic speech recognizer. In this study, we adopt a computational model of the mammalian central auditory system and develop a methodology to analyze and interpret its behavior for an enhanced understanding of its end product, which is a data-redundant, dimension-expanded representation of neural firing rates in the primary auditory cortex (A1). Our first approach is to reinterpret the well-known Mel-frequency cepstral coefficients (MFCCs) in the context of the auditory model. We then present a framework for interpreting the cortical response as a place-coding of speech information, and identify some key advantages of the model's dimension expansion. The framework consists of a model of "source"-invariance that predicts how speech information is encoded in a class-dependent manner, and a model of "environment"-invariance that predicts the noise-robustness of class-dependent signal-respondent neurons. The validity of these ideas is experimentally assessed under an existing recognition framework by selecting features that demonstrate their effects and applying them in a conventional phoneme classification task. The results are quantitatively and qualitatively discussed, and our insights inspire future research on category-dependent features and speech classification using the auditory model.
