Similar Documents
18 similar documents found (search time: 609 ms)
1.
The human auditory system can accurately localize a sound source of interest even in noisy environments, relying mainly on the interaural time difference (ITD); however, localization based on the ITD alone degrades badly in noise. To address this problem, a sound-source localization system based on a cochlear nucleus model is proposed. The model mimics how the cochlear nucleus processes auditory information, extracting both the information in the auditory nerve fibers that is synchronized to the speech stimulus and the firing-rate information, thereby suppressing noise and enabling localization under noisy conditions. The system achieves a localization error of 1.297° in noise. Experimental results confirm that the improved system can localize sound sources in noisy environments.
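The ITD cue this abstract builds on is classically estimated by finding the lag that maximizes the cross-correlation of the two ear signals. A minimal, hypothetical sketch (the paper's cochlear-nucleus model is far richer and is what supplies the noise robustness):

```python
import math

def best_lag(left, right, max_lag):
    """Lag (in samples) maximizing the cross-correlation of two ear signals."""
    def xcorr(lag):
        return sum(left[i] * right[i + lag]
                   for i in range(len(left)) if 0 <= i + lag < len(right))
    return max(range(-max_lag, max_lag + 1), key=xcorr)

# Toy source reaching the right ear 3 samples later than the left.
src = [math.sin(2 * math.pi * 0.05 * n) for n in range(200)]
left = src
right = [0.0] * 3 + src[:-3]
print(best_lag(left, right, max_lag=8))  # 3
```

In the paper, raw waveform correlation is replaced by rate and synchrony statistics extracted from simulated auditory-nerve responses before the ITD is read out.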

2.
To address the problem that existing speech compression algorithms largely ignore the characteristics of human hearing, so that the subbands they produce are far from the ear's critical bands and speech quality suffers, a speech data compression algorithm based on incomplete wavelet packet decomposition is proposed. The algorithm takes full account of the characteristics of speech signals and of human hearing: it uses wavelet packets to partition the signal into subbands, encodes within each subband, and uses an optimization objective function as the criterion for selecting the best wavelet basis, so that the resulting subbands better match the auditory system. Simulation results show that the method achieves a high compression ratio, and the reconstructed speech retains good clarity and naturalness.
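The subband splitting at the core of the abstract can be illustrated with a toy full wavelet packet tree using the plain Haar filter (hypothetical; the paper uses an optimized *incomplete* tree and wavelet-basis selection, neither shown here):

```python
def haar_split(x):
    """One analysis step: average (lowpass) and difference (highpass) channels."""
    a = [(x[2*i] + x[2*i + 1]) / 2 for i in range(len(x) // 2)]
    d = [(x[2*i] - x[2*i + 1]) / 2 for i in range(len(x) // 2)]
    return a, d

def haar_merge(a, d):
    """Inverse of haar_split: a+d and a-d recover the even/odd samples."""
    x = []
    for ai, di in zip(a, d):
        x += [ai + di, ai - di]
    return x

def packet_decompose(x, depth):
    """Full wavelet packet tree: 2**depth subbands of equal width."""
    nodes = [list(x)]
    for _ in range(depth):
        nodes = [half for n in nodes for half in haar_split(n)]
    return nodes

def packet_reconstruct(nodes):
    while len(nodes) > 1:
        nodes = [haar_merge(nodes[i], nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return nodes[0]

signal = [3.0, 1.0, 0.0, 4.0, 2.0, 2.0, 5.0, 1.0]
bands = packet_decompose(signal, depth=2)       # 4 subbands
print(packet_reconstruct(bands) == signal)      # True: perfect reconstruction
```

Compression then amounts to quantizing or zeroing coefficients in perceptually unimportant subbands; the paper additionally prunes the tree so that subband edges approximate the ear's critical bands.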

3.
The human auditory system can pick out the speech it is interested in even under strong noise. Within the framework of computational auditory scene analysis (CASA), the key difficulty is finding suitable cues for separating the target speech signal from the noise. For single-channel voiced speech separation, a separation algorithm using pitch as the grouping cue is proposed. Simulations under six noise conditions, including white noise and cocktail-party noise, show that compared with conventional spectral subtraction the algorithm raises the output signal-to-noise ratio by 7.47 dB on average, effectively suppressing the interfering noise and improving separation quality.
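A crude time-domain analogue of using pitch as a separation cue (hypothetical, and far simpler than a CASA system) is a comb filter tuned to the target's pitch period: it passes harmonics of that period while cancelling energy at mismatched frequencies:

```python
import math

def comb_enhance(x, period):
    """Average the signal with a copy delayed by one pitch period."""
    return [(x[n] + x[n - period]) / 2 for n in range(period, len(x))]

def power(x):
    return sum(v * v for v in x) / len(x)

target = [math.sin(2 * math.pi * n / 20) for n in range(400)]   # pitch period 20
interf = [math.sin(2 * math.pi * n / 40) for n in range(400)]   # period 40

r_keep = power(comb_enhance(target, 20)) / power(target[20:])
r_cancel = power(comb_enhance(interf, 20)) / power(interf[20:])
print(round(r_keep, 6), round(r_cancel, 6))  # 1.0 0.0
```

The period-40 interferer is shifted by exactly half its cycle, so the averaging cancels it; the period-20 target is shifted by a full cycle and passes unchanged.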

4.
Auditory scene analysis (ASA) mimics human hearing to separate overlapping sound signals. As fundamental ASA research, this paper addresses a problem common to ASA systems, namely that signals in nearby frequency bands cannot be separated effectively, and proposes a new vowel separation system based on a dual-pitch multiband excitation separation model. Using the pitch-track characteristics of the two speech signals, the system extracts the harmonic parameters corresponding to the two fundamental frequencies in the multiband excitation model, then feeds the two parameter sets into the multiband excitation synthesis model to obtain two separated speech signals. The principle and a detailed description of the algorithm are given. Simulation results show that the system can effectively separate vowel signals whose fundamental frequencies differ.
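The harmonic-parameter idea can be shown in a stationary toy (hypothetical: fixed pitches placed on exact DFT bins, a plain O(n²) DFT, no pitch tracking): when the two fundamentals differ, each voice's harmonics occupy disjoint bins, so each voice can be resynthesized from its own harmonic set:

```python
import cmath, math

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

N = 100
f_a, f_b = 5, 4                 # fundamental bins: pitch periods 20 and 25 samples
vowel_a = [sum(math.cos(2 * math.pi * f_a * k * n / N + 0.3 * k) for k in range(1, 4))
           for n in range(N)]
vowel_b = [sum(math.cos(2 * math.pi * f_b * k * n / N + 0.7 * k) for k in range(1, 4))
           for n in range(N)]
mix = [a + b for a, b in zip(vowel_a, vowel_b)]
X = dft(mix)

def resynth(fund):
    """Rebuild one voice from the DFT bins at its first three harmonics."""
    bins = [fund * k for k in range(1, 4)]
    return [sum(2 * (X[b] * cmath.exp(2j * math.pi * b * n / N)).real
                for b in bins) / N for n in range(N)]

err = max(abs(e - v) for e, v in zip(resynth(f_a), vowel_a))
print(err < 1e-6)  # True: vowel A recovered from the mixture
```

Real vowels have time-varying pitch and amplitudes; the paper estimates per-frame harmonic parameters for both fundamentals and feeds them to the MBE synthesis model instead.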

5.
Current status and prospects of research on the cocktail party problem and related auditory models   Cited by: 2 (self-citations: 0, others: 2)
In recent years, with the rapid development of electronic devices and artificial intelligence, human-machine speech interaction has become increasingly important. However, because of interfering sound sources, speech interaction in complex open environments such as cocktail parties is still far from satisfactory, and building an auditory computing system with strong adaptability and robustness remains a highly challenging task. In-depth exploration of the cocktail party problem therefore carries great research significance and application value for speaker recognition, speech recognition, keyword spotting, and other core tasks in intelligent speech processing. This paper surveys the current status of, and outlook for, auditory models related to the cocktail party problem. After briefly introducing research on auditory mechanisms and summarizing the multi-speaker speech separation models aimed at the cocktail party problem, it discusses auditory attention modeling inspired by auditory cognition, arguing that models incorporating voiceprint memory and attentional selection adapt better to complex auditory environments. It then briefly reviews recent multi-speaker speech recognition models. Finally, it discusses the difficulties and challenges that current computational models face on the cocktail party problem and outlines future research directions.

6.
The human auditory system can pick out the speech it is interested in from a noisy environment. Following the approach of computational auditory scene analysis, this paper extracts the pitch-period track of the speech with the cepstral method, uses the continuous pitch track as a cue to extract the spectrum of each harmonic at integer multiples of the fundamental frequency, and then reconstructs the separated speech by inverse Fourier transform. Experiments show that under several typical noise conditions the method effectively separates the target speech from the background noise, improving the signal-to-noise ratio (SNR) and mean opinion sc...
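The cepstral pitch extraction this abstract relies on rests on a standard fact: the real cepstrum of a harmonic signal peaks at the quefrency equal to the pitch period. A hypothetical toy with a plain O(n²) DFT (not the paper's implementation):

```python
import cmath, math

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def real_cepstrum(x):
    N = len(x)
    logmag = [math.log(abs(v) + 1e-8) for v in dft(x)]
    # inverse DFT of the log-magnitude spectrum (real part)
    return [sum(logmag[k] * cmath.exp(2j * math.pi * k * q / N)
                for k in range(N)).real / N for q in range(N)]

def pitch_period(x, lo, hi):
    """Quefrency of the cepstral peak within a plausible pitch-period range."""
    c = real_cepstrum(x)
    return max(range(lo, hi), key=lambda q: c[q])

# Harmonic signal with pitch period 32 samples (4 harmonics).
x = [sum(math.sin(2 * math.pi * k * n / 32) for k in range(1, 5)) for n in range(256)]
print(pitch_period(x, lo=20, hi=48))  # 32
```

With the period in hand, the paper picks the spectral bins at integer multiples of the fundamental and inverse-transforms them, much as in the comb-like harmonic selection described above.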

7.
陈斌杰, 陆志华, 周宇, 叶庆卫. 《计算机应用》, 2018, 38(12): 3643-3648
To explore the feasibility of separating multiple sources and localizing them in a two-dimensional plane with only two microphones, an indoor speech separation and source localization system based on a dual-microphone setup is proposed. From the signals captured by the microphones, a dual-microphone delay-attenuation model is built; the DUET algorithm then estimates the model's delay and attenuation parameters, from which a parameter histogram is drawn. In the separation stage, a binary time-frequency mask (BTFM) is constructed and, combined with the parameter histogram, binary masking is used to separate the mixed speech. In the localization stage, equations determining the source positions are derived from the relation between the model's attenuation parameters and the signal energy ratio. With the Roomsimove toolbox simulating the room acoustics, Matlab simulation and geometric computation localize multiple sources in the two-dimensional plane while separating them. Experimental results show a localization error below 2% for all source signals, which supports the research and development of small systems.
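The delay-attenuation estimation at the heart of DUET can be illustrated on a single time-frequency bin (a hypothetical toy: one source, integer circular delay, plain DFT; the real algorithm histograms these per-bin estimates and clusters the peaks):

```python
import cmath, math

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

N, bin_k = 64, 4
src = [math.cos(2 * math.pi * bin_k * n / N) for n in range(N)]
# Mic 2 hears the source attenuated by 0.5 and (circularly) delayed by 3 samples.
x1 = src
x2 = [0.5 * src[(n - 3) % N] for n in range(N)]

X1, X2 = dft(x1), dft(x2)
ratio = X2[bin_k] / X1[bin_k]             # = a * exp(-i * omega_k * delta)
a_hat = abs(ratio)
d_hat = -cmath.phase(ratio) / (2 * math.pi * bin_k / N)
print(round(a_hat, 6), round(d_hat, 6))   # 0.5 3.0
```

DUET builds a 2-D histogram of these (attenuation, delay) pairs over all bins; each source shows up as a histogram peak, and the binary mask assigns every bin to its nearest peak.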

8.
Since existing speech intelligibility metrics do not closely follow the way the human ear actually perceives speech, a bispectral-feature intelligibility metric based on auditory characteristics (Gammatone-bispectral speech intelligibility metric, GBSIM) is proposed. Exploiting the bispectrum's ability to detect nonlinear phase coupling in speech signals and to suppress Gaussian noise in non-Gaussian signals, the algorithm passes the input speech through a Gammatone filter bank that simulates the cochlea, splitting it into 32 auditory subbands; third-order statistics give a bispectral estimate of each subband's speech signal, from which a single feature value is extracted to compute intelligibility. Validation shows that the method is sensitive to signal distortion, that its scores correlate highly with subjective ratings, and that it outperforms traditional intelligibility metrics.
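The bispectral property the abstract exploits can be demonstrated in a toy (hypothetical frame construction, plain DFT, no Gammatone filtering): averaging B(k1,k2) = X[k1]·X[k2]·conj(X[k1+k2]) over frames preserves quadratically phase-coupled components while phase-random components average away:

```python
import cmath, math, random

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def avg_bispectrum(frames, k1, k2):
    """|mean over frames of X[k1] * X[k2] * conj(X[k1+k2])|."""
    acc = 0
    for f in frames:
        X = dft(f)
        acc += X[k1] * X[k2] * X[k1 + k2].conjugate()
    return abs(acc / len(frames))

def make_frames(coupled, M=150, N=64):
    random.seed(3)
    frames = []
    for _ in range(M):
        p1 = random.uniform(0, 2 * math.pi)
        p2 = random.uniform(0, 2 * math.pi)
        # phase of the sum component: coupled (p1+p2) or independent
        p3 = p1 + p2 if coupled else random.uniform(0, 2 * math.pi)
        frames.append([math.cos(2 * math.pi * 6 * n / N + p1)
                       + math.cos(2 * math.pi * 9 * n / N + p2)
                       + math.cos(2 * math.pi * 15 * n / N + p3)
                       for n in range(N)])
    return frames

b_coupled = avg_bispectrum(make_frames(True), 6, 9)
b_random = avg_bispectrum(make_frames(False), 6, 9)
print(round(b_coupled / b_random, 1))  # coupled bispectrum dominates
```

GBSIM applies this kind of third-order estimate within each of the 32 Gammatone subbands and collapses each to a single feature for the intelligibility score.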

9.
For speech signals in low-SNR environments, conventional spectral subtraction leaves considerable residual background noise. To address this, an improved speech enhancement algorithm based on the auditory masking effect is proposed. It combines the human ear's masking characteristics with power spectral subtraction: a time-domain recursive averaging algorithm estimates the noise while spectral subtraction is applied to the noisy speech; from an auditory standpoint, the masking threshold is computed from the estimated speech power spectrum, and a spectral-subtraction power correction factor and noise factor are introduced to enhance the noisy speech. Simulations in Matlab 2012b show that at low SNR the algorithm suppresses background noise well and improves speech quality; compared with an improved adaptive filtering algorithm, the output SNR is about 5% higher.
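The power spectral subtraction baseline that this algorithm improves upon fits in a few lines (hypothetical per-bin values; the paper drives the over-subtraction and floor factors from the masking threshold and uses recursive noise estimation):

```python
def spectral_subtract(power_spec, noise_est, alpha=1.0, beta=0.02):
    """Subtract alpha * estimated noise power, flooring at beta * original power."""
    return [max(p - alpha * n, beta * p) for p, n in zip(power_spec, noise_est)]

noisy = [9.0, 4.0, 1.0, 16.0]     # per-bin power of the noisy frame
noise = [2.0, 2.0, 2.0, 2.0]      # estimated noise power per bin
print(spectral_subtract(noisy, noise))  # [7.0, 2.0, 0.02, 14.0]
```

Raising `alpha` above 1 removes more noise at the cost of musical-noise artifacts; the spectral floor `beta` keeps bins from going negative, which is where the masking-based per-bin tuning in the paper comes in.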

10.
Many computational auditory scene analysis systems based on trained models depend, when separating two-speaker speech mixtures, on the effectiveness of sample training and on prior knowledge of the speakers. A clustering-based monaural mixed speech separation system is proposed instead. The system first analyzes the speech signal with a multi-pitch tracking algorithm to produce simultaneous streams, then searches for the optimal grouping of the streams by maximizing the traces of the within-class and between-class scatter matrices, finally completing the separation of the two speakers. The system needs no trained speech models, effectively improves the separation of two-speaker mixtures, and offers a new approach to two-speaker speech separation.

11.
This paper points out the connections among adaptive blind source separation algorithms and, under several criteria for selecting the best performance, introduces an improved nonlinear function. The function achieves effective blind separation of speech signals while also raising the algorithm's convergence speed; experiments show that the method separates mixed speech more quickly.
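As a concrete member of the algorithm family discussed (hypothetical, with a plain tanh score function rather than the paper's improved nonlinearity), a natural-gradient blind separation rule applied to two synthetic super-Gaussian sources:

```python
import math, random

random.seed(7)
N = 2000
# Two synthetic super-Gaussian (Laplacian) sources, speech-like in kurtosis.
s = [[random.choice([-1, 1]) * random.expovariate(1.0) for _ in range(N)]
     for _ in range(2)]
A = [[1.0, 0.6], [0.4, 1.0]]                       # unknown mixing matrix
x = [[A[i][0] * s[0][n] + A[i][1] * s[1][n] for n in range(N)] for i in range(2)]

W = [[1.0, 0.0], [0.0, 1.0]]                       # separating matrix, init = I
mu = 0.1
for _ in range(250):
    C = [[0.0, 0.0], [0.0, 0.0]]                   # C = E[tanh(y) y^T]
    for n in range(N):
        y = (W[0][0] * x[0][n] + W[0][1] * x[1][n],
             W[1][0] * x[0][n] + W[1][1] * x[1][n])
        for i in range(2):
            for j in range(2):
                C[i][j] += math.tanh(y[i]) * y[j] / N
    # Natural-gradient step: W <- W + mu * (I - C) W
    D = [[(1.0 if i == j else 0.0) - C[i][j] for j in range(2)] for i in range(2)]
    W = [[W[i][j] + mu * (D[i][0] * W[0][j] + D[i][1] * W[1][j]) for j in range(2)]
         for i in range(2)]

# The global system G = W A should approach a scaled permutation matrix.
G = [[W[i][0] * A[0][j] + W[i][1] * A[1][j] for j in range(2)] for i in range(2)]
dominance = [max(abs(g) for g in row) / sum(abs(g) for g in row) for row in G]
print([round(d, 2) for d in dominance])  # both near 1.0 once separation converges
```

The choice of nonlinearity (the score function) is exactly the lever the paper adjusts: a better-matched function improves both separation quality and convergence speed.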

12.
There is evidence that people react more positively when they are presented with faces that are consistent with their voices. Nass and Brave [2005, Wired for Speech: How Voice Activates and Advances the Human–Computer Relationship, MIT Press, Cambridge, MA] found that computerized and human faces were perceived more positively when paired, respectively, with synthesized versus human voices than when paired with inconsistent voices. The present study sought to examine whether this type of inconsistency would affect perceptions of persuasive messages delivered by humans versus computers. We created a situation in which reactions to computer-synthesized speech were compared to human speech when the source was either a person or a computer. This paper presents two studies, one using audiotaped stimuli and one using videotaped stimuli, with type of speech (human versus computer-synthesized) manipulated factorially with source (person versus computer). As hypothesized, both studies suggest that in the human-as-source condition, the human voice is perceived more favorably than the synthetic voice. However, in the computer-as-source condition, human and computer voices were rated similarly. We discuss these findings in terms of consistency as well as group-process effects that may be occurring.

13.
严发鑫, 徐岩, 汤旻安. 《测控技术》, 2019, 38(9): 103-107
Speech signals are mixed dynamically in non-stationary systems. To suppress non-stationary mixing disturbances during blind source separation in real time, speed up convergence, and reduce the steady-state error, an adaptive blind source separation algorithm based on the PID control principle is proposed. A PID control model is built on top of an adaptive blind source separation algorithm that requires no preprocessing; it adjusts the learning rate, tracks the separation of the speech signals, reduces in real time the separation error introduced by non-stationary mixing, and dynamically updates the separating matrix. PID parameter tuning and speech separation are simulated under both slowly varying and abruptly changing mixing matrices, and the algorithm's performance is compared against classical algorithms. The simulations and comparisons show that the proposed algorithm suits the separation of speech signals in non-stationary mixing systems and improves on the classical algorithms.
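The control idea is generic: some quantity (in the paper, the separation algorithm's learning rate) is driven toward a target by a discrete PID law. A minimal, hypothetical PID loop on a toy integrating plant, just to show the three-term update:

```python
def pid_controller(kp, ki, kd):
    """Return a stateful discrete PID update: error -> control signal."""
    state = {"integral": 0.0, "prev": 0.0}
    def step(error):
        state["integral"] += error
        deriv = error - state["prev"]
        state["prev"] = error
        return kp * error + ki * state["integral"] + kd * deriv
    return step

setpoint, x = 1.0, 0.0
pid = pid_controller(kp=0.8, ki=0.02, kd=0.1)
for _ in range(400):
    u = pid(setpoint - x)
    x += 0.1 * u          # toy plant: a pure integrator
print(round(x, 3))        # 1.0 once the loop has settled
```

In the paper the "error" fed to the controller is the separation error caused by non-stationary mixing, and the controller output modulates the step size of the separating-matrix update.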

14.
Can synthetic speech be used in foreign language learning as effectively as natural speech? In this paper, we evaluated synthetic speech from the learners' viewpoint in order to find an answer. The results indicated that learners do not notice marked differences between synthetic and natural voices for words with short vowels and long vowels when they try to understand the meanings of the sounds. The data indicate that synthetic-voice utterances of sentences are easier to understand and more acceptable to learners than synthetic-voice utterances of isolated words. In addition, the ratings of both synthetic and natural voices depend strongly on the learners' listening comprehension abilities. We conclude that some synthetic speech with specific pronunciations of vowels may be suitable for listening materials, and suggest that evaluating TTS systems by comparing synthetic speech with natural speech, and building a lexical database of synthetic speech that closely approximates natural speech, will help teachers readily use many existing CALL tools.

15.
This paper presents a technique to transform high-effort voices into breathy voices using adaptive pre-emphasis linear prediction (APLP). The primary benefit of this technique is that it estimates a spectral emphasis filter that can be used to manipulate the perceived vocal effort. The other benefit of APLP is that it estimates a formant filter that is more consistent across varying voice qualities. This paper describes how constant pre-emphasis linear prediction (LP) estimates a voice source with a constant spectral envelope even though the spectral envelope of the true voice source varies over time. A listening experiment demonstrates how differences in vocal effort and breathiness are audible in the formant filter estimated by constant pre-emphasis LP. APLP is presented as a technique to estimate a spectral emphasis filter that captures the combined influence of the glottal source and the vocal tract upon the spectral envelope of the voice. A final listening experiment demonstrates how APLP can be used to effectively transform high-effort voices into breathy voices. The techniques presented here are relevant to researchers in voice conversion, voice quality, singing, and emotion.
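Linear prediction underlies APLP. A hypothetical sketch of the non-adaptive building blocks (fixed first-order pre-emphasis plus autocorrelation LP via Levinson-Durbin; APLP's contribution, not shown, is making the emphasis stage adaptive):

```python
import random

def pre_emphasis(x, coeff=0.97):
    """Fixed first-order pre-emphasis; APLP instead adapts this stage over time."""
    return [x[0]] + [x[n] - coeff * x[n - 1] for n in range(1, len(x))]

def autocorr(x, max_lag):
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LP normal equations; returns prediction polynomial a, a[0]=1."""
    a, err = [1.0] + [0.0] * order, r[0]
    for m in range(1, order + 1):
        k = -sum(a[j] * r[m - j] for j in range(m)) / err
        a = [a[j] + k * a[m - j] for j in range(m)] + [k] + a[m + 1:]
        err *= 1.0 - k * k
    return a

# Synthetic AR(2) "vocal tract": x[n] = 0.9 x[n-1] - 0.5 x[n-2] + e[n]
random.seed(1)
x = [0.0, 0.0]
for _ in range(4000):
    x.append(0.9 * x[-1] - 0.5 * x[-2] + random.gauss(0.0, 1.0))
a = levinson_durbin(autocorr(x, 2), 2)
print([round(v, 2) for v in a])  # close to [1, -0.9, 0.5]
```

The estimated polynomial recovers the synthetic resonator; in APLP, the pre-emphasis filter estimated per frame carries the vocal-effort information that the fixed `coeff` above discards.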

16.
17.
Probabilistic approaches can offer satisfactory solutions to source separation with a single channel, provided that the models of the sources match accurately the statistical properties of the mixed signals. However, it is not always possible to train such models. To overcome this problem, we propose to resort to an adaptation scheme for adjusting the source models with respect to the actual properties of the signals observed in the mix. In this paper, we introduce a general formalism for source model adaptation which is expressed in the framework of Bayesian models. Particular cases of the proposed approach are then investigated experimentally on the problem of separating voice from music in popular songs. The obtained results show that an adaptation scheme can improve consistently and significantly the separation performance in comparison with nonadapted models.

18.
Advanced Robotics, 2013, 27(1-2): 105-120
We developed a three-dimensional mechanical vocal cord model for Waseda Talker No. 7 (WT-7), an anthropomorphic talking robot, for generating speech sounds with various voice qualities. The vocal cord model is a cover model that has two thin folds made of thermoplastic material. The model self-oscillates by airflow exhausted from the lung model and generates the glottal sound source, which is fed into the vocal tract for generating the speech sound. Using the vocal cord model, breathy and creaky voices, as well as the modal (normal) voice, were produced in a manner similar to the human laryngeal control. The breathy voice is characterized by a noisy component mixed with the periodic glottal sound source and the creaky voice is characterized by an extremely low-pitch vibration. The breathy voice was produced by adjusting the glottal opening and generating the turbulence noise by the airflow just above the glottis. The creaky voice was produced by adjusting the vocal cord tension, the sub-glottal pressure and the vibration mass so as to generate a double-pitch vibration with a long pitch interval. The vocal cord model used to produce these voice qualities was evaluated in terms of the vibration pattern as measured by a high-speed camera, the glottal airflow and the acoustic characteristics of the glottal sound source, as compared to the data for a human.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号