Similar literature
 Found 19 similar documents; search took 453 ms
1.
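To address the limitations of current speech enhancement systems in near-field environments, namely heavy noise interference and difficulty in cleanly recovering the signal, a new speech enhancement system based on a microphone array is designed and implemented. The hardware uses a novel low-power design and supports simultaneous acquisition of up to 12 speech channels. Noise is handled in a new way that combines adaptive differential pulse code modulation (ADPCM) with Kalman filtering, applied as separate processing stages. Matlab simulation and practical tests in an enclosed near-field environment show that the system is stable and reliable, improves speech clarity, and achieves short-range multichannel speech enhancement.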

2.
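Echo cancellation method for digital hearing aid systems   Total citations: 1, self-citations: 0, citations by others: 1
An echo cancellation method for digital hearing aids is proposed. By introducing a prefiltering unit, the method resolves the correlation between the receiver output signal and the microphone input signal that arises from the forward path in a digital hearing aid system, which ensures that the adaptive estimate of the echo path converges to an unbiased estimate. An adaptive subgradient projection algorithm is proposed for estimating the hearing aid echo path; compared with the conventional NLMS adaptive algorithm, it converges faster and more accurately and is more robust to noise. Simulations with white noise and real speech signals show that the algorithm converges quickly and accurately to the correct path for both long room echo paths and short hearing aid echo paths.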
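For orientation, the conventional NLMS baseline that this abstract compares against can be sketched as follows. This is not the subgradient projection algorithm the abstract proposes, and it omits the prefiltering unit; the three-tap path, the step size `mu`, and the function names are illustrative assumptions.

```python
import random

def nlms_identify(x, d, taps, mu=0.5, eps=1e-8):
    """Estimate an FIR path w so that (w * x) approximates d, using NLMS."""
    w = [0.0] * taps
    buf = [0.0] * taps                      # recent input samples, newest first
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))        # filter output
        e = dn - y                                        # estimation error
        norm = sum(bi * bi for bi in buf) + eps           # input power
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
    return w

def simulate(path, n=4000, seed=0):
    """White-noise input passed through a hypothetical noise-free echo path."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    d = [sum(h * x[i - k] for k, h in enumerate(path) if i - k >= 0)
         for i in range(n)]
    return x, d

true_path = [0.6, -0.3, 0.1]                # assumed path, for illustration only
x, d = simulate(true_path)
w = nlms_identify(x, d, taps=len(true_path))
```

With uncorrelated white-noise input and no forward path, the estimate `w` approaches `true_path`; the bias the abstract describes arises only once the input becomes correlated with the output through the forward path.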

3.
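To improve the real-time performance of hardware circuits for speech endpoint detection, an endpoint detection method based on short-time amplitude variation is proposed. Based on this method, the endpoint detection hardware circuit was designed with Xilinx's development tool for digital processing systems. Lookup tables and finite state machines were used to compensate for the limited component library of the development tool, which simplified the hardware design and sped up development. Co-simulation and verification by download to a field-programmable gate array (FPGA) device show that the method effectively detects the start and end points of speech signals and that the hardware design meets the system's requirements for effectiveness and accuracy.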
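The general idea of amplitude-based endpoint detection driven by a finite state machine can be sketched in software as below. The abstract does not give the paper's decision rule, so the frame length, the fixed threshold, the hangover count, and the function names are all assumptions, and the sketch thresholds the frame amplitude itself rather than its change.

```python
def short_time_amplitude(samples, frame_len):
    """Mean absolute amplitude of each non-overlapping frame."""
    return [sum(abs(s) for s in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def detect_endpoints(samples, frame_len=80, threshold=0.1, hangover=2):
    """Return (start_frame, end_frame) pairs; end_frame is exclusive."""
    amps = short_time_amplitude(samples, frame_len)
    state = "SILENCE"
    start = None
    quiet = 0                     # consecutive low-amplitude frames in SPEECH
    segments = []
    for idx, amp in enumerate(amps):
        if state == "SILENCE":
            if amp > threshold:   # amplitude rises: speech begins
                state, start, quiet = "SPEECH", idx, 0
        elif amp > threshold:
            quiet = 0
        else:
            quiet += 1
            if quiet >= hangover: # stayed quiet long enough: speech ended
                segments.append((start, idx - quiet + 1))
                state = "SILENCE"
    if state == "SPEECH":         # signal ended while still in speech
        segments.append((start, len(amps) - quiet))
    return segments

signal = [0.0] * 400 + [0.5, -0.5] * 200 + [0.0] * 400
segments = detect_endpoints(signal)
```

The hangover counter keeps a brief dip inside a word from being reported as an endpoint; in hardware, the same two-state logic maps naturally onto a small state machine with a counter.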

4.
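Neural-network-based speech enhancement algorithms outperform traditional methods, but their large network size makes them difficult to run in real time on hearing aids. This paper therefore proposes a low-complexity recurrent neural network for binaural speech enhancement. The algorithm extracts mel-frequency cepstral coefficients and interaural phase differences from the binaural speech as the network's input features; in addition to the amplitude information of the binaural speech, it uses spatial cues to better map the relationship between target speech and noisy speech. Experimental results show that, compared with algorithms commonly used in hearing aids, the proposed algorithm improves the binaural average signal-to-noise ratio and short-time speech intelligibility by 4.68 dB and 4.5% on average; compared with neural-network-based algorithms, the improvements are 1.63 dB and 4.8%, respectively. At a clock frequency of 10 MHz, the hardware design of the network needs about 4.2 ms of processing time, which meets the real-time requirements of hearing aids.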

5.
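To address the difficulty of deploying neural-network-based speech enhancement algorithms in hearing aids, this paper proposes a low-latency, low-complexity dual-microphone speech enhancement algorithm based on a recurrent neural network. The algorithm uses the two microphones for spatial filtering to initially remove noise from undesired directions, and then obtains the clean speech signal through the recurrent neural network. To solve the problem that the all-phase filter bank in hearing aids has a high order and therefore a large group delay, the paper proposes a segmented filter bank that effectively reduces the order while maintaining performance. Simulation results show that the filter bank has a group delay of 3.125 ms at a 16 kHz sampling rate; in babble, volvo, and factory1 noise at 0 dB, the algorithm improves SNR by 10.5565 dB and PESQ by 0.6787 on average. In real-world tests, SNR improves by 9.4394 dB and PESQ by 0.7350 on average. With a DSP system clock of 104 MHz, the system delay through the hearing aid is 8.4 ms and the current consumption is 6.2 mA, which satisfies the hearing aid's need for long battery life.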
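As a quick check on the delay figure, a group delay in milliseconds converts to samples by multiplying by the sampling rate. The helper name is illustrative, and the comment about filter order holds only under the assumption of a linear-phase FIR filter, which the abstract does not state.

```python
def delay_ms_to_samples(delay_ms, fs_hz):
    """Convert a delay in milliseconds to a (possibly fractional) sample count."""
    return delay_ms * fs_hz / 1000.0

group_delay_samples = delay_ms_to_samples(3.125, 16000)
# For a linear-phase FIR filter (an assumption, not stated in the abstract),
# the group delay is order / 2 samples, so this would correspond to order 100.
implied_linear_phase_order = 2 * group_delay_samples
```

At 16 kHz, the reported 3.125 ms therefore corresponds to 50 samples.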

6.
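Design of a small DSP-based digital voice communication platform   Total citations: 4, self-citations: 2, citations by others: 2
To meet the need for effective and reliable underwater acoustic voice communication, and considering the characteristics of speech signals and the underwater channel, a small digital voice communication system suited to underwater use is designed around TI's TMS320VC5416 DSP chip and DVSI's AMBE-2000 chip. The overall hardware structure of the communication system is established, and each key functional module is debugged and analyzed. System tests and performance analysis show that the hardware runs stably and reliably, providing a sound hardware platform for underwater acoustic voice communication.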

7.
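In dual-microphone speech enhancement in noisy environments, crosstalk between the microphones severely degrades the noise reduction performance of conventional adaptive cancellation algorithms. To improve the performance of the noise reduction system, a neural-network-based dual-microphone adaptive crosstalk-resistant speech enhancement method is proposed. The method uses two stages of adaptive algorithms to eliminate the crosstalk between the microphones, and both stages use a neural network cancellation method. Building on algorithm simulation, a real-time noise reduction prototype was built with a DSP. Test results show that the new method greatly improves the system's noise suppression performance.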

8.
张历卓  贾维嘉  曹慧玲 《计算机应用》2010,30(10):2825-2827
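To meet the practical needs of multiparty conferencing while accommodating the characteristics of small devices such as PDAs, a novel and simple scheme for fast, real-time, adaptive, cross-platform multiparty conferencing is proposed. The scheme uses probability-based priority decisions: each client computes its voice probability from the speech energy and the encoded frame length; the server uses these voice probabilities to select the voice streams of the current speakers, mixes the selected streams by superposition, and finally forwards the mixed voice packets. The scheme compensates for the weak computing power of small devices such as PDAs and also reduces the server's mixing workload. Experimental results show that the scheme has low algorithmic complexity, gives good subjective listening quality, and is easy to implement on hardware such as PDAs and mobile phones, so it can be widely applied in cross-platform multimedia multiparty conferencing systems.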
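The server-side step described here (select speakers by their reported voice probability, then mix by superposition) might look roughly like the sketch below. The abstract does not give the probability formula or the selection rule, so the top-k rule, the `min_prob` cutoff, the 16-bit clipping, and all names are assumptions.

```python
def select_speakers(probabilities, max_speakers=2, min_prob=0.5):
    """Pick the clients most likely to be speaking (hypothetical top-k rule)."""
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    return [cid for cid, p in ranked[:max_speakers] if p >= min_prob]

def mix_frames(frames):
    """Superpose equal-length PCM frames sample by sample, clipped to 16 bits."""
    if not frames:
        return []
    mixed = []
    for i in range(len(frames[0])):
        total = sum(frame[i] for frame in frames)
        mixed.append(max(-32768, min(32767, total)))
    return mixed

# Voice probabilities as reported by three hypothetical clients.
probs = {"a": 0.9, "b": 0.2, "c": 0.7}
active = select_speakers(probs)
mixed = mix_frames([[1000, 30000], [2000, 30000]])
```

Because the clients compute their own probabilities, the server only sorts a few numbers and sums the selected streams, which is the reduction in mixing workload the abstract refers to.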

9.
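In dual-microphone noise cancellation, crosstalk greatly degrades the noise reduction performance of conventional adaptive algorithms. To improve the performance of the dual-microphone algorithm, a two-stage adaptive filtering system is used to eliminate the crosstalk. To improve the convergence of the adaptive filters, an LMS algorithm with a master-slave structure adaptively adjusts the step-size factor. In addition, to suit narrowband processing algorithms, the input signal is preprocessed by subband analysis, crosstalk-resistant adaptive processing is applied to each subband independently, and the enhanced subband signals are combined to obtain the enhanced speech signal. Experimental results show that the method achieves large noise reduction with little speech distortion and a marked speech enhancement effect.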

10.
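Deep-learning-based speech enhancement can improve listeners' speech recognition rate, but the large size of the neural networks makes them hard to apply on edge devices. This paper therefore proposes a recurrent neural network speech enhancement accelerator suitable for edge devices such as hearing aids. The accelerator implements the neural network computation with independent matrix multiplication hardware and pipelines the layers of the multilayer network at the hardware level, reducing computation latency through parallelism and pipelining. Experiments show that, relative to the noisy speech, in volvo, factory2, and babble noise the algorithm improves the signal-to-noise ratio by 17.302 dB, 8.412 dB, and 4.732 dB on average, short-time speech intelligibility by 1.4%, 0.8%, and 0.4%, and the perceptual evaluation of speech quality (PESQ) score by 1.498, 0.504, and 0.234, respectively; all three metrics exceed those of the traditional and neural network speech enhancement algorithms compared in the paper. At a clock frequency of 10 MHz, the accelerator's processing latency is 9.2 ms, which meets the real-time requirements of edge applications.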

11.
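A DSP-based system for real-time multichannel speech acquisition and compression   Total citations: 1, self-citations: 0, citations by others: 1
A system for real-time multichannel speech acquisition and compression is introduced. The system is based on the PC-ISA bus architecture. Its main feature is that a single DSP, at a high performance-to-price ratio, performs in real time the acquisition of up to 10 speech channels, the compression of 10 speech channels, and the decompression of one speech channel. The system has been successfully applied in a speech recording device.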

12.
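A study of a PAC speaker recognition method based on subband processing   Total citations: 1, self-citations: 1, citations by others: 0
Speaker recognition systems currently achieve high performance on clean speech, but their performance drops sharply in noisy environments. A subband-based speaker recognition method is presented that uses phase autocorrelation (PAC) coefficients and their energy as features: the wideband speech signal is passed through a Mel filter bank to form multiple subband signals; the data in each subband are transformed by DCT and PAC coefficients are extracted as feature parameters; an HMM is then built for each subband for recognition; finally, the HMM results are combined at the recognition probability layer to give the final recognition result. Experiments show that the method greatly improves recognition performance both under noise at various signal-to-noise ratios and without noise.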

13.
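Emotional speech synthesis is a research focus in affective computing and speech signal processing, and accurate analysis of speech emotion is a prerequisite for synthesizing high-quality emotional speech. This paper uses the PAD emotion model as the quantitative model for emotion analysis, performs emotion analysis and clustering on the speech in an emotional corpus, and obtains a PAD parameter model for each emotion. Emotional speech synthesized by an HMM speech synthesis system is then corrected with the PAD model so that its emotion parameters are more accurate, which improves the quality of emotional speech synthesis. Experiments show that the method improves the naturalness and emotional clarity of the synthesized speech and also performs well across different speakers of the same gender.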

14.
An embedded speech processing system for in-vehicle wireless terminals   Total citations: 2, self-citations: 0, citations by others: 2
刘志  刘加  刘润生 《计算机工程》2005,31(6):182-183,202
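A system is presented that uses speech technology for voice-recognition dialing, speech synthesis, and voice prompts in a wireless terminal for the automotive environment. The system comprises two main modules: a speech processing module and a Bluetooth communication module. The Bluetooth communication module communicates with mobile phones that have a Bluetooth interface, which includes connecting to the phone for calls and downloading the phone's phone book and passing it to the speech processing module. The speech processing module performs speech recognition, speech synthesis, voice prompts, call recording and playback using speech compression coding and decoding, and number lookup, and controls the flow of the whole system. The system can download the phone book from the mobile phone and generate the recognition vocabulary online; the vocabulary can hold up to 1000 words. Experimental results with 600 words show a recognition rate above 97%. The system is based on an SoC architecture and offers high integration and high stability.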

15.
Since it is impractical to prerecord human speech for dynamic content such as email messages and news, many commercial speech applications use recorded human speech for fixed content (e.g. system prompts) and synthetic speech for dynamic content. However, mixing human speech and synthetic speech may not be optimal from a consistency perspective. A two-condition between-participants experiment (N = 24) was conducted to compare two versions of a telephony application for Personal Information Management (PIM). In the first condition, all the system output was delivered with synthetic speech. In the second condition, users heard a mix of human speech and synthetic speech. Users managed several email and calendar tasks. Users' task performance was rated by two independent judges. Their self-ratings of task performance and attitudinal responses were also measured by means of questionnaires. Users interacting with the interface that used only synthetic speech performed the task significantly better, while users interacting with the mixed-speech interface thought they did better and had more positive attitudinal responses. A consistency framework drawn from human psychological processing is offered to explain the difference in task performance. Cognitive processing and attitudinal response are differentiated. Design implications and directions for future research are suggested.

16.
A normal synchronous time-multiplexed system has low capacity utilisation of a trunk channel, owing to the ON-OFF nature of speech in human conversation. A time assignment speech interpolation (TASI) system nearly doubles this efficiency at a relatively small increase in hardware cost. In TASI, during the silent period of speech, the channel is allotted to some other 'active' subscriber on a first-come-first-served basis. A microprocessor-based system offers a very cost-effective solution in terms of hardware count. The Intel 8085A has been selected for the purpose. The microprocessor functions in a distributed processing mode together with the main CPU controlling the stored program exchange. The system uses digital dynamic speech detectors for detecting transitions in speech over a channel, as they show considerably superior performance over amplitude detectors. An assembly language program for the system has been developed.
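The first-come-first-served reallocation at the heart of TASI can be illustrated with a small allocator. This is a sketch under assumptions, not the paper's 8085A assembly program: the class name, the method names, and the policy of queueing a talker when no channel is free are all hypothetical.

```python
from collections import deque

class TasiAllocator:
    """Share a few trunk channels among subscribers, one talkspurt at a time."""

    def __init__(self, num_channels):
        self.free = deque(range(num_channels))  # idle channels
        self.assigned = {}                      # subscriber -> channel
        self.waiting = deque()                  # talkers queued in arrival order

    def talkspurt_start(self, subscriber):
        """Speech detected: grant a channel, or queue if none is free."""
        if subscriber in self.assigned:
            return self.assigned[subscriber]
        if self.free:
            channel = self.free.popleft()
            self.assigned[subscriber] = channel
            return channel
        if subscriber not in self.waiting:
            self.waiting.append(subscriber)
        return None

    def talkspurt_end(self, subscriber):
        """Silence detected: hand the channel to the longest-waiting talker."""
        if subscriber in self.waiting:
            self.waiting.remove(subscriber)
            return
        channel = self.assigned.pop(subscriber, None)
        if channel is None:
            return
        if self.waiting:
            self.assigned[self.waiting.popleft()] = channel
        else:
            self.free.append(channel)

tasi = TasiAllocator(num_channels=2)
grants = [tasi.talkspurt_start(s) for s in ("A", "B", "C", "D")]
tasi.talkspurt_end("A")   # A falls silent; C, first in the queue, gets channel 0
```

In a real system the `talkspurt_start` and `talkspurt_end` events would come from the speech detectors mentioned in the abstract.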

17.
Monaural speech separation is a very challenging problem in speech signal processing. It has been studied extensively, and many separation systems based on computational auditory scene analysis (CASA) have been proposed in the last two decades. Although the research on CASA has tended to introduce high-level knowledge into separation processes using primitive data-driven methods, the knowledge on speech quality still has not been combined with it. This makes the performance evaluation of CASA mainly focused on the signal-to-noise ratio (SNR) improvement. Actually, the quality of the separated speech is not directly related to its SNR. In order to solve this problem, we propose a new method which combines CASA with objective quality assessment of speech (OQAS). In the grouping process of CASA, we use OQAS as the guide to instruct the CASA system. With this combination, the performance of the speech separation can be improved not only in SNR, but also in mean opinion score (MOS). Our system is systematically evaluated and compared with previous systems, and it yields substantially better performance, especially for the subjective perceptual quality of separated speech.

18.
A speech signal processing system using a multi-parameter model bidirectional Kalman filter is proposed in this paper. A conventional unidirectional Kalman filter usually estimates the current state of a speech signal by processing the time-varying autoregressive model of speech signals from the past time states. A bidirectional Kalman filter utilizes the past and future measurements to estimate the current state of a speech signal, minimizing the mean of the squared error using efficient recursive means. The matrices involved in the difference equations and the measurement equations of the bidirectional Kalman filter algorithm are kept constant throughout the process. With the multi-parameter model, the proposed bidirectional Kalman filter relates more measurements from the future and past time states to the current time state. The proposed multi-parameter bidirectional Kalman filter has been implemented in a speech recognition system and its performance has been compared to other conventional speech processing algorithms. Compared to the single-parameter model bidirectional Kalman filter, the multi-parameter bidirectional Kalman filter improves the accuracy of the state prediction, reduces the speech information lost after the filtering process, and achieves a better word error rate at high SNR regions (clean, 20, 15, 10 dB).
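For reference, the conventional unidirectional filter that the abstract contrasts against reduces, in the simplest scalar case, to the predict/update recursion below. This is a first-order autoregressive sketch with constant model parameters; the function name and the parameters `a`, `q`, and `r` are illustrative, and it is not the multi-parameter bidirectional filter the paper proposes.

```python
def kalman_ar1(observations, a, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x[k] = a*x[k-1] + w, y[k] = x[k] + v.

    q and r are the variances of the process noise w and the measurement
    noise v; the model parameters stay constant throughout the run.
    """
    x, p = x0, p0
    estimates = []
    for y in observations:
        # Predict the current state from the past state only.
        x_pred = a * x
        p_pred = a * a * p + q
        # Update the prediction with the current measurement.
        gain = p_pred / (p_pred + r)
        x = x_pred + gain * (y - x_pred)
        p = (1.0 - gain) * p_pred
        estimates.append(x)
    return estimates

estimates = kalman_ar1([1.0, 1.0, 1.0], a=1.0, q=0.0, r=1.0)
```

A bidirectional variant would additionally use measurements from future time states, which this forward-only loop never sees.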

19.
The evolution of robust speech recognition systems that maintain a high level of recognition accuracy in difficult and dynamically-varying acoustical environments is becoming increasingly important as speech recognition technology becomes a more integral part of mobile applications. In distributed speech recognition (DSR) architecture the recogniser's front-end is located in the terminal and is connected over a data network to a remote back-end recognition server. The terminal performs the feature parameter extraction, or the front-end of the speech recognition system. These features are transmitted over a data channel to the remote back-end recogniser. DSR provides particular benefits for the applications of mobile devices such as improved recognition performance compared to using the voice channel and ubiquitous access from different networks with a guaranteed level of recognition performance. A feature extraction algorithm integrated into the DSR system is required to operate in real-time as well as with the lowest possible computational costs. In this paper, two innovative front-end processing techniques for noise robust speech recognition are presented and compared: time-domain based frame-attenuation (TD-FrAtt) and frequency-domain based frame-attenuation (FD-FrAtt). These techniques include different forms of frame-attenuation, improvement of spectral subtraction based on minimum statistics, as well as a mel-cepstrum feature extraction procedure. Tests are performed using the Slovenian SpeechDat II fixed telephone database and the Aurora 2 database together with the HTK speech recognition toolkit. The results obtained are especially encouraging for mobile DSR systems with limited sizes of available memory and processing power.
