Similar Literature
19 similar documents were retrieved.
1.
A Microphone Array Speech Enhancement System Combined with ICA Preprocessing (cited by 1: self 0, others 1)
In environments with strong background noise and strong reflections, the signals picked up by the microphone elements are of poor quality, which degrades the performance of a microphone array speech enhancement system. ICA can extract latent independent components using only the observed signals. Exploiting this property, this paper introduces ICA into a microphone array speech enhancement system: ICA is applied to the signals received by the array elements, and a comparatively clean target speech component extracted from them is used as the system's input signal. The ICA preprocessing effectively suppresses background noise and echo and improves the quality of the input signal. Experiments in real environments show that ICA preprocessing significantly improves the performance of the microphone array speech enhancement system.
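For orientation, a minimal Python sketch of the ICA-preprocessing idea summarized above (not the authors' implementation): FastICA is applied to the multichannel recording and the recovered components serve as candidate clean inputs for the enhancement stage. The toy instantaneous mixing, the component count, and the use of scikit-learn's FastICA are assumptions for illustration; a real room produces convolutive mixtures that need a convolutive or frequency-domain ICA.

```python
# Hedged sketch: ICA applied to multichannel microphone signals as a
# preprocessing step. Instantaneous mixing is assumed for simplicity; which
# recovered component is the target speech still has to be selected (e.g. by
# energy or by correlation with a reference channel).
import numpy as np
from sklearn.decomposition import FastICA

def ica_preprocess(mics, n_components=None):
    """mics: (num_samples, num_mics). Returns the estimated independent components."""
    ica = FastICA(n_components=n_components, max_iter=500, random_state=0)
    return ica.fit_transform(mics)                      # (num_samples, n_components)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 8000.0
    speech_like = np.sin(2 * np.pi * 200 * t) * np.sin(2 * np.pi * 3 * t)   # toy "speech"
    noise = rng.normal(scale=0.5, size=t.size)
    mixing = np.array([[1.0, 0.6], [0.4, 1.0]])                             # toy 2-mic mixing
    mics = np.stack([speech_like, noise], axis=1) @ mixing.T
    print(ica_preprocess(mics, n_components=2).shape)                       # (16000, 2)
```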

2.
A TMS320DM642-Based Microphone Array Sound Source Localization System (cited by 1: self 0, others 1)
李致金  乔杰 《测控技术》2011,30(1):35-38
Microphone-array sound source localization picks up speech signals with a microphone array and locates the source by analyzing and processing them with digital signal processing techniques. In microphone array source localization, picking out the endpoints of the speech signal is an important step. Speech endpoint detection applies an endpoint detection algorithm to the received signals to determine the arrival endpoints of the speech signal at the array; then, from the relative order in which the speech endpoints are received at the individual microphones of the array, the system computes the ... (abstract truncated)
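As an illustrative companion to this abstract, a short-time-energy onset detector is sketched below; the frame length, hop, and threshold rule are assumptions, not the cited system's parameters. Comparing the detected onsets across the array's channels yields the arrival-order information the abstract refers to.

```python
# Hedged sketch: energy-based speech endpoint (onset) detection for one channel.
# Frame size, hop and the noise-adaptive threshold rule are illustrative choices.
import numpy as np

def detect_onset(x, fs, frame_ms=20.0, hop_ms=10.0, noise_frames=10, factor=3.0):
    """Return the sample index where speech is first detected, or None."""
    frame, hop = int(fs * frame_ms / 1000), int(fs * hop_ms / 1000)
    energies = np.array([np.mean(x[i:i + frame] ** 2)
                         for i in range(0, len(x) - frame, hop)])
    threshold = factor * np.mean(energies[:noise_frames])   # assume leading frames are noise-only
    above = np.nonzero(energies > threshold)[0]
    return None if above.size == 0 else int(above[0]) * hop

if __name__ == "__main__":
    fs = 8000
    rng = np.random.default_rng(1)
    x = rng.normal(scale=0.01, size=fs)                                  # 1 s of background noise
    x[4000:6000] += np.sin(2 * np.pi * 300 * np.arange(2000) / fs)       # a "speech" burst
    print(detect_onset(x, fs))                                           # roughly 4000
```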

3.
A Study of Microphone Arrays and Their Noise Reduction Performance (cited by 2: self 0, others 2)
杨毅  杨宇  余达太 《计算机工程》2006,32(2):191-193
Speech processing with a microphone array can raise the signal-to-noise ratio and address the degradation of speech recognition performance caused by environmental noise, echo, and reverberation. A microphone array system consists of a set of microphones arranged in a given geometric configuration. Such an array receives signals propagating through space and, after suitable signal processing, extracts the desired source and its signal attributes. Using such a system can greatly improve speech recognition performance in strongly interfering environments.

4.
何培宇  刘开文 《测控技术》2004,23(Z1):206-207,211
This paper builds a DSP experimental platform for blind speech signal separation based on a microphone array, focusing on the time-division multiplexing of the microphone array and on bidirectional data transfer using DMA combined with the McBSP.

5.
Speech processing with a microphone array can raise the signal-to-noise ratio and address the degradation of speech recognition performance caused by environmental noise, echo, and reverberation. This paper introduces the basic principles of microphone array speech enhancement based on the delay-and-sum method (the classical beamforming approach), adaptive beamforming, and structures with adaptive post-filtering, and summarizes the characteristics of each algorithm.
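Since this survey abstract centres on the delay-and-sum (classical beamforming) structure, a compact far-field delay-and-sum sketch for a uniform linear array follows; the nearest-sample delay rounding and the geometry are simplifications, and the adaptive and post-filtered structures mentioned above are not shown.

```python
# Hedged sketch: far-field delay-and-sum beamforming for a uniform linear array.
# Delays are rounded to whole samples; practical systems use fractional delays.
import numpy as np

C = 343.0                                      # speed of sound, m/s

def delay_and_sum(mics, fs, spacing, angle_deg):
    """mics: (num_mics, num_samples). Steer the beam to angle_deg from broadside."""
    num_mics, num_samples = mics.shape
    tau = spacing * np.sin(np.deg2rad(angle_deg)) / C          # inter-element delay, seconds
    out = np.zeros(num_samples)
    for m in range(num_mics):
        out += np.roll(mics[m], -int(round(m * tau * fs)))     # align, then sum
    return out / num_mics

if __name__ == "__main__":
    fs, spacing = 16000, 0.05
    s = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)
    rng = np.random.default_rng(2)
    # Simulate a 4-mic array receiving the source from 30 degrees plus sensor noise.
    delays = [int(round(m * spacing * np.sin(np.deg2rad(30)) / C * fs)) for m in range(4)]
    mics = np.stack([np.roll(s, d) + 0.3 * rng.normal(size=fs) for d in delays])
    y = delay_and_sum(mics, fs, spacing, 30.0)
    print(np.var(mics[0]), np.var(y))          # uncorrelated noise is averaged down
```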

6.
Microphone Array Speech Enhancement Based on a Subband Generalized Sidelobe Canceller (cited by 2: self 0, others 2)
To speed up the convergence of a microphone array speech enhancement system based on the generalized sidelobe canceller (GSC), the input signal of its adaptive module is decomposed into subbands for processing, and a multichannel Wiener filter is introduced into the non-adaptive branch of the GSC to suppress incoherent noise more effectively. Measurements on real data show that, compared with a system based on a full-band GSC, the speech enhancement system with this subband GSC structure converges faster and achieves a higher output signal-to-noise ratio.
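For context, a full-band generalized sidelobe canceller sketch is given below (fixed delay-and-sum branch, difference-based blocking matrix, NLMS noise canceller); the subband decomposition of the adaptive path and the multichannel Wiener filter that are the contributions of this paper are deliberately omitted, and the channels are assumed to be pre-aligned to the target.

```python
# Hedged sketch: full-band GSC with a difference blocking matrix and an NLMS
# multichannel noise canceller. The cited paper's subband decomposition and
# multichannel Wiener stage are not reproduced here.
import numpy as np

def gsc(aligned, filt_len=32, mu=0.1, eps=1e-6):
    """aligned: (num_mics, num_samples), channels already time-aligned to the target."""
    num_mics, n = aligned.shape
    fixed = aligned.mean(axis=0)                        # fixed (delay-and-sum) branch
    blocked = aligned[1:] - aligned[:-1]                # blocking matrix cancels the target
    w = np.zeros((num_mics - 1, filt_len))
    out = np.zeros(n)
    for i in range(filt_len, n):
        u = blocked[:, i - filt_len:i][:, ::-1]         # reference snapshots per channel
        e = fixed[i] - np.sum(w * u)                    # enhanced output sample
        w += mu * e * u / (np.sum(u * u) + eps)         # NLMS update
        out[i] = e
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    n = 16000
    target = np.sin(2 * np.pi * 250 * np.arange(n) / 8000)
    noise = rng.normal(size=n)
    aligned = np.stack([target + g * noise for g in (0.8, 1.0, 1.2, 0.9)])
    print(np.var(gsc(aligned) - target))                # residual power, well below np.var(noise)
```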

7.
This work studies a high-SNR directional sound pickup algorithm for microphone arrays and designs and implements a microphone array speech acquisition system. By applying digital signal processing to audio captured from different directions in space, the main lobe of the beam formed by the array is steered toward the target speech while nulls are steered toward interference sources, raising the pickup SNR and enabling directional acquisition of the sound source. Test results show that the system performs well: the formed beam has a narrow main lobe, and high-SNR directional pickup is achieved.

8.
With the steady growth of human-machine speech interaction scenarios in recent years, using microphone array speech enhancement to improve speech quality has become an active research topic. Unlike environmental noise, in multi-speaker separation scenarios the interfering speakers' voices are speech signals just like the target speaker's and exhibit similar time and frequency characteristics, which poses a greater challenge to conventional microphone array speech enhancement techniques. For the multi-speaker separation scenario, this work constructs and optimizes a cost function for the array's spatial response based on a deep learning network; the desired spatial transfer characteristics of the microphone array are designed through deep learning model training, so that separation is improved by improving the beam's directivity. Simulations and experiments show that the method effectively improves multi-speaker separation performance.

9.
For a four-element microphone array geometry, this paper proposes a frequency-domain adaptive noise cancellation speech enhancement method that can effectively improve the performance of speech recognition systems in noisy environments. The algorithm is highly robust to the array geometry design and to array mismatch, making it easy to deploy across a variety of devices and usage scenarios. Simulation results show that, without any prior information such as noise statistics, the algorithm markedly improves the noise suppression of the microphone array and effectively raises the wake-up and recognition rates of smart interactive devices.
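The abstract does not spell out the filter structure, so the sketch below is only a generic per-bin frequency-domain adaptive noise canceller with one microphone treated as the primary channel and another as a noise reference; the four-element arrangement and any robustness mechanisms of the cited method are not represented.

```python
# Hedged sketch: per-frequency-bin NLMS noise cancellation in the STFT domain.
# Treating one microphone as a noise-only reference is an assumption made for
# illustration; it is not claimed to be the structure used in the cited paper.
import numpy as np
from scipy.signal import stft, istft

def freq_domain_anc(primary, reference, fs, nperseg=256, mu=0.2, eps=1e-8):
    _, _, P = stft(primary, fs, nperseg=nperseg)
    _, _, R = stft(reference, fs, nperseg=nperseg)
    w = np.zeros(P.shape[0], dtype=complex)                 # one complex tap per bin
    E = np.empty_like(P)
    for k in range(P.shape[1]):                             # frame by frame
        E[:, k] = P[:, k] - w * R[:, k]                     # enhanced spectrum
        w += mu * np.conj(R[:, k]) * E[:, k] / (np.abs(R[:, k]) ** 2 + eps)
    _, out = istft(E, fs, nperseg=nperseg)
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    fs = 8000
    noise_src = rng.normal(size=2 * fs)
    speech = np.sin(2 * np.pi * 300 * np.arange(2 * fs) / fs)
    primary = speech + 0.8 * np.roll(noise_src, 5)          # speech plus delayed noise
    enhanced = freq_domain_anc(primary, noise_src, fs)
    m = min(len(enhanced), len(speech))
    print(np.var(primary - speech), np.var(enhanced[:m] - speech[:m]))  # before vs. after
```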

10.
Microphone array sound source localization offers an effective way to estimate a speaker's spatial position in complex environments. Classical array signal processing theory, developed for radar and sonar systems, is already mature, and many of its algorithms can be adapted for microphone array source localization. Taking the classical MUSIC (Multiple Signal Classification) algorithm as a starting point and considering the characteristics of speech signals in this application, this paper introduces a near-field signal model and modifies the algorithm accordingly to make source localization more accurate. Simulation experiments show that the algorithm has good localization performance, which further improves as the signal-to-noise ratio increases.
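A minimal narrow-band, far-field MUSIC sketch for a uniform linear array is shown below for reference; the near-field (spherical-wave) steering vector that the abstract introduces is exactly the part this sketch does not include.

```python
# Hedged sketch: classical narrow-band far-field MUSIC for a uniform linear
# array. The cited work replaces the plane-wave steering vector with a
# near-field one; that modification is not shown here.
import numpy as np

def music_spectrum(snapshots, num_sources, spacing, wavelength, angles_deg):
    """snapshots: (num_mics, num_snapshots) complex baseband data."""
    num_mics = snapshots.shape[0]
    R = snapshots @ snapshots.conj().T / snapshots.shape[1]      # sample covariance
    _, eigvecs = np.linalg.eigh(R)                               # eigenvalues in ascending order
    En = eigvecs[:, : num_mics - num_sources]                    # noise subspace
    spectrum = []
    for theta in np.deg2rad(angles_deg):
        a = np.exp(-2j * np.pi * spacing * np.arange(num_mics) * np.sin(theta) / wavelength)
        spectrum.append(1.0 / np.linalg.norm(En.conj().T @ a) ** 2)
    return np.array(spectrum)

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    num_mics, spacing, wavelength = 8, 0.04, 0.08                # half-wavelength spacing
    a_true = np.exp(-2j * np.pi * spacing * np.arange(num_mics)
                    * np.sin(np.deg2rad(20.0)) / wavelength)
    s = rng.normal(size=200) + 1j * rng.normal(size=200)
    X = np.outer(a_true, s) + 0.1 * (rng.normal(size=(num_mics, 200))
                                     + 1j * rng.normal(size=(num_mics, 200)))
    angles = np.arange(-90, 91)
    print(angles[np.argmax(music_spectrum(X, 1, spacing, wavelength, angles))])  # about 20
```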

11.
Locating and tracking a speaker in real time using microphone arrays is important in many applications such as hands-free video conferencing, speech processing in large rooms, and acoustic echo cancellation. A speaker can be moving from the far field to the near field of the array, or vice versa. Many neural-network-based localization techniques exist, but they are applicable to either far-field or near-field sources, and are computationally intensive for real-time speaker localization applications because of the wide-band nature of the speech. We propose a unified neural-network-based source localization technique, which is simultaneously applicable to wide-band and narrow-band signal sources that are in the far field or near field of a microphone array. The technique exploits a multilayer perceptron feedforward neural network structure and forms the feature vectors by computing the normalized instantaneous cross-power spectrum samples between adjacent pairs of sensors. Simulation results indicate that our technique is able to locate a source with an absolute error of less than 3.5 degrees at a signal-to-noise ratio of 20 dB and a sampling rate of 8000 Hz at each sensor.
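The feature described in this abstract can be sketched as follows: for each adjacent microphone pair, form the cross-power spectrum of one analysis frame and normalize away the magnitude; the frame length and the exact normalization are assumptions. The resulting vectors would then be fed to the multilayer perceptron mentioned above.

```python
# Hedged sketch: normalized instantaneous cross-power spectrum features for
# adjacent microphone pairs, of the kind used as neural-network inputs in the
# cited work. Frame length and normalization details are illustrative.
import numpy as np

def cross_power_features(frames):
    """frames: (num_mics, frame_len), one analysis frame per microphone.
    Returns a real vector stacking Re/Im of each adjacent pair's normalized
    cross-power spectrum."""
    spectra = np.fft.rfft(frames, axis=1)
    feats = []
    for m in range(frames.shape[0] - 1):
        cps = spectra[m] * np.conj(spectra[m + 1])
        cps = cps / (np.abs(cps) + 1e-12)                # keep phase, discard magnitude
        feats.append(np.concatenate([cps.real, cps.imag]))
    return np.concatenate(feats)

if __name__ == "__main__":
    rng = np.random.default_rng(6)
    frames = rng.normal(size=(4, 256))                    # 4 microphones, 256-sample frame
    print(cross_power_features(frames).shape)             # 3 pairs * 2 * 129 = (774,)
```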

12.
Speech interaction systems are currently in high demand for quick hands-free interactions. Conventional speech interaction systems (SISs) are trained to the user's voice, whereas most modern systems learn from interaction experience over time. However, because speech expresses a natural human-computer interaction (HCNI) with the world, SIS design must lead to computer interfaces that can receive spoken information and act appropriately upon it. In spite of significant advancements in SISs in recent years, a large number of problems remain to be solved before SISs can be applied successfully in practice and comfortably accepted by users. Among many others, the problems of devising an efficient model are considered the primary and most important step in deploying speech recognition in hands-free applications. Meanwhile, brain-computer interfaces (BCIs) allow users to control applications by brain activity. The work presented in this paper emphasizes an improved implementation of an SIS that integrates a BCI, associating brain signals with a list of commands as identification criteria for each specific command, so that a wheelchair can be controlled with spoken commands.

13.
This paper proposes a speech enhancement approach to suppress the interference of car noise. A linear microphone array is adopted for far-talking speech acquisition and delay-and-sum beamforming noise reduction. We present an effective time delay estimator based on the coherence function between the reference microphone and the beamformed speech. To further enhance the beamformed speech, we exploit an improved Wiener filter: the noise correlation across the microphone array is relatively small, so the performance of optimal Wiener filtering can be approached. In addition, because car speech is seriously degraded at low frequencies, we develop a spectral weighting function to compensate for the low-frequency filtering. These two processing units serve as post-filters to attain the desired enhancement performance. In experiments on microphone array speech in the presence of real and simulated car noise, the proposed algorithm performs well. Performance is measured in terms of the signal-to-noise ratio and the word error rate. The combination of the delay-and-sum beamformer and the two post-filters obtains the best results among the methods compared.
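For illustration, a cross-spectrum-based time-delay estimator between a reference microphone and the beamformed signal is sketched below; the cited paper weights the cross-spectrum with a coherence function, whereas this sketch uses PHAT-style whitening as a stand-in, which is an assumption.

```python
# Hedged sketch: time-delay estimation between a reference microphone and the
# beamformed signal via a whitened cross-power spectrum (PHAT-style weighting
# stands in for the coherence weighting used in the cited paper).
import numpy as np

def estimate_delay(ref, other, fs):
    """Return the delay of `ref` relative to `other`, in seconds."""
    n = len(ref) + len(other)
    cross = np.fft.rfft(ref, n) * np.conj(np.fft.rfft(other, n))
    cross /= np.abs(cross) + 1e-12                        # whitening
    cc = np.fft.irfft(cross, n)
    cc = np.concatenate([cc[-(len(other) - 1):], cc[:len(ref)]])   # negative lags first
    lag = int(np.argmax(np.abs(cc))) - (len(other) - 1)
    return lag / fs

if __name__ == "__main__":
    fs = 16000
    rng = np.random.default_rng(7)
    s = rng.normal(size=fs)
    delayed = np.roll(s, 40) + 0.2 * rng.normal(size=fs)
    print(estimate_delay(delayed, s, fs) * fs)            # approximately 40 samples
```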

14.
Voice activity detection (VAD) is essential for processing with multiple microphone arrays, where large numbers of potential devices, such as microphones for far-field voice interaction in smart home environments, are activated when sound sources appear. VAD can therefore save substantial computing resources in massive microphone array processing, because sound source activity is sparse. However, obtaining accurate VAD may not be feasible in harsh environments such as far-field, time-varying noise fields. In this paper, long-term speech information (LTSI) and the log-energy are modeled to derive a more accurate VAD. First, the LTSI is obtained by measuring the differential entropy of the long-term smoothed noisy signal spectrum. Then, the LTSI is used to produce labeled data for initializing a Gaussian mixture model (GMM) that fits the log-energy distributions of noise and (noisy) speech. Finally, combining the LTSI with the GMM parameters of the noise and speech distributions, this paper derives an adaptive threshold that represents a reasonable boundary between noise and speech. Experimental results show that our VAD method yields a remarkable improvement for a massive microphone network.
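A much-reduced sketch of the energy-modelling part of this approach is given below: a two-component Gaussian mixture is fitted to frame log-energies and the boundary between the two modes serves as the decision threshold. The LTSI-based initialization and the adaptive threshold derivation described in the abstract are omitted, and the midpoint rule is an assumption.

```python
# Hedged sketch: frame-level VAD from a two-component Gaussian mixture fitted
# to log-energies. The LTSI initialization and adaptive threshold of the cited
# paper are omitted; the midpoint between the two means is used instead.
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_vad(x, fs, frame_ms=25.0):
    frame = int(fs * frame_ms / 1000)
    n_frames = len(x) // frame
    log_e = np.log(np.mean(x[: n_frames * frame].reshape(n_frames, frame) ** 2, axis=1) + 1e-12)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(log_e.reshape(-1, 1))
    threshold = gmm.means_.ravel().mean()                 # simple boundary between the two modes
    return log_e > threshold                              # True = speech frame

if __name__ == "__main__":
    fs = 8000
    rng = np.random.default_rng(8)
    x = rng.normal(scale=0.01, size=2 * fs)
    x[8000:12000] += np.sin(2 * np.pi * 200 * np.arange(4000) / fs)    # a "speech" segment
    print(gmm_vad(x, fs).astype(int))                     # speech frames flagged as 1
```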

15.
A novel approach for joint speaker identification and speech recognition is presented in this article. Unsupervised speaker tracking and automatic adaptation of the human-computer interface are achieved through the interaction of speaker identification, speech recognition, and speaker adaptation for a limited number of recurring users. Together with a technique for efficient information retrieval, a compact modeling of speech and speaker characteristics is presented. Applying speaker-specific profiles allows speech recognition to take individual speech characteristics into account and thus achieve higher recognition rates. Speaker profiles are initialized and continuously adapted by a balanced strategy of short-term and long-term speaker adaptation combined with robust speaker identification. Different users can be tracked by the resulting self-learning speech-controlled system, which requires only a very short enrollment from each speaker. Subsequent utterances are used for unsupervised adaptation, resulting in continuously improved speech recognition rates. Additionally, the detection of unknown speakers is examined with the aim of avoiding the need to train new speaker profiles explicitly. The speech-controlled system presented here is suitable for in-car applications on embedded devices, e.g. speech-controlled navigation, hands-free telephony, or infotainment systems. Results are presented for a subset of the SPEECON database and validate the benefit of the speaker adaptation scheme and the unified modeling in terms of speaker identification and speech recognition rates.

16.
Building on time-difference-of-arrival (TDOA) localization, this work systematically studies microphone array sound source localization in environments where noise and reverberation are present simultaneously. On the basis of the classical LMS adaptive algorithm, a new LMS adaptive time-delay estimation method that exploits speech-activity information is proposed and combined with a planar four-element geometric localization method. Experiments in a simulated room environment verify that the method is robust to noise and reverberation; it is a source localization method with high accuracy and low computational cost that can be used for real-time localization.
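A bare-bones LMS adaptive time-delay estimator of the kind this abstract builds on is sketched below: an adaptive FIR filter models one microphone as a filtered version of the other, and the delay is read from the dominant tap. The speech-activity gating and the planar four-element geometric localization step are not shown, and the filter length and step size are assumptions.

```python
# Hedged sketch: LMS adaptive time-delay estimation between two microphones.
# An adaptive FIR filter predicts mic 2 from mic 1; the delay is the index of
# the largest filter tap relative to the centre of the filter.
import numpy as np

def lms_tde(x1, x2, max_delay=64, mu=0.05):
    L = 2 * max_delay + 1
    w = np.zeros(L)
    for i in range(L, len(x1)):
        u = x1[i - L + 1:i + 1][::-1]                      # u[k] = x1[i - k]
        e = x2[i - max_delay] - np.dot(w, u)               # predict mic 2 around the centre lag
        w += mu * e * u / (np.dot(u, u) + 1e-8)            # normalized LMS update
    return int(np.argmax(np.abs(w))) - max_delay           # estimated delay in samples

if __name__ == "__main__":
    rng = np.random.default_rng(9)
    x1 = rng.normal(size=8000)
    x2 = np.roll(x1, 9) + 0.1 * rng.normal(size=8000)      # mic 2 lags mic 1 by 9 samples
    print(lms_tde(x1, x2))                                  # approximately 9
```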

17.
Acoustic echo cancellation is one of the most demanding requirements in hands-free telephone and teleconference communication. This paper proposes an Empirical Mode Decomposition (EMD)-based sub-band adaptive filtering structure, which applies an EMD-based algorithm to the far-end speech signal and the microphone output to obtain two sets of intrinsic mode functions (IMFs). Each IMF set is then separated into different bands based on the power spectral density (PSD) of every IMF. Experimental signals were collected in a medium-size office room, and simulations were run under different conditions with three types of EMD-based algorithms. The results show that the proposed structure is able to model the transfer function of the unknown environment and track changes in the room much faster than the normalized adaptive filtering structure. The ensemble EMD (EEMD) algorithm and the noise-modulated EMD (NEMD) are shown to outperform the EMD algorithm in terms of echo return loss enhancement.
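A structural sketch of EMD-based sub-band echo cancellation follows: the far-end signal is decomposed into IMFs and one NLMS filter is run per IMF against the microphone signal. It assumes the third-party PyEMD package (installed as EMD-signal); the PSD-based band grouping, the decomposition of the microphone output, and the EEMD/NEMD variants evaluated in the paper are not reproduced.

```python
# Hedged sketch: EMD-based sub-band structure for acoustic echo cancellation.
# PyEMD (package "EMD-signal") is assumed to be installed; this cascade of
# per-IMF NLMS filters is a simplification of the cited structure.
import numpy as np
from PyEMD import EMD

def nlms(reference, desired, filt_len=64, mu=0.2, eps=1e-8):
    w = np.zeros(filt_len)
    echo_est = np.zeros(len(desired))
    for i in range(filt_len, len(desired)):
        u = reference[i - filt_len:i][::-1]
        y = np.dot(w, u)                                   # echo estimate for this band
        e = desired[i] - y
        w += mu * e * u / (np.dot(u, u) + eps)
        echo_est[i] = y
    return echo_est

def emd_subband_aec(far_end, mic, filt_len=64):
    imfs = EMD().emd(far_end)                              # (num_imfs, num_samples)
    echo_est = np.zeros(len(mic))
    residual = mic.astype(float).copy()
    for imf in imfs:                                       # one adaptive filter per IMF "band"
        est = nlms(imf, residual, filt_len)
        echo_est += est
        residual = residual - est                          # the remaining error drives the next band
    return mic - echo_est                                  # near-end estimate

if __name__ == "__main__":
    rng = np.random.default_rng(10)
    far = rng.normal(size=4000)
    room = np.array([0.0, 0.6, 0.3, 0.1])                  # toy echo path
    mic = np.convolve(far, room)[:4000] + 0.05 * rng.normal(size=4000)
    out = emd_subband_aec(far, mic)
    print(np.var(mic), np.var(out))                        # echo power before vs. after cancellation
```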

18.
Differential microphone arrays offer an important route to sound source localization with small-aperture arrays. Speech signals are sparse, and this sparsity can be exploited for multi-source direction-of-arrival (DOA) estimation with a differential microphone array; the histogram method is a typical approach. For differential microphone arrays, this paper proposes a multi-source DOA estimation method based on the short-time averaged complex acoustic intensity with time-frequency masking and fuzzy clustering analysis. The choice of the frequency range used for time-frequency masking under different array sizes is analyzed. The method has a closed-form solution, outperforms the histogram method in strongly reverberant and noisy environments, and is less affected by changes in array size. To improve the histogram method, a modified histogram method based on the time-frequency masking idea is also given. Simulation results in reverberant and noisy environments verify the effectiveness of the proposed method.
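To illustrate the complex-acoustic-intensity idea, the sketch below estimates a single azimuth from two closely spaced, orthogonal microphone pairs, using a finite-difference estimate of the particle velocity and an energy mask over time-frequency bins. The differential-array formulation, the masking band selection, and the fuzzy clustering for multiple sources described in the abstract are not reproduced, and the cross geometry is an assumption.

```python
# Hedged sketch: single-source azimuth from short-time complex acoustic
# intensity, using a small cross of two orthogonal microphone pairs.
import numpy as np

C, RHO = 343.0, 1.21                         # speed of sound (m/s), air density (kg/m^3)

def intensity_doa(xp, xm, yp, ym, fs, d, frame=512):
    """xp/xm and yp/ym: microphone pairs along the +x/-x and +y/-y axes, spaced d metres apart."""
    n_frames = len(xp) // frame
    Ix = Iy = 0.0
    for k in range(n_frames):
        sl = slice(k * frame, (k + 1) * frame)
        Xp, Xm = np.fft.rfft(xp[sl]), np.fft.rfft(xm[sl])
        Yp, Ym = np.fft.rfft(yp[sl]), np.fft.rfft(ym[sl])
        omega = 2 * np.pi * np.fft.rfftfreq(frame, 1 / fs)
        omega[0] = 1.0                                       # dummy value; the DC bin is masked out
        P = (Xp + Xm + Yp + Ym) / 4                          # pressure at the array centre
        Vx = 1j * (Xp - Xm) / d / (omega * RHO)              # particle velocity via Euler's equation
        Vy = 1j * (Yp - Ym) / d / (omega * RHO)
        mask = np.abs(P) > 0.1 * np.max(np.abs(P))           # keep only energetic bins
        mask[0] = False
        Ix += np.sum((P * np.conj(Vx)).real[mask])           # active intensity, x component
        Iy += np.sum((P * np.conj(Vy)).real[mask])
    return np.degrees(np.arctan2(-Iy, -Ix))                  # intensity points away from the source

if __name__ == "__main__":
    fs, d, f0, az = 16000, 0.04, 1000.0, 60.0
    u = np.array([np.cos(np.radians(az)), np.sin(np.radians(az))])     # unit vector toward the source
    pos = {"xp": (d / 2, 0.0), "xm": (-d / 2, 0.0), "yp": (0.0, d / 2), "ym": (0.0, -d / 2)}
    t = np.arange(fs) / fs
    sigs = {k: np.cos(2 * np.pi * f0 * (t + np.dot(p, u) / C)) for k, p in pos.items()}
    print(intensity_doa(sigs["xp"], sigs["xm"], sigs["yp"], sigs["ym"], fs, d))  # about 60
```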

19.
In distributed meeting applications, microphone arrays have been widely used to capture superior speech sound and perform speaker localization through sound source localization (SSL) and beamforming. This paper presents a unified maximum likelihood framework of these two techniques, and demonstrates how such a framework can be adapted to create efficient SSL and beamforming algorithms for reverberant rooms and unknown directional patterns of microphones. The proposed method is closely related to steered response power-based algorithms, which are known to work extremely well in real-world environments. We demonstrate the effectiveness of the proposed method on challenging synthetic and real-world datasets, including over six hours of recorded meetings.
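As a point of reference for the steered-response-power connection mentioned in the abstract, a basic far-field SRP-PHAT sketch over candidate azimuths for a uniform linear array is given below; the maximum-likelihood weighting and the handling of unknown microphone directivity from the paper are not included.

```python
# Hedged sketch: steered response power with PHAT weighting (SRP-PHAT) over a
# grid of candidate azimuths for a uniform linear array, far-field assumption.
import numpy as np

C = 343.0

def srp_phat(mics, fs, spacing, angles_deg, nfft=2048):
    """mics: (num_mics, num_samples). Returns the SRP value for each candidate angle."""
    num_mics = mics.shape[0]
    X = np.fft.rfft(mics, nfft, axis=1)
    freqs = np.fft.rfftfreq(nfft, 1 / fs)
    power = np.zeros(len(angles_deg))
    for ai, ang in enumerate(np.deg2rad(angles_deg)):
        tdoa = spacing * np.arange(num_mics) * np.sin(ang) / C      # candidate delay per mic
        steer = np.exp(2j * np.pi * freqs[None, :] * tdoa[:, None])
        for i in range(num_mics):
            for j in range(i + 1, num_mics):
                g = X[i] * np.conj(X[j])
                g /= np.abs(g) + 1e-12                              # PHAT weighting
                power[ai] += np.sum(g * steer[i] * np.conj(steer[j])).real
    return power

if __name__ == "__main__":
    fs, spacing, num_mics = 16000, 0.05, 4
    rng = np.random.default_rng(11)
    s = rng.normal(size=2048)
    true_delays = spacing * np.arange(num_mics) * np.sin(np.deg2rad(25.0)) / C
    mics = np.stack([np.roll(s, int(round(d * fs))) + 0.2 * rng.normal(size=2048)
                     for d in true_delays])
    angles = np.arange(-90, 91)
    print(angles[np.argmax(srp_phat(mics, fs, spacing, angles))])   # near 25
```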
