Found 19 similar documents (search time: 125 ms)
1.
2.
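A Microphone Array Sound Source Localization System Based on the TMS320DM642 (total citations: 1; self-citations: 0; citations by others: 1)
Microphone sound source localization is a technique that uses a microphone array to pick up speech signals and then analyzes and processes them with digital signal processing. In microphone-array sound source localization, picking up the endpoints of the speech signal is an important step. Speech endpoint detection applies an endpoint detection algorithm to the received signal to determine the endpoints at which the speech signal arrives at the microphone array; it then uses the order in which the speech-signal endpoints arrive at the individual microphones of the array to compute the microphone array's received...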
3.
4.
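This paper builds a DSP experimental platform for blind speech signal separation based on a microphone array, and focuses on time-division multiplexing of the microphone array and on bidirectional data transfer using DMA combined with McBSP.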
5.
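Speech processing with microphone arrays can improve the signal-to-noise ratio and counter the loss of speech recognition performance caused by ambient noise, echo, and reverberation. This paper introduces the basic principles of microphone-array speech enhancement based on the delay-and-sum method (conventional beamforming), adaptive beamforming, and structures with adaptive post-filtering, and summarizes the characteristics of each algorithm.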
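The delay-and-sum idea named in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: the integer sample delays, the four-microphone setup, and the helper names `delay_and_sum` and `snr_db` are all assumptions chosen for illustration.

```python
import math
import random


def delay_and_sum(channels, delays):
    """Align each channel by its integer sample delay and average them.

    channels: list of equal-length sample lists, one per microphone.
    delays:   delays[m] is how many samples later the source reaches mic m
              (hypothetical input; in practice it would come from the array
              geometry or from a time-delay estimator).
    """
    usable = len(channels[0]) - max(delays)
    output = []
    for t in range(usable):
        # Source sample t appears at index t + delays[m] on microphone m.
        total = sum(ch[t + d] for ch, d in zip(channels, delays))
        output.append(total / len(channels))
    return output


def snr_db(clean, noisy):
    """Signal-to-noise ratio in dB of `noisy` relative to `clean`."""
    signal = sum(c * c for c in clean)
    noise = sum((x - c) ** 2 for c, x in zip(clean, noisy))
    return 10 * math.log10(signal / noise)


# Synthetic test case (assumed): one sine source, independent noise per mic.
random.seed(0)
n = 2000
delays = [0, 3, 6, 9]
source = [math.sin(2 * math.pi * 0.01 * t) for t in range(n)]
channels = []
for d in delays:
    delayed = [0.0] * d + source[: n - d]
    channels.append([s + random.gauss(0, 0.5) for s in delayed])

enhanced = delay_and_sum(channels, delays)
usable = n - max(delays)
snr_single = snr_db(source[:usable], channels[0][:usable])
snr_beam = snr_db(source[:usable], enhanced)
```

With independent noise on each microphone, averaging the aligned channels leaves the source unchanged while the noise power drops roughly in proportion to the number of microphones, so `snr_beam` should exceed `snr_single`.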
6.
7.
8.
9.
《电子制作.电脑维护与应用》2020,(15)
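For a four-element microphone array, this paper proposes a frequency-domain adaptive noise cancellation method for speech enhancement that effectively improves the performance of speech recognition systems in noisy environments. The algorithm is highly robust to array geometry design and array mismatch, which makes it easy to use across devices and usage scenarios. Simulation results show that, without prior information such as noise statistics, the algorithm significantly improves the noise suppression of the microphone array and effectively raises the wake-up rate and recognition rate of smart interactive devices.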
10.
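Microphone-array sound source localization offers an effective solution for estimating a speaker's spatial position in complex environments. The array signal processing theory traditionally applied in radar and sonar systems is already highly mature, and many array signal processing algorithms can be used for microphone-array sound source localization with some modification. Taking the classic array signal processing algorithm MUSIC (Multiple Signal Classification) as a prototype, and considering the characteristics of speech signals in practice, this paper introduces a near-field signal model and uses it to improve the algorithm so that source localization becomes more accurate. Simulation experiments show that the algorithm localizes well and that its performance improves as the signal-to-noise ratio increases.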
11.
Locating and tracking a speaker in real time using microphone arrays is important in many applications such as hands-free video conferencing, speech processing in large rooms, and acoustic echo cancellation. A speaker can be moving from the far field to the near field of the array, or vice versa. Many neural-network-based localization techniques exist, but they are applicable to either far-field or near-field sources, and are computationally intensive for real-time speaker localization applications because of the wide-band nature of the speech. We propose a unified neural-network-based source localization technique, which is simultaneously applicable to wide-band and narrow-band signal sources that are in the far field or near field of a microphone array. The technique exploits a multilayer perceptron feedforward neural network structure and forms the feature vectors by computing the normalized instantaneous cross-power spectrum samples between adjacent pairs of sensors. Simulation results indicate that our technique is able to locate a source with an absolute error of less than 3.5 degrees at a signal-to-noise ratio of 20 dB and a sampling rate of 8000 Hz at each sensor.
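One possible reading of the feature construction in this abstract is sketched below. The abstract does not give the exact normalization, so dividing each cross-power sample by its magnitude is an assumption, as are the function names `dft_bin` and `cross_power_features` and the choice to emit real and imaginary parts as separate features.

```python
import cmath
import math


def dft_bin(frame, k):
    """Single DFT coefficient of `frame` at bin k (direct summation)."""
    n = len(frame)
    return sum(x * cmath.exp(-2j * math.pi * k * t / n)
               for t, x in enumerate(frame))


def cross_power_features(frames, bins):
    """Unit-magnitude cross-power samples between adjacent sensor pairs.

    frames: one time-domain frame per sensor, in array order.
    bins:   DFT bins to evaluate (hypothetical parameter).
    Returns a flat list [re, im, re, im, ...] suitable as a network input.
    """
    features = []
    for a, b in zip(frames, frames[1:]):
        for k in bins:
            cross = dft_bin(a, k) * dft_bin(b, k).conjugate()
            mag = abs(cross)
            # Assumed normalization: keep only the inter-sensor phase.
            cross = cross / mag if mag > 1e-12 else 0j
            features.extend([cross.real, cross.imag])
    return features


# Synthetic check (assumed): a tone at bin 4 lagging 0.3 rad per sensor.
n, k, lag = 64, 4, 0.3
frames = [[math.cos(2 * math.pi * k * t / n - lag * m) for t in range(n)]
          for m in range(3)]
feats = cross_power_features(frames, [k])
```

Because the magnitude is divided out, each feature pair depends only on the phase difference between neighbouring sensors, which is the quantity that carries the direction information; here every pair comes out as (cos 0.3, sin 0.3).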
12.
Speech interaction systems are currently in high demand for quick hands-free interaction. Conventional speech interaction systems (SISs) are trained to the user's voice, whilst most modern systems learn from interaction experience over time. However, because speech expresses a human computer natural interaction (HCNI) with the world, SIS design must lead to an interface computer system that can receive spoken information and act appropriately upon that information. Despite significant advances in SISs in recent years, a large number of problems remain to be solved before SISs can be applied successfully in practice and accepted comfortably by users. Among these, devising and efficient modeling are considered the primary and most important step in deploying speech recognition in hands-free applications. Meanwhile, brain–computer interfaces (BCIs) allow users to control applications by brain activity. The work presented in this paper emphasizes an improved implementation of SIS that integrates BCI, associating the brain signals for a list of commands as identification criteria for each specific command, in order to control a wheelchair with spoken commands.
13.
This paper proposes a speech enhancement approach to suppress the interference of car noise. A linear microphone array is adopted for far-talking speech acquisition and delay-and-sum beamforming noise reduction. We present an effective time delay estimator using the coherence function between the reference microphone and the beamformed speech. To further enhance the beamformed speech, we exploit an improved Wiener filter where the resulting noise correlation in the microphone array is relatively small, so that the performance of optimal Wiener filtering can be achieved. Also, because speech in the car is seriously degraded at low frequencies, we develop a spectral weighting function to compensate for the low-frequency filtering. These two processing units serve as post filters to attain the desired enhancement performance. In experiments on microphone array speech in the presence of real and simulated car noise, we find that the proposed algorithm performs well. Performance is measured in terms of the signal-to-noise ratio and the word error rate. The combined delay-and-sum beamformer and two post filters obtain the best results compared to other methods.
14.
Voice activity detection (VAD) is essential for processing with multiple microphone arrays, in which massive numbers of potential devices, such as microphone devices for far-field voice-based interaction in smart home environments, will be activated when sound sources appear. Because sound source activity is sparse, VAD can therefore save a lot of computing resources in massive microphone array processing. However, it may not be feasible to obtain an accurate VAD in harsh environments, such as far-field conditions or time-varying noise fields. In this paper, the long-term speech information (LTSI) and the log-energy are modeled to derive a more accurate VAD. First, the LTSI is obtained by measuring the differential entropy of the long-term smoothed noisy signal spectrum. Then, the LTSI is used to get labeled data for the initialization of a Gaussian mixture model (GMM), which is used to fit the log-energy distribution of noise and (noisy) speech. Finally, combining the LTSI and the GMM parameters of the noise and speech distributions, this paper derives an adaptive threshold, which represents a reasonable boundary between noise and speech. Experimental results show that our VAD method yields a remarkable improvement for a massive microphone network.
15.
A novel approach for joint speaker identification and speech recognition is presented in this article. Unsupervised speaker tracking and automatic adaptation of the human-computer interface are achieved by the interaction of speaker identification, speech recognition and speaker adaptation for a limited number of recurring users. Together with a technique for efficient information retrieval, a compact modeling of speech and speaker characteristics is presented. Applying speaker-specific profiles allows speech recognition to take individual speech characteristics into consideration to achieve higher recognition rates. Speaker profiles are initialized and continuously adapted by a balanced strategy of short-term and long-term speaker adaptation combined with robust speaker identification. Different users can be tracked by the resulting self-learning speech controlled system. Only a very short enrollment of each speaker is required. Subsequent utterances are used for unsupervised adaptation, resulting in continuously improved speech recognition rates. Additionally, the detection of unknown speakers is examined with the aim of avoiding the need to train new speaker profiles explicitly. The speech controlled system presented here is suitable for in-car applications, e.g. speech controlled navigation, hands-free telephony or infotainment systems, on embedded devices. Results are presented for a subset of the SPEECON database. The results validate the benefit of the speaker adaptation scheme and the unified modeling in terms of speaker identification and speech recognition rates.
16.
17.
Xiaochuan He Rafik A. Goubran Peter X. Liu 《International Journal of Speech Technology》2014,17(1):37-42
Acoustic echo cancellation is one of the most severe requirements in hands-free telephone and teleconference communication. This paper proposes an Empirical Mode Decomposition (EMD)-based sub-band adaptive filtering structure, which applies the EMD-based algorithm to the far-end speech signal and the microphone output to obtain two sets of intrinsic mode functions (IMFs). In addition, each IMF set is separated into different bands based on the power spectral density (PSD) of every IMF. Experiment signals were collected from a medium-size office room and simulations were run under different conditions with three types of EMD-based algorithms. Results show that the proposed structure is able to model the transfer function of the unknown environment and track changes in the room much faster than the normalized adaptive filtering structure. The ensemble EMD (EEMD) algorithm and the noise-modulated EMD (NEMD) are shown to perform better than the EMD algorithm in terms of echo return loss enhancement.
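For orientation, the kind of baseline this abstract compares against can be sketched as a plain full-band normalized LMS (NLMS) echo canceller, scored by echo return loss enhancement (ERLE). This is not the paper's EMD-based sub-band structure, and treating NLMS as the "normalized adaptive filtering structure" is an assumption. The three-tap echo path, step size, and noise level are made up for the example.

```python
import math
import random


def nlms_echo_canceller(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Adapt an FIR estimate of the echo path; return weights and residual."""
    weights = [0.0] * taps
    history = [0.0] * taps          # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate
        power = sum(h * h for h in history) + eps
        # Normalized LMS update: step scaled by the input power.
        weights = [w + mu * error * h / power
                   for w, h in zip(weights, history)]
        residual.append(error)
    return weights, residual


def erle_db(mic, residual):
    """Echo return loss enhancement: microphone power over residual power."""
    return 10 * math.log10(sum(d * d for d in mic)
                           / sum(e * e for e in residual))


# Synthetic room (assumed): short echo path plus weak near-end noise.
random.seed(1)
room = [0.6, -0.3, 0.1]
far_end = [random.gauss(0, 1) for _ in range(4000)]
mic = [sum(h * far_end[t - i] for i, h in enumerate(room) if t - i >= 0)
       + random.gauss(0, 0.01)
       for t in range(len(far_end))]

weights, residual = nlms_echo_canceller(far_end, mic)
erle_tail = erle_db(mic[-1000:], residual[-1000:])
```

After convergence the leading weights should approach the assumed echo path, and `erle_tail` measures how much echo has been removed over the last 1000 samples.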
18.
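Differential microphone arrays offer an important technical route to sound source localization with small-size arrays. Speech signals are sparse, and this property can be exploited for multi-source direction estimation with differential microphone arrays; the histogram method is a typical example. For differential microphone arrays, this paper proposes a multi-source direction estimation method using short-time averaged complex sound intensity, based on time-frequency masking and fuzzy cluster analysis. It analyzes how to choose the frequency band for time-frequency masking under different array sizes. The method has a closed-form solution, outperforms the histogram method in strongly reverberant, noisy environments, and is less affected by changes in array size. To improve the performance of the histogram method, the paper also gives a modified histogram method based on the idea of time-frequency masking. Simulation results in reverberant, noisy environments verify the effectiveness of the proposed methods.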
19.
Maximum Likelihood Sound Source Localization and Beamforming for Directional Microphone Arrays in Distributed Meetings (total citations: 1; self-citations: 0; citations by others: 1)
In distributed meeting applications, microphone arrays have been widely used to capture superior speech sound and perform speaker localization through sound source localization (SSL) and beamforming. This paper presents a unified maximum likelihood framework of these two techniques, and demonstrates how such a framework can be adapted to create efficient SSL and beamforming algorithms for reverberant rooms and unknown directional patterns of microphones. The proposed method is closely related to steered response power-based algorithms, which are known to work extremely well in real-world environments. We demonstrate the effectiveness of the proposed method on challenging synthetic and real-world datasets, including over six hours of recorded meetings.
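The steered response power (SRP) family that this abstract relates its method to can be sketched as follows. This is plain time-domain SRP, not the paper's maximum likelihood framework. The candidate set, the integer per-microphone delays, and the names `steered_power` and `srp_localize` are assumptions for illustration.

```python
import random


def steered_power(channels, delays):
    """Mean power of the delay-and-sum output for one steering hypothesis."""
    usable = len(channels[0]) - max(delays)
    total = 0.0
    for t in range(usable):
        summed = sum(ch[t + d] for ch, d in zip(channels, delays))
        total += summed * summed
    return total / usable


def srp_localize(channels, hypotheses):
    """Return the hypothesis label whose steering gives the highest power.

    hypotheses: dict mapping a label (e.g. a candidate direction) to the
    per-microphone integer delays implied by that candidate (hypothetical
    input; a real system would derive these from the array geometry).
    """
    return max(hypotheses,
               key=lambda label: steered_power(channels, hypotheses[label]))


# Synthetic linear array (assumed): the true inter-microphone lag is 2 samples.
random.seed(2)
n, mics, true_step = 1000, 4, 2
source = [random.gauss(0, 1) for _ in range(n)]
channels = []
for m in range(mics):
    d = true_step * m
    delayed = [0.0] * d + source[: n - d]
    channels.append([s + random.gauss(0, 0.5) for s in delayed])

hypotheses = {step: [step * m for m in range(mics)] for step in range(4)}
best_step = srp_localize(channels, hypotheses)
```

Only the correct hypothesis aligns the source across all microphones, so its summed output adds coherently and its power stands out against the mismatched candidates.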