Similar literature
20 similar articles found
1.
Locating and tracking a speaker in real time using microphone arrays is important in many applications, such as hands-free video conferencing, speech processing in large rooms, and acoustic echo cancellation. A speaker may move from the far field to the near field of the array, or vice versa. Many neural-network-based localization techniques exist, but they apply to either far-field or near-field sources only, and they are computationally intensive for real-time speaker localization because of the wide-band nature of speech. We propose a unified neural-network-based source localization technique that is simultaneously applicable to wide-band and narrow-band signal sources in the far field or near field of a microphone array. The technique uses a multilayer perceptron feedforward neural network and forms the feature vectors by computing the normalized instantaneous cross-power spectrum samples between adjacent pairs of sensors. Simulation results indicate that the technique can locate a source with an absolute error of less than 3.5 degrees at a signal-to-noise ratio of 20 dB and a sampling rate of 8000 Hz at each sensor.

2.
This paper presents the design and implementation of a dual-microphone coherence-based speech enhancement technique on a field programmable gate array (FPGA). Proper enhancement in a dual-microphone system requires estimating the time delay of arrival (TDOA) between the two microphone signals, after which the proposed speech enhancement algorithm is applied. We use a TDOA algorithm based on the phase transform to minimize the effect of reverberation when localizing the sound sources. A coherence-based technique, which requires no background noise estimation, is used for speech enhancement. In this way, we achieve high localization accuracy as well as the capability to deal with coherent noise. In the proposed system, the TDOA and speech enhancement processes are executed concurrently by exploiting the parallel logic blocks of the FPGA, which greatly increases the throughput of the system. We have implemented our design on a Spartan6 Lx45 FPGA device. The design was evaluated subjectively with normal-hearing listeners using a comprehensibility listening test, and its performance was compared with existing state-of-the-art work. The objective evaluation also indicates a significant improvement over existing state-of-the-art work. The subjective and objective evaluations indicate that the proposed hardware offers a feasible solution for hearing aids and other hand-held devices.

3.
Non-Gaussian noise distorts speech signals and degrades speaker tracking performance. In this paper, a distributed particle filter (DPF) based speaker tracking method is proposed for distributed microphone networks in non-Gaussian noise and reverberant environments. A generalized correntropy function is first presented to estimate the time differences of arrival (TDOA) of the speech signals at each node in the distributed microphone network. Next, to address spurious TDOA estimates caused by noise and reverberation, a multiple-hypothesis likelihood model is introduced to calculate the local likelihood functions of the DPF. Finally, a DPF that fuses the local likelihood functions with an average consensus algorithm is employed to estimate the positions of a moving speaker. The proposed method can accurately track the speaker in non-Gaussian noise and reverberant environments, and it is scalable and robust against node failures in distributed networks. Simulation experiments validate the proposed speaker tracking method.

4.
When performing speaker diarization on recordings from meetings, multiple microphones of different qualities are usually available and distributed around the meeting room. Although several approaches have been proposed in recent years to take advantage of multiple microphones, they are either too computationally expensive and not easily scalable, or they cannot outperform the simpler case of using the best single microphone. In this paper, the use of classic acoustic beamforming techniques is proposed together with several novel algorithms to create a complete front end for speaker diarization in the meeting room domain. The new techniques presented include blind reference-channel selection, two-step time delay of arrival (TDOA) Viterbi postprocessing, and a dynamic output signal weighting algorithm, together with the use of these TDOA values in the diarization to complement the acoustic information. Tests on speaker diarization show a 25% relative improvement on the test set compared to using a single most centrally located microphone. Additional experimental results show improvements using these techniques in a speech recognition task.

5.
This paper proposes a speech enhancement approach to suppress the interference of car noise. A linear microphone array is adopted for far-talking speech acquisition and delay-and-sum beamforming noise reduction. We present an effective time delay estimator that uses the coherence function between the reference microphone and the beamformed speech. To further enhance the beamformed speech, we exploit an improved Wiener filter in which the resulting noise correlation in the microphone array is relatively small, so that the performance of optimal Wiener filtering can be achieved. In addition, because car speech is seriously degraded at low frequencies, we develop a spectral weighting function to compensate for the low-frequency filtering. These two processing units serve as post filters to attain the desired enhancement performance. In experiments on microphone array speech in the presence of real and simulated car noise, the proposed algorithm performs well. Performance is measured in terms of the signal-to-noise ratio and the word error rate. The combined delay-and-sum beamformer and two post filters obtain the best results compared to other methods.
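
As an illustration of the delay-and-sum beamforming step mentioned in this abstract, the following is a minimal sketch and not the paper's implementation. It assumes the per-channel delays are already known and are whole numbers of samples; the function name `delay_and_sum` and the toy signals are hypothetical.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its integer delay (in samples) and average.

    channels: list of equal-rate sample lists, one per microphone.
    delays:   delays[i] is how many samples channel i lags the reference
              (assumed known here; in practice it must be estimated).
    """
    # Only keep the samples for which every shifted channel has data.
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[t + d] for ch, d in zip(channels, delays)) / len(channels)
        for t in range(n)
    ]


# Toy example: the same source reaches three microphones 0, 1 and 2 samples late.
source = [1.0, 2.0, 3.0, 4.0]
mics = [source + [0.0, 0.0], [0.0] + source + [0.0], [0.0, 0.0] + source]
aligned = delay_and_sum(mics, [0, 1, 2])
```

After alignment the source adds coherently across channels, while uncorrelated noise would average down; fractional delays would need interpolation, which this sketch leaves out.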

6.
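Using asynchronous acquisition equipment, the relative time delay between channels is estimated with the generalized cross-correlation (GCC) time delay estimation algorithm; the delay error introduced by asynchronous acquisition is compensated in software; and the position of the sound source is estimated by exploiting the two-step localization property of the sound source localization algorithm based on time difference of arrival (TDOA). A four-element uniform linear array system and a four-element planar cross array system were designed to verify the method, and the systems estimate the direction of the sound source fairly accurately.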

7.
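Feng Daoning (冯道宁), Wang Hao (王浩). Computer Engineering and Applications (《计算机工程与应用》), 2012, 48(21): 130-132, 142
Sound source localization based on microphone arrays is studied and the structure of time delay estimation algorithms is analyzed; on this basis, a hyperbola algorithm for two-dimensional DOA estimation is proposed. Two-dimensional DOA estimation of a sound source is implemented using MPA416 microphones and a PXI4472 data acquisition card together with LabVIEW virtual instruments. Experiments show that the system localizes in real time and with high accuracy.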

8.
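To address the poor performance of traditional closed-form TDOA localization solutions when measurement noise is large, a new TDOA localization algorithm is proposed. The algorithm first obtains an initial estimate of the target position by unconstrained weighted least squares, and then corrects this initial estimate using the maximum likelihood equation, so that the corrected estimate is closer to the maximum likelihood estimate. Simulation analysis shows that, when measurement noise is large, the mean square localization error of the algorithm is smaller than that of the classical Chan algorithm.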

9.
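Accurate time delay estimation (TDE) is a prerequisite for sound source localization based on time difference of arrival (TDOA). Among the many time delay estimation algorithms, the generalized cross-correlation (GCC) algorithm is widely used because of its low computational complexity and ease of implementation. For different noise conditions, the GCC time delay estimation algorithm uses different weighting functions to suppress noise interference. After introducing the microphone array model and the GCC time delay estimation algorithm, this paper proposes an improved algorithm that addresses the shortcomings of GCC, and runs MATLAB simulations of GCC time delay estimation with several weighting functions under a range of signal-to-noise ratios. The strengths and weaknesses of these weighting functions are analyzed by comparing their time delay estimation performance and sound source localization accuracy.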
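
To make the role of the GCC weighting function concrete, here is a minimal sketch of GCC time delay estimation with an optional phase transform (PHAT) weighting. It is not the improved algorithm from this paper, whose details the abstract does not give. The function names are hypothetical, and a naive O(n^2) DFT is used only to keep the example free of third-party dependencies.

```python
import cmath
import math


def _dft(x, sign):
    """Naive O(n^2) DFT (sign=-1) or unnormalized inverse DFT (sign=+1)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def gcc_delay(x1, x2, weighting="phat"):
    """Estimate the delay of x2 relative to x1, in samples.

    weighting="phat" whitens the cross-spectrum (keeps phase only);
    any other value leaves it unweighted (plain cross-correlation).
    """
    n = len(x1) + len(x2)  # zero-pad so circular correlation does not wrap
    X1 = _dft(list(x1) + [0.0] * (n - len(x1)), -1)
    X2 = _dft(list(x2) + [0.0] * (n - len(x2)), -1)
    cross = [b * a.conjugate() for a, b in zip(X1, X2)]
    if weighting == "phat":
        # Divide by the magnitude; the floor avoids division by zero.
        cross = [c / max(abs(c), 1e-12) for c in cross]
    corr = [v.real for v in _dft(cross, +1)]
    lag = max(range(n), key=lambda i: corr[i])
    # Indices in the upper half of the buffer represent negative lags.
    return lag if lag < n // 2 else lag - n
```

Calling `gcc_delay(x1, x2)` on a signal and a copy of it delayed by a few samples should return that delay; multiplying the result by the sampling period gives the TDOA in seconds. Other weighting functions would replace the PHAT division with a different per-frequency weight.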

10.
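To overcome the shortcomings of tracking with audio or video information alone, a particle filter tracking algorithm that fuses audio and video information is proposed. It uses a closed-loop tracking framework with five stages: low-level tracking, fusion, importance particle filtering, tracking output, and feedback. The low-level tracking stage performs mean-shift tracking using the skin color of the speaker's face while, in parallel, localizing the speaker from the time delays with which the speaker's voice reaches the microphone array. The fusion stage combines the two sets of tracking information to derive an importance function and a fused likelihood model based on audio-visual fusion. The filtering stage applies an importance particle filter to the fused data; the tracking stage tracks the speaker according to the filtering result; and the feedback stage dynamically feeds the tracking result back to the face skin-color tracking and sound source localization modules. The pipelined closed-loop processing ensures that the algorithm runs in real time. Finally, the algorithm is tested on the AMI meeting corpus; the results show an average mistracking rate of only 9.32%, with better stability and accuracy than tracking algorithms that use audio or video information alone.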

11.
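An improved sound source localization algorithm based on a circular microphone array   Total citations: 1, self-citations: 0, citations by others: 1
To address the poor accuracy and direction ambiguity of the traditional cross-power spectrum method for sound source direction estimation in direction-of-arrival estimation, an improved sound source localization algorithm based on a circular microphone array is proposed and verified experimentally. In the improved algorithm, a twelve-element circular microphone array is first designed; a phase rotation factor is obtained from the time delay and phase of the speech signals received by microphone pairs and is then introduced into the cross-power spectrum of the speech signals, defining a new circular integrated cross-power spectrum from which the source direction is estimated. Simulation and real-measurement results show that the circular integrated cross-power spectrum method estimates the source direction more accurately than the traditional cross-power spectrum method.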

12.
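Node localization is one of the most critical technologies in wireless sensor networks. For the passive localization problem, a joint localization algorithm combining time difference of arrival (TDOA) and gain ratio of arrival (GROA) is proposed, with the final result obtained by a glowworm swarm optimization (GSO) algorithm with a flight mechanism. Combining the TDOA and GROA localization models, auxiliary variables are introduced to pseudo-linearize the equations, which are then solved by a modified two-step weighted least squares (TSWLS) algorithm. Without compromising convergence speed or accuracy, the GSO algorithm with a flight mechanism is used to search for the optimal target position, overcoming the tendency of particle swarm optimization to become trapped in local optima. Simulation results show that, compared with the TDOA algorithm, the proposed algorithm improves localization accuracy by 23 dB and achieves relatively high and stable localization accuracy.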

13.
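Building on localization based on time difference of arrival (TDOA), sound source localization with microphone arrays is studied systematically for environments in which noise and reverberation are both present. Based on the traditional LMS adaptive algorithm, a new LMS adaptive time delay estimation method based on speech excitation information is proposed and combined with a planar four-element geometric localization method. Experiments in a simulated room environment show that the method is robust to noise and reverberation, and that it offers high localization accuracy at low computational cost, making it suitable for real-time localization.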

14.
The general problem addressed in this paper is that of separating the voices of active moving speakers in the presence of background noise and a moderate level of reverberation in the acoustic field, using a single microphone array. We adapt multisensor multitarget tracking theory to the context of microphone arrays in order to form receptive beams that lock on each moving speaker over an extended time and thereby achieve voice separation. Our approach: 1) incorporates kinematic information about the speakers' movement by using an interacting multiple model (IMM) estimator per speaker in order to constrain the evolution of direction of arrival (DOA) measurements, which characterize various motions of the speakers, and 2) can directly account for measurement origin uncertainty, i.e., which measurement comes from which speaker, by using the probabilistic-data-association technique in conjunction with the IMM estimator. The effectiveness of the approach is illustrated by an extensive simulation study on tracking the DOAs of two speakers with crossing trajectories and three static speakers having a conversation with partially overlapping speech and long pauses.

15.
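Time delay estimation is a key technique in array signal processing, widely used in applications such as speech enhancement and speaker localization. Its goal is to estimate the time difference caused by the different propagation distances when signals from the same source arrive at different sensors. Existing algorithms mainly include correlation, generalized cross-correlation, and adaptive least mean square methods. Because their noise robustness differs, each has its own application scenarios. This paper applies the various algorithms to a common set of acoustic wave data for time delay estimation, analyzes and compares their performance, and applies the correlation method to pipeline leak detection, with a localization error of no more than 5%.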

16.
This paper deals with the problem of localizing and tracking a moving speaker over the full range around a mobile robot. The problem is solved by taking advantage of the phase shift between signals received at spatially separated microphones. The proposed algorithm estimates the time difference of arrival by maximizing the weighted cross-correlation function in order to determine the azimuth angle of the detected speaker. The cross-correlation is enhanced with an adaptive signal-to-noise estimation algorithm to make the azimuth estimation more robust in noisy surroundings. A post-processing technique is proposed in which the azimuths determined by the individual microphone pairs are combined into a mixture of von Mises distributions, producing a practical probabilistic representation of the microphone array measurement. It is shown that this distribution is inherently multimodal and that the system at hand is non-linear. Therefore, particle filtering is applied for a discrete representation of the distribution function. Furthermore, the two most common microphone array geometries are analysed, and exhaustive experiments were conducted to test the algorithm qualitatively and quantitatively and to compare the two geometries. In addition, a voice activity detection algorithm based on the aforementioned signal-to-noise estimator was implemented and incorporated into the existing speaker localization system. The results show that the algorithm can reliably and accurately localize and track a moving speaker.
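
The step from a time difference of arrival to an angle can be sketched for a single microphone pair under a far-field (plane wave) assumption. This is an illustrative simplification, not the paper's algorithm: the function name `tdoa_to_angle`, the 343 m/s speed of sound, and the broadside angle convention are all assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def tdoa_to_angle(tdoa_s, mic_spacing_m, c=SPEED_OF_SOUND):
    """Far-field angle of arrival (degrees) for one microphone pair.

    The angle is measured from the broadside direction (perpendicular to
    the line joining the two microphones): sin(angle) = c * tdoa / spacing.
    """
    ratio = c * tdoa_s / mic_spacing_m
    # Noise can push the ratio slightly outside [-1, 1]; clamp before asin.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))


# A delay of half the maximum possible value maps to about 30 degrees.
angle = tdoa_to_angle(0.5 * 0.1 / SPEED_OF_SOUND, 0.1)
```

A single pair cannot tell a source in front of the array from its mirror image behind it, which is one reason the candidate angles from several pairs need to be combined, as the abstract describes.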

17.
The use of microphone arrays offers enhancements of speech signals recorded in meeting rooms and office spaces. A common solution for speech enhancement in realistic environments with ambient noise and multi-path propagation is the application of so-called beamforming techniques. Such beamforming algorithms enhance signals at the desired angle using constructive interference while attenuating signals coming from other directions by destructive interference. However, these techniques require a priori knowledge of the time difference of arrival of the source. Therefore, source localization and tracking algorithms are an integral part of such a system. Conventional localization algorithms deteriorate in realistic scenarios with multiple concurrent speakers. In contrast to conventional methods, the techniques presented in this paper make use of pitch information of speech signals in addition to the location information. This “position–pitch”-based algorithm pre-processes the speech signals with a multiband gammatone filterbank that is inspired by the auditory model of the human inner ear. The role of this gammatone filterbank is analyzed and discussed in detail. For robust localization of multiple concurrent speakers, a frequency-selective criterion is explored that is based on a study of the human neural system's use of correlations between adjacent sub-band frequencies. This frequency-selective criterion leads to improved localization performance. To further improve localization accuracy, an algorithm based on grouping of spectro-temporal regions formed by pitch cues is presented. All proposed speaker localization algorithms are tested using a multichannel database in which multiple concurrent speakers are active. The real-world recordings were made with a 24-channel uniform circular microphone array using loudspeakers and human speakers under various acoustic environments, including scenarios with moving concurrent speakers. The proposed techniques produced a localization performance that was significantly better than the state-of-the-art baseline in the scenarios tested.

18.
This paper addresses the problem of distant speech acquisition in multiparty meetings, using multiple microphones and cameras. Microphone array beamforming techniques present a potential alternative to close-talking microphones by providing speech enhancement through spatial filtering. Beamforming techniques, however, rely on knowledge of the speaker location. In this paper, we present an integrated approach in which an audio-visual multiperson tracker is used to track active speakers with high accuracy. Speech enhancement is then achieved using microphone array beamforming followed by a novel postfiltering stage. Finally, speech recognition is performed to evaluate the quality of the enhanced speech signal. The approach is evaluated on data recorded in a real meeting room for stationary speaker, moving speaker, and overlapping speech scenarios. The results show that the speech enhancement and recognition performance achieved using our approach are significantly better than those of a single table-top microphone and are comparable to those of a lapel microphone for some of the scenarios. The results also indicate that the audio-visual system performs significantly better than the audio-only system, in terms of both enhancement and recognition. This shows that the accurate speaker tracking provided by the audio-visual sensor array is beneficial for improving recognition performance in a microphone array-based speech recognition system.

19.
A distributed self-organization algorithm for ground target tracking using an unattended acoustic sensor network is developed. Instead of using microphone arrays, each sensor node in the network uses only a single microphone as its sensing device. This design can greatly reduce the size and cost of each sensor node and allows more flexible deployment of the sensor network. The self-organization algorithm presented in this paper can dynamically select suitable sensor nodes to form localization sensor groups that work as a virtual microphone array to perform energy-efficient target localization and tracking. To achieve this, we use time-delay-based bearing estimation plus triangulation for source localization in the sensor network. The major error sources of the localization method, namely time delay estimation, bearing calculation, and triangulation, are analyzed, and sensor selection criteria are developed. Based on these criteria and the neighborhood information of each sensor node, a distributed self-organization algorithm is developed. Simulation results show the effectiveness of the proposed algorithm.
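
The triangulation step named in this abstract can be sketched as the intersection of two bearing lines in the plane. This is a generic geometric sketch under assumed conventions (angles in radians measured from the x-axis, sensor positions known exactly), not the sensor selection or error analysis from the paper; the function name `triangulate` is hypothetical.

```python
import math


def triangulate(p1, theta1, p2, theta2, eps=1e-9):
    """Intersect two bearing lines and return the (x, y) source position.

    p1, p2:         (x, y) positions of the two bearing sensors.
    theta1, theta2: bearings in radians, measured from the x-axis.
    Returns None when the bearings are (nearly) parallel.
    """
    d1 = (math.cos(theta1), math.sin(theta1))
    d2 = (math.cos(theta2), math.sin(theta2))
    # Solve p1 + t1 * d1 = p2 + t2 * d2 for t1 using Cramer's rule.
    det = d2[0] * d1[1] - d1[0] * d2[1]
    if abs(det) < eps:
        return None
    bx, by = p2[0] - p1[0], p2[1] - p1[1]
    t1 = (d2[0] * by - bx * d2[1]) / det
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])


# Sensors at (0, 0) and (2, 0) seeing the source at 45 and 135 degrees.
estimate = triangulate((0.0, 0.0), math.radians(45), (2.0, 0.0), math.radians(135))
```

The determinant shrinks as the two bearings approach parallel, so small bearing errors are amplified for poorly placed sensor pairs; this is one intuition for why sensor selection matters in such a scheme.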

20.
A new method for spatial localization of bubble nucleations in superheated droplet detectors is presented. It has two steps: validation and localization. Validation is accomplished through signal processing techniques that filter out electromagnetic noise and gaseous micro-leaks. The 3D spatial localization uses a passive acoustic sound source localization technique with a five-element microphone array and a simple generalised cross-correlation (GCC) time delay of arrival (TDOA) algorithm. The approach to nucleation validation is new, and, as of this writing, we are the first to assess the feasibility of the localization. Experimental results are shown.
