1.

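Speech Enhancement for Small Microphone Arrays Using a Generalized Sidelobe Canceller Based on a Coherence Filter (Total citations: 1; self-citations: 0; citations by others: 1)

Yang Lichun (杨立春), Qian Yuntao (钱沄涛) 《电子与信息学报 (Journal of Electronics & Information Technology)》 2012,34(12):3027-3033

To overcome the limited noise suppression of conventional speech enhancement algorithms for small microphone arrays, this paper proposes a generalized sidelobe canceller (GSC) speech enhancement algorithm based on a coherence filter. The algorithm obtains the coherence filter from a noise spectrum estimate that uses a dynamic smoothing coefficient, filters the signal received at each array element separately to suppress noise interference, including reverberation, and then feeds the filtered signals to a small-array GSC beamforming algorithm that suppresses the residual noise. Simulations and real-world experiments show that the proposed algorithm clearly outperforms the small-array beamforming algorithm and the coherence filter algorithm used on their own.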

2.

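Near-Field Three-Dimensional Sound Source Localization Based on Arbitrary Microphone Arrays (Total citations: 2; self-citations: 0; citations by others: 2)

Ju Tailiang (居太亮), Shao Huaizong (邵怀宗), Peng Qicong (彭启琮), Lin Jingran (林静然) 《信号处理 (Journal of Signal Processing)》 2007,23(2):231-234

Sound source localization based on microphone arrays is widely used in communications, speech processing, and related fields. Starting from the basic principles of sound propagation, this paper derives an exact relative amplitude attenuation factor and thereby obtains a more accurate signal propagation model. Using this model together with the idea of subspace algorithms, it proposes a near-field wideband three-dimensional sound source localization algorithm applicable to microphone arrays of arbitrary topology. Simulation results show that the algorithm achieves good localization performance with one-dimensional uniform linear arrays, two-dimensional uniform circular arrays, and three-dimensional uniform spherical arrays.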

3.

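Broadcast Language Identification Based on Time-Domain Gammatone Filtering Features

Chen Liang (陈亮), Shao Yubin (邵玉斌), Long Hua (龙华), Du Qingzhi (杜庆治), Peng Yi (彭艺), Tang Weikang (唐维康) 《信号处理 (Journal of Signal Processing)》 2022,38(3):599-608

For the problem of broadcast language identification, a time-domain speech filtering method is proposed: the gammatone time-domain function is convolved with the preprocessed speech signal, and the result is framed, windowed, and converted to log energy to obtain time-domain GF (gammatone filterbank) features. The feature parameters are represented as images, and language identification experiments are then carried out with VGG19 and Resnet34 classification networks. In addition, an automatic color-level algorithm is applied to the images of noisy speech...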
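
The filtering chain described in this abstract (gammatone convolution, framing, windowing, log energy) can be sketched as follows. This is only an illustration under assumptions that the abstract does not state: a fourth-order gammatone, the Glasberg-Moore ERB bandwidth with the common 1.019 scaling, a Hamming window, unit-energy normalisation of each impulse response, and hypothetical function names and frame sizes. It is not the authors' implementation.

```python
import math

def gammatone_ir(fc, fs, order=4, duration=0.025):
    """Gammatone impulse response t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t)."""
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)  # assumed ERB model (Glasberg-Moore), Hz
    b = 1.019 * erb                          # assumed bandwidth scaling
    h = [
        (k / fs) ** (order - 1)
        * math.exp(-2.0 * math.pi * b * k / fs)
        * math.cos(2.0 * math.pi * fc * k / fs)
        for k in range(int(duration * fs))
    ]
    norm = math.sqrt(sum(v * v for v in h))  # unit-energy normalisation (a design choice)
    return [v / norm for v in h]

def convolve(x, h):
    """Direct-form linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def log_frame_energy(y, frame_len, hop):
    """Apply a Hamming window to each frame and return its log energy."""
    win = [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (frame_len - 1))
           for k in range(frame_len)]
    feats = []
    for start in range(0, len(y) - frame_len + 1, hop):
        energy = sum((y[start + k] * win[k]) ** 2 for k in range(frame_len))
        feats.append(math.log(energy + 1e-12))  # small floor avoids log(0)
    return feats

def gf_features(x, fs, centre_freqs, frame_len, hop):
    """One row of log frame energies per gammatone channel."""
    return [log_frame_energy(convolve(x, gammatone_ir(fc, fs)), frame_len, hop)
            for fc in centre_freqs]
```

Stacking the rows returned by `gf_features` gives a channel-by-frame matrix, which is the kind of array that could then be rendered as an image for a classification network.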

4.

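An Improved Generalized Sidelobe Canceller Speech Enhancement Algorithm for Small Two-Element Microphone Arrays

Yang Lichun (杨立春), Qian Yuntao (钱沄涛) 《信号处理 (Journal of Signal Processing)》 2012,28(10):1379-1385

Small two-element microphone arrays have been widely studied as a way to improve target speech quality in devices constrained by space, cost, and computing power, such as mobile phones and hearing aids. Speech enhancement algorithms for such arrays mainly comprise beamforming methods and coherence filter methods. Beamforming uses the position of the target source relative to the array to obtain temporal and spatial information, so that the signal from the target direction is preserved while interference from other directions is suppressed; coherence filter methods suppress noise by exploiting the correlation between the different signals at the array elements. Combining the strengths of both types of method, this paper proposes an improved generalized sidelobe canceller (GSC) speech enhancement algorithm for small two-element microphone arrays. A coherence filter is applied in the fixed beamforming branch of the GSC to raise the signal-to-noise ratio of the fixed beamformer output; the adaptive branch of the GSC then uses the temporal and spatial information of the array to estimate the residual noise in the output of the fixed beamforming branch, yielding the enhanced target signal. Simulations and real-world experiments show that the proposed algorithm clearly outperforms the small-array beamforming algorithm and the coherence filter algorithm used on their own.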

5.

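A Second-Order Differential Microphone Array Based on an Equilateral Triangle

Chen Zhuo (陈卓), Liang Weiqian (粱维谦), Dong Baoshuai (董保帅) 《电声技术 (Audio Engineering)》 2011,35(7):38-41

A speech enhancement method for a miniature microphone array with an equilateral-triangle structure is implemented. Unlike previous linear first-order and linear second-order differential microphone array structures, an algorithm for a second-order triangular differential microphone array based on delay-and-sum is proposed and verified. Tests in a real environment demonstrate that the algorithm can achieve speech enhancement in 12 directions, and that the directional signal-to-noise ratio is 3 to 4 dB higher than that of a linear first-order differential microphone array.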
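
For context, the linear first-order differential array that this abstract uses as its baseline can be sketched as a two-microphone delay-and-subtract pair. The sketch below is not the triangular second-order method of the paper; the function name, the 343 m/s sound speed, and the 1 cm spacing in the usage note are assumptions chosen for illustration.

```python
import cmath
import math

def first_order_response(theta_deg, freq_hz, spacing_m, delay_s, c=343.0):
    """Magnitude response of a two-microphone delay-and-subtract pair.

    The rear microphone signal is delayed by delay_s and subtracted from the
    front one. A plane wave arriving from angle theta (0 degrees = front)
    reaches the rear microphone spacing_m * cos(theta) / c seconds later.
    """
    theta = math.radians(theta_deg)
    omega = 2.0 * math.pi * freq_hz
    total_delay = delay_s + spacing_m * math.cos(theta) / c
    return abs(1.0 - cmath.exp(-1j * omega * total_delay))
```

Setting `delay_s = spacing_m / c` places the null at 180 degrees, which gives the familiar cardioid pattern: for a 1 cm spacing at 1 kHz the response is largest at the front, smaller at the side, and zero at the back.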

6.

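Adaptive Speech Dereverberation Based on the Inverse Power Method and Kalman Filtering

Mei Tiemin (梅铁民) 《信号处理 (Journal of Signal Processing)》 2018,34(7):776-786

Noise-robust adaptive speech dereverberation is an important research topic in modern speech signal processing; its difficulty stems from the non-whiteness and non-stationarity of speech signals and the very long impulse responses of rooms. A new dereverberation algorithm is proposed for the multichannel reverberant speech signals acquired by a single-input multiple-output (SIMO) microphone array system. First, the SIMO reverberant speech signals are time-aligned using correlation-based time delay estimation. Second, the reverberant speech signals are pre-whitened while preserving the cross relation among the SIMO system output signals. Finally, the cross relation, the inverse power method for computing the eigenvector associated with the smallest eigenvalue of a matrix, and a Kalman filtering deconvolution method are combined to achieve real-time adaptive dereverberation of SIMO reverberant speech signals. Simulations and experiments show that the method dereverberates reverberant speech effectively and has good noise robustness.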

7.

Optimal Design of Nearfield Wideband Beamformers Robust Against Errors in Microphone Array Characteristics (Total citations: 1; self-citations: 0; citations by others: 1)

Huawei Chen Wee Ser Zhu Liang Yu 《IEEE transactions on circuits and systems. I, Regular papers》2007,54(9):1950-1959

Nearfield wideband beamformers for microphone arrays have wide applications, such as hands-free telephony, hearing aids, and speech input devices to computers. The existing design approaches for nearfield wideband beamformers are highly sensitive to errors in microphone array characteristics, i.e., microphone gain, phase, and position errors, as well as sound speed errors. In this paper, a robust design approach for nearfield wideband beamformers for microphone arrays is proposed. The robust nearfield wideband beamformers are designed based on the minimax criterion with the worst case performance optimization. The design problems can be formulated as second-order cone programming and be solved efficiently using the well-established polynomial time interior-point methods. Several interesting properties of the robust nearfield wideband beamformers are derived. Numerical examples are given to demonstrate the efficacy of the proposed beamformers in the presence of errors in microphone array characteristics.

8.

Use of Microphone Array and Model Adaptation for Hands-Free Speech Acquisition and Recognition

Jen-Tzung Chien Jain-Ray Lai 《Journal of Signal Processing Systems》2004,36(2-3):141-151

This paper presents a combined microphone array and model adaptation algorithm for hands-free speech recognition. Our purpose is to remove the inconvenience of using head-mounted or hand-held microphones in conventional speech recognizers. To improve the speech quality under car noise interference, a linear microphone array is applied and acts as a robust acquisition system. A time-domain coherence measure (TDCM) is applied to reliably estimate the time delay for speech signals collected by different microphones. The estimated delay is adopted in a delay-and-sum beamformer for speech enhancement. Further, we adapt the speech hidden Markov models to get close to the acoustic conditions of the enhanced test speech for robust speech recognition. In acquisition and recognition experiments using connected Chinese digits, we found that TDCM can effectively estimate the time delay. An increase in the speech sampling rate helps in determining the time delay. Incorporating the model adaptation scheme significantly reduces the recognition errors with moderate computation overhead.
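
The delay-estimate-then-sum structure described in this abstract can be sketched as below. The paper estimates delays with its time-domain coherence measure (TDCM), whose details are not given here, so this sketch substitutes a plain cross-correlation peak search with integer-sample lags; the function names and the `max_lag` parameter are hypothetical.

```python
def estimate_delay(ref, sig, max_lag):
    """Lag (in samples) of sig relative to ref that maximises cross-correlation.

    Stand-in for the paper's TDCM estimator, which is not reproduced here.
    """
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = 0.0
        for i in range(len(ref)):
            j = i + lag
            if 0 <= j < len(sig):
                val += ref[i] * sig[j]
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag

def delay_and_sum(channels, max_lag):
    """Align every channel to the first one and average the aligned signals."""
    ref = channels[0]
    out = [0.0] * len(ref)
    delays = []
    for ch in channels:
        d = estimate_delay(ref, ch, max_lag)
        delays.append(d)
        for i in range(len(ref)):
            j = i + d
            if 0 <= j < len(ch):  # samples shifted outside the buffer count as zero
                out[i] += ch[j]
    return [v / len(channels) for v in out], delays
```

Because the search is limited to whole samples, the achievable delay resolution is one sample period, which is consistent with the abstract's remark that a higher sampling rate helps delay estimation.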

9.

Development of an Optimized Feature Extraction Algorithm for Throat Signal Analysis

Young‐Giu Jung Mun‐Sung Han Sang Jo Lee 《ETRI Journal》2007,29(3):292-299

In this paper, we present a speech recognition system using a throat microphone. The use of this kind of microphone minimizes the impact of environmental noise. Due to the absence of high frequencies and the partial loss of formant frequencies, previous systems using throat microphones have shown a lower recognition rate than systems which use standard microphones. To develop a high performance automatic speech recognition (ASR) system using only a throat microphone, we propose two methods. First, based on Korean phonological feature theory and a detailed throat signal analysis, we show that it is possible to develop an ASR system using only a throat microphone, and propose conditions of the feature extraction algorithm. Second, we optimize the zero‐crossing with peak amplitude (ZCPA) algorithm to guarantee the high performance of the ASR system using only a throat microphone. For ZCPA optimization, we propose an intensification of the formant frequencies and a selection of cochlear filters. Experimental results show that this system yields a performance improvement of about 4% and a reduction in time complexity of 25% when compared to the performance of a standard ZCPA algorithm on throat microphone signals.

10.

Subband processing for broadband microphone arrays

F. Lorenzelli A. Wang D. Korompis R. Hudson K. Yao 《The Journal of VLSI Signal Processing》1996,14(1):43-55

This paper considers array processing for wideband signals. The optimization techniques and associated performance results correspond to steerable but fixed beam microphone arrays, to be used in hearing aid applications, both in free-space and reverberant conditions. We first review the results on maximum energy (ME) broadband arrays. We subsequently formulate optimization criteria for array subband processing. The uniformly spaced subband and the non-uniformly spaced subband using quadrature mirror filter approaches are treated. Finally, various simulation results for free-space and reverberant conditions are presented to demonstrate the usefulness of this class of microphone arrays, as well as the feasibility of quadrature mirror filter-based subband processing. This work was partially supported by the House Ear Institute and the Retirement Research Foundation.

11.

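A Single-Channel Speech Enhancement Algorithm Based on the DCT and Wiener Filtering (Total citations: 5; self-citations: 0; citations by others: 5)

Ou Shifeng (欧世峰), Zhao Xiaohui (赵晓晖), Gu Haijun (顾海军) 《通信学报 (Journal on Communications)》 2006,27(10):86-93

To address speech enhancement against complex noise backgrounds, a new single-channel speech enhancement algorithm based on the discrete cosine transform (DCT) and Wiener filtering is proposed. The algorithm does not rely on any speech signal model and requires no prior assumptions about the statistical properties of the noise. It exploits the correlation between speech signal components at successive time instants in the DCT domain, combined with the minimum mean square error algorithm, to obtain an optimal estimate of the clean speech components, which remedies the shortcoming of conventional algorithms that estimate the speech components from a single frame of noisy speech only. Simulation results under a variety of noise backgrounds show that the algorithm delivers good speech enhancement in both subjective and objective tests.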
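
As a point of reference, the conventional single-frame approach that this abstract contrasts itself with can be sketched as a per-coefficient Wiener gain in the DCT domain. The sketch does not implement the paper's inter-frame correlation method. It assumes an orthonormal DCT-II, a known noise variance per coefficient, and a hypothetical gain floor of 0.05; all function names are illustrative.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a real sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n)))
    return out

def idct(c):
    """Inverse of the orthonormal DCT-II above."""
    n = len(c)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
            total += scale * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out

def wiener_gain(coeff, noise_var, floor=0.05):
    """Gain xi / (1 + xi), with the SNR xi estimated as coeff^2 / noise_var - 1."""
    power = coeff * coeff
    if power <= noise_var:
        return floor  # estimated SNR is not positive: fall back to the floor
    return max(1.0 - noise_var / power, floor)

def enhance_frame(noisy, noise_var, floor=0.05):
    """Attenuate each DCT coefficient of one frame by its Wiener gain."""
    coeffs = dct(noisy)
    return idct([wiener_gain(c, noise_var, floor) * c for c in coeffs])
```

Because the transform is orthonormal, white noise of variance `noise_var` in the time domain has the same variance in every DCT coefficient, which is why a single `noise_var` value suffices in this simplified setting.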

12.

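Array Geometry Optimization for Wideband Concentric Ring Arrays Based on Particle Swarm Optimization

Kang Yacong (康雅聪), Wei Mingyang (魏明洋), Tian Sirui (田巳睿), Ding Linning (丁林宁) 《压电与声光 (Piezoelectrics & Acoustooptics)》 2022,44(6):929-935

Concentric ring microphone arrays are widely used in acoustic imaging array systems and can effectively suppress interference and determine sound source positions. Under constraints on the number of array elements, the maximum array aperture, and the minimum element spacing, a method is proposed for designing wideband low-sidelobe concentric ring microphone arrays using the particle swarm algorithm. The method constructs a fitness function that reflects the sidelobe level of wideband signals over the scanning region, takes the ring radii and the element rotation angles as joint optimization parameters, and uses particle swarm optimization to solve for the array geometry quickly. Simulation experiments show that the overall sidelobe level within the scanning range of the optimized geometry is lower than that of the optimal geometry obtained with the maximum sidelobe level as the fitness function, and also lower than that of the optimal geometry obtained by optimizing only the ring radii, which demonstrates the feasibility and effectiveness of the optimization method.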

13.

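Two-Dimensional DOA Estimation of Sound Sources Based on Arbitrary Microphone Arrays (Total citations: 7; self-citations: 0; citations by others: 7)

Li Guangxia (李广侠), Peng Qicong (彭启琮), Shao Huaizong (邵怀宗), Lin Jingran (林静然) 《通信学报 (Journal on Communications)》 2005,26(8):129-133

Sound source localization based on microphone arrays is studied. The far-field signal model for microphone arrays is analyzed and, combined with subspace methods, a two-dimensional (azimuth and elevation) DOA estimation algorithm for sound sources, the 2D-MUSIC algorithm, is derived; it is applicable to microphone arrays of arbitrary topology. Using MATLAB simulation, several typical array structures are compared, and two new three-dimensional microphone arrays are proposed: the uniform spherical array and the three-dimensional uniform linear array. Simulation results show that the proposed DOA estimation algorithm achieves good DOA estimates with two-dimensional uniform circular arrays, three-dimensional uniform spherical arrays, and three-dimensional uniform linear arrays.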

14.

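Microphone Array Noise Cancellation Using Phase Time-Frequency Masking

He Li (何礼), Zhou Yi (周翊), Liu Hongqing (刘宏清) 《信号处理 (Journal of Signal Processing)》 2018,34(12):1490-1498

This paper proposes a microphone array noise cancellation method for conditions in which interfering sources and background noise are present. A microphone array suppresses background noise by beamforming, which enhances the target source in the direction specified by the steering vector. However, existing beamforming algorithms cannot estimate the steering vector accurately when interfering sources are present. To address this, the paper proposes a microphone array noise cancellation method based on the phase of the cross-power spectrum of the audio signals. First, the steering vector is estimated using a phase time-frequency mask of the audio signals and beamforming is performed with it, which effectively suppresses interfering sources and background noise. Then, using the speech presence probability, a maximum likelihood method estimates the power spectral density of the interference and noise remaining in the beamformed signal, and post-processing further suppresses the residual interference and noise. Experimental results show that, in the presence of interfering sources and background noise, the proposed method cancels microphone array noise effectively and outperforms the baseline methods on all performance metrics.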

15.

Speech frame recognition based on less shift sensitive wavelet filter banks

Hamid Reza Tohidypour Amin Banitalebi-Dehkordi 《Signal, Image and Video Processing》2016,10(4):633-637

The wavelet transform possesses multi-resolution property and high localization performance; hence, it can be optimized for speech recognition. In our previous work, we showed that redundant wavelet filter bank parameters work better in speech recognition tasks, because they are much less shift sensitive than those of the critically sampled discrete wavelet transform (DWT). In this paper, three types of wavelet representations are introduced, including features based on the dual-tree complex wavelet transform (DT-CWT), the perceptual dual-tree complex wavelet transform, and the four-channel double-density discrete wavelet transform (FCDDDWT). Then, appropriate filter values for DT-CWT and FCDDDWT are proposed. The performances of the proposed wavelet representations are compared in a phoneme recognition task using a special form of time-delay neural network. Performance evaluations confirm that dual-tree complex wavelet filter banks outperform conventional DWT in speech recognition systems. The proposed perceptual dual-tree complex wavelet filter bank results in up to approximately 9.82% recognition rate increase, compared to the critically sampled two-channel wavelet filter bank.

16.

Self-localizing dynamic microphone arrays

P. Aarabi 《IEEE transactions on systems, man and cybernetics. Part C, Applications and reviews》2002,32(4):474-484

This paper introduces a mechanism for localizing a microphone array when the location of sound sources in the environment is known. Using the proposed spatial observability function based microphone array integration technique, a maximum likelihood estimator for the correct position and orientation of the array is derived. This is used to localize and track a microphone array with a known and fixed geometrical structure, which can be viewed as the inverse sound localization problem. Simulations using a two-element dynamic microphone array illustrate the ability of the proposed technique to correctly localize and estimate the orientation of the array even in a very reverberant environment. Using 1 s male speech segments from three speakers in a 7 m by 6 m by 2.5 m simulated environment, a 30 cm inter-microphone distance, and PHAT histogram SLF generation, the average localization error was approximately 3 cm with an average orientation error of 19°. The same simulation configuration but with 4 s speech segments results in an average localization error less than 1 cm, with an average orientation error of approximately 2°. Experimental examples illustrate localizations for both stationary and dynamic microphone pairs.

17.

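A Study of Speech Enhancement Methods Based on Microphone Arrays

Dong Ming (董明), Fang Yuan (方元) 《电声技术 (Audio Engineering)》 2008,32(3):44-48

By analyzing and processing the multichannel speech signals it picks up, a microphone array can markedly improve speech quality, remove background noise, and increase speech intelligibility, and it has become an important research area in speech signal enhancement. An adaptive beamforming method based on microphone arrays is presented; the method adopts a GSC structure with an optimal post-filtering algorithm based on TF-GSC. Simulation results show that this adaptive beamformer cancels interference well, is insensitive to gain and position errors of the array elements, and achieves good speech enhancement.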

18.

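Microphone Array Beamforming Methods for Speech Communication and Interaction

Pan Chao (潘超), Huang Gongping (黄公平), Chen Jingdong (陈景东) 《信号处理 (Journal of Signal Processing)》 2020,36(6):804-815

Immersive voice communication and intelligent voice interaction both face the challenge of high-fidelity, long-distance sound capture in complex acoustic environments. An effective way to address it is to use a microphone array, or multichannel sound capture system, made up of multiple microphone sensors. The core of such a system is signal processing: joint processing of the spatially sampled sound field in the time, space, and frequency domains provides functions such as source direction finding and localization, signal enhancement, noise suppression, reverberation suppression, source separation, and sound field parameter estimation. Of the many microphone array signal processing methods, beamforming is the most studied and the most widely used. This paper gives a brief survey of the principles, progress, and currently common methods of microphone array beamforming, covering delay-and-sum, superdirective, differential, orthogonal series expansion, Kronecker, and adaptive beamforming methods. The paper focuses on the principles, mechanisms, and architectures of the methods; readers interested in the details of specific algorithm implementations can consult the corresponding references.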

19.

Performance evaluation of sequentially combined heterogeneous feature streams for Hindi speech recognition system

R. K. Aggarwal M. Dave 《Telecommunication Systems》2013,52(3):1457-1466

State-of-the-art automatic speech recognition (ASR) systems follow a well established statistical paradigm, that of parameterization of speech signals (a.k.a. feature extraction) at front-end and likelihood evaluation of feature vectors at back-end. For feature extraction, Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP) are the two dominant signal processing methods, which have been used mainly in ASR. Although the effects of both techniques have been analyzed individually, it is not known whether any combination of the two can produce an improvement in the recognition accuracy or not. This paper presents an investigation on the possibility to integrate different types of features such as MFCC, PLP and gravity centroids to improve the performance of ASR in the context of Hindi language. Our experimental results show a significant improvement in case of such few combinations when applied to medium size lexicons in typical field conditions.

20.

An Adaptive Non Reference Anchor Array Framework for Audio Retrieval in Teleconferencing Environment

Karan Nathwani Arpit Shukla Shubham Khunteta Rajesh M. Hegde 《Journal of Signal Processing Systems》2014,74(1):91-102

In this paper, an adaptive framework for audio retrieval in live teleconferencing environments with multiple participants is proposed. The framework uses a non reference anchor array (NRA) to capture the interfering speech sources, in addition to the primary array that captures the speech source of interest (SOI). A linearly constrained-minimum variance (LC-MV) beamformer is used herein such that the signal coming from the look direction is preserved while interferences coming from the non look direction are nulled. Additionally, the reverberant component of the speech acquired by this framework is removed by a novel method that uses the linear prediction (LP) residual cepstrum. This method does not require the computation of the acoustic impulse response (AIR) of the teleconferencing room and hence is computationally efficient. The NRA framework is therefore able to remove correlated noise coming from the direction of the SOI and also to dereverberate the noise-free signal. The performance of the proposed framework is evaluated by conducting experiments on clean speech acquisition from distant microphone arrays. Experiments on distant speech recognition are also conducted using the TIMIT and MONC databases. Experimental results obtained from the proposed framework indicate a reasonable improvement over correlation, subspace and standard minimum variance beamforming methods. The application of the framework in audio retrieval in a live teleconferencing environment with multiple participants is also discussed.