Similar Documents
20 similar documents found (search time: 437 ms)
1.
In this paper, we propose a new speech enhancement system that integrates a perceptual filterbank with minimum mean square error–short time spectral amplitude (MMSE–STSA) estimation, modified according to speech presence uncertainty. The perceptual filterbank was designed by adjusting an undecimated wavelet packet decomposition (UWPD) tree according to the critical bands of the psycho-acoustic model of the human auditory system. The MMSE–STSA estimation (modified according to speech presence uncertainty) was used to estimate speech in the undecimated wavelet packet domain. The perceptual filterbank provides a good auditory representation (sufficient frequency resolution), good perceptual quality of speech and low computational load. The MMSE–STSA estimator relies on the a priori SNR, its key parameter, which was estimated using the “decision directed” method; when correctly tuned, this method provides a trade-off between noise reduction and signal distortion. The experiments were conducted for various noise types. The results of the proposed method were compared with those of other popular methods, Wiener estimation and MMSE–log spectral amplitude (MMSE–LSA) estimation, in the frequency domain. To test the performance of the proposed speech enhancement system, three objective quality measures (SNR, segmental SNR and Itakura–Saito distance (ISd)) were computed for various noise types and SNRs. The experimental results and objective quality measures confirmed the performance of the proposed system: it provided sufficient noise reduction and good intelligibility and perceptual quality, without causing considerable signal distortion or musical background noise.
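As a rough illustration of the two building blocks named above (not code from the paper), the sketch below implements the textbook decision-directed a priori SNR estimate and the classical MMSE–STSA gain it feeds; the smoothing factor `alpha`, the flooring constants and the clipping of `v` are assumptions added for numerical safety.

```python
import numpy as np
from scipy.special import i0, i1  # modified Bessel functions of the first kind

def decision_directed_snr(noisy_power, noise_power, prev_clean_power, alpha=0.98):
    """Decision-directed a priori SNR estimate for one frame of spectral bins."""
    gamma = noisy_power / np.maximum(noise_power, 1e-12)            # a posteriori SNR
    xi = (alpha * prev_clean_power / np.maximum(noise_power, 1e-12)
          + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0))           # a priori SNR
    return xi, gamma

def mmse_stsa_gain(xi, gamma):
    """Classical MMSE-STSA gain (without the speech-presence-uncertainty weighting)."""
    v = np.minimum(xi * gamma / (1.0 + xi), 500.0)                  # clip to avoid overflow
    return (np.sqrt(np.pi * v) / (2.0 * gamma)) * np.exp(-v / 2.0) * \
           ((1.0 + v) * i0(v / 2.0) + v * i1(v / 2.0))
```

A larger `alpha` smooths the a priori SNR more strongly, which suppresses musical noise at the cost of slower tracking, the trade-off the abstract refers to.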

2.
In this paper, an intelligent speaker identification system based on speech/voice signals is presented. The study combines adaptive feature extraction and classification using optimum wavelet entropy parameter values, which are obtained from measured Turkish speech/voice signal waveforms recorded with a speech experimental set. A genetic wavelet adaptive network based fuzzy inference system (GWANFIS) model is developed. This model consists of three layers: a genetic algorithm, a wavelet layer and an adaptive network based fuzzy inference system (ANFIS). The genetic algorithm layer is used for selecting the feature extraction method and for obtaining the optimum wavelet entropy parameter values. One of eight different feature extraction methods is selected by the genetic algorithm; the alternatives are wavelet decomposition alone and wavelet decomposition combined with the short time Fourier transform or with the Born–Jordan, Choi–Williams, Margenau–Hill, Wigner–Ville, Page and Zhao–Atlas–Marks time–frequency representations. The wavelet layer performs the optimum feature extraction in the time–frequency domain and is composed of wavelet decomposition and wavelet entropies. The ANFIS approach is used to evaluate the fitness function of the genetic algorithm and to classify speakers. The performance of the developed system has been evaluated using noisy Turkish speech/voice signals. The test results showed that the system is effective in detecting real speech signals; the correct classification rate is about 91% for speaker classification.

3.
Speech and speaker recognition is an important task for computer systems. In this paper, an expert speaker recognition system based on optimum wavelet packet entropy is proposed, using real speech/voice signals. The study combines a new feature extraction approach with classification using optimum wavelet packet entropy parameter values, which are obtained from measured real English speech/voice signal waveforms recorded with a speech experimental set. A genetic-wavelet packet-neural network (GWPNN) model is developed. GWPNN includes three layers: a genetic algorithm, a wavelet packet layer and a multi-layer perceptron. The genetic algorithm layer of GWPNN is used for selecting the feature extraction method and obtaining the optimum wavelet entropy parameter values. One of four different feature extraction methods is selected by the genetic algorithm; the alternatives are wavelet packet decomposition alone and wavelet packet decomposition combined with the short-time Fourier transform, the Born–Jordan time–frequency representation or the Choi–Williams time–frequency representation. The wavelet packet layer performs the optimum feature extraction in the time–frequency domain and is composed of wavelet packet decomposition and wavelet packet entropies. The multi-layer perceptron of GWPNN, a feed-forward neural network, is used to evaluate the fitness function of the genetic algorithm and to classify speakers. The performance of the developed system has been evaluated using noisy English speech/voice signals. The test results showed that the system was effective in detecting real speech signals; the correct classification rate was about 85% for speaker classification.
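For orientation only, a wavelet packet entropy feature of this general kind can be computed as the Shannon entropy of normalized sub-band energies; the wavelet, decomposition level and entropy definition below are assumptions, since the abstract does not spell out its exact parameterization.

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_packet_entropy(frame, wavelet="db4", level=4):
    """Shannon entropy of the normalized wavelet packet sub-band energies of one frame."""
    wp = pywt.WaveletPacket(data=frame, wavelet=wavelet, maxlevel=level)
    nodes = wp.get_level(level, order="freq")             # sub-bands ordered by frequency
    energy = np.array([np.sum(np.square(n.data)) for n in nodes])
    p = energy / np.maximum(energy.sum(), 1e-12)           # normalized energy distribution
    return -np.sum(p * np.log(p + 1e-12))                  # Shannon entropy
```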

4.
This paper presents a novel method for enhancing the independent components of a mixed speech signal segregated by the frequency domain independent component analysis (FDICA) algorithm. The enhancement algorithm proposed here is based on maximum a posteriori (MAP) estimation of the speech spectral components, using the generalized Gaussian distribution (GGD) as the statistical model for the time–frequency series of speech (TFSS). The proposed MAP estimator has been used and evaluated as a post-processing stage for the separation of convolutive mixtures of speech signals by the fixed-point FDICA algorithm. The combination of the separation algorithm with the proposed enhancement algorithm provides better separation performance under both reverberant and non-reverberant conditions.

5.
Complex AM and FM signal models can be used for parametric modeling of speech signals. The complex AM signal model has been found to be suitable for voiced speech phonemes, whereas the complex FM signal model can be used to represent unvoiced speech phonemes. This article explains the basic principles of parameter estimation for these two models and presents techniques for fast on-line processing of speech data and automated model fitting.
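In their generic textbook form (not necessarily the article's exact parameterization), the two models can be written as a slowly varying envelope on a fixed carrier (AM) and a constant amplitude with a time-varying instantaneous frequency (FM):

```python
import numpy as np

def complex_am(envelope, f0, fs):
    """Complex AM component: slowly varying real envelope times a fixed-frequency carrier."""
    t = np.arange(len(envelope)) / fs
    return envelope * np.exp(1j * 2.0 * np.pi * f0 * t)

def complex_fm(a0, f_inst, fs):
    """Complex FM component: constant amplitude a0, time-varying instantaneous frequency."""
    phase = 2.0 * np.pi * np.cumsum(f_inst) / fs          # integrate f_inst to get the phase
    return a0 * np.exp(1j * phase)
```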

6.
7.
This paper presents a new approximate Bayesian estimator for enhancing a noisy speech signal. The speech model is assumed to be a Gaussian mixture model (GMM) in the log-spectral domain, in contrast to most current models, which work in the frequency domain. Exact signal estimation is computationally intractable, so we derive three approximations to make the estimation efficient. The Gaussian approximation transforms the log-spectral domain GMM into the frequency domain using a minimal Kullback-Leibler (KL) divergence criterion. The frequency domain Laplace method computes the maximum a posteriori (MAP) estimator for the spectral amplitude; correspondingly, the log-spectral domain Laplace method computes the MAP estimator for the log-spectral amplitude. Furthermore, gain and noise spectrum adaptation are implemented using the expectation-maximization (EM) algorithm within the GMM under the Gaussian approximation. The proposed algorithms are evaluated by applying them to enhance speech corrupted by speech-shaped noise (SSN). The experimental results demonstrate that the proposed algorithms offer an improved signal-to-noise ratio, a lower word recognition error rate, and less spectral distortion.

8.
The synchroextracting transform (SET) is a recently developed time-frequency analysis (TFA) method that aims to achieve a highly concentrated TF representation. However, SET suffers from two drawbacks. First, it is based on the assumption of constant-amplitude, linear-frequency-modulation signals, so it is unsatisfactory for strongly amplitude-modulated and frequency-modulated (AM-FM) signals. Second, it does not allow perfect signal reconstruction, which leads to large reconstruction errors when addressing fast-varying signals. To tackle these problems, this paper first presents some theoretical analysis of the SET method, including the existence of the fixed squeeze frequency and the performance of the instantaneous frequency (IF) estimator and of the SET reconstruction. Then, a new TFA method, named the synchroextracting chirplet transform (SECT), is proposed, which sharpens the TF representation by extracting the TF points satisfying the IF equation while retaining an excellent signal reconstruction ability. Numerical experiments on simulated and real signals demonstrate the effectiveness of the SECT method.

9.
Speaker identification for Chinese whispered speech based on instantaneous frequency estimation and feature mapping
Whispered speech is a weak speech signal that differs from normally phonated speech; when a speaker identification system trained on normal speech is tested with whispered speech, its performance degrades sharply. Based on the amplitude-modulation/frequency-modulation (AM-FM) model of speech production, this paper computes the instantaneous frequency of the speech signal using multi-band demodulation analysis (MDA) and the energy separation algorithm (ESA) and uses it as a speech feature. Then, under the assumption that whispered and normal speech come from different channels, feature mapping is applied to the speech parameters before training and recognition to reduce the influence of the channel on the system. Experiments show that, compared with conventional MFCC parameters, adding feature mapping improves the identification rate of the system, and the instantaneous frequency estimation (IFE) feature outperforms MFCC in both identification rate and robustness.
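The energy separation step can be sketched with the discrete Teager-Kaiser operator and the DESA-2 demodulation formulas; this is a generic single-band illustration rather than the paper's full multi-band demodulation analysis, and the flooring/clipping constants are assumptions for numerical safety.

```python
import numpy as np

def teager(x):
    """Discrete Teager-Kaiser energy operator: psi[x](n) = x(n)^2 - x(n-1)*x(n+1)."""
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def desa2(x, fs):
    """DESA-2 energy separation: instantaneous frequency (Hz) and amplitude envelope."""
    x = np.asarray(x, dtype=float)
    y = x[2:] - x[:-2]                                   # symmetric difference, centered on x[1:-1]
    psi_x = teager(x)[1:-1]                              # aligned with x[2:-2]
    psi_y = teager(y)                                    # aligned with x[2:-2]
    ratio = np.clip(1.0 - psi_y / (2.0 * np.maximum(psi_x, 1e-12)), -1.0, 1.0)
    omega = 0.5 * np.arccos(ratio)                       # digital frequency, rad/sample
    amp = 2.0 * psi_x / np.sqrt(np.maximum(psi_y, 1e-12))
    return omega * fs / (2.0 * np.pi), amp
```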

10.
A new approach is proposed for representing a time-limited, essentially bandpass signal x(t) by a set of discrete frequency values: the locations along the frequency axis at which the real and/or imaginary parts of the Fourier transform of x(t) cross certain levels (especially the zero level). Analogously, invoking time-frequency duality, a set of time instants denoting the zero/level crossings of a waveform x(t) can be used to represent a bandlimited spectrum X(f). The proposed signal representation is based on a simple bandpass signal model that exploits prior knowledge of the bandwidth/timewidth of the signal. We call it a Sum-of-Sincs (SOS) model, where Sinc stands for the familiar sin(x)/x function. Given the discrete frequency/time locations, the signal x(t) or the spectrum X(f) can be accurately reconstructed by solving a simple eigenvalue or least squares problem. Using this approach as the basis, we propose an analysis/synthesis algorithm to decompose and represent complex multicomponent signals, such as speech, over the entire time-frequency region. The proposed signal representation is an alternative to standard analog-to-discrete conversion based on the sampling theorem and, in principle, possesses some of the desirable attributes of signal representation in natural sensory systems.

11.
A method utilizing single channel recordings to blindly separate multiple components overlapping in the time and frequency domains is proposed in this paper. Based on a time-varying AR model, the instantaneous frequency and amplitude of each signal component are estimated, and the component separation is thereby achieved. By expanding the time-varying parameters of the AR model over prolate spheroidal sequences used as basis functions, the method turns the linear time-varying parameter estimation problem into a linear time-invariant one, and the parameters are then estimated by a recursive algorithm. The computation of the method is simple, and no prior knowledge of the signals is needed. Simulation results demonstrate the validity and excellent performance of the method.
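The basis-expansion idea can be illustrated as follows: expanding each AR coefficient over a few discrete prolate spheroidal sequences makes the model linear in a set of time-invariant coefficients, which are estimated here by ordinary least squares instead of the paper's recursive algorithm; the model order, number of basis functions and time-bandwidth product NW are assumptions.

```python
import numpy as np
from scipy.signal.windows import dpss  # discrete prolate spheroidal sequences

def tvar_fit(x, order=4, n_basis=3, nw=2.5):
    """Fit a time-varying AR model a_i(n) = sum_k c[i, k] * b_k(n) by least squares."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    basis = dpss(N, NW=nw, Kmax=n_basis)                 # shape (n_basis, N)
    rows = [[basis[k, n] * x[n - i]
             for i in range(1, order + 1) for k in range(n_basis)]
            for n in range(order, N)]
    c, *_ = np.linalg.lstsq(np.asarray(rows), x[order:], rcond=None)
    coeffs = c.reshape(order, n_basis)                   # c[i-1, k]
    return coeffs @ basis                                # a_i(n), shape (order, N)
```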

12.
To improve the speech recognition rate, an improved MFCC parameter extraction method is proposed. Exploiting the high resolution of the wavelet packet transform and the high-frequency weighting of speech, the method extracts a new feature parameter on the basis of the conventional MFCC parameters. The new parameter partitions the frequencies of the speech signal more finely, reduces spectral distortion more stably, and lowers the signal noise to a certain extent. Finally, a Gaussian mixture model (GMM) is used for speaker recognition; experiments show that the new feature parameter achieves a good recognition rate.
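One plausible form of such a wavelet-packet-based cepstral feature is sketched below (sub-band log-energies followed by a DCT); the wavelet, decomposition level and number of retained coefficients are assumptions, since the paper's exact construction is not reproduced here.

```python
import numpy as np
import pywt                      # PyWavelets
from scipy.fft import dct

def wp_cepstral_features(frame, wavelet="db4", level=4, n_ceps=13):
    """MFCC-like feature: wavelet packet sub-band log-energies decorrelated by a DCT."""
    wp = pywt.WaveletPacket(data=frame, wavelet=wavelet, maxlevel=level)
    nodes = wp.get_level(level, order="freq")            # 2**level sub-bands, by frequency
    log_e = np.log(np.array([np.sum(np.square(n.data)) for n in nodes]) + 1e-10)
    return dct(log_e, type=2, norm="ortho")[:n_ceps]     # keep the low-quefrency coefficients
```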

13.
The complex spectral structure of speech signals means that their short-time spectral distribution cannot be described accurately by a single probability density function (PDF). Accordingly, this paper proposes a new speech enhancement method that models the amplitude spectrum of the speech signal with a super-Gaussian mixture model. First, the prior distribution of the speech amplitude spectrum is modeled with a super-Gaussian mixture model, which describes the multi-class characteristics of speech better than the traditional single model. Then, during enhancement, the PDFs of the mixture components and their weights are updated adaptively, overcoming the difficulty traditional models have in tracking dynamic changes in the speech distribution. Simulation results show that, compared with traditional short-time spectral estimation algorithms, the proposed algorithm achieves substantially better noise suppression, and the subjective perceptual quality of the enhanced speech is also clearly improved.

14.
This paper presents, on the basis of a rigorous mathematical formulation, a multicomponent sinusoidal model that allows an asymptotically exact reconstruction of nonstationary speech signals, regardless of their duration and without any limitation in the modeling of voiced, unvoiced, and transitional segments. The proposed approach applies the Hilbert transform to obtain an amplitude signal from which an AM component is extracted by filtering, so that the residue can then be iteratively processed in the same way. This technique yields a multicomponent AM-FM model in which the number of components (iterations) may be chosen arbitrarily. Additionally, the instantaneous frequencies of these components can be calculated with a given accuracy by segmentation of the phase signals. The validity of the proposed approach has been proven by applications to both synthetic signals and natural speech. Several comparisons show that this approach almost always outperforms current best practices and does not need the complex filter optimizations required by other techniques.
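A single iteration of the envelope-extraction idea might look like the sketch below (the instantaneous amplitude from the Hilbert transform is low-pass filtered into an AM component, leaving a residue for the next iteration); the cutoff frequency, filter length and zero-phase filtering are assumptions, not the paper's optimized procedure.

```python
import numpy as np
from scipy.signal import hilbert, firwin, filtfilt

def extract_am_component(x, fs, cutoff_hz=50.0, numtaps=129):
    """One iteration: Hilbert envelope -> low-pass AM component + residue."""
    amplitude = np.abs(hilbert(x))                       # instantaneous amplitude
    b = firwin(numtaps, cutoff_hz, fs=fs)                # linear-phase low-pass FIR
    am = filtfilt(b, [1.0], amplitude)                   # zero-phase filtering
    return am, amplitude - am                            # AM component and residue
```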

15.
Considering a real signal as the sum of a number of sinusoidal components in the presence of additive noise, the maximum windowed likelihood (MWL) criterion is introduced and used to construct an adaptive algorithm that estimates the amplitude and frequency of these components. The amplitudes, phases and frequencies are assumed to be slowly time varying. Employing MWL, the adaptive algorithm is obtained in two steps. First, assuming some initial values for the frequency of each component, a closed form is derived to estimate the amplitudes. Then, the gradient of the MWL is used to adaptively track the frequencies, using the latest amplitude values. The proposed algorithm has a parallel structure in which each branch estimates the parameters of one component. The proposed multicomponent phase locked loop (MPLL) algorithm is implemented with low-complexity blocks and can be adjusted for use in different conditions. The mean squared error of the algorithm is studied to analyze the effect of the window length and type and of the step size. Simulations have been conducted to illustrate the efficiency and performance of the algorithm in different conditions, including the effect of initialization, the frequency resolution, chirp components, components during frequency crossover, and speech signals. The simulations show that the method efficiently tracks slowly time-varying components of signals such as voiced speech segments.

16.
A speech enhancement method based on the wavelet transform and Kalman filtering
For speech signals corrupted by additive noise, an effective speech enhancement method is proposed that applies Kalman filtering in the wavelet transform domain. Problems encountered in practical processing, such as the dyadic wavelet transform, filter parameter estimation, and divergence of the Kalman filter, are analyzed. The enhancement performance is evaluated using the signal-to-noise ratio. Simulation experiments show that the method is effective when the additive noise is either white Gaussian noise or colored noise.
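For illustration only, a textbook Kalman filter for an AR(p) clean-speech model observed in additive white noise is sketched below; in the paper's setting it would be applied per wavelet sub-band, and the estimation of the AR coefficients and noise variances, as well as the divergence handling discussed in the abstract, are omitted here.

```python
import numpy as np

def kalman_denoise(y, ar_coeffs, q, r):
    """Kalman filtering of noisy samples y under an AR(p) model s(n) = sum a_i s(n-i) + w(n)."""
    p = len(ar_coeffs)
    F = np.zeros((p, p)); F[0, :] = ar_coeffs; F[1:, :-1] = np.eye(p - 1)
    H = np.zeros((1, p)); H[0, 0] = 1.0
    Q = np.zeros((p, p)); Q[0, 0] = q                    # excitation (process) noise
    x, P = np.zeros((p, 1)), np.eye(p)
    out = np.empty(len(y))
    for n, yn in enumerate(y):
        x, P = F @ x, F @ P @ F.T + Q                    # predict
        K = P @ H.T / (H @ P @ H.T + r)                  # Kalman gain
        x = x + K * (yn - (H @ x)[0, 0])                 # update with the innovation
        P = (np.eye(p) - K @ H) @ P
        out[n] = x[0, 0]
    return out
```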

17.
Since Hermite–Gaussian (HG) functions provide an orthonormal basis with the most compact time–frequency supports (TFSs), they are ideally suited for time–frequency component analysis of finite energy signals. For a signal component whose TFS tightly fits into a circular region around the origin, the HG function expansion provides an optimal representation using the fewest basis functions. However, for signal components whose TFS has a non-circular shape away from the origin, straightforward expansions require an excessively large number of HGs, resulting in noise fitting. Furthermore, for closely spaced signal components with non-circular TFSs, direct application of the HG expansion cannot provide reliable estimates of the individual signal components. To alleviate these problems, using expectation maximization (EM) iterations, we propose a fully automated pre-processing technique which identifies and transforms the TFSs of individual signal components into circular regions centered around the origin, so that reliable estimates of the signal components can be obtained. The HG expansion order for each signal component is determined by a robust estimation technique. The estimated components are then post-processed to transform their TFSs back to their original positions. The proposed technique can be used to analyze signals with overlapping components as long as the overlapped supports of the components have an area smaller than the effective support of a Gaussian atom, which has the smallest time-bandwidth product; it is shown that if the area of the overlap region is larger than this threshold, the components cannot be uniquely identified. Results obtained on synthetic and real signals demonstrate the effectiveness of the proposed time–frequency analysis technique under severe noise conditions.

18.
We discuss the use of low-dimensional physical models of the voice source for speech coding and processing applications. A class of waveform-adaptive dynamic glottal models and parameter identification procedures are illustrated. The model and the identification procedures are assessed by addressing signal transformations on recorded speech, achievable by fitting the model to the data and then acting on the physically oriented parameters of the voice source. The class of models proposed provides, in principle, a tool both for the estimation of glottal source signals and for encoding the speech signal for transformation purposes. The application of this model to time stretching and to fundamental frequency control (pitch shifting) is also illustrated. The experiments show that copy synthesis is perceptually very similar to the target, and that time stretching and “pitch extrapolation” effects can be obtained by simple control strategies.

19.
Application of the three-phase PWM waveform generator SA866AE in AC–AC converters
The operating principle of the dedicated three-phase PWM chip SA866AE is introduced, and a three-phase low-frequency sinusoidal signal generator for use in AC–AC converters is developed around this chip. Experimental results show that the signal generator controlled by the SA866AE has a simple structure, good amplitude and phase symmetry, convenient and accurate parameter adjustment, and reliable performance.

20.
Numerous efforts have focused on reducing the impact of noise on the performance of various speech systems such as speech coding, speech recognition and speaker recognition. These approaches consider alternative speech features, improved speech modeling, or alternative training for acoustic speech models. In this paper, we propose a new speech enhancement technique which integrates a newly proposed wavelet transform, which we call the stationary bionic wavelet transform (SBWT), with the maximum a posteriori estimator of the magnitude-squared spectrum (MSS-MAP). The SBWT is introduced in order to solve the perfect reconstruction problem associated with the bionic wavelet transform. The MSS-MAP estimation was used to estimate speech in the SBWT domain. The experiments were conducted for various noise types and different speech signals. The results of the proposed technique were compared with those of other popular methods, such as Wiener filtering and MSS-MAP estimation in the frequency domain. To test the performance of the proposed speech enhancement system, four objective quality measures (signal-to-noise ratio (SNR), segmental SNR, Itakura–Saito distance and perceptual evaluation of speech quality) were computed for various noise types and SNRs. The experimental results and objective quality measures confirmed the performance of the proposed speech enhancement technique: it provided sufficient noise reduction and good intelligibility and perceptual quality, without causing considerable signal distortion or musical background noise.
