Similar Documents
20 similar documents retrieved.
1.
A monaural speech separation/enhancement technique based on non-negative Tucker decomposition (NTD) is introduced in this paper. In the proposed work, the effect of a sparsity regularization factor on the separation of the mixed signal is included in the generalized cost function of NTD. Using the proposed algorithm, the vector components of both the target and the mixed signal can be exploited and used for the separation of any monaural mixture. Experiments were performed on monaural data generated by mixing the speech signals of two speakers, and by mixing noise and speech signals, using the TIMIT and NOISEX-92 datasets. The separation results are compared with other existing algorithms in terms of the correlation of the separated signal with the original signal, signal-to-distortion ratio, perceptual evaluation of speech quality, and short-time objective intelligibility. Further, to obtain more conclusive evidence of separation ability, speech recognition with the Kaldi toolkit was also performed, and the recognition results are compared in terms of word error rate (WER) using MFCC-based features. Results show that the average WER improvement of the proposed algorithm over the nearest-performing algorithm is up to 2.7% for mixed speech of two speakers and 1.52% for noisy speech input.
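The Tucker decomposition of the paper is not reproduced here; as a loose illustration of the same family of ideas, the following is a minimal sketch of supervised separation with plain non-negative matrix factorisation: spectral bases are learned from each source's training spectrogram, activations are fitted on the mixture with the bases fixed, and Wiener-style masks rebuild the sources. All names, ranks, and iteration counts are illustrative.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-10):
    """Basic NMF with multiplicative updates minimising the Frobenius error, V ~ W @ H."""
    rng = np.random.default_rng(0)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def separate(V_mix, W1, W2, n_iter=200, eps=1e-10):
    """Fix the concatenated bases, fit activations on the mixture, then apply Wiener-style masks."""
    W = np.hstack([W1, W2])
    rng = np.random.default_rng(1)
    H = rng.random((W.shape[1], V_mix.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V_mix) / (W.T @ W @ H + eps)
    V1 = W1 @ H[:W1.shape[1]]
    V2 = W2 @ H[W1.shape[1]:]
    mask1 = V1 / (V1 + V2 + eps)
    return mask1 * V_mix, (1 - mask1) * V_mix

# Usage (V_src1, V_src2, V_mix are magnitude spectrograms, shape: freq x frames):
# W1, _ = nmf(V_src1, rank=30); W2, _ = nmf(V_src2, rank=30)
# V_hat1, V_hat2 = separate(V_mix, W1, W2)
```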

2.
At present, the two main classes of methods for blind source separation (BSS) of speech signals are frequency-domain independent component analysis (FDICA) and sparsity-based time-frequency masking (TF masking). This work combines the strengths of both: the result of TF masking is used to initialize FDICA, which speeds up FDICA's convergence and at the same time avoids the permutation ambiguity. In addition, a new post-processing method for sparsity-based FDICA BSS is proposed: local-minimum-ratio-controlled (LMRC) spectral subtraction, which controls musical noise more effectively than conventional post-processing such as TF masking and Wiener filtering and thereby improves separation performance. Experiments on both synthetic data and real recordings verify the effectiveness of the proposed method.
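The LMRC spectral subtraction is specific to the paper; the sketch below only shows ordinary magnitude spectral subtraction as a post-processing step, assuming a separated-but-still-noisy signal `y`, a noise estimate taken from its leading frames, and illustrative parameter values.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(y, fs, noise_frames=10, alpha=2.0, beta=0.02):
    """Subtract an estimated noise magnitude spectrum; floor the result to limit musical noise."""
    _, _, Y = stft(y, fs=fs, nperseg=512)
    mag, phase = np.abs(Y), np.angle(Y)
    noise_mag = mag[:, :noise_frames].mean(axis=1, keepdims=True)   # noise estimate from leading frames
    clean_mag = np.maximum(mag - alpha * noise_mag, beta * mag)     # over-subtraction with spectral floor
    _, y_hat = istft(clean_mag * np.exp(1j * phase), fs=fs, nperseg=512)
    return y_hat
```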

3.
Looking at the speaker's face can help a listener hear a speech signal in a noisy environment and extract it from competing sources before identification. This suggests that the visual signals of speech (movements of the visible articulators) could be used in speech enhancement or extraction systems. In this paper, we present a novel algorithm that plugs the audiovisual coherence of speech signals, estimated with statistical tools, into audio blind source separation (BSS) techniques. The algorithm is applied to the difficult and realistic case of convolutive mixtures and works mainly in the frequency (transform) domain, where the convolutive mixture becomes an additive mixture in each frequency channel. Frequency-by-frequency separation is performed by an audio BSS algorithm. The audio and visual information is modeled by a newly proposed statistical model, which is then used to solve the standard source permutation and scale-factor ambiguities encountered in each frequency bin after the audio blind separation stage. The proposed method is shown to be efficient in the case of 2 × 2 convolutive mixtures and offers promising perspectives for extracting a particular speech source of interest from complex mixtures.

4.
In this paper, we propose a novel multicomponent amplitude and frequency modulated (AFM) signal model for parametric representation of speech phonemes. An efficient technique is developed for parameter estimation of the proposed model. The Fourier–Bessel series expansion is used to separate a multicomponent speech signal into a set of individual components. The discrete energy separation algorithm is used to extract the amplitude envelope (AE) and the instantaneous frequency (IF) of each component of the speech signal. Then, the parameter estimation of the proposed AFM signal model is carried out by analysing the AE and IF parts of the signal component. The developed model is found to be suitable for representation of an entire speech phoneme (voiced or unvoiced) irrespective of its time duration, and the model is shown to be applicable for low bit-rate speech coding. The symmetric Itakura–Saito and the root-mean-square log-spectral distance measures are used for comparison of the original and reconstructed speech signals.
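A minimal sketch of the discrete energy separation idea (the DESA-2 variant, which may differ from the exact estimator used in the paper) for extracting the AE and IF of a single AM-FM component:

```python
import numpy as np

def teager(x):
    """Teager energy operator: psi[x](n) = x(n)^2 - x(n-1) * x(n+1)."""
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def desa2(x, eps=1e-12):
    """DESA-2: amplitude envelope and instantaneous frequency (rad/sample) of an AM-FM signal."""
    y = x[2:] - x[:-2]                      # symmetric difference
    psi_x = teager(x)[1:-1]                 # aligned with the Teager energy of the difference
    psi_y = teager(y)
    omega = 0.5 * np.arccos(np.clip(1 - psi_y / (2 * psi_x + eps), -1.0, 1.0))
    amp = 2 * psi_x / (np.sqrt(psi_y) + eps)
    return amp, omega

# Example: a chirp with a slowly varying envelope
# n = np.arange(4000)
# x = (1 + 0.3 * np.sin(2 * np.pi * n / 1000)) * np.cos(0.2 * n + 1e-5 * n ** 2)
# amp, omega = desa2(x)
```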

5.
The HMM (Hidden Markov Model) based MAP and MMSE speech enhancement algorithms are computationally expensive, and the former cannot handle non-stationary noise. Drawing on speech separation methods, a speech enhancement algorithm combining speech separation with HMMs is proposed. The algorithm uses a multi-state, multi-mixture-component HMM suited to non-stationary noise, decodes the joint state of the noisy speech under the speech and noise models, and estimates the speech with the max-model theory from speech separation, thereby avoiding iterative procedures and very costly computations and reducing computational complexity. Experiments show that the algorithm effectively removes both stationary and non-stationary noise, clearly improves PESQ scores, and keeps the running time under control.
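A minimal, heavily simplified sketch of the max-model idea mentioned above: in the log-spectral domain the noisy observation is treated as the element-wise maximum of speech and noise, so once HMM decoding has produced per-frame log-spectral means for speech and noise (the hypothetical arrays `mu_speech` and `mu_noise` below), a per-bin speech estimate follows directly.

```python
import numpy as np

def logmax_estimate(log_noisy, mu_speech, mu_noise):
    """Per-bin speech estimate under the max-model approximation.

    log_noisy, mu_speech, mu_noise: arrays of shape (n_frames, n_bins).
    Where the decoded speech mean dominates, the observation is taken as speech;
    elsewhere the speech-model mean is used (the bin is assumed masked by noise).
    """
    speech_dominant = mu_speech >= mu_noise
    return np.where(speech_dominant, log_noisy, mu_speech)
```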

6.
A new independent component analysis (ICA) formulation called independent vector analysis (IVA) was proposed to solve the permutation problem in convolutive blind source separation (BSS). Instead of running ICA in each frequency bin separately and correcting the disorder with an additional algorithmic scheme afterwards, IVA exploits the dependency among the frequency components of a source and treats them as a multivariate source, modeling it with sparse and spherically (radially) symmetric joint probability density functions (pdfs). In this paper, we compare the speech separation performance of IVA using a group of lp-norm-invariant sparse pdfs in which the value of p, and hence the sparseness, can be controlled. We also derive an IVA algorithm from a nonparametric perspective under the constraints of spherical symmetry and high dimensionality. Simulation results confirm the efficiency of assuming sparseness and spherical symmetry for the speech model in the frequency domain.
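A minimal sketch of frequency-domain IVA with the sparse, spherically symmetric (multivariate Laplacian) prior described above, using a plain natural-gradient update; the learning rate, iteration count, and the missing scaling correction are simplifications, not the paper's algorithm.

```python
import numpy as np

def iva_spherical_laplacian(X, n_iter=200, lr=0.1, eps=1e-8):
    """Natural-gradient IVA with a spherical (multivariate Laplacian) source prior.

    X : complex STFT mixtures of shape (n_freq, n_chan, n_frames).
    Returns the separated STFTs, same shape (scaling ambiguity is not corrected).
    """
    n_freq, n_chan, n_frames = X.shape
    W = np.tile(np.eye(n_chan, dtype=complex), (n_freq, 1, 1))   # one unmixing matrix per bin
    I = np.eye(n_chan)
    for _ in range(n_iter):
        Y = W @ X                                          # current source estimates, per bin
        r = np.sqrt((np.abs(Y) ** 2).sum(axis=0)) + eps    # per-source envelope across all frequencies
        phi = Y / r[None, :, :]                            # score function of the spherical prior
        for f in range(n_freq):
            G = I - (phi[f] @ Y[f].conj().T) / n_frames    # natural-gradient direction
            W[f] += lr * (G @ W[f])
        # (a real implementation would add a scaling / minimal-distortion correction here)
    return W @ X
```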

7.
This paper describes an algorithm that enhances speech by independent vector analysis (IVA) using harmonic frequency dependency for robust speech recognition. While the conventional IVA exploits the full-band uniform dependencies of each source signal, a harmonic clique model is introduced to improve the enhancement performance by modeling strong dependencies among multiples of fundamental frequencies. An IVA-based learning algorithm is derived to consider the non-holonomic constraint and the minimal distortion principle to reduce the unavoidable distortion of IVA, and the minimum power distortionless response beamformer is used as a pre-processing step. In addition, the algorithm compares the log-spectral features of the enhanced speech and observed noisy speech to identify time–frequency segments corrupted by noise and restores those with the cluster-based missing feature reconstruction technique. Experimental results demonstrate that the proposed method enhances recognition performance significantly in noisy environments, especially with competing interference.

8.
陈修凯  陆志华  周宇 《计算机应用》2020,40(7):2137-2141
Most deep-learning-based speech separation and enhancement algorithms use Fourier-transform spectral features as the network input and do not consider the phase information of the speech signal. However, previous studies have shown that phase information is essential for improving speech quality, especially under low signal-to-noise ratio (SNR) conditions. To address this, a speech separation algorithm based on a convolutional encoder-decoder network and gated recurrent units (CED-GRU) is proposed. First, since the raw waveform contains both magnitude and phase information, the raw waveform of the mixed speech signal is taken as the input feature. Second, combining the convolutional encoder-decoder (CED) network with the gated recurrent unit (GRU) network effectively handles the temporal structure of the speech signal. Compared with algorithms based on permutation invariant training (PIT), deep clustering (DC), and deep attractor networks (DAN), the proposed algorithm improves the perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) for male-male, male-female, and female-female mixtures by 1.16 and 0.29, 1.37 and 0.27, 1.08 and 0.30; 0.87 and 0.21, 1.11 and 0.22, 0.81 and 0.24; and 0.64 and 0.24, 1.01 and 0.34, 0.73 and 0.29 percentage points, respectively. The experimental results show that the CED-GRU-based speech separation system has considerable practical value.
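The exact CED-GRU configuration is not given in the abstract; the PyTorch sketch below only illustrates the general shape of such a model: a strided Conv1d encoder on the raw waveform, a GRU over the encoded frames, and a transposed-convolution decoder emitting one waveform per speaker. All layer sizes are illustrative guesses, not the paper's configuration.

```python
import torch
import torch.nn as nn

class CEDGRU(nn.Module):
    """Raw-waveform separator: conv encoder -> GRU -> transposed-conv decoder."""
    def __init__(self, n_speakers=2, channels=64, kernel=16, stride=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, channels, kernel, stride=stride, padding=kernel // 2),
            nn.ReLU(),
        )
        self.gru = nn.GRU(channels, channels, num_layers=2, batch_first=True)
        self.decoder = nn.ConvTranspose1d(channels, n_speakers, kernel,
                                          stride=stride, padding=kernel // 2)

    def forward(self, mixture):                 # mixture: (batch, samples)
        z = self.encoder(mixture.unsqueeze(1))  # (batch, channels, frames)
        z, _ = self.gru(z.transpose(1, 2))      # GRU expects (batch, frames, channels)
        return self.decoder(z.transpose(1, 2))  # (batch, n_speakers, ~samples)

# mix = torch.randn(4, 16000)                  # four one-second mixtures at 16 kHz
# est = CEDGRU()(mix)                          # (4, 2, ~16000) estimated waveforms
```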

9.
A novel technique is developed to separate audio sources from a single mixture. The method is based on decomposing the Hilbert spectrum (HS) of the mixed signal into independent source subspaces. The Hilbert transform combined with empirical mode decomposition (EMD) constitutes the HS, a fine-resolution time-frequency representation of a nonstationary signal. The EMD represents any time-domain signal as the sum of a finite set of oscillatory components called intrinsic mode functions (IMFs). After computing the spectral projections between the mixed signal and the individual IMF components, the projection vectors are used to derive a set of spectrally independent bases by applying principal component analysis (PCA) and independent component analysis (ICA). A k-means clustering algorithm based on the Kullback-Leibler divergence (KLd) is introduced to group the independent basis vectors into the number of component sources inside the mixture. The HS of the mixed signal is projected onto the space spanned by each group of basis vectors, yielding the independent source subspaces. The time-domain source signals are reconstructed by applying the inverse transformation. Experimental results show that the proposed algorithm separates speech and interfering sound from a single mixture.
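A minimal sketch of the first stage only: splitting a signal into IMFs with EMD and computing each IMF's instantaneous amplitude and frequency via the Hilbert transform. It assumes the PyEMD package and omits the projection, PCA/ICA, and clustering stages entirely.

```python
import numpy as np
from scipy.signal import hilbert
from PyEMD import EMD   # pip install EMD-signal

def hilbert_spectrum_components(x, fs):
    """Decompose x into IMFs and return each IMF's amplitude envelope and instantaneous frequency (Hz)."""
    imfs = EMD().emd(x)                                   # (n_imfs, n_samples)
    analytic = hilbert(imfs, axis=-1)
    amplitude = np.abs(analytic)                          # instantaneous amplitude per IMF
    phase = np.unwrap(np.angle(analytic), axis=-1)
    inst_freq = np.diff(phase, axis=-1) * fs / (2 * np.pi)  # instantaneous frequency per IMF
    return imfs, amplitude, inst_freq
```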

10.
Two-microphone separation of speech mixtures.   (Cited by: 1; self-citations: 0; citations by others: 1)
Separation of speech mixtures, often referred to as the cocktail party problem, has been studied for decades. In many source separation tasks, the separation method is limited by the assumption of at least as many sensors as sources. Further, many methods require that the number of signals within the recorded mixtures be known in advance. In many real-world applications, these limitations are too restrictive. We propose a novel method for underdetermined blind source separation using an instantaneous mixing model that assumes closely spaced microphones. Two source separation techniques are combined: independent component analysis (ICA) and binary time-frequency (T-F) masking. By estimating binary masks from the outputs of an ICA algorithm, it is possible to extract basis speech signals from a convolutive mixture in an iterative way. The basis signals are afterwards improved by grouping similar signals. Using two microphones, we can in principle separate an arbitrary number of mixed speech signals. We show separation results for mixtures with as many as seven speech signals under instantaneous conditions. We also show that the proposed method is applicable to segregating speech signals under reverberant conditions, and we compare our proposed method to another state-of-the-art algorithm. The number of source signals is not assumed to be known in advance, and it is possible to maintain the extracted signals as stereo signals.
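A minimal sketch of one pass of the combined idea: instantaneous ICA on the two-channel recording, followed by a binary T-F mask built from the two ICA outputs. The paper's iterative extraction and grouping of basis signals is omitted, and all parameters are illustrative.

```python
import numpy as np
from scipy.signal import stft, istft
from sklearn.decomposition import FastICA

def ica_binary_mask_separation(left, right, fs, nperseg=1024):
    """Two-microphone separation: instantaneous ICA, then binary T-F masking of one channel."""
    X = np.stack([left, right], axis=1)                           # (samples, 2)
    Y = FastICA(n_components=2, max_iter=1000).fit_transform(X)   # two ICA outputs
    _, _, S0 = stft(Y[:, 0], fs=fs, nperseg=nperseg)
    _, _, S1 = stft(Y[:, 1], fs=fs, nperseg=nperseg)
    _, _, L = stft(left, fs=fs, nperseg=nperseg)
    mask = np.abs(S0) > np.abs(S1)                 # bins where the first ICA output dominates
    _, est0 = istft(L * mask, fs=fs, nperseg=nperseg)
    _, est1 = istft(L * ~mask, fs=fs, nperseg=nperseg)
    return est0, est1
```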

11.
12.
This paper presents a new approximate Bayesian estimator for enhancing a noisy speech signal. The speech model is assumed to be a Gaussian mixture model (GMM) in the log-spectral domain, in contrast to most current models, which work in the frequency domain. Exact signal estimation is a computationally intractable problem, so we derive three approximations to make signal estimation efficient. The Gaussian approximation transforms the log-spectral-domain GMM into the frequency domain using a minimum Kullback-Leibler (KL) divergence criterion. The frequency-domain Laplace method computes the maximum a posteriori (MAP) estimator for the spectral amplitude; correspondingly, the log-spectral-domain Laplace method computes the MAP estimator for the log-spectral amplitude. Further, gain and noise-spectrum adaptation are implemented with the expectation-maximization (EM) algorithm within the GMM under the Gaussian approximation. The proposed algorithms are evaluated by applying them to enhance speech corrupted by speech-shaped noise (SSN). The experimental results demonstrate that the proposed algorithms offer an improved signal-to-noise ratio, a lower word recognition error rate, and less spectral distortion.

13.
An effective algorithm is proposed for separating mixtures of speech source signals that are not mutually independent. Using subband decomposition, the mixed signals are split into multiple subband signals, and speech separation is carried out in each subband to obtain separated subband signals. A proposed correlation performance index is then used to identify the subband whose signals are mutually independent, and the separation matrix of that subband is applied as the unmixing matrix to the full mixed signals. Experiments confirm that the algorithm separates correlated speech sources well.
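A minimal sketch of the flow just described: bandpass the mixtures into subbands, run ICA in each subband, score each subband by the residual correlation between its separated outputs (a simple stand-in for the paper's correlation performance index), and apply the best subband's unmixing matrix to the full-band mixtures. Band edges and filter orders are illustrative.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt
from sklearn.decomposition import FastICA

def subband_ica_separation(mixtures, fs, bands=((100, 700), (700, 1800), (1800, 3600))):
    """mixtures: (n_channels, n_samples). Returns full-band separated signals."""
    best_W, best_score = None, np.inf
    for lo, hi in bands:
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        sub = sosfiltfilt(sos, mixtures, axis=1)
        ica = FastICA(n_components=mixtures.shape[0], max_iter=1000)
        y = ica.fit_transform(sub.T).T                       # separated subband signals
        c = np.abs(np.corrcoef(y))
        score = c.sum() - np.trace(c)                        # off-diagonal correlation of the outputs
        if score < best_score:
            best_score, best_W = score, ica.components_      # unmixing matrix of the best subband
    return best_W @ mixtures
```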

14.
刘艳  倪万顺 《计算机应用》2015,35(3):868-871
Front-end noise processing directly affects the accuracy and stability of speech recognition. Since the signal separated by wavelet denoising algorithms is not the optimal estimate of the original signal, a bionic wavelet transform (BWT) denoising algorithm based on subband spectral entropy is proposed. The algorithm fully exploits the precision of subband spectral-entropy endpoint detection to distinguish noisy-speech segments from noise-only segments and to update the BWT threshold in real time, so that the wavelet coefficients of the noise are identified accurately and speech enhancement is achieved. Experimental results show that, compared with Wiener filtering, the proposed subband-spectral-entropy-based bionic wavelet speech enhancement method improves the signal-to-noise ratio (SNR) by about 8% on average and yields a clear enhancement of speech signals in noisy environments.
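The bionic wavelet transform itself is not reproduced here; the sketch below illustrates only the surrounding mechanism with an ordinary discrete wavelet transform: a spectral-entropy measure flags noise-like frames, the noise level estimated from those frames sets the threshold, and the wavelet coefficients are soft-thresholded. The wavelet, frame length, and entropy criterion are all illustrative.

```python
import numpy as np
import pywt

def frame_spectral_entropy(x, frame_len=512):
    """Normalised spectral entropy per frame (higher for noise-like frames)."""
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    p = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    p /= p.sum(axis=1, keepdims=True) + 1e-12
    return -(p * np.log(p + 1e-12)).sum(axis=1) / np.log(p.shape[1])

def entropy_guided_wavelet_denoise(x, frame_len=512, wavelet="db8", level=5):
    ent = frame_spectral_entropy(x, frame_len)
    noise_frames = ent > np.median(ent)                    # crude noise-only decision
    noise = (np.concatenate([x[i * frame_len:(i + 1) * frame_len]
                             for i in np.flatnonzero(noise_frames)])
             if noise_frames.any() else x)
    sigma = np.median(np.abs(noise)) / 0.6745              # robust noise level estimate
    thr = sigma * np.sqrt(2 * np.log(len(x)))              # universal threshold
    coeffs = pywt.wavedec(x, wavelet, level=level)
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(x)]
```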

15.
We present a new speech enhancement scheme for a single-microphone system to meet the demand for quality noise reduction algorithms capable of operating at a very low signal-to-noise ratio. A psychoacoustic model is incorporated into the generalized perceptual wavelet denoising method to reduce the residual noise and improve the intelligibility of speech. The proposed method is a generalized time-frequency subtraction algorithm, which advantageously exploits the wavelet multirate signal representation to preserve the critical transient information. Simultaneous masking and temporal masking of the human auditory system are modeled by the perceptual wavelet packet transform via the frequency and temporal localization of speech components. The wavelet coefficients are used to calculate the Bark spreading energy and temporal spreading energy, from which a time-frequency masking threshold is deduced to adaptively adjust the subtraction parameters of the proposed method. An unvoiced speech enhancement algorithm is also integrated into the system to improve the intelligibility of speech. Through rigorous objective and subjective evaluations, it is shown that the proposed speech enhancement system is capable of reducing noise with little speech degradation in adverse noise environments and the overall performance is superior to several competitive methods.

16.

17.
Speech enhancement has received a significant amount of research attention over the past several decades. Enhancement is needed to improve a degraded signal: the goal is to separate a single mixture into its underlying clean speech and interferer components, which is achieved with prior knowledge obtained through learning and with masks generated accordingly. In this paper, a hybrid of spectral filtering and an optimization algorithm is employed for speech enhancement. The proposed technique uses MMSE (minimum mean squared error) estimation and PSO (particle swarm optimization) for effective enhancement and consists of three modules: a pre-processing module, an optimization module, and a spectral filtering module. Loizou's database and the Aurora dataset are used to evaluate the proposed technique with the standard evaluation metrics PESQ and SNR, and a comparative analysis is made against other existing techniques such as MMSE and BNMF. The highest PESQ for the proposed technique is 2.75 and the highest SNR is about 32.97. The technique gives an average PESQ of 2.18 and an average SNR of 20.53, which are higher than the average values of the other techniques; hence the proposed technique yields better evaluation metrics than the existing methods.
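A minimal sketch of the PSO half of such a hybrid: a generic particle swarm that tunes two enhancement parameters by minimising any user-supplied objective, for example the negative SNR or negative PESQ of an enhanced development utterance. The commented usage line assumes hypothetical `enhance` and `snr` helpers; bounds and swarm settings are illustrative.

```python
import numpy as np

def pso_minimize(objective, bounds, n_particles=20, n_iter=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Plain particle swarm optimisation; `bounds` is a list of (low, high) pairs per dimension."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    pos = rng.uniform(lo, hi, size=(n_particles, len(bounds)))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        val = np.array([objective(p) for p in pos])
        improved = val < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], val[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest

# Example: tune (over-subtraction alpha, spectral floor beta) against a clean development reference:
# objective = lambda p: -snr(clean_dev, enhance(noisy_dev, alpha=p[0], beta=p[1]))
# alpha, beta = pso_minimize(objective, bounds=[(1.0, 6.0), (0.0, 0.1)])
```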

18.
Considering a real signal as the sum of a number of sinusoidal signals in the presence of additive noise, the maximum windowed likelihood (MWL) criterion is introduced and applied to construct an adaptive algorithm for estimating the amplitude and frequency of these components. The amplitudes, phases, and frequencies are assumed to be slowly time-varying. Employing MWL, an adaptive algorithm is obtained in two steps. First, assuming initial values for the frequency of each component, a closed form is derived to estimate the amplitudes. Then, the gradient of the MWL is used to adaptively track the frequencies, using the latest amplitude estimates. The proposed algorithm has a parallel structure in which each branch estimates the parameters of one of the components. The proposed multicomponent phase-locked loop (MPLL) algorithm is implemented with low-complexity blocks and can be adjusted for use in different conditions. The mean squared error of the algorithm is studied to analyze the effect of the window length and type and the step size. Simulations illustrate the efficiency and performance of the algorithm in different conditions, including the effect of initialization, the frequency resolution, chirp components, components during frequency crossover, and speech signals, and show that the method efficiently tracks slowly time-varying components such as voiced speech segments.

19.
In the recent past, wavelet packet (WP) based speech enhancement techniques have been gaining popularity due to their inherent noise-minimizing nature, and they have proved more robust and efficient than short-time Fourier transform based methods. In the present work, a speech enhancement method using a Teager-energy-operated, equivalent rectangular bandwidth (ERB)-like WP decomposition is proposed. A twenty-four sub-band perceptual wavelet packet decomposition (PWPD) structure is implemented according to the auditory ERB scale; the ERB-scale-based decomposition is used because the centre frequencies of the ERB scale are distributed similarly to the frequency response of the human cochlea. The Teager energy operator is applied to estimate the threshold for the PWPD coefficients. Lastly, Wiener filtering is applied to remove low-frequency noise before the final reconstruction stage. The proposed method is evaluated on a Hindi sentence database corrupted with six noise conditions, and its performance is analysed with respect to several speech quality parameters and output signal-to-noise ratio levels. The results indicate that the proposed technique outperforms several traditional speech enhancement algorithms at all SNR levels.
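A minimal sketch of the decompose-and-threshold part of this kind of scheme, using PyWavelets' wavelet packets and a Teager-energy-derived threshold per sub-band; the 24-band ERB-spaced tree, the exact thresholding rule, and the final Wiener stage are not reproduced, and all choices are illustrative.

```python
import numpy as np
import pywt

def teager(x):
    """Teager energy operator: psi[x](n) = x(n)^2 - x(n-1) * x(n+1)."""
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def wp_teager_denoise(x, wavelet="db8", level=5):
    wp = pywt.WaveletPacket(data=x, wavelet=wavelet, mode="symmetric", maxlevel=level)
    for node in wp.get_level(level, order="natural"):
        c = node.data
        if len(c) < 3:
            continue
        thr = np.sqrt(np.mean(np.abs(teager(c))))      # sub-band threshold from mean Teager energy
        node.data = pywt.threshold(c, thr, mode="soft")
    return wp.reconstruct(update=True)[: len(x)]
```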

20.
To address the poor enhancement quality and speech distortion of existing speech enhancement algorithms, an algorithm based on a sparse low-rank model and improved phase spectrum compensation is proposed. First, the magnitude spectrum of the noisy speech is processed with the sparse low-rank model to obtain the separated speech. Next, the compensation factor of the phase spectrum compensation algorithm is optimized with a normalized least-mean-square (NLMS) adaptive filtering algorithm. The speech separated by the sparse low-rank model is then processed with the improved phase spectrum compensation to obtain the final enhanced speech. Finally, the enhanced speech is analysed with the perceptual evaluation of speech quality and spectral analysis. Experimental results show that the method removes noise effectively and preserves speech clarity at low SNR.
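A minimal sketch of the sparse-plus-low-rank split of a magnitude spectrogram, using a simple alternating singular-value-thresholding / soft-thresholding scheme rather than whichever solver the paper uses; the low-rank part is taken as repetitive noise and the sparse part as speech. Thresholds and iteration counts are illustrative, and the phase-compensation and NLMS stages are omitted.

```python
import numpy as np

def svt(M, tau):
    """Singular-value thresholding: soft-shrink the singular values of M by tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def sparse_lowrank_split(V, rank_tau, sparse_tau, n_iter=30):
    """Alternating split of a magnitude spectrogram V into low-rank L (noise) and sparse S (speech)."""
    L = np.zeros_like(V)
    S = np.zeros_like(V)
    for _ in range(n_iter):
        L = svt(V - S, rank_tau)                          # low-rank part of the residual
        S = np.maximum(np.abs(V - L) - sparse_tau, 0.0)   # soft-thresholded, non-negative sparse part
    return L, S

# Usage on a magnitude STFT `V` (freq x frames); thresholds tuned on development data:
# L_noise, S_speech = sparse_lowrank_split(V, rank_tau=np.linalg.norm(V, 2) / 10,
#                                          sparse_tau=np.median(V))
```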
