Similar literature
 17 similar documents found
1.
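Improving the quality of enhanced speech under non-stationary noise and low signal-to-noise ratio (SNR) has long been a difficult problem in speech enhancement research. In recent years, convolutive nonnegative matrix factorization has been applied successfully in speech enhancement algorithms. This paper further takes into account the sparsity of speech signals in the time-frequency domain and proposes a speech enhancement algorithm based on Sparse Convolutive Nonnegative Matrix Factorization (SCNMF). The algorithm consists of a training stage and an enhancement stage. In the training stage, SCNMF is applied separately to the spectra of clean speech and of noise to obtain a clean-speech dictionary and a noise dictionary, which serve as prior information for the enhancement stage. In the enhancement stage, the spectrum of the noisy speech is first decomposed by SCNMF; the speech encoding matrix is then estimated using the joint clean-speech and noise dictionary, and the enhanced speech is reconstructed. The effect of the sparsity factor on the quality of the enhanced speech is analyzed through simulation experiments. The results show that, under non-stationary noise and low SNR, the proposed algorithm outperforms traditional algorithms such as multi-band spectral subtraction, nonnegative matrix factorization, and convolutive nonnegative matrix factorization.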
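The convolutive model behind this abstract can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: it assumes the common convolutive NMF form, in which the spectrogram is approximated by a sum of time-indexed bases `W_list[t]` multiplied by a copy of the activation matrix `H` shifted `t` frames to the right, and it assumes a squared-error cost with an L1 penalty weighted by `lam` as the sparsity term. The abstract does not state the cost function, and all function names here are hypothetical.

```python
def shift_right(H, t):
    """Shift the columns of H right by t frames, zero-filling on the left."""
    n_cols = len(H[0])
    return [[0.0] * min(t, n_cols) + row[: max(n_cols - t, 0)] for row in H]


def matmul(A, B):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def cnmf_reconstruct(W_list, H):
    """Convolutive NMF approximation: V_hat = sum_t W_list[t] @ shift_right(H, t)."""
    n_rows, n_cols = len(W_list[0]), len(H[0])
    V_hat = [[0.0] * n_cols for _ in range(n_rows)]
    for t, W_t in enumerate(W_list):
        term = matmul(W_t, shift_right(H, t))
        for i in range(n_rows):
            for j in range(n_cols):
                V_hat[i][j] += term[i][j]
    return V_hat


def sparse_cost(V, V_hat, H, lam):
    """Assumed objective: squared error plus an L1 penalty lam * sum(H)."""
    err = sum((v - vh) ** 2 for rv, rh in zip(V, V_hat) for v, vh in zip(rv, rh))
    return err + lam * sum(sum(row) for row in H)
```

Each basis here spans several consecutive frames, which is what lets the model capture inter-frame structure; raising `lam` trades reconstruction accuracy for sparser activations.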

2.
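Speech enhancement in low-SNR, non-stationary noise environments remains an open and challenging task. To improve the performance of traditional speech enhancement algorithms based on nonnegative matrix factorization (NMF), and taking into account both the time-frequency sparsity of speech signals and the low-rank property of non-stationary noise signals, this paper proposes a multi-constraint nonnegative matrix factorization speech enhancement algorithm (MC–NMFSE). In the training stage, a clean-speech training set and a noise training set are used to build a speech dictionary and a noise dictionary, respectively. In the enhancement stage, a sparsity constraint on the speech component and a low-rank constraint on the noise signal are added to the NMF objective function, which allows MC–NMFSE to obtain a better representation of the speech component from the noisy speech and thereby improves enhancement. Experiments under a large number of non-stationary noise conditions and SNR levels show that, compared with traditional NMF-based speech enhancement methods, MC–NMFSE achieves lower speech distortion and better suppression of non-stationary noise.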

3.
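To address the insufficient sparsity of nonnegative matrix factorization, a speech enhancement algorithm based on nonsmooth nonnegative matrix factorization is proposed, in which a smoothing matrix is introduced to adjust the sparsity of the dictionary matrix and the coefficient matrix. The algorithm builds a joint dictionary matrix by learning prior dictionaries for speech and noise. It then uses nonsmooth nonnegative matrix factorization to update the projection coefficients of the noisy speech on the joint dictionary matrix, which achieves the enhancement, while a sliding-window method updates the prior noise dictionary in real time. Simulation results show that the algorithm suppresses noise better than both the NMF-based speech enhancement algorithm and the MMSE algorithm.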
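As a small illustration of the smoothing matrix mentioned above, the sketch below builds the form that is standard in the nonsmooth NMF literature, S = (1 - θ)I + (θ/q)·11ᵀ, where q is the number of basis vectors and θ in [0, 1] controls the amount of smoothing. Whether the paper uses exactly this form is an assumption, and the function names are hypothetical.

```python
def smoothing_matrix(q, theta):
    """Build S = (1 - theta) * I + (theta / q) * ones((q, q)) as a list of lists.

    theta = 0 gives the identity (plain NMF); theta = 1 replaces every
    vector by its mean, the strongest possible smoothing.
    """
    if q <= 0 or not 0.0 <= theta <= 1.0:
        raise ValueError("q must be positive and theta must lie in [0, 1]")
    off = theta / q
    return [[(1.0 - theta) + off if i == j else off for j in range(q)]
            for i in range(q)]


def apply_smoothing(S, h):
    """Multiply the smoothing matrix S by one activation vector h."""
    return [sum(s * x for s, x in zip(row, h)) for row in S]
```

Because S sits between the dictionary and the coefficients in the factorization V ≈ WSH, smoothing one factor forces the other to become sparser in order to keep the same approximation quality.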

4.
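A sparse nonnegative matrix factorization speech enhancement algorithm based on the Alternating Direction Method of Multipliers (ADMM) is proposed. The algorithm overcomes the slow convergence and the tendency to become trapped in local optima of the classical Nonnegative Matrix Factorization (NMF) speech enhancement algorithm, and it also exploits the strong sparsity of matrices factorized with ADMM. It has a training stage and an enhancement stage. During training, an ADMM-based NMF algorithm is applied to the noise spectrum to extract a noise dictionary, which is stored as prior information for the enhancement stage. During enhancement, a sparse NMF algorithm estimates the speech dictionary and the speech encoding from the noisy speech spectrum, and the original clean speech is reconstructed. Experiments show that the algorithm is faster and yields less distortion in the enhanced speech, with a particularly marked effect in transient noise environments.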

5.
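To take full account of the inter-frame correlation of speech during voice conversion, a voice conversion method based on convolutive nonnegative matrix factorization is proposed. The time-frequency bases obtained by convolutive NMF preserve both the speaker-specific characteristics of the speech signal and its inter-frame correlation well. Exploiting this property, the training stage uses convolutive NMF to extract matched time-frequency bases for the source speaker and the target speaker from the training data. In the conversion stage, the source speaker's speech is converted by replacing the time-frequency bases. Compared with traditional methods, this method better preserves and converts the inter-frame correlation of speech. Simulation experiments with subjective and objective evaluations show that, compared with voice conversion methods based on Gaussian mixture models and state-space models, the method achieves better converted speech quality and conversion similarity.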

6.
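To address the robustness of NMF-based speech enhancement algorithms under different environmental noises, a speech enhancement algorithm based on sparse regularized nonnegative matrix factorization (SRNMF) is proposed. The algorithm accounts for the influence of noise during data processing and also imposes a sparsity constraint on the coefficient matrix, so that the decomposed data retain good speech characteristics. It first learns prior dictionary matrices from the magnitude spectra of speech and noise and builds a joint dictionary matrix from them. It then updates the coefficient matrix of the noisy speech magnitude spectrum under the joint dictionary matrix, and finally reconstructs the original clean speech. Experimental results show that, under non-stationary noise and low SNR (below 0 dB), the algorithm reduces the effect of noise variation on performance. It achieves a higher signal-to-distortion ratio (SDR), reported as an improvement of 1 to 1.5 orders of magnitude, and it also runs somewhat faster, which makes NMF-based speech enhancement more practical.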
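The three steps described above (joint dictionary, sparse coefficient update, reconstruction) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the paper's SRNMF: it assumes the Kullback-Leibler divergence with the usual multiplicative update, an L1 sparsity weight `mu` added to the denominator, and a Wiener-like soft mask for reconstruction. The names `estimate_activations` and `enhance` are hypothetical.

```python
EPS = 1e-12  # guards against division by zero


def matmul(A, B):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def estimate_activations(V, W, mu=0.1, n_iter=200):
    """Keep W fixed and update H with H <- H * (W^T (V / WH)) / (W^T 1 + mu)."""
    n_basis, n_frames = len(W[0]), len(V[0])
    H = [[1.0] * n_frames for _ in range(n_basis)]
    Wt = [list(col) for col in zip(*W)]
    col_sums = [sum(row) for row in Wt]  # entries of W^T 1
    for _ in range(n_iter):
        WH = matmul(W, H)
        ratio = [[v / (wh + EPS) for v, wh in zip(rv, rwh)]
                 for rv, rwh in zip(V, WH)]
        num = matmul(Wt, ratio)
        H = [[h * n / (c + mu + EPS) for h, n in zip(rh, rn)]
             for rh, rn, c in zip(H, num, col_sums)]
    return H


def enhance(V, W_speech, W_noise, mu=0.1, n_iter=200):
    """Estimate the speech magnitude spectrum from the noisy spectrum V."""
    # Joint dictionary: speech columns followed by noise columns.
    W = [rs + rn for rs, rn in zip(W_speech, W_noise)]
    H = estimate_activations(V, W, mu, n_iter)
    k = len(W_speech[0])
    speech = matmul(W_speech, H[:k])  # speech part of the model
    total = matmul(W, H)              # speech + noise model
    # Wiener-like mask applied to the observed magnitudes (an assumption).
    return [[v * s / (t + EPS) for v, s, t in zip(rv, rs, rt)]
            for rv, rs, rt in zip(V, speech, total)]
```

In practice `V` would be the magnitude spectrogram of the noisy signal, and the enhanced magnitudes would be combined with the noisy phase before the inverse transform. That step is omitted here.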

7.
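Blind source separation (BSS) of convolutive mixtures is a highly challenging problem of practical significance. Within the framework of independent component analysis, this paper builds a nonnegative matrix factorization (NMF) model, designs a new optimization objective function, and obtains new update rules for the model parameters through rigorous mathematical derivation. The demixing matrix is normalized to avoid the amplitude ambiguity. In the source reconstruction stage, the NMF model parameters are updated in real time to avoid the permutation ambiguity of the source signals. Experimental results verify the effectiveness and superiority of the proposed algorithm in separating mixtures of Chinese and English speech and mixtures of music signals.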

8.
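Speech enhancement in low-SNR, non-stationary noise environments remains an open and challenging task. To improve the performance of traditional speech enhancement algorithms based on nonnegative matrix factorization (NMF), and taking into account both the time-frequency sparsity of speech signals and the low-rank property of non-stationary noise signals, this paper proposes a multi-constraint nonnegative matrix factorization speech enhancement algorithm (MC–NMFSE). In the training stage, a clean-speech training set and a noise training set are used to build a speech dictionary and a noise dictionary, respectively. In the enhancement stage, a sparsity constraint on the speech component and a low-rank constraint on the noise signal are added to the NMF objective function, which allows MC–NMFSE to obtain a better representation of the speech component from the noisy speech and thereby improves enhancement. Experiments under a large number of non-stationary noise conditions and SNR levels show that, compared with traditional NMF-based speech enhancement methods, MC–NMFSE achieves lower speech distortion and better suppression of non-stationary noise.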

9.
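Underdetermined blind signal separation algorithm based on constrained NMF*   Total citations: 2, self-citations: 2, citations by others: 0
A constrained nonnegative matrix factorization method is proposed for underdetermined blind signal separation. When NMF is applied directly to underdetermined blind signal separation, the factorization is not unique and the source signals cannot be separated correctly. Building on the basic NMF algorithm, this paper imposes a determinant constraint on the estimated mixing matrix to ensure a unique factorization, and imposes both a sparsity constraint and a minimum-correlation constraint on the estimated source signals, which yields a unique decomposition of the mixed signals and improves source separation performance. Simulation experiments demonstrate the effectiveness of the algorithm.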

10.
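To remedy some shortcomings of existing incremental nonnegative matrix factorization algorithms, a validity criterion for incremental algorithms based on error assessment is given. On this basis, a new dynamic NMF algorithm is proposed that initializes the incremental factorization with the NMF result obtained before the new samples were added. Experiments on several data sets show that the algorithm updates the basis matrix and the encoding matrix on the fly with low computational complexity, and that it can also identify noise points effectively when processing dynamic data sets, which makes it an effective dynamic factorization algorithm.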

11.
We consider inference in a general data-driven object-based model of multichannel audio data, assumed generated as a possibly underdetermined convolutive mixture of source signals. We work in the short-time Fourier transform (STFT) domain, where convolution is routinely approximated as linear instantaneous mixing in each frequency band. Each source STFT is given a model inspired by nonnegative matrix factorization (NMF) with the Itakura–Saito divergence, which underlies a statistical model of superimposed Gaussian components. We address estimation of the mixing and source parameters using two methods. The first one consists of maximizing the exact joint likelihood of the multichannel data using an expectation-maximization (EM) algorithm. The second method consists of maximizing the sum of individual likelihoods of all channels using a multiplicative update algorithm inspired by NMF methodology. Our decomposition algorithms are applied to stereo audio source separation in various settings, covering blind and supervised separation, music and speech sources, synthetic instantaneous and convolutive mixtures, as well as professionally produced music recordings. Our EM method produces competitive results with respect to the state of the art, as illustrated on two tasks from the international Signal Separation Evaluation Campaign (SiSEC 2008).
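For reference, the Itakura-Saito divergence named in this abstract has the closed form d(x | y) = x/y - log(x/y) - 1 for positive scalars, summed over all entries when comparing a power spectrogram with its model. The sketch below is a plain illustration of that definition, with hypothetical function names, and is not code from the paper.

```python
import math


def is_divergence(x, y):
    """Itakura-Saito divergence between two positive scalars."""
    if x <= 0 or y <= 0:
        raise ValueError("both arguments must be strictly positive")
    r = x / y
    return r - math.log(r) - 1.0


def is_divergence_matrix(V, V_hat):
    """Sum of entry-wise Itakura-Saito divergences between two matrices."""
    return sum(is_divergence(v, vh)
               for row_v, row_vh in zip(V, V_hat)
               for v, vh in zip(row_v, row_vh))
```

The divergence depends only on the ratio x/y, so it is scale invariant: low-energy and high-energy time-frequency bins contribute equally for the same relative error, which suits the large dynamic range of audio spectra.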

12.
In this paper, we present a convolutive basis decomposition method and its application to the separation of simultaneous speakers from monophonic recordings. The model we propose is a convolutive version of the nonnegative matrix factorization algorithm. Due to the nonnegativity constraint, this type of coding is very well suited for intuitively and efficiently representing magnitude spectra. We present results that reveal the nature of these basis functions, and we introduce their utility in separating monophonic mixtures of known speakers.

13.
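The separation performance of the independent vector analysis (IVA) algorithm depends strongly on the initial value of the separation matrix. To address this limitation, a blind separation algorithm for convolutively mixed speech based on backtracking search optimization is proposed. The sum of the complex kurtosis of the IVA-separated signals over all frequency bins is used as the objective function, and the backtracking search optimization algorithm (BSA) is used to optimize the initial separation matrix, which leads to better blind separation of the speech signals. During separation, the complex Givens rotation is used to convert the problem of solving for the separation matrix into one of solving for rotation angles, which reduces the dimensionality of the BSA parameter encoding and makes the optimization easier. Separation experiments on convolutively mixed speech show that the algorithm separates well and performs significantly better than the basic IVA algorithm.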
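The idea of searching over rotation angles instead of matrix entries can be illustrated for the two-source case. The sketch below uses one common parameterization of a 2 × 2 complex Givens rotation with two real angles, `theta` and `phi`. The exact form used in the paper is not given in the abstract, so this parameterization and the function names are assumptions.

```python
import cmath
import math


def complex_givens(theta, phi):
    """Return the 2 x 2 unitary matrix
    [[cos(theta),                   exp(j*phi) * sin(theta)],
     [-exp(-j*phi) * sin(theta),    cos(theta)]].
    """
    c = complex(math.cos(theta))
    s = math.sin(theta)
    e = cmath.exp(1j * phi)
    return [[c, e * s], [-e.conjugate() * s, c]]


def conj_transpose(A):
    """Hermitian (conjugate) transpose of a list-of-lists matrix."""
    return [[A[j][i].conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]


def matmul(A, B):
    """Plain-Python matrix product that also works for complex entries."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]
```

A general 2 × 2 complex matrix has eight real degrees of freedom, whereas this rotation has two, and every candidate is unitary by construction. That is the kind of reduction in search dimension the abstract describes.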

14.
Looking at the speaker's face can be useful to better hear a speech signal in a noisy environment and extract it from competing sources before identification. This suggests that the visual signals of speech (movements of visible articulators) could be used in speech enhancement or extraction systems. In this paper, we present a novel algorithm that plugs the audiovisual coherence of speech signals, estimated by statistical tools, into audio blind source separation (BSS) techniques. This algorithm is applied to the difficult and realistic case of convolutive mixtures. The algorithm mainly works in the frequency (transform) domain, where the convolutive mixture becomes an additive mixture for each frequency channel. Frequency-by-frequency separation is performed by an audio BSS algorithm. The audio and visual information is modeled by a newly proposed statistical model. This model is then used to solve the standard source permutation and scale factor ambiguities encountered for each frequency after the audio blind separation stage. The proposed method is shown to be efficient in the case of 2 × 2 convolutive mixtures and offers promising perspectives for extracting a particular speech source of interest from complex mixtures.

15.
A monaural speech separation/enhancement technique based on non-negative Tucker decomposition (NTD) is introduced in this paper. In the proposed work, the effect of a sparsity regularization factor on the separation of the mixed signal is included in the generalized cost function of NTD. With the proposed algorithm, the vector components of both the target and the mixed signal can be exploited and used for the separation of any monaural mixture. Experiments were conducted on monaural data generated by mixing the speech signals of two speakers and by mixing noise and speech signals, using the TIMIT and NOISEX-92 datasets. The separation results are compared with other existing algorithms in terms of correlation of the separated signal with the original signal, signal-to-distortion ratio, perceptual evaluation of speech quality, and short-time objective intelligibility. Further, to get more conclusive information about separation ability, speech recognition using the Kaldi toolkit was also performed. The recognition results are compared in terms of word error rate (WER) using MFCC-based features. Results show that the average WER improvement of the proposed algorithm over the nearest-performing algorithm is up to 2.7% for mixed speech of two speakers and 1.52% for noisy speech input.
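The word error rate used above is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between the recognizer output and the reference transcript, divided by the number of reference words. The following sketch implements that standard definition. It is not the Kaldi scoring script, and the function name is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed part of ref
    # and the first j words of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)
```

For example, comparing the reference "the cat sat on the mat" with the hypothesis "the cat sit on mat" involves one substitution and one deletion over six reference words, so the function returns 2/6.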

16.
This study investigated whether the signal-to-noise ratio (SNR) of the interlocutor (speech partner) influences a speaker's vocal intensity in conversational speech. Twenty participants took part in artificial conversations with controlled levels of interlocutor speech and background noise. Three different levels of background noise were presented over headphones and the participant engaged in a “live interaction” with the experimenter. The experimenter's vocal intensity was manipulated in order to modify the SNR. The participants’ vocal intensity was measured. As observed previously, vocal intensity increased as background noise level increased. However, the SNR of the interlocutor did not have a significant effect on participants’ vocal intensity. These results suggest that increasing the signal level of the other party at the earpiece would not reduce the tendency of telephone users to talk loudly.

17.
In this paper, we present a new method, called large margin based nonnegative matrix factorization (LMNMF), to encode latent discriminant information in training data. LMNMF seeks a nonnegative subspace such that the k nearest neighbors of each sample always belong to the same class and samples from different classes are separated by a large margin. In the subspace, the local separation structure of the data is explicit. The large-margin criterion leads to a new objective function, and a provably convergent multiplicative nonnegative updating rule is derived to learn the basis matrix and encoding vectors. Then, partial least squares regression (PLSR) learns the mapping from the original data to low-dimensional representations in order to capture local separation information. PLSR offers a unified solution to the out-of-sample extension problem. Extensive experimental results demonstrate that LMNMF with PLSR leads to significant improvements in classification over several other commonly used NMF-based algorithms.
