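To address the reliance of existing dictionary-learning-based enhancement algorithms on prior information, an unsupervised single-channel speech enhancement algorithm is proposed based on sparse low-rank matrix decomposition. The algorithm first decomposes the magnitude spectrum of the noisy speech into low-rank, sparse, and noise components by sparse low-rank decomposition, then builds a noise dictionary by self-learning on the low-rank component, and finally uses the resulting noise dictionary and a multiplicative update rule to separate the clean speech from the low-rank and sparse components. Compared with other dictionary-learning-based speech enhancement algorithms, the proposed algorithm requires no prior information about the speech or the noise and is therefore more convenient and practical. Experimental results show that the algorithm effectively suppresses noise while preserving the harmonic structure of speech, and that its enhancement performance is clearly better than that of robust principal component analysis and multi-band spectral subtraction.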
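
The final separation step in the abstract above relies on a multiplicative update rule with a fixed noise dictionary. The sketch below shows one such update of the activation matrix H for the model V ≈ WH with the dictionary W held fixed. It assumes the Euclidean cost (the abstract does not state which cost the authors use), and the helper names `matmul`, `transpose`, `update_activations`, and `squared_error` are illustrative, not taken from the paper.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def update_activations(v, w, h, eps=1e-12):
    """One multiplicative update of H for V ~ W H, keeping W fixed.

    Assumption: Euclidean cost, i.e. H <- H * (W^T V) / (W^T W H).
    eps guards against division by zero.
    """
    wt = transpose(w)
    num = matmul(wt, v)
    den = matmul(matmul(wt, w), h)
    return [[h[i][j] * num[i][j] / (den[i][j] + eps)
             for j in range(len(h[0]))]
            for i in range(len(h))]


def squared_error(v, w, h):
    """Squared Frobenius distance between V and W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))
```

Because every factor in the update is non-negative, H stays non-negative as long as V, W, and the initial H are non-negative, which is why this style of rule is commonly used on magnitude spectra.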

In this paper, we propose a new adaptation mode controller (AMC) for a generalized sidelobe canceller (GSC)-based speech enhancement system. Here, a likelihood ratio for target speech presence was first estimated and then utilized to estimate both the local target speech presence probability (SPP) and global SPP. Next, the estimated SPPs were applied to the design of an AMC that controlled the parameters of adaptive filters for an adaptive blocking matrix (ABM) and noise canceller (NC). In particular, the combination of local and global SPPs was applied to the AMC in the ABM, whereas only global SPPs were used for the NC. Finally, a multiple-microphone speech enhancement system was constructed on the basis of a GSC having the proposed AMC. The performance of the speech enhancement system was subsequently evaluated in terms of the perceptual evaluation of speech quality (PESQ) and the cepstral distortion (CD) for car noise conditions. It was shown from this evaluation that a speech enhancement system using the proposed AMC method provided better performance than conventional AMC methods using power ratios between the target and non-target directional signals, the inter-channel normalized cross-correlation, and the local SPPs only.

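When a speech enhancement algorithm with a generalized sidelobe canceller (GSC) structure estimates the direction of incidence of the speech signal inaccurately, the blocking matrix (BM) cannot fully block the target speech. Part of the speech passes through the blocking matrix and is later cancelled against the reference signal in the multiple-input canceller (MC) module, causing a loss of target speech. To address this speech leakage caused by direction-of-arrival (DOA) estimation errors in the GSC, this paper proposes an optimized microphone-array speech enhancement algorithm. It first applies spectral adjustment to the delay-compensated signals, then exploits the correlation between the MC module output and the BM module output to adapt the blocking matrix, so that the direction estimate moves closer to the true target speech direction and less target speech leaks through the blocking matrix. Simulation results show that the algorithm effectively reduces target speech leakage in the blocking matrix, makes the system more robust, and improves speech enhancement performance.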

This paper presents a novel noise-robust graph-based semi-supervised learning algorithm to deal with the challenging problem of semi-supervised learning with noisy initial labels. Inspired by the successful use of sparse coding for noise reduction, we give a new L1-norm formulation of Laplacian regularization for graph-based semi-supervised learning. Since our L1-norm Laplacian regularization is explicitly defined over the eigenvectors of the normalized Laplacian matrix, we formulate graph-based semi-supervised learning as an L1-norm linear reconstruction problem which can be efficiently solved by sparse coding. Furthermore, by working with only a small subset of eigenvectors, we develop a fast sparse coding algorithm for our L1-norm semi-supervised learning. Finally, we evaluate the proposed algorithm in noise-robust image classification. The experimental results on several benchmark datasets demonstrate the promising performance of the proposed algorithm.

This paper proposes a method for enhancing speech and/or audio quality under noisy conditions. The proposed method first estimates the local signal-to-noise ratio (SNR) of the noisy input signal via sparse non-negative matrix factorization (SNMF). Next, a sparse binary mask (SBM) is proposed that separates the audio signal from the noise by measuring the sparsity of the pool of local SNRs from the adjacent frequency bands of the current and several previous frames. However, some spectral gaps remain across frequency bands after applying the binary masks, which distorts the separated audio signal due to spectral discontinuity. Thus, a spectral imputation technique is used to fill the empty spectrum of the frequency bands removed by the SBM. Spectral imputation is conducted by online learning NMF with the spectra of the neighboring non-overlapped frequency bands and their local sparsity. The effectiveness of the proposed enhancement method is demonstrated on two different tasks using speech and musical content, respectively. Objective measurements and subjective listening tests show that the proposed method outperforms conventional speech and audio enhancement methods, such as SNMF-based alternatives and deep recurrent neural networks for speech enhancement, block thresholding, and a commercially available software tool for audio enhancement.

In this paper, we present a novel approach to voice activity detection (VAD) based on the sparse representation of an input noisy speech over a learned dictionary. First, we investigate the relationship between the signal detection and the sparse representation based on the Bayesian framework. Second, we derive the decision rule and an adaptive threshold based on a likelihood ratio test, by modeling the non-zero elements in the sparse representation as a Gaussian distribution. The experimental results show that the proposed approach outperforms the current statistical model-based methods, such as Gaussian, Laplacian, and Gamma, under white, babble, and vehicle noise conditions.
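
To make the likelihood ratio test mentioned above concrete, here is a minimal sketch of a generic Gaussian test, not the paper's actual derivation. It assumes that coefficients are zero-mean Gaussian with variance `var_noise` when only noise is present and `var_noise + var_speech` when speech is present; these variances, the fixed `threshold`, and the function names are all hypothetical (the paper derives an adaptive threshold that is not reproduced here).

```python
import math


def log_likelihood_ratio(coeffs, var_noise, var_speech):
    """Average per-coefficient log likelihood ratio, speech vs. noise only.

    Assumption: each coefficient is N(0, var_noise) under "noise only"
    and N(0, var_noise + var_speech) under "speech present".
    coeffs must be non-empty and both variances must be positive.
    """
    var1 = var_noise + var_speech
    total = 0.0
    for x in coeffs:
        total += (0.5 * math.log(var_noise / var1)
                  + 0.5 * x * x * (1.0 / var_noise - 1.0 / var1))
    return total / len(coeffs)


def is_speech(coeffs, var_noise, var_speech, threshold=0.0):
    """Declare speech when the average log likelihood ratio exceeds threshold."""
    return log_likelihood_ratio(coeffs, var_noise, var_speech) > threshold
```

Large coefficients relative to the noise variance push the ratio up and favor the speech hypothesis, while coefficients near zero push it below the threshold.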

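Speech detection based on harmonic energy in non-stationary noise environments   Total citations: 1 (self-citations: 0, citations by others: 1)
Robust speech endpoint detection is important for building practical speech recognition systems. Harmonic components are a basic characteristic of speech signals, so an endpoint detection algorithm based on the energy of harmonic components is proposed. The Sobel operator is used to compute the orientation field of the narrowband spectrogram, Gabor filtering enhances the harmonic regions, and thresholding produces a binarized image. Points whose orientation exceeds 45 degrees or whose reliability is low are removed, leaving continuous horizontal bands, which are the harmonic regions. The energy within the harmonic regions is then computed and used as the feature for the threshold decision. Experimental results show good speech detection under different signal-to-noise ratios and in several types of non-stationary noise. The advantages of the method are that it needs no prior knowledge of the noise, fully exploits the correlation of speech in the frequency and time domains, and suits a variety of complex non-stationary noises.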

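Because most structural health problems in buildings are cumulative damage, they are hard to detect, and the complexity of real structures and environmental noise makes structural health monitoring even harder. In addition, existing methods need large amounts of data to train their models, but labeling data in practice is complicated. To overcome this problem, bridge structural health monitoring is implemented with a wireless sensor network and sparse coding: after feature extraction, the sparse coding algorithm is trained on a large number of unlabeled examples, which compresses the data dimensionality and preprocesses the unlabeled data. A deep learning algorithm then predicts the bridge structural health category. Hessian optimization is improved on the basis of linear conjugate gradient by replacing the indefinite Hessian matrix with the positive semidefinite Gauss-Newton curvature matrix and combining quadratic objectives, which improves the efficiency of the deep learning algorithm. Experimental results show that the proposed deep learning algorithm for bridge structural safety detection achieves high-accuracy structural health monitoring under environmental noise with sparse coding.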

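Many practical speech signal processing applications require the system to handle multiple tasks in real time with low latency and to be highly robust to noise. To address this, a multi-task deep learning model for speech enhancement and voice activity detection (VAD) is proposed. By introducing a long short-term memory (LSTM) network, the model forms a causal system suitable for real-time online processing. Because speech enhancement and VAD are strongly correlated, the model connects the output layers of the two tasks through hard parameter sharing, which reduces computation and, through multi-task learning, improves generalization. Experimental results show that, compared with a baseline model that processes the two tasks serially, the multi-task model is 44.2% faster while giving very similar speech enhancement results and better VAD results, which is significant for the practical application and deployment of deep learning models.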

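程塨, 郭雷, 贺胜, 赵天云. 《计算机科学》 (Computer Science), 2010, 37(11): 212-213
For speech enhancement in non-stationary noise and at low signal-to-noise ratio (SNR), an improved spectral subtraction method based on real-time noise estimation is proposed. The method first performs endpoint detection using the critical-band feature vector distance, then defines a time-varying adjustment coefficient from the characteristics of the noisy speech in the low- and high-frequency regions. Combined with endpoint detection, this coefficient updates the noise estimate in real time, so that changes in the external environment are tracked quickly. Simulation results show that the method outperforms traditional speech enhancement methods in suppressing background noise, improving SNR, and reducing speech distortion.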
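
The general idea in the abstract above, a noise estimate that is refreshed during non-speech frames and then subtracted from each frame, can be sketched as follows. This is basic power spectral subtraction with a spectral floor, not the authors' improved method: the fixed smoothing factor `alpha` stands in for their time-varying coefficient, the `is_speech` flag is assumed to come from some endpoint detector, and `over` and `floor` are illustrative parameters.

```python
def update_noise(noise_mag, frame_mag, is_speech, alpha=0.9):
    """Recursively smooth the noise magnitude estimate on non-speech frames.

    Assumption: alpha is a fixed smoothing factor; the estimate is left
    unchanged while speech is present.
    """
    if is_speech:
        return list(noise_mag)
    return [alpha * n + (1.0 - alpha) * x
            for n, x in zip(noise_mag, frame_mag)]


def spectral_subtract(frame_mag, noise_mag, over=1.0, floor=0.01):
    """Subtract the noise power from the frame power, bin by bin.

    over  : over-subtraction factor applied to the noise power
    floor : fraction of the noisy power kept as a lower bound, so the
            result never becomes negative
    Returns the enhanced magnitude for each frequency bin.
    """
    enhanced = []
    for x, n in zip(frame_mag, noise_mag):
        power = x * x - over * n * n
        power_min = floor * x * x
        enhanced.append(max(power, power_min) ** 0.5)
    return enhanced
```

In a frame loop, `update_noise` would be called first with the detector's decision, and `spectral_subtract` would then be applied with the refreshed estimate; the enhanced magnitudes are usually recombined with the noisy phase before the inverse transform.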

The paper presents a supervised discriminative dictionary learning algorithm specially designed for classifying HEp-2 cell patterns. The proposed algorithm is an extension of the popular K-SVD algorithm: at the training phase, it takes into account the discriminative power of the dictionary atoms and reduces their intra-class reconstruction error during each update. Meanwhile, their inter-class reconstruction effect is also considered. Compared to the existing extension of K-SVD, the proposed algorithm is more robust to parameters and has better discriminative power for classifying HEp-2 cell patterns. Quantitative evaluation shows that the proposed algorithm significantly outperforms general object classification algorithms on a standard HEp-2 cell pattern classification benchmark and also achieves competitive performance on a standard natural image classification benchmark.

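A multiple description coding image enhancement method is proposed that jointly learns side-decoding and central-decoding features. The method considers side-decoded image enhancement and central-decoded image enhancement together, so the network can be trained better by jointly optimizing the central-decoding and side-decoding features. First, given that in multiple description coding the side paths are decoded independently and the central path is decoded jointly, a shared side low-resolution feature extraction network is proposed to extract features effectively from the two side-decoded images, which have the same content but different details. A residual recursive compensation network structure is also designed and used in both the side and central low-resolution feature extraction networks. Second, a multiple description side upsampling reconstruction network is designed that shares the parameters of some network layers, which reduces the number of model parameters and improves generalization. Finally, a multiple description central upsampling reconstruction network is proposed that performs deep feature fusion of the two side low-resolution features with the central low-resolution features to enhance the multiple description compressed images. Extensive experimental results show that, in terms of model complexity, objective quality, and visual quality, the proposed method outperforms many image enhancement methods such as ARCNN, FastARCNN, DnCNN, WSR, and DWCNN.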

As an alternative to classical representations in machine learning algorithms, we explore coding strategies using events, as observed for spiking neurons in the central nervous system. Focusing on visual processing, we have previously shown that we can define with an over-complete dictionary a sparse spike coding scheme by implementing lateral interactions that account for redundant information. Since this class of algorithms is compatible with both biological constraints and neuro-physiological observations, it can provide a possible algorithm to explain the speed of visual processing despite the relatively slow response time of single neurons. Here, I explore learning mechanisms to derive in an unsupervised manner an over-complete set of filters which provides a progressively sparser representation of the input. This work is based on a previous model of sparse coding from Olshausen et al. (1998) and leads to similar results, suggesting that this strategy provides a simple neural implementation of this algorithm and thus of Blind Source Separation. Moreover, this neuro-mimetic algorithm may be easily extended to realistic architectures of cortical columns in the primary visual cortex and we show results for different strategies of representation, leading to neuro-mimetic adaptive sparse spike coding schemes. This revised version was published online in June 2006 with corrections to the Cover Date.

Differing from common 2D images, a texture map, since it is used to project onto a 3D model in 3D space, not only contains 2D texture information but also implicitly associates certain 3D geometric information. Related to this, an effective 3D geometry-dependent texture map compression method with hybrid region of interest (ROI) coding is proposed in this paper. We regard the visually important area of the texture map as the ROI. To acquire the visually important areas of the texture map, we take into account information from both the 3D geometry and 2D texture maps, depicting the saliency of the textured model, distortion of the texture mapping, and boundary of the texture atlas. These visually important areas are expressed as a visual importance map. According to the particularity of the texture map, a hybrid ROI coding method that utilizes Max-Shift and an improved post compression rate distortion (PCRD) technique is presented, guided by this visual importance map. To find the exact wavelet coefficients pertaining to these ROIs before carrying out the hybrid ROI coding, this paper proposes a stochastic coefficient priority mask map computational method. Experimental results show that the visually important areas of the texture image have a better visual effect and that a good rendering result can be obtained from the texture mapping.

Modern chip multiprocessors are vulnerable to transient faults caused by either on-purpose attacks or system mistakes, especially those with large and multi-level caches in cloud servers. In this paper, we propose a modified/shared replication cache to keep a redundant copy of the most recently accessed and modified/shared L2 cache lines. According to experiments based on Multi2Sim, this cache, when properly sized, can provide considerable data reliability. In addition, the cache can reduce the average latency of the memory hierarchy for error correction, with only about 20.2% of L2 cache energy cost and 2% of L2 cache silicon overhead.

Online systems have come to be heavily used in education, particularly for online learning and collecting information not otherwise readily available. Most e-learning systems, including interactive learning systems, have been designed to “push” course materials to students but rarely to “collect” or “pull” ideas from them. The interactive mechanisms in proposed instructional design models, however, prevent many potential designers from improving course quality, even though some believe that the learning experience and the comments of students are important for enhancing course materials. Students could also contribute to instructional design directly. This paper presents a course material enhancement process that elicits ideas from students by encouraging them to modify course materials. This process was tested on different higher education programs, both graduate and undergraduate. It aims to understand which programs’ students are more willing to participate in this work and whether they can benefit from this process. To facilitate this research, an asynchronous interaction system, teacher digital assistant (TDA), was designed for teachers to receive responses, recommendations, and modified materials from students at any time. The major advantage of this process is that it could embed students’ thoughts into the course material to improve the curriculum, which can benefit future students.
