Similar literature
Found 20 similar articles (search time: 62 ms)
1.
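To address the shortcomings of traditional spectral subtraction, an improved spectral subtraction method based on joint maximum a posteriori (MAP) estimation is proposed. Traditional spectral subtraction takes the magnitude difference between the noisy speech and the noise, and reuses the phase of the noisy speech to reconstruct the speech signal. The subtraction produces "musical noise", and the inaccurate phase estimate makes enhancement unsatisfactory at low signal-to-noise ratios (SNRs). To remedy this, multi-band spectral subtraction and phase estimation are introduced: the spectrum is divided into sub-bands and spectral subtraction is performed in each sub-band, which effectively reduces the effect of musical noise. In addition, a MAP-based phase estimator is constructed that combines the signal magnitude function and phase function and obtains the phase estimate through repeated alternating iterations. Experimental results show that, compared with traditional spectral subtraction, the algorithm effectively improves the perceived quality and intelligibility of the enhanced speech at low SNRs.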
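The per-band subtraction step described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the per-band over-subtraction factors `alphas`, and the spectral floor are hypothetical, the input is assumed to be the per-bin power of a single frame, and the MAP phase estimator is not covered.

```python
def multiband_spectral_subtraction(noisy_power, noise_power, band_edges, alphas, floor=0.01):
    """Subtract a noise power estimate from one frame, band by band.

    noisy_power, noise_power: per-bin power values of a single frame.
    band_edges: bin indices delimiting the bands, e.g. [0, 2, 4] for two bands.
    alphas: one over-subtraction factor per band (assumed, hypothetical values).
    floor: spectral floor, as a fraction of the noisy power in that bin.
    """
    if len(noisy_power) != len(noise_power):
        raise ValueError("noisy and noise spectra must have the same length")
    if len(alphas) != len(band_edges) - 1:
        raise ValueError("need exactly one alpha per band")

    enhanced = list(noisy_power)
    for band, alpha in enumerate(alphas):
        for k in range(band_edges[band], band_edges[band + 1]):
            difference = noisy_power[k] - alpha * noise_power[k]
            # Clamp to a small floor instead of letting the power go negative.
            enhanced[k] = max(difference, floor * noisy_power[k])
    return enhanced


frame = multiband_spectral_subtraction(
    noisy_power=[10.0, 10.0, 10.0, 10.0],
    noise_power=[2.0, 2.0, 6.0, 6.0],
    band_edges=[0, 2, 4],
    alphas=[1.0, 2.0],
)
```

In this toy frame the second band is over-subtracted, so its bins fall back to the floor rather than becoming negative; clamping of this kind is a common way to limit isolated spectral peaks, which are one source of musical noise.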

2.
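An improved spectral-subtraction speech enhancement algorithm based on codebook learning is proposed that adapts to non-stationary noise environments. The algorithm has a training stage and an enhancement stage. In the training stage, autoregressive (AR) models are used to model the spectral shapes of speech and noise, and speech and noise codebooks are constructed. In the enhancement stage, a log-spectral minimization algorithm estimates the speech and noise spectra, and the noise is removed by spectral subtraction. Because the algorithm estimates the speech and noise spectra in every time frame, it can track rapidly changing non-stationary noise even while speech is present; the AR model yields a smooth estimate of the noise spectrum, which reduces musical noise. Simulations show that, compared with traditional spectral subtraction and multi-band spectral subtraction, the improved method suppresses noise better and distorts the speech less.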

3.
Automatic speech recognition (ASR) in reverberant environments is still a challenging task. In this study, we propose a robust feature-extraction method based on the normalization of the sub-band temporal modulation envelopes (TMEs). The sub-band TMEs were extracted using a series of constant-bandwidth band-pass filters with Hilbert transforms followed by low-pass filtering. Based on these TMEs, the modulation spectra in both clean and reverberant spaces are transformed to a reference space by using modulation transfer functions (MTFs), wherein the MTFs are estimated as the measure of the modulation transfer effect on the sub-band TMEs between the clean, reverberant, and reference spaces. By applying the MTFs to the modulation spectrum, the difference in the modulation spectrum caused by the difference in recording environments is assumed to be removed. Based on the normalized modulation spectrum, an inverse Fourier transform was applied to restore the sub-band TMEs while retaining their original phase information. We tested the proposed method in speech recognition experiments in a reverberant room with different speaker-to-microphone distances (SMDs). For comparison, the recognition performance of the traditional Mel frequency cepstral coefficients with mean and variance normalization was used as the baseline. The experimental results showed that, averaged over SMDs from 50 cm to 400 cm, we obtained a 44.96% relative improvement by using sub-band TME processing alone, and a further 15.68% relative improvement by normalizing the modulation spectrum of the sub-band TMEs. In all, we obtained a 53.59% relative improvement, which was better than that of other temporal filtering and normalization methods.

4.
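In spectral-subtraction denoising, the noise power spectrum is usually estimated empirically. An improved method is proposed that uses the short-time energy-zero-crossing product of the noisy speech together with basic spectral subtraction to obtain the speech endpoints and a noise power spectrum estimate, so that spectral-subtraction denoising can be applied to the noisy speech according to the speech stage. Finally, the characteristics of the noise during non-speech segments are used to process the residual noise left after denoising, so as to remove it more thoroughly. Simulation experiments show that the method performs better than the traditional single-pass spectral subtraction method.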
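As a rough illustration of endpoint detection with a short-time energy-zero-crossing product, the sketch below multiplies each frame's energy by its zero-crossing count and thresholds the result. The function names, the non-overlapping framing, and the fixed threshold are assumptions made for this example; the abstract does not state how the paper frames the signal or chooses its threshold.

```python
def energy_zero_product(frame):
    """Short-time energy multiplied by the number of zero crossings in the frame."""
    energy = sum(sample * sample for sample in frame)
    # A crossing is counted whenever consecutive samples differ in sign.
    crossings = sum(
        1 for current, following in zip(frame, frame[1:])
        if (current >= 0) != (following >= 0)
    )
    return energy * crossings


def detect_speech_frames(signal, frame_len, threshold):
    """Flag each non-overlapping frame whose energy-zero product exceeds threshold."""
    flags = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        flags.append(energy_zero_product(frame) > threshold)
    return flags


# One silent frame followed by one active frame (hypothetical toy data).
flags = detect_speech_frames([0, 0, 0, 0, 1, -1, 1, -1], frame_len=4, threshold=1)
```

Frames flagged as inactive could then supply the noise power spectrum estimate, which is the role the abstract assigns to endpoint detection.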

5.
6.
刘金刚  周翊  马永保  刘宏清 《计算机应用》2016,36(12):3369-3373
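To address the poor robustness of speech recognition systems in noisy environments, a switching speech power spectrum estimation algorithm is proposed. Assuming that the speech magnitude spectrum follows a Chi distribution, an improved minimum mean square error (MMSE) speech power spectrum estimator is derived. The speech presence probability (SPP) is then incorporated to derive an improved SPP-based MMSE estimator. Next, the improved MMSE estimator is combined with a traditional Wiener filter: when noise interference is strong, the improved MMSE estimator estimates the clean speech power spectrum; when it is weak, the traditional Wiener filter is used instead to reduce computation. The result is a switching speech power spectrum estimation algorithm for recognition systems. Experimental results show that, compared with the traditional MMSE estimator under the Rayleigh distribution, the proposed algorithm raises the recognition rate by about 8 percentage points on average across various noise conditions, removing noise interference and improving the robustness of the recognition system while also reducing its power consumption.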

7.
We investigate a statistical model for integrating narrowband cues in speech. The model is inspired by two ideas in human speech perception: (i) Fletcher’s hypothesis (1953) that independent detectors, working in narrow frequency bands, account for the robustness of auditory strategies, and (ii) Miller and Nicely’s analysis (1955) that perceptual confusions in noisy bandlimited speech are correlated with phonetic features. We apply the model to detecting the phonetic feature [+/− sonorant] that distinguishes vowels, approximants, and nasals (sonorants) from stops, fricatives, and affricates (obstruents). The model is represented by a multilayer probabilistic network whose binary hidden variables indicate sonorant cues from different parts of the frequency spectrum. We derive the Expectation-Maximization algorithm for estimating the model’s parameters and evaluate its performance on clean and corrupted speech.

8.
This article proposes and evaluates various methods to integrate the concept of bidirectional Long Short-Term Memory (BLSTM) temporal context modeling into a system for automatic speech recognition (ASR) in noisy and reverberated environments. Building on recent advances in Long Short-Term Memory architectures for ASR, we design a novel front-end for context-sensitive Tandem feature extraction and show how the Connectionist Temporal Classification approach can be used as a BLSTM-based back-end, as an alternative to Hidden Markov Models (HMM). We combine context-sensitive BLSTM-based feature generation and speech decoding techniques with source separation by convolutive non-negative matrix factorization. Applying our speaker-adapted multi-stream HMM framework that processes MFCC features from NMF-enhanced speech as well as word predictions obtained via BLSTM networks and non-negative sparse classification (NSC), we obtain an average accuracy of 91.86% on the PASCAL CHiME Challenge task at signal-to-noise ratios ranging from −6 to 9 dB. To our knowledge, this is the best result ever reported for the CHiME Challenge task.

9.
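Research on an improved spectral subtraction method for suppressing strong background noise in tanks
Spectral subtraction is a traditional and effective method for handling broadband noise; it is computationally cheap, easy to run in real time, and gives good enhancement. Based on the basic principles of classical spectral subtraction and its variants, a new improved spectral-subtraction speech enhancement algorithm is proposed. According to the respective characteristics of speech and noise, the noisy speech is processed with time-domain smoothing and statistical spectral weighting. Objective and subjective tests show that, compared with traditional spectral subtraction, the algorithm suppresses background noise and musical noise better while preserving the intelligibility and naturalness of the speech.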

10.
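Noise degrades the quality of speech signals, so speech enhancement is needed. Speech enhancement is a frontier area of speech signal processing whose main goal is to extract the clean original speech signal from noisy speech. The principles of speech enhancement methods are introduced, and traditional spectral subtraction and an improved spectral subtraction method are simulated experimentally. The improved method adjusts parameters for the noisy signal and then performs spectral subtraction in the frequency domain; the results show that its enhancement is clearly better than that of the traditional method. In addition, the SNRs obtained with both methods were computed, and the improved method yields a much higher SNR than the traditional one.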

11.
Variation in the foliar chemistry of humid tropical forests is poorly understood, and airborne imaging spectroscopy could provide useful information at leaf and canopy scales. However, variation in canopy structure affects our ability to estimate foliar properties from airborne spectrometer data, yet these structural effects remain poorly quantified. Using leaf spectral (400–2500 nm) and chemical data collected from 162 Australian tropical forest species, along with partial least squares (PLS) analysis and canopy radiative transfer modeling, we determined the strength of the relationship between canopy reflectance and foliar properties under conditions of varying canopy structure. At the leaf level, chlorophylls, carotenoids and specific leaf area (SLA) were highly correlated with leaf spectral reflectance (r = 0.90–0.91). Foliar nutrients and water were also well represented by the leaf spectra (r = 0.79–0.85). When the leaf spectra were incorporated into the canopy radiative transfer simulations with an idealized leaf area index (LAI) = 5.0, correlations between canopy reflectance spectra and leaf properties increased in strength by 4–18%. The effects of random LAI (= 3.0–6.5) variation on the retrieval of leaf properties remained minimal, particularly for pigments and SLA (r = 0.92–0.93). In contrast, correlations between leaf nitrogen (N) and canopy reflectance estimates decreased from r = 0.87 at constant LAI = 5 to r = 0.65 with randomly varying LAI = 3.0–6.5. Progressive increases in the structural variability among simulated tree crowns had relatively little effect on pigment, SLA and water predictions. However, N and phosphorus (P) were more sensitive to canopy structural variability. Our modeling results suggest that multiple leaf chemicals and SLA can be estimated from leaf and canopy reflectance spectroscopy, and that the high-LAI canopies found in tropical forests enhance the signal via multiple scattering. Finally, the two factors we found to most negatively impact leaf chemical predictions from canopy reflectance were variation in LAI and viewing geometry, which can be managed with new airborne technologies and analytical methods.

12.
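Speech enhancement is mainly used to improve the intelligibility and quality of noise-corrupted speech; its main application is improving mobile communication quality in noisy environments. Traditional speech enhancement methods include spectral subtraction, Wiener filtering, and wavelet coefficient methods. Because traditional algorithms give poor speech quality and leave musical noise in complex noise environments, a speech enhancement algorithm combining the wavelet packet transform with adaptive Wiener filtering is proposed. The role of wavelet packet multiresolution analysis in partitioning the signal spectrum is analyzed; the noisy signal is decomposed at multiple scales with wavelet packets, adaptive Wiener filtering is applied to the wavelet packet coefficients at each scale, and the filtered coefficients are used to reconstruct the enhanced speech signal. Simulation results show that, compared with traditional enhancement algorithms, the algorithm not only raises the SNR of noisy speech more effectively in low-SNR, non-stationary noise environments but also better preserves the spectral characteristics of speech, improving the quality of the noisy speech.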

13.
A speech signal captured by a distant microphone is generally smeared by reverberation, which severely degrades automatic speech recognition (ASR) performance. One way to solve this problem is to dereverberate the observed signal prior to ASR. In this paper, a room impulse response is assumed to consist of three parts: a direct-path response, early reflections and late reverberations. Since late reverberations are known to be a major cause of ASR performance degradation, this paper focuses on dealing with the effect of late reverberations. The proposed method first estimates the late reverberations using long-term multi-step linear prediction, and then reduces the late reverberation effect by employing spectral subtraction. The algorithm provided good dereverberation with training data corresponding to the duration of one speech utterance, in our case, less than 6 s. This paper describes the proposed framework for both single-channel and multichannel scenarios. Experimental results showed substantial improvements in ASR performance with real recordings under severe reverberant conditions.

14.
Pitch estimation is crucial to many applications. Although a number of estimation methods working in different domains have been put forward, there is still demand for improvement, especially for noisy speech. In this paper, we present iPEEH, a general technique to raise the performance of pitch estimators by enhancing harmonics. By analysis and experiments, it is found that missing and submerged harmonics are the root causes of failure for many pitch detectors. Hence, we propose to enhance the harmonics in the spectrum before performing pitch detection. One enhancement algorithm that mainly applies the square operation to regenerate harmonics is presented in detail, including the theoretical analysis and implementation. Four speech databases with 11 types of additive noise and 5 noise levels are used in the assessment. We compare the performance of algorithms before and after using iPEEH. Experimental results indicate that the proposed iPEEH can effectively reduce the detection errors. In some cases, the error rate reductions are higher than 20%. In addition, the advantage of iPEEH is manifold, since the experiments demonstrate that iPEEH is effective for various noise types, noise levels, multiple basic frequency-based estimators, and two audio types. Through this work, we investigated the underlying reasons for pitch detection failures and presented a novel direction for pitch detection. Moreover, this approach, in essence a preprocessing step, indicates the significance of preprocessing for any intelligent system.
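Why a square operation can regenerate a missing harmonic can be seen from a small numerical check. The sketch below is not iPEEH itself: it assumes the squaring is applied to the time-domain waveform (the abstract does not say where it is applied), and the sample rate, fundamental frequency, and helper name `tone_amplitude` are hypothetical. A signal containing only the 2nd and 3rd harmonics has no energy at the fundamental, but its square contains the difference frequency, which is the fundamental.

```python
import cmath
import math


def tone_amplitude(signal, freq_hz, sample_rate):
    """Amplitude of the sinusoidal component at freq_hz (single-bin DFT)."""
    acc = sum(
        sample * cmath.exp(-2j * math.pi * freq_hz * n / sample_rate)
        for n, sample in enumerate(signal)
    )
    return 2 * abs(acc) / len(signal)


SAMPLE_RATE = 8000   # Hz, assumed for the example
F0 = 200             # Hz, assumed fundamental
NUM_SAMPLES = 400    # exactly 10 periods of F0, so DFT bins do not leak

# Only the 2nd and 3rd harmonics are present; the fundamental is missing.
missing_fundamental = [
    math.cos(2 * math.pi * 2 * F0 * n / SAMPLE_RATE)
    + math.cos(2 * math.pi * 3 * F0 * n / SAMPLE_RATE)
    for n in range(NUM_SAMPLES)
]
squared = [sample * sample for sample in missing_fundamental]

before = tone_amplitude(missing_fundamental, F0, SAMPLE_RATE)
after = tone_amplitude(squared, F0, SAMPLE_RATE)
```

The cross term 2·cos(a)·cos(b) = cos(a − b) + cos(a + b) is what places a component at 3·F0 − 2·F0 = F0 in the squared signal, so `after` is close to 1 while `before` is close to 0.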

15.
Leaf area index (LAI) is a key forest structural characteristic that serves as a primary control for exchanges of mass and energy within a vegetated ecosystem. Most previous attempts to estimate LAI from remotely sensed data have relied on empirical relationships between field-measured observations and various spectral vegetation indices (SVIs) derived from optical imagery or the inversion of canopy radiative transfer models. However, as biomass within an ecosystem increases, accurate LAI estimates become difficult to obtain. Here we use lidar data in conjunction with SPOT 5-derived SVIs to examine the extent to which integration of both lidar and spectral datasets can estimate specific LAI quantities over a broad range of conifer forest stands in the northern Rocky Mountains. Our results show that SPOT 5-derived SVIs performed poorly across our study areas, explaining less than 50% of variation in observed LAI, while lidar-only models account for a significant amount of variation across the two study areas located in northern Idaho: the St. Joe Woodlands (R² = 0.86; RMSE = 0.76) and the Nez Perce Reservation (R² = 0.69; RMSE = 0.61). Further, we found that LAI models derived from lidar metrics were only incrementally improved with the inclusion of SPOT 5-derived SVIs; increases in R² ranged from 0.02–0.04, though model RMSE values decreased for most models (0–11.76% decrease). Significant lidar-only models tended to utilize a common set of predictor variables such as canopy percentile heights and percentile height differences, percent canopy cover metrics, and covariates that described lidar height distributional parameters. All integrated lidar-SPOT 5 models included textural measures of the visible wavelengths (e.g. green and red reflectance). Due to the limited amount of LAI model improvement when adding SPOT 5 metrics to lidar data, we conclude that lidar data alone can provide superior estimates of LAI for our study areas.

16.
Plant species discrimination using remote sensing is generally limited by the similarity of their reflectance spectra in the visible, NIR and SWIR domains. Laboratory-measured emissivity spectra in the mid infrared (MIR; 2.5 μm–6 μm) and the thermal infrared (TIR; 8 μm–14 μm) domain of different plant species, however, reveal significant differences. It is anticipated that with the advances in airborne and spaceborne hyperspectral thermal sensors, differentiation between plant species may improve. The laboratory emissivity spectra of thirteen common broad-leaved species, comprising 3024 spectral bands in the MIR and TIR, were analyzed. For each wavelength the differences between the species were tested for significance using one-way analysis of variance (ANOVA) with the post-hoc Tukey HSD test. The emissivity spectra of the analyzed species were found to be statistically different at various wavebands. Subsequently, six spectral bands were selected (based on the histogram of separable pairs of species for each waveband) to quantify the separability between each species pair based on the Jeffries-Matusita (JM) distance. Out of 78 combinations, 76 pairs had a significantly different JM distance. This means that careful selection of hyperspectral bands in the MIR and TIR (2.5 μm–14 μm) results in reliable species discrimination.
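For readers unfamiliar with the separability measure, a minimal sketch of the JM distance follows. It makes two simplifying assumptions that are not taken from the paper: each class is modeled as a univariate Gaussian in a single band (the study used six bands, which needs the multivariate form), and the JM distance is taken in the common remote-sensing form 2(1 − e^(−B)), where B is the Bhattacharyya distance. The function names are hypothetical.

```python
import math


def bhattacharyya_gaussian(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two univariate Gaussian classes."""
    mean_term = 0.25 * (mean1 - mean2) ** 2 / (var1 + var2)
    variance_term = 0.5 * math.log((var1 + var2) / (2 * math.sqrt(var1 * var2)))
    return mean_term + variance_term


def jm_distance(mean1, var1, mean2, var2):
    """JM distance in the 2 * (1 - exp(-B)) form; ranges from 0 to 2."""
    b = bhattacharyya_gaussian(mean1, var1, mean2, var2)
    return 2 * (1 - math.exp(-b))


identical = jm_distance(0.0, 1.0, 0.0, 1.0)      # same class statistics
well_separated = jm_distance(0.0, 1.0, 100.0, 1.0)

# Thirteen species give C(13, 2) unordered pairs, matching the 78 combinations.
species_pairs = math.comb(13, 2)
```

A value near 0 means the two classes are statistically indistinguishable in that band, while a value near the upper bound of 2 means they are almost fully separable.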

17.
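To improve the recognition rate of speech recognition with a multi-channel skin-hearing system, a speech enhancement algorithm based on multi-band spectral subtraction is proposed. In multi-channel skin-hearing experiments, colored noise severely degrades speech quality and hence the system's recognition rate, so multi-band spectral subtraction is introduced into a skin-hearing system for the first time to reduce colored noise. Multi-band spectral subtraction divides the speech band into several sub-bands and performs spectral subtraction with a different coefficient in each sub-band to enhance the speech. The algorithm was simulated in Matlab and implemented on DSP hardware, and the enhanced speech signal was fed to the skin-hearing system. Experiments show that the design effectively suppresses colored noise and improves the reliability and practicality of the skin-hearing system.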

18.
叶斌  丁永生 《计算机仿真》2006,23(9):327-329
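The goal of speech enhancement is to extract the desired clean speech from noisy speech as far as possible while preserving intelligibility and clarity, thereby improving its quality; in practice, the background noise must also be estimated. This paper combines real-time noise estimation with Wiener filtering to give a simple and effective speech enhancement scheme: the noise power spectrum is smoothed during speech frames so that the noise estimate better suits Wiener filtering, and traditional over-subtraction is used to compensate for the error introduced by the estimate. Matlab experiments show that at low SNRs the method raises the SNR of the speech considerably and the enhancement is clearly noticeable.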
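The combination described here (a smoothed noise estimate feeding a Wiener gain, with over-subtraction as a safety margin) might look roughly like the sketch below. It is an illustration under assumptions rather than the paper's scheme: the recursive smoothing constant `beta`, the over-subtraction factor `alpha`, the gain floor, and both function names are hypothetical, and inputs are per-bin power values of single frames.

```python
def smooth_noise(previous_noise, frame_power, beta=0.9):
    """Recursively smooth the per-bin noise power estimate across frames."""
    return [
        beta * previous + (1 - beta) * current
        for previous, current in zip(previous_noise, frame_power)
    ]


def wiener_gain(noisy_power, noise_power, alpha=1.0, gain_floor=0.05):
    """Per-bin Wiener gain with over-subtraction factor alpha.

    The clean-speech power is estimated as max(noisy - alpha * noise, 0),
    and the gain speech / (speech + noise) is clamped to gain_floor.
    """
    gains = []
    for noisy, noise in zip(noisy_power, noise_power):
        speech = max(noisy - alpha * noise, 0.0)
        total = speech + noise
        gain = speech / total if total > 0 else gain_floor
        gains.append(max(gain, gain_floor))
    return gains


noise_estimate = smooth_noise([1.0, 1.0], [3.0, 1.0], beta=0.5)
gains = wiener_gain(noisy_power=[4.0, 1.0], noise_power=[1.0, 1.0])
```

Multiplying each bin of the noisy spectrum by its gain attenuates noise-dominated bins (here the second bin falls to the floor) while leaving speech-dominated bins largely intact.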

19.
This paper deals with the development of acoustic source localization algorithms for service robots working in real conditions. One of the main uses of these algorithms in a mobile robot is that the robot can localize a human operator and eventually interact with him or her by means of verbal commands. The location of a speaking operator is detected with a microphone-array-based algorithm; localization information is passed to a navigation module, which sets up a navigation mission using knowledge of the environment map. The system we have developed aims at integrating acoustic, odometric and collision sensors with the mobile robot control architecture. Good performance with real acoustic data has been obtained using a neural network approach with spectral subtraction and a noise-robust voice activity detector. The experiments show that the average absolute localization error is about 40 cm at 0 dB and about 10 cm at 10 dB of SNR for the named localization. Experimental results describing mobile robot performance in a talker-following task are reported.

20.
Speech enhancement algorithm based on speech presence probability and auditory masking properties
宫云梅  赵晓群  史仍辉 《计算机应用》2008,28(11):2981-2983
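At low SNRs, the key problem in spectral-subtraction speech enhancement, balancing the degree of noise removal, residual musical noise, and speech distortion, becomes especially pronounced. To reduce the interference of noise with speech communication, a speech enhancement algorithm suited to low SNRs is proposed. Building on traditional spectral subtraction, the subtraction parameters are adjusted adaptively according to the auditory masking threshold of the noise, and the speech presence probability is used to estimate the speech and noise signals, which avoids the inaccuracy of voice activity detection (VAD) at low SNRs and gives greater robustness. Objective and subjective tests show that, compared with traditional spectral subtraction, the algorithm suppresses residual and background noise better with almost no loss of speech clarity, and the improvement is especially marked for speech corrupted at low SNRs or by non-stationary noise.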
