Similar documents
20 similar documents found
1.
Future wireless multimedia terminals will have a variety of applications that require speech recognition capabilities. We consider a robust distributed speech recognition system in which representative parameters of the speech signal are extracted at the wireless terminal and transmitted to a centralized automatic speech recognition (ASR) server. We propose two unequal error protection schemes for the ASR bit stream and demonstrate their satisfactory performance on typical wireless cellular channels. In addition, a "soft-feature" error concealment strategy is introduced at the ASR server; it uses "soft outputs" from the channel decoder to compute the marginal distribution of only the reliable features during likelihood computation at the speech recognizer. This soft-feature error concealment technique reduces the ASR error rate by more than a factor of 2.5 for certain channels. We also consider a channel decoding technique that uses source information to improve ASR performance.
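The marginalization step can be illustrated with a short sketch. It assumes a diagonal-covariance Gaussian output density and a per-feature reliability flag derived from the decoder's soft outputs; the threshold rule in reliability_mask is a hypothetical stand-in, since the abstract does not give the exact criterion.

```python
import math

def reliability_mask(soft_outputs, threshold=0.9):
    """Hypothetical rule: a feature is reliable when the channel decoder's soft
    output (taken here as the posterior probability that the decoded value is
    correct) reaches the threshold."""
    return [p >= threshold for p in soft_outputs]

def marginal_loglik(x, mean, var, reliable):
    """Log-likelihood of x under a diagonal-covariance Gaussian using only the
    reliable dimensions. Integrating an unreliable dimension over its whole
    range contributes a factor of 1, so marginalizing it out is the same as
    dropping its term from the sum."""
    total = 0.0
    for xi, mu, v, ok in zip(x, mean, var, reliable):
        if ok:
            total += -0.5 * (math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v)
    return total

# Feature 1 was hit by channel errors (low soft output), so it is ignored.
mask = reliability_mask([0.99, 0.40, 0.95])
score = marginal_loglik([0.1, 5.0, -0.2], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0], mask)
```

With a diagonal covariance, the corrupted value 5.0 has no influence on the score, which is the point of concealing it rather than scoring it.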

2.
The authors deal with the problem of automatic speech recognition in the presence of additive white noise. The effect of noise is modelled as an additive term in the power spectrum of the original clean speech, and the cepstral coefficients of the noisy speech are derived from this model. The reference cepstral vectors trained on clean speech are adapted to the noisy versions that best fit the test speech cepstral vector. The LPC coefficients, the LPC-derived cepstral coefficients, and the distance between test and reference are all treated as functions of the noise ratio (the spectral power ratio of noise to noisy speech). A gradient-based algorithm is proposed to find the optimal noise ratio together with the minimum distance between the test cepstral vector and the noise-adapted reference. A recursive algorithm based on the Levinson-Durbin recursion is proposed to calculate the LPC coefficients and their derivatives with respect to the noise ratio simultaneously. The stability of the proposed adaptation algorithm is also addressed. Experiments on multispeaker (50 male and 50 female) isolated Mandarin digit recognition demonstrate remarkable performance improvements over the noncompensated method in noisy environments. The results are also compared with the projection-based approach, and experiments show that the proposed method is superior to it in severely noisy environments.
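The standard building blocks behind this method can be sketched as follows: the Levinson-Durbin recursion from autocorrelation to LPC coefficients, and the usual recursion from LPC coefficients to LPC cepstra. This is not the paper's derivative recursion. The add_white_noise helper is a simplifying assumption: white noise uncorrelated with speech adds a flat power spectrum, which in the autocorrelation domain only raises r[0]; the paper parametrizes this by the noise ratio instead.

```python
def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations for predictor coefficients
    a[1..order], where x[n] is predicted as sum_k a[k] * x[n - k].
    Returns (coefficients, final prediction error power)."""
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient; |k| < 1 for a valid autocorrelation
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err

def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum of the all-pole model 1 / (1 - sum_k a[k] z^-k), n = 1..n_ceps:
    c[n] = a[n] + sum_{k=1}^{n-1} (k / n) * c[k] * a[n - k], a[n] = 0 beyond the order."""
    def coef(n):
        return a[n - 1] if 1 <= n <= len(a) else 0.0
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        c[n] = coef(n) + sum((k / n) * c[k] * coef(n - k) for k in range(1, n))
    return c[1:]

def add_white_noise(r, noise_power):
    """Assumed simplification: white noise adds its power to r[0] only."""
    return [r[0] + noise_power] + list(r[1:])

r_clean = [1.0, 0.5, 0.25]                  # autocorrelation of an AR(1) process with pole 0.5
a_clean, err = levinson_durbin(r_clean, 2)  # a_clean == [0.5, 0.0], err == 0.75
c_clean = lpc_to_cepstrum(a_clean, 3)       # approximately [0.5, 0.125, 0.0417]
a_noisy, _ = levinson_durbin(add_white_noise(r_clean, 1.0), 2)
```

Adding noise power flattens the model, which shows up as a smaller first predictor coefficient.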

3.
We propose a novel feature processing technique that provides a cepstral liftering effect in the log-spectral domain. Cepstral liftering aims to equalize the variances of cepstral coefficients for distance-based speech recognizers and, as a result, provides robustness to additive noise and speaker variability. However, in the popular hidden Markov model (HMM) based framework, cepstral liftering has no effect on recognition performance. We derive a filtering method in the log-spectral domain that corresponds to cepstral liftering. The proposed method performs high-pass filtering based on the decorrelation of filter-bank energies. We show that in noisy speech recognition, the proposed method reduces the error rate by 52.7% relative to conventional features.
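Why liftering has no effect in an HMM can be checked numerically. The sketch assumes diagonal-covariance Gaussian states retrained on liftered features, so means scale by w and variances by w squared; every model's log-likelihood then shifts by the same constant, minus the sum of log w. The sinusoidal lifter with L = 22 is a common choice used here for illustration, not the paper's log-spectral filter.

```python
import math

def sinusoidal_lifter(num_ceps, L=22):
    """Weights w[n] = 1 + (L / 2) * sin(pi * n / L) for n = 1..num_ceps."""
    return [1.0 + (L / 2.0) * math.sin(math.pi * n / L) for n in range(1, num_ceps + 1)]

def diag_gauss_loglik(x, mean, var):
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def lifted_model(mean, var, w):
    """A model retrained on liftered features: means scale by w, variances by w**2."""
    return [m * wi for m, wi in zip(mean, w)], [v * wi * wi for v, wi in zip(var, w)]

w = sinusoidal_lifter(3)
x = [0.3, -1.2, 0.8]
model_a = ([0.0, -1.0, 1.0], [1.0, 0.5, 2.0])
model_b = ([0.5, 0.0, 0.0], [0.8, 1.5, 1.0])
x_lift = [xi * wi for xi, wi in zip(x, w)]

# Liftering subtracts sum(log w) from every model's score, so the score
# difference (and hence the recognized model) does not change.
diff_plain = diag_gauss_loglik(x, *model_a) - diag_gauss_loglik(x, *model_b)
diff_lift = (diag_gauss_loglik(x_lift, *lifted_model(*model_a, w))
             - diag_gauss_loglik(x_lift, *lifted_model(*model_b, w)))
```

A distance-based recognizer with fixed unit variances has no such normalization, which is why liftering helps there.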

4.
A segment-based speech recognition scheme is proposed. The basic idea is to model the correlation among successive speech frames explicitly by using features that represent the contours of spectral parameters. The speech signal of an utterance is regarded as a template formed by directly concatenating a sequence of acoustic segments. Each acoustic segment is inherently of variable length and is represented by a fixed-dimensional feature vector formed from the coefficients of discrete orthonormal polynomial expansions that approximate its spectral parameter contours. For training, an automatic algorithm is proposed to generate several segment-based reference templates for each syllable class. For testing, a frame-based dynamic programming procedure computes the matching score between the test utterance and each reference template. The performance of the proposed scheme was examined by simulations of multi-speaker recognition of 408 highly confusable isolated Mandarin base-syllables. A recognition rate of 81.1% was achieved with 5-segment, 8-reference-template models using cepstral and delta-cepstral coefficients as recognition features. This is 4.5% higher than that of a well-modelled 12-state, 5-mixture CHMM using cepstral, delta-cepstral, and delta-delta-cepstral coefficients.
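The mapping from a variable-length contour to a fixed-dimensional vector can be sketched with discrete orthonormal polynomials built by Gram-Schmidt over equally spaced, normalized time points. The time normalization and the polynomial order are assumptions; the abstract does not specify them.

```python
import math

def orthonormal_basis(n, order):
    """Discrete orthonormal polynomials of degree 0..order on n equally spaced
    points in [0, 1], built by (modified) Gram-Schmidt on the monomials."""
    if n <= order:
        raise ValueError("segment must have more frames than the polynomial order")
    ts = [i / (n - 1) if n > 1 else 0.0 for i in range(n)]
    basis = []
    for d in range(order + 1):
        v = [t ** d for t in ts]
        for b in basis:
            proj = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        basis.append([vi / norm for vi in v])
    return basis

def segment_features(contour, order=3):
    """Expansion coefficients of one spectral-parameter contour: order + 1
    numbers regardless of how many frames the segment spans."""
    basis = orthonormal_basis(len(contour), order)
    return [sum(c * b for c, b in zip(contour, phi)) for phi in basis]

# Two segments of different length map to vectors of the same dimension.
short = segment_features([1.0, 1.4, 2.1, 2.2, 2.0], order=3)
long_ = segment_features([1.0, 1.2, 1.5, 1.9, 2.2, 2.3, 2.2, 2.0], order=3)
```

The first coefficient captures the segment mean, the second its slope, and higher ones its curvature.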

5.
Encoding frequency modulation to improve cochlear implant performance in noise   Total citations: 10 (self-citations: 0, citations by others: 10)
As an alternative to traditional Fourier analysis, a signal can be decomposed into amplitude and frequency modulation components. The speech processing strategy in most modern cochlear implants extracts and encodes only amplitude modulation in a limited number of frequency bands. While amplitude modulation encoding has allowed cochlear implant users to achieve good speech recognition in quiet, their performance in noise is severely compromised. Here, we propose a novel speech processing strategy that encodes both amplitude and frequency modulations in order to improve cochlear implant performance in noise. By removing the center frequency from the subband signals and additionally limiting the range and rate of the frequency modulation, the present strategy transforms the fast-varying temporal fine structure into a slowly varying frequency modulation signal. As a first step, we evaluated the potential contribution of additional frequency modulation to speech recognition in noise via acoustic simulations of the cochlear implant. We found that while amplitude modulation from a limited number of spectral bands is sufficient to support speech recognition in quiet, frequency modulation is needed to support speech recognition in noise. In particular, improvements of as much as 71 percentage points were observed for sentence recognition in the presence of a competing voice. The present result strongly suggests that frequency modulation should be extracted and encoded to improve cochlear implant performance in realistic listening situations. We have proposed several implementation methods to stimulate further investigation. Index Terms: amplitude modulation, cochlear implant, fine structure, frequency modulation, signal processing, speech recognition, temporal envelope.
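The AM/FM decomposition itself can be sketched with the analytic signal: its magnitude is the amplitude modulation and its phase increment per sample gives the instantaneous frequency. This uses a direct O(N^2) DFT to stay dependency-free and only performs the decomposition of a single signal; the subband filtering, center-frequency removal, and FM range/rate limiting described above are not reproduced.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def analytic_signal(x):
    """Zero the negative-frequency half of the spectrum and double the positive half."""
    n = len(x)
    h = [0.0] * n
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        for k in range(1, n // 2):
            h[k] = 2.0
    else:
        for k in range(1, (n + 1) // 2):
            h[k] = 2.0
    return idft([Xk * hk for Xk, hk in zip(dft(x), h)])

def am_fm(x, fs):
    """Amplitude envelope and instantaneous frequency (Hz) of a real signal."""
    z = analytic_signal(x)
    amplitude = [abs(v) for v in z]
    # phase increment between consecutive samples, converted to Hz
    freq = [cmath.phase(z[t + 1] * z[t].conjugate()) * fs / (2 * math.pi)
            for t in range(len(z) - 1)]
    return amplitude, freq

fs = 8000.0
n = 64
x = [0.7 * math.cos(2 * math.pi * 4 * t / n) for t in range(n)]  # 500 Hz tone
amp, inst_freq = am_fm(x, fs)
```

For this test tone, amp stays at 0.7 and inst_freq at 500 Hz; a real subband of speech would show both varying over time.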

6.
Extraction of MFCC as a feature parameter for speaker recognition   Total citations: 5 (self-citations: 0, citations by others: 5)
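Speaker recognition is an important branch of recognizing individual human characteristics and is already widely used in practice. The human auditory system is a near-ideal speaker recognition system, and MFCCs (Mel-frequency cepstral coefficients) model human auditory characteristics: they are speech feature parameters consistent with human hearing and have achieved high recognition rates in practical applications. Using a convolutional homomorphic system, this paper briefly introduces cepstral analysis of speech signals, obtains from the Mel frequency scale a Mel-frequency equivalent filter bank consistent with human hearing, and finally describes the general procedure and algorithm for computing MFCCs.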
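The general MFCC procedure (triangular filters spaced uniformly on the Mel scale, log filter-bank energies, then a DCT) can be sketched as below. It uses the common Mel formula 2595 * log10(1 + f / 700); the filter count, FFT size, and sampling rate in the example are assumptions, not values from the paper. Pre-emphasis, windowing, and the FFT that produce the power spectrum are omitted.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(num_filters, n_fft, fs):
    """Triangular filters equally spaced on the Mel scale, evaluated on the
    n_fft // 2 + 1 bins of a power spectrum."""
    top = hz_to_mel(fs / 2.0)
    hz_points = [mel_to_hz(i * top / (num_filters + 1)) for i in range(num_filters + 2)]
    bin_freqs = [k * fs / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, num_filters + 1):
        left, center, right = hz_points[j - 1], hz_points[j], hz_points[j + 1]
        row = []
        for f in bin_freqs:
            if left <= f <= center:
                row.append((f - left) / (center - left))
            elif center < f <= right:
                row.append((right - f) / (right - center))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc_from_power_spectrum(power, bank, num_ceps):
    """Log filter-bank energies followed by a DCT-II (unnormalized)."""
    log_e = [math.log(max(sum(w * p for w, p in zip(row, power)), 1e-12)) for row in bank]
    m = len(log_e)
    return [sum(log_e[j] * math.cos(math.pi * n * (j + 0.5) / m) for j in range(m))
            for n in range(num_ceps)]

fs, n_fft = 8000, 512
bank = mel_filterbank(20, n_fft, fs)
power = [1.0 + k for k in range(n_fft // 2 + 1)]  # placeholder power spectrum
c = mfcc_from_power_spectrum(power, bank, 13)
c_louder = mfcc_from_power_spectrum([4.0 * p for p in power], bank, 13)
```

Scaling the input only changes c[0]; the higher coefficients, which describe spectral shape, are unchanged.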

7.
Jian Zhihua, Yang Zhen. Journal of Signal Processing (信号处理), 2007, 23(3): 383-387
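This paper proposes GMCSM, an improved feature-parameter compensation algorithm in the cepstral domain. Based on the time-varying nature of speech, GMCSM models the variance of the speech signal with a Generalized Auto-Regressive Conditional Heteroscedasticity (GARCH) model. Experimental data show that, compared with the conventional cepstral subtraction method (CSM) and MEMCSM, GMCSM compensates more effectively for the cepstral feature distortion caused by additive noise and reduces the recognition error rate; its advantage is especially pronounced at low signal-to-noise ratios.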
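The GARCH(1,1) conditional-variance recursion at the core of this kind of model can be sketched as below. How GMCSM feeds these variances into cepstral subtraction is not described in the abstract, and the parameter values in the example are arbitrary placeholders.

```python
def garch11_variance(residuals, omega, alpha, beta, initial_var=None):
    """Conditional variances of a GARCH(1,1) model:
    sigma2[t] = omega + alpha * e[t-1]**2 + beta * sigma2[t-1]."""
    if not residuals:
        return []
    if initial_var is None:
        if alpha + beta >= 1.0:
            raise ValueError("stationarity requires alpha + beta < 1")
        # start from the unconditional (long-run) variance
        initial_var = omega / (1.0 - alpha - beta)
    sigma2 = [initial_var]
    for e in residuals[:-1]:
        sigma2.append(omega + alpha * e * e + beta * sigma2[-1])
    return sigma2

# A large residual at t = 2 raises the predicted variance at t = 3.
sig2 = garch11_variance([0.0, 0.0, 2.0, 0.0], omega=0.1, alpha=0.2, beta=0.7)
```

The point of the model is that variance is not constant: a burst of large residuals raises the expected variance of the following frames, then decays back toward the long-run level.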

8.
Lee L.-M., Wang H.-C. Electronics Letters, 1995, 31(8): 616-617
The state parameters of the hidden Markov model are represented by the autocorrelation coefficients of a context window, which can be adaptively transformed into cepstral and delta-cepstral coefficients according to the environmental noise. Experimental results show that this can significantly improve the speech recognition rate in noisy environments.
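Delta-cepstral coefficients are commonly computed with a regression over a context window of neighbouring frames, as sketched below. The abstract does not give the paper's exact transform from autocorrelation to deltas, so this shows only the standard regression formula; the window length of 2 and edge replication are assumptions.

```python
def delta(frames, window=2):
    """Regression deltas per coefficient:
    d[t] = sum_{n=1..N} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..N} n**2),
    replicating the first and last frames at the edges."""
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(T):
        d = []
        for i in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, T - 1)][i]
                behind = frames[max(t - n, 0)][i]
                acc += n * (ahead - behind)
            d.append(acc / denom)
        out.append(d)
    return out

# Two cepstral coefficients rising linearly at rates 1 and 2 per frame.
frames = [[float(t), 2.0 * t] for t in range(10)]
deltas = delta(frames)
```

Away from the edges the deltas recover the per-frame slopes exactly; near the edges, replication biases them toward zero.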

9.
A new method for representing speech spectra based on a pole-zero decomposition technique is proposed in this paper. In this method, the parameters of a pole-zero model for the smoothed short-time spectrum of speech are determined by a cepstral matching criterion: the cepstral coefficients of the model's impulse response are made equal to those of the signal up to a specified number of coefficients, which determines the order of the model system. This is analogous to autocorrelation matching in linear prediction analysis. It is shown that the model spectrum represents both the peaks and the valleys of the smoothed spectrum equally well, unlike the all-pole model of linear prediction analysis, where only the peaks are well represented. The pole and zero parameters are derived in an identical manner by approximately deconvolving the pole and zero contributions in the cepstral domain. The residual from the inverse pole-zero system can be used to obtain information about the excitation signal.
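The forward relation that cepstral matching relies on can be checked directly. For a minimum-phase H(z) = prod(1 - z_i z^-1) / prod(1 - p_i z^-1), the cepstrum is c[n] = (sum p_i^n - sum z_i^n) / n, so poles and zeros enter with opposite signs. The sketch compares that closed form with the cepstrum computed recursively from the model's impulse response; it does not implement the paper's estimation of poles and zeros from a measured cepstrum, and the gain (c[0]) is ignored.

```python
def pole_zero_cepstrum(poles, zeros, n_coeffs):
    """c[n] = (sum_i p_i**n - sum_i z_i**n) / n for n = 1..n_coeffs
    (all |p_i| < 1 and |z_i| < 1, i.e. minimum phase)."""
    return [(sum(p ** n for p in poles) - sum(z ** n for z in zeros)) / n
            for n in range(1, n_coeffs + 1)]

def _poly_from_roots(roots):
    """Coefficients of prod(1 - r z^-1) in powers of z^-1."""
    coeffs = [1.0]
    for r in roots:
        coeffs = [a - r * b for a, b in zip(coeffs + [0.0], [0.0] + coeffs)]
    return coeffs

def impulse_response(poles, zeros, length):
    """Run the difference equation den * y = num * delta."""
    num, den = _poly_from_roots(zeros), _poly_from_roots(poles)
    h = []
    for n in range(length):
        y = num[n] if n < len(num) else 0.0
        y -= sum(den[k] * h[n - k] for k in range(1, min(n, len(den) - 1) + 1))
        h.append(y)
    return h

def cepstrum_from_min_phase(h, n_coeffs):
    """Recursive cepstrum of a minimum-phase sequence with h[0] = 1:
    c[n] = h[n] - sum_{k=1}^{n-1} (k / n) * c[k] * h[n - k]."""
    c = [0.0] * (n_coeffs + 1)
    for n in range(1, n_coeffs + 1):
        c[n] = h[n] - sum((k / n) * c[k] * h[n - k] for k in range(1, n))
    return c[1:]

poles, zeros = [0.5, -0.3], [0.2]
c_formula = pole_zero_cepstrum(poles, zeros, 8)
c_direct = cepstrum_from_min_phase(impulse_response(poles, zeros, 9), 8)
```

Because the pole and zero terms are additive in the cepstrum, they can be separated there, which is the deconvolution the abstract refers to.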

10.
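Current status and prospects of robust speech recognition technology   Total citations: 12 (self-citations: 0, citations by others: 12)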
Yao Wenbing, Yao Tianren, Han Tao. Journal of Signal Processing (信号处理), 2001, 17(6): 484-493
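After briefly describing the background from which robust speech recognition technology emerged, this paper focuses on the main techniques, the current state of research in China and abroad, and the future directions of robust speech recognition. It first outlines the interference sources that degrade speech quality and affect the robustness of speech recognition systems, together with their effects. It then introduces, in turn, the technical approaches and current status of speech enhancement, robust feature extraction, feature- and model-based compensation, microphone arrays, auditory processing modeled on the human ear, and audio-visual bimodal speech recognition. Finally, it discusses future directions for robust speech recognition technology.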

11.
Overview of compression and packet loss effects in speech biometrics   Total citations: 2 (self-citations: 0, citations by others: 2)
An overview is presented of compression and packet loss effects in speech biometrics. These new problems arise particularly in recent applications of biometrics over mobile or Internet networks. The influence of speech compression on speaker recognition performance in mobile networks is investigated. In a first experiment, it is found that GSM coding degrades performance. In a second experiment, the features for the speaker recognition system are calculated directly from the information available in the encoded bit stream. It is found that the low LPC order used in GSM coding is responsible for most of the performance degradation. The resulting speaker recognition system performs as well as the original one, which decodes and reanalyses speech before performing recognition. The joint effects of packet loss and compression over IP networks are also studied. It is demonstrated experimentally that packet loss alone has negligible adverse effects, whereas speech encoding, particularly at a low bit rate, combined with packet loss can reduce verification accuracy considerably.

12.
It is noted that the use of a good distortion measure between a given speech signal and the entries of a stored codebook of impulse responses and corresponding vocal-tract shapes (an articulatory codebook) is of great importance to the success of the articulatory approach to speech coding. One promising distortion measure is the weighted cepstral distortion. Since the impulse responses in the articulatory codebook do not include glottal characteristics, the authors derive optimal weighting functions (cepstral lifters) that reduce the influence of a varying glottal source on the cepstral distortion measure. This is done by examining the ensemble of cepstral coefficients of speech produced by an articulatory speech synthesizer that also includes a vocal-cord model. The resulting cepstral lifters are optimal for the given ensemble of cepstral coefficients and the given constraints on the weighting function. They differ for cepstral coefficients derived from the power spectrum (FFT cepstra) and for those derived from LPC (linear predictive coding) coefficients (LPC cepstra). The performance of the resulting cepstral lifters is compared in an articulatory codebook search.
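A weighted cepstral distortion and the codebook search it drives can be sketched as follows. The weights here are placeholders chosen to show the effect of de-emphasizing one coefficient; the paper derives its optimal lifters from a synthesizer ensemble, which is not reproduced.

```python
def weighted_cepstral_distance(c_test, c_ref, weights):
    """d = sum_n (w[n] * (c_test[n] - c_ref[n]))**2 over the compared coefficients."""
    return sum((w * (a - b)) ** 2 for a, b, w in zip(c_test, c_ref, weights))

def best_codebook_entry(c_test, codebook, weights):
    """Index of the codebook cepstrum with the smallest weighted distortion."""
    return min(range(len(codebook)),
               key=lambda i: weighted_cepstral_distance(c_test, codebook[i], weights))

c_test = [1.0, 5.0]
codebook = [[1.0, 0.0], [0.0, 5.0]]
uniform = best_codebook_entry(c_test, codebook, [1.0, 1.0])   # picks entry 1
liftered = best_codebook_entry(c_test, codebook, [1.0, 0.0])  # picks entry 0
```

Suppressing a coefficient that mostly reflects an unmodelled factor (in the paper, the glottal source) changes which codebook entry wins the search.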

13.
In wireless commercial and military communication systems, where bandwidth is at a premium, robust low-bit-rate speech coders are essential. Such coders operate at fixed bit rates, which cannot be altered without major modifications to the vocoder design. A novel approach to reducing the bit rate required to transmit a speech signal is proposed. While traditional low-bit-rate vocoders code the original input speech, the proposed procedure operates on a time-scale-modified signal. The proposed method offers any bit rate from 2400 b/s downward without modifying the basic structure of the vocoder, the Mixed Excitation Linear Prediction (MELP) vocoder of the new NATO standard STANAG 4591. We consider transmitting MELP-encoded speech over noisy communication channels with different modulation techniques after time-scale compression is applied. Three time-scale modification algorithms were evaluated, and the waveform similarity overlap-add (WSOLA) algorithm was selected for time-scale modification. Computer simulation results for both the source and the channel are presented in terms of objective speech quality metrics and informal subjective listening tests. Design parameters such as codec complexity and delay are also investigated. The simulation results point to a possible wireless communication system whose performance might be enhanced by using the spare bits freed by the procedure.
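The bit-rate arithmetic behind the approach is simple: if speech is time-compressed to a fraction of its original duration before fixed-rate coding, and the decoder stretches it back, the bits spent per second of original speech scale by that fraction. The sketch ignores any side information the real scheme may need; the function names are illustrative.

```python
def effective_bit_rate(codec_rate_bps, compression_ratio):
    """Bits per second of original speech when the signal is time-compressed to
    compression_ratio (compressed duration / original duration) before coding."""
    if not 0.0 < compression_ratio <= 1.0:
        raise ValueError("compression_ratio must be in (0, 1]")
    return codec_rate_bps * compression_ratio

def required_compression(codec_rate_bps, target_rate_bps):
    """Compression ratio needed to reach a target rate with an unchanged codec."""
    return target_rate_bps / codec_rate_bps

rate_half = effective_bit_rate(2400, 0.5)       # 1200.0 b/s
ratio_1800 = required_compression(2400, 1800)   # 0.75
```

This is why any rate at or below 2400 b/s is reachable without touching the vocoder: only the compression ratio changes, at the cost of whatever quality the time-scale modification loses.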

14.
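This paper presents a very-low-bit-rate network video server built around the W9961CF encoder chip. The design both improves video coding efficiency and strengthens channel error resilience, so it can cope with the high-bit-error-rate communication networks found in mining areas. The system hardware design and workflow are described in detail, with emphasis on the implementation of the video encoding.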

15.
In this paper, a low-power, low-voltage speech processing system is presented. The system is intended for remote speech recognition applications in which feature extraction is performed on the terminal and high-complexity recognition tasks are moved to a remote server accessed through a radio link. The proposed system is based on a CMOS feature extraction chip for speech recognition that computes 15 cepstrum parameters every 8 ms and dissipates 30 μW at a 0.9-V supply, allowing single-cell battery operation. Processing relies on a novel feature extraction algorithm that uses 1-bit A/D conversion of the input speech signal. The chip has been implemented as a gate array in a standard 0.5-μm, three-metal CMOS technology. The average energy required to process a single word of the TI46 speech corpus is 10 μJ. The system achieves recognition rates above 98% in isolated-word speech recognition tasks.
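The figures quoted above can be related with a little arithmetic: 15 parameters every 8 ms is the feature stream sent to the server, and, assuming the 10 μJ per word is spent at the stated 30 μW dissipation, it corresponds to about a third of a second of processing per word on average. That interpretation is an assumption, not a figure from the paper.

```python
def parameters_per_second(num_params, frame_period_s):
    """Feature values produced per second at a fixed frame period."""
    return num_params / frame_period_s

def processing_time_s(energy_j, power_w):
    """Time a circuit dissipating power_w watts runs on energy_j joules."""
    return energy_j / power_w

rate = parameters_per_second(15, 8e-3)    # about 1875 values per second
t_word = processing_time_s(10e-6, 30e-6)  # about 0.33 s per word (assumed interpretation)
```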

16.
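The key to automatically distinguishing normal from pathological voices is to select effective feature parameters and optimize them into parameters that are simple and easy to implement, and to choose a suitable recognition model that yields the best recognition rate. To detect normal and pathological voices conveniently and in real time, this paper uses linear prediction cepstral coefficients (LPCC) and Mel-frequency cepstral coefficients (MFCC) as acoustic features and performs recognition with the dynamic time warping (DTW) algorithm. Experimental results show that the recognition rate of this model exceeds 90%, and that MFCC outperforms LPCC.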
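Template matching with DTW, as used here, can be sketched as follows. It is the classic formulation with symmetric steps and no slope or band constraint, using Euclidean frame distances; the paper's local constraints are not stated, and the one-dimensional "frames" in the example stand in for LPCC or MFCC vectors.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def dtw_distance(a, b, dist=euclidean):
    """Minimum cumulative frame distance over monotonic alignments of a and b,
    with steps (i-1, j), (i, j-1), (i-1, j-1)."""
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def classify(test, templates):
    """Label of the nearest template (templates: dict of label -> list of frames)."""
    return min(templates, key=lambda label: dtw_distance(test, templates[label]))

test = [[0.0], [1.0], [2.0]]
templates = {"normal": [[0.0], [0.0], [1.0], [2.0], [2.0]],
             "pathological": [[5.0], [5.0]]}
label = classify(test, templates)
```

The warping lets a test utterance match a template spoken at a different rate, which is why the "normal" template above matches with zero cost despite its extra frames.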

17.
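At present, speech recognition research is still largely confined to laboratory conditions, whereas real speech always coexists with noise and interference. Humans can correctly recognize speech at very low signal-to-noise ratios, and even with interfering sounds present, mainly by relying on binaural input. This paper imitates the auditory masking effect of the human ear to mask noise signals and proposes an improved MFCC (Mel-frequency cepstral coefficient) extraction algorithm. The algorithm better reduces the influence of noise on the clean speech signal and thereby improves the recognition rate. Experiments show that the improved algorithm achieves a relative performance improvement of about 4.43% to 8.42% over the conventional MFCC extraction algorithm.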

18.
Hybrid companding delta modulation (HCDM) is known to outperform other instantaneous or syllabic companding delta modulation systems [1]. To improve its performance or to reduce the bit rate further when coding speech, we propose using a variable-rate sampling scheme in the HCDM system. The proposed system employs several different sampling rates but transmits the output binary signal at a fixed rate by means of a buffer. With the variable-rate scheme, performance improves by 3 to 4 dB in signal-to-quantization-noise ratio (SQNR) over fixed-rate HCDM. A detailed algorithm and computer simulation results are presented, and buffer behavior and its control are also discussed. In addition, it is shown that the performance gain of a DM system with variable-rate sampling depends on the degree of variation of the input signal.
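For orientation, a plain adaptive delta modulator is sketched below, with a Jayant-style step rule (grow after repeated bits, shrink after a sign change). This is a swapped-in baseline: it is not HCDM's hybrid instantaneous/syllabic companding, and it does not model the variable-rate sampling or the buffer. The step parameters are arbitrary assumptions.

```python
import math

def _adapt(step, bit, prev_bit, k, step_min, step_max):
    """Grow the step by k after a repeated bit, shrink it by k after a change."""
    if prev_bit is None:
        return step
    step = step * k if bit == prev_bit else step / k
    return min(max(step, step_min), step_max)

def adm_encode(samples, step0=0.01, k=1.5, step_min=0.001, step_max=1.0):
    """One bit per sample; also returns the encoder's internal tracking estimate."""
    est, step, prev_bit = 0.0, step0, None
    bits, estimates = [], []
    for x in samples:
        bit = 1 if x >= est else 0
        step = _adapt(step, bit, prev_bit, k, step_min, step_max)
        est += step if bit else -step
        bits.append(bit)
        estimates.append(est)
        prev_bit = bit
    return bits, estimates

def adm_decode(bits, step0=0.01, k=1.5, step_min=0.001, step_max=1.0):
    """Mirror of the encoder's tracking loop, driven only by the received bits."""
    est, step, prev_bit = 0.0, step0, None
    out = []
    for bit in bits:
        step = _adapt(step, bit, prev_bit, k, step_min, step_max)
        est += step if bit else -step
        out.append(est)
        prev_bit = bit
    return out

def sqnr_db(signal, recon):
    sig = sum(s * s for s in signal)
    err = sum((s - r) ** 2 for s, r in zip(signal, recon))
    return 10.0 * math.log10(sig / err)

fs = 8000
x = [0.5 * math.sin(2 * math.pi * 50 * n / fs) for n in range(800)]
bits, enc_track = adm_encode(x)
y = adm_decode(bits)
quality = sqnr_db(x, y)
```

Because the decoder repeats the encoder's step adaptation from the bits alone, no step-size side information is transmitted; at this sampling rate the channel carries 8000 b/s.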

19.
The wavelet transform has been found to be an effective tool for the time-frequency analysis of non-stationary and quasi-stationary signals, and in recent years it has been used for feature extraction in speech recognition applications. In this paper, a sub-band feature extraction technique based on an admissible wavelet transform is proposed, and the features are modified to make them robust to additive white Gaussian noise. The performance of this system is compared with that of conventional mel-frequency cepstral coefficients (MFCC) at various signal-to-noise ratios. Recognition based on the eight sub-band features is found to outperform MFCC features under noisy conditions.
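Sub-band energies from a dyadic wavelet decomposition can be sketched as below. The orthonormal Haar wavelet is swapped in for the paper's admissible wavelet, and the noise-robust modification of the features is not reproduced; seven levels are used only because they yield eight sub-bands, matching the count mentioned above.

```python
import math

def haar_step(x):
    """One level of the orthonormal Haar DWT (len(x) must be even)."""
    s = 1.0 / math.sqrt(2.0)
    half = len(x) // 2
    approx = [(x[2 * i] + x[2 * i + 1]) * s for i in range(half)]
    detail = [(x[2 * i] - x[2 * i + 1]) * s for i in range(half)]
    return approx, detail

def subband_energies(frame, levels=7):
    """Energies of `levels` detail bands (finest first) plus the final
    approximation band, i.e. levels + 1 sub-bands."""
    if len(frame) % (2 ** levels) != 0:
        raise ValueError("frame length must be a multiple of 2**levels")
    energies = []
    approx = list(frame)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        energies.append(sum(c * c for c in detail))
    energies.append(sum(c * c for c in approx))
    return energies

def subband_features(frame, levels=7, floor=1e-10):
    """Log sub-band energies; the floor avoids log(0) for silent bands."""
    return [math.log(e + floor) for e in subband_energies(frame, levels)]

frame = [math.sin(0.3 * n) + 0.1 * ((n * 7) % 5) for n in range(256)]
energies = subband_energies(frame)
features = subband_features(frame)
```

Since the transform is orthonormal, the sub-band energies add up to the frame energy, so the features partition rather than distort the signal's power.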

20.
Variable-bit-rate (VBR) compressed video can exhibit significant multiple-time-scale bit-rate variability. In this paper we consider the transmission of stored video from a server to a client across a network and explore how the client buffer space can be used most effectively to reduce the variability of the transmitted bit rate. Two basic results are presented. First, we show how to achieve the greatest possible reduction in rate variability when sending stored video to a client with a given buffer size. We formally establish the optimality of our approach and illustrate its performance over a set of long MPEG-1 encoded video traces. Second, we evaluate the impact of optimal smoothing on the network resources needed for video transport under two network service models: deterministic guaranteed service (Chang 1994; Wrege et al. 1996) and renegotiated constant-bit-rate (RCBR) service (Grossglauser et al. 1997). Under both models, the impact of optimal smoothing is dramatic.
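Any smoothing schedule must stay between two cumulative curves: it must deliver each frame before it is played (no underflow) and must never send more than the client buffer can hold (no overflow). The sketch checks those constraints for a given schedule; it does not compute the optimal schedule. The slot timing is an assumption: data sent in slot t arrives before frame t is consumed at the end of that slot.

```python
from itertools import accumulate

def is_feasible(frame_sizes, schedule, buffer_size):
    """True if sending schedule[t] bytes in slot t never starves or overflows a
    client that plays frame t (frame_sizes[t] bytes) at the end of slot t."""
    if len(schedule) != len(frame_sizes):
        raise ValueError("one transmission amount per frame slot is expected")
    consumed = list(accumulate(frame_sizes))  # D(t): bytes played through frame t
    sent = list(accumulate(schedule))         # S(t): bytes delivered through slot t
    for t in range(len(frame_sizes)):
        already_played = consumed[t - 1] if t > 0 else 0
        if sent[t] < consumed[t]:
            return False  # frame t not fully received in time (underflow)
        if sent[t] - already_played > buffer_size:
            return False  # client buffer would overflow before frame t is played
    return True

frames = [10, 30, 10, 30]
unsmoothed = frames          # send each frame in its own slot: peak rate 30
smoothed = [20, 20, 20, 20]  # constant rate: peak rate 20
```

With a 30-byte buffer both schedules are feasible, but the smoothed one has a lower peak rate; shrinking the buffer to 25 bytes makes the constant-rate schedule infeasible, which is the trade-off between client buffer size and rate variability that the paper optimizes.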
