共查询到20条相似文献,搜索用时 31 毫秒
1.
Weerackody V. Reichl W. Potamianos A. 《Wireless Communications, IEEE Transactions on》2002,1(2):282-291
Future wireless multimedia terminals will have a variety of applications that require speech recognition capabilities. We consider a robust distributed speech recognition system where representative parameters of the speech signal are extracted at the wireless terminal and transmitted to a centralized automatic speech recognition (ASR) server. We propose two unequal error protection schemes for the ASR bit stream and demonstrate the satisfactory performance of these schemes for typical wireless cellular channels. In addition, a "soft-feature" error concealment strategy is introduced at the ASR server that uses "soft-outputs" from the channel decoder to compute the marginal distribution of only the reliable features during likelihood computation at the speech recognizer. This soft-feature error concealment technique reduces the ASR error rate by more than a factor of 2.5 for certain channels. Also considered is a channel decoding technique with source information that improves ASR performance 相似文献
2.
Lee L.-M. Chen J.-K. Wang H.-C. 《Vision, Image and Signal Processing, IEE Proceedings -》1994,141(6):397-402
The authors deal with the problem of automatic speech recognition in the presence of additive white noise. The effect of noise is modelled as an additive term to the power spectrum of the original clean speech. The cepstral coefficients of the noisy speech are then derived from this model. The reference cepstral vectors trained from clean speech are adapted to their appropriate noisy version to best fit the testing speech cepstral vector. The LPC coefficients, LPC derived cepstral coefficients, and the distance between test and reference, are all regarded as functions of the noise ratio (the spectral power ratio of noise to noisy speech). A gradient based algorithm is proposed to find the optimal noise ratio as well as the minimum distance between the test cepstral vector and the noise adapted reference. A recursive algorithm based on Levinson-Durbin recursion is proposed to simultaneously calculate the LPC coefficients and the derivatives of the LPC coefficients with respect to the noise ratio. The stability of the proposed adaptation algorithm is also addressed. Experiments on multispeaker (50 males and 50 females) isolated Mandarin digits recognition demonstrate remarkable performance improvements over noncompensated method under noisy environment. The results are also compared to the projection based approach, and experiments show that the proposed method is superior to the projection approach under a severe noisy environment 相似文献
3.
Ho‐Young Jung 《ETRI Journal》2004,26(3):273-276
We propose a novel feature processing technique which can provide a cepstral liftering effect in the log‐spectral domain. Cepstral liftering aims at the equalization of variance of cepstral coefficients for the distance‐based speech recognizer, and as a result, provides the robustness for additive noise and speaker variability. However, in the popular hidden Markov model based framework, cepstral liftering has no effect in recognition performance. We derive a filtering method in log‐spectral domain corresponding to the cepstral liftering. The proposed method performs a high‐pass filtering based on the decorrelation of filter‐bank energies. We show that in noisy speech recognition, the proposed method reduces the error rate by 52.7% to conventional feature. 相似文献
4.
A segment-based speech recognition scheme is proposed. The basic idea is to model explicitly the correlation among successive frames of speech signals by using features representing contours of spectral parameters. The speech signal of an utterance is regarded as a template formed by directly concatenating a sequence of acoustic segments. Each constituent acoustic segment is of variable length in nature and represented by a fixed dimensional feature vector formed by coefficients of discrete orthonormal polynomial expansions for approximating its spectral parameter contours. In the training, an automatic algorithm is proposed to generate several segment-based reference templates for each syllable class. In the testing, a frame-based dynamic programming procedure is employed to calculate the matching score of comparing the test utterance with each reference template. Performance of the proposed scheme was examined by simulations on multi-speaker speech recognition for 408 highly confusing isolated Mandarin base-syllables. A recognition rate of 81.1% was achieved for the case using 5-segment, 8-reference template models with cepstral and delta-cepstral coefficients as the recognition features. It is 4.5% higher than that of a well-modelled 12-state, 5-mixture CHMM method using cepstral, delta cepstral, and delta-delta cepstral coefficients 相似文献
5.
Different from traditional Fourier analysis, a signal can be decomposed into amplitude and frequency modulation components. The speech processing strategy in most modern cochlear implants only extracts and encodes amplitude modulation in a limited number of frequency bands. While amplitude modulation encoding has allowed cochlear implant users to achieve good speech recognition in quiet, their performance in noise is severely compromised. Here, we propose a novel speech processing strategy that encodes both amplitude and frequency modulations in order to improve cochlear implant performance in noise. By removing the center frequency from the subband signals and additionally limiting the frequency modulation's range and rate, the present strategy transforms the fast-varying temporal fine structure into a slowly varying frequency modulation signal. As a first step, we evaluated the potential contribution of additional frequency modulation to speech recognition in noise via acoustic simulations of the cochlear implant. We found that while amplitude modulation from a limited number of spectral bands is sufficient to support speech recognition in quiet, frequency modulation is needed to support speech recognition in noise. In particular, improvement by as much as 71 percentage points was observed for sentence recognition in the presence of a competing voice. The present result strongly suggests that frequency modulation be extracted and encoded to improve cochlear implant performance in realistic listening situations. We have proposed several implementation methods to stimulate further investigation. Index Terms-Amplitude modulation, cochlear implant, fine structure, frequency modulation, signal processing, speech recognition, temporal envelope. 相似文献
6.
作为说话人识别特征参量的MFCC的提取过程 总被引:5,自引:0,他引:5
说话人识别是人的个体特征识别中的一个重要分支,在实际生活中已得到广泛应用。而人的听觉系统是一个比较理想的说话人识别系统,MFCC(Mel倒谱系数)模拟了人的听觉特性,是符合人听觉特性的语音特征参量,在实际应用中取得了较高的识别率。文中通过一个卷积同态系统简单介绍了语音信号的倒谱分析方法,并通过对Mel频率刻度得到符合人听觉特性的Mel频率等效滤波器组,最后介绍了MFCC求取的一般过程和算法。 相似文献
7.
本文提出了一种改进的倒谱域特征参数补偿算法GMCSM。根据语音信号的时变特性,GMCSM算法使用广义自回归条件异方差(Generalized Auto-Regressive Conditional Heteroscedasticity,GARCH)模型对语音信号的方差进行建模。实验数据表明,与常规倒谱相减法CSM和MEMCSM相比,GMCSM能够更有效地补偿因加性噪声引起的倒谱特征参数失真,减少识别的错误率,特别是在信噪比较低的情况下,GMCSM的性能更为显著。 相似文献
8.
The state parameters of the hidden Markov model are represented by the autocorrelation coefficients of a context window that can be adaptively transformed to cepstral and delta cepstral coefficients according to the environmental noise. Experimental results show that it can significantly improve the speech recognition rate under noisy environments 相似文献
9.
B. Yegnanarayana 《Signal processing》1981,3(1):5-17
A new method for representation of speech spectra based on a pole-zero decomposition technique is proposed in this paper. In this method the parameters of a pole-zero model for the smoothed short-time spectrum of speech are determined by adopting a cepstral matching criterion. The cepstral coefficients of the impulse response of the model are equal to the cepstral coefficients of the signal up to a specified number which determine the order of the model system. This is analogous to autocorrelation matching in linear prediction analysis. It is shown that the model spectrum represents both peaks and valleys of the smoothed spectrum equally well, unlike the all pole model of linear prediction analysis where only the peaks are well represented. The pole and zero parameters are derived in an identical manner by approximately deconvolving the pole and zero contributions in the cepstral domain. The residual from the inverse pole-zero system can be used to obtain information about the excitation signal. 相似文献
10.
11.
Besacier L. Mayorga P. Bonastre J.-F. Fredouille C. Meignier S. 《Vision, Image and Signal Processing, IEE Proceedings -》2003,150(6):372-376
An overview is presented of compression and packet loss effects in speech biometrics. These new problems appear particularly in recent applications of biometrics over mobile or Internet networks. The influence of speech compression on speaker recognition performance in mobile networks is investigated. In a first experiment, it is found that the use of GSM coding degrades the performance. In a second experiment, the features for the speaker recognition system are calculated directly from the information available in the encoded bit stream. It is found that a low LPC order in GSM coding is responsible for most performance degradations. A speaker recognition system was obtained which is equivalent in performance to the original one which decodes and reanalyses speech before performing recognition. The joint packet loss and compression effects over IP networks are also studied. It is experimentally demonstrated that the adverse effects of packet loss alone are negligible, while the encoding of speech, particularly at a low bit rate, coupled with packet loss, can reduce the verification accuracy considerably. 相似文献
12.
It is noted that of great importance to the success of the articulatory approach to speech coding is the use of a good distortion measure between a given speech signal and the entries in a stored codebook of impulse responses and corresponding vocal-track shapes (articulatory codebook). One promising distortion measure is the weighted cepstral distortion. Since the impulse responses in the articulatory codebook do not include glottal characteristics, the authors derive optimal weighting functions (cepstral lifters) to reduce the influence of a varying glottal source on the cepstral distortion measure. This is done by examining the ensemble of cepstral coefficients of speech produced by an articulatory speech synthesizer that also includes a vocal-cord model. The obtained cepstral lifters are optimal for the given ensemble of cepstral coefficients and for given constraints on the weighting function. They are different for cepstral coefficients derived from the power spectrum (FFT cepstra) and for those derived from LPC (linear predictive coding) coefficients (LPC cepstra). The performances of the obtained cepstral lifters are compared in an articulatory codebook search 相似文献
13.
In wireless commercial and military communications systems, where bandwidth is at a premium, robust low-bit-rate speech coders are essential. They operate at fix bit rates and those bit rates cannot be altered without major modifications in the vocoder design. A novel approach to vocoders, in order to reduce the bit rate required to transmit speech signal, is proposed. While traditional low-bit-rate vocoders code original input speech, the proposed procedure operates on the time-scale modified signal. The proposed method offers any bit rate from 2400 b/s to downwards without modifying the principle vocoder structure, which is the new NATO standard, Stanag 4591, Mixed Excitation Linear Prediction (MELP) vocoder. We consider the application of transmitting MELP-encoded speech over noisy communication channels by applying different modulation techniques, after time-scale compression is applied. Three different time-scale modification algorithms have been evaluated and waveform similarity overlap and add (WSOLA) algorithm has been selected for time-scale modification purposes. Computer simulation results, both source and channel, are presented in terms of objective speech quality metrics and informal subjective listening tests. Design parameters such as codec complexity and delay are also investigated. Simulation results lead to a possible wireless communications system, whose performance might be enhanced by using the spared bits offered by the procedure. 相似文献
14.
15.
Borgatti M. Felici M. Ferrari A. Guerrieri R. 《Solid-State Circuits, IEEE Journal of》1998,33(7):1082-1089
In this paper, a low-power, low-voltage speech processing system is presented. The system is intended to he used in remote speech recognition applications where feature extraction is performed on terminal and high-complexity recognition tasks and moved to a remote server accessed through a radio link. The proposed system is based on a CMOS feature extraction chip for speech recognition that computes 15 cepstrum parameters, each 8 ms, and dissipates 30 μW at 0.9-V supply. Single-cell battery operation is achieved. Processing relies on a novel feature extraction algorithm using 1-bit A/D conversion of the input speech signal. The chip has been implemented as a gate array in a standard 0.5-μm, three-metal CMOS technology. The average energy required to process a single word of the TI46 speech corpus is 10 μJ. It achieves recognition rates over 98% in isolated-word speech recognition tasks 相似文献
16.
17.
18.
Chong Un Dong Cho 《Communications, IEEE Transactions on》1982,30(4):593-599
Hybrid companding delta modulation (HCDM) is known to be superior in performance to other instantaneous or syllabic companding delta modulation systems [1]. To improve its performance or to reduce the bit rate further in coding speech, we propose to use a variable-rate sampling scheme in the HCDM system. The proposed system employs several different sampling rates but transmits the output binary signal at a fixed rate using a buffer. By using the variable-rate scheme, one can improve its performance by 3 to 4 dB in signal-to-quantization noise ratio (SQNR) over the fixedrate HCDM. Detailed algorithm and computer simulation results are presented. Buffer behavior and its control are also discussed. In addition, it is shown that the performance gain of a DM system with variable-rate sampling depends on the degree of variation of the input signal. 相似文献
19.
Wavelet transform has been found to be an effective tool for the time-frequency analysis of non-stationary and quasi-stationary signals. Recent years have seen wavelet transform being used for feature extraction in speech recognition applications. In the paper a sub-band feature extraction technique based on an admissible wavelet transform is proposed and the features are modified to make them robust to additive white Gaussian noise. The performance of this system is compared with the conventional mel frequency cepstral coefficients (MFCC) under various signal to noise ratios. The recognition performance based on the eight sub-band features is found to be superior under the noisy conditions compared with MFCC features. 相似文献
20.
Salehi J.D. Zhi-Li Zhang Kurose J. Towsley D. 《Networking, IEEE/ACM Transactions on》1998,6(4):397-410
Variable-bit-rate (VBR) compressed video can exhibit significant multiple-time-scale bit-rate variability. In this paper we consider the transmission of stored video from a server to a client across a network, and explore how the client buffer space can be used most effectively toward reducing the variability of the transmitted bit rate. Two basic results are presented. First, we show how to achieve the greatest possible reduction in rate variability when sending stored video to a client with given buffer size. We formally establish the optimality of our approach and illustrate its performance over a set of long MPEG-1 encoded video traces. Second, we evaluate the impact of optimal smoothing on the network resources needed for video transport, under two network service models: deterministic guaranteed service (Chang 1994; Wrege et al. 1996) and renegotiated constant-bit-rate (RCBR) service (Grossglauser et al. 1997). Under both models, the impact of optimal smoothing is dramatic 相似文献