1.

An error-protected speech recognition system for wireless communications

Weerackody V. Reichl W. Potamianos A. 《Wireless Communications, IEEE Transactions on》2002,1(2):282-291

Future wireless multimedia terminals will have a variety of applications that require speech recognition capabilities. We consider a robust distributed speech recognition system in which representative parameters of the speech signal are extracted at the wireless terminal and transmitted to a centralized automatic speech recognition (ASR) server. We propose two unequal error protection schemes for the ASR bit stream and demonstrate their satisfactory performance on typical wireless cellular channels. In addition, we introduce a "soft-feature" error concealment strategy at the ASR server. It uses "soft outputs" from the channel decoder so that, during likelihood computation at the speech recognizer, only the marginal distribution of the reliable features is evaluated. This soft-feature error concealment technique reduces the ASR error rate by more than a factor of 2.5 for certain channels. We also consider a channel decoding technique that uses source information to improve ASR performance.
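The marginalization step can be illustrated for a single diagonal-covariance Gaussian, where integrating out a dimension simply removes its term from the log-likelihood. This is a sketch under stated assumptions. The per-feature "confidence" values and the `threshold` rule in the hypothetical `reliability_mask` helper stand in for the decoder soft outputs; the abstract does not specify how the paper maps those outputs to feature reliability.

```python
import math

def diag_gauss_loglik(x, mean, var, reliable=None):
    """Log-likelihood of feature vector x under a diagonal-covariance Gaussian.

    Dimensions flagged unreliable are marginalized out. For a diagonal
    Gaussian the marginal over the remaining dimensions is the product of
    their 1-D densities, so unreliable terms are simply skipped.
    """
    total = 0.0
    for i, (xi, mi, vi) in enumerate(zip(x, mean, var)):
        if reliable is not None and not reliable[i]:
            continue  # integrated out: contributes log(1) = 0
        total += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total

def reliability_mask(confidences, threshold=0.9):
    """Hypothetical rule: a feature counts as reliable when the channel decoder's
    soft output (probability its bits are correct) is at least `threshold`."""
    return [p >= threshold for p in confidences]

# Example: the second feature was corrupted by a channel error.
x, mean, var = [0.1, 5.0, -0.2], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]
mask = reliability_mask([0.99, 0.30, 0.95])
full = diag_gauss_loglik(x, mean, var)
marginal = diag_gauss_loglik(x, mean, var, mask)
```

Dropping the corrupted dimension keeps one bad feature from dominating the score. In an HMM recognizer, the same rule would be applied inside every state's mixture components.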

2.

Nonlinear cepstral equalisation method for noisy speech recognition

Lee L.-M. Chen J.-K. Wang H.-C. 《Vision, Image and Signal Processing, IEE Proceedings -》1994,141(6):397-402

The authors address automatic speech recognition in the presence of additive white noise. The effect of the noise is modelled as an additive term in the power spectrum of the original clean speech, and the cepstral coefficients of the noisy speech are derived from this model. The reference cepstral vectors trained on clean speech are adapted to their appropriate noisy versions to best fit the test speech cepstral vector. The LPC coefficients, the LPC-derived cepstral coefficients, and the distance between test and reference are all treated as functions of the noise ratio (the spectral power ratio of noise to noisy speech). A gradient-based algorithm is proposed to find the optimal noise ratio together with the minimum distance between the test cepstral vector and the noise-adapted reference. A recursive algorithm based on the Levinson-Durbin recursion is proposed to compute the LPC coefficients and their derivatives with respect to the noise ratio simultaneously. The stability of the proposed adaptation algorithm is also addressed. Experiments on multispeaker (50 male and 50 female) isolated Mandarin digit recognition show remarkable performance improvements over the uncompensated method in noisy environments. The results are also compared with the projection-based approach, and experiments show that the proposed method outperforms the projection approach in severely noisy environments.
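Two pieces of this method can be sketched concretely: adapting a clean reference autocorrelation to a given noise ratio, and the Levinson-Durbin recursion that turns an autocorrelation into LPC coefficients. Two assumptions apply. First, the noise is white, so its flat power spectrum adds only to the lag-0 autocorrelation: $R_y(k) = R_x(k) + \sigma^2\delta(k)$. Second, the noise ratio is $\lambda = \sigma^2 / R_y(0)$. Normalizing by $R_y(0)$ then gives $r_y(k) = (1-\lambda)\,r_x(k)$ for $k \ge 1$. The paper's simultaneous computation of the derivatives with respect to $\lambda$ is not reproduced here.

```python
def adapt_autocorr(r_clean, noise_ratio):
    """Noise-adapt a normalized clean autocorrelation (r_clean[0] == 1).

    Assumes additive white noise (adds to lag 0 only) and
    noise_ratio = noise power / noisy-speech power; after renormalizing,
    every lag k >= 1 is scaled by (1 - noise_ratio).
    """
    return [1.0] + [(1.0 - noise_ratio) * rk for rk in r_clean[1:]]

def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation r[0..order].

    Returns (a, err, refl): a[0..order] with a[0] = 1, so that
    A(z) = 1 + sum_k a[k] z^-k; the final prediction-error power; and the
    reflection coefficients. All |refl| < 1 means 1/A(z) is stable.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    refl = []
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        err *= 1.0 - k * k
        refl.append(k)
    return a, err, refl

# A clean AR(1)-like reference and its noise-adapted version.
r_clean = [1.0, 0.5, 0.25]
a_clean, err_clean, _ = levinson_durbin(r_clean, 2)
a_noisy, err_noisy, refl_noisy = levinson_durbin(adapt_autocorr(r_clean, 0.2), 2)
```

As the noise ratio grows, the adapted autocorrelation flattens and the LPC model moves toward a flatter spectrum. This is the effect the gradient search in the paper tunes when it matches the reference to the test vector.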

3.

Filtering of Filter‐Bank Energies for Robust Speech Recognition

Ho‐Young Jung 《ETRI Journal》2004,26(3):273-276

We propose a novel feature processing technique that provides a cepstral liftering effect in the log-spectral domain. Cepstral liftering aims to equalize the variance of the cepstral coefficients for distance-based speech recognizers and, as a result, provides robustness to additive noise and speaker variability. However, in the popular hidden Markov model (HMM) based framework, cepstral liftering has no effect on recognition performance. We derive a filtering method in the log-spectral domain that corresponds to cepstral liftering. The proposed method performs high-pass filtering based on the decorrelation of filter-bank energies. We show that in noisy speech recognition, the proposed method reduces the error rate by 52.7% compared with conventional features.
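The reason liftering has no effect in an HMM framework can be checked numerically. A lifter only rescales each cepstral dimension, and a diagonal-covariance Gaussian trained on liftered data absorbs that scale into its means and variances. The normalized distance is therefore unchanged, apart from a log-determinant offset that is the same for every model. A plain Euclidean distance, in contrast, does change. The sketch uses the raised-sine lifter $w_n = 1 + \frac{L}{2}\sin(\pi n / L)$ popularized by HTK, with $L = 22$ assumed. It is not the log-spectral filter proposed in the paper.

```python
import math

def raised_sine_lifter(num_ceps, L=22):
    """Raised-sine lifter weights w_n = 1 + (L/2) sin(pi n / L) for n = 1..num_ceps."""
    return [1.0 + (L / 2.0) * math.sin(math.pi * n / L) for n in range(1, num_ceps + 1)]

def lifter(c, w):
    """Apply per-dimension lifter weights to a cepstral vector c_1..c_N."""
    return [ci * wi for ci, wi in zip(c, w)]

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def diag_mahalanobis(x, mean, var):
    """Variance-normalized distance used inside a diagonal-Gaussian HMM state."""
    return math.sqrt(sum((a - m) ** 2 / v for a, m, v in zip(x, mean, var)))

c = [1.0, -0.5, 0.25, 0.1]       # test cepstrum c_1..c_4 (illustrative values)
ref = [0.8, -0.3, 0.2, 0.0]      # model mean
var = [1.0, 0.5, 0.2, 0.1]       # model variances
w = raised_sine_lifter(len(c))

# A model retrained on liftered data has mean w*ref and variance w^2*var.
d_plain = diag_mahalanobis(c, ref, var)
d_lift = diag_mahalanobis(lifter(c, w), lifter(ref, w), [v * wi * wi for v, wi in zip(var, w)])
e_plain = euclidean(c, ref)
e_lift = euclidean(lifter(c, w), lifter(ref, w))
```

Here `d_plain` and `d_lift` agree while `e_plain` and `e_lift` differ. This is why the paper moves the liftering effect into a log-spectral filtering step instead.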

4.

Isolated Mandarin syllable recognition using segmental features

Chang S. Chen S.-H. 《Vision, Image and Signal Processing, IEE Proceedings -》1995,142(1):59-64

A segment-based speech recognition scheme is proposed. The basic idea is to model the correlation among successive frames of the speech signal explicitly by using features that represent the contours of spectral parameters. The speech signal of an utterance is regarded as a template formed by directly concatenating a sequence of acoustic segments. Each acoustic segment has a variable length and is represented by a fixed-dimensional feature vector formed by the coefficients of discrete orthonormal polynomial expansions that approximate its spectral parameter contours. For training, an automatic algorithm is proposed to generate several segment-based reference templates for each syllable class. For testing, a frame-based dynamic programming procedure computes the matching score between the test utterance and each reference template. The performance of the proposed scheme was examined by simulations of multi-speaker recognition of 408 highly confusable isolated Mandarin base syllables. A recognition rate of 81.1% was achieved using 5-segment, 8-reference-template models with cepstral and delta-cepstral coefficients as the recognition features. This is 4.5% higher than that of a well-modelled 12-state, 5-mixture CHMM method using cepstral, delta-cepstral, and delta-delta-cepstral coefficients.
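The key step, turning a variable-length spectral-parameter contour into a fixed-dimensional vector, can be sketched as a projection onto discrete orthonormal polynomials. In this sketch the basis is built by Gram-Schmidt on $1, t, t^2, \dots$ over equally spaced samples. The average inner product ($\frac{1}{N}\sum$) and the polynomial degree of 3 are assumptions, since the abstract does not give the paper's exact normalization or order.

```python
import math

def _inner(u, v):
    # Average inner product, so a constant contour's first coefficient equals its mean.
    return sum(a * b for a, b in zip(u, v)) / len(u)

def orthonormal_poly_basis(n_points, degree):
    """Discrete orthonormal polynomials on n_points equally spaced samples
    (t normalized to [0, 1]), built by modified Gram-Schmidt."""
    if n_points <= degree:
        raise ValueError("need more samples than the polynomial degree")
    t = [i / (n_points - 1) if n_points > 1 else 0.0 for i in range(n_points)]
    basis = []
    for d in range(degree + 1):
        v = [ti ** d for ti in t]
        for b in basis:
            proj = _inner(v, b)
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(_inner(v, v))
        basis.append([vi / norm for vi in v])
    return basis

def segment_feature(contour, degree=3):
    """Fixed-dimension feature of a variable-length contour: its expansion coefficients."""
    basis = orthonormal_poly_basis(len(contour), degree)
    return [_inner(contour, phi) for phi in basis]

flat = segment_feature([5.0] * 10)                      # 10-frame segment
ramp = segment_feature([2.0 * i for i in range(10)])    # linear contour
short = segment_feature([0.1, 0.4, 0.2, 0.3, 0.5])      # 5-frame segment, same dimension
```

Segments of 10 and 5 frames both yield 4 coefficients, which is what allows them to be concatenated into a template and compared by dynamic programming.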

5.

Encoding frequency modulation to improve cochlear implant performance in noise (Total citations: 10; self-citations: 0; citations by others: 10)

Nie K Stickney G Zeng FG 《IEEE transactions on bio-medical engineering》2005,52(1):64-73

As an alternative to traditional Fourier analysis, a signal can be decomposed into amplitude modulation and frequency modulation components. The speech processing strategy in most modern cochlear implants extracts and encodes only amplitude modulation in a limited number of frequency bands. While amplitude modulation encoding has allowed cochlear implant users to achieve good speech recognition in quiet, their performance in noise is severely compromised. Here, we propose a novel speech processing strategy that encodes both amplitude and frequency modulations in order to improve cochlear implant performance in noise. By removing the center frequency from the subband signals and additionally limiting the range and rate of the frequency modulation, the strategy transforms the fast-varying temporal fine structure into a slowly varying frequency modulation signal. As a first step, we evaluated the potential contribution of additional frequency modulation to speech recognition in noise through acoustic simulations of the cochlear implant. We found that while amplitude modulation from a limited number of spectral bands is sufficient to support speech recognition in quiet, frequency modulation is needed to support speech recognition in noise. In particular, improvements of as much as 71 percentage points were observed for sentence recognition in the presence of a competing voice. The present result strongly suggests that frequency modulation be extracted and encoded to improve cochlear implant performance in realistic listening situations. We have proposed several implementation methods to stimulate further investigation. Index Terms: amplitude modulation, cochlear implant, fine structure, frequency modulation, signal processing, speech recognition, temporal envelope.
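The AM/FM split of one subband can be sketched with the analytic signal. AM is the Hilbert envelope. FM is the instantaneous frequency minus the band's center frequency, clipped to a limited range. Several choices here are illustrative assumptions, not the paper's implementation: the analytic signal is computed with a direct O(N^2) DFT for self-containment, the limiting is a simple clip, and `fm_range_hz` is a hypothetical parameter. The paper also limits the FM rate, for example by low-pass filtering, and that step is omitted here.

```python
import cmath
import math

def analytic_signal(x):
    """Analytic signal via a direct DFT: keep DC (and Nyquist for even N),
    double the positive frequencies, zero the negative ones."""
    n = len(x)
    X = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]
    h = [0.0] * n
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
    for k in range(1, (n + 1) // 2):
        h[k] = 2.0
    return [sum(X[k] * h[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def am_fm(subband, fs, center_hz, fm_range_hz):
    """Return (AM envelope, FM relative to center_hz clipped to +/- fm_range_hz)."""
    z = analytic_signal(subband)
    am = [abs(v) for v in z]
    fm = []
    for a, b in zip(z[:-1], z[1:]):
        dphi = cmath.phase(b * a.conjugate())        # phase step in (-pi, pi]
        inst_hz = dphi * fs / (2 * math.pi)          # instantaneous frequency
        fm.append(max(-fm_range_hz, min(fm_range_hz, inst_hz - center_hz)))
    return am, fm

# A 50 Hz tone in a band centred at 40 Hz: AM stays at 1, FM sits at +10 Hz.
fs = 1000
tone = [math.cos(2 * math.pi * 50 * t / fs) for t in range(200)]
am, fm = am_fm(tone, fs, center_hz=40, fm_range_hz=500)
```

Subtracting the center frequency is what turns a fast carrier into a slow FM track that an implant could plausibly encode.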

6.

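The extraction process of MFCC as feature parameters for speaker recognition (Total citations: 5; self-citations: 0; citations by others: 5)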

丁爱明 《电子工程师》2006,32(1):51-53

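Speaker recognition is an important branch of recognition based on individual human characteristics and is already widely used in everyday life. The human auditory system is a nearly ideal speaker recognition system. MFCCs (Mel-frequency cepstral coefficients) simulate human auditory characteristics: they are speech feature parameters consistent with human hearing, and they have achieved high recognition rates in practical applications. This paper briefly introduces the cepstral analysis of speech signals by means of a convolutional homomorphic system. It then derives, from the Mel frequency scale, a Mel-frequency equivalent filter bank consistent with human hearing. Finally, it presents the general procedure and algorithm for computing MFCCs.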
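The general MFCC procedure can be sketched in pure Python. The steps are pre-emphasis, framing, Hamming window, power spectrum, triangular filters spaced on the Mel scale $m = 2595\log_{10}(1 + f/700)$, logarithm, and DCT-II. The parameter values (26 filters, 13 coefficients, 25 ms frames with a 10 ms hop, pre-emphasis 0.97, 512-point spectrum) are common defaults assumed for illustration, not values taken from the article. The spectrum uses a direct DFT for self-containment; a practical implementation would use an FFT.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters equally spaced on the Mel scale over bins 0..n_fft//2."""
    lo, hi = hz_to_mel(0.0), hz_to_mel(fs / 2.0)
    mels = [lo + (hi - lo) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / fs)) for m in mels]
    fbank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):            # rising edge
            row[k] = (k - left) / (center - left)
        for k in range(center, right):           # falling edge
            row[k] = (right - k) / (right - center)
        fbank.append(row)
    return fbank

def mfcc_frame(frame, fs, n_filters=26, n_ceps=13, n_fft=512):
    """One frame: Hamming window -> |DFT|^2 -> Mel filterbank -> log -> DCT-II."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, s in enumerate(frame)]
    power = []
    for k in range(n_fft // 2 + 1):              # zero-padded DFT, non-negative bins
        X = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n_fft) for t in range(n))
        power.append(abs(X) ** 2 / n_fft)
    energies = [max(sum(w * p for w, p in zip(row, power)), 1e-10)
                for row in mel_filterbank(n_filters, n_fft, fs)]
    log_e = [math.log(e) for e in energies]
    return [sum(log_e[j] * math.cos(math.pi * q * (j + 0.5) / n_filters) for j in range(n_filters))
            for q in range(n_ceps)]

def mfcc(signal, fs, frame_ms=25, hop_ms=10, preemph=0.97, **kw):
    """Pre-emphasize, split into overlapping frames, and compute MFCCs per frame."""
    x = [signal[0]] + [signal[i] - preemph * signal[i - 1] for i in range(1, len(signal))]
    flen, hop = int(fs * frame_ms / 1000), int(fs * hop_ms / 1000)
    return [mfcc_frame(x[s:s + flen], fs, **kw) for s in range(0, len(x) - flen + 1, hop)]

sig = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(400)]   # 50 ms test tone
feats = mfcc(sig, 8000)
```

The Mel warping is the part that mirrors human hearing: the filters are narrow at low frequencies and widen toward high frequencies.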

7.

An improved cepstral-domain feature compensation algorithm based on the GARCH model

简志华 杨震 《信号处理》2007,23(3):383-387

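This paper proposes GMCSM, an improved cepstral-domain feature compensation algorithm. Because speech is time-varying, GMCSM models the variance of the speech signal with a generalized autoregressive conditional heteroscedasticity (GARCH) model. Experimental results show that, compared with conventional cepstral subtraction (CSM) and MEMCSM, GMCSM compensates more effectively for the cepstral feature distortion caused by additive noise and reduces the recognition error rate. Its advantage is most pronounced at low signal-to-noise ratios.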
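The GARCH ingredient can be shown in isolation. A GARCH(1,1) model makes the variance at each frame depend on the previous squared innovation and the previous variance: $\sigma_t^2 = \omega + \alpha\,\varepsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2$. The order (1,1), the parameter values, and the initialization at the unconditional variance are assumptions for illustration. The abstract does not say how GMCSM feeds these variances into the cepstral compensation, so only the variance recursion is sketched.

```python
def garch11_variance(eps, omega, alpha, beta, sigma2_0=None):
    """Conditional variances of a GARCH(1,1) process.

    out[t] is the variance of eps[t] given the past:
        out[t] = omega + alpha * eps[t-1]**2 + beta * out[t-1]
    out[0] defaults to the unconditional variance omega / (1 - alpha - beta).
    """
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    s2 = omega / (1.0 - alpha - beta) if sigma2_0 is None else sigma2_0
    out = [s2]
    for e in eps[:-1]:
        s2 = omega + alpha * e * e + beta * s2
        out.append(s2)
    return out

calm = garch11_variance([0.0, 0.0, 0.0], omega=0.1, alpha=0.2, beta=0.7)
shock = garch11_variance([2.0, 0.0], omega=0.1, alpha=0.2, beta=0.7)
```

A large innovation raises the next variance, which decays back geometrically. This is how a GARCH model tracks the time-varying power of speech.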

8.

Representation of hidden Markov model for noise adaptive speech recognition

Lee L.-M. Wang H.-C. 《Electronics letters》1995,31(8):616-617

The state parameters of the hidden Markov model are represented by the autocorrelation coefficients of a context window, which can be adaptively transformed into cepstral and delta-cepstral coefficients according to the environmental noise. Experimental results show that this representation significantly improves the speech recognition rate in noisy environments.
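Two standard conversions are involved in going from an LPC model to cepstral and delta-cepstral features. The first is the LPC-to-cepstrum recursion $c_n = -a_n - \sum_{k=1}^{n-1} \frac{k}{n} c_k a_{n-k}$ for $A(z) = 1 + \sum_k a_k z^{-k}$. The second is the regression formula for deltas over $\pm N$ frames. This sketch takes the LPC coefficients as input, for example from a Levinson-Durbin step on the autocorrelation. The paper's exact transformation from the context-window autocorrelation is not given in the abstract, and `N = 2` with edge repetition is an assumption.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c[1..n_ceps] of the all-pole model 1/A(z), A(z) = 1 + sum_k a[k] z^-k.

    Index 0 of the returned list is unused (0.0) so c[n] matches the formula;
    a_n is taken as 0 for n beyond the model order.
    """
    p = len(a) - 1
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        an = a[n] if n <= p else 0.0
        acc = sum((k / n) * c[k] * a[n - k] for k in range(1, n) if n - k <= p)
        c[n] = -an - acc
    return c

def delta(frames, N=2):
    """Regression deltas d_t = sum_n n (c_{t+n} - c_{t-n}) / (2 sum_n n^2),
    with frame indices clamped at the edges."""
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        d = []
        for i in range(len(frames[t])):
            num = sum(n * (frames[min(t + n, T - 1)][i] - frames[max(t - n, 0)][i])
                      for n in range(1, N + 1))
            d.append(num / denom)
        out.append(d)
    return out

c = lpc_to_cepstrum([1.0, -0.5], 3)            # single pole at z = 0.5
d = delta([[0.0], [1.0], [2.0], [3.0], [4.0]])  # linearly rising coefficient
```

For a single pole at 0.5 the recursion reproduces the closed form $c_n = 0.5^n / n$, which is a quick sanity check on the sign convention.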

9.

Speech analysis by pole-zero decomposition of short-time spectra

B. Yegnanarayana 《Signal processing》1981,3(1):5-17

This paper proposes a new method for representing speech spectra based on a pole-zero decomposition technique. In this method, the parameters of a pole-zero model of the smoothed short-time speech spectrum are determined with a cepstral matching criterion: the cepstral coefficients of the model's impulse response are made equal to the cepstral coefficients of the signal up to a specified number, which determines the order of the model system. This is analogous to autocorrelation matching in linear prediction analysis. It is shown that the model spectrum represents both the peaks and the valleys of the smoothed spectrum equally well, unlike the all-pole model of linear prediction analysis, which represents only the peaks well. The pole and zero parameters are derived in an identical manner by approximately deconvolving the pole and zero contributions in the cepstral domain. The residual from the inverse pole-zero system can be used to obtain information about the excitation signal.
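The forward relation that cepstral matching inverts can be written in closed form. Consider a minimum-phase model $H(z) = \prod_i (1 - z_i z^{-1}) / \prod_i (1 - p_i z^{-1})$ with all roots inside the unit circle. Its cepstrum is $c_n = \frac{1}{n}\left(\sum_i p_i^n - \sum_i z_i^n\right)$ for $n \ge 1$. Poles and zeros therefore enter additively with opposite signs, which is what makes the cepstral domain convenient for separating them. This sketch computes only the model cepstrum from given roots. The paper's estimation procedure is not reproduced.

```python
def pole_zero_cepstrum(poles, zeros, n_ceps):
    """Cepstrum c[0..n_ceps] of the minimum-phase pole-zero model
    H(z) = prod(1 - z_i z^-1) / prod(1 - p_i z^-1), |p_i|, |z_i| < 1.

    c[0] (the log-gain term) is set to 0.0; complex roots must come in
    conjugate pairs so that the imaginary parts cancel.
    """
    c = [0.0]
    for n in range(1, n_ceps + 1):
        s = sum(p ** n for p in poles) - sum(z ** n for z in zeros)
        c.append(complex(s).real / n)
    return c

single_pole = pole_zero_cepstrum([0.5], [], 3)
cancelled = pole_zero_cepstrum([0.5], [0.5], 5)
resonance = pole_zero_cepstrum([0.5 + 0.5j, 0.5 - 0.5j], [], 2)
```

Matching the first $M$ of these coefficients to the signal's cepstrum fixes $M$ unknowns. This is why the number of matched coefficients sets the model order.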

10.

Robust speech recognition technology: current status and prospects (Total citations: 12; self-citations: 0; citations by others: 12)

姚文冰 姚天任 韩涛 《信号处理》2001,17(6):484-493

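After briefly describing the background of robust speech recognition technology, this paper focuses on the main techniques, the current state of research in China and abroad, and the future directions of robust speech recognition. It first outlines the interference sources that degrade speech quality and reduce the robustness of speech recognition systems, together with their effects. It then describes, one by one, the technical approaches and current state of speech enhancement, robust feature extraction, feature- and model-based compensation, microphone arrays, auditory processing modelled on the human ear, and bimodal audio-visual speech recognition. Finally, it discusses the future directions of robust speech recognition technology.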

11.

Overview of compression and packet loss effects in speech biometrics (Total citations: 2; self-citations: 0; citations by others: 2)

Besacier L. Mayorga P. Bonastre J.-F. Fredouille C. Meignier S. 《Vision, Image and Signal Processing, IEE Proceedings -》2003,150(6):372-376

An overview is presented of compression and packet loss effects in speech biometrics. These new problems arise particularly in recent applications of biometrics over mobile or Internet networks. The influence of speech compression on speaker recognition performance in mobile networks is investigated. A first experiment shows that GSM coding degrades performance. In a second experiment, the features for the speaker recognition system are calculated directly from the information available in the encoded bit stream. It is found that the low LPC order used in GSM coding is responsible for most of the performance degradation. The resulting speaker recognition system performs as well as the original system, which decodes and reanalyses the speech before performing recognition. The joint effects of packet loss and compression over IP networks are also studied. Experiments show that packet loss alone has negligible adverse effects, whereas speech encoding, particularly at a low bit rate, combined with packet loss can reduce verification accuracy considerably.

12.

Design and evaluation of optimal cepstral lifters for accessing articulatory codebooks

Meyer P. Schroeter J. Sondhi M.M. 《Signal Processing, IEEE Transactions on》1991,39(7):1493-1502

The success of the articulatory approach to speech coding depends heavily on a good distortion measure between a given speech signal and the entries of a stored codebook of impulse responses and corresponding vocal-tract shapes (an articulatory codebook). One promising distortion measure is the weighted cepstral distortion. Since the impulse responses in the articulatory codebook do not include glottal characteristics, the authors derive optimal weighting functions (cepstral lifters) that reduce the influence of a varying glottal source on the cepstral distortion measure. This is done by examining the ensemble of cepstral coefficients of speech produced by an articulatory speech synthesizer that also includes a vocal-cord model. The resulting cepstral lifters are optimal for the given ensemble of cepstral coefficients and for given constraints on the weighting function. They differ for cepstral coefficients derived from the power spectrum (FFT cepstra) and for those derived from LPC (linear predictive coding) coefficients (LPC cepstra). The performance of the resulting cepstral lifters is compared in an articulatory codebook search.
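A weighted cepstral distortion and the codebook search it drives can be sketched as $d(\mathbf{c}, \mathbf{c}') = \sum_n w_n (c_n - c'_n)^2$ over $n = 1..N$, minimized over codebook entries. The uniform weights and the three-entry codebook are placeholders. The paper's optimal lifters are derived from a synthesizer ensemble and are not reproduced, and whether the weights are applied to squared differences exactly as here is an assumption.

```python
def weighted_cepstral_distortion(c_x, c_y, w):
    """d = sum_n w_n * (c_x[n] - c_y[n])**2 over aligned cepstral vectors c_1..c_N."""
    return sum(wn * (a - b) ** 2 for a, b, wn in zip(c_x, c_y, w))

def codebook_search(c_x, codebook, w):
    """Return (index, distortion) of the entry with minimum weighted distortion."""
    best = min(range(len(codebook)),
               key=lambda i: weighted_cepstral_distortion(c_x, codebook[i], w))
    return best, weighted_cepstral_distortion(c_x, codebook[best], w)

# Hypothetical codebook of cepstra of stored impulse responses.
codebook = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
best, dist = codebook_search([0.9, 0.1, 0.0], codebook, [1.0, 1.0, 1.0])
```

An optimal lifter would down-weight the cepstral indices most affected by glottal variation, so that the search favours the entry with the right vocal-tract shape.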

13.

Channel and source considerations of a bit-rate reduction technique for a possible wireless communications system's performance enhancement

Ilk H.G. Tugac S. 《Wireless Communications, IEEE Transactions on》2005,4(1):93-99

In commercial and military wireless communications systems, where bandwidth is at a premium, robust low-bit-rate speech coders are essential. These coders operate at fixed bit rates, which cannot be altered without major modifications to the vocoder design. This paper proposes a novel approach to reducing the bit rate required to transmit a speech signal. While traditional low-bit-rate vocoders code the original input speech, the proposed procedure operates on the time-scale-modified signal. The method offers any bit rate from 2400 b/s downward without modifying the underlying vocoder structure, the Mixed Excitation Linear Prediction (MELP) vocoder of the new NATO standard STANAG 4591. We consider transmitting MELP-encoded speech over noisy communication channels using different modulation techniques after time-scale compression has been applied. Three time-scale modification algorithms were evaluated, and the waveform similarity overlap-add (WSOLA) algorithm was selected for time-scale modification. Computer simulation results for both the source and the channel are presented in terms of objective speech quality metrics and informal subjective listening tests. Design parameters such as codec complexity and delay are also investigated. The simulation results point to a possible wireless communications system whose performance might be enhanced by using the bits spared by the procedure.
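The rate arithmetic is straightforward. If speech is time-compressed by a factor $\alpha > 1$ before a 2400 b/s MELP coder, each second of original speech costs $2400/\alpha$ bits, so $\alpha = 1.5$ gives 1600 b/s, and the receiver expands the decoded speech by $\alpha$. The time-scale step itself can be sketched as WSOLA. Each output frame comes from the input near its nominal position, shifted within a tolerance to best match the natural continuation of the previous frame, and frames are then overlap-added with a Hann window. The frame length, tolerance, and raw cross-correlation score are illustrative assumptions, not the paper's settings.

```python
import math

def wsola(x, rate, frame=256, tol=64):
    """Waveform similarity overlap-add time-scale modification.

    rate > 1 shortens the signal, rate < 1 lengthens it. Output frames advance
    by hop = frame // 2 under a periodic Hann window (which sums to 1 at 50%
    overlap); input frames start near k * hop * rate, shifted by up to +/- tol
    samples to maximize cross-correlation with the previous frame's continuation.
    """
    if len(x) < frame + tol:
        raise ValueError("signal too short for the chosen frame and tolerance")
    hop = frame // 2
    win = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame) for i in range(frame)]
    last_start = len(x) - frame                       # largest valid frame start
    # Keep every nominal position at least `tol` samples from the end.
    n_frames = int((len(x) - frame - tol) // (hop * rate)) + 1
    y = [0.0] * ((n_frames - 1) * hop + frame)
    prev = 0
    for k in range(n_frames):
        nominal = int(round(k * hop * rate))
        if k == 0:
            pos = 0
        else:
            target = x[prev + hop: prev + hop + frame]  # natural continuation
            pos, best_score = nominal, -math.inf
            for cand in range(max(0, nominal - tol), min(last_start, nominal + tol) + 1):
                score = sum(a * b for a, b in zip(x[cand: cand + frame], target))
                if score > best_score:
                    pos, best_score = cand, score
        for i in range(frame):
            y[k * hop + i] += win[i] * x[pos + i]
        prev = pos
    return y

x = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(4000)]
y = wsola(x, 1.5, frame=128, tol=32)   # about 2/3 as long, same pitch
```

The similarity search is what keeps the pitch and waveform shape intact while the duration changes.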

14.

Design of a very-low-bit-rate network digital video server

程德强 钱建生 黄书慧 《电视技术》2004,(4):72-74

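This paper presents a network video server with very-low-bit-rate encoding based on the W9961CF encoder chip. The design aims both to improve video coding efficiency and to strengthen channel error resilience, so that the server can cope with the high bit error rates of communication networks in mining areas. The hardware design and workflow of the system are described in detail, with emphasis on the implementation of the video encoding.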

15.

A low-power integrated circuit for remote speech recognition

Borgatti M. Felici M. Ferrari A. Guerrieri R. 《Solid-State Circuits, IEEE Journal of》1998,33(7):1082-1089

In this paper, a low-power, low-voltage speech processing system is presented. The system is intended for remote speech recognition applications, where feature extraction is performed on the terminal and high-complexity recognition tasks are moved to a remote server accessed through a radio link. The proposed system is based on a CMOS feature extraction chip for speech recognition that computes 15 cepstrum parameters every 8 ms and dissipates 30 μW at a 0.9-V supply. Single-cell battery operation is achieved. Processing relies on a novel feature extraction algorithm that uses 1-bit A/D conversion of the input speech signal. The chip has been implemented as a gate array in a standard 0.5-μm, three-metal CMOS technology. The average energy required to process a single word of the TI46 speech corpus is 10 μJ. The system achieves recognition rates above 98% in isolated-word speech recognition tasks.

16.

Research on pathological voice recognition based on LPCC and MFCC parameters

莫丽花 周孝进 张晓俊 陶智 赵鹤鸣 顾济华 《通信技术》2012,45(1):87-89

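The key to automatically distinguishing normal from pathological voices is to select effective feature parameters and optimize them into parameters that are simple and easy to implement. A suitable recognition model must also be chosen to obtain the best recognition rate. To detect normal and pathological voices conveniently and in real time, this paper uses linear prediction cepstral coefficients (LPCC) and Mel-frequency cepstral coefficients (MFCC) as acoustic features and performs recognition with the dynamic time warping (DTW) algorithm. Experimental results show that the recognition rate of this model exceeds 90%, and that MFCC outperforms LPCC.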
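The DTW recognition step can be sketched as a nearest-template classifier over sequences of per-frame feature vectors, such as LPCC or MFCC frames. The Euclidean local cost, the basic symmetric step pattern, and the label names are assumptions for illustration. The abstract does not state the paper's local distance, step constraints, or template selection.

```python
import math

def dtw_distance(a, b):
    """DTW distance between two sequences of feature vectors, using a Euclidean
    local cost and the basic step pattern (match, insertion, deletion)."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def classify(test, templates):
    """Label of the nearest reference; templates maps label -> list of sequences."""
    return min((dtw_distance(test, ref), label)
               for label, refs in templates.items() for ref in refs)[1]

# Toy 1-D "feature" sequences standing in for per-frame coefficient vectors.
templates = {"normal": [[[0.0], [1.0], [2.0]]], "pathological": [[[5.0], [5.0], [5.0]]]}
label = classify([[0.0], [1.0], [1.0], [2.0]], templates)
```

Warping absorbs differences in speaking rate, so a slower rendition of the same contour still matches its template with zero cost.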

17.

An improved MFCC feature extraction algorithm based on the auditory masking effect

鲁五一 吴德华 谢志明 刘建 《电子工程师》2009,35(9):16-18

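At present, speech recognition research is still mostly confined to laboratory conditions, whereas real speech is always accompanied by noise and interference. Humans can recognize speech correctly at very low signal-to-noise ratios, and even when interfering sounds are present, mainly because of binaural input. This paper imitates the auditory masking effect of the human ear to mask noise and proposes an improved algorithm for extracting MFCCs (Mel-frequency cepstral coefficients). The algorithm better reduces the influence of noise on the clean speech signal and thus improves the speech recognition rate. Experiments show that the improved algorithm achieves a relative performance improvement of about 4.43% to 8.42% over the traditional MFCC extraction algorithm.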

18.

Hybrid Companding Delta Modulation with Variable-Rate Sampling

Chong Un Dong Cho 《Communications, IEEE Transactions on》1982,30(4):593-599

Hybrid companding delta modulation (HCDM) is known to outperform other instantaneous or syllabic companding delta modulation systems [1]. To improve its performance, or to reduce the bit rate further when coding speech, we propose using a variable-rate sampling scheme in the HCDM system. The proposed system employs several different sampling rates but transmits the output binary signal at a fixed rate using a buffer. The variable-rate scheme improves the signal-to-quantization noise ratio (SQNR) by 3 to 4 dB over fixed-rate HCDM. The detailed algorithm and computer simulation results are presented, and buffer behavior and its control are also discussed. In addition, it is shown that the performance gain of a DM system with variable-rate sampling depends on the degree of variation of the input signal.
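The SQNR comparison can be made concrete with a simpler relative of HCDM: one-bit delta modulation with instantaneous step-size adaptation. The step grows when consecutive bits agree (slope overload) and shrinks when they alternate (granular noise). This is not HCDM, which also has a syllabic component, and the growth and shrink factors and step limits are illustrative assumptions. Because the step depends only on past bits, the decoder can rebuild the staircase from the bit stream alone.

```python
import math

def adm_encode(x, step_min=0.01, step_max=1.0, grow=1.5, shrink=0.66):
    """One-bit adaptive delta modulation; returns bits in {+1, -1}."""
    est, step, bits = 0.0, step_min, []
    for s in x:
        bit = 1 if s >= est else -1
        if bits:  # adapt using the previous bit, exactly as the decoder will
            step = step * grow if bit == bits[-1] else step * shrink
            step = min(step_max, max(step_min, step))
        est += bit * step
        bits.append(bit)
    return bits

def adm_decode(bits, step_min=0.01, step_max=1.0, grow=1.5, shrink=0.66):
    """Rebuild the staircase approximation from the bits alone."""
    est, step, out = 0.0, step_min, []
    for i, bit in enumerate(bits):
        if i > 0:
            step = step * grow if bit == bits[i - 1] else step * shrink
            step = min(step_max, max(step_min, step))
        est += bit * step
        out.append(est)
    return out

def sqnr_db(x, y):
    """Signal-to-quantization-noise ratio in dB."""
    sig = sum(s * s for s in x)
    noise = sum((s - r) ** 2 for s, r in zip(x, y))
    return 10 * math.log10(sig / noise)

x = [0.5 * math.sin(2 * math.pi * 100 * t / 8000) for t in range(800)]
bits = adm_encode(x)
recon = adm_decode(bits)
```

Variable-rate sampling, as proposed in the paper, would add more samples where the signal changes quickly, attacking the same slope-overload error that the step adaptation handles here.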

19.

Wavelet based robust sub-band features for phoneme recognition

Farooq O. Datta S. 《Vision, Image and Signal Processing, IEE Proceedings -》2004,151(3):187-193

The wavelet transform has proved to be an effective tool for the time-frequency analysis of non-stationary and quasi-stationary signals, and in recent years it has been used for feature extraction in speech recognition. This paper proposes a sub-band feature extraction technique based on an admissible wavelet transform and modifies the features to make them robust to additive white Gaussian noise. The performance of this system is compared with that of conventional mel-frequency cepstral coefficients (MFCC) at various signal-to-noise ratios. Under noisy conditions, recognition based on the eight sub-band features is found to outperform recognition based on MFCC features.
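Sub-band features of this kind can be sketched with a dyadic wavelet decomposition. Seven levels produce seven detail bands plus one final approximation band, eight sub-bands in all, and each band is summarized by its log energy. The Haar wavelet, the seven-level dyadic split, and per-sample energy normalization are assumptions chosen for brevity. The paper's admissible wavelet transform and its noise-robust modification are not reproduced.

```python
import math

def haar_step(x):
    """One level of the orthonormal Haar DWT: (approximation, detail).
    An odd trailing sample is dropped."""
    r = 1 / math.sqrt(2)
    approx = [(x[2 * i] + x[2 * i + 1]) * r for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) * r for i in range(len(x) // 2)]
    return approx, detail

def subband_log_energies(frame, levels=7):
    """Log energy (per sample) of each detail band plus the final approximation,
    giving levels + 1 sub-band features."""
    bands = []
    a = list(frame)
    for _ in range(levels):
        if len(a) < 2:
            raise ValueError("frame too short for the requested number of levels")
        a, d = haar_step(a)
        bands.append(d)
    bands.append(a)
    return [math.log(sum(v * v for v in b) / len(b) + 1e-12) for b in bands]

feats = subband_log_energies([math.sin(0.1 * t) for t in range(256)])
approx, detail = haar_step([1.0, 2.0, 3.0, 4.0])
```

Because the Haar transform is orthonormal, the band energies add up to the frame energy. The features therefore redistribute energy across frequency rather than discard it.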

20.

Supporting stored video: reducing rate variability and end-to-end resource requirements through optimal smoothing

Salehi J.D. Zhi-Li Zhang Kurose J. Towsley D. 《Networking, IEEE/ACM Transactions on》1998,6(4):397-410

Variable-bit-rate (VBR) compressed video can exhibit significant bit-rate variability on multiple time scales. In this paper we consider the transmission of stored video from a server to a client across a network and explore how the client buffer space can be used most effectively to reduce the variability of the transmitted bit rate. Two basic results are presented. First, we show how to achieve the greatest possible reduction in rate variability when sending stored video to a client with a given buffer size. We formally establish the optimality of our approach and illustrate its performance on a set of long MPEG-1 encoded video traces. Second, we evaluate the impact of optimal smoothing on the network resources needed for video transport under two network service models: deterministic guaranteed service (Chang 1994; Wrege et al. 1996) and renegotiated constant-bit-rate (RCBR) service (Grossglauser et al. 1997). Under both models, the impact of optimal smoothing is dramatic.
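The client buffer constrains smoothing by bounding the cumulative amount sent. The client must already hold every frame it has played (no underflow), and it cannot hold more than its buffer allows (no overflow). Any schedule between these two cumulative curves is feasible, and smoothing picks a low-variability one inside that corridor. This sketch only builds the bounds and checks a schedule against them. It uses a simplified discrete-time model, assumed here, in which frame t must be fully received by the end of slot t and data for slot t must fit in the buffer alongside frames not yet played. The paper's optimal smoothing algorithm is not reproduced.

```python
from itertools import accumulate

def smoothing_bounds(frame_sizes, buffer_size):
    """Cumulative bounds on data sent by the end of each slot.

    lower[t]: all frames through t must have arrived (no underflow).
    upper[t]: at most buffer_size beyond what was played through t - 1 (no overflow).
    """
    lower = list(accumulate(frame_sizes))
    upper = [buffer_size] + [l + buffer_size for l in lower[:-1]]
    return lower, upper

def is_feasible(schedule, frame_sizes, buffer_size):
    """True if the per-slot schedule stays between the bounds and sends everything."""
    lower, upper = smoothing_bounds(frame_sizes, buffer_size)
    sent = list(accumulate(schedule))
    return (len(sent) == len(lower) and sent[-1] == lower[-1]
            and all(lo <= s <= up for s, lo, up in zip(sent, lower, upper)))

frames = [10, 30, 10, 30]           # hypothetical frame sizes (e.g., kbit)
smoothed = [20, 20, 20, 20]         # a constant-rate candidate schedule
ok_raw = is_feasible(frames, frames, buffer_size=40)
ok_smooth = is_feasible(smoothed, frames, buffer_size=40)
ok_small = is_feasible(frames, frames, buffer_size=20)
```

With a 40-unit buffer, both the unsmoothed and the constant-rate schedules are feasible, but the second has a peak rate of 20 instead of 30. A 20-unit buffer cannot even hold the largest frame.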