首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
The noise robustness of automatic speech recognition systems can be improved by reducing an eventual mismatch between the training and test data distributions during feature extraction. Based on the quantiles of these distributions the parameters of transformation functions can be reliably estimated with small amounts of data. This paper will give a detailed review of quantile equalization applied to the Mel scaled filter bank, including considerations about the application in online systems and improvements through a second transformation step that combines neighboring filter channels. The recognition tests have shown that previous experimental observations on small vocabulary recognition tasks can be confirmed on the larger vocabulary Aurora 4 noisy Wall Street Journal database. The word error rate could be reduced from 45.7% to 25.5% (clean training) and from 19.5% to 17.0% (multicondition training).  相似文献   

2.
In this article we review several successful extensions to the standard hidden-Markov-model/artificial neural network (HMM/ANN) hybrid, which have recently made important contributions to the field of noise robust automatic speech recognition. The first extension to the standard hybrid was the “multi-band hybrid”, in which a separate ANN is trained on each frequency sub-band, followed by some form of weighted combination of ANN state posterior probability outputs prior to decoding. However, due to the inaccurate assumption of sub-band independence, this system usually gives degraded performance, except in the case of narrow-band noise. All of the systems which we review overcome this independence assumption and give improved performance in noise, while also improving or not significantly degrading performance with clean speech. The “all-combinations multi-band” hybrid trains a separate ANN for each sub-band combination. This, however, typically requires a large number of ANNs. The “all-combinations multi-stream” hybrid trains an ANN expert for every combination of just a small number of complementary data streams. Multiple ANN posteriors combination using maximum a-posteriori (MAP) weighting gives rise to the further successful strategy of hypothesis level combination by MAP selection. An alternative strategy for exploiting the classification capacity of ANNs is the “tandem hybrid” approach in which one or more ANN classifiers are trained with multi-condition data to generate discriminative and noise robust features for input to a standard ASR system. The “multi-stream tandem hybrid” trains an ANN for a number of complementary feature streams, permitting multi-stream data fusion. The “narrow-band tandem hybrid” trains an ANN for a number of particularly narrow frequency sub-bands. This gives improved robustness to noises not seen during training. Of the systems presented, all of the multi-stream systems provide generic models for multi-modal data fusion. Test results for each system are presented and discussed.  相似文献   

3.
Automatic speech recognition (ASR) systems follow a well established approach of pattern recognition, that is signal processing based feature extraction at front-end and likelihood evaluation of feature vectors at back-end. Mel-frequency cepstral coefficients (MFCCs) are the features widely used in state-of-the-art ASR systems, which are derived by logarithmic spectral energies of the speech signal using Mel-scale filterbank. In filterbank analysis of MFCC there is no consensus for the spacing and number of filters used in various noise conditions and applications. In this paper, we propose a novel approach to use particle swarm optimization (PSO) and genetic algorithm (GA) to optimize the parameters of MFCC filterbank such as the central and side frequencies. The experimental results show that the new front-end outperforms the conventional MFCC technique. All the investigations are conducted using two separate classifiers, HMM and MLP, for Hindi vowels recognition in typical field condition as well as in noisy environment.  相似文献   

4.
Missing data imputation is an important research topic in data mining. The impact of noise is seldom considered in previous works while real-world data often contain much noise. In this paper, we systematically investigate the impact of noise on imputation methods and propose a new imputation approach by introducing the mechanism of Group Method of Data Handling (GMDH) to deal with incomplete data with noise. The performance of four commonly used imputation methods is compared with ours, called RIBG (robust imputation based on GMDH), on nine benchmark datasets. The experimental result demonstrates that noise has a great impact on the effectiveness of imputation techniques and our method RIBG is more robust to noise than the other four imputation methods used as benchmark.  相似文献   

5.
In this paper we investigated Artificial Neural Networks (ANN) based Automatic Speech Recognition (ASR) by using limited Arabic vocabulary corpora. These limited Arabic vocabulary subsets are digits and vowels carried by specific carrier words. In addition to this, Hidden Markov Model (HMM) based ASR systems are designed and compared to two ANN based systems, namely Multilayer Perceptron (MLP) and recurrent architectures, by using the same corpora. All systems are isolated word speech recognizers. The ANN based recognition system achieved 99.5% correct digit recognition. On the other hand, the HMM based recognition system achieved 98.1% correct digit recognition. With vowels carrier words, the MLP and recurrent ANN based recognition systems achieved 92.13% and 98.06, respectively, correct vowel recognition; but the HMM based recognition system achieved 91.6% correct vowel recognition.  相似文献   

6.
Zhu  Yun  Wang  Weiye  Yu  Gaohang  Wang  Jun  Tang  Lei 《Multimedia Tools and Applications》2022,81(23):33171-33184

The inevitable problem of missing data is ubiquitous in the real transportation system, which makes the data-driven intelligent transportation system suffer from incorrect response. We propose a Bayesian robust Candecomp/Parafac (CP) tensor decomposition (BRCP) approach to deal with missing data and outliers by integrating the general form of transportation system domain knowledge. Specifically, when the lower rank tensor captures the global information, the sparse tensor is added to capture the local information, which can robustly predict the distribution of missing items and under the fully Bayesian treatment, the effective variational reasoning can prevent the over fitting problem. Real and reliable traffic data sets are used to evaluate the performance of the model in two data missing scenarios, which the experimental results show that the proposed BRCP model achieves the best imputation accuracy and is better than the most advanced baseline (Bayesian Gaussian CP decomposition (BGCP), high accuracy low-rank tensor completion (HaLRTC) and SVD-combined tensor decomposition (STD)), even in the case of high missed detection rate, the model still has the best performance and robustness.

  相似文献   

7.
提出一种应用于鲁棒自适应噪声消除的新结构,用优化波束形成方法取代GSC中的固定波束形成(Fixed Beamforming)以得到较宽的带宽;对GSC的输出进行后滤波处理消除残余噪声。  相似文献   

8.
This paper presents a new approach for increasing the robustness of multi-channel automatic speech recognition in noisy and reverberant multi-source environments. The proposed method uses uncertainty propagation techniques to dynamically compensate the speech features and the acoustic models for the observation uncertainty determined at the beamforming stage. We present and analyze two methods that allow integrating classical multi-channel signal processing approaches like delay and sum beamformers or Zelinski-type Wiener filters, with uncertainty-of-observation techniques like uncertainty decoding or modified imputation. An analysis of the results on the PASCAL-CHiME task shows that this approach consistently outperforms conventional beamformers with a minimal increase in computational complexity. The use of dynamic compensation based on observation uncertainty also outperforms conventional static adaptation with no need of adaptation data.  相似文献   

9.
In crowded scenes, the extracted low-level features, such as optical flow or spatio-temporal interest point, are inevitably noisy and uncertainty. In this paper, we propose a fully unsupervised non-negative sparse coding based approach for abnormality event detection in crowded scenes, which is specifically tailored to cope with feature noisy and uncertainty. The abnormality of query sample is decided by the sparse reconstruction cost from an atomically learned event dictionary, which forms a sparse coding bases. In our algorithm, we formulate the task of dictionary learning as a non-negative matrix factorization (NMF) problem with a sparsity constraint. We take the robust Earth Mover's Distance (EMD), instead of traditional Euclidean distance, as distance metric reconstruction cost function. To reduce the computation complexity of EMD, an approximate EMD, namely wavelet EMD, is introduced and well combined into our approach, without losing performance. In addition, the combination of wavelet EMD with our approach guarantees the convexity of optimization in dictionary learning. To handle both local abnormality detection (LAD) and global abnormality detection, we adopt two different types of spatio-temporal basis. Experiments conducted on four public available datasets demonstrate the promising performance of our work against the state-of-the-art methods.  相似文献   

10.
Imputation of missing values is one of the major tasks for data pre-processing in many areas. Whenever imputation of data from official statistics comes into mind, several (additional) challenges almost always arise, like large data sets, data sets consisting of a mixture of different variable types, or data outliers. The aim is to propose an automatic algorithm called IRMI for iterative model-based imputation using robust methods, encountering for the mentioned challenges, and to provide a software tool in R. This algorithm is compared to the algorithm IVEWARE, which is the “recommended software” for imputations in international and national statistical institutions. Using artificial data and real data sets from official statistics and other fields, the advantages of IRMI over IVEWARE-especially with respect to robustness-are demonstrated.  相似文献   

11.
The success of using Hidden Markov Models (HMMs) for speech recognition application has motivated the adoption of these models for handwriting recognition especially the online handwriting that has large similarity with the speech signal as a sequential process. Some languages such as Arabic, Farsi and Urdo include large number of delayed strokes that are written above or below most letters and usually written delayed in time. These delayed strokes represent a modeling challenge for the conventional left-right HMM that is commonly used for Automatic Speech Recognition (ASR) systems. In this paper, we introduce a new approach for handling delayed strokes in Arabic online handwriting recognition using HMMs. We also show that several modeling approaches such as context based tri-grapheme models, speaker adaptive training and discriminative training that are currently used in most state-of-the-art ASR systems can provide similar performance improvement for Hand Writing Recognition (HWR) systems. Finally, we show that using a multi-pass decoder that use the computationally less expensive models in the early passes can provide an Arabic large vocabulary HWR system with practical decoding time. We evaluated the performance of our proposed Arabic HWR system using two databases of small and large lexicons. For the small lexicon data set, our system achieved competing results compared to the best reported state-of-the-art Arabic HWR systems. For the large lexicon, our system achieved promising results (accuracy and time) for a vocabulary size of 64k words with the possibility of adapting the models for specific writers to get even better results.  相似文献   

12.
近年来大词汇量连续语音识别技术得到了迅速的发展,国内外研究机构加大了对汉语和英语语音识别技术的研究,然而,维吾尔语语音识别技术的研究工作最近才起步。建立了面向大词汇量的维吾尔语语音语料库,研究了维吾尔语声学模型和语言模型建模技术、解码技术,进行了面向大词汇量的维吾尔语连续语音识别实验。对维吾尔语大词汇量连续语音识别技术进一步发展中存在的问题进行了讨论。  相似文献   

13.
Speaker recognition faces many practical difficulties, among which signal inconsistency due to environmental and acquisition channel factors is most challenging. The noise imposed to the voice signal varies greatly and a priori noise model is usually unavailable. In this article, we propose a robust speaker recognition method that employs a novel adaptive wavelet shrinkage method for noise suppression. In our method, wavelet subband coefficient thresholds are automatically computed, which are proportional to the noise contamination. In the application of wavelet shrinkage for noise removal, a dual-threshold strategy is developed to suppress noise, preserve signal coefficients and minimize the introduction of artifacts. The recognition is achieved using modification of Mel-frequency cepstral coefficient of overlapped voice signal segments. The efficacy of our method is evaluated with voice signals from two public available speech signal databases and is compared with state-of-the-art methods. It is demonstrated that our proposed method exhibits great robustness in various noise conditions. The improvement is significant especially when noise dominates the underlying speech.  相似文献   

14.
This paper investigates a noise robust technique for automatic speech recognition which exploits hidden Markov modeling of stereo speech features from clean and noisy channels. The HMM trained this way, referred to as stereo HMM, has in each state a Gaussian mixture model (GMM) with a joint distribution of both clean and noisy speech features. Given the noisy speech input, the stereo HMM gives rise to a two-pass compensation and decoding process where MMSE denoising based on N-best hypotheses is first performed and followed by decoding the denoised speech in a reduced search space on lattice. Compared to the feature space GMM-based denoising approaches, the stereo HMM is advantageous as it has finer-grained noise compensation and makes use of information of the whole noisy feature sequence for the prediction of each individual clean feature. Experiments on large vocabulary spontaneous speech from speech-to-speech translation applications show that the proposed technique yields superior performance than its feature space counterpart in noisy conditions while still maintaining decent performance in clean conditions.  相似文献   

15.
16.
噪声鲁棒语音识别研究综述*   总被引:3,自引:1,他引:2  
针对噪声环境下的语音识别问题,对现有的噪声鲁棒语音识别技术进行讨论,阐述了噪声鲁棒语音识别研究的主要问题,并根据语音识别系统的构成将噪声鲁棒语音识别技术按照信号空间、特征空间和模型空间进行分类总结,分析了各种鲁棒语音识别技术的特点、实现,以及在语音识别中的应用。最后展望了进一步的研究方向。  相似文献   

17.
A new matching procedure based on imputing missing data by means of a local linear estimator of the underlying population regression function (that is assumed not necessarily linear) is introduced. Such a procedure is compared to other traditional approaches, more precisely hot deck methods as well as methods based on kNN estimators. The relationship between the variables of interest is assumed not necessarily linear. Performance is measured by the matching noise given by the discrepancy between the distribution generating genuine data and the distribution generating imputed values.  相似文献   

18.
A new matching procedure based on imputing missing data by means of a local linear estimator of the underlying population regression function (that is assumed not necessarily linear) is introduced. Such a procedure is compared to other traditional approaches, more precisely hot deck methods as well as methods based on kNN estimators. The relationship between the variables of interest is assumed not necessarily linear. Performance is measured by the matching noise given by the discrepancy between the distribution generating genuine data and the distribution generating imputed values.  相似文献   

19.
Discriminative classifiers are a popular approach to solving classification problems. However, one of the problems with these approaches, in particular kernel based classifiers such as support vector machines (SVMs), is that they are hard to adapt to mismatches between the training and test data. This paper describes a scheme for overcoming this problem for speech recognition in noise by adapting the kernel rather than the SVM decision boundary. Generative kernels, defined using generative models, are one type of kernel that allows SVMs to handle sequence data. By compensating the parameters of the generative models for each noise condition noise-specific generative kernels can be obtained. These can be used to train a noise-independent SVM on a range of noise conditions, which can then be used with a test-set noise kernel for classification. The noise-specific kernels used in this paper are based on Vector Taylor Series (VTS) model-based compensation. VTS allows all the model parameters to be compensated and the background noise to be estimated in a maximum likelihood fashion. A brief discussion of VTS, and the optimisation of the mismatch function representing the impact of noise on the clean speech, is also included. Experiments using these VTS-based test-set noise kernels were run on the AURORA 2 continuous digit task. The proposed SVM rescoring scheme yields large gains in performance over the VTS compensated models.  相似文献   

20.
In this paper, we propose a novel two-stage algorithm for the detection and removal of random-valued impulse noise using sparse representations. The main aim of the paper is to demonstrate the strength of image inpainting technique for the reconstruction of images corrupted by random-valued impulse noise at high noise densities. First, impulse locations are detected by applying the combination of sparse denoising and thresholding, based on sparse and overcomplete representations of the corrupted image. This stage optimally selects threshold values so that the sum of the number of false alarms and missed detections obtained at a particular noise level is the minimum. In the second stage, impulses, detected in the first stage, are considered as the missing pixels or holes and subsequently these holes are filled-up using an image inpainting method. Extensive simulation results on standard gray scale images show that the proposed method successfully removes random-valued impulse noise with better preservation of edges and other details compared to the existing techniques at high noise ratios.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号