20 similar documents found; search took 15 ms.
1.
Shima Tabibian, Ahmad Akbari, Babak Nasersharif 《Engineering Applications of Artificial Intelligence》2013,26(7):1660-1670
Keyword spotting refers to the detection of all occurrences of a given keyword in input speech utterances. In this paper, we define a keyword spotter as a binary classifier that separates the class of sentences containing a target keyword from the class of sentences that do not. To discriminate between these classes, an efficient classification method and a suitable feature set are required. For the classification method, we propose an evolutionary algorithm to train the separating hyperplane between the two classes. As our discriminative feature set, we propose two confidence measure functions: the first computes the possibility of each phoneme's presence in the speech frames, and the second determines the duration of each phoneme. We define these functions based on acoustic, spectral and statistical features of speech. Results on TIMIT indicate that the proposed evolutionary discriminative keyword spotter has lower computational complexity and higher speed in both the training and test phases than an SVM-based discriminative keyword spotter. Additionally, the proposed system is robust in noisy conditions.
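A minimal Python sketch of the central idea, evolving a separating hyperplane (w, b) over sentence-level confidence features, is given below. It is not the authors' exact algorithm; the population scheme, mutation scale and the synthetic stand-in data are illustrative assumptions.

    import numpy as np

    def evolve_hyperplane(X, y, pop_size=40, generations=200, sigma=0.1, seed=0):
        """Evolve (w, b) so that sign(w @ x + b) separates the two classes.
        X: (n_samples, n_features) sentence-level confidence features
        y: labels in {-1, +1} (keyword absent / present)."""
        rng = np.random.default_rng(seed)
        dim = X.shape[1] + 1                        # weights plus bias
        pop = rng.normal(size=(pop_size, dim))      # random initial hyperplanes

        def fitness(individual):
            w, b = individual[:-1], individual[-1]
            return np.mean(y * (X @ w + b) > 0)     # classification accuracy

        for _ in range(generations):
            scores = np.array([fitness(p) for p in pop])
            parents = pop[np.argsort(scores)[-pop_size // 2:]]    # keep best half
            children = parents + rng.normal(scale=sigma, size=parents.shape)
            pop = np.vstack([parents, children])
        best = max(pop, key=fitness)
        return best[:-1], best[-1]

    # toy usage with random two-class data standing in for TIMIT-derived features
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0.5, 1, (100, 10)), rng.normal(-0.5, 1, (100, 10))])
    y = np.array([1] * 100 + [-1] * 100)
    w, b = evolve_hyperplane(X, y)
    print(np.mean(np.sign(X @ w + b) == y))          # training accuracy of the evolved plane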
2.
3.
Lior Weizman 《International Journal of Remote Sensing》2013,34(21):5605-5617
In this study we apply a variant of a recently proposed linear subspace method, Neighbourhood Component Analysis (NCA), to the task of hyperspectral classification. The NCA algorithm explicitly uses the classification performance criterion to obtain the optimal linear projection, and it assumes nothing about the form of each class or the shape of the separating surfaces. In some cases we would like to weight different types of misclassification differently in the penalty function; a modification of the NCA cost function is introduced for this case. Experimental studies are conducted on hyperspectral images acquired by two sensors, the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) and AISA-EAGLE. The results confirm the superiority of the NCA classifier over previously suggested methodologies in the context of hyperspectral data classification.
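For reference, the standard NCA objective maximizes the expected leave-one-out accuracy of a stochastic nearest-neighbour classifier under a linear map A. One natural way to weight misclassification types (a sketch, not necessarily the paper's exact modification) is to attach a class-pair weight to the stochastic assignment probabilities:

    p_{ij} = \frac{\exp\left(-\lVert A x_i - A x_j \rVert^2\right)}
                  {\sum_{k \neq i} \exp\left(-\lVert A x_i - A x_k \rVert^2\right)}, \qquad p_{ii} = 0,

    f(A) = \sum_i \sum_{j:\, c_j = c_i} p_{ij}
    \quad\longrightarrow\quad
    f_w(A) = \sum_i \sum_j w(c_i, c_j)\, p_{ij},

where w(c_i, c_j) rewards or penalizes particular class confusions.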
4.
This article presents an approach for the automatic recognition of non-native speech. Some non-native speakers tend to pronounce phonemes as they would in their native language. Model adaptation can improve the recognition rate for non-native speakers, but has difficulty dealing with pronunciation errors such as phoneme insertions or substitutions. For these pronunciation mismatches, pronunciation modeling can make the recognition system more robust. Our approach combines acoustic model transformation and pronunciation modeling for multiple non-native accents. For acoustic model transformation, two approaches are evaluated: MAP adaptation and model re-estimation. For pronunciation modeling, confusion rules (alternate pronunciations) are automatically extracted from a small non-native speech corpus, and a novel method is presented for introducing these automatically learned rules into the recognition system. The modified HMM of a phoneme of the spoken foreign language includes its canonical pronunciation along with all the alternate non-native pronunciations, so that phonemes pronounced correctly by a non-native speaker can still be recognized. We evaluate our approaches on the European project HIWIRE non-native corpus, which contains English sentences pronounced by French, Italian, Greek and Spanish speakers. Two cases are studied: the native language of the test speaker is either known or unknown. When the foreign origin of the speaker is known, our approach gives better recognition results than classical acoustic adaptation of the HMMs, with a 22% WER reduction compared to the reference system.
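A minimal Python sketch of how such confusion rules could be used to expand a pronunciation lexicon is shown below; the rule format, phone symbols and example word are hypothetical illustrations, not the HIWIRE setup.

    # Hypothetical confusion rules learned from a non-native corpus:
    # a canonical phone may be kept, substituted, or deleted (None).
    confusion_rules = {
        "th": ["th", "s", "z"],     # e.g. /th/ realized as /s/ or /z/
        "h":  ["h", None],          # /h/ often dropped
    }

    def expand_pronunciation(canonical, rules):
        """Return every alternate pronunciation implied by the rules."""
        variants = [[]]
        for phone in canonical:
            options = rules.get(phone, [phone])
            variants = [v + [p] for v in variants for p in options]
        # drop deleted phones (None)
        return [[p for p in v if p is not None] for v in variants]

    print(expand_pronunciation(["h", "e", "th"], confusion_rules))
    # [['h', 'e', 'th'], ['h', 'e', 's'], ['h', 'e', 'z'],
    #  ['e', 'th'], ['e', 's'], ['e', 'z']]

In the approach described above, these alternates would be merged into the phoneme's HMM alongside the canonical pronunciation rather than listed in a lexicon.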
5.
Tao Ma, Sundararajan Srinivasan, Georgios Lazarou, Joseph Picone 《International Journal of Speech Technology》2014,17(1):11-16
Hidden Markov models (HMMs) with Gaussian mixture distributions rely on the assumption that speech features are temporally uncorrelated, and often assume a diagonal covariance matrix, so that correlations between feature vectors of adjacent frames are ignored. A Linear Dynamic Model (LDM) is a Markovian state-space model that also relies on hidden-state modeling, but explicitly models the evolution of the hidden states using an autoregressive process. An LDM is capable of modeling higher-order statistics and can exploit correlations between features in an efficient and parsimonious manner. In this paper, we present a hybrid LDM/HMM decoder architecture that post-processes segmentations derived from the first pass of an HMM-based recognizer. This smoothed trajectory model is complementary to existing HMM systems. An Expectation-Maximization (EM) approach for parameter estimation is presented. We demonstrate a 13% relative WER reduction on the Aurora-4 clean evaluation set, and a 13% relative WER reduction on the babble noise condition.
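In its standard form, the linear dynamic model referred to here is a Gaussian state-space model (generic notation, not necessarily the paper's exact parameterization):

    x_t = F\, x_{t-1} + \eta_t, \qquad \eta_t \sim \mathcal{N}(0, Q),
    y_t = H\, x_t + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, R),

where y_t is the observed feature vector, x_t the hidden state evolving autoregressively through F, and H the observation matrix; EM estimates the parameters {F, H, Q, R} from the HMM-derived segments.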
6.
This paper describes various techniques for speaker normalization and for adapting a knowledge base or reference templates to new speakers in automatic speech recognition (ASR). It focuses on a technique for learning spectral transformations, based on a statistical-analysis tool (canonical correlation analysis), to adapt a standard dictionary to arbitrary speakers. The proposed method should make it possible to improve speaker independence in large-vocabulary ASR. Applied to an isolated-word recognizer, it raised a 70% correct score to 87%. A dynamic aspect of the speaker adaptation procedure is also introduced and evaluated within a particular strategy.
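A minimal sketch of a CCA-based spectral mapping using scikit-learn (not the paper's original tooling); the paired frames X (new speaker) and Y (reference speaker) are synthetic stand-ins for time-aligned utterances, and the dimensionalities are illustrative.

    import numpy as np
    from sklearn.cross_decomposition import CCA

    rng = np.random.default_rng(0)
    Y = rng.normal(size=(500, 24))                                   # reference spectral frames
    X = Y @ rng.normal(size=(24, 24)) + 0.1 * rng.normal(size=(500, 24))  # new-speaker frames

    cca = CCA(n_components=12)
    cca.fit(X, Y)                    # learn the spectral transformation from paired frames
    Y_hat = cca.predict(X)           # map new-speaker spectra toward the reference space
    print(Y_hat.shape)               # (500, 24)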
7.
Automatic recognition of children's speech is a challenging topic in computer-based speech recognition systems. The conventional feature extraction method, namely Mel-frequency cepstral coefficients (MFCCs), is not efficient for children's speech recognition. This paper proposes a novel fuzzy-based discriminative feature representation to address the recognition of Malay vowels uttered by children. Because acoustic speech parameters vary with age, the performance of automatic speech recognition (ASR) systems degrades on children's speech. To solve this problem, this study addresses the representation of relevant and discriminative features for children's speech recognition. The proposed method extracts MFCCs with a narrower filter bank, followed by a fuzzy-based feature selection method that provides relevant, discriminative and complementary features. For this purpose, conflicting objective functions measuring the goodness of the features have to be fulfilled; a fuzzy formulation of the problem and fuzzy aggregation of the objectives are therefore used to address the uncertainties involved. The proposed method can reduce the dimensionality without compromising the speech recognition rate. To assess its capability, the study analyzed six Malay vowels from recordings of 360 children aged 7 to 12. After feature extraction, two well-known classification methods, MLP and HMM, were employed for the speech recognition task, with optimal parameter adjustment performed for each classifier. The experiments were conducted in a speaker-independent manner. The proposed method performed better than conventional MFCCs and a number of conventional feature selection methods on the children's speech recognition task. The fuzzy-based feature selection allowed flexible selection of the MFCCs with the best discriminative ability, enhancing the difference between the vowel classes.
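The general shape of such a fuzzy aggregation (a sketch in generic notation; the paper's concrete membership functions are not reproduced here) is to map each conflicting objective of a candidate feature subset S to a membership value and select the subset maximizing their aggregate:

    \mu(S) = \mathcal{A}\big(\mu_1(S), \mu_2(S)\big), \qquad S^{*} = \arg\max_{S} \mu(S),

where \mu_1 and \mu_2 score, for example, discriminative ability and compactness, and \mathcal{A} is a fuzzy aggregation operator such as the minimum or the product.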
8.
Pawan K. Ajmera, Dattatray V. Jadhav, Raghunath S. Holambe 《Pattern Recognition》2011,44(10-11):2749-2759
This paper presents a new feature extraction technique for speaker recognition using the Radon transform (RT) and the discrete cosine transform (DCT). The spectrogram is a compact, efficient representation that carries information about acoustic features in the form of patterns. In the proposed method, speaker-specific features are extracted by applying image processing techniques to the patterns in the spectrogram. The Radon transform, which sums the pixel values of an image along straight lines at particular directions and displacements, is used to derive effective acoustic features from the speech spectrogram. The proposed technique computes Radon projections for seven orientations to capture the acoustic characteristics of the spectrogram, and applying the DCT to these projections yields a low-dimensional feature vector. The technique is computationally efficient, text-independent, robust to session variations and insensitive to additive noise. Its performance has been evaluated on the Texas Instruments and Massachusetts Institute of Technology (TIMIT) database and on our own Shri Guru Gobind Singhji (SGGS) database. The recognition rate of the proposed algorithm is 96.69% on the TIMIT database (630 speakers) and 98.41% on the SGGS database (151 speakers). These results highlight the superiority of the proposed method over some existing algorithms.
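A minimal Python sketch of the processing chain (spectrogram, Radon projections at seven angles, then DCT) using scipy/scikit-image; the frame parameters, angle set and the number of retained coefficients are illustrative assumptions, not the paper's exact settings.

    import numpy as np
    from scipy.signal import spectrogram
    from scipy.fft import dct
    from skimage.transform import radon

    def radon_dct_features(speech, fs, angles=(0, 30, 60, 90, 120, 150, 179),
                           n_coeffs=20):
        # time-frequency pattern of the utterance
        _, _, spec = spectrogram(speech, fs, nperseg=256, noverlap=128)
        spec = np.log1p(spec)                                # compress dynamic range
        # Radon projections of the spectrogram along seven orientations
        proj = radon(spec, theta=np.asarray(angles, dtype=float), circle=False)
        # DCT of each projection; keep the low-order coefficients
        feats = dct(proj, axis=0, norm="ortho")[:n_coeffs]
        return feats.ravel()

    fs = 16000
    speech = np.random.randn(fs)                              # stand-in for a real utterance
    print(radon_dct_features(speech, fs).shape)               # (140,) = 20 coefficients x 7 angles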
9.
Linear Discriminant Analysis (LDA) has been widely used to extract linear features for classification. In real applications, the usefulness of the extracted features usually needs to be confirmed through the error rate of a classifier; little attention has been paid to whether and how the discriminative features themselves can be interpreted as indicators of usefulness. We refer to this as relevance, i.e., the capability of discriminative features to characterize the contribution of the original variables to classification. We approach relevance by considering how it can be lost while extracting optimal discriminative features. The discrepancy between the relevance and the optimality of discriminative features is shown to originate from the “angle” between the space spanned by the eigenvectors of the within-class scatter matrix and the primary space in which the original variables reside: for a given dataset, the larger this “angle”, the less evident the relevance discovered from optimal discriminative features. Relevance and optimality are then regarded as two constraint conditions, or a tradeoff, for extracting relevant discriminative features. Finally, a simulated experiment shows how relevance is lost as the “angle” changes. Experimental results on both the USPS handwritten digits and the PIE face databases show that the maximum margin criterion is a reasonable compromise between relevance and optimality, since it approximates the average class margin using Euclidean distance measured in the primary space.
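For reference, the two criteria being weighed against each other can be written in their standard forms, with S_b and S_w the between- and within-class scatter matrices:

    W_{\mathrm{LDA}}^{*} = \arg\max_{W} \operatorname{tr}\!\big[(W^{\top} S_w W)^{-1} (W^{\top} S_b W)\big],
    \qquad
    W_{\mathrm{MMC}}^{*} = \arg\max_{W^{\top} W = I} \operatorname{tr}\!\big[W^{\top} (S_b - S_w) W\big].

The LDA criterion depends on the eigenstructure of S_w (hence on the "angle" discussed above), whereas the maximum margin criterion measures class separation with Euclidean distances in the primary space, which is why it trades some optimality for relevance.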
10.
When evaluating transportation infrastructure projects and deciding, under a budget constraint, which of them will be carried out from a given set, several criteria need to be considered. Standard evaluation practices aggregate the impacts into one utility function, which is then optimized; nevertheless, the techniques used to translate different measuring units into monetary terms are highly controversial. Multicriteria techniques can deal explicitly with different measuring units, but they are not well suited to modeling the interdependence of projects that share a common characteristic (the same route, location or target population, for instance). In this research we model this transportation planning problem, the multi-objective transportation infrastructure project selection problem (MTIPSP), as a constrained multi-objective optimization problem with quadratic objective functions, using a variation of the multi-objective 0–1 knapsack problem plus some additional constraints. Given the combinatorial nature of the problem, an evolutionary framework is used to identify Pareto solutions, and those with unattractive properties are then filtered out using a knee identification procedure. The final selection of the project portfolio is made using a well-known multicriteria decision aid method, taking into account the decision makers' preferences in the existing context.
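A sketch of the underlying multi-objective 0–1 knapsack structure with quadratic terms for project interdependence (generic symbols, not the paper's exact notation):

    \max_{x \in \{0,1\}^n} \; f_k(x) = \sum_{i} v_{ik}\, x_i + \sum_{i<j} q_{ijk}\, x_i x_j, \qquad k = 1, \dots, K,
    \quad \text{s.t.} \quad \sum_i c_i x_i \le B \;\; \text{(plus additional side constraints)},

where x_i indicates whether project i is selected, v_{ik} is its benefit under criterion k, q_{ijk} captures the interaction between interdependent projects i and j, c_i is the project cost and B the budget.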
11.
Learning the influence of environmental parameters, such as additive noise and channel distortions, from training data is an effective approach to robust speech recognition. Most previous methods are based on the maximum likelihood estimation criterion; however, these methods do not lead to a minimum error rate. In this paper, a novel discriminative learning method for environmental parameters, based on the Minimum Classification Error (MCE) criterion, is proposed. In this method, a simple classifier and the Generalized Probabilistic Descent (GPD) algorithm are adopted to iteratively learn the environmental parameters. The clean speech features are then estimated from the noisy speech features using the estimated environmental parameters, and these estimates are passed to the back-end HMM classifier. Experiments on a task of 18 isolated confusable Korean words show that a best error rate reduction of 32.1% is obtained relative to a conventional HMM system.
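The standard MCE/GPD machinery referred to, in generic notation (a sketch, not the paper's exact formulation): a misclassification measure for the correct class c is embedded in a smooth loss, and the environmental parameters θ are updated by probabilistic descent,

    d_c(x; \theta) = -g_c(x; \theta) + \log\!\left[\frac{1}{M-1}\sum_{j \neq c} e^{\eta\, g_j(x;\theta)}\right]^{1/\eta},
    \qquad
    \ell(d) = \frac{1}{1 + e^{-\gamma d}},
    \qquad
    \theta_{t+1} = \theta_t - \varepsilon_t\, \nabla_\theta\, \ell\big(d_c(x_t; \theta_t)\big),

where the g_j are class discriminant functions and ε_t is a decreasing step size.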
12.
The aim of the paper is to propose two efficient algorithms for the numerical evaluation of the Hankel transform of order ν, ν > −1, using Legendre and rationalized Haar (RH) wavelets. The idea behind the algorithms is to replace the part x f(x) of the integrand by its wavelet decomposition, obtained using Legendre wavelets for the first algorithm and RH wavelets for the second, thus representing F_ν(y) as a Fourier–Bessel series whose coefficients depend strongly on the input function x f(x) in both cases. Numerical evaluations of test functions with known analytical Hankel transforms illustrate the proposed algorithms.
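For reference, the transform being evaluated is, in its standard definition,

    F_\nu(y) = \int_0^{\infty} x f(x)\, J_\nu(xy)\, dx, \qquad \nu > -1,

and replacing x f(x) by a truncated wavelet expansion \sum_n c_n \psi_n(x) (Legendre or RH wavelets on the support of f) gives the approximation

    F_\nu(y) \approx \sum_n c_n \int \psi_n(x)\, J_\nu(xy)\, dx,

a series whose coefficients c_n depend only on the input function, as stated above.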
13.
14.
Edgar Alfredo Portilla-Flores, Efrén Mezura-Montes 《Engineering Applications of Artificial Intelligence》2011,24(5):757-771
Parametric reconfiguration plays a key role in the non-iterative concurrent design of mechatronic systems, because it allows the designer to select, among different competing solutions, the most suitable one without giving up sub-optimal alternatives. This paper presents a method based on an evolutionary algorithm to improve the parametric reconfiguration feature in the optimal design of a continuously variable transmission and a five-bar parallel robot. The approach combines a solution-diversity mechanism with a memory of the sub-optimal solutions found during the process, and a constraint-handling mechanism is added to bias the search towards the feasible region of the search space. Differential Evolution is used as the search algorithm. The results obtained in a set of five experiments per mechatronic system show the effectiveness of the proposed approach.
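A compact Python sketch of Differential Evolution with Deb-style feasibility rules for constraint handling is given below. It illustrates the search mechanism only; the diversity and memory mechanisms described above, and the actual mechatronic design objectives, are not reproduced, and the toy problem is an assumption for demonstration.

    import numpy as np

    def de_feasibility(objective, constraints, bounds, pop_size=30,
                       F=0.7, CR=0.9, generations=300, seed=0):
        """DE/rand/1/bin with feasibility rules: a feasible solution beats an
        infeasible one; infeasible solutions are compared by total violation."""
        rng = np.random.default_rng(seed)
        lo, hi = np.asarray(bounds, dtype=float).T
        dim = len(lo)
        pop = rng.uniform(lo, hi, size=(pop_size, dim))

        def violation(x):                      # constraints use the g(x) <= 0 convention
            return sum(max(0.0, g(x)) for g in constraints)

        def better(a, b):
            va, vb = violation(a), violation(b)
            if va == 0 and vb == 0:
                return objective(a) < objective(b)
            return va < vb

        for _ in range(generations):
            for i in range(pop_size):
                r1, r2, r3 = rng.choice([j for j in range(pop_size) if j != i],
                                        size=3, replace=False)
                mutant = np.clip(pop[r1] + F * (pop[r2] - pop[r3]), lo, hi)
                cross = rng.random(dim) < CR
                cross[rng.integers(dim)] = True           # ensure at least one gene crosses
                trial = np.where(cross, mutant, pop[i])
                if better(trial, pop[i]):
                    pop[i] = trial
        return min(pop, key=lambda x: (violation(x), objective(x)))

    # toy usage: minimize x^2 + y^2 subject to x + y >= 1 (i.e. 1 - x - y <= 0)
    best = de_feasibility(lambda x: x @ x, [lambda x: 1 - x[0] - x[1]],
                          bounds=[(-5, 5), (-5, 5)])
    print(best)                                            # approximately [0.5, 0.5]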
15.
Existing object recognition methods can be classified into two categories: interest-point-based and discriminative-part-based. Interest-point-based methods do not perform well unless the interest points are selected very carefully, while the performance of discriminative-part-based methods is not stable under viewpoint changes, because they select discriminative parts from the interest points; in addition, discriminative-part-based methods often do not provide an incremental learning ability. To address these problems, we propose a novel method that consists of three phases. First, we use sliding windows of different scales to retrieve a number of local parts from each model object and extract a feature vector for each retrieved part. Next, we construct prototypes for the model objects from these feature vectors, each prototype representing a discriminative part of a model object. Then, we establish the correspondence between the local parts of a test object and those of the model objects, and compute the similarity between the test object and each model object based on this correspondence; the test object is recognized as the model object with the highest similarity. The experimental results show that the proposed method outperforms or is comparable with the compared methods in terms of recognition rates on the COIL-100, Oxford Buildings and ETH-80 datasets, and recognizes all query images of the ZuBuD dataset. It is robust to distortion, occlusion, rotation, and viewpoint and illumination changes. In addition, we accelerate the recognition process using the C4.5 decision tree technique, and the proposed method can build prototypes incrementally.
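A minimal numpy sketch of the final matching step only, scoring a test object against each model by nearest-prototype similarity; the feature extraction, prototype construction and the toy data are assumptions for illustration.

    import numpy as np

    def object_similarity(test_parts, model_prototypes):
        """test_parts: (n, d) feature vectors of the test object's local parts.
        model_prototypes: dict mapping model name -> (m, d) prototype array.
        Returns the best-matching model and the per-model scores."""
        scores = {}
        a = test_parts / np.linalg.norm(test_parts, axis=1, keepdims=True)
        for name, protos in model_prototypes.items():
            b = protos / np.linalg.norm(protos, axis=1, keepdims=True)
            sim = a @ b.T                            # cosine similarity, parts x prototypes
            scores[name] = sim.max(axis=1).mean()    # best prototype per part, averaged
        return max(scores, key=scores.get), scores

    # toy usage with random features standing in for two hypothetical models
    rng = np.random.default_rng(0)
    models = {"mug": rng.normal(0, 1, (30, 64)), "car": rng.normal(2, 1, (30, 64))}
    test = rng.normal(2, 1, (12, 64))                # parts resembling the "car" model
    print(object_similarity(test, models)[0])        # "car" expected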
16.
Tsorng-Lin Chia, Kuang-Bor Wang, Zen Chen, Der-Chyuan Lou 《Information Processing Letters》2002,82(2):73-81
Distance transformation (DT) has been widely used for image matching and shape analysis. In this paper, a parallel algorithm for computing the distance transformation is presented. First, it is shown that the algorithm has an execution time of 6N − 4 cycles for an N×N image, using a parallel architecture that requires ⌈N/2⌉ parallel processors; the real-time requirement is thus fulfilled and the execution time is independent of the image contents. In addition, a partition method is developed to process an image when the parallel architecture has a fixed number of processing elements (PEs), say two or more. The total execution time for an N×N image with a fixed number of PEs is 2[N²/M + 2(M − 1)], where M is the number of PEs.
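To make the two cost expressions concrete (the values of N and M below are illustrative): for N = 512, the fully parallel version takes 6N − 4 = 3068 cycles using ⌈N/2⌉ = 256 processors, while the partitioned version with a fixed pool of M = 16 PEs takes 2[512²/16 + 2·15] = 2[16384 + 30] = 32,828 cycles.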
17.
Traditionally the FFT (fast Fourier transform) has been used in speech recognition algorithms. Other discrete transforms, such as the Walsh–Hadamard transform (WHT) and the rapid transform (RT), can play equally important roles in the recognition process, as they have advantages in implementation and hardware realization. The capability of these transforms to recognize phonemes based on training matrices and various matching criteria is investigated. The speech database consists of ten sentences spoken by ten different speakers (all male). For recognition purposes the speech is sectioned into 10 ms intervals and sampled at 20 kHz. Training matrices for all three transforms are developed, and test matrices in the transform domain are compared with the prototypes using the matching criteria to reach a decision. The WHT and RT appear to offer promise compared to the FFT, as they are easier to implement and yield comparable recognition results. Other distance measures and recognition schemes are proposed for improving the classification accuracy.
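A minimal Python sketch of extracting per-frame FFT and Walsh–Hadamard features, as a rough illustration of the transform-domain representations compared above; the frame length of 256 samples (about 12.8 ms at 20 kHz, chosen because the WHT needs a power-of-two length) and the random stand-in signal are assumptions.

    import numpy as np
    from scipy.linalg import hadamard

    def frame_features(speech, frame_len=256):
        """Magnitude-FFT and Walsh-Hadamard coefficients for each frame."""
        H = hadamard(frame_len)                        # natural-ordered WHT matrix
        n_frames = len(speech) // frame_len
        fft_feats, wht_feats = [], []
        for i in range(n_frames):
            frame = speech[i * frame_len:(i + 1) * frame_len]
            fft_feats.append(np.abs(np.fft.rfft(frame)))
            wht_feats.append(H @ frame / frame_len)    # normalized WHT coefficients
        return np.array(fft_feats), np.array(wht_feats)

    fs = 20000                                          # 20 kHz sampling, as in the paper
    speech = np.random.randn(fs)                        # stand-in for one second of speech
    F, W = frame_features(speech)
    print(F.shape, W.shape)                             # (78, 129) (78, 256)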
18.
Behzad Zamani, Ahmad Akbari, Babak Nasersharif, Azarakhsh Jalalvand 《Pattern Recognition Letters》2011,32(7):948-955
Feature extraction is an important component of pattern classification and speech recognition. Extracted features should discriminate classes from each other while being robust to environmental conditions such as noise. For this purpose, several feature transformations have been proposed, which can be divided into two main categories: data-dependent and classifier-dependent transformations. The drawback of data-dependent transformations is that their optimization criteria differ from the measure of classification error, which can potentially degrade the classifier's performance. In this paper, we propose a framework to optimize data-dependent feature transformations such as PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis) and HLDA (Heteroscedastic LDA) using minimum classification error (MCE) as the main objective, with a Hidden Markov Model (HMM) classifier. In the proposed HMM minimum classification error technique, the transformation matrices are modified to minimize the classification error for the mapped features, and the dimension of the feature vector is not changed. To evaluate the proposed methods, we conducted several experiments on the TIMIT phone recognition and Aurora2 isolated word recognition tasks. The experimental results show that the proposed methods improve the performance of the PCA, LDA and HLDA transformations for mapping Mel-frequency cepstral coefficients (MFCCs).
19.
An automatic method of inspecting the scaling accuracy of needle-type instrument gauges using a two-stage Hough transform technique is described. The system measures and verifies the relative accuracy of a gauge's response to a specified set of analog input signals. The method does not require that the gauge's position, orientation, or size be known a priori and the algorithm is very suitable for high-speed hardware implementation.
20.
《Digital Signal Processing》2006,16(3):303-319
We describe the use of the discrete wavelet transform (DWT) for non-parametric linear time-invariant system identification. Identification is achieved by using a test excitation to the system under test (SUT) that also acts as the analyzing function for the DWT of the SUT's output, so as to recover the impulse response. The method can use as excitation any signal that gives an orthogonal inner product in the DWT at some step size (which cannot be 1); we favor wavelet scaling coefficients as excitations, with a step size of 2. However, the system impulse or frequency response can then only be estimated at half the available number of points of the sampled output sequence, introducing a multirate problem that means we have to 'oversample' the SUT output. The method has several advantages over existing techniques: it uses a simple, easy-to-generate excitation, and it avoids the singularity problems and the (unbounded) accumulation of round-off errors that can occur with standard techniques. In extensive simulations, identification of a variety of finite and infinite impulse response systems is shown to be considerably better than with conventional system identification methods.