20 similar documents found; search took 15 ms.
1.
Shima Tabibian, Ahmad Akbari, Babak Nasersharif 《Engineering Applications of Artificial Intelligence》2013,26(7):1660-1670
Keyword spotting refers to the detection of all occurrences of a given keyword in input speech utterances. In this paper, we define a keyword spotter as a binary classifier that separates the class of sentences containing a target keyword from the class of sentences that do not. To discriminate between these classes, an efficient classification method and a suitable feature set are required. For the classification method, we propose an evolutionary algorithm to train the separating hyperplane between the two classes. As our discriminative feature set, we propose two confidence measure functions: the first computes the possibility of each phoneme's presence in the speech frames, and the second determines the duration of each phoneme. We define these functions based on acoustic, spectral and statistical features of speech. Results on TIMIT indicate that the proposed evolutionary discriminative keyword spotter has lower computational complexity and higher speed in both the training and test phases than an SVM-based discriminative keyword spotter. Additionally, the proposed system is robust in noisy conditions.
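A minimal Python sketch of the central idea, evolving a separating hyperplane (w, b) over sentence-level confidence features, is given below. It is not the authors' exact algorithm; the population scheme, mutation scale and the synthetic stand-in data are illustrative assumptions.

    import numpy as np

    def evolve_hyperplane(X, y, pop_size=40, generations=200, sigma=0.1, seed=0):
        """Evolve (w, b) so that sign(w @ x + b) separates the two classes.
        X: (n_samples, n_features) sentence-level confidence features
        y: labels in {-1, +1} (keyword absent / present)."""
        rng = np.random.default_rng(seed)
        dim = X.shape[1] + 1                        # weights plus bias
        pop = rng.normal(size=(pop_size, dim))      # random initial hyperplanes

        def fitness(individual):
            w, b = individual[:-1], individual[-1]
            return np.mean(y * (X @ w + b) > 0)     # classification accuracy

        for _ in range(generations):
            scores = np.array([fitness(p) for p in pop])
            parents = pop[np.argsort(scores)[-pop_size // 2:]]    # keep best half
            children = parents + rng.normal(scale=sigma, size=parents.shape)
            pop = np.vstack([parents, children])
        best = max(pop, key=fitness)
        return best[:-1], best[-1]

    # toy usage with random two-class data standing in for TIMIT-derived features
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0.5, 1, (100, 10)), rng.normal(-0.5, 1, (100, 10))])
    y = np.array([1] * 100 + [-1] * 100)
    w, b = evolve_hyperplane(X, y)
    print(np.mean(np.sign(X @ w + b) == y))          # training accuracy of the evolved plane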
2.
3.
Lior Weizman 《International Journal of Remote Sensing》2013,34(21):5605-5617
In this study we apply a variant of a recently proposed linear subspace method, Neighbourhood Component Analysis (NCA), to the task of hyperspectral classification. The NCA algorithm explicitly uses the classification performance criterion to obtain the optimal linear projection, and it assumes nothing about the form of each class or the shape of the separating surfaces. In some cases we would like to weight different types of misclassification differently in the penalty function; a modification of the NCA cost function is introduced for this case. Experimental studies are conducted on hyperspectral images acquired by two sensors, the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) and AISA-EAGLE. The results confirm the superiority of the NCA classifier over previously suggested methodologies in the context of hyperspectral data classification.
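For reference, the standard NCA objective maximizes the expected leave-one-out accuracy of a stochastic nearest-neighbour classifier under a linear map A. One natural way to weight misclassification types (a sketch, not necessarily the paper's exact modification) is to attach a class-pair weight to the stochastic assignment probabilities:

    p_{ij} = \frac{\exp\left(-\lVert A x_i - A x_j \rVert^2\right)}
                  {\sum_{k \neq i} \exp\left(-\lVert A x_i - A x_k \rVert^2\right)}, \qquad p_{ii} = 0,

    f(A) = \sum_i \sum_{j:\, c_j = c_i} p_{ij}
    \quad\longrightarrow\quad
    f_w(A) = \sum_i \sum_j w(c_i, c_j)\, p_{ij},

where w(c_i, c_j) rewards or penalizes particular class confusions.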
4.
This article presents an approach for the automatic recognition of non-native speech. Some non-native speakers tend to pronounce phonemes as they would in their native language. Model adaptation can improve the recognition rate for non-native speakers, but has difficulty dealing with pronunciation errors such as phoneme insertions or substitutions. For these pronunciation mismatches, pronunciation modeling can make the recognition system more robust. Our approach combines acoustic model transformation and pronunciation modeling for multiple non-native accents. For acoustic model transformation, two approaches are evaluated: MAP adaptation and model re-estimation. For pronunciation modeling, confusion rules (alternate pronunciations) are automatically extracted from a small non-native speech corpus, and a novel method is presented for introducing these automatically learned rules into the recognition system. The modified HMM of a phoneme of the spoken foreign language includes its canonical pronunciation along with all the alternate non-native pronunciations, so that phonemes pronounced correctly by a non-native speaker can still be recognized. We evaluate our approaches on the European project HIWIRE non-native corpus, which contains English sentences pronounced by French, Italian, Greek and Spanish speakers. Two cases are studied: the native language of the test speaker is either known or unknown. When the foreign origin of the speaker is known, our approach gives better recognition results than classical acoustic adaptation of the HMMs, with a 22% WER reduction compared to the reference system.
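A minimal Python sketch of how such confusion rules could be used to expand a pronunciation lexicon is shown below; the rule format, phone symbols and example word are hypothetical illustrations, not the HIWIRE setup.

    # Hypothetical confusion rules learned from a non-native corpus:
    # a canonical phone may be kept, substituted, or deleted (None).
    confusion_rules = {
        "th": ["th", "s", "z"],     # e.g. /th/ realized as /s/ or /z/
        "h":  ["h", None],          # /h/ often dropped
    }

    def expand_pronunciation(canonical, rules):
        """Return every alternate pronunciation implied by the rules."""
        variants = [[]]
        for phone in canonical:
            options = rules.get(phone, [phone])
            variants = [v + [p] for v in variants for p in options]
        # drop deleted phones (None)
        return [[p for p in v if p is not None] for v in variants]

    print(expand_pronunciation(["h", "e", "th"], confusion_rules))
    # [['h', 'e', 'th'], ['h', 'e', 's'], ['h', 'e', 'z'],
    #  ['e', 'th'], ['e', 's'], ['e', 'z']]

In the approach described above, these alternates would be merged into the phoneme's HMM alongside the canonical pronunciation rather than listed in a lexicon.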
5.
Tao Ma, Sundararajan Srinivasan, Georgios Lazarou, Joseph Picone 《International Journal of Speech Technology》2014,17(1):11-16
Hidden Markov models (HMMs) with Gaussian mixture distributions rely on the assumption that speech features are temporally uncorrelated, and often assume a diagonal covariance matrix, so that correlations between feature vectors of adjacent frames are ignored. A Linear Dynamic Model (LDM) is a Markovian state-space model that also relies on hidden-state modeling, but explicitly models the evolution of the hidden states using an autoregressive process. An LDM is capable of modeling higher-order statistics and can exploit correlations between features in an efficient and parsimonious manner. In this paper, we present a hybrid LDM/HMM decoder architecture that post-processes segmentations derived from the first pass of an HMM-based recognizer. This smoothed trajectory model is complementary to existing HMM systems. An Expectation-Maximization (EM) approach for parameter estimation is presented. We demonstrate a 13% relative WER reduction on the Aurora-4 clean evaluation set, and a 13% relative WER reduction on the babble noise condition.
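In its standard form, the linear dynamic model referred to here is a Gaussian state-space model (generic notation, not necessarily the paper's exact parameterization):

    x_t = F\, x_{t-1} + \eta_t, \qquad \eta_t \sim \mathcal{N}(0, Q),
    y_t = H\, x_t + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, R),

where y_t is the observed feature vector, x_t the hidden state evolving autoregressively through F, and H the observation matrix; EM estimates the parameters {F, H, Q, R} from the HMM-derived segments.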
6.
This paper describes various techniques for speaker normalization and for adapting a knowledge base or reference templates to new speakers in automatic speech recognition (ASR). It focuses on a technique for learning spectral transformations, based on a statistical-analysis tool (canonical correlation analysis), to adapt a standard dictionary to arbitrary speakers. The proposed method should make it possible to improve speaker independence in large-vocabulary ASR. Applied to an isolated-word recognizer, it raised a 70% correct score to 87%. A dynamic aspect of the speaker adaptation procedure is also introduced and evaluated within a particular strategy.
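A minimal sketch of a CCA-based spectral mapping using scikit-learn (not the paper's original tooling); the paired frames X (new speaker) and Y (reference speaker) are synthetic stand-ins for time-aligned utterances, and the dimensionalities are illustrative.

    import numpy as np
    from sklearn.cross_decomposition import CCA

    rng = np.random.default_rng(0)
    Y = rng.normal(size=(500, 24))                                   # reference spectral frames
    X = Y @ rng.normal(size=(24, 24)) + 0.1 * rng.normal(size=(500, 24))  # new-speaker frames

    cca = CCA(n_components=12)
    cca.fit(X, Y)                    # learn the spectral transformation from paired frames
    Y_hat = cca.predict(X)           # map new-speaker spectra toward the reference space
    print(Y_hat.shape)               # (500, 24)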
7.
Automatic recognition of children's speech is a challenging topic in computer-based speech recognition systems. The conventional feature extraction method, namely Mel-frequency cepstral coefficients (MFCCs), is not efficient for children's speech recognition. This paper proposes a novel fuzzy-based discriminative feature representation to address the recognition of Malay vowels uttered by children. Because acoustic speech parameters vary with age, the performance of automatic speech recognition (ASR) systems degrades on children's speech. To solve this problem, this study addresses the representation of relevant and discriminative features for children's speech recognition. The proposed method extracts MFCCs with a narrower filter bank, followed by a fuzzy-based feature selection method that provides relevant, discriminative and complementary features. For this purpose, conflicting objective functions measuring the goodness of the features have to be fulfilled; a fuzzy formulation of the problem and fuzzy aggregation of the objectives are therefore used to address the uncertainties involved. The proposed method can reduce the dimensionality without compromising the speech recognition rate. To assess its capability, the study analyzed six Malay vowels from recordings of 360 children aged 7 to 12. After feature extraction, two well-known classification methods, MLP and HMM, were employed for the speech recognition task, with optimal parameter adjustment performed for each classifier. The experiments were conducted in a speaker-independent manner. The proposed method performed better than conventional MFCCs and a number of conventional feature selection methods on the children's speech recognition task. The fuzzy-based feature selection allowed flexible selection of the MFCCs with the best discriminative ability, enhancing the difference between the vowel classes.
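The general shape of such a fuzzy aggregation (a sketch in generic notation; the paper's concrete membership functions are not reproduced here) is to map each conflicting objective of a candidate feature subset S to a membership value and select the subset maximizing their aggregate:

    \mu(S) = \mathcal{A}\big(\mu_1(S), \mu_2(S)\big), \qquad S^{*} = \arg\max_{S} \mu(S),

where \mu_1 and \mu_2 score, for example, discriminative ability and compactness, and \mathcal{A} is a fuzzy aggregation operator such as the minimum or the product.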
8.
Pawan K. Ajmera, Dattatray V. Jadhav, Raghunath S. Holambe 《Pattern Recognition》2011,44(10-11):2749-2759
This paper presents a new feature extraction technique for speaker recognition using the Radon transform (RT) and the discrete cosine transform (DCT). The spectrogram is a compact, efficient representation that carries information about acoustic features in the form of patterns. In the proposed method, speaker-specific features are extracted by applying image processing techniques to the patterns in the spectrogram. The Radon transform, which sums the pixel values of an image along straight lines at particular directions and displacements, is used to derive effective acoustic features from the speech spectrogram. The proposed technique computes Radon projections for seven orientations to capture the acoustic characteristics of the spectrogram, and applying the DCT to these projections yields a low-dimensional feature vector. The technique is computationally efficient, text-independent, robust to session variations and insensitive to additive noise. Its performance has been evaluated on the Texas Instruments and Massachusetts Institute of Technology (TIMIT) database and on our own Shri Guru Gobind Singhji (SGGS) database. The recognition rate of the proposed algorithm is 96.69% on the TIMIT database (630 speakers) and 98.41% on the SGGS database (151 speakers). These results highlight the superiority of the proposed method over some existing algorithms.
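A minimal Python sketch of the processing chain (spectrogram, Radon projections at seven angles, then DCT) using scipy/scikit-image; the frame parameters, angle set and the number of retained coefficients are illustrative assumptions, not the paper's exact settings.

    import numpy as np
    from scipy.signal import spectrogram
    from scipy.fft import dct
    from skimage.transform import radon

    def radon_dct_features(speech, fs, angles=(0, 30, 60, 90, 120, 150, 179),
                           n_coeffs=20):
        # time-frequency pattern of the utterance
        _, _, spec = spectrogram(speech, fs, nperseg=256, noverlap=128)
        spec = np.log1p(spec)                                # compress dynamic range
        # Radon projections of the spectrogram along seven orientations
        proj = radon(spec, theta=np.asarray(angles, dtype=float), circle=False)
        # DCT of each projection; keep the low-order coefficients
        feats = dct(proj, axis=0, norm="ortho")[:n_coeffs]
        return feats.ravel()

    fs = 16000
    speech = np.random.randn(fs)                              # stand-in for a real utterance
    print(radon_dct_features(speech, fs).shape)               # (140,) = 20 coefficients x 7 angles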
9.
Linear Discriminant Analysis (LDA) has been widely used to extract linear features for classification. In real applications, the usefulness of the extracted features usually needs to be confirmed through the error rate of a classifier; little attention has been paid to whether and how the discriminative features themselves can be interpreted as indicators of usefulness. We refer to this as relevance, i.e., the capability of discriminative features to characterize the contribution of the original variables to classification. We approach relevance by considering how it can be lost while extracting optimal discriminative features. The discrepancy between the relevance and the optimality of discriminative features is shown to originate from the “angle” between the space spanned by the eigenvectors of the within-class scatter matrix and the primary space in which the original variables reside: for a given dataset, the larger this “angle”, the less evident the relevance discovered from optimal discriminative features. Relevance and optimality are then regarded as two constraint conditions, or a tradeoff, for extracting relevant discriminative features. Finally, a simulated experiment shows how relevance is lost as the “angle” changes. Experimental results on both the USPS handwritten digits and the PIE face databases show that the maximum margin criterion is a reasonable compromise between relevance and optimality, since it approximates the average class margin using Euclidean distance measured in the primary space.
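For reference, the two criteria being weighed against each other can be written in their standard forms, with S_b and S_w the between- and within-class scatter matrices:

    W_{\mathrm{LDA}}^{*} = \arg\max_{W} \operatorname{tr}\!\big[(W^{\top} S_w W)^{-1} (W^{\top} S_b W)\big],
    \qquad
    W_{\mathrm{MMC}}^{*} = \arg\max_{W^{\top} W = I} \operatorname{tr}\!\big[W^{\top} (S_b - S_w) W\big].

The LDA criterion depends on the eigenstructure of S_w (hence on the "angle" discussed above), whereas the maximum margin criterion measures class separation with Euclidean distances in the primary space, which is why it trades some optimality for relevance.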
10.
When evaluating transportation infrastructure projects and deciding, under a budget constraint, which of them will be carried out from a given set, several criteria need to be considered. Standard evaluation practices aggregate the impacts into one utility function, which is then optimized; nevertheless, the techniques used to translate different measuring units into monetary terms are highly controversial. Multicriteria techniques can deal explicitly with different measuring units, but they are not well suited to modeling the interdependence of projects that share a common characteristic (the same route, location or target population, for instance). In this research we model this transportation planning problem, the multi-objective transportation infrastructure project selection problem (MTIPSP), as a constrained multi-objective optimization problem with quadratic objective functions, using a variation of the multi-objective 0–1 knapsack problem plus some additional constraints. Given the combinatorial nature of the problem, an evolutionary framework is used to identify Pareto solutions, and those with unattractive properties are then filtered out using a knee identification procedure. The final selection of the project portfolio is made using a well-known multicriteria decision aid method, taking into account the decision makers' preferences in the existing context.
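A sketch of the underlying multi-objective 0–1 knapsack structure with quadratic terms for project interdependence (generic symbols, not the paper's exact notation):

    \max_{x \in \{0,1\}^n} \; f_k(x) = \sum_{i} v_{ik}\, x_i + \sum_{i<j} q_{ijk}\, x_i x_j, \qquad k = 1, \dots, K,
    \quad \text{s.t.} \quad \sum_i c_i x_i \le B \;\; \text{(plus additional side constraints)},

where x_i indicates whether project i is selected, v_{ik} is its benefit under criterion k, q_{ijk} captures the interaction between interdependent projects i and j, c_i is the project cost and B the budget.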
11.
Learning the influence of environmental parameters, such as additive noise and channel distortions, from training data is an effective approach to robust speech recognition. Most previous methods are based on the maximum likelihood estimation criterion; however, these methods do not lead to a minimum error rate. In this paper, a novel discriminative learning method for environmental parameters, based on the Minimum Classification Error (MCE) criterion, is proposed. In this method, a simple classifier and the Generalized Probabilistic Descent (GPD) algorithm are adopted to iteratively learn the environmental parameters. The clean speech features are then estimated from the noisy speech features using the estimated environmental parameters, and these estimates are passed to the back-end HMM classifier. Experiments on a task of 18 isolated confusable Korean words show that a best error rate reduction of 32.1% is obtained relative to a conventional HMM system.
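The standard MCE/GPD machinery referred to, in generic notation (a sketch, not the paper's exact formulation): a misclassification measure for the correct class c is embedded in a smooth loss, and the environmental parameters θ are updated by probabilistic descent,

    d_c(x; \theta) = -g_c(x; \theta) + \log\!\left[\frac{1}{M-1}\sum_{j \neq c} e^{\eta\, g_j(x;\theta)}\right]^{1/\eta},
    \qquad
    \ell(d) = \frac{1}{1 + e^{-\gamma d}},
    \qquad
    \theta_{t+1} = \theta_t - \varepsilon_t\, \nabla_\theta\, \ell\big(d_c(x_t; \theta_t)\big),

where the g_j are class discriminant functions and ε_t is a decreasing step size.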
12.
The aim of the paper is to propose two efficient algorithms for the numerical evaluation of the Hankel transform of order ν, ν > −1, using Legendre and rationalized Haar (RH) wavelets. The idea behind the algorithms is to replace the part x f(x) of the integrand by its wavelet decomposition, obtained using Legendre wavelets for the first algorithm and RH wavelets for the second, thus representing F_ν(y) as a Fourier–Bessel series whose coefficients depend strongly on the input function x f(x) in both cases. Numerical evaluations of test functions with known analytical Hankel transforms illustrate the proposed algorithms.
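For reference, the transform being evaluated is, in its standard definition,

    F_\nu(y) = \int_0^{\infty} x f(x)\, J_\nu(xy)\, dx, \qquad \nu > -1,

and replacing x f(x) by a truncated wavelet expansion \sum_n c_n \psi_n(x) (Legendre or RH wavelets on the support of f) gives the approximation

    F_\nu(y) \approx \sum_n c_n \int \psi_n(x)\, J_\nu(xy)\, dx,

a series whose coefficients c_n depend only on the input function, as stated above.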
13.
14.
Edgar Alfredo Portilla-Flores, Efrén Mezura-Montes 《Engineering Applications of Artificial Intelligence》2011,24(5):757-771
Parametric reconfiguration plays a key role in the non-iterative concurrent design of mechatronic systems, because it allows the designer to select, among different competing solutions, the most suitable one without giving up sub-optimal alternatives. This paper presents a method based on an evolutionary algorithm to improve the parametric reconfiguration feature in the optimal design of a continuously variable transmission and a five-bar parallel robot. The approach combines a solution-diversity mechanism with a memory of the sub-optimal solutions found during the process, and a constraint-handling mechanism is added to bias the search towards the feasible region of the search space. Differential Evolution is used as the search algorithm. The results obtained in a set of five experiments per mechatronic system show the effectiveness of the proposed approach.
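A compact Python sketch of Differential Evolution with Deb-style feasibility rules for constraint handling is given below. It illustrates the search mechanism only; the diversity and memory mechanisms described above, and the actual mechatronic design objectives, are not reproduced, and the toy problem is an assumption for demonstration.

    import numpy as np

    def de_feasibility(objective, constraints, bounds, pop_size=30,
                       F=0.7, CR=0.9, generations=300, seed=0):
        """DE/rand/1/bin with feasibility rules: a feasible solution beats an
        infeasible one; infeasible solutions are compared by total violation."""
        rng = np.random.default_rng(seed)
        lo, hi = np.asarray(bounds, dtype=float).T
        dim = len(lo)
        pop = rng.uniform(lo, hi, size=(pop_size, dim))

        def violation(x):                      # constraints use the g(x) <= 0 convention
            return sum(max(0.0, g(x)) for g in constraints)

        def better(a, b):
            va, vb = violation(a), violation(b)
            if va == 0 and vb == 0:
                return objective(a) < objective(b)
            return va < vb

        for _ in range(generations):
            for i in range(pop_size):
                r1, r2, r3 = rng.choice([j for j in range(pop_size) if j != i],
                                        size=3, replace=False)
                mutant = np.clip(pop[r1] + F * (pop[r2] - pop[r3]), lo, hi)
                cross = rng.random(dim) < CR
                cross[rng.integers(dim)] = True           # ensure at least one gene crosses
                trial = np.where(cross, mutant, pop[i])
                if better(trial, pop[i]):
                    pop[i] = trial
        return min(pop, key=lambda x: (violation(x), objective(x)))

    # toy usage: minimize x^2 + y^2 subject to x + y >= 1 (i.e. 1 - x - y <= 0)
    best = de_feasibility(lambda x: x @ x, [lambda x: 1 - x[0] - x[1]],
                          bounds=[(-5, 5), (-5, 5)])
    print(best)                                            # approximately [0.5, 0.5]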
15.
Existing object recognition methods can be classified into two categories: interest-point-based and discriminative-part-based. Interest-point-based methods do not perform well unless the interest points are selected very carefully, while the performance of discriminative-part-based methods is not stable under viewpoint changes, because they select discriminative parts from the interest points; in addition, discriminative-part-based methods often do not provide an incremental learning ability. To address these problems, we propose a novel method that consists of three phases. First, we use sliding windows of different scales to retrieve a number of local parts from each model object and extract a feature vector for each retrieved part. Next, we construct prototypes for the model objects from these feature vectors, each prototype representing a discriminative part of a model object. Then, we establish the correspondence between the local parts of a test object and those of the model objects, and compute the similarity between the test object and each model object based on this correspondence; the test object is recognized as the model object with the highest similarity. The experimental results show that the proposed method outperforms or is comparable with the compared methods in terms of recognition rates on the COIL-100, Oxford Buildings and ETH-80 datasets, and recognizes all query images of the ZuBuD dataset. It is robust to distortion, occlusion, rotation, and viewpoint and illumination changes. In addition, we accelerate the recognition process using the C4.5 decision tree technique, and the proposed method can build prototypes incrementally.
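A minimal numpy sketch of the final matching step only, scoring a test object against each model by nearest-prototype similarity; the feature extraction, prototype construction and the toy data are assumptions for illustration.

    import numpy as np

    def object_similarity(test_parts, model_prototypes):
        """test_parts: (n, d) feature vectors of the test object's local parts.
        model_prototypes: dict mapping model name -> (m, d) prototype array.
        Returns the best-matching model and the per-model scores."""
        scores = {}
        a = test_parts / np.linalg.norm(test_parts, axis=1, keepdims=True)
        for name, protos in model_prototypes.items():
            b = protos / np.linalg.norm(protos, axis=1, keepdims=True)
            sim = a @ b.T                            # cosine similarity, parts x prototypes
            scores[name] = sim.max(axis=1).mean()    # best prototype per part, averaged
        return max(scores, key=scores.get), scores

    # toy usage with random features standing in for two hypothetical models
    rng = np.random.default_rng(0)
    models = {"mug": rng.normal(0, 1, (30, 64)), "car": rng.normal(2, 1, (30, 64))}
    test = rng.normal(2, 1, (12, 64))                # parts resembling the "car" model
    print(object_similarity(test, models)[0])        # "car" expected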
16.
Tsorng-Lin Chia, Kuang-Bor Wang, Zen Chen, Der-Chyuan Lou 《Information Processing Letters》2002,82(2):73-81
Distance transformation (DT) has been widely used for image matching and shape analysis. In this paper, a parallel algorithm for computing the distance transformation is presented. First, it is shown that the algorithm has an execution time of 6N − 4 cycles for an N×N image, using a parallel architecture that requires ⌈N/2⌉ parallel processors; the real-time requirement is thus fulfilled and the execution time is independent of the image contents. In addition, a partition method is developed to process an image when the parallel architecture has a fixed number of processing elements (PEs), say two or more. The total execution time for an N×N image with a fixed number of PEs is 2[N²/M + 2(M − 1)], where M is the number of PEs.
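To make the two cost expressions concrete (the values of N and M below are illustrative): for N = 512, the fully parallel version takes 6N − 4 = 3068 cycles using ⌈N/2⌉ = 256 processors, while the partitioned version with a fixed pool of M = 16 PEs takes 2[512²/16 + 2·15] = 2[16384 + 30] = 32,828 cycles.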
17.
Traditionally the FFT (fast Fourier transform) has been used in speech recognition algorithms. Other discrete transforms, such as the Walsh–Hadamard transform (WHT) and the rapid transform (RT), can play equally important roles in the recognition process, as they have advantages in implementation and hardware realization. The capability of these transforms to recognize phonemes based on training matrices and various matching criteria is investigated. The speech database consists of ten sentences spoken by ten different speakers (all male). For recognition purposes the speech is sectioned into 10 ms intervals and sampled at 20 kHz. Training matrices for all three transforms are developed, and test matrices in the transform domain are compared with the prototypes using the matching criteria to reach a decision. The WHT and RT appear to offer promise compared to the FFT, as they are easier to implement and yield comparable recognition results. Other distance measures and recognition schemes are proposed for improving the classification accuracy.
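A minimal Python sketch of extracting per-frame FFT and Walsh–Hadamard features, as a rough illustration of the transform-domain representations compared above; the frame length of 256 samples (about 12.8 ms at 20 kHz, chosen because the WHT needs a power-of-two length) and the random stand-in signal are assumptions.

    import numpy as np
    from scipy.linalg import hadamard

    def frame_features(speech, frame_len=256):
        """Magnitude-FFT and Walsh-Hadamard coefficients for each frame."""
        H = hadamard(frame_len)                        # natural-ordered WHT matrix
        n_frames = len(speech) // frame_len
        fft_feats, wht_feats = [], []
        for i in range(n_frames):
            frame = speech[i * frame_len:(i + 1) * frame_len]
            fft_feats.append(np.abs(np.fft.rfft(frame)))
            wht_feats.append(H @ frame / frame_len)    # normalized WHT coefficients
        return np.array(fft_feats), np.array(wht_feats)

    fs = 20000                                          # 20 kHz sampling, as in the paper
    speech = np.random.randn(fs)                        # stand-in for one second of speech
    F, W = frame_features(speech)
    print(F.shape, W.shape)                             # (78, 129) (78, 256)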
18.
Behzad Zamani, Ahmad Akbari, Babak Nasersharif, Azarakhsh Jalalvand 《Pattern Recognition Letters》2011,32(7):948-955
Feature extraction is an important component of pattern classification and speech recognition. Extracted features should discriminate classes from each other while being robust to environmental conditions such as noise. For this purpose, several feature transformations have been proposed, which can be divided into two main categories: data-dependent and classifier-dependent transformations. The drawback of data-dependent transformations is that their optimization criteria differ from the measure of classification error, which can potentially degrade the classifier's performance. In this paper, we propose a framework to optimize data-dependent feature transformations such as PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis) and HLDA (Heteroscedastic LDA) using minimum classification error (MCE) as the main objective, with a Hidden Markov Model (HMM) classifier. In the proposed HMM minimum classification error technique, the transformation matrices are modified to minimize the classification error for the mapped features, and the dimension of the feature vector is not changed. To evaluate the proposed methods, we conducted several experiments on the TIMIT phone recognition and Aurora2 isolated word recognition tasks. The experimental results show that the proposed methods improve the performance of the PCA, LDA and HLDA transformations for mapping Mel-frequency cepstral coefficients (MFCCs).
19.
An automatic method of inspecting the scaling accuracy of needle-type instrument gauges using a two-stage Hough transform technique is described. The system measures and verifies the relative accuracy of a gauge's response to a specified set of analog input signals. The method does not require that the gauge's position, orientation, or size be known a priori and the algorithm is very suitable for high-speed hardware implementation.
20.
《Digital Signal Processing》2006,16(3):303-319
We describe the use of the discrete wavelet transform (DWT) for non-parametric linear time-invariant system identification. Identification is achieved by using a test excitation to the system under test (SUT) that also acts as the analyzing function for the DWT of the SUT's output, so as to recover the impulse response. The method can use as excitation any signal that gives an orthogonal inner product in the DWT at some step size (which cannot be 1); we favor wavelet scaling coefficients as excitations, with a step size of 2. However, the system impulse or frequency response can then only be estimated at half the available number of points of the sampled output sequence, introducing a multirate problem that means we have to 'oversample' the SUT output. The method has several advantages over existing techniques: it uses a simple, easy-to-generate excitation, and it avoids the singularity problems and the (unbounded) accumulation of round-off errors that can occur with standard techniques. In extensive simulations, identification of a variety of finite and infinite impulse response systems is shown to be considerably better than with conventional system identification methods.