1.
This article applies genetic algorithms (GAs) to automatic speech recognition (ASR) at the level of acoustic sequence classification. Speech recognition is cast as a pattern classification problem in which an input acoustic signal is assigned to one of the possible phonemes, and supervised classification is in turn formulated as a function optimization problem. We therefore attempt to recognize Standard Arabic (SA) phonemes in continuous, naturally spoken speech using GAs, which have several advantages in solving complicated optimization problems. SA has 40 sounds. We analyzed a corpus of sentences covering all SA phoneme types in initial, medial, and final positions, recorded by several male speakers, and explored both acoustic segment classification and GAs. Among classifiers such as the Bayesian, likelihood, and distance classifiers, we used the distance classifier, which is based on the classification measure criterion. Accordingly, we used the Manhattan distance decision rule as the fitness function for our GA evaluations. The corpus phonemes were extracted and classified with an overall accuracy of 90.20%.
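The Manhattan distance decision rule mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the feature vectors, the class labels, and the function names `manhattan`, `classify`, and `fitness` are all hypothetical, and the GA itself (selection, crossover, mutation) is omitted.

```python
def manhattan(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def classify(features, prototypes):
    """Return the label whose prototype is nearest in Manhattan distance."""
    return min(prototypes, key=lambda label: manhattan(features, prototypes[label]))


def fitness(candidate, class_samples):
    """Score a candidate prototype for one class; higher is better.

    Assumption: a GA would maximize this value, i.e. minimize the total
    Manhattan distance from the candidate to the class's training samples.
    """
    return -sum(manhattan(candidate, sample) for sample in class_samples)


# Hypothetical 3-dimensional feature vectors for two phoneme classes.
prototypes = {"a": [1.0, 2.0, 3.0], "s": [4.0, 0.0, 1.0]}
label = classify([1.5, 2.0, 2.0], prototypes)
```

Here `label` is the class whose prototype lies closest to the input vector; a GA would use a score such as `fitness` to rank candidate prototypes within a population.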
2.
Text-to-speech conversion has traditionally been performed either by concatenating short samples of speech or by using rule-based
systems to convert a phonetic representation of speech into an acoustic representation, which is then converted into speech.
This paper describes a text-to-speech synthesis system for Modern Standard Arabic based on artificial neural networks and a
residual-excited LPC coder. The networks offer a storage-efficient means of synthesis without the need for explicit rule enumeration.
These neural networks require large prosodically labeled continuous speech databases in their training stage. As such databases
are not available for the Arabic language, we have developed one for this purpose and discuss the stages of its
development. In addition to the interpolation capabilities of neural networks, a linear interpolation of the
coder parameters is performed to create smooth transitions at segment boundaries. A residual-excited all-pole vocal tract
model and a prosodic-information synthesizer based on neural networks are also described in this paper.
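The linear interpolation of coder parameters at segment boundaries might look like the sketch below. The abstract does not say which parameter representation is interpolated or how many frames are inserted, so the function name `interpolate_frames`, the `steps` argument, and the sample vectors are assumptions made for illustration.

```python
def interpolate_frames(left, right, steps):
    """Return `steps` parameter vectors evenly spaced between `left` and `right`.

    The endpoints themselves are excluded: they are assumed to be the last
    frame of one segment and the first frame of the next.
    """
    if len(left) != len(right):
        raise ValueError("parameter vectors must have the same length")
    frames = []
    for k in range(1, steps + 1):
        t = k / (steps + 1)  # fraction of the way from left to right
        frames.append([(1 - t) * a + t * b for a, b in zip(left, right)])
    return frames


# Hypothetical two-parameter frames on either side of a segment boundary.
transition = interpolate_frames([0.0, 2.0], [4.0, 0.0], 3)
```

Each intermediate frame moves a fixed fraction of the way from the left parameter vector to the right one, which is what smooths the transition.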
3.
Automatic detection of articulation disorders from children’s speech is of major interest for the diagnosis and monitoring of articulation disorder therapy. In this work, LPC (linear prediction cepstrum) acoustic features are used with two of the most commonly used classifiers, GMM-UBM (Gaussian mixture model-universal background model) and SVM (support vector machine). We stack the means of the GMM-UBM model to form a mean supervector and feed the resulting supervector to the SVM system. The main contributions of this paper are the use of automatic speaker recognition techniques to detect articulation disorders from children’s speech and the investigation of the performance gained by a hybrid strategy combining the GMM-UBM and SVM systems. A series of experiments is conducted, and the results of the different experiments are presented and evaluated. We find that this method is effective and robust.
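Stacking the GMM means into a supervector amounts to concatenating the per-component mean vectors in a fixed order. The sketch below shows only that step; the name `mean_supervector` and the toy means are hypothetical, and the UBM adaptation and SVM training described in the abstract are not shown.

```python
def mean_supervector(component_means):
    """Concatenate the mean vectors of all mixture components, in order."""
    return [value for mean in component_means for value in mean]


# Hypothetical GMM with three components and 2-dimensional features.
adapted_means = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
supervector = mean_supervector(adapted_means)
```

With C components and D-dimensional features, the result has C × D entries, so every utterance maps to a fixed-length vector that an SVM can consume.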
4.
This paper applies genetic algorithms (GAs) at the supervised classification level in order
to recognize Standard Arabic (SA) fricative consonants in continuous, naturally spoken speech. We use GAs because of
their advantages in solving complicated optimization problems where analytic methods fail. To that end, we analyzed a
corpus of sentences containing the thirteen types of fricative consonants in initial, medial, and final
positions, recorded by several male Jordanian speakers. Nearly all the world’s languages contain at least one fricative sound.
SA occupies a rather exceptional position in that nearly half of its consonants are fricatives and nearly half
of the fricative inventory is articulated far back, in the uvular, pharyngeal, and glottal areas. We used the Mel-frequency cepstral
analysis method to extract vocal tract coefficients from the speech signal. Among classifiers such as the Bayesian, likelihood,
and distance classifiers, we used the distance classifier, which is based on the classification measure criterion. We formulate
supervised classification as a function optimization problem and use the Mahalanobis distance decision rule as
the fitness function for the GA evaluation. We report promising results, with a classification accuracy of 82%.
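For reference, the Mahalanobis distance used as the decision rule can be sketched as below. To stay dependency-free, this version assumes a diagonal covariance matrix (one variance per feature dimension); that simplification, the function name, and the sample values are assumptions, not details from the paper, which may use a full covariance matrix.

```python
import math


def mahalanobis_diag(x, mean, variances):
    """Mahalanobis distance from x to mean, assuming diagonal covariance.

    Each squared difference is divided by that dimension's variance,
    so high-variance features contribute less to the distance.
    """
    if not (len(x) == len(mean) == len(variances)):
        raise ValueError("x, mean and variances must have the same length")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    return math.sqrt(sum((xi - mi) ** 2 / vi
                         for xi, mi, vi in zip(x, mean, variances)))


# Hypothetical 2-dimensional cepstral feature vector and class statistics.
distance = mahalanobis_diag([3.0, 8.0], [0.0, 0.0], [1.0, 4.0])
```

With unit variances this reduces to the Euclidean distance; unlike the Manhattan rule, it rescales each dimension by the spread of the class.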