Similar Documents
20 similar documents found (search time: 31 ms)
1.
This paper proposes a new feature extraction technique using wavelet-based sub-band parameters (WBSP) for the classification of unaspirated Hindi stop consonants. The extracted acoustic parameters deviate markedly from the values reported for English and other languages, as Hindi has distinguishing manner-based features. Since acoustic parameters are difficult to extract automatically for speech recognition, Mel Frequency Cepstral Coefficient (MFCC) based features are usually used. MFCCs are based on the short-time Fourier transform (STFT), which assumes the speech signal to be stationary over a short period; this assumption is violated in particular for stop consonants. In WBSP, following the acoustic study, the features derived from CV syllables are given different weighting factors, with the middle segment receiving the maximum. The wavelet transform is applied to split the signal into 8 sub-bands of different bandwidths, and the variation of energy across the sub-bands is also taken into account. WBSP gives improved classification scores, and the number of filters used for feature extraction (8) is smaller than the number used for MFCC (24). Its classification performance has been compared with four other techniques using a linear classifier. Further, principal component analysis (PCA) has also been applied to reduce dimensionality.
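The sub-band energy idea behind WBSP can be sketched with a plain Haar filter bank; the paper's actual wavelet and bandwidth choices are not given here, so the three-level split into 8 equal-width bands and the toy signal are assumptions:

```python
import numpy as np

def haar_split(x):
    """One level of a Haar analysis filter bank: low-pass and high-pass halves."""
    x = x[: len(x) // 2 * 2]                      # force even length
    s = (x[0::2] + x[1::2]) / np.sqrt(2)          # low-pass (approximation)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)          # high-pass (detail)
    return s, d

def subband_energies(x, levels=3):
    """Wavelet-packet style split into 2**levels sub-bands; return log energies."""
    bands = [np.asarray(x, dtype=float)]
    for _ in range(levels):
        bands = [half for b in bands for half in haar_split(b)]
    return np.array([np.log(np.sum(b ** 2) + 1e-12) for b in bands])

# 8-dimensional sub-band energy vector for a toy CV segment at 8 kHz
t = np.arange(512) / 8000.0
segment = np.sin(2 * np.pi * 300 * t) + 0.3 * np.sin(2 * np.pi * 2500 * t)
feat = subband_energies(segment, levels=3)
print(feat.shape)   # (8,)
```

Because the Haar transform is orthonormal, the sub-band energies sum to the total signal energy, so the 8 log-energies form a compact spectral-shape feature.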

2.
This paper investigates the unique pharyngeal and uvular consonants of Arabic from the point of view of automatic speech recognition (ASR). Recognition error rates for these phonemes are compared in five experiments involving different combinations of native and non-native Arabic speakers. The three most confusable consonants for each investigated consonant are discussed. All experiments use the Hidden Markov Model Toolkit (HTK) and the Linguistic Data Consortium (LDC) WestPoint Modern Standard Arabic (MSA) database. Results confirm that these distinct Arabic consonants are a major source of difficulty for Arabic ASR. While the recognition rate for certain of these unique consonants such as // can drop below 35% when uttered by non-native speakers, there is an advantage to including non-native speakers in ASR. Moreover, regional differences in the pronunciation of MSA by native Arabic speakers require the attention of Arabic ASR research.
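A "three most confusable consonants" analysis of this kind can be read off a confusion matrix; the phone set and counts below are invented for illustration, not the paper's data:

```python
import numpy as np

# Hypothetical confusion counts: rows = spoken consonant, cols = recognized.
phones = ["ħ", "ʕ", "χ", "ʁ", "h"]          # pharyngeals/uvulars plus /h/
conf = np.array([
    [50, 12,  8,  3, 27],
    [ 9, 41,  5, 30, 15],
    [ 7,  4, 60, 25,  4],
    [ 2, 22, 19, 55,  2],
    [20,  6,  3,  1, 70],
])

def top_confusions(conf, phones, k=3):
    """For each spoken phone, the k phones it is most often misrecognized as."""
    out = {}
    for i, p in enumerate(phones):
        errors = conf[i].copy()
        errors[i] = -1                       # ignore correct recognitions
        order = np.argsort(errors)[::-1][:k]
        out[p] = [phones[j] for j in order]
    return out

top3 = top_confusions(conf, phones)
print(top3["ħ"])   # ['h', 'ʕ', 'χ']: the three phones /ħ/ is most confused with
```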

3.
The purpose of this paper is to apply genetic algorithms (GAs) at the supervised classification level in order to recognize Standard Arabic (SA) fricative consonants in continuous, naturally spoken speech. GAs were chosen for their advantages in solving complicated optimization problems where analytic methods fail. We analyzed a corpus containing several sentences composed of the thirteen types of fricative consonants in initial, medial, and final positions, recorded by several male Jordanian speakers. Nearly all the world's languages contain at least one fricative sound; SA occupies a rather exceptional position in that nearly half of its consonants are fricatives, and nearly half of its fricative inventory is situated far back in the uvular, pharyngeal, and glottal areas. Mel-frequency cepstral analysis is used to extract vocal-tract coefficients from the speech signal. Among a set of classifiers such as Bayesian, likelihood, and distance classifiers, we used the distance classifier, which is based on a classification measure criterion. We therefore formulate supervised classification as a function optimization problem and use the Mahalanobis distance decision rule as the fitness function for the GA evaluation. We report promising results, with a classification recognition accuracy of 82%.
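A minimal sketch of the Mahalanobis-distance decision rule that serves here as the classification criterion (and as the GA fitness); the 2-D "cepstral" data, class means, and labels are invented toy values:

```python
import numpy as np

def mahalanobis(x, mean, inv_cov):
    """Squared Mahalanobis distance of x to a class distribution."""
    d = x - mean
    return float(d @ inv_cov @ d)

def classify(x, class_stats):
    """Assign x to the class with the minimal Mahalanobis distance."""
    return min(class_stats, key=lambda c: mahalanobis(x, *class_stats[c]))

rng = np.random.default_rng(0)
# Two toy "fricative" classes in a 2-D cepstral space (illustrative numbers only)
a = rng.normal([0.0, 0.0], 0.5, size=(200, 2))
b = rng.normal([3.0, 1.0], 0.5, size=(200, 2))
stats = {
    "s": (a.mean(axis=0), np.linalg.inv(np.cov(a.T))),
    "ʃ": (b.mean(axis=0), np.linalg.inv(np.cov(b.T))),
}
print(classify(np.array([2.8, 1.1]), stats))   # lands in the "ʃ" cluster
```

In a GA setting, a fitness function would score each candidate solution by how well this distance criterion separates the labeled training segments.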

4.
Thanks to the various techniques used in experimental phonetics and to language inventories, more and more has been learned about the nature of stops in the world's languages. Stop consonants occur in all languages, with voiceless unaspirated stops being the most common. Differences in voice onset time (VOT) have been termed lead vs. short lag, where VOT itself is defined as the interval between the release of the occlusion of the vocal tract and the onset of phonation. For Hungarian, no systematic analysis of the stops has been carried out thus far. This paper investigates the acoustic and perceptual properties of the VOTs of the three Hungarian voiceless stops when they appear in isolation (in syllables and in words) and when they occur in spontaneous speech. The results of the acoustic analysis show a clear difference between careful and spontaneous speech. Bilabials and velars are significantly shorter in fluent speech than in careful speech (18.51 ms and 35.31 ms, respectively, as opposed to 24.64 ms and 50.17 ms), while dentals seem unchanged (23.3 ms as opposed to 26.59 ms). The actual duration of VOT is therefore characteristic of the place of articulation of stops in spontaneous speech, and the VOTs of bilabials and dentals do not differ from each other in careful speech. Vowels following the stops influence them more in careful than in spontaneous speech, which can also be explained by the experimentally confirmed tendency of present-day Hungarian vowels to change toward a neutral vowel. Voice onset time is a specific feature of the Hungarian unaspirated plosive consonants. A further experiment was carried out to define the actual function of the VOTs of the voiceless stops in Hungarian listeners' perception.
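VOT measurement itself can be sketched as burst detection followed by voicing-onset detection; the synthetic stop token, the thresholds, and the 10-ms frame size below are assumptions for illustration, not the authors' procedure:

```python
import numpy as np

fs = 16000
rng = np.random.default_rng(1)

# Synthetic voiceless-stop token: closure, release burst, aspiration, voicing
sil = np.zeros(1600)                                   # 100 ms closure silence
burst = 0.6 * rng.standard_normal(160)                 # 10 ms release burst
gap = 0.02 * rng.standard_normal(320)                  # 20 ms weak aspiration
voice = 0.8 * np.sin(2 * np.pi * 150 * np.arange(4800) / fs)   # voiced vowel
x = np.concatenate([sil, burst, gap, voice])

def frame_stats(sig, win=160):                         # 10 ms frames
    frames = sig[: len(sig) // win * win].reshape(-1, win)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    zc = (np.diff(np.signbit(frames), axis=1) != 0).sum(axis=1)
    return rms, zc

rms, zc = frame_stats(x)
burst_frame = int(np.argmax(rms > 0.1))                # first energetic frame
voice_frame = int(np.argmax((rms > 0.1) & (zc < 10)))  # energetic and periodic
vot_ms = (voice_frame - burst_frame) * 10.0
print(vot_ms)   # 30.0: the 10 ms burst plus the 20 ms aspiration interval
```

The zero-crossing count distinguishes the noisy burst (many crossings) from periodic voicing (few crossings), which is why the voicing test combines energy with a low crossing rate.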

5.
6.
The paper proposes a diphone/sub-syllable method for Arabic text-to-speech (ATTS) systems. The proposed approach exploits the particular syllabic structure of Arabic words. For good quality, the boundaries of the speech segments are chosen to occur only in the sustained portion of vowels. The speech segments consist of consonant–half-vowel units, half-vowel–consonant units, half vowels, middle portions of vowels, and suffix consonants. The minimum set consists of about 310 segments for classical Arabic.

7.
Amita Dev, AI & Society, 2009, 23(4): 603-612
Since the development of a speech recognition system depends entirely on the spoken language used to build it, and since speech technology is highly language dependent and reverse engineering is not possible, there is a pressing need to develop such systems for Indian languages. In this paper we present the implementation of a time-delay neural network (TDNN) in a modular fashion, exploiting the hidden structure of previously trained phonetic subcategory networks, for the recognition of Hindi consonants. For the present study we selected all the Hindi phonemes for recognition. A vocabulary of 207 Hindi words was designed for the task-specific environment and used as a database. For phoneme recognition, a three-layer network was constructed and trained using the back-propagation learning algorithm. Experiments were conducted to categorize Hindi voiced and unvoiced stops, semivowels, vowels, nasals, and fricatives. A close observation of the confusion matrix of Hindi stops revealed maximum confusion of retroflex stops with their non-retroflex counterparts.
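The core of a TDNN layer is a window of delayed input frames whose weights are shared across time; a minimal numpy sketch (the layer sizes, delays, and random weights are illustrative, not the paper's architecture):

```python
import numpy as np

def tdnn_layer(x, w, b):
    """Time-delay layer: each output frame sees a sliding window of input frames.
    x: (T, d_in); w: (delay, d_in, d_out); b: (d_out,) -> (T - delay + 1, d_out)."""
    delay, d_in, d_out = w.shape
    T = x.shape[0]
    out = np.empty((T - delay + 1, d_out))
    for t in range(T - delay + 1):
        window = x[t : t + delay]                     # (delay, d_in) receptive field
        out[t] = np.tanh(np.tensordot(window, w, axes=([0, 1], [0, 1])) + b)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((20, 16))                     # 20 frames of 16 spectral features
h = tdnn_layer(x, 0.1 * rng.standard_normal((3, 16, 8)), np.zeros(8))   # 3-frame delay
y = tdnn_layer(h, 0.1 * rng.standard_normal((5, 8, 4)), np.zeros(4))    # 5-frame delay
print(y.shape)   # (14, 4): 20 - 3 + 1 = 18 frames, then 18 - 5 + 1 = 14
```

Stacking such layers widens the temporal context per output unit while keeping the network shift-invariant, which is what makes the modular phonetic-subcategory design possible.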

8.
This paper presents a new hybrid method for continuous Arabic speech recognition based on triphone modelling. To this end, we apply Support Vector Machines (SVM) as estimators of posterior probabilities within standard Hidden Markov Models (HMM). We describe a new approach of categorising Arabic vowels into long and short vowels, applied in the labeling phase of the speech signals. Using this new labeling method, we find that the SVM/HMM hybrid model is more efficient than standard HMMs and than the hybrid Multi-Layer Perceptron (MLP)/HMM system. The recognition rates obtained for the triphone-based Arabic speech recognition system are 64.68% with HMMs, 72.39% with MLP/HMM, and 74.01% with the SVM/HMM hybrid model. The word error rates (WER) obtained for continuous speech recognition by the three systems confirm the performance of SVM/HMM, which achieves the lowest average WER, 11.42%, over the 4 tested speakers.
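Hybrid systems of this kind typically convert classifier posteriors into scaled likelihoods for the HMM by dividing by the state priors; a sketch with invented posteriors (the paper does not give its exact formulation, so this is the generic hybrid recipe):

```python
import numpy as np

# Hypothetical SVM posteriors P(state | frame) for 3 triphone states, 4 frames
post = np.array([
    [0.7, 0.2, 0.1],
    [0.5, 0.4, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
])
priors = post.mean(axis=0)     # stand-in for priors from training alignments

# Scaled likelihood p(frame | state) ∝ P(state | frame) / P(state): these
# replace the Gaussian-mixture emission densities in the HMM decoder.
log_emission = np.log(post / priors)
best_states = np.argmax(log_emission, axis=1)
print(best_states)   # frame-wise best states: [0 0 1 2]
```

Dividing by the priors removes the class-frequency bias of the discriminative model, so the Viterbi decoder can combine these scores with transition probabilities as usual.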

9.
10.
To provide speech prostheses for individuals with severe communication impairments, we investigated a classification method for brain-computer interfaces (BCIs) using silent speech. Event-related potentials (ERPs) were recorded with scalp electrodes while five subjects imagined the vocalization of the Japanese vowels /a/, /i/, /u/, /e/, and /o/, in order and in random order, remaining silent and immobilized. We applied the relevance vector machine (RVM) and the RVM with Gaussian kernel (RVM-G) in place of the support vector machine with Gaussian kernel (SVM-G) to reduce the computational cost when using 19 channels, common spatial pattern (CSP) filtering, and adaptive collection (AC). Results show that using RVM-G instead of SVM-G reduced the ratio of the number of efficient vectors to the number of training data from 97% to 55%. The averaged classification accuracies (CAs) using SVM-G and RVM-G were 77% and 79%, respectively, showing no degradation. However, the calculation cost was higher than with SVM-G, because RVM-G requires an expensive optimization step. Furthermore, CAs using RVM-G were weaker than with SVM-G when training data were scarce. The results also showed that nonlinear classification is necessary for silent speech classification. This paper serves as an initial feasibility study for speech prostheses using an imagined voice; although classification of silent speech shows great potential, many feasibility problems remain.
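The CSP filtering mentioned above jointly diagonalizes the covariance matrices of two ERP classes so that the leading filters maximize the variance ratio between classes; a numpy sketch under the assumption of full-rank covariances (the matrices below are random stand-ins, not EEG data):

```python
import numpy as np

def csp(c1, c2):
    """Common spatial patterns: a matrix W with W C1 W.T and W C2 W.T both
    diagonal and summing to the identity (joint diagonalization)."""
    evals, U = np.linalg.eigh(c1 + c2)
    P = (U / np.sqrt(evals)).T                # whitening: P (C1+C2) P.T = I
    lam, B = np.linalg.eigh(P @ c1 @ P.T)
    return B.T @ P                            # rows ordered by class-1 variance

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
C1 = A @ A.T + 6 * np.eye(6)                  # stand-in class-1 covariance
Bm = rng.standard_normal((6, 6))
C2 = Bm @ Bm.T + np.eye(6)                    # stand-in class-2 covariance
W = csp(C1, C2)
D1, D2 = W @ C1 @ W.T, W @ C2 @ W.T
print(np.allclose(D1 + D2, np.eye(6)))        # True: variances are complementary
```

A filter with high variance for one class necessarily has low variance for the other, which is why the log-variances of the first and last few CSP components make good features for the downstream SVM or RVM.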

11.
12.
13.
This paper proposes hybrid classification models and preprocessing methods for enhancing consonant-vowel (CV) recognition in the presence of background noise. Background noise is one of the major degradations in real-time environments and strongly affects the performance of speech recognition systems. In this work, combined temporal and spectral processing (TSP) methods are explored as preprocessing to improve CV recognition performance. The proposed CV recognition method is carried out at two levels to reduce the similarity among the large number of CV classes: in the first level the vowel category of the CV unit is recognized, and in the second level the consonant category is recognized. At each level, complementary evidence from hybrid models consisting of support vector machines (SVM) and hidden Markov models (HMM) is combined to enhance recognition performance. The performance of the proposed CV recognition system is evaluated on a Telugu broadcast database for white and vehicle noise. The proposed preprocessing methods and hybrid classification models improve recognition performance compared to existing methods.
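The two-level decision with combined complementary evidence can be sketched as a weighted score fusion followed by an argmax at each level; the classes, scores, and the equal weighting are all invented for illustration:

```python
import numpy as np

vowels = ["a", "i", "u"]
consonants = ["k", "t", "p"]

# Hypothetical per-level scores from the two models for one CV token
svm_vowel = np.array([0.6, 0.3, 0.1]); hmm_vowel = np.array([0.5, 0.4, 0.1])
svm_cons  = np.array([0.2, 0.7, 0.1]); hmm_cons  = np.array([0.3, 0.6, 0.1])

def combine(p, q, w=0.5):
    """Weighted sum of complementary evidences (w would be tuned on held-out data)."""
    return w * p + (1 - w) * q

v = vowels[int(np.argmax(combine(svm_vowel, hmm_vowel)))]      # level 1: vowel
c = consonants[int(np.argmax(combine(svm_cons, hmm_cons)))]    # level 2: consonant
print(c + v)   # recognized CV unit: "ta"
```

Deciding the vowel first shrinks the candidate set for the consonant stage, which is the stated motivation for the two-level design.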

14.
This article describes an unrestricted-vocabulary text-to-speech (TTS) conversion system for the synthesis of Standard Arabic (SA) speech. The system synthesizes Arabic from short phonetic clusters derived from the Arabic syllables. Basic and phonetic variants of the synthesis units are defined after qualitative and quantitative analyses of the phonetics of SA. A speech database of the synthesis units and their phonetic variations is created, and the units are tested to control their segmental quality. Besides the types of synthesis unit used, their enhancement with phonetic variants, and their segmental quality control, the production of good-quality speech also depends on waveform analysis and on the method used to concatenate the synthesis units. Waveform analysis is needed to condition the selected synthesis units at their junctures so that the synthesized speech is of better quality. The types of speech juncture between contiguous units, the phonetic characteristics of the sounds surrounding the junctures, and the concatenation artifacts occurring across the junctures are discussed. The results of waveform analysis and of the smoothing algorithms are presented. The intelligibility of the synthesized Arabic, measured by a standard intelligibility test method adapted to the Arabic phonetic characteristics, and the scoring of the test results are also dealt with.
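One common way to condition units at their junctures is a short linear crossfade over an overlap region; this is a generic smoothing sketch (the crossfade length and toy units are assumptions, not the article's algorithm):

```python
import numpy as np

def concatenate(seg1, seg2, fs=16000, fade_ms=10):
    """Join two synthesis units with a linear crossfade at the juncture,
    reducing the click a raw splice would produce."""
    n = int(fs * fade_ms / 1000)
    fade = np.linspace(0.0, 1.0, n)
    overlap = seg1[-n:] * (1 - fade) + seg2[:n] * fade
    return np.concatenate([seg1[:-n], overlap, seg2[n:]])

fs = 16000
t = np.arange(int(0.05 * fs)) / fs
u1 = np.sin(2 * np.pi * 200 * t)            # end of one vowel unit
u2 = np.sin(2 * np.pi * 200 * t + 0.4)      # start of the next, phase-mismatched
out = concatenate(u1, u2, fs)
print(len(out))   # 800 + 800 - 160 = 1440 samples
```

With the crossfade, the sample-to-sample jump at the juncture stays close to the natural slope of the waveform, whereas a raw splice of the same two units leaves an audible discontinuity.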

15.
16.
Studies of human speech processing have provided evidence for a segmentation strategy in the perception of continuous speech, whereby a word boundary is postulated, and a lexical access procedure initiated, at each metrically strong syllable. The likely success of this strategy was estimated here against the characteristics of the English vocabulary. Two computerized dictionaries were found to list approximately three times as many words beginning with strong syllables (i.e. syllables containing a full vowel) as words beginning with weak syllables (i.e. syllables containing a reduced vowel). Consideration of the frequency of lexical word occurrence reveals that words beginning with strong syllables occur on average more often than words beginning with weak syllables. Together, these findings motivate the estimate that, in everyday speech, approximately 85% of lexical words (i.e. excluding function words) will begin with strong syllables. This estimate was tested against a corpus of 190,000 words of spontaneous British English conversation. In this corpus, 90% of lexical words were found to begin with strong syllables. This suggests that a strategy of postulating word boundaries at the onset of strong syllables would have a high success rate, in that few actual lexical word onsets would be missed.
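The strategy itself is easy to simulate: postulate a boundary at every strong (full-vowel) syllable and compare against the true word onsets; the syllable tagging and onset positions below are an invented toy utterance, not corpus data:

```python
# Syllables of a toy utterance tagged S (strong, full vowel) or W (weak, reduced)
syllables = ["W", "S", "W", "W", "S", "S", "W", "W", "W", "S", "W"]
true_onsets = {1, 4, 5, 8}        # where lexical words actually begin (toy data)

# Metrical segmentation strategy: a word boundary at every strong syllable
predicted = {i for i, s in enumerate(syllables) if s == "S"}
found = predicted & true_onsets
missed = true_onsets - predicted
print(f"found {len(found)}/{len(true_onsets)} onsets, missed {len(missed)}, "
      f"false alarms {len(predicted - true_onsets)}")
```

In this toy run the single missed onset is a weak-initial word, mirroring the article's point that such words are the minority and so the strategy misses few onsets overall.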

17.
The goal of this article is the application of genetic algorithms (GAs) to the automatic speech recognition (ASR) domain at the level of acoustic sequence classification. Speech recognition is cast as a pattern classification problem in which we would like to classify an input acoustic signal into one of all possible phonemes, and supervised classification is formulated as a function optimization problem. Thus, we attempt to recognize Standard Arabic (SA) phonemes of continuous, naturally spoken speech using GAs, which have several advantages in solving complicated optimization problems. SA has 40 sounds. We analyzed a corpus containing several sentences composed of all the SA phoneme types in initial, medial, and final positions, recorded by several male speakers. The classification of acoustic segments with GAs was then explored. Among a set of classifiers such as Bayesian, likelihood, and distance classifiers, we used the distance classifier, which is based on a classification measure criterion. We therefore used the Manhattan distance decision rule as the fitness function for the GA evaluations. The corpus phonemes were extracted and classified successfully, with an overall accuracy of 90.20%.
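How a GA can drive such a distance classifier can be sketched as a mutation-plus-elitism loop whose fitness is the negative total Manhattan distance of labeled samples to per-class templates; the data, population size, and operators are all illustrative assumptions, not the article's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy labeled "acoustic segments": two phoneme classes around known centers
X = np.vstack([rng.normal(0.0, 0.3, (40, 2)), rng.normal(2.0, 0.3, (40, 2))])
y = np.array([0] * 40 + [1] * 40)

def fitness(chrom):
    """Negative total Manhattan distance of samples to their class template."""
    t = chrom.reshape(2, 2)                       # one 2-D template per class
    return -np.abs(X - t[y]).sum()

pop = rng.uniform(-1.0, 3.0, (30, 4))             # 30 chromosomes, 4 genes each
for _ in range(200):
    scores = np.array([fitness(c) for c in pop])
    parents = pop[np.argsort(scores)[-10:]]       # truncation selection
    pop = parents[rng.integers(0, 10, 30)] + rng.normal(0, 0.05, (30, 4))
    pop[0] = parents[-1]                          # elitism: keep the best as-is
best = pop[0].reshape(2, 2)

# Classify every sample by its nearest template under Manhattan distance
pred = np.argmin(np.abs(X[:, None, :] - best[None]).sum(axis=2), axis=1)
print((pred == y).mean())
```

The GA searches the template space; the Manhattan distance criterion both scores chromosomes during evolution and serves as the final decision rule.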

18.
19.
20.
Pronunciation variation is a major obstacle to improving the performance of Arabic automatic continuous speech recognition systems. This phenomenon alters the pronunciation of words beyond their listed forms in the pronunciation dictionary, leading to a number of out-of-vocabulary word forms. This paper presents a direct data-driven approach to modeling within-word pronunciation variation, in which the pronunciation variants are distilled from the training speech corpus. The proposed method performs phoneme recognition, followed by a sequence alignment between the observation phonemes generated by the phoneme recognizer and the reference phonemes obtained from the pronunciation dictionary. The unique collected variants are then added to the dictionary as well as to the language model. We started with a baseline Arabic speech recognition system based on the Sphinx3 engine, built on a 5.4-hour speech corpus of Modern Standard Arabic broadcast news with a pronunciation dictionary of 14,234 canonical pronunciations; the baseline system achieves a word error rate of 13.39%. Our results show that while the expanded dictionary alone does not add appreciable improvements, the word error rate is significantly reduced, by 2.22%, when the variants are also represented within the language model.
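The alignment step can be sketched with a standard Levenshtein alignment between the reference and recognized phoneme sequences, from which a variant is read off; the example word and the dropped short vowel are invented for illustration:

```python
def align(ref, obs):
    """Levenshtein alignment; '-' marks a deletion or insertion."""
    n, m = len(ref), len(obs)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j - 1] + (ref[i - 1] != obs[j - 1]),
                          D[i - 1][j] + 1, D[i][j - 1] + 1)
    pairs, i, j = [], n, m                 # backtrace from the bottom-right cell
    while i > 0 or j > 0:
        if i and j and D[i][j] == D[i - 1][j - 1] + (ref[i - 1] != obs[j - 1]):
            pairs.append((ref[i - 1], obs[j - 1])); i -= 1; j -= 1
        elif i and D[i][j] == D[i - 1][j] + 1:
            pairs.append((ref[i - 1], "-")); i -= 1
        else:
            pairs.append(("-", obs[j - 1])); j -= 1
    return pairs[::-1]

# Dictionary says /k a t a b a/; the phoneme recognizer heard /k t a b a/:
aligned = align(list("kataba"), list("ktaba"))
variant = " ".join(o for _, o in aligned if o != "-")
print(aligned)    # the pair ('a', '-') exposes a deleted short vowel
print(variant)    # "k t a b a" would be collected as a pronunciation variant
```

Each unique observation-side sequence collected this way becomes a candidate variant to add to the dictionary and the language model.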


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号