首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
2.
3.
4.
Text-to-speech system (TTS), known also as speech synthesizer, is one of the important technology in the last years due to the expanding field of applications. Several works on speech synthesizer have been made on English and French, whereas many other languages, including Arabic, have been recently taken into consideration. The area of Arabic speech synthesis has not sufficient progress and it is still in its first stage with a low speech quality. In fact, speech synthesis systems face several problems (e.g. speech quality, articulatory effect, etc.). Different methods were proposed to solve these issues, such as the use of large and different unit sizes. This method is mainly implemented with the concatenative approach to improve the speech quality and several works have proved its effectiveness. This paper presents an efficient Arabic TTS system based on statistical parametric approach and non-uniform units speech synthesis. Our system includes a diacritization engine. Modern Arabic text is written without mention the vowels, called also diacritic marks. Unfortunately, these marks are very important to define the right pronunciation of the text which explains the incorporation of the diacritization engine to our system. In this work, we propose a simple approach based on deep neural networks. Deep neural networks are trained to directly predict the diacritic marks and to predict the spectral and prosodic parameters. Furthermore, we propose a new simple stacked neural network approach to improve the accuracy of the acoustic models. Experimental results show that our diacritization system allows the generation of full diacritized text with high precision and our synthesis system produces high-quality speech.  相似文献   

5.
This paper presents a new technique of high accuracy to recognize both typewritten and handwritten English and Arabic texts without thinning. After segmenting the text into lines (horizontal segmentation) and the lines into words, it separates the word into its letters. Separating a text line (row) into words and a word into letters is performed by using the region growing technique (implicit segmentation) on the basis of three essential lines in a text row. This saves time as there is no need to skeletonize or to physically isolate letters from the tested word whilst the input data involves only the basic information—the scanned text. The baseline is detected, the word contour is defined and the word is implicitly segmented into its letters according to a novel algorithm described in the paper. The extracted letter with its dots is used as one unit in the system of recognition. It is resized into a 9 × 9 matrix following bilinear interpolation after applying a lowpass filter to reduce aliasing. Then the elements are scaled to the interval [0,1]. The resulting array is considered as the input to the designed neural network. For typewritten texts, three types of Arabic letter fonts are used—Arial, Arabic Transparent and Simplified Arabic. The results showed an average recognition success rate of 93% for Arabic typewriting. This segmentation approach has also found its application in handwritten text where words are classified with a relatively high recognition rate for both Arabic and English languages. The experiments were performed in MATLAB and have shown promising results that can be a good base for further analysis and considerations of Arabic and other cursive language text recognition as well as English handwritten texts. For English handwritten classification, a success rate of about 80% in average was achieved while for Arabic handwritten text, the algorithm performance was successful in about 90%. The recent results have shown increasing success for both Arabic and English texts.  相似文献   

6.
The Arabic alphabet is used in around 27 languages, including Arabic, Persian, Kurdish, Urdu, and Jawi. Many researchers have developed systems for recognizing cursive handwritten Arabic words, using both holistic and segmentation-based approaches. This paper introduces a system that achieves high accuracy using efficient segmentation, feature extraction, and recurrent neural network (RNN). We describe a robust rule-based segmentation algorithm that uses special feature points identified in the word skeleton to segment the cursive words into graphemes. We show that careful selection from a wide range of features extracted during and after the segmentation stage produces a feature set that significantly reduces the label error. We demonstrate that using same RNN recognition engine, the segmentation approach with efficient feature extraction gives better results than a holistic approach that extracts features from raw pixels. We evaluated this segmentation approach against an improved version of the holistic system MDLSTM that won the ICDAR 2009 Arabic handwritten word recognition competition. On the IfN/ENIT database of handwritten Arabic words, the segmentation approach reduces the average label error by 18.5 %, the sequence error by 22.3 %, and the execution time by 31 %, relative to MDLSTM. This approach also has the best published accuracies on two IfN/ENIT test sets.  相似文献   

7.
8.
9.
10.
11.
The absence of short vowels in Arabic texts is the source of some difficulties in several automatic processing systems of Arabic language. Several developed hybrid systems of automatic diacritization of the Arabic texts are presented and evaluated in this paper. All these approaches are based on three phases: a morphological step followed by statistical phases based on Hidden Markov Model at the word level and at the character level. The two versions of the morpho-syntactic analyzer Alkhalil were used and tested and the outputs of this stage are the different possible diacritizations of words. A lexical database containing the most frequent words in the Arabic language has been incorporated into some systems in order to make the system faster. The learning step was performed on a large Arabic corpus and the impact of the size of this learning corpus on the performance of the system was studied. The systems use smoothing techniques to circumvent the problem of missing transitions words and the Viterbi algorithm to select the optimal solution. Our proposed system that benefits from the wealth of morphological analysis and a large diacritized corpus presents interesting experimental results in comparison to other automatic diacritization systems known until now.  相似文献   

12.
13.
In this paper we are concerned with the problem of the adaptation of non-native speech in a large-vocabulary speech recognition system for Modern Standard Arabic (MSA). A technique to adapt Hidden Markov Models (HMMs) to foreign accents by using Genetic Algorithms (GAs) in unsupervised mode is presented. The implementation requirements of GAs, such as genetic operators and objective function, have been selected to give more reliability to a global linear transformation matrix. The Minimum Phone Error (MPE) criterion is used as an objective function. The West Point Language Data Consortium (LDC) modern standard Arabic database is used throughout our experiments. Results show that significant decrease of word error rate has been achieved by the evolutionary-based approach compared to conventional Maximum Likelihood Linear Regression (MLLR), Maximum a posteriori (MAP) techniques and to the adaptation combining MLLR and MPE-based training.  相似文献   

14.
In this paper we investigated Artificial Neural Networks (ANN) based Automatic Speech Recognition (ASR) by using limited Arabic vocabulary corpora. These limited Arabic vocabulary subsets are digits and vowels carried by specific carrier words. In addition to this, Hidden Markov Model (HMM) based ASR systems are designed and compared to two ANN based systems, namely Multilayer Perceptron (MLP) and recurrent architectures, by using the same corpora. All systems are isolated word speech recognizers. The ANN based recognition system achieved 99.5% correct digit recognition. On the other hand, the HMM based recognition system achieved 98.1% correct digit recognition. With vowels carrier words, the MLP and recurrent ANN based recognition systems achieved 92.13% and 98.06, respectively, correct vowel recognition; but the HMM based recognition system achieved 91.6% correct vowel recognition.  相似文献   

15.
Pronunciation variation is a major obstacle in improving the performance of Arabic automatic continuous speech recognition systems. This phenomenon alters the pronunciation spelling of words beyond their listed forms in the pronunciation dictionary, leading to a number of out of vocabulary word forms. This paper presents a direct data-driven approach to model within-word pronunciation variations, in which the pronunciation variants are distilled from the training speech corpus. The proposed method consists of performing phoneme recognition, followed by a sequence alignment between the observation phonemes generated by the phoneme recognizer and the reference phonemes obtained from the pronunciation dictionary. The unique collected variants are then added to dictionary as well as to the language model. We started with a Baseline Arabic speech recognition system based on Sphinx3 engine. The Baseline system is based on a 5.4 hours speech corpus of modern standard Arabic broadcast news, with a pronunciation dictionary of 14,234 canonical pronunciations. The Baseline system achieves a word error rate of 13.39%. Our results show that while the expanded dictionary alone did not add appreciable improvements, the word error rate is significantly reduced by 2.22% when the variants are represented within the language model.  相似文献   

16.
The paper proposes a diphone/sub-syllable method for Arabic Text-to-Speech (ATTS) systems. The proposed approach exploits the particular syllabic structure of the Arabic words. For good quality, the boundaries of the speech segments are chosen to occur only at the sustained portion of vowels. The speech segments consists of consonants–half vowels, half vowel–consonants, half vowels, middle portion of vowels, and suffix consonants. The minimum set consists of about 310 segments for classical Arabic.  相似文献   

17.
This paper presents a new hybrid method for continuous Arabic speech recognition based on triphones modelling. To do this, we apply Support Vectors Machine (SVM) as an estimator of posterior probabilities within the Hidden Markov Models (HMM) standards. In this work, we describe a new approach of categorising Arabic vowels to long and short vowels to be applied on the labeling phase of speech signals. Using this new labeling method, we deduce that SVM/HMM hybrid model is more efficient then HMMs standards and the hybrid system Multi-Layer Perceptron (MLP) with HMM. The obtained results for the Arabic speech recognition system based on triphones are 64.68 % with HMMs, 72.39 % with MLP/HMM and 74.01 % for SVM/HMM hybrid model. The WER obtained for the recognition of continuous speech by the three systems proves the performance of SVM/HMM by obtaining the lowest average for 4 tested speakers 11.42 %.  相似文献   

18.
19.
20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号