Similar literature
4.
The goal of this article is to apply genetic algorithms (GAs) to automatic speech recognition (ASR) at the level of acoustic-sequence classification. Speech recognition is cast as a pattern classification problem in which an input acoustic signal is assigned to one of the possible phonemes, and supervised classification is formulated as a function optimization problem. We therefore attempt to recognize Standard Arabic (SA) phonemes in continuous, naturally spoken speech using GAs, which offer several advantages for solving complicated optimization problems. SA has 40 sounds. We analyzed a corpus of sentences covering all SA phoneme types in initial, medial, and final positions, recorded by several male speakers. The classification of acoustic segments with GAs is then explored. Among candidate classifiers such as Bayesian, likelihood, and distance classifiers, we chose the distance classifier, which is based on a classification measure criterion; the Manhattan distance decision rule serves as the fitness function in our GA evaluations. The corpus phonemes were extracted and classified successfully with an overall accuracy of 90.20%.
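The distance classifier described in this abstract can be sketched as follows: each candidate phoneme has a reference feature template, and a frame is assigned to the phoneme whose template minimizes the Manhattan (L1) distance. This is a minimal illustrative sketch; the function names, feature dimensionality, and templates are assumptions, not taken from the paper.

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def classify(frame, templates):
    """Assign the frame to the phoneme label whose reference template
    minimizes the Manhattan distance (the GA fitness criterion here)."""
    return min(templates, key=lambda label: manhattan(frame, templates[label]))

# Toy two-dimensional templates for two hypothetical phonemes.
templates = {"a": [1.0, 0.0], "i": [0.0, 1.0]}
print(classify([0.9, 0.1], templates))  # closest to "a"
```

In the GA setting, each chromosome would encode a candidate template and this distance would score its fitness against labeled frames.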

5.
This paper describes the principles behind the AZBAT phonetic alphabet, which was created by analogy with the DARPAbet phonetic alphabet of English and is oriented toward creating a speech corpus and a system for Chechen speech synthesis. The experience of developers of other phonetic alphabets and databases was drawn upon, and account was taken of the features of pronunciation and orthography, rules of compatibility, and variability of phonemes described in the works of well-known Chechen philologists. A classification of vowel and consonant phonemes is given, under which each phoneme has the attributes necessary to implement the program code. The designed Chechen speech synthesis system is assigned a basic set of acoustic-phonetic elements consisting of diphones and allophones. This set will be used to compile an acoustic-phonetic database, the basis of a system for automatic synthesis of Chechen speech.

6.
Pronunciation variation is a major obstacle to improving the performance of Arabic automatic continuous speech recognition systems. This phenomenon alters the pronunciation of words beyond the forms listed in the pronunciation dictionary, producing a number of out-of-vocabulary word forms. This paper presents a direct data-driven approach to modeling within-word pronunciation variation, in which the pronunciation variants are distilled from the training speech corpus. The proposed method performs phoneme recognition, followed by a sequence alignment between the phonemes generated by the phoneme recognizer and the reference phonemes obtained from the pronunciation dictionary. The unique collected variants are then added to the dictionary as well as to the language model. We started with a baseline Arabic speech recognition system based on the Sphinx3 engine, built on a 5.4-hour speech corpus of Modern Standard Arabic broadcast news with a pronunciation dictionary of 14,234 canonical pronunciations; the baseline achieves a word error rate of 13.39%. Our results show that while the expanded dictionary alone did not add appreciable improvement, the word error rate is reduced by a significant 2.22% when the variants are also represented in the language model.
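The variant-distillation step described in this abstract, aligning recognizer output against the dictionary pronunciation and collecting substitutions as candidate variants, can be sketched with a standard sequence aligner. This is a hedged illustration: the paper does not specify the alignment algorithm, and the phoneme strings below are invented examples, not from its Arabic corpus.

```python
import difflib

def pronunciation_variants(reference, observed):
    """Align the dictionary (reference) phoneme sequence with the phoneme
    recognizer's output and collect substituted spans as candidate
    pronunciation variants."""
    sm = difflib.SequenceMatcher(a=reference, b=observed)
    variants = []
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == "replace":
            variants.append((tuple(reference[i1:i2]), tuple(observed[j1:j2])))
    return variants

ref = ["k", "i", "t", "a", "b"]   # canonical dictionary pronunciation (toy)
obs = ["k", "i", "t", "e", "b"]   # what the phoneme recognizer produced (toy)
print(pronunciation_variants(ref, obs))  # [(('a',), ('e',))]
```

Unique variants collected this way would then be appended to the dictionary and, per the paper's finding, also injected into the language model.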

8.
This article describes an unrestricted-vocabulary text-to-speech (TTS) system for synthesizing Standard Arabic (SA) speech. The system synthesizes Arabic from short phonetic clusters derived from Arabic syllables. Basic and phonetic variants of the synthesis units are defined after qualitative and quantitative analyses of SA phonetics, and a speech database of the synthesis units and their phonetic variants is created and tested to control segmental quality. Beyond the choice of synthesis units, their enhancement with phonetic variants, and segmental quality control, producing good-quality speech also depends on waveform analysis and on the method used to concatenate the synthesis units. Waveform analysis is needed to condition the selected units at their junctures so that the synthesized speech is of better quality. The types of juncture between contiguous units, the phonetic characteristics of the sounds surrounding the junctures, and the concatenation artifacts occurring across them are discussed, and the results of waveform analysis and smoothing algorithms are presented. The intelligibility of the synthesized Arabic is evaluated with a standard intelligibility test adapted to Arabic phonetic characteristics, and the scoring of the test results is also addressed.
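One common way to smooth the concatenation artifacts this abstract mentions is a linear crossfade across the juncture of two units. The sketch below is a generic illustration of that idea, not the paper's actual smoothing algorithm, which is not specified in the abstract; the sample values are toy data.

```python
def crossfade(a, b, overlap):
    """Concatenate two waveforms (lists of samples) with a linear crossfade
    over `overlap` samples to smooth the juncture between synthesis units."""
    assert 0 < overlap <= min(len(a), len(b))
    head, tail = a[:len(a) - overlap], b[overlap:]
    mixed = [
        a[len(a) - overlap + i] * (1 - i / overlap) + b[i] * (i / overlap)
        for i in range(overlap)
    ]
    return head + mixed + tail

# Joining a unit ending at amplitude 1.0 to one starting at 0.0:
print(crossfade([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], 2))  # [1.0, 1.0, 0.5, 0.0]
```

Real concatenative synthesizers typically align the overlap to pitch periods (e.g. PSOLA-style) rather than raw sample counts; this sketch ignores that refinement.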

10.
Research on grapheme-to-phoneme conversion in German speech synthesis
Grapheme-to-phoneme conversion is a difficult problem that every German speech synthesis system must solve. It can be addressed with a rule-driven iterative finite-state transducer. In this algorithm, grapheme-to-phoneme conversion rules are first formulated on the basis of a lexicon; the iterative finite-state transducer then applies these rules to convert all graphemes of a German word into phonemes. Tested against the whole lexicon, the algorithm achieves a word-level conversion accuracy of 94.4%.
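The rule-driven conversion described above can be sketched as an ordered, longest-match-first rewrite over the grapheme string. This is a minimal sketch, not the paper's transducer: the rule table below covers only a handful of German grapheme clusters, and the phoneme symbols are illustrative SAMPA-like labels.

```python
# Ordered rewrite rules: (grapheme cluster, phoneme), longest match first.
RULES = [
    ("sch", "S"),
    ("ei", "aI"),
    ("ch", "x"),
    ("w", "v"),
    ("n", "n"),
]

def g2p(word):
    """Scan the word left to right, applying the first matching rule at each
    position; unmatched letters pass through unchanged."""
    phonemes, i = [], 0
    while i < len(word):
        for graph, phon in RULES:
            if word.startswith(graph, i):
                phonemes.append(phon)
                i += len(graph)
                break
        else:  # no rule matched: pass the letter through
            phonemes.append(word[i])
            i += 1
    return phonemes

print(g2p("schwein"))  # ['S', 'v', 'aI', 'n']
```

A real transducer would also condition rules on left/right context and iterate until a fixed point, which this left-to-right sketch omits.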

11.
Phono-semantic compounding (xingsheng) is a major character-formation method and accounts for the largest family of Chinese characters. When these characters were created, the semantic component indicated meaning and the phonetic component indicated pronunciation. Over time, however, the phonetic reliability of the phonetic components has shifted, making it harder to read characters accurately from their components. This paper applies cluster analysis to the 3,500 most common Mandarin characters, combining linguistic theory with computational methods. Detailed grading criteria are defined according to whether the phonetic component's reading is identical to, similar to, or different from the character's reading, yielding a list of phono-semantic characters and percentage statistics for each grade. The result is a systematic, intuitive, and comprehensive picture of the phonetic reliability of phonetic components in modern Chinese characters, intended to inform character standardization and Chinese language teaching.
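The identical/similar/different grading described above can be sketched by comparing a character's pinyin with its phonetic component's pinyin. The example readings below are real, but the three-way criterion (same reading; same syllable with different tone; otherwise different) is a simplification assumed for illustration, not the paper's full grading scheme.

```python
def grade(char_pinyin, component_pinyin):
    """Grade a phono-semantic character's phonetic component: 'identical'
    (same syllable and tone), 'similar' (same syllable, different tone),
    or 'different'. Pinyin is written with the tone as a trailing digit."""
    if char_pinyin == component_pinyin:
        return "identical"
    if char_pinyin.rstrip("12345") == component_pinyin.rstrip("12345"):
        return "similar"
    return "different"

print(grade("xi1", "xi1"))       # e.g. 惜 vs 昔: identical
print(grade("qing2", "qing1"))   # e.g. 晴 vs 青: similar
print(grade("jiang1", "gong1"))  # e.g. 江 vs 工: different
```

The paper's actual criteria are finer-grained (e.g. treating initial/final overlap separately), so this sketch only conveys the shape of the classification.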

12.
Spoken word recognition models use the initial phoneme of a word to activate candidate words beginning with that phoneme, so classifying the initial phoneme into a phonetic group is critical. This paper describes an artificial neural network (ANN) based approach to recognizing the initial consonant phonemes of Assamese words. A self-organizing map (SOM) based algorithm is developed to segment the initial phoneme from the rest of the word. Using a combination of three ANN structures, namely a recurrent neural network (RNN), a SOM, and a probabilistic neural network (PNN), the proposed algorithm outperforms conventional discrete wavelet transform (DWT) based phoneme segmentation. The algorithm is designed around the Assamese phonemic structure, whose phonemes have certain unique features and fall into six distinct phoneme families. Before SOM-based segmentation, an RNN makes a localized decision to classify each word into one of the six phoneme families; the SOM-segmented phonemes are then classified into individual phonemes, and a two-class PNN classification with clean Assamese phonemes recognizes the segmented phonemes. Recognized phonemes are validated by matching the phoneme's first formant frequency. Formant frequencies of Assamese phonemes, estimated from the pole (formant) locations of a linear-prediction model of the vocal tract, are used effectively as prior knowledge in the proposed algorithm.

17.
With the goal of establishing a baseline platform for Uyghur continuous phoneme recognition, several key language-dependent components were studied for the first time on top of HTK (the hidden-Markov-model toolkit). Drawing on the linguistic characteristics of Uyghur, base texts were designed for language modeling and for building a speech corpus, and a fairly large speech corpus was recorded to concrete technical specifications. Phonemes were chosen as the modeling units and Uyghur acoustic models were trained. Using a letter-based N-gram language model, recognition results from speech sentences to letter-sequence sentences were obtained. Recognition rates for the 32 Uyghur phonemes were computed, and easily confused phonemes were identified and their sources analyzed, laying a foundation for further improving the recognition rate.
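A letter-based N-gram language model of the kind used above can be sketched, for N=2, as bigram counts with add-one smoothing. This is a generic illustration under assumed inputs: the training strings and boundary symbols below are toy examples, not Uyghur corpus data.

```python
from collections import Counter

def train_bigrams(sentences):
    """Count letter bigrams, padding each sentence with boundary symbols."""
    counts, unigrams = Counter(), Counter()
    for s in sentences:
        letters = ["<s>"] + list(s) + ["</s>"]
        counts.update(zip(letters, letters[1:]))
        unigrams.update(letters[:-1])  # contexts only
    return counts, unigrams

def bigram_prob(counts, unigrams, prev, cur, vocab_size):
    """P(cur | prev) with add-one (Laplace) smoothing."""
    return (counts[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)

counts, unigrams = train_bigrams(["ana", "an"])
p = bigram_prob(counts, unigrams, "a", "n", vocab_size=4)
print(p)  # (2 + 1) / (3 + 4)
```

In decoding, such letter probabilities would rescore the acoustic model's hypotheses to produce the letter-sequence output the abstract describes.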

18.
To address the limited size of pronunciation dictionaries in Russian speech synthesis and recognition systems, this paper proposes a Russian word transcription algorithm based on a long short-term memory (LSTM) sequence-to-sequence model, and designs and implements a prototype transcription system. First, a SAMPA-based Russian phoneme set was redesigned so that transcriptions reflect word stress position and vowel reduction, and a 20,000-word Russian pronunciation dictionary was built with the new phoneme set. The algorithm, implemented with the TensorFlow framework, encodes a Russian word into a fixed-dimensional vector with an encoder LSTM and decodes that vector into the target phoneme sequence with a decoder LSTM. Finally, a Russian word transcription system with interactive transcription features was implemented. Experimental results show that the algorithm reaches 74.8% word accuracy and 94.5% phoneme accuracy on an out-of-vocabulary test set, both higher than the Phonetisaurus method. The system can effectively support the construction of Russian pronunciation dictionaries.

19.
In phoneme recognition experiments, approximately 75% of misclassified frames were found to be assigned labels within the same broad phonetic group (BPG). While the phoneme is the smallest distinguishable unit of speech, phonemes within a BPG share very similar characteristics and are easily confused, whereas different BPGs, such as vowels and stops, have very different spectral and temporal characteristics. To accommodate the full range of phonemes, the acoustic models of speech recognition systems compute input features from all frequencies over a large temporal context window. A new phoneme classifier is proposed, consisting of a modular arrangement of experts, with one expert assigned to each BPG and focused on discriminating between phonemes within that BPG. Because each BPG has a different temporal and spectral structure, mutual information is used to select a relevant time-frequency (TF) feature set for each expert. To construct a phone recognition system, the output of each expert is combined with a baseline classifier under the guidance of a separate BPG detector. In phoneme recognition experiments on the TIMIT continuous speech corpus, the proposed architecture yielded significant error rate reductions of up to 5% relative.
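The mutual-information feature selection step above scores each candidate TF feature by how much information it carries about the phoneme label. A minimal sketch for discrete (quantized) features follows; the feature and label sequences are toy data, and real systems would first quantize continuous TF features into bins.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X;Y) in bits between two equal-length discrete sequences:
    sum over (x, y) of p(x, y) * log2(p(x, y) / (p(x) * p(y)))."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        pj = c / n
        mi += pj * math.log2(pj * n * n / (px[x] * py[y]))
    return mi

labels = [0, 0, 1, 1]            # toy phoneme-class labels
feat_a = [0, 0, 1, 1]            # informative feature: tracks the label
feat_b = [0, 1, 0, 1]            # uninformative feature: independent of label
print(mutual_information(feat_a, labels))  # 1.0 bit
print(mutual_information(feat_b, labels))  # 0.0 bits
```

Ranking candidate TF cells by this score and keeping the top ones per BPG gives each expert its own compact, group-specific feature set.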

20.
Despite the increasing need for accuracy, current text-to-speech (TTS) systems are still poor at generating the correct pronunciation of Arabic numerals because of their high ambiguity and varied interpretations. In this paper, we propose a mini-transliteration system for Arabic-numeral expressions (ANEs), which can efficiently and correctly convert ANEs found in Korean text into phonemes for embedded TTS systems. To build grapheme-to-phoneme rules, we deduced the components of ANEs and investigated their patterns and arithmetic features in the analyzed corpus. A word sense disambiguation method based on lexical hierarchies in KorLex 1.0 was developed to resolve ambiguities caused by the homographic components of ANEs. Our system minimized memory use by 1) separating the morphological analysis module from the transliteration system, 2) compacting the lexicon, and 3) removing named entities. It reduced processing time dramatically without serious loss of accuracy, achieving an accuracy of 97.2%-98.3%, which was 21.4%-22.5% higher than the baseline and 5.5%-19.5% higher than current commercial Korean TTS systems.
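The core of numeral transliteration is mapping a digit string to its spoken form. The toy sketch below reads small integers in the Sino-Korean counting system (romanized), purely to illustrate the rule-based shape of such a converter; it is not the paper's system, handles only 1-9999, and ignores native-Korean numerals, context-dependent readings, and hangul output.

```python
SINO = {1: "il", 2: "i", 3: "sam", 4: "sa", 5: "o",
        6: "yuk", 7: "chil", 8: "pal", 9: "gu"}
UNITS = [(1000, "cheon"), (100, "baek"), (10, "sip")]

def read_number(n):
    """Romanized Sino-Korean reading of an integer in 1..9999 (toy sketch).
    A unit digit of 1 is conventionally left unspoken (100 -> 'baek')."""
    assert 1 <= n <= 9999
    parts = []
    for value, name in UNITS:
        d = n // value
        if d:
            if d > 1:
                parts.append(SINO[d])
            parts.append(name)
            n %= value
    if n:
        parts.append(SINO[n])
    return " ".join(parts)

print(read_number(125))   # 'baek i sip o'
print(read_number(2024))  # 'i cheon i sip sa'
```

A production system, as the abstract notes, must additionally disambiguate readings by context (dates, counters, phone numbers), which is where the KorLex-based disambiguation comes in.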
