Similar literature
4.
The goal of this article is to apply genetic algorithms (GAs) to automatic speech recognition (ASR) at the level of acoustic-sequence classification. Speech recognition is cast as a pattern classification problem in which an input acoustic signal is assigned to one of the possible phonemes, and supervised classification is formulated as a function optimization problem. We therefore attempt to recognize Standard Arabic (SA) phonemes in continuous, naturally spoken speech using GAs, which offer several advantages for solving complicated optimization problems. SA has 40 sounds. We analyzed a corpus of sentences covering all SA phoneme types in initial, medial, and final positions, recorded by several male speakers. The classification of acoustic segments with GAs is then explored. Among candidate classifiers such as Bayesian, likelihood, and distance classifiers, we chose the distance classifier, which is based on a classification measure criterion; the Manhattan distance decision rule serves as the fitness function in our GA evaluations. The corpus phonemes were extracted and classified successfully with an overall accuracy of 90.20%.
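The distance classifier described in this abstract can be sketched as follows: each candidate phoneme has a reference feature template, and a frame is assigned to the phoneme whose template minimizes the Manhattan (L1) distance. This is a minimal illustrative sketch; the function names, feature dimensionality, and templates are assumptions, not taken from the paper.

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def classify(frame, templates):
    """Assign the frame to the phoneme label whose reference template
    minimizes the Manhattan distance (the GA fitness criterion here)."""
    return min(templates, key=lambda label: manhattan(frame, templates[label]))

# Toy two-dimensional templates for two hypothetical phonemes.
templates = {"a": [1.0, 0.0], "i": [0.0, 1.0]}
print(classify([0.9, 0.1], templates))  # closest to "a"
```

In the GA setting, each chromosome would encode a candidate template and this distance would score its fitness against labeled frames.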

5.
This paper describes the principles behind the AZBAT phonetic alphabet, which was created by analogy with the DARPAbet phonetic alphabet of English and is oriented toward creating a speech corpus and a system for Chechen speech synthesis. The experience of developers of other phonetic alphabets and databases was drawn upon, and account was taken of the features of pronunciation and orthography, rules of compatibility, and variability of phonemes described in the works of well-known Chechen philologists. A classification of vowel and consonant phonemes is given, under which each phoneme has the attributes necessary to implement the program code. The designed Chechen speech synthesis system is assigned a basic set of acoustic-phonetic elements consisting of diphones and allophones. This set will be used to compile an acoustic-phonetic database, the basis of a system for automatic synthesis of Chechen speech.

6.
Pronunciation variation is a major obstacle to improving the performance of Arabic automatic continuous speech recognition systems. This phenomenon alters the pronunciation of words beyond the forms listed in the pronunciation dictionary, producing a number of out-of-vocabulary word forms. This paper presents a direct data-driven approach to modeling within-word pronunciation variation, in which the pronunciation variants are distilled from the training speech corpus. The proposed method performs phoneme recognition, followed by a sequence alignment between the phonemes generated by the phoneme recognizer and the reference phonemes obtained from the pronunciation dictionary. The unique collected variants are then added to the dictionary as well as to the language model. We started with a baseline Arabic speech recognition system based on the Sphinx3 engine, built on a 5.4-hour speech corpus of Modern Standard Arabic broadcast news with a pronunciation dictionary of 14,234 canonical pronunciations; the baseline achieves a word error rate of 13.39%. Our results show that while the expanded dictionary alone did not add appreciable improvement, the word error rate is reduced by a significant 2.22% when the variants are also represented in the language model.
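The variant-distillation step described in this abstract, aligning recognizer output against the dictionary pronunciation and collecting substitutions as candidate variants, can be sketched with a standard sequence aligner. This is a hedged illustration: the paper does not specify the alignment algorithm, and the phoneme strings below are invented examples, not from its Arabic corpus.

```python
import difflib

def pronunciation_variants(reference, observed):
    """Align the dictionary (reference) phoneme sequence with the phoneme
    recognizer's output and collect substituted spans as candidate
    pronunciation variants."""
    sm = difflib.SequenceMatcher(a=reference, b=observed)
    variants = []
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == "replace":
            variants.append((tuple(reference[i1:i2]), tuple(observed[j1:j2])))
    return variants

ref = ["k", "i", "t", "a", "b"]   # canonical dictionary pronunciation (toy)
obs = ["k", "i", "t", "e", "b"]   # what the phoneme recognizer produced (toy)
print(pronunciation_variants(ref, obs))  # [(('a',), ('e',))]
```

Unique variants collected this way would then be appended to the dictionary and, per the paper's finding, also injected into the language model.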

8.
This article describes an unrestricted-vocabulary text-to-speech (TTS) system for synthesizing Standard Arabic (SA) speech. The system synthesizes Arabic from short phonetic clusters derived from Arabic syllables. Basic and phonetic variants of the synthesis units are defined after qualitative and quantitative analyses of SA phonetics, and a speech database of the synthesis units and their phonetic variants is created and tested to control segmental quality. Beyond the choice of synthesis units, their enhancement with phonetic variants, and segmental quality control, producing good-quality speech also depends on waveform analysis and on the method used to concatenate the synthesis units. Waveform analysis is needed to condition the selected units at their junctures so that the synthesized speech is of better quality. The types of juncture between contiguous units, the phonetic characteristics of the sounds surrounding the junctures, and the concatenation artifacts occurring across them are discussed, and the results of waveform analysis and smoothing algorithms are presented. The intelligibility of the synthesized Arabic is evaluated with a standard intelligibility test adapted to Arabic phonetic characteristics, and the scoring of the test results is also addressed.
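One common way to smooth the concatenation artifacts this abstract mentions is a linear crossfade across the juncture of two units. The sketch below is a generic illustration of that idea, not the paper's actual smoothing algorithm, which is not specified in the abstract; the sample values are toy data.

```python
def crossfade(a, b, overlap):
    """Concatenate two waveforms (lists of samples) with a linear crossfade
    over `overlap` samples to smooth the juncture between synthesis units."""
    assert 0 < overlap <= min(len(a), len(b))
    head, tail = a[:len(a) - overlap], b[overlap:]
    mixed = [
        a[len(a) - overlap + i] * (1 - i / overlap) + b[i] * (i / overlap)
        for i in range(overlap)
    ]
    return head + mixed + tail

# Joining a unit ending at amplitude 1.0 to one starting at 0.0:
print(crossfade([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], 2))  # [1.0, 1.0, 0.5, 0.0]
```

Real concatenative synthesizers typically align the overlap to pitch periods (e.g. PSOLA-style) rather than raw sample counts; this sketch ignores that refinement.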

10.
Research on grapheme-to-phoneme conversion in German speech synthesis
Grapheme-to-phoneme conversion is a difficult problem that every German speech synthesis system must solve. It can be addressed with a rule-driven iterative finite-state transducer. In this algorithm, grapheme-to-phoneme conversion rules are first formulated on the basis of a lexicon; the iterative finite-state transducer then applies these rules to convert all graphemes of a German word into phonemes. Tested against the whole lexicon, the algorithm achieves a word-level conversion accuracy of 94.4%.
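The rule-driven conversion described above can be sketched as an ordered, longest-match-first rewrite over the grapheme string. This is a minimal sketch, not the paper's transducer: the rule table below covers only a handful of German grapheme clusters, and the phoneme symbols are illustrative SAMPA-like labels.

```python
# Ordered rewrite rules: (grapheme cluster, phoneme), longest match first.
RULES = [
    ("sch", "S"),
    ("ei", "aI"),
    ("ch", "x"),
    ("w", "v"),
    ("n", "n"),
]

def g2p(word):
    """Scan the word left to right, applying the first matching rule at each
    position; unmatched letters pass through unchanged."""
    phonemes, i = [], 0
    while i < len(word):
        for graph, phon in RULES:
            if word.startswith(graph, i):
                phonemes.append(phon)
                i += len(graph)
                break
        else:  # no rule matched: pass the letter through
            phonemes.append(word[i])
            i += 1
    return phonemes

print(g2p("schwein"))  # ['S', 'v', 'aI', 'n']
```

A real transducer would also condition rules on left/right context and iterate until a fixed point, which this left-to-right sketch omits.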

11.
Phono-semantic compounding (xingsheng) is a major character-formation method and accounts for the largest family of Chinese characters. When these characters were created, the semantic component indicated meaning and the phonetic component indicated pronunciation. Over time, however, the phonetic reliability of the phonetic components has shifted, making it harder to read characters accurately from their components. This paper applies cluster analysis to the 3,500 most common Mandarin characters, combining linguistic theory with computational methods. Detailed grading criteria are defined according to whether the phonetic component's reading is identical to, similar to, or different from the character's reading, yielding a list of phono-semantic characters and percentage statistics for each grade. The result is a systematic, intuitive, and comprehensive picture of the phonetic reliability of phonetic components in modern Chinese characters, intended to inform character standardization and Chinese language teaching.
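The identical/similar/different grading described above can be sketched by comparing a character's pinyin with its phonetic component's pinyin. The example readings below are real, but the three-way criterion (same reading; same syllable with different tone; otherwise different) is a simplification assumed for illustration, not the paper's full grading scheme.

```python
def grade(char_pinyin, component_pinyin):
    """Grade a phono-semantic character's phonetic component: 'identical'
    (same syllable and tone), 'similar' (same syllable, different tone),
    or 'different'. Pinyin is written with the tone as a trailing digit."""
    if char_pinyin == component_pinyin:
        return "identical"
    if char_pinyin.rstrip("12345") == component_pinyin.rstrip("12345"):
        return "similar"
    return "different"

print(grade("xi1", "xi1"))       # e.g. 惜 vs 昔: identical
print(grade("qing2", "qing1"))   # e.g. 晴 vs 青: similar
print(grade("jiang1", "gong1"))  # e.g. 江 vs 工: different
```

The paper's actual criteria are finer-grained (e.g. treating initial/final overlap separately), so this sketch only conveys the shape of the classification.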

12.
Spoken word recognition models use the initial phoneme of a word to activate candidate words beginning with that phoneme, so classifying the initial phoneme into a phonetic group is critical. This paper describes an artificial neural network (ANN) based approach to recognizing the initial consonant phonemes of Assamese words. A self-organizing map (SOM) based algorithm is developed to segment the initial phoneme from the rest of the word. Using a combination of three ANN structures, namely a recurrent neural network (RNN), a SOM, and a probabilistic neural network (PNN), the proposed algorithm outperforms conventional discrete wavelet transform (DWT) based phoneme segmentation. The algorithm is designed around the Assamese phonemic structure, whose phonemes have certain unique features and fall into six distinct phoneme families. Before SOM-based segmentation, an RNN makes a localized decision to classify each word into one of the six phoneme families; the SOM-segmented phonemes are then classified into individual phonemes, and a two-class PNN classification with clean Assamese phonemes recognizes the segmented phonemes. Recognized phonemes are validated by matching the phoneme's first formant frequency. Formant frequencies of Assamese phonemes, estimated from the pole (formant) locations of a linear-prediction model of the vocal tract, are used effectively as prior knowledge in the proposed algorithm.

17.
With the goal of establishing a baseline platform for Uyghur continuous phoneme recognition, several key language-dependent components were studied for the first time on top of HTK (the hidden-Markov-model toolkit). Drawing on the linguistic characteristics of Uyghur, base texts were designed for language modeling and for building a speech corpus, and a fairly large speech corpus was recorded to concrete technical specifications. Phonemes were chosen as the modeling units and Uyghur acoustic models were trained. Using a letter-based N-gram language model, recognition results from speech sentences to letter-sequence sentences were obtained. Recognition rates for the 32 Uyghur phonemes were computed, and easily confused phonemes were identified and their sources analyzed, laying a foundation for further improving the recognition rate.
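A letter-based N-gram language model of the kind used above can be sketched, for N=2, as bigram counts with add-one smoothing. This is a generic illustration under assumed inputs: the training strings and boundary symbols below are toy examples, not Uyghur corpus data.

```python
from collections import Counter

def train_bigrams(sentences):
    """Count letter bigrams, padding each sentence with boundary symbols."""
    counts, unigrams = Counter(), Counter()
    for s in sentences:
        letters = ["<s>"] + list(s) + ["</s>"]
        counts.update(zip(letters, letters[1:]))
        unigrams.update(letters[:-1])  # contexts only
    return counts, unigrams

def bigram_prob(counts, unigrams, prev, cur, vocab_size):
    """P(cur | prev) with add-one (Laplace) smoothing."""
    return (counts[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)

counts, unigrams = train_bigrams(["ana", "an"])
p = bigram_prob(counts, unigrams, "a", "n", vocab_size=4)
print(p)  # (2 + 1) / (3 + 4)
```

In decoding, such letter probabilities would rescore the acoustic model's hypotheses to produce the letter-sequence output the abstract describes.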

18.
To address the limited size of pronunciation dictionaries in Russian speech synthesis and recognition systems, this paper proposes a Russian word transcription algorithm based on a long short-term memory (LSTM) sequence-to-sequence model, and designs and implements a prototype transcription system. First, a SAMPA-based Russian phoneme set was redesigned so that transcriptions reflect word stress position and vowel reduction, and a 20,000-word Russian pronunciation dictionary was built with the new phoneme set. The algorithm, implemented with the TensorFlow framework, encodes a Russian word into a fixed-dimensional vector with an encoder LSTM and decodes that vector into the target phoneme sequence with a decoder LSTM. Finally, a Russian word transcription system with interactive transcription features was implemented. Experimental results show that the algorithm reaches 74.8% word accuracy and 94.5% phoneme accuracy on an out-of-vocabulary test set, both higher than the Phonetisaurus method. The system can effectively support the construction of Russian pronunciation dictionaries.

19.
In phoneme recognition experiments, approximately 75% of misclassified frames were found to be assigned labels within the same broad phonetic group (BPG). While the phoneme is the smallest distinguishable unit of speech, phonemes within a BPG share very similar characteristics and are easily confused, whereas different BPGs, such as vowels and stops, have very different spectral and temporal characteristics. To accommodate the full range of phonemes, the acoustic models of speech recognition systems compute input features from all frequencies over a large temporal context window. A new phoneme classifier is proposed, consisting of a modular arrangement of experts, with one expert assigned to each BPG and focused on discriminating between phonemes within that BPG. Because each BPG has a different temporal and spectral structure, mutual information is used to select a relevant time-frequency (TF) feature set for each expert. To construct a phone recognition system, the output of each expert is combined with a baseline classifier under the guidance of a separate BPG detector. In phoneme recognition experiments on the TIMIT continuous speech corpus, the proposed architecture yielded significant error rate reductions of up to 5% relative.
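The mutual-information feature selection step above scores each candidate TF feature by how much information it carries about the phoneme label. A minimal sketch for discrete (quantized) features follows; the feature and label sequences are toy data, and real systems would first quantize continuous TF features into bins.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X;Y) in bits between two equal-length discrete sequences:
    sum over (x, y) of p(x, y) * log2(p(x, y) / (p(x) * p(y)))."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        pj = c / n
        mi += pj * math.log2(pj * n * n / (px[x] * py[y]))
    return mi

labels = [0, 0, 1, 1]            # toy phoneme-class labels
feat_a = [0, 0, 1, 1]            # informative feature: tracks the label
feat_b = [0, 1, 0, 1]            # uninformative feature: independent of label
print(mutual_information(feat_a, labels))  # 1.0 bit
print(mutual_information(feat_b, labels))  # 0.0 bits
```

Ranking candidate TF cells by this score and keeping the top ones per BPG gives each expert its own compact, group-specific feature set.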

20.
Despite the increasing need for accuracy, current text-to-speech (TTS) systems are still poor at generating the correct pronunciation of Arabic numerals because of their high ambiguity and varied interpretations. In this paper, we propose a mini-transliteration system for Arabic-numeral expressions (ANEs), which can efficiently and correctly convert ANEs found in Korean text into phonemes for embedded TTS systems. To build grapheme-to-phoneme rules, we deduced the components of ANEs and investigated their patterns and arithmetic features in the analyzed corpus. A word sense disambiguation method based on lexical hierarchies in KorLex 1.0 was developed to resolve ambiguities caused by the homographic components of ANEs. Our system minimized memory use by 1) separating the morphological analysis module from the transliteration system, 2) compacting the lexicon, and 3) removing named entities. It reduced processing time dramatically without serious loss of accuracy, achieving an accuracy of 97.2%-98.3%, which was 21.4%-22.5% higher than the baseline and 5.5%-19.5% higher than current commercial Korean TTS systems.
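The core of numeral transliteration is mapping a digit string to its spoken form. The toy sketch below reads small integers in the Sino-Korean counting system (romanized), purely to illustrate the rule-based shape of such a converter; it is not the paper's system, handles only 1-9999, and ignores native-Korean numerals, context-dependent readings, and hangul output.

```python
SINO = {1: "il", 2: "i", 3: "sam", 4: "sa", 5: "o",
        6: "yuk", 7: "chil", 8: "pal", 9: "gu"}
UNITS = [(1000, "cheon"), (100, "baek"), (10, "sip")]

def read_number(n):
    """Romanized Sino-Korean reading of an integer in 1..9999 (toy sketch).
    A unit digit of 1 is conventionally left unspoken (100 -> 'baek')."""
    assert 1 <= n <= 9999
    parts = []
    for value, name in UNITS:
        d = n // value
        if d:
            if d > 1:
                parts.append(SINO[d])
            parts.append(name)
            n %= value
    if n:
        parts.append(SINO[n])
    return " ".join(parts)

print(read_number(125))   # 'baek i sip o'
print(read_number(2024))  # 'i cheon i sip sa'
```

A production system, as the abstract notes, must additionally disambiguate readings by context (dates, counters, phone numbers), which is where the KorLex-based disambiguation comes in.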
