首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
2.
This paper describes the principles for developing the AZBAT phonetic alphabet, which was created by analogy with the DARPAbet phonetic alphabet of the English language and is oriented to creating the speech corpus and system for synthesis of Chechen speech. The experience of developers of other phonetic alphabets and databases was used; account was also taken of the features of pronunciation and graphics, rules of compatibility, and variability of phonemes, which had been described in the works of well-known Chechen philologists. The classification of vowels and consonant phonemes is given, according to which each phoneme has the attributes that are necessary to implement the program code. The designed system for synthesis of Chechen speech is assigned a basic set of acoustic phonetic elements that consists of diphones and allophones. This set will be complied with to build an acoustic phonetic database that is the basis of a system for automatic synthesis of Chechen speech.  相似文献   

3.
Spelling speech recognition can be applied for several purposes including enhancement of speech recognition systems and implementation of name retrieval systems. This paper presents a Thai spelling analysis to develop a Thai spelling speech recognizer. The Thai phonetic characteristics, alphabet system and spelling methods have been analyzed. As a training resource, two alternative corpora, a small spelling speech corpus and an existing large continuous speech corpus, are used to train hidden Markov models (HMMs). Then their recognition results are compared to each other. To solve the problem of utterance speed difference between spelling utterances and continuous speech utterances, the adjustment of utterance speed has been taken into account. Two alternative language models, bigram and trigram, are used for investigating performance of spelling speech recognition. Our approach achieves up to 98.0% letter correction rate, 97.9% letter accuracy and 82.8% utterance correction rate when the language model is trained based on trigram and the acoustic model is trained from the small spelling speech corpus with eight Gaussian mixtures.  相似文献   

4.
This paper presents work on developing speech corpora and recognition tools for Turkish by porting SONIC, a speech recognition tool developed initially for English at the Center for Spoken Language Research of the University of Colorado at Boulder. The work presented in this paper had two objectives: The first one is to collect a standard phonetically-balanced Turkish microphone speech corpus for general research use. A 193-speaker triphone-balanced audio corpus and a pronunciation lexicon for Turkish have been developed. The corpus has been accepted for distribution by the Linguistic Data Consortium (LDC) of the University of Pennsylvania in October 2005, and it will serve as a standard corpus for Turkish speech researchers. The second objective was to develop speech recognition tools (a phonetic aligner and a phone recognizer) for Turkish, which provided a starting point for obtaining a multilingual speech recognizer by porting SONIC to Turkish. This part of the work was the first port of this particular recognizer to a language other than English; subsequently, SONIC has been ported to over 15 languages. Using the phonetic aligner developed, the audio corpus has been provided with word, phone and HMM-state level alignments. For the phonetic aligner, it is shown that 92.6% of the automatically labeled phone boundaries are placed within 20 ms of manually labeled locations for the Turkish audio corpus. Finally, a phone recognition error rate of 29.2% is demonstrated for the phone recognizer.  相似文献   

5.
This article describes a novel method that models the correlation among acoustic observations in contiguous speech segments. The basic idea behind the method is that acoustic observations are conditioned not only on the phonetic context but also on the preceding acoustic segment observation. The correlation between consecutive acoustic observations is modeled by mean trajectory polynomial segment models (PSM). This method is an extension of conventional segment modeling approaches in that it describes the correlation of acoustic observations not only inside segments but also between contiguous segments. It is also a generalization of phonetic context (e.g., triphone) modeling approaches because it can model acoustic context and phonetic context at the same time. Using the proposed method in a speaker-independent phoneme classification test resulted in a 7 to 9% relative reduction of error rate as compared with the traditional triphone segmental model system and a 31% reduction as compared with a similar triphone hidden Markov model (HMM) system.  相似文献   

6.
以建立维吾尔语连续音素识别基础平台为目标,在HTK(基于隐马尔可夫模型的工具箱)的基础上,首次研究了其语言相关环节的几项关键技术;结合维吾尔语的语言特征,完成了用于语言模型建立和语音语料库建设的维吾尔语基础文本设计;根据具体技术指标,录制了较大规模语音语料库;确定音素作为基元,训练了维吾尔语声学模型;在基于字母的N-gram语言模型下,得出了从语音句子向字母序列句子的识别结果;统计了维吾尔语32个音素的识别率,给出了容易混淆的音素及其根源分析,为进一步提高识别率奠定了基础。  相似文献   

7.
In automatic speech recognition (ASR) systems, the speech signal is captured and parameterized at front end and evaluated at back end using the statistical framework of hidden Markov model (HMM). The performance of these systems depend critically on both the type of models used and the methods adopted for signal analysis. Researchers have proposed a variety of modifications and extensions for HMM based acoustic models to overcome their limitations. In this review, we summarize most of the research work related to HMM-ASR which has been carried out during the last three decades. We present all these approaches under three categories, namely conventional methods, refinements and advancements of HMM. The review is presented in two parts (papers): (i) An overview of conventional methods for acoustic phonetic modeling, (ii) Refinements and advancements of acoustic models. Part I explores the architecture and working of the standard HMM with its limitations. It also covers different modeling units, language models and decoders. Part II presents a review on the advances and refinements of the conventional HMM techniques along with the current challenges and performance issues related to ASR.  相似文献   

8.
9.
10.
11.
12.
在大词汇量连续语音识别应用中,优质的语音训练语料是所有识别工作的基础和前提, 能否挑选出覆盖更多语音现象的语料是提高语音识别性能的关键。该文在多种维吾尔文口语化传播平台中采集了大量口语句子语料,并考虑协同发音的影响和常用词的适用性,根据评估函数对语料筛选。经过筛选后的语料包含的三音子更加均衡和高效,囊括的语音现象更加全面,为训练准确而牢靠的语音模型打下了稳固的根基。  相似文献   

13.
This article focuses on the systematic design of a segment database which has been used to support a time-domain speech synthesis system for the Greek language. Thus, a methodology is presented for the generation of a corpus containing all possible instances of the segments for the specific language. Issues such as the phonetic coverage, the sentence selection and iterative evaluation techniques employing custom-built tools, are examined. Emphasis is placed on the comparison of the process-derived corpus to naturally-occurring corpora with respect to their suitability for use in time-domain speech synthesis. The proposed methodology generates a corpus characterised by a near-minimal size and which provides a complete coverage of the Greek language. Furthermore, within this corpus, the distribution of segmental units is similar to that of natural corpora, allowing for the extraction of multiple units in the case of the most frequently-occurring segments. The corpus creation algorithm incorporates mechanisms that enable the fine-tuning of the segment database's language-dependent characteristics and thus assists in the generation of high-quality text-to-speech synthesis.  相似文献   

14.
15.
水声对抗仿真环境设计   总被引:2,自引:1,他引:2  
该文以现代海战为背景,分析了水声对抗参与单元的技术性能参数及详细作战过程。利用系统仿真的方法,为各参与单元及水声环境建立了数学模型和仿真模型,利用结构化的设计思想,提出了建立水声对抗仿真系统的方法,给出了仿真系统的原理结构及实现过程。该系统利用VC 语言进行开发,具有良好的实用性、可扩展性和经济性。可用于水声对抗系统的研制、性能检测、效能评估和技术改进,同时也可用于水声对抗过程的模拟训练,具有良好的应用前景。  相似文献   

16.
The parallel phone recognition followed by language model (PPRLM) architecture represents one of the state-of-the-art spoken language identification systems. A PPRLM system comprises multiple parallel subsystems, where each subsystem employs a phone recognizer with a different phone set for a particular language. The phone recognizer extracts phonotactic attributes from the speech input to characterize a language. The multiple parallel subsystems are devised to capture the phonetic diversification available in the speech input. Alternatively, this paper investigates a new approach for building a PPRLM system that aims at improving the acoustic diversification among its parallel subsystems by using multiple acoustic models. These acoustic models are trained on the same speech data with the same phone set but using different model structures and training paradigms. We examine the use of various structured precision (inverse covariance) matrix modeling techniques as well as the maximum likelihood and maximum mutual information training paradigms to produce complementary acoustic models. The results show that acoustic diversification, which requires only one set of phonetically transcribed speech data, yields similar performance improvements compared to phonetic diversification. In addition, further improvements were obtained by combining both diversification factors. The best performing system reported in this paper combined phonetic and acoustic diversifications to achieve EERs of 4.71% and 8.61% on the 2003 and 2005 NIST LRE sets, respectively, compared to 5.77% and 9.94% using phonetic diversification alone.  相似文献   

17.
18.
This paper presents our work in automatic speech recognition (ASR) in the context of under-resourced languages with application to Vietnamese. Different techniques for bootstrapping acoustic models are presented. First, we present the use of acoustic–phonetic unit distances and the potential of crosslingual acoustic modeling for under-resourced languages. Experimental results on Vietnamese showed that with only a few hours of target language speech data, crosslingual context independent modeling worked better than crosslingual context dependent modeling. However, it was outperformed by the latter one, when more speech data were available. We concluded, therefore, that in both cases, crosslingual systems are better than monolingual baseline systems. The proposal of grapheme-based acoustic modeling, which avoids building a phonetic dictionary, is also investigated in our work. Finally, since the use of sub-word units (morphemes, syllables, characters, etc.) can reduce the high out-of-vocabulary rate and improve the lack of text resources in statistical language modeling for under-resourced languages, we propose several methods to decompose, normalize and combine word and sub-word lattices generated from different ASR systems. The proposed lattice combination scheme results in a relative syllable error rate reduction of 6.6% over the sentence MAP baseline method for a Vietnamese ASR task.   相似文献   

19.
20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号