共查询到20条相似文献,搜索用时 46 毫秒
1.
2.
This paper describes the principles for developing the AZBAT phonetic alphabet, which was created by analogy with the DARPAbet phonetic alphabet of the English language and is oriented to creating the speech corpus and system for synthesis of Chechen speech. The experience of developers of other phonetic alphabets and databases was used; account was also taken of the features of pronunciation and graphics, rules of compatibility, and variability of phonemes, which had been described in the works of well-known Chechen philologists. The classification of vowels and consonant phonemes is given, according to which each phoneme has the attributes that are necessary to implement the program code. The designed system for synthesis of Chechen speech is assigned a basic set of acoustic phonetic elements that consists of diphones and allophones. This set will be complied with to build an acoustic phonetic database that is the basis of a system for automatic synthesis of Chechen speech. 相似文献
3.
Spelling speech recognition can be applied for several purposes including enhancement of speech recognition systems and implementation of name retrieval systems. This paper presents a Thai spelling analysis to develop a Thai spelling speech recognizer. The Thai phonetic characteristics, alphabet system and spelling methods have been analyzed. As a training resource, two alternative corpora, a small spelling speech corpus and an existing large continuous speech corpus, are used to train hidden Markov models (HMMs). Then their recognition results are compared to each other. To solve the problem of utterance speed difference between spelling utterances and continuous speech utterances, the adjustment of utterance speed has been taken into account. Two alternative language models, bigram and trigram, are used for investigating performance of spelling speech recognition. Our approach achieves up to 98.0% letter correction rate, 97.9% letter accuracy and 82.8% utterance correction rate when the language model is trained based on trigram and the acoustic model is trained from the small spelling speech corpus with eight Gaussian mixtures. 相似文献
4.
《Computer Speech and Language》2007,21(4):580-593
This paper presents work on developing speech corpora and recognition tools for Turkish by porting SONIC, a speech recognition tool developed initially for English at the Center for Spoken Language Research of the University of Colorado at Boulder. The work presented in this paper had two objectives: The first one is to collect a standard phonetically-balanced Turkish microphone speech corpus for general research use. A 193-speaker triphone-balanced audio corpus and a pronunciation lexicon for Turkish have been developed. The corpus has been accepted for distribution by the Linguistic Data Consortium (LDC) of the University of Pennsylvania in October 2005, and it will serve as a standard corpus for Turkish speech researchers. The second objective was to develop speech recognition tools (a phonetic aligner and a phone recognizer) for Turkish, which provided a starting point for obtaining a multilingual speech recognizer by porting SONIC to Turkish. This part of the work was the first port of this particular recognizer to a language other than English; subsequently, SONIC has been ported to over 15 languages. Using the phonetic aligner developed, the audio corpus has been provided with word, phone and HMM-state level alignments. For the phonetic aligner, it is shown that 92.6% of the automatically labeled phone boundaries are placed within 20 ms of manually labeled locations for the Turkish audio corpus. Finally, a phone recognition error rate of 29.2% is demonstrated for the phone recognizer. 相似文献
5.
This article describes a novel method that models the correlation among acoustic observations in contiguous speech segments. The basic idea behind the method is that acoustic observations are conditioned not only on the phonetic context but also on the preceding acoustic segment observation. The correlation between consecutive acoustic observations is modeled by mean trajectory polynomial segment models (PSM). This method is an extension of conventional segment modeling approaches in that it describes the correlation of acoustic observations not only inside segments but also between contiguous segments. It is also a generalization of phonetic context (e.g., triphone) modeling approaches because it can model acoustic context and phonetic context at the same time. Using the proposed method in a speaker-independent phoneme classification test resulted in a 7 to 9% relative reduction of error rate as compared with the traditional triphone segmental model system and a 31% reduction as compared with a similar triphone hidden Markov model (HMM) system. 相似文献
6.
以建立维吾尔语连续音素识别基础平台为目标,在HTK(基于隐马尔可夫模型的工具箱)的基础上,首次研究了其语言相关环节的几项关键技术;结合维吾尔语的语言特征,完成了用于语言模型建立和语音语料库建设的维吾尔语基础文本设计;根据具体技术指标,录制了较大规模语音语料库;确定音素作为基元,训练了维吾尔语声学模型;在基于字母的N-gram语言模型下,得出了从语音句子向字母序列句子的识别结果;统计了维吾尔语32个音素的识别率,给出了容易混淆的音素及其根源分析,为进一步提高识别率奠定了基础。 相似文献
7.
In automatic speech recognition (ASR) systems, the speech signal is captured and parameterized at front end and evaluated
at back end using the statistical framework of hidden Markov model (HMM). The performance of these systems depend critically
on both the type of models used and the methods adopted for signal analysis. Researchers have proposed a variety of modifications
and extensions for HMM based acoustic models to overcome their limitations. In this review, we summarize most of the research
work related to HMM-ASR which has been carried out during the last three decades. We present all these approaches under three
categories, namely conventional methods, refinements and advancements of HMM. The review is presented in two parts (papers):
(i) An overview of conventional methods for acoustic phonetic modeling, (ii) Refinements and advancements of acoustic models.
Part I explores the architecture and working of the standard HMM with its limitations. It also covers different modeling units,
language models and decoders. Part II presents a review on the advances and refinements of the conventional HMM techniques
along with the current challenges and performance issues related to ASR. 相似文献
8.
9.
10.
11.
12.
在大词汇量连续语音识别应用中,优质的语音训练语料是所有识别工作的基础和前提, 能否挑选出覆盖更多语音现象的语料是提高语音识别性能的关键。该文在多种维吾尔文口语化传播平台中采集了大量口语句子语料,并考虑协同发音的影响和常用词的适用性,根据评估函数对语料筛选。经过筛选后的语料包含的三音子更加均衡和高效,囊括的语音现象更加全面,为训练准确而牢靠的语音模型打下了稳固的根基。 相似文献
13.
This article focuses on the systematic design of a segment database which has been used to support a time-domain speech synthesis
system for the Greek language. Thus, a methodology is presented for the generation of a corpus containing all possible instances
of the segments for the specific language. Issues such as the phonetic coverage, the sentence selection and iterative evaluation
techniques employing custom-built tools, are examined. Emphasis is placed on the comparison of the process-derived corpus
to naturally-occurring corpora with respect to their suitability for use in time-domain speech synthesis. The proposed methodology
generates a corpus characterised by a near-minimal size and which provides a complete coverage of the Greek language. Furthermore,
within this corpus, the distribution of segmental units is similar to that of natural corpora, allowing for the extraction
of multiple units in the case of the most frequently-occurring segments. The corpus creation algorithm incorporates mechanisms
that enable the fine-tuning of the segment database's language-dependent characteristics and thus assists in the generation
of high-quality text-to-speech synthesis. 相似文献
14.
15.
16.
Khe Chai Sim Haizhou Li 《IEEE transactions on audio, speech, and language processing》2008,16(5):1029-1037
The parallel phone recognition followed by language model (PPRLM) architecture represents one of the state-of-the-art spoken language identification systems. A PPRLM system comprises multiple parallel subsystems, where each subsystem employs a phone recognizer with a different phone set for a particular language. The phone recognizer extracts phonotactic attributes from the speech input to characterize a language. The multiple parallel subsystems are devised to capture the phonetic diversification available in the speech input. Alternatively, this paper investigates a new approach for building a PPRLM system that aims at improving the acoustic diversification among its parallel subsystems by using multiple acoustic models. These acoustic models are trained on the same speech data with the same phone set but using different model structures and training paradigms. We examine the use of various structured precision (inverse covariance) matrix modeling techniques as well as the maximum likelihood and maximum mutual information training paradigms to produce complementary acoustic models. The results show that acoustic diversification, which requires only one set of phonetically transcribed speech data, yields similar performance improvements compared to phonetic diversification. In addition, further improvements were obtained by combining both diversification factors. The best performing system reported in this paper combined phonetic and acoustic diversifications to achieve EERs of 4.71% and 8.61% on the 2003 and 2005 NIST LRE sets, respectively, compared to 5.77% and 9.94% using phonetic diversification alone. 相似文献
17.
18.
《IEEE transactions on audio, speech, and language processing》2009,17(8):1471-1482
19.