1.
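In corpus-based speech synthesis, the quality of synthesis-unit selection directly affects the naturalness and fluency of the synthesized speech. Based on the characteristics of the Tibetan script, this paper proposes a mixed-unit speech synthesis strategy that combines basic components, compound components, characters, words, and sentence units, together with a mixed-unit selection algorithm for Tibetan speech synthesis. Subjective and objective evaluations indicate that the strategy and algorithm are effective and reasonable; the coverage of each unit type on an open corpus and the synthesis quality both reach the expected targets.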
2.
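To address Tibetan speech synthesis, and based on the regularities and characteristics of Tibetan, this paper proposes a complete HMM-based speech synthesis solution for Lhasa Tibetan. It describes the key techniques, including corpus selection, Latin transliteration, word segmentation, and text analysis in the front end, and prosodic annotation, vocoder technology, speech modeling, and question-set design in the back end. Experimental results show that a Tibetan speech synthesis test system built on this solution achieves a good overall score.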
3.
4.
5.
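In the waveform-coding synthesis methods commonly used for Chinese speech synthesis today, the single syllable usually serves as the acoustic unit. However, transitions at syllable joins are often poor, so the synthesized speech lacks naturalness. To address this, the paper studies coarticulation in Chinese and proposes a new strategy for selecting synthesis units: in addition to single-syllable units, some syllable-junction segments taken from natural speech are added as synthesis units. Combining this strategy with the TD-PSOLA algorithm yields synthesized speech that is considerably more natural than that of the usual waveform synthesis method.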
6.
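This paper proposes a pitch smoothing technique for speech segments in speech synthesis. In waveform-concatenation synthesis, the TD-PSOLA algorithm is generally used to modify fundamental frequency (F0) and duration, but the conventional TD-PSOLA algorithm modifies F0 over a segment as a whole, so it cannot adequately resolve F0 discontinuities between concatenated units, especially at segment joins. Because the unit segments are extracted from corpus material in different contexts, the pitch of the synthesized speech sounds noticeably unnatural. The paper improves the conventional TD-PSOLA algorithm: the segment signal is split into frames at pitch-period intervals, and corresponding frames are smoothed by exponential weighting. Listening tests show that this largely resolves the discontinuity between concatenated segments.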
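The abstract in entry 6 does not give the smoothing formula, so the following is only a minimal sketch of the general idea under stated assumptions: each segment is represented by one F0 value per pitch-synchronous frame, the two sides of a join are pulled toward their midpoint, and the correction decays exponentially with distance from the join. The function name `smooth_join` and the decay constant `tau` are hypothetical and not taken from the paper.

```python
import math
from typing import List, Tuple


def smooth_join(
    left_f0: List[float], right_f0: List[float], tau: float = 2.0
) -> Tuple[List[float], List[float]]:
    """Pull per-frame F0 values on both sides of a join toward a common value.

    Each list holds one F0 value (Hz) per pitch-synchronous frame. The
    correction applied to a frame is weighted by exp(-k / tau), where k is the
    frame's distance from the join (assumed weighting, not the paper's).
    """
    # Assumed meeting point: the midpoint of the two boundary F0 values.
    target = 0.5 * (left_f0[-1] + right_f0[0])
    d_left = target - left_f0[-1]
    d_right = target - right_f0[0]
    n = len(left_f0)
    # Frames closest to the join receive the largest correction.
    new_left = [f + d_left * math.exp(-(n - 1 - i) / tau) for i, f in enumerate(left_f0)]
    new_right = [f + d_right * math.exp(-i / tau) for i, f in enumerate(right_f0)]
    return new_left, new_right
```

With this weighting the two contours meet exactly at the join, while frames far from the join are left almost unchanged; a real system would then resynthesize each frame at its adjusted pitch period.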
7.
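The DAEM (Deterministic Annealing EM) algorithm is introduced into statistical parametric Tibetan speech synthesis based on the hidden Markov model (HMM) to automatically time-align Tibetan training speech that has no time labels. Initials and finals serve as synthesis units; while their acoustic models are trained, the DAEM algorithm is used to determine the optimal parameters for embedded re-estimation of the HMMs. Once the acoustic models are trained, forced alignment yields the time labels of initials and finals automatically. Experiments show that the resulting time labels are close to manual labels. Subjective evaluation shows that the quality of the Tibetan speech synthesized with this method is close to that of speech synthesized with manually labeled initial and final boundaries. The method therefore allows acoustic models of the synthesis units to be built without time labels for initials and finals.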
8.
9.
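This paper proposes a unit-selection speech synthesis algorithm based on statistical acoustic models. In the training stage, acoustic parameters such as spectrum and F0 are first extracted from the speech data in the corpus and combined with the corpus's segmental and prosodic annotation to estimate a statistical acoustic model for each context-dependent phone; the model structure is the hidden Markov model. In the synthesis stage, the best synthesis units are selected under the criterion of maximizing the likelihood output by the acoustic models of the target sentence, and the synthesized speech is generated by smoothly joining the waveforms of the chosen candidate units. A Chinese speech synthesis system that uses initials and finals as basic concatenation units is built on this algorithm, and listening tests demonstrate that it improves the naturalness of synthesized speech relative to conventional algorithms.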
10.
11.
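The syllable is the smallest pronunciation unit in Uyghur, so most Uyghur speech synthesis systems use the syllable as the basic synthesis unit. However, Uyghur has a very large number of syllables, and a corpus can hardly cover samples of all of them, which makes the synthesized speech unstable and discontinuous. To address the instability, the paper proposes a unit-selection algorithm that combines two different units, monophones and triphones. Adding prosodic-parameter matching to the unit-selection module selects the units with the best prosodic match and resolves the discontinuity. Experimental results show that the proposed method effectively resolves both instability and discontinuity and thereby improves the naturalness of the synthesized speech.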
12.
Syllable based text to speech synthesis system using auto associative neural network prosody prediction (total citations: 1; self-citations: 0; citations by others: 1)
This paper presents the design and development of an unrestricted prosodic information synthesizer based on an Auto Associative Neural Network (AANN). An unrestricted Text-To-Speech (TTS) system is capable of synthesizing speech from different domains with improved quality. The paper deals with a corpus-driven text-to-speech system based on the concatenative synthesis approach. Concatenative speech synthesis concatenates basic units to synthesize intelligible, natural-sounding speech. A corpus-based method (unit selection) uses a large inventory from which units are selected and concatenated. Prosody prediction is done with a five-layer auto associative neural network, which helps improve the quality of the synthesized speech. Syllables are used as the basic unit of the speech synthesis database. The database, consisting of the units along with their annotated information, is called the annotated speech corpus. A clustering technique applied to the annotated speech corpus provides a way to select the appropriate unit for concatenation, based on the lowest total join cost of the speech unit. Discontinuities at the unit boundaries are reduced with the mel-LPC smoothing technique. The experiment was carried out for the Dravidian language Tamil, and the results demonstrate the improved intelligibility and naturalness of the proposed method. The proposed system is applicable to other languages if the syllabification rules are changed accordingly.
13.
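Given the small storage capacity and limited computing power of embedded devices, this paper designs a unit pre-selection algorithm and a unit selection algorithm based on the CART (Classification and Regression Trees) decision-tree model. They pick the most representative unit samples from the original speech corpus, which effectively reduces the size of the voice database and the complexity of the algorithm and meets the needs of embedded TTS (Text-to-Speech) systems. Based on these algorithms, an embedded Chinese TTS system is implemented on a mobile terminal; experimental results show that its synthesized speech has high intelligibility and naturalness.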
14.
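Based on a study and analysis of methods for building general speech corpora, and considering the requirements of spontaneous speech recognition research and the basic characteristics of spontaneous spoken Tibetan, this paper designs a construction scheme and a corresponding annotation specification for a spoken-speech corpus suited to Tibetan speech recognition. On this basis it builds a 50-hour spoken corpus of Lhasa Tibetan with five annotation layers: phoneme, semi-syllable, syllable, Tibetan character, and sentence. Statistics show that the corpus preserves the natural properties of spoken speech while giving balanced coverage of common speech modeling units such as phonemes and semi-syllables, providing reliable data support for research on speech recognition based on spoken Tibetan data.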
15.
Bernd Möbius 《International Journal of Speech Technology》2003,6(1):57-71
One of the most serious challenges for speech synthesis is the systematic treatment of events in language and speech that are known to have low frequencies of occurrence. The problems that extremely unbalanced frequency distributions pose for rule-based or data-driven models are often underestimated or even unrecognized. This paper discusses the problems pertinent to rare events in four components of speech synthesis systems: in linguistic text analysis, where productive word formation processes generate a potentially unbounded lexicon and cause heavily skewed word frequency distributions; in syllabification, where some syllables occur very frequently but most phonotactically possible syllables are very infrequent; in speech timing, where most constellations of factors affecting segmental duration are sparsely or not at all represented in training databases; and in unit selection synthesis, where the uneven distribution of speech unit frequencies poses challenges to speech corpus design. Currently available techniques for coping with the problem of rare or unseen events in each of these components are reviewed. Finally, a distinction is made between a strictly closed domain with a fixed vocabulary and a merely restricted domain with loopholes for unseen words and names, and the consequences of the respective type of domain for appropriate synthesis strategies are discussed.
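The effect of skewed unit frequencies on corpus design, as discussed in entry 15, can be made concrete with two simple statistics. The sketch below is illustrative only and is not taken from the paper: `unseen_rate` and `types_for_coverage` are hypothetical helper names, and units are plain strings standing in for syllables or other synthesis units.

```python
from collections import Counter
from typing import Iterable, List


def unseen_rate(train_units: Iterable[str], test_units: List[str]) -> float:
    """Fraction of test tokens whose unit type never occurs in the training data."""
    seen = set(train_units)
    return sum(1 for u in test_units if u not in seen) / len(test_units)


def types_for_coverage(units: Iterable[str], coverage: float) -> int:
    """Number of most frequent unit types needed to cover a given share of tokens."""
    counts = Counter(units)
    total = sum(counts.values())
    covered = 0
    # Walk the types from most to least frequent, accumulating token coverage.
    for rank, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered / total >= coverage:
            return rank
    return len(counts)
```

In a heavily skewed distribution, `types_for_coverage` stays small even for high coverage targets, while `unseen_rate` on open text remains above zero because most possible types are individually rare.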
16.
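Speech synthesis is an important medium in human-machine spoken interaction, and unit selection has long been a research focus in concatenative synthesis. Building on the cost-function-based unit selection algorithm of conventional concatenative synthesis, this paper applies the stable-segment boundary model of diphones to words and syllables, and then uses a hierarchical variable-length unit selection algorithm over the three unit models to pick the best unit sequence from the corpus and concatenate it into the final speech. On the one hand, the algorithm uses a hierarchical, unified variable-length selection strategy to choose, wherever possible, larger units with better prosodic properties and acoustic continuity, which significantly reduces the number of concatenation points and places points prone to coarticulation or segmentation errors inside larger units. On the other hand, segmenting at stable regions changes the boundary type of conventional concatenation units and exploits the good concatenation properties of diphone stable-segment boundaries, improving the continuity and naturalness of the synthesized speech. Evaluation results show a significant improvement in synthesis quality over the conventional diphone concatenation method.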
17.
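This paper proposes a unit-selection speech synthesis method that incorporates automatic error detection. The method aims to design a unit-selection criterion that agrees more closely with subjective listening impressions, in order to improve the naturalness of synthesized speech. First, a crowdsourcing web platform is used to quickly collect a large amount of listeners' subjective evaluation data on synthesized speech, replacing the conventional approach of collecting such data from experts with linguistic knowledge. Then, based on these data, features such as syllable duration, unit cost, and acoustic parameter distance are extracted from the corresponding speech to build a synthesis error detector based on a support vector machine. In the synthesis stage, the detector is used to re-score the N paths output by conventional unit selection and so determine the optimal unit sequence. Preference listening tests show that the method effectively improves the naturalness of synthesized speech.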
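The re-scoring step in entry 17 can be sketched roughly as follows. This is a minimal illustration under assumptions: the trained detector is replaced by an arbitrary callable `error_score`, and the way the detector output is combined with the original unit-selection cost (a weighted sum with a hypothetical `weight`) is an assumption, since the abstract does not specify it.

```python
from typing import Callable, List, Sequence, Tuple

Path = Sequence[str]  # a unit sequence, identified here by hypothetical unit IDs


def rescore_nbest(
    nbest: List[Tuple[Path, float]],
    error_score: Callable[[Path], float],
    weight: float = 1.0,
) -> Path:
    """Return the path with the lowest combined cost.

    nbest: (path, unit_selection_cost) pairs from the conventional search.
    error_score: stand-in for the trained detector; higher means the path is
        more likely to contain a synthesis error.
    weight: assumed interpolation weight, not taken from the paper.
    """
    best_path, _ = min(
        nbest, key=lambda item: item[1] + weight * error_score(item[0])
    )
    return best_path
```

In practice `error_score` would wrap a classifier trained on the crowdsourced judgments, with features such as syllable duration, unit cost, and acoustic parameter distance computed for each path.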
18.
Sanghun Kim Youngjik Lee Keikichi Hirose 《International Journal of Speech Technology》2002,5(2):105-116
This paper describes a new Korean Text-to-Speech (TTS) system based on a large speech corpus. Conventional concatenative TTS systems still produce machine-like synthetic speech. The poor naturalness is caused by excessive prosodic modification using a small speech database. To cope with this problem, we utilized a dynamic unit selection method based on a large speech database without prosodic modification. The proposed TTS system adopts triphones as synthesis units. We designed a new sentence set maximizing phonetic or prosodic coverage of Korean triphones. All the utterances were segmented automatically into phonemes using a speech recognizer. With the segmented phonemes, we achieved a synthesis unit cost of zero if two synthesis units were placed consecutively in an utterance. This reduces the number of concatenating points that may occur due to concatenating mismatches. In this paper, we present data concerning the realization of major prosodic variations through a consideration of prosodic phrase break strength. The phrase break was divided into four kinds of strength based on pause length. Using phrase break strength, triphones were further classified to reflect major prosodic variations. To predict phrase break strength on texts, we adopted an HMM-like Part-of-Speech (POS) sequence model. The performance of the model showed 73.5% accuracy for 4-level break strength prediction. For unit selection, a Viterbi beam search was performed to find the most appropriate triphone sequence, which has the minimum continuation cost of prosody and spectrum at concatenating boundaries. From the informal listening test, we found that the proposed Korean corpus-based TTS system showed better naturalness than the conventional demisyllable-based one.
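The search described in entry 18 can be illustrated with a small sketch. It is not the authors' implementation: each candidate unit is reduced to a hypothetical tuple `(utterance_id, position, spectral_value)`, the single float stands in for real prosodic and spectral features, and the beam width is an arbitrary choice. What it does follow from the abstract is that the join cost is zero when two units were consecutive in the same recorded utterance, and that a Viterbi beam search minimizes the total cost at the joins.

```python
from typing import Dict, List, Tuple

# A candidate unit: (utterance_id, position_in_utterance, spectral_value).
Unit = Tuple[int, int, float]


def join_cost(left: Unit, right: Unit) -> float:
    """Cost of concatenating two units; zero if they were adjacent in the corpus."""
    if left[0] == right[0] and left[1] + 1 == right[1]:
        return 0.0
    # Assumed stand-in for a prosodic and spectral distance at the boundary.
    return abs(left[2] - right[2])


def select_units(
    targets: List[str], inventory: Dict[str, List[Unit]], beam: int = 5
) -> Tuple[float, List[Unit]]:
    """Viterbi beam search for the unit sequence with minimum total join cost."""
    # Each hypothesis is (accumulated cost, path of chosen units).
    hyps = [(0.0, [u]) for u in inventory[targets[0]]][:beam]
    for name in targets[1:]:
        new_hyps = []
        for cand in inventory[name]:
            # Viterbi step: keep only the best predecessor for this candidate.
            cost, path = min(
                ((c + join_cost(p[-1], cand), p) for c, p in hyps),
                key=lambda item: item[0],
            )
            new_hyps.append((cost, path + [cand]))
        # Beam pruning: keep the cheapest hypotheses only.
        hyps = sorted(new_hyps, key=lambda item: item[0])[:beam]
    return min(hyps, key=lambda item: item[0])
```

Because adjacent corpus units cost nothing to join, the search naturally prefers long stretches of originally contiguous speech, which is how such a design reduces the number of audible concatenation points.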