1.

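Wang Xin, Wu Zhiyong, Cai Lianhong 《Journal of Software》2014,25(S2):63-69

Speech synthesis is an important medium for human-machine spoken interaction, and unit selection algorithms have long been a research focus in concatenative speech synthesis. Building on the traditional cost-function-based unit selection algorithm for concatenative synthesis, the paper applies the stable-segment boundary model of diphones to words and syllables, and then uses a hierarchical variable-length unit selection algorithm over the three unit models to pick the best unit sequence from the corpus and concatenate it into the final speech. On the one hand, the algorithm uses a hierarchical, unified variable-length selection strategy that prefers larger units with better prosodic properties and acoustic continuity. This significantly reduces the number of concatenation points and places points that are prone to coarticulation or segmentation errors inside larger units. On the other hand, it changes the traditional unit boundary type through stable-segment segmentation, exploiting the good concatenation properties of diphone stable-segment boundaries and thereby improving the continuity and naturalness of the synthesized speech. Evaluation results show a significant improvement in synthesis quality over traditional diphone concatenative synthesis.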
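The cost-function-based unit selection that this abstract builds on can be illustrated with a generic dynamic-programming search. The sketch below is not the paper's hierarchical variable-length algorithm. The function name `select_units`, the two cost callbacks, and the toy numeric "units" are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, TypeVar

U = TypeVar("U")


def select_units(
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[int, U], float],
    join_cost: Callable[[U, U], float],
) -> List[U]:
    """Pick one candidate per position so that the sum of target costs
    and join (concatenation) costs is minimal, using a Viterbi search."""
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every target position needs at least one candidate")

    # best[j]: lowest total cost of any path ending in candidate j
    # of the position currently being processed.
    best = [target_cost(0, u) for u in candidates[0]]
    back: List[List[int]] = []  # back[i-1][j]: best predecessor index for position i

    for i in range(1, len(candidates)):
        new_best: List[float] = []
        pointers: List[int] = []
        for u in candidates[i]:
            costs = [best[k] + join_cost(p, u)
                     for k, p in enumerate(candidates[i - 1])]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[k_min] + target_cost(i, u))
            pointers.append(k_min)
        best = new_best
        back.append(pointers)

    # Trace the cheapest path back from the last position.
    j = min(range(len(best)), key=best.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[i][j] for i, j in enumerate(path)]


# Hypothetical toy data: each "unit" is just a pitch value in Hz.
targets = [100, 110, 120]
cands = [[90, 100], [105, 140], [118, 150]]
print(select_units(cands,
                   lambda i, u: abs(u - targets[i]),  # distance to the target
                   lambda a, b: abs(a - b)))          # pitch jump at the join
# prints [100, 105, 118]
```

In a real system the candidates would be corpus units of different sizes, and both costs would combine several prosodic and spectral distances. Preferring larger units, as the abstract describes, reduces the number of joins that contribute a join cost at all.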

2.

Optimal Utterance Selection for Unit Selection Speech Synthesis Databases

Alan W. Black, Kevin Lenzo 《International Journal of Speech Technology》2003,6(4):357-363

This paper describes techniques to find an optimal data set for building high quality unit-selection speech synthesis inventories. As the quality of unit-selection speech synthesis is dependent on the coverage of the database used in the selection, it is important to select the right data to record. In this paper we describe some simple techniques as well as a more complex acoustic modeling technique based on the database speaker's acoustic characteristics. Results of a simple evaluation procedure are presented justifying the technique.

3.

Speech Database Design for a Concatenative Text-to-Speech Synthesis System for Individuals with Communication Disorders

Akemi Iida, Nick Campbell 《International Journal of Speech Technology》2003,6(4):379-392

4.

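A Speech Database Pruning Method Based on Virtual Variable-Length Units

Zhang Wei, Wu Xiaoru, Zhao Zhiwei, Wang Renhua 《Journal of Software》2006,17(5):983-990

Speech database pruning, or redundancy removal, is an important problem in large-corpus speech synthesis. The paper introduces the concept of virtual variable-length replacement to compensate for the loss of variable-length units. Combining this with the frequency at which variants are used in synthesis, it constructs the pruning algorithm StaRp-VPA, which can prune a speech database at any ratio. Experiments show that synthesis naturalness barely declines when the pruning rate is below 50%, and does not degrade severely when it is above 50%.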

5.

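Construction and Analysis of TH-CoSS, a Mandarin Chinese Speech Synthesis Corpus  Total citations: 6, self-citations: 0, citations by others: 6

Cai Lianhong, Cui Dandan, Cai Rui 《Journal of Chinese Information Processing》2007,21(2):94-99

This paper describes the construction and analysis of TH-CoSS, a Chinese speech synthesis corpus. The corpus contains about 20,000 read sentences recorded by male and female speakers. It is divided into four parts: sentences for building TTS systems, sentences for testing TTS systems, sentences with special intonation, and special syllable groups. The corpus design takes into account balance of the material and richness of segmental and prosodic information. Besides text and speech data, the corpus includes segment boundary labels, and the annotation files are in XML format. Annotation software was developed to facilitate speech analysis and development. The paper also reports an analysis of how contextual features affect speech prosody.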

6.

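Uyghur Speech Synthesis Based on Phonemes and Their Feature Parameters  Total citations: 4, self-citations: 0, citations by others: 4

Guljamal Mamateli, Askar Hamdulla 《Journal of Chinese Information Processing》2008,22(4):100-104

A small speech corpus consisting of Uyghur monophones and diphones is first built, and a corresponding concatenation unit selection algorithm is designed. A parameter adjustment algorithm adjusts feature parameters of the unit speech signals such as duration, fundamental frequency, and short-time energy, and a time-domain smoothing algorithm adjusts the speech parameters at concatenation points, further improving the naturalness of the synthesized speech. The algorithms were implemented in C#, and the experimental results demonstrate the feasibility of the research approach and technical scheme. The system has the advantages of a small corpus and relatively high intelligibility and naturalness of the synthesized speech.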

7.

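An Objective Measure of Synthesized Speech Naturalness  Total citations: 2, self-citations: 1, citations by others: 1

Zhao Bo, Cai Lianhong 《Computer Engineering and Applications》2005,41(7):32-33,152

The naturalness of synthesized speech still needs improvement. Based on the current state of research, the paper proposes an objective method for evaluating the naturalness of synthesized speech. Starting from the main parameters of prosodic features, the method computes the differences in fundamental frequency (F0), duration, and intensity between natural and synthesized speech from the same speaker. Because the F0 contours of the two kinds of speech are not aligned in time, the DTW (Dynamic Time Warping) algorithm is used to time-warp and align them. The computed results are then compared with subjective evaluation (MOS) results. Experimental data show that the proposed F0 contour distortion measure correlates strongly with MOS, and that evaluation from the perspective of prosodic features can measure the naturalness of synthesized speech.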
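As a rough illustration of the DTW alignment step mentioned in this abstract, the sketch below aligns two F0 contours and returns the accumulated distance along the best warping path. The function name `dtw_align`, the absolute-difference local cost, and the unconstrained three-way step pattern are assumptions. The abstract does not specify the paper's exact distortion measure.

```python
from typing import List, Sequence, Tuple


def dtw_align(a: Sequence[float], b: Sequence[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Align two contours with classic DTW.

    Returns the accumulated absolute difference along the best path and
    the path itself as (index_in_a, index_in_b) pairs.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both contours must be non-empty")

    inf = float("inf")
    # cost[i][j]: cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance

    # Backtrack from the end to recover the warping path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
        path.append((i - 1, j - 1))
    path.reverse()
    return cost[n][m], path


# Hypothetical F0 contours in Hz: the second holds its first value one frame longer.
natural = [100.0, 110.0, 120.0]
synthetic = [100.0, 100.0, 110.0, 120.0]
distance, path = dtw_align(natural, synthetic)
print(distance, path)
# prints 0.0 [(0, 0), (0, 1), (1, 2), (2, 3)]
```

Dividing the accumulated distance by the path length gives a per-frame F0 distortion that could then be correlated with MOS scores, which is the kind of comparison the abstract describes.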

8.

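A Chinese Speech Synthesis Method Based on Coarticulation

Zhang Qin, Li Hui, Dai Beiqian 《Journal of Chinese Computer Systems》2003,24(6):1091-1094

In the waveform-coding synthesis methods commonly used for Chinese speech synthesis, the single syllable usually serves as the synthesis unit. However, transitions at syllable junctions are often poor during synthesis, so the naturalness of the synthesized speech suffers. To address this problem, the paper studies coarticulation in Chinese and proposes a new unit selection strategy that adds some syllable-junction segments from natural speech as synthesis units alongside the single-syllable units. Speech synthesized with this strategy and the TD-PSOLA algorithm is considerably more natural than speech produced by the usual waveform synthesis method.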

9.

Merge-Weighted Dynamic Time Warping for Speech Recognition

Zhang Xianglilan, Luo Zhigang, Li Ming 《Journal of Computer Science and Technology》2014,29(6):1072-1082

Obtaining training material for rarely used English words and common given names from countries where English is not spoken is difficult due to excessive time, storage and cost factors. By considering pe...

10.

A Speech Synthesis Model Based on the Description of Chinese Rhythm Features

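Wu Bingya, Ju Chunhua 《Computer Engineering & Science》2007,29(10):128-131

Appropriate use of Chinese rhythm allows synthesized speech to convey the correct meaning and emotional coloring of a discourse. This paper presents a speech synthesis model based on the description of Chinese rhythm features. It first covers the analysis and extraction of rhythm features such as pause and lengthening, word stress, sentence stress, tone sandhi, and tone patterns, and describes the various cases of each feature in detail. It then explains the rhythm-based speech synthesis algorithm model, including the processing pipeline of word segmentation, tagging, analysis, pattern determination, correction, and output, and the generation of the acoustic parameter sequence {(h, l, s)} of the synthesized speech. Finally, experimental results and analysis of the model are given.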

11.

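A Corpus-Based Unit Selection Algorithm for Tibetan Speech Synthesis

Cairang Zhuoma, Cai Zhijie 《Journal of Chinese Information Processing》2017,31(5):59-63

In corpus-based speech synthesis, the quality of synthesis unit selection directly affects the naturalness and fluency of the synthesized speech. Based on the characteristics of the Tibetan language and script, the paper proposes a mixed-unit synthesis strategy that combines basic components, compound components, characters, words, and sentence units, together with a mixed-unit selection algorithm for Tibetan speech synthesis. Subjective and objective evaluation data indicate that the strategy and algorithm are effective and sound, and that both the coverage of each unit type on open text and the synthesis quality meet the expected targets.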

12.

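MBE Vocoder Speech Synthesis with Optimal Phase Design

Zhou Qunqun, Ma Yong, Wang Shengqing, Wang Hongyuan 《Computer & Digital Engineering》2012,40(9):21-23,35

The paper proposes a speech synthesis technique based on optimal phase design that effectively reduces the probability of saturation distortion caused by waveform imbalance in speech synthesized by an MBE vocoder. In addition, the extraction of line spectral frequency (LSF) coefficients is optimized to ensure the stability of the synthesis filter. Experimental results show that the synthesized waveform is distributed roughly evenly above and below zero amplitude, that the speech is not unpleasant to listen to, and that the technique effectively improves the quality of the synthesized speech.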

13.

Trainable Articulatory Control Models for Visual Speech Synthesis

Jonas Beskow 《International Journal of Speech Technology》2004,7(4):335-349

This paper deals with the problem of modelling the dynamics of articulation for a parameterised talking head based on phonetic input. Four different models are implemented and trained to reproduce the articulatory patterns of a real speaker, based on a corpus of optical measurements. Two of the models (Cohen-Massaro and Öhman) are based on coarticulation models from speech production theory and two are based on artificial neural networks, one of which is specially intended for streaming real-time applications. The different models are evaluated through comparison between predicted and measured trajectories, which shows that the Cohen-Massaro model produces trajectories that best match the measurements. A perceptual intelligibility experiment is also carried out, where the four data-driven models are compared against a rule-based model as well as an audio-alone condition. Results show that all models give significantly increased speech intelligibility over the audio-alone case, with the rule-based model yielding the highest intelligibility score.

14.

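Trainable Chinese Speech Synthesis Based on HMM  Total citations: 1, self-citations: 0, citations by others: 1

Wu Yijian, Wang Renhua 《Journal of Chinese Information Processing》2006,20(4):77-83

This paper applies HMM-based trainable speech synthesis to Chinese. Modeling and training are improved by careful selection and optimization of the HMM modeling parameters, and by designing a context attribute set and a question set for model clustering based on the characteristics of Chinese speech. In a comparative evaluation, 98.5% of the synthesized utterances had better sound quality after the improvements. To address the weak sense of rhythm in synthesized speech, a two-layer model based on states and initial/final units is proposed for duration modeling and prediction, reducing the out-of-set duration prediction RMSE from 29.56 ms to 27.01 ms. The final system produces speech that is stable and fluent overall, with a fairly strong sense of rhythm. Because the system needs very little storage, it is particularly suitable for embedded applications.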
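The duration-prediction RMSE quoted in this abstract is the root of the mean squared difference between predicted and reference durations. A minimal sketch of that metric follows. The function name `rmse` and the sample durations are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Root-mean-square error between two equally long sequences."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical unit durations in milliseconds.
print(rmse([100.0, 120.0], [103.0, 116.0]))  # sqrt((9 + 16) / 2)
```

Because errors are squared before averaging, a few badly mispredicted durations raise the RMSE more than many small errors do.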

15.

Basic Research and Implementation Decisions for a Text-to-Speech Synthesis System in Romanian

Dragos Burileanu 《International Journal of Speech Technology》2002,5(3):211-225

16.

Stressed Syllable Determination for Romanian Words within Speech Synthesis Applications

Eugeniu Oancea, Adriana Badulescu 《International Journal of Speech Technology》2002,5(3):237-246

This paper proposes and experimentally evaluates a method to determine the stressed syllable of a word in the framework of speech synthesis in Romanian. In order to produce high quality speech, a speech synthesis system needs information about the position of the stress for each word of a sentence to be generated. Otherwise, incorrect positioning of stress (or, in the worst case, completely ignoring it) translates into poor quality synthesized speech. Since Romanian is a free-stressed language (as is English, for example), the position of the stressed syllable within a word is not clearly defined. Consequently, a set of explicit rules that can determine the exact position of the stress is difficult to generate. In order to solve this problem, we propose an original method to find stressing rules for the Romanian language as well as an algorithm to implement this method. According to this algorithm, the position of the stressed syllable is computed according to a number of word parameters encompassing morphologic, phonetic, and lexical characteristics of the word. The experimental results show that the errors of the automatic stress assignment using our method do not exceed 6%.

17.

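A Chinese Speech Synthesis System Supporting Stress Synthesis  Total citations: 1, self-citations: 1, citations by others: 1

Zhu Weibin 《Journal of Chinese Information Processing》2007,21(3):122-128

For stress prediction and realization in unit-selection-based Chinese speech synthesis, this paper adopts a knowledge-guided, data-driven modeling strategy. First, a stress detector optimized against perceptual results is used to annotate the speech database automatically. Second, the stress-annotated database is used to train a prosody prediction model that supports stress prediction. This model replaces the corresponding model in the original synthesis system, yielding a system that supports stress synthesis. Analysis of the experimental results indicates that the annotations produced by the perceptually optimized stress detector are reliable, that the stress-aware prosodic-acoustic prediction model is reasonable, and that the new system can synthesize speech with variation between light and heavy stress.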

18.

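A High-Quality Acoustic Model for Speech Synthesis Based on Multi-Source Excitation

Tao Jianhua, Kang Yongguo 《Journal of Chinese Information Processing》2004,18(3):74-81

Traditional parametric speech synthesis systems mostly use a plain source-filter model with little variation, which usually causes considerable loss of sound quality when prosody varies widely or when a particular tone of voice is generated. Building on the speech inverse-filtering process, this paper carefully compares and analyzes how the voice source changes under different prosodic and timbre conditions, and reconstructs and classifies the source to form a Multi-Source (MS) excitation model suited to a variety of prosodic and timbre characteristics. An acoustic model for speech synthesis is then built on this multi-source excitation. To some extent it substantially improves synthesis quality across wide variations in tone of voice, and it helps advance personalized speech synthesis and the construction of very small-footprint synthesis systems.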

19.

A Multimedia, Multilingual Teaching and Training System for Children with Speech Disorders

K. Vicsi, P. Roach, A. Öster, Z. Kacic, P. Barczikay, A. Tantos, F. Csatári, Zs. Bakcsi, A. Sfakianaki 《International Journal of Speech Technology》2000,3(3-4):289-300

The development of an audiovisual pronunciation teaching and training method and software system is discussed in this article. The method is designed to help children with speech and hearing disorders gain better control over their speech production. The teaching method is drawn up for progression from individual sound preparation to practice of sounds in sentences for four languages: English, Swedish, Slovenian, and Hungarian. The system is a general language-independent measuring tool and database editor. This database editor makes it possible to construct modules for all participant languages and for different sound groups. Two modules are under development for the system in all languages: one for teaching and training vowels to hearing-impaired children and the other for correction of misarticulated fricative sounds. In the article we present the measuring methods, the distance score calculations used for the visualized speech spectra, and problems in the evaluation of the new multimedia tool.

20.

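A New Threshold-Based Speech Enhancement Algorithm in the Wavelet Domain  Total citations: 1, self-citations: 0, citations by others: 1

Xu Shuang, Han Fangfang, Zheng Dezhong 《Chinese Journal of Sensors and Actuators》2004,17(1):150-153

A new threshold-based speech enhancement algorithm in the wavelet domain is proposed. Noisy speech is decomposed with Bark-scale wavelet packets to model the auditory characteristics of the human ear. A node threshold method is used, with the noise at each node estimated by a spectral-entropy-based method. Experiments show that the algorithm enhances speech well under many kinds of noise, especially colored and non-stationary noise.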
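Threshold-based wavelet denoising generally shrinks the coefficients of each decomposition node toward zero, on the assumption that small coefficients are mostly noise. The sketch below shows the common soft-thresholding rule applied to one node's coefficients. The abstract does not say which thresholding rule or threshold value the paper uses, so the rule, the function name `soft_threshold`, and the sample values are all assumptions.

```python
from typing import List, Sequence


def soft_threshold(coeffs: Sequence[float], threshold: float) -> List[float]:
    """Shrink each coefficient toward zero by `threshold`;
    coefficients whose magnitude is at or below it become 0."""
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    out: List[float] = []
    for c in coeffs:
        magnitude = abs(c) - threshold
        if magnitude <= 0:
            out.append(0.0)
        else:
            out.append(magnitude if c > 0 else -magnitude)
    return out


# Hypothetical coefficients of one wavelet-packet node.
print(soft_threshold([3.0, -0.5, -2.0, 0.2], 1.0))
# prints [2.0, 0.0, -1.0, 0.0]
```

In a node-threshold scheme such as the one the abstract describes, a separate threshold would be chosen for each node from that node's estimated noise level, and the signal would be reconstructed from the shrunken coefficients.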