The highest-quality synthetic voices remain scarce in both parametric and concatenative synthesis systems. Much synthetic speech lacks naturalness, pleasantness and flexibility. While great strides have been made over the past few years in the quality of synthetic speech, much work remains to be done. The major challenges now facing developers are how to provide optimal size, performance, extensibility, and flexibility, together with developing improved signal processing techniques. This paper focuses on issues of performance and flexibility against a background containing a brief evolution of speech synthesis; some acoustic, phonetic and linguistic issues; and the merits and demerits of two commonly used synthesis techniques: parametric and concatenative. Shortcomings of both techniques are reviewed. Methodological developments in the variable size, selection and specification of the speech units used in concatenative systems are explored and shown to provide a more positive outlook for more natural, bearable synthetic speech. Differentiating considerations in making and improving concatenative systems are explored and evaluated. Acoustic and sociophonetic criteria are reviewed for the improvement of variable synthetic voices, and a ranking of their relative importance is suggested. Future rewards are weighed against current technical and developmental challenges. The conclusion indicates some of the current and future applications of TTS.

The task of assigning appropriate intonation to synthetic speech is one that requires knowledge of linguistic structure as well as computational possibilities. This paper surveys the basic challenges facing the designer of a text-to-speech system, and reviews some of the perspectives on these problems that have been developed in the linguistic literature.

马强 《电脑开发与应用》2004,17(4):18-19,22
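This paper analyzes the basic structure of speech synthesis technology and of TTS engine technology and, using a voice system for criminal data records as the application, presents a concrete method for embedding TTS in 32-bit speech synthesis software developed on the VB platform.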

This article focuses on the systematic design of a segment database which has been used to support a time-domain speech synthesis system for the Greek language. A methodology is presented for the generation of a corpus containing all possible instances of the segments for the specific language. Issues such as phonetic coverage, sentence selection and iterative evaluation techniques employing custom-built tools are examined. Emphasis is placed on the comparison of the process-derived corpus to naturally occurring corpora with respect to their suitability for use in time-domain speech synthesis. The proposed methodology generates a corpus of near-minimal size that provides complete coverage of the Greek language. Furthermore, within this corpus, the distribution of segmental units is similar to that of natural corpora, allowing for the extraction of multiple units in the case of the most frequently occurring segments. The corpus creation algorithm incorporates mechanisms that enable the fine-tuning of the segment database's language-dependent characteristics and thus assists in the generation of high-quality text-to-speech synthesis.
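The abstract above does not spell out its corpus creation algorithm. As a rough illustration of how sentence selection for phonetic coverage is often approached, here is a minimal greedy set-cover sketch in Python. The function name `greedy_select` and the toy unit labels are hypothetical, and this is not claimed to be the authors' method.

```python
def greedy_select(sentences, required_units):
    """Pick sentences until every required unit is covered (greedy set cover).

    sentences: dict mapping a sentence id to the set of units it contains.
    required_units: iterable of units that the corpus should cover.
    Returns (chosen sentence ids in pick order, units left uncovered).
    """
    uncovered = set(required_units)
    chosen = []
    while uncovered:
        # Take the sentence that covers the most still-uncovered units.
        best_id = max(sentences, key=lambda s: len(sentences[s] & uncovered))
        gain = sentences[best_id] & uncovered
        if not gain:
            break  # no remaining sentence adds coverage
        chosen.append(best_id)
        uncovered -= gain
    return chosen, uncovered


# Toy data: unit labels are placeholders, not real Greek segments.
toy = {"s1": {"a", "b"}, "s2": {"b", "c", "d"}, "s3": {"d"}}
chosen, missing = greedy_select(toy, {"a", "b", "c", "d"})
```

Greedy selection does not guarantee a minimal corpus, which is one reason a real pipeline would pair it with the kind of iterative evaluation the abstract mentions.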

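Speech corpus pruning, or redundancy removal, is an important problem in large-corpus speech synthesis. This paper proposes the concept of virtual variable-length replacement to compensate for the loss of variable-length units. Combined with the frequency with which variants are used in synthesis, it forms the basis of the corpus pruning algorithm StaRp-VPA, which can prune a speech corpus at any ratio. Experiments show that the naturalness of the synthesized speech hardly declines when the pruning rate is below 50%, and does not degrade severely when the pruning rate exceeds 50%.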

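HMM-based trainable Chinese speech synthesis   Total citations: 1, self-citations: 0, citations by others: 1
This paper applies the HMM-based trainable speech synthesis method to Chinese speech synthesis. Modeling and training are improved by appropriately selecting and optimizing the HMM modeling parameters and by designing, on the basis of the characteristics of Chinese speech, a set of contextual attributes and a question set for model clustering. In comparative evaluation experiments, 98.5% of the synthesized utterances showed improved sound quality after these refinements. In addition, to address the weak rhythm of the synthesized speech, a two-layer model based on states and on initial/final units is proposed for duration modeling and prediction; the RMSE of duration prediction on out-of-set data falls from 29.56 ms to 27.01 ms. The final system produces speech that is stable and fluent overall, with a fairly strong sense of rhythm. Because the system requires very little storage, it is particularly well suited to embedded applications.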
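The duration figures in the abstract above are root-mean-square errors. For readers unfamiliar with the metric, a minimal Python sketch of the standard RMSE formula follows. The function name and the sample durations are illustrative assumptions and are not data from the paper.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences (e.g. durations in ms)."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up durations in milliseconds: each prediction is off by 3 ms.
error_ms = rmse([100.0, 83.0], [103.0, 80.0])
```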

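To enable encrypted transmission of speech over a network, a secure speech communication system based on a chaotic-sequence encryption algorithm was designed. To meet the real-time requirements of network transmission, the authors propose an equal-interval automatic step method (等间距自动步方法) that gives the transmission system strong fault tolerance. Using the Java language, the TCP/IP protocol and object-oriented programming, they implemented secure speech communication software with a symmetric C/S (client/server) model. Testing and analysis show that the software has a friendly interface, fast encryption and high security.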
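The abstract above does not say which chaotic map the system uses, and the original software is written in Java. Purely to illustrate the general idea of chaotic-sequence stream encryption, the Python sketch below derives a keystream from the logistic map and XORs it with the data. The choice of map, the parameter values and the function names are all assumptions, and the sketch is a teaching toy, not a secure cipher.

```python
def logistic_keystream(x0, r, n, burn_in=100):
    """Generate n keystream bytes by iterating the logistic map x -> r*x*(1-x)."""
    x = x0
    for _ in range(burn_in):  # discard early iterates
        x = r * x * (1.0 - x)
    out = bytearray()
    for _ in range(n):
        x = r * x * (1.0 - x)
        out.append(int(x * 256) % 256)  # quantize the state to one byte
    return bytes(out)


def xor_cipher(data, x0=0.3141592, r=3.99):
    """Encrypt or decrypt bytes; (x0, r) act as the shared key (assumed values)."""
    keystream = logistic_keystream(x0, r, len(data))
    return bytes(b ^ k for b, k in zip(data, keystream))


frame = b"speech frame bytes"
cipher = xor_cipher(frame)
plain = xor_cipher(cipher)  # applying the same keystream again restores the data
```

Because XOR is its own inverse, both ends of a symmetric client/server link can share one routine, provided they start from the same key and stay synchronized.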

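Text-to-speech conversion is an active research topic in Chinese information processing and a key technology for human-machine speech communication. This article presents a preliminary analysis and study of the whole process of Chinese text-to-speech conversion, and gives a conversion method based on a speech database together with its implementation. It describes the construction of the speech database and analyzes the content and technical difficulties of each processing stage: text input, word segmentation, text normalization, phonetic annotation, prosody processing and speech synthesis.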

This paper introduces the German text-to-speech synthesis system MARY. The system's main features, namely a modular design and an XML-based system-internal data representation, are pointed out, and the properties of the individual modules are briefly presented. An interface allowing the user to access and modify intermediate processing steps without the need for a technical understanding of the system is described, along with examples of how this interface can be put to use in research, development and teaching. The usefulness of the modular and transparent design approach is further illustrated with an early prototype of an interface for emotional speech synthesis.

王欣  吴志勇  蔡莲红 《软件学报》2014,25(S2):63-69
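Speech synthesis is an important medium for spoken human-computer interaction, and unit selection algorithms have long been a research focus in concatenative speech synthesis. Building on the conventional cost-function-based unit selection algorithm for concatenative synthesis, this paper applies the stable-segment boundary model of diphones to words and syllables, and then uses a layered variable-length unit selection algorithm over the three unit models to choose the best unit sequence from the corpus, which is concatenated into the final speech. On the one hand, the algorithm uses a layered, unified variable-length selection strategy to pick larger units with better prosodic properties and acoustic continuity wherever possible, which markedly reduces the number of concatenation points and absorbs into larger units those points where coarticulation or segmentation errors might occur. On the other hand, it changes the conventional boundary type of concatenation units through stable-segment segmentation, making full use of the good concatenation properties of diphone stable-segment boundaries and thereby improving the continuity and naturalness of the synthesized speech. Evaluation results show a significant improvement over conventional diphone concatenative synthesis.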
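The conventional cost-function-based unit selection that this abstract builds on can be sketched as a dynamic-programming search over candidate units. The Python below is a generic illustration only: the function names, the tuple representation of a unit and the toy costs are assumptions, and it does not reproduce the paper's layered variable-length algorithm.

```python
def select_units(candidates, target_cost, join_cost):
    """Return the cheapest unit sequence and its total cost.

    candidates: list of candidate-unit lists, one list per target position.
    target_cost(i, unit): mismatch between a unit and target position i.
    join_cost(prev_unit, unit): cost of concatenating two units.
    """
    # best[i][j] = (cheapest cost of a path ending in candidates[i][j], backpointer)
    best = [[(target_cost(0, u), None) for u in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for u in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(prev, u), k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(i, u), back))
        best.append(row)
    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path, total


# Toy units are (label, position in corpus); adjacent corpus units join for free.
cands = [[("a", 0), ("a", 5)], [("b", 6), ("b", 9)], [("c", 7), ("c", 2)]]
path, total = select_units(
    cands,
    target_cost=lambda i, u: 0.0,
    join_cost=lambda p, u: 0.0 if u[1] == p[1] + 1 else 1.0,
)
```

Giving adjacent corpus units a zero join cost makes the search favor contiguous stretches, which is one simple way to see why larger units mean fewer concatenation points.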

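A speech synthesis technique based on optimal phase design is proposed that effectively reduces the probability of saturation distortion caused by waveform imbalance in speech synthesized by an MBE vocoder. In addition, the extraction of line spectral frequency (LSF) coefficients is optimized to guarantee the stability of the synthesis filter. Experimental results show that the synthesized waveform is distributed approximately evenly above and below zero amplitude and that the speech is not unpleasant to listen to. The results indicate that the optimal-phase-design technique effectively improves the quality of synthesized speech.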

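This paper discusses the importance and necessity of adding annotation markup to the input documents of a speech synthesis system, and explains why a unified text annotation scheme is important for making synthesizers compatible with one another and easy to integrate with other systems.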

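Construction and analysis of TH-CoSS, a Mandarin Chinese speech synthesis corpus   Total citations: 6, self-citations: 0, citations by others: 6
This paper describes the construction and analysis of the Chinese speech synthesis corpus TH-CoSS. The corpus contains about 20,000 sentences read aloud by male and female speakers and is divided into four parts: sentences for building TTS systems, sentences for testing TTS systems, sentences with special intonation, and special syllable groups. The corpus design takes into account the balance of the material and the richness of segmental and prosodic information. In addition to text and speech data, the corpus includes segment boundary marks, and the annotation files are in XML format. Annotation software was developed specifically to facilitate speech analysis and development. The paper also reports an analysis of how contextual features affect speech prosody.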

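Speech recognition gives computers the ability to recognize the content of speech and is an important research topic in human-computer interaction. With advances in computer technology, speech recognition has matured, but recognition of dialects still has considerable room for development. China is a vast and populous country with a great many dialects, one of which is the Chongqing dialect, used by more than 30 million people. Text files and the corresponding speech files for a set of Chongqing dialect words were collected to build a corpus. Based on the pronunciation characteristics of the dialect, its initials and finals were chosen as the acoustic modeling units and the hidden Markov model (HMM) as the acoustic model, and an HMM-based Chongqing dialect speech recognition system was designed. In training, the acoustic models are trained on the training-set portion of the corpus to form an HMM model library; in recognition, the test-set portion is used for recognition tests. Experimental results show that the system can recognize Chongqing dialect speech, with a recognition accuracy of 100%.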

Individuals with severe speech disability can benefit from the use of augmentative and alternative communication (AAC) speech output assistive technology. The recent development of tools and methods for measuring AAC performance through the collection and analysis of language samples has advanced the clinical practice of this field. The definition of a new summary measure for characterizing performance of AAC systems in use is presented. The summary measure, here named rate index, is the average communication rate (in words per minute) divided by the selection rate (in bits per second) for the language sample. Thus the unit of measure for the rate index is words per bit. The rate index provides for the comparison of communication rates adjusted for differences in selection rates. Rate index comparisons can be made between individuals using similar or different systems or for one individual under different conditions. The clinical value of the rate index is the identification of opportunity for improved communication performance. Demonstrated rate index data also can serve as evidence to be used in the selection of AAC systems. The language sample data reported in this paper were collected using automated language activity monitoring (LAM).
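The rate index defined in the abstract above is a simple ratio, sketched here in Python. The abstract quotes the communication rate in words per minute and the selection rate in bits per second, so the sketch converts minutes to seconds to make the result come out in words per bit. That conversion, the function name and the sample numbers are assumptions for illustration and are not taken from the paper.

```python
def rate_index(comm_rate_wpm, selection_rate_bps):
    """Rate index in words per bit.

    comm_rate_wpm: average communication rate, words per minute.
    selection_rate_bps: selection rate, bits per second.
    """
    if selection_rate_bps <= 0:
        raise ValueError("selection rate must be positive")
    words_per_second = comm_rate_wpm / 60.0  # assumed unit conversion
    return words_per_second / selection_rate_bps


# Made-up sample: 30 words from 600 bits of selections over 120 seconds.
words, bits, seconds = 30, 600, 120
index = rate_index(words / (seconds / 60.0), bits / seconds)
```

With both rates measured over the same language sample, the time terms cancel and the index reduces to total words divided by total bits.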

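Uyghur speech synthesis based on phonemes and their feature parameters   Total citations: 4, self-citations: 0, citations by others: 4
A small-scale speech corpus consisting of Uyghur monophones and diphones is first built, and a corresponding concatenation unit selection algorithm is designed. A parameter adjustment algorithm modifies feature parameters of the unit speech signals such as duration, fundamental frequency and short-time energy, and a time-domain smoothing algorithm adjusts the speech parameters at the concatenation points, which further improves the naturalness of the synthesized speech. The algorithms were implemented in the C Sharp programming language, and the experimental results demonstrate the feasibility of the approach and the technical scheme. The system's advantages include a small corpus and relatively high intelligibility and naturalness of the synthesized speech.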

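An objective measure of the naturalness of synthetic speech   Total citations: 2, self-citations: 1, citations by others: 1
The naturalness of synthetic speech still needs improvement. Based on the current state of research, this paper proposes an objective method for evaluating the naturalness of synthetic speech. Starting from the main parameters of prosodic features, the method computes the differences in fundamental frequency, duration, intensity and other parameters between natural and synthetic speech from the same speaker. Because the fundamental frequency contours of the two kinds of speech are not aligned in time, the DTW (Dynamic Time Warping) algorithm is used to time-warp and align them. The computed results are then compared with the results of subjective evaluation (MOS). The experimental data show that the proposed fundamental frequency contour distortion measure correlates strongly with MOS, and that evaluation from the perspective of prosodic features can measure the naturalness of synthetic speech.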
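To make the alignment step concrete, here is a minimal Python sketch of textbook DTW applied to two fundamental frequency contours. The absolute-difference local cost, the function name and the sample contours are assumptions, and the paper's actual distortion measure may be defined differently.

```python
def dtw(a, b):
    """Classic dynamic time warping between two numeric sequences.

    Returns (accumulated distance, alignment path as a list of (i, j) index pairs).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] = cheapest cost of aligning a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])  # local distance (assumed)
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(D[i - 1][j - 1], i - 1, j - 1),
                 (D[i - 1][j], i - 1, j),
                 (D[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return D[n][m], path


# Made-up F0 contours in Hz: the synthetic one holds one frame longer.
natural = [100.0, 120.0, 140.0, 120.0]
synthetic = [100.0, 120.0, 120.0, 140.0, 120.0]
distance, alignment = dtw(natural, synthetic)
```

A distance of zero here means the two contours differ only in timing, so any remaining distance after warping can be read as pitch distortion rather than misalignment.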

The development of an audiovisual pronunciation teaching and training method and software system is discussed in this article. The method is designed to help children with speech and hearing disorders gain better control over their speech production. The teaching method is drawn up for progression from individual sound preparation to practice of sounds in sentences for four languages: English, Swedish, Slovenian, and Hungarian. The system is a general language-independent measuring tool and database editor. This database editor makes it possible to construct modules for all participant languages and for different sound groups. Two modules are under development for the system in all languages: one for teaching and training vowels to hearing-impaired children and the other for correction of misarticulated fricative sounds. In the article we present the measuring methods, the distance score calculations used for the visualized speech spectra, and problems in the evaluation of the new multimedia tool.

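A Chinese speech synthesis system supporting stress synthesis   Total citations: 1, self-citations: 1, citations by others: 1
For stress prediction and realization in a unit-selection-based Chinese speech synthesis system, this paper adopts a knowledge-guided, data-driven modeling strategy. First, a stress detector optimized with perceptual results is used to annotate the speech database automatically. Second, the stress-annotated database is used to train a prosody prediction model that supports stress prediction. This model then replaces the corresponding model in the original synthesis system, yielding a speech synthesis system that supports stress synthesis. Analysis of the experimental results shows that the annotations produced by the perceptually optimized stress detector are reliable, that the stress-aware prosodic-acoustic prediction model is sound, and that the new system can synthesize speech with variation between stressed and unstressed syllables.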
