
1.

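朱秀红, 于振华, 王煦法. 《计算机应用》, 2004, 24(7): 64-65, 68

This paper introduces variable-length unit selection and concatenation into an existing embedded synthesis system to improve the naturalness of the synthesized speech. A clustering algorithm prunes the variable-length units in the speech database, which lowers the complexity of the selection algorithm and reduces the system's resource consumption, with the aim of reaching the best balance between resource consumption and synthesis quality.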

2.

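Data-driven Chinese text-to-visual-speech synthesis (total citations: 7; self-citations: 0; citations by others: 7)

王志明, 蔡莲红, 艾海舟. 《软件学报》, 2005, 16(6): 1054-1063

A text-to-visual-speech (TTVS) synthesis system can improve the intelligibility of speech and make human-computer interfaces friendlier. This paper presents a Chinese text-to-visual-speech synthesis system based on a data-driven (sample-based) approach, which generates new visual speech by concatenating short video segments. It gives an effective method for constructing visual confusion trees for Chinese initials and finals, and proposes a coarticulation model based on the visual confusion tree and a hardness factor; the model can be used for corpus selection in the analysis stage and for unit selection in the synthesis stage. Where the two frames at a concatenation boundary differ visibly, image morphing is used to smooth the transition. Combined with an existing text-to-speech (TTS) system, a Chinese text-to-visual-speech synthesis system is implemented.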

3.

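Target cost construction for a hybrid unit-selection speech synthesis system

蔡文彬, 魏云龙, 徐海华, 潘林. 《计算机工程与应用》, 2018, 54(24): 20-25

Units for synthesized speech are selected by minimizing the target cost and the concatenation cost. Because concatenation units involve complex linguistic and acoustic properties, choosing acoustic (or linguistic) features that accurately describe unit information and building the corresponding target cost is key to improving synthesis quality. The paper investigates target cost construction from two angles: acoustic features and acoustic models. Experimental results show that bottleneck features predicted by a deep acoustic network model, trained on similar corpora and then fine-tuned, represent the properties of concatenation units better. They therefore guide the target cost toward ideal candidate units and improve the quality of the synthesized speech.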
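The selection criterion in this abstract (minimize target cost plus concatenation cost) is commonly solved with dynamic programming over the candidate lattice. The following is a minimal sketch under stated assumptions, not the paper's implementation: each candidate unit is reduced to a single hypothetical scalar feature, and `target_cost` and `join_cost` are placeholder functions supplied by the caller rather than the bottleneck-feature costs the paper studies.

```python
from typing import Callable, List, Sequence, Tuple


def select_units(
    candidates: Sequence[Sequence[float]],
    targets: Sequence[float],
    target_cost: Callable[[float, float], float],
    join_cost: Callable[[float, float], float],
) -> Tuple[float, List[int]]:
    """Return (total cost, index of the chosen candidate at each position).

    candidates[i] lists the candidate units for position i; each unit is a
    single float here (an assumption made only to keep the sketch short).
    """
    # best[i][j]: minimal cost of any path that ends at candidate j of position i
    best = [[target_cost(targets[0], c) for c in candidates[0]]]
    back: List[List[int]] = [[-1] * len(candidates[0])]

    for i in range(1, len(candidates)):
        row: List[float] = []
        ptr: List[int] = []
        for c in candidates[i]:
            # Cost of reaching c from every candidate at the previous position
            costs = [
                best[i - 1][k] + join_cost(p, c)
                for k, p in enumerate(candidates[i - 1])
            ]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k_min] + target_cost(targets[i], c))
            ptr.append(k_min)
        best.append(row)
        back.append(ptr)

    # Trace the cheapest path backwards from the last position
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path = [j]
    for i in range(len(candidates) - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return total, path


def abs_diff(a: float, b: float) -> float:
    """Placeholder target cost: absolute feature difference."""
    return abs(a - b)


def half_abs_diff(a: float, b: float) -> float:
    """Placeholder concatenation cost: half the absolute feature difference."""
    return 0.5 * abs(a - b)
```

For example, `select_units([[100.0, 120.0], [90.0, 111.0]], [100.0, 110.0], abs_diff, half_abs_diff)` picks the first candidate at position 0 and the second at position 1, because that pair has the lowest combined cost.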

4.

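A pitch smoothing algorithm based on exponential weighting

屈小刚, 蒋保臣. 《计算机工程与设计》, 2006, 27(17): 3265-3266, 3308

The paper proposes a pitch smoothing technique for speech segments in speech synthesis. In waveform-concatenation synthesis, the TD-PSOLA algorithm is generally used to modify fundamental frequency and duration. However, conventional TD-PSOLA modifies the fundamental frequency of a segment as a whole, so it does not resolve the F0 discontinuity between concatenated units well, particularly at segment joins. Because the unit segments are extracted from corpus material in different contexts, the pitch of the synthesized speech sounds noticeably unnatural. The paper improves on conventional TD-PSOLA: the segment signal is divided into frames at pitch-period intervals, and corresponding frames are smoothed by exponential weighting. Listening tests indicate that the method handles the discontinuity between concatenated segments well.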
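The abstract does not give the weighting formula, so the following is only an illustrative sketch of exponentially weighted smoothing across a join, not the authors' algorithm. It assumes each segment is represented by a list of per-frame pitch periods, that the smoothing target is the midpoint of the two periods meeting at the join, and that the weight decays as `exp(-alpha * d)` with frame distance `d` from the join; the function name `smooth_join` and the parameter `alpha` are hypothetical.

```python
import math
from typing import List


def smooth_join(left: List[float], right: List[float], alpha: float = 0.5) -> List[float]:
    """Smooth per-frame pitch periods across the join between two segments.

    Frames next to the join are pulled strongly toward a shared target value;
    the pull decays exponentially with distance from the join (an assumption).
    """
    # Shared target: midpoint of the two pitch periods that meet at the join
    target = 0.5 * (left[-1] + right[0])

    smoothed_left = []
    for d, period in enumerate(reversed(left)):  # d == 0 is the frame at the join
        w = math.exp(-alpha * d)
        smoothed_left.append((1.0 - w) * period + w * target)
    smoothed_left.reverse()

    smoothed_right = []
    for d, period in enumerate(right):  # d == 0 is the frame at the join
        w = math.exp(-alpha * d)
        smoothed_right.append((1.0 - w) * period + w * target)

    return smoothed_left + smoothed_right
```

With two flat segments such as `[100.0, 100.0, 100.0]` and `[120.0, 120.0, 120.0]`, the abrupt step at the join becomes a gradual ramp, while frames far from the join stay close to their original values. A real system would then resynthesize the waveform from the adjusted periods, for instance with TD-PSOLA, which this sketch does not attempt.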

5.

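A Chinese speech synthesis method based on coarticulation

张钦, 李辉, 戴蓓倩. 《小型微型计算机系统》, 2003, 24(6): 1091-1094

Among the waveform-coding synthesis methods commonly used for Chinese speech synthesis, the single syllable is usually taken as the basic synthesis unit. However, transitions at syllable joins are often poor during synthesis, so the naturalness of the synthesized speech suffers. To address this, the paper studies coarticulation in Chinese and proposes a new strategy for choosing synthesis units: in addition to single-syllable units, some syllable-transition segments taken from natural speech are added as synthesis units. When this strategy is combined with the TD-PSOLA algorithm, the naturalness of the synthesized speech improves considerably over the usual waveform synthesis method.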

6.

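A Uyghur speech synthesis method with multiple unit types and prosodic parameter matching

姑丽加玛丽·麦麦提艾力, 艾斯卡尔·肉孜, 艾斯卡尔·艾木都拉. 《计算机工程与应用》, 2012, 48(2): 116-118

The syllable is the smallest pronunciation unit in Uyghur, so most Uyghur speech synthesis systems use syllables as the basic synthesis unit. However, Uyghur has a very large number of syllables, and a corpus can hardly guarantee coverage of every syllable, which makes the synthesized speech unstable and discontinuous. To address the instability, the paper proposes a unit selection algorithm that combines two unit types, monophones and triphones. Adding prosodic parameter matching to the unit selection module selects the units with the best prosodic match and addresses the discontinuity. Experimental results show that the proposed method effectively resolves the instability and discontinuity of the synthesized speech and thereby improves its naturalness.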

7.

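Extraction of prosodic parameters of Tibetan speech based on association rules

李勇, 于洪志, 达哇彭措. 《微计算机信息》, 2009, 25(6)

Prosodic rules matter for both speech recognition and speech synthesis, and whether the prosodic feature parameters are described correctly directly affects the output of a synthesis system. To improve the naturalness of Tibetan speech synthesis, the paper uses association rules from data mining to discover relationships among prosodic parameters, and applies an association rule algorithm to obtain the rules governing F0 variation among the Tibetan prosodic parameters. These rules can help unit selection in a Tibetan speech synthesis system.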

8.

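A study of fundamental frequency among the prosodic parameters in emotional speech synthesis

王敬华, 刘建银, 张国燕, 赵新想. 《小型微型计算机系统》, 2013, 34(9)

Emotional speech synthesis is a current research focus in speech synthesis. Among the many factors studied, building an appropriate prosodic model and choosing good prosodic parameters are key: whether they are described correctly directly affects the output of emotional speech synthesis. To tackle the difficulty of improving the naturalness of emotional speech, the paper analyzes the prosodic parameters that affect emotional speech synthesis and builds an association-rule-based F0 model of emotional speech prosody. By studying association rules and improving the Apriori data mining algorithm, it obtains the rules of F0 variation among the prosodic parameters, which provide guidance for unit selection in emotional speech synthesis.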
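To make the association-rule step concrete, here is a minimal sketch of the plain, textbook Apriori procedure for frequent itemsets plus rule confidence. It is not the improved Apriori variant the paper describes, whose changes the abstract does not specify. The item labels used in the usage note (such as `tone=high` and `f0=rise`) are hypothetical stand-ins for discretized prosodic attributes.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support is at least min_support."""
    n = len(transactions)
    frequent: Dict[FrozenSet[str], float] = {}
    # Level 1 candidates: every single item that occurs in the data
    level = {frozenset([item]) for t in transactions for item in t}
    k = 1
    while level:
        kept: Dict[FrozenSet[str], float] = {}
        for itemset in level:
            support = sum(itemset <= t for t in transactions) / n
            if support >= min_support:
                kept[itemset] = support
        frequent.update(kept)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets
        joined = {a | b for a in kept for b in kept if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        level = {
            c for c in joined
            if all(frozenset(s) in kept for s in combinations(c, k - 1))
        }
    return frequent


def confidence(
    frequent: Dict[FrozenSet[str], float],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> float:
    """Confidence of the rule antecedent -> consequent (both must be frequent)."""
    return frequent[antecedent | consequent] / frequent[antecedent]
```

Given four toy transactions, two containing `tone=high` and `f0=rise`, one containing `tone=low` and `f0=fall`, and one containing `tone=high` and `f0=fall`, a minimum support of 0.5 keeps the pair `{tone=high, f0=rise}`, and the rule `f0=rise -> tone=high` holds in every transaction that contains `f0=rise`.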

9.

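Research on an emotional speech synthesis algorithm based on prosodic feature parameters (total citations: 1; self-citations: 0; citations by others: 1)

何凌, 黄华, 刘肖珩. 《计算机工程与设计》, 2013, 34(7)

To synthesize more natural emotional speech, the paper proposes an emotional speech synthesis system based on acoustic prosodic parameters of the speech signal and the time-domain pitch-synchronous overlap-add algorithm. The experiments analyze the prosodic parameters of four emotions in an emotional speech database (anger, boredom, happiness, and sadness), build a template for each of the four emotions, and use waveform-concatenation synthesis with the time-domain pitch-synchronous overlap-add algorithm to synthesize speech signals carrying the target emotion. Experimental results show that adjusting the prosodic feature parameters of speech in a natural state with the waveform-concatenation algorithm can produce reasonably good emotional speech. The synthesized target emotional speech is clearly emotionally colored, and listeners identify its emotion category with relatively high accuracy.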

10.

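A variable-length hierarchical clustering method for speech database pruning

张巍, 吴晓如, 刘江, 王仁华. 《计算机学报》, 2007, 30(11): 2017-2024

Heavy use of variable-length units is an important guarantee of quality in large-corpus speech synthesis, yet speech database pruning methods usually cause a loss of variable-length units. To address this key problem, the paper constructs the NuClustering-VPA algorithm: it clusters variable-length variants at different granularities and adjusts the clustering of lower-order variants according to the higher-order clustering results, so that the lower-order cluster centers are biased accordingly. NuClustering-VPA retains the most important variable-length units and thus effectively reduces the damage that pruning does to them. Listening tests show that with NuClustering-VPA, even at a database pruning rate of 39.63%, synthesis naturalness drops only slightly and stays at a high level. The technique has been applied in actual speech products of iFLYTEK (科大讯飞).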

11.

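A decision-tree-based method for speech database construction and unit selection (total citations: 2; self-citations: 1; citations by others: 2)

叶振兴, 蔡莲红. 《计算机工程》, 2006, 32(10): 189-190, 220

Given the small storage capacity and limited computing power of embedded devices, the paper designs a unit pre-selection algorithm and a unit selection algorithm based on the CART (Classification and Regression Trees) decision tree model. They pick the most representative unit samples from the original speech corpus, which effectively reduces the size of the speech database and the complexity of the algorithm and meets the needs of an embedded TTS (Text-to-Speech) system. Based on these algorithms, an embedded Chinese TTS system is implemented on a mobile terminal. Experimental results show that its synthesized speech has high intelligibility and naturalness.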

12.

Syllable based text to speech synthesis system using auto associative neural network prosody prediction (total citations: 1; self-citations: 0; citations by others: 1)

Sudhakar Sangeetha, Sekar Jothilakshmi. 《International Journal of Speech Technology》, 2014, 17(2): 91-98

This paper presents the design and development of an Auto Associative Neural Network (AANN) based unrestricted prosodic information synthesizer. An unrestricted Text To Speech (TTS) system is capable of synthesizing speech from different domains with improved quality. This paper deals with a corpus-driven text-to-speech system based on the concatenative synthesis approach. Concatenative speech synthesis involves the concatenation of basic units to synthesize intelligible, natural-sounding speech. A corpus-based method (unit selection) uses a large inventory to select the units and concatenate them. Prosody prediction is done with a five-layer auto associative neural network, which helps to improve the quality of speech synthesis. Here syllables are used as the basic unit of the speech synthesis database. The database consisting of the units along with their annotated information is called the annotated speech corpus. A clustering technique is applied to the annotated speech corpus; it provides a way to select the appropriate unit for concatenation, based on the lowest total join cost of the speech unit. Discontinuities at the unit boundaries are reduced by using the mel-LPC smoothing technique. The experiment was carried out for the Dravidian language Tamil, and the results demonstrate the improved intelligibility and naturalness of the proposed method. The proposed system is applicable to other languages if the syllabification rules are changed.

13.

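A unit-selection speech synthesis method incorporating automatic error detection

孙晓辉, 凌震华, 戴礼荣. 《数据采集与处理》, 2016, 31(2): 385-392

The paper proposes a unit-selection speech synthesis method that incorporates automatic error detection. The aim is to design unit selection criteria that agree better with subjective listening impressions, in order to improve the naturalness of synthesized speech. First, a crowdsourcing web platform is used to collect listeners' subjective evaluations of synthesized speech quickly and in large quantities, replacing the traditional approach of collecting such data from experts with linguistic knowledge. Then, based on these evaluations, features such as syllable duration, unit cost, and acoustic parameter distance are extracted from the corresponding speech to build a synthesis error detector based on a support vector machine. At synthesis time, the detector rescores the N paths output by conventional unit selection to determine the best unit sequence. Preference listening tests show that the method effectively improves the naturalness of synthesized speech.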

14.

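An objective measure of synthesized speech naturalness (total citations: 1; self-citations: 1; citations by others: 1)

赵博, 蔡莲红. 《计算机工程与应用》, 2005, 41(7): 32-33, 152

The naturalness of synthesized speech still needs improvement. Based on the current state of research, the paper proposes an objective method for evaluating the naturalness of synthesized speech. Starting from the main parameters of speech prosody, it computes the differences in F0, duration, intensity, and other parameters between natural and synthesized speech from the same speaker. Because the F0 contours of the two kinds of speech are not aligned in time, the DTW (Dynamic Time Warping) algorithm is used to time-align them. The computed results are then compared with subjective evaluation (MOS) results. Experimental data show that the proposed F0 contour distortion measure correlates strongly with MOS, and that an evaluation from the perspective of prosodic features can measure the naturalness of synthesized speech.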
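The time-alignment step can be illustrated with the classic DTW recurrence. This is a minimal sketch, assuming each F0 contour is a plain list of per-frame values in Hz and that the local cost is the absolute difference; the paper's actual distortion measure and any normalization it applies are not given in the abstract, and the function name `dtw_distance` is hypothetical.

```python
from typing import List


def dtw_distance(a: List[float], b: List[float]) -> float:
    """Accumulated cost of the best monotonic alignment between two contours."""
    inf = float("inf")
    n, m = len(a), len(b)
    # d[i][j]: minimal accumulated cost of aligning a[:i] with b[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(
                d[i - 1][j],      # a advances, b repeats its frame
                d[i][j - 1],      # b advances, a repeats its frame
                d[i - 1][j - 1],  # both advance
            )
    return d[n][m]
```

A contour and a time-stretched copy of it, for example `[100.0, 110.0, 120.0]` and `[100.0, 100.0, 110.0, 120.0]`, align at zero cost, which is why DTW suits comparing natural and synthesized F0 contours whose timing differs.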

15.

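An improved particle swarm feature selection algorithm for micro-learning units

冯耀武, 张月琴, 陈健. 《小型微型计算机系统》, 2021, (4): 748-754

The micro-learning unit is the basic unit of learning in the micro-learning process and is high-dimensional. Extracting suitable features of micro-learning units and keeping the representative ones helps reduce redundancy and is one of the important ways to improve micro-learning clustering accuracy. To obtain suitable micro-learning unit features, reduce computational complexity, and ensure clustering accuracy, the study proposes an improved bare-bones particle swarm algorithm for unsupervised feature selection on micro-learning units. The method builds its fitness function from mutual information and adopts an adaptive mutation probability strategy to improve convergence speed and computational accuracy. Experiments show that the method helps extract suitable micro-learning unit features and that the extracted features improve the accuracy of micro-learning unit clustering.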

16.

Globally Optimal Training of Unit Boundaries in Unit Selection Text-to-Speech Synthesis

Jerome R. Bellegarda. 《IEEE Transactions on Audio, Speech, and Language Processing》, 2007, 15(3): 957-965

The level of quality that can be achieved by modern concatenative text-to-speech synthesis heavily depends on a judicious composition of the unit inventory used in the unit selection process. Unit boundary optimization, in particular, can make a huge difference in the users' perception of the concatenated acoustic waveform. This paper considers the iterative refinement of unit boundaries based on a data-driven feature extraction framework separately optimized for each boundary region. This guarantees a globally optimal cut point between any two matching units in the underlying inventory. The associated boundary training procedure is objectively characterized, first in terms of convergence behavior, and then by comparing the distributions in inter-unit discontinuity obtained before and after training. Experimental results underscore the viability of this approach for unit boundary optimization. Listening evidence also qualitatively exemplifies a noticeable reduction in the perception of discontinuity between concatenated acoustic units.

17.

A New Korean Corpus-Based Text-to-Speech System

Sanghun Kim, Youngjik Lee, Keikichi Hirose. 《International Journal of Speech Technology》, 2002, 5(2): 105-116

This paper describes a new Korean Text-to-Speech (TTS) system based on a large speech corpus. Conventional concatenative TTS systems still produce machine-like synthetic speech. The poor naturalness is caused by excessive prosodic modification using a small speech database. To cope with this problem, we utilized a dynamic unit selection method based on a large speech database without prosodic modification. The proposed TTS system adopts triphones as synthesis units. We designed a new sentence set maximizing phonetic or prosodic coverage of Korean triphones. All the utterances were segmented automatically into phonemes using a speech recognizer. With the segmented phonemes, we achieved a synthesis unit cost of zero if two synthesis units were placed consecutively in an utterance. This reduces the number of concatenating points that may occur due to concatenating mismatches. In this paper, we present data concerning the realization of major prosodic variations through a consideration of prosodic phrase break strength. The phrase break was divided into four kinds of strength based on pause length. Using phrase break strength, triphones were further classified to reflect major prosodic variations. To predict phrase break strength on texts, we adopted an HMM-like Part-of-Speech (POS) sequence model. The performance of the model showed 73.5% accuracy for 4-level break strength prediction. For unit selection, a Viterbi beam search was performed to find the most appropriate triphone sequence, which has the minimum continuation cost of prosody and spectrum at concatenating boundaries. From the informal listening test, we found that the proposed Korean corpus-based TTS system showed better naturalness than the conventional demisyllable-based one.
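The zero-cost rule for units that were consecutive in a recorded utterance can be sketched as a small cost function. The `Unit` fields, the function name `continuation_cost`, and the caller-supplied `mismatch` function are all assumptions for illustration; the paper's actual prosody and spectrum costs are not specified in the abstract.

```python
from typing import Callable, NamedTuple


class Unit(NamedTuple):
    """A candidate synthesis unit (hypothetical fields for illustration)."""
    utterance_id: int  # which recorded utterance the unit was cut from
    position: int      # index of the unit within that utterance
    f0: float          # a single prosodic feature, here F0 in Hz


def continuation_cost(
    prev: Unit,
    nxt: Unit,
    mismatch: Callable[[Unit, Unit], float],
) -> float:
    """Cost of placing nxt directly after prev in the synthesized sequence."""
    # Units that were adjacent in the same recording join seamlessly,
    # so no concatenation point is created and the cost is zero.
    if prev.utterance_id == nxt.utterance_id and nxt.position == prev.position + 1:
        return 0.0
    # Otherwise fall back to a caller-supplied prosody/spectrum mismatch measure.
    return mismatch(prev, nxt)


def f0_gap(prev: Unit, nxt: Unit) -> float:
    """Placeholder mismatch measure: absolute F0 difference."""
    return abs(prev.f0 - nxt.f0)
```

A search procedure such as the Viterbi beam search mentioned in the abstract would sum this cost along each candidate path, so runs of originally contiguous units are favored and the number of real concatenation points drops.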