首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到19条相似文献,搜索用时 156 毫秒
1.
该研究基于大规模语音数据库,通过建立普通话连续语流中的声韵母时长预测模型,考察声韵母时长的影响因素,探讨普通话声韵母在连续语流中的时长变化类型与话语韵律结构之间的关系。初步研究结果表明 话语的韵律结构对声母时长的影响较小,而对韵母时长的影响较为显著,这种影响主要体现为 韵律单元末音节的韵母时长是否发生显著延长与话语的韵律结构密切相关,韵律大短语和语调短语末尾的音节通常会发生显著的韵母延长,韵律词内以及韵律词末尾的音节通常不会发生韵母延长;韵律小短语末尾的音节在韵母时长方面的表现比较混乱,规律性不明显,可能需要进一步做分化处理。  相似文献   

2.
维吾尔语三音节词韵律特征声学分析   总被引:3,自引:0,他引:3  
本文从文本分析模块入手,利用“维吾尔语语音声学参数库”,选择了以开音节和闭音节结尾的333个三音节词的韵律参数,包括元音时长、音高和音强进行了统计分析,归纳了其元音时长、音高和音强分布模式,探讨了维吾尔语三音节词的韵律节奏模式与三音节词重音之间的关系问题,其目的是为了提高语音合成的自然度即更好的为自然语言处理服务。本项研究对维吾尔语语言乃至整个阿尔泰语系语言的韵律研究具有较高的参考价值。  相似文献   

3.
维吾尔语双音节词韵律特征声学分析   总被引:3,自引:0,他引:3  
该文从文本分析模块入手,利用“维吾尔语语音声学参数库”,选择了以开音节和闭音节结尾的969个双音节词的韵律参数,包括元音时长、音高和音强进行了统计分析,归纳了其元音时长、音高和音强分布模式,探讨了维吾尔语双音节词的韵律节奏模式与双音节词重音之间的关系问题,其目的是为了提高语音合成的自然度。我们相信本项研究对维吾尔语语言乃至整个阿尔泰语系语言的韵律研究具有较高的参考价值。  相似文献   

4.
以自然语流中出现的焦点为对象,对汉语中焦点的声学特征表现进行了研究.研究结果表明:(1)焦点对音节韵律特征的影响与音节所在的高层韵律环境(上下文相关信息)密切相关.处于不同高层韵律环境的音节,其韵律特征受焦点影响改变的幅度和方向是不同的.(2)焦点的轻重感知一定程度上可以通过线性调节语音声学参数增量来表现出来.(3)在语音合成中,焦点的韵律特征可分为两步来进行预测.实验证实,在焦点位置已知的情况下该方法能够合成自然度很高的汉语语句焦点.  相似文献   

5.
基于决策树的语料库分析   总被引:2,自引:0,他引:2       下载免费PDF全文
利用CART决策树算法,对TH-CoSS语料库音节的韵律参数进行聚类,分析语境特征的分布:出现率、平均层级。为了评价语境特征对语音韵律表现的影响程度,设计了一个影响权重的重要性函数,对语料库文本设计和TTS系统选音参数的权重设定具有较高的参考价值。  相似文献   

6.
嵌入式汉语TTS系统的设计与实现   总被引:5,自引:0,他引:5  
针对手持设备和PDA存储量较小的特点,本文提出了基于音节基频包络特征、采用k中心点算法聚类裁减音库容量的方法。聚类结果的听辩实验和统计分析表明此算法可以保证聚类内部音节样本的相似性及类间样本的相异性。经过对汉语可选合成基元的分析,系统中首次引入声韵母半音节与音节作为混合基元,构造了基于混合基元的音库。经过对样本集分别聚类裁减,进一步压缩了音库容量,并在PDA平台上实现了嵌入式TTS系统。  相似文献   

7.
汉语韵律短语的时长与音高研究   总被引:2,自引:1,他引:1  
语句和篇章的韵律结构和信息结构的分析及模型化是提高语音合成的自然度、降低自然语言识别错误率的关键。该文在带有韵律标注ASCCD语料库的基础上对韵律短语的时长和音高特性进行了研究,得到并验证了如下一些结论:(1)韵律短语边界对音节时长有明显的延长作用,不同声调对音节的时长延长作用不同,并且不同的重音级别对音节时长的延长作用也不同。(2)韵律短语边界处中断的时长在较小的韵律边界表现的更为明显。韵律短语的边界处发生了明显的音高重置现象,韵律短语的音高低线总是下降的,而音高高线只是在重音后下降,并且重音处的音域大而且音高高线的位置高。  相似文献   

8.
基于统计韵律模型的汉语语音合成系统的研究   总被引:2,自引:4,他引:2  
本文论述了采用统计模型进行汉语韵律层级结构分析和韵律建模的思路,在此基础上建立了汉语语音合成系统。其中,本文还仔细阐述了韵律代价函数的构造,及其参数的自动训练算法。同时,论文还分析了韵律特征间相互作用对音节基元选取的影响,并最终实现了一个连续语流中用于汉语语音合成的音节基元选取模型。测试表明了本文提出的基于统计模型的韵律层级分析和韵律建模思路,能够较好应用于汉语语音合成系统的构造,并使之具有良好的合成语音的自然度。  相似文献   

9.
模糊K-Modes聚类精确度分析   总被引:4,自引:1,他引:4  
赵恒  杨万海 《计算机工程》2003,29(12):27-28,175
模糊K-Modes聚类算法是对具有分类属性的数据进行聚类的一种有效的算法。为了评价聚类结果,以具有明确分类结构的数据作为输入数据,将模糊K-Modes聚类结果与原始数据的分类结构进行对比,分析了确定它们之间对应关系的方法,在期望聚类结果应该具有的特点的基础上,对现有的精确度定义和计算方法进行修正,在划分相似度的基础上,重新定义模糊K-Modes聚类精确度。  相似文献   

10.
基于韵律特征和语法信息的韵律边界检测模型   总被引:2,自引:2,他引:2  
韵律短语边界的自动检测,对语音合成中语料库的韵律标注以及语音识别中韵律短语的自动划分都有重要意义。本文通过对影响韵律短语边界的声学、韵律等参量的分析,得到和韵律短语边界关联性较大的一组声学特征参数、韵律环境参数和语法信息;同时引入语音合成中的韵律预测思想,在假定所有音节边界均为非韵律短语边界时,预测每个音节的基频。最后使用决策树模型,将音节边界处的韵律环境信息、语法信息以及预测结果作为决策树的输入,利用决策树综合判定当前音节边界是否为韵律短语的边界。实验表明,这种方法对于基于确定性文本(text-dependent)的语音韵律短语边界的检测,具有较好效果,同时可以显著提高语音合成中语料库的标注效率和标注结果的一致性。  相似文献   

11.
In this paper, global and local prosodic features extracted from sentence, word and syllables are proposed for speech emotion or affect recognition. In this work, duration, pitch, and energy values are used to represent the prosodic information, for recognizing the emotions from speech. Global prosodic features represent the gross statistics such as mean, minimum, maximum, standard deviation, and slope of the prosodic contours. Local prosodic features represent the temporal dynamics in the prosody. In this work, global and local prosodic features are analyzed separately and in combination at different levels for the recognition of emotions. In this study, we have also explored the words and syllables at different positions (initial, middle, and final) separately, to analyze their contribution towards the recognition of emotions. In this paper, all the studies are carried out using simulated Telugu emotion speech corpus (IITKGP-SESC). These results are compared with the results of internationally known Berlin emotion speech corpus (Emo-DB). Support vector machines are used to develop the emotion recognition models. The results indicate that, the recognition performance using local prosodic features is better compared to the performance of global prosodic features. Words in the final position of the sentences, syllables in the final position of the words exhibit more emotion discriminative information compared to the words and syllables present in the other positions.  相似文献   

12.
面向维吾尔语情感语音转换,提出一种韵律建模转换方法。该方法结合了维吾尔语韵律特点及语言特点,首次利用离散余弦变换(DCT)分别参数化维吾尔语音节和韵律短语的情感基频。采用高斯混合模型(GMM)训练中性-情感基频联合特征,同时合成中性语速情感语音和情感语速情感语音,主观评测结果显示情感语速更有助于表达情感效果。主客观实验结果显示转换方法可有效进行维吾尔语情感韵律转换,三种情感下,音节和韵律短语的结果均达到75%以上,韵律短语的转换效果要稍优于音节。  相似文献   

13.
Higher quality synthesized speech is required for widespread use of text-to-speech (TTS) technology, and the prosodic pattern is the key feature that makes synthetic speech sound unnatural and monotonous, which mainly describes the variation of pitch. The rules used in most Chinese TTS systems are constructed by experts, with weak quality control and low precision. In this paper, we propose a combination of clustering and machine learning techniques to extract prosodic patterns from actual large mandarin speech databases to improve the naturalness and intelligibility of synthesized speech. Typical prosody models are found by clustering analysis. Some machine learning techniques, including Rough Set, Artificial Neural Network (ANN) and Decision tree, are trained for fundamental frequency and energy contours, which can be directly used in a pitch-synchronous-overlap-add-based (PSOLA-based) TTS system. The experimental results showed that synthesized prosodic features greatly resembled their original counterparts for most syllables.  相似文献   

14.
In this paper, a new technique for the Chinese text-to-speech (TTS) system is proposed. Our major effort focuses on the prosodic information generation. New methodologies for constructing fuzzy rules in a prosodic model simulating human's pronouncing rules are developed. The proposed Recurrent Fuzzy Neural Network (RFNN) is a multilayer recurrent neural network (RNN) which integrates a Self-cOnstructing Neural Fuzzy Inference Network (SONFIN) into a recurrent connectionist structure. The RFNN can be functionally divided into two parts. The first part adopts the SONFIN as a prosodic model to explore the relationship between high-level linguistic features and prosodic information based on fuzzy inference rules. As compared to conventional neural networks, the SONFIN can always construct itself with an economic network size in high learning speed. The second part employs a five-layer network to generate all prosodic parameters by directly using the prosodic fuzzy rules inferred from the first part as well as other important features of syllables. The TTS system combined with the proposed method can behave not only sandhi rules but also the other prosodic phenomena existing in the traditional TTS systems. Moreover, the proposed scheme can even find out some new rules about prosodic phrase structure. The performance of the proposed RFNN-based prosodic model is verified by imbedding it into a Chinese TTS system with a Chinese monosyllable database based on the time-domain pitch synchronous overlap add (TD-PSOLA) method. Our experimental results show that the proposed RFNN can generate proper prosodic parameters including pitch means, pitch shapes, maximum energy levels, syllable duration, and pause duration. Some synthetic sounds are online available for demonstration.  相似文献   

15.
汉语韵律词内部音节重音的强弱对总的F0曲线的特征有很大影响。本文参考生成F0曲线的数学优化模型,提出了对由孤立单音节调型曲线串接而成的汉语韵律词的F0曲线的连续性、平滑性、曲线形状、平均值进行整体优化的x2估计方法,实现了在重音作用下的F0曲线的优化。在谐波+噪声合成系统上实验研究了汉语三音节韵律词的64种不包含轻声的调型组合和10种结尾为轻声的调型组合的F0曲线的优化效果,展示优化过程中三个控制参数——平滑因子(smooth)、音节重音强度(stress)、音节F0形状失真度(Distor-tion)对F0曲线整体形状的控制效果和参数取值的有效范围。非正式的听觉实验表明合成语音的自然度有明显提高。  相似文献   

16.
根据语音合成与识别等语音应用研究的需求,从实验语音学的角度出发,研究维吾尔语固有音节结构中最常见的CVC音节类型的声学特征,从“维吾尔语语音声学参数库”中选择1 255个CVC型音节的各种韵律参数,包括音节时长、音强和音高,进行统计分析并归纳其时长、音高和音强分布模式。  相似文献   

17.
时长信息是韵律的重要组成部分,对于语音合成的自然度和可懂度都有不可忽视的作用。时长预测是建立对时长有影响的韵律环境与自然语流中音段时长的对应关系。本文引入了统计学中etasquared 的概念研究汉语中韵律环境因素对时长的影响,设计了残差算法定量分析属性之间的交互作用,由此建立了多项式回归的汉语时长预测模型。实验结果表明,使用5~6 个韵律属性基本上就能够建立比较相关的对应关系,和使用同样韵律属性的Wagon 回归树的效果相比有明显的优势。  相似文献   

18.
A precise identification of prosodic phenomena and the construction of tools able to properly manage such phenomena are essential steps to disambiguate the meaning of certain utterances. In particular they are useful for a wide variety of tasks: automatic recognition of spontaneous speech, automatic enhancement of speech-generation systems, solving ambiguities in natural language interpretation, the construction of large annotated language resources, such as prosodically tagged speech corpora, and teaching languages to foreign students using Computer Aided Language Learning (CALL) systems. This paper presents a study on the automatic detection of prosodic prominence in continuous speech, with particular reference to American English, but with good prospects of application to other languages. Prosodic prominence involves two different prosodic features: pitch accent and stress accent. Pitch accent is acoustically connected with fundamental frequency (F0) movements and overall syllable energy, whereas stress exhibits a strong correlation with syllable nuclei duration and mid-to-high-frequency emphasis. This paper shows that a careful measurement of these acoustic parameters, as well as the identification of their connection to prosodic parameters, makes it possible to build an automatic system capable of identifying prominent syllables in utterances with performance comparable with the inter-human agreement reported in the literature. Two different prominence detectors were studied and developed: the first uses a training corpus to set up thresholds properly, while the second uses a pure unsupervised method. In both cases, it is worth stressing that only acoustic parameters derived directly from speech waveforms are exploited.  相似文献   

19.
音节是维吾尔语的最小发音单元,所以大部分维吾尔语语音合成系统以音节作为基本的合成单元,但维吾尔语中音节数量很大,语料库很难保证覆盖所有的音节样本,这会导致合成语音不稳定和不连续。为解决合成语音不稳定的情况,提出了结合单音素和三音素两个不同基元的单元挑选算法。通过在单元挑选模块中加入韵律参数相匹配的方法选出最佳韵律匹配的单元并解决了合成语音不连续的情况。实验结果表明,提出的方法有效地解决了合成语音不稳定和不连续的现象,从而提高了合成语音的自然度。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号