1.
This paper discusses the significance of segmental and prosodic knowledge sources for developing a text-to-speech system for Indian languages. Acoustic parameters such as linear prediction coefficients, formants, pitch and gain are prestored for the basic speech sound units corresponding to the orthographic characters of Hindi. The parameters are concatenated based on the input text. These parameters are modified by stored knowledge sources corresponding to coarticulation, duration and intonation. The coarticulation rules specify the pattern of joining the basic units. The duration rules modify the inherent duration of the basic units based on the linguistic context in which the units occur. The intonation rules specify the overall pitch contour for the utterance (declination or rising contour), fall-rise patterns, resetting phenomena and inherent fundamental frequency of vowels. Appropriate pauses between syntactic units are specified to enhance intelligibility and naturalness.
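The parameter-concatenation scheme described above can be sketched in Python. This is a minimal illustration only: the unit inventory, LPC values, gains, and the duration rule below are invented for the sketch, not taken from the paper.

```python
import numpy as np

# Hypothetical prestored inventory: per basic unit, LPC parameter frames,
# a gain, and an inherent duration in frames (all values invented).
UNIT_INVENTORY = {
    "ka": {"lpc": np.array([[1.0, -0.9, 0.4]]), "gain": 0.8, "duration": 10},
    "ma": {"lpc": np.array([[1.0, -0.7, 0.2]]), "gain": 0.6, "duration": 12},
}

def concatenate_units(units, duration_scale):
    """Concatenate prestored LPC parameter frames for a unit sequence,
    applying a simple duration rule (scale the inherent frame count)."""
    frames = []
    for name in units:
        entry = UNIT_INVENTORY[name]
        n_frames = max(1, round(entry["duration"] * duration_scale.get(name, 1.0)))
        for i in range(n_frames):  # repeat stored frames to fill the duration
            frames.append(entry["lpc"][i % len(entry["lpc"])] * entry["gain"])
    return np.array(frames)

track = concatenate_units(["ka", "ma"], duration_scale={"ka": 1.5})
print(track.shape)  # → (27, 3)
```

A real system would additionally smooth the parameters at unit boundaries (the coarticulation rules) and impose a pitch contour before synthesis.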
2.
潘晋  杨卫英 《电声技术》2009,33(5):62-65
Fast, efficient automatic lip-shape synthesis driven by speech, together with well-synchronized speech and lip motion, are the key problems in speech-driven facial animation. A speech-driven facial animation method based on formant analysis is proposed. The speech signal is windowed, divided into frames, and transformed with the DFT; the first and second formants of each short-time spectrum are analysed, the results are mapped to a control sequence, and the control sequence is post-processed to remove outliers. Basic dynamic mouth shapes are defined on a three-dimensional face model, and the control sequence is fed into the model on a timer to drive the animation. Experimental results show that the method is simple and fast, effectively synchronizes speech with lip shape, and produces smooth, natural animation; it can be widely applied to dubbing all kinds of virtual characters and shortens their production cycle.
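The per-frame analysis described above can be sketched as follows: windowing, DFT, crude peak picking for the first two formants, and a mapping to a mouth-opening control value. The frame sizes, frequency ranges, and the mapping are illustrative assumptions, not the authors' values.

```python
import numpy as np

def frame_formant_controls(signal, fs, frame_len=512, hop=256):
    """Window each frame, take the DFT magnitude, pick the two strongest
    spectral peaks as rough stand-ins for F1/F2, and map F1 to a
    mouth-opening control value in [0, 1]."""
    controls = []
    freqs = np.fft.rfftfreq(frame_len, 1.0 / fs)
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * np.hamming(frame_len)
        mag = np.abs(np.fft.rfft(frame))
        # Very crude peak picking: local maxima sorted by magnitude.
        peaks = [i for i in range(1, len(mag) - 1)
                 if mag[i] > mag[i - 1] and mag[i] > mag[i + 1]]
        peaks.sort(key=lambda i: mag[i], reverse=True)
        f1, f2 = sorted(freqs[i] for i in peaks[:2])
        # Map F1 (very roughly 200-1000 Hz for vowels) to jaw opening.
        controls.append(min(1.0, max(0.0, (f1 - 200.0) / 800.0)))
    return controls

fs = 8000
t = np.arange(fs) / fs  # 1 s synthetic "vowel" with peaks at 600 and 1200 Hz
vowel = np.sin(2 * np.pi * 600 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
ctrl = frame_formant_controls(vowel, fs)
```

The paper's outlier-removal post-processing would then smooth `ctrl` before it drives the face model.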
3.
周来江  杨士莪 《声学技术》2010,29(6):559-564
For a typical seabed structure, the reflection and transmission of plane waves incident from seawater onto a layered, fluid-saturated porous seabed containing an unconsolidated sediment layer are studied. The dispersion of compressional- and shear-wave velocities and attenuation in the sediment layer and the bedrock is analysed, and the displacement-potential reflection coefficient at the water-seabed interface is computed and analysed as the sediment thickness and the frequency vary. The results show that, for a fixed sediment thickness, the sediment layer dominates the sound field in the water at higher frequencies, whereas the bedrock dominates at lower frequencies. For different grazing angles of incidence, normal resonance of the particles in the sediment layer produces a series of resonance peaks in the curve of the generalized displacement-potential reflection coefficient versus the frequency-thickness product; as the grazing angle decreases, the number of resonance peaks decreases while their amplitude increases.
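The appearance of resonance structure versus the frequency-thickness product can be illustrated with a drastically simplified fluid-only analogue: a single lossless sediment layer between water and basement half-spaces at normal incidence. This ignores the poroelastic (Biot) physics, shear waves, attenuation, and oblique incidence treated in the paper, and the densities and sound speeds below are invented.

```python
import numpy as np

def layer_reflection(f, h, rho, c):
    """Normal-incidence reflection coefficient of a single fluid layer of
    thickness h between two fluid half-spaces
    (indices: 0 water, 1 sediment, 2 basement)."""
    Z = np.asarray(rho, float) * np.asarray(c, float)  # acoustic impedances
    r01 = (Z[1] - Z[0]) / (Z[1] + Z[0])
    r12 = (Z[2] - Z[1]) / (Z[2] + Z[1])
    k1 = 2 * np.pi * np.asarray(f, float) / c[1]       # wavenumber in layer
    phase = np.exp(2j * k1 * h)
    return (r01 + r12 * phase) / (1 + r01 * r12 * phase)

freqs = np.linspace(10.0, 2000.0, 400)
R = layer_reflection(freqs, h=5.0, rho=[1000, 1800, 2500], c=[1500, 1600, 3000])
```

Even in this toy model, |R| oscillates with the frequency-thickness product between an impedance-matched minimum and a hard-bottom maximum, which is the qualitative behaviour behind the resonance peaks discussed above.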
4.
Based on a study of the production model of cleft palate speech, an algorithm is proposed for automatically rating the degree of hypernasality from feature parameters of the excitation, vocal-tract, and radiation models. Four kinds of parameters that characterize hypernasality are analysed and refined: the pitch frequency from the excitation model, formant parameters from the vocal-tract model, and short-time energy and Mel-frequency cepstral coefficients from the overall production model. With a K-nearest-neighbour classifier as the pattern recognizer, automatic hypernasality ratings are obtained from the four feature sets. Experimental results show that the Mel-frequency cepstral coefficients correlate most strongly with the clinical physiological characteristics of cleft palate speech and achieve the highest recognition rate across hypernasality grades.
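The classification step can be sketched with a small self-contained K-nearest-neighbour routine. The feature vectors and grade labels below are toy values, not cleft palate data.

```python
import numpy as np

def knn_classify(train_X, train_y, x, k=3):
    """K-nearest-neighbour classifier: label x by majority vote among the
    k training vectors closest in Euclidean distance."""
    d = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(d)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy 2-D feature vectors standing in for (pitch, formant, energy, MFCC)
# summaries; labels are hypothetical hypernasality grades 0-2.
train_X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8],
                    [0.8, 0.9], [0.5, 0.5], [0.55, 0.45]])
train_y = np.array([0, 0, 2, 2, 1, 1])
grade = knn_classify(train_X, train_y, np.array([0.85, 0.85]), k=3)
print(grade)  # → 2
```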
5.
This work presents an automatic tonal/nontonal preclassification-based Indian language identification (LID) system. Languages are firstly classified into tonal and nontonal categories, and then, individual languages are identified from the languages of the respective categories. This work proposes the use of pitch Chroma and formant features for this task, and also investigates how Mel-frequency Cepstral Coefficients (MFCCs) complement these features. It further explores block processing (BP), pitch synchronous analysis (PSA)- and glottal closure regions (GCRs)-based approaches for feature extraction, using syllables as basic units. Cascade convolutional neural network (CNN)-long short-term memory (LSTM) model using syllable-level features has been developed. National Institute of Technology Silchar language database (NITS-LD) and OGI-Multilingual Telephone Speech Corpus (OGI-MLTS) have been used for experimental validation. The proposed system based on the score combination of Cascade CNN-LSTM models of Chroma (extracted from BP method), first two formants and MFCCs (both extracted from GCR method) reports the highest accuracies. In the preclassification stage, the observed accuracies are 91%, 87.3%, and 85.1% for NITS-LD, for 30 s, 10 s, and 3 s test data respectively. For OGI-MLTS database, the respective accuracies are 86.7%, 83.1%, and 80.6%. That amounts to absolute improvements of 11.6%, 12.3%, and 13.9% for NITS-LD, and 12.5%, 11.9%, and 12.6% for OGI-MLTS database with respect to that of the baseline system. The proposed preclassification-based LID system shows improvements of 7.3%, 6.4%, and 7.4% for NITS-LD and 6.1%, 6.7%, and 7.2% for OGI-MLTS database over the baseline system for the three respective test data conditions.
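The score-combination step of the system above (fusing the per-feature Cascade CNN-LSTM model outputs) can be sketched as a late fusion: a weighted average of per-model language scores followed by an argmax. The score values below are invented for illustration.

```python
import numpy as np

def combine_scores(score_mats, weights=None):
    """Late fusion: weighted average of per-model language-score arrays,
    then argmax over the language axis."""
    score_mats = [np.asarray(s, dtype=float) for s in score_mats]
    if weights is None:
        weights = [1.0 / len(score_mats)] * len(score_mats)
    fused = sum(w * s for w, s in zip(weights, score_mats))
    return np.argmax(fused, axis=-1)

# Toy scores for one utterance over two candidate languages, from three
# hypothetical models (Chroma, formants, MFCC); values invented.
pred = combine_scores([[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]])
print(pred)  # → 1
```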
6.
A modified autocorrelation method of linear prediction is proposed for pitch-synchronous analysis of voiced speech. The method needs one full period of speech data for analysis and assumes periodic extension of the data. This method guarantees the stability of the estimated all-pole filter and is shown to perform better than the covariance and autocorrelation methods of linear prediction.
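A sketch of the analysis under the stated assumptions: the autocorrelation of one pitch period is computed circularly (encoding the periodic-extension assumption) and the normal equations are solved by Levinson-Durbin, whose reflection coefficients are bounded by 1 in magnitude, hence the stable all-pole filter. The test signal is a synthetic damped sinusoid standing in for one pitch period.

```python
import numpy as np

def periodic_autocorr_lpc(x, order):
    """LP analysis of one pitch period assuming periodic extension:
    the autocorrelation is computed circularly and the normal equations
    are solved with Levinson-Durbin."""
    r = np.array([np.dot(x, np.roll(x, -k)) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

# Synthetic stand-in for one period of voiced speech: a damped sinusoid.
n = np.arange(64)
period = (0.9 ** n) * np.sin(0.3 * np.pi * n)
a, err = periodic_autocorr_lpc(period, order=8)
```

Because the circular autocorrelation matrix is a principal submatrix of a positive semidefinite circulant matrix, the recursion's reflection coefficients stay bounded and the estimated filter's poles lie inside the unit circle.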
7.
To build a more effective automatic speech emotion recognition system, an emotion recognition algorithm based on glottal-signal features and Gaussian mixture models is proposed. Grounded in the human speech production mechanism, the glottal signal is estimated by inverse filtering and linear prediction, and time-domain features of the glottal signal are extracted to characterize the different emotion classes. Experiments use the public BES (Berlin emotion speech database) corpus to automatically recognize seven emotions: anger, boredom, disgust, fear, happiness, neutrality, and sadness. The results show that the proposed system recognizes these emotional states effectively; its accuracy approaches human recognition accuracy and exceeds that of conventional pitch and formant features.
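The glottal-signal estimation step (inverse filtering via linear prediction) can be sketched as follows. The LP order, frame construction, and the synthetic two-pole "vocal tract" are illustrative assumptions; the paper's GMM classification stage is not shown.

```python
import numpy as np

def lpc(x, order):
    """Autocorrelation-method linear prediction via Levinson-Durbin."""
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def glottal_residual(frame, order=4):
    """Inverse filtering: fit an LP model to the frame, then filter the
    frame with the inverse filter A(z); the residual approximates the
    glottal excitation (strictly, its derivative)."""
    a = lpc(frame * np.hamming(len(frame)), order)
    return np.convolve(frame, a)[:len(frame)]

# Synthetic voiced frame: impulse train through a two-pole "vocal tract".
exc = np.zeros(200)
exc[::40] = 1.0
sig = np.zeros(200)
for n in range(200):
    sig[n] = (exc[n]
              + 1.6 * (sig[n - 1] if n >= 1 else 0.0)
              - 0.8 * (sig[n - 2] if n >= 2 else 0.0))
res = glottal_residual(sig, order=4)
```

Time-domain statistics of `res` (for example pulse shape, open-quotient proxies, or energy contours) would then serve as the emotion features fed to the GMMs.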
8.
The consistent, but often wrong, impressions people form of the size of unseen speakers are not random but rather point to a consistent misattribution bias, one that the advertising, broadcasting, and entertainment industries also routinely exploit. The authors report 3 experiments examining the perceptual basis of this bias. The results indicate that, under controlled experimental conditions, listeners can make relative size distinctions between male speakers using reliable cues carried in voice formant frequencies (resonant frequencies, or timbre) but that this ability can be perturbed by discordant voice fundamental frequency (F0, or pitch) differences between speakers. The authors introduce 3 accounts for the perceptual pull that voice F0 can exert on our routine (mis)attributions of speaker size and consider the role that voice F0 plays in additional voice-based attributions that may or may not be reliable but that have clear size connotations. (PsycINFO Database Record (c) 2010 APA, all rights reserved)
9.
Burg's method of maximum entropy spectral analysis is used to analyse voiced speech signal and its performance is compared with that of the autocorrelation and covariance methods of linear prediction using the following three criteria: (1) normalized total-squared linear prediction error, (2) error in estimating the power spectrum and (3) errors in estimating the first three formant frequencies and bandwidths. Results of pitch-synchronous and pitch-asynchronous analyses when applied to synthetic vowel signals are discussed.
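A compact sketch of Burg's recursion as commonly formulated: at each lattice stage the reflection coefficient minimises the summed forward and backward prediction-error energies, and the data are not windowed. The AR(2) test signal and its coefficients are invented for illustration.

```python
import numpy as np

def burg(x, order):
    """Burg's maximum-entropy AR estimation via the lattice recursion."""
    f = np.asarray(x, dtype=float)[1:]   # forward prediction errors
    b = np.asarray(x, dtype=float)[:-1]  # backward prediction errors
    a = np.array([1.0])
    for m in range(order):
        # Reflection coefficient minimising forward+backward error energy;
        # |k| <= 1 by construction, so the model is always stable.
        k = -2.0 * np.dot(f, b) / (np.dot(f, f) + np.dot(b, b))
        a = np.concatenate([a, [0.0]]) + k * np.concatenate([[0.0], a[::-1]])
        if m < order - 1:
            f, b = f[1:] + k * b[1:], b[:-1] + k * f[:-1]
    return a

# Test signal: impulse response of the AR(2) filter 1/(1 - 1.6 z^-1 + 0.8 z^-2),
# a crude stand-in for a single formant resonance (coefficients invented).
x = np.zeros(64)
for n in range(64):
    x[n] = ((1.0 if n == 0 else 0.0)
            + 1.6 * (x[n - 1] if n >= 1 else 0.0)
            - 0.8 * (x[n - 2] if n >= 2 else 0.0))
a_est = burg(x, order=2)
```

On this near-AR signal the recursion recovers coefficients close to the generating ones, and the estimated spectrum's peak gives the formant frequency and bandwidth estimates the paper compares.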