Similar Documents
1.
This paper aims at automatic conversion of Tibetan text to the International Phonetic Alphabet (IPA), which goes some way toward solving the problem of obtaining large-scale IPA-annotated Tibetan text. The conversion system combines rule-based and statistical methods to segment text into phonological words automatically; it converts Tibetan syllables to IPA using correspondence tables for consonants, vowels, and tones; and it applies tone-sandhi rules and consonant and vowel alternation rules at the phonological-word level. Judged by the automatic annotation results, the system reaches a practical level of performance.
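As an illustration of the rule-table approach this abstract describes, here is a minimal Python sketch; the mapping tables, the sandhi rule, and the syllable decomposition are toy placeholders, not the paper's actual Tibetan correspondence tables:

```python
# Toy stand-ins for the paper's consonant/vowel/tone rule tables.
ONSET_TABLE = {"k": "k", "kh": "kʰ", "g": "k"}
VOWEL_TABLE = {"a": "a", "i": "i", "u": "u"}
TONE_TABLE = {"high": "˥", "low": "˩"}

def syllable_to_ipa(onset, vowel, tone):
    """Convert one syllable to IPA by straight table lookup."""
    return ONSET_TABLE[onset] + VOWEL_TABLE[vowel] + TONE_TABLE[tone]

def word_to_ipa(syllables, tone_sandhi=None):
    """Convert a phonological word; an optional sandhi rule may
    rewrite the tone of non-initial syllables before lookup."""
    out = []
    for i, (c, v, t) in enumerate(syllables):
        if tone_sandhi and i > 0:
            t = tone_sandhi(t)
        out.append(syllable_to_ipa(c, v, t))
    return "".join(out)

# Example: non-initial syllables lowered by a made-up sandhi rule.
print(word_to_ipa([("kh", "a", "high"), ("g", "u", "high")],
                  tone_sandhi=lambda t: "low"))  # kʰa˥ku˩
```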

2.
This work focuses on the effect of speaking rate on speech production. Previous studies have shown that changes in speaking rate affect vowel and consonant movements differently, but did not specify exactly what those effects are. Based on the DIVA model, we investigate the effect of speaking rate on vowel and consonant movements by modifying the equations governing ODV and AVV activity. Simulation experiments show that an increased speaking rate raises the movement speed of consonants, whereas the movement speed of vowels increases only slightly and sometimes even decreases. The conclusion is that although speaking rate affects the movement speeds of vowels and consonants differently, the increase in the ratio of maximum speed to movement distance is roughly the same for both.
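To make the quantity in that conclusion concrete, the following sketch computes the peak-speed-to-distance ratio for a synthetic point-to-point movement; the minimum-jerk trajectory is a generic stand-in, not output from the DIVA model:

```python
import numpy as np

def minimum_jerk(d, T, n=1000):
    """Position over time for a minimum-jerk movement covering
    distance d in T seconds (a standard smooth movement profile)."""
    t = np.linspace(0.0, T, n)
    s = t / T
    return t, d * (10 * s**3 - 15 * s**4 + 6 * s**5)

def peak_speed_to_distance(d, T):
    t, x = minimum_jerk(d, T)
    v = np.gradient(x, t)      # instantaneous speed
    return v.max() / d         # the ratio the abstract discusses

# Halving the movement time T stands in for a faster speaking rate.
for T in (0.2, 0.1):
    print(f"T = {T:.1f} s: peak speed / distance = "
          f"{peak_speed_to_distance(0.01, T):.1f} 1/s")
```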

3.
Research on Uyghur syllable speech recognition and recognition units
Wang Kunlun. Computer Science, 2003, 30(7): 182-184
1. Introduction. Modern Uyghur (hereafter "Uyghur") is the principal means of communication of the Uyghur people, one of the statutory working languages of the Xinjiang Uyghur Autonomous Region of China, and a common language of communication among the other ethnic minorities of Xinjiang. Uyghur belongs to the Turkic branch of the Altaic language family. Its phonology has 8 vowels and 24 consonants. Syllables are built from consonants and vowels; every syllable must contain exactly one vowel, and a single vowel can form a syllable by itself. Sentences are composed of words and carry both sense-group stress and sentence stress. Some syllables undergo sandhi in connected speech; common phenomena include assimilation, weakening, elision, and vowel harmony.

4.
This paper proposes a new feature for continuous speech emotion recognition: the normalized amplitude quotient (NAQ), a time-domain parameter of the glottal excitation in vowel segments. The method first estimates the glottal flow with Iterative Adaptive Inverse Filtering (IAIF) and then uses the NAQ to characterize glottal opening and closing. Using speech in six emotions from the eNTERFACE'05 audio-visual emotion database, histograms and box plots of the vowel-segment NAQ values were used to analyze the feature's distribution and its power to discriminate between emotions. With the mean, variance, maximum, and minimum of the vowel-segment NAQ values of each utterance as features, emotion recognition experiments were run with Gaussian mixture models (GMM) and the k-nearest-neighbor method. The results show that the NAQ feature discriminates well between speech emotions.
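The NAQ has a standard definition (Alku et al.): the peak-to-peak (AC) amplitude of the glottal flow in a cycle divided by the product of the magnitude of the negative peak of the flow derivative and the cycle length. A sketch of that computation, with a toy glottal pulse standing in for the IAIF-estimated flow:

```python
import numpy as np

def naq(glottal_flow, fs, start, end):
    """NAQ = f_ac / (d_peak * T) for one glottal cycle, where f_ac is
    the peak-to-peak flow amplitude, d_peak the magnitude of the
    negative peak of the flow derivative, and T the cycle duration."""
    cycle = glottal_flow[start:end]
    T = (end - start) / fs
    f_ac = cycle.max() - cycle.min()
    d_peak = -(np.diff(cycle) * fs).min()
    return f_ac / (d_peak * T)

# Toy 100 Hz glottal pulse in place of an IAIF-estimated flow.
fs = 16000
t = np.arange(int(0.01 * fs)) / fs
flow = 0.5 * (1 - np.cos(2 * np.pi * 100 * t)) ** 2
print(naq(flow, fs, 0, len(flow)))
```

Utterance-level features are then the mean, variance, maximum, and minimum of the per-cycle values over the vowel segments, as the abstract describes.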

5.
Because of the characteristics of Kazakh word formation, the acoustic properties of its nine vowels play an important role in speech recognition. Using the theory and methods of experimental phonetics, this paper studies the vowel patterns of Kazakh polysyllabic words. For 1,062 polysyllabic words selected from a speech corpus, the formant frequencies of the vowels in word-initial, word-medial, and word-final syllables were measured statistically; the Joos method was used to summarize and analyze the vowel patterns of the three syllable positions and the differences among them, and the formant patterns of the vowels in Kazakh polysyllabic words were plotted. The results are a valuable reference for research on and applications of Kazakh phonetics.
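Formant frequencies like those measured here are commonly estimated from LPC polynomial roots; a rough self-contained sketch (the paper's actual measurement pipeline is not specified beyond experimental-phonetics practice):

```python
import numpy as np

def lpc(x, order):
    """Autocorrelation-method LPC: solve the Yule-Walker equations."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)]
                  for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -a))   # A(z) coefficients

def formants(frame, fs, order=12):
    """Formant estimates from the angles of the LPC pole pairs."""
    A = lpc(frame * np.hamming(len(frame)), order)
    roots = np.roots(A)
    roots = roots[np.imag(roots) > 0]    # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    return sorted(f for f in freqs if 90 < f < fs / 2 - 50)

fs = 16000
t = np.arange(int(0.03 * fs)) / fs
x = np.sin(2 * np.pi * 700 * t) + 0.7 * np.sin(2 * np.pi * 1200 * t)
print(formants(x, fs)[:2])   # rough F1, F2 for this toy signal
```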

6.
This paper reports an experimental study of the voice-source characteristics of monosyllables in Lhasa Tibetan. The monosyllables were first phonetically annotated; using the positional information in the annotations, acoustic voice-source parameters of the vowels and consonants in the syllable structure were extracted programmatically, and fundamental frequency (F0), open quotient (OQ), and speed quotient (SQ) were analyzed statistically, including significance tests. The results show that the voice-source parameters of different vowels and consonants depend on phonation type and on position in the syllable: vowel identity and syllable structure significantly affect OQ and SQ, but their effect on F0 is not significant. The parameters are also correlated: F0 is inversely related to OQ and SQ, while OQ and SQ are positively related.
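The two voice-source quotients have standard definitions in terms of glottal-cycle landmarks: OQ is the open-phase duration over the period, and SQ is the opening-phase duration over the closing-phase duration. A sketch, assuming the landmark times come from the annotation step the abstract describes:

```python
def voice_quotients(t_open, t_peak, t_close, t_next_open):
    """F0, open quotient, and speed quotient from one glottal cycle.
    t_open: glottis opens; t_peak: maximum opening; t_close: glottis
    closes; t_next_open: start of the next cycle (all in seconds)."""
    T = t_next_open - t_open
    f0 = 1.0 / T
    oq = (t_close - t_open) / T                    # open quotient
    sq = (t_peak - t_open) / (t_close - t_peak)    # speed quotient
    return f0, oq, sq

print(voice_quotients(0.0, 0.003, 0.006, 0.010))   # (100.0, 0.6, 1.0)
```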

7.
Research on NAQ-based speech emotion recognition
This paper studies glottal-excitation estimation with an iterative adaptive inverse filter and uses the normalized amplitude quotient (NAQ), a time-domain parameter of the glottal excitation, as the feature. For continuous speech in six emotions, the F-ratio criterion is first used to assess the feature's power to discriminate between emotions, and Gaussian mixture models are then used to model and recognize speech emotion. Using speech from the eNTERFACE'05 emotional speech database, recognition results based on whole-utterance NAQ values, on vowel-segment NAQ values, and on human perception are compared. The experiments show that vowel-segment NAQ values are a discriminative feature for speech emotion.
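The F-ratio criterion mentioned here is the usual one: variance of the class means divided by the average within-class variance, so larger values mean a more discriminative feature. A sketch with made-up NAQ values:

```python
import numpy as np

def f_ratio(samples_by_class):
    """Between-class variance of the means over the mean
    within-class variance, for a scalar feature."""
    means = np.array([np.mean(s) for s in samples_by_class])
    within = np.mean([np.var(s) for s in samples_by_class])
    return np.var(means) / within

rng = np.random.default_rng(0)
angry = rng.normal(0.09, 0.02, 200)   # toy per-utterance NAQ values
sad = rng.normal(0.16, 0.02, 200)
print(f_ratio([angry, sad]))          # well above 1: good separation
```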

8.
Acoustic analysis and recognition of Uyghur vowels
Uyghur belongs to the Turkic branch of the Altaic language family. Because of its word-formation characteristics, the acoustic properties of its eight vowels play an important role in speech recognition, especially in the choice of recognition units, and their formant frequencies are also an important basis for speech recognition and synthesis. Using the theory and methods of experimental phonetics, the acoustic properties of the eight Uyghur vowels were analyzed statistically on office-environment recordings from a comprehensive Uyghur speech database, yielding the formant frequency parameters and their distributions. Recognition experiments on the eight vowels confirmed the correctness of the formant frequency distributions. The experiments show that, vowel harmony aside, the acoustic properties of Uyghur vowels are highly distinguishable, supporting very accurate transmission and reception of spoken information.
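The statistical step, distilled: group formant measurements by vowel and report the mean and spread per vowel. The numbers below are toy values, not the paper's Uyghur data:

```python
import numpy as np
from collections import defaultdict

# (vowel, F1 in Hz, F2 in Hz) measurements; illustrative values only.
measurements = [("a", 780, 1350), ("a", 810, 1300), ("i", 300, 2250),
                ("i", 320, 2300), ("u", 330, 800), ("u", 350, 760)]

by_vowel = defaultdict(list)
for v, f1, f2 in measurements:
    by_vowel[v].append((f1, f2))

for v, pairs in sorted(by_vowel.items()):
    arr = np.array(pairs, dtype=float)
    m, s = arr.mean(axis=0), arr.std(axis=0)
    print(f"{v}: F1 = {m[0]:.0f}±{s[0]:.0f} Hz, "
          f"F2 = {m[1]:.0f}±{s[1]:.0f} Hz")
```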

9.
A study of vowel patterns in Uyghur disyllabic words
Motivated by the practical needs of high-naturalness speech synthesis and high-accuracy speech recognition, this paper uses the methods of experimental phonetics to study the vowel patterns of Uyghur disyllabic words. Disyllabic words covering the Uyghur vowels were selected from the "Uyghur Acoustic Parameter Database", and the formant frequencies of the vowels in word-initial and word-final syllables were analyzed statistically. The Joos method was used to characterize in detail the vowel patterns of the two syllable positions and the differences between them, and the formant patterns of the vowels in Uyghur disyllabic words were plotted. For the first time, experimental data confirm that the tongue positions of Uyghur vowels agree with the conclusions of the traditional auditory-impressionistic ("mouth-and-ear") tradition. The results are a valuable reference for phonetic research and application development for Uyghur and the Altaic languages more broadly.
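A Joos-style vowel chart plots mean F2 against mean F1 with both axes reversed, so vowels land roughly in their articulatory positions. A sketch of the word-initial versus word-final comparison with toy means:

```python
import matplotlib.pyplot as plt

# Toy mean formants per vowel and syllable position (Hz); not the
# paper's measurements.
initial = {"a": (760, 1320), "i": (310, 2280), "u": (340, 790)}
final = {"a": (700, 1400), "i": (340, 2150), "u": (370, 900)}

fig, ax = plt.subplots()
for name, data, marker in (("word-initial", initial, "o"),
                           ("word-final", final, "s")):
    ax.scatter([f2 for _, f2 in data.values()],
               [f1 for f1, _ in data.values()],
               marker=marker, label=name)
    for vowel, (f1, f2) in data.items():
        ax.annotate(vowel, (f2, f1))

ax.invert_xaxis()   # front vowels (high F2) on the left
ax.invert_yaxis()   # close vowels (low F1) at the top
ax.set_xlabel("F2 (Hz)")
ax.set_ylabel("F1 (Hz)")
ax.legend()
plt.show()
```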

10.
A Mandarin consonant classification method based on phonetic knowledge
This paper proposes a framework for improving Mandarin consonant recognition. Within this framework, a multi-stage classifier based on acoustic-phonetic analysis is built that classifies all Mandarin consonants without overlap, and the effect of combining the classification results with a probabilistic model is tested. The paper focuses on several feature-extraction techniques for Mandarin consonant classification and the corresponding experimental results. The extracted features include the duration of the unvoiced portion (DUP) and the normalized effective-band energy trend, drawing on time-domain, frequency-domain, and wavelet-domain analysis; the features are simple and effective, and are largely independent of the following vowel and of the speaker. The classifier divides the 21 Mandarin consonants into five classes: {m, n, l, r}, {b, d, g}, {p, t, k, f, h}, {zh, ch, sh}, and {z, c, s, j, q, x}, with classification accuracies of 97.21%, 97.10%, 97.70%, 93.31%, and 94.80%, respectively. The speech corpus used consists of isolated-syllable Mandarin consonant recordings from 21 speakers.
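Two of the named features are easy to sketch in simplified form; the thresholding below is a crude stand-in for the paper's actual DUP and normalized effective-band energy-trend definitions:

```python
import numpy as np

def unvoiced_duration(x, fs, frame=0.01, rel_thresh=0.1):
    """Rough DUP stand-in: duration of the initial low-energy
    (non-voiced) portion, found with a relative-energy threshold."""
    n = int(frame * fs)
    e = np.array([np.mean(x[i:i + n] ** 2)
                  for i in range(0, len(x) - n, n)])
    onset = int(np.argmax(e > rel_thresh * e.max()))
    return onset * frame   # seconds before the first high-energy frame

def band_energy_trend(x, fs, lo, hi, nfft=512):
    """Frame-by-frame normalized energy in the [lo, hi] Hz band, a
    rough stand-in for the normalized band-energy trend feature."""
    freqs = np.fft.rfftfreq(nfft, 1.0 / fs)
    band = (freqs >= lo) & (freqs < hi)
    e = [np.sum(np.abs(np.fft.rfft(x[i:i + nfft]
                                   * np.hanning(nfft)))[band] ** 2)
         for i in range(0, len(x) - nfft, nfft // 2)]
    e = np.array(e)
    return e / (e.max() + 1e-12)
```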

11.
Glove-Talk II is a system which translates hand gestures to speech through an adaptive interface. Hand gestures are mapped continuously to ten control parameters of a parallel formant speech synthesizer. The mapping allows the hand to act as an artificial vocal tract that produces speech in real time. This gives an unlimited vocabulary in addition to direct control of fundamental frequency and volume. Currently, the best version of Glove-Talk II uses several input devices, a parallel formant speech synthesizer, and three neural networks. The gesture-to-speech task is divided into vowel and consonant production by using a gating network to weight the outputs of a vowel and a consonant neural network. The gating network and the consonant network are trained with examples from the user. The vowel network implements a fixed user-defined relationship between hand position and vowel sound and does not require any training examples from the user. Volume, fundamental frequency, and stop consonants are produced with a fixed mapping from the input devices. With Glove-Talk II, the subject can speak slowly but with far more natural-sounding pitch variations than a text-to-speech synthesizer.

12.
Glove-TalkII is a system which translates hand gestures to speech through an adaptive interface. Hand gestures are mapped continuously to ten control parameters of a parallel formant speech synthesizer. The mapping allows the hand to act as an artificial vocal tract that produces speech in real time. This gives an unlimited vocabulary in addition to direct control of fundamental frequency and volume. Currently, the best version of Glove-TalkII uses several input devices (including a Cyberglove, a ContactGlove, a three-space tracker, and a foot pedal), a parallel formant speech synthesizer, and three neural networks. The gesture-to-speech task is divided into vowel and consonant production by using a gating network to weight the outputs of a vowel and a consonant neural network. The gating network and the consonant network are trained with examples from the user. The vowel network implements a fixed user-defined relationship between hand position and vowel sound and does not require any training examples from the user. Volume, fundamental frequency, and stop consonants are produced with a fixed mapping from the input devices. One subject has trained to speak intelligibly with Glove-TalkII. He speaks slowly but with far more natural-sounding pitch variations than a text-to-speech synthesizer.
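The vowel/consonant division works like a two-expert mixture: a gating network produces a weight that blends the two networks' synthesizer-control outputs. A wiring sketch with tiny random-weight networks; the input dimensions and network sizes are assumptions, and the real gating and consonant networks are trained on user examples:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(n_in, n_hidden, n_out):
    """Tiny untrained MLP, just to show the data flow."""
    W1, b1 = rng.normal(size=(n_in, n_hidden)), np.zeros(n_hidden)
    W2, b2 = rng.normal(size=(n_hidden, n_out)), np.zeros(n_out)
    return lambda x: np.tanh(x @ W1 + b1) @ W2 + b2

N_CONTROLS = 10                         # formant-synthesizer parameters
vowel_net = mlp(3, 8, N_CONTROLS)       # hand position -> controls
consonant_net = mlp(16, 8, N_CONTROLS)  # glove joint angles -> controls
gate_net = mlp(19, 4, 1)                # full input -> vowel/consonant

def gesture_to_controls(hand_pos, joints):
    g = 1.0 / (1.0 + np.exp(-gate_net(np.concatenate([hand_pos, joints]))))
    return g * vowel_net(hand_pos) + (1.0 - g) * consonant_net(joints)

out = gesture_to_controls(rng.normal(size=3), rng.normal(size=16))
print(out.shape)   # (10,): one value per synthesizer control parameter
```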

13.
Research on an HMM-based method for evaluating the pronunciation accuracy of Mandarin monosyllables
Building on extensive speech-analysis experiments, this paper proposes a computer-assisted method for evaluating the pronunciation accuracy of Mandarin monosyllables. Initials and finals are subdivided, and a pronunciation evaluation system based on continuous Gaussian-mixture-density HMMs is built for the initial-final transition segments that reflect pronunciation accuracy. Mel-frequency cepstral coefficients, which match the characteristics of human hearing, are used as the speech features. Finally, pitch features are incorporated to map the objective features to subjective scores. Experiments show that the method achieves good results.
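A sketch of the scoring idea, assuming the librosa and hmmlearn packages; the file paths, model sizes, and the final objective-to-subjective mapping are placeholders:

```python
import numpy as np
import librosa
from hmmlearn.hmm import GMMHMM

def mfcc_frames(path, sr=16000, n_mfcc=13):
    y, _ = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (T, 13)

def train_unit_model(train_paths, n_states=3, n_mix=4):
    """One continuous Gaussian-mixture-density HMM per initial-final
    transition unit, trained on reference pronunciations."""
    feats = [mfcc_frames(p) for p in train_paths]
    model = GMMHMM(n_components=n_states, n_mix=n_mix,
                   covariance_type="diag", n_iter=20)
    model.fit(np.vstack(feats), lengths=[len(f) for f in feats])
    return model

def accuracy_score(model, test_path):
    """Per-frame log-likelihood under the reference model; higher is
    closer to standard pronunciation. Combining this with pitch
    features into a subjective score is a separate regression step."""
    f = mfcc_frames(test_path)
    return model.score(f) / len(f)
```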

14.
In this paper, an intelligent speaker identification system based on speech/voice signals is presented. The study combines adaptive feature extraction and classification using optimum wavelet entropy parameter values, which are obtained from Turkish speech/voice signal waveforms measured with a speech experimental set. A genetic wavelet adaptive network-based fuzzy inference system (GWANFIS) model is developed. The model consists of three layers: a genetic algorithm, a wavelet layer, and an adaptive network-based fuzzy inference system (ANFIS). The genetic algorithm layer selects the feature extraction method and finds the optimum wavelet entropy parameter values; one of eight feature extraction methods is selected: wavelet decomposition alone, or wavelet decomposition combined with the short-time Fourier transform or with the Born–Jordan, Choi–Williams, Margenau–Hill, Wigner–Ville, Page, or Zhao–Atlas–Marks time–frequency representation. The wavelet layer performs the feature extraction in the time–frequency domain and is composed of wavelet decomposition and wavelet entropies. The ANFIS approach evaluates the fitness function of the genetic algorithm and classifies speakers. The performance of the developed system has been evaluated on noisy Turkish speech/voice signals. The test results show that the system is effective on real speech signals, with a correct speaker classification rate of about 91%.
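One common form of the wavelet-entropy feature (the Shannon entropy of the relative subband energies of a wavelet decomposition) can be sketched with the pywt package; which variant and which parameter values the paper's genetic algorithm ends up selecting is data-driven:

```python
import numpy as np
import pywt

def wavelet_entropy(x, wavelet="db4", level=5):
    """Shannon entropy of the relative energies of the wavelet
    decomposition subbands."""
    coeffs = pywt.wavedec(x, wavelet, level=level)
    energies = np.array([np.sum(c ** 2) for c in coeffs])
    p = energies / energies.sum()
    return -np.sum(p * np.log(p + 1e-12))

rng = np.random.default_rng(0)
print(wavelet_entropy(rng.normal(size=4096)))   # toy signal
```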

15.
Research on a large-vocabulary speech recognition system based on the characteristics of Mandarin speech
This paper discusses several issues in Mandarin speech recognition and briefly describes a large-vocabulary Mandarin speech recognition system. The system takes full advantage of the characteristics of Mandarin speech, chiefly its strongly syllabic nature, the simple initial-final structure of its syllables, and the fact that words and phrases are the basic units of spoken communication in Chinese. A notable feature of the system is that new words can be added without any training, giving it a good user interface. The system currently has a vocabulary of more than 10,000 words, and its average recognition accuracy in real-time tests is 93.1%.
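The training-free vocabulary growth follows from using syllables as the recognition units: adding a word only extends the word-to-syllable lexicon, while the acoustic models stay fixed. A toy sketch (entries and scores are illustrative):

```python
# Lexicon mapping words to already-trained syllable units.
lexicon = {"北京": ["bei3", "jing1"], "大学": ["da4", "xue2"]}

def add_word(word, syllables):
    """Add a new vocabulary item: no acoustic retraining is needed
    because every syllable already has a trained model."""
    lexicon[word] = syllables

def word_score(word, syllable_scores):
    """Combine per-syllable acoustic scores (e.g. log-likelihoods
    from the syllable models) into a word-level score."""
    return sum(syllable_scores[s] for s in lexicon[word])

add_word("语音", ["yu3", "yin1"])
print(word_score("语音", {"yu3": -41.2, "yin1": -38.7}))  # toy scores
```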

16.
In this work, spectral features extracted from sub-syllabic regions and pitch synchronous analysis are proposed for speech emotion recognition. Linear prediction cepstral coefficients, mel frequency cepstral coefficients and the features extracted from high amplitude regions of the spectrum are used to represent emotion-specific spectral information. These features are extracted from consonant, vowel and transition regions of each syllable to study the contribution of these regions toward recognition of emotions. Consonant, vowel and transition regions are determined using vowel onset points. Spectral features extracted from each pitch cycle are also used to recognize emotions present in speech. The emotions used in this study are anger, fear, happy, neutral and sad. The emotion recognition performance using sub-syllabic speech segments is compared with the results of the conventional block processing approach, where the entire speech signal is processed frame by frame. The proposed emotion-specific features are evaluated using a simulated emotion speech corpus, IITKGP-SESC (Indian Institute of Technology, KharaGPur-Simulated Emotion Speech Corpus). The emotion recognition results obtained using IITKGP-SESC are compared with the results on the Berlin emotion speech corpus. Emotion recognition systems are developed using Gaussian mixture models and auto-associative neural networks. The purpose of this study is to explore sub-syllabic regions to identify the emotions embedded in a speech signal and, if possible, to avoid processing the entire speech signal for emotion recognition without serious compromise in performance.
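A sketch of the region-splitting step, assuming librosa and a vowel onset point (VOP) that is already known (the paper detects VOPs automatically); the 40 ms transition width is an illustrative choice:

```python
import librosa

def region_features(y, sr, vop_time, trans_ms=40, n_mfcc=13):
    """Split a syllable at its VOP into consonant, transition, and
    vowel regions and compute mean MFCCs for each region."""
    vop = int(vop_time * sr)
    half = int(trans_ms / 2000 * sr)   # half the transition width
    regions = {"consonant": y[:max(vop - half, 0)],
               "transition": y[max(vop - half, 0):vop + half],
               "vowel": y[vop + half:]}
    return {name: librosa.feature.mfcc(y=seg, sr=sr, n_mfcc=n_mfcc,
                                       n_fft=512,
                                       hop_length=128).mean(axis=1)
            for name, seg in regions.items() if len(seg) > 0}
```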

17.
The speech signal is modeled using the zero-crossing interval distribution of the signal in the time domain. The distributions of these parameters are studied over five vowels of Malayalam (one of the most widely spoken Indian languages). We found that the distribution patterns are almost identical across repeated utterances of the same vowel and vary from vowel to vowel. These distribution patterns are used to recognize the vowels with a multilayer feed-forward artificial neural network. After analyzing the distribution patterns and the vowel recognition results, we conclude that zero-crossing interval distribution parameters can be used effectively for speech phone classification and recognition. The noise robustness of the parameter is also studied by adding white Gaussian noise at different signal-to-noise ratios. The computational complexity of the proposed technique is also lower than that of the conventional spectral techniques, such as FFT and cepstral methods, used in the parameterization of speech signals.
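The feature itself is simple to compute: measure the time between successive zero crossings and histogram the intervals; the histogram vector is what goes to the feed-forward network. A sketch with a toy vowel-like signal:

```python
import numpy as np

def zc_interval_histogram(x, fs, n_bins=30, max_ms=5.0):
    """Normalized histogram of intervals between successive zero
    crossings of the waveform, in milliseconds."""
    signs = np.signbit(x)
    zc = np.flatnonzero(signs[1:] != signs[:-1])
    intervals = np.diff(zc) / fs * 1000.0
    hist, _ = np.histogram(intervals, bins=n_bins,
                           range=(0.0, max_ms), density=True)
    return hist

fs = 16000
t = np.arange(fs) / fs
toy_vowel = np.sin(2 * np.pi * 120 * t) + 0.5 * np.sin(2 * np.pi * 800 * t)
print(zc_interval_histogram(toy_vowel, fs)[:5])
```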

18.
We present a new speech enhancement scheme for a single-microphone system to meet the demand for quality noise reduction algorithms capable of operating at a very low signal-to-noise ratio. A psychoacoustic model is incorporated into the generalized perceptual wavelet denoising method to reduce the residual noise and improve the intelligibility of speech. The proposed method is a generalized time-frequency subtraction algorithm, which advantageously exploits the wavelet multirate signal representation to preserve the critical transient information. Simultaneous masking and temporal masking of the human auditory system are modeled by the perceptual wavelet packet transform via the frequency and temporal localization of speech components. The wavelet coefficients are used to calculate the Bark spreading energy and temporal spreading energy, from which a time-frequency masking threshold is deduced to adaptively adjust the subtraction parameters of the proposed method. An unvoiced speech enhancement algorithm is also integrated into the system to improve the intelligibility of speech. Through rigorous objective and subjective evaluations, it is shown that the proposed speech enhancement system is capable of reducing noise with little speech degradation in adverse noise environments and the overall performance is superior to several competitive methods.
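A stripped-down sketch of wavelet-packet denoising with pywt: decompose, soft-threshold each subband, reconstruct. The paper derives the threshold from a psychoacoustic masking model; here a fixed robust noise estimate stands in for that model:

```python
import numpy as np
import pywt

def wp_denoise(x, wavelet="db8", level=4, alpha=1.0):
    """Soft-threshold the wavelet-packet subbands at the given level
    with a per-subband threshold from a robust noise estimate."""
    wp = pywt.WaveletPacket(data=x, wavelet=wavelet, maxlevel=level)
    for node in wp.get_level(level, order="natural"):
        c = node.data
        sigma = np.median(np.abs(c)) / 0.6745          # noise estimate
        thr = alpha * sigma * np.sqrt(2 * np.log(len(c) + 1))
        node.data = np.sign(c) * np.maximum(np.abs(c) - thr, 0.0)
    return wp.reconstruct(update=False)[:len(x)]

rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
clean = np.sin(2 * np.pi * 300 * t)
noisy = clean + 0.3 * rng.normal(size=t.size)
print(np.mean((wp_denoise(noisy) - clean) ** 2))   # residual error
```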
