Similar Documents
1.
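Compared with speaker recognition and continuous speech recognition, automatic language identification is a relatively new and comparatively difficult research topic. Compared with phonotactics, prosody is a more promising feature for language identification. This paper presents an automatic language identification method based on the pseudo-syllable structure CnV: the system extracts MFCC and ΔMFCC features from pseudo-syllables made up of consonants and vowels, and models these features with language-independent GMMs. Tests on the English, French and Chinese portions of the OGI-TS corpus show that consonant and vowel feature information contributes to language identification, and that the pseudo-syllable model is an effective model for the task.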
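The abstract does not give the feature dimensions, the number of mixture components, or how pseudo-syllables are segmented. As a minimal sketch, assuming one pre-trained diagonal-covariance GMM per language and HTK-style regression deltas, scoring a sequence of MFCC frames could look like this; all names and parameter values are illustrative, not the authors' system.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A diagonal-covariance Gaussian component: (weight, means, variances).
Component = Tuple[float, List[float], List[float]]
GMM = List[Component]


def deltas(frames: Sequence[Sequence[float]], n: int = 2) -> List[List[float]]:
    """Regression deltas: d_t = sum_k k*(c[t+k] - c[t-k]) / (2 * sum_k k^2).

    Frames beyond either edge are replaced by the nearest edge frame.
    """
    T = len(frames)
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = []
    for t in range(T):
        row = []
        for d in range(len(frames[t])):
            acc = sum(k * (frames[min(t + k, T - 1)][d] - frames[max(t - k, 0)][d])
                      for k in range(1, n + 1))
            row.append(acc / denom)
        out.append(row)
    return out


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_loglik(x: Sequence[float], gmm: GMM) -> float:
    """log sum_c w_c N(x; mu_c, Sigma_c), computed with log-sum-exp for stability."""
    logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(logs)
    return top + math.log(sum(math.exp(s - top) for s in logs))


def identify(frames: Sequence[Sequence[float]], models: Dict[str, GMM]):
    """Append deltas to each static frame; return the best-scoring language and all scores."""
    feats = [list(f) + d for f, d in zip(frames, deltas(frames))]
    scores = {lang: sum(gmm_loglik(x, g) for x in feats) for lang, g in models.items()}
    return max(scores, key=scores.get), scores
```

In a real system each language's GMM would be trained with EM on pseudo-syllable features; here the models are simply passed in.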

2.
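With the goal of building a baseline platform for continuous Uyghur phoneme recognition, this work studies, for the first time, several key language-dependent techniques on top of HTK (a hidden Markov model toolkit). Taking the characteristics of Uyghur into account, it designs a base Uyghur text for building the language model and the speech corpus; records a relatively large speech corpus according to specified technical requirements; adopts the phoneme as the basic unit and trains Uyghur acoustic models; obtains recognition results from spoken sentences to letter-sequence sentences under a letter-based N-gram language model; and reports recognition rates for the 32 Uyghur phonemes, identifying easily confused phonemes and analyzing the causes, which lays the groundwork for further improving the recognition rate.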
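The paper builds its letter-based N-gram model within HTK, and the abstract gives neither the order nor the smoothing. As an illustration only, a letter bigram model with add-one smoothing, written in plain Python rather than with HTK's language-modelling tools, shows how such a model scores a candidate letter sequence:

```python
import math
from collections import Counter
from typing import Iterable

BOS, EOS = "<s>", "</s>"


class LetterBigramLM:
    """Letter-level bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences: Iterable[str]):
        self.bigrams: Counter = Counter()
        self.history: Counter = Counter()
        vocab = {EOS}                              # symbols that can be predicted
        for s in sentences:
            seq = [BOS] + list(s) + [EOS]
            vocab.update(s)
            for a, b in zip(seq, seq[1:]):
                self.bigrams[(a, b)] += 1
                self.history[a] += 1
        self.vocab_size = len(vocab)

    def logprob(self, prev: str, cur: str) -> float:
        """log P(cur | prev) with add-one smoothing over the predictable vocabulary."""
        return math.log((self.bigrams[(prev, cur)] + 1) /
                        (self.history[prev] + self.vocab_size))

    def score(self, sentence: str) -> float:
        """Total log probability of a letter sequence, including the end marker."""
        seq = [BOS] + list(sentence) + [EOS]
        return sum(self.logprob(a, b) for a, b in zip(seq, seq[1:]))
```

In recognition, such scores would be combined with acoustic scores to rank letter-sequence hypotheses; Uyghur text would simply be fed in as strings of Uyghur letters.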

3.
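I. Voice Input of Chinese Characters. Voice input is a means of entering text by having a computer receive and recognize a person's speech. Compared with English, Chinese has distinctive advantages for speech recognition. Mandarin has 21 initials and 35 finals, 56 phonemes in total. These form about 400 syllables, or more than 1,200 when tones are distinguished, and Chinese words and sentences are made up of these monosyllabic characters.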

4.
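The syllable is the smallest pronunciation unit in Uyghur, so most Uyghur speech synthesis systems use the syllable as the basic synthesis unit. However, Uyghur has a very large number of syllables and a corpus can hardly cover samples of every one, which makes the synthesized speech unstable and discontinuous. To address the instability, a unit selection algorithm that combines two different base units, monophones and triphones, is proposed. To address the discontinuity, prosodic-parameter matching is added to the unit selection module so that the units with the best prosodic match are chosen. Experimental results show that the proposed method effectively eliminates instability and discontinuity in the synthesized speech and thereby improves its naturalness.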
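The abstract names the ingredients (monophone and triphone units, prosodic matching) but not the cost functions. A minimal sketch, assuming a standard Viterbi unit-selection search in which the target cost measures F0 and duration mismatch and the join cost measures the F0 jump at each concatenation point; the `Unit` fields, the weights, and the triphone-then-monophone fallback are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Unit:
    name: str        # hypothetical unit id in the speech database
    f0: float        # mean F0 of the unit (Hz)
    dur: float       # duration (ms)
    start_f0: float  # F0 at the left boundary (Hz)
    end_f0: float    # F0 at the right boundary (Hz)


def candidates_for(triphone: str, monophone: str, inventory: Dict[str, List[Unit]]) -> List[Unit]:
    """Prefer triphone units; fall back to monophone units when the triphone is missing."""
    return inventory.get(triphone) or inventory.get(monophone, [])


def target_cost(u: Unit, f0: float, dur: float, w_f0: float = 1.0, w_dur: float = 0.5) -> float:
    """Prosodic mismatch between a candidate and the predicted target (weights are arbitrary)."""
    return w_f0 * abs(u.f0 - f0) + w_dur * abs(u.dur - dur)


def join_cost(a: Unit, b: Unit) -> float:
    """F0 discontinuity at the concatenation point."""
    return abs(a.end_f0 - b.start_f0)


def select_units(candidates: Sequence[Sequence[Unit]],
                 targets: Sequence[Tuple[float, float]]) -> List[Unit]:
    """Viterbi search for the unit sequence with minimum total target + join cost."""
    cost = [target_cost(u, *targets[0]) for u in candidates[0]]
    back: List[List[int]] = []
    for i in range(1, len(candidates)):
        prev = candidates[i - 1]
        new_cost, ptr = [], []
        for u in candidates[i]:
            # best predecessor for candidate u at position i
            j = min(range(len(prev)), key=lambda p: cost[p] + join_cost(prev[p], u))
            new_cost.append(cost[j] + join_cost(prev[j], u) + target_cost(u, *targets[i]))
            ptr.append(j)
        cost = new_cost
        back.append(ptr)
    # trace back from the cheapest final candidate
    k = min(range(len(cost)), key=cost.__getitem__)
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return [candidates[i][k] for i, k in enumerate(path)]
```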

5.
Vowel Patterns in Uyghur Disyllabic Words (Total citations: 1; self-citations: 0; citations by others: 1)
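Motivated by the practical needs of highly natural speech synthesis and high-accuracy speech recognition, this study uses the methods of experimental phonetics to investigate the vowel patterns of Uyghur disyllabic words. Disyllabic words containing Uyghur vowels were selected from the "Uyghur Speech Acoustic Parameter Database", and the formant frequencies of the vowels in word-initial and word-final syllables were analyzed statistically. Using the Joos method, the vowel patterns of word-initial and word-final syllables, and the differences between them, were summarized in detail, and formant patterns of the vowels in Uyghur disyllabic words were plotted. For the first time, experimental data are used to verify that the tongue-position characteristics of Uyghur vowels agree with the conclusions of traditional impressionistic ("mouth and ear") phonetics. The results are a useful reference for phonetic research and application development for Uyghur and for the Altaic languages as a whole.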

6.
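Exploiting the fact that phoneme data are shared both across different languages and among different sounds within the same language, a multilingual speech data fusion coding method is proposed. Identical phoneme segments occurring in different languages are cut from the speech sample sequence according to segment templates, wavelet-transformed, and reduced to feature vectors, producing a shared template set. Any character pronunciation or sentence-level speech string is encoded and decoded using the elements of the shared template set, and the speech record library built from template phoneme strings is indexed by (syllable, phoneme). Experimental results show that the per-character compression ratio, the storage requirement, the segmental SNR of the reconstructed speech, and the subjective evaluation scores are all clearly better than those of existing methods, and the reconstructed speech is of good quality.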
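The abstract does not give the wavelet family, the decomposition depth, or the matching criterion. A minimal sketch, assuming fixed-length segments, an orthonormal Haar transform, and nearest-template Euclidean matching: each segment is stored once in a shared codebook, and an utterance is coded as a list of template indices. `SharedTemplates` and its methods are hypothetical names.

```python
import math
from typing import List, Sequence


def haar(x: Sequence[float], levels: int) -> List[float]:
    """Multi-level orthonormal Haar DWT; len(x) must be divisible by 2**levels.

    Output layout: [coarsest approximation, coarsest detail, ..., finest detail].
    """
    out = list(x)
    n = len(out)
    for _ in range(levels):
        half = n // 2
        approx = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        detail = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        out[:n] = approx + detail
        n = half
    return out


class SharedTemplates:
    """Hypothetical shared codebook: one fixed-length sample segment per phoneme template."""

    def __init__(self, segments: Sequence[Sequence[float]], levels: int = 2):
        self.segments = [list(s) for s in segments]
        self.levels = levels
        self.features = [haar(s, levels) for s in segments]

    def encode(self, segment: Sequence[float]) -> int:
        """Return the index of the template whose wavelet features are nearest."""
        f = haar(segment, self.levels)
        return min(range(len(self.features)),
                   key=lambda k: sum((a - b) ** 2 for a, b in zip(f, self.features[k])))

    def decode(self, codes: Sequence[int]) -> List[float]:
        """Concatenate the stored samples of the coded templates."""
        return [x for c in codes for x in self.segments[c]]
```

Because the Haar transform is orthonormal, the full-coefficient distance equals the distance between raw segments; a practical system would presumably keep only some coefficients, which is where the transform would reduce storage.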

7.
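Large-Vocabulary Continuous Speech Recognition Based on Triphone Dynamic Bayesian Network Models (Total citations: 1; self-citations: 0; citations by others: 1)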
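To account for coarticulation in continuous speech, context-dependent triphone units are introduced into the word-phone DBN (WP-DBN) and word-phone-state DBN (WPS-DBN) models, yielding two novel single-stream DBN models: the word-triphone DBN (WT-DBN) and the word-triphone-state DBN (WTS-DBN). WTS-DBN is a triphone model whose recognition unit is the triphone, and it explicitly emulates a hidden Markov model (HMM) with tied triphone states. Large-vocabulary speech recognition experiments show that on clean speech the recognition rate of WTS-DBN is higher than those of the HMM, WT-DBN, WP-DBN and WPS-DBN models by 20.53%, 40.77%, 42.72% and 7.52%, respectively.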
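The triphone units added in WT-DBN and WTS-DBN are the same context-dependent labels used in HMM systems. A small sketch of expanding a phone string into triphone labels, using HTK's `left-centre+right` naming convention (the DBN models themselves are not reproduced, and the example phone set is illustrative):

```python
from typing import List, Sequence


def to_triphones(phones: Sequence[str]) -> List[str]:
    """Expand monophones into HTK-style context-dependent labels 'left-centre+right'.

    The first and last phones have only one neighbour, so they become biphones.
    """
    labels = []
    for i, p in enumerate(phones):
        left = phones[i - 1] + "-" if i > 0 else ""
        right = "+" + phones[i + 1] if i < len(phones) - 1 else ""
        labels.append(left + p + right)
    return labels
```

The number of possible labels grows as the cube of the phone inventory (for a hypothetical 60-phone set, 60^3 = 216,000), which is why HMM systems tie triphone states and why emulating that tying matters for WTS-DBN.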

8.
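A segmentation algorithm for initials and finals in cleft-palate speech is designed. Subjective waveform inspection and objective F-tests and t-tests show that cleft-palate speech differs significantly from normal speech. Syllables whose initial has the characteristics of an unvoiced phoneme are defined as Type I syllables, and those whose initial has voiced characteristics as Type II syllables. First, a hierarchical clustering model automatically classifies syllables as Type I or Type II. For Type I syllables, a voiced-likeness weighting function and an unvoiced-likelihood probability function are defined to perform first-level initial/final segmentation; second-level segmentation is then obtained from the first-order difference of the number of peaks of the short-time autocorrelation function. For Type II syllables, based on the waveform differences between initials and finals, energy jump points of the short-time autocorrelation function are detected to separate initials from finals. Large-sample experiments show that the proposed automatic initial/final segmentation algorithm for cleft-palate speech achieves high accuracy: 90.72% for Type I syllables and 92.90% for Type II syllables.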
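The abstract does not define the weighting and probability functions or any thresholds, so only the autocorrelation step is sketched here. Assuming fixed-size frames and a peak threshold of 30% of r[0] (both hypothetical), counting autocorrelation peaks per frame and taking the first difference of that count locates the frame where an aperiodic initial gives way to a periodic final:

```python
import math
import random
from typing import List, Sequence


def frames(x: Sequence[float], size: int, hop: int) -> List[Sequence[float]]:
    """Split x into overlapping frames of `size` samples every `hop` samples."""
    return [x[i:i + size] for i in range(0, len(x) - size + 1, hop)]


def autocorr(frame: Sequence[float]) -> List[float]:
    """Short-time autocorrelation r[k] = sum_i x[i] * x[i+k] for k = 0..N-1."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(n)]


def count_peaks(r: Sequence[float], rel_threshold: float = 0.3) -> int:
    """Count local maxima of r (k >= 1) that exceed rel_threshold * r[0]."""
    if r[0] <= 0:
        return 0
    thr = rel_threshold * r[0]
    return sum(1 for k in range(1, len(r) - 1)
               if r[k] > r[k - 1] and r[k] >= r[k + 1] and r[k] > thr)


def candidate_boundary(x: Sequence[float], size: int, hop: int) -> int:
    """Sample index where the per-frame peak count changes most (first difference)."""
    counts = [count_peaks(autocorr(f)) for f in frames(x, size, hop)]
    diff = [b - a for a, b in zip(counts, counts[1:])]
    k = max(range(len(diff)), key=lambda i: abs(diff[i]))
    return (k + 1) * hop


def demo_signal(seed: int = 1) -> List[float]:
    """Illustrative input: 400 samples of noise (unvoiced initial), then a periodic tone."""
    rng = random.Random(seed)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(400)]
    voiced = [math.sin(2 * math.pi * i / 20) for i in range(400)]
    return noise + voiced
```

Noise frames have few autocorrelation peaks above the threshold while periodic frames have one per pitch period, so the largest jump in the count marks the transition.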

9.
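To address the limited size of pronunciation dictionaries in Russian speech synthesis and recognition systems, a Russian grapheme-to-phoneme transcription algorithm based on a long short-term memory (LSTM) sequence-to-sequence model is proposed, and a prototype transcription system is implemented. First, the SAMPA-based Russian phoneme set is redesigned so that transcriptions reflect the stress position of Russian words and vowel reduction, and a 20,000-word Russian pronunciation dictionary is built with the new phoneme set. The algorithm is then implemented with the TensorFlow framework: an encoder LSTM converts a Russian word into a fixed-dimensional vector, and a decoder LSTM converts that vector into the target pronunciation sequence. Finally, a Russian transcription system with interactive word transcription and other functions is designed and implemented. Experiments show that on an out-of-vocabulary test set the algorithm achieves 74.8% word accuracy and 94.5% phoneme accuracy, both higher than the Phonetisaurus method. The system can effectively support the construction of Russian pronunciation dictionaries.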
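The abstract reports word accuracy and phoneme accuracy without defining them. A common convention, assumed here, counts a word as correct only if its whole phoneme string matches, and computes phoneme accuracy as 1 minus the total edit distance divided by the number of reference phonemes; the paper may define these differently:

```python
from typing import Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def g2p_accuracy(pairs: Sequence[Tuple[Sequence[str], Sequence[str]]]) -> Tuple[float, float]:
    """Return (word accuracy, phoneme accuracy) over (reference, hypothesis) pairs."""
    words_ok = sum(1 for ref, hyp in pairs if list(ref) == list(hyp))
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total = sum(len(ref) for ref, _ in pairs)
    return words_ok / len(pairs), 1.0 - errors / total
```

Word accuracy is the stricter measure: a single wrong phoneme (for example, a misplaced stressed vowel) fails the whole word but costs only one phoneme error, which is consistent with the large gap between the two reported figures.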

10.
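An Improved Model Adaptation Method for Cross-Lingual Speech Synthesis (Total citations: 1; self-citations: 0; citations by others: 1)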
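In statistical parametric speech synthesis, cross-lingual model adaptation is mainly used when the target speaker's language differs from that of the source model: with a small amount of the target speaker's speech, it quickly builds a synthesis system in the source model's language that carries the target speaker's voice. This paper improves the conventional cross-lingual adaptation method based on phoneme mapping and triphone models in two ways: a phoneme mapping method combined with data selection makes the mapping more reliable, and cross-lingual mapping of prosodic information is introduced to compensate for the triphone models' weak representation of prosody. Experiments on a Chinese-English cross-lingual adaptation system show that the improved system clearly outperforms the conventional method in both the naturalness and the speaker similarity of the synthesized speech.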

11.
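A Noisy-Channel Model for Recovering the Original Vowels of Reduced Central Vowels in Uyghur (Total citations: 1; self-citations: 0; citations by others: 1)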
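When inflectional suffixes are attached to Uyghur words, vowels often weaken to central vowels. When restoring the base form of inflected words, rule-based recovery of the original vowel behind a weakened central vowel achieves only about 40% accuracy. A noisy-channel model for recovering the original vowels of Uyghur central vowels is proposed. The model uses the last two characters, the last three characters and the final syllable of the weakened stem as context to build the language model and the likelihood formula. In open tests the model reaches 82.45% accuracy and improves stemming accuracy by 15%.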
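The abstract specifies the noisy-channel framing and the context features but not the estimators. A minimal sketch, assuming add-one smoothed counts with back-off from three-character to two-character context (the final-syllable feature is omitted for brevity), a hand-set channel table P(observed | original), and a toy Latin transliteration; the class name, data format, and probabilities are all hypothetical:

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


class VowelRestorer:
    """Noisy-channel sketch: pick v maximising log P(v | context) + log P(observed | v)."""

    def __init__(self, training: Iterable[Tuple[str, str]],
                 channel: Dict[str, Dict[str, float]]):
        # training: (stem context, original vowel) pairs taken from unweakened forms
        self.ctx: Dict[str, Counter] = defaultdict(Counter)
        self.prior: Counter = Counter()
        for context, vowel in training:
            self.prior[vowel] += 1
            for n in (3, 2):                      # 3-character and 2-character contexts
                self.ctx[context[-n:]][vowel] += 1
        self.channel = channel                    # channel[v][e] = P(observed e | original v)
        self.vowels = sorted(self.prior)

    def p_vowel(self, context: str, vowel: str) -> float:
        """Add-one estimate of P(vowel | context), backing off 3 chars -> 2 chars -> prior."""
        for n in (3, 2):
            counts = self.ctx.get(context[-n:])
            if counts:
                return (counts[vowel] + 1) / (sum(counts.values()) + len(self.vowels))
        return (self.prior[vowel] + 1) / (sum(self.prior.values()) + len(self.vowels))

    def restore(self, context: str, observed: str) -> str:
        """Return the most probable original vowel for an observed weakened vowel."""
        def score(v: str) -> float:
            return (math.log(self.p_vowel(context, v)) +
                    math.log(self.channel[v].get(observed, 1e-9)))
        return max(self.vowels, key=score)
```

The language-model term P(v | context) carries most of the decision here; the channel term matters when different original vowels weaken to different surface vowels with different probabilities.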

12.
Vowel harmony is a pervasive feature of Finnish. It is only rarely violated in slips of the tongue or in aphasic output, so it is a constraint that any model aiming to account for word production in Finnish should successfully simulate. Only a few neural network models of Finnish language processing have been developed so far, although Finnish, with its complex morphology, should be a very challenging object for modeling. Our work is the first attempt to model Finnish vowel harmony using neural networks. We have introduced a tool, FinnPro, for building interactive spreading-activation models and for simulating word production processes. The tool produces Finnish nouns in different cases and forms and simulates both normal and damaged speech. This paper introduces the vowel harmony control features of FinnPro. FinnPro's control structure maintains vowel harmony even in damaged word production. The vowel harmony adjustment process in FinnPro follows the view that vowel harmony in Finnish is a phonological process. Our model suggests that in disturbed word production, the vowel harmony category of the output word is triggered by the whole set of vowels activated during production.
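For reference, the harmony constraint itself can be stated as a simple rule for native, non-compound stems: back vowels (a, o, u) select back suffix variants, while stems with front (ä, ö, y) or only neutral (e, i) vowels select front variants. The sketch below is a rule-based baseline, not FinnPro's spreading-activation model; it ignores compounds (where the last component decides) and loanword exceptions. It inspects all vowels of the stem at once, which loosely parallels the model's suggestion that the harmony category is triggered by the whole set of activated vowels.

```python
BACK = set("aou")
# Suffix archiphonemes and their (back, front) realisations.
ARCHIPHONEMES = {"A": ("a", "ä"), "O": ("o", "ö"), "U": ("u", "y")}


def is_back(stem: str) -> bool:
    """Back harmony if the stem contains any back vowel; e and i are neutral."""
    return any(ch in BACK for ch in stem.lower())


def attach(stem: str, suffix: str) -> str:
    """Realise archiphonemes A, O, U in a suffix template (e.g. 'ssA') harmonically."""
    idx = 0 if is_back(stem) else 1
    return stem + "".join(ARCHIPHONEMES[c][idx] if c in ARCHIPHONEMES else c for c in suffix)
```

For example, the inessive template `ssA` yields `talossa` ("in the house") but `kylässä` ("in the village").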

13.
Realization of an intelligent human-machine interface requires us to investigate human mechanisms and learn from them. This study focuses on the communication between speech production and perception within the human brain and on realizing it in an artificial system. A physiological study based on electromyographic signals (Honda, 1996) suggested that speech communication in the human brain might be based on a topological mapping between speech production and perception, according to an analogous topology between motor and sensory representations. Following this hypothesis, this study first investigated the topologies of the vowel system across the motor, kinematic, and acoustic spaces by means of a model simulation, and then examined the linkage between vowel production and perception in a transformed auditory feedback (TAF) experiment. The model simulation indicated that there exists an invariant mapping from muscle activations (motor space) to articulations (kinematic space) via a coordinate system consisting of force-dependent equilibrium positions, and that the mapping from the motor space to the kinematic space is unique. The motor-kinematic-acoustic deduction in the model simulation showed that the topologies were compatible from one space to another. In the TAF experiment, vowel production exhibited a compensatory response to a perturbation in the feedback sound, implying that vowel production is controlled with reference to perceptual monitoring.

14.
The k-nearest-neighbor decision rule is known to provide a useful nonparametric procedure for pattern classification. Here the rule is applied to a vowel recognition problem, and the effects of the number of nearest neighbors (k), the size of the training set, and the type of distance measure on vowel recognition performance are studied. Vowel recognition performance is shown to remain approximately constant across all values of k. Recognition performance initially improves with the size of the training set and then converges to an asymptotic value. Choosing a better distance measure leads to a significant improvement in vowel recognition performance.
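The abstract does not state the features used. Assuming each vowel token is represented by its first two formants (F1, F2), a minimal k-NN classifier with a pluggable distance measure could look like this; `scaled_euclidean` and its per-dimension scales are illustrative assumptions, not the distance measures compared in the study.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]
Distance = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    return math.dist(a, b)


def scaled_euclidean(scales: Sequence[float]) -> Distance:
    """Euclidean distance after dividing each dimension by a per-dimension scale."""
    def d(a: Point, b: Point) -> float:
        return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scales)))
    return d


def knn_classify(x: Point, train: List[Tuple[Point, str]], k: int = 3,
                 dist: Distance = euclidean) -> str:
    """Majority vote among the k nearest training tokens.

    Counter.most_common keeps first-encountered order for equal counts, so a tie
    goes to the label whose member is nearest.
    """
    nearest = sorted(train, key=lambda item: dist(x, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Because F2 spans a much wider range in Hz than F1, plain Euclidean distance is dominated by F2; rescaling the dimensions is one simple way to see why the choice of distance measure can matter more than k.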

15.
Conventional wisdom holds that, since the average amplitude of vowel articulation significantly exceeds that of consonants, spoken intelligibility in obscuring noise should be limited primarily by consonant confusion. Furthermore, in both English and Chinese, consonant discrimination is considered more important to overall intelligibility than vowel discrimination. In the unbounded case, the assumption that vowel confusion is less important than consonant confusion may well be true; however, at least two situations exist where vowel confusion may matter more. The first is where vocabulary-specific restrictions confine a particular spoken word to alternatives that differ primarily in their vowel. The second is the prevalence of interference other than additive white Gaussian noise (AWGN), particularly impulsive noise that obscures only the vowel portion of a word, and similar obscuring that arises as a nonlinear effect of many time-sliced processing algorithms. This paper explores vowel intelligibility in spoken Chinese, where the confusion characteristics are complicated by the lexical tone carried by the vowel in consonant–vowel–consonant (CVC) utterances. Experimental evidence from multi-listener intelligibility testing is presented to build toward an understanding of the characteristics of Mandarin Chinese vowel confusion in the presence of AWGN. Results are also broken down by carrier-word consonant and by the lexical tone overlaid on the tested vowels. In particular, several factors are revealed relating to vowel length, tone combination, and the crucial influence of the /a/ phone.
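The abstract does not describe how the test material was degraded. A minimal sketch of mixing white Gaussian noise into an utterance at a target SNR, assuming SNR is measured over the whole signal rather than over active speech only:

```python
import math
import random
from typing import List, Optional, Sequence


def power(x: Sequence[float]) -> float:
    """Mean power of a sampled signal."""
    return sum(v * v for v in x) / len(x)


def add_awgn(signal: Sequence[float], snr_db: float,
             rng: Optional[random.Random] = None) -> List[float]:
    """Return signal + white Gaussian noise scaled so that 10*log10(Ps/Pn) = snr_db."""
    rng = rng or random.Random()
    noise_power = power(signal) / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]


def measured_snr_db(clean: Sequence[float], noisy: Sequence[float]) -> float:
    """SNR actually achieved, computed from the clean signal and the added noise."""
    noise = [n - c for c, n in zip(clean, noisy)]
    return 10 * math.log10(power(clean) / power(noise))
```

Because the noise level is set from the average power of the whole utterance, the lower-energy consonants sit at a worse local SNR than the vowels, which is the premise behind the conventional view the paper questions.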

16.
A Lexical-Analysis-Based Algorithm for Uyghur Vowel Weakening (Total citations: 5; self-citations: 2; citations by others: 3)
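This work focuses on vowel weakening in Uyghur and algorithms for handling it, and analyzes Uyghur morphology, syllable structure, and the ways stems attach to suffixes. When handling weakening, the weakening attribute is checked against a stem dictionary, and vowel harmony rules are used to check whether the stem and suffix are joined correctly. The algorithm has been applied successfully to text retrieval, word-frequency counting and proofreading. Results show that the algorithm is feasible and effective, and it continues to be refined in practice.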

17.
An approach to the problem of inter-speaker variability in automatic speech recognition is described that exploits systematic vowel differences in a two-stage process of adaptation to individual speaker characteristics. In stage one, an accent identification procedure selects one of four broad regional English accents on the basis of vowel quality differences within four calibration sentences. In stage two, an adjustment procedure shifts the regional reference vowel space onto the speaker's vowel space as calculated from the accent identification data. Results for 58 speakers from the four regional accent areas are presented.
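The abstract describes the two stages but not their formulas. A minimal sketch, assuming each vowel is summarised by mean (F1, F2) values, that accents are compared after removing a speaker-specific average shift, and that adaptation is a pure translation of the reference space (the original adjustment may also scale or warp it); all function names and the shift-only model are assumptions:

```python
import math
from typing import Dict, Tuple

Formants = Tuple[float, float]      # (F1, F2) in Hz
VowelSpace = Dict[str, Formants]    # vowel label -> mean formants


def mean_offset(speaker: VowelSpace, reference: VowelSpace) -> Formants:
    """Average (F1, F2) displacement from reference to speaker over shared vowels."""
    shared = sorted(speaker.keys() & reference.keys())
    return (sum(speaker[v][0] - reference[v][0] for v in shared) / len(shared),
            sum(speaker[v][1] - reference[v][1] for v in shared) / len(shared))


def shift(reference: VowelSpace, offset: Formants) -> VowelSpace:
    """Translate every vowel of a reference space by the same offset."""
    return {v: (f1 + offset[0], f2 + offset[1]) for v, (f1, f2) in reference.items()}


def residual(speaker: VowelSpace, reference: VowelSpace) -> float:
    """Mean distance left after aligning the reference to the speaker by a pure shift."""
    moved = shift(reference, mean_offset(speaker, reference))
    shared = speaker.keys() & reference.keys()
    return sum(math.dist(speaker[v], moved[v]) for v in shared) / len(shared)


def identify_accent(speaker: VowelSpace, accents: Dict[str, VowelSpace]) -> str:
    """Stage one: pick the accent whose vowel configuration best matches the speaker's."""
    return min(accents, key=lambda a: residual(speaker, accents[a]))


def adapt(speaker: VowelSpace, accents: Dict[str, VowelSpace]) -> VowelSpace:
    """Stage two: shift the chosen accent's full reference vowel space onto the speaker."""
    ref = accents[identify_accent(speaker, accents)]
    return shift(ref, mean_offset(speaker, ref))
```

Removing the average shift before comparing accents keeps stage one from mistaking a speaker's overall vocal-tract differences for accent, and stage two carries that shift over to vowels that never occurred in the calibration sentences.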

18.
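Motivated by the practical need to improve the naturalness of speech synthesis, this paper, for the first time from the perspective of experimental phonetics, extracts 333 trisyllabic words from the "Uyghur Speech Acoustic Parameter Database", selects 93 fully harmonic and partially harmonic words among them, and statistically analyzes the wideband formant patterns, formant values, pitch, duration, intensity and other prosodic parameters of their vowels. It summarizes the distribution of formants, pitch, duration and intensity to characterize the basic acoustic features of vowel harmony and draws several important rules and conclusions, providing a useful reference for handling vowel harmony before synthesis in parametric or concatenative (waveform-splicing) speech synthesis systems.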

19.
An Experimental Phonetic Study of Devoiced Vowels in Uyghur (Total citations: 1; self-citations: 0; citations by others: 1)
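Driven by the needs of speech synthesis, recognition and other speech applications, and starting from the text analysis module, this paper uses the "Uyghur Speech Acoustic Parameter Database" to select polysyllabic (disyllabic and trisyllabic) words containing the high vowels /i/, /u/ and /ü/, and statistically analyzes the duration, pitch and intensity of the three high vowels both when they are devoiced and when they retain their voicing. It summarizes the distribution patterns of the duration, formants and intensity of devoiced vowels in open and closed syllables, further explores the devoicing behavior of the three Uyghur high vowels from the perspective of experimental phonetics, and confirms that the conclusions linguists reached from auditory and physiological observation agree with the acoustic findings. The aim is to improve the naturalness of speech synthesis and thereby better serve natural language processing. The study is a useful reference for prosodic research on Uyghur and on the Altaic languages as a whole.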


Copyright © Beijing Qinyun Technology Development Co., Ltd. (ICP filing: 京ICP备09084417号)