Found 20 similar articles; search took 109 ms
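Because the emotional changes of adjacent utterances are correlated, an emotion inference algorithm based on emotional context is proposed. The algorithm first recognizes the emotional state of the utterance under analysis twice, once with traditional speech emotion features and once with contextual speech emotion features. It then fuses the two results using an emotion interaction matrix and the confidence of each recognition result. On this basis, speech emotion context inference rules are established and used to adjust the emotional state of the utterance under analysis according to the emotional states of its neighbouring utterances, which yields its final emotion category. Experiments on a self-recorded database covering six basic emotions show that, compared with a method using acoustic features alone, the proposed method raises the average recognition rate by 12.17%.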

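As a new deep learning framework, the Transformer has attracted growing attention from researchers and has become a current research hotspot. Its self-attention mechanism, inspired by the way humans attend only to what matters, learns only the important information in the input sequence. For speech recognition, the key task is to transcribe the information in the input speech sequence into the corresponding text. The earlier approach built a speech recognition system from an acoustic model, a pronunciation lexicon, and a language model, whereas the Transformer can integrate the acoustic, pronunciation, and language models into a single neural network to form an end-to-end speech recognition system, which resolves problems of conventional systems such as forced alignment and multi-module training. It is therefore necessary to examine the problems the Transformer faces in speech recognition. The paper first introduces the Transformer model structure and analyses the problems facing speech recognition from three aspects: the input speech sequence, the deep model structure, and the model inference process. It then summarizes and briefly reviews current methods that address these three problems. Finally, it summarizes and discusses prospects for applying the Transformer to speech recognition.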
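
The self-attention mechanism mentioned in this abstract can be illustrated with a minimal sketch of scaled dot-product attention. The abstract gives no implementation details, so the function names and the plain-list representation below are assumptions made for illustration only; a real speech model would add learned projections, multiple heads, and positional information.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    queries, keys, values are lists of equal-length float vectors
    (hypothetical toy inputs, not features from the surveyed systems).
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs
```

When all keys are identical the weights are uniform, so each output row is simply the mean of the value vectors; this is a convenient sanity check for the sketch.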

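This paper briefly introduces the time-domain pitch-synchronous overlap-add algorithm used for speech analysis and synthesis, and on that basis proposes a time-domain tone conversion method for Chinese speech. The method converts speech in one tone into speech in another tone and, apart from a slight loss in timbre, preserves good speech quality. It operates directly on the speech waveform, is computationally simple, and can run in real time on an ordinary microcomputer. In a speech synthesis system, it allows syllables with the same initial and final to be stored in only one tone, which greatly reduces the size of the speech database. Using the method to synthesize sentences according to the intonation patterns of Chinese also noticeably improves the naturalness of the synthesized sentences.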

韦向峰, 张全, 熊亮. 《计算机科学》 (Computer Science), 2006, 33(10): 152-155
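Research on Chinese speech recognition increasingly emphasizes integration with language processing; speech recognition is no longer purely a matter of speech signal processing. Applying N-gram language models to speech recognition systems has greatly improved their accuracy and stability, but N-gram models have limitations of their own, which lead to many syntactically and semantically erroneous recognition results. This paper analyses the causes and types of phonetic and textual errors in speech recognition and, building on the Hierarchical Network of Concepts (概念层次网络) language model, proposes an error-correction method based on sentence-level semantic analysis and a confusable-sound matrix. In correction tests with three speakers, a 50,000-character speech corpus, and 216 test sentences, the system performs fairly well at correcting semantic collocation errors and overcomes some shortcomings of N-gram language models. The proposed method can also be integrated into a speech recognition system to better support its error correction.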

Sentence-Level Chinese Character Input Technology   Total citations: 10, self-citations: 6, citations by others: 4
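This paper discusses the research and development of Chinese character input technologies in their various forms, including speech input, keyboard input, and character recognition, and sets out the view that the character, the word, and the sentence mark successive stages in the development of Chinese input technology. It poses the problem of resolving ambiguity in code-like sentences (类码语句), which applies to all of the above forms and can be described as a shortest-path problem on a directed graph. It then discusses a least-element probabilistic inference method and control strategy that uses syntactic-semantic analysis and statistical models; the method can reason normally whether or not the knowledge base is complete and gives the best result available under the circumstances. Several application examples are also briefly introduced.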
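
The shortest-path formulation mentioned in this abstract can be sketched as follows. The abstract does not define the graph, so this example assumes a lattice whose nodes are positions in the input code sequence and whose edges are candidate words with a cost (for instance a negative log probability). The function name `best_path` and the toy labels are hypothetical.

```python
import math


def best_path(n_positions, edges):
    """Cheapest path from node 0 to node n_positions in a word lattice.

    edges: iterable of (start, end, label, cost) with start < end.
    Returns (labels, total_cost), or None if the end is unreachable.
    """
    best_cost = [math.inf] * (n_positions + 1)
    back = [None] * (n_positions + 1)
    best_cost[0] = 0.0
    # Because start < end, visiting edges by increasing start guarantees
    # that best_cost[start] is final before any edge leaving it is relaxed.
    for start, end, label, cost in sorted(edges, key=lambda e: e[0]):
        candidate = best_cost[start] + cost
        if candidate < best_cost[end]:
            best_cost[end] = candidate
            back[end] = (start, label)
    if best_cost[n_positions] == math.inf:
        return None
    labels = []
    pos = n_positions
    while pos != 0:
        pos, label = back[pos]
        labels.append(label)
    labels.reverse()
    return labels, best_cost[n_positions]


# Toy lattice: reading two codes as one word "xy" is cheaper than "x" + "y".
toy_edges = [(0, 1, "x", 1.0), (1, 2, "y", 1.0), (0, 2, "xy", 1.5)]
print(best_path(2, toy_edges))  # (['xy'], 1.5)
```

This covers only the graph search; the syntactic-semantic analysis and probabilistic inference described in the abstract would determine the edge costs and are not modelled here.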

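This paper describes a speech system for an intelligent robot that can understand natural language input as spoken Chinese within the robot's limited environment. The system consists of three parts: speech recognition, natural language understanding, and speech synthesis. Using Prolog predicate logic, it performs syntactic and semantic analysis of the input sentence to obtain its internal representation and draw its derivation tree. Depending on the form of the input sentence, it either executes a command or answers a question. The whole system is written in Prolog and runs on an IBM-PC/XT.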

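To handle the dynamic nature of speech emotion, a speech emotion recognition system is built with a dynamic recurrent Elman neural network. State feedback in the Elman model is achieved by feeding the stored state of the previous time step into the network together with the current input. The recognition system designed on this basis allows the network type to be changed in the back end and supports both single-utterance and batch recognition modes. Experiments with the system show that, with the same model parameter settings, Elman-based speech emotion recognition outperforms a BP neural network, and that the BP network is more sensitive to parameter settings than the Elman network.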
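
The state feedback described in this abstract can be sketched as a single Elman hidden-layer update, in which the previous hidden state (the context units) is combined with the current input. The tanh activation, the weight layout, and the names `elman_step` and `run_elman` are assumptions for illustration; the abstract does not specify the network's dimensions or training procedure.

```python
import math


def elman_step(x, h_prev, w_in, w_ctx, bias):
    """One hidden-layer update: h_t = tanh(W_in x_t + W_ctx h_{t-1} + b)."""
    h = []
    for i in range(len(bias)):
        total = bias[i]
        total += sum(w * xi for w, xi in zip(w_in[i], x))        # current input
        total += sum(w * hi for w, hi in zip(w_ctx[i], h_prev))  # fed-back state
        h.append(math.tanh(total))
    return h


def run_elman(frames, w_in, w_ctx, bias):
    """Run the recurrence over a sequence of feature frames, starting from a zero state."""
    h = [0.0] * len(bias)
    for x in frames:
        h = elman_step(x, h, w_in, w_ctx, bias)
    return h
```

With `w_ctx` set to all zeros the update ignores history and reduces to an ordinary feed-forward layer, which is the structural difference from the BP network that the abstract compares against.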

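To handle the dynamic nature of speech emotion, a speech emotion recognition system is built with a dynamic recurrent Elman neural network. State feedback in the Elman model is achieved by feeding the stored state of the previous time step into the network together with the current input. The recognition system designed on this basis allows the network type to be changed in the back end and supports both single-utterance and batch recognition modes. Experiments with the system show that, with the same model parameter settings, Elman-based speech emotion recognition outperforms a BP neural network, and that the BP network is more sensitive to parameter settings than the Elman network.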

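This paper introduces a speaker-independent spoken dialogue system for the "虎丘" (Tiger Hill) tourist service. The system uses the mutual information matching model (MIM) for syllable recognition and proposes a keyword syntax model, with a corresponding analysis algorithm KBP, for sentence analysis and recognition. Experiments show that MIM brings the continuous-speech syllable recognition rate to 78%, and that keyword syntactic analysis raises the system's overall sentence recognition rate by 65%. The system also tolerates syllable segmentation errors from the preprocessing stage and ill-formed sentence input well.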

Research on Machine Learning in Phonetic-to-Character Conversion   Total citations: 6, self-citations: 2, citations by others: 6
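This paper proposes a model of a learning system for phonetic-to-character conversion and presents the three forms of machine learning it uses: word learning, rule learning, and parameter-adjustment learning. Automatic acquisition of words and rules serves the deterministic inference mechanism, while automatic acquisition of non-deterministic rules and adaptive adjustment of the confidence function mainly serve probabilistic inference. Learning experiments on tens of thousands of characters show that machine learning is quite effective at improving the performance of phonetic-to-character conversion systems (for example accuracy and generality). The approach is already in practical use in Chinese character systems with sentence-level speech input and keyboard input.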

In this paper, global and local prosodic features extracted at the sentence, word, and syllable levels are proposed for speech emotion (affect) recognition. Duration, pitch, and energy values are used to represent the prosodic information for recognizing emotions from speech. Global prosodic features represent gross statistics such as the mean, minimum, maximum, standard deviation, and slope of the prosodic contours. Local prosodic features represent the temporal dynamics of the prosody. Global and local prosodic features are analyzed separately and in combination at different levels for the recognition of emotions. Words and syllables at different positions (initial, middle, and final) are also explored separately to analyze their contribution to emotion recognition. All the studies are carried out using a simulated Telugu emotion speech corpus (IITKGP-SESC), and the results are compared with those for the internationally known Berlin emotion speech corpus (Emo-DB). Support vector machines are used to develop the emotion recognition models. The results indicate that recognition performance with local prosodic features is better than with global prosodic features. Words in the final position of sentences and syllables in the final position of words carry more emotion-discriminative information than words and syllables in other positions.
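
The global prosodic statistics listed in this abstract (mean, minimum, maximum, standard deviation, slope) can be computed from a contour as in the sketch below. The abstract does not say how the slope or standard deviation is defined, so this example assumes a least-squares slope over the frame index and the population standard deviation; the function name and sample values are hypothetical.

```python
import statistics


def global_prosodic_features(contour):
    """Gross statistics of one prosodic contour (e.g. pitch values per frame)."""
    n = len(contour)
    mean = statistics.fmean(contour)
    # Least-squares slope of value against frame index (assumed definition).
    x_mean = (n - 1) / 2
    num = sum((i - x_mean) * (v - mean) for i, v in enumerate(contour))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return {
        "mean": mean,
        "min": min(contour),
        "max": max(contour),
        "std": statistics.pstdev(contour),  # population std (assumed)
        "slope": num / den if den else 0.0,
    }


# A steadily rising pitch contour in Hz (made-up values).
features = global_prosodic_features([100.0, 110.0, 120.0, 130.0])
print(features["mean"], features["slope"])  # 115.0 10.0
```

The local features described in the abstract capture temporal dynamics and would need the contour itself (or its frame-to-frame changes) rather than these summary values.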

In this paper, we consider extractive summarization of broadcast news speech and propose a unified probabilistic generative framework that combines the sentence generative probability and the sentence prior probability for sentence ranking. Each sentence of a spoken document to be summarized is treated as a probabilistic generative model for predicting the document. Two matching strategies, namely literal term matching and concept matching, are thoroughly investigated. We explore the use of the language model (LM) and the relevance model (RM) for literal term matching, while the sentence topical mixture model (STMM) and the word topical mixture model (WTMM) are used for concept matching. In addition, the lexical and prosodic features, as well as the relevance information of spoken sentences, are properly incorporated for the estimation of the sentence prior probability. An elegant feature of our proposed framework is that both the sentence generative probability and the sentence prior probability can be estimated in an unsupervised manner, without the need for handcrafted document-summary pairs. The experiments were performed on Chinese broadcast news collected in Taiwan, and very encouraging results were obtained.

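The syllable is the smallest pronunciation unit in Uyghur, so most Uyghur speech synthesis systems use the syllable as the basic synthesis unit. Uyghur has a very large number of syllables, however, and a corpus can hardly cover all syllable samples, which makes the synthesized speech unstable and discontinuous. To address the instability, a unit selection algorithm is proposed that combines two different base units, monophones and triphones. Discontinuity is addressed by adding prosodic parameter matching to the unit selection module so that the units with the best prosodic match are chosen. Experimental results show that the proposed method effectively resolves instability and discontinuity in the synthesized speech and thereby improves its naturalness.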

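This paper presents a Chinese text-to-speech system based on rule-based synthesis from speech parameters. The system uses Chinese syllables and words as synthesis units, which preserves the suprasegmental information between and within syllables when syllables form words and so ensures the naturalness of the synthesized speech. It compresses the synthesis units with CELP speech coding, a currently successful method, and maintains high intelligibility of the synthesized speech even at a compression ratio of more than 20. The authors' attention to the completeness of the system software and the design of the user programming interface makes this a widely applicable Chinese text-to-speech system.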

Visyllable Based Speech Animation   Total citations: 1, self-citations: 0, citations by others: 1

In Mandarin all-syllable recognition, many insertion errors occur due to the influence of non-consonant syllables. Introducing a duration model into the recognition process is a direct way to reduce these errors, but it usually does not work as well as expected because duration is sensitive to speech rate. To address this problem, this paper proposes a novel context-dependent duration distribution normalized by speech rate and applies it to a speech recognition system based on an improved Hidden Markov Model (HMM) framework. To realize the algorithm, the authors employ a new method to estimate the speech rate of a sentence, then compute the duration probability combined with the speech rate, and finally apply this duration information in the post-processing stage. With little change to the recognition process or resource demand, the duration model is adopted efficiently in the system. The experimental results indicate that syllable error rates decrease significantly on two different speech corpora. For insertions in particular, the error rate is reduced by about sixty to eighty percent.

Text-independent speech segmentation is a challenging topic in computer-based speech recognition systems. This paper proposes a novel time-domain algorithm based on fuzzy knowledge for the continuous speech segmentation task via nonlinear speech analysis. Short-term energy, zero-crossing rate, and singularity exponents are the time-domain features calculated at each point of the speech signal in order to extract the information needed to generate significant segments. This is done to identify phonemes or syllables and the transition fronts between them. Fuzzy logic is used to fuzzify the calculated features into three complementary sets (low, medium, and high) and to perform a matching phase using a set of fuzzy rules. The outputs of the proposed algorithm are silence, phonemes, or syllables. When evaluated, the algorithm produced its best performance, with efficient results, on the Fongbe language (an African tonal language spoken especially in Benin, Togo, and Nigeria).
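
Two of the time-domain features named in this abstract, short-term energy and zero-crossing rate, can be computed per frame as in the sketch below. The frame length, hop size, and normalizations are assumptions for illustration, and the function name is hypothetical. The singularity exponents and the fuzzy rule matching described in the abstract are not covered.

```python
def frame_features(signal, frame_len, hop):
    """Return (short-term energy, zero-crossing rate) for each full frame.

    signal: list of float samples; frame_len must be at least 2.
    """
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # Mean squared amplitude of the frame.
        energy = sum(s * s for s in frame) / frame_len
        # Fraction of adjacent sample pairs whose signs differ.
        crossings = sum(1 for a, b in zip(frame, frame[1:])
                        if (a >= 0) != (b >= 0))
        feats.append((energy, crossings / (frame_len - 1)))
    return feats


# An alternating burst followed by silence (made-up samples).
samples = [1.0, -1.0, 1.0, -1.0, 0.0, 0.0, 0.0, 0.0]
print(frame_features(samples, frame_len=4, hop=4))  # [(1.0, 1.0), (0.0, 0.0)]
```

In a fuzzy segmentation scheme like the one described, each of these per-frame values would then be mapped to membership degrees in the low, medium, and high sets before the rules are applied.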

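Using a telephone conversation corpus, this study examines pitch declination in utterances on a statistical basis. The pitch of the vast majority of utterances declines gradually; this high-to-low trend of the pitch contour has a physiological basis and serves a boundary-marking function. A minority of utterances show no pitch decline, which is related to the semantic weight of words, focus, and the citation tones of syllables. The paper also examines pitch in declarative and interrogative sentences and finds that, compared with declaratives, interrogatives have a wider overall pitch range, and that yes-no questions without a sentence-final question particle show a smaller drop between the last two syllables.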

Studies of human speech processing have provided evidence for a segmentation strategy in the perception of continuous speech, whereby a word boundary is postulated, and a lexical access procedure initiated, at each metrically strong syllable. The likely success of this strategy was here estimated against the characteristics of the English vocabulary. Two computerized dictionaries were found to list approximately three times as many words beginning with strong syllables (i.e. syllables containing a full vowel) as beginning with weak syllables (i.e. syllables containing a reduced vowel). Consideration of the frequency of lexical word occurrence reveals that words beginning with strong syllables occur on average more often than words beginning with weak syllables. Together, these findings motivate an estimate for everyday speech recognition that approximately 85% of lexical words (i.e. excluding function words) will begin with strong syllables. This estimate was tested against a corpus of 190 000 words of spontaneous British English conversation. In this corpus, 90% of lexical words were found to begin with strong syllables. This suggests that a strategy of postulating word boundaries at the onset of strong syllables would have a high success rate, in that few actual lexical word onsets would be missed.

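The syllable is the basic unit of word formation and pronunciation in Thai, and Thai syllable segmentation is important for research on Thai lexical analysis, speech synthesis, and speech recognition. Drawing on the structure of Thai syllables, a Thai syllable segmentation method based on Conditional Random Fields (CRF) is proposed. The method defines features from Thai letter categories and letter positions and uses a CRF to label the letters of a Thai sentence as a sequence, thereby segmenting the syllables. A Thai syllable segmentation corpus was annotated on the basis of the InterBEST 2009 Thai corpus. Experiments on this corpus show that the method makes effective use of letter category and position information, with precision, recall, and F-measure reaching 99.115%, 99.284%, and 99.199% respectively.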
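
The three scores reported in this abstract are mutually consistent if the F-measure is taken to be the balanced harmonic mean of precision and recall, which is an assumption here since the abstract does not state the formula. A short check:

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
print(round(f_measure(99.115, 99.284), 3))  # 99.199
```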
