Found 19 similar documents (search time: 218 ms)
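A Mandarin tone recognition system based on a two-level BP model   Total citations: 3, self-citations: 2, citations by others: 1
Besides the commonly used pitch contour, parameters such as the first-order difference of pitch, the energy, and the first-order difference of energy also carry tone information for Mandarin tone recognition. Experiments show that feeding all of these parameters into a single BP model at once does not improve the tone recognition rate and in fact lowers it significantly. The paper therefore proposes a Mandarin tone recognition method in which each parameter type trains its own BP network and the outputs of these networks serve as the inputs to a higher-level BP network. In addition, an improved pitch smoothing algorithm is proposed to address the characteristics of the third tone (shangsheng). With these methods the system reaches a tone recognition rate of 90.05%.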
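
The two-level arrangement described in this abstract can be sketched roughly as follows. This is a forward-pass illustration only, with untrained random weights; the feature names, the input size of 5, the hidden size of 6, and all function names are assumptions for illustration and are not taken from the paper.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    # One fully connected layer; the last weight in each row is the bias.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]

def forward(layer, x):
    # Sigmoid units; zip() stops at len(x), so row[-1] is used only as the bias.
    return [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + row[-1])))
            for row in layer]

def mlp(layers, x):
    for layer in layers:
        x = forward(layer, x)
    return x

def two_level_scores(feature_groups, low_nets, top_net):
    # Level 1: each feature type goes through its own network.
    low_out = []
    for name in sorted(feature_groups):
        low_out.extend(mlp(low_nets[name], feature_groups[name]))
    # Level 2: the concatenated level-1 outputs feed the higher-level network.
    return mlp(top_net, low_out)

rng = random.Random(0)
names = ["pitch", "delta_pitch", "energy", "delta_energy"]  # hypothetical names
low_nets = {n: [make_layer(5, 6, rng), make_layer(6, 4, rng)] for n in names}
top_net = [make_layer(4 * len(names), 6, rng), make_layer(6, 4, rng)]
features = {n: [rng.random() for _ in range(5)] for n in names}
scores = two_level_scores(features, low_nets, top_net)  # one score per tone
```

In a real system each low-level network would first be trained on its own feature type, and the top network would then be trained on their outputs.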

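A tone modification method for Chinese speech is proposed, consisting of two parts: the application of tone rules and tone smoothing. Used in the authors' speech synthesis system based on pitch-synchronous overlap-add, the method gave good results in improving the naturalness and intelligibility of synthesized sentences.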

王改良  武妍 《计算机应用》2010,30(10):2709-2711
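The pitch frequency trajectory reflects the tone characteristics of Mandarin Chinese fairly faithfully, so recognizing tones by recognizing different pitch trajectories is a sound approach. Based on the theory of biomimetic pattern recognition, the paper proposes using the iterative self-organizing data analysis algorithm (ISODATA) to find the centers of the covering regions and a multi-weight neural network to build a covering around each cluster center, thereby recognizing the four tones. In experimental comparisons with hidden Markov model (HMM) and support vector machine (SVM) algorithms, the method achieves a relatively high recognition rate when only a small number of samples is available.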

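A fuzzy recognition method for the tones of isolated Chinese characters   Total citations: 1, self-citations: 0, citations by others: 1
This paper applies fuzzy sets to recognize the tones of isolated Chinese characters. The four tones of an isolated character can be described as fuzzy sets of four pattern classes. Because the pitch contours of the four tones follow fixed patterns, the membership functions of the fuzzy sets can be constructed on that basis. The method uses the membership functions as the discriminant functions for pattern classification. These membership functions are simple and easy to compute, and so are suitable for real-time execution. Experimental results show an overall recognition rate above 99%.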
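
As a rough illustration of using membership functions as discriminant functions, the sketch below grades a three-point pitch contour against one template per tone. The templates (based on the conventional five-level tone values 55, 35, 214, 51), the membership formula, and all names are assumptions for illustration; the abstract does not give the paper's actual membership functions.

```python
# Hypothetical three-point templates on the five-level pitch scale.
TONE_TEMPLATES = {
    1: (5.0, 5.0, 5.0),  # high level (55)
    2: (3.0, 4.0, 5.0),  # rising (35)
    3: (2.0, 1.0, 4.0),  # dipping (214)
    4: (5.0, 3.0, 1.0),  # falling (51)
}

def membership(contour, template):
    # Grade in (0, 1]: equals 1 for a perfect match and decays with the
    # mean squared distance between contour and template.
    d2 = sum((c - t) ** 2 for c, t in zip(contour, template)) / len(template)
    return 1.0 / (1.0 + d2)

def classify_tone(contour):
    # The membership grades act as discriminant functions: pick the largest.
    grades = {tone: membership(contour, tpl) for tone, tpl in TONE_TEMPLATES.items()}
    return max(grades, key=grades.get), grades

tone, grades = classify_tone((4.8, 3.1, 1.2))  # a falling contour
```

Each grade costs only a handful of multiplications, which is consistent with the abstract's point that simple membership functions suit real-time use.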

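The syllable tone patterns in three-character words share the characteristics of syllable tone patterns in continuous speech, so extracting and recognizing their tones is far harder than for isolated characters. Pitch is extracted with a wavelet transform method and tones are recognized with a Fuzzy ARTMAP neural network, giving better experimental results than a BP network. The influence of the simulation parameters on the recognition results is analyzed, the overfitting problem in Fuzzy ARTMAP neural networks is discussed, and a Fuzzy ARTMAP based tone recognition method for three-character words is presented.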

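This paper presents a fast tone recognition method. Single-level clipping, sample-rate reduction, and linear interpolation are combined into a fast pitch extraction procedure, and an RBF neural network automatically classifies the four tones. The method is simple, reliable, and fault-tolerant.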

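Exploiting the different correlation properties of speech and noise signals, a new weighted autocorrelation algorithm for fundamental frequency detection is proposed; it improves the accuracy of pitch detection in noisy environments. On the classifier side, a support vector machine is introduced to further raise the Chinese tone recognition rate at low signal-to-noise ratios. Experimental results show that the new method is effective at improving tone recognition in noisy environments.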
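
The abstract does not say how the autocorrelation is weighted. One commonly described form divides the autocorrelation by the average magnitude difference function (AMDF) plus a small constant, so that a lag scores highly only when both measures agree. The sketch below assumes that form; the function name, the constant `k`, and the search range are illustrative choices, not the paper's.

```python
import math

def weighted_acf_pitch(x, fs, fmin=60.0, fmax=400.0, k=1.0):
    # Search lags corresponding to fmax..fmin and score each lag by
    # autocorrelation / (AMDF + k). Both sums run over the shrinking
    # overlap window, which mildly de-emphasises long lags.
    n = len(x)
    lag_min, lag_max = int(fs / fmax), int(fs / fmin)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        m = n - lag
        acf = sum(x[i] * x[i + lag] for i in range(m))
        amdf = sum(abs(x[i] - x[i + lag]) for i in range(m))
        score = acf / (amdf + k)  # assumed weighting, see lead-in
        if score > best_score:
            best_lag, best_score = lag, score
    return fs / best_lag

fs = 8000
signal = [math.sin(2 * math.pi * 200 * i / fs) for i in range(800)]
f0 = weighted_acf_pitch(signal, fs)
```

The frame must be longer than the largest lag searched (here `fs / fmin` samples).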

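Drawing on a study of voice conversion, a method is proposed for converting the characteristics of a source speaker into those of a target speaker. The feature parameters for voice conversion fall into two classes: (1) spectral feature parameters; (2) pitch and tone patterns. The signal model and the conversion method are described for each. Spectral features are modeled with phoneme-based two-dimensional HMMs, and the F0 trajectory represents pitch and tone. Pitch-synchronous overlap-add is used to modify the pitch period, tone, and speaking rate.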

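An improved pitch period extraction algorithm   Total citations: 1, self-citations: 0, citations by others: 1
Abstract: Pitch period extraction is widely used in speech signal processing. Inspired by pitch period extraction based on the normalized autocorrelation function and by the pitch detection algorithm in the multi-band excitation (MBE) vocoder, this paper proposes an improved pitch period extraction algorithm. The algorithm consists of five main parts: preprocessing, coarse time-domain pitch estimation, pitch smoothing, time-varying filter search, and fractional pitch period estimation. Experiments show that the algorithm achieves higher search accuracy and yields a smoother pitch period contour, and that it is considerably more robust to noise than the traditional autocorrelation detection algorithm.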

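To carry out speech recognition tasks in a specific domain, building a high-performance language model from limited corpus data becomes the key to improving system performance. To address this, domain-specific language models are studied. A method is proposed that uses high-frequency new words to strengthen the domain characteristics of the model, and two schemes are adopted: one adds the high-frequency new words directly to the existing dictionary and increases their weights during training, so that the model better expresses domain-related features; the other builds a small domain-related vocabulary from the statistics of high-frequency new words. The two schemes are compared. Smoothing strategies suited to Chinese are studied experimentally. The results show that, for domain-specific problems, the language model smoothing algorithm has a large effect on model performance; with Witten-Bell interpolated smoothing, which suits Chinese, the recognition rate reaches 88.4%, a relative improvement of 18.18% over the general-purpose model.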
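
For readers unfamiliar with Witten-Bell interpolated smoothing, a minimal bigram sketch follows. It uses the standard interpolated form P(w|h) = (c(h,w) + T(h) · P_uni(w)) / (c(h) + T(h)), where T(h) is the number of distinct words seen after h. The token unit, the toy corpus, and the function names are assumptions for illustration; the abstract does not describe the paper's n-gram order or training setup.

```python
from collections import Counter, defaultdict

def train_witten_bell(tokens):
    unigrams = Counter(tokens)
    total = len(tokens)
    followers = defaultdict(Counter)  # followers[h][w] = count of bigram (h, w)
    for h, w in zip(tokens, tokens[1:]):
        followers[h][w] += 1

    def prob(w, h):
        p_uni = unigrams[w] / total
        seen = followers.get(h)
        if not seen:
            return p_uni  # unseen history: back off to the unigram estimate
        c_h = sum(seen.values())  # how often h occurred as a history
        t_h = len(seen)           # number of distinct continuations of h
        return (seen[w] + t_h * p_uni) / (c_h + t_h)

    return prob, set(unigrams)

corpus = "speech recognition speech synthesis speech recognition model".split()
prob, vocab = train_witten_bell(corpus)
p_seen = prob("recognition", "speech")
p_unseen = prob("model", "speech")  # never follows "speech" in the corpus
```

The unseen bigram still receives probability mass, and the probabilities over the vocabulary sum to one for any seen history.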

Versatile surface detail editing via Laplacian coordinates   Total citations: 2, self-citations: 0, citations by others: 2
This paper presents a versatile detail editing approach for triangular meshes based on filtering the Laplacian coordinates. More specifically, we first compute the Laplacian coordinates of the mesh vertices, then filter the Laplacian coordinates, and finally reconstruct the mesh from the filtered Laplacian coordinates by solving a linear least-squares system. The proposed detail editing method includes not only feature-preserving smoothing but also enhancing. Furthermore, the proposed approach allows interactive editing of some user-specified frequencies and regions. Experimental results demonstrate that our method is much more versatile and faster than the existing methods.
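
The compute, filter, reconstruct pipeline can be illustrated on a closed 2D polygon, used here as a small stand-in for a triangle mesh. The uniform Laplacian weights, the single soft anchor constraint, and all names below are assumptions for illustration rather than the paper's formulation.

```python
def laplacian_matrix(n):
    # Uniform Laplacian of a closed polygon: delta_i = v_i - (v_prev + v_next) / 2.
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        L[i][i] = 1.0
        L[i][(i - 1) % n] -= 0.5
        L[i][(i + 1) % n] -= 0.5
    return L

def solve(A, b):
    # Gaussian elimination with partial pivoting.
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def edit_details(points, filt, anchor=0, weight=1.0):
    # Least squares: minimise |L v - filt(L v0)|^2 + weight^2 |v_anchor - v0_anchor|^2,
    # solved through the normal equations, one coordinate at a time.
    n = len(points)
    L = laplacian_matrix(n)
    N = [[sum(L[k][i] * L[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    N[anchor][anchor] += weight ** 2
    columns = []
    for d in range(len(points[0])):
        coords = [p[d] for p in points]
        delta = filt([sum(L[i][j] * coords[j] for j in range(n)) for i in range(n)])
        rhs = [sum(L[k][i] * delta[k] for k in range(n)) for i in range(n)]
        rhs[anchor] += weight ** 2 * coords[anchor]
        columns.append(solve(N, rhs))
    return [tuple(col[i] for col in columns) for i in range(n)]

polygon = [(0.0, 0.0), (1.0, 0.2), (2.0, 0.0), (2.1, 1.0), (1.0, 1.3), (0.0, 1.0)]
unchanged = edit_details(polygon, lambda d: d)               # identity filter
enhanced = edit_details(polygon, lambda d: [1.5 * v for v in d])  # amplify details
```

Passing a filter that damps the Laplacian coordinates smooths the shape, while one that amplifies them exaggerates detail; the anchor term removes the translation ambiguity of the Laplacian.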

A cascadic geometric filtering approach to subdivision   Total citations: 1, self-citations: 0, citations by others: 1
A new approach to subdivision based on the evolution of surfaces under curvature motion is presented. Such an evolution can be understood as a natural geometric filter process where time corresponds to the filter width. Thus, subdivision can be interpreted as the application of a geometric filter on an initial surface. The concrete scheme is a model of such a filtering based on a successively improved spatial approximation starting with some initial coarse mesh and leading to a smooth limit surface.

In every subdivision step the underlying grid is refined by some regular refinement rule and a linear finite element problem is either solved exactly or, especially on fine grid levels, one restricts oneself to a small number of smoothing steps within the corresponding iterative linear solver. The approach closely connects subdivision to surface fairing concerning the geometric smoothing and to cascadic multigrid methods with respect to the actual numerical procedure. The derived method does not distinguish between different valences of nodes nor between different mesh refinement types. Furthermore, the method comes along with a new approach for the theoretical treatment of subdivision.

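A new method for fundamental frequency detection in noisy speech is proposed. The noisy speech is first denoised with wavelets and then preprocessed; the fundamental frequency is extracted with a normalized AMDF algorithm, and the resulting F0 signal is then smoothed with a search-and-trial method. Experiments show that the method is more robust than traditional methods, especially at low signal-to-noise ratios.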
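
The AMDF extraction step alone can be sketched as below; wavelet denoising, preprocessing, and the smoothing stage are omitted. Normalizing by the mean AMDF over the search range, the tolerance `tol`, and the rule of taking the first local minimum near the global minimum are assumptions for illustration; the abstract does not specify the paper's normalization.

```python
import math

def amdf_pitch(x, fs, fmin=60.0, fmax=400.0, tol=0.05):
    # Average magnitude difference function over the candidate lags.
    n = len(x)
    lag_min, lag_max = int(fs / fmax), int(fs / fmin)
    amdf = []
    for lag in range(lag_min, lag_max + 1):
        m = n - lag
        amdf.append(sum(abs(x[i] - x[i + lag]) for i in range(m)) / m)
    mean = sum(amdf) / len(amdf)
    if mean == 0.0:
        return 0.0  # silent or constant frame: no pitch
    norm = [a / mean for a in amdf]  # assumed normalisation, see lead-in
    floor = min(norm)
    # Take the first local minimum close to the global minimum, which
    # avoids locking onto a multiple of the true period.
    for k, d in enumerate(norm):
        is_local_min = k + 1 == len(norm) or d <= norm[k + 1]
        if d <= floor + tol and is_local_min:
            return fs / (lag_min + k)
    return 0.0

fs = 8000
frame = [math.sin(2 * math.pi * 200 * i / fs) for i in range(800)]
f0 = amdf_pitch(frame, fs)
```

A periodic signal gives AMDF valleys at every multiple of the period, which is why the sketch prefers the earliest qualifying valley over the absolute minimum.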

This paper introduces a new music signal processing method to extract multiple fundamental frequencies, which we call specmurt analysis. In contrast with the cepstrum, which is the inverse Fourier transform of the log-scaled power spectrum with linear frequency, specmurt is defined as the inverse Fourier transform of the linear power spectrum with log-scaled frequency. Assuming that all tones in a polyphonic sound have a common harmonic pattern, the sound spectrum can be regarded as a sum of linearly stretched common harmonic structures along frequency. In the log-frequency domain, it is formulated as the convolution of a common harmonic structure and the distribution density of the fundamental frequencies of multiple tones. The fundamental frequency distribution can be found by deconvolving the observed spectrum with the assumed common harmonic structure, where the common harmonic structure is given heuristically or quasi-optimized with an iterative algorithm. The efficiency of specmurt analysis is experimentally demonstrated through generation of a piano-roll-like display from a polyphonic music signal and automatic sound-to-MIDI conversion. Multipitch estimation accuracy is evaluated over several polyphonic music signals and compared with manually annotated MIDI data.
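
The deconvolution step described in this abstract can be sketched on a toy log-frequency axis. The sketch assumes circular convolution on a short axis of 32 bins, a hand-picked harmonic pattern, and a naive DFT; none of these values or names come from the paper.

```python
import cmath

def dft(x, inverse=False):
    # Naive O(n^2) discrete Fourier transform, adequate for a toy example.
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def circular_convolve(u, h):
    n = len(u)
    return [sum(u[m] * h[(i - m) % n] for m in range(n)) for i in range(n)]

def specmurt_deconvolve(v, h):
    # Convolution on the log-frequency axis becomes multiplication in the
    # transform domain, so dividing there recovers the fundamental distribution.
    V, H = dft(v), dft(h)
    return [z.real for z in dft([a / b for a, b in zip(V, H)], inverse=True)]

n_bins = 32
u_true = [0.0] * n_bins
u_true[3], u_true[10] = 1.0, 0.6  # two hypothetical fundamentals
h = [0.0] * n_bins
h[0], h[12], h[19] = 1.0, 0.5, 0.25  # hypothetical common harmonic pattern
observed = circular_convolve(u_true, h)
u_est = specmurt_deconvolve(observed, h)
```

The division is well behaved here because the transform of the chosen pattern never reaches zero; with real spectra some regularization would normally be needed.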

In Continuous Speech Recognition (CSR) systems a Language Model (LM) is required to represent the syntactic constraints of the language. Then a smoothing technique needs to be applied to avoid null LM probabilities. Each smoothing technique leads to a different LM probability distribution. Test set perplexity is usually used to evaluate smoothing techniques, but the relationship with acoustic models is not taken into account. In fact, it is well known that to obtain optimum CSR performance a scaling exponential parameter must be applied over LMs in Bayes' rule. This scaling factor implies a new redistribution of smoothed LM probabilities. The shape of the final probability distribution is due to both the smoothing technique used when designing the language model and the scaling factor required to get the optimum system performance when integrating the LM into the CSR system. The main objective of this work is to study the relationship between the two factors, which result in dependent effects. Experimental evaluation is carried out over two Spanish speech application tasks. Classical smoothing techniques representing very different degrees of smoothing are compared. A new proposal, Delimited discounting, is also considered. The results of the experiments showed a strong dependence between the amount of smoothing given by the smoothing technique and the way that the LM probabilities need to be scaled to get the best system performance, which is perplexity independent in many cases. This relationship is not independent of the task and available training data.
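
The effect of the scaling exponent can be illustrated with a small sketch. In the log domain the decoder score becomes log P(A|W) + α · log P(W), and raising LM probabilities to the power α reshapes the distribution once it is renormalized. The hypotheses, scores, and function names below are made-up values for illustration.

```python
def best_hypothesis(hypotheses, lm_scale):
    # hypotheses: list of (label, acoustic_log_prob, lm_log_prob).
    return max(hypotheses, key=lambda h: h[1] + lm_scale * h[2])[0]

def scale_distribution(probs, alpha):
    # Raise each LM probability to the power alpha, then renormalise.
    powered = [p ** alpha for p in probs]
    z = sum(powered)
    return [p / z for p in powered]

hyps = [("a", -12.0, -2.0),  # weaker acoustics, likelier word sequence
        ("b", -9.0, -4.0)]   # stronger acoustics, less likely word sequence
choice_low = best_hypothesis(hyps, lm_scale=1.0)
choice_high = best_hypothesis(hyps, lm_scale=3.0)
sharpened = scale_distribution([0.7, 0.2, 0.1], alpha=2.0)
```

With these numbers a scale of 1 selects "b" and a scale of 3 selects "a", and an exponent above 1 concentrates mass on the most probable event, which is why the scaling factor interacts with how much smoothing the LM already applied.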

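The fundamental frequency, usually denoted F0, is the vibration frequency of the vocal folds during voiced sounds. Within a syllable or a continuous stretch of speech, F0 varies over time, and the trajectory of this variation forms the F0 contour. The shape of the F0 contour reflects prosodic information such as sentence stress and intonation, so describing and studying the F0 contour is particularly important. This paper first proposes a method for describing the F0 contour, namely a derivative-domain coding method, and also examines the role this coding plays for prosody in the assessment of spoken pronunciation quality. Experimental results show that the description method improves the assessment of English pronunciation intonation quality: the correlation between subjective and objective scores rises from 0.38, obtained with the pitch extremum difference, to 0.49.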
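
The abstract does not define the derivative-domain coding itself. Purely as an illustration of the general idea of describing a contour by its slope rather than its absolute values, the sketch below quantizes the first difference of an F0 track into rise, level, and fall symbols. The symbol set, the threshold, and the function name are all assumptions and should not be read as the paper's method.

```python
def derivative_code(f0, threshold=2.0):
    # f0: F0 values in Hz, one per frame. Each frame-to-frame difference is
    # mapped to "R" (rise), "F" (fall), or "L" (level within the threshold).
    symbols = []
    for prev, cur in zip(f0, f0[1:]):
        slope = cur - prev
        if slope > threshold:
            symbols.append("R")
        elif slope < -threshold:
            symbols.append("F")
        else:
            symbols.append("L")
    return "".join(symbols)

code = derivative_code([100.0, 105.0, 110.0, 110.0, 104.0, 98.0])
```

A slope-based code like this is insensitive to a speaker's overall pitch level, which is one reason contour shape is often preferred over raw F0 when comparing intonation.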

Accent is a reflection of an individual speaker's regional affiliation and is shaped by the speaker's community background. This study investigated the acoustic characteristics of two British regional accents, the Birmingham and Liverpool accents, and their correlations from a different approach. In contrast to previous accent-related research, where the databases are formed from large groups of single-accent speakers, this study uses data from an individual who can speak in two accents, thus removing the effects of inter-speaker variability and facilitating efficient identification and analysis of the accent acoustic features. Acoustic features such as formant frequencies, pitch slope, intensity and phone duration have been used to investigate the prominent features of each accent. The acoustic analysis was based on nine monophthongal vowels and three diphthongal vowels. In addition, an analysis of variance of formant frequencies along the time dimension was performed to study the perceived effects of vocal tract shape changes as the speaker switches between the two accents. The results of the analysis indicate that the formant frequencies, pitch slope, the intensity and the phone duration all vary between the two accents. Classification testing using linear discriminant analysis showed that intensity had the strongest effect on differentiating between the two accents followed by F3, vowel duration, F2 and pitch slope.
