Similar Articles
1.
Shen Lingjie, Wang Wei. Technical Acoustics, 2018, 37(2): 167-174
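A method is proposed for Mandarin tone recognition on short utterances using a fused feature that combines prosodic features (fundamental frequency and duration) with Mel-Frequency Cepstral Coefficient (MFCC) features, so as to exploit the strengths of both feature types and raise the tone recognition rate. The fused feature comprises 7 prosodic features and statistical parameters derived from different models, plus 4 log posterior probabilities computed from the MFCCs of each segment; Gaussian mixture models (GMMs) represent the distribution of the cepstral features of each of the 4 tones. The experiments proceed in two steps. First, a classifier based on prosodic features and a classifier based on cepstral features are combined at the decision stage for tone classification: each classifier is assigned a weight, and the weights of the cepstral and prosodic features in the tone classification task are computed. Second, syllable-level prosodic features and frame-level cepstral features are combined into a fused-feature supervector, which is used for Mandarin tone recognition; five classifiers (GMMs in two configurations, a back-propagation neural network, a support vector machine, and a Convolutional Neural Network (CNN)) are compared on an imbalanced dataset using three metrics: accuracy, Unweighted Average Recall (UAR), and Cohen's Kappa. The results show that: (1) the cepstral features improve the Mandarin tone recognition rate, with a weight of 0.11 in the overall classification task; (2) the deep learning (CNN) approach on the fused features achieves the highest tone recognition rate, 87.6%, an improvement of 5.87% over the GMM baseline system. The study shows that cepstral features provide information complementary to prosodic features and thereby improve short-utterance Mandarin tone recognition; the method can also be applied to related work such as prosody detection and paralinguistic information detection.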
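The decision-level fusion and the three evaluation metrics in entry 1 can be illustrated with a short Python sketch. The linear rule `(1 - w) * prosodic + w * cepstral`, the per-tone score dictionaries, and all function names are assumptions for illustration; the abstract only says that each classifier receives a weight and that the cepstral weight comes out at 0.11.

```python
from collections import Counter

TONES = (1, 2, 3, 4)  # the four Mandarin lexical tones


def fuse_scores(prosodic, cepstral, w_cep=0.11):
    """Decision-level fusion (assumed linear): weighted sum of per-tone scores."""
    return {t: (1.0 - w_cep) * prosodic[t] + w_cep * cepstral[t] for t in TONES}


def predict(scores):
    """Return the tone with the highest fused score."""
    return max(scores, key=scores.get)


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so rare tones count as much as frequent ones."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def cohen_kappa(y_true, y_pred):
    """Observed agreement corrected for the agreement expected by chance."""
    n = len(y_true)
    p_observed = sum(a == b for a, b in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    labels = set(y_true) | set(y_pred)
    p_chance = sum(true_counts[c] * pred_counts[c] for c in labels) / (n * n)
    return (p_observed - p_chance) / (1.0 - p_chance)
```

On an imbalanced tone set UAR is usually lower than plain accuracy, which is why it is reported alongside it; scikit-learn's `balanced_accuracy_score` and `cohen_kappa_score` compute the same two quantities if a library version is preferred.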

2.
S. Hawkins. Sadhana, 2011, 36(5): 555-586
This paper reassesses conventional assumptions about the informativeness of the acoustic speech signal, and shows how recent research on systematic variability in the acoustic signal is consistent with an alternative linguistic model that is more biologically plausible and compatible with recent advances in modelling embodied visual perception and action. Standard assumptions about the information available from the speech signal, especially strengths and limitations of phonological features and phonemes, are reviewed, and compared with an alternative approach based on Firthian prosodic analysis (FPA). FPA places more emphasis than standard models on the linguistic and interactional function of an utterance, de-emphasizes the need to identify phonemes, and uses formalisms that force us to recognize that every perceptual decision is context- and task-dependent. Examples of perceptually-significant phonetic detail that is neglected by standard models are discussed. Similarities between the theoretical approach recommended and current work on perception–action robots are explored.

3.
Li Tao, Cao Hui, Guo Lele. Technical Acoustics, 2018, 37(4): 367-371
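To improve the performance of continuous speech recognition systems, a deep autoencoder neural network is applied to speech feature extraction. A deep autoencoder (Deep Auto-Encoding, DAE) is built by stacking sparse autoencoders and extracts the essential features of the speech signal in two steps, pre-training and fine-tuning. Context-dependent triphone models are used, and phone error rate serves as the measure of system performance. Simulation results show that the deep features extracted by the deep autoencoder outperform both conventional Mel-Frequency Cepstral Coefficient (MFCC) features and optimized MFCC features.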
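Entry 3 ranks systems by phone error rate. A minimal sketch of that metric, assuming the usual definition (substitutions + deletions + insertions from a Levenshtein alignment, divided by the number of reference phones); the function name is hypothetical and the code is not taken from the paper.

```python
def phone_error_rate(reference, hypothesis):
    """PER = (substitutions + deletions + insertions) / len(reference)."""
    n, m = len(reference), len(hypothesis)
    # d[i][j] = minimum edits turning reference[:i] into hypothesis[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all i reference phones
    for j in range(m + 1):
        d[0][j] = j  # insert all j hypothesis phones
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match or substitution
    return d[n][m] / n
```

For example, a made-up reference `["n", "i", "h", "ao"]` decoded as `["n", "i", "ao"]` has one deletion, so its PER is 0.25.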

4.
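Motivated by the practical need for highly natural speech synthesis and high-accuracy speech recognition of Mandarin spoken by ethnic minorities, this study is the first to compare, from the perspective of experimental phonetics, the first-order differences of tones, tone durations, and their similarity between 50 Uyghur learners of Chinese at the elementary, intermediate, and advanced levels and 10 native Mandarin speakers. The first-order difference patterns of the tones, the tone durations, and other prosodic parameters are analyzed experimentally, revealing the tone errors made by Uyghur students and how they relate to scores on the Chinese Proficiency Test for Minorities (MHK). The results show that all three groups of Uyghur speakers have difficulty with Mandarin tones, and that the phonology, intonation, and stress of the two languages affect tonal characteristics in the second language. The basic acoustic characteristics of Uyghur learners' tones are summarized and several important rules and conclusions are drawn, providing a reference for the tone problems these learners pose for Chinese speech processing, especially speech synthesis and speech recognition of Mandarin spoken by ethnic minorities.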
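The first-order difference and duration parameters in entry 4 can be sketched as follows. This assumes an F0 contour sampled once per analysis hop (10 ms here) and a hypothetical 2 Hz tolerance for "level" steps; the paper's exact definitions are not given in the abstract.

```python
def first_order_difference(f0):
    """Frame-to-frame change of an F0 contour (Hz per frame)."""
    return [b - a for a, b in zip(f0, f0[1:])]


def difference_pattern(f0, tol=2.0):
    """Label each F0 step as rising 'R', falling 'F' or level 'L'.

    `tol` (Hz) is a hypothetical threshold below which a step counts as level.
    """
    labels = []
    for d in first_order_difference(f0):
        if d > tol:
            labels.append("R")
        elif d < -tol:
            labels.append("F")
        else:
            labels.append("L")
    return "".join(labels)


def tone_duration_ms(f0, hop_ms=10.0):
    """Duration of the voiced tone segment, assuming one F0 value per hop."""
    return len(f0) * hop_ms
```

A dipping contour such as `[180, 170, 165, 175, 190]` yields the pattern `"FFRR"`, the fall-then-rise shape typical of a full third tone, so learner and native patterns can be compared symbol by symbol.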

5.
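To address single-mode feature extraction and low classification accuracy in conventional bird sound recognition algorithms, a bird sound recognition method combining a convolutional neural network and a Transformer network is proposed. The method accounts for both local feature learning and global contextual dependencies. Short-Time Fourier Transform (STFT) spectrogram features are extracted from the raw bird sound audio and fed into a Convolutional Neural Network (CNN) to extract local spectral information; at the same time, log-Mel features of the signal and their first- and second-order differences are extracted and combined into a composite Mel-Frequency Cepstral Coefficient (MFCC) feature vector, which is fed into a Transformer network to capture global sequence information. Finally, the extracted features are fused into a richer set of bird sound features, and a Softmax classifier produces the recognition result. On the Birdsdata and xeno-canto bird sound datasets, the average recognition accuracy reaches 97.81% and 89.47%, respectively, higher than that of the other existing bird sound recognition models compared.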
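The first- and second-order differences in entry 5 are commonly computed with the regression formula used in HTK-style front ends. A minimal sketch, assuming that formula with window N = 2 and edge frames replicated; the paper may use a different variant, and the function names are hypothetical.

```python
def delta(frames, N=2):
    """Regression deltas over time for a list of feature vectors (edges replicated).

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2)
    """
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        vec = []
        for k in range(len(frames[0])):
            num = 0.0
            for n in range(1, N + 1):
                later = frames[min(t + n, T - 1)][k]
                earlier = frames[max(t - n, 0)][k]
                num += n * (later - earlier)
            vec.append(num / denom)
        out.append(vec)
    return out


def stack_with_deltas(frames, N=2):
    """Append first-order and second-order deltas to every static frame."""
    d1 = delta(frames, N)
    d2 = delta(d1, N)
    return [list(f) + a + b for f, a, b in zip(frames, d1, d2)]
```

Each frame triples in length, which matches the "static + first difference + second difference" composition the abstract describes before the vectors enter the Transformer branch.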

6.
Abstract

Mandarin Chinese is a tonal language in which every syllable carries a tone with lexical meaning, so tone recognition is very important for Mandarin speech recognition. This paper presents a method for tone recognition in continuous speech. Context-dependent discrete hidden Markov models (HMMs) are used that take into account the tones of the syllables on both sides, and special effort is made to select the minimum number of key context-dependent models based on the characteristics of the tones. The results indicate that a total of 23 context-dependent models have very good potential to describe the complicated tone behavior across all 175 possible tone concatenation conditions in continuous speech, so that the required training data can be kept to a minimum and the recognition process simplified significantly. The best achievable recognition rate is 83.55%.
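Recognition with discrete HMMs, as in entry 6, amounts to scoring the quantized observation sequence under each model and picking the most likely one. A minimal sketch of that scoring step with the forward algorithm; the two-state "rising"/"falling" models and their probabilities are hypothetical toy values, not the paper's 23 context-dependent models.

```python
def forward_probability(obs, pi, A, B):
    """P(obs | model) for a discrete HMM via the forward algorithm.

    pi[i]: initial probability of state i; A[i][j]: transition i -> j;
    B[i][k]: probability of emitting symbol k in state i.
    """
    n_states = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n_states)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][o]
            for j in range(n_states)
        ]
    return sum(alpha)


def recognize(obs, models):
    """Return the name of the model with the highest likelihood for `obs`.

    `models` maps a (hypothetical) tone-model name to a (pi, A, B) tuple.
    """
    return max(models, key=lambda name: forward_probability(obs, *models[name]))
```

With symbols 0 = low pitch and 1 = high pitch, a left-to-right model whose first state favors 0 and second state favors 1 wins on `[0, 0, 1, 1]`. In a context-dependent setup, each key would instead name a tone together with its left and right neighbors. Long utterances would need log-domain or scaled forward variables to avoid underflow.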

7.
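With large-scale data, deep-learning-based speech recognition is already quite mature, but with small-sample resources the limited correlation among feature information leaves the model with insufficient capacity to model context, which lowers the recognition rate. To address this problem, a Time Delay Neural Network (TDNN) with an embedded attention mechanism layer (Attention Mechanism), combined with long short-term memory...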
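Entry 7 is truncated, but its two named components can be illustrated generically: a TDNN layer sees each frame together with neighbors at fixed offsets, and an attention layer forms a softmax-weighted combination of frame vectors. This sketch assumes edge clamping, offsets of -2 to +2, and dot-product scoring against a query vector; none of these details come from the paper, and the function names are hypothetical.

```python
import math


def splice(frames, offsets=(-2, -1, 0, 1, 2)):
    """Concatenate each frame with its neighbors at the given offsets (edges clamped).

    This is the fixed temporal context a TDNN layer sees at each time step.
    """
    T = len(frames)
    out = []
    for t in range(T):
        ctx = []
        for o in offsets:
            ctx.extend(frames[min(max(t + o, 0), T - 1)])
        out.append(ctx)
    return out


def attention_pool(hidden, query):
    """Softmax-weighted average of frame vectors, scored by dot product with `query`."""
    scores = [sum(h * q for h, q in zip(vec, query)) for vec in hidden]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(hidden[0])
    return [sum(w * vec[d] for w, vec in zip(weights, hidden)) for d in range(dim)]
```

In a trained model the query (or a small scoring network) is learned, so frames carrying more useful context receive larger weights; with a zero query all frames are weighted equally and the pool reduces to a plain mean.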

8.
Abstract

In this paper, the performance of several speech recognition techniques applied to highly confusable Mandarin syllables is carefully compared, including dynamic time warping (DTW), the newly proposed DTW with a superimposed weighting function (DTWW), discrete hidden Markov models (DHMM), and continuous hidden Markov models (CHMM). The vocabulary consists of 409 isolated Mandarin syllables in the first tone. Because this vocabulary contains many confusable sets, accurate recognition of these syllables is relatively difficult; all recognition experiments were performed in speaker-dependent mode. After a series of 13 experiments, it was found that the recognition rate of the newly proposed DTWW (88.3%) is higher than those of DTW (85.1%), DHMM (65.0%), and CHMM (83.9%), and that the CPU time used by DTWW is 1.03 times that of DTW, 24 times that of DHMM, and 4.3 times that of CHMM. In addition, the memory required by DTWW and DTW is 3.4 times that of DHMM and 8.5 times that of CHMM. Therefore, DTWW has the highest recognition rate and DHMM the fastest recognition speed, whereas CHMM appears very attractive when all factors, including recognition rate, recognition speed, and memory requirements, are considered.
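The DTW baseline in entry 8 can be sketched as classic dynamic time warping with nearest-template classification, which is the usual speaker-dependent isolated-word setup. The superimposed weighting function of DTWW is not described in the abstract, so it is deliberately not reproduced here; the Euclidean local distance, the symmetric step pattern, and the function names are assumptions.

```python
import math


def euclidean(a, b):
    """Local distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b, dist=euclidean):
    """Cumulative cost of the best monotonic alignment of two feature sequences."""
    n, m = len(seq_a), len(seq_b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(seq_a[i - 1], seq_b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def recognize(utterance, templates):
    """Return the label of the reference template closest to `utterance` under DTW."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))
```

The full O(n * m) cost table per template is one reason template matching needs more CPU time and memory than HMM scoring, consistent with the trade-offs the abstract reports.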

9.
The results of this paper show that neural networks could be a very promising tool for reliability data analysis. Identifying the underlying distribution of a set of failure data and estimating its distribution parameters are necessary in reliability engineering studies. In general, either a chi-square or a non-parametric goodness-of-fit test is used in the distribution identification process which includes the pattern interpretation of the failure data histograms. However, those procedures can guarantee neither an accurate distribution identification nor a robust parameter estimation when small data samples are available. Basically, the graphical approach of distribution fitting is a pattern recognition problem and parameter estimation is a classification problem where neural networks have been proved to be a suitable tool. This paper presents an exploratory study of a neural network approach, validated by simulated experiments, for analysing small-sample reliability data. A counter-propagation network is used in classifying normal, uniform, exponential and Weibull distributions. A back-propagation network is used in the parameter estimation of a two-parameter Weibull distribution.
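For context on entry 9, the conventional graphical approach that the network is meant to replace can be sketched as Weibull probability-plot regression on a small simulated sample. This is a swapped-in classical technique (median ranks via Bernard's approximation plus least squares), not the paper's neural network; the function names are hypothetical.

```python
import math
import random


def simulate_failure_times(n, scale, shape, seed=0):
    """Draw a small synthetic Weibull sample (random.weibullvariate takes scale, then shape)."""
    rng = random.Random(seed)
    return [rng.weibullvariate(scale, shape) for _ in range(n)]


def weibull_median_rank_regression(failure_times):
    """Estimate (shape, scale) of a two-parameter Weibull by probability-plot regression.

    Median ranks use Bernard's approximation F_i = (i - 0.3) / (n + 0.4); on Weibull
    paper ln(-ln(1 - F)) = shape * ln(t) - shape * ln(scale) is a straight line.
    """
    t = sorted(failure_times)
    n = len(t)
    xs = [math.log(ti) for ti in t]
    ys = [math.log(-math.log(1.0 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    shape = sxy / sxx                  # slope of the Weibull plot
    scale = math.exp(mx - my / shape)  # from the intercept -shape * ln(scale)
    return shape, scale
```

Repeating the estimate on many simulated samples of, say, 8 failures shows how widely the graphical estimates scatter at small n, which is the weakness the abstract cites as motivation for the neural network approach.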

10.
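This paper proposes a chipless RFID tag recognition method based on an artificial neural network. First, CST is used to simulate tags at different angles, yielding the scattered fields in the horizontal and vertical directions; a neural network model of the recognition system is then built and used to recognize the tags. The advantage of the method is that accurate recognition results are obtained when only half of the sample data is used to train the neural network model. Simulation results show that the error of the neural-network-based recognition method is within 5°.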

11.
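To address single-mode feature extraction and low classification accuracy in speech emotion recognition, an emotion recognition method that fuses 3D and 1D features is proposed, improving the feature extraction algorithm. In the 3D network, which considers both spatial feature learning and the modeling of temporal dependencies, a Bilinear Convolutional Neural Network (BCNN) extracts spatial features, and a long short-term memory network (Sho...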
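The bilinear CNN named in entry 11 is commonly built on bilinear pooling: the outer product of two CNN streams' local feature vectors, averaged over locations, then signed square root and L2 normalization. A minimal pure-Python sketch of that pooling step, assuming the two streams are given as lists of per-location vectors; the paper's exact configuration is not stated in the truncated abstract, and the function names are hypothetical.

```python
import math


def bilinear_pool(feat_a, feat_b):
    """Average outer product of two local feature sets (L x C1 and L x C2), flattened."""
    L = len(feat_a)
    c1, c2 = len(feat_a[0]), len(feat_b[0])
    pooled = [0.0] * (c1 * c2)
    for a, b in zip(feat_a, feat_b):  # one pair of vectors per spatial location
        for i in range(c1):
            for j in range(c2):
                pooled[i * c2 + j] += a[i] * b[j] / L
    return pooled


def normalize(vec):
    """Signed square root, then L2 normalization, as commonly applied after bilinear pooling."""
    signed = [math.copysign(math.sqrt(abs(v)), v) for v in vec]
    norm = math.sqrt(sum(v * v for v in signed)) or 1.0
    return [v / norm for v in signed]
```

The pooled vector captures pairwise channel interactions (C1 * C2 values), which is what lets a bilinear model pick up fine-grained spectral patterns that a plain pooled feature would average away.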

12.
In the present paper, two models based on artificial neural networks and genetic programming have been developed for predicting the split tensile strength and percentage of water absorption of concretes containing Fe2O3 nanoparticles. To build these models, the networks were trained and tested using experimental results from 144 specimens produced with 16 different mixture proportions. The data used in the multilayer feed-forward neural network models and the input variables of the genetic programming models are arranged as eight input parameters: cement content, nanoparticle content, aggregate type, water content, amount of superplasticizer, type of curing medium, curing age, and test trial number. From these input parameters, both models predict the split tensile strength and percentage of water absorption of concretes containing Fe2O3 nanoparticles. The training and testing results show that both the neural network and the genetic programming models have strong potential for predicting these values. Although the neural network gives better predictions, genetic programming predicts reasonable values with a simpler method than the neural network.

13.
Abstract

Taking advantage of the four-tone structure in the pitch contour of Mandarin speech, we describe text-independent speaker identification using orthogonal pitch parameters. The slopes, means, and durations of the pitch contours of words in an utterance are used as recognition features. An identification rate of 85% is achieved using pitch-contour parameters alone. When the pitch-contour parameters are combined with vocal-tract parameters, the system outperforms systems that use either vocal-tract or pitch-contour parameters alone, reaching a recognition rate of 99.7%.
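The per-word features in entry 13 (mean, slope, duration of a pitch contour) can be computed as below. The assumption here is that "orthogonal pitch parameters" means projections onto the first two discrete orthogonal polynomials (a constant and a centered linear term), which give exactly the mean and the least-squares slope; the abstract does not confirm this, and the 10 ms hop is also assumed.

```python
def pitch_contour_features(f0, hop_ms=10.0):
    """Mean, slope and duration of one word's pitch contour.

    Mean and least-squares slope are the coefficients of the first two discrete
    orthogonal polynomials (constant and centered linear term) fitted to the contour.
    """
    n = len(f0)
    mean = sum(f0) / n
    t_mean = (n - 1) / 2.0
    denom = sum((t - t_mean) ** 2 for t in range(n))
    if denom:
        slope = sum((t - t_mean) * (v - mean) for t, v in enumerate(f0)) / denom
    else:
        slope = 0.0  # a single-frame contour has no defined slope
    return mean, slope, n * hop_ms
```

Because the two basis polynomials are orthogonal, the level (mean) and direction (slope) of each contour are estimated independently, so adding the slope term does not change the mean.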

14.
Zhu Min, Jiang Pengxu, Zhao Li. Technical Acoustics, 2021, 40(5): 645-651
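Speech emotion recognition is an active research area in human-computer interaction. However, because time-frequency correlations in speech have received little study, emotional information has not been mined in sufficient depth. To better exploit these time-frequency correlations, a fully convolutional recurrent neural network model is proposed that combines different models through parallel multiple inputs, extracting complementary features from two modules simultaneously. A Fully Convolutional Network (FCN) learns time-frequency correlation information from spectrogram features, while a Long Short-Term Memory (LSTM) network learns frame-level features of the speech to supply the temporal correlation information that the FCN misses. Finally, the features are fused and passed to a classifier. Tests on two public emotion datasets verify the superiority of the proposed algorithm.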
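The fusion stage in entry 14 can be sketched as follows: pool the FCN feature map to one value per channel, concatenate it with the LSTM branch's output vector, and apply a linear layer with softmax. Global average pooling, fusion by concatenation, and a linear softmax classifier are all assumptions; the abstract only says the features are fused and classified. The function names are hypothetical.

```python
import math


def global_average_pool(feature_map):
    """Collapse each channel (a time x frequency grid) of an FCN output to its mean."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]


def fuse(fcn_vector, lstm_vector):
    """Feature-level fusion by concatenating the two branch outputs."""
    return list(fcn_vector) + list(lstm_vector)


def softmax(logits):
    """Convert logits to probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights, biases):
    """Linear layer + softmax over emotion classes; `weights` has one row per class."""
    logits = [sum(w * x for w, x in zip(row, features)) + b for row, b in zip(weights, biases)]
    return softmax(logits)
```

Concatenation keeps the two branches' features side by side, so the classifier weights can learn how much to rely on spectro-temporal (FCN) versus frame-sequence (LSTM) evidence for each emotion class.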

15.
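Objective: Existing steel defect recognition algorithms make insufficient use of feature maps, have low recognition accuracy, and have large parameter counts. To address these problems and reduce system cost and memory usage, a densely connected convolutional spiking neural network (DCSNN) model for steel defect recognition, based on spiking neural networks, is proposed. Methods: First, convolutional encoding is used to extract features from and encode the input images. Second, dense connections are used to build the convolutional spiking neural network, enabling feature reuse and suppressing vanishing gradients; the network is trained with surrogate gradient descent. Finally, the model is tested on a strip steel dataset to recognize strip steel defects. Results: DCSNN achieves an accuracy of 98.61% on the test set with 5,000 parameters. Conclusion: The model performs well on steel surface defect recognition.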
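Two ingredients named in entry 15, spiking neurons trained with surrogate gradients and dense connectivity, can be sketched in a few lines. This assumes a leaky integrate-and-fire (LIF) neuron with hard reset, a fast-sigmoid surrogate derivative, and DenseNet-style concatenation; the paper's exact neuron model and surrogate are not specified in the abstract, and all names are hypothetical.

```python
def lif_run(inputs, beta=0.9, threshold=1.0):
    """Leaky integrate-and-fire neuron over time steps; returns the spike train.

    v[t] = beta * v[t-1] + I[t]; a spike fires when v >= threshold, then v resets to 0.
    """
    v = 0.0
    spikes = []
    for current in inputs:
        v = beta * v + current
        if v >= threshold:
            spikes.append(1)
            v = 0.0
        else:
            spikes.append(0)
    return spikes


def surrogate_grad(v, threshold=1.0, slope=10.0):
    """Fast-sigmoid stand-in for d(spike)/dv, whose true value is zero almost everywhere."""
    return 1.0 / (1.0 + slope * abs(v - threshold)) ** 2


def dense_block(x, layers):
    """Dense connectivity: each layer receives the concatenation of all earlier outputs."""
    features = list(x)
    for layer in layers:
        features = features + layer(features)
    return features
```

During backpropagation the step function's derivative is replaced by `surrogate_grad`, which peaks at the threshold and decays away from it, so gradients can flow through spiking layers.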

16.
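Speech enhancement is a critical front-end stage of speech signal processing and directly affects back-end tasks such as speech recognition. Neural-network-based single-channel speech separation has made great progress on the cocktail party problem, but its separation performance on complex speech mixtures is still unsatisfactory. To overcome the limitations of the single-channel setting, a multichannel structure is used to form superdirective beams in 4 directions, combined with a neural network algorithm to enhance the target speech from a specified direction. Simulation and experimental results show that the algorithm clearly outperforms superdirective beamforming and spectral subtraction on multiple evaluation metrics.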
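Spectral subtraction, one of the baselines in entry 16, can be sketched on per-bin power spectra. This assumes power-domain subtraction with an over-subtraction factor `alpha`, a spectral floor, and a noise estimate averaged over frames known to contain only noise; the paper's baseline settings are not given, and the function names are hypothetical.

```python
def estimate_noise(noise_frames):
    """Average power spectrum over frames assumed to contain noise only."""
    n = len(noise_frames)
    return [sum(frame[k] for frame in noise_frames) / n for k in range(len(noise_frames[0]))]


def spectral_subtraction(noisy_power, noise_power, alpha=1.0, floor=0.01):
    """Per-bin power subtraction, floored at a fraction of the noisy power.

    |S|^2 = max(|Y|^2 - alpha * |N|^2, floor * |Y|^2)
    """
    return [max(y - alpha * n, floor * y) for y, n in zip(noisy_power, noise_power)]
```

The floor prevents negative power estimates in bins where noise dominates; this per-bin subtraction of a stationary noise estimate is also the source of the "musical noise" that tends to make spectral subtraction a weak baseline in nonstationary, multi-talker mixtures.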

17.
Artificial neural networks are computer algorithms or computer programs derived in part from attempts to model the activity of nerve cells. They have been applied to pattern recognition, classification, and optimization problems in the physical and chemical sciences, as well as in other fields. We introduce the principles of the multilayer feedforward network that is among the most commonly used neural networks in practical problems. The relevance of neural network models for the applied statistician is considered using a time series prediction problem as an example. The multilayer feedforward neural network uses a nonlinear function of the predictors to obtain predictions for future time series values. We illustrate the considerations involved in specifying a neural network model and evaluate the accuracy of neural network models relative to the accuracy obtained using other computer-intensive, nonmodel-based techniques.
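The time-series setup in entry 17 (a nonlinear function of the predictors used to forecast future values) can be sketched as lagged inputs feeding a one-hidden-layer feedforward network. The lag order `p`, the tanh hidden units, the linear output, and the function names are illustrative assumptions; training is omitted and the weights are placeholders.

```python
import math


def make_lagged(series, p):
    """Turn a series into (inputs, target) pairs using the p previous values as predictors."""
    X = [series[t - p:t] for t in range(p, len(series))]
    y = [series[t] for t in range(p, len(series))]
    return X, y


def mlp_predict(x, W_hidden, b_hidden, w_out, b_out):
    """One-hidden-layer feedforward net: tanh hidden units, linear output."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out
```

Choosing `p` and the number of hidden units are exactly the model-specification decisions the abstract refers to; both trade flexibility against overfitting on short series.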

18.
Abstract

Neural networks can be a useful tool to analyse the oxidation and corrosion behaviour of materials at high temperature. Examples are given of the use of neural network models to analyse datasets of material behaviour after exposure to combustion, gasification and steam atmospheres. The use of networks to identify changes in mechanism, additional significant experimental parameters and the onset of spallation is demonstrated.

The limitations of neural network modelling are briefly discussed. Although they can be trained to fit any existing dataset, care must be taken in using the networks to predict a time sequence of events.

19.
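Because existing speech emotion recognition cannot model the key spatiotemporal dependencies, recognition rates are low. To address this, a speech emotion recognition algorithm based on self-attention spatiotemporal features is proposed, which uses a bilinear convolutional neural network, a long short-term memory network, and a multi-head attention mechanism to automatically learn the optimal spatiotemporal representation of the speech signal. First, the log-Mel (log...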
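The multi-head attention mechanism in entry 19 can be sketched as scaled dot-product self-attention applied separately to slices of the feature dimension and then concatenated. For brevity this sketch omits the learned query/key/value and output projections that a real multi-head layer uses, so it illustrates only the attention arithmetic, not the paper's model; the function names are hypothetical.

```python
import math


def softmax(row):
    """Convert scores to weights that sum to 1."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """For each query, a softmax(q . k / sqrt(d))-weighted sum of the value rows."""
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        out.append([sum(w[j] * V[j][c] for j in range(len(V))) for c in range(len(V[0]))])
    return out


def multi_head_self_attention(X, num_heads):
    """Split features into heads, self-attend within each, concatenate (no projections)."""
    d = len(X[0])
    assert d % num_heads == 0, "feature size must divide evenly into heads"
    h = d // num_heads
    heads = []
    for i in range(num_heads):
        Xi = [row[i * h:(i + 1) * h] for row in X]
        heads.append(scaled_dot_product_attention(Xi, Xi, Xi))
    return [sum((heads[i][t] for i in range(num_heads)), []) for t in range(len(X))]
```

Every output frame is a weighted mixture of all input frames, so dependencies between distant frames are modeled in a single step, which is the property the abstract relies on for capturing spatiotemporal dependencies.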

20.
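To handle mixed-model production of different wheel hub types in real manufacturing, a convolutional neural network algorithm for wheel hub recognition based on annular features is proposed. The annular hub in Cartesian coordinates is mapped to polar coordinates and normalized into a standard rectangle, which extracts the annular feature information of the hub image and reduces the influence of redundant features. An improved VGG architecture is designed that uses depthwise separable convolution to decouple the number of output channels from the kernel size, reducing computation without degrading network performance so that hub recognition can run in real time on the limited computing power available in production. The algorithm is evaluated for both effectiveness and real-time performance, and comparison experiments with models such as Inception V3, SVM, and KNN confirm that it can classify hubs adaptively in real time. Experiments show that the method achieves a processing accuracy above 99% on hub images and reduces the average processing time per image to 11.78 ms.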
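Two steps in entry 20 can be made concrete: unwrapping an annular region into a rectangle by sampling along radii and angles, and the parameter saving of depthwise separable convolution. This sketch assumes nearest-neighbor sampling, a known hub center, zero fill outside the image, and bias-free convolutions; the paper's resolution and interpolation choices are not given, and the function names are hypothetical.

```python
import math


def polar_unwrap(img, center, r_max, n_r, n_theta):
    """Resample a grayscale image (list of rows) onto an n_r x n_theta (radius, angle) grid."""
    cx, cy = center
    out = []
    for i in range(n_r):
        r = r_max * i / (n_r - 1) if n_r > 1 else 0.0
        row = []
        for j in range(n_theta):
            theta = 2 * math.pi * j / n_theta
            x = int(round(cx + r * math.cos(theta)))  # nearest-neighbor sampling
            y = int(round(cy + r * math.sin(theta)))
            inside = 0 <= y < len(img) and 0 <= x < len(img[0])
            row.append(img[y][x] if inside else 0)
        out.append(row)
    return out


def conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k per input channel, then a 1 x 1 pointwise conv (bias ignored)."""
    return k * k * c_in + c_in * c_out
```

After unwrapping, a rotation of the hub becomes a circular shift along the angle axis, which a CNN handles more easily. For k = 3, 64 input and 128 output channels, the separable layer needs 8,768 weights instead of 73,728, a ratio of 1/128 + 1/9, or about 0.12.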
