Similar Documents
16 similar documents found (search time: 125 ms)
1.
Spectrogram Feature Extraction Based on PCNN with Application to Speaker Recognition   (Cited by: 8; self-citations: 1; citations by others: 7)
This paper is the first to propose applying the Pulse Coupled Neural Network (PCNN), an artificial neural network grounded in biological vision, to spectrogram feature extraction for speaker recognition. The spectrogram is fed into the PCNN, and the time series of the output images, together with its entropy sequence, are taken as the speaker's speech features; the invariance of these features is then exploited to perform speaker recognition. Experimental results show that the method identifies speakers quickly and effectively. By introducing the PCNN into speech recognition research, the paper opens a new area that joins two major branches of signal processing, speech signal processing and image signal processing, and has practical significance for both the theoretical study and the application of the PCNN.
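The abstract above describes feeding a spectrogram into a PCNN and using the firing-count time series and its entropy sequence as features. A minimal sketch of that idea, using a commonly cited simplified PCNN model (function names, kernel weights, and parameter values here are illustrative assumptions, not the paper's actual settings):

```python
import numpy as np

def convolve2d_same(x, k):
    """Minimal 'same'-size 2-D convolution with zero padding (no SciPy needed)."""
    p = k.shape[0] // 2
    xp = np.pad(x, p)
    out = np.zeros_like(x)
    for i in range(k.shape[0]):
        for j in range(k.shape[1]):
            out += k[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out

def spcnn_signature(image, iterations=20, beta=0.2, v_theta=20.0, alpha_theta=0.3):
    """Simplified PCNN: return the firing-count time series G[t]
    (number of neurons firing at iteration t) for a normalized image."""
    F = image.astype(float)            # feeding input = pixel intensity
    theta = np.ones_like(F)            # dynamic threshold
    Y = np.zeros_like(F)               # binary firing output
    kernel = np.array([[0.5, 1.0, 0.5],
                       [1.0, 0.0, 1.0],
                       [0.5, 1.0, 0.5]])
    series = []
    for _ in range(iterations):
        L = convolve2d_same(Y, kernel)          # linking input from neighbours
        U = F * (1.0 + beta * L)                # internal activity (modulation)
        Y = (U > theta).astype(float)           # fire when activity beats threshold
        theta = theta * np.exp(-alpha_theta) + v_theta * Y  # decay + refractory reset
        series.append(int(Y.sum()))             # time signature entry
    return np.array(series)

def firing_entropy(series, n_pixels):
    """Binary-entropy sequence of the firing ratio at each iteration."""
    p1 = np.clip(np.array(series, float) / n_pixels, 1e-12, 1 - 1e-12)
    return -(p1 * np.log2(p1) + (1 - p1) * np.log2(1 - p1))
```

The resulting fixed-length vectors (`series`, and its entropy sequence) can then be compared between speakers with any distance-based classifier.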

2.
The Pulse Coupled Neural Network (PCNN) is a new generation of artificial neural network derived from biological vision, with broad application prospects in digital image processing and artificial intelligence. Building on a study of the PCNN model and its operating characteristics, this paper proposes a method for extracting facial features. First, a wavelet transform extracts the low-frequency features of the face image, reducing its dimensionality; a simplified PCNN then extracts the time series of the face image reconstructed from the low-frequency wavelet coefficients, which serves as the feature sequence for face recognition. Finally, recognition is performed using the time series and the Euclidean distance. Experiments on the ORL face database demonstrate the effectiveness of the method.

3.
In a human-machine speech interaction system, the machine should not only be able to understand human speech but also recognize the speaker's emotion. This paper proposes an improved method for sequence classification and recognition based on Gaussian mixture models (GMM) and applies it to speech emotion recognition, introducing a technique for balancing the order of observations. Experimental results show that the new method effectively improves the accuracy of speech emotion recognition.
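The baseline behind the abstract above, GMM-based emotion classification, can be sketched as one GMM per emotion scored by total log-likelihood. This is a generic sketch (the paper's observation-order balancing step is not reproduced; function names, component count, and the use of scikit-learn are assumptions):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_emotion_gmms(features_by_emotion, n_components=4, seed=0):
    """Fit one diagonal-covariance GMM per emotion on its pooled
    frame-level feature vectors (dict: emotion -> (n_frames, n_dims) array)."""
    models = {}
    for emotion, frames in features_by_emotion.items():
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag", random_state=seed)
        models[emotion] = gmm.fit(frames)
    return models

def classify_utterance(models, frames):
    """Score an utterance (a sequence of frames) under each emotion model
    and return the emotion with the highest total log-likelihood."""
    scores = {e: m.score_samples(frames).sum() for e, m in models.items()}
    return max(scores, key=scores.get)
```

Summing per-frame log-likelihoods treats frames as conditionally independent given the emotion, which is exactly the assumption that sequence-aware refinements such as the paper's try to relax.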

4.
Isolated-Word Speech Recognition Based on PCNN and RBF Networks   (Cited by: 1; self-citations: 0; citations by others: 1)
After surveying the state of isolated-word speech recognition, this paper proposes a new recognition method that combines a simplified pulse coupled neural network (PCNN) with a radial basis function (RBF) neural network. Starting from the "visual" representation of the speech signal, the spectrogram, a PCNN produces its time-series signature as the feature parameters, which are then passed to a conventional RBF-based recognizer to perform isolated-word recognition. Simulation results show that, compared with other methods, this approach achieves a relatively high recognition rate.

5.
李建锋 《计算机工程与设计》2011,32(4):1398-1400,1405
A colour face recognition method modelled on biological vision is proposed: a visual-cross-cortex time-series face feature extraction algorithm. The colour face image is converted from RGB space to HSI space, a time series is extracted from each HSI component, and the component series are concatenated to form the overall face feature. This sequence discriminates well between different faces while remaining consistent across different views of the same face. Recognition uses the L1-norm distance as the decision criterion. Compared with PCA, ICA, and the basic PCNN, experimental results show that the proposed method is markedly faster, with a recognition accuracy of 86.73%.
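The first step the abstract above describes, converting the colour image from RGB to HSI, follows standard geometric formulas. A small self-contained sketch (the per-channel time-series extraction that follows it would reuse a PCNN-style signature and is not repeated here):

```python
import numpy as np

def rgb_to_hsi(rgb):
    """Convert an RGB image (floats in [0, 1], shape HxWx3) to H, S, I
    channels using the standard geometric HSI formulas; H is normalized
    to [0, 1]."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    eps = 1e-8
    i = (r + g + b) / 3.0                                  # intensity: mean of channels
    s = 1.0 - np.minimum(np.minimum(r, g), b) / (i + eps)  # saturation
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + eps
    theta = np.arccos(np.clip(num / den, -1.0, 1.0))       # hue angle
    h = np.where(b <= g, theta, 2 * np.pi - theta) / (2 * np.pi)
    return h, s, i
```

Each of the three returned channels would then be processed independently and the resulting feature sequences concatenated, as the abstract describes.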

6.
The spectrogram is a time-frequency representation of the speech signal and carries rich information. Feeding the spectrogram into a pulse coupled neural network (PCNN) yields a speech feature vector. The traditional speech feature uses the firing counts over 50 PCNN iterations. This paper proposes a new speech feature parameter based on the firing positions of the PCNN neurons. Speaker recognition experiments show that the new feature reflects the characteristics of the speaker's speech better than the traditional one and achieves better recognition results.
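One way to read "firing position" in the abstract above is the iteration at which each neuron first fires as the threshold decays. The sketch below uses a coupling-free PCNN for brevity; the histogram-over-positions feature and all parameter values are illustrative assumptions, not the paper's definition:

```python
import numpy as np

def first_firing_map(image, iterations=50, alpha_theta=0.2):
    """Coupling-free PCNN sketch: for each pixel, record the iteration at
    which it first fires under an exponentially decaying threshold.
    Returns -1 for pixels that never fire."""
    theta = np.ones_like(image, dtype=float)
    fired_at = np.full(image.shape, -1, dtype=int)
    for t in range(iterations):
        newly = (image > theta) & (fired_at < 0)   # fire once, then stay silent
        fired_at[newly] = t
        theta *= np.exp(-alpha_theta)              # threshold decay
    return fired_at

def firing_position_feature(fired_at, iterations=50):
    """Normalized histogram over first-firing positions (never-fired pixels
    are ignored): one possible position-based feature vector."""
    counts = np.bincount(fired_at[fired_at >= 0].ravel(), minlength=iterations)
    total = counts.sum()
    return counts / total if total else counts.astype(float)
```

Because bright pixels cross the decaying threshold earlier, the position histogram encodes the intensity distribution of the spectrogram in a way a bare firing count cannot.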

7.
Multi-sensor Image Fusion Based on a Modified PCNN   (Cited by: 4; self-citations: 0; citations by others: 4)
As an important branch and research focus of information fusion, multi-sensor image fusion is widely used in machine vision, medical diagnosis, military remote sensing, and other fields. To improve multi-sensor image fusion, the pulse coupled neural network (PCNN), which has unique advantages in image segmentation and target recognition, is introduced into this area, and a multi-source image fusion method based on a modified PCNN is proposed. Building on region segmentation, region features are first extracted and then used to guide the fusion process. Starting from the saliency of the target region relative to the background, a new feature reflecting the prominence of the target region is proposed, and, to address the difficulty that traditional PCNN parameters cannot be set automatically, an automatic parameter-setting scheme based on the modified PCNN is presented. Experimental results show that the method outperforms fusion algorithms based on multi-resolution analysis in both subjective visual quality and objective evaluation metrics, and that it overcomes the blurring and noise sensitivity of traditional pixel-level fusion, making it especially suitable when the images cannot be strictly registered. This is of value for extending both the theoretical study and the practical application of the PCNN.

8.
A speech recognition method based on an improved pulse coupled neural network (IPCNN) is proposed. The IPCNN first rapidly extracts image features from the speech spectrogram, and a probabilistic neural network (PNN) then assists in recognizing the speech. Training speech samples are used to build the recognition library and an integrated recognition system. Experimental results show that, compared with using the PCNN or the PNN alone, the method raises the recognition rate by 22.7% and 39.4% respectively, reaching a 92% recognition rate.

9.
王忠民  刘戈  宋辉 《计算机工程》2019,45(8):248-254
In speech emotion recognition, extracting Mel-frequency cepstral coefficients (MFCC) loses spectral-feature information, lowering recognition accuracy. A speech emotion recognition method combining MFCC and spectrogram features is therefore proposed. MFCC features are extracted from the audio signal, the signal is converted into a spectrogram, and a convolutional neural network extracts image features from it. A multiple-kernel learning algorithm then fuses the audio features, and the resulting kernel function is used with a support vector machine for emotion classification. Experiments on two speech emotion datasets show that, compared with single-feature classifiers, the method achieves speech emotion recognition accuracy of up to 96%.
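The fusion step in the abstract above can be sketched as a combined kernel over two feature views fed to a precomputed-kernel SVM. Note this is a simplification: true multiple-kernel learning optimizes the combination weight, whereas the sketch below fixes it (the weight, gamma, and function name are assumptions):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

def combined_kernel(Xa, Xb, Ya, Yb, w=0.5, gamma=0.5):
    """Fixed-weight convex combination of two RBF kernels, one per feature
    view (e.g. an MFCC vector view and a CNN spectrogram-feature view).
    Entry [i, j] compares sample i of (Xa, Xb) with sample j of (Ya, Yb)."""
    return (w * rbf_kernel(Xa, Ya, gamma=gamma)
            + (1.0 - w) * rbf_kernel(Xb, Yb, gamma=gamma))
```

Usage follows the standard precomputed-kernel pattern: train with `SVC(kernel="precomputed").fit(combined_kernel(A, B, A, B), y)`, then predict on the cross-kernel between test and training samples.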

10.
In speech emotion recognition research, most existing deep learning methods do not model features in both the time and frequency domains, and they suffer from long training times and limited recognition accuracy. The spectrogram is a special image, obtained by transforming the speech signal, that spans both domains. To fully extract the emotional features in both the time and frequency dimensions of the spectrogram, a speech emotion recognition model based on parameter transfer and a convolutional recurrent neural network is proposed. The model takes the spectrogram as network input, introduces the AlexNet model and transfers its pretrained convolutional-layer weights, then reshapes the feature maps produced by the convolutional network and feeds them into an LSTM (Long Short-Term Memory) network for training. Experimental results show that the proposed method speeds up network training and improves emotion recognition accuracy.

11.
In this work, spectral features extracted from sub-syllabic regions and pitch synchronous analysis are proposed for speech emotion recognition. Linear prediction cepstral coefficients, mel frequency cepstral coefficients and features extracted from high-amplitude regions of the spectrum are used to represent emotion-specific spectral information. These features are extracted from the consonant, vowel and transition regions of each syllable to study the contribution of these regions toward the recognition of emotions. The consonant, vowel and transition regions are determined using vowel onset points. Spectral features extracted from each pitch cycle are also used to recognize the emotions present in speech. The emotions used in this study are: anger, fear, happiness, neutral and sadness. The emotion recognition performance using sub-syllabic speech segments is compared with the results of the conventional block processing approach, where the entire speech signal is processed frame by frame. The proposed emotion-specific features are evaluated using a simulated emotion speech corpus, IITKGP-SESC (Indian Institute of Technology, KharaGPur-Simulated Emotion Speech Corpus). The emotion recognition results obtained using IITKGP-SESC are compared with the results on the Berlin emotion speech corpus. Emotion recognition systems are developed using Gaussian mixture models and auto-associative neural networks. The purpose of this study is to explore sub-syllabic regions to identify the emotions embedded in a speech signal and, if possible, to avoid processing the entire speech signal for emotion recognition without serious compromise in performance.

12.
This work explores how pitch features vary across different emotional states. By analysing speech signals containing anger, happiness, and sadness, it summarises the variation of the fundamental frequency of emotional speech, defines 12-dimensional basic and extended pitch features for emotion recognition, and applies a Gaussian mixture model to recognition. Recognition experiments were conducted and good results were obtained.

13.
The recognition of emotion in human speech has gained increasing attention in recent years due to the wide variety of applications that benefit from such technology. Detecting emotion from speech can be viewed as a classification task: it consists of assigning, out of a fixed set, an emotion category, e.g. happiness or anger, to a speech utterance. In this paper, we have tackled two emotions, namely happiness and anger. The parameters extracted from the speech signal depend on the speaker, the spoken word, and the emotion. To detect the emotion, we keep the spoken utterance and the speaker constant and change only the emotion. Different features are extracted to identify the parameters responsible for emotion, and the wavelet packet transform (WPT) is found to be emotion specific. We performed experiments using three methods. The first method uses the WPT and compares the number of coefficients greater than a threshold in different bands. The second method computes energy ratios of different bands using the WPT and compares them across bands. The third method is a conventional method using MFCC. The results obtained using the WPT for the angry, happy and neutral modes are 85%, 65% and 80% respectively, compared with 75%, 45% and 60% respectively using MFCC. Based on the WPT features, a model is proposed for emotion conversion, namely neutral-to-angry and neutral-to-happy.
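The band energy ratios described in the abstract above can be sketched with a full Haar wavelet packet decomposition, where every band is recursively split and each terminal band's energy is normalized by the total. This uses the Haar (db1) wavelet for simplicity; the paper's wavelet choice and decomposition depth may differ:

```python
import numpy as np

def haar_step(x):
    """One Haar analysis step: return (approximation, detail) at half length."""
    x = x[: len(x) // 2 * 2]                 # drop a trailing odd sample
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return a, d

def wpt_bands(x, level=2):
    """Full wavelet *packet* decomposition: unlike the plain wavelet
    transform, both the approximation and the detail of every band are
    split again, giving 2**level terminal bands."""
    bands = [np.asarray(x, dtype=float)]
    for _ in range(level):
        bands = [half for b in bands for half in haar_step(b)]
    return bands

def band_energy_ratios(x, level=2):
    """Energy of each terminal band divided by the total energy: the kind
    of per-band energy-ratio feature compared across emotions."""
    energies = np.array([np.sum(b ** 2) for b in wpt_bands(x, level)])
    return energies / energies.sum()
```

Because the Haar step is orthonormal, the band energies sum to the signal energy, so the ratios form a proper distribution over frequency bands.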

14.
Speech emotion recognition has been one of the interesting issues in speech processing over the last few decades. Modelling of the emotion recognition process serves to understand as well as assess the performance of the system. This paper compares two different models for speech emotion recognition using vocal tract features namely, the first four formants and their respective bandwidths. The first model is based on a decision tree and the second one employs logistic regression. Whereas the decision tree models are based on machine learning, regression models have a strong statistical basis. The logistic regression models and the decision tree models developed in this work for several cases of binary classifications were validated by speech emotion recognition experiments conducted on a Malayalam emotional speech database of 2800 speech files, collected from ten speakers. The models are not only simple, but also meaningful since they indicate the contribution of each predictor. The experimental results indicate that speech emotion recognition using formants and bandwidths was better modelled using decision trees, which gave higher emotion recognition accuracies compared to logistic regression. The highest accuracy obtained using decision tree was 93.63%, for the classification of positive valence emotional speech as surprised or happy, using seven features. When using logistic regression for the same binary classification, the highest accuracy obtained was 73%, with eight features.

15.
A speech emotion recognition method based on the fusion of sample entropy and Mel-frequency cepstral coefficients (MFCC) is proposed. Support vector machines process the sample-entropy statistics and the MFCC separately, computing the probability that an utterance belongs to each of four emotions: happiness, anger, boredom, and fear. The emotion probabilities are then fused with the sum rule and the product rule to obtain the recognition result. Simulation results show that the method achieves a relatively high recognition rate.
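The sum-rule and product-rule fusion mentioned in the abstract above are standard classifier-combination rules and can be sketched directly (the function name and the renormalization step are illustrative choices):

```python
import numpy as np

def fuse_probabilities(p_entropy, p_mfcc, rule="sum"):
    """Combine per-class probabilities from two classifiers (e.g. one on
    sample-entropy features, one on MFCC) with the sum or product rule,
    renormalize, and return the fused distribution plus the argmax class."""
    p1 = np.asarray(p_entropy, dtype=float)
    p2 = np.asarray(p_mfcc, dtype=float)
    fused = p1 + p2 if rule == "sum" else p1 * p2
    fused = fused / fused.sum()               # renormalize to a distribution
    return fused, int(np.argmax(fused))
```

The sum rule is known to be robust to estimation noise in either classifier, while the product rule sharpens agreement but is sensitive to any near-zero probability.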

16.
PCNN is a novel neural network model that simulates the synchronous firing phenomenon in the mammalian visual cortex. It has been widely used in the fields of image processing and pattern recognition. However, some limitations remain when it is applied to image processing problems, such as trial-and-error parameter settings and manual selection of the final results. This paper studies a simplified model of PCNN (S-PCNN) and applies it to image segmentation. The main contributions of this paper are: (1) a new method based on the simplified PCNN model is proposed to segment images automatically; (2) the parameter settings are studied to ensure that the threshold decay of the S-PCNN is adaptively adjusted according to the overall characteristics of the image; (3) based on the time series of the S-PCNN, a simple selection criterion for the final results is presented to improve the efficiency of the proposed method; (4) simulations are carried out to illustrate the performance of the proposed method.
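The shape of an S-PCNN segmenter can be sketched as follows. Note the automatic selection rule used here (pick the iteration whose binary map maximizes foreground/background entropy) is a generic stand-in, not the paper's actual time-series criterion, and the decay constant is an assumed value rather than the paper's adaptive setting:

```python
import numpy as np

def spcnn_segment(image, iterations=30, alpha=0.15):
    """Simplified-PCNN segmentation sketch: as a global threshold decays
    exponentially, each iteration yields a binary firing map; the map whose
    foreground proportion maximizes binary entropy is returned as the
    segmentation (a simple automatic selection rule)."""
    img = (image - image.min()) / (np.ptp(image) + 1e-12)  # normalize to [0, 1]
    theta = 1.0
    best_map, best_h = np.zeros(img.shape, dtype=bool), -1.0
    for _ in range(iterations):
        theta *= np.exp(-alpha)          # exponential threshold decay
        y = img > theta                  # neurons above threshold fire
        p = y.mean()                     # foreground proportion
        if 0.0 < p < 1.0:
            h = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
            if h > best_h:               # keep the most balanced split
                best_h, best_map = h, y
    return best_map
```

Tying `alpha` to image statistics instead of fixing it, as contribution (2) above describes, is what removes the trial-and-error parameter tuning.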
