Similar literature
1.
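To address the limitations of existing time-domain and frequency-domain features in distinguishing emotional states, a nonlinear geometric feature extraction method based on phase space reconstruction theory is proposed. First, the phase space is reconstructed by analyzing the minimum delay time and embedding dimension of the emotional speech signal. Second, five nonlinear geometric features based on the contour traced by the trajectory are analyzed and extracted in the reconstructed phase space. Finally, in combination with prosodic, MFCC, and chaotic features, experiments are designed to verify how well the proposed features distinguish emotional states, and feature selection is used to obtain an optimal feature set that retains complete emotional information. Five emotions (happiness, sadness, neutral, anger, fear) from the Berlin German speech database serve as the experimental data, with a support vector machine as the recognition network. The experimental results show that, compared with prosodic, MFCC, and chaotic features, the proposed features not only characterize the emotional differences in speech signals effectively but also compensate for the shortcomings of existing features in describing emotional states.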
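The phase space reconstruction step in this abstract is usually done by time-delay embedding. The following is a minimal sketch under that assumption; the function name `delay_embed` and the toy input are illustrative and do not come from the paper, which does not give its implementation.

```python
def delay_embed(signal, dim, delay):
    """Return the delay vectors (x[i], x[i+delay], ..., x[i+(dim-1)*delay])."""
    if dim < 1 or delay < 1:
        raise ValueError("dim and delay must be positive integers")
    n_vectors = len(signal) - (dim - 1) * delay
    if n_vectors <= 0:
        raise ValueError("signal is too short for this dim and delay")
    # Each tuple is one point on the reconstructed trajectory.
    return [
        tuple(signal[i + k * delay] for k in range(dim))
        for i in range(n_vectors)
    ]


samples = [0, 1, 2, 3, 4, 5, 6]
trajectory = delay_embed(samples, dim=3, delay=2)
print(trajectory)  # [(0, 2, 4), (1, 3, 5), (2, 4, 6)]
```

Geometric features such as those the abstract mentions would then be computed on the returned list of trajectory points.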

2.
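The production of speech signals is a nonlinear process that also exhibits chaotic behavior. Compared with a linear model, a speech signal model built in the reconstructed phase space is closer to the actual system, and neural networks are a common tool for modeling nonlinear systems. Experimental results show that a predictor based on a radial basis function neural network built in the reconstructed phase space performs markedly better than a linear predictor.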

3.
Speech enhancement based on phase space reconstruction   Cited by: 1 (self-citations: 0, citations by others: 1)
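The phase space noise reduction method is applied to speech enhancement. Because the speech signal is concentrated in a limited subspace while random noise is spread across all components, noise is reduced by finding the signal subspace where the signal energy is concentrated and removing the redundant subspace where the noise energy is concentrated. In view of the characteristics of speech signals, this paper improves the neighbor selection method of the basic phase space noise reduction algorithm, which improves adaptability to signals with different signal-to-noise ratios without adding much computation. In addition, the improved algorithm is used to run phase space speech enhancement tests on individual Chinese phonemes and on continuous speech. The experimental results show that the phase space speech enhancement method with improved neighborhood selection can significantly improve the signal-to-noise ratio.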

4.
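To address the dynamic dependence between temporal and spatial features in speech emotion recognition, a nonlinear spatiotemporal feature fusion model based on the attention mechanism is proposed. The model uses an attention-based long short-term memory network to extract temporal features from the speech signal, uses a temporal convolutional network to extract spatial features, fuses the spatiotemporal features nonlinearly with an attention mechanism, and feeds the fused high-level features to a fully connected layer for speech emotion recognition. The method is evaluated on the IEMOCAP dataset. The results show that it can account for the intrinsic correlation between temporal and spatial features and that, compared with linear fusion, the network using attention-based nonlinear feature fusion effectively improves speech emotion recognition accuracy.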
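The abstract does not specify how the attention-based fusion is computed. As a rough illustration only, the sketch below fuses two feature vectors with softmax attention weights; the scoring rule (a dot product between a `query` vector and the tanh of each feature vector) and all names are assumptions, not the paper's model. The fusion is nonlinear because the weights depend on the inputs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(temporal, spatial, query):
    """Fuse two equal-length feature vectors with input-dependent weights.

    `query` stands in for a learned parameter vector (hypothetical).
    """
    if not (len(temporal) == len(spatial) == len(query)):
        raise ValueError("all vectors must have the same length")
    features = [temporal, spatial]
    # One scalar score per feature vector.
    scores = [
        sum(q * math.tanh(f) for q, f in zip(query, feat))
        for feat in features
    ]
    weights = softmax(scores)
    fused = [
        sum(w * feat[i] for w, feat in zip(weights, features))
        for i in range(len(query))
    ]
    return fused, weights


fused, weights = attention_fuse([2.0, 0.0], [0.0, 2.0], [0.0, 0.0])
print(fused, weights)  # [1.0, 1.0] [0.5, 0.5]
```

With a zero query the two scores tie, so the result reduces to a plain average; a trained query would shift weight toward whichever feature vector is more informative.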

5.
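Traditional voiceprint recognition methods are cumbersome and have low recognition rates, while the neural networks used in existing deep learning methods are not tailored to speech signals, which limits recognition accuracy. To address these problems, this paper proposes an end-to-end voiceprint recognition method based on nonlinear stacked bidirectional LSTM. First, Fbank features are extracted from the raw speech files as input to the network model. Then, because speech signals are continuous and strongly correlated across time, a bidirectional long short-term memory network is built to process the speech data and extract deep features; to further strengthen the nonlinear expressive power of the network, multiple stacked bidirectional LSTM layers and multiple nonlinear layers are used to extract deeper abstract features of the speech signal. Finally, an SGD optimizer is used to optimize training. Experimental results show that the proposed method makes full use of the features of speech sequence signals, has strong temporal coverage and nonlinear expressive power, yields a well-integrated model, and achieves better recognition than models such as GRU and LSTM.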
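Fbank features are log energies from a bank of filters spaced evenly on the mel scale. The sketch below only shows how such filter center frequencies can be placed, using one common mel formula (2595 · log10(1 + f/700)); the filter count and frequency range are illustrative assumptions, since the abstract does not state the paper's settings.

```python
import math


def hz_to_mel(freq_hz):
    """Convert Hz to mel using a common log-based convention."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_min_hz, f_max_hz):
    """Center frequencies (Hz) of n_filters filters evenly spaced in mel."""
    mel_lo = hz_to_mel(f_min_hz)
    mel_hi = hz_to_mel(f_max_hz)
    step = (mel_hi - mel_lo) / (n_filters + 1)
    return [mel_to_hz(mel_lo + step * (k + 1)) for k in range(n_filters)]


# Hypothetical configuration: 40 filters between 0 Hz and 8 kHz.
centers = mel_center_frequencies(40, 0.0, 8000.0)
print(round(centers[0], 1), round(centers[-1], 1))
```

The spacing between neighboring centers widens with frequency, which mirrors the coarser frequency resolution of hearing at high frequencies.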

6.
Nonlinear speech enhancement method based on mathematical morphology   Cited by: 1 (self-citations: 0, citations by others: 1)
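To better filter out broadband noise in speech signals and to move speech enhancement in a nonlinear direction, a new morphological filtering algorithm is designed by jointly considering the characteristics of speech and noise and selecting suitable structuring elements, yielding a nonlinear morphological filter for speech signal filtering and enhancement. Simulation experiments and data analysis demonstrate the feasibility of this morphology-based nonlinear speech enhancement method, which is relatively effective in particular at removing positive and negative impulse noise.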
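To make the idea of morphological filtering concrete, here is a minimal one-dimensional sketch with a flat structuring element: opening removes narrow positive spikes and closing removes narrow negative ones. The window width, the edge handling (windows are truncated at the borders), and the open-then-close ordering are assumptions for illustration, not the filter designed in the paper.

```python
def erode(x, width):
    """Flat erosion: sliding minimum over a centered window of odd width."""
    if width % 2 == 0:
        raise ValueError("width must be odd")
    half = width // 2
    return [min(x[max(0, i - half): i + half + 1]) for i in range(len(x))]


def dilate(x, width):
    """Flat dilation: sliding maximum over a centered window of odd width."""
    if width % 2 == 0:
        raise ValueError("width must be odd")
    half = width // 2
    return [max(x[max(0, i - half): i + half + 1]) for i in range(len(x))]


def opening(x, width):
    """Erosion followed by dilation; suppresses narrow positive impulses."""
    return dilate(erode(x, width), width)


def closing(x, width):
    """Dilation followed by erosion; suppresses narrow negative impulses."""
    return erode(dilate(x, width), width)


def open_close(x, width=3):
    """Opening then closing, which handles impulses of both signs."""
    return closing(opening(x, width), width)


noisy = [1, 1, 9, 1, 1, -7, 1, 1]
print(open_close(noisy))  # [1, 1, 1, 1, 1, 1, 1, 1]
```

A wider structuring element removes wider impulses but also flattens more of the genuine signal detail, which is why the choice of structuring element matters.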

7.
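Nonlinear feature parameters are extracted from musical note audio using phase space reconstruction. Part of the parameters is used as a training set to build a support vector machine (SVM) classifier, and the rest is used as a test set to check recognition performance. Because fixing the phase space reconstruction parameters causes the nonlinear information of some note signals to be lost, which lowers recognition accuracy, adaptive signal decomposition and PCA are introduced into the signal preprocessing stage, and a corresponding recognition pipeline is established.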

8.
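To address feature extraction in speech emotion recognition, a new feature extraction approach is proposed that uses deep belief networks (DBNs), a type of deep neural network (DNN), to extract emotional features from speech signals automatically. A 5-layer deep belief network is trained to extract speech emotion features, with multiple consecutive speech frames concatenated into one high-dimensional feature. The features produced by the trained deep belief network are fed to a nonlinear support vector machine (SVM) classifier, forming a multi-classifier speech emotion recognition system. Its recognition rate is 86.5%, which is 7% higher than traditional methods based on sentence-level features such as temporal structure, amplitude structure, and fundamental frequency structure.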
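Concatenating consecutive frames into one high-dimensional vector is often called frame splicing. The sketch below shows one way to do it; the symmetric context window and the repetition of edge frames are assumptions, since the abstract does not say how many frames are joined or how boundaries are handled.

```python
def splice_frames(frames, context):
    """Concatenate each frame with `context` neighbors on each side.

    Frames beyond the start or end are replaced by the nearest edge frame.
    """
    n = len(frames)
    spliced = []
    for t in range(n):
        vector = []
        for offset in range(-context, context + 1):
            index = min(max(t + offset, 0), n - 1)  # clamp to valid range
            vector.extend(frames[index])
        spliced.append(vector)
    return spliced


frames = [[1, 2], [3, 4], [5, 6]]
for row in splice_frames(frames, context=1):
    print(row)
# [1, 2, 1, 2, 3, 4]
# [1, 2, 3, 4, 5, 6]
# [3, 4, 5, 6, 5, 6]
```

Each output vector has (2 · context + 1) times the original frame dimension, which is what gives the network access to short-term temporal context.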

9.
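Extracting fault features from the vibration signals of natural gas compressors under fault conditions is a core technology in designing automated mechanical detection systems. A design method for an automated mechanical monitoring system based on nonlinear correlation dimension feature extraction is proposed. Building on fault diagnosis principles, time series analysis of the fault vibration signals is performed, a phase space reconstruction method for fault vibration signals is designed, and the key technique for computing the optimal time delay and embedding dimension for phase space reconstruction is improved. By extracting correlation dimension fault features, an automated monitoring system is designed on the Simulink platform. System experiments show that the algorithm and system significantly reduce the standard deviation of the correlation dimension features extracted under various fault states, clearly improve the clustering of the feature distribution, and detect various faults effectively. This achieves automated monitoring of mechanical equipment and has good practical engineering value in areas such as the design of automated fault diagnosis instruments.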
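The correlation dimension is commonly estimated from the correlation sum C(r), the fraction of point pairs in the reconstructed phase space that lie closer than r, as the slope of log C(r) against log r. The sketch below uses a simple two-radius slope, which is a simplification chosen for brevity; the paper's improved procedure is not described in the abstract, and the function names are illustrative.

```python
import math


def correlation_sum(points, radius):
    """Fraction of distinct point pairs closer than `radius`."""
    n = len(points)
    close_pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < radius:
                close_pairs += 1
    return 2.0 * close_pairs / (n * (n - 1))


def correlation_dimension(points, r_small, r_large):
    """Slope of log C(r) versus log r between two radii."""
    c_small = correlation_sum(points, r_small)
    c_large = correlation_sum(points, r_large)
    if c_small == 0.0 or c_large == 0.0:
        raise ValueError("radii too small: no pairs fall within range")
    return (math.log(c_large) - math.log(c_small)) / (
        math.log(r_large) - math.log(r_small)
    )


# Points along a straight line should give a dimension close to 1.
line = [(float(i), 0.0) for i in range(200)]
print(round(correlation_dimension(line, 10.5, 20.5), 2))
```

In practice the slope is fitted over a range of radii within the scaling region rather than taken from two points, and the pairwise loop becomes costly for long signals.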

10.
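Because existing speech intelligibility evaluation methods do not closely reflect how the human ear perceives speech, a speech intelligibility evaluation algorithm is proposed that predicts intelligibility from bispectral features based on human auditory characteristics (Gammatone-bspectral speech intelligibility metric, GBSIM). The algorithm exploits the ability of the bispectrum to detect nonlinear phase coupling in speech signals and to suppress Gaussian noise in non-Gaussian signals. A Gammatone filter bank, which can simulate the artificial cochlea model, splits the input speech signal into 32 auditory sub-bands; third-order statistics are used to estimate the bispectrum of the speech signal in each sub-band, and a single feature value is extracted to compute speech intelligibility. Validation results show that the method is sensitive to changes in signal distortion, that its results correlate highly with subjective evaluation, and that it performs better than traditional speech intelligibility evaluation algorithms.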
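For reference, the bispectrum used here is conventionally defined from the third-order cumulant. The expressions below are the standard textbook definitions for a zero-mean stationary signal x(n), not the specific estimator of the paper; the symbols c_3, B, X_k, and K are notation chosen for this note.

```latex
% Third-order cumulant of a zero-mean stationary signal
c_3(\tau_1, \tau_2) = E\left[ x(n)\, x(n+\tau_1)\, x(n+\tau_2) \right]

% Bispectrum: two-dimensional Fourier transform of the cumulant
B(\omega_1, \omega_2) = \sum_{\tau_1=-\infty}^{\infty} \sum_{\tau_2=-\infty}^{\infty}
    c_3(\tau_1, \tau_2)\, e^{-j(\omega_1 \tau_1 + \omega_2 \tau_2)}

% Direct estimate from K frames with Fourier transforms X_k
\hat{B}(\omega_1, \omega_2) = \frac{1}{K} \sum_{k=1}^{K}
    X_k(\omega_1)\, X_k(\omega_2)\, X_k^{*}(\omega_1 + \omega_2)
```

Because the third-order cumulants of a Gaussian process are zero, additive Gaussian noise contributes nothing to B in expectation, while a nonzero value at (ω1, ω2) indicates phase coupling among the components at ω1, ω2, and ω1 + ω2.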

11.
In this paper, a novel method for solving speech signal chaotic time series prediction models is proposed. A phase space is reconstructed based on the chaotic characteristics of the speech signal, and the genetic programming (GP) algorithm is introduced to solve the speech chaotic time series prediction models on the phase space with embedding dimension m and time delay τ. The chaotic time series models of the speech signal are then built. By standardizing these models and optimizing their parameters, a chaotic time series coding model of the speech signal with a certain generalization ability is obtained. Finally, the experimental results show that the proposed method obtains the speech signal chaotic time series prediction models much more effectively and achieves better coding accuracy than linear predictive coding (LPC) algorithms and a neural network model.

12.
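In the speaker space, speech features vary across sentences and over time. This variation is mainly caused by changes in the phonetic information and the speaker information contained in the speech data. If these two kinds of information can be separated from each other, robust speaker recognition becomes possible. Under the assumption that the space with large speaker variation is the "speech space" and the space with small speaker variation is the "speaker space", phonetic information and speaker information are separated using a subspace method, and speaker identification and speaker verification methods are proposed. Comparative experiments against traditional methods show that a robust speaker model can be built with a small amount of training data.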

13.
Investigating new, effective feature extraction methods for the speech signal is an important approach to improving the performance of automatic speech recognition (ASR) systems. Because the reconstructed phase space (RPS) is a suitable domain for capturing the true dynamics of a signal, in this paper we propose a new method for extracting features from the trajectory of the speech signal in the RPS. This method is based on modeling the speech trajectory using the multivariate autoregressive (MVAR) method. We then apply linear discriminant analysis (LDA) to simultaneously decorrelate and reduce the dimension of the final feature set. Experimental results show that an MVAR model of order 6 is appropriate for modeling the trajectory of speech signals in the RPS. Recognition experiments are conducted with an HMM-based continuous speech recognition system and a naive Bayes isolated phoneme classifier on the Persian FARSDAT and American English TIMIT corpora to compare the proposed features with some older RPS-based features and traditional spectral-based MFCC features.

14.

A novel method for building a Chinese speech time series prediction model is proposed. To reconstruct the phase space of the Chinese speech signal, the delay time and embedding dimension are calculated by the C–C method and the false nearest neighbor algorithm. The maximum Lyapunov exponent and correlation dimension of Chinese speech phonemes are calculated by the Wolf algorithm and the genetic programming algorithm. The numerical results show that Chinese speech signals exhibit nonlinear characteristics. Based on the RBF neural network analysis method and nonlinear characteristic parameters such as the delay time and embedding dimension, a nonlinear prediction model is designed. To further verify the performance of the designed prediction model, waveform comparison and four evaluation indexes are used. Compared with the linear prediction model and the back propagation neural network nonlinear prediction model, the prediction error of the RBF neural network nonlinear prediction model is significantly reduced, and the model achieves higher prediction accuracy.

15.
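To improve the ability of neural networks to process the time-domain waveform of speech signals directly, an end-to-end speech enhancement method based on RefineNet is proposed. A time-frequency analysis neural network is built to emulate the short-time Fourier transform used in speech signal processing, and the RefineNet network learns the feature mapping from noisy speech to clean speech. During model training, a multi-objective joint optimization strategy incorporates two speech enhancement evaluation metrics, short-time objective intelligibility (STOI) and source-to-distortion ratio (SDR), into the training loss function. In comparison experiments against representative traditional methods and end-to-end deep learning methods, the proposed algorithm achieves the best enhancement results on all objective evaluation metrics and shows better noise robustness under unseen noise and low signal-to-noise ratio conditions.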
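As a simple illustration of one of the two metrics folded into the loss, the sketch below computes a basic energy-ratio form of SDR between a clean reference and an estimate. This is a simplified definition assumed for illustration (it treats all residual error as distortion); the exact SDR variant and its weighting against STOI in the paper are not given in the abstract.

```python
import math


def sdr_db(clean, estimate):
    """Energy-ratio SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2)."""
    if len(clean) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_energy = sum(s * s for s in clean)
    error_energy = sum((s - e) ** 2 for s, e in zip(clean, estimate))
    if signal_energy == 0.0:
        raise ValueError("reference signal has zero energy")
    if error_energy == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)


clean = [1.0, -1.0, 1.0, -1.0]
estimate = [0.9, -0.9, 0.9, -0.9]
print(round(sdr_db(clean, estimate), 2))  # 20.0
```

A training loss would typically minimize the negative of this value, so that a higher SDR corresponds to a lower loss.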

17.
Most speech enhancement algorithms process the amplitudes of speech but leave the phase of the noisy speech unprocessed, because modifying it may cause undesired artifacts. Recently, short-time Fourier transform based single-channel speech enhancement algorithms have been developed that consider uncertain prior knowledge of the phase, obtained from phase reconstruction algorithms. The goal of this paper is to develop a joint minimum mean square error estimate of the complex speech coefficients given uncertain phase (CUP) information, using the Nakagami probability density function (PDF) and the gamma PDF as priors for the speech spectral amplitude and the generalized gamma PDF as the noise prior. Estimators of amplitudes given uncertain phase, which use the uncertain phase only for amplitude estimation and not for phase improvement, are also developed. Experimental results show that incorporating uncertain phase information improves the quality and intelligibility of speech. Novel phase-blind estimators are also developed using the Nakagami and gamma PDFs as speech priors and the generalized gamma PDF as the noise prior. Finally, all estimators that use uncertain prior phase information are compared, and the effect of the initial phase information on the enhancement process is analyzed with the novel estimators. To compare the derived estimators, speech signals uttered by male and female speakers are taken from the TIMIT database. The proposed CUP estimators outperform the existing algorithms in terms of the objective performance measures segmental signal-to-noise ratio, phase signal-to-noise ratio, perceptual evaluation of speech quality, and short-time objective intelligibility.

18.
Task decomposition with pattern distributor (PD) is a new task decomposition method for multilayered feedforward neural networks (NNs). A pattern distributor network is proposed that implements this new task decomposition method. We propose a theoretical model to analyze the performance of the pattern distributor network. A method named reduced pattern training (RPT) is also introduced, aiming to improve the performance of pattern distribution. Our analysis and the experimental results show that RPT improves the performance of the pattern distributor network significantly. The distributor module's classification accuracy dominates the whole network's performance. Two combination methods, namely, crosstalk-based combination and genetic-algorithm (GA)-based combination, are presented to find a suitable grouping for the distributor module. Experimental results show that this new method can reduce training time and improve network generalization accuracy when compared to a conventional method such as constructive backpropagation or a task decomposition method such as output parallelism (OP).

19.
In this paper we propose a new method for utilising phase information by complementing it with traditional magnitude-only spectral subtraction speech enhancement through complex spectrum subtraction (CSS). The proposed approach has the following advantages over traditional magnitude-only spectral subtraction: (a) it introduces complementary information to the enhancement algorithm; (b) it reduces the total number of algorithmic parameters; and (c) it is designed for improving clean speech magnitude spectra and is therefore suitable for both automatic speech recognition (ASR) and speech perception applications. Oracle-based ASR experiments verify this approach, showing an average relative word accuracy improvement of 20% when accurate estimates of the phase spectrum are available. Based on sinusoidal analysis and assuming stationarity between observations (which is shown to be better approximated as the frame rate is increased), this paper also proposes a novel method for acquiring the phase information called Phase Estimation via Delay Projection (PEDEP). Further oracle ASR experiments validate the potential of the proposed PEDEP technique in ideal conditions. A realistic implementation of CSS with PEDEP shows performance comparable to state-of-the-art spectral subtraction techniques in environments with signal-to-noise ratios of 15–20 dB. These results clearly demonstrate the potential for using phase spectra in spectral subtractive enhancement applications, and at the same time highlight the need for deriving more accurate phase estimates in a wider range of noise conditions.
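The difference between magnitude-only and complex spectrum subtraction can be seen on a single frequency bin. The sketch below is an illustration under an oracle assumption, namely that the noise magnitude and phase are known exactly; it does not implement PEDEP, and the function names are hypothetical.

```python
import cmath
import math


def magnitude_subtraction(noisy_bin, noise_mag):
    """Subtract the noise magnitude and keep the noisy phase."""
    magnitude = max(abs(noisy_bin) - noise_mag, 0.0)
    return cmath.rect(magnitude, cmath.phase(noisy_bin))


def complex_subtraction(noisy_bin, noise_mag, noise_phase):
    """Subtract the full complex noise estimate (magnitude and phase)."""
    return noisy_bin - cmath.rect(noise_mag, noise_phase)


speech = 3 + 0j                # clean speech coefficient (toy value)
noise = cmath.rect(4.0, math.pi / 2)   # noise coefficient, 90 degrees out of phase
noisy = speech + noise

mag_only = magnitude_subtraction(noisy, abs(noise))
complex_est = complex_subtraction(noisy, abs(noise), math.pi / 2)

print(round(abs(mag_only - speech), 3))     # 2.53
print(round(abs(complex_est - speech), 3))  # 0.0
```

With an exact noise phase the complex subtraction recovers the clean coefficient, whereas the magnitude-only version is left with a sizeable error; with an inaccurate phase estimate that advantage shrinks, which is the dependence on phase accuracy the abstract points out.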
