Similar Literature
 20 similar documents found (search time: 125 ms)
1.
A new feature for continuous speech emotion recognition is proposed: the normalized amplitude quotient (NAQ), a time-domain parameter of the glottal excitation in vowel segments. The method first estimates the glottal flow with Iterative Adaptive Inverse Filtering (IAIF), then uses the NAQ value to characterize the opening and closing of the glottis. Using speech of six emotions from the eNTERFACE'05 audio-visual emotional speech database as experimental data, the distribution and emotion-discriminating ability of vowel-segment NAQ values are analyzed with histograms and box plots; with the mean, variance, maximum, and minimum of vowel-segment NAQ values as features, speech emotion recognition experiments are carried out with Gaussian Mixture Models (GMM) and the k-nearest-neighbor method. The results show that the NAQ feature has strong discriminating power for speech emotion.
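A minimal NumPy sketch of the NAQ definition used above, assuming the glottal flow for a single cycle has already been estimated (e.g. by IAIF); the raised-cosine test pulse is synthetic illustration data, not from the paper:

```python
import numpy as np

def naq(glottal_flow: np.ndarray, fs: float, f0: float) -> float:
    """Normalized amplitude quotient for one glottal cycle:
    NAQ = f_ac / (d_peak * T), where f_ac is the peak-to-peak (AC)
    amplitude of the glottal flow pulse, d_peak is the magnitude of
    the steepest negative slope of the flow derivative, and T = 1/f0
    is the fundamental period."""
    f_ac = glottal_flow.max() - glottal_flow.min()   # AC flow amplitude
    derivative = np.diff(glottal_flow) * fs          # flow derivative
    d_peak = abs(derivative.min())                   # negative peak of derivative
    return f_ac / (d_peak * (1.0 / f0))

# Synthetic one-cycle example: a raised-cosine "flow pulse"
fs, f0 = 8000.0, 100.0
t = np.arange(int(fs / f0)) / fs
pulse = 0.5 * (1 - np.cos(2 * np.pi * f0 * t))
print(round(naq(pulse, fs, f0), 3))
```

For this analytic pulse the NAQ is close to 1/pi, which makes the sketch easy to sanity-check against the closed-form derivative.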

2.
To make effective use of the local features of emotion words in speech, a speech emotion recognition method is proposed that fuses local emotion-word features with global utterance-level features. Relying on an acoustic feature library built from a speech emotion lexicon, the method extracts local features from the utterance, such as whether it contains emotion words and the emotion-word density, fuses them with global acoustic features, and then models and recognizes speech emotion with machine learning algorithms. Comparative experiments show that fusing local emotion-word features with global features achieves better results, and that introducing local features effectively improves speech emotion recognition accuracy.
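A toy sketch of the two local features named above (presence of emotion words and emotion-word density); the mini-lexicon is a hypothetical stand-in for the paper's emotion-word lexicon:

```python
# Hypothetical mini-lexicon; the paper relies on a real emotion lexicon
# with associated acoustic features.
EMOTION_LEXICON = {"happy", "furious", "wonderful", "sad", "terrible"}

def local_emotion_word_features(tokens):
    """Utterance-level local features from an emotion lexicon:
    (1) a flag for whether the utterance contains any emotion word, and
    (2) emotion-word density = emotion words / total words."""
    hits = sum(1 for w in tokens if w.lower() in EMOTION_LEXICON)
    contains = 1 if hits > 0 else 0
    density = hits / len(tokens) if tokens else 0.0
    return contains, density

print(local_emotion_word_features("what a wonderful and happy day".split()))
```

In the paper these local features are concatenated with global acoustic features before classification.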

3.
Since the emotional states of adjacent utterances are correlated, an emotion inference algorithm based on emotional context is proposed. The algorithm first recognizes the emotional state of the utterance under analysis using conventional acoustic emotion features and contextual emotion features separately, then fuses the two recognition results using an emotion interaction matrix and the confidence of each result. On this basis, contextual inference rules for speech emotion are established, which adjust the emotional state of the utterance under analysis according to the emotional states of its neighboring utterances, yielding the final emotion category. Experiments on a self-recorded database containing six basic emotions show that, compared with a method using acoustic features alone, the proposed method improves the average recognition rate by 12.17%.
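A minimal sketch of the confidence-weighted fusion step described above, assuming each recognizer outputs a posterior over emotion classes; the uniform treatment of the interaction matrix here is an illustrative placeholder, not the paper's learned matrix:

```python
import numpy as np

def fuse(p_acoustic, p_context, conf_a, conf_c, interaction=None):
    """Confidence-weighted fusion of two emotion posteriors.

    Weight each recognizer's posterior by its confidence, optionally
    redistribute probability mass with an emotion interaction matrix
    (rows: previous emotion, cols: current), then renormalize."""
    p = conf_a * np.asarray(p_acoustic) + conf_c * np.asarray(p_context)
    if interaction is not None:
        p = p @ interaction          # contextual adjustment
    return p / p.sum()

p = fuse([0.7, 0.2, 0.1], [0.4, 0.5, 0.1], conf_a=0.8, conf_c=0.6)
print(p.round(3))
```

The fused posterior stays a valid distribution, and the more confident recognizer pulls the decision toward its own top class.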

4.
李超  严馨 《计算机应用研究》2021,38(11):3283-3288
To address the slow progress of Khmer sentence-level sentiment analysis caused by scarce annotated data and corpora, a sentence-level sentiment polarity classification method for Khmer based on a deep semi-supervised CNN (convolutional neural network) is proposed. The method uses a split-convolution CNN that fuses lexicon embeddings, exploiting the limited existing Khmer sentiment lexicon resources to improve sentence-level sentiment classification. First, Khmer sentence word embeddings and lexicon embeddings are constructed; the two embeddings are convolved with different kernels, injecting the existing sentiment lexicon information into the CNN model; after max-over-time pooling, the two maximum output features are concatenated and fed into a fully connected layer. The proposed deep network is then trained with a semi-supervised method, the temporal ensembling model, on both labeled and unlabeled corpora, reducing the need for annotated data and further improving sentiment classification accuracy. The results show that, with the same amount of manually labeled data, semi-supervised training with temporal ensembling improves accuracy on Khmer sentence-level sentiment classification by 3.89% over the supervised method.

5.
Research on NAQ-Based Speech Emotion Recognition   (Cited by: 1; self-citations: 0; citations by others: 1)
The glottal excitation is estimated with an iterative adaptive inverse filter, and the normalized amplitude quotient (NAQ), a time-domain parameter of the glottal excitation, is used as the feature. For continuous speech of six different emotions, the F-ratio criterion is first used to assess the feature's power to discriminate emotions, and Gaussian mixture models are then used to model and recognize speech emotion. Using speech from the eNTERFACE'05 emotional speech database, recognition results with utterance-level NAQ values as features, with vowel-segment NAQ values as features, and from human perception are compared. Experiments show that vowel-segment NAQ values are a discriminative speech emotion feature.
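The F-ratio criterion mentioned above can be sketched as the variance of the class means over the mean within-class variance; the toy feature values here are invented for illustration:

```python
import numpy as np

def f_ratio(groups):
    """F-ratio of a scalar feature: between-class variance of the class
    means divided by the mean within-class variance. A larger value
    means the feature separates the classes better."""
    means = np.array([np.mean(g) for g in groups])
    between = np.var(means)
    within = np.mean([np.var(g) for g in groups])
    return between / within

well_separated = [[1.0, 1.1, 0.9], [5.0, 5.1, 4.9]]   # tight, distant classes
overlapping = [[1.0, 2.0, 3.0], [1.5, 2.5, 3.5]]      # broad, close classes
print(f_ratio(well_separated) > f_ratio(overlapping))
```

In the paper this score ranks how well NAQ values discriminate the six emotion classes before any classifier is trained.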

6.
Emotion feature extraction is key to accurate speech emotion recognition. Traditional methods use either a single feature or a simple combination of features: a single feature cannot fully reflect emotional variation in speech, while a simple combination introduces substantial redundancy among features and degrades recognition results. To raise the speech emotion recognition rate, an intelligent speech emotion recognition method based on the ant colony algorithm is proposed. A weighted combination of recognition accuracy and feature-subset dimensionality is used as the objective function, and the ant colony algorithm searches for the optimal speech feature subset, eliminating redundant feature information. Simulation tests on Mandarin and Danish emotional speech databases show that the improved method not only removes redundant and useless features and reduces the feature dimensionality, but also raises the speech emotion recognition rate, making it an effective intelligent speech emotion recognition method.
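A sketch of the weighted objective that guides the ant colony search described above; the 0.9/0.1 weighting is an illustrative assumption, not the paper's tuned value:

```python
def subset_fitness(accuracy, n_selected, n_total, w_acc=0.9):
    """Weighted objective for feature-subset search: reward recognition
    accuracy, penalize feature-subset size. Both terms are in [0, 1]."""
    return w_acc * accuracy + (1 - w_acc) * (1 - n_selected / n_total)

# A smaller subset with equal accuracy scores higher, steering the
# ants away from redundant features:
print(subset_fitness(0.80, 10, 100) > subset_fitness(0.80, 60, 100))
```

Each ant's pheromone update would then be proportional to this fitness, so compact, accurate subsets accumulate more pheromone.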

7.
To address the sensitivity of speaker recognition performance to emotional factors, a method that jointly learns segment-level and frame-level features is proposed. A long short-term memory network is used for the speaker recognition task, and its sequential outputs are taken as segment-level emotional speaker features, preserving the original information of the frame-level features while strengthening the expression of emotional information. A fully connected network then further learns the speaker information of each feature frame within the segment-level features, enhancing the speaker representation of the frame-level features. Finally, the segment-level and frame-level features are concatenated to obtain the final speaker features with stronger representational power. Experiments on the Mandarin Affective Speech Corpus (MASC) verify the effectiveness of the proposed method and examine how the number of speech frames in a segment-level feature and different emotional states affect emotional speaker recognition.

8.
Research on Emotion Recognition in Speech Signals   (Cited by: 25; self-citations: 0; citations by others: 25)
赵力  钱向民  邹采荣  吴镇扬 《软件学报》2001,12(7):1050-1055
A method for recognizing emotional features in speech signals is proposed. Three hundred utterances conveying happiness, anger, surprise, and sadness were collected from five speakers, and ten emotional features were extracted from this speech material. Three emotion recognition methods based on principal component analysis are proposed. With these methods, recognition performance essentially close to normal human performance was obtained.
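The principal-component step underlying the three methods above can be sketched with a thin SVD; the 300-by-10 random matrix is a stand-in for the paper's 300 utterances with 10 emotional features each:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-in for 10 emotional features extracted from 300 utterances.
X = rng.normal(size=(300, 10))

def pca_project(X, k):
    """Project feature vectors onto their first k principal components."""
    Xc = X - X.mean(axis=0)                  # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                     # scores in the k-D subspace

Z = pca_project(X, 3)
print(Z.shape)
```

Classification would then run on the low-dimensional scores `Z` instead of the raw 10-dimensional features.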

9.
Dynamic emotional features are important in speaker-independent speech emotion recognition, but existing dynamic features have limited representational power because the time-frequency information in speech is insufficiently exploited. To better extract dynamic emotional features, a dynamic convolutional recurrent neural network model for speech emotion recognition is proposed. A dynamic convolutional network built on dynamic convolution theory extracts global dynamic emotional information from the spectrogram; attention mechanisms strengthen the representation of key emotional regions of the feature maps along the time and frequency dimensions; and a bidirectional long short-term memory network learns the spectrogram frame by frame, extracting dynamic frame-level features and the temporal dependencies of emotion. On this basis, a maximum density divergence loss aligns the feature distribution of new individuals with that of the training set, reducing the impact of individual differences on the feature distribution and improving the model's representational power. Experiments show that the model achieves weighted average accuracies of 59.50%, 88.01%, and 66.90% on the CASIA Chinese, Emo-db German, and IEMOCAP English emotional corpora respectively, improvements of 1.25-16.00, 0.71-2.26, and 2.16-8.10 percentage points over mainstream models such as HuWSF, CB-SER, and RNN-Att, verifying the model's effectiveness.

10.
Given that the emotions of consecutive utterances are correlated, this paper proposes, from an acoustic perspective, four classes of contextual speech emotion features totaling 268 dimensions (contextual dynamic, contextual differential, contextual-boundary dynamic, and contextual-boundary differential emotion features) together with their extraction methods. Acoustic features are extracted from the current emotional utterance merged with several preceding utterances, and a contextual feature model is built to assist the model trained on traditional features and raise the recognition rate. Applied to speech emotion recognition, the method achieves an average recognition rate of 82.78% over six typical emotions after adding the new contextual speech emotion features, about 8.89% higher than the original feature model.

11.
How to extract robust features is an important research topic in the machine learning community. In this paper, we investigate robust feature extraction for speech signals based on tensor structure and develop a new method called constrained Nonnegative Tensor Factorization (cNTF). A novel feature extraction framework based on the cortical representation in primary auditory cortex (A1) is proposed for robust speaker recognition. Motivated by the neural firing-rate model in A1, the speech signal is first represented as a general higher-order tensor. cNTF is used to learn the basis functions from multiple interrelated feature subspaces and find a robust sparse representation for the speech signal. Computer simulations evaluate the performance of our method, and comparisons with existing speaker recognition methods are provided. The experimental results demonstrate that the proposed method achieves higher recognition accuracy in noisy environments.

12.
To improve the reconstruction accuracy of a compressed speech transmission system without increasing its spectral overhead, a speech compression, transmission, and reconstruction method aided by superimposed feature information is proposed. The method first extracts feature information from the sparse speech signal; the extracted feature information is superimposed on the compressed speech signal as a superimposed sequence and transmitted; at the receiver, a feature-information-aided reconstruction algorithm recovers the speech. Analysis and simulation results show that, compared with traditional compressed-sensing speech reconstruction methods, the proposed method improves speech reconstruction accuracy at higher signal-to-noise ratios or lower compression ratios, without increasing the spectral overhead of the transmission system.

13.
To raise the recognition rate of speech signals, a feature extraction method based on short-time energy and LPCC is proposed. On top of the LPCC parameters, the short-time energy of each frame is added, so that the new parameters characterize the speech signal more accurately. Simulation experiments show that the new feature parameters achieve a higher recognition rate.
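A self-contained sketch of the per-frame feature described above: LPC via the autocorrelation method (Levinson-Durbin), the standard LPC-to-cepstrum recursion, and log short-time energy prepended to the LPCCs. The order/dimension choices and the synthetic test frame are illustrative assumptions:

```python
import numpy as np

def lpc(frame, order):
    """LPC polynomial [1, a1..ap] via the autocorrelation method
    (Levinson-Durbin recursion)."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + a[1:i] @ r[1:i][::-1]) / err
        a[1:i] = a[1:i] + k * a[1:i][::-1]   # update reflection step
        a[i] = k
        err *= 1.0 - k * k
    return a

def lpcc(a, n_ceps):
    """LPC-to-cepstrum recursion for A(z) = 1 + sum a_k z^-k."""
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n] if n <= p else 0.0
        for k in range(1, n):
            acc += (k / n) * c[k] * (a[n - k] if 1 <= n - k <= p else 0.0)
        c[n] = -acc
    return c[1:]

def energy_lpcc_features(frame, order=10, n_ceps=12):
    """Per-frame feature vector: log short-time energy + LPCCs."""
    log_energy = np.log(np.sum(frame ** 2) + 1e-12)
    return np.concatenate(([log_energy], lpcc(lpc(frame, order), n_ceps)))

# A voiced-like test frame: two harmonics plus a little noise.
rng = np.random.default_rng(1)
t = np.arange(240) / 8000.0
frame = np.sin(2 * np.pi * 150 * t) + 0.5 * np.sin(2 * np.pi * 300 * t)
frame += 0.01 * rng.normal(size=t.size)
feats = energy_lpcc_features(frame)
print(feats.shape)
```

Each frame thus yields a 13-dimensional vector (1 energy + 12 LPCCs) under these illustrative settings.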

14.
Based on digital watermarking, a speech forensics scheme is proposed. The cross-correlation degree of the feature coefficients of the speech signal is defined and its properties are discussed, demonstrating that the feature is very robust. A new watermark embedding method based on this feature is then explored, aiming to enlarge the embedding capacity and solve the security issue of watermark schemes based on public features. In this paper, each frame of the speech signal is cut into two parts, and each part is divided into segments. The frame number is then mapped to a sequence of integers, which are embedded into the segments. The integers can be extracted and used for forensics and tamper localization after the watermarked signal has been attacked. Theoretical analysis and experimental results show that the proposed scheme is inaudible and robust against desynchronization attacks, enhances the security of the watermark system, and is well suited to speech forensics.
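One simple way to realize the frame-number-to-integer-sequence mapping above is a fixed-base digit expansion, one digit per segment; the base and segment count here are illustrative assumptions, not the paper's parameters:

```python
def frame_number_to_digits(frame_no, n_segments, base=4):
    """Map a frame number to n_segments base-`base` digits,
    one integer to embed per segment (most significant first)."""
    digits = []
    for _ in range(n_segments):
        digits.append(frame_no % base)
        frame_no //= base
    return digits[::-1]

def digits_to_frame_number(digits, base=4):
    """Inverse mapping, used at extraction time to recover the frame
    number for forensics and tamper localization."""
    n = 0
    for d in digits:
        n = n * base + d
    return n

d = frame_number_to_digits(157, 5)
print(d, digits_to_frame_number(d))
```

Because the mapping is invertible, any segment whose extracted digit breaks the round-trip pinpoints a tampered frame.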

15.
Investigating new effective feature extraction methods for the speech signal is an important approach to improving the performance of automatic speech recognition (ASR) systems. Since the reconstructed phase space (RPS) is a proper field for true detection of signal dynamics, in this paper we propose a new method for feature extraction from the trajectory of the speech signal in the RPS. The method models the speech trajectory with the multivariate autoregressive (MVAR) method, and linear discriminant analysis (LDA) is then used to simultaneously decorrelate and reduce the dimension of the final feature set. Experimental results show that MVAR of order 6 is appropriate for modeling the trajectory of speech signals in the RPS. In this study, recognition experiments are conducted with an HMM-based continuous speech recognition system and a naive Bayes isolated phoneme classifier on the Persian FARSDAT and American English TIMIT corpora to compare the proposed features with some older RPS-based features and traditional spectral-based MFCC features.

16.
Speech enhancement is a research area that aims at improving the quality and intelligibility of speech affected by additive noises such as airport, train, and restaurant noise. The presence of these background noises degrades the listening comfort of the end user. This article proposes a speech enhancement method that removes the noise spectrum from the noisy speech signal using a novel fractional delta-AMS (amplitude modulation spectrogram) feature extraction together with the D-matrix feature extraction method. The fractional delta-AMS strategy modifies delta-AMS with fractional calculus, which sharpens the feature extraction. The features extracted from the frames are used to determine the optimal mask for all frames of the noisy speech signal, and the mask is employed to train deep belief neural networks (DBN). Two metrics, root mean square error (RMSE) and perceptual evaluation of speech quality (PESQ), are used to evaluate the method. The proposed method yields better PESQ values at all noise levels, and its RMSE decreases with increasing noise level.

17.
An end-to-end monaural speech separation algorithm based on deep acoustic features is proposed. Traditional acoustic feature extraction requires operations such as the Fourier transform and the discrete cosine transform, which cause loss of speech energy and long delays. To alleviate these problems, the raw waveform of the speech signal is used as the input to a deep neural network, which learns deeper acoustic features of the speech signal and performs end-to-end speech separation. Objective evaluations show that the proposed separation algorithm not only effectively improves speech separation performance but also reduces the latency of the separation algorithm.

18.
《Advanced Robotics》2013,27(1-2):47-67
Depending on the emotion of speech, the meaning of the speech or the intention of the speaker differs. Therefore, speech emotion recognition, as well as automatic speech recognition, is necessary for precise communication between humans and robots in human-robot interaction. In this paper, a novel feature extraction method for speech emotion recognition is proposed that uses separation of phoneme classes. In feature extraction, the signal variation caused by different sentences usually overrides the emotion variation and lowers the performance of emotion recognition. However, because the proposed method extracts features from the parts of speech that correspond to limited ranges of the spectral center of gravity (CoG) and formant frequencies, the effects of phoneme variation on the features are reduced. Based on the range of CoG, obstruent sounds are discriminated from sonorant sounds. Moreover, the sonorant sounds are categorized into four classes by the resonance characteristics revealed by formant frequency. The results show that the proposed method, using corpora from 30 different speakers, improves emotion recognition accuracy over other methods at the 99% significance level. Furthermore, the proposed method was applied to extract several features, including prosodic and phonetic features, and was implemented on 'Mung' robots as an emotion recognizer for users.
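The spectral center of gravity used above to separate obstruents from sonorants can be sketched as the magnitude-weighted mean frequency of a frame; the two synthetic signals are illustrative stand-ins for sonorant-like and obstruent-like speech:

```python
import numpy as np

def spectral_cog(frame, fs):
    """Center of gravity of the magnitude spectrum (spectral centroid):
    sum(f * |X(f)|) / sum(|X(f)|) over the one-sided spectrum."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    return float((freqs * mag).sum() / mag.sum())

fs = 16000
t = np.arange(512) / fs
low = np.sin(2 * np.pi * 300 * t)        # sonorant-like: low-frequency energy
rng = np.random.default_rng(0)
high = np.diff(rng.normal(size=513))     # obstruent-like: high-passed noise
print(spectral_cog(low, fs) < spectral_cog(high, fs))
```

Thresholding this CoG value is one way to route a frame to the obstruent or sonorant feature extractor before formant-based sub-classification.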

19.
Today's telecommunications systems use a limited audio signal bandwidth. A typical bandwidth is 0.3-3.4 kHz, but it has recently been suggested that mobile phone networks will support an audio signal bandwidth of 50 Hz-7 kHz, since an increased bandwidth improves the sound quality of speech signals. Because only a few telephones will initially have this facility, a method is proposed that extends the conventional narrowband speech signal into a wideband speech signal using the receiving telephone only, giving the impression of a wideband speech signal. The proposed speech bandwidth extension method is based on models of speech acoustics and the fundamentals of human hearing, and maps each speech feature separately. Care has been taken to deal with implementation aspects such as noisy speech signals, speech signal delays, computational complexity, and processing memory usage.

20.
To address the problem that MFCC alone does not deliver efficient speaker recognition performance, a speaker feature extraction method combining time-frequency features with MFCC is proposed. The time-frequency distribution of the speech signal is first obtained; the time-frequency domain is then transformed to the frequency domain, from which MFCC+MFCC features are extracted as the feature parameters; finally, speaker recognition is studied with a support vector machine. Simulation experiments compare the recognition performance of the speech signal and of various time-frequency distributions with MFCC and MFCC+MFCC as feature parameters; the results show that, based on the CWD distribution, the recognition rate with MFCC and MFCC features can be raised to 95.7%.
