期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

基于PCNN的语谱图特征提取在说话人识别中的应用 总被引：7，自引：1，他引：7

马义德袁敏齐春亮刘悦刘映杰《计算机工程与应用》2005,41(20):81-84

该文首次提出了一种将有生物视觉依据的人工神经网络——脉冲耦合神经网络(PulseCoupledNeuralNetwork,以下简称为PCNN)用于说话人识别领域的语谱图特征提取的新方法。该方法将语谱图输入到PCNN后得到输出图像的时间序列及其熵序列作为说话人语音的特征,利用它的不变性实现说话人识别。实验结果表明,该方法可以快速有效地进行说话人识别。该文将PCNN引入到语音识别的应用研究中,开拓了信号处理中两个极为重要的部分———语音信号处理和图像信号处理结合的新领域,同时对于PCNN的理论研究和实际应用具有非常重要的现实意义。相似文献

2.

基于汉语元音分类的多维特征说话人辨认研究

李秀怡徐利敏《信息与电脑》2011,(12)

由于说话人的语音特征和个性特征到目前为止无法很好地分离,本文提出了基于语音分类的说话人多维特征的提取方法,将语音识别技术应用到说话人特征提取上,提取出的N维组合特征较其它普通特征有更高的有效性。该方法从汉语语音的特点出发,对基于汉语的说话人识别进行研究。实验结果表明它的有效性较长时平均特征的有效性提高了2.915%。相似文献

3.

语音识别中说话人自适应方法研究综述

朱方圆马志强陈艳张晓旭王洪彬宝财吉拉呼《计算机科学与探索》2021,15(12):2241-2255

语音是人机交互方式之一,语音识别技术是人工智能的重要组成部分.近年来神经网络技术在语音识别领域的应用快速发展,已经成为语音识别领域中主流的声学建模技术.然而测试条件中目标说话人语音与训练数据存在差异,导致模型不适配的问题.因此说话人自适应(SA)方法是为了解决说话人差异导致的不匹配问题,研究说话人自适应方法成为语音识别领域的一个热门方向.相比传统语音识别模型中的说话人自适应方法,使用神经网络的语音识别系统中的自适应存在着模型参数庞大,而自适应数据量相对较少等特点,这使得基于神经网络的语音识别系统中的说话人自适应方法成为一个研究难题.首先回顾说话人自适应方法的发展历程和基于神经网络的说话人自适应方法研究遇到的各种问题,其次将说话人自适应方法分为基于特征域和基于模型域的说话人自适应方法并介绍对应原理和改进方法,最后指出说话人自适应方法在语音识别中仍然存在的问题及未来的发展方向. 相似文献

4.

应用似然比框架的法庭说话人识别

王华朋杨军《数据采集与处理》2013,28(2):239

为了检验元音倒谱特征在法庭说话人识别中的性能,提出了使用元音稳定段美尔倒谱系数(Mel-frequeney eepstral coefficients,MFCC)作为识别特征的基于似然比的法庭说话人识别方法,并使用45人电话对话录音中元音/a/作为样本进行了测试.实验结果表明,该方法不仅能正确识别说话人,而且能根据当前嫌疑人样本和问题语音样本的差异,量化该语音样本作为证据的力度,为法庭提供科学合理的证据评估结果.与人工提取共振峰特征相比,自动特征提取的引入提高了工作效率,使识别系统的性能获得了大幅提升. 相似文献

5.

基于DTW的语音识别和说话人识别的特征选择 总被引：1，自引：0，他引：1

刘敬伟徐美芝郑忠国程乾生《模式识别与人工智能》2005,18(1)

研究了基于动态时间规正(DTW)和图论方法的语音识别和说话人识别的特征子集选择问题,提出了基于DTW距离的有向图方法(DTWDAG).此方法推广了基于欧氏距离的相似矩阵聚类方法,将图论聚类方法改进为语音和说话人特征选择的代价函数.并将此代价函数与(l-r)优化算法结合应用于孤立数字的特定人的语音识别和文本有关的说话人辩认的特征选择,实验结果表明,DTWDAG方法能够较好反映语音识别和说话人识别的特征子集的重要性. 相似文献

6.

基于单字音特征提取的说话人识别方法

下载免费PDF全文

张燕唐振民李燕萍《计算机工程》2009,35(10):188-189

证实普通话可以分解为辅音音素和单元音音素通过过度音的连接,提出一种单字音特征提取方法。该方法在传统的帧特征提取基础上,对相关帧进行二次处理,得到单字语音中的多个代表帧,将代表帧进行拼接作为单字的特征矢量。这种特征提取方法能更好地表现说话人单字发音中相邻语音帧之间的连续性。仿真实验表明该方法在说话人识别系统的应用中达到较高的识别率,使识别时间进一步缩短。相似文献

7.

基于尺度相关-双向长短期记忆网络模型的说话人识别

曹书鑫冯藤藤葛凤培梁春燕《计算机工程》2023,(4):289-296

说话人识别通过语音对说话人进行身份认证,然而大部分语音在时域与频域具有分布多样性,目前说话人识别中的卷积神经网络深度学习模型普遍使用单一的卷积核进行特征提取,无法提取尺度相关特征及时频域特征。针对这一问题,提出一种尺度相关卷积神经网络-双向长短期记忆(SCCNN-BiLSTM)网络模型用于说话人识别。通过尺度相关卷积神经网络在每一层特征抽象过程中调整感受野大小,捕获由尺度相关块组成的尺度特征信息,同时引入双向长短期记忆网络保留与学习语音数据的多尺度特征信息,并在最大程度上提取时频域特征的上下文信息。实验结果表明,SCCNN-BiLSTM网络模型在LibriSpeech和AISHELL-1数据集上迭代50 000次时的等错率为7.21%和6.55%,相比于ResCNN基线网络模型提升了25.3%和41.0%。相似文献

8.

基于语音结构化模型的数字语音识别

姜莹俞一彪《计算机工程与设计》2012,33(4):1482-1485,1490

提出一种新的基于语音结构化模型的语音识别方法,并应用于非特定人数字语音识别.每一个数字语音计算倒谱特征之后提取语音中存在的对说话人差异具有不变性的结构化特征——全局声学结构(acoustical universal structure,AUS),并建立结构化模型,识别时提取测试语音的全局声学结构,然后与各数字语音的结构化模型进行匹配.测试了少量语料训练下的识别性能并与传统HMM (hidden Markov model)方法进行比较,结果表明该方法可以取得优于HMM的性能,语音结构化模型可以有效消除说话人之间的差异. 相似文献

9.

非线性幂变换Gammachirp滤波器的鲁棒语音特征提取*

李聪葛洪伟《计算机科学与探索》2019,13(8):1351-1359

针对归一化功率倒谱系数(PNCC)在较低信噪比噪声环境下说话人识别鲁棒性不佳的问题,提出了非线性幂函数变换伽马啁啾频率倒谱系数(NPGFCC)的抗噪语音特征提取算法。相比PNCC,NPGFCC的不同之处在于其采用符合人耳听觉特性的归一化压缩Gammachirp滤波器组代替Gammatone滤波器组进行滤波,并在特征参数中融合了分段式非线性幂函数变换的方式。另外,算法中利用了均值方差归一化和时间序列滤波等技术的方法,进一步提高了其在噪声环境下的鲁棒性,并在改进的i-vector+PLDA模型下进行了测试。实验结果表明,相较于目前常用的一些说话人语音特征提取算法,在不同噪声和不同信噪比下,NPGFCC特征具有最佳抗噪性能,特别是在信噪比较低的情况下,与其他语音特征相比,NPGFCC特征具有更大的优势。相似文献

10.

基于听皮层神经元感受野的强噪声环境下说话人识别

牛晓可黄伊鑫徐华兴蒋震阳《计算机应用》2020,40(10):3034-3040

针对说话人识别易受环境噪声影响的问题,借鉴生物听皮层神经元频谱-时间感受野（STRF）的时空滤波机制,提出一种新的声纹特征提取方法。在该方法中,对基于STRF获得的听觉尺度-速率图进行了二次特征提取,并与传统梅尔倒谱系数（MFCC）进行组合,获得了对环境噪声具有强容忍的声纹特征。采用支持向量机（SVM）作为分类器,对不同信噪比（SNR）语音数据进行测试的结果表明,基于STRF的特征对噪声的鲁棒性普遍高于MFCC系数,但识别正确率较低;组合特征提升了语音识别的正确率,同时对环境噪声具有良好的鲁棒性。该结果说明所提方法在强噪声环境下说话人识别上是有效的。相似文献

11.

基于信号处理的声音模式识别过程及方法研究 总被引：2，自引：0，他引：2

张宇波《计算机仿真》2004,21(9):134-137

基于信号处理的声音特征提取和模式识别技术近些年发展较快，声信号具有精确性和稳定性等特点，借助计算机技术运用信号处理的方法对物体发出的声音进行分析，提取其振动特征量，可以为物体运转情况的预测提供精准的数据分析依据。声音模式识别在无人职守、故障检测、灾害预防等方面具有很广阔的应用前景。该文主要介绍了在声音模式识别的一般过程，其中包括声信号预处理、特征提取、模式分类过程的方法。基于信号处理的声音模式识别，主要是利用信号处理的方法手段，对信号进行分析，提取频域、时域、幅域特征，对这些特征进行统计分类，应用数学方法，设计合理的分类器．达到分类识别的目的。文章中主要对在信号处理的过程中所涉及到的方法进行了研究。相似文献

12.

Mismatched feature detection with finer granularity for emotional speaker recognition

Li Chen Ying-chun Yang Zhao-hui Wu 《浙江大学学报:C卷英文版》2014,15(10):903-916

The shapes of speakers＇ vocal organs change under their different emotional states, which leads to the deviation of the emotional acoustic space of short-time features from the neutral acoustic space and thereby the degradation of the speaker recognition performance. Features deviating greatly from the neutral acoustic space are considered as mismatched features, and they negatively affect speaker recognition systems. Emotion variation produces different feature deformations for different phonemes, so it is reasonable to build a finer model to detect mismatched features under each phoneme. However, given the difficulty of phoneme recognition, three sorts of acoustic class recognition--phoneme classes, Gaussian mixture model （GMM） tokenizer, and probabilistic GMM tokenizer--are proposed to replace phoneme recognition. We propose feature pruning and feature regulation methods to process the mismatched features to improve speaker recognition performance. As for the feature regulation method, a strategy of maximizing the between-class distance and minimizing the within-class distance is adopted to train the transformation matrix to regulate the mismatched features. Experiments conducted on the Mandarin affective speech corpus （MASC） show that our feature pruning and feature regulation methods increase the identification rate （IR） by 3.64% and 6.77%, compared with the baseline GMM-UBM （universal background model） algorithm. Also, corresponding IR increases of 2.09% and 3.32% can be obtained with our methods when applied to the state-of-the-art algorithm i-vector. 相似文献

13.

Speaker identification based on normalized pitch frequency and Mel Frequency Cepstral Coefficients

Marwa A. Nasr Mohammed Abd-Elnaby Adel S. El-Fishawy S. El-Rabaie Fathi E. Abd El-Samie 《International Journal of Speech Technology》2018,21(4):941-951

This paper presents an efficient approach for automatic speaker identification based on cepstral features and the Normalized Pitch Frequency (NPF). Most relevant speaker identification methods adopt a cepstral strategy. Inclusion of the pitch frequency as a new feature in the speaker identification process is expected to enhance the speaker identification accuracy. In the proposed framework for speaker identification, a neural classifier with a single hidden layer is used. Different transform domains are investigated for reliable feature extraction from the speech signal. Moreover, a pre-processing noise reduction step, is used prior to the feature extraction process to enhance the performance of the speaker identification system. Simulation results prove that the NPF as a feature in speaker identification enhances the performance of the speaker identification system, especially with the Discrete Cosine Transform (DCT) and wavelet denoising pre-processing step. 相似文献

14.

Physiological-physical feature fusion for automatic voice spoofing detection

Junxiao XUE Hao ZHOU 《Frontiers of Computer Science》2023,17(2):172318

Biometric speech recognition systems are often subject to various spoofing attacks, the most common of which are speech synthesis and speech conversion attacks. These spoofing attacks can cause the biometric speech recognition system to incorrectly accept these spoofing attacks, which can compromise the security of this system. Researchers have made many efforts to address this problem, and the existing studies have used the physical features of speech to identify spoofing attacks. However, recent studies have shown that speech contains a large number of physiological features related to the human face. For example, we can determine the speaker’s gender, age, mouth shape, and other information by voice. Inspired by the above researches, we propose a spoofing attack recognition method based on physiological-physical features fusion. This method involves feature extraction, a densely connected convolutional neural network with squeeze and excitation block (SE-DenseNet), and feature fusion strategies. We first extract physiological features in audio from a pre-trained convolutional network. Then we use SE-DenseNet to extract physical features. Such a dense connection pattern has high parameter efficiency, and squeeze and excitation blocks can enhance the transmission of the feature. Finally, we integrate the two features into the classification network to identify the spoofing attacks. Experimental results on the ASVspoof 2019 data set show that our model is effective for voice spoofing detection. In the logical access scenario, our model improves the tandem decision cost function and equal error rate scores by 5% and 7%, respectively, compared to existing methods. 相似文献

15.

组合特征和二级判断模型相结合的说话人识别 总被引：1，自引：0，他引：1

下载免费PDF全文

李战明林娟陈若珠《计算机工程与应用》2011,47(10):180-182

针对目前说话人识别中个性化的特征提取以及假冒说话人的问题,提出一种组合特征提取和二级判断模型相结合的说话人识别方法。在特征提取阶段,采用MFCC倒谱特征、Delta_ Delta特征与平均幅度差法提取的基音周期相结合进行组合特征提取;在识别阶段,采用得分规整后的得分值与一个统一的阈值比较,将一部分假冒说话人排除后,再结合二级判断模型进行识别。实验结果证明该方法有效提高了识别率。相似文献

16.

Front end analysis of speech recognition: a review

M. A. Anusuya S. K. Katti 《International Journal of Speech Technology》2011,14(2):99-145

Automatic speech recognition (ASR) has made great strides with the development of digital signal processing hardware and software. But despite of all these advances, machines can not match the performance of their human counterparts in terms of accuracy and speed, especially in case of speaker independent speech recognition. So, today significant portion of speech recognition research is focused on speaker independent speech recognition problem. Before recognition, speech processing has to be carried out to get a feature vectors of the signal. So, front end analysis plays a important role. The reasons are its wide range of applications, and limitations of available techniques of speech recognition. So, in this report we briefly discuss the different aspects of front end analysis of speech recognition including sound characteristics, feature extraction techniques, spectral representations of the speech signal etc. We have also discussed the various advantages and disadvantages of each feature extraction technique, along with the suitability of each method to particular application. 相似文献

17.

基于duffing随机共振的说话人特征提取方法 总被引：2，自引：0，他引：2

潘平何朝霞《计算机工程与应用》2012,48(35):123-125,142

说话人特征参数的提取直接影响识别模型的建立,MFCC与LPC参数提取方法,分别以局域低频信息和全局AR信号为主要特征。提出一种基于duffing随机共振的说话人频谱特征提取方法。仿真结果表明,该方法能识别说话人之间频谱的微小差别,有效地提取说话人频谱的基本特征,从而为说话人识别模型提供更为精细的识别模型。相似文献

18.

Artificially intelligent recognition of Arabic speaker using voice print-based local features

Awais Mahmood Mansour Alsulaiman Ghulam Muhammad Sheeraz Akram 《人工智能实验与理论杂志》2016,28(6):1009-1020

Local features for any pattern recognition system are based on the information extracted locally. In this paper, a local feature extraction technique was developed. This feature was extracted in the time–frequency plain by taking the moving average on the diagonal directions of the time–frequency plane. This feature captured the time–frequency events producing a unique pattern for each speaker that can be viewed as a voice print of the speaker. Hence, we referred to this technique as voice print-based local feature. The proposed feature was compared to other features including mel-frequency cepstral coefficient (MFCC) for speaker recognition using two different databases. One of the databases used in the comparison is a subset of an LDC database that consisted of two short sentences uttered by 182 speakers. The proposed feature attained 98.35% recognition rate compared to 96.7% for MFCC using the LDC subset. 相似文献

19.

Speaker Verification Using Support Vector Machines and High-Level Features

Campbell W.M. Campbell J.P. Gleason T.P. Reynolds D.A. Wade Shen 《IEEE transactions on audio, speech, and language processing》2007,15(7):2085-2094

High-level characteristics such as word usage, pronunciation, phonotactics, prosody, etc., have seen a resurgence for automatic speaker recognition over the last several years. With the availability of many conversation sides per speaker in current corpora, high-level systems now have the amount of data needed to sufficiently characterize a speaker. Although a significant amount of work has been done in finding novel high-level features, less work has been done on modeling these features. We describe a method of speaker modeling based upon support vector machines. Current high-level feature extraction produces sequences or lattices of tokens for a given conversation side. These sequences can be converted to counts and then frequencies of n-gram for a given conversation side. We use support vector machine modeling of these n-gram frequencies for speaker verification. We derive a new kernel based upon linearizing a log likelihood ratio scoring system. Generalizations of this method are shown to produce excellent results on a variety of high-level features. We demonstrate that our methods produce results significantly better than standard log-likelihood ratio modeling. We also demonstrate that our system can perform well in conjunction with standard cesptral speaker recognition systems. 相似文献

20.

采用PCNN的有噪特定人语音识别系统

下载免费PDF全文

韦丽兴张淼钟映春韩光《计算机工程与应用》2012,48(3):133-136

在特定人语音识别系统中,噪声严重影响语音特征提取,并导致语音识别率明显下降。针对在噪声环境下语音识别率偏低的问题,通过谱减法去除语音信号噪声,并根据语音信号语谱图可视化的特点,运用脉冲耦合神经网络从语音信号的语谱图中提取熵序列作为特征参数进行语音识别。实验结果表明,该方法能较好地去除语音信号中的噪声,并能使在噪声环境下的特定人语音识别系统具有较好的识别效果。相似文献