Similar literature
1.
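Wu Long, Li Ta, Wang Li, Yan Yonghong. Journal of Software (《软件学报》), 2019, 30(S2): 25-34
To further exploit near-field speech data for improving far-field speech recognition, a far-field speech recognition algorithm combining knowledge distillation with a generative adversarial network is proposed. The method introduces a multi-task learning framework that enhances far-field speech features while performing acoustic modeling. To strengthen acoustic modeling, an acoustic model trained on near-field speech (the teacher model) guides the training of the far-field acoustic model (the student model); the student's posterior distribution is driven toward the teacher's by minimizing relative entropy. To improve feature enhancement, a discriminator network is added for adversarial training so that the distribution of the enhanced features more closely approaches that of near-field features. Experiments on the AMI dataset show that, compared with the baseline, the algorithm's average word error rate (WER) falls by a relative 5.6% and 4.7% in the single-channel case without and with speaker overlap, respectively, and by a relative 6.2% and 4.1% in the multi-channel case. Experiments on the TIMIT dataset show a relative 7.2% reduction in average word error rate. To better illustrate the effect of the generative adversarial network on speech enhancement, the enhanced features are visualized, further confirming the effectiveness of the method.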
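
The distillation step in the abstract above (the student posterior is pushed toward the teacher posterior by minimizing relative entropy) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy logits and the frame-averaging are assumptions made for illustration, and the multi-task and adversarial parts of the method are left out.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(teacher_probs, student_probs, eps=1e-12):
    """Relative entropy KL(teacher || student) for one frame's posterior."""
    return sum(
        p * math.log((p + eps) / (q + eps))
        for p, q in zip(teacher_probs, student_probs)
        if p > 0.0
    )

def distillation_loss(teacher_logits, student_logits):
    """Average frame-level KL divergence over an utterance (assumed reduction)."""
    frame_losses = [
        kl_divergence(softmax(t), softmax(s))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(frame_losses) / len(frame_losses)

# Hypothetical 2-frame utterance with 3 acoustic states per frame.
teacher = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
student = [[1.0, 0.8, -0.5], [0.0, 1.0, 0.9]]
loss = distillation_loss(teacher, student)
```

The loss is zero when the two models produce identical posteriors and positive otherwise, which is why minimizing it pulls the far-field student toward the near-field teacher.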

2.
This article investigates speech feature enhancement based on deep bidirectional recurrent neural networks. The Long Short-Term Memory (LSTM) architecture is used to exploit a self-learnt amount of temporal context in learning the correspondences between noisy, reverberant speech features and undistorted ones. The resulting networks are applied to feature enhancement in the context of the 2013 2nd Computational Hearing in Multisource Environments (CHiME) Challenge track 2 task, which consists of the Wall Street Journal (WSJ-0) corpus distorted by highly non-stationary, convolutive noise. In extensive test runs, different feature front-ends, network training targets, and network topologies are evaluated in terms of frame-wise regression error and speech recognition performance. Furthermore, we consider gradually refined speech recognition back-ends, from baseline ‘out-of-the-box’ clean models to discriminatively trained multi-condition models adapted to the enhanced features. As a result, deep bidirectional LSTM networks processing log Mel filterbank outputs deliver the best results with clean models, reaching down to 42% word error rate (WER) at signal-to-noise ratios ranging from −6 to 9 dB (multi-condition CHiME Challenge baseline: 55% WER). Discriminative training of the back-end using LSTM-enhanced features is shown to further decrease WER to 22%. To our knowledge, this is the best result reported for the 2nd CHiME Challenge WSJ-0 task yet.
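
Several abstracts in this list report word error rate (WER) and relative reductions. A minimal sketch of the conventional computation follows: word-level edit distance divided by the reference length, plus the relative-reduction arithmetic. The function names and example sentences are illustrative assumptions; the relative figure computed at the end is derived here from the 55% and 42% quoted above and is not stated in the abstract.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)

def relative_reduction(baseline, improved):
    """Relative reduction of an error rate with respect to a baseline."""
    return (baseline - improved) / baseline

# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")

# 55% baseline WER versus 42% WER: about 0.236, i.e. roughly 23.6% relative.
rel = relative_reduction(55.0, 42.0)
```

Keeping absolute and relative figures apart matters when comparing the abstracts here, since a "5.6% relative" drop and a "13 point" drop are not on the same scale.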

3.
This paper presents a novel emotion recognition model using the system identification approach. A comprehensive data-driven model using an extended Kohonen self-organizing map (KSOM) has been developed whose input is a 26-dimensional facial geometric feature vector comprising eye, lip and eyebrow feature points. The analytical face model using this 26-dimensional geometric feature vector has been effectively used to describe the facial changes due to different expressions. The paper also includes an automated scheme for generating this geometric facial feature vector. The proposed non-heuristic model has been developed using training data from the MMI facial expression database. The emotion recognition accuracy of the proposed scheme has been compared with recognition schemes based on a radial basis function network, a multi-layered perceptron model and a support vector machine. The experimental results show that the proposed model is very efficient in recognizing six basic emotions while ensuring a significant increase in average classification accuracy over the radial basis function network and the multi-layered perceptron. They also show that the average recognition rate of the proposed method is comparatively better than that of the multi-class support vector machine.

4.
5.
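Heteroscedastic linear discriminant analysis (HLDA) is widely used in speech recognition because of its strong feature-decorrelation effect. However, when training data are insufficient or the feature dimensionality is high, HLDA tends to become unstable and suffers from the small-sample-size problem. Based on the matrix representation of features, a structure-constrained HLDA is proposed: two-dimensional linear discriminant analysis (2DLDA) first compresses the features in matrix form, and a one-dimensional HLDA is then applied. Analysis shows that the two-dimensional feature transform is in fact a structure-constrained one-dimensional feature transform. In experiments on the RM corpus, the constrained HLDA reduces word recognition errors by a relative 12.39% compared with conventional HLDA; on the TIMIT corpus, it reduces phoneme recognition errors by a relative 4.43%.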

6.
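Although speech recognition systems based on hybrid language models can recognize out-of-vocabulary (OOV) words, their accuracy on OOV words is far lower than on in-vocabulary words. To further improve hybrid speech recognition systems, this paper proposes a multi-system fusion method based on complementary acoustic models. First, two hybrid speech recognition systems based on hidden Markov models and deep neural networks (HMM-DNN) are built using different acoustic modeling units. Then, exploiting the correlation between the two recognition tasks, multi-task learning (MTL-DNN) is adopted so that the DNN input and hidden layers are shared, and joint training improves modeling accuracy. Finally, the ROVER (Recognizer Output Voting Error Reduction) method fuses the outputs of the two systems. Experimental results show that MTL-DNN achieves better recognition performance than single-task learning DNN (STL-DNN) modeling, and that fusing the outputs of the two systems further reduces the word error rate.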

7.
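Tibetan named entity recognition is a problem that must be solved in Tibetan word segmentation and tagging systems. Based on an analysis of the word-formation patterns of named entities and of segmentation ambiguity, a Tibetan named entity recognition scheme using a perceptron model trained on syllable features is proposed. The work focuses on a method for identifying syllables using Tibetan contracted case forms (紧缩格), the feature templates for training on syllables inside and at the boundaries of named entities, the trained model, and the method for classifying named entities. The proposed method achieves an F-score of 86.03% on the test set, 10.5 percentage points higher than the segmentation-based baseline system.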

8.
Two hybrid fuzzy neural systems are developed and applied to handwritten word recognition. The word recognition system requires a module that assigns character class membership values to segments of images of handwritten words. The module must accurately represent ambiguities between character classes and assign low membership values to a wide variety of noncharacter segments resulting from erroneous segmentations. Each hybrid is a cascaded system. The first stage of both is a self-organizing feature map (SOFM). The second stages map distances into membership values. The third stage of one system is a multilayer perceptron (MLP). The third stage of the other is a bank of Choquet fuzzy integrals (FI). The two systems are compared with the baseline system individually and in combination. The new systems each perform better than the baseline system. The MLP system slightly outperforms the FI system, but the combination of the two outperforms the individual systems with a small increase in computational cost over the MLP system. Recognition rates of over 92% are achieved with a lexicon set having an average size of 100. Experiments were performed on a standard test set from the SUNY/USPS CD-ROM database.

9.
This paper introduces a neural network optimization procedure that generates multilayer perceptron (MLP) network topologies with few connections, low complexity and high classification performance for phoneme recognition. An efficient constructive algorithm with incremental training, using a newly proposed Frame by Frame Neural Networks (FFNN) classification approach for automatic phoneme recognition, is thus proposed. It is based on a novel procedure for recruiting hidden neurons in a single hidden layer. After an initialization phase that starts with a small number of hidden neurons, the algorithm allows the neural network (NN) to adjust its parameters automatically during training. The modular FFNN classification method is then constructed and tested on recognizing 5 broad phonetic classes extracted from the TIMIT database. To account for the speech variability related to coarticulation, a Context Window of Three Successive Frames (CWTSF) analysis is applied. Although an important reduction of the computational training time is observed, this technique penalized the overall Phone Recognition Rate (PRR) and increased the complexity of the recognition system. To alleviate these limitations, two feature dimensionality reduction techniques, based respectively on Principal Component Analysis (PCA) and Self-Organizing Maps (SOM), are investigated. An important improvement in the performance of the recognition system is observed when the PCA technique is applied. An optimal neural phone recognition architecture is finally derived according to the following criteria: best PRR, minimum computational training time and complexity of the BPNN architecture.

10.
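Semi-Supervised SAR Target Recognition Based on Laplacian Regularized Least Squares   Total citations: 3, self-citations: 0, citations by others: 3
Zhang Xiangrong, Yang Chun, Jiao Licheng. Journal of Software (《软件学报》), 2010, 21(4): 586-596
A synthetic aperture radar (SAR) target recognition method based on kernel principal component analysis (KPCA) and Laplacian regularized least squares (LapRLS) is proposed. KPCA feature extraction not only captures the main features of a target but also effectively reduces the feature dimensionality. Laplacian regularized least squares classification is a semi-supervised learning method: training-set samples are treated as labeled and test-set samples as unlabeled, and the test-set samples are included in the learning process to obtain a higher recognition rate. Experiments on measured MSTAR SAR ground-target data show that the method achieves a high recognition rate and is robust to the target aspect-angle interval. Compared with template matching, support vector machines and supervised regularized least squares, it yields higher SAR target recognition accuracy. The effect of the number of labeled samples on recognition performance under different conditions is also analyzed experimentally.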

11.
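A Braille character recognition method based on deep learning is proposed, in which a deep model, the stacked denoising autoencoder (SDAE), addresses automatic feature extraction and dimensionality reduction in Braille recognition. In building the deep model, a greedy layer-wise unsupervised learning algorithm initializes the network weights, and backpropagation optimizes the network parameters. The SDAE automatically learns features from images of Braille characters, and a Softmax classifier performs recognition. Experimental results show that, compared with traditional methods, the proposed method effectively handles automatic feature learning and dimensionality reduction, is simpler to operate, and achieves satisfactory recognition results.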

12.
Control chart patterns (CCPs) are important statistical process control tools for determining whether a process is running in its intended mode or in the presence of unnatural patterns. Demand for automatic recognition of abnormal patterns in control charts is increasing in manufacturing processes. This paper presents a novel hybrid intelligent method for recognizing common types of CCP. The proposed method includes three main modules: the feature extraction module, the classifier module and the optimization module. In the feature extraction module, a suitable set of shape features and statistical features is proposed as an efficient characterization of the patterns. In the classifier module, a multilayer perceptron neural network and a support vector machine (SVM) are investigated. In support vector machine training, the hyper-parameters play a very important role in recognition accuracy. Therefore, in the optimization module, an improved bees algorithm is proposed for selecting appropriate classifier parameters. Simulation results show that the proposed algorithm has very high recognition accuracy.

13.
Speech and speaker recognition is an important task for computer systems. In this paper, an expert speaker recognition system based on optimum wavelet packet entropy is proposed for speaker recognition using real speech/voice signals. The study combines a new feature extraction approach with a classification approach, using optimum wavelet packet entropy parameter values. These optimum wavelet packet entropy values are obtained from measured real English-language speech/voice signal waveforms using a speech experimental set. A genetic-wavelet packet-neural network (GWPNN) model is developed in this study. GWPNN includes three layers: a genetic algorithm, a wavelet packet layer and a multi-layer perceptron. The genetic algorithm layer of GWPNN is used for selecting the feature extraction method and obtaining the optimum wavelet entropy parameter values. In this study, one of four different feature extraction methods is selected by the genetic algorithm. The alternative feature extraction methods are wavelet packet decomposition, wavelet packet decomposition – short-time Fourier transform, wavelet packet decomposition – Born–Jordan time–frequency representation, and wavelet packet decomposition – Choi–Williams time–frequency representation. The wavelet packet layer is used for optimum feature extraction in the time–frequency domain and is composed of wavelet packet decomposition and wavelet packet entropies. The multi-layer perceptron of GWPNN, which is a feed-forward neural network, is used for evaluating the fitness function of the genetic algorithm and for classifying speakers. The performance of the developed system has been evaluated using noisy English speech/voice signals. The test results showed that this system was effective in detecting real speech signals. The correct classification rate was about 85% for speaker classification.

14.
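A Kernel-Function-Based Nonlinear Perceptron Algorithm   Total citations: 16, self-citations: 1, citations by others: 16
To improve the classification ability of the classical Rosenblatt perceptron algorithm, this paper proposes a nonlinear perceptron algorithm based on kernel functions, called the kernel perceptron algorithm. It designs a nonlinear classifier using a simple iterative procedure and a kernel function, and can handle problems that are linearly inseparable in the original attribute space and linearly separable in a high-dimensional feature space. The paper also analyzes in detail the relationship between this algorithm and other nonlinear algorithms such as radial basis function neural networks, the potential function method and support vector machines. Results on artificial and real data show that, compared with the linear perceptron algorithm, the kernel perceptron algorithm effectively improves classification accuracy.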
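
A minimal sketch of the textbook kernel perceptron, to make the "simple iterative procedure plus kernel function" idea concrete. It is not taken from the paper: the RBF kernel, the `gamma` value, the function names and the XOR toy data are all assumptions chosen for illustration.

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel; gamma is an assumed illustrative value."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

def train_kernel_perceptron(X, y, kernel=rbf_kernel, max_epochs=100):
    """Return per-sample mistake counts alpha; labels must be -1 or +1."""
    n = len(X)
    alpha = [0] * n
    # Precompute the Gram matrix so each update only needs lookups.
    gram = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            score = sum(alpha[j] * y[j] * gram[j][i] for j in range(n))
            if y[i] * score <= 0:      # misclassified (or on the boundary)
                alpha[i] += 1          # the perceptron update in dual form
                mistakes += 1
        if mistakes == 0:              # a full pass without errors: stop
            break
    return alpha

def kernel_predict(x, X, y, alpha, kernel=rbf_kernel):
    """Sign of the kernel expansion over the training samples."""
    score = sum(a * t * kernel(xi, x) for a, t, xi in zip(alpha, y, X))
    return 1 if score > 0 else -1

# XOR: not linearly separable in the original 2-D attribute space.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, 1, 1, -1]
alpha = train_kernel_perceptron(X, y)
preds = [kernel_predict(x, X, y, alpha) for x in X]
```

A linear perceptron cannot fit XOR, whereas the dual form above separates it because the kernel implicitly maps the points into a space where they are linearly separable.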

15.
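In continuous speech recognition systems, complex environments (including variability in speakers and in environmental noise) cause a mismatch between training and test data that leads to low recognition rates. To address this, a speech recognition algorithm based on an adaptive deep neural network is proposed. An improved regularized adaptation criterion is combined with a feature-space adaptive deep neural network to improve data matching; speaker identity vectors (i-vectors) are fused with noise-aware training to overcome problems caused by speaker and noise variation; and the classification function of the output layer of the conventional deep neural network is improved to ensure intra-class compactness and inter-class separation. Tests were run on the TIMIT English speech dataset and a Microsoft Chinese speech dataset with several kinds of background noise added. The results show that, compared with the currently popular GMM-HMM and conventional DNN acoustic models, the word error rate of the proposed algorithm falls by 5.151% and 3.113%, respectively, improving the generalization and robustness of the model to some extent.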

16.
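Traditional saddle recognition methods suffer from difficult feature selection and ignore the co-occurrence relationship between saddles and other terrain features. Exploiting the feature self-learning ability of deep convolutional neural networks, a hybrid model combining a convolutional neural network with a multilayer perceptron is proposed for recognizing saddle features in DEM data. First, an improved convolutional neural network model is designed to automatically extract deep features of saddles, and candidate saddle points are obtained through a Softmax classifier; a multilayer perceptron then performs fine regression on the positions of the candidate saddle points to mark the final saddle coordinates. The network model is trained on a self-built saddle sample set, SADDLE-100, and experiments are carried out in three different mountainous sample areas. The results show that, compared with other saddle recognition methods, the method reduces the omission rate by about 50% and raises the correct recognition rate by 6.7%, avoids to some extent the loss of saddle semantic information caused by manual feature selection, and provides a new technical approach for recognizing point features in DEMs.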

17.
Hidden Markov models (HMMs) with Gaussian mixture distributions rely on an assumption that speech features are temporally uncorrelated, and often assume a diagonal covariance matrix where correlations between feature vectors for adjacent frames are ignored. A Linear Dynamic Model (LDM) is a Markovian state-space model that also relies on hidden state modeling, but explicitly models the evolution of these hidden states using an autoregressive process. An LDM is capable of modeling higher-order statistics and can exploit correlations of features in an efficient and parsimonious manner. In this paper, we present a hybrid LDM/HMM decoder architecture that postprocesses segmentations derived from the first pass of an HMM-based recognizer. This smoothed trajectory model is complementary to existing HMM systems. An Expectation-Maximization (EM) approach for parameter estimation is presented. We demonstrate a 13% relative WER reduction on the Aurora-4 clean evaluation set, and a 13% relative WER reduction on the babble noise condition.

18.
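DNN-Based Feature Extraction Techniques for Low-Resource Speech Recognition   Total citations: 1, self-citations: 0, citations by others: 1
Qin Chuxiong, Zhang Lianhai. Acta Automatica Sinica (《自动化学报》), 2017, 43(7): 1208-1219
To address the sharp drop in acoustic modeling performance of deep neural network (DNN) features when training data are scarce, two DNN feature extraction methods suited to low-resource speech recognition are proposed. First, based on a network structure with shared hidden-layer training, corpora with richer resources are used to assist the training of a deep bottleneck neural network. Because the bottleneck (BN) layer lies within the shared layers, techniques such as Dropout, Maxout and rectified linear units are introduced to mitigate the overfitting caused by the irregular distribution of multi-stream training samples, while also shrinking the number of network parameters and reducing training time. Second, to improve DNN feature extraction, a low-dimensional high-level feature extraction technique based on convex non-negative matrix factorization (CNMF) is proposed: the basis matrix obtained by factorizing a weight matrix of the network serves as the weight matrix of a feature layer, from which a new low-dimensional feature is extracted. Experiments on the 1-hour low-resource Czech training corpus of Vystadial 2013 show that, with auxiliary training on 26.7 hours of English data, the recognition rate improves by 7.0% relative to the baseline system when Dropout and rectified linear units are used; when Dropout and Maxout are used, it improves by 12.6% relative to the baseline system, with the number of network parameters 62.7% lower than in the other systems and training time 25% lower. The low-dimensional features based on matrix factorization achieve better recognition rates than bottleneck features (BNF) in both monolingual and auxiliary training, and in the auxiliary-training case also outperform the DNN hidden Markov recognition system, with gains ranging from 0.8% to 3.4%.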

19.
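The paper studies image preprocessing with grayscale conversion and median filtering, and the Haar feature extraction idea of computing the same features at multiple scales. Based on the AdaBoost algorithm, different classifiers are trained on the same training set, and the weak classifiers are combined into a stronger final classifier to implement a Lianpu (脸谱, facial makeup) recognition system. Tests of the system show accurate localization of Lianpu in video streams, meeting the expected goals of no smearing, little noise and accurate recognition.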
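
The idea of combining weak classifiers trained on the same data into a stronger one can be sketched with discrete AdaBoost over threshold stumps. This is a generic textbook illustration, not the system described above: the stump learner, the function names and the one-dimensional toy data are assumptions, and no Haar features or images are involved.

```python
import math

def stump_predict(x, feature, threshold, polarity):
    """Weak classifier: `polarity` if x[feature] <= threshold, else -polarity."""
    return polarity if x[feature] <= threshold else -polarity

def train_stump(X, y, weights):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for polarity in (1, -1):
                err = sum(
                    w for w, x, t in zip(weights, X, y)
                    if stump_predict(x, feature, threshold, polarity) != t
                )
                if best is None or err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best

def adaboost(X, y, rounds):
    """Return a list of (alpha, feature, threshold, polarity) weak learners."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, feature, threshold, polarity = train_stump(X, y, weights)
        if err >= 0.5:                 # no better than chance: stop early
            break
        err = max(err, 1e-10)          # avoid division by zero
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, feature, threshold, polarity))
        # Raise the weight of misclassified samples, lower the rest.
        weights = [
            w * math.exp(-alpha * t * stump_predict(x, feature, threshold, polarity))
            for w, x, t in zip(weights, X, y)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def strong_predict(x, ensemble):
    """Sign of the alpha-weighted vote of all weak classifiers."""
    score = sum(a * stump_predict(x, f, thr, pol) for a, f, thr, pol in ensemble)
    return 1 if score >= 0 else -1

# Toy 1-D data that no single threshold stump can separate.
X = [(v,) for v in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(X, y, rounds=3)
preds = [strong_predict(x, ensemble) for x in X]
```

On this toy set every single stump misclassifies at least three points, while the weighted vote of three stumps classifies all ten correctly, which is the effect the abstract relies on.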

20.