
1.

Adaptive Gaussian back-end based on LDOF criterion for language recognition

Zhong-fu YE Ting QI Sai-feng LI Yan SONG 《通信学报》2017,38(4):17-24

To alleviate the model mismatch between training and testing samples caused by inter-language variations, an adaptive Gaussian back-end based on the LDOF criterion is proposed for language recognition. The local distance-based outlier factor (LDOF) criterion is defined to find appropriate model parameters and to dynamically select, from the multi-class training sets, the subset of training data similar to the testing samples. The original back-end is then adjusted to obtain a better-matched recognition model. Experimental results on the NIST LRE 2009 easily confused language data set show that the proposed method clearly improves both the equal error rate (EER) and the average decision cost function.
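As a rough illustration of the criterion named in this abstract, the sketch below computes LDOF the way it is usually defined in the outlier-detection literature: the mean distance from a point to its k nearest neighbours, divided by the mean pairwise distance among those neighbours. The function name `ldof`, the Euclidean metric and the toy data are assumptions made here; the abstract does not say how the paper turns the score into a data-selection rule.

```python
import math


def ldof(point, others, k):
    """Local distance-based outlier factor of `point` relative to `others`.

    Assumes k >= 2 and that the k nearest neighbours are not all identical
    (otherwise the inner distance would be zero).
    """
    # k nearest neighbours of the point under Euclidean distance
    neighbours = sorted(others, key=lambda q: math.dist(point, q))[:k]
    # mean distance from the point to its k nearest neighbours
    knn_dist = sum(math.dist(point, q) for q in neighbours) / k
    # mean pairwise distance among the neighbours themselves
    pairs = [(a, b) for i, a in enumerate(neighbours) for b in neighbours[i + 1:]]
    inner_dist = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    return knn_dist / inner_dist


# Toy data (made up): four points on a unit square
cluster = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
inside = ldof((0.5, 0.5), cluster, k=4)   # point in the middle of the cluster
outside = ldof((5.0, 5.0), cluster, k=4)  # point far away from the cluster
```

A score below 1 means the point sits inside its neighbourhood, while a score well above 1 means it lies outside it. A selection rule in the spirit of the abstract could keep training samples whose score relative to the test data is low, but that thresholding step is a guess, not something the abstract states.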

2.

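Character modeling for segmentation-free Uyghur document recognition based on structural information sharing under low-data-resource conditions

姜志威 丁晓青 彭良瑞 刘长松《电子与信息学报》2015,37(9):2103-2109

Segmentation-free Uyghur document recognition effectively avoids character segmentation errors, but for new sample types with little data, existing models often struggle to reach high recognition performance. To address this, the paper proposes sharing the relatively stable character structure information across commonly used Uyghur fonts and using the Bootstrap method to make more efficient use of the samples. Experiments on real book samples show that, using new-type samples amounting to only about 1/5 of the original training set, the method reaches an average character recognition accuracy of 95.05% on the test set. Compared with the commonly used maximum a posteriori (MAP) estimation method, it also lowers the recognition error rate by a relative 55.76% to 63.84%. The method therefore effectively solves the Uyghur character modeling problem under low-data-resource conditions and achieves high-performance recognition of new sample types.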

3.

Continuous Mandarin speech recognition for Chinese language with large vocabulary based on segmental probability model

Shen J.-L. 《Vision, Image and Signal Processing, IEE Proceedings -》1998,145(5):309-315

The author presents a study of large-vocabulary continuous Mandarin speech recognition based on a segmental probability model (SPM) approach. The SPM was found to be very suitable for recognition of isolated Mandarin syllables, especially considering the monosyllabic structure of the Chinese language. To extend the model to continuous Mandarin speech recognition, a concatenated syllable matching (CSM) algorithm is first introduced in place of the conventional Viterbi search algorithm. To use the available training material efficiently, a training procedure is also proposed that re-estimates the SPM parameters using the maximum a posteriori (MAP) algorithm. Several special techniques integrating acoustic and linguistic knowledge are then developed to improve performance step by step. Preliminary experimental results show that the final achievable rate is as high as 91.62%, which corresponds to an 18.48% error rate reduction, with recognition more than three times faster than the well-studied subsyllable-based CHMM.

4.

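Detection of transgenic material based on actively labeled support vector machines and terahertz spectroscopy

潘学文 刘元明《光电子．激光》2018,29(10):1092-1100

To overcome the drawback that conventional support vector machines (SVMs) require training samples to be labeled manually in advance, an actively trained SVM model is proposed. The affinity propagation clustering algorithm is used to cluster unlabeled samples, and the training data of the current SVM are updated continually during the iterations, which both reduces the errors introduced by manual labeling and improves the recognition accuracy of the model as far as possible. The model is validated on terahertz spectral data of transgenic cotton. Experimental results show that the proposed method identifies the types of all tested samples with a recognition rate of 95.56%, with fewer misclassifications and a higher recognition rate than three other methods. The SVM based on affinity propagation clustering achieves a higher recognition rate and a lower misclassification rate than the conventional SVM, providing a fast, non-destructive new method for detecting transgenic material.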

5.

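Acoustic model training based on spatial correlation transformation

苏腾荣 吴及 王作英《电子与信息学报》2010,32(4):1003-1007

To make better use of the correlation between different speech units in speech recognition, this paper proposes a new model training algorithm within the spatial correlation transformation (SCT) framework, which uses the spatial correlation in the training data to re-estimate model parameters when training speaker-independent models. The algorithm applies the spatial correlation transformation to all training data, weakening the spatial correlation among the data so that the re-estimated model depends less on the training data and performs better. Experiments show that combining the SCT-based model training method with the SCT-based feature transformation method reduces the system's average error rate by a relative 18% compared with the baseline system.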

6.

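A study of context-dependent recognition units (triphones) in continuous Mandarin speech recognition  Cited by: 1 (self-citations: 0; citations by others: 1)

赵庆卫 王作英 陆大《电子学报》1999,27(6):79-82,117

This paper studies in detail how to build context-dependent recognition units effectively in Mandarin speech recognition, in order to address coarticulation in continuous speech.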

7.

Segmental probability distribution model approach for isolated Mandarin syllable recognition

Shen J.-L. 《Vision, Image and Signal Processing, IEE Proceedings -》1998,145(6):384-390

A segmental probability distribution model (SPDM) approach is proposed for fast and accurate recognition of isolated Mandarin syllables. Instead of the conventional frame-based approach such as the hidden Markov model (HMM), the model matching process in the proposed SPDM is evaluated segment-by-segment based on information-theoretic distance measurements. The training and recognition procedures for the SPDM are developed first. Several distance measurement criteria, including the Chernoff distance, Bhattacharyya distance, Patrick-Fisher (1969) distance, divergence and a Bayesian-like distance, are used, and formulations and comparative results are discussed. Experimental results show that, compared to the widely used sub-unit based continuous density HMM, the proposed method leads to an improvement of 15.27% in the error rate, with a 12-fold increase in recognition speed and less than three quarters of the mixture requirements.
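To make one of the listed distance measures concrete, the following sketch evaluates the closed-form Bhattacharyya distance between two Gaussians with diagonal covariance. The diagonal-covariance assumption, the function name `bhattacharyya_diag` and the toy parameters are choices made here for brevity; the abstract does not describe how the SPDM parameterizes its segment distributions.

```python
import math


def bhattacharyya_diag(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians.

    Per dimension: (m1 - m2)^2 / (8 * v) + 0.5 * ln(v / sqrt(v1 * v2)),
    where v = (v1 + v2) / 2 is the averaged variance.
    """
    total = 0.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        v = 0.5 * (v1 + v2)
        total += 0.125 * (m1 - m2) ** 2 / v            # mean-separation term
        total += 0.5 * math.log(v / math.sqrt(v1 * v2))  # variance-mismatch term
    return total


# Toy parameters (made up): same variances, means two units apart in one dimension
same = bhattacharyya_diag([0.0, 1.0], [1.0, 2.0], [0.0, 1.0], [1.0, 2.0])
shifted = bhattacharyya_diag([0.0], [1.0], [2.0], [1.0])
```

The distance is zero for identical distributions and symmetric in its two arguments, which is what makes it usable as a matching score between a test segment and a model segment.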

8.

Discriminative Boosting Algorithm for Diversified Front-End Phonotactic Language Recognition

Wei-Wei Liu Meng Cai Wei-Qiang Zhang Jia Liu Michael T. Johnson 《Journal of Signal Processing Systems》2016,82(2):229-239

Currently, phonotactic spoken language recognition (SLR) and acoustic SLR systems are the most widely used language recognition systems. Parallel phone recognition followed by vector space modeling (PPRVSM) is one typical phonotactic system for spoken language recognition. To achieve better performance, researchers have sought to extract more complementary information from the training data by using multiple language-specific phone recognizers, different acoustic models and different acoustic features. These methods perform well but usually incur a high computational cost and use only the complementary information in the training data. In this paper, we explore a novel approach to discriminative vector space model (VSM) training that uses a boosting framework to exploit the discriminative information in the test data effectively, in which an ensemble of VSMs is trained sequentially. The effectiveness of our boosting variation comes from its emphasis on working with the high-confidence test data to achieve discriminatively trained models. Our variant of boosting also uses the original training data in VSM training. The discriminative boosting algorithm (DBA) is applied to the National Institute of Standards and Technology (NIST) language recognition evaluation (LRE) 2009 task and shows performance improvements. The experimental results demonstrate that the proposed DBA achieves relative reductions in equal error rate (EER) of 1.8%, 11.72% and 15.35% for 30 s, 10 s and 3 s test utterances, respectively, compared with the baseline system.
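The equal error rate quoted in this abstract is the operating point at which the miss rate and the false alarm rate coincide, and a relative reduction compares two such rates. The sketch below estimates an EER with a plain threshold sweep and computes a relative reduction. The function names and the toy scores are hypothetical, and formal evaluations typically interpolate on the detection error trade-off curve rather than picking the nearest threshold.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping a decision threshold over all observed scores."""
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # miss: a target trial scored below the threshold
        p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
        # false alarm: a non-target trial scored at or above the threshold
        p_fa = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(p_miss - p_fa)
        if gap < best_gap:
            # keep the threshold where the two error rates are closest
            best_gap, eer = gap, (p_miss + p_fa) / 2
    return eer


def relative_reduction(baseline_rate, new_rate):
    """Fraction by which `new_rate` improves on `baseline_rate`."""
    return (baseline_rate - new_rate) / baseline_rate


# Toy detection scores (made up), higher means "more likely the target language"
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.6, 0.3, 0.2, 0.1]
toy_eer = equal_error_rate(targets, nontargets)
```

Note that a relative reduction is measured against the baseline rate itself, so the same relative figure can correspond to very different absolute changes depending on how high the baseline EER is.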

9.

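Front-end robustness for continuous speech recognition

胡丹 曾庆宁 龙超 黄桂敏《电视技术》2015,39(24):43-46

To address the low recognition rate in large-vocabulary continuous speech recognition, the paper proposes cascading speech enhancement in front of the recognition system. In the enhancement stage, spectral subtraction and the log minimum mean-square error algorithm (logmmse) are each combined with the minima controlled recursive averaging algorithm (imcra), which is used for noise estimation. The recognition system extracts features with Mel-frequency cepstral coefficients (MFCC) and uses hidden Markov models (HMMs) for training and recognition. Experimental results show that the proposed method raises the word recognition rate by up to 38.9% and sentence accuracy by 21.8%. The method is feasible and effective for large-vocabulary continuous speech recognition.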

10.

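A BiLSTM network model for speech recognition of civil aviation air-ground communications

邱意 贾桂敏 杨金锋 刘远庆《信号处理》2019,35(2):293-300

Air-ground communication is critical to civil aviation flight safety, but because it follows a special grammatical structure and pronunciation style, acoustic models built for everyday speech recognition cannot be applied effectively to it. For this specific context, this paper proposes a speech recognition method for civil aviation air-ground communications based on a bidirectional long short-term memory (BiLSTM) network. First, FBANK features are extracted from the speech as input, and a BiLSTM network is trained with connectionist temporal classification (CTC) as the objective function to obtain a BiLSTM/CTC model. Then the acoustic model, a language model and an air-ground communication lexicon are combined to perform recognition, and data augmentation and data transfer are used for enhanced training of the model to improve recognition performance. Experimental results show that the proposed method is suitable for air-ground communication speech recognition and that the data-augmented model effectively reduces the word error rate.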
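For readers unfamiliar with CTC, the piece that links per-frame network outputs to a label sequence is a simple collapsing rule: merge consecutive repeats, then drop the blank symbol. The sketch below shows only that rule, with a hypothetical `ctc_collapse` function and `_` chosen here as the blank. It is not the decoder described in the abstract, which also involves a language model and a lexicon.

```python
from itertools import groupby


def ctc_collapse(frame_labels, blank="_"):
    """Map a per-frame label path to an output sequence under the CTC rule."""
    # step 1: merge runs of identical consecutive labels
    merged = [label for label, _ in groupby(frame_labels)]
    # step 2: remove blanks, which only serve to separate genuine repeats
    return [label for label in merged if label != blank]


# Toy frame-level path (made up): blanks separate the two genuine "a" outputs
path = list("__aa_ab__b")
collapsed = ctc_collapse(path)
```

Because many frame-level paths collapse to the same label sequence, the CTC objective sums the probabilities of all of them during training, which is why no frame-level alignment of the training transcripts is needed.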

11.

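A static hand gesture recognition system for human-computer interaction  Cited by: 7 (self-citations: 1; citations by others: 7)

刘江华 陈佳品 程君实《红外与激光工程》2002,31(6):499-503

A static hand gesture recognition system for human-computer interaction is proposed and implemented. Gestures are segmented using a skin color model, and contours are described with Fourier descriptors. The classifier is the least squares support vector machine (LS-SVM), a support vector machine method that is particularly effective for small samples and has a bounded generalization error. An incremental training scheme for the LS-SVM is proposed that avoids time-consuming matrix inversion. For multi-class gesture recognition, multiple two-class LS-SVMs are combined using a DAG (directed acyclic graph). In recognizing 26 letter gestures, the LS-SVM achieves the highest recognition rate, 93.62%, compared with methods such as the multilayer perceptron and the radial basis function network.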
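The DAG combination of two-class classifiers mentioned in this abstract can be pictured as an elimination list: at each node one pairwise classifier is evaluated and the losing class is discarded. The sketch below assumes the common decision-DAG scheme and stands in for the trained LS-SVMs with a hypothetical callback `prefers_first`; the class ordering and the toy decision rule are likewise assumptions.

```python
def dag_classify(sample, classes, prefers_first):
    """Multi-class decision by successive pairwise elimination (decision DAG).

    `prefers_first(sample, a, b)` is a stand-in for the a-versus-b binary
    classifier and returns True when it votes for class `a`.
    """
    candidates = list(classes)
    while len(candidates) > 1:
        first, last = candidates[0], candidates[-1]
        if prefers_first(sample, first, last):
            candidates.pop()     # `last` is ruled out
        else:
            candidates.pop(0)    # `first` is ruled out
    return candidates[0]


# Toy stand-in (made up): classes are 1-D centroids, the nearer centroid wins
def nearer_centroid(sample, a, b):
    return abs(sample - a) <= abs(sample - b)


predicted = dag_classify(2.2, [0, 1, 2, 3, 4], nearer_centroid)
```

With N classes, each sample passes through N - 1 pairwise decisions even though N(N - 1)/2 classifiers are trained, so 26 letter gestures would need 25 evaluations per sample out of 325 classifiers.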

12.

Support Vector Machine Training for Improved Hidden Markov Modeling

Sloin A. Burshtein D. 《Signal Processing, IEEE Transactions on》2008,56(1):172-188

We present a discriminative training algorithm that uses support vector machines (SVMs) to improve the classification of discrete and continuous output probability hidden Markov models (HMMs). The algorithm uses a set of maximum-likelihood (ML) trained HMMs as a baseline system, and an SVM training scheme to rescore the results of the baseline HMMs. It turns out that the rescoring model can be represented as an unnormalized HMM. We describe two algorithms for training the unnormalized HMMs for both the discrete and continuous cases. One of the algorithms results in a single set of unnormalized HMMs that can be used in the standard recognition procedure (the Viterbi recognizer), as if they were plain HMMs. We use a toy problem and an isolated noisy digit recognition task to compare our new method to standard ML training. Our experiments show that SVM rescoring of hidden Markov models typically reduces the error rate significantly compared to standard ML training.

13.

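Mispronunciation detection with an extended recognition network constrained by initials and finals

董文伟 解焱陆 林举《信号处理》2020,36(6):977-983

Mispronunciation detection is an important component of computer-aided pronunciation training (CAPT). To improve performance on machine-assisted corpus annotation tasks and on mispronunciation detection tasks that lack annotated corpora, this paper proposes an extended recognition network method that applies initial-final constraints during decoding. The method replaces the free grammar loop used in conventional speech recognition decoding with an extended recognition network that combines initial-final alternation with a character-count constraint. It can detect mispronunciations over all phonemes and produces no insertion or deletion errors. Compared with a conventional extended recognition network, this constrained network does not require extensive corpus annotation and analysis. Compared with the conventional Goodness of Pronunciation (GOP) method, it not only detects whether a second-language learner's pronunciation is correct but also gives specific error feedback. Experimental results show that the proposed method outperforms the conventional GOP method on the error-finding task, with a false acceptance rate of 29.2%, a false rejection rate of 22.9% and a diagnostic accuracy of 76.6%. This is a relative improvement of 15.5% in diagnostic accuracy over GOP, and the model detects more mispronunciations than native Mandarin speakers without annotation experience.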

14.

Automatic context induction for tone model integration in Mandarin speech recognition

HUANG Hao LI Bing-hu 《中国邮电高校学报(英文版)》2012,19(1):94-100

Tone model (TM) integration is an important task for Mandarin speech recognition. It has been proved to be effective to use discriminatively trained scaling factors when integrating TM scores into multi...

15.

3D ear shape reconstruction and recognition for biometric applications

Siu-Yeung Cho 《Signal, Image and Video Processing》2013,7(4):609-618

This paper presents a new method based on a generalized neural reflectance (GNR) model for enhancing ear recognition under variations in illumination. It is based on training with a number of synthesized images of each ear, derived from images taken under a single lighting direction from a single view. Synthesizing images in this way makes it possible to build training cases for each ear under different known illumination conditions, from which ear recognition can be significantly improved. Our training algorithm recognizes the ear with a similarity measure on ear features that are first extracted by principal component analysis and then further processed by Fisher's discriminant analysis to obtain lower-dimensional patterns. Experimental results on our collected ear database show that lower error rates of individual and symmetry are achieved under different variations in lighting. Recognition performance with the proposed GNR model significantly exceeds that obtained without it.

16.

Discriminative tonal feature extraction method in Mandarin speech recognition

HUANG Hao ZHU Jie 《中国邮电高校学报(英文版)》2007,14(4):126-130

To utilize the supra-segmental nature of Mandarin tones, this article proposes a feature extraction method for hidden Markov model (HMM) based tone modeling. The method uses linear transforms to project F0 (fundamental frequency) features of neighboring syllables as compensations, and adds them to the original F0 features of the current syllable. The transforms are discriminatively trained using an objective function termed "minimum tone error", which is a smooth approximation of tone recognition accuracy. Experiments show that the new tonal features achieve a 3.82% improvement in tone recognition rate over the baseline, which uses maximum likelihood trained HMMs on the normal F0 features. Further experiments show that discriminative HMM training on the new features is 8.78% better than the baseline.

17.

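Face recognition algorithm based on low-rank eigenfaces and collaborative representation

杨明中 杨平先《液晶与显示》2017,32(8):650-655

In face recognition, face images are often affected by changes in expression, illumination, occlusion and pose. To address this, the paper proposes a face recognition algorithm based on low-rank eigenfaces and collaborative representation. The algorithm first uses low-rank matrix recovery to separate error images from the training sample images, then extracts features from the training samples and the error images separately to construct a feature dictionary, computes the collaborative representation coefficients of the test sample image under the feature dictionary, and finally classifies by reconstruction error. Experiments on the AR and ORL face databases show that the proposed algorithm effectively improves both recognition rate and recognition speed.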

18.

CMOS current-mode implementation of spatiotemporal probabilistic neural networks for speech recognition

Chung-Yu Wu Ron-Yi Liu 《The Journal of VLSI Signal Processing》1995,10(1):67-84

In this paper, a Spatiotemporal Probabilistic Neural Network (SPNN) is proposed for spatiotemporal pattern recognition. This new model is developed by applying the concept of the Gaussian density function to the network structure of the SPR (Spatiotemporal Pattern Recognition). The main advantages of this model include faster training and recall of patterns. In addition, the overall architecture is simple, modular, regular, locally connected, and suitable for VLSI implementation. A speaker-independent isolated-word (Mandarin digit) speech database is used as an example to demonstrate the superiority of the neural networks for spatiotemporal pattern recognition. The testing result, with a reduced error rate of 7%, shows that the SPNN is very attractive and effective for practical applications. The CMOS current-mode IC technology is used to implement the SPNN to achieve the objective of minimum classification error in a more direct manner. In this design, neural computation is performed in analog circuits while template information is stored in digital circuits. The prototype speech recognition processor for 12th-order LPC calculation is designed in 1.2 μm CMOS technology. HSPICE simulation results are also presented, which verify the function of the designed neural system.

19.

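An improved attention-based acoustic model using a minimal gated unit structure

龙星延 屈丹 张文林 徐思颖《信号处理》2018,34(6):739-748

Attention-based acoustic models with an encoder-decoder structure suffer from a large number of parameters, slow convergence and inaccurate alignment in noisy environments. To address these problems, minimal gated units are first introduced to reduce the number of model parameters and the training time. An adaptive-width window function is then adopted, and a pooling layer is added to the convolutional neural network that computes the attention coefficient features, further improving the accuracy of the alignment between phonemes and features and thereby the recognition accuracy. Experiments on English and Czech show that the improved model has fewer parameters and a lower phoneme error rate, and that it outperforms acoustic models based on hidden Markov models and on connectionist temporal classification.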

20.

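Maximum likelihood sparse coding for face recognition

单桂军《电视技术》2013,37(23)

Sparse coding (SRC) is a method used for face recognition. It represents a test image as a sparse linear combination of a set of training samples, with the fidelity of the representation measured by the L2 or L1 norm of the residual. This model assumes that the coding residual follows a Gaussian or Laplacian distribution, which in practice does not describe the coding error very accurately. This paper proposes a new sparse coding method that formulates a constrained regression problem. Maximum likelihood sparse coding (MSC) seeks the maximum likelihood estimate of the parameters of this model and is highly robust to outliers. Experiments on the Yale and ORL face databases demonstrate the method's effectiveness and robustness to face blur and to changes in illumination and expression.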