
1.

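Wafer surface defect detection and recognition based on local and nonlocal linear discriminant analysis and a dynamic ensemble of Gaussian mixture models Total citations: 1, self-citations: 0, citations by others: 1

余建波 卢笑蕾 宗卫周 《自动化学报》2016,42(1):47-59

In complex semiconductor manufacturing, wafers pass through many intricate steps such as thin-film deposition, etching, and polishing, and any abnormal fluctuation in the process can produce wafer defects. Defect patterns on the wafer surface usually reflect specific abnormalities in the manufacturing process, so detecting and recognizing surface defects on the production line makes it possible to identify the fault source promptly, adjust the process online, and reduce wafer yield loss. This paper proposes an online wafer surface defect detection and recognition model based on a manifold learning algorithm and a dynamic ensemble of Gaussian mixture models. First, the model develops a new manifold learning algorithm, local and nonlocal linear discriminant analysis (LNLDA), which fuses local/nonlocal information and local/nonlocal penalty information of the data to extract the intrinsic manifold structure of high-dimensional wafer feature data, maximizing the low-dimensional mapping distance between samples from different clusters while preserving the low-dimensional geometric structure within each cluster. To handle the randomness and complexity of defects arising on the line, the model builds a Gaussian mixture model (GMM) for each wafer defect pattern and proposes an online detection and recognition method based on a dynamic ensemble of GMMs. The proposed model was successfully applied to online wafer surface defect detection and recognition in a real semiconductor manufacturing process, and experimental results on the WM-811K wafer database confirm its effectiveness and practicality.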
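The idea of one GMM per defect pattern can be illustrated with a minimal sketch: score a feature vector under each class's mixture, pick the best-scoring class, and reject the sample when no class explains it well. This is not the paper's dynamic ensemble or its LNLDA projection; the diagonal covariances, the class labels, the parameter values, and the `reject_below` threshold are all assumptions made for illustration.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, components):
    """Log-likelihood of x under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in components]
    peak = max(terms)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def recognize(x, class_gmms, reject_below):
    """Return the best-scoring class label, or None if every class scores too low."""
    scores = {label: gmm_log_likelihood(x, gmm) for label, gmm in class_gmms.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= reject_below else None

# Hypothetical 2-D features and hand-set parameters (not taken from the paper).
CLASS_GMMS = {
    "center": [(0.6, (0.0, 0.0), (1.0, 1.0)), (0.4, (1.0, 1.0), (0.5, 0.5))],
    "edge_ring": [(1.0, (6.0, 6.0), (1.0, 1.0))],
}
```

In practice the mixture parameters would be fitted with EM on labeled feature vectors, and the rejection threshold would be tuned on held-out data.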

2.

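Application of deep neural networks to Uyghur large-vocabulary continuous speech recognition

麦麦提艾力·吐尔逊 戴礼荣 《数据采集与处理》2015,30(2):365-371

This paper studies two ways of applying deep neural networks effectively to acoustic modeling for Uyghur large-vocabulary continuous speech recognition. In the first, a deep neural network and a hidden Markov model form a hybrid architecture (deep neural network-hidden Markov model, DNN-HMM), with the network replacing the Gaussian mixture model in computing state output probabilities. In the second, a deep neural network serves as a front-end acoustic feature extractor that produces bottleneck (BN) features, supplying more effective acoustic features to the conventional GMM-HMM (Gaussian mixture model-HMM) acoustic modeling framework (BN-GMM-HMM). Experimental results show that, compared with the GMM-HMM baseline, the DNN-HMM and BN-GMM-HMM models reduce the word error rate by 8.84% and 5.86%, respectively; both methods yield substantial performance gains.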

3.

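GMM-based fault detection for batch processes Total citations: 3, self-citations: 0, citations by others: 3

王静 胡益 侍洪波 《自动化学报》2015,41(5):899-905

When a batch process is divided into its multiple operating phases, outliers and noise often interfere and degrade modeling accuracy. To address this problem, a new modeling method is proposed: principal component analysis-multiple Gaussian mixture model (PCA-MGMM). First, the shortest-length method is used to equalize the data lengths, and a combination of different unfolding methods eliminates the data pre-estimation problem. Principal component analysis then projects the data into a low-dimensional subspace that is more sensitive to faults, yielding the principal components while eliminating the interference of outliers and noise. An improved Gaussian mixture model (GMM) algorithm clusters the principal components of each phase, reducing computation while automatically obtaining the optimal Gaussian components and the corresponding statistical distribution parameters. Finally, the local indices are fused into a global probabilistic monitoring index, achieving continuous online monitoring. A simulation study of a real semiconductor manufacturing process verifies the effectiveness of the proposed method.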

4.

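Unsupervised query-by-example spoken term detection based on acoustic segment models

李勃昊 张连海 郑永军 《数据采集与处理》2016,31(2):407-414

This paper proposes an unsupervised query-by-example spoken term detection method based on acoustic segment models. The method first uses a Gaussian mixture model (GMM) to convert the spectral parameters of the training data into posterior probability feature vectors and applies hierarchical clustering to determine boundary information in the posteriors, yielding acoustic segments. It then clusters and labels the segments with the k-means algorithm to build posterior-based acoustic segment models. At retrieval time, the models' decoded sequences for the query example and the searched documents replace the measurement matrix to cut retrieval time. Query terms are retrieved by dynamic matching based on minimum edit distance, whose cost function is corrected by the model similarity distance matrix. Experimental results show that the proposed method performs better than GMM and conventional acoustic segment models and that retrieval speed improves markedly.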
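The matching step, a minimum edit distance whose substitution cost comes from a model-to-model distance matrix, can be sketched as follows. This is a whole-sequence illustration under stated assumptions, not the paper's retrieval procedure: the segment labels, the `MODEL_DISTANCE` values, and the fixed insertion/deletion cost are hypothetical.

```python
def weighted_edit_distance(query, doc, sub_cost, indel_cost=1.0):
    """Minimum edit distance where replacing label a with b costs sub_cost(a, b)."""
    rows, cols = len(query) + 1, len(doc) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * indel_cost  # delete every query label
    for j in range(1, cols):
        dist[0][j] = j * indel_cost  # insert every document label
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + indel_cost,
                dist[i][j - 1] + indel_cost,
                dist[i - 1][j - 1] + sub_cost(query[i - 1], doc[j - 1]),
            )
    return dist[-1][-1]

# Hypothetical distances between acoustic segment models (symmetric).
MODEL_DISTANCE = {("a", "b"): 0.2, ("a", "c"): 0.9, ("b", "c"): 0.7}

def sub_cost(x, y):
    """Substitution cost: 0 for identical labels, else the model distance."""
    if x == y:
        return 0.0
    return MODEL_DISTANCE.get((x, y), MODEL_DISTANCE.get((y, x), 1.0))
```

With these costs, confusing two acoustically similar segment models is penalized far less than confusing dissimilar ones, which is the effect the corrected cost function aims for.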

5.

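A speaker verification system using prosodic features

龙艳花 郭武 戴礼荣 《数据采集与处理》2010,25(1)

In text-independent speaker recognition, prosodic features are used because they are insensitive to channel and environmental noise. This paper models prosodic parameters with a support vector machine based on Gaussian mixture model supervectors and applies within-class covariance feature mapping to the model supervectors. The resulting single system improves on the conventional Gaussian mixture model-universal background model (GMM-UBM) baseline by 40.19%. Fusing this method with the paper's verification system based on acoustic cepstral parameters improves overall recognition performance by 9.25%. On the NIST (National Institute of Standards and Technology) 2006 speaker test database, the fused system achieves an equal error rate of 4.9%.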
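The equal error rate quoted above is the operating point where the false acceptance rate equals the false rejection rate. A minimal sketch of how it can be approximated from trial scores follows; the threshold sweep over observed scores and the example scores are assumptions for illustration, not the NIST evaluation protocol.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping thresholds over all observed scores."""
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(target_scores) | set(impostor_scores)):
        # False rejection: target trials scoring below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # False acceptance: impostor trials scoring at or above the threshold.
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer

# Hypothetical verification scores (higher means "more likely the claimed speaker").
TARGETS = [0.9, 0.8, 0.7, 0.3]
IMPOSTORS = [0.6, 0.4, 0.2, 0.1]
```

For these made-up scores, a threshold of 0.6 rejects one of four targets and accepts one of four impostors, so both error rates are 25%.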

6.

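Unsupervised texture segmentation based on the geodesic active region model Total citations: 8, self-citations: 0, citations by others: 8

何源 罗予频 胡东成 《软件学报》2007,18(3):592-599

An unsupervised texture segmentation algorithm based on curve evolution is proposed. After texture features are extracted with a Gabor wavelet bank, a multidimensional feature image is obtained. To avoid applying the curve evolution model directly in the multidimensional space, a Gaussian mixture model (GMM) describes the probability distribution of the feature image, and the region and boundary information of each pixel is then computed from the distribution model. The two kinds of information are combined, and the geodesic active region model is applied to obtain the final segmentation. Experimental results show that the method produces good region boundaries.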

7.

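Mandarin digit speech recognition based on the locally linear embedding algorithm

高文曦 于凤芹 《计算机工程与应用》2012,48(31):105-107,155

Speech signals have high dimensionality once transformed to the frequency domain, and manifold learning can automatically discover the regularities of latent low-dimensional structure in high-dimensional data. This paper therefore proposes reducing the dimensionality of the high-dimensional data with manifold learning for Mandarin digit speech recognition. The locally linear embedding algorithm extracts low-dimensional manifold structure features from the high-dimensional frequency-domain speech data, and the low-dimensional data are then fed to a dynamic time warping recognizer. Simulation results show that, compared with the commonly used MFCC acoustic features, Mandarin digit recognition with locally linear embedding uses fewer dimensions, raises the recognition rate by 1.2%, and effectively speeds up recognition.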
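The recognizer at the end of this pipeline, dynamic time warping against stored templates, can be sketched in a few lines. The sketch assumes the features have already been reduced to low-dimensional vectors; the template labels and the one-dimensional values are hypothetical, and the locally linear embedding step is not shown.

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    def frame_dist(u, v):
        # Euclidean distance between two frames.
        return sum((p - q) ** 2 for p, q in zip(u, v)) ** 0.5

    inf = float("inf")
    cost = [[inf] * (len(seq_b) + 1) for _ in range(len(seq_a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(seq_a) + 1):
        for j in range(1, len(seq_b) + 1):
            step = frame_dist(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three allowed warping moves.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[-1][-1]

def recognize_digit(features, templates):
    """Return the template label with the smallest DTW distance to the input."""
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))

# Hypothetical one-dimensional templates for two spoken digits.
TEMPLATES = {
    "yi": [(0.0,), (1.0,), (2.0,)],
    "er": [(5.0,), (4.0,), (3.0,)],
}
```

Because the warping path may repeat frames, an utterance spoken more slowly than its template still aligns at zero cost, which is why DTW suits isolated-word tasks with variable speaking rate.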

8.

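Gaussian mixture background modeling in the compressed sensing domain for infrared images of dynamic scenes Total citations: 1, self-citations: 0, citations by others: 1

王传云 秦世引 《自动化学报》2018,44(7):1212-1226

To build background models for infrared images of dynamic scenes, a background modeling method based on Gaussian mixture models (GMM) in the compressed sensing (CS) domain is proposed. Rather than building a GMM for every pixel, the method builds GMMs for the compressed sensing measurements of local image regions. 1) Corner features are extracted from infrared image contours to estimate the relative motion parameters between adjacent frames, which are used to rectify and register the images; 2) each frame is gridded into a suitable number of local sub-images, and the image sequence is used to build a CS-domain Gaussian mixture background model for each sub-image; 3) a sparse dictionary is trained by subspace learning, and sub-images that may contain targets are selectively sparsely reconstructed by subspace pursuit; 4) foreground targets are detected by background subtraction. Experiments on the infrared image datasets CDnet2014 and VIVID PETS2005 show that the method builds effective background models for dynamic-scene infrared images, is robust to dynamic scene changes and background disturbances during imaging, and has clear advantages over comparable algorithms in recall, precision, F-measure, and processing speed.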

9.

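Point set registration using a t-distribution mixture model with local spatial constraints

周志勇 李莉华 郑健 蒯多杰 胡粟 张涛 《自动化学报》2014,40(4):683-696

Non-rigid point set registration algorithms based on the Gaussian mixture model (GMM) are vulnerable to heavy-tailed points and outliers, so a non-rigid point set registration algorithm using a t-distribution mixture model with local spatial constraints is proposed. The Gaussian mixture model is generalized to a t-distribution mixture model within the expectation maximization (EM) framework. A Dirichlet distribution serves as the prior weight of the floating points, with Dirichlet parameters constructed to carry local spatial constraints. The EM algorithm yields a closed-form solution for the registration parameters, and the degrees of freedom of the floating points are computed to change their probability density distribution and avoid errors in estimating the outlier level. Experiments show that the proposed algorithm has small registration error, good robustness, and strong resistance to interference.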
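For reference, the step from a Gaussian mixture to a t-distribution mixture can be written in standard textbook form. The notation below (mixing weights $\pi_k$, means $\mu_k$, scale matrices $\Sigma_k$, degrees of freedom $\nu_k$, dimension $d$) is an assumption and may differ from the paper's; the spatially constrained Dirichlet prior is not shown.

```latex
% Gaussian mixture density
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)

% Student's t mixture density: each component gains a degrees-of-freedom parameter
p(x) = \sum_{k=1}^{K} \pi_k \, \mathrm{St}(x \mid \mu_k, \Sigma_k, \nu_k),
\qquad
\mathrm{St}(x \mid \mu, \Sigma, \nu)
  = \frac{\Gamma\!\left(\frac{\nu + d}{2}\right)}
         {\Gamma\!\left(\frac{\nu}{2}\right) (\nu\pi)^{d/2} \, |\Sigma|^{1/2}}
    \left[ 1 + \frac{(x - \mu)^{\top} \Sigma^{-1} (x - \mu)}{\nu} \right]^{-\frac{\nu + d}{2}}
```

Small $\nu$ gives heavier tails, so outlying points pull the component parameters less than under a Gaussian; as $\nu \to \infty$ each component tends to the Gaussian case.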

10.

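Research progress on key technologies for low-resource speech recognition

刘加 张卫强 《数据采集与处理》2017,32(2):205-220

Low-resource speech recognition is one of the hot topics in today's speech community and one of the major challenges facing multilingual and minority-language speech recognition in practice. This paper reviews and summarizes the history and current state of low-resource speech recognition, focusing on progress in several key technologies for acoustic features, acoustic models, and language models. Topics covered include articulatory features, multilingual bottleneck features, subspace Gaussian mixture models, convolutional neural network acoustic models, and recurrent neural network language models. The paper then introduces the Open Keyword Search (OpenKWS) evaluation for low-resource speech recognition and closes with a summary and outlook.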

11.

Phoneme class based feature adaptation for mismatch acoustic modeling and recognition of distant noisy speech

Seçkin Uluskan Abhijeet Sangwan John H. L. Hansen 《International Journal of Speech Technology》2017,20(4):799-811

Distant speech capture in lecture halls and auditoriums offers unique challenges in algorithm development for automatic speech recognition. In this study, a new adaptation strategy for distant noisy speech is created by means of phoneme classes. Unlike previous approaches which adapt the acoustic model to the features, the proposed phoneme-class based feature adaptation (PCBFA) strategy adapts the distant data features to the present acoustic model which was previously trained on close microphone speech. The essence of PCBFA is to create a transformation strategy which makes the distributions of phoneme-classes of distant noisy speech similar to those of a close talk microphone acoustic model in a multidimensional MFCC space. To achieve this task, phoneme-classes of distant noisy speech are recognized via artificial neural networks. PCBFA is the adaptation of features rather than adaptation of acoustic models. The main idea behind PCBFA is illustrated via conventional Gaussian mixture model–Hidden Markov model (GMM–HMM) although it can be extended to new structures in automatic speech recognition (ASR). The new adapted features together with the new and improved acoustic models produced by PCBFA are shown to outperform those created only by acoustic model adaptations for ASR and keyword spotting. PCBFA offers a new powerful understanding in acoustic-modeling of distant speech.

12.

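Dimensional feature extraction and recognition of speech emotion

李嘉 《数据采集与处理》2012,27(3):389-393

This paper studies the relationship between the dimensional space model of emotion and the acoustic features of speech, as well as automatic speech emotion recognition. The dimensional space model of basic emotions is introduced, emotional features corresponding to arousal and valence are extracted, and global statistical features are used to reduce the effect of textual differences on the emotional features. Recognition of emotional states including anger, happiness, sadness, and calm is studied: Gaussian mixture models are used to model the four basic emotions, and the optimal number of mixture components is set experimentally, so that the probability distributions of the four emotions in feature space are fitted well. Experimental results show that the selected speech features are suitable for recognizing basic emotion categories, that Gaussian mixture models model the emotions well, and that emotional features along the valence dimension of the two-dimensional emotion space play an important role in speech emotion recognition.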

13.

Emotional speech feature normalization and recognition based on speaker-sensitive feature clustering

Chengwei Huang Baolin Song Li Zhao 《International Journal of Speech Technology》2016,19(4):805-816

In this paper we propose a feature normalization method for speaker-independent speech emotion recognition. The performance of a speech emotion classifier largely depends on the training data, and a large number of unknown speakers may pose a great challenge. To address this problem, first, we extract and analyse 481 basic acoustic features. Second, we use principal component analysis and linear discriminant analysis jointly to construct the speaker-sensitive feature space. Third, we classify the emotional utterances into pseudo-speaker groups in the speaker-sensitive feature space by using fuzzy k-means clustering. Finally, we normalize the original basic acoustic features of each utterance based on its group information. To verify our normalization algorithm, we adopt a Gaussian mixture model based classifier for the recognition test. The experimental results show that our normalization algorithm is effective on our locally collected database, as well as on the eNTERFACE’05 Audio-Visual Emotion Database. The emotional features obtained using our method are robust to speaker changes, and an improved recognition rate is observed.

14.

Mismatched feature detection with finer granularity for emotional speaker recognition

Li Chen Ying-chun Yang Zhao-hui Wu 《浙江大学学报:C卷英文版》2014,15(10):903-916

The shapes of speakers' vocal organs change under their different emotional states, which leads to the deviation of the emotional acoustic space of short-time features from the neutral acoustic space and thereby the degradation of the speaker recognition performance. Features deviating greatly from the neutral acoustic space are considered as mismatched features, and they negatively affect speaker recognition systems. Emotion variation produces different feature deformations for different phonemes, so it is reasonable to build a finer model to detect mismatched features under each phoneme. However, given the difficulty of phoneme recognition, three sorts of acoustic class recognition (phoneme classes, Gaussian mixture model (GMM) tokenizer, and probabilistic GMM tokenizer) are proposed to replace phoneme recognition. We propose feature pruning and feature regulation methods to process the mismatched features to improve speaker recognition performance. As for the feature regulation method, a strategy of maximizing the between-class distance and minimizing the within-class distance is adopted to train the transformation matrix to regulate the mismatched features. Experiments conducted on the Mandarin affective speech corpus (MASC) show that our feature pruning and feature regulation methods increase the identification rate (IR) by 3.64% and 6.77%, compared with the baseline GMM-UBM (universal background model) algorithm. Also, corresponding IR increases of 2.09% and 3.32% can be obtained with our methods when applied to the state-of-the-art algorithm i-vector.

15.

Stereo hidden Markov modeling for noise robust speech recognition

Xiaodong Cui Mohamed Afify Yuqing Gao Bowen Zhou 《Computer Speech and Language》2013,27(2):407-419

This paper investigates a noise robust technique for automatic speech recognition which exploits hidden Markov modeling of stereo speech features from clean and noisy channels. The HMM trained this way, referred to as stereo HMM, has in each state a Gaussian mixture model (GMM) with a joint distribution of both clean and noisy speech features. Given the noisy speech input, the stereo HMM gives rise to a two-pass compensation and decoding process where MMSE denoising based on N-best hypotheses is first performed and followed by decoding the denoised speech in a reduced search space on lattice. Compared to the feature space GMM-based denoising approaches, the stereo HMM is advantageous as it has finer-grained noise compensation and makes use of information of the whole noisy feature sequence for the prediction of each individual clean feature. Experiments on large vocabulary spontaneous speech from speech-to-speech translation applications show that the proposed technique outperforms its feature space counterpart in noisy conditions while still maintaining decent performance in clean conditions.

16.

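Automatic segmentation of speech unit boundaries based on HMMs Total citations: 1, self-citations: 0, citations by others: 1

王丽娟 曹志刚 《数据采集与处理》2005,20(4):381-384

A forced alignment method based on hidden Markov models (HMM) is used to segment speech unit boundaries for text-to-speech (TTS) systems. To improve segmentation accuracy, this paper optimizes the feature selection, model parameters, and model clustering of the HMMs. Experiments show that 12-dimensional static Mel-frequency cepstral coefficients (MFCC) are the best speech features; that the state models in the HMM should use a single Gaussian; and that, for a speaker-dependent HMM, about 3,000 tied state models generated by classification and regression tree (CART) clustering is optimal. In phoneme boundary segmentation experiments on an English speech corpus, segmentation accuracy rose from 77.3% before model optimization to 85.4%.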

17.

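Hidden Markov models based on mixtures of factor analyzers

王新民 姚天任 《计算机工程与应用》2005,41(24):50-52

The two main shortcomings of classical hidden Markov models in speech recognition are the "discrete state assumption" and the "independent distribution assumption". The former ignores the non-stationarity of speech signals, and the latter ignores their correlation. This paper applies mixtures of factor analyzers to speech modeling, proposes a hidden Markov model framework based on mixtures of factor analyzers, and represents it graphically with dynamic Bayesian networks. The framework not only resolves the above problems in theory but also offers many options for speech modeling. The statistical acoustic models in wide use today can all be viewed as special cases of this framework.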

18.

A Mixture of Recurrent Neural Networks for Speaker Normalisation

Edmondo Trentin Diego Giuliani 《Neural computing & applications》2001,10(2):120-135

In spite of recent advances in automatic speech recognition, the performance of state-of-the-art speech recognisers fluctuates depending on the speaker. Speaker normalisation aims at the reduction of differences between the acoustic space of a new speaker and the training acoustic space of a given speech recogniser, improving performance. Normalisation is based on an acoustic feature transformation, to be estimated from a small amount of speech signal. This paper introduces a mixture of recurrent neural networks as an effective regression technique to approach the problem. A suitable Viterbi-based time alignment procedure is proposed for generating the adaptation set. The mixture is compared with linear regression and single-model connectionist approaches. Speaker-dependent and speaker-independent continuous speech recognition experiments with a large vocabulary, using Hidden Markov Models, are presented. Results show that the mixture improves recognition performance, yielding a 21% relative reduction of the word error rate, i.e. comparable with that obtained with model-adaptation approaches.

19.

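Improving the phoneme recognition rate by combining SGMM and DNN

贾兵兵 曹辉 秦驰杰 《计算机工程与应用》2019,55(24):117-121

To reduce the phoneme recognition error rate of acoustic features in speech recognition systems and improve system performance, a feature extraction method combining a subspace Gaussian mixture model (SGMM) with a deep neural network (DNN) is proposed. The parameter scale of the SGMM is analyzed, and after its computational complexity is reduced, it is cascaded with the DNN to further raise the phoneme recognition rate. Speech data that has undergone a nonlinear feature transformation is fed into the model, the best configuration of the DNN structure is found, and a network model with more reliable learning and training is built for feature extraction. System performance is judged by comparing phoneme recognition error rates. Simulation results show that features extracted by this system are clearly better than those of conventional acoustic models.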