Similar Literature
20 similar documents found.
1.
This paper presents a speaker-independent isolated-word speech recognition simulation system based on a hidden Markov model (HMM) with continuous M-component Gaussian mixture densities. By studying how the number of model states, the amount of training, and the choice of feature parameters affect recognition accuracy, it is found that with 4 HMM states, 20 training iterations, and a 48-dimensional combined LPCC and MFCC feature vector, the system reaches a 90% recognition rate on isolated Mandarin words.
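As an illustration of the kind of model described above, here is a minimal Python sketch that trains one continuous-density GMM-HMM per word with the hmmlearn library and classifies by maximum log-likelihood; apart from the 4 states and 20 training iterations mentioned in the abstract, the library choice, mixture count, and function names are assumptions, not the authors' implementation.

```python
# Minimal sketch of per-word GMM-HMM training for isolated-word recognition.
# Assumes features have already been extracted as 48-dim LPCC+MFCC frames;
# hmmlearn's GMMHMM is used here as a stand-in for the paper's HMM toolkit.
import numpy as np
from hmmlearn.hmm import GMMHMM

def train_word_model(feature_seqs, n_states=4, n_mix=3, n_iter=20):
    """Train one GMM-HMM on all utterances of a word.

    feature_seqs: list of (n_frames, 48) arrays of LPCC+MFCC features.
    """
    X = np.vstack(feature_seqs)                    # stack frames of all utterances
    lengths = [len(seq) for seq in feature_seqs]   # per-utterance frame counts
    model = GMMHMM(n_components=n_states, n_mix=n_mix,
                   covariance_type="diag", n_iter=n_iter)
    model.fit(X, lengths)
    return model

def recognize(feature_seq, word_models):
    """Pick the word whose HMM gives the highest log-likelihood."""
    scores = {word: m.score(feature_seq) for word, m in word_models.items()}
    return max(scores, key=scores.get)
```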

2.
This paper discusses speaker-independent Mandarin speech recognition based on a neural network model. Using speech data from 24 speakers (12 for training, 12 for testing), experiments on the ten Mandarin digits and ten isolated Chinese characters achieved recognition rates of 96.3% (digits) and 97.2% (characters).

3.
李彬, 贺前华, 齐凡. 《电子工程师》, 2006, 32(11): 44-47
This paper presents the architecture of a small-vocabulary isolated-word speech recognition system built on the 32-bit open-source OpenRISC1200 microprocessor core. Following a hardware/software co-design approach, the computational load of each stage of isolated-word recognition is analyzed and compared so that hardware and software resources can be allocated sensibly, and a dynamic time warping hardware architecture suitable for FPGA (field-programmable gate array) implementation is proposed, greatly shortening recognition response time. The system also has cost and intellectual-property advantages over popular cores such as ARM and the 8051. Experimental results show that, in specific application scenarios, the average recognition response time for a 100-phrase vocabulary is under 2 s, with a speaker-dependent recognition rate above 95% and a speaker-independent rate above 87%.

4.
Speech recognition is an important technical means of human-computer interaction. Depending on practical needs and application scenarios, it can be divided into isolated-word versus continuous recognition, and speaker-dependent versus speaker-independent recognition. The main goals pursued in speech recognition are high recognition accuracy, real-time operation, and large vocabulary size.

5.
Based on a weighted cepstral distance measure, this paper proposes a functional-link neural network for speaker-independent speech recognition. Compared with a multilayer perceptron, the network not only achieves a higher recognition rate but also greatly reduces training time. Experiments on the ten Mandarin digits with speech data from 6 speakers (3 male, 3 female) gave a correct recognition rate of 93.7%.

6.
A real-time speech recognition system based on the TMS320C54x DSP (cited 6 times in total: 0 self-citations, 6 by others)
This paper describes a speaker-independent, small-vocabulary, isolated-word speech recognition system. It uses an HMM-based endpoint detection method for the speech signal and a self-learning VQ/HMM recognition algorithm, with the hardware designed around the high-speed TMS320C54x DSP chip to achieve real-time speech recognition.

7.
Speech recognition enables human-computer interaction and voice control, and is widely used in fields such as industrial control and consumer electronics. Taking the physiological structure of human speech production into account, this work uses LPMCC (Mel-transformed LPC cepstral coefficients) as the feature vector and a dynamic programming algorithm as the core recognition algorithm, implementing a high-performance real-time speaker-dependent isolated-word recognition system on the TMS320VC5402 chip.

8.
This paper describes a speaker-independent, small-vocabulary, isolated-word speech recognition system. It uses an HMM-based endpoint detection method for the speech signal and a self-learning VQ/HMM recognition algorithm, with the hardware designed around the high-speed TMS320C54x DSP chip to achieve real-time speech recognition.

9.
To address the algorithmic complexity and large storage requirements of traditional speaker-dependent speech recognition, an improved speaker-dependent recognition system based on dynamic time warping (DTW) is proposed. After a detailed comparison of feature extraction methods, Mel-frequency cepstral coefficients (MFCC) are chosen as the recognition features, effectively accounting for the human ear's differing sensitivity to different signals. Isolated-word recognition of several digits is implemented with the VoiceBox speech toolbox in MATLAB; recognition speed improves by about 30% and the recognition rate exceeds 95%. Simulation results show that the system is algorithmically simple and achieves a high recognition rate, making it a simple and effective speech recognition method.
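For reference, the following is a minimal Python analogue of the MFCC-plus-DTW matcher described above (the original work used MATLAB and the VoiceBox toolbox); the librosa call, the 13-coefficient MFCC setting, and the helper names are illustrative assumptions.

```python
# Minimal MFCC + DTW template matcher, as a Python analogue of the
# MATLAB/VoiceBox pipeline described above (not the authors' code).
import numpy as np
import librosa

def mfcc_features(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=None)                        # read waveform
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T   # (frames, n_mfcc)

def dtw_distance(a, b):
    """Classic DTW with Euclidean frame distance between feature sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def recognize(test_feat, templates):
    """templates: dict mapping digit label -> reference MFCC sequence."""
    return min(templates, key=lambda w: dtw_distance(test_feat, templates[w]))
```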

10.
A parallel hidden Markov model suited to speaker-independent speech recognition (cited 2 times in total: 0 self-citations, 2 by others)
To suit speaker-independent speech recognition, a parallel HMM (PHMM) composed of multiple parallel Markov chains is proposed. It merges the per-class templates built in classification-based speech recognition, improving recognition performance. Crossings are allowed between chains, so the merged templates share states; the PHMM also completes clustering automatically during training, and the output for a test utterance comes from all classes, so no separate cluster analysis or class decision is needed. All of this reduces storage and computation. Experiments on speaker-independent isolated Mandarin digits show that the PHMM improves both recognition performance and noise robustness compared with the conventional CHMM.

11.
Current voice cloning methods based on pretrained speaker encoders can synthesize speech with fairly high timbre similarity for speakers seen during training, but for unseen speakers the cloned speech still differs noticeably in timbre from the real speaker. To address this, this paper proposes a timbre-consistent speaker feature extraction method. It uses the state-of-the-art speaker recognition model TitaNet as the basic architecture of the speaker encoder and, based on the prior knowledge that a speaker's timbre remains constant within a speech segment, introduces a timbre-consistency constraint loss for training the speaker encoder, so as to extract more accurate speaker timbre features and improve the robustness and generalization of the speaker representation. The extracted features are then applied to the end-to-end speech synthesis model VITS for voice cloning. Experimental results show that the proposed method outperforms the baseline system on two public speech datasets and improves the timbre similarity of cloned speech for unseen speakers.
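The abstract does not give the exact form of the timbre-consistency loss, so the snippet below is only a plausible sketch: it penalizes the cosine distance between chunk-level speaker embeddings and their utterance-level mean. The loss form, the 0.1 weight, and the variable names are assumptions, not the paper's definition.

```python
# Hypothetical sketch of a timbre-consistency loss: embeddings of different
# chunks cut from the same utterance are pushed toward each other. The actual
# loss used with the TitaNet encoder in the paper may differ.
import torch
import torch.nn.functional as F

def timbre_consistency_loss(chunk_embeddings: torch.Tensor) -> torch.Tensor:
    """chunk_embeddings: (n_chunks, dim) speaker embeddings of chunks from one
    utterance; returns mean (1 - cosine similarity) to the utterance mean."""
    center = chunk_embeddings.mean(dim=0, keepdim=True)          # utterance centroid
    sims = F.cosine_similarity(chunk_embeddings, center, dim=1)  # per-chunk similarity
    return (1.0 - sims).mean()

# Example combination with a speaker-classification loss (0.1 is an assumed weight):
# loss = ce_loss + 0.1 * timbre_consistency_loss(chunk_emb)
```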

12.
Speaker adaptation techniques are generally used to reduce speaker differences in speech recognition. In this work, we focus on the features fitted to a linear regression‐based speaker adaptation. These are obtained by feature transformation based on independent component analysis (ICA), and the feature transformation matrices are estimated from the training data and adaptation data. Since the adaptation data is not sufficient to reliably estimate the ICA‐based feature transformation matrix, it is necessary to adjust the ICA‐based feature transformation matrix estimated from a new speaker utterance. To cope with this problem, we propose a smoothing method through a linear interpolation between the speaker‐independent (SI) feature transformation matrix and the speaker‐dependent (SD) feature transformation matrix. From our experiments, we observed that the proposed method is more effective in the mismatched case. In the mismatched case, the adaptation performance is improved because the smoothed feature transformation matrix makes speaker adaptation using noisy speech more robust.
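As a rough illustration of the smoothing step, the sketch below linearly interpolates between a speaker-independent and a speaker-dependent feature transformation matrix; the interpolation weight and matrix shapes are assumptions, and the ICA estimation of the matrices themselves is not shown.

```python
# Sketch of the matrix-smoothing idea: interpolate between the
# speaker-independent (SI) and speaker-dependent (SD) ICA-based feature
# transformation matrices. The weight alpha is an assumed tuning parameter.
import numpy as np

def smooth_transform(W_si: np.ndarray, W_sd: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Return alpha * W_si + (1 - alpha) * W_sd, the smoothed transform."""
    assert W_si.shape == W_sd.shape
    return alpha * W_si + (1.0 - alpha) * W_sd

# Applying the smoothed transform to a feature frame x (column vector):
# y = smooth_transform(W_si, W_sd, alpha=0.7) @ x
```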

13.
The impact of speech coding on automatic speaker verification is investigated. Two different coders, classifiers and parametric representations of speech are considered. It is found that coding degrades mean speaker verification accuracy. Classifications of speech frames, for a given speaker, fluctuate by up to 30% relative to the uncoded case.

14.
Text-independent speaker identification based on multi-template vector quantization under noisy backgrounds (cited 1 time in total: 0 self-citations, 1 by others)
沈春华, 徐柏龄. 《信号处理》, 2001, 17(2): 185-188
In practical applications of speaker identification systems, the fundamental cause of reduced recognition accuracy is noise, which makes the test conditions inconsistent with the training conditions. Targeting the additive background noise common in real environments, this paper proposes training each speaker's model with noisy speech to which noise of different types and signal-to-noise ratios has been added, so that each speaker has multiple templates. Experimental results show that this method effectively improves the robustness of the system. The application of distance weighting to speaker identification is also discussed.
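A minimal sketch of the multi-template idea follows: one VQ codebook per speaker per noise condition, with identification by the smallest quantization distortion over each speaker's templates. The use of scikit-learn's KMeans for codebook training, the codebook size, and the function names are assumptions.

```python
# Sketch of multi-template VQ speaker identification: each speaker has one
# codebook per training noise condition (clean plus noisy copies at several
# SNRs), and a test utterance is scored by the smallest average quantization
# distortion over that speaker's codebooks.
import numpy as np
from sklearn.cluster import KMeans

def train_codebooks(noise_condition_feats, codebook_size=64):
    """noise_condition_feats: list of (n_frames, dim) arrays, one per noise
    condition. Returns one codebook (centroid matrix) per condition."""
    return [KMeans(n_clusters=codebook_size, n_init=5).fit(f).cluster_centers_
            for f in noise_condition_feats]

def vq_distortion(feats, codebook):
    """Mean distance from each frame to its nearest codeword."""
    d = np.linalg.norm(feats[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).mean()

def identify(test_feats, speaker_codebooks):
    """speaker_codebooks: dict speaker -> list of codebooks (templates)."""
    score = {spk: min(vq_distortion(test_feats, cb) for cb in cbs)
             for spk, cbs in speaker_codebooks.items()}
    return min(score, key=score.get)
```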

15.
黄文娜, 彭亚雄. 《电声技术》, 2016, 40(11): 44-47
To mitigate the effect of vocal effort variation on speaker recognition performance, and based on an analysis of speech signals under different vocal efforts, a Vocal Effort Maximum A Posteriori (VEMAP) adaptation method is proposed to update the models of a speaker recognition system based on the Gaussian mixture model-universal background model (GMM-UBM). Experiments show that the proposed method reduces the system's EER under different vocal efforts by 88.45% and 85.16%, effectively resolving the mismatch between training and test speech caused by vocal effort variation and the resulting degradation of speaker recognition performance, and significantly improving system performance.
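For orientation, the snippet below shows generic relevance-factor MAP adaptation of GMM-UBM means, the standard model-update step that approaches like VEMAP build on; it is not the paper's VEMAP procedure, and the relevance factor, component count, and use of scikit-learn are assumptions.

```python
# Generic sketch of GMM-UBM MAP mean adaptation (standard relevance-factor
# form), shown only to illustrate the kind of model update referred to above.
import numpy as np
from sklearn.mixture import GaussianMixture

def map_adapt_means(ubm: GaussianMixture, feats: np.ndarray, relevance: float = 16.0):
    """Adapt UBM means toward the adaptation data `feats` (n_frames, dim)."""
    gamma = ubm.predict_proba(feats)              # (n_frames, n_components) responsibilities
    n_i = gamma.sum(axis=0)                       # soft frame counts per component
    E_i = (gamma.T @ feats) / np.maximum(n_i[:, None], 1e-10)   # weighted data means
    alpha = n_i / (n_i + relevance)               # adaptation coefficients
    return alpha[:, None] * E_i + (1.0 - alpha)[:, None] * ubm.means_

# Usage sketch: fit a UBM on pooled data, then adapt per speaker.
# ubm = GaussianMixture(n_components=256, covariance_type="diag").fit(pooled_feats)
# speaker_means = map_adapt_means(ubm, speaker_feats)
```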

16.
This paper concerns robust and reliable speaker model training for text‐independent speaker verification. The baseline speaker modeling approach is the Gaussian mixture model (GMM). In text‐independent speaker verification, the amount of speech data may be different for speakers. However, we still wish the modeling approach to perform equally well for all speakers. Besides, the modeling technique must be least vulnerable against unseen data. A traditional approach for GMM training is expectation maximization (EM) method, which is known for its overfitting problem and its weakness in handling insufficient training data. To tackle these problems, variational approximation is proposed. Variational approaches are known to be robust against overtraining and data insufficiency. We evaluated the proposed approach on two different databases, namely KING and TFarsdat. The experiments show that the proposed approach improves the performance on TFarsdat and KING databases by 0.56% and 4.81%, respectively. Also, the experiments show that the variationally optimized GMM is more robust against noise and the verification error rate in noisy environments for TFarsdat dataset decreases by 1.52%.
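As a loose analogy to the variational training discussed above, scikit-learn offers both an EM-trained GaussianMixture and a variationally trained BayesianGaussianMixture; the sketch below contrasts the two for speaker modeling. The component count, covariance type, and thresholding rule are assumptions, not the paper's setup.

```python
# Illustrative comparison of EM-trained vs. variationally trained GMM speaker
# models using scikit-learn; an analogy to the paper's approach, not its
# actual implementation.
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture

def train_speaker_models(feats, n_components=32):
    """feats: (n_frames, dim) features from one speaker's enrollment data."""
    em_gmm = GaussianMixture(n_components=n_components,
                             covariance_type="diag").fit(feats)
    vb_gmm = BayesianGaussianMixture(n_components=n_components,
                                     covariance_type="diag",
                                     max_iter=200).fit(feats)
    return em_gmm, vb_gmm

def verify(test_feats, model, threshold):
    """Accept if the average log-likelihood exceeds a decision threshold."""
    return model.score(test_feats) > threshold    # score() returns mean log-likelihood
```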

17.
To address the difficulty of mixing data in current neural-network acoustic modeling, this paper proposes a neural speech synthesis method based on perceptual quantization coding. A perceptual quantization coding model is designed to learn representations of the differences in timbre, language, and emotion across massive amounts of speech, and a unified neural acoustic model trained on mixed multi-speaker data is built. Within this unified perceptually coded acoustic model, data sharing and transfer learning significantly lower the amount of data required to build a synthesis system and enable effective control over attributes of the synthesized speech such as timbre, language, and emotion. This improves the quality and flexibility of neural speech synthesis: a synthesis system built from one hour of data reaches a naturalness of 4.0 MOS, matching or exceeding the level of an average speaker.

18.
A novel adaptive discriminative vector quantisation technique for speaker identification (ADVQSI) is introduced. In the training mode of ADVQSI, for each speaker, the speech feature vector space is divided into a number of subspaces. The feature space segmentation is based on the difference between the probability distribution of the speech feature vectors from each speaker and that from all speakers in the speaker identification (SI) group. Then, an optimal discriminative weight, which represents the subspace's role in SI, is calculated for each subspace of each speaker by employing adaptive techniques. The largest template differences between speakers in the SI group are achieved by using optimal discriminative weights. In the testing mode of ADVQSI, discriminative weighted average vector quantisation (VQ) distortions are used for SI decisions. The performance of ADVQSI is analysed and tested experimentally. The experimental results confirm the performance improvement employing the proposed technique in comparison with existing VQ techniques for SI and recently reported discriminative VQ techniques for SI (DVQSI)
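The testing-mode decision can be pictured with the simplified sketch below, which scores a test utterance by a weighted sum of per-subspace VQ distortions; the subspace codebooks and discriminative weights are assumed to be given, and the training-mode segmentation and weight optimization are not reproduced.

```python
# Simplified sketch of the testing-mode decision in a discriminatively
# weighted VQ speaker-identification scheme: each speaker has per-subspace
# codebooks and learned weights, and the test utterance is scored by a
# weighted sum of average quantization distortions.
import numpy as np

def weighted_vq_distortion(frames, codebooks, weights):
    """frames: (n_frames, dim); codebooks: list of (k, dim) arrays, one per
    subspace; weights: per-subspace discriminative weights."""
    score = 0.0
    for cb, w in zip(codebooks, weights):
        d = np.linalg.norm(frames[:, None, :] - cb[None, :, :], axis=2)
        score += w * d.min(axis=1).mean()      # weighted average distortion
    return score

def identify(frames, speakers):
    """speakers: dict name -> (codebooks, weights); lowest score wins."""
    return min(speakers, key=lambda s: weighted_vq_distortion(frames, *speakers[s]))
```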

19.
A system capable of producing near video-realistic animation of a speaker given only speech inputs is presented. The audio input is a continuous speech signal, requires no phonetic labelling and is speaker-independent. The system requires only a short video training corpus of a subject speaking a list of viseme-targeted words in order to achieve convincing realistic facial synthesis. The system learns the natural mouth and face dynamics of a speaker to allow new facial poses, unseen in the training video, to be synthesised. To achieve this the authors have developed a novel approach which utilises a hierarchical and nonlinear principal components analysis (PCA) model which couples speech and appearance. Animation of different facial areas, defined by the hierarchy, is performed separately and merged in post-processing using an algorithm which combines texture and shape PCA data. It is shown that the model is capable of synthesising videos of a speaker using new audio segments from both previously heard and unheard speakers.

20.
Speaker recognition extracts a speaker's individual characteristics from a segment of speech and, by analyzing and recognizing these characteristics, identifies or verifies the speaker. A neural network is a distributed, parallel processing model based on nonlinear theory; it has strong pattern classification ability and robustness to incomplete information, providing a distinctive approach to speaker recognition. A BP (back-propagation) network is a feed-forward (acyclic) multilayer network trained with the back-propagation algorithm, consisting of an input layer, an output layer, and N hidden layers. This paper first surveys speech recognition technology, presents the seven steps of the BP network training process and its model, and explains how to build a BP neural network model. It also covers the extraction of the relevant feature parameters and the training and recognition procedures of the network, and finally implements speaker identification through a program running on Linux.
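For illustration only, the snippet below trains a small feed-forward (back-propagation) network on speaker-labeled feature vectors with scikit-learn; the feature representation, layer sizes, and activation are assumptions and this is not the Linux implementation described in the abstract.

```python
# Minimal sketch of speaker identification with a small back-propagation
# (feed-forward) network; layer sizes and features are illustrative only.
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_speaker_classifier(features: np.ndarray, speaker_labels: np.ndarray):
    """features: (n_utterances, dim) fixed-length vectors per utterance
    (e.g. averaged cepstral coefficients); speaker_labels: (n_utterances,)."""
    clf = MLPClassifier(hidden_layer_sizes=(64, 32),    # two hidden layers
                        activation="logistic",          # sigmoid units, classic BP style
                        max_iter=500)
    clf.fit(features, speaker_labels)
    return clf

def identify_speaker(clf, feature_vector: np.ndarray):
    """Return the predicted speaker label for one utterance-level vector."""
    return clf.predict(feature_vector.reshape(1, -1))[0]
```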
