Similar literature
1.
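To address the changes in the acoustic characteristics of speech caused by variations in vocal effort, a method based on acoustic model adaptation is proposed. The performance of an acoustic model trained on normal-mode speech is analyzed when it is used to recognize speech in other vocal-effort modes. Based on the properties of the stochastic segment model (SSM), maximum likelihood linear regression (MLLR) is introduced into the SSM system, and the adapted acoustic model is used to recognize speech in the corresponding vocal-effort mode. Continuous Mandarin speech recognition experiments on the "863-test" set show that recognition accuracy drops considerably when the normal-mode acoustic model is applied to speech in the other four vocal-effort modes, whereas the adapted system clearly improves accuracy on speech in the corresponding mode. This demonstrates the effectiveness of acoustic model adaptation in handling vocal-effort variation in speech recognition.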
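For orientation, the mean transform at the core of standard MLLR can be written as below. This is the textbook form for Gaussian means, not necessarily the exact variant the authors derive for the stochastic segment model; the symbols $A$, $b$, $W$, and $\xi$ are notation introduced here, not taken from the abstract.

```latex
% Standard MLLR mean adaptation (textbook form; notation assumed here):
% each Gaussian mean \mu in a regression class is mapped by a shared affine transform.
\hat{\mu} = A\mu + b = W\xi,
\qquad
W = \begin{bmatrix} b & A \end{bmatrix},
\qquad
\xi = \begin{bmatrix} 1 \\ \mu \end{bmatrix}
% W is estimated by maximizing the likelihood of the adaptation data
% (here, speech from the target vocal-effort mode).
```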

2.
3.
Localization algorithm based on a multivariate Taylor series expansion model*   Cited by: 1 (self-citations: 0, citations by others: 1)
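To further improve localization accuracy in wireless sensor networks, a localization model based on a multivariate Taylor series expansion is constructed by taking into account the distance information between unknown sensors. To solve the model, trilateration is first used to obtain the initial positions of the unknown sensors, and weighted least squares is then used to compute the optimal values, which serve as the estimated positions of the unknown sensors. To evaluate the algorithm, the Cramer-Rao lower bound (CRLB) of the localization result is derived. Simulations test the effect of different distance measurement errors and of the number of known sensors on the localization error, as well as the cumulative distribution function (CDF) of the algorithm. The simulation results show that the algorithm effectively improves localization accuracy and that the localization error is very close to the CRLB.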
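The refinement step described above (linearize the range equations with a first-order Taylor expansion, then solve a weighted least-squares problem) might look roughly like the following. This is a simplified sketch for a single unknown node in 2D; it does not include the inter-sensor distances that the paper's multivariate model uses. The function name, the weights, and the fixed iteration count are assumptions made for illustration.

```python
import math

def taylor_wls_localize(anchors, ranges, weights, initial, iterations=20):
    """Refine a 2D position estimate by repeated Taylor linearization + WLS.

    anchors: list of (x, y) known sensor positions
    ranges:  measured distances from the unknown node to each anchor
    weights: per-measurement weights (e.g. inverse error variances; assumed)
    initial: starting guess, e.g. from trilateration
    """
    x, y = initial
    for _ in range(iterations):
        # Accumulate the 2x2 normal equations (H^T W H) d = H^T W r.
        a11 = a12 = a22 = b1 = b2 = 0.0
        for (ax, ay), r, w in zip(anchors, ranges, weights):
            d = math.hypot(x - ax, y - ay)
            if d == 0.0:
                continue  # gradient undefined exactly at an anchor
            hx, hy = (x - ax) / d, (y - ay) / d  # first-order Taylor terms
            res = r - d                          # range residual
            a11 += w * hx * hx
            a12 += w * hx * hy
            a22 += w * hy * hy
            b1 += w * hx * res
            b2 += w * hy * res
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-12:
            break  # degenerate geometry; keep the current estimate
        dx = (a22 * b1 - a12 * b2) / det
        dy = (a11 * b2 - a12 * b1) / det
        x, y = x + dx, y + dy
        if math.hypot(dx, dy) < 1e-9:
            break
    return x, y
```

With noise-free ranges and well-spread anchors, the estimate should converge to the true position within a few iterations; with noisy ranges, the weights let more reliable measurements dominate.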

4.
A novel approach for joint speaker identification and speech recognition is presented in this article. Unsupervised speaker tracking and automatic adaptation of the human-computer interface are achieved by the interaction of speaker identification, speech recognition and speaker adaptation for a limited number of recurring users. Together with a technique for efficient information retrieval, a compact modeling of speech and speaker characteristics is presented. Applying speaker-specific profiles allows speech recognition to take individual speech characteristics into consideration to achieve higher recognition rates. Speaker profiles are initialized and continuously adapted by a balanced strategy of short-term and long-term speaker adaptation combined with robust speaker identification. Different users can be tracked by the resulting self-learning speech-controlled system. Only a very short enrollment of each speaker is required. Subsequent utterances are used for unsupervised adaptation, resulting in continuously improved speech recognition rates. Additionally, the detection of unknown speakers is examined with the aim of avoiding the need to train new speaker profiles explicitly. The speech-controlled system presented here is suitable for in-car applications, e.g. speech-controlled navigation, hands-free telephony or infotainment systems, on embedded devices. Results are presented for a subset of the SPEECON database. The results validate the benefit of the speaker adaptation scheme and the unified modeling in terms of speaker identification and speech recognition rates.

5.
Hidden Markov models (HMMs) are the most commonly used acoustic model for speech recognition. In HMMs, successive observations are assumed to be independent given the state sequence. This is known as the conditional independence assumption. Consequently, the temporal (inter-frame) correlations are poorly modelled. This limitation may be reduced by incorporating some form of trajectory modelling. In this paper, a general perspective on trajectory modelling is provided, where time-varying model parameters are used for the Gaussian components. A discriminative semi-parametric trajectory model is then described where the Gaussian mean vector and covariance matrix parameters vary with time. The time variation is modelled as a semi-parametric function of the observation sequence via a set of centroids in the acoustic space. The model parameters are estimated discriminatively using the minimum phone error (MPE) criterion. The performance of these models is investigated and benchmarked against a state-of-the-art CUHTK Mandarin evaluation system.

6.
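To improve the dynamic range of the output signal of direct digital frequency synthesis (DDS), a method is proposed that uses a Taylor series to reduce the phase jitter of the synthesizer without increasing the number of bits in the DDS accumulator. A DDS with a 32-bit accumulator generating signals over a certain frequency range is simulated. The simulation results show that the Taylor-series-based DDS achieves a better dynamic range, 12 dB higher than that of the conventional method. The method is a useful reference for DDS designers.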
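One common way a Taylor series is used in DDS is to correct the lookup-table output with the phase bits that truncation would otherwise discard: sin(θ0 + δ) ≈ sin(θ0) + δ·cos(θ0). The sketch below illustrates that idea only; the abstract does not give the authors' exact scheme, and the 8-bit table size and all names are assumptions chosen for illustration.

```python
import math

ACC_BITS = 32   # accumulator width, as in the abstract
LUT_BITS = 8    # hypothetical lookup-table address width

LUT_SIZE = 1 << LUT_BITS
FRAC_BITS = ACC_BITS - LUT_BITS
SIN_LUT = [math.sin(2 * math.pi * k / LUT_SIZE) for k in range(LUT_SIZE)]
COS_LUT = [math.cos(2 * math.pi * k / LUT_SIZE) for k in range(LUT_SIZE)]

def dds_sample(acc, use_taylor=True):
    """Convert a phase-accumulator value to an amplitude sample."""
    index = acc >> FRAC_BITS                   # coarse phase: table address
    residual = acc & ((1 << FRAC_BITS) - 1)    # bits lost by plain truncation
    delta = 2 * math.pi * residual / (1 << ACC_BITS)  # residual phase in radians
    value = SIN_LUT[index]
    if use_taylor:
        value += delta * COS_LUT[index]        # first-order Taylor correction
    return value

def max_error(tuning_word, n, use_taylor):
    """Worst-case deviation from the exact sine over n output samples."""
    acc, worst = 0, 0.0
    for _ in range(n):
        exact = math.sin(2 * math.pi * acc / (1 << ACC_BITS))
        worst = max(worst, abs(dds_sample(acc, use_taylor) - exact))
        acc = (acc + tuning_word) & ((1 << ACC_BITS) - 1)  # accumulator wraps
    return worst
```

Comparing `max_error(word, n, True)` with `max_error(word, n, False)` shows the correction shrinking the amplitude error while the accumulator width stays at 32 bits.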

7.
庄志豪, 傅洪亮, 陶华伟, 杨静, 谢跃, 赵力. 《计算机应用研究》 (Application Research of Computers), 2021, 38(11): 3279-3282, 3348
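To address the differences in data distribution between corpora, a cross-corpus speech emotion recognition algorithm based on deep autoencoder subdomain adaptation is proposed. First, two deep autoencoders are used to obtain highly representative low-dimensional emotion features for the source domain and the target domain, respectively. Then, a subdomain adaptation module based on LMMD (local maximum mean discrepancy) aligns the feature distributions of the source and target domains within each low-dimensional emotion-category space. Finally, the model is trained in a supervised manner using labeled source-domain data. With the eNTERFACE corpus as the source domain and the Berlin corpus as the target domain, the cross-corpus recognition accuracy of the proposed algorithm is 5.26%~19.73% higher than that of other algorithms; with the Berlin corpus as the source domain and the eNTERFACE corpus as the target domain, it is 7.34%~8.18% higher. The proposed method can therefore effectively extract the emotion features shared by different corpora and improve the performance of cross-corpus speech emotion recognition.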

8.
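For the trajectory tracking control of linear time-varying systems with uncertain disturbances, an iterative learning algorithm based on Taylor series is proposed. The algorithm uses Taylor series to parameterize the system and derives an approximate Taylor-series-based model of the linear time-varying system. On the basis of this model, the Taylor expansion coefficients of the input are corrected by iterative learning, and the learning gain matrix is obtained using the LMI method. Even when the system does not satisfy regularity or passivity, the proposed algorithm can still construct the learning law from the output error signal. Simulation results demonstrate the effectiveness of the algorithm.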

9.
This paper presents an effective approach for unsupervised language model adaptation (LMA) using multiple models in offline recognition of unconstrained handwritten Chinese texts. The domain of the document to recognize is variable and usually unknown a priori, so we use a two-pass recognition strategy with a pre-defined multi-domain language model set. We propose three methods to dynamically generate an adaptive language model to match the text output by first-pass recognition: model selection, model combination and model reconstruction. In model selection, we use the language model with minimum perplexity on the first-pass recognized text. By model combination, we learn the combination weights via minimizing the sum of squared error with both L2-norm and L1-norm regularization. For model reconstruction, we use a group of orthogonal bases to reconstruct a language model with the coefficients learned to match the document to recognize. Moreover, we reduce the storage size of multiple language models using two compression methods of split vector quantization (SVQ) and principal component analysis (PCA). Comprehensive experiments on two public Chinese handwriting databases CASIA-HWDB and HIT-MW show that the proposed unsupervised LMA approach improves the recognition performance impressively, particularly for ancient domain documents with the recognition accuracy improved by 7 percent. Meanwhile, the combination of the two compression methods largely reduces the storage size of language models with little loss of recognition accuracy.
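The model-selection variant described above (pick the language model with minimum perplexity on the first-pass text) can be illustrated with a toy sketch. It assumes unigram models stored as plain dictionaries and a fixed probability floor for unseen tokens; the paper's actual models and smoothing are not specified in the abstract, so all names and values here are hypothetical.

```python
import math

def perplexity(model, tokens, floor=1e-6):
    """Perplexity of a unigram model (token -> probability) on a token list.

    Tokens missing from the model get a small floor probability (an assumption;
    real language models would use proper smoothing).
    """
    log_sum = sum(math.log(model.get(tok, floor)) for tok in tokens)
    return math.exp(-log_sum / len(tokens))

def select_model(models, first_pass_tokens):
    """Return the name of the domain model with minimum perplexity."""
    return min(models, key=lambda name: perplexity(models[name], first_pass_tokens))
```

In a two-pass setup, `first_pass_tokens` would be the text output by the first recognition pass, and the selected model would drive the second pass.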

10.
This paper proposes an efficient speech data selection technique that can identify those data that will be well recognized. Conventional confidence measure techniques can also identify well-recognized speech data. However, those techniques require a lot of computation time for speech recognition processing to estimate confidence scores. Speech data with low confidence should not go through the time-consuming recognition process since they will yield erroneous spoken documents that will eventually be rejected. The proposed technique can select the speech data that will be acceptable for speech recognition applications. It rapidly selects speech data with high prior confidence based on acoustic likelihood values and using only speech and monophone models. Experiments show that the proposed confidence estimation technique is over 50 times faster than the conventional posterior confidence measure while providing equivalent data selection performance for speech recognition and spoken document retrieval.

11.
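Domain adaptation algorithms are widely used in cross-corpus speech emotion recognition. However, while pursuing a smaller domain discrepancy, many domain adaptation algorithms lose the discriminability of target-domain samples, which then lie densely along the model's decision boundary and degrade its performance. To address this, a cross-corpus speech emotion recognition method based on decision boundary optimized domain adaptation (DBODA) is proposed. Features are first processed by a convolutional neural network and then fed to a maximizing nuclear-norm and mean discrepancy (MNMD) module, which, while reducing the inter-domain discrepancy, maximizes the nuclear norm of the target-domain emotion prediction probability matrix, thereby improving the discriminability of target-domain samples and optimizing the decision boundary. In six cross-corpus experiments built on the Berlin, eNTERFACE and CASIA speech corpora, the average recognition accuracy of the proposed method exceeds that of other algorithms by 1.68 to 11.01 percentage points, indicating that the proposed model effectively reduces the sample density at the decision boundary and improves prediction accuracy.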

12.
Feature statistics normalization in the cepstral domain is one of the best-performing approaches for robust automatic speech and speaker recognition in noisy acoustic scenarios: feature coefficients are normalized by using suitable linear or nonlinear transformations in order to match the noisy speech statistics to the clean speech ones. Histogram equalization (HEQ) belongs to this category of algorithms, has proved effective for this purpose, and is therefore taken here as the reference. In this paper the presence of multiple acoustic channels is used to enhance the statistics modeling capabilities of the HEQ algorithm, by exploiting the availability of multiple noisy speech occurrences, with the aim of maximizing the effectiveness of the cepstra normalization process. Computer simulations based on the Aurora 2 database in speech and speaker recognition scenarios have shown that a significant recognition improvement with respect to the single-channel counterpart and other multi-channel techniques can be achieved, confirming the effectiveness of the idea. The proposed algorithmic configuration has also been combined with the kernel estimation technique in order to further improve the speech recognition performance.
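As a rough illustration of the baseline idea, single-channel histogram equalization of one feature coefficient can be sketched as quantile mapping: each noisy value is replaced by the reference (clean) value at the same empirical CDF position. This is not the multi-channel extension the paper proposes, and the function name and the midpoint CDF estimate are assumptions made here.

```python
import bisect

def histogram_equalize(noisy, reference):
    """Map each noisy value onto the reference distribution by CDF matching.

    noisy:     one feature coefficient over the frames of a noisy utterance
    reference: samples of the same coefficient from clean speech (assumed given)
    """
    sorted_noisy = sorted(noisy)
    sorted_ref = sorted(reference)
    n, m = len(sorted_noisy), len(sorted_ref)
    equalized = []
    for x in noisy:
        rank = bisect.bisect_right(sorted_noisy, x)  # number of values <= x
        cdf = (rank - 0.5) / n                       # midpoint empirical CDF
        idx = min(m - 1, int(cdf * m))               # matching reference quantile
        equalized.append(sorted_ref[idx])
    return equalized
```

Because the mapping is monotonic, the ordering of the frames' values is preserved while their distribution is pulled toward the reference.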

13.
《国际计算机数学杂志》 (International Journal of Computer Mathematics), 2012, 89(17): 3677-3684
In this paper, we analyse the exact solutions to scalar partial differential equations obtained thanks to the summable Taylor series provided by Adomian's decomposition method. We propose a modification of the method which makes the calculation of Taylor coefficients easier and more direct. The difference is essential, for instance, in the case of nonhomogeneous equations or initial conditions and is illustrated by some examples.

14.
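In codebook design for vector quantization (VQ), the classical LBG algorithm converges quickly but easily becomes trapped in local optima, and the choice of the initial codebook strongly affects the design of the best codebook. Because the genetic algorithm (GA) has global optimization search capability, a GA-L algorithm that combines GA with the LBG algorithm is proposed to optimize the codebook. It improves codebook quality and is applied to Mandarin connected-digit speech recognition; the experimental results demonstrate the effectiveness of the GA-L algorithm.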
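For reference, a minimal sketch of the plain LBG (splitting) procedure that the GA-L algorithm builds on is shown below. It covers only the LBG baseline, not the genetic-algorithm part; the additive split perturbation `eps`, the fixed number of refinement iterations, and the restriction to power-of-two codebook sizes are simplifying assumptions.

```python
def _nearest(codebook, vector):
    """Index of the codeword closest to vector (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vector)))

def _centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def lbg(vectors, size, eps=0.01, iterations=20):
    """Design a codebook of `size` codewords (a power of two) by LBG splitting."""
    codebook = [_centroid(vectors)]
    while len(codebook) < size:
        # Split every codeword into two slightly perturbed copies.
        codebook = [[c + s for c in code] for code in codebook for s in (eps, -eps)]
        # Refine with nearest-neighbour partition + centroid update.
        for _ in range(iterations):
            cells = [[] for _ in codebook]
            for v in vectors:
                cells[_nearest(codebook, v)].append(v)
            codebook = [_centroid(cell) if cell else code
                        for cell, code in zip(cells, codebook)]
    return codebook
```

The result depends on the deterministic split from the global centroid, which is one reason the procedure can settle in a local optimum, as the abstract notes.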

15.
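A new robust face recognition method based on Gabor wavelet face features and a model adaptation algorithm is proposed. Before actual recognition, the method updates and compensates the original model using face image data acquired under the same environmental conditions as the actual recognition, thereby achieving model adaptation. The model adaptation update is additive and has low time and space complexity. Through adaptive model updating, the new method effectively reduces the mismatch between the model and the recognition data and thus improves the recognition rate. Tests on the AT&T and MIT CBCL face databases show that the method is effective.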

16.
The kernel function is the core of the Support Vector Machine (SVM), and its selection directly affects the performance of SVM. There has been no theoretical basis for choosing a kernel function for speech recognition. In order to improve the learning ability and generalization ability of SVM for speech recognition, this paper presents the Optimal Relaxation Factor (ORF) kernel function, which is a set of new SVM kernel functions for speech recognition, and proves that the ORF function is a Mercer kernel function. The experiments show the ORF kernel function's effectiveness on mapping trend, bi-spiral, and speech recognition problems. The paper draws the conclusion that the ORF kernel function performs better than the Radial Basis Function (RBF), the Exponential Radial Basis Function (ERBF) and the Kernel with Moderate Decreasing (KMOD). Furthermore, the results of speech recognition with the ORF kernel function show higher recognition accuracy.

17.
The paper presents a decision algorithmic model called vector gravitational force model in the feature space. The algorithmic model, inspired by and similar to the Law of Universal Gravitation, is derived from the vector geometric analysis of the linear classifier and established in the feature space. Based on this algorithmic model, we propose a classification method called vector gravitational recognition. The proposed method is applied to the benchmark Glass Identification task in the UCI Database available from USA Forensic Science Service, and two other UCI benchmark tasks. The experimental and comparative results show that the proposed approach yields quite good results and outperforms some well-known and recent approaches on the tasks, and other applications may benefit from ours.
Yang Zong-chang

18.
A survey of noise-robust speech recognition research*   Cited by: 3 (self-citations: 1, citations by others: 2)
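Focusing on speech recognition in noisy environments, this survey discusses existing noise-robust speech recognition techniques and sets out the main problems in noise-robust speech recognition research. Following the structure of a speech recognition system, it classifies and summarizes noise-robust techniques by signal space, feature space and model space, and analyzes the characteristics and implementation of the various robust techniques and their application in speech recognition. Finally, directions for further research are outlined.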

19.
Robustness is one of the most important topics for automatic speech recognition (ASR) in practical applications. Monaural speech separation based on computational auditory scene analysis (CASA) offers a solution to this problem. In this paper, a novel system is presented to separate the monaural speech of two talkers. Gaussian mixture models (GMMs) and vector quantizers (VQs) are used to learn the grouping cues on isolated clean data for each speaker. Given an utterance, speaker identification is first performed to identify the two speakers present in the utterance, then the factorial-max vector quantization model (MAXVQ) is used to infer the mask signals, and finally the utterance of the target speaker is resynthesized in the CASA framework. Recognition results on the 2006 speech separation challenge corpus show that the proposed system can improve the robustness of ASR significantly.

20.
A conventional automatic speech recognizer does not perform well in the presence of multiple sound sources, while human listeners are able to segregate and recognize a signal of interest through auditory scene analysis. We present a computational auditory scene analysis system for separating and recognizing target speech in the presence of competing speech or noise. We estimate, in two stages, the ideal binary time–frequency (T–F) mask which retains the mixture in a local T–F unit if and only if the target is stronger than the interference within the unit. In the first stage, we use harmonicity to segregate the voiced portions of individual sources in each time frame based on multipitch tracking. Additionally, unvoiced portions are segmented based on an onset/offset analysis. In the second stage, speaker characteristics are used to group the T–F units across time frames. The resulting masks are used in an uncertainty decoding framework for automatic speech recognition. We evaluate our system on a speech separation challenge and show that our system yields substantial improvement over the baseline performance.
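The definition of the ideal binary mask given above translates directly into code. The sketch below computes the oracle mask from known target and interference energies, which is what the described system tries to estimate without access to the separate sources; the nested-list layout (frames by frequency channels) and the function names are assumptions made for illustration.

```python
def ideal_binary_mask(target_energy, interference_energy):
    """Return 1 for each T-F unit where the target is stronger, else 0.

    Both inputs are lists of frames, each frame a list of per-channel energies.
    """
    return [[1 if t > i else 0 for t, i in zip(t_frame, i_frame)]
            for t_frame, i_frame in zip(target_energy, interference_energy)]

def apply_mask(mixture, mask):
    """Keep the mixture in units where the mask is 1 and zero it elsewhere."""
    return [[x * m for x, m in zip(x_frame, m_frame)]
            for x_frame, m_frame in zip(mixture, mask)]
```

For example, with two frames of two channels each, `ideal_binary_mask([[4, 1], [2, 9]], [[3, 5], [2, 1]])` gives `[[1, 0], [0, 1]]`; the tie in the second frame is masked out because the target is not strictly stronger.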
