Similar Documents
20 similar documents found (search time: 15 ms)
1.
The input variability caused by speaker changes is one of the most important factors affecting the performance of speech recognition systems. One solution is adaptation or normalization of the input, so that the parameters of the input representation are adapted to those of a single speaker and the input pattern is normalized against speaker changes before recognition. This paper proposes three such methods that compensate for some of the speaker-change effects on the recognition process. In all three methods, a feed-forward neural network is first trained to map the input to codes representing the phonetic classes and speakers. Among the 71 training speakers, the one with the highest phone recognition accuracy is then selected as the reference speaker, and the representation parameters of the other speakers are converted to the corresponding speech uttered by that speaker. In the first method, error back-propagation is used to find, in the input space, the optimal point of every decision region for each phone of each speaker; the distances between these points and the corresponding points of the reference speaker are used to offset the speaker-change effects and adapt the input signal to the reference speaker. In the second method, again using error back-propagation with the reference speaker's data as the desired output, all speech frames in the training and test sets are corrected to coincide with the corresponding speech of the reference speaker. In the third method, another feed-forward neural network is applied inversely, mapping the phonetic classes and speaker information back to the input representation. The phonetic output of the direct network, together with the reference speaker's data, is fed to the inverse network, which yields an estimate of the input representation adapted to the reference speaker. In all three methods, the final recognition model is trained on the adapted training data and tested on the adapted test data. Implementing these methods and combining each final network with the unadapted network according to the highest confidence level yields increases of 2.1%, 2.6% and 3% in phone recognition accuracy on clean speech, respectively.
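The offset idea in the first method can be sketched with per-phone centroids (a simplification: the paper finds optimal decision-region points by back-propagation, whereas this sketch uses class means; all function names are illustrative):

```python
import numpy as np

def phone_offsets(feats, phones, ref_feats, ref_phones):
    """Per-phone offset between this speaker's mean feature vector and
    the reference speaker's mean vector for the same phone."""
    phones = np.asarray(phones)
    ref_phones = np.asarray(ref_phones)
    return {p: ref_feats[ref_phones == p].mean(axis=0)
               - feats[phones == p].mean(axis=0)
            for p in set(phones.tolist())}

def adapt(feats, phones, offsets):
    """Shift every frame by its phone's offset, moving the speaker's
    frames toward the reference speaker's acoustic space."""
    return np.array([f + offsets[p] for f, p in zip(feats, phones)])
```

Applying the offsets to a speaker's frames moves each phone class onto the reference speaker's corresponding region before recognition.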

2.
This paper discusses speaker adaptation for speech recognition under different non-native-speaker conditions. Two adaptation methods are proposed, speaker clustering and acoustic model merging, and experiments demonstrate the effectiveness and practicality of the proposed methods.

3.
In spite of recent advances in automatic speech recognition, the performance of state-of-the-art speech recognisers fluctuates depending on the speaker. Speaker normalisation aims at the reduction of differences between the acoustic space of a new speaker and the training acoustic space of a given speech recogniser, improving performance. Normalisation is based on an acoustic feature transformation, to be estimated from a small amount of speech signal. This paper introduces a mixture of recurrent neural networks as an effective regression technique to approach the problem. A suitable Viterbi-based time alignment procedure is proposed for generating the adaptation set. The mixture is compared with linear regression and single-model connectionist approaches. Speaker-dependent and speaker-independent continuous speech recognition experiments with a large vocabulary, using Hidden Markov Models, are presented. Results show that the mixture improves recognition performance, yielding a 21% relative reduction of the word error rate, i.e. comparable with that obtained with model-adaptation approaches.

4.
This paper addresses the problem of recognising speech in the presence of a competing speaker. We review a speech fragment decoding technique that treats segregation and recognition as coupled problems. Data-driven techniques are used to segment a spectro-temporal representation into a set of fragments, such that each fragment is dominated by one or other of the speech sources. A speech fragment decoder is used which employs missing data techniques and clean speech models to simultaneously search for the set of fragments and the word sequence that best matches the target speaker model. The paper investigates the performance of the system on a recognition task employing artificially mixed target and masker speech utterances. The fragment decoder produces significantly lower error rates than a conventional recogniser, and mimics the pattern of human performance that is produced by the interplay between energetic and informational masking. However, at around 0 dB the performance is generally quite poor. An analysis of the errors shows that a large number of target/masker confusions are being made. The paper presents a novel fragment-based speaker identification approach that allows the target speaker to be reliably identified across a wide range of SNRs. This component is combined with the recognition system to produce significant improvements. When the target and masker utterance have the same gender, the recognition system has a performance at 0 dB equal to that of humans; in other conditions the error rate is roughly twice the human error rate.

5.
Speaker identification and verification is one of the active research topics in signal processing, but the literature shows that recognition accuracy is not yet high and that both training and recognition require fairly long utterances, leaving a gap to practical application. This paper analyses how the choice of parameters in speaker recognition affects the results, adopts the LPC cepstrum and pitch jointly as recognition features, applies vector quantization, and improves the weighting function of the LPC cepstral distance, providing a text-independent speaker identification system. Experimental results and analysis are given: the identification rate reaches over 99% under low noise and over 98% under high noise.
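The weighted LPC-cepstral distance can be sketched as follows (the paper's improved weighting function is not given, so an index-based liftering-style weight is assumed here as a placeholder):

```python
import math

def weighted_cepstral_distance(c1, c2, weights=None):
    """Weighted Euclidean distance between two LPC-cepstrum vectors.
    Default: weight coefficient i by (i + 1), a common liftering-style
    choice; the paper's improved weighting function is not specified."""
    if weights is None:
        weights = [i + 1 for i in range(len(c1))]
    return math.sqrt(sum(w * (a - b) ** 2
                         for w, a, b in zip(weights, c1, c2)))
```

In a VQ-based system this distance would replace the plain Euclidean distance when matching frames to codebook entries.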

6.
Speaker recognition extracts a speaker's individual characteristics from a segment of speech and, by analysing and recognising these features, identifies or verifies the speaker. A neural network is a distributed parallel processing model based on nonlinear theory, with strong pattern-classification ability and robustness to incomplete information, offering a distinctive approach to speaker recognition. BP (back-propagation) is a non-recurrent multilayer network training algorithm; the networks consist of an input layer, an output layer and N hidden layers. This paper first surveys speech recognition technology, presents the seven steps of the BP training procedure and its model, and shows how to build a BP network model. It also covers the extraction of the relevant feature parameters and the network's training and recognition process; finally, speaker identification is implemented by programming under Linux.
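The core of one BP training step can be sketched for a single sigmoid unit (a minimal illustration of the forward pass, output delta and weight update; the paper's seven-step procedure and multi-layer model are not reproduced here):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bp_step(w, x, target, lr=0.5):
    """One back-propagation step for a single sigmoid unit under a
    squared-error loss: forward pass, output delta, weight update."""
    y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    delta = (y - target) * y * (1.0 - y)          # dE/d(net input)
    return [wi - lr * delta * xi for wi, xi in zip(w, x)]
```

Repeating this step over all training frames and layers is what the full BP algorithm does.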

7.
Using linear prediction coefficients as features, the initial k-means clustering of the training samples is refined by the iterative algorithm of a Gaussian mixture model, yielding representations of the constituent speech units. Based on pattern matching of these units, a text-independent speaker verification method, the mean method, and a text-independent speaker identification method are proposed. Experimental results show that both methods perform well even on short utterances.
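The mean method can be sketched as frame-level matching against a speaker's speech-unit models (a hedged reading of the abstract; `dist` and the threshold are placeholders for the actual distance measure and operating point):

```python
def mean_score(frames, units, dist):
    """Mean, over all test frames, of the distance from each frame to
    its nearest speech-unit model of the claimed speaker."""
    return sum(min(dist(f, u) for u in units) for f in frames) / len(frames)

def verify(frames, units, dist, threshold):
    """Accept the claimed identity when the mean distance is small."""
    return mean_score(frames, units, dist) <= threshold
```

Because the score is an average over frames, it remains usable even for short utterances.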

8.
Efficient algorithms for speech recognition and speaker recognition
By studying the characteristics of the hardware platform, speech recognition and speaker recognition were optimized using a multi-frame synchronous search algorithm, parallelized inner-product computation of the Mahalanobis distance, and parallel instruction sets, tripling recognition speed. On a 1.4 GHz Pentium 4 machine, 120 channels of speaker recognition can be processed simultaneously.

9.
In this paper, a frame linear predictive coding spectrum (FLPCS) technique for speaker identification is presented. Linear predictive coding (LPC) has traditionally been applied in many speech recognition applications; here a modification of LPC, termed FLPCS, is proposed for speaker identification. The analysis procedure consists of feature extraction and voice classification. In the feature extraction stage, representative characteristics are extracted with the FLPCS technique, which reduces the size of a speaker's feature vector while maintaining an acceptable recognition rate. In the classification stage, a general regression neural network (GRNN) and a Gaussian mixture model (GMM) are applied because of their rapid response and simple implementation. In the experimental investigation, FLPCS coefficients of different orders induced from the LPC spectrum are compared with one another, and the capabilities of GRNN and GMM are analysed. The experimental results show that the GMM achieves the better recognition rate with FLPCS feature extraction, and that it can complete training and identification in a very short time.

10.
A sub-band PAC-based speaker recognition method
Speaker recognition systems now perform well on clean speech, but their performance drops sharply in noisy environments. A sub-band method is presented that uses phase autocorrelation (PAC) coefficients and their energy as features: the wideband speech signal is passed through a Mel filter bank to give multiple sub-band signals, PAC coefficients are extracted from each sub-band after a DCT, a separate HMM is built for each sub-band, and the HMM outputs are finally combined at the recognition-probability level to produce the final result. Experiments show that the method greatly improves recognition performance at various signal-to-noise ratios as well as on noise-free speech.
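The recognition-probability-level combination can be sketched as summing per-sub-band HMM log-likelihoods (this assumes independent sub-bands; the paper's exact combination rule may differ):

```python
def subband_fusion(subband_loglikes):
    """Combine each speaker's per-sub-band HMM log-likelihoods by
    summing (equivalent to a product of independent sub-band
    likelihoods), then pick the highest-scoring speaker."""
    totals = {spk: sum(scores) for spk, scores in subband_loglikes.items()}
    return max(totals, key=totals.get)
```

Keeping sub-bands separate until this final step is what limits the damage a noisy band can do to the overall decision.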

11.
Speech variability under abnormal (e.g. stress) conditions is one cause of degraded speech recognition performance. Variant speech data are generally difficult to collect, so little training data is available; even when the test and training environments match, recognition performance is therefore unsatisfactory. Adaptation algorithms address this by training on a small amount of test-environment data so that the trained model matches the test data, preserving good recognition performance. MAP is a commonly used adaptation algorithm, mostly applied to speaker adaptation; this paper applies it to a variant-speech recognition system and, after correspondingly improving the model, obtains better recognition results. In small-vocabulary speaker-dependent experiments on stress-induced variant speech, using a speaker-independent model and an improved speaker-dependent model as initial models respectively, the MAP algorithm clearly raises the recognition rate: with 10 passes of adaptation data, the rate improves by 15.84% and 15.97% over the baseline system, with best rates of 85.56% and 90.42%.
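The core MAP update behind this kind of adaptation can be sketched for a single Gaussian mean (a minimal sketch; tau is the relevance factor, and the paper's model improvements are not reproduced):

```python
def map_adapt_mean(prior_mean, data, tau=10.0):
    """MAP re-estimate of a Gaussian mean from a small adaptation set:
    interpolate between the prior mean and the sample mean, weighted
    by the relevance factor tau versus the observed frame count."""
    n = len(data)
    sample_mean = sum(data) / n
    return (tau * prior_mean + n * sample_mean) / (tau + n)
```

With little adaptation data the estimate stays near the prior model; as more variant-speech frames arrive it moves toward the new environment.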

12.
Whispered-speech speaker identification is one of the most demanding tasks in automatic speaker recognition. Because the acoustic characteristics of neutral and whispered speech differ profoundly, the performance of conventional speaker identification systems degrades drastically on whispered speech compared with neutral speech. This work presents a novel speaker identification system for whispered speech based on an innovative learning algorithm, the extreme learning machine (ELM). The features used in the proposed system are instantaneous-frequency features with probability density models. Parametric and nonparametric probability density estimation with ELM are compared with hybrid parametric and nonparametric probability density estimation with ELM (HPNP-ELM) for instantaneous frequency modelling. The experimental results show a significant performance improvement for the proposed whispered-speech speaker identification system.
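The ELM idea, random untrained hidden weights with a closed-form least-squares output layer, can be sketched as follows (a generic ELM regressor, not the paper's HPNP-ELM density modelling):

```python
import numpy as np

def elm_train(X, Y, hidden=20, seed=0):
    """Extreme learning machine: random, untrained input weights and a
    closed-form least-squares solve for the output weights."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], hidden))
    H = np.tanh(X @ W)                 # random hidden-layer features
    beta = np.linalg.pinv(H) @ Y       # least-squares output weights
    return W, beta

def elm_predict(X, W, beta):
    return np.tanh(X @ W) @ beta
```

Because training is a single linear solve rather than iterative back-propagation, ELMs train orders of magnitude faster than gradient-trained networks.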

13.
In this study, we investigate an offline-to-online strategy for speaker adaptation of automatic speech recognition systems. These systems are trained using traditional feed-forward and recently proposed lattice-free maximum mutual information (MMI) time-delay deep neural networks. In this strategy, the test speaker is modelled by an iVector that is estimated offline and then used in an online fashion during decoding. To ensure iVector quality, we introduce a speaker enrollment stage that guarantees sufficient reliable speech for estimating an accurate and stable offline iVector. Furthermore, different iVector estimation techniques are reviewed and investigated for speaker adaptation in large-vocabulary continuous speech recognition (LVCSR) tasks. Experimental results on several real-time speech recognition tasks demonstrate that the proposed strategy not only provides fast decoding but also yields significant reductions in word error rate (WER) compared with traditional iVector-based speaker adaptation frameworks.
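The enrollment idea, estimating one stable offline speaker vector and reusing it at every online decoding step, can be sketched with simple averaging (a stand-in for a real iVector extractor; the function name and representation are illustrative):

```python
def enroll(utterance_vectors):
    """Offline enrollment: average per-utterance speaker vectors into a
    single stable vector (a stand-in for a real iVector extractor),
    estimated once and then reused unchanged during online decoding."""
    n = len(utterance_vectors)
    dim = len(utterance_vectors[0])
    return [sum(u[d] for u in utterance_vectors) / n for d in range(dim)]
```

Decoding then appends this fixed vector to every frame's features, so no per-utterance speaker estimation is needed at test time.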

14.
Motivated by the fact that ethnic-minority speakers within Yunnan speak Mandarin with clearly noticeable ethnic accents, this paper introduces a speech database of accented Mandarin from Yunnan ethnic minorities, built for research on continuous Mandarin speech recognition for non-native speakers. On this database, pronunciation-variation patterns, speaker adaptation and non-native accent recognition are studied, forming an important complement to research on user diversity in Chinese speech recognition.

15.
Solving the speaker recognition problem has important theoretical value and far-reaching practical significance. Building on support vector machine theory, this paper uses SVM classification for the training and testing of a speaker recognition system, and applies wavelet denoising in the preprocessing stage to improve the quality of the speech entering the system. Experiments show that in a speaker recognition system, SVMs combined with wavelet denoising achieve a good recognition rate.
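The wavelet-denoising step can be sketched with soft thresholding of detail coefficients (the wavelet transform itself is omitted, and the paper's wavelet family and threshold rule are not specified):

```python
import math

def soft_threshold(coeffs, t):
    """Soft-threshold wavelet detail coefficients: shrink each one
    toward zero by t, zeroing the small, noise-dominated ones."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]
```

After thresholding, the inverse wavelet transform reconstructs a cleaner signal for the recognizer.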

16.
孙林慧, 叶蕾, 杨震. 《计算机仿真》, 2005, 22(5): 231-234
Test duration is one of the main factors affecting speaker recognition. This paper studies the relation between test duration and speaker identification rate in distributed speech recognition. Using text-independent training templates, a baseline speaker identification system is first tested on clean and noisy speech; the results show that the identification rate rises as the test duration increases, and the best test duration for noisy speech is obtained under laboratory conditions. Then, to shorten the required duration, an improved system first classifies the speaker's gender and then identifies the speaker, which not only reduces the best test duration needed but also improves the system's noise robustness. Finally, the simulation results are analysed.
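The improved two-stage system can be sketched as gender classification followed by identification within that gender (`classify_gender` and `score` are placeholders for the actual classifiers):

```python
def identify(utterance, models, genders, classify_gender, score):
    """Two-stage identification: classify gender first, then score only
    the speaker models of that gender, shrinking the search space."""
    g = classify_gender(utterance)
    candidates = [s for s in models if genders[s] == g]
    return max(candidates, key=lambda s: score(utterance, models[s]))
```

Restricting the search to one gender roughly halves the candidate set, which is why less test speech suffices for a confident decision.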

17.
An approach to the problem of inter-speaker variability in automatic speech recognition is described which exploits systematic vowel differences in a two-stage process of adaptation to individual speaker characteristics. In stage one, an accent identification procedure selects one of four gross regional English accents on the basis of vowel quality differences within four calibration sentences. In stage two, an adjustment procedure shifts the regional reference vowel space onto the speaker's vowel space as calculated from the accent identification data. Results for 58 speakers from the four regional accent areas are presented.

18.
19.
王成儒, 王金甲. 《计算机工程》, 2003, 29(13): 105-106, 114
A training algorithm for probabilistic neural networks based on the minimum classification error (MCE) criterion is proposed. Experimental results show that, in a 20-speaker identification task, the system with its MCE learning algorithm achieves a 98.9% identification rate with 5 s of clean speech and an 86.2% identification rate with 15 s of telephone speech.

20.
Pattern Recognition, 2003, 36(2): 347-359
Speaker verification and utterance verification are examples of techniques that can be used for speaker authentication purposes. Speaker verification consists of accepting or rejecting the claimed identity of a speaker by processing samples of his/her voice. Usually, these systems are based on HMM models that try to represent the characteristics of the speakers' vocal tracts. Utterance verification systems make use of a set of speaker-independent speech models to recognize a certain utterance. If the utterances consist of passwords, this can be used for identity verification purposes. Up to now, both techniques have been used separately. This paper is focused on the problem of how to combine these two sources of information. New architectures are presented to join an utterance verification system and a speaker verification system in order to improve the performance in a speaker verification task.
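One simple way to join the two systems, a linear score fusion, can be sketched as follows (an assumption for illustration: the paper's architectures are more elaborate than a weighted sum):

```python
def fused_decision(sv_score, uv_score, alpha=0.5, threshold=0.0):
    """Accept the claimed speaker when a linear fusion of the speaker-
    verification and utterance-verification scores clears a threshold;
    alpha and threshold would be tuned on development data."""
    return alpha * sv_score + (1.0 - alpha) * uv_score >= threshold
```

A claim then passes only when the voice matches the speaker model and the utterance matches the expected password, as weighted by alpha.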
