1.

方志刚鲍福良叶伟中《计算机工程与科学》2007,29(11):72-75

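Because single-biometric authentication often fails to meet the requirements of practical applications, this paper proposes a multi-channel biometric authentication model based on information fusion. The model uses a PCA-based face recognition method and a speaker recognition method based on MFCC and VQ, and fuses the face and voice channels at the score level with a multi-layer linear classifier. Experimental results show that, with face and speaker recognition rates of 82.6% and 75.9% respectively, the recognition rate after fusing the two channels reaches 92.2%.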
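The score-level fusion mentioned in this abstract can be illustrated with a deliberately simplified sketch. It assumes both matchers already output scores normalized to [0, 1] and replaces the paper's multi-layer linear classifier with a single weighted sum; the function names, weights, and threshold are hypothetical and are not taken from the paper.

```python
def fuse_scores(face_score, voice_score, w_face=0.5, w_voice=0.5):
    """Weighted-sum (linear) fusion of two match scores assumed to lie in [0, 1]."""
    return w_face * face_score + w_voice * voice_score


def accept(face_score, voice_score, threshold=0.5, w_face=0.5, w_voice=0.5):
    """Accept the identity claim when the fused score reaches the threshold.

    The weights and threshold are illustrative assumptions; in practice they
    would be tuned on held-out data.
    """
    return fuse_scores(face_score, voice_score, w_face, w_voice) >= threshold


if __name__ == "__main__":
    # A strong face match can compensate for a weak voice match.
    print(accept(0.9, 0.3))
    # Two weak matches are rejected.
    print(accept(0.2, 0.4))
```

A weighted sum is the simplest linear decision rule at the score level; a multi-layer arrangement would stack several such decisions.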

2.

HMM-Based Gender Recognition (total citations: 2; self-citations: 1; citations by others: 2)

邓英欧贵文《计算机工程与应用》2004,40(15):74-75

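There are many methods for distinguishing male from female speakers, such as GMM and VQ. This paper proposes an HMM-based method for speaker gender recognition, which computes the Mel-frequency cepstral coefficients (MFCC) of the speech signal and uses a hidden Markov model (HMM) to recognize gender. In a laboratory environment, the method was compared with a VQ-based method on speech files from 50 different speakers (half male, half female). In terms of both experimental procedure and results, the HMM method is concluded to be simpler to apply and to achieve a higher recognition rate. On the experimental speech material, the HMM method reaches a recognition rate of 100%.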

3.

Speaker Recognition Based on Distribution Feature Statistics (total citations: 2; self-citations: 2; citations by others: 0)

李邵梅郭云飞卫红权《计算机工程与应用》2009,45(34):118-120

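A definition of speaker distribution features based on a common codebook is given, and a speaker recognition algorithm based on distribution feature statistics is proposed. A common codebook is built from the training speech of all reference speakers to partition the speech feature space, and each reference speaker is modeled by the statistics of how that speaker's training speech is distributed over the common codewords. During recognition, a pairwise sequence alignment method is introduced to match the distribution feature statistics of the test speech against the reference speaker models by similarity, thereby identifying the speaker. Experiments show that the method further increases the speed of VQ-based speaker recognition while preserving the recognition rate.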
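The idea of a distribution feature over a common codebook can be sketched as a normalized histogram of nearest-codeword assignments. This is only an assumed reading of the abstract: the function names are hypothetical, squared Euclidean distance is an assumption, and the paper's sequence-alignment matching step is not reproduced.

```python
def nearest_codeword(vector, codebook):
    """Return the index of the codeword closest to `vector` (squared Euclidean)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist2(vector, codebook[i]))


def distribution_feature(frames, codebook):
    """Fraction of frames that fall on each codeword of the shared codebook."""
    if not frames:
        raise ValueError("at least one feature frame is required")
    counts = [0] * len(codebook)
    for frame in frames:
        counts[nearest_codeword(frame, codebook)] += 1
    return [c / len(frames) for c in counts]


if __name__ == "__main__":
    # Toy two-dimensional "features"; real systems would use e.g. MFCC frames.
    codebook = [(0.0, 0.0), (10.0, 10.0)]
    frames = [(1.0, 0.0), (0.0, 1.0), (9.0, 9.0), (1.0, 1.0)]
    print(distribution_feature(frames, codebook))
```

Because every speaker is described over the same codewords, comparing two speakers reduces to comparing two short vectors, which is consistent with the speed gain the abstract reports.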

4.

A Uyghur Telephone Speech Monitoring System Based on GMM-UBM/SVM

李晓阳伊·达瓦吾守尔·斯拉木勾坂芳典《计算机应用与软件》2012,(1):46-48,77

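This paper discusses a telephone speech monitoring system based on GMM-UBM/SVM. GMM is a commonly used approach in speaker recognition systems. However, the short duration of monitored utterances and the heavy background noise of telephone and Internet terminals and transmission lines degrade the recognition accuracy of GMM. The telephone speech monitoring system is designed to exploit the robustness of GMM and the strong classification performance of SVM on small amounts of static data, and its performance is studied on Uyghur speech. For comparison, recognition with quantization distance (VQ), weighted quantization distance (WVQ), and a baseline system is also discussed. With a training set of 50 target speakers and 20 seconds of speech per speaker, the proposed method improves the recognition rate on 10-second test utterances by 20.2% and 16.7% over the VQ and WVQ methods, respectively.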

5.

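Principles of Confidence Measures and Their Application in Speech Recognition (total citations: 7; self-citations: 2; citations by others: 5)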

刘镜刘加《计算机研究与发展》2000,37(7):882-890

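Because a confidence model can effectively judge how well observed data match the speech models, it can be used to perform hypothesis testing on speech recognition results and to locate errors in them, thereby improving the recognition rate and robustness of the system. The paper discusses the basic principles of confidence measures in speech recognition, [...] methods, and model performance evaluation methods, and gives a fairly comprehensive introduction to the uses of confidence measures in speech recognition. Experimental results show that confidence measures play a significant role in search pruning, speaker adaptation, and rejection and verification.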

6.

Application of Novelty Detection in Speaker Recognition

石艳王晓晔《现代计算机》2008,(7)

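Novelty detection is proposed as a method for clustering speaker feature vectors in speaker recognition. A system model is built from the extracted feature parameters (MFCC and LPCC). Experimental results show that combining novelty detection with VQ to classify feature vectors achieves a higher recognition rate, stronger robustness, and more reliable verification than VQ classification alone.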

7.

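Acoustic Confidence Measures Related to Hidden Markov Model States and State Durations for Rejection in Speech Recognition (total citations: 1; self-citations: 0; citations by others: 1)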

田斌田红心刘丹亭易克初《计算机研究与发展》1999,36(11):1398-1401

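As speech recognition systems continue to move from the laboratory to practical applications, rejection becomes increasingly important. To solve the accept/reject decision problem for recognition candidates, this paper proposes acoustic confidence measure criteria related to states and state durations for speech recognition systems based on hidden Markov models (HMM). For both the average prior observation probability of the feature vectors given the state and the posterior probability of the state given the feature vectors, a uniform rejection threshold is relatively easy to set, and no dedicated training is required. The state-duration-distribution method, based on duration distribution probabilities and confidence interval theory, not only allows a rejection threshold to be set but also yields a state-duration reliability for each recognition candidate. Experiments show that these rejection criteria reject misrecognized candidates and out-of-vocabulary speech (OOV, or non-keywords) well, effectively improving the recognition rate of the system at a relatively low rejection rate.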

8.

Dictionary Adaptation for Non-Native Mandarin Speech Recognition in Xinjiang

李兵虎黄浩《计算机工程与应用》2011,47(21):141-144

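When an acoustic model trained on standard Mandarin speech data is applied to non-native Mandarin speech recognition for Uyghur speakers in Xinjiang, the recognition rate drops sharply because the speakers' Mandarin pronunciation deviates considerably. To address this problem, multi-pronunciation dictionary techniques are applied to Mandarin speech recognition for Uyghur speakers in Xinjiang: the recognizer's errors are analyzed statistically to build a phoneme confusion matrix and obtain pronunciation candidates for each phoneme. A pruning strategy is then used to prune and consolidate the candidates, yielding an alternative dictionary that matches the Mandarin pronunciation patterns of Uyghur speakers. The recognition results of pronunciation dictionaries produced by three pruning methods are compared. Experimental results show that the dictionary produced with the relative maximum pruning strategy significantly improves the recognition rate of the system.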
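The pipeline in this abstract (confusion matrix, candidate extraction, pruning) can be sketched as follows. The abstract does not define its pruning rules, so the rule used here is an assumption: a candidate is kept when its count is at least a fixed fraction of the most frequent candidate for the same reference phoneme. The function names, the `ratio` parameter, and the example phoneme labels are hypothetical.

```python
from collections import Counter, defaultdict


def confusion_counts(aligned_pairs):
    """Count how often each reference phoneme was recognized as each hypothesis.

    `aligned_pairs` is an iterable of (reference, hypothesis) phoneme pairs,
    assumed to come from aligning recognizer output with transcriptions.
    """
    counts = defaultdict(Counter)
    for ref, hyp in aligned_pairs:
        counts[ref][hyp] += 1
    return counts


def pronunciation_candidates(counts, ratio=0.5):
    """Keep hypotheses whose count is at least `ratio` times the row maximum.

    This is an assumed interpretation of "relative maximum" pruning.
    """
    result = {}
    for ref, row in counts.items():
        best = max(row.values())
        kept = [hyp for hyp, n in row.items() if n >= ratio * best]
        # Most frequent first; ties broken alphabetically for determinism.
        result[ref] = sorted(kept, key=lambda h: (-row[h], h))
    return result


if __name__ == "__main__":
    pairs = [("zh", "zh")] * 6 + [("zh", "z")] * 4 + [("zh", "j")]
    print(pronunciation_candidates(confusion_counts(pairs)))
```

Scaling the cutoff to each row's maximum keeps rare phonemes from being pruned merely because their absolute counts are small.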

9.

A Real-Time Speech Recognition Algorithm for Embedded Systems

丁玉国刘加刘润生《数据采集与处理》2005,20(3):302-305

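An isolated-word speech recognition algorithm for embedded systems is presented. The algorithm is based on continuous hidden Markov models; the classic continuous HMM is simplified to suit the characteristics of embedded systems, and real-time medium-vocabulary speech recognition is achieved on a mainstream personal digital assistant (PDA). Maximum a posteriori (MAP) adaptation is used to resolve the mismatch between the channel used to collect the training data and the PDA channel. In back-end processing, a rejection method based on confidence measures is proposed to improve robustness. On a 610-isolated-word recognition task, the equal error rate of the system is below 5%; at an in-vocabulary rejection rate of 5%, the in-vocabulary recognition rate reaches 95%.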

10.

VQ Codebook Design Using a Hybrid Immune Algorithm for Speaker Recognition

许允喜俞一彪《计算机应用》2008,28(2):339-341,

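Vector quantization (VQ) is one of the widely used modeling methods in text-independent speaker recognition, and its main problem is codebook design. Speech feature parameters are high-dimensional data with complex sample distributions, so codebook design is difficult, and the traditional LBG algorithm can obtain only a locally optimal codebook. A new method for VQ codebook design is proposed: niche techniques and the K-means algorithm are integrated into the training process of an immune algorithm to form a hybrid immune algorithm. An improved mutation operator designed for clustering high-dimensional data reduces the blindness of random mutation and strengthens the global and local search ability of the population, while vaccination speeds up convergence. Speaker recognition experiments show that, compared with traditional LBG and with VQ codebook design based on a hybrid genetic algorithm, the method obtains better model parameters and further improves the recognition rate of the system.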
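For orientation, the baseline that this abstract improves on can be sketched as plain K-means codebook training followed by minimum-distortion speaker identification. This is not the hybrid immune algorithm from the paper, and it omits the codeword-splitting step of LBG. The function names are hypothetical, and the naive initialization from the first `k` vectors is an assumption made to keep the example deterministic.

```python
def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(vectors, k, iterations=20):
    """K-means codebook; converges to a local optimum that depends on the init."""
    codebook = [list(v) for v in vectors[:k]]  # naive deterministic init
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda j: dist2(v, codebook[j]))
            clusters[nearest].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the old codeword if its cell is empty
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def average_distortion(vectors, codebook):
    """Mean distance from each vector to its nearest codeword."""
    return sum(min(dist2(v, c) for c in codebook) for v in vectors) / len(vectors)


def identify(vectors, codebooks):
    """Return the speaker whose codebook quantizes the test vectors best."""
    return min(codebooks, key=lambda name: average_distortion(vectors, codebooks[name]))


if __name__ == "__main__":
    train = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
    print(train_codebook(train, k=2))
```

The dependence on the initial codewords is what makes this baseline only locally optimal, which is the weakness that global search methods aim to reduce.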

11.

Comparative analysis of Dysarthric speech recognition: multiple features and robust templates

Revathi Arunachalam Nagakrishnan R. Sasikaladevi N. 《Multimedia Tools and Applications》2022,81(22):31245-31259

Research on recognizing the speech of normal speakers has been under way for many years. Nevertheless, a complete system for recognizing the speech of persons with a speech impairment is still under development. In this work, an isolated digit recognition system is developed to recognize the speech of people affected by dysarthria. Because the speech uttered by dysarthric speakers behaves erratically, developing a robust speech recognition system becomes more challenging; even manual recognition of their speech can be futile. This work analyzes the use of multiple features and speech enhancement techniques in implementing a cluster-based speech recognition system for dysarthric speakers. Speech enhancement techniques are used to improve speech intelligibility or reduce the distortion level of their speech. The system is evaluated using Gamma-tone energy (GFE) features with filters calibrated in different non-linear frequency scales, Stockwell features, modified group delay cepstrum (MGDFC), speech enhancement techniques, and a VQ-based classifier. Decision-level fusion of all features and speech enhancement techniques yields a 4% word error rate (WER) for the speaker with 6% speech intelligibility. Experimental evaluation gives better results than the subjective assessment of the speech uttered by dysarthric speakers. The system is also evaluated for the dysarthric speaker with 95% speech intelligibility: WER is 0% for all digits with decision-level fusion of speech enhancement techniques and GFE features. The system can be used as an assistive tool by caretakers of people affected by dysarthria.

12.

Vector Quantization Based on Clustered Features for Speaker Recognition

徐利敏唐振民何可可钱博《计算机工程与应用》2007,43(27):196-198

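To address the distortion that arises when vector quantization is used for speaker recognition, a method is proposed that combines vector quantization with clustering of speech features according to the pronunciation characteristics of Chinese speech: before the VQ codebook is trained, the feature vectors are first screened by clustering. Experimental results show that, for test speech segments 4 s long and a recognition rate held at about 95%, ordinary vector quantization requires a codebook size of 64 whereas the proposed method requires only 8, a reduction by a factor of 8. The results indicate that the method not only alleviates, to some extent, the distortion caused by insufficient training samples, but also achieves good recognition results with fewer codewords, thereby improving recognition efficiency.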

13.

Statistical feature evaluation for classification of stressed speech

H. Patro G. Senthil Raja S. Dandapat 《International Journal of Speech Technology》2007,10(2-3):143-152

The variations in speech production due to stress have an adverse effect on the performance of speech and speaker recognition algorithms. In this work, different speech features, namely Sinusoidal Frequency Features (SFF), Sinusoidal Amplitude Features (SAF), Cepstral Coefficients (CC) and Mel Frequency Cepstral Coefficients (MFCC), are evaluated to determine their relative effectiveness in representing stressed speech. Different statistical feature evaluation techniques, namely probability density characteristics, the F-ratio test, the Kolmogorov-Smirnov test (KS test) and a Vector Quantization (VQ) classifier, are used to assess the performance of the speech features. Four different stressed conditions, Neutral, Compassionate, Anger and Happy, are tested. The stressed speech database used in this work consists of 600 stressed speech files recorded from 30 speakers. SAF shows the best recognition result, followed by SFF, MFCC and CC, with the VQ classifier. The relative classification results and the relative magnitudes of the F-ratio values for the SFF, MFCC and CC features follow the same order. The SFF and MFCC features show consistent relative performance across all three tests: F-ratio, KS test and VQ classifier.
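The F-ratio named in this abstract is commonly defined for a single scalar feature as the variance of the class means divided by the average within-class variance. The sketch below uses that common definition, which is an assumption, since the abstract does not state the formula the authors used; the function name and the toy data are hypothetical.

```python
from statistics import mean, pvariance


def f_ratio(classes):
    """F-ratio of one scalar feature.

    `classes` is a list of lists, one list of feature values per class
    (for example, per stress condition). Higher values suggest that the
    feature separates the classes better.
    """
    class_means = [mean(values) for values in classes]
    between = pvariance(class_means)                       # spread of class means
    within = mean([pvariance(values) for values in classes])  # average class spread
    return between / within


if __name__ == "__main__":
    well_separated = [[1.0, 2.0, 3.0], [11.0, 12.0, 13.0]]
    overlapping = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
    print(f_ratio(well_separated), f_ratio(overlapping))
```

Ranking features by this value gives an ordering that can be compared with classifier accuracy, which is the kind of consistency check the abstract describes.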

14.

Speaker Recognition Based on Independent Component Analysis and Vector Quantization

屈微刘贺平《计算机应用》2005,25(10):2401-2403

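Independent component analysis (ICA) is used to extract speaker features and is combined with vector quantization (VQ) decision-making to build a high-performance VQ speaker recognition system based on ICA features (ICA VQ). The basis-function coefficients of the speaker's speech features obtained by the ICA transform are used to generate the VQ codebook, and the ICA VQ codebook distortion measure, which includes energy distortion, and the centroid condition are derived to produce the final decision. In simulation experiments, the ICA-extracted features are used in different systems for the speaker verification task. A comparison of the systems' DET curves confirms the advantage of VQ for classification decisions on ICA features, and a comparison of equal error rates (EER) across codebook sizes demonstrates the effectiveness of the VQ codebook design.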
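The equal error rate used for comparison in this abstract is the operating point at which the false acceptance rate equals the false rejection rate. A minimal sketch of estimating it from verification scores follows. It assumes that higher scores mean a better match and that only observed scores are tried as thresholds, without interpolation; the function name and the sample scores are hypothetical.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER by sweeping thresholds over the observed scores.

    A trial is accepted when its score is >= the threshold. Returns a tuple
    (eer, threshold) for the threshold where the false acceptance rate and
    the false rejection rate are closest.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


if __name__ == "__main__":
    genuine_scores = [0.9, 0.8, 0.7, 0.4]
    impostor_scores = [0.6, 0.3, 0.2, 0.1]
    print(equal_error_rate(genuine_scores, impostor_scores))
```

With few trials the two error rates rarely cross exactly, so averaging them at the closest threshold is a pragmatic estimate rather than an exact value.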

15.

A strategic approach to recognize the speech of the children with hearing impairment: different sets of features and models

Arunachalam Revathi 《Multimedia Tools and Applications》2019,78(15):20787-20808

Automatic speech recognition systems have been developed and tested for recognizing the speech of normal speakers in various languages. This paper emphasizes the need for a more challenging speaker-independent speech recognition system that recognizes the speech uttered by any Hearing Impaired (HI) speaker. In this work, Gamma tone energy features with filters spaced on the equivalent rectangular bandwidth (ERB), MEL and BARK scales, together with MFPLPC features, are used at the front end, and vector quantization (VQ) and multivariate hidden Markov models (MHMM) are used at the back end. The performance of the system is compared for three modeling techniques, VQ, FCM (Fuzzy C means) clustering and MHMM, on the recognition of isolated digits and simple continuous sentences in Tamil. Recognition accuracy (RA) is 81.5% for the speaker-independent isolated digit recognition system when the speech of eight speakers is used for training and the speech of the remaining two speakers for testing. When 90% of the data is used for training and 10% for testing, accuracy is 91% and 87.5% for the speaker-independent isolated digit and continuous speech recognition systems, respectively. Accuracy can be further improved with a more extensive database for creating models and templates. Receiver operating characteristic (ROC) curves of True Positive Rate against False Positive Rate are used to assess the performance of the system for HI speakers. The system can be used to understand the speech uttered by any hearing impaired speaker and so helps provide them with necessary assistance, which in turn can improve the social standing and confidence of hearing impaired people.

16.

A Speaker Recognition Method Based on K-SVD

马振张雄伟杨吉斌《计算机工程与应用》2012,48(34):112-115,135

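To fully extract speaker-specific information from speech, a speaker recognition method based on K-means singular value decomposition (K-SVD) is proposed by analogy with vector quantization. A dictionary trained with K-SVD preserves the speaker-specific information in the speech signal well. Exploiting this property, a dictionary containing the speaker-specific information is extracted from the training data with K-SVD and used to recognize the speaker. Compared with traditional methods, the method makes better use of the sparsity of speech to preserve speaker-specific information and reduces reconstruction error. Simulation results show that, compared with VQ-based speaker recognition, the method achieves a better recognition rate with many speakers and has greater practical value.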

17.

Nonlinear normalization of input patterns to speaker variability in speech recognition neural networks

Isar Nejadgholi Seyyed Ali Seyyedsalehi 《Neural computing & applications》2009,18(1):45-55

Input variability resulting from speaker changes is one of the most crucial factors influencing the effectiveness of speech recognition systems. One solution to this problem is adaptation or normalization of the input, so that all the parameters of the input representation are adapted to those of a single speaker and the input pattern is normalized against speaker changes before recognition. This paper proposes three such methods, in which some effects of speaker changes on the speech recognition process are compensated. In all three methods, a feed-forward neural network is first trained to map the input into codes representing the phonetic classes and speakers. Then, among the 71 speakers used in training, the one with the highest phone recognition accuracy is selected as the reference speaker, and the representation parameters of the other speakers are converted to the corresponding speech uttered by the reference speaker. In the first method, the error back-propagation algorithm is used to find the optimal point of the decision region of each phone of each speaker in the input space. The distances between these points and the corresponding points of the reference speaker are used to offset the effects of speaker change and to adapt the input signal to the reference speaker. In the second method, using the error back-propagation algorithm and keeping the reference speaker data as the desired output, all the speech signal frames, in both the training and the test datasets, are corrected so that they coincide with the corresponding speech of the reference speaker. In the third method, another feed-forward neural network is applied inversely to map the phonetic classes and speaker information to the input representation. The phonetic output retrieved from the direct network, along with the reference speaker data, is given to the inverse network. Using this information, the inverse network yields an estimate of the input representation adapted to the reference speaker. In all three methods, the final speech recognition model is trained on the adapted training data and tested on the adapted test data. Implementing these methods and combining the final network results with those of the un-adapted network based on the highest confidence level yields increases of 2.1%, 2.6% and 3% in phone recognition accuracy on clean speech for the three methods, respectively.

18.

A Speech Recognition Model Based on Auditory Perception and Probabilistic Neural Networks

张晓俊陶智顾济华赵鹤鸣施晓敏《计算机工程与应用》2007,43(19):30-31

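A speech recognition model based on the Bark wavelet transform and a probabilistic neural network (PNN) is proposed. A Bark filter bank matching the characteristics of human hearing is used to reconstruct the signal and extract speech features, which are then recognized by a trained PNN. A speech recognition library is built by training on a large number of speech samples, and an integrated recognition system is established. Experimental results show that, compared with the traditional LPCC/DTW and MFCC/DWT methods, the recognition rate improves by 14.9% and 10.1% respectively, reaching 96.9%.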