Similar Documents
20 similar documents found.
1.
Speaker recognition, being convenient, inexpensive and easily accepted, has become an important and widespread means of user authentication in everyday life and work, but in embedded applications existing algorithms have difficulty meeting real-time requirements. This paper studies the nonlinear partitioning algorithm used in speech recognition, improves on its idea, and applies a block-by-block comparison scheme to embedded text-dependent speaker recognition. Compared with the conventional dynamic time warping (DTW) approach, it achieves good practical real-time performance.

2.
A speaker-dependent speech recognition system for automatic endoscope positioning is proposed: the endoscope is positioned by recognizing a specific physician's spoken control commands, providing a more intelligent solution for hand-held endoscope operation. For recognition, a Normalized Average Dynamic Time Warping (NA-DTW) algorithm that averages normalized reference templates is proposed to obtain a higher recognition rate; the system uses the Windows CE operating system and an ARM processor as its software and hardware platform. In experiments on 1,250 sets of test data from 10 different speakers, NA-DTW raised the recognition rate from 96.6% to 99.76% compared with conventional DTW and reduced computation time from 469 ms to 241 ms, confirming that NA-DTW can perform speaker-dependent, isolated-word speech recognition and meet the real-time constraints of an embedded system.
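The NA-DTW variant itself is not spelled out in the abstract; as a baseline for comparison, a minimal classic DTW distance between two feature sequences (function and variable names are illustrative, not taken from the paper) might look like this:

```python
import numpy as np

def dtw_distance(x, y):
    """Plain DTW between feature sequences x (n, d) and y (m, d),
    using Euclidean frame distances and the standard 3-way recursion."""
    n, m = len(x), len(y)
    cost = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)  # pairwise frame distances
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j],      # insertion
                                                 acc[i, j - 1],      # deletion
                                                 acc[i - 1, j - 1])  # match
    return acc[n, m] / (n + m)  # length-normalised path cost

# Toy usage: compare a test utterance against per-command reference templates.
# templates = {"up": feats_up, "down": feats_down, ...}
# best = min(templates, key=lambda name: dtw_distance(test_feats, templates[name]))
```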

3.
Obtaining training material for rarely used English words and common given names from countries where English is not spoken is difficult due to excessive time, storage and cost factors. By considering pe...

4.
Compared with performing speaker recognition on speech reconstructed by decoding, extracting feature parameters directly from the VoIP voice stream has the advantage of being easy to implement. For G.729 codec-domain data, a fast speaker recognition method based on the DTW algorithm is studied. Experimental results show that, in the speaker recognition task considered, DTW achieves large gains in both recognition accuracy and efficiency over GMM.

5.
We present a new modeling approach for speaker recognition that uses the maximum-likelihood linear regression (MLLR) adaptation transforms employed by a speech recognition system as features for support vector machine (SVM) speaker models. This approach is attractive because, unlike standard frame-based cepstral speaker recognition models, it normalizes for the choice of spoken words in text-independent speaker verification without data fragmentation. We discuss the basics of the MLLR-SVM approach, and show how it can be enhanced by combining transforms relative to multiple reference models, with excellent results on recent English NIST evaluation sets. We then show how the approach can be applied even if no full word-level recognition system is available, which allows its use on non-English data even without matching speech recognizers. Finally, we examine how two recently proposed algorithms for intersession variability compensation perform in conjunction with MLLR-SVM.
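As a rough illustration of the MLLR-SVM idea, assuming the speech recognizer exposes per-utterance MLLR transform matrices, one could flatten them into a supervector and train an SVM speaker model. Everything named below is a hypothetical stand-in, not the authors' code:

```python
import numpy as np
from sklearn.svm import SVC

def mllr_supervector(transforms):
    """Stack the MLLR transform matrices estimated for one utterance
    (e.g. one [d, d+1] affine transform per regression class) into a
    single fixed-length feature vector."""
    return np.concatenate([A.ravel() for A in transforms])

# Hypothetical data: per-utterance MLLR transforms from a recognizer,
# labelled 1 for the target speaker and 0 for impostors.
# X = np.stack([mllr_supervector(t) for t in utterance_transforms])
# y = np.array(labels)
# model = SVC(kernel="linear").fit(X, y)
# score = model.decision_function(X_test)   # verification score per test utterance
```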

6.
The process of counting stuttering events could be carried out more objectively through the automatic detection of stop-gaps, syllable repetitions and vowel prolongations. The alternative would be based on the subjective evaluations of speech fluency and may be dependent on a subjective evaluation method. Meanwhile, the automatic detection of intervocalic intervals, stop-gaps, voice onset time and vowel durations may depend on the speaker, and rules derived for a single speaker might be unreliable when treated as universal. This implies that learning algorithms with strong generalization capabilities could be applied to solve the problem. Nevertheless, such a system requires vectors of parameters which characterize the distinctive features in a subject's speech patterns. In addition, an appropriate selection of the parameters and feature vectors while learning may augment the performance of an automatic detection system.

The paper reports on automatic recognition of stuttered speech in normal and frequency altered feedback speech. It presents several methods of analyzing stuttered speech and describes attempts to establish those parameters that represent stuttering events. It also reports results of some experiments on automatic detection of speech disorder events that were based on both rough sets and artificial neural networks.
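A minimal sketch of the neural-network branch of such a detector, assuming hand-crafted parameter vectors (stop-gap durations, repetition counts, voice onset times, etc.) are already available; the random arrays below merely stand in for real acoustic measurements:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Placeholder feature vectors per speech segment, e.g.
# [stop-gap duration, syllable-repetition count, vowel duration, voice onset time, ...]
# with label 1 = stuttering event, 0 = fluent speech.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))                     # stands in for measured parameters
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)     # stands in for annotated labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
clf.fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```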

7.
Humans are quite adept at communicating in presence of noise. However most speech processing systems, like automatic speech and speaker recognition systems, suffer from a significant drop in performance when speech signals are corrupted with unseen background distortions. The proposed work explores the use of a biologically-motivated multi-resolution spectral analysis for speech representation. This approach focuses on the information-rich spectral attributes of speech and presents an intricate yet computationally-efficient analysis of the speech signal by careful choice of model parameters. Further, the approach takes advantage of an information-theoretic analysis of the message and speaker dominant regions in the speech signal, and defines feature representations to address two diverse tasks: speech recognition and speaker recognition. The proposed analysis surpasses the standard Mel-Frequency Cepstral Coefficients (MFCC) and its enhanced variants (via mean subtraction, variance normalization and time sequence filtering), and yields significant improvements over a state-of-the-art noise robust feature scheme, on both speech and speaker recognition tasks.
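The MFCC baselines mentioned above rely on per-utterance mean subtraction and variance normalization (CMVN); a minimal version of that normalization step, applied to any frames-by-coefficients feature matrix, is sketched below:

```python
import numpy as np

def cmvn(features, eps=1e-8):
    """Cepstral mean and variance normalisation over time for one utterance.
    features: array of shape (n_frames, n_coeffs), e.g. MFCCs."""
    mu = features.mean(axis=0, keepdims=True)
    sigma = features.std(axis=0, keepdims=True)
    return (features - mu) / (sigma + eps)

# usage: mfcc_norm = cmvn(mfcc)   # mfcc computed by any front end, frames x coeffs
```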

8.
Despite their known weaknesses, hidden Markov models (HMMs) have been the dominant technique for acoustic modeling in speech recognition for over two decades. Still, the advances in the HMM framework have not solved its key problems: it discards information about time dependencies and is prone to overgeneralization. In this paper, we attempt to overcome these problems by relying on straightforward template matching. The basis for the recognizer is the well-known DTW algorithm. However, classical DTW continuous speech recognition results in an explosion of the search space. The traditional top-down search is therefore complemented with a data-driven selection of candidates for DTW alignment. We also extend the DTW framework with a flexible subword unit mechanism and a class-sensitive distance measure, two components suggested by state-of-the-art HMM systems. The added flexibility of the unit selection in the template-based framework leads to new approaches to speaker and environment adaptation. The template matching system reaches a performance somewhat worse than the best published HMM results for the Resource Management benchmark, but thanks to the complementarity of errors between the HMM and DTW systems, the combination of both reduces the word error rate by 17% compared to the HMM results.

9.
A text-dependent speaker recognition method based on MFCC and LPCC
于明  袁玉倩  董浩  王哲 《计算机应用》2006,26(4):883-885
In modeling for speaker recognition, a variance component is added to the codewords of the traditional vector quantization (VQ) model, yielding a new VQ model with a continuous codeword distribution. Mel-frequency cepstral coefficients and their deltas, combined with linear prediction cepstral coefficients and their deltas, are used as the recognition features for text-dependent speaker recognition. Comparison with the dynamic time warping algorithm and the traditional VQ method shows that the model improves the recognition rate somewhat without noticeably increasing system response time.
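A sketch of a variance-augmented VQ codebook in the spirit described above, assuming the MFCC+LPCC feature frames are already extracted; the training loop and distortion measure are illustrative simplifications, not the paper's exact model:

```python
import numpy as np

def train_codebook(frames, k=16, iters=20, seed=0):
    """K-means style codebook where each codeword keeps a mean and a
    per-dimension variance, giving a 'continuous codeword distribution'."""
    rng = np.random.default_rng(seed)
    means = frames[rng.choice(len(frames), k, replace=False)].astype(float)
    varis = np.ones_like(means)
    for _ in range(iters):
        # variance-weighted (Mahalanobis-like) distance to every codeword
        d = (((frames[:, None, :] - means[None]) ** 2) / varis[None]).sum(-1)
        lab = d.argmin(1)
        for j in range(k):
            members = frames[lab == j]
            if len(members):
                means[j] = members.mean(0)
                varis[j] = members.var(0) + 1e-3
    return means, varis

def distortion(frames, means, varis):
    """Average quantisation distortion of an utterance against a speaker codebook."""
    d = (((frames[:, None, :] - means[None]) ** 2) / varis[None]).sum(-1)
    return d.min(1).mean()
```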

10.
Using hardware/software co-design, a real-time SoPC-based speaker recognition controller was implemented on a DE2-70 development board; the controller shows good real-time behavior and good recognition performance. It uses linear prediction cepstral coefficients (LPCC) as speech features and dynamic time warping (DTW) as the matching algorithm.
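A plain floating-point sketch of LPCC extraction for one windowed frame, using the autocorrelation method, the Levinson-Durbin recursion and the standard LPC-to-cepstrum recursion; the fixed-point, hardware-oriented details of the controller are not reproduced here:

```python
import numpy as np

def lpcc(frame, order=12, n_ceps=12):
    """LPC cepstral coefficients for one windowed speech frame:
    autocorrelation -> Levinson-Durbin -> LPC-to-cepstrum recursion."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = np.zeros(order + 1)          # a[1..p] are the predictor coefficients
    err = r[0] + 1e-8                # small offset guards against silent frames
    for i in range(1, order + 1):
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_prev = a.copy()
        a[i] = k
        a[1:i] = a_prev[1:i] - k * a_prev[i - 1:0:-1]
        err *= (1.0 - k * k)
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        c[n] = a[n] if n <= order else 0.0
        for m in range(max(1, n - order), n):
            c[n] += (m / n) * c[m] * a[n - m]
    return c[1:]
```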

11.
This paper describes the design and implementation of an embedded speech recognition system for speaker-dependent, isolated-word recognition. The hardware core is a Virtex-II Pro50 FPGA chip with a PowerPC405 hard-core processor. The preprocessing, endpoint detection and LPCC feature extraction stages are implemented in fixed-point arithmetic; the DTW algorithm is implemented as a hardware IP core; overall scheduling is interrupt-driven. In experiments using the system for voice control of the AIBO toy robot dog, the recognition rate reached 98.3%. The design meets the recognition-rate and real-time requirements of consumer entertainment devices such as toys and games, and has broad market potential.
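A minimal software sketch of the kind of frame-energy endpoint detection such systems typically use, assuming the leading frames of the recording are silence; thresholds and frame sizes are illustrative:

```python
import numpy as np

def detect_endpoints(signal, sr, frame_ms=25, hop_ms=10,
                     energy_factor=4.0, min_speech_frames=5):
    """Energy-based endpoint detection: frames whose short-time energy
    exceeds a multiple of the estimated noise floor are marked as speech."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n = 1 + max(0, (len(signal) - frame) // hop)
    energy = np.array([np.sum(signal[i * hop:i * hop + frame] ** 2)
                       for i in range(n)])
    noise_floor = np.mean(energy[:10])           # assumes leading frames are silence
    speech = energy > energy_factor * noise_floor
    idx = np.flatnonzero(speech)
    if len(idx) < min_speech_frames:
        return None                               # no speech found
    start, end = idx[0] * hop, idx[-1] * hop + frame
    return start, end                             # sample indices of the utterance
```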

12.
A new template training algorithm combining LBG and DTW is proposed, consisting of three parts: template training, initial template setting, and empty-subset handling, which together solve the template training problem in speech recognition completely and effectively. The algorithm clusters the speech feature matrices and generates their centroids, making an isolated-word recognition system better suited to speaker-independent use and improving its recognition rate for speakers outside the training set. A recognition system was designed and implemented; the fast convergence of template training and the system's high recognition rate confirm the algorithm's good performance.
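The paper clusters whole feature matrices under a DTW distance; for brevity, the sketch below shows only the LBG splitting and empty-subset handling, applied to fixed-length vectors with a Euclidean distance and a codebook size assumed to be a power of two:

```python
import numpy as np

def lbg_codebook(vectors, size=16, eps=0.01, iters=30):
    """LBG: start from the global centroid, repeatedly split each codeword
    into a perturbed pair, refine with k-means, and re-seed empty cells."""
    book = vectors.mean(axis=0, keepdims=True)
    rng = np.random.default_rng(0)
    while len(book) < size:
        book = np.concatenate([book * (1 + eps), book * (1 - eps)])  # split step
        for _ in range(iters):                                        # k-means refinement
            d = np.linalg.norm(vectors[:, None, :] - book[None], axis=-1)
            lab = d.argmin(1)
            for j in range(len(book)):
                members = vectors[lab == j]
                if len(members):
                    book[j] = members.mean(0)
                else:                    # empty subset: re-seed with a random training vector
                    book[j] = vectors[rng.integers(len(vectors))]
    return book
```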

13.
This work explores the use of speech enhancement for degraded speech, which may be useful for a text-dependent speaker verification system. The degradation may be due to noise or background speech. The text-dependent speaker verification is based on the dynamic time warping (DTW) method, so endpoint detection is necessary. Endpoint detection can be performed easily if the speech is clean; however, degradation tends to cause errors in the estimation of the end points, and this error propagates into the overall accuracy of the speaker verification system. Temporal and spectral enhancement is performed on the degraded speech so that, ideally, the enhanced speech resembles clean speech. Results show that the temporal and spectral processing methods do contribute to the task by removing the degradation, and improved accuracy is obtained for the text-dependent speaker verification system using DTW.
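A basic magnitude spectral-subtraction routine of the kind used for spectral enhancement, assuming the first fraction of a second of the recording is noise only; the parameters and overlap-add details are illustrative, not the authors' exact processing:

```python
import numpy as np

def spectral_subtraction(signal, sr, noise_dur=0.25, frame=512, hop=256,
                         over_sub=2.0, floor=0.02):
    """Estimate the noise magnitude spectrum from a leading noise-only segment
    and subtract it from every frame's magnitude, keeping the noisy phase."""
    win = np.hanning(frame)
    spectra, phases = [], []
    for i in range(0, len(signal) - frame, hop):
        spec = np.fft.rfft(signal[i:i + frame] * win)
        spectra.append(np.abs(spec))
        phases.append(np.angle(spec))
    spectra, phases = np.array(spectra), np.array(phases)
    n_noise = int(noise_dur * sr) // hop
    noise_mag = spectra[:n_noise].mean(axis=0)            # assumes leading frames are noise
    clean_mag = np.maximum(spectra - over_sub * noise_mag, floor * spectra)
    out = np.zeros(len(signal))                            # overlap-add resynthesis
    for k, i in enumerate(range(0, len(signal) - frame, hop)):
        out[i:i + frame] += np.fft.irfft(clean_mag[k] * np.exp(1j * phases[k])) * win
    return out
```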

14.

The revolution in speaker recognition has led to the inclusion of speaker recognition modules in several commercial products. Most published algorithms for speaker recognition focus on text-dependent speaker recognition. In contrast, text-independent speaker recognition is more advantageous as the client can talk freely to the system. In this paper, text-independent speaker recognition is considered in the presence of degradation effects such as noise and reverberation. Mel-Frequency Cepstral Coefficients (MFCCs), spectrum and log-spectrum are used for feature extraction from the speech signals. These features are processed with a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) as a classification tool to complete the speaker recognition task. The network learns to recognize speakers efficiently in a text-independent manner when the recording circumstances are the same. The recognition rate reaches 95.33% using MFCCs, and increases to 98.7% when using the spectrum or log-spectrum. However, the system has some difficulty recognizing speakers across different recording environments. Hence, different speech enhancement techniques, such as spectral subtraction and wavelet denoising, are used to improve the recognition performance to some extent. The proposed approach shows superiority when compared to the algorithm of R. Togneri and D. Pullella (2011).
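A compact PyTorch sketch of an LSTM speaker classifier over per-frame features such as MFCCs; the layer sizes, number of speakers and random input are placeholders, not the paper's configuration:

```python
import torch
import torch.nn as nn

class SpeakerLSTM(nn.Module):
    """Text-independent speaker classifier: an LSTM reads a sequence of
    per-frame features and its final hidden state is mapped to per-speaker logits."""
    def __init__(self, n_feats=13, hidden=64, n_speakers=10):
        super().__init__()
        self.lstm = nn.LSTM(n_feats, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_speakers)

    def forward(self, x):               # x: (batch, frames, n_feats)
        _, (h, _) = self.lstm(x)
        return self.out(h[-1])          # logits: (batch, n_speakers)

# Toy usage with random data standing in for MFCC sequences.
model = SpeakerLSTM()
feats = torch.randn(4, 200, 13)         # 4 utterances, 200 frames, 13 coefficients
logits = model(feats)
pred = logits.argmax(dim=1)             # predicted speaker index per utterance
```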

15.
The fine spectral structure related to pitch information is conveyed in Mel cepstral features, with variations in pitch causing variations in the features. For speaker recognition systems, this phenomenon, known as "pitch mismatch" between training and testing, can increase error rates. Likewise, pitch-related variability may potentially increase error rates in speech recognition systems for languages such as English in which pitch does not carry phonetic information. In addition, for both speech recognition and speaker recognition systems, the parsing of the raw speech signal into frames is traditionally performed using a constant frame size and a constant frame offset, without aligning the frames to the natural pitch cycles. As a result, the power spectral estimation that is done as part of the Mel cepstral computation may include artifacts. Pitch synchronous methods have addressed this problem in the past, at the expense of adding some complexity by using a variable frame size and/or offset. This paper introduces Pseudo Pitch Synchronous (PPS) signal processing procedures that attempt to align each individual frame to its natural cycle and avoid truncation of pitch cycles while still using constant frame size and frame offset, in an effort to address the above problems. Text-independent speaker recognition experiments performed on NIST speaker recognition tasks demonstrate a performance improvement when the scores produced by systems using PPS are fused with traditional speaker recognition scores. In addition, a better distribution of errors across trials may be obtained for similar error rates, and some insight regarding the role of the fundamental frequency in speaker recognition is revealed. Speech recognition experiments run on the Aurora-2 noisy digits task also show improved robustness and better accuracy for extremely low signal-to-noise ratio (SNR) data.
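One possible reading of the PPS framing step, given externally estimated pitch marks: each nominal frame start on the constant grid is snapped to the nearest pitch mark so that the analysis window begins on a pitch cycle, while the frame size and nominal offset stay constant. This is an interpretation for illustration only, not the authors' exact procedure:

```python
import numpy as np

def pps_frame_starts(n_samples, frame, hop, pitch_marks):
    """For each nominal frame start on a constant grid, snap to the nearest
    estimated pitch mark; frame size and nominal offset remain constant."""
    pitch_marks = np.asarray(pitch_marks)
    starts = []
    for s in range(0, n_samples - frame, hop):
        if len(pitch_marks):
            s = int(pitch_marks[np.argmin(np.abs(pitch_marks - s))])
        starts.append(min(s, n_samples - frame))
    return starts
```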

16.
In this paper we present results of unsupervised cross-lingual speaker adaptation applied to text-to-speech synthesis. The application of our research is the personalisation of speech-to-speech translation, in which we employ an HMM statistical framework for both speech recognition and synthesis. This framework provides a logical mechanism to adapt synthesised speech output to the voice of the user by way of speech recognition. In this work we present results of several different unsupervised and cross-lingual adaptation approaches as well as an end-to-end speaker-adaptive speech-to-speech translation system. Our experiments show that we can successfully apply speaker adaptation in both unsupervised and cross-lingual scenarios and our proposed algorithms seem to generalise well for several language pairs. We also discuss important future directions including the need for better evaluation metrics.

17.
Design of an isolated-word speech recognition algorithm for a system-on-chip
This paper introduces an isolated-word speech recognition system and selects recognition algorithms suited to a system-on-chip. Frame-based endpoint detection, linear prediction cepstral coefficient (LPCC) extraction, and dynamic time warping (DTW) are analyzed and designed. The work has theoretical and practical significance for developing new speech recognition SoC chips and for advancing research on system-on-a-programmable-chip (SoPC) designs.

18.
Recent theoretical developments in neuroscience suggest that sublexical speech processing occurs via two parallel processing pathways. According to this Dual Stream Model of Speech Processing speech is processed both as sequences of speech sounds and articulations. We attempt to revise the “beads-on-a-string” paradigm of Hidden Markov Models in Automatic Speech Recognition (ASR) by implementing a system for dual stream speech recognition. A baseline recognition system is enhanced by modeling of articulations as sequences of syllables. An efficient and complementary model to HMMs is developed by formulating Dynamic Time Warping (DTW) as a probabilistic model. The DTW Model (DTWM) is improved by enriching syllable templates with constrained covariance matrices, data imputation, clustering and mixture modeling. The resulting dual stream system is evaluated on the N-Best Southern Dutch Broadcast News benchmark. Promising results are obtained for DTWM classification and ASR tests. We provide a discussion on the remaining problems in implementing dual stream speech recognition.

19.
To examine the role of Gaussian mixture models in speaker recognition, a GMM-based speaker recognition system was designed. The system consists of four modules: audio signal preprocessing, voice activity detection, speaker model building, and audio signal recognition. The first three modules form the model training part of the system, and the last forms the recognition part. The novelty of the work is the GMM-based voice activity detector contained in the second module. Audio-visual meetings from the Augmented Multi-party Interaction meeting corpus were used to test some of the system's tunable parameters and its recognition error rate. Simulation results show that, with the help of the voice activity detector and several filtering algorithms, the system reaches a recognition accuracy of 83.02% on audio containing overlapping speech.
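A minimal GMM speaker-identification back end along these lines, using scikit-learn; the surrounding module structure (preprocessing, voice activity detection, filtering) described in the abstract is not reproduced:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_speaker_gmms(train_feats, n_components=16):
    """train_feats: dict mapping speaker -> (n_frames, n_dims) feature matrix.
    Fits one diagonal-covariance GMM per speaker."""
    return {spk: GaussianMixture(n_components, covariance_type="diag",
                                 random_state=0).fit(X)
            for spk, X in train_feats.items()}

def identify(gmms, test_feats):
    """Return the speaker whose GMM gives the highest average log-likelihood."""
    return max(gmms, key=lambda spk: gmms[spk].score(test_feats))
```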

20.
Research on a PAC speaker recognition method based on sub-band processing
Speaker recognition systems now perform well on clean speech, but their performance degrades sharply in noisy environments. A sub-band speaker recognition method is presented that uses phase autocorrelation (PAC) coefficients and their energy as features: the wideband speech signal is split into several sub-band signals by a Mel filter bank, PAC coefficients are extracted from each sub-band after a DCT, an HMM is then built for each sub-band separately for recognition, and finally the per-sub-band HMM results are combined at the recognition probability level to obtain the final decision. Experiments show that the method greatly improves recognition performance both under noise at various signal-to-noise ratios and in noise-free conditions.
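PAC coefficients are often described as the arccosine of the normalized (cyclic) autocorrelation of a frame; a rough per-frame sketch under that assumption, without the Mel filter-bank/DCT sub-band pipeline or the HMM back end:

```python
import numpy as np

def pac_coefficients(frame, n_lags=12):
    """Phase autocorrelation of one frame, taken here as the angle between the
    frame and its cyclically shifted copy: arccos of the normalised
    autocorrelation (a common formulation; the paper's exact variant may differ)."""
    frame = np.asarray(frame, dtype=float)
    r0 = np.dot(frame, frame)
    if r0 == 0.0:
        return np.zeros(n_lags)
    pac = []
    for k in range(1, n_lags + 1):
        rk = np.dot(frame, np.roll(frame, k)) / r0   # normalised cyclic autocorrelation
        pac.append(np.arccos(np.clip(rk, -1.0, 1.0)))
    return np.array(pac)
```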
