Similar Literature
20 similar documents retrieved.
1.
A speech feature extraction method based on wavelet modulation scales (cited 3 times: 0 self-citations, 3 by others)
马昕  杜利民 《计算机应用》2005,25(6):1342-1344
Building on the theory of time-frequency analysis, a feature extraction method based on wavelet modulation-scale features is proposed. Guided by human perception of modulation-spectrum information and the characteristics of interference in the modulation spectrum, wavelet analysis and normalization are used to obtain normalized wavelet modulation-scale feature parameters, which serve as dynamic speech features in a speech recognition system. Mandarin syllable recognition experiments comparing the method with first- and second-order MFCC derivatives show that it outperforms them under noise interference and variations in speaking rate, offering a new route to more robust speech recognition.
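A rough Python sketch of the general idea (not the authors' exact algorithm): mel-band log-energy trajectories are decomposed across time with a discrete wavelet transform (PyWavelets), and the normalized per-scale energies serve as modulation-scale features. The mel and wavelet settings are illustrative assumptions.

```python
import numpy as np
import librosa
import pywt

def wavelet_modulation_scale_features(y, sr, n_mels=24, wavelet="db4", level=3):
    # Mel-band log energies: shape (n_mels, n_frames)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = np.log(mel + 1e-10)
    feats = []
    for band in log_mel:
        # Wavelet decomposition of the temporal trajectory of one mel band
        coeffs = pywt.wavedec(band, wavelet, level=level)
        # Energy at each modulation scale (approximation + detail bands)
        scale_energy = np.array([np.sum(c ** 2) for c in coeffs])
        # Normalize so the feature is insensitive to overall level
        feats.append(scale_energy / (np.sum(scale_energy) + 1e-10))
    return np.concatenate(feats)  # shape: (n_mels * (level + 1),)

y = librosa.tone(440, sr=16000, duration=2.0)   # toy signal in place of real speech
print(wavelet_modulation_scale_features(y, 16000).shape)
```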

2.
This work studies the extraction of dynamic speech feature parameters. In speaker recognition, dynamic features can effectively raise the recognition rate, but conventional extraction algorithms retain a large amount of interfering and redundant information, which lowers both accuracy and speed. To address this, a dynamic time-frequency cepstral coefficient is proposed for the speaker recognition system. Without reducing the distribution characteristics that reflect individual speaker traits, the method removes redundant information and lowers the feature dimensionality. The extracted features are fed to a Gaussian mixture model-universal background model (GMM-UBM) for speaker classification. Simulations in Matlab show that dynamic time-frequency cepstral coefficients effectively improve the recognition accuracy of the speaker recognition system.
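A minimal GMM-UBM sketch with scikit-learn, under simplifying assumptions: speaker models are re-trained on speaker data (initialized from the UBM) rather than MAP-adapted, and random vectors stand in for the dynamic time-frequency cepstral coefficients.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
ubm_feats = rng.normal(size=(2000, 20))                 # pooled background features
spk_feats = {"spk1": rng.normal(0.5, 1.0, (300, 20)),
             "spk2": rng.normal(-0.5, 1.0, (300, 20))}

# Universal background model trained on pooled data
ubm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0).fit(ubm_feats)

# Per-speaker models, initialized from the UBM means for stability
models = {}
for name, X in spk_feats.items():
    gmm = GaussianMixture(n_components=8, covariance_type="diag",
                          means_init=ubm.means_, random_state=0)
    models[name] = gmm.fit(X)

test = rng.normal(0.5, 1.0, (100, 20))                  # unknown utterance features
# Average log-likelihood ratio against the UBM; the highest score wins
scores = {name: m.score(test) - ubm.score(test) for name, m in models.items()}
print(max(scores, key=scores.get))
```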

3.
In a text-dependent speaker recognition system, both the speaker's identity and the spoken text must be recognized, so the choice of speech feature parameters is critical. Most conventional recognition systems use MFCC parameters as the features. This paper analyzes speech feature parameters and experiments with different feature combinations. The results show that, in this system, combining MFCC parameters with pitch parameters improves the recognition rate.
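A hedged sketch of one way to combine MFCC and pitch features frame by frame with librosa; the paper does not specify its exact combination scheme, so this simply appends a per-frame YIN F0 estimate to each MFCC vector.

```python
import numpy as np
import librosa

sr, hop = 16000, 256
y = librosa.tone(220, sr=sr, duration=1.0)              # toy signal in place of speech

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=hop)   # (13, T1)
f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr,
                 frame_length=1024, hop_length=hop)                   # (T2,)

# The two analyses can differ by a frame or two; truncate to the common length
T = min(mfcc.shape[1], f0.shape[0])
combined = np.vstack([mfcc[:, :T], f0[None, :T]])                     # (14, T)
print(combined.shape)
```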

4.
To address the drop in recognition rate of Mel-frequency cepstral coefficient (MFCC) features in noisy environments, an improved feature extraction method based on cochlear filter cepstral coefficients (CFCC) is proposed. CFCC features, which model auditory characteristics, are first extracted; an improved linear discriminant analysis (LDA) algorithm then applies a linear transform to obtain more discriminative features and the diagonal covariance matrices required by hidden Markov models (HMM); finally, mean-variance normalization yields the final features. Experimental results show that the proposed method effectively improves the recognition rate and robustness of speech recognition systems in noisy environments.
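A sketch of the post-processing chain only, using standard LDA from scikit-learn rather than the paper's improved variant, and random placeholders in place of real CFCC features: an LDA projection for a more discriminative, decorrelated space followed by mean-variance normalization.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 24))            # stand-in for frame-level CFCC features
labels = rng.integers(0, 10, size=500)    # e.g. phone/state labels for the frames

lda = LinearDiscriminantAnalysis(n_components=9)
X_lda = lda.fit_transform(X, labels)      # more discriminative projected features

# Mean-variance normalization (per dimension) as the final step
X_norm = (X_lda - X_lda.mean(axis=0)) / (X_lda.std(axis=0) + 1e-10)
print(X_norm.shape)
```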

5.
In complex noise environments the speech feature parameters change, causing a mismatch between the trained models and the test speech and lowering the recognition rate. To make feature extraction more robust in colored noise, a robust feature extraction method based on total-least-squares ESPRIT (TLS-ESPRIT) harmonic cepstrum weighted spectra is proposed. The TLS-SVD method performs a generalized eigenvalue decomposition of the observation data matrix to estimate the parameters of the harmonic model, yielding an optimal estimate of the speech signal in colored noise. During speech reconstruction, each harmonic peak is weighted according to the ratio of harmonic energy to noisy-speech energy before acoustic modeling. Simulation results show that robust feature parameters are obtained and the model mismatch problem is resolved.
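The full TLS-ESPRIT harmonic-weighting pipeline is involved; the sketch below shows only the core subspace step, a least-squares ESPRIT estimate of harmonic frequencies from a noisy frame (NumPy), as an illustration of how the harmonic model parameters can be estimated. The data-matrix and model-order choices are assumptions.

```python
import numpy as np

def esprit_freqs(x, p, L=None):
    """Estimate p angular frequencies (rad/sample) via least-squares ESPRIT."""
    N = len(x)
    L = L or N // 2
    X = np.array([x[i:i + N - L + 1] for i in range(L)])   # Hankel data matrix
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    Us = U[:, :p]                                           # signal subspace
    Phi, *_ = np.linalg.lstsq(Us[:-1], Us[1:], rcond=None)  # rotational invariance
    return np.sort(np.angle(np.linalg.eigvals(Phi)))

fs = 8000
n = np.arange(1024)
x = (np.cos(2 * np.pi * 200 * n / fs) + 0.5 * np.cos(2 * np.pi * 400 * n / fs)
     + 0.05 * np.random.default_rng(0).normal(size=n.size))
print(esprit_freqs(x, p=4) * fs / (2 * np.pi))   # approx. +/-200 Hz and +/-400 Hz
```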

6.
To improve the performance of speaker recognition systems at low signal-to-noise ratios, a speaker recognition algorithm is proposed that combines a Gammatone filterbank with speech enhancement based on an improved spectral subtraction. The improved spectral subtraction serves as a pre-processor to raise the signal-to-noise ratio; the enhanced speech is then passed through the Gammatone filterbank to extract GFCC features, which are used in the speaker recognition algorithm. Simulations were run on a Gaussian mixture model recognition system. The results show that the algorithm clearly improves both the recognition rate and the robustness of the speaker recognition system.
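A minimal magnitude spectral-subtraction pre-processor (basic over-subtraction with a spectral floor, not the paper's improved version), sketched with SciPy; the enhanced waveform would then feed the Gammatone/GFCC front end. The noise-frame count, over-subtraction factor and floor are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(x, fs, noise_frames=10, alpha=2.0, floor=0.02):
    f, t, Z = stft(x, fs=fs, nperseg=512)
    mag, phase = np.abs(Z), np.angle(Z)
    noise_mag = mag[:, :noise_frames].mean(axis=1, keepdims=True)  # noise estimate
    clean_mag = mag - alpha * noise_mag                            # over-subtraction
    clean_mag = np.maximum(clean_mag, floor * mag)                 # spectral floor
    _, x_hat = istft(clean_mag * np.exp(1j * phase), fs=fs, nperseg=512)
    return x_hat

fs = 16000
rng = np.random.default_rng(0)
noisy = np.sin(2 * np.pi * 300 * np.arange(fs) / fs) + 0.3 * rng.normal(size=fs)
enhanced = spectral_subtraction(noisy, fs)
```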

7.
This paper describes the hardware and software design of a small-vocabulary speech recognition system. The system uses a DSP5416 as the hardware platform, nonlinear Mel-scale cepstral coefficients (MFCC) for feature extraction, and dynamic time warping (DTW) for recognition. Experimental results show an average recognition rate of no less than 90%, demonstrating good recognition performance.
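A compact template-matching sketch of the recognition stage described above: dynamic time warping over feature sequences with a Euclidean local cost, in plain NumPy. The random 13-dimensional "MFCC" templates are placeholders.

```python
import numpy as np

def dtw_distance(A, B):
    """A, B: (T, D) feature sequences. Returns the accumulated DTW cost."""
    nA, nB = len(A), len(B)
    D = np.full((nA + 1, nB + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, nA + 1):
        for j in range(1, nB + 1):
            cost = np.linalg.norm(A[i - 1] - B[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[nA, nB]

# Recognition = choose the template with the smallest DTW cost to the test utterance
rng = np.random.default_rng(0)
templates = {"yes": rng.normal(size=(40, 13)), "no": rng.normal(size=(35, 13))}
test = templates["yes"] + 0.1 * rng.normal(size=(40, 13))
print(min(templates, key=lambda w: dtw_distance(test, templates[w])))
```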

8.
This paper describes the hardware and software design of a small-vocabulary speech recognition system. The system uses a DSP5416 as the hardware platform, a nonlinear Mel-scale cepstral coefficient (MFCC) feature extraction algorithm, and dynamic time warping (DTW) for recognition. Experimental results show an average recognition rate of no less than 90%, demonstrating good recognition performance.

9.
To address the large computational cost of feature extraction and the limited coverage of individual feature parameters, a method using principal component analysis (PCA) and K-means clustering for speech feature extraction is proposed. After studying the extraction principles and algorithms of linear prediction cepstral coefficients (LPCC) and Mel-frequency cepstral coefficients (MFCC), the features most commonly used in speaker recognition, together with their delta parameters, the combination of LPCC, MFCC and their first-order deltas is chosen as the final mixed feature set. PCA first reduces the order of the feature vector of each speech frame, K-means clustering then reduces the number of frames, and finally vector quantization (VQ) is used for speaker recognition. Experimental results show that the method lowers the computational complexity while improving recognition accuracy.
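A sketch of the reduction pipeline with scikit-learn: PCA lowers the per-frame feature order, K-means reduces the frames to a small codebook that doubles as the VQ model, and quantization distortion scores a test utterance. The random frames stand in for the LPCC+MFCC+delta combination, and the dimensions are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
frames = rng.normal(size=(1200, 39))         # stand-in for LPCC+MFCC+delta frames

pca = PCA(n_components=16)
reduced = pca.fit_transform(frames)          # lower per-frame dimensionality

codebook = KMeans(n_clusters=32, n_init=10, random_state=0).fit(reduced)
speaker_code = codebook.cluster_centers_     # (32, 16) VQ codebook for this speaker

# At test time, score an utterance by its average quantization distortion
test = pca.transform(rng.normal(size=(300, 39)))
dists = np.linalg.norm(test[:, None, :] - speaker_code[None, :, :], axis=2)
print(dists.min(axis=1).mean())
```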

10.
To address the low recognition rate of noisy mask speech (speech produced through a face mask), a speech enhancement algorithm is applied to suppress noise and raise the signal-to-noise ratio. An improved Wiener filtering method is proposed in which a spectral-entropy-based detector distinguishes speech frames from non-speech frames to update the noise power spectrum, and a control parameter is introduced into the gain function. Mel-frequency cepstral coefficients (MFCC) of the mask speech are extracted as features, and a convolutional neural network (CNN) is used for training and recognition, with local response normalization (LRN) applied after each pooling layer for optimization. Experimental results show that the system substantially improves the recognition rate of noisy mask speech.
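A loose sketch of the enhancement idea: a normalized spectral-entropy measure flags noise-only frames so the noise power spectrum can be updated there, and a Wiener-style gain is applied. The entropy threshold, smoothing factor and gain form are illustrative assumptions, not the paper's values.

```python
import numpy as np
from scipy.signal import stft, istft

def entropy_vad_wiener(x, fs, beta=0.98, entropy_thresh=0.85):
    f, t, Z = stft(x, fs=fs, nperseg=512)
    power = np.abs(Z) ** 2
    noise = power[:, :5].mean(axis=1)                    # initial noise estimate
    gains = np.empty_like(power)
    for i in range(power.shape[1]):
        p = power[:, i] / (power[:, i].sum() + 1e-12)
        entropy = -np.sum(p * np.log(p + 1e-12)) / np.log(len(p))  # normalized [0, 1]
        if entropy > entropy_thresh:                     # flat spectrum -> noise frame
            noise = beta * noise + (1 - beta) * power[:, i]
        snr = np.maximum(power[:, i] / (noise + 1e-12) - 1.0, 0.0)
        gains[:, i] = snr / (snr + 1.0)                  # Wiener gain
    _, x_hat = istft(gains * Z, fs=fs, nperseg=512)
    return x_hat

fs = 16000
rng = np.random.default_rng(1)
noisy = np.sin(2 * np.pi * 250 * np.arange(2 * fs) / fs) + 0.5 * rng.normal(size=2 * fs)
enhanced = entropy_vad_wiener(noisy, fs)
```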

11.
In this paper, we propose a novel front-end speech parameterization technique for automatic speech recognition (ASR) that is less sensitive to ambient noise and pitch variations. First, using variational mode decomposition (VMD), we break up the short-time magnitude spectrum obtained by the discrete Fourier transform into several components. In order to suppress the ill effects of noise and pitch variations, the spectrum is then sufficiently smoothed. The desired spectral smoothing is achieved by discarding the higher-order variational mode functions and reconstructing the spectrum using the first two modes only. As a result, the smoothed spectrum closely resembles the spectral envelope. Next, the Mel-frequency cepstral coefficients (MFCC) are extracted from the VMD-based smoothed spectra. The proposed front-end acoustic features are observed to be more robust to ambient noise and pitch variations than the conventional MFCC features, as demonstrated by the experimental evaluations presented in this study. For this purpose, we developed an ASR system using speech data from adult speakers collected under relatively clean recording conditions. State-of-the-art acoustic modeling techniques based on deep neural networks (DNN) and long short-term memory recurrent neural networks (LSTM-RNN) were employed. The ASR systems were then evaluated under noisy test conditions to assess the noise robustness of the proposed features. To assess robustness to pitch variations, experimental evaluations were performed on another test set consisting of speech data from child speakers. Transcribing children's speech helps in simulating an ASR task where pitch differences between training and test data are significantly large. The signal domain analyses as well as the experimental evaluations presented in this paper support our claims.

12.
The performance of current automatic speech recognition (ASR) systems often deteriorates radically when the input speech is corrupted by various kinds of noise sources. Several methods have been proposed to improve ASR robustness over the last few decades. The related literature can generally be classified into two categories according to whether the methods are directly based on the feature domain or consider some specific statistical feature characteristics. In this paper, we present a polynomial regression approach that has the merit of directly characterizing the relationship between speech features and their corresponding distribution characteristics to compensate for noise interference. The proposed approach and a variant were thoroughly investigated and compared with a few existing noise robustness approaches. All experiments were conducted using the Aurora-2 database and task. The results show that our approaches achieve considerable word error rate reductions over the baseline system and are comparable to most of the conventional robustness approaches discussed in this paper.
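A very rough illustration of regression-based feature compensation (a simplification of the paper's approach): a low-order polynomial mapping from noisy to clean feature values is fit per cepstral dimension on stereo training data and applied at test time. The toy distortion and polynomial order are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
clean = rng.normal(size=(5000, 13))
noisy = clean + 0.4 * clean ** 2 + 0.3 * rng.normal(size=clean.shape)  # toy distortion

order = 2
# One polynomial per cepstral dimension, mapping noisy values toward clean ones
coeffs = [np.polyfit(noisy[:, d], clean[:, d], order) for d in range(clean.shape[1])]

def compensate(frames):
    return np.stack([np.polyval(coeffs[d], frames[:, d])
                     for d in range(frames.shape[1])], axis=1)

test_noisy = clean[:100] + 0.4 * clean[:100] ** 2 + 0.3 * rng.normal(size=(100, 13))
print(np.mean((compensate(test_noisy) - clean[:100]) ** 2))   # residual error
```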

13.
The application range of communication robots could be widely expanded by the use of automatic speech recognition (ASR) systems with improved robustness to noise and to speakers of different ages. In past research, several modules have been proposed and evaluated for improving the robustness of ASR systems in noisy environments. However, this performance might be degraded when applied to robots, due to problems caused by distant speech and the robot's own noise. In this paper, we implemented the individual modules in a humanoid robot and evaluated the ASR performance in a real-world noisy environment for adults' and children's speech. The performance of each module was verified by adding different levels of real environment noise recorded in a cafeteria. Experimental results indicated that our ASR system could achieve over 80% word accuracy in 70-dBA noise. Further evaluation of adult speech recorded in a real noisy environment resulted in 73% word accuracy.

14.
A study of speech cepstral features (cited 24 times: 1 self-citation, 24 by others)
Cepstral features are the most widely used feature parameters in speech recognition and reflect characteristics of human hearing. Building on linear prediction cepstra and nonlinear Mel-scale cepstra, this paper studies the principles and algorithms for extracting LPCC and MFCC parameters and presents algorithms for extracting first- and second-order delta cepstral features. Recognition experiments confirm that MFCC parameters are more robust than LPCC parameters.
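A short sketch of first- and second-order delta (difference) cepstra using the standard regression formula over a +/-2-frame window; the input frames may be MFCC or LPCC (random values stand in here).

```python
import numpy as np

def delta(c, N=2):
    """c: (n_coeffs, n_frames). Regression-based delta over a +/-N frame window."""
    T = c.shape[1]
    padded = np.pad(c, ((0, 0), (N, N)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    return np.stack([sum(n * (padded[:, t + N + n] - padded[:, t + N - n])
                         for n in range(1, N + 1)) / denom
                     for t in range(T)], axis=1)

c = np.random.default_rng(0).normal(size=(13, 100))   # e.g. MFCC or LPCC frames
d1 = delta(c)                       # first-order delta cepstra
d2 = delta(d1)                      # second-order (delta-delta) cepstra
features = np.vstack([c, d1, d2])   # combined (39, 100) feature matrix
```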

15.
Environmental robustness is an important issue for speech recognition systems. This paper discusses the effects of additive environmental noise and convolutional noise on speech data, as well as blind environment compensation methods for improving system robustness.
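As one concrete example of blind compensation, cepstral mean normalization (CMN) removes a stationary convolutional (channel) distortion, which appears as a roughly constant offset in the cepstral domain; a per-utterance sketch:

```python
import numpy as np

def cmn(cepstra):
    """cepstra: (n_frames, n_coeffs). Subtract the utterance-level cepstral mean."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)

# A constant offset of 3.0 stands in for a stationary channel in the cepstral domain
frames = np.random.default_rng(0).normal(size=(200, 13)) + 3.0
print(np.abs(cmn(frames).mean(axis=0)).max())   # close to zero after normalization
```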

16.
Robustness is one of the most important topics for automatic speech recognition (ASR) in practical applications. Monaural speech separation based on computational auditory scene analysis (CASA) offers a solution to this problem. In this paper, a novel system is presented to separate the monaural speech of two talkers. Gaussian mixture models (GMMs) and vector quantizers (VQs) are used to learn the grouping cues on isolated clean data for each speaker. Given an utterance, speaker identification is first performed to identify the two speakers present in the utterance, then the factorial-max vector quantization model (MAXVQ) is used to infer the mask signals, and finally the utterance of the target speaker is resynthesized in the CASA framework. Recognition results on the 2006 speech separation challenge corpus prove that the proposed system can improve the robustness of ASR significantly.

17.
Visual speech information plays an important role in automatic speech recognition (ASR), especially when audio is corrupted or even inaccessible. Despite the success of audio-based ASR, the problem of visual speech decoding remains widely open. This paper provides a detailed review of recent advances in this research area. In comparison with the previous survey [97], which covers the whole ASR system that uses visual speech information, we focus on the important questions asked by researchers and summarize the recent studies that attempt to answer them. In particular, there are three questions related to the extraction of visual features, concerning speaker dependency, pose variation and temporal information, respectively. Another question is about audio-visual speech fusion, considering the dynamic changes of modality reliabilities encountered in practice. In addition, the state of the art in facial landmark localization is briefly introduced in this paper. These advanced techniques can be used to improve region-of-interest detection, but have been largely ignored when building visual-based ASR systems. We also provide details of audio-visual speech databases. Finally, we discuss the remaining challenges and offer our insights into future research on visual speech decoding.

18.
刘金刚  周翊  马永保  刘宏清 《计算机应用》2016,36(12):3369-3373
To address the poor robustness of speech recognition systems in noisy environments, a switching speech power spectrum estimation algorithm is proposed. Assuming that the speech amplitude spectrum follows a Chi distribution, an improved power spectrum estimator based on the minimum mean square error (MMSE) criterion is derived, and combined with the speech presence probability (SPP) to obtain an improved SPP-based MMSE estimator. This estimator is then combined with a conventional Wiener filter: when noise interference is strong, the improved MMSE estimator is used to estimate the clean speech power spectrum; when noise is weak, the conventional Wiener filter is used instead to reduce computation. The result is a switching power spectrum estimation algorithm for the recognition system. Experimental results show that, compared with the conventional MMSE estimator under a Rayleigh assumption, the proposed algorithm raises the recognition rate by about 8 percentage points on average across various noise conditions, removing noise interference and improving the robustness of the recognition system while reducing its power consumption.
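A schematic sketch of the switching idea only (the Chi-distribution MMSE-SPP estimator itself is not reproduced): an estimated segmental SNR decides per frame whether to use the more elaborate gain or fall back to the cheaper Wiener gain. The SPP-weighted gain below is a simple placeholder, and the switching threshold is an assumption.

```python
import numpy as np

def switching_gain(frame_power, noise_power, snr_switch_db=10.0, p_speech=0.8):
    post_snr = frame_power / (noise_power + 1e-12)
    prio_snr = np.maximum(post_snr - 1.0, 0.0)
    wiener = prio_snr / (prio_snr + 1.0)                 # conventional Wiener gain
    seg_snr_db = 10.0 * np.log10(np.mean(post_snr) + 1e-12)
    if seg_snr_db >= snr_switch_db:
        return wiener                                    # mild noise: cheap Wiener path
    # Strong noise: placeholder for the improved MMSE-SPP gain (SPP-weighted Wiener here)
    return p_speech * wiener + (1.0 - p_speech) * 1e-2

frame = np.random.default_rng(0).rayleigh(size=257) ** 2   # toy frame power spectrum
noise = np.full(257, 1.0)                                  # toy noise power spectrum
print(switching_gain(frame, noise).shape)
```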

19.
Conventional Hidden Markov Model (HMM) based Automatic Speech Recognition (ASR) systems generally utilize cepstral features as acoustic observations and phonemes as basic linguistic units. Some of the most powerful features currently used in ASR systems are Mel-Frequency Cepstral Coefficients (MFCCs). Speech recognition is inherently complicated by the variability in the speech signal, which includes within- and across-speaker variability. This leads to several kinds of mismatch between acoustic features and acoustic models and hence degrades system performance. The sensitivity of MFCCs to speech signal variability has motivated many researchers to investigate new sets of speech feature parameters in order to make the acoustic models more robust to this variability and thus improve system performance. The combination of diverse acoustic feature sets has great potential to enhance the performance of ASR systems. This paper is part of ongoing research efforts aspiring to build an accurate Arabic ASR system for teaching and learning purposes. It addresses the integration of complementary features into standard HMMs in order to make them more robust and thus improve their recognition accuracy. The complementary features investigated in this work are voiced formants and pitch, in combination with conventional MFCC features. A series of experiments under various combination strategies was performed to determine which of these integrated features can significantly improve system performance. The Cambridge HTK tools were used as the development environment, and experimental results showed that the error rate was successfully decreased; the achieved results seem very promising, even without using language models.

20.
We are interested in the problem of robust understanding from noisy spontaneous speech input. With the advances in automated speech recognition (ASR), there has been increasing interest in spoken language understanding (SLU). A challenge in large-vocabulary spoken language understanding is robustness to ASR errors. State-of-the-art spoken language understanding relies on the best ASR hypotheses (ASR 1-best). In this paper, we propose methods for a tighter integration of ASR and SLU using word confusion networks (WCNs). WCNs obtained from ASR word graphs (lattices) provide a compact representation of multiple aligned ASR hypotheses along with word confidence scores, without compromising recognition accuracy. We present our work on exploiting WCNs instead of simply using ASR 1-best hypotheses. In this work, we focus on the tasks of named entity detection and extraction and call classification in a spoken dialog system, although the idea is more general and applicable to other spoken language processing tasks. For named entity detection, we have improved the F-measure by 6–10% absolute by using both word lattices and WCNs. Processing WCNs was 25 times faster than processing lattices, which is very important for real-life applications. For call classification, we have shown between 5% and 10% relative reduction in error rate using WCNs compared to ASR 1-best output.
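A tiny illustration of a word confusion network represented as a list of bins of (word, posterior) pairs: the consensus hypothesis takes the top word per bin, while downstream SLU components can still see the lower-ranked alternatives and their confidence scores. The example words and probabilities are made up.

```python
# Each bin holds aligned word alternatives with posterior probabilities
wcn = [
    [("flights", 0.70), ("flight", 0.25), ("lights", 0.05)],
    [("to", 0.90), ("two", 0.10)],
    [("boston", 0.60), ("austin", 0.40)],
]

# Consensus (1-best over the WCN): pick the highest-posterior word in each bin
consensus = [max(bin_, key=lambda ws: ws[1])[0] for bin_ in wcn]
print(" ".join(consensus))   # "flights to boston"

# Unlike plain 1-best output, the alternative ("austin", 0.40) and its score remain
# available as features for named entity detection or call classification.
```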
