1.

A new optimum feature extraction and classification method for speaker recognition: GWPNN

《Expert systems with applications》2007,32(2):485-498

Speech and speaker recognition is an important task for computer systems. In this paper, an expert speaker recognition system based on optimum wavelet packet entropy is proposed, using real speech/voice signals. The study combines a new feature extraction approach with a classification approach, both using optimum wavelet packet entropy parameter values. These values are obtained from measured real English-language speech/voice signal waveforms using a speech experimental set. A genetic-wavelet packet-neural network (GWPNN) model is developed in this study. GWPNN comprises three layers: a genetic algorithm, a wavelet packet layer, and a multi-layer perceptron. The genetic algorithm layer selects the feature extraction method and obtains the optimum wavelet entropy parameter values. One of four feature extraction methods is selected by the genetic algorithm. The alternatives are wavelet packet decomposition, wavelet packet decomposition – short-time Fourier transform, wavelet packet decomposition – Born–Jordan time–frequency representation, and wavelet packet decomposition – Choi–Williams time–frequency representation. The wavelet packet layer performs optimum feature extraction in the time–frequency domain and is composed of wavelet packet decomposition and wavelet packet entropies. The multi-layer perceptron of GWPNN, which is a feed-forward neural network, is used to evaluate the fitness function of the genetic algorithm and to classify speakers. The performance of the developed system was evaluated using noisy English speech/voice signals. The test results showed that the system was effective in detecting real speech signals. The correct classification rate was about 85% for speaker classification.
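
The abstract above names wavelet packet entropy as the feature but does not give the wavelet, the decomposition depth, or the entropy definition. The following is a minimal sketch under stated assumptions: a Haar wavelet, a full packet tree, and Shannon entropy of each terminal node's normalized coefficient energies. The function names are hypothetical and are not taken from the paper.

```python
import math

def haar_step(x):
    """Split an even-length sequence into Haar approximation and detail halves."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail

def wavelet_packet_leaves(x, levels):
    """Full Haar packet tree: return the 2**levels terminal-node coefficient lists.

    Assumes len(x) is divisible by 2**levels.
    """
    nodes = [list(x)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_step(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return nodes

def shannon_entropy(coeffs):
    """Shannon entropy of one node's normalized coefficient energies (assumed definition)."""
    energy = sum(c * c for c in coeffs)
    if energy == 0.0:
        return 0.0
    h = 0.0
    for c in coeffs:
        p = c * c / energy
        if p > 0.0:
            h -= p * math.log(p)
    return h

def wavelet_packet_entropy_features(x, levels=3):
    """One entropy value per terminal node, usable as a feature vector."""
    return [shannon_entropy(node) for node in wavelet_packet_leaves(x, levels)]

# Example: a 64-sample two-tone test signal gives 8 entropy features at 3 levels.
signal = [math.sin(0.3 * n) + 0.5 * math.sin(1.7 * n) for n in range(64)]
features = wavelet_packet_entropy_features(signal, levels=3)
```

In a system like the one described, a vector of this kind would be the classifier input. The genetic selection of entropy parameters is not reproduced here.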

2.

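Speech denoising method using Bark-subband wavelet packet adaptive thresholding

田玉静 左红伟 董玉民 魏德生 《计算机应用》2010,30(11):3111-3114

To overcome the loss of weak unvoiced components during speech enhancement at low input signal-to-noise ratio (SNR), which distorts the reconstructed signal, a new speech enhancement method is proposed. The method uses wavelet packets to fit the critical bands of the speech perception model and separates unvoiced from voiced speech according to subband energy. The unvoiced and voiced signals then undergo 8-level and 4-level wavelet packet decomposition, respectively. Thresholds are computed with a Bark-subband wavelet packet adaptive node threshold algorithm that tracks the noise level in each Bark subband in real time, effectively protecting the weak mid- and high-frequency components of unvoiced speech and reducing distortion. Simulation comparisons with traditional speech enhancement methods confirm that the method has a clear advantage at low input SNR, with high output SNR and low speech distortion. Combining the method with spectral subtraction for a second enhancement stage further improves the quality of the enhanced speech.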

3.

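Time-frequency adaptive wavelet threshold enhancement incorporating the auditory masking effect

丁卫 王忠 《计算机工程与设计》2011,32(11):3768-3771,3856

In view of the different characteristics of voiced speech, unvoiced speech and noise, the speech signal is processed with scale-dependent multiple thresholds combined with auditory masking. A multi-wavelet threshold estimation method is proposed, which adjusts the threshold for different sound components using different scale-dependent shrinkage factors. By estimating the SNR within each frequency band, the threshold adapts in both time and frequency. Bark wavelet packet decomposition is used to model the critical-band characteristics of the human ear, wavelet spectral subtraction is used to pre-enhance the noisy speech, adopting Johnst...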

4.

The speaker identification by using genetic wavelet adaptive network based fuzzy inference system

E. Avci D. Avci 《Expert systems with applications》2009,36(6):9928-9940

In this paper, an intelligent speaker identification system based on speech/voice signals is presented. The study combines adaptive feature extraction and classification using optimum wavelet entropy parameter values. These optimum wavelet entropy values are obtained from measured Turkish speech/voice signal waveforms using a speech experimental set. A genetic wavelet adaptive network based fuzzy inference system (GWANFIS) model is developed in this study. The model consists of three layers: a genetic algorithm, a wavelet layer, and an adaptive network based fuzzy inference system (ANFIS). The genetic algorithm layer selects the feature extraction method and obtains the optimum wavelet entropy parameter values. One of eight feature extraction methods is selected by the genetic algorithm. The alternatives are wavelet decomposition, wavelet decomposition – short time Fourier transform, wavelet decomposition – Born–Jordan time–frequency representation, wavelet decomposition – Choi–Williams time–frequency representation, wavelet decomposition – Margenau–Hill time–frequency representation, wavelet decomposition – Wigner–Ville time–frequency representation, wavelet decomposition – Page time–frequency representation, and wavelet decomposition – Zhao–Atlas–Marks time–frequency representation. The wavelet layer performs optimum feature extraction in the time–frequency domain and is composed of wavelet decomposition and wavelet entropies. The ANFIS approach is used to evaluate the fitness function of the genetic algorithm and to classify speakers. The performance of the developed system has been evaluated using noisy Turkish speech/voice signals. The test results showed that this system is effective in detecting real speech signals. The correct classification rate is about 91% for speaker classification.

5.

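Online handwritten signature verification method based on wavelet packet analysis

马海豹 刘漫丹 张岑 《计算机工程与应用》2007,43(12):235-238

Because effective features are hard to extract from online handwritten signatures, a method is proposed that builds an energy feature vector using wavelet packet decomposition and single-branch reconstruction, directly using the changes in energy of each frequency-band component to reflect the dynamic characteristics of the signature. Feature vectors built this way highlight the signature's dynamic characteristics, and signatures are recognized with an RBF neural network. Experimental data show that the method achieves a recognition accuracy of up to 96.75% with an average error rate ERR=3.34%, which is satisfactory performance.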

6.

A dynamic-threshold wavelet denoising method based on voiced/unvoiced separation

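张君昌 叶珍 李艳艳 《计算机工程与应用》2011,47(12):133-136

At low SNR, traditional wavelet denoising algorithms lose useful information in the speech signal, which degrades denoising performance. To address this, a dynamic-threshold wavelet denoising method based on voiced/unvoiced separation is proposed. Spectral subtraction first removes part of the noise, and the short-time energy method then distinguishes voiced from unvoiced speech, effectively reducing the misclassification rate. Wavelet packet decomposition is incorporated to protect the unvoiced part from loss. The threshold is determined dynamically from the decomposition coefficients at each level to avoid over-smoothing the true signal. A new threshold function is adopted that effectively compensates for the denoising shortcomings of the soft and hard threshold functions. Simulation results show that the method improves the reconstruction quality of the speech signal.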

7.

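Voice activity detection with an adaptive threshold based on the wavelet packet transform

陈明义 李微 黎华 《计算机仿真》2009,26(3)

Voice activity detection (VAD) is an important stage of speech signal processing. At low SNR, traditional detection methods are no longer suitable. To improve the performance and robustness of VAD, a method with an adaptive threshold based on the wavelet packet transform is proposed for noise backgrounds consisting mainly of white noise. The speech signal is wavelet-packet transformed to obtain the subband signals; each subband signal is passed through the Teager energy operator (TEO), which strengthens the speech-active parts and attenuates the silent parts; finally, an adaptive threshold decision is made. Experimental results show that the algorithm correctly distinguishes speech segments from noise segments at low SNR.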
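
The Teager energy operator mentioned above has a standard discrete form, ψ[x(n)] = x(n)² − x(n−1)·x(n+1), which for a pure sinusoid A·cos(ωn + φ) equals A²·sin²(ω). The sketch below applies it to a full-band signal rather than to wavelet packet subbands, and the adaptive threshold rule (a running noise estimate multiplied by a fixed factor) is an assumption for illustration. The abstract does not describe the paper's actual decision rule, and all names and parameter values here are hypothetical.

```python
import math

def teager_energy(x):
    """Discrete Teager energy operator: x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

def frame_decisions(x, frame_len, noise_frames=5, factor=3.0, smooth=0.9):
    """Label each frame True (speech) or False (noise).

    Assumed rule: the first `noise_frames` frames are treated as noise to
    initialize the noise level; a frame is speech if its mean absolute
    Teager energy exceeds `factor` times the noise level, and the noise
    level is updated only on frames judged to be noise.
    """
    teo = teager_energy(x)
    starts = range(0, len(teo) - frame_len + 1, frame_len)
    scores = [sum(abs(v) for v in teo[i:i + frame_len]) / frame_len for i in starts]
    k = min(noise_frames, len(scores))
    if k == 0:
        return []
    noise = sum(scores[:k]) / k
    decisions = []
    for s in scores:
        is_speech = s > factor * noise
        if not is_speech:
            noise = smooth * noise + (1.0 - smooth) * s
        decisions.append(is_speech)
    return decisions

# Example: a weak background component followed by a strong tone.
x = [0.01 * math.sin(2.1 * n) for n in range(400)]
x += [math.sin(0.5 * n) for n in range(400, 800)]
decisions = frame_decisions(x, frame_len=100, noise_frames=2)
```

In a subband version, the same two functions would be applied to each wavelet packet subband before combining the decisions.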

8.

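Lung sound feature extraction and classification based on multi-scale wavelet packet analysis (total citations: 8; self-citations: 0; citations by others: 8)

刘毅 张彩明 赵玉华 董亮 《计算机学报》2006,29(5):769-777

A feature extraction method suited to non-stationary lung sound signals is proposed. Using four kinds of lung sound signals (normal, tracheitis, pneumonia and asthma) as sample data, and by analyzing the time-frequency distribution of lung sound signals, wavelet packets, which allow arbitrary multiresolution decomposition, were selected. After partitioning the wavelet packet space, the best basis for lung sound feature extraction was found, and the lung sound signals were decomposed quickly at multiple scales on that basis, yielding high-dimensional wavelet coefficient matrices at the nodes of each level. An equivalence between the wavelet coefficients and the signal energy in the time domain was established, and energy was used as the feature value to construct a low-dimensional feature vector as input to the classification neural network, greatly reducing the dimensionality of the input features. The study shows that the algorithm achieves efficient recognition performance.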

9.

Artificial neural network based autoregressive modeling technique with application in voice activity detection

A.M. Aibinu M.J.E. Salami A.A. Shafie 《Engineering Applications of Artificial Intelligence》2012,25(6):1265-1276

A new method of estimating the coefficients of an autoregressive (AR) model using a real-valued neural network (RVNN) technique is presented in this paper. The coefficients of the AR model are obtained from the synaptic weights and adaptive coefficients of the activation function of a two-layer RVNN, while the number of neurons in the hidden layer is estimated from an over-constrained system of equations. The performance of the proposed technique has been evaluated using sinusoidal data and recorded speech so as to examine the spectral resolution and line splitting as well as its ability to detect voiced and unvoiced data sections in recorded speech. Results obtained show that the method can accurately resolve closely related frequencies without experiencing spectral line splitting, as well as identify the voiced and unvoiced segments in recorded speech.

10.

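Research on a speech enhancement method based on the wavelet transform (total citations: 4; self-citations: 1; citations by others: 3)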

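董胡 钱盛友 《计算机工程与应用》2007,43(31):58-60

The principle of wavelet denoising is analyzed. Based on the propagation characteristics of the wavelet transform coefficients of random noise across scales and the relationship between the singularity of the noise signal and the wavelet modulus maxima, and taking into account the characteristics of voiced and unvoiced speech, an improved-threshold speech enhancement method in the wavelet domain is proposed. A parameter is introduced into the threshold function and adjusted to obtain the best threshold estimate of the wavelet coefficients, so that the improved threshold lies between the hard and soft thresholds. Applying the improved threshold to the wavelet coefficients of the noisy speech signal both suppresses the noise and reduces the loss of speech-segment information. Simulation results show that this is an effective speech enhancement method.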
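
The abstract above does not state the form of its parameterized threshold function. As an illustration only, the sketch below uses one common parametrization that sits between the hard and soft rules: coefficients at or below the threshold λ are zeroed, and larger ones are shrunk by α·λ, so α = 0 gives hard thresholding and α = 1 gives soft thresholding. The function names and the parameter α are hypothetical and are not the paper's notation.

```python
import math

def parametric_threshold(w, lam, alpha):
    """Shrink one wavelet coefficient w with threshold lam.

    alpha = 0 reproduces hard thresholding, alpha = 1 reproduces soft
    thresholding, and values in between interpolate (illustrative form).
    """
    if abs(w) <= lam:
        return 0.0
    return math.copysign(abs(w) - alpha * lam, w)

def threshold_coefficients(coeffs, lam, alpha=0.5):
    """Apply the parametric rule to a list of wavelet coefficients."""
    return [parametric_threshold(w, lam, alpha) for w in coeffs]

# Example: small coefficients are removed, large ones are partly shrunk.
cleaned = threshold_coefficients([0.5, -0.2, 2.0, -3.0], lam=1.0, alpha=0.5)
```

Intermediate values of α trade the discontinuity of the hard rule against the constant bias that the soft rule introduces on large coefficients.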

11.

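Chinese syllable segmentation method based on two-dimensional time-frequency energy features

张扬 赵晓群 王缔罡 《计算机应用》2016,36(11):3222-3228

A reasonably accurate speech segmentation method can greatly improve the efficiency of tasks such as corpus annotation and helps align speech with models in applications such as speech recognition. A new Chinese syllable segmentation method is designed using the energy features of Chinese speech in the two time-frequency dimensions. Silent frames are identified with a traditional method; unvoiced frames are identified from the two-dimensional energy at the same time across different frequencies; voiced frames and speech frames are identified from the 0-1 two-dimensional energy in specific frequency bands across different times; the four decisions are combined to give the syllable segmentation positions. Experimental results show that the method segments more accurately than the merging-based syllable detection automaton (MBSDA) and the Gaussian fitting method, with a syllable segmentation error of 0.0297 s and a syllable segmentation deviation rate of 7.93%.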

12.

Interpolation of Lost Speech Segments Using LP-HNM Model With Codebook Post-Processing

Zavarehei E. Vaseghi S. 《Multimedia, IEEE Transactions on》2008,10(3):493-502

This paper presents a method for interpolation of lost speech segments. The interpolation method can be used for packet loss concealment in voice communication over mobile phones, for voice over IP or for restoration of lost segments in speech recordings. The interpolation method employs a combination of a linear prediction (LP) model of the spectral envelope and a harmonic noise model (HNM) of the excitation of speech. The speech interpolation problem is transformed to the modeling and interpolation of the trajectories of LP parameters and the amplitude, phase and harmonicity of HNM tracks of speech excitation. In particular, the interpolation of harmonicity results in a smooth transition from voiced to unvoiced speech and vice versa. Crucially, the proposed interpolation method does not suffer from the consequences of zero-excitation of conventional autoregressive (AR) interpolation. Different combinations of linear and autoregressive interpolation methods are evaluated for the estimation of the time-varying parameters of LP-HNM tracks. Furthermore, a post-processing codebook mapping, employed to enhance the interpolation of the spectral envelope of speech, results in improved output quality for longer length speech gaps. For different packet loss rates and patterns of distributions of missing speech gaps, the proposed interpolation methods are evaluated and compared with popular AR-based interpolation methods and the speech packet recovery method specified in the ITU G.711 standard, as a reference. The evaluation results show that the proposed methods substantially improve the restoration of formants and harmonic tracks and consistently result in significant performance gain and improved perceptual quality of speech.

13.

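Application of wavelet packet entropy to underwater target recognition

石敏 徐袭 《计算机工程与应用》2014,(1):215-217,231

An underwater target recognition method based on the wavelet packet transform and a Fisher linear classifier is studied. Wavelet packets are a time-frequency analysis method developed from the wavelet transform that provides richer time-frequency information for non-stationary signals. The radiated noise signal of an underwater target is decomposed with wavelet packets, the entropy values of the terminal nodes of the decomposition are extracted as the feature vector, and a piecewise linear classifier designed from Fisher linear classifiers classifies and recognizes the targets. Simulation results show that classification using wavelet packet entropy as the feature vector achieves a high recognition accuracy.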

14.

Average framing linear prediction coding with wavelet transform for text-independent speaker identification system

Khaled Daqrouq Khalooq Y. Al Azzawi 《Computers & Electrical Engineering》2012

In this work, an average framing linear prediction coding (AFLPC) technique for text-independent speaker identification systems is presented. Conventionally, linear prediction coding (LPC) has been applied in speech recognition applications. However, in this study the combination of modified LPC with wavelet transform (WT), termed AFLPC, is proposed for speaker identification. The investigation procedure is based on feature extraction and voice classification. In the feature extraction phase, the distinguishing characteristics of the speaker's vocal tract were extracted using the AFLPC technique. The size of a speaker's feature vector can be optimized in terms of an acceptable recognition rate by means of a genetic algorithm (GA). Hence, an LPC order of 30 is found to be the best according to the system performance. In the classification phase, a probabilistic neural network (PNN) is applied because of its rapid response and ease of implementation. In the practical investigation, the performances of different wavelet transforms in conjunction with AFLPC were compared with one another. In addition, the capability of the proposed system was examined by comparing it with other systems proposed in the literature. Consequently, the PNN classifier achieves a better recognition rate (97.36%) with the wavelet packet (WP) and AFLPC feature extraction method, termed WPLPCF. The proposed system is also analyzed in additive white Gaussian noise (AWGN) and real noise environments: 58.56% for 0 dB and 70.52% for 5 dB. Over the whole database, the recognition rate of the Gaussian mixture model (GMM) was lowest when the number of training samples was small.

15.

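Research on feature extraction and classification algorithms for motor imagery EEG signals

马也 常天庆 郭理彬 《计算机工程与应用》2017,53(16):149-154

To address the difficulty of feature extraction and the low classification accuracy for motor imagery EEG signals, an algorithm is proposed that extracts features with wavelet entropy and classifies with a support vector machine (SVM). The power of the motor imagery EEG signal is computed, the wavelet packet scale is chosen by theoretical analysis, and the signal power is decomposed with wavelet packets and its wavelet packet entropy (WPE) is computed. The wavelet packet entropy interpolation values of the C3 and C4 leads are extracted to form the feature vector, which is fed to the SVM as the classifier input. Validated on the Graz data from the international BCI Competition 2003, the algorithm reaches a maximum classification accuracy of 97.56%. The algorithm has a low-dimensional feature vector, a small data volume and high classification accuracy, and can serve as a reference method for feature extraction and classification of motor imagery EEG signals.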

16.

New method for feature extraction based on fractal behavior (total citations: 1; self-citations: 0; citations by others: 1)

Yuan Y. Tang Ernest C.M. Lam 《Pattern recognition》2002,35(5):1071-1081

In this paper, a novel approach to feature extraction based on fractal theory is presented as a powerful technique in pattern recognition. This paper presents a new fractal feature that can be applied to extract the features of two-dimensional objects. It is constructed by a hybrid feature extraction combining wavelet analysis, central projection transformation and fractal theory. A new fractal feature and fractal signatures are reported. A multiresolution family of wavelets is also used to compute information-conserving micro-features. We employed a central projection method to reduce the dimensionality of the original input pattern. A wavelet transformation technique is used to transform the derived pattern into a set of sub-patterns. Their fractal dimensions can readily be computed and used as the feature vector. Moreover, a modified fractal signature is also used to distinguish distinct handwritten signatures. We expect that the proposed fractal method can also be used for improving the extraction and classification of features in pattern recognition.

17.

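Application and implementation of silent speech recognition based on EMG signals

许佳佳 姚晓东 《计算机与数字工程》2006,34(5):133-136

A silent speech recognition system based on electromyographic (EMG) signals is proposed. Because the system recognizes speech from EMG signals rather than acoustic signals, it can be used in high-noise environments and can help people who have lost the ability to speak communicate silently, so it has good application prospects. The system is implemented as follows: in the experiments, the ten Chinese digits 0-9 are repeated silently by the subject while EMG signals are collected from three facial muscles; the EMG signals are wavelet transformed, the energy values are extracted from the resulting coefficient matrix, and the feature vector thus constructed is fed to a BP neural network classifier. Experiments show that wavelet-transform-based feature extraction is an effective method, suitable for non-stationary physiological signals such as EMG signals.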

18.

A generalized time-frequency subtraction method for robust speech enhancement based on wavelet filter banks modeling of human auditory system. (total citations: 2; self-citations: 0; citations by others: 2)

Yu Shao Chip-Hong Chang 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2007,37(4):877-889

We present a new speech enhancement scheme for a single-microphone system to meet the demand for quality noise reduction algorithms capable of operating at a very low signal-to-noise ratio. A psychoacoustic model is incorporated into the generalized perceptual wavelet denoising method to reduce the residual noise and improve the intelligibility of speech. The proposed method is a generalized time-frequency subtraction algorithm, which advantageously exploits the wavelet multirate signal representation to preserve the critical transient information. Simultaneous masking and temporal masking of the human auditory system are modeled by the perceptual wavelet packet transform via the frequency and temporal localization of speech components. The wavelet coefficients are used to calculate the Bark spreading energy and temporal spreading energy, from which a time-frequency masking threshold is deduced to adaptively adjust the subtraction parameters of the proposed method. An unvoiced speech enhancement algorithm is also integrated into the system to improve the intelligibility of speech. Through rigorous objective and subjective evaluations, it is shown that the proposed speech enhancement system is capable of reducing noise with little speech degradation in adverse noise environments and the overall performance is superior to several competitive methods.

19.

A Generalized Time–Frequency Subtraction Method for Robust Speech Enhancement Based on Wavelet Filter Banks Modeling of Human Auditory System

Yu Shao Chip-Hong Chang 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2007,37(4):877-889

We present a new speech enhancement scheme for a single-microphone system to meet the demand for quality noise reduction algorithms capable of operating at a very low signal-to-noise ratio. A psychoacoustic model is incorporated into the generalized perceptual wavelet denoising method to reduce the residual noise and improve the intelligibility of speech. The proposed method is a generalized time-frequency subtraction algorithm, which advantageously exploits the wavelet multirate signal representation to preserve the critical transient information. Simultaneous masking and temporal masking of the human auditory system are modeled by the perceptual wavelet packet transform via the frequency and temporal localization of speech components. The wavelet coefficients are used to calculate the Bark spreading energy and temporal spreading energy, from which a time-frequency masking threshold is deduced to adaptively adjust the subtraction parameters of the proposed method. An unvoiced speech enhancement algorithm is also integrated into the system to improve the intelligibility of speech. Through rigorous objective and subjective evaluations, it is shown that the proposed speech enhancement system is capable of reducing noise with little speech degradation in adverse noise environments and the overall performance is superior to several competitive methods.

20.

Robust distributed speech recognition in noise and packet loss conditions

Ronan Flynn Edward Jones 《Digital Signal Processing》2010,20(6):1559-1571

This paper examines the performance of a Distributed Speech Recognition (DSR) system in the presence of both background noise and packet loss. Recognition performance is examined for feature vectors extracted from speech using a physiologically-based auditory model, as an alternative to the more commonly-used Mel Frequency Cepstral Coefficient (MFCC) front-end. The feature vectors produced by the auditory model are vector quantised and combined in pairs for transmission over a statistically modelled channel that is subject to packet burst loss. In order to improve recognition performance in the presence of noise, the speech is enhanced prior to feature extraction using Wiener filtering. Packet loss mitigation to compensate for missing features is also used to further improve performance. Speech recognition results show the benefit of combining speech enhancement and packet loss mitigation to compensate for channel and environmental degradations.