20 similar documents found.
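Zhang Ling (张玲), Gu Yanfei (顾彦飞), He Wei (何伟). 《计算机应用》 (Journal of Computer Applications), 2010, 30(5): 1262-1265
To reduce the impact of noise and of the frame-delay characteristic of the decision-directed (DD) parameter estimation algorithm on the robustness of voice activity detection (VAD), the two-step noise reduction (TSNR) technique is first used to improve the accuracy of parameter estimation at speech transients. Then, exploiting the frequency selectivity of the noise in speech, the band is split so that noise contamination is confined to isolated sub-bands, and a sub-band weighted VAD algorithm is built in which the decision comes from combining sub-band features with reliability factors. Experiments show that this sub-band weighted algorithm outperforms full-band algorithms such as the Sohn algorithm, the Cho algorithm, and G.729B.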

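Li Qiang (李强), Chen Hao (陈浩), Chen Dingdang (陈丁当). 《计算机应用》 (Journal of Computer Applications), 2016, 36(11): 3212-3216
To address the poor noise-tracking performance of existing voice activity detection (VAD) algorithms based on the hidden Markov model (HMM), a method is proposed that uses the Baum-Welch algorithm to train on noises with different characteristics, generates the corresponding noise models, and builds a noise library. During VAD, a noise model from the library is matched dynamically according to the background noise of the speech under test. In addition, to suit real-time processing of speech signals, the complexity of speech parameter extraction is reduced, and the decision threshold is improved to preserve the inter-frame correlation of the speech signal. The improved algorithm was tested in different noise environments and compared with the adaptive multi-rate (AMR) coding standard and the G.729B standard of the ITU Telecommunication Standardization Sector (ITU-T). The results show that the improved algorithm effectively increases detection accuracy and noise-tracking capability in real-time speech signal processing.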

This paper proposes a multimodal approach to distinguish silence from speech situations, and to identify the location of the active speaker in the latter case. In our approach, a video camera is used to track the faces of the participants, and a microphone array is used to estimate the Sound Source Location (SSL) using the Steered Response Power with the phase transform (SRP-PHAT) method. The audiovisual cues are combined, and two competing Hidden Markov Models (HMMs) are used to detect silence or the presence of a person speaking. If speech is detected, the corresponding HMM also provides the spatio-temporally coherent location of the speaker. Experimental results show that incorporating the HMM improves the results over the unimodal SRP-PHAT, and the inclusion of video cues provides even further improvements.

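Endpoint detection algorithm based on log-energy cepstral features   Cited by: 1 (self-citations: 0, citations by others: 1)
Endpoint detection is one of the key technologies in speech recognition. To overcome the unsatisfactory performance of the traditional cepstral-distance endpoint detection algorithm at low signal-to-noise ratio (SNR), the log-energy (LE) feature and the cepstral (C) feature are combined into a new log-energy cepstral (LEC) feature. Fuzzy C-means clustering and the Bayesian information criterion (BIC) are used to estimate the feature threshold, from which the speech endpoints are determined. Noisy speech with SNRs from -5 dB to 15 dB was simulated under three typical noise types. The results show that the LEC method has a detection error rate of only 20.25%, clearly lower than that of the cepstral and log-energy methods, and that it effectively locates speech endpoints and improves speech recognition.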

This paper presents a robust voice activity detection (VAD) technique based on the wavelet packet transform. In this technique, sub-bands and their amplitudes are represented as vectors at each sample time in order to derive a new feature from the frequency and amplitude changes. Thanks to the multi-resolution analysis property of the wavelet packet transform (WPT), the voiced, unvoiced, and transient components of speech can be clearly discriminated. A new feature extraction method is then implemented based on observations of the angles between vectors. This feature extraction method retains most unvoiced sounds in a voice-active frame. Experimental results show that the proposed WT feature parameter can extract speech activity under poor SNR conditions and that it is also insensitive to varying noise levels.

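In-vehicle noise markedly lowers the speech detection rate, which makes voice control of intelligent vehicles difficult. A detection method based on the Hilbert-Huang transform (HHT) spectral matrix is proposed. By analyzing the characteristics of the HHT and the amplitude distributions of noise and speech signals, the method processes the time-frequency-amplitude matrix of the HHT of the input signal frame by frame and constructs an amplitude-time curve. It then estimates the leading noise segment, sets a threshold automatically, and detects the speech segments of the whole signal. Experimental results show that the method still detects speech segments effectively under strong in-vehicle noise.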

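To improve voice activity detection in in-vehicle noise environments, a voice activity detection algorithm based on sub-band spectral entropy is proposed. The simulation results are compared with the detection performance of the ITU standard G.729B. The comparison shows that the algorithm achieves high accuracy and stability under in-vehicle noise, has low complexity, and has some practical value.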
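
The abstract above does not give the algorithm's details, so the following is only a minimal sketch of one common way to compute a sub-band spectral entropy for a single frame. The function name, the equal-width band split, and the use of the natural logarithm are assumptions for illustration, not the authors' definition.

```python
import math


def subband_spectral_entropy(power_spectrum, num_bands):
    """Entropy of the normalized sub-band energy distribution of one frame.

    Assumption: bands are equal-width groups of adjacent bins; any bins left
    over after the last full band are ignored.
    """
    if num_bands <= 0 or num_bands > len(power_spectrum):
        raise ValueError("num_bands must be between 1 and the number of bins")
    band_size = len(power_spectrum) // num_bands
    energies = [
        sum(power_spectrum[b * band_size:(b + 1) * band_size])
        for b in range(num_bands)
    ]
    total = sum(energies)
    if total <= 0.0:
        return 0.0  # silent frame: no energy distribution to measure
    probabilities = [e / total for e in energies]
    # Shannon entropy; zero-probability bands contribute nothing.
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)
```

A flat, noise-like spectrum gives the maximum value `log(num_bands)`, while a frame whose energy sits in a few bands gives a lower value, which is why a threshold on this quantity can help separate speech from broadband noise.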

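To improve the accuracy of speech endpoint detection in in-vehicle noise, a neural network structure based on a GRU RNN is proposed. It processes the Log Mel feature sequence of the noisy speech to separate speech from noise, thereby recovering the Log Mel feature sequence of the clean speech. On this basis, a new feature, Log Mel Sum, is proposed and used for endpoint detection. Experimental results show that the method achieves very good endpoint detection performance in the in-vehicle environment.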

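Active queue management is a current research focus, and random early detection (RED) is a classic queue management algorithm. Linear RED is simple and easy to compute, but its packet drop probability is not well suited to queues near the minimum threshold or the maximum threshold. After showing that the relationship between average queue length and drop probability is nonlinear, an improved nonlinear RED algorithm, JRED, is proposed. Simulation of the improved algorithm in NS2 shows that JRED increases average throughput, reduces the packet drop probability, and improves network stability and reliability.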
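
For orientation, the linear drop curve that the abstract criticizes can be sketched as below. This shows only classic linear RED as it is commonly described; the nonlinear JRED curve is not given in the abstract and is not reproduced here. The function and parameter names are illustrative assumptions.

```python
def ewma_queue_length(previous_avg, current_len, weight):
    """Exponentially weighted moving average of the queue length."""
    return (1.0 - weight) * previous_avg + weight * current_len


def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Linear RED drop probability for a given average queue length."""
    if min_th >= max_th:
        raise ValueError("min_th must be smaller than max_th")
    if avg_queue < min_th:
        return 0.0  # short queue: never drop
    if avg_queue >= max_th:
        return 1.0  # long queue: always drop
    # Between the thresholds the probability rises linearly from 0 to max_p.
    return max_p * (avg_queue - min_th) / (max_th - min_th)
```

Note that in this form the probability jumps from `max_p` to 1 at `max_th`, which is one example of the kind of behavior near the thresholds that a nonlinear curve could smooth out.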

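Traditional object detection algorithms based on feature-point matching have low recognition rates and high false detection rates because feature-point matching is inaccurate and object contours are discontinuous. To address this, the spectral residual algorithm and the k-means clustering algorithm are introduced and improved, and a moving object detection algorithm based on both is proposed. First, speeded-up robust features (SURF) are extracted every two frames and the images are registered; the spectral residual algorithm is then applied to the frame difference to extract visual saliency features, which removes noise points and false moving objects caused by inaccurate matching. Next, after morphological processing, the improved k-means clustering algorithm clusters the discontinuous contours. Finally, complete objects are formed. Experiments show that the proposed algorithm reaches a recognition rate of 90.61% and a false detection rate of 21.25%. These figures are better than the 66.60% recognition rate and 31.91% false detection rate of the traditional SURF-based moving object detection algorithm, and better than the 87.573% recognition rate and 26.80% false detection rate of the detection algorithm based on matching the newer local invariant feature ORB. Although the algorithm runs at an average of 18 fps, this still meets the requirement for smooth video, so it can serve as an effective moving object detection algorithm under dynamic backgrounds.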

This paper proposes an improved voice activity detection (VAD) algorithm using wavelets and a support vector machine (SVM) for the European Telecommunications Standards Institute (ETSI) adaptive multi-rate (AMR) narrow-band (NB) and wide-band (WB) speech codecs. First, based on the wavelet transform, the original IIR filter bank and pitch/tone detector are implemented, respectively, via a wavelet filter bank and a wavelet-based pitch/tone detection algorithm. The wavelet filter bank divides the input speech signal into several frequency bands so that the signal power level in each sub-band can be calculated. In addition, the background noise level can be estimated in each sub-band by using the wavelet de-noising method. The wavelet filter bank is also used to detect correlated complex signals such as music. The proposed algorithm then applies an SVM to train an optimized non-linear VAD decision rule involving the sub-band power, noise level, pitch period, tone flag, and complex-signal warning flag of the input speech signal. With the trained SVM, the proposed VAD algorithm produces more accurate detection results. Experiments on the Aurora speech database under different noise conditions show that the proposed algorithm delivers VAD performance superior to AMR-NB VAD Options 1 and 2 and to AMR-WB VAD.

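Li Yu (李宇), Guo Leiyong (郭雷勇), Tan Hongzhou (谭洪舟). 《计算机应用》 (Journal of Computer Applications), 2011, 31(5): 1447-1449
To improve the performance of voice activity detection (VAD) based on the statistical-model likelihood ratio test, an improved VAD algorithm is proposed that exploits the statistical correlation between consecutive speech frames. The a priori SNR is estimated recursively from the speech spectral components of the previous frame, and the decision threshold is designed from the detection state of the previous frame, which yields a dual-threshold hidden Markov model VAD decision rule. Experiments show that this inter-frame correlation VAD algorithm achieves better detection metrics than the Sohn algorithm.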
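
As a rough illustration of the ingredients named in this abstract, the sketch below combines the widely used Gaussian-model log-likelihood ratio, a decision-directed style recursive a priori SNR estimate, and a threshold that depends on the previous frame's decision. It is a sketch under stated assumptions, not the authors' rule: the function names, the smoothing factor `alpha`, and the two thresholds are hypothetical, and the paper's actual HMM-based decision rule is not given in the abstract.

```python
import math


def decision_directed_prior_snr(prev_speech_power, noise_power, post_snr, alpha=0.98):
    """Recursive a priori SNR estimate for one frequency bin.

    prev_speech_power is the previous frame's estimated speech power in this
    bin; alpha is an assumed smoothing factor.
    """
    if noise_power <= 0.0:
        raise ValueError("noise_power must be positive")
    return (alpha * prev_speech_power / noise_power
            + (1.0 - alpha) * max(post_snr - 1.0, 0.0))


def frame_log_likelihood_ratio(post_snrs, prior_snrs):
    """Mean per-bin log-likelihood ratio under a complex Gaussian model."""
    if not post_snrs or len(post_snrs) != len(prior_snrs):
        raise ValueError("need equally sized, non-empty SNR lists")
    terms = [
        g * x / (1.0 + x) - math.log(1.0 + x)
        for g, x in zip(post_snrs, prior_snrs)
    ]
    return sum(terms) / len(terms)


def state_dependent_decision(llr, prev_is_speech, enter_threshold, stay_threshold):
    """Pick the threshold from the previous frame's state (hypothetical rule)."""
    threshold = stay_threshold if prev_is_speech else enter_threshold
    return llr > threshold
```

Choosing `stay_threshold` below `enter_threshold` makes it easier to remain in the speech state than to enter it, which is one simple way to use inter-frame correlation to reduce clipping of weak speech tails.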

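To improve speech detection in noisy environments, a voice activity detection (VAD) method based on the discrete wavelet transform is proposed. The algorithm applies a three-level discrete wavelet transform to the speech signal, uses the Teager energy operator (TEO) to extract two parameters, an energy ratio and an energy difference, and then makes a threshold decision. Experimental results show that the algorithm correctly distinguishes speech segments from noise segments in noisy environments and outperforms the VAD algorithms of G.729B and AMR.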
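
The Teager energy operator mentioned above has a simple discrete form, sketched here for a single sequence of samples. The function name is an assumption, and the energy ratio and energy difference parameters are not defined in the abstract, so they are not reproduced.

```python
def teager_energy(x):
    """Discrete Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The first and last samples have no two-sided neighbors, so the output
    is two samples shorter than the input.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Samples of sin(pi * n / 2): amplitude 1, so the operator returns
# A^2 * sin^2(omega) = 1 at every interior sample.
print(teager_energy([0, 1, 0, -1, 0]))  # [1, 1, 1]
```

Because the output grows with both amplitude and frequency, the operator tends to emphasize speech-like bursts over slowly varying background, which is presumably why it is applied to the wavelet sub-bands here.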

Face detection using spectral histograms and SVMs.   Cited by: 4 (self-citations: 0, citations by others: 4)
We present a face detection method using spectral histograms and support vector machines (SVMs). Each image window is represented by its spectral histogram, which is a feature vector consisting of histograms of filtered images. Using statistical sampling, we show systematically that the representation groups face images together; in comparison, commonly used representations often do not exhibit this necessary and desirable property. By using an SVM trained on a set of 4500 face and 8000 nonface images, we obtain a robust classifying function for face and nonface patterns. With an effective illumination-correction algorithm, our system reliably discriminates face and nonface patterns in images under different kinds of conditions. On two commonly used data sets, our method gives the best performance among recent face-detection methods. We attribute the high performance to the desirable properties of the spectral histogram representation and the good generalization of SVMs. Several further improvements in computation time and in performance are discussed.

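To address speech endpoint detection at low signal-to-noise ratio, a denoising speech endpoint detection method based on the largest Toeplitz eigenvalue is proposed. The method builds a symmetric Toeplitz matrix from the autocorrelation sequence of the speech-band spectrum and uses the information carried by the largest eigenvalue of this matrix to perform dual-threshold endpoint detection on the speech signal. Experiments show that the new algorithm effectively separates speech from noise and is robust in different low-noise environments. Compared with the recent signal recurrence-rate analysis method, it achieves higher accuracy. The algorithm has low computational cost, good real-time performance, and is simple to implement.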
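
The core numerical step described above, building a symmetric Toeplitz matrix from an autocorrelation sequence and taking its largest eigenvalue, can be sketched as follows. The helper names, the choice of power iteration, and the iteration count are assumptions for illustration; the abstract does not say how the eigenvalue is computed or how the two thresholds are set.

```python
def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation of x for lags 0..max_lag."""
    n = len(x)
    return [sum(x[t] * x[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def symmetric_toeplitz(first_row):
    """Matrix whose entry (i, j) is first_row[|i - j|]."""
    n = len(first_row)
    return [[first_row[abs(i - j)] for j in range(n)] for i in range(n)]


def largest_eigenvalue(matrix, iterations=200):
    """Power iteration for a symmetric positive semidefinite matrix.

    Assumption: the start vector is not orthogonal to the dominant
    eigenvector, which holds for typical autocorrelation matrices.
    """
    n = len(matrix)
    v = [1.0 + i for i in range(n)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(c * c for c in w) ** 0.5
        if norm == 0.0:
            return 0.0
        v = [c / norm for c in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    # Rayleigh quotient of the unit-length vector v.
    return sum(v[i] * mv[i] for i in range(n))
```

In a detector of this kind, the value returned by `largest_eigenvalue` would be computed per frame and compared against thresholds; a frame dominated by correlated speech energy would be expected to yield a larger value than a noise-only frame.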

A modification of the binary weight CHIR algorithm is presented, whereby a zero state is added to the possible binary weight states. This method allows solutions with reduced connectivity to be obtained, by offering disconnections in addition to the excitatory and inhibitory connections. The algorithm has been examined via extensive computer simulations for the restricted cases of parity, symmetry, and teacher problems, which show convergence rates similar to those presented for the binary CHIR2 algorithm, but with reduced connectivity. Moreover, this method expands the set of problems solvable via the binary weight network configuration with no additional parameter requirements.

Voice activity detectors (VADs) based on statistical models have shown impressive performance, especially when fairly precise statistical models are employed. Moreover, the accuracy of a VAD that uses statistical models can be significantly improved when machine-learning techniques are adopted to provide prior knowledge of speech characteristics. In the first part of this paper, we introduce a more accurate and flexible statistical model, the generalized gamma distribution (GΓD), as a new model in the VAD based on the likelihood ratio test. A practical parameter estimation algorithm based on the maximum likelihood principle is also presented. Experimental results show that the VAD algorithm based on the GΓD outperforms those adopting the conventional Laplacian and Gamma distributions. In the second part of this paper, we introduce machine-learning techniques such as minimum classification error (MCE) training and the support vector machine (SVM) to exploit automatically the prior knowledge obtained from a speech database, which can enhance the performance of the VAD. First, we present a discriminative weight training method based on the MCE criterion. In this approach, the VAD decision rule becomes the geometric mean of optimally weighted likelihood ratios. Second, an SVM-based approach is introduced to assist the VAD based on statistical models. In this algorithm, the SVM efficiently classifies the input signal into two classes, voice-active and voice-inactive regions, with a nonlinear boundary. Experimental results show that these training-based approaches can effectively enhance the performance of the VAD.

The steadily growing scale of online social networks makes community detection difficult for traditional algorithms. We introduce an efficient and fast algorithm to detect community structure in social networks. Instead of using the eigenvectors as in spectral clustering algorithms, we construct a target function for detecting communities. The communities of the whole social network are partitioned by this target function. We also analyze and estimate the generalization error of the algorithm. The performance of the algorithm is compared with that of the standard spectral clustering algorithm on different well-known instances of social networks with a community structure, both computer generated and from the real world. The experimental results demonstrate the effectiveness of the algorithm.

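Traditional spectral clustering builds a Gaussian-kernel adjacency matrix directly from the raw data and then clusters the data. It does not consider the deep features of the data or the neighborhood manifold structure of the data, and it performs only a single clustering. To address these three shortcomings, a local spectral clustering mapping algorithm using a sparse autoencoder (LSCMS) is proposed. The data are preprocessed, and a sparse autoencoder extracts deep features that reflect the essence of the original data; these features then replace the original data. Each data point is linearly reconstructed from its neighborhood, and the reconstruction weights replace the Gaussian kernel function in building the adjacency matrix. While clustering, LSCMS also maps the data onto the cluster indicators and thereby coordinates them. Experiments on UCI data sets, handwritten data sets, and face data sets show that the algorithm outperforms existing clustering algorithms.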

Two difference-based target detection methods are proposed in this work. In contrast to many target detectors, which only calculate the distance between the test pixel and the target spectrum, the proposed methods calculate the distance of the test pixel to both the target and the background spectra. In other words, they use the difference between the distances computed to the target and to the background. The first proposed method uses the Mahalanobis distance and benefits from the valuable information contained in the statistics of targets and background. The second proposed method uses the kernel-based spectral angle mapper, combining the advantages of the spectral angle and the kernel trick to separate targets from background, especially in non-linear cases. Experiments on three real hyperspectral images indicate the high detection probability of the proposed methods compared to several target detectors.
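
The difference-based idea can be illustrated with the plain (non-kernel) spectral angle. This is a simplification: the paper's second method uses a kernel-based spectral angle mapper and its first uses the Mahalanobis distance, neither of which is reproduced here. The function names and the sign convention of the score are assumptions.

```python
import math


def spectral_angle(a, b):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("spectra must be non-zero")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)


def difference_score(pixel, target, background):
    """Positive when the pixel is closer in angle to the target than to the background."""
    return spectral_angle(pixel, background) - spectral_angle(pixel, target)
```

A detector built on this score would flag a pixel as a target when the score exceeds a chosen threshold, so a pixel that resembles the target but also resembles the background is penalized compared with a target-only distance.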
