1.

Implementation of variable bitrate data hiding techniques on standard and proposed GSM 06.10 full rate coder and its overall comparative evaluation of performance

Ninad Bhatt, Yogeshwar Kosta 《International Journal of Speech Technology》2013,16(3):285-293

This paper deals with the implementation of variable-bit-rate steganographic data transmission over the ETSI GSM 06.10 FR coder at five different bit rates. A few modifications are then suggested in the Regular Pulse Excitation (RPE) section of the ETSI GSM FR coder, yielding the proposed GSM FR coder, which the authors describe as state of the art. The proposed coder supports steganographic data transmission at the same bit rates as the ETSI GSM FR coder. To achieve this, a few RPE pulses are identified and used to embed and hide the information bits. The key element of this research is joint speech coding and data hiding, accomplished with two approaches: the Fixed approach and the Joint approach. Both approaches are implemented on the standard and the proposed coders, and their overall performance is evaluated with subjective (Mean Opinion Score and Degradation MOS) and objective (Perceptual Evaluation of Speech Quality) analyses. A small amount of data is represented as a stego signal and embedded in different encoded wave files (chosen from the NOIZEUS corpus) that serve as the carrier signal. Simulation results for both coders reveal a trade-off between the data embedding rate and the recovered speech quality, for both approaches. Both subjective and objective analyses show that the proposed coder offers comparable performance with lower simulation delay because of its structural difference. For both coders, the Joint approach performs better, but at the cost of greater simulation delay.
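
The pulse-level embedding described above can be sketched as follows. In GSM 06.10 FR, each 20 ms frame has four 5 ms subframes, each carrying 13 RPE pulses coded with 3 bits. This minimal sketch assumes least-significant-bit substitution in the first k pulse codes of a subframe; the paper's actual pulse selection and its five bit rates are not given in the abstract, and the function names are hypothetical. Under this assumption the hidden-data rate is 200·k bit/s.

```python
from typing import List, Tuple

SUBFRAMES_PER_SECOND = 200  # 5 ms subframes (four per 20 ms GSM FR frame)
PULSES_PER_SUBFRAME = 13    # RPE pulses per subframe, each a 3-bit code (0..7)


def embed_bits(pulses: List[int], bits: List[int], k: int) -> Tuple[List[int], int]:
    """Hide up to k bits in the LSBs of the first k RPE pulse codes of one subframe.

    Using the first k pulses is a hypothetical selection rule for illustration.
    Returns the modified pulse codes and the number of bits consumed.
    """
    if len(pulses) != PULSES_PER_SUBFRAME or not 0 <= k <= PULSES_PER_SUBFRAME:
        raise ValueError("expected 13 pulse codes and 0 <= k <= 13")
    out = list(pulses)
    n = min(k, len(bits))
    for i in range(n):
        out[i] = (out[i] & ~1) | (bits[i] & 1)  # overwrite the least significant bit
    return out, n


def extract_bits(pulses: List[int], k: int) -> List[int]:
    """Read back the LSBs of the first k pulse codes."""
    return [p & 1 for p in pulses[:k]]


def payload_rate(k: int) -> int:
    """Hidden-data rate in bit/s when k pulses per subframe each carry one bit."""
    return k * SUBFRAMES_PER_SECOND
```

Because only the LSB of a 3-bit code changes, every modified pulse stays within its original quantizer range.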

2.

Speech enhancement algorithm based on a deep belief network with joint feature optimization

王雁, 贾海蓉, 吉慧芳, 王卫梅 《计算机工程与应用》(Computer Engineering and Applications) 2019,55(9):38-42

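To address the weak generalization of deep belief network (DBN) models, which leads to poor speech enhancement, a regression-DBN speech enhancement algorithm with joint feature optimization is proposed. The algorithm makes no assumptions about the speech or the noise. It extracts the LMPS (Log-Mel frequency Power Spectrum) and MFCC (Mel-Frequency Cepstral Coefficients) features of the speech signal. LMPS is used to reconstruct the enhanced speech directly, which preserves perceived quality, while MFCC serves as an auxiliary secondary feature. The two features are fed jointly into the DBN to optimize the network parameters. This joint optimization adds an MFCC constraint to the direct prediction of LMPS, improving the model's generalization in estimating LMPS and reconstructing the enhanced speech more accurately. Simulation results show that, at different signal-to-noise ratios, joint LMPS and MFCC optimization gives the enhanced speech higher PESQ and SNR than single-feature optimization with LPS (Log Power Spectrum) or LMPS, improving speech quality and intelligibility.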
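
The two input features can be computed as in the sketch below, assuming a Hamming window, 256-sample frames with 50% overlap at 8 kHz, and 40 mel filters. These settings, the function names, and the unnormalized DCT-II are illustrative choices rather than the paper's configuration, and the DBN itself is not shown.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular mel filters, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        for k in range(left, center):                 # rising slope
            fb[j - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):                # falling slope
            fb[j - 1, k] = (right - k) / max(right - center, 1)
    return fb


def lmps_and_mfcc(signal, sr=8000, n_fft=256, hop=128, n_mels=40, n_mfcc=13):
    """Return (LMPS, MFCC) with shapes (n_frames, n_mels) and (n_frames, n_mfcc)."""
    signal = np.asarray(signal, dtype=float)
    window = np.hamming(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2           # power spectrum per frame
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sr).T    # mel-band energies
    lmps = np.log(mel_energy + 1e-10)                           # log-mel power spectrum
    # Unnormalized DCT-II of the LMPS along the mel axis yields the MFCCs.
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[None, :] + 0.5) * np.arange(n_mfcc)[:, None])
    mfcc = lmps @ dct.T
    return lmps, mfcc
```

Since the MFCCs here are a linear transform of the LMPS, the pair describes the same spectrum at two levels of detail, which is what lets the MFCC act as a constraint on the LMPS prediction.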

3.

Evaluation of an Artificial Speech Bandwidth Extension Method in Three Languages

Pulakka H., Laaksonen L., Vainio M., Pohjalainen J., Alku P. 《IEEE Transactions on Audio, Speech, and Language Processing》2008,16(6):1124-1137

Quality and intelligibility of narrowband telephone speech can be improved by artificial bandwidth extension (ABE), which extends the speech bandwidth using only information available in the narrowband speech signal. This paper reports a three-language evaluation of an ABE method that has recently been launched in several of Nokia's mobile telephone models. The method extends the speech bandwidth to frequencies above the telephone band by first utilizing spectral folding and then modifying the magnitude spectrum of the extension band with spline curves. The performance of the method was evaluated by formal listening tests in American English, Russian, and Mandarin Chinese. The results of the listening tests indicate that ABE processing improved the subjective quality of coded narrowband speech in all these languages. Differences between bandwidth-extended American English test sentences and their original wideband counterparts were also evaluated using both an objective distance measure that simulates the characteristics of human hearing and a conventional spectral distortion measure. The average objective error was calculated for different categories of speech sounds. The error was found to be smallest in nasals and semivowels and largest in fricative sounds.
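
Spectral folding, the first stage of the ABE method above, can be illustrated by upsampling 8 kHz speech to 16 kHz with zero insertion and no anti-imaging filter, which mirrors the 0–4 kHz band into 4–8 kHz. This is a minimal sketch of that one step, assuming a factor-2 zero-insertion implementation; the spline-based shaping of the extension band is not reproduced, and the function name is hypothetical.

```python
import numpy as np


def spectral_fold(nb: np.ndarray) -> np.ndarray:
    """Upsample 8 kHz narrowband speech to 16 kHz by inserting zeros between
    samples. Without an anti-imaging low-pass filter, the 0-4 kHz spectrum
    is mirrored about 4 kHz into the 4-8 kHz extension band."""
    wb = np.zeros(2 * len(nb))
    wb[::2] = nb
    return wb
```

A 1 kHz tone therefore reappears at 7 kHz (8 kHz minus 1 kHz) in the folded signal, and the magnitude of that image is what the spline shaping would then adjust.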

4.

Objective evaluation methods for the intelligibility of speech signals masked by time-reversed speech

王玥, Philip Leistner, 李平 《微计算机应用》(Microcomputer Applications) 2012,1(2):54-59

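Speech intelligibility is an important aspect of evaluating speech masking systems in open-plan offices, and the objective evaluation method commonly used today is the STI. A signal obtained by reversing speech in time within frames of a fixed length is called time-reversed speech, and it has been used as one of the effective masking signals. Although the STI correlates well with subjective intelligibility for speech masked by stationary noise, studies have found that it is not suitable for estimating the intelligibility of speech masked by time-reversed speech. This paper analyzes the STI, PESQ, and mNCM objective evaluation methods and reports experiments showing that PESQ and mNCM still estimate the intelligibility of speech masked by reversed speech reasonably well. Based on the objective evaluation results, the paper further compares how the parameters of the reversed-speech masking algorithm (reversal frame length and signal-to-noise ratio) affect intelligibility, and finds that a longer reversal frame length and a lower SNR lead to lower intelligibility.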
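
The two masker parameters compared in the paper, reversal frame length and signal-to-noise ratio, enter a masking simulation as in this sketch. The function names are hypothetical, and the masker signal itself (typically other speech) is left to the caller.

```python
import numpy as np


def time_reverse_frames(speech: np.ndarray, frame_len: int) -> np.ndarray:
    """Reverse the samples inside each consecutive frame of frame_len samples.
    A trailing partial frame is reversed as well."""
    out = np.empty_like(speech)
    for start in range(0, len(speech), frame_len):
        out[start:start + frame_len] = speech[start:start + frame_len][::-1]
    return out


def mix_at_snr(target: np.ndarray, masker: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale the masker so that the target-to-masker power ratio equals snr_db."""
    p_t = np.mean(target ** 2)
    p_m = np.mean(masker ** 2)
    gain = np.sqrt(p_t / (p_m * 10 ** (snr_db / 10)))
    return target + gain * masker
```

A masking stimulus would then be, for example, `mix_at_snr(target, time_reverse_frames(masker, frame_len), snr_db)`, with frame_len and snr_db varied as in the experiments.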

5.

Wavelet energy based voice activity detection and adaptive thresholding for efficient speech coding

Shijo M. Joseph, Anto P. Babu 《International Journal of Speech Technology》2016,19(3):537-550

Over the last five decades, extensive research has been carried out in speech compression, resulting in various speech coding techniques, and the effort toward more efficient speech coding continues around the world. This paper proposes an alternative method for better speech coding. The proposed technique uses the discrete wavelet transform to decompose the signal, and wavelet energy is used to distinguish active voice regions from silence regions in the speech signal. Depending on the status of each region, different thresholding strategies are applied, which leads to better compression without loss of speech intelligibility. The method is evaluated with qualitative and quantitative parameters. The paper also proposes an alternative parameter to MOS values, hereafter called the System Recognition Rate.
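
A minimal sketch of wavelet-energy voice activity detection followed by region-dependent thresholding is shown below. It assumes a 3-level Haar DWT, an "active" decision when the frame's wavelet energy exceeds three times a noise-energy estimate, and hard thresholds of 2% (active) and 20% (silence) of the largest coefficient. All of these values and names are hypothetical; the abstract does not give the paper's wavelet, decision rule, or thresholds.

```python
import numpy as np


def haar_dwt(x, levels):
    """Multi-level orthonormal Haar DWT. Returns [cA_L, cD_L, ..., cD_1]."""
    a = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        if len(a) % 2:
            a = np.append(a, a[-1])                # pad odd lengths by repeating the last sample
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2))  # detail coefficients
        a = (even + odd) / np.sqrt(2)              # approximation coefficients
    return [a] + details[::-1]


def compress_frame(frame, noise_energy, levels=3, ratio=3.0, t_active=0.02, t_silence=0.2):
    """Classify a frame as active or silent by wavelet energy, then hard-threshold
    its coefficients with a region-dependent threshold (all values hypothetical)."""
    coeffs = haar_dwt(frame, levels)
    # For an orthonormal transform without padding this equals the frame energy;
    # per-subband energies could be used instead for a finer decision.
    energy = sum(float(np.sum(c ** 2)) for c in coeffs)
    active = energy > ratio * noise_energy
    peak = max(float(np.max(np.abs(c))) for c in coeffs)
    thr = (t_active if active else t_silence) * peak
    kept = [np.where(np.abs(c) >= thr, c, 0.0) for c in coeffs]
    return active, kept
```

The higher threshold in silent frames zeroes more coefficients, which is where the extra compression comes from in this sketch.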

6.

DSP-based very-low-bit-rate speech coding algorithm and its implementation

赵继勇, 曹芳, 梁妙元, 刘亚峰 《计算机工程》(Computer Engineering) 2011,37(21):261-263

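Based on the mixed excitation linear prediction (MELP) algorithm, a 1120 b/s MELP very-low-bit-rate speech coding algorithm is designed. The algorithm lowers the coding rate by lengthening the frame and by using dynamic bit allocation, multi-frame joint vector quantization, and parameter interpolation, and it has been implemented in real time on a TMS320VC5416 DSP chip. The real-time quality of the encoded and decoded speech was assessed with the VQT speech quality evaluation system from the US company GL, and the Perceptual Evaluation of Speech Quality score was above 3. Experimental results show that the algorithm meets practical communication requirements.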

7.

VoIP steganography algorithm based on shortest-Euclidean-distance code element replacement

孙鑫昊, 王开西 《计算机工程与应用》(Computer Engineering and Applications) 2022,58(13):128-134

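The primary goal of information hiding is imperceptibility, and reducing the modification rate is an effective way to improve invisibility and achieve imperceptibility. Following the basic principle that the fewer modifications are made to the carrier, the less perceptible the steganography, a VoIP steganography method based on replacing code elements at the shortest Euclidean distance is proposed. By reducing the number of modifications to the carrier, it improves embedding efficiency and thereby improves imperceptibility without degrading audio quality. After defining a steganographic unit, the method computes the Euclidean distance between each linear predictive coding (LPC) value in the unit and the secret message, replaces the LPC value at the shortest distance with the secret message, marks the position of the replaced LPC value, and hides this position information with a complementary neighbor-node method, thereby concealing the secret message. Performance analysis and comparison show that the method has high embedding efficiency, reducing the modification rate to 22.7%; the KL divergence indicates good security, and PESQ values above 4.0 indicate good speech quality.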
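
The replacement step can be sketched as follows, assuming the steganographic unit is a list of integer LPC codewords and each secret symbol lies in the same range; for scalars the Euclidean distance reduces to an absolute difference. Hiding the position with the complementary neighbor-node method is not reproduced, and the function names are hypothetical.

```python
from typing import List, Tuple


def embed_symbol(unit: List[int], secret: int) -> Tuple[List[int], int, bool]:
    """Replace the codeword in a steganographic unit that is closest to the secret
    symbol. Returns the modified unit, the chosen position, and whether the
    carrier actually changed (it does not if the closest codeword already matches)."""
    distances = [abs(v - secret) for v in unit]   # 1-D Euclidean distance
    pos = min(range(len(unit)), key=distances.__getitem__)
    changed = distances[pos] != 0
    out = list(unit)
    out[pos] = secret
    return out, pos, changed


def extract_symbol(unit: List[int], pos: int) -> int:
    """The receiver reads the symbol back once it has recovered the position."""
    return unit[pos]
```

When the closest codeword already equals the secret symbol, the carrier is left unchanged, which is how minimum-distance selection lowers the modification rate.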

8.

Objective speech quality assessment based on a Gammatone filter bank

李庆先, 卞昕, 刘良江, 朱宪宇, 周鑫 《计算技术与自动化》(Computing Technology and Automation) 2016,(3):76-80

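Speech quality is an important metric for evaluating communication systems. Existing perceptual speech quality assessment algorithms use a perceptual model based on the Bark spectrum, which has high computational complexity and does not adequately model the frequency selectivity of the human ear. To address this, this paper proposes a new objective speech quality assessment method that extracts features with a Gammatone filter bank, which better matches human auditory characteristics, computes the average distortion distance between the original and the distorted speech, and obtains an objective mean opinion score from the mapping between the subjective mean opinion score and the normalized average distortion distance. Experiments show that, compared with the perceptual assessment method, the proposed algorithm greatly reduces computational complexity while maintaining a high correlation between objective and subjective mean opinion scores.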
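
A Gammatone filter bank of the kind described can be built from the standard fourth-order Gammatone impulse response together with the Glasberg and Moore ERB formula. This sketch assumes ERB-rate-spaced center frequencies and a bandwidth factor b = 1.019, and it stops at the filters themselves; the paper's distortion distance and its mapping to MOS are not given in the abstract. Function names are hypothetical.

```python
import math
from typing import List


def erb(fc: float) -> float:
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)


def erb_space(f_low: float, f_high: float, n: int) -> List[float]:
    """n center frequencies equally spaced on the ERB-rate scale."""
    def to_rate(f):
        return 21.4 * math.log10(4.37 * f / 1000.0 + 1.0)

    def from_rate(r):
        return (10 ** (r / 21.4) - 1.0) * 1000.0 / 4.37

    lo, hi = to_rate(f_low), to_rate(f_high)
    return [from_rate(lo + i * (hi - lo) / (n - 1)) for i in range(n)]


def gammatone_ir(fc: float, sr: int, duration: float = 0.05,
                 order: int = 4, b: float = 1.019) -> List[float]:
    """Sampled t^(order-1) * exp(-2*pi*b*ERB(fc)*t) * cos(2*pi*fc*t), peak-normalized."""
    n_samples = int(duration * sr)
    ir = []
    for i in range(n_samples):
        t = i / sr
        ir.append(t ** (order - 1)
                  * math.exp(-2 * math.pi * b * erb(fc) * t)
                  * math.cos(2 * math.pi * fc * t))
    peak = max(abs(v) for v in ir)
    return [v / peak for v in ir]
```

Convolving the reference and degraded signals with each channel's impulse response gives the per-channel outputs from which a distortion distance could then be computed.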

9.

Research on low-delay, low-bit-rate speech coding

赵哲峰, 张刚, 谢克明, 王一平 《计算机工程与应用》(Computer Engineering and Applications) 2008,44(34):100-102

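The existing low-delay speech coding algorithm (LD-CELP) requires a bit rate of 16 kb/s, which hinders its application. This paper proposes a two-stage codebook search method that improves the performance of low-delay speech coding while lowering its bit rate. First, two sub-codebooks are constructed: a backward-updated adaptive codebook and a fixed codebook with an algebraic structure. A two-stage codebook search is then designed to minimize the mean squared error between the filtered excitation vector and the target vector. The result is a 10 kb/s CELP coder with two-stage codebook search and 2.5 ms delay at an 8 kHz sampling rate. Tested with average segmental SNR (SSNR) and Perceptual Evaluation of Speech Quality (PESQ), the algorithm achieves coding quality comparable to 16 kb/s G.728.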
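
The two-stage search can be sketched as a sequential analysis-by-synthesis search: choose the adaptive-codebook entry and gain that best match the target, then search the fixed codebook against what remains. The toy codebooks, the impulse response h, and the function names are assumptions; the paper's backward-updated adaptive codebook and algebraic fixed codebook would replace the lists used here, and every codebook is assumed to contain at least one nonzero vector.

```python
import numpy as np


def synth(c, h):
    """Zero-state response of the weighted synthesis filter (impulse response h) to c."""
    return np.convolve(c, h)[:len(c)]


def best_entry(target, codebook, h):
    """Return (index, gain, filtered vector) minimizing ||target - g * H c||^2."""
    best = (None, 0.0, None, -np.inf)
    for i, c in enumerate(codebook):
        y = synth(c, h)
        energy = float(y @ y)
        if energy <= 0.0:
            continue
        corr = float(target @ y)
        score = corr * corr / energy      # error reduction achieved with the optimal gain
        if score > best[3]:
            best = (i, corr / energy, y, score)
    return best[:3]


def two_stage_search(target, adaptive_cb, fixed_cb, h):
    i_a, g_a, y_a = best_entry(target, adaptive_cb, h)  # stage 1: adaptive codebook
    residual = target - g_a * y_a                        # updated target for stage 2
    i_f, g_f, y_f = best_entry(residual, fixed_cb, h)   # stage 2: fixed codebook
    error = residual - g_f * y_f
    return i_a, g_a, i_f, g_f, float(error @ error)
```

Searching the stages one after the other keeps the cost linear in the codebook sizes, at the price of not optimizing both gains jointly.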

10.

Proposed modifications in ITU-T G.729 8 kbps CS-ACELP speech codec and its overall performance analysis

Nikunj Tahilramani, Ninad Bhatt 《International Journal of Speech Technology》2017,20(3):615-628

This paper proposes a modification to the transmission of the excitation codevector and its non-zero pulse signs and magnitudes, using a "codebook partition and label assignment" approach that reduces the number of bits required to transmit it through the communication channel in the legacy 8 kbps CS-ACELP speech codec. The proposed approach uses the excitation codebook structure of the forward-mode G.729E 11.8 kbps standard, with two non-zero pulses per track, which avoids using the two algebraic codebook structures of the forward and backward modes of G.729E, and it uses a least significant pulse replacement approach to find the optimized excitation codevector. The proposed modification of the legacy 8 kbps CS-ACELP (80 bits/10 ms) codec results in a bit rate of 10.6 kbps (106 bits/10 ms), with better objective and subjective results than the legacy 8 kbps CS-ACELP coder, and it avoids the codebook mode switching of the standard 11.8 kbps (G.729E) CS-ACELP coder. The paper also proposes reducing the number of searches for the final excitation codevector by taking the initial codevector as the final codevector, which improves speech quality compared with the legacy 8 kbps G.729 CS-ACELP. Both the legacy 8 kbps and the proposed 10.6 kbps CS-ACELP codecs are implemented in MATLAB. Subjective and objective analyses of the proposed 10.6 kbps codec are carried out, and the results are compared with those of the legacy 8 kbps CS-ACELP using tables and graphs. The results show that PESQ and MOS scores are comparable for each set of wave files even though bit rates are reduced. The consistency and efficiency of the proposed algorithm are assessed by computing 95% confidence intervals for the population mean of the obtained objective and subjective results.
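
As background for the bit counts above, the fixed codebook of standard 8 kbps G.729 places four signed unit pulses in a 40-sample subframe on interleaved tracks, using 13 position bits and 4 sign bits (17 bits per subframe). The sketch below packs and unpacks such a pulse set; the field order is an illustrative assumption rather than the standard's bitstream order, and neither the G.729E two-pulse-per-track structure nor the paper's codebook partition and label assignment is reproduced.

```python
from typing import List, Tuple

SUBFRAME = 40
# Tracks 0-2 hold positions k, k+5, ..., k+35; track 3 merges offsets 3 and 4.
TRACKS = [list(range(k, SUBFRAME, 5)) for k in range(3)] + [
    sorted(list(range(3, SUBFRAME, 5)) + list(range(4, SUBFRAME, 5)))
]
POS_BITS = [3, 3, 3, 4]  # 8 positions per track 0-2, 16 positions on track 3


def pack(pulses: List[Tuple[int, int]]) -> int:
    """pulses[k] = (position, sign) for track k, sign being +1 or -1.
    Returns a 17-bit integer (illustrative field order)."""
    word = 0
    for k, (pos, sign) in enumerate(pulses):
        word = (word << POS_BITS[k]) | TRACKS[k].index(pos)  # position index within track
        word = (word << 1) | (1 if sign > 0 else 0)         # one sign bit per pulse
    return word


def unpack(word: int) -> List[Tuple[int, int]]:
    """Inverse of pack: read fields back starting from the least significant bit."""
    pulses = []
    for k in reversed(range(4)):
        sign = 1 if word & 1 else -1
        word >>= 1
        idx = word & ((1 << POS_BITS[k]) - 1)
        word >>= POS_BITS[k]
        pulses.append((TRACKS[k][idx], sign))
    return pulses[::-1]
```

With two subframes per 10 ms frame, this structure accounts for 34 of the 80 bits of the legacy codec.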

11.

Optimal speech enhancement under signal presence uncertainty using Log Gabor Wavelet and Bayesian Joint Statistics

Suman Senapati 《International Journal of Speech Technology》2013,16(4):439-459

This paper investigates speech enhancement when only a single microphone is used and the statistics of the interfering noise and speech are not known a priori. It thus addresses a pitfall of many current enhancement techniques and aims at a system applicable in the real world. The paper focuses on a Log Gabor Wavelet (LGW) based long-term squared spectral amplitude estimator using the maximum a posteriori (MAP) criterion. First, a long-term cepstral mean subtraction technique with LGW is proposed to suppress telephone-channel and handset effects in the speech signals. Then a novel speech enhancer based on a MAP Bayesian bivariate model is developed to suppress background noise. The work also models the inter-scale dependency between coefficients and their parents with a circularly symmetric probability density function related to the family of spherically invariant random processes (SIRPs), and derives the corresponding joint estimator from MAP estimation theory. The inter-scale noise variance of the coefficients is kept constant, which yields a closed-form solution. A speech presence uncertainty (SPU) estimator is a further contribution to the proposed estimator. The main contributions of this paper are therefore: (i) the combination of LGW, SIRPs, and SPU for background noise reduction; (ii) LGW with long-term cepstral mean subtraction to reduce both telephone-channel and handset effects; (iii) a circularly symmetric probability density function that exploits the inter-scale dependency between coefficients and their parents, with the corresponding joint estimators derived by MAP estimation theory; (iv) a constant inter-scale noise variance of the coefficients, which gives a closed-form solution; and (v) refinement of the magnitude estimates by scaling them with the SPU probability.
Extensive comparisons between the proposed and existing speech enhancement algorithms are made on the NOIZEUS speech database, which contains different types of noise. Subjective and objective evaluations cover four classes of algorithms (spectral subtractive, subspace, statistical-model-based, and Wiener-type), compared against the proposed methods. Experimental results show that the proposed estimator yields a higher improvement in Segmental SNR (SSNR), lower Log Area Ratio (LAR) and Weighted Spectral Slope (WSS) distortion, and higher Perceptual Evaluation of Speech Quality (PESQ) and Mean Opinion Score (MOS) than the existing speech enhancement algorithms. For SSNR, the proposed methods improve on existing methods by 2 dB for almost every noise source; for MOS, they also improve on existing methods for almost every noise source. The proposed methods therefore aim to enhance both speech quality and intelligibility.
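
The effect of a circularly symmetric bivariate prior on a coefficient and its parent can be illustrated with the closed-form bivariate shrinkage rule of Sendur and Selesnick. That rule is swapped in here because the abstract does not give the paper's SIRP-based estimator; the LGW analysis is omitted, the noise and signal standard deviations are assumed known, and the final speech-presence scaling follows contribution (v). Function names are hypothetical.

```python
import math


def bivariate_shrink(y: float, y_parent: float, sigma_n: float, sigma: float) -> float:
    """MAP estimate of a clean coefficient from noisy coefficient y and its parent
    y_parent, under the circularly symmetric prior
    p(w, w_p) ~ exp(-sqrt(3)/sigma * sqrt(w^2 + w_p^2))  (Sendur & Selesnick).
    sigma_n: noise std, sigma: signal std (both assumed known here)."""
    r = math.hypot(y, y_parent)
    if r == 0.0:
        return 0.0
    gain = max(r - math.sqrt(3.0) * sigma_n ** 2 / sigma, 0.0) / r
    return gain * y


def spu_weighted(y, y_parent, sigma_n, sigma, p_speech):
    """Scale the MAP estimate by a speech-presence probability in [0, 1]."""
    return p_speech * bivariate_shrink(y, y_parent, sigma_n, sigma)
```

Because the threshold is applied to the joint magnitude of child and parent, a small coefficient survives when its parent is large, which is how the inter-scale dependency enters the estimate.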

12.

A Multiresolution Model of Auditory Excitation Pattern and Its Application to Objective Evaluation of Perceived Speech Quality

《IEEE Transactions on Audio, Speech, and Language Processing》2006,14(6):1912-1923

This paper proposes a multiresolution model of auditory excitation pattern and applies it to the problem of objective evaluation of subjective wideband speech quality. The model uses wavelet packet transform for time-frequency decomposition of the input signal. The selection of the wavelet packet tree is based on an optimality criterion formulated to minimize a cost function based on the critical band structure. The models of the different auditory phenomena are reformulated for the multiresolution framework. This includes the proposition of duration dependent outer and middle ear weighting, multiresolution spectral spreading, and multiresolution temporal smearing. As an application, the excitation pattern is used to define an objective measure of auditory distortion of a distorted speech signal compared to the undistorted one. The performance of this objective measure is evaluated with a database of various kinds of NOISEX-92 degraded wideband speech signals in predicting the subjective mean opinion score (MOS) and is compared with the fast Fourier transform (FFT)-based ITU-T PESQ P.862.2 algorithm. The proposed measure achieves a correlation between subjective and objective MOS comparable to that of PESQ P.862.2, with a trend suggesting better correlation for nonstationary than for stationary degradations. Further refinement of the measure for distortion types other than additive noise is anticipated.

13.

Key techniques for speech intelligibility based on CycleGAN

肖晶, 刘佳奇, 李登实, 赵兰馨, 王前瑞 《计算机系统应用》(Computer Systems & Applications) 2022,31(6):1-9

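Speech intelligibility enhancement is a perceptual enhancement technique that reproduces clear speech in noisy environments. Many studies enhance intelligibility through speaking style conversion (SSC), but this approach relies solely on the Lombard effect and therefore performs poorly under strong noise interference. SSC also models the conversion of the fundamental frequency (F0) with a simple linear transformation and maps only a few dimensions of the mel-cepstral coefficients (MCEPs). Because F0 and MCEPs are two important speech features, modeling them adequately is essential. This paper therefore uses the continuous wavelet transform (CWT) to decompose F0 into 10 dimensions that describe speech at different time scales, enabling effective F0 conversion, and uses 20-dimensional MCEPs for MCEP conversion. In addition, the iMetricGAN network is used to optimize a speech intelligibility metric under strong noise. Experimental results show that the proposed CycleGAN-based non-parallel speaking style conversion method using CWT and iMetricGAN (NS-CiC) significantly improves speech intelligibility in strong noise in both objective and subjective evaluations.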
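
The 10-scale F0 decomposition can be sketched with a Mexican-hat continuous wavelet transform of the normalized log-F0 contour at dyadic widths. The base width, the wavelet choice, and the function names are assumptions, and the contour is assumed to be already interpolated through unvoiced frames.

```python
import numpy as np


def ricker(points: int, a: float) -> np.ndarray:
    """Mexican-hat (Ricker) wavelet of width a, sampled at `points` samples."""
    t = np.arange(points) - (points - 1) / 2.0
    amp = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return amp * (1.0 - (t / a) ** 2) * np.exp(-(t ** 2) / (2.0 * a ** 2))


def cwt_decompose(lf0: np.ndarray, n_scales: int = 10, base: float = 1.0) -> np.ndarray:
    """Decompose a log-F0 contour into n_scales components at widths base * 2**i
    frames. Output shape: (n_scales, len(lf0))."""
    x = (lf0 - lf0.mean()) / (lf0.std() + 1e-8)   # zero-mean, unit-variance contour
    out = np.empty((n_scales, len(x)))
    for i in range(n_scales):
        a = base * 2.0 ** i
        # Truncate the wavelet to the contour length so "same" mode keeps len(x).
        w = ricker(min(int(10 * a), len(x)), a)
        out[i] = np.convolve(x, w, mode="same")
    return out
```

Each row then captures F0 movement at one time scale, from frame-level jitter at small widths to phrase-level trends at large widths, so each can be converted separately.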

14.

Acoustic classification and segmentation using modified spectral roll-off and variance-based features

Marko Kos, Zdravko Kačič, Damjan Vlaj 《Digital Signal Processing》2013,23(2):659-674

15.

Low-bit-rate speech coding based on line spectrum region quantization

王卫锋, 张秀彬, 王世新, 刘旭涛, 汤亮 《微型电脑应用》(Microcomputer Applications) 2001,17(4):50-51

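This paper applies line spectrum pair (LSP) parameters to speech coding, analyzes and compares the characteristics of LSP with those of the commonly used linear prediction coefficients (LPC) and lattice-filter reflection coefficients (PARCOR), and on this basis introduces a region quantization technique that quantizes LSP parameters more effectively. The bit rate can thus be reduced further while maintaining the MOS of the coded speech.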
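
LSP parameters are the angles of the unit-circle roots of the symmetric and antisymmetric polynomials formed from the LPC polynomial A(z). The sketch below computes them with general polynomial root finding, which is simple but not the most efficient method; the region quantization technique itself is not shown, and the function name is hypothetical.

```python
import numpy as np


def lpc_to_lsf(a: np.ndarray) -> np.ndarray:
    """Line spectral frequencies (radians, ascending) of
    A(z) = 1 + a1 z^-1 + ... + ap z^-p, given a = [1, a1, ..., ap]."""
    a_ext = np.append(a, 0.0)
    p_poly = a_ext + a_ext[::-1]   # P(z) = A(z) + z^-(p+1) A(1/z), symmetric
    q_poly = a_ext - a_ext[::-1]   # Q(z) = A(z) - z^-(p+1) A(1/z), antisymmetric
    angles = np.concatenate([np.angle(np.roots(p_poly)), np.angle(np.roots(q_poly))])
    # Keep one root of each conjugate pair; drop the trivial roots at z = +1 and z = -1.
    eps = 1e-6
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])
```

For a minimum-phase A(z) this returns p frequencies that interleave between P and Q, an ordering property that quantizers for LSP parameters commonly exploit.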

16.

Simultaneous speech coding and de-noising in a dictionary based quantized CS framework

Vinitha Ramdas, Sai Subrahmanyam R. K. Gorthi, Deepak Mishra 《International Journal of Speech Technology》2016,19(3):509-523

Speech compression, or speech coding, is essential for effective communication of speech signals in resource-limited scenarios, and researchers have been working toward ever lower transmission bit rates (BR) without much compromise in speech quality. Medium-BR hybrid speech coding schemes have attracted much interest in recent years, most of them based on CELP, the basic medium-bit-rate coding scheme. In this work, we provide insight into the capabilities of compressive sensing (CS) in speech processing and propose a novel idea in the quantized framework. Three major aspects are demonstrated in this paper: (1) inherent de-noising of noisy speech by the CS-based coder along with compression; (2) quantization of CS measurements to achieve medium transmission bit rates; and (3) improved quality and compression performance of the coder through better sparse representations of speech using dictionaries. The results indicate that the proposed scheme offers better compression than basic Gaussian-codebook CELP. The CS scheme has the added advantage of inherent noise suppression and is more robust to background noise than parameter-extraction-based medium-bit-rate speech coding systems.
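
The recovery side of a CS coder can be sketched with Orthogonal Matching Pursuit (OMP), a standard greedy reconstruction algorithm; the abstract does not state which recovery algorithm the paper uses. For simplicity the signal is assumed sparse in the canonical basis, whereas a dictionary-based coder would use phi = Phi @ Psi with a learned dictionary Psi, and measurement quantization is omitted. The function name is hypothetical.

```python
import numpy as np


def omp(phi: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Orthogonal Matching Pursuit: estimate a k-sparse s with y ≈ phi @ s."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(phi.T @ residual)))  # atom most correlated with residual
        if j not in support:
            support.append(j)
        # Re-fit all selected coefficients jointly by least squares.
        coef, *_ = np.linalg.lstsq(phi[:, support], y, rcond=None)
        residual = y - phi[:, support] @ coef
    s = np.zeros(phi.shape[1])
    s[support] = coef
    return s
```

In a coder, the transmitted quantities would be the (quantized) measurements y, and the decoder would run a recovery step like this one.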

17.

Bandwidth extension of telephone speech using magnitude spectrum data hiding

Prasad Nizampatnam, Kishore Kumar Tappeta 《International Journal of Speech Technology》2017,20(1):151-162

Public telephone systems transmit speech over a limited frequency range of about 300–3400 Hz, called narrowband (NB), which significantly reduces speech quality and intelligibility. This paper proposes a novel, fully backward-compatible method for bandwidth extension of NB speech. The method uses a magnitude-spectrum data hiding technique to provide a perceptually better wideband speech signal. Code-excited linear prediction (CELP) parameters are extracted from a downsampled, frequency-shifted version of the high-frequency components of the speech signal above the NB range; these parameters are spread with pseudo-noise codes and embedded in the low-amplitude high-frequency regions of the magnitude spectrum of the NB speech signal. The embedded information is extracted at the receiving end to reconstruct the wideband speech signal. Theoretical and simulation analyses show that the proposed method is robust to quantization and channel noise. Comparison category rating (CCR) listening tests and log spectral distortion tests show that the reconstructed wideband signal gives much better speech quality than some existing speech bandwidth extension methods that employ data hiding.
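
The embedding idea can be sketched as follows: each hidden bit is spread with a ±1 PN sequence, written into the FFT magnitudes of a group of high-frequency bins of an NB frame while the original phases are kept, and recovered by correlation with the same PN sequence. The bin range, the amplitude alpha, the arbitrary payload bits (standing in for CELP parameters), and the function names are all hypothetical, and no channel noise is modeled.

```python
import numpy as np


def embed_magnitude(frame, bits, pn, band, alpha):
    """Write PN-spread bits into the FFT magnitudes of the bins in `band`
    (which must exclude DC and Nyquist), keeping the original phases.
    Bit b maps to symbol -1/+1; each bin in that bit's group gets magnitude
    alpha * (1 + 0.5 * symbol * chip). pn must be at least as long as a group."""
    spec = np.fft.rfft(frame)
    groups = np.array_split(np.asarray(band), len(bits))  # one group of bins per bit
    for b, idx in zip(bits, groups):
        symbol = 1.0 if b else -1.0
        mags = alpha * (1.0 + 0.5 * symbol * pn[:len(idx)])
        spec[idx] = mags * np.exp(1j * np.angle(spec[idx]))
    return np.fft.irfft(spec, n=len(frame))


def extract_magnitude(frame, n_bits, pn, band, alpha):
    """Recover bits by correlating the normalized band magnitudes with the PN chips."""
    mags = np.abs(np.fft.rfft(frame))
    groups = np.array_split(np.asarray(band), n_bits)
    return [int(np.sum((mags[idx] / alpha - 1.0) * pn[:len(idx)]) > 0) for idx in groups]
```

For 256-sample frames at 8 kHz, bins 110 to 126 lie between roughly 3.4 and 4 kHz, the low-energy region above the telephone band where this sketch places the payload.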

18.

Intelligibility prediction for distorted sentences by the normalized covariance measure

Fei Chen 《International Journal of Speech Technology》2011,14(3):237-243

The speech-transmission index (STI) has been used extensively to predict the intelligibility of speech corrupted by reverberation and additive noise. This study further evaluated its performance in predicting the intelligibility of three types of distorted sentences: time-reversed stimuli, vocoded stimuli, and stimuli containing envelopes recovered from the Hilbert fine structure (R-HFS). The distorted sentences were simulated, and their intelligibility was predicted with the normalized covariance measure (NCM), an STI-based index. The NCM was evaluated against the intelligibility scores available for the three types of distorted stimuli, and its performance was compared with that of the PESQ measure and a coherence-based speech intelligibility index. The NCM consistently predicted intelligibility well in all three distortion conditions: (1) the intelligibility of time-reversed speech declined steadily as the segment duration for speech reversal increased up to 200 ms; (2) the intelligibility of tone-vocoded and noise-vocoded stimuli improved as more vocoder channels were used, with only a small difference between the two types of vocoded sentences; and (3) the intelligibility of R-HFS stimuli decreased as the number of analysis bands increased from one to eight. Complementing previous findings on speech intelligibility prediction, the present results support the view that the intelligibility of distorted sentences can be predicted well by the NCM.
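
The NCM can be sketched from per-band temporal envelopes using its usual definition: the normalized covariance r in each band is mapped to an apparent SNR of 10·log10(r²/(1 − r²)), clipped to ±15 dB, converted to a transmission index, and averaged with band-importance weights. Envelope extraction (band-pass filtering and envelope detection) is omitted, uniform weights are an assumption, and the function name is hypothetical.

```python
import numpy as np


def ncm(clean_env, proc_env, weights=None):
    """Normalized covariance measure from band envelopes of shape (n_bands, n_samples).
    Envelopes must not be constant within a band (zero variance)."""
    x = clean_env - clean_env.mean(axis=1, keepdims=True)
    y = proc_env - proc_env.mean(axis=1, keepdims=True)
    r = np.sum(x * y, axis=1) / np.sqrt(np.sum(x ** 2, axis=1) * np.sum(y ** 2, axis=1))
    r2 = np.clip(r ** 2, 1e-12, 1.0 - 1e-12)                      # avoid log(0) and division by 0
    snr = np.clip(10.0 * np.log10(r2 / (1.0 - r2)), -15.0, 15.0)  # apparent SNR per band, dB
    ti = (snr + 15.0) / 30.0                                       # transmission index in [0, 1]
    w = np.ones(len(ti)) if weights is None else np.asarray(weights, dtype=float)
    return float(np.sum(w * ti) / np.sum(w))
```

Identical envelopes give an NCM of 1 and uncorrelated envelopes give 0, so time reversal, vocoding, or envelope recovery can be scored by how much envelope covariance they preserve.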

19.

MOS estimation model development using ACR listening-opinion tests with Thai users referring to loss effects: a case of G.726 and G.729

Pongpisit Wuttidittachotti, Phisit Khaoduang, Therdpong Daengsi 《Multimedia Systems》2018,24(3):285-295

This paper proposes two Mean Opinion Score (MOS) estimation models for the G.726 and G.729 codecs, based on Thai users and the Thai language and accounting for packet loss effects. Absolute Category Rating (ACR) listening tests with Thai speech were conducted with 89 participants for G.726 and 107 participants for G.729 to develop the MOS estimation models, and the same tests were conducted with a total of 60 participants to evaluate the models for both codecs. Packet loss rates ranged from 0% to 15%, with 5 test conditions for G.726 and 6 for G.729; each condition was tested with at least 16 participants. After the data were gathered, MOS estimation models for both codecs were built and evaluated on the test sets in comparison with Perceptual Evaluation of Speech Quality (PESQ), a popular measurement method. As one contribution of this study, evaluation with the Mean Absolute Percentage Error (MAPE) showed that the proposed models for G.726 and G.729 performed better than PESQ, reducing the MAPE by about 30% and 17%, respectively, compared with PESQ.
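
MAPE, the criterion used above, compares subjective MOS values with the scores predicted by a model or by PESQ. A minimal implementation, with a hypothetical function name, is:

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent, of predicted vs. actual scores."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)
```

For example, hypothetical subjective scores of 4.0 and 2.0 against predictions of 3.0 and 2.0 give a MAPE of 12.5%.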

20.

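Codebook design method for enhanced dual-prediction multistage vector quantization of speech spectral parameters (total citations: 1; self-citations: 0; citations by others: 1)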

高戈, 胡瑞敏, 李德仁 《计算机工程与应用》(Computer Engineering and Applications) 2002,38(10):23-26

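Linear predictive coding (LPC) parameters, which represent the speech spectral parameters, are widely used in speech coding algorithms. Very-low-bit-rate speech coding requires encoding the spectral parameters with as few bits as possible. This paper presents a codebook design method for enhanced dual-prediction multistage vector quantization (EDPMSVQ) of speech spectral parameters. This improved multistage VQ method fully exploits the short-term and long-term correlation of the spectral parameters: it uses multistage vector quantization with memory (MSVQ) and applies a different prediction coefficient to each dimension of the spectral parameters. Moreover, exploiting the different behavior of strongly and weakly correlated spectral parameters between adjacent frames, it uses two sets of predicted values, one for strong and one for weak correlation, which further reduces the bit rate needed to code the spectral parameters. EDPMSVQ achieves nearly "transparent" quantization of the spectral parameters with 20 bits, while slightly reducing the computational complexity of quantization and greatly reducing the required storage.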
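
The structure described (per-dimension prediction from the previous quantized vector, two predictor sets for strong and weak inter-frame correlation, and multistage VQ of the prediction residual) can be sketched as below. The codebooks, predictor coefficients, mean vector, greedy stage-by-stage search, and function names are hypothetical, and the paper's 20-bit allocation and codebook training are not reproduced.

```python
import numpy as np


def msvq(target, stages):
    """Greedy multistage VQ: each stage quantizes what the previous stages left.
    stages: list of codebooks of shape (codebook_size, dim)."""
    residual = target.copy()
    recon = np.zeros_like(target)
    indices = []
    for cb in stages:
        i = int(np.argmin(np.sum((cb - residual) ** 2, axis=1)))  # nearest codevector
        indices.append(i)
        recon += cb[i]
        residual = target - recon
    return indices, recon


def dual_predictive_msvq(x, prev_q, mean, pred_sets, stages):
    """Try each per-dimension predictor set, quantize the prediction residual
    with MSVQ, and keep the set giving the smallest squared error."""
    best = None
    for s, coeff in enumerate(pred_sets):         # coeff: one prediction coefficient per dimension
        pred = mean + coeff * (prev_q - mean)     # prediction from the previous quantized frame
        idx, r_hat = msvq(x - pred, stages)
        x_q = pred + r_hat
        err = float(np.sum((x - x_q) ** 2))
        if best is None or err < best[0]:
            best = (err, s, idx, x_q)
    return best[1], best[2], best[3]              # predictor-set flag, stage indices, quantized vector
```

With two predictor sets, the choice costs one extra bit per frame in this sketch, in exchange for a prediction that tracks both stationary and rapidly changing spectra.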