
1.

Mao Qirong (毛启容), Zhan Yongzhao (詹永照), Du Shoufu (杜守富) 《电子与信息学报》 (Journal of Electronics & Information Technology) 2007,29(2):434-438

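To protect personal characteristics in real-time voice communication, this paper proposes a new method for altering the personal characteristics of real-time speech. The method modifies the spectral parameters of the speech signal with a PLAR (Pseudo Log Area Ratio) coefficient curve transformation and its prosodic parameters with the linear-prediction-based pitch-synchronous overlap-add (LP-PSOLA) algorithm, thereby changing the personal characteristics carried by the signal. In addition, because the synchronized overlap-add (SOLA) algorithm used by most current time-scale modification schemes is computationally heavy and ill-suited to real-time speech processing, the method adopts the adaptive synchronized overlap-add (ASOLA) algorithm, a new SOLA-based time-scale modification algorithm proposed by the authors' research group, to compensate in time for the speech signal after its personal characteristics have been changed, which keeps the processing real-time. Finally, the method is used to implement privacy protection for real-time speech. Experimental results show that the synthesized speech is of high quality and that the method performs well in real time.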

2.

Improving the Flexibility of Dynamic Prosody Modification Using Instants of Significant Excitation

D. Govind Tinu T. Joy 《Circuits, Systems, and Signal Processing》2016,35(7):2518-2543

Modification of suprasegmental features such as pitch and duration of original speech by fixed scaling factors is referred to as static prosody modification. In dynamic prosody modification, the prosodic scaling factors (time-varying modification factors) are defined for all the pitch cycles present in the original speech. The present work is focused on improving the naturalness of the prosody modified speech by reducing the generation of piecewise constant segments in the modified pitch contour. The prosody modification is performed by anchoring around the accurate instants of significant excitation estimated from the original speech. The division of longer pitch intervals into many equal intervals over long speech segments introduces step-like discontinuities in the form of piecewise constant segments in the modified pitch contours. The effectiveness of the proposed dynamic modification method is initially confirmed from the smooth modified pitch contour plot obtained for finer static prosody scaling factors, waveforms, spectrogram plots and comparative subjective evaluations. Also, the average \(F_0\) jitter computed from the pitch segments of each glottal activity region in the modified speech is proposed as an objective measure for the prosody modification. The naturalness of the prosody modified speech using the proposed method is objectively and subjectively compared with that of the existing zero frequency filtered signal-based dynamic prosody modification. Also, the proposed algorithm effectively preserves the dynamics of the prosodic patterns in singing voices, wherein the \(F_0\) parameters rapidly and continuously fluctuate within a higher \(F_0\) range.

3.

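A Retrospective on Research into Speech Time-Scale Modification Techniques (Total citations: 1; self-citations: 0; citations by others: 1)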

Zhou Jun (周俊), Gao Yue (高悦), Tan Wei (谭薇), Chen Yanpu (陈砚圃) 《现代电子技术》 (Modern Electronics Technique) 2006,29(18):102-105

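Speech time-scale modification compresses or stretches speech by a given amount without changing its pitch and while preserving good sound quality. The paper first outlines the development history and main implementation approaches of the technique, focusing on the principles of the main algorithms; it then simulates two time-domain algorithms suited to real-time processing and compares their results. Finally, it discusses the outlook for speech time-scale modification.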

4.

Shape invariant time-scale and pitch modification of speech (Total citations: 7; self-citations: 0; citations by others: 7)

Quatieri T.F. McAulay R.J. 《Signal Processing, IEEE Transactions on》1992,40(3):497-510

The simplified linear model of speech production predicts that when the rate of articulation is changed, the resulting waveform takes on the appearance of the original, except for a change in the time scale. A time-scale modification system that preserves this shape-invariance property during voicing is developed. This is done using a version of the sinusoidal analysis-synthesis system that models and independently modifies the phase contributions of the vocal tract and vocal cord excitation. An important property of the system is its ability to perform time-varying rates of change. Extensions of the method are applied to fixed and time-varying pitch modification of speech. The sine-wave analysis-synthesis system also allows for shape-invariant joint time-scale and pitch modification, and allows for the adjustment of the time scale and pitch according to speech characteristics such as the degree of voicing.

5.

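An Adaptive Synchronized Overlap-Add Algorithm for Speech Time-Scale Modification (Total citations: 3; self-citations: 0; citations by others: 3)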

Du Shoufu (杜守富), Mao Qirong (毛启容), Zhan Yongzhao (詹永照) 《通信学报》 (Journal on Communications) 2005,26(2):136-140

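To address the heavy computational load of the time-domain synchronized overlap-add time-scale modification algorithm, which makes it unsuitable for real-time speech processing, a new SOLA-based time-scale modification algorithm, the adaptive synchronized overlap-add algorithm, is proposed. Exploiting the quasi-periodic nature of speech signals, it adjusts the search interval of the search algorithm on the fly so as to find the most accurate overlap position as quickly as possible, thereby achieving real-time processing. Analysis and testing show that the synthesized speech is of high quality and that the algorithm runs well in real time, so it can be applied effectively in real-time speech processing.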

6.

Research and Implementation of a Voice Modification Technique

He Feng (何峰), Yu Dongwu (于东武), Lin Jiayu (林嘉宇) 《电声技术》 (Audio Engineering) 2007,31(2):54-56,59

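Speech signals are modified using the time-domain pitch-synchronous overlap-add algorithm. The pitch period of the speech signal is first estimated, the signal is then pitch-marked, and the pitch period is altered; finally, speech is synthesized from the signal according to the altered pitch period using the time-domain pitch-synchronous overlap-add algorithm. Experiments show that the modification method gives very good results.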

7.

Unconstrained Pitch Contour Modification Using Instants of Significant Excitation

Krothapalli Sreenivasa Rao 《Circuits, Systems, and Signal Processing》2012,31(6):2133-2152

This paper proposes a flexible method for pitch contour modification using the instants of significant excitation of the vocal tract system during the production of speech. The instants of significant excitation correspond to the instants of glottal closure (epochs) in the case of voiced speech, and to some random excitations like onset of burst in the case of nonvoiced speech. Instants of significant excitation are computed from the Linear Prediction (LP) residual of speech signals by using the property of average group-delay of minimum phase signals. The modification of pitch contour is achieved by manipulating the LP residual with the help of the knowledge of the instants of significant excitation. The modified residual is used to excite the time-varying filter, whose parameters are derived from the original speech signal. Perceptual quality of the synthesized speech is good, and is without any significant distortion. The proposed method is evaluated using waveforms, spectrograms and listening tests. Listening tests are performed on voice conversion application, where the source speaker’s pitch contour is modified by the proposed method according to the target speaker’s pitch contour. The performance of the proposed method is compared with Linear Prediction Pitch Synchronous Overlap and Add (LP-PSOLA) method using listening tests, for the voice conversion application.

8.

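Improving Speech Feature Extraction Performance in Mandarin Digit Speech Recognition (Total citations: 3; self-citations: 0; citations by others: 3)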

Gu Liang (顾良), Liu Runsheng (刘润生) 《电路与系统学报》 (Journal of Circuits and Systems) 1997,2(4):1-6

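In Mandarin digit speech recognition there are three kinds of speech confusion related to the performance of speech feature extraction.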

9.

Channel and source considerations of a bit-rate reduction technique for a possible wireless communications system's performance enhancement

Ilk H.G. Tugac S. 《Wireless Communications, IEEE Transactions on》2005,4(1):93-99

In wireless commercial and military communications systems, where bandwidth is at a premium, robust low-bit-rate speech coders are essential. They operate at fixed bit rates, and those bit rates cannot be altered without major modifications in the vocoder design. A novel approach to vocoders, in order to reduce the bit rate required to transmit the speech signal, is proposed. While traditional low-bit-rate vocoders code the original input speech, the proposed procedure operates on the time-scale modified signal. The proposed method offers any bit rate from 2400 b/s downwards without modifying the principal vocoder structure, which is the new NATO standard, Stanag 4591, Mixed Excitation Linear Prediction (MELP) vocoder. We consider the application of transmitting MELP-encoded speech over noisy communication channels by applying different modulation techniques, after time-scale compression is applied. Three different time-scale modification algorithms have been evaluated, and the waveform similarity overlap and add (WSOLA) algorithm has been selected for time-scale modification purposes. Computer simulation results, both source and channel, are presented in terms of objective speech quality metrics and informal subjective listening tests. Design parameters such as codec complexity and delay are also investigated. Simulation results lead to a possible wireless communications system, whose performance might be enhanced by using the spared bits offered by the procedure.

10.

Interference Suppression Using Principal Subspace Modification in Multichannel Wiener Filter and Its Application to Speech Recognition

Gibak Kim 《ETRI Journal》2010,32(6):921-931

It has been shown that the principal subspace-based multichannel Wiener filter (MWF) provides better performance than the conventional MWF for suppressing interference in the case of a single target source. It can efficiently estimate the target speech component in the principal subspace, which estimates the acoustic transfer function up to a scaling factor. However, as the input signal-to-interference ratio (SIR) becomes lower, larger errors are incurred in the estimation of the acoustic transfer function by the principal subspace method, degrading the performance in interference suppression. In order to alleviate this problem, a principal subspace modification method was proposed in previous work. The principal subspace modification reduces the estimation error of the acoustic transfer function vector at low SIRs. In this work, a frequency-band dependent interpolation technique is further employed for the principal subspace modification. The speech recognition test is also conducted using the Sphinx-4 system and demonstrates the practical usefulness of the proposed method as front-end processing for the speech recognizer in a distant-talking and interferer-present environment.

11.

Non-Native Speech Recognition Based on a Mixed-Model State Modification Algorithm

Zhang Qingqing (张晴晴), Pan Jielin (潘接林), Yan Yonghong (颜永红) 《数字通信》 (Digital Communication) 2009,36(1):33-37

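Recognition performance for non-native speech is low, and the degradation is especially pronounced for speakers who have just begun learning the target language or who have a heavy accent. This paper proposes a new bilingual model modification algorithm to improve recognition of non-native speech. In the algorithm, every state of the baseline acoustic model is modified by auxiliary model states that represent characteristics of the speaker's native language. The paper gives the state modification criterion and compares performance for different numbers of candidate modification states. Compared with a baseline acoustic model already adapted with non-native training data, the acoustic model modified by the bilingual model reduces the phrase error rate by a relative 11.7% while maintaining the real-time factor of recognition.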

12.

Application of Emotion Recognition and Modification for Emotional Telugu Speech Recognition

Vegesna Vishnu Vidyadhara Raju Gurugubelli Krishna Vuppala Anil Kumar 《Mobile Networks and Applications》2019,24(1):193-201

The majority of automatic speech recognition (ASR) systems are trained with neutral speech, and the performance of these systems is affected by the presence of emotional content in the speech. The recognition of these emotions in human speech is considered to be a crucial aspect of human-machine interaction. The combined spectral and differenced prosody features are considered for the task of emotion recognition in the first stage. The task of emotion recognition does not serve the sole purpose of improvement in the performance of an ASR system. Based on the recognized emotions from the input speech, the corresponding adapted emotive ASR model is selected for the evaluation in the second stage. This adapted emotive ASR model is built using the existing neutral speech and emotive speech synthetically generated with a prosody modification method. In this work, the importance of an emotion recognition block at the front-end, along with the emotive speech adaptation to the ASR system models, was studied. The speech samples from the IIIT-H Telugu speech corpus were considered for building the large vocabulary ASR systems. The emotional speech samples from the IITKGP-SESC Telugu corpus were used for the evaluation. The adapted emotive speech models have yielded better performance over the existing neutral speech models.


13.

Time-varying parametric modeling of speech

Mark G. Hall Alan V. Oppenheim Alan S. Willsky 《Signal processing》1983,5(3):267-285

For linear predictive coding (LPC) of speech, the speech waveform is modeled as the output of an all-pole filter. The waveform is divided into many short intervals (10–30 msec) during which the speech signal is assumed to be stationary. For each interval the constant coefficients of the all-pole filter are estimated by linear prediction by minimizing a squared prediction error criterion. This paper investigates a modification of LPC, called time-varying LPC, which can be used to analyze nonstationary speech signals. In this method, each coefficient of the all-pole filter is allowed to be time-varying by assuming it is a linear combination of a set of known time functions. The coefficients of the linear combination of functions are obtained by the same least squares error technique used by the LPC. Methods are developed for measuring and assessing the performance of time-varying LPC and results are given from the time-varying LPC analysis of both synthetic and real speech.
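
The time-varying LPC idea described in this abstract can be written compactly. The following is only a sketch in notation assumed here: the symbols \(p\), \(q\), \(a_{ik}\), \(u_k(n)\) and the sign convention are illustrative choices, not taken from the listing.

```latex
% All-pole model with time-varying coefficients (assumed notation)
x(n) = -\sum_{i=1}^{p} a_i(n)\, x(n-i) + e(n),
\qquad
a_i(n) = \sum_{k=0}^{q} a_{ik}\, u_k(n)

% Squared prediction error minimized over the constants a_{ik}
E = \sum_{n} e^2(n)
  = \sum_{n} \Bigl( x(n) + \sum_{i=1}^{p} \sum_{k=0}^{q} a_{ik}\, u_k(n)\, x(n-i) \Bigr)^{2}
```

Because the prediction error is linear in the constants \(a_{ik}\), minimizing \(E\) is an ordinary linear least-squares problem in \(p(q+1)\) unknowns. With \(q = 0\) and \(u_0(n) = 1\) it reduces to conventional LPC with constant coefficients.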

14.

Real-Time Implementation of Time Domain Harmonic Scaling of Speech for Rate Modification and Coding

《Solid-State Circuits, IEEE Journal of》1983,18(1):10-24

Time domain harmonic scaling (TDHS) has been realized in real time on the Bell Laboratories digital signal processing (DSP) integrated circuit. It is an algorithm that can expand or compress the bandwidth and sampling rate of speech by taking advantage of the pitch structure in the speech signal. As such it is useful in a variety of speech applications including speech coding, speech enhancement, and rate modification. A single DSP can perform compression and a second DSP can perform expansion. Both operations require pitch information to be supplied with the input speech. Included in the system is a real-time pitch/periodicity detector which has also been implemented on a single DSP. Its design is based on a novel modification of the autocorrelation function type pitch detector. This paper presents details of both the TDHS and pitch detector implementation and discusses their performances. In particular in this paper we discuss a 2:1 compression and expansion system that has been used as part of a 9.6 kbit/s speech coder. TDHS was previously thought to require a much larger buffer than the RAM memory available in the DSP. We show that for all the compression/expansion ratios of interest the buffer size needed is twice the maximum pitch period.

15.

Design of a State-Machine-Based Voice Electronic Combination Lock (Total citations: 1; self-citations: 0; citations by others: 1)

Wu Haitao (吴海涛), Liang Yingchun (梁迎春) 《电子工程师》 (Electronic Engineer) 2007,33(4):78-80

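To further reduce the size of existing electronic combination lock systems and make them more flexible, a new voice electronic combination lock was designed. It provides intelligent voice prompts, unlocking, lockout after too many failed attempts, automatic alarm, administrator unlocking, and user password change, and is implemented with an FPGA (field-programmable gate array) chip and the ISD2560 voice chip, giving a small footprint and a simple circuit. Practical experiments and simulation results show that the design has low power consumption, is safe and reliable, is easy to maintain and upgrade, and has broad application prospects.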

16.

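Optimal Parameter Selection for the SOLA Speech Time-Scale Modification Algorithm (Total citations: 1; self-citations: 0; citations by others: 1)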

Zhou Jun (周俊), Chen Yanpu (陈砚圃), Tan Wei (谭薇), Gao Yue (高悦) 《微电子学与计算机》 (Microelectronics & Computer) 2007,24(4):54-58,62

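Building on a study of the synchronized overlap-add (SOLA) algorithm for speech time-scale modification, experiments were carried out on time-scaled speech under different parameter settings, using MOS scoring and the Bark spectral distortion measure for evaluation. The experiments yield optimal ranges for parameters such as the spacing between adjacent frames after synthesis, Sa, and the maximum search length, Kmax. The effect of parameter choice on the quality of the time-scaled speech is also analyzed, which offers some guidance for applying SOLA in various settings.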
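
For readers unfamiliar with SOLA, the role of a frame hop and a maximum search length can be illustrated with a minimal textbook-style sketch. It is not the implementation evaluated in this paper: the function name `sola`, the parameter names `frame_len`, `synthesis_hop` and `k_max`, and the default values are all assumptions made here, and no mapping to the paper's Sa is claimed.

```python
import math


def sola(x, alpha, frame_len=400, synthesis_hop=200, k_max=100):
    """Time-scale x by alpha (>1 stretches, <1 compresses) with basic SOLA.

    Hypothetical parameters: frames of frame_len samples are read every
    synthesis_hop / alpha samples and written roughly every synthesis_hop
    samples; each frame may slide by 0..k_max samples to find the best overlap.
    """
    if k_max > synthesis_hop or frame_len <= synthesis_hop + k_max:
        raise ValueError("need k_max <= synthesis_hop < frame_len - k_max")
    analysis_hop = max(1, round(synthesis_hop / alpha))
    out = list(x[:frame_len])  # the first frame is copied unchanged
    m = 1
    while m * analysis_hop + frame_len <= len(x):
        frame = x[m * analysis_hop : m * analysis_hop + frame_len]
        nominal = m * synthesis_hop
        best_k, best_score = 0, -math.inf
        # Search for the offset that maximizes normalized cross-correlation
        # between the tail of the output and the head of the new frame.
        for k in range(k_max + 1):
            pos = nominal + k
            n = len(out) - pos  # overlap length at this offset
            num = sum(out[pos + i] * frame[i] for i in range(n))
            e_out = sum(out[pos + i] ** 2 for i in range(n))
            e_frm = sum(frame[i] ** 2 for i in range(n))
            den = math.sqrt(e_out * e_frm)
            score = num / den if den > 0 else 0.0
            if score > best_score:
                best_k, best_score = k, score
        pos = nominal + best_k
        n = len(out) - pos
        # Linear cross-fade over the overlap, then append the rest of the frame.
        for i in range(n):
            w = (i + 1) / (n + 1)
            out[pos + i] = (1 - w) * out[pos + i] + w * frame[i]
        out.extend(frame[n:])
        m += 1
    return out


if __name__ == "__main__":
    tone = [math.sin(2 * math.pi * n / 80 + 0.3) for n in range(4000)]
    stretched = sola(tone, 1.5)
    print(len(tone), len(stretched))
```

The inner search is the part whose cost grows with the search length, which is why the choice of search range and hop size affects both quality and computation.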

17.

Vowel-Based Non-uniform Prosody Modification for Emotion Conversion

Hari Krishna Vydana Sudarsana Reddy Kadiri Anil Kumar Vuppala 《Circuits, Systems, and Signal Processing》2016,35(5):1643-1663

The objective of this work is to develop a rule-based emotion conversion method for a better emotional perception. In this work, performance of emotion conversion using the linear modification model is improved by using vowel-based non-uniform prosody modification. In the present approach, attempts were made to integrate features like position and identity for addressing the non-uniformity in prosody generated due to the emotional state of the speaker. We mainly concentrate on parameters such as strength, duration and pitch contour of vowels at different parts of the sentence. The influence of emotions on the above parameters is exploited to convert the speech from neutral emotion to the target emotion. Non-uniform prosody modification factors for emotion conversion are based on the position of vowels in the word, and the position of the word in the sentence. This study is carried out using the Indian Institute of Technology-Simulated Emotion speech corpus. Evaluation of the proposed algorithm is carried out by a subjective listening test. From the listening tests, it is observed that the performance of the proposed approach is better than the existing approaches.

18.

Multitapering and a wavelet variant of MFCC in speech recognition

Ricotti L.P. 《Vision, Image and Signal Processing, IEE Proceedings -》2005,152(1):29-35

In speech recognition (ASR) based on hidden Markov models (HMM) it is necessary to obtain a spectral approximation with a reduced set of representation coefficients. The author introduces to the speech parameterisation scheme multitapering and a modification of the usual mel frequency cepstrum coefficients (MFCC) processing scheme based on wavelets on intervals (wavelet frequency coefficients, WFC). Phoneme recognition performance improvements compared to the MFCC have been experimentally verified on data from a speech database, using multitapering and WFC.

19.

Fast SOLA-Based Time Scale Modification Using Envelope Matching

Peter H.W. Wong Oscar C. Au 《The Journal of VLSI Signal Processing》2003,35(1):75-90

Time scale modification (TSM) of speech and audio signals is very useful in many applications such as MPEG-4 and fast/slow browsing of pre-recorded materials. Synchronized Overlap-and-Add (SOLA) is a time-domain TSM algorithm known to achieve good speech and audio quality. One problem of SOLA is that it requires a large amount of computation in the search of the best matching point between the analysis and synthesis frames. In this paper, we propose two algorithms, envelope-matching TSM (EM-TSM) and modified EM-TSM (MEM-TSM), to simplify the computation with negligible perceptual quality degradation. In EM-TSM, 1-bit sign information is used in the search to substitute the full-precision signal samples used in SOLA. Three additional computation reduction measures, namely simplified formulation, recursive computation and search-point reduction, are applied to achieve significant computation reduction. In MEM-TSM, we reduce the computation of EM-TSM further by introducing zero-crossing point reduction and predictive search skipping. We also improve the quality of the time-scaled signals by introducing multiple-candidate re-examination, and frame-size modification. Simulation results show that the proposed MEM-TSM can achieve computational reduction factors as large as 300 with very good perceptual quality of time-scaled speech and audio.
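
The idea of searching with 1-bit sign information instead of full-precision samples can be illustrated with a short sketch. This is only a guess at the general principle under stated assumptions, not the EM-TSM algorithm itself: the function names `sign_bits` and `best_offset_by_sign` are hypothetical, and none of the paper's further reductions (recursive computation, search-point reduction, and so on) are attempted.

```python
import math


def sign_bits(samples):
    """Reduce each sample to its sign: +1 for non-negative, -1 for negative."""
    return [1 if s >= 0 else -1 for s in samples]


def best_offset_by_sign(reference, candidate, k_max):
    """Return the offset k in 0..k_max at which the signs of
    candidate[k : k + len(reference)] agree best with those of reference.

    The score is a sum of +1/-1 products, so the search needs no
    multiplications of full-precision samples.
    """
    ref = sign_bits(reference)
    cand = sign_bits(candidate)
    n = len(ref)
    if len(cand) < n + k_max:
        raise ValueError("candidate is too short for the requested search range")
    best_k, best_score = 0, -n - 1
    for k in range(k_max + 1):
        score = sum(r * c for r, c in zip(ref, cand[k : k + n]))
        if score > best_score:  # strict '>' keeps the first best offset
            best_k, best_score = k, score
    return best_k


if __name__ == "__main__":
    signal = [math.sin(2 * math.pi * n / 50 + 0.2) for n in range(200)]
    print(best_offset_by_sign(signal[30:130], signal, k_max=40))
```

In a real implementation the sign sequences could be packed into machine words and compared with XOR and bit counting, which is one way such a search can become much cheaper than a full cross-correlation.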

20.

Peak-to-RMS reduction of speech based on a sinusoidal model

Quatieri T.F. McAulay R.J. 《Signal Processing, IEEE Transactions on》1991,39(2):273-288

A sinusoidal-based analysis/synthesis system is used to apply a radar design solution to the problem of dispersing the phase of a speech waveform. Unlike conventional methods of phase dispersion, this solution technique adapts dynamically to the pitch and spectral characteristics of the speech, while maintaining the original spectral envelope. The solution can also be used to drive the sine-wave amplitude modification for amplitude compression, and is coupled to the desired shaping of the speech spectrum. The proposed dispersion solution, when integrated with amplitude compression, results in a significant reduction in the peak-to-RMS (root-mean-square) ratio of the speech waveform with acceptable loss in quality. Application of a real-time prototype sine-wave preprocessor to AM radio broadcasting is described.
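
To make the peak-to-RMS ratio concrete, the sketch below compares a sum of equal-amplitude harmonics with all phases at zero against the same harmonics with a quadratic phase law. The quadratic law is the well-known low-peak-factor construction often attributed to Schroeder and is used here purely as an illustrative assumption; it is not the dispersion solution of this paper, and the function names are hypothetical.

```python
import math


def peak_to_rms(samples):
    """Ratio of the largest absolute sample to the root-mean-square value."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return max(abs(s) for s in samples) / rms


def harmonic_sum(phases, samples_per_period=512):
    """One period of a sum of unit-amplitude cosines at harmonics 1..len(phases)."""
    return [
        sum(
            math.cos(2 * math.pi * k * n / samples_per_period + phi)
            for k, phi in enumerate(phases, start=1)
        )
        for n in range(samples_per_period)
    ]


def quadratic_phases(num_harmonics):
    """Illustrative quadratic phase law: phi_k = -pi * k * (k - 1) / N."""
    return [-math.pi * k * (k - 1) / num_harmonics for k in range(1, num_harmonics + 1)]


if __name__ == "__main__":
    n_harm = 16
    aligned = harmonic_sum([0.0] * n_harm)
    dispersed = harmonic_sum(quadratic_phases(n_harm))
    print("zero phase:", peak_to_rms(aligned))
    print("dispersed: ", peak_to_rms(dispersed))
```

With all phases at zero the harmonics add coherently once per period, so the peak grows with the number of harmonics while the RMS value grows only with its square root. Spreading the phases leaves the magnitude spectrum unchanged but lowers the peak.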