Similar literature
1.
In low rate code-excited linear predictive (CELP) coders, the LPC spectral information is usually quantized and transmitted on a frame-by-frame basis about every 20 to 30 ms. The quality of speech reproduced by a CELP coder can be improved by making spectral transitions as smooth and continuous as possible. One way in which this can be accomplished without increasing the transmission bit rate is to interpolate the LPC spectral parameters between adjacent extraction frames. This, however, usually leads to a dramatic increase in the computations required for the codebook search. The paper presents a new LPC interpolation technique, based on interpolating the impulse response of the LPC synthesis filter. It demonstrates that this method offers a significant complexity reduction for the codebook search over other typical interpolation schemes. Furthermore, the experiments show that the coder using the impulse response for interpolation produces the same speech quality as the coder using the LSP parameters for interpolation, and both these parameter sets are superior to other LPC representations for interpolation.
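The interpolation idea in this abstract can be sketched as follows. This is a minimal illustration, not the paper's scheme: it assumes the synthesis filter is 1/A(z) with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, truncates the impulse response to a fixed length, and uses plain linear interpolation across subframes. The function names and the subframe count are hypothetical.

```python
def impulse_response(lpc, length):
    """First `length` samples of the impulse response of 1/A(z),
    where A(z) = 1 + lpc[0] z^-1 + lpc[1] z^-2 + ... (assumed convention)."""
    h = []
    for i in range(length):
        acc = 1.0 if i == 0 else 0.0  # unit impulse input
        for k, a in enumerate(lpc, start=1):
            if i - k >= 0:
                acc -= a * h[i - k]  # all-pole recursion
        h.append(acc)
    return h


def interpolate(prev, curr, num_subframes):
    """Linearly interpolate between two parameter vectors.
    Returns one vector per subframe; the last one equals `curr`."""
    out = []
    for s in range(1, num_subframes + 1):
        w = s / num_subframes
        out.append([(1.0 - w) * p + w * c for p, c in zip(prev, curr)])
    return out


# Two adjacent frames with hypothetical first-order LPC coefficients.
h_prev = impulse_response([-0.9], 3)
h_curr = impulse_response([-0.5], 3)
subframe_responses = interpolate(h_prev, h_curr, 2)
```

Because the interpolation acts directly on the impulse response, each subframe's response is a weighted sum of two responses computed once per frame, which is one plausible reading of where the reported complexity saving comes from.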

2.
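Li Xiaoming, Bao Changchun, Jia Mao. Acta Electronica Sinica (《电子学报》), 2015, 43(7): 1286-1293
Based on the inherent periodicity of speech and audio signals, this paper constructs a unified analysis/synthesis model suitable for both speech and audio signals, and realizes high-quality layered coding of wideband speech and audio signals at bit rates of 24 kbps and 32 kbps. First, the input signal with a time-varying period is warped into a signal with a fixed period, and a warped matrix is built from the warped periodic signal. Second, the modulated lapped transform (MLT) and the discrete cosine transform (DCT) are applied to the rows and columns of the warped matrix, respectively, to sparsify it. Finally, the elements of the sparse matrix are quantized and coded using band-wise quantization and vector Huffman coding. Subjective and objective test results show that the coding quality of the proposed method for speech, audio, and mixed signals is better than that of the ITU-T G.722.1 and AMR-WB coders at the same bit rates.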

3.
Neurofuzzy systems, the combination of artificial neural networks with fuzzy logic, have become useful in many application domains. However, conventional neurofuzzy models usually need enhanced representation power for applications that require context and state (e.g., speech, time series prediction, control). Some of these applications can be readily modeled as finite state automata. Previously, it was proved that deterministic finite state automata (DFA) can be synthesized by or mapped into recurrent neural networks by directly programming the DFA structure into the weights of the neural network. Based on those results, a synthesis method is proposed for mapping fuzzy finite state automata (FFA) into recurrent neural networks. Furthermore, this mapping is suitable for direct implementation in very large scale integration (VLSI), i.e., the encoding of FFA as a generalization of the encoding of DFA in VLSI systems. The synthesis method requires FFA to undergo a transformation prior to being mapped into recurrent networks. The neurons are provided with an enriched functionality in order to accommodate a fuzzy representation of FFA states. This enriched neuron functionality also permits fuzzy parameters of FFA to be directly represented as parameters of the neural network. We also prove the stability of fuzzy finite state dynamics of the constructed neural networks for finite values of network weights and, through simulations, give empirical validation of the proofs. Hence, we prove various knowledge equivalence representations between neural and fuzzy systems and models of automata.

4.
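Current voice cloning methods based on a pre-trained speaker encoder can synthesize speech with high timbre similarity for speakers seen during training, but for speakers not seen during training, the cloned speech still differs noticeably in timbre from the real speaker. To address this problem, this paper proposes a timbre-consistent speaker feature extraction method. The method uses the state-of-the-art speaker recognition model TitaNet as the basic architecture of the speaker encoder and, based on the prior knowledge that a speaker's timbre remains unchanged across speech segments, introduces a timbre consistency constraint loss for training the speaker encoder. This extracts more accurate speaker timbre features and improves the robustness and generalization of the speaker representation. Finally, the extracted features are applied to the end-to-end speech synthesis model VITS for voice cloning. Experimental results show that the proposed method outperforms the baseline system on two public speech datasets and improves the timbre similarity of cloned speech for unseen speakers.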
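The abstract does not give the form of the timbre consistency loss. One plausible reading, sketched here purely as an assumption, is to penalize the dissimilarity between speaker embeddings extracted from different segments of the same utterance. The function names are hypothetical, and plain Python lists stand in for encoder outputs.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def timbre_consistency_loss(segment_embeddings):
    """Mean of (1 - cosine similarity) over all pairs of segment embeddings
    taken from one utterance. Assumed form, not the paper's definition."""
    n = len(segment_embeddings)
    if n < 2:
        return 0.0  # nothing to compare
    total = 0.0
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += 1.0 - cosine_similarity(segment_embeddings[i],
                                             segment_embeddings[j])
            pairs += 1
    return total / pairs
```

The loss is zero when all segments map to the same direction in embedding space and grows as the segment embeddings drift apart, which matches the prior that timbre stays constant within an utterance.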

5.
Example-based learning, as performed by neural networks and other approximation and classification techniques, is both computationally intensive and I/O intensive, typically involving the optimization of hundreds or thousands of parameters during repeated network evaluations over a database of example vectors. Although there is currently no dominant approach or technique among the various neural networks and learning algorithms, the basic functionality of most neural networks can be conceptually realized as a multidimensional look-up table. While multidimensional look-up tables are clearly impractical due to the exponential memory requirements, we are pursuing an approach using interpolation based only on the sparse data provided by an initial example database. In particular, we have designed prototype VLSI components for searching multidimensional example databases for the X closest examples to an input query as determined by a programmable metric using a massively parallel search. This nearest-neighbor approach can be used directly for classification, or in conjunction with any number of neural network algorithms that exploit local fitting. The hardware removes the I/O bottleneck from the learning task by supplying a reduced set of examples for localized training or classification. Though nearest-neighbor retrieval algorithms have efficient software implementations for low-dimensional databases, exhaustive searching is the only effective approach for handling high-dimensional data. The parallel VLSI hardware we have designed can accelerate the exhaustive search by three orders of magnitude. We believe this special purpose VLSI will have direct application in systems requiring learning functionality and in accelerating learning applications on large, high-dimensional databases.
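A software analogue of the exhaustive search described above can be sketched in a few lines. The metric is passed in as a function to mirror the "programmable metric"; the function names and the example data are hypothetical, and the sketch says nothing about how the VLSI hardware is organized.

```python
import heapq


def k_nearest(examples, query, k, metric):
    """Exhaustively score every example against the query and
    return the k examples with the smallest distance."""
    return heapq.nsmallest(k, examples, key=lambda ex: metric(ex, query))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def weighted_manhattan(weights):
    """Build a metric that scales each dimension by a weight."""
    def metric(a, b):
        return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))
    return metric


examples = [(0, 0), (1, 1), (5, 5), (3, 0)]
query = (0.5, 0)
plain = k_nearest(examples, query, 2, manhattan)
weighted = k_nearest(examples, query, 2, weighted_manhattan((1, 10)))
```

With the plain metric the two closest examples are (0, 0) and (1, 1); weighting the second dimension ten times more heavily replaces (1, 1) with (3, 0), which shows how the choice of metric changes the retrieved set.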

6.
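A 2.4 kbit/s waveform interpolation speech coding algorithm based on the wavelet transform   Cited by: 1 (self-citations: 0, citations by others: 1)
Wang Jing, Kuang Jingming, Xie Xiang. Journal on Communications (《通信学报》), 2007, 28(5): 43-48
Using a biorthogonal wavelet filter bank to perform multi-level decomposition and reconstruction of the characteristic waveforms extracted in waveform interpolation coding, this paper proposes a 2.4 kbit/s characteristic waveform interpolation (CWI) speech coding algorithm based on the wavelet transform (WT). The encoder removes the characteristic waveform alignment operation and performs multi-level decomposition of the magnitude spectrum; the phase spectrum is not transmitted. In view of the compression property of the wavelet transform, only the last-level characteristic waveform magnitude spectrum, which contributes most to human auditory perception, is transmitted. The decoder reconstructs each scale space separately; phase information is combined with the magnitude spectrum at the final stage of reconstruction, and a voicing flag selects fixed or random phase. In addition, according to the time-varying characteristics of speech signals, subframe-based voicing flags select the magnitude spectra to be transmitted and the quantization mode. Subjective R-A/B tests show that the synthesized speech quality of this wavelet-based 2.4 kbit/s coding algorithm is clearly better than that of the standard 2.4 kbit/s MELP coder and the FS1016 4.8 kbit/s CELP coder, and also better than that of the conventional CWI coding framework at 3.8 kbit/s.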
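Multi-level decomposition and reconstruction can be illustrated with the Haar wavelet. This is a stand-in chosen for brevity: the paper uses a biorthogonal filter bank that the abstract does not specify, and the function names here are hypothetical. The input length is assumed to be divisible by 2 raised to the number of levels.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(x):
    """One analysis level: pairwise sums (approximation) and differences (detail)."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_decompose(x, levels):
    """Repeatedly split the approximation; details are stored finest first."""
    approx = list(x)
    details = []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose, starting from the coarsest level."""
    x = list(approx)
    for d in reversed(details):
        rebuilt = []
        for a, b in zip(x, d):
            rebuilt.append((a + b) / SQRT2)
            rebuilt.append((a - b) / SQRT2)
        x = rebuilt
    return x


spectrum = [4.0, 2.0, 6.0, 8.0, 1.0, 3.0, 5.0, 7.0]
coarse, details = haar_decompose(spectrum, 3)
restored = haar_reconstruct(coarse, details)
```

Transmitting only the coarse coefficients and dropping `details` gives a lossy, lower-rate representation, which is the general effect the abstract relies on when it sends only the last-level magnitude spectrum.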

7.
Two very different subband coders are described. The first is a modified dynamic bit-allocation-subband coder (D-SBC) designed for variable rate coding situations and easily adaptable to noisy channel environments. It can operate at rates as low as 12 kb/s and still give good quality speech. The second coder is a 16-kb/s waveform coder, based on a combination of subband coding and vector quantization (VQ-SBC). The key feature of this coder is its short coding delay, which makes it suitable for real-time communication networks. The speech quality of both coders has been enhanced by adaptive postfiltering. The coders have been implemented on a single AT&T DSP32 signal processor.

8.
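Shi Wenhua, Zhang Xiongwei, Zou Xia, Sun Meng. Journal of Signal Processing (《信号处理》), 2019, 35(4): 631-640
To address the problem that conventional neural networks fail to fully exploit correlations in the time-frequency domain, a single-channel speech enhancement method using a deep fully convolutional encoder-decoder neural network is proposed. At the encoder, convolutional layers extract features from the time-frequency representation of the noisy speech stage by stage, suppressing background noise layer by layer while obtaining a high-level feature representation of the target speech. The decoder is structurally symmetric to the encoder; it applies deconvolution and upsampling to the high-level feature representation obtained by the encoder and recovers the target speech layer by layer. Skip connections effectively alleviate the vanishing gradient problem that arises when training very deep networks. This paper introduces skip connections between corresponding encoder and decoder layers to pass encoder feature-map information to the corresponding decoder layer, which helps recover the fine details of the target speech. The effects on speech enhancement performance of two skip connection modes, feature fusion and feature concatenation, and of two training loss functions, L1 and L2, are studied, and experiments verify the effectiveness of the proposed method.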

9.
1 Introduction. The GSM pan-European digital radio system has been designed with a particular TDMA frame structure which enables the use of either full-rate or half-rate channels. Speech and channel coding algorithms for full-rate channels have been independently standardized, leading respectively to the RPE-LTP algorithm and a protection scheme based on a convolutional code with a CRC for error detection. Standardization of a combined speech and channel half-rate codec at a global rate of 11.4 kbps has started under the control of ETSI. The objective is very challeng…

10.
11.
Voice is the preferred method of human communication. Although there have been times when it seemed that the voice communications problem was solved, such as when the PSTN was our primary network or later when digital cellular networks reached maturity, such is not the case today. This paper addresses the challenges and opportunities starting from the basic issues in speech coder design, developing the important speech coding techniques and standards, discussing current and future applications, outlining techniques for evaluating speech coder performance, and identifying research directions. The most prominent speech coding standards are presented and their properties, such as performance, complexity, and coding delay, analyzed. Particular networks and applications for each standard are included. Further, reflecting upon the issues and developments highlighted in this paper, it becomes evident that there is a diverse set of challenges and opportunities for research and innovation in speech coding and voice communications.

12.
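A text classification method based on high utility neural networks   Cited by: 1 (self-citations: 0, citations by others: 1)
Wu Yujia, Li Jing, Song Chengfang, Chang Jun. Acta Electronica Sinica (《电子学报》), 2020, 48(2): 279-284
Existing deep-learning-based text classification methods do not consider the importance of text features or the associations among features, which reduces classification accuracy. To address this problem, this paper proposes a text classification model based on High Utility Neural Networks (HUNN), which can effectively represent the importance of text features and their associations. The Mining High Utility Itemsets (MHUI) algorithm is used to obtain the importance and co-occurrence frequency of each feature in the dataset; the co-occurrence frequency reflects, to some extent, the associations among features. MHUI serves as the mining layer of HUNN and mines the text features with strong importance and association in the data of each category. These features are then used as the input of the neural network, and a convolutional layer further refines higher-level text features with stronger category expressiveness, thereby improving classification accuracy. In experiments on six public benchmark datasets, the proposed algorithm outperforms five baseline algorithms: Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Recurrent Convolutional Neural Networks (RCNN), Fast Text Classifier (FAST), and Hierarchical Attention Networks (HAN).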
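The notion of itemset utility behind high utility itemset mining can be shown with a brute-force sketch. It assumes the standard definition (the utility of an itemset is the sum, over transactions containing all of its items, of count times unit utility) and treats words as items with hypothetical weights. It enumerates candidates naively and is not the MHUI algorithm used in the paper.

```python
from itertools import combinations


def itemset_utility(itemset, transactions, unit_utility):
    """Total utility of `itemset` over the transactions that contain all its items.
    Each transaction maps item -> count."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_utility[item] for item in itemset)
    return total


def high_utility_itemsets(transactions, unit_utility, min_utility, max_size=2):
    """Brute-force search for itemsets whose utility reaches min_utility."""
    items = sorted(unit_utility)
    found = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            u = itemset_utility(combo, transactions, unit_utility)
            if u >= min_utility:
                found[combo] = u
    return found


# Hypothetical documents as word-count transactions, with per-word weights.
docs = [{"good": 2, "film": 1}, {"good": 1, "plot": 3}, {"film": 2}]
weights = {"good": 3, "film": 1, "plot": 2}
selected = high_utility_itemsets(docs, weights, min_utility=7)
```

Pairs such as ("good", "plot") capture co-occurrence, while the weights capture importance, which is the combination the abstract says the mining layer feeds to the network.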

13.
A theoretical method of evaluating degradations of variable rate coders in a multichannel digital speech interpolation (DSI) system is developed. Each of the coder outputs has a variable rate based on the algorithm. The DSI system multiplexes the outputs of these variable rate coders into a fixed bit rate channel. During periods of high activity all active users are served, but at a reduced rate depending on the demand. The degradation due to high activity is shared by all active users. This system avoids speech clipping and "freeze-out" distortion. Theoretical expressions of the system overload probability and the probability of degradation to a particular user in the DSI system are derived. Two types of variable rate coders, namely, a constant quality subband coder and a constant noise subband coder, are chosen and used as examples. Comparisons of the degradations are made between the theoretical results and computer simulated results for the two types of variable rate coders, and close agreement is observed. The theory is applicable to other variable rate coding algorithms as well. In this study, all of the simulations are made at 40 percent speech activity and the average rate of the variable rate coders is close to 16 kbits/s. Objective quality measures indicate that in a system with a trunk size larger than 40, the variable rate coder DSI system can achieve a 2:1 compression with a degradation of less than 1 dB compared to non-DSI variable rate coders. This corresponds to a total gain of 8:1 when compared to 64 kbit/s PCM.
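The abstract does not reproduce the paper's expressions. A common simplifying model, used here only as an assumption, treats each talker as independently active with a fixed probability, so the number of active talkers is binomial and overload occurs when it exceeds the number of full-rate slots. The function names are hypothetical; the 8:1 figure is the arithmetic stated in the abstract.

```python
from math import comb


def overload_probability(num_talkers, activity, full_rate_slots):
    """P(more than `full_rate_slots` talkers are active at once),
    assuming independent talkers (binomial model, not the paper's derivation)."""
    return sum(
        comb(num_talkers, k) * activity ** k * (1.0 - activity) ** (num_talkers - k)
        for k in range(full_rate_slots + 1, num_talkers + 1)
    )


def total_gain(pcm_rate_kbps, coder_rate_kbps, dsi_compression):
    """Overall compression relative to PCM: coder gain times DSI gain."""
    return pcm_rate_kbps / coder_rate_kbps * dsi_compression


# 40 and 80 talkers at 40 percent activity with 2:1 DSI compression.
p40 = overload_probability(40, 0.4, 20)
p80 = overload_probability(80, 0.4, 40)
gain = total_gain(64, 16, 2)  # 4:1 from the coder times 2:1 from DSI
```

Under this model the overload probability falls as the trunk size grows at a fixed 2:1 ratio, which is consistent with the abstract's observation that larger trunks tolerate the compression with little degradation.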

14.
A voice conversion (VC) system was designed based on a Gaussian mixture model (GMM) and a radial basis function (RBF) neural network. As a voice conversion model, an RBF network needs large quantities of training data to improve its performance. For a given utterance, networks trained on different segments of data have different transformation effects. Since trying segment by segment to obtain the best conversion effect is complex, a conversion method was proposed that uses a GMM to gather statistics before training the RBF network to address this problem. The speech transformation and representation using adaptive interpolation of weighted spectrum (STRAIGHT) model is used for accurate extraction of the vocal tract spectrum. Then a GMM is used to classify the numerous spectral parameters. The obtained mean parameters are used to train the RBF network. Experiments show that the soft classification ability of the GMM can promptly reduce and classify the training data while preserving the training effect, which decreases the selection complexity. Compared to conventional RBF network training methods, this method makes the transformation of spectral parameters more effective and improves the quality of converted speech.
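The "soft classification" that a GMM provides is the posterior probability of each component given an observation. The sketch below computes it for a one-dimensional mixture; the paper works with multidimensional spectral parameters, so this is a simplification, and the function names and parameter values are hypothetical.

```python
import math


def gaussian_pdf(x, mean, variance):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def responsibilities(x, weights, means, variances):
    """Posterior probability of each mixture component given x."""
    joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


# Two equally weighted components centred at 0 and 4.
weights, means, variances = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]
near_first = responsibilities(0.0, weights, means, variances)
midpoint = responsibilities(2.0, weights, means, variances)
```

An observation at the midpoint belongs to both components equally, while one at the first mean is assigned almost entirely to the first component. Replacing each cluster of observations by its component mean is one way the training set can be reduced, as the abstract describes.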

15.
High compression rates of speech signals may be achieved by coding schemes based on relevant linguistic segments. A system is described that relies on a diphone recogniser as the coder and on a speech synthesiser reproducing speech starting from a diphone codebook as the decoder. The spoken message is encoded in textual (phoneme labels) plus prosody representation. This speech coding technique may be used for voice mail or phone communication over low bit rate channels.

16.
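Based on the enhanced mixed excitation linear prediction (MELPe) model, a 600 bps low-rate speech coder is designed. While retaining the characteristics of the MELPe algorithm, the coder exploits the inter-frame redundancy of adjacent frames by grouping three consecutive frames into a superframe and jointly quantizing the superframe using multi-mode prediction and multi-stage matrix quantization. For the different superframe modes, mode transitions between adjacent superframes are handled through prediction coefficients, realizing vector quantization of the line spectrum pair parameters (LSF). Finally, the pitch period and gain parameters are jointly quantized to further improve quantization efficiency, completing the design of a speech coder that still delivers good synthesized speech quality at 600 bps.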

17.
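A novel very low bit rate speech coding system based on the Gaussian mixture model (GMM) is proposed. The coder parameterizes speech by fitting a GMM to the short-time speech spectral envelope. In encoding, the speech is preprocessed, framed, and windowed, then analyzed by FFT to obtain the signal spectrum of each frame and a smoothed spectral envelope. A GMM is then fitted to the spectral envelope, and the speech spectrum is represented by the GMM parameters (means, variances, weights). Because the GMM has few parameters, the bit rate can be very low. In decoding, the spectral envelope is generated by inverting the encoding operations; voiced signals are synthesized using a sinusoidal model, and unvoiced signals are synthesized by IFFT. Simulation results show that the coder still yields decoded speech of satisfactory quality when the transmission bit rate is reduced to 2.35 kb/s.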

18.
In this paper we propose to develop novel techniques for signal/image decomposition and reconstruction based on the B-spline mathematical functions. Our proposed B-spline based multiscale/resolution representation is based upon a perfect reconstruction analysis/synthesis point of view. Our proposed B-spline analysis can be utilized for different signal/imaging applications such as compression, prediction, and denoising. We also present a straightforward, computationally efficient approach for B-spline basis calculations that is based upon matrix multiplication and avoids any extra generated basis. Then we propose a novel technique for enhanced B-spline based compression for different image coders by preprocessing the image prior to the decomposition stage in any image coder. This would reduce the amount of data correlation and would allow for more compression, as will be shown with our correlation metric. Extensive simulations that have been carried out on the well-known SPIHT image coder with and without the proposed correlation removal methodology are presented. Finally, we utilized our proposed B-spline basis for denoising and estimation applications. Illustrative results that demonstrate the efficiency of the proposed approaches are presented.
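For readers unfamiliar with B-spline bases, the textbook Cox-de Boor recursion is sketched below. It is not the matrix-multiplication method the abstract refers to, whose details are not given; the function name and the uniform knot vector are assumptions for illustration.

```python
def bspline_basis(i, order, t, knots):
    """Value at t of the i-th B-spline basis function of the given order
    (degree = order - 1) on the knot vector `knots`, via Cox-de Boor."""
    if order == 1:
        # Piecewise-constant base case on the half-open knot span.
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = 0.0
    span = knots[i + order - 1] - knots[i]
    if span > 0:
        left = (t - knots[i]) / span * bspline_basis(i, order - 1, t, knots)
    right = 0.0
    span = knots[i + order] - knots[i + 1]
    if span > 0:
        right = (knots[i + order] - t) / span * bspline_basis(i + 1, order - 1, t, knots)
    return left + right


# Cubic (order 4) basis on uniform knots 0..7 gives four basis functions.
knots = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
values = [bspline_basis(i, 4, 3.5, knots) for i in range(4)]
```

On the span [3, 4) all four cubic basis functions are non-zero and their values sum to one, the partition-of-unity property that makes B-splines convenient for signal representation.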

19.
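A waveform interpolation speech coding algorithm based on the discrete cosine transform   Cited by: 2 (self-citations: 0, citations by others: 2)
Liu Jingyu, Bao Changchun, Li Ruwei. Acta Electronica Sinica (《电子学报》), 2009, 37(7): 1599-1605
To address the characteristic waveform decomposition problem in waveform interpolation (WI) speech coding, this paper first proposes a characteristic waveform decomposition method based on the discrete cosine transform (DCT), which avoids the complex characteristic waveform alignment operation. Second, to address the phase reconstruction problem in WI, an unvoiced/voiced phase decision and a voiced phase classification method are proposed, which improve the quality of the reconstructed speech. Finally, DCT-WI vocoders at 2.0 kbps and 1.6 kbps are constructed. Subjective MOS scores show that the 2.0 kbps DCT-WI vocoder is of better quality than the 2.4 kbps MELP vocoder, and the 1.6 kbps DCT-WI vocoder also achieves good perceptual quality.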
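The transform at the core of this method can be sketched with an orthonormal DCT-II and its inverse. How the paper applies the DCT to characteristic waveforms is not detailed in the abstract, so the code below only demonstrates the transform pair itself; the function names are hypothetical.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a real sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n)))
    return out


def idct(c):
    """Inverse of `dct` (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        acc = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            acc += scale * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(acc)
    return out


signal = [1.0, 2.0, 3.0, 4.0]
coeffs = dct(signal)
restored = idct(coeffs)
```

A slowly varying input concentrates its energy in the low-order coefficients, so keeping the first few and discarding the rest separates the slowly evolving part of a sequence from the rapidly evolving part without any alignment step.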

20.
This paper presents several strategies to improve the performance of very low bit rate speech coders and describes a speech codec that incorporates these strategies and operates at an average bit rate of 1.2 kb/s. The encoding algorithm is based on several improvements in a mixed multiband excitation (MMBE) linear predictive coding (LPC) structure. A switched-predictive vector quantiser technique that outperforms previously reported schemes is adopted to encode the LSF parameters. Spectral and sound specific low rate models are used in order to achieve high quality speech at low rates. An MMBE approach with three sub-bands is employed to encode voiced frames, while fricatives and stops modelling and synthesis techniques are used for unvoiced frames. This strategy is shown to provide good quality synthesised speech, at a bit rate of only 0.4 kb/s for unvoiced frames. To reduce coding noise and improve decoded speech, a spectral envelope restoration combined with noise reduction (SERNR) postfilter is used. The contributions of the techniques described in this paper are separately assessed and then combined in the design of a low bit rate codec that is evaluated against the North American Mixed Excitation Linear Prediction (MELP) coder. The performance assessment is carried out in terms of the spectral distortion of LSF quantisation, mean opinion score (MOS), A/B comparison tests and the ITU-T P.862 perceptual evaluation of speech quality (PESQ) standard. Assessment results show that the improved methods for LSF quantisation, sound specific modelling and synthesis and the new postfiltering approach can significantly outperform previously reported techniques. Further results also indicate that a system combining the proposed improvements and operating at 1.2 kb/s is comparable to, and slightly outperforms, a MELP coder operating at 2.4 kb/s. For tandem connection situations, the proposed system is clearly superior to the MELP coder.
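Spectral distortion, one of the measures named above, is commonly computed as the root-mean-square difference between two log power spectra in dB. The sketch below uses that common definition on sampled power spectra; the abstract does not state the exact variant or frequency range used, and the function name is hypothetical.

```python
import math


def spectral_distortion_db(power_ref, power_test):
    """RMS difference in dB between two power spectra sampled at the
    same frequencies. All power values must be positive."""
    diffs = [10.0 * math.log10(p) - 10.0 * math.log10(q)
             for p, q in zip(power_ref, power_test)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical unquantised and quantised envelope samples.
reference = [1.0, 2.0, 4.0]
quantised = [1.1, 1.9, 4.2]
sd = spectral_distortion_db(reference, quantised)
```

Identical spectra give 0 dB, and scaling every bin by a factor of ten gives exactly 10 dB, which makes the measure easy to sanity-check before applying it to real LSF-derived envelopes.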
