Similar Literature
1.
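This paper proposes a speaker adaptation method based on a maximum-likelihood variable subspace. In the training stage, principal component analysis is applied to the speaker-dependent model parameters of the training set to obtain a set of speaker basis vectors. In the adaptation stage, the subset of basis vectors most relevant to the current speaker is selected under the maximum-likelihood criterion; the new speaker-dependent model is then constrained to the speaker subspace spanned by these basis vectors, and adaptation is performed by solving for the coefficient of each basis vector. Unlike classical subspace-based speaker adaptation methods, the speaker subspace here is selected dynamically during adaptation, so fewer parameters need to be estimated and more robust results are obtained with small amounts of adaptation data. In continuous speech recognition adaptation experiments on the Microsoft corpus, given very little adaptation data (less than 5 s), the proposed method outperforms both the classical eigenvoice adaptation method and the maximum likelihood linear regression (MLLR) method under supervised and unsupervised conditions.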
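
The selection step in this abstract can be illustrated with a minimal sketch. It rests on simplifying assumptions that are not taken from the paper: the basis vectors are orthonormal and the model is a single supervector with identity covariance, so the maximum-likelihood coefficient of each basis vector reduces to a projection and its likelihood gain is proportional to the squared coefficient. The names `select_subspace` and `k` are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def select_subspace(bases, mean_si, observed, k):
    """Pick the k basis vectors with the largest likelihood gain and
    return (chosen indices, their coefficients, adapted supervector)."""
    residual = [o - m for o, m in zip(observed, mean_si)]
    # Assumption: orthonormal bases and identity covariance, so the ML
    # coefficient of each basis vector is its projection onto the residual.
    coeffs = [dot(b, residual) for b in bases]
    # Under the same assumption the log-likelihood gain of basis i is
    # proportional to coeffs[i] ** 2, so rank the bases by that value.
    ranked = sorted(range(len(bases)), key=lambda i: coeffs[i] ** 2, reverse=True)
    chosen = sorted(ranked[:k])
    adapted = list(mean_si)
    for i in chosen:
        adapted = [a + coeffs[i] * b for a, b in zip(adapted, bases[i])]
    return chosen, [coeffs[i] for i in chosen], adapted

# Toy example: three basis vectors, keep the two most relevant ones.
bases = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
chosen, coeffs, adapted = select_subspace(bases, [0.0, 0.0, 0.0], [0.1, 2.0, -1.5], k=2)
```

With real HMM-GMM statistics the coefficients would come from solving a weighted least-squares system rather than from plain projections; the sketch only shows why a dynamically chosen subset needs fewer estimated parameters than a fixed full subspace.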

2.
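Zhao Li, Zou Cairong, Wu Zhenyang. 《电子学报》 (Acta Electronica Sinica), 2002, 30(7): 967-969
This paper proposes a new speech recognition method that combines the advantages of VQ, HMM, and an unsupervised speaker adaptation algorithm. An FVQ/HMM is built by replacing the output probability of the conventional HMM in each state with the vector quantization error, and an unsupervised adaptation algorithm based on fuzzy vector quantization modifies the codewords of each FVQ/HMM state, thereby adapting the codebook to an unknown speaker. In speaker-independent Mandarin digit recognition experiments (isolated and connected digits), the combined method is compared with CHMM-based adaptation and recognition; the results show that its adaptation and recognition performance is better than that of the CHMM-based method.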

3.
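To convert the speech characteristics of a source speaker so that the result sounds like a target speaker, this paper proposes a same-language speaker conversion algorithm. The algorithm has two parts. The first converts the spectral envelope with a Gaussian mixture model, which is trained with an improved method that removes the effect of inaccurate time alignment of the speech data. The second predicts the residual signal using a Gaussian mixture model classifier and a residual codebook. The converted speech is also post-processed to improve its naturalness. Informal listening tests show that, after training on a short amount of speech data, the algorithm can perform speaker conversion: the converted speech clearly carries the characteristics of the target speaker and has high intelligibility.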

4.
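A new VQ codebook training method for discrete-HMM speaker identification systems is proposed. Its training criterion is to make the utilization of the codewords in the codebook as equal as possible. Codebooks trained with the new method are compared with codebooks trained with the LBG algorithm. Experiments show that, in a discrete-HMM speaker identification system, codebooks trained with the new method outperform those trained with LBG; in the text-independent case in particular, the correct identification rate of the system improves significantly.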

5.
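A speaker conversion system based on Gaussian mixture models and residual prediction   Total citations: 1 (self-citations: 1, citations by others: 0)
Speaker conversion transforms the speech characteristics of a source speaker into those of a target speaker so that the result sounds like the target speaker. The proposed system has two parts. The first converts the spectral envelope with a Gaussian mixture model, trained on time-aligned speech data from the source and target speakers. The second predicts the residual signal using a classifier and a residual codebook. The system improves on existing speaker conversion systems: speakers no longer need to imitate another person's intonation, and it surpasses existing systems on some performance measures.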
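
The abstract does not state the mapping itself. For orientation, the conversion function commonly used in GMM-based spectral envelope conversion with time-aligned source/target pairs is shown below; treating it as the form used here is an assumption. In it, $x$ is a source spectral vector, $M$ the number of mixture components, $\alpha_i$ the mixture weights, $\mu_i^{x}$ and $\mu_i^{y}$ the source and target means, and $\Sigma_i^{xx}$, $\Sigma_i^{yx}$ the source covariance and the target-source cross-covariance of component $i$.

```latex
F(x) = \sum_{i=1}^{M} p_i(x)\left[\mu_i^{y} + \Sigma_i^{yx}\left(\Sigma_i^{xx}\right)^{-1}\left(x - \mu_i^{x}\right)\right],
\qquad
p_i(x) = \frac{\alpha_i\,\mathcal{N}\!\left(x;\,\mu_i^{x},\,\Sigma_i^{xx}\right)}
              {\sum_{j=1}^{M}\alpha_j\,\mathcal{N}\!\left(x;\,\mu_j^{x},\,\Sigma_j^{xx}\right)}
```

Each component contributes a local linear regression from source to target, weighted by the posterior probability $p_i(x)$ that the source vector belongs to that component.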

6.
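Current voice cloning methods based on a pre-trained speaker encoder can synthesize speech with high timbre similarity for speakers seen during training, but for unseen speakers the cloned speech still differs noticeably in timbre from the real speaker. To address this, the paper proposes a timbre-consistent speaker feature extraction method. It uses TitaNet, a state-of-the-art speaker recognition model, as the backbone of the speaker encoder and, based on the prior knowledge that a speaker's timbre stays constant across speech segments, introduces a timbre consistency constraint loss into speaker encoder training. This extracts more accurate speaker timbre features and improves the robustness and generalization of the speaker representation. The extracted features are then used with the end-to-end speech synthesis model VITS for voice cloning. Experimental results show that the proposed method outperforms the baseline system on two public speech datasets and improves the timbre similarity of cloned speech for unseen speakers.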

7.
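G.729 recovers relatively high-quality speech by passing an excitation built from a fixed codebook and an adaptive codebook through a synthesis filter. Because the algorithm is complex and time-consuming, it cannot run in real time on a DSP; the fixed codebook search and the adaptive codebook search are the most complex modules. The codebook search methods are described and improved so that complexity is greatly reduced. The results show no obvious loss in speech quality.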

8.
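Song Peng, Wang Hao, Zhao Li. 《信号处理》 (Journal of Signal Processing), 2013, 29(10): 1294-1299
For voice conversion with non-parallel speech corpora, an effective voice conversion method based on model adaptation is proposed. First, the source and target speaker models are each obtained from a background model by maximum a posteriori (MAP) adaptation. Next, a conversion function for spectral features is trained from the mean vectors of the speaker models. This is then combined with the conventional INCA conversion method to give a model-adaptation-based INCA voice conversion method, which effectively converts the spectral features of the source speaker into those of the target speaker. The method is evaluated with objective tests and subjective listening experiments. The results show that, compared with the INCA voice conversion method, the proposed method achieves lower cepstral distortion, higher perceived speech quality, and a stronger preference toward the target speaker; it also comes closer to the performance of the conventional Gaussian mixture model (GMM) voice conversion method based on parallel corpora.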
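
The MAP step in this abstract can be sketched with the widely used relevance-MAP update of a Gaussian mean. The abstract does not give its exact update rule, so the formula, the function name `map_adapt_mean`, and the relevance factor `tau` are assumptions made for illustration.

```python
def map_adapt_mean(prior_mean, frames, posteriors, tau=16.0):
    """Relevance-MAP update of one Gaussian mean of the background model.

    frames:     list of feature vectors from the speaker
    posteriors: occupation probability of this component for each frame
    tau:        relevance factor (assumed value, not from the abstract)
    """
    n = sum(posteriors)                      # soft count for this component
    dim = len(prior_mean)
    weighted_sum = [sum(g * x[d] for g, x in zip(posteriors, frames))
                    for d in range(dim)]
    # Interpolate between the data mean and the prior mean: with little
    # data the result stays near the background model.
    return [(weighted_sum[d] + tau * prior_mean[d]) / (n + tau)
            for d in range(dim)]

# Two frames fully assigned to the component, prior mean at the origin.
adapted = map_adapt_mean([0.0, 0.0], [[2.0, 4.0], [2.0, 4.0]], [1.0, 1.0], tau=2.0)
```

Because the source and target models are adapted from the same background model, their components stay in correspondence, which is what makes it possible to train a conversion function from pairs of mean vectors without parallel utterances.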

9.
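Research on speech separation and speaker tracking with an intelligent microphone array   Total citations: 1 (self-citations: 1, citations by others: 0)
Du Jiang, Zhu Ke. 《电子学报》 (Acta Electronica Sinica), 2005, 33(2): 382-384
This paper presents a new microphone-array technique for speech separation and speaker tracking. The array forms a beam pointed at the speaker of interest to enhance the signal, places nulls in other directions to suppress other speakers and noise, and uses an adaptive algorithm to track changes in the speaker's direction. Simulations verify the effectiveness of the technique. Unlike conventional adaptive algorithms, the algorithm requires no training sequence, which is a significant advantage.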
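
The combination of a beam toward the desired speaker and a null toward an interferer can be illustrated with a minimal narrowband sketch. The two-microphone array, the half-wavelength spacing, and all function names are assumptions chosen for illustration; the adaptive tracking described in the abstract is not covered.

```python
import cmath
import math

def steering_phase(theta_deg):
    """Inter-element phase term for a 2-mic array at half-wavelength spacing."""
    return cmath.exp(-1j * math.pi * math.sin(math.radians(theta_deg)))

def null_steering_weights(look_deg, null_deg):
    """Solve c0 + c1 * e(look) = 1 and c0 + c1 * e(null) = 0 for (c0, c1)."""
    e_look, e_null = steering_phase(look_deg), steering_phase(null_deg)
    c1 = 1.0 / (e_look - e_null)
    c0 = -c1 * e_null
    return c0, c1

def response(weights, theta_deg):
    """Complex array response toward direction theta_deg."""
    c0, c1 = weights
    return c0 + c1 * steering_phase(theta_deg)

# Unit gain toward the speaker at 0 degrees, a null toward 40 degrees.
weights = null_steering_weights(look_deg=0.0, null_deg=40.0)
```

A tracking system would re-estimate the look and null directions over time and recompute the weights; with more microphones, more nulls can be placed at once.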

10.
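Digital mobile communication has developed rapidly in recent years, and low-bit-rate speech coding is one of its key technologies. This paper proposes a pulse-adaptive codebook-excited coding scheme. The scheme combines the regular-pulse excitation algorithm with the codebook excitation algorithm, adopts a new and efficient codebook structure that reduces the codeword dimension and improves codeword efficiency, and uses a pulse-adaptive search for the optimal excitation, which avoids a full search and greatly reduces the computation required by the codebook search. Computer simulations show that, compared with the coding algorithm published by the US EIA, the scheme requires less computation and less storage with almost no loss in synthesized speech quality. At 8 kb/s, PACELP synthesizes speech of satisfactory quality, and these properties allow the PACELP algorithm to be implemented on a single TMS320C25.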

11.
This paper presents a new technique for designing a jointly optimized residual vector quantizer (RVQ). In the conventional stage-by-stage design procedure, each stage codebook is optimized for that particular stage's distortion and does not consider the distortion from the subsequent stages. However, the overall performance can be improved if each stage codebook is optimized by minimizing the distortion from the subsequent stage quantizers as well as the distortion from the previous stage quantizers. This can only be achieved when the stage codebooks are jointly designed for each other. In this paper, the proposed codebook design procedure is based on a multilayer competitive neural network where each layer of this network represents one stage of the RVQ. The weights connecting these layers form the corresponding stage codebooks of the RVQ. The joint design problem of the RVQ's codebooks (weights of the multilayer competitive neural network) is formulated as a nonlinearly constrained optimization task which is based on a Lagrangian error function. This Lagrangian error function includes all the constraints that are imposed by the joint optimization of the codebooks. The proposed procedure seeks a locally optimal solution by iteratively solving the equations for this Lagrangian error function. Simulation results show an improvement in the performance of an RVQ designed using the proposed joint optimization technique compared with the stage-by-stage design, where both the generalized Lloyd algorithm (GLA) and the Kohonen learning algorithm (KLA) were used to design each stage codebook independently, and compared with the conventional joint-optimization technique.
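
To make the stage structure concrete, here is a minimal sketch of how a residual vector quantizer encodes and decodes once its stage codebooks exist. It shows only the greedy stage-by-stage encoder, not the joint codebook design proposed in the paper; the function names and the toy codebooks are hypothetical.

```python
def nearest(codebook, vec):
    """Index of the codeword with the smallest squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)))

def rvq_encode(stages, vec):
    """Quantize vec stage by stage; each stage codes the previous residual."""
    indices, residual = [], list(vec)
    for codebook in stages:
        i = nearest(codebook, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices, residual

def rvq_decode(stages, indices):
    """Reconstruction is the sum of the selected codeword from every stage."""
    out = [0.0] * len(stages[0][0])
    for codebook, i in zip(stages, indices):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out

# Toy two-stage quantizer: a coarse stage followed by a refinement stage.
stages = [
    [[0.0, 0.0], [4.0, 4.0]],
    [[-1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
]
indices, residual = rvq_encode(stages, [5.0, 4.25])
reconstruction = rvq_decode(stages, indices)
```

The final residual is the overall quantization error, which is the quantity a joint design tries to minimize across all stages rather than one stage at a time.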

12.
Zhu C., Po L.M. Electronics Letters, 1996, 32(19): 1757-1758
An effective competitive learning algorithm based on the partial distortion theorem is proposed for optimal codebook design. Compared with some representative learning algorithms for codebook design, the proposed algorithm has consistently shown the best performance for designing codebooks of different sizes, especially large codebooks.

13.
Constrained-storage vector quantization with a universal codebook   Total citations: 1 (self-citations: 0, citations by others: 1)
Many image compression techniques require the quantization of multiple vector sources with significantly different distributions. With vector quantization (VQ), these sources are optimally quantized using separate codebooks, which may collectively require an enormous memory space. Since storage is limited in most applications, a convenient way to gracefully trade between performance and storage is needed. Earlier work addressed this problem by clustering the multiple sources into a small number of source groups, where each group shares a codebook. We propose a new solution based on a size-limited universal codebook that can be viewed as the union of overlapping source codebooks. This framework allows each source codebook to consist of any desired subset of the universal code vectors and provides greater design flexibility, which improves the storage-constrained performance. A key feature of this approach is that no two sources need be encoded at the same rate. An additional advantage of the proposed method is its close relation to universal, adaptive, finite-state and classified quantization. Necessary conditions for optimality of the universal codebook and the extracted source codebooks are derived. An iterative design algorithm is introduced to obtain a solution satisfying these conditions. Possible applications of the proposed technique are enumerated, and its effectiveness is illustrated for coding of images using finite-state vector quantization, multistage vector quantization, and tree-structured vector quantization.
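
The idea of source codebooks as overlapping subsets of one universal codebook can be sketched as follows. The data, the source names, and the function `encode_with_subset` are hypothetical; the sketch covers only encoding with given subsets, not the iterative design algorithm.

```python
def encode_with_subset(universal, subset, vec):
    """Return the universal-codebook index of the best codeword among the
    indices this source is allowed to use."""
    return min(subset,
               key=lambda i: sum((c - v) ** 2 for c, v in zip(universal[i], vec)))

# One stored codebook shared by all sources.
universal = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [5.0, 5.0]]

# Each source codebook is a subset of universal indices. The subsets overlap
# (index 1) and have different sizes, so the sources are coded at different rates.
source_subsets = {"smooth": [0, 1, 2], "edge": [1, 3]}

idx_smooth = encode_with_subset(universal, source_subsets["smooth"], [2.1, 2.1])
idx_edge = encode_with_subset(universal, source_subsets["edge"], [2.1, 2.1])
```

Only four code vectors are stored here, whereas separate codebooks for the two sources would need five; the saving grows with the amount of overlap.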

14.
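Considering the constraints on array size and form factor in large-scale multiple input multiple output (MIMO) systems, this paper proposes a massive MIMO codebook design method for base stations equipped with cross-polarized planar antenna arrays. The method takes into account both the diagonal structure of the cross-polarized channel and the correlation between adjacent antennas. It first designs a codebook matched to a cross-polarized linear array, then extends it using the correlation between antennas in the vertical dimension, and finally produces a codebook matched to the cross-polarized planar array. Simulation results show that the codebook design method clearly improves the transmission rate and bit error rate performance of massive MIMO systems.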

15.
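Existing SCMA (sparse code multiple access) codebooks are designed by combining a high-dimensional complex constellation with mapping matrices. The design of the high-dimensional complex constellation is complicated, and the minimum Euclidean distance between constellation points on any given time-frequency resource is hard to control. To address these problems, a codebook design method based on time-frequency resource constellations is proposed. A two-dimensional lattice constellation is designed first, and user-specific codebooks are then obtained through constellation optimization and spreading. The proposed method yields a composite constellation with maximum shaping gain and maximized minimum Euclidean distance, and it also lets SCMA system performance improve as the codebook dimension increases. Simulation results show that, over the Gaussian channel, the proposed codebook effectively improves system performance compared with existing codebooks.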

16.
In massive MIMO systems for 5G networks, precoding is one of the key technologies. A low-complexity search algorithm is proposed for the user-side codebook search of the discrete Fourier transform (DFT) rotation codebook. In this algorithm, all horizontal and vertical codebooks are grouped separately, exploiting the property that precoding vectors in the same column of the DFT rotation codebooks have the smallest chordal distance and that a smaller chordal distance implies stronger correlation. The optimal horizontal and vertical codewords with maximum channel gain are then obtained to form the 3D precoding codebook. Simulation results indicate that the search complexity of the proposed method is significantly reduced while system performance is maintained, and that this advantage grows as the number of antennas increases.
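
For reference, the exhaustive baseline that such a grouped search tries to speed up can be sketched in a few lines: build a DFT codebook and pick the codeword with maximum channel gain. This is not the proposed grouped algorithm; the one-dimensional array, the codebook size, and the function names are assumptions made for illustration.

```python
import cmath
import math

def dft_codebook(n_ant, n_beams):
    """n_beams unit-norm DFT steering vectors for an n_ant-element array."""
    scale = 1.0 / math.sqrt(n_ant)
    return [[scale * cmath.exp(2j * math.pi * m * k / n_beams)
             for m in range(n_ant)]
            for k in range(n_beams)]

def channel_gain(h, w):
    """|h^H w|^2 for channel vector h and precoding vector w."""
    return abs(sum(hm.conjugate() * wm for hm, wm in zip(h, w))) ** 2

def best_codeword(h, codebook):
    """Exhaustive search: evaluate every codeword, keep the largest gain."""
    return max(range(len(codebook)), key=lambda k: channel_gain(h, codebook[k]))

codebook = dft_codebook(n_ant=4, n_beams=8)
h = [2.0 * w for w in codebook[3]]   # toy channel aligned with beam 3
best = best_codeword(h, codebook)
```

The exhaustive search costs one inner product per codeword. In a 3D setting the same search would be run over horizontal and vertical codebooks, which is where grouping correlated codewords can cut the number of evaluations.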

17.
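The eigenphone speaker adaptation algorithm achieves good results when adaptation data is plentiful, but it overfits severely when adaptation data is scarce. To overcome this, the paper proposes a speaker adaptation algorithm based on an eigenphone speaker subspace. First, the basic principle of eigenphone speaker adaptation in a hidden Markov model-Gaussian mixture model (HMM-GMM) speech recognition system is presented. Second, a speaker subspace is introduced to model the correlation between the eigenphone matrices of different speakers. A new eigenphone speaker subspace adaptation algorithm is then obtained by estimating speaker-dependent coordinate vectors. Finally, the eigenphone speaker subspace adaptation algorithm is compared with conventional speaker subspace adaptation algorithms. Mandarin continuous speech recognition experiments on the Microsoft corpus show that, compared with eigenphone speaker adaptation, the algorithm greatly improves performance when adaptation data is extremely limited and largely overcomes overfitting. Compared with eigenvoice adaptation, it achieves lower space complexity at the cost of a small performance loss, which makes it more practical.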

18.
A novel framework of an online unsupervised learning algorithm is presented to flexibly adapt the existing speaker-independent hidden Markov models (HMMs) to nonstationary environments induced by varying speakers, transmission channels, ambient noises, etc. The quasi-Bayes (QB) estimate is applied to incrementally obtain the word sequence and adaptation parameters for adjusting HMMs when a block of unlabelled data is enrolled. The underlying statistics of a nonstationary environment can be successively traced according to the newest enrolment data. To improve the QB estimate, adaptive initial hyperparameters are employed in the beginning session of online learning. These hyperparameters are estimated from a cluster of training speakers closest to the test environment. Additionally, a selection process is developed to select reliable parameters from a list of candidates for unsupervised learning. A set of reliability assessment criteria is explored for selection. In a series of speaker adaptation experiments, the effectiveness of the proposed method is confirmed, and it is found that using the adaptive initial hyperparameters in online learning and the multiple assessments in parameter selection can improve the recognition performance.

19.
Speaker adaptation techniques are generally used to reduce speaker differences in speech recognition. In this work, we focus on the features fitted to a linear regression-based speaker adaptation. These are obtained by feature transformation based on independent component analysis (ICA), and the feature transformation matrices are estimated from the training data and adaptation data. Since the adaptation data is not sufficient to reliably estimate the ICA-based feature transformation matrix, it is necessary to adjust the ICA-based feature transformation matrix estimated from a new speaker utterance. To cope with this problem, we propose a smoothing method through a linear interpolation between the speaker-independent (SI) feature transformation matrix and the speaker-dependent (SD) feature transformation matrix. From our experiments, we observed that the proposed method is more effective in the mismatched case. In the mismatched case, the adaptation performance is improved because the smoothed feature transformation matrix makes speaker adaptation using noisy speech more robust.
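
The smoothing step is a plain linear interpolation between two matrices, which can be sketched as follows. The weight name `lam`, its value, and the toy matrices are assumptions; the abstract does not say how the interpolation weight is chosen.

```python
def smooth_transform(w_si, w_sd, lam):
    """Element-wise lam * W_SD + (1 - lam) * W_SI.

    w_si: speaker-independent feature transformation matrix (list of rows)
    w_sd: speaker-dependent matrix estimated from the adaptation data
    lam:  interpolation weight in [0, 1]; small values trust the SI matrix
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [[lam * sd + (1.0 - lam) * si for si, sd in zip(row_si, row_sd)]
            for row_si, row_sd in zip(w_si, w_sd)]

w_si = [[1.0, 0.0], [0.0, 1.0]]
w_sd = [[3.0, 2.0], [-2.0, 5.0]]
smoothed = smooth_transform(w_si, w_sd, lam=0.25)
```

With very little adaptation data the SD estimate is noisy, so a small `lam` keeps the result close to the reliable SI matrix; more data would justify a larger `lam`.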

20.
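This paper proposes a speaker recognition method that combines fuzzy C-means clustering with vector quantization. The algorithm uses 12th-order LPC (linear predictive coding) cepstral coefficients extracted from the speech signal as the 12 attributes of each sample to be classified. Vector quantization is first used to obtain, for each speaker, a codebook that characterizes the feature parameters and serves as the cluster centers of the fuzzy clustering algorithm. The feature vectors to be recognized are then clustered with the obtained codebooks as cluster centers to perform recognition. The algorithm uses few feature parameters and is computationally simple, yet its recognition rate is higher than that of vector quantization alone.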
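
One way to read this recognition step is sketched below: the codewords of all speakers are pooled as fixed cluster centers, each test vector receives the standard fuzzy C-means memberships to those centers, and the speaker whose codewords collect the most membership wins. The scoring rule, the fuzzifier `m = 2`, and the 2-D toy data (the paper uses 12 LPC cepstral coefficients) are assumptions, not details from the abstract.

```python
def fcm_memberships(vec, centers, m=2.0):
    """Standard fuzzy C-means memberships of vec to each fixed center."""
    d2 = [sum((c - v) ** 2 for c, v in zip(center, vec)) for center in centers]
    if any(d == 0.0 for d in d2):            # vec coincides with a center
        hits = [1.0 if d == 0.0 else 0.0 for d in d2]
        return [h / sum(hits) for h in hits]
    power = 1.0 / (m - 1.0)                   # exponent on squared distances
    inv = [(1.0 / d) ** power for d in d2]
    total = sum(inv)
    return [x / total for x in inv]

def identify(frames, codebooks, m=2.0):
    """Sum memberships per speaker over all frames and pick the largest."""
    labels = [spk for spk, cb in codebooks.items() for _ in cb]
    centers = [cw for cb in codebooks.values() for cw in cb]
    scores = {spk: 0.0 for spk in codebooks}
    for vec in frames:
        for spk, u in zip(labels, fcm_memberships(vec, centers, m)):
            scores[spk] += u
    return max(scores, key=scores.get), scores

# Hypothetical per-speaker VQ codebooks and two test feature vectors.
codebooks = {"A": [[0.0, 0.0], [1.0, 0.0]], "B": [[5.0, 5.0], [6.0, 5.0]]}
speaker, scores = identify([[0.2, 0.1], [0.9, -0.1]], codebooks)
```

Unlike hard VQ scoring, every frame contributes a graded vote to every speaker, so a frame that lies between two codebooks does not force an all-or-nothing decision.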
