Found 20 similar documents; search took 171 ms.
1.
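This paper proposes a speaker adaptation method based on a maximum-likelihood variable subspace. In the training stage, principal component analysis is applied to the speaker-dependent model parameters in the training set to obtain a set of speaker basis vectors. In the adaptation stage, the subset of basis vectors most correlated with the current speaker is selected by the maximum-likelihood criterion; the new speaker-dependent model is then constrained to the speaker subspace spanned by these basis vectors, and adaptation is performed by solving for the coefficient of each basis vector. Unlike classical subspace-based speaker adaptation methods, the speaker subspace here is selected dynamically during adaptation, so fewer parameters need to be estimated and more robust results can be obtained from a small amount of adaptation data. In continuous speech recognition adaptation experiments on the Microsoft corpus, given a very small amount of adaptation data (less than 5 s), the proposed method outperforms both the classical eigenvoice adaptation method and the maximum likelihood linear regression based method under supervised and unsupervised conditions.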
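The selection step can be illustrated with a heavily simplified sketch. It assumes an orthonormal basis and an isotropic Gaussian error model, under which the likelihood gain of a basis vector equals its squared projection coefficient; the abstract does not give the actual HMM-based likelihood, so this is not the paper's estimator. The names `observed`, `mean`, `basis`, and `adapt_in_selected_subspace` are hypothetical, and `observed` stands in for a rough estimate of the new speaker's parameter vector.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def adapt_in_selected_subspace(observed, mean, basis, k):
    """Keep the k basis vectors that explain the most of (observed - mean).

    Assumption (not from the source): `basis` is orthonormal and the error
    is isotropic Gaussian, so the maximum-likelihood gain of a basis vector
    is simply its squared projection coefficient.
    """
    centered = [o - m for o, m in zip(observed, mean)]
    coeffs = [dot(b, centered) for b in basis]
    # Rank basis vectors by likelihood gain and keep the best k.
    chosen = sorted(range(len(basis)),
                    key=lambda i: coeffs[i] ** 2, reverse=True)[:k]
    # The adapted model lives in the span of the chosen vectors only.
    adapted = list(mean)
    for i in chosen:
        adapted = [a + coeffs[i] * b for a, b in zip(adapted, basis[i])]
    return sorted(chosen), adapted


mean = [0.0, 0.0, 0.0]
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
observed = [0.1, 2.0, -3.0]
chosen, adapted = adapt_in_selected_subspace(observed, mean, basis, k=2)
print(chosen, adapted)
```

Because only `k` coefficients are estimated, the weakly supported first direction is dropped rather than fitted, which is the intuition behind the robustness claim for very short adaptation data.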
2.
3.
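To convert the speech characteristics of a source speaker so that the result sounds like a target speaker, this paper proposes an intra-lingual speaker conversion algorithm. The algorithm has two parts. First, the spectral envelope is converted with a Gaussian mixture model, which is trained with an improved method that removes the effect of inaccurate time alignment of the speech data. Second, the residual signal is predicted using a Gaussian mixture model classifier and a residual codebook. The algorithm also post-processes the converted speech to enhance its naturalness. Informal listening tests show that, after training on a modest amount of speech data, the algorithm can perform speaker conversion; the converted speech clearly carries the characteristics of the target speaker and is highly intelligible.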
4.
5.
6.
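Current voice cloning methods based on a pre-trained speaker encoder can synthesize speech with high timbre similarity for speakers seen during training, but for unseen speakers the cloned speech still differs noticeably in timbre from the real speaker. To address this, the paper proposes a timbre-consistency-based speaker feature extraction method. The method uses the state-of-the-art speaker recognition model TitaNet as the basic architecture of the speaker encoder and, based on the prior knowledge that a speaker's timbre remains unchanged across speech segments, introduces a timbre consistency constraint loss for training the speaker encoder. This yields more accurate speaker timbre features and improves the robustness and generalization of the speaker representation. Finally, the extracted features are applied to the end-to-end speech synthesis model VITS for voice cloning. Experimental results show that the proposed method outperforms the baseline system on two public speech datasets and improves the timbre similarity of cloned speech for unseen speakers.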
7.
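G.729 recovers a fairly high-quality speech signal by passing an excitation built from a fixed codebook and an adaptive codebook through a synthesis filter. Because the algorithm is complex and time-consuming, it cannot run in real time on a DSP; the fixed codebook search and the adaptive codebook search are the most complex modules. This paper describes the codebook search methods and improves them, greatly reducing complexity. Results show no noticeable degradation in speech quality.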
8.
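For voice conversion with non-parallel speech corpora, an effective model-adaptation-based voice conversion method is proposed. First, models of the source and target speakers are obtained by adapting a background model with the maximum a posteriori (MAP) method. Then, a conversion function for the spectral features is trained from the mean vectors of the speaker models. This is further combined with the traditional INCA conversion method to give a model-adaptation-based INCA voice conversion method, which effectively converts the source speaker's spectral features to those of the target speaker. The proposed method is evaluated through objective tests and subjective listening experiments. The results show that, compared with the INCA method, it achieves lower cepstral distortion, higher perceived speech quality, and a stronger preference toward the target speaker; it also comes closer to the performance of the traditional Gaussian mixture model (GMM) voice conversion method based on parallel corpora.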
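The MAP adaptation step can be sketched as follows. This minimal example assumes the widely used relevance-factor form of MAP mean adaptation for GMMs; the abstract does not state which MAP variant the paper uses. The function name, the `relevance` parameter, and the precomputed `responsibilities` (per-frame component posteriors under the background model) are hypothetical.

```python
def map_adapt_means(prior_means, responsibilities, frames, relevance=16.0):
    """Adapt GMM mean vectors toward the data with a relevance factor.

    prior_means:       background-model means, one vector per component
    responsibilities:  per-frame posteriors, responsibilities[t][k]
    frames:            feature vectors, frames[t][d]
    relevance:         assumed prior weight; larger keeps means near the prior
    """
    adapted = []
    for k, prior in enumerate(prior_means):
        n_k = sum(r[k] for r in responsibilities)  # soft frame count
        if n_k == 0:
            adapted.append(list(prior))            # no data: keep the prior
            continue
        data_mean = [
            sum(r[k] * x[d] for r, x in zip(responsibilities, frames)) / n_k
            for d in range(len(prior))
        ]
        alpha = n_k / (n_k + relevance)            # data-versus-prior weight
        adapted.append([alpha * m + (1.0 - alpha) * p
                        for m, p in zip(data_mean, prior)])
    return adapted


prior_means = [[0.0], [10.0]]
frames = [[2.0], [2.0]]
responsibilities = [[1.0, 0.0], [1.0, 0.0]]
adapted_means = map_adapt_means(prior_means, responsibilities, frames,
                                relevance=2.0)
print(adapted_means)
```

Components that receive no frames keep their background means, so source and target models adapted from the same background model stay aligned component by component, which is what allows a conversion function to be trained from paired mean vectors.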
9.
10.
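Digital mobile communication has developed rapidly in recent years, and low-bit-rate speech coding is one of its key technologies. This paper proposes a pulse-adaptive codebook-excited coding scheme (PACELP). The scheme combines the regular pulse excitation algorithm with the codebook excitation algorithm and adopts a new, efficient codebook structure that reduces the codeword dimension and improves codeword efficiency. In addition, the optimal excitation search uses a pulse-adaptive search that avoids an exhaustive search, greatly reducing the computation required for the codebook search. Computer simulations show that, compared with the coding algorithm published by the US EIA, the scheme requires less computation and less storage with almost no loss in synthesized speech quality. At 8 kb/s, PACELP synthesizes speech of satisfactory quality. Together, these properties allow the PACELP algorithm to be implemented on a single TMS320 C25.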
11.
This paper presents a new technique for designing a jointly optimized residual vector quantizer (RVQ). In the conventional stage-by-stage design procedure, each stage codebook is optimized for that particular stage's distortion and does not consider the distortion from the subsequent stages. However, the overall performance can be improved if each stage codebook is optimized by minimizing the distortion from the subsequent stage quantizers as well as the distortion from the previous stage quantizers. This can only be achieved when the stage codebooks are jointly designed for each other. In this paper, the proposed codebook design procedure is based on a multilayer competitive neural network in which each layer represents one stage of the RVQ. The weights connecting these layers form the corresponding stage codebooks of the RVQ. The joint design problem for the RVQ's codebooks (the weights of the multilayer competitive neural network) is formulated as a nonlinearly constrained optimization task based on a Lagrangian error function. This Lagrangian error function includes all the constraints imposed by the joint optimization of the codebooks. The proposed procedure seeks a locally optimal solution by iteratively solving the equations for this Lagrangian error function. Simulation results show an improvement in the performance of an RVQ designed using the proposed joint optimization technique compared with the stage-by-stage design, where both the generalized Lloyd algorithm (GLA) and the Kohonen learning algorithm (KLA) were used to design each stage codebook independently, as well as with the conventional joint-optimization technique.
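For orientation, the sketch below shows how a residual vector quantizer encodes and decodes a vector once its stage codebooks exist. It does not implement the joint Lagrangian design procedure described in the abstract; the two small codebooks and all function names are made up for illustration.

```python
def nearest(codebook, vector):
    """Index of the codeword with the smallest squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vector)),
    )


def rvq_encode(stages, vector):
    """Quantize `vector` stage by stage; each stage codes the previous residual."""
    residual = list(vector)
    indices = []
    for codebook in stages:
        i = nearest(codebook, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices, residual


def rvq_decode(stages, indices):
    """Reconstruct by summing the selected codeword from every stage."""
    out = [0.0] * len(stages[0][0])
    for codebook, i in zip(stages, indices):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out


# Hypothetical two-stage RVQ: a coarse codebook and a refinement codebook.
stages = [
    [[0.0, 0.0], [4.0, 4.0]],
    [[-1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
]
indices, final_residual = rvq_encode(stages, [5.0, 4.0])
reconstruction = rvq_decode(stages, indices)
print(indices, reconstruction)
```

The greedy encoder makes the design problem visible: the second-stage codebook only ever sees residuals left by the first, so optimizing each codebook in isolation ignores how the stages interact.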
12.
An effective competitive learning algorithm based on the partial distortion theorem is proposed for optimal codebook design. Compared with some representative learning algorithms for codebook design, the proposed algorithm has consistently shown the best performance for designing codebooks of different sizes, especially large codebooks.
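As background, the sketch below shows plain winner-take-all competitive learning for codebook design. It is the generic baseline, not the partial-distortion-based algorithm of the abstract, whose update rule is not given there. The function name, learning rate, and toy data are assumptions.

```python
def competitive_learning(codebook, data, learning_rate=0.5, epochs=1):
    """Winner-take-all training: move only the nearest codeword toward x."""
    codebook = [list(c) for c in codebook]  # work on a copy
    for _ in range(epochs):
        for x in data:
            # Competition: find the codeword closest to the training vector.
            winner = min(
                range(len(codebook)),
                key=lambda i: sum((c - v) ** 2
                                  for c, v in zip(codebook[i], x)),
            )
            # Update: pull the winner a fraction of the way toward x.
            codebook[winner] = [
                c + learning_rate * (v - c)
                for c, v in zip(codebook[winner], x)
            ]
    return codebook


initial = [[0.0], [10.0]]
training = [[1.0], [9.0]]
trained = competitive_learning(initial, training,
                               learning_rate=0.5, epochs=1)
print(trained)
```

A known weakness of this baseline is that poorly initialized codewords may never win and so never move, which becomes more likely as the codebook grows.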
13.
Many image compression techniques require the quantization of multiple vector sources with significantly different distributions. With vector quantization (VQ), these sources are optimally quantized using separate codebooks, which may collectively require an enormous memory space. Since storage is limited in most applications, a convenient way to gracefully trade between performance and storage is needed. Earlier work addressed this problem by clustering the multiple sources into a small number of source groups, where each group shares a codebook. We propose a new solution based on a size-limited universal codebook that can be viewed as the union of overlapping source codebooks. This framework allows each source codebook to consist of any desired subset of the universal code vectors and provides greater design flexibility which improves the storage-constrained performance. A key feature of this approach is that no two sources need be encoded at the same rate. An additional advantage of the proposed method is its close relation to universal, adaptive, finite-state and classified quantization. Necessary conditions for optimality of the universal codebook and the extracted source codebooks are derived. An iterative design algorithm is introduced to obtain a solution satisfying these conditions. Possible applications of the proposed technique are enumerated, and its effectiveness is illustrated for coding of images using finite-state vector quantization, multistage vector quantization, and tree-structured vector quantization.
14.
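Considering constraints such as the size and shape of massive multiple-input multiple-output (MIMO) arrays, this paper proposes a massive MIMO codebook design method suited to base stations that use cross-polarized planar antenna arrays. The method jointly considers the diagonal structure of the cross-polarized channel and the correlation between adjacent antennas. It first designs a codebook matched to a cross-polarized linear array, then extends it using the correlation between antennas in the vertical dimension, finally producing a codebook matched to the cross-polarized planar array. Simulation results show that the codebook design method noticeably improves the transmission rate and bit error rate performance of massive MIMO systems.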
15.
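Existing SCMA (sparse code multiple access) codebooks are designed by combining a high-dimensional complex constellation with a mapping matrix. The design of the high-dimensional complex constellation is complicated, and the minimum Euclidean distance between constellation points on any given time-frequency resource is hard to control. To address these problems, a codebook design method based on the time-frequency resource constellation is proposed. A two-dimensional lattice constellation is designed first, and the user-specific codebooks are then obtained through constellation optimization and spreading. The proposed method not only yields a composite constellation with maximum shaping gain and maximized minimum Euclidean distance, but also lets SCMA system performance improve as the codebook dimension increases. Simulation results show that, over a Gaussian channel, the proposed codebook effectively improves system performance compared with existing codebooks.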
16.
In massive MIMO systems for 5G networks, precoding is one of the key technologies. A low-complexity search algorithm is proposed for the user-side codebook search of the discrete Fourier transform (DFT) rotation codebook. In this algorithm, all horizontal and vertical codebooks are grouped separately, exploiting the property that precoding vectors in the same column of the DFT rotation codebooks have the smallest chordal distance and that a smaller chordal distance implies a stronger correlation. The optimal horizontal and vertical codewords with maximum channel gain are then obtained to form the 3D precoding codebooks. Simulation results indicate that the search complexity of the proposed method is significantly reduced while system performance is maintained; moreover, this advantage grows as the number of antennas increases.
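To make the search cost concrete, the sketch below shows the exhaustive baseline that such a grouped search is meant to shorten: evaluate the channel gain of every codeword and keep the best one. It uses a plain one-dimensional oversampled DFT codebook rather than the paper's rotation codebook with separate horizontal and vertical parts, and the sizes and function names are assumptions.

```python
import cmath
import math


def dft_codebook(num_antennas, num_codewords):
    """Unit-norm DFT beamforming vectors, one per codeword index k."""
    scale = 1.0 / math.sqrt(num_antennas)
    return [
        [scale * cmath.exp(2j * math.pi * n * k / num_codewords)
         for n in range(num_antennas)]
        for k in range(num_codewords)
    ]


def channel_gain(h, w):
    """Gain |h^H w|^2 of precoder w on channel vector h."""
    return abs(sum(hi.conjugate() * wi for hi, wi in zip(h, w))) ** 2


def exhaustive_search(h, codebook):
    """Evaluate every codeword; cost grows linearly with codebook size."""
    gains = [channel_gain(h, w) for w in codebook]
    best = max(range(len(codebook)), key=gains.__getitem__)
    return best, gains[best]


codebook = dft_codebook(num_antennas=4, num_codewords=8)
h = codebook[3]  # toy channel perfectly aligned with codeword 3
best_index, best_gain = exhaustive_search(h, codebook)
print(best_index, round(best_gain, 6))
```

With separate horizontal and vertical codebooks, an exhaustive joint search scales with the product of their sizes, which is why grouping correlated codewords and searching within a group can cut the cost substantially.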
17.
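Eigenphone speaker adaptation achieves good results when adaptation data is plentiful, but suffers from severe overfitting when adaptation data is insufficient. To overcome this, the paper proposes a speaker adaptation algorithm based on an eigenphone speaker subspace. First, the basic principle of eigenphone speaker adaptation in speech recognition systems based on the hidden Markov model with Gaussian mixture models (HMM-GMM) is given. Next, a speaker subspace is introduced to model the correlation between the eigenphone matrices of different speakers. A new eigenphone speaker subspace adaptation algorithm is then obtained by estimating speaker-dependent coordinate vectors. Finally, the eigenphone speaker subspace adaptation algorithm is compared with the traditional speaker subspace adaptation algorithm. Mandarin continuous speech recognition experiments on the Microsoft corpus show that, compared with eigenphone speaker adaptation, the algorithm greatly improves performance when adaptation data is extremely limited and largely overcomes overfitting. Compared with eigenvoice adaptation, it achieves lower space complexity at a small cost in performance, making it more practical.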
18.
A novel framework for an online unsupervised learning algorithm is presented to flexibly adapt existing speaker-independent hidden Markov models (HMMs) to nonstationary environments induced by varying speakers, transmission channels, ambient noises, etc. The quasi-Bayes (QB) estimate is applied to incrementally obtain the word sequence and adaptation parameters for adjusting the HMMs when a block of unlabelled data is enrolled. The underlying statistics of a nonstationary environment can be successively traced according to the newest enrolment data. To improve the QB estimate, adaptive initial hyperparameters are employed in the beginning session of online learning. These hyperparameters are estimated from a cluster of training speakers closest to the test environment. Additionally, a selection process is developed to select reliable parameters from a list of candidates for unsupervised learning. A set of reliability assessment criteria is explored for selection. In a series of speaker adaptation experiments, the effectiveness of the proposed method is confirmed, and it is found that using the adaptive initial hyperparameters in online learning and the multiple assessments in parameter selection can improve the recognition performance.
19.
Speaker adaptation techniques are generally used to reduce speaker differences in speech recognition. In this work, we focus on features suited to linear regression-based speaker adaptation. These are obtained by a feature transformation based on independent component analysis (ICA), and the feature transformation matrices are estimated from the training data and the adaptation data. Since the adaptation data is not sufficient to reliably estimate the ICA-based feature transformation matrix, it is necessary to adjust the matrix estimated from a new speaker's utterances. To cope with this problem, we propose a smoothing method that linearly interpolates between the speaker-independent (SI) feature transformation matrix and the speaker-dependent (SD) feature transformation matrix. In our experiments, the proposed method is more effective in the mismatched case, where adaptation performance improves because the smoothed feature transformation matrix makes speaker adaptation using noisy speech more robust.
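The smoothing step amounts to an element-wise linear interpolation of two matrices, which can be sketched as follows. The abstract does not say how the interpolation weight is chosen or which matrix it multiplies, so the `weight` parameter, its orientation (applied to the SD matrix), and the toy matrices are assumptions.

```python
def smooth_transform(si_matrix, sd_matrix, weight):
    """Interpolate element-wise between SI and SD transformation matrices.

    weight = 0.0 returns the SI matrix, weight = 1.0 returns the SD matrix.
    Applying `weight` to the SD matrix is an assumption of this sketch.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [
        [weight * sd + (1.0 - weight) * si
         for si, sd in zip(si_row, sd_row)]
        for si_row, sd_row in zip(si_matrix, sd_matrix)
    ]


si = [[1.0, 0.0], [0.0, 1.0]]    # toy speaker-independent transform
sd = [[3.0, 2.0], [-2.0, 5.0]]   # toy transform from sparse speaker data
smoothed = smooth_transform(si, sd, weight=0.25)
print(smoothed)
```

A small weight keeps the result close to the reliably estimated SI matrix, which limits the damage when the SD matrix is estimated from too little or noisy adaptation data.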