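Voice conversion has wide applications in education, entertainment, healthcare and many other fields. To obtain high-quality converted speech, a voice conversion algorithm based on a multi-spectral-feature generative adversarial network (GAN) is proposed. The GAN converts voiceprint images generated from spectral feature parameters, and feature-level multimodal fusion lets the network learn several kinds of information from different feature domains. This improves the network's perception of the speech signal and yields high-quality converted speech with good clarity and intelligibility. Experimental results show that the proposed algorithm clearly outperforms conventional algorithms on both subjective and objective evaluation metrics.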

The objective of voice conversion algorithms is to modify the speech of a particular source speaker so that it sounds as if spoken by a different target speaker. Current conversion algorithms employ a training procedure that requires the same utterances spoken by both the source and target speakers in order to derive the desired conversion parameters. Such a (parallel) corpus is often difficult or impossible to collect. Here, we propose an algorithm that relaxes this constraint: the training corpus need not contain the same utterances from both speakers. The proposed algorithm is based on speaker adaptation techniques, adapting the conversion parameters derived for one pair of speakers to a different pair for which only a nonparallel corpus is available. We show that adaptation reduces the error obtained when simply applying the conversion parameters of one pair of speakers to another by as much as 30%. A speaker identification measure is also employed that more insightfully portrays the importance of adaptation, while listening tests confirm the success of our method. Both the objective and the subjective tests demonstrate that the proposed algorithm achieves results comparable to the ideal case in which a parallel corpus is available.

Robust processing techniques for voice conversion   (times cited: 3; self-citations: 0; cited by others: 3)
Differences in speaker characteristics, recording conditions, and signal processing algorithms affect output quality in voice conversion systems. This study focuses on formulating robust techniques for a codebook-mapping-based voice conversion algorithm. Three methods are used to improve voice conversion performance: confidence measures, pre-emphasis, and spectral equalization. Each method is analyzed and its implementation details are discussed. The first method applies confidence measures in the training stage to eliminate problematic pairs of source and target speech units that may result from misalignments, speaking-style differences or pronunciation variations. Four confidence measures are developed, based on the spectral distance, fundamental frequency (f0) distance, energy distance, and duration distance between the source and target speech units. The second method focuses on the importance of pre-emphasis in line-spectral-frequency (LSF) based vocal tract modeling and transformation. The last method, spectral equalization, aims to reduce the differences between the source and target long-term spectra when the source and target recording conditions differ significantly. The voice conversion algorithm that employs the proposed techniques is compared with the baseline algorithm using objective tests as well as three subjective listening tests. First, similarity to the target voice is evaluated in a subjective listening test, showing that the proposed algorithm improves similarity to the target voice by 23.0%. Second, in an ABX test the proposed algorithm is preferred over the baseline algorithm by 76.4%. In the third test, the two algorithms are compared in terms of the subjective quality of the voice conversion output: the proposed algorithm improves the subjective output quality by 46.8% in terms of mean opinion score (MOS).
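The confidence-measure idea (discard a source/target unit pair when any of the four distances is too large) can be sketched as a simple filter. This is a minimal illustration, not the paper's method: the `Unit` record, the use of raw absolute differences (no per-speaker normalization of f0 or energy), and every threshold value are hypothetical assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Unit:
    """One speech unit with hypothetical summary features (illustration only)."""
    lsf: list        # mean line-spectral-frequency vector
    f0: float        # mean fundamental frequency in Hz
    energy_db: float # mean energy in dB
    duration: float  # duration in seconds


# Hypothetical thresholds; the abstract does not give the paper's values.
THRESHOLDS = {"spectral": 0.5, "f0": 60.0, "energy": 10.0, "duration": 0.08}


def distances(src: Unit, tgt: Unit) -> dict:
    """The four per-pair distances named in the abstract (raw, unnormalized)."""
    return {
        "spectral": math.dist(src.lsf, tgt.lsf),
        "f0": abs(src.f0 - tgt.f0),
        "energy": abs(src.energy_db - tgt.energy_db),
        "duration": abs(src.duration - tgt.duration),
    }


def keep_pair(src: Unit, tgt: Unit, thresholds=THRESHOLDS) -> bool:
    """Keep a training pair only if every distance is within its threshold."""
    d = distances(src, tgt)
    return all(d[name] <= limit for name, limit in thresholds.items())


def prune(pairs, thresholds=THRESHOLDS):
    """Drop likely misaligned or mismatched pairs before codebook training."""
    return [(s, t) for s, t in pairs if keep_pair(s, t, thresholds)]
```

In practice the f0 and energy distances would probably be computed after removing each speaker's mean, since source and target speakers naturally differ in pitch and level.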

Various principal component analysis algorithms have been developed in recent years. In this paper, a new approach to local nonlinear principal component analysis is proposed and applied to voice conversion (VC). A new autoassociative neural network structure is designed that not only partitions the data but also extracts the nonlinear principal components of each cluster. The performance of the proposed method is evaluated in two experiments: first, the network's behavior is illustrated on an artificial dataset, and then the method is applied to VC.

We propose a multimodal dialogue analysis method for medical interviews that hierarchically interprets nonverbal interaction patterns in a bottom-up manner and simultaneously visualizes the topic structure. The method aims to give physicians clues that conventional dialogue analysis generally overlooks, so as to form a cycle of dialogue practice and analysis. We introduce a motif and a pattern cluster in the design of the hierarchical interaction indices and use the Jensen–Shannon divergence (JSD) metric to reduce the number of usable indices. We applied the proposed interaction-pattern interpretation method to develop a corpus of interviews. The results of a summary-reading experiment confirmed the validity of the developed indices. Finally, we discuss the integrated analysis of the topic structure and a nonverbal summary.
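The Jensen–Shannon divergence used to prune indices can be computed directly from two discrete distributions. The sketch below uses base-2 logarithms so the result lies in [0, 1]. The `select_indices` rule (keep an index only when its distributions in two groups diverge beyond a threshold) and its threshold are hypothetical; the abstract does not say how the JSD is applied.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits (symmetric, bounded in [0, 1])."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def select_indices(dists_a, dists_b, threshold=0.1):
    """Hypothetical pruning rule: keep indices whose distributions differ between groups."""
    return [name for name in dists_a
            if jensen_shannon(dists_a[name], dists_b[name]) > threshold]
```

Because the mixture `m` is positive wherever `p` or `q` is, the divergence stays finite even when the two distributions have disjoint support.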

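An ontology construction method that uses Chinese free text as its knowledge source is proposed. After word segmentation, the method estimates each word's probability of occurrence in the sample texts and in an everyday (general-purpose) corpus; comparing the two estimates gives a significance index of occurrence frequency, which is used to automatically identify and extract domain vocabulary. Mutual information analysis is then applied to identify how strongly domain terms associate with one another. The method automatically builds a set of candidate domain-ontology terms and the basic relations between them, and also constructs a domain word graph based on the domain terms and their association strengths, providing a convenient visual interface for further manual ontology construction. The result can provide automated or semi-automated ontology support for large-scale content-based knowledge management.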
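The two statistical steps (comparing a word's frequency in domain text against a general corpus, then measuring how strongly two terms co-occur) can be sketched as follows. This is an assumption-laden illustration: the abstract does not define its significance index, so a smoothed probability ratio with a hypothetical `min_ratio` cutoff stands in for it, and sentence-level pointwise mutual information stands in for its mutual-information analysis. Input is assumed to be already word-segmented.

```python
import math
from collections import Counter


def domain_terms(domain_tokens, general_tokens, min_ratio=2.0, smoothing=1.0):
    """Return {word: ratio} for words much more frequent in the domain text.

    ratio = P(word | domain) / P(word | general), with add-`smoothing`
    on the general corpus so unseen words do not divide by zero.
    """
    d, g = Counter(domain_tokens), Counter(general_tokens)
    nd, ng = len(domain_tokens), len(general_tokens)
    vocab_size = len(set(d) | set(g))
    terms = {}
    for word, count in d.items():
        p_dom = count / nd
        p_gen = (g[word] + smoothing) / (ng + smoothing * vocab_size)
        ratio = p_dom / p_gen
        if ratio >= min_ratio:
            terms[word] = ratio
    return terms


def pmi(w1, w2, sentences):
    """Pointwise mutual information (bits) of two words over sentence co-occurrence.

    `sentences` is a list of sets of tokens.
    """
    n = len(sentences)
    c1 = sum(w1 in s for s in sentences)
    c2 = sum(w2 in s for s in sentences)
    c12 = sum(w1 in s and w2 in s for s in sentences)
    if c12 == 0:
        return float("-inf")  # never co-occur
    return math.log2(c12 * n / (c1 * c2))
```

Pairs with high PMI would become weighted edges in the domain word graph the abstract describes.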

Multimedia Tools and Applications - In this paper, a novel vocoder is proposed for a Statistical Voice Conversion (SVC) framework using deep neural network, where multiple features from the speech...

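Voice conversion technology is mainly applied in computer speech synthesis, computer speech translation, speech editing, broadcasting and multimedia. The Gaussian mixture model (GMM) is currently the mainstream approach to voice conversion, but its biggest drawback is that it over-smooths the converted spectrum. Both the mean term and the correlation term in the GMM conversion function contribute to this over-smoothing, and the mean term has the larger effect. A modified-mean method that combines codebook mapping with the GMM approach is therefore proposed. Experiments show that the modified-mean method effectively suppresses over-smoothing and improves conversion performance.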
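The abstract does not write out the conversion function. Assuming it refers to the standard joint-density GMM mapping, the "mean term" and "correlation term" are the two sums below, where M is the number of mixture components, α_i the mixture weights, and p_i(x) the posterior probability of component i given the source vector x:

```latex
F(\mathbf{x}) =
\underbrace{\sum_{i=1}^{M} p_i(\mathbf{x})\,\boldsymbol{\mu}_i^{y}}_{\text{mean term}}
+ \underbrace{\sum_{i=1}^{M} p_i(\mathbf{x})\,\boldsymbol{\Sigma}_i^{yx}\left(\boldsymbol{\Sigma}_i^{xx}\right)^{-1}\left(\mathbf{x}-\boldsymbol{\mu}_i^{x}\right)}_{\text{correlation term}},
\qquad
p_i(\mathbf{x}) = \frac{\alpha_i\,\mathcal{N}\!\left(\mathbf{x};\boldsymbol{\mu}_i^{x},\boldsymbol{\Sigma}_i^{xx}\right)}
{\sum_{j=1}^{M}\alpha_j\,\mathcal{N}\!\left(\mathbf{x};\boldsymbol{\mu}_j^{x},\boldsymbol{\Sigma}_j^{xx}\right)}
```

When the posteriors are spread over several components, the mean term is a weighted average of target means, which blurs spectral detail. This is consistent with the abstract's finding that the mean term dominates the over-smoothing. The abstract does not specify how the modified-mean method uses codebook mapping to replace the target means.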

The paper presents a novel method for detecting abrupt changes in mixture viscosity. This problem arises in numerous industrial applications, and various solutions have been suggested in the literature. This paper outlines an approach that uses nonlinear filtering for detection: an adaptive filter combined with a change detector. As a case study, we present an ash-stabilization batch mixing process used to stabilize a blend of ash, dolomite and water. The main problem here is to predict the quantity of water to be added, since the required amount varies with the quality of the wood ash. One adaptive filter and two change detectors are applied to benchmark data, and the detectors are evaluated using basic performance measures.
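The "adaptive filter plus change detector" structure can be illustrated with stand-ins. The abstract does not name the filter or the detectors, so this sketch substitutes an exponentially weighted mean for the adaptive filter and a two-sided CUSUM test on its residuals for the change detector. The `forgetting`, `drift` and `threshold` values are hypothetical tuning parameters.

```python
def cusum_detect(signal, forgetting=0.95, drift=0.5, threshold=5.0):
    """Return the sample indices at which an abrupt level change is flagged.

    Adaptive filter (stand-in): exponentially weighted mean of the signal.
    Change detector (stand-in): two-sided CUSUM on the filter residuals.
    """
    estimate = signal[0]
    g_pos = g_neg = 0.0
    alarms = []
    for k, y in enumerate(signal[1:], start=1):
        residual = y - estimate
        # Accumulate evidence of an upward or downward shift, minus a drift allowance.
        g_pos = max(0.0, g_pos + residual - drift)
        g_neg = max(0.0, g_neg - residual - drift)
        if g_pos > threshold or g_neg > threshold:
            alarms.append(k)
            g_pos = g_neg = 0.0
            estimate = y  # restart the filter at the new level
        else:
            estimate = forgetting * estimate + (1 - forgetting) * y
    return alarms
```

For example, a step from 0 to 3 at sample 20 is flagged a couple of samples later, while a constant signal produces no alarms. A larger `drift` suppresses false alarms at the cost of slower detection.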

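In Uyghur, the number of affixes is limited and word formation follows fairly regular patterns. To improve the performance of Uyghur word segmentation, an improved unsupervised segmentation method is proposed that builds on an affix library and an analysis of basic Uyghur word-formation rules. The method generates rule-based segmentations of each word, scores them with a MAP segmentation evaluation model, and selects the highest-scoring segmentation as the word's final form. Experiments on a 5,000-word test corpus show that the method segments Uyghur words with higher accuracy.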
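The generate-then-score procedure can be sketched as follows: candidate segmentations are produced by peeling suffixes from an affix library off the end of the word, and each candidate is scored by the sum of morpheme log-probabilities, keeping the best. The tiny suffix set, the probabilities, and the unigram scoring model (standing in for the paper's MAP evaluation model) are all hypothetical; the example uses Latin-script Uyghur *kitablarda* ("in the books").

```python
import math

# Hypothetical affix library and morpheme log-probabilities (illustration only).
SUFFIXES = {"lar", "ler", "da", "de", "ni", "ning"}
LOGPROB = {"kitab": math.log(0.01), "lar": math.log(0.05), "da": math.log(0.04)}
UNKNOWN = math.log(1e-6)  # score for any morpheme not in the model


def candidate_splits(word, min_stem=2):
    """Yield [stem, suffix1, suffix2, ...] by recursively peeling library suffixes."""
    yield [word]
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= min_stem:
            for head in candidate_splits(word[: -len(suf)], min_stem):
                yield head + [suf]


def score(morphs):
    """Unigram log-probability of a segmentation (stand-in for the MAP model)."""
    return sum(LOGPROB.get(m, UNKNOWN) for m in morphs)


def segment(word):
    """Return the highest-scoring candidate segmentation."""
    return max(candidate_splits(word), key=score)
```

Here `segment("kitablarda")` returns `["kitab", "lar", "da"]`, because that split scores higher than leaving the word whole or splitting off only `da`.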

Noise must be removed from EEG before further analysis and processing. Because EEG is deeply masked by background noise, denoising it effectively is very difficult. This paper proposes a novel sparse decomposition algorithm for EEG denoising based on a realistic head model. It is an iterative procedure that incorporates the subject's physiology of EEG generation into the denoising. In this algorithm, an overcomplete lead-field dictionary is first computed numerically from the realistic head model. The instantaneous EEG spatial potential is then decomposed by matching pursuit into a sparse combination of atoms from the lead-field matrix, and this sparse combination is taken as the denoised EEG signal. The realistic-head-model-based sparse decomposition was tested on simulated noisy potentials and on a real EEG recording collected in an oddball stimulus experiment; the results consistently confirmed that the new method effectively removes uncorrelated noise from EEG.
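The matching-pursuit step can be sketched in a few lines: at each iteration, pick the unit-norm atom most correlated with the residual, add its coefficient, and subtract its contribution. The sparse approximation is the "denoised" signal, and what remains in the residual is treated as noise. The toy atoms here are hypothetical; the paper's atoms are columns of a lead-field matrix computed from a realistic head model, which this sketch does not reproduce.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean norm (matching pursuit assumes unit-norm atoms)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def matching_pursuit(signal, atoms, n_iter=10, tol=1e-9):
    """Greedy matching pursuit over unit-norm atoms.

    Returns (coefficients, sparse approximation, residual).
    """
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        # Correlate the residual with every atom and choose the strongest match.
        inner = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(inner[i]))
        if abs(inner[k]) < tol:
            break  # nothing left to explain
        coeffs[k] += inner[k]
        residual = [r - inner[k] * a for r, a in zip(residual, atoms[k])]
    approx = [sum(c * atom[j] for c, atom in zip(coeffs, atoms))
              for j in range(len(signal))]
    return coeffs, approx, residual
```

Limiting `n_iter` controls sparsity. With `n_iter=1`, only the strongest component is kept and weaker (noise-like) components stay in the residual.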


To better support higher data rates, the Third Generation Partnership Project (3GPP) introduced dual-carrier high-speed downlink packet access (DC-HSDPA) in Release 8. It reaches up to 42 Mbps throughput using two adjacent 5-MHz carriers. Because a frequency-selective channel limits the achievable data rates, it is crucial to characterize how throughput depends on prevailing channel parameters. For this reason, real field measurements of DC-HSDPA throughput were taken in different propagation environments using the "TEMS Investigation" program. The evaluation of the measurements showed that one-parameter linear mapping methods, such as those based on signal-to-interference ratio or channel quality indicator, are insufficient for characterizing user throughput. This study therefore proposes a novel mapping method with more than one variable. Although multiple linear regression gives a better normalized root-mean-square error, the results show that commonly used artificial-neural-network-based mapping methods, namely the adaptive network-based fuzzy inference system, the multilayer perceptron and the generalized regression neural network (GRNN), yield improved accuracy. Among these, GRNN gives the best estimate of user throughput for a commercial DC-HSDPA system, with approximately 93.3% precision. The GRNN structure allows system designers to update system parameters to maximize user throughput.
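A GRNN prediction is a Gaussian-kernel weighted average of the training targets (Nadaraya–Watson regression), with a single smoothing parameter `sigma`. The sketch below shows only that prediction step. The choice of input features (e.g. SIR and CQI per measurement) and the value of `sigma` are hypothetical; in practice the inputs would be scaled to comparable ranges before computing distances.

```python
import math


def grnn_predict(train_x, train_y, x, sigma=1.0):
    """Generalized regression neural network prediction for one input vector.

    Each training pattern contributes its target value, weighted by a
    Gaussian kernel of its squared distance to the query x.
    """
    weights = []
    for xi in train_x:
        d2 = sum((a - b) ** 2 for a, b in zip(xi, x))
        weights.append(math.exp(-d2 / (2 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase sigma")
    return sum(w * y for w, y in zip(weights, train_y)) / total
```

Because training only stores the patterns, new field measurements can be added without retraining. This is one reason GRNN is convenient when system parameters are updated.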


Voice conversion (VC), which morphs the voice of a source speaker so that it is perceived as spoken by a specified target speaker, can be used intentionally to deceive speaker identification (SID) and speaker verification (SV) systems that rely on speech biometrics. Voice conversion spoofing attacks that imitate a particular speaker pose a potential threat to such systems. In this paper, we first present an experimental study of the robustness of such systems against voice conversion disguise. For this we use Gaussian mixture model (GMM) based SID systems, GMM with universal background model (GMM-UBM) based SV systems, and GMM supervector with support vector machine (GMM-SVM) based SV systems. Voice conversion is performed with three techniques: a GMM-based VC technique, a weighted frequency warping (WFW) based conversion method, and a variant of WFW in which energy correction is disabled (WFW−). Evaluation uses intra-gender and cross-gender conversions between fifty male and fifty female speakers from the TIMIT database. Results are reported as the degradation in percentage of correct identification (POC) for the SID systems and the degradation in equal error rate (EER) for all SV systems. The experiments show that GMM-SVM SV systems are more resilient to voice conversion spoofing attacks than GMM-UBM SV systems, and that all SID and SV systems are more vulnerable to GMM-based conversion than to WFW- and WFW−-based conversion. In general terms, all SID and SV systems are also slightly more robust to voices produced by cross-gender conversion than by intra-gender conversion. The study is further extended to examine the relationship between VC objective scores and SV system performance on the CMU ARCTIC database, which is a parallel corpus.
The results of this experiment suggest an approach to quantifying the objective score of voice conversion that can be related to its ability to spoof an SV system.
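The equal error rate used to report SV degradation is the operating point where the false-acceptance rate (impostor scores accepted) equals the false-rejection rate (genuine scores rejected). A minimal threshold-sweep sketch follows; it assumes higher scores mean "more likely the claimed speaker" and reports the average of FAR and FRR at the threshold where they are closest. Under a spoofing evaluation, converted-voice trials would be added to the impostor scores and the resulting EER compared with the baseline.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER by sweeping every observed score as a decision threshold.

    A trial is accepted when its score >= threshold.
    """
    best_gap, best_eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # false acceptances
        frr = sum(s < t for s in genuine) / len(genuine)     # false rejections
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer
```

With few trials the FAR and FRR curves are step functions, so the reported value is an approximation; larger evaluations usually interpolate between thresholds.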

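To improve the robustness of speech endpoint detection, an algorithm is proposed that uses different speech feature parameters at different signal-to-noise ratios (SNRs). The SNR of the noisy speech is estimated from an estimate of the background noise energy, and the feature parameters used for endpoint detection are chosen according to the estimated SNR. At high SNR, the conventional short-time energy and zero-crossing rate are used; at low SNR, the pitch period, the ratio of high-band to full-band energy, and spectral distortion are used. The algorithm thus adapts its detection method to the SNR. Experimental results show that the method is robust, with high detection accuracy across different SNRs.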
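The SNR-dependent switch and the high-SNR branch (short-time energy plus zero-crossing rate) can be sketched as below. Several details are assumptions not given in the abstract: the first `noise_frames` frames are taken to be pure background noise, the SNR is estimated from mean frame energy, and the `high_factor`/`low_factor` thresholds and the 10 dB switch point are hypothetical. The low-SNR branch (pitch period, high-band/full-band energy ratio, spectral distortion) is deliberately left unimplemented.

```python
import math


def split_frames(x, size, hop):
    """Split a list of samples into (possibly overlapping) frames."""
    return [x[i:i + size] for i in range(0, len(x) - size + 1, hop)]


def short_time_energy(frame):
    return sum(s * s for s in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def detect_speech(x, size=160, hop=80, noise_frames=5, snr_switch_db=10.0,
                  high_factor=3.0, low_factor=1.5):
    """Return one speech/non-speech flag per frame (high-SNR branch only)."""
    frames = split_frames(x, size, hop)
    energies = [short_time_energy(f) for f in frames]
    zcrs = [zero_crossing_rate(f) for f in frames]
    # Background-noise statistics from the leading frames (assumed speech-free).
    noise_e = max(sum(energies[:noise_frames]) / noise_frames, 1e-12)
    noise_z = sum(zcrs[:noise_frames]) / noise_frames
    mean_e = sum(energies) / len(energies)
    snr_db = 10 * math.log10(max(mean_e - noise_e, 1e-12) / noise_e)
    if snr_db < snr_switch_db:
        # The abstract switches to pitch, band-energy ratio and spectral distortion here.
        raise NotImplementedError("low-SNR feature set is not implemented in this sketch")
    # Loud frames are speech; moderately loud frames count only if their
    # zero-crossing rate exceeds the noise level (e.g. unvoiced fricatives).
    return [e > high_factor * noise_e or (e > low_factor * noise_e and z > noise_z)
            for e, z in zip(energies, zcrs)]
```

The first and last `True` flags mark the detected endpoints; with overlapping frames (`hop < size`), frame indices map back to sample positions as `index * hop`.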

Lin Xiaodan, Qiu Yingqiang. Journal of Computer Applications (计算机应用), 2019, 39(12): 3510-3514
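Pitch shifting is often used to disguise a speaker's identity, and the spread of voice-changing software makes such disguise increasingly easy. Existing methods for detecting pitch-shifted speech cannot determine which operation (raising or lowering the pitch) was applied. By analyzing the traces that pitch shifting leaves in the signal spectrum, particularly in the high-frequency region, a detection method for electronically pitch-shifted speech is proposed that is based on statistical-moment features of inverted Mel-frequency cepstral coefficients (IMFCC). First, the IMFCC and their first-order differences are extracted for each speech frame. Next, their statistical means are computed. Finally, a multi-class support vector machine (SVM) operating on these statistics distinguishes original, pitch-raised and pitch-lowered speech. Experiments on the TIMIT and NIST speech corpora show that the method performs well on original, pitch-raised and pitch-lowered speech alike. Compared with a baseline system that uses MFCC features, the proposed features markedly improve the recognition rate of the pitch-shifting operation. With limited training resources, the method also outperforms a convolutional neural network (CNN) based framework, and it generalizes well across different datasets and pitch-shifting methods.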
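The utterance-level feature described above (mean of the frame-level IMFCC plus mean of their first-order differences) can be sketched once the frame coefficients are available. Assumptions: IMFCC extraction itself (a frequency-flipped mel filterbank followed by a DCT) and the multi-class SVM are not shown, and the first-order difference is taken as a plain frame-to-frame difference, whereas many systems use a regression-window delta instead.

```python
def first_order_delta(frames):
    """Frame-to-frame difference of coefficient vectors (one fewer row than input)."""
    return [[b - a for a, b in zip(prev, cur)] for prev, cur in zip(frames, frames[1:])]


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def utterance_feature(imfcc_frames):
    """Concatenate mean IMFCC and mean delta-IMFCC into one fixed-length vector.

    `imfcc_frames` is a list of per-frame coefficient vectors (needs >= 2 frames).
    """
    return mean_vector(imfcc_frames) + mean_vector(first_order_delta(imfcc_frames))
```

The fixed-length output (twice the number of cepstral coefficients) is what makes a standard SVM usable regardless of utterance length.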

Although sparse representation (Wagner et al. in IEEE Trans Pattern Anal Mach Intell 34(2):372–386, 2012, CVPR 597–604, 2009) performs very well in face recognition (FR), it can still be improved. To improve FR performance, this paper proposes a novel sparse representation method based on virtual samples. The method first extends the training set by adding random noise to the training samples, and then performs FR. Because the test samples can be better represented by the new training set, the final classification obtained with the proposed method is more accurate than classification based on the original training samples alone. A number of FR experiments show that the classification accuracy of our method is usually 2–5% higher than that of the method in Xu and Zhu (Neural Comput Appl, 2012).
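Generating the virtual samples is a simple augmentation step: each training vector is copied with added random noise and given the same label. The sketch assumes zero-mean Gaussian noise with a hypothetical standard deviation `sigma` and a fixed seed for reproducibility. The abstract says only "random noise", and the subsequent sparse-representation classifier is not shown.

```python
import random


def add_virtual_samples(samples, labels, copies=1, sigma=0.01, seed=0):
    """Return the original samples plus `copies` noisy versions of each.

    Noise model (assumption): independent Gaussian noise N(0, sigma^2) per feature.
    """
    rng = random.Random(seed)
    new_x, new_y = list(samples), list(labels)
    for x, y in zip(samples, labels):
        for _ in range(copies):
            new_x.append([v + rng.gauss(0.0, sigma) for v in x])
            new_y.append(y)  # a virtual sample keeps its source's class label
    return new_x, new_y
```

The original samples are kept unchanged at the front of the enlarged training set, so the classifier sees both real and virtual samples of every class.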

An aeroengine is a complex multi-module system. Because of sensor cost and installation constraints, it is usually impossible to install enough sensors to measure the physical parameters of each module, build accurate module characteristic models, and thereby evaluate module performance. To address this issue, a strategy for reconstructing high-dimensional physical fields from limited measurement data is developed, which is of great significance for modeling module characteristics. A reconstruction framework is built that establishes the mapping from limited measurement data to high-dimensional physical field data, and corresponding learning strategies based on deep learning networks are designed. To verify the effectiveness of the proposed method, a simulation dataset generated by a multi-component closed-loop simulation system and an aeroengine service dataset are used, with the mean and variance of the mean square error as evaluation indexes. Experimental results show that the proposed method can recover high-dimensional physical field distributions from limited measurement data.
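The evaluation indexes (mean and variance of the mean square error) can be computed per reconstructed field and then summarized over the test set. In this sketch each field is flattened to a list of values. Using the population variance (`pvariance`) rather than the sample variance is an assumption, since the abstract does not say which is meant.

```python
from statistics import fmean, pvariance


def mse(pred, true):
    """Mean square error between one reconstructed field and its ground truth."""
    return fmean((p - t) ** 2 for p, t in zip(pred, true))


def mse_mean_and_variance(preds, trues):
    """Mean and (population) variance of per-sample MSE over a test set."""
    errors = [mse(p, t) for p, t in zip(preds, trues)]
    return fmean(errors), pvariance(errors)
```

The mean reflects overall reconstruction accuracy, while the variance indicates how consistent that accuracy is across operating conditions.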

This article proposes a novel genetic algorithm (GA) that switches the representation of solutions from redundant binary numbers to ordinary binary numbers. A GA that switches the representation from Gray code to ordinary binary is also proposed and compared. The performance of five GAs (binary, redundant binary, Gray code, switching from redundant binary to binary, and switching from Gray code to binary) is compared by using them to solve several equations. The results confirm that the proposed GA effectively decreases the error rate.
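The encodings being switched between can be sketched as plain integer conversions: binary-reflected Gray code and its inverse, and the decoding of a redundant binary number into an ordinary one. Here "redundant binary" is assumed to mean signed-digit binary with digits in {-1, 0, 1}, so the same value can have several encodings. The abstract does not define it, and the GA operators and the switching schedule are not shown.

```python
def to_gray(n: int) -> int:
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def from_gray(g: int) -> int:
    """Inverse of to_gray: XOR of all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def redundant_to_int(digits):
    """Value of a signed-digit redundant binary number, most significant digit first."""
    value = 0
    for d in digits:
        if d not in (-1, 0, 1):
            raise ValueError("digits must be -1, 0 or 1")
        value = 2 * value + d
    return value


def to_binary_bits(value: int, width: int):
    """Ordinary unsigned binary encoding, most significant bit first."""
    if not 0 <= value < 2 ** width:
        raise ValueError("value does not fit in the given width")
    return [(value >> i) & 1 for i in reversed(range(width))]
```

For example, `[1, -1]` and `[0, 1]` both decode to 1, illustrating the redundancy. Switching a chromosome to ordinary binary is then `to_binary_bits(redundant_to_int(digits), width)` for non-negative values. Gray code is useful because adjacent integers differ in exactly one bit.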

Gene Expression Programming (GEP) is a relatively new evolutionary algorithm that uses a genotype/phenotype representation for computer programs. Owing to its strong global search ability, it is widely applied to symbolic regression; however, little work has yet applied it to real-parameter optimization. This paper proposes a real-parameter optimization method named Uniform-Constants based GEP (UC-GEP), in which the constant domain participates directly in evolution. We conducted extensive experiments on nine benchmark functions from the IEEE Congress on Evolutionary Computation 2005 and compared the results with three other algorithms: Meta-Constants based GEP (MC-GEP), Meta-Uniform-Constants based GEP (MUC-GEP), and the Floating Point Genetic Algorithm (FP-GA). For simplicity, all GEP methods adopt a one-tier index gene structure. The results show that UC-GEP performs best on multimodal problems, and that at least one GEP method outperforms FP-GA on all test functions with higher computational complexity.

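By combining the idea of random concentration from physical simulation with piecewise processing, the diffusion concentration in the middle- and far-distance segments is randomized, yielding a new piecewise ink-diffusion function for simulating ink diffusion. Based on the key implementation points, suitable algorithms were selected and an overall algorithm flowchart for Chinese-painting-style rendering was designed. The algorithm first smooths the image with a mean filter. It then segments the image with the K-means clustering algorithm to simulate how Chinese painting expresses tonal layers through ink density. Next, it detects image edge points with morphological methods and applies the custom piecewise function to them to simulate ink diffusion at the edges, producing the Chinese-painting-style rendering. Finally, the algorithm was implemented in HTML5. The generated Chinese-painting style achieves fairly realistic results in both ink simulation and ink diffusion.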
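The first two rendering steps (mean-filter smoothing, then K-means clustering of intensities into a few ink tones) can be sketched on a grayscale image stored as nested lists. The number of tones `k`, the filter radius, the evenly spaced initial centers and the fixed iteration count are hypothetical choices. The morphological edge detection and the paper's piecewise diffusion function are not reproduced here.

```python
def mean_filter(img, radius=1):
    """Box (mean) filter; windows are clipped at the image border."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [img[y][x]
                    for y in range(max(0, i - radius), min(h, i + radius + 1))
                    for x in range(max(0, j - radius), min(w, j + radius + 1))]
            out[i][j] = sum(vals) / len(vals)
    return out


def kmeans_1d(values, k, iters=20):
    """Plain K-means on scalar intensities; returns sorted cluster centers."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]  # evenly spaced start
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            idx = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[idx].append(v)
        # Empty clusters keep their previous center.
        centers = [sum(g) / len(g) if g else centers[c] for c, g in enumerate(groups)]
    return sorted(centers)


def quantize_tones(img, k=3):
    """Replace each pixel by its nearest cluster center (one 'ink tone' per cluster)."""
    centers = kmeans_1d([v for row in img for v in row], k)
    return [[min(centers, key=lambda c: abs(v - c)) for v in row] for row in img]
```

A typical use is `quantize_tones(mean_filter(gray), k=3)`, which gives an image with three flat ink-density layers ready for edge detection.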
