1.

Using quality measures for multilevel speaker recognition

《Computer Speech and Language》2006,20(2-3):192-209

The use of quality information for multilevel speaker recognition systems is addressed in this contribution. From a definition of what constitutes a quality measure, two applications are proposed at different phases of the recognition process: the scoring stage and the multilevel fusion stage. The traditional likelihood scoring stage is further developed, and guidelines are provided for the practical application of the proposed ideas. Conventional user-independent multilevel support vector machine (SVM) score fusion is also adapted to include quality information in the fusion process. In particular, quality measures meeting three different goodness criteria (SNR, F0 deviations and the ITU P.563 objective speech quality assessment) are used in the speaker recognition process. Experiments carried out on the Switchboard-I database assess the benefits of the proposed quality-guided recognition approach for both the score computation and score fusion stages.
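SNR is the first of the three quality measures named above. The sketch below is only a rough illustration. It estimates SNR from frame energies by treating the quietest frames as the noise floor, then maps the result to a reliability weight in [0, 1]. The frame length, the 20% noise fraction and the linear `snr_to_weight` mapping are hypothetical choices. They are not the estimator or the quality-guided scoring rule used in the paper.

```python
import math

def frame_powers(samples, frame_len):
    """Mean power of each non-overlapping frame; a trailing partial frame is dropped."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def estimate_snr_db(samples, frame_len=160, noise_fraction=0.2, floor=1e-12):
    """Crude energy-based SNR estimate in dB (hypothetical estimator).

    The quietest `noise_fraction` of frames approximate the noise floor and the
    remaining frames approximate speech plus noise; `floor` avoids log(0).
    """
    powers = sorted(frame_powers(samples, frame_len))
    if len(powers) < 2:
        raise ValueError("need at least two frames")
    n_noise = max(1, int(len(powers) * noise_fraction))
    noise = sum(powers[:n_noise]) / n_noise
    speech = sum(powers[n_noise:]) / (len(powers) - n_noise)
    return 10.0 * math.log10(max(speech, floor) / max(noise, floor))

def snr_to_weight(snr_db, lo=0.0, hi=30.0):
    """Hypothetical linear map from SNR (dB) to a [0, 1] reliability weight."""
    return min(1.0, max(0.0, (snr_db - lo) / (hi - lo)))
```

In a quality-guided system, a weight like this could scale a trial's score or its contribution to fusion. How the paper actually injects quality is described only at the level of the abstract.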

2.

Hybridized estimations of support vector machine free parameters C and γ using a fuzzy learning strategy for microphone array-based speaker recognition in a Kinect sensor-deployed environment

Ding Ing-Jr, Shi Jia-Yi 《Multimedia Tools and Applications》2017,76(23):25297-25319

The support vector machine (SVM) is a popular classification model for speaker verification. However, although the SVM is well suited to classifying speakers, choosing values for its free parameters C and γ remains a challenging technical problem. An improper value set for the free parameter pair (C, γ) can lead to unsatisfactory recognition accuracy in speaker verification. Moreover, the sound source localization information of the collected acoustic data has a large effect on the recognition performance of SVM speaker verification. In response, this study developed a sound source localization-driven fuzzy scheme to help determine the optimal value set of (C, γ) for building an SVM model. Specifically, the scheme uses time difference of arrival (TDOA) estimates derived from the Kinect microphone array, which carry both the angle and the distance information of the speaker's acoustic data, to calculate the value set of the SVM free parameters C and γ. Experiments demonstrated that speaker verification using an SVM with a properly estimated parameter pair (C, γ) achieves a higher recognition rate than one using an arbitrarily chosen value set for (C, γ).
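The TDOA information mentioned above can be illustrated with a plain cross-correlation estimate between two microphone channels, converted to a far-field arrival angle. This is a minimal sketch that assumes a two-microphone pair with known spacing.

- The sampling rate, the spacing, the speed of sound (343 m/s) and the function names are illustrative assumptions.
- The paper's Kinect processing and its fuzzy mapping from (angle, distance) to (C, γ) are not reproduced.
- Practical systems often use a weighted variant such as GCC-PHAT instead of plain cross-correlation.

```python
import math

def estimate_tdoa_samples(x, y, max_lag):
    """Lag (in samples) that maximises the cross-correlation sum_n x[n] * y[n + lag].

    A positive result means the signal reaches microphone y later than microphone x.
    """
    best_lag, best_val = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        val = sum(x[n] * y[n + lag] for n in range(len(x)) if 0 <= n + lag < len(y))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag

def doa_degrees(lag, fs, mic_distance, c=343.0):
    """Far-field direction of arrival from sin(theta) = c * tau / d, clamped to [-1, 1]."""
    s = c * (lag / fs) / mic_distance
    return math.degrees(math.asin(max(-1.0, min(1.0, s))))
```

A single microphone pair yields only an angle. Estimating distance as well, as the abstract describes, needs more microphones or additional modelling.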

3.

Speech synthesis with face embeddings

Wu Xing, Ji Sihui, Wang Jianjia, Guo Yike 《Applied Intelligence》2022,52(13):14839-14852

Human beings are capable of imagining a person's voice from his or her appearance because different people have different voice characteristics. Although researchers have made great progress in single-view speech synthesis, there are few studies on multi-view speech synthesis, especially speech synthesis from face images. On the basis of the implicit relationship between a speaker's face image and his or her voice, we propose a multi-view speech synthesis method called SSFE (Speech Synthesis with Face Embeddings). SSFE consists of three parts: a voice encoder, a face encoder and an improved multi-speaker text-to-speech (TTS) engine. The voice encoder generates voice embeddings from the speaker's speech, and the face encoder extracts voice features from the speaker's face as f-voice embeddings. The multi-speaker TTS engine then synthesizes speech from the voice embeddings and f-voice embeddings. We conducted extensive experiments to evaluate SSFE in terms of synthesized speech quality and face-voice matching degree: the Mean Opinion Score of SSFE is above 3.7 and the matching degree is about 1.7. The experimental results show that SSFE outperforms state-of-the-art methods on synthesized speech in terms of both speech quality and face-voice matching degree.

4.

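Detection of speech tampering by voice conversion

Ding Qi, Ping Xijian 《Journal of Data Acquisition and Processing》2012,27(1):57-62

An automatic detection method is proposed for speech tampering carried out with voice conversion techniques. The method first analyses the basic voice conversion model and the distortion in converted speech, and on that basis extracts the vocal tract parameters of the speech signal together with related signal statistics. Support vector machine recursive feature elimination (SVM-RFE) then selects the features most sensitive to voice conversion as classification features. Support vector machines use these features both to detect voice conversion and to identify the speaker gender of converted speech. Experiments on one voice conversion software package show that the method achieves high detection accuracy: the average accuracy is 94.90% for voice conversion detection and 92.09% for speaker gender identification of converted speech.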
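The feature selection step relies on SVM recursive feature elimination (SVM-RFE). The loop trains a linear SVM, discards the feature with the smallest squared weight, and repeats. The sketch below shows that loop under stated assumptions. The linear SVM is a toy trained by full-batch subgradient descent, and its hyperparameters (`lam`, `lr`, `epochs`) are hypothetical. The paper's vocal tract features and SVM implementation are not specified here.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Toy linear soft-margin SVM: full-batch subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))). Labels must be +1/-1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge term is active: add its subgradient
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def svm_rfe(X, y, n_keep):
    """SVM-RFE: retrain on the active features, drop the one with the smallest
    squared weight, repeat until `n_keep` remain. Returns original feature indices."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        w, _ = train_linear_svm([[xi[j] for j in active] for xi in X], y)
        weakest = min(range(len(active)), key=lambda k: w[k] ** 2)
        del active[weakest]
    return active
```

In practice an off-the-shelf linear SVM would replace `train_linear_svm`; the elimination loop itself stays the same.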

5.

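A Uyghur telephone speech monitoring system based on GMM-UBM/SVM

Li Xiaoyang, Yi Dawa, Wushouer Silamu, Yoshinori Sagisaka 《Computer Applications and Software》2012,(1):46-48,77

This paper discusses a telephone speech monitoring system based on GMM-UBM/SVM. The GMM is a common approach in speaker recognition systems. However, monitored utterances are short, and telephone-Internet terminals and transmission lines add heavy background noise; both factors degrade GMM recognition accuracy. The system therefore combines the robustness of the GMM with the SVM's strong classification of small amounts of static data, and its performance is studied on Uyghur speech. For comparison, vector quantization (VQ) distance, weighted vector quantization (WVQ) distance and a baseline system are also evaluated. The training set covers 50 target speakers with 20 s of speech each. On 10 s test utterances, the proposed method improves the recognition rate by 20.2% over VQ and 16.7% over WVQ.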
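GMM-UBM scoring compares how well a speaker model and a universal background model (UBM) explain the test frames. The sketch below computes the average per-frame log-likelihood ratio for diagonal-covariance GMMs. The tiny one-dimensional models used to exercise it are hypothetical. Two stages of a real system are omitted: the MAP adaptation that normally derives the speaker model from the UBM, and the supervector SVM stage.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance `var`."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def gmm_loglik(x, weights, means, variances):
    """log sum_c w_c N(x; mu_c, Sigma_c), using log-sum-exp for numerical stability."""
    terms = [math.log(w) + log_gauss_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def llr_score(frames, speaker_gmm, ubm):
    """Average per-frame log-likelihood ratio log p(x|speaker) - log p(x|UBM).

    Each model is a (weights, means, variances) tuple.
    """
    return sum(gmm_loglik(f, *speaker_gmm) - gmm_loglik(f, *ubm) for f in frames) / len(frames)
```

A positive score means the frames fit the speaker model better than the background model. A decision threshold on this score is tuned on development data.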

6.

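An SVM-based parameter selection method for speaker recognition

Xu Chen, Cao Hui, Zhao Xiao 《Computer Engineering》2012,38(21):175-177

To address the high computational complexity of support vector machines (SVMs), the speech data are preprocessed with normalization and a principal component analysis (PCA) transform, and K-fold cross-validation combined with grid search is applied to speech recognition. The results show that, compared with genetic algorithms and particle swarm optimization, the method effectively improves the efficiency of SVM parameter optimization while keeping the recognition rate essentially unchanged.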
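The parameter search described above, K-fold cross-validation combined with a grid over (C, γ), can be sketched as follows. This illustrates the search procedure only and is not the paper's code.

- The SVM itself is left as a hypothetical `fit_predict` callable.
- Z-score normalisation is fitted on each training fold only.
- The PCA step is omitted.

```python
import itertools

def k_fold_indices(n, k):
    """Split range(n) into k interleaved folds (sample i goes to fold i % k)."""
    return [list(range(i, n, k)) for i in range(k)]

def zscore_fit(X):
    """Per-feature mean and standard deviation (a zero std is replaced by 1)."""
    d = len(X[0])
    means = [sum(row[j] for row in X) / len(X) for j in range(d)]
    stds = [(sum((row[j] - means[j]) ** 2 for row in X) / len(X)) ** 0.5 or 1.0
            for j in range(d)]
    return means, stds

def zscore_apply(X, means, stds):
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

def grid_search_cv(X, y, Cs, gammas, k, fit_predict):
    """Return (best_C, best_gamma, best_mean_accuracy) over the (C, gamma) grid.

    `fit_predict(train_X, train_y, test_X, C, gamma)` is a placeholder for any SVM
    trainer that returns predicted labels for test_X. Normalisation statistics come
    from the training fold only, so the held-out fold stays unseen.
    """
    folds = k_fold_indices(len(X), k)
    best = None
    for C, gamma in itertools.product(Cs, gammas):
        accs = []
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in range(len(X)) if i not in held_out]
            means, stds = zscore_fit([X[i] for i in train_idx])
            train_X = zscore_apply([X[i] for i in train_idx], means, stds)
            test_X = zscore_apply([X[i] for i in test_idx], means, stds)
            pred = fit_predict(train_X, [y[i] for i in train_idx], test_X, C, gamma)
            accs.append(sum(p == y[i] for p, i in zip(pred, test_idx)) / len(test_idx))
        mean_acc = sum(accs) / len(accs)
        if best is None or mean_acc > best[2]:
            best = (C, gamma, mean_acc)
    return best
```

The LIBSVM practical guide recommends exponentially spaced grids, such as C ∈ {2^-5, 2^-3, …, 2^15} and γ ∈ {2^-15, 2^-13, …, 2^3}. In Python these are `[2 ** e for e in range(-5, 16, 2)]` and `[2 ** e for e in range(-15, 4, 2)]`.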

7.

The persuasiveness of synthetic speech versus human speech

Stern SE, Mullennix JW, Dyson C, Wilson SJ 《Human factors》1999,41(4):588-595

Is computer-synthesized speech as persuasive as the human voice when presenting an argument? After completing an attitude pretest, 193 participants were randomly assigned to listen to a persuasive appeal under three conditions: a high-quality synthesized speech system (DECtalk Express), a low-quality synthesized speech system (Monologue), and a tape recording of a human voice. Following the appeal, participants completed a posttest attitude survey and a series of questionnaires designed to assess perceptions of speech qualities, perceptions of the speaker, and perceptions of the message. The human voice was generally perceived more favorably than the computer-synthesized voice, and the speaker was perceived more favorably when the voice was a human voice than when it was computer synthesized. There was, however, no evidence that computerized speech, as compared with the human voice, affected persuasion or perceptions of the message. Actual or potential applications of this research include issues that should be considered when designing synthetic speech systems.

8.

Automatic speaker age and gender recognition using acoustic and prosodic level information fusion

Ming Li, Kyu J. Han, Shrikanth Narayanan 《Computer Speech and Language》2013,27(1):151-167

The paper presents a novel automatic speaker age and gender identification approach that combines seven different methods at both the acoustic and prosodic levels to improve on baseline performance. The three baseline subsystems are (1) a Gaussian mixture model (GMM) based on mel-frequency cepstral coefficient (MFCC) features, (2) a support vector machine (SVM) based on GMM mean supervectors and (3) an SVM based on 450-dimensional utterance-level features including acoustic, prosodic and voice quality information. In addition, we propose four subsystems: (1) an SVM based on UBM weight posterior probability supervectors using the Bhattacharyya probability product kernel, (2) sparse representation based on UBM weight posterior probability supervectors, (3) an SVM based on GMM maximum likelihood linear regression (MLLR) matrix supervectors and (4) an SVM based on the polynomial expansion coefficients of the syllable-level prosodic feature contours in voiced speech segments. Contours of pitch, time-domain energy, frequency-domain harmonic structure energy and formants for each syllable (segmented using energy information in the voiced speech segment) are analysed in subsystem (4). The four proposed subsystems are shown to be effective and to achieve competitive results in classifying different age and gender groups. To further improve overall classification performance, weighted-summation fusion of the seven subsystems at the score level is demonstrated. Experimental results are reported on the development and test sets of the 2010 Interspeech Paralinguistic Challenge aGender database. Compared with SVM baseline system (3), which is the baseline suggested by the challenge committee, the proposed fusion system achieves an absolute improvement in unweighted accuracy of 5.6% for the age task and 4.2% for the gender task on the development set. On the final test set, we obtain absolute improvements of 3.1% and 3.8%, respectively.
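Weighted-summation score fusion and the unweighted accuracy (UA) metric mentioned above are both short computations. In this sketch the subsystem scores and weights are hypothetical; in practice the weights would be tuned on the development set. UA is the mean of the per-class recalls, so a small class counts as much as a large one.

```python
def weighted_sum_fusion(subsystem_scores, weights):
    """Score-level fusion: fused[t] = sum_k weights[k] * subsystem_scores[k][t]."""
    if len(subsystem_scores) != len(weights):
        raise ValueError("need one weight per subsystem")
    return [sum(w * s for w, s in zip(weights, trial)) for trial in zip(*subsystem_scores)]

def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally regardless of its size."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)
```

For example, take a two-class set split 3:1 and predict the majority class for every sample. Plain accuracy is 75%, but UA is only 50%.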

9.

On robustness of speech based biometric systems against voice conversion attack

《Applied Soft Computing》2015

Voice conversion (VC), which morphs the voice of a source speaker so that it is perceived as spoken by a specified target speaker, can be used intentionally to deceive speaker identification (SID) and speaker verification (SV) systems that use speech biometrics. Voice conversion spoofing attacks that imitate a particular speaker pose a potential threat to these kinds of systems. In this paper, we first present an experimental study to evaluate the robustness of such systems against voice conversion disguise. For this we use Gaussian mixture model (GMM) based SID systems, GMM with universal background model (GMM-UBM) based SV systems and GMM supervector with support vector machine (GMM-SVM) based SV systems. Voice conversion is performed with three different techniques: a GMM-based VC technique, a weighted frequency warping (WFW) based conversion method, and its variant in which energy correction is disabled (WFW⁻). The evaluation uses intra-gender and cross-gender voice conversions between fifty male and fifty female speakers taken from the TIMIT database. The effect is measured as the degradation in the percentage of correct identification (POC) score in the SID systems and the degradation in equal error rate (EER) in all SV systems. Experimental results show that GMM-SVM SV systems are more resilient against voice conversion spoofing attacks than GMM-UBM SV systems. All SID and SV systems are most vulnerable to GMM-based conversion, compared with WFW and WFW⁻ based conversion. In general terms, all SID and SV systems are also slightly more robust to voices converted through cross-gender conversion than through intra-gender conversion. The study is further extended to examine the relationship between the VC objective score and SV system performance on the CMU ARCTIC database, which is a parallel corpus. The results of this experiment show an approach to quantifying the objective score of voice conversion in a way that can be related to its ability to spoof an SV system.
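The equal error rate (EER) used above is the operating point where the false acceptance rate (FAR) equals the false rejection rate (FRR). The sketch below is a minimal version that assumes higher scores mean "same speaker". It sweeps only the observed scores as thresholds, so on small sets the result is an approximation. For a spoofing evaluation, one common approach is to score converted speech against the target model and treat those trials as impostor (nontarget) trials.

```python
def compute_eer(target_scores, nontarget_scores):
    """Approximate EER by sweeping every observed score as a threshold.

    A trial is accepted when score >= threshold. Returns the mean of FAR and FRR
    at the threshold where the two rates are closest.
    """
    best_gap, eer = float("inf"), None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

Comparing the EER with and without converted-speech trials in the nontarget list gives one measure of how much an attack degrades a system.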

10.

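Voice conversion using the STRAIGHT model and deep belief networks

Wang Min, Su Libo, Wang Zhihui, Yao Chenhong 《Computer Engineering and Science》2016,38(9):1950-1954

A voice conversion method combining the STRAIGHT model with deep belief networks (DBNs) is proposed. First, the STRAIGHT model extracts the spectral parameters of the source and target speakers' speech. These parameters are used to train two separate DBNs, which capture each speaker's characteristics in a high-order feature space. Next, an artificial neural network (ANN) connects the two high-order feature spaces and performs the feature conversion. Finally, the DBN trained on the target speaker's data maps the converted features back to spectral parameters, and STRAIGHT synthesizes speech with the target speaker's characteristics. Experimental results show that this approach gives better voice conversion than the traditional GMM-based approach, with converted speech closer to the target speech in both quality and similarity.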

11.

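Application of least squares support vector machines to speaker recognition

Dan Zhiping, Zheng Sheng 《Microcomputer Development》2007,17(5):30-32

Speaker recognition is a branch of speech recognition and currently an active research topic. The support vector machine (SVM), based on statistical learning theory, is a new machine learning algorithm and has become a focus of machine learning research. This paper studies speaker recognition with an improved SVM, the least squares support vector machine (LS-SVM). The study shows that LS-SVM-based speaker recognition has lower computational complexity and higher efficiency than conventional SVM-based speaker recognition, and adapts well to the speaker recognition task.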
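LS-SVM (Suykens' least squares SVM) replaces the SVM's quadratic program with a single linear system, which is where the lower computational cost claimed above comes from. The sketch assumes an RBF kernel. The regularisation constant `gamma` (unrelated to the RBF γ) and the kernel width `sigma2` take hypothetical values. The system is solved with plain Gaussian elimination, and this is not the paper's implementation.

```python
import math

def rbf(a, b, sigma2):
    """Gaussian kernel exp(-||a - b||^2 / sigma2)."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / sigma2)

def solve_linear(A, rhs):
    """Gaussian elimination with partial pivoting (A must be square and non-singular)."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def lssvm_train(X, y, gamma=10.0, sigma2=1.0):
    """Solve [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1],
    where Omega[i][j] = y_i * y_j * K(x_i, x_j). Labels must be +1/-1."""
    n = len(X)
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        A[0][i + 1] = A[i + 1][0] = float(y[i])
        for j in range(n):
            A[i + 1][j + 1] = y[i] * y[j] * rbf(X[i], X[j], sigma2)
        A[i + 1][i + 1] += 1.0 / gamma
    sol = solve_linear(A, [0.0] + [1.0] * n)
    return {"b": sol[0], "alpha": sol[1:], "X": X, "y": y, "sigma2": sigma2}

def lssvm_predict(model, x):
    """Decision rule sign(sum_k alpha_k * y_k * K(x, x_k) + b)."""
    s = sum(a * yk * rbf(xk, x, model["sigma2"])
            for a, yk, xk in zip(model["alpha"], model["y"], model["X"]))
    return 1 if s + model["b"] >= 0 else -1
```

The trade-off is that LS-SVM solutions are generally not sparse: every training vector keeps a nonzero weight, unlike the support vectors of a standard SVM.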

12.

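An expressway SO₂ concentration prediction algorithm based on fuzzy time series and support vector machines

Yue Pengcheng, Zhang Linliang, Ma Yuejun 《Computer Systems & Applications》2017,26(6):1-8

Existing SO₂ concentration prediction methods suffer from three problems: inconsistent understanding of pollutant sources and influencing factors, sensitivity to small-sample data, and a tendency to fall into local optima. To address them, this paper proposes an expressway SO₂ concentration prediction algorithm based on fuzzy time series and support vector machines, providing a sound theoretical basis for building an expressway environmental health monitoring system. The method follows the seasonal variation of SO₂ concentration, using seasons as the time series and a 24 h granulation window. It extracts feature values from the raw sample data with a Gaussian kernel function and feeds them into a support vector machine for training. The model parameters are optimized with k-fold cross-validation combined with grid search. An SO₂ concentration prediction model was built with this method. The sample data were hourly SO₂ concentrations measured from April 2014 to March 2015 at a monitoring site on the Taijiu Expressway in Shanxi Province, and the computation was implemented with the LIBSVM toolbox on the MATLAB platform. The results show that the algorithm is not constrained by mechanistic theoretical studies, supports small-sample learning, fits nonlinear relationships well and generalizes well.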

13.

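A common-vector-based speaker recognition algorithm for abnormal speech

He Jun, He Qianhua, Zhang Qinghua, Sun Guoxi, Xiao Ming, Zuo Jinglong 《Computer Engineering and Science》2014,36(8):1599-1603

To overcome the shortcomings of computing common vectors with predefined parameters, a common-vector-based speaker recognition algorithm for abnormal speech is proposed. First, the parameters used to compute the common vectors are adjusted adaptively according to the system recognition rate. The parameters that give the highest recognition rate are then taken as optimal and used to extract common vectors for the test speech, and an SVM classifier performs speaker classification of the abnormal speech. Experimental results show that, with the common vectors extracted by this algorithm, the speaker recognition rate on speech affected by a mild cold is 85.4%. This is 16.9% higher than a GMM algorithm with unprocessed features, 15.2% higher than an SVM, and 3.2% higher than a GMM algorithm combined with common vectors.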
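The common vector of a class is what remains of its vectors once every direction in which they differ from one another is removed. The sketch below implements the classical common vector approach for the case with fewer vectors than dimensions, applying Gram-Schmidt to the difference vectors. The paper's adaptive parameter selection and SVM stage are not reproduced, and the sample vectors are hypothetical.

```python
import math

def common_vector(vectors, tol=1e-10):
    """Classical common vector (insufficient-data case: fewer vectors than dimensions).

    The difference vectors b_k = a_k - a_0 span the difference subspace; the common
    vector is a_0 minus its projection onto that subspace.
    """
    a0 = vectors[0]
    basis = []  # orthonormal basis of the difference subspace, built by Gram-Schmidt
    for a in vectors[1:]:
        v = [ai - a0i for ai, a0i in zip(a, a0)]
        for u in basis:
            dot = sum(vi * ui for vi, ui in zip(v, u))
            v = [vi - dot * ui for vi, ui in zip(v, u)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > tol:  # skip difference vectors that add no new direction
            basis.append([vi / norm for vi in v])
    common = list(a0)
    for u in basis:
        dot = sum(ci * ui for ci, ui in zip(common, u))
        common = [ci - dot * ui for ci, ui in zip(common, u)]
    return common
```

Any member of the class projects to the same common vector, which is why it can serve as a class-invariant representation.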

14.

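A survey of support vector machines and their applications

Qi Hengnian 《Computer Engineering》2004,30(10):6-9

Based on an analysis of the principles of support vector machines, this paper reviews research on SVM applications in several areas: face detection, verification and recognition; speaker and speech recognition; character and handwriting recognition; image processing; and others. It also discusses the advantages and shortcomings of SVMs and outlines the prospects for their application.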

15.

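Voice conversion based on i-vectors and a variational autoencoding relativistic generative adversarial network

Li Yanping, Cao Pan, Zuo Yutao, Zhang Yan, Qian Bo 《Acta Automatica Sinica》2022,48(7):1824-1833

A voice conversion method based on i-vectors and a variational autoencoding relativistic generative adversarial network is proposed. It achieves high-quality many-to-many voice conversion under non-parallel text conditions. A well-performing voice conversion system must both preserve the naturalness of the reconstructed speech and render the speaker identity of the converted speech accurately.

First, to improve the naturalness of the synthesized speech, the Wasserstein GAN in the variational autoencoding Wasserstein GAN model is replaced with a relativistic GAN, which has better generative performance. A relativistic discriminator is constructed so that its output depends on the relative value between real and generated samples. This overcomes the unstable performance and slow convergence of the Wasserstein GAN. Second, to improve the speaker similarity of the converted speech, i-vectors rich in speaker-specific information are introduced in the decoding stage so that the speaker's characteristics are fully learned.

Objective and subjective experiments compare the method with the baseline model. The average mel-cepstral distortion of the converted speech decreases by 4.80%, the mean opinion score increases by 5.12%, and the ABX score increases by 8.60%. These results confirm significant improvements in both naturalness and speaker similarity, achieving high-quality voice conversion.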
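The relativistic discriminator described above scores how much more realistic a real sample looks than a generated one. As an assumption, the sketch uses the relativistic standard GAN (RSGAN) losses of Jolicoeur-Martineau on paired critic outputs. The abstract does not say which relativistic variant the paper uses, and the critic values here are plain numbers rather than network outputs.

```python
import math

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def rsgan_d_loss(c_real, c_fake):
    """Discriminator loss: mean of -log(sigmoid(C(x_r) - C(x_f))),
    computed as the mean of softplus(C(x_f) - C(x_r))."""
    return sum(softplus(f - r) for r, f in zip(c_real, c_fake)) / len(c_real)

def rsgan_g_loss(c_real, c_fake):
    """Generator loss: mean of -log(sigmoid(C(x_f) - C(x_r)))."""
    return sum(softplus(r - f) for r, f in zip(c_real, c_fake)) / len(c_real)
```

Only the difference C(x_r) - C(x_f) enters the loss, so shifting both critic outputs by the same constant leaves it unchanged. This is the "relative value" property the abstract describes.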

16.

Efficient speaker identification using spectral entropy

Luque-Suárez Fernando, Camarena-Ibarrola Antonio, Chávez Edgar 《Multimedia Tools and Applications》2019,78(12):16803-16815

In voice recognition, the two main problems are speech recognition (what was said) and speaker recognition (who was speaking). The usual method for speaker recognition is to postulate a model in which the speaker identity corresponds to the model parameters, whose estimation can be time-consuming when the number of candidate speakers is large. In this paper, we model the speaker as a high-dimensional point cloud of entropy-based features extracted from the speech signal. The method allows indexing, and hence it can manage large databases. We experimentally assessed the quality of the identification with a publicly available database formed by extracting audio from a collection of YouTube videos of 1,000 different speakers. With 20-second audio excerpts, we were able to identify a speaker with 97% accuracy when the recording environment is not controlled, and with 99% accuracy for controlled recording environments.
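Spectral entropy measures how evenly a frame's energy is spread across frequency. A pure tone gives entropy near zero, and a flat spectrum gives the maximum. The sketch computes it for one frame with a direct DFT; a real system would use an FFT and probably sub-band variants. The paper's frame size and exact feature set are not given in the abstract, so this only illustrates the kind of entropy feature that could populate a speaker's point cloud.

```python
import cmath
import math

def power_spectrum(frame):
    """One-sided power spectrum |X[k]|^2 for k = 0..N/2, via a direct O(N^2) DFT."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))) ** 2
        for k in range(n // 2 + 1)
    ]

def spectral_entropy(frame):
    """Shannon entropy (in bits) of the frame's power spectrum normalised to sum to 1."""
    spec = power_spectrum(frame)
    total = sum(spec)
    if total == 0:
        return 0.0  # silent frame: define entropy as 0
    return -sum(p * math.log2(p) for p in (s / total for s in spec) if p > 0)
```

For a 16-sample frame there are 9 one-sided bins, so entropy ranges from 0 (one bin) to log2(9), about 3.17 bits (flat spectrum, e.g. a single impulse).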

17.

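Speaker identification based on fuzzy kernel clustering and SVM

Zhang Na, Kang Junxian, Wang Feng, Wang Xiang, Sun Feng 《Digital Community & Smart Home》2007,(19)

As a new statistical learning method, the support vector machine has been widely used in speaker recognition. Training support vector machines on large samples is time-consuming in speaker identification. To address this, the paper reduces the speech parameters by fuzzy kernel clustering and selects the parameters on the cluster boundaries as support vectors. This reduces the SVM training load without affecting the recognition rate. Experiments verify the effectiveness of the method.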
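The idea of keeping only boundary vectors can be illustrated with kernel-space distances: vectors far from their cluster centre in the RBF feature space are the likeliest support vector candidates. This is a simplified stand-in that assumes one hard cluster per speaker instead of fuzzy kernel c-means. The parameters `keep_fraction` and `sigma2` are hypothetical.

```python
import math

def rbf(a, b, sigma2=1.0):
    """Gaussian kernel exp(-||a - b||^2 / sigma2)."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / sigma2)

def select_boundary(cluster, keep_fraction=0.3, sigma2=1.0):
    """Keep the `keep_fraction` of vectors farthest from the cluster centre in kernel space.

    Uses the kernel trick:
    ||phi(x) - m||^2 = K(x, x) - 2/n * sum_i K(x, x_i) + 1/n^2 * sum_ij K(x_i, x_j).
    """
    n = len(cluster)
    k_cc = sum(rbf(a, b, sigma2) for a in cluster for b in cluster) / (n * n)

    def dist_to_centre(x):
        return rbf(x, x, sigma2) - 2.0 * sum(rbf(x, c, sigma2) for c in cluster) / n + k_cc

    ranked = sorted(cluster, key=dist_to_centre, reverse=True)
    return ranked[:max(1, int(n * keep_fraction))]
```

Training the SVM only on the selected vectors shrinks the training set. The abstract reports that the full method does not affect the recognition rate; this simplified version has not been checked against that claim.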

18.

On the study of replay and voice conversion attacks to text-dependent speaker verification

Zhizheng Wu, Haizhou Li 《Multimedia Tools and Applications》2016,75(9):5311-5327

Automatic speaker verification (ASV) aims to automatically accept or reject a claimed identity based on a speech sample. Recently, individual studies have confirmed the vulnerability of state-of-the-art text-independent ASV systems to replay, speech synthesis and voice conversion attacks on various databases. However, the behaviour of text-dependent ASV systems in the face of various spoofing attacks has not been systematically assessed. In this work, we first conduct a systematic analysis of the vulnerability of text-dependent ASV systems to replay and voice conversion attacks using the same protocol and database, in particular the RSR2015 database, which represents mobile-device-quality speech. We then analyse the interplay of voice conversion and speaker verification by linking voice conversion objective evaluation measures with speaker verification error rates, to examine the vulnerabilities from the perspective of voice conversion.

19.

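A LabVIEW-based voice identity authentication system

Tang Fuqian, Wang Yaming, Zheng Junbao 《Industrial Control Computer》2011,24(12):22-23

A voice identity authentication system was designed on the LabVIEW 2009 development platform. An improved mel-frequency cepstral coefficient method extracts features from the speech signal, and a vector quantization model performs recognition, giving text- and gender-independent voiceprint recognition. Experimental results show that the system can effectively overcome the effects of environmental noise and variation in the speaker's voice.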
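Vector quantization speaker recognition trains one codebook per speaker and picks the speaker whose codebook quantizes the test frames with the least average distortion. In this sketch the frames are assumed to be precomputed feature vectors, for example MFCCs. The codebooks come from plain k-means initialised with the first k vectors rather than the LBG splitting algorithm. All data and names are illustrative.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_codebook(vectors, k, iters=20):
    """k-means codebook: first k vectors as initial centroids (LBG splitting omitted)."""
    codebook = [list(v) for v in vectors[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda j: sq_dist(v, codebook[j]))
            clusters[nearest].append(v)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        codebook = [
            [sum(col) / len(members) for col in zip(*members)] if members else codebook[j]
            for j, members in enumerate(clusters)
        ]
    return codebook

def avg_distortion(frames, codebook):
    """Mean squared distance from each frame to its nearest codeword."""
    return sum(min(sq_dist(f, c) for c in codebook) for f in frames) / len(frames)

def identify(frames, codebooks):
    """Return the enrolled speaker whose codebook gives the lowest average distortion."""
    return min(codebooks, key=lambda spk: avg_distortion(frames, codebooks[spk]))
```

For verification rather than identification, the average distortion against the claimed speaker's codebook would be compared with a threshold.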

20.

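Speaker identification based on fuzzy kernel clustering and SVM

Zhang Na, Kang Junxian, Wang Feng, Wang Xiang, Sun Feng 《Digital Community & Smart Home》2007,(10):227-228,287

As a new statistical learning method, the support vector machine has been widely used in speaker recognition. Training support vector machines on large samples is time-consuming in speaker identification. To address this, the paper reduces the speech parameters by fuzzy kernel clustering and selects the parameters on the cluster boundaries as support vectors. This reduces the SVM training load without affecting the recognition rate. Experiments verify the effectiveness of the method.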