Similar Documents
19 similar documents found (search time: 109 ms)
1.
To address the limitations of single-channel biometric recognition, this paper proposes a biometric recognition model based on the fusion of face and voice, built on information fusion and implemented at the feature level. Principal component analysis (PCA) is used to extract features from face images, and Fisher discriminant analysis is used to reduce the dimensionality of the speaker features. In addition, a PSO-based multi-swarm cooperative optimization (PSCO) method is proposed and used to train an SVM, realizing a hybrid face-and-voice authentication system. Experimental results show that the method achieves good recognition performance.
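The feature-level fusion described above can be sketched in a few lines: project each modality with PCA, then concatenate the reduced vectors per sample. This is a minimal numpy illustration under toy dimensions; the PSCO-trained SVM classifier is not reproduced, and all data here are random stand-ins.

```python
import numpy as np

def pca_project(X, k):
    """Project rows of X onto their top-k principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt: directions
    return Xc @ Vt[:k].T

rng = np.random.default_rng(0)
face = rng.normal(size=(6, 50))    # 6 samples of 50-dim face features
voice = rng.normal(size=(6, 20))   # 6 samples of 20-dim voice features

# Feature-level fusion: reduce each modality, then concatenate per sample
fused = np.hstack([pca_project(face, 4), pca_project(voice, 3)])
print(fused.shape)  # (6, 7)
```

The fused vectors would then be fed to whatever classifier the system uses (here, the abstract's PSCO-trained SVM).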

2.
To address the low recognition rate of single-biometric identification caused by sensor noise and feature corruption, a multi-biometric identification method based on voice and face is proposed from the perspective of information fusion. Voice features and face features are extracted separately as the basis for recognition, and a neural network performs fusion and recognition at the feature level. Experiments show that, under the same conditions, this method achieves a higher recognition rate than single-biometric identification.

3.
孙文静  李士强 《计算机科学》2010,37(12):209-210
This paper analyzes time-domain audio features and their extraction methods, studies the workflow and architecture of an SVM-based speech classification system together with the design of the SVM speech classifier, and reports related experiments. The results show that the SVM-based audio classification system classifies audio effectively, with an average recognition accuracy above 90%.
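The time-domain features such a system is built on can be illustrated with two classics, short-time energy and zero-crossing rate. A minimal sketch on synthetic signals (the frame sizes and test signals are assumptions, not the paper's setup):

```python
import numpy as np

def frame_features(signal, frame_len=256, hop=128):
    """Per-frame short-time energy and zero-crossing rate."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = float(np.mean(frame ** 2))
        zcr = float(np.mean(np.abs(np.diff(np.sign(frame))) > 0))
        feats.append((energy, zcr))
    return np.array(feats)

t = np.linspace(0, 1, 4096, endpoint=False)
tone = np.sin(2 * np.pi * 200 * t)                             # periodic, voiced-like
noise = np.random.default_rng(1).normal(scale=0.5, size=4096)  # noise-like

f_tone, f_noise = frame_features(tone), frame_features(noise)
print(f_tone[:, 1].mean() < f_noise[:, 1].mean())  # noise crosses zero more often
```

Per-frame feature vectors like these would form the input to the SVM classifier.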

4.
Traditional spoken-document classification systems are usually built on text transcribed by a speech recognition system, so recognition errors severely degrade their performance. Although fusing speech with the recognized text can partly mitigate these errors, most fusion schemes operate at the representation-vector level and do not fully exploit the complementarity between acoustic and semantic information. This paper proposes a neural-network spoken-document classifier that fuses acoustic features with deep features. During training, a pre-trained acoustic model first extracts deep features carrying semantic information for each spoken document; the document's acoustic features and deep features are then fused frame by frame through a gating mechanism, and the fused features are used for classification. Experiments on a broadcast-news speech corpus show that the proposed system clearly outperforms spoken-document classifiers based on speech-text fusion, reaching a final classification accuracy of 97.27%.
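The frame-wise gating step can be sketched as follows. The gate weights here are random stand-ins rather than trained parameters, and the feature dimensions are toy assumptions:

```python
import numpy as np

def gated_fusion(acoustic, deep, W_g, b_g):
    """Frame-wise gated fusion: a sigmoid gate decides, per frame and
    per dimension, how much of each feature stream to keep."""
    z = np.concatenate([acoustic, deep], axis=-1) @ W_g + b_g
    gate = 1.0 / (1.0 + np.exp(-z))            # values in (0, 1)
    return gate * acoustic + (1.0 - gate) * deep

rng = np.random.default_rng(0)
frames, dim = 100, 40
acoustic = rng.normal(size=(frames, dim))         # e.g. filterbank features
deep = rng.normal(size=(frames, dim))             # features from an acoustic model
W_g = rng.normal(scale=0.1, size=(2 * dim, dim))  # untrained stand-in weights
b_g = np.zeros(dim)

fused = gated_fusion(acoustic, deep, W_g, b_g)
print(fused.shape)  # (100, 40)
```

Because the gate is in (0, 1), each fused value is a convex combination of the two streams at that frame and dimension.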

5.
曹辉  曹礼刚  简兴祥 《计算机工程》2007,33(11):184-186
Traditional identification systems rely on a single biometric trait, and their performance often degrades sharply against complex backgrounds. Multi-biometric identification based on data fusion can improve the accuracy and other performance of a biometric system. This paper builds a face recognition subsystem using eigenfaces and a speaker recognition subsystem using vector quantization, and fuses the subsystem outputs with a neural network at the decision level. Experiments show that the method achieves a higher recognition rate than either subsystem alone, with a clear advantage in noisy environments.

6.
Feature extraction is a crucial step in supervised speech enhancement. Existing auditory features, such as combined features and multi-resolution features, are common acoustic features; speech enhanced with them gains considerably in intelligibility but still retains much residual noise, so speech quality (measured by SNR) remains low. To improve post-enhancement speech quality without hurting intelligibility, this paper proposes a composite feature based on autoencoder features. An autoencoder first extracts the autoencoder features; the Group Lasso algorithm then verifies their complementarity and redundancy with respect to the auditory features, and the features are recombined into a composite feature, which is used as the input feature of the speech enhancement system. Simulations on the TIMIT corpus and the Noisex-92 noise database show that, compared with traditional enhancement methods and with deep-learning methods that use the existing combined or multi-resolution features as input, the proposed algorithm yields a marked improvement in speech quality.

7.
To address the low level of intelligence and usability of smart-terminal control in traditional smart home systems, this paper designs a speech-recognition-based smart home control system consisting of a smart terminal, a master control center, and control nodes. After the hardware and software of the master control center and control nodes are designed, the system's image acquisition module collects home data; speech signals are then enhanced with an improved two-stage denoising method combining signal-subspace and Wiener filtering; 24-dimensional Mel-frequency cepstral coefficients are extracted as speech features; and finally a hidden Markov model (HMM) performs template training and pattern matching, realizing automatic voice control of the home. Experiments show that 789 of 800 test samples were correctly recognized, an average recognition rate of 98.6%. Under five different SNRs, the recognition rate stayed at or above 94%, reaching up to 97.4%. The system thus has good noise robustness, and the proposed speech recognition algorithm meets the system's demands for automated, intelligent voice control, which matters for practical products.
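The final HMM matching step can be illustrated with the forward algorithm, which scores an observation sequence under a model; recognition picks the template model with the highest score. A toy discrete HMM (the probabilities below are invented, not the system's trained models):

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the scaled forward algorithm."""
    alpha = pi * B[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        loglik += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return loglik

pi = np.array([0.6, 0.4])                 # initial state distribution
A = np.array([[0.7, 0.3], [0.4, 0.6]])    # state transition matrix
B = np.array([[0.9, 0.1], [0.2, 0.8]])    # emission probabilities
# Matching: score a quantized feature sequence against each template HMM
print(forward_loglik([0, 0, 1], pi, A, B))
```

In a full system the observations would be quantized MFCC frames and one HMM would be trained per command word.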

8.
This paper proposes a speech-driven visual speech synthesis system based on a two-layer codebook. Built on the idea of vector quantization, the system establishes a coarse coupling map from the acoustic feature space to the visual speech feature space. To strengthen the correlation between speech and visual speech, the sample data are automatically clustered twice, by acoustic similarity and by visual similarity respectively, to construct a two-layer mapping codebook that reflects both. In the data preprocessing stage, a joint feature model is proposed that captures both the geometric shape of visual speech and tooth visibility, and a genetic algorithm selects the speech features most relevant to visual speech from the LPCC and MFCC features. Comparison of the image data in the synthesized video with that in the original video shows that the synthesized result approximates the original data well.
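The vector-quantization mapping at the heart of such a system can be sketched as a nearest-codeword lookup from an audio codebook into a paired visual codebook (the codebooks below are toy stand-ins, not clustered from data):

```python
import numpy as np

def nearest_codeword(x, codebook):
    """Index of the codeword closest to x in Euclidean distance."""
    return int(np.argmin(np.linalg.norm(codebook - x, axis=1)))

# Paired toy codebooks: each audio codeword maps to a visual codeword
audio_codebook = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
visual_codebook = np.array([[10.0], [20.0], [30.0]])   # e.g. mouth-opening degree

frame = np.array([0.9, 1.2])                  # incoming audio feature vector
idx = nearest_codeword(frame, audio_codebook)
print(visual_codebook[idx])                   # the mapped visual parameter
```

The paper's two-layer codebook would replace this single lookup with clustering at both the acoustic and visual levels.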

9.
To improve the reconstruction accuracy of a compressed speech transmission system without increasing its spectral overhead, this paper proposes a speech compression, transmission, and reconstruction method aided by superimposed feature information. The method first extracts feature information from the sparse speech signal; the extracted features are superimposed on the compressed speech signal as a superimposed sequence for transmission; at the receiver, a feature-aided reconstruction algorithm recovers the speech. Analysis and simulation show that, compared with traditional compressed-sensing speech reconstruction, the method improves reconstruction accuracy at higher SNRs or lower compression rates, without adding spectral overhead to the transmission system.
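A standard compressed-sensing reconstruction baseline of the kind the proposed method is compared against is Orthogonal Matching Pursuit (OMP); a minimal sketch on a synthetic sparse signal (the measurement matrix, dimensions, and sparsity level are assumptions for illustration):

```python
import numpy as np

def omp(y, Phi, k):
    """Orthogonal Matching Pursuit: greedily recover a k-sparse x from y = Phi @ x."""
    residual, support = y.copy(), []
    for _ in range(k):
        # Pick the column most correlated with the current residual
        support.append(int(np.argmax(np.abs(Phi.T @ residual))))
        # Least-squares refit on the selected support
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x_hat = np.zeros(Phi.shape[1])
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(0)
Phi = rng.normal(size=(60, 100)) / np.sqrt(60)  # 60 random measurements of a 100-dim signal
x = np.zeros(100)
x[[5, 40, 77]] = [1.0, -2.0, 1.5]               # 3-sparse stand-in signal
y = Phi @ x                                      # compressed measurements

x_hat = omp(y, Phi, 3)
print(np.flatnonzero(x_hat))
```

The abstract's contribution is the side information superimposed on `y` to guide such a reconstruction; that aid is not modeled here.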

10.
Research on the Features of Chinese-Oriented Computer-Assisted Speech Learning Systems
Based on an analysis of the importance and characteristics of language and pronunciation learning and teaching, this paper discusses the many issues involved in applying speech processing technology to computer-assisted language and pronunciation learning or teaching. Considering the characteristics of Chinese speech, it studies the features that a CALL system oriented toward Chinese learning should have, and the principles its design and implementation should follow. Finally, it reports an experiment in Chinese pronunciation learning using the general-purpose tool 'Speech Analyzer'.

11.
With the wide deployment and application of 5G communication technology, the security of 5G communication systems has drawn broad attention. To address the difficulty of tracing a speaker's identity in 5G telephone calls when fraud is committed by imitating, impersonating, or denying the speaker's identity, this paper proposes a speaker-identity tracing scheme for 5G telecommunication. Before voice access, the speaker's biometric fingerprint is captured by a smartphone as identity information; during the call, digital watermarking embeds this biometric fingerprint into the speech signal, binding the speaker's identity to the speech. If the speaker identity of a voice recording is later disputed, the biometric fingerprint extracted from the speech data traces the speaker's identity in the remote call. The paper studies fingerprint extraction performance when speech carrying the embedded fingerprint is transmitted over an additive white Gaussian noise channel under 5G generalized frequency division multiplexing (GFDM) modulation. Simulations show that, in a realistic application environment, fingerprint extraction under 5G GFDM modulation outperforms that in a 4G system, giving the scheme potential value for preventing telephone fraud in next-generation mobile communications.
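The embed-then-extract idea can be sketched with a simple spread-spectrum watermark: identity bits modulate pseudo-random chip sequences added to the signal, and correlation recovers them. This is an illustrative stand-in only; the paper's actual watermarking scheme and GFDM transmission are not reproduced.

```python
import numpy as np

def embed(signal, bits, strength=0.1, seed=7):
    """Add one pseudo-random ±1 chip sequence per identity bit."""
    rng = np.random.default_rng(seed)
    chips = rng.choice([-1.0, 1.0], size=(len(bits), len(signal)))
    marked = signal.copy()
    for bit, chip in zip(bits, chips):
        marked += strength * (1.0 if bit else -1.0) * chip
    return marked, chips

def extract(marked, chips):
    """Correlate against each chip sequence; the sign recovers the bit."""
    return [bool(np.dot(marked, chip) > 0) for chip in chips]

rng = np.random.default_rng(0)
speech = rng.normal(size=8000)        # stand-in for a frame of speech samples
bits = [True, False, True, True]      # identity bits from the biometric fingerprint
marked, chips = embed(speech, bits)
print(extract(marked, chips))
```

Long chip sequences make the correlation term dominate the host-signal noise, which is what lets the bits survive channel distortion.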

12.
Image and Vision Computing, 2014, 32(12): 1147-1160
This paper examines the issue of face, speaker and bi-modal authentication in mobile environments when there is significant condition mismatch. We introduce this mismatch by enrolling client models on high quality biometric samples obtained on a laptop computer and authenticating them on lower quality biometric samples acquired with a mobile phone. To perform these experiments we develop three novel authentication protocols for the large publicly available MOBIO database. We evaluate state-of-the-art face, speaker and bi-modal authentication techniques and show that inter-session variability modelling using Gaussian mixture models provides a consistently robust system for face, speaker and bi-modal authentication. It is also shown that multi-algorithm fusion provides a consistent performance improvement for face, speaker and bi-modal authentication. Using this bi-modal multi-algorithm system we derive a state-of-the-art authentication system that obtains a half total error rate of 6.3% and 1.9% for Female and Male trials, respectively.

13.
Persons’ identification in TV broadcast is one of the main tools to index this type of videos. The classical way is to use biometric face and speaker models, but, to cover a decent number of persons, costly annotations are needed. Over the recent years, several works have proposed to use other sources of names for identifying people, such as pronounced names and written names. The main idea is to form face/speaker clusters based on their similarities and to propagate these names onto clusters. In this paper, we propose a method to take advantage of written names during the diarization process, in order to both name clusters and prevent the fusion of two clusters named differently. First, we extract written names with the LOOV tool (Poignant et al. 2012); these names are associated to their co-occurring speaker turns / face tracks. Simultaneously, we build a multi-modal matrix of distances between speaker turns and face tracks. Then agglomerative clustering is performed on this matrix with the constraint to avoid merging clusters associated to different names. We also integrate the prediction of few biometric models (anchors, some journalists) to directly identify speaker turns / face tracks before the clustering process. Our approach was evaluated on the REPERE corpus and reached an F-measure of 68.2 % for speaker identification and 60.2 % for face identification. Adding few biometric models improves results and leads to 82.4 % and 65.6 % for speaker and face identity respectively. By comparison, a mono-modal, supervised person identification system with 706 speaker models trained on matching development data and additional TV and radio data provides 67.8 % F-measure, while 908 face models provide only 30.5 % F-measure.
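The core constraint, never merging clusters that carry different written names, can be sketched in plain Python (1-D toy features and a mean-distance linkage stand in for the paper's multi-modal distance matrix):

```python
def constrained_clustering(points, names, threshold):
    """Agglomerative clustering that refuses to merge two clusters
    carrying different (non-empty) names; names propagate on merge."""
    clusters = [([p], n) for p, n in zip(points, names)]  # (members, name)
    while True:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                mi, ni = clusters[i]
                mj, nj = clusters[j]
                if ni and nj and ni != nj:
                    continue  # cannot-link: differently named clusters
                d = abs(sum(mi) / len(mi) - sum(mj) / len(mj))
                if d < threshold and (best is None or d < best[0]):
                    best = (d, i, j)
        if best is None:
            return clusters
        _, i, j = best
        mi, ni = clusters[i]
        mj, nj = clusters.pop(j)
        clusters[i] = (mi + mj, ni or nj)  # merge and propagate the name

points = [0.0, 0.1, 0.2, 5.0, 5.1]
names = ["Alice", None, "Bob", "Carol", None]
result = constrained_clustering(points, names, threshold=1.0)
print(len(result))  # 3
```

Even though the "Alice" and "Bob" points are close, the name constraint keeps them in separate clusters, while the unnamed points absorb the name of whichever named cluster they join.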

14.
Wu Xing, Ji Sihui, Wang Jianjia, Guo Yike. Applied Intelligence, 2022, 52(13): 14839-14852

Human beings are capable of imagining a person’s voice according to his or her appearance because different people have different voice characteristics. Although researchers have made great progress in single-view speech synthesis, there are few studies on multi-view speech synthesis, especially the speech synthesis using face images. On the basis of implicit relationship between the speaker’s face image and his or her voice, we propose a multi-view speech synthesis method called SSFE (Speech Synthesis with Face Embeddings). The proposed SSFE consists of three parts: a voice encoder, a face encoder and an improved multi-speaker text-to-speech (TTS) engine. On the one hand, the proposed voice encoder generates the voice embeddings from the speaker’s speech and the proposed face encoder extracts the voice features from the speaker’s face as f-voice embeddings. On the other hand, the multi-speaker TTS engine would synthesize the speech with voice embeddings and f-voice embeddings. We have conducted extensive experiments to evaluate the proposed SSFE on the synthesized speech quality and face-voice matching degree, in which the Mean Opinion Score of the SSFE is more than 3.7 and the matching degree is about 1.7. The experimental results prove that the proposed SSFE method outperforms state-of-the-art methods on the synthesized speech in terms of speech quality and face-voice matching degree.


15.
By analyzing the characteristics of biometric technologies such as palmprint, fingerprint, iris, face, gait, and voiceprint recognition, together with how the coal mine environment affects the biometric traits of personnel entering the mine, this paper points out that iris, face, gait, and voiceprint recognition are suitable for uniqueness detection of personnel entering a coal mine. It proposes a uniqueness detection scheme based on personnel positioning and biometric recognition: biometric recognition is embedded into the personnel positioning system, and the positioning identification card is used to verify the uniqueness of both the number of cards and the personnel identities. It identifies the key research problems of this technology as recognition algorithms for heavily soiled faces, acquisition of gait images when personnel are occluded by equipment, and algorithms for removing mine-site noise mixed into personnel speech signals.

16.
Voice conversion (VC), which morphs the voice of a source speaker to be perceived as spoken by a specified target speaker, can be intentionally used to deceive speaker identification (SID) and speaker verification (SV) systems that use speech biometrics. Voice conversion spoofing attacks that imitate a particular speaker pose a potential threat to these kinds of systems. In this paper, we first present an experimental study to evaluate the robustness of such systems against voice conversion disguise. We use Gaussian mixture model (GMM) based SID systems, GMM with universal background model (GMM-UBM) based SV systems, and GMM supervector with support vector machine (GMM-SVM) based SV systems for this. Voice conversion is conducted using three different techniques: the GMM based VC technique, the weighted frequency warping (WFW) based conversion method, and its variation with energy correction disabled (WFW′). Evaluation is done using intra-gender and cross-gender voice conversions between fifty male and fifty female speakers taken from the TIMIT database. The result is indicated by degradation in the percentage of correct identification (POC) score in SID systems and degradation in equal error rate (EER) in all SV systems. Experimental results show that the GMM-SVM SV systems are more resilient against voice conversion spoofing attacks than GMM-UBM SV systems, and that all SID and SV systems are most vulnerable to GMM based conversion, compared with WFW and WFW′ based conversion. From the results, it can also be said that, in general terms, all SID and SV systems are slightly more robust to voices converted through cross-gender conversion than intra-gender conversion. This work extends the study to find the relationship between VC objective scores and SV system performance on the CMU ARCTIC database, which is a parallel corpus. The results of this experiment show an approach to quantifying an objective score of voice conversion that can be related to the ability to spoof an SV system.
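The EER metric used throughout these SV experiments can be computed from genuine and impostor score sets as the operating point where the false accept and false reject rates meet; a minimal sketch on synthetic Gaussian scores (the score distributions are assumptions for illustration):

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Sweep candidate thresholds; return the point where the false
    accept rate (FAR) and false reject rate (FRR) are closest."""
    best_gap, eer = 1.0, None
    for t in np.sort(np.concatenate([genuine, impostor])):
        far = np.mean(impostor >= t)   # impostors wrongly accepted
        frr = np.mean(genuine < t)     # genuine trials wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

rng = np.random.default_rng(0)
genuine = rng.normal(2.0, 1.0, 1000)   # true-speaker scores
impostor = rng.normal(0.0, 1.0, 1000)  # impostor scores
eer = equal_error_rate(genuine, impostor)
print(0.0 < eer < 0.5)
```

A successful spoofing attack shifts the impostor distribution toward the genuine one, which raises the EER computed this way.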

17.
This paper proposes a novel approach for inference using fuzzy rank-level fusion and explores its application to face recognition using multiple biometric representations. Multiple representations of a single biometric (trait) aim to increase the reliability or acceptance of a biometric system, as they exploit the underlying essential characteristics provided by different sensors. In this paper, we propose a new scheme for generating fuzzy ranks induced by a Gaussian function based on the confidence of a classifier. In contrast to conventional ranking, this fuzzy ranking reflects associations among the outputs (confidence factors) of a classifier. These fuzzy ranks, yielded by multiple representations of a face image, are fused, weighted by the corresponding confidence factors of the classifier, to generate the final ranks while recognizing a face. In many real-world applications where multiple traits of a person are unavailable, the proposed method is highly effective. It can, however, easily be extended to multimodal biometric systems utilizing multiple classifiers. Experimental results using different feature vectors of a face image with different classifiers show that the proposed method can significantly improve recognition accuracy compared to individual feature vectors, as well as to some commonly used rank-level fusion methods.
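The Gaussian-induced fuzzy ranking can be sketched as follows; the width `sigma` and the confidence-weighted sum are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def fuzzy_ranks(confidences, sigma=0.3):
    """Gaussian-induced fuzzy ranks: the top-confidence class gets a
    rank value near 1, others decay with their gap to the top."""
    gap = confidences.max() - confidences
    return np.exp(-(gap ** 2) / (2 * sigma ** 2))

# Two representations of the same face scoring three identity classes
conf_a = np.array([0.80, 0.15, 0.05])
conf_b = np.array([0.55, 0.40, 0.05])

# Fuse the fuzzy ranks, weighted by each representation's top confidence
fused = conf_a.max() * fuzzy_ranks(conf_a) + conf_b.max() * fuzzy_ranks(conf_b)
print(int(np.argmax(fused)))  # 0
```

Unlike crisp ranks (1, 2, 3), the fuzzy values preserve how close the runner-up confidences were, which is what the fusion exploits.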

18.
Since single-biometric authentication often fails to meet the requirements of practical applications, this paper proposes a multi-channel biometric authentication model based on information fusion. It adopts PCA-based face recognition and MFCC-and-VQ-based speaker recognition, and fuses the two channels at the score level with a multilayer linear classifier. Experimental results show that, with face recognition and speaker recognition rates of 82.6% and 75.9% respectively, the recognition rate after fusing the two channels reaches 92.2%.

19.
Biometrics is the science of the measurement of unique human characteristics, both physical and behavioral. Various biometric technologies are available for identifying or verifying an individual by measuring fingerprint, hand, face, signature, voice, or a combination of these traits. This paper aims to assist readers as they consider biometric solutions by examining common biometric technologies, introducing different biometric applications, and reviewing recent CI solutions presented at the 2006 IEEE WCCI.

