
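Pitch Estimation and Voicing Classification via MFCC-Based Spectrum Reconstruction
Cite this article: Zhang Shaohua, Qin Huibin. Pitch estimation and voicing classification via MFCC-based spectrum reconstruction [J]. Measurement & Control Technology (测控技术), 2019, 38(11): 86-89.
Authors: Zhang Shaohua  Qin Huibin
Affiliation: Institute of New Electronic Devices and Applications, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China
Abstract: Pitch estimation and voicing classification help retrieve target speech quickly. They are among the most important and difficult research topics in speech retrieval and are significant for speech recognition. This paper proposes a new method for pitch estimation and voicing classification. The spectrum is reconstructed from Mel-frequency cepstral coefficients (MFCC), and the reconstructed spectrum is compressed and filtered in the logarithmic domain. Pitch is estimated by modeling the joint density of the pitch frequency and the filtered frequency with a Gaussian mixture model (GMM); the relative error on the TIMIT database is 6.62%. A GMM-based model can also perform the voicing classification task, and experiments show a classification accuracy above 99%, which offers a new model for pitch estimation and voicing classification.

Keywords: speech recognition  pitch estimation  Mel-frequency cepstral coefficient  Gaussian mixture model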

Pitch Resolution Based on MFCC for Pitch Estimation and Sound Classification
Abstract: Pitch estimation and voicing classification help retrieve target speech quickly. They are among the most important and difficult research directions in speech retrieval and are significant for speech recognition. A new method for pitch estimation and voicing classification is proposed. The spectrum is reconstructed from the Mel-frequency cepstral coefficients (MFCC), and the reconstructed spectrum is compressed and filtered in the logarithmic domain. Pitch is estimated by modeling the joint density of the pitch frequency and the filtered frequency with a Gaussian mixture model (GMM). The relative error on the TIMIT database is 6.62%. A GMM-based model can also perform the voicing classification task. The experimental results show that the voicing classification accuracy exceeds 99%, which provides a new model for pitch estimation and voicing classification.
Keywords: speech recognition  pitch estimation  Mel-frequency cepstral coefficient  Gaussian mixture model
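
The abstract names two steps without giving formulas: reconstructing a spectrum from MFCCs, and estimating pitch from a GMM over the joint density of pitch and a spectral feature. The following is a minimal sketch of how such steps are commonly realised, not the authors' implementation. It assumes MFCCs are an unnormalised DCT-II of log mel energies, uses a scalar feature `x` in place of the paper's filtered spectrum, and assumes the pitch estimate is the conditional mean E[f0 | x]. All function names and the component dictionary layout are hypothetical.

```python
import math


def mfcc_from_log_mel(log_mel, n_coeffs):
    """Unnormalised DCT-II of log mel energies (one common MFCC convention)."""
    m = len(log_mel)
    return [
        sum(log_mel[j] * math.cos(math.pi * k * (j + 0.5) / m) for j in range(m))
        for k in range(n_coeffs)
    ]


def log_mel_from_mfcc(mfcc, n_bands):
    """Invert the DCT above (a DCT-III).

    The result is exact only when len(mfcc) == n_bands; with fewer
    coefficients it is a smoothed log mel spectrum.
    """
    out = []
    for j in range(n_bands):
        acc = mfcc[0] / n_bands
        for k in range(1, len(mfcc)):
            acc += 2.0 / n_bands * mfcc[k] * math.cos(math.pi * k * (j + 0.5) / n_bands)
        out.append(acc)
    return out


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_conditional_pitch(x, components):
    """Conditional-mean pitch estimate E[f0 | x] under a joint GMM over (x, f0).

    Each component is a dict with keys: weight, mean_x, mean_f0, var_x, cov_xf0
    (hypothetical layout; a trained model would supply these values).
    """
    # Responsibility of each component given the observed feature x.
    resp = [c["weight"] * gaussian_pdf(x, c["mean_x"], c["var_x"]) for c in components]
    total = sum(resp)
    estimate = 0.0
    for r, c in zip(resp, components):
        # Per-component linear regression of f0 on x.
        cond_mean = c["mean_f0"] + c["cov_xf0"] / c["var_x"] * (x - c["mean_x"])
        estimate += (r / total) * cond_mean
    return estimate
```

In practice the feature would be a vector, so the per-component variance and covariance become matrices and the division becomes a linear solve; the scalar case is kept here only to make the structure of the estimator visible.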
This article is indexed in Wanfang Data (万方数据) and other databases.