
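Automatic Segmentation of Speech Unit Boundaries Based on HMM Models
Cite this article: WANG Li-juan, CAO Zhi-gang. Automatic segmentation of speech unit boundaries based on HMM models [J]. Journal of Data Acquisition & Processing, 2005, 20(4): 381-384.
Authors: WANG Li-juan  CAO Zhi-gang
Affiliation: Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
Abstract: Forced alignment based on hidden Markov models (HMMs) is used to segment speech unit boundaries for text-to-speech (TTS) systems. To improve segmentation accuracy, this paper optimizes the feature selection, model parameters, and model clustering of the HMMs. Experiments show that 12-dimensional static Mel-frequency cepstral coefficients (MFCCs) are the best speech feature; that each HMM state is best modeled by a single Gaussian; and that, for speaker-dependent HMMs, about 3,000 tied states generated by classification and regression tree (CART) clustering is optimal. In phoneme boundary segmentation experiments on an English speech corpus, segmentation accuracy rises from 77.3% before model optimization to 85.4%.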
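The forced-alignment step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one single-Gaussian state per phone with diagonal covariance, ignores transition probabilities, and uses the hypothetical names `log_gauss`, `forced_align`, and `boundaries`. A real system would typically use several states per triphone and trained transition weights.

```python
import math


def log_gauss(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def forced_align(frames, states):
    """Viterbi alignment of frames to a fixed left-to-right state sequence.

    frames: list of feature vectors (e.g. 12-D static MFCCs), one per frame.
    states: list of (mean, var) pairs, one single Gaussian per state, in the
            order given by the known transcription.
    Returns the state index assigned to each frame.
    Assumption: transition probabilities are uniform and therefore omitted.
    """
    T, S = len(frames), len(states)
    if T < S:
        raise ValueError("need at least one frame per state")
    neg_inf = float("-inf")
    score = [[neg_inf] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    # The alignment must start in the first state.
    score[0][0] = log_gauss(frames[0], *states[0])
    for t in range(1, T):
        for s in range(S):
            stay = score[t - 1][s]
            move = score[t - 1][s - 1] if s > 0 else neg_inf
            best, prev = (stay, s) if stay >= move else (move, s - 1)
            if best > neg_inf:
                score[t][s] = best + log_gauss(frames[t], *states[s])
                back[t][s] = prev
    # Trace back from the last state, which must end the utterance.
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def boundaries(path):
    """Frame indices at which the aligned state changes."""
    return [t for t in range(1, len(path)) if path[t] != path[t - 1]]
```

For example, with one-dimensional toy frames `[[0.1], [0.0], [-0.1], [5.0], [5.1], [4.9]]` and two states centered at 0.0 and 5.0 with unit variance, `forced_align` assigns the first three frames to state 0 and the rest to state 1, so `boundaries` reports a single boundary at frame 3.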

Keywords: speech unit boundary  automatic segmentation  hidden Markov model  text-to-speech system
Article ID: 1004-9037(2005)04-0381-04
Received: 2005-03-21
Revised: 2005-05-25

Automatic Phonetic Segmentation Using HMM Model
WANG Li-juan, CAO Zhi-gang. Automatic Phonetic Segmentation Using HMM Model [J]. Journal of Data Acquisition & Processing, 2005, 20(4): 381-384.
Authors: WANG Li-juan  CAO Zhi-gang
Affiliation: Department of Electronic Engineering, Tsinghua University, Beijing, 100084, China
Abstract: HMM-based forced alignment, as used in automatic speech recognition systems, is widely applied to segmenting text-to-speech (TTS) units. To improve segmentation performance, the choice of acoustic feature and the training conditions of the HMMs are examined. Experimental results show that the static 12-dimensional Mel-frequency cepstral coefficient (MFCC) feature is the best acoustic feature; that the best number of Gaussian mixture components per state is 1; and that, for speaker-dependent triphone HMMs, the best number of tied states after clustering with a classification and regression tree (CART) is about 3,000. With the optimized parameters, segmentation accuracy on the English test corpus increases from 77.3% to 85.4%.
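The accuracy figures quoted in the abstract are boundary-level scores. The abstract does not say how a boundary is judged correct, so the sketch below assumes a common convention: a predicted boundary counts as correct when it lies within a fixed tolerance of the hand-labeled reference. The 20 ms default and the name `boundary_accuracy` are assumptions for illustration, not values taken from the paper.

```python
def boundary_accuracy(predicted_ms, reference_ms, tolerance_ms=20):
    """Fraction of predicted boundaries within tolerance_ms of the reference.

    predicted_ms, reference_ms: boundary times in milliseconds, paired
    one-to-one. In forced alignment the phone sequence is fixed, so both
    lists have the same length and order.
    tolerance_ms: assumed threshold; the source does not state one.
    """
    if len(predicted_ms) != len(reference_ms):
        raise ValueError("boundary lists must have the same length")
    if not reference_ms:
        raise ValueError("no boundaries to score")
    hits = sum(
        abs(p - r) <= tolerance_ms for p, r in zip(predicted_ms, reference_ms)
    )
    return hits / len(reference_ms)
```

With predicted boundaries `[105, 250, 410, 590]` and reference boundaries `[100, 240, 445, 600]`, three of the four deviations (5, 10, 35, and 10 ms) fall within 20 ms, so the function returns 0.75.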
Keywords: acoustic unit boundary  automatic segmentation  HMM  text-to-speech system
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.