
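Automatic Identification of Chinese Dialects Based on Phonotactics
Citation: GU Ming-liang, SHEN Zhao-yong. Automatic identification of Chinese dialects based on phonotactics[J]. Journal of Chinese Information Processing (中文信息学报), 2006, 20(5): 79-84
Authors: 顾明亮 (GU Ming-liang), 沈兆勇 (SHEN Zhao-yong)
Affiliations: 1. Department of Physics, Xuzhou Normal University; 2. Institute of Linguistics, Xuzhou Normal University
Funding: Jiangsu Provincial Social Science Planning Project; Natural Science Foundation of Jiangsu Higher Education Institutions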
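Abstract: This paper first discusses the basis for identifying Chinese dialects and the basic principles of feature selection. From these it derives an interval differential cepstral feature (区间差分倒谱特征). It then builds a dialect identification system from a GMM tokenizer, N-gram language models and an ANN. Compared with traditional language identification systems, this system has three characteristics. First, it does not require a labeled speech corpus, which lowers the labor and requirements involved in building Chinese dialect speech corpora. Second, the GMM tokenizer needs far less computation than a phone recognizer, which speeds up dialect identification and makes future real-time processing easier. Third, it achieves better identification performance and greater fault tolerance. Identification experiments on Standard Mandarin and three Chinese dialects show that the system reaches an average identification rate of 83.8%.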
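The abstract names an interval differential cepstral feature but does not define it. The sketch below therefore rests on an assumption: it computes the closely related shifted delta cepstra (SDC) often used in language identification, not necessarily the paper's own definition. For each frame t, it takes the interval difference c[t+d] - c[t-d] at k offsets spaced p frames apart and concatenates the results. The function names and the parameters `d`, `p` and `k` are hypothetical. The defaults mirror a configuration commonly cited for SDC.

```python
from typing import List

Frame = List[float]


def interval_delta(frames: List[Frame], t: int, d: int) -> Frame:
    """Interval difference c[t+d] - c[t-d], clamping indices at the sequence edges."""
    last = len(frames) - 1
    ahead = frames[min(t + d, last)]
    behind = frames[max(t - d, 0)]
    return [a - b for a, b in zip(ahead, behind)]


def shifted_delta_cepstra(frames: List[Frame], d: int = 1, p: int = 3, k: int = 7) -> List[Frame]:
    """For each frame t, concatenate k interval deltas taken at t, t+p, ..., t+(k-1)p."""
    last = len(frames) - 1
    features: List[Frame] = []
    for t in range(len(frames)):
        stacked: Frame = []
        for i in range(k):
            # Offsets past the end of the utterance are clamped to the final frame.
            stacked.extend(interval_delta(frames, min(t + i * p, last), d))
        features.append(stacked)
    return features


if __name__ == "__main__":
    # Toy 2-dimensional "cepstra" that grow linearly with the frame index.
    cepstra = [[float(t), 2.0 * t] for t in range(10)]
    sdc = shifted_delta_cepstra(cepstra, d=1, p=3, k=2)
    print(sdc[0])  # [1.0, 2.0, 2.0, 4.0]
```

Each output vector has k times the original cepstral dimension. The feature therefore captures spectral change over a longer span than a single delta.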

Keywords: computer application; Chinese information processing; GMM tokenizer; N-gram language model; Chinese dialect identification
Article ID: 1003-0077(2006)05-0077-06
Received: 2005-09-05
Revised: 2006-07-17

Phonotactics Based Chinese Dialects Identification
GU Ming-liang, SHEN Zhao-yong. Phonotactics Based Chinese Dialects Identification[J]. Journal of Chinese Information Processing, 2006, 20(5): 79-84
Authors:GU Ming-liang  SHEN Zhao-yong
Affiliation: 1. Department of Physics, Xuzhou Normal University; 2. Institute of Linguistics, Xuzhou Normal University
Abstract: This paper first discusses the criteria for distinguishing different Chinese dialects and the basic principles of feature selection. Based on these principles, a novel feature named the district differential cepstral feature is proposed. A novel dialect identification system combining a GMM tokenizer, N-gram language models and an ANN is then constructed. Compared with traditional language identification (LID) systems, the new system has the following characteristics. First, it does not require a tagged dialect speech database, which makes building corpora less labour-intensive. Second, the GMM tokenizer is more computationally efficient than a phone recognizer. Third, the system is more accurate and robust. In identification tests on Standard Mandarin and three Chinese dialects, an average accuracy of 83.8% is achieved.
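The following is a minimal sketch of the tokenize-then-score idea behind the GMM tokenizer and N-gram language model system. It makes several simplifying assumptions. A single diagonal-covariance GMM with hand-set parameters acts as the tokenizer, labeling each frame with the index of its highest-scoring weighted component. One add-one-smoothed bigram model per dialect then scores the resulting token sequence. Choosing the best-scoring dialect stands in for the paper's ANN back-end, which is not reproduced here. All names (`tokenize`, `BigramLM`, `identify`) are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical diagonal-covariance component: (weight, means, variances).
Component = Tuple[float, List[float], List[float]]


def log_gauss(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v for xi, m, v in zip(x, mean, var)
    )


def tokenize(frames: List[List[float]], gmm: List[Component]) -> List[int]:
    """GMM tokenizer: map each frame to the index of its best weighted component."""
    def score(frame: List[float], j: int) -> float:
        weight, mean, var = gmm[j]
        return math.log(weight) + log_gauss(frame, mean, var)

    return [max(range(len(gmm)), key=lambda j: score(f, j)) for f in frames]


class BigramLM:
    """Token bigram model with add-one smoothing over a fixed vocabulary."""

    def __init__(self, sequences: List[List[int]], vocab_size: int) -> None:
        self.vocab_size = vocab_size
        self.history = Counter()  # how often each token appears as a bigram history
        self.pairs = Counter()    # how often each (previous, next) pair occurs
        for seq in sequences:
            for a, b in zip(seq, seq[1:]):
                self.history[a] += 1
                self.pairs[(a, b)] += 1

    def avg_logprob(self, seq: List[int]) -> float:
        """Mean smoothed log P(next | previous) over the bigrams in seq."""
        bigrams = list(zip(seq, seq[1:]))
        total = sum(
            math.log((self.pairs[(a, b)] + 1) / (self.history[a] + self.vocab_size))
            for a, b in bigrams
        )
        return total / max(len(bigrams), 1)


def identify(frames: List[List[float]], gmm: List[Component], models: Dict[str, BigramLM]) -> str:
    """Return the dialect whose bigram model best explains the token sequence."""
    tokens = tokenize(frames, gmm)
    return max(models, key=lambda name: models[name].avg_logprob(tokens))


if __name__ == "__main__":
    gmm = [(0.5, [0.0], [1.0]), (0.5, [10.0], [1.0])]
    models = {
        "A": BigramLM([[0, 1, 0, 1, 0, 1]], vocab_size=2),
        "B": BigramLM([[0, 0, 0, 1, 1, 1]], vocab_size=2),
    }
    print(identify([[0.1], [9.8], [0.2], [10.1]], gmm, models))  # A
```

A phone recognizer needs transcribed speech for training. A GMM can be fitted to untranscribed audio, which is consistent with the abstract's point that no tagged database is required. In this sketch, the parameters are fixed by hand only to keep the example self-contained.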
Keywords: computer application; Chinese information processing; GMM tokenizer; n-gram language modeling; Chinese dialects identification
This article is indexed in CNKI, VIP, Wanfang Data and other databases.