
Thai Syllable Segmentation Based on Conditional Random Fields (基于条件随机场的泰语音节切分方法)
Cite this article: ZHAO Shi-yu, XIAN Yan-tuan, GUO Jian-yi, YU Zheng-tao, HONG Xuan-gui, WANG Hong-bin. Thai Syllable Segmentation Based on Conditional Random Fields[J]. Computer Science (计算机科学), 2016, 43(3): 54-56, 83.
Authors: ZHAO Shi-yu (赵世瑜)  XIAN Yan-tuan (线岩团)  GUO Jian-yi (郭剑毅)  YU Zheng-tao (余正涛)  HONG Xuan-gui (洪玄贵)  WANG Hong-bin (王红斌)
Affiliation: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China (all six authors)
Funding: Supported by the National Natural Science Foundation of China project "Internet-Oriented Thai-Chinese Bilingual Corpus Acquisition and Alignment Methods" (61363044), the National Natural Science Foundation of China project "Chinese-Thai Cross-Language News Event Retrieval Methods" (61462054), and the Key Project of the Yunnan Provincial Department of Education "Similarity Computation in Chinese-Thai Cross-Language News Event Retrieval" (2014Z021).
Abstract: The syllable is the basic unit of word formation and pronunciation in Thai, and Thai syllable segmentation is important for research on Thai lexical analysis, speech synthesis, and speech recognition. Drawing on the structural characteristics of Thai syllables, this paper proposes a Thai syllable segmentation method based on conditional random fields (CRFs). The method defines features from Thai character categories and character positions, and uses a CRF to label the characters of a Thai sentence as a sequence, thereby segmenting the sentence into syllables. A Thai syllable segmentation corpus was annotated on the basis of the InterBEST 2009 Thai corpus. Experiments on this corpus show that the method makes effective use of character category and character position information, reaching a precision of 99.115%, a recall of 99.284%, and an F-measure of 99.199%.

Keywords: Thai character features  Thai syllable  Syllable segmentation  Conditional random fields
Received: 2015/3/20
Revised: 2015/6/17
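
The abstract describes syllable segmentation as character-level sequence labeling with features built from character category and character position, scored by precision, recall, and F-measure. The sketch below illustrates that formulation only; it is not the authors' implementation. The B/I label set, the coarse category codes (`C`, `V`, `L`, `T`, `D`, `O`), the window size of 2, and the span-matching evaluation are all assumptions made for illustration, and every function name is hypothetical. The feature dictionaries it produces are the kind of input a linear-chain CRF toolkit would be trained on; the training step itself is left out.

```python
def char_category(ch):
    """Map a character to a coarse category code (illustrative scheme,
    based on Unicode Thai block ranges; not the paper's category set)."""
    cp = ord(ch)
    if 0x0E01 <= cp <= 0x0E2E:
        return "C"  # consonant
    if 0x0E40 <= cp <= 0x0E44:
        return "L"  # leading vowel, written before the consonant
    if 0x0E30 <= cp <= 0x0E3A:
        return "V"  # other vowel signs and related marks
    if 0x0E48 <= cp <= 0x0E4B:
        return "T"  # tone mark
    if 0x0E50 <= cp <= 0x0E59:
        return "D"  # Thai digit
    return "O"      # anything else


def to_labels(syllables):
    """Turn a syllable-segmented sentence into (characters, B/I labels).
    B marks the first character of a syllable, I any later character."""
    chars, labels = [], []
    for syllable in syllables:
        for i, ch in enumerate(syllable):
            chars.append(ch)
            labels.append("B" if i == 0 else "I")
    return chars, labels


def char_features(chars, i, window=2):
    """Features for position i: the character and its category at each
    relative offset in [-window, window], padded at sentence edges."""
    feats = {}
    for off in range(-window, window + 1):
        j = i + off
        inside = 0 <= j < len(chars)
        feats[f"char[{off}]"] = chars[j] if inside else "<PAD>"
        feats[f"cat[{off}]"] = char_category(chars[j]) if inside else "<PAD>"
    return feats


def from_labels(chars, labels):
    """Rebuild syllables from characters and predicted B/I labels."""
    syllables = []
    for ch, label in zip(chars, labels):
        if label == "B" or not syllables:
            syllables.append(ch)
        else:
            syllables[-1] += ch
    return syllables


def spans(syllables):
    """Character offsets (start, end) of each syllable."""
    out, start = set(), 0
    for syllable in syllables:
        out.add((start, start + len(syllable)))
        start += len(syllable)
    return out


def precision_recall_f(gold, pred):
    """Syllable-level scores: a predicted syllable counts as correct
    only if both of its boundaries match a gold syllable."""
    g, p = spans(gold), spans(pred)
    correct = len(g & p)
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    total = precision + recall
    f = 2 * precision * recall / total if total else 0.0
    return precision, recall, f


if __name__ == "__main__":
    gold = ["ภา", "ษา", "ไทย"]  # "ภาษาไทย" split into three syllables
    chars, labels = to_labels(gold)
    print(labels)
    print(char_features(chars, 4))
    print(precision_recall_f(gold, ["ภา", "ษาไทย"]))
```

The reported F-measure is consistent with the harmonic mean of the reported precision and recall: 2 × 99.115 × 99.284 / (99.115 + 99.284) ≈ 99.199.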