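Research of Khmer Word Segmentation Based on Improved Viterbi Algorithm
Cite this article: JIANG Yan-rong, LIU Xi-wen, CHEN Geng-tao. Research of Khmer Word Segmentation Based on Improved Viterbi Algorithm[J]. Computer Engineering, 2011, 37(15): 174-176. DOI: 10.3969/j.issn.1000-3428.2011.15.055
Authors: JIANG Yan-rong (蒋艳荣)  LIU Xi-wen (刘习文)  CHEN Geng-tao (陈耿涛)
Affiliations: 1. Faculty of Computer, Guangdong University of Technology, Guangzhou 510006, China
2. School of Mechanical Engineering, Xiangtan University, Xiangtan, Hunan 411105, China
3. Guangdong Guobi Corporation Ltd., Guangzhou 510620, China
Abstract: Khmer word segmentation with the maximum matching algorithm has relatively low accuracy, and the algorithm has difficulty recognizing new words that are not in its dictionary. To address this problem, an improved Viterbi algorithm is adopted: an automaton performs syllable segmentation, optimal selection and pruning improve segmentation efficiency, and a statistical language model smooths the data for unknown words to improve their recognition accuracy. Experimental results show that the improved Viterbi algorithm achieves relatively high segmentation efficiency and accuracy.

Keywords: Viterbi algorithm  maximum matching algorithm  word segmentation  Khmer  pruning  statistical language model
Received: 2011-01-10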
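The abstract combines a dictionary, Viterbi decoding with pruning, and a smoothed statistical language model. As a rough illustration of how such pieces can fit together, the Python sketch below decodes a sequence of syllables into words. It is not the authors' algorithm, and every name and parameter in it is a hypothetical choice: the input is assumed to be already split into syllables (the paper does this with an automaton, which is not reproduced here), `SmoothedBigramLM` is a toy word bigram model interpolated with an add-one unigram model whose score for unseen words is further multiplied by a per-syllable penalty (`unk_syllable_penalty`), and pruning is approximated by a beam (`beam`) over lattice states plus a maximum word length (`max_word_len`). The training data are placeholder strings, not real Khmer.

```python
import heapq
import math
from collections import Counter


class SmoothedBigramLM:
    """Hypothetical toy word bigram model (illustration only, not the paper's).

    Bigram estimates are linearly interpolated with an add-one (Laplace)
    unigram model, so every word, including unseen ones, gets a nonzero score.
    Unseen words are additionally penalised per syllable, so an unseen span
    scores lower the longer it is. Scores are not a normalised distribution.
    """

    def __init__(self, sentences, lam=0.7, unk_syllable_penalty=0.1):
        self.uni = Counter()
        self.bi = Counter()
        for words in sentences:
            prev = "<s>"
            for w in words:
                self.uni[w] += 1
                self.bi[(prev, w)] += 1
                prev = w
        # How often each word (or <s>) appears as the left context of a bigram.
        self.ctx = Counter()
        for (prev, _), count in self.bi.items():
            self.ctx[prev] += count
        self.total = sum(self.uni.values())
        self.vocab = len(self.uni)
        self.lam = lam
        self.unk_penalty = unk_syllable_penalty

    def unigram(self, word, n_syllables):
        # Add-one smoothing, with one extra slot reserved for unseen words.
        p = (self.uni[word] + 1) / (self.total + self.vocab + 1)
        if word not in self.uni:
            p *= self.unk_penalty ** n_syllables
        return p

    def logprob(self, prev, word, n_syllables):
        ctx = self.ctx[prev]
        p_bi = self.bi[(prev, word)] / ctx if ctx else 0.0
        p_uni = self.unigram(word, n_syllables)
        return math.log(self.lam * p_bi + (1 - self.lam) * p_uni)


def segment(syllables, lm, max_word_len=4, beam=5):
    """Viterbi decoding over a word lattice built from syllables.

    lattice[i] maps the last word of a partial path covering syllables[:i]
    to (best log score, backpointer). Only the `beam` best states at each
    position are expanded (beam pruning), and candidate words are at most
    `max_word_len` syllables long.
    """
    if not syllables:
        return []
    n = len(syllables)
    lattice = [dict() for _ in range(n + 1)]
    lattice[0]["<s>"] = (0.0, None)
    for i in range(n):
        # lattice[i] is final here: it is only written from positions < i.
        survivors = heapq.nlargest(beam, lattice[i].items(), key=lambda kv: kv[1][0])
        for prev, (score, _) in survivors:
            for j in range(i + 1, min(i + max_word_len, n) + 1):
                word = "".join(syllables[i:j])
                new_score = score + lm.logprob(prev, word, j - i)
                old = lattice[j].get(word)
                if old is None or new_score > old[0]:
                    lattice[j][word] = (new_score, (i, prev))
    # Backtrack from the best state that covers the whole input.
    word, (_, back) = max(lattice[n].items(), key=lambda kv: kv[1][0])
    words = []
    while back is not None:
        words.append(word)
        i, word = back
        _, back = lattice[i][word]
    return words[::-1]


# Toy usage: placeholder "syllables", not real Khmer.
lm = SmoothedBigramLM([["ab", "cd"], ["ab", "e"], ["cd", "e"]])
print(segment(["a", "b", "c", "d", "e"], lm))  # ['ab', 'cd', 'e']
```

In this sketch `lam` weights the bigram estimate against the smoothed unigram, and `unk_syllable_penalty` sets how costly unseen words are relative to dictionary words; both are placeholder values that would need tuning on real data.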

This article is indexed in CNKI, VIP (Weipu), Wanfang Data, and other databases.