Models and Algorithm for Assigning Word Breaks to Chinese Text (汉语词语边界自动划分的模型与算法)


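Cite this article:	FU Guo-Hong, WANG Xiao-Long. Models and Algorithm for Assigning Word Breaks to Chinese Text [J]. Journal of Computer Research and Development (计算机研究与发展), 1999, 36(9): 1144-1147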

Authors:	FU Guo-Hong (付国宏), WANG Xiao-Long (王晓龙)

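Affiliations:	1. Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin 150001; 2. Department of Computing, The Hong Kong Polytechnic University, Hong Kong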

Funding:	National "863" Program of China

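Abstract:	Building on the concepts of word form and Chinese character juncture, this paper describes a word form model based on the word-formation power of character strings and a character juncture model based on the affinity between characters inside and outside words. The two models are combined by linear interpolation to assign word boundaries in Chinese text. After analyzing the space of candidate segmentations, the paper also gives a corresponding optimized search algorithm. Unlike typical statistical methods, the parameters of this method can be obtained directly from raw, unprocessed corpora, which gives it strong adaptability. Preliminary experiments indicate that the method is effective and reliable.
Keywords:	Chinese word segmentation; word form; character juncture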
MODELS AND ALGORITHM FOR ASSIGNING WORD BREAKS TO CHINESE TEXT

FU Guo-Hong,WANG Xiao-Long. MODELS AND ALGORITHM FOR ASSIGNING WORD BREAKS TO CHINESE TEXT[J]. Journal of Computer Research and Development, 1999, 36(9): 1144-1147

Authors:	FU Guo-Hong, WANG Xiao-Long

Abstract:	This paper describes a word form model (WFM), based on the word-formation power of Chinese character strings, and a character juncture model (CJM), based on the affinity of Chinese character pairs inside and outside words. The two models are combined by linear interpolation to assign word breaks to Chinese text. After the search space is analyzed, a corresponding search algorithm is given. Compared with general statistical models, the parameters of the proposed models can be trained directly from a raw corpus, which gives the approach strong adaptability. Preliminary experiments indicate that the approach is effective and reliable.

Keywords:	Chinese word segmentation; character juncture; word form
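
The abstract names three ingredients: a word form model, a character juncture model, and a linear interpolation of the two followed by a search over candidate segmentations. It does not give the formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that each candidate word is scored as `lam * WFM + (1 - lam) * CJM`, that the juncture score of a word is the product of "inside a word" probabilities for its internal character pairs times a break probability at its right edge, and that the search is a standard dynamic program over word end positions. The names `segment`, `wfm`, `inside`, `lam`, `max_len`, and `floor` are hypothetical, and the toy probabilities are invented for illustration.

```python
import math

# Toy parameters, invented for illustration only (not taken from the paper).
TOY_WFM = {"北京": 0.4, "大学": 0.4, "北": 0.05, "京": 0.05, "大": 0.05, "学": 0.05}
TOY_INSIDE = {"北京": 0.9, "京大": 0.1, "大学": 0.9}  # P(pair is word-internal)


def segment(text, wfm, inside, lam=0.5, max_len=4, floor=1e-6):
    """Return the best-scoring segmentation of `text` as a list of words.

    wfm:    assumed word form scores, string -> probability of being a word
    inside: assumed juncture scores, 2-char string -> P(word-internal)
    lam:    interpolation weight between the two models
    """
    n = len(text)

    def cjm(i, j):
        # Juncture score for treating text[i:j] as one word: every internal
        # pair must be word-internal, and the pair straddling j is a break.
        p = 1.0
        for k in range(i, j - 1):
            p *= inside.get(text[k:k + 2], floor)
        if j < n:
            p *= 1.0 - inside.get(text[j - 1:j + 1], floor)
        return p

    best = [-math.inf] * (n + 1)  # best[j]: best log-score of text[:j]
    back = [0] * (n + 1)          # back[j]: start index of the last word
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            # Linear interpolation of the two model scores for this word.
            p = lam * wfm.get(text[i:j], floor) + (1.0 - lam) * cjm(i, j)
            score = best[i] + math.log(max(p, floor))
            if score > best[j]:
                best[j], back[j] = score, i

    # Follow back-pointers from the end of the text to recover the words.
    words, j = [], n
    while j > 0:
        words.append(text[back[j]:j])
        j = back[j]
    return words[::-1]
```

With the toy parameters above, `segment("北京大学", TOY_WFM, TOY_INSIDE)` returns `["北京", "大学"]`. Scores are summed in log space so that long texts do not underflow, and `max_len` bounds the candidate word length so the search stays linear in the text length.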
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
