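An Improved Fast Word Segmentation Algorithm
Cite this article: Chen Guilin, Wang Yongcheng, Han Kesong, Wang Gang. An improved fast word segmentation algorithm[J]. Journal of Computer Research and Development, 2000, 37(4): 418-424.
Authors: Chen Guilin  Wang Yongcheng  Han Kesong  Wang Gang
Affiliation: Network Information Center, Shanghai Jiao Tong University, Shanghai 200030
Funding: Supported by the National "863" High-Tech Research and Development Program (project no. 863-ZD03-04-1)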
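Abstract: This paper first introduces an efficient data structure for a Chinese electronic word list. The structure supports hashing on the first character and standard binary search, and it places no limit on entry length. The paper then proposes an improved fast word segmentation algorithm that, building on fast lookup of two-character words, uses neighborhood matching to find multi-character words, which markedly improves segmentation efficiency. Theoretical analysis shows that the time complexity of the algorithm is 1.66, so in terms of speed it outperforms the comparable algorithms seen to date.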

Keywords: word segmentation  Chinese information processing  algorithm  Chinese electronic word list  computer

AN IMPROVED FAST ALGORITHM FOR CHINESE WORD SEGMENTATION
CHEN Gui-Lin, WANG Yong-Cheng, HAN Ke-Song, WANG Gang. AN IMPROVED FAST ALGORITHM FOR CHINESE WORD SEGMENTATION[J]. Journal of Computer Research and Development, 2000, 37(4): 418-424.
Authors:CHEN Gui-Lin  WANG Yong-Cheng  HAN Ke-Song  WANG Gang
Abstract: This paper introduces a highly efficient data structure for a Chinese electronic word list. It supports standard binary search and hashing on the first Chinese character of a string, and it does not limit the length of a word. An improved fast algorithm for Chinese word segmentation is then proposed. Starting from a fast search for two-character words, the algorithm finds words of more than two characters by neighborhood matching, which makes segmentation markedly more efficient. In theory, its time complexity is 1.66, which is superior to that of other algorithms for Chinese word segmentation.
Keywords: word segmentation  hash  binary search  neighborhood matching  time complexity
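
The abstract gives no implementation details, so the following Python sketch is only an illustration of the ideas it names. It shows one way a word list could combine hashing on the first character with binary search, and how words that share a two-character prefix, being adjacent in sorted order, can be scanned for the longest match. The names `WordList`, `longest_match`, and `segment` are hypothetical, the adjacent scan is just one possible reading of "neighborhood matching", and the sketch makes no attempt to reproduce the paper's algorithm or its 1.66 figure.

```python
from bisect import bisect_left
from collections import defaultdict


class WordList:
    """Hypothetical word list: hash on the first character, sorted remainders per bucket."""

    def __init__(self, words):
        buckets = defaultdict(set)
        for word in words:
            if word:
                # Key on the first character; store the rest of the word.
                buckets[word[0]].add(word[1:])
        # Sorting each bucket enables binary search and keeps shared prefixes adjacent.
        self.buckets = {first: sorted(rests) for first, rests in buckets.items()}

    def contains(self, word):
        """Exact lookup: one hash access, then binary search inside the bucket."""
        if not word:
            return False
        bucket = self.buckets.get(word[0])
        if bucket is None:
            return False
        rest = word[1:]
        i = bisect_left(bucket, rest)
        return i < len(bucket) and bucket[i] == rest

    def longest_match(self, text, start):
        """Return the longest listed word at text[start:], else the single character."""
        first = text[start]
        best = first
        bucket = self.buckets.get(first)
        if bucket is None or start + 1 >= len(text):
            return best
        second = text[start + 1]
        # Binary search to the first remainder that begins with the second character.
        i = bisect_left(bucket, second)
        # Entries sharing the two-character prefix are neighbours; scan only those.
        while i < len(bucket) and bucket[i].startswith(second):
            candidate = first + bucket[i]
            if len(candidate) > len(best) and text.startswith(candidate, start):
                best = candidate
            i += 1
        return best


def segment(text, word_list):
    """Greedy left-to-right segmentation using the longest match at each position."""
    pieces = []
    i = 0
    while i < len(text):
        word = word_list.longest_match(text, i)
        pieces.append(word)
        i += len(word)
    return pieces


if __name__ == "__main__":
    demo = WordList(["中文", "中文信息", "信息", "处理", "分词", "算法"])
    print(segment("中文信息处理的分词算法", demo))
```

Storing only the remainder of each word in its bucket means entry length is unconstrained, and an unknown character simply falls through as a one-character piece.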
This article is indexed in CNKI, VIP (维普), Wanfang Data, and other databases.