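Multi-Hash Fast Word Segmentation Algorithm
Cite this article:ZHANG Ke. Multi-hash fast word segmentation algorithm[J]. Computer Engineering and Design, 2007, 28(7): 1716-1718.
Author:ZHANG Ke
Affiliation:College of Computer Science, Chongqing University, Chongqing 400044, China
Abstract:Chinese word segmentation is an important part of Chinese information processing. Some applications require not only high accuracy; speed is also critical. Based on an analysis of existing segmentation algorithms, fast segmentation algorithms in particular, a new dictionary structure is proposed, together with a new segmentation algorithm built on it. The algorithm performs hash lookup not only on the first character of a word but also on the word's other characters. Theoretical analysis and experimental results show that the algorithm is faster than other existing segmentation algorithms.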

Keywords:Chinese word segmentation  Chinese information processing  Hash  data structure  time complexity
Article ID:1000-7024(2007)07-1716-03
Revised:2006-03-21

Multi-hash indexing algorism for Chinese character segmentation
ZHANG Ke.Multi-hash indexing algorism for Chinese character segmentation[J].Computer Engineering and Design,2007,28(7):1716-1718.
Authors:ZHANG Ke
Affiliation:College of Computer Science, Chongqing University, Chongqing 400044, China
Abstract:Chinese word segmentation is an important component of, and a preprocessing step for, Chinese information processing. Many applications require not only high segmentation accuracy but also high speed. After analyzing existing Chinese word segmentation algorithms, especially the fast ones, this paper introduces an efficient segmentation algorithm based on an improved dictionary data structure. The algorithm supports hash lookup not only on the first character of a word but also on its remaining characters. Theoretical analysis and experimental results indicate that the algorithm is faster than other existing segmentation algorithms.
Keywords:Chinese word segmentation  Chinese information processing  Hash  data structure  time complexity
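
The abstract says only that hash lookup is applied to the first character of a word and to its remaining characters; it does not describe the dictionary layout. The Python sketch below is one possible reading, not the paper's design: it assumes the dictionary is stored as nested hash tables (one `dict` level per character) and that segmentation uses forward maximum matching. The names `END`, `build_dictionary`, and `segment`, as well as the sample lexicon, are hypothetical.

```python
# Hypothetical sketch: nested hash tables for dictionary lookup.
# Both the nested-dict layout and the forward-maximum-matching loop are
# assumptions made for illustration; the abstract does not specify them.

END = "$end"  # marker key: a dictionary word ends at this node (multi-char, so it
              # cannot collide with a single-character key)


def build_dictionary(words):
    """Store each word as a chain of dicts, one hash lookup per character."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def segment(text, root):
    """Forward maximum matching: at each position take the longest dictionary word."""
    tokens = []
    i = 0
    while i < len(text):
        node = root
        longest = 0  # length of the longest dictionary word starting at position i
        j = i
        # Follow the chain of hash tables while the next character is present.
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if END in node:
                longest = j - i
        # No dictionary word starts here: emit the single character on its own.
        step = longest if longest > 0 else 1
        tokens.append(text[i:i + step])
        i += step
    return tokens


if __name__ == "__main__":
    lexicon = ["中文", "分词", "中文分词", "算法", "信息", "处理", "信息处理"]
    print(segment("中文分词算法", build_dictionary(lexicon)))
```

Under these assumptions, matching a word of length k costs k average-case constant-time lookups, independent of the dictionary size. That is the kind of saving over a binary search on the non-initial characters that the abstract appears to target.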
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.