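A PATRICIA Tree-Based Dictionary Mechanism for Chinese Automatic Word Segmentation
Cite this article: YANG Wen-feng, CHEN Guang-ying, LI Xing. A PATRICIA Tree-Based Dictionary Mechanism for Chinese Automatic Word Segmentation [J]. Journal of Chinese Information Processing (中文信息学报), 2001, 15(3): 45-50.
Authors: YANG Wen-feng, CHEN Guang-ying, LI Xing
Affiliation: Department of Electronic Engineering, Tsinghua University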
Funding: 863 Program (863-306-ZD02-02-7)
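Abstract: The segmentation dictionary is a basic component of Chinese information processing systems, and its lookup and update efficiency directly affects the performance of those systems. This paper uses the PATRICIA tree data structure to design a segmentation dictionary mechanism that supports fast lookup and update of dictionary entries, and gives a preliminary theoretical analysis of its performance. Finally, experiments compare its time efficiency with that of a segmentation dictionary mechanism based on character-by-character binary search. The results show that the PATRICIA tree-based dictionary mechanism achieves higher lookup speed and update efficiency and can meet the needs of large-scale, open text processing systems.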

Keywords: information retrieval; PATRICIA tree; Chinese automatic word segmentation
Revised: 30 August 2000

PATRICIA-tree based Dictionary Mechanism for Chinese Word Segmentation
YANG Wen-feng, CHEN Guang-ying, LI Xing. PATRICIA-tree based Dictionary Mechanism for Chinese Word Segmentation [J]. Journal of Chinese Information Processing, 2001, 15(3): 45-50.
Authors: YANG Wen-feng, CHEN Guang-ying, LI Xing
Affiliation: Department of Electronic Engineering, Tsinghua University
Abstract: The dictionary mechanism is a basic component of Chinese information processing systems, and its efficiency greatly affects the performance of those systems. Based on the PATRICIA tree data structure, this paper designs a new dictionary mechanism. First, the paper presents a preliminary performance analysis of the PATRICIA tree-based dictionary mechanism. Then it compares this mechanism with a dictionary mechanism based on character-by-character binary search. The results show that the PATRICIA tree-based dictionary mechanism outperforms the dictionary mechanisms in current use in aspects such as the efficiency of retrieving and modifying words, and that it is better suited to large-scale Chinese text processing systems.
Keywords: information retrieval; PATRICIA tree; Chinese word segmentation
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.