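Research on an extensible electronic dictionary for automatic Chinese word segmentation
Cite this article: HE Sheng, QU Wei-guang, XU Chao. Research on an extensible electronic dictionary for automatic Chinese word segmentation[J]. Computer Engineering and Applications, 2008, 44(21): 199-201. DOI: 10.3778/j.issn.1002-8331.2008.21.054
Authors: HE Sheng, QU Wei-guang, XU Chao
Affiliations: School of Chinese Language and Literature, Nanjing Normal University, Nanjing 210097; Department of Computer Science, Nanjing Normal University, Nanjing 210097
Funding: National Natural Science Foundation of China; Natural Science Foundation of Jiangsu Province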
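Abstract: In automatic Chinese word segmentation and part-of-speech tagging systems, the electronic dictionary is an important component and one of the key factors affecting system performance. This paper describes the query functions an electronic dictionary should provide and the organizational structures commonly used, and presents an extensible electronic dictionary mechanism structured as a system dictionary plus a user dictionary. The system dictionary uses a character-by-character binary-search structure built on first-character hashing, while the user dictionary uses a linked-list structure built on first-character hashing. The mechanism is highly extensible and practical.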

Keywords: electronic dictionary; dictionary structure; automatic word segmentation; Hash
Received: 2008-04-30
Revised: 2008-05-20

Extendable digital dictionary for automatic Chinese word segmentation
HE Sheng, QU Wei-guang, XU Chao. Extendable digital dictionary for automatic Chinese word segmentation[J]. Computer Engineering and Applications, 2008, 44(21): 199-201. DOI: 10.3778/j.issn.1002-8331.2008.21.054
Authors: HE Sheng, QU Wei-guang, XU Chao
Affiliation: 1. School of Chinese Language and Literature, Nanjing Normal University, Nanjing 210097, China; 2. Department of Computer Science, Nanjing Normal University, Nanjing 210097, China
Abstract: The digital dictionary is an important part of an automatic Chinese word segmentation and part-of-speech tagging system, and a vital factor affecting system performance. This paper introduces the search functions a digital dictionary needs and the structures commonly used to organize it, and proposes an extendable mechanism consisting of a system dictionary and a user dictionary. The system dictionary is indexed by a hash table on the initial character and organized for character-by-character binary search. The user dictionary is also indexed by a hash table on the initial character, but stores its entries in linked lists. Experiments show that the mechanism is extendable and practical.
Keywords:digital dictionary  dictionary structure  automatic word segmentation  hash
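
The two-tier layout described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class names, the `longest_match` helper, and the `max_len` parameter are all hypothetical. It also simplifies the system dictionary, which here binary-searches the whole remainder of a word within its first-character bucket instead of narrowing the range one character at a time.

```python
from bisect import bisect_left
from collections import defaultdict


class SystemDictionary:
    """Read-only tier: first-character hash, sorted remainders per bucket."""

    def __init__(self, words):
        buckets = defaultdict(set)
        for word in words:
            if word:
                buckets[word[0]].add(word[1:])
        # Sorting each bucket once allows binary search at lookup time.
        self._buckets = {first: sorted(rests) for first, rests in buckets.items()}

    def contains(self, word):
        if not word:
            return False
        rests = self._buckets.get(word[0])
        if rests is None:
            return False
        rest = word[1:]
        i = bisect_left(rests, rest)  # simplification: whole-remainder search
        return i < len(rests) and rests[i] == rest


class _Node:
    """One entry in a singly linked chain of word remainders."""

    __slots__ = ("rest", "next")

    def __init__(self, rest, next_node):
        self.rest = rest
        self.next = next_node


class UserDictionary:
    """Writable tier: first-character hash, linked list per bucket."""

    def __init__(self):
        self._heads = {}

    def add(self, word):
        # Prepending to the chain needs no re-sorting, so additions are cheap.
        if word and not self.contains(word):
            self._heads[word[0]] = _Node(word[1:], self._heads.get(word[0]))

    def contains(self, word):
        if not word:
            return False
        node = self._heads.get(word[0])
        while node is not None:
            if node.rest == word[1:]:
                return True
            node = node.next
        return False


class ExtensibleDictionary:
    """System dictionary plus user dictionary behind one query interface."""

    def __init__(self, system_words):
        self.system = SystemDictionary(system_words)
        self.user = UserDictionary()

    def contains(self, word):
        return self.system.contains(word) or self.user.contains(word)

    def longest_match(self, text, start=0, max_len=8):
        # Hypothetical helper: try the longest candidate first, then shrink.
        end = min(len(text), start + max_len)
        for stop in range(end, start, -1):
            if self.contains(text[start:stop]):
                return text[start:stop]
        return None


d = ExtensibleDictionary(["中国", "中国人", "中文", "分词"])
d.user.add("中文分词")
print(d.longest_match("中文分词系统"))  # 中文分词
```

Under these assumptions, new words go only into the linked-list tier, so the sorted system tier is never rebuilt, which is one plausible reading of why the paper pairs the two structures.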
This article is indexed in CNKI, Wanfang Data, and other databases.