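A New Dictionary Mechanism for Chinese Automatic Word Segmentation: The Word-Value Hash Mechanism
Cite this article: HAN Ying, WANG Mao-Fa, CHEN Xin-Fang, PAN Zhi-An, ZHANG Yan-Xia. A New Dictionary Mechanism for Chinese Automatic Word Segmentation: The Word-Value Hash Mechanism[J]. Computer Systems & Applications, 2013, 22(2): 233-235.
Authors: HAN Ying, WANG Mao-Fa, CHEN Xin-Fang, PAN Zhi-An, ZHANG Yan-Xia
Affiliation (all authors): Department of Disaster Information Engineering, Institute of Disaster Prevention, Beijing 101601, China
Abstract: Dictionary lookup is a fundamental part of Chinese information processing systems and has a significant effect on system efficiency. Research on dictionary mechanisms for Chinese word segmentation has been under way in China since the mid-to-late 1980s. To improve the lookup efficiency of existing dictionary-based segmentation mechanisms, a new dictionary mechanism is proposed for words of at most four characters: a chained (zipper-style) hash mechanism based on the positional value of the character string, called the word-value hash mechanism. The internal code of each Chinese character is re-encoded, a word value is computed for each word using positional notation, and a chained word-value hash structure is built on these values, which speeds up lookup and matching.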

Keywords: Chinese information processing; Chinese word segmentation; dictionary mechanism; base-2000 numeral system; chained word-value hash mechanism
Received: 8/3/2012
Revised: 9/6/2012

New Dictionary Mechanism for Chinese Word Segmentation
HAN Ying, WANG Mao-Fa, CHEN Xin-Fang, PAN Zhi-An, ZHANG Yan-Xia. New Dictionary Mechanism for Chinese Word Segmentation[J]. Computer Systems & Applications, 2013, 22(2): 233-235.
Authors: HAN Ying, WANG Mao-Fa, CHEN Xin-Fang, PAN Zhi-An, ZHANG Yan-Xia
Affiliation (all authors): Department of Disaster Information Engineering, Institute of Disaster Prevention, Beijing 101601, China
Abstract: Dictionary lookup is an essential part of Chinese information processing systems and has a great impact on system efficiency. Dictionary mechanisms for Chinese word segmentation have been studied since the late 1980s. To improve the lookup efficiency of existing dictionary-based mechanisms, a new hashing scheme is proposed for short words of no more than four Chinese characters: a zipper-style (chained) hash index based on the value of each character in the word. The hash value is calculated from the machine code of each character, with a character further to the left carrying a greater weight than the characters to its right. The weight is equal to the maximum value over all Chinese characters minus the minimum value. This zipper-style word-value hash index improves the speed of word lookup.
Keywords: Chinese information processing; Chinese word segmentation; dictionary mechanism; base-2000 numeral system; zipper-style Chinese word-value hash indexing
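
The abstract gives only the outline of the mechanism, so the following Python sketch is one possible reading of it, not the authors' implementation. It assumes a re-encoding that assigns each character appearing in the dictionary a dense code starting at 1, takes the radix to be the number of distinct codes plus one (the keywords mention base 2000, but the abstract does not say how that figure is derived), and uses a simple modulo to pick the bucket. The class name `WordValueHashDict` and the parameter `num_buckets` are hypothetical.

```python
MAX_WORD_LEN = 4  # the abstract limits the scheme to words of at most 4 characters


class WordValueHashDict:
    """Chained hash dictionary keyed by a positional word value (illustrative only)."""

    def __init__(self, words, num_buckets=1024):
        words = [w for w in words if 0 < len(w) <= MAX_WORD_LEN]
        # Assumed re-encoding: dense codes 1..n in code-point order; 0 is never used,
        # so words of different lengths cannot share a word value.
        chars = sorted({ch for w in words for ch in w})
        self.code = {ch: i + 1 for i, ch in enumerate(chars)}
        self.base = len(chars) + 1  # assumed radix, one larger than the largest code
        self.num_buckets = num_buckets
        self.buckets = [[] for _ in range(num_buckets)]  # one chain per bucket
        for w in words:
            value = self.word_value(w)
            chain = self.buckets[value % num_buckets]
            if value not in chain:
                chain.append(value)

    def word_value(self, word):
        """Return the positional value of word, or None if it cannot be encoded."""
        if not 0 < len(word) <= MAX_WORD_LEN:
            return None
        value = 0
        for ch in word:
            c = self.code.get(ch)
            if c is None:  # character never seen in the dictionary
                return None
            value = value * self.base + c  # characters to the left weigh more
        return value

    def __contains__(self, word):
        value = self.word_value(word)
        if value is None:
            return False
        # Walk the chain comparing integers instead of strings.
        return value in self.buckets[value % self.num_buckets]


dictionary = WordValueHashDict(["中文", "分词", "词典", "中文信息"])
print("中文" in dictionary, "信息" in dictionary)
```

Because every code is nonzero and smaller than the radix, each word of up to four characters maps to a distinct integer in this sketch, so a lookup reduces to one value computation and a scan of a short integer chain.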