
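Research on a Mechanical-Statistical Word Segmentation System Based on a Hash Structure
Cite this article: Fei Hongxiao, Hu Haimiao, Gong Yanling. Research on a Mechanical-Statistical Word Segmentation System Based on a Hash Structure[J]. Computer Engineering and Applications, 2006, 42(5): 159-161.
Authors: Fei Hongxiao, Hu Haimiao, Gong Yanling
Affiliation: School of Information Science and Engineering, Central South University, Hunan 410075
Funding: Project funded by the Chinese Academy of Sciences; Natural Science Foundation of Hunan Province
Abstract: Building on a comparison of the commonly used mechanical and statistical word segmentation methods, this paper designs and implements a mechanical-statistical word segmentation system based on a hash structure. The system goes beyond the traditional mechanical and statistical methods by combining the strengths of both and making a series of improvements to them. Analysis of the test results shows that the system segments more than 12,000 Chinese characters per second and has a strong ability to recognize out-of-vocabulary (unregistered) words.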

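Keywords: Chinese word segmentation; mechanical word segmentation; statistical word segmentation; hash structure
Article ID: 1002-8331-(2006)05-0159-03
Received: 2005-09
Revised: 2005-09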

A Kind of Machine-Statistics System Based on Hash Structure for Chinese Word Segmentation
Fei Hongxiao, Hu Haimiao, Gong Yanling. A Kind of Machine-Statistics System Based on Hash Structure for Chinese Word Segmentation[J]. Computer Engineering and Applications, 2006, 42(5): 159-161.
Authors: Fei Hongxiao, Hu Haimiao, Gong Yanling
Affiliation: Information Science and Engineering College of Central South University, Changsha 410075
Abstract: Based on a comprehensive comparison of the commonly used mechanical and statistical Chinese word segmentation methods, a mechanical-statistical Chinese word segmentation system based on a hash structure is proposed and implemented. The system goes beyond the traditional mechanical and statistical segmentation methods and combines the advantages of both. The paper also makes a series of improvements to these two methods. Analysis of the test results shows that the system's segmentation speed exceeds 12,000 Chinese characters per second; furthermore, the system has a strong capacity for identifying new words that do not exist in the dictionary.
Keywords: Chinese word segmentation; mechanical Chinese word segmentation; statistical Chinese word segmentation; hash structure
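
The abstract does not describe the system's data structures or matching procedure, so the following is only an illustrative sketch of the general idea behind hash-based mechanical (dictionary) segmentation, not the authors' method. It assumes a hash index keyed by the first character of each dictionary word and uses forward maximum matching, a common mechanical technique. The names `build_index`, `segment`, and `max_len`, and the tiny dictionary, are all hypothetical. The statistical component mentioned in the abstract is not modeled here.

```python
from collections import defaultdict


def build_index(words):
    """Hypothetical hash index: first character -> set of words starting with it."""
    index = defaultdict(set)
    for word in words:
        index[word[0]].add(word)
    return index


def segment(text, index, max_len=4):
    """Forward maximum matching: at each position take the longest dictionary word.

    Characters that start no dictionary word are emitted as single-character
    tokens; a real system would pass such spans to a statistical step.
    """
    tokens = []
    i = 0
    while i < len(text):
        # One hash lookup narrows the candidates to words sharing the first character.
        candidates = index.get(text[i], ())
        match = text[i]  # fallback: single character
        # Try the longest window first, shrinking down to two characters.
        for n in range(min(max_len, len(text) - i), 1, -1):
            piece = text[i:i + n]
            if piece in candidates:
                match = piece
                break
        tokens.append(match)
        i += len(match)
    return tokens


# Toy dictionary, purely for illustration.
index = build_index(["中文", "分词", "系统", "中文分词"])
print(segment("中文分词系统", index))
```

Keying the index by first character means each position costs one hash lookup plus at most `max_len - 1` set-membership checks, which is one plausible reason a hash structure helps segmentation speed.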
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.