
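A Chinese Word Automatic Segmentation System Based on String Frequency Statistics Combined with Word Matching
Cite this article: Liu Ting, Wu Yan. A Chinese Word Automatic Segmentation System Based on String Frequency Statistics Combined with Word Matching [J]. Journal of Chinese Information Processing, 1998, 12(1): 17-25.
Authors: Liu Ting, Wu Yan
Affiliation: Department of Computer Science, Harbin Institute of Technology
Abstract: This paper describes a software system for automatic Chinese word segmentation. The system scans the source text in three passes. In the first pass, segmentation marks are used to cut the text into a sequence of short Chinese character strings. In the second pass, a weight is computed for every substring of each short string from its frequency in the context, and substrings with large weights are taken as candidate words. In the third pass, the short strings are segmented using the candidate word set together with a dictionary of common words. Experiments show that the system's segmentation precision is about 1.5% and that it recognizes most new words, which makes it particularly suitable for fields such as document retrieval.

Keywords: Chinese information processing; automatic word segmentation; Chinese; string frequency statistics; word-form matching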

A Chinese Word Automatic Segmentation System Based on String Frequency Statistics Combined with Word Matching
Liu Ting, Wu Yan, Wang Kaizhu. A Chinese Word Automatic Segmentation System Based on String Frequency Statistics Combined with Word Matching [J]. Journal of Chinese Information Processing, 1998, 12(1): 17-25.
Authors: Liu Ting, Wu Yan, Wang Kaizhu
Institution: Dept. of Computer Science, Harbin Institute of Technology, 150001
Abstract: This paper presents a software system for automatic Chinese word segmentation. The original text is scanned three times: first, the text is cut into a sequence of short Chinese character strings at cut marks; second, every substring of each short string is weighted by its frequency in context, and the heavily weighted substrings are regarded as candidate words; third, the short strings are segmented using the candidate word set and a dictionary of everyday words. Experimental results show that the segmentation precision of this system is about 1.5%, and that a large part of new words can be recognized correctly. The system is well suited to document retrieval and similar areas.
Keywords: Chinese Information Processing; Automatic Word Segmentation; Software System
This article is indexed in CNKI, VIP, and other databases.
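
The three-pass procedure summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the cut marks, the weighting formula, or the matching strategy, so everything specific here is an assumption made for illustration: any non-Chinese character is treated as a cut mark, a substring's weight is simply its raw frequency with a hypothetical `min_freq` threshold, and the third pass uses forward maximum matching. The function names and the `max_len` parameter are likewise hypothetical and are not taken from the paper.

```python
import re
from collections import Counter

# Assumption: a "cut mark" is any character outside the basic CJK block.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def split_short_strings(text):
    """Pass 1: cut the text into short Chinese character strings."""
    return CJK_RUN.findall(text)


def candidate_words(short_strings, max_len=4, min_freq=2):
    """Pass 2: weight every substring by its frequency in the text.

    Assumption: the weight is the raw count, and a substring becomes a
    candidate word when it occurs at least min_freq times.
    """
    freq = Counter()
    for s in short_strings:
        for n in range(2, max_len + 1):
            for i in range(len(s) - n + 1):
                freq[s[i:i + n]] += 1
    return {w for w, count in freq.items() if count >= min_freq}


def segment(short_string, lexicon, max_len=4):
    """Pass 3: forward maximum matching against the lexicon.

    Assumption: the longest lexicon entry wins; a single character is
    emitted when nothing longer matches.
    """
    words, i = [], 0
    while i < len(short_string):
        for n in range(min(max_len, len(short_string) - i), 0, -1):
            piece = short_string[i:i + n]
            if n == 1 or piece in lexicon:
                words.append(piece)
                i += n
                break
    return words


def segment_text(text, dictionary, max_len=4, min_freq=2):
    """Run all three passes; the lexicon is dictionary plus candidates."""
    shorts = split_short_strings(text)
    lexicon = set(dictionary) | candidate_words(shorts, max_len, min_freq)
    return [segment(s, lexicon, max_len) for s in shorts]


if __name__ == "__main__":
    sample = "分词系统很好,分词系统很快。"
    print(segment_text(sample, {"很好", "很快"}))
```

In this toy input the string 分词系统 is absent from the dictionary but occurs twice, so the second pass promotes it to a candidate word and the third pass keeps it whole. Raw frequency also promotes overlapping fragments such as 词系, which is one reason a real system would need a more careful weighting than the one assumed here.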