An Algorithm for Fast Frequent String Extraction Based on Level-wise Scan
Chinese title: 一种基于逐层扫描的频繁字串快速提取算法
Cite this article: ZHANG Yu-meng, LIU Chuan-han. An Algorithm for Fast Frequent String Extraction Based on Level-wise Scan [J]. Computer Science (计算机科学), 2008, 35(5): 127-130
Authors: ZHANG Yu-meng (张宇萌) (1, 2), LIU Chuan-han (刘传汉) (1)
Affiliation: (1) Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200030, China; (2) Department of Information Management, School of Business, Ningbo University, Ningbo 315211, China
Abstract: String frequency statistics is a simple and effective way to extract unlisted (out-of-vocabulary) words. This paper presents a fast method for extracting and counting frequent strings. It discovers frequent strings quickly through a level-wise scan, corrects the effective occurrence frequency of each string, and finally extracts the strings whose average mutual information reaches a threshold. Experimental results show that the method is effective and feasible.

Keywords: Frequent string; Chinese automatic word extraction; Level-wise scan; Mutual information
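
The abstract names three steps (level-wise scan, frequency correction, mutual-information filtering) but gives no details, so the following is only a minimal Python sketch of one plausible reading, not the authors' algorithm. The function names, the parameters `min_freq`, `max_len`, and `mi_threshold`, the rule that an occurrence is discounted when it lies inside an occurrence of a longer frequent string, and the definition of average mutual information as the mean over all two-way splits are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def level_wise_scan(text, min_freq=2, max_len=6):
    """Return {string: raw count} for substrings occurring >= min_freq times.

    A length-n candidate is counted only if its length-(n-1) prefix and
    suffix were both frequent at the previous level (assumed pruning rule).
    """
    prev = {s: c for s, c in Counter(text).items() if c >= min_freq}
    freq = dict(prev)
    n = 2
    while prev and n <= max_len:
        counts = Counter()
        for i in range(len(text) - n + 1):
            s = text[i:i + n]
            if s[:-1] in prev and s[1:] in prev:
                counts[s] += 1
        prev = {s: c for s, c in counts.items() if c >= min_freq}
        freq.update(prev)
        n += 1
    return freq


def corrected_counts(text, freq):
    """Count only occurrences not contained in a longer frequent occurrence.

    This correction rule is an assumption; the abstract does not define it.
    """
    valid = Counter()
    reach = [0] * len(text)  # reach[a]: furthest end of a longer span starting at a
    for n in sorted({len(s) for s in freq}, reverse=True):
        # best[i]: furthest end among longer frequent spans starting at or before i
        best, furthest = [], 0
        for a in range(len(text)):
            furthest = max(furthest, reach[a])
            best.append(furthest)
        starts = [i for i in range(len(text) - n + 1) if text[i:i + n] in freq]
        for i in starts:
            if best[i] < i + n:  # no longer span covers [i, i + n)
                valid[text[i:i + n]] += 1
        for i in starts:  # register this level's spans for shorter strings
            reach[i] = max(reach[i], i + n)
    return valid


def average_mi(s, freq, total):
    """Mean over all split points of log2(p(s) / (p(left) * p(right)))."""
    splits = range(1, len(s))
    return sum(
        log2(freq[s] * total / (freq[s[:k]] * freq[s[k:]])) for k in splits
    ) / len(splits)


def extract(text, min_freq=2, max_len=6, mi_threshold=1.0):
    """Run the three steps and return {string: corrected count}."""
    freq = level_wise_scan(text, min_freq, max_len)
    valid = corrected_counts(text, freq)
    return {
        s: valid[s]
        for s in freq
        if len(s) > 1
        and valid[s] >= min_freq
        and average_mi(s, freq, len(text)) >= mi_threshold
    }


print(extract("abcxabcyabcz"))  # {'abc': 3}
```

In this toy input, "ab" and "bc" each occur three times, but every occurrence lies inside "abc", so their corrected counts drop to zero and only "abc" survives. Real Chinese text would be passed in the same way, since Python strings index by character.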
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.