
An Algorithm of Fast Frequent String Extracting Based on Level-wise Scan

Citation:	ZHANG Yu-meng, LIU Chuan-han. An Algorithm of Fast Frequent String Extracting Based on Level-wise Scan[J]. Computer Science, 2008, 35(5): 127-130

Authors:	ZHANG Yu-meng (1,2), LIU Chuan-han (1)

Affiliations:	1. Department of Computer Science and Engineering, Shanghai Jiaotong University, Shanghai 200030, China; 2. Department of Information Management, Business School, Ningbo University, Ningbo 315211, China

Abstract:	String frequency statistics is a simple and effective way to extract unlisted (out-of-vocabulary) words. This paper presents a fast algorithm for extracting frequent strings and counting their frequencies. It uses a level-wise scan to find frequent strings quickly and corrects the valid frequency with which each string appears in the text. Finally, the high-frequency strings whose average mutual information reaches a threshold are extracted. Experimental results show that the method is effective and feasible.

Keywords:	Frequent string; Chinese automatic word extraction; Level-wise scan; Mutual information
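
The abstract names three stages: a level-wise scan for frequent strings, a correction of each string's valid frequency, and a filter on average mutual information. The Python sketch below is one possible reading of that pipeline, not the authors' implementation. The abstract does not give the details, so the following are all assumptions made for illustration: the function names, the default thresholds (`min_freq`, `max_len`, `mi_threshold`), the Apriori-style pruning rule, the correction rule (subtract the count of the most frequent one-character-longer superstring), and the mutual-information formula (pointwise mutual information averaged over every two-way split of the string).

```python
import math
from collections import Counter


def frequent_strings(text, min_freq=2, max_len=4):
    """Level-wise scan: count a length-k string only if both of its
    length-(k-1) substrings were frequent at the previous level."""
    counts = {}
    prev = set()  # frequent strings found at the previous level
    for k in range(1, max_len + 1):
        level = Counter()
        for i in range(len(text) - k + 1):
            s = text[i:i + k]
            # Pruning (assumed rule): a frequent string cannot contain
            # an infrequent substring, so skip such candidates.
            if k == 1 or (s[:-1] in prev and s[1:] in prev):
                level[s] += 1
        prev = {s for s, c in level.items() if c >= min_freq}
        if not prev:
            break  # no frequent string at this level, so none at longer ones
        for s in prev:
            counts[s] = level[s]
    return counts


def corrected_counts(counts):
    """Assumed correction rule: discount occurrences of a string that are
    explained by its most frequent one-character-longer superstring."""
    best_super = Counter()
    for s, c in counts.items():
        if len(s) > 1:
            for sub in (s[:-1], s[1:]):
                best_super[sub] = max(best_super[sub], c)
    return {s: c - best_super[s] for s, c in counts.items()}


def avg_mutual_information(s, counts, n):
    """Assumed formula: pointwise mutual information between the two parts
    of s, averaged over every split point, using raw counts."""
    def p(x):
        return counts[x] / (n - len(x) + 1)

    splits = [math.log2(p(s) / (p(s[:i]) * p(s[i:]))) for i in range(1, len(s))]
    return sum(splits) / len(splits)


def extract(text, min_freq=2, max_len=4, mi_threshold=1.0):
    """Run the three stages and return the surviving multi-character strings."""
    counts = frequent_strings(text, min_freq, max_len)
    valid = corrected_counts(counts)
    n = len(text)
    return sorted(
        s for s in counts
        if len(s) > 1
        and valid[s] >= min_freq
        and avg_mutual_information(s, counts, n) >= mi_threshold
    )


print(extract("abcxabcyabcz"))  # → ['abc']
```

In the toy input, `ab` and `bc` are frequent but never occur outside `abc`, so the assumed correction reduces their valid frequency to zero and only `abc` is returned. The functions operate on any Python `str`, so the same sketch can be applied to Chinese text character by character.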
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.