Fast Approximate String Matching for Chinese Text (快速中文字符串模糊匹配算法)


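Cite this article:	陈开渠 (CHEN Kai-qu), 赵洁 (ZHAO Jie), 彭志威 (PENG Zhi-wei). Fast Approximate String Matching for Chinese Text [J]. Journal of Chinese Information Processing (中文信息学报), 2004, 18(2): 59-66.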

Authors:	陈开渠 (CHEN Kai-qu), 赵洁 (ZHAO Jie), 彭志威 (PENG Zhi-wei)

Affiliation:	ZTE Corporation (中兴通讯股份有限公司)

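Abstract:	This paper addresses the two main problems in approximate matching of Chinese strings: space and time. The two main current approaches to approximate string matching are the bit-vector method and the filter method. Because there are so many Chinese characters, applying the bit-vector method requires a large amount of memory. For small computers with very little memory, such as embedded systems, this can be a problem. This paper improves the bit-vector method so that, when it is applied to Chinese strings, its space requirement drops to about 5%. Exploiting the very large number of Chinese characters, the paper also proposes a new filter-based approximate matching algorithm for Chinese strings, BPM-BM, which is at least 14% faster than the fastest known algorithms and, in most cases, 1.5 to 2 times as fast.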
Keywords:	computer application; Chinese information processing; string matching; approximate matching; Chinese string matching
Article ID:	1003-0077(2004)02-0058-08
Revision received:	March 2, 2003
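
The abstract does not say how the paper reduces the memory used by the bit-vector method, so the following is only a minimal sketch of the general idea, not the authors' technique. It assumes a Myers-style bit-parallel scan and, as one common way to avoid a dense per-character table, stores match masks in a dictionary keyed only by the characters that occur in the pattern. A dense table indexed by every 16-bit code unit would hold 65,536 masks, while the dictionary holds at most one mask per pattern character. The names `build_masks` and `bit_vector_search` are hypothetical.

```python
def build_masks(pattern):
    """Map each distinct pattern character to a bitmask of its positions.

    Assumption: a sparse dict replaces a dense table over the whole
    character set; characters absent from the pattern implicitly map to 0.
    """
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    return masks


def bit_vector_search(pattern, text, k):
    """Return every index j such that some substring of text ending at j
    is within edit distance k of pattern (bit-parallel column updates)."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    masks = build_masks(pattern)
    full = (1 << m) - 1      # keeps Python's unbounded ints to m bits
    top = 1 << (m - 1)       # bit of the last pattern position
    pv, mv = full, 0         # positive / negative vertical deltas
    score = m                # edit distance in the bottom cell of the column
    ends = []
    for j, ch in enumerate(text):
        eq = masks.get(ch, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & full
        ph = (mv | ~(xh | pv)) & full   # positive horizontal deltas
        mh = pv & xh                    # negative horizontal deltas
        if ph & top:
            score += 1
        elif mh & top:
            score -= 1
        ph = (ph << 1) & full           # no carry-in: a match may start anywhere
        mh = (mh << 1) & full
        pv = (mh | ~(xv | ph)) & full
        mv = ph & xv
        if score <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    # One substituted character (湖 for 糊) is tolerated with k = 1.
    print(bit_vector_search("模糊匹配", "字符串模湖匹配算法", 1))
```

The dictionary lookup costs more per character than an array index, which is the usual trade-off when memory matters more than raw lookup speed.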
Fast Approximate String Matching for Chinese Text

CHEN Kai-qu, ZHAO Jie, PENG Zhi-wei. Fast Approximate String Matching for Chinese Text [J]. Journal of Chinese Information Processing, 2004, 18(2): 59-66.

Authors:	CHEN Kai-qu, ZHAO Jie, PENG Zhi-wei

Affiliation:	ZTE Corporation

Abstract:	There are currently two effective approaches to approximate string matching: the bit-vector method and the filter method. Because the Chinese character set is very large, the bit-vector method requires a large amount of memory. This can be a problem for small computers with little memory, such as embedded systems. We present a new bit-vector method that needs only about 5% of the memory of the original bit-vector method. We also exploit the large size of the Chinese character set to develop a new filter method, BPM-BM, for approximate string matching of Chinese text. It runs at least 14% faster than the fastest known algorithms and, in most cases, is 1.5 to 2 times as fast.

Keywords:	computer application; Chinese information processing; string matching; approximate matching; Chinese string matching
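
The abstract gives no internals for BPM-BM, so the sketch below is not that algorithm. It only illustrates the filter idea in its generic pigeonhole form: if a pattern is split into k + 1 pieces, any occurrence with at most k errors must contain at least one piece unchanged, so only text near an exact piece hit needs full verification. With a very large character set, exact piece hits are rare, which is presumably why filtering suits Chinese text. The names `filter_search` and `verify_window` are hypothetical, Python's `str.find` stands in for a Boyer-Moore-style exact scan, and a plain dynamic-programming check stands in for the verifier.

```python
def verify_window(pattern, window, k):
    """Column-wise dynamic programming over a short text window.

    Returns window-relative end indices whose best match is within k edits.
    """
    m = len(pattern)
    col = list(range(m + 1))         # col[0] stays 0: a match may start anywhere
    ends = []
    for j, ch in enumerate(window):
        diag = col[0]
        for i in range(1, m + 1):
            old = col[i]
            col[i] = min(diag + (pattern[i - 1] != ch),  # match or substitute
                         old + 1,                         # extra text character
                         col[i - 1] + 1)                  # missing pattern character
            diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


def filter_search(pattern, text, k):
    """Return sorted end indices of matches with at most k edits."""
    m = len(pattern)
    if m <= k:
        raise ValueError("pattern must be longer than k")
    # Split the pattern into k + 1 non-empty pieces of near-equal length.
    base, extra = divmod(m, k + 1)
    pieces, start = [], 0
    for i in range(k + 1):
        size = base + (1 if i < extra else 0)
        pieces.append((start, pattern[start:start + size]))
        start += size
    found = set()
    for offset, piece in pieces:
        hit = text.find(piece)
        while hit != -1:
            # Any match containing this hit lies inside this window.
            lo = max(0, hit - offset - k)
            hi = min(len(text), hit - offset + m + k)
            for end in verify_window(pattern, text[lo:hi], k):
                found.add(lo + end)
            hit = text.find(piece, hit + 1)   # allow overlapping hits
    return sorted(found)


if __name__ == "__main__":
    print(filter_search("模糊匹配", "字符串模湖匹配算法", 1))
```

Overlapping windows are verified more than once here for simplicity; a production filter would merge them first.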
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.