
Research and Implementation of a Fuzzy Matching Algorithm for Chinese Information Retrieval Systems

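Cite this article:	WANG Jing-fan, WU Xiao-jun, XIA Yun-qing, ZHENG Fang. Research and Implementation of a Fuzzy Matching Algorithm for Chinese Information Retrieval Systems[J]. Journal of Chinese Information Processing, 2007, 21(6): 59-64.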

Authors:	WANG Jing-fan, WU Xiao-jun, XIA Yun-qing, ZHENG Fang

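Affiliation:	Center for Speech and Language Technologies, Division of Technical Innovation and Development, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China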

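Abstract:	In modern Chinese information retrieval systems, the string a user enters often deviates locally from the entries actually stored in the database, and retrieval techniques based on keyword matching do not handle this well. This paper builds on and improves the filtering algorithm proposed by Tarhio and Ukkonen [1]. To address the mixing of homophones and near-homophones that commonly occurs with Chinese Pinyin input methods, the algorithm is further extended to a generalized Edit Distance. Experiments show that the proposed algorithm effectively improves the recall of Chinese information retrieval systems and achieves "sublinear" efficiency in practical applications.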
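Keywords:	computer applications; Chinese information processing; fuzzy matching; filtering algorithm; dynamic programming
Article ID:	1003-0077(2007)06-0059-06
Received:	2007-01-09
Revised:	2007-09-10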
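
The generalized Edit Distance mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' definition, which the abstract does not give; it only assumes that substituting a character by a homophone costs less than an arbitrary substitution. The pronunciation table `PINYIN`, the function names, and the cost value 0.5 are all hypothetical, and the table uses toneless pinyin for a handful of characters.

```python
from typing import Dict

# Hypothetical toy table of toneless pinyin readings (an assumption for this
# sketch); a real system would use a full character-to-pinyin dictionary.
PINYIN: Dict[str, str] = {
    "清": "qing", "青": "qing", "轻": "qing",
    "华": "hua", "花": "hua", "话": "hua",
    "大": "da", "学": "xue",
}


def sub_cost(a: str, b: str, homophone_cost: float = 0.5) -> float:
    """Substitution cost: 0 if equal, reduced if same reading, else 1."""
    if a == b:
        return 0.0
    pa, pb = PINYIN.get(a), PINYIN.get(b)
    if pa is not None and pa == pb:
        return homophone_cost  # assumed discount for homophones
    return 1.0


def generalized_edit_distance(s: str, t: str) -> float:
    """Weighted edit distance computed by row-wise dynamic programming."""
    # prev[j] holds the distance between s[:i-1] and t[:j].
    prev = [float(j) for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [float(i)] + [0.0] * len(t)
        for j in range(1, len(t) + 1):
            cur[j] = min(
                prev[j] + 1.0,                               # delete s[i-1]
                cur[j - 1] + 1.0,                            # insert t[j-1]
                prev[j - 1] + sub_cost(s[i - 1], t[j - 1]),  # substitute
            )
        prev = cur
    return prev[len(t)]
```

Under these assumed costs, a query such as "青花大学" lies closer to the entry "清华大学" than a query with two unrelated wrong characters would, which is the kind of tolerance to Pinyin input errors the abstract describes.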
An Approximate String Matching Algorithm for Chinese Information Retrieval Systems

WANG Jing-fan, WU Xiao-jun, XIA Yun-qing, ZHENG Fang. An Approximate String Matching Algorithm for Chinese Information Retrieval Systems[J]. Journal of Chinese Information Processing, 2007, 21(6): 59-64.

Authors:	WANG Jing-fan, WU Xiao-jun, XIA Yun-qing, ZHENG Fang

Affiliation:	Center for Speech and Language Technologies, Division of Technical Innovation and Development, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

Abstract:	In modern Chinese information retrieval systems, classical keyword-based string matching fails when the input string differs from the entries in the database. This paper proposes a method based on Tarhio and Ukkonen's filtering algorithm to solve this problem. Because text entered with Chinese Pinyin input methods often contains characters that have been confused with others of the same or similar pronunciation, we define a special Edit Distance and extend our method accordingly. The experimental results show that our algorithm improves the recall of retrieval systems and achieves sublinear complexity in practice.
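
For orientation, approximate matching of a query inside a longer entry can be sketched with the classical dynamic-programming recurrence. This is the plain baseline, not the filtering algorithm of the paper: filtering approaches generally try to skip text regions that cannot contain a match and reserve this kind of computation for the remaining candidates. The function name `approx_find` and the unit edit costs are assumptions made for this sketch.

```python
from typing import List, Tuple


def approx_find(pattern: str, text: str, k: int) -> List[Tuple[int, int]]:
    """Return (end_position, distance) for every 1-based text position where
    some substring ending there matches `pattern` within k unit-cost edits."""
    m = len(pattern)
    # col[i] = distance between pattern[:i] and the best substring of the
    # text ending at the current position; before any text is read it is i.
    col = list(range(m + 1))
    hits: List[Tuple[int, int]] = []
    for j, ch in enumerate(text, start=1):
        prev_diag = col[0]  # value of row 0 in the previous column
        col[0] = 0          # a match may start at any text position
        for i in range(1, m + 1):
            old = col[i]    # row i of the previous column
            col[i] = min(
                old + 1,                           # extra text character
                col[i - 1] + 1,                    # missing pattern character
                prev_diag + (pattern[i - 1] != ch),  # match or substitution
            )
            prev_diag = old
        if col[m] <= k:
            hits.append((j, col[m]))
    return hits
```

Each text character costs time proportional to the pattern length, so the whole scan is linear in the text length; the sublinear behaviour reported in the abstract comes from filtering, which this sketch does not attempt.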

Keywords:	computer applications; Chinese information processing; approximate matching; filtering algorithm; dynamic programming
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.