基于多重索引模型的大规模词典近似匹配算法 An Approximate Matching Algorithm for Large Scale Lexicons期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

基于多重索引模型的大规模词典近似匹配算法

引用本文：	龚才春,黄玉兰,许洪波,白硕.基于多重索引模型的大规模词典近似匹配算法[J].计算机研究与发展,2008,45(10).

作者姓名：	龚才春黄玉兰许洪波白硕

作者单位：	1. 中国科学院计算技术研究所,北京,100190;北京市计算中心,北京,100005 2. 中国科学院计算技术研究所,北京,100190

基金项目：	国家"九七三"重点基础研究发展规划基金项目，国家"八六三"高技术研究发展计划基金项目

摘要：	编辑器的拼写校正、搜索引擎的查询纠正、光学字符识别的结果检查等领域都用到词典近似匹配算法.传统单索引模式很难在高性能的前提下保证高召回率.词典越大问题越严重.提出了大规模词典近似匹配的多重索引模型,首先将背景词典根据单词长度划分为若干子词典,对各子词典按照一定策略建立unigram,bigram,trigram,quadgram中的一种或若干种索引,当查找用户模式P的近似匹配时,根据模式P检索特定N-gram索引链,从而得到候选近似匹配集合C,对C中每一个单词W,计算P与W的编辑距离即可输出P的所有最终匹配结果R.实验表明,基于多重索引模型的词典近似匹配算法能够大幅度减少候选近似匹配结果的数量,从而提高词典近似匹配的速度.
关键词：	模式匹配近似匹配多重索引模型大规模词典拼写检查
An Approximate Matching Algorithm for Large Scale Lexicons

Gong Caichun,Huang Yulan,Xu Hongbo,Bai Shuo.An Approximate Matching Algorithm for Large Scale Lexicons[J].Journal of Computer Research and Development,2008,45(10).

Authors:	Gong Caichun Huang Yulan Xu Hongbo Bai Shuo

Affiliation:	Gong Caichun1,2,Huang Yulan1,Xu Hongbo1,, Bai Shuo11(Institute of Computing Technology,Chinese Academy of Sciences,Beijing 100190)2(Beijing Municipal Computing Center,Beijing 100005)

Abstract:	Approximate lexicon matching is widely used for spelling correction of editors,query suggestion of search engines,post-processing of optical character recognizers and other applications.It is quite difficult for traditional single index schemes to obtain high recall and high performance at the same time.When the background lexicon becomes large,things go from worse to worst.A multiple-indices scheme is presented to handle this problem.The background dictionary is partitioned into several sub-dictionaries,ea...

Keywords:	pattern matching approximate matching multiple-indices scheme large scale lexicon spelling correction
本文献已被 CNKI 维普万方数据等数据库收录！

设为首页 | 免责声明 | 关于勤云 | 加入收藏