Ed-Sjoin:一种优化的字符串相似连接算法 Ed-Sjoin: An Optimal Algorithm for Similarity Joins with Edit Distance Constraints期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

Ed-Sjoin:一种优化的字符串相似连接算法

引用本文：	李璐,王宏志,李建中,高宏.Ed-Sjoin:一种优化的字符串相似连接算法[J].计算机研究与发展,2009,46(Z2).

作者姓名：	李璐王宏志李建中高宏

作者单位：	哈尔滨工业大学计算机科学与技术学院,哈尔滨,150001

基金项目：	国家"八六三"高技术研究发展计划基金项目，国家自然科学基金重点项目，国家自然科学基金项目，黑龙江省青年科技专项基金项目，NSFC/RGC联合科研基金项目

摘要：	相似连接(similarity join)在数据清洗、生物信息、模式识别等应用领域中有着广泛应用,其中基于编辑距离的字符串相似连接是一种重要的相似连接.尽管当前有一些基于编辑距离的字符串连接算法提出,然而,当前的算法存在着大量的多余计算,影响了算法的效率.为了高效计算基于编辑距离的字符串连接,提出了一种优化的算法Ed-sjoin,分别从优化筛选算法和基于前缀的重复消减策略两方面对算法进行优化,这些优化策略可以实现更加有效的剪枝,并且避免了部分重复计算,从而加速算法的执行.实验结果表明,提出的方法优于现有方法.
关键词：	相似连接编辑距离前缀
Ed-Sjoin: An Optimal Algorithm for Similarity Joins with Edit Distance Constraints

Li Lu,Wang Hongzhi,Li Jianzhong,Gao Hong.Ed-Sjoin: An Optimal Algorithm for Similarity Joins with Edit Distance Constraints[J].Journal of Computer Research and Development,2009,46(Z2).

Authors:	Li Lu Wang Hongzhi Li Jianzhong Gao Hong

Abstract:	Similarity join has been widely used in many applications,such as data integration and cleaning,bioinformatics,and pattern recognition.The edit-distance-based similarity join is one of the most important similarity join operators.Recently there are many similarity join algorithms based on edit distance.However,in these algorithms,there is a large amount of redundant calculation,which affects their efficiencies.In the paper,In order to calculate similarity join based on edit distance efficiently,a new algorithm,Ed-Sjoin,is proposed,which exploits two optimal filtering methods and two prefix-based redundant computation reducing methods.Experimental results show that the algorithm presented in this paper is more efficient than existing algorithms.

Keywords:	similarity join edit distance prefix
本文献已被万方数据等数据库收录！

设为首页 | 免责声明 | 关于勤云 | 加入收藏