Implementation of Algorithm Based on n-gram String Segmentation (基于n-gram的字符串分割技术的算法实现)
Citation: LI Wen, HONG Qin, TENG Zhong-jian, SHI Zhao-ying, HU Xiao-dan, LIU Hai-bo. Implementation of Algorithm Based on n-gram String Segmentation [J]. Computer and Modernization (计算机与现代化), 2010(9): 85-87, 91.
Authors: LI Wen  HONG Qin  TENG Zhong-jian  SHI Zhao-ying  HU Xiao-dan  LIU Hai-bo
Affiliation: School of Physics and Optoelectronics Technology, Fujian Normal University (Cangshan Campus), Fuzhou, Fujian 350007, China
Abstract: Fuzzy querying for similar strings has long been an active research topic. Current keyword-based query techniques mostly rely on prefix matching, so they cannot return results that are merely similar to the search string. This paper presents an algorithm for n-gram-based string segmentation, which is the foundation for keyword-based fuzzy querying: the strings in the data set and the search keywords are both segmented, and edit distance is then used to find similar strings. The technique has important applications in data mining, data cleaning, and the detection of plagiarized papers.

Keywords: fuzzy query; edit distance; n-gram; string segmentation
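
The abstract names the two building blocks, n-gram segmentation and edit distance, without giving the algorithm itself. The following is a minimal Python sketch of how they could be combined. It is not the authors' implementation: the function names, the `#` padding character, the defaults `n=2` and `max_dist=1`, and the shared-gram candidate filter are all assumptions made for illustration.

```python
def ngrams(s: str, n: int = 2, pad: str = "#") -> list[str]:
    """Split s into overlapping n-grams.

    Assumption: both ends are padded with n-1 copies of `pad`, so that
    the first and last characters also appear in n grams each.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    padded = pad * (n - 1) + s + pad * (n - 1)
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def fuzzy_search(query: str, dataset: list[str],
                 n: int = 2, max_dist: int = 1) -> list[str]:
    """Return dataset strings within max_dist edits of query.

    Hypothetical two-step scheme: strings sharing at least one n-gram
    with the query become candidates, then edit distance verifies them.
    The filter is a heuristic and can miss very short strings that
    share no n-gram with the query.
    """
    index: dict[str, set[str]] = {}
    for s in dataset:
        for g in set(ngrams(s, n)):
            index.setdefault(g, set()).add(s)

    candidates: set[str] = set()
    for g in ngrams(query, n):
        candidates |= index.get(g, set())

    return sorted(s for s in candidates if edit_distance(query, s) <= max_dist)
```

For example, `ngrams("data")` yields `["#d", "da", "at", "ta", "a#"]`, and `fuzzy_search("algoritm", ["algorithm", "logarithm", "n-gram"])` returns `["algorithm"]`, a match that prefix matching alone would not find because the query is misspelled.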
This article is indexed in VIP (维普), Wanfang Data (万方数据), and other databases.