Implementation of Algorithm Based on n-gram String Segmentation


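Cite this article:	LI Wen, HONG Qin, TENG Zhong-jian, SHI Zhao-ying, HU Xiao-dan, LIU Hai-bo. Implementation of Algorithm Based on n-gram String Segmentation[J]. Computer and Modernization, 2010(9): 85-87, 91.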

Authors:	LI Wen, HONG Qin, TENG Zhong-jian, SHI Zhao-ying, HU Xiao-dan, LIU Hai-bo

Affiliation:	School of Physics and Optoelectronics Technology, Fujian Normal University, Cangshan Campus, Fuzhou, Fujian 350007

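Abstract:	Fuzzy querying for similar strings has long been an active research direction. Current keyword-based query techniques rely on prefix matching and cannot find results that are merely similar to the search string. This paper presents an algorithm for n-gram-based string segmentation, a technique that underlies keyword-based fuzzy querying: the strings in the data set and the search keyword are both segmented, and edit distance is then used to perform fuzzy queries for similar strings. The technique has important applications in areas such as data mining and paper plagiarism.
Keywords:	fuzzy query; edit distance; n-gram; string segmentation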
Implementation of Algorithm Based on n-gram String Segmentation

LI Wen, HONG Qin, TENG Zhong-jian, SHI Zhao-ying, HU Xiao-dan, LIU Hai-bo. Implementation of Algorithm Based on n-gram String Segmentation[J]. Computer and Modernization, 2010(9): 85-87, 91.

Authors:	LI Wen, HONG Qin, TENG Zhong-jian, SHI Zhao-ying, HU Xiao-dan, LIU Hai-bo

Affiliation:	School of Physics and Optoelectronics Technology, Fujian Normal University, Cangshan Campus, Fuzhou 350007, China

Abstract:	Fuzzy querying for similar strings has recently attracted active research. Current keyword-based query techniques mostly rely on prefix matching, which cannot find results that are similar to the query string. This paper presents an n-gram string segmentation algorithm, which is the foundation for fuzzy querying. Both the data set and the keyword strings are segmented with it, and string edit distance is then used to perform fuzzy queries for similar strings. The technique has important applications in data cleaning and in detecting cloned papers.

Keywords:	fuzzy query; edit distance; n-gram; string segmentation
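
The abstract names two building blocks, n-gram segmentation of the data set and query strings, and edit distance for ranking similar strings, but gives no implementation details. The following Python sketch is one plausible way to combine them, not the authors' algorithm. The gram length `n=2`, the `#` padding character, the inverted index, the `max_dist` threshold, and all function names are assumptions made for illustration.

```python
from collections import defaultdict


def ngrams(s, n=2, pad="#"):
    """Split s into overlapping n-grams (assumed: pad both ends with n-1 chars)."""
    padded = pad * (n - 1) + s + pad * (n - 1)
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def edit_distance(a, b):
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def build_index(dataset, n=2):
    """Map each n-gram to the positions of the dataset strings containing it."""
    index = defaultdict(set)
    for pos, s in enumerate(dataset):
        for g in ngrams(s, n):
            index[g].add(pos)
    return index


def fuzzy_search(query, dataset, index, n=2, max_dist=1):
    """Return dataset strings within max_dist edits of query.

    Candidates must share at least one n-gram with the query, so very
    short strings that share no gram can be missed by this filter.
    """
    candidates = set()
    for g in ngrams(query, n):
        candidates |= index.get(g, set())
    return sorted(dataset[p] for p in candidates
                  if edit_distance(query, dataset[p]) <= max_dist)
```

For example, with `data = ["segment", "segments", "fragment", "string"]` and `index = build_index(data)`, calling `fuzzy_search("segmant", data, index)` returns `["segment"]`, a match that prefix matching on the misspelled query would not find.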
This article is indexed in databases including VIP and Wanfang Data.
