Research on Fuzzy Matching Duplicate Checking Algorithm Based on Matrix Model of Word Segmentation

Cite this article:	LI Cheng-long, YANG Dong-ju, HAN Yan-bo. Research on Fuzzy Matching Duplicate Checking Algorithm Based on Matrix Model of Word Segmentation[J]. Computer Science, 2017, 44(Z11): 55-60, 83

Authors:	LI Cheng-long, YANG Dong-ju, HAN Yan-bo

Affiliation:	Beijing Key Laboratory on Integration and Analysis of Large-scale Stream Data, Beijing 100144, China; Research Center for Cloud Computing, North China University of Technology, Beijing 100144, China (all three authors)

Funding:	Supported by the General Program of the National Natural Science Foundation of China (61672042), "Research on Data Service Models and Methods Supporting Real-time Linkage of Streaming Big Data"

Abstract:	To meet the need for duplicate checking of Chinese text, the target text to be checked and the sample text are converted, based on the results of word segmentation, into a word segmentation matrix model. The matrix is then scanned and analyzed to obtain the duplicate checking result. On this basis a duplicate checking algorithm is proposed, and worked examples show that it has a certain practical effect.

Keywords:	Similarity; Matrix model of word segmentation; Fuzzy matching; Duplicate checking algorithm
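The abstract does not give the details of the matrix model or of the scanning step, so the following is only a minimal sketch of one way such an approach could look, not the authors' algorithm. It assumes both texts have already been segmented into token lists, builds a 0/1 match matrix (rows are target tokens, columns are sample tokens), and scans each diagonal for runs of matches that may be interrupted by a few mismatches. The function names and the parameters `min_run` and `max_gap` are hypothetical and chosen for illustration.

```python
from typing import List, Set


def build_match_matrix(target: List[str], sample: List[str]) -> List[List[int]]:
    """Cell (i, j) is 1 when target token i equals sample token j, else 0."""
    return [[1 if t == s else 0 for s in sample] for t in target]


def covered_target_tokens(matrix: List[List[int]],
                          min_run: int = 2,
                          max_gap: int = 1) -> Set[int]:
    """Scan every diagonal and collect target indices that lie in a run of
    at least `min_run` matches, tolerating up to `max_gap` consecutive
    mismatches inside a run (the "fuzzy" part; an assumption of this sketch)."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    covered: Set[int] = set()
    # A diagonal is identified by its offset d = column - row.
    for d in range(-(rows - 1), cols):
        i = max(0, -d)
        run: List[int] = []  # target indices matched in the current run
        gap = 0              # consecutive mismatches since the last match
        while i < rows and i + d < cols:
            if matrix[i][i + d]:
                run.append(i)
                gap = 0
            else:
                gap += 1
                if gap > max_gap:
                    # The run is broken: keep it only if it is long enough.
                    if len(run) >= min_run:
                        covered.update(run)
                    run = []
                    gap = 0
            i += 1
        if len(run) >= min_run:
            covered.update(run)
    return covered


def duplicate_ratio(target: List[str], sample: List[str],
                    min_run: int = 2, max_gap: int = 1) -> float:
    """Fraction of target tokens that fall inside a fuzzy-matched run."""
    if not target:
        return 0.0
    matrix = build_match_matrix(target, sample)
    return len(covered_target_tokens(matrix, min_run, max_gap)) / len(target)


if __name__ == "__main__":
    target = ["convert", "text", "into", "matrix", "model"]
    sample = ["turn", "text", "into", "one", "model", "then", "scan"]
    print(duplicate_ratio(target, sample))             # gaps of one token allowed
    print(duplicate_ratio(target, sample, max_gap=0))  # exact runs only
```

In this sketch, raising `max_gap` makes the match more tolerant of inserted or replaced words, while raising `min_run` suppresses coincidental matches of single common words. For Chinese input, the token lists would come from a word segmenter applied beforehand.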
