Calculation of the Text Similarity Based on Text Categorization (基于文本分类的文档相似度计算)

Citation:	Zhao Junjie, Hu Xuegang. Calculation of the Text Similarity Based on Text Categorization [J]. Microcomputer Applications (微型电脑应用), 2008, 24(12): 46-47

Authors:	Zhao Junjie (赵俊杰), Hu Xuegang (胡学钢)

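Affiliations:	1. Hefei University of Technology, Bengbu, Anhui 233061; 2. School of Computer and Information, Hefei University of Technology, Hefei, Anhui 230009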

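Funding:	Ministry of Education Social Science Research Fund (Youth Project); Anhui University of Finance and Economics Key Teaching Research Project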

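Abstract:	To find all documents similar to a specified document among thousands of documents, the first step is to determine the specified document's category, that is, to classify it. Once the category is known, a further computation finds every document in that category whose content is similar to the specified document. Because computing document similarity closely resembles the text classification process, the classification result of the specified document, namely its category and its document feature vector, can be reused: the similarity between the specified document and each other document in the same category is computed, and the documents whose similarity exceeds a threshold are returned as the documents similar in content to the specified target.
Keywords:	text categorization; similarity; vector space model; KNN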
Calculation of the Text Similarity Based on Text Categorization

ZHAO Jun-jie, HU Xue-gang. Calculation of the Text Similarity Based on Text Categorization [J]. Microcomputer Applications, 2008, 24(12): 46-47

Authors:	ZHAO Jun-jie, HU Xue-gang

Affiliation:	ZHAO Jun-jie (1), HU Xue-gang (2). 1. Hefei University of Technology; Anhui University of Finance & Economics, Bengbu 233061, China. 2. School of Computer and Information, Hefei University of Technology, Hefei 230009, China

Abstract:	To find the texts similar to a given text among thousands of texts, the first step is to determine its category, that is, to categorize it. A further calculation then finds all the texts whose content is similar to that of the given text. Because text similarity calculation closely resembles text categorization, we can reuse the results of categorizing the given text, namely its category and the values of its feature vector. After that, we find the texts which exceed the thresho...

Keywords:	Text categorization; Similarity; Vector space model; KNN
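
The two-step procedure outlined in the abstract (classify the given text, then keep the same-category texts whose similarity exceeds a threshold) can be illustrated with a minimal sketch. The abstract names only the vector space model and KNN, so everything else here is an assumption made for illustration: the TF-IDF weighting, cosine similarity, the similarity-weighted vote, the threshold value 0.3, the toy corpus, and all function names. Documents are assumed to be tokenized already; Chinese text would need word segmentation first.

```python
import math
from collections import Counter


def tf_idf_vectors(token_lists):
    """Return sparse TF-IDF vectors (term -> weight) and the IDF table."""
    n_docs = len(token_lists)
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    # Smoothed IDF (an assumed weighting; the abstract does not specify one).
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [vectorize(tokens, idf) for tokens in token_lists]
    return vectors, idf


def vectorize(tokens, idf):
    """Weight a token list with an existing IDF table; unknown terms are dropped."""
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_in_class(query_tokens, token_lists, labels, k=3, threshold=0.3):
    """Classify the query with kNN, then return same-category documents
    whose similarity to the query exceeds the threshold."""
    vectors, idf = tf_idf_vectors(token_lists)
    query_vec = vectorize(query_tokens, idf)

    # Similarities are computed once and reused by both steps.
    sims = [cosine(query_vec, v) for v in vectors]

    # Step 1: kNN classification by a similarity-weighted vote of the k nearest.
    nearest = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    votes = Counter()
    for i in nearest:
        votes[labels[i]] += sims[i]
    category = votes.most_common(1)[0][0]

    # Step 2: keep only same-category documents above the threshold.
    hits = [(i, sims[i]) for i in range(len(sims))
            if labels[i] == category and sims[i] > threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return category, hits


# Toy corpus (hypothetical data, for illustration only).
docs = [
    ["stock", "market", "price", "rise"],
    ["stock", "price", "fall", "bank"],
    ["bank", "loan", "interest", "rate"],
    ["football", "match", "goal", "team"],
    ["team", "coach", "match", "win"],
    ["basketball", "team", "score", "win"],
]
labels = ["finance"] * 3 + ["sports"] * 3
query = ["stock", "market", "price", "bank"]

category, hits = similar_in_class(query, docs, labels, k=3, threshold=0.3)
print(category)
for index, score in hits:
    print(index, round(score, 3))
```

Computing the similarities once and reusing them for both the vote and the threshold filter mirrors the abstract's point that the classification result can be carried over into the similarity step. How the paper itself weights terms or selects the threshold is not stated in this record.
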
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
