Document Similarity Strategy Based on Document Index Graph Model

Cite this article:	高茂庭, 王正欧. Document Similarity Strategy Based on Document Index Graph Model[J]. 计算机工程 (Computer Engineering), 2008, 34(7): 19-22.

Authors:	高茂庭 (GAO Mao-ting), 王正欧 (WANG Zheng-ou)

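Affiliations:	1. Department of Computer Science and Engineering, Shanghai Maritime University, Shanghai 200135; 2. Institute of Systems Engineering, Tianjin University, Tianjin 300072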

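Funding:	National Natural Science Foundation of China; project funded by the Shanghai Municipal Education Commission; Shanghai Maritime University research and teaching-reform project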

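Abstract:	The document index graph (DIG) is a phrase-based, graph-structured model for representing text features. It captures text feature information more completely and accurately, and it supports incremental text clustering and information processing. Building on the DIG feature model, this paper proposes an additive strategy and a multiplicative strategy for computing document similarity, and applies a transform function to adjust the similarity values. This increases the distinguishability between documents and improves the performance of text clustering, classification, and related processing. Examples demonstrate the effectiveness of the strategies.
Keywords:	text clustering; document index graph; text similarity; text feature model
Article ID:	1000-3428(2008)07-0019-04
Revised:	August 17, 2007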
Document Similarity Strategy Based on Document Index Graph Model

GAO Mao-ting, WANG Zheng-ou. Document Similarity Strategy Based on Document Index Graph Model[J]. Computer Engineering, 2008, 34(7): 19-22.

Authors:	GAO Mao-ting, WANG Zheng-ou

Affiliation:	(1. Computer Science and Engineering Department, Shanghai Maritime University, Shanghai 200135; 2. Institute of Systems Engineering, Tianjin University, Tianjin 300072)

Abstract:	Document Index Graph (DIG) is a phrase-based, graph-structured text feature representation model that expresses text feature information more completely and accurately and supports incremental text clustering and information processing. Based on DIG, additive and multiplicative strategies for computing document similarity are proposed. Document similarity values are adjusted by a transform function, which strengthens the distinguishability between documents and improves the performance of text clustering and classification. Examples demonstrate the effectiveness of the strategies.

Keywords:	text clustering; Document Index Graph (DIG); document similarity; text feature model
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.