A Semi-Structured Document Model for Text Mining

Authors (Chinese):	杨建武 (Yang Jianwu), 陈晓鸥 (Chen Xiaoou)

Affiliations:	[1] National Key Laboratory for Text Processing, Institute of Computer Science and Technology, Peking University, Beijing 100871, P.R. China [2] National Key Laboratory for Text Processing, Institute of Computer Science and Technology, Pek

Funding:	This research is supported by the National Technology Innovation Project and by the Peking University Graduate Student Development Foundation as one of its innovative doctoral dissertation research projects.

Keywords (translated from Chinese):	HTML; XML; semi-structured document model; text mining; structural information
Jianwu Yang, Xiaoou Chen. A Semi-Structured Document Model for Text Mining[J]. Journal of Computer Science and Technology, 2002, 17(5): 0-0.

Authors:	Jianwu Yang, Xiaoou Chen

Affiliation:	(1) National Key Laboratory for Text Processing, Institute of Computer Science and Technology, Peking University, 100871 Beijing, P.R. China

Abstract:	A semi-structured document carries more structural information than an ordinary document, and the relations among semi-structured documents can also be exploited. To take advantage of the structure and link information in semi-structured documents for better mining, this paper presents a structured link vector model (SLVM), in which a vector represents a document and the vector's elements are determined by terms, document structure, and neighboring documents. For brevity and clarity, text mining based on SLVM is described through the two steps of the K-means procedure: calculating document similarity and calculating the cluster center. In the experiments, clustering based on SLVM performs significantly better than clustering based on a conventional vector space model: the F value increases from 0.65-0.73 to 0.82-0.86.
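
The two K-means steps named in the abstract (document similarity and cluster center) can be illustrated with a minimal sketch. This is not the paper's SLVM formula, which the abstract does not give. It assumes documents are sparse term-weight dictionaries, uses cosine similarity and a mean centroid, and adds a hypothetical `blend_with_neighbors` helper with an assumed mixing weight `alpha` to show one way linked documents might influence a vector.

```python
import math


def blend_with_neighbors(own, neighbors, alpha=0.5):
    """Hypothetical helper (not from the paper): mix a document's term
    weights with the average term weights of its linked neighbors."""
    if not neighbors:
        return dict(own)
    blended = {term: (1 - alpha) * w for term, w in own.items()}
    for nb in neighbors:
        for term, w in nb.items():
            blended[term] = blended.get(term, 0.0) + alpha * w / len(neighbors)
    return blended


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_center(vectors):
    """Mean of a non-empty list of sparse vectors, used as the cluster center."""
    center = {}
    for vec in vectors:
        for term, w in vec.items():
            center[term] = center.get(term, 0.0) + w / len(vectors)
    return center


def assign(doc, centers):
    """Index of the center most similar to the document (K-means assignment)."""
    return max(range(len(centers)), key=lambda i: cosine_similarity(doc, centers[i]))
```

A K-means loop would alternate `assign` over all documents with `cluster_center` over each resulting group until the assignments stop changing. The structural weighting of terms that the abstract mentions is left out here because its definition is not stated in this record.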

Keywords:	semi-structured document; XML; text mining; vector space model; structured link vector model