基于Rough集潜在语义索引的Web文档分类
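Cite this article: 何明, 冯博琴, 傅向华. 基于Rough集潜在语义索引的Web文档分类[J]. 计算机工程, 2004, 30(13): 3-5.
Authors: 何明  冯博琴  傅向华
Affiliation: Department of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049
Abstract: Rough set theory is a mathematical tool for handling uncertain or vague knowledge. This paper proposes a Web document classification method based on rough set theory and latent semantic indexing. Web documents are first represented with the vector space model; singular value decomposition of the resulting matrix is then used for information filtering and latent semantic indexing; an attribute reduction algorithm generates the classification rules; finally, documents are classified using multiple knowledge bases. Experimental comparison shows that the method achieves good classification performance.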
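
The first two steps named in the abstract (vector space representation, then singular value decomposition for latent semantic indexing) can be sketched as below. This is only an illustration under stated assumptions, not the authors' implementation: whitespace tokenization, raw term counts as weights, the rank `k = 2`, the toy documents, and the function names `build_term_document_matrix`, `lsi_project`, and `fold_in` are all hypothetical choices that the abstract does not specify.

```python
import numpy as np

def build_term_document_matrix(docs):
    """Return (vocab, A), where A[i, j] counts term i in document j.

    Assumption: whitespace tokenization and raw counts; the paper's
    actual term weighting is not given in the abstract.
    """
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[index[w], j] += 1.0
    return vocab, A

def lsi_project(A, k):
    """Truncated SVD of A, keeping the k largest singular values.

    Returns U_k (terms x k), s_k (k,), and one k-dimensional
    coordinate row per document, computed as (S_k V_k^T)^T.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    doc_coords = (np.diag(s_k) @ Vt_k).T
    return U_k, s_k, doc_coords

def fold_in(term_vector, U_k):
    """Map a new term-count vector into the same latent space (U_k^T q)."""
    return U_k.T @ term_vector

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical toy corpus: two documents per topic.
docs = [
    "rough set theory",
    "rough set reduction",
    "web page classification",
    "web document classification",
]
vocab, A = build_term_document_matrix(docs)
U_k, s_k, doc_coords = lsi_project(A, k=2)
```

In this toy corpus the two "rough set" documents land on the same latent coordinates, and are orthogonal to the "web" documents, which is the filtering effect the abstract attributes to the decomposition. The latent coordinates would then be discretized before any rough set step; how the paper does that is not stated here.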

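Keywords: Rough set  Latent semantic indexing  Web document classification  Information filtering  Information retrieval
Article ID: 1000-3428(2004)13-0003-03
Revised manuscript received: June 6, 2003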

Web Document Classification Based on Rough Set Latent Semantic Indexing
HE Ming,FENG Boqin,FU Xianghua.Web Document Classification Based on Rough Set Latent Semantic Indexing[J].Computer Engineering,2004,30(13):3-5.
Authors:HE Ming  FENG Boqin  FU Xianghua
Abstract: Rough set theory is a mathematical tool for dealing with uncertain or vague knowledge. An approach to Web document classification based on rough set latent semantic indexing is proposed. First, Web documents are represented by a vector space model over a reduced document feature set. Then, information filtering and latent semantic indexing are performed by singular value decomposition of the matrix. Next, classification rules are generated by an attribute reduction algorithm. Finally, the documents are classified with multiple knowledge bases. The experimental results and a comparison with other methods show that this Web document classification approach achieves good classification performance.
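
The attribute reduction step can be illustrated with a greedy, dependency-based reduct search in the style of the well-known QuickReduct heuristic. The abstract does not say which reduction algorithm the authors use, so this is a swapped-in technique shown only as a sketch. The decision table, the feature names `f1` to `f3`, the `"lo"`/`"hi"` discretization, and the class labels are all hypothetical, and the multiple-knowledge-base classification step is not modeled.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, labels, attrs):
    """Fraction of rows in the positive region of the decision.

    A block counts toward the positive region only if all of its rows
    carry the same label, i.e. attrs classify those rows consistently.
    """
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)

def quick_reduct(rows, labels, all_attrs):
    """Greedily add attributes until the full-set dependency is reached.

    The result preserves the dependency degree of all_attrs but is not
    guaranteed to be a minimal reduct.
    """
    target = dependency(rows, labels, all_attrs)
    chosen = []
    while dependency(rows, labels, chosen) < target:
        best = max(
            (a for a in all_attrs if a not in chosen),
            key=lambda a: dependency(rows, labels, chosen + [a]),
        )
        chosen.append(best)
    return chosen

def extract_rules(rows, labels, attrs):
    """Turn each consistent equivalence class into an 'if values then label' rule."""
    rules = {}
    for block in partition(rows, attrs):
        block_labels = {labels[i] for i in block}
        if len(block_labels) == 1:
            key = tuple(rows[block[0]][a] for a in attrs)
            rules[key] = block_labels.pop()
    return rules

# Hypothetical decision table: discretized latent features per document.
rows = [
    {"f1": "hi", "f2": "lo", "f3": "lo"},
    {"f1": "hi", "f2": "hi", "f3": "lo"},
    {"f1": "lo", "f2": "lo", "f3": "hi"},
    {"f1": "lo", "f2": "hi", "f3": "hi"},
]
labels = ["sports", "sports", "tech", "tech"]

reduct = quick_reduct(rows, labels, ["f1", "f2", "f3"])
rules = extract_rules(rows, labels, reduct)
```

In this table `f2` carries no information about the class, so the greedy search stops after selecting a single attribute, and the rules are read directly off the remaining equivalence classes.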
Keywords: Rough set  Latent semantic indexing  Web document classification  Information filtering  Information retrieval
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.