
WordNet-based webpage classification system with category expansion
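Cite this article: PENG Xiao-gang, MING Zhong, WANG Hai-tao, ZHOU Jing-zhou. WordNet-based webpage classification system with category expansion[J]. Journal of Shenzhen University (Science & Engineering), 2009, 26(2).
Authors: PENG Xiao-gang  MING Zhong  WANG Hai-tao  ZHOU Jing-zhou
Affiliation: College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060
Funding: National Natural Science Foundation of China; Shenzhen Science and Technology Fund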
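Abstract: Because written text often expresses a single meaning with several different words, sense-based text classification has been studied as a replacement for keyword-based classification algorithms in order to improve classification accuracy. After analyzing the synset structure of WordNet, we argue that a sense-based text classification system that accounts for both word senses and the relations between senses can be applied to webpage classification. The system also addresses the mismatch between a fixed category structure and the continually growing number of documents within it, and adds a category expansion function based on clustering with category information. Simulation experiments show that the system improves classification accuracy by 13% over existing semantics-based classification systems, and that the category expansion function based on category-information clustering produces a higher-quality new category than a system using similarity-based clustering.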

Keywords: information extraction  webpage classification  sense-based classification  category expansion
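The abstract does not spell out the classification algorithm, so the following is only a minimal sketch of the general idea of sense-based classification, not the authors' method. It assumes a tiny hand-made lexicon (`WORD_TO_SYNSET` and `HYPERNYM`, both hypothetical) standing in for WordNet synsets and hypernym links. Each word is mapped to its synset so that synonyms collapse into one feature, part of its weight is propagated up the hypernym chain (the decay factor `parent_weight` is an assumption) to reflect relations between senses, and a page is assigned to the category whose summed profile has the highest cosine similarity.

```python
import math
from collections import Counter

# Hypothetical toy lexicon standing in for WordNet: word -> synset ID.
WORD_TO_SYNSET = {
    "car": "car.n.01", "automobile": "car.n.01", "auto": "car.n.01",
    "truck": "truck.n.01",
    "film": "movie.n.01", "movie": "movie.n.01",
    "actor": "actor.n.01", "actress": "actor.n.01",
}

# Hypothetical hypernym links (child synset -> parent synset).
HYPERNYM = {
    "car.n.01": "motor_vehicle.n.01",
    "truck.n.01": "motor_vehicle.n.01",
    "movie.n.01": "show.n.03",
    "actor.n.01": "performer.n.01",
}


def sense_vector(text, parent_weight=0.5):
    """Map known words to synsets; give each hypernym a decayed share of the weight."""
    vec = Counter()
    for word in text.lower().split():
        synset = WORD_TO_SYNSET.get(word)  # unknown words are ignored
        weight = 1.0
        while synset is not None:
            vec[synset] += weight
            weight *= parent_weight
            synset = HYPERNYM.get(synset)
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse sense vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_profiles(training):
    """Sum the sense vectors of each category's training pages."""
    profiles = {}
    for category, docs in training.items():
        profile = Counter()
        for doc in docs:
            profile.update(sense_vector(doc))
        profiles[category] = profile
    return profiles


def classify(text, profiles):
    """Return the category whose profile is most similar to the page."""
    vec = sense_vector(text)
    return max(profiles, key=lambda c: cosine(vec, profiles[c]))


profiles = build_profiles({"autos": ["the car and the truck"],
                           "cinema": ["a film with an actor"]})
print(classify("my automobile", profiles))  # autos
```

The word "automobile" never occurs in the training pages, so a pure keyword match would find nothing; mapping it to the same synset as "car" is what lets the page land in `autos`.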

WordNet based webpage classification system with category expansion
PENG Xiao-gang, MING Zhong, WANG Hai-tao, ZHOU Jing-zhou. WordNet based webpage classification system with category expansion[J]. Journal of Shenzhen University (Science & Engineering), 2009, 26(2).
Authors:PENG Xiao-gang  MING Zhong  WANG Hai-tao  ZHOU Jing-zhou
Abstract: Because different keywords may express the same meaning in text, many sense-based webpage classification algorithms have been proposed in place of keyword-based algorithms to make retrieving online information easier. A sense-based webpage classification system that uses both the synsets in WordNet and the overall synset structure was developed to improve classification accuracy. The system also uses a category-based clustering algorithm for category expansion, addressing the problems caused by the mismatch between a fixed number of categories and the growing number of documents added to the system. Experimental results show that the semantic hierarchy classification algorithm increases classification accuracy by 13% compared with existing sense-based classification algorithms, and that the category-based clustering algorithm produces a higher-quality cluster than existing methods that use a similarity measure only.
Keywords: WordNet  information retrieval  webpage classification  sense-based classification  category expansion
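The abstract names a category-based clustering step for category expansion but gives no details. The sketch below shows one possible reading, under assumptions that are ours and not the paper's: pages whose best cosine similarity to every existing category centroid falls below a hypothetical `assign_threshold` are set aside, those leftovers are grouped by simple single-pass clustering, and the category information is then used to accept a cluster as a new category only if it has at least `min_size` members and its centroid stays dissimilar to every existing category. Pages are plain `dict` term-weight vectors; all parameter names and default values are illustrative.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Average of a non-empty list of sparse vectors."""
    total = defaultdict(float)
    for v in vectors:
        for k, w in v.items():
            total[k] += w / len(vectors)
    return dict(total)


def expand_categories(docs, categories, assign_threshold=0.3,
                      merge_threshold=0.4, min_size=2):
    """Hypothetical expansion step: returns centroids of proposed new categories."""
    # Step 1: keep only pages that no existing category claims.
    leftovers = [d for d in docs
                 if max((cosine(d, c) for c in categories.values()), default=0.0)
                 < assign_threshold]
    # Step 2: single-pass clustering; join the closest cluster or start a new one.
    clusters = []
    for doc in leftovers:
        sims = [cosine(doc, centroid(members)) for members in clusters]
        if sims and max(sims) >= merge_threshold:
            clusters[sims.index(max(sims))].append(doc)
        else:
            clusters.append([doc])
    # Step 3: use category information; keep large clusters that are
    # distinct from every existing category centroid.
    new_categories = []
    for members in clusters:
        c = centroid(members)
        if len(members) >= min_size and all(
                cosine(c, old) < assign_threshold for old in categories.values()):
            new_categories.append(c)
    return new_categories


categories = {"autos": {"car": 1.0, "truck": 1.0}}
docs = [{"car": 1.0, "engine": 1.0},      # claimed by "autos"
        {"guitar": 1.0, "concert": 1.0},
        {"guitar": 1.0, "band": 1.0},
        {"recipe": 1.0}]                  # too small to form a category
new = expand_categories(docs, categories)
print(len(new))  # 1
```

Checking a candidate cluster against the existing categories, rather than relying on pairwise similarity alone, is the part meant to echo the abstract's contrast with similarity-only clustering; how the paper actually combines the two is not stated here.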
This article is indexed in databases including VIP and Wanfang Data.