
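Research on Deep Web Classification Based on Domain Feature Text (基于领域特征文本的Deep Web分类研究)
Cite this article:WU Chun-ming, XIE De-ti. Research on Deep Web Classification Based on Domain Feature Text[J]. Computer Science (计算机科学), 2012, 39(4): 177-180
Authors:WU Chun-ming (吴春明)  XIE De-ti (谢德体)
Affiliation:1. College of Computer and Information Science, Southwest University, Chongqing 400715; College of Resources and Environment, Southwest University, Chongqing 400715
2. College of Resources and Environment, Southwest University, Chongqing 400715
Funding:Fundamental Research Funds for the Central Universities; Natural Science Foundation of Chongqing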
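Abstract:Automatic classification of Deep Web sources is the prerequisite and foundation for building a Deep Web data integration system. This paper proposes a Deep Web classification method based on domain feature text. First, ontology knowledge is used to abstract different words that express the same semantics into shared concepts. A definition of domain correlation is then given and used as the quantitative criterion for feature text selection, which avoids the subjectivity and uncertainty of manual selection. In constructing the interface vector model, the differing contributions of individual feature texts to classification are taken into account, and an improved weighting method, W-TFIDF, is proposed. Finally, the KNN algorithm is used to classify the interface vectors. Comparative experiments show that the feature text selected by the proposed method is accurate and effective, and that the new feature text weighting method significantly improves classification precision and shows good stability in the KNN algorithm.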

Keywords:Feature text   Domain classification   Vector space model   Deep Web

Research on Deep Web Classification Based on Domain Feature Text
WU Chun-ming, XIE De-ti. Research on Deep Web Classification Based on Domain Feature Text[J]. Computer Science, 2012, 39(4): 177-180
Authors:WU Chun-ming    XIE De-ti
Affiliation:1. College of Computer and Information Science, Southwest University, Chongqing 400715, China; 2. College of Resources and Environment, Southwest University, Chongqing 400715, China
Abstract:Automatic Deep Web classification is the basis for building a Deep Web data integration system. An approach is proposed to classify Deep Web sources based on domain feature text. Using ontology knowledge, concepts that express the same semantics are first abstracted from different terms. A definition of domain correlation is then given as the quantitative criterion for feature text selection, in order to avoid the subjectivity and uncertainty of manual selection. In constructing the interface vector space model, an improved weighting method named W-TFIDF is proposed to reflect the different roles that feature texts play in classification. Finally, a KNN algorithm is used to classify the interface vectors. Comparative experiments indicate that the feature text selected by the proposed method is accurate and effective, and that the new weighting method improves classification precision significantly and shows good stability in KNN classification.
Keywords:Feature text   Domain classification   Vector space model   Deep Web
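
The pipeline summarized in the abstract (concept abstraction, weighted interface vectors, KNN classification) can be illustrated with a minimal Python sketch. The abstract does not state the W-TFIDF formula or the definition of domain correlation, so everything specific here is an assumption made for illustration only: the synonym table `CONCEPTS` stands in for ontology-based concept abstraction, the per-term multiplier `boost` stands in for the extra weight given to selected feature text, the IDF is a common smoothed variant, and the toy interfaces and labels are invented. It is not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical synonym table standing in for ontology-based concept abstraction.
CONCEPTS = {"writer": "author", "cost": "price", "fare": "price"}

def normalize(tokens):
    """Lowercase each token and map it to its concept label if one exists."""
    return [CONCEPTS.get(t.lower(), t.lower()) for t in tokens]

def build_idf(docs):
    """Smoothed inverse document frequency over tokenized query interfaces."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def vectorize(tokens, idf, boost):
    """TF-IDF vector; `boost` (an assumption) scales selected feature terms."""
    tf = Counter(tokens)
    total = sum(tf.values())
    return {t: (c / total) * idf[t] * boost.get(t, 1.0)
            for t, c in tf.items() if t in idf}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_classify(query, labeled, k=3):
    """Majority vote among the k labeled vectors most similar to `query`."""
    ranked = sorted(labeled, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Invented toy training set: form labels from query interfaces, with a domain.
train = [
    (["title", "author", "isbn", "price"], "books"),
    (["author", "publisher", "title", "keyword"], "books"),
    (["writer", "title", "format"], "books"),
    (["departure", "destination", "date", "fare"], "flights"),
    (["from", "to", "departure", "passengers"], "flights"),
    (["destination", "airline", "date", "class"], "flights"),
]
# Assumed feature-text weights; the paper derives its selection from domain correlation.
boost = {"author": 2.0, "isbn": 2.0, "departure": 2.0, "destination": 2.0}

docs = [normalize(tokens) for tokens, _ in train]
idf = build_idf(docs)
labeled = [(vectorize(d, idf, boost), label) for d, (_, label) in zip(docs, train)]

def classify(tokens, k=3):
    """Classify an unseen interface given as a list of form labels."""
    return knn_classify(vectorize(normalize(tokens), idf, boost), labeled, k)

print(classify(["Writer", "Title", "Price"]))
print(classify(["departure", "destination", "cost"]))
```

In this sketch, synonyms such as "writer" and "author" collapse to one dimension before weighting, and terms listed in `boost` dominate the cosine similarity, which is one plausible reading of letting feature texts contribute unequally to classification.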
This article is indexed in CNKI, Wanfang Data, and other databases.