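A Deep Web Data Source Classification Method Based on Theme and Form Attributes
Cite this article: ZHU Guan-wen, WANG Nian-bin, WANG Hong-bin. A Deep Web Data Source Classification Method Based on Theme and Form Attributes[J]. Acta Electronica Sinica (电子学报), 2013, 41(2): 260-266.
Authors: ZHU Guan-wen  WANG Nian-bin  WANG Hong-bin
Affiliation: College of Computer Science and Technology, Harbin Engineering University, Harbin, Heilongjiang 150001
Funding: National Natural Science Foundation of China; Natural Science Foundation of Heilongjiang Province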
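Abstract: The deep web currently holds a massive and steadily growing volume of high-quality information. Because the deep web is distributed, heterogeneous, and autonomous, users face great challenges in obtaining the information they are interested in efficiently and quickly. Classifying deep web data sources by domain is the foundation for meeting this challenge. Based on statistics and analysis of more than 200 data sources in the airfare booking, book, automobile, and real estate domains, this paper makes full use of theme and form attribute information to propose a new deep web data source classification method and an improved similarity measure for query interfaces, achieving automatic classification of deep web data sources. The paper also proposes a query interface tagging strategy to reduce the impact of randomly selecting the initial center points. Experimental results show that the method achieves relatively high classification accuracy.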

Keywords: form theme and attributes  query interface tagging  deep web  automatic classification of data sources
Received: 2012-05-18

An Improved Method for Deep Web Sources Classification Based on the Theme and Form Attributes
ZHU Guan-wen, WANG Nian-bin, WANG Hong-bin. An Improved Method for Deep Web Sources Classification Based on the Theme and Form Attributes[J]. Acta Electronica Sinica, 2013, 41(2): 260-266.
Authors: ZHU Guan-wen  WANG Nian-bin  WANG Hong-bin
Affiliation: Department of Computer Science and Technology, Harbin Engineering University, Harbin, Heilongjiang 150001, China
Abstract: The deep web contains a vast and rapidly growing amount of high-quality information. However, because deep web sources are distributed, heterogeneous, and autonomous, users face great challenges in obtaining the information they are interested in efficiently and quickly. Classifying deep web data sources by domain is the foundation for addressing this challenge. In this paper, based on statistics and analysis of more than 200 data sources from four domains (Airfares, Books, Automobiles, and Real Estate), we propose a novel classification method and an improved similarity measure for query interfaces. Both make full use of theme information and form attributes to classify deep web data sources automatically. In addition, we present a query interface tagging strategy to reduce the influence of choosing the initial centers randomly. The experimental results indicate that the method is effective and achieves high classification accuracy.
Keywords:form theme and attributes  query interface tagging  deep web  automatic classification of sources
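
The abstract does not state the similarity formula or the clustering procedure, so the Python sketch below only illustrates the general idea and is not the authors' method. It assumes each query interface is reduced to two term sets (theme terms and form attribute labels), combines the two with a weighted Jaccard overlap, and uses a few hand-tagged interfaces per domain as fixed starting points instead of random initial centers. The names `jaccard`, `interface_similarity`, `classify`, and `theme_weight`, along with the toy data, are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# An interface is modelled as (theme terms, form attribute labels).
# This representation is an assumption made for illustration only.
Interface = Tuple[Set[str], Set[str]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two term sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def interface_similarity(x: Interface, y: Interface,
                         theme_weight: float = 0.5) -> float:
    """Weighted mix of theme similarity and form-attribute similarity.

    The equal default weighting is an assumption, not a value from the paper.
    """
    theme_sim = jaccard(x[0], y[0])
    attr_sim = jaccard(x[1], y[1])
    return theme_weight * theme_sim + (1.0 - theme_weight) * attr_sim


def classify(interface: Interface,
             seeds: Dict[str, List[Interface]]) -> str:
    """Return the domain whose tagged seed interfaces are most similar on average."""
    def mean_similarity(domain: str) -> float:
        examples = seeds[domain]
        total = sum(interface_similarity(interface, s) for s in examples)
        return total / len(examples)

    return max(seeds, key=mean_similarity)


# Toy, hand-tagged seed interfaces standing in for the tagging strategy.
seeds = {
    "Airfares": [({"flight", "ticket"}, {"from", "to", "departure date"})],
    "Books": [({"book", "store"}, {"title", "author", "isbn"})],
}

query = ({"cheap", "flight"}, {"from", "to", "return date"})
print(classify(query, seeds))
```

Starting from tagged interfaces rather than random points means the outcome does not vary from run to run, which is the effect the tagging strategy is described as targeting.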
This article is indexed in Wanfang Data (万方数据) and other databases.