
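Research on Deep Web Entry Detection and Classification Methods
Cite this article: ZHANG Liang, LU Yu-liang, LIU Jin-hong. Research on Deep Web entry detection and classification methods[J]. 计算机应用研究 (Application Research of Computers), 2009, 26(12): 4697-4699. DOI: 10.3969/j.issn.1001-3695.2009.12.083
Authors: ZHANG Liang  LU Yu-liang  LIU Jin-hong
Affiliation: Department of Network, PLA Electronic Engineering Institute, Hefei 230037, China
Abstract: The traditional approach of matching entry label strings against a corpus is limited by the completeness of the corpus and the flexibility of the matching algorithm. To overcome this limitation, the paper introduces a two-layer classification model that combines a Deep Web entry detection method based on the statistical features of form elements with a text classification method that classifies the detected entries. It also proposes two feature-weight computation methods for feature selection. Test results on the TEL-8 Query Interfaces dataset demonstrate the superiority of the two-layer classification model and the necessity of dimensionality reduction of the feature vector.

Keywords: Deep Web; Web crawler; structural features; dimensionality reduction; two-layer classification model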

Research on detecting and classifying Deep Web interfaces
ZHANG Liang,LU Yu-liang,LIU Jin-hong. Research on detecting and classifying Deep Web interfaces[J]. Application Research of Computers, 2009, 26(12): 4697-4699. DOI: 10.3969/j.issn.1001-3695.2009.12.083
Authors:ZHANG Liang  LU Yu-liang  LIU Jin-hong
Affiliation:(Dept. of Network, PLA Electronic Engineering Institute, Anhui 230037, China)
Abstract:The traditional approach of matching interface labels against a corpus is limited by the completeness of the corpus and the flexibility of the matching algorithm. To overcome this limitation, this paper introduces a bilateral-layer classification model: the first layer detects Deep Web entries from the statistical features of form elements, and the second layer classifies the detected entries with a text classification method. It also proposes two feature-weight computation methods and applies them to feature selection. Tests on the TEL-8 Query Interfaces dataset show the superiority of the bilateral-layer classification model and the necessity of dimensionality reduction of the feature vector.
Keywords:Deep Web   Web crawlers   structure feature   dimensionality reduction   bilateral-layer classification model
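
The abstract gives no implementation details, so the following is only a minimal Python sketch of how a two-layer pipeline of this general shape could be wired together. Everything in it is an assumption made for illustration: the element-count rule in `is_query_interface`, the placeholder domain names and keyword lists in `DOMAIN_KEYWORDS`, and the keyword-overlap scoring that stands in for a trained text classifier. The paper's actual features, its two feature-weight computation methods, and its classifiers are not described in this record and are not reproduced here.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class FormStats(HTMLParser):
    """Collect element counts and visible words from form markup."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            # Count inputs by type; a missing type attribute means "text".
            tag = "input:" + (dict(attrs).get("type") or "text").lower()
        self.counts[tag] += 1

    def handle_data(self, data):
        self.words.extend(re.findall(r"[a-z]+", data.lower()))


def is_query_interface(counts):
    """Layer 1 (hypothetical rule): decide from element statistics alone."""
    if counts["input:password"] > 0:
        return False  # assumed to be a login or registration form
    # Assumed threshold: at least two fillable fields.
    return counts["input:text"] + counts["select"] >= 2


# Placeholder domains and keywords, not taken from the paper.
DOMAIN_KEYWORDS = {
    "books": {"title", "author", "isbn", "publisher"},
    "airfares": {"departure", "return", "passengers", "airline"},
}


def classify_domain(words):
    """Layer 2 (stand-in for a text classifier): most keyword hits wins."""
    scores = {
        domain: sum(word in keywords for word in words)
        for domain, keywords in DOMAIN_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def classify_form(html):
    """Run layer 1, then layer 2 only on forms accepted by layer 1."""
    parser = FormStats()
    parser.feed(html)
    if not is_query_interface(parser.counts):
        return None
    return classify_domain(parser.words)
```

The point of the layering in this sketch is that the text-based step only ever sees forms that passed the structural check, so non-query forms such as login boxes are filtered out without consulting any label vocabulary.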
This article is indexed in 万方数据 (Wanfang Data) and other databases.