
Research on Link and Subject Relevance Determination in Network Information Collection
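Cite this article: Wang Lancheng, Zhu Jianhua. Research on link and subject relevance determination in network information collection[J]. Computer Applications and Software, 2012, 29(5): 209-211, 240.
Authors: Wang Lancheng, Zhu Jianhua
Affiliation: Department of Military Information Management, Shanghai Campus of Nanjing Political College, Shanghai 200433, China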
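Abstract: Subject-oriented Web information collection must judge how relevant each extracted URL link is to the subject. Link context is extracted according to the type of semantic block. For subject-type blocks, a fixed-length span of text around the link is taken. For directory-type and image-type blocks, the hierarchy of the DOM tree is used. The URL relevance of each link is then determined from this context. Building on link determination by HowNet-based semantic similarity, the paper proposes NPR, a URL subject-relevance algorithm that combines content analysis with link-structure analysis. Compared with the PageRank algorithm, NPR returns more accurate subject pages. The results are of practical value to Chinese information institutions building in-depth collections of subject-specific network information resources.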

Keywords: Web information collection; semantic analysis; URL link; subject relevance
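The abstract names its steps but gives no formulas. It extracts link context per semantic-block type, scores that context against the subject, and then combines content relevance with link structure. The sketch below illustrates those steps under stated assumptions and is not the paper's NPR algorithm. The following choices are hypothetical:

- a 20-character window for subject-type blocks;
- climbing one DOM ancestor, plus `img` alt text, for directory- and image-type blocks;
- plain bag-of-words cosine similarity, swapped in for HowNet semantic similarity;
- a PageRank variant whose edge weights and random jumps are biased by link relevance.

The helper names (`text_before`, `window_context`, `dom_context`, `relevance_rank`), the sample page and the URLs are all invented for illustration.

```python
# Hypothetical sketch, NOT the paper's NPR algorithm (its formulas are not in
# the abstract). Assumed stand-ins: a fixed character window, bag-of-words
# cosine instead of HowNet similarity, and a relevance-biased PageRank.
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter

def text_before(block, link):
    """Concatenate the block's text that precedes `link`, in document order."""
    parts = []
    def walk(el):
        if el is link:
            return True
        if el.text:
            parts.append(el.text)
        for child in el:
            if walk(child):
                return True
            if child.tail:
                parts.append(child.tail)
        return False
    walk(block)
    return "".join(parts)

def window_context(block, link, width=20):
    """Subject-type block: anchor text plus `width` characters on each side."""
    text = "".join(block.itertext())
    start = len(text_before(block, link))
    end = start + len("".join(link.itertext()))
    return text[max(0, start - width):end + width]

def dom_context(root, link, levels=1):
    """Directory/image-type block: all text and img alt text under the
    ancestor `levels` steps above the link in the DOM tree."""
    parents = {c: p for p in root.iter() for c in p}
    node = link
    for _ in range(levels):
        if node not in parents:
            break
        node = parents[node]
    alts = [img.get("alt", "") for img in node.iter("img")]
    return " ".join(["".join(node.itertext())] + alts)

def tokens(text):
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def relevance_rank(graph, relevance, damping=0.85, iters=50):
    """PageRank whose link weights and random jumps favour relevant pages."""
    eps = 1e-9  # keeps every weight positive when a relevance score is 0
    total = sum(relevance[n] + eps for n in graph)
    jump = {n: (relevance[n] + eps) / total for n in graph}
    rank = dict(jump)
    for _ in range(iters):
        new = {n: (1 - damping) * jump[n] for n in graph}
        for u, outs in graph.items():
            if not outs:  # dangling page: redistribute via the jump vector
                for n in graph:
                    new[n] += damping * rank[u] * jump[n]
                continue
            w = sum(relevance[v] + eps for v in outs)
            for v in outs:
                new[v] += damping * rank[u] * (relevance[v] + eps) / w
        rank = new
    return rank

# Usage on an invented page; "/" is the seed page linking to three URLs.
subject = tokens("focused crawling of subject web pages")
page = ET.fromstring(
    "<div><p>New results on <a href='/crawl'>focused crawling</a> for subject pages.</p>"
    "<ul><li><a href='/sport'>Football scores</a></li>"
    "<li><a href='/web'><img alt='Web crawler survey'/></a></li></ul></div>")
p = page.find("p")
links = {a.get("href"): a for a in page.iter("a")}
contexts = {
    "/crawl": window_context(p, links["/crawl"]),  # subject-type block
    "/sport": dom_context(page, links["/sport"]),  # directory-type block
    "/web": dom_context(page, links["/web"]),      # image-type block
}
relevance = {url: cosine(subject, tokens(ctx)) for url, ctx in contexts.items()}
relevance["/"] = 1.0  # assumed: the seed page is on-subject
graph = {"/": ["/crawl", "/sport", "/web"], "/crawl": ["/web"], "/web": ["/crawl"], "/sport": []}
for url, score in sorted(relevance_rank(graph, relevance).items(), key=lambda kv: -kv[1]):
    print(f"{url:7s} relevance={relevance[url]:.2f} rank={score:.3f}")
```

With every relevance score equal, `relevance_rank` reduces to ordinary PageRank with a uniform jump vector. Biasing both the out-link weights and the jumps is one simple way to let content relevance reshape link-structure scores. The off-subject `/sport` page then ends up with almost no rank.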

This article is indexed in CNKI, Wanfang Data, and other databases.