A semantic based Web page classification strategy using multi-layered domain ontology
Authors: Ahmed I. Saleh, Mohammed F. Al Rahmawy, Arwa E. Abulwafa
Affiliation: 1. Department of Computer Engineering & Systems, Faculty of Engineering, Mansoura University, Mansoura, Egypt; 2. Department of Computer Science, Faculty of Computers and Information, Mansoura University, Mansoura, Egypt
Abstract: The World Wide Web is growing continuously, and Web content is expected to increase tremendously within the next few years. Hence, there is a great need for algorithms that can accurately classify Web pages. Automatic Web page classification differs significantly from traditional text classification because of the additional information provided by the HTML structure. Recently, several techniques have arisen from combinations of artificial intelligence and statistical approaches. However, finding an optimal classification technique for Web pages is not a simple matter. This paper introduces a novel strategy for vertical Web page classification, called Classification using Multi-layered Domain Ontology (CMDO). It employs several Web mining techniques and depends mainly on a proposed multi-layered domain ontology. To improve classification accuracy, CMDO employs a distiller to reject pages related to other domains. CMDO also employs a novel classification technique called Graph Based Classification (GBC). The proposed GBC has pioneering features that other techniques lack, such as outlier rejection and pruning. Experimental results show that CMDO outperforms recent techniques, achieving better precision, recall, and classification accuracy.
This article is indexed in SpringerLink and other databases.