KPS: a Web information mining algorithm
Affiliation: 1. Department of Marketing, Clemson University, United States; 2. Department of Management, Clemson University, United States; 1. Department of Information Systems, College of Business, City University of Hong Kong, Hong Kong Special Administrative Region; 2. School of Information, Renmin University of China, Beijing 100872, PR China; 3. Smart City Research Center, Renmin University of China, Beijing 100872, PR China
Abstract: Most information on the Web is semi-structured, yet searching for and extracting the structured data hidden in a Web page is not easy. Current practice addresses this problem through either (1) syntax analysis (i.e., of HTML tags) or (2) wrappers or user-defined declarative languages. The former is suitable only for highly structured Web sites; the latter is time-consuming and scales poorly: wrappers can handle tens, but certainly not thousands, of information sources. In this paper, we present KPS, a novel algorithm for mining semi-structured information on the Web. KPS uses keywords, patterns, and/or samples to mine the desired information. Experimental results show that KPS is more efficient than existing Web extraction methods.
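The abstract does not describe how KPS combines keywords, patterns, and samples, so the following is only a minimal sketch of the general idea of keyword-and-pattern mining over semi-structured HTML, not the KPS algorithm itself. The function name `mine_by_keyword_and_pattern`, the `window` parameter, and the sample page are all hypothetical and chosen purely for illustration.

```python
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the visible text chunks of an HTML page, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def mine_by_keyword_and_pattern(html, keyword, pattern, window=20):
    """Hypothetical helper: for each occurrence of `keyword` in the page text,
    return the first match of `pattern` within `window` characters after it."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    # Flatten the page so a keyword and its value may sit in different tags.
    text = " ".join(collector.chunks)

    value_re = re.compile(pattern)
    results = []
    for hit in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        nearby = text[hit.end():hit.end() + window]
        match = value_re.search(nearby)
        if match:
            results.append(match.group(0))
    return results


# Illustrative page (made up): the keyword is "price", the pattern is a dollar amount.
page = (
    "<ul><li><b>Price:</b> $19.99</li>"
    "<li>Weight: 2 kg</li>"
    "<li>Sale price: $14.50</li></ul>"
)
prices = mine_by_keyword_and_pattern(page, "price", r"\$\d+(?:\.\d{2})?")
print(prices)
```

Because the sketch works on flattened text rather than on the tag structure, it does not depend on a site being highly structured, which is the limitation the abstract attributes to syntax analysis. A sample-driven variant would presumably derive the pattern from example values instead of taking it as an argument.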
Keywords:
This article is indexed in ScienceDirect and other databases.