Research of network data mining based on reliability source under big data environment
Authors: Li Jinhai, He Youshi, Ma Yunlei
Affiliation: 1. Taizhou University, Taizhou, 225300, China; 2. School of Management, Jiangsu University, Zhenjiang, 212013, China; 3. Faculty of Science, Jiangsu University, Zhenjiang, 212013, China
Abstract:

In the era of big data, researchers face vast amounts of network data and can extract original data usable in scientific research only after identifying reliable data sources. This paper builds a reliable network data mining model from several improved variants of the PageRank algorithm. The model is divided into three modules. First, PageRank and TrustRank are used to eliminate cheating webpages. Second, TC-PageRank, which combines the topic relevancy between webpages with a time-difference weight, selects the webpages most closely related to the research topic. Finally, an improved HITS algorithm determines the authoritative webpages of the original data source; it takes into account the similarity between a webpage and the research topic and the amplification of webpage links to the authoritative webpages. In addition, partitioning the matrix operations with MapReduce reduces the time and space complexity of the algorithms. The feasibility and accuracy of the method are verified by a comparative analysis of the algorithms.
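
The first module can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the textbook power-iteration form of PageRank and treats TrustRank as the same iteration with the random jump restricted to a trusted seed set. The function name `pagerank`, the `teleport` parameter, the damping factor 0.85, and the toy link graph are all assumptions made for illustration. The abstract does not give the TC-PageRank weights, the improved HITS update, or the MapReduce partitioning, so none of those are reproduced here.

```python
def pagerank(links, teleport=None, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}.

    teleport maps page -> weight for the random jump. A uniform jump gives
    ordinary PageRank; a jump concentrated on trusted seed pages gives a
    TrustRank-style trust score (assumption: textbook formulation).
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    if teleport is None:
        teleport = {p: 1.0 for p in pages}
    total = sum(teleport.get(p, 0.0) for p in pages)
    jump = {p: teleport.get(p, 0.0) / total for p in pages}

    rank = dict(jump)  # start from the jump distribution
    for _ in range(iterations):
        # Rank held by pages without outlinks is redistributed via the jump.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping + damping * dangling) * jump[p] for p in pages}
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                new_rank[q] += damping * rank[p] / len(outs)
        rank = new_rank
    return rank


# Hypothetical toy graph: three ordinary pages and a two-page link farm.
links = {
    "a": ["b"],
    "b": ["c"],
    "c": ["a"],
    "spam1": ["spam2"],
    "spam2": ["spam1"],
}

pr = pagerank(links)                         # uniform jump: plain PageRank
trust = pagerank(links, teleport={"a": 1.0})  # jump only to trusted seed "a"

# Pages whose trust score is zero are candidates for elimination.
suspects = [p for p in pr if trust[p] == 0.0]
```

In this toy graph the link farm earns ordinary PageRank from the uniform jump, but it receives no trust because no trusted page links into it. Comparing the two scores is one simple way to flag cheating pages; the thresholding rule here is an assumption.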

Keywords:
This article is indexed in SpringerLink and other databases.