Similar Literature
20 similar records found (search time: 15 ms)
1.
To counter phishing attackers' common use of forged HTTPS sites and other obfuscation techniques, this work builds on RMLR and PhishDef, the mainstream machine-learning and rule-matching approaches to phishing detection, adds feature extraction for page text keywords and page sub-links, and proposes the Nmap-RF classification method. Nmap-RF is an ensemble phishing-site detection method combining rule matching with random forests. Sites are pre-filtered by their page protocol; if a site is judged phishing at this stage, the subsequent feature-extraction steps are skipped. Otherwise, text-keyword confidence, sub-link confidence, phishing-vocabulary similarity, and page PageRank serve as key features, with common URL, Whois, DNS, and page-tag information as auxiliary features, and a random forest classifier produces the final decision. Experiments show that Nmap-RF can check a page in 9–10 μs on average, filter out 98.4% of illegitimate pages, and reach an average overall precision of 99.6%.
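The two-stage design described above can be sketched as follows. A simple weighted score stands in for the random forest, and all rules, feature names, weights, and thresholds here are illustrative, not the authors' actual parameters:

```python
def prefilter(url):
    """Rule stage: flag obviously suspicious URLs without feature extraction."""
    if url.startswith("http://") and "@" in url:
        return "phishing"          # credentials-in-URL obfuscation trick
    return None                    # undecided: fall through to the classifier

def classify(url, features):
    verdict = prefilter(url)
    if verdict is not None:        # rule stage decided; skip feature extraction
        return verdict
    # Hypothetical key features, each normalized to [0, 1].
    score = (0.3 * features["keyword_conf"]
             + 0.3 * features["sublink_conf"]
             + 0.3 * features["phish_vocab_sim"]
             + 0.1 * (1.0 - features["pagerank"]))
    return "phishing" if score > 0.5 else "legitimate"
```

A URL caught by the rule stage never reaches the scoring step, which is where the paper's reported speedup comes from.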

2.
Web spam denotes the manipulation of web pages with the sole intent to raise their position in search engine rankings. Since a better position in the rankings directly and positively affects the number of visits to a site, attackers use different techniques to boost their pages to higher ranks. In the best case, web spam pages are a nuisance that provide undeserved advertisement revenues to the page owners. In the worst case, these pages pose a threat to Internet users by hosting malicious content and launching drive-by attacks against unsuspecting victims. When successful, these drive-by attacks then install malware on the victims’ machines. In this paper, we introduce an approach to detect web spam pages in the list of results that are returned by a search engine. In a first step, we determine the importance of different page features to the ranking in search engine results. Based on this information, we develop a classification technique that uses important features to successfully distinguish spam sites from legitimate entries. By removing spam sites from the results, more slots are available to links that point to pages with useful content. Additionally, and more importantly, the threat posed by malicious web sites can be mitigated, reducing the risk for users to get infected by malicious code that spreads via drive-by attacks.
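The first step, ranking features by their influence, can be approximated in a very simple way: compare each feature's mean value between spam and legitimate pages. This stand-in (toy data, not the paper's actual importance measure) illustrates the idea:

```python
def feature_importance(pages):
    """pages: list of (features_dict, is_spam). Returns feature names sorted
    by the gap between their mean values on spam vs. legitimate pages."""
    names = pages[0][0].keys()
    def gap(name):
        spam = [f[name] for f, s in pages if s]
        ham = [f[name] for f, s in pages if not s]
        return abs(sum(spam) / len(spam) - sum(ham) / len(ham))
    return sorted(names, key=gap, reverse=True)
```

Features with a large gap discriminate well and would be kept for the classifier.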

3.
Phishing attacks are growing significantly each year and are considered among the most dangerous threats on the Internet, as they may cause people to lose confidence in e-commerce. In this paper, we present a heuristic method to determine whether a webpage is legitimate or a phishing page. This scheme can detect new phishing pages that blacklist-based anti-phishing tools cannot. We first convert a web page into 12 features, selected based on existing normal and phishing pages. A training set of web pages including normal and phishing pages is then input to a support vector machine for training. A testing set is finally fed into the trained model for testing. Compared to existing methods, the experimental results show that the proposed phishing detector achieves a high accuracy rate with relatively low false positive and false negative rates.
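The abstract does not list the 12 features, but a few heuristic URL/page features typical of this line of work can be sketched as below; these are illustrative choices, not the authors' feature set:

```python
def url_features(url, page_text):
    """Map a page to a small heuristic feature vector (a sketch of the
    kind of input an SVM phishing classifier would be trained on)."""
    host = url.split("//")[-1].split("/")[0].split("@")[-1]
    return [
        1 if "@" in url else 0,                        # '@' hides the real host
        1 if any(c.isdigit() for c in host.split(".")[0]) else 0,  # numeric host
        min(len(url) / 100.0, 1.0),                    # abnormally long URL
        1 if "password" in page_text.lower() else 0,   # asks for credentials
    ]
```

Each page becomes one such vector; labeled vectors would then be fed to the SVM for training and testing.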

4.
A Deep Detection System for Phishing Pages Based on Ensemble Learning
Phishing is a form of online fraud in which phishing pages imitate normal, legitimate pages to steal users' sensitive information for illicit ends. This paper proposes an ensemble-learning-based deep detection method for phishing pages. It uses page rendering to counter common page-camouflage tricks, extracts URL features, link features, and page-text features from the rendered page, builds and trains a different base classifier model for each group of features, and finally combines the base classifiers with an ensemble strategy to produce the final result. Detection experiments on PhishTank phishing pages show that the proposed method achieves good precision and recall.
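The final combination step can be sketched as a majority vote over the base classifiers, one per feature group. Majority voting is a stand-in here; the abstract does not specify the exact ensemble strategy:

```python
from collections import Counter

def ensemble_predict(base_classifiers, page):
    """Each base classifier (e.g. URL-, link-, and text-based) votes on the
    rendered page; the majority label wins."""
    votes = [clf(page) for clf in base_classifiers]
    return Counter(votes).most_common(1)[0][0]
```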

5.
Users have clear expectations of where web objects are located on a web page. Studies conducted with manipulated, fictitious websites showed that web objects placed according to user expectations are found faster and remembered more easily. Whether this is also true for existing websites had not yet been examined. The present study investigates the relation between location typicality and efficiency in finding target web objects in online shops, online newspapers, and company web pages. Forty participants took part in a within-subjects eye-tracking experiment. Typical web object placement led to fewer fixations, and participants found target web objects faster. However, some web objects were less sensitive to location typicality if they were more visually salient and conformed to user expectations in appearance. Placing web objects at expected locations and designing their appearance according to user expectations facilitates orientation, which is beneficial for first impressions and the overall user experience of websites.

6.
Research on Web Page Counter Techniques
A web page counter directly reflects how much attention a web site receives; a good counter should be convenient to use and perform well. Counter techniques are a good reflection of the state of dynamic web page technology. This paper presents several techniques for implementing web page counters and compares them.

7.
Early web applications mainly delivered text data such as HTML pages, and these pages were static: their content changed only through interaction between client and server, and each such interaction swapped out the entire page. RIA (rich Internet application) technology overcomes this limitation of HTML by confining updates to smaller regions of the page, refreshing only the content that needs to change. This lightens the load on the server, transfers less data, and yields a better user experience.

8.
To avoid returning irrelevant web pages for search engine results, technologies that match user queries to web pages have been widely developed. In this study, web pages for search engine results are classified as low-adjacence (each web page includes all query keywords) or high-adjacence (each web page includes some of the query keywords) sets. To match user queries with web pages using formal concept analysis (FCA), a concept lattice of the low-adjacence set is defined and the non-redundancy association rules defined by Zaki for the concept lattice are extended. OR- and AND-RULEs between non-query and query keywords are proposed and an algorithm and mining method for these rules are proposed for the concept lattice. The time complexity of the algorithm is polynomial. An example illustrates the basic steps of the algorithm. Experimental and real application results demonstrate that the algorithm is effective.
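In FCA, a formal concept pairs an extent (a set of pages) with an intent (the keywords they share). The two derivation operators underlying the concept lattice can be sketched over a toy page-keyword incidence relation:

```python
def intent(pages, incidence):
    """Keywords common to all given pages (derivation of an extent)."""
    sets = [incidence[p] for p in pages]
    return set.intersection(*map(set, sets)) if sets else set()

def extent(keywords, incidence):
    """Pages containing all given keywords (derivation of an intent)."""
    return {p for p, kws in incidence.items() if keywords <= set(kws)}
```

A pair (E, I) with extent(I) == E and intent(E) == I is a node of the lattice over which the paper's association rules are mined.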

9.
sIFR is a technique by which web designers replace text elements in an HTML page with Flash, rendered normally as a Flash movie in the client's browser. Designers can set page text in any font, and viewers need not have those fonts pre-installed on their machines. At the same time, the replaced text content is not hidden: it can still be located and indexed by search engines, so site promotion is unaffected. This paper describes how sIFR implements text replacement in web page design, which is worth web designers' attention.

10.
Design and Implementation of a Web Page Monitoring and Recovery System under Linux
A large number of web services are deployed on the Internet, and as network attacks intensify, the security of web pages and back-end data has become an urgent problem. This work studies page-monitoring and database-protection techniques and, based on the characteristics and security requirements of web pages, designs and implements a web page monitoring and recovery system. The system uses a three-tier client/server architecture, strengthens protection against page and database security vulnerabilities, monitors pages and database data in real time, and restores them promptly when tampering is detected, safeguarding the web site.
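The monitoring loop at the heart of such a system compares a fingerprint of each published file against a known-good baseline and rolls back on mismatch. A simplified in-memory sketch of that check-and-restore step (the real system is a three-tier C/S design):

```python
import hashlib

def fingerprint(content):
    return hashlib.sha256(content.encode()).hexdigest()

def check_and_restore(current, baseline):
    """current/baseline: dicts of page name -> content.
    Restores tampered pages from the baseline; returns the names restored."""
    restored = []
    for name, good in baseline.items():
        if fingerprint(current.get(name, "")) != fingerprint(good):
            current[name] = good          # roll back to the clean copy
            restored.append(name)
    return restored
```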

11.
Web page classification is fundamental to web data mining and a typical application of natural language processing and machine learning. Based on statistical learning theory and ant colony optimization, this paper proposes an efficient method for building a web page classifier that combines support vector machines with an ant colony algorithm. Experimental results demonstrate the method's effectiveness and robustness: it compensates for the slow convergence of support vector machines alone on large training sets, and achieves good precision and recall.

12.
This paper designs a method for collecting and analyzing Internet news pages. Starting from the entry address of a given news site, the method finds all related links on the network, distinguishes the characteristics of the pages those links point to, filters out weakly related content, and extracts the links of all news pages. It then performs multi-level link analysis, using a NewsPageRank algorithm to compute a weight for each news link according to the news item's images, title font attributes, and date. Test results show that the method generally analyzes news sites on the Internet well and its performance meets practical requirements.

13.
Early web applications mainly delivered text data such as HTML pages, and these pages were static: their content changed only through interaction between client and server, and each such interaction swapped out the entire page. RIA (rich Internet application) technology overcomes this limitation of HTML by confining updates to smaller regions of the page, refreshing only the content that needs to change. This lightens the load on the server, transfers less data, and yields a better user experience.

14.
In a network public-opinion analysis system, web page preprocessing is implemented with structure-based information extraction and data storage techniques. Based on the internal structure of HTML pages, an information-extraction template built on HTML DOM node paths is designed for extracting page information. A linkage mechanism between pages, established by studying URL characteristics, is applied to database access and improves its efficiency.
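A DOM-node-path template addresses content by the chain of tags enclosing it. A sketch using the standard-library parser to record the tag path of each text node (a real template would match these paths against stored patterns):

```python
from html.parser import HTMLParser

class TagPathExtractor(HTMLParser):
    """Record (tag-path, text) pairs for every non-empty text node."""
    def __init__(self):
        super().__init__()
        self.stack, self.texts = [], []
    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
    def handle_data(self, data):
        if data.strip():
            self.texts.append(("/".join(self.stack), data.strip()))
```

Extraction then reduces to keeping the text nodes whose paths match the template.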

15.
Automatic Extraction of Multi-Record Information from Semi-Structured Web Pages
Zhu Ming, Wang Qingwei. Computer Simulation, 2005, 22(12): 95-98
Accurately and automatically extracting the required information from multi-record web pages is an important research topic in web information processing. To address existing methods' sensitivity to noise, this paper proposes discovering record patterns by maximal similarity between record subtrees, so that records are identified correctly even when records of the same kind vary somewhat in presentation. On this basis, an automatic multi-record extraction system was implemented that can automatically retrieve result pages from multiple academic paper search sites and extract the records they contain. Experiments on common paper search sites show that the system has good effectiveness and accuracy.

16.
A Fast Feature-String-Based Deduplication Algorithm for Large-Scale Chinese Web Pages
In web search results, users often receive redundant pages with identical content, largely caused by reposting between sites. These pages waste storage and bring users considerable inconvenience when searching. Based on the characteristics of redundant pages, this paper introduces fuzzy matching and, using the content and structural information of page text, proposes a fast feature-string-based deduplication algorithm for Chinese web pages, together with optimizations of the algorithm. Experimental results show the algorithm is effective: in large-scale open tests, recall of duplicate pages reaches 97.3% and deduplication precision reaches 99.5%.
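The feature-string idea can be sketched as condensing a page's text into a short signature and fuzzily comparing signatures. The sampling scheme (every k-th character of the normalized text) and the threshold below are illustrative, not the paper's actual construction:

```python
def feature_string(text, step=5):
    """Condense page text into a short signature string."""
    norm = "".join(text.split())       # drop whitespace/layout noise
    return norm[::step]                # sample every step-th character

def is_duplicate(a, b, threshold=0.9):
    """Fuzzy match: fraction of agreeing signature positions."""
    fa, fb = feature_string(a), feature_string(b)
    if not fa or not fb:
        return False
    matches = sum(x == y for x, y in zip(fa, fb))
    return matches / max(len(fa), len(fb)) >= threshold
```

Comparing short signatures instead of full texts is what makes the approach fast at large scale.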

17.
Most existing algorithms for clustering search engine results cluster the snippets generated for a user query (e.g., the suffix-tree and Lingo algorithms); because snippets are short and of uneven quality, clustering quality is hard to improve much. Traditional full-text clustering algorithms (e.g., k-means), on the other hand, are computationally expensive, struggle to produce high-quality cluster labels, and cannot meet online clustering requirements. This paper proposes MFIC (Maximal Frequent Itemset Clustering), an online web page clustering algorithm based on maximal frequent itemsets over full text. The algorithm first mines maximal frequent itemsets from the full text, then clusters pages according to the maximal frequent itemsets they share, and finally generates cluster labels from the frequent items each cluster contains. Experimental results show that MFIC reduces the time of full-text-based clustering, improves clustering precision by about 15%, and generates readable cluster labels.
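The pipeline can be illustrated on toy data: mine frequent term sets from full texts, then group the pages that share one; the shared terms double as the cluster label. Mining only pairs with a fixed support threshold is a simplification of the paper's maximal-itemset mining:

```python
from itertools import combinations

def frequent_pairs(docs, min_support=2):
    """docs: list of term sets. Returns term pairs occurring in
    at least min_support documents."""
    counts = {}
    for terms in docs:
        for pair in combinations(sorted(terms), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return {p for p, c in counts.items() if c >= min_support}

def cluster_by_pair(docs, pair):
    """Pages sharing the frequent pair form one cluster; the pair's
    terms serve as a readable label."""
    return [i for i, terms in enumerate(docs) if set(pair) <= terms]
```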

18.
A Method for Generating Static HTML Pages with Dynamic Web Page Technology under ASP.NET
This paper introduces a method of generating static HTML pages with dynamic web page technology in the ASP.NET environment. With this technique, when site content managers add a page through the back-end publishing program, the page is saved directly as a static HTML file; page generation is simple and fast. The technique is especially suitable for high-traffic sites: it reduces the server-side load of running programs and querying the database, improves the site's data access efficiency, and the generated static pages are also friendlier to search engine indexing.

19.
Contents, layout styles, and parse structures of web news pages differ greatly from one page to another. In addition, the layout style and the parse structure of a web news page may change from time to time. For these reasons, how to design features with excellent extraction performance for massive and heterogeneous web news pages is a challenging issue. Our extensive case studies indicate that there is potential relevancy between web content layouts and their tag paths. Inspired by this observation, we design a series of tag path extraction features to extract web news. Because each feature has its own strength, we fuse all those features with DS (Dempster-Shafer) evidence theory, and then design a content extraction method, CEDS. Experimental results on both CleanEval datasets and web news pages selected randomly from well-known websites show that the F1-score with CEDS is 8.08% and 3.08% higher than the existing popular content extraction methods CETR and CEPR-TPR, respectively.
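The fusion step uses Dempster's rule of combination, which merges two mass functions while renormalizing away conflicting evidence. A direct implementation for mass functions over frozensets of hypotheses (the hypothesis names are illustrative):

```python
def combine(m1, m2):
    """Dempster's rule: m(A) is proportional to the sum over B ∩ C = A of
    m1(B) * m2(C), with mass assigned to conflicting (disjoint) pairs
    removed and the rest renormalized."""
    fused, conflict = {}, 0.0
    for b, mb in m1.items():
        for c, mc in m2.items():
            inter = b & c
            if inter:
                fused[inter] = fused.get(inter, 0.0) + mb * mc
            else:
                conflict += mb * mc
    return {a: v / (1.0 - conflict) for a, v in fused.items()}
```

Each tag-path feature contributes one mass function; combining them pairwise yields the fused belief CEDS thresholds to decide content vs. noise.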

20.
An intelligent categorization engine for bilingual web content filtering
It is important to protect children and unsuspecting adults from the harmful effects of objectionable materials, such as pornography, violence, and hate messages, which are now prevalent on the World-Wide Web. This calls for effective tools for web content analysis and filtering of objectionable contents. Our study of existing web content filtering systems has identified a number of deficiencies in these systems. Using the analysis of pornographic web pages as a case study, we present an intelligent bilingual web page categorization engine that can determine if an English or Chinese language web page contains pornographic materials. We have implemented the categorization engine to perform offline web page analysis and near-instantaneous online filtering. Performance evaluation of our system has verified its effectiveness.
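A crude bilingual pre-filter illustrates the shape of the problem: score a page against English (token-based) and Chinese (substring-based, since Chinese text is unsegmented) term lists and flag it when either score crosses a threshold. The term lists and threshold are placeholders; the actual engine uses a trained categorization model, not a bare blocklist:

```python
BLOCKLIST = {"en": {"badword1", "badword2"},   # placeholder English terms
             "zh": {"违禁词"}}                  # placeholder Chinese term

def should_block(text, threshold=1):
    """Flag a page when either language's hit count reaches the threshold."""
    tokens = text.lower().split()
    en_hits = sum(t in BLOCKLIST["en"] for t in tokens)
    zh_hits = sum(term in text for term in BLOCKLIST["zh"])
    return max(en_hits, zh_hits) >= threshold
```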
