Similar Documents
18 similar documents found (search time: 93 ms)
1.
Malicious web pages use web-based Trojans to attack users and turn their machines into botnet nodes, one of the most common attack techniques on today's Internet. Attackers typically embed malicious JavaScript into a page; when a user browses that page, the script executes and attempts to exploit the browser or its plug-ins. This paper proposes JSFEA, a pre-filtering-based method for detecting and analyzing malicious JavaScript suited to large-scale web page scanning: a fast static pass scans each page and decides whether it is suspicious, and only suspicious pages are passed on to dynamic detection. Experiments show that JSFEA has a very low false-positive rate on malicious pages and spares more than 85% of pages from dynamic detection, greatly improving the efficiency of large-scale malicious web page detection.
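As a rough illustration of such a static pre-filter (not JSFEA's published feature set, which the abstract does not give), a scanner might count obfuscation-related features in the page source and pass only high-scoring pages to the dynamic stage; the patterns and threshold below are illustrative assumptions:

```python
import re

# Illustrative static features often associated with obfuscated malicious
# JavaScript; JSFEA's actual feature set and threshold are not published
# in the abstract, so these are assumptions.
SUSPICIOUS_PATTERNS = {
    "eval_call": re.compile(r"\beval\s*\("),
    "unescape_call": re.compile(r"\bunescape\s*\("),
    "fromcharcode": re.compile(r"String\.fromCharCode", re.IGNORECASE),
    "document_write": re.compile(r"document\.write\s*\("),
    "long_escape_blob": re.compile(r"(\\x[0-9a-fA-F]{2}){20,}"),
    "hidden_iframe": re.compile(r"<iframe[^>]+(width|height)\s*=\s*[\"']?0",
                                re.IGNORECASE),
}

def prefilter_score(page_source: str) -> int:
    """Count how many suspicious static features the page exhibits."""
    return sum(bool(p.search(page_source)) for p in SUSPICIOUS_PATTERNS.values())

def needs_dynamic_analysis(page_source: str, threshold: int = 2) -> bool:
    """Cheap static pass: only pages at or above the (assumed) threshold
    are handed to the slow dynamic, browser-emulation stage."""
    return prefilter_score(page_source) >= threshold
```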

2.
Web pages on the Internet carry a large amount of commercial advertising, and green-network (content-filtering) systems cannot filter the sites among them that contain objectionable content. To address this, a body-text extraction algorithm for green-network web pages is proposed. The algorithm identifies and extracts the body-content blocks of a page via the Document Object Model (DOM) tree, scores the feature weights of each body block with a particle swarm optimization (PSO) based weighting algorithm, and then compares the extracted text against a list of objectionable keywords to identify and filter objectionable pages. Experimental results show that, with PSO-optimized weights, the green-network system identifies objectionable pages with 86.9% precision, 95.6% recall, and an F-measure of 91.02%, a substantial improvement over the unoptimized version.
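A minimal sketch of the scoring-and-filtering stage described above: in the paper the block feature weights are tuned by particle swarm optimization, whereas here fixed weights, feature names, and a keyword blocklist are assumed purely for illustration.

```python
# Hypothetical per-block features and weights; in the paper the weights are
# tuned by particle swarm optimization, which is omitted here.
FEATURE_WEIGHTS = {"text_length": 0.4, "link_density": -0.3, "punct_density": 0.2}
BAD_KEYWORDS = {"gambling", "violence"}  # placeholder blocklist

def block_score(block: dict) -> float:
    """Weighted sum of per-block features (assumed feature names)."""
    return sum(w * block[name] for name, w in FEATURE_WEIGHTS.items())

def is_bad_page(blocks: list[dict], score_cutoff: float = 0.5) -> bool:
    """Keep high-scoring body blocks, then match their text against the
    bad-keyword list to decide whether the page should be filtered."""
    body_text = " ".join(b["text"] for b in blocks if block_score(b) >= score_cutoff)
    return any(word in body_text for word in BAD_KEYWORDS)
```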

3.
User clustering is an important research direction in the analysis of online user behavior. In an online rating system, a user's reputation reflects how accurately the user rates products, while the user-product bipartite network structure reflects the user's taste preferences. Combining a reputation-measurement algorithm, this work clusters users from both angles: DBSCAN on rating accuracy, and a modularity-based greedy algorithm on taste preference. A consistency index is proposed to measure the relation between the clustering obtained from user reputation and the clustering obtained from network structure. Experiments on two empirical data sets show that the two clusterings are inconsistent, i.e., users with similar rating accuracy do not have similar taste preferences.
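The reputation-side clustering can be sketched with scikit-learn's DBSCAN, with a standard agreement index standing in for the paper's own consistency measure; the feature vectors, DBSCAN parameters, and community labels below are made up for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import adjusted_rand_score

# Made-up reputation-derived features per user, e.g. (reputation score,
# mean rating error); the paper's reputation algorithm is not reproduced.
user_features = np.array([
    [0.91, 0.10],
    [0.88, 0.12],
    [0.35, 0.70],
    [0.30, 0.75],
])

# eps / min_samples are illustrative, not the paper's settings.
labels_reputation = DBSCAN(eps=0.1, min_samples=2).fit_predict(user_features)

# Hypothetical communities from the modularity-based greedy algorithm on
# the user-product bipartite network.
labels_structure = np.array([0, 1, 0, 1])

# A standard agreement index as a stand-in for the paper's own
# consistency measure between the two partitions.
print(adjusted_rand_score(labels_reputation, labels_structure))
```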

4.
Web page tampering, i.e., the unauthorized modification of page content by means of an attack, has become a security issue of great concern for all kinds of websites. A successful attack disrupts normal web services and damages the site's reputation, and may even cause serious political and social harm. Static analysis techniques can, to a certain extent, effectively discover and detect whether a page has had Trojan code planted in it.

5.
Filtering Objectionable Web Pages Based on Rough Sets and Bayesian Decision Theory
Filtering objectionable web pages is a two-class page classification problem. This paper proposes a filtering method that combines rough sets with Bayesian decision theory: the discernibility matrix and discernibility function of rough set theory are first used to obtain an attribute reduction for the page classification decision, and Bayesian decision theory is then applied to classify and filter pages. Simulation experiments show that the method has low overhead and high filtering accuracy, giving it practical engineering value for the fast filtering of objectionable pages.
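A minimal sketch of the decision stage, assuming the rough-set reduction has already selected a set of binary page attributes; a Bernoulli naive Bayes classifier stands in here for the paper's Bayesian decision rule:

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical binary page attributes remaining after rough-set
# attribute reduction (presence/absence of reduced keyword features).
X_train = np.array([[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 0, 0]])
y_train = np.array([1, 1, 0, 0])  # 1 = objectionable page, 0 = normal

clf = BernoulliNB().fit(X_train, y_train)
print(clf.predict([[1, 0, 0]]))  # Bayesian decision on a new page
```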

6.
曹玉娟  牛振东  赵堃  彭学平 《软件学报》2011,22(8):1816-1826
On search engine result pages, users often encounter pages with nearly identical content. To improve overall retrieval performance and user satisfaction, a near-duplicate web page detection algorithm based on concepts and a semantic network, DWDCS (near-duplicate webpages detection based on concept and semantic network), is proposed. The algorithm improves upon the classic small-world-theory-based...

7.
Web Page Deduplication Based on Web Page Text Structure
魏丽霞  郑家恒 《计算机应用》2007,27(11):2854-2856
Duplicate pages returned by search engines not only waste storage resources but also add to the user's browsing burden. Targeting the characteristics of duplicated pages and of web page text itself, a dynamic deduplication method is proposed. By representing the page body as a directory-structure tree, the method implements a dynamic feature extraction algorithm and a hierarchical-fingerprint similarity computation. Experiments show that the method accurately detects both fully and partially duplicated pages.
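The paper's hierarchical fingerprints are built over a structure tree of the page body; as a simplified stand-in, the flat shingle-fingerprint similarity below shows the general idea (the shingle size and choice of hash are assumptions):

```python
import hashlib

def fingerprints(text: str, k: int = 5) -> set[int]:
    """Hash every k-word shingle of the body text into a fingerprint set."""
    words = text.split()
    return {
        int(hashlib.md5(" ".join(words[i:i + k]).encode()).hexdigest(), 16)
        for i in range(max(1, len(words) - k + 1))
    }

def similarity(a: str, b: str) -> float:
    """Jaccard similarity of two fingerprint sets; pages above a chosen
    threshold would be treated as duplicates."""
    fa, fb = fingerprints(a), fingerprints(b)
    return len(fa & fb) / len(fa | fb) if (fa | fb) else 0.0
```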

8.
Duplicate pages returned by search engines not only waste storage resources but also add to the user's browsing burden. Targeting the characteristics of duplicated pages, a topic-based deduplication method is proposed: the topic of the page body is extracted via text chunking, topic similarity is then computed, and duplicate pages are removed. Experiments show that the method accurately detects both fully and partially duplicated pages.

9.
An Algorithm for Retrieving Key Resources Using Link Information
With the growth of the Internet, Web-based information processing has attracted increasing attention and is a leading research topic. This paper explores how, on top of existing retrieval techniques, the link information of web pages can be used to automatically obtain higher-quality retrieval results, i.e., key resources. A new method is proposed that uses a page's structure and content information together with its link information: a document score is first computed by combining the page's structural information with its content score, and the page's link score is then computed from the document scores of its out-links. Experiments show that this method reduces the interference of useless links and performs much better than using link information alone.
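A minimal sketch of the two-step scoring the abstract describes: combine structure and content evidence into a document score, then score each page's links by the document scores of its out-link targets (the combination weight alpha is an assumption):

```python
def document_score(structure_score: float, content_score: float,
                   alpha: float = 0.5) -> float:
    """Combine structural and content evidence; alpha is an assumed weight."""
    return alpha * structure_score + (1 - alpha) * content_score

def link_score(page: str, outlinks: dict[str, list[str]],
               doc_scores: dict[str, float]) -> float:
    """Score a page by the document scores of the pages it links to, so
    links pointing at low-quality documents contribute little."""
    targets = outlinks.get(page, [])
    if not targets:
        return 0.0
    return sum(doc_scores.get(t, 0.0) for t in targets) / len(targets)
```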

10.
Research on Web Page Deduplication Methods
Duplicate pages returned by search engines not only waste storage resources but also add to the user's browsing burden. Targeting the characteristics of duplicated pages, a semantics-based deduplication method is proposed: topic-sentence vectors of the page body are extracted according to each sentence's position in the text and the importance of its chunks, semantic similarity is then computed over the topic-sentence vectors, and duplicate pages are removed. Experiments show that the method detects both fully and partially duplicated pages with good accuracy.

11.
In response to the current rise in malicious web page attacks, this paper uses the warnings the Google search engine issues for malicious pages, together with BHO (Browser Helper Object) technology, to build a malicious-page defense tool. Before the user visits a URL, the tool performs an online diagnosis of that URL and raises an alarm for pages containing malicious code, achieving "physical isolation" from malicious code, protecting the user's computer, and providing a safe browsing environment.

12.
Web Page Feature Description for Personalized Services
The core of personalized-service research is to describe user interests accurately, i.e., to accurately describe the pages a user has visited and found interesting. There has so far been no systematic study of methods for describing web page features. This paper analyzes three aspects of web page feature description: the scope of feature extraction, the normalization of feature terms, and the computation of term weights. Applying the improved methods in a personalized-service system yielded good recommendation results.
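For the term-weighting component, the abstract does not give the improved formula, so classic TF-IDF serves below as a hedged baseline stand-in:

```python
import math
from collections import Counter

def tfidf_weights(doc_terms: list[str],
                  corpus: list[list[str]]) -> dict[str, float]:
    """Classic TF-IDF term weighting; the paper's improved formula is not
    given in the abstract, so this is only a baseline stand-in."""
    tf = Counter(doc_terms)
    n_docs = len(corpus)
    weights = {}
    for term, freq in tf.items():
        df = sum(term in doc for doc in corpus)  # document frequency
        weights[term] = (freq / len(doc_terms)) * math.log((n_docs + 1) / (df + 1))
    return weights
```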

13.
Web spam denotes the manipulation of web pages with the sole intent to raise their position in search engine rankings. Since a better position in the rankings directly and positively affects the number of visits to a site, attackers use different techniques to boost their pages to higher ranks. In the best case, web spam pages are a nuisance that provide undeserved advertisement revenues to the page owners. In the worst case, these pages pose a threat to Internet users by hosting malicious content and launching drive-by attacks against unsuspecting victims. When successful, these drive-by attacks then install malware on the victims’ machines. In this paper, we introduce an approach to detect web spam pages in the list of results that are returned by a search engine. In a first step, we determine the importance of different page features to the ranking in search engine results. Based on this information, we develop a classification technique that uses important features to successfully distinguish spam sites from legitimate entries. By removing spam sites from the results, more slots are available to links that point to pages with useful content. Additionally, and more importantly, the threat posed by malicious web sites can be mitigated, reducing the risk for users to get infected by malicious code that spreads via drive-by attacks.
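A sketch of the described two-step pipeline (rank feature importance, then classify with the important features); the per-page features and data are hypothetical, and a random forest is one plausible choice of classifier, not necessarily the authors':

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical per-page features (e.g. keyword-stuffing ratio, number of
# inbound farm links, hidden-text ratio); real features are in the paper.
X = np.array([[0.9, 120, 0.40], [0.1, 3, 0.00], [0.8, 95, 0.30], [0.2, 5, 0.01]])
y = np.array([1, 0, 1, 0])  # 1 = spam page, 0 = legitimate

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.feature_importances_)          # step 1: rank feature importance
print(clf.predict([[0.85, 110, 0.35]]))  # step 2: classify a new result
```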

14.
Collaborative web search utilises past search histories in a community of like-minded users to improve the quality of search results. Search results that have been selected by community members for past queries are promoted in response to similar queries that occur in the future. The I-SPY system is one example of such a collaborative approach to search. As is the case with all open systems, however, it is difficult to establish the integrity of those who access a system and thus the potential for malicious attack exists. In this paper we investigate the robustness of the I-SPY system to attack. In particular, we consider attack scenarios whereby malicious agents seek to promote particular result pages within a community. In addition, we analyse robustness in the context of community homogeneity, and we show that this key characteristic of communities has implications for system robustness.

15.
Correlation-Based Web Document Clustering for Adaptive Web Interface Design
A great challenge for web site designers is how to ensure users' easy access to important web pages efficiently. In this paper we present a clustering-based approach to address this problem. Our approach to this challenge is to perform efficient and effective correlation analysis based on web logs and construct clusters of web pages to reflect the co-visit behavior of web site users. We present a novel approach for adapting previous clustering algorithms that are designed for databases in the problem domain of web page clustering, and show that our new methods can generate high-quality clusters for very large web logs when previous methods fail. Based on the high-quality clustering results, we then apply the data-mined clustering knowledge to the problem of adapting web interfaces to improve users' performance. We develop an automatic method for web interface adaptation: by introducing index pages that minimize overall user browsing costs. The index pages are aimed at providing short cuts for users to ensure that users get to their objective web pages fast, and we solve a previously open problem of how to determine an optimal number of index pages. We empirically show that our approach performs better than many of the previous algorithms based on experiments on several realistic web log files.
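The co-visit correlation signal at the heart of this approach can be sketched as follows; the paper adapts database clustering algorithms, so the hierarchical clustering used here is only a stand-in, and the session data is a toy example:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Toy session-page visit matrix parsed from web logs:
# rows = user sessions, columns = pages, 1 = visited in that session.
visits = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
])

# Correlation between page columns captures co-visit behaviour;
# 1 - correlation turns it into a distance.
dist = 1 - np.corrcoef(visits.T)

# Average-link hierarchical clustering on the condensed distance matrix
# (a stand-in for the database clustering algorithms the paper adapts).
condensed = dist[np.triu_indices(dist.shape[0], k=1)]
labels = fcluster(linkage(condensed, method="average"), t=2, criterion="maxclust")
print(labels)  # pages grouped by co-visit pattern, e.g. [1 1 2 2]
```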

16.
We compare two link analysis ranking methods of web pages in a site. The first, called Site Rank, is an adaptation of PageRank to the granularity of a web site and the second, called Popularity Rank, is based on the frequencies of user clicks on the outlinks in a page that are captured by navigation sessions of users through the web site. We ran experiments on artificially created web sites of different sizes and on two real data sets, employing the relative entropy to compare the distributions of the two ranking methods. For the real data sets we also employ a nonparametric measure, called Spearman's footrule, which we use to compare the top-ten web pages ranked by the two methods. Our main result is that the distributions of the Popularity Rank and Site Rank are surprisingly close to each other, implying that the topology of a web site is very instrumental in guiding users through the site. Thus, in practice, the Site Rank provides a reasonable first order approximation of the aggregate behaviour of users within a web site given by the Popularity Rank.
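Both comparison measures are standard and straightforward to sketch; the footrule convention for pages missing from the other top-k list is an assumption, as the paper may use a different one:

```python
import numpy as np

def relative_entropy(p: np.ndarray, q: np.ndarray) -> float:
    """Kullback-Leibler divergence D(p || q) between two rank score
    distributions (both assumed normalised and strictly positive)."""
    return float(np.sum(p * np.log(p / q)))

def spearman_footrule(top_a: list[str], top_b: list[str]) -> int:
    """Spearman's footrule over two top-k lists: the sum of absolute rank
    displacements; pages missing from the other list get rank k (one
    common convention, assumed here)."""
    k = len(top_a)
    pos_b = {page: i for i, page in enumerate(top_b)}
    return sum(abs(i - pos_b.get(page, k)) for i, page in enumerate(top_a))
```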

17.
Design and Implementation of a WWW Search Engine
With the rapid development of the Internet in China and the continual growth of WWW information, there is an urgent need for a WWW search engine that handles both Chinese and English. On the basis of an analysis of a search engine's main functional modules (information gathering, information preprocessing, and information query), this paper proposes an artificial-intelligence search algorithm to traverse web pages, builds automatic indexes of Chinese and English pages, and uses a vector space representation for both page content and the user's query expression. Practice shows that this search engine can quickly and accurately find the information users need.
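The vector space representation and query matching described here can be sketched with TF-IDF and cosine similarity (the toy corpus and pre-segmented Chinese text are assumptions; a real system would use a proper Chinese tokenizer):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus with pre-segmented Chinese text; a real system would plug in
# a Chinese word segmenter before vectorising.
pages = [
    "search engine index web pages",
    "向量 空间 模型 网页 检索",
    "user query expression vector space",
]
vec = TfidfVectorizer()
page_matrix = vec.fit_transform(pages)

query_vec = vec.transform(["vector space query"])
scores = cosine_similarity(query_vec, page_matrix).ravel()
print(scores.argsort()[::-1])  # pages ranked by similarity to the query
```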

18.
A web page's level of activity changes over its lifetime; some pages are valuable only during a particular period and become obsolete afterwards. Analyzing page life cycles from the user's perspective can improve the performance of web crawlers and search engines and the effectiveness of online advertising. Using page-access data collected by a proxy server, we study the life cycle of web pages and present a model of how user interest evolves. This model helps to better understand how the Web is organized and operates.
