Found 20 similar documents; search took 62 ms
1.
利娜 《数字社区&智能家居》2013,(2):245-246
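Network security filtering is the main technology for keeping public internet access safe. As network technology develops rapidly, people enjoy convenient and fast services while also continually suffering harm from network viruses, Trojans, and harmful content. This paper introduces cloud-based web crawler technology and the cloud service model into traditional security filtering systems, offering an efficient, flexible, and sustainable solution for secure network access by large numbers of users.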
2.
3.
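Taking the big-data environment as its basis, this paper describes Python web crawler technology. It first introduces the technology, including the definition of web crawling and the advantages of implementing it in Python, and then studies implementation strategies for Python web crawlers from the perspective of the big-data environment, in the hope of assisting practitioners.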
4.
5.
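Building on a study of topic-focused web crawlers, this paper designs a focused crawler that collects mining-equipment resources from the web. The design covers the crawler's functional modules and the methods used to implement them, such as judging a page's topical relevance and evaluating the value of a URL. The main techniques used are the vector space model and the PageRank algorithm. The study and design lay the groundwork for implementing a focused crawler for the mining-equipment domain.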
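The abstract above names the vector space model (for page-topic relevance) and PageRank (for URL value) but gives no implementation details. The following is a minimal sketch of both ideas, not the paper's method: the whitespace tokenizer, the raw term-frequency weighting, the `links` input format, and the function names `cosine_relevance` and `pagerank` are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_relevance(page_text: str, topic_terms: list[str]) -> float:
    """Cosine similarity between a page's term-frequency vector and a topic vector.

    Assumption: whitespace tokenization and raw term counts as weights.
    """
    page_vec = Counter(page_text.lower().split())
    topic_vec = Counter(t.lower() for t in topic_terms)
    dot = sum(page_vec[t] * w for t, w in topic_vec.items())
    norm_page = math.sqrt(sum(c * c for c in page_vec.values()))
    norm_topic = math.sqrt(sum(w * w for w in topic_vec.values()))
    if norm_page == 0 or norm_topic == 0:
        return 0.0
    return dot / (norm_page * norm_topic)


def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank over a small link graph.

    Assumption: `links` maps each URL to the list of URLs it links to.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank
```

A focused crawler could, for example, fetch only URLs whose source page scores above some relevance threshold and order its queue by rank; how the paper combines the two scores is not stated in the abstract.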
6.
7.
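After studying the larbin web crawler, this paper summarizes its architecture, then describes in detail how the whole architecture operates with reference to the crawler, and finally presents larbin's characteristics.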
8.
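A cached, non-blocking asynchronous domain name resolver model for web crawlers and its performance analysis (total citations: 1; self-citations: 0; citations by others: 1)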
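Web crawlers are a basic component of search engines, and the efficiency with which a crawler fetches pages directly affects the quality of service a search engine provides. Besides improving the crawl strategy, crawler efficiency can also be raised by optimizing specific aspects of the crawler's design to remove particular bottlenecks. Based on an analysis of crawler structure and real operating data, and targeting the crawler's DNS resolution bottleneck, the paper designs a cached asynchronous domain name resolver model and compares it experimentally with a conventional DNS resolver model. The experimental results show that the model is very effective at reducing the time the program spends waiting for domain name resolution, and can therefore also improve the crawler's overall efficiency.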
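The general idea of a cached, asynchronous resolver can be sketched with Python's `asyncio`. This is an illustration under stated assumptions, not the model from the paper: the class name `CachingResolver`, the fixed time-to-live, and the sharing of in-flight lookups are all choices made here. The default lookup uses the event loop's `getaddrinfo`, which keeps the crawler's coroutines from blocking while a name is being resolved.

```python
import asyncio
import socket
import time
from typing import Awaitable, Callable

# A lookup takes a host name and eventually yields a list of IP address strings.
LookupFn = Callable[[str], Awaitable[list[str]]]


async def system_lookup(host: str) -> list[str]:
    """Resolve a host through the event loop's non-blocking getaddrinfo."""
    loop = asyncio.get_running_loop()
    infos = await loop.getaddrinfo(host, None, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


class CachingResolver:
    """Hypothetical resolver: caches answers and shares in-flight lookups."""

    def __init__(self, lookup: LookupFn = system_lookup, ttl: float = 300.0):
        self._lookup = lookup
        self._ttl = ttl  # assumption: one fixed cache lifetime in seconds
        self._cache: dict[str, tuple[float, list[str]]] = {}
        self._pending: dict[str, asyncio.Future] = {}

    async def resolve(self, host: str) -> list[str]:
        entry = self._cache.get(host)
        if entry and entry[0] > time.monotonic():
            return entry[1]  # cache hit: no waiting at all
        task = self._pending.get(host)
        if task is None:
            # First request for this host: start one lookup that others can share.
            task = asyncio.ensure_future(self._lookup(host))
            self._pending[host] = task
        try:
            addrs = await task
        finally:
            self._pending.pop(host, None)
        self._cache[host] = (time.monotonic() + self._ttl, addrs)
        return addrs
```

A crawler would create one `CachingResolver` and call `await resolver.resolve(host)` from each fetch coroutine; repeated hosts are then answered from the cache, and concurrent requests for the same host wait on a single lookup.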
9.
文竹 《计算机光盘软件与应用》2013,(20):50-51
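The main role of a web crawler is to obtain information from the Internet. The information we hope to obtain when browsing web pages can be fetched with a web crawler; crawlers continuously gather massive amounts of information from the Internet, and the information in search engine results all comes from this source. This paper describes in detail a web crawler search engine developed in C#.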
10.
11.
Detection of malicious and non-malicious website visitors using unsupervised neural network learning
Distributed denial of service (DDoS) attacks are recognized as among the most damaging attacks on Internet security today. Recently, malicious web crawlers have been used to execute automated DDoS attacks on web sites across the WWW. In this study, we examine the use of two unsupervised neural network (NN) learning algorithms for the purpose of web-log analysis: the Self-Organizing Map (SOM) and Modified Adaptive Resonance Theory 2 (Modified ART2). In particular, through the use of SOM and Modified ART2, our work aims to obtain better insight into the types and distribution of visitors to a public web site based on their browsing behavior, as well as to investigate the relative differences and/or similarities between malicious web crawlers and other non-malicious visitor groups. The results of our study show that, even though there is a fairly clear separation between malicious web crawlers and other visitor groups, 52% of malicious crawlers exhibit very ‘human-like’ browsing behavior and as such pose a particular challenge for future web-site security systems. We also show that some of the feature values of malicious crawlers that exhibit very ‘human-like’ browsing behavior are not significantly different from the feature values of human visitors. Additionally, we show that Google, MSN and Yahoo crawlers exhibit distinct crawling behavior.
12.
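Web information resources are growing exponentially, and focused web crawlers emerged in response to users' increasingly personalized needs. A focused crawler is a program that downloads web pages on a specific topic. Using specific information obtained during page collection, it fetches only pages relevant to the topic. Applications such as search engines built on focused crawlers and domain corpus construction using focused crawlers are already in wide use. The paper first introduces the definition and working principle of focused crawlers, then reviews recent research on them in China and abroad and compares the strengths and weaknesses of various crawl strategies and related algorithms, and finally proposes future research directions for focused web crawlers.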
13.
Deep web or hidden web refers to the hidden part of the Web (usually residing in structured databases) that remains unavailable to standard Web crawlers. Obtaining the content of the deep web is challenging and has been acknowledged as a significant gap in the coverage of search engines. The paper proposes a novel deep web crawling framework based on reinforcement learning, in which the crawler is regarded as an agent and the deep web database as the environment. The agent perceives its current state and selects an action (query) to submit to the environment (the deep web database) according to its Q-value. While existing methods rely on the assumption that all deep web databases possess full-text search interfaces and solely utilize the statistics (TF or DF) of acquired data records to generate the next query, the reinforcement learning framework not only enables crawlers to learn a promising crawling strategy from their own experience, but also allows for utilizing diverse features of query keywords. Experimental results show that the method outperforms state-of-the-art methods in terms of crawling capability and relaxes the assumption of full-text search implied by existing methods.
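The abstract does not give the state representation, reward, or update rule, so the following is only a toy sketch of choosing queries by Q-value. It assumes a single collapsed state, a reward equal to the number of previously unseen records a query returns, epsilon-greedy selection, and optimistic initial values so untried keywords get issued at least once. The simulated `database` mapping and the function name `crawl_with_q_values` are hypothetical.

```python
import random


def crawl_with_q_values(database: dict[str, set[int]], steps: int,
                        epsilon: float = 0.1, alpha: float = 0.5,
                        optimistic: float = 10.0, seed: int = 0) -> set[int]:
    """Harvest record ids from a simulated deep web database.

    Assumption: `database` maps each candidate query keyword to the ids of
    the records that the keyword would return.
    """
    if not database:
        return set()
    rng = random.Random(seed)
    # One Q-value per candidate query; optimistic start favors untried queries.
    q = {keyword: optimistic for keyword in database}
    harvested: set[int] = set()
    for _ in range(steps):
        candidates = sorted(q)
        if rng.random() < epsilon:
            query = rng.choice(candidates)      # explore
        else:
            query = max(candidates, key=q.get)  # exploit the highest Q-value
        new_records = database[query] - harvested
        reward = len(new_records)               # assumption: reward = new records
        harvested |= new_records
        # Incremental update toward the observed reward (no discounted future term).
        q[query] += alpha * (reward - q[query])
    return harvested
```

With `epsilon=0.0` the loop is purely greedy, which makes small examples easy to trace by hand; the paper's framework additionally uses features of the query keywords, which this sketch does not model.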
14.
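With the rapid development of network technology, web server security is receiving growing attention. Covering the network operating system, web server security, IIS security, and ASP security, this paper proposes measures for building secure web servers and defending against attacks.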
15.
16.
江璜 《数字社区&智能家居》2006,(2)
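With the widespread use of computer networks, network security is increasingly important. This paper analyzes the principles of currently prevalent web page Trojans, the cross-domain security model implemented by Internet Explorer, and several serious cross-domain vulnerabilities that web page Trojans exploit. It aims to help users raise their security awareness and take proper network security precautions.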
17.
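Since the classified protection system (等级保护) began to be implemented in 2010, the network environment has gradually been stabilizing. More and more application systems adopt a B/S (browser/server) architecture, and web application security problems have become prominent. Universities are cradles for cultivating talent, and their web services directly affect how efficiently staff and students work and study, so the secure and stable operation of those services matters greatly to them. Based on the requirements of the classified protection system, the article discusses and analyzes web security in universities.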
18.
Jason J. Jung 《Neural computing & applications》2009,18(3):213-221
Conventional focused crawling systems have difficulties with contextual information retrieval in a semantic web environment. In order to deal with these problems, we propose a cooperative crawler platform based on an evolution strategy to build the semantic structure (i.e., local ontologies) of web spaces. Mainly, multiple crawlers can discover semantic instances (i.e., ontology fragments) from annotated resources in a web space, and a centralized meta-crawler can carry out incremental aggregation of the semantic instances sent by the multiple crawlers. To do this, we exploit a similarity-based ontology matching algorithm for computing the semantic fitness of a population, i.e., the summation of all possible semantic similarities between the semantic instances. As a result, we could efficiently obtain the best mapping condition (i.e., maximizing the semantic fitness) of the estimated semantic structures. The paper makes two significant contributions: (1) reconciling semantic conflicts between multiple crawlers, and (2) adapting to evolving semantic structures of web spaces over time.
19.
Distributed Denial of Service (DDoS) is one of the most damaging attacks on Internet security today. Recently, malicious web crawlers have been used to execute automated DDoS attacks on web sites across the WWW. In this study we examine the effect of applying seven well-established data mining classification algorithms on static web server access logs in order to: (1) classify user sessions as belonging to either automated web crawlers or human visitors and (2) identify which of the automated web crawler sessions exhibit ‘malicious’ behavior and are potentially participants in a DDoS attack. The classification performance is evaluated in terms of classification accuracy, recall, precision and F1 score. Seven out of nine vector (i.e. web-session) features employed in our work are borrowed from earlier studies on classification of user sessions as belonging to web crawlers. However, we also introduce two novel web-session features: the consecutive sequential request ratio and the standard deviation of page request depth. The effectiveness of the new features is evaluated in terms of the information gain and gain ratio metrics. The experimental results demonstrate the potential of the new features to improve the accuracy of data mining classifiers in identifying malicious and well-behaved web crawler sessions.
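One of the two new session features named above, the standard deviation of page request depth, can be illustrated with a short sketch. The abstract does not define "depth" or say whether a population or sample deviation is used, so both choices here are assumptions: depth is taken as the number of non-empty path segments in the requested URL, and the population standard deviation is used. The function names are hypothetical. The consecutive sequential request ratio is left out because the abstract does not define it.

```python
from statistics import pstdev
from urllib.parse import urlsplit


def page_depth(url: str) -> int:
    """Depth of a request, assumed here to be the number of path segments."""
    path = urlsplit(url).path
    return len([segment for segment in path.split("/") if segment])


def depth_std(session_urls: list[str]) -> float:
    """Population standard deviation of request depth over one web session."""
    depths = [page_depth(url) for url in session_urls]
    return pstdev(depths) if depths else 0.0


# Example session: the requested URLs of one visitor, in order.
session = ["/", "/products/", "/products/drills/item42.html"]
print(depth_std(session))
```

A value like this would be one column in the per-session feature vector handed to a classifier, alongside the features borrowed from earlier studies.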
20.
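The wide use of web services and the diversification of network technologies create an urgent need for a security framework that both secures web services and is compatible with a variety of clients. Building on Axis2, the paper designs and implements a complete web service framework that conforms to the WS-Security specification. The framework secures web services through file-based configuration, message encryption, and programmatic control, and uses the SOAP communication protocol to handle communication with different clients. Test results show that the framework can provide digital signatures, message encryption, and role-based access control, can accept requests from various SOAP-based clients, offers good security and compatibility, and gives enterprises an effective solution for web service security.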