Similar literature
 20 similar documents found; search took 125 ms
1.
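Application of the MD5 algorithm to duplicate web page elimination   Total citations: 1, self-citations: 0, citations by others: 1
When Internet users retrieve Web information through common search engines, they often receive a large number of duplicate pages, which lowers search efficiency. Exploiting the maturity and good portability of the MD5 algorithm, this paper proposes an MD5-based algorithm for eliminating duplicate web pages. Experiments show that the algorithm removes duplicate pages effectively, has modest time and space complexity, and is of considerable practical value.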

2.
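When users perform Web information retrieval, the network often returns a large number of near-identical pages (which can be regarded as duplicates). To address the limitations of search engines in querying Web information, and considering the characteristics of keyword-matching search engine systems, the following algorithm is combined with the vector space model of web pages to find duplicate or similar pages on the WWW quickly and effectively and so improve retrieval efficiency. First, the MD5 algorithm (a message-digest algorithm) is used to extract a digest of the returned text. MD5 treats the whole file as one large text message and, through its irreversible string transformation, produces an MD5 message digest that is treated as unique. MD5 processes the input in 512-bit blocks, each divided into sixteen 32-bit sub-blocks; after a series of processing steps, the output of the algorithm consists of four 32-bit blocks…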
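As a rough illustration of the digest-based step described above, the following Python sketch computes an MD5 digest for each returned page and keeps only the first page seen for each digest. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `md5_digest` and `remove_duplicates` are hypothetical, pages are assumed to be plain strings encoded as UTF-8, and only exact duplicates are caught (the vector space model step is not covered).

```python
import hashlib


def md5_digest(text: str) -> str:
    """Return the 128-bit MD5 digest of the text as 32 hex characters."""
    # Assumption: pages are compared as UTF-8 encoded text.
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def remove_duplicates(pages):
    """Keep the first page for each distinct digest, preserving order."""
    seen = set()
    unique = []
    for page in pages:
        digest = md5_digest(page)
        if digest not in seen:
            seen.add(digest)
            unique.append(page)
    return unique
```

The 32 hex characters correspond to the four 32-bit output blocks mentioned in the abstract. Because any change to the text changes the digest, this approach only removes byte-identical pages.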

3.
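As web page tampering becomes increasingly serious, tamper detection has become a research hotspot in recent years. Hash function verification is a commonly used method for this task, and MD5 is the most widely used hash verification function. However, when MD5 is used to verify web page content, the hash values computed before and after tampering are subject to collisions. To address this, a chaotic MD5 algorithm for web page tamper detection is proposed: a dynamic parameter model based on plaintext blocks optimizes the static parameters of traditional MD5, and an integer tent map is iterated multiple times over the plaintext blocks to strengthen collision resistance. Experiments show that, compared with traditional MD5, the chaotic MD5 algorithm reduces the deviation of the absolute distance of hash values from the ideal value by 0.6047‰, effectively lowering the probability of hash collisions during tamper detection.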

4.
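Web page deduplication based on digital fingerprints computed with MD5 is simple and efficient and is widely used. However, because MD5 is a strict cryptographic algorithm, even a very small change in an article's content yields a completely different fingerprint, so the recall of deduplication built on it is not very high. Two improved deduplication algorithms based on character-set feature vectors are proposed: the article content is mapped into a character-set space, and the distance in that space is computed to decide whether articles are similar. The proposed algorithms generalize well; reordering within a paragraph and adding, deleting, or changing individual characters do not prevent similar paragraphs from being recognized, which greatly improves recall. Experimental results show a time complexity of O(n) and a space complexity of O(1), making the algorithms suitable for large-scale web page deduplication.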
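The abstract does not give the details of the two algorithms, so the following Python sketch only illustrates the general idea of mapping text into a fixed-size character-count vector and comparing vectors. Everything specific here is an assumption made for illustration: the vector dimension `DIM`, bucketing by `ord(ch) % DIM`, cosine similarity as the distance measure, and the threshold of 0.9.

```python
import math

DIM = 1024  # hypothetical fixed vector size, so memory does not grow with text length


def char_vector(text):
    """Count characters into a fixed-size vector in one pass over the text."""
    vec = [0] * DIM
    for ch in text:
        if not ch.isspace():
            vec[ord(ch) % DIM] += 1
    return vec


def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_near_duplicate(text_a, text_b, threshold=0.9):
    """Treat two texts as near-duplicates when their vectors are close enough."""
    return cosine_similarity(char_vector(text_a), char_vector(text_b)) >= threshold
```

Because the vector only records character counts, reordering words leaves it unchanged and a small edit moves it only slightly, which is the property the abstract attributes to the character-set approach.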

5.
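Current methods for extracting duplicate information from web pages lack an information classification step, which leads to low extraction coverage, a high proportion of duplicate information, and poor overall performance. A method based on pattern recognition is therefore proposed. Mutual-information features of web page information are obtained using an inter-class balance factor and term frequency. On the basis of association rules, these features are vectorized according to web page confidence, completing feature extraction. A support vector machine from pattern recognition classifies the web page information; the penalty function is optimized and a soft-margin support vector machine classifier is built. The structural and semantic similarity of web page information in different classes is computed, and the two are combined into an overall similarity that is used to extract the duplicate information. Simulation results show that the proposed method achieves high extraction coverage, a low proportion of duplicate information, and good overall performance.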

6.
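Research on the MD5 algorithm   Total citations: 27, self-citations: 1, citations by others: 26
With the rapid development of network technology, information encryption has become an important means of ensuring network security, and encryption algorithms have become a research hotspot. This paper studies the MD5 algorithm in depth, describing its background, applications, and algorithmic flow, and proposes an improvement to it.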

7.
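A web page information hiding algorithm based on repeated introduction of CSS class selectors   Total citations: 1, self-citations: 0, citations by others: 1
Existing web page information hiding algorithms suffer from hiding points that are separate from the page content and from poor resistance to machine filtering. Based on a strategy of repeatedly introducing CSS class selectors, a new web page information hiding algorithm is proposed. Following the embedding rules, information is embedded by repeatedly introducing the CSS class selectors of the relevant objects in operable CSS blocks. Experimental results show that the algorithm ties the hiding points closely to the page content, improves resistance to detection and machine filtering, offers good concealment, can hide a relatively large amount of information, and can be applied to web page protection and covert communication.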

8.
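Improving Web information retrieval efficiency with a message-digest algorithm   Total citations: 1, self-citations: 0, citations by others: 1
杨文忠 (Yang Wenzhong), 章兢 (Zhang Jing). 《微机发展》 (Microcomputer Development), 2006, 16(6): 222-223
To address the large number of duplicate pages in the results that common search engines return to users, an algorithm for removing duplicate pages based on a message-digest algorithm is proposed. Because the message-digest algorithm is mature, the proposed algorithm is easy to implement and highly portable. Experiments show that it effectively removes the duplicate pages returned by common search engines, improving information retrieval efficiency for Internet users, and is of considerable practical value.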

9.
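Web data mining for small texts and its applications   Total citations: 4, self-citations: 2, citations by others: 4
Existing search engine technology returns too much, and too heterogeneous, information to users. A Web text data mining technique for small texts, based on an approximate web page clustering algorithm, is therefore proposed. It builds a vocabulary according to the user's degree of interest, obtains a set of word-segmentation dictionaries by fuzzy clustering, removes duplicate pages with the MD5 algorithm, clusters the remaining pages with the approximate web page clustering algorithm, and ranks the clustering results with a Markov Web sequence mining algorithm, thereby presenting a sequence of page clusters of interest so that users can quickly find the pages they want. Experiments show that the algorithm greatly improves search efficiency while maintaining recall and precision. Because the mining targets small texts, the time and space complexity of the algorithms studied is low, so the approach is expected to become a practical and effective information retrieval technique.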

10.
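This paper studies web page duplicate detection. The traditional SCAM algorithm decides whether pages are duplicates by comparing how often a few keywords occur in them; when a website contains similar pages whose keywords are very close, this causes misjudgments and low detection accuracy. A web page fingerprint algorithm is proposed: information retrieval techniques extract the fingerprint of the page under test, which is then compared with the fingerprints in the page repository to decide whether the page is a duplicate, avoiding the low accuracy caused by relying on only a few keywords. Experiments show that fingerprint-based detection judges accurately whether pages are duplicates, improves the accuracy of web page information, and gives satisfactory results.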

11.
Abstract This paper describes an approach to the design of interactive multimedia materials being developed in a European Community project. The developmental process is seen as a dialogue between technologists and teachers. This dialogue is often problematic because of the differences in training, experience and culture between them. Conditions needed for fruitful dialogue are described and the generic model for learning design used in the project is explained.

12.
European Community policy and the market   Total citations: 1, self-citations: 0, citations by others: 1
Abstract This paper starts with some reflections on the policy considerations and priorities which are shaping European Commission (EC) research programmes. Then it attempts to position the current projects which seek to capitalise on information and communications technologies for learning in relation to these priorities and the apparent realities of the marketplace. It concludes that while there are grounds to be optimistic about the contribution EC programmes can make to the efficiency and standard of education and training, they are still too technology driven.

13.
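Fusion ensemble methods are widely used in pattern recognition. However, some base classifiers have unstable real-time performance, which degrades multi-classifier fusion. To address this, a new sub-fusion ensemble classifier system based on multiple classifiers is proposed. Above the measurement-level fusion layer, the method dynamically selects among the base classifiers and takes the class with the most votes as the class the fusion system assigns to the feature vector, yielding a new adaptive sub-fusion ensemble classifier method. Experiments show that the method achieves markedly higher recognition accuracy than traditional classifiers and classifier fusion methods, and is more robust.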
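The voting step mentioned above, in which the class with the most votes is taken as the fused decision, can be sketched in Python as follows. This is a minimal sketch of plain majority voting only; the function name `majority_vote` is hypothetical, the dynamic selection of base classifiers is not shown, and the input is assumed to be the list of labels predicted by whichever classifiers were selected.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted most often; ties go to the label seen first."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    # most_common(1) returns a one-element list of (label, count) pairs.
    return Counter(predictions).most_common(1)[0][0]
```

For instance, `majority_vote(["cat", "dog", "cat"])` would select `"cat"` as the fused class.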

14.
Development of software intensive systems (systems) in practice involves a series of self-contained phases for the lifecycle of a system. Semantic and temporal gaps, which occur among phases and among developer disciplines within and across phases, hinder the ongoing development of a system because of the interdependencies among phases and among disciplines. Such gaps are magnified among systems that are developed at different times by different development teams, which may limit reuse of artifacts of systems development and interoperability among the systems. This article discusses such gaps and a systems development process for avoiding them.

15.
This paper presents control chart models and the necessary simulation software for the location of economic values of the control parameters. The simulation program is written in FORTRAN, requires only 10K of main storage, and can run on most mini and micro computers. Two models are presented: one describes the process when it is operating at full capacity and the other when the process is operating under capacity. The models allow the product quality to deteriorate to a further level before an existing out-of-control state is detected, and they can also be used in situations where no prior knowledge exists of the out-of-control causes and the resulting proportion defectives.

16.
Going through a few examples of robot artists who are recognized worldwide, we try to analyze the deepest meaning of what is called “robot art” and the related art field definition. We also try to highlight its well-marked borders, such as kinetic sculptures, kinetic art, cyber art, and cyberpunk. A brief excursion into the importance of the context, the message, and its semiotics is also provided, case by case, together with a few hints on the history of this discipline in the light of an artistic perspective. Therefore, the aim of this article is to try to summarize the main characteristics that might classify robot art as a unique and innovative discipline, and to track down some of the principles by which a robotic artifact can or cannot be considered an art piece in terms of social, cultural, and strictly artistic interest. This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008.

17.
Although there are many arguments that logic is an appropriate tool for artificial intelligence, there has been a perceived problem with the monotonicity of classical logic. This paper elaborates on the idea that reasoning should be viewed as theory formation where logic tells us the consequences of our assumptions. The two activities of predicting what is expected to be true and explaining observations are considered in a simple theory formation framework. Properties of each activity are discussed, along with a number of proposals as to what should be predicted or accepted as reasonable explanations. An architecture is proposed to combine explanation and prediction into one coherent framework. Algorithms used to implement the system as well as examples from a running implementation are given.

18.
This paper provides the author's personal views and perspectives on software process improvement. Starting with his first work on technology assessment in IBM over 20 years ago, Watts Humphrey describes the process improvement work he has been directly involved in. This includes the development of the early process assessment methods, the original design of the CMM, and the introduction of the Personal Software Process (PSP)℠ and Team Software Process (TSP)℠. In addition to describing the original motivation for this work, the author also reviews many of the problems he and his associates encountered and why they solved them the way they did. He also comments on the outstanding issues and likely directions for future work. Finally, this work has built on the experiences and contributions of many people. Mr. Humphrey only describes work that he was personally involved in and he names many of the key contributors. However, so many people have been involved in this work that a full list of the important participants would be impractical.

19.
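SAR image denoising based on significance correction of the complex wavelet noise variance   Total citations: 4, self-citations: 1, citations by others: 3
A speckle filtering method for Synthetic Aperture Radar (SAR) images is proposed that combines statistical modeling in the complex wavelet domain with a significance correction of the noise variance estimate. The method first converts the multiplicative noise model into an additive one through a logarithmic transform, then applies the Dual-Tree Complex Wavelet Transform (DCWT) to the transformed image and models the statistical distribution of the complex wavelet coefficients. With this prior distribution, Bayesian estimation recovers the original coefficients from the noisy ones, thereby removing the noise. Experimental results show that the method preserves image detail while removing noise and achieves good denoising performance.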
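The logarithmic step described above can be written out as a short worked equation. The symbols are assumptions introduced here for illustration and do not come from the abstract: I is the observed image, R the noise-free image, and n the multiplicative speckle term.

```latex
% Assumed notation: I = observed SAR image, R = noise-free image, n = multiplicative speckle
I(x, y) = R(x, y)\, n(x, y)
\quad\Longrightarrow\quad
\ln I(x, y) = \ln R(x, y) + \ln n(x, y)
```

After the transform the speckle appears as an additive term, which is the form that wavelet-domain estimators for additive noise expect.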

20.
Abstract This paper considers some results of a study designed to investigate the kinds of mathematical activity undertaken by children (aged between 8 and 11) as they learned to program in LOGO. A model of learning modes is proposed, which attempts to describe the ways in which children used and acquired understanding of the programming/mathematical concepts involved. The remainder of the paper is concerned with discussing the validity and limitations of the model, and its implications for further research and curriculum development.
