共查询到20条相似文献,搜索用时 62 毫秒
1.
网络信息资源呈指数级增长,面对用户越来越个性化的需求,主题网络爬虫应运而生。主题网络爬虫是一种下载特定主题网页的程序。利用在采集页面过程获得的特定信息,主题网络爬虫抓取的页面都是与主题相关的。基于主题网络爬虫的搜索引擎以及基于主题网络爬虫构建领域语料库等应用已经得到广泛运用。首先介绍了主题爬虫的定义、工作原理;然后介绍了近年来国内外关于主题爬虫的研究状况,并比较了各种爬行策略及相关算法的优缺点;最后提出了主题网络爬虫未来的研究方向。关键词: 相似文献
2.
3.
任昱凤 《计算机光盘软件与应用》2015,(1):11-12
本文通过对分布式技术和主题网络爬虫的研究,设计了一个能处理海量数据的分布式主题爬虫。设计内容主要包括分布式主题网络爬虫的各个功能模块及其实现方法。如页面的主题相关度判定方法、URL去重过滤方法等。主要使用了Hadoop技术和向量空间模型。该分布式主题爬虫的研究与设计为后面分布式主题爬虫的实现奠定了基础。 相似文献
4.
主题爬虫是实现主题搜索引擎的关键部分。提出了利用朴素贝叶斯算法进行主题识别的方法,介绍了主题爬虫实现过程中所涉及到的关键部分,包括种子URL集合的生成、页面分析及特征提取、主题识别等。将基于朴素贝叶斯算法的主题爬虫,与基于链接分析的主题爬虫和基于主题词表的主题爬虫进行比较,实验表明基于朴素贝叶斯算法的主题爬虫准确性较好,论证了方法的可行性,为主题信息的采集奠定了良好的基础。 相似文献
5.
深入解析Web主题爬虫的关键性原理 总被引:1,自引:0,他引:1
随着互联网的快速发展,搜索引擎的应用越来越重要,作为搜索引擎的首要组成部分网络爬虫一直备受人们的关注。主题爬虫作为网络爬虫的重要种类使用越来越广泛,深入分析的网络主题爬虫关键性原理有助于根据需求设计出科学合理的爬虫。 相似文献
6.
化学主题网络爬虫的设计和实现 总被引:1,自引:0,他引:1
由于通用搜索引擎检索返回的结果过多、主题相关性不强以及随着人们对提供的各项信息服务的要求越来越高,基于整个Web的信息采集越来越力不从心。同时它无法及时地采集到足够的最新的Web信息,也不能满足人们日益增长的个性化需求。本文通过把Internet化学资源导航系统所积累的化学知识与搜索引擎的自动采集技术相结合展开了对化学主题网络爬虫开发的研究。结果表明,基于Widrow-Hoff分类器的化学主题网络爬虫能有效的采集化学相关的网页。 相似文献
7.
8.
9.
基于PageRank与Bagging的主题爬虫研究 总被引:3,自引:0,他引:3
为克服主题爬虫主题漂移现象,提高搜索引擎的查准率和查全率,提出了一个基于PageRank算法与Bagging算法的主题爬虫设计方法.将主题爬虫系统分为爬虫爬行模块和主题相关性分析模块.利用一种改进的PageRank算法改善了爬虫的搜索策略,进行网页遍历与抓取.用向量空间模型表示网页主题,使用Bagging算法构造网页主题分类器进行主题相关性分析,过滤与主题无关网页.实验结果表明,该方法在网页抓取的性能上和主题网页的查准率上都取得较好的效果. 相似文献
10.
传统的聚焦爬虫在主题未知或者缺少相应训练集的情况下无法完成主题爬行。为让聚焦爬虫具有更好的主题适应性,提出基于聚类算法的自适应主题模型,指导聚焦爬虫在只有少量相同主题(主题未知)初始url的情况下完成主题爬行。通过对初始页面聚类得到主题中心向量,寻找相关网页更新主题中心位置;基于best-first策略实现url排序;基于该模型实现用户定制主题聚焦爬虫。通过对比实验验证了使用该模型的爬虫具有较高的收获比(havest rate)。 相似文献
11.
S. Shaw 《Journal of Computer Assisted Learning》1993,9(2):93-99
Abstract This paper describes an approach to the design of interactive multimedia materials being developed in a European Community project. The developmental process is seen as a dialogue between technologists and teachers. This dialogue is often problematic because of the differences in training, experience and culture between them. Conditions needed for fruitful dialogue are described and the generic model for learning design used in the project is explained. 相似文献
12.
European Community policy and the market 总被引:1,自引:0,他引:1
C. Lloyd 《Journal of Computer Assisted Learning》1993,9(2):86-91
Abstract This paper starts with some reflections on the policy considerations and priorities which are shaping European Commission (EC) research programmes. Then it attempts to position the current projects which seek to capitalise on information and communications technologies for learning in relation to these priorities and the apparent realities of the marketplace. It concludes that while there are grounds to be optimistic about the contribution EC programmes can make to the efficiency and standard of education and training, they are still too technology driven. 相似文献
13.
融合集成方法已经广泛应用在模式识别领域,然而一些基分类器实时性能稳定性较差,导致多分类器融合性能差,针对上述问题本文提出了一种新的基于多分类器的子融合集成分类器系统。该方法考虑在度量层融合层次之上通过对各类基多分类器进行动态选择,票数最多的类别作为融合系统中对特征向量识别的类别,构成一种新的自适应子融合集成分类器方法。实验表明,该方法比传统的分类器以及分类融合方法识别准确率明显更高,具有更好的鲁棒性。 相似文献
14.
Wayne O’Brien Author Vitae 《Journal of Systems and Software》2008,81(11):1997-2013
Development of software intensive systems (systems) in practice involves a series of self-contained phases for the lifecycle of a system. Semantic and temporal gaps, which occur among phases and among developer disciplines within and across phases, hinder the ongoing development of a system because of the interdependencies among phases and among disciplines. Such gaps are magnified among systems that are developed at different times by different development teams, which may limit reuse of artifacts of systems development and interoperability among the systems. This article discusses such gaps and a systems development process for avoiding them. 相似文献
15.
This paper presents control charts models and the necessary simulation software for the location of economic values of the control parameters. The simulation program is written in FORTRAN, requires only 10K of main storage, and can run on most mini and micro computers. Two models are presented - one describes the process when it is operating at full capacity and the other when the process is operating under capacity. The models allow the product quality to deteriorate to a further level before an existing out-of-control state is detected, and they can also be used in situations where no prior knowledge exists of the out-of-control causes and the resulting proportion defectives. 相似文献
16.
Going through a few examples of robot artists who are recognized worldwide, we try to analyze the deepest meaning of what
is called “robot art” and the related art field definition. We also try to highlight its well-marked borders, such as kinetic
sculptures, kinetic art, cyber art, and cyberpunk. A brief excursion into the importance of the context, the message, and
its semiotics is also provided, case by case, together with a few hints on the history of this discipline in the light of
an artistic perspective. Therefore, the aim of this article is to try to summarize the main characteristics that might classify
robot art as a unique and innovative discipline, and to track down some of the principles by which a robotic artifact can
or cannot be considered an art piece in terms of social, cultural, and strictly artistic interest.
This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January
31–February 2, 2008 相似文献
17.
David Poole 《Computational Intelligence》1989,5(2):97-110
Although there are many arguments that logic is an appropriate tool for artificial intelligence, there has been a perceived problem with the monotonicity of classical logic. This paper elaborates on the idea that reasoning should be viewed as theory formation where logic tells us the consequences of our assumptions. The two activities of predicting what is expected to be true and explaining observations are considered in a simple theory formation framework. Properties of each activity are discussed, along with a number of proposals as to what should be predicted or accepted as reasonable explanations. An architecture is proposed to combine explanation and prediction into one coherent framework. Algorithms used to implement the system as well as examples from a running implementation are given. 相似文献
18.
Watts S. Humphrey 《Annals of Software Engineering》2002,14(1-4):39-72
This paper provides the author's personal views and perspectives on software process improvement. Starting with his first work on technology assessment in IBM over 20 years ago, Watts Humphrey describes the process improvement work he has been directly involved in. This includes the development of the early process assessment methods, the original design of the CMM, and the introduction of the Personal Software Process (PSP)SM and Team Software Process (TSP){SM}. In addition to describing the original motivation for this work, the author also reviews many of the problems he and his associates encountered and why they solved them the way they did. He also comments on the outstanding issues and likely directions for future work. Finally, this work has built on the experiences and contributions of many people. Mr. Humphrey only describes work that he was personally involved in and he names many of the key contributors. However, so many people have been involved in this work that a full list of the important participants would be impractical. 相似文献
19.
基于复小波噪声方差显著修正的SAR图像去噪 总被引:4,自引:1,他引:3
提出了一种基于复小波域统计建模与噪声方差估计显著性修正相结合的合成孔径雷达(Synthetic Aperture Radar,SAR)图像斑点噪声滤波方法。该方法首先通过对数变换将乘性噪声模型转化为加性噪声模型,然后对变换后的图像进行双树复小波变换(Dualtree Complex Wavelet Transform,DCWT),并对复数小波系数的统计分布进行建模。在此先验分布的基础上,通过运用贝叶斯估计方法从含噪系数中恢复原始系数,达到滤除噪声的目的。实验结果表明该方法在去除噪声的同时保留了图像的细节信息,取得了很好的降噪效果。 相似文献