1.
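Research and Design of a Distributed Web Information Collection System   Total citations: 6, self-citations: 0, citations by others: 6
The rapid expansion of Web information poses a major challenge for Web information collection. To address this, a distributed Web information collection system was implemented to raise the capacity of conventional Web information collection. The paper discusses the basic principles, classification, and difficulties of distributed information collection along with the corresponding countermeasures, and analyzes the distributed Web information collection system in detail. Finally, it offers an outlook on the development of distributed Web information collection.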
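The abstract does not say how work is split across the collection nodes. One common approach, assumed here purely for illustration, is to assign each URL to a node by hashing its host name, so that all pages from one site land on the same node. The helper names `assign_node` and `partition` are hypothetical, and this is not presented as the paper's method.

```python
import hashlib
from urllib.parse import urlsplit


def assign_node(url: str, num_nodes: int) -> int:
    """Map a URL to a crawler node by hashing its host name (assumed scheme).

    Hashing the host rather than the full URL keeps every page of a site
    on one node, so politeness delays and duplicate checks stay local.
    """
    host = urlsplit(url).hostname or ""  # urlsplit lowercases the host
    digest = hashlib.md5(host.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def partition(urls, num_nodes):
    """Group URLs into one work queue per node."""
    queues = {node: [] for node in range(num_nodes)}
    for url in urls:
        queues[assign_node(url, num_nodes)].append(url)
    return queues
```

For example, `partition(seed_urls, 4)` returns four per-node queues. The trade-off of host-based hashing is uneven load when a few large sites dominate the seed set.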
2.
3.
4.
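Drawing on developments in information collection in China and abroad and on research into parallel collection techniques, this paper proposes a structural model for Web information collection based on multithreaded parallelism. The model collects Web pages concurrently using parallel threads, achieving comprehensive, efficient, and flexible information gathering.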
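A minimal sketch of thread-parallel page collection, assuming Python's standard `concurrent.futures` thread pool; the paper's model is not specified at this level of detail, and `crawl_parallel` and its `fetcher` parameter are hypothetical names. The fetcher is pluggable so the concurrency logic can be exercised without network access.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Download one page (real network I/O)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def crawl_parallel(urls, fetcher=fetch, workers=8):
    """Fetch all URLs on a thread pool.

    Returns {url: page bytes, or the exception raised while fetching},
    so one failing page does not abort the whole batch.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetcher, url): url for url in urls}
        for future, url in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:  # record the failure per URL
                results[url] = exc
    return results
```

Threads suit this workload because page collection is dominated by waiting on network I/O rather than CPU work.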
5.
6.
7.
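With the rapid growth of the Internet, the Web has become a vast repository of information. However, most Web data is currently published as HTML, which prevents applications from using this mass of information directly. Web information collection technology emerged to address this problem. This paper examines information collection techniques and, on that basis, implements a Web-based news collection system. Using collection rules that users write as regular expressions, the system quickly and accurately extracts information from target Web pages and saves it in a local database for internal use or publication on the external network.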
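The regular-expression rule idea can be sketched as follows. The rule format (one pattern with named groups per field), the HTML markup it matches, and the `news` table layout are all assumptions for illustration; the paper's actual rule syntax is not given in the abstract.

```python
import re
import sqlite3

# Hypothetical collection rule: named groups become database columns.
RULE = re.compile(
    r'<h2 class="title">(?P<title>.*?)</h2>\s*'
    r'<span class="date">(?P<date>[\d-]+)</span>',
    re.S,
)


def extract(html, rule=RULE):
    """Apply a rule to a page and return one dict per match."""
    return [m.groupdict() for m in rule.finditer(html)]


def save(items, conn):
    """Store extracted items in a local SQLite table."""
    conn.execute("CREATE TABLE IF NOT EXISTS news (title TEXT, date TEXT)")
    conn.executemany(
        "INSERT INTO news (title, date) VALUES (:title, :date)", items
    )
    conn.commit()
```

Keeping the rule as data (a compiled pattern) rather than code is what lets users change what is collected without modifying the collector.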
8.
9.
10.
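Design and Implementation of an Online Teaching Evaluation and Query System for Universities   Total citations: 5, self-citations: 0, citations by others: 5
LI Qingxia (李清霞) 《Computer & Digital Engineering (计算机与数字工程)》2007,35(10):89-91,174
Builds a Web information platform that integrates the collection, statistical analysis, and querying of teaching-evaluation data, enabling paperless teaching evaluation; the system has considerable practical value.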
11.
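Web pages on the Internet contain information resources that are broad in content and varied in form. Automatic Web page classification helps organize and manage this content and speeds up information retrieval. When training a Web page classifier, effectively screening the sample set of pages may improve the classifier's performance. This paper exploits the structural characteristics of HTML documents to screen the sample set based on tags, removing index-type and table-type pages. Experiments show that the method is feasible to some extent.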
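The abstract does not state which tag features identify index-type and table-type pages. The sketch below assumes two simple heuristics: a high ratio of anchor text to total text marks an index page, and a high density of `<td>` cells per thousand text characters marks a table page. The class and function names and both thresholds (`link_ratio`, `td_per_kchar`) are hypothetical and would need tuning on real data.

```python
from html.parser import HTMLParser


class TagStats(HTMLParser):
    """Count tags, visible text, and text that sits inside <a> elements."""

    def __init__(self):
        super().__init__()
        self.counts = {}
        self.text_chars = 0
        self.link_chars = 0
        self._in_a = 0

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1
        if tag == "a":
            self._in_a += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._in_a:
            self._in_a -= 1

    def handle_data(self, data):
        n = len(data.strip())
        self.text_chars += n
        if self._in_a:
            self.link_chars += n


def page_type(html, link_ratio=0.5, td_per_kchar=20):
    """Classify a page as 'index', 'table', or 'content' (assumed heuristics)."""
    stats = TagStats()
    stats.feed(html)
    stats.close()
    text = max(stats.text_chars, 1)  # avoid division by zero on empty pages
    if stats.link_chars / text > link_ratio:
        return "index"
    if stats.counts.get("td", 0) * 1000 / text > td_per_kchar:
        return "table"
    return "content"


def filter_samples(pages):
    """Keep only content pages for classifier training."""
    return [p for p in pages if page_type(p) == "content"]
```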
12.
《Computer Networks and ISDN Systems》1994,25(3):353-360
This paper describes the tools which are being evaluated at the University of Leeds for use by information providers on the World Wide Web. The paper also gives an introduction to the World Wide Web's client/server architecture and the Hypertext Markup Language (HTML). Information is provided on further sources of information which will assist information providers and trainers of information providers. The paper is intended for new information providers on the World Wide Web and for people who are involved in their training.
13.
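ZHOU Zhou 《Digital Community & Smart Home (数字社区&智能家居)》2008,(23)
Studies techniques for retrieving and analyzing Web page content in browsers based on the IE engine (e.g., IE, Maxthon) and on the Gecko engine (e.g., Firefox). Using Visual C++ 6.0 as the platform and building on COM and Microsoft Active Accessibility (MSAA), the paper implements several methods for retrieving page content from both types of browser and compares the strengths and weaknesses of these methods.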
14.
Interactive voice browsers offer an alternative paradigm that enables both sighted and visually impaired users to access the World Wide Web. In addition to the desktop PC, voice browsers afford ubiquitous mobile access to the World Wide Web using a wide range of consumer devices. This technology can facilitate a safe, 'hands-free' browsing environment, which is important both to car drivers and to various mobile and technical professionals. By providing voice-mediated access, information providers can reach a wider audience and leverage existing investment in their World Wide Web content. In this paper we describe the Vox Portal, a scalable VoxML client, and a World Wide Web server-hosted dynamic HTML-to-VoxML converter.
15.
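A Data Fragmentation Strategy for Distributed Databases Based on XML Middleware   Total citations: 6, self-citations: 1, citations by others: 5
Owing to its good data-description capability, XML is widely used for data exchange in database systems. To reduce the difficulty and improve the accuracy of parallel queries, this paper proposes an XML-based middleware model for distributed data exchange, presents the model's architecture and functional definitions, and, building on this middleware, discusses a data fragmentation technique for distributed databases.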
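The abstract does not describe the fragmentation strategy itself. As an illustration only, the sketch below assumes horizontal fragmentation by range predicates on one column, with the fragmentation schema expressed as a small XML document that middleware could read. The `<fragmentation>` element layout, the `student` table, and the site names are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragmentation schema: half-open ranges [low, high) on "id".
SCHEMA = """
<fragmentation table="student" column="id">
  <fragment site="A" low="0" high="1000"/>
  <fragment site="B" low="1000" high="2000"/>
</fragmentation>
"""


def load_schema(xml_text):
    """Parse the schema into (column, [(site, low, high), ...])."""
    root = ET.fromstring(xml_text.strip())
    frags = [
        (f.get("site"), int(f.get("low")), int(f.get("high")))
        for f in root.findall("fragment")
    ]
    return root.get("column"), frags


def fragment(rows, xml_text):
    """Route each row to the single site whose range contains its key."""
    column, frags = load_schema(xml_text)
    sites = {site: [] for site, _, _ in frags}
    for row in rows:
        for site, low, high in frags:
            if low <= row[column] < high:
                sites[site].append(row)
                break
        else:  # no range matched this row
            raise ValueError(f"row not covered by any fragment: {row}")
    return sites
```

Because every row must fall into exactly one range, the union of the fragments reconstructs the table, and a row outside all ranges is reported instead of silently dropped.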
16.
Today's higher education instructors have embarked on a far-ranging detour from the road regularly travelled. Technological enhancements are changing educational approaches by opening new channels for information distribution and instruction. North Dakota State University is taking advantage of such enhancements by bringing the World Wide Web into the classroom with the NDSU World Wide Web Instructional Project. The paper discusses the goals of the project.
17.
This paper describes the implementation of evolutionary techniques for information filtering and collection from the World Wide Web. We consider the problem of building intelligent agents to facilitate a person's search for information on the Web. An intelligent agent has been developed that uses a metagenetic algorithm in order to collect and recommend Web pages that will be interesting to the user. The user's feedback on the agent's recommendations drives the learning process that adapts the user's profile to his/her interests. The software agent utilizes the metagenetic algorithm to explore the search space of user interests. Experimental results are presented in order to demonstrate the suitability of the metagenetic algorithm's approach on the Web.
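The abstract does not specify the metagenetic algorithm, so the sketch below substitutes a plain genetic algorithm to show the feedback loop only. Each individual is a hypothetical term-weight profile; its fitness is the fraction of user-rated pages it classifies correctly (a score above zero means "interesting"). The representation, operators, and parameters are all assumptions.

```python
import random


def score(profile, page):
    """Relevance of a page (term -> frequency) under a profile (term -> weight)."""
    return sum(w * page.get(t, 0) for t, w in profile.items())


def fitness(profile, feedback):
    """Fraction of (page, liked) ratings the profile predicts correctly."""
    hits = sum((score(profile, page) > 0) == liked for page, liked in feedback)
    return hits / len(feedback)


def evolve(terms, feedback, pop_size=30, generations=40, mut_rate=0.2, seed=0):
    """Plain GA (not the paper's metagenetic variant) over weight profiles."""
    rng = random.Random(seed)
    pop = [{t: rng.uniform(-1, 1) for t in terms} for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: fitness(p, feedback), reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection, elitist
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each weight comes from one parent at random.
            child = {t: a[t] if rng.random() < 0.5 else b[t] for t in terms}
            for t in terms:  # Gaussian mutation
                if rng.random() < mut_rate:
                    child[t] += rng.gauss(0, 0.5)
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda p: fitness(p, feedback))
```

New ratings from the user would simply be appended to `feedback` and `evolve` rerun, which is the sense in which feedback drives profile adaptation here.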
18.
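Describes how, faced with the wide variety of corpus material on the World Wide Web, such as text, sound, graphics, and images, the information that is useful to users can be extracted according to their actual needs, thereby effectively separating out the existing corpus material. The paper focuses on the clustering characteristics of Web information, the design of the corpus, and how the corpus works in practice.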
19.
Sougata Mukherjea 《Computer Networks》2000,33(1-6)
With the explosive growth of the World Wide Web, it is becoming increasingly difficult for users to collect and analyze Web pages that are relevant to a particular topic. To address this problem we are developing WTMS, a system for Web topic management. In this paper we explain how the WTMS crawler efficiently collects Web pages for a topic. We also introduce the user interface of the system that integrates several techniques for analyzing the collection. Moreover, we present the various views of the interface that allow navigation through the information space. We highlight several examples to show how the system enables the user to gain useful insights about the collection.
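The abstract does not describe how the WTMS crawler prioritizes pages. A common focused-crawling strategy, assumed here, is best-first search: score each fetched page against the topic, skip off-topic pages, and let outlinks inherit their parent's score as crawl priority. `get_page` is a hypothetical callback returning a page's text and outlinks, and the term-overlap relevance measure is a deliberate simplification.

```python
import heapq


def relevance(text, topic_terms):
    """Fraction of topic terms that occur in the page text."""
    words = set(text.lower().split())
    return len(words & topic_terms) / len(topic_terms)


def focused_crawl(seeds, get_page, topic_terms, max_pages=100, threshold=0.0):
    """Best-first crawl; get_page(url) -> (text, outlinks)."""
    topic_terms = {t.lower() for t in topic_terms}
    frontier = [(-1.0, url) for url in seeds]  # min-heap on negated priority
    heapq.heapify(frontier)
    seen = set(seeds)
    collected = []
    while frontier and len(collected) < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = get_page(url)
        rel = relevance(text, topic_terms)
        if rel <= threshold:
            continue  # off-topic page: do not expand its links
        collected.append((url, rel))
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-rel, link))  # inherit parent's score
    return collected
```

Pruning at off-topic pages is what keeps the crawl focused: links reachable only through irrelevant pages are never fetched.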
20.
《Intelligent Systems, IEEE》2004,19(3):95-97
The World Wide Web Consortium is a group of about 370 international companies working together to develop recommendations, or Web standards, for the World Wide Web. W3C announced final approval of two key semantic Web technologies: the revised RDF and OWL. RDF and OWL are semantic Web standards. These standard formats for data sharing span application, enterprise, and community boundaries, so that parties can share the same information even if they don't share the same software.