Similar Documents
20 similar documents found.
1.
The perception of the visual complexity of World Wide Web (Web) pages is a topic of significant interest. Previous work has examined the relationship between complexity and various aspects of presentation, including font styles, colours and images, but automatically quantifying this dimension of a web page at the level of the document remains a challenge. In this paper we demonstrate that areas of high complexity can be identified by detecting areas, or ‘chunks’, of a web page high in block-level elements. We report a computational algorithm that captures this metric and places web pages in a sequence that shows an 86% correlation with the sequences generated through user judgements of complexity. The work shows that structural aspects of a web page influence how complex a user perceives it to be, and presents a straightforward means of determining complexity through examining the DOM.
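
Though the abstract does not disclose the algorithm itself, the core intuition (dense runs of block-level elements mark visually complex 'chunks') can be sketched in a few lines of Python. The tag set, window size, and scoring below are assumptions for the example, not the authors' published metric.

```python
from html.parser import HTMLParser

BLOCK_TAGS = {"div", "table", "ul", "ol", "p", "form", "blockquote",
              "section", "article", "header", "footer", "nav"}

class BlockTagStream(HTMLParser):
    """Records, for each start tag seen, whether it is block-level."""
    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(1 if tag in BLOCK_TAGS else 0)

def peak_block_density(html: str, window: int = 20) -> int:
    """Max count of block-level tags in any run of `window` consecutive
    start tags -- a crude proxy for the page's most complex 'chunk'."""
    parser = BlockTagStream()
    parser.feed(html)
    e = parser.events
    if len(e) <= window:
        return sum(e)
    best = cur = sum(e[:window])
    for i in range(window, len(e)):
        cur += e[i] - e[i - window]
        best = max(best, cur)
    return best
```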

2.
Real-time database systems involve not only timing constraints on transactions but also timing constraints on data. This paper discusses the temporal characteristics of data in real-time databases, gives definitions of the absolute temporal consistency and relative temporal consistency of data, and, based on the characteristics of temporal data, proposes a real-time relational data model.
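
The two consistency notions named in the abstract are commonly formalised via validity intervals; a minimal sketch under that standard formulation (not necessarily the paper's exact definitions) might be:

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class TemporalDatum:
    value: float
    timestamp: float   # observation time, seconds since the epoch
    avi: float         # absolute validity interval, in seconds

def absolutely_consistent(d: TemporalDatum, now: Optional[float] = None) -> bool:
    """The datum still reflects the external world: its age is within avi."""
    now = time.time() if now is None else now
    return (now - d.timestamp) <= d.avi

def relatively_consistent(items: list, rvi: float) -> bool:
    """Items used together were observed close enough in time (within rvi)."""
    stamps = [d.timestamp for d in items]
    return (max(stamps) - min(stamps)) <= rvi
```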

3.
An XML-enabled data extraction toolkit for web sources (Total citations: 7; self-citations: 0; citations by others: 7)
The amount of useful semi-structured data on the web continues to grow at a stunning pace. Often interesting web data are not in database systems but in HTML pages, XML pages, or text files. Data in these formats are not directly usable by standard SQL-like query processing engines that support sophisticated querying and reporting beyond keyword-based retrieval. Hence, the web users or applications need a smart way of extracting data from these web sources. One of the popular approaches is to write wrappers around the sources, either manually or with software assistance, to bring the web data within the reach of more sophisticated query tools and general mediator-based information integration systems. In this paper, we describe the methodology and the software development of an XML-enabled wrapper construction system—XWRAP for semi-automatic generation of wrapper programs. By XML-enabled we mean that the metadata about information content that are implicit in the original web pages will be extracted and encoded explicitly as XML tags in the wrapped documents. In addition, the query-based content filtering process is performed against the XML documents. The XWRAP wrapper generation framework has three distinct features. First, it explicitly separates tasks of building wrappers that are specific to a web source from the tasks that are repetitive for any source, and uses a component library to provide basic building blocks for wrapper programs. Second, it provides inductive learning algorithms that derive or discover wrapper patterns by reasoning about sample pages or sample specifications. Third and most importantly, we introduce and develop a two-phase code generation framework. The first phase utilizes an interactive interface facility to encode the source-specific metadata knowledge identified by individual wrapper developers as declarative information extraction rules. The second phase combines the information extraction rules generated at the first phase with the XWRAP component library to construct an executable wrapper program for the given web source.
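
As a hedged illustration of the general wrapper idea, in which declarative extraction rules turn implicit page structure into explicit XML tags, a toy Python version might look as follows. The rule format and sample page are invented for the example; XWRAP's actual rule language and two-phase code generation are far richer.

```python
import re
import xml.etree.ElementTree as ET

RULES = [                    # (output XML tag, regex with one capture group)
    ("title",  r"<h1>(.*?)</h1>"),
    ("price",  r"Price:\s*\$([0-9.]+)"),
    ("author", r'<span class="by">(.*?)</span>'),
]

def wrap_page(html: str) -> str:
    """Apply declarative extraction rules, emitting explicit XML tags."""
    record = ET.Element("record")
    for tag, pattern in RULES:
        match = re.search(pattern, html, re.S)
        if match:
            ET.SubElement(record, tag).text = match.group(1).strip()
    return ET.tostring(record, encoding="unicode")

page = '<h1>Web Data Extraction</h1><span class="by">L. Liu</span> Price: $42.00'
print(wrap_page(page))
# -> <record><title>Web Data Extraction</title><price>42.00</price><author>L. Liu</author></record>
```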

4.
The expansion of the World Wide Web (WWW) has created an increasing need for tools capable of supporting WWW authors in composing documents using the HyperText Markup Language (HTML). Currently, most web authors use tools which are basically ordinary text editors with additional features to facilitate the easy and correct use of HTML tags. This approach places the burden on the web author to design and then create the entire web site in a top-down fashion, without any explicit support for the structural design of the site. In this paper we discuss an alternative structural approach to Web authoring, based on the use of the HyperTree hypermedia system as the central authoring tool. The advantages of using HyperTree are twofold. Firstly, web authors can manage a web site as a single complete hypermedia database. For example, HyperTree provides facilities such as the automatic creation of indices and the discovery of link inconsistencies. Additionally, it organizes the web pages into an easy-to-understand hierarchy without using any HTML directly. Secondly, web end-users can benefit from the use of HyperTree, since seeking information in structured web sites is generally less disorienting and incurs less cognitive overhead. ©1997 John Wiley & Sons, Ltd.

5.
This paper examines the extent to which companies in various industries are using the World Wide Web and its associated technologies to conduct retail business. Using the framework of an Electronic Commerce Architecture (ECA), a variety of commercial Web sites are analyzed to determine which commerce processes are being supported online in each industry and each type of industry. The results of this study provide useful insight not only for researchers on the state of Web technology adoption and electronic commerce practices in industry, but also for companies seeking to derive competitive advantage through electronic commerce.

6.
Fragment caching is one of the effective solutions for accelerating the delivery of dynamic web pages, but implementing it requires an effective mechanism for detecting shared fragments. To address this, an efficient shared-fragment detection algorithm is proposed and a fragment-cache-based delivery model for dynamic web pages is introduced. The model automatically identifies shared fragments and effective cache units, eliminating redundant data more thoroughly and improving the cache hit rate. Experiments and analysis show that, compared with the existing schemes ESI and Silo, the model effectively saves bandwidth and shortens the response time of user requests.
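
As an illustration of the shared-fragment idea (not the paper's actual algorithm), one can fingerprint candidate fragments and treat those recurring across pages as cacheable shared fragments; splitting on <div> spans below is a deliberate simplification:

```python
import hashlib
import re
from collections import defaultdict

def fragments(html: str) -> list:
    # crude candidate fragments: <div>...</div> spans
    return re.findall(r"<div.*?</div>", html, re.S)

def shared_fragments(pages: dict, min_pages: int = 2) -> dict:
    """pages maps url -> html; returns fragment hashes seen on >= min_pages pages."""
    seen = defaultdict(set)                      # fragment hash -> page urls
    for url, html in pages.items():
        for frag in fragments(html):
            digest = hashlib.sha1(frag.encode()).hexdigest()
            seen[digest].add(url)
    return {h: urls for h, urls in seen.items() if len(urls) >= min_pages}
```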

7.
Characteristics and implementation methods of Web geographic information systems (Total citations: 12; self-citations: 0; citations by others: 12)
WebGIS, formed by combining the Internet with geographic information systems, is currently a hot spot in GIS development. After pointing out the new characteristics of WebGIS relative to traditional GIS, the article proposes several methods for implementing WebGIS and briefly introduces current WebGIS products in China and abroad.

8.
With the explosive growth of information in the WWW, it is becoming increasingly difficult for the user to find information of interest. Visualisations may be helpful in assisting the users in their information retrieval task. Effective visualisation of the structure of a WWW site is extremely useful for browsing through the site. Visualisation can also be used to augment a WWW search engine when too many or too few results are retrieved. In this paper, we discuss several visualisations we have developed to facilitate information retrieval on the WWW. With VRML becoming the standard for graphics on the Web and efficient VRML browsers becoming available, VRML was used for developing these visualisations. Unique visualisations like focus + context views of WWW nodes and semantic visualisation are presented and examples are given on scenarios where the visualisations are useful.

9.
This work addresses issues related to the design and implementation of focused crawlers. Several variants of state-of-the-art crawlers relying on web page content and link information for estimating the relevance of web pages to a given topic are proposed. Particular emphasis is given to crawlers capable of learning not only the content of relevant pages (as classic crawlers do) but also paths leading to relevant pages. A novel learning crawler inspired by a previously proposed Hidden Markov Model (HMM) crawler is described as well. The crawlers have been implemented using the same baseline implementation (only the priority assignment function differs in each crawler) providing an unbiased evaluation framework for a comparative analysis of their performance. All crawlers achieve their maximum performance when a combination of web page content and (link) anchor text is used for assigning download priorities to web pages. Furthermore, the new HMM crawler improved the performance of the original HMM crawler and also outperforms classic focused crawlers in searching for specialized topics.
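
The shared-baseline design is easy to picture: one crawl loop with a pluggable priority function. A minimal Python sketch follows; the keyword-overlap scoring that mixes page text with anchor text is an assumption standing in for the paper's relevance estimators.

```python
import heapq

def score(topic_terms: set, page_text: str, anchor_text: str) -> float:
    """Toy priority: topic-term overlap, with anchor text weighted higher."""
    words = set(page_text.lower().split())
    anchor = set(anchor_text.lower().split())
    return len(topic_terms & words) + 2 * len(topic_terms & anchor)

def crawl(seeds, fetch, extract_links, topic_terms, budget=1000):
    """fetch(url) -> page text; extract_links(text) -> [(url, anchor_text)]."""
    frontier = [(-1.0, url, "") for url in seeds]   # max-heap via negation
    heapq.heapify(frontier)
    visited = set()
    while frontier and len(visited) < budget:
        _, url, _ = heapq.heappop(frontier)
        if url in visited:
            continue
        visited.add(url)
        text = fetch(url)
        for link, anchor in extract_links(text):
            if link not in visited:
                priority = score(topic_terms, text, anchor)
                heapq.heappush(frontier, (-priority, link, anchor))
    return visited
```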

10.
To date, most of the focus regarding digital preservation has been on replicating copies of the resources to be preserved from the “living web” and placing them in an archive for controlled curation. Once inside an archive, the resources are subject to careful processes of refreshing (making additional copies to new media) and migrating (conversion to new formats and applications). For small numbers of resources of known value, this is a practical and worthwhile approach to digital preservation. However, due to the infrastructure costs (storage, networks, machines) and more importantly the human management costs, this approach is unsuitable for web scale preservation. The result is that difficult decisions need to be made as to what is saved and what is not saved. We provide an overview of our ongoing research projects that focus on using the “web infrastructure” to provide preservation capabilities for web pages and examine the overlap these approaches have with the field of information retrieval. The common characteristic of the projects is they creatively employ the web infrastructure to provide shallow but broad preservation capability for all web pages. These approaches are not intended to replace conventional archiving approaches, but rather they focus on providing at least some form of archival capability for the mass of web pages that may prove to have value in the future. We characterize the preservation approaches by the level of effort required by the web administrator: web sites are reconstructed from the caches of search engines (“lazy preservation”); lexical signatures are used to find the same or similar pages elsewhere on the web (“just-in-time preservation”); resources are pushed to other sites using NNTP newsgroups and SMTP email attachments (“shared infrastructure preservation”); and an Apache module is used to provide OAI-PMH access to MPEG-21 DIDL representations of web pages (“web server enhanced preservation”).
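
Of these, the lexical-signature approach lends itself to a compact sketch: pick the few terms that best characterise a page so that a search engine can rediscover it. The TF-IDF selection below is the standard formulation; the project's exact weighting may differ.

```python
import math
import re
from collections import Counter

def lexical_signature(page_text: str, corpus_df: dict,
                      corpus_size: int, k: int = 5) -> list:
    """Top-k TF-IDF terms of a page; corpus_df maps term -> document frequency."""
    terms = re.findall(r"[a-z]{3,}", page_text.lower())
    tf = Counter(terms)
    def tfidf(t):
        df = corpus_df.get(t, 1)            # unseen terms treated as rare
        return tf[t] * math.log(corpus_size / df)
    return sorted(tf, key=tfidf, reverse=True)[:k]
```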

11.
12.
An appropriate standardized data model is necessary to facilitate electronic publication and analysis of archaeological data on the World Wide Web. A hierarchical 'item-based' model is proposed which can be readily implemented as an Extensible Markup Language (XML) tagging scheme that can represent any kind of archaeological data and deliver it in a cross-platform, standardized fashion to any Web browser. This tagging scheme and the data model it implements permit seamless integration and joint querying of archaeological datasets derived from many different sources.
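
A hedged sketch of what such an 'item-based' hierarchical record could look like, built with Python's standard XML tooling; the element names and site data are illustrative, not the scheme's actual vocabulary.

```python
import xml.etree.ElementTree as ET

# hypothetical site/locus/item hierarchy with one pottery find
site = ET.Element("site", name="Tel Example")
locus = ET.SubElement(site, "locus", id="L101")
item = ET.SubElement(locus, "item", id="P-23", type="pottery")
ET.SubElement(item, "material").text = "ceramic"
ET.SubElement(item, "period").text = "Iron Age II"
ET.SubElement(item, "dimensions", unit="cm").text = "12.5 x 8.0"

print(ET.tostring(site, encoding="unicode"))
```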

13.
Based on an analysis of the main problems of the current Web, this paper introduces the Semantic Web, discusses its advantages and characteristics, and looks ahead to its development prospects.

14.
Mining the hyperlink structure of Web sites (Total citations: 11; self-citations: 0; citations by others: 11)
The WWW is a global information system composed of thousands of Web sites distributed around the world, and each Web site is itself an information (sub)system consisting of many Web pages. Because a document's author can hyperlink it to any known Web page, and the information resources on a Web site are usually contributed by many people, the hyperlinks within a Web site come in all varieties and can carry many different meanings and uses. This article analyses the usage characteristics and regularities of hyperlinks in the WWW, proposes a method for classifying hyperlink types and mining site structure, and gives a preliminary discussion of its applications in information gathering and querying.
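
One plausible way to type intra-site hyperlinks from URL paths alone is sketched below; the paper's classification also draws on usage characteristics that a few lines of code cannot capture.

```python
from urllib.parse import urlparse

def link_type(src: str, dst: str) -> str:
    """Classify a hyperlink by the URL relationship of source and target."""
    s, d = urlparse(src), urlparse(dst)
    if s.netloc != d.netloc:
        return "external"                         # leaves the site
    sp = s.path.rstrip("/").split("/")[:-1]       # directory of source page
    dp = d.path.rstrip("/").split("/")[:-1]
    if dp[:len(sp)] == sp:
        return "downward"                         # into the source's subtree
    if sp[:len(dp)] == dp:
        return "upward"                           # toward an ancestor directory
    return "cross"                                # sideways within the site

print(link_type("http://x.org/a/b/p.html", "http://x.org/a/b/c/q.html"))  # downward
```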

15.
The World Wide Web as Enabling Technology for CSCW: The Case of BSCW (Total citations: 5; self-citations: 0; citations by others: 5)
Despite the growth of interest in the field of CSCW, and the increasingly large number of systems which have been developed, it is still the case that few systems have been adopted for widespread use. This is particularly true for widely-dispersed, cross-organisational working groups, where problems of heterogeneity in computing hardware and software environments inhibit the deployment of CSCW technologies. With a lightweight and extensible client-server architecture, client implementations for all popular computing platforms, and an existing user base numbered in millions, the World Wide Web offers great potential in solving some of these problems to provide an enabling technology for CSCW applications. We illustrate this potential using our work with the BSCW shared workspace system – an extension to the Web architecture which provides basic facilities for collaborative information sharing from unmodified Web browsers. We conclude that despite limitations in the range of applications which can be directly supported, building on the strengths of the Web can give significant benefits in easing the development and deployment of CSCW applications.

16.
The growing popularity of the information superhighway has opened up exciting opportunities for companies looking not only to maintain their current customer base but also to reach new customers. One of the most popular ways to enter cybermarketing has been to establish a home page or Web site on the Internet. Almost two-thirds of Fortune 500 companies currently maintain home pages on the Web. An analysis of the content of corporate home pages provides useful insights. Over four-fifths of the companies display products and services (93.2%) and company overview (86.1%) information. Roughly three-fourths of the companies present interactive feedback (79.3%) and what's new (71.1%). Less than one-third (26.2%) of Fortune 500 companies provide for online business. An analysis of the data also provides valuable insight into future trends in home page usage by large business organizations.

17.
A survey of Deep Web data integration research (Total citations: 24; self-citations: 1; citations by others: 24)
刘伟, 孟小峰, 孟卫一. Chinese Journal of Computers (计算机学报), 2007, 30(9): 1475-1489
With the rapid development of the World Wide Web (WWW), the Deep Web holds a vast and still rapidly growing amount of accessible information, which must be accessed online through query interfaces to the Web databases behind them. Although the Deep Web is rich in information, its heterogeneity and dynamics make exploiting that information effectively a very challenging task. Deep Web data integration is still an emerging research field with a number of open problems; overall, a great deal of research has been carried out, but progress across its various aspects is uneven. This paper proposes a system architecture for Deep Web data integration, uses it to review the state of the art on several key research problems in the field, and offers an in-depth discussion of future research directions.

18.
Archives of software packages made available on the Internet have become an increasingly common and important way of distributing these resources. To improve local access speeds, it is common for these archives to be mirrored, i.e. replicated at regional sites throughout the world. When these sites are also active participants in the augmentation and maintenance of the archive, it becomes necessary to impose a regime which will ensure that errors and inconsistencies do not arise as a result of conflicting activities at different centres. We describe here procedures which have been developed for the organisation and management of a multi-site software archive in which items of software may be introduced or updated at any of the participating sites. A simple algorithm is outlined to propagate changes made to all sites, protecting against conflicting changes and ensuring consistency of the archive is maintained. Similar methods are applicable to the management of other kinds of distributed system, especially internet-based information services, including World Wide Web sites which allow regional updates. © 1998 John Wiley & Sons, Ltd.
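
A toy version of such a propagation regime, in which each item carries a version counter and a site accepts a remote change only if it builds directly on the version it already holds, might look like this; the details are assumptions, not the paper's exact algorithm.

```python
from dataclasses import dataclass

@dataclass
class Item:
    name: str
    version: int
    origin: str       # site where the last change was made

def apply_update(local: dict, update: Item) -> str:
    """Accept an incoming change only if it extends the local version."""
    current = local.get(update.name)
    if current is None or update.version == current.version + 1:
        local[update.name] = update
        return "applied"
    if update.version <= current.version:
        return "stale: this change (or a newer one) is already present"
    return "conflict: an intervening change was missed; resolve manually"
```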

19.
In this paper, we present the results of a research project concerning the temporal management of normative texts in XML format. In particular, four temporal dimensions (publication, validity, efficacy and transaction times) are used to correctly represent the evolution of norms in time and their resulting versioning. Hence, we introduce a multiversion data model based on XML schema and define basic mechanisms for the maintenance and retrieval of multiversion norm texts. Finally, we describe a prototype management system which has been implemented and evaluated.
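
A minimal sketch of multiversion retrieval over two of the four dimensions (validity and transaction time; publication and efficacy are omitted for brevity); the attribute layout is invented for illustration.

```python
from dataclasses import dataclass
from datetime import date

INF = date.max

@dataclass
class NormVersion:
    text: str
    valid_from: date            # validity time: when the norm is in force
    valid_to: date = INF
    tx_from: date = date.min    # transaction time: when this version was recorded
    tx_to: date = INF

def version_at(versions, valid_on: date, known_on: date):
    """The version in force on `valid_on`, as the database knew it on `known_on`."""
    for v in versions:
        if (v.valid_from <= valid_on < v.valid_to
                and v.tx_from <= known_on < v.tx_to):
            return v
    return None
```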

20.
No abstract available for this article.
