Similar Literature
 20 similar documents found; search time: 31 ms
1.
Behind all the fancy tools that churn out volumes of messy HTML, there is a basic set of standardized HTML tags that all professional Webspinners must learn to master. It is important to understand these tags and how they work because, despite recent advances in code-checking tools, debugging HTML often still comes down to hand-tweaking code with simple text editors like NotePad and BBEdit. The paper discusses HTML tables, which are the fundamental building blocks for most of today's Web pages. It considers three basic tags to build a table.
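The three basic tags the abstract refers to can be exercised with a minimal sketch: a small Python parser that walks a hand-written table built only from table, tr, and td tags and recovers its cells (the sample markup and class name here are illustrative, not from the paper).

```python
from html.parser import HTMLParser

# A minimal three-tag table: <table> opens the grid, <tr> starts a row,
# and <td> holds one data cell.
PAGE = "<table><tr><td>A</td><td>B</td></tr><tr><td>C</td><td>D</td></tr></table>"

class TableReader(HTMLParser):
    """Collects table cells row by row, the way a hand-debugger reads markup."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_td = [], None, False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self._row.append(data)

reader = TableReader()
reader.feed(PAGE)
print(reader.rows)  # [['A', 'B'], ['C', 'D']]
```

Reading the cells back confirms the nesting is well-formed, which is exactly what hand-tweaking in a text editor has to preserve.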

2.
Design and Implementation of a Document Conversion Tool   Cited: 2 (self: 0, other: 2)   Free full-text PDF
项湜伍, 曹峰. 《计算机工程》(Computer Engineering), 2008, 34(21): 48-50
To address problems in document development and management, a document format conversion tool based on the DocBook standard is designed and implemented. The tool converts custom Extensible Markup Language (XML) and HTML files into files conforming to the DocBook standard, which are then converted into other formats via XSLT. XML files with custom tags can thus be freely converted into multiple formats, improving the efficiency of document development and management.
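The core of such a tool is a mapping from custom tag names to DocBook element names. A minimal sketch with the standard library follows; the source vocabulary ("doc", "heading", "text") and the mapping are assumptions for illustration, not the tool's actual schema, and a real pipeline would then hand the result to an XSLT processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag map from a custom authoring vocabulary to DocBook elements.
TAG_MAP = {"doc": "article", "heading": "title", "text": "para"}

def to_docbook(elem):
    """Recursively rewrite custom tags into their DocBook equivalents."""
    out = ET.Element(TAG_MAP.get(elem.tag, elem.tag), elem.attrib)
    out.text = elem.text
    for child in elem:
        out.append(to_docbook(child))
    return out

src = ET.fromstring("<doc><heading>Intro</heading><text>Hello</text></doc>")
db = to_docbook(src)
print(ET.tostring(db).decode())
# <article><title>Intro</title><para>Hello</para></article>
```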

3.
To address problems in document development and management, a document format conversion tool based on the DocBook standard is designed and implemented. The tool converts custom Extensible Markup Language (XML) and HTML files into files conforming to the DocBook standard, which are then converted into other formats via XSLT. XML files with custom tags can thus be freely converted into multiple formats, improving the efficiency of document development and management.

4.
Rule-Based Metadata Extraction from HTML Documents   Cited: 2 (self: 0, other: 2)
狄涤, 周竞扬, 潘金贵. 《计算机工程》(Computer Engineering), 2004, 30(9): 85-86, 165
This paper proposes a rule-based method for extracting metadata from HTML documents, describing the syntax and semantics of the rules and the design of the rule base. A prototype system, MEDES (MEtaData Extracting System), is built to extract HTML document metadata automatically. The paper concludes with experimental results, an evaluation, and directions for further work.
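A rule in this style pairs a metadata field with a pattern over the HTML source. The sketch below is a toy stand-in for MEDES's rule language, not its actual syntax; the two regex rules and the sample page are invented for illustration.

```python
import re

# Illustrative extraction rules: each rule maps a metadata field to a
# pattern over the raw HTML (the patterns are assumptions, not MEDES rules).
RULES = {
    "title": re.compile(r"<title>(.*?)</title>", re.I | re.S),
    "author": re.compile(r'<meta\s+name="author"\s+content="(.*?)"', re.I),
}

def extract_metadata(html):
    """Apply each rule in turn and keep the first match per field."""
    found = {}
    for field, pattern in RULES.items():
        m = pattern.search(html)
        if m:
            found[field] = m.group(1).strip()
    return found

page = ('<html><head><title> Rule Demo </title>'
        '<meta name="author" content="Di Di"></head></html>')
print(extract_metadata(page))  # {'title': 'Rule Demo', 'author': 'Di Di'}
```

Separating the rule base from the extraction loop, as the paper does, means new fields need only new rules, not new code.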

5.
The TABLE tags in HTML (Hypertext Markup Language) documents are widely used both for formatting the layout of Web documents and for describing genuine tables with relational information. As a prerequisite for information extraction from the Web, this paper presents an efficient method for sophisticated table detection. The proposed method consists of two phases: preprocessing and attribute–value relations extraction. During preprocessing, some genuine and non-genuine tables are filtered out using a set of rules devised through careful examination of the general characteristics of various HTML tables. The remaining tables are classified in the attribute–value relations extraction phase. Specifically, a value area is extracted and checked for syntactic coherency; the method then looks for semantic coherency between the attribute area and the value area of a table. Experimental results with 11,477 TABLE tags from 1,393 HTML documents show that the method performs better than previous work, achieving a precision of 97.54% and a recall of 99.22%.
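The two-phase idea can be sketched in a few lines: cheap layout rules first, then a syntactic-coherency check on the value area. The rules and thresholds below are toy assumptions in the spirit of the paper, far simpler than its actual rule set.

```python
# Toy two-phase genuine-table check (rules are illustrative assumptions).
def cell_kind(cell):
    """Classify a cell as numeric or text for coherency checking."""
    return "num" if cell.replace(".", "", 1).isdigit() else "text"

def is_genuine_table(rows):
    # Phase 1: rule-based filtering -- one row or one column is layout, not data.
    if len(rows) < 2 or any(len(r) < 2 for r in rows):
        return False
    # Phase 2: value rows below the attribute row should be type-coherent per column.
    for col in range(len(rows[0])):
        kinds = {cell_kind(r[col]) for r in rows[1:]}
        if len(kinds) > 1:
            return False
    return True

layout = [["menu", "search box"]]
genuine = [["city", "population"], ["Seoul", "9.7"], ["Busan", "3.4"]]
print(is_genuine_table(layout), is_genuine_table(genuine))  # False True
```

The paper's semantic-coherency step, matching attribute names against value content, would sit after phase 2 here.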

6.
Social content sites allow ordinary internet users to upload, edit, share, and annotate Web content with freely chosen keywords called tags. However, tags are only useful to the extent that they are processable by users and machines, which is often not the case, since users frequently provide ambiguous and idiosyncratic tags. Consequently, many social content sites are starting to allow users to enrich their tags with semantic metadata; on GeoSocial Content Sites, for example, users can annotate their tags with geographic metadata. But geographic metadata alone unveils only a very specific facet of a tag, which leads to the need for more general-purpose semantic metadata. This paper introduces DYSCS – Do it Yourself Social Content Sites – a platform that combines Web 2.0 and Semantic Web technologies to assist users in creating their own social content sites enriched with geographic and general-purpose semantics. Moreover, DYSCS is highly reusable and interoperable, both consequences of its ontology-driven architecture.

7.
Digital repositories must periodically check the integrity of stored objects to assure users of their correctness. Prior solutions calculate integrity metadata and require the repository to store it alongside the actual data objects. To safeguard and detect damage to this metadata, prior solutions rely on widely visible media (unaffiliated third parties) to store and provide back digests of the metadata to verify it is intact. However, they do not address recovery of the integrity metadata in case of damage or adversarial attack. We introduce IntegrityCatalog, a novel software system that can be integrated into any digital repository. It collects all integrity-related metadata in a single component and treats them as first-class objects, managing both their integrity and their preservation. We introduce a treap-based persistent authenticated dictionary managing arbitrary-length key/value pairs, which we use to store all integrity metadata, accessible simply by object name. Additionally, IntegrityCatalog is a distributed system that includes a network protocol managing both corruption detection and preservation of this metadata, using administrator-selected network peers with two possible roles. Verifiers store and offer attestations on digests and have minimal storage requirements, while preservers efficiently synchronize a complete copy of the catalog to assist in recovery in case of a detected catalog compromise on the local system. We present our approach in developing the prototype implementation, measure its performance experimentally, and demonstrate its effectiveness in real-world situations. We believe the implementation techniques of our open-source IntegrityCatalog will be useful in the construction of next-generation digital repositories.
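The verifier role hinges on one property: a single digest commits to the whole catalog, so any tampering with the stored integrity metadata is detectable. A toy sketch follows; a flat sorted hash stands in for the paper's persistent treap, and the class and names are invented.

```python
import hashlib

# Toy authenticated dictionary: the catalog digest commits to every
# name/digest pair, so a verifier holding only the root detects tampering.
class Catalog:
    def __init__(self):
        self.entries = {}

    def put(self, name, digest):
        self.entries[name] = digest

    def root(self):
        h = hashlib.sha256()
        for name in sorted(self.entries):
            h.update(name.encode() + b"\x00" + self.entries[name].encode() + b"\x00")
        return h.hexdigest()

cat = Catalog()
cat.put("obj1", "aaa")
cat.put("obj2", "bbb")
attested = cat.root()          # digest a verifier peer would store
cat.put("obj2", "ccc")         # simulated corruption of integrity metadata
print(attested != cat.root())  # True -- the change is detectable
```

The real treap additionally gives logarithmic updates and authenticated lookups of individual entries, which a flat hash cannot.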

8.
Most scientific databases consist of datasets (or sources) which in turn include samples (or files) with an identical structure (or schema). In many cases, samples are associated with rich metadata, describing the process that leads to building them (e.g.: the experimental conditions used during sample generation). Metadata are typically used in scientific computations just for the initial data selection; at most, metadata about query results is recovered after executing the query, and associated with its results by post-processing. In this way, a large body of information that could be relevant for interpreting query results goes unused during query processing. In this paper, we present ScQL, a new algebraic relational language, whose operations apply to objects consisting of data–metadata pairs, by preserving such one-to-one correspondence throughout the computation. We formally define each operation and we describe an optimization, called meta-first, that may significantly reduce the query processing overhead by anticipating the use of metadata for selectively loading into the execution environment only those input samples that contribute to the result samples. In ScQL, metadata have the same relevance as data, and contribute to building query results; in this way, the resulting samples are systematically associated with metadata about either the specific input samples involved or about query processing, thereby yielding a new form of metadata provenance. We present many examples of use of ScQL, relative to several application domains, and we demonstrate the effectiveness of the meta-first optimization.
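The meta-first idea, consulting metadata to decide which samples to load at all, can be sketched in plain Python. The sample names, the "tissue" attribute, and the loader callables below are invented stand-ins, not ScQL syntax.

```python
# Meta-first sketch: filter on metadata before loading any sample data,
# instead of loading everything and filtering afterwards (names are invented).
SAMPLES = {
    "s1": {"meta": {"tissue": "liver"}, "load": lambda: [1, 2, 3]},
    "s2": {"meta": {"tissue": "brain"}, "load": lambda: [4, 5]},
    "s3": {"meta": {"tissue": "liver"}, "load": lambda: [6]},
}

def query(tissue):
    """Return (sample, data) pairs, loading only metadata-matching samples."""
    out = []
    for name, s in SAMPLES.items():
        if s["meta"]["tissue"] == tissue:      # metadata consulted first
            out.append((name, s["load"]()))    # data stays paired with its sample
    return out

print(query("liver"))  # [('s1', [1, 2, 3]), ('s3', [6])]
```

Only s1 and s3 are ever loaded; s2's data never enters the execution environment, which is the source of the optimization's savings.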

9.
Effective Retrieval of HTML Documents   Cited: 22 (self: 1, other: 21)
Most resources on the WWW are stored as HTML documents. Unlike plain documents, HTML documents possess a degree of structure thanks to their tags. We adopt a retrieval approach that extends traditional vector-space retrieval, exploiting HTML document structure to improve retrieval effectiveness in the WWW environment. This paper introduces the structure of HTML and the traditional vector-space model of information retrieval, proposes a clustering method for grouping tags, and finally discusses in detail how document structure can extend the term-weighting scheme so that index terms describe documents more accurately, improving retrieval precision.
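Structure-aware weighting means the same term counts more when it appears in a more prominent tag group. A minimal sketch follows; the three tag groups and their weights are illustrative assumptions, not values from the paper.

```python
# Structure-aware term weighting: occurrences inside prominent tags score
# higher (the weights are illustrative, not the paper's values).
TAG_WEIGHT = {"title": 5.0, "h1": 3.0, "body": 1.0}

def score(term, tagged_doc):
    """Sum tag-weighted occurrences of a query term over a parsed document."""
    total = 0.0
    for tag, text in tagged_doc:
        total += TAG_WEIGHT.get(tag, 1.0) * text.lower().split().count(term)
    return total

doc = [("title", "HTML retrieval"), ("body", "retrieval of html documents")]
print(score("retrieval", doc))  # 6.0  (5.0 from title + 1.0 from body)
```

A document mentioning the query term only in its body text would score 1.0 here, so structure, not just frequency, drives the ranking.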

10.
This paper shows how trees can be stored in a very compact form, called 'Bonsai', using hash tables. A method is described that is suitable for large trees that grow monotonically within a predefined maximum size limit. Using it, pointers in any tree can be represented within 6 + ⌈log2 n⌉ bits per node, where n is the maximum number of children a node can have. We first describe a general way of storing trees in hash tables, and then introduce the idea of compact hashing which underlies the Bonsai structure. These two techniques are combined to give a compact representation of trees, and a practical methodology is set out to permit the design of these structures. The new representation is compared with two conventional tree implementations in terms of the storage required per node. Examples of programs that must store large trees within a strict maximum size include those that operate on trie structures derived from natural language text. We describe how the Bonsai technique has been applied to the trees that arise in text compression and adaptive prediction, and include a discussion of the design parameters that work well in practice.
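The "general way of storing trees in hash tables" can be sketched directly: a node is identified by the pair (parent, child-number), so the tree stores no child pointers at all and following an edge is a dictionary lookup. This toy omits the compact-hashing step that gives Bonsai its bit-level savings.

```python
# Hash-table tree storage underlying Bonsai: a node's identity is the pair
# (parent, child-number); no pointer fields are stored anywhere.
tree = {("·", 0): "root"}     # "·" marks the empty parent of the root

def add_child(parent_key, index, label):
    key = (parent_key, index)   # the key *is* the node's address
    tree[key] = label
    return key

root = ("·", 0)
a = add_child(root, 0, "A")     # first child of the root
b = add_child(root, 1, "B")     # second child of the root
aa = add_child(a, 0, "A0")      # first grandchild, reached through A

print(tree[(a, 0)])  # A0 -- the child is found by hashing (parent, index)
```

Since the child index fits in ⌈log2 n⌉ bits, the per-node cost reduces to that index plus a small constant for collision handling, which is where the 6 + ⌈log2 n⌉ figure comes from.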

11.
Automatic Conversion of Web Table Data to XML   Cited: 3 (self: 0, other: 3)   Free full-text PDF
A great deal of information on the Internet is represented in HTML tables, but because HTML does not describe the content of the data, machines cannot understand or query it. This paper exploits HTML table attributes and inserts redundant cells to normalize HTML tables; for tables without marked headers, the header is recognized using quantified formatting information. On this basis, a new method is proposed that automatically converts HTML table data into XML documents by deriving the semantic hierarchy relating table attributes to their values.
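Once the header row is identified, the conversion step is mechanical: each data row becomes an element whose children are named after the header cells. A minimal sketch with the standard library, with invented element names:

```python
import xml.etree.ElementTree as ET

# Attribute/value-to-XML step: header cells become child element names
# (the "table"/"row" element names are invented for illustration).
def table_to_xml(header, rows, record="row", root="table"):
    out = ET.Element(root)
    for row in rows:
        rec = ET.SubElement(out, record)
        for attr, value in zip(header, row):
            ET.SubElement(rec, attr).text = value
    return out

xml = table_to_xml(["name", "price"], [["tea", "3"], ["coffee", "5"]])
print(ET.tostring(xml).decode())
# <table><row><name>tea</name><price>3</price></row><row><name>coffee</name><price>5</price></row></table>
```

The hard part the paper addresses, deciding which cells form the header, happens before this step.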

12.
13.
SVG Linearization and Accessibility   Cited: 1 (self: 0, other: 1)

14.
Provable data possession (PDP) is an important research topic in cloud storage security: its goal is to verify remotely, securely, and efficiently that data stored on cloud servers is intact, without downloading the entire file. Many batch-verification PDP schemes have been proposed, but most do not consider locating the errors once user data is corrupted, and the few batch schemes that do can only identify either the server holding the corrupted data or the user who owns it. This paper proposes using localization tags to help a third-party auditor locate errors quickly. Building on the work of Zhou et al., data localization tags are constructed with a Merkle hash tree, yielding a PDP scheme for the multi-user, multi-server setting that supports batch verification and can, after a failed batch check, quickly locate both the owner and the server of the corrupted data. The scheme is provably secure in the random oracle model, and performance analysis shows that it localizes erroneous data more capably and efficiently than other schemes offering only single-sided localization.
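The localization idea can be shown with a toy: an auditor who keeps per-block digests (one flattened level of a Merkle tree) can name the exact corrupted block after a failed check instead of reporting only "verification failed". The real scheme proves possession without the auditor storing the blocks; this sketch only illustrates the localization step.

```python
import hashlib

def digest(data):
    return hashlib.sha256(data).hexdigest()

# Toy localization: per-block digests recorded at audit setup act as
# leaf-level tags (the actual scheme builds these into a Merkle hash tree).
blocks = [b"block-0", b"block-1", b"block-2"]
tags = [digest(b) for b in blocks]            # stored at audit time

blocks[1] = b"corrupted"                      # simulated server-side damage

bad = [i for i, b in enumerate(blocks) if digest(b) != tags[i]]
print(bad)  # [1] -- the auditor localizes the error to block 1
```

With a full Merkle tree, the auditor follows the one mismatching path from root to leaf, so localization costs logarithmic rather than linear comparisons.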

15.
The aim of this work is to provide a formal characterization of those emotions that deal with normative reasoning, such as shame and the sense of guilt, to understand their relation to rational action, and to ground their formalization in a cognitive-science perspective. In order to do this, we need to identify the factors that constitute the preconditions and trigger the reactions of shame and the sense of guilt in cognitive agents, that is, when agents feel ashamed or guilty and what agents do when they feel so. We will also investigate how agents can induce and silence these feelings in themselves, i.e. the analysis of the defensive strategies they can employ. We will argue that agents do have control over their emotions, and we will analyze some operations they can carry out on them.

16.
To date, long-term preservation approaches have comprised emulation, migration, normalization, and metadata, or some combination of these. Most existing work has focussed on applying these approaches to digital objects of a singular media type: text, HTML, images, video or audio. In this paper, we consider the preservation of composite, mixed-media digital objects, a rapidly growing class of resources. We describe an integrated, flexible system that we have developed, which leverages existing tools and services and assists organizations to dynamically discover the optimum preservation strategy as it is required. The system captures and periodically compares preservation metadata with software and format registries to determine those objects (or sub-objects) at risk. By making preservation software modules available as Web services and describing them semantically using a machine-processable ontology (OWL-S), the most appropriate preservation service(s) for each object (or sub-object) can then be dynamically discovered, composed and invoked by software agents (with optional human input at critical decision-making steps). The PANIC system successfully illustrates how the growing array of available preservation tools and services can be integrated to provide a sustainable, collaborative solution to the long-term preservation of large-scale collections of complex digital objects.

17.
A Web Data Model Based on Labeled Graphs   Cited: 10 (self: 0, other: 10)
This paper explores in detail a new Web data model based on labeled graphs and gives a formal description of it.

18.
In this exabyte-scale era, data increases at an exponential rate, which in turn generates a massive amount of metadata in the file system. Hadoop is the most widely used framework to deal with big data, but due to this growth of metadata, the efficiency of Hadoop has been questioned numerous times by many researchers. Therefore, it is essential to create an efficient and scalable metadata management for Hadoop. Hash-based mapping and subtree partitioning are suitable distributed metadata management schemes. Subtree partitioning does not uniformly distribute workload among the metadata servers, and metadata needs to be migrated to keep the load roughly balanced. Hash-based mapping suffers from a constraint on the locality of metadata, though it uniformly distributes the load among NameNodes, which are the metadata servers of Hadoop. In this paper, we present a circular metadata management mechanism named dynamic circular metadata splitting (DCMS). DCMS preserves metadata locality using consistent hashing and locality-preserving hashing, keeps replicated metadata for excellent reliability, and dynamically distributes metadata among the NameNodes to keep the load balanced. The NameNode is the centralized heart of Hadoop: it keeps the directory tree of all files, and its failure constitutes a single point of failure (SPOF). DCMS removes Hadoop's SPOF and provides efficient and scalable metadata management. The new framework is named 'Dr. Hadoop' after the names of the authors.
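The consistent-hashing building block can be shown in a few lines: metadata keys map to the first node clockwise from their hash on a ring, so adding or removing a node remaps only neighbouring keys. This is a generic sketch, not DCMS itself, and the node names are invented; DCMS additionally layers locality-preserving hashing and replication on top.

```python
import hashlib

# Minimal consistent-hashing ring: keys route to the first node clockwise
# from their hash position (node names are invented).
def h(s):
    return int(hashlib.sha256(s.encode()).hexdigest(), 16) % 1000

NODES = ["namenode-a", "namenode-b", "namenode-c"]
RING = sorted((h(n), n) for n in NODES)

def lookup(path):
    """Route a metadata key to the owning NameNode on the ring."""
    k = h(path)
    for point, node in RING:
        if k <= point:
            return node
    return RING[0][1]   # wrap around past the last ring position

owner = lookup("/user/alice/data.csv")
print(owner in NODES)  # True -- every key lands on exactly one node
```

Plain hashing like this scatters a directory's entries across nodes, which is exactly the locality problem the paper's locality-preserving hash addresses.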

19.
The success of the Web services technology has brought topics such as software reuse and discovery once again onto the agenda of software engineers. While there are several efforts towards automating Web service discovery and composition, many developers still search for services via online Web service repositories and then combine them manually. However, our analysis of these online repositories shows that, unlike traditional software libraries, they rely on little metadata to support service discovery. We believe that the major cause is the difficulty of automatically deriving metadata that would describe rapidly changing Web service collections. In this paper, we discuss the major shortcomings of state-of-the-art Web service repositories and, as a solution, we report on ongoing work and ideas on how to use techniques developed in the context of the Semantic Web (ontology learning, matching, metadata-based presentation) to improve the current situation.

20.
An Object Integration Model and Algebra Based on Rooted Connected Directed Graphs   Cited: 19 (self: 3, other: 19)
王宁, 徐宏炳, 王能斌. 《软件学报》(Journal of Software), 1998, 9(12): 894-898
This paper proposes OIM, a common object data model designed to ease the integration of heterogeneous data sources. It is based on rooted connected directed graphs in which cycles may appear, so it naturally describes both the reference relationships between complex objects and their member objects and the link relationships among HTML files on the WWW. Every object carries a descriptor, which makes the model particularly suitable for data objects that have no explicit schema or whose schema cannot be known in advance. The OIM object algebra provides six operations: union, difference, selection, projection, paste, and cut. It is more flexible than relational algebra and can serve as a formal basis for query decomposition and optimization.
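A rooted connected digraph with cycles, and a selection over it, can be sketched with an adjacency dictionary. The object names, the predicate, and the traversal below are invented for illustration and only hint at one of the six OIM operations.

```python
# Toy rooted connected digraph in the spirit of OIM: objects are vertices,
# references/links are edges, and cycles are allowed (names are invented).
GRAPH = {
    "root": ["page1", "page2"],
    "page1": ["page2"],
    "page2": ["page1"],     # a cycle: two HTML pages linking to each other
}

def select(pred):
    """Selection sketch: keep root-reachable objects satisfying a predicate."""
    seen, stack = set(), ["root"]
    while stack:
        v = stack.pop()
        if v in seen:
            continue            # the visited set makes cycles safe to traverse
        seen.add(v)
        stack.extend(GRAPH.get(v, []))
    return {v for v in seen if pred(v)}

print(sorted(select(lambda v: v.startswith("page"))))  # ['page1', 'page2']
```

The cycle between page1 and page2 is exactly what a tree-shaped model could not represent and what the traversal's visited set handles.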
