Similar literature
20 similar documents found
1.
Kwong, Linus W.; Ng, Yiu-Kai. World Wide Web, 2003, 6(3): 281-303
To retrieve Web documents of interest, most Web users rely on Web search engines. All existing search engines provide a query facility that lets users search for the desired documents using search-engine keywords. However, when a search engine retrieves a long list of Web documents, the user might need to browse through each retrieved document in order to determine which document is of interest. We observe that there are two kinds of problems involved in the retrieval of Web documents: (1) an inappropriate selection of keywords specified by the user; and (2) poor precision in the retrieved Web documents. To solve these problems, we propose an automatic binary-categorization method that is applicable for recognizing multiple-record Web documents of interest, which appear often in advertisement Web pages. Our categorization method uses application ontologies and is based on two information retrieval models, the Vector Space Model (VSM) and the Clustering Model (CM). We analyze and cull Web documents to just those applicable to a particular application ontology. The culling analysis (i) uses CM to find a virtual centroid for the records in a Web document, (ii) computes a vector in a multi-dimensional space for this centroid, and (iii) compares the vector with the predefined ontology vector of the same multi-dimensional space using VSM, where we consider the magnitudes of the vectors as well as the angle between them. Our experimental results show that we have achieved an average of 90% recall and 97% precision in recognizing Web documents belonging to the same category (i.e., domain of interest). Thus our categorization discards very few documents it should have kept and keeps very few it should have discarded.
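
The three culling steps in this abstract (centroid of the records, vector for the centroid, comparison with the ontology vector by angle and magnitude) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the equal-length term-count vectors, and the thresholds `min_cosine` and `min_ratio` are all assumptions made for illustration.

```python
import math


def centroid(record_vectors):
    """Component-wise mean of equal-length record vectors (the 'virtual centroid')."""
    n = len(record_vectors)
    return [sum(column) / n for column in zip(*record_vectors)]


def norm(vector):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vector))


def cosine(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    nu, nv = norm(u), norm(v)
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def magnitude_ratio(u, v):
    """Ratio of the shorter length to the longer one, in [0, 1]."""
    nu, nv = norm(u), norm(v)
    longest = max(nu, nv)
    return 0.0 if longest == 0 else min(nu, nv) / longest


def is_relevant(record_vectors, ontology_vector, min_cosine=0.8, min_ratio=0.5):
    """Binary decision: keep the document only if its centroid is close to the
    ontology vector in both direction and length. Thresholds are hypothetical."""
    c = centroid(record_vectors)
    return (cosine(c, ontology_vector) >= min_cosine
            and magnitude_ratio(c, ontology_vector) >= min_ratio)
```

Checking the magnitude in addition to the angle means that a document whose records mention the right terms only rarely can still be rejected, even when its direction matches the ontology vector.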

2.
With the development of the Semantic Web and Artificial Intelligence techniques, ontology has become a very powerful way of representing not only knowledge but also its semantics. Therefore, how to construct ontologies from existing data sources has become an important research topic. In this paper, an approach for constructing ontologies by mining deep semantics from eXtensible Markup Language (XML) Schemas (including XML Schema 1.0 and XML Schema 1.1) and XML instance documents is proposed. Given an XML Schema and its corresponding XML instance document, 34 rules are first defined to mine deep semantics from the XML Schema. The mined semantics are formally stored in an intermediate conceptual model and then used to generate an ontology at the conceptual level. Further, an ontology population approach at the instance level based on the XML instance document is proposed, so that a complete ontology is formed. Some corresponding core algorithms are also provided. Finally, a prototype system is implemented, which can automatically generate ontologies from XML Schemas and populate ontologies from XML instance documents. The paper also classifies and summarizes the existing work and makes a detailed comparison. Case studies on real XML data sets verify the effectiveness of the approach.
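
To make the idea of schema-to-ontology rules concrete, here is a small Python sketch of one plausible rule. The abstract does not list its 34 rules, so the mapping below (named `xs:complexType` to an OWL class, nested named `xs:element` to a property whose domain is that class) is an assumption, as are the function name and the triple format.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"  # XML Schema namespace in ElementTree form


def schema_to_triples(xsd_text):
    """Apply one illustrative rule and return (subject, predicate, object) triples.

    Assumed rule, not taken from the paper: every named complexType becomes an
    owl:Class, and every named element found anywhere inside it becomes a
    property with that class as its rdfs:domain.
    """
    root = ET.fromstring(xsd_text)
    triples = []
    for ctype in root.iter(XS + "complexType"):
        class_name = ctype.get("name")
        if class_name is None:  # skip anonymous types in this sketch
            continue
        triples.append((class_name, "rdf:type", "owl:Class"))
        for element in ctype.iter(XS + "element"):
            prop_name = element.get("name")
            if prop_name is not None:  # skip elements declared by reference
                triples.append((prop_name, "rdfs:domain", class_name))
    return triples


SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="Book">
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="year" type="xs:gYear"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""
```

Calling `schema_to_triples(SAMPLE_XSD)` yields one class triple for `Book` and one domain triple for each of `title` and `year`. A full approach would need many more rules (cardinality, type derivation, attributes) and a separate population step for instance documents.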

3.
4.
Universal Business Language (UBL) is an OASIS initiative to develop common business document schemas to provide document interoperability in the eBusiness domain. Since the data requirements change according to context, UBL schemas need to be customized, and UBL defines a guideline to be followed for customization of schemas. XSD derivation based customization as proposed by UBL provides syntactic interoperability, that is, an XML parser that can interpret standard UBL documents can also interpret customized UBL documents. We argue that for UBL to become mainstream, syntactic interoperability alone is not enough. It needs to be supported by semantic interoperability, that is, it must be possible for users and even automated processes to discover and reuse customizations provided by other users. In this paper, we describe how to improve the UBL customization mechanism by providing semantic representations for context domains and describe how these semantics can be utilized by automated processes for component discovery and schema customization. For this purpose, we derive ontologies from taxonomies like the North American Industry Classification System (NAICS) and the Universal Standard Products and Services Classification (UNSPSC), and relate corresponding concepts from different ontologies through ontology alignment. Then, we process these aligned ontologies using a reasoner to compute inferred ontologies representing context domains. We show that when custom UBL components are annotated using classes from these ontologies, automated discovery and customization become possible.

5.
In this article, we describe an adaptation proxy we developed as part of the Kontti research project at VTT Information Technology that lets mobile users access Web content that's not directly targeted to mobile user agents. More and more content is now available on the Internet, and there's a growing need for mobile users to be able to access it. The adaptation proxy can adapt Extensible Hypertext Markup Language (XHTML) documents into XHTML mobile profile (XHTML MP) and Wireless Markup Language (WML), and can perform media adaptation. At the system's core is an adaptation framework to which new source and target XML languages can be introduced with relatively little effort.

6.
Increased availability of mobile computing, such as personal digital assistants (PDAs), creates the potential for constant and intelligent access to up-to-date, integrated and detailed information from the Web, regardless of one's actual geographical position. Intelligent question-answering requires the representation of knowledge from various domains, such as the navigational and discourse context of the user, potential user questions, the information provided by Web services and so on, for example in the form of ontologies. Within the context of the SmartWeb project, we have developed a number of domain-specific ontologies that are relevant for mobile and intelligent user interfaces to open-domain question-answering and information services on the Web. To integrate the various domain-specific ontologies, we have developed a foundational ontology, the SmartSUMO ontology, on the basis of the DOLCE and SUMO ontologies. This allows us to combine all the developed ontologies into a single SmartWeb Integrated Ontology (SWIntO) having a common modeling basis with conceptual clarity and the provision of ontology design patterns for modeling consistency. In this paper, we present SWIntO, describe the design choices we made in its construction, illustrate the use of the ontology through a number of applications, and discuss some of the lessons learned from our experiences.

7.
The eXtensible Markup Language (XML) has gained wide acceptance as the standard for representing and exchanging data on the Web. Unfortunately, XML covers the syntactic level but lacks semantics, and thus cannot be directly used for the Semantic Web. Finding a way to utilize XML data for the Semantic Web is currently a challenging research problem. Ontologies can formally represent shared domain knowledge and enable semantic interoperability. Therefore, in this paper, we investigate how to represent and reason about XML with ontologies. First, we give formalized representations of XML data sources, including Document Type Definitions (DTDs), XML Schemas, and XML documents. On this basis, we propose formal approaches for transforming the XML data sources into ontologies, and we also discuss the correctness of the transformations and provide several transformation examples. Furthermore, following the proposed approaches, we implement a prototype tool that can automatically transform XML into ontologies. Finally, we apply the transformed ontologies to reasoning about XML, so that some reasoning problems of XML may be checked by existing ontology reasoners.

8.
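Current Semantic Web-based ontology mapping algorithms search too narrowly for background ontologies and collect ontologies imprecisely. To address this, a virtual-document-based mapping technique is used to extract from WordNet the synonym sets of each concept, turning a search for a single concept into a search over a set of synonymous concepts; this widens the ontology search and retrieves more background ontologies. A dynamic ontology mapping algorithm based on the semantic context is proposed to exclude incorrect background ontologies, making ontology collection more precise. Experimental results show that the algorithm effectively improves the recall and precision of the mapping.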

9.
The development of accessible Web software is complicated for several reasons. Though some of them are technological, the majority are related to the need to compose different and often unrelated design concerns, which may be functional, as in the case of most of the specific application's requirements, or non-functional, such as Accessibility itself. In this paper, we present a novel approach to conceive, design and develop Accessible Web applications in an aspect-oriented manner. In order to reach our goal, we provide some modeling techniques that we specifically developed for handling the non-functional, generic and crosscutting characteristics of the Accessibility concerns. Specifically, we have enriched User Interaction Diagrams with integration points, which are used to reason about and document Accessibility for activity modeling during user interface design. Then, by instantiating a Softgoal Interdependency Graph template with association tables, we work on an abstract interface model (composed of ontology widgets) to obtain a concrete and accessible interface model for the Web application being developed. We use a real application example to illustrate our ideas and point out the advantages of a clear separation of concerns throughout the development life-cycle.

10.
The Ontolingua Server: a tool for collaborative ontology construction   Total citations: 2, self-citations: 0, citations by others: 2
Reusable ontologies are becoming increasingly important for tasks such as information integration, knowledge-level interoperation and knowledge-base development. We have developed a set of tools and services to support the process of achieving consensus on commonly shared ontologies by geographically distributed groups. These tools make use of the World Wide Web to enable wide access and provide users with the ability to publish, browse, create and edit ontologies stored on an ontology server. Users can quickly assemble a new ontology from a library of modules. We discuss how our system was constructed, how it exploits existing protocols and browsing tools, and our experience supporting hundreds of users. We describe applications using our tools to achieve consensus on ontologies and to integrate information. The Ontolingua Server may be accessed through the URL http://ontolingua.stanford.edu

11.
Liu, Mengchi; Ling, Tok Wang. World Wide Web, 2001, 4(1-2): 49-77
Most documents available over the Web conform to the HTML specification. Such documents are hierarchically structured in nature. The existing data models for the Web either fail to capture the hierarchical structure within the documents or can only provide a very low level representation of such hierarchical structure. How to represent and query HTML documents at a higher level is an important issue. In this paper, we first propose a novel conceptual model for HTML. This conceptual model has only a few simple constructs but is able to represent the complex hierarchical structure within HTML documents at a level that is close to human conceptualization/visualization of the documents. We also describe how to convert HTML documents based on this conceptual model. Using the conceptual model and conversion method, one can capture the essence (i.e., semistructure) of HTML documents in a natural and simple way. Based on this conceptual model, we then present a rule-based language to query HTML documents over the Internet. This language provides a simple but very powerful way to query both intra-document structures and inter-document structures and allows the query results to be restructured. Being rule-based, it naturally supports negation and recursion and therefore is more expressive than SQL-based languages. A logical semantics is also provided.

12.
An improved ontology mapping algorithm based on the Semantic Web   Total citations: 1, self-citations: 1, citations by others: 0
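Current Semantic Web-based ontology mapping algorithms search too narrowly for background ontologies and collect ontologies imprecisely. To address this, a virtual-document-based mapping technique is used to extract from WordNet the synonym sets of each concept, turning a search for a single concept into a search over a set of synonymous concepts; this widens the ontology search and retrieves more background ontologies. A dynamic ontology mapping algorithm based on the semantic context is proposed to exclude incorrect background ontologies, making ontology collection more precise. Experimental results show that the algorithm effectively improves the recall and precision of the mapping.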
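
The step of replacing a single-concept search with a search over its synonym set can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's algorithm: the synonym sets are passed in as plain Python sets instead of being read from WordNet, and `ontology_index` (a mapping from a hypothetical ontology name to the concept labels it contains) stands in for whatever ontology search service is actually used.

```python
def expand(concept, synsets):
    """Return the concept together with every term that shares a synonym set with it."""
    terms = {concept}
    for group in synsets:
        if concept in group:
            terms |= group
    return terms


def find_background_ontologies(concept, synsets, ontology_index):
    """Return the names of ontologies that mention the concept or any synonym.

    ontology_index maps an ontology name to the set of concept labels it
    defines; both the structure and the names are hypothetical.
    """
    terms = expand(concept, synsets)
    return sorted(name for name, labels in ontology_index.items() if terms & labels)
```

With synonym sets `[{"car", "automobile", "auto"}]` and an index in which only the label `automobile` occurs, a search for `car` alone finds nothing, while the expanded search finds the ontology. The wider net also brings in more wrong candidates, which is why the abstract pairs it with a context-based filtering step.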

13.
Continuing innovations in pedagogical uses of the Web are consistent with our discipline’s long-standing commitment to the expansion of literacy. Surging interest in multimedia and visual rhetoric emphasizes the importance of the 1999 Web Content Accessibility Guidelines as a tool for instructors seeking to make their Web documents accessible to learners and colleagues who have disabilities. Text-only variants of media-rich sites are not sufficient; on the Web, as on our campuses, separate is not and cannot be equal. Changes in the way we approach designing class Web sites may be necessary to enable all learners to participate equally in the learning community. Accessibility is not a property of the document: It is situated in specific contexts and distributed across multiple agents and artifacts. A Web experience designed to be rich and meaningful for people with disabilities is likely to be rich and meaningful for those without disabilities as well; however, the reverse is not necessarily true.

14.
More people than ever before have access to information through the World Wide Web; information volume and number of users both continue to expand. Traditional search methods based on keywords are not effective, resulting in large lists of documents, many of which are unrelated to users’ needs. One way to improve information retrieval is to associate meaning with users’ queries by using ontologies, knowledge bases that encode a set of concepts about one domain and their relationships. Encoding a knowledge base using a single ontology is usual, but a document collection can deal with different domains, each organized into an ontology. This work presents a novel way to represent and organize knowledge from distinct domains, using multiple ontologies that can be related. The model allows the ontologies, as well as the relationships between concepts from distinct ontologies, to be represented independently. Additionally, fuzzy set theory techniques are employed to deal with knowledge subjectivity and uncertainty. This approach to organizing knowledge and an associated query expansion method are integrated into a fuzzy model for information retrieval based on multi-related ontologies. The performance of a search engine using this model is compared with another fuzzy-based approach for information retrieval, and with the Apache Lucene search engine. Experimental results show that this model improves precision and recall measures.

15.
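Semantic annotation is an important research topic in realizing the Semantic Web, and many existing annotation methods achieve good results. However, almost none of these methods take into account that the knowledge described by an ontology is often sparsely distributed across documents, nor do they make effective use of a document's organizational structure, so they annotate low-quality documents poorly. To address this, an automatic ontology-based semantic annotation method built on sparse coding (Semantic Annotation Method based on Sparse Coding, SAMSC) is proposed. The method first identifies certain semantics in the documents according to the ontology's knowledge descriptions and uses them as initial values; it then iteratively parses the paragraph structure and described topics of the documents to compute the correlation coefficient matrix between ontology knowledge and document resources; finally, it annotates the documents with the ontology by minimizing a loss function over the global document space. Experiments show that the method can automatically and effectively annotate the large number of documents of uneven quality on the Internet, and that it achieves acceptable results on low-quality document resources.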

16.
Lü Feng; Yu Li. 《微机发展》 (Microcomputer Development), 2007, 17(6): 53-55
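This paper introduces three commonly used Web data extraction methods: directly parsing HTML documents, XML-based methods (also known as methods that analyze the HTML hierarchical structure), and methods based on conceptual modeling. It focuses on the XML-based extraction method, whose basic approach is to pass the original HTML document through a filter that checks and corrects its syntactic structure, producing an XML-based XHTML document, and then to process that document with XML tools. This implements the preprocessing step that converts unstructured HTML documents into structured XML documents, creating favorable conditions for applying traditional data extraction methods in Web mining.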
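
The second half of the XML-based pipeline, processing the cleaned document with ordinary XML tools, can be sketched in Python with the standard library. The sketch assumes the filtering step has already produced well-formed XHTML; it does not repair broken HTML itself. The function name, the choice of table rows as the extraction target, and the sample document are all illustrative assumptions.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"  # XHTML namespace in ElementTree form


def extract_rows(xhtml_text):
    """Return the text of every table row as a list of cell strings.

    Assumes xhtml_text is already well-formed XHTML (the output of the
    HTML-correcting filter); malformed input raises ET.ParseError.
    """
    root = ET.fromstring(xhtml_text)
    rows = []
    for tr in root.iter(XHTML_NS + "tr"):
        # itertext() also collects text nested in inline tags such as <b>.
        cells = ["".join(td.itertext()).strip() for td in tr.findall(XHTML_NS + "td")]
        if cells:
            rows.append(cells)
    return rows


SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body><table>
<tr><td>Widget</td><td>9.50</td></tr>
<tr><td>Gadget</td><td><b>12.00</b></td></tr>
</table></body></html>"""
```

Because the input is guaranteed to be well-formed, the extraction code can rely on the element hierarchy instead of pattern matching on raw markup, which is the advantage the abstract attributes to this method.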

17.
18.
Learning domain ontologies for semantic Web service descriptions   Total citations: 1, self-citations: 0, citations by others: 1
High quality domain ontologies are essential for successful employment of semantic Web services. However, their acquisition is difficult and costly, thus hampering the development of this field. In this paper we report on the first stage of research that aims to develop (semi-)automatic ontology learning tools in the context of Web services that can support domain experts in the ontology building task. The goal of this first stage was to get a better understanding of the problem at hand and to determine which techniques might be feasible to use. To this end, we developed a framework for (semi-)automatic ontology learning from textual sources attached to Web services. The framework exploits the fact that these sources are expressed in a specific sublanguage, making them amenable to automatic analysis. We implement two methods in this framework, which differ in the complexity of the employed linguistic analysis. We evaluate the methods in two different domains, verifying the quality of the extracted ontologies against high quality hand-built ontologies of these domains.

Our evaluation led to a set of valuable conclusions on which further work can be based. First, it appears that our method, while tailored for the Web services context, might be applicable across different domains. Second, we concluded that deeper linguistic analysis is likely to lead to better results. Finally, the evaluation metrics indicate that good results can be achieved using only relatively simple, off-the-shelf techniques. Indeed, the novelty of our work is not in the natural language processing methods used but rather in the way they are put together in a generic framework specialized for the context of Web services.


19.
A Flexible Ontology Reasoning Architecture for the Semantic Web   Total citations: 2, self-citations: 0, citations by others: 2
Knowledge-based systems in the semantic Web era can make use of the power of the semantic Web languages and technologies, in particular those related to ontologies. Recent research has shown that user-defined data types are very useful for semantic Web and ontology applications. The W3C semantic Web best practices and development working group has set up a task force to address this issue. Very recently, OWL-Eu and OWL-E, two decidable extensions of the W3C standard ontology language OWL DL, have been proposed to support customized data types and customized data type predicates, respectively. In this paper, we propose a flexible reasoning architecture for these two expressive semantic Web ontology languages and describe our prototype implementation of the reasoning architecture, based on the well-known FaCT DL reasoner, which demonstrates the two key flexibility features of our proposed architecture: 1) it allows users to define their own data types and data type predicates based on built-in ones, and 2) new data type reasoners can be added to the architecture without having to change the concept reasoner.

20.