Similar Documents
20 similar documents found
1.
The distributed nature of the Web, as a decentralized system exchanging information between heterogeneous sources, has underlined the need to manage interoperability, i.e., the ability to automatically interpret information in Web documents exchanged between different sources, necessary for efficient information management and search applications. In this context, XML was introduced as a data representation standard that simplifies the tasks of interoperation and integration among heterogeneous data sources, allowing data to be represented in (semi-)structured documents consisting of hierarchically nested elements and atomic attributes. However, while XML has proven most effective for exchanging data, i.e., for syntactic interoperability, it remains limited when it comes to handling semantics, i.e., semantic interoperability, since it only specifies the syntactic and structural properties of the data without any further semantic meaning. As a result, XML semantic-aware processing has become a motivating challenge in Web data management, requiring dedicated semantic analysis and disambiguation methods to assign well-defined meaning to XML elements and attributes. In this context, most existing approaches: (i) ignore the problem of identifying ambiguous XML elements/nodes, (ii) only partially consider their structural relationships/context, (iii) use syntactic information in processing XML data regardless of the semantics involved, and (iv) are static in adopting fixed disambiguation constraints, thus limiting user involvement. In this paper, we provide a new XML Semantic Disambiguation Framework, titled XSDF, designed to address each of the above limitations, taking as input an XML document and producing as output a semantically augmented XML tree made of unambiguous semantic concepts extracted from a reference machine-readable semantic network. XSDF consists of four main modules for: (i) linguistic pre-processing of simple/compound XML node labels and values, (ii) selecting ambiguous XML nodes as targets for disambiguation, (iii) representing target nodes as special sphere neighborhood vectors including all XML structural relationships within a (user-chosen) range, and (iv) running context vectors through a hybrid disambiguation process combining two approaches, concept-based and context-based disambiguation, allowing the user to tune disambiguation parameters according to her needs. Conducted experiments demonstrate the effectiveness and efficiency of our approach in comparison with alternative methods. We also discuss some practical applications of our method, ranging over semantic-aware query rewriting, semantic document clustering and classification, Mobile and Web services search and discovery, as well as blog analysis and event detection in social networks and tweets.
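As a rough illustration of the context-based half of such a hybrid disambiguation step, the sketch below scores candidate senses of an ambiguous node label against a context vector built from neighboring labels. The sense inventory and glosses are invented stand-ins for a machine-readable semantic network such as WordNet, not the XSDF API.

```python
# Minimal sketch of context-based disambiguation for an XML node label,
# in the spirit of the hybrid approach described above (not the XSDF API).
# SENSES is a stand-in for a machine-readable semantic network.
from collections import Counter
from math import sqrt

SENSES = {  # candidate senses for the ambiguous label "bank"
    "bank#1": "financial institution that accepts deposits and lends money",
    "bank#2": "sloping land beside a body of water river shore",
}

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def disambiguate(context_labels):
    """Pick the sense whose gloss best matches the node's structural context."""
    ctx = Counter(context_labels)
    return max(SENSES, key=lambda s: cosine(ctx, Counter(SENSES[s].split())))

# Context drawn from neighboring element/attribute labels within the chosen range:
print(disambiguate(["river", "water", "shore", "depth"]))  # -> bank#2
```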

2.
We are interested in defining and querying views in a huge and highly heterogeneous XML repository (Web scale). In this context, view definitions are very large, involve many sources, and have no apparent limit on their size. This raises interesting problems that we address in the paper: (i) how to distribute views over several machines without a negative impact on the query translation process; (ii) how to quickly select the relevant part of a view given a query; (iii) how to minimize the cost of communicating potentially large queries to the machines where they will be evaluated. The solution that we propose is based on a simple view definition language that allows for automatic generation of views. The language maps paths in the view's abstract DTD to paths in the concrete source DTDs. It enables a distributed implementation of the view system that is scalable both in terms of data and load. In particular, the query translation algorithm is shown to have good (linear) complexity. Received: November 1, 2001 / Accepted: March 2, 2002 / Published online: September 25, 2002
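A minimal sketch of the path-mapping idea follows, with invented paths and no claim to the paper's actual view definition syntax: abstract-DTD paths map to concrete source paths, and a query path selects only the relevant fragment of the view.

```python
# Illustrative sketch of a path-mapping view definition (invented syntax):
# paths in the view's abstract DTD map to paths in concrete source DTDs.
VIEW = {
    "/catalog/book/title":  ["/srcA/items/item/name", "/srcB/library/doc/heading"],
    "/catalog/book/author": ["/srcA/items/item/creator"],
}

def translate(query_path):
    """Select the relevant part of the view and rewrite an abstract query
    path into per-source concrete paths."""
    return {p: VIEW[p] for p in VIEW if p.startswith(query_path)}

# Only the fragment of a (potentially huge) view relevant to the query is touched:
print(translate("/catalog/book"))
```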

3.
4.
XML documents are extensively used in several applications and evolve over time. Identifying the semantics of these changes becomes a fundamental process in understanding their evolution. Existing approaches to understanding changes (diff) in XML documents focus only on syntactic changes: they compare XML documents based on their structure, without considering the associated semantics. However, for large XML documents that have undergone many changes from one version to the next, a large number of syntactic changes in the document may correspond to fewer semantic changes, which are easier to analyze and understand. For instance, increasing the annual salary and the gross pay, and changing the job title of an employee (three syntactic changes) may mean that this employee was promoted (one semantic change). In this paper, we explore this idea and present the XChange approach. XChange considers the semantics of the changes to calculate the diff of different versions of XML documents. To this end, our approach analyzes the granular syntactic changes in XML attributes and elements, using inference rules to combine them into semantic changes. Thus, differently from existing approaches, XChange uses syntactic changes between versions of an XML document to infer the real reason for the change and to support the process of semantic diff. Results of an experimental study indicate that XChange provides higher effectiveness and efficiency than the (syntactic) state-of-the-art approaches when used to understand changes between versions of XML documents.
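A toy sketch of the rule-based inference idea, using the promotion example from the abstract; the rule format and paths are invented for illustration and are not XChange's actual engine.

```python
# Combine granular syntactic edits into semantic changes via simple rules
# (a sketch of the semantic-diff idea, not XChange's rule engine).
syntactic_changes = [
    ("update", "/employee/salary"),
    ("update", "/employee/gross_pay"),
    ("update", "/employee/job_title"),
]

RULES = [
    # (required set of syntactic edits, inferred semantic change)
    ({("update", "/employee/salary"),
      ("update", "/employee/gross_pay"),
      ("update", "/employee/job_title")}, "employee promoted"),
]

def semantic_diff(changes):
    found = set(changes)
    inferred, used = [], set()
    for required, meaning in RULES:
        if required <= found:  # all constituent edits present
            inferred.append(meaning)
            used |= required
    # Syntactic edits not covered by any rule are reported as-is.
    return inferred + [c for c in changes if c not in used]

print(semantic_diff(syntactic_changes))  # -> ['employee promoted']
```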

5.
This paper first presents the architecture of the Semantic Web, then analyzes how XML, combined with RDF and ontologies, can be used to describe the semantics of Web data, and finally summarizes the paper.

6.
Studying the XML Web: Gathering Statistics from an XML Sample
XML has emerged as the language for exchanging data on the web and has attracted considerable interest both in industry and in academia. Nevertheless, to date, little is known about the XML documents published on the web. This paper presents a comprehensive analysis of a sample of about 200,000 XML documents on the web, and is the first study of its kind. We study the distribution of XML documents across the web in several ways; moreover, we provide a detailed characterization of the structure of real XML documents. Our results provide valuable input to the design of algorithms, tools, and systems that use XML in one form or another.

7.
Supporting concurrent ontology development: Framework, algorithms and tool
We propose a novel approach to facilitate the concurrent development of ontologies by different groups of experts. Our approach adapts Concurrent Versioning, a successful paradigm in software development, to allow several developers to make changes concurrently to an ontology. Conflict detection and resolution are based on novel techniques that take into account the structure and semantics of the ontology versions to be reconciled, by using precisely defined notions of structural and semantic differences between ontologies and by extending state-of-the-art ontology debugging and repair techniques. We also present ContentCVS, a system that implements our approach, and a preliminary empirical evaluation which suggests that our approach is both computationally feasible and useful in practice.

8.
A factor limiting the take-up of Web services is that all tasks associated with creating an application, for example finding, composing, and resolving mismatches between Web services, have to be carried out by a software developer. Semantic Web services, a combination of Semantic Web and Web service technologies, promise to alleviate these problems. In this paper we describe IRS-III, a framework for creating and executing semantic Web services, which takes a semantic broker-based approach to mediating between service requesters and service providers. We describe the overall approach and the components of IRS-III from an ontological and architectural viewpoint. We then illustrate our approach through an application in the eGovernment domain.

9.
周颖  王义发 《微机发展》2006,16(11):125-127
Large volumes of scattered data in diverse forms and formats pose ever greater difficulties for modern data processing. To unify data forms and facilitate data manipulation and processing, this paper discusses converting data of various formats into the unified XML (Extensible Markup Language) format. For files of different formats in a data source, the file structure is described uniformly by a format-specification file according to a predefined XML template; the data are then extracted or further processed, and finally converted to XML for output. The paper describes the method and steps for extracting data from a database and converting it into XML format. The method is simple and practical, and can be generalized to data extraction from all formats.
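A small sketch of template-driven conversion to XML, assuming records have already been parsed into dictionaries; the template and element names below are illustrative rather than the paper's format-specification files.

```python
# Convert tabular records to XML according to a predefined template
# (a sketch of the idea; element names are invented).
import xml.etree.ElementTree as ET

TEMPLATE = {"root": "records", "row": "record"}  # predefined XML template

def to_xml(rows):
    root = ET.Element(TEMPLATE["root"])
    for row in rows:
        rec = ET.SubElement(root, TEMPLATE["row"])
        for field, value in row.items():
            ET.SubElement(rec, field).text = str(value)
    return ET.tostring(root, encoding="unicode")

rows = [{"id": 1, "name": "Zhou"}, {"id": 2, "name": "Wang"}]
print(to_xml(rows))
```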

10.
Semantic Web Mining: State of the art and future directions
Semantic Web Mining aims at combining the two fast-developing research areas Semantic Web and Web Mining. This survey analyzes the convergence of trends from both areas: more and more researchers are working on improving the results of Web Mining by exploiting semantic structures in the Web, and they make use of Web Mining techniques for building the Semantic Web. Last but not least, these techniques can be used for mining the Semantic Web itself. The Semantic Web is the second-generation WWW, enriched by machine-processable information which supports the user in his tasks. Given the enormous size even of today's Web, it is impossible to manually enrich all of these resources. Therefore, automated schemes for learning the relevant information are increasingly being used. Web Mining aims at discovering insights about the meaning of Web resources and their usage. Given the primarily syntactical nature of the data being mined, the discovery of meaning is impossible based on these data alone. Therefore, formalizations of the semantics of Web sites and navigation behavior are becoming more and more common. Furthermore, mining the Semantic Web itself is another upcoming application. We argue that the two areas Web Mining and Semantic Web need each other to fulfill their goals, but that the full potential of this convergence is not yet realized. This paper gives an overview of where the two areas meet today, and sketches ways in which a closer integration could be profitable.

11.
XML has already become the de facto standard for specifying and exchanging data on the Web. However, XML is by nature verbose, and thus XML documents are usually large in size, a factor that hinders its practical usage, since it substantially increases the costs of storing, processing, and exchanging data. In order to tackle this problem, many XML-specific compression systems, such as XMill, XGrind, XMLPPM, and Millau, have recently been proposed. However, these systems usually suffer from two inadequacies: they either sacrifice performance, in terms of compression ratio and execution time, in order to support a limited range of queries, or they perform full decompression prior to processing queries over compressed documents. In this paper, we address the above problems by exploiting the information provided by a Document Type Definition (DTD) associated with an XML document. We show that a DTD is able to facilitate better compression as well as generate more usable compressed data to support querying. We present the architecture of XCQ, a compression and querying tool for handling XML data. XCQ is based on a novel technique we have developed, called DTD Tree and SAX Event Stream Parsing (DSP). The documents compressed by XCQ are stored in Partitioned Path-Based Grouping (PPG) data streams, which are equipped with a Block Statistics Signature (BSS) indexing scheme. The indexed PPG data streams support the processing of XML queries that involve selection and aggregation, without the need for full decompression. In order to study the compression performance of XCQ, we carry out comprehensive experiments over a set of XML benchmark datasets. Wilfred Ng obtained his M.Sc. (Distinction) and Ph.D. degrees from the University of London. His research interests are in the areas of databases and information systems, which include XML data, database query languages, web data management, and data mining. He is now an assistant professor in the Department of Computer Science, the Hong Kong University of Science and Technology (HKUST). Wai-Yeung Lam obtained his M.Phil. degree from the Hong Kong University of Science and Technology (HKUST) in 2003. His research thesis was based on the project "XCQ: A Framework for Querying Compressed XML Data." He is currently working in industry. Peter Wood received his Ph.D. in Computer Science from the University of Toronto in 1989. He previously studied at the University of Cape Town, South Africa, obtaining a B.Sc. degree in 1977 and an M.Sc. degree in Computer Science in 1982. Currently he is a senior lecturer at Birkbeck and a member of the Information Management and Web Technologies research group. His research interests include database and XML query languages, query optimisation, active and deductive rule languages, and graph algorithms. Mark Levene received his Ph.D. in Computer Science in 1990 from Birkbeck College, University of London, having previously been awarded a B.Sc. in Computer Science from Auckland University, New Zealand in 1982. He is currently professor of Computer Science at Birkbeck College, where he is a member of the Information Management and Web Technologies research group. His main research interests are Web search and navigation, Web data mining, and stochastic models for the evolution of the Web. He has published extensively in the areas of database theory and web technologies, and has recently published a book called 'An Introduction to Search Engines and Web Navigation'.

12.
One major challenge in content-based image retrieval (CBIR) and computer vision research is to bridge the so-called "semantic gap" between low-level visual features and high-level semantic concepts, that is, to extract semantic concepts from a large database of images effectively. In this paper, we tackle the problem by mining decisive feature patterns (DFPs). Intuitively, a decisive feature pattern is a combination of low-level feature values that is unique and significant for describing a semantic concept. Algorithms are developed to mine the decisive feature patterns and construct a rule base to automatically recognize semantic concepts in images. A systematic performance study on large image databases containing many semantic concepts shows that our method is more effective than some previously proposed methods. Importantly, our method can be generally applied to any domain of semantic concepts and low-level features. Wei Wang received his Ph.D. degree in Computer Science and Engineering from the State University of New York (SUNY) at Buffalo in 2004, under Dr. Aidong Zhang's supervision. He received the B.Eng. in Electrical Engineering from Xi'an Jiaotong University, China in 1995 and the M.Eng. in Computer Engineering from the National University of Singapore in 2000. He joined Motorola Inc. in 2004, where he is currently a senior research engineer in the Multimedia Research Lab, Motorola Applications Research Center. His research interests can be summarized as developing novel techniques for multimedia data analysis applications. He is particularly interested in multimedia information retrieval, multimedia mining and association, multimedia database systems, multimedia processing, and pattern recognition. He has published 15 research papers in refereed journals, conferences, and workshops, has served on the organization and program committees of the IADIS International Conference e-Society 2005 and 2006, and has been a reviewer for leading academic journals and conferences. In 2005, his research prototype of "seamless content consumption" was awarded the "most innovative research concept of the year" from the Motorola Applications Research Center. Dr. Aidong Zhang received her Ph.D. degree in computer science from Purdue University, West Lafayette, Indiana, in 1994. She was an assistant professor from 1994 to 1999, an associate professor from 1999 to 2002, and has been a professor since 2002 in the Department of Computer Science and Engineering at the State University of New York at Buffalo. Her research interests include bioinformatics, data mining, multimedia systems, content-based image retrieval, and database systems. She has authored over 150 research publications in these areas. Dr. Zhang's research has been funded by NSF, NIH, NIMA, and Xerox. Dr. Zhang serves on the editorial boards of the International Journal of Bioinformatics Research and Applications (IJBRA), ACM Multimedia Systems, the International Journal of Multimedia Tools and Applications, and the International Journal of Distributed and Parallel Databases. She was the editor for ACM SIGMOD DiSC (Digital Symposium Collection) from 2001 to 2003. She was co-chair of the technical program committee for ACM Multimedia 2001. She has also served on various conference program committees. Dr. Zhang is a recipient of the National Science Foundation CAREER Award and the SUNY Chancellor's Research Recognition Award.

13.
The Visual Semantic Web (ViSWeb) is a new paradigm for enhancing the current Semantic Web technology. Based on Object-Process Methodology (OPM), which enables modeling of systems in a single graphic and textual model, ViSWeb provides for representation of knowledge over the Web in a unified way that caters to human perceptions while also being machine processable. The advantages of the ViSWeb approach include equivalent graphic-text knowledge representation, visual navigability, semantic sentence interpretation, specification of system dynamics, and complexity management. Arguing against the claim that humans and machines need to look at different knowledge representation formats, the principles and basics of various graphic and textual knowledge representations are presented and examined as candidates for the ViSWeb foundation. Since OPM is shown to be most adequate for the task, ViSWeb is developed as an OPM-based layer on top of XML/RDF/OWL to express knowledge visually and in natural language. Both the graphic and the textual representations are strictly equivalent. Being intuitive yet formal, they are not only understandable to humans but are also amenable to computer processing. The ability to use such bimodal knowledge representation is potentially a major step forward in the evolution of the Semantic Web. Received: 14 December 2002 / Accepted: 28 November 2003 / Published online: 6 February 2004. Edited by: V. Atluri. Dov Dori: dori@ie.technion.ac.il

14.
Yongtao   《Knowledge》2006,19(8):755-764
A process-planning model (PP model) is proposed to convert geometric features into machining operations and to sequence the machining operations of a part in a feasible and effective order. The process-planning model constructs a feature framework that maps geometric features into machining operations. A semantic net, named the Precedence-Relations-Net, is established to reflect the precedence relationships among the machining operations. Vectors and matrices are employed to construct a mathematical sequencing model. A part is decomposed into several basic geometrical units, namely U1, U2, …, UN. For each unit Ui, two vectors, named Fi and Pi, represent the features and machining operations of Ui. Finally, a matrix named PP is used to memorize the process plan, and a matrix PO (performing objects) represents the objects of the machining operations.
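Under a simplifying assumption, the sequencing step can be pictured as a topological sort over the precedence net; the sketch below uses invented operation names and sidesteps the paper's vector/matrix formulation.

```python
# Sequence machining operations so that every precedence relation in the
# net is respected (a toy stand-in for the paper's sequencing model).
from graphlib import TopologicalSorter

# operation -> set of operations that must be performed before it
precedence = {
    "rough_milling": set(),
    "drilling": {"rough_milling"},
    "finish_milling": {"rough_milling"},
    "tapping": {"drilling"},
}

print(list(TopologicalSorter(precedence).static_order()))
# e.g. ['rough_milling', 'drilling', 'finish_milling', 'tapping']
```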

15.
Recently, access control on XML data has become an important research topic. Previous research on access control mechanisms for XML data has focused on increasing the efficiency of access control itself, but has not addressed the issue of integrating access control with query processing. In this paper, we propose an efficient access control mechanism tightly integrated with query processing for XML databases. We present the novel concept of the dynamic predicate (DP), which represents a dynamically constructed condition during query execution. A DP is derived from instance-level authorizations and constrains accessibility of the elements. The DP allows us to effectively integrate authorization checking into the query plan so that unauthorized elements are excluded in the process of query execution. Experimental results show that the proposed access control mechanism improves query processing time significantly over the state-of-the-art access control mechanisms. We conclude that the DP is highly effective in efficiently checking instance-level authorizations in databases with hierarchical structures.
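A schematic sketch of the dynamic-predicate idea: an authorization-derived condition is folded into the query so unauthorized elements are filtered during execution rather than in a post-processing pass. The XPath strings and authorization table are invented for illustration, not the paper's mechanism.

```python
# Fold an instance-level authorization into a query path as a predicate
# (illustrative sketch; paths and the authorization table are invented).
authorizations = {
    # element path -> condition that must hold for the requesting user
    "/hospital/patient": "@ward='W3'",
}

def rewrite(xpath):
    """Attach the dynamic predicate so filtering happens inside the query plan."""
    for path, pred in authorizations.items():
        if xpath.startswith(path):
            return xpath.replace(path, f"{path}[{pred}]", 1)
    return xpath

print(rewrite("/hospital/patient/diagnosis"))
# -> /hospital/patient[@ward='W3']/diagnosis
```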

16.
17.
This paper describes the POESIA approach to systematic composition of Web services. This pragmatic approach is strongly centered on the use of domain-specific multidimensional ontologies. Inspired by application needs and founded on ontologies, workflows, and activity models, POESIA provides well-defined operations (aggregation, specialization, and instantiation) to support the composition of Web services. POESIA complements current proposals for Web services definition and composition by providing a higher degree of abstraction with verifiable consistency properties. We illustrate the POESIA approach using a concrete application scenario in agroenvironmental planning. Received: 15 December 2002 / Accepted: 16 April 2003 / Published online: 30 September 2003. Edited by: V. Atluri. Renato Fileto: fileto@ic.unicamp.br

18.
A multi-step recognition process is developed for extracting compound forest-cover information from manually produced, scanned historical topographic maps of the 19th century. This information is a unique data source for GIS-based land cover change modeling. Based on salient features in the image, the steps carried out are character recognition, line detection, and structural analysis of forest symbols. Semantic expansion, which assigns meanings to the recognized objects, is applied for final forest cover extraction. The procedure achieved a high accuracy of 94%, indicating the potential for automatic and robust extraction of forest cover over larger areas.

19.
From SHIQ and RDF to OWL: the making of a Web Ontology Language

20.
LUBM: A benchmark for OWL knowledge base systems
We describe our method for benchmarking Semantic Web knowledge base systems with respect to use in large OWL applications. We present the Lehigh University Benchmark (LUBM) as an example of how to design such benchmarks. The LUBM features an ontology for the university domain, synthetic OWL data scalable to an arbitrary size, 14 extensional queries representing a variety of properties, and several performance metrics. The LUBM can be used to evaluate systems with different reasoning capabilities and storage mechanisms. We demonstrate this with an evaluation of two memory-based systems and two systems with persistent storage.
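As a minimal sketch of running one LUBM-style extensional query, the snippet below uses rdflib (an assumption; the paper evaluates dedicated knowledge base systems, not rdflib). The data file name and the exact query text are illustrative; LUBM ships its own ontology, data generator, and 14 queries.

```python
# Evaluate a LUBM-style query over synthetic university data with rdflib
# (illustrative only; file name and query text are stand-ins).
from rdflib import Graph

g = Graph()
g.parse("lubm_university0.owl", format="xml")  # data from the LUBM generator

QUERY = """
PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
SELECT ?student WHERE {
    ?student a ub:GraduateStudent ;
             ub:takesCourse ?course .
}
"""
for row in g.query(QUERY):
    print(row.student)
```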
