Similar Documents
 A total of 20 similar documents were found (search time: 718 ms)
1.
Establishing interschema semantic knowledge between corresponding elements in a cooperating OWL-based multi-information server grid environment requires deep knowledge, not only about the structure of the data represented in each server, but also about the commonly occurring differences in the intended semantics of this data. The same information could be represented in various incompatible structures, and more importantly the same structure could be used to represent data with many diverse and incompatible semantics. In a grid environment, interschema semantic knowledge can only be detected if both the structural and semantic properties of the schemas of the cooperating servers are made explicit and formally represented in a way that a computer system can process. Unfortunately, very often there is a lack of such knowledge, and the schemas of the underlying grid information servers (ISs), being semantically weak as a consequence of the limited expressiveness of traditional data models, do not help the acquisition of this knowledge. The solution to overcome this limitation is primarily to upgrade the semantic level of the IS local schemas through a semantic enrichment process, by augmenting the local schemas of grid ISs to semantically enriched schema models, and then to use these models in detecting and representing correspondences between classes belonging to different schemas. In this paper, we investigate the possibility of using OWL-based domain ontologies both for building semantically rich schema models and for expressing interschema knowledge and reasoning about it. We believe that the use of OWL/RDF in this setting has two important advantages. On the one hand, it enables a semantic approach for interschema knowledge specification, by concentrating on expressing conceptual and semantic correspondences between both the conceptual (intensional) definition and the set of instances (extension) of classes represented in different schemas. On the other hand, it is exactly this semantic nature of our approach that allows us to devise reasoning mechanisms for discovering and reusing interschema knowledge when the need arises to compare and combine it.
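As a rough illustration of this idea (not the authors' system), the sketch below uses Python's rdflib to assert the two kinds of correspondences the abstract distinguishes, intensional equivalence and extensional containment, between classes of two hypothetical server schemas; all namespaces and class names are invented.

```python
# A minimal sketch (not the paper's system): using rdflib to state interschema
# correspondences between classes of two hypothetical grid information servers.
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDFS

S1 = Namespace("http://server1.example.org/schema#")   # hypothetical server schemas
S2 = Namespace("http://server2.example.org/schema#")

g = Graph()
g.bind("s1", S1)
g.bind("s2", S2)

# Intensional correspondence: the two classes denote the same concept.
g.add((S1.Professor, OWL.equivalentClass, S2.FacultyMember))

# Extensional correspondence: every instance of one class is an instance of the other.
g.add((S1.GraduateStudent, RDFS.subClassOf, S2.Student))

print(g.serialize(format="turtle"))
```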

2.
Schema integration aims to create a mediated schema as a unified representation of existing heterogeneous sources sharing a common application domain. These sources have been increasingly written in XML due to its versatility and expressive power. Unfortunately, these sources often use different elements and structures to express the same concepts and relations, thus causing substantial semantic and structural conflicts. Such a challenge impedes the creation of high-quality mediated schemas and has not been adequately addressed by existing integration methods. In this paper, we propose a novel method, named XINTOR, for automating the integration of heterogeneous schemas. Given a set of XML sources and a set of correspondences between the source schemas, our method aims to create a complete and minimal mediated schema: it completely captures all of the concepts and relations in the sources without duplication, provided that the concepts do not overlap. Our contributions are fourfold. First, we resolve structural conflicts inherent in the source schemas. Second, we introduce a new statistics-based measure, called path cohesion, for selecting concepts and relations to be a part of the mediated schema. The path cohesion is statistically computed based on multiple path quality dimensions such as average path length and path frequency. Third, we resolve semantic conflicts by augmenting the semantics of similar concepts with context-dependent information. Finally, we propose a novel double-layered mediated schema to retain a wider range of concepts and relations than existing mediated schemas, which are at best either complete or minimal, but not both. Experiments on both real and synthetic datasets show that XINTOR outperforms existing methods with respect to (i) mediated-schema quality, measured using precision, recall, F-measure, and schema minimality; and (ii) execution performance, based on execution time and scale-up behavior.
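The abstract names the quality dimensions behind path cohesion but not the formula, so the sketch below shows one plausible way such a statistic could combine path frequency and path length; the combination is an assumption for illustration only.

```python
# A hedged sketch of a path-cohesion-style score: the abstract names the quality
# dimensions (path frequency, average path length) but not the exact formula,
# so this combination is an assumption, not XINTOR's measure.
from collections import Counter

def path_cohesion(paths):
    """Score each distinct root-to-node path: frequent, short paths score higher."""
    freq = Counter(tuple(p) for p in paths)
    total = sum(freq.values())
    scores = {}
    for path, count in freq.items():
        frequency = count / total            # how often the path occurs across sources
        brevity = 1.0 / len(path)            # shorter paths are preferred
        scores[path] = frequency * brevity   # assumed combination for illustration
    return scores

# Paths extracted from hypothetical XML sources (element-name sequences).
paths = [["library", "book", "title"], ["library", "book", "title"],
         ["library", "book", "author", "name"]]
print(path_cohesion(paths))
```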

3.
4.
XML and XML Schema are used in the geospatial domain for the definition of standards that enhance the interoperability between producers and consumers of spatial data. The size and complexity of these geospatial standards and their associated schemas have been growing with time, reaching levels of complexity that make it difficult to build systems based on them in a timely and cost-effective manner. The problem of producing XML processing code based on large schemas has traditionally been solved by using XML data binding generators. Unfortunately, this solution is not always effective when code is generated for resource-constrained devices, such as mobile phones. Large and complex schemas often result in the production of code with a large size and a complicated structure that might not fit the device's limitations. In this article we present the instance-based XML data binding approach to produce more compact, application-specific XML processing code for geospatial applications targeted to mobile devices. The approach tries to reduce the size and complexity of the generated code by using information about how schemas are used by individual applications. Our experimental results suggest that schema sets can be significantly simplified to the real needs of client applications, with a substantial reduction in the size of the generated code.
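A minimal sketch of the instance-based idea, under the assumption that the binding generator is driven by which schema elements actually occur in sample documents; the file names are hypothetical.

```python
# A minimal sketch of the instance-based idea: scan sample documents an application
# actually exchanges and record which schema elements occur, so that binding code
# is generated only for that subset. File names are hypothetical.
import xml.etree.ElementTree as ET

def used_elements(sample_files):
    used = set()
    for path in sample_files:
        for elem in ET.parse(path).iter():
            used.add(elem.tag)          # element names actually used
            used.update(elem.attrib)    # attribute names actually used
    return used

# A generator would then emit classes only for the elements in this set,
# instead of for the full geospatial schema.
subset = used_elements(["sample1.gml", "sample2.gml"])
print(sorted(subset))
```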

5.
6.
Data warehouse architectures rely on extraction, transformation and loading (ETL) processes for the creation of an updated, consistent and materialized view of a set of data sources. In this paper, we support these processes by proposing a tool that: (1) allows the semi-automatic definition of inter-attribute semantic mappings, by identifying the parts of the data source schemas which are related to the data warehouse schema, thus supporting the extraction process; and (2) groups semantically related attribute values, thus defining a transformation function for populating the data warehouse with homogeneous values. Our proposal couples and extends the functionalities of two previously developed systems: the MOMIS integration system and the RELEVANT data analysis system. The system has been evaluated in a real scenario concerning the creation of a data warehouse for enterprises working in the beverage and food logistics area. The results showed that the coupled system effectively supports the extraction and transformation processes.
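The abstract does not give RELEVANT's clustering algorithm, so the sketch below uses plain string similarity as a stand-in for the value-grouping step that yields the transformation function.

```python
# A rough stand-in for the value-grouping step (RELEVANT's actual clustering is not
# described in the abstract): group attribute values whose strings are similar, then
# map each value to a representative so the warehouse is loaded with homogeneous values.
from difflib import SequenceMatcher

def group_values(values, threshold=0.8):
    groups = []
    for v in values:
        for g in groups:
            if SequenceMatcher(None, v.lower(), g[0].lower()).ratio() >= threshold:
                g.append(v)
                break
        else:
            groups.append([v])
    return {v: g[0] for g in groups for v in g}  # value -> representative

# Hypothetical source values for a "carrier" attribute.
print(group_values(["DHL Express", "DHL  express", "TNT", "TNT."]))
```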

7.
Information sources such as relational databases, spreadsheets, XML, JSON, and Web APIs contain a tremendous amount of structured data that can be leveraged to build and augment knowledge graphs. However, they rarely provide a semantic model to describe their contents. Semantic models of data sources represent the implicit meaning of the data by specifying the concepts and the relationships within the data. Such models are the key ingredients to automatically publish the data into knowledge graphs. Manually modeling the semantics of data sources requires significant effort and expertise, and although desirable, building these models automatically is a challenging problem. Most of the related work focuses on semantic annotation of the data fields (source attributes). However, constructing a semantic model that explicitly describes the relationships between the attributes in addition to their semantic types is critical. We present a novel approach that exploits the knowledge from a domain ontology and the semantic models of previously modeled sources to automatically learn a rich semantic model for a new source. This model represents the semantics of the new source in terms of the concepts and relationships defined by the domain ontology. Given some sample data from the new source, we leverage the knowledge in the domain ontology and the known semantic models to construct a weighted graph that represents the space of plausible semantic models for the new source. Then, we compute the top k candidate semantic models and suggest to the user a ranked list of the semantic models for the new source. The approach takes into account user corrections to learn more accurate semantic models on future data sources. Our evaluation shows that our method generates expressive semantic models for data sources and services with minimal user input. These precise models make it possible to automatically integrate the data across sources and provide rich support for source discovery and service composition. They also make it possible to automatically publish semantic data into knowledge graphs.
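As a toy illustration of the ranking step (not the paper's graph algorithm), the sketch below assumes each source attribute has weighted candidate annotations and ranks whole candidate models by total weight; all names and weights are invented.

```python
# A toy sketch of the ranking step, not the paper's algorithm: each source attribute
# has candidate ontology annotations with weights learned from known models;
# enumerate combinations and rank candidate semantic models by total weight.
from itertools import product

candidates = {   # hypothetical attribute -> [(ontology mapping, weight), ...]
    "name":  [("Person.fullName", 0.9), ("Organization.name", 0.4)],
    "birth": [("Person.birthDate", 0.8), ("Event.date", 0.3)],
}

def top_k_models(candidates, k=3):
    attrs = sorted(candidates)
    models = []
    for combo in product(*(candidates[a] for a in attrs)):
        score = sum(w for _, w in combo)
        models.append((score, dict(zip(attrs, (m for m, _ in combo)))))
    return sorted(models, key=lambda x: -x[0])[:k]

for score, model in top_k_models(candidates):
    print(round(score, 2), model)
```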

8.
The proliferation of mobile devices coupled with Internet access is generating a tremendous amount of highly personal and sensitive data. Applications such as location-based services and quantified self harness such data to bring meaningful context to users’ behavior. As social applications are becoming prevalent, there is a trend for users to share their mobile data. The nature of online social networking poses new challenges for controlling access to private data, as compared to traditional enterprise systems. First, the user may have a large number of friends, each associated with a unique access policy. Second, the access control policies must be dynamic and fine-grained, i.e. they are content-based, as opposed to all-or-nothing. In this paper, we investigate the challenges in sharing of mobile data in social applications. We design and evaluate a middleware running on Google App Engine, named Mosco, that manages and facilitates sharing of mobile data in a privacy-preserving manner. We use Mosco to develop a location sharing and a health monitoring application. Mosco helps shorten the development process. Finally, we perform benchmarking experiments with Mosco, the results of which indicate small overhead and high scalability.
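A minimal sketch of content-based, per-friend sharing in the spirit the abstract describes; the policy vocabulary (exact/city/deny) is invented for illustration and is not Mosco's API.

```python
# A minimal sketch of content-based (rather than all-or-nothing) sharing, in the
# spirit described in the abstract; the policy model here is invented for illustration.
def apply_policy(reading, policy):
    """Return the view of a location reading that a given friend may see."""
    if policy == "deny":
        return None
    if policy == "city":                      # degrade precision instead of denying
        return {"city": reading["city"]}
    if policy == "exact":
        return reading
    raise ValueError(policy)

reading = {"lat": 1.3521, "lon": 103.8198, "city": "Singapore"}
policies = {"alice": "exact", "bob": "city", "carol": "deny"}   # per-friend policies
for friend, p in policies.items():
    print(friend, "->", apply_policy(reading, p))
```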

9.
The Semantic Web envisions a World Wide Web in which data is described with rich semantics and applications can pose complex queries. To this point, researchers have defined new languages for specifying meanings for concepts and developed techniques for reasoning about them, using RDF as the data model. To flourish, the Semantic Web needs to provide interoperability—both between sites with different terminologies and with existing data and the applications operating on them. To achieve this, we are faced with two problems. First, most of the world’s data is available not in RDF but in XML; XML and the applications consuming it rely not only on the domain structure of the data, but also on its document structure. Hence, to provide interoperability between such sources, we must map between both their domain structures and their document structures. Second, data management practitioners often prefer to exchange data through local point-to-point data translations, rather than mapping to common mediated schemas or ontologies. This paper describes the Piazza system, which addresses these challenges. Piazza offers a language for mediating between data sources on the Semantic Web, and it maps both the domain structure and the document structure. Piazza also enables interoperation of XML data with RDF data that is accompanied by rich OWL ontologies. Mappings in Piazza are provided at a local scale between small sets of nodes, and our query answering algorithm is able to chain sets of mappings together to obtain relevant data from across the Piazza network. We also describe an implemented scenario in Piazza and the lessons we learned from it.
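As a hedged illustration of mapping chaining (not Piazza's query answering algorithm), the sketch below composes point-to-point translations along a path found by breadth-first search; the nodes and renamings are toy assumptions.

```python
# A hedged sketch of the "chaining" idea: mappings exist only between neighboring
# nodes, so answering a query means composing mappings along a path through the
# network. Nodes and mappings here are hypothetical toy renamings.
from collections import deque

# mapping graph: (src, dst) -> function translating a query term from src to dst
mappings = {
    ("A", "B"): lambda t: t.replace("author", "creator"),
    ("B", "C"): lambda t: t.replace("creator", "dc:creator"),
}
edges = {}
for (s, d), f in mappings.items():
    edges.setdefault(s, []).append((d, f))

def translate(term, src, dst):
    """BFS over the mapping graph, composing translations along the way."""
    queue, seen = deque([(src, term)]), {src}
    while queue:
        node, t = queue.popleft()
        if node == dst:
            return t
        for nxt, f in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, f(t)))
    return None

print(translate("author", "A", "C"))   # -> "dc:creator"
```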

10.
The Standard Data Access Interface (SDAI) of STEP provides applications with an interface for accessing STEP data that is independent of the underlying data storage. SDAI allows access to different data storage systems and supports bindings to different implementation languages; implementing SDAI separately for every implementation language and data storage system would require an enormous amount of work. This paper chooses relational databases as the SDAI data storage system and C as the implementation language, making the system independent of any particular relational database system and providing a unified development platform for bindings to different languages, thereby improving the system's openness and extensibility and supplying the core operations for applying STEP to enterprise information integration. The main problem the implementation must solve is the mismatch in schema representation among the STEP modeling language EXPRESS, relational databases, and the SDAI implementation language. The paper explains the matching process between these schemas during system implementation, covering the data dictionary and the storage and access of STEP data.
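As an assumption-level illustration of the schema mismatch the paper addresses, the sketch below flattens a hypothetical EXPRESS entity into a relational table; the type map and the oid column are invented for the example.

```python
# An illustrative sketch (not the paper's implementation) of the schema mismatch the
# paper addresses: an EXPRESS entity must be flattened into a relational table before
# SDAI calls can read or write it through a relational database.
express_entity = {            # hypothetical EXPRESS entity, parsed into a dict
    "name": "point",
    "attributes": [("x", "REAL"), ("y", "REAL"), ("z", "REAL")],
}

TYPE_MAP = {"REAL": "DOUBLE PRECISION", "INTEGER": "INTEGER", "STRING": "VARCHAR(255)"}

def entity_to_ddl(entity):
    cols = ",\n  ".join(f"{a} {TYPE_MAP[t]}" for a, t in entity["attributes"])
    return (f"CREATE TABLE {entity['name']} (\n"
            f"  oid INTEGER PRIMARY KEY,  -- instance identifier used by SDAI\n"
            f"  {cols}\n);")

print(entity_to_ddl(express_entity))
```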

11.
Start making sense: The Chatty Web approach for global semantic agreements
This paper describes a novel approach for obtaining semantic interoperability in a bottom–up, semi-automatic manner without relying on pre-existing, global semantic models. We assume that large amounts of data exist that have been organized and annotated according to local schemas. Seeing semantics as a form of agreement, our approach enables the participating data sources to incrementally develop global agreements in an evolutionary and completely decentralized process that solely relies on pair-wise, local interactions.
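One concrete way pair-wise mappings can yield a global agreement signal is cycle checking: compose local translations around a loop of peers and see whether a term survives the round trip. The sketch below illustrates this with invented peers and mappings; it is not the Chatty Web implementation.

```python
# A hedged sketch of deriving a global quality signal from purely local mappings:
# compose pair-wise attribute translations around a cycle of peers and check whether
# a term survives the round trip. The peers and mappings are toy assumptions.
cycle = [
    {"title": "name"},      # peer1 -> peer2
    {"name": "label"},      # peer2 -> peer3
    {"label": "title"},     # peer3 -> peer1 (closing the cycle)
]

def round_trip(term, cycle):
    for mapping in cycle:
        term = mapping.get(term)        # None marks a translation gap
        if term is None:
            return None
    return term

start = "title"
print("agreement" if round_trip(start, cycle) == start else "disagreement")
```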

12.
The Semantic Web is the next step of the current Web, where information will become more machine-understandable to support effective data discovery and integration. Hierarchical schemas, either in the form of tree-like structures (e.g., DTDs, XML schemas) or in the form of hierarchies on a category/subcategory basis (e.g., thematic hierarchies of portal catalogs), play an important role in this task. They are used to semantically enrich the available information. Up to now, hierarchical schemas have mostly been treated as sets of individual elements, acting as semantic guides for browsing or querying data. Under that view, queries like “find the part of a portal catalog which is not present in another catalog” can be answered only in a procedural way, specifying which nodes to select and how to get them. For this reason, we argue that hierarchical schemas should be treated as full-fledged objects so as to allow for their manipulation. This work proposes models and operators to manipulate the structural information of hierarchies, considering them as first-class citizens. First, we explore the algebraic properties of trees representing hierarchies and define a lattice algebraic structure on them. Then, turning this structure into a boolean algebra, we present the operators S-union, S-intersection and S-difference to support structural manipulation of hierarchies. These operators have certain algebraic properties that provide clear semantics and assist the transformation, simplification and optimization of sequences of operations using laws similar to those of set theory. Also, we identify the conditions under which this framework is applicable. Finally, we demonstrate an application of our framework for manipulating hierarchical schemas on tree-like hierarchies encoded as RDF/S files.
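A minimal sketch of set-like structural operators, under the simplifying assumption that a hierarchy is fully described by its set of root-to-node paths (the paper's lattice and boolean-algebra construction is richer, and a path-set difference need not itself form a tree).

```python
# A minimal sketch of set-like structural operators on hierarchies, assuming a
# hierarchy is fully described by its set of root-to-node paths; the paper's
# boolean-algebra construction is richer than this toy.
def paths(tree, prefix=()):
    """Flatten a nested-dict hierarchy into its set of root paths."""
    out = set()
    for node, sub in tree.items():
        p = prefix + (node,)
        out.add(p)
        out |= paths(sub, p)
    return out

a = {"Arts": {"Music": {}, "Cinema": {}}}   # two toy portal catalogs
b = {"Arts": {"Music": {}, "Theatre": {}}}

s_union        = paths(a) | paths(b)
s_intersection = paths(a) & paths(b)
s_difference   = paths(a) - paths(b)        # the part of catalog a absent from b

print(sorted(s_difference))                 # [('Arts', 'Cinema')]
```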

13.
Collaboration technologies must support information sharing between collaborators, but must also take care not to share too much information or share information too widely. Systems that share information without requiring an explicit action by a user to initiate the sharing must be particularly cautious in this respect. Presence systems are an emerging class of applications that support collaboration. Through the use of pervasive sensors, these systems estimate user location, activities, and available communication channels. Because such presence data are sensitive, to achieve wide-spread adoption, sharing models must reflect the privacy and sharing preferences of their users. This paper looks at the role that privacy-preserving aggregation can play in addressing certain user sharing and privacy concerns with respect to presence data. We define conditions to achieve CollaPSE (Collaboration Presence Sharing Encryption) security, in which (i) an individual has full access to her own data, (ii) a third party performs computation on the data without learning anything about the data values, and (iii) people with special privileges called “analysts” can learn statistical information about groups of individuals, but nothing about the individual values contributing to the statistic other than what can be deduced from the statistic. More specifically, analysts can decrypt aggregates without being able to decrypt the individual values contributing to the aggregate. Based in part on studies we carried out that illustrate the need for the conditions encapsulated by CollaPSE security, we designed and implemented a family of CollaPSE protocols. We analyze their security, discuss efficiency tradeoffs, describe extensions, and review more recent privacy-preserving aggregation work.
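The following toy (explicitly not a CollaPSE protocol) shows the property the abstract describes, aggregates that are recoverable while individual contributions stay hidden, via zero-sum masking; in a real scheme the analyst would hold a decryption key rather than the aggregator computing the sum in the clear.

```python
# A toy illustration (NOT the CollaPSE protocol) of the property the abstract
# describes: an untrusted party combines blinded values, and only the sum is
# recoverable. Here each user's mask is a share of a secret that sums to zero.
import random

MOD = 2**61 - 1   # arithmetic modulo a large prime

def mask_values(values):
    masks = [random.randrange(MOD) for _ in values]
    masks[-1] = (-sum(masks[:-1])) % MOD          # masks sum to zero mod MOD
    return [(v + m) % MOD for v, m in zip(values, masks)]

presence_minutes = [30, 45, 10]                   # each user's private value
blinded = mask_values(presence_minutes)           # what the third party sees
aggregate = sum(blinded) % MOD                    # masks cancel in the sum

print(blinded)          # individually meaningless to the aggregator
print(aggregate)        # 85: the statistic an analyst learns
```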

14.
In recent years, the spectral clustering method has gained attention because of its superior performance. To the best of our knowledge, existing spectral clustering algorithms cannot incrementally update the clustering results given a small change of the data set. However, the capability of incremental updating is essential to some applications, such as the Web or the blogosphere. Unlike traditional stream data, these applications require incremental algorithms to handle not only insertion/deletion of data points but also similarity changes between existing points. In this paper, we extend standard spectral clustering to such evolving data by introducing the incidence vector/matrix to represent two kinds of dynamics in the same framework and by incrementally updating the eigen-system. Our incremental algorithm, initialized by a standard spectral clustering, continuously and efficiently updates the eigenvalue system and generates instant cluster labels as the data set evolves. The algorithm is applied to a blog data set. Compared with recomputation of the solution by standard spectral clustering, it achieves similar accuracy but with much lower computational cost. It can discover not only the stable blog communities but also the evolution of individual multi-topic blogs. The core technique of incrementally updating the eigenvalue system is a general algorithm with a wide range of applications beyond incremental spectral clustering wherever dynamic graphs are involved. This demonstrates the wide applicability of our incremental algorithm.
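For context, the sketch below implements the standard spectral clustering that initializes the incremental algorithm; the incremental eigen-update itself is the paper's contribution and is omitted. The similarity matrix is a toy stand-in for a blog graph.

```python
# A sketch of the standard spectral clustering used to initialize the incremental
# algorithm (the incremental eigen-update is the paper's contribution and is omitted
# here). Assumes numpy and scikit-learn are available.
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(W, k):
    """W: symmetric similarity matrix; k: number of clusters."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt   # normalized Laplacian
    vals, vecs = np.linalg.eigh(L)
    U = vecs[:, :k]                                    # k smallest eigenvectors
    U /= np.linalg.norm(U, axis=1, keepdims=True)      # row-normalize
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)

# Two dense groups joined weakly: a toy stand-in for a blog similarity graph.
W = np.array([[0, 1, 1, 0.01, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [0.01, 0, 0, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
print(spectral_clustering(W, 2))
```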

15.
We revisit the age-old problem of estimating the parameters of a distribution from its observations. Traditionally, scientists and statisticians have attempted to obtain strong estimates by ‘extracting’ the information contained in the observations taken as a set. However, generally speaking, the information contained in the sequence in which the observations have appeared has been ignored, except to capture dependence information as in the case of Markov models and n-gram statistics. In this paper, we present results which, to the best of our knowledge, are the first reported results that consider how estimation can be enhanced by utilizing both the information in the observations and in their sequence of appearance. The strategy, known as sequence based estimation (SBE), works as follows. We first quickly allude to the results pertaining to computing the maximum likelihood estimates (MLE) of the data when the samples are taken individually. We then derive the corresponding MLE results when the samples are taken two-at-a-time, and then extend these for the cases when they are processed three-at-a-time, four-at-a-time, etc. In each case, we also experimentally demonstrate the convergence of the corresponding estimates. We then suggest various avenues for future research, including those by which these estimates can be fused to yield a superior overall cumulative estimate of the parameter of the distribution, in pattern recognition (PR), and in other internet and compression applications. We believe that our new estimates have great potential for practitioners, especially when the cardinality of the observation set is small.
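A numeric toy in the spirit of the abstract (the paper's estimators are more general): estimate a Bernoulli parameter from observations taken singly and from consecutive pairs, and observe that both converge.

```python
# A numeric toy in the spirit of the abstract (the paper's SBE estimators are more
# general): estimate a Bernoulli parameter from the observations taken singly, and
# from the counts of consecutive pairs, and watch both converge.
import random

random.seed(0)
p_true = 0.3
x = [1 if random.random() < p_true else 0 for _ in range(10_000)]

# Single-observation MLE: the sample mean.
mle_single = sum(x) / len(x)

# Pair-based estimate: take consecutive, non-overlapping pairs; the MLE of p is
# then the fraction of ones among all pair members.
pairs = list(zip(x[0::2], x[1::2]))
ones_in_pairs = sum(a + b for a, b in pairs)
mle_pairs = ones_in_pairs / (2 * len(pairs))

print(mle_single, mle_pairs)   # both approach 0.3
```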

16.
The availability of large amounts of open, distributed, and structured semantic data on the web has no precedent in the history of computer science. In recent years, there have been important advances in semantic search and question answering over RDF data. In particular, natural language interfaces to online semantic data have the advantage that they can exploit the expressive power of Semantic Web data models and query languages, while at the same time hiding their complexity from the user. However, despite the increasing interest in this area, there are no evaluations so far that systematically evaluate this kind of system, in contrast to traditional question answering and search interfaces to document spaces. To address this gap, we have set up a series of evaluation challenges for question answering over linked data. The main goal of the challenge was to get insight into the strengths, capabilities, and current shortcomings of question answering systems as interfaces to query linked data sources, as well as benchmarking how these interaction paradigms can deal with the fact that the amount of RDF data available on the web is very large and heterogeneous with respect to the vocabularies and schemas used. Here, we report on the results from the first and second of such evaluation campaigns. We also discuss how the second evaluation addressed some of the issues and limitations which arose from the first one, as well as the open issues to be addressed in future competitions.

17.
18.
19.
With the growing adoption of Building Information Modeling (BIM), specialized applications have been developed to perform domain-specific analyses. These applications need tailored information with respect to a BIM model element’s attributes and relationships. In particular, architectural elements need further qualification concerning their geometric and functional ‘subtypes’ to support exact simulations and compliance checks. BIM and its underlying data schema, the Industry Foundation Classes (IFC), provide a rich representation with which to exchange semantic entity and relationship data. However, subtypes for individual elements are not represented by default and often require manual designation, leaving the process vulnerable to errors and omissions. Existing research to enrich the semantics of IFC model entities employed domain-specific rule sets that scrutinize their legitimacy and modify them, if and when necessary. However, such an approach is limited in its scalability and comprehensibility. This study explored the use of 3D geometric deep neural networks originating from computer vision research. Specifically, Multi-view CNN (MVCNN) and PointNet were investigated to determine their applicability in extracting unique features of door (IfcDoor) and wall (IfcWall) element subtypes, and in turn be leveraged to automate subtype classification. Test results indicated that MVCNN had the best prediction performance, while PointNet’s accuracy was hampered by resolution loss due to selective use of point cloud data. The research confirmed deep neural networks as a viable solution for distinguishing BIM element subtypes, the critical factor being their ability to detect subtle differences in local geometries.
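A schematic MVCNN-style classifier, offered as an assumption-level sketch rather than the authors' network: a shared CNN encodes each rendered view, a max over views pools them, and a linear head predicts the subtype. Requires PyTorch; the sizes and subtype count are invented.

```python
# A schematic MVCNN-style classifier (an assumption-level sketch, not the authors'
# network): a shared CNN encodes each rendered view of a BIM element, a max over
# views pools them, and a linear head predicts the subtype. Requires PyTorch.
import torch
import torch.nn as nn

class TinyMVCNN(nn.Module):
    def __init__(self, num_subtypes):
        super().__init__()
        self.encoder = nn.Sequential(            # shared across all views
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        self.head = nn.Linear(8 * 4 * 4, num_subtypes)

    def forward(self, views):                    # views: (batch, n_views, 1, H, W)
        b, v = views.shape[:2]
        feats = self.encoder(views.flatten(0, 1)).view(b, v, -1)
        pooled, _ = feats.max(dim=1)             # view pooling
        return self.head(pooled)

# 12 rendered views per element (e.g., door images); 3 hypothetical door subtypes.
model = TinyMVCNN(num_subtypes=3)
logits = model(torch.randn(2, 12, 1, 32, 32))
print(logits.shape)                              # torch.Size([2, 3])
```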

20.
XML is becoming a prevalent format and standard for data exchange in many applications. With the increase of XML data, there is an urgent need for efficient methods to store and manage XML data. As relational databases are the primary choice for this purpose, given their data management power, it is necessary to study the problem of mapping XML schemas to relational schemas. The semantics of XML schemas are crucial to design, query, and store XML documents, and functional dependencies are very important representations of the semantic information of XML schemas. As DTDs are one of the most frequently used schema languages for XML documents today, we use DTDs as the schemas of XML documents here. This paper proposes the concept and the formal definition of XML functional dependencies over DTDs. A method to map XML DTDs to relational schemas with constraints such as functional dependencies, domain constraints, choice constraints, reference constraints, and cardinality constraints over DTDs is given, which can preserve the structures of DTDs as well as the semantics implied by these constraints. The concepts and method of mapping DTDs to relational schemas presented in the paper can be extended to XML Schema with only some modifications to the related formal definitions.
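As a worked instance of the mapping idea (the paper's method covers more constraint types), the example below shows a DTD element with a one-to-many child becoming two tables linked by a foreign key, preserving the cardinality constraint.

```python
# An illustrative instance of the mapping idea (the paper's full method handles more
# constraint types): a DTD element with a one-to-many child becomes two tables linked
# by a foreign key, preserving the cardinality constraint.
dtd = """
<!ELEMENT book (title, author+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
"""

ddl = """
CREATE TABLE book (
  book_id INTEGER PRIMARY KEY,
  title   VARCHAR(255) NOT NULL           -- 'title' occurs exactly once
);
CREATE TABLE author (
  author_id INTEGER PRIMARY KEY,
  book_id   INTEGER NOT NULL REFERENCES book(book_id),  -- 'author+' : one-to-many
  name      VARCHAR(255) NOT NULL
);
"""
print(dtd, ddl)
```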
