Similar Documents
20 similar documents found
1.
A Web-Service-Based Sharing and Synchronization Mechanism for Heterogeneous Databases   Cited by: 1 (self-citations: 0, by others: 1)
This paper analyzes the problems of sharing and synchronizing heterogeneous databases in enterprise data integration and proposes a Web-Service-based method for integrating geographically distributed heterogeneous databases. Heterogeneous database sources at different sites are connected through Web Services to form a heterogeneous central database that provides users with a transparent, unified interface. Users can not only query the central database but also insert, delete and update it, with those changes synchronized back to the heterogeneous source databases; conversely, changes to data and schema on the source side are synchronized to the central database. The key technologies are then described in detail, and finally a worked example shows how the proposed framework is applied in a real application.

2.
3.
A large number of web pages contain data structured in the form of "lists". Many such lists can be further split into multi-column tables, which can then be used in more semantically meaningful tasks. However, harvesting relational tables from such lists can be a challenging task. The lists are manually generated and hence need not have well-defined templates: they have inconsistent delimiters (if any) and often have missing information. We propose a novel technique for extracting tables from lists. The technique is domain independent and operates in a fully unsupervised manner. We first use multiple sources of information to split individual lines into multiple fields and then compare the splits across multiple lines to identify and fix incorrect splits and bad alignments. In particular, we exploit a corpus of HTML tables, also extracted from the web, to identify likely fields and good alignments. For each extracted table, we compute an extraction score that reflects our confidence in the table's quality. We conducted an extensive experimental study using both real web lists and lists derived from tables on the web. The experiments demonstrate the ability of our technique to extract tables with high accuracy. In addition, we applied our technique on a large sample of about 100,000 lists crawled from the web. The analysis of the extracted tables has led us to believe that there are likely to be tens of millions of useful and query-able relational tables extractable from lists on the web.
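As a rough illustration of the line-splitting idea described in this abstract, the sketch below (not the authors' implementation) splits each list line on a few assumed candidate delimiters and keeps the delimiter whose column count agrees across the largest fraction of lines; the delimiter set and the scoring rule are illustrative assumptions only.

```python
import re
from collections import Counter

# Assumed candidate delimiters; the actual system derives fields from several sources of evidence.
DELIMITERS = [r"\t", r"\s*\|\s*", r"\s*;\s*", r"\s*,\s*", r"\s{2,}"]

def split_list_into_table(lines):
    """For each candidate delimiter, split every line and find the most common column count;
    return the splits of the delimiter that aligns the largest fraction of lines."""
    best = (0.0, [])
    for delim in DELIMITERS:
        splits = [[f.strip() for f in re.split(delim, ln) if f.strip()] for ln in lines]
        width, votes = Counter(len(s) for s in splits).most_common(1)[0]
        score = votes / len(lines) if width > 1 else 0.0  # crude stand-in for an extraction score
        if score > best[0]:
            best = (score, [s for s in splits if len(s) == width])
    return best  # (confidence, rows)

print(split_list_into_table([
    "Alice | 34 | Paris",
    "Bob | 27 | Berlin",
    "Carol | 41 | Madrid",
]))
```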

4.
XML-Based Federated Use of Heterogeneous Databases   Cited by: 5 (self-citations: 0, by others: 5)
This paper analyzes the problems of current heterogeneous database systems and the advantages of XML in supporting their federated use, and on that basis proposes a design and implementation scheme for XML-based interoperation among heterogeneous databases. Through practice, it focuses on studying and validating how to construct the system's XML metadata and how to implement XML-based data conversion and transparent data access across heterogeneous databases.

5.
In this paper we investigate highly sophisticated mechanisms that merge and automate interoperability of heterogeneous classical database systems together with the World Wide Web as one world. In particular, we introduce the global intelligence benevolent builder (GIBB) system that employs the Intelligent Benevolent Architecture (IBA), which is comprised of assertions, integration rules, conceptual model constructs and agents. IBAs boost the components' versatility to reconcile the semantics involved in data sharing in order to withstand the dynamic computer technology of the present and future information age. Due to the IBA's power of intelligence, and in order to save costs and time, GIBB also has the capability to filter out and process only the relevant operational sources, such as preferences (i.e. a customer's interests), from the sites.

6.
Many web databases can be seen as providing partial and overlapping information about entities in the world. To answer queries effectively, we need to integrate the information about the individual entities that are fragmented over multiple sources. At first blush this is just the inverse of the traditional database normalization problem—rather than go from a universal relation to normalized tables, we want to reconstruct the universal relation given the tables (sources). The standard way of reconstructing the entities will involve joining the tables. Unfortunately, because of the autonomous and decentralized way in which the sources are populated, they often do not have Primary Key–Foreign Key relations. While tables may share attributes, naive joins over these shared attributes can result in the reconstruction of many spurious entities, thus seriously compromising precision. Our system, SmartInt, is aimed at addressing the problem of data integration in such scenarios. Given a query, our system uses Approximate Functional Dependencies (AFDs) to piece together a tree of relevant tables to answer it. The result tuples produced by our system are able to strike a favorable balance between precision and recall.
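To make the AFD notion concrete, the following is a small sketch (not SmartInt's actual code) of how the confidence of an approximate functional dependency X → Y might be estimated from a table: for each distinct X value, keep only its most frequent Y value and measure the fraction of rows retained.

```python
from collections import Counter, defaultdict

def afd_confidence(rows, x_attr, y_attr):
    """Confidence of the approximate functional dependency x_attr -> y_attr:
    fraction of rows kept if, for every x value, only its most common y value is allowed."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[x_attr]][row[y_attr]] += 1
    kept = sum(counter.most_common(1)[0][1] for counter in groups.values())
    return kept / len(rows)

# Toy example: "model" almost determines "make".
rows = [
    {"model": "Civic", "make": "Honda"},
    {"model": "Civic", "make": "Honda"},
    {"model": "Civic", "make": "Acura"},   # noisy row
    {"model": "Accord", "make": "Honda"},
]
print(afd_confidence(rows, "model", "make"))  # 0.75
```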

7.
Research on the Web Presentation of Statistical Data   Cited by: 1 (self-citations: 0, by others: 1)
This paper analyzes conventional Web-based methods for visualizing statistical data and their implementation on the Web, proposes a GIS-based solution for the visualization of statistical data, analyzes data preparation and the visualization generation process in detail, and presents an example of GIS-based statistical data visualization.

8.
Searchable encryption is an important technique for securely searching encrypted data stored in an untrusted cloud. For the two types of searchable encryption, searchable public-key encryption and searchable symmetric encryption, this paper reviews the main results of recent years, the remaining problems, and the proposed solutions. For searchable public-key encryption, it focuses on methods that reduce search complexity under strong security requirements; for searchable symmetric encryption, it focuses on methods that support physical deletion under strong security requirements.

9.
This article proposes a method to segment Internet images, that is, a group of images corresponding to a specific object (the query) containing a significant number of irrelevant images. The segmentation algorithm we propose is a combination of two distinct methods based on color. The first one considers all images to classify pixels into two sets: object pixels and background pixels. The second method segments images individually by trying to find a central object. The final segmentation is obtained by intersecting the results from both. The segmentation results are then used to re-rank images and display a clean set of images illustrating the query. The algorithm is tested on various queries for animals, natural and man-made objects, and results are discussed, showing that the obtained segmentation results are suitable for object learning.

10.
Aggregate views are commonly used for summarizing information held in very large databases such as those encountered in data warehousing, large-scale transaction management, and statistical databases. Such applications often involve distributed databases that have developed independently and therefore may exhibit incompatibility, heterogeneity, and data inconsistency. We are here concerned with the integration of aggregates that have heterogeneous classification schemes where local ontologies, in the form of such classification schemes, may be mapped onto a common ontology. In previous work, we have developed a method for the integration of such aggregates; the method previously developed is efficient, but cannot handle innate data inconsistencies that are likely to arise when a large number of databases are being integrated. In this paper, we develop an approach that can handle data inconsistencies and is thus inherently much more scalable. In our new approach, we first construct a dynamic shared ontology by analyzing the correspondence graph that relates the heterogeneous classification schemes; the aggregates are then derived by minimization of the Kullback-Leibler information divergence using the EM (Expectation-Maximization) algorithm. Thus, we may assess whether global queries on such aggregates are answerable, partially answerable, or unanswerable in advance of computing the aggregates themselves.
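For reference, the Kullback-Leibler information divergence minimized by such an EM procedure has the standard form below, where p is the observed aggregate distribution over the categories of the shared ontology and q is the estimated one; the notation here is generic rather than the paper's own.

```latex
D_{\mathrm{KL}}(p \,\|\, q) \;=\; \sum_{c \in \mathcal{C}} p(c)\,\log\frac{p(c)}{q(c)}
```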

11.
To address the low efficiency and poor accuracy of the classical edit-distance-based string similarity computation, this paper proposes an improved string similarity method based on edit distance and the longest common substring, introducing the longest common prefix and longest common suffix and defining a new similarity formula. The method is applied to a dynamic heterogeneous web service system model built on heterogeneous platforms. Web page tampering detection experiments show that, compared with the classical algorithm and formula, the improved similarity computation improves both accuracy and efficiency while accommodating the differences among the system's own variants.
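The paper's exact formula is not given in the abstract; purely as an illustration of the general idea, the sketch below blends a normalized Levenshtein similarity with the coverage of the longest common prefix and suffix. The weighting and the blend itself are assumptions, not the paper's definition.

```python
def levenshtein(a, b):
    """Classical dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def common_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def similarity(a, b, w_edit=0.6, w_affix=0.4):
    """Blend edit-distance similarity with common prefix/suffix coverage (illustrative weights)."""
    if not a and not b:
        return 1.0
    longest = max(len(a), len(b))
    edit_sim = 1.0 - levenshtein(a, b) / longest
    affix = common_prefix_len(a, b) + common_prefix_len(a[::-1], b[::-1])
    affix_sim = min(affix, longest) / longest
    return w_edit * edit_sim + w_affix * affix_sim

print(similarity("<html><body>hello</body></html>",
                 "<html><body>he11o</body></html>"))
```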

12.
We consider the micro-aggregation problem, which involves partitioning a set of individual records in a micro-data file into a number of mutually exclusive and exhaustive groups. This problem, which seeks the best partition of the micro-data file, is known to be NP-hard, and has been tackled using many heuristic solutions. In this paper, we would like to demonstrate that in the process of developing micro-aggregation techniques (MATs), it is expedient to incorporate information about the dependence between the random variables in the micro-data file. This can be achieved by pre-processing the micro-data before invoking any MAT, in order to extract the useful dependence information from the joint probability distribution of the variables in the micro-data file, and then accomplishing the micro-aggregation on the “maximally independent” variables—thus confirming the conjecture [A conjecture, which was recently proposed by Domingo-Ferrer et al. (IEEE Trans Knowl Data Eng 14(1):189–201, 2002), was that the phenomenon of micro-aggregation can be enhanced by incorporating dependence-based information between the random variables of the micro-data file by working with (i.e., selecting) the maximally independent variables. Domingo-Ferrer et al. have proposed to select one variable from among the set of highly correlated variables inferred via the correlation matrix of the micro-data file. In this paper, we demonstrate that this process can be automated, and that it is advantageous to select the “most independent variables” by using methods distinct from those involving the correlation matrix.] of Domingo-Ferrer et al. Our results, on real life and artificial data sets, show that including such information will enhance the process of determining how many variables are to be used, and which of them should be used in the micro-aggregation process.
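As a point of reference for the conjecture discussed above, the sketch below selects the variables with the lowest mean absolute correlation to the others, i.e. a crude correlation-matrix proxy for "maximally independent" variables; this mirrors the baseline mentioned in the abstract rather than the authors' own, correlation-matrix-free method.

```python
import numpy as np

def most_independent_variables(data, k):
    """Return indices of the k columns with the lowest mean absolute correlation
    to all other columns (a simple proxy for 'maximally independent' variables)."""
    corr = np.abs(np.corrcoef(data, rowvar=False))
    np.fill_diagonal(corr, 0.0)
    mean_corr = corr.mean(axis=1)
    return list(np.argsort(mean_corr)[:k])

rng = np.random.default_rng(0)
x = rng.normal(size=(500, 1))
data = np.hstack([x, x + 0.01 * rng.normal(size=(500, 1)),   # two nearly identical columns
                  rng.normal(size=(500, 2))])                 # two independent columns
print(most_independent_variables(data, 2))  # expected: the two independent columns (2 and 3)
```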

13.
Research on a Heterogeneous Database Integration Framework Based on a Multi-Database Operation Language   Cited by: 1 (self-citations: 0, by others: 1)
To address the complexity of maintaining the global schema in existing heterogeneous database integration frameworks, this paper proposes HDIFBML, a heterogeneous database integration framework based on a multi-database operation language. The work covers the design of the key modules and the concepts of the global schema and schema mappings, and defines a multi-database operation language, SMSQL. SMSQL provides a schema construction language set and a schema mapping language set for building and maintaining the global schema, and resolves the structural and semantic conflicts that arise when integrating different schemas through field conversion. Practical application shows that HDIFBML hides the execution details of the underlying heterogeneous databases, allows the global schema to be maintained flexibly, and offers good operability, maintainability and extensibility.

14.
In a large-scale design process, designers cooperate in a complex situation where a variety of software tools run on different hardware platforms. This paper presents a data enhancement approach to integrate heterogeneous Computer-Aided Design (CAD) databases through the Internet. The data enhancement means topological changes in a geometric model and additional information in design semantics. The geometric data is enhanced using a non-manifold modeler to produce data sets valuable in downstream applications such as a Finite Element Method (FEM) solver or a detail design system. As a practical example, a shipbuilding product model database has been implemented based on the Standard for the Exchange of Product (STEP) model data methodology and shipbuilding features. The system has been implemented on a network environment that consists of a Web browser, Common Object Request Broker Architecture (CORBA) objects, a relational database management system, a data enhancement module, and various computer-aided applications.

15.
Estimating the disclosure risk of a Statistical Disclosure Control (SDC) protection method by means of (distance-based) record linkage techniques is a very popular approach to analyze the privacy level offered by such a method. When databases are very large, some particular record linkage techniques such as blocking or partitioning are usually applied to make this process reasonably efficient. However, in this case the record linkage process is not exact, which means that the disclosure risk of a SDC protection method may be underestimated. In this paper we propose the use of kd-tree techniques to apply exact yet very efficient record linkage when (protected) datasets are very large. We describe some experiments showing that this approach achieves better results, in terms of both accuracy and running time, than more classical approaches such as record linkage based on a sliding window. We also discuss and experiment on the use of these techniques not to link a whole protected record with its original one, but just to guess the value of some confidential attribute(s) of the record(s). This fact leads to concepts such as k-neighbor l-diversity or k-neighbor p-sensitivity, a generalization (to any SDC protection method) of l-diversity or p-sensitivity, which have been defined for SDC protection methods ensuring k-anonymity, such as microaggregation.
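As an illustration of the kd-tree idea (not the paper's code), the sketch below links each protected record to its nearest original record using SciPy's cKDTree and reports the re-identification rate; the toy "protection method" (noise addition) and the assumption that record i of the protected file corresponds to record i of the original are illustrative only.

```python
import numpy as np
from scipy.spatial import cKDTree

def kd_tree_record_linkage(original, protected):
    """Link every protected record to its nearest original record (Euclidean distance)
    and report the fraction of correct re-identifications, an estimate of disclosure risk."""
    tree = cKDTree(original)                 # exact nearest-neighbour structure over originals
    _, nearest = tree.query(protected, k=1)  # index of the closest original for each protected record
    return np.mean(nearest == np.arange(len(protected)))

rng = np.random.default_rng(1)
original = rng.normal(size=(1000, 5))
protected = original + rng.normal(scale=0.1, size=original.shape)  # toy noise-addition SDC method
print(kd_tree_record_linkage(original, protected))
```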

16.
Searching desired data on the Internet is one of the most common ways the Internet is used. No single search engine is capable of searching all data on the Internet. The approach that provides an interface for invoking multiple search engines for each user query has the potential to satisfy more users. When the number of search engines under the interface is large, invoking all search engines for each query is often not cost effective because it creates unnecessary network traffic by sending the query to a large number of useless search engines and searching these useless search engines wastes local resources. The problem can be overcome if the usefulness of every search engine with respect to each query can be predicted. We present a statistical method to estimate the usefulness of a search engine for any given query. For a given query, the usefulness of a search engine in this paper is defined to be a combination of the number of documents in the search engine that are sufficiently similar to the query and the average similarity of these documents. Experimental results indicate that our estimation method is much more accurate than existing methods.
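A minimal sketch of the usefulness notion defined above (the paper's statistical estimator itself is not reproduced here): given per-document similarities between a query and an engine's documents, usefulness is summarized by how many documents exceed a similarity threshold and their average similarity. The threshold value is an assumption.

```python
def usefulness(similarities, threshold=0.5):
    """Usefulness of a search engine for a query, summarized as
    (number of sufficiently similar documents, their average similarity)."""
    relevant = [s for s in similarities if s >= threshold]
    if not relevant:
        return 0, 0.0
    return len(relevant), sum(relevant) / len(relevant)

# Toy similarities of one engine's documents to a query.
print(usefulness([0.9, 0.7, 0.4, 0.2, 0.65]))  # (3, 0.75)
```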

17.
Identifying replicated sites is an important task for search engines. It can reduce data storage costs, improve query processing time and remove noise that might affect the quality of the final answers given to the user. This paper introduces a new approach to detect web sites that are likely to be replicas in a search engine database. Our method uses the websites’ structure and the content of their pages to identify possible replicas. As we show through experiments, such a combination improves the precision and reduces the overall costs related to the replica detection task. Our method achieves a quality improvement of 47.23% when compared to previously proposed approaches.
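To illustrate the combination of structural and content signals (the specific features and weights of the paper are not reproduced here), a minimal sketch might score a pair of sites by blending the Jaccard similarity of their URL path sets with the Jaccard similarity of word shingles from their pages; the 0.5/0.5 weighting and the shingle size are assumptions.

```python
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def shingles(text, k=5):
    """k-word shingles of a page's text."""
    words = text.split()
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def replica_score(site_a, site_b, w_struct=0.5, w_content=0.5):
    """Blend structural similarity (URL paths) with content similarity (page shingles)."""
    struct = jaccard(site_a["paths"], site_b["paths"])
    content = jaccard(
        set().union(*(shingles(p) for p in site_a["pages"])),
        set().union(*(shingles(p) for p in site_b["pages"])),
    )
    return w_struct * struct + w_content * content

site_a = {"paths": {"/", "/about", "/news/2020"}, "pages": ["mirror of the original site news archive"]}
site_b = {"paths": {"/", "/about", "/news/2021"}, "pages": ["mirror of the original site news archive"]}
print(replica_score(site_a, site_b))
```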

18.
19.
To overcome the limited efficiency and adaptability of traditional statistical testing methods, this paper proposes a web log statistical testing method based on user classification, taking the characteristics of web applications into account. According to the complexity of the web application, a model is built through user classification and web log statistics, and testing based on this model is used to assess the reliability of the web application. Experimental results show that, compared with traditional statistical testing, the method achieves higher coverage of the web application's main business functions, and its reliability assessment is more meaningful in practice.

20.
The aim of the conceptual step in database design is to describe the data involved in the application in a formal and abstract way, without any concern for the specific model and language chosen for the implementation. In statistical applications, data are described at different levels of aggregation, from elementary facts of reality to complex aggregations such as classifications, time series and indexes. The paper describes a methodology for the conceptual design of statistical databases that provides the designer with suitable strategies for defining these different levels of aggregation starting from user requirements, and for checking the completeness, coherence and minimality of the conceptual schema at the different levels. The methodology makes use of two data models for the representation of data: for elementary data, the Entity-Relationship model, widely used in database applications; for summary data, a new model is proposed, designed to be an effective trade-off between expressive power and simplicity of use.
