首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Instance-based attribute identification in database integration   总被引:3,自引:0,他引:3  
Most research on attribute identification in database integration has focused on integrating attributes using schema and summary information derived from the attribute values. No research has attempted to fully explore the use of attribute values to perform attribute identification. We propose an attribute identification method that employs schema and summary instance information as well as properties of attributes derived from their instances. Unlike other attribute identification methods that match only single attributes, our method matches attribute groups for integration. Because our attribute identification method fully explores data instances, it can identify corresponding attributes to be integrated even when schema information is misleading. Three experiments were performed to validate our attribute identification method. In the first experiment, the heuristic rules derived for attribute classification were evaluated on 119 attributes from nine public domain data sets. The second was a controlled experiment validating the robustness of the proposed attribute identification method by introducing erroneous data. The third experiment evaluated the proposed attribute identification method on five data sets extracted from online music stores. The results demonstrated the viability of the proposed method.Received: 30 August 2001, Accepted: 31 August 2002, Published online: 31 July 2003Edited by L. Raschid  相似文献   

2.
Matching query interfaces is a crucial step in data integration across multiple Web databases. The problem is closely related to schema matching that typically exploits different features of schemas. Relying on a particular feature of schemas is not sufficient. We propose an evidential approach to combining multiple matchers using Dempster–Shafer theory of evidence. First, our approach views the match results of an individual matcher as a source of evidence that provides a level of confidence on the validity of each candidate attribute correspondence. Second, it combines multiple sources of evidence to get a combined mass function that represents the overall level of confidence, taking into account the match results of different matchers. Our combination mechanism does not require the use of weighing parameters, hence no setting and tuning of them is needed. Third, it selects the top k attribute correspondences of each source attribute from the target schema based on the combined mass function. Finally it uses some heuristics to resolve any conflicts between the attribute correspondences of different source attributes. Our experimental results show that our approach is highly accurate and effective.  相似文献   

3.
基于实例的Deep Web数据源结果模式匹配技术   总被引:1,自引:0,他引:1       下载免费PDF全文
针对Deep Web数据源结果模式信息的匹配问题,提出了一种基于实例的结果模式匹配的方法。该方法能够匹配并验证数据源的结果模式属性信息,同时记录数据在结果页面中的结构信息。利用基于查询请求松弛的两段模式匹配方法精确地匹配模式属性,并基于模式属性间共现度信息来提高属性匹配的查全率和查准率。从实验结果分析可以看出,基于实例的方法能够有效地识别数据源模式信息,提高模式属性查全率和查准率。  相似文献   

4.
Schema matching and value mapping across two heterogenous information sources are critical tasks in applications involving data integration, data warehousing, and federation of databases. Before data can be integrated from multiple tables, the columns and the values appearing in the tables must be matched. The complexity of the problem grows quickly with the number of data attributes/columns to be matched and due to multiple semantics of data values. Traditional research has tackled schema matching and value mapping independently. We propose a novel method that optimizes embedded value mappings to enhance schema matching in the presence of opaque data values and column names. In this approach, the fitness objective for matching a pair of attributes from two schemas depends on the value mapping function for each of the two attributes. Suitable fitness objectives include the euclidean distance measure, which we use in our experimental study, as well as relative (cross) entropy. We propose a heuristic local descent optimization strategy that uses sorting and two-opt switching to jointly optimize value mappings and attribute matches. Our experiments show that our proposed technique outperforms earlier uninterpreted schema matching methods, and thus, should form a useful addition to a suite of (semi) automated tools for resolving structural heterogeneity.  相似文献   

5.
Blood flow and derived data are essential to investigate the initiation and progression of cerebral aneurysms as well as their risk of rupture. An effective visual exploration of several hemodynamic attributes like the wall shear stress (WSS) and the inflow jet is necessary to understand the hemodynamics. Moreover, the correlation between focus-and-context attributes is of particular interest. An expressive visualization of these attributes and anatomic information requires appropriate visualization techniques to minimize visual clutter and occlusions. We present the FLOWLENS as a focus-and-context approach that addresses these requirements. We group relevant hemodynamic attributes to pairs of focus-and-context attributes and assign them to different anatomic scopes. For each scope, we propose several FLOWLENS visualization templates to provide a flexible visual filtering of the involved hemodynamic pairs. A template consists of the visualization of the focus attribute and the additional depiction of the context attribute inside the lens. Furthermore, the FLOWLENS supports local probing and the exploration of attribute changes over time. The FLOWLENS minimizes visual cluttering, occlusions, and provides a flexible exploration of a region of interest. We have applied our approach to seven representative datasets, including steady and unsteady flow data from CFD simulations and 4D PC-MRI measurements. Informal user interviews with three domain experts confirm the usefulness of our approach.  相似文献   

6.
为了解决多源异构民航旅客服务数据集成过程中存在多模式匹配的效率不高、精确性不足、完整模式信息获取难度较大等问题,提出了一种基于SimHash和混合相似度的多模式匹配方法。该方法首先基于PMI计算特征单元权重,并通过SimHash算法构造属性列的签名来表示属性特征,以降低特征维度,进而引入K-means++算法对属性聚类并生成候选匹配集。最后基于属性的混合相似度构建属性映射图,以直观的方式展示属性间的匹配关系,同时提高多模式匹配效率。实验结果表明该方法具有可行性,为高效地解决多源异构民航旅客服务数据集成中的模式冲突问题提供新的解决方案。  相似文献   

7.
Brushing of attribute clouds for the visualization of multivariate data   总被引:1,自引:0,他引:1  
The visualization and exploration of multivariate data is still a challenging task. Methods either try to visualize all variables simultaneously at each position using glyph-based approaches or use linked views for the interaction between attribute space and physical domain such as brushing of scatterplots. Most visualizations of the attribute space are either difficult to understand or suffer from visual clutter. We propose a transformation of the high-dimensional data in attribute space to 2D that results in a point cloud, called attribute cloud, such that points with similar multivariate attributes are located close to each other. The transformation is based on ideas from multivariate density estimation and manifold learning. The resulting attribute cloud is an easy to understand visualization of multivariate data in two dimensions. We explain several techniques to incorporate additional information into the attribute cloud, that help the user get a better understanding of multivariate data. Using different examples from fluid dynamics and climate simulation, we show how brushing can be used to explore the attribute cloud and find interesting structures in physical space.  相似文献   

8.
A global schema is a single, connected view of heterogeneous databases. Past research into the problem of global schema design has demonstrated the use of generalization for connecting disjoint schemas. Before generalization can be applied, the common attributes of the local schemas must be identified. We use the term attribute equivalence for the identification of such common attributes. This paper defines four types of attribute equivalences. The distinction between local and global equivalence is explained, and special attention is given to key attributes. We also discuss the placement of locally equivalent attributes in a global schema.  相似文献   

9.
模式匹配就是在作为输入的模式中有对应语义关系的元素间产生一个映射.为了提高模式匹配的效率,提出了一种新型的模式匹配方法--源模式分裂模式匹配算法.它可以解决标准模式匹配难以解决的问题:1)源模式的某一个属性和多个目标模式的多个属性之间建立匹配关系;2)表格中的不同元组对应其他表格同一元组的不同属性值的匹配.在匹配过程中,该方法先搜索种类型属性,然后根据种类型属性建立选择条件,最后把源模式进行分裂形成视图,再重新生成候选匹配集合,从而提高模式匹配的质量.  相似文献   

10.
Extract, Transform and Load (ETL) processes organized as workflows play an important role in data warehousing. As ETL workflows are usually complex, various ETL facilities have been developed to address their control-flow process modeling and execution control. To evaluate the quality of ETL facilities, Synthetic ETL workflow test cases, consisting of control-flow and data-flow aspects are needed to check ETL facility functionalities at construction time and to validate the correctness and performance of ETL facilities at run time. Although there are some synthetic workflow and data set test case generation approaches existed in literatures, little work is done to consider both aspects at the same time specifically for ETL workflow generators. To address this issue, this paper proposes a schema aware ETL workflow generator with which users can characterize their ETL workflows by various parameters and get ETL workflow test cases with control-flow of ETL activities, complied schemas and associated recordsets. Our generator consists of three steps. First, with type and ratio of individual activities and their connection characteristic parameter specification, the generator will produce ETL activities and form ETL skeleton which determine how generated activities are cooperated with each other. Second, with schema transformation characteristic parameter specification, e.g. ranges of numbers of attributes, the generator will resolve attribute dependencies and refine input/output schemas with complied attributes and their data types. In the last step, recordsets are generated following cardinality specifications. ETL workflows in specific patterns are produced in the experiment in order to show the ability of our generator. Also experiments to generate thousands of ETL workflow test cases in seconds have been done to verify the usability of the generator.  相似文献   

11.
在很多领域的统计分析中,通常需要分析既具有层次结构又具有多维属性的复杂数据,如食品安全数据、股票数据、网络安全数据等.针对现有多维数据和层次结构的可视化方法不能满足对同时具有层次和多维两种属性数据的可视分析要求,提出了一种树图中的多维坐标MCT(multi-coordinate in treemap)技术.该技术采用基于Squarified和Strip布局算法的树图表示层次结构,用树图中节点矩形的边作为属性轴,通过属性映射、属性点连接、曲线拟合实现层次结构中多维属性的可视化.将该技术应用于全国农药残留侦测数据,实现了对全国各地区、各超市、各农产品中农药残留检出和超标情况的可视化,为领域人员提供了有效的分析工具.MCT技术也可用于其他领域的层次多属性数据的可视化.  相似文献   

12.
XML模式中隐式冗余不存在的充要条件   总被引:1,自引:0,他引:1  
XML数据库模式规范化设计是产生一组相关联的、能表示数据间依赖关系、而且消除了冗余的XML模式或DTD,以更好地进行信息检索.XML数据库模式中某些数据依赖的存在是冗余存在的原因,因此在XML数据库模式中数据依赖与冗余的关联是其规范化设计研究的关键问题,但对这一问题目前还没有专门的研究.XML数据库模式的数据依赖包括属性间数据依赖和元素间的数据依赖,给出综合了属性间和元素间数据依赖的XML数据库模式数据依赖的定义,分析与之关联的隐式冗余,并论证XML模式中隐式冗余不存在当且仅当该XML模式是规范的,为XML数据库模式规范化设计更深一层的研究奠定理论基础.  相似文献   

13.
Feature extraction and iconic visualization   总被引:1,自引:0,他引:1  
We present a conceptual framework and a process model for feature extraction and iconic visualization. The features are regions of interest extracted from a dataset. They are represented by attribute sets, which play a key role in the visualization process. These attribute sets are mapped to icons, or symbolic parametric objects, for visualization. The features provide a compact abstraction of the original data, and the icons are a natural way to visualize them. We present generic techniques to extract features and to calculate attribute sets, and describe a simple but powerful modeling language which was developed to create icons and to link the attributes to the icon parameters. We present illustrative examples of iconic visualization created with the techniques described, showing the effectiveness of this approach  相似文献   

14.
15.
针对传统关联规则可视化方法无法展现数据间的频繁模式和关联关系,表示形式比较单一,缺乏多模式展现形式等问题,提出了一种新的多值属性关联规则可视化表示算法。该算法运用概念格理论对多值属性数据进行了重新定义和分类,将频繁项集和关联规则中的多值数据项分别以概念格结构进行表示,实现了频繁项集可视化展示和一对一、一对多、多对一、多对多及概念分层的多模式关联规则可视化展示。最后,以某省全员人口数据为基础对算法进行了具体实现和分析,同时实现了对人口数据的源数据、频繁模式以及关联关系的可视化展示。实验结果表明,所提出的可视化形式和已有成果相比具有良好的频繁项集与多模式关联规则展现效果。  相似文献   

16.
文中通过对关系模式中的属性进行适当的分类,讨论了一个属性成为主属性应该满足的条件,并在文献「4」、「5」「6」的基础上,采用闭包,实现了利用Armstrong公理进行函数信赖推导的过程,从而给出了一个简便的求解关系模式全部主属性的多项式时间算法。  相似文献   

17.
Social networks collected by historians or sociologists typically have a large number of actors and edge attributes. Applying social network analysis (SNA) algorithms to these networks produces additional attributes such as degree, centrality, and clustering coefficients. Understanding the effects of this plethora of attributes is one of the main challenges of multivariate SNA. We present the design of GraphDice, a multivariate network visualization system for exploring the attribute space of edges and actors. GraphDice builds upon the ScatterDice system for its main multidimensional navigation paradigm, and extends it with novel mechanisms to support network exploration in general and SNA tasks in particular. Novel mechanisms include visualization of attributes of interval type and projection of numerical edge attributes to node attributes. We show how these extensions to the original ScatterDice system allow to support complex visual analysis tasks on networks with hundreds of actors and up to 30 attributes, while providing a simple and consistent interface for interacting with network data.  相似文献   

18.
获取模式信息是深入研究Deep Web数据的必要步骤,针对Deep Web结果模式结构信息的丢失问题,提出了一种基于启发式信息的Deep Web结果模式获取方法.通过解析Deep Web结果页面数据,利用启发式信息为结果页面数据添加正确的属性名,进而得到对应Deep Web的结果模式,并对其进行规范化处理,解决不同数据...  相似文献   

19.
Clustering is an important data mining problem. However, most earlier work on clustering focused on numeric attributes which have a natural ordering to their attribute values. Recently, clustering data with categorical attributes, whose attribute values do not have a natural ordering, has received more attention. A common issue in cluster analysis is that there is no single correct answer to the number of clusters, since cluster analysis involves human subjective judgement. Interactive visualization is one of the methods where users can decide a proper clustering parameters. In this paper, a new clustering approach called CDCS (Categorical Data Clustering with Subjective factors) is introduced, where a visualization tool for clustered categorical data is developed such that the result of adjusting parameters is instantly reflected. The experiment shows that CDCS generates high quality clusters compared to other typical algorithms.  相似文献   

20.
基于近邻方法的高维数据可视化聚类发现   总被引:2,自引:0,他引:2  
提出了一种新颖的基于近邻方法的高维数据可经聚类方法,并实现了一个近邻可视化聚类发现系统VisNN。已有的解决高维数据可视化聚类方法主要是通过降维把维数据投影到二维或三维空间上,从而达到可视化目的。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号