Found 18 similar documents; search took 78 ms
10.
How to involve data consumers in the data governance process, so as to genuinely improve data quality, is one of the key problems currently facing data governance in universities. Data provenance can not only trace the governance process but also support data quality evaluation. Building on data provenance, this work explores two ways to improve the quality of university data. First, it designs a data governance architecture in which multiple governance parties participate jointly: each party's quality-governance actions are recorded, queried, and displayed through data provenance, so that all parties, and data consumers in particular, can see the governance process clearly, join the effort to improve data quality, and thereby raise governance efficiency. Second, it proposes a comprehensive data quality evaluation method that converts users' qualitative feedback into quantifiable quality values and combines them with existing rule-based quantitative metrics to produce an overall quality assessment, which in turn guides the governance parties' concrete actions. Finally, a prototype system demonstrates a concrete implementation and a utility evaluation, showing that the method is effective and feasible.
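The composite evaluation described above can be sketched as follows. The weights, dimension names, and the linear mapping of ordinal ratings are illustrative assumptions, not the paper's actual formulas:

```python
def composite_quality(rule_scores, feedback_ratings, w_rule=0.6, w_user=0.4):
    """Combine rule-based quality scores with quantified user feedback.

    rule_scores: dict mapping a quality dimension (e.g. completeness,
                 consistency) to a rule-check score in [0, 1].
    feedback_ratings: list of ordinal user ratings in {1..5}, mapped
                      linearly onto [0, 1].
    The weights are hypothetical; the paper defines its own conversion.
    """
    rule = sum(rule_scores.values()) / len(rule_scores)
    user = sum((r - 1) / 4 for r in feedback_ratings) / len(feedback_ratings)
    return w_rule * rule + w_user * user

score = composite_quality(
    {"completeness": 0.9, "consistency": 0.8},
    [5, 4, 4],
)
```

The point of the combination is that rule checks catch structural defects while user feedback catches semantic ones a rule cannot express.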
11.
In this paper, we introduce an efficient mechanism to collect, store, and retrieve data provenance information in workflows of multiphysics simulations. Using notifications, we enable the nonintrusive collection of information about workflow events during workflow execution. Combining these events with workflow structure information, which is constant across executions of a workflow, we obtain the data provenance information for the specific run of the workflow. Data provenance information is structured into a graph that represents workflow events on the basis of their causal dependency. We use a graph database to store this graph and use the traversal framework it provides to efficiently retrieve data provenance information from the graph, traversing backwards from a data object to every workflow event that is part of its provenance. Finally, we integrate data provenance information with the semantics of workflow services to provide complete and meaningful data provenance information. Copyright © 2012 John Wiley & Sons, Ltd.
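The backward traversal can be sketched in plain Python; the paper uses a graph database's traversal framework, so the adjacency-list representation and event names here are illustrative stand-ins:

```python
from collections import deque

def provenance_of(data_obj, caused_by):
    """Return every workflow event reachable by walking causal
    'caused_by' edges backwards from a data object."""
    seen, frontier = set(), deque([data_obj])
    while frontier:
        node = frontier.popleft()
        for parent in caused_by.get(node, []):
            if parent not in seen:
                seen.add(parent)
                frontier.append(parent)
    return seen

# Hypothetical event graph: each key was caused by the listed nodes.
edges = {
    "result.csv": ["task_B_finished"],
    "task_B_finished": ["task_B_started"],
    "task_B_started": ["task_A_finished"],
    "task_A_finished": ["task_A_started"],
}
lineage = provenance_of("result.csv", edges)
```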
12.
Data provenance studies the origin, transformation, and update history of data. Drawing on the properties of frequent pattern mining, this work proposes an FP+ tree to record the origins of frequent patterns. It develops the theory of frequent-pattern provenance with proofs, presents three tracing methods based on different tracing mechanisms, and derives their correctness and execution costs theoretically. Provenance is captured during frequent pattern mining itself, at no extra cost. Exploiting the structure of conditional FP+ trees and the properties of frequent patterns, an α-pruning technique is proposed for computing the projection of a conditional FP+ tree, which speeds up both mining and tracing. Experiments show that FP+-tree-based tracing realizes frequent-pattern provenance efficiently and confirm the effectiveness of the α-pruning strategy for conditional FP+ trees.
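The idea of pattern provenance, recording for each frequent itemset which transactions produced it, can be shown with a brute-force sketch; the paper's FP+ tree captures the same lineage during FP-growth without the exponential enumeration used here:

```python
from itertools import combinations

def frequent_patterns_with_lineage(transactions, min_support):
    """Mine frequent itemsets and record, for each, the ids of the
    transactions that support it -- the pattern's provenance."""
    items = sorted({i for _, t in transactions for i in t})
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            tids = [tid for tid, t in transactions if set(combo) <= t]
            if len(tids) >= min_support:
                result[combo] = tids
    return result

# Toy transaction database: (transaction id, item set).
db = [(1, {"a", "b"}), (2, {"a", "c"}), (3, {"a", "b", "c"})]
patterns = frequent_patterns_with_lineage(db, min_support=2)
```

Tracing a pattern back to its supporting transactions is then just a dictionary lookup.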
13.
Dayse Silveira de Almeida, Carmem Satie Hara, Ricardo Rodrigues Ciferri, Cristina Dutra de Aguiar Ciferri. Software, 2018, 48(1): 197-232
Reconciliation is the process of providing a consistent view of the data imported from different sources. Despite some efforts reported in the literature for providing data reconciliation solutions with asynchronous collaboration, the challenge of reconciling data when multiple users work asynchronously over local copies of the same imported data has received less attention. In this paper, we propose AcCORD, an asynchronous collaborative data reconciliation model based on data provenance. AcCORD is innovative because it supports applications in which all users are required to agree on the data values to provide a single consistent view to all of them, as well as applications that allow users to disagree on the data values to keep in their local copies but promote collaboration by sharing integration decisions. We also introduce a decision integration propagation method that keeps users from taking inconsistent decisions over data items present in several sources. Further, different policies based on data provenance are proposed for solving conflicts among multiple users' integration decisions. Our experimental analysis shows that AcCORD is efficient and effective. It performs well, and the results highlight its flexibility by generating either a single integrated view or different local views. We have also conducted interviews with end users to analyze the proposed policies and the feasibility of multiuser reconciliation. They provide insights with respect to acceptability, consistency, correctness, time saving, and satisfaction. Copyright © 2017 John Wiley & Sons, Ltd.
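A provenance-based conflict-resolution policy of the kind mentioned above can be sketched minimally. AcCORD defines several policies; the trust-ranking rule and the source names below are illustrative assumptions:

```python
def resolve(conflicts, source_trust):
    """Resolve conflicting values for one data item by a
    provenance-based policy: prefer the value whose source
    has the highest trust rank.

    conflicts: list of (source, value) pairs for the same item.
    source_trust: dict mapping source name to a trust score.
    """
    source, value = max(conflicts, key=lambda sv: source_trust[sv[0]])
    return value

# Hypothetical conflict: two sources disagree on a city name.
choice = resolve(
    [("census_db", "Curitiba"), ("web_form", "Curitba")],
    {"census_db": 0.9, "web_form": 0.3},
)
```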
14.
Provenance is information about the origin and creation of data. In data science and engineering related to cloud environments, such information is useful and sometimes even critical. In data analytics, it is necessary for making data-driven decisions: to trace back history and reproduce final or intermediate results, and even to tune models and adjust parameters in real time. In the cloud in particular, users need to evaluate the trustworthiness of data and pipelines. In this paper, we propose LogProv, a solution toward realizing these functionalities for big data provenance. LogProv renovates data pipelines, or parts of the big data software infrastructure, to generate structured logs for pipeline events, and then stores data and logs separately in cloud space. The data are explicitly linked to the logs, which implicitly record pipeline semantics. Semantic information can be retrieved from the logs easily because they are well defined and structured beforehand. We implemented and deployed LogProv in Nectar Cloud alongside Apache Pig and the Hadoop ecosystem, and adopted Elasticsearch to provide the query service. LogProv was evaluated empirically through case studies. The results show that LogProv is efficient: the performance overhead is no more than 10%, queries are answered within 1 second, trustworthiness is marked clearly, and there is no impact on the data processing logic of the original pipelines.
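The core mechanism, structured event logs explicitly linked to data objects so that a data object's history can be queried back out, can be sketched as follows. The event names, field names, and in-memory log store are illustrative assumptions; LogProv stores logs in the cloud and queries them through Elasticsearch:

```python
def log_event(log_store, data_id, event, **fields):
    """Append a structured pipeline event, explicitly linked to the
    data object it concerns via data_id."""
    log_store.append({"data_id": data_id, "event": event, **fields})

def history(log_store, data_id):
    """Retrieve the provenance of one data object from the logs."""
    return [e for e in log_store if e["data_id"] == data_id]

logs = []
log_event(logs, "d42", "pig_job_started", script="clean.pig")
log_event(logs, "d42", "pig_job_finished", status="ok")
log_event(logs, "d99", "pig_job_started", script="join.pig")
trace = history(logs, "d42")
```

Because every record carries the data id, retrieval is a simple filter rather than a graph traversal.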
15.
This paper first introduces scientific workflow technology as it relates to geospatial data processing and contrasts it with traditional business workflow technology; it then uses the Kepler project as an example of how scientific workflow technology is applied in spatial data processing, and finally discusses open problems and draws conclusions.
16.
Traditional static and dynamic detection methods cannot cope with PDF document attacks that rely on heavy obfuscation or unknown techniques. To address this, a new detection model based on system calls and data provenance, NtProvenancer, is proposed. First, a system-call capture tool collects the system-call records produced while a document executes; next, data provenance techniques build a system-call-based provenance graph; then a graph path-selection algorithm extracts system-call feature fragments for detection. The evaluation dataset consists of 528 benign and 320 malicious PDF documents. Tests were run on Adobe Reader, with term frequency-inverse document frequency (TF-IDF) and the PROVDETECTOR rarity algorithm substituted for the proposed graph key-point algorithm as baselines. The results show that NtProvenancer outperforms the baselines on precision, F1 score, and other metrics. Under the best parameter settings, the model's average per-document training and detection times are 251.51 ms and 60.55 ms respectively, with a false-positive rate below 5.22% and an F1 score of 0.989. NtProvenancer is thus an efficient and practical PDF document detection model.
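The first two steps, turning captured system calls into a provenance graph and extracting path fragments as features, can be sketched as below. The file names and the simple root-to-leaf path enumeration are illustrative assumptions; the paper's path-selection algorithm is more elaborate:

```python
def build_graph(syscalls):
    """Turn (subject, operation, object) system-call records into
    provenance edges: subject --op--> object."""
    edges = {}
    for subj, op, obj in syscalls:
        edges.setdefault(subj, []).append((op, obj))
    return edges

def paths(edges, start, path=None):
    """Enumerate root-to-leaf paths; each path is one candidate
    system-call feature fragment for detection."""
    path = (path or []) + [start]
    children = edges.get(start, [])
    if not children:
        return [path]
    out = []
    for op, obj in children:
        out += paths(edges, obj, path + [op])
    return out

# Hypothetical trace of a malicious document opened in a PDF reader.
calls = [
    ("AcroRd32.exe", "create", "dropper.exe"),
    ("AcroRd32.exe", "write", "payload.dll"),
    ("dropper.exe", "connect", "evil.host"),
]
g = build_graph(calls)
features = paths(g, "AcroRd32.exe")
```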
17.
Provenance management is one of the core functions of scientific workflow systems. In the context of scientific workflows, provenance divides into workflow-definition provenance and workflow-execution provenance, which describe, respectively, the metadata, process dependencies, and data evolution of the definition and execution phases. This paper focuses on techniques for representing and querying both kinds of provenance, and explains solutions to problems specific to the scientific-workflow domain, such as the "black box" problem, dependency discrimination, and fine-grained provenance. It also reviews existing provenance systems for scientific workflows and offers an outlook on the future of provenance technology.
18.
Existing provenance-filtering mechanisms generalize poorly: each mechanism can filter only one particular type of sensitive element, so handling composite filtering requirements involving several types of sensitive elements remains very difficult. This work therefore proposes a general, primitive-based provenance-filtering framework. First, it describes the types of sensitive elements involved in provenance filtering and the filtering constraints. Second, by analyzing in depth the basic operations and processes with which existing mechanisms rewrite provenance graphs, it formally defines a set of provenance-filtering primitives that describe minimal rewriting operations on a provenance graph, divides the filtering process into three phases (hiding sensitive elements, restoring useful dependencies, and verifying filtering constraints), and proposes a phased method for constructing the filtering-strategy space by assembling primitives. Finally, a general primitive-based filtering algorithm is designed and implemented, and its feasibility is verified on public datasets.
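The two central phases, hiding a sensitive element and restoring the useful dependencies it carried, can be sketched as one minimal graph-rewriting primitive. The node names are hypothetical, and the paper's primitives are finer-grained than this single operation:

```python
def hide_node(edges, sensitive):
    """Filtering primitive: remove a sensitive node from a provenance
    graph and restore the transitive dependencies it mediated, by
    connecting each of its predecessors to each of its successors.

    edges: dict mapping node -> set of successor nodes.
    Returns a new graph; the input is left unmodified.
    """
    succ = edges.get(sensitive, set())
    filtered = {}
    for node, targets in edges.items():
        if node == sensitive:
            continue  # hide the sensitive element itself
        if sensitive in targets:
            # restore the dependency path that ran through it
            targets = (targets - {sensitive}) | succ
        filtered[node] = targets
    return filtered

g = {"input": {"secret_step"}, "secret_step": {"report"}, "report": set()}
fg = hide_node(g, "secret_step")
```

A real filter would then re-check the stated filtering constraints on the rewritten graph before releasing it.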