首页 | 本学科首页   官方微博 | 高级检索  
     


Metadata management for scientific databases
Abstract:Most scientific databases consist of datasets (or sources) which in turn include samples (or files) with an identical structure (or schema). In many cases, samples are associated with rich metadata, describing the process that leads to building them (e.g.: the experimental conditions used during sample generation). Metadata are typically used in scientific computations just for the initial data selection; at most, metadata about query results is recovered after executing the query, and associated with its results by post-processing. In this way, a large body of information that could be relevant for interpreting query results goes unused during query processing.In this paper, we present ScQL, a new algebraic relational language, whose operations apply to objects consisting of data–metadatapairs, by preserving such one-to-one correspondence throughout the computation. We formally define each operation and we describe an optimization, called meta-first, that may significantly reduce the query processing overhead by anticipating the use of metadata for selectively loading into the execution environment only those input samples that contribute to the result samples.In ScQL, metadata have the same relevance as data, and contribute to building query results; in this way, the resulting samples are systematically associated with metadata about either the specific input samples involved or about query processing, thereby yielding a new form of metadata provenance. We present many examples of use of ScQL, relative to several application domains, and we demonstrate the effectiveness of the meta-first optimization.
Keywords:Metadata management  Scientific databases  Query optimization
本文献已被 ScienceDirect 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号