Similar literature
Found 20 similar documents; search took 15 ms
1.
Most scientific databases consist of datasets (or sources) which in turn include samples (or files) with an identical structure (or schema). In many cases, samples are associated with rich metadata describing the process that leads to building them (e.g., the experimental conditions used during sample generation). Metadata are typically used in scientific computations just for the initial data selection; at most, metadata about query results are recovered after executing the query and associated with its results by post-processing. In this way, a large body of information that could be relevant for interpreting query results goes unused during query processing. In this paper, we present ScQL, a new algebraic relational language whose operations apply to objects consisting of data–metadata pairs, preserving this one-to-one correspondence throughout the computation. We formally define each operation and describe an optimization, called meta-first, that may significantly reduce the query processing overhead by anticipating the use of metadata, so that only those input samples that contribute to the result samples are loaded into the execution environment. In ScQL, metadata have the same relevance as data and contribute to building query results; the resulting samples are thus systematically associated with metadata about either the specific input samples involved or about query processing, thereby yielding a new form of metadata provenance. We present many examples of the use of ScQL in several application domains, and we demonstrate the effectiveness of the meta-first optimization.
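As a rough illustration of the meta-first idea in this abstract, the following Python sketch tests a metadata predicate before any sample data is loaded, so only matching samples are read. The `Sample` class, the `select` function, and the loader callbacks are hypothetical names chosen for illustration; this is neither ScQL syntax nor the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Sample:
    """A data-metadata pair whose data part is loaded lazily (hypothetical model)."""
    sample_id: str
    metadata: Dict[str, str]
    loader: Callable[[], List[dict]]  # reads the (possibly large) data part


def select(samples, meta_pred, data_pred):
    """Meta-first selection: test metadata first, load data only on a match."""
    result = []
    for s in samples:
        if not meta_pred(s.metadata):
            continue  # the data of this sample is never loaded
        rows = [r for r in s.loader() if data_pred(r)]
        # The output keeps the one-to-one pairing of data and metadata.
        result.append((s.sample_id, dict(s.metadata), rows))
    return result


loaded = []  # records which samples were actually read


def make_loader(sample_id, rows):
    def load():
        loaded.append(sample_id)
        return rows
    return load


samples = [
    Sample("s1", {"tissue": "liver"}, make_loader("s1", [{"score": 5}, {"score": 9}])),
    Sample("s2", {"tissue": "brain"}, make_loader("s2", [{"score": 7}])),
]

out = select(samples, lambda m: m["tissue"] == "liver", lambda r: r["score"] > 6)
```

In this toy run only `s1` is ever loaded, and its metadata travels with the filtered rows in the result.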

2.
N. H. Gehani. Software, 1985, 15(6): 555-569
Types in programming languages cannot model many properties of real-world objects and quantities. Consequently, many errors resulting from the inconsistent usage of program objects representing real-world objects and quantities cannot be detected automatically. For example, the real variables PRICE and WEIGHT, representing the price of diesel fuel and the weight of a person, may be inadvertently added, giving a nonsensical value; such an error cannot be detected by a compiler. The programming language Ada introduces the concept of derived types to tackle this problem. An alternative solution is the incorporation of units of measure as a new data attribute. Derived types only partially solve the problem of detecting the inconsistent usage of objects, and they also disallow some valid usages of objects. Moreover, the solution is inelegant and inconvenient to use. Specification of units of measure, on the other hand, solves the problem elegantly and conveniently. The two solutions are compared and analysed. Several ways to implement units of measure in Ada are examined.
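The PRICE and WEIGHT example can be sketched by attaching a unit label to each value. The abstract discusses Ada; the Python below is only a minimal illustration, and the `Quantity` class is a hypothetical name. It also catches the mismatch at run time, whereas the abstract is concerned with detection by a compiler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """A number tagged with a unit label (hypothetical; no dimensional analysis)."""
    value: float
    unit: str  # e.g. "USD" or "kg"

    def __add__(self, other):
        if not isinstance(other, Quantity):
            return NotImplemented
        if self.unit != other.unit:
            # Adding a price to a weight is rejected instead of yielding nonsense.
            raise TypeError(f"cannot add {self.unit} to {other.unit}")
        return Quantity(self.value + other.value, self.unit)


price = Quantity(1.25, "USD")
weight = Quantity(70.0, "kg")

total = price + Quantity(0.75, "USD")  # same unit: allowed

try:
    price + weight  # different units: rejected
    mismatch_detected = False
except TypeError:
    mismatch_detected = True
```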

3.
4.
New technologies make possible highly motivating, dynamic games with different levels of interaction, which incorporate large amounts of data, information, procedures and values that are closely bound up with the social sciences.

5.
This paper explores the increasing trend towards the commodification of public research and development (R&D) and the impact of this on social wellbeing. In many developed countries, the changes introduced by governments to funding mechanisms for universities and public research institutions have led to a fundamental shift in the focus of public R&D. The focus has shifted from creating useful public, codifiable knowledge to creating a knowledge commodity driven by commercial imperatives. Although there may be an economic argument to be made for the virtues of such change, we argue here that the potential costs to social wellbeing have been largely, and dangerously, ignored.
Rebecca Boden

6.
A strategy for the design of social science software is outlined, concentrating particularly on the nature of packages, languages, data structures and characteristics of procedures.

7.
One likely consequence of recent developments in information technology is the greater availability to the public of the social data routinely collected by government agencies. It is argued that educators must anticipate these developments, which have implications for what is taught in schools and for methods of teaching. As an instance of the increasing accessibility of such data, work on making the General Household Survey data more widely available is described. The data are currently being used in teaching the social sciences to undergraduates, and will shortly also become available for use on microcomputers in schools.

8.
In this paper, we propose notions of equivalence and inclusion of fuzzy data in relational databases for measuring their semantic relationship. The fuzziness of data appears in attribute values in the form of possibility distributions, as well as in resemblance relations among attribute domain elements. An approach for evaluating semantic measures is presented. With this proposal, one can remove fuzzy data redundancy and define fuzzy functional dependency. © 2000 John Wiley & Sons, Inc.

9.
Artificial societies and generative social science   Total citations: 1, self-citations: 0, citations by others: 1
What is an artificial society? What can such models offer the social sciences in particular? We address these general questions, drawing brief illustrations from the specific artificial society we call “Sugarscape.”

10.
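Research on metadata in WebGIS   Total citations: 3, self-citations: 0, citations by others: 3
This paper discusses the definition of metadata and its role in WebGIS. Drawing on the development of the "Liaoning Province Statistical Information Aided Decision Support System", it also examines how metadata is represented and implemented in WebGIS from three perspectives: data storage, data transmission, and data presentation.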

11.
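Research on metadata extraction and export in CWM-based enterprise metadata integration   Total citations: 1, self-citations: 0, citations by others: 1
The Common Warehouse Metamodel (CWM) is a standard developed to ease the exchange of metadata between data warehouse and business analysis environments, and it has become a core component of the new Model Driven Architecture (MDA) strategy. Building on the technologies involved in CWM-based enterprise metadata integration, this paper focuses on the extraction and export of metadata during integration, and gives the corresponding export rules together with an application example.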

12.
There is a growing demand for more cost-efficient production processes in Statistical Institutes. One way to address this need is to equip Statistical Information Systems (SIS) with the ability to automatically produce statistical data and metadata of high quality and deliver them to the user via the Internet. Current approaches, although they provide for the storage of appropriate metadata, do not use process metadata for guiding the production process. In this paper we present an approach to creating SISs that permits metadata-guided statistical processing based on an object-based statistical metadata model. The model is not domain specific and can accommodate both microdata and macrodata. We have equipped the model with a set of transformations that can be used to automatically manipulate data and metadata. We show the applicability of the transformations with some examples using actual statistical data for R&D expenditures. Finally, we demonstrate how the presented framework can be exploited for the construction of a web site that offers ad hoc query capabilities to the users of statistical data.

13.
14.
We describe a data deduplication system for backup storage of PC disk images, named in-RAM metadata utilizing deduplication (IR-MUD). First, in-RAM hash granularity adaptation and miniLZO-based data compression are proposed to reduce the in-RAM metadata size and thereby the space overhead of the in-RAM metadata caches. Second, an in-RAM metadata write cache, as opposed to the traditional metadata read cache, is proposed to further reduce metadata-related disk I/O operations and improve deduplication throughput. During deduplication, the metadata write cache is managed following the LRU caching policy. For each manifest that is hit in the metadata write cache, an expensive manifest reloading operation from the disk is avoided. After deduplication, all the manifests in the metadata write cache are cleared and stored on the disk. Our experimental results using a 1.5 TB real-world disk image dataset show that 1) IR-MUD achieved about 95% size reduction for the deduplication metadata, with a small time overhead introduced, 2) when the metadata write cache was not utilized, with the same RAM space size for the metadata read cache, IR-MUD achieved a 400% higher RAM hit ratio and a 50% higher deduplication throughput, as compared with the classic Sparse Indexing deduplication system where no metadata utilization approaches are utilized, and 3) when the metadata write cache was utilized and enough RAM space was available, IR-MUD achieved a 500% higher RAM hit ratio compared with Sparse Indexing and a 70% higher deduplication throughput compared with IR-MUD with only a single metadata read cache. The in-RAM metadata harnessing and metadata write caching approaches of IR-MUD can be applied in most parallel deduplication systems for improving metadata caching efficiency.
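The behaviour of an LRU-managed metadata write cache, as outlined in this abstract, can be sketched roughly as follows. This is a simplified Python illustration rather than IR-MUD's actual design: the `ManifestWriteCache` class, the dictionary standing in for on-disk manifest storage, and the write-back on eviction are all assumptions made for the example.

```python
from collections import OrderedDict


class ManifestWriteCache:
    """LRU write cache for manifests (hypothetical sketch, not IR-MUD's code)."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk            # dict standing in for on-disk manifest storage
        self.cache = OrderedDict()  # manifest id -> list of chunk hashes
        self.reloads = 0            # counts manifest (re)loads from disk

    def get(self, manifest_id):
        if manifest_id in self.cache:
            # Hit: no reload from disk is needed; mark as most recently used.
            self.cache.move_to_end(manifest_id)
            return self.cache[manifest_id]
        # Miss: load the manifest from disk (or start an empty one).
        self.reloads += 1
        manifest = list(self.disk.get(manifest_id, []))
        self.cache[manifest_id] = manifest
        if len(self.cache) > self.capacity:
            # Evict the least recently used manifest and write it back
            # (write-back on eviction is an assumption of this sketch).
            old_id, old_manifest = self.cache.popitem(last=False)
            self.disk[old_id] = old_manifest
        return manifest

    def flush(self):
        """After deduplication, store all cached manifests on disk and clear."""
        self.disk.update(self.cache)
        self.cache.clear()


disk = {}
cache = ManifestWriteCache(capacity=2, disk=disk)
cache.get("m1").append("hash-a")  # miss
cache.get("m1").append("hash-b")  # hit, no reload
cache.get("m2").append("hash-c")  # miss
cache.get("m3").append("hash-d")  # miss, evicts m1 and writes it back
cache.flush()
```

Each hit updates the manifest in RAM without touching the disk; the disk is written only on eviction and at the final flush.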

15.
16.
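Developing a metadata standard for engineering chemistry databases   Total citations: 3, self-citations: 0, citations by others: 3
Metadata is data about data, and it plays an important role in data sharing. This paper analyzes the state of metadata development in China and abroad. Starting from the basic problems of engineering chemistry databases, taking into account the characteristics of engineering chemistry data, and with reference to metadata standards in common use in China and abroad, it proposes a metadata design scheme for engineering chemistry databases and describes the content and function of each part in some detail.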

17.
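Structural analysis of geoscience metadata and design of its management system   Total citations: 5, self-citations: 0, citations by others: 5
Based on an analysis of the Web sharing requirements of geoscience data and of its multidisciplinary nature, an extensible metadata structure for geoscience data is designed. It comprises three layers: geoscience core metadata, schema core metadata, and schema (specific) extension metadata. A metadata management system (MMS) for a geoscience data sharing platform was developed using the RDF/XML data model and methods recommended by the W3C. Application of the system verified the reliability and applicability of the metadata framework for geoscience data sharing.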

18.
19.
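This paper proposes the concept of database-operation automation metadata and summarizes its characteristics. From the perspective of information system development practice, it discusses five types of database-operation automation metadata found in information systems, and gives ways to represent and manage them.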

20.
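Semantic heterogeneity is a key problem to be solved in the information integration of heterogeneous databases. To give the tables and fields of relational databases semantic information, automatically annotating database metadata as semantic metadata has become a research hotspot. Based on semantic similarity computation over concept names and concept structures, a method for the automatic semantic annotation of database metadata is proposed. First, implicit semantic information is extracted from the metadata of a relational database and used to create a domain ontology; then the extracted metadata is automatically annotated by computing the semantic similarity between the metadata and the ontology entities. The proposed similarity algorithm takes into account the similarity of both concept names and structures, and adopts the necessary optimizations. Experiments show that the method achieves a high annotation accuracy and is an effective semantic annotation method.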
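The abstract does not give the similarity formulas, so the Python sketch below only illustrates the general idea of combining name similarity and structure similarity to pick an annotation. The string-ratio measure from `difflib`, the Jaccard overlap of field names, the equal weighting, and all function names are assumptions swapped in for illustration, not the authors' algorithm.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """String similarity of two names in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def structure_similarity(fields_a, fields_b):
    """Jaccard overlap of two field/property sets (assumed measure)."""
    a, b = set(fields_a), set(fields_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def annotate(table, fields, ontology, w_name=0.5):
    """Return the ontology concept with the highest combined similarity."""
    def score(item):
        concept, properties = item
        return (w_name * name_similarity(table, concept)
                + (1 - w_name) * structure_similarity(fields, properties))
    return max(ontology.items(), key=score)[0]


# Toy ontology: concept name -> property names (hypothetical data).
ontology = {
    "Customer": {"name", "address", "email"},
    "Order": {"date", "amount", "customer"},
}

best = annotate("customer_info", {"name", "address", "phone"}, ontology)
```

Here the table `customer_info` is annotated with the concept `Customer`, which wins on both the name and the structure term.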
