Similar literature
 20 similar documents found (search time: 31 ms)
1.
Metadata (i.e., data about data) of digital objects plays an important role in digital libraries and archives, and thus its quality needs to be maintained well. However, as digital objects evolve over time, their associated metadata evolves as well, causing a consistency issue. Since various functionalities of applications containing digital objects (e.g., digital libraries, public image repositories) are based on metadata, evolving metadata directly affects the quality of such applications. To make matters worse, modern data applications are often large-scale (having millions of digital objects) and are constructed by software agents or crawlers (thus often having automatically populated and erroneous metadata). In such an environment, it is challenging to quickly and accurately identify evolving metadata and fix it (if needed) while applications keep running. Despite the importance and implications of the problem, conventional solutions have been very limited. Most existing metadata-related approaches either focus on the model and semantics of metadata, or simply keep an authority file of some sort for evolving metadata, and never fully exploit its potential usage from the system point of view. In contrast, the question that we raise in this paper is: "when millions of digital objects and their metadata are given, (1) how to quickly identify evolving metadata in various contexts? and (2) once the evolving metadata are identified, how to incorporate them into the system?" The significance of this paper is that we investigate scalable algorithmic solutions for the identification of evolving metadata, emphasize the role of "systems" in maintenance, and argue that "systems" must keep track of metadata changes proactively and leverage the learned knowledge in their various services.

2.
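This paper studies efficient storage and indexing mechanisms for digital objects in digital library applications, and proposes and designs PuntTable, a data repository system for digital libraries. PuntTable stores and manages objects using a non-relational schema and supports queries by building indexes inside the data objects. PuntTable comprises two main modules: PuntStore, a data storage system with multiple storage engines, and PuntIndex, an indexing system that supports multiple indexing methods. PuntTable achieves high-throughput, low-latency object storage, and both the index and the content of a data object can be placed on the most suitable storage tier. PuntTable was tested and evaluated with data from a real digital library. In the test data sets, the items have different lengths, which brings the tests closer to real applications. The experimental results show that using different storage models for different data sets can significantly increase the throughput of the database system and effectively reduce latency.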

3.
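Most data currently published by digital libraries contains only information about book resources; no data is published that combines user attributes with book resources. Analysts therefore cannot extract more information from the published data, which hampers some scientific research. This paper establishes an anonymization method for jointly publishing user attributes and book information: first, all books are re-encoded using their book classification numbers; second, the whole data set is partitioned according to the sparsity of the re-encoded data; finally, a permutation method is applied within each partition to anonymize it. Experimental results show that the data in the final anonymized table is highly accurate and useful, and that the relationships between attributes can be seen intuitively through scatter plots, providing more useful information for scientific research.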
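The three steps in the abstract above (re-encode, partition, permute within each partition) can be illustrated with a minimal sketch. The abstract does not specify the sparsity-based partitioning rule, so grouping by a classification-code prefix is an assumed stand-in; the function name `anonymize`, the `prefix_len` parameter, and the sample records are all hypothetical.

```python
import random
from collections import defaultdict

def anonymize(records, prefix_len=1, seed=0):
    """Permute the class-code column inside each partition.

    records: list of (user_attribute, class_code) pairs.
    Partitions are formed from the first prefix_len characters of the
    class code (an assumption; the paper partitions by sparsity).
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for idx, (_, code) in enumerate(records):
        groups[code[:prefix_len]].append(idx)

    result = list(records)
    for indices in groups.values():
        # Shuffle the codes of this partition and reassign them to its rows,
        # which breaks the exact link between a user and a book.
        codes = [records[i][1] for i in indices]
        rng.shuffle(codes)
        for i, code in zip(indices, codes):
            result[i] = (records[i][0], code)
    return result

# Hypothetical sample data: (user attribute, book classification code)
records = [
    ("student", "TP311"), ("teacher", "TP391"), ("student", "O157"),
    ("staff", "O212"), ("teacher", "TP18"),
]
anonymized = anonymize(records)
```

Because codes only move within a partition, per-partition statistics of the book column are preserved while individual user-to-book links are obscured.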

4.
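The encapsulation of components makes component testing difficult, and the metadata that components currently carry is incomplete. To make full use of component metadata for automated component testing, rich component metadata is designed from the perspectives of component users and testers. For COM components, structural metadata is obtained automatically by accessing the type library; type information is extracted level by level and formally described in XML, yielding a test-support specification for COM components. Examples show that the method can manipulate COM components directly and obtain metadata automatically, which facilitates the automatic generation of test scripts.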

5.
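To extract useful data from massive data sets for users to analyze, this work studies data visualization and cluster analysis in depth and combines visualization with data mining. A visual data mining system is developed on the Java platform that presents data mining results to users as 3D scatter plots and parallel coordinate plots, so that users can see the whole data set at a glance and analyze both the distribution of each object's values for a given attribute and the relationships among attributes. The system thereby conveys data mining results effectively.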

6.
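Network graph visualization effectively shows the connections between network nodes and is widely used in many fields, such as social networks, knowledge graphs, and biological gene networks. As network data keeps growing, how to simplify the representation of large-scale network structure has become a research hotspot in graph visualization. Classical simplification methods mainly include graph sampling, edge bundling, and graph clustering; they reduce the visual clutter caused by large numbers of node and edge crossings and improve the efficiency with which users explore and understand large-scale network structure. However, these methods focus mainly on the topology of the graph and rarely consider or exploit the multidimensional attributes of the nodes of a multivariate graph. They struggle to extract and express semantic information, so they cannot help users understand the intrinsic relationship between the topology and the multidimensional attributes of a large-scale multivariate network, which makes such graphs hard to comprehend. This paper therefore proposes a semantics-enhanced simplified visual analysis method for large-scale multivariate graphs. First, the hierarchical structure of the network is extracted on the basis of a modularity-based graph clustering algorithm. Second, by computing and comparing the information entropy of the multidimensional attributes, the hierarchy is partitioned adaptively and the communities with the best attribute aggregation characteristics are selected. Then, multiple linked views with convenient interaction are designed to show the topology, hierarchical relationships, and attribute distributions of the communities, helping users analyze from different angles the role of multidimensional attributes in community formation and network evolution. Extensive experimental results show that the method effectively simplifies the visual representation of large-scale multivariate graphs, supports rapid analysis of the associative structure and semantic composition of large-scale multivariate graphs from different application domains, and is highly practical.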
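The entropy-based screening step in the abstract above can be illustrated with a small sketch. The abstract does not give the exact entropy measure or selection rule, so this uses plain Shannon entropy over one categorical attribute and picks the community with the lowest value; the function names and the sample communities are hypothetical.

```python
import math
from collections import Counter

def attribute_entropy(values):
    """Shannon entropy (in bits) of a list of categorical attribute values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def most_homogeneous(communities):
    """Return the name of the community whose attribute values have the lowest entropy."""
    return min(communities, key=lambda name: attribute_entropy(communities[name]))

# Hypothetical communities: node attribute values grouped by community id
communities = {
    "c1": ["biology", "biology", "biology", "physics"],
    "c2": ["biology", "physics", "chemistry", "math"],
}
best = most_homogeneous(communities)
```

Lower entropy means the attribute values inside a community are more concentrated, which is one plausible reading of "best attribute aggregation characteristics"; a multi-attribute version would need a rule for combining per-attribute entropies, which the abstract leaves open.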

7.
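In the numerical simulation of physical problems such as plasma dynamics and electromagnetic theory, simulation programs produce large amounts of scientific data with complex structure. On the one hand, the programs need to store data with high-efficiency I/O; on the other hand, the data needs to be easily exchanged and shared among programs. As the scale and complexity of the data keep increasing, the limitations of traditional data management become increasingly prominent. To address this, a data storage model for computational physics, the numerical simulation grid data model (JAD), is designed. It introduces a metadata management mechanism and abstracts and encapsulates the data objects of numerical simulation programs. A high-level I/O library (JADLib) is implemented on top of the HDF5 library; it integrates advanced data storage techniques and provides an intuitive, easy-to-use application programming interface (API), so that numerical simulation data is stored efficiently in a unified format. JADLib has been adopted in several numerical simulation programs in fields such as high-power microwaves and inertial confinement fusion, and is coupled with the metadata management system (JADIS) and the parallel visualization system (JaVis), so that users can browse, analyze, and visualize data directly with these systems, which promotes data sharing among applications.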

8.
In this paper, we present a prototype of an integrated data management system that can query, retrieve, and visualize large datasets in heterogeneous formats without requiring users to have knowledge of any other specific software. Our system has three distinguishing characteristics: (1) a modular structure and simple architecture, which make it easy and feasible for users to add new functions and features to the system; (2) a new search concept and method based on the bounding box and on dynamically delineated watershed boundaries from GIS (Geographic Information System); and (3) no requirement for knowledge or installation of any other complicated software. The architecture of our integrated data management system is based on a metadata approach, which consists of four components including a metadata mechanism and a Java-based application engine. The metadata mechanism, in conjunction with the Java-based application engine, allows users to access and retrieve diverse data formats and structures from many heterogeneous hydrological data sources. The visualization component of the system makes it possible for users to view their queried data before spending time retrieving them. The extensible and integrative characteristics of our system are illustrated by an example in which new and unique functions for data merging and GIS-based data querying are added to the system. Although the data sources and applications shown in this prototype system are related to the field of hydrology, the ideas, approaches, and system architecture are not domain-specific and can be applied to other fields as well.

9.
This paper presents an object-oriented hypermedia database framework for designing and building digital libraries, which are treated as enhanced hypermedia applications. It is based on combining and extending results from two domains: the navigation characteristics of hypertext systems, and the view mechanism and persistent storage management facility of an object-oriented database management system (OODBMS). As a result, users can alternate between two integrated types of interaction modes. The hypertext dimension of the framework allows navigation via static and dynamic hyperlinks; the object-oriented database support enables querying by content and metadata management. The framework also includes a digital library design methodology that guides the implementation of digital libraries over the OODBMS. This proposal integrates hypermedia and DL concepts into a database environment and is instantiated in the realm of geographic data.

10.
For digital libraries to thrive, the providers of information processing services must be able to evolve their systems autonomously. However, as the complexity of their offerings increases, software tools more sophisticated than existing Web facilities are needed. Distributed object technology may be the answer. The availability of high-volume, increasingly sophisticated information is making the need for metadata facilities more urgent. Traditional, library-based approaches break down when used in an advanced digital library. More modular mechanisms are needed, and the CORBA system is one approach. Digital libraries are affected at a deep technical level by the widely differing user traditions of Web users and library patrons. The challenge and opportunity of digital libraries will be the synthesis of these traditions. The authors set out to create a technical infrastructure to support the construction of digital libraries. In their view, a digital library comprises widely distributed resources that can be maintained autonomously by different organizations and will not require adherence to uniform interfaces.

11.
Much visualization research has focused on improving rendering quality and speed and on enhancing the perceptibility of features in the data. Recently, significant emphasis has been placed on focus+context (F+C) techniques (e.g., fisheye views and magnification lenses) for data exploration, in addition to viewing transformation and hierarchical navigation. However, most existing data exploration techniques rely on the manipulation of viewing attributes of the rendering system or optical attributes of the data objects, with users being passive viewers. In this paper, we propose a more active approach to data exploration, which attempts to mimic how we would explore data if we were able to hold it and interact with it in our hands. This involves allowing users to physically or actively manipulate the geometry of a data object. While this approach has traditionally been used in applications such as surgical simulation, where the original geometry of the data objects is well understood by the users, there are several challenges when it is generalized to applications such as flow and information visualization, where there is no common perception of the normal or natural geometry of a data object. We introduce a taxonomy and a set of transformations specifically for illustrative deformation in general data exploration. We present combined geometric or optical illustration operators for focus+context visualization, and examine the best means of preventing the deformed context from being misperceived. We demonstrate the feasibility of this generalization with examples of flow, information, and video visualization.

12.
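In a CAS system, this work proposes integrating and analyzing the storage metadata and content metadata of multimedia objects and then classifying and storing the objects according to their attribute values. For user convenience, Inotify is used to monitor the file system in real time and automatically extract each object's metadata. The metadata is saved both in standard XML files and in a MYSQL database, and every attribute is well reflected in the CAS system. Integrating and analyzing the automatically extracted metadata can greatly help users search and manage multimedia data more efficiently.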

13.
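A clustering algorithm for multidimensional data and its visualization   Total citations: 8, self-citations: 0, citations by others: 8
Ren Yonggong (任永功), Yu Ge (于戈). Chinese Journal of Computers (《计算机学报》), 2005, 28(11): 1861-1865
A clustering method based on partitioning primary and secondary attributes and a new data visualization method are proposed. First, the data set is clustered using the feature values of the primary and secondary attributes of the data. Then, based on the principle of projecting the color stimulus spectrum into the RGB color space, the clustering results are visualized using Maxwell's triangular planar chromaticity diagram from colorimetry. Experiments show that the algorithm is simple and easy to implement, and that the visualization helps users understand the data comprehensively and plays an important role in data prediction and decision-making.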

14.
Visualization of diversity in large multivariate data sets   Total citations: 1, self-citations: 0, citations by others: 1
Understanding the diversity of a set of multivariate objects is an important problem in many domains, including ecology, college admissions, investing, machine learning, and others. However, to date, very little work has been done to help users achieve this kind of understanding. Visual representation is especially appealing for this task because it offers the potential to allow users to efficiently observe the objects of interest in a direct and holistic way. Thus, in this paper, we attempt to formalize the problem of visualizing the diversity of a large (more than 1000 objects), multivariate (more than 5 attributes) data set as one worth deeper investigation by the information visualization community. In doing so, we contribute a precise definition of diversity, a set of requirements for diversity visualizations based on this definition, and a formal user study design intended to evaluate the capacity of a visual representation for communicating diversity information. Our primary contribution, however, is a visual representation, called the Diversity Map, for visualizing diversity. An evaluation of the Diversity Map using our study design shows that users can judge elements of diversity consistently and as accurately as, or more accurately than, when using the only other representation specifically designed to visualize diversity.

15.
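Design and implementation of a metadata retrieval system for a university digital library   Total citations: 10, self-citations: 0, citations by others: 10
Against the background of a university digital library construction project, this paper analyzes in detail the importance of metadata and the characteristics of the Dublin Core, proposes the overall design concept and a unified resource retrieval model for the university digital library's information retrieval system, and finally designs the metadata structure for digital resources and a metadata-based retrieval system.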

16.
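1 Introduction. In digital library research, descriptive metadata records the key feature values of actual collection objects. Because many information discovery methods do not search the collection objects directly but instead search their descriptive metadata, research on metadata storage schemes in digital libraries has considerable practical value and theoretical significance. Metadata in digital libraries usually takes the form of XML documents but also has some distinctive characteristics, and the performance of the metadata storage and query subsystem directly affects the overall performance of the whole digital library system. To date, related research in China and abroad has already provided some feasible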

17.
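Research on a metadata system for a spatial information digital library   Total citations: 3, self-citations: 0, citations by others: 3
Based on the technical architecture of distributed GIS systems and combining metadata technology with spatial data warehouse technology, this paper proposes an operational metadata model and data management model for a spatial digital library.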

18.
This paper presents a new interactive scatter plot visualization for multi-dimensional data analysis. We apply Rough Set Theory (RST) to reduce the visual complexity through dimensionality reduction. We use an innovative point-to-region mouse click concept to enable direct interactions with scatter points that are theoretically impossible. To show the decision trend, we use a virtual Z dimension to display a set of linear flows that approximate the decision trend. We conducted case studies to demonstrate the effectiveness and usefulness of our new technique for analyzing the properties of three popular data sets: wine quality, wages, and cars. The paper also includes a pilot usability study that evaluates parallel coordinate visualization against scatter plot matrix visualization with RST results.

19.
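Under ideal conditions without interference or noise, each ground object in multiband remote sensing data occupies a single point in the scatter plot. In practice, however, terrain and noise greatly alter the shape of the target distribution. The radiant energy received at the surface is computed from DEM (digital elevation model) data together with parameters such as the solar elevation angle and azimuth; known targets are then set up, and the scatter plot is used to analyze the positions the targets occupy under different reflectances. Noise of different intensities is then added, and the way the targets change in the scatter plot is analyzed in order to find effective features for separating targets.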
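The abstract above does not state the formula it uses to compute received radiation from the DEM and the solar angles. As a hedged illustration, the sketch below uses the standard cosine-of-incidence relation for a tilted surface, taking slope and aspect (which would normally be derived from the DEM) as direct inputs and ignoring atmospheric effects; the function name and sample angles are hypothetical.

```python
import math

def illumination(slope_deg, aspect_deg, sun_elev_deg, sun_azim_deg):
    """Cosine of the solar incidence angle on a tilted surface.

    The result is clamped at 0 for facets that face away from the sun.
    All angles are in degrees; aspect and azimuth use the same reference.
    """
    zenith = math.radians(90.0 - sun_elev_deg)
    slope = math.radians(slope_deg)
    rel_azimuth = math.radians(sun_azim_deg - aspect_deg)
    cos_i = (math.cos(zenith) * math.cos(slope)
             + math.sin(zenith) * math.sin(slope) * math.cos(rel_azimuth))
    return max(0.0, cos_i)

# Hypothetical facets under a sun at 45 degrees elevation, azimuth 135 degrees
flat = illumination(0.0, 0.0, 45.0, 135.0)
facing_sun = illumination(30.0, 135.0, 45.0, 135.0)
facing_away = illumination(30.0, 315.0, 45.0, 135.0)
```

Multiplying this factor by a target's reflectance gives a simple picture of why the same material spreads along a line in the scatter plot instead of occupying a single point: sunlit and shaded slopes return different radiance for identical reflectance.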

20.
Although the Metadata Editor is an important part of any digital library, it becomes fundamental in the presence of audiovisual content. This is because the metadata produced by automated support tools (such as speech recognizers and shot detection procedures) is error-prone and often needs correction. In addition, scenes are manually annotated. This paper describes Regia, a prototype application for manually editing metadata for audiovisual documents developed in the ECHO project. Regia allows the user to manually edit textual metadata and to hierarchically organize the segmentation of the audiovisual content. An important feature of this metadata editor is that it is not hard-wired to a particular metadata attribute set. To achieve this, the editor uses the XML schema of the metadata model as a configuration file.
Claudio Gennaro
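The idea of driving an editor from an XML schema instead of a hard-wired attribute set can be illustrated with a minimal sketch. The abstract does not describe how Regia reads its schema, so this is only an assumed approach: it lists the top-level `xs:element` declarations of a schema and treats them as editable fields. The schema content, `SCHEMA`, and `editable_fields` are hypothetical and are not taken from the ECHO metadata model.

```python
import xml.etree.ElementTree as ET

# Namespace prefix used by ElementTree for XML Schema elements
XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema standing in for a metadata model
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Title" type="xs:string"/>
  <xs:element name="Speaker" type="xs:string"/>
  <xs:element name="StartTime" type="xs:decimal"/>
</xs:schema>"""

def editable_fields(schema_text):
    """Return (name, type) pairs for the top-level element declarations."""
    root = ET.fromstring(schema_text)
    return [(el.get("name"), el.get("type")) for el in root.findall(f"{XS}element")]

fields = editable_fields(SCHEMA)
```

With this arrangement, changing the attribute set means editing the schema file only; the editor code that builds its form from `fields` stays the same.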
