Similar Documents
20 similar documents found (search time: 31 ms)
1.
The users of a content repository express the semantics they have in mind when defining content items and their properties and arranging them into a particular hierarchy. However, this valuable semantics is not formally expressed and therefore cannot be used to automatically discover meaningful relationships among content items. Although the need is apparent, explicating this semantics in a fully automated way poses several challenges: first, it is difficult to distinguish between data and metadata in the repository; second, not all defined metadata, such as file size or encoding type, contributes to the meaning. More importantly, for the developed solution to have practical value, it must respect the constraints of the content management system (CMS) industry: vendors cannot change repositories that are in production use, and they need a generic solution not tied to a specific repository architecture. In this article, we address these challenges through a set of tools that first semi-automatically explicate the content repository semantics into a knowledge base and then establish semantic bridges between this back-end knowledge base and the content repository. Because repository content is dynamic, changes in the repository semantics are reflected onto the knowledge base through the semantic bridges, so the explicated semantics stays current as new content is created. The tool set is complemented with a search engine that makes use of the explicated semantics.
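The idea of explicating an implicit content hierarchy into a knowledge base can be sketched as follows. This is a hypothetical illustration, not the authors' tool set: the tree layout, property names, and the list of "non-semantic" keys are all invented here.

```python
# Hypothetical sketch: walk a content tree and emit subject-predicate-object
# triples, skipping purely technical metadata (file size, encoding) that the
# abstract notes does not contribute to meaning.

NON_SEMANTIC = {"fileSize", "encoding", "mimeType"}   # invented filter list

def explicate(node, parent=None, triples=None):
    """Recursively turn a nested content tree into semantic triples."""
    if triples is None:
        triples = []
    name = node["name"]
    if parent is not None:
        triples.append((name, "isChildOf", parent))   # hierarchy becomes a relation
    for key, value in node.get("properties", {}).items():
        if key not in NON_SEMANTIC:                   # drop technical metadata
            triples.append((name, key, value))
    for child in node.get("children", []):
        explicate(child, name, triples)
    return triples

# toy repository fragment
repo = {
    "name": "Articles",
    "children": [{
        "name": "intro.xml",
        "properties": {"author": "Alice", "fileSize": "12K", "topic": "CMS"},
    }],
}
```

A semantic bridge in this spirit would re-run `explicate` on changed subtrees and merge the resulting triples into the knowledge base.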

2.
Digital repositories must periodically check the integrity of stored objects to assure users of their correctness. Prior solutions calculate integrity metadata and require the repository to store it alongside the actual data objects. To safeguard and detect damage to this metadata, prior solutions rely on widely visible media (unaffiliated third parties) to store and serve back digests of the metadata so it can be verified as intact. However, they do not address recovery of the integrity metadata in case of damage or adversarial attack. We introduce IntegrityCatalog, a novel software system that can be integrated into any digital repository. It collects all integrity-related metadata in a single component and treats it as first-class objects, managing both its integrity and its preservation. We introduce a treap-based persistent authenticated dictionary managing arbitrary-length key/value pairs, which we use to store all integrity metadata, accessible simply by object name. Additionally, IntegrityCatalog is a distributed system that includes a network protocol managing both corruption detection and preservation of this metadata, using administrator-selected network peers in two possible roles. Verifiers store and offer attestations on digests and have minimal storage requirements, while preservers efficiently synchronize a complete copy of the catalog to assist in recovery in case of a detected catalog compromise on the local system. We present our approach in developing the prototype implementation, measure its performance experimentally, and demonstrate its effectiveness in real-world situations. We believe the implementation techniques of our open-source IntegrityCatalog will be useful in the construction of next-generation digital repositories.
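The core idea of a verifier attesting to a single catalog digest can be shown with a much simpler construction than the paper's treap-based authenticated dictionary. The sketch below is hypothetical (the chaining scheme and domain separator are invented): it folds all key/value pairs into one digest that a remote verifier could store and later compare.

```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def catalog_digest(catalog):
    """Chain-hash the catalog's key/value pairs in sorted key order,
    producing one digest a verifier peer could store and attest to."""
    acc = b"catalog-v0"                       # arbitrary, invented domain separator
    for key in sorted(catalog):               # sorting makes the digest canonical
        leaf = _h(key.encode() + b"\x00" + catalog[key].encode())
        acc = _h(acc + leaf)
    return acc.hex()
```

Any tampering with a stored checksum changes the digest, so a mismatch against the verifier's copy signals a compromised catalog; a real authenticated dictionary additionally supports efficient per-key membership proofs, which this flat chain does not.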

3.
The Digital Repository of Ireland (DRI) is Ireland’s national trusted digital repository for the social and cultural, historical and contemporary data held by Irish institutions. DRI provides users with a bilingual (Irish and English) user interface at all user access levels, and provides innovative ways to process and display bilingual metadata. This article details our experience in enriching the bilingual metadata and developing the bilingual features of the repository. We present solutions to some of the linguistic and technical challenges we faced and provide recommendations to developers and archivists on how best to prepare bilingual content for contemporary archival repositories.

4.
The success of Web services technology has put topics such as software reuse and discovery back on the agenda of software engineers. While there are several efforts toward automating Web service discovery and composition, many developers still search for services in online Web service repositories and then combine them manually. Our analysis of these online repositories shows that, unlike traditional software libraries, they rely on little metadata to support service discovery. We believe the major cause is the difficulty of automatically deriving metadata to describe rapidly changing Web service collections. In this paper, we discuss the major shortcomings of state-of-the-art Web service repositories and, as a solution, report on ongoing work and ideas on how to use techniques developed in the context of the Semantic Web (ontology learning, matching, metadata-based presentation) to improve the current situation.

5.
The Open Archives Initiative (OAI) allows both libraries and museums to create and share their own low-cost digital libraries (DLs). OAI DLs are based on the OAI-PMH protocol, which, although consolidated as a standard for disseminating metadata, does not address digital preservation or availability of content, essential requirements in this type of system. Building new mechanisms that guarantee such improvements at no or low cost is a great challenge. This article proposes a distributed archiving system based on a P2P network that allows OAI-based libraries to replicate digital objects to ensure their reliability and availability. The proposed system preserves and extends the current OAI-PMH protocol and is designed as a set of OAI repositories, where each repository has an independent failure probability assigned to it. Items are inserted with a target reliability that is satisfied by replicating them in subsets of repositories. Communication between the nodes (repositories) of the network is organized in a distributed hash table, and multiple hash functions are used to select the repositories that keep the replicas of each stored item. The OAI characteristics combined with a structured P2P digital preservation system allow the construction of a reliable and fully distributed digital library. The archiving system has been evaluated through experiments in a real environment, and the OAI-PMH extension was validated by the implementation of a proof-of-principle prototype.
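The replica-placement idea, using multiple hash functions to choose which repositories hold each item's copies, can be sketched as follows. This is a hypothetical illustration: the salting scheme and node names are invented, and a real DHT would also handle node joins and failures.

```python
import hashlib

def replica_nodes(item_id, repositories, k=3):
    """Pick k distinct repositories for an item's replicas by applying
    a family of salted hash functions to the item identifier."""
    chosen = []
    k = min(k, len(repositories))
    salt = 0
    while len(chosen) < k:
        # each salt value acts as an independent hash function
        digest = hashlib.sha256(f"{salt}:{item_id}".encode()).hexdigest()
        node = repositories[int(digest, 16) % len(repositories)]
        if node not in chosen:                 # skip collisions, try next salt
            chosen.append(node)
        salt += 1
    return chosen
```

Because the placement is a pure function of the item identifier, any node can recompute where an item's replicas live without central coordination.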

6.
Personalization is increasingly vital, especially for enterprises trying to reach their customers. The key challenge in supporting personalization is the need for rich metadata: metadata about structural relationships, subject/concept relations between documents, and cognitive metadata about documents (e.g. the difficulty of a document). Manual annotation of large knowledge bases with such rich metadata does not scale, and automatic mining of cognitive metadata is challenging because it is very difficult to automatically understand the intellectual knowledge underlying a document. At the same time, Web content is increasingly multilingual, since a growing share of the data generated on the Web is non-English. Current metadata extraction systems are generally based on English content, and this must change to adapt to the evolving dynamics of the Web. To alleviate these problems, we introduce a novel automatic metadata extraction framework, based on a novel fuzzy method for automatic cognitive metadata generation, which uses different document parsing algorithms to extract rich metadata from multilingual enterprise content using the newly developed DocBook, Resource Type and Topic ontologies. Since the metadata generation process operates on DocBook-structured enterprise content, our framework targets enterprise documents and content loosely based on DocBook-style formatting. DocBook is a common documentation format for formally producing corporate data and is adopted by many enterprises. The proposed framework is illustrated and evaluated on the English, German and French versions of the Symantec Norton 360 knowledge bases. A user study showed that the proposed fuzzy-based method generates reasonably accurate values, with an average precision of 89.39% on the metadata values of document difficulty, document interactivity level and document interactivity type.
The proposed fuzzy inference system improves on a rule-based reasoner for difficulty metadata extraction (~11% enhancement). In addition, user-perceived metadata quality scores were high (mean of 5.57 out of 6), and automated metadata analysis showed that the extracted metadata is of high quality and suitable for personalized information retrieval.
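As a rough illustration of fuzzy inference for a cognitive attribute like document difficulty, the toy below uses Mamdani-style min-AND rules. It is not the paper's rule base: the two input features, the triangular membership parameters, and the rules themselves are all invented for illustration.

```python
def tri(x, a, b, c):
    """Triangular fuzzy membership: 0 outside [a, c], peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def classify(avg_sentence_len, rare_term_ratio):
    """Two invented rules, combined with min as the fuzzy AND:
       long sentences AND many rare terms -> hard
       short sentences AND few rare terms -> easy"""
    hard = min(tri(avg_sentence_len, 15, 30, 45),
               tri(rare_term_ratio, 0.2, 0.6, 1.0))
    easy = min(tri(avg_sentence_len, 0, 8, 18),
               tri(rare_term_ratio, 0.0, 0.1, 0.3))
    return "hard" if hard > easy else "easy"
```

A real system would defuzzify the rule activations into a graded difficulty score rather than picking the strongest label.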

7.
Metadata (i.e., data describing data) of digital objects plays an important role in digital libraries and archives, and thus its quality needs to be maintained well. However, as digital objects evolve over time, their associated metadata evolves as well, causing consistency issues. Since many functionalities of applications containing digital objects (e.g., a digital library or a public image repository) are based on metadata, evolving metadata directly affects the quality of such applications. To make matters worse, modern data applications are often large-scale (holding millions of digital objects) and are populated by software agents or crawlers (and thus often contain automatically generated, erroneous metadata). In such an environment, it is challenging to quickly and accurately identify evolving metadata and fix it (if needed) while applications keep running. Despite the importance and implications of the problem, conventional solutions have been very limited. Most existing metadata-related approaches either focus on the model and semantics of metadata, or simply keep an authority file of some sort for evolving metadata, and never fully exploit its potential from the system point of view. The questions we raise in this paper are: when millions of digital objects and their metadata are given, (1) how can evolving metadata be quickly identified in various contexts? and (2) once evolving metadata are identified, how can they be incorporated into the system? The significance of this paper is that we investigate scalable algorithmic solutions for identifying evolving metadata and emphasize the role of systems in maintenance, arguing that systems must track metadata changes proactively and leverage the learned knowledge in their various services.

8.
Semantic Web and grid technologies offer a promising approach to semantic information retrieval over heterogeneous document repositories. In this paper the authors describe the design and implementation of an Ontology Server (OS) component to be used in a distributed content management grid system. Such a system can be used to build collections of document repositories that are mutually interoperable at the semantic level. From the content point of view, the distributed system is built as a collection of multimedia document repository nodes glued together by an OS. A set of methodologies and tools is developed to organize the knowledge space around the notion of a contents community: each content provider publishes a set of ontologies to collect metadata, organized and published through a knowledge community built on top of the OS. These methodologies were deployed in a prototype connecting about 20 museums in the city of Naples (Italy).

9.
Hierarchical representations are common in digital repositories, yet they are not always fully leveraged in online search interfaces. This work describes ResultMaps, which combine hierarchical treemap representations with query-driven digital library search engines. We describe two lab experiments, which find that ResultMap users perform significantly better than a control condition on some subjective measures, and we find evidence that ResultMaps have ancillary benefits via increased understanding of some aspects of repository content. The ResultMap system and experiments contribute an understanding of the benefits, direct and indirect, of the ResultMap approach to repository search visualization.

10.
11.
Learning object repositories (LORs) are digital collections of educational resources and/or metadata aimed at facilitating the reuse of materials worldwide. In open repositories, resources are made available at no cost, representing a case of information sharing with an implicit and diffuse social context. In such settings, quality control is often based on some form of community filtering, which provides a reliable basis for ranking resources once a repository reaches a critical mass of users. However, of the numerous repository initiatives and projects, many never reached the degree of actual usage and growth needed to be sustainable in the long term. Consequently, finding models for sustainable collections is a key issue in repository research, and the main problem behind it is understanding the evolution of successful repositories. This in turn requires experimental models of user behavior that are coherent with the available evidence on repository structure and growth patterns. This paper provides a partial model of such behavior based on existing reported evidence and on the examination of patterns in a large, mature repository. Agent-based simulation was chosen to allow contrasting configurations with different parameters. Simulations were devised with the RePast framework, and the resulting model implementation constitutes an initial baseline for future studies aimed at contrasting empirical data on repository usage with their community setting. The described model accounts for known user contribution patterns and is coherent with the implicit social network structure found in an existing large LOR.

12.
Automatic evaluation of metadata quality in digital repositories
Owing to recent developments in automatic metadata generation and interoperability between digital repositories, the production of metadata now vastly surpasses manual quality control capabilities. Abandoning quality control altogether is problematic, because low-quality metadata compromises the effectiveness of the services repositories provide to their users. To address this problem, we present a set of scalable quality metrics for metadata based on the Bruce & Hillmann framework for metadata quality control. We perform three experiments to evaluate the metrics: (1) the degree of correlation between the metrics and manual quality reviews, (2) their discriminatory power between metadata sets, and (3) their usefulness as low-quality filters. Through statistical analysis, we found that several metrics, especially Text Information Content, correlate well with human evaluation, and that the average of all the metrics is roughly as effective as people at flagging low-quality instances. The implications of this finding are discussed. Finally, we propose possible applications of the metrics to improve tools for the administration of digital repositories.
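One plausible reading of a Text Information Content metric, the average self-information of a record's terms under corpus-estimated term probabilities, can be sketched as below. This is a hypothetical reconstruction; the paper's exact formula and smoothing may differ.

```python
import math
from collections import Counter

def text_information_content(record_text, corpus_texts):
    """Average self-information -log2 p(term) of a record's terms,
    with probabilities estimated from corpus term frequencies.
    Records full of rare, specific terms score higher than records
    of generic boilerplate terms."""
    counts = Counter(t for doc in corpus_texts for t in doc.lower().split())
    total = sum(counts.values())
    terms = record_text.lower().split()
    if not terms or total == 0:
        return 0.0
    # unseen terms get a count of 1 as crude smoothing
    return sum(-math.log2(counts.get(t, 1) / total) for t in terms) / len(terms)
```

Used as a low-quality filter, records scoring below a threshold would be flagged for manual review.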

13.
Having indexed much of the "surface" Web, search engines are now using various approaches to index the "deep" Web. At the same time, institutional repositories and digital libraries are adopting the open archives initiative protocol for metadata harvesting (OAI-PMH) to expose their holdings. The authors harvested nearly 10 million records from OAI-PMH repositories. From these records, they extracted 3.3 million unique resource URLs and then conducted searches on samples from this collection to determine how much of the OAI-PMH corpus the three major search engines have indexed.
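Extracting resource URLs from harvested records, as the authors did at scale, can be sketched with the standard library. The sample response below is invented; a real harvester would issue `verb=ListRecords&metadataPrefix=oai_dc` requests against each repository's base URL and page through resumption tokens.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def extract_urls(listrecords_xml):
    """Pull http(s) dc:identifier values out of an OAI-PMH ListRecords response."""
    root = ET.fromstring(listrecords_xml)
    urls = []
    for ident in root.iter(DC + "identifier"):
        # dc:identifier may also hold DOIs, handles, etc.; keep only URLs
        if ident.text and ident.text.startswith(("http://", "https://")):
            urls.append(ident.text)
    return urls

# invented sample response fragment
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>Some paper</dc:title>
        <dc:identifier>https://example.org/paper/1</dc:identifier>
        <dc:identifier>doi:10.1234/abcd</dc:identifier>
      </oai_dc:dc>
    </metadata></record>
  </ListRecords>
</OAI-PMH>"""
```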

14.
XML plays an important role as the standard language for representing structured data for the traditional Web, and hence many Web-based knowledge management repositories store data and documents in XML. If semantics about the data are formally represented in an ontology, then it is possible to extract knowledge: This is done as ontology definitions and axioms are applied to XML data to automatically infer knowledge that is not explicitly represented in the repository. Ontologies also play a central role in realizing the burgeoning vision of the semantic Web, wherein data will be more sharable because their semantics will be represented in Web-accessible ontologies. In this paper, we demonstrate how an ontology can be used to extract knowledge from an exemplar XML repository of Shakespeare’s plays. We then implement an architecture for this ontology using de facto languages of the semantic Web including OWL and RuleML, thus preparing the ontology for use in data sharing. It has been predicted that the early adopters of the semantic Web will develop ontologies that leverage XML, provide intra-organizational value such as knowledge extraction capabilities that are irrespective of the semantic Web, and have the potential for inter-organizational data sharing over the semantic Web. The contribution of our proof-of-concept application, KROX, is that it serves as a blueprint for other ontology developers who believe that the growth of the semantic Web will unfold in this manner.
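The kind of inference described, where ontology axioms derive knowledge not explicitly represented in the XML repository, can be illustrated with a toy forward-chaining rule over triples. The rule and the facts are invented for illustration and are not taken from the KROX application.

```python
def infer(triples):
    """Forward-chain one invented rule to a fixpoint:
       (X authorOf Y) and (Y inGenre G)  =>  (X writesGenre G)"""
    facts = set(triples)
    while True:
        derived = {(s, "writesGenre", g)
                   for (s, p, y) in facts if p == "authorOf"
                   for (y2, q, g) in facts if q == "inGenre" and y2 == y}
        if derived <= facts:        # nothing new: fixpoint reached
            return facts
        facts |= derived
```

In a real Semantic Web stack this rule would be written in a language such as RuleML or SWRL and evaluated by a reasoner rather than hand-coded.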

15.
An important aspect of quality assurance for large component repositories is ensuring the logical coherence of component metadata, and to this end one needs to identify incoherences as early as possible. Some relevant classes of problems can be formulated in terms of properties of the future repositories into which the current repository may evolve. However, checking such properties on all possible future repositories requires a way to construct a finite representation of the infinite set of potential futures. This work presents a class of properties for which this can be done. We illustrate the practical usefulness of the approach with two quality assurance applications: (i) establishing the amount of "forced upgrades" induced by introducing new versions of existing components into a repository, and (ii) identifying outdated components that are currently not installable and need to be upgraded in order to become installable again. For both applications we provide experience reports obtained on the Debian free software distribution.
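A naive version of the installability check behind application (ii) might look like the sketch below. It is deliberately simplified: real tools for distributions like Debian must also handle version constraints, alternatives, and conflicts, none of which appear here.

```python
def installable(component, repo):
    """repo maps a component name to the set of names it depends on.
    A component is installable here iff all transitive dependencies
    exist in the repository (versions and conflicts are ignored)."""
    seen, stack = set(), [component]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current not in repo:
            return False          # a dependency is missing from the repository
        stack.extend(repo[current])
    return True
```

Running this check against each candidate future repository is what makes a finite representation of the possible futures valuable.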

16.
17.
From unstructured data to actionable intelligence
Rao, R. IT Professional, 2003, 5(6): 29-35
There's content everywhere, but not the information you need. Content analysis can organize a pile of text into a richly accessible repository. This article explains two key technologies for generating metadata about content: automatic categorization and information extraction. These technologies, and the applications that metadata makes possible, can transform an organization's reservoir of unstructured content into a well-organized repository of knowledge. With metadata available, a company's search system can move beyond simple dialogs to richer means of access that work in more situations. Information visualization, for example, uses metadata and our innate visual abilities to improve access. Besides better access, metadata enables intelligent switching in the content flows of various organizational processes, for example making it possible to automatically route the right information to the right person. A third class of metadata applications involves mining text to extract features for analysis using the statistical approaches typically applied to structured data. For example, if you turn the text fields in a survey into data, you can then analyze the text along with the other data fields. All these metadata-powered applications can improve your company's use of its information resources.
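A minimal sketch of automatic categorization, one of the two technologies named above, is keyword-set overlap. The categories and keyword lists here are invented; production categorizers use statistical or learned models rather than hand-written lists.

```python
# invented taxonomy for illustration
CATEGORIES = {
    "finance": {"invoice", "payment", "budget"},
    "hr": {"hiring", "salary", "vacation"},
}

def categorize(text):
    """Assign the category whose keyword set overlaps the text the most."""
    words = set(text.lower().split())
    scores = {cat: len(words & keywords) for cat, keywords in CATEGORIES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "uncategorized"
```

The category label produced this way is exactly the kind of metadata that enables routing, the "intelligent switching" the article describes.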

18.
Knowledge, 2006, 19(1): 1-8
In large organizations, managing large amounts of knowledge is a common problem. This knowledge is usually available in a distributed environment, in structured or unstructured form, and often it is not known exactly where it is located or how to retrieve it in flexible ways. This paper describes an architecture to manage typical activities of an organization such as our University. During system analysis and specification, we had to collect a great deal of information about the structure and content of our organization, information available in various formats and media and not always automatically collectable. Hence, we designed a document-based software architecture to support systems where formalizing information repositories, standardizing information location, and managing every aspect of distributed contexts are crucial needs. We discuss whether a centrally organized knowledge base is the right solution or whether a distributed one is a better choice. Since the documents managed in any organization are usually available in structured form, we foresee heavy use of XML documents and standard metadata definitions.

19.
Research on metadata interoperability in data warehouses
This paper introduces metadata in data warehouses, focusing on metadata management and the current standardization process, and proposes an approach that builds an enterprise-level metadata repository and adopts metadata standards to achieve metadata interoperability.

20.
M. P. Decision Support Systems, 2003, 35(4): 467-486
Knowledge repositories have been implemented in many organizations, but they often suffer from non-use. This research considers two key design factors that cause non-use: the extra burden on users to document knowledge in the repository, and the lack of a standard knowledge structure that facilitates knowledge sharing among users with different perspectives. We propose a design of a knowledge management system called KnowledgeScope that addresses these problems through (1) an integrated workflow support capability that captures and retrieves knowledge as an organizational process proceeds, i.e., within the context in which it is created and used, and (2) a process meta-model that organizes that knowledge and context in a knowledge repository. In this paper, we describe this design and report the results from implementing it in a real-life organization.


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号