Similar Documents
1.
Efficient file searching is an essential feature of P2P systems. While many current approaches use brute-force techniques to search for files by meta-information (file names, extensions or user-provided tags), there is growing interest in techniques that allow content-based search in P2P systems. Recently, clustering techniques have been used for searching text documents to increase the efficiency of document discovery and retrieval. Integrating such techniques into P2P systems is important to enhance searching in P2P file-sharing systems. While some effort has gone into content-based search for text documents in P2P systems, there has been little research on applying these techniques to multimedia content in P2P systems. In this paper, we introduce two P2P content-based clustering techniques for multimedia documents. These techniques are adaptations of the existing Class-based Semantic Search algorithm for text documents. The proposed algorithms have been integrated into a JXTA-based Overlay P2P platform, and evaluation results are provided. JXTA-Overlay, together with the considered clustering techniques, is thus well suited to developing P2P multimedia applications that require efficient searching of multimedia content in peer nodes.
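The abstract does not spell out how the Class-based Semantic Search adaptation routes queries. As a rough illustration only, the sketch below assumes each peer summarizes its multimedia feature vectors by a centroid and a query is forwarded to the peers whose centroid is most similar under cosine similarity. The names `centroid`, `cosine`, `route_query` and the two-dimensional features are hypothetical; this is a simplified centroid-routing idea, not the CSS algorithm itself.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of equal-length feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


def route_query(query: Vector, peer_docs: Dict[str, List[Vector]], k: int = 2) -> List[str]:
    """Forward the query to the k peers whose content summary is most similar (hypothetical)."""
    summaries = {peer: centroid(docs) for peer, docs in peer_docs.items() if docs}
    ranked = sorted(summaries, key=lambda peer: cosine(query, summaries[peer]), reverse=True)
    return ranked[:k]
```

In a scheme like this, peers would exchange only the compact summaries, which is what makes content-based routing cheaper than flooding a query to every peer.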

2.
3.
Gou Mengluo, Computer Security (计算机安全), 2014, (5): 12-13, 18
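With the rapid development of the Internet and the growing adoption of office automation, PDF (Portable Document Format) files have become the open global standard for distributing electronic documents. The high practicality and broad compatibility of PDF documents also make them an effective carrier for targeted phishing attacks. Because malicious code can severely damage computers, detecting and blocking PDF documents that contain malicious code has become an increasingly important goal in computer security. By extracting feature data from documents, this paper proposes a malicious-PDF detection framework based on machine learning algorithms and validates the effectiveness of its detection model experimentally.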
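The abstract does not list the extracted features or the learning algorithm. A minimal sketch, assuming keyword counts of PDF name objects often associated with active content (in the spirit of tools such as PDFiD) as features, and a nearest-centroid classifier as a stand-in for the unspecified machine-learning model. The keyword list, the `NearestCentroid` class and the toy training vectors are illustrative assumptions, not the paper's data or method.

```python
import math
import re
from typing import Dict, List, Sequence

# Assumed feature set: PDF name objects often associated with active content.
KEYWORDS = ["/JS", "/JavaScript", "/OpenAction", "/AA", "/Launch", "/EmbeddedFile", "/AcroForm"]


def extract_features(pdf_bytes: bytes) -> List[int]:
    """Count each keyword as a whole PDF name token in the raw file bytes."""
    features = []
    for kw in KEYWORDS:
        # The lookahead stops "/JS" from matching the prefix of a longer name.
        pattern = re.escape(kw.encode("ascii")) + rb"(?![A-Za-z0-9])"
        features.append(len(re.findall(pattern, pdf_bytes)))
    return features


class NearestCentroid:
    """Toy stand-in for the unspecified ML model: label of the closest class mean."""

    def fit(self, vectors: Sequence[Sequence[float]], labels: Sequence[str]) -> "NearestCentroid":
        groups: Dict[str, List[Sequence[float]]] = {}
        for vec, label in zip(vectors, labels):
            groups.setdefault(label, []).append(vec)
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, vector: Sequence[float]) -> str:
        return min(self.centroids, key=lambda lbl: math.dist(vector, self.centroids[lbl]))
```

Raw byte counting misses obfuscation such as hex-escaped names (`/J#61vaScript`) and content hidden in compressed object streams, so a practical detector would need to parse the PDF first.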

4.
5.
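With the continuing development of computer technology and network systems, informatization plays an increasingly important role in hospital archives management (this paper focuses mainly on administrative records). Information-based management of hospital archives can improve work efficiency, enable information sharing and reduce costs, allowing hospital management to make full use of archival information resources and providing a solid platform for hospital quality evaluation, clinical care, nursing, teaching, research and paper writing. However, the informatization of hospital administrative records also faces practical problems: limited confidentiality of electronic files; unreliable system stability and security; the diversity, complexity and specialization of hospital documents; and archive staff whose skills fall short of professional and IT requirements. Solving these problems requires starting from unified standards, adopting a proactive management approach, improving the skills of archive staff, paying attention to network maintenance and security configuration, and maintaining broad contacts between archive departments.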

6.
Collaborative editing enables a group of people to edit documents together over a computer network. Customising the collaborative environment for different subcommunities of users at different points in time is an important issue, and the document model is an important factor in achieving such customisation. We have chosen a tree representation that encompasses a large class of documents, such as text, XML and graphical documents, and here we propose a multi-level editing approach for maintaining consistency over hierarchical documents. The multi-level editing approach logs each edit operation with the node it refers to. Keeping operations associated with the tree nodes they refer to supports tracking user activity on the various units of the document, which facilitates computing awareness information and handling conflicting changes to those units. Moreover, the approach is more efficient than existing approaches that represent documents as a linear structure. The multi-level editing approach applies any linear merging algorithm recursively over the document structure, and we show how it was applied to both real-time and asynchronous modes of collaboration.
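A minimal sketch of the multi-level idea, assuming each tree node keeps its own operation log and that any linear merge can be plugged in per node. `Node`, `merge_tree`, `editors` and the placeholder `append_new` merge (which simply appends unseen remote operations and performs no operational transformation) are hypothetical illustrations; a real system would substitute an actual linear merging algorithm.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Op = Tuple[str, str]  # hypothetical encoding: (user, operation description)
LinearMerge = Callable[[List[Op], List[Op]], List[Op]]


@dataclass
class Node:
    node_id: str
    log: List[Op] = field(default_factory=list)  # operations that refer to this node
    children: Dict[str, "Node"] = field(default_factory=dict)


def append_new(local: List[Op], remote: List[Op]) -> List[Op]:
    """Placeholder linear merge: keep local order, append unseen remote ops."""
    return local + [op for op in remote if op not in local]


def merge_tree(local: Node, remote: Node, linear_merge: LinearMerge) -> None:
    """Run the linear merge on this node's log, then recurse into matching children."""
    local.log = linear_merge(local.log, remote.log)
    for child_id, remote_child in remote.children.items():
        if child_id in local.children:
            merge_tree(local.children[child_id], remote_child, linear_merge)
        else:
            local.children[child_id] = remote_child  # subtree created remotely


def editors(node: Node) -> Set[str]:
    """Awareness information: users who edited this node or any descendant."""
    users = {user for user, _ in node.log}
    for child in node.children.values():
        users |= editors(child)
    return users
```

Because logs are kept per node, an awareness or conflict query touches only the subtree concerned instead of scanning a single linear history.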

7.
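As the number and variety of files that users store and use grow rapidly, existing file storage systems increasingly fail to manage this information effectively. Traditional file systems follow a strict hierarchy: files are organized as a tree, and users can access a file only through a single storage path. To address these shortcomings, we designed and developed VFSS, which makes full use of the metadata of stored files, combines the file storage system with database technology, and organizes files as a network rather than a tree. VFSS provides rich user interfaces while still supporting traditional file system operations.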
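The abstract gives no schema for VFSS. A minimal sketch, assuming file attributes are stored as key/value rows in SQLite so that one file can be reached through several attribute combinations instead of a single directory path. The tables and the functions `open_catalog`, `add_file` and `find` are hypothetical.

```python
import sqlite3
from typing import List


def open_catalog() -> sqlite3.Connection:
    """In-memory catalog: one row per file plus any number of attribute rows."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT UNIQUE, size INTEGER);
        CREATE TABLE attrs (file_id INTEGER REFERENCES files(id), key TEXT, value TEXT);
        CREATE INDEX attrs_kv ON attrs (key, value);
    """)
    return db


def add_file(db: sqlite3.Connection, path: str, size: int, **attrs: str) -> int:
    cur = db.execute("INSERT INTO files (path, size) VALUES (?, ?)", (path, size))
    file_id = cur.lastrowid
    db.executemany("INSERT INTO attrs VALUES (?, ?, ?)",
                   [(file_id, key, value) for key, value in attrs.items()])
    return file_id


def find(db: sqlite3.Connection, **criteria: str) -> List[str]:
    """Paths of files matching every key=value pair: a 'view' built from metadata."""
    sql = "SELECT f.path FROM files f"
    params: List[str] = []
    for i, (key, value) in enumerate(criteria.items()):
        # Only the alias index is formatted into SQL; keys and values are bound parameters.
        sql += f" JOIN attrs a{i} ON a{i}.file_id = f.id AND a{i}.key = ? AND a{i}.value = ?"
        params += [key, value]
    return [row[0] for row in db.execute(sql + " ORDER BY f.path", params)]
```

The same file appears under every attribute combination it satisfies, which is the "network" organization the abstract contrasts with a single tree path.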

8.
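This paper outlines the modular system architecture, with compute-node clusters and storage systems as separate modules, that is commonly adopted in today's large data centers, and describes the main cluster systems deployed on each module. It analyzes the feasibility of storing independent structured data locally on compute nodes, presents a basic system framework, and evaluates its value from the perspective of total cost of ownership (TCO). Considering the characteristics of raw data in high-energy physics research, it argues that storing data locally on the nodes improves overall utilization, and it identifies the design points of the key component, the file metadata management system. It analyzes three schemes for integrating the file metadata management system with the PBS batch job system, gives a detailed design of the first scheme, and describes the corresponding changes in how users submit jobs. The file metadata management system was initially deployed in a test environment, the three integration schemes were tested, and a brief comparative analysis is given.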
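A minimal sketch of why the file metadata management system matters for scheduling, assuming the catalog records which compute node stores each raw-data file and the scheduler prefers a free node that already holds most of a job's inputs. `FileMetadataCatalog` and `pick_node` are hypothetical; the paper's three PBS integration schemes are not reproduced here.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional


class FileMetadataCatalog:
    """Hypothetical catalog recording which compute node stores each data file."""

    def __init__(self) -> None:
        self.location: Dict[str, str] = {}

    def register(self, path: str, node: str) -> None:
        self.location[path] = node

    def node_counts(self, paths: Iterable[str]) -> Counter:
        return Counter(self.location[p] for p in paths if p in self.location)


def pick_node(catalog: FileMetadataCatalog, inputs: Iterable[str],
              free_nodes: List[str]) -> Optional[str]:
    """Prefer the free node already holding the most input files; else any free node."""
    counts = catalog.node_counts(inputs)
    local = [n for n in free_nodes if counts[n] > 0]
    if local:
        return max(local, key=lambda n: counts[n])
    return free_nodes[0] if free_nodes else None
```

The chosen node would then be passed to the batch system as a node-specific request; exactly how depends on which integration scheme is used.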

9.
Semantic web and grid technologies offer a promising approach to facilitating semantic information retrieval over heterogeneous document repositories. In this paper, the authors describe the design and implementation of an Ontology Server (OS) component for use in a distributed content management grid system. Such a system could be used to build document collection repositories that are mutually interoperable at the semantic level. From the content point of view, the distributed system is built as a collection of multimedia document repository nodes glued together by an OS. A set of methodologies and tools is developed to organize the knowledge space around the notion of a content community: each content provider publishes a set of ontologies to collect metadata, which is organized and published through a knowledge community built on top of the OS. These methodologies were deployed while setting up a prototype connecting about 20 museums in the city of Naples (Italy).
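As an illustration of how an ontology server can answer queries across repositories, the sketch below assumes metadata published as subject/predicate/object triples and a `subClassOf` hierarchy that is followed transitively, so a query for a general class also finds items typed with narrower classes. `TripleStore`, the predicate names and the museum items are hypothetical; a real deployment would use RDF/OWL tooling.

```python
from typing import Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Toy stand-in for an ontology server: triples published by content providers."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def publish(self, triples: Iterable[Triple]) -> None:
        self.triples.update(triples)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        """Triples matching the pattern; None acts as a wildcard."""
        return sorted(t for t in self.triples
                      if (s is None or t[0] == s) and (p is None or t[1] == p)
                      and (o is None or t[2] == o))

    def subclasses(self, cls: str) -> Set[str]:
        """`cls` plus every class reachable through subClassOf edges."""
        result, frontier = {cls}, [cls]
        while frontier:
            current = frontier.pop()
            for sub, _, _ in self.match(p="subClassOf", o=current):
                if sub not in result:
                    result.add(sub)
                    frontier.append(sub)
        return result

    def instances_of(self, cls: str) -> List[str]:
        classes = self.subclasses(cls)
        return sorted({s for s, p, o in self.triples if p == "type" and o in classes})
```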

10.
Personalization is increasingly vital, especially for enterprises seeking to reach their customers. The key challenge in supporting personalization is the need for rich metadata, such as metadata about structural relationships, subject/concept relations between documents, and cognitive metadata about documents (e.g. the difficulty of a document). Manual annotation of large knowledge bases with such rich metadata does not scale. In addition, automatic mining of cognitive metadata is challenging, since the intellectual knowledge underlying a document is very difficult to understand automatically. At the same time, Web content is becoming increasingly multilingual, as a growing amount of the data generated on the Web is non-English. Current metadata extraction systems are generally built for English content and must change fundamentally to adapt to these changing dynamics of the Web. To alleviate these problems, we introduce a novel automatic metadata extraction framework, based on a novel fuzzy method for automatic cognitive metadata generation, that uses different document parsing algorithms to extract rich metadata from multilingual enterprise content using the newly developed DocBook, Resource Type and Topic ontologies. Since the metadata generation process relies on DocBook-structured enterprise content, our framework focuses on enterprise documents and content loosely based on DocBook-style formatting. DocBook is a common documentation format for formally producing corporate data and has been adopted by many enterprises. The proposed framework is illustrated and evaluated on the English, German and French versions of the Symantec Norton 360 knowledge bases. The user study showed that the proposed fuzzy method generates reasonably accurate values, with an average precision of 89.39% on the metadata values of document difficulty, document interactivity level and document interactivity type.
The proposed fuzzy inference system achieves better results than a rule-based reasoner for difficulty metadata extraction (~11% improvement). In addition, user-perceived metadata quality scores were high (mean 5.57 out of 6), and automated metadata analysis showed that the extracted metadata is of high quality and suitable for personalized information retrieval.
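The abstract does not give the fuzzy rules or their inputs. A minimal Sugeno-style sketch, assuming two normalized inputs in [0, 1] (hypothetical `term_density` and `sentence_length`), triangular membership functions, min/max for AND/OR, and a weighted average of constant rule outputs. It illustrates fuzzy inference for a difficulty score, not the authors' system.

```python
from typing import Dict


def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership: 0 outside (a, c), rising to 1 at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def fuzzify(x: float) -> Dict[str, float]:
    # Shoulders extend past [0, 1] so that 0 is fully "low" and 1 fully "high".
    return {
        "low": tri(x, -0.5, 0.0, 0.5),
        "mid": tri(x, 0.0, 0.5, 1.0),
        "high": tri(x, 0.5, 1.0, 1.5),
    }


def difficulty(term_density: float, sentence_length: float) -> float:
    """Crisp difficulty in [0, 1] from hypothetical, normalized text features."""
    d, s = fuzzify(term_density), fuzzify(sentence_length)
    rules = [
        (min(d["low"], s["low"]), 0.2),    # both low -> easy
        (max(d["mid"], s["mid"]), 0.5),    # either mid -> medium
        (max(d["high"], s["high"]), 0.9),  # either high -> difficult
    ]
    total = sum(weight for weight, _ in rules)
    # Weighted average of rule outputs (zero-order Sugeno defuzzification).
    return sum(weight * out for weight, out in rules) / total if total else 0.5
```

Unlike a crisp rule-based reasoner, inputs near a boundary (e.g. 0.25) fire two rules partially, so the score moves smoothly between classes.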

11.
Structured documents have gained popularity with the advent of document structure markup standards such as SGML, ODA, HyTime and HTML. Document management systems can provide powerful facilities by maintaining the structure information of documents. Since a hypermedia document is also a kind of structured document, the results of many studies on storing, retrieving and managing structured documents can be applied to hypermedia document management. However, more factors must be considered when handling hypermedia documents, because they contain multimedia data and have multiple complex structures, such as hyperlink networks and spatial/temporal layout structures, in addition to logical structures. In this paper, we propose an object-oriented model for multi-structured hypermedia documents and multimedia data, and a query language for retrieving hypermedia document elements based on their content and multiple complex structures. By using unique element identifiers and an indexing scheme that exploits multiple structures, we can process queries efficiently with minimal storage overhead for maintaining structure information.
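The abstract mentions unique element identifiers and an index that exploits multiple structures. One common way to get cheap structural tests, assumed here purely for illustration, is Dewey-style path identifiers (`1.2.3`), where an ancestor check is a prefix comparison; the sketch combines them with a word index and a hyperlink table. `HyperDocIndex` and its methods are hypothetical, not the paper's model or query language.

```python
from collections import defaultdict
from typing import Dict, List, Set


def is_ancestor(anc: str, desc: str) -> bool:
    """Dewey-style ids: '1.2' is an ancestor of '1.2.5' but not of '1.20'."""
    return desc.startswith(anc + ".")


class HyperDocIndex:
    def __init__(self) -> None:
        self.kind: Dict[str, str] = {}                      # element id -> element type
        self.words: Dict[str, Set[str]] = defaultdict(set)  # word -> element ids
        self.links: Dict[str, Set[str]] = defaultdict(set)  # source id -> target ids

    def add(self, eid: str, kind: str, text: str = "") -> None:
        self.kind[eid] = kind
        for word in text.lower().split():
            self.words[word].add(eid)

    def link(self, src: str, dst: str) -> None:
        self.links[src].add(dst)

    def containing(self, kind: str, word: str) -> List[str]:
        """Content + logical structure: elements of `kind` whose subtree holds `word`."""
        hits = self.words.get(word.lower(), set())
        return sorted(
            eid for eid, k in self.kind.items()
            if k == kind and any(h == eid or is_ancestor(eid, h) for h in hits)
        )

    def link_targets_under(self, eid: str) -> List[str]:
        """Hyperlink structure: targets of links whose source lies in eid's subtree."""
        return sorted({
            dst for src, dsts in self.links.items()
            if src == eid or is_ancestor(eid, src) for dst in dsts
        })
```

Dewey ids make ancestor tests free of extra storage, at the cost of renumbering when elements are inserted between siblings.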

12.
13.
This paper presents the architecture of the iTrust system together with algorithms for maintaining censorship resistance. In iTrust, metadata describing documents, and requests containing keywords, are distributed to randomly chosen nodes in the iTrust network. If a node receives a request containing keywords that match metadata it holds, it sends the URL of the matching document to the requesting node, which then retrieves the document from the source node. A novel detection algorithm estimates the proportion of operational nodes in the iTrust network, by comparing the empirical probabilities of the number of responses received for a node’s request with the analytical probabilities for a match, for various proportions of operational nodes. A novel defensive adaptation algorithm increases the number of nodes to which the requests are distributed, in order to maintain the same high probability of a match when some of the nodes are non-operational or malicious as when all of the nodes are operational. Extensive experimental evaluations demonstrate the effectiveness of the architecture and the algorithms for maintaining censorship resistance in the iTrust network.
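The analytical probabilities can be written with the hypergeometric distribution: with n nodes, metadata placed at m random nodes and a request sent to r random nodes, P(match) = 1 - C(n-m, r) / C(n, r). The sketch below additionally assumes, as a simplification, that a proportion p of operational nodes leaves floor(m·p) usable metadata holders. `response_dist`, `p_match`, `estimate_p` and `requests_needed` are hypothetical helpers mirroring the detection idea (pick the p whose analytical response distribution best fits observed counts) and the adaptation idea (raise r until a target match probability is reached).

```python
from math import comb, floor
from typing import List, Optional, Sequence


def response_dist(n: int, m: int, r: int, p: float = 1.0) -> List[float]:
    """P(k responses) for k = 0..r: hypergeometric over the floor(m*p) live holders."""
    live = floor(m * p)
    return [comb(live, k) * comb(n - live, r - k) / comb(n, r) for k in range(r + 1)]


def p_match(n: int, m: int, r: int, p: float = 1.0) -> float:
    """Probability that at least one request node holds matching metadata."""
    live = floor(m * p)
    return 1.0 - comb(n - live, r) / comb(n, r)


def estimate_p(n: int, m: int, r: int, observed: Sequence[float],
               candidates: Sequence[float]) -> float:
    """Detection: candidate p whose analytical distribution best fits observed counts."""
    total = sum(observed)
    empirical = [c / total for c in observed]

    def sq_error(p: float) -> float:
        return sum((e - a) ** 2 for e, a in zip(empirical, response_dist(n, m, r, p)))

    return min(candidates, key=sq_error)


def requests_needed(n: int, m: int, target: float, p: float = 1.0) -> Optional[int]:
    """Adaptation: smallest r with p_match >= target, or None if unreachable."""
    for r in range(1, n + 1):
        if p_match(n, m, r, p) >= target:
            return r
    return None
```

Feeding the p returned by `estimate_p` into `requests_needed` gives the larger fan-out needed to keep the match probability at its all-operational level.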

14.
Users of mobile devices can nowadays easily create large quantities of mobile multimedia documents tracing significant events attended, places visited or, simply, moments of their everyday life. However, they face the challenge of organizing these documents in order to facilitate searching through them at a later time and sharing them with other users. We propose using context awareness and semantic technologies in order to improve and facilitate the organization, annotation, retrieval and sharing of personal mobile multimedia documents. Our approach combines metadata extracted and enriched automatically from the users’ context with annotations provided manually by the users and with annotations inferred by applying user-defined rules to context features. These new contextual metadata are integrated into the processes of annotation, sharing and keyword-based retrieval.
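A minimal sketch of combining the three annotation sources, assuming context is a dictionary of sensor-derived features, user-defined rules are predicates over that dictionary, and retrieval is a keyword match over the merged annotations. `Rule`, `annotate`, `search`, `AUTO_KEYS` and the feature names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set

Context = Dict[str, object]


@dataclass
class Rule:
    """User-defined rule: add `tag` when `condition` holds for the capture context."""
    tag: str
    condition: Callable[[Context], bool]


AUTO_KEYS = ("place", "event")  # context features copied verbatim as annotations (assumed)


def annotate(context: Context, manual: Iterable[str], rules: Iterable[Rule]) -> Set[str]:
    """Union of automatic, manual and rule-inferred annotations (lower-cased)."""
    automatic = {str(context[k]).lower() for k in AUTO_KEYS if k in context}
    inferred = {rule.tag for rule in rules if rule.condition(context)}
    return automatic | {t.lower() for t in manual} | inferred


def search(index: Dict[str, Set[str]], keyword: str) -> List[str]:
    """Keyword-based retrieval over the merged annotations."""
    return sorted(doc for doc, tags in index.items() if keyword.lower() in tags)
```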

15.
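In file storage systems, improving overall file system performance is important for ensuring the security and reliability of files, and metadata access performance is closely tied to file system performance. To better meet the needs of large-scale file storage systems, a corresponding file metadata prefetching model must be established. This paper analyzes data-mining-based file metadata prefetching, with the aim of meeting the demand for large volumes of file data access.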
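The abstract does not name the mining method. A minimal sketch, assuming the simplest form of access-pattern mining: count which metadata lookup tends to follow each path in an access trace and prefetch the most frequent successors. `SuccessorPrefetcher` is hypothetical; the paper may use richer techniques such as frequent-sequence or association-rule mining.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List, Optional


class SuccessorPrefetcher:
    """Learns 'after path A, path B is often accessed' from a metadata access trace."""

    def __init__(self, k: int = 2) -> None:
        self.k = k  # how many successors to prefetch
        self.successors: DefaultDict[str, Counter] = defaultdict(Counter)
        self.previous: Optional[str] = None

    def record(self, path: str) -> None:
        if self.previous is not None:
            self.successors[self.previous][path] += 1
        self.previous = path

    def predict(self, path: str) -> List[str]:
        """Paths whose metadata should be prefetched after `path` is accessed."""
        counts = self.successors.get(path)
        return [p for p, _ in counts.most_common(self.k)] if counts else []
```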

16.
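This paper proposes DHAFS, a highly reliable file storage system with a decentralized architecture. The storage nodes cooperate to virtualize their local storage resources into a global storage space, implement a unified file namespace, and provide a file interface to clients; storage, caching, and data/metadata management functions are distributed across the storage nodes. Compared with existing cluster storage systems, DHAFS removes the single point of failure of a single metadata node, eliminates that node's performance bottleneck, and improves dynamic scalability. Experimental results show that DHAFS provides file storage services efficiently and stably.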
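The abstract does not say how DHAFS places metadata among nodes. One common way to avoid a single metadata server, assumed here only for illustration, is consistent hashing: each file path is owned by the first virtual node clockwise on a hash ring, so adding a node moves only the paths that the new node takes over. `HashRing` and its `vnodes` parameter are hypothetical.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


class HashRing:
    """Consistent-hash placement of file metadata across storage nodes."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 64) -> None:
        self.vnodes = vnodes
        self.ring: List[Tuple[int, str]] = []  # sorted (hash, node) pairs
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")

    def add(self, node: str) -> None:
        # Several virtual points per node spread the load more evenly.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def owner(self, path: str) -> str:
        """Node responsible for `path`: first ring point at or after its hash (wrapping)."""
        h = self._hash(path)
        i = bisect.bisect_left(self.ring, (h, ""))
        return self.ring[i % len(self.ring)][1]
```

Every node can compute `owner` locally, so no central lookup service is needed, which matches the goal of removing the single metadata node.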

17.
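For a CAS system, we propose integrating and analyzing the storage metadata and the content metadata of multimedia objects, and then classifying and storing objects according to their attribute values. For user convenience, Inotify is used to monitor the file system in real time and automatically extract each object's metadata. Object metadata is saved both in standard XML files and in a MySQL database, and each attribute is clearly reflected in the CAS system. Integrating and analyzing the automatically extracted metadata can greatly help users search and manage multimedia data more efficiently.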
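A minimal, portable sketch of the extraction step, assuming storage metadata from `os.stat`, a MIME type guessed from the file extension as a stand-in for content metadata, classification by top-level MIME type, and XML serialization with `xml.etree.ElementTree`. Python's standard library has no Inotify binding, so `poll_new_files` uses directory polling in its place; on Linux an Inotify-based watcher would replace it. The function names and XML layout are hypothetical, and the MySQL side is omitted.

```python
import mimetypes
import os
import xml.etree.ElementTree as ET
from typing import Dict, List, Set


def extract_metadata(path: str) -> Dict[str, str]:
    """Storage metadata from the file system plus a guessed MIME type."""
    st = os.stat(path)
    mime, _ = mimetypes.guess_type(path)
    return {
        "name": os.path.basename(path),
        "size": str(st.st_size),
        "mtime": str(int(st.st_mtime)),
        "mime": mime or "application/octet-stream",
    }


def category(meta: Dict[str, str]) -> str:
    """Classify by top-level MIME type, e.g. 'image', 'audio', 'video'."""
    return meta["mime"].split("/")[0]


def to_xml(meta: Dict[str, str]) -> str:
    root = ET.Element("object", category=category(meta))
    for key, value in meta.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


def poll_new_files(directory: str, seen: Set[str]) -> List[str]:
    """Portable stand-in for Inotify: report regular files not seen before."""
    new = sorted(e.path for e in os.scandir(directory) if e.is_file() and e.path not in seen)
    seen.update(new)
    return new
```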

18.
The rapid growth of multimedia documents has raised huge demand for sophisticated multimedia knowledge discovery systems. Knowledge extraction from these documents relies mainly on the data representation model and the document representation model. Because a multimedia document is composed of multimodal multimedia objects, the data representation depends on the modality of each object. Multimodal objects require distinct processing and feature extraction methods, resulting in different features with different dimensionalities, and managing multiple types of features is challenging for knowledge extraction tasks. A unified representation of multimedia documents benefits knowledge extraction, since all objects are then represented by the same type of features. An appropriate document representation also benefits the overall decision-making process by reducing search time and memory requirements. In this paper, we propose a domain-converting method called the Multimedia to Signal converter (MSC), which represents a multimodal multimedia document in a unified way by converting multimodal objects into signal objects. A tree-based approach called the Multimedia Feature Pattern (MFP) tree is proposed for the compact representation of multimedia documents in terms of the features of their multimedia objects. The effectiveness of the proposed framework is evaluated through experiments on four multimodal datasets. Experimental results show that the unified representation of multimedia documents improves classification accuracy. The MFP-tree-based representation not only reduces search time and memory requirements but also outperforms competing approaches for search and retrieval of multimedia documents.
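The abstract does not detail the conversion or the tree. A toy sketch, assuming every modality is flattened into a 1-D numeric signal (row-major pixels, character codes for text), described by the same quantized features, and stored in a prefix tree where documents with identical feature patterns share nodes. `image_to_signal`, `text_to_signal`, `signal_features` and `PatternTree` are hypothetical, and a real system would use modality-aware conversion and much richer features.

```python
import statistics
from typing import Dict, List, Sequence


def image_to_signal(pixels: Sequence[Sequence[float]]) -> List[float]:
    return [float(v) for row in pixels for v in row]  # row-major scan


def text_to_signal(text: str) -> List[float]:
    return [float(ord(ch)) for ch in text]  # toy: character codes


def signal_features(signal: Sequence[float], levels: int = 4,
                    max_value: float = 255.0) -> List[str]:
    """Same extractor for every modality: quantized mean and standard deviation."""
    def level(x: float) -> int:
        return min(int(x / max_value * levels), levels - 1)
    return [f"mean{level(statistics.fmean(signal))}", f"std{level(statistics.pstdev(signal))}"]


class _Node:
    def __init__(self) -> None:
        self.children: Dict[str, "_Node"] = {}
        self.docs: List[str] = []


class PatternTree:
    """Prefix tree: documents with shared feature prefixes share nodes."""

    def __init__(self) -> None:
        self.root = _Node()
        self.node_count = 0

    def insert(self, doc_id: str, pattern: Sequence[str]) -> None:
        node = self.root
        for symbol in pattern:
            if symbol not in node.children:
                node.children[symbol] = _Node()
                self.node_count += 1
            node = node.children[symbol]
        node.docs.append(doc_id)

    def lookup(self, pattern: Sequence[str]) -> List[str]:
        node = self.root
        for symbol in pattern:
            if symbol not in node.children:
                return []
            node = node.children[symbol]
        return list(node.docs)
```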

19.
Knowledge, 2005, 18(2-3): 117-124
In this paper we propose an approach for refining a document ranking by learning filtering rulesets through relevance feedback. The approach includes two important procedures. One is a filtering method that can be incorporated into any kind of information retrieval system. The other is a learning algorithm that builds a set of filtering rules, each of which specifies a condition for identifying relevant documents using combinations of characteristic words. Our approach is useful not only for overcoming the limitations of the vector space model but also for utilizing the tags of semi-structured documents such as Web pages. Experiments show that our approach improves the performance of relevance feedback in two types of IR systems: one adopting the vector space model, and a Web search engine.
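The abstract does not give the learning algorithm. A minimal sketch, assuming documents are sets of characteristic words, a rule is a conjunction of up to `max_len` words that occurs in a relevant document but in no non-relevant one, and the filter is a stable re-ranking that moves rule-matching documents ahead of the rest. `learn_rules` and `rerank` are hypothetical, and the greedy covering strategy is one simple choice among many.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Set

Rule = FrozenSet[str]


def learn_rules(relevant: Sequence[Set[str]], nonrelevant: Sequence[Set[str]],
                max_len: int = 2) -> List[Rule]:
    """Greedy covering: each rule is a word conjunction found in no non-relevant document."""
    rules: List[Rule] = []
    for doc in relevant:
        if any(rule <= doc for rule in rules):
            continue  # already covered by an earlier rule
        for size in range(1, max_len + 1):
            candidates = [frozenset(words) for words in combinations(sorted(doc), size)
                          if not any(set(words) <= neg for neg in nonrelevant)]
            if candidates:
                # Prefer the candidate that covers the most relevant documents.
                rules.append(max(candidates, key=lambda r: sum(r <= d for d in relevant)))
                break
    return rules


def rerank(ranking: Sequence[str], docs: Dict[str, Set[str]], rules: Sequence[Rule]) -> List[str]:
    """Stable filter: rule-matching documents move ahead; original order kept otherwise."""
    return sorted(ranking, key=lambda d: not any(rule <= docs[d] for rule in rules))
```

Because the filter only reorders an existing ranking, it can sit on top of a vector-space engine or a Web search engine without changing either.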

20.
This paper proposes a non-domain-specific metadata ontology as a core component of a semantic model-based document management system (DMS), a potential contender among next-generation enterprise information systems. We developed the core semantic component of an ontology-driven DMS, which provides a robust semantic base for describing documents' metadata. We also enabled semantic services such as automated semantic translation of metadata from one domain to another. The core semantic base consists of three semantic layers, each serving a different view of documents' metadata. The base layer of the core semantic component is a non-domain-specific metadata ontology founded on the ebRIM specification. Its main purpose is to serve as a meta-metadata ontology for other, domain-specific metadata ontologies, and it provides a generic metadata view. To enable domain-specific views of documents' metadata, we implemented two domain-specific metadata ontologies, semantically layered on top of ebRIM. To enable semantic translation of metadata from one domain to another, we established model-to-model mappings between these semantic layers using SWRL rules. Automating the semantic translation of metadata not only allows effortless switching between different metadata views but also opens the door to automating the long-term archiving of documents. For the case study, we chose the judicial domain as a promising ground for improving the efficiency of the judiciary by introducing semantics into this field.
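SWRL rules are Horn-style implications (antecedent implies consequent) over ontology atoms. As an illustration only, the sketch below runs such rules by naive forward chaining over subject/predicate/object triples, translating a hypothetical judicial property into a hypothetical ebRIM-based property and from there into a second domain. All prefixes and property names (`court:`, `rim:`, `archive:`) are invented for the example; a real system would use an OWL reasoner with SWRL support.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]
Rule = Tuple[Sequence[Triple], Triple]  # (body atoms, head atom); "?x" marks a variable


def unify(pattern: Triple, fact: Triple, binding: Binding) -> Optional[Binding]:
    result = dict(binding)
    for term, value in zip(pattern, fact):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None
    return result


def solve(body: Sequence[Triple], facts: Set[Triple], binding: Binding) -> List[Binding]:
    """All variable bindings that satisfy every atom in `body`."""
    if not body:
        return [binding]
    found: List[Binding] = []
    for fact in facts:
        extended = unify(body[0], fact, binding)
        if extended is not None:
            found.extend(solve(body[1:], facts, extended))
    return found


def forward_chain(facts: Set[Triple], rules: Sequence[Rule]) -> Set[Triple]:
    """Apply rules until no new triple is derived (naive fixpoint)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            for binding in solve(body, derived, {}):
                new = tuple(binding.get(term, term) for term in head)
                if new not in derived:
                    derived.add(new)
                    changed = True
    return derived
```

Routing every domain mapping through the generic base layer means each new domain needs rules only to and from that layer, not to every other domain.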
