Similar literature
 Found 20 similar articles; search time: 0 ms
1.
This paper presents a knowledge-based approach to managing and retrieving personal documents. The dual document model consists of a document type hierarchy and a folder organization. The document type hierarchy captures the layout, logical, and conceptual structures of documents. The folder organization mimics the user's real-world document filing system for organizing and storing documents in an office environment. A predicate-based representation of documents is formalized for specifying knowledge about documents. Document filing and retrieval are predicate-driven. The filing criteria for the folders, which are specified in terms of predicates, govern the grouping of frame instances regardless of their document types. We incorporated the notions of document type hierarchy and folder organization into the multilevel architecture of document storage. This architecture supports various text-based information retrieval techniques and content-based multimedia information retrieval techniques. The paper also proposes a knowledge-based query-preprocessing algorithm, which reduces the search space. To automate document filing and retrieval, a predicate evaluation engine with a knowledge base is proposed. A learning agent is responsible for acquiring the knowledge needed by the evaluation engine.

2.
A Knowledge-Based Approach to Effective Document Retrieval   Cited by: 3 (self-citations: 0, citations by others: 3)
This paper presents a knowledge-based approach to effective document retrieval. The approach is based on a dual document model that consists of a document type hierarchy and a folder organization. A predicate-based document query language is proposed to enable users to specify precisely the search criteria and their knowledge about the documents to be retrieved. A guided search tool is developed as an intelligent, natural-language-oriented user interface to assist users in formulating queries. Supported by an intelligent question generator, an inference engine, a question base, and a predicate-based query composer, the guided search collects the most important information known to the user in order to retrieve the documents that satisfy the user's particular interests. A knowledge-based query processing and search engine is devised as the core component of this approach. Algorithms are developed for the search engine to retrieve the documents that match the query effectively and efficiently.

3.
The Indexing and Retrieval of Document Images: A Survey   Cited by: 2 (self-citations: 0, citations by others: 2)
The economic feasibility of maintaining large databases of document images has created a tremendous demand for robust ways to access and manipulate the information these images contain. In an attempt to move toward a paperless office, large quantities of printed documents are often scanned and archived as images, without adequate index information. One way to provide traditional database indexing and retrieval capabilities is to fully convert the document to an electronic representation that can be indexed automatically. Unfortunately, many factors prohibit complete conversion, including high cost, low document quality, and the fact that many nontext components cannot be adequately represented in a converted form. In such cases, it can be advantageous to maintain a copy of the document in image form and use it directly. In this paper, we provide a survey of methods developed by researchers to access and manipulate document images without the need for complete and accurate conversion. We briefly discuss traditional text indexing techniques on imperfect data and the retrieval of partially converted documents. This is followed by a more comprehensive review of techniques for the direct characterization, manipulation, and retrieval of images of documents containing text, graphics, and scene images.

4.
5.
Document similarity search finds documents similar to a given query document and returns a ranked list of them to the user; it is widely used in text and web systems such as digital libraries and search engines. Traditional retrieval models, including Okapi's BM25 model and Smart's vector space model with length normalization, can handle this problem to some extent by treating the query document as a long query. In practice, the Cosine measure is considered the best model for document similarity search because it measures the similarity between two documents well. In this paper, the quantitative performance of the above models is compared experimentally. Because the Cosine measure cannot reflect the structural similarity between documents, a new retrieval model based on TextTiling is proposed. The proposed model takes into account the subtopic structures of documents. It first splits the documents into text segments with TextTiling and calculates the similarities for different pairs of text segments in the documents. The overall similarity between the documents is then obtained by combining the similarities of the segment pairs with an optimal matching method. Experimental results show that: 1) the popular retrieval models (Okapi's BM25 model and Smart's vector space model with length normalization) do not perform well for document similarity search; 2) the proposed model based on TextTiling is effective and outperforms the other models, including the Cosine measure; 3) the methods chosen for the three components of the proposed model are validated as appropriate.
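The segment-matching idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the documents have already been split into segments (the TextTiling step is not reproduced), uses plain term-frequency cosine similarity between segments, finds the best one-to-one pairing by brute force, and normalizes by the larger segment count. The function names and the normalization are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import permutations


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment_similarity(segs_a, segs_b):
    """Combine segment-level cosine scores with a best one-to-one matching.

    segs_a, segs_b: lists of pre-split text segments (hypothetical input;
    the abstract obtains them with TextTiling).
    Brute force over permutations is only practical for a handful of segments.
    """
    vecs_a = [Counter(s.lower().split()) for s in segs_a]
    vecs_b = [Counter(s.lower().split()) for s in segs_b]
    if len(vecs_a) > len(vecs_b):
        vecs_a, vecs_b = vecs_b, vecs_a  # keep vecs_a as the shorter list
    if not vecs_a:
        return 0.0
    best = 0.0
    for perm in permutations(range(len(vecs_b)), len(vecs_a)):
        total = sum(cosine(vecs_a[i], vecs_b[j]) for i, j in enumerate(perm))
        best = max(best, total)
    # Assumed normalization: unmatched segments count as zero similarity.
    return best / len(vecs_b)
```

For more than a few segments, an assignment solver would replace the permutation loop; the brute-force version is kept here only so that the sketch has no dependencies.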

6.
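Query expansion is an effective way to improve retrieval performance. However, many query expansion methods select expansion terms without fully considering the correlations among terms or between terms and documents, so expansion may introduce too much irrelevant information and degrade retrieval performance. By computing document-document and term-term correlations, documents and terms are linked to build a Markov network retrieval model. Term cliques are then extracted according to the mapping between the term subspace and the document subspace, and the extracted clique information is used for query expansion so that the expanded content is more relevant. Experiments show that the Markov retrieval model based on document clique dependence effectively improves retrieval results.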

7.
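3D model retrieval has been a focus of content-based retrieval research in recent years. Research on retrieving 3D models from 2D sketches usually does not incorporate human knowledge. To improve the precision of 3D model retrieval, a retrieval method based on view classification is proposed. Its main idea is to use view classification to transform human cognition of 3D models into an understanding of 2D views, so that human knowledge can be used when measuring the similarity between 2D sketches and views. The method involves two problems: view classification, and similarity measurement between 2D sketches and views. Experimental results show that the method improves retrieval precision and can therefore be applied to 3D model retrieval with 2D sketch queries.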

8.
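Retrieving the rapidly growing volume of document images effectively is one of the key technologies of document image management systems. A method for retrieving Chinese document images without character recognition is proposed. Building on character segmentation, the method uses two-level matching, with coarse matching based on coarse peripheral features followed by similarity measurement based on an improved Hausdorff distance, to accommodate different requirements for time and accuracy. Experiments on 200 sample document images show that the method achieves good retrieval results on document images of printed Chinese characters and offers a useful reference for designing document image retrieval systems in digital libraries.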

9.
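Wei Bin, Zhang Jun, Xiang Ying. 《数字社区&智能家居》 (Digital Community & Smart Home), 2009, 5(3): 1686-1687, 1698
To address the shortcomings of several commonly used text retrieval methods, this paper proposes a new text retrieval method based on a statistical model and the wavelet transform. It differs from traditional methods in two main ways: 1) the wavelet transform is used to move the input signal into the frequency domain for processing, which eliminates the heavy computational cost of cross-comparison operations; 2) the relevance computation considers both the number of occurrences and the positions of the query terms, which effectively improves retrieval precision. Theoretical analysis and experimental results show that the method improves on traditional methods in both precision and query speed.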

11.
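Feature-Based Document Image Retrieval   Cited by: 1 (self-citations: 0, citations by others: 1)
Zhang Tian, Wang Xichang, Chen Changhua. 《计算机工程》 (Computer Engineering), 2009, 35(22): 176-178
A document image retrieval method is proposed that combines the paragraph features of a document image with local pixel-distribution relative-difference features. Definitions and extraction methods for both features are given, together with a retrieval method that uses the two in combination. Combining the paragraph feature, which is global, with the local pixel-distribution relative-difference feature, which is local, represents and distinguishes document images well, and the retrieval method that fully combines the two achieves good results.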

12.
13.
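Retrieving the translation of a document in another language is valuable for building bilingual parallel corpora. This paper proposes an improved cross-language similar-document retrieval algorithm. The algorithm uses a bilingual dictionary or a statistical translation model as the bilingual knowledge base, finds the translation word pairs shared by two documents, treats the weights of the translation pairs as a feature for similarity computation, and computes the similarity of bilingual documents with an improved version of the Dice method. In the experiments, performance is evaluated by counting how often the translation of a query document ranks among the top N retrieval results, and two noisy data sets are used to evaluate the algorithm's effectiveness. Under heavy noise interference, the translation ranks among the top 5 results in nearly 90% of cases. The experiments show that translation-pair weights help similarity computation considerably and that the algorithm can effectively find the translation, in another language, of a document written in one language.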
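A Dice-style similarity over translation word pairs can be sketched in Python as follows. The abstract does not give the improved formula, so this is only an assumed variant: terms that take part in at least one dictionary-backed translation pair count as matched, each term may carry a weight, and with unit weights and one-to-one translations the score reduces to the classic Dice coefficient. The function name, the dictionary layout, and the weighting scheme are all hypothetical.

```python
def dice_cross_lingual(src_terms, tgt_terms, dictionary, weight=None):
    """Weighted Dice-style similarity between documents in two languages.

    dictionary: maps a source-language term to an iterable of candidate
                translations (hypothetical bilingual knowledge base).
    weight:     optional map from term to weight; missing terms weigh 1.0.
    """
    src, tgt = set(src_terms), set(tgt_terms)
    if not src or not tgt:
        return 0.0
    weight = weight or {}

    def w(term):
        return weight.get(term, 1.0)

    matched_src, matched_tgt = set(), set()
    for s in src:
        for t in dictionary.get(s, ()):
            if t in tgt:  # shared translation pair (s, t)
                matched_src.add(s)
                matched_tgt.add(t)

    numerator = sum(w(x) for x in matched_src) + sum(w(x) for x in matched_tgt)
    denominator = sum(w(x) for x in src) + sum(w(x) for x in tgt)
    return numerator / denominator if denominator else 0.0
```

Ranking candidate documents by this score and checking whether the true translation appears in the top N would mirror the evaluation described in the abstract.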

14.
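Knowledge Navigation for Scientific and Technical Literature in Digital Libraries   Cited by: 5 (self-citations: 2, citations by others: 5)
A knowledge navigation system for scientific and technical literature based on a classification scheme and a thesaurus is proposed. The system supports three retrieval methods and their combined use: knowledge navigation through the classification scheme and thesaurus, structured metadata queries, and full-text retrieval. Because it offers concept browsing through the classification scheme and thesaurus and semantic support for metadata queries, it is a knowledge navigation system that supports concept retrieval. Based on this system, an experimental digital library, the "Peking University Scientific and Technical Literature Retrieval System" (北京大学科技文献检索系统), was implemented.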

15.
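A Keyword-Based Retrieval Method for Chinese Document Images   Cited by: 1 (self-citations: 0, citations by others: 1)
This paper proposes a keyword-based retrieval method for Chinese document images that performs keyword retrieval directly on the image features of Chinese characters, without OCR (Optical Character Recognition). The document image is first segmented into individual Chinese character images; stroke feature data are then extracted from each character image; finally, similarity between the feature data is measured using WMHD (Weighted Modified Hausdorff Distance). The method is unaffected by font size and has some robustness to font changes, and experiments show that it achieves good retrieval results.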
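As an illustration of the distance family named in this abstract, the Python sketch below computes the unweighted modified Hausdorff distance between two sets of feature points: the larger of the two directed mean nearest-neighbor distances. The weighting used in WMHD is not described in the abstract and is not reproduced here; the function names and the point-tuple representation are assumptions.

```python
import math


def directed_mhd(a, b):
    """Mean distance from each point in a to its nearest point in b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def modified_hausdorff(a, b):
    """Unweighted modified Hausdorff distance between two point sets.

    a, b: non-empty sequences of equal-dimension coordinate tuples, for
    example stroke feature points of two character images (hypothetical).
    """
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    return max(directed_mhd(a, b), directed_mhd(b, a))
```

A smaller distance indicates more similar character shapes, so a keyword match would be accepted when the distance falls below a chosen threshold.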

16.
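With the widespread use of large image databases, image retrieval has become a research focus in image resource management and retrieval. Traditional color-feature methods based on fixed blocks compare the color features of corresponding sub-blocks during similarity matching; the constraints between sub-blocks are strong, and the methods are sensitive to image rotation. Improving the fixed-block method with a circular queue data structure makes it rotation-invariant.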

17.
The Cambridge University Multimedia Document Retrieval (CU-MDR) Demo System is a web-based application that allows the user to query a database of radio broadcasts that are available on the Internet. The audio from several radio stations is downloaded and transcribed automatically. This gives a collection of text and audio documents that can be searched by a user. The paper describes how speech recognition and information retrieval techniques are combined in the CU-MDR Demo System and shows how the user can interact with it.

18.
19.
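Concept Retrieval Techniques for XML Documents   Cited by: 11 (self-citations: 1, citations by others: 11)
Sun Dengfeng. 《计算机应用》 (Computer Applications), 2003, 23(1): 110-112
Information retrieval for XML documents is an important research topic. This paper introduces structure indexing for structured documents and the "contextual co-occurrence analysis" technique used in semantic retrieval. On that basis, it proposes a prototype concept retrieval system for XML documents and analyzes several major issues to consider in the system's design and implementation.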

20.
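Engineering Drawing and Document Retrieval in Engineering Data Management Systems   Cited by: 3 (self-citations: 0, citations by others: 3)
Engineering data management systems (EDMS) are being adopted by a growing number of modern enterprises. They replace traditional paper engineering drawings, documents, and related information with digitized electronic drawings, achieving paperless, computer-based management of engineering drawings and documents within the enterprise. Electronic retrieval of engineering drawings and documents is a very important function of an EDMS: it lets authorized users obtain useful information from the system's shared database over the network quickly, conveniently, flexibly, and securely.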
