Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
2.
Enhancing Concept-Based Retrieval Based on Minimal Term Sets   Total citations: 1, self-citations: 0, citations by others: 1
There is considerable interest in bridging the terminological gap between the way users prefer to specify their information needs and the way queries are expressed in terms of keywords or text expressions that occur in documents. One of the approaches proposed for bridging this gap is based on expert system technologies. The central idea of this approach was introduced in the context of a system called Rule Based Information Retrieval by Computer (RUBRIC). In RUBRIC, user query topics (or concepts) are captured in a rule base represented by an AND/OR tree. The evaluation of an AND/OR tree is essentially based on the minimum and maximum weights of query terms for conjunctions and disjunctions, respectively. The time to generate the retrieval output of the AND/OR tree for a given query topic is exponential in m, the number of conjunctions in the DNF expression associated with the query topic. In this paper, we propose a new approach for computing the retrieval output. The proposed approach involves preprocessing the rule base to generate Minimal Term Sets (MTSs) that speed up the retrieval process. The computational complexity of the on-line query evaluation following the preprocessing is polynomial in m. We show that the computation and use of MTSs allows users to choose query topics that best suit their needs and to use retrieval functions that yield a more refined and controlled retrieval output than is possible with the AND/OR tree when document terms are binary. We incorporate the p-Norm model into the process of evaluating MTSs to handle the case where the weights of both document and query terms are non-binary.
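
The min/max evaluation rule described in this abstract can be illustrated with a minimal Python sketch. It is not the RUBRIC implementation: the `Term` and `Node` classes, the `evaluate` function and the example topic and weights are all hypothetical, and the only rule taken from the abstract is that a conjunction takes the minimum weight of its children and a disjunction the maximum.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Term:
    """Leaf of the tree: a query term looked up in the document's term weights."""
    name: str


@dataclass
class Node:
    """Internal node of the tree; op is either "AND" or "OR"."""
    op: str
    children: List[Union["Node", Term]] = field(default_factory=list)


def evaluate(node: Union[Node, Term], weights: Dict[str, float]) -> float:
    """Score one document against an AND/OR tree.

    Assumption for this sketch: a term missing from the document has weight 0.0.
    """
    if isinstance(node, Term):
        return weights.get(node.name, 0.0)
    scores = [evaluate(child, weights) for child in node.children]
    if node.op == "AND":
        return min(scores)  # conjunction: minimum weight
    if node.op == "OR":
        return max(scores)  # disjunction: maximum weight
    raise ValueError(f"unknown operator: {node.op}")


# Hypothetical query topic: (bomb AND attack) OR terrorism
topic = Node("OR", [Node("AND", [Term("bomb"), Term("attack")]), Term("terrorism")])
doc_weights = {"bomb": 0.8, "attack": 0.5, "terrorism": 0.3}
score = evaluate(topic, doc_weights)  # max(min(0.8, 0.5), 0.3)
```

The sketch evaluates the tree directly for a single document; the Minimal Term Set preprocessing and the p-Norm extension mentioned in the abstract are not shown.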

3.
This paper presents a knowledge-based approach to managing and retrieving personal documents. The dual document models consist of a document type hierarchy and a folder organization. The document type hierarchy is used to capture the layout, logical and conceptual structures of documents. The folder organization mimics the user's real-world document filing system for organizing and storing documents in an office environment. Predicate-based representation of documents is formalized for specifying knowledge about documents. Document filing and retrieval are predicate-driven. The filing criteria for the folders, which are specified in terms of predicates, govern the grouping of frame instances, regardless of their document types. We incorporated the notions of document type hierarchy and folder organization into the multilevel architecture of document storage. This architecture supports various text-based information retrieval techniques and content-based multimedia information retrieval techniques. The paper also proposes a knowledge-based query-preprocessing algorithm, which reduces the search space. For automating the document filing and retrieval, a predicate evaluation engine with a knowledge base is proposed. The learning agent is responsible for acquiring the knowledge needed by the evaluation engine.

4.
A new technology for intelligent full text document retrieval is presented. The retrieval of a document is treated as an expert system problem, recognizing that human document retrieval is expert behavior. The technology is semantic measurement. A working prototype system, LIBRARY, has been built based on the technology. Input is a request for information, in unrestricted technical English; output is all documents with measured content similar to that of the request, ranked in order of relevance. Retrieval is unaffected by similarity or dissimilarity of terms between request and document. LIBRARY's performance is comparable to that of an expert human librarian, representing a significant improvement over traditional document retrieval systems.

5.
Information retrieval from the Internet is becoming a commonplace phenomenon. Users and consumers are browsing websites and seeking various kinds of information for personal use. Retrieving quality information from the Internet can be challenging even for the computer-savvy. There are several search engines, some of them personalized, to help users search for information on the Internet. In spite of all the claims about search engines, users still have a difficult time retrieving relevant information quickly. This paper proposes a general conceptual model for user-centered quality information retrieval (UCQIR) from the Internet. The UCQIR conceptual model is presented in an architectural form. The UCQIR architectural model uses the concept of “Task-performer” to present various aspects of an information retrieval system at the knowledge level. Task-performer is an abstract construct used to conceptualize the idea of an entity that is competent in doing its tasks. The UCQIR architectural model can be used to easily design and develop domain-specific, user-centered quality information retrieval systems. The proposed UCQIR conceptual model is unique and comprehensive. The use of the conceptual model is illustrated through the design of a patient-centered quality medical information retrieval system for the medical domain. We also present an experimental evaluation of a UCQIR prototype based upon real user experiences. The experimental results are very positive.

6.
An active document framework is a self-representable, self-explainable, and self-executable document mechanism. A document’s content is reflected in four aspects: granularity hierarchy, template hierarchy, background knowledge, and semantic links between fragments. An active document has a set of built-in engines for browsing, retrieving, and reasoning, which can work in a way best suited to the document’s content. Besides browsing and retrieval services, the active document supports intelligent information services such as complex question answering, online teaching, and assistant problem solving. The client side service provider is only responsible for the retrieval of the required active document. The detailed information services are provided by the document mechanism. This improves the current Web information retrieval approach by raising the efficiency of information retrieval, enhancing the preciseness and mobility of information services, and enabling intelligent information services. A tool for making semantic links in a document and an intelligent browser have been developed to support the proposed approach, which provides a new type of web information service.

7.
A content-search information retrieval process based on conceptual graphs   Total citations: 1, self-citations: 0, citations by others: 1
An intelligent information retrieval system is presented in this paper. In our approach, which complies with the logical view of information retrieval, queries, document contents and other knowledge are represented by expressions in a knowledge representation language based on the conceptual graphs introduced by Sowa. In order to take the intrinsic vagueness of information retrieval into account, i.e. to search documents imprecisely and incompletely represented in order to answer a vague query, different kinds of probabilistic logic are often used. The search process described in this paper uses graph transformations instead of probabilistic notions. This paper is focused on the content-based retrieval process, and the cognitive facet of information retrieval is not directly addressed. However, our approach, involving the use of a knowledge representation language for representing data and a search process based on a combinatorial implementation of van Rijsbergen’s logical uncertainty principle, also allows the representation of retrieval situations. Hence, we believe that it could be implemented at the core of an operational information retrieval system. Two applications, one dealing with academic libraries and the other concerning audiovisual documents, are briefly presented.

8.
Meghini C., Rabitti F., Thanos C. Computer, 1991, 24(10): 23-30
An approach to the document-retrieval problem that aims to increase the efficiency and effectiveness of document-retrieval systems by exploiting the semantic contents of the documents is presented. The document retrieval problem is delineated, and conceptual document modeling basics and requirements are discussed. An experimental system, the Multimedia Office Server (Multos), which implements some of the document-model concepts described, is presented.

9.
This paper presents the main current theoretical issues in Information Retrieval. The principles of conceptual modelling, as they have emerged in the database area, are presented, and their application to document modelling in order to enhance document retrieval is discussed. Finally, the main features of the MULTOS project are presented and critically reviewed against the requirements identified during the general discussion on document conceptual modelling for information retrieval.

10.
Database systems have many advantages for implementing document retrieval systems. One of the main advantages would be the integration of data and text handling in a single information system. However, it has not been clear how much a database implementation would cost in terms of efficiency. In this paper, we compare a database implementation and a stand-alone implementation of a flexible representation of the content of documents and the associated search strategies. The representation used is a network of document and index term nodes. The comparison shows that certain features of a database system can have a significant effect on the efficiency of the implementation. Despite this, it appears that a database implementation of a sophisticated document retrieval system can be competitive with a stand-alone implementation.

11.
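At present, provinces, municipalities and departments each maintain their own dedicated expert information systems, managing expert information for their own regions and domains; these systems differ in form, and the expert information is distributed and heterogeneous. To address the practical difficulty of sharing expert information and the low efficiency of keyword-based queries, a semantics-based expert information system is proposed. The 5W1H analysis method is used to summarize the concepts and relations in the domain ontology; a 5W1H-based expert ontology concept model is built and an expert domain ontology is generated, giving a unified model of expert semantic information. A four-layer system architecture is constructed, methods for defining semantic inference rules are studied, and a semantics-based rule inference algorithm and a SPARQL-based semantic query algorithm are designed, providing semantics-based information query and inference. Experiments show that semantics-based expert information queries outperform keyword-based queries in both recall and precision.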

12.
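Research on an Information Retrieval System Based on a Fuzzy Linguistic Approach   Total citations: 4, self-citations: 2, citations by others: 2
This paper proposes an information retrieval system model based on a fuzzy linguistic approach. The system has three parts: a query interface subsystem, a database subsystem and a retrieval subsystem. In the query interface subsystem, the user's request is expressed as a Boolean expression, and each query keyword is assigned linguistic-valued weights with two different semantics, which express the user's fuzzy retrieval requirements. In the database subsystem, the documents to be searched are represented by an index term-document fuzzy matrix, and each index term is given a numeric weight according to its frequency of occurrence in the document. In the retrieval subsystem, the fuzzy linguistic approach is used to match the user's Boolean query expression against the index term-document fuzzy matrix bottom-up, and the results that satisfy the user's requirements are returned. Compared with traditional retrieval systems based on exact matching of query keywords, this system better accommodates the flexibility in users' query requirements.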

13.
This paper describes a minimally immersive three-dimensional volumetric interactive information visualization system for management and analysis of document corpora. The system, SFA, uses glyph-based volume rendering, enabling more complex data relationships and information attributes to be visualized than traditional 2D and surface-based visualization systems. Two-handed interaction using three-space magnetic trackers and stereoscopic viewing are combined to produce a minimally immersive interactive system that enhances the user’s three-dimensional perception of the information space. This new system capitalizes on the human visual system’s pre-attentive learning capabilities to quickly analyze the displayed information. SFA is integrated with a document management and information retrieval engine named Telltale. Together, these systems integrate visualization and document analysis technologies to solve the problem of analyzing large document corpora. We describe the usefulness of this system for the analysis and visualization of document similarity within a corpus of textual documents, and present an example exploring authorship of ancient Biblical texts. Received: 15 December 1997 / Revised: June 1999

14.
Most of the common techniques in text retrieval are based on the statistical analysis of terms (words or phrases). Statistical analysis of term frequency captures the importance of the term within a document only. Thus, to achieve a more accurate analysis, the underlying model should indicate terms that capture the semantics of text. In this case, the model can capture terms that represent the concepts of the sentence, which leads to discovering the topic of the document. In this paper, a new concept-based retrieval model is introduced. The proposed concept-based retrieval model consists of a conceptual ontological graph (COG) representation and a concept-based weighting scheme. The COG representation captures the semantic structure of each term within a sentence. Then, all the terms are placed in the COG representation according to their contribution to the meaning of the sentence. The concept-based weighting analyzes terms at the sentence and document levels. This is different from the classical approach of analyzing terms at the document level only. The weighted terms are then ranked, and the top concepts are used to build a concept-based document index for text retrieval. The concept-based retrieval model can effectively discriminate between terms that are unimportant with respect to sentence semantics and terms that represent the concepts capturing the sentence meaning. Experiments using the proposed concept-based retrieval model on different data sets in text retrieval are conducted. The experiments compare traditional approaches with the concept-based retrieval model obtained by combining the conceptual ontological graph and the concept-based weighting scheme. The results are evaluated using three quality measures: the preference measure (bpref), precision at 10 documents retrieved (P(10)) and the mean uninterpolated average precision (MAP). All of these quality measures improve when the newly developed concept-based retrieval model is used, confirming that such a model enhances the quality of text retrieval.
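
Two of the quality measures named in this abstract, P(10) and MAP, can be computed from a ranked result list and a set of relevant documents. The Python sketch below uses the standard textbook definitions; the function names and the toy ranking are hypothetical, no values from the paper are reproduced, and bpref is not covered.

```python
from typing import Dict, List, Set, Tuple


def precision_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Uninterpolated average precision for a single query."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant document's rank
    return total / len(relevant)


def mean_average_precision(runs: Dict[str, Tuple[List[str], Set[str]]]) -> float:
    """Mean of the per-query average precision; runs maps query id -> (ranking, relevant set)."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs.values()) / len(runs)


# Toy example (hypothetical data): relevant documents appear at ranks 1 and 3.
ranking = ["d1", "d2", "d3", "d4"]
relevant_docs = {"d1", "d3"}
ap = average_precision(ranking, relevant_docs)  # (1/1 + 2/3) / 2
```

Relevant documents that are never retrieved still count in the denominator of `average_precision`, which is what makes the measure sensitive to recall as well as to ranking order.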

15.
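Current information retrieval systems typically adopt a multi-stage "retrieve + rerank" pipeline. Retrieval models based on dense representations are increasingly used in first-stage retrieval and have shown better performance than traditional sparse vector space models. Because first-stage retrieval must be efficient, these models mostly adopt a bi-encoder architecture: the query and the document are encoded independently, each into a single dense vector, and a simple similarity function over the two representations scores the query-document pair. However, the query is unknown when the document is encoded, and a document usually covers more topics than a query, so this simple single-representation model may lose a substantial amount of document information. To address this problem, a new semantic retrieval method, MDR (multi-representation dense retrieval), is designed, which encodes a document into multiple dense vector representations. The method also introduces a coverage mechanism to keep the vectors distinct from one another so that they cover the information on different topics in the document. To evaluate the model, passage ranking and document ranking tasks were run on the MS MARCO dataset, and the experimental results demonstrate the effectiveness of MDR.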
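
The difference between scoring a document with one dense vector and with several can be sketched in Python. This is only an illustration under stated assumptions, not the MDR method itself: the abstract does not say how scores over the multiple document vectors are combined, so taking the maximum dot product is an assumption made here, the vectors are hand-written rather than produced by an encoder, and the coverage mechanism is not modeled. All function names are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot-product similarity between two dense vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def single_vector_score(query: Vector, doc: Vector) -> float:
    """Bi-encoder scoring: one vector per query and one per document."""
    return dot(query, doc)


def multi_vector_score(query: Vector, doc_vectors: List[Vector]) -> float:
    """Score a document represented by several vectors.

    Assumption: the best-matching document vector determines the score.
    """
    return max(dot(query, v) for v in doc_vectors)


def rank(query: Vector, corpus: Dict[str, List[Vector]]) -> List[str]:
    """Return document ids sorted by descending multi-vector score."""
    return sorted(corpus, key=lambda d: multi_vector_score(query, corpus[d]), reverse=True)


# Hypothetical 2-d embeddings: document "a" covers two topics, "b" only one.
corpus = {
    "a": [[0.2, 0.9], [0.9, 0.1]],
    "b": [[0.5, 0.5]],
}
query_vec = [1.0, 0.0]
ranking_result = rank(query_vec, corpus)
```

In this toy corpus, document "a" is ranked first because one of its two vectors aligns with the query, even though its other vector does not; a single averaged vector for "a" would score lower.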

16.
In this paper, we present clear and formal definitions of the ranking factors that should be considered in opinion retrieval and propose a new opinion retrieval model which simultaneously combines the factors from the generative modeling perspective. The proposed model formally unifies relevance-based ranking with subjectivity detection at the document level by taking multiple ranking factors into consideration: topical relevance, subjectivity strength, and opinion-topic relatedness. The topical relevance measures how strongly a document relates to a given topic, and the subjectivity strength indicates the likelihood that the document contains subjective information. The opinion-topic relatedness reflects whether the subjective information is expressed with respect to the topic of interest. We also show the universality of our model by introducing derivations of the model that represent other existing opinion retrieval approaches. Experimental results on a large-scale blog retrieval test collection demonstrate that not only are the individual ranking factors necessary in opinion retrieval but they cooperate advantageously to produce a better document ranking when used together. The retrieval performance of the proposed model is comparable to that of previous systems in the literature.

17.
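Full-text retrieval is one of the key technologies of intelligent information management. Oracle Text, a component of Oracle, provides powerful full-text retrieval; with Oracle as the back-end database, its full-text retrieval can be fully exploited to build complex, large-scale document management systems. This paper introduces the architecture of Oracle Text and its application and implementation in an e-government system, discusses the design of an e-government full-text retrieval application built on Oracle Text as a component, with emphasis on the Oracle Text architecture, examines how full-text retrieval is implemented on Oracle Text, and describes concrete practice using a typical e-government business process as an example. The work is of practical value for the future design and development of full-text retrieval in e-government.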

18.
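Because traditional document management methods struggle to meet the needs of storing, retrieving and using document materials, an electronic document management system was designed that implements user management, document management, document browsing, print management and system management. The system provides an efficient way to manage document materials and gives users convenient, fast information sharing.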

19.
In this paper, we present a new method for query reweighting to deal with document retrieval. The proposed method uses genetic algorithms to reweight a user's query vector, based on the user's relevance feedback, to improve the performance of document retrieval systems. It encodes a user's query vector into chromosomes and searches for the optimal weights of query terms for retrieving documents by genetic algorithms. After the best chromosome is found, the proposed method decodes the chromosome into the user's query vector for dealing with document retrieval. The proposed query reweighting method can find the best weights of query terms in the user's query vector, based on the user's relevance feedback. It can increase the precision rate and the recall rate of the document retrieval system for dealing with document retrieval.

20.
A new approach is described for the fusion of multimedia information based on the concept of active documents advertising on the Internet, whereby the metadata of a document travels in the network to seek out documents of interest to the parent document and, at the same time, advertises its parent document to other interested documents. This abstraction of metadata is called an adlet, which is the core of our approach. Two important features make this approach applicable to multimedia information fusion, information retrieval, data mining, geographic information systems, and medical information systems: 1) any document, including a Web page, database record, video file, audio file, image and even paper documents, can be enhanced by an adlet and become an active document; and 2) any node in a nonactive network can be enhanced by adlet-savvy software and the adlet-enhanced node can coexist with other nonenhanced nodes. An experimental prototype provides a testbed for feasibility studies in a hybrid active network.
