首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 29 毫秒
1.
This paper presents a knowledge-based approach to managing and retrieving personal documents. The dual document models consist of a document type hierarchy and a folder organization. The document type hierarchy is used to capture the layout, logical and conceptual structures of documents. The folder organization mimics the user's real-world document filing system for organizing and storing documents in an office environment. Predicate-based representation of documents is formalized for specifying knowledge about documents. Document filing and retrieval are predicate-driven. The filing criteria for the folders, which are specified in terms of predicates, govern the grouping of frame instances, regardless of their document types. We incorporated the notions of document type hierarchy and folder organization into the multilevel architecture of document storage. This architecture supports various text-based information retrieval techniques and content-based multimedia information retrieval techniques. The paper also proposes a knowledge-based query-preprocessing algorithm, which reduces the search space. For automating the document filing and retrieval, a predicate evaluation engine with a knowledge base is proposed. The learning agent is responsible for acquiring the knowledge needed by the evaluation engine.  相似文献   

2.
The rapid growth of multimedia documents has raised huge demand for sophisticated multimedia knowledge discovery systems. The knowledge extraction of the documents mainly relies on the data representation model and the document representation model. As the multimedia document comprised of multimodal multimedia objects, the data representation depends on modality of the objects. The multimodal objects require distinct processing and feature extraction methods resulting in different features with different dimensionalities. Managing multiple types of features is challenging for knowledge extraction tasks. The unified representation of multimedia document benefits the knowledge extraction process, as they are represented by same type of features. The appropriate document representation will benefit the overall decision making process by reducing the search time and memory requirements. In this paper, we propose a domain converting method known as Multimedia to Signal converter (MSC) to represent the multimodal multimedia document in an unified representation by converting multimodal objects as signal objects. A tree based approach known as Multimedia Feature Pattern (MFP) tree is proposed for the compact representation of multimedia documents in terms of features of multimedia objects. The effectiveness of the proposed framework is evaluated by performing the experiments on four multimodal datasets. Experimental results show that the unified representation of multimedia documents helped in improving the classification accuracy for the documents. The MFP tree based representation of multimedia documents not only reduces the search time and memory requirements, also outperforms the competitive approaches for search and retrieval of multimedia documents.  相似文献   

3.
4.
文本聚类是自然语言处理中的一项重要研究课题,主要应用于信息检索和Web挖掘等领域。其中的关键是文本的表示和聚类算法。在层次聚类的基础上,提出了一种新的基于边界距离的层次聚类算法,该方法通过选择两个类间边缘样本点的距离作为类间距离,有效地利用类的边界信息,提高类间距离计算的准确性。综合考虑不同词性特征对文本的贡献,采用多向量模型对文本进行表示。不同文本集上的实验表明,基于边界距离的多向量文本聚类算法取得了较好的性能。  相似文献   

5.
林泽琦  邹艳珍  赵俊峰  曹英魁  谢冰 《软件学报》2019,30(12):3714-3729
自然语言文本形式的文档是软件项目的重要组成部分.如何帮助开发者在大量文档中进行高效、准确的信息定位,是软件复用领域中的一个重要研究问题.提出了一种基于代码结构知识的软件文档语义搜索方法.该方法从软件项目的源代码中解析出代码结构图,并以此作为领域特定的知识来帮助机器理解自然语言文本的语义.这一语义信息与信息检索技术相结合,从而实现了对软件文档的语义检索.在StackOverflow问答文档数据集上的实验表明,与多种文本检索方法相比,该方法在平均准确率(mean average precision,简称MAP)上可以取得至少13.77%的提升.  相似文献   

6.
《Computer Networks》1999,31(11-16):1403-1419
The paper argues for the use of general and intuitive knowledge representation languages (and simpler notational variants, e.g. subsets of natural languages) for indexing the content of Web documents and representing knowledge within them. We believe that these languages have advantages over metadata languages based on the Extensible Mark-up Language (XML). Indeed, the retrieval of precise information is better supported by languages designed to represent semantic content and support logical inference, and the readability of such a language eases its exploitation, presentation and direct insertion within a document (thus also avoiding information duplication). We advocate the use of Conceptual Graphs and simpler notational variants that enhance knowledge readability. To further ease the representation process, we propose techniques allowing users to leave some knowledge terms undeclared. We also show how lexical, structural and knowledge-based techniques may be combined to retrieve or generate knowledge or Web documents. To support and guide the knowledge modeling approach, we present a top-level ontology of 400 concept and relation types. We have implemented these features in a Web-accessible tool named WebKB2, and show examples to illustrate them.  相似文献   

7.
8.
曹存根  眭跃飞  孙瑜  曾庆田 《软件学报》2006,17(8):1731-1742
数学知识表示是知识表示中的一个重要方面,是数学知识检索、自动定理机器证明、智能教学系统等的基础.根据在设计NKI(national knowledge infrastructure)的数学知识表示语言中遇到的问题,并在讨论了数学对象的本体论假设的基础上提出了两种数学知识的表示方法:一种是以一个逻辑语言上的公式为属性值域的描述逻辑;另一种是以描述逻辑描述的本体为逻辑语言的一部分的一阶逻辑.在前者的表示中,如果对公式不作任何限制,那么得到的知识库中的推理不是可算法化的;在后者的表示中,以描述逻辑描述的本体中的推理是可算法化的,而以本体为逻辑语言的一部分的一阶逻辑所表示的数学知识中的推理一般是不可算法化的.因此,在表示数学知识时,需要区分概念性的知识(本体中的知识)和非概念性的知识(用本体作为语言表示的知识).框架或者描述逻辑可以表示和有效地推理概念性知识,但如果将非概念性知识加入到框架或知识库中,就可能使得原来可以有效推理的框架所表示的知识库不存在有效的推理算法,甚至不存在推理算法.为此,建议在表示数学知识时,用框架或描述逻辑来表示概念性知识;然后,用这样表示的知识库作为逻辑语言的一部分,以表示非概念性知识.  相似文献   

9.
A Knowledge-Based Approach to Effective Document Retrieval   总被引:3,自引:0,他引:3  
This paper presents a knowledge-based approach to effective document retrieval. This approach is based on a dual document model that consists of a document type hierarchy and a folder organization. A predicate-based document query language is proposed to enable users to precisely and accurately specify the search criteria and their knowledge about the documents to be retrieved. A guided search tool is developed as an intelligent natural language oriented user interface to assist users formulating queries. Supported by an intelligent question generator, an inference engine, a question base, and a predicate-based query composer, the guided search collects the most important information known to the user to retrieve the documents that satisfy users' particular interests. A knowledge-based query processing and search engine is devised as the core component in this approach. Algorithms are developed for the search engine to effectively and efficiently retrieve the documents that match the query.  相似文献   

10.
Information retrieval has evolved from searches of references, to abstracts, to documents. Search on the Web involves search engines that promise to parse full-text and other files: audio, video, and multimedia. With the indexable Web at 320 million pages and growing, difficulties with locating relevant information have become apparent. The most prevalent means for information retrieval relies on syntax-based methods: keywords or strings of characters are presented to a search engine, and it returns all the matches in the available documents. This method is satisfactory and easy to implement, but it has some inherent limitations that make it unsuitable for many tasks. Instead of looking for syntactical patterns, the user often is interested in keyword meaning or the location of a particular word in a title or header. This paper describes some precise search approaches in the environmental domain that locate information according to syntactic criteria, augmented by the utilization of information in a certain context. The main emphasis of this paper lies in the treatment of structured knowledge, where essential aspects about the topic of interest are encoded not only by the individual items, but also by their relationships among each other. Examples for such structured knowledge are hypertext documents, diagrams, logical and chemical formulae. Benefits of this approach are enhanced precision and approximate search in an already focused, context-specific search engine for the environment: EnviroDaemon.  相似文献   

11.
Semantic search has been one of the motivations of the semantic Web since it was envisioned. We propose a model for the exploitation of ontology-based knowledge bases to improve search over large document repositories. In our view of information retrieval on the semantic Web, a search engine returns documents rather than, or in addition to, exact values in response to user queries. For this purpose, our approach includes an ontology-based scheme for the semiautomatic annotation of documents and a retrieval system. The retrieval model is based on an adaptation of the classic vector-space model, including an annotation weighting algorithm, and a ranking algorithm. Semantic search is combined with conventional keyword-based retrieval to achieve tolerance to knowledge base incompleteness. Experiments are shown where our approach is tested on corpora of significant scale, showing clear improvements with respect to keyword-based search  相似文献   

12.
针对因应急文档知识查找和利用效率不高造成应急决策者不能快速有效制定应急决策的问题,从知识系统工程的角度出发,结合知识元理论对应急文档知识进行结构化建模,为决策者快速有效地使用应急文档知识提供了一种新的途径。通过对物理结构分析提取元数据和进行文档结构化处理,对逻辑结构分析提出知识元提取的方法,知识元导航链接建立知识与结构化文档间的关联,进行知识推理与检索,并对应急文档的细粒度知识挖掘模式进行了深入的探讨。最后开发了应急决策知识支持系统原型并进行了验证,结果表明该建模方法能有效解决应急文档知识查找和利用效率不高的问题。  相似文献   

13.
Term mismatch is a common limitation of traditional information retrieval (IR) models where relevance scores are estimated based on exact matching of documents and queries. Typically, good IR model should consider distinct but semantically similar words in the matching process. In this paper, we propose a method to incorporate word embedding (WE) semantic similarities into existing probabilistic IR models for Arabic in order to deal with term mismatch. Experiments are performed on the standard Arabic TREC collection using three neural word embedding models. The results show that extending the existing IR models improves significantly baseline bag-of-words models. Although the proposed extensions significantly outperform their baseline bag-of-words, the difference between the evaluated neural word embedding models is not statistically significant. Moreover, the overall comparison results show that our extensions significantly improve the Arabic WordNet based semantic indexing approach and three recent WE-based IR language models.  相似文献   

14.
基于文档实例的中文信息检索   总被引:2,自引:0,他引:2  
传统的信息检索系统基于关键词建立索引并进行信息检索.这些系统存在查询返回文档集大、准确率低和普通用户不便于构造查询等不足.为此,该文提出基于文档实例的信息检索,即以已有文档作为样本,在文档库中检索与样本文档相似的所有文档.文中给出了基于文档实例的中文信息检索的解决方法和实现技术.初步实验结果表明该方法是行之有效的.  相似文献   

15.
World Wide Web search engines including Google, Yahoo and MSN have become the most heavily-used online services (including the targeted advertising), with millions of searches performed each day on unstructured sites. In this presentation, we would like to go beyond the traditional web search engines that are based on keyword search and the Semantic Web which provides a common framework that allows data to be shared and reused across application. For this reason, our view is that “Before one can use the power of web search the relevant information has to be mined through the concept-based search mechanism and logical reasoning with capability to Q&A representation rather than simple keyword search”. In this paper, we will first present the state of the search engines. Then we will focus on development of a framework for reasoning and deduction in the web. A new web search model will be presented. One of the main core ideas that we will use to extend our technique is to change terms-documents-concepts (TDC) matrix into a rule-based and graph-based representation. This will allow us to evolve the traditional search engine (keyword-based search) into a concept-based search and then into Q&A model. Given TDC, we will transform each document into a rule-based model including it’s equivalent graph model. Once the TDC matrix has been transformed into maximally compact concept based on graph representation and rules based on possibilistic relational universal fuzzy-type II (pertaining to composition), one can use Z(n)-compact algorithm and transform the TDC into a decision-tree and hierarchical graph that will represents a Q&A model. Finally, the concept of semantic equivalence and semantic entailment based on possibilistic relational universal fuzzy will be used as a basis for question-answering (Q&A) and inference from fuzzy premises. This will provide a foundation for approximate reasoning, language for representation of imprecise knowledge, a meaning representation language for natural languages, precisiation of fuzzy propositions expressed in a natural language, and as a tool for Precisiated Natural Language (PNL) and precisation of meaning. The maximally compact documents based on Z(n)-compact algorithm and possibilistic relational universal fuzzy-type II will be used to cluster the documents based on concept-based query-based search criteria. This Paper is dedicated to Prof. Lotfi A. Zadeh, father of Fuzzy Logic “Zadeh Logic”.  相似文献   

16.
Abstract: Vast amounts of medical information reside within text documents, so that the automatic retrieval of such information would certainly be beneficial for clinical activities. The need for overcoming the bottleneck provoked by the manual construction of ontologies has generated several studies and research on obtaining semi-automatic methods to build ontologies. Most techniques for learning domain ontologies from free text have important limitations. Thus, they can extract concepts so that only taxonomies are generally produced although there are other types of semantic relations relevant in knowledge modelling. This paper presents a language-independent approach for extracting knowledge from medical natural language documents. The knowledge is represented by means of ontologies that can have multiple semantic relationships among concepts.  相似文献   

17.
More people than ever before have access to information with the World Wide Web; information volume and number of users both continue to expand. Traditional search methods based on keywords are not effective, resulting in large lists of documents, many of which unrelated to users’ needs. One way to improve information retrieval is to associate meaning to users’ queries by using ontologies, knowledge bases that encode a set of concepts about one domain and their relationships. Encoding a knowledge base using one single ontology is usual, but a document collection can deal with different domains, each organized into an ontology. This work presents a novel way to represent and organize knowledge, from distinct domains, using multiple ontologies that can be related. The model allows the ontologies, as well as the relationships between concepts from distinct ontologies, to be represented independently. Additionally, fuzzy set theory techniques are employed to deal with knowledge subjectivity and uncertainty. This approach to organize knowledge and an associated query expansion method are integrated into a fuzzy model for information retrieval based on multi-related ontologies. The performance of a search engine using this model is compared with another fuzzy-based approach for information retrieval, and with the Apache Lucene search engine. Experimental results show that this model improves precision and recall measures.  相似文献   

18.
In this paper, we present a framework for a feedback process to implement a highly accurate document retrieval system. In the system, a document vector space is created dynamically to implement retrieval processing. The retrieval accuracy of the system depends on the vector space. When the vector space is created based on a specific purpose and interest of a user, highly accurate retrieval results can be obtained. In this paper, we present a method for analyzing and personalizing the vector space according to the purposes and interests of users. In order to optimize the document vector space, we defined and implemented functions for the operations of adding, deleting and weighting the terms that were used to create the vector space. By exploiting effectively and dynamically the classified-document information related to the queries, our methods allow users to retrieve relevant documents for their interests and purposes. Even if the search results of the initial retrieval space are not appropriate, by applying the proposed feedback operations, our proposed method effectively improves the search results. We also implemented an experimental search system for semantic document retrieval. Several experimental results including comparisons of our method with the traditional relevance feedback method is presented to clarify how retrieval accuracy was improved by the feedback process and how accurately documents that satisfied the purpose and interests of users were extracted.  相似文献   

19.
Recovering traceability links between code and documentation   总被引:2,自引:0,他引:2  
Software system documentation is almost always expressed informally in natural language and free text. Examples include requirement specifications, design documents, manual pages, system development journals, error logs, and related maintenance reports. We propose a method based on information retrieval to recover traceability links between source code and free text documents. A premise of our work is that programmers use meaningful names for program items, such as functions, variables, types, classes, and methods. We believe that the application-domain knowledge that programmers process when writing the code is often captured by the mnemonics for identifiers; therefore, the analysis of these mnemonics can help to associate high-level concepts with program concepts and vice-versa. We apply both a probabilistic and a vector space information retrieval model in two case studies to trace C++ source code onto manual pages and Java code to functional requirements. We compare the results of applying the two models, discuss the benefits and limitations, and describe directions for improvements.  相似文献   

20.
In this paper, we present a logical representation for form documents to be used for identification and retrieval. A hierarchical structure is proposed to represent the structure of a form by using lines and the XY-tree approach. The approach is top-down and no domain knowledge such as the preprinted data or filled-in data is used. Geometrical modifications and slight variations are handled by this representation. Logically identical forms are associated to the same or similar hierarchical structure. Identification and the retrieval of similar forms are performed by computing the edit distances between the generated trees. Received: August 21, 2001 / Accepted: November 5, 2001  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号