Similar Documents
1.
A Knowledge-Based Approach to Effective Document Retrieval   Total citations: 3, self-citations: 0, citations by others: 3
This paper presents a knowledge-based approach to effective document retrieval. The approach is based on a dual document model that consists of a document type hierarchy and a folder organization. A predicate-based document query language is proposed to let users precisely specify their search criteria and their knowledge about the documents to be retrieved. A guided search tool is developed as an intelligent, natural-language-oriented user interface that assists users in formulating queries. Supported by an intelligent question generator, an inference engine, a question base, and a predicate-based query composer, the guided search collects the most important information known to the user in order to retrieve the documents that satisfy the user's particular interests. A knowledge-based query processing and search engine is devised as the core component of the approach, and algorithms are developed for it to retrieve the documents that match a query effectively and efficiently.

2.
This paper describes our research into a query-by-semantics approach to searching the World Wide Web. This research extends earlier work that focused on a query-by-structure approach for the Web. We present a system that allows users not only to request documents containing specific content information but also to specify that the documents be of a certain type. The system captures and uses structure information as well as content during a distributed query of the Web. It also lets users create their own document types by providing the system with example documents. In addition, although the system still gives users the option of querying the Web dynamically, incorporating a document database has improved search response time. The extensive testing and validation presented herein show that a system that incorporates structure and document semantic information into the query process can significantly improve search results over standard keyword search.

3.
Among developments in information technology, the most popular knowledge-seeking tools today are keyword-based Internet search engines such as Google and Yahoo. Users can easily obtain the information they need, but they still have to read and organize the returned documents themselves, so they spend much of their time browsing and skimming search results. To ease this process, this paper proposes a query-based ontology knowledge acquisition system that dynamically constructs a query-based partial ontology to provide effective answers to users' queries. The formal concept analysis approach is adopted to construct the relationships and hierarchy of concepts in such an ontology. Once the ontology is built, the system can deduce a specific answer from the ontology's relationships and hierarchy without requiring users to read the whole document set. To evaluate the precision of the system, we collected sports news pages on three topics (NBA, CPBL, and MLB) as source documents; the experimental results show that the proposed approach works effectively.
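Formal concept analysis, which entry 3 uses to build its concept hierarchy, can be illustrated on a tiny object-attribute context. The sketch below is not the paper's system: the three-page context is hypothetical, and the naive enumeration (every object subset X yields the concept (X'', X')) is exponential and only suitable for toy inputs.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical toy context: news page -> set of terms it contains.
CONTEXT: Dict[str, Set[str]] = {
    "page1": {"NBA", "playoffs", "Lakers"},
    "page2": {"NBA", "trade"},
    "page3": {"MLB", "playoffs"},
}

Concept = Tuple[FrozenSet[str], FrozenSet[str]]


def intent(objects: Iterable[str], context: Dict[str, Set[str]]) -> Set[str]:
    """Derivation operator A': attributes shared by all given objects."""
    attrs = set().union(*context.values())
    for obj in objects:
        attrs &= context[obj]
    return attrs


def extent(attrs: Set[str], context: Dict[str, Set[str]]) -> Set[str]:
    """Derivation operator B': objects that have every given attribute."""
    return {obj for obj, obj_attrs in context.items() if attrs <= obj_attrs}


def formal_concepts(context: Dict[str, Set[str]]) -> Set[Concept]:
    """Naive enumeration: for every object subset X, (X'', X') is a concept."""
    concepts: Set[Concept] = set()
    objects = list(context)
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            b = intent(subset, context)
            concepts.add((frozenset(extent(b, context)), frozenset(b)))
    return concepts


if __name__ == "__main__":
    # Print concepts from most general (largest extent) to most specific.
    for ext, inn in sorted(formal_concepts(CONTEXT), key=lambda c: -len(c[0])):
        print(sorted(ext), sorted(inn))
```

Here the context yields seven concepts, for example ({page1, page2}, {NBA}) and ({page1, page3}, {playoffs}); ordering them by extent inclusion gives the concept hierarchy an answering system could navigate.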

4.
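Knowledge-Based Web Page Retrieval Tool   Total citations: 3, self-citations: 0, citations by others: 3
With the widespread use of the Internet around the world, more and more people rely on it for research and business activities, and web search tools have become indispensable software. However, most popular search tools today are based on keyword queries and often suffer from information overload or the loss of useful information. There are two main causes: the query a user submits does not express the user's intent well, and the query results lack an effective indexing mechanism that guides people quickly to useful information. We therefore propose a knowledge-based web page retrieval tool (KWSE), which is built on existing search tools'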

5.
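A Unified Retrieval Method for Multi-Source MSDS on the Internet   Total citations: 1, self-citations: 0, citations by others: 1
The MSDS, known internationally as the Material Safety Data Sheet, is a comprehensive legal document describing the safety information of a chemical compound. As safety awareness grows and understanding of occupational health and environmental protection deepens, the MSDS is receiving increasing attention as a safety information document. Because producing an MSDS has a cost, making full use of the free MSDS data available online is of significant value for learning about the safety of compounds. Existing MSDS search tools can generally search only a single-source MSDS database, so an MSDS search engine that can search multiple source databases at once through a single query would be a great convenience to MSDS users. This paper proposes a framework for a unified MSDS search engine that obtains MSDS files through link analysis and deep-web data retrieval, then caches the retrieval results from each data source and builds a compound index to improve the search engine's response time. The implementation covers several aspects, including discovering and automatically constructing query-form patterns, automatically retrieving result pages, and using data extraction methods to obtain compound identification information for building the MSDS compound index, laying a solid foundation for a usable unified MSDS search engine.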
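The caching and compound-index idea in entry 5 can be sketched as a small federating class. This is a minimal illustration under assumptions: each source is a hypothetical callable standing in for the paper's deep-web query-form wrappers (not shown in the abstract), and the CAS registry number is assumed to serve as the compound identifier.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]                  # assumed keys: "cas", "url"
Source = Callable[[str], List[Record]]   # hypothetical per-source search wrapper
Hit = Tuple[str, str]                    # (source name, MSDS URL)


class UnifiedMSDSSearch:
    """Query several MSDS sources at once, caching per-source results."""

    def __init__(self, sources: Dict[str, Source]):
        self.sources = sources
        self.cache: Dict[Tuple[str, str], List[Record]] = {}
        self.compound_index: Dict[str, List[Hit]] = defaultdict(list)

    def search(self, query: str) -> List[Hit]:
        hits: List[Hit] = []
        for name, fetch in self.sources.items():
            key = (name, query)
            if key not in self.cache:
                # Only hit the remote source on a cache miss.
                self.cache[key] = fetch(query)
                for rec in self.cache[key]:
                    entry = (name, rec["url"])
                    if entry not in self.compound_index[rec["cas"]]:
                        self.compound_index[rec["cas"]].append(entry)
            hits.extend((name, rec["url"]) for rec in self.cache[key])
        return hits

    def by_compound(self, cas: str) -> List[Hit]:
        """Answer from the compound index without contacting any source."""
        return list(self.compound_index.get(cas, []))
```

Repeating a query is served from the cache, and once a compound has been seen, `by_compound` answers from the local index alone, which is where the response-time gain described above would come from.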

6.
ACIRD: intelligent Internet document organization and retrieval   Total citations: 6, self-citations: 0, citations by others: 6
This paper presents an intelligent Internet information system, Automatic Classifier for the Internet Resource Discovery (ACIRD), which uses machine learning techniques to organize and retrieve Internet documents. ACIRD consists of a knowledge acquisition process, a document classifier, and a two-phase search engine. The knowledge acquisition process automatically learns classification knowledge from classified Internet documents. The document classifier applies the learned classification knowledge to classify newly collected Internet documents into one or more classes. Experimental results indicate that ACIRD performs as well as or better than human experts in both knowledge acquisition and document classification. Using the learned classification knowledge and the given class lattice, the ACIRD two-phase search engine responds to user queries with hierarchically structured, navigable results (instead of a conventional flat ranked document list), which greatly helps users locate information among numerous, diverse Internet documents.

7.
More people than ever before have access to information through the World Wide Web, and both the volume of information and the number of users continue to grow. Traditional keyword-based search methods are not effective: they return large lists of documents, many of which are unrelated to users' needs. One way to improve information retrieval is to associate meaning with users' queries by using ontologies, knowledge bases that encode a set of concepts about a domain and their relationships. Encoding a knowledge base in a single ontology is common, but a document collection may span different domains, each organized into its own ontology. This work presents a novel way to represent and organize knowledge from distinct domains using multiple ontologies that can be related to one another. The model allows the ontologies, as well as the relationships between concepts from distinct ontologies, to be represented independently. In addition, fuzzy set theory techniques are employed to deal with the subjectivity and uncertainty of knowledge. This approach to organizing knowledge, together with an associated query expansion method, is integrated into a fuzzy model for information retrieval based on multi-related ontologies. The performance of a search engine using this model is compared with another fuzzy-based approach to information retrieval and with the Apache Lucene search engine. Experimental results show that the model improves precision and recall.
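Fuzzy query expansion over related ontologies, as entry 7 describes, can be sketched with standard fuzzy operators. This is an illustration, not the authors' model: the relation degrees are hypothetical, and the choice of the min t-norm to combine weights, the max t-conorm to aggregate evidence, and a one-hop expansion with a threshold are all assumptions.

```python
from typing import Dict

# Hypothetical fuzzy relations between concepts; related concepts may come from
# different ontologies (e.g. a "vehicles" ontology and a "mechanics" ontology).
RELATIONS: Dict[str, Dict[str, float]] = {
    "car": {"automobile": 1.0, "vehicle": 0.8, "engine": 0.5},
    "engine": {"motor": 0.9},
}


def expand_query(query: Dict[str, float], relations: Dict[str, Dict[str, float]],
                 threshold: float = 0.4) -> Dict[str, float]:
    """One-hop expansion: weight of a related term = min(query weight, degree)."""
    expanded = dict(query)
    for term, weight in query.items():
        for related, degree in relations.get(term, {}).items():
            w = min(weight, degree)  # min t-norm combines the two degrees
            if w >= threshold:
                # max t-conorm keeps the strongest evidence for each term
                expanded[related] = max(expanded.get(related, 0.0), w)
    return expanded


def score(doc: Dict[str, float], expanded: Dict[str, float]) -> float:
    """Fuzzy match; `doc` maps term -> membership degree of the document in it."""
    return max((min(w, doc.get(t, 0.0)) for t, w in expanded.items()), default=0.0)


if __name__ == "__main__":
    q = expand_query({"car": 1.0}, RELATIONS)
    print(q)                            # car, automobile, vehicle, engine
    print(score({"vehicle": 0.9}, q))   # matched through the expanded term
```

A document tagged only with "vehicle" now matches a "car" query with degree 0.8, while "motor" is not reached because the sketch expands only one hop; raising `threshold` trades recall for precision.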

8.
A content-search information retrieval process based on conceptual graphs   Total citations: 1, self-citations: 0, citations by others: 1
An intelligent information retrieval system is presented in this paper. In our approach, which complies with the logical view of information retrieval, queries, document contents, and other knowledge are represented by expressions in a knowledge representation language based on the conceptual graphs introduced by Sowa. To take the intrinsic vagueness of information retrieval into account, i.e., to search imprecisely and incompletely represented documents in order to answer a vague query, different kinds of probabilistic logic are often used. The search process described in this paper uses graph transformations instead of probabilistic notions. The paper focuses on the content-based retrieval process; the cognitive facet of information retrieval is not directly addressed. However, our approach, which uses a knowledge representation language to represent data and a search process based on a combinatorial implementation of van Rijsbergen's logical uncertainty principle, also allows retrieval situations to be represented. Hence, we believe that it could be implemented at the core of an operational information retrieval system. Two applications, one dealing with academic libraries and the other with audiovisual documents, are briefly presented.

9.
We seek to leverage an expert user's knowledge about how information is organized in a domain, and how it is presented in typical documents of a domain-specific collection, to meet the expert's targeted information needs effectively and efficiently. We have developed the semantic components model to describe important semantic content within documents. The semantic components model for a given collection (based on a general understanding of the type of information needs expected) consists of a set of document classes, each with an associated set of semantic components. Each semantic component instance consists of segments of text about a particular aspect of the document's main topic and may not correspond to structural elements in the document. The model represents document content in a way that is complementary to full-text and keyword indexing. This paper describes how the semantic components model can be used to improve an information retrieval system. We present experimental evidence from a large interactive search study that compared a system with full-text and keyword indexing plus semantic components, whose query language we extended so that users could search by semantic component, against a base system without semantic components. We evaluate the systems from a system perspective, where semantic components were shown to improve document ranking for precision-oriented searches, and from a user perspective. We also evaluate them from a session-based perspective, considering not only the results of individual queries but also the results of multiple queries within a single interactive search session.

10.
The huge volume of distributed information now available in electronic multimedia documents forces many people to spend a significant share of their time looking for documents that contain information useful to them. Filtering electronic documents seems hard to automate, partly because of document heterogeneity but mainly because it is difficult to train computers to understand the contents of these documents and make decisions based on subjective user criteria. In this paper, we propose a model for automating content-based electronic document filtering that supports multimedia documents in a wide variety of forms. The model is based on multi-agent technology and uses an adaptive knowledge base organized as a set of logical rules. Implementations of the model using a client-server architecture should be able to access documents distributed over an intranet or the Internet efficiently.

11.
The increasing use of computers for transactions and communication has created mountains of data that contain potentially valuable knowledge. To search for this knowledge, we have to develop a new generation of tools capable of flexible querying and intelligent searching. In this paper, we introduce an extension of a fuzzy query language called Summary SQL that can be used for knowledge discovery and data mining, and show how it can be used to search for fuzzy rules.
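Fuzzy querying of the kind entry 11 describes typically evaluates linguistically quantified summaries such as "most employees are young." The sketch below computes the truth degree of such a summary as Q(mean membership), the classic Zadeh/Yager formulation; it is not Summary SQL syntax, and the membership function for "young" and the quantifier "most" are hypothetical shapes chosen for illustration.

```python
from typing import Callable, Sequence


def young(age: float) -> float:
    """Hypothetical membership: fully 'young' up to 25, not young from 40."""
    if age <= 25:
        return 1.0
    if age >= 40:
        return 0.0
    return (40 - age) / 15


def most(proportion: float) -> float:
    """Hypothetical quantifier 'most': 0 below 30%, 1 above 80%, linear between."""
    if proportion <= 0.3:
        return 0.0
    if proportion >= 0.8:
        return 1.0
    return (proportion - 0.3) / 0.5


def summary_truth(values: Sequence[float],
                  summarizer: Callable[[float], float],
                  quantifier: Callable[[float], float]) -> float:
    """Truth of 'Q records are S' = Q(average membership of records in S)."""
    proportion = sum(summarizer(v) for v in values) / len(values)
    return quantifier(proportion)


if __name__ == "__main__":
    ages = [22, 24, 30, 45]
    print(round(summary_truth(ages, young, most), 3))  # "most are young"
```

For the ages 22, 24, 30, and 45, the memberships are 1, 1, 2/3, and 0, their mean is 2/3, and the summary "most are young" holds to degree about 0.733. Searching for fuzzy rules amounts to scoring many candidate summarizers this way and keeping those above a truth threshold.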

12.
Personalization is increasingly vital, especially for enterprises that need to reach their customers. The key challenge in supporting personalization is the need for rich metadata, such as metadata about structural relationships, subject/concept relations between documents, and cognitive metadata about documents (e.g., the difficulty of a document). Manually annotating large knowledge bases with such rich metadata does not scale. Automatic mining of cognitive metadata is also challenging, since it is very difficult to infer the underlying intellectual knowledge in a document automatically. Moreover, Web content is increasingly multilingual, because a growing share of the data generated on the Web is non-English. Current metadata extraction systems are generally designed for English content and must be rethought to adapt to the changing dynamics of the Web. To alleviate these problems, we introduce a new automatic metadata extraction framework that relies on a novel fuzzy-based method for automatic cognitive metadata generation and uses different document parsing algorithms to extract rich metadata from multilingual enterprise content, using the newly developed DocBook, Resource Type, and Topic ontologies. Because the metadata generation process is based on DocBook-structured enterprise content, the framework targets enterprise documents and content that loosely follow DocBook-style formatting. DocBook is a common documentation format for formally producing corporate content and has been adopted by many enterprises. The proposed framework is illustrated and evaluated on the English, German, and French versions of the Symantec Norton 360 knowledge bases. A user study showed that the proposed fuzzy-based method generates reasonably accurate values, with an average precision of 89.39% on the metadata values of document difficulty, document interactivity level, and document interactivity type.
The proposed fuzzy inference system improves on a rule-based reasoner for difficulty metadata extraction (∼11% enhancement). In addition, user-perceived metadata quality scores were high (mean of 5.57 out of 6), and automated metadata analysis showed that the extracted metadata is of high quality and suitable for personalized information retrieval.

13.
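Non-fully-structured (NFS) XML queries allow users to describe their query requirements with partial XML structural information, or even with keywords alone, and are an important query method when complete structural information about XML documents is unavailable. To address the problem of judging meaningful results for NFS queries under a graph model, this paper builds on the PE model and proposes GPE, a graph-based model for judging meaningful results that comprises result granularity, pattern entity definitions, equivalent-pattern definitions, and judgment rules. To handle label ambiguity and complex structural semantics, GPE introduces an equivalent-pattern computation method that combines context-constrained label semantic similarity, based on a domain dictionary, with pattern structural similarity. Experiments on real data sets and XML benchmark data show that the GPE model substantially improves both precision and recall.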

14.
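As a new data publishing language on the Web, XML is set to become the unified standard for the next generation of "data representation" and "data exchange" on the Web. However, XML documents are rarely static; they are frequently modified. With the introduction of "temporal representation," temporal XML documents can record a series of modification traces and the evolution of the data. This paper proposes four methods for mapping a bitemporal XML data model to bitemporal XML documents, and finally compares these mapping methods, and the scenarios each suits, through experiments.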

15.
The sheer volume of information on the Web, and the variety of sources from which it may be retrieved, make searching those sources difficult. Meta-search engines can usually search only Web pages or documents; other major sources, such as databases, library collections, and so-called Web databases, are not covered. Given these restrictions, an effective retrieval technology for a much wider range of sources becomes increasingly important. In our previous work, we proposed an Integrated Retrieval (IIR) approach, based on the Common Object Request Broker Architecture (CORBA), that spares clients the complicated semantics of federating multiple sources. In this paper, we present an IIR-based prototype of an integrated information gathering system. It offers a unified interface for querying sources with heterogeneous interfaces or protocols, and uses an SQL-compatible query language for the heterogeneous backend targets. We use it to link two general search engines (Yahoo and AltaVista), a science paper explorer (IEEE), and two library corpus explorers. We also perform preliminary measurements to assess the system's potential. The results show that the overhead incurred for each source the system queries is reasonable; that is, using IIR to construct an integrated gathering system incurs low overhead.

16.
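This paper studies how to efficiently query the temporal information of XML documents that represent temporal information with a single attribute, and extends XQuery, a relatively powerful XML query language, with temporal query support. The paper first discusses how to resolve the semantic distortion of the special element now in temporal databases, then extends XQuery bitemporally, and finally gives example queries using the extension.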
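One common way to avoid distorting the meaning of "now" in bitemporal data is to store it as a symbolic bound and bind it only when a query is evaluated, rather than materializing a fixed date that would make a current version appear to expire. The Python sketch below illustrates that idea under assumptions: it is not the paper's XQuery extension, the `BitemporalVersion` record and the closed-interval semantics are hypothetical simplifications, and the salary history is made up.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Union

NOW = "now"              # symbolic bound: "until the moment of evaluation"
Bound = Union[date, str]


@dataclass
class BitemporalVersion:
    value: str
    vt_start: date   # valid time: when the fact holds in the real world
    vt_end: Bound
    tt_start: date   # transaction time: when the database recorded it
    tt_end: Bound


def _bind(bound: Bound, current: date) -> date:
    # Bind "now" at query time; storing a concrete date here instead would
    # freeze the interval at the insertion date.
    return current if bound == NOW else bound


def snapshot(versions: List[BitemporalVersion], valid_at: date,
             known_at: date, current: date) -> List[str]:
    """Values valid at `valid_at` as recorded at `known_at` (closed intervals)."""
    return [v.value for v in versions
            if v.vt_start <= valid_at <= _bind(v.vt_end, current)
            and v.tt_start <= known_at <= _bind(v.tt_end, current)]
```

For example, if a salary of 3000 was recorded with valid time open to "now" and a raise to 3500 effective 2022-01-01 was entered on 2022-01-05 (closing the old version's transaction time and inserting a corrected copy), a query about February 2022 returns 3000 when asked "as known on 2022-01-02" and 3500 when asked "as known in 2023."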

17.
Maintaining, customizing, sharing, and reusing ISO9000 quality documents are essential for many organizations, especially those that work as virtual enterprises (VEs). In a VE, documents must be shared among organizations to take full advantage of recent Internet advances. XML is a new browser-based language standard. The purpose of this research is to explore the capabilities of XML and Internet technologies in electronic document management environments for complying with ISO9000 requirements. The research demonstrates several XML-enabled examples that benefit the main functions of ISO9000 document management, such as document creation, change, control, and access. The implemented examples demonstrate the effectiveness and efficiency of document customizing, querying, hierarchical linking, tracking, and reusing. The results help solve ISO9000 document-related problems among working partners and facilitate document flow and information integration along the value chain.

18.
This paper presents a knowledge-based approach to managing and retrieving personal documents. The dual document model consists of a document type hierarchy and a folder organization. The document type hierarchy captures the layout, logical, and conceptual structures of documents. The folder organization mimics the user's real-world filing system for organizing and storing documents in an office environment. A predicate-based representation of documents is formalized for specifying knowledge about documents, and document filing and retrieval are predicate-driven. The filing criteria of the folders, which are specified in terms of predicates, govern the grouping of frame instances regardless of their document types. We incorporated the notions of document type hierarchy and folder organization into a multilevel architecture for document storage. This architecture supports various text-based information retrieval techniques and content-based multimedia information retrieval techniques. The paper also proposes a knowledge-based query-preprocessing algorithm that reduces the search space. To automate document filing and retrieval, a predicate evaluation engine with a knowledge base is proposed, and a learning agent is responsible for acquiring the knowledge needed by the evaluation engine.
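Predicate-driven filing, as described in entries 1 and 18, can be sketched by giving each folder a filing criterion and filing a document into every folder whose predicate holds, independent of its document type. The `Document` and `Folder` classes, the attribute names, and the use of Python callables as predicates are illustrative assumptions, not the paper's frame representation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Predicate = Callable[["Document"], bool]


@dataclass
class Document:
    doc_type: str              # position in the (hypothetical) type hierarchy
    attrs: Dict[str, object]   # frame slots such as project, year, author


@dataclass
class Folder:
    name: str
    criterion: Predicate       # filing criterion expressed as a predicate
    contents: List[Document] = field(default_factory=list)


def file_document(doc: Document, folders: List[Folder]) -> List[str]:
    """File `doc` into every folder whose criterion holds, whatever its type."""
    filed = []
    for folder in folders:
        if folder.criterion(doc):
            folder.contents.append(doc)
            filed.append(folder.name)
    return filed


def retrieve(folders: List[Folder], folder_name: str, query: Predicate) -> List[Document]:
    """Evaluate a query predicate inside one folder only, shrinking the search space."""
    for folder in folders:
        if folder.name == folder_name:
            return [d for d in folder.contents if query(d)]
    return []


if __name__ == "__main__":
    folders = [
        Folder("Project X", lambda d: d.attrs.get("project") == "X"),
        Folder("2023", lambda d: d.attrs.get("year") == 2023),
    ]
    print(file_document(Document("memo", {"project": "X", "year": 2023}), folders))
```

A memo for project X dated 2023 lands in both folders, while an invoice for another project lands only in "2023"; restricting a query to a folder whose criterion is known to hold mirrors, in a simple way, the search-space reduction the abstract attributes to query preprocessing.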

19.
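Enterprise search provides a platform for improving an enterprise's core competitiveness and strengthening internal resource sharing and knowledge innovation. To give users real-time, fast, and accurate integrated information query services, Shantou Power Supply Bureau built a powerful search engine system that integrates search tools with document management tools to provide full-text search over HTML pages and documents of various formats published and stored in its business systems, giving employees a convenient and fast way to store and find information.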

20.
Semantic search has been one of the motivations of the Semantic Web since it was envisioned. We propose a model that exploits ontology-based knowledge bases to improve search over large document repositories. In our view of information retrieval on the Semantic Web, a search engine returns documents rather than, or in addition to, exact values in response to user queries. For this purpose, our approach includes an ontology-based scheme for the semiautomatic annotation of documents, and a retrieval system. The retrieval model is an adaptation of the classic vector-space model, including an annotation weighting algorithm and a ranking algorithm. Semantic search is combined with conventional keyword-based retrieval to achieve tolerance to knowledge base incompleteness. Experiments on corpora of significant scale show clear improvements over keyword-based search.
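The combination of annotation-based vector-space ranking with keyword retrieval in entry 20 can be sketched as two TF-IDF cosine scores mixed linearly. This is an assumption-laden illustration rather than the authors' weighting algorithm: the normalized-TF times log-IDF weighting, the equal query weights, the mixing parameter `lam`, and the concept identifiers are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def tfidf_vectors(docs: List[List[str]]) -> List[Vector]:
    """TF (normalized by max frequency) times IDF = log(N / df)."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    vectors = []
    for d in docs:
        tf = Counter(d)
        max_tf = max(tf.values()) if tf else 1
        vectors.append({t: (c / max_tf) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_rank(query_concepts: List[str], query_keywords: List[str],
                doc_concepts: List[List[str]], doc_keywords: List[List[str]],
                lam: float = 0.5) -> Tuple[List[int], List[float]]:
    """Score = lam * semantic cosine + (1 - lam) * keyword cosine."""
    sem = tfidf_vectors(doc_concepts)   # vectors over ontology annotations
    kw = tfidf_vectors(doc_keywords)    # vectors over plain keywords
    qs = {c: 1.0 for c in query_concepts}
    qk = {k: 1.0 for k in query_keywords}
    scores = [lam * cosine(qs, s) + (1 - lam) * cosine(qk, k) for s, k in zip(sem, kw)]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores


if __name__ == "__main__":
    order, scores = hybrid_rank(
        ["Company:Acme"], ["acme"],
        doc_concepts=[["Company:Acme", "Country:Spain"], [], ["Country:France"]],
        doc_keywords=[["acme", "spain", "office"], ["acme", "merger"], ["paris", "office"]],
    )
    print(order, [round(s, 3) for s in scores])
```

In the example, the second document has no annotations at all, yet it still ranks above the unrelated third document through its keyword score; that fallback is the kind of tolerance to knowledge-base incompleteness the abstract describes.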


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417