This paper describes the FACT system for knowledge discovery from text. It discovers associations (patterns of co-occurrence) among keywords labeling the items in a collection of textual documents. In addition, when background knowledge is available about the keywords labeling the documents, FACT is able to use this information in its discovery process. FACT takes a query-centered view of knowledge discovery, in which a discovery request is viewed as a query over the implicit set of possible results supported by a collection of documents, and where background knowledge is used to specify constraints on the desired results of this query process. Execution of a knowledge-discovery query is structured so that these background-knowledge constraints can be exploited in the search for possible results. Finally, rather than requiring a user to specify an explicit query expression in the knowledge-discovery query language, FACT presents the user with a simple-to-use graphical interface to the query language, with the language providing a well-defined semantics for the discovery actions performed by a user through the interface.
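The co-occurrence discovery described above, with background knowledge applied as a constraint during the search rather than after it, can be illustrated with a minimal Python sketch. It is not FACT's query language or implementation: the function name, the restriction to keyword pairs, and the country/commodity taxonomy used in the usage note are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def keyword_associations(docs, min_support, allowed=None):
    """Return keyword pairs that co-occur in at least `min_support` documents.

    docs: list of keyword sets, one set per document.
    allowed: optional predicate on a keyword pair, standing in for a
    background-knowledge constraint (a hypothetical simplification).
    """
    counts = Counter()
    for keywords in docs:
        # sorted() gives every unordered pair a single canonical orientation
        for pair in combinations(sorted(keywords), 2):
            # Checking the constraint while counting prunes unwanted pairs early
            if allowed is None or allowed(pair):
                counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}
```

For example, with a hypothetical taxonomy tagging "iran" and "iraq" as countries and "oil" and "gold" as commodities, a constraint requiring one country and one commodity per pair, and the five documents {iran, oil}, {iran, oil, gold}, {iraq, oil}, {iraq, gold}, {iran, iraq}, only ("iran", "oil") survives at min_support=2.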

Metadoc: An adaptive hypertext reading system   (total citations: 2; self-citations: 0; by others: 2)
Presentation of textual information is undergoing a rapid transition. Millennia of experience in writing linear documents is gradually being discarded in favor of non-linear hypertext writing. In this paper, we investigate how hypertext, in its current node-and-link form, can be augmented by an adaptive, user-model-driven tool. Currently, the reader of a document has to adapt to that document: if the level of detail is wrong, the reader either skims the document or has to consult additional sources of information for clarification. The MetaDoc system not only has hypertext capabilities but also has knowledge about the documents it represents. This knowledge enables a document to modify its level of presentation to suit the user. MetaDoc builds and dynamically maintains a user model for each reader, and the model tailors the presentation of the document to that reader. The three-dimensionality of MetaDoc allows the presented text to be changed either by the user model or through explicit user action. MetaDoc is more a documentation reading system than a hypertext navigation or reading tool. MetaDoc is a fully developed and debugged system that has been applied to technical documentation.

Engineers create engineering documents with their own terminologies and want to search existing engineering documents quickly and accurately during product development. Keyword-based search methods have been widely used because of their ease of use, but their accuracy has often been problematic because of the semantic ambiguity of terminology in engineering documents and queries. This ambiguity can be alleviated by using a domain ontology. Moreover, if queries are expanded to reflect an engineer's personalized information needs, search accuracy would improve. We therefore propose a framework for searching engineering documents with less semantic ambiguity and a stronger focus on each engineer's personalized information needs. The framework includes four processes: (1) developing a domain ontology, (2) indexing engineering documents, (3) learning user profiles, and (4) performing personalized query expansion and retrieval. The domain ontology is developed from product structure information and engineering documents. Using the domain ontology, terminologies in documents are disambiguated and indexed. A user profile is also generated from the domain ontology, and profile learning captures the user's interests from relevant documents. During personalized query expansion, the learned user profile is used to reflect the user's interests; at the same time, the user's search intent, inferred implicitly from the user's task context, is also considered. To retrieve relevant documents, an expanded query that reflects both the user's interests and intent is then matched against the document collection. The experimental results show that the proposed approach can substantially outperform both the keyword-based approach and the existing query expansion method in retrieving engineering documents. Precisely reflecting a user's information needs was identified as the most important factor underlying this notable improvement.
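A minimal sketch of the personalized expansion step (4) follows, assuming the ontology is reduced to a plain dictionary of related terms and the learned profile to per-term interest weights. Both structures, the function name and the `max_new` cutoff are hypothetical, and the paper's task-context (intent) signal is not modeled.

```python
def expand_query(query_terms, ontology, profile, max_new=3):
    """Add ontology-related terms to a query, preferring those the user cares about.

    ontology: hypothetical dict mapping a term to related terms.
    profile: hypothetical dict of term -> interest weight learned for this user.
    Returns the original terms followed by at most `max_new` expansion terms.
    """
    candidates = set()
    for term in query_terms:
        candidates.update(ontology.get(term, ()))
    candidates -= set(query_terms)
    # Highest interest weight first; alphabetical order breaks ties deterministically
    ranked = sorted(candidates, key=lambda t: (-profile.get(t, 0.0), t))
    return list(query_terms) + ranked[:max_new]
```

With ontology {"bearing": ["ball bearing", "roller bearing", "shaft"]} and profile {"roller bearing": 0.9, "shaft": 0.4}, `expand_query(["bearing"], ontology, profile, max_new=2)` returns ["bearing", "roller bearing", "shaft"]: the profile decides which of the ontology's neighbours are worth adding.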

The advent of electronic documents and the consequent creation of digital libraries (vast repositories of electronic information) have a profound impact on how we produce, organize, store, retrieve and consume information. Until now, all of these activities have been dictated by the technologies used to share information. A change in the underlying technology, namely the move from paper to electronic documents, offers a unique opportunity to revolutionize how information is archived and disseminated. This paper focuses on a specific aspect of the opportunities opened up by electronic publishing on the NII (National Information Infrastructure): the ability to present information in multiple modalities and thereby free it from any single presentation medium. Traditional printed communication relies on a passive intermediary, paper, for the exchange of information between author and reader. Ideas put down on paper come back to life only when perused by the reader. Electronic publishing, by contrast, is mediated by a computer, an agent capable of processing the information. As a consequence, the ideas expressed by an author need no longer be bound to any single display form, nor is human intervention required to translate the information from one display form to another. Electronic information can be processed and displayed in the manner best suited to each individual's needs. Thus, electronic documents make information available in more than its visual form: electronic information can now be display-independent. Traditionally, an electronic document has been viewed simply as a digital representation of (or a means of producing) the printed page. Instead, we view the electronic document as the basic entity that represents information, and we allow that information to be rendered in different ways: printed on paper, spoken, processed in different ways by a computer, and so on.
This change of viewpoint has allowed us to develop ASTER (Audio System For Technical Readings), a computing system that audio-formats electronic documents to produce audio documents. ASTER can speak both literary texts and highly technical documents that contain complex mathematics. Moreover, the listener can ask to have parts of a document repeated in different ways: a document has many different spoken views. The adequacy of the audio rendering depends on how well the electronic document captures the essential internal structure of the information. In this paper, we discuss capturing structure and give guidelines for authors to follow to ensure that their documents exhibit structure adequately. In the context of the NII, the digital libraries of the future can be viewed as large information servers that allow multiple clients to access and display information in a format chosen by the user. By obviating the need to move physical media, e.g., printed paper or recorded tapes, the NII enables the ready dissemination of multimodal renderings of information.

In this paper, we present Spin, a three-dimensional user interface for synchronous cooperative work, designed for multi-user synchronous real-time applications used in, for example, meetings and learning situations. Spin is based on a new metaphor of virtual workspace. We have designed an interface for an office environment that recreates the three-dimensional elements needed during a meeting and increases the user's scope of interaction. To accomplish these objectives, real-time animation and three-dimensional interaction are used to enhance the feeling of collaboration within the three-dimensional workspace. Spin is designed to keep as much information visible as possible. The workspace is created using artificial geometry (as opposed to true three-dimensional geometry) and spatial distortion, a technique that allows all documents and information to be displayed simultaneously while centring the user's focus of attention. Users interact with each other via their respective clones, three-dimensional representations displayed in each user's interface that are animated by user actions on shared documents. An appropriate object manipulation system (direct manipulation, 3D devices and specific interaction metaphors) is used to point at and manipulate 3D documents.

Experienced users who query search engines exhibit complex behavior. They explore many topics in parallel, experiment with query variations, consult multiple search engines, and gather information over many sessions. In the process they need to keep track of search context, namely useful queries and promising result links, which can be hard. We present an extension to search engines called SearchPad that makes it possible to keep track of 'search context' explicitly. We describe an efficient implementation of this idea deployed on four search engines: AltaVista, Excite, Google and Hotbot. Our design of SearchPad has several desirable properties: (i) portability across all major platforms and browsers; (ii) instant start, requiring no code download or special action on the part of the user; (iii) no server-side storage; and (iv) no added client–server communication overhead. An added benefit is that it allows search services to collect valuable relevance information about the results shown to the user. For each query, SearchPad can log the actions taken by the user, and in particular record the links that the user considered relevant to that query. The service was tested in a multi-platform environment with over 150 users for 4 months and found to be usable and helpful. We discovered that the ability to maintain search context explicitly seems to affect the way people search. Repeat SearchPad users looked at more search results than is typical on the Web, suggesting that the availability of search context may partially compensate for non-relevant pages in the ranking.

Research on Chinese Information Acquisition from the Internet   (total citations: 1; self-citations: 0; by others: 1)
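This paper proposes an approach to acquiring Chinese-language information from the Internet that is characterized by intelligent, proactive search, and implements an intelligent Chinese information acquisition tool. The tool uses an intelligent-agent architecture: it infers the user's needs by learning from the user's everyday documents and interactive feedback, builds a personalized user model, and uses a metasearch engine to proactively gather information from the Internet. Finally, local intelligent processing removes or merges duplicate and highly similar items, and the processed results are presented to the user in a clear, easy-to-understand form.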
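The final step above (discarding duplicate and highly similar results) can be approximated by comparing character 3-gram sets with Jaccard similarity, which works on Chinese text without word segmentation. This is a hedged sketch: the 3-gram size and the 0.8 threshold are assumptions, and it only drops near-duplicates rather than merging them as the described tool does.

```python
def shingles(text, k=3):
    """Character k-grams of the text; no word segmentation is needed for Chinese."""
    text = "".join(text.split())  # ignore whitespace
    if len(text) <= k:
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, treating two empty sets as identical."""
    return len(a & b) / len(a | b) if a or b else 1.0

def dedupe(results, threshold=0.8):
    """Keep each result unless it is at least `threshold`-similar to one already kept."""
    kept, kept_sigs = [], []
    for text in results:
        sig = shingles(text)
        if all(jaccard(sig, s) < threshold for s in kept_sigs):
            kept.append(text)
            kept_sigs.append(sig)
    return kept
```

A result that differs from a kept one only by a trailing full stop shares 9 of 10 trigrams (similarity 0.9) and is dropped, while an unrelated title shares none and is kept. Comparing against every kept item is quadratic; larger result sets would typically use MinHash or a similar index instead.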

Antique documents, which undoubtedly represent our cultural heritage and are a very rich source of information, are kept in many countries only in libraries with historical archives. The age and fragility of such documents severely restrict access to them. Since the Internet is nowadays one of the most attractive places to publish any kind of information, it seems logical to use it both to preserve our cultural heritage and to provide broader access to these documents. This work presents a virtual library that stores data, transcribed texts and digitized pages of historic Spanish documents from the 16th–18th centuries. The virtual library has two main objectives. First, by offering a set of services, including a powerful user interface to search and browse the documents, a bulletin board, a chat and mailboxes, the virtual library becomes a meeting place for researchers who use emblem books as sources of information for their studies. Second, the virtual library contributes to the preservation of emblem books. We describe the project that led to the development of the Virtual Library of Emblem Books, showing its evolution from the beginning (simple search forms and answer pages) to its current state as a virtual library, and focusing on the techniques used to build an intuitive and powerful user interface. Copyright © 2006 John Wiley & Sons, Ltd.

The web is nowadays one of the main information sources, and information search is an area that has seen many advances. One approach to improving web search results is to consider contextual information. Usually, context has been obtained from logs of users' previous searches or by monitoring clicks on the top results, but other approaches can be used in specific environments. In a web-based learning environment, existing documents and exchanged messages can provide contextual information. The main goal of this work is therefore to provide a contextual web search engine based on the documents shared and messages posted in a social network used for collaborative learning. Contextual search is provided through query expansion using learning documents (material provided by the teacher) and discussion messages (posts, links and comments resulting from the participants' interactions). A prototype was implemented and used in a learning scenario to acquire the context of a learning community. The proposed approach makes context acquisition faster and more dynamic because it acquires context automatically by processing the text of documents and discussions. In addition, the results of the search engine with and without contextual information were compared, and the approach using contextual information showed improved precision.
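A rough sketch of expanding a query with terms drawn from course material and discussion messages is shown below. It assumes plain term frequency as the selection criterion and a tiny hand-written English stopword list; the paper's actual term-selection method is not specified here, and both function names are hypothetical.

```python
import re
from collections import Counter

# Deliberately tiny stopword list (an assumption for the sketch)
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "with"}

def context_terms(texts, query, n=3):
    """The n most frequent non-stopword terms in the context texts not already in the query."""
    query_terms = set(query.lower().split())
    counts = Counter()
    for text in texts:
        for tok in re.findall(r"[a-z]+", text.lower()):
            if tok not in STOPWORDS and tok not in query_terms:
                counts[tok] += 1
    # most_common orders by count; equal counts keep first-seen order
    return [term for term, _ in counts.most_common(n)]

def contextual_query(query, texts, n=3):
    """Append context terms to the user's query string."""
    return " ".join([query] + context_terms(texts, query, n))
```

Given the context texts "Recursion and the call stack", "A post about recursion and base cases" and "Link: stack overflow explained", the query "recursion example" becomes "recursion example stack call post". Raw frequency lets generic words such as "post" slip in, which is why a real system would more likely weight terms by TF-IDF or similar.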

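For searching encrypted cloud data, traditional fuzzy keyword search schemes can find relevant documents, but the results are unsatisfactory. When the user's input is correct, they cannot perform approximate search; when the user makes a spelling error, the results include many documents with irrelevant keywords, seriously wasting bandwidth. To address these shortcomings of existing fuzzy keyword search over encrypted cloud data, a new fuzzy keyword search scheme is proposed. It computes relevance scores for keywords, ranks documents by relevance score, and returns the top-k documents (the k documents with the highest relevance) to the searching user. This reduces unnecessary bandwidth consumption and the time users spend finding useful documents, and yields more effective search results. In addition, introducing a set of fake trapdoors makes it harder for the cloud server to analyze document keywords, strengthening the system's privacy protection.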
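The ranking step described above (score documents, sort, return the top k) can be sketched on plaintext data. This deliberately omits the encryption, fuzzy keyword sets and fake trapdoors that make the scheme private; the scoring formula is a common TF-IDF variant chosen for illustration, not necessarily the one the paper uses.

```python
import heapq
import math

def relevance(term, doc, docs):
    """TF-IDF-style score (1/|d|)(1 + ln f_dt) * ln(1 + N/f_t); 0 if the term is absent."""
    f_dt = doc.count(term)  # occurrences of the term in this document
    if f_dt == 0:
        return 0.0
    f_t = sum(1 for d in docs if term in d)  # number of documents containing the term
    return (1 + math.log(f_dt)) / len(doc) * math.log(1 + len(docs) / f_t)

def top_k(term, docs, k):
    """Indices of up to k highest-scoring documents that actually contain the term."""
    scored = [(relevance(term, d, docs), i) for i, d in enumerate(docs)]
    return [i for s, i in heapq.nlargest(k, scored) if s > 0]
```

Documents are lists of tokens here. Returning only k results is what saves bandwidth: the client receives the few best matches instead of every document that matched the (possibly misspelled) keyword.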

Text Database Selection for Metasearch Engines   (total citations: 8; self-citations: 0; by others: 8)
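The information users need is often scattered across the databases of multiple search engines. For ordinary users, visiting several search engines and picking out the genuinely useful web pages from the returned results is time-consuming and laborious. A metasearch engine instead offers users an integrated environment for accessing multiple search engines at once: it forwards each user query it receives to multiple underlying search engines. As a search tool, a metasearch engine has advantages such as broader Web coverage than traditional search engines and better scalability. This paper discusses various techniques for solving the database selection problem of metasearch engines: for a submitted user query, database selection picks the underlying search engines most likely to return useful information.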
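One simple way to perform database selection is to rank each underlying engine by the expected number of its documents that contain every query term, estimated from per-database summary statistics under the assumption that terms occur independently. The sketch below is a hypothetical illustration of that idea, not a specific technique from the paper; the summary format is invented.

```python
def estimate_matches(query_terms, summary):
    """Expected number of documents containing every query term.

    summary: hypothetical per-database statistics {"size": N, "df": {term: doc_freq}}.
    Assumes terms occur independently, so the estimate is N * prod(df_t / N).
    """
    n = summary["size"]
    if n == 0:
        return 0.0
    estimate = float(n)
    for term in query_terms:
        estimate *= summary["df"].get(term, 0) / n
    return estimate

def select_databases(query_terms, summaries, m):
    """Names of up to m databases with the highest non-zero estimates."""
    scores = {name: estimate_matches(query_terms, s) for name, s in summaries.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [name for name in ranked[:m] if scores[name] > 0]
```

For the query ["election", "poll"], a 1,000-document news database with document frequencies 300 and 200 is estimated to hold 60 matches, while a 2,000-document science database with frequencies 20 and 40 is estimated to hold only 0.4, so the query is sent to the news engine first. The independence assumption is crude, but it needs only document counts and term frequencies, which engines can export cheaply.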

A Knowledge-Based Web Page Retrieval Tool   (total citations: 3; self-citations: 0; by others: 3)
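With the worldwide spread of the Internet, more and more people rely on it for research and business, and web search tools have become indispensable software. However, most popular search tools are based on keyword queries and often suffer from information overload or the loss of useful information. There are two main causes: the query a user submits does not express his or her intent well, and the query results lack an effective indexing mechanism that guides people quickly to useful information. To this end, we propose a knowledge-based web search tool (KWSE). It is built upon existing search tools' …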

Research on Ontology-Based Information Retrieval Techniques   (total citations: 26; self-citations: 0; by others: 26)
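With the rapid development of the Web, online information resources have become increasingly abundant, and the network has become the world's largest information repository. Users generally obtain the information they need through various information retrieval tools, but existing tools suffer from problems such as low retrieval precision. To address these problems, this paper proposes integrating ontologies into information retrieval. The domain knowledge in an ontology can greatly improve a retrieval system's understanding of natural-language text and also lets users express retrieval requests in natural language, thereby improving retrieval effectiveness.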

WANG Qiuyue, CAO Wei, SHI Shaochen. Journal of Computer Applications, 2015, 35(9): 2553-2559
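Federated search is an important technique for acquiring information from the large-scale deep Web. Given a user query, a major problem a federated search system must solve is source selection: choosing, from a huge number of data sources, the set of sources most likely to return relevant results. Most existing source selection algorithms rely on keyword matching between the query and each source's sample document set, and usually cannot cope well with the information missing from a small number of sample documents. To address this problem, a source selection method based on the Latent Dirichlet Allocation (LDA) topic model is proposed. First, the LDA topic model is used to obtain the topic probability distributions of the data sources and of the query; then all data sources are ranked by how close their topic distributions are to the query's. Mapping data sources and queries into a low-dimensional topic space alleviates the information loss caused by sparsity in the high-dimensional term space. Experiments were conducted on the test collections of the TREC FedWeb 2013 and 2014 Tracks, and the results were compared with those of other participating methods. On the FedWeb 2013 collection, the method improved on the best result of the other participants by 24%; on the FedWeb 2014 collection, it outperformed the traditional keyword-matching methods based on small documents and big documents by 22% and 43%, respectively. In addition, using document snippets instead of full documents greatly improves system efficiency, further increasing the method's practicality and feasibility.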
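The ranking step can be sketched as follows, assuming LDA has already produced a topic distribution for the query and for each source (training the topic model is out of scope here). The abstract does not name the closeness measure, so Jensen-Shannon divergence is used as an assumption; the function names are hypothetical.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, finite, and between 0 and ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def rank_sources(query_topics, source_topics):
    """Source names sorted from closest to farthest topic distribution.

    query_topics: the query's topic distribution (assumed precomputed by LDA).
    source_topics: source name -> that source's topic distribution.
    """
    return sorted(source_topics,
                  key=lambda name: js_divergence(query_topics, source_topics[name]))
```

Jensen-Shannon divergence is a convenient choice because, unlike plain KL divergence, it stays finite when one distribution assigns zero probability to a topic the other uses. Because every source is compared in a space of a few topics rather than the full vocabulary, a source whose sample documents never mention a query word can still rank well if it covers the right topics.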

A Knowledge-Based Approach to Effective Document Retrieval   (total citations: 3; self-citations: 0; by others: 3)
This paper presents a knowledge-based approach to effective document retrieval. The approach is based on a dual document model that consists of a document type hierarchy and a folder organization. A predicate-based document query language is proposed to enable users to specify precisely both their search criteria and their knowledge about the documents to be retrieved. A guided search tool is developed as an intelligent, natural-language-oriented user interface to assist users in formulating queries. Supported by an intelligent question generator, an inference engine, a question base, and a predicate-based query composer, the guided search collects the most important information known to the user in order to retrieve the documents that satisfy the user's particular interests. A knowledge-based query processing and search engine is devised as the core component of this approach, and algorithms are developed for the search engine to retrieve the documents that match the query effectively and efficiently.
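To make the idea of a predicate-based query over a type hierarchy and a folder organization concrete, here is a hypothetical Python sketch in which predicates are plain functions combined conjunctively. The type names, document fields and helper functions are invented; the paper's actual query language syntax is not reproduced.

```python
# Hypothetical document type hierarchy: child type -> parent type
TYPE_PARENT = {"memo": "correspondence", "letter": "correspondence",
               "correspondence": "document", "report": "document"}

def is_a(doc_type, target):
    """True if doc_type equals target or descends from it in the type hierarchy."""
    while doc_type is not None:
        if doc_type == target:
            return True
        doc_type = TYPE_PARENT.get(doc_type)  # walk up; None at the root
    return False

def type_is(target):
    return lambda doc: is_a(doc["type"], target)

def in_folder(prefix):
    # Simple path-prefix match on the folder organization
    return lambda doc: doc["folder"].startswith(prefix)

def author_is(name):
    return lambda doc: doc["author"] == name

def run_query(docs, *predicates):
    """Ids of documents satisfying every predicate (a conjunctive query)."""
    return [d["id"] for d in docs if all(p(d) for p in predicates)]
```

A query such as `run_query(docs, type_is("correspondence"), in_folder("/projects"))` then finds memos and letters in any project folder without the user listing every subtype, which is the benefit of letting queries refer to the type hierarchy.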

Over the past few years, the amount of electronic information available through the Internet has increased dramatically. Unfortunately, the search tools currently available for retrieving and filtering information in this space do not effectively balance relevance and comprehensiveness. This paper analyzes the results of experiments in which HTML documents are searched with user models and software agents acting as intermediaries. Simple user models are first combined with search specifications (or ‘User Needs’) to define an Enhanced User Need (EUN). Uniform Resource Agents are then constructed to filter information based on the EUN parameters. The results of searches using different agents are compared with those obtained through a comparable simple keyword search, and it is shown that a user searching a pool of existing agents can obtain better search results than by conducting a traditional keyword search. This work thus demonstrates that the use of user models and information-filtering agents does improve search results and may be used to improve Internet information retrieval. This revised version was published online in July 2006 with corrections to the Cover Date.

We study a new research problem in which an implicit information retrieval query is inferred from eye movements measured while the user is reading, and then used to retrieve new documents. In the training phase, the user's interest is known, and we learn a mapping from how the user looks at a term to the role of that term in the implicit query. Assuming the mapping is universal, that is, the same for all queries in a given domain, we can use it to construct queries even for new topics for which no training data are available. We constructed a controlled experimental setting to show that when the system has no prior information about what the user is searching for, eye movements help significantly in the search. This is the case, for instance, in proactive search, where the system monitors the user's reading behaviour on a new topic. In contrast, during a search or reading session where the set of inspected documents is biased towards being relevant, a stronger strategy is to search for documents with similar content than to use the eye movements.

Most Web search engines use the content of Web documents and their link structure to assess the relevance of a document to the user's query. With the growth of the information available on the Web, it becomes difficult for such search engines to satisfy a user information need expressed by a few keywords. Personalized information retrieval is a promising way to resolve this problem: it models the user's general interests in a profile and then integrates that profile into a personalized document ranking model. In this paper, we present a personalized search approach that uses a graph-based representation of the user profile. The user profile captures the user's interest in a specific search session, defined as a sequence of related queries. It is built by score propagation, which activates a set of semantically related concepts of a reference ontology, namely the ODP (Open Directory Project). The user profile is maintained across related search activities using a graph-based merging strategy. To detect related search activities, we define a session boundary recognition mechanism based on the Kendall rank correlation measure, which tracks changes in the dominant concepts held by the user profile relative to a newly submitted query. Personalization is performed by re-ranking the search results of related queries using the user profile. Our experimental evaluation, carried out on the HARD 2003 TREC collection, showed that our Kendall-based session boundary recognition mechanism achieves significantly better precision than non-ranking-based measures such as the cosine and WebJaccard similarity measures. Moreover, the results show that graph-based search personalization is effective in improving search accuracy.
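The session boundary test can be sketched with Kendall's tau-a over concept weights, assuming the profile and the new query are each represented as a concept-to-weight dictionary. The tau variant, the zero-filling of missing concepts and the threshold of 0.0 are assumptions, not values from the paper.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall tau-a for two equal-length score lists; tied pairs count as neither."""
    n = len(x)
    if n < 2:
        return 1.0
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tied
    return s / (n * (n - 1) / 2)

def is_new_session(profile, query_concepts, threshold=0.0):
    """Flag a boundary when the query's concept ranking disagrees with the profile's.

    profile, query_concepts: hypothetical dicts of concept -> weight;
    a concept missing from one side is given weight 0.
    """
    concepts = sorted(set(profile) | set(query_concepts))
    x = [profile.get(c, 0.0) for c in concepts]
    y = [query_concepts.get(c, 0.0) for c in concepts]
    return kendall_tau(x, y) < threshold
```

For a profile {"sports": 0.9, "football": 0.7, "music": 0.1}, a follow-up query weighted towards sports and football ranks the concepts identically (tau = 1.0) and stays in the session, whereas a query about music and jazz gives tau = -0.5 and starts a new one. Using a rank correlation means only the relative order of the dominant concepts matters, not their absolute weights.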

A new architecture for information retrieval systems is presented. If implemented, this architecture would allow the system to process retrieval statements that are equivalent to fuzzily defined queries. The philosophy behind the centerpiece of this system, the document search module, is fully explained in this paper. The emphasis is on the quick elimination of irrelevant references. A new technique was developed that takes the user's knowledge into account to discriminate between documents before they are actually retrieved from the database. The search technique uses simple computations to select or eliminate potential candidates for retrieval. Qualitatively, this technique avoids the shortcomings not only of conventional retrieval techniques but also of retrieval systems that accept relevance feedback from the user to refine the search process. No implementation details are included in this article, and system performance figures are not discussed.

Query expansion by mining user logs   (total citations: 9; self-citations: 0; by others: 9)
Queries to search engines on the Web are usually short and do not provide sufficient information for an effective selection of relevant documents. Previous research has proposed query expansion to deal with this problem. However, expansion terms are usually determined from term co-occurrences within documents. In this study, we propose a new method for query expansion based on user interactions recorded in user logs. The central idea is to extract correlations between query terms and document terms by analyzing user logs. These correlations are then used to select high-quality expansion terms for new queries. Compared with previous query expansion methods, ours takes advantage of the user judgments implied in user logs. The experimental results show that the log-based query expansion method can produce much better results than both the classical search method and the other query expansion methods.
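A simplified sketch of the log-mining idea follows. The correlation between a query term w_q and a document term w_d is computed as P(w_d | w_q) = Σ_D P(w_d | D) P(D | w_q), where D ranges over documents clicked for queries containing w_q. Here P(w_d | D) is plain normalized term frequency and candidate terms are combined with Σ ln(1 + P); both are simplifying assumptions, and the (query, clicked document) log format is hypothetical.

```python
import math
from collections import Counter, defaultdict

def term_correlations(log, doc_terms):
    """Estimate P(w_d | w_q) = sum over clicked docs D of P(w_d | D) * P(D | w_q).

    log: hypothetical list of (query_string, clicked_doc_id) pairs.
    doc_terms: doc_id -> list of terms in that document.
    P(D | w_q) comes from click counts; P(w_d | D) is normalized term
    frequency, a simplification of the weighting a real system would use.
    """
    clicks = defaultdict(Counter)  # query term -> clicks per document
    for query, doc in log:
        for q in set(query.split()):
            clicks[q][doc] += 1
    corr = defaultdict(dict)  # query term -> {document term: probability}
    for q, per_doc in clicks.items():
        total = sum(per_doc.values())
        for doc, n in per_doc.items():
            terms = doc_terms[doc]
            for w, f in Counter(terms).items():
                p = (f / len(terms)) * (n / total)
                corr[q][w] = corr[q].get(w, 0.0) + p
    return corr

def expand(query, corr, n=2):
    """Append the n document terms with the highest sum of ln(1 + P(w | q))."""
    q_terms = query.split()
    weights = Counter()
    for q in q_terms:
        for w, p in corr.get(q, {}).items():
            if w not in q_terms:
                weights[w] += math.log1p(p)
    return q_terms + [w for w, _ in weights.most_common(n)]
```

The user judgment enters through P(D | w_q): a document term gains weight only if documents containing it were actually clicked for queries containing the query term, which is what distinguishes this from expansion based on co-occurrence within documents alone.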
