Similar literature
20 similar documents found
1.
Kwong, Linus W.; Ng, Yiu-Kai. World Wide Web, 2003, 6(3): 281-303
To retrieve Web documents of interest, most Web users rely on Web search engines. All existing search engines provide a query facility for users to search for the desired documents using search-engine keywords. However, when a search engine retrieves a long list of Web documents, the user might need to browse through each retrieved document in order to determine which document is of interest. We observe that there are two kinds of problems involved in the retrieval of Web documents: (1) an inappropriate selection of keywords specified by the user; and (2) poor precision in the retrieved Web documents. To solve these problems, we propose an automatic binary-categorization method that is applicable for recognizing multiple-record Web documents of interest, which appear often in advertisement Web pages. Our categorization method uses application ontologies and is based on two information retrieval models, the Vector Space Model (VSM) and the Clustering Model (CM). We analyze and cull Web documents to just those applicable to a particular application ontology. The culling analysis (i) uses CM to find a virtual centroid for the records in a Web document, (ii) computes a vector in a multi-dimensional space for this centroid, and (iii) compares the vector with the predefined ontology vector of the same multi-dimensional space using VSM, in which we consider the magnitudes of the vectors as well as the angle between them. Our experimental results show that we have achieved an average of 90% recall and 97% precision in recognizing Web documents belonging to the same category (i.e., domain of interest). Thus our categorization discards very few documents it should have kept and keeps very few it should have discarded.
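The three culling steps in the abstract above (centroid of the records, a vector for that centroid, comparison with the ontology vector by angle and magnitude) can be illustrated roughly as follows. This is only a minimal sketch, not the authors' implementation: the function names, the plain arithmetic mean used as the "virtual centroid", and the two thresholds `min_cos` and `min_ratio` are all assumptions made for illustration.

```python
import math

def centroid(records):
    """Component-wise mean of equal-length record vectors (assumed 'virtual centroid')."""
    n = len(records)
    return [sum(col) / n for col in zip(*records)]

def norm(v):
    """Euclidean magnitude of a vector."""
    return math.sqrt(sum(x * x for x in v))

def cosine(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    nu, nv = norm(u), norm(v)
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def accept(records, ontology_vec, min_cos=0.8, min_ratio=0.5):
    """Keep a document if its centroid is close in angle AND comparable in magnitude.

    min_cos and min_ratio are hypothetical thresholds, not values from the paper.
    """
    onto_norm = norm(ontology_vec)
    if onto_norm == 0:
        return False
    c = centroid(records)
    return cosine(c, ontology_vec) >= min_cos and norm(c) / onto_norm >= min_ratio
```

Checking the magnitude ratio in addition to the cosine means a document whose centroid points in the right direction but carries very little ontology-related weight can still be rejected.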

2.
Relevance feedback techniques have proven to be a powerful means of improving the results obtained when a user submits a query to an information retrieval system such as a World Wide Web search engine. These techniques modify the user's original query according to the relevance judgements the user provides on the retrieved documents, making it more similar to the documents judged relevant. The newly generated query can then retrieve further relevant documents, improving the retrieval process by increasing recall. However, although powerful relevance feedback techniques have been developed for the vector space information retrieval model, and some of them have been translated to the classical Boolean model, such tools are lacking for more advanced and powerful information retrieval models such as the fuzzy one. In this contribution we introduce a relevance feedback process for extended Boolean (fuzzy) information retrieval systems based on a hybrid evolutionary algorithm combining simulated annealing and genetic programming components. The performance of the proposed technique is compared with the only previously existing approach to this task, Kraft et al.'s method, showing that our proposal outperforms the latter in terms of accuracy and sometimes also in time consumption. Moreover, it is shown how the adaptation of the retrieval threshold by the relevance feedback mechanism allows the system's effectiveness to be increased.
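For orientation, the vector-space style of relevance feedback that the abstract above contrasts itself with is commonly illustrated by the classical Rocchio update, which moves the query vector toward documents judged relevant and away from those judged non-relevant. The sketch below shows that classical technique only; it is not the hybrid evolutionary method the abstract proposes, and the default weights `alpha`, `beta`, and `gamma` are conventional illustrative values, not figures from the paper.

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Classical Rocchio query update over equal-length term-weight vectors."""
    def mean(vectors):
        # Mean vector; all zeros when no documents were judged in this class.
        if not vectors:
            return [0.0] * len(query)
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    rel, non = mean(relevant), mean(nonrelevant)
    # Negative term weights are clipped to zero, a common convention.
    return [max(0.0, alpha * q + beta * r - gamma * n)
            for q, r, n in zip(query, rel, non)]
```

For example, `rocchio([1.0, 0.0], [[0.0, 1.0]], [])` adds weight to the second term because it occurs in a document the user marked relevant.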

3.
4.
Engineers create engineering documents with their own terminologies, and want to search existing engineering documents quickly and accurately during a product development process. Keyword-based search methods have been widely used due to their ease of use, but their search accuracy has often been problematic because of the semantic ambiguity of terminologies in engineering documents and queries. The semantic ambiguity can be alleviated by using a domain ontology. Also, if queries are expanded to incorporate the engineer’s personalized information needs, the accuracy of the search result would be improved. Therefore, we propose a framework to search engineering documents with less semantic ambiguity and more focus on each engineer’s personalized information needs. The framework includes four processes: (1) developing a domain ontology, (2) indexing engineering documents, (3) learning user profiles, and (4) performing personalized query expansion and retrieval. A domain ontology is developed based on product structure information and engineering documents. Using the domain ontology, terminologies in documents are disambiguated and indexed. Also, a user profile is generated from the domain ontology. Through user profile learning, the user’s interests are captured from the relevant documents. During the personalized query expansion process, the learned user profile is used to reflect the user’s interests. Simultaneously, the user’s searching intent, which is implicitly inferred from the user’s task context, is also considered. To retrieve relevant documents, an expanded query in which both the user’s interests and intents are reflected is then matched against the document collection. The experimental results show that the proposed approach can substantially outperform both the keyword-based approach and the existing query expansion method in retrieving engineering documents. Reflecting a user’s information needs precisely has been identified as the most important factor underlying this notable improvement.

5.
6.
Enrichment is a process whereby computer-based information is tagged with additional attributes which can be used in an information retrieval system to increase the speed and accuracy of access. In this way, the additional attributes act as external memory aids. Lansdale (1988a) evaluated such a system by looking at the memorability of coloured shapes, placed in different locations on a document, which were used as enrichers in a simple information retrieval task. This paper extends that study to look at memory for labels used in an identical way. Verbal and visual enriching attributes were studied under two conditions: one in which they were assigned to documents automatically by the system, and one in which the users made their own choice. Results indicate a strong trend in which recall was higher when subjects made their own selection of enriching attributes as opposed to having them selected for them. In the comparison of words and icons, there was no evidence that the modalities of the enrichers were a significant factor in recall. Recall performance seems to be primarily related to the 'semantic fit' of the documents and the attributes selected to enrich them. The extent to which this implies potential differences in the utility of visual and verbal methods in future applications is discussed.

7.
8.
Most information retrieval systems use keywords entered by the user as the search criteria to find documents. However, the language used in documents is often complicated and ambiguous, and thus the results obtained by using keywords are often inaccurate. To address this problem, this study developed a semantic-based content mapping mechanism for an information retrieval system. This approach employs the semantic features and ontological structure of the content as the basis for constructing a content map, thus simplifying the search process and improving the accuracy of the returned results.

9.
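Query expansion can effectively resolve query ambiguity and improve the precision and recall of information retrieval. By mining the links between query terms and relevant documents in user logs, we construct associated queries, and on this basis propose a query expansion method that extracts expansion terms from the associated queries. We also propose a method for identifying query ambiguity, which can effectively measure how vague the search intent expressed by a query is and can estimate the query's retrieval performance in advance. The ambiguity measure is used to dynamically adjust the length of the expansion term list, improving the flexibility and adaptability of the query expansion model.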
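The idea in the abstract above (associated queries mined from a click log, with the amount of expansion tied to an ambiguity measure) can be sketched roughly as follows. This is an illustrative reading, not the paper's algorithm: the log format (a list of `(query, clicked_doc)` pairs), the use of click entropy as the ambiguity measure, and the rule `base_terms + round(entropy)` are all assumptions.

```python
import math
from collections import Counter, defaultdict

def associated_queries(log, query):
    """Other queries that share at least one clicked document with `query`."""
    docs_by_query = defaultdict(set)
    for q, doc in log:
        docs_by_query[q].add(doc)
    clicked = docs_by_query.get(query, set())
    return [q for q, docs in docs_by_query.items() if q != query and docs & clicked]

def click_entropy(log, query):
    """Entropy (bits) of the clicked-document distribution; assumed ambiguity proxy."""
    counts = Counter(doc for q, doc in log if q == query)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def expand(log, query, base_terms=1):
    """Pick expansion terms from associated queries; more ambiguity, more terms."""
    k = base_terms + round(click_entropy(log, query))  # hypothetical length rule
    seen = set(query.split())
    terms = Counter(t for q in associated_queries(log, query)
                    for t in q.split() if t not in seen)
    return [t for t, _ in terms.most_common(k)]
```

With a log in which "jaguar" leads to clicks on two different documents that are also reached by "jaguar car" and "jaguar cat", the query is treated as more ambiguous and receives two expansion terms rather than one.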

10.
With the ever-increasing growth of the World Wide Web, there is an urgent need for an efficient information retrieval system that can search and retrieve handwritten documents when presented with user queries. However, unconstrained handwriting recognition remains a challenging task with inadequate performance, and thus proves to be a major hurdle in providing a robust search experience in handwritten documents. In this paper, we describe our recent research with a focus on information retrieval from noisy text derived from imperfect handwriting recognizers. First, we describe a novel term frequency estimation technique incorporating the word segmentation information inside the retrieval framework to improve the overall system performance. Second, we outline a taxonomy of different techniques used for addressing the noisy text retrieval task. The first method uses a novel bootstrapping mechanism to refine the OCR’ed text and uses the cleaned text for retrieval. The second method uses the uncorrected or raw OCR’ed text but modifies the standard vector space model for handling noisy text issues. The third method employs robust image features to index the documents instead of using noisy OCR’ed text. We describe these techniques in detail and also discuss their performance measures using standard IR evaluation metrics.

11.
12.
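A query expansion model for information retrieval based on conceptual graphs (cited 1 time: 0 self-citations, 1 by others)
To address the low recall and precision of traditional information retrieval based on keyword matching, a query expansion method based on conceptual graph matching is proposed. On the one hand, the words or sentences of the user query are expanded using HowNet, and conceptual graphs are then generated for the user query and the documents; on the other hand, partial matching of conceptual graphs and a semantic similarity measure are used to compute the similarity between conceptual graphs, so as to improve retrieval effectiveness. Experimental results show that the method achieves good results.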

13.
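Traditional case retrieval algorithms passively respond to a user's query request by returning the cases related to it, ignoring the fact that the user's query behavior can guide the case retrieval process. This paper proposes a case retrieval algorithm based on a model of user query behavior: user query requests are collected, and a classification model of user query behavior is built from the similarity between query behaviors. The classification algorithm for user query behavior is analyzed, with emphasis on how the user query behavior model guides the case retrieval process. Experimental results show that the method effectively improves the recall of query results and the query success rate.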

14.
In this paper we explore the benefits of latent variable modelling of clickthrough data in the domain of image retrieval. Clicks in image search logs are regarded as implicit relevance judgements that express both user intent and important relations between selected documents. We posit that clickthrough data contains hidden topics and can be used to infer a lower dimensional latent space that can be subsequently employed to improve various aspects of the retrieval system. We use a subset of a clickthrough corpus from the image search portal of a news agency to evaluate several popular latent variable models in terms of their ability to model topics underlying queries. We demonstrate that latent variable modelling reveals underlying structure in clickthrough data and our results show that computing document similarities in the latent space improves retrieval effectiveness compared to computing similarities in the original query space. These results are compared with baselines using visual and textual features. We show performance substantially better than the visual baseline, which indicates that content-based image retrieval systems that do not exploit query logs could improve recall and precision by taking this historical data into account.

15.
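UML-based component retrieval (cited 2 times: 0 self-citations, 2 by others)
Component-based software development (CBD) is currently the mainstream approach to developing large software systems, and CBD rests on component libraries and their retrieval methods. At present, component retrieval is mainly assisted by domain-specific knowledge obtained from a domain model, but good representations of domain models are lacking. This paper studies the use of UML to represent domain models and proposes a component retrieval method that uses the domain knowledge in UML and a domain dictionary to help users characterize the domain, expand and refine the initial query, form the user's component requirements, and guide retrieval from the component library, identifying components by behavioral similarity. The method improves users' understanding of domain knowledge; during retrieval it takes full account of component-related domain knowledge, the retrieval context, and the user's intent; and it can effectively filter and rank the result set, greatly improving recall, precision, and user satisfaction. To verify the feasibility and effectiveness of the method, an efficient component retrieval environment was designed and implemented.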

16.
Icon color and icon border shape are two key factors that affect search efficiency and user experience but have previously been studied separately. This study aimed to ascertain their separate and combined effects on smartphone interfaces. We conducted an experiment using eye tracking in addition to performance and experience measures to understand the effects of app icon color and border shape on visual efficiency and user experience. The results identified both features as essential attributes with interactive effects in the process of searching app icons on a smartphone interface. The study confirmed that varied colors across icons and a rounded square border shape helped to improve search efficiency, decrease cognitive effort, and lead to a more positive user experience.
Relevance to industry: Users of smartphones are often confronted with the problem of selecting a single app from a great number of apps. Visual design of app icons plays a key role in influencing visual search efficiency and user experience. The results of this study have implications for designing app icons on the interface of smartphones to improve search efficiency and elicit positive user experience.

17.
Improving the recall of information retrieval systems for similarity search in time series databases is of great practical importance. In the manufacturing domain, these systems are used to query large databases of manufacturing process data that contain terabytes of time series data from millions of parts. This allows domain experts to identify parts that exhibit specific process faults. In practice, the search often amounts to an iterative query–response cycle in which users define new queries (time series patterns) based on results of previous queries. This is a well-documented phenomenon in information retrieval and not unique to the manufacturing domain. Indexing manufacturing databases to speed up the exploratory search is often not feasible as it may result in an unacceptable reduction in recall. In this paper, we present a novel adaptive search algorithm that refines the query based on relevance feedback provided by the user. Additionally, we propose a mechanism that allows the algorithm to self-adapt to new patterns without requiring any user input. As the search progresses, the algorithm constructs a library of time series patterns that are used to accurately find objects of the target class. Experimental validation of the algorithm on real-world manufacturing data shows that the recall for the retrieval of fault patterns is considerably higher than that of other state-of-the-art adaptive search algorithms. Additionally, its application to publicly available benchmark data sets shows that these results are transferable to other domains.

18.
Traditional Chinese text retrieval systems return a ranked list of documents in response to a user's request. While a ranked list of documents may be an appropriate response for the user, frequently it is not. Usually it would be better for the system to provide the answer itself instead of requiring the user to search for the answer in a set of documents. Since Chinese text retrieval has been developed only recently, and because of various specific characteristics of the Chinese language, the approaches to its retrieval are quite different from those proposed for Western languages. Thus, an architecture that augments existing search engines is developed to support Chinese natural language question answering. In this paper a new approach to building a Chinese question-answering system is described, yielding a general-purpose, fully automated Chinese question-answering system available on the web. In this approach, we attempt to represent Chinese text by its characteristics, convert the Chinese text into ERE (E: entity, R: relation) relation data lists, and then answer the question through the ERE relation model. The system performs quite well given the simplicity of the techniques being utilized. Experimental results show that question-answering accuracy can be greatly improved by analyzing more and more matching ERE relation data lists. Simple ERE relation data extraction techniques work well in our system, making it efficient to use with many backend retrieval engines.

19.
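Interoperation among multiple component libraries that use different classification schemes can effectively broaden the range of components available to reusers and improve retrieval efficiency, but the precision and recall of retrieval are pressing problems for multi-library retrieval. After analyzing the principle of keyword retrieval and the semantic relation model, and using a domain ontology, a secondary retrieval model for multiple component libraries is proposed that is based on semantic relation recognition with user feedback, thereby obtaining high-quality retrieval results. Experimental results demonstrate the effectiveness and feasibility of the method.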

20.

This article presents a system that carries out highly effective searches over collections of textual information, such as those found on the Internet. The system is made up of two major parts. The first part consists of an agent, Musag, that learns to relate concepts that are semantically "similar" to one another. In other words, this agent dynamically builds a dictionary of expressions for a given concept that captures the words people have in mind when mentioning the specific concept. We aim at achieving this by learning from the context in which these words appear. The second part consists of another agent, Sag, which is responsible for retrieving documents, given a set of keywords with relative weights. This retrieval makes use of the dictionary learned by Musag, in the sense that the documents to be retrieved for a query are related to the concept given according to the context of previously scanned documents. In this way, we overcome two main problems with current text search engines, which are largely based on syntactic methods. One problem is that the keyword given in the query might have an ambiguous meaning, leading to the retrieval of documents not related to the topic requested. The second problem concerns relevant documents that will not be recommended to the user, since they do not include the specific keyword mentioned in the query. Using context learning methods, we are able to retrieve such documents if they include other words, learned by Musag, that are related to the main concept. We describe the agents' system architecture, along with the nature of their interactions. We describe our learning and search algorithms and present results from experiments performed on specific concepts. We also discuss the notion of "cost of learning" and how it influences the learning process and the quality of the dictionary at any given time.
