Similar literature
 Found 20 similar documents; search took 22 ms
1.
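时雷, 席磊, 段其国. 《计算机科学》 (Computer Science), 2007, 34(10): 228-229
This paper proposes a personalized web search system based on rough set theory. Keywords in the user preference file are grouped to represent the user's interest categories. Rough set theory is used to handle the inherent vagueness of natural language, and the query is expanded according to the user preference file. The search component uses the expanded query to retrieve relevant information. To further exclude irrelevant information, the ranking component computes the similarity between the query and each search result and ranks the results by that value. In a comparison with a traditional search engine, the experimental results show that the system effectively improves the precision of search results and meets users' personalized needs.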

2.
Given a user keyword query, current Web search engines return a list of individual Web pages ranked by their "goodness" with respect to the query. Thus, the basic unit for search and retrieval is an individual page, even though information on a topic is often spread across multiple pages. This degrades the quality of search results, especially for long or uncorrelated (multitopic) queries (in which individual keywords rarely occur together in the same document), where a single page is unlikely to satisfy the user's information need. We propose a technique that, given a keyword query, generates new pages on the fly, called composed pages, which contain all query keywords. The composed pages are generated by extracting and stitching together relevant pieces from hyperlinked Web pages and retaining links to the original Web pages. To rank the composed pages, we consider both the hyperlink structure of the original pages and the associations between the keywords within each page. Furthermore, we present and experimentally evaluate heuristic algorithms to efficiently generate the top composed pages. The quality of our method is compared to current approaches by using user surveys. Finally, we also show how our techniques can be used to perform query-specific summarization of Web pages.

3.
Most interactive "query-by-example" based image retrieval systems utilize relevance feedback from the user to bridge the gap between the user's implied concept and the low-level image representation in the database. However, traditional relevance feedback usage in the context of content-based image retrieval (CBIR) may not be very efficient due to a significant overhead in database search and image download time in client-server environments. In this paper, we propose a CBIR system that efficiently addresses the inherent subjectivity in user perception during a retrieval session by employing a novel idea of intra-query modification and learning. The proposed system generates an object-level view of the query image using a new color segmentation technique. Color, shape and spatial features of individual segments are used for image representation and retrieval. The proposed system automatically generates a set of modifications by manipulating the features of the query segment(s). An initial estimate of user perception is learned from the user feedback provided on the set of modified images. This largely improves the precision in the first database search itself and alleviates the overheads of database search and image download. Precision-to-recall ratio is improved in further iterations through a new relevance feedback technique that utilizes both positive and negative examples. Extensive experiments have been conducted to demonstrate the feasibility and advantages of the proposed system.

4.
Starting from a member of an image database designated the "query image," traditional image retrieval techniques, for example, search by visual similarity, allow one to locate additional instances of a target category residing in the database. However, in many cases, the query image or, more generally, the target category, resides only in the mind of the user as a set of subjective visual patterns, psychological impressions, or "mental pictures." Consequently, since image databases available today are often unstructured and lack reliable semantic annotations, it is often not obvious how to initiate a search session; this is the "page zero problem." We propose a new statistical framework based on relevance feedback to locate an instance of a semantic category in an unstructured image database with no semantic annotation. A search session is initiated from a random sample of images. At each retrieval round, the user is asked to select one image from among a set of displayed images: the one that is closest in his opinion to the target class. The matching is then "mental." Performance is measured by the number of iterations necessary to display an image which satisfies the user, at which point standard techniques can be employed to display other instances. Our core contribution is a Bayesian formulation which scales to large databases. The two key components are a response model which accounts for the user's subjective perception of similarity and a display algorithm which seeks to maximize the flow of information. Experiments with real users and two databases of 20,000 and 60,000 images demonstrate the efficiency of the search process.

5.
Fuzzy User Modeling for Information Retrieval on the World Wide Web   Total citations: 5 (self-citations: 1, citations by others: 4)
Information retrieval from the World Wide Web through the use of search engines is known to be unable to capture effectively the information needs of users. The approach taken in this paper is to add intelligence to information retrieval from the World Wide Web by modeling users to improve the interaction between the user and information retrieval systems, in other words, to improve the performance of the user in retrieving information from the information source. To effect such an improvement, it is necessary that any retrieval system should somehow make inferences concerning the information the user might want. The system can then aid the user, for instance by giving suggestions or by adapting any query based on predictions furnished by the model. So, by a combination of user modeling and fuzzy logic, a prototype system has been developed (the Fuzzy Modeling Query Assistant (FMQA)) which modifies a user's query based on a fuzzy user model. The FMQA was tested via a user study which clearly indicated that, for the limited domain chosen, the modified queries are better than those that are left unmodified. Received 10 November 1998 / Revised 14 June 2000 / Accepted in revised form 25 September 2000

6.
Most Web search engines use the content of Web documents and their link structures to assess the relevance of a document to the user’s query. With the growth of the information available on the Web, it becomes difficult for such search engines to satisfy a user information need expressed by a few keywords. Personalized information retrieval is a promising way to resolve this problem by modeling the user profile from the user's general interests and then integrating it into a personalized document ranking model. In this paper, we present a personalized search approach that involves a graph-based representation of the user profile. The user profile refers to the user interest in a specific search session, defined as a sequence of related queries. It is built by means of score propagation that activates a set of semantically related concepts of a reference ontology, namely the ODP. The user profile is maintained across related search activities using a graph-based merging strategy. To detect related search activities, we define a session boundary recognition mechanism based on the Kendall rank correlation measure, which tracks changes in the dominant concepts held by the user profile relative to a newly submitted query. Personalization is performed by re-ranking the search results of related queries using the user profile. Our experimental evaluation is carried out using the HARD 2003 TREC collection and shows that our session boundary recognition mechanism based on the Kendall measure provides significantly better precision than non-ranking based measures such as the cosine and the WebJaccard similarity measures. Moreover, the results show that graph-based search personalization is effective for improving search accuracy.
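
The session boundary test described in this abstract can be illustrated with a small sketch. It assumes, hypothetically, that the user profile and the new query are each represented as a dictionary mapping concept names to weights, and that a fixed threshold on the Kendall coefficient separates "same session" from "new session". Neither the representation, the function names, nor the threshold value comes from the abstract.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall tau-a between two concept weightings, over their shared concepts."""
    shared = sorted(set(scores_a) & set(scores_b))
    n = len(shared)
    if n < 2:
        return 0.0  # not enough shared concepts to compare orderings
    concordant = discordant = 0
    for x, y in combinations(shared, 2):
        sign = (scores_a[x] - scores_a[y]) * (scores_b[x] - scores_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def same_session(profile, query_concepts, threshold=0.0):
    """Hypothetical rule: stay in the session while the rankings still agree."""
    return kendall_tau(profile, query_concepts) > threshold


# Illustrative weights only; the concept names are made up for this example.
profile = {"Sports": 0.9, "Health": 0.5, "Travel": 0.1}
related_query = {"Sports": 0.8, "Health": 0.4, "Travel": 0.2}
unrelated_query = {"Sports": 0.1, "Health": 0.5, "Travel": 0.9}

print(same_session(profile, related_query))    # rankings agree
print(same_session(profile, unrelated_query))  # rankings are reversed
```

A rank-based measure such as this one reacts to changes in the ordering of dominant concepts rather than to the raw weights, which is the property the abstract contrasts with cosine-style measures.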

7.
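To address the low recall and precision of current Web information retrieval, a personalized local context analysis method is proposed to improve Web retrieval performance. The method designs a client-side user interest mining model and combines the user interest model with local context analysis, overcoming the shortcomings of local context analysis. Experimental results show that the method effectively improves the recall and precision of Web information retrieval.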

8.
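The diversity of user interests and behaviors makes it challenging to provide different users with search results that better match their query intent. Social tags in Web 2.0 result from users annotating objects of interest such as web pages; users use tags to describe the topics they are interested in. These tags not only represent users' interests but also best reveal the information a web page carries. A tag recommendation method oriented to user query intent is proposed, aiming to select the tags that reflect the user's real query intent. As a supplement to the query keywords, tags compensate for the shortcomings of short user queries. In addition, the relationship between these tags and the tags previously used to annotate a web page allows the relevance between the user's query intent and the page content to be judged more accurately, so that results better matching the user's query interest are ranked higher. Experimental results show that the method is more effective than other existing methods, which also indicates that social annotation plays an important role in capturing users' real query intent more accurately.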

9.
When classifying search queries into a set of target categories, conventional machine learning based approaches usually make use of external sources of information to obtain additional features for search queries and training data for target categories. Unfortunately, these approaches rely on a large amount of training data for high classification precision. Moreover, they are known to adapt poorly to different target categories, which may be caused by the dynamic changes observed in both Web topic taxonomy and Web content. In this paper, we propose a feature-free classification approach using semantic distance. We analyze the queries and categories themselves and utilize the number of Web pages containing both a query and a category as a semantic distance to determine their similarity. The most attractive feature of our approach is that it only utilizes the Web page counts estimated by a search engine to provide search query classification with respectable accuracy. In addition, it adapts easily to changes in the target categories, whereas machine learning based approaches require an extensive updating process (e.g., re-labeling outdated training data and re-training classifiers) that is time-consuming and costly. We conduct an experimental study on the effectiveness of our approach using a set of rank measures and show that our approach performs competitively with some popular state-of-the-art solutions which, however, frequently use external sources and are inherently inflexible.
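
The abstract does not give the distance formula, so the sketch below substitutes a well-known page-count measure, the Normalized Google Distance, purely as a stand-in. The page counts, the index size `n_pages`, and the names `page_count_distance` and `classify` are all hypothetical; in practice the counts would come from a search engine's hit estimates.

```python
import math


def page_count_distance(fx, fy, fxy, n_pages):
    """Normalized Google Distance from page counts (assumed stand-in formula).

    fx, fy: pages containing each term; fxy: pages containing both;
    n_pages: assumed total number of indexed pages.
    """
    if fxy == 0:
        return float("inf")  # the terms never co-occur
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n_pages) - min(lx, ly))


def classify(query, categories, counts, n_pages):
    """Pick the category whose name is closest to the query by page counts."""
    return min(
        categories,
        key=lambda c: page_count_distance(
            counts[query], counts[c], counts[(query, c)], n_pages
        ),
    )


# Made-up counts for illustration; real values would be search engine estimates.
counts = {
    "jaguar speed": 1_000,
    "Animals": 100_000,
    "Cars": 100_000,
    ("jaguar speed", "Animals"): 800,
    ("jaguar speed", "Cars"): 10,
}
print(classify("jaguar speed", ["Animals", "Cars"], counts, n_pages=10**9))
```

Because no classifier is trained, changing the target taxonomy only changes the list of category names passed to `classify`, which is the flexibility the abstract emphasizes.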

10.
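With the rapid growth of Web information and rising expectations for retrieval quality, traditional search engines can no longer meet users' needs well. This paper proposes a personalized metasearch engine model. Personalization here means that the model builds a separate user interest model for each user and then filters and re-ranks the search results according to the user's interests, so that the results shown to the user are better targeted. The paper explains how each of the main functional modules works and describes in detail the algorithm that ranks search results according to the user interest model. Experiments show that the algorithm effectively improves users' retrieval quality.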

11.
We discuss an adaptive approach towards Content-Based Image Retrieval. It is based on the Ostensive Model of developing information needs, a special kind of relevance feedback model that learns from implicit user feedback and adds a temporal notion to relevance. The ostensive approach supports content-assisted browsing through visualising the interaction by adding user-selected images to a browsing path, which ends with a set of system recommendations. The suggestions are based on an adaptive query learning scheme, in which the query is learnt from previously selected images. Our approach is an adaptation of the original Ostensive Model, which was based on textual features only, to include content-based features to characterise images. In the proposed scheme textual and colour features are combined using the Dempster-Shafer theory of evidence combination. Results from a user-centred, work-task oriented evaluation show that the ostensive interface is preferred over a traditional interface with manual query facilities. This is due to its ability to adapt to the user's need, its intuitiveness and the fluid way in which it operates. Studying and comparing the nature of the underlying information need, it emerges that our approach elicits changes in the user's need based on the interaction, and is successful in adapting the retrieval to match the changes. In addition, a preliminary study of the retrieval performance of the ostensive relevance feedback scheme shows that it can outperform a standard relevance feedback strategy in terms of image recall in category search.
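
As a rough illustration of the evidence combination step mentioned in this abstract, the sketch below applies Dempster's rule to two mass functions, one standing for textual evidence and one for colour evidence. The two-element frame (relevant, non-relevant) and all mass values are assumptions made for this example; the abstract does not say how the masses are derived from the features.

```python
def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset (a subset of the frame) to its mass.
    """
    combined = {}
    conflict = 0.0
    for b, mass_b in m1.items():
        for c, mass_c in m2.items():
            intersection = b & c
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + mass_b * mass_c
            else:
                conflict += mass_b * mass_c  # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Renormalise by the non-conflicting mass.
    return {a: mass / (1.0 - conflict) for a, mass in combined.items()}


# Hypothetical frame and masses for a single candidate image.
REL = frozenset({"relevant"})
NON = frozenset({"non-relevant"})
EITHER = REL | NON  # mass left uncommitted by a source

text_evidence = {REL: 0.6, EITHER: 0.4}
colour_evidence = {REL: 0.5, NON: 0.2, EITHER: 0.3}

for subset, mass in combine(text_evidence, colour_evidence).items():
    print(sorted(subset), round(mass, 4))
```

Mass placed on the whole frame lets a weak feature express uncertainty instead of being forced to vote for or against relevance.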

12.
In this paper, we propose CYBER, a CommunitY Based sEaRch engine, for information retrieval utilizing community feedback information in a DHT network. In CYBER, each user is associated with a set of user profiles that capture his/her interests. Likewise, a document is associated with a set of profiles, one for each indexed term. A document profile is updated by users who query on the term and consider the document a relevant answer. Thus, the profile acts as a consolidation of users' feedback from the same community, and reflects their interests. In this way, as one user finds a document to be relevant, another user in the same community issuing a similar query will benefit from the feedback provided by the earlier user. Hence, the search quality in terms of both precision and recall is improved. Moreover, we further improve the effectiveness of CYBER by introducing an index tuning technique. By choosing the indexing terms more carefully, community-based relevance feedback is utilized in both building/refining indices and re-evaluating queries. We first propose a naive scheme, CYBER+, which involves an index tuning technique based on past queries only, and then re-evaluates queries in a separate step. We then propose a more complex scheme, CYBER++, which refines its index based on both past queries and relevance feedback. As the index is built with more selective and accurate terms, the search performance is further improved. We conduct a comprehensive experimental study and the results show the effectiveness of our schemes.

13.
In this paper, we advance a technique to develop a user profile for information retrieval through knowledge acquisition techniques. The profile bridges the discrepancy between user-expressed keywords and system-recognizable index terms. The approach presented in this paper is based on the application of personal construct theory to determine a user's vocabulary and his/her view of different documents in a training set. The elicited knowledge is used to develop a model for each phrase/concept given by the user by employing machine learning techniques. Our model correlates the concepts in a user's vocabulary to the index terms present in the documents in the training set. Computation of dependence between the user phrases also contributes to the development of the user profile and to creating a classification of documents. The resulting system is capable of automatically identifying the user concepts and translating queries into index terms computed by the conventional indexing process. The system is evaluated using the standard measures of precision and recall by comparing its performance against the performance of the SMART system for different queries. This research is supported by the NSF grant IRI-8805875.

14.
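Search Engine User Behavior Analysis Based on Large-Scale Log Analysis   Total citations: 18 (self-citations: 0, citations by others: 18)
User behavior analysis is an important foundation for progress in Web information retrieval, and one of the basic starting points for the algorithms that play an important role in commercial search engines. To better understand the retrieval behavior of Chinese search users, this paper analyzes nearly 50 million query log entries collected by the Sogou search engine over one month. We analyze user behavior in terms of the distribution of unique queries, users' query habits within the same session, and whether users use advanced search features. The conclusions offer useful guidance for improving the retrieval algorithms of Chinese search engines and for evaluating retrieval effectiveness more accurately.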

15.
Engineers create engineering documents with their own terminologies, and want to search existing engineering documents quickly and accurately during a product development process. Keyword-based search methods have been widely used due to their ease of use, but their search accuracy has often been problematic because of the semantic ambiguity of terminologies in engineering documents and queries. The semantic ambiguity can be alleviated by using a domain ontology. Also, if queries are expanded to incorporate the engineer’s personalized information needs, the accuracy of the search result would be improved. Therefore, we propose a framework to search engineering documents with less semantic ambiguity and more focus on each engineer’s personalized information needs. The framework includes four processes: (1) developing a domain ontology, (2) indexing engineering documents, (3) learning user profiles, and (4) performing personalized query expansion and retrieval. A domain ontology is developed based on product structure information and engineering documents. Using the domain ontology, terminologies in documents are disambiguated and indexed. Also, a user profile is generated from the domain ontology. Through user profile learning, the user’s interests are captured from the relevant documents. During the personalized query expansion process, the learned user profile is used to reflect the user’s interests. Simultaneously, the user’s search intent, which is implicitly inferred from the user’s task context, is also considered. To retrieve relevant documents, an expanded query reflecting both the user’s interests and intent is then matched against the document collection. The experimental results show that the proposed approach can substantially outperform both the keyword-based approach and the existing query expansion method in retrieving engineering documents. Reflecting a user’s information needs precisely has been identified as the most important factor underlying this notable improvement.

16.
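Text Database Selection for Metasearch Engines   Total citations: 8 (self-citations: 0, citations by others: 8)
The information a user needs is often scattered across the separate databases of multiple search engines. For ordinary users, visiting multiple search engines and picking out the genuinely useful pages from the returned results is time-consuming and laborious. A metasearch engine provides users with an integrated environment for accessing multiple search engines at once: it submits each user query it receives to multiple underlying search engines. As a search tool, a metasearch engine has advantages such as broader Web query coverage than a traditional engine and better scalability. This paper discusses several techniques for solving the database selection problem in metasearch engines. For a query submitted by the user, database selection identifies the underlying search engines most likely to return useful information.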

17.
User profiling in web search has the advantage of enabling personalized web search: the quality of the results offered by the search engine to the user is increased by taking the user’s interests into account when presenting those results. The negative side is that the interests and the query history of users may contain information considered private; hence, technology should be provided for users to avoid profiling if they so wish. There are several anti-profiling approaches in web search, ranging from basic countermeasures to profile obfuscation and private information retrieval. Except for private information retrieval (PIR), which hides the retrieved item from the database, these approaches focus on anonymizing the user’s identity and fall into the category of anonymous keyword search (sometimes also called user-private information retrieval). Most current PIR protocols are ill-suited to provide PIR from a search engine or large database, due to their complexity and their assumption that the database actively cooperates in the PIR protocol. Peer-to-peer profile obfuscation protocols appear as a competitive option provided that peers are rationally interested in helping each other. We present a game-theoretic analysis of P2P profile obfuscation protocols which shows under which conditions helping each other is in the peers’ rational interest.

18.
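A Method for Querying Document Databases by Content and Structure   Total citations: 4 (self-citations: 0, citations by others: 4)
Documents have a logical structure; concepts such as titles, chapters, and paragraphs form a document's internal logic. Different users have different needs when retrieving documents, and how a retrieval system can provide meaningful information has long been a central research task. Combining document structure and content, a new similarity computation method is proposed for retrieving structured documents. The method supports retrieval of document content at multiple granularities, from words and phrases to paragraphs or chapters. A question answering system was implemented based on this method, with Microsoft's Encarta encyclopedia as the test collection. Experimental comparison with traditional methods shows that the passages retrieved by this method are more reasonable and more effective.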

19.
Keyword queries have long been popular with search engines and the information retrieval community and have recently gained momentum for their use in the expert systems community. The conventional semantics for processing a user query is to find a set of top-k web pages such that each page contains all user keywords. Recently, this semantics has been extended to find a set of cohesively interconnected pages, each of which contains one of the query keywords scattered across these pages. A keyword query with the extended semantics (i.e., more than a list of keywords hyperlinked with each other) is referred to as a graph query. In the case of a graph query, the query keywords may not all be present on a single Web page. Thus, a set of Web pages with the corresponding hyperlinks needs to be presented as the search result. Existing search systems suffer from serious performance problems because they fail to integrate information from multiple connected resources; an efficient algorithm for keyword queries over graph-structured data is therefore proposed. It integrates information from multiple connected nodes of the graph and generates result trees containing all the query keywords. We also investigate a ranking measure called graph ranking score (GRS) to evaluate the relevant graph results so that the score can generate a scalar value for keywords as well as for the topology.

20.
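A Web Search Optimization Model Based on User Query Intent Recognition   Total citations: 2 (self-citations: 1, citations by others: 1)
杨艺, 周元. 《计算机科学》 (Computer Science), 2012, 39(1): 264-267
Based on an analysis and classification of user query intent, a Web search optimization model is proposed. The model identifies the user's query intent in order to impose a dual constraint of intent feature words and content topic words on the query, and combines this with the user's query behavior to obtain the query target. This both ensures accurate matching of the user's query intent and automatically filters out irrelevant information. Compared with related work, its focus is on accurately capturing the user's query intent and improving user satisfaction. Experimental results show that the model clearly improves on traditional search methods in both search accuracy and user satisfaction with the results.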
