Similar literature
Found 20 similar documents; search took 27 ms
1.
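A Method for Querying Document Databases by Content and Structure   Cited by: 4 (self: 0, others: 4)
Documents have a logical structure: titles, chapters, sections, and paragraphs are part of a document's inherent logic. Different users have different needs when searching documents, and how a retrieval system can return meaningful information has long been a central research problem. Combining document structure and content, this paper proposes a new similarity measure for retrieving structured documents. The method supports retrieval of document content at multiple granularities, from words and phrases to paragraphs or sections. A question answering system was built on this method and tested on Microsoft's Encarta encyclopedia; experimental comparison with traditional methods shows that the passages retrieved this way are more reasonable and more effective.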

2.
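Design and Implementation of a Content-Based Image Retrieval System   Cited by: 2 (self: 0, others: 2)
In response to current requirements for image querying, this paper designs a complete content-based image retrieval system that is more fully featured than earlier systems. Content-based image retrieval algorithms are studied, with emphasis on extraction and matching algorithms for global features such as colour, edges, and texture. Experimental results show that the system can search large image databases effectively and quickly and has practical value.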

3.
Retrieval Failure and Recovery in Recommender Systems   Cited by: 2 (self: 0, others: 2)

4.
Applying EuroWordNet to Cross-Language Text Retrieval   Cited by: 1 (self: 0, others: 1)
We discuss ways in which EuroWordNet (EWN) can be used in multilingual information retrieval activities, focusing on two approaches to Cross-Language Text Retrieval that use the EWN database as a large-scale multilingual semantic resource. The first approach indexes documents and queries in terms of the EuroWordNet Inter-Lingual-Index, thus turning term weighting and query/document matching into language-independent tasks. The second describes how the information in the EWN database could be integrated with a corpus-based technique, thus allowing retrieval of domain-specific terms that may not be present in our multilingual database. Our objective is to show the potential of EuroWordNet as a promising alternative to existing approaches to Cross-Language Text Retrieval.
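The first approach in this abstract, indexing by Inter-Lingual-Index concepts, can be illustrated with a toy sketch. The table `TOY_ILI`, its concept identifiers, and the scoring function are invented for illustration; they are not real EuroWordNet records and not the weighting scheme of the paper.

```python
from collections import Counter

# Hypothetical (language, word) -> concept ID table; not real EuroWordNet data.
TOY_ILI = {
    ("en", "dog"): "ili-0001",
    ("es", "perro"): "ili-0001",
    ("en", "house"): "ili-0002",
    ("es", "casa"): "ili-0002",
}

def to_concepts(tokens, lang):
    """Map tokens to language-independent concept IDs, dropping unknown words."""
    return Counter(TOY_ILI[(lang, t)] for t in tokens if (lang, t) in TOY_ILI)

def overlap_score(query, doc):
    """Count concept occurrences shared by the query and the document."""
    return sum((query & doc).values())

# A Spanish query is matched against English documents via shared concept IDs.
query = to_concepts(["perro"], "es")
doc_a = to_concepts(["dog", "house"], "en")
doc_b = to_concepts(["house"], "en")
```

Once both sides are reduced to concept IDs, matching no longer depends on the language of the query or the document, which is the point the abstract makes.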

5.
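High-dimensional indexing, as a means of fast querying over high-dimensional spatial data, is widely applicable to content-based image retrieval, which relies on high-dimensional data. This paper proposes using the R-tree structure introduced by Guttman to build a high-dimensional index over image feature values in order to improve image retrieval efficiency. The R-tree structure is first introduced; its advantages are then analysed by comparing the number of queries and the query time of linear search and R-tree search under identical conditions. Experimental results show that the R-tree structure reduces both the number of queries and the query time, markedly improving image retrieval efficiency.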

6.
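To address the problem that the words used in a user's query do not fully match the words used in the corpus, this paper proposes a strategy for improving information retrieval effectiveness based on user information. It makes full use of the words in the user's query and applies query expansion where appropriate to resolve the term mismatch. Experimental results on the NTCIR-4 test collection show that the method is effective.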

7.
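A Context-Based Query Expansion Method for Chinese Information Retrieval   Cited by: 13 (self: 5, others: 13)
In Chinese information retrieval research and practice, the words used in a query may not match the words used in the document collection, so some relevant documents are not retrieved; this is a key factor affecting retrieval effectiveness. Query expansion can alleviate this term mismatch to some extent. However, experiments show that simple query expansion does not reliably improve the effectiveness of Chinese information retrieval. This paper proposes and implements a context-based query expansion method that selects expansion terms according to the context of the query, making it a relatively "intelligent" form of query expansion. Experiments on the TREC-9 Chinese information retrieval test collection show that, compared with simple query expansion, the context-based method achieves a statistically significant improvement in retrieval effectiveness.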

8.
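A Content-Based Trademark Retrieval System for Large Data Volumes   Cited by: 1 (self: 0, others: 1)
Image retrieval over large data volumes remains a difficult problem. This paper presents fast content-based trademark image retrieval running on a large database. First, two kinds of statistical features are extracted from the trademark images; probabilistic principal component analysis is then used for dimensionality reduction, producing a feature dictionary, that is, a feature mapping of the set of trademark images in the database. In the retrieval stage, a fast hierarchical search yields a candidate set of variable size, which is then refined repeatedly through relevance feedback, shrinking the candidate set until it meets the retrieval requirements. The system was run on 30,0270 trademark images provided by the national trademark office, and each query took no more than 0.3 seconds.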

9.
The Cambridge University Multimedia Document Retrieval (CU-MDR) Demo System is a web-based application that allows the user to query a database of radio broadcasts that are available on the Internet. The audio from several radio stations is downloaded and transcribed automatically. This gives a collection of text and audio documents that can be searched by a user. The paper describes how speech recognition and information retrieval techniques are combined in the CU-MDR Demo System and shows how the user can interact with it.

10.
1 Introduction. The explosive growth of the Internet and other sources of networked information has made automatic mediation of access to networked information sources an increasingly important problem. Much of this information is expressed as electronic text in English. However, most Chinese users are able to read English but lack fluent writing ability, so they would like to express their queries in Chinese to retrieve the relevant English documents. The use of such systems can also be benefici…

11.
The prediction of query performance is an interesting and important issue in Information Retrieval (IR). Current predictors involve the use of relevance scores, which are time-consuming to compute. Therefore, current predictors are not very suitable for practical applications. In this paper, we study six predictors of query performance, which can be generated prior to the retrieval process without the use of relevance scores. As a consequence, the cost of computing these predictors is marginal. The linear and non-parametric correlations of the proposed predictors with query performance are thoroughly assessed on the Text REtrieval Conference (TREC) disk4 and disk5 (minus CR) collection with the 249 TREC topics that were used in the recent TREC2004 Robust Track. According to the results, some of the proposed predictors have significant correlation with query performance, showing that these predictors can be useful to infer query performance in practical applications.
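The abstract does not name its six predictors, so the following is only a sketch of what a pre-retrieval predictor can look like: it needs collection statistics but no relevance scores. Average inverse document frequency is chosen here as an assumption for illustration; the function name `avg_idf` and the toy statistics are hypothetical.

```python
import math

def avg_idf(query_terms, doc_freq, num_docs):
    """Mean IDF of the query terms, computed from collection statistics only.

    Terms absent from the collection are skipped; an empty result yields 0.0.
    """
    idfs = [
        math.log(num_docs / doc_freq[t])
        for t in query_terms
        if doc_freq.get(t, 0) > 0
    ]
    return sum(idfs) / len(idfs) if idfs else 0.0

# Toy statistics (invented): 'the' occurs in every document, 'quark' in few.
doc_freq = {"the": 1000, "quark": 10}
specific = avg_idf(["quark"], doc_freq, 1000)
generic = avg_idf(["the"], doc_freq, 1000)
```

A query made of rare, specific terms scores higher than one made of common terms, and the value is available before any document is ranked.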

12.
Enhancing Concept-Based Retrieval Based on Minimal Term Sets   Cited by: 1 (self: 0, others: 1)
There is considerable interest in bridging the terminological gap that exists between the way users prefer to specify their information needs and the way queries are expressed in terms of keywords or text expressions that occur in documents. One of the approaches proposed for bridging this gap is based on technologies for expert systems. The central idea of such an approach was introduced in the context of a system called Rule Based Information Retrieval by Computer (RUBRIC). In RUBRIC, user query topics (or concepts) are captured in a rule base represented by an AND/OR tree. The evaluation of the AND/OR tree is essentially based on minimum and maximum weights of query terms for conjunctions and disjunctions, respectively. The time to generate the retrieval output of the AND/OR tree for a given query topic is exponential in the number of conjunctions in the DNF expression associated with the query topic. In this paper, we propose a new approach for computing the retrieval output. The proposed approach involves preprocessing of the rule base to generate Minimal Term Sets (MTSs) that speed up the retrieval process. The computational complexity of the on-line query evaluation following the preprocessing is polynomial in m. We show that the computation and use of MTSs allows a user to choose query topics that best suit their needs and to use retrieval functions that yield a more refined and controlled retrieval output than is possible with the AND/OR tree when document terms are binary. We incorporate the p-Norm model into the process of evaluating MTSs to handle the case where weights of both documents and query terms are non-binary.
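The min/max evaluation of an AND/OR tree described in this abstract can be sketched as follows. The tuple-based tree representation, the example topic, and the term weights are assumptions made for illustration; the sketch covers only the baseline tree evaluation, not the Minimal Term Set preprocessing the paper proposes.

```python
def evaluate(node, weights):
    """Score an AND/OR tree against term weights.

    node is ("term", name), ("and", [children]) or ("or", [children]).
    Conjunctions take the minimum child score, disjunctions the maximum.
    """
    kind = node[0]
    if kind == "term":
        return weights.get(node[1], 0.0)  # absent terms contribute 0
    scores = [evaluate(child, weights) for child in node[1]]
    if kind == "and":
        return min(scores)
    if kind == "or":
        return max(scores)
    raise ValueError(f"unknown node kind: {kind}")

# Hypothetical topic: (bomb AND explosion) OR terrorism
topic = ("or", [
    ("and", [("term", "bomb"), ("term", "explosion")]),
    ("term", "terrorism"),
])
doc_weights = {"bomb": 0.9, "explosion": 0.4, "terrorism": 0.6}
score = evaluate(topic, doc_weights)
```

Here the conjunction scores 0.4 (its weakest term) and the disjunction then takes the larger of 0.4 and 0.6.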

13.
Recent progress in peer to peer (P2P) search algorithms has presented viable structured and unstructured approaches for full-text search. We posit that these existing approaches are each best suited for different types of queries. We present PHIRST, the first system to facilitate effective full-text search within P2P databases. PHIRST works by effectively leveraging between the relative strengths of these approaches. Similar to structured approaches, agents first publish terms within their stored documents. However, frequent terms are quickly identified and not exhaustively stored, resulting in a significant reduction in the system's storage requirements. During query lookup, agents use unstructured search to compensate for the lack of fully published terms. Additionally, they explicitly weigh between the costs involved in structured and unstructured approaches, allowing for a significant reduction in query costs. Finally, we address how node failures can be effectively addressed through storing multiple copies of selected data. We evaluated the effectiveness of our approach using both real-world and artificial queries. We found that in most situations our approach yields near perfect recall. We discuss the limitations of our system, as well as possible compensatory strategies.

14.
User Modeling for Adaptive News Access   Cited by: 16 (self: 0, others: 16)
We present a framework for adaptive news access, based on machine learning techniques specifically designed for this task. First, we focus on the system's general functionality and system architecture. We then describe the interface and design of two deployed news agents that are part of the described architecture. While the first agent provides personalized news through a web-based interface, the second system is geared towards wireless information devices such as PDAs (personal digital assistants) and cell phones. Based on implicit and explicit user feedback, our agents use a machine learning algorithm to induce individual user models. Motivated by general shortcomings of other user modeling systems for Information Retrieval applications, as well as the specific requirements of news classification, we propose the induction of hybrid user models that consist of separate models for short-term and long-term interests. Furthermore, we illustrate how the described algorithm can be used to address an important issue that has thus far received little attention in the Information Retrieval community: a user's information need changes as a direct result of interaction with information. We empirically evaluate the system's performance based on data collected from regular system users. The goal of the evaluation is not only to understand the performance contributions of the algorithm's individual components, but also to assess the overall utility of the proposed user modeling techniques from a user perspective. Our results provide empirical evidence for the utility of the hybrid user model, and suggest that effective personalization can be achieved without requiring any extra effort from the user.

15.
We discuss an adaptive approach towards Content-Based Image Retrieval. It is based on the Ostensive Model of developing information needs: a special kind of relevance feedback model that learns from implicit user feedback and adds a temporal notion to relevance. The ostensive approach supports content-assisted browsing through visualising the interaction by adding user-selected images to a browsing path, which ends with a set of system recommendations. The suggestions are based on an adaptive query learning scheme, in which the query is learnt from previously selected images. Our approach is an adaptation of the original Ostensive Model based on textual features only, to include content-based features to characterise images. In the proposed scheme textual and colour features are combined using the Dempster-Shafer theory of evidence combination. Results from a user-centred, work-task oriented evaluation show that the ostensive interface is preferred over a traditional interface with manual query facilities. This is due to its ability to adapt to the user's need, its intuitiveness and the fluid way in which it operates. Studying and comparing the nature of the underlying information need, it emerges that our approach elicits changes in the user's need based on the interaction, and is successful in adapting the retrieval to match the changes. In addition, a preliminary study of the retrieval performance of the ostensive relevance feedback scheme shows that it can outperform a standard relevance feedback strategy in terms of image recall in category search.
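Dempster's rule of combination, which this abstract uses to merge textual and colour evidence, can be sketched generically as below. How the paper derives mass values from the two feature types is not stated here, so the frame of discernment and the numbers are toy assumptions.

```python
def combine(m1, m2):
    """Dempster's rule: fuse two mass functions keyed by frozenset hypotheses."""
    fused = {}
    conflict = 0.0
    for a, x in m1.items():
        for b, y in m2.items():
            common = a & b
            if common:
                fused[common] = fused.get(common, 0.0) + x * y
            else:
                conflict += x * y  # mass assigned to contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise so the remaining masses sum to 1.
    return {h: v / (1.0 - conflict) for h, v in fused.items()}

REL = frozenset({"relevant"})
NON = frozenset({"non-relevant"})
ANY = REL | NON  # the whole frame, i.e. ignorance

# Toy masses (invented) from a textual source and a colour source.
text_mass = {REL: 0.6, NON: 0.1, ANY: 0.3}
colour_mass = {REL: 0.5, ANY: 0.5}
fused = combine(text_mass, colour_mass)
```

Mass left on the whole frame expresses uncertainty, which lets a weak source contribute without forcing a relevant/non-relevant decision.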

16.
Geographic Information Retrieval is concerned with retrieving documents in response to a spatially related query. This paper addresses the ranking of documents by both textual and spatial relevance. To this end, we introduce multi-dimensional scattered ranking, where textually and spatially similar documents are spread through the ranked list instead of appearing consecutively. The effect of this is that documents close together in the ranked list have less redundant information. We present various ranking methods of this type, efficient algorithms to implement them, and experiments to show the outcome of the methods. *This research is supported by the EU-IST Project No. IST-2001-35047 (SPIRIT).

17.
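Ontology-Oriented Semantic Similarity Computation and Its Application in Retrieval   Cited by: 1 (self: 0, others: 1)
Retrieval is an important way of obtaining information. Traditional retrieval stays at the logical level of keyword matching and ignores information at the semantic level. Taking the knowledge organisation system of an ontology as its basis and retrieval as its target application, this paper proposes an ontology-oriented semantic vector representation for documents and queries and, building on it, an ontology-oriented similarity computation method. This lays the groundwork for semantic retrieval, in which results reflect matching at the semantic level. Experiments and analysis are carried out under the guidance of the theory.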
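The abstract describes representing documents and queries as semantic vectors over ontology concepts and then computing their similarity, without giving the formula. The sketch below assumes cosine similarity over sparse concept-weight vectors; the concept names and weights are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {concept: weight}."""
    dot = sum(w * v.get(concept, 0.0) for concept, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_u * norm_v)

# Hypothetical concept weights obtained after mapping words to ontology concepts.
query_vec = {"Vehicle": 1.0, "Engine": 0.5}
doc_vec = {"Vehicle": 0.8, "Engine": 0.4, "Road": 0.1}
unrelated_vec = {"Recipe": 1.0}

sim_related = cosine(query_vec, doc_vec)
sim_unrelated = cosine(query_vec, unrelated_vec)
```

Because the dimensions are concepts rather than keywords, two texts that use different words for the same concept can still score as similar.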

18.
XML is an ordered data model and XQuery expressions return results that have a well-defined order. However, little work on how order is supported in XML query processing has been done to date. In this paper we study the issues related to handling order in the XML context, namely challenges imposed by the XML data model, the variety of order requirements of the XQuery language, and the need to maintain order in the presence of updates to the XML data. We propose an efficient solution that addresses all these issues. Our solution is based on a key encoding for XML nodes that serves as node identity and at the same time encodes order. We design rules for encoding order of processed XML nodes based on the XML algebraic query execution model and the node key encoding. These rules do not require any actual sorting for intermediate results during execution. Our approach enables efficient order-sensitive incremental view maintenance as it makes most XML algebra operators distributive with respect to bag union. We prove the correctness of our order encoding approach. Our approach is implemented and integrated with Rainbow, an XML data management system developed at WPI. We have tested the efficiency of our approach using queries that have different order requirements. We have also measured the relative cost of different components related to our order solution in different types of queries. In general the overhead of maintaining order in our approach is very small relative to the query processing time.
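The idea of a node key that serves as identity and also encodes document order can be illustrated with Dewey-style path keys. This is an assumed stand-in for illustration, not the key encoding used in the paper, and the tuple-based tree format is hypothetical.

```python
def assign_keys(tree, prefix=()):
    """Yield (key, tag) pairs in document order.

    tree is (tag, [children]); a key is the tuple of 1-based child positions
    on the path from the root, so the root gets the empty tuple.
    """
    tag, children = tree
    yield prefix, tag
    for position, child in enumerate(children, start=1):
        yield from assign_keys(child, prefix + (position,))

doc = ("book", [
    ("title", []),
    ("chapter", [("section", [])]),
    ("chapter", []),
])

nodes = list(assign_keys(doc))
# Comparing keys as tuples recovers document order from any permutation.
restored = sorted(reversed(nodes), key=lambda pair: pair[0])
```

Since order is carried by the keys themselves, an unordered intermediate result can be put back into document order whenever it is needed, rather than being kept sorted at every step.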

19.
Relevance Feedback (RF) in Content-Based Image Retrieval is an active field of research. Many RF mechanisms exist, with many interactive techniques and implementation criteria. In this paper, we propose a novel RF approach that sets adaptive similarity-measure weights for each database image from the user feedback, i.e. ego-similarity measurement. We explore archiving the feedback records in two different ways: stored along with query images (QRF-based) or along with each retrieved relevant image in the image database (DBRF-based). In the experiments, DBRF-based relevance feedback greatly improved retrieval effectiveness.

20.
We describe two scenarios of user tasks in which access to multimedia data plays a significant role. Because current multimedia databases cannot support these tasks, we introduce three new requirements on multimedia databases: multimedia objects should be active objects, querying is an interaction process, and query processing uses multiple representations. We discuss three techniques to handle multimedia objects as active objects. Also, we introduce a promising database architecture to meet the new user requirements. Agents within the database handle objects' representations, and a search engine on top of a conventional database handles relevance feedback and multiple representations.
