Similar literature
 Found 20 similar documents (search time: 46 ms)
1.
Information Systems, 2005, 30(7): 543-563
One of the main problems in (web) information retrieval is the ambiguity of users' queries: users tend to post very short queries that do not express their information need clearly. This also holds for ontology-based information retrieval, in which the domain ontology serves as the backbone of the search process. In this paper, we present a novel approach for determining possible refinements of an ontology-based query. The approach is based on measuring the ambiguity of a query with respect to the user's original information need. We define several types of ambiguity concerning the structure of the underlying ontology and the content of the information repository. These ambiguities are interpreted with regard to the user's information need, which we infer from the user's behaviour during the search process. Finally, a ranked list of potentially useful refinements of the query is provided to the user. We present a small evaluation study that shows the advantages of the proposed approach.

2.
A user of a video retrieval system hopes to find relevant information even when the submitted queries are ambiguous. A retrieval process based solely on concept detection remains ineffective in such situations. Potential relationships between concepts have been shown to be a valuable knowledge resource that can enhance retrieval effectiveness, even for ambiguous queries. Recent research in multimedia retrieval has focused on ontology modeling as a common framework for managing knowledge. Handling these ontologies requires coping with issues of generic knowledge management and processing scalability. With these issues in mind, we propose a context-based fuzzy ontology framework for video content analysis and indexing. In this paper, we focus on how we model our fuzzy ontology. First, we automatically populate the generated ontology by gathering various available video annotation datasets. Then, the ontology content is used to infer an enhanced semantic interpretation of the video. Finally, the content of the ontology is improved using user feedback. Experimental results show that our approach achieves scalability while allowing better semantic interpretation of video content.

3.
A vast amount of social feedback expressed via ratings (i.e., likes and dislikes) and comments is available for the multimedia content shared through Web 2.0 platforms. However, the potential of such social features associated with shared content remains unexplored in the context of information retrieval. In this paper, we first study the social features associated with the top-ranked videos retrieved from the YouTube video sharing site for real user queries. Our analysis considers both raw and derived social features. Next, we investigate the effectiveness of each feature for video retrieval and the correlation between the features. Finally, we investigate the impact of the social features on video retrieval effectiveness using state-of-the-art learning-to-rank approaches. To identify the most effective features, we adopt a new feature selection strategy based on the Maximal Marginal Relevance (MMR) method, in addition to an existing strategy. In our experiments, we treat popular and rare queries separately and annotate 4,969 and 4,949 query-video pairs from each query type, respectively. Our findings reveal that incorporating social features is a promising approach for improving retrieval performance for both types of queries.
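
The abstract above does not spell out its MMR-based feature selection strategy, so the following is only a minimal sketch of the generic Maximal Marginal Relevance idea: greedily pick the item with the best trade-off between relevance and redundancy with what is already selected. The feature names, the relevance and similarity values, and the `lam` weight are all hypothetical illustrations, not data from the paper.

```python
from typing import Callable, List, Sequence


def mmr_select(
    candidates: Sequence[str],
    relevance: Callable[[str], float],
    similarity: Callable[[str, str], float],
    k: int,
    lam: float = 0.7,
) -> List[str]:
    """Greedily pick up to k items, trading relevance against redundancy."""
    selected: List[str] = []
    remaining = list(candidates)
    while remaining and len(selected) < k:

        def mmr_score(item: str) -> float:
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max((similarity(item, s) for s in selected), default=0.0)
            return lam * relevance(item) - (1.0 - lam) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: per-feature relevance and pairwise similarity.
REL = {"likes": 0.9, "like_ratio": 0.85, "comments": 0.6}
SIM = {
    frozenset(("likes", "like_ratio")): 0.95,
    frozenset(("likes", "comments")): 0.3,
    frozenset(("like_ratio", "comments")): 0.2,
}


def toy_similarity(a: str, b: str) -> float:
    return 1.0 if a == b else SIM[frozenset((a, b))]


picked = mmr_select(list(REL), REL.__getitem__, toy_similarity, k=2, lam=0.5)
print(picked)
```

In this toy setting the second pick is `comments` rather than `like_ratio`: `like_ratio` is more relevant on its own but almost duplicates `likes`, which is the behaviour MMR is meant to produce.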

4.
One of the key difficulties for users in information retrieval is formulating appropriate queries to submit to the search engine. In this paper, we propose an approach that enriches the user's queries with additional context. We use a language model to build the query context, which is composed of the queries most similar to the query being expanded and their top-ranked documents. We then apply a query expansion approach based on the query context and the Latent Semantic Analysis method. Using a web test collection, we tested our approach on short and long queries. We varied the number of recommended queries and the number of expansion terms to determine appropriate parameters for the proposed approach. Experimental results show that the proposed approach improves the effectiveness of the information retrieval system by 19.23% for short queries and 52.94% for long queries relative to the retrieval results obtained with the original users' queries.

5.
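With the rapid development of microblogging, microblog retrieval has become one of the hot research topics of recent years. Based on the TREC Microblog data, this paper first analyzes microblog documents and microblog queries and identifies two differences between microblog retrieval and traditional text retrieval. First, microblog documents have many distinctive features compared with web pages. Second, microblog queries are time-sensitive: ranking must consider temporal factors in addition to textual semantic similarity, and methods of this kind are collectively referred to as time-aware retrieval techniques. Because of these two differences, existing information retrieval techniques cannot meet the needs of microblog search. This paper surveys recent research on both aspects. It first describes the various features of microblogs and the retrieval methods built on them. Then, following the traditional information retrieval process, it introduces ranking models that use temporal information in three areas: text representation, document priors, and query expansion. Finally, it summarizes existing work and discusses future research directions.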

6.
The growing number of images on the Web and in other information environments calls for efficient management and suitable retrieval, especially by computers. Image annotation is a process that produces words for a digital image based on its content. Users prefer image search based on text queries and keywords, which has increased the use of image annotation. In this paper, we discuss the applicability of structured sparse representations to image annotation. First, the components of image annotation and sparse representation are reviewed. Then, we survey the structure of image annotation algorithms based on sparse representation. Next, a comparison of the algorithms is presented. Finally, the paper concludes with some major challenges and open issues in image annotation using structured sparse representations.

7.
The paper proposes a preprocessing scheme for efficient processing of XML queries in XML-based information retrieval systems. For the preprocessing, we use a signature-based approach. In conventional (flat document-based) information retrieval systems, user queries consist of keywords and Boolean operators, and thus signatures are structured in a flat manner. In XML-based information retrieval systems, however, user queries take the form of path queries, so a flat signature cannot be effective for XML documents. In the paper, we propose two structured signature methods for XML documents. Through experiments, we evaluate the performance of the proposed methods.
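
For readers unfamiliar with signature files, the flat baseline that the abstract contrasts against can be sketched as superimposed coding: each keyword sets a few bits, a document signature is the bitwise OR of its keyword masks, and a query can match only if all of its bits are set. This is not the paper's structured signature method, which the abstract does not detail; the signature width, the bits-per-term count, and the hash choice below are assumptions made for illustration.

```python
import hashlib
from typing import Iterable

SIGNATURE_BITS = 64  # assumed signature width
BITS_PER_TERM = 3    # assumed number of bit positions derived per keyword


def term_mask(term: str) -> int:
    """Map a keyword to a bit mask with up to BITS_PER_TERM bits set."""
    digest = hashlib.sha256(term.lower().encode("utf-8")).digest()
    mask = 0
    for i in range(BITS_PER_TERM):
        mask |= 1 << (digest[i] % SIGNATURE_BITS)
    return mask


def signature(terms: Iterable[str]) -> int:
    """Superimpose (bitwise OR) the masks of all terms."""
    sig = 0
    for term in terms:
        sig |= term_mask(term)
    return sig


def may_contain(doc_sig: int, query_terms: Iterable[str]) -> bool:
    """Return True if every query bit is set in the document signature.

    False positives are possible (bits can collide); false negatives are not.
    """
    query_sig = signature(query_terms)
    return (doc_sig & query_sig) == query_sig


doc_sig = signature(["xml", "path", "query"])
print(may_contain(doc_sig, ["xml", "query"]))
```

A flat signature like this records only which keywords occur, not where they sit in the element hierarchy, which is why the abstract argues it is a poor fit for path queries.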

8.
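Full-text retrieval over XML documents has become a fundamental problem for many research topics, such as Web information retrieval and information extraction. An effective XML index structure is crucial for speeding up retrieval. Building on reference [1], this work constructs and implements an index structure that effectively supports XML full-text retrieval. Experiments show that the proposed index structure performs well on metrics such as index construction time and space.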

9.
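Most digital information on the Internet is associated with places and locations on Earth, and information retrieval queries frequently contain geographic information. Traditional keyword-matching methods do not consider spatial relationships and therefore cannot meet such retrieval needs. Geographic information retrieval obtains documents whose spatial semantics match a given geographic scope, and it has become a hot research direction in information retrieval and GIS both in China and abroad. This paper proposes a basic system framework for geographic information retrieval and, following that framework, classifies and summarizes key technologies including geographic knowledge bases, geographic information extraction, geographic information retrieval models, hybrid indexing, and retrieval visualization. Based on an in-depth comparative analysis of existing techniques, it points out future research work and challenges in the field and provides a large number of references.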

10.
A content-search information retrieval process based on conceptual graphs   Cited by: 1 in total (self-citations: 0, citations by others: 1)
An intelligent information retrieval system is presented in this paper. In our approach, which complies with the logical view of information retrieval, queries, document contents and other knowledge are represented by expressions in a knowledge representation language based on the conceptual graphs introduced by Sowa. To take the intrinsic vagueness of information retrieval into account, i.e. to search imprecisely and incompletely represented documents in order to answer a vague query, different kinds of probabilistic logic are often used. The search process described in this paper uses graph transformations instead of probabilistic notions. This paper focuses on the content-based retrieval process, and the cognitive facet of information retrieval is not directly addressed. However, our approach, which uses a knowledge representation language for representing data and a search process based on a combinatorial implementation of van Rijsbergen's logical uncertainty principle, also allows the representation of retrieval situations. Hence, we believe that it could be implemented at the core of an operational information retrieval system. Two applications, one dealing with academic libraries and the other concerning audiovisual documents, are briefly presented.

11.
Logic-based models have already been proposed for information retrieval. However, there is a need for new formalisms that provide more generic frameworks. To this end, this paper proposes an axiomatic theory of information retrieval that is independent of any model. Our proposal, which relies mainly on many-sorted logic, allows us to consider various sets in the domain of discourse, providing a rich framework for modeling different items such as documents, index terms, and queries. The theory relies on a sound set of axioms that drive the retrieval process as the proof of theorems. Since genericity is a main motivation, we prove that three classical information retrieval models, namely the Boolean model, the fuzzy-set-based extension of the Boolean model, and the vector space model, satisfy the proposed theory, thereby establishing its consistency. Beyond genericity, the proposed approach can also address concrete problems. Indeed, it is well known that using the classical settings of formal concept analysis for information retrieval does not allow disjunctions and negations in queries. This paper therefore characterizes these query forms using appropriate theorems of the theory. Useful algebraic properties (i.e., isomorphisms) are established to this end.

12.
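Text similarity computation based on Hamming distance   Cited by: 8 in total (self-citations: 1, citations by others: 7)
In traditional text classification, similarity is characterized by the cosine of the angle between vectors in Euclidean space; the magnitude of the cosine reflects the relationship between texts. This paper first establishes a one-to-one correspondence between the text set and a set of codewords, and then borrows the concept of Hamming distance from coding theory. From the Hamming distance formula, it derives a completely new method for computing text similarity which, compared with the traditional method, is simpler and faster.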
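
The abstract does not say how texts are mapped to codewords, so the sketch below assumes the simplest possible encoding: one bit per vocabulary term, set when the term occurs in the text. The vocabulary, the `encode` helper, and the normalization of the distance into a similarity in [0, 1] are illustrative assumptions, not the paper's method.

```python
from typing import Sequence


def encode(text: str, vocabulary: Sequence[str]) -> str:
    """Assumed encoding: a binary codeword with one bit per vocabulary term."""
    words = set(text.lower().split())
    return "".join("1" if term in words else "0" for term in vocabulary)


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length codewords differ."""
    if len(a) != len(b):
        raise ValueError("codewords must have the same length")
    return sum(x != y for x, y in zip(a, b))


def hamming_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 minus the fraction of differing positions."""
    if not a:
        return 1.0
    return 1.0 - hamming_distance(a, b) / len(a)


# Hypothetical four-term vocabulary and two short texts.
VOCAB = ["text", "similarity", "hamming", "cosine"]
code_a = encode("hamming text similarity", VOCAB)
code_b = encode("cosine text similarity", VOCAB)
print(code_a, code_b, hamming_distance(code_a, code_b), hamming_similarity(code_a, code_b))
```

Unlike the cosine measure, this comparison needs only bitwise comparisons and a count, which is consistent with the abstract's claim that the approach is simpler and faster.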

13.
The goal of object retrieval is to rank a set of images by the similarity of their contents to those of a query image. However, it is difficult to measure image content similarity because of visual changes caused by varying viewpoint and environment. In this paper, we propose a simple, efficient method to measure content similarity more effectively from image measurements. Our method is based on the ranking information available from existing retrieval systems. We observe that images within the set which, when used as queries, yield similar ranking lists are likely to be relevant to each other, and vice versa. In our method, ranking consistency is used as a verification step to efficiently refine an existing ranking list, in much the same way that spatial verification is employed. The efficiency of our method is achieved by a list-wise min-Hash scheme, which allows rapid calculation of an approximate similarity ranking. Experimental results demonstrate the effectiveness of the proposed framework and its applications.
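
The list-wise min-Hash scheme itself is not described in the abstract. As a rough illustration of the underlying idea only, the sketch below applies standard min-hashing to the sets of image IDs in two ranking lists, so that their Jaccard overlap can be estimated by comparing short sketches instead of the full lists. The image IDs, the number of hash functions, and the hash construction are assumptions.

```python
import hashlib
from typing import Iterable, List


def _hash(seed: int, item: str) -> int:
    """Seeded 64-bit hash of an item (one 'hash function' per seed)."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_sketch(items: Iterable[str], num_hashes: int = 64) -> List[int]:
    """For each hash function, keep the minimum hash value over the set."""
    pool = list(items)
    if not pool:
        raise ValueError("cannot sketch an empty list")
    return [min(_hash(seed, it) for it in pool) for seed in range(num_hashes)]


def estimated_jaccard(sketch_a: List[int], sketch_b: List[int]) -> float:
    """Fraction of sketch positions that agree, an estimate of Jaccard similarity."""
    if len(sketch_a) != len(sketch_b):
        raise ValueError("sketches must have the same length")
    matches = sum(x == y for x, y in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


# Hypothetical top-4 ranking lists returned when two images are used as queries.
ranking_a = ["img7", "img2", "img9", "img4"]
ranking_b = ["img2", "img7", "img4", "img1"]
estimate = estimated_jaccard(minhash_sketch(ranking_a), minhash_sketch(ranking_b))
print(estimate)
```

Note that this set-based version ignores the order of items within each list; a list-wise scheme would presumably take rank positions into account.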

14.
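Latent semantic indexing and its application in Chinese text processing   Cited by: 33 in total (self-citations: 0, citations by others: 33)
Information retrieval is in essence semantic retrieval, yet traditional information retrieval systems are based on independent term indexing, so their retrieval results are not ideal. Latent semantic indexing is a newer information retrieval model: through singular value decomposition, it projects term vectors and document vectors into a low-dimensional space, which reduces the semantic ambiguity between terms and documents and makes the semantic relationships among documents clearer. Experimental and theoretical results confirm that latent semantic indexing achieves better retrieval performance. This paper discusses the theoretical basis of latent semantic indexing and studies its application in Chinese text processing, including Chinese text retrieval, Chinese text classification, and Chinese text clustering.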
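
The projection described above can be written compactly. The following is the standard textbook formulation of latent semantic indexing rather than anything specific to this paper; the symbols $X$, $k$, $U_k$, $\Sigma_k$, $V_k$ and $q$ are notation introduced here for illustration.

```latex
% X is the m-by-n term-document matrix (m terms, n documents).
% Keep only the k largest singular values (k much smaller than min(m, n)):
X \;\approx\; X_k \;=\; U_k \, \Sigma_k \, V_k^{\top},
\qquad U_k \in \mathbb{R}^{m \times k},\;
\Sigma_k \in \mathbb{R}^{k \times k},\;
V_k \in \mathbb{R}^{n \times k}.

% Document j is represented by row j of V_k, written d_j.
% A query term vector q in R^m is folded into the same k-dimensional space:
\hat{q} \;=\; \Sigma_k^{-1} \, U_k^{\top} \, q .

% Documents are then ranked by cosine similarity in the reduced space:
\operatorname{sim}(\hat{q}, d_j) \;=\;
\frac{\hat{q}^{\top} d_j}{\lVert \hat{q} \rVert \, \lVert d_j \rVert}.
```

Because terms that co-occur in similar documents end up close together in the $k$-dimensional space, a query can match a document that shares none of its literal terms.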

15.
In this paper, we present a framework for a feedback process that implements a highly accurate document retrieval system. In the system, a document vector space is created dynamically to carry out retrieval. The retrieval accuracy of the system depends on this vector space: when it is created for a specific purpose and interest of a user, highly accurate retrieval results can be obtained. We present a method for analyzing and personalizing the vector space according to the purposes and interests of users. To optimize the document vector space, we define and implement functions for adding, deleting, and weighting the terms used to create it. By effectively and dynamically exploiting the classified-document information related to the queries, our methods allow users to retrieve documents relevant to their interests and purposes. Even if the search results of the initial retrieval space are not appropriate, applying the proposed feedback operations effectively improves the search results. We also implemented an experimental search system for semantic document retrieval. Several experimental results, including comparisons of our method with the traditional relevance feedback method, are presented to clarify how retrieval accuracy was improved by the feedback process and how accurately documents that satisfied the purposes and interests of users were extracted.

16.
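Chinese information retrieval based on document instances   Cited by: 2 in total (self-citations: 0, citations by others: 2)
Traditional information retrieval systems build indexes on keywords and retrieve information with them. These systems suffer from large returned document sets, low precision, and the difficulty ordinary users have in constructing queries. To address this, the paper proposes information retrieval based on document instances: an existing document is used as a sample, and all documents similar to the sample are retrieved from the document collection. The paper gives a solution and implementation techniques for Chinese information retrieval based on document instances. Preliminary experimental results show that the method is effective.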

17.
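To address the excessive memory usage and low retrieval efficiency of current Resource Description Framework (RDF) retrieval, this paper proposes a semantic retrieval model for RDF graphs based on hierarchical clustering, and designs and implements the corresponding retrieval method. Entity data is first extracted from the RDF graph and, guided by an ontology library, hierarchical clustering converts the complex graph structure into a tree structure suitable for retrieval. The target object found in the tree is then used to determine its position in the RDF graph, where a semantically expanded query is performed. Constructing the retrieval model narrows the search scope and thereby improves retrieval efficiency, and the semantically expanded query also yields good recall.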

18.
In this paper, we extend the work of Kraft et al. to present a new method for fuzzy information retrieval based on fuzzy hierarchical clustering and fuzzy inference techniques. First, we present a fuzzy agglomerative hierarchical clustering algorithm to cluster documents and obtain the cluster centers of the resulting document clusters. Then, we present a method to construct fuzzy logic rules based on the document clusters and their cluster centers. Finally, we apply the constructed fuzzy logic rules to modify the user's query for query expansion and to guide the information retrieval system to retrieve documents relevant to the user's request. The fuzzy logic rules can represent three kinds of fuzzy relationships between index terms: fuzzy positive association, fuzzy specialization, and fuzzy generalization. The proposed fuzzy information retrieval method is more flexible and more intelligent than existing methods because it can expand users' queries for fuzzy information retrieval more effectively.

19.
An unsolved problem in logic-based information retrieval is how to automatically obtain logical representations for documents and queries. This problem limits the impact of logical models for information retrieval because their full expressive power cannot be harnessed. In this paper we propose a method for producing logical document representations that goes further than simplistic "bag-of-words" approaches. The suggested procedure adopts popular information retrieval heuristics, such as document length corrections and global term distribution. This work includes a report of several experiments applying partial document representations in the context of a propositional model of information retrieval. The benefits of this expressive framework, powered by the new logical indexing approach, become apparent in the evaluation.

20.
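In full-text information retrieval systems, storing the text and the index structures over its keywords requires a large amount of space. Bitmap indexes cannot support queries based on information content, and inverted files require relatively large space. This paper proposes a compressed storage method for the frequency vector index structure, and designs and implements a storage structure based on this method. Theoretical analysis shows that the compression method and storage structure achieve a high compression ratio. In addition, query processing techniques over compressed frequency vectors are discussed. Experimental results show that this compressed index structure guarantees the completeness of query results and effectively improves the storage and query efficiency of frequency vectors.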
