Similar documents
 20 similar documents found (search time: 46 ms)
1.
2.
One of the key difficulties for users in information retrieval is formulating appropriate queries to submit to the search engine. In this paper, we propose an approach that enriches the user's queries with additional context. We used a language model to build the query context, which is composed of the queries most similar to the query being expanded and their top-ranked documents. Then, we applied a query expansion approach based on the query context and Latent Semantic Analysis. Using a web test collection, we tested our approach on short and long queries. We varied the number of recommended queries and the number of expansion terms to determine appropriate parameters for the proposed approach. Experimental results show that the proposed approach improves the effectiveness of the information retrieval system by 19.23% for short queries and 52.94% for long queries, compared with the retrieval results obtained using the users' original queries.

3.
Improving the recall of information retrieval systems for similarity search in time series databases is of great practical importance. In the manufacturing domain, these systems are used to query large databases of manufacturing process data that contain terabytes of time series data from millions of parts. This allows domain experts to identify parts that exhibit specific process faults. In practice, the search often amounts to an iterative query–response cycle in which users define new queries (time series patterns) based on the results of previous queries. This is a well-documented phenomenon in information retrieval and not unique to the manufacturing domain. Indexing manufacturing databases to speed up the exploratory search is often not feasible, as it may result in an unacceptable reduction in recall. In this paper, we present a novel adaptive search algorithm that refines the query based on relevance feedback provided by the user. Additionally, we propose a mechanism that allows the algorithm to self-adapt to new patterns without requiring any user input. As the search progresses, the algorithm constructs a library of time series patterns that are used to accurately find objects of the target class. Experimental validation of the algorithm on real-world manufacturing data shows that the recall for the retrieval of fault patterns is considerably higher than that of other state-of-the-art adaptive search algorithms. Additionally, its application to publicly available benchmark data sets shows that these results are transferable to other domains.

4.
A Knowledge-Based Approach to Effective Document Retrieval   Total citations: 3 (self-citations: 0; citations by others: 3)
This paper presents a knowledge-based approach to effective document retrieval. This approach is based on a dual document model that consists of a document type hierarchy and a folder organization. A predicate-based document query language is proposed to enable users to precisely and accurately specify the search criteria and their knowledge about the documents to be retrieved. A guided search tool is developed as an intelligent, natural-language-oriented user interface to assist users in formulating queries. Supported by an intelligent question generator, an inference engine, a question base, and a predicate-based query composer, the guided search collects the most important information known to the user in order to retrieve the documents that satisfy the user's particular interests. A knowledge-based query processing and search engine is devised as the core component of this approach. Algorithms are developed for the search engine to effectively and efficiently retrieve the documents that match the query.

5.
Engineers create engineering documents using their own terminology, and want to search existing engineering documents quickly and accurately during the product development process. Keyword-based search methods have been widely used because of their ease of use, but their search accuracy has often been problematic because of the semantic ambiguity of the terminology in engineering documents and queries. This semantic ambiguity can be alleviated by using a domain ontology. Also, if queries are expanded to incorporate the engineer's personalized information needs, the accuracy of the search results can be improved. Therefore, we propose a framework for searching engineering documents with less semantic ambiguity and more focus on each engineer's personalized information needs. The framework includes four processes: (1) developing a domain ontology, (2) indexing engineering documents, (3) learning user profiles, and (4) performing personalized query expansion and retrieval. A domain ontology is developed based on product structure information and engineering documents. Using the domain ontology, the terms in documents are disambiguated and indexed. A user profile is also generated from the domain ontology. Through user profile learning, the user's interests are captured from the relevant documents. During the personalized query expansion process, the learned user profile is used to reflect the user's interests. At the same time, the user's search intent, which is implicitly inferred from the user's task context, is also considered. To retrieve relevant documents, an expanded query that reflects both the user's interests and intent is then matched against the document collection. The experimental results show that the proposed approach can substantially outperform both the keyword-based approach and the existing query expansion method in retrieving engineering documents. Precisely reflecting the user's information needs was identified as the most important factor underlying this notable improvement.

6.
A content-search information retrieval process based on conceptual graphs   Total citations: 1 (self-citations: 0; citations by others: 1)
An intelligent information retrieval system is presented in this paper. In our approach, which complies with the logical view of information retrieval, queries, document contents and other knowledge are represented by expressions in a knowledge representation language based on the conceptual graphs introduced by Sowa. To take the intrinsic vagueness of information retrieval into account, i.e. to search documents that are imprecisely and incompletely represented in order to answer a vague query, different kinds of probabilistic logic are often used. The search process described in this paper uses graph transformations instead of probabilistic notions. This paper focuses on the content-based retrieval process, and the cognitive facet of information retrieval is not directly addressed. However, our approach, which involves the use of a knowledge representation language for representing data and a search process based on a combinatorial implementation of van Rijsbergen's logical uncertainty principle, also allows the representation of retrieval situations. Hence, we believe that it could be implemented at the core of an operational information retrieval system. Two applications, one dealing with academic libraries and the other concerning audiovisual documents, are briefly presented.

7.
Following the rapid development of the Internet, particularly web page interaction technology, distance e-learning has become increasingly realistic and popular. To solve the problems associated with sharing and reusing teaching materials across different e-learning systems, several standard formats, including SCORM, IMS, LOM, and AICC, have recently been proposed by different international organizations. SCORM LOM, namely learning object metadata, facilitates the indexing and searching of learning objects in a learning object repository through extended sharing and searching features. However, LOM is weak in terms of semantic awareness. Most information retrieval systems assume that users understand their own needs. In e-learning systems, however, users may have no idea of what they are looking for or of the learning object metadata. This study presents an ontological approach for semantic-aware learning object retrieval. This approach has two significant novel features: a fully automatic ontology-based query expansion algorithm for inferring and aggregating user intention based on the original short query, and an “ambiguity removal” procedure for correcting inappropriate user query terms. This approach is sufficiently generic to be embedded in other LOM-based search mechanisms for semantic-aware learning object retrieval. Focused on digital learning material and compared with traditional keyword-based search technologies, the proposed approach has experimentally demonstrated significantly improved retrieval precision and recall.

8.
Relevance feedback techniques have proven to be a powerful means of improving the results obtained when a user submits a query to an information retrieval system such as a World Wide Web search engine. These techniques modify the user's original query by taking into account the relevance judgements that the user provides on the retrieved documents, making the query more similar to the documents judged relevant. In this way, the newly generated query retrieves new relevant documents, thus improving the retrieval process by increasing recall. However, although powerful relevance feedback techniques have been developed for the vector space information retrieval model and some of them have been translated to the classical Boolean model, such tools are lacking in more advanced and powerful information retrieval models such as the fuzzy one. In this contribution we introduce a relevance feedback process for extended Boolean (fuzzy) information retrieval systems based on a hybrid evolutionary algorithm combining simulated annealing and genetic programming components. The performance of the proposed technique is compared with the only previously existing approach to this task, Kraft et al.'s method, showing that our proposal outperforms the latter in terms of accuracy and sometimes also in time consumption. Moreover, it is shown how the adaptation of the retrieval threshold by the relevance feedback mechanism allows the system's effectiveness to be increased.
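The query-modification idea in this abstract is easiest to see in the vector space model it mentions as background. The following is a minimal sketch of the classic Rocchio update, not the hybrid evolutionary algorithm the paper proposes for fuzzy systems; the function names and the default values of `alpha`, `beta` and `gamma` are illustrative assumptions.

```python
from typing import Dict, List

Vector = Dict[str, float]  # term -> weight


def centroid(docs: List[Vector]) -> Vector:
    """Average the term weights over a list of document vectors."""
    total: Vector = {}
    for doc in docs:
        for term, weight in doc.items():
            total[term] = total.get(term, 0.0) + weight
    # With no documents, `total` is empty, so no division takes place.
    return {term: weight / len(docs) for term, weight in total.items()}


def rocchio(query: Vector, relevant: List[Vector], nonrelevant: List[Vector],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Move the query toward relevant documents and away from non-relevant ones.

    alpha, beta and gamma are assumed tuning weights, not values from the paper.
    """
    rel_c = centroid(relevant)
    non_c = centroid(nonrelevant)
    updated: Vector = {}
    for term in set(query) | set(rel_c) | set(non_c):
        weight = (alpha * query.get(term, 0.0)
                  + beta * rel_c.get(term, 0.0)
                  - gamma * non_c.get(term, 0.0))
        if weight > 0:  # negative weights are conventionally clipped to zero
            updated[term] = weight
    return updated
```

Terms that occur only in documents judged non-relevant end up with a negative weight and are dropped, while terms from relevant documents are added to the query, which is how the reformulated query can raise recall.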

9.
Fuzzy User Modeling for Information Retrieval on the World Wide Web   Total citations: 5 (self-citations: 1; citations by others: 4)
Information retrieval from the World Wide Web through the use of search engines is known to be unable to capture the information needs of users effectively. The approach taken in this paper is to add intelligence to information retrieval from the World Wide Web by modeling users in order to improve the interaction between the user and information retrieval systems, in other words, to improve the performance of the user in retrieving information from the information source. To effect such an improvement, any retrieval system must somehow make inferences concerning the information the user might want. The system can then aid the user, for instance by giving suggestions or by adapting any query based on predictions furnished by the model. Thus, by combining user modeling and fuzzy logic, a prototype system, the Fuzzy Modeling Query Assistant (FMQA), has been developed that modifies a user's query based on a fuzzy user model. The FMQA was tested via a user study which clearly indicated that, for the limited domain chosen, the modified queries are better than those that are left unmodified. Received 10 November 1998 / Revised 14 June 2000 / Accepted in revised form 25 September 2000

10.
Keyword queries have long been popular in search engines and the information retrieval community, and have recently gained momentum in the expert systems community. The conventional semantics for processing a user query is to find a set of top-k web pages such that each page contains all user keywords. Recently, this semantics has been extended to find a set of cohesively interconnected pages, each of which contains one of the query keywords, with the keywords scattered across these pages. A keyword query with the extended semantics (i.e., more than a list of keywords hyperlinked with each other) is referred to as a graph query. In a graph query, not all of the query keywords need be present on a single Web page. Thus, a set of Web pages with the corresponding hyperlinks needs to be presented as the search result. Existing search systems show serious performance problems because they fail to integrate information from multiple connected resources; therefore, an efficient algorithm for keyword queries over graph-structured data is proposed. It integrates information from multiple connected nodes of the graph and generates result trees in which all the query keywords occur. We also investigate a ranking measure called graph ranking score (GRS) to evaluate the relevant graph results, so that the score yields a scalar value for the keywords as well as for the topology.

11.
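Shi Lei (时雷), Xi Lei (席磊), Duan Qiguo (段其国). Computer Science (《计算机科学》), 2007, 34(10): 228-229
This paper proposes a personalized web search system based on rough set theory. In the user preference file, keywords are grouped to represent categories of user interest. Rough set theory is used to handle the inherent vagueness of natural language, and the query is expanded according to the user preference file. The search component uses the expanded query to search for relevant information. To further filter out irrelevant information, the ranking component computes the similarity between the query and the search results and ranks the results by the computed value. In a comparison with a traditional search engine, experimental results show that the system effectively improves the precision of search results and meets users' personalized needs.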
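One plausible reading of the expansion step in this abstract is a rough-set upper approximation: the keyword groups in the preference file act as equivalence classes, and the query expands to every group it overlaps. The sketch below works under that assumption and is not the authors' implementation; the function name, the sample preference file and the treatment of ungrouped terms are all hypothetical.

```python
from typing import Dict, Set


def upper_approximation(query: Set[str], groups: Dict[str, Set[str]]) -> Set[str]:
    """Return the union of every keyword group that shares a term with the query."""
    # Assumption: a query term that belongs to no group is kept as-is,
    # i.e. it is treated as its own singleton class.
    expanded = set(query)
    for keywords in groups.values():
        if keywords & query:  # the group overlaps the query
            expanded |= keywords
    return expanded


# Hypothetical preference file: interest category -> grouped keywords
preferences = {
    "databases": {"sql", "index", "transaction"},
    "cooking": {"recipe", "oven"},
}
print(sorted(upper_approximation({"index", "tutorial"}, preferences)))
```

Here the term `index` pulls in the whole `databases` group, while the unrelated `cooking` group is left out of the expanded query.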

12.
Enhancing Concept-Based Retrieval Based on Minimal Term Sets   Total citations: 1 (self-citations: 0; citations by others: 1)
There is considerable interest in bridging the terminological gap that exists between the way users prefer to specify their information needs and the way queries are expressed in terms of keywords or text expressions that occur in documents. One of the approaches proposed for bridging this gap is based on expert system technologies. The central idea of this approach was introduced in the context of a system called Rule Based Information Retrieval by Computer (RUBRIC). In RUBRIC, user query topics (or concepts) are captured in a rule base represented by an AND/OR tree. The evaluation of the AND/OR tree is essentially based on the minimum and maximum weights of query terms for conjunctions and disjunctions, respectively. The time to generate the retrieval output of the AND/OR tree for a given query topic is exponential in the number of conjunctions in the DNF expression associated with the query topic. In this paper, we propose a new approach for computing the retrieval output. The proposed approach involves preprocessing the rule base to generate Minimal Term Sets (MTSs) that speed up the retrieval process. The computational complexity of the on-line query evaluation following the preprocessing is polynomial in m. We show that the computation and use of MTSs allow a user to choose query topics that best suit their needs and to use retrieval functions that yield a more refined and controlled retrieval output than is possible with the AND/OR tree when document terms are binary. We incorporate the p-Norm model into the process of evaluating MTSs to handle the case where the weights of both documents and query terms are non-binary.
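The min/max evaluation of an AND/OR tree described in this abstract can be sketched in a few lines. This is a simplified illustration only: it ignores any rule weights RUBRIC may attach to the tree, it is not the Minimal Term Set method the paper proposes, and the node encoding and function name are assumptions.

```python
from typing import Dict, List, Tuple, Union

# A node is either a term (str) or a tuple ("AND" | "OR", [children]).
Node = Union[str, Tuple[str, List["Node"]]]


def evaluate(node: Node, weights: Dict[str, float]) -> float:
    """Score one document: minimum over AND children, maximum over OR children."""
    if isinstance(node, str):
        return weights.get(node, 0.0)  # a term absent from the document scores 0
    operator, children = node
    scores = [evaluate(child, weights) for child in children]
    if operator == "AND":
        return min(scores)
    if operator == "OR":
        return max(scores)
    raise ValueError(f"unknown operator: {operator}")


# Hypothetical query topic: (fuzzy AND retrieval) OR ranking
topic = ("OR", [("AND", ["fuzzy", "retrieval"]), "ranking"])
document = {"fuzzy": 0.8, "retrieval": 0.5, "ranking": 0.3}
print(evaluate(topic, document))
```

For this document the conjunction scores min(0.8, 0.5) = 0.5 and the disjunction then takes max(0.5, 0.3), so the topic scores 0.5.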

13.
14.
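With the rapid development of the Internet, network resources are growing explosively. Information retrieval is one of the main purposes for which people go online. The field currently offers many retrieval methods and tools, which give users many ways to retrieve information. However, how to use search engines to search faster and more precisely has become a research focus in this field. Based on a study of several existing search engines, a search engine based on user behavior clustering is proposed. By analyzing different user behaviors, search users are clustered into different user groups, and each group is returned the results it prefers, thereby optimizing the query results.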

15.
We present an approach to increasing the effectiveness of ranked-output retrieval systems that relies on graphical display and user manipulation of “views” of retrieval results, where a view is the subset of retrieved documents that contain a specified subset of query terms. This approach has been implemented in a system named VIEWER (VIEwing WEb Results), which acts as an interface to available search engines. An experimental evaluation of the performance of VIEWER in contrast to AltaVista is the major focus of the paper. We first report the results of an experiment on single, short query searches where VIEWER, used as an interactive ranking system, markedly outperformed AltaVista. We then concentrate on a more realistic searching scenario, involving free query formulation, unconstrained selection of retrieval results, and the possibility of query reformulation. We report the results of an experiment where the use of VIEWER, compared to AltaVista, seemed to shift the user effort from inspection to evaluation of results, increasing retrieval effectiveness and user satisfaction. In particular, we found that the VIEWER users retrieved half as many nonrelevant documents as the AltaVista users while retrieving a comparable number of relevant documents. Published online: 22 September 2000
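The notion of a “view” defined in this abstract can be illustrated with a short sketch. It assumes that a document belongs to a view when it contains at least the specified query terms, which is one reading of the definition; the function name and the data layout are hypothetical and are not taken from VIEWER itself.

```python
from typing import Dict, List, Set


def view(results: Dict[str, Set[str]], terms: Set[str]) -> List[str]:
    """Return the ids of retrieved documents that contain every term in `terms`.

    `results` maps document id -> set of terms in that document, listed in
    ranked order (dicts preserve insertion order), so the view keeps the ranking.
    """
    return [doc_id for doc_id, doc_terms in results.items() if terms <= doc_terms]


# Hypothetical ranked result list for the query "web search engine"
results = {
    "d1": {"web", "search"},
    "d2": {"web"},
    "d3": {"search", "engine", "web"},
}
print(view(results, {"web", "search"}))
```

Selecting a different subset of the query terms yields a different view of the same result list, which is the kind of manipulation the abstract describes.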

16.
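Research on an Information Retrieval System Based on a Fuzzy Linguistic Approach   Total citations: 4 (self-citations: 2; citations by others: 2)
This paper proposes a model of an information retrieval system based on a fuzzy linguistic approach. The system consists of three parts: a query interface subsystem, a database subsystem, and a retrieval subsystem. In the query interface subsystem, the user's query request is expressed as a Boolean expression, and each query keyword is assigned linguistic-valued weights with two different semantics, which express the user's fuzzy retrieval requirements. In the database subsystem, the documents to be retrieved are represented by an index term-document fuzzy matrix, and a numerical weight is introduced for each index term according to its frequency of occurrence in the document. In the retrieval subsystem, the fuzzy linguistic approach is used to match the user's Boolean query expression against the index term-document fuzzy matrix in a bottom-up fashion, and the retrieval results that satisfy the user's requirements are finally returned. Compared with traditional retrieval systems based on exact matching of query keywords, this system better accommodates the flexibility in users' query requirements.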

17.
Information Systems, 2005, 30(7): 543-563
One of the main problems in (web) information retrieval is the ambiguity of users' queries, since users tend to post very short queries which do not express their information need clearly. This seems to hold for ontology-based information retrieval, in which the domain ontology is used as the backbone of the searching process. In this paper, we present a novel approach for determining possible refinements of an ontology-based query. The approach is based on measuring the ambiguity of a query with respect to the user's original information need. We define several types of ambiguity concerning the structure of the underlying ontology and the content of the information repository. These ambiguities are interpreted with regard to the user's information need, which we infer from the user's behaviour in the searching process. Finally, a ranked list of potentially useful refinements of the query is provided to the user. We present a small evaluation study that shows the advantages of the proposed approach.

18.
Automatically understanding and answering users' questions (e.g., queries issued to a search engine) expressed in natural language has become an important yet difficult problem across the research fields of information retrieval and artificial intelligence. In a typical interactive Web search scenario, namely session search, the user usually interacts with the search engine for several rounds to obtain relevant information, in the form of, e.g., query reformulations, clicks, and skips. These interactions are usually mixed and intertwined with each other in a complex way. Ideally, an intelligent search engine can be seen as an artificial intelligence agent that is able to infer what information the user needs from these interactions. However, there is still a big gap between the current state of the art and this goal. In this paper, in order to bridge the gap, we propose a Markov random field–based approach to capture dependence relations among interactions, queries, and clicked documents for automatic query expansion (as a way of inferring the information needs of the user). An extensive empirical evaluation is conducted on large‐scale web search data sets, and the results demonstrate the effectiveness of our proposed models.

19.
Information Retrieval (IR) systems assist users in finding information from the myriad of information resources available on the Web. A traditional characteristic of IR systems is that if different users submit the same query, the system yields the same list of results, regardless of the user. Personalised Information Retrieval (PIR) systems go a step further to better satisfy the user's specific information needs by providing search results that are not only relevant to the query but also of particular relevance to the user who submitted it. PIR has thereby attracted increasing research and commercial attention as information portals aim to achieve user loyalty by improving their performance in terms of effectiveness and user satisfaction. In order to provide a personalised service, a PIR system maintains information about the users and the history of their interactions with the system. This information is then used to adapt the users' queries or the results so that information that is more relevant to the users is retrieved and presented. This survey paper features a critical review of PIR systems, with a focus on personalised search. The survey provides an insight into the stages involved in building and evaluating PIR systems, namely: information gathering, information representation, personalisation execution, and system evaluation. Moreover, the survey provides an analysis of PIR systems with respect to the scope of personalisation addressed. The survey proposes a classification of PIR systems into three scopes: individualised systems, community-based systems, and aggregate-level systems. Based on the conducted survey, the paper concludes by highlighting challenges and future research directions in the field of PIR.

20.
The accuracy of searches for visual data elements, as well as other types of information, depends on the terms the user enters in the input query to retrieve relevant results and reduce irrelevant ones. Most of the results returned are relevant to the query terms, but not to their meaning. For example, certain types of web content hold hidden information that traditional search engines are unable to retrieve. Searching for the mathematical construct 1/x using Google will not retrieve documents that contain mathematically equivalent expressions (e.g., x⁻¹), because conventional search engines fall short of providing math-search capabilities. One of these capabilities is the ability to detect mathematical equivalence between users' queries and math content. In addition, users sometimes need to use slang terms, either to retrieve slang-based visual data (e.g., social media content) or because they do not know how to write in the classical form. To solve this problem, this paper proposes an AI-based system for analysing multilingual slang web content so as to allow a user to retrieve slang web content that is relevant to the user's query. The proposed system presents an approach for visual data analytics, and it also enables users to analyse hundreds of potential search results/web pages by starting an informed, friendly dialogue and presenting innovative answers.
