Similar literature
 20 similar documents found
1.
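SEEKER: Keyword-Based Information Retrieval in Relational Databases   Total citations: 20, self-citations: 3, citations by others: 20
Wen Jijun, Wang Shan. Journal of Software (《软件学报》), 2005, 16(7): 1270-1281
Traditionally, SQL has been the main interface for accessing data in relational databases. For inexperienced users, however, learning complex SQL syntax is difficult. Keyword-based information retrieval over relational databases lets users obtain relevant data in the manner of a search engine, without any knowledge of SQL or of the underlying database schema. This paper describes the design and implementation of SEEKER, a keyword-based information retrieval system for relational databases. Existing keyword query systems for relational databases can search only text attributes, whereas SEEKER can also search database metadata and numeric attributes. In addition, SEEKER adopts a more reasonable ranking formula and supports top-k queries. Experimental results show that SEEKER has good query performance.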

2.
The ongoing surge in the amount of online information has made the process of accurate retrieval much more difficult. Providers of information retrieval systems have come under a lot of pressure to improve their techniques to cater for the modern user. Conventional systems are often limited as they fail to understand the true search intent of the user. This is usually a result of both poor query formulation by the user and an inability of the search engine to process the query adequately. In this paper, an approach is presented that attempts to learn a user’s short-term interests through the clustering of their search results. A profile is maintained for each user to assist in the process of context resolution for a given query. The details of such an approach and experimental results to evaluate its effectiveness are presented in this paper.

3.
As an approach to searching for and retrieving objects such as pictures, music, perfumes and apparel on the Internet, sensitivity-vectors (kansei-vectors) are useful, since textual keywords are not sufficient to find the objects that users want. A sensitivity-vector is an array of values. Each value indicates the degree of a feeling or impression represented by a sensitivity word (kansei word). However, because of the gap between a user's subjective sensitivity degree (impression, image and feeling) and the corresponding value in the database, such an approach alone is not enough to retrieve what users want. This paper proposes a retrieval method that automatically and dynamically reduces such gaps by estimating a subjective criterion deviation (which we call "SCD") using the user's retrieval history and fuzzy modeling. Additionally, the proposed method avoids the burden that conventional methods place on users, such as completing required questionnaires. The method can also reflect dynamic changes in a user's preference, which cannot be accomplished with questionnaires. For the evaluation, an experiment was performed by building and using a perfume retrieval system. Observing the transition of the deviation reduction degree showed that the proposed method is effective. In the experiment, the machine could learn users' subjective criterion deviation, as well as its dynamic change caused by factors such as the user's preference, provided the learning rate is well adjusted.

4.
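Keyword search over relational databases retrieves relevant data from a database in the same way a search engine is used. To address the efficiency of keyword search over a specific bibliographic index database on an RDBMS, a ranking strategy for the returned result set is proposed. The similarity value between the query sequence and a result tuple tree serves as the ranking criterion. Drawing on the similarity formulas used to rank keyword search results in traditional information retrieval systems, a similarity formula between the query sequence and result tuple trees in a database is proposed, and the normalization functions of the relevant influence factors are analyzed and redefined. Analysis on a simple database verifies that the improvement is reasonable.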

5.
An adaptive learning automata-based ranking function discovery algorithm   Total citations: 1, self-citations: 0, citations by others: 1
Due to the massive amount of heterogeneous information on the web, insufficient and vague user queries, and the use of the same query by different users for different aims, the information retrieval process deals with a huge amount of uncertainty and doubt. Under such circumstances, designing an efficient retrieval function and ranking algorithm that provides the most relevant results is of the greatest importance. In this paper, a learning automata-based ranking function discovery algorithm that combines different sources of information is proposed. In this method, the learning automaton is used to adjust the portion of the final ranking that is assigned to each source of evidence based on user feedback. All sources of information are first given the same importance. The proportion of a given source increases if the documents provided by this source are reviewed by the user, and decreases otherwise. As the proposed algorithm proceeds, the probability of appearance of each source in the final ranking becomes proportional to its relevance to the user queries. Several simulation experiments are conducted on well-known data collections and query types to show the performance of the proposed algorithm. The obtained results demonstrate that the proposed algorithm outperforms several existing methods in terms of precision at position n, mean average precision, and normalized discounted cumulative gain.
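
The feedback-driven reweighting described in this abstract can be illustrated with a small sketch. The abstract does not give the update rule, so the linear reward-style step, the `rate` parameter, and the function name `update_source_weights` below are assumptions for illustration only, not the authors' algorithm.

```python
def update_source_weights(weights, rewarded, rate=0.1):
    """Shift ranking share toward sources whose documents the user reviewed.

    Assumed rule (not taken from the paper): a rewarded source moves toward 1
    by a fraction `rate`, every other source shrinks by the same fraction,
    and the result is renormalized so the shares sum to 1.
    """
    updated = {}
    for source, share in weights.items():
        if source in rewarded:
            updated[source] = share + rate * (1.0 - share)
        else:
            updated[source] = share * (1.0 - rate)
    total = sum(updated.values())
    return {source: share / total for source, share in updated.items()}


# All sources start with the same importance, as the abstract states.
weights = {"title": 1 / 3, "body": 1 / 3, "anchor": 1 / 3}
weights = update_source_weights(weights, rewarded={"title"})
```

Repeating the update after each round of feedback would let the share of a consistently useful source grow while the others decay, which is the qualitative behavior the abstract describes.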

6.
Recommender systems try to help users in their decisions by analyzing and ranking the available alternatives according to their preferences and interests, modeled in user profiles. The discovery and dynamic update of the users’ preferences are key issues in the development of these systems. In this work we propose to use the information provided by a user during his/her interaction with a recommender system to infer his/her preferences over the criteria used to define the decision alternatives. More specifically, this paper pays special attention to how to learn the user’s preferred value in the case of numerical attributes. A methodology to adapt the user profile in a dynamic and automatic way is presented. The adaptations in the profile are performed after each interaction of the user with the system and/or after the system has gathered enough information from several user selections. We have developed a framework for the automatic evaluation of the performance of the adaptation algorithm that permits analysis of the influence of different parameters. The obtained results show that the adaptation algorithm is able to learn a very accurate model of the user preferences after a certain number of interactions with him/her, even if the preferences change dynamically over time.

7.
In this paper, an unsupervised learning network is explored to incorporate a self-learning capability into image retrieval systems. Our proposal is a new attempt to automate recursive content-based image retrieval. The adoption of a self-organizing tree map (SOTM) is introduced, to minimize the user participation in an effort to automate interactive retrieval. The automatic learning mode has been applied to optimize the relevance feedback (RF) method and the single radial basis function-based RF method. In addition, a semiautomatic version is proposed to support retrieval with different user subjectivities. Image similarity is evaluated by a nonlinear model, which performs discrimination based on local analysis. Experimental results show robust and accurate performance by the proposed method, as compared with conventional noninteractive content-based image retrieval (CBIR) systems and user controlled interactive systems, when applied to image retrieval in compressed and uncompressed image databases.

8.
We present a new framework for multimedia content analysis and retrieval which consists of two independent algorithms. First, we propose a new semi-supervised algorithm called ranking with Local Regression and Global Alignment (LRGA) to learn a robust Laplacian matrix for data ranking. In LRGA, for each data point, a local linear regression model is used to predict the ranking scores of its neighboring points. A unified objective function is then proposed to globally align the local models from all the data points so that an optimal ranking score can be assigned to each data point. Second, we propose a semi-supervised long-term Relevance Feedback (RF) algorithm to refine the multimedia data representation. The proposed long-term RF algorithm utilizes both the multimedia data distribution in multimedia feature space and the historical RF information provided by users. A trace ratio optimization problem is then formulated and solved by an efficient algorithm. The algorithms have been applied to several content-based multimedia retrieval applications, including cross-media retrieval, image retrieval, and 3D motion/pose data retrieval. Comprehensive experiments on four data sets have demonstrated the framework's advantages in precision, robustness, scalability, and computational efficiency.

9.
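This paper aims to address the low efficiency of information retrieval and the low accuracy of retrieval results in data asset management systems. It builds an intelligent retrieval system based on learning-to-rank algorithms to improve the relevance between retrieval results and user requests. The theory of learning-to-rank algorithms is studied and commonly used learning-to-rank algorithms are optimized; the classification problem is extended to the text ranking problem, the corresponding objective and loss functions are defined, and machine learning methods are used to improve the accuracy of retrieval results. The intelligent retrieval system is built on vertical distributed search engine technology and learning-to-rank algorithms, and relevance engineering is used to improve the efficiency of retrieval request conversion. Experiments show that, in addition to optimizing retrieval speed, the system improves both the relevance between queries and returned results and the retrieval accuracy.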

10.
Mémoire proposes a general framework for reasoning from cases in biology and medicine. Part of this project is to propose a memory organization capable of handling large cases and case bases as occur in biomedical domains. This article presents the essential principles for an efficient memory organization based on pertinent work in information retrieval (IR). IR systems have been able to scale up to terabytes of data taking advantage of large databases research to build Internet search engines. They search for pertinent documents to answer a query using term-based ranking and/or global ranking schemes. Similarly, case-based reasoning (CBR) systems search for pertinent cases using a scoring function for ranking the cases. Mémoire proposes a memory organization based on inverted indexes which may be powered by databases to search and rank efficiently through large case bases. It can be seen as a first step toward large-scale CBR systems, and in addition provides a framework for tight cooperation between CBR and IR.
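
As a rough illustration of searching and ranking cases through an inverted index, here is a minimal sketch. The case texts, the whitespace tokenization, and the term-overlap score are all assumptions; the abstract does not specify Mémoire's actual scoring function or storage layer.

```python
from collections import Counter, defaultdict


def build_inverted_index(cases):
    """Map each term to the set of case ids whose description contains it."""
    index = defaultdict(set)
    for case_id, text in cases.items():
        for term in text.lower().split():
            index[term].add(case_id)
    return index


def rank_cases(index, query):
    """Rank cases by how many distinct query terms they match (assumed score)."""
    scores = Counter()
    for term in set(query.lower().split()):
        for case_id in index.get(term, ()):
            scores[case_id] += 1
    # Highest score first; ties broken by case id for a stable order.
    return sorted(scores, key=lambda case_id: (-scores[case_id], case_id))


# Hypothetical toy case base.
cases = {
    "c1": "fever cough",
    "c2": "fever rash headache",
    "c3": "fracture",
}
index = build_inverted_index(cases)
ranked = rank_cases(index, "fever rash")
```

Only cases that share at least one term with the query are touched, which is why this organization can scale to large case bases.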

11.
This paper describes an FAQ system for the Personal Computer (PC) domain, which employs ontology as the key technique to pre-process FAQs and process user queries. It is also equipped with an enhanced ranking technique to present retrieved, query-relevant results. The system relies on a wrapper technique to help clean, retrieve, and transform FAQ information collected from a heterogeneous environment, and stores it in an ontological database. During retrieval of FAQs, the system trims irrelevant query keywords, employs either full keyword match or partial keyword match to retrieve FAQs, and removes conflicting FAQs before returning the final results to the user. Ontology plays the key role in all the above activities. To produce a more effective presentation of the search results, the system employs an enhanced ranking technique, which uses Appearance Probability, Satisfaction Value, Compatibility Value, and Statistic Similarity Value as four properly weighted measures to rank the FAQs. Our experiments show that the system improves the precision rate and produces better ranking results. The proposed FAQ system has the following features. First, the ontology-supported FAQ extraction from webpages can clean FAQ information by removing redundant data, restoring missing data, and resolving inconsistent data. Second, the FAQs are stored in an ontology-directed internal format, which supports semantics-constrained retrieval of FAQs. Third, the ontology-supported natural language processing of the user query helps pinpoint the user’s intent. Finally, the ranking method based on partial keyword match helps present the FAQ solutions that the user most wants, free of conflicts.

12.
Increasing application demands are pushing databases toward providing effective and efficient support for content-based retrieval over multimedia objects. In addition to adequate retrieval techniques, it is also important to enable some form of adaptation to users' specific needs. This paper introduces a new refinement method for retrieval based on the learning of the users' specific preferences. The proposed system indexes objects based on shape and groups them into a set of clusters, with each cluster represented by a prototype. Clustering constructs a taxonomy of objects by forming groups of closely-related objects. The proposed approach to learn the users' preferences is to refine corresponding clusters from objects provided by the users in the foreground, and to simultaneously adapt the database index in the background. Queries can be performed based solely on shape, or on a combination of shape with other features such as color. Our experimental results show that the system successfully adapts queries into databases with only a small amount of feedback from the users. The quality of the returned results is superior to that of a color-based query, and continues to improve with further use.

13.
Cost-Sensitive Active Visual Category Learning   Total citations: 1, self-citations: 0, citations by others: 1
We present an active learning framework that predicts the tradeoff between the effort and information gain associated with a candidate image annotation, thereby ranking unlabeled and partially labeled images according to their expected "net worth" to an object recognition system. We develop a multi-label multiple-instance approach that accommodates realistic images containing multiple objects and allows the category-learner to strategically choose what annotations it receives from a mixture of strong and weak labels. Since the annotation cost can vary depending on an image's complexity, we show how to improve the active selection by directly predicting the time required to segment an unlabeled image. Our approach accounts for the fact that the optimal use of manual effort may call for a combination of labels at multiple levels of granularity, as well as accurate prediction of manual effort. As a result, it is possible to learn more accurate category models with a lower total expenditure of annotation effort. Given a small initial pool of labeled data, the proposed method actively improves the category models with minimal manual intervention.

14.
A unified approach to ranking in probabilistic databases   Total citations: 1, self-citations: 0, citations by others: 1
Ranking is a fundamental operation in data analysis and decision support and plays an even more crucial role if the dataset being explored exhibits uncertainty. This has led to much work in understanding how to rank the tuples in a probabilistic dataset in recent years. In this article, we present a unified approach to ranking and top-k query processing in probabilistic databases by viewing it as a multi-criterion optimization problem and by deriving a set of features that capture the key properties of a probabilistic dataset that dictate the ranked result. We contend that a single, specific ranking function may not suffice for probabilistic databases, and we instead propose two parameterized ranking functions, called PRF^ω and PRF^e, that generalize or can approximate many of the previously proposed ranking functions. We present novel generating functions-based algorithms for efficiently ranking large datasets according to these ranking functions, even if the datasets exhibit complex correlations modeled using probabilistic and/xor trees or Markov networks. We further propose that the parameters of the ranking function be learned from user preferences, and we develop an approach to learn those parameters. Finally, we present a comprehensive experimental study that illustrates the effectiveness of our parameterized ranking functions, especially PRF^e, at approximating other ranking functions and the scalability of our proposed algorithms for exact or approximate ranking.

15.
Learning to rank, the task of learning ranking functions that sort a set of entities using machine learning techniques, has recently attracted much interest in information retrieval and machine learning research. However, most existing work learns in a supervised fashion. In this paper, we propose a transductive method which extracts paired preference information from the unlabeled test data. We then design a loss function that incorporates this preference data with the labeled training data, and learn ranking functions by optimizing the loss function via a derived Ranking SVM framework. The experimental results on the LETOR 2.0 benchmark data collections show that our transductive method can significantly outperform the state-of-the-art supervised baseline.
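
To make the idea of learning from paired preferences concrete, the following sketch computes a pairwise hinge loss for a linear scoring function, which is the general shape of the objective used in Ranking SVM-style methods. It is an illustrative assumption rather than the loss derived in the paper: the regularization term and the way labeled and extracted pairs are weighted are omitted, and the name `pairwise_hinge_loss` is hypothetical.

```python
def pairwise_hinge_loss(w, pairs, margin=1.0):
    """Average hinge loss over (preferred, other) feature-vector pairs.

    A pair contributes zero loss once the linear score of the preferred item
    exceeds the score of the other item by at least `margin`.
    """
    total = 0.0
    for preferred, other in pairs:
        score_gap = sum(wi * (p - o) for wi, p, o in zip(w, preferred, other))
        total += max(0.0, margin - score_gap)
    return total / len(pairs)


# Hypothetical two-feature example: the first item of each pair is preferred.
w = [1.0, 0.0]
pairs = [([2.0, 0.0], [0.0, 0.0]), ([0.0, 1.0], [1.0, 1.0])]
loss = pairwise_hinge_loss(w, pairs)
```

In a transductive setting such as the one described, the list of pairs would contain both pairs derived from labeled training data and pairs extracted from the unlabeled test data.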

16.
A new scheme of learning similarity measure is proposed for content-based image retrieval (CBIR). It learns a boundary that separates the images in the database into two clusters. Images inside the boundary are ranked by their Euclidean distances to the query. The scheme is called constrained similarity measure (CSM), which not only takes into consideration the perceptual similarity between images, but also significantly improves the retrieval performance of the Euclidean distance measure. Two techniques, support vector machine (SVM) and AdaBoost from machine learning, are utilized to learn the boundary. They are compared to see their differences in boundary learning. The positive and negative examples used to learn the boundary are provided by the user with relevance feedback. The CSM metric is evaluated in a large database of 10009 natural images with an accurate ground truth. Experimental results demonstrate the usefulness and effectiveness of the proposed similarity measure for image retrieval.
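
The two-stage ranking described in this abstract (keep only images inside a learned boundary, then order them by Euclidean distance to the query) can be sketched as follows. The boundary is passed in as a plain predicate; in the paper it is learned with SVM or AdaBoost from relevance feedback, which this sketch does not attempt. The function name and the toy feature vectors are assumptions.

```python
import math


def constrained_rank(query, images, inside_boundary):
    """Rank images that fall inside the boundary by Euclidean distance to the query.

    `images` maps an image name to its feature vector; `inside_boundary` is a
    stand-in for the learned classifier and returns True for images to keep.
    """
    candidates = [(name, vec) for name, vec in images.items() if inside_boundary(vec)]
    candidates.sort(key=lambda item: math.dist(query, item[1]))
    return [name for name, _ in candidates]


# Hypothetical 2-D features and a hand-written boundary for illustration.
images = {"a": (0.0, 0.0), "b": (1.0, 1.0), "c": (5.0, 5.0)}
ranked = constrained_rank((1.0, 1.0), images, inside_boundary=lambda v: v[0] < 3.0)
```

Images outside the boundary never appear in the result, however close they are to the query, which is what distinguishes this scheme from plain distance-based ranking.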

17.
RUBRIC: A System for Rule-Based Information Retrieval   Total citations: 1, self-citations: 0, citations by others: 1
A research prototype software system for conceptual information retrieval has been developed. The goal of the system, called RUBRIC, is to provide more automated and relevant access to unformatted textual databases. The approach is to use production rules from artificial intelligence to define a hierarchy of retrieval subtopics, with fuzzy context expressions and specific word phrases at the bottom. RUBRIC allows the definition of detailed queries starting at a conceptual level, partial matching of a query and a document, selection of only the highest ranked documents for presentation to the user, and detailed explanation of how and why a particular document was selected. Initial experiments indicate that a RUBRIC rule set better matches human retrieval judgment than a standard Boolean keyword expression, given equal amounts of effort in defining each. The techniques presented may be useful in stand-alone retrieval systems, front-ends to existing information retrieval systems, or real-time document filtering and routing.

18.
19.
《Information & Management》2002,39(7):559-570
Search performance can be greatly improved by using domain knowledge to assist users in developing a problem specification tailored to the information contained in the system. A methodology is presented for utilizing intelligent information retrieval techniques and domain-specific knowledge to improve user searching. For databases involving a relatively narrow domain, a “system thesaurus” combined with expert systems technology can be used to create an intelligent front end to assist the user in retrieving information with greater precision and recall. Evaluation of the prototype showed greatly improved search effectiveness and satisfaction over the traditional catalog system.

20.
In recent years feedback approaches have been used in relating low-level image features with concepts to overcome the subjective nature of the human image interpretation. Generally, in these systems when the user starts with a new query, the entire prior experience of the system is lost. In this paper, we address the problem of incorporating prior experience of the retrieval system to improve the performance on future queries. We propose a semi-supervised fuzzy clustering method to learn class distribution (meta knowledge) in the sense of high-level concepts from retrieval experience. Using fuzzy rules, we incorporate the meta knowledge into a probabilistic feature relevance feedback approach to improve the retrieval performance. Results on synthetic and real databases show that our approach provides better retrieval precision compared to the case when no retrieval experience is used.
