Similar Documents
20 similar documents found (search time: 31 ms)
1.
In this paper, we introduce the variability of syntactic phrases and propose a new retrieval approach that reflects the variability of syntactic phrase representations. Using a variability measure for a phrase, we can estimate how likely a phrase in a given query is to appear in relevant documents, and thereby control the impact of syntactic phrases in a retrieval model. Experimental results over different types of queries and document collections show that our retrieval model based on the variability of syntactic phrases is very effective in terms of retrieval performance, especially for long natural-language queries.

2.
A content-search information retrieval process based on conceptual graphs (cited 1 time: 0 self-citations, 1 by others)
An intelligent information retrieval system is presented in this paper. In our approach, which complies with the logical view of information retrieval, queries, document contents and other knowledge are represented by expressions in a knowledge representation language based on the conceptual graphs introduced by Sowa. To take the intrinsic vagueness of information retrieval into account, i.e., to search for documents that are imprecisely and incompletely represented in order to answer a vague query, different kinds of probabilistic logic are often used. The search process described in this paper uses graph transformations instead of probabilistic notions. This paper focuses on the content-based retrieval process; the cognitive facet of information retrieval is not directly addressed. However, our approach, which involves a knowledge representation language for representing data and a search process based on a combinatorial implementation of van Rijsbergen’s logical uncertainty principle, also allows retrieval situations to be represented. Hence, we believe that it could be implemented at the core of an operational information retrieval system. Two applications, one dealing with academic libraries and the other concerning audiovisual documents, are briefly presented.

3.
In this paper, we present efficient, scalable, and portable parallel algorithms for the off-line clustering, on-line retrieval, and update phases of the Text Retrieval (TR) problem, based on the vector space model and using clustering to organize and handle a dynamic document collection. The algorithms run on the Coarse-Grained Multicomputer (CGM) and/or Bulk Synchronous Parallel (BSP) models, two models that capture the characteristics of a parallel machine in a few parameters. To the best of our knowledge, our parallel retrieval algorithms are the first to be analyzed under these specific parallel models. For every phase of the proposed algorithms, we analytically determine the relevant communication and computation costs, thereby formally proving the efficiency of the proposed solutions. In addition, we prove that our technique for the on-line retrieval phase performs very well compared with other possible alternatives in the typical case of a multiuser information retrieval (IR) system, where a number of user queries are submitted concurrently. Finally, we discuss external memory issues and show how our techniques can be adapted to the case where processors have limited main memory but sufficient disk capacity to hold their local data.
Damianos Gavalas
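As a rough, sequential illustration of cluster-based retrieval in the vector space model (not the paper's CGM/BSP parallel algorithms), the sketch below compares a query with each cluster centroid using cosine similarity and then ranks only the documents of the best-matching cluster. The whitespace tokenizer, the raw term-frequency weights, and the helper names `to_vector`, `centroid`, and `cluster_retrieve` are illustrative assumptions.

```python
import math
from collections import Counter


def to_vector(text):
    """Raw term-frequency vector using a naive lowercase whitespace tokenizer."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Component-wise average of a list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)  # Counter.update adds counts
    return {t: w / len(vectors) for t, w in total.items()}


def cluster_retrieve(query, clusters, top_k=2):
    """Pick the cluster whose centroid best matches the query, then rank its documents.

    clusters maps a cluster id to a list of (doc_id, vector) pairs.
    """
    q = to_vector(query)
    best = max(clusters, key=lambda c: cosine(q, centroid([v for _, v in clusters[c]])))
    ranked = sorted(clusters[best], key=lambda dv: cosine(q, dv[1]), reverse=True)
    return best, [doc_id for doc_id, _ in ranked[:top_k]]


clusters = {
    "ir": [("d1", to_vector("vector space retrieval model")),
           ("d2", to_vector("query retrieval ranking"))],
    "bio": [("d3", to_vector("protein folding cell")),
            ("d4", to_vector("cell membrane protein"))],
}
result = cluster_retrieve("retrieval model query", clusters)
print(result)  # ('ir', ['d2', 'd1'])
```

Restricting scoring to one cluster is what makes clustering attractive for on-line retrieval. Only that cluster's documents are compared with the query, at the cost of possibly missing relevant documents placed in other clusters.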

4.
In this paper, we address a fundamental problem related to the induction of Boolean logic: given a set of data, represented as a set of binary “true n-vectors” (or “positive examples”) and a set of “false n-vectors” (or “negative examples”), we establish a Boolean function (or an extension) f such that f is true (resp., false) on every given true (resp., false) vector. We further require that such an extension belong to a specified class of functions, e.g., the class of positive functions, the class of Horn functions, and so on. The class of functions represents our a priori knowledge or hypothesis about the extension f, which may be obtained from experience or from the analysis of mechanisms that may or may not cause the phenomena under consideration. Real-world data may contain errors; e.g., measurement and classification errors may be introduced when the data are obtained, or there may be other influential factors not represented as variables in the vectors. In such situations, we have to give up the goal of establishing an extension that is perfectly consistent with the given data, and we are satisfied with an extension f having the minimum number of misclassifications. Both problems, i.e., finding an extension within a specified class of Boolean functions and finding a minimum-error extension in that class, are studied extensively in this paper. For certain classes we provide polynomial-time algorithms, and for others we prove NP-hardness.
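For the class of positive (monotone) functions, a standard consistency test is simple. A positive extension exists exactly when no true vector is componentwise less than or equal to a false vector. In that case, the function "f(x) = 1 iff x dominates some true vector" is the smallest such extension. The sketch below illustrates that criterion in Python, assuming vectors are 0/1 tuples. It is not the paper's general algorithm and does not cover the minimum-error variant.

```python
from itertools import product


def leq(a, b):
    """Componentwise a <= b for 0/1 vectors."""
    return all(x <= y for x, y in zip(a, b))


def has_positive_extension(true_vecs, false_vecs):
    """A positive extension exists iff no true vector lies below a false vector."""
    return not any(leq(a, b) for a in true_vecs for b in false_vecs)


def minimal_positive_extension(true_vecs):
    """f(x) = 1 iff x dominates some true vector (the smallest positive extension)."""
    return lambda x: any(leq(a, x) for a in true_vecs)


T = [(1, 1, 0), (0, 1, 1)]
F = [(1, 0, 0), (0, 0, 1)]
print(has_positive_extension(T, F))  # True
f = minimal_positive_extension(T)
print([x for x in product((0, 1), repeat=3) if f(x)])  # [(0, 1, 1), (1, 1, 0), (1, 1, 1)]
```

Any positive extension must output 1 above every true vector, so this f is contained in all of them. That is why a single pairwise check over T × F settles existence.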

5.
In this paper, we present clear and formal definitions of the ranking factors that should be considered in opinion retrieval and propose a new opinion retrieval model that combines these factors simultaneously from a generative modeling perspective. The proposed model formally unifies relevance-based ranking with subjectivity detection at the document level by taking multiple ranking factors into consideration: topical relevance, subjectivity strength, and opinion-topic relatedness. Topical relevance measures how strongly a document relates to a given topic, and subjectivity strength indicates the likelihood that the document contains subjective information. Opinion-topic relatedness reflects whether the subjective information is expressed with respect to the topic of interest. We also show the universality of our model by presenting derivations of it that correspond to other existing opinion retrieval approaches. Experimental results on a large-scale blog retrieval test collection demonstrate not only that each individual ranking factor is necessary in opinion retrieval, but also that the factors cooperate to produce a better document ranking when used together. The retrieval performance of the proposed model is comparable to that of previous systems in the literature.

6.
Some computationally hard problems, e.g., deduction in logical knowledge bases, are such that part of an instance is known well before the rest of it and remains the same for several subsequent instances of the problem. In these cases, it is useful to preprocess this known part off-line so as to simplify the remaining on-line problem. In this paper, we investigate such a technique in the context of intractable, i.e., NP-hard, problems. Recent results in the literature show that not all NP-hard problems behave in the same way: for some of them, preprocessing yields polynomial-time on-line simplified problems (we call them compilable), while for others, compilability would imply consequences that are considered unlikely. Our primary goal is to provide a sound methodology that can be used either to prove or to disprove that a problem is compilable. To this end, we define new models of computation, complexity classes, and reductions. We find complete problems for these classes, “completeness” meaning that they are “the least likely to be compilable.” We also investigate preprocessing that does not yield polynomial-time on-line algorithms but generically “decreases” complexity. This leads us to define “hierarchies of compilability,” which are analogous to the polynomial hierarchy. A detailed comparison of our framework with the idea of “parameterized tractability” shows the differences between the two approaches.

7.
In the context of information retrieval (IR) from text documents, the term weighting scheme (TWS) is a key component of the matching mechanism when using the vector space model. In this paper, we propose a new TWS that is based on computing the average term occurrences of terms in documents and that also uses a discriminative approach based on the document centroid vector to remove less significant weights from the documents. We call our approach Term Frequency With Average Term Occurrence (TF-ATO). An analysis of commonly used document collections shows that test collections are not fully judged, because full judgement is expensive and may be infeasible for large collections. A document collection is fully judged when every document in the collection acts as a relevant document to a specific query or group of queries. The discriminative approach used in our proposal is a heuristic method for improving IR effectiveness and performance, and it has the advantage of not requiring prior knowledge of relevance judgements. We compare the performance of the proposed TF-ATO with the well-known TF-IDF approach and show that TF-ATO yields better effectiveness on both static and dynamic document collections. In addition, this paper investigates the impact of stop-word removal and of our discriminative approach on TF-IDF and TF-ATO. The results show that both stop-word removal and the discriminative approach have a positive effect on both term weighting schemes. More importantly, the proposed discriminative approach is shown to improve IR effectiveness and performance without any information on the relevance judgements for the collection.
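The abstract does not give TF-ATO's exact formula, so TF-ATO is not reproduced here. The TF-IDF baseline it is compared against, with and without stop-word removal, can be sketched as follows. The tokenizer and the small stop-word list are illustrative assumptions. The weight w = tf × log(N / df) is only one common TF-IDF variant.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real systems use much larger lists).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to"}


def tokenize(text, remove_stop_words=True):
    """Lowercase whitespace tokenization, optionally dropping stop words."""
    tokens = text.lower().split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


def tf_idf(docs, remove_stop_words=True):
    """Return one {term: tf * log(N / df)} dict per document."""
    tokenized = [tokenize(d, remove_stop_words) for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    return [
        {t: c * math.log(n / df[t]) for t, c in Counter(toks).items()}
        for toks in tokenized
    ]


docs = ["the cat and the dog", "the cat", "a bird"]
with_removal = tf_idf(docs)
without_removal = tf_idf(docs, remove_stop_words=False)
print(with_removal[0])     # 'cat' gets log(3/2), 'dog' gets log(3)
print(without_removal[0])  # 'the' and 'and' now receive non-zero weights too
```

Without stop-word removal, "the" occurs in only two of the three toy documents, so IDF alone does not suppress it. This is one concrete way stop-word removal can change the resulting weights.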

8.
Ontologies represent domain concepts and relations in the form of a semantic network. Many research works use ontologies in information matchmaking and retrieval, and this trend is further accelerated by the convergence of various information sources supported by ontologies. In this paper, we propose a novel multi-modality ontology model that integrates both low-level image features and high-level text information to represent image contents for image retrieval. By embedding this ontology into an image retrieval system, we are able to realize intelligent image retrieval with high precision. Moreover, because the ontology model is soft-coded, the system is flexible and can easily be extended to larger domains. Our current experiments are conducted on the animal domain of canines. An ontology has been built based on low-level features and domain knowledge of canines, and a prototype retrieval system has been set up to assess performance. We compare our experimental results with those of a traditional text-based image search engine and demonstrate the advantages of our approach.

9.
In this paper, we develop a theoretical understanding of multi-sensory knowledge and user context and their inter-relationships. This understanding is used to develop a generic representation framework for multi-sensory knowledge and context. A representation framework for context can have a significant impact on media applications that dynamically adapt to user needs. There are three key contributions of this work: (a) theoretical analysis, (b) a representation framework and (c) experimental validation. Knowledge is understood to be a dynamic set of multi-sensory facts with three key properties: it is multi-sensory, emergent and dynamic. Context is the dynamic subset of knowledge that affects the communication between entities. We develop a graph-based, multi-relational representation framework for knowledge and model its temporal dynamics using a linear dynamical system. Our approach results in a stable and convergent system. We applied our representation framework to an image retrieval system with a large collection of photographs from everyday events. Our experimental validation, in which retrieval was evaluated against two reference algorithms, indicates that our context-based approach provides significant gains in real-world usage scenarios.
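The abstract does not give the form of its linear dynamical system. As a hedged illustration of what "stable and convergent" means for such a system, the sketch iterates x_{t+1} = A x_t + u and checks a simple sufficient condition for convergence. The state vector x, the coupling matrix A, and the input u are hypothetical stand-ins, for example activations of knowledge nodes and their influence on one another.

```python
def step(A, x, u):
    """One update x_{t+1} = A x_t + u of a discrete-time linear dynamical system."""
    return [sum(a * xj for a, xj in zip(row, x)) + ui for row, ui in zip(A, u)]


def simulate(A, u, x0, tol=1e-10, max_iter=10_000):
    """Iterate until successive states differ by less than tol; return (state, steps)."""
    # A max absolute row sum below 1 bounds the spectral radius below 1,
    # which is a sufficient (not necessary) condition for convergence.
    if max(sum(abs(a) for a in row) for row in A) >= 1:
        raise ValueError("A is not a contraction in the max-row-sum norm")
    x = list(x0)
    for t in range(1, max_iter + 1):
        nxt = step(A, x, u)
        if max(abs(p - q) for p, q in zip(nxt, x)) < tol:
            return nxt, t
        x = nxt
    return x, max_iter


A = [[0.5, 0.2],
     [0.1, 0.4]]
u = [1.0, 1.0]
state, steps = simulate(A, u, x0=[0.0, 0.0])
print(state, steps)  # state approaches the fixed point (I - A)^-1 u
```

Under the contraction condition, the iteration reaches the same fixed point, (I - A)^{-1} u, from any starting state. This is one concrete reading of a "stable and convergent" system.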

10.
Information retrieval in document image databases (cited 2 times: 0 self-citations, 2 by others)
With the rising popularity and importance of document images as an information source, information retrieval in document image databases has become a growing and challenging problem. In this paper, we propose an approach capable of matching partial word images to address two issues in document image retrieval: word spotting and similarity measurement between documents. First, each word image is represented by a primitive string. Then, an inexact string matching technique is used to measure the similarity between the two primitive strings generated from two word images. Based on this similarity, we can estimate how relevant one word image is to another and, thereby, decide whether one is a portion of the other. To deal with various character fonts, we represent a word image with a primitive string that is tolerant to serif and font differences. Using this inexact string matching technique, our method is able to handle the problem of heavily touching characters. Experimental results on a variety of document image databases confirm the feasibility, validity, and efficiency of the proposed approach in document image retrieval.
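The abstract does not name its inexact string matching technique. A common choice for this kind of task is Levenshtein edit distance. Its substring variant (Sellers' algorithm) scores how well one primitive string matches any portion of another, which fits the "is one word a portion of the other" decision. The sketch assumes each primitive is encoded as a single character. The threshold rule in `is_portion` is a hypothetical decision rule, not the paper's.

```python
def edit_distance(a, b):
    """Levenshtein distance between two primitive strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def best_substring_distance(pattern, text):
    """Smallest edit distance between pattern and any substring of text.

    The first row is all zeros (a match may start anywhere in text) and the
    answer is the minimum of the last row (a match may end anywhere).
    """
    prev = [0] * (len(text) + 1)
    for i, cp in enumerate(pattern, 1):
        cur = [i]
        for j, ct in enumerate(text, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cp != ct)))
        prev = cur
    return min(prev)


def is_portion(pattern, text, max_errors=1):
    """Hypothetical rule: pattern is a portion of text if it matches within max_errors."""
    return best_substring_distance(pattern, text) <= max_errors


print(edit_distance("kitten", "sitting"))            # 3
print(best_substring_distance("xbcd", "aabcdee"))  # 1
print(is_portion("xbcd", "aabcdee"))               # True
```

Allowing the match to start and end anywhere in the longer string is what lets a partial word image be spotted inside a longer word.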

11.
12.
By collecting statistics over runtime executions of a program we can answer complex queries, such as “what is the average number of packet retransmissions” in a communication protocol, or “how often does process P1 enter the critical section while process P2 waits” in a mutual exclusion algorithm. We present an extension to linear-time temporal logic that combines the temporal specification with the collection of statistical data. By translating formulas of this language to alternating automata we obtain a simple and efficient query evaluation algorithm. We illustrate our approach with examples and experimental results.
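The paper answers such queries by translating formulas of its extended temporal logic into alternating automata, and that construction is not reproduced here. As a much simpler, hypothetical illustration of the statistic involved, the sketch hand-codes a monitor for the mutual-exclusion query and averages the count over several runs. It assumes each trace is a list of states mapping process names to "idle", "waiting", or "critical".

```python
def count_entries_while_waiting(trace, p="P1", q="P2"):
    """Count steps where process p enters 'critical' while process q is 'waiting'."""
    count = 0
    for before, after in zip(trace, trace[1:]):
        entered = before[p] != "critical" and after[p] == "critical"
        if entered and after[q] == "waiting":
            count += 1
    return count


def average(values):
    """Mean of a list of per-run statistics (0.0 for no runs)."""
    return sum(values) / len(values) if values else 0.0


run = [
    {"P1": "idle", "P2": "idle"},
    {"P1": "critical", "P2": "waiting"},  # counted
    {"P1": "idle", "P2": "critical"},
    {"P1": "critical", "P2": "idle"},     # entry, but P2 is not waiting
    {"P1": "idle", "P2": "waiting"},
    {"P1": "critical", "P2": "waiting"},  # counted
]
print(count_entries_while_waiting(run))  # 2

runs = [run, run[:2], run[:1]]
print(average([count_entries_while_waiting(r) for r in runs]))  # 1.0
```

A hand-written monitor like this must be rewritten for every new query. A logic-based approach instead derives the evaluator automatically from the formula.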

13.
Subspace and similarity metric learning are important issues for image and video analysis in both computer vision and multimedia. Many real-world applications, such as image clustering/labeling and video indexing/retrieval, involve feature space dimensionality reduction as well as feature matching metric learning. However, the loss of information from dimensionality reduction may degrade the accuracy of similarity matching. In practice, these conflicting requirements for feature representation efficiency and similarity matching accuracy need to be addressed appropriately. In the style of “Thinking Globally and Fitting Locally”, we develop Locally Embedded Analysis (LEA) based solutions for visual data clustering and retrieval. LEA reveals the essential low-dimensional manifold structure of the data by preserving the local nearest-neighbor affinity, and allows a linear subspace embedding by solving a graph-embedded eigenvalue decomposition problem. A visual data clustering algorithm, called Locally Embedded Clustering (LEC), and a local similarity metric learning algorithm for robust video retrieval, called Locally Adaptive Retrieval (LAR), are both built on the LEA approach, with variations in local affinity graph modeling. For large-database applications, instead of learning a global metric, we localize the metric learning space with a kd-tree partition to localities identified by the indexing process. Simulation results demonstrate the effectiveness of the proposed solutions in terms of both accuracy and speed.

14.
Defining operational semantics for a process algebra is often based either on labeled transition systems that account for interaction with a context or on the so-called reduction semantics: we assume that we have a representation of the whole system and we compute unlabeled reduction transitions (leading to a distribution over states in the probabilistic case). In this paper we consider mixed models with states where the system is still open (towards interaction with a context) and states where the system is already closed. The idea is that (open) parts of a system “P” can be closed via an operator “PG” that turns already synchronized actions whose “handle” is specified inside “G” into prioritized reduction transitions (and, therefore, states performing them into closed states). We show that we can use the operator “PG” to express multi-level priorities and external probabilistic choices (by assigning weights to handles inside G), and that, by considering reduction transitions as the only unobservable τ transitions, the proposed technique is compatible, for process algebra with general recursion, with both standard (probabilistic) observational congruence and a notion of equivalence which aggregates reduction transitions in a (much more aggregating) trace-based manner. We also observe that the trace-based aggregated transition system can be obtained directly in operational semantics, and we present the “aggregating” semantics. Finally, we discuss how the open/closed approach can also be used to express discrete and continuous (exponential probabilistic) time, and we show that, in such timed contexts, the trace-based equivalence can aggregate more than traditional lumping-based equivalences over Markov Chains.

15.
The bisimulation “up-to-…” technique provides an effective way to reduce the amount of work needed to prove bisimilarity of two processes. This paper develops a fresh and direct approach to generalizing this set-theoretic “up-to-...” principle to the setting of coalgebra theory. The notion of consistent function is introduced as a generalization of Sangiorgi's sound function. Then, to prove that a relation contains only bisimilar pairs, it is sufficient to find a morphism from it to the “lifting” of its image under some consistent function. An example shows that every self-bisimulation in normed BPA is such a relation. Furthermore, we investigate the connection between span-bisimulation and ref-bisimulation. As a result, λ-bisimulation turns out to be covered by our new principle.

16.
We present a novel “dynamic learning” approach that enables an intelligent image database system to improve object segmentation and labeling for object-based indexing automatically, without user intervention, as new examples become available. The proposed approach extends our earlier work on “learning by example,” which addressed labeling of similar objects in a set of database images based on a single example. The proposed dynamic learning procedure uses multiple example object templates to improve the accuracy of existing object segmentations and labels. Multiple example templates may be images of the same object from different viewing angles, or images of related objects. This paper also introduces a new shape similarity metric called normalized area of symmetric differences (NASD), which has desirable properties for use in the proposed “dynamic learning” scheme and is more robust against the boundary noise that results from automatic image segmentation. The performance of the dynamic learning procedures is demonstrated by experimental results.
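The abstract names the normalized area of symmetric differences (NASD) but not its exact normalization. A minimal sketch assumes the two object masks are already aligned (same position and scale) and are stored as sets of pixel coordinates. It divides the symmetric-difference area by the combined area of both masks. That divisor is an illustrative choice, not necessarily the paper's, and `rectangle` is a hypothetical helper for building toy shapes.

```python
def rectangle(top, left, height, width):
    """Pixel set of an axis-aligned rectangle, used here as a toy shape mask."""
    return {(r, c) for r in range(top, top + height) for c in range(left, left + width)}


def nasd(mask_a, mask_b):
    """Area of the symmetric difference, normalized by the total area of both shapes.

    Normalizing by |A| + |B| is an assumption; it maps the value to [0, 1],
    with 0 for identical masks and 1 for disjoint ones.
    """
    total = len(mask_a) + len(mask_b)
    if total == 0:
        return 0.0
    return len(mask_a ^ mask_b) / total


square = rectangle(0, 0, 4, 4)
shifted = rectangle(0, 1, 4, 4)
print(nasd(square, square))   # 0.0
print(nasd(square, shifted))  # 0.25
```

Because the measure counts mismatched area rather than comparing boundary points, a few noisy boundary pixels change it only slightly. This is consistent with the robustness to segmentation noise that the abstract claims.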

17.
With the rapid growth of text documents, document clustering is emerging as a technique for efficient document retrieval and better document browsing. Recently, several methods have been proposed that use frequent itemsets derived from association rule mining to cluster documents, addressing the problems of high dimensionality, scalability, accuracy, and meaningful cluster labels. To improve the quality of document clustering results, we propose an effective Fuzzy Frequent Itemset-based Document Clustering (F2IDC) approach that combines fuzzy association rule mining with the background knowledge embedded in WordNet. A term hierarchy generated from WordNet is applied to discover generalized frequent itemsets as candidate cluster labels for grouping documents. We have conducted experiments to evaluate our approach on the Classic4, Re0, R8, and WebKB datasets. Our experimental results show that the proposed approach provides more accurate clustering results than prior influential clustering methods presented in recent literature.
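The core idea of frequent-itemset-based clustering is to use term sets shared by many documents as candidate cluster labels. A minimal, crisp sketch of that idea follows. It is not F2IDC: it omits the fuzzy membership and the WordNet term hierarchy, mines only 1- and 2-term itemsets Apriori-style, and uses a hypothetical "largest, then best-supported itemset" assignment rule.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(docs, min_support):
    """Frequent 1- and 2-term itemsets; docs is a list of term sets.

    Returns {frozenset(terms): support}, where support is the fraction of
    documents containing every term in the itemset.
    """
    n = len(docs)
    counts = Counter(t for d in docs for t in d)
    frequent_terms = sorted(t for t, c in counts.items() if c / n >= min_support)
    itemsets = {frozenset([t]): counts[t] / n for t in frequent_terms}
    # Apriori property: a pair can only be frequent if both of its terms are.
    for a, b in combinations(frequent_terms, 2):
        support = sum(1 for d in docs if a in d and b in d) / n
        if support >= min_support:
            itemsets[frozenset([a, b])] = support
    return itemsets


def cluster_by_itemsets(docs, itemsets):
    """Assign each document to the largest (then best-supported) itemset it contains."""
    ranked = sorted(itemsets, key=lambda s: (len(s), itemsets[s]), reverse=True)
    clusters = {}
    for i, d in enumerate(docs):
        for s in ranked:
            if s <= d:
                clusters.setdefault(s, []).append(i)
                break
    return clusters


docs = [
    {"fuzzy", "mining", "rules"},
    {"fuzzy", "mining", "wordnet"},
    {"image", "retrieval"},
    {"image", "retrieval", "shape"},
]
itemsets = frequent_itemsets(docs, min_support=0.5)
clusters = cluster_by_itemsets(docs, itemsets)
for label, members in clusters.items():
    print(sorted(label), members)
```

The itemset that defines each cluster doubles as a readable label, for example ["fuzzy", "mining"]. This is the "meaningful cluster labels" benefit the abstract mentions.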

18.
Spectra Laboratories is a specialty laboratory, with the majority of its specimens originating from dialysis centers. In a previous article, “Automating a Dialysis Laboratory” (JALA, Nov. 1998), the unique nature of this patient population and the challenges of automating a dialysis laboratory were discussed. In that article, fibrin clotting was identified as one of the most significant challenges to front-end automation. Because automating specimen handling is difficult when serum samples are frequently received clotted, we focused our efforts on the management of result data and the auto-editing of non-serum-based testing. At the same time, we also reconsidered the handling of serum-based samples, which account for a large proportion of our testing. What follows is our approach to the fibrin problem and to automation, how these problems were addressed, and how “know thine enemy” led to potential solutions to the problem.

19.
In this paper we show that it is possible to model the observable behaviour of coalgebras independently of their internal dynamics, but within the general framework of representing behaviour by a map into a “final” coalgebra. In the first part of the paper we characterise Set-endofunctors F with the property that bisimilarity of elements of F-coalgebras coincides with having the same observable behaviour. We show that such functors have a final coalgebra of a rather simple nature and preserve some weak pullbacks. We also show that this is the case if and only if F-bisimilarity corresponds to logical equivalence in the finitary fragment of coalgebraic logic. In the second part of the paper, we present a construction of a “final” coalgebra that captures the observable behaviour of F-coalgebras. We keep the word “final” in quotation marks since the object we construct need not belong to the original category. The construction is carried out for an arbitrary Set-endofunctor F; throughout the construction we remain in Set, but the price to pay is the introduction of new morphisms. The paper concludes with a hint at a possible application to modelling weak bisimilarity for coalgebras.

20.
Many techniques have been proposed to address the problem of motion capture (mocap) data retrieval using a short motion as input; they are commonly categorized as content-based retrieval. However, users who do not have the equipment to create mocap data samples find it difficult to take advantage of them. In contrast, simple retrieval methods that only require text as input can be used by everyone. Nevertheless, it is not clear how to measure mocap data relevance with respect to textual search queries, and the search results are limited to mocap data samples whose annotations contain the words in the search query. In this paper, the authors propose a novel method that builds on the TF (term frequency) and IDF (inverse document frequency) weights commonly used in text document retrieval to measure mocap data relevance with respect to textual search queries. We extract segments from mocap data samples and regard these segments as words in text documents. However, instead of using IDF, which prioritizes infrequent segments, we opt to use DF (document frequency) to prioritize frequent segments. Since motions are not required as input, everybody will be able to take advantage of our approach, and we believe that our work also opens up possibilities for applying existing text retrieval methods to mocap data retrieval.
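The key change the abstract describes is replacing IDF with DF, so that segments shared by many samples weigh more. A minimal sketch contrasts the two weightings and ranks samples for a query expressed as segment labels. It assumes each mocap sample has already been reduced to a list of discrete segment labels. The segmentation step, the labels such as "walk", the DF normalization by N, and the mapping from query words to segments are all hypothetical.

```python
import math
from collections import Counter


def segment_weights(samples, use_df=True):
    """Weight each motion segment in each sample.

    samples: list of mocap samples, each a list of segment labels.
    use_df=True  -> tf * df / N      (frequent segments are boosted)
    use_df=False -> tf * log(N / df) (classic TF-IDF, rare segments boosted)
    """
    n = len(samples)
    df = Counter(s for sample in samples for s in set(sample))
    weights = []
    for sample in samples:
        tf = Counter(sample)
        if use_df:
            weights.append({s: c * df[s] / n for s, c in tf.items()})
        else:
            weights.append({s: c * math.log(n / df[s]) for s, c in tf.items()})
    return weights


def rank_samples(samples, query_segments):
    """Rank sample indices by the summed TF-DF weight of the query's segments.

    query_segments is hypothetical: it assumes the text query has already been
    mapped to segment labels, a step the abstract does not detail.
    """
    weights = segment_weights(samples, use_df=True)
    scores = [sum(w.get(s, 0.0) for s in query_segments) for w in weights]
    return sorted(range(len(samples)), key=lambda i: scores[i], reverse=True)


samples = [["walk", "walk", "turn"], ["walk", "jump"], ["jump", "sit"]]
tfdf = segment_weights(samples)
tfidf = segment_weights(samples, use_df=False)
print(tfdf[0])   # 'walk' outweighs the rarer 'turn'
print(tfidf[0])  # the opposite under TF-IDF
print(rank_samples(samples, ["walk"]))  # [0, 1, 2]
```

Under DF weighting, a segment that recurs across many samples, like a common walking cycle, dominates the score. Under IDF, the same segment would be treated as uninformative.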


Copyright © Beijing Qinyun Technology Development Co., Ltd. | Beijing ICP Filing No. 09084417