Most of the common techniques in text retrieval are based on the statistical analysis of terms (words or phrases). Statistical analysis of term frequency captures the importance of the term within a document only. Thus, to achieve a more accurate analysis, the underlying model should indicate terms that capture the semantics of text. In this case, the model can capture terms that represent the concepts of the sentence, which leads to discovering the topic of the document. In this paper, a new concept-based retrieval model is introduced. The proposed concept-based retrieval model consists of a conceptual ontological graph (COG) representation and a concept-based weighting scheme. The COG representation captures the semantic structure of each term within a sentence. Then, all the terms are placed in the COG representation according to their contribution to the meaning of the sentence. The concept-based weighting analyzes terms at the sentence and document levels. This is different from the classical approach of analyzing terms at the document level only. The weighted terms are then ranked, and the top concepts are used to build a concept-based document index for text retrieval. The concept-based retrieval model can effectively discriminate between terms that are unimportant with respect to sentence semantics and terms that represent the concepts capturing the sentence meaning. Experiments using the proposed concept-based retrieval model on different data sets in text retrieval are conducted. The experiments compare traditional approaches with the concept-based retrieval model obtained by combining the conceptual ontological graph and the concept-based weighting scheme. The results are evaluated using three quality measures: the preference measure (bpref), precision at 10 documents retrieved (P(10)), and the mean uninterpolated average precision (MAP). All of these quality measures improve when the newly developed concept-based retrieval model is used, confirming that such a model enhances the quality of text retrieval.
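Two of the quality measures named in this abstract, P(10) and MAP, have standard textbook definitions that can be sketched in a few lines. The snippet below is an illustration only, not the authors' evaluation code; the function names and the toy document identifiers are assumptions, and bpref is left out because it additionally requires documents judged non-relevant.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Uninterpolated average precision for a single query."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant document
    return total / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of average precision over (ranking, relevant set) pairs."""
    runs_list: List[Tuple[Sequence[str], Set[str]]] = list(runs)
    return sum(average_precision(r, rel) for r, rel in runs_list) / len(runs_list)


# Toy data (hypothetical document ids).
ranking = ["d1", "d2", "d3", "d4"]
judged_relevant = {"d1", "d3"}
p_at_2 = precision_at_k(ranking, judged_relevant, k=2)
ap = average_precision(ranking, judged_relevant)
```

Dividing by the number of relevant documents, rather than by the number of relevant documents retrieved, penalizes a ranking that misses relevant items entirely.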

Vast amounts of medical information reside within text documents, so the automatic retrieval of such information would benefit clinical activities. The need to overcome the bottleneck caused by the manual construction of ontologies has motivated several studies on semi-automatic methods for building ontologies. Most techniques for learning domain ontologies from free text have important limitations. In particular, they typically extract concepts and produce only taxonomies, although other types of semantic relations are also relevant in knowledge modelling. This paper presents a language-independent approach for extracting knowledge from medical natural language documents. The knowledge is represented by means of ontologies that can have multiple semantic relationships among concepts.

Engineers create engineering documents with their own terminologies, and want to search existing engineering documents quickly and accurately during a product development process. Keyword-based search methods have been widely used due to their ease of use, but their search accuracy has been often problematic because of the semantic ambiguity of terminologies in engineering documents and queries. The semantic ambiguity can be alleviated by using a domain ontology. Also, if queries are expanded to incorporate the engineer’s personalized information needs, the accuracy of the search result would be improved. Therefore, we propose a framework to search engineering documents with less semantic ambiguity and more focus on each engineer’s personalized information needs. The framework includes four processes: (1) developing a domain ontology, (2) indexing engineering documents, (3) learning user profiles, and (4) performing personalized query expansion and retrieval. A domain ontology is developed based on product structure information and engineering documents. Using the domain ontology, terminologies in documents are disambiguated and indexed. Also, a user profile is generated from the domain ontology. By user profile learning, user’s interests are captured from the relevant documents. During a personalized query expansion process, the learned user profile is used to reflect user’s interests. Simultaneously, user’s searching intent, which is implicitly inferred from the user’s task context, is also considered. To retrieve relevant documents, an expanded query in which both user’s interests and intents are reflected is then matched against the document collection. The experimental results show that the proposed approach can substantially outperform both the keyword-based approach and the existing query expansion method in retrieving engineering documents. 
Reflecting a user’s information needs precisely has been identified as the most important factor underlying this notable improvement.

Cross-lingual text retrieval (CLTR) is a technique for locating relevant documents in different languages. The authors have developed fuzzy conceptual indexing (FCI) to extend CLTR to include documents that share concepts but don't contain exact translations of query terms. In FCI, documents and queries are represented as a function of language-independent concepts, thus enabling direct mapping between them across multiple languages. Experimental results suggest that concept-based CLTR outperforms translation-based CLTR in identifying conceptually relevant documents.

With the growing availability of online information systems, a need has arisen for user interfaces that are flexible and easy to use. For such systems, an interface that allows the formulation of approximate queries can be of great utility, since these queries let the user quickly explore the database contents even when the user is unaware of the exact values of the database instances. Our work focuses on this problem, presenting a new model for ranking approximate answers and a new algorithm to compute the semantic similarity between attribute values, based on information retrieval techniques. To demonstrate the usefulness of the approach, we perform a series of usability tests. The results suggest that our approach allows the retrieval of more relevant answers with less effort by the user.

The volume of available information is growing, especially on the web, and in parallel the questions of users are changing and becoming harder to satisfy. Thus there is a need to organize the available information in a meaningful way in order to guide and improve document indexing for information retrieval applications, taking into account more complex data such as semantic relations. In this paper we show that Formal Concept Analysis (FCA) and concept lattices provide suitable and powerful support for such a task. Accordingly, we use FCA to compute a concept lattice, which is considered both a semantic index to organize documents and a search space to model terms. We introduce the notions of cousin concepts and classification-based reasoning for navigating the concept lattice and retrieving relevant information based on the content of concepts. Finally, we detail a real-world experiment and show that the present approach has very good capabilities for semantic indexing and document retrieval.
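To make the notion of a concept lattice concrete, the sketch below enumerates the formal concepts (extent, intent pairs) of a tiny document-term context by brute force. It illustrates the general FCA definition only; it is not the authors' algorithm and does not cover cousin concepts or classification-based reasoning. The context, the document names, and the function names are hypothetical, and enumerating every subset of documents is practical only for toy inputs.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Context = Dict[str, Set[str]]  # document -> set of index terms


def common_attributes(objects: Iterable[str], context: Context) -> FrozenSet[str]:
    """Terms shared by every given document (all terms if none are given)."""
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return frozenset(shared)


def common_objects(attrs: FrozenSet[str], context: Context) -> FrozenSet[str]:
    """Documents that contain every term in `attrs`."""
    return frozenset(obj for obj, terms in context.items() if attrs <= terms)


def formal_concepts(context: Context) -> Set[Tuple[FrozenSet[str], FrozenSet[str]]]:
    """All (extent, intent) pairs, found by closing every subset of documents."""
    concepts = set()
    docs = list(context)
    for size in range(len(docs) + 1):
        for subset in combinations(docs, size):
            intent = common_attributes(subset, context)
            extent = common_objects(intent, context)
            concepts.add((extent, intent))
    return concepts


# Hypothetical toy context.
toy_context: Context = {
    "doc1": {"retrieval", "ontology"},
    "doc2": {"retrieval", "lattice"},
    "doc3": {"retrieval"},
}
toy_concepts = formal_concepts(toy_context)
```

Ordering these concepts by inclusion of their extents yields the lattice: the concept whose intent is `{"retrieval"}` covers all three documents and sits above the two more specific concepts.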

A technology is described for automatically assembling large software libraries that promote software reuse by helping users locate the components closest to their needs. Software libraries are automatically assembled from a set of unorganized components by using information retrieval techniques. The construction of the library is done in two steps. First, attributes are automatically extracted from natural language documentation by using an indexing scheme based on the notions of lexical affinities and quantity of information. Then a hierarchy for browsing is automatically generated using a clustering technique which draws only on the information provided by the attributes. Due to the free-text indexing scheme, tools following this approach can accept free-style natural language queries.

This paper provides a formal specification for concept-based image retrieval using triples. To effectively manage a vast number of images, we may need an image retrieval system capable of indexing and searching images based on the characteristics of their content. However, such a content-based image retrieval technique alone may not satisfy user queries if retrieved images turn out to be relevant only when they are conceptually related to the queries. In this paper, we develop an image retrieval mechanism to extract the semantics of images based on triples. The semantics of an image can be captured by deriving concepts from its constituent objects and the spatial relationships between them. The concepts are basically composite objects formed from the aggregation of the constituents. In our mechanism, all the spatial relationships between objects, including the concepts, are uniformly represented by triples, which are used for indexing images as well as capturing their semantics. We also develop a query evaluation method for supporting concept-based image retrieval. ©1999 John Wiley & Sons, Inc.

The increasing importance of text-based information retrieval (IR) developments in the architecture, engineering, and construction (AEC) industries and the lack of sharable testing resources to support these developments call for an approach that can be used to generate domain-specific reference collections. To address this need, the authors investigated the characteristics of the testing environment in AEC and ways to adapt dominant collection preparation methods for the domain. This paper presents the authors’ collection generation approach through the preparation process of the Taiwanese National Center for Research on Earthquake Engineering (NCREE) collection. The collection’s Chinese-to-English translation instruments are also discussed, as matching semantic/linguistic resources are highly valued in AEC’s text-based IR developments. The paper also includes a use case for the NCREE collection to show how a collection generated by the proposed approach could be applied to support research experiments and validation. The direct outputs, the NCREE collection and its translation instruments, are sharable and reusable testing resources, while mechanisms for seeking collections from other researchers are part of the extended research endeavors.

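Starting from feature selection, local region partitioning, and the computation of lexical semantic similarity, this work uses a random terms iterative model (RTIM) to classify massive volumes of point-of-interest (POI) text. Feature terms are selected using term frequency, concentration, and dispersion measures. Local regions are partitioned according to the similarity between a text and each POI category. Within each local region, a term-frequency vector is built from the order in which terms appear in the text, and a feature mapping matrix is obtained by randomly deleting and reconstructing the term frequencies in that vector. The feature mapping matrix then converts each text into a feature vector, and an SVM classifier performs the POI text classification. Experiments show that the method effectively improves the accuracy and coverage of POI text classification.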

Nowadays, spatial and temporal data play an important role in social networks. These data are distributed and dispersed across several heterogeneous data sources. These peculiarities make geographic information retrieval a non-trivial task, considering that the spatial data are often unstructured and built by different collaborative communities from social networks. The problem arises when user queries are performed at different levels of semantic granularity. This is very typical in social communities, where users have different levels of expertise. In this paper, a novel approach based on three query-matching layers driven by ontologies on the heterogeneous data sources is presented. A query contextualization technique is proposed for addressing the available heterogeneous data sources, including social networks. It contextualizes a query so that, if one data source does not contain a relevant result, other sources either provide an answer or, in the best case, each adds a relevant answer to the set of results. This approach is a collaborative learning system based on the experience level of users in different domains. The retrieval process draws on three domains: temporal, geographical, and social, which are involved in the user-content context. The work is oriented towards defining GIScience collaborative learning for geographic information retrieval, using social networks, the web, and geodatabases.

Mapping medical concepts from a terminology system to the concepts in the narrative text of a medical document is necessary to provide semantically accurate information for further processing steps. The MetaMap Transfer (MMTx) program is a semantic annotation system that generates a rough mapping of concepts from the Unified Medical Language System (UMLS) Metathesaurus to free medical text, but this mapping still contains erroneous and ambiguous information. Since manually correcting the mapping is an extremely cumbersome and time-consuming task, we have developed the MapFace editor. The editor provides a convenient way of navigating the annotated information gained from the MMTx output and enables users to correct this information on both a conceptual and a syntactical level, which greatly facilitates working with the MMTx program. Additionally, the editor provides enhanced visualization features to support the correct interpretation of medical concepts within the text. We paid special attention to ensuring that the MapFace editor is an intuitive and convenient tool to work with. Therefore, we recently conducted a usability study to establish a well-founded starting point for further improvement of the editor's usability.

The World Wide Web is a world of great richness, but finding information on the Web is also a great challenge. Keyword-based querying has been an immediate and efficient way to specify and retrieve the information that the user seeks. However, conventional document ranking based on an automatic assessment of document relevance to the query may not be the best approach when little information is given, as in most cases. In order to clarify the ambiguity of the short queries given by users, we propose the idea of concept-based relevance feedback for Web information retrieval. The idea is to have users give two to three times more feedback in the same amount of time that conventional feedback mechanisms would require. Under this design principle, we apply clustering techniques to the initial search results to provide concept-based browsing. We show the performance of various feedback interface designs and compare their pros and cons. We measure precision and relative recall to show how clustering improves performance over conventional similarity ranking and, most importantly, we show how the assistance of concept-based presentation reduces browsing labor.

In this paper, we present a logical representation for form documents to be used for identification and retrieval. A hierarchical structure is proposed to represent the structure of a form by using lines and the XY-tree approach. The approach is top-down, and no domain knowledge such as the preprinted data or filled-in data is used. Geometrical modifications and slight variations are handled by this representation. Logically identical forms are associated with the same or a similar hierarchical structure. Identification and retrieval of similar forms are performed by computing the edit distances between the generated trees. Received: August 21, 2001 / Accepted: November 5, 2001

Since engineering design is heavily informational, engineers want to retrieve existing engineering documents accurately during the product development process. However, engineers have difficulties searching for documents because of low retrieval accuracy. One of the reasons for this is the limitation of existing document ranking approaches, in which relationships between terms in documents are not considered when assessing the relevance of the retrieved documents. Therefore, we propose a new ranking approach that provides a more accurate evaluation of document relevance to a given query. Our approach exploits a domain ontology to consider relationships among terms in the relevance scoring process. Based on the domain ontology, the semantics of a document are represented by a graph (called a Document Semantic Network), and the proposed relation-based weighting schemes are then used to evaluate the graph and calculate the document relevance score. In our ranking approach, user interests and searching intent are also considered in order to provide personalized services. The experimental results show that the proposed approach outperforms existing ranking approaches. The precise representation of document semantics as a graph and the multiple relation-based weighting schemes are important factors underlying the notable improvement.

In this paper, a content-based image retrieval method that can search large image databases efficiently by color, texture, and shape content is proposed. Quantized RGB histograms and the dominant triple (hue, saturation, and value), which are extracted from the quantized HSV joint histogram in the local image region, are used to represent global and local color information in the image. Entropy and the maximum entry from co-occurrence matrices are used for texture information, and an edge angle histogram is used to represent shape information. A relevance feedback approach that combines the proposed features is used to obtain better retrieval accuracy. A new indexing method that supports fast retrieval in large image databases is also presented. Tree structures constructed by the k-means algorithm, along with the triangle inequality, eliminate candidate images from the similarity calculation between the query image and each database image. We find that the proposed method eliminates, on average, up to 92.2 percent of the images from direct comparison.
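The pruning idea mentioned in this abstract rests on the triangle inequality: for a query q, a cluster centre c, and a stored vector x, d(q, x) >= |d(q, c) - d(x, c)|, so x can be skipped whenever that lower bound already exceeds the best distance found so far. The sketch below shows this with a single flat level of clusters and Euclidean distance. It is a simplification under stated assumptions, not the paper's tree-structured index; the centroids are supplied by hand rather than computed by k-means, and all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]
Cluster = Tuple[Vector, List[Tuple[Vector, float]]]  # (centroid, [(vector, dist to centroid)])


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_clusters(centroids: List[Vector], vectors: List[Vector]) -> List[Cluster]:
    """Assign each vector to its nearest centroid and cache that distance."""
    clusters: List[Cluster] = [(c, []) for c in centroids]
    for vec in vectors:
        dists = [euclidean(vec, c) for c in centroids]
        nearest = dists.index(min(dists))
        clusters[nearest][1].append((vec, dists[nearest]))
    return clusters


def nearest_with_pruning(query: Vector, clusters: List[Cluster]) -> Tuple[Vector, float, int]:
    """Return (nearest vector, its distance, number of full distance computations)."""
    best_vec: Vector = ()
    best_dist = math.inf
    comparisons = 0
    for centroid, members in clusters:
        d_qc = euclidean(query, centroid)
        for vec, d_xc in members:
            # Lower bound on d(query, vec) from the triangle inequality.
            if abs(d_qc - d_xc) >= best_dist:
                continue  # cannot beat the current best, so skip it
            dist = euclidean(query, vec)
            comparisons += 1
            if dist < best_dist:
                best_vec, best_dist = vec, dist
    return best_vec, best_dist, comparisons


# Hypothetical feature vectors and hand-picked centroids.
data = [(0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
index = build_clusters([(0.0, 0.0), (10.0, 10.0)], data)
found, found_dist, n_compared = nearest_with_pruning((0.1, 0.9), index)
```

In this toy run the two vectors in the distant cluster are skipped without computing their distance to the query, which is the effect the abstract reports at database scale.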

An information retrieval system can help users retrieve documents relevant to their queries. In recent years, some researchers have used averaging operators (i.e., Infinite–One operators, Waller–Kraft operators, P-Norm operators and GMA operators) to handle the “AND” and “OR” operations of users’ fuzzy queries for fuzzy information retrieval, but these operators still have some drawbacks; for example, query results sometimes do not coincide with human intuition. In this paper, we present new averaging operators, called weighted power-mean averaging (WPMA) operators, based on the weighted power mean, for fuzzy information retrieval that overcome the drawbacks of the existing methods. Furthermore, we extend the proposed WPMA operators into extended WPMA operators to handle weighted fuzzy queries for fuzzy information retrieval. The proposed WPMA operators are more flexible and more intelligent than the existing averaging operators in handling users’ fuzzy queries for fuzzy information retrieval.
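The abstract does not give the definition of the WPMA operators, but the quantity they are built on, the weighted power mean, is standard: M_p(x; w) = (sum_i w_i x_i^p)^(1/p) with weights normalized to sum to 1, and the weighted geometric mean as the limit at p = 0. The sketch below computes only that generic mean for membership degrees in [0, 1]. How the paper maps the exponent to "AND" and "OR" is not stated in the abstract, so the function name, the parameter choices, and the example values are assumptions.

```python
import math
from typing import Sequence


def weighted_power_mean(values: Sequence[float], weights: Sequence[float], p: float) -> float:
    """Generic weighted power mean of non-negative values (not the paper's exact operator)."""
    # Ignore terms with zero weight, then normalize the remaining weights.
    pairs = [(v, w) for v, w in zip(values, weights) if w > 0]
    total = sum(w for _, w in pairs)
    pairs = [(v, w / total) for v, w in pairs]
    if p <= 0 and any(v == 0 for v, _ in pairs):
        return 0.0  # limiting value when any weighted term is zero
    if p == 0:
        # Limit p -> 0 is the weighted geometric mean.
        return math.exp(sum(w * math.log(v) for v, w in pairs))
    return sum(w * v ** p for v, w in pairs) ** (1.0 / p)


# Hypothetical degrees to which one document matches two query terms.
degrees = [0.2, 0.8]
equal = [1.0, 1.0]
harmonic = weighted_power_mean(degrees, equal, -1)
geometric = weighted_power_mean(degrees, equal, 0)
arithmetic = weighted_power_mean(degrees, equal, 1)
quadratic = weighted_power_mean(degrees, equal, 2)
```

The mean is non-decreasing in p: it tends to the smallest value as p goes to negative infinity and to the largest as p goes to positive infinity, which is why a single exponent can move an operator between strict conjunction-like and disjunction-like behavior.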
