Similar literature
Found 20 similar articles; search took 15 ms.
1.
Computing semantic similarity/relatedness between concepts and words is an important issue in many research fields. Information theoretic approaches exploit the notion of Information Content (IC), which provides a better understanding of a concept's semantics. In this paper, we present a complete survey of IC metrics with a critical study. We then propose a new intrinsic IC computing method that uses taxonomical features extracted from an ontology for a particular concept. This approach quantifies the subgraph formed by the concept's subsumers, using the depth and the descendant count as taxonomical parameters. In a second part, we integrate this IC metric into a new parameterized multistrategy approach for measuring word semantic relatedness. This measure exploits WordNet features such as the noun “is a” taxonomy, the nominalization relation (which allows the use of the verb “is a” taxonomy), and the shared words (overlaps) in glosses. Our work has been evaluated and compared with related works using a wide set of benchmarks designed for word semantic similarity/relatedness tasks. The results show that our IC method and the new relatedness measure correlate better with human judgments than related works.
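The idea of an intrinsic IC, computed from the taxonomy alone rather than from corpus counts, can be illustrated with a small sketch. The abstract does not give the proposed formula, so the code below swaps in a simpler, well-known descendant-count formulation (in the style of Seco et al.) and does not use depth; the toy taxonomy and all function names are hypothetical.

```python
import math

# Hypothetical toy "is a" taxonomy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "oak": "plant",
}

def subsumers(concept):
    """Return the concept itself plus all of its ancestors, nearest first."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain

def descendant_count(concept):
    """Count the concepts that lie strictly below `concept`."""
    return sum(1 for c in PARENT if c != concept and concept in subsumers(c))

def intrinsic_ic(concept):
    """Descendant-based intrinsic IC: 1 - log(descendants + 1) / log(total concepts)."""
    return 1.0 - math.log(descendant_count(concept) + 1) / math.log(len(PARENT))

def resnik_similarity(a, b):
    """IC of the most informative subsumer shared by a and b."""
    shared = set(subsumers(a)) & set(subsumers(b))
    return max(intrinsic_ic(c) for c in shared)
```

In this sketch leaves such as "dog" get the maximum IC and the root gets zero, so `resnik_similarity("dog", "cat")` is higher than `resnik_similarity("dog", "oak")`, whose only shared subsumer is the root.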

2.
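This paper proposes a new method that uses a cascade model to compute the similarity between concepts in an ontology. In the first stage of the model, a distance-based semantic similarity method computes the path score of a concept pair in the ontology. In the second stage, an IC (Information Content) algorithm computes the similarity score of the concept pair more precisely, and the algorithm is extended using the set of common descendants of the concepts. In the third stage, a feature-integration strategy builds all the similarity scores into a feature vector that describes the concept pair, with weights balancing the similarity scores from the first and second stages. Finally, a BP neural network determines the similarity of the two concepts. The new semantic similarity algorithm is evaluated and compared with existing methods. Experimental results show that the method effectively improves the accuracy and soundness of similarity computation.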

3.
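Research on Concept Semantic Similarity Computation Based on Domain Ontology (cited by 13 in total: 4 self-citations, 9 citations by others)
By studying three traditional models for computing concept semantic similarity with reference to a domain ontology, and in view of the strengths and weaknesses of these three models and the distinctive properties of domain ontologies, this paper proposes an improved domain-ontology-based model for computing concept semantic similarity. Experimental results show that, by quantitatively analyzing the similarity between the concepts and properties described by ontology terms, the model can guide concept-set expansion and result ranking in semantic queries based on domain knowledge ontologies, and provides an effective quantification of the semantic relations between concepts.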

4.
The quantification of the semantic similarity between terms is an important research area that provides a valuable tool for text understanding. Among the different paradigms used by related works to compute semantic similarity, information theoretic approaches have shown promising results in recent years by computing the information content (IC) of concepts from the knowledge provided by ontologies. These approaches, however, are hampered by the coverage offered by the single input ontology. In this paper, we propose extending IC-based similarity measures by considering multiple ontologies in an integrated way. Several strategies are proposed according to which ontology the evaluated terms belong to. Our proposal has been evaluated on a widely used benchmark of medical terms, with MeSH and SNOMED CT as ontologies. Results show an improvement in the similarity assessment accuracy when multiple ontologies are considered.

5.
Computation of semantic similarity between concepts is a very common problem in many language related tasks and knowledge domains. In the biomedical field, several approaches have been developed to deal with this issue by exploiting the structured knowledge available in domain ontologies (such as SNOMED-CT or MeSH) and specific, closed and reliable corpora (such as clinical data). However, in recent years, the enormous growth of the Web has motivated researchers to start using it as the corpus to assist semantic analysis of language. This paper proposes and evaluates the use of the Web as a background corpus for measuring the similarity of biomedical concepts. Several ontology-based similarity measures have been studied and tested using a benchmark composed of biomedical terms, comparing the results obtained when applying them to the Web against approaches in which specific clinical data were used. Results show that the similarity values obtained from the Web for ontology-based measures are at least as reliable as, and even more reliable than, those obtained from specific clinical data, showing the suitability of the Web as an information corpus for the biomedical domain.

6.
In recent studies, ontology related concepts have been introduced into the FIPA ACL content language to convey information for agent communication. However, these works have only applied ontology-based knowledge representation in communication messages and then demonstrated the advantage of this association. In fact, although an ontology can represent the semantic implications needed for decidable reasoning support, it has no mechanism for defining complex rule-based representations to support inference. The motivation of this study is to address this issue by developing a semantic-based infrastructure to integrate Semantic Web technologies into ACL message contents. This semantic-based infrastructure defines two different semantic frameworks: the three-tier knowledge representation framework for message content and the Multi-layer Ontology Architecture for content language. The former is developed based on the Semantic Web stack to support ontology-based reasoning and rule-based inference. The latter is adopted to develop a Lightweight Ontology-based Content Language (LOCL) to describe agent communication messages in an unambiguous and computer-interpretable way. The Jena reasoner is used in an application scenario that exploits agent communication with LOCL as content language, OWL as ontology language, and SWRL as rule language to demonstrate the feasibility of the proposed infrastructure.

7.
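Traditional retrieval techniques based on keyword processing miss a large amount of content that is related to or synonymous with the query concept. Building on ontologies, this paper focuses on semantic similarity algorithms and the corresponding semantic expansion algorithms, and then applies the resulting model to information retrieval for digital journals in order to improve precision and recall.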

8.
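Ontology-based concept semantic similarity has been widely applied in many areas of information science in recent years, and its computation methods have attracted the attention of many researchers. After analyzing the working principles, strengths, and weaknesses of existing ontology-based methods for computing concept semantic similarity, this paper proposes an algorithm that combines, through weighting, the degree of overlap of the paths shared by two concepts and the depth of their lowest common ancestor node. The algorithm is flexible, simple, and highly extensible, and can be applied to different types of ontologies. Experiments on partial data from the Gene Ontology and the Plant Ontology, including a comparison with two existing algorithms, confirm the correctness and effectiveness of the proposed method.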
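A weighted combination of shared-path overlap and lowest-common-ancestor depth might look roughly like the sketch below. The abstract does not state the actual formula, so the overlap definition (shared path nodes over all path nodes), the depth normalization, the weight `alpha`, and the toy tree-shaped ontology are all assumptions made for illustration.

```python
# Hypothetical toy ontology: child -> parent (None marks the root).
PARENT = {
    "root": None,
    "process": "root",
    "metabolism": "process",
    "growth": "process",
    "lipid_metabolism": "metabolism",
    "sugar_metabolism": "metabolism",
}

def path_to_root(concept):
    """Return the list of concepts from `concept` up to the root, inclusive."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path

def weighted_similarity(a, b, alpha=0.5):
    """Blend shared-path overlap with the relative depth of the lowest common ancestor."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    shared = set(path_a) & set(path_b)
    # Assumed overlap measure: shared path nodes divided by all path nodes.
    overlap = len(shared) / len(set(path_a) | set(path_b))
    # In a tree the shared ancestors form one chain ending at the root, so
    # their count minus one is the depth of the lowest common ancestor.
    lca_depth = len(shared) - 1
    max_depth = max(len(path_to_root(c)) for c in PARENT) - 1
    return alpha * overlap + (1 - alpha) * lca_depth / max_depth
```

With these assumptions, two sibling leaves under "metabolism" score higher than a leaf paired with "growth", because both their path overlap and their common-ancestor depth are larger. The weight `alpha` would need tuning per ontology.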

9.
The estimation of semantic similarity between words is an important task in many language related applications. In the past, several approaches have been proposed to assess similarity by evaluating the knowledge modelled in an ontology. However, in many domains, knowledge is dispersed through several partial and/or overlapping ontologies. Because most previous works on semantic similarity only support a single input ontology, we propose a method to enable similarity estimation across multiple ontologies. Our method identifies different cases according to which ontology or ontologies the input terms belong to. We propose several heuristics to deal with each case, aiming to fill in missing values when only partial knowledge is available, and to capture the strongest semantic evidence, which results in the most accurate similarity assessment, when dealing with overlapping knowledge. We evaluate and compare our method using several general purpose and biomedical benchmarks of word pairs whose similarity has been assessed by human experts, and several general purpose (WordNet) and biomedical (SNOMED CT and MeSH) ontologies. Results show that our method improves the accuracy of similarity estimation in comparison to single ontology approaches and to state of the art related works in multi-ontology similarity assessment.

10.
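In information retrieval research, how well resources match query terms determines retrieval quality. The results returned by existing retrieval methods contain too much irrelevant information and do not satisfy user needs well. In view of the problems of traditional information retrieval and the characteristics of current semantic query expansion methods, and based on an analysis of these methods and related research, this paper proposes an improved semantic query expansion method based on a domain ontology. The method uses an ontology model and concept similarity computation to build and expand a retrieval intent tree for the query, and then searches for resources in the resource ontology along shortest paths. Experimental results show that the method yields better retrieval results than other query expansion methods.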

11.
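The core of ontology mapping lies in the semantic similarity algorithm, and a single concept similarity computation method is often insufficient to improve similarity accuracy. For the mechanical parts domain ontology (MPO), this paper proposes OWSTS, a semantic similarity algorithm based on an ontology weighted tree. MPO is used to extract the core concepts from the title information of domain knowledge documents, and the OWSTS algorithm is then used to determine the degree of semantic association between document information and the query. The method has been applied with good results in the GB_MPO intelligent information retrieval system. Experiments show that it considerably improves retrieval performance compared with TF*IDF-based information retrieval.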

12.
In many research fields such as Psychology, Linguistics, Cognitive Science and Artificial Intelligence, computing semantic similarity between words is an important issue. This paper presents a new semantic similarity metric that exploits some notions of the feature-based theory of similarity and translates them into the information theoretic domain, which leverages the notion of Information Content (IC). In particular, the proposed metric exploits the notion of intrinsic IC, which quantifies IC values by scrutinizing how concepts are arranged in an ontological structure. In order to evaluate this metric, an online experiment asking the community of researchers to rank a list of 65 word pairs has been conducted. The experiment’s web setup made it possible to collect 101 similarity ratings and to differentiate native and non-native English speakers. Such a large and diverse dataset makes it possible to evaluate similarity metrics with confidence by correlating them with human assessments. Experimental evaluations using WordNet indicate that the proposed metric, coupled with the notion of intrinsic IC, yields results above the state of the art. Moreover, the intrinsic IC formulation also improves the accuracy of other IC-based metrics. In order to investigate the generality of both the intrinsic IC formulation and the proposed similarity metric, a further evaluation using the MeSH biomedical ontology has been performed. Even in this case significant results were obtained. The proposed metric and several others have been implemented in the Java WordNet Similarity Library.

13.
As a valuable tool for text understanding, semantic similarity measurement enables discriminative semantic-based applications in the fields of natural language processing, information retrieval, computational linguistics and artificial intelligence. Most existing studies have used structured taxonomies such as WordNet to explore lexical semantic relationships; however, improving computation accuracy remains a challenge for them. To address this problem, we propose CSSM-ICSP, a hybrid WordNet-based approach to measuring concept semantic similarity, which leverages the information content (IC) of concepts to weight the shortest path distance between concepts. To improve the performance of IC computation, we also develop a novel model of the intrinsic IC of concepts, in which a variety of semantic properties involved in the structure of WordNet are taken into consideration. In addition, we summarize and classify the technical characteristics of previous WordNet-based approaches, and evaluate our approach against them on various benchmarks. The experimental results of the proposed approach are more strongly correlated with human judgments of similarity in terms of the correlation coefficient, which indicates that our IC model and similarity detection approach are comparable to or even better than others for semantic similarity measurement.

14.
In this paper we present, step by step, an enhanced multi-modality ontology-based approach for web image retrieval. Several ontology-based approaches have been proposed in the field of multimedia retrieval. Our multi-modality approach is one of the earliest attempts to integrate information from different modalities and apply the model in a complex domain. In order to develop the model, we need to answer the following questions: (1) how to find the proper structure and construct an ontology which can integrate information from different modalities; (2) how to quantify the matching degree (concept similarity) and provide an independent ranking mechanism; (3) how to ensure the scalability of this approach when applied to large domains. The first question has been answered by our multi-modality ontology, which has been discussed in Wang et al. (Does ontology help in image retrieval? In: Asia-Pacific workshop on visual information processing, 2006) and its extension (Wang et al., Does ontology help in image retrieval? A comparison between keyword, text ontology and multi-modality ontology approaches, ACM Press, New York, NY, USA, pp 109–112, 2006). More details about this work are given later. The main focus of this paper is a new ranking mechanism that uses Spearman’s rank correlation to measure the similarity of concepts in the ontology. We take the priorities of information from different modalities into consideration. This algorithm answers the second question. The semantic matchmaking result is quantized and the degree of similarity between concepts is calculated. For the third question, importing ontologies resolves the scalability issue, but computing concept similarity and identifying relationships when integrating different ontologies is beyond the scope of this paper. To convince readers that our multi-modality ontology and concept similarity ranking are the right step forward, we decided to work on the animal kingdom.
We believe this domain is challenging, as the images depict animals in a wide range of aspects, poses, configurations and appearances. We experimented with a data set of 4,000 web images. Based on ground truth, we analyze the image content and text information, build the enhanced multi-modality ontology and compare the retrieval results. Results show that we can even classify close animal species which share similar appearances, and that we can infer their hidden relationships from the canine family graph. By assigning a ranking to the semantic relationships, we show unequivocal evidence that our improved model achieves good accuracy and produces results comparable to the Google re-ranking results in our previous work.
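Spearman's rank correlation, which the abstract names as the basis of its ranking mechanism, can be computed as in the minimal sketch below. How the paper maps concepts and modality priorities to the two ranked lists is not stated, so the example treats them as two hypothetical score vectors; the function names are illustrative, and the closed-form formula used here assumes there are no tied values.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(xs, ys):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1)), valid without ties."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least two items")
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Hypothetical per-feature scores for two concepts.
concept_a = [0.9, 0.4, 0.7, 0.1]
concept_b = [0.8, 0.5, 0.6, 0.2]
rho = spearman_rho(concept_a, concept_b)
```

Here the two score vectors order their features identically, so `rho` is 1.0; fully reversed orderings give -1.0. Data with ties would need average ranks and the general Pearson-on-ranks form instead.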

15.
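An Ontology-Based Concept Similarity Computation Method and Its Application (cited by 2 in total: 0 self-citations, 2 citations by others)
The study of semantic similarity between concepts is an important topic in knowledge representation and information retrieval. This paper proposes a comprehensive concept similarity computation method based on semantic similarity and relatedness. It takes semantic distance and the characteristics of the ontology base into account, and adds the auxiliary influence of information overlap between concepts, concept depth, concept density, and an asymmetry factor. In an experimental comparison with two traditional semantic similarity methods, the proposed method better distinguishes concept pairs with different relations in the ontology tree, which verifies its effectiveness.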

16.
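An Optimization Algorithm for Semantic Similarity Computation Targeting the Edge-Counting Method (cited by 1 in total: 0 self-citations, 1 citation by others)
Computing concept semantic similarity is a problem common to many applications. Starting from the goal of simplifying concept semantic similarity computation within a single ontology, this paper proposes an optimization algorithm for similarity computation targeting the edge-counting method. It uses the hierarchical relations between ontology concepts to optimize the computation: starting from the semantic similarity of one pair of concepts in the ontology, it derives the semantic similarity between all concepts in the ontology. Simulation experiments show that the optimization effectively reduces the complexity of semantic similarity computation, roughly doubling the computation speed.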
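For orientation, the plain edge-counting baseline that such optimizations start from can be sketched as follows. This is not the paper's optimization, whose details the abstract does not give; the toy taxonomy, the cached `depth` helper, and the `1 / (1 + distance)` similarity are assumptions chosen for illustration.

```python
from functools import lru_cache

# Hypothetical toy single-root taxonomy: child -> parent (None marks the root).
PARENT = {
    "vehicle": None,
    "car": "vehicle",
    "bike": "vehicle",
    "sedan": "car",
}

@lru_cache(maxsize=None)
def depth(concept):
    """Number of edges from the root to `concept`, cached so each is computed once."""
    parent = PARENT[concept]
    return 0 if parent is None else depth(parent) + 1

def edge_distance(a, b):
    """Count the edges on the tree path between a and b."""
    steps = 0
    # Lift whichever concept is deeper until both meet at their lowest common ancestor.
    while a != b:
        if depth(a) >= depth(b):
            a = PARENT[a]
        else:
            b = PARENT[b]
        steps += 1
    return steps

def edge_similarity(a, b):
    """Map edge distance into (0, 1]; identical concepts score 1.0."""
    return 1.0 / (1.0 + edge_distance(a, b))
```

Caching depths is one simple way to reuse hierarchy information across many concept pairs; for example, `edge_distance("sedan", "bike")` walks three edges, so the similarity is 0.25.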

17.
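To address the lack of research on information-content-based algorithms among current methods for Chinese word semantic similarity, this paper studies the use of an information-content-based Chinese word similarity algorithm on the HowNet information model. Because HowNet represents knowledge with semantic expressions and lacks a complete concept structure, abstract concepts are extracted from HowNet semantic expressions and combined with the original HowNet sememe tree to build a HowNet sense network with multiple inheritance, which serves as the ontology for information-content-based computation. Based on this sense network, the information-content-based word similarity algorithm is improved and a new method for computing information content is proposed. Tests on the Miller & Charles (MC30) benchmark verify the feasibility of the information-content-based approach for computing Chinese semantic similarity and also demonstrate the soundness of the proposed computation strategy and improved algorithm.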

18.
19.
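A Semantic Similarity Algorithm for Concepts Based on Ontology Structure (cited by 2 in total: 0 self-citations, 2 citations by others)
In view of the structural characteristics of ontology models, this paper analyzes the computation of ontology concept similarity in terms of the width, depth, and density between concepts in the model, and merges these into a structural factor. Considering this together with other factors that influence similarity, such as semantic overlap and semantic distance, it proposes an algorithm for computing the semantic similarity between concepts based on ontology structure. By building an ontology model and conducting experimental analysis, it summarizes how each structural factor of the ontology affects the semantic similarity of ontology concepts.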

20.
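The continuing growth of video data and increasingly complex user requirements for video retrieval have made video semantic information modeling and high-level semantic concept extraction important components of video retrieval. This paper proposes an ontology-based method for detecting semantic concepts in video. It uses a Bayesian network to construct a detection ontology of the semantic relations among concepts in video, establishes the hierarchical relations among those concepts, and can detect compound semantic concepts through inference. The method analyzes video content from the perspective of semantic informatics, reduces the impact of the semantic gap to some extent, and achieves good query results.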


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号