Similar Documents
20 similar documents found (search time: 69 ms)
1.
This paper proposes an algorithm for computing the semantic similarity between concepts in WordNet. The algorithm considers both the information content (IC) of the concepts and their distance in WordNet's is_a taxonomy tree, thereby improving performance. A new method for computing a concept's IC value is also given: by taking into account the number of a concept's child nodes and its depth in the WordNet taxonomy tree, the computed values become more precise. Comparison with five other semantic similarity algorithms shows that the proposed algorithm yields more accurate similarity values.

2.
A new information content (IC) model for computing the semantic similarity of concepts in WordNet is presented. Based on WordNet's is_a relation, the model derives the IC value of every concept from WordNet's structure alone, without requiring an external corpus. It considers not only the number of child nodes each concept subsumes but also the concept's depth in the WordNet taxonomy tree, making the IC values more precise. Experiments show that plugging this model into several similarity algorithms markedly improves their performance.
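The structural IC idea in this abstract (child count plus taxonomy depth, no corpus needed) can be roughly sketched as follows. The toy is_a tree, the Seco-style hyponym term, and the blend weight k are illustrative assumptions, not the paper's exact model.

```python
import math

# Toy is_a hierarchy standing in for WordNet's noun taxonomy: child -> parent.
PARENT = {
    "entity": None,
    "animal": "entity", "plant": "entity",
    "mammal": "animal", "bird": "animal",
    "dog": "mammal", "cat": "mammal", "sparrow": "bird",
}
N = len(PARENT)   # total number of concepts
MAX_DEPTH = 3     # depth of the deepest leaf in this toy tree

def hypo_count(concept):
    """Number of strict descendants of `concept` in the is_a tree."""
    def under(node):
        while node is not None:
            node = PARENT[node]
            if node == concept:
                return True
        return False
    return sum(1 for node in PARENT if under(node))

def depth(concept):
    """Number of is_a edges from `concept` up to the root."""
    d = 0
    while PARENT[concept] is not None:
        concept = PARENT[concept]
        d += 1
    return d

def intrinsic_ic(concept, k=0.5):
    # Hyponym term: leaves get 1, the root gets 0 (Seco-style).
    hypo = 1.0 - math.log(hypo_count(concept) + 1) / math.log(N)
    # Depth term: deeper concepts are more specific.
    dep = math.log(depth(concept) + 1) / math.log(MAX_DEPTH + 1)
    return k * hypo + (1 - k) * dep   # blend weight k is an assumption
```

On this tree, `intrinsic_ic("dog")` reaches the maximum 1.0 while the root `"entity"` gets 0.0, matching the intuition that leaves carry the most information.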

3.
Semantic heterogeneity has become a research focus in heterogeneous data integration, and ontologies, owing to their inherent advantages, are used to resolve it. This paper discusses similarity-based ontology mapping methods, focusing on similarity computation based on syntactic distance, on the WordNet semantic dictionary, and on structure, and finally proposes an ontology-mapping generation algorithm aimed at solving the ontology mapping problem in semantic heterogeneity.
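The syntactic-distance component of such mapping methods is commonly a normalized edit distance between concept names; a minimal sketch (normalizing by the longer name's length is one common convention, assumed here):

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def syntactic_sim(a, b):
    """Similarity in [0, 1]: 1 for identical names, lower for distant ones."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))
```

A score near 1 (e.g. for "Author" vs. "Authors") suggests a mapping candidate that the WordNet-based and structural measures can then confirm or reject.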

4.
Semantic heterogeneity has become a research focus in heterogeneous data integration, and ontologies, owing to their inherent advantages, are used to resolve it. This paper discusses similarity-based ontology mapping methods, focusing on similarity computation based on syntactic distance, on the WordNet semantic dictionary, and on structure, and finally proposes an ontology-mapping generation algorithm aimed at solving the ontology mapping problem in semantic heterogeneity.

5.
Most existing WordNet word-similarity methods fail to fully account for the semantic information and positional relations of words, which lowers accuracy. To address this, a new method is proposed that uses the word-vector model Word2Vec to compute WordNet word similarity. A new form of the WordNet dataset is constructed, replacing the traditional text corpus, and an information-position arrangement method is proposed to process the dataset. Using Wo…
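Word2Vec-based similarity is usually the cosine between word vectors; the sketch below shows the computation over hand-made 3-dimensional vectors (the embeddings themselves are invented for illustration — real ones would be trained from the dataset described above).

```python
import math

# Toy embeddings standing in for trained Word2Vec vectors (assumption).
VEC = {
    "king":   [0.9, 0.1, 0.4],
    "queen":  [0.85, 0.15, 0.45],
    "banana": [0.1, 0.9, 0.0],
}

def cosine(u, v):
    """Cosine similarity: dot product over the product of norms."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)
```

With these toy vectors, `cosine(VEC["king"], VEC["queen"])` is close to 1, while the king/banana pair scores much lower.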

6.
Research on Ontology Mapping for Semantic Integration of Heterogeneous Data    Cited by: 1 (self-citations: 0, external citations: 1)
One of the key problems in heterogeneous data integration is the semantic heterogeneity of the data sources. To resolve semantic heterogeneity and achieve semantic integration, this paper uses ontologies to describe heterogeneous data and proposes a method that combines several techniques, including the WordNet dictionary and Google distance, to compute the similarity of ontology concepts, achieving semi-automatic ontology mapping.

7.
A Concept Semantic Similarity Algorithm Based on Bayesian Estimation    Cited by: 2 (self-citations: 0, external citations: 2)
Traditional semantic-distance-based concept similarity algorithms cannot take objective statistical data into account, while information-content-based algorithms struggle to obtain authoritative statistical samples. To address these shortcomings, this paper proposes a concept semantic similarity algorithm based on Bayesian estimation. The algorithm first assumes that a concept's occurrence probability is a random variable following a Beta distribution, then computes the prior parameters with a distance-based similarity algorithm, and derives the minimum-risk Bayesian posterior parameters under this prior from statistical samples. An information-content-based similarity algorithm then yields a concept similarity that combines subjective experience with objective facts. Experiments on WordNet show that the algorithm has the highest correlation coefficient with human judgment.
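The Beta prior/posterior step can be illustrated with the standard conjugate update; mapping a distance-based similarity to the prior mean and choosing a prior strength is the assumed parameterization here, and the sketch simplifies the paper's minimum-risk estimator to the posterior mean.

```python
def beta_posterior_mean(prior_mean, prior_strength, occurrences, trials):
    """Conjugate Beta-Binomial update for a concept's occurrence probability.

    The prior Beta(a, b) is parameterized so that a / (a + b) equals the
    distance-based estimate (prior_mean) and a + b equals the assumed
    prior strength; observed corpus counts then shift the estimate.
    """
    a = prior_mean * prior_strength
    b = (1.0 - prior_mean) * prior_strength
    return (a + occurrences) / (a + b + trials)
```

With a weak prior (strength 2) centered at 0.5 and 3 occurrences in 10 trials, the estimate moves to 1/3, between the prior mean and the raw frequency 0.3.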

8.
A domain ontology for remote-sensing information is constructed, and remote-sensing information services are semantically extended using the domain ontology and the WordNet dictionary. A service-matching method based on ontology concept similarity is proposed, and the Leacock-Chodorow semantic similarity model is improved. Experiments show that the improved model outperforms both the distance model and the information-content model, and that the ontology-based matching method achieves recall and precision above 70%, a marked improvement over keyword matching.
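For reference, the standard Leacock-Chodorow model that this abstract improves upon scores a concept pair by the negative log of the shortest path length relative to twice the taxonomy depth; a direct transcription (path length in edges is the convention assumed here):

```python
import math

def lch_similarity(path_len, taxonomy_depth):
    """Leacock-Chodorow: -log(len / (2 * D)); higher means more similar.

    path_len >= 1 is assumed for distinct concepts, and taxonomy_depth
    is the maximum depth D of the taxonomy being used.
    """
    return -math.log(path_len / (2.0 * taxonomy_depth))
```

With the commonly used WordNet noun-hierarchy depth of 16, adjacent concepts (path length 1) score log(32) ≈ 3.47, and the score falls as the path grows.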

9.
冯永, 张洋. 《计算机应用》 (Journal of Computer Applications), 2012, 32(1): 202-205
Traditional distance-based similarity computation is reviewed; since its distance calculation carries insufficient semantic information, an improved similarity measure based on the weights of edges between concepts in WordNet is proposed. The method jointly considers a concept's depth and density in the lexical hierarchy, i.e., its semantic richness, and designs a general method for computing concept semantic similarity that simplifies traditional semantic similarity algorithms and addresses related problems in the field. Experiments show that the proposed method achieves a correlation of 0.9109 with human judgment on the Rubenstein dataset, higher accuracy than other classical similarity methods.

10.
Text similarity is widely used in information retrieval, text mining, and plagiarism detection. Most research so far addresses similarity within a single language; cross-language text similarity has received little attention, and the differences between languages make it difficult. This paper proposes a WordNet-based method for computing Chinese-Thai cross-language text similarity. The Chinese and Thai texts are first preprocessed and features are selected; the semantic dictionary WordNet then converts both texts into an interlingua, on which their similarity is computed. Experiments show the method reaches an accuracy of 82%.
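The interlingua step can be sketched as mapping words of each language onto shared WordNet synset identifiers and comparing the resulting sets; the tiny Chinese and Thai lexicons and the Jaccard overlap below are illustrative assumptions, not the paper's actual resources or scoring.

```python
# Toy bilingual lexicons mapping words to WordNet synset IDs (assumptions).
ZH_SYNSETS = {"狗": {"dog.n.01"}, "猫": {"cat.n.01"}, "鸟": {"bird.n.01"}}
TH_SYNSETS = {"หมา": {"dog.n.01"}, "แมว": {"cat.n.01"}}

def interlingua_sim(zh_words, th_words):
    """Jaccard overlap of the synset sets reached from each text."""
    zh = set().union(*(ZH_SYNSETS.get(w, set()) for w in zh_words))
    th = set().union(*(TH_SYNSETS.get(w, set()) for w in th_words))
    if not zh or not th:
        return 0.0
    return len(zh & th) / len(zh | th)
```

Because both texts land in the same synset space, no direct Chinese-Thai dictionary is needed; coverage of the two lexicons becomes the limiting factor.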

11.
As a valuable tool for text understanding, semantic similarity measurement enables discriminative semantic-based applications in the fields of natural language processing, information retrieval, computational linguistics and artificial intelligence. Most existing studies have used structured taxonomies such as WordNet to explore lexical semantic relationships; however, improving computational accuracy remains a challenge. To address this problem, we propose a hybrid WordNet-based approach, CSSM-ICSP, for measuring concept semantic similarity, which leverages the information content (IC) of concepts to weight the shortest path distance between them. To improve IC computation, we also develop a novel model of the intrinsic IC of concepts that takes into consideration a variety of semantic properties of the WordNet structure. In addition, we summarize and classify the technical characteristics of previous WordNet-based approaches and evaluate our approach against them on various benchmarks. The experimental results are more correlated with human judgments of similarity in terms of the correlation coefficient, indicating that our IC model and similarity detection approach are comparable or even superior to others for semantic similarity measurement.
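One plausible shape for an IC-weighted shortest-path measure in the spirit of CSSM-ICSP (the paper's exact formula differs; the exponential decay and the rate alpha here are assumptions): a specific, high-IC common subsumer shrinks the effective distance between the two concepts.

```python
import math

def ic_weighted_sim(path_len, ic_lcs, alpha=0.5):
    """Shortest path attenuated by the IC of the least common subsumer.

    ic_lcs is in [0, 1]; identical concepts (path_len 0) score 1.0,
    and a generic subsumer (low IC) lets the path count fully.
    """
    return math.exp(-alpha * path_len * (1.0 - ic_lcs))
```

The exponential keeps scores in (0, 1], so the hybrid can be compared directly against pure path-based and pure IC-based baselines.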

12.
The quantification of the semantic similarity between terms is an important research area and a valuable tool for text understanding. Among the different paradigms used by related works to compute semantic similarity, information theoretic approaches have in recent years shown promising results by computing the information content (IC) of concepts from the knowledge provided by ontologies. These approaches, however, are hampered by the coverage of the single input ontology. In this paper, we propose extending IC-based similarity measures to consider multiple ontologies in an integrated way. Several strategies are proposed according to which ontology the evaluated terms belong to. Our proposal has been evaluated on a widely used benchmark of medical terms, with MeSH and SNOMED CT as ontologies. Results show an improvement in similarity assessment accuracy when multiple ontologies are considered.

13.
14.
In many research fields such as psychology, linguistics, cognitive science and artificial intelligence, computing semantic similarity between words is an important issue. This paper presents a new semantic similarity metric that takes notions from the feature-based theory of similarity and translates them into the information theoretic domain, leveraging the notion of information content (IC). In particular, the proposed metric exploits intrinsic IC, which quantifies IC values by scrutinizing how concepts are arranged in an ontological structure. To evaluate the metric, an online experiment asking the research community to rank a list of 65 word pairs was conducted. The experiment's web setup made it possible to collect 101 similarity ratings and to differentiate native from non-native English speakers. Such a large and diverse dataset allows similarity metrics to be confidently evaluated by correlating them with human assessments. Experimental evaluations using WordNet indicate that the proposed metric, coupled with intrinsic IC, yields results above the state of the art. Moreover, the intrinsic IC formulation also improves the accuracy of other IC-based metrics. To investigate the generality of both the intrinsic IC formulation and the proposed similarity metric, a further evaluation using the MeSH biomedical ontology was performed; significant results were obtained in this case as well. The proposed metric and several others have been implemented in the Java WordNet Similarity Library.

15.
Computing semantic similarity/relatedness between concepts and words is an important issue in many research fields. Information theoretic approaches exploit the notion of information content (IC), which provides a better understanding of a concept's semantics. In this paper, we present a complete survey of IC metrics with a critical study. We then propose a new intrinsic IC computation method that uses taxonomical features extracted from an ontology for a particular concept, quantifying the subgraph formed by the concept's subsumers using depth and descendant count as taxonomical parameters. In a second part, we integrate this IC metric into a new parameterized multistrategy approach for measuring word semantic relatedness. This measure exploits WordNet features such as the noun "is a" taxonomy, the nominalization relation (allowing use of the verb "is a" taxonomy) and the words shared between glosses (overlaps). Our work has been evaluated and compared with related works using a wide set of benchmarks designed for word semantic similarity/relatedness tasks. The results show that our IC method and the new relatedness measure correlate better with human judgments than related works.

16.
The information content (IC) of a concept estimates its degree of generality or concreteness, a dimension that enables a better understanding of the concept's semantics. As a result, IC has been successfully applied to the automatic assessment of semantic similarity between concepts. In the past, IC has been estimated as the probability of appearance of concepts in corpora; however, the applicability and scalability of this method are hampered by corpus dependency and data sparseness. More recently, some authors have proposed IC-based measures using taxonomical features extracted from an ontology for a particular concept, obtaining promising results. In this paper, we analyse these ontology-based approaches to IC computation and propose several improvements aimed at better capturing the semantic evidence the ontology models for the particular concept. Our approach has been evaluated and compared with related works (both corpus- and ontology-based) on the task of semantic similarity estimation. Results obtained on a widely used benchmark show that our method enables similarity estimations that are better correlated with human judgements than related works.

17.
Similarity measures over contents play an important role in TV personalization, e.g., TV content group recommendation and similar-content retrieval, which are essentially content clustering and example-based retrieval. We define similar TV contents as those with similar semantic information, e.g., plot, background, genre, etc. Several similarity measures, notably vector-space-model-based and category-hierarchy-model-based schemes, have been proposed for data clustering and example-based retrieval; each has its own advantages and shortcomings for TV content. In this paper, we propose a hybrid approach to TV content similarity that combines the vector space model and the category hierarchy model. The hybrid measure makes the most of TV metadata and takes advantage of both measurements, assessing TV content similarity at the semantic level rather than the physical level. Furthermore, we propose an adaptive strategy for setting the combination parameters. Experimental results show that the hybrid similarity measure is superior to either measure alone for TV content clustering and example-based retrieval.
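The combination scheme can be sketched as a convex blend of the two scores with a per-item weight; tying the weight to how complete an item's textual metadata is, is one possible adaptive strategy (an assumption — the paper's rule may differ).

```python
def hybrid_tv_sim(vector_sim, category_sim, metadata_completeness):
    """Blend vector-space and category-hierarchy similarity scores.

    metadata_completeness in [0, 1]: richer textual metadata shifts
    trust toward the vector-space score; sparse metadata falls back
    to the category hierarchy.
    """
    alpha = metadata_completeness
    return alpha * vector_sim + (1.0 - alpha) * category_sim
```

A convex blend keeps the combined score on the same [0, 1] scale as its inputs, so the same thresholds work for clustering and retrieval.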

18.
Semantic-oriented service matching is one of the challenges in automatic Web service discovery. Service users may search for Web services using keywords and receive matching services in terms of their functional profiles. A number of approaches to computing the semantic similarity between words have been developed to enhance the precision of matchmaking; they can be classified into ontology-based and corpus-based approaches. Ontology-based approaches commonly use the differentiated concept information provided by a large ontology to measure lexical similarity with word sense disambiguation. Nevertheless, most ontologies are domain-specific and limited in lexical coverage, and thus have limited applicability. Corpus-based approaches, on the other hand, rely on the distributional statistics of context to represent each word as a vector and measure the distance between word vectors; however, polysemy may lower computational accuracy. In this paper, in order to augment the semantic information content of word vectors, we propose a multiple semantic fusion (MSF) model that generates a sense-specific vector for each word. In this model, various semantic properties of the general-purpose ontology WordNet are integrated to fine-tune the distributed word representations learned from corpora, using vector combination strategies. The retrofitted word vectors serve as semantic vectors for estimating semantic similarity. The MSF-model-based similarity measure is validated against other similarity measures on multiple benchmark datasets. Experimental results on word similarity evaluation indicate that our method obtains a higher correlation coefficient with human judgment in most cases. Moreover, the proposed measure is shown to improve the performance of Web service matchmaking based on a single semantic resource. Accordingly, our findings provide a new method and perspective for understanding and representing lexical semantics.
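One simple fusion step of this kind is a retrofitting-style average of a corpus vector with its WordNet neighbours' vectors; the single uniform averaging pass below is an assumption — the MSF model combines several semantic properties with richer strategies.

```python
def retrofit(word_vec, neighbor_vecs):
    """Pull a corpus-trained vector toward its ontology neighbours.

    Returns the uniform average of the original vector and all
    neighbour vectors (one pass; iterating would tighten the fit).
    """
    out = list(word_vec)
    for nb in neighbor_vecs:
        for i, x in enumerate(nb):
            out[i] += x
    n = 1 + len(neighbor_vecs)
    return [x / n for x in out]
```

Averaging with synonym vectors pulls the different senses' neighbourhoods into the result, which is why sense-specific neighbour sets (as in the MSF model) matter for polysemous words.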

19.
A Semantic Web Service Matching Algorithm Based on Ontology Concept Similarity    Cited by: 15 (self-citations: 1, external citations: 14)
The similarity between ontology concepts is computed by defining a semantic distance between them, and a precise Web service matching algorithm based on this similarity is proposed. Compared with the classical OWL-S/UDDI matching algorithm, the new algorithm not only agrees with it on the matching grades but also makes service matching within and across grades precise. Performance tests on Web service data from the GEIS system show that the similarity-based matching algorithm achieves 1.8 times the average recall and 1.4 times the average precision of the OWL-S/UDDI algorithm.

20.
A new cascade model for computing the similarity between concepts in an ontology is proposed. In the first stage, a distance-based semantic similarity method computes a path score for a concept pair in the ontology. In the second stage, an information content (IC) algorithm computes a precise similarity score for the pair, extended using the set of common descendants of the concepts. In the third stage, a feature-integration strategy assembles all similarity scores into a feature vector describing the concept pair, with weights balancing the first- and second-stage scores. Finally, a BP neural network determines the similarity of the two concepts. The proposed algorithm is evaluated against existing methods; experimental results show that it effectively improves the accuracy and soundness of similarity computation.
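The three stages above can be sketched as a path score, a Lin-style IC score, and a fused feature vector; the paper trains a BP neural network to learn the final combination, for which a fixed weight w stands in below (the Lin form and the weight are assumptions).

```python
def path_score(path_len):
    """Stage 1: distance-based score, 1.0 for identical concepts."""
    return 1.0 / (1.0 + path_len)

def ic_score(ic_a, ic_b, ic_lcs):
    """Stage 2: Lin-style IC score from the common subsumer's IC."""
    return 2.0 * ic_lcs / (ic_a + ic_b)

def cascade_sim(path_len, ic_a, ic_b, ic_lcs, w=0.5):
    """Stage 3: fuse the two scores (the paper learns this with a BP net)."""
    features = (path_score(path_len), ic_score(ic_a, ic_b, ic_lcs))
    return w * features[0] + (1.0 - w) * features[1]
```

Replacing the fixed weight with a small trained network is exactly the role the feature vector plays in the cascade: the per-stage scores become inputs, and the similarity judgment becomes a learned function of them.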

