Similar Literature
Found 20 similar documents; search time 140 ms
1.
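Wikipedia-based entity ranking in related entity finding suffers from several problems: semi-automatic acquisition of target types, coarse-grained target types, binary judgment of entity type relevance, and entity relation relevance computation that ignores the role of stop words. To address these, an entity ranking framework is designed that ranks entities by combining three components: entity relevance, entity type relevance, and entity relation relevance. The best combination method is obtained by comparing several combination methods. A new method for computing entity type relevance is proposed. It automatically acquires fine-grained target entity types, learns by induction a set of discrimination rules for their hyponym Wikipedia categories, and computes entity type relevance by counting how many categories in a candidate entity's category information satisfy those rules. A "reconstruct the relation after removing stop words" method is proposed to compute the relation relevance between candidate entities and the source entity. Experiments show that the proposed methods effectively improve entity ranking and reduce computation time.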

2.
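To address semantic query expansion over non-taxonomic relations in a domain ontology for civil aviation emergencies, a semantic relatedness computation method oriented to the non-taxonomic relations of domain ontologies is proposed. For data properties, the method computes semantic relatedness based on property types and property values. For object properties, where a query term may be linked to ontology concepts or instances through multiple object properties, it computes semantic relatedness based on those object properties. An analysis of the method's effect is given in the context of semantic querying of civil aviation emergency cases using the domain ontology. The method not only effectively improves the precision and recall of semantic queries but also provides better methodological support for emergency decision-making in civil aviation emergencies.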

3.
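洪立印 (Hong Liyin), 徐蔚然 (Xu Weiran). 《软件》 (Software), 2013, (12): 148-151
WAF (word activation force) is a statistics-based algorithm for describing relations between words. WAF considers not only the association between words but also word order and the distance between words, so it carries both probabilistic and linguistic-rule information. This paper proposes a relation feature extraction algorithm for structured entity data and uses the extracted features to cluster entities. First, semantic and contextual features of the structured entity data are extracted and used to model the text; then similarity is computed for each attribute based on WAF values; finally, the entities are clustered.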

4.
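Slot filling (SF) aims to mine specific attribute information for a given entity (called the query) from a large document collection. Entity search is an important component of SF, responsible for retrieving documents that contain the given query (called relevant documents) so that subsequent modules can extract attribute information from them. There is currently little research on entity search in the SF field. The Boolean-logic retrieval models in use ignore the characteristics of entity queries and rely only on the surface form of the query, so query ambiguity leads to low precision. To address this problem, this paper proposes an entity search model based on cross document coreference resolution (CDCR). The method applies CDCR to candidate results that have high recall but low precision and filters out documents that contain no entity coreferent with the given entity, which improves the precision of the retrieval results. To reduce the recall loss caused by filtering, pseudo relevance feedback is used to expand the description of the query entity. Experimental results show that, compared with the baseline system, the method effectively improves retrieval results, raising precision and F1 by 5.63% and 2.56%, respectively.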

5.
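A new trust evaluation model for distributed environments, GTruMod, is proposed. GTruMod analyzes the collaboration history between entities based on posterior probability to derive the direct trust between them, computes entity reputation based on the characteristics of the social model of trust, and combines direct trust and reputation into a method for computing the degree of trust. Based on this model, the paper proposes an algorithm that evaluates the degree of trust between entities using graph search, and analyzes the algorithm's complexity. Experiments analyze the evaluation performance characteristics of the model and show that it can guard against malicious recommendations.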

6.
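Entity attribute value extraction is an important part of information extraction. To address the diversity of quantitative attribute types and the variability of their values, a system for automatically extracting quantitative attribute values based on meta-properties is designed and implemented. The system's structure, functional framework, and core techniques are discussed in detail, including selection of the text to extract from, extraction and evaluation of candidate values, and automatic verification of results. In extracting quantitative attribute values for entities in 9 subcategories of 5 major categories in Baidu Baike, average precision and recall reached 71% and 89%, respectively, higher than a simple search-based method and the traditional lexical and sentence-pattern method. The method is suited to acquiring quantitative attribute values in open domains and readily obtains exact values of single-valued attributes.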

7.
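Extracting structured attribute information from unstructured product description text is important for e-commerce functions such as product comparison, recommendation, and user demand prediction. Most existing structuring methods use supervised or semi-supervised classification to extract attribute values and attribute names, analyze the grammatical dependencies between them with a parser, and match attribute values to attribute names according to association rules. These methods have the following shortcomings: (1) some attribute values, attribute names, and their correspondences must be labeled manually; (2) the accuracy of value-name matching is severely constrained by language habits, sentence logic, the corpus, and the quality of the candidate attribute name set. An unsupervised method for structuring Chinese product attributes is proposed. With the help of a search engine, the method analyzes grammatical relations based on the small-probability event principle to extract attribute values and attribute names. It also proposes a relative non-selection conditional probability field and uses the PageRank algorithm to compute the pairing probability of attribute values and attribute names. The method requires no manual labeling, and it automatically extracts attribute values and matches them to the corresponding attribute names whether or not the product description explicitly contains those names. Experiments were carried out on Chinese descriptions of 4 product categories using real corpora from the Baidu search engine. For automatic generation of candidate attribute names, the proposed method (searching for the attribute value with a search engine and extracting common nouns from the search results that contain the value) improves recall by more than 20% over a method that extracts common nouns only from the description sentence. For non-quantitative attributes, the proposed value-name matching method based on the relative non-selection conditional probability field improves Rank-1 accuracy by more than 30% and average MRR by more than 0.3 over the dependency-association-based method.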

8.
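Attribute extraction is a key step in building knowledge graphs; its goal is to extract attribute values related to entities from unstructured text. This paper casts attribute extraction as a sequence labeling problem and uses distant supervision to automatically annotate e-commerce text from multiple sources, alleviating the lack of labeled data for product attribute extraction. To evaluate system performance precisely, a manually annotated test set was built, yielding a multi-domain annotated dataset for e-commerce product attribute extraction. Based on the newly built...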

9.
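In traditional entity resolution, string similarity functions are typically used to compute the similarity of a tuple pair on each attribute value, and the overall similarity is judged from these (for example, the similarity of a tuple pair equals the weighted sum of the similarities on each attribute value). However, this kind of similarity measure cannot reflect the differing importance of different words within an attribute value when computing tuple-pair similarity. Because it cannot tell which words matter more for matching, some matching tuples still receive low similarity while some non-matching tuples receive high similarity, which makes it hard to separate matching tuple pairs from non-matching ones. To solve this problem, a distance metric function with words as features is proposed, together with a word-feature-based distance metric learning algorithm and a distance-metric-based entity resolution algorithm. Extensive experiments verify the effectiveness of the proposed algorithms.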
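
The weighted-sum baseline that this abstract criticizes can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the use of `difflib.SequenceMatcher` as the string similarity function, and the example records and weights are all assumptions.

```python
from difflib import SequenceMatcher


def attribute_similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; difflib's ratio is an illustrative choice."""
    return SequenceMatcher(None, a, b).ratio()


def tuple_similarity(t1: dict, t2: dict, weights: dict) -> float:
    """Weighted sum of per-attribute similarities (weights assumed to sum to 1)."""
    return sum(
        w * attribute_similarity(t1[attr], t2[attr])
        for attr, w in weights.items()
    )


# Hypothetical records and weights, for illustration only.
r1 = {"name": "Apple iPhone 12", "brand": "Apple"}
r2 = {"name": "Apple iPhone 12 Pro", "brand": "Apple"}
weights = {"name": 0.7, "brand": 0.3}

score = tuple_similarity(r1, r2, weights)
print(round(score, 3))
```

Every word inside an attribute value contributes equally here, so "Pro" (which distinguishes two different products) lowers the score only slightly. That is the limitation the abstract's word-feature distance metric is meant to address.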

10.
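Ranking entities by their degree of relatedness to a given query entity in a knowledge graph (KG) is an important supporting technique for related entity search. Relatedness between entities is reflected not only in the KG but also in rapidly produced Web documents. Existing methods compute relatedness mainly from the KG, but a KG cannot promptly reflect fast-evolving real-world knowledge, so the results are not sufficiently objective. This paper therefore first proposes a candidate entity search algorithm based on the TransH model, which selects candidate entities for different relations by analyzing the semantic representations of entities in different relation hyperplanes. To improve the accuracy of candidate ranking, an Entity Undirected Weighted Graph (EUWG) model is proposed. It quantifies the relatedness between the query entity and the candidate entities as reflected in both Web documents and the KG, and ranks the candidates accordingly. Experimental results show that the method can accurately search for candidate entities in a large-scale KG and rank them correctly.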

11.
With the rapid growth of Web databases, it is necessary to extract and integrate large-scale data available in the Deep Web automatically. But current Web search engines conduct page-level ranking, which is becoming inadequate for entity-oriented vertical search. In this paper, we present an entity-level ranking mechanism called LG-ERM for Deep Web queries based on local scoring and global aggregation. Unlike traditional approaches, LG-ERM considers more rank influencing factors including the uncertainty of entity...

12.
Linked Data brings inherent challenges in the way users and applications consume the available data. Users consuming Linked Data on the Web should be able to search and query data spread over potentially large numbers of heterogeneous, complex and distributed datasets. Ideally, a query mechanism for Linked Data should abstract users from the representation of data. This work focuses on the investigation of a vocabulary-independent natural language query mechanism for Linked Data, using an approach based on the combination of entity search, a Wikipedia-based semantic relatedness measure and spreading activation. Wikipedia-based semantic relatedness measures address limitations of existing works that rely on similarity measures or term expansion based on WordNet. Experimental results using the query mechanism to answer 50 natural language queries over DBpedia achieved a mean reciprocal rank of 61.4%, an average precision of 48.7% and an average recall of 57.2%.
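
For readers unfamiliar with the metric reported above, mean reciprocal rank (MRR) can be sketched as below. This shows the standard definition and is not the paper's evaluation code. The function name and the toy queries are assumptions.

```python
def mean_reciprocal_rank(ranked_results, relevant):
    """Average of 1/rank of the first relevant answer per query (0 if none)."""
    total = 0.0
    for results, gold in zip(ranked_results, relevant):
        for rank, item in enumerate(results, start=1):
            if item in gold:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_results)


# Three hypothetical queries: first hit at rank 1, at rank 2, and no hit.
ranked = [["a", "b", "c"], ["x", "y"], ["p", "q"]]
gold = [{"a"}, {"y"}, {"z"}]

mrr = mean_reciprocal_rank(ranked, gold)
print(mrr)  # (1 + 0.5 + 0) / 3 = 0.5
```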

13.
This paper develops methods for calculating the semantic similarity (closeness) and relatedness of natural language words. The concept of semantic relatedness allows one to construct algorithmic models for context-linguistic analysis with a view to solving problems such as word sense disambiguation, named entity recognition, and natural language text analysis. A new algorithm is proposed for estimating the semantic distance between natural language words. This method is a weighted modification of the well-known Lesk approach based on the lexical intersection of glossary entries.
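
The idea of a weighted gloss overlap can be sketched as follows. This is a simplified illustration of the Lesk-style intersection the abstract refers to, not the paper's algorithm: the abstract does not say how the weights are chosen, so the weighting scheme, function name, and example glosses are all assumptions.

```python
def weighted_gloss_overlap(gloss_a: str, gloss_b: str,
                           weights: dict, default: float = 1.0) -> float:
    """Sum the weights of the words shared by two glossary entries."""
    words_a = set(gloss_a.lower().split())
    words_b = set(gloss_b.lower().split())
    shared = words_a & words_b  # lexical intersection, as in Lesk
    return sum(weights.get(word, default) for word in shared)


# Hypothetical glosses and weights: function words get weight 0.
gloss_bank = "a financial institution that accepts deposits"
gloss_money = "currency accepted as payment held in a financial account"
word_weights = {"a": 0.0, "financial": 2.0}

overlap = weighted_gloss_overlap(gloss_bank, gloss_money, word_weights)
print(overlap)  # shared words are "a" and "financial": 0.0 + 2.0 = 2.0
```

A larger overlap indicates closer relatedness; turning it into a distance (for example by inversion or normalization) is a further design choice that this sketch leaves open.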

14.
The availability of encyclopedic Linked Open Data (LOD) paves the way to a new generation of knowledge-intensive applications able to exploit the information encoded in the semantically enriched datasets freely available on the Web. In such applications, the notion of relatedness between entities plays an important role whenever, given a query, we are looking not only for exact answers but also for a ranked list of related ones. In this paper we present an approach to build a relatedness graph among resources in the DBpedia dataset that refer to the IT domain. Our final aim is to create a useful data structure at the basis of an expert system that, looking for an IT resource, returns a ranked list of related technologies, languages, and tools the user might be interested in. The graph we created is a basic building block that allows an expert system to support the user in entity search tasks in the IT domain (e.g. software component search or expert finding) that go beyond the string matching typical of pure keyword-based approaches and exploit the explicit and implicit semantics encoded within LOD datasets. The graph creation relies on different relatedness measures that are combined with each other to compute a ranked list of candidate resources associated with a given query. We validated our tool through experimental evaluation on real data to verify the effectiveness of the proposed approach.

15.
16.
17.
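Research on the Development of Command Agents Based on HLA   Total citations: 7, self-citations: 0, citations by others: 7
Operational command decision-making is an important research direction in aggregate-level combat simulation systems. Handling the command problem well matters greatly for enlarging the simulation scale, strengthening the autonomy of command entities, and improving the credibility of simulation results. The High Level Architecture (HLA) is a new-generation distributed interactive simulation architecture proposed to improve the reusability and interoperability of simulations. This paper discusses the key techniques and design framework for developing operational command entities within the HLA framework.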

18.
Structured knowledge bases are an increasingly important way for storing and retrieving information. Within such knowledge bases, an important search task is finding similar entities based on one or more example entities. We present QBEES, a novel framework for defining entity similarity based on structural features, so-called aspects and maximal aspects of the entities, that naturally model potential interest profiles of a user submitting an ambiguous query. Our approach based on maximal aspects provides natural diversity awareness and includes query-dependent and query-independent entity ranking components. We present evaluation results with a number of existing entity list completion benchmarks, comparing to several state-of-the-art baselines.

19.
Text is composed of words and phrases. In the bag-of-words model, phrases in text are split into words. This may discard the semantics of phrases, which, in turn, may give an inconsistent relatedness score between 2 texts. Our objective is to apply phrase relatedness in conjunction with word relatedness on the text relatedness task to improve text relatedness performance. We adopt 2 existing word relatedness measures based on Google n-gram and Global Vectors for Word Representation (GloVe), respectively, and incorporate them differently with an existing Google n-gram–based phrase relatedness method to compute text relatedness. The combination of Google n-gram–based word and phrase relatedness performs better than Google n-gram–based word relatedness alone, achieving a higher weighted mean of Pearson's r (0.639 versus 0.619) on the 14 data sets from the series of Semantic Evaluation workshops SemEval-2012, SemEval-2013, and SemEval-2015. Similarly, the combination of GloVe-based word relatedness and Google n-gram–based phrase relatedness performs better than GloVe-based word relatedness alone, achieving a higher weighted mean of Pearson's r (0.619 versus 0.605) on the same 14 data sets. On the SemEval-2012, SemEval-2013, and SemEval-2015 data sets, the text relatedness results obtained from the combination of Google n-gram–based word and phrase relatedness ranked 24, 3, and 31 out of 89, 90, and 73 text relatedness systems, respectively.

20.
Because of users' growing utilization of unclear and imprecise keywords when characterizing their information need, it has become necessary to expand their original search queries with additional words that best capture their actual intent. The selection of the terms that are suitable for use as additional words is in general dependent on the degree of relatedness between each candidate expansion term and the query keywords. In this paper, we propose two criteria for evaluating the degree of relatedness between a candidate expansion word and the query keywords: (1) co-occurrence frequency, where more importance is attributed to terms occurring in the largest possible number of documents where the query keywords appear; (2) proximity, where more importance is assigned to terms having a short distance from the query terms within documents. We also employ the strength Pareto fitness assignment in order to satisfy both criteria simultaneously. The results of our numerical experiments on MEDLINE, the online medical information database, show that the proposed approach significantly enhances the retrieval performance as compared to the baseline.
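
The two criteria named in this abstract can be sketched as below. This is a rough illustration under stated assumptions, not the paper's method. Documents are pre-tokenized lists, "where the query keywords appear" is read as "at least one query keyword appears", proximity is the minimum token distance, and the Pareto fitness step is omitted. All names and the toy corpus are hypothetical.

```python
import math


def cooccurrence_count(candidate, query_terms, docs):
    """Number of documents containing the candidate and any query keyword."""
    query = set(query_terms)
    return sum(1 for doc in docs if candidate in doc and query & set(doc))


def min_distance(candidate, query_terms, docs):
    """Smallest token distance between the candidate and any query keyword."""
    query = set(query_terms)
    best = math.inf  # stays infinite if the terms never share a document
    for doc in docs:
        cand_pos = [i for i, tok in enumerate(doc) if tok == candidate]
        query_pos = [i for i, tok in enumerate(doc) if tok in query]
        for c in cand_pos:
            for q in query_pos:
                best = min(best, abs(c - q))
    return best


# Hypothetical tokenized corpus and query.
docs = [
    ["heart", "attack", "risk", "aspirin"],
    ["aspirin", "dose", "heart"],
    ["weather", "report"],
]
query = ["heart"]

print(cooccurrence_count("aspirin", query, docs))  # 2
print(min_distance("aspirin", query, docs))        # 2
```

A candidate scores well when its co-occurrence count is high and its minimum distance is low. The abstract combines the two objectives with strength Pareto fitness assignment, which this sketch does not attempt.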
