Similar literature
1.
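A new two-step check entity matching algorithm based on a BP neural network is proposed. It introduces learning-based ideas into entity matching for heterogeneous databases and avoids the attribute-weight computation required by traditional methods. Experimental results show that the algorithm is effective: it markedly improves the precision of entity matching, adapts well to dynamic environments, and allows entity matching to be automated.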

2.
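In recent years, many researchers have proposed solutions to the entity matching problem for multi-source heterogeneous data. However, almost all of these methods perform entity matching within semantic frameworks such as RDFS or OWL and therefore lack generality. Moreover, the prevailing way to handle entity matching across multiple data sources is to convert it into several pairwise matching problems between data sources; matching pairwise directly has excessive computational complexity and does not analyze the problem from a global view of all the sources. To address these problems, an entity matching method is proposed that exploits the name, attribute, and context information commonly present in entities to build multiple indexes, which shrinks the computation space while generating a high-quality candidate set. A method for measuring entity similarity is also defined, which effectively determines whether an entity pair matches. In addition, based on the weights of the edges between entities and on mutual-exclusion relations, a graph-partitioning-based optimization algorithm is proposed that partitions the entities into sets of equivalent entities. Experiments on real data crawled from the Internet for the brand and person categories in the business domain show that the method achieves good results.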
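The abstract above does not say how its indexes are built. As a rough illustration of the general idea (index first, compare only within index buckets), the following minimal sketch builds an inverted index over name tokens and emits candidate pairs. The function name `candidate_pairs`, the whitespace tokenization, and the sample records are all assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(entities):
    """Return pairs of entity ids whose names share at least one token.

    `entities` maps an entity id to its name (hypothetical input format).
    """
    index = defaultdict(set)  # token -> ids of entities whose name contains it
    for entity_id, name in entities.items():
        for token in name.lower().split():
            index[token].add(entity_id)

    pairs = set()
    for ids in index.values():
        # Only entities that fall into the same bucket are compared later,
        # which avoids scoring every possible pair.
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical records from two sources.
entities = {
    "a1": "Acme Corp",
    "b7": "ACME Corporation",
    "c3": "Globex",
}
print(candidate_pairs(entities))
```

Only the pairs produced here would be passed on to a similarity function; attribute and context indexes could be added in the same way by indexing other fields.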

3.
Research on an attribute matching method based on a BP neural network   Total citations: 2 (self-citations: 0, citations by others: 2)
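The key to sharing data across heterogeneous databases is finding the identical attributes between the databases. Current methods mainly match attributes by similarity after comparing all attributes, but when the same attribute is represented with different data types, the metadata and value information describing it differ so greatly that these methods fail to find the identical attributes. Matching attributes of different data types together also causes interference among the attribute data and reduces the accuracy of the results. This paper therefore proposes a two-step check attribute matching algorithm based on a BP neural network. The attributes are first classified by data type; the classified attribute sets are then used to train the neural network several times, and the intersection of the matching results from each run is taken as the final attribute matching result. These two stages of checking constitute the two-step check method. The algorithm effectively removes interference from inconsistent information, reduces the size of the neural network, and allows attribute matching across attribute sets of different data types to be computed in parallel. Experimental results show that the proposed method markedly improves the system's running efficiency and the precision and recall of attribute matching.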
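The control flow of the two-step check described above (group attributes by data type, run the matcher several times, keep only the matches every run agrees on) can be sketched without the neural network itself. In the sketch below the per-run results are hard-coded stand-ins for the outputs of trained networks; the function names, attribute names, and data are hypothetical.

```python
from collections import defaultdict


def group_by_type(attributes):
    """Step 1: map each data type to the attribute names declared with it."""
    groups = defaultdict(list)
    for name, data_type in attributes:
        groups[data_type].append(name)
    return dict(groups)


def intersect_runs(runs):
    """Step 2: keep only the attribute pairs reported by every run."""
    result = set(runs[0])
    for run in runs[1:]:
        result &= set(run)
    return result


# Hypothetical attributes as (name, data type) pairs.
attributes = [("cust_name", "string"), ("age", "int"), ("client", "string")]
print(group_by_type(attributes))

# Stand-ins for the matches produced by three training runs on the
# "string" group; a real system would obtain these from the network.
runs = [
    {("cust_name", "client"), ("cust_name", "city")},
    {("cust_name", "client")},
    {("cust_name", "client"), ("client", "city")},
]
print(intersect_runs(runs))
```

Because each data-type group is processed independently, the groups could be handed to separate workers, which is the parallelism the abstract mentions.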

4.
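The Internet of Things (IoT) contains a large number of entities with heterogeneous relationships, and the interaction of their information creates inherent conflicts in the IoT. To address this, the entities in the IoT are abstracted as corresponding Agents and, taking the individual person as the center, ontology-based semantic matching and an improved method for computing dynamic relationships between things are used to compute the dynamic relationships between things with respect to that person. The originally heterogeneous entity relationships are thereby converted into network relationships among Agents ranked by the closeness of their dynamic relationships. Experiments verify that the relationship network built by this method markedly reduces the conflicts in interactions among heterogeneous entities and that, compared with existing similar methods, both accuracy and error rate improve, which demonstrates the method's accuracy and feasibility.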

5.
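To address the limitations of existing evidence theory (DS) methods for Deep Web interface integration, a Deep Web schema matching method based on concept words and a semantic heterogeneity model is designed. Concept words are extracted to preprocess the concept-word model, and group attributes are identified and combined, so that complex m:n matching is turned into simple 1:1 matching, which speeds up the system. Attribute instances are introduced into the semantic heterogeneity model, converting the problem of mining semantically heterogeneous synonymous attributes into one of computing, jointly evaluating, and selecting the similarity values of the features between attributes. Experimental results show that the method improves considerably on the DS method in matching efficiency and accuracy.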

6.
杨丹  陈默  王刚  孙良旭 《计算机科学》2017,44(5):189-192, 205
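As entity search becomes a new trend in information retrieval, entity recommendation has become a popular research problem in both industry and academia. Heterogeneous entities in a heterogeneous information space are interrelated, so cross-type entity recommendation is essential. In addition, heterogeneous entities carry temporal information and evolve continuously over time, and users want recommendations of the entities that are most relevant in time. A time-aware cross-type entity recommendation framework, T-ERe, is proposed; it uses the rich associations among heterogeneous entities together with query logs to recommend entities across types. T-ERe considers both the temporal information of entities and the temporal context of queries, and recommends to users entities of multiple types that are most relevant in time. Experimental results on real datasets demonstrate the feasibility and effectiveness of T-ERe.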

7.
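Against the background of collaborative design for process plants, an approximate semantic matching model for heterogeneous graphical data is proposed, based on the maximum semantic graph matching with allowed error (MSMGE) algorithm. Class undirected graphs are used to describe the engineering attributes and topological relations of 2D and 3D heterogeneous graphical data, which removes the heterogeneity of the graphical information. Attribute label dictionaries are built for the various class entities to remove the heterogeneity of 2D and 3D attribute information, and semantic expressions represent the semantic relations of the vertices and edges of the class undirected graph, so that heterogeneous graph matching is converted into approximate semantic graph matching. Semantic segmentation of the class undirected graph based on engineering semantics, together with semantic expression comparison based on the maximum common sequence algorithm, semantic normalization, and semantic pruning, reduces the matching search space, improves the efficiency of approximate semantic graph matching, and makes the approximate semantic graph matching decision possible. The work has been applied successfully in process plant design software.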

8.
沈江  余海燕  徐曼 《自动化学报》2015,41(4):832-842
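To address entity heterogeneity in interpretable evidence-fusion reasoning for multi-attribute group decision making, a multi-attribute group decision-making method based on evidence-chain fusion reasoning under entity heterogeneity is presented. Based on evidential reasoning theory, the concept of evidence-chain association is introduced; distinguishable nearest-neighbor evidence sets are obtained from the data matrices provided by multiple data tables, the similarity matrix of each data table is derived, and a quadratic optimization model over positive semidefinite matrices is built to share the experiential knowledge of the group decision experts. Using Dempster's orthogonal rule, the rationality of fusing belief degrees in interpretable reasoning among heterogeneous entities is demonstrated, and an evidence fusion rule is used to integrate the belief degrees obtained from the nearest-neighbor evidence of each data table, verifying the effectiveness of reconciling inconsistent information in multi-source heterogeneous data. A heart disease diagnosis example with multiple decision data sets and entity heterogeneity illustrates the feasibility and rationality of the method.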
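Dempster's orthogonal rule, which the abstract above relies on, combines two mass functions by multiplying the masses of intersecting focal sets and renormalizing by the non-conflicting mass. The sketch below shows only this standard rule, not the paper's evidence-chain method; the hypothesis labels and mass values are invented for illustration.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to its mass.
    """
    combined = {}
    conflict = 0.0
    for set_a, mass_a in m1.items():
        for set_b, mass_b in m2.items():
            common = set_a & set_b
            if common:
                combined[common] = combined.get(common, 0.0) + mass_a * mass_b
            else:
                # Mass assigned to contradictory hypotheses.
                conflict += mass_a * mass_b
    if 1.0 - conflict <= 1e-12:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Renormalize so the combined masses sum to 1.
    return {s: m / (1.0 - conflict) for s, m in combined.items()}


# Hypothetical evidence from two data tables about one patient.
DISEASE = frozenset({"disease"})
HEALTHY = frozenset({"healthy"})
EITHER = DISEASE | HEALTHY  # mass left on the whole frame expresses ignorance

m1 = {DISEASE: 0.6, EITHER: 0.4}
m2 = {DISEASE: 0.5, HEALTHY: 0.3, EITHER: 0.2}
fused = dempster_combine(m1, m2)
print(fused)
```

In this example the conflicting mass is 0.6 × 0.3 = 0.18, so every combined mass is divided by 0.82.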

9.
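Recommendation techniques based on heterogeneous information network embedding can effectively capture the structural information in a network and thereby improve recommendation performance. However, existing techniques of this kind ignore both the attribute information of nodes and the multiple types of edge relations between nodes, and they also ignore the fact that different attribute information of a node affects the recommendation result differently. To solve these problems, a framework is proposed: attributed heterogeneous information network embedding with self-attention mechanism for product recommendation (AHNER). The framework uses attributed heterogeneous information network embedding to learn unified, low-dimensional embeddings of users and products. When learning node embeddings, it takes into account that different attribute information affects the recommendation result differently and that different edge relations reflect different degrees of user preference for products, and it introduces a self-attention mechanism to mine the latent information contained in node attributes and in the different edge types and to learn attribute embeddings. In addition, to overcome the limitations of the traditional dot product as a matching function, the framework uses a deep neural network to learn a more effective matching function for the recommendation problem. AHNER is evaluated in extensive experiments on three public datasets, and the results demonstrate its feasibility and effectiveness.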

10.
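Grids are built on services, so a spatial database in a grid environment must address how to provide the corresponding database services, how to integrate spatial data across heterogeneous systems, and how spatial data can interoperate. To address the difficulty of integrating data among heterogeneous spatial databases in a grid environment, this paper proposes introducing OGSA-DAI middleware into spatial database applications and extending OGSA-DAI for spatial data access, so as to integrate heterogeneous spatial databases effectively. An application example verifies the feasibility of the approach.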

11.
杨丹  陈默  申德荣 《计算机科学》2017,44(2):112-116
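Entities and associations in a heterogeneous information space generally carry temporal information, and multiple temporal versions of entity data coexist, yet traditional entity integration ignores temporal information and does not support integration along the time dimension. A time-aware entity integration framework for heterogeneous information spaces, T-EI, is proposed. It aggregates facts from large amounts of heterogeneous entity data into clean, complete entity profiles with temporal information, which in turn support time-aware entity search. T-EI uses the temporal information of entities and associations in a time-aware entity identification algorithm, and takes data currency into account in a time-aware data fusion algorithm. Experimental results on real datasets demonstrate the feasibility and effectiveness of T-EI.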

12.
佘俊  张学清 《计算机应用》2010,30(11):2928-2931
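To extract the music entities scattered across Web pages quickly and accurately, a Chinese music entity recognition method combining rules and statistics is proposed, based on a thorough understanding of the characteristics of named entities in the music domain, and a music named entity recognition system is implemented. Testing shows that the system achieves high precision and recall.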

13.
One of the key challenges in realizing automated processing of information on the Web, the central goal of the Semantic Web, is the entity matching problem. A number of tools reliably recognize named entities, such as persons, companies, and geographic locations, in Web documents. The names of these extracted entities are, however, not unique; the same name on different Web pages might or might not refer to the same entity. The entity matching problem is that of identifying the entities that refer to the same real-world entity. It is very similar to the entity resolution problem studied in relational databases, but there are also several differences. Most importantly, Web pages often contain only partial or incomplete information about the entities. Similarity functions try to capture the degree of belief about the equivalence of two entities, so they play a crucial role in entity matching. The accuracy of a similarity function depends heavily on the assessment techniques applied, but also on specific features of the entities. We propose systematic design strategies for combined similarity functions in this context. Our method combines multiple pieces of evidence, using the estimated quality of the individual similarity values and paying particular attention to the missing information that is common on the Web. We study the effectiveness of our method on two specific instances of the general entity matching problem, namely person name disambiguation and Twitter message classification. In both cases, using our techniques in a very simple algorithmic framework, we obtained better results than the state-of-the-art methods.
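The abstract does not give the combination formula. One simple way to combine several similarity values while tolerating missing evidence is a weighted mean over only the values that are present, with the weights standing in for the estimated quality of each similarity function. The sketch below shows that generic scheme as an assumption, not the authors' design; the feature names and weights are hypothetical.

```python
def combined_similarity(scores, weights):
    """Weighted mean of the similarity values that are present.

    `scores` maps a feature name to a similarity in [0, 1], or to None
    when the Web page gives no information for that feature.
    """
    total = 0.0
    weight_sum = 0.0
    for feature, value in scores.items():
        if value is None:
            # Missing evidence is skipped rather than treated as dissimilarity.
            continue
        total += weights[feature] * value
        weight_sum += weights[feature]
    # With no evidence at all there is nothing to report.
    return total / weight_sum if weight_sum else None


# Hypothetical per-feature similarities for two mentions of a person.
scores = {"name": 0.9, "affiliation": None, "url": 0.5}
weights = {"name": 0.6, "affiliation": 0.3, "url": 0.1}
print(combined_similarity(scores, weights))
```

Skipping a missing feature and renormalizing keeps a sparse page from being penalized as if the feature had been observed and found dissimilar.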

14.
Heterogeneities exist in a multidatabase environment. For example, a real-world entity may be represented differently in relations of different databases. In particular, the keys of these relations may be incompatible. In this paper, we consider processing entity join queries when data transmission cost dominates. An entity join operation "integrates" tuples representing the same entities from different relations in which inconsistent data may exist. A natural way to process the entity join is to transmit both relations to one site, resolve the possible conflicts between corresponding attributes, and process the join there, which is very costly. In this paper, an approach is proposed to correctly transform a global query into local subqueries that preprocess entity join queries at multiple sites, in an attempt to lower the cost of data transmission. In addition, an extension of the traditional semijoin, named the extended semijoin, is proposed to further reduce the cost of data transmission for entity join query processing.
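For context, the traditional semijoin that the paper extends reduces transmission by shipping only the join-key values from one site and returning only the tuples that match them. The sketch below shows that baseline with in-memory lists standing in for relations at two sites. It assumes compatible keys, which is exactly the assumption the paper's extended semijoin relaxes; the extension itself is not described in the abstract and is not shown. The relation and column names are hypothetical.

```python
def semijoin(r_rows, s_rows, key):
    """Return the rows of R that have at least one join partner in S."""
    # Only this projection of S on the join key would cross the network.
    shipped_keys = {row[key] for row in s_rows}
    return [row for row in r_rows if row[key] in shipped_keys]


# Hypothetical relations held at two different sites.
customers = [
    {"cust_id": 1, "name": "Ann"},
    {"cust_id": 2, "name": "Bob"},
    {"cust_id": 3, "name": "Cy"},
]
orders = [
    {"cust_id": 1, "total": 40},
    {"cust_id": 3, "total": 15},
    {"cust_id": 3, "total": 7},
]

reduced = semijoin(customers, orders, "cust_id")
print(reduced)
```

Only the reduced customer relation would then be sent to the site that performs the final join.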

15.
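Using Wikipedia and existing named entity resources, a method for computing the membership degree of Wikipedia categories is proposed, and a high-quality, large-scale set of named entity instances is built in five steps: matching, computation, filtering, expansion, and denoising. In experiments on English Wikipedia data, the set of person-name instances obtained automatically by the membership-based method is nearly 10 times larger than the set extracted from DBpedia. Manual inspection of instances extracted from different membership-degree intervals shows that the first 15,000 extracted Wikipedia categories reach an accuracy of about 99%, so the method can effectively support the expansion of named entity class instances.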

16.
In the Big Data era, ever-increasing RDF data have reached a scale of billions of entities and have brought challenges to the problem of entity linkage on the Semantic Web. Although millions of entities, typically denoted by URIs, have been explicitly linked with owl:sameAs, many potentially coreferent ones remain. Existing automatic approaches address this problem mainly from two perspectives: one is equivalence reasoning, which infers semantically coreferent entities but probably misses many potential ones; the other is similarity computation between the property-values of entities, which is not always accurate and does not scale well. In this paper, we introduce a bootstrapping approach that leverages both kinds of methods for entity linkage. Given an entity, our approach first infers a set of semantically coreferent entities. Then, it iteratively expands this entity set using discriminative property-value pairs. The discriminability is learned with a statistical measure, which not only identifies important property-values in the entity set but also takes matched properties into account. Frequent property combinations are also mined to improve linkage accuracy. We develop an online entity linkage search engine and show its superior precision and recall by comparing it with representative approaches on one large-scale dataset and two benchmark datasets.
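The first step above, inferring coreferent entities by equivalence reasoning, rests on owl:sameAs being reflexive, symmetric, and transitive, so explicit links can be closed into equivalence classes. A union-find structure is one common way to compute that closure; the sketch below uses it as an illustrative choice, not as the paper's implementation, and the URIs are invented.

```python
def coreferent_sets(same_as_links):
    """Group URIs into equivalence classes implied by owl:sameAs links."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for left, right in same_as_links:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_left] = root_right  # merge the two classes

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


# Hypothetical owl:sameAs statements as (subject, object) pairs.
links = [("ex:a", "ex:b"), ("ex:b", "ex:c"), ("ex:d", "ex:e")]
print(coreferent_sets(links))
```

The resulting classes would serve as the seed sets that the bootstrapping step then expands with discriminative property-value pairs.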

17.
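The entity linking task in knowledge base question answering requires linking the content of a question precisely to entities in the knowledge base. Most current methods struggle to balance the recall and precision of the linked entities, and they can distinguish and filter entities only on the basis of textual information. Therefore, on the basis of merging the sub-steps, a knowledge base question answering entity linking model that fuses multi-dimensional features (MDIIEL) is proposed. Through representation learning, information such as textual symbols, entity and question types, and the semantic structure of entities in the knowledge base is integrated and brought into the entity linking task, which strengthens the discrimination between similar entities and reduces the size of the candidate set while improving accuracy. Experiments show that MDIIEL improves entity linking performance overall and achieves better linking results on most metrics.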

18.
As social media and e-commerce on the Internet continue to grow, opinions have become one of the most important sources of information on which users base their decisions. Unfortunately, the sheer quantity of opinions makes it difficult for an individual to comprehend and evaluate them all in a reasonable amount of time. Users have to read a large number of opinions on different entities before making any decision. Recently, a new information retrieval task known as Opinion-Based Entity Ranking (OpER) has emerged. OpER directly ranks relevant entities according to how well the opinions on them match a user's preferences, which are given in the form of queries. With such a capability, users do not need to read the large number of opinions available for the entities. Previous research on OpER does not take into account the importance and subjectivity of query keywords in the individual opinions on an entity. Entity relevance scores are computed primarily from the number of query keyword matches, treating all opinions on an entity as a single field of text. Intuitively, entities that have positive judgments and strong relevance to the query keywords should be ranked higher than entities that have poor relevance and negative judgments. This paper outlines several ranking features and develops an intuitive framework for OpER in which entities are ranked according to how well the individual opinions on them match the user's query keywords. Because a useful ranking model may be constructed from many ranking features, we apply a learning-to-rank approach based on genetic programming (GP) to combine the features and develop an effective retrieval model for the OpER task. The proposed approach is evaluated on two collections and is found to be significantly more effective than the standard OpER approach.

19.
A large fraction of online queries target entities. For this reason, Search Engine Result Pages (SERPs) increasingly contain information about the searched entities, such as pictures, short summaries, related entities, and factual information. A key facet that is often displayed on SERPs and that is instrumental for many applications is the entity type. However, an entity is usually associated in the background knowledge graph not with a single generic type but with a set of more specific types, which may or may not be relevant given the document context. For example, one can find on the Linked Open Data cloud the fact that Tom Hanks is a person, an actor, and a person from Concord, California. All these types are correct, but some may be too general to be interesting (e.g., person), while others may be interesting but already known to the user (e.g., actor) or irrelevant given the current browsing context (e.g., person from Concord, California). In this paper, we define the new task of ranking entity types given an entity and its context. We propose and evaluate new methods for finding the most relevant entity type based on collection statistics and on the knowledge graph structure interconnecting entities and types. An extensive experimental evaluation over several document collections at different levels of granularity (e.g., sentences, paragraphs) and different type hierarchies (including DBpedia, Freebase, and schema.org) shows that hierarchy-based approaches provide more accurate results when picking entity types to be displayed to the end user.
