
Jaccard similarity based data space transform for organization alias mining
Cite this article: SHANG Yuling, CAO Jianjun, LI Hongmei, LIU Yi. Jaccard similarity based data space transform for organization alias mining[J]. Computer Engineering and Applications (计算机工程与应用), 2018, 54(13): 88-92.
Authors: SHANG Yuling  CAO Jianjun  LI Hongmei  LIU Yi
Affiliation: 1. College of Command Information Systems, PLA University of Science and Technology, Nanjing 210007, China 2. The 63rd Institute, National University of Defense Technology, Nanjing 210007, China
Abstract: To address the problem that a single organization entity may correspond to multiple organization names, a Jaccard Similarity based Data Space Transform for Organization Alias Mining (JS-DST-OAM) method is proposed. Based on the affiliation relationship between organizations and authors, an organization-author bipartite graph model is built. Jaccard similarity is used to measure the similarity between the author-name sets of two organization names. Using the resulting organization-organization similarity matrix, the set-valued data are transformed into numerical data. Organization aliases are then mined by computing the cosine similarity between the similarity vectors of organization names. Finally, comparative experiments on real data verify the superiority of the method.

Keywords: entity resolution  organization alias  data space transform  Jaccard similarity  cosine similarity  relationship data
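
The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `threshold` cut-off, the handling of empty author sets, and the choice to keep the self-similarity entry (1.0) in each similarity vector are all assumptions made for illustration, since the abstract does not specify these details.

```python
from itertools import combinations
from math import sqrt


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two author-name sets."""
    union = a | b
    # Assumption: two empty sets are treated as having similarity 0.
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(org_authors):
    """Turn set-valued data into numeric vectors.

    org_authors maps an organization name to its set of author names
    (one side of the organization-author bipartite graph). Row i of the
    returned matrix is the similarity vector of organization names[i].
    """
    names = sorted(org_authors)
    matrix = [
        [jaccard(org_authors[p], org_authors[q]) for q in names]
        for p in names
    ]
    return names, matrix


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def alias_candidates(org_authors, threshold=0.8):
    """Return (name_i, name_j, score) pairs whose similarity vectors
    have cosine similarity >= threshold, best score first.

    The threshold value is a hypothetical parameter, not from the paper.
    """
    names, matrix = similarity_matrix(org_authors)
    pairs = []
    for i, j in combinations(range(len(names)), 2):
        score = cosine(matrix[i], matrix[j])
        if score >= threshold:
            pairs.append((names[i], names[j], score))
    return sorted(pairs, key=lambda item: -item[2])


if __name__ == "__main__":
    # Hypothetical toy data: two names share most authors, one shares none.
    sample = {
        "org_a": {"A", "B", "C", "D"},
        "org_a_alias": {"A", "B", "C"},
        "org_b": {"X", "Y"},
    }
    for left, right, score in alias_candidates(sample):
        print(left, right, round(score, 2))
```

In this toy input, the two names that share three of four authors have a Jaccard similarity of 0.75, and their similarity vectors are nearly parallel, so they surface as an alias candidate, while the unrelated name does not.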