
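Text Similarity Calculation Method Based on Levenshtein and TFRSF
Citation: ZANG Runqiang, SUN Hongguang, YANG Fengqin, FENG Guozhong, YIN Liangliang. Text Similarity Calculation Method Based on Levenshtein and TFRSF[J]. Computer and Modernization, 2018, 0(4): 84.
Authors: ZANG Runqiang, SUN Hongguang, YANG Fengqin, FENG Guozhong, YIN Liangliang
Funding: Youth Science Fund of the National Natural Science Foundation of China (11501095); Jilin Province Science and Technology Innovation Talent Cultivation Program (20170520051JH); Jilin Province Science and Technology Development Program (20170204002GX); Guidance Project of the Jilin Provincial Development and Reform Commission (2015Y056)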
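Abstract: Finding and collecting personal information on social networks makes it possible to build an information profile covering attributes such as a target's résumé, daily life, hobbies, and friends. However, different social networks contain many users who share the same name. To resolve this same-name ambiguity, the similarity of two users' information is computed to decide whether both accounts belong to the same person. Because descriptive information that appears in transposed positions within a document can cause a computer to misjudge two attribute values as different, this paper fuses the Levenshtein and term frequency-related string frequency (TFRSF) methods, combining word frequency with edit distance to judge whether attribute values are the same. Experimental results show that the proposed text similarity method improves accuracy on several evaluation metrics, with precision, recall, and F1 measure all above 87%.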

Keywords: personal information; social network; Levenshtein; TFRSF; similarity
Received: 2018-05-02
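The abstract combines an edit-distance measure with a word-frequency measure because edit distance alone penalizes text whose words appear in a different order. The Python sketch below illustrates that idea under stated assumptions. `levenshtein` is the standard dynamic-programming edit distance. `tf_cosine_similarity` is a plain term-frequency cosine similarity used as a stand-in, because the abstract does not give the TFRSF formula. The function names, the normalization by the longer string, and the equal-weight linear fusion in `fused_similarity` are hypothetical choices for illustration, not the authors' method.

```python
from collections import Counter
import math


def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the inner row to save memory
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Map edit distance into [0, 1]; normalizing by the longer length is an assumption."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def tf_cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of whitespace-token frequency vectors; ignores word order.
    Stand-in for the word-frequency component, NOT the paper's TFRSF formula."""
    ta, tb = Counter(a.split()), Counter(b.split())
    dot = sum(ta[t] * tb[t] for t in ta)
    norm = (math.sqrt(sum(v * v for v in ta.values()))
            * math.sqrt(sum(v * v for v in tb.values())))
    return dot / norm if norm else 0.0


def fused_similarity(a: str, b: str, weight: float = 0.5) -> float:
    """Hypothetical linear fusion of the two scores; the weight is an assumption."""
    return weight * levenshtein_similarity(a, b) + (1 - weight) * tf_cosine_similarity(a, b)


if __name__ == "__main__":
    a = "Jilin University Changchun"
    b = "Changchun Jilin University"  # same words, transposed order
    print(levenshtein_similarity(a, b), tf_cosine_similarity(a, b), fused_similarity(a, b))
```

For the transposed strings `a` and `b`, the edit-distance score drops well below 1 while the term-frequency score stays at 1.0, which is the word-order misjudgment the abstract describes; the fused score lands between the two.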
