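A Location Inference Method for Online Social Media in a Big Data Environment
Cite this article: WANG Kai, YU Wei, YANG Sha, WU Min, HU Ya-Hui, LI Shi-Jun. A location inference method for online social media in a big data environment [J]. Journal of Software (软件学报), 2015, 26(11): 2951-2963.
Authors: WANG Kai (王凯), YU Wei (余伟), YANG Sha (杨莎), WU Min (吴敏), HU Ya-Hui (胡亚慧), LI Shi-Jun (李石君)
Affiliations (in author order): Computer School, Wuhan University, Wuhan, Hubei 430072; Computer School, Wuhan University, Wuhan, Hubei 430072; College of Computer Science and Technology, Hankou University, Wuhan, Hubei 430212; The 722 Research Institute of China Shipbuilding Industry Corporation, Wuhan, Hubei 430079; Air Force Early Warning Academy, Wuhan, Hubei 430000; Computer School, Wuhan University, Wuhan, Hubei 430072
Funding: National Natural Science Foundation of China (61272109, 61502350); Fundamental Research Funds for the Central Universities (2042014kf0057); Natural Science Foundation of Hubei Province (2014CFB289)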
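Abstract: With the rapid development of online social media and the widespread adoption of location-aware devices, geographic location, a very high-quality information resource within social media big data, has come into wide use in disease control, population mobility analysis, and targeted advertising. However, because many users either do not specify a location or cannot specify it accurately, geographic location data on social media are very sparse. To address this data sparsity problem, this paper proposes UGC-LI (user-generated-content-driven location inference method), a location inference method based on user-generated content that infers the locations of social media users and of the texts they generate, providing data support for location-based personalized information services. By extracting local words from user-generated text, the method builds a probabilistic model based on differences in the geographic distribution of words and on the users' social graph, and infers user locations at multiple levels of geographic granularity. It also introduces a location-based parameterized language model to compute the city from which a user-generated text was posted. Evaluation on a real-world dataset shows that UGC-LI locates 64.2% of users within an offset distance of 15 km and infers the user's city with 81.3% accuracy. It also correctly identifies the city of origin for 32.7% of user-generated texts, a clear improvement over existing methods.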

Keywords: location inference; user-generated content; data sparsity; online social media; social graph
Received: 2015-05-31
Revised: 2015-08-26
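
The abstract mentions a location-based language model that picks the city a text was posted from, but it does not give the model's form. As a rough illustration only, the sketch below uses a per-city unigram model with add-alpha smoothing and a uniform prior over cities. The function names, the smoothing scheme, and the toy data are all assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def train_city_models(labelled_texts):
    """Count tokens per city from (city, tokens) pairs.

    Hypothetical helper: the paper's actual training procedure is not
    described in the abstract.
    """
    counts = defaultdict(Counter)
    for city, tokens in labelled_texts:
        counts[city].update(tokens)
    vocab = {tok for city_counts in counts.values() for tok in city_counts}
    return counts, vocab


def infer_city(tokens, counts, vocab, alpha=1.0):
    """Return the city whose smoothed unigram model scores the text highest.

    Assumes a uniform prior over cities and add-alpha smoothing; both are
    illustrative choices, not the paper's parameterization.
    """
    vocab_size = len(vocab) + 1  # one extra slot for unseen tokens
    best_city, best_score = None, -math.inf
    for city, city_counts in counts.items():
        total = sum(city_counts.values())
        score = sum(
            math.log((city_counts[tok] + alpha) / (total + alpha * vocab_size))
            for tok in tokens
        )
        if score > best_score:
            best_city, best_score = city, score
    return best_city


# Toy usage with made-up training data.
training = [
    ("Wuhan", ["hot", "dry", "noodles", "yangtze"]),
    ("Beijing", ["hutong", "duck", "forbidden", "city"]),
]
city_counts, vocabulary = train_city_models(training)
predicted = infer_city(["noodles", "yangtze"], city_counts, vocabulary)
```

In this toy setup the words "noodles" and "yangtze" occur only in the Wuhan training text, so the Wuhan model assigns them higher probability. A realistic system would also need the local-word extraction and social-graph signals that the abstract describes.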

Location Inference Method in Online Social Media with Big Data
WANG Kai, YU Wei, YANG Sha, WU Min, HU Ya-Hui, LI Shi-Jun. Location Inference Method in Online Social Media with Big Data [J]. Journal of Software, 2015, 26(11): 2951-2963.
Authors: WANG Kai, YU Wei, YANG Sha, WU Min, HU Ya-Hui, LI Shi-Jun
Affiliations (in author order): Computer School, Wuhan University, Wuhan 430072, China; Computer School, Wuhan University, Wuhan 430072, China; College of Computer Science and Technology, Hankou University, Wuhan 430212, China; The 722 Research Institute of China Shipbuilding Industry Corporation, Wuhan 430079, China; Air Force Early Warning Academy, Wuhan 430000, China; Computer School, Wuhan University, Wuhan 430072, China
Abstract: With the rapid development of online social media and the prevalence of location-aware mobile devices, geographic location has become a high-quality resource in social media big data and is widely used in disease control, population mobility analysis, and targeted advertising. However, location data are quite sparse because many users do not specify their locations or cannot specify them accurately. To overcome this data sparsity problem, this paper proposes UGC-LI, a user-generated-content-driven location inference method that infers the locations of users and of the texts they post. The method can provide supporting data for location-based personalized information services. Using local words extracted from user-generated texts, it constructs a probability model that jointly considers the geographic distribution of words and the users' social graph to locate users at multiple granularities. Further, a location-based parameterized language model is presented to determine the city from which a tweet is published. Experiments on a real-world dataset show that the method outperforms existing algorithms: 64.2% of users are located within a displacement distance of 15 km, the home city is correctly inferred for 81.3% of users, and the city of origin is correctly identified for 32.7% of tweets.
Keywords: location inference; user-generated content; data sparsity; online social media; social graph
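
The headline result, the share of users located within 15 km, is a distance-thresholded accuracy. The abstract does not say how distance is measured, so the sketch below assumes great-circle (haversine) distance on a spherical Earth with a mean radius of 6371 km. The function names and sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def accuracy_within(predicted, actual, threshold_km=15.0):
    """Fraction of predictions whose error is at most threshold_km.

    predicted and actual are equal-length lists of (lat, lon) tuples.
    """
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(
        haversine_km(p[0], p[1], a[0], a[1]) <= threshold_km
        for p, a in zip(predicted, actual)
    )
    return hits / len(predicted)


# Toy usage: the first prediction is about 5.6 km off, the second about 111 km off.
predicted_points = [(30.00, 114.00), (30.00, 114.00)]
actual_points = [(30.05, 114.00), (31.00, 114.00)]
share_within_15km = accuracy_within(predicted_points, actual_points)
```

Under this definition, a reported value of 64.2% would mean that 64.2% of evaluated users have a predicted location no more than 15 km from their reference location.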