Top-k temporal keyword search over social media data
Authors: Fan Xia (fxia@sei.ecnu.edu.cn), Chengcheng Yu, Linhao Xu, Weining Qian, Aoying Zhou
Affiliation: 1. School of Data Science and Engineering, East China Normal University, Shanghai Shi, China; 2. Infosys, Shanghai, China
Abstract: Social media services have become a main source for monitoring emerging topics and sensing real-life events. A social media platform manages a social stream consisting of a huge volume of timestamped user-generated data, including original data and repost data. However, previous research on keyword search over social media data mainly emphasizes the recency of information. In this paper, we first propose the problem of top-k most significant temporal keyword queries to enable more complex query analysis. Such a query returns the top-k most popular social items that contain the given keywords within the given query time window. Then, we design a temporal inverted index with a two-tier posting list to index social time series, together with a segment store to compute the exact social significance of social items. Next, we implement a basic query algorithm based on the proposed index structure and give a detailed performance analysis of that algorithm. Based on the analysis, we further refine the query algorithm with a piecewise maximum approximation (PMA) sketch. Finally, extensive empirical studies on a real-life microblog dataset demonstrate that the combination of the two-tier posting list and the PMA sketch achieves remarkable performance improvement under different query settings.
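To make the query semantics in the abstract concrete, the following is a minimal brute-force sketch in Python. It is not the paper's two-tier posting list, segment store, or PMA sketch. It only illustrates what a top-k temporal keyword query returns. Several points are assumptions made for illustration: the class `SocialIndex` and its methods are hypothetical names, "social significance" is approximated as the number of post and repost events inside the window, keywords are combined conjunctively (AND), and the time window is inclusive at both ends.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class SocialIndex:
    """Hypothetical baseline: plain inverted index plus per-item event timestamps."""

    def __init__(self):
        self.postings = defaultdict(set)  # keyword -> ids of items containing it
        self.events = {}                  # item id -> sorted event timestamps

    def add_item(self, item_id, text, timestamps):
        # Index every distinct lowercase token of the item's text.
        for word in set(text.lower().split()):
            self.postings[word].add(item_id)
        # Assumption: each timestamp is one post or repost event of this item.
        self.events[item_id] = sorted(timestamps)

    def significance(self, item_id, start, end):
        # Assumption: significance = number of events in [start, end].
        ts = self.events[item_id]
        return bisect_right(ts, end) - bisect_left(ts, start)

    def top_k(self, keywords, start, end, k):
        # Assumption: an item must contain all keywords (AND semantics).
        sets = [self.postings.get(w.lower(), set()) for w in keywords]
        if not sets:
            return []
        candidates = set.intersection(*sets)
        scored = [(i, self.significance(i, start, end)) for i in candidates]
        # Drop items with no activity in the window, then rank by score.
        scored = [(i, s) for i, s in scored if s > 0]
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        return scored[:k]


idx = SocialIndex()
idx.add_item("a", "earthquake in city", [1, 2, 3, 10])
idx.add_item("b", "earthquake relief", [5, 6, 7, 8])
idx.add_item("c", "city marathon", [5, 6])
print(idx.top_k(["earthquake"], 4, 9, 1))
```

This baseline scores every candidate exactly, so its cost grows with the number of items matching the keywords. An index with precomputed upper bounds, which is the role the abstract attributes to the PMA sketch, would aim to skip most of that exact scoring.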
This article is indexed in SpringerLink and other databases.