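Algorithm for Top-K Keyword Query in Data Streams
Cite this article: ZHENG Shi-min, QIN Xiao-lin, LIU Liang, ZHOU Qian. Algorithm for Top-K Keyword Query in Data Streams[J]. Computer Science, 2016, 43(8): 142-147
Authors: ZHENG Shi-min  QIN Xiao-lin  LIU Liang  ZHOU Qian
Affiliation: College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016 (all four authors)
Funding: Supported by the National Natural Science Foundation of China (61373015, 61300052) and the Priority Academic Program Development of Jiangsu Higher Education Institutions
Abstract (translated from the Chinese): Distributed Top-K keyword query built on the Spark Streaming computing framework, which counts all keywords in a data stream, is an active research problem. Most studies implement Top-K keyword query by limiting storage space and assume that the keyword set is known. To address this problem, a distributed Top-K keyword query algorithm is proposed that can be applied when the keyword set is unknown. It dynamically adjusts the storage space according to the keywords it monitors and improves precision by optimizing the update strategy. Experimental results show that, when the keyword set is unknown, the algorithm performs better than existing algorithms.

Keywords: Top-K keyword query  Data streams  Cloud computing  Spark Streaming
Received: 2015-07-02
Revised: 2015-09-18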

Algorithm for Top-K Keyword Query in Data Streams
ZHENG Shi-min, QIN Xiao-lin, LIU Liang and ZHOU Qian. Algorithm for Top-K Keyword Query in Data Streams[J]. Computer Science, 2016, 43(8): 142-147
Authors: ZHENG Shi-min  QIN Xiao-lin  LIU Liang  ZHOU Qian
Affiliation: College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China (all four authors)
Abstract: Distributed Top-K keyword query on the Spark Streaming framework, which counts the keywords occurring in data streams, is an active research topic. Most existing studies answer Top-K keyword queries within a limited storage space and assume that the keyword set is known in advance. To remove that assumption, we present a distributed Top-K keyword query algorithm that can be used when the keyword set is unknown. The algorithm dynamically adjusts the size of its storage space according to the keywords it monitors and optimizes the update strategy to improve precision. Experimental results show that, when the keyword set is unknown, the proposed algorithm outperforms existing algorithms.
Keywords: Top-K keyword query  Data streams  Cloud computing  Spark Streaming
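
To make the storage and precision trade-off mentioned in the abstract concrete, here is a minimal single-machine sketch of bounded-memory Top-K keyword counting. It uses the well-known Space-Saving technique, not the algorithm proposed in the paper, whose details the abstract does not give. The names `SpaceSavingCounter`, `count_stream`, and the fixed `capacity` parameter are assumptions made for illustration only; the sketch is not distributed and does not use Spark Streaming.

```python
import heapq
from typing import Dict, Iterable, List, Tuple


class SpaceSavingCounter:
    """Approximate keyword counts using at most `capacity` counters.

    Illustrative only: this is the classic Space-Saving scheme with a
    fixed capacity, not the dynamically sized structure from the paper.
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.counts: Dict[str, int] = {}

    def update(self, keyword: str) -> None:
        """Process one keyword occurrence from the stream."""
        if keyword in self.counts:
            self.counts[keyword] += 1
        elif len(self.counts) < self.capacity:
            # The keyword set is not known in advance, so counters are
            # created lazily as new keywords appear.
            self.counts[keyword] = 1
        else:
            # Table is full: evict the keyword with the smallest count and
            # let the newcomer inherit that count plus one. This keeps
            # memory bounded but may overestimate the newcomer's count.
            victim = min(self.counts, key=self.counts.get)
            inherited = self.counts.pop(victim)
            self.counts[keyword] = inherited + 1

    def top_k(self, k: int) -> List[Tuple[str, int]]:
        """Return up to k (keyword, estimated count) pairs, largest first."""
        return heapq.nlargest(k, self.counts.items(), key=lambda kv: kv[1])


def count_stream(keywords: Iterable[str], capacity: int) -> SpaceSavingCounter:
    """Feed every keyword of a (finite) stream into a fresh counter."""
    counter = SpaceSavingCounter(capacity)
    for keyword in keywords:
        counter.update(keyword)
    return counter
```

A call such as `count_stream(words, capacity=1000).top_k(10)` returns the ten keywords with the largest estimated counts. With a small `capacity`, an evicted keyword's count is handed to the newcomer, so estimates can be too high; that loss of precision under a fixed storage limit is the kind of problem that adjusting storage and refining the update strategy, as the abstract describes, would aim to reduce.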