首页 | 本学科首页   官方微博 | 高级检索  
     

基于高维空间的在线高效子空间Skyline算法——CSky
引用本文:周红福,宫学庆,郑凯,周傲英. 基于高维空间的在线高效子空间Skyline算法——CSky[J]. 计算机学报, 2007, 30(8): 1409-1417
作者姓名:周红福  宫学庆  郑凯  周傲英
作者单位:复旦大学计算机科学与工程系,上海,200433;复旦大学计算机科学与工程系,上海,200433;复旦大学计算机科学与工程系,上海,200433;复旦大学计算机科学与工程系,上海,200433
摘    要:Skyline计算是要发现数据集中不被其他点支配的所有点的集合.近来,它在实时在线服务方面的良好应用前景,使其成为数据库研究领域的一个热点.实际应用中,用户通常期望快速、渐进地返回Skyline计算结果,因此文中主要讨论了高维空间子空间Skyline渐进查询问题.据我们所知,现有的Skyline计算方法都不能直接或者通过简单修改来高效解决该种查询问题.BNL(Blocked Nested Loop)算法是一个可用来进行子空间Skyline计算的算法,但是,该方法低效且非渐进.基于此,文中提出了在线高效子空间Skyline算法--CSky(Count the Skyline).该算法充分利用了一个新颖数据结构--InvertS的特征,即通过对目标数据集进行排序,存放最可能为Skyline点的数据于算法优先扫描的位置,这使得CSky算法能高效计算出任意子空间上的Skyline;同时,CSky每次计算子空间Skyline查询时,至多访问一遍数据库;再有,算法扫描一个点时,只需和当前已发现的Skyline点进行比较即能判断该点是否为Skyline点,保证了算法的渐进性.这样,相比BNL,CSky大大减少了计算开销,具有其他基于索引的Skyline算法计算Skyline时的高效,且这种高效适用于所有子空间.理论分析和实验表明,在解决高维空间子空间Skyline查询问题方面,CSky性能大大优于BNL.

关 键 词:轮廓  子空间  渐进算法  在线算法
修稿时间:2007-03-06

CSky: An Online Efficient Algorithm for Subspace Skyline Computation in High Dimensional Space
ZHOU Hong-Fu,GONG Xue-Qing,ZHENG Kai,ZHOU Ao-Ying. CSky: An Online Efficient Algorithm for Subspace Skyline Computation in High Dimensional Space[J]. Chinese Journal of Computers, 2007, 30(8): 1409-1417
Authors:ZHOU Hong-Fu  GONG Xue-Qing  ZHENG Kai  ZHOU Ao-Ying
Affiliation:Department of Computer Science and Engineering, Fudan University, Shanghai 200433
Abstract:Skyline computation aims to find the points that are not dominated by any other point in the dataset. It has been becoming a hot topic due to its potential applications in real-time online services. Usually, such applications expect to return the first Skyline point quickly, without ransacking all the points. This paper focuses on the problem of progressive subspace Skyline queries in high dimensional space. To the best of our knowledge, the existing algorithms and their variations cannot be easily extended to support arbitrary subspace Skyline query efficiently. The BNL (Blocked Nested Loop) method can be used for subspace Skyline queries, but it is very inefficiently, and not progressive. A novel algorithm, called CSky(stands for Count the Skyline), is proposed in this paper to solve this problem. With CSky, the Skyline points can be rapidly obtained in any query subspace of a high dimensional space. It is an online algorithm based on a novel data structure called InvertS. The algorithm scans the dataset at most one pass, and the points are only compared with those detected Skyline points, thus resulting in a much smaller computation overhead in comparison with BNL. Furthermore, CSky not only can efficiently find the Skyline like other index-based algorithms, but also do it in any subspace of the whole query space. The theoretical analysis and experimental comparison show that the CSky algorithm outperforms BNL significantly in various kinds of application cases.
Keywords:Skyline  subspace  progressive algorithm  online algorithm
本文献已被 CNKI 维普 万方数据 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号