11.

Tracking clusters in evolving data streams over sliding windows (total citations: 6; self-citations: 4; citations by others: 2)

Aoying Zhou Feng Cao Weining Qian Cheqing Jin 《Knowledge and Information Systems》2008,15(2):181-214

Mining data streams poses great challenges due to limited memory availability and real-time query response requirements. Clustering an evolving data stream is especially interesting because it captures not only the changing distribution of clusters but also the evolving behaviors of individual clusters. In this paper, we present a novel method for tracking the evolution of clusters over sliding windows. In our SWClustering algorithm, we combine the exponential histogram with temporal cluster features to propose a novel data structure, the Exponential Histogram of Cluster Features (EHCF). The exponential histogram is used to handle the in-cluster evolution, and the temporal cluster features represent the change of the cluster distribution. Our approach has several advantages over existing methods: (1) the quality of the clusters is improved because the EHCF captures the distribution of recent records precisely; (2) compared with previous methods, the mechanism employed to adaptively maintain the in-cluster synopsis can track the cluster evolution better, while consuming much less memory; (3) the EHCF provides a flexible framework for analyzing the cluster evolution and tracking a specific cluster efficiently without interfering with other clusters, thus reducing the consumption of computing resources for data stream clustering. Both the theoretical analysis and extensive experiments show the effectiveness and efficiency of the proposed method.

Aoying Zhou is currently a Professor in Computer Science at Fudan University, Shanghai, P.R. China. He received his Bachelor's and Master's degrees in Computer Science from Sichuan University in Chengdu, Sichuan, P.R. China in 1985 and 1988, respectively, and his Ph.D. degree from Fudan University in 1993. He has served as a member or chair of the program committee for many international conferences, such as WWW, SIGMOD, VLDB, EDBT, ICDCS, ER, DASFAA, PAKDD, and WAIM. His papers have been published in ACM SIGMOD, VLDB, ICDE, and several international journals. His research interests include data mining and knowledge discovery, XML data management, Web mining and searching, data stream analysis and processing, and peer-to-peer computing.

Feng Cao is currently an R&D engineer in IBM China Research Laboratories. He received a B.E. degree from Xi'an Jiao Tong University, Xi'an, P.R. China, in 2000 and an M.E. degree from Huazhong University of Science and Technology, Wuhan, P.R. China, in 2003. From October 2004 to March 2005, he worked in the Fudan-NUS Competency Center for Peer-to-Peer Computing, Singapore. In 2006, he received his Ph.D. degree from Fudan University, Shanghai, P.R. China. His current research interests include data mining and data streams.

Weining Qian is currently an Assistant Professor in computer science at Fudan University, Shanghai, P.R. China. He received his M.S. and Ph.D. degrees in computer science from Fudan University in 2001 and 2004, respectively. He is supported by the Shanghai Rising-Star Program under Grant No. 04QMX1404 and the National Natural Science Foundation of China (NSFC) under Grant No. 60673134. He has served as a program committee member of several international conferences, including DASFAA 2006, 2007 and 2008, APWeb/WAIM 2007, INFOSCALE 2007, and ECDM 2007. His papers have been published in ICDE, SIAM DM, and CIKM. His research interests include data stream query processing and mining, and large-scale distributed computing for database applications.

Cheqing Jin is currently an Assistant Professor in Computer Science at East China University of Science and Technology. He received his Bachelor's and Master's degrees in Computer Science from Zhejiang University in Hangzhou, P.R. China in 1999 and 2002, respectively, and his Ph.D. degree from Fudan University, Shanghai, P.R. China. He worked as a Research Assistant at the E-business Technology Institute, the Hong Kong University, from December 2003 to May 2004. His current research interests include data mining and data streams.
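To make the EHCF idea from the abstract of entry 11 more concrete, here is a minimal one-dimensional sketch. It is not the authors' SWClustering code. It assumes the classic exponential-histogram merge rule (at most ⌈1/ε⌉ + 1 buckets of each size, with the two oldest merged on overflow) and a simplified temporal cluster feature holding a count, a linear sum, a squared sum, and the newest timestamp. The names `TCF`, `EHCF`, `insert`, and `summary` are hypothetical.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class TCF:
    """Simplified temporal cluster feature (assumed layout, 1-D data)."""
    n: int      # number of records summarized
    ls: float   # linear sum of the records
    ss: float   # sum of squared records
    t: int      # timestamp of the most recent record in the bucket


class EHCF:
    """Exponential histogram whose buckets are cluster features."""

    def __init__(self, window: int, eps: float):
        self.window = window
        self.max_per_size = ceil(1 / eps) + 1  # assumed classic EH bound
        self.buckets: list[TCF] = []           # ordered oldest to newest

    def insert(self, x: float, t: int) -> None:
        # Drop buckets whose newest record has left the sliding window.
        self.buckets = [b for b in self.buckets if b.t > t - self.window]
        self.buckets.append(TCF(1, x, x * x, t))
        size = 1
        while True:
            same = [i for i, b in enumerate(self.buckets) if b.n == size]
            if len(same) <= self.max_per_size:
                break
            i, j = same[0], same[1]  # the two oldest buckets of this size
            a, b = self.buckets[i], self.buckets[j]
            self.buckets[j] = TCF(a.n + b.n, a.ls + b.ls, a.ss + b.ss,
                                  max(a.t, b.t))
            del self.buckets[i]
            size *= 2  # the merge may overflow the next size class

    def summary(self) -> tuple[int, float, float]:
        """Return (count, mean, variance) over all live buckets."""
        n = sum(b.n for b in self.buckets)
        ls = sum(b.ls for b in self.buckets)
        ss = sum(b.ss for b in self.buckets)
        mean = ls / n
        return n, mean, ss / n - mean * mean
```

Because a bucket expires only when its newest record leaves the window, the oldest bucket can still carry some already expired records, so the summary is approximate; ε bounds how large that oldest bucket can be relative to the rest.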

12.

Adaptive scheduling for shared window joins over data streams

Jin Cheqing Zhou Aoying Jeffrey Xu Yu Joshua Zhexue Huang Cao Feng 《Frontiers of Computer Science in China》2007,1(4):468-477

Recently, a few continuous query systems have been developed to cope with applications involving continuous data streams. At the same time, numerous algorithms have been proposed for better performance. A recent work on this subject defined scheduling strategies for shared window joins over data streams from multiple query expressions. In these strategies, the tuple with the highest priority is selected for processing from multiple candidates. However, the performance of these static strategies degrades considerably when data arrive in bursts, because the priority is determined only by static information, such as the query windows and arrival order. In this paper, we propose a novel adaptive strategy in which the priority of a tuple incorporates real-time information. A thorough experimental evaluation has demonstrated that this new strategy can outperform the existing strategies.

13.

BFSQ: A Method for Processing Spatial Membership Queries

张一桢 金澈清 胡颢继 周傲英 《计算机科学与探索》2010,4(8):692-699

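With advances in pervasive computing, positioning, and mobile communication technologies, moving-object data management has been widely applied in many fields. Privacy protection is an issue that cannot be ignored in this area: users expect not only high-quality service but also the best possible protection of their private information. This paper studies the spatial membership query, which tests whether any moving object exists within a given spatial region. A major feature of the proposed BFSQ (Bloom filter-based spatial query) method is that it protects the privacy of the moving data and of user queries well, while keeping the quality of query results at a high level. Experimental results demonstrate the efficiency and effectiveness of the new method.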
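As a rough illustration of how a Bloom filter can answer a spatial membership query, the sketch below hashes the grid cell of each moving object into a bit array and tests the cells that a query rectangle overlaps. This is an assumed, simplified design and not the BFSQ method itself: the uniform grid, the cell size, the SHA-256-based hashing, and the names `BloomFilter`, `cell_of`, and `region_may_contain_object` are all hypothetical.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter: no false negatives, some false positives."""

    def __init__(self, m_bits: int = 1024, k_hashes: int = 3):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray(m_bits)  # one byte per bit, for simplicity

    def _positions(self, item):
        # Derive k positions by salting the item with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item) -> None:
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item) -> bool:
        return all(self.bits[p] for p in self._positions(item))


def cell_of(x: float, y: float, cell_size: float = 10.0) -> tuple[int, int]:
    """Map a coordinate to its grid cell (assumed uniform grid)."""
    return int(x // cell_size), int(y // cell_size)


def region_may_contain_object(bf: BloomFilter, x_min: float, y_min: float,
                              x_max: float, y_max: float,
                              cell_size: float = 10.0) -> bool:
    """True if any grid cell overlapping the rectangle tests positive."""
    cx0, cy0 = cell_of(x_min, y_min, cell_size)
    cx1, cy1 = cell_of(x_max, y_max, cell_size)
    return any((cx, cy) in bf
               for cx in range(cx0, cx1 + 1)
               for cy in range(cy0, cy1 + 1))
```

Only hashed cell identifiers are stored, so the filter does not reveal exact object positions. The trade-off is that a positive answer can be wrong, either because of a hash collision or because the object lies in an overlapping cell but outside the rectangle.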

14.

A privacy-enhancing scheme against contextual knowledge-based attacks in location-based services

Jiaxun HUA Yu LIU Yibin SHEN Xiuxia TIAN Yifeng LUO Cheqing JIN 《Frontiers of Computer Science》2020,14(3):143605-227

1 Introduction and main contributions. Location-based services are springing up around us, while leakages of users' privacy are inevitable during services. Even worse, adversaries may analyze intercepted service data, and extract more privacy like health and property. Therefore, privacy preservation is an indispensable guarantee on LBS security. Among the previous approaches to privacy preservation, k-anonymity-based ones have drawn much research attention [1-3]. However, some privacy concern will be aroused if these schemes are adopted directly. For instance, Ut issues a query "Find the nearest hotel around me" in such an area as Fig. 1 (privacy profile k = 4). DLS algorithm [2] constructs anonymity set A because these four cells have similar probabilities of being queried in the past. However, experienced adversaries can exclude some cells if they have learned rich contextual knowledge (side information) from historical data, such as features of each cell and LBS users.

15.

Online clustering of streaming trajectories

Jiali MAO Qiuge SONG Cheqing JIN Zhigang ZHANG Aoying ZHOU 《Frontiers of Computer Science》2018,12(2):245-263

With the increasing availability of modern mobile devices and location acquisition technologies, massive trajectory data of moving objects are collected continuously in a streaming manner. Clustering streaming trajectories facilitates finding the representative paths or common moving trends shared by different objects in real time. Although data stream clustering has been studied extensively in the past decade, little effort has been devoted to dealing with streaming trajectories. The main challenge lies in the strict space and time complexities of processing the continuously arriving trajectory data, combined with the difficulty of concept drift. To address this issue, we present two novel synopsis structures to extract the clustering characteristics of trajectories, and develop an incremental algorithm for the online clustering of streaming trajectories (called OCluST). It contains a micro-clustering component to cluster and summarize the most recent sets of trajectory line segments at each time instant, and a macro-clustering component to build large macro-clusters based on micro-clusters over a specified time horizon. Finally, we conduct extensive experiments on four real data sets to evaluate the effectiveness and efficiency of OCluST, and compare it with other congeneric algorithms. Experimental results show that OCluST can achieve superior performance in clustering streaming trajectories.

16.

An Approximate Algorithm for Real-Time Monitoring of Nearest Neighbors

金澈清 崇志宏 周傲英 《计算机科学与探索》2007,1(2):146-159

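The greatest challenge in processing high-speed data in a distributed environment is producing high-quality query results with few network resources. This paper studies nearest-neighbor queries in distributed environments and proposes a new filter-based method that can compute exact query results and also handle five kinds of approximate queries. The method installs a smart filter at each remote site and reduces the volume of transmitted data by setting the filter ranges appropriately. Theoretical analysis and experiments on both synthetic and real data sets show that the new method achieves high performance.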

17.

Algorithms for Finding Key Categories in Large Data Sets

许晓峰 金澈清 高明 周傲英 《计算机研究与发展》2009,46(Z2)

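The top-k query is one of the most important queries in many fields, including Web and multimedia search, decision support, and distributed systems; it returns the k most important tuples in a data set. Large data sets often contain a series of categorical attributes, and finding the k categorical attribute values that most influence a target attribute is also important in many applications. This paper studies that problem, formally defines two queries, k-AKC and PKC, and designs the corresponding query processing algorithms. Experimental results show that the improved algorithm PKCQ+ is both effective and efficient.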

18.

Design of a Meeting Scheduling System Based on Mobile Agents

金澈清 陈德人 《计算机工程与应用》2002,38(17):235-237,256

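Competition in modern society is growing fiercer and cooperation more frequent, which requires people to communicate and solve problems together more often. Because everyone's schedule differs, building a sound meeting scheduling model has become a research topic. This paper first describes how meeting scheduling systems are built with traditional methods and analyzes their drawbacks, then presents the design and implementation of a meeting scheduling system based on Mobile Agents. The scheme takes full advantage of the distributed and mobile code offered by mobile agent platforms [3] and gives each agent a degree of artificial intelligence so that meetings can be scheduled sensibly.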

19.

Continuous ranking on uncertain streams (total citations: 1; self-citations: 1; citations by others: 0)

Cheqing Jin Jingwei Zhang Aoying Zhou 《Frontiers of Computer Science》2012,6(6):686-699

Data uncertainty exists widely in web applications, financial applications, and sensor networks. Ranking queries, which return a number of tuples with maximal ranking scores, are important in the field of database management. Most existing work focuses on proposing static solutions for various ranking semantics over uncertain data. Our focus is to handle continuous ranking queries on uncertain data streams: testing each new tuple to output highly ranked tuples. The main challenge comes not only from the fact that the possible world space grows exponentially as new tuples arrive, but also from the requirement for low space and time complexity to suit streaming environments. We first study how to handle this issue exactly, and then propose a novel method (exponential sampling) to estimate the expected rank of a tuple with high quality. Theoretical analysis and detailed experimental results evaluate the proposed methods.

20.

MapReduce-based entity matching with multiple blocking functions

Cheqing Jin Jie Chen Huiping Liu 《Frontiers of Computer Science》2017,11(5):895-911

Entity matching, which aims at finding records that belong to the same real-world objects, has been studied for decades. In order to avoid verifying every pair of records in a massive data set, a common method, known as the blocking-based method, selects a small proportion of record pairs for verification at a cost far lower than O(n²), where n is the size of the data set. Furthermore, executing multiple blocking functions independently is critical since many more matching records can be found in this way, so that the quality of the query result can be improved significantly. It is popular to use the MapReduce (MR) framework to improve the performance and the scalability of some complicated queries by running a lot of map (/reduce) tasks in parallel. However, entity matching upon the MapReduce framework is non-trivial due to two inevitable challenges: load balancing and pair deduplication. In this paper, we propose a novel solution, called MrEm, to handle these challenges with the support of multiple blocking functions. Although the existing work can deal with load balancing and pair deduplication respectively, it still cannot deal with both challenges at the same time. Theoretical analysis and experimental results upon real and synthetic data sets illustrate the high effectiveness and efficiency of our proposed solutions.
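The blocking idea and the pair deduplication problem described in entry 20 can be sketched on a single machine as follows. This is only an illustration under stated assumptions, not MrEm and not a MapReduce job: the function name `candidate_pairs`, the record layout, and the two example blocking functions are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(records, blocking_functions):
    """Collect candidate record pairs from several blocking functions.

    records: dict mapping record id -> record (any object).
    blocking_functions: callables mapping a record to a block key.
    Returns a set of (id_a, id_b) tuples with id_a < id_b, so a pair
    that shares a block under several functions is kept only once.
    """
    pairs = set()
    for block_key in blocking_functions:
        blocks = defaultdict(list)
        for rid, rec in records.items():
            blocks[block_key(rec)].append(rid)
        # Only records inside the same block are compared later.
        for ids in blocks.values():
            for a, b in combinations(sorted(ids), 2):
                pairs.add((a, b))  # the set removes duplicate pairs
    return pairs


# Hypothetical example data and blocking functions.
records = {
    1: {"name": "Ann Lee", "zip": "200433"},
    2: {"name": "Ann Li", "zip": "200433"},
    3: {"name": "Bob Wu", "zip": "200433"},
    4: {"name": "Bo Wu", "zip": "100084"},
}
by_name_prefix = lambda r: r["name"][:2].lower()
by_zip = lambda r: r["zip"]
pairs = candidate_pairs(records, [by_name_prefix, by_zip])
```

Here the pair (1, 2) is produced by both blocking functions but verified only once, and 4 of the 6 possible pairs remain as candidates. In a distributed setting a shared in-memory set is not available, which is part of why the abstract calls pair deduplication a challenge.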