共查询到20条相似文献,搜索用时 15 毫秒
1.
Regant Y.S. Hung 《Information Processing Letters》2010,110(7):257-260
In this paper, we consider the problem of finding ?-approximate frequent items over a sliding window of size N. A recent work by Lee and Ting (2006) [7] solves the problem by giving an algorithm that supports query and update time, and uses space. Their query time and memory usage are essentially optimal, but the update time is not. We give a new algorithm that supports O(1) update time with high probability while maintaining the query time and memory usage as . 相似文献
2.
The discrete cosine transform (DCT) has been successfully used for a wide range of applications in digital signal processing. While there are efficient algorithms for implementing the DCT, its use becomes difficult in the sliding transform scenario where the transform window is shifted one sample at a time and the transform process is repeated. In this paper, a new two-dimensional sliding DCT (2-D SDCT) algorithm is proposed for fast implementation of the DCT on 2-D sliding windows. In the proposed algorithm, the DCT coefficients of the shifted window are computed by exploiting the recursive relationship between 2-D DCT outputs of three successive windows. The theoretical analysis shows that the computational requirement of the proposed 2-D SDCT algorithm is the lowest among existing 2-D DCT algorithms. Moreover, the proposed algorithm enables independent updating of each DCT coefficient. 相似文献
3.
We study the problem of maintaining a sketch of recent elements of a data stream. Motivated by applications involving network
data, we consider streams that are asynchronous, in which the observed order of data is not the same as the time order in which the data was generated. The notion of recent
elements of a stream is modeled by the sliding timestamp window, which is the set of elements with timestamps that are close to the current time. We design algorithms for maintaining sketches
of all elements within the sliding timestamp window that can give provably accurate estimates of two basic aggregates, the
sum and the median, of a stream of numbers. The space taken by the sketches, the time needed for querying the sketch, and
the time for inserting new elements into the sketch are all polylogarithmic with respect to the maximum window size. Our sketches
can be easily combined in a lossless and compact way, making them useful for distributed computations over data streams. Previous
works on sketching recent elements of a data stream have all considered the more restrictive scenario of synchronous streams,
where the observed order of data is the same as the time order in which the data was generated. Our notion of recency of elements
is more general than that studied in previous work, and thus our sketches are more robust to network delays and asynchrony.
The work of the authors was supported in part through NSF grants CNS 0520102 and CNS 0520009.
A preliminary version of this paper appeared in Proceedings of the ACM Symposium on Principles of Distributed Computing (PODC)
2006, pages 82–91.
Work done while the third author was at Rensselaer Polytechnic Institute.
Authors are listed in reverse alphabetical order. 相似文献
4.
Interactive high-performance computing is doubtlessly beneficial for many computational science and engineering applications whenever simulation results should be visually processed in real time, i.e. during the computation process. Nevertheless, interactive HPC entails a lot of new challenges that have to be solved – one of them addressing the fast and efficient data transfer between a simulation back end and visualisation front end, as several gigabytes of data per second are nothing unusual for a simulation running on some (hundred) thousand cores. Here, a new approach based on a sliding window technique is introduced that copes with any bandwidth limitations and allows users to study both large and small scale effects of the simulation results in an interactive fashion. 相似文献
5.
Tracking clusters in evolving data streams over sliding windows 总被引:2,自引:4,他引:2
Mining data streams poses great challenges due to the limited memory availability and real-time query response requirement.
Clustering an evolving data stream is especially interesting because it captures not only the changing distribution of clusters
but also the evolving behaviors of individual clusters. In this paper, we present a novel method for tracking the evolution
of clusters over sliding windows. In our SWClustering algorithm, we combine the exponential histogram with the temporal cluster
features, propose a novel data structure, the Exponential Histogram of Cluster Features (EHCF). The exponential histogram
is used to handle the in-cluster evolution, and the temporal cluster features represent the change of the cluster distribution.
Our approach has several advantages over existing methods: (1) the quality of the clusters is improved because the EHCF captures
the distribution of recent records precisely; (2) compared with previous methods, the mechanism employed to adaptively maintain the in-cluster synopsis
can track the cluster evolution better, while consuming much less memory; (3) the EHCF provides a flexible framework for analyzing
the cluster evolution and tracking a specific cluster efficiently without interfering with other clusters, thus reducing the
consumption of computing resources for data stream clustering. Both the theoretical analysis and extensive experiments show
the effectiveness and efficiency of the proposed method.
Aoying Zhou is currently a Professor in Computer Science at Fudan University, Shanghai, P.R. China. He won his Bachelor and Master degrees
in Computer Science from Sichuan University in Chengdu, Sichuan, P.R. China in 1985 and 1988, respectively, and Ph.D. degree
from Fudan University in 1993. He served as the member or chair of program committee for many international conferences such
as WWW, SIGMOD, VLDB, EDBT, ICDCS, ER, DASFAA, PAKDD, WAIM, and etc. His papers have been published in ACM SIGMOD, VLDB, ICDE,
and several other international journals. His research interests include Data mining and knowledge discovery, XML data management,
Web mining and searching, data stream analysis and processing, peer-to-peer computing.
Feng Cao is currently an R&D engineer in IBM China Research Laboratories. He received a B.E. degree from Xi'an Jiao Tong University,
Xi'an, P.R. China, in 2000 and an M.E. degree from Huazhong University of Science and Technology, Wuhan, P.R. China, in 2003.
From October 2004 to March 2005, he worked in Fudan-NUS Competency Center for Peer-to-Peer Computing, Singapore. In 2006,
he received his Ph.D. degree from Fudan University, Shanghai, P.R. China. His current research interests include data mining
and data stream.
Weining Qian is currently an Assistant Professor in computer science at Fudan University, Shanghai, P.R. China. He received his M.S. and
Ph.D. degree in computer science from Fudan University in 2001 and 2004, respectively. He is supported by Shanghai Rising-Star
Program under Grant No. 04QMX1404 and National Natural Science Foundation of China (NSFC) under Grant No. 60673134. He served
as the program committee member of several international conferences, including DASFAA 2006, 2007 and 2008, APWeb/WAIM 2007,
INFOSCALE 2007, and ECDM 2007. His papers have been published in ICDE, SIAM DM, and CIKM. His research interests include data
stream query processing and mining, and large-scale distributed computing for database applications.
Cheqing Jin is currently an Assistant Professor in Computer Science at East China University of Science and Technology. He received his
Bachelor and Master degrees in Computer Science from Zhejiang University in Hangzhou, P.R. China in 1999 and 2002, respectively,
and the Ph.D. degree from Fudan University, Shanghai, P.R. China. He worked as a Research Assistant at E-business Technology
Institute, the Hong Kong University from December 2003 to May 2004. His current research interests include data mining and
data stream. 相似文献
6.
提出一种基于滑动窗口的概率数据流聚类方法PWStream。PWStream采用聚类特征指数直方图保存最近数据元组的信息摘要,在允许的误差范围内删除过期的数据元组;并针对数据流上概率元组提出强簇、过渡簇和弱簇的概念,设计了一种基于距离和存在概率的簇选择策略,从而可以发现更多的强簇。理论分析和实验结果表明,该方法具有良好的聚类质量和较快的数据处理能力。 相似文献
7.
可重写循环滑动窗口:面向高效的在线数据流处理 总被引:2,自引:0,他引:2
滑动窗口是在线数据流处理中的重要技术和基础设施。针对当前基于向量模型的滑动窗口存在滑动过程中需要移动过多数据,而导致效率不高的问题,本文提出一种可重写循环的滑动窗口技术。该技术在滑动过程中不移动数据,而是采用重写的方式来完成数据更新,并且它能够与当前滑动窗口无缝集成。理论分析和实验对比表明,该技术有显著的效率提升,能够高效地应用于实际的数据流处理。 相似文献
8.
Two parallel algorithms for simulating a sliding window communication protocol are described. The first exploits the parallel generation of interarrival, service and transfer times, whilst the second is based on parallel prefix computations. Both algorithms allow the processors to work largely asynchronously with each other. The efficiency of the two approaches is evaluated experimentally, using an implementation on a cluster of 12 processors connected by fast Ethernet. 相似文献
9.
This study presents an adaptive neuro-fuzzy sliding-mode-based genetic algorithm (ANFSGA) control system for a remotely operated vehicle (ROV) with four degrees of freedom (DOF)s. In many applications, ROVs will need to be capable of maneuvering to any given point, following object, and to be controllable from the surface. Therefore, an ANFSGA control system is introduced for tracking control of the ROV to achieve a high precision position control. Since the dynamic of ROVs are highly nonlinear and time varying, an ANFSGA control system is investigated according to direction-based genetic algorithm (GA) with the spirit of sliding mode control and adaptive neuro-fuzzy sliding mode (ANFS) based evolutionary procedure. In this way, on-line learning ability is employed to deal with the parametric uncertainty and disturbance by adjusting the ANFS inference parameters. In this proposed controller a GA control system is utilized to be the major controller, and stability can be indirectly insured by the concept of sliding mode control system without strict constraints and detailed system knowledge. 相似文献
10.
On-line neural training algorithm with sliding mode control and adaptive learning rate 总被引:1,自引:0,他引:1
This paper presents a new algorithm for on-line artificial neural networks (ANN) training. The network topology is a standard multilayer perceptron (MLP) and the training algorithm is based on the theory of variable structure systems (VSS) and sliding mode control (SMC). The main feature of this novel procedure is the adaptability of the gain (learning rate), which is obtained from sliding mode surface so that system stability is guaranteed. 相似文献
11.
基于衰减滑动窗口数据流聚类算法研究 总被引:2,自引:0,他引:2
数据流具有数据流量大、流量连续且快速、难以存储和恢复等特性,其挖掘质量和效率是检验挖掘算法的重要标准.传统的数据流聚类挖掘算法是基于界标窗口、滑动窗口和衰减窗口模型,其算法的聚类质量较差,时间复杂度高等不足,就此类问题,研究一种滑动衰减窗口的数据流聚类算法,并对算法进行了设计与实现,有效的改善传统数据流算法聚类质量和时间效率的问题.仿真实验结果表明了该算法的有效性,达到了较满意的效果. 相似文献
12.
流数据的统计是许多决策支持系统的关键所在。研究了流数据的分布特点,定义了评价函数F,设计了一种系统框架,扩展了指数级直方图,提出了松散性指数级直方图及其动态维护算法,基于滑动窗口技术解决了流数据的统计问题。该方案利用o((1/ε)log~2N)比特的空间,解决了流数据最近N个数据中值为l的个数统计问题,并保证相对误差不大于ε。理论和实践表明,F值越大,其优势越明显。 相似文献
13.
We present a solution to the conjugacy decision problem and the conjugacy search problem in Garside groups, which is theoretically simpler than the usual one, with no loss of efficiency. This is done by replacing the well-known cycling and decycling operations by a new one, called cyclic sliding, which appears to be a more natural choice. 相似文献
14.
Catch the moment: maintaining closed frequent itemsets over a data stream sliding window 总被引:5,自引:7,他引:5
Yun Chi Haixun Wang Philip S. Yu Richard R. Muntz 《Knowledge and Information Systems》2006,10(3):265-294
This paper considers the problem of mining closed frequent itemsets over a data stream sliding window using limited memory space. We design a synopsis data structure to monitor transactions in the sliding window so that we can output the current closed frequent itemsets at any time. Due to time and memory constraints, the synopsis data structure cannot monitor all possible itemsets. However, monitoring only frequent itemsets will make it impossible to detect new itemsets when they become frequent. In this paper, we introduce a compact data structure, the closed enumeration tree (CET), to maintain a dynamically selected set of itemsets over a sliding window. The selected itemsets contain a boundary between closed frequent itemsets and the rest of the itemsets. Concept drifts in a data stream are reflected by boundary movements in the CET. In other words, a status change of any itemset (e.g., from non-frequent to frequent) must occur through the boundary. Because the boundary is relatively stable, the cost of mining closed frequent itemsets over a sliding window is dramatically reduced to that of mining transactions that can possibly cause boundary movements in the CET. Our experiments show that our algorithm performs much better than representative algorithms for the sate-of-the-art approaches.
Yun Chi is currently a Ph.D. student at the Department of Computer Science, UCLA. His main areas of research include database systems, data mining, and bioinformatics. For data mining, he is interested in mining labeled trees and graphs, mining data streams, and mining data with uncertainty.
Haixun Wang is currently a research staff member at IBM T. J. Watson Research Center. He received the B.S. and the M.S. degree, both in computer science, from Shanghai Jiao Tong University in 1994 and 1996. He received the Ph.D. degree in computer science from the University of California, Los Angeles in 2000. He has published more than 60 research papers in referred international journals and conference proceedings. He is a member of the ACM, the ACM SIGMOD, the ACM SIGKDD, and the IEEE Computer Society. He has served in program committees of international conferences and workshops, and has been a reviewer for some leading academic journals in the database field.
Philip S. Yureceived the B.S. Degree in electrical engineering from National Taiwan University, the M.S. and Ph.D. degrees in electrical engineering from Stanford University, and the M.B.A. degree from New York University. He is with the IBM Thomas J. Watson Research Center and currently manager of the Software Tools and Techniques group. His research interests include data mining, Internet applications and technologies, database systems, multimedia systems, parallel and distributed processing, and performance modeling. Dr. Yu has published more than 430 papers in refereed journals and conferences. He holds or has applied for more than 250 US patents.Dr. Yu is a Fellow of the ACM and a Fellow of the IEEE. He is associate editors of ACM Transactions on the Internet Technology and ACM Transactions on Knowledge Discovery in Data. He is a member of the IEEE Data Engineering steering committee and is also on the steering committee of IEEE Conference on Data Mining. He was the Editor-in-Chief of IEEE Transactions on Knowledge and Data Engineering (2001–2004), an editor, advisory board member and also a guest co-editor of the special issue on mining of databases. He had also served as an associate editor of Knowledge and Information Systems. In addition to serving as program committee member on various conferences, he will be serving as the general chairman of 2006 ACM Conference on Information and Knowledge Management and the program chairman of the 2006 joint conferences of the 8th IEEE Conference on E-Commerce Technology (CEC' 06) and the 3rd IEEE Conference on Enterprise Computing, E-Commerce and E-Services (EEE' 06). He was the program chairman or co-chairs of the 11th IEEE International Conference on Data Engineering, the 6th Pacific Area Conference on Knowledge Discovery and Data Mining, the 9th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, the 2nd IEEE International Workshop on Research Issues on Data Engineering:Transaction and Query Processing, the PAKDD Workshop on Knowledge Discovery from Advanced Databases, and the 2nd IEEE International Workshop on Advanced Issues of E-Commerce and Web-based Information Systems. He served as the general chairman of the 14th IEEE International Conference on Data Engineering and the general co-chairman of the 2nd IEEE International Conference on Data Mining. He has received several IBM honors including 2 IBM Outstanding Innovation Awards, an Outstanding Technical Achievement Award, 2 Research Division Awards and the 84th plateau of Invention Achievement Awards. He received an Outstanding Contributions Award from IEEE International Conference on Data Mining in 2003 and also an IEEE Region 1 Award for “promoting and perpetuating numerous new electrical engineering concepts" in 1999. Dr. Yu is an IBM Master Inventor.
Richard R. Muntz is a Professor and past chairman of the Computer Science Department, School of Engineering and Applied Science, UCLA. His current research interests are sensor rich environments, multimedia storage servers and database systems, distributed and parallel database systems, spatial and scientific database systems, data mining, and computer performance evaluation. He is the author of over one hundred and fifty research papers.Dr. Muntz received the BEE from Pratt Institute in 1963, the MEE from New York University in 1966, and the Ph.D. in Electrical Engineering from Princeton University in 1969. He is a member of the Board of Directors for SIGMETRICS and past chairman of IFIP WG7.3 on performance evaluation. He was a member of the Corporate Technology Advisory Board at NCR/Teradata, a member of the Science Advisory Board of NASA's Center of Excellence in Space Data Information Systems, and a member of the Goddard Space Flight Center Visiting Committee on Information Technology. He recently chaired a National Research Council study on “The Intersection of Geospatial Information and IT” which was published in 2003. He was an associate editor for the Journal of the ACM from 1975 to 1980 and the Editor-in-Chief of ACM Computing Surveys from 1992 to 1995. He is a Fellow of the ACM and a Fellow of the IEEE. 相似文献
15.
The functionLM, which arises in the pinwheel scheduling problem, was previously known to be computable in polynomial time. In this paper
we present a practical algorithm to computeLM that runs in linear time. 相似文献
16.
提出一种可以有效求解带时间窗的车辆调度问题的灾变遗传算法.遗传算法作为一种高效的启发式算法被用于解决这类组合优化问题,但是该算法存在过早收敛、易陷入局部最优等缺陷.针对此问题,在搜索过程中采用灾变算子使遗传算法跳出局部最优,并针对车辆调度问题设计一种可以直接产生可行解的交叉算子,避免染色体交叉过程中产生不可行的子代.通过仿真算例验证了所提出的算法求解带时间窗的车辆调度问题的有效性;通过与标准遗传算法、改进遗传算法和粒子群算法的比较,进一步验证了灾变遗传算法在优化性能以及算法鲁棒性方面的优势. 相似文献
17.
Yoh-Han Pao Kambiz Komeyli Darlene Shei Steven Leclair Alan Winn 《Journal of Intelligent Manufacturing》1993,4(1):23-32
In this paper we report on continuing research on the organization and functionalities of a certain type of computer-implemented associative memory. The associative memory in question is being created to serve as part of a feature-based design system, at present to be used primarily in support of the design, fabrication planning, or inspection planning of discrete mechanical machine parts. This present effort is consonant with prior related work in the realm of case-based reasoning, especially as related to the role of memory in design. Our associative memory innovations are in the use of fuzzy sets and neural net computing in the representation, storage and retrieval of design, fabrication, inspection and materials knowledge. We have designed and implemented a considerable portion of the associative memory and have demonstrated retrieval of previous designs on the basis of qualitative geometry. We have also demonstrated ability to explore materials composition with the objective of meeting critical materials properties constraints. 相似文献
18.
滑动窗口聚集查询在数据流管理系统中应用广泛,数据流到达高峰期,必须考虑滑动窗口聚集查询中出现的降载问题。分析了子集模型的特点和已有降载策略的不足,给出了数据流滑动窗口聚集查询降载问题的约束条件,提出了能保证子集结果产生的基于丢弃窗口更新策略的降载算法。理论分析和实验结果表明,该算法对数据流滑动窗口聚集查询降载问题的处理具有较高的有效性和实用性。 相似文献
19.
Shakhar Smorodinsky 《Information Processing Letters》2008,109(1):44-45
In a FOCS 1990 paper, S. Irani proved that the First-Fit online algorithm for coloring a graph uses at most O(klogn) colors for k-inductive graphs. In this note we provide a very short proof of this fact. 相似文献
20.
This paper has two complementary focuses. The first is the system design and algorithmic development for air traffic control (ATC) using an associative SIMD processor (AP). The second is the comparison of this implementation with a multiprocessor implementation and the implications of these comparisons. This paper demonstrates how one application, ATC, can more easily, more simply, and more efficiently be implemented on an AP than is generally possible on other types of traditional hardware. The AP implementation of ATC will take advantage of its deterministic hardware to use static scheduling. The software will be dramatically smaller and cheaper to create and maintain. Likewise, a large AP system will be considerably simpler and cheaper than the MIMD hardware currently used. While APs were used for ATC-type applications earlier, these are no longer available. We use a ClearSpeed CSX600 accelerator to emulate the AP solutions of ATC on an ATC prototype consisting of eight data-intensive ATC real-time tasks. Its performance is compared with an 8-core multiprocessor (MP) using OpenMP. Our extensive experiments show that the AP implementation meets all deadlines while the MP will regularly miss a large number of deadlines. The AP code will be similar in size to sequential code for the same tasks and will avoid all of the additional support software needed with an MP to handle dynamic scheduling, load balancing, shared resource management, race conditions, false sharing, etc. At this point, essentially only MIMD systems are built. Many of the advantages of using an AP to solve an ATC problem would carry over to other applications. AP solutions for a wide variety of applications will be cited in this paper. Applications that involve a high degree of data parallelism such as database management, text processing, image processing, graph processing, bioinformatics, weather modeling, managing UAS (Unmanned Aircraft Systems or drones) etc., are good candidates for AP solutions. This raises the issue of whether we should routinely consider using non-multiprocessor hardware like the AP for applications where substantially simpler software solutions will normally exist. It also raises the question of whether the use of both AP and MIMD hardware in a single hetergeneous system could provide more versatility and efficiency. Either the AP or MIMD could serve as the primary system, but could hand off jobs it could not handle efficiently to the other system. 相似文献