1.

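Tu Li (屠莉), Chen Ling (陈崚), Zou Lingjun (邹凌君). 《软件学报》 (Journal of Software), 2009, 20(7): 1756-1767

A clustering algorithm for multiple data streams based on correlation analysis is proposed. The algorithm quickly compresses the raw data of the streams into a statistical summary. From these summaries, correlation coefficients can be computed incrementally to measure the similarity between streams. An improved k-means algorithm is proposed to generate the clustering results; it adjusts the number of clusters dynamically and in real time and promptly detects changes in the evolving streams. The algorithm is also applied to the clustering-on-demand (COD) problem, so that users can query clustering results over an arbitrary time interval. A time-segment partitioning scheme is proposed so that any user-specified time interval can be assembled from these segments. Experiments on synthetic and real data show that the algorithm achieves better clustering quality, speed, and stability than other methods and reflects changes in the data streams in real time.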
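The idea of computing correlation coefficients incrementally from a compact summary can be illustrated with a minimal sketch. The abstract does not specify the summary the authors use, so this example assumes the standard running sums for the Pearson coefficient; the class name `PairSummary` and its methods are hypothetical, not the paper's API.

```python
import math


class PairSummary:
    """Running sums for two streams (an assumed summary, not the paper's).

    Six numbers are enough to recover the Pearson correlation, and two
    summaries over disjoint time segments can be merged by addition.
    """

    def __init__(self):
        self.n = 0
        self.sx = self.sy = 0.0
        self.sxx = self.syy = self.sxy = 0.0

    def update(self, x, y):
        """Absorb one synchronized pair of readings in O(1)."""
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y

    def merge(self, other):
        """Combine summaries of two disjoint time segments."""
        out = PairSummary()
        out.n = self.n + other.n
        out.sx = self.sx + other.sx
        out.sy = self.sy + other.sy
        out.sxx = self.sxx + other.sxx
        out.syy = self.syy + other.syy
        out.sxy = self.sxy + other.sxy
        return out

    def correlation(self):
        """Pearson correlation; 0.0 when it is undefined (constant stream)."""
        cov = self.n * self.sxy - self.sx * self.sy
        var_x = self.n * self.sxx - self.sx ** 2
        var_y = self.n * self.syy - self.sy ** 2
        if self.n < 2 or var_x <= 0 or var_y <= 0:
            return 0.0
        return cov / math.sqrt(var_x * var_y)
```

Because the sums are additive, a query over an arbitrary interval can be answered by merging the summaries of the segments that cover it, which is the property a clustering-on-demand scheme would rely on.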

2.

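Design and Implementation of Dynamic Incremental Clustering (动态增量聚类的设计与实现). Total citations: 2, self-citations: 0, citations by others: 2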

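Meng Haidong (孟海东), Wang Shuling (王淑玲), Hao Yongkuan (郝永宽). 《计算机工程与应用》 (Computer Engineering and Applications), 2009, 45(24): 130-132

Traditional clustering algorithms are often suited only to static data sets. For a dynamic data set, earlier clustering results are no longer reliable once new data arrive, and applying such algorithms means re-clustering from scratch, which is inefficient and wastes computing resources. Building on density-based and adaptive density-reachable clustering algorithms, a new incremental clustering algorithm is proposed. Theoretical analysis and experimental results show that the algorithm handles dynamic data sets effectively and improves clustering efficiency and resource utilization.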

3.

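Adaptive Subspace Clustering Algorithm for High-Dimensional Data Streams (高维数据流的自适应子空间聚类算法). Total citations: 1, self-citations: 0, citations by others: 1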

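Ren Jiadong (任家东), Zhou Weiwei (周玮玮), He Haitao (何海涛). 《计算机科学与探索》 (Journal of Frontiers of Computer Science and Technology), 2010, 4(9): 859-864

Clustering high-dimensional data streams is a research hotspot in data mining. Because data streams are large in volume, fast-changing, and high-dimensional, many clustering algorithms fail to achieve good clustering quality. An adaptive subspace clustering algorithm for high-dimensional data streams, SAStream, is proposed. The algorithm improves the micro-cluster structure of HPStream and defines candidate clusters; it computes the distance from each newly arrived point to the candidate-cluster centroids only within the corresponding subspace, which reduces the number of micro-clusters examined during clustering. The resulting micro-clusters are stored in a pyramidal time frame, and a time-decay function is used to delete expired micro-clusters. When the stream rate is high, the algorithm automatically adjusts the boundary radius and the cluster selection factor according to monitored system resource usage, thereby tuning the clustering granularity. Experimental results show that the algorithm offers good clustering quality and fast data processing.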
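Deleting expired micro-clusters with a time-decay function can be sketched as below. The abstract does not state SAStream's decay function, so this example assumes an exponential fade of the form 2^(-λ·Δt); `FadingMicroCluster`, `decay_rate`, `min_weight`, and `prune_expired` are hypothetical names chosen for illustration.

```python
class FadingMicroCluster:
    """Micro-cluster whose statistics fade over time (assumed 2**(-rate*dt))."""

    def __init__(self, point, now, decay_rate):
        self.weight = 1.0                 # faded number of points
        self.linear_sum = list(point)     # faded per-dimension sum
        self.last_update = now
        self.decay_rate = decay_rate

    def _fade(self, now):
        # Scale all statistics by the decay accumulated since the last update.
        factor = 2.0 ** (-self.decay_rate * (now - self.last_update))
        self.weight *= factor
        self.linear_sum = [v * factor for v in self.linear_sum]
        self.last_update = now

    def add(self, point, now):
        """Fade the old statistics, then absorb the new point."""
        self._fade(now)
        self.weight += 1.0
        self.linear_sum = [s + p for s, p in zip(self.linear_sum, point)]

    def weight_at(self, now):
        """Weight the cluster would have at time `now`, without mutating it."""
        return self.weight * 2.0 ** (-self.decay_rate * (now - self.last_update))

    def centroid(self):
        return [s / self.weight for s in self.linear_sum]


def prune_expired(clusters, now, min_weight):
    """Keep only micro-clusters whose faded weight is still above the threshold."""
    return [c for c in clusters if c.weight_at(now) >= min_weight]
```

A cluster that stops receiving points loses half its weight every `1 / decay_rate` time units under this assumption, so stale clusters eventually drop below `min_weight` and are discarded.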

4.

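Design and Implementation of an Incremental Clustering Algorithm Based on Cluster Features (基于簇特征的增量聚类算法设计与实现). Total citations: 2, self-citations: 0, citations by others: 2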

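Meng Haidong (孟海东), Wang Shuling (王淑玲), Hao Yongkuan (郝永宽). 《计算机工程与应用》 (Computer Engineering and Applications), 2010, 46(24): 132-134

For large databases such as spatial and multimedia databases, the effectiveness and scalability of traditional clustering algorithms are limited. Using a dynamic incremental approach, and building on density-based and adaptive density-reachable clustering algorithms, a new dynamic incremental clustering algorithm is designed and implemented with cluster features, following the concept of the clustering feature in the BIRCH algorithm. It addresses the effectiveness of clustering large databases as well as the space and time complexity. Theoretical analysis and experimental results show that the algorithm handles large databases effectively and gives the clustering algorithm good scalability.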
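The BIRCH clustering feature that this abstract builds on is the triple (N, LS, SS): point count, linear sum, and sum of squared norms. The sketch below shows only that general concept and its additivity; how the paper combines it with density reachability is not described in the abstract, and the class name `ClusterFeature` is a hypothetical choice.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterFeature:
    """BIRCH-style clustering feature (N, LS, SS) for one cluster."""
    n: int        # number of points
    ls: list      # per-dimension linear sum of the points
    ss: float     # sum of squared Euclidean norms of the points

    @classmethod
    def from_point(cls, point):
        return cls(1, list(point), sum(v * v for v in point))

    def merge(self, other):
        """Additivity: the feature of a union is the component-wise sum."""
        return ClusterFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            self.ss + other.ss,
        )

    def centroid(self):
        return [v / self.n for v in self.ls]

    def radius(self):
        """Root-mean-square distance of the points from the centroid."""
        c = self.centroid()
        variance = self.ss / self.n - sum(v * v for v in c)
        return math.sqrt(max(variance, 0.0))  # guard against rounding below 0
```

Because new points are absorbed by addition, an incremental algorithm can update a cluster without revisiting the points already stored in it.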

5.

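An Incremental Clustering Algorithm Based on Diffusion and Emergence (扩散涌现的增量聚类算法)

Jiang Yunfei (姜云飞), Yu Chunyan (余春艳), Shen Nan (沈楠). 《计算机工程与设计》 (Computer Engineering and Design), 2012, 33(7): 2669-2672

To cluster dynamic data effectively, handle the relationship between existing clusters and newly added data properly, use computing resources efficiently, and improve clustering efficiency, an incremental clustering algorithm based on diffusion and emergence is proposed. Building on the diffusion-and-emergence clustering algorithm, it uses the affinity propagation algorithm to refine the splitting mechanism and aggregates old and new data effectively. Experimental results show that the algorithm clusters dynamic data effectively and improves both the efficiency of aggregating dynamic data and resource utilization.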

6.

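Incremental Clustering Algorithm for Mixed-Attribute Data Based on Clustering Ensemble (基于聚类融合的混合属性数据增量聚类算法)

Li Taoying (李桃迎), Chen Yan (陈燕), Zhang Jinsong (张金松), Qin Shengjun (秦胜君). 《控制与决策》 (Control and Decision), 2012, 27(4): 603-608

Traditional incremental clustering methods are unstable, highly random, and insufficiently accurate when clustering mixed-attribute data. To address these shortcomings, an incremental clustering algorithm for mixed-attribute data based on clustering ensemble is proposed. Built on traditional incremental clustering, the algorithm fuses the results of several clustering algorithms in place of the original single partition and revises the value range of the threshold. Experiments show that the proposed algorithm exploits the characteristics of the existing data, improves the stability and precision of clustering, and clusters well.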

7.

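A Dynamic Rough Incremental Clustering Method (动态的粗糙增量聚类方法)

Hong Liangliang (洪亮亮), Luo Ke (罗可). 《计算机工程与应用》 (Computer Engineering and Applications), 2011, 47(24): 106-110

Many clustering algorithms and improved variants have been proposed in data mining, but incremental clustering methods have received less study. When a data set changes because of updates, the data mining results must be updated as well. Because the data volume is large, re-running a clustering algorithm on all the data after new data are added is clearly inefficient, so further research on incremental clustering algorithms is needed. Building on an improved rough clustering method based on a genetic algorithm (IRCBGA), an incremental rough clustering method is proposed. Numerical simulation shows that the algorithm handles well the data-update clustering problem that traditional clustering algorithms face.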

8.

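Algorithm for Discovering and Maintaining Subspace Clusters in High-Dimensional Data Streams (高维数据流子空间聚类发现及维护算法). Total citations: 3, self-citations: 2, citations by others: 3

Zhou Xiaoyun (周晓云), Sun Zhihui (孙志挥), Zhang Baili (张柏礼), Yang Yidong (杨宜东). 《计算机研究与发展》 (Journal of Computer Research and Development), 2006, 43(5): 834-840

With the recent surge of data stream applications, data mining algorithms based on the data stream model have become an important frontier topic. An algorithm based on the Hoeffding bound, SHStream, is proposed for discovering and maintaining subspace clusters in high-dimensional data streams. The algorithm divides the stream into segments (the segment length is determined by the Hoeffding bound), performs subspace clustering on each segment, and iteratively arrives at a clustering result that meets the required clustering accuracy. To cope with the dynamic nature of data streams, the algorithm also adjusts and maintains the clustering result. It handles high-dimensional data streams and data with arbitrarily shaped distributions effectively. Experiments on real and synthetic data sets show that the algorithm is applicable and effective.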
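How a Hoeffding bound can fix a segment length is shown in the sketch below. It uses the standard form of the bound, ε = sqrt(R² ln(1/δ) / (2n)), solved for n; the abstract does not say which quantity SHStream bounds or which parameter values it uses, so the function names and the arguments `value_range`, `delta`, and `epsilon` are assumptions for illustration only.

```python
import math


def hoeffding_epsilon(value_range, delta, n):
    """Deviation bound: with probability at least 1 - delta, the mean of n
    independent observations with range `value_range` lies within this
    distance of the true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def hoeffding_segment_length(value_range, delta, epsilon):
    """Smallest n for which the bound above is at most `epsilon`."""
    return math.ceil(value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2))
```

For example, with a value range of 1, an error tolerance of 0.05, and a failure probability of 0.05, the formula asks for segments of 600 points; tighter tolerances increase the segment length quadratically.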

9.

A Soft Subspace Clustering Algorithm for Data Streams (一种基于数据流的软子空间聚类算法)

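Zhu Lin (朱林), Lei Jingsheng (雷景生), Bi Zhongqin (毕忠勤), Yang Jie (杨杰). 《软件学报》 (Journal of Software), 2013, 24(11): 2610-2627

Research on clustering high-dimensional data shows that the samples in different clusters often correspond to particular subsets of the data features, so subspace clustering techniques have attracted growing attention. However, existing soft subspace clustering algorithms are all batch algorithms and do not apply well to high-dimensional data streams or large-scale data. To address this, a fuzzy scalable clustering framework is combined with an entropy-weighting soft subspace clustering algorithm to produce an effective entropy-weighting soft subspace clustering algorithm for streaming data, EWSSC (entropy-weighting streaming subspace clustering). The algorithm retains the properties of traditional soft subspace clustering and, through the fuzzy scalable clustering strategy, applies soft subspace clustering to the analysis of streaming data. Experimental results show that on high-dimensional data streams EWSSC produces results closely consistent with those of batch soft subspace clustering methods.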
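Entropy weighting in soft subspace clustering is commonly implemented by giving each dimension of a cluster a weight proportional to exp(-D/γ), where D is the within-cluster dispersion along that dimension. The sketch below shows that generic batch-style update only; the abstract does not give EWSSC's exact streaming formulas, and `entropy_weights`, `dispersions`, and `gamma` are hypothetical names.

```python
import math


def entropy_weights(dispersions, gamma):
    """Per-dimension weights for one cluster under entropy regularization.

    `dispersions[j]` is the within-cluster dispersion along dimension j and
    `gamma` > 0 controls how sharply low-dispersion dimensions are favored.
    Subtracting the minimum first keeps exp() from underflowing; it does not
    change the normalized result.
    """
    shift = min(dispersions)
    raw = [math.exp(-(d - shift) / gamma) for d in dispersions]
    total = sum(raw)
    return [r / total for r in raw]
```

Dimensions along which a cluster is tight receive larger weights, so each cluster effectively selects its own soft subspace; a large `gamma` pushes the weights toward uniform.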

10.

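DEN-Stream: A Distributed Data Stream Clustering Method (DEN-Stream:一种分布式数据流聚类方法)

《计算机应用与软件》 (Computer Applications and Software), 2016, (7)

Existing data stream clustering methods struggle to handle both data sparsity and subspace clustering, two difficulties of high-dimensional data, while distributed data streams raise further challenges for stream clustering, including online computational efficiency, communication overhead, and the fusion of multiple streams. A distributed data stream clustering method is proposed. It uses a globally uniform grid partition and decay time to support the fusion of multiple streams, and periodically checks for and deletes expired grid cells to control the synopsis size. With a single pass over multiple high-dimensional data streams, it finds arbitrarily shaped clusters in subspaces of the streams and reflects how the data distribution evolves over time. The online component is efficient with low overhead, the synopsis is compact, and the communication cost is low. Experiments show that the method clusters distributed data streams correctly and tracks their evolution, with an efficient online component and a small synopsis.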

11.

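A Grid-Based Subspace Clustering Algorithm for High-Dimensional Data Streams (一种基于网格方法的高维数据流子空间聚类算法). Total citations: 4, self-citations: 0, citations by others: 4

Sun Yufen (孙玉芬), Lu Yansheng (卢炎生). 《计算机科学》 (Computer Science), 2007, 34(4): 199-203

Based on an analysis of grid clustering methods, a subspace clustering algorithm that processes high-dimensional data streams online is designed by combining bottom-up and top-down grid methods. By exploiting the data compression ability of the bottom-up grid method and the ability of the top-down grid method to handle high-dimensional data, the algorithm quickly identifies clusters lying in different subspaces with a single scan of the stream. Theoretical analysis and experiments on several data sets show that the algorithm has high accuracy and efficiency.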

12.

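Clustering Algorithm for High-Dimensional Turnstile Data Streams (高维Turnstile型数据流聚类算法). Total citations: 3, self-citations: 1, citations by others: 3

Zhou Xiaoyun (周晓云), Zhang Jing (张净), Sun Zhihui (孙志挥). 《计算机科学》 (Computer Science), 2006, 33(11): 14-17

Existing data stream clustering algorithms can handle only Time Series and Cash Register data streams, and their accuracy on high-dimensional streams is unsatisfactory. A subspace clustering algorithm for high-dimensional Turnstile data streams, HT-Stream, is proposed. The algorithm partitions the data space into a grid, maintains the grid cell information dynamically online, stores statistics in tilted time windows, and outputs clustering results offline for a user-specified time span. Experiments on real and synthetic data sets show that the algorithm is applicable and effective.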
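In the Turnstile model an update may add or remove an occurrence, so grid cell counts must support negative increments. The sketch below illustrates online maintenance of cell counts under that model; it is not HT-Stream itself, it leaves out the tilted time windows and subspace search, and the names `TurnstileGrid`, `cell_width`, and `min_count` are hypothetical.

```python
import math
from collections import Counter


class TurnstileGrid:
    """Uniform grid whose cell counts accept both insertions and deletions."""

    def __init__(self, cell_width):
        self.cell_width = cell_width
        self.counts = Counter()

    def cell_of(self, point):
        """Map a point to the integer index of the cell that contains it."""
        return tuple(math.floor(v / self.cell_width) for v in point)

    def update(self, point, delta=1):
        """Apply a turnstile update: delta > 0 inserts, delta < 0 deletes."""
        key = self.cell_of(point)
        self.counts[key] += delta
        if self.counts[key] <= 0:
            del self.counts[key]   # drop empty cells to bound memory

    def dense_cells(self, min_count):
        """Cells holding at least `min_count` points; clusters are built from these."""
        return {k for k, c in self.counts.items() if c >= min_count}
```

A Cash Register stream is the special case in which `delta` is always positive, which is why algorithms designed for that model cannot simply be reused when deletions occur.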

13.

Multi-Event Detection in Data Streams Based on Grid Clustering (基于网格聚类的数据流多事件检测)

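Yuan Zhijian (袁志坚), Miao Jiajia (缪嘉嘉), Du Kai (杜凯), Jia Yan (贾焰). 《计算机工程与科学》 (Computer Engineering & Science), 2008, 30(9): 82-85

Event detection is one of the most important research problems in event processing systems. Anomalies, changes, and bursts are the three most typical kinds of data stream events. This paper considers how to detect several kinds of events in a data stream at the same time. It first studies the relationships among the event types, then presents a unified processing method based on grid clustering, and finally gives a scoring function for assessing the severity of an event. Experiments verify the correctness and effectiveness of the proposed method.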

14.

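Distributed Data Stream Clustering Model and Algorithm Based on Data Synopsis Description (基于数据概要描述的分布式数据流聚类模型与算法)

Mao Guojun (毛国君), Cao Yongcun (曹永存). 《计算机科学》 (Computer Science), 2013, 40(6): 187-191

Data stream mining can effectively solve the problem of knowledge discovery from high-volume streaming data and has been widely studied. A typical example of a data stream is the streaming data collected by sensors. With the spread of sensor networks, however, such data are in many cases collected and managed in a distributed fashion, which inevitably creates a need to mine data streams in a distributed way. The main obstacle to distributed data stream mining is the loss of mining quality or efficiency caused by distribution. To support clustering of distributed data streams, a mining model for distributed data streams is explored, and the corresponding synopsis data structure and key mining algorithms are designed on that model, with theoretical evaluation or experimental verification of the algorithms. Experiments show that the proposed model and algorithms effectively reduce data communication cost while maintaining high clustering quality for the global pattern.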

15.

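Research on Data Stream Clustering Algorithms (数据流聚类算法研究)

Zhu Yingwen (朱颖雯), Chen Songcan (陈松灿). 《数据采集与处理》 (Journal of Data Acquisition and Processing), 2022, 37(4): 894-908

Many applications generate large volumes of streaming data, such as network flows, web click streams, video streams, event streams, and semantic concept streams. Data stream mining, whose goal is to extract hidden knowledge and patterns from continuous streaming data, has become a hot topic. Clustering, an important problem in data stream mining, has been widely studied recently. Unlike traditional clustering of static data, data stream clustering faces many constraints, including limited memory, single-pass scanning, real-time response, and concept drift. This paper summarizes the various clustering algorithms in data stream mining. It first introduces the constraints of data stream mining, then presents a general model of data stream clustering and describes its relationship to traditional data clustering, and finally proposes further research topics and directions in data stream clustering.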

16.

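Conceptual Clustering over Data Streams (基于数据流的概念聚类)

Shi Jincheng (史金成), Hu Xuegang (胡学钢). 《计算机工程》 (Computer Engineering), 2010, 36(9): 62-64

The relationship between pairs (二元组) in a bipartite graph and the conceptual clustering problem is analyzed. On this basis, and taking the characteristics of data streams into account, a conceptual clustering algorithm is proposed for data streams whose object attributes are Boolean. The stream is divided into segments; for each arriving batch, a local set of approximately maximal ε-pairs is generated and the global set of approximately maximal ε-pairs is updated, so that the whole stream is clustered effectively. Experimental results show that the algorithm is efficient in both time and space.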

17.

On High Dimensional Projected Clustering of Data Streams. Total citations: 3, self-citations: 0, citations by others: 3

Charu C. Aggarwal, Jiawei Han, Jianyong Wang, Philip S. Yu. 《Data Mining and Knowledge Discovery》, 2005, 10(3): 251-273

The data stream problem has been studied extensively in recent years, because of the great ease in collection of stream data. The nature of stream data makes it essential to use algorithms which require only one pass over the data. Recently, single-scan stream analysis methods have been proposed in this context. However, a lot of stream data is high-dimensional in nature. High-dimensional data is inherently more complex in clustering, classification, and similarity search. Recent research discusses methods for projected clustering over high-dimensional data sets. These methods are, however, difficult to generalize to data streams because of their complexity and the large volume of the data streams. In this paper, we propose a new high-dimensional projected data stream clustering method, called HPStream. The method incorporates a fading cluster structure and the projection-based clustering methodology. It is incrementally updatable and is highly scalable in both the number of dimensions and the size of the data streams, and it achieves better clustering quality in comparison with the previous stream clustering methods. Our performance study with both real and synthetic data sets demonstrates the efficiency and effectiveness of our proposed framework and implementation methods.

Charu C. Aggarwal received his B.Tech. degree in Computer Science from the Indian Institute of Technology (1993) and his Ph.D. degree in Operations Research from the Massachusetts Institute of Technology (1996). He has been a Research Staff Member at the IBM T. J. Watson Research Center since June 1996. He has applied for or been granted over 50 US patents, and has published over 75 papers in numerous international conferences and journals. He has twice been designated Master Inventor at IBM Research, in 2000 and 2003, for the commercial value of his patents. His contributions to the Epispire project on real time attack detection were awarded the IBM Corporate Award for Environmental Excellence in 2003. He has been a program chair of the DMKD 2003, chair for all workshops organized in conjunction with ACM KDD 2003, and is also an associate editor of the IEEE Transactions on Knowledge and Data Engineering Journal. His current research interests include algorithms, data mining, privacy, and information retrieval.

Jiawei Han is a Professor in the Department of Computer Science at the University of Illinois at Urbana–Champaign. He has been working on research into data mining, data warehousing, stream and RFID data mining, spatiotemporal and multimedia data mining, biological data mining, social network analysis, text and Web mining, and software bug mining, with over 300 conference and journal publications. He has chaired or served in many program committees of international conferences and workshops, including ACM SIGKDD Conferences (2001 best paper award chair, 1996 PC co-chair), SIAM-Data Mining Conferences (2001 and 2002 PC co-chair), ACM SIGMOD Conferences (2000 exhibit program chair), International Conferences on Data Engineering (2004 and 2002 PC vice-chair), and International Conferences on Data Mining (2005 PC co-chair). He also served or is serving on the editorial boards for Data Mining and Knowledge Discovery, IEEE Transactions on Knowledge and Data Engineering, Journal of Computer Science and Technology, and Journal of Intelligent Information Systems. He is currently serving on the Board of Directors for the Executive Committee of ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD). Jiawei has received three IBM Faculty Awards, the Outstanding Contribution Award at the 2002 International Conference on Data Mining, ACM Service Award (1999) and ACM SIGKDD Innovation Award (2004). He is an ACM Fellow (since 2003). He is the first author of the textbook “Data Mining: Concepts and Techniques” (Morgan Kaufmann, 2001).

Jianyong Wang received the Ph.D. degree in computer science in 1999 from the Institute of Computing Technology, the Chinese Academy of Sciences. Since then, he has worked as an assistant professor in the Department of Computer Science and Technology, Peking (Beijing) University in the areas of distributed systems and Web search engines (May 1999–May 2001), and visited the School of Computing Science at Simon Fraser University (June 2001–December 2001), the Department of Computer Science at the University of Illinois at Urbana-Champaign (December 2001–July 2003), and the Digital Technology Center and Department of Computer Science and Engineering at the University of Minnesota (July 2003–November 2004), mainly working in the area of data mining. He is currently an associate professor in the Department of Computer Science and Technology, Tsinghua University, Beijing, China.

Philip S. Yu is the manager of the Software Tools and Techniques group at the IBM Thomas J. Watson Research Center. The current focuses of the project include the development of advanced algorithms and optimization techniques for data mining, anomaly detection and personalization, and the enabling of Web technologies to facilitate E-commerce and pervasive computing. Dr. Yu's research interests include data mining, Internet applications and technologies, database systems, multimedia systems, parallel and distributed processing, disk arrays, computer architecture, performance modeling and workload analysis. Dr. Yu has published more than 340 papers in refereed journals and conferences. He holds or has applied for more than 200 US patents. Dr. Yu is an IBM Master Inventor. Dr. Yu is a Fellow of the ACM and a Fellow of the IEEE. He will become the Editor-in-Chief of IEEE Transactions on Knowledge and Data Engineering in Jan. 2001. He is an associate editor of ACM Transactions on Internet Technology and also Knowledge and Information Systems Journal. He is a member of the IEEE Data Engineering steering committee. He also serves on the steering committee of IEEE Intl. Conference on Data Mining. He received an IEEE Region 1 Award for “promoting and perpetuating numerous new electrical engineering concepts”. Philip S. Yu received the B.S. Degree in E.E. from National Taiwan University, Taipei, Taiwan, the M.S. and Ph.D. degrees in E.E. from Stanford University, and the M.B.A. degree from New York University.

18.

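Data Stream Clustering Algorithm Based on Frequent Patterns (基于频繁模式的数据流聚类算法)

Shi Zhiying (史志英), Zhang Wei (张伟), Chen Chunyan (陈春燕). 《微计算机应用》 (Microcomputer Applications), 2008, 29(1): 50-53

Data streams are unbounded in volume and arrive at high speed. To address these challenges, this paper discusses a data stream clustering algorithm based on frequent patterns. The algorithm uses a modified FP-Tree; an array added for tree updates reduces the time spent traversing the tree, which greatly improves the efficiency of the algorithm.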

19.

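Data Stream Clustering Algorithm Based on Particle Swarm Optimization (基于粒子群优化算法的数据流聚类算法). Total citations: 1, self-citations: 0, citations by others: 1

Xiao Yuquan (肖裕权), Zhou Siqing (周肆清). 《微机发展》 (Microcomputer Development), 2011, (10): 43-46, 50

To address the loss of original data information in current sliding-window clustering algorithms and to improve clustering quality and accuracy, a new data stream clustering algorithm based on swarm-cooperative particle swarm optimization (PSO) is proposed, building on existing data stream clustering algorithms that use the sliding-window model. The algorithm uses an improved exponential histogram of temporal cluster features as the synopsis structure of the stream and applies PSO for local iterative optimization of clustering quality during clustering. Experimental results show that the method effectively reduces memory overhead and resolves the loss of original data information. Compared with traditional data stream clustering algorithms, it is clearly better in clustering quality and accuracy.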