19 similar documents found; search time 171 ms
1.
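In data warehouse, data mining, and online analytical processing (OLAP) systems, massive data loading does not happen constantly, but its efficiency directly affects system performance, so loading massive data efficiently is very important. Two techniques are proposed: initial loading of massive data based on the UB-Tree, and incremental loading of massive data. The paper describes the UB-Tree-based loading techniques and their algorithms, proposes a model for massive data loading, and shows how to build the initial load on a UB-Tree and how to perform incremental loading on an existing UB-Tree. Performance analysis shows that the algorithms reduce I/O and CPU cost, making this an effective method for loading massive data.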
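As an illustration of the two loading modes named in this abstract, here is a minimal sketch. It relies on the general fact that a UB-Tree indexes multidimensional points by their Z-order (Morton) address; everything else is an assumption made for illustration and is not the paper's algorithm: the function names `z_address`, `bulk_load`, and `incremental_load`, the two-dimensional integer keys, the flat list of leaf pages standing in for the tree, and the full repacking on incremental load (a real UB-Tree would split only the affected regions).

```python
import heapq
from typing import List, Tuple

Point = Tuple[int, int]

def z_address(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y into one Z-order (Morton) address."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # bits of x go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # bits of y go to odd positions
    return z

def _key(p: Point) -> int:
    return z_address(p[0], p[1])

def _pack(ordered: List[Point], page_size: int) -> List[List[Point]]:
    """Cut a Z-ordered list into fixed-size leaf pages."""
    return [ordered[i:i + page_size] for i in range(0, len(ordered), page_size)]

def bulk_load(points: List[Point], page_size: int = 4) -> List[List[Point]]:
    """Initial load: sort once by Z-address, then fill pages sequentially."""
    return _pack(sorted(points, key=_key), page_size)

def incremental_load(pages: List[List[Point]], new_points: List[Point],
                     page_size: int = 4) -> List[List[Point]]:
    """Incremental load (simplified): merge the sorted batch into the
    already Z-ordered pages in one sequential pass, then repack."""
    existing = [p for page in pages for p in page]
    merged = list(heapq.merge(existing, sorted(new_points, key=_key), key=_key))
    return _pack(merged, page_size)
```

Sorting the batch before touching the pages is what keeps the pass sequential: both inputs to the merge are already in Z-order, so no page is visited twice.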
2.
Join algorithms based on tertiary storage  Total citations: 2, self-citations: 0, citations by others: 2
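This paper studies join algorithms for massive relational databases on tertiary storage. Among existing join algorithms for tape-resident data, those based on hashing perform best. However, these algorithms do not consider how tape positioning time affects performance when data are read from tertiary storage. Random positioning of the tape head is slow, and it is a key factor in the time complexity of data operations on tertiary storage. To address this problem, two new join algorithms for massive relational databases on tertiary storage are proposed: Disk-Based-Hash-Join and Tertiary-Only-Hash-Join. Both use disk buffering and store hashed data contiguously, which reduces the time spent on random tape head positioning and improves join performance on tertiary storage. Theoretical analysis and experimental results show that the proposed algorithms outperform all existing algorithms of the same kind and can be applied effectively in massive data management systems.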
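The hashing idea behind this family of algorithms can be sketched with a generic partitioned hash join. This is not Disk-Based-Hash-Join or Tertiary-Only-Hash-Join themselves; the function names `partition` and `hash_join`, the dictionary row format, and the in-memory lists standing in for disk buffers are all assumptions made for illustration. The point it shows is that each bucket is written as one contiguous run, so the join phase reads every bucket in a single sequential pass instead of seeking randomly.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]

def partition(rows: List[Row], key: str, n_buckets: int) -> List[List[Row]]:
    """Scatter rows into buckets by hash of the join key.
    Each bucket is a contiguous run that can later be read sequentially."""
    buckets: List[List[Row]] = [[] for _ in range(n_buckets)]
    for row in rows:
        buckets[hash(row[key]) % n_buckets].append(row)
    return buckets

def hash_join(r_rows: List[Row], s_rows: List[Row], key: str,
              n_buckets: int = 4) -> List[Row]:
    """Join R and S on `key`, one pair of matching buckets at a time."""
    out: List[Row] = []
    r_parts = partition(r_rows, key, n_buckets)
    s_parts = partition(s_rows, key, n_buckets)
    for r_part, s_part in zip(r_parts, s_parts):
        # Build a hash table on the R bucket, then probe it with the S bucket.
        table = defaultdict(list)
        for r in r_part:
            table[r[key]].append(r)
        for s in s_part:
            for r in table.get(s[key], []):
                out.append({**r, **s})
    return out
```

Rows with equal keys always land in buckets with the same index, so only bucket pairs with matching indices need to be compared.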
3.
An incremental updating algorithm for association rules based on MapReduce  Total citations: 1, self-citations: 0, citations by others: 1
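With its powerful storage and computing capacity, cloud computing has become an effective way to solve massive data mining problems. The classic incremental updating algorithm for association rules, FUP, must scan the original data set repeatedly and is therefore unsuitable for massive data. Aiming to make incremental updating of association rules on massive data more efficient, this paper combines FUP with the MapReduce programming model of cloud computing and proposes MRFUP, a MapReduce-based incremental updating algorithm for association rules. MRFUP scans the original data set only once and makes full use of the storage and parallel computing capacity of cloud computing. Experiments on Hadoop show that MRFUP improves the capacity and efficiency of massive data processing and is suitable for association rule mining on massive data.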
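To make the idea of incremental updating concrete, here is a heavily simplified sketch in map/reduce style. It is not MRFUP: it handles single items only, runs in one process rather than on Hadoop, and assumes that support counts for all items were kept from one earlier scan of the original data set. The names `map_phase`, `reduce_phase`, and `update_frequent` are hypothetical.

```python
from collections import Counter
from itertools import chain
from typing import Iterable, List, Set, Tuple

def map_phase(transactions: Iterable[List[str]]) -> Iterable[Tuple[str, int]]:
    """Map: emit (item, 1) once for each transaction that contains the item."""
    return chain.from_iterable(((item, 1) for item in set(t)) for t in transactions)

def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> Counter:
    """Reduce: sum the emitted counts per item."""
    counts: Counter = Counter()
    for item, n in pairs:
        counts[item] += n
    return counts

def update_frequent(old_counts: Counter, old_size: int,
                    increment: List[List[str]],
                    min_support: float) -> Tuple[Counter, int, Set[str]]:
    """Merge counts from the increment into the stored counts.
    Only the increment is scanned; the original data set is not re-read."""
    new_counts = old_counts + reduce_phase(map_phase(increment))
    total = old_size + len(increment)
    frequent = {item for item, c in new_counts.items() if c / total >= min_support}
    return new_counts, total, frequent
```

A usage example: with four original transactions and a two-transaction increment, an item that was infrequent before can become frequent without the original data being scanned again.

```python
original = [["a", "b"], ["a", "c"], ["a"], ["b"]]
stored = reduce_phase(map_phase(original))          # one scan of the original data
counts, total, frequent = update_frequent(stored, len(original),
                                          [["c"], ["c", "b"]], min_support=0.5)
```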
4.
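The telecom business analysis support system (BASS) plays an important role in decision support. As data volumes keep growing, processing these data effectively to extract valuable information has become a prominent problem, and cloud computing offers a new and effective way to solve it. To address the shortcomings of the existing storage, processing, and ETL algorithms in BASS, this paper draws on cloud computing to propose distributed massive data storage, massive data management with HBase, the Map/Reduce programming model, a split-based massive data processing algorithm (SMB-DP), and a greedy ETL task scheduling algorithm based on task running time and priority (AGB-ETL). The paper focuses on improving and optimizing the existing business analysis system in these respects.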
5.
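Online aggregation on data cubes has usually required extra space to store the information needed for estimation, which severely degrades the storage and maintenance performance of the data cube. This paper proposes PE (progressively estimate), a QC-Tree-based online aggregation algorithm for range queries, and HPE (hybrid progressively estimate), a hybrid algorithm that combines PE with a simple aggregation algorithm. It also proposes MPE (multiple progressively estimate), an online aggregation algorithm that processes several range queries at once. Unlike earlier online aggregation algorithms, these need no extra space; they estimate aggregate results from the aggregate data and semantic relationships already stored in the QC-Tree. Because the QC-Tree is a highly efficient storage structure for data cubes, online aggregation on data cubes can be achieved with good performance. Analysis and experimental results show that the proposed algorithms perform well.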
6.
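Securities trading has a latent need for massive data management and analysis. This paper analyzes decision support systems for securities trading and, for massive data management, proposes a system framework based on DW, DM, and OLAP. It designs the network structure, subjects, data dimensions, and database, and then describes the system implementation in terms of data acquisition, technical analysis of the data, and multidimensional analysis, providing decision support for managing and analyzing massive securities trading information.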
7.
8.
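With its powerful storage and computing capacity, cloud computing has become an effective way to solve massive data mining problems. The classic incremental updating algorithm for association rules, FUP, must scan the original data set repeatedly and is therefore unsuitable for massive data. Aiming to make incremental updating of association rules on massive data more efficient, this paper combines FUP with the MapReduce programming model of cloud computing and proposes MRFUP, a MapReduce-based incremental updating algorithm for association rules. MRFUP scans the original data set only once and makes full use of the storage and parallel computing capacity of cloud computing. Experiments on Hadoop show that MRFUP improves the capacity and efficiency of massive data processing and is suitable for association rule mining on massive data.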
9.
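Data updating is an important operation for supporting online analytical processing on a data warehouse, and incremental updating is an effective way to do it. This paper implements the construction of HDC, a two-dimensional hierarchical data cube storage structure, and an incremental data update algorithm based on it.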
10.
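Design and implementation of an OLAP data warehouse for an airline ticketing system  Total citations: 3, self-citations: 0, citations by others: 3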
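How to analyze the massive data accumulated in an airline ticketing system in order to support decision making has become an urgent problem. Based on a study of the airline ticketing system, this paper proposes a solution that builds a decision support system with online analytical processing and data warehouse technology. It describes in detail how the data warehouse model and multidimensional data sets are built in the decision analysis system, as well as data extraction, transformation, and loading and the front-end presentation of analysis results, and it focuses on using table partitioning to solve the storage problem of massive data in the decision system.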
11.
Jieping Ye, Qi Li, Hui Xiong, Park H., Janardan R., Kumar V. 《Knowledge and Data Engineering, IEEE Transactions on》2005,17(9):1208-1222
Dimension reduction is a critical data preprocessing step for many database and data mining applications, such as efficient storage and retrieval of high-dimensional data. In the literature, a well-known dimension reduction algorithm is linear discriminant analysis (LDA). The common aspect of previously proposed LDA-based algorithms is the use of singular value decomposition (SVD). Due to the difficulty of designing an incremental solution for the eigenvalue problem on the product of scatter matrices in LDA, there has been little work on designing incremental LDA algorithms that can efficiently incorporate new data items as they become available. In this paper, we propose an LDA-based incremental dimension reduction algorithm, called IDR/QR, which applies QR decomposition rather than SVD. Unlike other LDA-based algorithms, this algorithm does not require the whole data matrix in main memory. This is desirable for large data sets. More importantly, with the insertion of new data items, the IDR/QR algorithm can constrain the computational cost by applying efficient QR-updating techniques. Finally, we evaluate the effectiveness of the IDR/QR algorithm in terms of classification error rate on the reduced dimensional space. Our experiments on several real-world data sets reveal that the classification error rate achieved by the IDR/QR algorithm is very close to the best possible one achieved by other LDA-based algorithms. However, the IDR/QR algorithm has much less computational cost, especially when new data items are inserted dynamically.
12.
《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2008,38(6):1513-1524
13.
Yu-Neng Fan, Tzu-Liang Tseng, Ching-Chin Chern, Chun-Che Huang 《Expert systems with applications》2009,36(9):11439-11450
The incremental technique is a way to handle newly added data in a dynamic database without re-running the original algorithm. There are numerous studies of incremental rough set based approaches. However, these approaches are applied to traditional rough set based rule induction, which may generate redundant rules without focus, and they do not verify the classification of a decision table. In addition, these previous incremental approaches are not efficient in a large database. In this paper, an incremental rule-extraction algorithm based on the previous rule-extraction algorithm is proposed to resolve these issues. With this algorithm, when a new object is added to an information system, it is unnecessary to re-compute rule sets from the very beginning. The proposed approach updates rule sets by partially modifying the original rule sets, which increases efficiency. This is especially useful when extracting rules in a large database.
14.
15.
Das S., Abraham A., Konar A. 《IEEE transactions on systems, man, and cybernetics. Part A, Systems and humans : a publication of the IEEE Systems, Man, and Cybernetics Society》2008,38(1):218-237
Differential evolution (DE) has emerged as one of the fast, robust, and efficient global search heuristics of current interest. This paper describes an application of DE to the automatic clustering of large unlabeled data sets. In contrast to most of the existing clustering techniques, the proposed algorithm requires no prior knowledge of the data to be classified. Rather, it determines the optimal number of partitions of the data "on the run." Superiority of the new method is demonstrated by comparing it with two recently developed partitional clustering techniques and one popular hierarchical clustering algorithm. The partitional clustering algorithms are based on two powerful well-known optimization algorithms, namely the genetic algorithm and the particle swarm optimization. An interesting real-world application of the proposed method to automatic segmentation of images is also reported.
16.
Dawei Zhou, Arun Karthikeyan, Kangyang Wang, Nan Cao, Jingrui He 《Data mining and knowledge discovery》2017,31(2):400-423
Nowadays, massive graph streams are produced from various real-world applications, such as financial fraud detection, sensor networks, and wireless networks. In contrast to the high volume of data, it is usually the case that only a small percentage of nodes within the time-evolving graphs might be of interest to people. Rare category detection (RCD) is an important topic in data mining, focusing on identifying the initial examples from the rare classes in imbalanced data sets. However, most existing techniques for RCD are designed for static data sets and are thus not suitable for time-evolving data. In this paper, we introduce a novel setting of RCD on time-evolving graphs. To address this problem, we propose two incremental algorithms, SIRD and BIRD, which are constructed upon existing density-based techniques for RCD. These algorithms exploit the time-evolving nature of the data by dynamically updating the detection models, enabling a “time-flexible” RCD. Moreover, to deal with the cases where the exact priors of the minority classes are not available, we further propose a modified version named BIRD-LI based on BIRD. In addition, we identify a critical task in RCD named query distribution, which aims to allocate the limited budget among multiple time steps, such that the initial examples from the rare classes are detected as early as possible with the minimum labeling cost. The proposed incremental RCD algorithms and various query distribution strategies are evaluated empirically on both synthetic and real data sets.
17.
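Discovering contiguous sequential patterns (CSP) is an important problem in network traffic pattern mining. A new tree data structure is proposed for network traffic analysis. To store all sequences containing a specified item efficiently, the tree combines a prefix tree and a suffix tree, and this special structure ensures that CSP detection is effective. Experiments show that, compared with existing methods, the structure improves both the time and the space performance of CSP mining.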
18.
Incremental computation of time-varying query expressions  Total citations: 1, self-citations: 0, citations by others: 1
We present and analyze algorithms for the incremental computation of time-varying queries in which selection predicates refer to the state of a clock. Such queries occur naturally in many situations where temporal data are processed. Incremental techniques for query computation have proven to be more efficient than other techniques in many situations. However, all existing incremental techniques for query computation assume that old query results remain valid if no intermediate changes are made to the underlying database. Unfortunately, this assumption does not hold for time-varying queries, whose results may change just because time passes. In order to solve this problem, we introduce the notion of a superview, which contains all current tuples that will eventually satisfy the selection predicate of a time-varying selection. Based on the notion of superview, we develop efficient algorithms for the incremental computation of time-varying selections. Our algorithms, combined with existing incremental algorithms, allow complex time-varying queries to benefit from the proven efficiency of incremental techniques. It is important to notice that without our algorithms, the existing algorithms for incremental computation would be useless for any time-varying query expression.
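The superview idea can be illustrated with a minimal sketch, under strong assumptions that are not taken from the paper: the selection predicate has the single form `valid_from <= now`, the clock only moves forward, and the class name `TimeVaryingSelection` and its methods are hypothetical. Here the superview is the union of the current result and a pending heap of tuples that do not satisfy the predicate yet but will once the clock reaches their `valid_from`.

```python
import heapq
from typing import Any, List, Tuple

class TimeVaryingSelection:
    """Incrementally maintains the tuples whose valid_from <= now."""

    def __init__(self) -> None:
        # Tuples that will satisfy the predicate later, ordered by valid_from.
        self._pending: List[Tuple[float, int, Any]] = []
        self._seq = 0                 # tie-breaker so rows are never compared
        self.now = float("-inf")
        self.result: List[Any] = []   # tuples that satisfy the predicate now

    def insert(self, valid_from: float, row: Any) -> None:
        """Database change: route the new tuple to the result or the heap."""
        if valid_from <= self.now:
            self.result.append(row)
        else:
            heapq.heappush(self._pending, (valid_from, self._seq, row))
            self._seq += 1

    def advance(self, now: float) -> List[Any]:
        """Clock change: move only the tuples that have just become valid."""
        self.now = max(self.now, now)
        while self._pending and self._pending[0][0] <= self.now:
            self.result.append(heapq.heappop(self._pending)[2])
        return self.result
```

When the clock advances, only the tuples at the top of the heap are inspected, so the old result is extended rather than recomputed from the base relation.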
19.
Feature selection plays a vital role in many areas of pattern recognition and data mining. The effective computation of feature selection is important for improving the classification performance. In rough set theory, many feature selection algorithms have been proposed to process static incomplete data. However, feature values in an incomplete data set may vary dynamically in real-world applications. For such dynamic incomplete data, a classic (non-incremental) approach of feature selection is usually computationally time-consuming. To overcome this disadvantage, we propose an incremental approach for feature selection, which can accelerate the feature selection process in dynamic incomplete data. We first compute the new positive region incrementally when feature values with respect to an object set vary dynamically. Based on the calculated positive region, two efficient incremental feature selection algorithms are developed, one for a single object and one for multiple objects with varying feature values. Then we conduct a series of experiments with 12 UCI real data sets to evaluate the efficiency and effectiveness of our proposed algorithms. The experimental results show that the proposed algorithms compare favorably with the existing non-incremental methods.