首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
2.

High-utility itemset mining is a prominent data-mining technique where the profit or weight of itemsets plays a crucial role in defining meaningful patterns. High average-utility itemset (HAUI) mining is an advancement over high-utility itemset mining, which introduces an unbiased measure called average utility to associate the utility of itemsets with their length. Several existing HAUI mining algorithms use various upper bounds such as average-utility upper bound, revised tighter upper bound, and looser upper bound to preserve pruning methods. However, these upper bounds overestimate the average-utility of itemsets and slow down the mining process. This paper presents a fast high average-utility itemset miner (FHAIM) algorithm, which uses two improved upper bounds and several efficient pruning strategies to avoid the processing of unpromising candidate itemsets. Moreover, a novel list structure named recommended average-utility list (RAUL) is presented to store the average-utility and the required information for pruning. The RAUL for an itemset can be constructed by joining the RAULs of its subsets to avoid excessive database scans. We have performed substantial experiments on various benchmark datasets to evaluate the performance of the FHAIM in comparison with two existing HAUI mining algorithms. Experimental results show that FHAIM outperforms the existing HAUI mining algorithms in terms of runtime, memory usage, join counts, and scalability.

  相似文献   

3.
Efficient Computation of Iceberg Cubes by Bounding Aggregate Functions   总被引:1,自引:0,他引:1  
The iceberg cubing problem is to compute the multidimensional group-by partitions that satisfy given aggregation constraints. Pruning unproductive computation for iceberg cubing when nonantimonotone constraints are present is a great challenge because the aggregate functions do not increase or decrease monotonically along the subset relationship between partitions. In this paper, we propose a novel bound prune cubing (BP-Cubing) approach for iceberg cubing with nonantimonotone aggregation constraints. Given a cube over n dimensions, an aggregate for any group-by partition can be computed from aggregates for the most specific n--dimensional partitions (MSPs). The largest and smallest aggregate values computed this way become the bounds for all partitions in the cube. We provide efficient methods to compute tight bounds for base aggregate functions and, more interestingly, arithmetic expressions thereof, from bounds of aggregates over the MSPs. Our methods produce tighter bounds than those obtained by previous approaches. We present iceberg cubing algorithms that combine bounding with efficient aggregation strategies. Our experiments on real-world and artificial benchmark data sets demonstrate that BP-Cubing algorithms achieve more effective pruning and are several times faster than state-of-the-art iceberg cubing algorithms and that BP-Cubing achieves the best performance with the top-down cubing approach.  相似文献   

4.
用SOM聚类实现多级高维点数据索引   总被引:4,自引:0,他引:4  
高维点数据的索引是基于内容的信息检索的主要研究问题之一,从SOM聚类算法出发,利用自组织映射的良好性能,解决了R-Tree及其变体算法中的边界索引问题,并能适应维数更高的点数据,同时针对传统聚类算法只能组织一级索引的局限,提出了利用SOM网络组织多级索引,并用半径进行剪枝处理的优化办法,实验结果表明,提出的方法不仅克服了传统聚类方法的搜索过程可能产生的查询错误,而且大大提高了索引的构建和查询效率。  相似文献   

5.
6.
基于Hebb规则的分布神经网络学习算法   总被引:1,自引:0,他引:1  
田大新  刘衍珩  李宾  吴静 《计算机学报》2007,30(8):1379-1388
随着知识发现与数据挖掘领域数据量的不断增加,为了处理大规模数据,scaling up学习成为KDD的热点研究领域.文中提出了基于Hebb规则的分布式神经网络学习算法实现scaling up学习.为了提高学习速度,完整数据集被分割成不相交的子集并由独立的子神经网络来学习;通过对算法完整性及竞争Hebb学习的风险界的分析,采用增长和修剪策略避免分割学习降低算法的学习精度.对该算法的测试实验首先采用基准测试数据circlein-the-square测试了其学习能力,并与SVM,ARTMAP和BP神经网络进行比较;然后采用UCI中的数据集USCensus1990测试其对大规模数据的学习性能.  相似文献   

7.
一种基于多维集的关联模式挖掘算法   总被引:2,自引:0,他引:2  
大多数维间关联规则挖掘算法如基于数据立方体的关联规则挖掘算法都假定对象的属性取值只具有单值性.将对象的属性取值扩展到多值,据此提出多维集的概念和基于多维集关联规则的语义特征.在此语义特征下,提出了一个多维集的关联规则挖掘算法.该算法利用多维集关联规则的限制特征,能够在数据集缩减的同时进行侯选集的三重剪枝,因此,具有比直接使用apriori等算法更好的性能,分析了算法的性能和正确性、完备性,并通过实验对算法有效性进行了对比.  相似文献   

8.
局部异常检测(Local outlier factor,LOF)能够有效解决数据倾斜分布下的异常检测问题,在很多应用领域具有较好的异常检测效果.本文面向大数据异常检测,提出了一种快速的Top-n局部异常点检测算法MTLOF(Multi-granularity upper bound pruning based top-n LOF detection),融合索引结构和多层LOF上界设计了多粒度的剪枝策略,以快速发现Top-n局部异常点.首先,提出了四个更接近真实LOF值的上界,以避免直接计算LOF值,并对它们的计算复杂度进行了理论分析;其次,结合索引结构和UB1、UB2上界,提出了两层的Cell剪枝策略,不仅采用全局Cell剪枝策略,还引入了基于Cell内部数据对象分布的局部剪枝策略,有效解决了高密度区域的剪枝问题;再次,利用所提的UB3和UB4上界,提出了两个更加合理有效的数据对象剪枝策略,UB3和UB4上界更加接近于真实LOF值,有利于剪枝更多数据对象,而基于计算复用的上界计算方法,大大降低了计算成本;最后,优化了初始Top-n局部异常点的选择方法,利用区域划分和建立的索引结构,在数据稀疏区域选择初始局部异常点,有利于将LOF值较大的数据对象选为初始局部异常点,有效提升初始剪枝临界值,使得初始阶段剪枝掉更多的数据对象,进一步提高检测效率.在六个真实数据集上的综合实验评估验证MTLOF算法的高效性和可扩展性,相比最新的TOLF(Top-n LOF)算法,时间效率提升可高达3.5倍.  相似文献   

9.
This paper explores the problem of solving triangular linear systems on parallel distributed-memory machines. Working within the LogP model, tight asymptotic bounds for solving these systems using forward/backward substitution are presented. Specifically, lower bounds on execution time independent of the data layout, lower bounds for data layouts in which the number of data items per processor is bounded, and lower bounds for specific data layouts commonly used in designing parallel algorithms for this problem are presented in this paper. Furthermore, algorithms are provided which have running times within a constant factor of the lower bounds described. One interesting result is that the popular two-dimensional block matrix layout necessarily results in significantly longer running times than simpler one-dimensional schemes. Finally, a generalization of the lower bounds to banded triangular linear systems is presented.  相似文献   

10.
The problem of solving tridiagonal linear systems on parallel distributed-memory environments is considered in this paper. In particular, two common direct methods for solving such systems are considered: odd-even cyclic reduction and prefix summing. For each method, a variety of lower bounds on execution time for solving tridiagonal linear systems are presented. Specifically, lower bounds are presented that (a) hold when the number of data items per processor is bounded, (b) are general lower bounds, and (c) for specific data layouts commonly used in designing parallel algorithms to solve tridiagonal linear systems. Furthermore, algorithms are presented that have running times within a constant factor of the lower bounds provided. Lastly, a comparison of bounds for odd-even cyclic reduction and prefix summing is given.  相似文献   

11.
空间数据库中约束K最接近对查询   总被引:1,自引:0,他引:1  
定义了满足空间约束的K最接近对查询,该查询检索两个数据集在给定约束区域中的K最接近对。在空间数据库中,对采用R树类型索引存储的数据集给出了三个查询处理算法。其中两阶段的RJ和JR算法采用了变换范围查询和最接近对查询执行顺序的策略。单阶段基于堆的SPH算法采用了最好优先的策略,并利用给出的裁减规则、更新规则和访问顺序规则来提高查询处理效率。实验表明SPH具有较好的适用性和性能。  相似文献   

12.
在数据流子空间上的连续概率轮廓查询(CPSQS)基础上,提出一种基于网格索引结构的概率轮廓查询算法。采用适合于子空间轮廓计算的网格索引结构,将数据空间划分成若干个格,利用格间的支配关系,减少对象之间的比较次数。同时挖掘全空间与子空间上格的概率上下界关系,设计有效的剪枝策略提高CPSQS算法的性能。理论分析和实验结果表 明,该算法能满足实际应用中用户的个性化查询要求,降低查询响应时间。  相似文献   

13.
本文充分利用了 Eclat算法的概念格理论和等价类划分方法,将约束条件融入基于垂直数据分布的关联规则挖掘算法中。提出了一种新的反单调和单调约束条件下关联规则的挖掘算法,分别为EclatA算法和EclatM算法。算法采用自底向上的搜索方法,在发现频繁项集的同时进行约束条件的检验。数据库的扫描次数较少,无需对候选项集进行剪枝,占用内存较小。实验证明:该算法的执行效率比已有算法有显著提高。  相似文献   

14.
Langford  John  Blum  Avrim 《Machine Learning》2003,51(2):165-179
A major topic in machine learning is to determine good upper bounds on the true error rates of learned hypotheses based upon their empirical performance on training data. In this paper, we demonstrate new adaptive bounds designed for learning algorithms that operate by making a sequence of choices. These bounds, which we call Microchoice bounds, are similar to Occam-style bounds and can be used to make learning algorithms self-bounding in the style of Freund (1998). We then show how to combine these bounds with Freund's query-tree approach producing a version of Freund's query-tree structure that can be implemented with much more algorithmic efficiency.  相似文献   

15.
从图数据库中挖掘频繁跳跃模式   总被引:4,自引:0,他引:4  
刘勇  李建中  高宏 《软件学报》2010,21(10):2477-2493
很多频繁子图挖掘算法已被提出.然而,这些算法产生的频繁子图数量太多而不能被用户有效地利用.为此,提出了一个新的研究问题:挖掘图数据库中的频繁跳跃模式.挖掘频繁跳跃模式既可以大幅度地减少输出模式的数量,又能使有意义的图模式保留在挖掘结果中.此外,跳跃模式还具有抗噪声干扰能力强等优点.然而,由于跳跃模式不具有反单调性质,挖掘它们非常具有挑战性.通过研究跳跃模式自身的特性,提出了两种新的裁剪技术:基于内扩展的裁剪和基于外扩展的裁剪.在此基础上又给出了一种高效的挖掘算法GraphJP(an algorithm for mining jump patterns from graph databases).另外,还严格证明了裁剪技术和算法GraphJP的正确性.实验结果表明,所提出的裁剪技术能够有效地裁剪图模式搜索空间,算法GraphJP是高效、可扩展的.  相似文献   

16.
超图是普通图的泛化表示, 在许多应用领域都很常见, 包括互联网、生物信息学和社交网络等. 独立集问题是图分析领域的一个基础性研究问题, 传统的独立集算法大多都是针对普通图数据, 如何在超图数据上实现高效的最大独立集挖掘是一个亟待解决的问题. 针对这一问题, 提出一种超图独立集的定义. 首先分析超图独立集搜索的两个特性, 然后提出一种基于贪心策略的基础算法. 接着提出一种超图近似最大独立集搜索的剪枝框架即精确剪枝与近似剪枝相结合, 以精确剪枝策略缩小图的规模, 以近似剪枝策略加快搜索速度. 此外, 还提出4种高效的剪枝策略, 并对每种剪枝策略进行理论证明. 最后, 通过在10个真实超图数据集上进行实验, 结果表明剪枝算法可以高效地搜索到更接近于真实结果的超图最大独立集.  相似文献   

17.
在以无线网络为代表的移动计算环境中,数据广播是一种有效的数据访问方式。为响应最多用户数据请求,提出了优先级计算模型,进而提出了一种基于优先级的广播内容选择算法。该算法综合考虑了事务存取多个数据项和满足定时限制的要求,根据用户请求队列状态动态选择广播内容,并应用剪枝机制减少了选择开销。实验结果表明它比现有算法有明显的优越性。  相似文献   

18.
Allocating fixed-priority periodic tasks on multiprocessor systems   总被引:2,自引:0,他引:2  
In this paper, we study the problem of allocating a set of periodic tasks on a multiprocessor system such that tasks are scheduled to meet their deadlines on individual processors by the Rate-Monotonic scheduling algorithm. A new schedulability condition is developed for the Rate-Monotonic scheduling that allows us to develop more efficient on-line allocation algorithms. Two on-line allocation algorithms—RM-FF and RM-BF are presented, and shown that their worst-case performance, over the optimal allocation, is upper bounded by 2.33 and lower bounded by 2.28. Then RM-FF and RM-BF are further improved to form two new algorithms: Refined-RM-FF (RRM-FF) and Refined-RM-BF (RRM-BF), both of which have a worst-case performance bound of 2. We also show that when the maximum allowable utilization of a task is small, the worst-case performance of all the new algorithms can be significantly improved. The worst-case performance bounds of RRM-FF and RRM-BF are currently the best bounds in the class of on-line scheduling algorithms proposed to solve the same scheduling problem. Simulation studies show that the average-case performance of the newly proposed algorithms is significantly superior to those in the existing literature.  相似文献   

19.
N-gram models are the most widely used language models in large vocabulary continuous speech recognition. Since the size of the model grows rapidly with respect to the model order and available training data, many methods have been proposed for pruning the least relevant -grams from the model. However, correct smoothing of the N-gram probability distributions is important and performance may degrade significantly if pruning conflicts with smoothing. In this paper, we show that some of the commonly used pruning methods do not take into account how removing an -gram should modify the backoff distributions in the state-of-the-art Kneser-Ney smoothing. To solve this problem, we present two new algorithms: one for pruning Kneser-Ney smoothed models, and one for growing them incrementally. Experiments on Finnish and English text corpora show that the proposed pruning algorithm provides considerable improvements over previous pruning algorithms on Kneser-Ney smoothed models and is also better than the baseline entropy pruned Good-Turing smoothed models. The models created by the growing algorithm provide a good starting point for our pruning algorithm, leading to further improvements. The improvements in the Finnish speech recognition over the other Kneser-Ney smoothed models are statistically significant, as well.  相似文献   

20.
胡蓉  徐蔚鸿 《控制与决策》2013,28(10):1564-1567
利用误差下降率定义输入数据对系统输出的敏感性,并以此作为规则产生标准,提出一种有效增量顺序学习模糊神经网络。将修剪策略引入规则产生过程,因此该算法产生的模糊神经网络不需要进行修剪。通过仿真实验,本算法在达到与其他算法相当性能的情况下,能够获得更高的准确率和更简单的结构。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号