首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 8 毫秒
1.
Anomaly detection is considered an important data mining task, aiming at the discovery of elements (known as outliers) that show significant diversion from the expected case. More specifically, given a set of objects the problem is to return the suspicious objects that deviate significantly from the typical behavior. As in the case of clustering, the application of different criteria leads to different definitions for an outlier. In this work, we focus on distance-based outliers: an object x is an outlier if there are less than k objects lying at distance at most R from x. The problem offers significant challenges when a stream-based environment is considered, where data arrive continuously and outliers must be detected on-the-fly. There are a few research works studying the problem of continuous outlier detection. However, none of these proposals meets the requirements of modern stream-based applications for the following reasons: (i) they demand a significant storage overhead, (ii) their efficiency is limited and (iii) they lack flexibility in the sense that they assume a single configuration of the k and R parameters. In this work, we propose new algorithms for continuous outlier monitoring in data streams, based on sliding windows. Our techniques are able to reduce the required storage overhead, are more efficient than previously proposed techniques and offer significant flexibility with regard to the input parameters. Experiments performed on real-life and synthetic data sets verify our theoretical study.  相似文献   

2.
面向电子商务的增量挖掘算法   总被引:1,自引:0,他引:1       下载免费PDF全文
对商务信息挖掘理论进行了研究,给出了电子商务运作模式的定义、性质、定理及相关证明;提出了一种用于跟踪电子商务活动的数据模型,在模型中定义了网页监控点与商务指标之间的“点-集”映射关系,建立了商务网站的访问事件、网页中事先设定的监控点,以及商务工作流程之间的联系;根据这些研究结果对关联规则挖掘算法中的发现和剪枝过程进行改进,设计了面向电子商务的增量挖掘算法,并在第三方物流信息系统中实现。实践表明,该算法在频繁变化的数据集中的挖掘效率大大高于传统的非增量挖掘算法,基于智能Agent技术的商务信息挖掘模型有效地提高了电子商务的实时跟踪与分析能力。  相似文献   

3.
The FP-growth algorithm using the FP-tree has been widely studied for frequent pattern mining because it can dramatically improve performance compared to the candidate generation-and-test paradigm of Apriori. However, it still requires two database scans, which are not consistent with efficient data stream processing. In this paper, we present a novel tree structure, called CP-tree (compact pattern tree), that captures database information with one scan (insertion phase) and provides the same mining performance as the FP-growth method (restructuring phase). The CP-tree introduces the concept of dynamic tree restructuring to produce a highly compact frequency-descending tree structure at runtime. An efficient tree restructuring method, called the branch sorting method, that restructures a prefix-tree branch-by-branch, is also proposed in this paper. Moreover, the CP-tree provides full functionality for interactive and incremental mining. Extensive experimental results show that the CP-tree is efficient for frequent pattern mining, interactive, and incremental mining with a single database scan.  相似文献   

4.
基于网格的数据分析方法以网格为单位处理数据,避免了数据对象点对点的计算,极大提高了数据分析的效率。但是,传统基于网格的方法在数据分析过程中独立处理网格,忽略了网格之间的耦合关系,影响了分析的精确度。在应用网格检测数据流异常的过程中不再独立处理网格,而是考虑了网格之间的耦合关系,提出了一种基于网格耦合的数据流异常检测算法GCStream-OD。该算法通过网格耦合精确地表达了数据流对象之间的相关性,并通过剪枝策略提高算法的效率。在5个真实数据集上的实验结果表明,GCStream-OD算法具有较高的异常检测质量和效率。  相似文献   

5.
6.
FP-growth算法是一种基于FP-tree数据结构的高效的频繁模式挖掘算法,它不产生候选集。构造频繁模式树FP-tree需扫描数据库两次,在第二遍扫描中还扫描了那些仅包含了非频繁项的事务,针对此问题,在深入分析了FP-tree特性的基础上, 改进了FP-tree构造过程,同时用一种基于Hash表的辅助存储结构,节省了项目查找时间,提高了挖掘效率。  相似文献   

7.
关联规则是数据挖掘领域的一个重要研究方向。针对关联规则的增量挖掘问题,该文提出一种快速算法FIAFAR。算法使用CAN-树存储原始交易数据库,弥补了FP-树的不足,适应于增量挖掘以及最小支持度变化的情况。采用子父节点指针的设计,可以快速生成条件模式树,提高算法的效率。实验验证了算法的有效性。  相似文献   

8.
频繁模式挖掘进展及典型应用   总被引:1,自引:0,他引:1       下载免费PDF全文
对近年来频繁模式的挖掘进行了总结。首先对有代表性的挖掘算法从算法思想、关键技术、算法的优缺点进行了分析概括,此后列举了一些典型频繁模式及关联规则的领域应用。综述内容的选择主要基于某一个研究后续被关注程度,组织过程中力争将相关研究进行归类,以给出其发展概貌。上述工作可以为频繁模式挖掘及关联规则的研究提供有益的参考。  相似文献   

9.
摘要:数据网格是在计算网格的基础上发展起来的网格技术,具有资源共享、协同工作、虚拟组织以及对分布式数据库群进行处理和分析的特点,在知识发现领域具有重要的研究价值。因此,本文提出了一种基于数据网格进行知识关联规则挖掘的方法。该方法采用数据网格树对动态数据网格进行全局控制和管理,采用挖掘作业命令的形式触发域挖掘作业,采用素数存储的方法进行了关联规则挖掘。通过仿真实验表明该挖掘方法在数据库群和数据网格关联规则挖掘方面具有优势。  相似文献   

10.
Efficient Incremental Maintenance of Frequent Patterns with FP-Tree   总被引:3,自引:0,他引:3       下载免费PDF全文
Mining frequent patterns has been studied popularly in data mining area. However, little work has been done on mining patterns when the database has an influx of fresh data constantly. In these dynamic scenarios, efficient maintenance of the discovered patterns is crucial. Most existing methods need to scan the entire database repeatedly, which is an obvious disadvantage. In this paper, an efficient incremental mining algorithm, Incremental-Mining (IM), is proposed for maintenance of the frequent patterns when new incremental data come. Based on the frequent pattern tree (FP-tree) structure, IM gives a way to make the most of the things from the previous mining process, and requires scanning the original data once at most. Furthermore, IM can identify directly the differential set of frequent patterns, which may be more informative to users. Moreover, IM can deal with changing thresholds as well as changing data, thus provide a full maintenance scheme. IM has been implemented and the performance study shows it outperforms three other incremental algorithms: FUP, DB-tree and re-running frequent pattern growth (FP-growth).  相似文献   

11.
在数据库中增加数据且调整最小支持度时,数据库中关联规则会发生变化,为从数据量和最小支持度同时发生变化的数据库中快速获取频繁项集,发现变化后的关联规则,通过对FIM和AIUA算法进行分析,提出一种结合两种算法优点的增量数据关联规则挖掘My_FIM_AIUA算法,该算法能减少数据库扫描次数,减少候选项集数量。通过实验表明My_FIM_AIUA算法能在数据量和最小支持度同时变化时快速找到频繁项集,提高挖掘增量数据关联规则的速度。  相似文献   

12.
针对分布式环境下FP-tree的构造及合并,给出了一种网格环境下FP-tree的分布式构造算法GridDBMA。该算法中,各站点根据全局项目头表,独立构造局部频繁模式树BFP-tree,然后,利用合并算法将各局部树合并为一棵全局频繁模式树,并在全局频繁模式树上提取出所求的频繁项目集,通过对传统频繁模式树的存储结构的改进,减少了树的规模及站点间的网络通信量,并使树的遍历更加方便有效,提高了合并效率,从而提高了整个频繁项目集的挖掘效率。最后,采用天体光谱数据作为形式背景,实验验证了该算法的正确性和有效性。  相似文献   

13.
数据流分类中的增量特征选择算法   总被引:1,自引:0,他引:1  
李敏  王勇  蔡立军 《计算机应用》2010,30(9):2321-2323
概念流动的出现及数据的高维性增加了数据流特征选择的复杂性。信息增益是最有效的特征选择算法之一,但计算量大。对信息增益做了等价替换,提出一种基于改进信息增益的混合增量特征选择(IFS)算法。该算法首先利用与分类器无关的评价函数选出候选特征集合,然后将分类器作用于候选特征集合,利用分类精度作为评价标准去选择特征子集,在遇到概念漂移时重新选择特征子集。通过在超平面数据集和UCI数据集上的实验,表明基于IFS算法的分类器能够很快地适应概念漂移,并且比基于全部特征的分类算法有更高的精度。  相似文献   

14.
针对轨迹聚类算法在相似性度量中多以空间特征为度量标准,缺少对时间特征的度量,提出了一种基于时空模式的轨迹数据聚类算法。该算法以划分再聚类框架为基础,首先利用曲线边缘检测方法提取轨迹特征点;然后根据轨迹特征点对轨迹进行子轨迹段划分;最后根据子轨迹段间时空相似性,采用基于密度的聚类算法进行聚类。实验结果表明,使用所提算法提取的轨迹特征点在保证特征点具有较好简约性的前提下较为准确地描述了轨迹结构,同时基于时空特征的相似性度量因同时兼顾了轨迹的空间与时间特征,得到了更好的聚类结果。  相似文献   

15.
Most incremental mining and online mining algorithms concentrate on finding association rules or patterns consistent with entire current sets of data. Users cannot easily obtain results from only interesting portion of data. This may prevent the usage of mining from online decision support for multidimensional data. To provide ad-hoc, query-driven, and online mining support, we first propose a relation called the multidimensional pattern relation to structurally and systematically store context and mining information for later analysis. Each tuple in the relation comes from an inserted dataset in the database. We then develop an online mining approach called three-phase online association rule mining (TOARM) based on this proposed multidimensional pattern relation to support online generation of association rules under multidimensional considerations. The TOARM approach consists of three phases during which final sets of patterns satisfying various mining requests are found. It first selects and integrates related mining information in the multidimensional pattern relation, and then if necessary, re-processes itemsets without sufficient information against the underlying datasets. Some implementation considerations for the algorithm are also stated in detail. Experiments on homogeneous and heterogeneous datasets were made and the results show the effectiveness of the proposed approach.  相似文献   

16.
增量关联规则的向量法挖掘   总被引:2,自引:0,他引:2  
该文针对数据库记录增加时,如何高效地挖掘频集,提出了VFUP算法,并和其他增量更新算法进行比较,说明了该算法的高效性。  相似文献   

17.
Mining association rules and mining sequential patterns both are to discover customer purchasing behaviors from a transaction database, such that the quality of business decision can be improved. However, the size of the transaction database can be very large. It is very time consuming to find all the association rules and sequential patterns from a large database, and users may be only interested in some information.

Moreover, the criteria of the discovered association rules and sequential patterns for the user requirements may not be the same. Many uninteresting information for the user requirements can be generated when traditional mining methods are applied. Hence, a data mining language needs to be provided such that users can query only interesting knowledge to them from a large database of customer transactions. In this paper, a data mining language is presented. From the data mining language, users can specify the interested items and the criteria of the association rules or sequential patterns to be discovered. Also, the efficient data mining techniques are proposed to extract the association rules and the sequential patterns according to the user requirements.  相似文献   


18.
夏英  刘婉蓉 《计算机应用》2008,28(12):3224-3226
现有的关联规则算法大多都致力于解决增量式更新问题,需要多次扫描数据集,无法对海量数据进行有效处理。针对此问题,提出了基于滑动窗口的关联规则增量式更新算法(SWIUA),利用滑动窗口进行数据更新,挖掘出用户感兴趣的关联规则。该算法只需要扫描原始数据集和更新的数据各一遍,降低了I/O时间;并采用优化策略对候选项集过滤和删除,提高了关联规则的挖掘性能,能有效处理大量新增数据。  相似文献   

19.
Data envelopment analysis (DEA) uses extreme observations to identify superior performance, making it vulnerable to outliers. This paper develops a unified model to identify both efficient and inefficient outliers in DEA. Finding both types is important since many post analyses, after measuring efficiency, depend on the entire distribution of efficiency estimates. Thus, outliers that are distinguished by poor performance can significantly alter the results. Besides allowing the identification of outliers, the method described is consistent with a relaxed set of DEA axioms. Several examples demonstrate the need for identifying both efficient and inefficient outliers and the effectiveness of the proposed method. Applications of the model reveal that observations with low efficiency estimates are not necessarily outliers. In addition, a strategy to accelerate the computation is proposed that can apply to influential observation detection.  相似文献   

20.
在分析了当前基于距离的离群数据挖掘算法的基础上,提出了一种基于SOM的离群数据挖掘集成框架,其具有可扩展性、可预测性、交互性、适应性、简明性等特征.实验结果表明,基于SOM的离群数据挖掘是有效的.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号