Similar articles
20 similar articles found (search time: 15 ms)
1.
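An efficient parallel frequent itemset mining method for massive text databases   Cited by: 1 (self-citations: 1, citations by others: 0)
To meet the particular requirements of frequent itemset mining in large-scale text databases, this paper proposes a new parallel mining algorithm, parFIM. parFIM builds on a simple data structure, H-Struct, and partitions the data vertically to enable parallel mining. The algorithm also takes into account the removal of short patterns and the reduction of duplicate patterns. Experimental results show that parFIM is well suited to frequent itemset mining in large-scale text databases.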

2.
This paper addresses the problem of finding frequent closed patterns (FCPs) from very dense data sets. We introduce two compressed hierarchical FCP mining algorithms: C-Miner and B-Miner. The two algorithms compress the original mining space, hierarchically partition the whole mining task into independent subtasks, and mine each subtask progressively. The two algorithms adopt different task partitioning strategies: C-Miner partitions the mining task based on Compact Matrix Division, whereas B-Miner partitions the task based on Base Rows Projection. The compressed hierarchical mining algorithms enhance the mining efficiency and facilitate a progressive refinement of results. Moreover, because the subtasks can be mined independently, C-Miner and B-Miner can be readily parallelized without incurring significant communication overhead. We have implemented C-Miner and B-Miner, and our performance study on synthetic data sets and real dense microarray data sets shows their effectiveness over existing schemes. We also report experimental results on parallel versions of these two methods.

3.
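Research on neural networks and data mining of nonlinear patterns   Cited by: 1 (self-citations: 2, citations by others: 1)
邓乾罡, 孟波. 《计算机工程与设计》 (Computer Engineering and Design), 2004, 25(10): 1667-1668, 1694
Discusses some theoretical advances in applying artificial intelligence techniques to data mining. Extracting rules from nonlinear patterns is a major data mining task, yet few effective methods currently exist. The paper focuses on a model dedicated to mining nonlinear pattern data and gives a brief algorithm and an example.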

4.
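Process mining is very helpful for deploying new business processes and for auditing, analyzing, and improving existing ones. Business process system logs contain many same-named tasks and duplicate tasks. Existing mining algorithms cannot distinguish them well, so the mined results often contain inaccurate process models. To improve the accuracy of process mining, an improved method is proposed that can mine not only complex structures in logs, such as loops and non-free-choice constructs, but also same-named tasks and duplicate tasks.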

5.
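Sequential pattern mining is an important data mining task, and the Apriori algorithm is an effective method for mining association rules. This paper describes how to apply the Apriori algorithm to sequential pattern mining.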
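For readers unfamiliar with Apriori, the following is a minimal sketch of the classic frequent-itemset form of the algorithm. It is not the sequential-pattern adaptation this paper describes, whose details the abstract does not give. The function name `apriori`, its parameters, and the use of an absolute support count are illustrative assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets.

    min_support is an absolute count (an assumption made for this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        candidates = set()
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(sub) in frequent
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count step: scan the transactions once per level.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

The prune step relies on the fact that no superset of an infrequent itemset can be frequent, which is what keeps the candidate set small at each level.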

6.
The paper presents the implementation of an association rules discovery data mining task using Grid technologies. For the mining task we are using the Apriori algorithm on top of the Globus toolkit. The case study presents the design and integration of the data mining algorithm with the Globus services. The paper compares the Grid version with related work in the field and we outline the conclusions and future work.

7.
《Information Systems》2005,30(1):71-88
Many large organizations have multiple databases distributed in different branches, and therefore multi-database mining is an important task for data mining. To reduce the search cost in the data from all databases, we need to identify which databases are most likely relevant to a data mining application. This is referred to as database selection. For real-world applications, database selection has to be carried out multiple times to identify relevant databases that meet different applications. In particular, a mining task may be without reference to any specific application. In this paper, we present an efficient approach for classifying multiple databases based on their similarity to each other. Our approach is application-independent.
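The abstract does not say which similarity measure or grouping procedure the paper uses. As a rough illustration of the general idea only, the sketch below assumes each database is summarized by the set of items it contains, measures similarity with the Jaccard coefficient, and groups databases whose similarity reaches a threshold. The names `jaccard` and `group_databases` and the threshold parameter are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two item collections (assumed measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_databases(db_items, threshold):
    """Group databases linked by similarity >= threshold.

    db_items maps a database name to the set of items it contains.
    Two databases share a group if a chain of sufficiently similar
    databases connects them (single-link grouping).
    """
    unvisited = set(db_items)
    groups = []
    for start in sorted(db_items):
        if start not in unvisited:
            continue
        unvisited.discard(start)
        group, stack = [], [start]
        while stack:
            current = stack.pop()
            group.append(current)
            # Pull in every unvisited database similar enough to current.
            linked = [name for name in unvisited
                      if jaccard(db_items[current], db_items[name]) >= threshold]
            for name in linked:
                unvisited.discard(name)
                stack.append(name)
        groups.append(sorted(group))
    return groups
```

Because the grouping depends only on the databases' contents and not on any query, a sketch like this is application-independent in the sense the abstract describes.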

8.
Discovering shared conceptualizations in folksonomies   Cited by: 2 (self-citations: 0, citations by others: 2)
Social bookmarking tools are rapidly emerging on the Web. In such systems users are setting up lightweight conceptual structures called folksonomies. Unlike ontologies, shared conceptualizations are not formalized, but rather implicit. We present a new data mining task, the mining of all frequent tri-concepts, together with an efficient algorithm, for discovering these implicit shared conceptualizations. Our approach extends the data mining task of discovering all closed itemsets to three-dimensional data structures to allow for mining folksonomies. We provide a formal definition of the problem, and present an efficient algorithm for its solution. Finally, we show the applicability of our approach on three large real-world examples.

9.
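张诚, 郑诚. 《微机发展》, 2007, 17(7): 60-62
Association rules are an important topic in data mining research. Many algorithms assume that the underlying associations in the data are stable over time. In real-world domains, however, data has its own characteristics, so associations can change dramatically over time. Existing data mining algorithms do not account for such changes, which leads to serious performance degradation, especially when the mined association rules are used for classification and prediction. Mining changes in associations is an important problem because the future must be predicted from historical data, yet existing data mining algorithms are not suited to this work. This paper introduces a fuzzy data mining algorithm to discover changes in association rules over time. Based on the mined fuzzy rules, it can predict how association rules will change in the future. Experiments demonstrate the effectiveness of the algorithm.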

10.
Truong  Tin  Duong  Hai  Le  Bac  Fournier-Viger  Philippe  Yun  Unil 《Applied Intelligence》2022,52(6):6106-6128
Applied Intelligence - High utility sequence mining is a popular data mining task, which aims at finding sequences having a high utility (importance) in a quantitative sequence database. Though it...

11.
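A meta-information-based parallel mining method for rough set rules   Cited by: 1 (self-citations: 0, citations by others: 1)
苏健, 高济. 《计算机科学》 (Computer Science), 2003, 30(3): 35-39
1. Introduction. In the current information age, the need to extract useful knowledge from large volumes of accumulated historical data has made data mining a research hotspot. Rough set theory, proposed by Professor Pawlak and since studied and refined by many scholars, has become an important tool for data mining. In big-data settings, the speed of a data mining method directly affects the performance of the whole data mining system, so effectively speeding up data mining methods is a pressing problem. At the same time, computer networks hold large amounts of computing resources, and making full use of them is an effective way to speed up data mining. To this end, this paper proposes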

12.
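Describes the concepts and characteristics of data mining, focuses on its methods and tasks, and, starting from actual teaching practice, explains how data mining techniques can guide teaching.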

13.
Due to the increasing availability and sophistication of data recording techniques, multiple information sources and distributed computing are becoming the important trends of modern information systems. Many applications such as security informatics and social computing require a ubiquitous data analysis platform so that decisions can be made rapidly under distributed and dynamic system environments. Although data mining has now been popularly used to achieve such goals, building a data mining system is, however, a nontrivial task, which may require a complete understanding of numerous data mining techniques as well as solid programming skills. Employing agent techniques for data analysis thus becomes increasingly important, especially for users not familiar with engineering and computational sciences, to implement an effective ubiquitous mining platform. Such data mining agents should, in practice, be intelligent, complete, and compact. In this paper, we present an interactive data mining agent, OIDM (online interactive data mining), which provides three categories (classification, association analysis, and clustering) of data mining tools, and interacts with the user to facilitate the mining process. The interactive mining is accomplished through interviewing the user about the data mining task to gain efficient and intelligent data mining control. OIDM can help users find appropriate mining algorithms, refine and compare the mining process, and finally achieve the best mining results. Such interactive data mining agent techniques provide alternative solutions to rapidly deploy data mining techniques to broader areas of data intelligence and knowledge informatics.

14.
Knowledge and Information Systems - Pattern mining is a fundamental data mining task with applications in several domains. In this work, we consider the scenario in which we have a sequence of...

15.
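Research on algorithms for mining mutually exclusive relation patterns   Cited by: 2 (self-citations: 0, citations by others: 2)
Sequential pattern mining is an important area of data mining, and structural relation pattern mining is a new mining task built on top of it. This paper studies an important branch of structural relation patterns: mutually exclusive relation patterns. After defining the relevant concepts, it discusses two algorithms for mining mutually exclusive relation patterns, the basic detection method and the classified detection method. Experimental results show that both algorithms are effective and that, when the number of sequential patterns is large, the classified detection method is more efficient than the basic detection method. Like sequential pattern mining, structural relation pattern mining has significant practical value: some hidden patterns that cannot be found during sequential pattern mining can be found among structural relation patterns. Research on mutually exclusive relation patterns further supports the development of the theory of structural relation pattern mining.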

16.
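Preprocessing of log information is an important stage of log mining and a focus of current research. It is the foundation of the entire log mining process and a prerequisite for applying effective mining algorithms. At present, log mining relies mainly on a few software packages from abroad, and no domestic software exists for the data preprocessing that log mining requires. This paper introduces log mining within data mining, analyzes the data preprocessing process and how to implement it for log mining, and reports a Delphi implementation that converts IIS text log files into XLS and XML formats, thereby carrying out data preprocessing for log mining.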
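The conversion step can be illustrated with a minimal sketch. It is written in Python, not the Delphi used in the paper, and it assumes the logs are in the W3C extended format, where a `#Fields:` directive names the space-separated columns. The function name `w3c_log_to_xml` and the `<log>`/`<entry>`/`<field>` element layout are hypothetical choices, not the paper's output schema.

```python
import xml.etree.ElementTree as ET


def w3c_log_to_xml(lines):
    """Convert W3C extended log lines to an XML string (illustrative layout)."""
    root = ET.Element("log")
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directive lines; only #Fields: matters for parsing.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        if not fields or len(values) != len(fields):
            continue  # Skip rows that do not match the declared columns.
        entry = ET.SubElement(root, "entry")
        for name, value in zip(fields, values):
            # Field names such as cs(User-Agent) are not valid XML tag
            # names, so the name is stored in an attribute instead.
            ET.SubElement(entry, "field", name=name).text = value
    return ET.tostring(root, encoding="unicode")
```

Dropping rows whose column count does not match the `#Fields:` directive is one simple cleaning rule; a real preprocessing pipeline would likely also filter irrelevant requests and identify users and sessions.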

17.
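A review of research on association rule mining   Cited by: 19 (self-citations: 0, citations by others: 19)
1 Introduction. In recent years, data mining (also known as knowledge discovery in databases, KDD) has attracted great attention from the information industry. Association rule mining, as an important form of data mining, has become a very important research topic in the field. It has important applications in business management, production control, market analysis, engineering design, and scientific exploration, and is gradually spreading to biomedicine, financial analysis, telecommunications, and other fields.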

18.
Shi  Chuan  Zhang  Zhiqiang  Ji  Yugang  Wang  Weipeng  Yu  Philip S.  Shi  Zhiping 《World Wide Web》2019,22(1):153-184
World Wide Web - Recently heterogeneous information network (HIN) analysis has attracted a lot of attention, and many data mining tasks have been exploited on HIN. As an important data mining task,...

19.
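Similarity mining of time series is an important part of data mining. Based on a study of similarity mining for hydrological time series, a similarity mining system built on J2EE component technology is designed and implemented. The system was tested experimentally, demonstrating its effectiveness and correctness.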

20.
Exceptional preferences mining (EPM) is a crossover between two subfields of data mining: local pattern mining and preference learning. EPM can be seen as a local pattern mining task that finds subsets of observations where some preference relations between labels significantly deviate from the norm. It is a variant of subgroup discovery, with rankings of labels as the target concept. We employ several quality measures that highlight subgroups featuring exceptional preferences, where the focus of what constitutes ‘exceptional’ varies with the quality measure: two measures look for exceptional overall ranking behavior, one measure indicates whether a particular label stands out from the rest, and a fourth measure highlights subgroups with unusual pairwise label ranking behavior. We explore a few datasets and compare with existing techniques. The results confirm that the new task EPM can deliver interesting knowledge.
