Found 20 similar documents; search took 15 ms
1.
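To meet the special requirements of frequent itemset mining in large-scale text databases, this paper proposes a new parallel mining algorithm, parFIM. Built on a simple data structure called H-Struct, parFIM partitions the data vertically to enable parallel mining. The algorithm also removes short patterns and reduces duplicate patterns. Experimental results show that parFIM is well suited to frequent itemset mining in large-scale text databases.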
2.
Liping Ji Kian-Lee Tan Tung A.K.H. 《Knowledge and Data Engineering, IEEE Transactions on》2007,19(9):1175-1187
This paper addresses the problem of finding frequent closed patterns (FCPs) from very dense data sets. We introduce two compressed hierarchical FCP mining algorithms: C-Miner and B-Miner. The two algorithms compress the original mining space, hierarchically partition the whole mining task into independent subtasks, and mine each subtask progressively. The two algorithms adopt different task partitioning strategies: C-Miner partitions the mining task based on Compact Matrix Division, whereas B-Miner partitions the task based on Base Rows Projection. The compressed hierarchical mining algorithms enhance the mining efficiency and facilitate a progressive refinement of results. Moreover, because the subtasks can be mined independently, C-Miner and B-Miner can be readily parallelized without incurring significant communication overhead. We have implemented C-Miner and B-Miner, and our performance study on synthetic data sets and real dense microarray data sets shows their effectiveness over existing schemes. We also report experimental results on parallel versions of these two methods.
3.
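Research on Neural Networks and Data Mining of Nonlinear Patterns  Total citations: 1, self-citations: 2, citations by others: 1
This paper discusses some theoretical advances in applying artificial intelligence techniques to data mining. Rule extraction for nonlinear patterns is a major data mining task, yet few effective methods currently exist. The paper focuses on a model designed specifically for mining nonlinear-pattern data, and gives a brief algorithm and an example.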
4.
5.
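Sequential pattern mining is an important data mining task, and the Apriori algorithm is an effective method for mining association rules. This paper describes how to apply the Apriori algorithm to sequential pattern mining.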
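As a point of reference for the entry above, the following is a minimal sketch of the classic Apriori level-wise search on plain itemsets. It is not the sequential-pattern adaptation the paper describes, whose details the abstract does not give. The function name `apriori`, the absolute-count threshold `min_support`, and the toy transactions are assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    min_support is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count every single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        candidates = set()
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count step: one pass over the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Toy data, hypothetical and for illustration only.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"},
                {"b", "c"}, {"a", "b", "c"}]
result = apriori(transactions, min_support=3)
```

With these five toy transactions and a threshold of 3, each single item and each pair is frequent, while the triple `{a, b, c}` appears in only two transactions and is dropped. A sequential variant would additionally have to respect the order of items within each sequence, which this sketch does not attempt.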
6.
《Advances in Engineering Software》2007,38(5):295-300
The paper presents the implementation of an association rules discovery data mining task using Grid technologies. For the mining task we are using the Apriori algorithm on top of the Globus toolkit. The case study presents the design and integration of the data mining algorithm with the Globus services. The paper compares the Grid version with related work in the field and we outline the conclusions and future work.
7.
《Information Systems》2005,30(1):71-88
Many large organizations have multiple databases distributed in different branches, and therefore multi-database mining is an important task for data mining. To reduce the search cost in the data from all databases, we need to identify which databases are most likely relevant to a data mining application. This is referred to as database selection. For real-world applications, database selection has to be carried out multiple times to identify relevant databases that suit different applications. In particular, a mining task may be defined without reference to any specific application. In this paper, we present an efficient approach for classifying multiple databases based on their similarity to one another. Our approach is application-independent.
8.
Discovering shared conceptualizations in folksonomies  Total citations: 2, self-citations: 0, citations by others: 2
Robert Jäschke Andreas Hotho Christoph Schmitz Bernhard Ganter Gerd Stumme 《Journal of Web Semantics》2008,6(1):38-53
Social bookmarking tools are rapidly emerging on the Web. In such systems users are setting up lightweight conceptual structures called folksonomies. Unlike ontologies, shared conceptualizations are not formalized, but rather implicit. We present a new data mining task, the mining of all frequent tri-concepts, together with an efficient algorithm, for discovering these implicit shared conceptualizations. Our approach extends the data mining task of discovering all closed itemsets to three-dimensional data structures to allow for mining folksonomies. We provide a formal definition of the problem, and present an efficient algorithm for its solution. Finally, we show the applicability of our approach on three large real-world examples.
9.
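Association rules are an important topic in data mining research. Many algorithms assume that the underlying associations in the data are stable over time. In real-world domains, however, data have their own characteristics, and associations can change dramatically over time. Existing data mining algorithms do not account for such changes, which leads to severe performance degradation, especially when the mined association rules are used for classification and prediction. Although mining changes in associations is an important problem, since the future must be predicted from past historical data, existing data mining algorithms are not suited to this task. This paper introduces a fuzzy data mining algorithm to discover changes in association rules over time. Based on the mined fuzzy rules, it can predict how association rules will change in the future. Experiments demonstrate the effectiveness of the algorithm.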
10.
Truong Tin Duong Hai Le Bac Fournier-Viger Philippe Yun Unil 《Applied Intelligence》2022,52(6):6106-6128
Applied Intelligence - High utility sequence mining is a popular data mining task, which aims at finding sequences having a high utility (importance) in a quantitative sequence database. Though it...
11.
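A Meta-Information-Based Parallel Method for Mining Rough Set Rules  Total citations: 1, self-citations: 0, citations by others: 1
1. Introduction. In today's information age, data mining has become a research hotspot as a means of extracting useful knowledge from large volumes of accumulated historical data. The rough set theory proposed by Professor Pawlak, refined by many researchers, has become an important data mining tool. In big-data settings, the speed of a data mining method directly affects the performance of the whole data mining system, so how to speed up such methods effectively is a pressing problem. Meanwhile, computer networks hold large amounts of computing resources, and making full use of them is an effective way to speed up data mining methods. To this end, this paper proposes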
12.
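This paper describes the concept and characteristics of data mining, focuses on its methods and tasks, and, starting from actual teaching conditions, explains how data mining techniques can guide teaching.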
13.
Xin-Dong Wu 《Journal of Computer Science and Technology》2009,24(6):1018-1027
Due to the increasing availability and sophistication of data recording techniques, multiple information sources and distributed computing are becoming the important trends of modern information systems. Many applications such as security informatics and social computing require a ubiquitous data analysis platform so that decisions can be made rapidly under distributed and dynamic system environments. Although data mining has now been popularly used to achieve such goals, building a data mining system is, however, a nontrivial task, which may require a complete understanding of numerous data mining techniques as well as solid programming skills. Employing agent techniques for data analysis thus becomes increasingly important, especially for users not familiar with engineering and computational sciences, to implement an effective ubiquitous mining platform. Such data mining agents should, in practice, be intelligent, complete, and compact. In this paper, we present an interactive data mining agent, OIDM (online interactive data mining), which provides three categories (classification, association analysis, and clustering) of data mining tools, and interacts with the user to facilitate the mining process. The interactive mining is accomplished through interviewing the user about the data mining task to gain efficient and intelligent data mining control. OIDM can help users find appropriate mining algorithms, refine and compare the mining process, and finally achieve the best mining results. Such interactive data mining agent techniques provide alternative solutions to rapidly deploy data mining techniques to broader areas of data intelligence and knowledge informatics.
14.
Knowledge and Information Systems - Pattern mining is a fundamental data mining task with applications in several domains. In this work, we consider the scenario in which we have a sequence of...
15.
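Research on Algorithms for Mining Mutually Exclusive Relation Patterns  Total citations: 2, self-citations: 0, citations by others: 2
Sequential pattern mining is an important area of data mining, and structural relation pattern mining is a new mining task built on it. This paper focuses on an important branch of structural relation patterns, the mutually exclusive relation pattern. After giving the relevant concepts, it discusses two algorithms for mining mutually exclusive relation patterns: the basic detection method and the classified detection method. Experimental results show that both algorithms are effective and that, when the number of sequential patterns is large, the classified detection method is more efficient than the basic detection method. Like sequential pattern mining, structural relation pattern mining has significant value in practical applications: some hidden patterns that cannot be found during sequential pattern mining can be found among structural relation patterns. Research on mutually exclusive relation patterns will further support the refinement of structural relation pattern mining theory.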
16.
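Preprocessing of log information is an important stage of log mining and a focus of current research. It is the foundation of the whole log mining process and a prerequisite for applying effective mining algorithms. At present, log mining mainly relies on a few foreign software packages, and no domestic software exists for the data preprocessing that log mining requires. This paper introduces log mining within data mining, analyzes the data preprocessing process and how to implement it for log mining, and reports the successful conversion of IIS text log files to XLS and XML formats using the Delphi development tool, thereby implementing data preprocessing for log mining.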
17.
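A Review of Research on Association Rule Mining  Total citations: 19, self-citations: 0, citations by others: 19
1 Introduction. In recent years, data mining (also known as knowledge discovery in databases, KDD) has attracted great attention from the information industry. Association rule mining, an important form of data mining, has become a very important research topic in the field. It has important applications in business management, production control, market analysis, engineering design, and scientific exploration, and is now gradually spreading to biomedicine, financial analysis, telecommunications, and other fields.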
18.
Shi Chuan Zhang Zhiqiang Ji Yugang Wang Weipeng Yu Philip S. Shi Zhiping 《World Wide Web》2019,22(1):153-184
World Wide Web - Recently heterogeneous information network (HIN) analysis has attracted a lot of attention, and many data mining tasks have been exploited on HIN. As an important data mining task,...
19.
20.
Cláudio Rebelo de Sá Wouter Duivesteijn Paulo Azevedo Alípio Mário Jorge Carlos Soares Arno Knobbe 《Machine Learning》2018,107(11):1775-1807
Exceptional preferences mining (EPM) is a crossover between two subfields of data mining: local pattern mining and preference learning. EPM can be seen as a local pattern mining task that finds subsets of observations where some preference relations between labels significantly deviate from the norm. It is a variant of subgroup discovery, with rankings of labels as the target concept. We employ several quality measures that highlight subgroups featuring exceptional preferences, where the focus of what constitutes ‘exceptional’ varies with the quality measure: two measures look for exceptional overall ranking behavior, one measure indicates whether a particular label stands out from the rest, and a fourth measure highlights subgroups with unusual pairwise label ranking behavior. We explore a few datasets and compare with existing techniques. The results confirm that the new task EPM can deliver interesting knowledge.