20 similar documents found (search time: 15 ms)
1.
Eliseo Clementini, Paolino Di Felice, Krzysztof Koperski. Data & Knowledge Engineering, 2000, 34(3): 251-270
Spatial data mining, i.e., mining knowledge from large amounts of spatial data, is a demanding field because huge amounts of spatial data have been collected in various applications, ranging from remote sensing to geographical information systems (GIS), computer cartography, and environmental assessment and planning. The collected data far exceeds people's ability to analyze it. Thus, new and efficient methods are needed to discover knowledge from large spatial databases. Most spatial data mining methods do not take into account the uncertainty of spatial information. In our work, we use objects with broad boundaries, a concept that absorbs all the uncertainty by which spatial data is commonly affected and allows computations in the presence of uncertainty without rough simplifications of reality. The topological relations between objects with a broad boundary can be organized into a three-level concept hierarchy. We developed and implemented a method for efficiently determining such topological relations. Based on the hierarchy of topological relations, we present a method for mining spatial association rules for objects with uncertainty. A progressive refinement approach is used to optimize the mining process.
2.
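Association analysis is an important research topic in data mining. This paper studies fuzzy association rule mining. Because ordinary association rules cannot accurately express the associations among fuzzy information in a database, a new fuzzy association rule mining algorithm, FARM_New, is proposed. The results show that the algorithm is effective and speeds up fuzzy mining.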
3.
Anthony J.T. Lee, Ying-Ho Liu, Hsiu-Hui Lin. Journal of Systems and Software, 2009, 82(4): 603-618
In this paper, we propose a novel algorithm, called 9DSPA-Miner, to mine frequent patterns from an image database in which every image is represented by the 9D-SPA representation. Our method consists of three phases. First, we scan the database once and create an index structure. Next, we scan the index structure to find all frequent patterns of length two. Finally, we use the frequent k-patterns (k ≥ 2) to generate candidate (k + 1)-patterns and use the index structure to check whether the support of each candidate is at least the user-specified minimum support threshold. The steps of the third phase are repeated until no more frequent patterns can be found. Since 9DSPA-Miner uses the characteristics of the 9D-SPA representation to prune most impossible candidates, the experimental results demonstrate that it is more efficient and scalable than the modified Apriori method.
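The third phase follows the familiar Apriori-style level-wise loop. The sketch below shows that loop on plain itemset transactions under stated assumptions: it does not model the 9D-SPA representation or the paper's index structure, support is counted by a direct scan where 9DSPA-Miner would consult its index, and the function name `level_wise_frequent` is hypothetical.

```python
from itertools import combinations


def level_wise_frequent(transactions, min_support):
    """Apriori-style level-wise search: frequent k-patterns seed (k+1)-candidates."""
    transactions = [frozenset(t) for t in transactions]

    def support(pattern):
        # Number of transactions that contain every item of the pattern.
        return sum(1 for t in transactions if pattern <= t)

    items = sorted({i for t in transactions for i in t})
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = set(level)
    k = 1
    while level:
        # Join: union two frequent k-patterns when the result has k + 1 items.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune: every k-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k))}
        # Check each surviving candidate against the minimum support threshold.
        level = {c for c in candidates if support(c) >= min_support}
        frequent |= level
        k += 1
    return frequent
```

For example, with the five transactions `{a, b, c}`, `{a, b}`, `{a, c}`, `{b, c}`, `{a, b, c}` and a minimum support of 3, the result is the three single items and the three pairs; `{a, b, c}` occurs in only two transactions and is rejected.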
4.
5.
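An Efficient Distributed Parallel Algorithm for Mining Association Rules (total citations: 2; self-citations: 2; citations by others: 2)
A fast and efficient association rule mining algorithm based on a distributed architecture is proposed. The nodes compute in parallel, and compared with related algorithms the method effectively reduces both the communication volume and the number of candidate itemsets. The algorithm scales well and is simple to implement.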
6.
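In relational databases, missing data is often unavoidable. The key problem in mining association rules from incomplete databases is how to estimate the support and confidence of the rules. Two methods for estimating them in incomplete databases are presented and briefly compared.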
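The abstract does not say which two estimation methods are compared. As a hedged illustration only, one simple way to bracket the support of a pattern under missing values is a lower bound (treat every missing value as a mismatch) and an upper bound (treat every missing value as a match). The function `support_bounds` and the record layout (dicts with `None` for missing values) are hypothetical and are not the paper's estimators.

```python
def support_bounds(records, pattern):
    """Pessimistic and optimistic support of `pattern` when values may be missing.

    `records` are dicts mapping attribute -> value, with None marking a missing
    value; `pattern` maps attribute -> required value.
    """
    lower = upper = 0
    for record in records:
        values = {attr: record.get(attr) for attr in pattern}
        # A known value that differs from the pattern rules the record out.
        if any(v is not None and v != pattern[attr] for attr, v in values.items()):
            continue
        upper += 1  # could match if the missing values were filled in favorably
        if all(v is not None for v in values.values()):
            lower += 1  # matches on fully known values
    n = len(records)
    return lower / n, upper / n
```

Any estimate of the true support, however it is computed, should fall between these two values, which makes the bounds a convenient sanity check.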
7.
Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. In real-world applications, transactions may contain quantitative values, and each item may have a lifespan in a temporal database. In this paper, we therefore propose a data mining algorithm for deriving fuzzy temporal association rules. It first transforms each quantitative value into a fuzzy set using the given membership functions. Meanwhile, item lifespans are collected and recorded in a temporal information table through a transformation process. The algorithm then calculates the scalar cardinality of each linguistic term of each item. A mining process based on fuzzy counts and item lifespans is then performed to find fuzzy temporal association rules. Finally, experiments on two simulated datasets and the foodmart dataset show the effectiveness and efficiency of the proposed approach.
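The fuzzification step and the scalar-cardinality step can be sketched as below, assuming triangular membership functions. The three linguistic terms and their breakpoints are hypothetical, and the sketch leaves out the item-lifespan bookkeeping of the temporal information table.

```python
def triangular(a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c.

    Setting a == b (or b == c) makes the left (or right) side vertical.
    """
    def mu(x):
        if x <= a or x >= c:
            return 1.0 if x == b else 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu


# Hypothetical linguistic terms for a purchased quantity.
QUANTITY_TERMS = {
    "Low": triangular(0, 0, 4),
    "Middle": triangular(2, 6, 10),
    "High": triangular(6, 10, 10),
}


def fuzzify(quantity, terms):
    """Map one quantitative value to a fuzzy set {term: membership}."""
    return {name: mu(quantity) for name, mu in terms.items()}


def scalar_cardinalities(quantities, terms):
    """Fuzzy count of each term: sum of its memberships over all transactions."""
    totals = {name: 0.0 for name in terms}
    for q in quantities:
        for name, degree in fuzzify(q, terms).items():
            totals[name] += degree
    return totals
```

For an item bought in quantities 2, 5 and 8, the scalar cardinalities are 0.5 for Low, 1.25 for Middle and 0.5 for High; these fuzzy counts play the role that integer counts play in crisp mining.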
8.
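Association Rule Mining Based on Scale Reduction and Multiple Supports (total citations: 1; self-citations: 0; citations by others: 1)
The classic algorithm for association rule mining is Apriori, but it has two prominent problems: it scans the transaction database many times, and it uses a single minimum support. As a result, search time grows with the size of the transaction database, and redundant rules are produced or valid rules are discarded. Previous improvements addressed only one of these problems. This paper considers both together and presents an association rule mining algorithm based on scale reduction and multiple minimum supports. Analysis and experiments show improved efficiency.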
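The abstract does not define how the multiple supports are assigned or how the database is reduced. The sketch below assumes one common convention, in which each item has its own minimum support (MIS) and an itemset must reach the lowest MIS among its items, and it reduces scale by dropping transactions that can no longer contribute to larger frequent itemsets. For brevity it counts every k-subset of each transaction instead of generating candidates. `mine_with_mis` and the `mis` dictionary are hypothetical.

```python
from itertools import combinations


def mine_with_mis(transactions, mis):
    """Level-wise mining with per-item minimum supports and database reduction.

    `mis` maps every item to its own minimum support count. An itemset counts
    as frequent when its support reaches the smallest MIS among its items.
    """
    db = [frozenset(t) for t in transactions]
    frequent = {}
    k = 1
    while db:
        # Count every k-itemset contained in the (possibly reduced) database.
        counts = {}
        for t in db:
            for itemset in combinations(sorted(t), k):
                counts[itemset] = counts.get(itemset, 0) + 1
        level = {s: n for s, n in counts.items() if n >= min(mis[i] for i in s)}
        if not level:
            break
        frequent.update(level)
        # Scale reduction: a transaction that is too short, or that contains no
        # frequent k-itemset, cannot contain a frequent (k+1)-itemset.
        db = [t for t in db if len(t) > k and any(set(s) <= t for s in level)]
        k += 1
    return frequent
```

Under the lowest-MIS convention, every frequent (k+1)-itemset contains a frequent k-subset (the one that keeps its lowest-MIS item), which is why the reduction never discards a transaction that a later level needs.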
9.
We propose a methodology that upgrades the methods of Lagrangian analysis of surface sea-water parcels. This methodology includes data mining with efficient visualization techniques, namely spatial–temporal association rules and multi-level directed graphs with different levels of space and time granularity. In the resulting multi-level directed graphs, we can intertwine knowledge from various disciplines related to oceanography (in our application) and mine such graphs. We evaluate the proposed methodology on Lagrangian tracking of virtual particles in the velocity field of the numerical model called the Mediterranean Ocean Forecasting Model (MFS). We describe an efficient algorithm based on label propagation clustering, which finds cycles and paths in multi-level directed graphs and reveals how the number and size of the cycles depend on the seasons. In addition, we present three interesting results of visualizing and mining such graphs: the 12-month periodicity of the exchange of water masses among sea areas; the separation of Mediterranean Sea circulation into summer and winter situations, obtained with hierarchical clustering of multi-level directed graphs; and, through visualization with multi-level directed graphs, confirmation of the reversal of sea circulation in the Ionian Sea over the last decades. These results received a very favorable evaluation from oceanographic experts.
10.
Sequential rule mining is an important data mining task used in a wide range of applications. However, current algorithms for discovering sequential rules common to several sequences use very restrictive definitions of sequential rules, which make them unable to recognize that similar rules can describe the same phenomenon. This can have many undesirable effects, such as (1) similar rules that are rated differently, (2) rules that are not found because they are considered uninteresting when taken individually, and (3) rules that are too specific, which makes them less likely to be used for making predictions. In this paper, we address these problems by proposing a more general form of sequential rules in which the items in the antecedent and in the consequent of each rule are unordered. We propose an algorithm named CMRules for mining this form of rules. The algorithm first finds association rules to prune the search space for items that occur jointly in many sequences. It then eliminates association rules that do not meet the minimum confidence and support thresholds according to the sequential ordering. We evaluate the performance of CMRules in three ways. First, we provide an analysis of its time complexity. Second, we compare its performance (in terms of execution time, memory usage, and scalability) with an adaptation of an algorithm from the literature that we name CMDeo. For this comparison, we use three real-life public datasets that have different characteristics and represent three kinds of data. In many cases, the results show that CMRules is faster than CMDeo and scales better for low support thresholds. Lastly, we report a successful application of the algorithm in a tutoring agent.
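The second CMRules step checks rules "according to the sequential ordering". A minimal sketch of those sequential measures follows, assuming that a rule X ⇒ Y with unordered sides holds in a sequence when every item of X appears before every item of Y, that sequential support is the fraction of sequences in which the rule holds, and that sequential confidence divides that count by the number of sequences containing all of X. The function names are hypothetical, and the association-rule pruning phase is omitted.

```python
def completion_index(sequence, items):
    """Index of the itemset at which every item in `items` has appeared, else None."""
    seen = set()
    for index, itemset in enumerate(sequence):
        seen |= items & itemset
        if items <= seen:
            return index
    return None


def occurs_before(sequence, antecedent, consequent):
    """True if all antecedent items appear before all consequent items."""
    end = completion_index(sequence, antecedent)
    # Completing the antecedent as early as possible leaves the longest suffix.
    return end is not None and completion_index(sequence[end + 1:], consequent) is not None


def sequential_support_confidence(sequences, antecedent, consequent):
    x, y = frozenset(antecedent), frozenset(consequent)
    with_rule = sum(1 for s in sequences if occurs_before(s, x, y))
    with_x = sum(1 for s in sequences if completion_index(s, x) is not None)
    support = with_rule / len(sequences)
    confidence = with_rule / with_x if with_x else 0.0
    return support, confidence
```

Because the antecedent is a set, `{a, b} ⇒ {c}` is matched both in a sequence where `a` precedes `b` and in one where `b` precedes `a`, which is exactly the generalization the abstract describes.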
11.
In this paper, we propose a new algorithm named Parallel Multipass with Inverted Hashing and Pruning (PMIHP) for mining association rules between words in text databases. The characteristics of text databases are quite different from those of retail transaction databases, and existing mining algorithms cannot handle text databases efficiently because of the large number of itemsets (i.e., sets of words) that need to be counted. The new PMIHP algorithm is a parallel version of our Multipass with Inverted Hashing and Pruning (MIHP) algorithm (Holt, Chung in: Proc of the 14th IEEE int’l conf on tools with artificial intelligence, 2002, pp 49–56), which was shown to be more efficient than other existing algorithms for mining text databases. PMIHP reduces the communication overhead between miners running on different processors because the miners mine their local databases asynchronously and prune the global candidates using the Inverted Hashing and Pruning technique. Compared with the well-known Count Distribution algorithm (Agrawal, Shafer in: (1996) IEEE Trans Knowl Data Eng 8(6):962–969), PMIHP demonstrates superior performance for mining association rules in large text databases, and when the minimum support level is low, its speedup is superlinear as the number of processors increases. The experiments were performed on a cluster of Linux workstations using a collection of Wall Street Journal articles.
This research was supported in part by Ohio Board of Regents, LexisNexis, and AFRL/Wright Brothers Institute (WBI).
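The Count Distribution baseline that PMIHP is compared against can be sketched as follows: each miner counts candidate word sets over its own partition, and the local counts are summed to obtain global supports. This is a sequential simulation under stated assumptions (no real processors or message passing), it treats every k-word set in a document as a candidate for brevity, and it does not implement PMIHP's inverted hashing and pruning; the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def local_counts(partition, k):
    """One miner: count every k-word set occurring in its local documents."""
    counts = Counter()
    for doc in partition:
        counts.update(combinations(sorted(set(doc)), k))
    return counts


def count_distribution(partitions, k, min_support):
    """Sum the local counts (the all-reduce step) and keep globally frequent sets."""
    total = Counter()
    for partition in partitions:
        total.update(local_counts(partition, k))
    return {words: n for words, n in total.items() if n >= min_support}
```

The summing loop stands in for the count exchange that Count Distribution performs after every pass; reducing that exchange is where PMIHP claims its advantage.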
12.
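Gong Yu (宫雨). Computer Engineering and Design (计算机工程与设计), 2007, 28(24): 5838-5840
Constrained association rules are an important topic in association rule research. Most existing work focuses on single-variable constraints, while two-variable constraints, which also matter in practice, have received less attention. To address this, the paper formulates the problem of association rules with lower-bound constraints among two-variable constraints. It defines lower-bound constraints, analyzes the properties of frequent itemsets that satisfy them, and gives the related proofs. Finally, it proposes an FP-Tree-based lower-bound constraint algorithm that uses pre-testing to reduce the number of itemsets to be tested and the computational cost. Experimental results show that the algorithm is highly efficient.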
13.
FARICS: a method of mining spatial association rules and collocations using clustering and Delaunay diagrams (total citations: 2; self-citations: 0; citations by others: 2)
The paper presents problems pertaining to spatial data mining. Based on existing solutions, a new method of knowledge extraction in the form of spatial association rules and collocations has been developed and is proposed herein. A Delaunay diagram is used to determine neighborhoods, and spatial association rules and collocations are defined on the basis of this neighborhood notion. A novel algorithm for finding spatial rules and collocations is presented. The approach eliminates the parameters that define the neighborhood of objects, thus avoiding multiple "trial and error" repetitions of the mining process for various parameter values. The method has been implemented and tested, and the experimental results are discussed.
14.
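This paper adopts a Boolean-matrix-based algorithm for mining frequent itemsets. The algorithm finds frequent itemsets directly through bitwise AND operations on the row vectors of the support matrix, without Apriori's join and prune steps. By repeatedly compressing the support matrix, it both saves storage space and improves efficiency.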
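The bitwise-AND idea can be sketched as follows, assuming Python integers as bit vectors (one row per item, one bit per transaction). Extending each frequent itemset only with larger frequent items replaces Apriori's join and prune steps; dropping infrequent rows is a simple stand-in for the paper's matrix compression, whose exact rules the abstract does not give. All names are hypothetical.

```python
def popcount(bits):
    """Number of set bits, i.e. the number of transactions covered."""
    return bin(bits).count("1")


def item_rows(transactions):
    """Support matrix as one bit vector per item: bit j set if transaction j has it."""
    rows = {}
    for j, transaction in enumerate(transactions):
        for item in transaction:
            rows[item] = rows.get(item, 0) | (1 << j)
    return rows


def frequent_by_bitwise_and(transactions, min_support):
    rows = item_rows(transactions)
    # Compress the matrix: keep only rows of individually frequent items.
    rows = {i: v for i, v in rows.items() if popcount(v) >= min_support}
    order = sorted(rows)
    level = {(i,): rows[i] for i in order}
    result = {}
    while level:
        result.update({s: popcount(v) for s, v in level.items()})
        nxt = {}
        for itemset, bits in level.items():
            for item in order:
                if item > itemset[-1]:
                    joint = bits & rows[item]  # transactions containing both
                    if popcount(joint) >= min_support:
                        nxt[itemset + (item,)] = joint
        level = nxt
    return result
```

Each itemset carries its own bit vector, so the support of an extension costs a single AND and a bit count rather than another database scan.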
15.
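An Improvement to the Apriori Algorithm in Association Rule Mining (total citations: 2; self-citations: 0; citations by others: 2)
Based on a detailed analysis of association rule mining algorithms, a dynamic frequent itemset mining algorithm based on an undirected itemset graph is proposed. When the transaction database or the minimum support changes, the algorithm obtains the new frequent itemsets by traversing the undirected itemset graph only once more. The algorithm is simple, requires only one database scan, searches quickly, and saves memory.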
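A heavily simplified sketch of the "scan once, re-query on change" idea follows. It assumes the graph stores item counts on nodes and pair co-occurrence counts on edges, so it only answers for 1- and 2-itemsets; the abstract does not describe how the paper's graph and traversal handle longer itemsets. `ItemsetGraph` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations


class ItemsetGraph:
    """Undirected graph of items: node weights are item counts and edge weights
    are pair co-occurrence counts, built in a single pass over the data."""

    def __init__(self, transactions=()):
        self.nodes = Counter()
        self.edges = Counter()
        for transaction in transactions:
            self.add(transaction)

    def add(self, transaction):
        """Incorporate one new transaction without rescanning the old ones."""
        items = sorted(set(transaction))
        self.nodes.update(items)
        self.edges.update(combinations(items, 2))

    def frequent(self, min_support):
        """Frequent 1- and 2-itemsets for any threshold, read from the graph alone."""
        result = {(i,): n for i, n in self.nodes.items() if n >= min_support}
        result.update({e: n for e, n in self.edges.items() if n >= min_support})
        return result
```

Changing the minimum support only requires calling `frequent` again, and a new transaction only requires `add`; neither touches the original transactions.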
16.
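Distributed data mining usually requires large amounts of data exchange between nodes. To avoid this, the algorithm builds on the locality principle of association rules proposed by Zhang Chunsheng and Song Linlin and performs no data exchange: local global association rules are obtained directly by mining at each node, and non-local global association rules are obtained directly by merging the rules of the nodes. The algorithm is simple and easy to implement, needs no data exchange between nodes, and improves mining efficiency. It finds not only the global association rules found by other distributed mining algorithms but also local global rules that those algorithms cannot discover.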
17.
18.
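As the information age develops, data mining technology has matured and meets the need to process large amounts of information. Data mining is increasingly applied in finance, telecommunications, transportation, and many other industries, but applications in education remain relatively rare. To address this, association rule mining, cluster mining, and other algorithms are applied on top of traditional analysis methods to study adolescents' peer relationships, interpersonal relationships, internet addiction, and related aspects. A series of conclusions is obtained, for example that the closeness of peer relationships differs between adolescents of different genders, and new prospects for applying data mining in education are proposed.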
19.
In this work, we develop techniques for discovering patterns with periodicity. Patterns with periodicity are those that occur at regular time intervals, so the problem has two aspects: finding the pattern and determining the periodicity. The difficulty of the task lies in discovering these regular time intervals, i.e., the periodicity. Periodicities in a database are usually imprecise and subject to disturbances, and they may occur at time intervals in multiple time granularities. To overcome these difficulties and discover patterns with fuzzy periodicity, we propose the fuzzy periodic calendar, which defines fuzzy periodicities. Furthermore, we develop algorithms for mining fuzzy periodicities and the fuzzy periodic association rules within them. Experimental results show that our method is effective in discovering fuzzy periodic association rules.
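The abstract does not specify the fuzzy periodic calendar. As a hedged illustration of what a fuzzy periodicity can mean, the sketch below scores how well a series of occurrence times fits a period of "about p": each gap between consecutive occurrences receives a triangular membership degree, and the degrees are averaged. The function name, the triangular shape, and the averaging are all assumptions, not the paper's definitions.

```python
def fuzzy_period_degree(timestamps, period, tolerance):
    """Mean degree to which consecutive gaps match the fuzzy period 'about period'.

    A gap of exactly `period` scores 1; the score falls linearly to 0 at
    `period - tolerance` and `period + tolerance` (tolerance must be > 0).
    """
    times = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if not gaps:
        return 0.0
    degrees = [max(0.0, 1.0 - abs(gap - period) / tolerance) for gap in gaps]
    return sum(degrees) / len(degrees)
```

With a period of 7 and a tolerance of 2, occurrences at times 0, 7, 15 and 21 (gaps of 7, 8 and 6) score about 0.67, whereas a crisp periodicity test would reject the series outright because two of the gaps miss 7 exactly.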
20.
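Mining complex, large-scale datasets in biological information networks suffers from low mining accuracy, slow running speed, and high memory usage. To address these problems, a multidimensional data mining algorithm for biological information networks based on association rule mapping is proposed. The algorithm uses the association mapping relationships between network datasets to determine their association rules, and it introduces a mining factor and a relative error to improve mining accuracy. Subspaces, and the datasets within each subspace, are distinguished according to the degree of association between datasets in the multidimensional subspaces, enabling effective mining of different datasets. Simulations measure the algorithm's memory usage, mining accuracy, and running time for different numbers of datasets. The results show that the association-rule-mapping-based algorithm effectively improves mining accuracy and also has some advantages in reducing memory usage and increasing computation speed.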