Similar Documents
A total of 20 similar documents were found (search time: 218 ms).
1.
MSMiner: A Multi-Strategy General-Purpose Data Mining Tool   Total citations: 6 (self: 0, others: 6)
This paper presents the design and implementation of MSMiner, a multi-strategy general-purpose data mining tool. MSMiner is built on top of a data warehouse and uses an object-oriented approach to describe metadata about data sources, mining algorithms, mining steps, and users. The system integrates multiple data mining algorithms, including decision trees, association rules, traditional statistical analysis, cluster analysis, neural networks, and visualization, and generates and executes data mining and decision-support tasks in the form of task models. It supports data sources such as databases, data warehouses, text, and Web pages; mining algorithms can be added dynamically; data and mining strategies are organized flexibly and effectively; and the tool offers good extensibility and generality.

2.
An Improved Algorithm Based on Apriori   Total citations: 16 (self: 1, others: 15)
Association rule mining is an important research topic in data mining. This paper studies the Apriori algorithm for association rule mining in depth, points out some of its shortcomings, and proposes an improved algorithm.
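For reference, below is a minimal sketch of the classic Apriori level-wise candidate-generation loop that such improvements start from; the transaction data, minimum support value, and helper names are illustrative assumptions rather than anything taken from the paper.
```python
def apriori(transactions, min_support):
    """Minimal Apriori: grow frequent itemsets level by level, pruning by support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items
    items = {i for t in transactions for i in t}
    frequent = [{frozenset([i]) for i in items if support(frozenset([i])) >= min_support}]

    k = 2
    while frequent[-1]:
        prev = frequent[-1]
        # Candidate generation: join (k-1)-itemsets, then prune by support
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        frequent.append({c for c in candidates if support(c) >= min_support})
        k += 1
    return [s for level in frequent for s in level]

# Toy market-basket data (illustrative only)
baskets = [["milk", "bread"], ["milk", "beer"], ["milk", "bread", "beer"], ["bread"]]
print(apriori(baskets, min_support=0.5))
```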

3.
Research and Development of Spatial Data Mining   Total citations: 19 (self: 0, others: 19)
Data mining research has expanded from relational and transactional databases to spatial databases. Spatial data mining, the technique of discovering knowledge in large volumes of spatial data, is a promising field. This paper surveys research achievements in spatial data mining, outlines its architecture, query languages, and related methods, and discusses open problems and directions for future development.

4.
Research on a Method for Discovering Minimal Inductive Dependencies Based on Rough Set Theory   Total citations: 4 (self: 0, others: 4)
Cheng Yan, Huang Tiyun. Computer Engineering, 2000, 26(3): 26-27, 48
Inductive dependencies are an important concept in database research, and automatically discovering minimal inductive dependencies in a database is of great significance for data mining. This paper introduces the concept and principles of inductive dependencies, describes how Rough Set theory can be used to measure the strength of inductive dependency between data attributes, and proposes an algorithm for automatically discovering minimal inductive dependencies in a database.
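As background, a standard Rough Set measure of how strongly an attribute set Q depends on an attribute set P is the dependency degree gamma_P(Q) = |POS_P(Q)| / |U|, computed from equivalence classes. The toy decision table and function names below are illustrative assumptions; this is a minimal sketch of the measure itself, not the paper's discovery algorithm.
```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(i)
    return list(classes.values())

def dependency_degree(rows, p_attrs, q_attrs):
    """gamma_P(Q): fraction of rows whose P-equivalence class fits inside one Q-class."""
    q_classes = partition(rows, q_attrs)
    positive = 0
    for p_class in partition(rows, p_attrs):
        if any(p_class <= q_class for q_class in q_classes):
            positive += len(p_class)
    return positive / len(rows)

# Toy decision table: condition attributes 'a', 'b'; decision attribute 'd'
table = [
    {"a": 1, "b": 0, "d": "yes"},
    {"a": 1, "b": 1, "d": "yes"},
    {"a": 0, "b": 1, "d": "no"},
    {"a": 0, "b": 1, "d": "yes"},
]
print(dependency_degree(table, ["a", "b"], ["d"]))  # 0.5: the last two rows are indiscernible on {a, b} but differ on d
```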

5.
This paper introduces the basic concepts of data warehousing and data mining, discusses approaches to implementing data mining, and identifies several problems that arise when mining within a data warehouse, along with countermeasures.

6.
A Review of Data Mining Techniques   Total citations: 33 (self: 2, others: 33)
The situation of being rich in data but poor in knowledge led to the emergence of data mining, which within just a few years has attracted great interest from people in many different fields. Its applications have also become increasingly broad: from traditional expert systems to today's most popular Internet services, data mining techniques are needed to cope with the ever-growing scale of databases. To give a clear and intuitive picture of data mining, this paper divides it into four categories according to the kind of knowledge mined and presents some representative and relatively new techniques in each category.

7.
Research on Data Mining Techniques   Total citations: 14 (self: 0, others: 14)
Shao Sheng, Bai Suhuai. Microcomputer Development, 1999, 9(3): 51-52
This paper first discusses the concepts of knowledge discovery and data mining, then introduces the methods used in data mining and its application areas, and finally points out that data mining is a new type of database technology with broad application prospects.

8.
A Rough Set-Based Method for Spatial Data Classification   Total citations: 18 (self: 1, others: 17)
Shi Yun, Sun Yufang, Zuo Chun. Journal of Software, 2000, 11(5): 673-678
Recently, data mining research has expanded from relational and transactional databases to spatial databases. Spatial data mining is a promising field, but research on spatial data classification is still at an early stage. This paper analyzes and compares the strengths and weaknesses of several existing spatial data classification methods and proposes a three-stage spatial classification process based on Rough Sets. Experimental results show that the algorithm is effective for problems involving incomplete spatial information.

9.
An Efficient Parallel Algorithm for Mining Association Rules   Total citations: 33 (self: 1, others: 32)
Mining association rules is an important problem in the field of data mining. This paper briefly reviews the association rule mining problem, presents a method for improving the efficiency of sequential association rule mining, analyzes the strengths and weaknesses of existing parallel association rule mining algorithms, designs PMAR, a relatively efficient parallel algorithm for mining association rules, and compares it with other corresponding algorithms. Experiments show that PMAR is effective.
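For orientation, one common way to parallelize the counting phase of association rule mining (a count-distribution style) is to let each worker count candidates on its own data partition and then merge the partial counts. The sketch below, with made-up data and Python's multiprocessing pool, illustrates only that general pattern and is not the PMAR algorithm.
```python
from collections import Counter
from itertools import combinations
from multiprocessing import Pool

def local_counts(partition, k=2):
    """Count k-itemsets within one data partition."""
    counts = Counter()
    for transaction in partition:
        for itemset in combinations(sorted(set(transaction)), k):
            counts[itemset] += 1
    return counts

def parallel_counts(partitions, workers=2):
    """Count k-itemsets on each partition in parallel, then merge the partial counts."""
    with Pool(workers) as pool:
        partials = pool.map(local_counts, partitions)
    total = Counter()
    for c in partials:
        total.update(c)
    return total

if __name__ == "__main__":
    data = [
        [["milk", "bread"], ["milk", "beer", "bread"]],   # partition held by worker 1
        [["bread", "beer"], ["milk", "bread"]],           # partition held by worker 2
    ]
    print(parallel_counts(data))
```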

10.
A Method for Mining Multi-Concept-Level Quantitative Association Rules   Total citations: 2 (self: 0, others: 2)
Data mining has become an important research topic in artificial intelligence, databases, and related fields. It is a new data-processing technique for automatically and efficiently extracting unknown, usable, credible, and understandable knowledge from large amounts of data. To aid understanding, mining results can be expressed in terms of concepts that people are familiar with. The concepts in a given domain, through their intension and extension, often form certain relationships. In data mining there are two main ways of describing such relationships: concept lattices and concept hierarchies. Concept lattices mainly…

11.
Mining very large databases   Total citations: 1 (self: 0, others: 1)
Ganti V., Gehrke J., Ramakrishnan R. Computer, 1999, 32(8): 38-45
Established companies have had decades to accumulate masses of data about their customers, suppliers, products and services, and employees. Data mining, also known as knowledge discovery in databases, gives organizations the tools to sift through these vast data stores to find the trends, patterns, and correlations that can guide strategic decision making. Traditionally, algorithms for data analysis assume that the input data contains relatively few records. Current databases, however, are much too large to be held in main memory. To be efficient, the data mining techniques applied to very large databases must be highly scalable. An algorithm is said to be scalable if, given a fixed amount of main memory, its runtime increases linearly with the number of records in the input database. Recent work has focused on scaling data mining algorithms to very large data sets. The authors describe a broad range of algorithms that address three classical data mining problems: market basket analysis, clustering, and classification.

12.
To address the special requirements of frequent itemset mining in large text databases, this paper proposes a new parallel mining algorithm, parFIM. Built on a simple data structure called H-Struct, parFIM partitions the data vertically to achieve parallel mining, while also removing short patterns and reducing duplicate patterns. Experimental results show that parFIM is well suited to frequent itemset mining tasks over large text databases.
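As general background, vertical mining keeps a transaction-ID list for each item and intersects those lists to count larger itemsets, which is the broad idea behind vertical data partitioning. The sketch below uses made-up data and function names and is not a description of H-Struct or parFIM.
```python
from collections import defaultdict

def build_tidlists(transactions):
    """Vertical layout: map each item to the set of transaction IDs containing it."""
    tidlists = defaultdict(set)
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidlists[item].add(tid)
    return tidlists

def frequent_pairs(transactions, min_count):
    """Count 2-itemsets by intersecting tid-lists instead of rescanning the data."""
    tidlists = build_tidlists(transactions)
    items = sorted(tidlists)
    result = {}
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            common = tidlists[a] & tidlists[b]
            if len(common) >= min_count:
                result[(a, b)] = len(common)
    return result

docs = [["data", "mining", "text"], ["text", "mining"], ["data", "text"], ["data", "mining"]]
print(frequent_pairs(docs, min_count=2))  # each of the three pairs occurs in 2 documents here
```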

13.
Design and Implementation of an Incremental Clustering Algorithm Based on Cluster Features   Total citations: 2 (self: 0, others: 2)
For large databases such as spatial and multimedia databases, the effectiveness and scalability of traditional clustering algorithms are limited. Using a dynamic, incremental approach, and building on density-based and adaptive density-reachable clustering algorithms, this paper applies the notion of clustering features from the BIRCH algorithm to design and implement a new dynamic incremental clustering algorithm, addressing the effectiveness as well as the space and time complexity problems of clustering large databases. Theoretical analysis and experimental results show that the algorithm handles large databases effectively and gives the clustering process good scalability.
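For background, a BIRCH-style clustering feature summarizes a cluster as the triple CF = (N, LS, SS) (point count, linear sum, and square sum), which can absorb new points and be merged additively without revisiting old data. The class below is a minimal illustrative sketch of that bookkeeping only, not the incremental algorithm proposed in the paper.
```python
import math

class ClusterFeature:
    """BIRCH-style summary of a cluster: CF = (N, LS, SS)."""

    def __init__(self, dim):
        self.n = 0                    # number of points absorbed
        self.ls = [0.0] * dim         # linear sum of the points
        self.ss = 0.0                 # sum of squared norms of the points

    def add(self, point):
        """Absorb one new point incrementally; old points never need to be re-read."""
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, point)]
        self.ss += sum(x * x for x in point)

    def merge(self, other):
        """CFs are additive, so two sub-clusters can be merged from their summaries alone."""
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss += other.ss

    def centroid(self):
        return [x / self.n for x in self.ls]

    def radius(self):
        """Average distance of points to the centroid, derived from N, LS, SS only."""
        c = self.centroid()
        return math.sqrt(max(self.ss / self.n - sum(x * x for x in c), 0.0))

cf = ClusterFeature(dim=2)
for p in [(1.0, 2.0), (2.0, 2.0), (1.5, 1.0)]:
    cf.add(p)
print(cf.centroid(), cf.radius())
```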

14.
Approaches for scaling DBSCAN algorithm to large spatial databases   Total citations: 7 (self: 0, others: 7)
The huge amount of information stored in databases owned by corporations (e.g., retail, financial, telecom) has spurred a tremendous interest in the area of knowledge discovery and data mining. Clustering, in data mining, is a useful technique for discovering interesting data distributions and patterns in the underlying data, and has many application fields, such as statistical data analysis, pattern recognition, image processing, and other business applications. Although researchers have been working on clustering algorithms for decades, and a lot of clustering algorithms have been developed, there is still no efficient algorithm for clustering very large databases and high-dimensional data. As an outstanding representative of clustering algorithms, DBSCAN shows good performance in spatial data clustering. However, for large spatial databases, DBSCAN requires a large amount of memory and can incur substantial I/O costs because it operates directly on the entire database. In this paper, several approaches are proposed to scale the DBSCAN algorithm to large spatial databases. To begin with, a fast DBSCAN algorithm is developed, which considerably speeds up the original DBSCAN algorithm. Then a sampling-based DBSCAN algorithm, a partitioning-based DBSCAN algorithm, and a parallel DBSCAN algorithm are introduced in turn. Following that, a synthetic algorithm based on the above-proposed algorithms is also given. Finally, experimental results are given to demonstrate the effectiveness and efficiency of these algorithms.
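For readers unfamiliar with the base algorithm, plain DBSCAN grows clusters by expanding the neighborhoods of core points. The naive in-memory O(n^2) sketch below (with illustrative eps and min_pts values and no spatial index) is exactly the kind of implementation whose memory and I/O behavior the scaling approaches above aim to improve.
```python
import math

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (naive linear scan)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: label each point with a cluster id, or -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1                      # noise (may become a border point later)
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster             # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:     # j is a core point: keep expanding
                seeds.extend(j_neighbors)
    return labels

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
print(dbscan(pts, eps=2.0, min_pts=2))  # two clusters and one noise point
```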

15.
An Incremental K-Medoids Clustering Algorithm   Total citations: 3 (self: 0, others: 3)
Gao Xiaomei, Feng Zhi, Feng Xingjie. Computer Engineering, 2005, 31(Z1): 181-183
Clustering is a very useful data mining method that can reveal groupings and data-distribution information hidden behind the data. Many clustering algorithms and their variants have been proposed, but comparatively little work has been done on incremental clustering. When a data set changes as a result of updates, the mining results should be updated accordingly. Because the data volume is large, re-running a clustering algorithm over the updated data set to refresh the results is clearly inefficient, so incremental clustering algorithms are urgently needed. By improving the K-Medoids clustering algorithm, this paper proposes an incremental K-Medoids clustering algorithm that effectively addresses the scalability and periodic-update problems faced by traditional clustering algorithms.
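For context, the non-incremental K-Medoids idea keeps k actual data points as cluster centers and swaps a medoid for a non-medoid whenever the swap lowers the total distance cost. The simplified PAM-style sketch below, with illustrative data and a naive initialization, shows only that baseline and not the paper's incremental extension.
```python
import math

def total_cost(points, medoids):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)

def k_medoids(points, k, max_iter=100):
    """Simplified PAM: greedily swap a medoid with a non-medoid while the cost improves."""
    medoids = list(range(k))                      # naive initialization: first k points
    cost = total_cost(points, medoids)
    for _ in range(max_iter):
        best = (cost, medoids)
        for mi in range(len(medoids)):
            for candidate in range(len(points)):
                if candidate in medoids:
                    continue
                trial = medoids[:mi] + [candidate] + medoids[mi + 1:]
                trial_cost = total_cost(points, trial)
                if trial_cost < best[0]:
                    best = (trial_cost, trial)
        if best[0] >= cost:                       # no improving swap: converged
            break
        cost, medoids = best
    # Assign each point to the index of its nearest medoid
    labels = [min(medoids, key=lambda m: math.dist(p, points[m])) for p in points]
    return medoids, labels

data = [(0, 0), (0, 1), (1, 0), (8, 8), (8, 9), (9, 8)]
print(k_medoids(data, k=2))
```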

16.
Visual data mining in large geospatial point sets   Total citations: 2 (self: 0, others: 2)
Visual data-mining techniques have proven valuable in exploratory data analysis, and they have strong potential in the exploration of large databases. Detecting interesting local patterns in large data sets is a key research challenge. Particularly challenging today is finding and deploying efficient and scalable visualization strategies for exploring large geospatial data sets. One way is to share ideas from the statistics and machine-learning disciplines with ideas and methods from the information and geo-visualization disciplines. PixelMaps in the Waldo system demonstrates how data mining can be successfully integrated with interactive visualization. The increasing scale and complexity of data analysis problems require tighter integration of interactive geospatial data visualization with statistical data-mining algorithms.

17.
Frequent itemset mining is one of the fundamental building blocks of data mining. However, despite crucial recent advances in the data mining literature, few of the standard or improved solutions scale. This is particularly the case when (1) the quantity of data is very large and/or (2) the minimum support is very low. In this paper, we address the problem of parallel frequent itemset mining (PFIM) in very large databases and study the impact and effectiveness of specific data placement strategies in a massively distributed environment. By offering a clever data placement and an optimal organization of the extraction algorithms, we show that the arrangement of both the data and the different processes can make the global job either completely inoperative or very effective. In this setting, we propose two highly scalable PFIM algorithms, namely P2S (parallel-2-steps) and PATD (parallel absolute top-down). The P2S algorithm discovers itemsets from large databases in two simple yet efficient parallel jobs, while PATD makes the mining of very large databases simpler and more compact. Its mining process consists of only one parallel job, which dramatically reduces the running time, the communication cost, and the energy consumption overhead on a distributed computational platform. Our proposed approaches have been extensively evaluated on massive real-world data sets. The experimental results confirm the effectiveness and scalability of our proposals, showing a significant scale-up at very low minimum supports compared to other alternatives.

18.
Clustering in very large databases based on distance and density   Total citations: 8 (self: 0, others: 8)
Clustering in very large databases or data warehouses, with many applications in areas such as spatial computation, Web information collection, pattern recognition, and economic analysis, is a huge task that challenges data mining researchers. Current clustering methods commonly suffer from the following problems: 1) scanning the whole database leads to high I/O cost and expensive index maintenance (e.g., of an R*-tree); 2) the uncertain parameter k must be pre-specified, so the clustering can only be refined through repeated trial and test; 3) they lack high efficiency in handling clusters of arbitrary shape over very large data sets. In this paper, we first present a new hybrid clustering algorithm to solve these problems. The new algorithm, which combines both distance and density strategies, can handle clusters of arbitrary shape effectively. It makes full use of statistical information during mining to greatly reduce the time complexity while keeping good clustering quality. Furthermore, the algorithm can easily eliminate noise and identify outliers. An experimental evaluation on a spatial database compares this method with other popular clustering algorithms (CURE and DBSCAN). The results show that our algorithm outperforms them in terms of efficiency and cost, and achieves an even greater speedup as the data size scales up.

19.
Frequent itemset mining allows us to find hidden, important information in large databases. Moreover, processing incremental databases in the itemset mining area has become more essential because a huge amount of data is accumulated continually in a variety of application fields, and users want to obtain mining results from such incremental data in more efficient ways. One of the major problems in incremental itemset mining is that the corresponding mining results can be very large, depending on threshold settings and data volumes. In addition, it is considerably hard to analyze all of them and find meaningful information. Furthermore, not all of the mining results turn out to be actually important information. In this paper, to solve these problems, we propose an algorithm for mining weighted maximal frequent itemsets from incremental databases. By scanning a given incremental database only once, the proposed algorithm can not only conduct its mining operations in a way suited to the incremental environment but also extract a smaller number of important itemsets compared to previous approaches. The proposed method also benefits expert and intelligent systems, since it can automatically provide more meaningful pattern results reflecting the characteristics of the given incremental databases and threshold settings, which helps users analyze the data more easily. Our comprehensive experimental results show that the proposed algorithm is more efficient and scalable than previous state-of-the-art algorithms.
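As a point of terminology, the "maximal" part of the problem means retaining only frequent itemsets that have no frequent superset, which is how such algorithms keep the result set small. The filter below, with made-up itemsets, illustrates only that notion and not the paper's single-scan weighted mining algorithm.
```python
def maximal_itemsets(frequent_itemsets):
    """Keep only itemsets that are not a proper subset of another frequent itemset."""
    sets = [frozenset(s) for s in frequent_itemsets]
    return [set(s) for s in sets if not any(s < other for other in sets)]

frequent = [{"a"}, {"b"}, {"c"}, {"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
print(maximal_itemsets(frequent))  # only {'a', 'b', 'c'} survives in this toy example
```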

20.
On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach   Total citations: 5 (self: 1, others: 4)
An important component of many data mining projects is finding a good classification algorithm, a process that requires very careful thought about experimental design. If not done very carefully, comparative studies of classification and other types of algorithms can easily result in statistically invalid conclusions. This is especially true when one is using data mining techniques to analyze very large databases, which inevitably contain some statistically unlikely data. This paper describes several phenomena that can, if ignored, invalidate an experimental comparison. These phenomena and the conclusions that follow apply not only to classification, but to computational experiments in almost any aspect of data mining. The paper also discusses why comparative analysis is more important in evaluating some types of algorithms than for others, and provides some suggestions about how to avoid the pitfalls suffered by many experimental studies.
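To make one of the pitfalls concrete: testing many classifiers against a baseline, each at a fixed significance level, inflates the chance of a spurious "significant" result, and a Bonferroni-style correction that divides the threshold by the number of comparisons is one common remedy. The classifier names and p-values below are made up purely for illustration and are not taken from the paper.
```python
# Comparing several classifiers against a baseline at alpha = 0.05 each inflates
# the family-wise error rate; a Bonferroni-style correction divides the
# significance threshold by the number of comparisons.
alpha = 0.05
p_values = {"clf_A": 0.012, "clf_B": 0.034, "clf_C": 0.049, "clf_D": 0.20}  # hypothetical results

corrected_alpha = alpha / len(p_values)   # 0.0125 for four comparisons
for name, p in p_values.items():
    naive = p < alpha
    corrected = p < corrected_alpha
    print(f"{name}: p={p:.3f}  naive-significant={naive}  Bonferroni-significant={corrected}")
```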
