Similar literature
20 similar articles found (search time: 171 ms)
1.
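Big data in the petroleum industry holds vast potential and value. Applying big data and data mining techniques to it can raise the industry's level of industrialization and strongly promote its intelligent development. This paper therefore proposes a new Web-architecture-driven industrial knowledge mining system that integrates five data mining modules, the Petroleum Industry Data Mining System. The five modules are data set management, preprocessing algorithm management, data mining algorithm management, data mining workflow management, and visualization of results. The system implements a complete, fully self-service knowledge mining workflow: data extraction, data preprocessing, data analysis and knowledge mining, and visual presentation of results. Delivering the system over the Web meets the immediate needs of oilfield users at different levels in different scenarios and greatly improves the system's flexibility. With this system, oilfield technical developers can ignore the setup of big data infrastructure and other complex construction steps and concentrate on oilfield data modeling and analysis.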

2.
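Design and Implementation of a J2EE-Based Spatial Data Mining System   Total citations: 1 (self-citations: 0, citations by others: 1)
Based on an analysis of the characteristics of spatial data mining, a J2EE-based spatial data mining prototype is designed that integrates spatial data warehouse technology, spatial data mining technology, and spatial information representation. The paper focuses on the prototype's functional framework and architecture and on the design and implementation of the spatial association rule mining module and the module for visualizing mining results. Finally, it shows the system's interface for spatial association rule mining results, using a city's current land-use data set as an example. The results indicate that the system meets business requirements such as reliability, extensibility, and usability reasonably well.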

3.
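Building on XML technology and taking data mining as the application background, this work studies mining techniques for XML databases. The basic idea is to exploit the mature and rich techniques of traditional data mining while taking the characteristics of XML databases into account, combining the two closely so that an XML database can be mined simply and effectively as the data source. After a comprehensive introduction to XML and related technologies, a method for mining association rules from an XML database is given. The aim is to find a good mining method for XML data and to address the problem of mining knowledge from semi-structured data such as XML.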

4.
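An Improved Apriori Algorithm   Total citations: 6 (self-citations: 2, citations by others: 4)
Association rule mining can discover interesting associations or correlations among itemsets in large amounts of data. As large volumes of data are continuously collected and stored, mining association rules from databases becomes increasingly necessary. By analyzing association rule mining and its related algorithm Apriori, the paper identifies problems with the technique. Apriori is the classic algorithm for association rule mining, and the paper improves it. A faster method for computing the support count of an itemset is given using a 0-1 matrix, and the join and prune operations of Apriori are simplified, improving the algorithm's efficiency in both time and space.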
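The abstract above gives no details of the improved algorithm, so the following is only a minimal sketch of the general idea of counting support from a 0-1 matrix (one row per transaction, one column per item). The function names and the candidate-generation scheme are illustrative assumptions, not the authors' method.

```python
from itertools import combinations


def build_matrix(transactions, items):
    """Return a 0-1 matrix: one row per transaction, one column per item."""
    return [[1 if item in t else 0 for item in items] for t in transactions]


def support_count(matrix, columns):
    """Count the rows that contain a 1 in every listed column."""
    return sum(1 for row in matrix if all(row[c] for c in columns))


def frequent_itemsets(transactions, min_count):
    """Level-wise search; support is read from the 0-1 matrix (hypothetical helper)."""
    items = sorted({i for t in transactions for i in t})
    matrix = build_matrix(transactions, items)
    result = {}
    current = [(c,) for c in range(len(items))]  # candidate column tuples
    while current:
        frequent = []
        for cols in current:
            n = support_count(matrix, cols)
            if n >= min_count:
                frequent.append(cols)
                result[frozenset(items[c] for c in cols)] = n
        # Join: extend each frequent tuple with a larger column index.
        # Prune: keep a candidate only if all of its k-subsets are frequent.
        known = set(frequent)
        current = [
            cols + (c,)
            for cols in frequent
            for c in range(cols[-1] + 1, len(items))
            if all(sub in known for sub in combinations(cols + (c,), len(cols)))
        ]
    return result
```

For example, `frequent_itemsets([{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}], 2)` keeps the three single items and the three pairs, and drops `{a, b, c}`, which occurs only once.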

5.
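Data mining is now widely used in business, finance, and other areas, but less so in education. Association rule mining can discover interesting associations or correlations among itemsets in large amounts of data, and as large volumes of data are continuously collected and stored, mining association rules from databases becomes increasingly necessary. This paper extracts teacher records from the Chuzhou University teacher archive database, combines them with actual data from classroom teaching quality evaluation, and uses an improved Apriori algorithm to find the underlying relationship between teachers' own qualities and students' evaluation results.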

6.
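Because traditional systems handle new-user recommendation poorly and give weak recommendation results, a personalized recommendation system for ideological and political theory resources based on data mining is proposed. In the hardware, the server module consists of 22 servers, and the processor module uses the S3C3210x processor. In the software, the data mining module mainly uses the Orange toolbox to mine the ideological and political theory resources. The data processing module performs conversion, crawling, and dimensionality reduction of resource data. The personalized recommendation module mainly relies on hybrid recommendation techniques. The database module includes a user interest table, a resource information table, and a user information table. Performance tests were run on this basis. The experimental results show that the system's recommendations outperform those of traditional systems and that it can provide personalized recommendations for new users.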

7.
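Data mining is a data analysis technique for discovering previously unknown knowledge in large amounts of data. Using it to analyze customer data reveals regularities that can inform business decisions. This paper studies association rule analysis and applies it to an online bookstore system, implementing association rule mining on customer order data.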

8.
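Data mining is integrated with data visualization and online analytical processing (OLAP) to construct a data-mining-based visual analysis system model for e-commerce Web environments, the E-commerce Data Mining Visualization Model (EDVM). The functions of the main modules are implemented so that mining results can be updated dynamically and output visually, and experiments provide a preliminary validation of the EDVM model's effectiveness.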

9.
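Association-rule-based classification algorithms are studied. The Apriori association rule algorithm is applied to analyze data samples from a distance-education examination system, discovering valuable data patterns in the results and identifying the relationships and rules they contain. These can help regulate, control, and guide teaching and examination, and provide sound, scientific decision support for distance-education management. With classification association rule mining as the main thread, the paper studies the implementation of data preprocessing, classification association rule modeling, and deployment within the data mining workflow. Experimental results show that the classification application system classifies examination data automatically and does so at good speed.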

10.
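Application of Association Rule Techniques in Teaching Evaluation   Total citations: 1 (self-citations: 0, citations by others: 1)
The paper studies the development of a teaching evaluation system based on knowledge discovery and describes the development tools and the design and implementation of the main functional submodules, such as association rule mining. It applies the Apriori association rule algorithm to teaching evaluation data samples, using user interaction records in the database together with a minimum support and a minimum confidence to mine frequent itemsets. Valuable data patterns, relationships, and rules are found in the results, which can guide teaching activities and provide sound, scientific decision support for teaching management. Suggestions for further improving the system are also given.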
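To make the roles of minimum support and minimum confidence concrete, here is a small illustrative sketch. It enumerates itemsets by brute force rather than with Apriori's level-wise pruning, so it only suits tiny examples; the function names and the sample attribute values are hypothetical and do not come from the paper.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def association_rules(transactions, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for every rule passing both thresholds."""
    items = sorted({i for t in transactions for i in t})
    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            whole = frozenset(combo)
            s = support(transactions, whole)
            if s == 0 or s < min_support:
                continue  # itemset is not frequent
            for r in range(1, size):
                for lhs_items in combinations(combo, r):
                    lhs = frozenset(lhs_items)
                    # confidence(lhs -> rhs) = support(lhs and rhs) / support(lhs)
                    confidence = s / support(transactions, lhs)
                    if confidence >= min_confidence:
                        rules.append((lhs, whole - lhs, s, confidence))
    return rules


# Hypothetical evaluation records: teacher rank and evaluation result.
records = [
    {"senior", "good"},
    {"senior", "good"},
    {"senior", "fair"},
    {"junior", "fair"},
]
rules = association_rules(records, min_support=0.5, min_confidence=0.6)
```

With these made-up records, only the itemset `{good, senior}` reaches a support of 0.5, giving the rules `good -> senior` (confidence 1.0) and `senior -> good` (confidence 2/3).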

11.
Spatial data mining algorithms depend heavily on the efficient processing of neighborhood relations, since the neighbors of many objects have to be investigated in a single run of a typical algorithm. Therefore, providing general concepts for neighborhood relations as well as an efficient implementation of these concepts will allow a tight integration of spatial data mining algorithms with a spatial database management system. This will speed up both the development and the execution of spatial data mining algorithms. In this paper, we define neighborhood graphs and paths and a small set of database primitives for their manipulation. We show that typical spatial data mining algorithms are well supported by the proposed basic operations. For finding significant spatial patterns, only certain classes of paths “leading away” from a starting object are relevant. We discuss filters that allow only such neighborhood paths, which significantly reduces the search space for spatial data mining algorithms. Furthermore, we introduce neighborhood indices to speed up the processing of our database primitives. We implemented the database primitives on top of a commercial spatial database management system. The effectiveness and efficiency of the proposed approach were evaluated using an analytical cost model and an extensive experimental study on a geographic database.

12.
This paper presents a simple, efficient computer-based method for discovering causal relationships from databases that contain observational data. Observational data is passively observed, as contrasted with experimental data. Most of the databases available for data mining are observational. There is great potential for mining such databases to discover causal relationships. We illustrate how observational data can constrain the causal relationships among measured variables, sometimes to the point that we can conclude that one variable is causing another variable. The presentation here is based on a constraint-based approach to causal discovery. A primary purpose of this paper is to present the constraint-based causal discovery method in the simplest possible fashion in order to (1) readily convey the basic ideas that underlie more complex constraint-based causal discovery techniques, and (2) permit interested readers to rapidly program and apply the method to their own databases, as a start toward using more elaborate causal discovery algorithms.

13.
Data mining is a method for extracting useful information that a system needs from a database. As the types of data processed by systems have diversified, adapted pattern mining techniques for processing these types of data have been proposed. Unlike traditional pattern mining methods, erasable pattern mining is a technique for finding patterns that can be removed with only a small loss of profit. Erasable pattern mining should process data by considering both the environment in which the data are generated and the characteristics of the data. An uncertain database is a database composed of uncertain data. Since erasable patterns discovered from uncertain data contain significant information, these patterns need to be extracted. In addition, databases gradually grow, because data from various fields is generated and accumulated over data streams. Data streams should be processed as intelligently as possible to provide useful data to the system in real time. In this paper, we propose an efficient erasable pattern mining algorithm that processes uncertain data generated over data streams. The uncertain erasable patterns discovered by the suggested technique carry more meaningful information because they take both the probability of each item and the profit into account. Moreover, the proposed method can perform efficient mining operations by using both tree and list structures. The performance of the suggested algorithm is verified through performance tests against state-of-the-art algorithms on real and synthetic data sets.

14.
High Performance OLAP and Data Mining on Parallel Computers   Total citations: 2 (self-citations: 0, citations by others: 2)
On-Line Analytical Processing (OLAP) techniques are increasingly being used in decision support systems to provide analysis of data. Queries posed on such systems are quite complex and require different views of data. Analytical models need to capture the multidimensionality of the underlying data, a task for which multidimensional databases are well suited. Multidimensional OLAP systems store data in multidimensional arrays on which analytical operations are performed. Knowledge discovery and data mining require complex operations on the underlying data, which can be very expensive in terms of computation time. High performance parallel systems can reduce this analysis time. Precomputed aggregate calculations in a Data Cube can provide efficient query processing for OLAP applications. In this article, we present algorithms for construction of data cubes on distributed-memory parallel computers. Data is loaded from a relational database into a multidimensional array. We present two methods, sort-based and hash-based, for loading the base cube and compare their performance. Data cubes are used to perform consolidation queries used in roll-up operations using dimension hierarchies. Finally, we show how data cubes are used for data mining using Attribute Focusing techniques. We present results for these on the IBM-SP2 parallel machine. Results show that our algorithms and techniques for OLAP and data mining on parallel systems are scalable to a large number of processors, providing a high performance platform for such applications.
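As a rough, single-process illustration of what precomputing a data cube means, the sketch below aggregates one measure over every subset of the dimensions using dictionary (hash) grouping. It is not the parallel, array-based construction the article describes; the function name and the sample column names are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def data_cube(rows, dims, measure):
    """Sum `measure` for every group-by over every subset of `dims`.

    Returns {dimension_tuple: {value_tuple: total}}; the empty tuple () is the
    grand total, i.e. the fully rolled-up cuboid.
    """
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            totals = defaultdict(float)
            for row in rows:
                key = tuple(row[d] for d in group)  # hash-based grouping
                totals[key] += row[measure]
            cube[group] = dict(totals)
    return cube


# Hypothetical fact table.
sales = [
    {"region": "east", "year": 2020, "sales": 10.0},
    {"region": "east", "year": 2021, "sales": 5.0},
    {"region": "west", "year": 2020, "sales": 7.0},
]
cube = data_cube(sales, ("region", "year"), "sales")
```

Here `cube[("region",)][("east",)]` is `15.0`, the roll-up of the two east rows over `year`, and `cube[()][()]` is the grand total `22.0`.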

15.
An Overview of Data Mining and Knowledge Discovery   Total citations: 9 (self-citations: 0, citations by others: 9)
With massive amounts of data stored in databases, mining information and knowledge in databases has become an important issue in recent research. Researchers in many different fields have shown great interest in data mining and knowledge discovery in databases. Several emerging applications in information-providing services, such as data warehousing and on-line services over the Internet, also call for various data mining and knowledge discovery techniques to understand user behavior better, to improve the service provided, and to increase business opportunities. In response to such a demand, this article provides a comprehensive survey of recently developed data mining and knowledge discovery techniques and introduces some real application systems as well. In conclusion, the article lists some problems and challenges for further research.

16.
This paper describes a computer-cluster-based parallel database management system (DBMS), InfiniteDB, developed by the authors. InfiniteDB aims to support data-intensive computing efficiently, in response to the rapid growth in database size and the need for high-performance analysis of massive databases. It can be executed efficiently on computing systems composed of thousands of computers, such as cloud computing systems. It supports intra-query, inter-query, intra-operation, inter-operation, and pipelined parallelism. It provides effective strategies for managing massive databases, including multiple data declustering methods, declustering-aware algorithms for relational operations and other database operations, and an adaptive query optimization method. It also provides parallel data warehousing and data mining functions, a coordinator-wrapper mechanism to support the integration of heterogeneous information resources on the Internet, and fault-tolerant and resilient infrastructure. It has been used in many applications and has proved quite effective for data-intensive computing.

17.
Mining for association rules in market basket data has proved a fruitful area of research. Measures such as conditional probability (confidence) and correlation have been used to infer rules of the form “the existence of item A implies the existence of item B.” However, such rules indicate only a statistical relationship between A and B. They do not specify the nature of the relationship: whether the presence of A causes the presence of B, or the converse, or some other attribute or phenomenon causes both to appear together. In applications, knowing such causal relationships is extremely useful for enhancing understanding and effecting change. While distinguishing causality from correlation is a truly difficult problem, recent work in statistics and Bayesian learning provides some avenues of attack. In these fields, the goal has generally been to learn complete causal models, which are essentially impossible to learn in large-scale data mining applications with a large number of variables. In this paper, we consider the problem of determining causal relationships, instead of mere associations, when mining market basket data. We identify some problems with the direct application of Bayesian learning ideas to mining large databases, concerning both the scalability of algorithms and the appropriateness of the statistical techniques, and introduce some initial ideas for dealing with these problems. We present experimental results from applying our algorithms on several large, real-world data sets. The results indicate that the approach proposed here is both computationally feasible and successful in identifying interesting causal structures. An interesting outcome is that it is perhaps easier to infer the lack of causality than to infer causality, information that is useful in preventing erroneous decision making.

18.
An introduction to the approaches used to discretise continuous database features is given, together with a discussion of the potential benefits of such techniques. These benefits are investigated by applying discretisation algorithms to two large commercial databases; the discretisations yielded are then evaluated using a simulated annealing based data mining algorithm. The results produced suggest that dramatic reductions in problem size may be achieved, yielding improvements in the speed of the data mining algorithm. However, it is also demonstrated under certain circumstances that the discretisation produced may give an increase in problem size or allow overfitting by the data mining algorithm. Such cases, within which often only a small proportion of the database belongs to the class of interest, highlight the need both for caution when producing discretisations and for the development of more robust discretisation algorithms.
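The abstract does not say which discretisation algorithms were evaluated. As a point of reference, equal-width binning is one of the simplest ways to discretise a continuous feature; the sketch below is a generic illustration with a hypothetical function name, not one of the paper's algorithms.

```python
def equal_width_bins(values, n_bins):
    """Map each value to a bin index in 0 .. n_bins - 1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant feature: everything falls in one bin
    width = (hi - lo) / n_bins
    # The maximum value would land on index n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]
```

For instance, `equal_width_bins([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 3)` yields `[0, 0, 0, 1, 1, 1, 2, 2, 2, 2]`, replacing ten distinct values with three, which is the kind of reduction in problem size the abstract refers to.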

19.
Set-oriented data mining in relational databases   Total citations: 2 (self-citations: 0, citations by others: 2)
Data mining is an important real-life application for businesses. It is critical to find efficient ways of mining large data sets. In order to benefit from the experience with relational databases, a set-oriented approach to mining data is needed. In such an approach, the data mining operations are expressed in terms of relational or set-oriented operations. Query optimization technology can then be used for efficient processing.

In this paper, we describe set-oriented algorithms for mining association rules. Such algorithms imply performing multiple joins and thus may appear to be inherently less efficient than special-purpose algorithms. We develop new algorithms that can be expressed as SQL queries, and discuss optimization of these algorithms. After analytical evaluation, an algorithm named SETM emerges as the algorithm of choice. Algorithm SETM uses only simple database primitives, viz., sorting and merge-scan join. Algorithm SETM is simple, fast, and stable over the range of parameter values. It is easily parallelized and we suggest several additional optimizations. The set-oriented nature of Algorithm SETM makes it possible to develop extensions easily and its performance makes it feasible to build interactive data mining tools for large databases.
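The set-oriented idea can be sketched in a few lines: keep a relation of (transaction id, pattern) rows, count patterns with a group-by, filter by minimum support, and join the survivors back to the item relation on the transaction id to grow each pattern by one item. The sketch below follows that outline in plain Python; a dictionary lookup stands in for the merge-scan join, and the function and variable names are assumptions rather than the paper's notation.

```python
from collections import Counter


def setm_sketch(transactions, min_count):
    """transactions: {tid: iterable of distinct items}. Returns {pattern_tuple: count}."""
    # R1: one (tid, (item,)) row per item occurrence, sorted by tid then item.
    rows = sorted((tid, (item,)) for tid, items in transactions.items() for item in items)
    items_by_tid = {}
    for tid, (item,) in rows:
        items_by_tid.setdefault(tid, []).append(item)

    result = {}
    while rows:
        # GROUP BY pattern HAVING COUNT(*) >= min_count
        counts = Counter(pattern for _, pattern in rows)
        frequent = {p: n for p, n in counts.items() if n >= min_count}
        result.update(frequent)
        # Keep only rows whose pattern is frequent.
        rows = [(tid, p) for tid, p in rows if p in frequent]
        # Join with R1 on tid, extending each pattern with a larger item.
        rows = sorted(
            (tid, p + (item,))
            for tid, p in rows
            for item in items_by_tid[tid]
            if item > p[-1]
        )
    return result
```

Calling `setm_sketch({1: ["a", "b", "c"], 2: ["a", "b"], 3: ["a", "c"], 4: ["b", "c"]}, 2)` returns the three single items with count 3 and the three pairs with count 2.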


20.
Mining sequential patterns from large databases has been recognized by many researchers as an attractive task in data mining and knowledge discovery. Previous algorithms scan the database many times, which is often prohibitive because the databases are very large. In this paper, the authors introduce an effective algorithm for mining sequential patterns from large databases. In the algorithm, the original database is not used at all for counting the support of sequences after the first pass. Rather, a tidlist structure generated in the previous pass is employed for this purpose, based on set intersection operations, avoiding multiple scans of the database.
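The abstract does not define its tidlist structure, so the sketch below is an assumption-laden illustration of the general idea only: after one pass that records where each item occurs, the support of a longer sequence is computed by combining id-lists instead of rescanning the data. It assumes sequences of single items and stores (sequence id, position) pairs; all names are hypothetical.

```python
def item_idlists(sequences):
    """One pass over the data: map each item to the set of (sid, position) pairs."""
    idlists = {}
    for sid, seq in enumerate(sequences):
        for pos, item in enumerate(seq):
            idlists.setdefault(item, set()).add((sid, pos))
    return idlists


def extend(pattern_idlist, item_idlist):
    """Id-list of `pattern followed later by item`, built without touching the data."""
    earliest_end = {}
    for sid, pos in pattern_idlist:
        earliest_end[sid] = min(pos, earliest_end.get(sid, pos))
    # Keep occurrences of the item in the same sequence, after the pattern ends.
    return {
        (sid, pos)
        for sid, pos in item_idlist
        if sid in earliest_end and pos > earliest_end[sid]
    }


def support(idlist):
    """Number of distinct sequences in which the pattern occurs."""
    return len({sid for sid, _ in idlist})


# Hypothetical database of three sequences; each character is one item.
lists = item_idlists(["abcb", "acb", "ba"])
ab = extend(lists["a"], lists["b"])  # pattern: a, then b
abc = extend(ab, lists["c"])         # pattern: a, then b, then c
```

With this toy data, `support(ab)` is 2 and `support(abc)` is 1, both obtained from the id-lists alone.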
