Similar Literature
 Found 20 similar documents; search took 15 ms.
1.
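Mining Top-K Frequent Itemsets in Landmark Windows over Data Streams   Total citations: 3, self-citations: 0, citations by others: 3
Frequent itemset mining over data streams is currently an active research topic in data mining and knowledge discovery, with important applications in many fields. However, setting the support threshold requires some domain knowledge, and an inappropriate setting brings many difficulties and unnecessary burden to subsequent analysis; mining top-K frequent itemsets over data streams is therefore of considerable value. This paper proposes TOPSIL-Miner, a dynamic incremental approximate algorithm for mining top-K frequent itemsets in landmark windows over data streams. To this end, it designs the synopsis structure TOPSIL-Tree, which stores summary information about the stream data, together with structures that dynamically record mining-related information: the per-tree-level maximum support list MaxSL, the ordered item list OIL, TOPSET, and the minimum support list MinSL. It also analyzes the mining properties associated with these synopsis structures. On this basis, three optimizations of the algorithm are studied: 1) pruning trivial itemsets in the current data stream; 2) heuristically and adaptively raising the mining threshold during mining; 3) dynamically raising the pruning threshold. An upper bound on the algorithm's error is analyzed. Finally, experiments verify the feasibility, accuracy, and time and space efficiency of the algorithm.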
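
To make the top-K task in the abstract above concrete, here is a minimal sketch of an exact, naive baseline: it counts every itemset up to a small size since the landmark and reports the K most frequent. This is not TOPSIL-Miner (no TOPSIL-Tree, no pruning, no approximation); the class name `LandmarkTopK` and the `max_len` cap are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import combinations


class LandmarkTopK:
    """Exact top-K itemset counter over a landmark window (naive baseline).

    Hypothetical helper, not the paper's algorithm: memory grows with the
    number of distinct itemsets, which is what a synopsis structure avoids.
    """

    def __init__(self, k, max_len=2):
        self.k = k                # number of itemsets to report
        self.max_len = max_len    # largest itemset size that is counted
        self.counts = Counter()   # itemset (sorted tuple) -> support count
        self.n = 0                # transactions seen since the landmark

    def add(self, transaction):
        """Consume one transaction from the stream."""
        items = sorted(set(transaction))
        self.n += 1
        for size in range(1, self.max_len + 1):
            for itemset in combinations(items, size):
                self.counts[itemset] += 1

    def top_k(self):
        """Return the K most frequent itemsets as (itemset, count) pairs."""
        # Sort by descending count, then by itemset so ties break deterministically.
        ranked = sorted(self.counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[: self.k]


miner = LandmarkTopK(k=3, max_len=2)
for t in [["a", "b"], ["a", "b", "c"], ["a", "c"], ["b"]]:
    miner.add(t)
top3 = miner.top_k()
```

No support threshold is supplied anywhere in this sketch; the user only chooses K, which is the motivation the abstract gives for top-K mining.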

2.
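Spatial Data Mining and a Framework for Its Integration with Intelligent Systems   Total citations: 4, self-citations: 1, citations by others: 4
Spatial data mining refers to extracting implicit knowledge, spatial relationships, and meaningful features or patterns that are not explicitly stored in spatial databases. It has broad application prospects in remote sensing, geographic information systems, medical imaging, information fusion systems, and other fields, and is therefore attracting growing attention. From the perspective of combining knowledge discovery, cognitive science, and intelligent systems, this paper proposes a spatial data mining model based on a cooperative mechanism between a database and a knowledge base, and systematically introduces the types of knowledge that can be discovered from spatial databases and the corresponding mining methods. It then proposes an overall framework for a new type of intelligent system based on spatial data mining, along with basic principles for system development, and finally discusses directions for the development of spatial data mining.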

3.
Association rule mining is an effective data mining technique that has been widely used in health informatics research since its introduction. Because health informatics has received considerable attention from researchers over the last decade and has developed various sub-domains, a review of the state of the art in health informatics research is both interesting and essential. Knowledge discovery researchers and practitioners have applied an array of data mining techniques to extract knowledge from health data; this survey focuses on, and studies in detail, the application of association rule mining techniques to the health informatics domain. A critical analysis of the literature on association rule mining applications in health informatics from 2005 to 2014 shows that, despite the availability of more efficient alternatives, the Apriori algorithm is still widely used for frequent itemset generation in this domain. Other limitations of these applications are also identified, and recommendations are made to mitigate them. Finally, the algorithms and tools used are identified, conclusions are drawn from the surveyed literature, and future research directions are presented.

4.
Advanced Scout: Data Mining and Knowledge Discovery in NBA Data   Total citations: 1, self-citations: 0, citations by others: 1
Advanced Scout is a PC-based data mining application used by National Basketball Association (NBA) coaching staffs to discover interesting patterns in basketball game data. We describe the Advanced Scout software from the perspective of data mining and knowledge discovery. This paper highlights the pre-processing of raw data that the program performs, and describes the data mining aspects of the software and how the interpretation of patterns supports the process of knowledge discovery. The underlying technique of attribute focusing as the basis of the algorithm is also described. The process of pattern interpretation is facilitated by allowing the user to relate patterns to video tape.

5.
Abstract: Although data mining and knowledge discovery techniques have recently been used to diagnose human disease, little research has been conducted on disease diagnostic modelling using human gene information. Furthermore, to our knowledge, no study has reported on diagnosis models using single nucleotide polymorphism (SNP) information. A disease diagnosis model using data mining techniques and SNP information should prove promising from a practical perspective as more information on human genes becomes available. Data mining and knowledge discovery techniques can be put to practical use detecting human disease, since a haplotype analysis using high-density SNP markers has gained great attention for evaluating human genes related to various human diseases. This paper explores how data mining and knowledge discovery can be applied to medical informatics using human gene information. As an example, we applied case-based reasoning to a cancer detection problem using human gene information and SNP analysis because case-based reasoning has been applied in medicine relatively less often than other data mining techniques. We propose a modified case-based reasoning method that is appropriate for associated categorical variables to use in detecting gastric cancer.

6.
High utility pattern (HUP) mining is one of the most important research issues in data mining. Although HUP mining extracts important knowledge from databases, it requires long calculations and multiple database scans. Therefore, HUP mining is often unsuitable for real-time data processing schemes such as data streams. Furthermore, many HUPs may be unimportant because of poor correlations among the items within them. Hence, the fast discovery of fewer but more important HUPs would be very useful in many practical domains. In this paper, we propose a novel framework that introduces a useful measure among the items in a HUP, called frequency affinity, and the concept of an interesting HUP with a strong frequency affinity, for the fast discovery of more applicable knowledge. Moreover, we propose a new tree structure, the utility tree based on frequency affinity (UTFA), and a novel algorithm, high utility interesting pattern mining (HUIPM), for single-pass mining of HUIPs from a database. Our approach mines fewer but more valuable HUPs, significantly reduces the overall runtime of existing HUP mining algorithms, and is applicable to real-time data processing. Extensive performance analyses show that the proposed HUIPM algorithm is very efficient and scalable for interesting HUP mining with a strong frequency affinity.
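
As background for the abstract above, the following minimal sketch shows the usual notion of itemset utility (purchased quantity times unit profit, summed over the transactions that contain the whole itemset) and a brute-force search for high utility patterns. It is not HUIPM or the UTFA tree, and it does not implement frequency affinity, whose definition the abstract does not give; the function names and the toy data are assumptions for illustration only.

```python
from itertools import combinations


def itemset_utility(itemset, transactions, profit):
    """Sum quantity * unit profit over transactions containing every item."""
    total = 0
    for t in transactions:  # t maps item -> purchased quantity
        if all(item in t for item in itemset):
            total += sum(t[item] * profit[item] for item in itemset)
    return total


def high_utility_itemsets(transactions, profit, min_utility):
    """Brute-force enumeration; exponential in the number of distinct items."""
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, transactions, profit)
            if utility >= min_utility:
                result[itemset] = utility
    return result


# Hypothetical toy data: unit profits and three transactions.
profit = {"a": 5, "b": 1, "c": 2}
transactions = [{"a": 1, "b": 4}, {"b": 2, "c": 3}, {"a": 2, "c": 1}]
hups = high_utility_itemsets(transactions, profit, min_utility=10)
```

Utility is not anti-monotone the way support is: in the toy data `("a", "c")` qualifies although `("c",)` alone does not, so Apriori-style pruning on utility is unsound. That is one reason HUP mining is costly and why tree-based single-pass methods are attractive.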

7.
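A Study of the Development of Spatial Data Mining   Total citations: 8, self-citations: 1, citations by others: 8
Spatial data mining refers to the extraction of knowledge, spatial relationships, or other meaningful patterns that are not explicitly stored in spatial databases. It is widely applied in geographic information systems, geomarketing, remote sensing, image database exploration, medical image processing, navigation, traffic control, environmental studies, and other fields. This paper studies spatial data mining technology in terms of its definition, process, characteristics, and tasks, and introduces a spatial data mining prototype, GeoMiner, as well as future research directions.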

8.
As geospatial data grows explosively, there is great demand for incorporating data mining techniques into a geospatial context. Association rule mining is a core technique in data mining and a solid candidate for the associative analysis of large geospatial databases. In this article, we propose a geospatial knowledge discovery framework for automating the detection of multivariate associations based on a given areal base map. We investigate a series of geospatial preprocessing steps involving data conversion and classification so that traditional Boolean and quantitative association rule mining can be applied. Our framework has been integrated into GISs using a dynamic link library, which automates both the preprocessing and data mining phases and makes the framework easier to use. Experiments with real crime datasets quickly reveal interesting frequent patterns and multivariate associations, which demonstrate the robustness and efficiency of our approach.

9.
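A knowledge discovery model based on a data extractor is proposed. In the model, the knowledge discovery process is divided into four stages: data preprocessing, data extraction, data mining, and result analysis. The model uses standard SQL to construct a data extractor that prepares data for different learning algorithms. This reduces the number of direct database calls made by data mining algorithms and avoids direct access to the data of large databases, which makes fast data mining on large databases possible. It speeds up the knowledge discovery process, improves data mining efficiency, and enables knowledge discovery on large databases. Finally, the SQL-C4.5 algorithm is designed; it uses the data extractor to extract the statistics required by the C4.5 decision tree algorithm and builds the C4.5 decision tree.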
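
The idea of letting SQL compute the statistics that a decision tree learner needs can be sketched as follows. This is an illustrative assumption of what such an extractor might look like, not the SQL-C4.5 algorithm itself: one `GROUP BY` query per attribute returns class counts, and the gain ratio used by C4.5 is then computed from those counts without pulling raw rows into the learner. The table `weather`, its columns, and all function names are hypothetical.

```python
import math
import sqlite3


def entropy(counts):
    """Shannon entropy in bits of a list of non-negative counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def class_counts(conn, table, attribute, label):
    """Extractor step: per-value class counts computed inside the database."""
    # Identifiers cannot be bound as SQL parameters; this sketch trusts them.
    sql = (f'SELECT "{attribute}", "{label}", COUNT(*) FROM "{table}" '
           f'GROUP BY "{attribute}", "{label}"')
    stats = {}
    for value, cls, n in conn.execute(sql):
        stats.setdefault(value, {})[cls] = n
    return stats


def gain_ratio(stats):
    """Gain ratio of a categorical split, from aggregated counts only."""
    class_totals = {}
    for per_class in stats.values():
        for cls, n in per_class.items():
            class_totals[cls] = class_totals.get(cls, 0) + n
    total = sum(class_totals.values())
    sizes = [sum(per_class.values()) for per_class in stats.values()]
    conditional = sum(size / total * entropy(list(per_class.values()))
                      for size, per_class in zip(sizes, stats.values()))
    gain = entropy(list(class_totals.values())) - conditional
    split_info = entropy(sizes)
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy table standing in for a large database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (outlook TEXT, windy INTEGER, play TEXT)")
conn.executemany("INSERT INTO weather VALUES (?, ?, ?)", [
    ("sunny", 0, "no"), ("sunny", 1, "no"), ("rain", 0, "yes"), ("rain", 1, "yes"),
])
ratios = {a: gain_ratio(class_counts(conn, "weather", a, "play"))
          for a in ("outlook", "windy")}
best_attribute = max(ratios, key=ratios.get)
```

Only a handful of aggregated rows cross the database boundary per attribute, which is the efficiency argument the abstract makes for a SQL-based extractor.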

10.
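Mining Multidimensional Quantitative Association Rules from Remote Sensing Images   Total citations: 13, self-citations: 0, citations by others: 13
The explosive growth of data and databases has raised a prominent problem: how to convert massive amounts of data into useful information and knowledge efficiently and intelligently. The extensive research on data mining in recent years serves exactly this purpose. This paper presents a preliminary study of association rule mining on satellite remote sensing data and its application to soil erosion and the conversion of farmland to forest. According to the characteristics of multidimensional spatial data, the attribute values of the remote sensing data are partitioned into different blocks. To make full use of existing association rule mining algorithms, the partitioned data are also converted into transaction database form. Finally, the Apriori algorithm is used to extract meaningful associations between soil erosion intensity and slope, vegetation coverage, and sloping farmland, providing useful support for decisions on converting farmland to forest and grassland.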
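
The pipeline described above (partition continuous attributes into blocks, convert each pixel to a transaction, then run Apriori) can be sketched as below. The bin edges, attribute names, and pixel values are invented for illustration and do not come from the paper; the Apriori implementation is a plain textbook version.

```python
from itertools import combinations


def discretize(value, edges, name):
    """Map a numeric value to an item such as 'slope=1' using ascending bin edges."""
    bin_index = sum(value >= edge for edge in edges)
    return f"{name}={bin_index}"


def apriori(transactions, min_support):
    """Return {frozenset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count each candidate, keep those that reach the support threshold.
        counts = {c: sum(c <= t for t in transactions) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


# Hypothetical pixels: (slope in degrees, vegetation cover fraction, erosion class).
pixels = [(30, 0.10, "high"), (28, 0.20, "high"), (5, 0.80, "low"),
          (3, 0.90, "low"), (27, 0.15, "high")]
transactions = [
    {discretize(s, [15], "slope"), discretize(c, [0.5], "cover"), f"erosion={e}"}
    for s, c, e in pixels
]
frequent = apriori(transactions, min_support=0.5)

# Confidence of the rule {slope=1, cover=0} -> {erosion=high}.
antecedent = frozenset({"slope=1", "cover=0"})
rule_confidence = frequent[antecedent | {"erosion=high"}] / frequent[antecedent]
```

In this toy data steep, sparsely vegetated pixels always carry the high-erosion label, so the rule has confidence 1; real remote sensing data would call for more bins and a lower support threshold.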

11.
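Data mining discovers hidden structures and patterns in data. However, many of the discovered patterns may already be known to the user, which makes them meaningless and uninteresting. The literature mostly emphasizes the accuracy and comprehensibility of classification rules, but discovering interesting rules remains a daunting challenge for data mining algorithms. This paper adopts a genetic data mining method that measures the interestingness of classification rules as they are generated, producing interesting rules directly. Experiments show that the method is feasible and efficient.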

12.
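Rough set theory can support multiple steps of data mining and knowledge discovery, such as data preprocessing, data reduction, rule generation, and the acquisition of data dependencies, and thus provides new ideas and methods for data mining and knowledge discovery. This paper introduces rough set theory into the field of spatial data mining, presents the basic theory of rough sets and a series of methods, gives application examples, and discusses the application of rough set theory in spatial data mining.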

13.
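A Study of Time Validity in Association Mining   Total citations: 1, self-citations: 0, citations by others: 1
Traditional association mining algorithms use support and confidence as the criteria for judging whether a rule is valuable. However, this model cannot reflect the time-sensitive nature of data such as Web data and data accumulated over long periods. This paper establishes, for the first time, a new time-based model for re-estimating the value of rules in the data, and proposes time validity as a new measure of rule value. Finally, a new parallel algorithm based on this time-based model is given. The algorithm allows incremental mining during the mining process and lets users optimize the mining process through interaction.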

14.
This paper describes our work on developing a language-independent technique for discovering implicit knowledge from multilingual information sources. Text mining has been gaining popularity in the knowledge discovery field, particularly with the increasing availability of digital documents in various languages from all around the world. However, most current text mining tools focus on processing monolingual documents (particularly English documents); little attention has been paid to applying the techniques to documents in Asian languages or to extending the mining algorithms to support multilingual information sources. In this work, we attempt to develop a language-neutral method to tackle the linguistic difficulties in the text mining process. Using a variation of automatic clustering techniques that applies a neural network approach, namely Self-Organizing Maps (SOM), we have conducted several experiments to uncover associated documents based on a Chinese corpus, Chinese-English bilingual parallel corpora, and a hybrid Chinese-English corpus. The experiments show some interesting results and a couple of potential paths for future work in the field of multilingual information discovery. In addition, this work is expected to serve as a starting point for exploring the impact of linguistic issues on the machine-learning approach to mining sensible linguistic elements from multilingual text collections.

15.
MineSet aids knowledge discovery and supports decision making based on relational data. It uses visualization and data mining to arrive at interesting results. Providing diverse visualization tools lets users choose the most appropriate method for a given problem. The client-server architecture performs most of the computationally intensive tasks on a server, while the processed results return to the client for visualization. The paper discusses MineSet database visualization and data mining visualization.

16.
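Zhang Yun, Yao Jun. 《微计算机信息》 2007, 23(34): 260-262
Knowledge discovery is an important component of the digital reservoir and one of the main purposes of building one. In view of the needs of oil and gas field development and the characteristics of reservoir data, this paper combines data cleaning, data mining, knowledge evaluation, knowledge interpretation, visualization, and other techniques to propose a new approach to knowledge discovery in the digital reservoir, and illustrates its implementation with a worked example in which decision tree techniques are used to analyze the factors that influence the recovery factor in oil and gas field development. Through discretization of continuous attribute values, construction and pruning of the decision tree, and knowledge evaluation and interpretation, large numbers of meaningful rules, patterns, and other knowledge can be mined accurately and quickly from reservoir databases, reservoir data warehouses, and other reservoir data collections.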

17.
Geo-spatial data mining in the analysis of a demographic database   Total citations: 2, self-citations: 0, citations by others: 2
Spatial data mining refers to the extraction of knowledge, spatial relationships, or other interesting patterns not explicitly stored in spatial databases. The approaches usually followed in the analysis of geo-spatial data with the aim of knowledge discovery are essentially characterised by the development of new algorithms, which treat the position and extension of objects mainly through the manipulation of their co-ordinates. In this paper a new approach to this process is presented, in which geographic identifiers give the positional aspects of geographic data. These identifiers are manipulated using qualitative reasoning principles, which allow for the inference of new spatial relations required for the data mining step of the knowledge discovery process. The analysis of a demographic database with the proposed principles enabled the discovery of patterns that are hidden in the explored geo-spatial and demographic data. Acknowledgements: We thank NEPS (Núcleo de Estudos da População e Sociedade) of the University of Minho for making the demographic data available.

18.
CLARANS: a method for clustering objects for spatial data mining   Total citations: 14, self-citations: 0, citations by others: 14
Spatial data mining is the discovery of interesting relationships and characteristics that may exist implicitly in spatial databases. To this end, this paper has three main contributions. First, it proposes a new clustering method called CLARANS, whose aim is to identify spatial structures that may be present in the data. Experimental results indicate that, when compared with existing clustering methods, CLARANS is very efficient and effective. Second, the paper investigates how CLARANS can handle not only point objects, but also polygon objects efficiently. One of the methods considered, called the IR-approximation, is very efficient in clustering convex and nonconvex polygon objects. Third, building on top of CLARANS, the paper develops two spatial data mining algorithms that aim to discover relationships between spatial and nonspatial attributes. Both algorithms can discover knowledge that is difficult to find with existing spatial data mining algorithms.
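
For readers unfamiliar with CLARANS, the sketch below shows its core as it is commonly described: a randomized local search over sets of k medoids, where a neighbor differs from the current set by one swapped medoid, the search moves whenever a sampled neighbor is cheaper, and it restarts a fixed number of times. This is a minimal sketch for point objects only; the parameter names `num_local` and `max_neighbor`, their defaults, and the toy points are assumptions, and the polygon handling and IR-approximation from the paper are not covered.

```python
import math
import random


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid (by index)."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def clarans(points, k, num_local=2, max_neighbor=50, seed=0):
    """Randomized k-medoid search; returns (sorted medoid indices, cost)."""
    rng = random.Random(seed)
    n = len(points)
    best, best_cost = None, float("inf")
    for _ in range(num_local):                     # independent restarts
        current = rng.sample(range(n), k)          # random starting node
        current_cost = total_cost(points, current)
        failures = 0
        while failures < max_neighbor:
            # A neighbor replaces one medoid with one non-medoid point.
            neighbor = current.copy()
            neighbor[rng.randrange(k)] = rng.choice(
                [i for i in range(n) if i not in current])
            cost = total_cost(points, neighbor)
            if cost < current_cost:                # move and reset the counter
                current, current_cost = neighbor, cost
                failures = 0
            else:
                failures += 1
        if current_cost < best_cost:               # keep the best local optimum
            best, best_cost = sorted(current), current_cost
    return best, best_cost


# Hypothetical toy data: two well-separated groups of points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, cost = clarans(points, k=2)
```

Because only a random sample of neighbors is examined at each step, the result is a local optimum with respect to that sample rather than a guaranteed global optimum; raising `max_neighbor` or `num_local` trades runtime for quality.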

19.
An Overview of Data Mining and Knowledge Discovery   Total citations: 9, self-citations: 0, citations by others: 9
With massive amounts of data stored in databases, mining information and knowledge in databases has become an important issue in recent research. Researchers in many different fields have shown great interest in data mining and knowledge discovery in databases. Several emerging applications in information-providing services, such as data warehousing and on-line services over the Internet, also call for various data mining and knowledge discovery techniques to understand user behavior better, to improve the service provided, and to increase business opportunities. In response to this demand, this article provides a comprehensive survey of recently developed data mining and knowledge discovery techniques and introduces some real application systems. In conclusion, it lists some problems and challenges for further research.

20.
Frequent itemset mining over data streams has become a hot topic in data mining and knowledge discovery in recent years and has been applied in many areas. However, setting a minimum support threshold requires some domain knowledge, and an unreasonable setting causes difficulty and extra burden for users. Finding top-K frequent itemsets over data streams is therefore of interest. In this paper, a dynamic incremental approximate algorithm, TOPSIL-Miner, is presented to mine top-K significant itemsets in landmark windows. A new data structure, TOPSIL-Tree, is designed to store the potentially significant itemsets, and further data structures (the maximum support list, ordered item list, TOPSET, and minimum support list) are devised to maintain information about the mining results. Moreover, three optimization strategies are exploited to reduce the time and space cost of the algorithm: (1) pruning trivial nodes in the current data stream, (2) raising the mining support threshold adaptively and heuristically during the mining process, and (3) raising the pruning threshold dynamically. The accuracy of the algorithm is also analyzed. Extensive experiments are performed to evaluate the effectiveness, efficiency, and precision of the algorithm.
