An Overview of Data Mining and Knowledge Discovery   Total citations: 9, self-citations: 0, citations by others: 9
With massive amounts of data stored in databases, mining information and knowledge in databases has become an important issue in recent research. Researchers in many different fields have shown great interest in data mining and knowledge discovery in databases. Several emerging applications in information-providing services, such as data warehousing and on-line services over the Internet, also call for various data mining and knowledge discovery techniques to understand user behavior better, to improve the service provided, and to increase business opportunities. In response to this demand, this article provides a comprehensive survey of recently developed data mining and knowledge discovery techniques and introduces some real application systems as well. In conclusion, the article also lists some problems and challenges for further research.

Data mining is a powerful method for extracting knowledge from data. Raw data poses various challenges that make traditional methods unsuitable for knowledge extraction, and data mining is expected to handle various data types in all formats. The relevance of this paper is underlined by the fact that data mining is a subject of research in many different areas. In this paper, we review previous work on knowledge extraction from medical data. The main idea is to describe key papers and provide some guidelines to help medical practitioners. Medical data mining is a multidisciplinary field that draws on both medicine and data mining. Previous work should therefore be classified so as to cover the requirements of users from these various fields. To this end, we studied papers published between 1999 and 2013 that aim to extract knowledge from structured medical data. We clarify medical data mining and its main goals. Each paper is then studied in terms of six medical tasks: screening, diagnosis, treatment, prognosis, monitoring and management. Within each task, five data mining approaches are considered: classification, regression, clustering, association and hybrid. At the end of each task, a brief summary and discussion are given. A standard framework based on CRISP-DM is additionally adapted to manage all activities. By way of discussion, current issues and future trends are outlined. The amount of work published in this area is substantial, and it is impossible to discuss all of it in a single paper. We hope this paper will make it possible to explore previous work and identify interesting areas for future research.

Abstract: Although data mining and knowledge discovery techniques have recently been used to diagnose human disease, little research has been conducted on disease diagnostic modelling using human gene information. Furthermore, to our knowledge, no study has reported on diagnosis models using single nucleotide polymorphism (SNP) information. A disease diagnosis model using data mining techniques and SNP information should prove promising from a practical perspective as more information on human genes becomes available. Data mining and knowledge discovery techniques can be put to practical use detecting human disease, since a haplotype analysis using high-density SNP markers has gained great attention for evaluating human genes related to various human diseases. This paper explores how data mining and knowledge discovery can be applied to medical informatics using human gene information. As an example, we applied case-based reasoning to a cancer detection problem using human gene information and SNP analysis because case-based reasoning has been applied in medicine relatively less often than other data mining techniques. We propose a modified case-based reasoning method that is appropriate for associated categorical variables to use in detecting gastric cancer.

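Data mining is a brand-new field of computer applications, while bioinformatics is an emerging discipline formed at the intersection of biology, computer science, applied mathematics, and other fields. This paper surveys the content, process, methods, and patterns of data mining technology, introduces the scope of bioinformatics and its new application technologies, and explores ways of applying data mining technology to the mining of biological information.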

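Spatial data mining is the process of extracting implicit knowledge, spatial relationships, or other implicit patterns stored in spatial databases. It has broad application prospects in many fields, including geographic information systems, remote sensing, surveying and mapping, and resource and environmental management. This paper studies and discusses some of the main techniques of spatial data mining and introduces a spatial data mining component designed and implemented on the basis of these techniques.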

Ontologies are recognised as important tools, not only for effective and efficient information sharing, but also for information extraction and text mining. In the biomedical domain, the need for a common ontology for information sharing has long been recognised, and several ontologies are now widely used. However, there is confusion among researchers concerning the type of ontology that is needed for text mining, and how it can be used for effective knowledge management, sharing, and integration in biomedicine. We argue that there are several different ways to define an ontology and that, while the logical view is popular for some applications, it may be neither possible nor necessary for text mining. We propose a text-centered approach for knowledge sharing, as an alternative to formal ontologies. We argue that a thesaurus (i.e. an organised collection of terms enriched with relations) is more useful for text mining applications than formal ontologies.

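Research and Application of Expanding Knowledge Bases with Data Mining Methods   Total citations: 1, self-citations: 0, citations by others: 1
This paper discusses the similarities and differences between data mining and machine learning as means of expanding a knowledge base, and analyzes the relationships among the knowledge base, the database, and data mining in a knowledge system. It proposes an XML-based knowledge representation method, XKR (XML-based Knowledge Representation). XKR uses XML as a unified formal description language and merges several traditional representation methods, including production rules, frames, semantic networks, and procedural representation. Because XML itself carries semantics and can be extended without limit, XKR can describe knowledge of different backgrounds and types and thus achieve knowledge fusion. Practical application shows that an XKR knowledge base has both advantages and shortcomings, and the paper points out directions for improvement.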

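Research and Progress in High-Dimensional Data Mining Algorithms   Total citations: 1, self-citations: 1, citations by others: 1
The rapid development of bioinformatics and e-commerce applications has accumulated large amounts of high-dimensional data, and mining such data is becoming increasingly important. General data mining methods run into the curse of dimensionality when handling high-dimensional data, and traditional similarity measures also become meaningless in high-dimensional spaces. This paper surveys the state of the art of the latest high-dimensional data mining algorithms from three perspectives (frequent itemset mining, clustering, and classification) and examines how these algorithms address the problems of high-dimensional data mining.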
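
The remark that similarity measures lose meaning in high-dimensional spaces can be illustrated with a small experiment. The sketch below is not taken from the surveyed paper; the function name `relative_contrast`, the uniform random data, and the choice of Euclidean distance are all assumptions made for illustration. It measures how much farther the farthest point is than the nearest one, relative to the nearest distance.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """Return (farthest - nearest) / nearest for Euclidean distances
    from one random query point to n_points random points in [0, 1]^dim.
    Hypothetical helper for illustration only."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    nearest, farthest = min(dists), max(dists)
    return (farthest - nearest) / nearest

if __name__ == "__main__":
    for dim in (2, 10, 100, 500):
        print(dim, round(relative_contrast(dim), 3))
```

Under these assumptions the contrast shrinks as the dimension grows, so the "nearest" and "farthest" neighbours become hard to tell apart, which is one common reading of the problem the abstract mentions.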

Abstract: The bioscience field has seen some spectacular advances in genomic and proteomic technologies that are able to deliver vast quantities of information on cellular activity. Such technologies are of critical importance to biology, medical science and drug discovery. However, living systems are highly complex, and fully exploiting these technologies requires knowledge at many different levels. Information such as genome sequence data, gene expression data, protein-to-protein interactions and metabolic pathways is required to understand the complexity of biological processes. The challenge for bioinformatics is to tackle the problem of fragmentation of knowledge by integrating the many sources of heterogeneous information into a coherent entity. Another problem is that the high level of biological complexity and the fragmented nature of biological research have made it difficult to keep fully conversant with the latest research and discoveries. Progress in one area of biology may have implications for other areas, but the dissemination of this knowledge is not straightforward; difficulties such as differences in naming conventions for genes and biological processes have led to confusion and a lack of productivity. This paper reviews the most recent research to overcome the fragmentation problem, in which technologies such as text mining and ontologies are used within the knowledge discovery process, and the specific technical challenges they address.

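Data mining is the process of discovering potentially useful knowledge or interesting patterns in databases. Work in the field has concentrated mainly on association rule mining under a single support threshold, which finds associations among items in a transaction database. In practical applications, however, items can have different minimum supports, and different items may be judged important by different criteria. This paper therefore proposes an algorithm for mining useful fuzzy association rules under a maximum-support constraint. Under this constraint, frequent itemsets are found by an iterative level-wise search. An example demonstrates that the algorithm is easy to understand, meaningful, and efficient.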
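
A level-wise search with per-item minimum supports can be sketched as follows. This is only an illustration under stated assumptions, not the paper's algorithm: it works on crisp (non-fuzzy) transactions, and it assumes the constraint means that an itemset's threshold is the maximum of its items' minimum supports, which the abstract does not spell out. The names `frequent_itemsets` and `min_support` are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search with per-item minimum supports.

    Assumption: an itemset is frequent when its support is at least the
    maximum of the minimum supports of its items. min_support must have
    an entry for every item that occurs in the transactions.
    """
    tsets = [frozenset(t) for t in transactions]
    n = len(tsets)

    def support(itemset):
        return sum(1 for t in tsets if itemset <= t) / n

    def threshold(itemset):
        return max(min_support[i] for i in itemset)

    # Level 1: single items that meet their own minimum support.
    current = {}
    for item in sorted({i for t in tsets for i in t}):
        single = frozenset([item])
        s = support(single)
        if s >= min_support[item]:
            current[single] = s
    result = dict(current)

    k = 2
    while current:
        # Join frequent (k-1)-itemsets; keep candidates whose
        # (k-1)-subsets are all frequent (downward closure).
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        current = {}
        for cand in candidates:
            s = support(cand)
            if s >= threshold(cand):
                current[cand] = s
        result.update(current)
        k += 1
    return result
```

With the maximum taken as the threshold, every subset of a frequent itemset is itself frequent, which is what lets the pruning step above discard candidates safely.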

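Research on a PACS Intelligent Assisted-Diagnosis Model Based on Data Mining   Total citations: 1, self-citations: 0, citations by others: 1
With the spread of PACS in hospitals across China, how to make efficient use of the massive amount of information stored in PACS databases and extract valuable, implicit knowledge from it has become a new focus of PACS applications. This paper combines data mining technology with PACS, studies an intelligent assisted-diagnosis model based on data mining, presents the construction framework of the model, and analyzes and evaluates its effectiveness.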

Textual Data Mining to Support Science and Technology Management   Total citations: 10, self-citations: 0, citations by others: 10
This paper surveys applications of data mining techniques to large text collections, and illustrates how those techniques can be used to support the management of science and technology research. Specific issues that arise repeatedly in the conduct of research management are described, and a textual data mining architecture that extends a classic paradigm for knowledge discovery in databases is introduced. That architecture integrates information retrieval from text collections, information extraction to obtain data from individual texts, data warehousing for the extracted data, data mining to discover useful patterns in the data, and visualization of the resulting patterns. At the core of this architecture is a broad view of data mining (the process of discovering patterns in large collections of data), and that step is described in some detail. The final section of the paper illustrates how these ideas can be applied in practice, drawing upon examples from the recently completed first phase of the textual data mining program at the Office of Naval Research. The paper concludes by identifying some research directions that offer significant potential for improving the utility of textual data mining for research management applications.

Association rule mining is an effective data mining technique that has been widely used in health informatics research since its introduction. Because health informatics has received a lot of attention from researchers in the last decade and has developed various sub-domains, it is both interesting and essential to review state-of-the-art health informatics research. Knowledge discovery researchers and practitioners have applied an array of data mining techniques to extract knowledge from health data, and this survey focuses on and studies in detail the application of association rule mining techniques to the health informatics domain. A critical analysis of the literature on applications of association rule mining to health informatics from 2005 to 2014 shows that, despite the availability of more efficient alternative approaches, the Apriori algorithm is still a widely used frequent itemset generation technique in this domain. Moreover, other limitations related to applications of association rule mining for health informatics have been identified, and recommendations have been made to mitigate them. Furthermore, the algorithms and tools used for applying association rule mining have been identified, conclusions have been drawn from the literature surveyed, and future research directions have been presented.

Spatial data mining, i.e., mining knowledge from large amounts of spatial data, is a demanding field since huge amounts of spatial data have been collected in various applications, ranging from remote sensing to geographical information systems (GIS), computer cartography, environmental assessment and planning. The collected data far exceeds people's ability to analyze it. Thus, new and efficient methods are needed to discover knowledge from large spatial databases. Most of the spatial data mining methods do not take into account the uncertainty of spatial information. In our work we use objects with broad boundaries, the concept that absorbs all the uncertainty by which spatial data is commonly affected and allows computations in the presence of uncertainty without rough simplifications of the reality. The topological relations between objects with a broad boundary can be organized into a three-level concept hierarchy. We developed and implemented a method for an efficient determination of such topological relations. Based on the hierarchy of topological relations we present a method for mining spatial association rules for objects with uncertainty. The progressive refinement approach is used for the optimization of the mining process.

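Association rules are an important research topic in data mining and are widely applied in many fields; they can both verify knowledge patterns long established within an industry and uncover hidden new regularities. This paper uses association rules to analyze independent college admissions information and builds an admissions decision model for independent colleges. First, the information on chosen majors in candidates' college entrance examination application forms is selected; association rules are then mined from it; finally, the mined rules are analyzed and applied. Experimental results show that mining candidates' application information with association rules is feasible and effective. The approach provides a new reference for independent colleges in drawing up enrollment plans and designing admissions publicity, and has good application prospects in independent college admissions.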

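Structured data mining and complex-type data mining are both related and distinct. How to unify the two under a single theoretical framework to guide research in data mining and knowledge discovery has become a pressing problem. This paper proposes UMKDSS, a unified state-space model of knowledge discovery, which links structured data mining with complex-type data mining and provides theoretical guidance for the latter. The paper concludes with an application example of UMKDSS in Web text mining.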

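A Survey of Video Mining Techniques   Total citations: 5, self-citations: 0, citations by others: 5
As video data becomes ever easier to acquire and store, the problem of using it effectively grows more prominent. Video data mining has attracted great attention from researchers in China and abroad in recent years. It aims to extract the semantic information of video data and mine the useful patterns and knowledge hidden in it, thereby enabling intelligent video applications and supporting human decision making. By tracking and analyzing research progress in China and abroad, this paper summarizes the concept of video mining, gives a fairly detailed summary and discussion of its implementation methods and application areas, and points out the challenges facing video mining research.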

Market basket analysis is one of the typical applications of association rule mining. The valuable information discovered through data mining can be used to support decision making. Generally, the objective measures support and confidence are used to evaluate the interestingness of association rules. However, in some cases the rules discovered with these two measures may be neither profitable nor actionable (that is, not interesting) to enterprises. Therefore, how to discover patterns by considering both objective measures (e.g. probability) and subjective measures (e.g. profit) is a challenge in data mining, particularly in marketing applications. This paper focuses on pattern evaluation in the process of knowledge discovery by using the concept of profit mining. Data Envelopment Analysis is utilized to calculate the efficiency of discovered association rules with multiple objective and subjective measures. After their efficiency has been evaluated, the association rules are categorized into two classes, relatively efficient (interesting) and relatively inefficient (uninteresting). To distinguish these two classes, a Decision Tree (DT)-based classifier is built from the attributes of the association rules. The DT classifier can be used to find the characteristics of interesting association rules and to classify unknown (new) association rules.
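
The two objective measures named in this abstract, support and confidence, have standard definitions that a few lines of code make concrete. The sketch below is not from the paper; the function names and the sample baskets are assumptions for illustration. Support is the fraction of transactions containing an itemset, and the confidence of a rule A -> B is support(A and B together) divided by support(A).

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    target = frozenset(itemset)
    hits = sum(1 for t in transactions if target <= frozenset(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent.

    Returns 0.0 when the antecedent never occurs, to avoid dividing by zero.
    """
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base

if __name__ == "__main__":
    # Made-up baskets, for illustration only.
    baskets = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    ]
    print(support(baskets, {"bread", "milk"}))
    print(confidence(baskets, {"bread"}, {"milk"}))
```

Neither measure says anything about profit, which is the gap the abstract points to: a rule can score well on both and still not be worth acting on.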

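An "event" is an occurrence at a particular time and place that has a fairly marked impact on human society or the natural world. Social unrest, violent terrorist incidents, and pandemics are examples of events that pose serious threats to national and social security. If such events could be predicted effectively in advance, it would help with preparedness and greatly reduce unnecessary losses. Event prediction therefore has great practical value to society and can play an important role in areas such as public security, risk awareness, and infectious disease prevention and control. Predicting events scientifically and accurately used to be a highly challenging problem, but recent advances in big data and data mining have brought new opportunities. This paper presents a systematic survey of the latest research progress in data-driven event prediction. It introduces the formal modeling of event prediction and its performance metrics, and classifies and summarizes the latest results in the field into eight categories of methods: frequent pattern mining, traditional classification models, time series forecasting, temporal point processes, geospatial location prediction, event graphs, unsupervised methods, and multi-technique fusion prediction. Each category is described systematically. The paper then discusses the main application areas of event prediction and, finally, looks ahead to the challenges the technology faces and potential research directions, with a view to further promoting its development and application.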

The development of mobile technologies has paved the way for a variety of new applications that take advantage of trajectory data resulting from the activities of moving objects in their associated ecosystems. Such data can be handled mainly either by real-time applications or by decision-oriented tools ranging from trajectory data warehouse technology to classical and advanced data mining instruments. Indeed, applications dealing with moving objects contain significant hidden knowledge that can be made visible through analytical and mining tools. This valuable knowledge can be properly obtained only if the modeling of the trajectory data problem is global, precise, and concise. The aim of this paper is to investigate the relevant literature on moving objects, trajectory data, and trajectory data warehouse modeling, from classical to ontological approaches. The approaches are compared, and their strengths and limitations are shown. This work aims to be valuable for researchers seeking to select and use modeling approaches in mobile object ecosystems.
