Similar literature
Found 20 similar documents (search time: 15 ms)
1.
2.
The number, variety and complexity of projects involving data mining or knowledge discovery in databases have recently increased at such a pace that aspects of their development process need to be standardized so that results can be integrated, reused and interchanged in the future. Data mining projects are quickly becoming engineering projects, and current standard processes, like CRISP-DM, need to be revisited to incorporate this engineering viewpoint. This is the central motivation of this paper, which argues that experience gained about the software development process over almost 40 years could be reused and integrated to improve data mining processes. Consequently, this paper proposes to reuse ideas and concepts underlying the IEEE Std 1074 and ISO 12207 software engineering process models to redefine and extend the CRISP-DM process and make it a data mining engineering standard.

3.
Enterprise Resource Planning (ERP) systems tend to deploy Supply Chain Management and/or Customer Relationship Management techniques in order to integrate information across customers, suppliers, manufacturers and warehouses, and thereby minimize system-wide costs while satisfying service level requirements. Although efficient, these systems are neither versatile nor adaptive, since newly discovered customer trends cannot be easily integrated with existing knowledge. Building on the way these techniques are applied to ERP systems, we have developed a multi-agent system that introduces adaptive intelligence as a powerful add-on for ERP software customization. The system can be thought of as a recommendation engine, which takes advantage of knowledge gained through the use of data mining techniques and incorporates it into the resulting company selling policy. The intelligent agents of the system can be periodically retrained as new information is added to the ERP. In this paper, we present the architecture and development details of the system, and demonstrate its application on a real test case.

4.
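This paper explores the application of data mining technology at the CNPC Xinjiang Training Center. The database of the existing training management information system has accumulated a large amount of historical data. On this basis, using the integrated data mining environment of Microsoft SQL Server 2005 and taking the Microsoft Time Series algorithm as an example, the authors build a data mining model and perform data mining to predict the training capacity of each hosting department, providing managers with useful information for decisions on the rational allocation of training resources. Finally, the paper summarizes the problems encountered during development and their solutions.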

5.
Business intelligence based on data mining has been one of the popular and indispensable tools for identifying business opportunities in the sales and marketing of new products. Traditional data mining methods based on association rules may be inadequate for completely uncovering the hidden patterns of sales in transaction records. This paper presents a quantitative correlation coefficient mining method that is capable of uncovering hidden patterns of sales and markets. A prototype business intelligence system (BIS) named the correlation coefficient sales data mining system (CCSDMS) has been developed and successfully trialed at a selected reference site. A series of experiments have been conducted to evaluate the performance of the proposed system. The results generated by the BIS are compared with those of a well-known commercially available data mining system. The proposed quantitative correlation coefficient mining method is found to possess higher accuracy, better computational effectiveness and higher predictive power. With the new approach, associations for product relations and customer periodic demands are revealed, and this can help to leverage organizational marketing capital to enhance the quality and speed of promotions as well as awareness of product relations.

6.
The quality of discovered association rules is commonly evaluated by interestingness measures (typically support and confidence), with the purpose of supplying indicators that help the user understand and use the newly discovered knowledge. Low-quality datasets have a strongly negative impact on the quality of the discovered association rules, and one might legitimately wonder whether a so-called "interesting" rule, noted LHS → RHS, is meaningful when 30% of the LHS data are no longer up to date, 20% of the RHS data are not accurate, and 15% of the LHS data come from a data source that is well known for its poor credibility. This paper presents an overview of data quality characterization and management techniques that can be advantageously employed for improving the quality awareness of the knowledge discovery and data mining processes. We propose to integrate data quality indicators for quality-aware association rule mining. We propose a cost-based probabilistic model for selecting legitimately interesting rules. Experiments on the challenging KDD-Cup-98 datasets show that variations in data quality have a great impact on the cost and quality of discovered association rules, and they confirm our approach for the integrated management of data quality indicators in the KDD process, which ensures the quality of data mining results.
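The two interestingness measures named in this abstract, support and confidence, can be illustrated with a minimal sketch. This is not the paper's quality-aware model; it only computes the standard definitions, and the function names and the toy `TRANSACTIONS` data are assumptions made for illustration.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions: List[FrozenSet[str]],
               lhs: Iterable[str], rhs: Iterable[str]) -> float:
    """support(LHS and RHS together) / support(LHS) for the rule LHS -> RHS."""
    lhs_items = frozenset(lhs)
    lhs_support = support(transactions, lhs_items)
    if lhs_support == 0:
        return 0.0  # rule never fires, so confidence is taken as 0 here
    return support(transactions, lhs_items | frozenset(rhs)) / lhs_support


# Hypothetical toy data, not taken from KDD-Cup-98.
TRANSACTIONS = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

rule_support = support(TRANSACTIONS, {"bread", "milk"})
rule_confidence = confidence(TRANSACTIONS, {"bread"}, {"milk"})
```

For the rule bread → milk on this toy data, two of four transactions contain both items and three contain bread, so the support is 0.5 and the confidence is 2/3. Neither number says anything about whether the underlying records are accurate or up to date, which is the gap the abstract points at.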

7.
8.
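Design of an Agent-Based Distributed Data Mining System   Total citations: 14, self-citations: 1, citations by others: 13
Chen Gang. Computer Engineering (《计算机工程》), 2001, 27(9): 65-67, 192
This paper proposes an agent-based distributed data mining system for mining large volumes of data that are stored in a distributed fashion. Because the system transmits only the intermediate results of data mining, it greatly reduces the volume of data sent over the network. An application example is used to illustrate the approach.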
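The idea of sending only intermediate results rather than raw records can be sketched as follows. This is a minimal illustration under stated assumptions, not the system described in the paper: each site is assumed to compute local co-occurrence counts, and a coordinator merges them. The names `local_pair_counts`, `merge_counts`, `SITE_A` and `SITE_B` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]


def local_pair_counts(transactions: Iterable[Iterable[str]]) -> Counter:
    """Run at one site: count item pairs that co-occur in local transactions."""
    counts: Counter = Counter()
    for transaction in transactions:
        # Sort so that each pair has one canonical ordering across all sites.
        for pair in combinations(sorted(set(transaction)), 2):
            counts[pair] += 1
    return counts


def merge_counts(partials: List[Counter]) -> Counter:
    """Run at the coordinator: sum the per-site counts into global counts."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical local data held at two sites; the raw rows never leave a site.
SITE_A = [["bread", "milk"], ["bread", "butter", "milk"]]
SITE_B = [["milk", "bread"], ["butter"]]

# Only the small Counter objects would cross the network.
GLOBAL_COUNTS = merge_counts([local_pair_counts(SITE_A), local_pair_counts(SITE_B)])
```

Here the coordinator ends up with a count of 3 for the pair (bread, milk) without ever seeing an individual transaction. The traffic saved grows with the ratio of raw records to distinct patterns, which is the effect the abstract attributes to transmitting intermediate results.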

9.
Although data mining and knowledge discovery techniques have recently been used to diagnose human disease, little research has been conducted on disease diagnostic modelling using human gene information. Furthermore, to our knowledge, no study has reported on diagnosis models using single nucleotide polymorphism (SNP) information. A disease diagnosis model using data mining techniques and SNP information should prove promising from a practical perspective as more information on human genes becomes available. Data mining and knowledge discovery techniques can be put to practical use in detecting human disease, since haplotype analysis using high-density SNP markers has gained great attention for evaluating human genes related to various human diseases. This paper explores how data mining and knowledge discovery can be applied to medical informatics using human gene information. As an example, we applied case-based reasoning to a cancer detection problem using human gene information and SNP analysis, because case-based reasoning has been applied in medicine less often than other data mining techniques. We propose a modified case-based reasoning method, appropriate for associated categorical variables, for use in detecting gastric cancer.

10.
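This paper studies the application of data mining in online education, with the aim of improving online teaching quality and learning efficiency. It proposes a data mining application based on OLAP technology and an agent-based system architecture model, and builds an auxiliary management system for online teaching.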

11.
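This paper studies the application of data mining in online education, with the aim of improving online teaching quality and learning efficiency. It proposes a data mining application based on OLAP technology and an agent-based system architecture model, and builds an auxiliary management system for online teaching.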

12.
Although it is widely accepted that research from data mining, knowledge discovery, and data warehousing should be synthesized, little research addresses the integration of existing data management and analysis software. We develop an intelligent middleware that facilitates linear correlation discovery, that is, the discovery of associations between attributes and attribute groups. This middleware integrates data management and data analysis tools to improve traditional data analysis in three respects: (1) identifying appropriate linear correlation functions to perform based on the semantics of a data set; (2) executing the appropriate functions contained in the data analysis packages; and (3) deriving useful knowledge from data analysis.

13.
Data mining corrosion from eddy current non-destructive tests   Total citations: 1, self-citations: 0, citations by others: 1
Quicker, more effective methods of corrosion prediction and classification can help to ensure a safe and operational transportation system for both civilian and military sectors. This is especially critical now as transportation providers attempt to meet the increased expense of repairing aging aircraft with smaller budgets. These budget constraints make it imperative to find corrosion and to correctly determine the appropriate time to replace corroded parts. If a part is replaced too soon, the result is wasted resources. However, if a part is not replaced soon enough, it could cause a catastrophic accident. The discovery of models that limit the possibility of a costly accident while optimizing resource utilization would allow transportation providers to focus their maintenance efforts efficiently. While our concern in this study was with aircraft, the results will also be useful to other transportation providers. This paper describes the discovery and comparison of empirical models to predict corrosion damage from non-destructive test (NDT) data. The NDT data were derived from eddy current (EC) scans of the United States Air Force's (USAF) KC-135 aircraft. While we might suspect a link between NDT results and corrosion, until now this link has not been formally established. Instead, the NDT data have been converted into false color images that are analyzed visually by maintenance operators. The models we discovered are quite complex and suggest that, with the appropriate data mining approaches, we can sometimes handle noisy data more effectively through more complex models rather than simpler ones. Our results also show that while a variety of modeling techniques can predict corrosion with reasonable accuracy, regression trees are particularly effective in modeling the complex relationships between the EC measurements and the actual amount of corrosion.

14.
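A Multi-Agent-Based Distributed Data Mining Model   Total citations: 12, self-citations: 1, citations by others: 12
This paper analyzes the advantages of distributed data mining and the problems it faces, and discusses how agents enhance the performance of distributed data mining. It then proposes a formal agent-based model of distributed data mining, and analyzes and discusses the model in depth in combination with data mining methods and knowledge integration techniques.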

15.
This paper presents Simulation-based Innovization (SBI), a method for analyzing production systems by applying multi-objective optimization and data mining techniques to discrete-event simulation models. The aim of SBI analysis is to reveal insight into the parameters that affect the performance measures and to gain a deeper understanding of the problem through post-optimality analysis of the solutions acquired from multi-objective optimization. This paper provides empirical results from an industrial case study, carried out on an automotive machining line, in order to explain the SBI procedure. The SBI method has been found to be particularly suitable in this case study because the three objectives under study, namely total tardiness, makespan and average work-in-process, are in conflict with each other. Depending on the system load of the line, different decision variables have been found to be influential. The paper addresses how the SBI method is used to find important patterns in the explored solution set and how it can support decision making in order to improve scheduling under different system loadings in the machining line.

16.
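A User-Centered Data Cleaning Process Model in a Data Warehouse Environment   Total citations: 8, self-citations: 1, citations by others: 7
Data cleaning is a very important step in data warehousing and data mining. This paper first analyzes and summarizes the concepts related to data cleaning, identifies the quality problems that data cleaning needs to solve, and summarizes the techniques and methods for solving them. On this basis, it proposes a human-centered data cleaning process model. The model integrates workflow technology, data integration, data transformation and data mining techniques. The basic functions that each toolbox should provide are also given.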

17.
The insurance industry of Hong Kong has been experiencing steady growth in the last decade. One of the current problems in the industry is that, in general, insurance agent turnover is high. The selection of new agents is treated as a regular recruitment exercise. This study focuses on the characteristics of data warehousing and the appropriate data mining techniques that can be used to support agent selection in the insurance industry. We examine the application of three popular data mining methods (discriminant analysis, decision trees and artificial neural networks), combined with a data warehouse, to the prediction of the length of service, sales premiums and persistence indices of insurance agents. An intelligent decision support system, namely Intelligent Agent Selection Assistant for Insurance, is presented, which will help insurance managers to select quality agents by using data mining in a data warehouse environment.

18.
High utility pattern (HUP) mining is one of the most important research issues in data mining. Although HUP mining extracts important knowledge from databases, it requires long calculations and multiple database scans. Therefore, HUP mining is often unsuitable for real-time data processing schemes such as data streams. Furthermore, many HUPs may be unimportant due to the poor correlations among the items inside them. Hence, the fast discovery of fewer but more important HUPs would be very useful in many practical domains. In this paper, we propose a novel framework that introduces a very useful measure, called frequency affinity, among the items in a HUP, together with the concept of an interesting HUP with a strong frequency affinity, for the fast discovery of more applicable knowledge. Moreover, we propose a new tree structure, the utility tree based on frequency affinity (UTFA), and a novel algorithm, high utility interesting pattern mining (HUIPM), for single-pass mining of high utility interesting patterns (HUIPs) from a database. Our approach mines fewer but more valuable HUPs, significantly reduces the overall runtime of existing HUP mining algorithms, and is applicable to real-time data processing. Extensive performance analyses show that the proposed HUIPM algorithm is very efficient and scalable for interesting HUP mining with a strong frequency affinity.

19.
In this paper, we describe two distributed, data-intensive applications that were demonstrated at iGrid 2005 (iGrid Demonstration US109 and iGrid Demonstration US121). One involves transporting astronomical data from the Sloan Digital Sky Survey (SDSS), and the other involves computing histograms from multiple high-volume data streams. Both rely on newly developed data transport and data mining middleware. Specifically, we describe a new version of the UDT network protocol called Composible-UDT, a file transfer utility based upon UDT called UDT-Gateway, and an application for building histograms on high-volume data flows called BESH (for Best Effort Streaming Histogram). For both demonstrations, we include a summary of the experimental studies performed at iGrid 2005.

20.
A hybrid case adaptation approach for case-based reasoning   Total citations: 1, self-citations: 1, citations by others: 0
Case-Based Reasoning is a methodology for problem solving based on past experiences. This methodology tries to solve a new problem by retrieving and adapting previously known solutions of similar problems. However, retrieved solutions generally require adaptation in order to be applied to new contexts. One of the major challenges in Case-Based Reasoning is the development of an efficient methodology for case adaptation. The most widely used form of adaptation employs hand-coded adaptation rules, which demand a significant knowledge acquisition and engineering effort. An alternative that overcomes the difficulties associated with acquiring knowledge for case adaptation has been the use of hybrid approaches and automatic learning algorithms to acquire the adaptation knowledge. We investigate the use of hybrid approaches for case adaptation employing Machine Learning algorithms. The approaches investigated automatically learn adaptation knowledge from a case base and apply it to adapt retrieved solutions. In order to verify the potential of the proposed approaches, they are experimentally compared with individual Machine Learning techniques. The results obtained indicate the potential of these approaches as an efficient means of acquiring case adaptation knowledge. They show that the combination of the Instance-Based Learning and Inductive Learning paradigms and the use of a data set of adaptation patterns yield adaptations of the retrieved solutions with high predictive accuracy.
