Similar literature
 20 similar articles found (search time: 203 ms)
1.
A New Algorithm for Mining Classification Rules Based on Decision Tables   Cited by: 2 (self-citations: 0; by others: 2)
The mining of classification rules is an important field in data mining, and the decision table of rough set theory is an efficient tool for it. This paper introduces the elementary concepts of rough set theory that relate to decision tables. It presents a new algorithm for mining classification rules based on decision tables, along with a discernibility function for attribute value reduction and a new principle for rule accuracy. An application to a car classification problem is included, and the accuracy of the discovered rules is analyzed. Potential applications in data mining are also discussed.

2.
An Overview of Data Mining and Knowledge Discovery   Cited by: 9 (self-citations: 0; by others: 9)
With massive amounts of data stored in databases, mining information and knowledge from databases has become an important issue in recent research. Researchers in many different fields have shown great interest in data mining and knowledge discovery in databases. Several emerging applications in information-providing services, such as data warehousing and online services over the Internet, also call for various data mining and knowledge discovery techniques to understand user behavior better, to improve the service provided, and to increase business opportunities. In response to this demand, this article provides a comprehensive survey of recently developed data mining and knowledge discovery techniques and introduces some real application systems. In conclusion, it also lists some problems and challenges for further research.

3.
An encoding method has a direct effect on the quality and the representation of the knowledge discovered by data mining systems. Biological macromolecules are encoded as strings of characters, called primary structures. Because data mining systems usually encode data in relational tables, these strings have to be re-encoded and transformed into relational tables. In this paper, we compare the existing static encoding methods, which are based on biologists' know-how, with our new dynamic encoding method, which is based on the construction of Discriminant and Minimal Substrings (DMS). Different classification methods are used in this study. The experimental results show that our dynamic encoding method is more efficient than the static ones for encoding biological macromolecules from a data mining perspective.

4.
Semistructured data are specified without any fixed and rigid schema, even though some implicit structure typically appears in the data. The huge number of online applications makes it important and imperative to mine the schema of semistructured data, both for users (e.g., to gather useful information and facilitate querying) and for systems (e.g., to optimize access). The critical problem is to discover the hidden structure in the semistructured data. Current methods for extracting Web data structure are either general and independent of the application background, or bound to a concrete environment such as HTML or XML. Both are expensive and have difficulty keeping up with the frequent and complicated changes in Web data. This paper discusses the problem of incrementally mining the schema of semistructured data after the raw data are updated. An algorithm for incrementally mining the schema of semistructured data is provided, and experimental results show that incremental mining of semistructured data is more efficient than non-incremental mining.

5.
Mining streaming data is a hot topic in data mining. When performing classification on data streams, traditional classification algorithms based on decision trees, such as ID3 and C4.5, are relatively inefficient in both time and space because of the characteristics of streaming data. Random decision trees offer some advantages in time and space. This paper proposes SRMTDS (Semi-Random Multiple decision Trees for Data Streams), an incremental algorithm for mining data streams based on random decision trees. SRMTDS uses the Hoeffding bound inequality to choose the minimum number of split examples, a heuristic method to compute the information gain for obtaining the split thresholds of numerical attributes, and a Naive Bayes classifier to estimate the class labels of tree leaves. Our extensive experimental study shows that SRMTDS improves on VFDTc, a state-of-the-art decision-tree algorithm for classifying data streams, in time, space, accuracy, and noise tolerance.
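The abstract above does not say exactly how SRMTDS applies the Hoeffding bound, so the following is only a minimal sketch of the generic inequality as it is commonly used in stream decision trees. The function names `hoeffding_epsilon` and `min_examples` are hypothetical, and so are the parameter choices (value range 1.0, confidence parameter `delta`); none of them come from the paper.

```python
import math


def hoeffding_epsilon(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: with probability at least 1 - delta, the mean of n
    independent observations of a variable with the given range lies within
    epsilon of its true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def min_examples(value_range: float, delta: float, epsilon: float) -> int:
    """Smallest n for which the Hoeffding bound is at most epsilon
    (the bound solved for n, rounded up)."""
    return math.ceil(value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2))


if __name__ == "__main__":
    # Illustrative values only: a two-class information gain has range
    # log2(2) = 1, and delta = 1e-7 is an assumed confidence setting.
    n = min_examples(value_range=1.0, delta=1e-7, epsilon=0.05)
    print(n, hoeffding_epsilon(1.0, 1e-7, n))
```

In a stream setting, a leaf would typically wait until it has seen about `min_examples(...)` examples before comparing candidate splits, since only then is the observed difference between two split scores reliable to within `epsilon`.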

6.
Main Problems and Countermeasures in the Development of Web Usage Mining Systems   Cited by: 6 (self-citations: 0; by others: 6)
张锋, 常会友. 《计算机科学》 (Computer Science), 2003, 30(6): 129-132
With the rapid development of the WWW, Web Usage Mining, like Web Mining in general, has become a hot direction in academic and industrial circles. It is generally believed that Web Usage Mining comprises three tasks: preprocessing, knowledge discovery, and pattern analysis. Although Web Usage Mining still falls within the application of traditional data mining techniques, changes in the application environment and in the data being processed have given rise to new difficulties. This paper addresses these challenges in each of the three phases and introduces some proposed solutions.

7.
Ji Rong Li. 《通讯和计算机》, 2013, (5): 720-723
Optimal fuzzy-valued feature subset selection is a technique for selecting subsets of fuzzy-valued features. By viewing imprecise feature values as fuzzy sets, the information they contain is not lost, unlike with traditional methods. The performance of classification depends directly on the quality of the training corpus. In practical applications, noisy examples in the training corpus are unavoidable and degrade the classification approach. This paper presents an algorithm for eliminating class noise based on an analysis of the representative class information of the examples. The representative class information can be acquired by mining the greatest classification ambiguity of feature values. The proposed algorithm is applied to fuzzy decision tree induction. The experimental results show that the algorithm can effectively reduce the introduction of noisy examples and raise classification accuracy on data sets with a high noise ratio.

8.
Mining frequent patterns from datasets is one of the key successes of data mining research. Currently, most studies focus on data sets whose elements are independent, such as the items in a market basket. However, objects in the real world often have close relationships with each other. The objective of this paper is to extract frequent patterns from these relations. The authors use graphs to model the relations and select a simple type of graph for analysis. By combining graph theory with algorithms for generating frequent patterns, a new algorithm called Topology, which can mine these graphs efficiently, is proposed. The performance of the algorithm is evaluated through experiments on synthetic datasets and real data. The experimental results show that Topology performs well. Potential improvements are discussed at the end of the paper.

9.
Research Progress in Temporal Data Mining   Cited by: 12 (self-citations: 0; by others: 12)
Temporal data mining is one of the important branches of data mining. In this paper, drawing on the existing literature, we first systematically classify current research on temporal data mining. Next, we summarize and analyze the main branches. Finally, problems in current research on temporal data mining are pointed out and solutions are proposed.

10.
Classification is an important technique in data mining. The decision trees built by most existing classification algorithms commonly feature over-branching, which leads to poor efficiency in the subsequent classification phase. In this paper, we present a new value-oriented classification method, based on the concepts of frequent-pattern-node and exceptive-child-node, which aims to build accurate, properly sized decision trees while reducing over-branching as much as possible. The experiments show that, when relevance analysis is used as pre-processing, our classification method can eliminate over-branching in decision trees more effectively and efficiently than other algorithms, without loss of accuracy.

11.
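As data mining technology matures, its role in everyday life is becoming increasingly important. This paper first introduces data mining, cluster analysis, and classification analysis, and then applies hierarchical clustering to classification rule mining.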

12.
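Genetic algorithms are an important class of classification mining algorithms in data mining, but a simple genetic algorithm is highly random, has a high error rate, and struggles to meet the needs of data mining. To address this, a classification mining algorithm based on genetic algorithms and Apriori, called GAA, is proposed. The encoding design, fitness function, and genetic operator design are discussed and analyzed, and the algorithm is applied to a concrete example. The results show that, with a small number of generations, the algorithm effectively improves classification accuracy and has practical value.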

13.
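Classification mining techniques are used to advance secondary vocational teaching. The paper first discusses the current state and characteristics of secondary vocational teaching, then introduces the concept of classification mining and related techniques, and describes the decision tree classification mining algorithm. An example shows how to use classification mining tools to mine information. Finally, classification mining is applied to find useful information in a database, helping teachers understand their students fully and adjust their teaching strategies to students' characteristics, with the aim of raising teaching quality.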

14.
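To date, data mining and knowledge discovery software no longer stops at implementing the single function of "mining". It has extended to the whole process of data mining and knowledge discovery, including data preprocessing, data mining, model evaluation, and visualization, and it has added data visualization and visualization of the mining process on top of plain model visualization. The paper mainly discusses data mining methods and visualization techniques and points out future research directions.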

15.
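Data mining is a technology dedicated to analyzing and understanding data and to revealing the knowledge hidden within it. Because databases hold large amounts of data, discovering useful information in them is very important. Research on data mining at home and abroad has achieved many notable results and has been applied successfully in many fields, but its application in education is not widespread. The paper explores the application of data mining classification techniques in university teaching, proposes an implementation plan for applying data mining in university teaching, and illustrates the plan using the analysis of student grades as an example.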

16.
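Application of the Decision Tree Method to Customer Analysis in Coal Logistics   Cited by: 1 (self-citations: 0; by others: 1)
Logistics companies have accumulated large amounts of historical customer data. To use these data effectively, applying data mining methods to classify customers for management and service is a very important aspect of CRM. Decision trees are a common method for classification analysis and data mining. The paper studies the use of the C4.5 algorithm to build a decision tree from coal logistics customer information and applies the extracted rules to customer relationship management at a road coal logistics company. The results show good practical value.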

17.
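邓正宏, 张阳. 《计算机工程与设计》 (Computer Engineering and Design), 2007, 28(6): 1292-1293, 1323
Classification analysis is a key data mining technique, but traditional classification algorithms have many shortcomings when processing intrusion detection data. Using the DRC-BK algorithm to classify intrusion data yields good classification accuracy, and the resulting classification rules can be understood by human experts. This helps in formulating intrusion prevention measures and makes the approach well suited to secondary mining of intrusion data.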

18.
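Data mining is an emerging discipline that has arisen in recent years with the maturing of database technology and new developments in computer storage. This paper discusses the concepts of data mining, its important role in society, and its main functions and implementation techniques at the present stage. It focuses on analyzing core functions such as association rules, classification, and clustering, and the techniques that implement them.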

19.
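An Evaluation of Current Data Mining Algorithms   Cited by: 13 (self-citations: 2; by others: 11)
The paper first discusses evaluation criteria for data mining algorithms, then evaluates current classification algorithms using data envelopment analysis (数据封装分析), and, based on experimental results, evaluates current association rule mining algorithms.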

20.
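A data mining model built from SAS data mining software and SQL Server 2000 is used to perform cluster analysis on reader data. The TREE procedure displays the clustering results as a dendrogram, yielding a classification of reader groups that serves as the basis for providing personalized services to readers.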
