首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 328 毫秒
1.
Current inductive machine learning algorithms typically use greedy search with limited lookahead. This prevents them to detect significant conditional dependencies between the attributes that describe training objects. Instead of myopic impurity functions and lookahead, we propose to use RELIEFF, an extension of RELIEF developed by Kira and Rendell [10, 11], for heuristic guidance of inductive learning algorithms. We have reimplemented Assistant, a system for top down induction of decision trees, using RELIEFF as an estimator of attributes at each selection step. The algorithm is tested on several artificial and several real world problems and the results are compared with some other well known machine learning algorithms. Excellent results on artificial data sets and two real world problems show the advantage of the presented approach to inductive learning.  相似文献   

2.
This paper presents results comparing three simple inductive learning systems using different representations for concepts, namely: CNF formulae, DNF formulae, and decision trees. The CNF learner performs surprisingly well. Results on five natural data sets indicates that it frequently trains faster and produces more accurate and simpler concepts.  相似文献   

3.
A class of decision diagrams for representation of the normal forms of Boolean functions was introduced. Consideration was given, in particular, to the disjunctive diagrams representing the disjunctive normal forms (DNF). In distinction to the binary decision diagrams (BDD) and reduced ordered binary decision diagram (ROBDD), the disjunctive diagram representing an arbitrary DNF is constructed in a time which is polynomial of the size of the DNF binary code. Corresponding algorithms were described, and the results were presented of the computer-aided experiments where the proposed diagrams were used to reduce the information content accumulated in the course of deciding hard variants of Boolean satisfiability problem (SAT).  相似文献   

4.
As we know, learning in real world is interactive, incremental and dynamical in multiple dimensions, where new data could be appeared at anytime from anywhere and of any type. Therefore, incremental learning is of more and more importance in real world data mining scenarios. Decision trees, due to their characteristics, have been widely used for incremental learning. In this paper, we propose a novel incremental decision tree algorithm based on rough set theory. To improve the computation efficiency of our algorithm, when a new instance arrives, according to the given decision tree adaptation strategies, the algorithm will only modify some existing leaf node in the currently active decision tree or add a new leaf node to the tree, which can avoid the high time complexity of the traditional incremental methods for rebuilding decision trees too many times. Moreover, the rough set based attribute reduction method is used to filter out the redundant attributes from the original set of attributes. And we adopt the two basic notions of rough sets: significance of attributes and dependency of attributes, as the heuristic information for the selection of splitting attributes. Finally, we apply the proposed algorithm to intrusion detection. The experimental results demonstrate that our algorithm can provide competitive solutions to incremental learning.  相似文献   

5.
Greedy approaches suffer from a restricted search space which could lead to suboptimal classifiers in terms of performance and classifier size. This study discusses exhaustive search as an alternative to greedy search for learning short and accurate decision rules. The Exhaustive Procedure for LOgic-Rule Extraction (EXPLORE) algorithm is presented, to induce decision rules in disjunctive normal form (DNF) in a systematic and efficient manner. We propose a method based on subsumption to reduce the number of values considered for instantiation in the literals, by taking into account the relational operator without loss of performance. Furthermore, we describe a branch-and-bound approach that makes optimal use of user-defined performance constraints. To improve the generalizability we use a validation set to determine the optimal length of the DNF rule. The performance and size of the DNF rules induced by EXPLORE are compared to those of eight well-known rule learners. Our results show that an exhaustive approach to rule learning in DNF results in significantly smaller classifiers than those of the other rule learners, while securing comparable or even better performance. Clearly, exhaustive search is computer-intensive and may not always be feasible. Nevertheless, based on this study, we believe that exhaustive search should be considered an alternative for greedy search in many problems.  相似文献   

6.
Boolean Feature Discovery in Empirical Learning   总被引:19,自引:7,他引:12  
  相似文献   

7.
We apply a DNA-based massively parallel exhaustive search to solving the computational learning problems of DNF (disjunctive normal form) Boolean formulae. Learning DNF formulae from examples is one of the most important open problems in computational learning theory and the problem of learning 3-term DNF formulae is known as intractable if RP NP. We propose new methods to encode any k-term DNF formula to a DNA strand, evaluate the encoded DNF formula for a truth-value assignment by using hybridization and primer extension with DNA polymerase, and find a consistent DNF formula with the given examples. By employing these methods, we show that the class of k-term DNF formulae (for any constant k) and the class of general DNF formulae are efficiently learnable on DNA computer.Second, in order for the DNA-based learning algorithm to be robust for errors in the data, we implement the weighted majority algorithm on DNA computers, called DNA-based majority algorithm via amplification (DNAMA), which take a strategy of ``amplifying' the consistent (correct) DNA strands. We show a theoretical analysis for the mistake bound of the DNA-based majority algorithm via amplification, and imply that the amplification to ``double the volumes' of the correct DNA strands in the test tube works well.  相似文献   

8.
9.
Recent work has looked at extending clustering algorithms with instance level must-link (ML) and cannot-link (CL) background information. Our work introduces δ and ε cluster level constraints that influence inter-cluster distances and cluster composition. The addition of background information, though useful at providing better clustering results, raises the important feasibility question: Given a collection of constraints and a set of data, does there exist at least one partition of the data set satisfying all the constraints? We study the complexity of the feasibility problem for each of the above constraints separately and also for combinations of constraints. Our results clearly delineate combinations of constraints for which the feasibility problem is computationally intractable (i.e., NP-complete) from those for which the problem is efficiently solvable (i.e., in the computational class P). We also consider the ML and CL constraints in conjunctive and disjunctive normal forms (CNF and DNF respectively). We show that for ML constraints, the feasibility problem is intractable for CNF but efficiently solvable for DNF. Unfortunately, for CL constraints, the feasibility problem is intractable for both CNF and DNF. This effectively means that CL-constraints in a non-trivial form cannot be efficiently incorporated into clustering algorithms. To overcome this, we introduce the notion of a choice-set of constraints and prove that the feasibility problem for choice-sets is efficiently solvable for both ML and CL constraints. We also present empirical results which indicate that the feasibility problem occurs extensively in real world problems.  相似文献   

10.
Cybernetics studies information process in the context of interaction with physical systems. Because such information is sometimes vague and exhibits complex interactions; it can only be discerned using approximate representations. Machine learning provides solutions that create approximate models of information and decision trees are one of its main components. However, decision trees are susceptible to information overload and can get overly complex when a large amount of data is inputted in them. Granulation of decision tree remedies this problem by providing the essential structure of the decision tree, which can decrease its utility. To evaluate the relationship that exists between granulation and decision tree complexity, data uncertainty and prediction accuracy, the deficiencies obtained by nursing homes during annual inspections were taken as a case study. Using rough sets, three forms of granulation were performed: (1) attribute grouping, (2) removing insignificant attributes and (3) removing uncertain records. Attribute grouping significantly reduces tree complexity without having any strong effect upon data consistency and accuracy. On the other hand, removing insignificant features decrease data consistency and tree complexity, while increasing the error in prediction. Finally, decrease in the uncertainty of the dataset results in an increase in accuracy and has no impact on tree complexity.  相似文献   

11.
While many constructive induction algorithms focus on generating new binary attributes, this paper explores novel methods of constructing nominal and numeric attributes. We propose a new constructive operator, X-of-N. An X-of-N representation is a set containing one or more attribute-value pairs. For a given instance, the value of an X-of-N representation corresponds to the number of its attribute-value pairs that are true of the instance. A single X-of-N representation can directly and simply represent any concept that can be represented by a single conjunctive, a single disjunctive, or a single M-of-N representation commonly used for constructive induction, and the reverse is not true. In this paper, we describe a constructive decision tree learning algorithm, called XofN. When building decision trees, this algorithm creates one X-of-N representation, either as a nominal attribute or as a numeric attribute, at each decision node. The construction of X-of-N representations is carried out by greedily searching the space defined by all the attribute-value pairs of a domain. Experimental results reveal that constructing X-of-N attributes can significantly improve the performance of decision tree learning in both artificial and natural domains in terms of higher prediction accuracy and lower theory complexity. The results also show the performance advantages of constructing X-of-N attributes over constructing conjunctive, disjunctive, or M-of-N representations for decision tree learning.  相似文献   

12.
Most of the methods that generate decision trees for a specific problem use the examples of data instances in the decision tree–generation process. This article proposes a method called RBDT‐1—rule‐based decision tree—for learning a decision tree from a set of decision rules that cover the data instances rather than from the data instances themselves. The goal is to create on demand a short and accurate decision tree from a stable or dynamically changing set of rules. The rules could be generated by an expert, by an inductive rule learning program that induces decision rules from the examples of decision instances such as AQ‐type rule induction programs, or extracted from a tree generated by another method, such as the ID3 or C4.5. In terms of tree complexity (number of nodes and leaves in the decision tree), RBDT‐1 compares favorably with AQDT‐1 and AQDT‐2, which are methods that create decision trees from rules. RBDT‐1 also compares favorably with ID3 while it is as effective as C4.5 where both (ID3 and C4.5) are well‐known methods that generate decision trees from data examples. Experiments show that the classification accuracies of the decision trees produced by all methods under comparison are indistinguishable.  相似文献   

13.
Aizenstein  Howard  Pitt  Leonard 《Machine Learning》1995,19(3):183-208
We present two related results about the learnability of disjunctive normal form (DNF) formulas. First we show that a common approach for learning arbitrary DNF formulas requires exponential time. We then contrast this with a polynomial time algorithm for learning most (rather than all) DNF formulas. A natural approach for learning boolean functions involves greedily collecting the prime implicants of the hidden function. In a seminal paper of learning theory, Valiant demonstrated the efficacy of this approach for learning monotone DNF, and suggested this approach for learning DNF. Here we show that no algorithm using such an approach can learn DNF in polynomial time. We show this by constructing a counterexample DNF formula which would force such an algorithm to take exponential time. This counterexample seems to capture much of what makes DNF hard to learn, and thus is useful to consider when evaluating the run-time of a proposed DNF learning algorithm. This hardness result, as well as other hardness results for learning DNF, relies on the construction of particular hard-to-learn formulas, formulas that appear to be relatively rare. This raises the question of whether most DNF formulas are learnable. For certain natural definitions of most DNF formulas, we answer this question affirmatively.  相似文献   

14.
大数据智能决策   总被引:7,自引:2,他引:5  
在全球信息化快速发展的背景下, 大数据已经成为一种战略资源.各行各业的决策活动在频度、广度及复杂性上较以往有着本质的不同.决策过程中的不确定性因素增多, 决策分析的难度不断加大.传统的数据分析方法以及基于人工经验的决策已难以满足大数据时代的决策需求, 大数据驱动的智能决策将成为决策研究的主旋律.该文结合大数据特性, 对大数据决策的特点进行了归纳, 并从智能决策支持系统、不确定性处理、信息融合、关联分析和增量分析等方面综述了大数据智能决策的研究与发展现状, 讨论了大数据智能决策依然面临的挑战, 并对一些潜在的研究方向进行了展望分析.  相似文献   

15.
黄艳  任苗苗  魏玲 《计算机科学》2012,39(1):193-197
概念格是一种潜力极大的有效的知识发现工具,现已被广泛应用于计算机网络、数据挖掘等领域。针对现实生活中信息的不确定性,定义了区间值决策形式背景;通过讨论条件区间形式背景与决策区间形式背景概念格之间的关系,研究了区间值决策形式背景的协调性,进一步研究了属性值向量约简,使得原背景在属性及属性区间值两个方面得到简化。  相似文献   

16.
Air traffic controllers are responsible for the safe, expeditious and orderly flow of the air traffic. Their training relies heavily on the use of simulators that can represent various normal and emergency situations. Accurate classification of air traffic scenarios can provide assistance towards a better understanding of how controllers respond to the complexity of a traffic scenario. To this end, we conducted a field study using qualified air traffic controllers, who participated in simulator sessions of terminal radar approach control in a variety of scenarios. The aim of the study was twofold, firstly to explore how decision trees and classification rules can be used for realistic classification of air traffic scenarios and secondly to explore which factors reflect better operational complexity. We applied machine learning methods to the data and developed decision trees and classification rules for these scenarios. Results indicated that decision trees and classification rules are useful tools in accurately categorizing scenarios and that complexity requires a larger set of predictors beyond simple aircraft counts. The derived decision trees and classification rules performed well in prediction, stability and interpretability. Practical benefits can be derived in the areas of operations and system design in the context of air traffic flow and capacity management systems.  相似文献   

17.
Using AUC and accuracy in evaluating learning algorithms   总被引:14,自引:0,他引:14  
The area under the ROC (receiver operating characteristics) curve, or simply AUC, has been traditionally used in medical diagnosis since the 1970s. It has recently been proposed as an alternative single-number measure for evaluating the predictive ability of learning algorithms. However, no formal arguments were given as to why AUC should be preferred over accuracy. We establish formal criteria for comparing two different measures for learning algorithms and we show theoretically and empirically that AUC is a better measure (defined precisely) than accuracy. We then reevaluate well-established claims in machine learning based on accuracy using AUC and obtain interesting and surprising new results. For example, it has been well-established and accepted that Naive Bayes and decision trees are very similar in predictive accuracy. We show, however, that Naive Bayes is significantly better than decision trees in AUC. The conclusions drawn in this paper may make a significant impact on machine learning and data mining applications.  相似文献   

18.
Machine Learning on the Basis of Formal Concept Analysis   总被引:12,自引:0,他引:12  
A model of machine learning from positive and negative examples (JSM-learning) is described in terms of Formal Concept Analysis (FCA). Graph-theoretical and lattice-theoretical interpretations of hypotheses and classifications resulting in the learning are proposed. Hypotheses and classifications are compared with other objects from domains of data analysis and artificial intelligence: implications in FCA, functional dependencies in the theory of relational data bases, abduction models, version spaces, and decision trees. Results about algorithmic complexity of various problems related to the generation of formal concepts, hypotheses, classifications, and implications.  相似文献   

19.
Today, organizations try to decline academically expenses using humans and resources in addition to rising managers and operators' satisfaction. Meantime, a very important step in the process of decision is the assignment of human resources, particularly in connection with research and development (R&D) projects in which the system is highly dependent on the capabilities of human resources. In this study, we tried all the assumptions that come true in the real world, considered a model for applied R&D projects to reduce costs and increase the efficiency of projects. Therefore, an integrated multiproject scheduling and multiskill human resource assignment model under uncertainty has developed for R&D projects. Furthermore, it is assumed that the activity processing time is related to human resources assignment that means the learning effect is considered. To demonstrate the proposed model efficiency, the various dimensions instance problem was solved accurately and efficiently in GAMS software, and the results have been reported. In addition, the proposed model is validated through the input parameter sensitivity analysis. The results indicate a suitable performance of the proposed fuzzy mathematical programming model is due to the complexity of the problem.  相似文献   

20.
We describe and discuss the properties of a binary neural network that can serve as a dynamic neural filter (DNF), which maps regions of input space into spatiotemporal sequences of neuronal activity. Both deterministic and stochastic dynamics are studied, allowing the investigation of the stability of spatiotemporal sequences under noisy conditions. We define a measure of the coding capacity of a DNF and develop an algorithm for constructing a DNF that can serve as a source of given codes. On the basis of this algorithm, we suggest using a minimal DNF capable of generating observed sequences as a measure of complexity of spatiotemporal data. This measure is applied to experimental observations in the locust olfactory system, whose reverberating local field potential provides a natural temporal scale allowing the use of a binary DNF. For random synaptic matrices, a DNF can generate very large cycles, thus becoming an efficient tool for producing spatiotemporal codes. The latter can be stabilized by applying to the parameters of the DNF a learning algorithm with suitable margins.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号