Similar Literature
20 similar documents found (search time: 125 ms)
1.
Extensive research has been performed on developing knowledge-based intelligent monitoring systems for improving the reliability of manufacturing processes. Because obtaining knowledge from human experts is expensive, new techniques are needed to extract the knowledge automatically from collected data using data mining. Inductive learning has become one of the most widely used data mining methods for generating decision rules from data. To deal with the noise and uncertainty present in data collected from industrial processes and systems, this paper presents a new method that uses fuzzy logic techniques to improve the performance of the classical inductive learning approach. In contrast to the classical inductive learning method, which uses hard cut points to discretize continuous-valued attributes, the proposed approach uses soft discretization to make the system less sensitive to uncertainty and noise. The effectiveness of the proposed approach is illustrated in an application to monitoring machining conditions in an uncertain environment. Experimental results show that the new fuzzy inductive learning method gives improved accuracy compared with classical inductive learning techniques.
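The abstract does not give the paper's membership functions, so the hard-cut versus soft-cut contrast can only be sketched. A minimal illustration, assuming a linear fuzzy membership over a transition band around the cut point (both function names and the band shape are hypothetical):

```python
def hard_discretize(x, cut):
    """Classical hard cut point: the value belongs entirely to one interval."""
    return (0.0, 1.0) if x >= cut else (1.0, 0.0)

def soft_discretize(x, cut, width):
    """Soft cut: fuzzy memberships in ("low", "high") that overlap inside
    a transition band of the given width centred on the cut point, so a
    small noisy shift near the cut changes the memberships only slightly."""
    lo, hi = cut - width / 2.0, cut + width / 2.0
    if x <= lo:
        return (1.0, 0.0)
    if x >= hi:
        return (0.0, 1.0)
    mu_high = (x - lo) / (hi - lo)   # rises linearly across the band
    return (1.0 - mu_high, mu_high)
```

Under a hard cut at 5.0, a measurement of 5.05 falls entirely in the upper interval; under the soft cut it belongs partially to both, which is what reduces sensitivity to noise near the boundary.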

2.
This paper aims at designing better-performing feature-projection based classification algorithms and presents two new such algorithms. Both are batch supervised learning algorithms that represent the induced classification knowledge as feature intervals. In both algorithms, each feature participates in classification by giving real-valued votes to the classes, and the prediction for an unseen example is the class receiving the highest vote. The first algorithm, OFP.MC, learns on each feature pairwise disjoint intervals that minimize the feature classification error. The second algorithm, GFP.MC, constructs feature intervals by greedily improving the feature classification error. The new algorithms are empirically evaluated on twenty datasets from the UCI repository and compared with the existing feature-projection based classification algorithms (FIL.IF, VFI5, CFP, k-NNFP, and NBC). The experiments demonstrate that OFP.MC outperforms the other feature-projection based classification algorithms. GFP.MC is slightly inferior to OFP.MC but, when used on datasets with a large number of instances, it reduces the space requirement of OFP.MC. Unlike the other feature-projection based classification algorithms considered here, the new algorithms are insensitive to boundary noise.
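The feature-interval voting scheme described above can be sketched independently of how the intervals are built. The sketch below uses a deliberately naive interval construction (one [min, max] interval per class per feature), not OFP.MC's error-minimizing construction; only the voting mechanism is the point:

```python
from collections import defaultdict

def fit_intervals(X, y):
    """For each feature, record one [min, max] interval per class.
    (A much simpler construction than OFP.MC's disjoint, error-minimizing
    intervals -- used here only to illustrate the projection idea.)"""
    intervals = defaultdict(dict)          # intervals[f][c] = (lo, hi)
    for f in range(len(X[0])):
        for c in set(y):
            vals = [x[f] for x, label in zip(X, y) if label == c]
            intervals[f][c] = (min(vals), max(vals))
    return intervals

def predict(intervals, x):
    """Each feature votes for every class whose interval contains its value;
    the prediction is the class with the highest total vote."""
    votes = defaultdict(float)
    for f, per_class in intervals.items():
        for c, (lo, hi) in per_class.items():
            if lo <= x[f] <= hi:
                votes[c] += 1.0
    return max(votes, key=votes.get)
```

In the papers' algorithms the votes are real-valued and the intervals are tuned per feature; here each in-interval feature simply contributes one vote.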

3.
A Comparative Study of Fuzzy and Crisp Decision Tree Algorithms
ID3 is a classical decision tree induction algorithm. Assuming that the attribute values and class labels of the examples are crisp, it uses information entropy as a heuristic to build a crisp decision tree. To cope with the uncertainty present in the real world, another decision tree induction algorithm, the fuzzy decision tree algorithm, has been proposed as a generalization of the crisp decision tree algorithm. In practice each algorithm has its own strengths and weaknesses, and there is as yet no clear criterion for deciding which one to choose for a given knowledge acquisition task. This paper compares the two algorithms in detail from five aspects, pointing out their similarities, differences, and respective advantages and disadvantages when attributes are continuous-valued, with the aim of providing useful guidance on how to choose between them for a specific problem.
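Entry 3 contrasts fuzzy trees with ID3's crisp trees; ID3's information-entropy heuristic itself is standard and can be sketched directly (the function names are illustrative; examples are assumed to be dicts of discrete attribute values):

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attr):
    """ID3's heuristic: the reduction in entropy obtained by splitting
    the examples on a discrete-valued attribute."""
    n = len(labels)
    remainder = 0.0
    for v in set(e[attr] for e in examples):
        subset = [l for e, l in zip(examples, labels) if e[attr] == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder
```

A fuzzy decision tree generalizes this by letting each example belong to a branch with a membership degree in [0, 1] rather than crisply.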

4.
The basic nearest-neighbor rule generalizes well in many domains but has several shortcomings, including inappropriate distance functions, large storage requirements, slow execution time, sensitivity to noise, and an inability to adjust its decision boundaries after storing the training data. This paper proposes methods for overcoming each of these weaknesses and combines the methods into a comprehensive learning system called the Integrated Decremental Instance-Based Learning Algorithm (IDIBL) that seeks to reduce storage, improve execution speed, and increase generalization accuracy, when compared to the basic nearest neighbor algorithm and other learning models. IDIBL tunes its own parameters using a new measure of fitness that combines confidence and cross-validation accuracy in order to avoid discretization problems with more traditional leave-one-out cross-validation. In our experiments IDIBL achieves higher generalization accuracy than other less comprehensive instance-based learning algorithms, while requiring less than one-fourth the storage of the nearest neighbor algorithm and improving execution speed by a corresponding factor. In experiments on twenty-one data sets, IDIBL also achieves higher generalization accuracy than that reported for sixteen major machine learning and neural network models.
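The decremental idea behind IDIBL's storage reduction can be sketched without IDIBL's confidence-based fitness measure: greedily drop instances whose removal does not lower leave-one-out 1-NN accuracy. This is a minimal sketch of that idea, not the paper's algorithm; the function names and the acceptance criterion are assumptions:

```python
def nn_predict(train, query):
    """1-NN prediction with squared Euclidean distance."""
    return min(train, key=lambda p: sum((a - b) ** 2
               for a, b in zip(p[0], query)))[1]

def loo_accuracy(train):
    """Leave-one-out accuracy of 1-NN over the training set."""
    hits = 0
    for i, (x, y) in enumerate(train):
        rest = train[:i] + train[i + 1:]
        hits += nn_predict(rest, x) == y
    return hits / len(train)

def decremental_reduce(train):
    """Greedily drop instances whose removal does not lower the
    leave-one-out accuracy (the decremental idea, not IDIBL itself)."""
    kept = list(train)
    for inst in list(train):
        trial = [p for p in kept if p is not inst]
        if len(trial) > 1 and loo_accuracy(trial) >= loo_accuracy(kept):
            kept = trial
    return kept
```

On well-separated clusters this typically keeps only the instances needed to preserve the decision boundary, which is the source of the storage savings the abstract reports.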

5.
Machine learning techniques often have to deal with noisy data, which may affect the accuracy of the resulting data models. Therefore, effectively dealing with noise is a key aspect in supervised learning to obtain reliable models from data. Although several authors have studied the effect of noise for some particular learners, comparisons of its effect among different learners are lacking. In this paper, we address this issue by systematically comparing how different degrees of noise affect four supervised learners that belong to different paradigms. Specifically, we consider the Naïve Bayes probabilistic classifier, the C4.5 decision tree, the IBk instance-based learner and the SMO support vector machine. We have selected four methods which enable us to contrast different learning paradigms, and which are considered to be four of the top ten algorithms in data mining (Yu et al. 2007). We test them on a collection of data sets that are perturbed with noise in the input attributes and noise in the output class. As an initial hypothesis, we assign the techniques to two groups, NB with C4.5 and IBk with SMO, based on their proposed sensitivity to noise, the first group being the least sensitive. The analysis enables us to extract key observations about the effect of different types and degrees of noise on these learning techniques. In general, we find that Naïve Bayes appears as the most robust algorithm, and SMO the least, relative to the other two techniques. However, we find that the underlying empirical behavior of the techniques is more complex, and varies depending on the noise type and the specific data set being processed. In general, noise in the training data set is found to give the most difficulty to the learners.
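The two perturbation types the study uses (noise in input attributes, noise in the output class) can be sketched as a simple injection routine. This is a hypothetical noise scheme for illustration; the paper's exact perturbation protocol may differ:

```python
import random

def add_noise(X, y, attr_sigma=0.0, label_rate=0.0, seed=0):
    """Perturb inputs with Gaussian attribute noise and flip a fraction
    of class labels uniformly at random to a different class."""
    rng = random.Random(seed)
    classes = sorted(set(y))
    Xn = [[v + rng.gauss(0.0, attr_sigma) for v in row] for row in X]
    yn = []
    for label in y:
        if rng.random() < label_rate:
            yn.append(rng.choice([c for c in classes if c != label]))
        else:
            yn.append(label)
    return Xn, yn
```

Sweeping `attr_sigma` and `label_rate` over a grid and retraining each learner at every level is the kind of systematic comparison the abstract describes.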

6.
It is well-known that heuristic search in ILP is prone to plateau phenomena. An explanation can be given after the work of Giordana and Saitta: the ILP covering test is NP-complete and therefore exhibits a sharp phase transition in its coverage probability. As the heuristic value of a hypothesis depends on the number of covered examples, the regions “yes” and “no” represent plateaus that need to be crossed during search without an informative heuristic value. Several subsequent works have extensively studied this finding by running several learning algorithms on a large set of artificially generated problems and argued that the occurrence of this phase transition dooms every learning algorithm to fail to identify the target concept. We note however that only generate-and-test learning algorithms have been applied and that this conclusion has to be qualified in the case of data-driven learning algorithms. Mostly building on the pioneering work of Winston on near-miss examples, we show that, on the same set of problems, a top-down data-driven strategy can cross any plateau if near-misses are supplied in the training set, whereas they do not change the plateau profile and do not guide a generate-and-test strategy. We conclude that the location of the target concept with respect to the phase transition alone is not a reliable indication of the learning problem difficulty as previously thought. Editors: Stephen Muggleton, Ramon Otero, Simon Colton.

7.
Research on a Heuristic Knowledge Acquisition Method
Inductive learning is an effective approach to automatic knowledge acquisition. To address problems with the ID3 algorithm, rough-set-based inductive learning, and several other inductive learning methods, a new inductive learning algorithm, ITIL, is proposed. Using information gain as its heuristic, the algorithm selects as few important attributes (or attribute combinations) as possible and extracts rules on the basis of discernibility. Many examples show that the resulting rules are not only simple but also contain little redundancy. As part of a knowledge acquisition module, ITIL has been integrated into the dynamic knowledge-base subsystem of a knowledge-discovery-based medical diagnosis support system.

8.

This paper describes two new suboptimal mask search algorithms for fuzzy inductive reasoning (FIR), a technique for modelling dynamic systems from observations of their input/output behaviour. Inductive modelling is by its very nature an optimisation problem. Modelling large-scale systems in this fashion involves solving a high-dimensional optimisation problem, a task that invariably carries a high computational cost. Suboptimal search algorithms are therefore important. One of the two proposed algorithms is a new variant of a directed hill-climbing method. The other algorithm is a statistical technique based on spectral coherence functions. The utility of the two techniques is demonstrated by means of an industrial example. A garbage incinerator process is inductively modelled from observations of 20 variable trajectories. Both suboptimal search algorithms lead to similarly good models. Each of the algorithms carries a computational cost that is in the order of a few percent of the cost of solving the complete optimisation problem. Both algorithms can also be used to filter out variables of lesser importance, i.e. they can be used as variable selection tools.

9.
10.
A major challenge to appearance-based learning techniques is robustness against data corruption and irrelevant within-class data variation. This paper presents a robust kernel for kernel-based approaches to achieve better robustness on several visual learning problems. Incorporating a robust error function used in robust statistics together with a deformation-invariant distance measure, the proposed kernel is shown to be insensitive to noise and robust to intra-class variations. We prove that this robust kernel satisfies the requirements for a valid kernel, so it has good properties when used with kernel-based learning machines. In the experiments, we validate the superior robustness of the proposed kernel over state-of-the-art algorithms on several applications, including hand-written digit classification, face recognition and data visualization.

11.
To some extent the problem of noise reduction in machine learning has been finessed by the development of learning techniques that are noise-tolerant. However, it is difficult to make instance-based learning noise-tolerant, and noise reduction still plays an important role in k-nearest neighbour classification. There are also other motivations for noise reduction: for instance, the elimination of noise may result in simpler models, or data cleansing may be an end in itself. In this paper we present a novel approach to noise reduction based on local Support Vector Machines (LSVM) which brings the benefits of maximal margin classifiers to bear on noise reduction. This provides a more robust alternative to the majority rule on which almost all the existing noise reduction techniques are based. Roughly speaking, for each training example an SVM is trained on its neighbourhood, and if the SVM classification for the central example disagrees with its actual class there is evidence in favour of removing it from the training set. We provide an empirical evaluation on 15 real datasets showing improved classification accuracy when using training data edited with our method, as well as specific experiments regarding the spam filtering application domain. We present a further evaluation on two artificial datasets where we analyse two different types of noise (Gaussian feature noise and mislabelling noise) and the influence of different class densities. The conclusion is that LSVM noise reduction is significantly better than the other analysed algorithms for real datasets and for artificial datasets perturbed by Gaussian noise and in the presence of uneven class densities.
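The editing rule in the paragraph above ("train a local classifier on each example's neighbourhood; flag the example if the prediction disagrees with its label") can be sketched with a tiny perceptron standing in for the local SVM. That substitution, the function names, and the use of squared Euclidean distance are assumptions made for a self-contained illustration:

```python
def train_perceptron(points, epochs=50):
    """Tiny linear classifier standing in for the local SVM.
    Labels must be +1 / -1; the last weight entry is the bias."""
    w = [0.0] * (len(points[0][0]) + 1)
    for _ in range(epochs):
        for x, y in points:
            score = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
            if y * score <= 0:                     # misclassified: update
                w = [wi + y * xi for wi, xi in zip(w, x)] + [w[-1] + y]
    return w

def edit_noise(data, k=3):
    """For each example, fit a local classifier on its k nearest
    neighbours; flag the example when the local prediction disagrees
    with its recorded label (the LSVM editing rule, with a perceptron
    in place of the SVM)."""
    flagged = []
    for i, (x, y) in enumerate(data):
        neigh = sorted((p for j, p in enumerate(data) if j != i),
                       key=lambda p: sum((a - b) ** 2
                                         for a, b in zip(p[0], x)))[:k]
        w = train_perceptron(neigh)
        score = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
        if y * score <= 0:
            flagged.append(i)
    return flagged
```

A mislabelled point sitting inside the opposite class's cluster gets a clean, separable neighbourhood that votes against it; a maximal-margin local model makes that vote more robust than the k-NN majority rule.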

12.
The application of machine learning (ML) techniques to metal-based nanomaterials has contributed greatly to understanding the interaction of nanoparticles, property prediction, and new materials discovery. However, the prediction accuracy and efficiency of different ML algorithms vary across metal-based nanomaterials problems. This, alongside the high dimensionality and nonlinearity of the available datasets in metal-based nanomaterials problems, makes it imperative to review recent advances in the implementation of ML techniques for these kinds of problems. In addition to clarifying the applicability of different ML algorithms to various kinds of metal-based nanomaterials problems, it is hoped that this work will facilitate understanding and promote interest in this emerging and less explored area of materials informatics. The scope of this review covers an introduction to metal-based nanomaterials, several techniques used to generate datasets for training ML models, feature engineering techniques used in nanomaterials-machine learning applications, and commonly applied ML algorithms. We then present recent advances in ML applications to metal-based nanomaterials, with emphasis on the procedure and efficiency of the algorithms used. In the concluding section, we identify the most common and efficient algorithms for distinctive property predictions and discuss the common problems encountered in ML applications for metal-based nanoinformatics. Finally, we propose suitable solutions and future outlooks for various challenges in metal-based nanoinformatics research.

13.
In Inductive Logic Programming (ILP), algorithms that are purely of the bottom-up or top-down type encounter several problems in practice. Since a majority of them are greedy, these algorithms stop when finding clauses in local optima, according to the “quality” measure used for evaluating the results. Moreover, when learning clauses one by one, the induced clauses become less and less interesting as the algorithm progresses to cover the few remaining examples. In this paper, we propose a simulated annealing framework to overcome these problems. Using a refinement operator, we define neighborhood relations on clauses and on hypotheses (i.e. sets of clauses). With these relations and appropriate quality measures, we show how to induce clauses (in a coverage approach), or to induce hypotheses directly, using simulated annealing algorithms. We discuss the conditions on the refinement operators and the evaluation measures that are necessary to increase the effectiveness of the algorithm. Implementations (including a parallelized version of the algorithm) are described, and experimental results concerning both the convergence and the accuracy of the method are presented.
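The generic simulated annealing loop the framework builds on can be sketched as follows. In the paper the states are clauses or hypotheses and the neighbour function is a refinement operator; here a toy one-dimensional search space stands in for both, and the acceptance rule exp(-delta/T) is the only part carried over faithfully:

```python
import math
import random

def simulated_annealing(initial, neighbour, quality,
                        t0=10.0, cooling=0.95, steps=500, seed=0):
    """Maximise `quality` by a random walk that always accepts improving
    moves and accepts worsening moves with probability exp(-delta / T),
    where T is gradually cooled."""
    rng = random.Random(seed)
    current, best = initial, initial
    t = t0
    for _ in range(steps):
        cand = neighbour(current, rng)
        delta = quality(current) - quality(cand)   # > 0 means cand is worse
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current = cand
        if quality(current) > quality(best):
            best = current
        t *= cooling
    return best

# Toy search space standing in for the clause lattice.
quality = lambda x: -(x - 7) ** 2
neighbour = lambda x, rng: x + rng.choice([-1, 1])
```

Accepting occasional worsening moves while the temperature is high is exactly what lets the search escape the local optima that stop greedy covering algorithms.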

14.
Hypothesis testing is a collective name for problems such as classification, detection, and pattern recognition. In this paper we propose two new classes of supervised learning algorithms for feedforward, binary-output neural network structures whose objective is hypothesis testing. All the algorithms are applications of stochastic approximation and are guaranteed to provide optimization with probability one. The first class of algorithms follows the Neyman-Pearson approach and maximizes the probability of detection, subject to a given false alarm constraint. These algorithms produce layer-by-layer optimal Neyman-Pearson designs. The second class of algorithms minimizes the probability of error and leads to layer-by-layer Bayes optimal designs. Deviating from the layer-by-layer optimization assumption, we propose more powerful learning techniques which unify, in some sense, the already existing algorithms. The proposed algorithms were implemented and tested on a simulated hypothesis testing problem. Backpropagation and perceptron learning were also included in the comparisons.

15.
We give the first polynomial time algorithm to learn any function of a constant number of halfspaces under the uniform distribution on the Boolean hypercube to within any constant error parameter. We also give the first quasipolynomial time algorithm for learning any Boolean function of a polylog number of polynomial-weight halfspaces under any distribution on the Boolean hypercube. As special cases of these results we obtain algorithms for learning intersections and thresholds of halfspaces. Our uniform distribution learning algorithms involve a novel non-geometric approach to learning halfspaces; we use Fourier techniques together with a careful analysis of the noise sensitivity of functions of halfspaces. Our algorithms for learning under any distribution use techniques from real approximation theory to construct low-degree polynomial threshold functions. Finally, we also observe that any function of a constant number of polynomial-weight halfspaces can be learned in polynomial time in the model of exact learning from membership and equivalence queries.

16.
The twin support vector machine, with its two nonparallel classifying hyperplanes, and its extensions have attracted much attention in machine learning and data mining. However, prediction accuracy may be strongly affected when noise is involved, and, in the least squares case in particular, an intractable computational burden may be incurred for large-scale data. To address these problems, we propose double-weighted least squares twin bounded support vector machines and develop online learning algorithms for them. By introducing the double-weighted mechanism, linear and nonlinear double-weighted learning models are proposed to reduce the influence of noise. The online learning algorithms for solving the two models avoid computing the inverse of large-scale matrices. Furthermore, a new pruning mechanism is developed for the nonlinear model that avoids updating the kernel matrices at every iteration step. Simulation results on three noisy UCI data sets demonstrate that the online learning algorithm for the linear double-weighted learning model achieves the lowest computation time while maintaining considerable classification accuracy. Simulation results on UCI data and noisy two-moons data demonstrate that the nonlinear double-weighted learning model can be solved effectively by the online learning algorithm with the pruning mechanism.

17.
Locally weighted learning (LWL) is a class of techniques from nonparametric statistics that provides useful representations and training algorithms for learning about complex phenomena during autonomous adaptive control of robotic systems. This paper introduces several LWL algorithms that have been tested successfully in real-time learning of complex robot tasks. We discuss two major classes of LWL: memory-based LWL, and purely incremental LWL that does not need to remember any data explicitly. In contrast to the traditional belief that LWL methods cannot work well in high-dimensional spaces, we provide new algorithms that have been tested on learning problems with up to 90 dimensions. The applicability of our LWL algorithms is demonstrated in various robot learning examples, including the learning of devil-sticking, pole-balancing by a humanoid robot arm, and inverse-dynamics learning for seven- and 30-degree-of-freedom robots. In all these examples, the application of our statistical neural network techniques allowed either faster or more accurate acquisition of motor control than classical control engineering.
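The memory-based flavour of LWL can be sketched in its simplest form: a kernel-weighted (Nadaraya-Watson) estimate, where a Gaussian kernel weights each stored training point by its distance to the query and the prediction is the weighted mean of the targets. This is a minimal one-dimensional sketch, not the receptive-field or incremental variants the paper develops:

```python
import math

def lwl_predict(query, xs, ys, bandwidth=1.0):
    """Locally weighted (Nadaraya-Watson) estimate: predictions are
    dominated by training points close to the query, so the model adapts
    locally without fitting any global parametric form."""
    weights = [math.exp(-((x - query) ** 2) / (2 * bandwidth ** 2))
               for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)
```

Shrinking the bandwidth makes the estimate follow nearby data more closely; this locality is what makes LWL suitable for incremental real-time updates, since new data only reshape the function near where they land.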

18.
A metric, also called a distance function, is a special function on a metric space satisfying certain conditions, and is generally used to capture the important distance relationships among data. Since distances strongly affect classification and clustering problems, metric learning plays an important role in these machine learning tasks. Affected by the various kinds of noise present in real data, existing metric learning algorithms often suffer from low and unstable classification accuracy. To address this problem, this paper proposes a robust metric learning algorithm based on the maximum correntropy criterion. The core of the maximum correntropy criterion is the Gaussian kernel function; we introduce it into metric learning by constructing a loss function built around the Gaussian kernel, optimizing it by gradient descent, and tuning the parameters through repeated testing to obtain the output metric matrix. The metric matrix learned in this way is more robust and effectively improves classification accuracy on problems affected by noise. Validation experiments are conducted on several common machine learning data sets (UCI) as well as on face data sets.
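The robustness mechanism in entry 18's loss (a Gaussian kernel at the core of the maximum correntropy criterion) can be seen by comparing it with squared error. A minimal sketch of the loss alone, leaving out the metric-matrix gradient descent; the function names and the default kernel width are assumptions:

```python
import math

def correntropy_loss(errors, sigma=1.0):
    """Maximum-correntropy objective: each error contributes
    1 - exp(-e^2 / (2 sigma^2)), so a large (noisy) error saturates
    near 1 instead of dominating the loss."""
    return sum(1.0 - math.exp(-e * e / (2 * sigma ** 2)) for e in errors)

def squared_loss(errors):
    """Classical squared error, where outliers dominate quadratically."""
    return sum(e * e for e in errors)
```

A single outlier error of 100 contributes about 1 to the correntropy loss but 10,000 to the squared loss, which is why gradient descent on the correntropy objective is far less perturbed by noisy examples.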

19.
Feature selection helps to increase the random diversity among the members of an ensemble classifier and thereby improve generalization accuracy. This paper studies two feature-selection-based ensemble construction algorithms, the Random Subspace method and Rotation Forest, and analyzes the relationship between their feature-selection schemes and the degree of random diversity. By injecting noise into UCI data sets, the classification accuracy of the two methods in noisy environments is compared. The experimental results show that as noise increases and feature correlation decreases, both the base learning algorithm and the noise level affect the ensemble performance; once the noise becomes strong enough, the performance of the ensemble converges to that of a single classifier.

20.
Within Valiant's distribution-independent model of concept learning, Angluin and Laird introduced a formal noise model, the classification noise process, to study how to compensate for randomly introduced errors, or noise, in the classification of the example data. In this article, we investigate the problem of designing efficient learning algorithms in the presence of classification noise. First, we develop a technique for building efficient robust learning algorithms, called noise-tolerant Occam algorithms, and show that with them one can construct a polynomial-time algorithm for learning a class of Boolean functions in the presence of classification noise. Next, as an instance of such problems, we focus on the learning problem for Boolean functions represented by decision trees. We present a noise-tolerant Occam algorithm for k-DL (the class of decision lists with conjunctive clauses of size at most k at each decision, introduced by Rivest) and hence conclude that k-DL is polynomially learnable in the presence of classification noise. Further, we extend the noise-tolerant Occam algorithm for k-DL to one for r-DT (the class of decision trees of rank at most r, introduced by Ehrenfeucht and Haussler) and conclude that r-DT is polynomially learnable in the presence of classification noise.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号