Similar Documents
Found 10 similar documents (search time: 140 ms).
1.
Semi-supervised learning has attracted a significant amount of attention in pattern recognition and machine learning. Most previous studies have focused on designing special algorithms to effectively exploit the unlabeled data in conjunction with labeled data. Our goal is to improve the classification accuracy of any given supervised learning algorithm by using the available unlabeled examples. We call this the semi-supervised improvement problem, to distinguish the proposed approach from the existing approaches. We design a meta-semi-supervised learning algorithm that wraps around the underlying supervised algorithm and improves its performance using unlabeled data. This problem is particularly important when we need to train a supervised learning algorithm with a limited number of labeled examples and a multitude of unlabeled examples. We present a boosting framework for semi-supervised learning, termed SemiBoost. The key advantages of the proposed semi-supervised learning approach are: 1) performance improvement of any supervised learning algorithm with a multitude of unlabeled data, 2) efficient computation by the iterative boosting algorithm, and 3) exploitation of both the manifold and cluster assumptions in training classification models. An empirical study on 16 different data sets and text categorization demonstrates that the proposed framework improves the performance of several commonly used supervised learning algorithms, given a large number of unlabeled examples. We also show that the performance of the proposed algorithm, SemiBoost, is comparable to the state-of-the-art semi-supervised learning algorithms.
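A minimal Python sketch of the wrapper idea described above: pseudo-label the most confidently predicted unlabeled points, weighting the base learner's confidence by an RBF similarity to the labeled data (a stand-in for the cluster/manifold assumption), then retrain the supervised base learner. This is an illustration only, not the exact SemiBoost objective; the base learner, kernel, and parameters are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics.pairwise import rbf_kernel

def semiboost_like(X_l, y_l, X_u, base=DecisionTreeClassifier, rounds=5, per_round=10):
    """Iteratively pseudo-label confident unlabeled points and retrain (sketch)."""
    X_l, y_l, X_u = X_l.copy(), y_l.copy(), X_u.copy()
    for _ in range(rounds):
        if len(X_u) == 0:
            break
        clf = base().fit(X_l, y_l)
        proba = clf.predict_proba(X_u)             # base-learner confidence
        sim = rbf_kernel(X_u, X_l).mean(axis=1)    # similarity to labeled data (cluster assumption)
        score = proba.max(axis=1) * sim            # confidence weighted by similarity
        idx = np.argsort(score)[-per_round:]       # most confident unlabeled points
        X_l = np.vstack([X_l, X_u[idx]])
        y_l = np.concatenate([y_l, clf.classes_[proba[idx].argmax(axis=1)]])
        X_u = np.delete(X_u, idx, axis=0)
    return base().fit(X_l, y_l)                    # final supervised model on the augmented set
```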

2.
Boosting a Weak Learning Algorithm by Majority
We present an algorithm for improving the accuracy of algorithms for learning binary concepts. The improvement is achieved by combining a large number of hypotheses, each of which is generated by training the given learning algorithm on a different set of examples. Our algorithm is based on ideas presented by Schapire and represents an improvement over his results. The analysis of our algorithm provides general upper bounds on the resources required for learning in Valiant's polynomial PAC learning framework, which are the best general upper bounds known today. We show that the number of hypotheses that are combined by our algorithm is the smallest number possible. Other outcomes of our analysis are results regarding the representational power of threshold circuits, the relation between learnability and compression, and a method for parallelizing PAC learning algorithms. We provide extensions of our algorithms to cases in which the concepts are not binary and to the case where the accuracy of the learning algorithm depends on the distribution of the instances.
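A small sketch of the combine-by-majority idea: train the given weak learner on many different example sets and predict with an unweighted majority vote. Freund's algorithm re-weights examples between rounds with a specific scheme; the sketch below only illustrates the final majority combination, and the weak learner and sampling scheme are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def majority_ensemble(X, y, n_hypotheses=25, sample_frac=0.6, seed=0):
    """Train many weak hypotheses on different example sets; predict by majority vote (sketch)."""
    rng = np.random.default_rng(seed)
    hypotheses = []
    for _ in range(n_hypotheses):
        idx = rng.choice(len(X), size=int(sample_frac * len(X)), replace=True)
        hypotheses.append(DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx]))  # weak learner

    def predict(X_new):
        votes = np.stack([h.predict(X_new) for h in hypotheses])  # shape (n_hypotheses, n_samples)
        return (votes.mean(axis=0) >= 0.5).astype(int)            # majority vote, labels in {0, 1}
    return predict
```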

3.
In many machine learning settings, labeled examples are difficult to collect while unlabeled data are abundant. Also, for some binary classification problems, positive examples, which are elements of the target concept, are available. Can these additional data be used to improve the accuracy of supervised learning algorithms? We investigate in this paper the design of learning algorithms from positive and unlabeled data only. Many machine learning and data mining algorithms, such as decision tree induction algorithms and naive Bayes algorithms, use examples only to evaluate statistical queries (SQ-like algorithms). Kearns designed the statistical query learning model in order to describe these algorithms. Here, we design an algorithm scheme which transforms any SQ-like algorithm into an algorithm based on positive statistical queries (estimates of probabilities over the set of positive instances) and instance statistical queries (estimates of probabilities over the instance space). We prove that any class learnable in the statistical query learning model is learnable from positive statistical queries and instance statistical queries only if a lower bound on the weight of any target concept f can be estimated in polynomial time. Then, we design a decision tree induction algorithm, POSC4.5, based on C4.5, that uses only positive and unlabeled examples, and we give experimental results for this algorithm. In the case of imbalanced classes, in the sense that one of the two classes (say the positive class) is heavily underrepresented compared to the other class, the learning problem remains open. This problem is challenging because it is encountered in many real-world applications.
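One way to see such a transformation is as a single decomposition: for a statistical query chi(x, y), E[chi(x, f(x))] = p * E_pos[chi(x, 1)] + E_D[chi(x, 0)] - p * E_pos[chi(x, 0)], where p = Pr(f(x) = 1). The hedged sketch below estimates a query this way; the names chi, positives, unlabeled, and p_hat are illustrative, not the paper's notation.

```python
import numpy as np

def sq_from_positive_and_unlabeled(chi, positives, unlabeled, p_hat):
    """Estimate E[chi(x, f(x))] from positive and unlabeled data only (sketch).

    Uses E[chi(x, f(x))] = p * E_pos[chi(x, 1)] + E_D[chi(x, 0)] - p * E_pos[chi(x, 0)],
    where p_hat estimates p = Pr(f(x) = 1).
    """
    pos_query_1 = np.mean([chi(x, 1) for x in positives])    # positive statistical query, label 1
    pos_query_0 = np.mean([chi(x, 0) for x in positives])    # positive statistical query, label 0
    inst_query_0 = np.mean([chi(x, 0) for x in unlabeled])   # instance statistical query, label 0
    return p_hat * pos_query_1 + inst_query_0 - p_hat * pos_query_0
```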

4.
We explore the problem of budgeted machine learning, in which the learning algorithm has free access to the training examples' class labels but has to pay for each attribute that is specified. This learning model is appropriate in many areas, including medical applications. We present new algorithms for choosing which attributes to purchase for which examples, based on algorithms for the multi-armed bandit problem. In addition, we also evaluate a group of algorithms based on the idea of incorporating second-order statistics into decision making. Most of our algorithms are competitive with the current state of the art and perform better when the budget is highly limited (in particular, our new algorithm AbsoluteBR2). Finally, we present new heuristics for selecting an instance to purchase after the attribute is selected, instead of selecting an instance uniformly at random, which is typically done. While the experimental results showed some performance improvements when using the new instance selectors, there was no consistent winner among these methods.
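A minimal epsilon-greedy sketch of the bandit framing: each attribute is an arm, and its reward is a hypothetical measure of how much buying one more value of that attribute helped the model. The paper's algorithms, including AbsoluteBR2, use different reward estimates and selection rules; the reward_of_purchase callable and all parameters below are assumptions.

```python
import numpy as np

def purchase_attributes(n_attrs, budget, reward_of_purchase, eps=0.1, seed=0):
    """Epsilon-greedy attribute purchasing: one bandit arm per attribute (sketch).

    reward_of_purchase(a) is a hypothetical callable that buys one value of
    attribute a and returns an estimate of how much the model improved.
    """
    rng = np.random.default_rng(seed)
    counts = np.zeros(n_attrs)   # purchases made per attribute
    totals = np.zeros(n_attrs)   # cumulative observed reward per attribute
    for _ in range(budget):
        if rng.random() < eps or counts.sum() == 0:
            a = int(rng.integers(n_attrs))                     # explore a random attribute
        else:
            means = np.where(counts > 0, totals / np.maximum(counts, 1), np.inf)
            a = int(np.argmax(means))                          # exploit the best attribute so far
        totals[a] += reward_of_purchase(a)                     # pay for one more value of attribute a
        counts[a] += 1
    return counts
```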

5.
Computing Optimal Attribute Weight Settings for Nearest Neighbor Algorithms
Nearest neighbor (NN) learning algorithms, examples of the lazy learning paradigm, rely on a distance function to measure the similarity of testing examples with the stored training examples. Since certain attributes are more discriminative, while others can be less relevant or totally irrelevant, attributes should be weighted differently in the distance function. Most previous studies on weight setting for NN learning algorithms are empirical. In this paper we describe our attempt at deriving theoretically optimal weights that minimize the predictive error for NN algorithms. Assuming a uniform distribution of examples in a 2-d continuous space, we first derive the average predictive error introduced by a linear classification boundary, and then determine the optimal weight setting for any polygonal classification region. Our theoretical results on optimal attribute weights can serve as a baseline or lower bound for comparing other empirical weight setting methods.
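A small sketch showing where attribute weights enter a nearest neighbor distance function; the paper derives the weights analytically, whereas here they are simply passed in as a vector w (illustrative).

```python
import numpy as np

def weighted_nn_predict(X_train, y_train, x_query, w):
    """1-NN prediction with a weighted squared Euclidean distance (sketch)."""
    # distance(x, q) = sum_j w_j * (x_j - q_j)^2 ; a larger w_j makes attribute j more influential
    d = ((X_train - x_query) ** 2 * w).sum(axis=1)
    return y_train[np.argmin(d)]
```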

6.
Selective Sampling for Nearest Neighbor Classifiers
Most existing inductive learning algorithms work under the assumption that their training examples are already tagged. There are domains, however, where the tagging procedure requires significant computation resources or manual labor. In such cases, it may be beneficial for the learner to be active, intelligently selecting the examples for labeling with the goal of reducing the labeling cost. In this paper we present LSS, a lookahead algorithm for selective sampling of examples for nearest neighbor classifiers. The algorithm looks for the example with the highest utility, taking its effect on the resulting classifier into account. Computing the expected utility of an example requires estimating the probability of its possible labels. We propose to use the random field model for this estimation. The LSS algorithm was evaluated empirically on seven real and artificial data sets, and its performance was compared to other selective sampling algorithms. The experiments show that the proposed algorithm outperforms other methods in terms of average error rate and stability.
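A hedged sketch of one-step lookahead selection: for each candidate unlabeled example, average the utility of the classifier obtained under each possible label, weighted by an estimated label probability, and query the example with the highest expected utility. The current model's confidence stands in for the paper's random field estimate, and the k-NN learner, validation-accuracy utility, and parameters are assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def lookahead_select(X_l, y_l, X_u, X_val, y_val, labels=(0, 1)):
    """Return the index of the unlabeled example with the highest expected utility (sketch)."""
    base = KNeighborsClassifier(n_neighbors=3).fit(X_l, y_l)
    probs = base.predict_proba(X_u)                 # crude label-probability estimates
    best_i, best_eu = None, -np.inf
    for i, x in enumerate(X_u):
        expected_utility = 0.0
        for j, lab in enumerate(labels):
            clf = KNeighborsClassifier(n_neighbors=3).fit(
                np.vstack([X_l, x[None]]), np.append(y_l, lab))        # lookahead: add (x, lab)
            expected_utility += probs[i, j] * clf.score(X_val, y_val)  # utility = validation accuracy
        if expected_utility > best_eu:
            best_i, best_eu = i, expected_utility
    return best_i
```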

7.
This paper presents a strategy to improve the AdaBoost algorithm with a quadratic combination of base classifiers. We observe that learning this combination is necessary for better performance and is possible by constructing an intermediate learner operating on the combined linear and quadratic terms. This is not trivial, as the parameters of the base classifiers are not under direct control, obstructing the application of direct optimization. We propose a new method that realizes iterative optimization indirectly. First, we train a classifier by randomizing the labels of the training examples. Subsequently, the input learner is called repeatedly, with a systematic update of the labels of the training examples in each round. We show that the quadratic boosting algorithm converges under the condition that the given base learner minimizes the empirical error. We also give an upper bound on the VC-dimension of the new classifier. Our experimental results on 23 standard problems show that quadratic boosting compares favorably with AdaBoost on large data sets, at the cost of training speed. The classification time of the two algorithms, however, is equivalent.
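A minimal sketch of the target form of the combined classifier: stack the base outputs h_i(x) with their pairwise products h_i(x)*h_j(x) and learn a linear combination over this expanded representation. The paper reaches such a combination indirectly via repeated re-labelling of the training set; the direct fit below is only an illustration, and the logistic regression combiner is an assumption.

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression

def quadratic_combination(base_outputs, y):
    """Fit a linear model over linear and pairwise-product terms of base outputs (sketch).

    base_outputs: array of shape (n_samples, T), T >= 2, with base classifier outputs in {-1, +1}.
    """
    H = np.asarray(base_outputs)
    quad = np.column_stack([H[:, i] * H[:, j]
                            for i, j in combinations(range(H.shape[1]), 2)])  # quadratic terms
    features = np.hstack([H, quad])                                           # linear + quadratic
    return LogisticRegression(max_iter=1000).fit(features, y)
```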

8.
Feature engineering is one of the most complex aspects of system design in machine learning. Fortunately, kernel methods provide the designer with formidable tools to tackle such complexity. Among others, tree kernels (TKs) have been successfully applied for representing structured data in diverse domains, ranging from bioinformatics and data mining to natural language processing. One drawback of such methods is that learning with them typically requires a large number of kernel computations between training examples (quadratic in the number of examples). However, in practice substructures often repeat in the data, which makes it possible to avoid a large number of redundant kernel evaluations. In this paper, we propose the use of Directed Acyclic Graphs (DAGs) to compactly represent trees in the training algorithm of Support Vector Machines. In particular, we use DAGs in each iteration of the cutting plane algorithm (CPA) to encode the model, composed of a set of trees. This enables DAG kernels to efficiently evaluate TKs between the current model and a given training tree. Consequently, the total amount of computation is reduced by avoiding redundant evaluations over shared substructures. We provide theory and algorithms to formally characterize the above idea, which we tested on several datasets. The empirical results confirm the benefits of the approach in terms of significant speedups over previous state-of-the-art methods. In addition, we propose an alternative sampling strategy within the CPA to address the class-imbalance problem, which, coupled with fast learning methods, provides a viable TK learning framework for a large class of real-world applications.
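A hedged sketch of the shared-substructure idea: map identical subtrees across the training trees to a single entry (here by hashing their string form), so a repeated substructure only needs to be processed once. It illustrates the compaction step only, not the paper's DAG tree kernel or its integration with the cutting plane algorithm; the (label, children) tuple representation is an assumption.

```python
from collections import Counter

def tree_signature(node):
    """node = (label, [children]); return a canonical string for the whole subtree."""
    label, children = node
    return "(" + str(label) + " " + " ".join(tree_signature(c) for c in children) + ")"

def compact_forest(trees):
    """Count how often each distinct subtree occurs across all training trees."""
    counts = Counter()
    def visit(node):
        counts[tree_signature(node)] += 1
        for child in node[1]:
            visit(child)
    for tree in trees:
        visit(tree)
    return counts   # subtrees with count > 1 are shared and need to be evaluated only once
```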

9.
Two new lattice reduction algorithms are presented and analyzed. These algorithms, called the Schmidt reduction and the Gram reduction, are obtained by relaxing some of the constraints of the classical LLL algorithm. By analyzing the worst-case behavior and the average-case behavior in a tractable model, we prove that the new algorithms still produce “good” reduced bases while requiring fewer iterations on average. In addition, we provide empirical tests on random lattices coming from applications that confirm our theoretical results about the relative behavior of the different reduction algorithms.
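Not the paper's algorithms: the sketch below is the classical two-dimensional Lagrange-Gauss reduction, included only as a concrete illustration of what reducing a lattice basis means (shorter, nearly orthogonal vectors generating the same lattice). The Schmidt and Gram reductions relax the LLL conditions in arbitrary dimension.

```python
import numpy as np

def gauss_reduce(b1, b2):
    """Lagrange-Gauss reduction of a 2-d lattice basis (nonzero, linearly independent vectors)."""
    b1, b2 = np.array(b1, dtype=float), np.array(b2, dtype=float)
    while True:
        if np.dot(b1, b1) > np.dot(b2, b2):
            b1, b2 = b2, b1                            # keep b1 as the shorter vector
        m = round(np.dot(b1, b2) / np.dot(b1, b1))     # integer size-reduction coefficient
        if m == 0:
            return b1, b2                              # basis is reduced
        b2 = b2 - m * b1                               # shorten b2 against b1
```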

10.
Active Sampling for Class Probability Estimation and Ranking
In many cost-sensitive environments class probability estimates are used by decision makers to evaluate the expected utility from a set of alternatives. Supervised learning can be used to build class probability estimates; however, it is often very costly to obtain training data with class labels. Active learning acquires data incrementally, at each phase identifying especially useful additional data for labeling, and can be used to economize on examples needed for learning. We outline the critical features of an active learner and present a sampling-based active learning method for estimating class probabilities and class-based rankings. BOOTSTRAP-LV identifies particularly informative new data for learning based on the variance in probability estimates, and uses weighted sampling to account for a potential example's informative value for the rest of the input space. We show empirically that the method reduces the number of data items that must be obtained and labeled, across a wide variety of domains. We investigate the contribution of the components of the algorithm and show that each provides valuable information to help identify informative examples. We also compare BOOTSTRAP-LV with UNCERTAINTY SAMPLING, an existing active learning method designed to maximize classification accuracy. The results show that BOOTSTRAP-LV uses fewer examples to reach a given estimation accuracy, and they provide insights into the behavior of the algorithms. Finally, we experiment with another new active sampling algorithm drawing from both UNCERTAINTY SAMPLING and BOOTSTRAP-LV and show that it is significantly more competitive with BOOTSTRAP-LV compared to UNCERTAINTY SAMPLING. The analysis suggests more general implications for improving existing active sampling algorithms for classification.
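A hedged sketch of the BOOTSTRAP-LV selection step: estimate each unlabeled example's local variance as the variance of class-probability estimates across bootstrap models, then draw the next examples to label by weighted sampling proportional to that variance. The base learner, ensemble size, and other parameters are assumptions, not the paper's exact setup.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bootstrap_lv_select(X_l, y_l, X_u, n_models=10, n_select=5, seed=0):
    """Pick unlabeled examples by weighted sampling on bootstrap probability variance (sketch).

    Assumes a binary problem where both classes appear in each bootstrap sample.
    """
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_models):
        idx = rng.choice(len(X_l), size=len(X_l), replace=True)     # bootstrap resample
        clf = DecisionTreeClassifier().fit(X_l[idx], y_l[idx])
        preds.append(clf.predict_proba(X_u)[:, 1])                  # P(class = 1) estimates
    variance = np.var(np.stack(preds), axis=0)                      # local variance per example
    weights = variance / variance.sum() if variance.sum() > 0 else None  # None -> uniform
    return rng.choice(len(X_u), size=n_select, replace=False, p=weights)
```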

