Similar Articles
1.
Perceptron-based learning algorithms
A key task for connectionist research is the development and analysis of learning algorithms. An examination is made of several supervised learning algorithms for single-cell and network models. The heart of these algorithms is the pocket algorithm, a modification of perceptron learning that makes perceptron learning well-behaved with nonseparable training data, even if the data are noisy and contradictory. Features of these algorithms include: speed (the algorithms are fast enough to handle large sets of training data); network scaling properties (network methods scale up almost as well as single-cell models when the number of inputs is increased); analytic tractability (upper bounds on classification error are derivable); online learning (some variants can learn continually, without referring to previous data); and winner-take-all groups or choice groups (the algorithms can be adapted to select one out of a number of possible classifications). These learning algorithms are suitable for applications in machine learning, pattern recognition, and connectionist expert systems.
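As a concrete illustration of the pocket idea, here is a minimal Python sketch (not the paper's code; the data layout, update schedule, and scoring are assumptions): run ordinary perceptron updates, but keep "in the pocket" the best weight vector seen on the full training set, which is what keeps the method well-behaved on nonseparable data.

```python
import numpy as np

def pocket(X, y, epochs=1000, rng=None):
    """Minimal pocket-algorithm sketch: perceptron updates, but the
    returned weights are the best observed on the whole training set.
    Assumes X is (n, d) and y is in {-1, +1}."""
    rng = rng if rng is not None else np.random.default_rng(0)
    n, d = X.shape
    w = np.zeros(d)
    pocket_w, pocket_acc = w.copy(), np.mean(np.sign(X @ w) == y)
    for _ in range(epochs):
        i = rng.integers(n)                      # pick a random example
        if np.sign(X[i] @ w) != y[i]:            # misclassified -> perceptron update
            w = w + y[i] * X[i]
            acc = np.mean(np.sign(X @ w) == y)   # score the updated weights
            if acc > pocket_acc:                 # keep the best seen "in the pocket"
                pocket_w, pocket_acc = w.copy(), acc
    return pocket_w
```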

2.
Recently, a number of modeling techniques have been developed for data mining and machine learning in relational and network domains where the instances are not independent and identically distributed (i.i.d.). These methods specifically exploit the statistical dependencies among instances in order to improve classification accuracy. However, there has been little focus on how these same dependencies affect our ability to draw accurate conclusions about the performance of the models. More specifically, the complex link structure and attribute dependencies in relational data violate the assumptions of many conventional statistical tests and make it difficult to use these tests to assess the models in an unbiased manner. In this work, we examine the task of within-network classification and the question of whether two algorithms will learn models that will result in significantly different levels of performance. We show that the commonly used form of evaluation (paired t-test on overlapping network samples) can result in an unacceptable level of Type I error. Furthermore, we show that Type I error increases as (1) the correlation among instances increases and (2) the size of the evaluation set increases (i.e., the proportion of labeled nodes in the network decreases). We propose a method for network cross-validation that, combined with paired t-tests, produces more acceptable levels of Type I error while still providing reasonable levels of statistical power (i.e., 1−Type II error).
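For reference, the evaluation style the paper critiques looks roughly like the sketch below (the per-fold accuracies are hypothetical): a paired t-test over matched fold scores, whose p-value becomes optimistic when the network samples overlap and instances are correlated.

```python
from scipy import stats

# Hypothetical per-fold accuracies for two within-network classifiers,
# measured on the same (possibly overlapping) network samples.
acc_a = [0.81, 0.79, 0.84, 0.80, 0.82]
acc_b = [0.78, 0.77, 0.80, 0.79, 0.78]

t, p = stats.ttest_rel(acc_a, acc_b)   # paired t-test over matched folds
print(f"t = {t:.3f}, p = {p:.4f}")
# The paper's point: with overlapping samples and dependent instances this
# p-value is optimistic, inflating Type I error; their network
# cross-validation procedure is designed to restore acceptable error rates.
```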

3.
We address the sequence classification problem using a probabilistic model based on hidden Markov models (HMMs). In contrast to commonly-used likelihood-based learning methods such as the joint/conditional maximum likelihood estimator, we introduce a discriminative learning algorithm that focuses on class margin maximization. Our approach has two main advantages: (i) As an extension of support vector machines (SVMs) to sequential, non-Euclidean data, the approach inherits benefits of margin-based classifiers, such as the provable generalization error bounds. (ii) Unlike many algorithms based on non-parametric estimation of similarity measures that enforce weak constraints on the data domain, our approach utilizes the HMM’s latent Markov structure to regularize the model in the high-dimensional sequence space. We demonstrate significant improvements in classification performance of the proposed method in an extensive set of evaluations on time-series sequence data that frequently appear in data mining and computer vision domains.

4.
Transduction is an inference mechanism adopted by several classification algorithms capable of exploiting both labeled and unlabeled data and making predictions for the given set of unlabeled data only. Several transductive learning methods have been proposed in the literature to learn transductive classifiers from examples represented as rows of a classical double-entry table (or relational table). In this work we consider the case of examples represented as a set of multiple tables of a relational database and propose a new relational classification algorithm, named TRANSC, that works in a transductive setting and employs a probabilistic approach to classification. Knowledge of the data model, i.e., foreign keys, is used to guide the search process. The transductive learning strategy iterates over a k-NN based re-classification of labeled and unlabeled examples, in order to identify borderline examples, and uses the relational probabilistic classifier Mr-SBC to bootstrap the transductive algorithm. Experimental results confirm that TRANSC outperforms its inductive counterpart (Mr-SBC).
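The iterative re-classification step can be sketched on plain feature vectors as follows (a deliberate simplification: TRANSC bootstraps with the relational classifier Mr-SBC and works over multiple tables, whereas this sketch bootstraps with a 1-NN rule and assumes integer class labels 0..C-1):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def transductive_knn(X_lab, y_lab, X_unl, k=5, iters=10):
    """Sketch of iterative k-NN re-classification: bootstrap labels for the
    unlabeled examples, then repeatedly relabel every example from its
    neighbors in the combined set, keeping the given labels fixed."""
    X = np.vstack([X_lab, X_unl])
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    # Naive bootstrap: each unlabeled point takes its nearest labeled
    # neighbor's class (TRANSC uses Mr-SBC for this step instead).
    boot = NearestNeighbors(n_neighbors=1).fit(X_lab)
    y_unl = y_lab[boot.kneighbors(X_unl, return_distance=False).ravel()]
    y = np.concatenate([y_lab, y_unl])
    for _ in range(iters):
        idx = nn.kneighbors(X, return_distance=False)[:, 1:]  # drop self
        y_new = np.array([np.bincount(y[row]).argmax() for row in idx])
        y_new[: len(y_lab)] = y_lab        # labeled examples stay fixed
        if np.array_equal(y_new, y):       # stop when labels stabilize
            break
        y = y_new
    return y[len(y_lab):]                  # predictions for the unlabeled set
```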

5.
Gentile, Claudio. Machine Learning, 2003, 53(3):265-299
We consider two on-line learning frameworks: binary classification through linear threshold functions and linear regression. We study a family of on-line algorithms, called p-norm algorithms, introduced by Grove, Littlestone and Schuurmans in the context of deterministic binary classification. We show how to adapt these algorithms for use in the regression setting, and prove worst-case bounds on the square loss, using a technique from Kivinen and Warmuth. As pointed out by Grove et al., these algorithms can be made to approach a version of the classification algorithm Winnow as p goes to infinity; similarly, they can be made to approach the corresponding regression algorithm EG in the limit. Winnow and EG are notable for having loss bounds that grow only logarithmically in the dimension of the instance space. Here we describe another way to use the p-norm algorithms to achieve this logarithmic behavior. With the usage we propose, it is less critical than with Winnow and EG to retune the parameters of the algorithm as the learning task changes. Since the correct setting of the parameters depends on characteristics of the learning task that are not typically known a priori by the learner, this gives the p-norm algorithms a desirable robustness. Our elaborations yield various new loss bounds in these on-line settings. Some of these bounds improve or generalize known results; others are incomparable.
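A minimal sketch of the p-norm classification algorithm, assuming the standard quasi-additive form (additive updates on a dual vector θ, mapped to primal weights by the gradient of ½‖θ‖_p²); the learning rate and epoch loop here are illustrative assumptions:

```python
import numpy as np

def p_norm_perceptron(X, y, p=4, eta=1.0, epochs=5):
    """Sketch of a p-norm on-line classifier. Labels y are in {-1, +1}.
    Large p makes the behavior increasingly Winnow-like; p = 2 recovers
    the ordinary perceptron."""
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, t in zip(X, y):
            norm = np.linalg.norm(theta, ord=p)
            if norm == 0:
                w = np.zeros_like(theta)
            else:
                # w_i = sign(theta_i) * |theta_i|^(p-1) * ||theta||_p^(2-p)
                w = np.sign(theta) * np.abs(theta) ** (p - 1) * norm ** (2 - p)
            if t * (w @ x) <= 0:           # mistake-driven additive update
                theta = theta + eta * t * x
    return theta
```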

6.
International Journal of Computer Mathematics, 2012, 89(7):1321-1333
In this study, we investigate the consistency of semi-supervised coefficient regularization learning with indefinite kernels. In our setting, the hypothesis space and learning algorithms are based on two different groups of input data which are drawn i.i.d. according to an unknown probability measure ρ_X. The only conditions imposed on the kernel function are continuity and boundedness, rather than the Mercer property, and the output data are not required to be uniformly bounded. Under a mild assumption on the unbounded output data and using a refined integral-operator technique, the generalization error is decomposed into hypothesis error, sample error and approximation error. By estimating these three parts, we deduce satisfactory learning rates with a proper choice of the regularization parameter.

7.
Srinivasan, Sriram; Dickens, Charles; Augustine, Eriq; Farnadi, Golnoosh; Getoor, Lise. Machine Learning, 2022, 111(8):2799-2838
Statistical relational learning (SRL) frameworks are effective at defining probabilistic models over complex relational data. They often use weighted first-order logical rules...

8.
9.
王星, 方滨兴, 张宏莉, 何慧, 赵蕾. 软件学报 (Journal of Software), 2013, 24(11):2508-2521
The learning of relational classification models currently lacks the kind of support that learning bounds provide in statistical learning theory, which makes the study of learning bounds for relational classification especially important. This paper proposes several learning bounds applicable to relational classification models. Bounds are first derived for the cases where the model hypothesis space is finite and infinite. A complexity measure of a relational model's capacity to associate data, called the relational dimension, is then proposed; the relationship between this complexity measure and the growth function of relational models is proven, yielding learning bounds under finite VC dimension and finite relational dimension. The conditions under which these bounds are learnable and meaningful are analyzed, along with a detailed feasibility analysis. Finally, traditional learning bounds based on Markov logic networks and learning in practical relational classification are examined; experimental results show that the proposed bounds can explain some of the problems encountered in practical relational classification.
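For reference, a sketch of the classical finite-hypothesis-space bound from statistical learning theory that such relational bounds generalize (this is the standard Hoeffding-plus-union-bound result, not the paper's relational bound):

```latex
% With probability at least 1 - \delta over an i.i.d. sample of size m,
% for every hypothesis h in a finite space H,
\[
  R(h) \;\le\; \widehat{R}(h) + \sqrt{\frac{\ln|H| + \ln(1/\delta)}{2m}}
\]
% The paper's relational dimension plays a role analogous to the VC
% dimension when |H| is infinite and the instances are not i.i.d.
```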

10.
Bias/variance analysis is a useful tool for investigating the performance of machine learning algorithms. Conventional analysis decomposes loss into errors due to aspects of the learning process, but in relational domains, the inference process used for prediction introduces an additional source of error. Collective inference techniques introduce additional error, both through the use of approximate inference algorithms and through variation in the availability of test-set information. To date, the impact of inference error on model performance has not been investigated. We propose a new bias/variance framework that decomposes loss into errors due to both the learning and inference processes. We evaluate the performance of three relational models on both synthetic and real-world datasets and show that (1) inference can be a significant source of error, and (2) the models exhibit different types of errors as data characteristics are varied.
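For orientation, a sketch of the conventional squared-loss decomposition that the proposed framework extends with learning- and inference-error terms:

```latex
% Standard squared-loss bias/variance decomposition: for a predictor
% \hat{f}(x) learned from random training sets, with noise variance \sigma^2,
\[
  \mathbb{E}\big[(y - \hat{f}(x))^2\big]
  \;=\; \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  \;+\; \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  \;+\; \sigma^2
\]
% The paper's framework further splits these terms into learning and
% inference components for relational models with collective inference.
```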

11.
Embar, Varun; Srinivasan, Sriram; Getoor, Lise. Machine Learning, 2021, 110(7):1847-1866

Statistical relational learning (SRL) and graph neural networks (GNNs) are two powerful approaches for learning and inference over graphs. Typically, they are evaluated in terms of simple metrics such as accuracy over individual node labels. Complex aggregate graph queries (AGQ) involving multiple nodes, edges, and labels are common in the graph mining community and are used to estimate important network properties such as social cohesion and influence. While graph mining algorithms support AGQs, they typically do not take into account uncertainty, or when they do, make simplifying assumptions and do not build full probabilistic models. In this paper, we examine the performance of SRL and GNNs on AGQs over graphs with partially observed node labels. We show that, not surprisingly, inferring the unobserved node labels as a first step and then evaluating the queries on the fully observed graph can lead to sub-optimal estimates, and that a better approach is to compute these queries as an expectation under the joint distribution. We propose a sampling framework to tractably compute the expected values of AGQs. Motivated by the analysis of subgroup cohesion in social networks, we propose a suite of AGQs that estimate the community structure in graphs. In our empirical evaluation, we show that by estimating these queries as an expectation, SRL-based approaches yield up to a 50-fold reduction in average error when compared to existing GNN-based approaches.

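The "compute the query as an expectation" idea can be sketched with a toy graph (everything below is hypothetical: the edges, the marginals, and the independence of node labels, which stands in for samples from an SRL model's joint distribution):

```python
import numpy as np

# Toy graph and an AGQ that counts label-agreeing edges (a crude
# cohesion-style statistic over node labels).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
p_label1 = np.array([0.9, 0.8, 0.3, 0.6])   # assumed P(label = 1) per node

def agq(labels):
    return sum(labels[u] == labels[v] for u, v in edges)

rng = np.random.default_rng(0)
n_samples = 1000
# Expectation under the label distribution (independent marginals here,
# standing in for samples from a joint model) vs. a single "infer first,
# then query" completion of the graph.
samples = rng.random((n_samples, len(p_label1))) < p_label1
expected = np.mean([agq(s) for s in samples])
plugin = agq(p_label1 > 0.5)
print(f"E[AGQ] ≈ {expected:.2f} vs plug-in estimate {plugin}")
```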

12.
Most existing works on data stream classification assume the streaming data are precise and definite. This assumption, however, does not always hold in practice, since data uncertainty is ubiquitous in data stream applications due to imprecise measurement, missing values, privacy protection, etc. The goal of this paper is to learn accurate decision tree models from uncertain data streams for classification analysis. Building on the very fast decision tree (VFDT) algorithm, we propose an algorithm for constructing an uncertain VFDT tree with classifiers at the tree leaves (uVFDTc). The uVFDTc algorithm exploits uncertain information effectively and efficiently in both the learning and the classification phases. In the learning phase, it uses the Hoeffding bound to learn from uncertain data streams and yield fast and reasonable decision trees. In the classification phase, it uses uncertain naive Bayes (UNB) classifiers at the tree leaves to improve classification performance. Experimental results on both synthetic and real-life datasets demonstrate the strong ability of uVFDTc to classify uncertain data streams. The use of UNB at the tree leaves improves the performance of uVFDTc, especially its any-time property, the benefit it draws from uncertain information, and its robustness against uncertainty.
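The Hoeffding-bound split test at the core of VFDT-style learners can be sketched as follows (the δ, value range, and gain figures are illustrative assumptions, not the paper's settings):

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound used by VFDT-style learners: with probability
    1 - delta, the true mean of a variable with the given range lies
    within eps of the mean observed over n examples."""
    return math.sqrt(value_range ** 2 * math.log(1 / delta) / (2 * n))

def should_split(gain_best, gain_second, value_range=1.0, delta=1e-7, n=1000):
    # Split as soon as the observed gain advantage of the best attribute
    # exceeds eps, so the choice would (w.h.p.) not change with more data.
    return (gain_best - gain_second) > hoeffding_bound(value_range, delta, n)

print(should_split(0.25, 0.17, n=2000))   # True once enough examples accrue
```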

13.
Feature selection is an essential data processing step to remove irrelevant and redundant attributes for shorter learning time, better accuracy, and better comprehensibility. A number of algorithms have been proposed in both the data mining and machine learning areas. These algorithms are usually used in a single-table environment, where data are stored in one relational table or one flat file. They are not suitable for a multi-relational environment, where data are stored in multiple tables joined to one another by semantic relationships. To address this problem, in this article, we propose a novel approach called FARS to conduct both Feature And Relation Selection for efficient multi-relational classification. Through this approach, we not only extend the traditional feature selection method to select relevant features from multiple relations, but also develop a new method to reconstruct the multi-relational database schema and eliminate irrelevant tables to further improve classification performance. The results of experiments conducted on both real and synthetic databases show that FARS can effectively choose a small set of relevant features, thereby enhancing classification efficiency and prediction accuracy significantly.

14.
Langford, John; Blum, Avrim. Machine Learning, 2003, 51(2):165-179
A major topic in machine learning is determining good upper bounds on the true error rates of learned hypotheses based upon their empirical performance on training data. In this paper, we demonstrate new adaptive bounds designed for learning algorithms that operate by making a sequence of choices. These bounds, which we call Microchoice bounds, are similar to Occam-style bounds and can be used to make learning algorithms self-bounding in the style of Freund (1998). We then show how to combine these bounds with Freund's query-tree approach, producing a version of Freund's query-tree structure that can be implemented with far greater algorithmic efficiency.
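A sketch of the general flavor of such bounds, up to constants (hedged: this follows the usual Occam/Microchoice shape, not the paper's exact statement): if the learner reaches hypothesis h through a sequence of choices from sets C_1, …, C_k, then its description length is charged per choice.

```latex
% Occam-style shape of a Microchoice bound (sketch): with probability
% at least 1 - \delta over m training examples,
\[
  \mathrm{err}(h) \;\le\; \widehat{\mathrm{err}}(h)
    + \sqrt{\frac{\sum_{i=1}^{k} \ln |C_i| \;+\; \ln(1/\delta)}{2m}}
\]
% Shorter choice sequences (a cheaper description of the algorithm's
% path to h) therefore yield tighter guarantees.
```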

15.
This paper presents the first generalization bounds for time series prediction with a non-stationary mixing stochastic process. We prove Rademacher complexity learning bounds for both average-path generalization with non-stationary \(\beta \)-mixing processes and path-dependent generalization with non-stationary \(\phi \)-mixing processes. Our guarantees are expressed in terms of \(\beta \)- or \(\phi \)-mixing coefficients and a natural measure of discrepancy between training and target distributions. They admit as special cases previous Rademacher complexity bounds for non-i.i.d. stationary distributions, for independent but not identically distributed random variables, or for the i.i.d. case. We show that, using a new sub-sample selection technique we introduce, our bounds can be tightened under the natural assumption of asymptotically stationary stochastic processes. We also prove that fast learning rates can be achieved by extending existing local Rademacher complexity analyses to the non-i.i.d. setting. We conclude the paper by providing generalization bounds for learning with unbounded losses and non-i.i.d. data.

16.
Computing Optimal Attribute Weight Settings for Nearest Neighbor Algorithms
Nearest neighbor (NN) learning algorithms, examples of the lazy learning paradigm, rely on a distance function to measure the similarity of testing examples with the stored training examples. Since certain attributes are more discriminative, while others can be less relevant or totally irrelevant, attributes should be weighted differently in the distance function. Most previous studies on weight setting for NN learning algorithms are empirical. In this paper we describe our attempt to derive theoretically optimal weights that minimize the predictive error for NN algorithms. Assuming a uniform distribution of examples in a 2-d continuous space, we first derive the average predictive error introduced by a linear classification boundary, and then determine the optimal weight setting for any polygonal classification region. Our theoretical results on optimal attribute weights can serve as a baseline or lower bound for comparing other empirical weight-setting methods.
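A minimal sketch of the weighted distance function such analyses target (the data and weights below are hypothetical):

```python
import numpy as np

def weighted_nn_predict(X_train, y_train, x, weights):
    """1-NN rule with per-attribute weights in the distance function:
    discriminative attributes get larger weights, irrelevant ones
    weights near zero."""
    d = np.sqrt(((X_train - x) ** 2 * weights).sum(axis=1))  # weighted L2
    return y_train[np.argmin(d)]

# Hypothetical 2-d example: only the first attribute is relevant.
X = np.array([[0.1, 9.0], [0.9, 1.0]])
y = np.array([0, 1])
w = np.array([1.0, 0.0])           # zero weight silences the noisy attribute
print(weighted_nn_predict(X, y, np.array([0.2, 2.0]), w))   # -> 0
```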

17.
We consider bounds on the prediction error of classification algorithms based on sample compression. We refine the notion of a compression scheme to distinguish between compression schemes that are invariant under permutation and repetition and those that are not, leading to different prediction error bounds. We also extend known results on compression to the case of non-zero empirical risk. We provide bounds on the prediction error of classifiers returned by mistake-driven online learning algorithms by interpreting mistake bounds as bounds on the size of the respective compression scheme of the algorithm. This leads to a bound on the prediction error of perceptron solutions that depends on the margin a support vector machine would achieve on the same training sample. Furthermore, using the property of compression we derive bounds on the average prediction error of kernel classifiers in the PAC-Bayesian framework. These bounds assume a prior measure over the expansion coefficients in the data-dependent kernel expansion and bound the average prediction error uniformly over subsets of the space of expansion coefficients.

18.
The growth of machine-generated relational databases, both in the sciences and in industry, is rapidly outpacing our ability to extract useful information from them by manual means. This has brought into focus machine learning techniques like Inductive Logic Programming (ILP) that are able to extract human-comprehensible models for complex relational data. The price to pay is that ILP techniques are not efficient: they can be seen as performing a form of discrete optimisation, which is known to be computationally hard; and the complexity is usually some super-linear function of the number of examples. While little can be done to alter the theoretical bounds on the worst-case complexity of ILP systems, some practical gains may follow from the use of multiple processors. In this paper we survey the state-of-the-art on parallel ILP. We implement several parallel algorithms and study their performance using some standard benchmarks. The principal findings of interest are these: (1) of the techniques investigated, one that simply constructs models in parallel on each processor using a subset of data and then combines the models into a single one, yields the best results; and (2) sequential (approximate) ILP algorithms based on randomized searches have lower execution times than (exact) parallel algorithms, without sacrificing the quality of the solutions found. This is an extended version of the paper entitled Strategies to Parallelize ILP Systems, published in the Proceedings of the 15th International Conference on Inductive Logic Programming (ILP 2005), vol. 3625 of LNAI, pp. 136–153, Springer-Verlag.
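The best-performing strategy reported, learning submodels on data partitions in parallel and then combining them, can be sketched as follows (`learn_rules` is a hypothetical stand-in for an ILP learner, and the union-of-rules combination is an assumed, naive merge):

```python
from concurrent.futures import ProcessPoolExecutor

def learn_rules(examples):
    """Hypothetical stand-in for an ILP learner run on one data partition;
    returns a set of (string) rules covering its partition."""
    return {f"rule_covering({min(examples)}..{max(examples)})"}

def parallel_learn(all_examples, n_workers=4):
    # Strategy the survey found best: learn a model per processor on a
    # subset of the data, then combine the models into a single one.
    chunks = [all_examples[i::n_workers] for i in range(n_workers)]
    with ProcessPoolExecutor(max_workers=n_workers) as ex:
        models = list(ex.map(learn_rules, chunks))
    return set().union(*models)          # naive combination: union of rules

if __name__ == "__main__":
    print(parallel_learn(list(range(100))))
```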

19.
Mining streaming data is a hot topic in data mining. When performing classification on data streams, traditional classification algorithms based on decision trees, such as ID3 and C4.5, are relatively inefficient in both time and space due to the characteristics of streaming data. Random decision trees offer advantages in both respects. This paper proposes an incremental algorithm for mining data streams, SRMTDS (Semi-Random Multiple decision Trees for Data Streams), based on random decision trees. SRMTDS uses the Hoeffding bound inequality to choose the minimum number of split examples, a heuristic method to compute the information gain for obtaining the split thresholds of numerical attributes, and a naive Bayes classifier to estimate the class labels of tree leaves. Our extensive experimental study shows that SRMTDS improves on VFDTc, a state-of-the-art decision-tree algorithm for classifying data streams, in time, space, accuracy and anti-noise capability.

20.
Regularized classifiers are a family of kernel-based classification methods generated from Tikhonov regularization schemes, and trigonometric polynomial kernels are among the most important kernels, playing a key role in signal processing. The main goal of this paper is to provide convergence rates for classification algorithms generated by regularization schemes with trigonometric polynomial kernels. As a special case, an error analysis for the support vector machine (SVM) soft margin classifier is presented. The norms of the Fejér operator in the reproducing kernel Hilbert space and its approximation properties in the L_1 space of periodic functions play key roles in the analysis of the regularization error. Some new bounds on the learning rate of regularization algorithms, based on the covering-number measure for normalized loss functions, are established. Together with the analysis of the sample error, explicit learning rates for the SVM are also derived.
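A sketch of using a trigonometric polynomial (Fejér) kernel with an off-the-shelf SVM (the toy task, kernel degree n, and regularization constant are assumptions; the Fejér kernel formula itself is standard and positive semi-definite):

```python
import numpy as np
from sklearn.svm import SVC

def fejer_kernel(s, t, n=8):
    """Fejér (trigonometric polynomial) kernel on periodic 1-d inputs:
    F_n(u) = (1/(n+1)) * (sin((n+1)u/2) / sin(u/2))^2, with F_n(0) = n+1."""
    u = (s[:, None] - t[None, :]) / 2.0
    num, den = np.sin((n + 1) * u), np.sin(u)
    out = np.full_like(u, float(n + 1))          # limit value at u = 0
    mask = np.abs(den) > 1e-12
    out[mask] = (num[mask] / den[mask]) ** 2 / (n + 1)
    return out

# Hypothetical toy task: classify angles on the circle by which half they lie in.
rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, 200)
y = (np.sin(X) > 0).astype(int)
clf = SVC(kernel="precomputed", C=1.0).fit(fejer_kernel(X, X), y)
X_test = rng.uniform(0, 2 * np.pi, 50)
acc = (clf.predict(fejer_kernel(X_test, X)) == (np.sin(X_test) > 0)).mean()
print("accuracy:", acc)
```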
