Similar literature
1.
This paper concerns learning binary-valued functions defined on R, and investigates how a particular type of ‘regularity’ of hypotheses can be used to obtain better generalization error bounds. We derive error bounds that depend on the sample width (a notion analogous to that of sample margin for real-valued functions). This motivates learning algorithms that seek to maximize sample width.

2.
The approach of learning multiple “related” tasks simultaneously has proven quite successful in practice; however, theoretical justification for this success has remained elusive. The starting point for previous work on multiple task learning has been that the tasks to be learned jointly are somehow “algorithmically related”, in the sense that the results of applying a specific learning algorithm to these tasks are assumed to be similar. We offer an alternative approach, defining relatedness of tasks on the basis of similarity between the example generating distributions that underlie these tasks. We provide a formal framework for this notion of task relatedness, which captures a sub-domain of the wide scope of issues in which one may apply a multiple task learning approach. Our notion of task similarity is relevant to a variety of real life multitask learning scenarios and allows the formal derivation of generalization bounds that are strictly stronger than the previously known bounds for both the learning-to-learn and the multitask learning scenarios. We give precise conditions under which our bounds guarantee generalization on the basis of smaller sample sizes than the standard single-task approach. Editors: Daniel Silver, Kristin Bennett, Richard Caruana. A preliminary version of this paper appears in the proceedings of COLT’03 (Ben-David and Schuller 2003).

3.
Memory-based collaborative filtering (CF) aims at predicting the rating of a certain item for a particular user based on the previous ratings from similar users and/or similar items. Previous studies in finding similar users and items have several drawbacks. First, they are based on user-defined similarity measurements, such as Pearson Correlation Coefficient (PCC) or Vector Space Similarity (VSS), which are, for the most part, neither adaptive nor optimized for specific applications and data. Second, these similarity measures are restricted to symmetric ones, such that the similarity between A and B is the same as that between B and A, although symmetry does not hold in many real world applications. Third, they typically treat the similarity functions between users and the similarity functions between items separately. However, in reality, the similarities between users and between items are inter-related. In this paper, we propose a novel unified model for users and items, known as Similarity Learning based Collaborative Filtering (SLCF), based on a novel adaptive bidirectional asymmetric similarity measurement. Our proposed model automatically learns asymmetric similarities between users and items at the same time through matrix factorization. Theoretical analysis shows that our model is a novel generalization of singular value decomposition (SVD). We show that, once the similarity relation is learned, it can be used flexibly in many ways for rating prediction. To take full advantage of the model, we propose several strategies to make the best use of the proposed similarity function for rating prediction. The similarity can be used either to improve the memory-based approaches or directly in a model-based CF approach. In addition, we also propose an online version of the rating prediction method to incorporate new users and new items.
We evaluate SLCF using three benchmark datasets, including MovieLens, EachMovie and Netflix, through which we show that our methods can outperform many state-of-the-art baselines.
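As a point of reference for the symmetric baseline measures named in the abstract above, the following is a minimal sketch of the Pearson Correlation Coefficient between two users, computed over the items both have rated. It is not the SLCF model; the function name `pearson_similarity`, the dict-based rating format, and the toy ratings are assumptions made for this illustration.

```python
from math import sqrt


def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation over co-rated items (hypothetical helper).

    ratings_a, ratings_b: dicts mapping item id -> numeric rating.
    Returns 0.0 when fewer than two items are shared or when either
    user's shared ratings have zero variance.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_b = sum(ratings_b[i] for i in common) / len(common)
    num = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    den_a = sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common))
    den_b = sqrt(sum((ratings_b[i] - mean_b) ** 2 for i in common))
    if den_a == 0 or den_b == 0:
        return 0.0
    return num / (den_a * den_b)


# Toy ratings, invented for this example only.
alice = {"m1": 5, "m2": 3, "m3": 4}
bob = {"m1": 4, "m2": 2, "m3": 3, "m4": 5}
sim_ab = pearson_similarity(alice, bob)
sim_ba = pearson_similarity(bob, alice)
```

Swapping the arguments yields the same value, which is the symmetry restriction the abstract identifies as a drawback of such hand-defined measures.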

4.
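Objective: Existing deep learning models often require large-scale training data, whereas few-shot classification aims to recognize target classes that have only a few labeled samples. Metric-based meta-learning methods, currently the mainstream approach to few-shot learning, mostly do not use samples from the few-shot target classes during training, so the feature representations of these models do not generalize well to the target classes. To improve the generalization ability of meta-learning-based few-shot image recognition, this paper proposes a few-shot image recognition method supervised by class semantic similarity. Method: The classic word embedding model GloVe (global vectors for word representation) is used to learn a word embedding vector for the English name of each class in the image dataset, and the cosine distance between class word embedding vectors is used to represent the semantic similarity between classes. By integrating the semantic relatedness between classes as prior knowledge, a semantic similarity measure between classes is introduced as additional supervision during model training, yielding a feature representation with stronger constraints on class sample features and better generalization. Results: Extensive experiments were conducted on two few-shot learning benchmark datasets, miniImageNet and tieredImageNet, to verify the effectiveness of the proposed method. On miniImageNet, in the 5-way 1-shot and 5-way 5-shot settings, the proposed method improves classification accuracy over prototypical networks by 1.9% and 0.32%, respectively; on tieredImageNet, in the 5-way 1-shot setting, classification accuracy improves by 0.33% over prototypical networks. Conclusion: The proposed few-shot image recognition model supervised by class semantic similarity improves the generalization ability of few-shot learning methods and the accuracy of few-shot image recognition.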
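The class-level semantic similarity described in the abstract above is derived from the cosine between class-name word embeddings. The sketch below shows one way such a pairwise similarity matrix could be computed. The helper names and the two-dimensional toy vectors are assumptions for illustration; they are not real GloVe embeddings and this is not the authors' training code.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def class_similarity_matrix(embeddings):
    """Pairwise cosine similarities between class embeddings.

    embeddings: dict mapping class name -> embedding vector.
    Returns (names, matrix) with names in sorted order.
    """
    names = sorted(embeddings)
    matrix = [
        [cosine_similarity(embeddings[a], embeddings[b]) for b in names]
        for a in names
    ]
    return names, matrix


# Hypothetical 2-D vectors standing in for class-name embeddings.
toy_embeddings = {"cat": [1.0, 0.0], "dog": [1.0, 1.0], "car": [0.0, 1.0]}
names, sims = class_similarity_matrix(toy_embeddings)
```

In a setup like the one the abstract describes, a matrix of this kind would serve as a fixed prior that an additional loss term compares against similarities in the learned feature space; how that term is weighted is not stated in the abstract.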

5.
Many pattern recognition algorithms are based on the nearest-neighbour search and use the well-known edit distance, for which the primitive edit costs are usually fixed in advance. In this article, we aim at learning an unbiased stochastic edit distance in the form of a finite-state transducer from a corpus of (input, output) pairs of strings. Contrary to the other standard methods, which generally use the Expectation Maximisation algorithm, our algorithm learns a transducer independently of the marginal probability distribution of the input strings. Proceeding in this unbiased way requires optimising the parameters of a conditional transducer instead of a joint one. We apply our new model in the context of handwritten digit recognition. We show, through a large series of experiments, that it always outperforms the standard edit distance.
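For context, the standard edit distance with fixed primitive costs, which the abstract above uses as its baseline, can be sketched with the usual dynamic program. This is only the fixed-cost baseline, not the learned stochastic transducer; the function name and the cost parameters are assumptions for this illustration.

```python
def edit_distance(source, target, ins_cost=1, del_cost=1, sub_cost=1):
    """Minimum total cost of insertions, deletions and substitutions
    that turn `source` into `target`, with costs fixed in advance."""
    # prev[j] holds the cost of turning the processed prefix of
    # `source` into target[:j]; start with the empty source prefix.
    prev = [j * ins_cost for j in range(len(target) + 1)]
    for i, s in enumerate(source, start=1):
        curr = [i * del_cost]  # delete all i source symbols
        for j, t in enumerate(target, start=1):
            substitute = prev[j - 1] + (0 if s == t else sub_cost)
            delete = prev[j] + del_cost
            insert = curr[j - 1] + ins_cost
            curr.append(min(substitute, delete, insert))
        prev = curr
    return prev[-1]


unit_cost = edit_distance("kitten", "sitting")
costly_substitution = edit_distance("kitten", "sitting", sub_cost=2)
```

Changing `sub_cost` changes the resulting distance, which illustrates why the choice of primitive costs matters and why one might prefer to learn them from data.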

6.

Similar item recommendations—a common feature of many Web sites—point users to other interesting objects given a currently inspected item. A common way of computing such recommendations is to use a similarity function, which expresses how much alike two given objects are. Such similarity functions are usually designed based on the specifics of the given application domain. In this work, we explore how such functions can be learned from human judgments of similarities between objects, using two domains of “quality and taste”—cooking recipe and movie recommendation—as guiding scenarios. In our approach, we first collect a few thousand pairwise similarity assessments with the help of crowdworkers. Using these data, we then train different machine learning models that can be used as similarity functions to compare objects. Offline analyses reveal for both application domains that models that combine different types of item characteristics are the best predictors for human-perceived similarity. To further validate the usefulness of the learned models, we conducted additional user studies. In these studies, we exposed participants to similar item recommendations using a set of models that were trained with different feature subsets. The results showed that the combined models that exhibited the best offline prediction performance led to the highest user-perceived similarity, but also to recommendations that were considered useful by the participants, thus confirming the feasibility of our approach.


7.
For improving the classification performance on the cheap, it is necessary to exploit both labeled and unlabeled samples by applying semi-supervised learning methods, most of which are built upon the pair-wise similarities between the samples. While the similarities have so far been formulated in a heuristic manner such as by k-NN, we propose methods to construct similarities from the probabilistic viewpoint. The kernel-based formulation of a transition probability is first proposed via comparing kernel least squares to variational least squares in the probabilistic framework. The formulation results in a simple quadratic program, which flexibly introduces the constraint to improve practical robustness and is efficiently solved by SMO. The kernel-based transition probability is by nature favorably sparse even without applying k-NN and induces a similarity measure with the same characteristics. Besides, to cope with multiple types of kernel functions, the multiple transition probabilities obtained correspondingly from the kernels can be probabilistically integrated with prior probabilities represented by linear weights. We propose a computationally efficient method to optimize the weights in a discriminative manner. The optimized weights contribute straightforwardly to a composite similarity measure and can also be used to integrate the multiple kernels themselves, as multiple kernel learning does, which consequently yields various types of multiple kernel based semi-supervised classification methods. In experiments on semi-supervised classification tasks, the proposed methods perform favorably compared to the other methods in terms of classification performance and computation time.

8.
The field of reinforcement learning (RL) has been energized in the past few decades by elegant theoretical results indicating under what conditions, and how quickly, certain algorithms are guaranteed to converge to optimal policies. However, in practical problems, these conditions are seldom met. When we cannot achieve optimality, the performance of RL algorithms must be measured empirically. Consequently, in order to meaningfully differentiate learning methods, it becomes necessary to characterize their performance on different problems, taking into account factors such as state estimation, exploration, function approximation, and constraints on computation and memory. To this end, we propose parameterized learning problems, in which such factors can be controlled systematically and their effects on learning methods characterized through targeted studies. Apart from providing very precise control of the parameters that affect learning, our parameterized learning problems enable benchmarking against optimal behavior; their relatively small sizes facilitate extensive experimentation. Based on a survey of existing RL applications, in this article, we focus our attention on two predominant, “first-order” factors: partial observability and function approximation. We design an appropriate parameterized learning problem, through which we compare two qualitatively distinct classes of algorithms: on-line value function-based methods and policy search methods. Empirical comparisons among various methods within each of these classes project Sarsa(λ) and Q-learning(λ) as winners among the former, and CMA-ES as the winner in the latter. Comparing Sarsa(λ) and CMA-ES further on relevant problem instances, our study highlights regions of the problem space favoring their contrasting approaches.
Short run-times for our experiments allow for an extensive search procedure that provides additional insights on relationships between method-specific parameters (such as eligibility traces, initial weights, and population sizes) and problem instances.

9.
Learning hash functions/codes for similarity search over multi-view data is attracting increasing attention, where similar hash codes are assigned to data objects that exhibit consistent neighborhood relationships across views. Traditional methods in this category inherently suffer from three limitations: 1) they commonly adopt a two-stage scheme in which a similarity matrix is first constructed, followed by hash function learning; 2) they are commonly developed on the assumption that data samples with multiple representations are noise-free, which is not practical in real-life applications; and 3) they often incur a cumbersome training model caused by the neighborhood graph construction using all N points in the database (O(N)). In this paper, we motivate the problem of jointly and efficiently training robust hash functions over data objects with multi-feature representations that may be corrupted by noise. To achieve both robustness and training efficiency, we propose an approach to effectively and efficiently learn low-rank kernelized hash functions shared across views. Specifically, we utilize landmark graphs to construct tractable similarity matrices in multi-views to automatically discover neighborhood structure in the data. To learn robust hash functions, a latent low-rank kernel function is used to construct hash functions in order to accommodate linearly inseparable data. In particular, a latent kernelized similarity matrix is recovered by rank minimization on multiple kernel-based similarity matrices. Extensive experiments on real-world multi-view datasets validate the efficacy of our method in the presence of error corruptions. Note: we use the term “kernelized similarity” rather than “kernel” because the data-landmark affinity matrix is not a square symmetric matrix.

10.
Measuring visual similarity between two or more instances within a data distribution is a fundamental task in image retrieval. Theoretically, non-metric distances are able to generate a more complex and accurate similarity model than metric distances, provided that the non-linear data distribution is precisely captured by the system. In this work, we explore neural network models for learning a non-metric similarity function for instance search. We argue that non-metric similarity functions based on neural networks can build a better model of human visual perception than standard metric distances. As our proposed similarity function is differentiable, we explore a real end-to-end trainable approach for image retrieval, i.e. we learn the weights from the input image pixels to the final similarity score. Experimental evaluation shows that non-metric similarity networks are able to learn visual similarities between images and improve performance on top of state-of-the-art image representations, boosting results on standard image retrieval datasets with respect to standard metric distances.

11.
The transverse function approach to control, introduced by Morin and Samson in the early 2000s, is based on functions that are transverse to a set of vector fields in a sense formally similar to, although strictly speaking different from, the classical notion of transversality in differential topology. In this paper, a precise link is established between transversality and the functions used in the transverse function approach. It is first shown that a smooth function ${f : M \longrightarrow Q}$ is transverse to a set of vector fields which locally span a distribution D on Q if, and only if, its tangent mapping T f is transverse to D, where D is regarded as a submanifold of the tangent bundle T Q. It is further shown that each of these two conditions is equivalent to transversality of T f to D along the zero section of T M. These results are then used to rigorously state and prove that if M is compact and D is a distribution on Q, then the set of mappings of M into Q that are transverse to D is open in the strong (or “Whitney $C^\infty$-”) topology on $C^\infty(M, Q)$.

12.
13.
In this paper, a k-nearest-neighbours (KNN) based method is presented for the classification of time series, which uses qualitative learning to identify similarities by means of kernels. To this end, time series are transformed into symbol strings by means of several discretization methods, and a distance based on a kernel between symbols in ordinal scale is used to calculate the similarity between time series. Hence, the idea proposed is the simultaneous use of symbolic representation together with a kernel-based approach for the classification of time series. The methodology has been tested and compared with quantitative learning on a shared television-viewing data set and has yielded a high identification success ratio.

14.
In this paper we present an analysis of the application of the two most important types of similarity measures for moving object trajectories in machine learning from vessel movement data. These similarities are applied in the tasks of clustering, classification and outlier detection. The first type consists of alignment measures, such as dynamic time warping and edit distance. The second type is based on the integral over time between two trajectories. Following earlier work, we define these measures in the context of kernel methods, which provide state-of-the-art, robust algorithms for the tasks studied. Furthermore, we include the influence of applying piecewise linear segmentation as pre-processing to the vessel trajectories when computing alignment measures, since this has been shown to have a positive effect on computation time and performance. In our experiments the alignment based measures show the best performance. Regular versions of edit distance give the best performance in clustering and classification, whereas the softmax variant of dynamic time warping works best in outlier detection. Moreover, piecewise linear segmentation has a positive effect on alignments, because salient points in a trajectory, which are especially important in clustering and outlier detection, are highlighted by the segmentation and have a large influence on the alignments. Based on our experiments, integral over time based similarity measures are not well-suited for learning from vessel trajectories.
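To make the notion of an alignment measure concrete, here is a minimal sketch of classic dynamic time warping between two one-dimensional sequences. It is a simplification under stated assumptions: the abstract concerns vessel trajectories and also a softmax variant embedded in a kernel, neither of which is reproduced here, and the function name `dtw_distance` and the absolute-difference local cost are choices made for this example.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping cost between two non-empty
    sequences of numbers, using |x - y| as the local cost."""
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j]: best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([0, 0], [1, 1])
```

The first pair differs only in timing, so the warping path absorbs the repeated sample and the cost is zero; the second pair differs in value at every step, so no warping can remove the cost.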

15.
Sublearning, a model for learning of subconcepts of a concept, is presented. Sublearning a class of total recursive functions informally means to learn all functions from that class together with all of their subfunctions. While in language learning it is known to be impossible to learn any infinite language together with all of its sublanguages, the situation changes for sublearning of functions. Several types of sublearning are defined and compared to each other as well as to other learning types. For example, in some cases, sublearning coincides with robust learning. Furthermore, whereas in usual function learning there are classes that cannot be learned consistently, all sublearnable classes of some natural types can be learned consistently. Moreover, the power of sublearning is characterized in several terms, thereby establishing a close connection to measurable classes and variants of this notion. As a consequence, there are rich classes which do not need any self-referential coding for sublearning them.

16.
Feature selection methods often improve the performance of attribute-value learning. We explore whether also in relational learning, examples in the form of clauses can be reduced in size to speed up learning without affecting the learned hypothesis. To this end, we introduce the notion of safe reduction: a safely reduced example cannot be distinguished from the original example under the given hypothesis language bias. Next, we consider the particular, rather permissive bias of bounded treewidth clauses. We show that under this hypothesis bias, examples of arbitrary treewidth can be reduced efficiently. We evaluate our approach on four data sets with the popular system Aleph and the state-of-the-art relational learner nFOIL. On all four data sets we make learning faster in the case of nFOIL, achieving an order-of-magnitude speed up on one of the data sets, and more accurate in the case of Aleph.

17.
In this paper, we develop a domain decomposition spectral method for mixed inhomogeneous boundary value problems of high order differential equations defined on unbounded domains. We introduce an orthogonal family of new generalized Laguerre functions, with the weight function $x^\alpha$, $\alpha$ being any real number. The corresponding quasi-orthogonal approximation and Gauss-Radau type interpolation are investigated, which play important roles in the related spectral and collocation methods. As examples of applications, we propose the domain decomposition spectral methods for two fourth order problems, and the spectral method with essential imposition of boundary conditions. The spectral accuracy is proved. Numerical results demonstrate the effectiveness of the suggested algorithms.

18.
Due to their storage efficiency and fast query speed, cross-media hashing methods have attracted much attention for retrieving semantically similar data over heterogeneous datasets. Supervised hashing methods, which utilize label information to improve the quality of hashing functions, achieve promising performance. However, the existing supervised methods generally focus on utilizing coarse semantic information between samples (e.g. similar or dissimilar) and ignore fine semantic information between samples, which may degrade the quality of hashing functions. Accordingly, in this paper, we propose a supervised hashing method for cross-media retrieval which utilizes the coarse-to-fine semantic similarity to learn a shared space. The inter-category and intra-category semantic similarities are effectively preserved in the shared space. Then an iterative descent scheme is proposed to achieve an optimal relaxed solution, and hashing codes can be generated by quantizing the relaxed solution. Finally, to further improve the discrimination of hashing codes, an orthogonal rotation matrix is learned by minimizing the quantization loss while preserving the optimality of the relaxed solution. Extensive experiments on the widely used Wiki and NUS-WIDE datasets demonstrate that the proposed method outperforms the existing methods.

19.
In domains like bioinformatics, information retrieval and social network analysis, one can find learning tasks where the goal consists of inferring a ranking of objects, conditioned on a particular target object. We present a general kernel framework for learning conditional rankings from various types of relational data, where rankings can be conditioned on unseen data objects. We propose efficient algorithms for conditional ranking by optimizing squared regression and ranking loss functions. We show theoretically that learning with the ranking loss is likely to generalize better than with the regression loss. Further, we prove that symmetry or reciprocity properties of relations can be efficiently enforced in the learned models. Experiments on synthetic and real-world data illustrate that the proposed methods deliver state-of-the-art performance in terms of predictive power and computational efficiency. Moreover, we also show empirically that incorporating symmetry or reciprocity properties can improve the generalization performance.

20.
This paper presents a framework for automatically learning rules of a simple game of cards using data from a vision system observing the game being played. Incremental learning of object and protocol models from video, for use by an artificial cognitive agent, is presented. iLearn, a novel algorithm for inducing univariate decision trees for symbolic datasets, is introduced. iLearn builds the decision tree in an incremental way, allowing automatic learning of rules of the game.

