1.
A stopping criterion for active learning  Total citations: 1, self-citations: 0, citations by others: 1
Active learning (AL) is a framework that attempts to reduce the cost of annotating training material for statistical learning methods. While many papers have applied AL to natural language processing tasks and reported impressive savings, little work has been done on defining a stopping criterion. In this work, we present a stopping criterion for active learning based on the way instances are selected during uncertainty-based sampling, and we verify its applicability in a variety of settings. The statistical learning models used in our study are support vector machines (SVMs), maximum entropy models and Bayesian logistic regression, and the tasks performed are text classification, named entity recognition and shallow parsing. In addition, we present a method for multiclass, mutually exclusive SVM active learning.
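The abstract describes uncertainty-based sampling but does not spell out the exact stopping rule. A minimal sketch, assuming a linear classifier whose confidence is the absolute margin |w·x + b|: the most uncertain pool instance is the one closest to the decision boundary, and the hypothetical stopping check (stop once the average confidence on a fixed monitoring set has fallen for `patience` consecutive rounds) is an illustrative assumption, not the paper's criterion. All function names are hypothetical.

```python
def margin(w, b, x):
    """Score w.x + b of a linear classifier such as a linear SVM."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def most_uncertain(w, b, pool):
    """Uncertainty sampling: index of the pool instance closest to the boundary."""
    return min(range(len(pool)), key=lambda i: abs(margin(w, b, pool[i])))


def mean_confidence(w, b, monitor_set):
    """Average |margin| over a fixed set of unlabeled instances (assumed confidence measure)."""
    return sum(abs(margin(w, b, x)) for x in monitor_set) / len(monitor_set)


def should_stop(confidence_history, patience=3):
    """Hypothetical rule: stop once confidence has dropped for `patience` rounds in a row."""
    if len(confidence_history) <= patience:
        return False
    recent = confidence_history[-(patience + 1):]
    return all(later < earlier for earlier, later in zip(recent, recent[1:]))
```

In a full loop, each round would retrain the classifier, append `mean_confidence(...)` to the history, and query `most_uncertain(...)` until `should_stop` returns `True`.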
2.
The support vector machine (SVM) is a general and powerful supervised learning machine. However, in many practical machine learning and data mining applications, unlabeled training examples are readily available while labeled ones are expensive to obtain, which has motivated semi-supervised learning. Combining SVMs with semi-supervised principles such as transductive learning has attracted increasing attention. The transductive support vector machine (TSVM) learns a large-margin hyperplane classifier from labeled training data while simultaneously forcing this hyperplane to stay far from the unlabeled data. TSVM might seem to be the ideal semi-supervised algorithm, since it combines the powerful regularization of SVMs with a direct implementation of the cluster assumption; however, its objective function is non-convex and therefore difficult to optimize. This paper addresses that difficulty. We use the least squares support vector machine to implement TSVM, which ensures that the objective function is convex, so the optimal solution can be found by solving a set of linear equations. Simulation results show that the proposed method effectively exploits unlabeled data to achieve good performance.
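The convexity argument above rests on the least squares SVM, whose training reduces to a single linear system. A minimal sketch of the standard supervised LS-SVM classifier (Suykens-style formulation, solving [[0, yᵀ], [y, Ω + I/γ]]·[b; α] = [0; 1] with Ω_kl = y_k·y_l·K(x_k, x_l)) using plain Gauss-Jordan elimination. It does not show how the paper incorporates unlabeled data, which the abstract does not specify; the linear kernel, `gamma`, and all function names are illustrative assumptions.

```python
def linear_kernel(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def solve(A, rhs):
    """Gauss-Jordan elimination with partial pivoting (A[0][0] is 0, so pivoting is required)."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def train_lssvm(X, y, gamma=10.0, kernel=linear_kernel):
    """Build and solve the LS-SVM linear system; returns (b, alpha)."""
    n = len(X)
    A = [[0.0] + [float(yk) for yk in y]]
    for k in range(n):
        row = [float(y[k])]
        for l in range(n):
            omega = y[k] * y[l] * kernel(X[k], X[l])
            row.append(omega + (1.0 / gamma if k == l else 0.0))
        A.append(row)
    sol = solve(A, [0.0] + [1.0] * n)
    return sol[0], sol[1:]


def predict(x, X, y, b, alpha, kernel=linear_kernel):
    """Decision rule sign(sum_k alpha_k * y_k * K(x, x_k) + b)."""
    score = sum(a * yk * kernel(x, xk) for a, yk, xk in zip(alpha, y, X)) + b
    return 1 if score >= 0 else -1
```

Because the equality constraints replace the SVM's inequality constraints, training is one dense solve rather than a quadratic program; the trade-off is that α is generally not sparse.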
3.
Personalized transductive learning (PTL) builds a unique local model for classifying each individual test sample and is therefore inherently neighborhood dependent; i.e. a specific model is built in a subspace spanned by a set of samples adjacent to the test sample. While existing PTL methods usually define the neighborhood by a predefined (dis)similarity measure, this paper introduces a new concept of a knowledgeable neighborhood and a transductive Support Vector Machine (SVM) classification tree (t-SVMT) for PTL. The neighborhood of a test sample is constructed over the classification knowledge modelled by regional SVMs, and a set of such SVMs adjacent to the test sample is systematically aggregated into a t-SVMT. Compared to a regular SVM and other SVMTs, a t-SVMT, by virtue of its aggregation of SVMs, has an inherent advantage in classifying class-imbalanced datasets. The t-SVMT also solves the over-fitting problem of all previous SVMTs, since it aggregates neighborhood knowledge and thus significantly reduces the size of the SVM tree. The properties of the t-SVMT are evaluated through experiments on a synthetic dataset, eight benchmark cancer diagnosis datasets, and a case study of face membership authentication.
4.
5.
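When making ensemble decisions, existing ensemble classification methods require common labeled training samples that are valid for all component classifiers. To solve the problem of ensemble classification decisions on data streams when no labeled samples are available, a fusion strategy for data-stream ensemble classifiers based on constraint learning is proposed. When deciding on a test sample, a method that satisfies the constraint measure of every local classifier is designed according to transductive learning theory; this guarantees the feasibility of the constraints and solves the problem of extending maximum entropy transductively when aggregating distributed classifiers. Experiments on test datasets show that, compared with existing transductive learning methods, this method achieves better decision accuracy and can be applied to fusion in data-stream ensemble classification.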
6.
7.
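The support vector machine is a new pattern recognition method developed in recent years on the basis of statistical learning theory, and it shows many distinctive advantages in small-sample, nonlinear and high-dimensional pattern recognition problems. Transductive inference attempts to build, from known samples, a method and a set of criteria for recognizing particular unknown samples. Compared with traditional inductive learning, transductive learning is often more broadly applicable and of greater practical significance. We propose a progressive transductive classification learning algorithm based on support vector machines, which achieves good learning results on a mixed training set consisting of a small number of labeled samples and a large number of unlabeled samples.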
8.
Ho SS, Wechsler H. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 30(9): 1557-1571
There has recently been growing interest in the use of transductive inference for learning. Here we extend the scope of transductive inference to active learning in a stream-based setting. To that end, this paper proposes Query-by-Transduction (QBT), a novel active learning algorithm. QBT queries the label of an example based on the p-values obtained using transduction. We show that QBT is closely related to Query-by-Committee (QBC) using relations between transduction, Bayesian statistical testing, Kullback-Leibler divergence, and Shannon information. The feasibility and utility of QBT are shown on both binary and multi-class classification tasks using SVM as the classifier of choice. Our experimental results show that, in the stream-based setting, QBT compares favorably in terms of mean generalization against random sampling, committee-based active learning, margin-based active learning, and QBC.
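QBT's query decision depends on p-values obtained by transduction. A minimal sketch of how such p-values can be computed in the transductive confidence machine style: each candidate label is tentatively assigned to the new example, nonconformity scores are recomputed over the whole bag, and the p-value is the fraction of examples at least as nonconforming as the new one. The 1-nearest-neighbour nonconformity score (the paper uses SVMs), the query rule (ask for a label when the two largest p-values are within `threshold` of each other), and the threshold value are assumptions for illustration.

```python
import math


def nonconformity(i, xs, ys):
    """1-NN score: distance to nearest same-label point / distance to nearest other-label point."""
    same = [math.dist(xs[i], xs[j]) for j in range(len(xs)) if j != i and ys[j] == ys[i]]
    other = [math.dist(xs[i], xs[j]) for j in range(len(xs)) if ys[j] != ys[i]]
    if not same:
        return math.inf
    if not other:
        return 0.0
    d_other = min(other)
    return math.inf if d_other == 0 else min(same) / d_other


def p_values(train_x, train_y, x, labels):
    """Transductive p-value of each candidate label for the new example x."""
    result = {}
    for label in labels:
        xs, ys = train_x + [x], train_y + [label]
        scores = [nonconformity(i, xs, ys) for i in range(len(xs))]
        # Fraction of examples (including x itself) at least as strange as x.
        result[label] = sum(s >= scores[-1] for s in scores) / len(scores)
    return result


def should_query(pvals, threshold=0.2):
    """Hypothetical rule: query when the two largest p-values are too close to call."""
    top = sorted(pvals.values(), reverse=True)
    return top[0] - top[1] < threshold
```

An example that sits well inside one class gets one high and one low p-value and is not queried, while an example halfway between the classes gets similar p-values for both labels and is queried.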
9.
10.
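In recent years, Twitter search has attracted growing attention from researchers in the social network field. Although learning to rank can integrate the rich features available in Twitter, a lack of training data degrades its performance. Transductive learning, a commonly used semi-supervised learning method, plays an important role in addressing the scarcity of training data. Because noise is generated during the iterations of transductive learning, cluster-based transductive learning methods have been proposed. Cluster-based transductive learning has two important parameters: the clustering threshold and the number of clustered documents. Building on previous work, we propose using a different clustering algorithm. Extensive experiments on the standard TREC dataset Tweets11 show that both the clustering threshold and the number of documents selected during clustering affect the model's retrieval performance. We also analyze the robustness of the cluster-based transductive learning model across different query sets. Finally, we introduce a quality-control factor called cluster cohesion and propose a cluster-based adaptive transductive method for Twitter retrieval. Experimental results show that the cluster-based adaptive learning algorithm is more robust.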
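The abstract names cluster cohesion as a quality-control factor but does not define it. A minimal sketch, assuming documents are term-weight vectors and assuming cohesion means the mean pairwise cosine similarity within a cluster; `min_cohesion` and `max_docs` are hypothetical stand-ins for the clustering threshold and the number of clustered documents, and the selection policy is illustrative rather than the paper's method.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def cohesion(cluster):
    """Assumed cohesion: mean pairwise cosine similarity (singletons count as fully cohesive)."""
    n = len(cluster)
    if n < 2:
        return 1.0
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(cosine(cluster[i], cluster[j]) for i, j in pairs) / len(pairs)


def select_pseudo_labeled(clusters, min_cohesion=0.8, max_docs=10):
    """Take documents from the most cohesive clusters first, skip clusters below the threshold."""
    selected = []
    for cluster in sorted(clusters, key=cohesion, reverse=True):
        if cohesion(cluster) < min_cohesion:
            break
        selected.extend(cluster)
    return selected[:max_docs]
```

Filtering by cohesion limits how much noise each transductive iteration feeds back into the ranker, at the cost of adding fewer pseudo-labeled documents per round.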