1.
2.
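Multi-instance multi-label learning (MIML) is a machine learning framework proposed to handle ambiguity: each object is represented by a set of instances and is associated with a set of class labels. E-MIMLSVM+ is a classic MIML classification algorithm based on the degeneration strategy, but it cannot learn from unlabeled samples, which limits its generalization ability. To address this, the algorithm is improved with a semi-supervised support vector machine. The improved algorithm learns from a small number of labeled samples together with a large number of unlabeled samples, which helps uncover the hidden structure of the sample set and reflect its true distribution. Comparative experiments show that the improved algorithm effectively raises the generalization performance of the classifier.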
3.
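Most multi-label learning methods represent ambiguity in the output space by associating a single instance with multiple class labels at once. Recent work instead converts each single instance into a bag of instances in the input space, establishing a connection between the multiple instances in the bag and the multiple labels. When generating the bag, that algorithm uses equal-weight averaging to compute the mean of the examples associated with each label. Because data exhibit local distribution characteristics, taking the local distribution into account when computing this mean makes the generated bags more accurate. This paper takes the distribution characteristics of the data fully into account and proposes a new classification algorithm. Experiments show that the improved algorithm outperforms other commonly used multi-label learning algorithms.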
4.
In multi-instance multi-label learning (MIML), each example is not only represented by multiple instances but also associated with multiple class labels. Several learning frameworks, such as traditional supervised learning, can be regarded as degenerated versions of MIML. Therefore, an intuitive way to solve the MIML problem is to identify its equivalent in one of these degenerated versions. However, this identification process can lose useful information encoded in the training examples and thus impair the learning algorithm's performance. In this paper, RBF neural networks are adapted to learn from MIML examples. Connections between instances and labels are directly exploited in the first-layer clustering and second-layer optimization. The proposed method demonstrates superior performance on two real-world MIML tasks.
5.
6.
Multi-instance multi-label learning (MIML) is a newly proposed framework in which multi-label problems are investigated by representing each sample with multiple feature vectors, called instances. In this framework, the multi-label learning task becomes one of learning a many-to-many relationship, and it also offers a way to explain why a given sample has certain class labels. The connections between instances and labels, as well as the correlations among labels, are equally crucial information for MIML. However, existing MIML algorithms can rarely exploit both simultaneously. In this paper, a new MIML algorithm based on Gaussian processes is proposed. The basic idea is to assume, for each label, a latent function with a Gaussian process prior in the instance space, and to infer the predictive probability of labels by integrating over the uncertainty in these functions using the Bayesian approach. The connection between instances and each label is exploited by defining a likelihood function, and the correlations among labels are captured by the covariance matrix of the latent functions. Moreover, since different relationships between instances and labels can be captured by defining different likelihood functions, the algorithm may be used for problems with various multi-instance assumptions. Experimental results on several benchmark data sets show that the proposed algorithm is valid and can achieve superior performance to existing ones.
7.
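An improved K-nearest-neighbor algorithm for multi-label learning. Total citations: 1 (self-citations: 0; citations by others: 1)
ML-KNN applies the KNN idea to multi-label learning, but it suffers from high time complexity and low classification accuracy on minority classes. This paper proposes a weighted ML-KNN algorithm, WML-KNN, which uses sampling and weighting to reduce time complexity while improving minority-class accuracy. Experiments show that WML-KNN outperforms other commonly used multi-label algorithms.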
8.
9.
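Traditional text classification algorithms no longer meet the needs of texts with a special structure, so a text classification algorithm based on the multi-instance learning framework is proposed. Each text is treated as a bag, and its title and body are treated as the two instances of that bag. A multi-class support vector machine algorithm based on one-class classification maps the bags into a high-dimensional feature space, and a Gaussian kernel is introduced to train the classifier, which then predicts the classes of unlabeled texts. Experimental results show that the algorithm achieves higher classification accuracy than traditional machine learning classification algorithms, and it offers a new perspective for text mining research on specially structured texts.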
10.
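Traditional multi-label learning is supervised and requires complete class labels. When the data set is large and has many classes, however, obtaining a training set with complete labels is very difficult. Within the framework of semi-supervised co-training, this paper therefore proposes a semi-supervised multi-label learning algorithm based on Tri-training (SMLT). In the learning stage, SMLT introduces a virtual class label and then, for each pair of class labels, trains a corresponding classifier with the Tri-training co-training mechanism. In the prediction stage, a new sample is fed to these classifiers; the number of votes each class label receives turns the multi-label learning problem into a label ranking problem, and the number of votes for the virtual label serves as the threshold that splits the ranking. Comparative experiments on four commonly used multi-label data sets from UCI show that SMLT mostly outperforms the other compared algorithms on four evaluation metrics, which verifies its effectiveness.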
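The prediction stage described above (pairwise votes, then a cut at the virtual label) can be sketched roughly as follows. This is only an illustration under stated assumptions: the name `VIRTUAL`, the `pairwise_winner` callable that stands in for the trained Tri-training classifiers, the toy scores, and the choice of a strict "more votes than the virtual label" cut are all hypothetical and not taken from the paper.

```python
from itertools import combinations

VIRTUAL = "__virtual__"  # hypothetical name for the virtual class label


def predict_label_set(labels, pairwise_winner):
    """Rank labels by pairwise votes and keep those that beat the virtual label.

    labels: list of real label names.
    pairwise_winner: callable (a, b) -> a or b; it stands in for the decision
        of the trained pairwise classifier on one sample (an assumption).
    """
    candidates = list(labels) + [VIRTUAL]
    votes = {label: 0 for label in candidates}
    # One vote per pair of labels, including pairs with the virtual label.
    for a, b in combinations(candidates, 2):
        votes[pairwise_winner(a, b)] += 1
    threshold = votes[VIRTUAL]
    ranking = sorted(labels, key=lambda name: votes[name], reverse=True)
    # Assumed rule: a label is predicted only if it has strictly more votes.
    predicted = [name for name in ranking if votes[name] > threshold]
    return ranking, predicted, votes


# Toy stand-in for trained classifiers: the label with the higher score wins.
toy_scores = {"sky": 0.9, "sea": 0.7, "tree": 0.2, VIRTUAL: 0.5}


def toy_winner(a, b):
    return a if toy_scores[a] >= toy_scores[b] else b


ranking, predicted, votes = predict_label_set(["sky", "sea", "tree"], toy_winner)
```

With the toy scores, the labels scoring above the virtual label collect more votes than it does and form the predicted set, while the full ranking is still available for ranking-based metrics.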
11.
《Expert systems with applications》2014,41(15):6755-6772
We present a novel hybrid algorithm for Bayesian network structure learning, called H2PC. It first reconstructs the skeleton of a Bayesian network and then performs a Bayesian-scoring greedy hill-climbing search to orient the edges. The algorithm is based on divide-and-conquer constraint-based subroutines to learn the local structure around a target variable. We conduct two series of experimental comparisons of H2PC against Max–Min Hill-Climbing (MMHC), which is currently the most powerful state-of-the-art algorithm for Bayesian network structure learning. First, we use eight well-known Bayesian network benchmarks with various data sizes to assess the quality of the learned structure returned by the algorithms. Our extensive experiments show that H2PC outperforms MMHC in terms of goodness of fit to new data and quality of the network structure with respect to the true dependence structure of the data. Second, we investigate H2PC’s ability to solve the multi-label learning problem. We provide theoretical results to characterize and identify graphically the so-called minimal label powersets that appear as irreducible factors in the joint distribution under the faithfulness condition. The multi-label learning problem is then decomposed into a series of multi-class classification problems, where each multi-class variable encodes a label powerset. H2PC is shown to compare favorably to MMHC in terms of global classification accuracy over ten multi-label data sets covering different application domains. Overall, our experiments support the conclusions that local structural learning with H2PC in the form of local neighborhood induction is a theoretically well-motivated and empirically effective learning framework that is well suited to multi-label learning. The source code (in R) of H2PC as well as all data sets used for the empirical tests are publicly available.
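As a rough illustration of the decomposition step only, the sketch below encodes each group of labels as one multi-class variable whose values are label subsets. It does not implement H2PC: the partition `label_groups` is simply given as input here, whereas the paper derives its minimal label powersets from the learned graph, and the function names are hypothetical.

```python
def encode_powersets(label_sets, label_groups):
    """Turn multi-label targets into one multi-class target per label group.

    label_sets: list of sets, the labels attached to each sample.
    label_groups: list of lists partitioning the label space (assumed given;
        how H2PC finds these groups is not shown here).
    Returns one list of class values per group; each value is a sorted tuple
    holding the subset of that group's labels present in the sample.
    """
    targets = []
    for group in label_groups:
        group_set = set(group)
        targets.append([tuple(sorted(labels & group_set)) for labels in label_sets])
    return targets


def decode_powersets(per_group_classes):
    """Union the class predicted for each group back into one label set."""
    result = set()
    for cls in per_group_classes:
        result.update(cls)
    return result


samples = [{"a", "b"}, {"c"}, set()]
groups = [["a", "b"], ["c"]]
targets = encode_powersets(samples, groups)
```

Each entry of `targets` can then be handed to any multi-class classifier, and the per-group predictions are merged with `decode_powersets`.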
12.
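Semi-supervised multi-label classification is usually decomposed into several single-label semi-supervised binary classification problems, which ignores the intrinsic relationships among classes. To address this, a semi-supervised multi-label classification method based on local learning is proposed. The method avoids solving multiple single-label semi-supervised binary problems and instead takes a holistic approach: using a graph-based method, it introduces a sample-based local learning regularizer and a class-based Laplacian regularizer to build a regularization framework for the problem. Experimental results show that the proposed algorithm achieves high recall and precision.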
13.
14.
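Multi-label learning is a common class of problems in practical applications. The covering algorithm performs well in single-label learning but cannot handle the multi-label case. This paper extends the covering algorithm to multi-label learning: the learning and construction procedures are adapted to the characteristics and evaluation metrics of multi-label learning, and the algorithm outputs the membership degree of each sample to be classified for every class. The algorithm is applied to a gene data set and a natural scene data set; experimental results show that it achieves good classification results and outperforms most comparable algorithms.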
15.
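The multi-label classification algorithm based on label-specific features clusters the positive and negative example sets of each label, constructs a label-specific feature subset for each example from its distances to the cluster centers, generates a new training set, and trains traditional binary classifiers on it. When constructing the feature subsets, the algorithm uses equal weights and ignores the correlations among examples. An improved multi-label classification algorithm is proposed that uses weighting to make the generated feature subsets more accurate, which helps improve classification accuracy. Experiments show that the improved algorithm outperforms other commonly used multi-label classification algorithms.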
16.
Hidden Markov Models (HMMs) are one of the most fundamental and widely used statistical tools for modeling discrete time series. In general, learning HMMs from data is computationally hard (under cryptographic assumptions), and practitioners typically resort to search heuristics which suffer from the usual local optima issues. We prove that under a natural separation condition (bounds on the smallest singular value of the HMM parameters), there is an efficient and provably correct algorithm for learning HMMs. The sample complexity of the algorithm does not explicitly depend on the number of distinct (discrete) observations; it depends on this quantity only implicitly, through spectral properties of the underlying HMM. This makes the algorithm particularly applicable to settings with a large number of observations, such as those in natural language processing where the space of observations is sometimes the words in a language. The algorithm is also simple, employing only a singular value decomposition and matrix multiplications.
17.
Multi-instance learning was first proposed by Dietterich et al. (Artificial Intelligence 89(1–2):31–71, 1997) when they were investigating the problem of drug activity prediction. Here, the training set is composed of labeled bags, each of which consists of many unlabeled instances. The goal of this learning framework is to learn a classifier from the training set that correctly labels unseen bags. Since Dietterich et al., many studies of this learning framework have been carried out and many new algorithms have been proposed, for example DD, EM-DD, and Citation-kNN. All of these algorithms work on the full data set. But as in single-instance learning, different features in the training set have different effects on classifier training. In this paper, we study the problem of feature selection in multi-instance learning. We extend the data reliability measure so that it selects the key features in the multi-instance scenario.
18.
Zhi-Hua Zhou, Min-Ling Zhang, Sheng-Jun Huang, Yu-Feng Li 《Artificial Intelligence》2012,176(1):2291-2320
In this paper, we propose the MIML (Multi-Instance Multi-Label learning) framework where an example is described by multiple instances and associated with multiple class labels. Compared to traditional learning frameworks, the MIML framework is more convenient and natural for representing complicated objects which have multiple semantic meanings. To learn from MIML examples, we propose the MimlBoost and MimlSvm algorithms based on a simple degeneration strategy, and experiments show that solving problems involving complicated objects with multiple semantic meanings in the MIML framework can lead to good performance. Considering that the degeneration process may lose information, we propose the D-MimlSvm algorithm which tackles MIML problems directly in a regularization framework. Moreover, we show that even when we do not have access to the real objects and thus cannot capture more information from real objects by using the MIML representation, MIML is still useful. We propose the InsDif and SubCod algorithms. InsDif works by transforming single-instances into the MIML representation for learning, while SubCod works by transforming single-label examples into the MIML representation for learning. Experiments show that in some tasks they are able to achieve better performance than learning the single-instances or single-label examples directly.
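To make the idea of degeneration concrete, here is a minimal sketch of one way an MIML data set could be reduced to binary multi-instance examples, one per (bag, label) pair. The abstract does not spell out the transformation, so the function name, the data layout, and the +1/-1 encoding are assumptions for illustration rather than the authors' implementation.

```python
def degenerate_to_multi_instance(miml_examples, label_space):
    """Expand each MIML example into one binary multi-instance example per label.

    miml_examples: list of (bag, label_set); a bag is a tuple of instances and
        each instance is a tuple of feature values.
    label_space: list of all possible labels.
    Returns a list of ((bag, label), target) with target +1 if the label is in
    the example's label set and -1 otherwise.
    """
    degenerated = []
    for bag, label_set in miml_examples:
        for label in label_space:
            target = 1 if label in label_set else -1
            degenerated.append(((bag, label), target))
    return degenerated


# Two toy bags with two-dimensional instances (made-up numbers).
toy_data = [
    (((0.1, 0.2), (0.4, 0.3)), {"lion", "grass"}),
    (((0.9, 0.8),), {"sea"}),
]
toy_labels = ["lion", "grass", "sea"]
pairs = degenerate_to_multi_instance(toy_data, toy_labels)
```

Because every label is turned into its own binary target, a reduction of this kind no longer represents which labels occur together, which is one way the information loss mentioned in the abstract can arise.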
19.
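Because the label space is very large, label imbalance is widespread in multi-label data sets, and addressing it can improve multi-label classification performance to some extent. Exploiting label correlations is one of the most common and effective strategies for this problem and has been studied extensively; however, most of this work relies on positive correlations. In real problems, negative correlations among labels may also exist alongside positive ones; if, while considering positive corre...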
20.
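To handle the effects of illumination changes, shape changes, appearance changes, and occlusion on object tracking, an object tracking algorithm based on Speeded-Up Robust Features (SURF) and multi-instance learning (MIL) is proposed. First, SURF features are extracted from the target and its surrounding image region. Second, the SURF descriptors are introduced into MIL as the instances in the positive and negative bags. Third, all extracted SURF features are clustered to build a visual vocabulary. Fourth, the importance of each visual word in the bags is computed to build a "word-document" matrix, and the latent semantic features of the bags are obtained through latent semantic analysis (LSA). Finally, a support vector machine (SVM) is trained on the latent semantic features of the bags, so that the MIL problem can be solved as a supervised learning problem; the classifier then decides whether a region is the target of interest, which achieves visual tracking. Experiments show that the proposed algorithm is reasonably robust to target scale changes and short-term partial occlusion.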