Similar articles
 20 similar articles found (search time: 31 ms)
1.
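Inaccurate and semantically ambiguous user tags lead to low retrieval accuracy for collaboratively tagged images, and existing spam-tag filtering methods tend to focus on the tags themselves while ignoring the association between collaborative tags and images. Based on an analysis of the association between the visual content of collaboratively tagged images and their tags, this paper proposes a spam-tag detection method that relies on the visual content of those images. The method analyzes the visual content of images that share the same tag, designs different kernel functions for the color and SIFT (scale-invariant feature transform) feature subsets, and maps the two low-dimensional features into a high-dimensional multimodal feature space to form a mixed kernel function. Images under the same tag are then clustered by max-min distance clustering based on the mixed kernel. A tag that falls in a minority cluster is only weakly associated with the image content and is therefore treated as a mislabeled tag, which is how spam tags are detected. Experimental results show that the method improves the accuracy of spam-tag detection for collaboratively tagged images.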

2.
In this paper, we present a novel approach for recovering a 3-D pose from a single human body depth silhouette using nonrigid point set registration and body part tracking. In our method, a human body depth silhouette is presented as a set of 3-D points and matched to another set of 3-D points using point correspondences. To recognize and maintain body part labels, we initialize the first set of points to corresponding human body parts, resulting in a body part-labeled map. Then, we transform the points to a sequential set of points based on point correspondences determined by nonrigid point set registration. After point registration, we utilize the information from tracked body part labels and registered points to create a human skeleton model. A 3-D human pose is recovered by mapping joint information from the skeleton model to a 3-D synthetic human model. Quantitative and qualitative evaluation results on synthetic and real data show that complex human poses can be recovered more reliably with lower errors compared to other conventional techniques for 3-D pose recovery.

3.
In this paper, we investigate two soft-biometric problems: 1) age estimation and 2) pose estimation, within the scenario where uncertainties exist for the available labels of the training samples. These two tasks are generally formulated as the automatic design of a regressor from training samples with uncertain nonnegative labels. First, the nonnegative label is predicted as the Frobenius norm of a matrix, which is bilinearly transformed from the nonlinear mappings of a set of candidate kernels. Two transformation matrices are then learned for deriving such a matrix by solving two semidefinite programming (SDP) problems, in which the uncertain label of each sample is expressed as two inequality constraints. The objective function of SDP controls the ranks of these two matrices and, consequently, automatically determines the structure of the regressor. The whole framework for the automatic design of a regressor from samples with uncertain nonnegative labels has the following characteristics: 1) the SDP formulation makes full use of the uncertain labels, instead of using conventional fixed labels; 2) regression with the Frobenius norm of matrix naturally guarantees the nonnegativity of the labels, and greater prediction capability is achieved by integrating the squares of the matrix elements, which to some extent act as weak regressors; and 3) the regressor structure is automatically determined by the pursuit of simplicity, which potentially promotes the algorithmic generalization capability. Extensive experiments on two human age databases: 1) FG-NET and 2) Yamaha, and the Pointing'04 head pose database, demonstrate encouraging estimation accuracy improvements over conventional regression algorithms without taking the uncertainties within the labels into account.

4.
The goal in multi-label classification is to tag a data point with the subset of relevant labels from a pre-specified set. Given a set of L labels, a data point can be tagged with any of the 2^L possible subsets. The main challenge therefore lies in optimising over this exponentially large label space subject to label correlations. Our objective, in this paper, is to design efficient algorithms for multi-label classification when the labels are densely correlated. In particular, we are interested in the zero-shot learning scenario where the label correlations on the training set might be significantly different from those on the test set. We propose a max-margin formulation where we model prior label correlations but do not incorporate pairwise label interaction terms in the prediction function. We show that the problem complexity can be reduced from exponential to linear while modelling dense pairwise prior label correlations. By incorporating relevant correlation priors we can handle mismatches between the training and test set statistics. Our proposed formulation generalises the effective 1-vs-All method and we provide a principled interpretation of the 1-vs-All technique. We develop efficient optimisation algorithms for our proposed formulation. We adapt the Sequential Minimal Optimisation (SMO) algorithm to multi-label classification and show that, with some book-keeping, we can reduce the training time from being super-quadratic to almost linear in the number of labels. Furthermore, by effectively re-utilizing the kernel cache and jointly optimising over all variables, we can be orders of magnitude faster than the competing state-of-the-art algorithms. We also design a specialised algorithm for linear kernels based on dual co-ordinate ascent with shrinkage that lets us effortlessly train on a million points with a hundred labels.
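
To make the size of the label space described in item 4 concrete, the following minimal Python sketch enumerates every label subset for a small, made-up label set and compares that count with the L independent binary decisions a 1-vs-All scheme makes. The label names and the helper `all_label_subsets` are illustrative assumptions and are not taken from the cited paper.

```python
from itertools import combinations


def all_label_subsets(labels):
    """Return every subset of `labels` as a tuple (hypothetical helper)."""
    subsets = []
    for size in range(len(labels) + 1):
        subsets.extend(combinations(labels, size))
    return subsets


# Illustrative label set; the names are assumptions made for this sketch.
labels = ["sky", "sea", "tree"]
subsets = all_label_subsets(labels)

# A data point can be tagged with any of the 2^L subsets ...
print(len(subsets))
# ... whereas 1-vs-All trains one binary classifier per label.
print(len(labels))
```

Even at L = 20 the subset count already exceeds one million, which is why a formulation that is linear in L matters.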

5.
Cheng Yusheng, Song Fan, Qian Kun. Applied Intelligence, 2021, 51(10): 6997-7015

In a multi-label learning framework, each instance may belong to multiple labels simultaneously. The classification accuracy can be improved significantly by exploiting various correlations, such as label correlations, feature correlations, or the correlations between features and labels. There are few studies on how to combine the feature and label correlations, and they deal mostly with complete data sets. However, missing labels or other phenomena often occur because of the cost or technical limitations in the data acquisition process. The few label completion algorithms currently suitable for missing multi-label learning ignore the noise interference of the feature space. At the same time, the threshold of the discriminant function often affects the classification results, especially those of the labels near the threshold. All these factors pose considerable difficulties in dealing with missing labels using label correlations. Therefore, we propose a missing multi-label learning algorithm with non-equilibrium based on a two-level autoencoder. First, label density is introduced to enlarge the classification margin of the label space. Then, a new supplementary label matrix is augmented from the missing label matrix with the non-equilibrium label completion method. Finally, considering feature space noise, a two-level kernel extreme learning machine autoencoder is constructed to implement the information feature and label correlation. The effectiveness of the proposed algorithm is verified by many experiments on both missing and complete label data sets. Statistical hypothesis testing validates our approach.


6.
Contrastive learning makes it possible to establish similarities between samples by comparing their distances in an intermediate representation space (embedding space) and using loss functions designed to attract/repel similar/dissimilar samples. The distance comparison is based exclusively on the sample features. We propose a novel contrastive learning scheme by including the labels in the same embedding space as the features and performing the distance comparison between features and labels in this shared embedding space. Following this idea, the sample features should be close to their ground-truth (positive) label and away from the other labels (negative labels). This scheme makes it possible to implement supervised classification based on contrastive learning. Each embedded label will assume the role of a class prototype in embedding space, with sample features that share the label gathering around it. The aim is to separate the label prototypes while minimizing the distance between each prototype and its same-class samples. A novel set of loss functions is proposed with this objective. Loss minimization will drive the allocation of sample features and labels in embedding space. Loss functions and their associated training and prediction architectures are analyzed in detail, along with different strategies for label separation. The proposed scheme drastically reduces the number of pair-wise comparisons, thus improving model performance. In order to further reduce the number of pair-wise comparisons, this initial scheme is extended by replacing the set of negative labels by its best single representative: either the negative label nearest to the sample features or the centroid of the cluster of negative labels. This idea creates a new subset of models which are analyzed in detail. The outputs of the proposed models are the distances (in embedding space) between each sample and the label prototypes. These distances can be used to perform classification (minimum distance label), feature dimensionality reduction (using the distances and the embeddings instead of the original features) and data visualization (with 2-D or 3-D embeddings). Although the proposed models are generic, their application and performance evaluation are done here for network intrusion detection, characterized by noisy and unbalanced labels and a challenging classification of the various types of attacks. Empirical results of the model applied to intrusion detection are presented in detail for two well-known intrusion detection datasets, and a thorough set of classification and clustering performance evaluation metrics is included.
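
The "minimum distance label" prediction step mentioned in item 6 can be sketched as follows. This is only an illustration under stated assumptions: the embeddings are taken as already learned, Euclidean distance is assumed as the metric, and the prototype coordinates, class names, and the function `nearest_label` are hypothetical rather than taken from the cited paper.

```python
import math


def nearest_label(sample_embedding, label_prototypes):
    """Return the label whose prototype is closest (Euclidean) to the sample."""
    return min(
        label_prototypes,
        key=lambda label: math.dist(sample_embedding, label_prototypes[label]),
    )


# Hypothetical 2-D label prototypes in the shared embedding space.
prototypes = {"normal": (0.0, 0.0), "dos": (3.0, 0.0), "probe": (0.0, 3.0)}

# The vector of distances to all prototypes can also serve as a
# low-dimensional representation of the sample.
sample = (2.5, 0.4)
distances = {label: math.dist(sample, p) for label, p in prototypes.items()}

print(nearest_label(sample, prototypes))
```

Comparing each sample with one prototype per class, instead of with every other sample, is what keeps the number of pair-wise comparisons small.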

7.
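Researchers currently tend to study label correlations through correlation information among annotated labels, without considering how the relationship between unannotated and annotated labels affects the quality of the label set. Inspired by K-nearest neighbors, this paper proposes a non-equilibrium label completion algorithm in the neighborhood label space (NeLC-NLS), which aims to make full use of the correlations among elements in the neighborhood space to improve the quality of the neighborhood label space and thereby multi-label classification performance. First, the information entropy between labels is used to measure the strength of their relationship, yielding a basic label confidence matrix. Then, the proposed non-equilibrium label confidence matrix computation is applied to obtain a non-equilibrium label confidence matrix that contains more information. Next, the similarity of samples in feature space is measured to obtain the k nearest samples in the neighborhood label space, and the non-equilibrium label confidence matrix is used to compute the label completion matrix of the neighborhood label space. Finally, an extreme learning machine is used as a linear classifier for classification. Experiments on eight public benchmark multi-label datasets show that NeLC-NLS has certain advantages, and hypothesis testing and stability analysis further demonstrate the effectiveness of the algorithm.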

8.
A clean map visualization requires the fewest possible overlaps and depends on how labels are attached to point features. In this paper, we address the cartographic label placement variant problem whose objective is to label a set of points maximizing the number of conflict‐free points. Thus, we propose a hybrid data mining heuristic to solve the point‐feature cartographic label placement problem based on a clustering search (CS) heuristic, a state‐of‐the‐art method for this problem. Although several works have investigated the combination of data mining and multistart metaheuristics, this is the first time data mining has been used to improve CS and simulated annealing based heuristics. Computational experiments showed that the proposed hybrid heuristic was able to reach better cost solutions than the original strategy, with the same time effort. The proposed heuristic also could find almost all known optimal solutions and improved most of the best results for the set of large instances reported so far in the literature.

9.
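In traditional single-label mining research, each sample belongs to exactly one label and the labels are mutually exclusive. In multi-label learning, by contrast, a sample may correspond to multiple labels, and the labels are often correlated. Research on label correlations has become a popular topic in multi-label learning. First, to suit big-data environments, the traditional association rule mining algorithm Apriori is parallelized, yielding the Hadoop-based parallel algorithm Apriori_ING, in which each node independently performs candidate itemset generation, pruning, and support counting, making full use of parallelism. The frequent itemsets and association rules obtained by Apriori_ING are used to generate label sets, and an inference-engine-based label set generation algorithm, IETG, is proposed. The label sets are then applied to multi-label learning, giving the multi-label learning algorithm FreLP. FreLP uses association rules to generate label sets, decomposes the original label set into multiple subsets, and then trains classifiers with the LP algorithm. Experiments comparing FreLP with existing multi-label learning algorithms show that the proposed algorithm achieves better results under different evaluation metrics.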
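
The three Apriori steps named in item 9 (candidate generation, pruning, and support counting) can be illustrated with a small serial sketch in Python. This is the classic level-wise Apriori on a single machine, not the Hadoop-based Apriori_ING of the cited paper; the function `frequent_itemsets`, the example label sets, and the support threshold are assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori: map each frequent itemset to its support count."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while candidates:
        # Support counting: number of transactions containing each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Candidate generation: join frequent sets into sets one item larger.
        size += 1
        joined = {a | b for a in level for b in level if len(a | b) == size}
        # Pruning: keep a candidate only if all of its (size - 1)-subsets
        # were frequent at the previous level.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        ]
    return frequent


# Hypothetical label sets of four samples.
label_sets = [
    {"beach", "sea", "sky"},
    {"sea", "sky"},
    {"beach", "sea"},
    {"sky", "tree"},
]
result = frequent_itemsets(label_sets, min_support=2)
for itemset, support in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), support)
```

In a parallel setting, the support-counting step is the natural part to distribute, since each node can count candidates over its own share of the transactions.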

10.
Medication label design is frequently a contributing factor to medication errors. Design regulations and recommendations have been predominantly aimed at manufacturers’ product labels. Pharmacy-generated labels have received less scrutiny despite being an integral artifact throughout the medication use process. This article is an account of our efforts to improve the design of a hospital’s intravenous (IV) medication labels. Our analysis revealed a set of interrelated processes and stakeholders that restrict the range of feasible label designs. The technological and system constraints likely vary among hospitals and represent significant barriers to developing and implementing specific design standards. We propose both an ideal IV label design and one that adheres to the current constraints of the hospital under study.

Relevance to industry

Hospitals are tasked with creating customized medication labels with minimal guidance. Our process, findings, and proposed labels provide insight for similar investigations at other institutions.

11.
The performances of two different estimators of a discriminant function of a statistical pattern recognizer are compared. One estimator is based on binary label values of the objects of the learning set (hard labels) and the other on continuous or multi-discrete label values in the interval [0,1] (fuzzy labels). The latter estimator uses more detailed a priori knowledge of the contributing learning objects. In a discrete feature space, in which a multinomial distribution function has been assumed to exist, the expected classification error, based on fuzzy labels, can be more accurate than the one based on hard

12.
Crowdsourcing provides an effective and low-cost way to collect labels from crowd workers. Due to the lack of professional knowledge, the quality of crowdsourced labels is relatively low. A common approach to addressing this issue is to collect multiple labels for each instance from different crowd workers and then a label integration method is used to infer its true label. However, to our knowledge, almost all existing label integration methods merely make use of the original attribute information and do not pay attention to the quality of the multiple noisy label set of each instance. To solve these issues, this paper proposes a novel three-stage label integration method called attribute augmentation-based label integration (AALI). In the first stage, we design an attribute augmentation method to enrich the original attribute space. In the second stage, we develop a filter to single out reliable instances with high-quality multiple noisy label sets. In the third stage, we use majority voting to initialize integrated labels of reliable instances and then use cross-validation to build multiple component classifiers on reliable instances to predict all instances. Experimental results on simulated and real-world crowdsourced datasets demonstrate that AALI outperforms all the other state-of-the-art competitors.
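
Majority voting, which item 12 uses to initialize the integrated labels, can be sketched in a few lines of Python. This shows only that initialization step, not the full AALI method; the instance names, the worker labels, and the tie-breaking rule (the label seen first wins) are assumptions made for illustration.

```python
from collections import Counter


def majority_vote(noisy_labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(noisy_labels).most_common(1)[0][0]


# Hypothetical noisy label sets collected from three crowd workers.
crowd_labels = {
    "img1": ["cat", "cat", "dog"],
    "img2": ["dog", "dog", "dog"],
    "img3": ["bird", "cat", "bird"],
}

integrated = {name: majority_vote(votes) for name, votes in crowd_labels.items()}
print(integrated)
```

The agreement ratio of the winning label (for example 2/3 for "img1" versus 3/3 for "img2") is one simple signal of how reliable a noisy label set is.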

13.
Minimum Squared Error Classification (MSEC) is a learning method for predicting the class labels of samples in real time. However, as a regression algorithm, MSEC tries its best to map the training samples into their class labels using a linear projection without considering the manifold structure of the data. In this paper, we introduce a supervised label learning framework using an effective manifold learning strategy. This method, which is referred to as Manifold Supervised Label Prediction (MSLP), generalizes the MSEC objective function to incorporate intra-class relationships of data. Thus, in addition to relying on the relationship between a training sample and its label, we propose to also learn the relationship between the training samples while transforming them. As a testbed for MSLP, we apply it to an image identification task in which image samples with a very low spatial resolution (16 × 16) are used. These images have been heavily down-sampled in order to reduce their size and hence the computation time. We also show that the blurring process for reducing the artifacts introduced by down-sampling serendipitously results in better identification accuracies. Finally, unlike MSEC, which classifies a query sample based on the deviation between the predicted and the true class labels, we compare both the training and the query samples in the label prediction space. A set of comprehensive experiments on benchmark palmprint databases including Multispectral PolyU, PolyU 2D/3D, and PolyU Contact-free I shows meaningful improvements over existing state-of-the-art algorithms.

14.
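A feature-mapping-based method for clustering microblog user interests from user tags   Cited by: 1 (self-citations: 1, citations by others: 0)
Existing user interest clustering methods do not consider the semantic correlation among user tags. To address this, a feature-mapping-based method for clustering microblog user interests from user tags is proposed. First, the user tags of the users to be analyzed and of the users they follow are collected, and tags whose frequency exceeds a set threshold are selected to form the feature dimensions of a fuzzy matrix. Then, taking the semantic correlation among tags into account, feature mapping is used to map each user tag onto each feature dimension according to its semantic similarity to that dimension's tag, and the feature value for each dimension is computed. Finally, fuzzy clustering yields user interest clustering results under different thresholds. Experimental results show that the proposed method effectively improves user interest clustering.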

15.
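Multi-label text classification based on the PLSA topic model   Cited by: 1 (self-citations: 1, citations by others: 0)
To address the unclear relationships among text labels and the excessive feature dimensionality in multi-label text classification, a multi-label hypothesis-reuse text classification algorithm based on the probabilistic latent semantic analysis (PLSA) model is proposed. The method first maps the training samples into the latent semantic space through the PLSA model and represents each text by its topic distribution, which removes noise while greatly reducing data dimensionality. On this basis, the multi-label algorithm of hypothesis reuse (MAHR) is used for classification. Because the features obtained after PLSA dimensionality reduction already carry semantic information, the algorithm can accurately mine the relationships among labels and use them to train the base classifiers, avoiding the drawback of having label relationships supplied manually. Experiments verify that the method makes full use of the semantic information obtained from PLSA dimensionality reduction to improve multi-label text classification performance.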

16.
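Most existing multi-label learning algorithms share a common strategy during model training: they predict all label classes from the same set of attribute features. This approach does not consider the label-specific features unique to each label. In the label space, such label-specific features are very helpful for distinguishing a label from other labels and for describing its own characteristics. To address this, this paper proposes MLF-KNN, an improved ML-KNN algorithm based on label-specific features and correlations. Unlike previous multi-label algorithms, which operate directly on the original training set, the method first preprocesses the training set to construct feature attributes for each label class, then constructs and optimizes an L1-norm over the resulting label attribute space to introduce correlations among labels, and finally uses the improved ML-KNN algorithm for prediction and classification. Experimental results show that, on the public datasets image and yeast, the proposed MLF-KNN outperforms ML-KNN in classification performance and also shows certain advantages over three other multi-label learning algorithms.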

17.
We introduce a new approach for finding overlapping clusters given pairwise similarities of objects. In particular, we relax the problem of correlation clustering by allowing an object to be assigned to more than one cluster. At the core of our approach is an optimization problem in which each data point is mapped to a small set of labels, representing membership in different clusters. The objective is to find a mapping so that the given similarities between objects agree as much as possible with similarities taken over their label sets. The number of labels can vary across objects. To define a similarity between label sets, we consider two measures: (i) a 0–1 function indicating whether the two label sets have non-zero intersection and (ii) the Jaccard coefficient between the two label sets. The algorithm we propose is an iterative local-search method. The definitions of label set similarity give rise to two non-trivial optimization problems, which, for the measures of set-intersection and Jaccard, we solve using a greedy strategy and non-negative least squares, respectively. We also develop a distributed version of our algorithm based on the BSP model and implement it using a Pregel framework. Our algorithm uses as input pairwise similarities of objects and can thus be applied when clustering structured objects for which feature vectors are not available. As a proof of concept, we apply our algorithms to three different and complex application domains: trajectories, amino-acid sequences, and textual documents.
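
The two label-set similarity measures named in item 17 can be written down directly. The sketch below is a plain Python illustration; the function names, the integer cluster labels, and the convention that two empty label sets have Jaccard similarity 0 are assumptions made here, not details from the cited paper.

```python
def intersection_indicator(labels_a, labels_b):
    """0-1 measure: 1 if the two label sets share at least one label, else 0."""
    return 1 if set(labels_a) & set(labels_b) else 0


def jaccard(labels_a, labels_b):
    """Jaccard coefficient: size of the intersection over size of the union."""
    a, b = set(labels_a), set(labels_b)
    union = a | b
    if not union:
        return 0.0  # convention assumed here for two empty label sets
    return len(a & b) / len(union)


# Hypothetical cluster memberships of three objects.
x, y, z = {1, 2}, {2, 3}, {4}

print(intersection_indicator(x, y), jaccard(x, y))
print(intersection_indicator(x, z), jaccard(x, z))
```

The 0-1 measure only records whether two objects share any cluster, while the Jaccard coefficient also reflects how much of their membership overlaps.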

18.
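With advances in data acquisition devices and modeling techniques, how to analyze and retrieve 3D models efficiently has become a research hotspot in geometry processing. Much current work focuses on model classification, but most of it can handle only a single label. When dealing with multi-label problems, such methods not only take a great deal of time but also ignore the correlations among labels and among samples. To address this, a 3D model annotation method based on multi-label propagation is proposed. Its core idea is to use label correlations and the relationships among samples to explore the multi-label annotation potential of the whole sample space. Specifically, given multi-label information for a small portion of the samples, that information is propagated from the annotated samples to the unannotated samples in the space. The propagation relies mainly on iteratively fusing label information with a dynamic metric, fully accounting for the correlations among labels and among samples, and finally yields annotation results for the whole space. Experiments on several standard 3D model datasets (such as the Princeton Shape Benchmark) show that fairly accurate results can be obtained quickly with only a small amount of interaction.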

19.
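胡峰, 刘鑫, 邓维斌, 代劲, 刘群. 《控制与决策》 (Control and Decision), 2023, 38(6): 1753-1760
Partial label learning is a weakly supervised learning framework that attempts to select the single correct label from a sample's multiple candidate labels. Disambiguation is an important technique in partial label learning, in which an algorithm identifies the latent true label. At present, disambiguation is commonly performed in a single feature space or label space, which can easily cause the algorithm to be guided by inaccurate prior knowledge and become trapped at a saddle point. To address the problem that, during disambiguation, samples with similar features are easily affected by samples of other classes, which degrades the result, dissimilar points (离异点) and a dissimilarity graph (离异图) are defined; on this basis, a partial label learning method with dissimilarity-graph-guided disambiguation is proposed. The method builds the dissimilarity graph from differences in the label space, effectively combining similarity in feature space with dissimilarity in label space and reducing the potential risk that dissimilar points bring to disambiguation. Experimental results show that, compared with PLKNN, IPAL, SURE, PL-AGGD, SDIM, PL-BLC, PRODEN, and other methods, the proposed algorithm performs better among partial label learning methods and achieves good disambiguation.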

20.
Multi-label learning is more complicated than single-label learning since the semantics of the instances are usually overlapped and not identical. Many algorithms become ineffective when the correlations in the feature and label space are not fully exploited. To this end, we propose a novel non-negative matrix factorization (NMF) based modeling and training algorithm that learns from both the adjacencies of the instances and the labels of the training set. In the modeling process, a set of generators are constructed, and the associations among generators, instances, and labels are set up, with which the label prediction is conducted. In the training process, the parameters involved in the process of modeling are determined. Specifically, an NMF based algorithm is proposed to determine the associations between generators and instances, and a non-negative least square optimization algorithm is applied to determine the associations between generators and labels. The proposed algorithm takes full advantage of the smoothness assumption, so that the labels are properly propagated. The experiments were carried out on six sets of benchmarks. The results demonstrate the effectiveness of the proposed algorithms.
