Similar documents
 Found 20 similar documents (search took 31 ms)
1.
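This paper studies the optimum-path forest (OPF) classification algorithm, a fast classifier, and surveys the current state of OPF research and applications. OPF is a recently introduced classification algorithm based on complete graphs. In comparisons with support vector machines (SVM), artificial neural networks (ANN), and other algorithms on several public datasets, it achieves similar or better results and runs faster. The algorithm does not depend on any parameters, requires no parameter optimization, makes no assumptions about the shape of each class, and can handle multi-class problems. The aim is to give readers in China a comprehensive and systematic introduction to progress in OPF research and applications.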

2.
We propose two new methods to label connected components based on iterative recursion: one directly labels an original binary image, while the other labels the boundary voxels and then labels the non-boundary object voxels in one pass. The novelty of the proposed methods is fast labelling of large datasets without stack overflow and a flexible trade-off between speed and memory. For each iterative recursion: (1) the original volume is scanned in raster order and an initially unlabelled object voxel v is selected; (2) a sub-volume with a user-defined size is formed around the selected voxel v; (3) within this sub-volume all object voxels 26-connected to v are labelled using iterations; and (4) subsequent iterative recursions are initiated from those border object voxels of the sub-volume that are 26-connected to v. Our experiments show that reducing the execution time by one-third requires increasing the memory size by three orders of magnitude. The user controls this trade-off by changing the size of the sub-volume. Experiments on large three-dimensional brain phantom datasets (362 × 432 × 362 voxels, 56 MB) show that the proposed methods are three times faster on average (with a maximum speedup of 10) than the existing iterative methods based on label equivalences, with less than 1 MB of memory consumption. Moreover, our algorithms are applicable to data of any dimensionality and are less dependent on the geometric complexity of connected components.
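As a point of reference for the abstract above, the following is a minimal sketch of 26-connected labelling of a binary volume using an explicit stack rather than recursive calls. It is a plain baseline, not the sub-volume method the abstract proposes; the function name `label_components_26` and the nested-list `volume[z][y][x]` layout are assumptions made for illustration.

```python
from itertools import product

# All 26 neighbour offsets in 3-D: every combination of -1/0/1 except the origin.
OFFSETS_26 = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def label_components_26(volume):
    """Label 26-connected components of a binary volume stored as volume[z][y][x].

    Hypothetical helper: returns (labels, component_count), where labels has the
    same shape as volume and background voxels keep label 0.
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    labels = [[[0] * nx for _ in range(ny)] for _ in range(nz)]
    next_label = 0
    # Raster-order scan for an object voxel that has no label yet.
    for z, y, x in product(range(nz), range(ny), range(nx)):
        if not volume[z][y][x] or labels[z][y][x]:
            continue
        next_label += 1
        labels[z][y][x] = next_label
        stack = [(z, y, x)]  # explicit stack, so the call stack cannot overflow
        while stack:
            cz, cy, cx = stack.pop()
            for dz, dy, dx in OFFSETS_26:
                pz, py, px = cz + dz, cy + dy, cx + dx
                inside = 0 <= pz < nz and 0 <= py < ny and 0 <= px < nx
                if inside and volume[pz][py][px] and not labels[pz][py][px]:
                    labels[pz][py][px] = next_label
                    stack.append((pz, py, px))
    return labels, next_label
```

In this baseline the explicit stack can grow with the size of a component; the abstract's sub-volume size is what bounds that growth in the proposed methods.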

3.
Multi-label learning deals with objects associated with multiple class labels, and aims to induce a predictive model which can assign a set of relevant class labels for an unseen instance. Since each class might possess its own characteristics, the strategy of extracting label-specific features has been widely employed to improve the discrimination process in multi-label learning, where the predictive model is induced based on tailored features specific to each class label instead of the identical instance representations. As a representative approach, LIFT generates label-specific features by conducting clustering analysis. However, its performance may be degraded due to the inherent instability of the single clustering algorithm. To improve this, a novel multi-label learning approach named SENCE (stable label-Specific features gENeration for multi-label learning via mixture-based Clustering Ensemble) is proposed, which stabilizes the generation process of label-specific features via clustering ensemble techniques. Specifically, more stable clustering results are obtained by first augmenting the original instance representation with cluster assignments from base clusters and then fitting a mixture model via the expectation-maximization (EM) algorithm. Extensive experiments on eighteen benchmark data sets show that SENCE performs better than LIFT and other well-established multi-label learning algorithms.

4.
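Li Yanchao, Xiao Fu, Chen Zhi, Li Bo. Journal of Software (《软件学报》), 2020, 31(12): 3808-3822
Active learning selects samples from a large pool of unlabeled samples and hands them to experts for labeling. Existing batch-mode active learning algorithms are subject to three main limitations: (1) some methods rely on a single selection criterion or make assumptions about the data or the model, which makes it hard to find unlabeled samples that are both uncertain and representative; (2) their performance depends heavily on the accuracy of the similarity measure between samples, such as a predefined function or a dissimilarity measure; (3) noisy labels have long degraded the performance of batch-mode active learning algorithms. This paper proposes a batch-mode active learning method based on deep learning. A deep neural network generates learned representations of labeled and unlabeled samples, and a label cycle mode links labeled samples to unlabeled samples and then back to labeled samples with the same label. This accounts for both the uncertainty and the representativeness of samples, and makes the algorithm robust to noisy labels. In the proposed method, a submodular function ensures that the selected sample set is diverse. In addition, optimizing adaptive parameters lets the algorithm balance uncertainty and representativeness automatically. The method is applied to semi-supervised classification and semi-supervised clustering, and experimental results show that it outperforms several existing state-of-the-art methods.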

5.
Indefinite kernels have attracted growing attention in machine learning because their application scope is wider than that of the usual positive definite kernels. However, research on indefinite kernel clustering is relatively scarce. Furthermore, existing clustering methods are mainly designed for positive definite kernels and are unsuitable for indefinite kernel scenarios. In this paper, we propose a novel indefinite kernel clustering algorithm termed indefinite kernel maximum margin clustering (IKMMC), based on the state-of-the-art maximum margin clustering (MMC) model. IKMMC tries to find a proxy positive definite kernel to approximate the original indefinite one, and thus embeds a new F-norm regularizer in the objective function to measure the diversity of the two kernels, which can be further optimized by an iterative approach. Concretely, at each iteration, given a set of initial class labels, IKMMC first transforms the clustering problem into a classification problem solved by an indefinite kernel support vector machine (IKSVM) with an extra class balance constraint; the predicted labels are then used as the input class labels for the next iteration, until the prediction error rate is smaller than a prespecified tolerance. Finally, IKMMC uses the labels predicted at the last iteration as the cluster indices. Moreover, we extend IKMMC from binary clustering problems to more complex multi-class scenarios. Experimental results show the superiority of our algorithms.

6.
Multi-view clustering has become an important extension of ensemble clustering. In multi-view clustering, we apply clustering algorithms to different views of the data to obtain different cluster labels for the same set of objects. These results are then combined so that the final clustering gives a better result than clustering each view individually. Multi-view clustering can be applied at various stages of the clustering paradigm. This paper proposes a novel multi-view clustering algorithm that combines different ensemble techniques. Our approach computes different similarity matrices on the individual datasets and aggregates them into a combined similarity matrix, which is then used to obtain the final clustering. We tested our approach on several datasets and compared it with other state-of-the-art algorithms. Our results show that the proposed algorithm outperforms several other methods in terms of accuracy while maintaining the overall complexity of the individual approaches.

7.

The paper presents different clustering approaches applied to legal judgments from the Special Civil Court located at the Federal University of Santa Catarina (JEC/UFSC). The subject is Consumer Law, specifically cases in which consumers claim moral and material compensation from airlines for service failures. To identify patterns in the dataset, we apply four types of clustering algorithms: Hierarchical and Lingo (soft clustering), K-means and Affinity Propagation (hard clustering). We evaluate the results based on the following criteria: (1) entropy and purity; (2) the algorithm's ability to provide labels; (3) a legal expert's evaluation; and (4) experimental complexity. The results demonstrate that the most advantageous approach is Hierarchical Clustering, since it has the best entropy and purity numbers, as well as the least difficulty for the expert in analyzing the clusters and the least experimental complexity. The main contribution of the paper is to show the advantages and disadvantages of each approach, especially for identifying labels in unstructured and non-indexed legal texts.
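Entropy and purity, the first evaluation criterion named above, can be sketched as follows. The abstract does not define them, so this assumes the commonly used definitions: purity is the fraction of items that belong to the majority reference class of their cluster, and entropy is the size-weighted average of each cluster's class entropy in bits. The function name `purity_and_entropy` is hypothetical.

```python
from collections import Counter
from math import log2


def purity_and_entropy(cluster_ids, class_labels):
    """Return (purity, weighted entropy) of a clustering against reference classes.

    Assumed standard definitions; the paper's exact formulas may differ.
    Higher purity and lower entropy both indicate cleaner clusters.
    """
    n = len(cluster_ids)
    # Group the reference class labels by the cluster each item was assigned to.
    members = {}
    for cluster, label in zip(cluster_ids, class_labels):
        members.setdefault(cluster, []).append(label)

    purity = 0.0
    entropy = 0.0
    for labels_in_cluster in members.values():
        size = len(labels_in_cluster)
        counts = Counter(labels_in_cluster)
        # Purity: contribution of the majority class in this cluster.
        purity += max(counts.values()) / n
        # Entropy of the class distribution inside this cluster, in bits.
        h = -sum((k / size) * log2(k / size) for k in counts.values())
        entropy += (size / n) * h
    return purity, entropy
```

For example, a clustering that exactly reproduces the reference classes has purity 1 and entropy 0, while a single cluster holding two equally sized classes has purity 0.5 and entropy 1 bit.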


8.
Hybrid mining approach in the design of credit scoring models   Total citations: 1 (self-citations: 0, citations by others: 1)
Unrepresentative data samples are likely to reduce the utility of data classifiers in practical applications. This study presents a hybrid mining approach to the design of an effective credit scoring model, based on clustering and neural network techniques. We used clustering techniques to preprocess the input samples, with the objective of isolating unrepresentative samples in isolated and inconsistent clusters, and used neural networks to construct the credit scoring model. The clustering stage involved a class-wise classification process. A self-organizing map clustering algorithm was used to automatically determine the number of clusters and the starting points of each cluster. Then, the K-means clustering algorithm was used to generate clusters of samples belonging to new classes and to eliminate the unrepresentative samples from each class. In the neural network stage, samples with new class labels were used in the design of the credit scoring model. Experiments on two real-world credit data sets demonstrate that the hybrid mining approach can be used to build effective credit scoring models.

9.
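Evaluating clustering using classification methods   Total citations: 1 (self-citations: 1, citations by others: 0)
Existing clustering validity indices based on geometric structure cannot effectively evaluate clustering results for data with different structures. To address this, a method is proposed that uses classification to evaluate clustering results. The method treats the class labels that clustering assigns to objects as the known class labels of a classification problem, reclassifies the data set using cross-validation, and measures clustering validity by comparing the differences between the clustering result and the classification result. The underlying idea is that a data set whose structure is easy to cluster should also be easy to classify. Experiments and analysis on simulated and real data verify the feasibility and effectiveness of the method.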
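The evaluation idea in this abstract (treat cluster labels as class labels, reclassify with cross-validation, and measure agreement) can be sketched as below. The abstract does not name a classifier or an agreement measure, so the 1-nearest-neighbour classifier, the interleaved fold assignment, and the function name `cv_agreement` are all assumptions made for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cv_agreement(points, cluster_labels, k_folds=5):
    """Fraction of points whose cross-validated 1-NN prediction equals their cluster label.

    Hypothetical validity score: values near 1.0 mean the clustering is easy
    to reproduce by a classifier; lower values suggest a less natural partition.
    """
    n = len(points)
    agreements = 0
    for fold in range(k_folds):
        # Interleaved folds: point i is held out in fold i % k_folds.
        test_idx = [i for i in range(n) if i % k_folds == fold]
        train_idx = [i for i in range(n) if i % k_folds != fold]
        for i in test_idx:
            # Predict the cluster label of the nearest training point.
            nearest = min(train_idx, key=lambda j: squared_distance(points[i], points[j]))
            if cluster_labels[nearest] == cluster_labels[i]:
                agreements += 1
    return agreements / n
```

Comparing this score across candidate clusterings of the same data gives a rough ranking; any classifier could be substituted for the nearest-neighbour rule used here.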

10.
Video indexing requires the efficient segmentation of video into scenes. The video is first segmented into shots and a set of key-frames is extracted for each shot. Typical scene detection algorithms incorporate time distance in a shot similarity metric. In the method we propose, to overcome the difficulty of having prior knowledge of the scene duration, the shots are clustered into groups based only on their visual similarity and a label is assigned to each shot according to the group that it belongs to. Then, a sequence alignment algorithm is applied to detect when the pattern of shot labels changes, providing the final scene segmentation result. In this way shot similarity is computed based only on visual features, while ordering of shots is taken into account during sequence alignment. To cluster the shots into groups we propose an improved spectral clustering method that both estimates the number of clusters and employs the fast global k-means algorithm in the clustering stage after the eigenvector computation of the similarity matrix. The same spectral clustering method is applied to extract the key-frames of each shot and numerical experiments indicate that the content of each shot is efficiently summarized using the method we propose herein. Experiments on TV-series and movies also indicate that the proposed scene detection method accurately detects most of the scene boundaries while preserving a good tradeoff between recall and precision.

11.
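At present, most search result clustering methods are document-based and cannot generate meaningful cluster labels. To solve this problem, a Chinese search result clustering method based on clustering key noun phrases is proposed. The method takes noun phrases and related search terms as candidate cluster labels, filters the labels using the C-Value algorithm and IDF values, clusters the labels with the Chameleon algorithm, and finally assigns each search result to its most relevant cluster. Experiments show that using key noun phrases and related search terms as cluster labels effectively improves the descriptiveness of the labels and reduces the time complexity of the clustering algorithm.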

12.
Guiwu Wei 《Knowledge》2011,24(5):672-679
This paper investigates dynamic hybrid multiple attribute decision making problems, in which the decision information provided by decision makers at different periods is expressed in real numbers, interval numbers or linguistic labels (linguistic labels can be described by triangular fuzzy numbers). The method first uses three different forms of grey relational analysis (GRA), namely real-valued GRA, interval-valued GRA and fuzzy-valued GRA, to calculate the individual grey relational degree of each alternative to the positive and negative ideal alternatives, based on the decision information expressed in real numbers, interval numbers and linguistic labels, respectively, provided by each decision maker at each period. It then adopts the concepts of fuzzy membership grade and clustering to aggregate the grey relational degrees over all evaluated periods. Finally, an illustrative example is given to verify the developed approach and to demonstrate its practicality and effectiveness.

13.
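Liu Yanqiong, Zhang Wensheng, Li Yiqun, Yang Liu. Computer Engineering (《计算机工程》), 2011, 37(5): 207-209, 212
Traditional clustering methods handle homogeneous data and cannot meet the need to cluster heterogeneous data simultaneously; their clustering accuracy is low and their labels are hard to read. To address these problems, a co-clustering algorithm for heterogeneous data based on resistor networks is proposed. The algorithm abstracts heterogeneous, interrelated data as a resistor network in the form of a multipartite graph, then performs feature computation and clustering on it. Co-clustering the heterogeneous data yields a cluster structure in which each cluster contains several types of heterogeneous data that can serve as labels for one another, giving highly readable labels. Experimental results show that the method is a practical and effective data clustering algorithm.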

14.
Spectral clustering (SC) is currently one of the most popular clustering techniques because of its advantages over conventional approaches such as K-means and hierarchical clustering. However, SC requires computing eigenvectors, which is time consuming. To overcome this limitation, Lin and Cohen proposed the power iteration clustering (PIC) technique (Lin and Cohen in Proceedings of the 27th International Conference on Machine Learning, pp. 655–662, 2010), which is a simple and fast version of SC. Instead of finding the eigenvectors, PIC finds only one pseudo-eigenvector, a linear combination of the eigenvectors, in linear time. However, in certain critical situations, using only one pseudo-eigenvector is not enough for clustering because of the inter-class collision problem. In this paper, we propose a novel method based on the deflation technique to compute multiple orthogonal pseudo-eigenvectors (orthogonality is used to avoid redundancy). Our method is more accurate than PIC but has the same computational complexity. Experiments on synthetic and real datasets demonstrate the improvement of our approach.
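To make the single pseudo-eigenvector idea concrete, here is a minimal sketch of the basic power iteration step that PIC builds on: repeatedly multiply a vector by the row-normalised affinity matrix and renormalise, stopping after a small number of iterations. It does not implement the deflation-based extension proposed in the abstract. The function name `power_iteration_embedding`, the fixed iteration count, and the starting vector are assumptions; every row of the affinity matrix is assumed to have a positive sum.

```python
def power_iteration_embedding(affinity, n_iter=10):
    """Return a one-dimensional pseudo-eigenvector embedding of the data points.

    affinity: square list-of-lists of non-negative similarities.
    Sketch only: a practical version would stop early based on how fast the
    vector is changing, then cluster the returned values (for example with k-means).
    """
    n = len(affinity)
    # Row-normalise the affinity matrix: W = D^{-1} A.
    w = [[a / sum(row) for a in row] for row in affinity]
    # Arbitrary non-constant starting vector (an assumption of this sketch).
    total = n * (n + 1) / 2
    v = [(i + 1) / total for i in range(n)]
    for _ in range(n_iter):
        # One power-iteration step: v <- W v, then L1 normalisation.
        v = [sum(w[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(abs(x) for x in v)
        v = [x / norm for x in v]
    return v
```

Points in the same well-connected group end up with nearly equal values in the returned vector, which is why a single vector often suffices; the inter-class collision problem mentioned in the abstract arises when two different groups happen to land on similar values.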

15.
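Feature-mapping-based interest clustering method for microblog user tags   Total citations: 1 (self-citations: 1, citations by others: 0)
Existing user interest clustering methods do not consider the semantic correlation between user tags. To address this, an interest clustering method for microblog user tags based on feature mapping is proposed. First, the user tags of the users under analysis and of the users they follow are collected, and tags whose frequency exceeds a set threshold are selected as the feature dimensions of a fuzzy matrix. Then, taking the semantic correlation between tags into account, each user tag is mapped onto the feature dimensions according to its semantic similarity to the feature-dimension tags, and the feature value for each dimension is computed. Finally, fuzzy clustering produces user interest clustering results under different thresholds. Experimental results show that the proposed method effectively improves user interest clustering.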

16.
Traditional supervised learning requires ground-truth labels for the training data, which can be difficult to collect in many cases. In contrast, crowdsourcing learning collects noisy annotations from multiple non-expert workers and infers the latent true labels through some aggregation approach. In this paper, we observe that existing deep crowdsourcing work does not sufficiently model worker correlations, which previous non-deep learning approaches have shown to be helpful for learning. We propose a deep generative crowdsourcing learning approach that incorporates the strengths of Deep Neural Networks (DNNs) and exploits worker correlations. The model comprises a DNN classifier as a prior and an annotation generation process. A mixture model of workers' capabilities within each class is introduced into the annotation generation process to model worker correlations. For an adaptive trade-off between model complexity and data fitting, we implement fully Bayesian inference. Based on the natural-gradient stochastic variational inference techniques developed for the Structured Variational AutoEncoder (SVAE), we combine variational message passing for conjugate parameters and stochastic gradient descent for DNN parameters into a unified framework for efficient end-to-end optimization. Experimental results on 22 real crowdsourcing datasets demonstrate the effectiveness of the proposed approach.

17.
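Li Shaoyuan, Wei Menglong, Huang Shengjun. Journal of Software (《软件学报》), 2022, 33(4): 1274-1286
Traditional supervised learning requires ground-truth labels for the training samples, which in many cases are not easy to collect. In contrast, crowdsourcing learning collects annotations from multiple error-prone non-experts and estimates the true labels of the samples through some aggregation method. Existing deep crowdsourcing learning work does not sufficiently model annotator correlations, whereas work on non-deep crowdsourcing learning shows that modeling annotator correlations helps improve learning. A deep generative crowdsourcing learning method is proposed, to...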

18.
This paper presents a novel method for intensity normalization of DaTSCAN SPECT brain images. The proposed methodology is based on Gaussian mixture models (GMMs) and considers not only the intensity levels, but also the coordinates of voxels inside the so-defined spatial Gaussian functions. The model parameters are obtained according to a maximum likelihood criterion employing the expectation maximization (EM) algorithm. First, an averaged control subject image is computed to obtain a threshold-based mask that selects only the voxels inside the skull. Then, the GMM is obtained for the DaTSCAN SPECT database, performing space quantization by populating it with Gaussian kernels whose linear combination approximates the image intensity. According to a probability threshold that measures the weight of each kernel or “cluster” in the striatum area, the voxels in the non-specific region are intensity-normalized by removing clusters whose likelihood is negligible.

19.
The results of traditional clustering methods are usually unreliable because there is no guidance from data labels, whereas class labels can be predicted more reliably by semisupervised learning if the labels of part of the data are given. In this paper, we propose an active self-training clustering method, in which samples are actively selected as a training set to minimize an estimated Bayes error, and semisupervised learning is then used to perform clustering. Traditional graph-based semisupervised learning methods are not convenient for estimating the Bayes error, so we develop a specific regularization framework on graphs for semisupervised learning in which the Bayes error can be effectively estimated. In addition, the proposed clustering algorithm can be readily applied in a semisupervised setting with partial class labels. Experimental results on toy data and real-world data sets demonstrate the effectiveness of the proposed clustering method in both the unsupervised and the semisupervised setting. It is worth noting that the proposed clustering method is free of initialization, while traditional clustering methods usually depend on initialization.

20.
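To address the complex nesting of named entities and the overlapping boundaries of adjacent named entities caused by annotation errors in corpora, a method for recognizing overlapping Chinese named entities is proposed. A hierarchical clustering algorithm based on random merging and splitting partitions the overlapping named-entity labels into different clusters, establishing a one-to-one association between characters and entity labels and preventing the label clustering from getting trapped in local optima. Within each label cluster, a BiLSTM-CRF model that incorporates Chinese radicals improves the stability of overlapping named-entity recognition. Experimental results show that clustering labels in this way effectively keeps annotation errors from interfering with recognition, and that the F1 score improves by 0.05 on average over existing recognition methods.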
