Similar documents
20 similar documents found.
1.
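Multiple kernel learning (MKL) methods have outperformed single-kernel learning methods in both classification and regression tasks, but traditional MKL methods are all designed for two-class or multi-class classification problems. To make MKL applicable to one-class classification (OCC) problems, a one-class support vector machine (OCSVM) based on centered kernel alignment (CKA) is proposed. First, CKA is used to compute the weight of each kernel matrix; the resulting weights are then used as linear combination coefficients, so that kernel functions of different types...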
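
The abstract is cut off before it explains how the CKA scores become weights, so the sketch below only illustrates the ingredients it names: the centered kernel alignment between two kernel matrices, CKA(K1, K2) = <HK1H, HK2H>_F / (||HK1H||_F ||HK2H||_F), and a linear combination of kernels. The weighting rule (mean CKA of each kernel with the others, normalised to sum to 1) and the kernels used are hypothetical stand-ins, not the authors' method.

```python
from math import exp, sqrt

def center(K):
    """Double-center a square kernel matrix: H K H with H = I - (1/n) * ones."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n  # mean of all entries
    return [[K[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]

def frob(A, B):
    """Frobenius inner product <A, B>_F."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def cka(K1, K2):
    """Centered kernel alignment between two kernel matrices."""
    A, B = center(K1), center(K2)
    return frob(A, B) / sqrt(frob(A, A) * frob(B, B))

def linear_kernel(xs):
    return [[a * b for b in xs] for a in xs]

def rbf_kernel(xs, gamma):
    return [[exp(-gamma * (a - b) ** 2) for b in xs] for a in xs]

def cka_weights(kernels):
    # Hypothetical rule: score each kernel by its mean CKA with the other kernels,
    # then normalise so that the weights sum to 1.
    m = len(kernels)
    scores = [sum(cka(kernels[i], kernels[j]) for j in range(m) if j != i) / (m - 1)
              for i in range(m)]
    total = sum(scores)
    return [s / total for s in scores]

def combine(kernels, weights):
    """Linear combination K = sum_m w_m * K_m."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]

xs = [0.0, 0.5, 1.0, 2.0, 3.5]
kernels = [linear_kernel(xs), rbf_kernel(xs, 1.0), rbf_kernel(xs, 0.1)]
weights = cka_weights(kernels)
K = combine(kernels, weights)
```

The combined matrix `K` could then be handed to a one-class SVM that accepts precomputed kernels, for example scikit-learn's `OneClassSVM(kernel="precomputed")`.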

2.
Multiple kernel learning (MKL) has recently become a hot topic in kernel methods. However, many MKL algorithms suffer from high computational cost. Moreover, standard MKL algorithms face the challenge posed by the rapid development of distributed computing environments such as cloud computing. In this study, a framework for parallel multiple kernel learning (PMKL) using a hybrid alternating direction method of multipliers (H-ADMM) is developed to integrate MKL algorithms with multiprocessor systems. The global problem with multiple kernels is divided into multiple local problems, each of which is optimized on a local processor with a single kernel. An H-ADMM is proposed to make the local processors coordinate with each other to reach the globally optimal solution. The results of computational experiments show that PMKL achieves high classification accuracy and fast computation.
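
The split into local problems coordinated by ADMM follows the general global-consensus pattern. Below is a minimal sketch of plain consensus ADMM (not the paper's H-ADMM), assuming toy local objectives f_i(x) = 0.5 (x - a_i)^2 in place of the per-kernel SVM subproblems. Each local step is independent and could run on its own processor; the global step only averages.

```python
def consensus_admm(targets, rho=1.0, iters=200):
    """Global-consensus ADMM for min_x sum_i 0.5 * (x - a_i)^2.

    Worker i holds f_i(x) = 0.5 * (x - a_i)^2 and a local copy x_i;
    the copies are driven to agree on the consensus variable z.
    """
    n = len(targets)
    x = [0.0] * n
    u = [0.0] * n  # scaled dual variables, one per worker
    z = 0.0
    for _ in range(iters):
        # Local step (parallelisable): closed-form argmin of
        # f_i(x) + rho/2 * (x - z + u_i)^2
        x = [(a + rho * (z - ui)) / (1.0 + rho) for a, ui in zip(targets, u)]
        # Global step: average the local estimates plus their duals
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual step: accumulate each worker's disagreement with the consensus
        u = [ui + xi - z for ui, xi in zip(u, x)]
    return z, x

z, local = consensus_admm([1.0, 2.0, 3.0, 6.0])  # z approaches the mean, 3.0
```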

3.
贾俊芳 (Jia Junfang). Journal of Computer Applications (计算机应用), 2011, 31(8): 2134-2137
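To address the slow convergence of traditional active learning (AL) methods when classifying large-scale unlabeled samples, an active learning training algorithm based on hierarchical clustering (HC), called HC_AL, is proposed. The large-scale unlabeled data are clustered hierarchically; at each level, the cluster centers are labeled and their labels stand in for the class labels at that level, and cluster centers at that level whose labels are wrong are added to the training set. Experiments on the datasets showed good generalization ability and fast convergence. The results indicate that refining the model level by level greatly speeds up the convergence of active learning while still yielding satisfactory learning ability.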
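
A simplified, single-level sketch of the labeling idea: cluster the unlabeled points agglomeratively, query the label of one representative ("center") per cluster, and propagate it to the cluster. Single-linkage clustering on 1D points, the medoid-style center, and the `oracle` callable are all assumptions for illustration; the multi-level refinement and the step that adds mislabeled centers to the training set are not shown.

```python
def agglomerate(points, k):
    """Single-linkage agglomerative clustering of 1D points down to k clusters."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(abs(points[i] - points[j]) for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge the closest pair
        del clusters[b]                  # b > a, so index a stays valid
    return clusters

def center_index(points, members):
    """Member closest to the cluster mean, used as the cluster's representative."""
    mean = sum(points[i] for i in members) / len(members)
    return min(members, key=lambda i: abs(points[i] - mean))

def label_by_centers(points, k, oracle):
    """Query one label per cluster (hypothetical oracle) and propagate it."""
    labels = [None] * len(points)
    queried = []
    for members in agglomerate(points, k):
        c = center_index(points, members)
        y = oracle(c)
        queried.append(c)
        for i in members:
            labels[i] = y
    return labels, queried

points = [0.0, 0.2, 0.4, 5.0, 5.1, 5.3]
truth = ["a", "a", "a", "b", "b", "b"]
labels, queried = label_by_centers(points, 2, lambda i: truth[i])
```

Only two labels are queried here, yet all six points receive one; that saving is the motivation the abstract gives for clustering before querying.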

4.
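In recent years, ensemble learning (EL) classification methods have become a research hotspot in land-cover classification. Boosting ensemble methods in particular offer high classification accuracy and strong generalization, and have been widely applied to land-cover classification. However, Boosting is sensitive to noise: when the training samples contain noise, the Boosting algorithm may fail, which is a limitation of the method. To address this problem in land-cover classification, effectively overcome the influence of noise, reduce the "salt-and-pepper" effect in classification results, and improve classification accuracy, a Boosting ensemble classification method based on dual-tree complex wavelet decomposition is proposed. The method applies a one-level dual-tree complex wavelet decomposition to the spectral bands of the image to reduce noise, and feeds the decomposed bands into Boosting ensemble learning to obtain the final classification result. The experiments compared the classification performance of three Boosting algorithms, GBDT, XGBoost, and LightGBM, on SPOT 6 and Sentinel-2A images. The results show that: (1) on the SPOT 6 image, the overall accuracy of all three Boosting algorithms exceeded 90%; DTCWTLightGBM achieved the highest overall accuracy, 94.73%, with a Kappa coefficient of 0.93, an improvement of 1.1% in overall accuracy over LightGBM...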
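
The results above are reported as overall accuracy and the Kappa coefficient. Both come from the confusion matrix: overall accuracy p_o is the trace divided by the total, and Cohen's kappa is (p_o - p_e) / (1 - p_e), where p_e is the agreement expected by chance from the row and column totals. The 2-class matrix below is an illustrative example, not data from the paper.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

def kappa(cm):
    """Cohen's kappa from a square confusion matrix (rows: reference, cols: predicted)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    po = sum(cm[i][i] for i in range(n)) / total
    rows = [sum(cm[i]) for i in range(n)]
    cols = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    pe = sum(rows[i] * cols[i] for i in range(n)) / total ** 2  # chance agreement
    return (po - pe) / (1 - pe)

cm = [[50, 10],
      [5, 35]]
oa = overall_accuracy(cm)  # 0.85
k = kappa(cm)              # (0.85 - 0.51) / (1 - 0.51), about 0.694
```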

5.
Minimal Learning Machine (MLM) is a recently proposed supervised learning algorithm with performance comparable to most state-of-the-art machine learning methods. In this work, we propose ensemble methods for classification and regression using MLMs. The goal of ensemble strategies is to produce more robust and accurate models than a single classifier or regression model. Despite its successful application, MLM involves a computationally intensive optimization problem as part of its test procedure (out-of-sample estimation). This becomes even more noticeable in ensemble learning, where multiple models are used. To provide fast alternatives to the standard MLM, we also propose the Nearest Neighbor Minimal Learning Machine and the Cubic Equation Minimal Learning Machine for classification and single-output regression problems, respectively. The experimental assessment on real-world datasets shows that ensembles of fast MLMs perform comparably to or better than reference machine learning algorithms.
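
A minimal sketch of the MLM idea with nearest-neighbor decoding, under stated assumptions. MLM learns a linear map B from input-space distances D_x (to reference points) to output-space distances Δ_y. For a new input it estimates the output distances as d_x B, and the nearest-neighbor variant then returns the label of the reference output that is estimated to be closest, avoiding the test-time optimization. Two simplifications are hypothetical: every training point is used as a reference, so D_x is square and B is obtained by a direct solve instead of least squares, and the training points are assumed distinct so that D_x is invertible.

```python
from math import dist

def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n, m = len(A), len(B[0])
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + m):
                    M[r][c] -= f * M[col][c]
    return [[M[i][n + j] / M[i][i] for j in range(m)] for i in range(n)]

def fit_mlm(X, y):
    """Fit B so that D_x B = Delta_y, using all training points as references."""
    classes = sorted(set(y))
    Y = [[1.0 if c == label else 0.0 for c in classes] for label in y]  # one-hot outputs
    Dx = [[dist(a, b) for b in X] for a in X]
    Dy = [[dist(a, b) for b in Y] for a in Y]
    return X, y, solve(Dx, Dy)

def predict_nn_mlm(model, x):
    """Estimate output-space distances to every reference, return the nearest one's label."""
    R, labels, B = model
    dx = [dist(x, r) for r in R]
    est = [sum(dx[i] * B[i][k] for i in range(len(R))) for k in range(len(R))]
    return labels[min(range(len(R)), key=est.__getitem__)]

X = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
y = ["a", "a", "a", "b", "b", "b"]
model = fit_mlm(X, y)
```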

6.
In this work, an algorithm is proposed for path planning in a rapidly changing environment. The algorithm is computationally cheap and generates a sub-optimal smooth path with bounds on the allowed velocity, acceleration, and jerk. The algorithm is designed for holonomic omniwheel platforms. It outperforms potential field algorithms in both convergence and optimality. Furthermore, it adapts quickly in a rapidly changing environment because a single update costs only on the order of milliseconds, in contrast with computationally more expensive methods such as wavefront algorithms and global optimization methods, whose cost is mostly on the order of seconds. The algorithm will be tested via simulations and experiments.

7.
汪敏 (Wang Min), 武禹伯 (Wu Yubo), 闵帆 (Min Fan). Journal of Computer Applications (计算机应用), 2020, 40(12): 3437-3444
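Traditional lithology identification methods have low accuracy and are hard to combine with geological experience. To address this, a multi-class active learning algorithm based on multiple clustering algorithms and multiple linear regression (ALCL) is proposed. First, several heterogeneous clustering algorithms are applied to obtain a class matrix for each algorithm, and the class matrices are labeled and pre-classified by querying common points. Second, a maximum-priority search strategy and a most-confused query strategy are proposed to select the key instances used to train the model of clustering-algorithm weight coefficients. Third, an objective function is defined, and the weight coefficient of each clustering algorithm is obtained by training on the key instances. Finally, classification is computed with the weight coefficients, and samples whose results have high confidence are classified. Experiments on six public lithology datasets from oil wells in the Daqing Oilfield show that, at its best, ALCL improves classification accuracy by 2.07% to 14.01% over traditional supervised learning algorithms and other active learning algorithms. Hypothesis testing and significance analysis confirm that ALCL performs better on the lithology identification problem.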
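
The final step (combining the clustering algorithms with their weight coefficients and classifying only high-confidence samples) can be sketched as a weighted vote. This is a hypothetical illustration. The weights are taken as given, although ALCL learns them from the key instances. Confidence is the winning class's share of the total weight, and the `threshold` value is an assumption. Samples below the threshold are left unclassified (`None`).

```python
def weighted_vote(predictions, weights, n_classes):
    """predictions[m] is the class index that clustering algorithm m assigns to the sample."""
    scores = [0.0] * n_classes
    for p, w in zip(predictions, weights):
        scores[p] += w
    total = sum(weights)
    probs = [s / total for s in scores]
    best = max(range(n_classes), key=probs.__getitem__)
    return best, probs[best]

def classify_confident(all_predictions, weights, n_classes, threshold=0.7):
    """Label only the samples whose weighted-vote confidence reaches the threshold."""
    out = []
    for preds in all_predictions:
        label, conf = weighted_vote(preds, weights, n_classes)
        out.append(label if conf >= threshold else None)
    return out

weights = [0.5, 0.3, 0.2]  # hypothetical learned weights of three clustering algorithms
result = classify_confident([[0, 0, 1], [1, 0, 0], [2, 2, 2]], weights, n_classes=3)
```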

9.
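Feature extraction is a key step in image recognition, and accurate feature representations yield more accurate classification. A soft-threshold encoder, together with orthogonalizing the visual dictionary with the orthogonal matching pursuit (OMP) algorithm, is used to raise the recognition rate of a single-level computing structure, and a two-level computing structure is then built to obtain more accurate image features and further raise the recognition rate. Experiments show that the soft-threshold encoder and OMP improve the feature-extraction ability of the single-level structure and raise the recognition rate on large-sample datasets. The two-level structure raises the recognition rate on a self-selected dataset, and OMP raises the recognition rate on VOC2012 images. On the self-selected dataset, the two-level structure outperforms the single-level structure, shows an advantage over the NIN architecture, and is comparable to a convolutional neural network (CNN), indicating that it adapts well to this dataset.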
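
A soft-threshold encoder maps an input x to features z_j = max(0, <d_j, x> - α), one per dictionary atom d_j. The sketch below shows only this encoding step. It assumes the common one-sided form and an arbitrary α = 0.25 (both assumptions), and it does not cover OMP or the two-level structure.

```python
def soft_threshold_encode(x, dictionary, alpha=0.25):
    """Encode x against dictionary atoms: z_j = max(0, <d_j, x> - alpha)."""
    return [max(0.0, sum(d * v for d, v in zip(atom, x)) - alpha) for atom in dictionary]

# Hypothetical unit-norm dictionary with three 2-D atoms
D = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
z = soft_threshold_encode([1.0, 0.5], D)  # responses 1.0, 0.5, 1.0 minus alpha
```

Responses below α are set to zero, which keeps the code sparse without solving an optimization problem per input.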

10.
Ensemble-based active learning algorithm for imbalanced data classification (cited 1 time: 0 self-citations, 1 by others)
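Currently, one of the main ways to handle class-imbalanced data is preprocessing: the data are balanced first and then trained with traditional methods. The main preprocessing methods are oversampling and undersampling, but both have drawbacks. This paper proposes the Split-Boost Active Learning (SBAL) algorithm. It splits the majority-class samples into multiple subsets according to the imbalance ratio, merges each subset with the minority-class samples, trains a sub-classifier on each merged set with AdaBoost, and ensembles the sub-classifiers into an overall classifier. Using query-by-committee (QBC) active learning, it then actively selects informative samples for training, which largely avoids the drawbacks of adding or removing samples. Experiments show that the proposed algorithm achieves higher classification accuracy on imbalanced data.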
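
Two pieces of SBAL can be sketched directly: splitting the majority class into roughly imbalance-ratio-many subsets, each merged with all minority samples, and the QBC query step, which picks the pool samples on which the committee disagrees most. Vote entropy as the disagreement measure, the round-robin split, and the threshold-function committee are all assumptions. In SBAL the committee would be the AdaBoost sub-classifiers, whose training is omitted here.

```python
from math import log

def split_majority(majority, minority):
    """Split the majority class into k ~ imbalance-ratio subsets, each merged with the minority."""
    k = max(1, round(len(majority) / len(minority)))
    return [majority[i::k] + minority for i in range(k)]

def vote_entropy(votes):
    """Disagreement of a committee's votes (0 when unanimous)."""
    n = len(votes)
    counts = {}
    for v in votes:
        counts[v] = counts.get(v, 0) + 1
    return -sum(c / n * log(c / n) for c in counts.values())

def select_queries(committee, pool, budget):
    """QBC: return the `budget` pool samples with the highest vote entropy."""
    return sorted(pool, key=lambda x: vote_entropy([h(x) for h in committee]),
                  reverse=True)[:budget]

majority = [("maj", i) for i in range(10)]
minority = [("min", 0), ("min", 1)]
subsets = split_majority(majority, minority)  # 5 balanced subsets of 2 + 2 samples

# Hypothetical committee of threshold classifiers standing in for the sub-classifiers
committee = [lambda x: x > 1, lambda x: x > 2, lambda x: x > 3]
queries = select_queries(committee, [0.0, 2.5, 5.0], budget=1)
```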

11.

In pattern recognition and machine learning, data preprocessing algorithms have been used increasingly in recent years to achieve high classification performance. In particular, applying a data preprocessing method before the classification algorithm has become essential for medical datasets with nonlinear and imbalanced data distributions. In this study, a new data preprocessing method is proposed for classifying five such medical datasets taken from the UCI machine learning repository: Parkinson, hepatitis, Pima Indians, single photon emission computed tomography (SPECT) heart, and thoracic surgery. The proposed method consists of three steps. In the first step, the cluster centers of each attribute are calculated using the k-means, fuzzy c-means, and mean shift clustering algorithms. In the second step, the absolute differences between the data in each attribute and the cluster centers are calculated, and the average of these differences is computed for each attribute. In the final step, the weighting coefficients are calculated by dividing the mean value of the difference to the cluster centers, and weighting is performed by multiplying the attribute values in the dataset by the obtained coefficients. Three attribute weighting methods are thus proposed: (1) similarity-based attribute weighting in k-means clustering, (2) similarity-based attribute weighting in fuzzy c-means clustering, and (3) similarity-based attribute weighting in mean shift clustering. The aim of these weighting methods is to gather the data of each class together and reduce the within-class variance.
By reducing the variance within each class, the data of each class are brought closer together while the separation between classes is increased. For comparison with other methods in the literature, random subsampling is used to handle the imbalanced dataset classification. After attribute weighting, four classifiers (linear discriminant analysis, the k-nearest neighbor classifier, the support vector machine, and the random forest classifier) are used to classify the imbalanced medical datasets. Performance is evaluated with classification accuracy, precision, recall, the area under the ROC curve, the κ value, and the F-measure. The classifier models are trained and tested with three schemes: a 50–50% train–test holdout, a 60–40% train–test holdout, and tenfold cross-validation. The experimental results show that the proposed attribute weighting methods achieve higher classification performance than the random subsampling method on the imbalanced medical datasets.


12.
Feature selection, for both supervised and unsupervised classification, is a relevant problem that researchers have pursued for decades. There are multiple benchmark algorithms based on filter, wrapper, and hybrid methods. These algorithms adopt different techniques, ranging from traditional search-based techniques to more advanced techniques based on nature-inspired algorithms. In this paper, a hybrid feature selection algorithm using a graph-based technique is proposed. The proposed algorithm uses the concept of a Feature Association Map (FAM) as its underlying foundation. It uses the graph-theoretic principles of minimal vertex cover and maximal independent set to derive the feature subset. The algorithm applies to both supervised and unsupervised classification. Its performance has been compared with several benchmark supervised and unsupervised feature selection algorithms and found to be better. The proposed algorithm is also less computationally expensive and therefore took less execution time on the publicly available datasets used in the experiments, which include high-dimensional datasets.
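
A rough sketch of the graph-theoretic idea. Features are vertices, and an edge joins two features that are strongly associated. A maximal independent set then keeps features with no strong association between them, and its complement is a vertex cover of the association graph. The actual FAM construction is not given in the abstract. Absolute Pearson correlation with a 0.9 threshold and the greedy minimum-degree rule are assumptions for illustration only.

```python
from math import sqrt

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sqrt(sum((x - ma) ** 2 for x in a))
    vb = sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb)

def association_graph(columns, threshold=0.9):
    """Hypothetical stand-in for a FAM: edge if |correlation| >= threshold."""
    m = len(columns)
    adj = {i: set() for i in range(m)}
    for i in range(m):
        for j in range(i + 1, m):
            if abs(pearson(columns[i], columns[j])) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def greedy_independent_set(adj):
    """Greedy maximal independent set: repeatedly keep a min-degree vertex, drop its neighbours."""
    remaining = set(adj)
    chosen = []
    while remaining:
        v = min(remaining, key=lambda u: (len(adj[u] & remaining), u))
        chosen.append(v)
        remaining -= adj[v] | {v}
    return sorted(chosen)

f0 = [1, 2, 3, 4, 5]
f1 = [2, 4, 6, 8, 10]  # redundant copy of f0
f2 = [5, 1, 4, 2, 3]
selected = greedy_independent_set(association_graph([f0, f1, f2]))
```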

13.
Clustering has been widely used as a fundamental data mining tool for the automated analysis of complex datasets. There is a growing need to run clustering algorithms on embedded systems with restricted computational capabilities, such as wireless sensor nodes, to support automated knowledge extraction from such systems. Although there has been considerable research on clustering algorithms, many of the proposed methods are computationally expensive. We propose a robust clustering algorithm with low computational complexity, suitable for computationally constrained environments. Our evaluation on both synthetic and real-life datasets shows that our approach has lower computational complexity than a range of existing methods, with comparable accuracy.

14.
We present a meta-learning method to support selection of candidate learning algorithms. It uses a k-Nearest Neighbor algorithm to identify the datasets that are most similar to the one at hand. The distance between datasets is assessed using a relatively small set of data characteristics, which was selected to represent properties that affect algorithm performance. The performance of the candidate algorithms on those datasets is used to generate a recommendation to the user in the form of a ranking. The performance is assessed using a multicriteria evaluation measure that takes not only accuracy, but also time into account. As it is not common in Machine Learning to work with rankings, we had to identify and adapt existing statistical techniques to devise an appropriate evaluation methodology. Using that methodology, we show that the meta-learning method presented leads to significantly better rankings than the baseline ranking method. The evaluation methodology is general and can be adapted to other ranking problems. Although here we have concentrated on ranking classification algorithms, the meta-learning framework presented can provide assistance in the selection of combinations of methods or more complex problem solving strategies.
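
The recommendation step can be sketched as follows. Find the k datasets whose meta-features are closest to the new dataset, rank the candidate algorithms on each neighbour, and order the algorithms by their average rank. The meta-feature vectors and scores below are made up. Each score is assumed to be a single precomputed number standing in for the paper's accuracy-and-time measure, and ties within a dataset are broken by order rather than averaged.

```python
from math import dist

def ranks_from_scores(scores):
    """Rank 1 = best (highest score); ties are broken by order, not averaged."""
    order = sorted(scores, key=scores.get, reverse=True)
    return {alg: i + 1 for i, alg in enumerate(order)}

def recommend(meta_features, performance, query, k=2):
    """Average-rank recommendation over the k nearest datasets in meta-feature space."""
    nearest = sorted(meta_features, key=lambda d: dist(meta_features[d], query))[:k]
    algs = list(performance[nearest[0]])
    avg = {a: 0.0 for a in algs}
    for d in nearest:
        r = ranks_from_scores(performance[d])
        for a in algs:
            avg[a] += r[a] / k
    return sorted(avg, key=avg.get)  # best average rank first

meta_features = {"d1": [0.1, 0.2], "d2": [0.15, 0.25], "d3": [5.0, 5.0]}
performance = {
    "d1": {"svm": 0.90, "knn": 0.80, "tree": 0.70},
    "d2": {"svm": 0.85, "knn": 0.88, "tree": 0.60},
    "d3": {"svm": 0.50, "knn": 0.60, "tree": 0.95},
}
ranking = recommend(meta_features, performance, [0.12, 0.22], k=2)
```

The distant dataset `d3`, on which `tree` wins, is ignored, so `tree` comes last in the recommendation.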

15.
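Traditional classification algorithms generally require a balanced class distribution, but in practice class distributions are often imbalanced. Existing data-level and model-level algorithms attack the problem from different angles, but they face issues such as parameter selection and the extra computation caused by repeated sampling. To address this, a method that adaptively balances sample losses within each mini-batch is proposed. The algorithm learns the loss function dynamically, adjusting the loss weight of each sample according to the label information in the mini-batch so that the total loss of each class within the mini-batch is balanced. Experiments on the caltech101 and ILSVRC2014 datasets show that the algorithm effectively reduces computational cost, improves classification accuracy, and to some extent avoids the overfitting risk introduced by oversampling.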
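
One simple way to balance per-class loss inside a mini-batch is to weight each sample by n / (C · n_c), where n is the batch size, C the number of classes present in the batch, and n_c the count of the sample's class. Every class then carries the same total weight, n / C. This specific formula is an assumption and not necessarily the paper's rule, which is described only as a dynamic loss-weighting scheme.

```python
from collections import Counter

def balanced_weights(labels):
    """Per-sample weights n / (C * n_c): each class in the batch gets total weight n / C."""
    counts = Counter(labels)
    n, c = len(labels), len(counts)
    return [n / (c * counts[y]) for y in labels]

def balanced_loss(losses, labels):
    """Mean of per-sample losses after class-balancing reweighting."""
    w = balanced_weights(labels)
    return sum(wi * li for wi, li in zip(w, losses)) / len(losses)

batch_labels = [0, 0, 0, 1]
w = balanced_weights(batch_labels)  # three weights of 2/3 and one weight of 2
```

Because the weights are recomputed from each batch's labels, no resampling is needed, which matches the abstract's point about avoiding the extra cost of repeated sampling.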

16.
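Shale gas reservoir data are difficult to acquire, labels are scarce, and labeling is expensive. To address this, a multi-criteria active query multi-label learning (MAML) algorithm is proposed. First, samples are preprocessed by considering their informativeness and representativeness. Second, a sample-richness constraint covering attribute diversity and label richness is added, and on this basis valuable samples are selected for label queries. Finally, a multi-label learning algorithm predicts the labels of the remaining samples. Through 11 Yaho...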

17.
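The regularization path algorithm is an effective method for numerically solving support vector machine (SVM) classification problems: in time comparable to a single SVM solve, it obtains all regularization parameters and the corresponding SVM solutions. Existing SVM regularization path algorithms either cannot handle data containing duplicate, near-duplicate, or linearly dependent samples, or are computationally expensive. To address these problems, a method for solving positive definite systems of equations is applied to the SVM regularization path, giving the positive definite SVM path algorithm (PDSVMP). PDSVMP transforms the coefficient matrix of the iterative system into a positive definite matrix and uses Cholesky decomposition to solve for the Lagrange multiplier increment vector at each breakpoint on the path. Unlike existing algorithms, which solve for the regularization parameter directly, it determines the parameter increment from changes in the active set and computes the regularization parameter from that increment. This ensures theoretical correctness and numerical stability and reduces computational complexity. Experiments on example and benchmark datasets show that PDSVMP correctly handles datasets containing duplicate, near-duplicate, or linearly dependent data and is computationally efficient.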
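
The linear-algebra core named above, solving a symmetric positive definite system A x = b by Cholesky decomposition A = L Lᵀ followed by forward and back substitution, can be sketched as below. How PDSVMP builds its positive definite coefficient matrix from the active set is not reproduced; the 2×2 system is only an example. The factorization fails (raises) exactly when the matrix is not positive definite, which is why making the matrix positive definite matters for duplicate or linearly dependent data.

```python
from math import sqrt

def cholesky(A):
    """Lower-triangular L with A = L L^T; raises ValueError if A is not positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = A[i][i] - s
                if d <= 0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = sqrt(d)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def cholesky_solve(A, b):
    """Solve A x = b via L y = b (forward) and L^T x = y (backward)."""
    L = cholesky(A)
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

x = cholesky_solve([[4.0, 2.0], [2.0, 3.0]], [2.0, 1.0])  # solution [0.5, 0.0]
```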

18.
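Typical abandoned-object detection algorithms are computationally expensive and struggle with occlusion. To address this, a fast and effective abandoned-object detection algorithm for a single static camera is proposed. The algorithm builds two background models updated with the cumulative mean method, called the pure background model and the dirty background model. Static object blobs are obtained from the difference between the two backgrounds and then tracked; when a static object stays longer than a set time, it is judged to be an abandoned object and an alarm is triggered. Because the algorithm avoids complex probabilistic background models, the computational cost of background updating is greatly reduced, so the algorithm meets the real-time processing requirements of video surveillance systems. In addition, the static-object tracking module counts collision frames, which improves abandoned-object tracking under occlusion. In experiments on multiple video sequences from the PETS2006 dataset, the algorithm showed good performance.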
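
A per-pixel sketch of the dual-background idea on a tiny 1D "image". Several details are assumptions: a slowly updated running mean plays the pure background, a quickly updated one plays the dirty background, and all rates and thresholds are arbitrary. A pixel counts as static when the two backgrounds disagree but the current frame already matches the dirty background. An alarm fires when a pixel stays static for `hold` consecutive frames. Blob extraction and the collision-frame counting for occlusion are not shown.

```python
def update(bg, frame, alpha):
    """Running-mean background update: bg <- (1 - alpha) * bg + alpha * frame."""
    return [(1 - alpha) * b + alpha * f for b, f in zip(bg, frame)]

def detect_abandoned(frames, alpha_pure=0.01, alpha_dirty=0.5, diff_thresh=30, hold=20):
    pure = list(frames[0])   # slow model: absorbs new objects only after a long time
    dirty = list(frames[0])  # fast model: absorbs a stationary object within a few frames
    counters = [0] * len(frames[0])
    alarms = []
    for t, frame in enumerate(frames[1:], start=1):
        pure = update(pure, frame, alpha_pure)
        dirty = update(dirty, frame, alpha_dirty)
        for i in range(len(frame)):
            static = (abs(dirty[i] - pure[i]) > diff_thresh
                      and abs(frame[i] - dirty[i]) <= diff_thresh)
            counters[i] = counters[i] + 1 if static else 0
            if counters[i] == hold:
                alarms.append((t, i))  # (frame index, pixel index)
    return alarms

# 5-pixel scene; an object (value 200) is left at pixel 2 from frame 10 onwards
frames = [[100.0] * 5 for _ in range(60)]
for f in frames[10:]:
    f[2] = 200.0
alarms = detect_abandoned(frames)  # one alarm at pixel 2 once it has been static for 20 frames
```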

19.
The large volume of data and computational complexity of algorithms limit the application of hyperspectral image classification to real-time operations. This work addresses the use of different parallel processing techniques to speed up the Markov random field (MRF)-based method to perform spectral-spatial classification of hyperspectral imagery. The Metropolis relaxation labelling approach is modified to take advantage of multi-core central processing units (CPUs) and to adapt it to massively parallel processing systems like graphics processing units (GPUs). The experiments on different hyperspectral data sets revealed that the implementation approach has a huge impact on the execution time of the algorithm. The results demonstrated that the modified MRF algorithm produced classification accuracy similar to conventional methods with greatly improved computational performance. With modern multi-core CPUs, good computational speed-up can be achieved even without additional hardware support. The CPU-GPU hybrid framework rendered the otherwise computationally expensive approach suitable for time-constrained applications.
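
A serial sketch of Metropolis relaxation labelling under a Potts-style MRF. Each pixel has per-label unary costs (in the paper, derived from the spectral classifier; here they are made up), and each 4-neighbour with a different label adds a penalty `beta`. A random label change is always accepted if it lowers the energy; otherwise it is accepted with probability exp(-ΔE / T). The parallel CPU/GPU variants from the paper are not shown, and `beta`, `temperature`, and `sweeps` are arbitrary.

```python
import math
import random

def local_energy(labels, unary, beta, i, j, lab):
    """Unary cost of `lab` at (i, j) plus beta per disagreeing 4-neighbour."""
    h, w = len(labels), len(labels[0])
    e = unary[i][j][lab]
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < h and 0 <= nj < w and labels[ni][nj] != lab:
            e += beta
    return e

def energy(labels, unary, beta):
    """Total MRF energy, counting each neighbouring pair once."""
    h, w = len(labels), len(labels[0])
    e = sum(unary[i][j][labels[i][j]] for i in range(h) for j in range(w))
    for i in range(h):
        for j in range(w):
            if i + 1 < h and labels[i][j] != labels[i + 1][j]:
                e += beta
            if j + 1 < w and labels[i][j] != labels[i][j + 1]:
                e += beta
    return e

def metropolis(unary, beta=1.0, temperature=0.1, sweeps=5, seed=0):
    rng = random.Random(seed)
    n_labels = len(unary[0][0])
    # Start from the pixel-wise minimum-cost labelling (spectral-only result)
    labels = [[min(range(n_labels), key=lambda k: cell[k]) for cell in row] for row in unary]
    for _ in range(sweeps):
        for i in range(len(labels)):
            for j in range(len(labels[0])):
                cur = labels[i][j]
                new = rng.choice([k for k in range(n_labels) if k != cur])
                delta = (local_energy(labels, unary, beta, i, j, new)
                         - local_energy(labels, unary, beta, i, j, cur))
                if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                    labels[i][j] = new
    return labels

# 3x3 scene: every pixel favours label 0 except a weakly "noisy" centre
unary = [[[0.0, 2.0] for _ in range(3)] for _ in range(3)]
unary[1][1] = [0.5, 0.0]
result = metropolis(unary)  # the isolated centre label is smoothed away
```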

20.
张凯军 (Zhang Kaijun), 梁循 (Liang Xun). Acta Automatica Sinica (自动化学报), 2014, 40(10): 2288-2294
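In support vector machines (SVMs), the choice of kernel function is very important, because different kernels produce different classification results. How to exploit the characteristics of several different kernel functions to jointly improve SVM learning has become a research hotspot, and multiple kernel learning (MKL) emerged in response. Recently, a simple and effective sparse MKL algorithm, GMKL (Generalized MKL), was proposed; it combines the advantages of the L1 and L2 norms to form an elastic constraint on the kernel weights. However, GMKL does not consider how to make full use of the information shared among the selected kernel functions. The MultiK-MHKS algorithm, on the other hand, uses canonical correlation analysis (CCA) to capture information shared among kernel functions but does not address kernel selection. Our model improves on both algorithms; we call it the improved domain multiple kernel support vector machine (IDMK-SVM). We prove that the model retains the properties of GMKL and that the algorithm converges. Finally, simulation experiments show that the proposed MKL method has an accuracy advantage over traditional MKL methods.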
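
The elastic constraint on kernel weights described above can be illustrated with a combined kernel K = Σ_m d_m K_m and an L1 + L2 penalty r(d) = λ1‖d‖₁ + λ2‖d‖₂². The penalty form and the proximal step are assumptions based only on the abstract's wording, not the GMKL or IDMK-SVM formulation. The step solves argmin over d ≥ 0 of ½‖d - v‖² + η r(d), which has the coordinate-wise closed form d_m = max(0, v_m - ηλ1) / (1 + 2ηλ2). The L1 part sets small weights to exactly zero, and the L2 part shrinks the rest. The SVM subproblem that would produce `v` in a full proximal-gradient loop is omitted.

```python
def combined_kernel(kernels, d):
    """K = sum_m d_m * K_m for square kernel matrices of equal size."""
    n = len(kernels[0])
    return [[sum(dm * K[i][j] for dm, K in zip(d, kernels)) for j in range(n)]
            for i in range(n)]

def elastic_net_penalty(d, lam1, lam2):
    """r(d) = lam1 * ||d||_1 + lam2 * ||d||_2^2."""
    return lam1 * sum(abs(x) for x in d) + lam2 * sum(x * x for x in d)

def elastic_net_prox(v, eta, lam1, lam2):
    """argmin_{d >= 0} 0.5 * ||d - v||^2 + eta * r(d), solved coordinate-wise."""
    return [max(0.0, vm - eta * lam1) / (1.0 + 2.0 * eta * lam2) for vm in v]

d = elastic_net_prox([0.05, 0.5, 1.0], eta=1.0, lam1=0.1, lam2=0.5)  # [0.0, 0.2, 0.45]
K = combined_kernel([[[1.0, 0.0], [0.0, 1.0]],
                     [[1.0, 0.5], [0.5, 1.0]],
                     [[2.0, 1.0], [1.0, 2.0]]], d)
```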
