Similar Documents
20 similar documents found (search time: 34 ms)
1.
A Parallel Algorithm for Learning Bayesian Network Parameters   Cited by: 2 (self-citations: 0, citations by others: 2)
To address the computational cost of learning Bayesian network parameters with the EM algorithm on large samples, a parallel EM algorithm (Parallel EM, PL-EM) is proposed to speed up parameter learning for complex Bayesian networks under large-sample conditions. In the E step, PL-EM computes the posterior probabilities of the hidden variables and the expected sufficient statistics in parallel; in the M step, it exploits the conditional independence properties of the Bayesian network and the decomposability of the complete-data likelihood to compute the local likelihood functions in parallel. Experimental results show that PL-EM provides an effective method for learning Bayesian network parameters under large-sample conditions.
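The abstract gives no implementation, so the following is a minimal, hypothetical sketch of the PL-EM idea, simplified to a naive-Bayes-style network with one hidden class variable and discrete observed children. The E step is parallelised over data shards, each returning expected sufficient statistics, and the M step re-estimates each local conditional distribution from the pooled statistics. The shard-based scheme, function names, and smoothing constant are illustrative assumptions, not the authors' code.

```python
# Sketch of a parallel EM for a naive-Bayes-style Bayesian network (assumption-laden).
# Data X must be integer-coded; run under "if __name__ == '__main__':" on spawn platforms.
import numpy as np
from multiprocessing import Pool

def e_step_shard(args):
    """E step on one data shard: responsibilities and expected sufficient statistics."""
    X, prior, cpts = args
    n = X.shape[0]
    log_post = np.tile(np.log(prior), (n, 1))            # log p(H=k) for every row
    for j in range(X.shape[1]):
        log_post += np.log(cpts[j][:, X[:, j]]).T        # add log p(x_j | H=k)
    post = np.exp(log_post - log_post.max(1, keepdims=True))
    post /= post.sum(1, keepdims=True)                   # responsibilities p(H=k | x)
    counts = [np.zeros_like(c) for c in cpts]            # expected counts per CPT
    for j in range(X.shape[1]):
        for v in range(cpts[j].shape[1]):
            counts[j][:, v] = post[X[:, j] == v].sum(0)
    return post.sum(0), counts

def pl_em(X, K, n_iter=20, n_jobs=4, n_vals=None, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    n_vals = n_vals or [int(X[:, j].max()) + 1 for j in range(d)]
    prior = np.full(K, 1.0 / K)
    cpts = [rng.dirichlet(np.ones(v), size=K) for v in n_vals]   # random initial CPTs
    shards = np.array_split(X, n_jobs)
    with Pool(n_jobs) as pool:
        for _ in range(n_iter):
            results = pool.map(e_step_shard, [(s, prior, cpts) for s in shards])
            class_counts = sum(r[0] for r in results)
            prior = class_counts / class_counts.sum()    # M step: hidden-class prior
            for j in range(d):                           # M step: each local CPT
                cj = sum(r[1][j] for r in results) + 1e-6
                cpts[j] = cj / cj.sum(1, keepdims=True)
    return prior, cpts
```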

2.
Dynamic ensemble extreme learning machine based on sample entropy   Cited by: 1 (self-citations: 1, citations by others: 0)
Extreme learning machine (ELM) has been proposed as a new learning algorithm for single-hidden-layer feed-forward neural networks. By randomly selecting the input weights and hidden-layer biases, ELM can overcome many drawbacks of traditional gradient-based learning algorithms, such as local minima, improperly chosen learning rates, and slow learning. However, ELM suffers from instability and over-fitting, especially on large datasets. In this paper, a dynamic ensemble extreme learning machine based on sample entropy is proposed, which alleviates to some extent the problems of instability and over-fitting and increases the prediction accuracy. The experimental results show that the proposed approach is robust and efficient.
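As background for the ensemble described above, here is a minimal single-ELM sketch, an illustrative assumption rather than the authors' ensemble: input weights and hidden biases are drawn at random, and only the output weights are solved for by ridge-regularised least squares.

```python
# Minimal ELM sketch: random hidden layer, output weights by regularised least squares.
# The ensemble construction and sample-entropy weighting of the paper are omitted.
import numpy as np

class SimpleELM:
    def __init__(self, n_hidden=200, reg=1e-3, seed=0):
        self.n_hidden, self.reg = n_hidden, reg
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)              # random nonlinear feature map

    def fit(self, X, Y):
        d = X.shape[1]
        self.W = self.rng.normal(size=(d, self.n_hidden))  # random input weights
        self.b = self.rng.normal(size=self.n_hidden)       # random hidden biases
        H = self._hidden(X)
        # beta = (H^T H + reg I)^(-1) H^T Y : only the output layer is learnt
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ Y)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```

A dynamic ensemble in the spirit of the abstract would train several such ELMs and weight their outputs per test point; the sample-entropy weighting is specific to the paper and is not reproduced here.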

3.
Ternary Error-Correcting Output Codes (ECOC), which can unify most state-of-the-art decomposition frameworks such as one-versus-one, one-versus-all, sparse coding, and dense coding, are considered more flexible for modeling multiclass classification problems than binary ECOC. Many decoding strategies have been proposed for ternary ECOC in the earlier literature, but few of them work with posterior probabilities, which can be regarded as a Bayes decision rule and hence usually yield better performance. Passerini et al. (2004) [16] proposed a decoding strategy based on posterior probabilities. However, according to the analysis in this paper, their method suffers from certain defects and results in bias. To overcome this, we propose a variant that refines the decomposition of the probability to obtain smoother estimates. Our bias-variance analysis shows that the decrease in error achieved by our variant is due to a decrease in variance. In addition, we extend an efficient method of obtaining posterior probabilities, based on the linear rule for the decoding process in binary ECOC, to ternary ECOC. On ten benchmark datasets, we observe that the two decoding strategies based on posterior probabilities presented in this paper perform better than the other strategies in the earlier references.
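For illustration only, below is a hedged sketch of ternary ECOC with a simple probabilistic decoding: each column of the coding matrix trains a binary probabilistic classifier on the classes coded +1/-1 (classes coded 0 are ignored), and a test sample is assigned to the class whose codeword has the highest average log posterior over its non-zero entries. This is a generic scheme in the spirit of decoding by posterior probabilities, not the estimator of Passerini et al. nor the refinement proposed in the paper.

```python
# Ternary ECOC with a simple posterior-probability decoding (illustrative sketch).
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_ecoc(X, y, M):
    """y: integer class ids 0..C-1; M: (C, n_cols) coding matrix with entries in {-1, 0, +1}."""
    clfs = []
    for j in range(M.shape[1]):
        codes = M[y, j]                 # code of each sample's class in column j
        keep = codes != 0               # classes coded 0 do not take part in this dichotomy
        clf = LogisticRegression(max_iter=1000).fit(X[keep], codes[keep])
        clfs.append(clf)
    return clfs

def decode_ecoc(X, M, clfs):
    C = M.shape[0]
    # p[:, j] = estimated posterior P(code = +1 | x) for dichotomy j
    p = np.column_stack([c.predict_proba(X)[:, list(c.classes_).index(1)] for c in clfs])
    scores = np.zeros((X.shape[0], C))
    for c in range(C):
        nz = M[c] != 0
        # average log-probability of the observed codeword entries of class c
        probs = np.where(M[c, nz] == 1, p[:, nz], 1.0 - p[:, nz])
        scores[:, c] = np.log(np.clip(probs, 1e-12, 1.0)).mean(axis=1)
    return scores.argmax(axis=1)
```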

4.
To address the problem of missing subjective information in machine learning, a new transfer group-probability learning machine oriented to shared data (TGPLM-CD) is proposed. Based on the structural risk minimization model, the method incorporates the knowledge contained in the source domain and the group-probability information of the class labels in the target domain, in particular the data shared between the domains, into the learning framework. This transfers knowledge from the source domain to the target domain and improves classification accuracy when the data available in the domain under study are insufficient. Experimental results on a large number of datasets verify the effectiveness of the proposed method.

5.
Microarray data are often characterized by high dimensionality and small sample size, so their dimensionality needs to be reduced for better classification performance and greater computational efficiency of the learning model. The minimum redundancy maximum relevance (mRMR) criterion, which is widely used to reduce the dimensionality of such data, requires discretization and the setting of external parameters. We propose an incremental formulation of the trace ratio of the scatter matrices to determine a relevant set of genes, which involves neither discretization nor external parameter setting. It is shown analytically that the proposed incremental formulation is computationally more efficient than its batch formulation. Extensive experiments on 14 well-known publicly available microarray cancer datasets demonstrate that the proposed method performs better than the well-known mRMR method. Statistical tests also show that the proposed method is significantly better than the mRMR method.
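A small illustrative sketch (not the authors' formulation) of greedy feature selection with a trace-ratio criterion: per-feature between-class and within-class scatters are precomputed, and features are added one at a time so that the ratio of the summed between-class scatter to the summed within-class scatter of the selected set is maximised. The greedy scheme and variable names are assumptions for illustration; the paper's incremental update rules are not reproduced.

```python
# Greedy trace-ratio feature (gene) selection, illustrative sketch.
import numpy as np

def scatter_per_feature(X, y):
    classes, mean = np.unique(y), X.mean(axis=0)
    sb = np.zeros(X.shape[1])          # between-class scatter of each feature
    sw = np.zeros(X.shape[1])          # within-class scatter of each feature
    for c in classes:
        Xc = X[y == c]
        sb += len(Xc) * (Xc.mean(axis=0) - mean) ** 2
        sw += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return sb, sw

def greedy_trace_ratio(X, y, n_select=50, eps=1e-12):
    sb, sw = scatter_per_feature(X, y)
    selected, sum_b, sum_w = [], 0.0, 0.0
    candidates = set(range(X.shape[1]))
    for _ in range(min(n_select, X.shape[1])):
        # pick the feature whose inclusion maximises tr(Sb_S) / tr(Sw_S)
        best = max(candidates, key=lambda f: (sum_b + sb[f]) / (sum_w + sw[f] + eps))
        selected.append(best)
        sum_b, sum_w = sum_b + sb[best], sum_w + sw[best]
        candidates.remove(best)
    return selected
```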

6.
Decision theory shows that the optimal decision is a function of the posterior class probabilities. More specifically, in binary classification the optimal decision is based on comparing the posterior probabilities with some threshold; the most accurate estimates of the posterior probabilities are therefore required near these decision thresholds. This paper discusses the design of objective functions that provide more accurate estimates of the probability values, taking into account the characteristics of each decision problem. We propose learning algorithms based on the stochastic gradient minimization of these loss functions. We show that the performance of the classifier improves when these algorithms behave like sample selectors: samples near the decision boundary are the most relevant during learning.
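The paper designs dedicated loss families; the following is only a rough, assumption-laden illustration of the "sample selector" behaviour it describes: a logistic model trained by stochastic gradient descent on a cross-entropy whose per-sample weight grows as the current predicted probability approaches the decision threshold, so that samples near the decision boundary dominate learning. The Gaussian weighting function is invented for illustration and is not the objective proposed in the paper.

```python
# SGD on a threshold-weighted cross-entropy (illustrative, not the paper's loss).
import numpy as np

def sgd_threshold_weighted(X, y, threshold=0.5, lr=0.1, epochs=50, tau=0.1, seed=0):
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            p = 1.0 / (1.0 + np.exp(-(X[i] @ w + b)))      # current posterior estimate
            # emphasise samples whose estimate lies close to the decision threshold
            weight = np.exp(-((p - threshold) ** 2) / (2 * tau ** 2))
            grad = weight * (p - y[i])                      # cross-entropy gradient, weight held fixed
            w -= lr * grad * X[i]
            b -= lr * grad
    return w, b
```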

7.
A Fast New Algorithm for Multi-Source Cross-Domain Data Classification   Cited by: 1 (self-citations: 0, citations by others: 1)
顾鑫, 王士同, 许敏. 自动化学报 (Acta Automatica Sinica), 2014, 40(3): 531-547
The purpose of cross-domain learning and classification is to transfer the results of supervised learning on multiple source domains to a target domain, so as to classify the unlabeled target domain. Current cross-domain learning methods generally focus on learning from a single source domain to the target domain and on small sample sizes; such methods adapt poorly across domains and break down on large-sample data, which directly harms the accuracy and efficiency of cross-domain classification. To exploit as much useful data from related domains as possible, this paper proposes a multiple-source cross-domain classification algorithm (Multiple Sources Cross-Domain Classification, MSCC). Based on the logistic regression model and a consensus method, both shown effective in numerous experiments, MSCC builds several source-domain classifiers that jointly guide the classification of the target-domain data. To make full and efficient use of large-sample source-domain data and support fast computation on large samples, MSCC is combined with the recent CDdual (dual coordinate descent) algorithm to obtain the fast variant MSCC-CDdual, together with the corresponding theoretical analysis. Experimental results on artificial, text, and image datasets show that the algorithm achieves high classification accuracy, fast running speed, and good domain adaptability on large-sample datasets. The main contributions are threefold: 1) a new consensus method for multi-source cross-domain classification, which makes it possible to develop MSCC into the fast algorithm MSCC-CDdual; 2) the fast algorithm MSCC-CDdual itself, which suits both small-sample and large-sample datasets; and 3) the distinctive advantage MSCC-CDdual exhibits on high-dimensional datasets compared with other algorithms.
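A hypothetical minimal sketch of the multi-source idea described above (not the MSCC/MSCC-CDdual implementation): one logistic regression classifier is trained per source domain, and the target-domain samples are labelled by a simple consensus, the average of the source classifiers' predicted posteriors. The paper's specific consensus method, the CDdual solver, and the large-sample optimisations are not reproduced.

```python
# Multi-source consensus labelling of an unlabeled target domain (illustrative sketch).
import numpy as np
from sklearn.linear_model import LogisticRegression

def msc_consensus(source_domains, X_target):
    """source_domains: list of (X_s, y_s) labelled source datasets sharing one label set."""
    clfs = [LogisticRegression(max_iter=1000).fit(Xs, ys) for Xs, ys in source_domains]
    # consensus: average the posterior estimates of all source classifiers
    avg_proba = np.mean([clf.predict_proba(X_target) for clf in clfs], axis=0)
    # columns align because every source sees the same sorted label set
    return clfs[0].classes_[avg_proba.argmax(axis=1)], avg_proba
```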

8.
Training multimodal models in deep learning usually requires large amounts of high-quality labeled data of different types, such as images, text, and audio. However, obtaining large-scale multimodal labeled data is a challenging and expensive task. To address this problem, active learning has been widely applied as an effective learning paradigm: by selectively labeling the most informative samples, it reduces labeling cost and improves model performance. Existing active learning methods, however, often suffer from inefficient data scanning and data relocation; when an index has to be updated over a wide range, the maintenance cost becomes enormous. To solve these problems, this paper proposes So-CBI, an efficient sample retrieval technique for multimodal model training. The method perceives the inter-class boundary points during model training to accurately evaluate the value of each sample to the model, and designs a semi-ordered, efficient sample index that combines data ordering information with partial orderedness to reduce index maintenance cost and time overhead. Comparative experiments against traditional active learning training methods on several multimodal datasets verify the effectiveness of So-CBI for the training-sample retrieval problem in active learning.

9.
In the past few years, active learning has been reasonably successful and has drawn a lot of attention. However, recent active learning methods have focused on strategies in which a large unlabeled dataset has to be reprocessed at each learning iteration. As datasets grow, these strategies become inefficient or even a tremendous computational challenge. To address these issues, we propose an effective and efficient active learning paradigm that attains a significant reduction in the size of the learning set by applying an a priori process of identification and organization of a small relevant subset. Furthermore, the concomitant classification and selection processes enable the classification of a very small number of samples while selecting the informative ones. Experimental results show that the proposed paradigm achieves high accuracy quickly with minimal user interaction, further improving its efficiency.

10.
We present a novel formulation for pattern recognition in biomedical data. We adopt a binary recognition scenario where a control dataset contains samples of one class only, while a mixed dataset contains an unlabeled collection of samples from both classes. The mixed-dataset samples that belong to the second class are identified by estimating the posterior probabilities of samples for being in the control or the mixed dataset. Experiments on synthetic data established better detection performance than possible alternatives. The fitness of the method for biomedical data analysis was further demonstrated on real multi-color flow cytometry and multi-channel electroencephalography data.
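A hedged sketch of the general idea, not the authors' estimator: train a probabilistic classifier to discriminate control samples from mixed samples; mixed-set samples that the classifier confidently assigns to the "mixed" side look unlike the control class and are flagged as candidate second-class samples. The logistic-regression discriminator and the flagging threshold are illustrative assumptions.

```python
# Flag second-class samples in a mixed set via control-vs-mixed posteriors (sketch).
import numpy as np
from sklearn.linear_model import LogisticRegression

def flag_second_class(X_control, X_mixed, threshold=0.8):
    X = np.vstack([X_control, X_mixed])
    d = np.concatenate([np.zeros(len(X_control)), np.ones(len(X_mixed))])  # dataset membership label
    clf = LogisticRegression(max_iter=1000).fit(X, d)
    # posterior of belonging to the mixed dataset; high values = unlike the control class
    p_mixed = clf.predict_proba(X_mixed)[:, 1]
    return p_mixed > threshold, p_mixed
```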

11.
The concept of a “mutualistic teacher” is introduced for unsupervised learning of the mean vectors of the components of a mixture of multivariate normal densities when the number of classes is also unknown. The unsupervised learning problem is formulated here as a multi-stage quasi-supervised problem incorporating a cluster approach. The mutualistic teacher creates a quasi-supervised environment at each stage by picking out “mutual pairs” of samples and assigning identical (but unknown) labels to the individuals of each mutual pair. The number of classes, if not specified, can be determined at an intermediate stage. The risk in assigning identical labels to the individuals of mutual pairs is estimated. Results of some simulation studies are presented.

12.
Support vector machines (SVMs) are a popular class of supervised learning algorithms, and are particularly applicable to large and high-dimensional classification problems. Like most machine learning methods for data classification and information retrieval, they require manually labeled data samples in the training stage. However, manual labeling is a time-consuming and error-prone task. One possible solution to this issue is to exploit the large number of unlabeled samples that are easily accessible via the internet. This paper presents a novel active learning method for text categorization. The main objective of active learning is to reduce the labeling effort, without compromising classification accuracy, by intelligently selecting which samples should be labeled. The proposed method selects a batch of informative samples using the posterior probabilities provided by a set of multi-class SVM classifiers, and these samples are then manually labeled by an expert. Experimental results indicate that the proposed active learning method significantly reduces the labeling effort while simultaneously enhancing the classification accuracy.
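A minimal sketch of batch selection by SVM posterior probabilities, under the assumption that Platt-scaled probabilities from scikit-learn's SVC stand in for the paper's multi-class SVM posteriors; the samples with the smallest margin between the top two class posteriors are returned for manual labelling. The criterion here is plain batch uncertainty sampling, not necessarily the selection rule of the paper.

```python
# Batch uncertainty sampling with posterior probabilities from SVM classifiers (sketch).
import numpy as np
from sklearn.svm import SVC

def select_batch(X_labeled, y_labeled, X_unlabeled, batch_size=20):
    clf = SVC(kernel="linear", probability=True).fit(X_labeled, y_labeled)
    proba = clf.predict_proba(X_unlabeled)
    top2 = np.sort(proba, axis=1)[:, -2:]            # two largest class posteriors per sample
    margin = top2[:, 1] - top2[:, 0]                 # small margin = informative sample
    return np.argsort(margin)[:batch_size]           # indices to send to the expert
```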

13.
By incorporating prior knowledge in the form of implications into the extreme learning machine (ELM), a novel knowledge-based extreme learning machine (KBELM) formulation is proposed in this work. In this approach, the nonlinear prior-knowledge implications are converted into linear inequalities and then included as linear equality constraints in the ELM formulation. The proposed KBELM formulation has the advantage that it leads to solving a system of linear equations. The effectiveness of the proposed approach is demonstrated on three synthetic datasets and the publicly available Wisconsin Prognostic Breast Cancer dataset by comparing its results with those of ELM and optimally pruned ELM using additive and radial basis function hidden nodes.

14.
The single-hidden-layer feed-forward neural network is one of the most widely used intelligent modeling tools, but on small sample sets its traditional learning algorithms tend to over-fit, and when the data contain substantial noise the learned model is not robust and is very sensitive to that noise. To remedy this shortcoming, a robust learning algorithm for single-hidden-layer feed-forward neural networks on small datasets is proposed. By introducing an ε-insensitive learning metric and a structural risk term, the proposed algorithm effectively overcomes the defects of the traditional learning algorithms and exhibits good robustness. Experiments on both synthetic and real datasets confirm these advantages.
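A loosely related sketch, not the authors' algorithm: the hidden layer is generated at random as in an ELM, and the output weights are fitted with an ε-insensitive loss plus an L2 structural-risk term via scikit-learn's LinearSVR, so that small residuals (and hence moderate noise) do not drive the fit. The reuse of LinearSVR and all parameter values are assumptions made for illustration.

```python
# Random hidden layer + epsilon-insensitive, L2-regularised output weights (sketch).
import numpy as np
from sklearn.svm import LinearSVR

def fit_robust_slfn(X, y, n_hidden=100, epsilon=0.1, C=1.0, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))       # random input weights
    b = rng.normal(size=n_hidden)                     # random hidden biases
    H = np.tanh(X @ W + b)                            # hidden-layer outputs
    # epsilon-insensitive loss ignores residuals below epsilon; C trades it off
    # against the L2 (structural risk) term on the output weights.
    reg = LinearSVR(epsilon=epsilon, C=C, max_iter=10000).fit(H, y)
    return W, b, reg

def predict_robust_slfn(X, W, b, reg):
    return reg.predict(np.tanh(X @ W + b))
```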

15.
Many types of nonlinear classifiers have been proposed to automatically generate land-cover maps from satellite images. Some are based on the estimation of posterior class probabilities, whereas others estimate the decision boundary directly. In this paper, we propose a modular design able to focus the learning process on the decision boundary by using posterior probability estimates. To do so, we use a self-configuring architecture that incorporates specialized modules to deal with conflicting classes, and we apply a learning algorithm that concentrates learning on the posterior probability regions that are critical for the decision problem stated by the user-defined misclassification costs. Moreover, we show that by filtering the posterior probability map, the impulsive noise that commonly affects automatic land-cover classification can be significantly reduced. Experimental results show the effectiveness of the proposed solutions on real multispectral and hyperspectral images compared with other typical approaches that are not based on probability estimates, such as support vector machines.

16.
In this paper, we address the problem of learning a classifier for the classification of spoken characters. We present a solution based on the Group Method of Data Handling (GMDH) learning paradigm for the development of a robust abductive network classifier. We improve the reliability of the classification process by introducing the concept of a multiple abductive network classifier system. We evaluate the performance of the proposed classifier using three different speech datasets, including spoken Arabic digits, spoken English letters, and spoken Pashto digits. The performance of the proposed classifier surpasses that reported in the literature for other classification techniques on the same speech datasets.

17.
To address the poor detection performance of the single-shot refinement object detector (RefineDet) on small-sample classes in class-imbalanced datasets, a partially weighted loss function, SWLoss, is proposed. First, the reciprocal of the number of samples of each class in every training batch is used as a heuristic inter-class balance factor to weight the different classes in the classification loss, increasing the attention paid to learning the small-sample classes. A multi-task balance factor is then introduced to weight the classification and regression losses, narrowing the gap between the learning rates of the two tasks. Finally, experiments are conducted on the Pascal VOC 2007 dataset and a dot-matrix character dataset, both of which have large differences in the number of samples per object class. The results show that, compared with the original RefineDet, the SWLoss-based RefineDet clearly improves the detection accuracy of small-sample classes, raising the mean average precision (mAP) on the two datasets by 1.01 and 9.86 percentage points, respectively; compared with RefineDet based on the loss-balance function and on the weighted pairwise loss, the SWLoss-based RefineDet raises mAP on the two datasets by 0.68 and 4.73 percentage points, and by 0.49 and 1.48 percentage points, respectively.
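An illustrative sketch of the weighting scheme described above, written as a plain NumPy function rather than inside RefineDet: classes in a batch are weighted by the reciprocal of their sample counts in that batch, and a multi-task factor balances the classification and regression losses. The names, the rescaling of the weights, and the default value of the multi-task factor are assumptions, not the paper's exact definition.

```python
# SWLoss-style batch-balanced classification loss plus multi-task weighting (sketch).
import numpy as np

def batch_class_weights(labels, n_classes):
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    w = np.where(counts > 0, 1.0 / np.maximum(counts, 1), 0.0)   # reciprocal of batch counts
    return w * counts.sum() / max(w @ counts, 1e-12)             # rescale to keep the loss magnitude

def weighted_detection_loss(cls_log_probs, labels, reg_loss, n_classes, task_balance=1.0):
    """cls_log_probs: (N, n_classes) log-probabilities; labels: (N,) integer class ids."""
    w = batch_class_weights(labels, n_classes)
    ce = -(w[labels] * cls_log_probs[np.arange(len(labels)), labels]).mean()
    return ce + task_balance * reg_loss       # balance the classification and regression tasks
```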

18.
Objective: Person re-identification aims to match pedestrian targets captured by different cameras at different times and places. Owing to variations in illumination, background, occlusion, viewpoint, and pose, the appearance of the same target under different cameras can differ considerably. Current research focuses on feature representation and metric learning. Many metric learning methods have achieved good results on person re-identification, but on diverse datasets a single global metric can hardly accommodate the variety of features. Local metric learning has therefore been proposed, but such methods usually require solving complex convex optimization problems and are computationally expensive. Method: Following the idea of local metric learning and building on recently proposed global metric learning methods such as XQDA (cross-view quadratic discriminant analysis) and MLAPG (metric learning by accelerated proximal gradient), this paper proposes a framework that integrates global and local metric learning. A Gaussian mixture model is used to cluster the training samples; local metric learning is performed within each cluster, while global metric learning is performed on the whole training set. For a test sample, the local and global metric matrices are combined with weights given by the posterior probabilities of the sample under the components of the Gaussian mixture model, and the combined metric is used to measure similarity. In particular, for the MLAPG algorithm, the posterior probabilities of the samples under the Gaussian components are used to adjust the loss weights of different samples in the objective function, further improving its performance. Results: Experiments on the VIPeR, PRID 450S, and QMUL GRID datasets verify the effectiveness of the proposed integrated global-local metric learning method. Compared with global methods such as XQDA and MLAPG, the matching accuracy on the VIPeR dataset improves by about 2.0%, with improvements of varying degrees on the other datasets. With different feature representations, the matching accuracy improves by roughly 1.3% to 3.4% over the global methods. Conclusion: The proposed framework effectively integrates global and local metric learning; it improves the performance of several global metric learning algorithms while avoiding the complex computation of local metric learning algorithms. The experiments show that, for different feature representations, the proposed integrated global-local metric learning framework consistently improves on the global metric learning methods.
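The XQDA/MLAPG learning itself is not reproduced here; the sketch below only illustrates the combination step under heavy assumptions: a Gaussian mixture model clusters the training features, a global metric matrix and one local metric matrix per cluster are assumed to have been learnt elsewhere (the placeholders below use regularised inverse covariance matrices), and the distance between a pair of test samples mixes these matrices with weights given by the GMM posteriors.

```python
# Posterior-weighted mixture of a global metric and per-cluster local metrics (sketch).
# Assumes each cluster contains several samples so its covariance is well defined.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_metrics(X_train, n_clusters=3, ridge=1e-3, seed=0):
    gmm = GaussianMixture(n_components=n_clusters, random_state=seed).fit(X_train)
    d = X_train.shape[1]
    # placeholders for the learnt metrics: regularised inverse covariance matrices;
    # in the paper these would come from XQDA / MLAPG, globally and per cluster.
    M_global = np.linalg.inv(np.cov(X_train.T) + ridge * np.eye(d))
    hard = gmm.predict(X_train)
    M_local = [np.linalg.inv(np.cov(X_train[hard == k].T) + ridge * np.eye(d))
               for k in range(n_clusters)]
    return gmm, M_global, M_local

def combined_distance(x, y, gmm, M_global, M_local, alpha=0.5):
    post = gmm.predict_proba(np.vstack([x, y])).mean(axis=0)   # average posteriors of the pair
    M = alpha * M_global + (1 - alpha) * sum(p * Mk for p, Mk in zip(post, M_local))
    diff = x - y
    return float(diff @ M @ diff)
```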

19.
Noisy labels are common in real-world datasets and severely degrade the learning of deep neural networks. To address this problem, a method for identifying and relabeling noisy-label data based on label-difference learning is proposed. The method designs two different pseudo-label generation strategies, uses the clean data identified by a base network to build an artificial noisy dataset, and computes the label-difference vector or label-difference matrix of that dataset. With the goal of strengthening the association between similar classes, two noise-learning networks are designed to learn the noise probability of each sample directly: a label-difference vector network built from fully connected layers and a label-difference matrix network built with single-row convolution kernels. A threshold linearly related to the noise rate is then used to separate clean data from noisy data. Experiments are designed to analyze the factors that affect the recognition performance of the networks, including the pseudo-label generation strategy, the network structure, and the number of training iterations. Tests on public datasets show that, under various noise distributions, the algorithm significantly improves the precision and recall on the noisy data while keeping the precision and recall on the clean data essentially stable, with maximum improvements of 16.45% and 21.01%, respectively.

20.
In this paper, we propose a novel method for semi-supervised learning, called logistic label propagation (LLP). The proposed method employs the logistic function to classify input pattern vectors, similarly to logistic regression. To cope with unlabeled samples as well as labeled ones in the semi-supervised learning framework, the logistic functions are learnt by using similarities between samples, in a manner similar to label propagation. In the proposed method, these two approaches, logistic regression and label propagation, are effectively combined in terms of posterior probabilities. LLP estimates the labels of new input samples using the learnt logistic function, whereas label propagation has to re-optimize all labels whenever a new input sample arrives. In addition, we suggest a way to set the parameters and the initialization properly, which frees users from tuning a parameter value by trial and error. In classification experiments (label estimation) in the semi-supervised learning framework, the proposed method exhibits favorable performance compared with the other methods.
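As a rough illustration of the idea, under stated assumptions and not the LLP formulation itself: labels are first propagated over a similarity graph with scikit-learn's LabelPropagation, and a logistic regression is then fitted to the propagated labels, so that new samples can be classified by the learnt logistic function without re-running the propagation. LLP couples the two steps through posterior probabilities in a single learning problem; this two-stage pipeline is only an approximation for demonstration.

```python
# Two-stage stand-in for logistic label propagation: propagate, then fit a logistic model.
import numpy as np
from sklearn.semi_supervised import LabelPropagation
from sklearn.linear_model import LogisticRegression

def llp_like(X, y_partial, gamma=20.0):
    """y_partial: class ids for labelled samples, -1 for unlabelled ones."""
    lp = LabelPropagation(kernel="rbf", gamma=gamma).fit(X, y_partial)
    # fit a logistic function to the propagated labels; unlike pure label propagation,
    # the resulting model can score unseen samples without re-optimising all labels
    clf = LogisticRegression(max_iter=1000).fit(X, lp.transduction_)
    return clf

# usage: new_labels = llp_like(X, y_partial).predict(X_new)
```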
