Similar Documents
20 similar documents found (search time: 34 ms)
1.
A Parallel Algorithm for Learning Bayesian Network Parameters   Cited by: 2 (self-citations: 0, citations by others: 2)
To address the computational cost of learning Bayesian network parameters with the EM algorithm on large samples, a parallel EM algorithm (Parallel EM, PL-EM) is proposed to speed up parameter learning for complex Bayesian networks under large-sample conditions. In the E step, PL-EM computes the posterior probabilities of the hidden variables and the expected sufficient statistics in parallel; in the M step, it exploits the conditional independence properties of the Bayesian network and the decomposability of the complete-data likelihood to compute the local likelihood functions in parallel. Experimental results show that PL-EM provides an effective method for learning Bayesian network parameters under large-sample conditions.
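The abstract gives no implementation, so the following is a minimal, hypothetical sketch of the PL-EM idea, simplified to a naive-Bayes-style network with one hidden class variable and discrete observed children. The E step is parallelised over data shards, each returning expected sufficient statistics, and the M step re-estimates each local conditional distribution from the pooled statistics. The shard-based scheme, function names, and smoothing constant are illustrative assumptions, not the authors' code.

```python
# Sketch of a parallel EM for a naive-Bayes-style Bayesian network (assumption-laden).
# Data X must be integer-coded; run under "if __name__ == '__main__':" on spawn platforms.
import numpy as np
from multiprocessing import Pool

def e_step_shard(args):
    """E step on one data shard: responsibilities and expected sufficient statistics."""
    X, prior, cpts = args
    n = X.shape[0]
    log_post = np.tile(np.log(prior), (n, 1))            # log p(H=k) for every row
    for j in range(X.shape[1]):
        log_post += np.log(cpts[j][:, X[:, j]]).T        # add log p(x_j | H=k)
    post = np.exp(log_post - log_post.max(1, keepdims=True))
    post /= post.sum(1, keepdims=True)                   # responsibilities p(H=k | x)
    counts = [np.zeros_like(c) for c in cpts]            # expected counts per CPT
    for j in range(X.shape[1]):
        for v in range(cpts[j].shape[1]):
            counts[j][:, v] = post[X[:, j] == v].sum(0)
    return post.sum(0), counts

def pl_em(X, K, n_iter=20, n_jobs=4, n_vals=None, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    n_vals = n_vals or [int(X[:, j].max()) + 1 for j in range(d)]
    prior = np.full(K, 1.0 / K)
    cpts = [rng.dirichlet(np.ones(v), size=K) for v in n_vals]   # random initial CPTs
    shards = np.array_split(X, n_jobs)
    with Pool(n_jobs) as pool:
        for _ in range(n_iter):
            results = pool.map(e_step_shard, [(s, prior, cpts) for s in shards])
            class_counts = sum(r[0] for r in results)
            prior = class_counts / class_counts.sum()    # M step: hidden-class prior
            for j in range(d):                           # M step: each local CPT
                cj = sum(r[1][j] for r in results) + 1e-6
                cpts[j] = cj / cj.sum(1, keepdims=True)
    return prior, cpts
```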

2.
Dynamic ensemble extreme learning machine based on sample entropy   Cited by: 1 (self-citations: 1, citations by others: 0)
Extreme learning machine (ELM) has been proposed as a new learning algorithm for single-hidden-layer feed-forward neural networks. By randomly selecting the input weights and hidden-layer biases, ELM can overcome many drawbacks of traditional gradient-based learning algorithms, such as local minima, improperly chosen learning rates, and slow learning. However, ELM suffers from instability and over-fitting, especially on large datasets. In this paper, a dynamic ensemble extreme learning machine based on sample entropy is proposed, which alleviates to some extent the problems of instability and over-fitting and increases the prediction accuracy. The experimental results show that the proposed approach is robust and efficient.
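As background for the ensemble described above, here is a minimal single-ELM sketch, an illustrative assumption rather than the authors' ensemble: input weights and hidden biases are drawn at random, and only the output weights are solved for by ridge-regularised least squares.

```python
# Minimal ELM sketch: random hidden layer, output weights by regularised least squares.
# The ensemble construction and sample-entropy weighting of the paper are omitted.
import numpy as np

class SimpleELM:
    def __init__(self, n_hidden=200, reg=1e-3, seed=0):
        self.n_hidden, self.reg = n_hidden, reg
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)              # random nonlinear feature map

    def fit(self, X, Y):
        d = X.shape[1]
        self.W = self.rng.normal(size=(d, self.n_hidden))  # random input weights
        self.b = self.rng.normal(size=self.n_hidden)       # random hidden biases
        H = self._hidden(X)
        # beta = (H^T H + reg I)^(-1) H^T Y : only the output layer is learnt
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ Y)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```

A dynamic ensemble in the spirit of the abstract would train several such ELMs and weight their outputs per test point; the sample-entropy weighting is specific to the paper and is not reproduced here.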

3.
Ternary Error-Correcting Output Codes (ECOC), which can unify most state-of-the-art decomposition frameworks such as one-versus-one, one-versus-all, sparse coding, and dense coding, are considered more flexible for modeling multiclass classification problems than binary ECOC. Many decoding strategies have been proposed for ternary ECOC in the earlier literature, but few of them work with posterior probabilities, which can be regarded as a Bayes decision rule and hence usually yield better performance. Passerini et al. (2004) [16] proposed a decoding strategy based on posterior probabilities. However, according to the analysis in this paper, their method suffers from certain defects and results in bias. To overcome this, we propose a variant that refines the decomposition of the probability to obtain smoother estimates. Our bias-variance analysis shows that the decrease in error achieved by our variant is due to a decrease in variance. In addition, we extend an efficient method of obtaining posterior probabilities, based on the linear rule for the decoding process in binary ECOC, to ternary ECOC. On ten benchmark datasets, we observe that the two decoding strategies based on posterior probabilities presented in this paper perform better than the other strategies in the earlier references.
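For illustration only, below is a hedged sketch of ternary ECOC with a simple probabilistic decoding: each column of the coding matrix trains a binary probabilistic classifier on the classes coded +1/-1 (classes coded 0 are ignored), and a test sample is assigned to the class whose codeword has the highest average log posterior over its non-zero entries. This is a generic scheme in the spirit of decoding by posterior probabilities, not the estimator of Passerini et al. nor the refinement proposed in the paper.

```python
# Ternary ECOC with a simple posterior-probability decoding (illustrative sketch).
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_ecoc(X, y, M):
    """y: integer class ids 0..C-1; M: (C, n_cols) coding matrix with entries in {-1, 0, +1}."""
    clfs = []
    for j in range(M.shape[1]):
        codes = M[y, j]                 # code of each sample's class in column j
        keep = codes != 0               # classes coded 0 do not take part in this dichotomy
        clf = LogisticRegression(max_iter=1000).fit(X[keep], codes[keep])
        clfs.append(clf)
    return clfs

def decode_ecoc(X, M, clfs):
    C = M.shape[0]
    # p[:, j] = estimated posterior P(code = +1 | x) for dichotomy j
    p = np.column_stack([c.predict_proba(X)[:, list(c.classes_).index(1)] for c in clfs])
    scores = np.zeros((X.shape[0], C))
    for c in range(C):
        nz = M[c] != 0
        # average log-probability of the observed codeword entries of class c
        probs = np.where(M[c, nz] == 1, p[:, nz], 1.0 - p[:, nz])
        scores[:, c] = np.log(np.clip(probs, 1e-12, 1.0)).mean(axis=1)
    return scores.argmax(axis=1)
```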

4.
To address the problem of missing subjective information in machine learning, a new transfer group-probability learning machine oriented to shared data (TGPLM-CD) is proposed. Based on the structural risk minimization model, the method incorporates the knowledge contained in the source domain and the group-probability information of the class labels in the target domain, in particular the data shared between the domains, into the learning framework. This transfers knowledge from the source domain to the target domain and improves classification accuracy when the data available in the domain under study are insufficient. Experimental results on a large number of datasets verify the effectiveness of the proposed method.

5.
Microarray data are often characterized by high dimensionality and small sample size, so their dimensionality needs to be reduced for better classification performance and greater computational efficiency of the learning model. The minimum redundancy maximum relevance (mRMR) criterion, which is widely used to reduce the dimensionality of such data, requires discretization and the setting of external parameters. We propose an incremental formulation of the trace ratio of the scatter matrices to determine a relevant set of genes, which involves neither discretization nor external parameter setting. It is shown analytically that the proposed incremental formulation is computationally more efficient than its batch formulation. Extensive experiments on 14 well-known publicly available microarray cancer datasets demonstrate that the proposed method performs better than the well-known mRMR method. Statistical tests also show that the proposed method is significantly better than the mRMR method.
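A small illustrative sketch (not the authors' formulation) of greedy feature selection with a trace-ratio criterion: per-feature between-class and within-class scatters are precomputed, and features are added one at a time so that the ratio of the summed between-class scatter to the summed within-class scatter of the selected set is maximised. The greedy scheme and variable names are assumptions for illustration; the paper's incremental update rules are not reproduced.

```python
# Greedy trace-ratio feature (gene) selection, illustrative sketch.
import numpy as np

def scatter_per_feature(X, y):
    classes, mean = np.unique(y), X.mean(axis=0)
    sb = np.zeros(X.shape[1])          # between-class scatter of each feature
    sw = np.zeros(X.shape[1])          # within-class scatter of each feature
    for c in classes:
        Xc = X[y == c]
        sb += len(Xc) * (Xc.mean(axis=0) - mean) ** 2
        sw += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return sb, sw

def greedy_trace_ratio(X, y, n_select=50, eps=1e-12):
    sb, sw = scatter_per_feature(X, y)
    selected, sum_b, sum_w = [], 0.0, 0.0
    candidates = set(range(X.shape[1]))
    for _ in range(min(n_select, X.shape[1])):
        # pick the feature whose inclusion maximises tr(Sb_S) / tr(Sw_S)
        best = max(candidates, key=lambda f: (sum_b + sb[f]) / (sum_w + sw[f] + eps))
        selected.append(best)
        sum_b, sum_w = sum_b + sb[best], sum_w + sw[best]
        candidates.remove(best)
    return selected
```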

6.
Decision theory shows that the optimal decision is a function of the posterior class probabilities. More specifically, in binary classification the optimal decision is based on comparing the posterior probabilities with some threshold; the most accurate estimates of the posterior probabilities are therefore required near these decision thresholds. This paper discusses the design of objective functions that provide more accurate estimates of the probability values, taking into account the characteristics of each decision problem. We propose learning algorithms based on the stochastic gradient minimization of these loss functions. We show that the performance of the classifier improves when these algorithms behave like sample selectors: samples near the decision boundary are the most relevant during learning.
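The paper designs dedicated loss families; the following is only a rough, assumption-laden illustration of the "sample selector" behaviour it describes: a logistic model trained by stochastic gradient descent on a cross-entropy whose per-sample weight grows as the current predicted probability approaches the decision threshold, so that samples near the decision boundary dominate learning. The Gaussian weighting function is invented for illustration and is not the objective proposed in the paper.

```python
# SGD on a threshold-weighted cross-entropy (illustrative, not the paper's loss).
import numpy as np

def sgd_threshold_weighted(X, y, threshold=0.5, lr=0.1, epochs=50, tau=0.1, seed=0):
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            p = 1.0 / (1.0 + np.exp(-(X[i] @ w + b)))      # current posterior estimate
            # emphasise samples whose estimate lies close to the decision threshold
            weight = np.exp(-((p - threshold) ** 2) / (2 * tau ** 2))
            grad = weight * (p - y[i])                      # cross-entropy gradient, weight held fixed
            w -= lr * grad * X[i]
            b -= lr * grad
    return w, b
```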

7.
A Fast New Algorithm for Multi-Source Cross-Domain Data Classification   Cited by: 1 (self-citations: 0, citations by others: 1)
顾鑫, 王士同, 许敏. 自动化学报 (Acta Automatica Sinica), 2014, 40(3): 531-547
The purpose of cross-domain learning and classification is to transfer the results of supervised learning on multiple source domains to a target domain, so as to classify the unlabeled target domain. Current cross-domain learning methods generally focus on learning from a single source domain to the target domain and on small sample sizes; such methods adapt poorly across domains and break down on large-sample data, which directly harms the accuracy and efficiency of cross-domain classification. To exploit as much useful data from related domains as possible, this paper proposes a multiple-source cross-domain classification algorithm (Multiple Sources Cross-Domain Classification, MSCC). Based on the logistic regression model and a consensus method, both shown effective in numerous experiments, MSCC builds several source-domain classifiers that jointly guide the classification of the target-domain data. To make full and efficient use of large-sample source-domain data and support fast computation on large samples, MSCC is combined with the recent CDdual (dual coordinate descent) algorithm to obtain the fast variant MSCC-CDdual, together with the corresponding theoretical analysis. Experimental results on artificial, text, and image datasets show that the algorithm achieves high classification accuracy, fast running speed, and good domain adaptability on large-sample datasets. The main contributions are threefold: 1) a new consensus method for multi-source cross-domain classification, which makes it possible to develop MSCC into the fast algorithm MSCC-CDdual; 2) the fast algorithm MSCC-CDdual itself, which suits both small-sample and large-sample datasets; and 3) the distinctive advantage MSCC-CDdual exhibits on high-dimensional datasets compared with other algorithms.
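A hypothetical minimal sketch of the multi-source idea described above (not the MSCC/MSCC-CDdual implementation): one logistic regression classifier is trained per source domain, and the target-domain samples are labelled by a simple consensus, the average of the source classifiers' predicted posteriors. The paper's specific consensus method, the CDdual solver, and the large-sample optimisations are not reproduced.

```python
# Multi-source consensus labelling of an unlabeled target domain (illustrative sketch).
import numpy as np
from sklearn.linear_model import LogisticRegression

def msc_consensus(source_domains, X_target):
    """source_domains: list of (X_s, y_s) labelled source datasets sharing one label set."""
    clfs = [LogisticRegression(max_iter=1000).fit(Xs, ys) for Xs, ys in source_domains]
    # consensus: average the posterior estimates of all source classifiers
    avg_proba = np.mean([clf.predict_proba(X_target) for clf in clfs], axis=0)
    # columns align because every source sees the same sorted label set
    return clfs[0].classes_[avg_proba.argmax(axis=1)], avg_proba
```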

8.
Training multimodal models in deep learning usually requires large amounts of high-quality labeled data of different types, such as images, text, and audio. However, obtaining large-scale multimodal labeled data is a challenging and expensive task. To address this problem, active learning has been widely applied as an effective learning paradigm: by selectively labeling the most informative samples, it reduces labeling cost and improves model performance. Existing active learning methods, however, often suffer from inefficient data scanning and data relocation; when an index has to be updated over a wide range, the maintenance cost becomes enormous. To solve these problems, this paper proposes So-CBI, an efficient sample retrieval technique for multimodal model training. The method perceives the inter-class boundary points during model training to accurately evaluate the value of each sample to the model, and designs a semi-ordered, efficient sample index that combines data ordering information with partial orderedness to reduce index maintenance cost and time overhead. Comparative experiments against traditional active learning training methods on several multimodal datasets verify the effectiveness of So-CBI for the training-sample retrieval problem in active learning.

9.
In the past few years, active learning has been reasonably successful and has drawn a lot of attention. However, recent active learning methods have focused on strategies in which a large unlabeled dataset has to be reprocessed at each learning iteration. As datasets grow, these strategies become inefficient or even a tremendous computational challenge. To address these issues, we propose an effective and efficient active learning paradigm that attains a significant reduction in the size of the learning set by applying an a priori process of identification and organization of a small relevant subset. Furthermore, the concomitant classification and selection processes enable the classification of a very small number of samples while selecting the informative ones. Experimental results show that the proposed paradigm achieves high accuracy quickly with minimal user interaction, further improving its efficiency.

10.
We present a novel formulation for pattern recognition in biomedical data. We adopt a binary recognition scenario where a control dataset contains samples of one class only, while a mixed dataset contains an unlabeled collection of samples from both classes. The mixed-dataset samples that belong to the second class are identified by estimating the posterior probabilities of samples for being in the control or the mixed dataset. Experiments on synthetic data established better detection performance than possible alternatives. The fitness of the method for biomedical data analysis was further demonstrated on real multi-color flow cytometry and multi-channel electroencephalography data.
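A hedged sketch of the general idea, not the authors' estimator: train a probabilistic classifier to discriminate control samples from mixed samples; mixed-set samples that the classifier confidently assigns to the "mixed" side look unlike the control class and are flagged as candidate second-class samples. The logistic-regression discriminator and the flagging threshold are illustrative assumptions.

```python
# Flag second-class samples in a mixed set via control-vs-mixed posteriors (sketch).
import numpy as np
from sklearn.linear_model import LogisticRegression

def flag_second_class(X_control, X_mixed, threshold=0.8):
    X = np.vstack([X_control, X_mixed])
    d = np.concatenate([np.zeros(len(X_control)), np.ones(len(X_mixed))])  # dataset membership label
    clf = LogisticRegression(max_iter=1000).fit(X, d)
    # posterior of belonging to the mixed dataset; high values = unlike the control class
    p_mixed = clf.predict_proba(X_mixed)[:, 1]
    return p_mixed > threshold, p_mixed
```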

11.
The concept of a “mutualistic teacher” is introduced for unsupervised learning of the mean vectors of the components of a mixture of multivariate normal densities when the number of classes is also unknown. The unsupervised learning problem is formulated here as a multi-stage quasi-supervised problem incorporating a cluster approach. The mutualistic teacher creates a quasi-supervised environment at each stage by picking out “mutual pairs” of samples and assigning identical (but unknown) labels to the individuals of each mutual pair. The number of classes, if not specified, can be determined at an intermediate stage. The risk in assigning identical labels to the individuals of mutual pairs is estimated. Results of some simulation studies are presented.

12.
Support vector machines (SVMs) are a popular class of supervised learning algorithms, and are particularly applicable to large and high-dimensional classification problems. Like most machine learning methods for data classification and information retrieval, they require manually labeled data samples in the training stage. However, manual labeling is a time-consuming and error-prone task. One possible solution to this issue is to exploit the large number of unlabeled samples that are easily accessible via the internet. This paper presents a novel active learning method for text categorization. The main objective of active learning is to reduce the labeling effort, without compromising classification accuracy, by intelligently selecting which samples should be labeled. The proposed method selects a batch of informative samples using the posterior probabilities provided by a set of multi-class SVM classifiers, and these samples are then manually labeled by an expert. Experimental results indicate that the proposed active learning method significantly reduces the labeling effort while simultaneously enhancing the classification accuracy.
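A minimal sketch of batch selection by SVM posterior probabilities, under the assumption that Platt-scaled probabilities from scikit-learn's SVC stand in for the paper's multi-class SVM posteriors; the samples with the smallest margin between the top two class posteriors are returned for manual labelling. The criterion here is plain batch uncertainty sampling, not necessarily the selection rule of the paper.

```python
# Batch uncertainty sampling with posterior probabilities from SVM classifiers (sketch).
import numpy as np
from sklearn.svm import SVC

def select_batch(X_labeled, y_labeled, X_unlabeled, batch_size=20):
    clf = SVC(kernel="linear", probability=True).fit(X_labeled, y_labeled)
    proba = clf.predict_proba(X_unlabeled)
    top2 = np.sort(proba, axis=1)[:, -2:]            # two largest class posteriors per sample
    margin = top2[:, 1] - top2[:, 0]                 # small margin = informative sample
    return np.argsort(margin)[:batch_size]           # indices to send to the expert
```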

13.
By incorporating prior knowledge in the form of implications into the extreme learning machine (ELM), a novel knowledge-based extreme learning machine (KBELM) formulation is proposed in this work. In this approach, the nonlinear prior-knowledge implications are converted into linear inequalities and then included as linear equality constraints in the ELM formulation. The proposed KBELM formulation has the advantage that it leads to solving a system of linear equations. The effectiveness of the proposed approach is demonstrated on three synthetic datasets and the publicly available Wisconsin Prognostic Breast Cancer dataset by comparing its results with those of ELM and optimally pruned ELM using additive and radial basis function hidden nodes.

14.
The single-hidden-layer feed-forward neural network is one of the most widely used intelligent modeling tools, but on small sample sets its traditional learning algorithms tend to over-fit, and when the data contain substantial noise the learned model is not robust and is very sensitive to that noise. To remedy this shortcoming, a robust learning algorithm for single-hidden-layer feed-forward neural networks on small datasets is proposed. By introducing an ε-insensitive learning metric and a structural risk term, the proposed algorithm effectively overcomes the defects of the traditional learning algorithms and exhibits good robustness. Experiments on both synthetic and real datasets confirm these advantages.
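A loosely related sketch, not the authors' algorithm: the hidden layer is generated at random as in an ELM, and the output weights are fitted with an ε-insensitive loss plus an L2 structural-risk term via scikit-learn's LinearSVR, so that small residuals (and hence moderate noise) do not drive the fit. The reuse of LinearSVR and all parameter values are assumptions made for illustration.

```python
# Random hidden layer + epsilon-insensitive, L2-regularised output weights (sketch).
import numpy as np
from sklearn.svm import LinearSVR

def fit_robust_slfn(X, y, n_hidden=100, epsilon=0.1, C=1.0, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))       # random input weights
    b = rng.normal(size=n_hidden)                     # random hidden biases
    H = np.tanh(X @ W + b)                            # hidden-layer outputs
    # epsilon-insensitive loss ignores residuals below epsilon; C trades it off
    # against the L2 (structural risk) term on the output weights.
    reg = LinearSVR(epsilon=epsilon, C=C, max_iter=10000).fit(H, y)
    return W, b, reg

def predict_robust_slfn(X, W, b, reg):
    return reg.predict(np.tanh(X @ W + b))
```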

15.
Many types of nonlinear classifiers have been proposed to automatically generate land-cover maps from satellite images. Some are based on the estimation of posterior class probabilities, whereas others estimate the decision boundary directly. In this paper, we propose a modular design able to focus the learning process on the decision boundary by using posterior probability estimates. To do so, we use a self-configuring architecture that incorporates specialized modules to deal with conflicting classes, and we apply a learning algorithm that concentrates learning on the posterior probability regions that are critical for the decision problem stated by the user-defined misclassification costs. Moreover, we show that by filtering the posterior probability map, the impulsive noise that commonly affects automatic land-cover classification can be significantly reduced. Experimental results show the effectiveness of the proposed solutions on real multispectral and hyperspectral images compared with other typical approaches that are not based on probability estimates, such as support vector machines.

16.
In this paper, we address the problem of learning a classifier for the classification of spoken characters. We present a solution based on the Group Method of Data Handling (GMDH) learning paradigm for the development of a robust abductive network classifier. We improve the reliability of the classification process by introducing the concept of a multiple abductive network classifier system. We evaluate the performance of the proposed classifier using three different speech datasets, including spoken Arabic digits, spoken English letters, and spoken Pashto digits. The performance of the proposed classifier surpasses that reported in the literature for other classification techniques on the same speech datasets.

17.
To address the poor detection performance of the single-shot refinement object detector (RefineDet) on small-sample classes in class-imbalanced datasets, a partially weighted loss function, SWLoss, is proposed. First, the reciprocal of the number of samples of each class in every training batch is used as a heuristic inter-class balance factor to weight the different classes in the classification loss, increasing the attention paid to learning the small-sample classes. A multi-task balance factor is then introduced to weight the classification and regression losses, narrowing the gap between the learning rates of the two tasks. Finally, experiments are conducted on the Pascal VOC 2007 dataset and a dot-matrix character dataset, both of which have large differences in the number of samples per object class. The results show that, compared with the original RefineDet, the SWLoss-based RefineDet clearly improves the detection accuracy of small-sample classes, raising the mean average precision (mAP) on the two datasets by 1.01 and 9.86 percentage points, respectively; compared with RefineDet based on the loss-balance function and on the weighted pairwise loss, the SWLoss-based RefineDet raises mAP on the two datasets by 0.68 and 4.73 percentage points, and by 0.49 and 1.48 percentage points, respectively.
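An illustrative sketch of the weighting scheme described above, written as a plain NumPy function rather than inside RefineDet: classes in a batch are weighted by the reciprocal of their sample counts in that batch, and a multi-task factor balances the classification and regression losses. The names, the rescaling of the weights, and the default value of the multi-task factor are assumptions, not the paper's exact definition.

```python
# SWLoss-style batch-balanced classification loss plus multi-task weighting (sketch).
import numpy as np

def batch_class_weights(labels, n_classes):
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    w = np.where(counts > 0, 1.0 / np.maximum(counts, 1), 0.0)   # reciprocal of batch counts
    return w * counts.sum() / max(w @ counts, 1e-12)             # rescale to keep the loss magnitude

def weighted_detection_loss(cls_log_probs, labels, reg_loss, n_classes, task_balance=1.0):
    """cls_log_probs: (N, n_classes) log-probabilities; labels: (N,) integer class ids."""
    w = batch_class_weights(labels, n_classes)
    ce = -(w[labels] * cls_log_probs[np.arange(len(labels)), labels]).mean()
    return ce + task_balance * reg_loss       # balance the classification and regression tasks
```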

18.
Objective: Person re-identification aims to match pedestrian targets captured by different cameras at different times and places. Owing to variations in illumination, background, occlusion, viewpoint, and pose, the appearance of the same target under different cameras can differ considerably. Current research focuses on feature representation and metric learning. Many metric learning methods have achieved good results on person re-identification, but on diverse datasets a single global metric can hardly accommodate the variety of features. Local metric learning has therefore been proposed, but such methods usually require solving complex convex optimization problems and are computationally expensive. Method: Following the idea of local metric learning and building on recently proposed global metric learning methods such as XQDA (cross-view quadratic discriminant analysis) and MLAPG (metric learning by accelerated proximal gradient), this paper proposes a framework that integrates global and local metric learning. A Gaussian mixture model is used to cluster the training samples; local metric learning is performed within each cluster, while global metric learning is performed on the whole training set. For a test sample, the local and global metric matrices are combined with weights given by the posterior probabilities of the sample under the components of the Gaussian mixture model, and the combined metric is used to measure similarity. In particular, for the MLAPG algorithm, the posterior probabilities of the samples under the Gaussian components are used to adjust the loss weights of different samples in the objective function, further improving its performance. Results: Experiments on the VIPeR, PRID 450S, and QMUL GRID datasets verify the effectiveness of the proposed integrated global-local metric learning method. Compared with global methods such as XQDA and MLAPG, the matching accuracy on the VIPeR dataset improves by about 2.0%, with improvements of varying degrees on the other datasets. With different feature representations, the matching accuracy improves by roughly 1.3% to 3.4% over the global methods. Conclusion: The proposed framework effectively integrates global and local metric learning; it improves the performance of several global metric learning algorithms while avoiding the complex computation of local metric learning algorithms. The experiments show that, for different feature representations, the proposed integrated global-local metric learning framework consistently improves on the global metric learning methods.
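The XQDA/MLAPG learning itself is not reproduced here; the sketch below only illustrates the combination step under heavy assumptions: a Gaussian mixture model clusters the training features, a global metric matrix and one local metric matrix per cluster are assumed to have been learnt elsewhere (the placeholders below use regularised inverse covariance matrices), and the distance between a pair of test samples mixes these matrices with weights given by the GMM posteriors.

```python
# Posterior-weighted mixture of a global metric and per-cluster local metrics (sketch).
# Assumes each cluster contains several samples so its covariance is well defined.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_metrics(X_train, n_clusters=3, ridge=1e-3, seed=0):
    gmm = GaussianMixture(n_components=n_clusters, random_state=seed).fit(X_train)
    d = X_train.shape[1]
    # placeholders for the learnt metrics: regularised inverse covariance matrices;
    # in the paper these would come from XQDA / MLAPG, globally and per cluster.
    M_global = np.linalg.inv(np.cov(X_train.T) + ridge * np.eye(d))
    hard = gmm.predict(X_train)
    M_local = [np.linalg.inv(np.cov(X_train[hard == k].T) + ridge * np.eye(d))
               for k in range(n_clusters)]
    return gmm, M_global, M_local

def combined_distance(x, y, gmm, M_global, M_local, alpha=0.5):
    post = gmm.predict_proba(np.vstack([x, y])).mean(axis=0)   # average posteriors of the pair
    M = alpha * M_global + (1 - alpha) * sum(p * Mk for p, Mk in zip(post, M_local))
    diff = x - y
    return float(diff @ M @ diff)
```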

19.
Noisy labels are common in real-world datasets and severely degrade the learning of deep neural networks. To address this problem, a method for identifying and relabeling noisy-label data based on label-difference learning is proposed. The method designs two different pseudo-label generation strategies, uses the clean data identified by a base network to build an artificial noisy dataset, and computes the label-difference vector or label-difference matrix of that dataset. With the goal of strengthening the association between similar classes, two noise-learning networks are designed to learn the noise probability of each sample directly: a label-difference vector network built from fully connected layers and a label-difference matrix network built with single-row convolution kernels. A threshold linearly related to the noise rate is then used to separate clean data from noisy data. Experiments are designed to analyze the factors that affect the recognition performance of the networks, including the pseudo-label generation strategy, the network structure, and the number of training iterations. Tests on public datasets show that, under various noise distributions, the algorithm significantly improves the precision and recall on the noisy data while keeping the precision and recall on the clean data essentially stable, with maximum improvements of 16.45% and 21.01%, respectively.

20.
In this paper, we propose a novel method for semi-supervised learning, called logistic label propagation (LLP). The proposed method employs the logistic function to classify input pattern vectors, similarly to logistic regression. To cope with unlabeled samples as well as labeled ones in the semi-supervised learning framework, the logistic functions are learnt by using similarities between samples, in a manner similar to label propagation. In the proposed method, these two approaches, logistic regression and label propagation, are effectively combined in terms of posterior probabilities. LLP estimates the labels of new input samples using the learnt logistic function, whereas label propagation has to re-optimize all labels whenever a new input sample arrives. In addition, we suggest a way to set the parameters and the initialization properly, which frees users from tuning a parameter value by trial and error. In classification experiments (label estimation) in the semi-supervised learning framework, the proposed method exhibits favorable performance compared with the other methods.
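As a rough illustration of the idea, under stated assumptions and not the LLP formulation itself: labels are first propagated over a similarity graph with scikit-learn's LabelPropagation, and a logistic regression is then fitted to the propagated labels, so that new samples can be classified by the learnt logistic function without re-running the propagation. LLP couples the two steps through posterior probabilities in a single learning problem; this two-stage pipeline is only an approximation for demonstration.

```python
# Two-stage stand-in for logistic label propagation: propagate, then fit a logistic model.
import numpy as np
from sklearn.semi_supervised import LabelPropagation
from sklearn.linear_model import LogisticRegression

def llp_like(X, y_partial, gamma=20.0):
    """y_partial: class ids for labelled samples, -1 for unlabelled ones."""
    lp = LabelPropagation(kernel="rbf", gamma=gamma).fit(X, y_partial)
    # fit a logistic function to the propagated labels; unlike pure label propagation,
    # the resulting model can score unseen samples without re-optimising all labels
    clf = LogisticRegression(max_iter=1000).fit(X, lp.transduction_)
    return clf

# usage: new_labels = llp_like(X, y_partial).predict(X_new)
```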
