Similar Literature
20 similar records found.
1.
Many types of nonlinear classifiers have been proposed to automatically generate land-cover maps from satellite images. Some are based on the estimation of posterior class probabilities, whereas others estimate the decision boundary directly. In this paper, we propose a modular design able to focus the learning process on the decision boundary by using posterior probability estimates. To do so, we use a self-configuring architecture that incorporates specialized modules to deal with conflicting classes, and we apply a learning algorithm that concentrates learning on the posterior probability regions that are critical for the decision problem stated by the user-defined misclassification costs. Moreover, we show that by filtering the posterior probability map, the impulsive noise that commonly affects automatic land-cover classification can be significantly reduced. Experimental results on real multi- and hyperspectral images show the effectiveness of the proposed solutions compared with other typical approaches that are not based on probability estimates, such as Support Vector Machines.
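As an illustration of suppressing impulsive ("salt-and-pepper") noise in classified maps: the paper above filters the posterior probability map itself, but the basic idea can be sketched with a simple majority (mode) filter over hard class labels. This is a hypothetical minimal example, not the paper's method.

```python
from collections import Counter

def majority_filter(label_map, size=3):
    """Replace each label by the most frequent label in its window.

    label_map: 2-D list of integer class labels. A 3x3 majority filter
    removes isolated misclassified pixels while preserving large regions.
    """
    h, w = len(label_map), len(label_map[0])
    r = size // 2
    out = [row[:] for row in label_map]
    for i in range(h):
        for j in range(w):
            window = [label_map[a][b]
                      for a in range(max(0, i - r), min(h, i + r + 1))
                      for b in range(max(0, j - r), min(w, j + r + 1))]
            out[i][j] = Counter(window).most_common(1)[0][0]
    return out

noisy = [[1, 1, 1],
         [1, 2, 1],   # isolated "impulse" label in a homogeneous region
         [1, 1, 1]]
clean = majority_filter(noisy)
```

The isolated label 2 is voted out by its neighborhood, which is exactly the kind of impulsive error the abstract reports reducing.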

2.
We report an empirical study of n-gram posterior probability confidence measures for statistical machine translation (SMT). We first describe an efficient and practical algorithm for rapidly computing n-gram posterior probabilities from large translation word lattices. These probabilities are shown to be a good predictor of whether or not the n-gram is found in human reference translations, motivating their use as a confidence measure for SMT. Comprehensive n-gram precision and word coverage measurements are presented for a variety of different language pairs, domains and conditions. We analyze the effect on reference precision of using single or multiple references, and compare the precision of posteriors computed from k-best lists to those computed over the full evidence space of the lattice. We also demonstrate improved confidence by combining multiple lattices in a multi-source translation framework.
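The lattice computation described above is not reproduced here, but the simpler k-best variant the abstract compares against can be sketched: weight each hypothesis by its normalized score (a softmax over log-scores) and sum the weights of hypotheses that contain each n-gram. All names and the toy scores are hypothetical.

```python
import math
from collections import defaultdict

def ngram_posteriors(kbest, n=2):
    """Estimate n-gram posterior probabilities from a k-best list.

    kbest: list of (hypothesis_tokens, log_score) pairs.
    Returns {ngram: posterior}, the total normalized probability mass
    of hypotheses containing that n-gram.
    """
    # Convert log-scores to normalized posterior weights (softmax).
    max_log = max(s for _, s in kbest)
    weights = [math.exp(s - max_log) for _, s in kbest]
    z = sum(weights)
    post = defaultdict(float)
    for (toks, _), w in zip(kbest, weights):
        seen = {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}
        for g in seen:  # count each n-gram at most once per hypothesis
            post[g] += w / z
    return dict(post)

kbest = [("the cat sat".split(), -1.0),
         ("the cat sits".split(), -2.0),
         ("a cat sat".split(), -3.0)]
p = ngram_posteriors(kbest, n=2)
```

An n-gram shared by all high-scoring hypotheses gets a posterior near 1, which is why these values predict agreement with reference translations.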

3.
Although boosting methods [9, 23] for creating compositions of weak hypotheses are among the best methods of machine learning developed so far [4], they are known to degrade in performance on noisy data and overlapping classes. In this paper we consider binary classification and propose a reduction of the overlapping-classes classification problem to a deterministic problem. We also derive a new upper generalization bound for weighted averages of weak hypotheses, which uses posterior estimates for the training objects and is based on the proposed reduction. Given accurate posterior estimates, this bound is lower than the existing bound of Schapire et al. [22]. We design an AdaBoost-like algorithm that optimizes the proposed generalization bound, and show that, when supplied with good posterior estimates, it outperforms standard AdaBoost on real-world data sets.
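For reference, the standard AdaBoost baseline mentioned above can be sketched with one-dimensional threshold stumps. This is a minimal illustration of plain AdaBoost on hypothetical data; the paper's posterior-weighted variant is not reproduced.

```python
import math

def adaboost(X, y, rounds=10):
    """Minimal AdaBoost with threshold stumps on 1-D data.

    X: list of floats; y: labels in {-1, +1}.
    Returns a list of (alpha, threshold, polarity) weak hypotheses.
    """
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        best = None
        for t in sorted(set(X)):          # exhaustive stump search
            for pol in (1, -1):
                preds = [pol if x >= t else -pol for x in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, t, pol, preds)
        err, t, pol, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard the log
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, pol))
        # Reweight: misclassified examples gain weight.
        w = [wi * math.exp(-alpha * yi * p) for wi, p, yi in zip(w, preds, y)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    s = sum(a * (pol if x >= t else -pol) for a, t, pol in ensemble)
    return 1 if s >= 0 else -1

X = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [-1, -1, -1, 1, 1, 1]
model = adaboost(X, y, rounds=5)
```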

4.
This paper presents a novel image representation method for generic object recognition that uses higher-order local autocorrelations on posterior probability images. The proposed method is an extension of the bag-of-features approach to posterior probability images. The standard bag-of-features approach can be viewed, approximately, as a method that assigns an image to the category whose summed posterior probabilities over the posterior probability image are maximal. By using local autocorrelations of posterior probability images, however, the proposed method extracts richer information than the standard bag-of-features. Experimental results reveal that the proposed method achieves higher classification performance than the standard bag-of-features method.

5.
《Information Fusion》2002,3(4):267-276
Multimodal fusion for identity verification has already shown great improvement over unimodal algorithms. In this paper, we propose to integrate confidence measures into the fusion process. We present a comparison of three different methods for generating such confidence information from unimodal identity verification systems. These methods can be used either to enhance the performance of a multimodal fusion algorithm or to obtain a confidence level on the decisions taken by the system. All the algorithms are compared on the same benchmark database, namely XM2VTS, which contains both speech and face information. Results show that some confidence measures yielded statistically significant performance improvements, while other measures produced reliable confidence levels for the fusion decisions.

6.
Obtaining good probability estimates is imperative for many applications. The increased uncertainty and typically asymmetric costs surrounding rare events increase this need. Experts (and classification systems) often rely on probabilities to inform decisions. However, we demonstrate that class probability estimates obtained via supervised learning in imbalanced scenarios systematically underestimate the probabilities for minority class instances, despite ostensibly good overall calibration. To our knowledge, this problem has not previously been explored. We propose a new metric, the stratified Brier score, to capture class-specific calibration, analogous to the per-class metrics widely used to assess the discriminative performance of classifiers in imbalanced scenarios. We propose a simple, effective method to mitigate the bias of probability estimates for imbalanced data that bags estimators independently calibrated over balanced bootstrap samples. This approach drastically improves performance on the minority instances without greatly affecting overall calibration. We extend our previous work in this direction by providing ample additional empirical evidence for the utility of this strategy, using both support vector machines and boosted decision trees as base learners. Finally, we show that additional uncertainty can be exploited via a Bayesian approach by considering posterior distributions over bagged probability estimates.
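A minimal sketch of the stratified (per-class) Brier score for the binary case: the usual Brier score computed separately over each class, so that poor calibration on a rare class is not masked by good calibration on the majority class. The data values are hypothetical.

```python
def stratified_brier(y_true, p_hat):
    """Per-class (stratified) Brier score for binary labels.

    y_true: labels in {0, 1}; p_hat: predicted P(y = 1).
    Returns (brier_class0, brier_class1), the mean squared error of the
    probability estimates computed separately over each class.
    """
    by_class = {0: [], 1: []}
    for y, p in zip(y_true, p_hat):
        by_class[y].append((p - y) ** 2)
    return (sum(by_class[0]) / len(by_class[0]),
            sum(by_class[1]) / len(by_class[1]))

y = [0, 0, 0, 0, 1]
p = [0.1, 0.2, 0.1, 0.2, 0.4]   # minority "1" badly under-estimated
b0, b1 = stratified_brier(y, p)
```

Here the overall Brier score looks acceptable, but the class-1 component exposes the systematic underestimation on the minority class that the abstract describes.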

7.
When we use a mathematical model to represent information, we can obtain a closed and convex set of probability distributions, also called a credal set. This type of representation involves two types of uncertainty, called conflict (or randomness) and non-specificity, respectively. The imprecise Dirichlet model (IDM) allows us to carry out inference about the probability distribution of a categorical variable, yielding a special type of credal set (probability intervals). In this paper, we present tools for obtaining the uncertainty functions on probability intervals obtained with the IDM, so that these functions can be calculated in any application of the model.
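The probability intervals produced by the IDM have a simple closed form: with observed counts n_i, total N, and hyperparameter s, each category receives the interval [n_i/(N+s), (n_i+s)/(N+s)]. A minimal sketch with hypothetical counts:

```python
def idm_intervals(counts, s=1.0):
    """Imprecise Dirichlet model: lower/upper probabilities per category.

    counts: observed counts n_i for each category; s: IDM hyperparameter.
    Returns a list of (lower, upper) with lower = n_i / (N + s) and
    upper = (n_i + s) / (N + s).
    """
    N = sum(counts)
    return [(n / (N + s), (n + s) / (N + s)) for n in counts]

intervals = idm_intervals([6, 3, 1], s=2.0)
```

The lower probabilities sum to less than 1 and the upper ones to more than 1; the gap (s / (N + s)) is the non-specificity that shrinks as data accumulate.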

8.
Multiclass posterior probability support vector machines
Tao et al. have recently proposed the posterior probability support vector machine (PPSVM), which uses soft labels derived from estimated posterior probabilities to be more robust to noise and outliers. Their model uses a window-based density estimator to calculate the posterior probabilities and is a binary classifier. We propose a neighbor-based density estimator and also extend the model to the multiclass case. Our bias-variance analysis shows that the decrease in error achieved by PPSVM is due to a decrease in bias. On 20 benchmark data sets, we observe that PPSVM obtains accuracy results that are higher than or comparable to those of the canonical SVM while using significantly fewer support vectors.

9.
10.
Application of posterior probabilities to multiclass support vector machines
Support vector machines are a new classification-rule mining method based on statistical learning theory. Building on existing multiclass support vector machines, this paper first proposes a geometric-distance multiclass support vector classifier; it then extends the posterior-probability output of binary support vector machines to the multiclass case, which avoids iterative algorithms and improves prediction accuracy while keeping prediction fast. Numerical experiments show that both methods generalize well and markedly improve the classifier's accuracy on unseen samples.

11.
Assessing the accuracy of land cover maps is often prohibitively expensive because of the difficulty of collecting a statistically valid probability sample from the classified map. Even when post-classification sampling is undertaken, cost and accessibility constraints may result in imprecise estimates of map accuracy. If the map is constructed via supervised classification, then the training sample provides a potential alternative source of data for accuracy assessment. Yet unless the training sample is collected by probability sampling, the estimates are, at best, of uncertain quality, and may be substantially biased. This article discusses a new approach to map accuracy assessment based on maximum posterior probability estimators. Maximum posterior probability estimators are resistant to bias induced by non-representative sampling, and so are intended for situations in which the training sample is collected without using a statistical sampling design. The maximum posterior probability approach may also be used to increase the precision of estimates obtained from a post-classification sample. In addition to discussing maximum posterior probability estimators, this article reports on a simulation study comparing three approaches to estimating map accuracy: 1) post-classification sampling, 2) resampling the training sample via cross-validation, and 3) maximum posterior probability estimation. The simulation study showed substantial reductions in bias and improvements in precision for the maximum posterior probability and cross-validation estimators when the training sample was not representative of the map. In addition, combining an ordinary post-classification estimator with the maximum posterior probability estimator produced an estimator that was at least as precise as, and usually more precise than, the ordinary post-classification estimator.
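A minimal sketch of the ordinary post-classification estimators discussed above: overall, producer's, and user's accuracy computed from a confusion matrix. The maximum posterior probability estimator itself is not reproduced, and the sample counts are hypothetical.

```python
def map_accuracy(confusion):
    """Accuracy estimates from a post-classification confusion matrix.

    confusion[i][j] = number of samples with reference class i that the
    map labels as class j.
    Returns (overall, producers, users):
      overall   - fraction of all samples correctly mapped,
      producers - per-class recall (row-wise),
      users     - per-class precision (column-wise).
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    overall = sum(confusion[i][i] for i in range(k)) / total
    producers = [confusion[i][i] / sum(confusion[i]) for i in range(k)]
    users = [confusion[i][i] / sum(confusion[r][i] for r in range(k))
             for i in range(k)]
    return overall, producers, users

conf = [[45, 5],
        [10, 40]]
overall, prod, user = map_accuracy(conf)
```

These are the quantities a validation sample is collected to estimate; the article's point is what to do when that sample is unrepresentative.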

12.
This paper deals with the assessment of the reliability of predictions made in the context of the fuzzy inductive reasoning methodology. The reliability of predictions is assessed by means of two separate confidence measures, a proximity measure and a similarity measure. A time series and a single-input/single-output system are used as two different applications to study the viability of these confidence measures.

13.
Traditional parametric and nonparametric classifiers used for statistical pattern recognition have their own strengths and limitations. While parametric methods assume specific parametric models for the density functions or posterior probabilities of the competing classes, nonparametric methods are free from such assumptions. When the model assumptions are correct, parametric methods therefore outperform nonparametric classifiers, especially when the training sample is small; but violations of these assumptions often lead to poor performance by parametric classifiers in situations where nonparametric methods work well. In this article, we attempt to overcome these limitations and combine the strengths of the two approaches. The resulting classifiers, referred to as hybrid classifiers, behave like parametric classifiers when the model assumptions are valid but, unlike parametric classifiers, also provide safeguards against deviations from the parametric model assumptions. We propose some multiscale methods for hybrid classification, and their performance is evaluated on several simulated and benchmark data sets.

14.
The Kalman filter algorithm gives an analytical expression for the point estimate of the states, namely the mean of their posterior distribution. Conventional Bayesian state estimators have been developed under the assumption that the mean of the posterior of the states is the ‘best estimate’. While this may hold in cases where the posterior can be adequately approximated as a Gaussian distribution, in general it does not hold when the posterior is non-Gaussian. The posterior distribution, however, contains far more information about the states, regardless of its Gaussian or non-Gaussian nature. In this study, the information contained in the posterior distribution is explored and extracted to arrive at meaningful estimates of the states. The need to combine Bayesian state estimation with extraction of information from the posterior distribution is demonstrated in this work.
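A minimal scalar sketch of the Kalman recursion referred to above, returning the posterior mean (the point estimate) and the posterior variance after each observation. The model constants are hypothetical.

```python
def kalman_step(m, P, z, A=1.0, Q=0.01, H=1.0, R=0.1):
    """One predict/update cycle of a scalar Kalman filter.

    m, P: prior mean and variance of the state; z: new observation.
    A: state transition, Q: process noise variance,
    H: observation model, R: observation noise variance.
    Returns the posterior mean and variance after observing z.
    """
    # Predict step: propagate the state estimate through the dynamics.
    m_pred = A * m
    P_pred = A * P * A + Q
    # Update step: blend prediction and observation via the Kalman gain.
    K = P_pred * H / (H * P_pred * H + R)
    m_post = m_pred + K * (z - H * m_pred)
    P_post = (1 - K * H) * P_pred
    return m_post, P_post

m, P = 0.0, 1.0            # vague prior on the state
for z in [1.0, 1.1, 0.9, 1.05]:
    m, P = kalman_step(m, P, z)
```

For this linear-Gaussian model the posterior is exactly Gaussian, so the mean truly is the best estimate; the abstract's point is that for non-Gaussian posteriors this single number discards useful information.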

15.
Consumer credit scoring is often considered a classification task in which clients receive either a good or a bad credit status. Default probabilities provide more detailed information about the creditworthiness of consumers, and they are usually estimated by logistic regression. Here, we present a general framework for estimating individual consumer credit risks with machine learning methods. Since a probability is an expected value, all nonparametric regression approaches that are consistent for the mean are consistent for the probability estimation problem. Among others, random forests (RF), k-nearest neighbors (kNN), and bagged k-nearest neighbors (bNN) belong to this class of consistent nonparametric regression approaches. We apply these machine learning methods and an optimized logistic regression to a large dataset of complete payment histories of short-term installment credits. We demonstrate probability estimation in Random Jungle, an RF package written in C++ with a generalized framework for fast tree growing, probability estimation, and classification. We also describe an algorithm for tuning the terminal node size for probability estimation. We demonstrate that regression RF outperforms the optimized logistic regression model, kNN, and bNN on the test data of the short-term installment credits.
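The key observation above is that a probability is an expected value, so any regression method consistent for the mean estimates it by averaging 0/1 outcomes. A minimal kNN sketch with hypothetical data (Random Jungle itself is a C++ package and is not reproduced here):

```python
def knn_default_prob(train, query, k=3):
    """k-nearest-neighbour estimate of a default probability.

    train: list of (feature_vector, label) pairs with label 1 = default.
    Averaging the 0/1 labels of the k nearest neighbours is a
    nonparametric regression estimate of P(default | features).
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    neighbours = sorted(train, key=lambda t: dist(t[0], query))[:k]
    return sum(label for _, label in neighbours) / k

# Hypothetical one-feature credit data: higher value = riskier client.
train = [([0.1], 0), ([0.2], 0), ([0.3], 0), ([0.8], 1), ([0.9], 1)]
p = knn_default_prob(train, [0.85], k=3)
```

A regression random forest does the same averaging within terminal nodes, which is why tuning the terminal node size matters for probability estimation.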

16.
A computer algorithm for estimating the probabilities of any significant coronary obstruction and of triple-vessel/left-main obstructions was derived, validated, and compared with the assessments of cardiac clinician angiographers. The algorithm performed at least as well as the clinicians when the latter knew the identity of the patients whose angiograms they had decided to perform. The clinicians were more accurate when they did not know the identity of the subjects but worked from tabulated objective data. Referral and value-induced bias may affect physician judgment in assessing disease probability. The application of computer aids, or consultation with cardiologists not directly involved in patient management, may support more rational assessments and decision making.

17.
江静  陈渝  孙界平  琚生根 《计算机应用》2022,42(6):1789-1795
Pre-trained language models for text representation achieve high accuracy on various text classification tasks, but two problems remain. First, after computing the posterior probabilities of all classes, a pre-trained language model selects the class with the largest posterior probability as its final prediction; in many scenarios, however, the quality of the posterior probabilities provides more reliable information than the classification result alone. Second, the classifiers of pre-trained language models degrade when assigning different labels to semantically similar texts. To address these two problems, a model called PosCal-negative is proposed, combining posterior probability calibration with negative-example supervision. During training, the model dynamically penalizes, end-to-end, the discrepancy between the predicted probabilities and the empirical posterior probabilities, and it uses texts with different labels for negative supervision of the encoder, so that a distinct feature representation is generated for each class. Experimental results show that PosCal-negative achieves classification accuracies of 91.55% and 69.19% on two Chinese maternal-and-infant care text classification datasets, MATINF-C-AGE and MATINF-C-TOPIC, improvements of 1.13 and 2.53 percentage points, respectively, over the ERNIE model.

18.
Stochastic independence is an idealized relationship located at one end of a continuum of values measuring degrees of dependence. When modeling real-world systems, we are often interested not in the distinction between exact independence and any degree of dependence, but in that between weak, ignorable dependence and strong, substantial dependence. Good models capture significant deviance from independence and neglect approximate independence or dependence weaker than a noise threshold. This intuition is applied to learning the structure of Bayes nets from data. We determine the conditional posterior probabilities of structures given that the degree of dependence at each of their nodes exceeds a critical noise level. Deviance from independence is measured by mutual information. The arc probabilities are set equal to the probability that the mutual information provided by the neighbors of a node exceeds a given threshold. A χ2 approximation for the probability density function of mutual information is used. A large number of network structures, in which the arc probabilities are evaluated, is generated by stochastic simulation. Finally, the probabilities of whole graph structures are obtained by combining the individual arc probabilities with the number of possible construction sequences compatible with the partial ordering of the graph. While selecting models with large deviance from independence results in simple networks with few but important links, selecting models with small deviance results in highly connected networks that also contain less important links.
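A minimal sketch of the arc test described above: compute mutual information from a contingency table and compare 2·N·MI against a χ² critical value as the noise threshold. The classical result is that 2·N·MI is asymptotically χ²-distributed under independence; 3.841 is the 95% point for one degree of freedom. The table values are hypothetical.

```python
import math

def mutual_information(table):
    """Mutual information (in nats) from a two-way contingency table."""
    N = sum(sum(row) for row in table)
    rows = [sum(row) for row in table]
    cols = [sum(table[i][j] for i in range(len(table)))
            for j in range(len(table[0]))]
    mi = 0.0
    for i, row in enumerate(table):
        for j, nij in enumerate(row):
            if nij > 0:
                mi += (nij / N) * math.log(nij * N / (rows[i] * cols[j]))
    return mi

def dependence_exceeds_noise(table, critical=3.841):
    """True if dependence rises above the noise threshold.

    Under independence 2*N*MI is approximately chi-square distributed;
    critical=3.841 is the 95% quantile for 1 degree of freedom.
    """
    N = sum(sum(row) for row in table)
    return 2 * N * mutual_information(table) > critical

table = [[30, 10],
         [10, 30]]          # clearly dependent counts
strong = dependence_exceeds_noise(table)
```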

19.
Passive blind forensics of digital images refers to authenticating an image and identifying its origin without relying on any pre-extracted signature or pre-embedded information. When an image is tampered with, the forger usually applies post-processing to remove the distortion introduced along splicing boundaries, and blurring is one of the most common such operations. This paper proposes a method for detecting artificial blur traces. The high correlation between pixels introduced by a blur operation is modeled explicitly; the EM algorithm is used to estimate, for each pixel, the posterior probability of belonging to this model; and blur detection is then performed according to the magnitude of the resulting posterior probabilities. Experimental results show that the algorithm effectively detects artificial blur traces in tampered images and is robust to different blur types, lossy JPEG compression, and global rescaling.

20.
This paper focuses on the Bayesian posterior mean estimates (or Bayes estimates) of the parameter set of Poisson hidden Markov models, in which the observation sequence is generated by a Poisson distribution whose parameter depends on the underlying discrete-time, time-homogeneous Markov chain. Although the most commonly used procedures for obtaining parameter estimates for hidden Markov models are versions of the expectation maximization and Markov chain Monte Carlo approaches, this paper exhibits an algorithm for calculating the exact posterior mean estimates which, although still cumbersome, has polynomial rather than exponential complexity, and is a feasible alternative for small-scale models and data sets. The paper also presents simulation results comparing the posterior mean estimates obtained by this algorithm with the maximum likelihood estimates obtained by the expectation maximization approach.
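The likelihood computations such models require can be sketched with the scaled forward algorithm for a two-state Poisson HMM. This is a minimal illustration with hypothetical parameters; the paper's exact posterior-mean algorithm is not reproduced.

```python
import math

def poisson_pmf(lam, k):
    """Poisson probability mass function P(X = k) with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def forward_loglik(obs, pi, A, lams):
    """Log-likelihood of a Poisson HMM via the scaled forward algorithm.

    obs: observed counts; pi: initial state distribution;
    A[i][j]: transition probabilities; lams[i]: Poisson rate of state i.
    Scaling at each step avoids numerical underflow on long sequences.
    """
    n_states = len(pi)
    alpha = [pi[i] * poisson_pmf(lams[i], obs[0]) for i in range(n_states)]
    loglik = 0.0
    for t in range(1, len(obs)):
        c = sum(alpha)               # scale factor
        loglik += math.log(c)
        alpha = [a / c for a in alpha]
        alpha = [sum(alpha[i] * A[i][j] for i in range(n_states))
                 * poisson_pmf(lams[j], obs[t]) for j in range(n_states)]
    return loglik + math.log(sum(alpha))

obs = [1, 2, 8, 9, 1]                # bursts suggest a high-rate state
ll = forward_loglik(obs, [0.5, 0.5],
                    [[0.9, 0.1], [0.1, 0.9]], [1.0, 8.0])
```

A two-rate model explains the bursty counts far better than a single low rate, which is the kind of structure the posterior mean estimates recover.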
