1.
2.
Feature selection for multi-label naive Bayes classification
In multi-label learning, the training set is made up of instances each associated with a set of labels, and the task is to predict the label sets of unseen instances. In this paper, this learning problem is addressed by using a method called Mlnb, which adapts the traditional naive Bayes classifier to deal with multi-label instances. Feature selection mechanisms are incorporated into Mlnb to improve its performance. First, feature extraction techniques based on principal component analysis are applied to remove irrelevant and redundant features. Then, feature subset selection techniques based on genetic algorithms are used to choose the most appropriate subset of features for prediction. Experiments on synthetic and real-world data show that Mlnb achieves comparable performance to other well-established multi-label learning algorithms.
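A minimal sketch of the three-stage pipeline just described, assuming a feature matrix X and a binary label-indicator matrix Y; the PCA step, the small genetic search, and the per-label (binary relevance) Gaussian naive Bayes are simplified stand-ins, not the authors' exact Mlnb algorithm:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def subset_score(Z, Y, mask):
    """Mean 3-fold CV accuracy across labels for one feature subset."""
    if not mask.any():
        return 0.0
    Zs = Z[:, mask]
    return float(np.mean([cross_val_score(GaussianNB(), Zs, Y[:, j], cv=3).mean()
                          for j in range(Y.shape[1])]))

def mlnb_fit(X, Y, n_components=20, pop=12, gens=15):
    # Step 1: PCA to strip irrelevant and redundant directions.
    pca = PCA(n_components=n_components)
    Z = pca.fit_transform(X)
    # Step 2: tiny genetic search over feature subsets encoded as bit masks.
    population = rng.random((pop, Z.shape[1])) < 0.5
    for _ in range(gens):
        fitness = np.array([subset_score(Z, Y, m) for m in population])
        parents = population[np.argsort(fitness)[::-1][:pop // 2]]
        children = parents ^ (rng.random(parents.shape) < 0.1)   # mutation
        population = np.vstack([parents, children])
    best = max(population, key=lambda m: subset_score(Z, Y, m))
    # Step 3: binary relevance -- one naive Bayes model per label.
    models = [GaussianNB().fit(Z[:, best], Y[:, j]) for j in range(Y.shape[1])]
    return pca, best, models
```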
3.
In this paper, we investigate how to modify the naive Bayes classifier in order to perform classification that is restricted to be independent with respect to a given sensitive attribute. Such independence restrictions occur naturally when the decision process leading to the labels in the dataset was biased, e.g., due to gender or racial discrimination. This setting is motivated by the many cases in which laws disallow decisions that are partly based on discrimination; naive application of machine learning techniques would result in huge fines for companies. We present three approaches for making the naive Bayes classifier discrimination-free: (i) modifying the probability of the decision being positive, (ii) training one model for every sensitive attribute value and balancing them, and (iii) adding a latent variable to the Bayesian model that represents the unbiased label and optimizing the model parameters for likelihood using expectation maximization. We present experiments for the three approaches on both artificial and real-life data.
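A minimal sketch of approach (i), assuming any fitted classifier with scikit-learn's predict_proba interface: after training, per-group decision thresholds are shifted until every sensitive group shows (approximately) the same positive rate. This post-processing view is an illustration, not the authors' exact probability modification:

```python
import numpy as np

def fair_thresholds(model, X, sensitive, target_rate=None):
    """One decision threshold per sensitive-attribute value such that
    every group gets (approximately) the same positive rate."""
    p = model.predict_proba(X)[:, 1]
    if target_rate is None:
        target_rate = float(np.mean(p >= 0.5))     # overall positive rate
    thresholds = {}
    for g in np.unique(sensitive):
        pg = p[sensitive == g]
        # The (1 - target_rate) quantile is the cut-off giving the
        # desired fraction of positive decisions within this group.
        thresholds[g] = np.quantile(pg, 1.0 - target_rate)
    return thresholds

def fair_predict(model, X, sensitive, thresholds):
    p = model.predict_proba(X)[:, 1]
    cut = np.array([thresholds[g] for g in sensitive])
    return (p >= cut).astype(int)
```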
4.
Matthijs van Berkel, Gerd Vandersteen, Egon Geerardyn, Rik Pintelon, Hans Zwart, Marco de Baar 《Automatica》2014
The identification of the spatially dependent parameters in Partial Differential Equations (PDEs) is important in both physics and control problems. A methodology is presented to identify spatially dependent parameters from spatio-temporal measurements. Local non-rational transfer functions are derived based on three local measurements allowing for a local estimate of the parameters. A sample Maximum Likelihood Estimator (SMLE) in the frequency domain is used, because it takes noise properties into account and allows for high accuracy consistent parameter estimation. Confidence bounds on the parameters are estimated based on the noise properties of the measurements. This method is successfully applied to the simulations of a finite difference model of a parabolic PDE with piecewise constant parameters.
5.
This paper studies the linear dynamic errors-in-variables problem for filtered white noise excitations. First, a frequency domain Gaussian maximum likelihood (ML) estimator is constructed that can handle discrete-time as well as continuous-time models on one or more parts of the unit circle or imaginary axis. Next, the ML estimates are calculated via a computationally simple and numerically stable Gauss-Newton minimization scheme. Finally, the Cramér-Rao lower bound is derived.
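For reference, the shape of a Gauss-Newton iteration of the kind the estimator above relies on; this generic real-valued sketch omits the frequency-domain residuals and noise weighting that the paper uses:

```python
import numpy as np

def gauss_newton(residual, jacobian, theta, n_iter=50, tol=1e-10):
    """Minimize ||r(theta)||^2 with Gauss-Newton updates."""
    for _ in range(n_iter):
        r = residual(theta)
        J = jacobian(theta)
        # Solve J step = -r in the least-squares sense; this is the
        # numerically stable form of the normal equations J'J step = -J'r.
        step, *_ = np.linalg.lstsq(J, -r, rcond=None)
        theta = theta + step
        if np.linalg.norm(step) < tol * (1 + np.linalg.norm(theta)):
            break
    return theta
```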
6.
Maximum likelihood estimation has a rich history. It has been successfully applied to many problems including dynamical system identification. Different approaches have been proposed in the time and frequency domains. In this paper we discuss the relationship between these approaches and we establish conditions under which the different formulations are equivalent for finite length data. A key point in this context is how initial (and final) conditions are considered and how they are introduced in the likelihood function.
7.
A new likelihood-based AR approximation is given for ARMA models. The usual algorithms for computing the likelihood of an ARMA model require O(n) flops per function evaluation. Using our new approximation, an algorithm is developed which requires only O(1) flops in repeated likelihood evaluations. In most cases, the new algorithm gives results identical to or very close to the exact maximum likelihood estimate (MLE). This algorithm is easily implemented in high-level quantitative programming environments (QPEs) such as Mathematica, MatLab and R. In order to obtain reasonable speed, previous ARMA maximum likelihood algorithms are usually implemented in C or some other machine-efficient language. With our algorithm it is easy to do maximum likelihood estimation for long time series directly in the QPE of your choice. The new algorithm is extended to obtain the MLE for the mean parameter. Simulation experiments which illustrate the effectiveness of the new algorithm are discussed. Mathematica and R packages which implement the algorithm discussed in this paper are available [McLeod, A.I., Zhang, Y., 2007. Online supplements to "Faster ARMA Maximum Likelihood Estimation", 〈http://www.stats.uwo.ca/faculty/aim/2007/faster/〉]. Based on these package implementations, it is expected that the interested researcher would be able to implement this algorithm in other QPEs.
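A sketch of the principle behind an AR approximation to the ARMA likelihood: expand the model's AR(∞) form, truncate it at m lags, filter the series to get residuals, and evaluate a Gaussian log-likelihood conditional on the first m observations. This conveys the idea only; it is not the O(1) algorithm of McLeod and Zhang:

```python
import numpy as np

def arma_to_ar(phi, theta, m=50):
    """Truncated AR(infinity) weights pi of an ARMA(p, q) model with
    phi(B) x_t = theta(B) e_t, phi(B) = 1 - sum phi_i B^i and
    theta(B) = 1 + sum theta_j B^j; residual e_t = sum_j pi_j x_{t-j}."""
    pi = np.zeros(m + 1)
    pi[0] = 1.0
    for j in range(1, m + 1):
        acc = -phi[j - 1] if j - 1 < len(phi) else 0.0
        for k in range(1, min(j, len(theta)) + 1):
            acc -= theta[k - 1] * pi[j - k]
        pi[j] = acc
    return pi

def arma_loglik(x, phi, theta, sigma2, m=50):
    """Gaussian log-likelihood of x under the truncated AR form,
    conditioning on the first m observations."""
    pi = arma_to_ar(np.asarray(phi), np.asarray(theta), m)
    e = np.convolve(x, pi)[m:len(x)]          # residuals for t >= m
    n = len(e)
    return -0.5 * n * np.log(2 * np.pi * sigma2) - 0.5 * (e @ e) / sigma2
```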
8.
9.
A Bayesian method for text classification
Two key problems in text classification are the classification algorithm and feature extraction. The naive Bayes algorithm is among the most effective text classification algorithms, but its assumption of strong independence among attributes does not hold in practice; drawing on the multinomial model from probability theory, an improved Bayesian method is proposed. Traditional feature extraction methods include term frequency, mutual information, the CHI statistic, and information gain, yet none of these considers term weights. A weight representation is therefore introduced and an improved method is given. Experiments show that these improvements increase text classification accuracy.
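A small sketch of the weighted multinomial naive Bayes idea described above: term counts are scaled by per-term importance weights before the usual multinomial estimates are formed. The weight vector w is a placeholder for whatever term-weighting statistic is used:

```python
import numpy as np

class WeightedMultinomialNB:
    """Multinomial naive Bayes whose term counts are scaled by
    per-term importance weights before estimation."""
    def __init__(self, alpha=1.0):
        self.alpha = alpha                          # Laplace smoothing

    def fit(self, X, y, w):
        Xw = X * w                                  # weight the raw counts
        self.classes_ = np.unique(y)
        self.log_prior_ = np.log(
            np.array([np.mean(y == c) for c in self.classes_]))
        counts = np.vstack([Xw[y == c].sum(axis=0) for c in self.classes_])
        counts = counts + self.alpha
        self.log_cond_ = np.log(counts / counts.sum(axis=1, keepdims=True))
        self.w_ = w
        return self

    def predict(self, X):
        scores = (X * self.w_) @ self.log_cond_.T + self.log_prior_
        return self.classes_[np.argmax(scores, axis=1)]
```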
10.
Seth A. Greenblatt 《Computational Economics》1994,7(2):89-108
In this study, we present a new method, called a tensor method, for the computation of unconstrained Full-Information Maximum Likelihood (FIML) estimates. The new technique is based upon a fourth-order approximation to the log-likelihood function, rather than the second-order approximation used in standard methods. The higher-order terms are low-rank third- and fourth-order tensors that are computed, at very little storage or computation cost, using information from previous iterations. We form and solve the tensor model, then present test results showing that the tensor method is far more efficient than the standard Newton's method for a wide range of unconstrained FIML estimation problems. This paper is based upon part of my doctoral dissertation at George Washington University. I would like to thank my committee members, Professors Robert Phillips and Frederick Joutz of George Washington University and John R. Norsworthy of Rensselaer Polytechnic Institute, for their support and suggestions. Any errors remaining are my own.
11.
Chien-Yo Lai, Miin-Shen Yang 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2011,15(2):373-381
Mixtures of distributions are popularly used as probability models for analyzing grouped data. Classification maximum likelihood (CML) is an important maximum likelihood approach to clustering with mixture models. Yang et al. extended CML to fuzzy CML. Although fuzzy CML presents better results than CML, it is always affected by the fuzziness index parameter. In this paper, we consider fuzzy CML with an entropy-regularization term to create an entropy-type CML algorithm. The proposed entropy-type CML is a parameter-free algorithm for mixture models. Some numerical and real-data comparisons show that the proposed method provides better results than some existing methods.
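For concreteness, one common form of an entropy-regularized CML objective and the closed-form membership update it yields; the notation is illustrative and may differ from the paper's exact formulation:

```latex
% Entropy-type CML for a K-component mixture: memberships u_{ik},
% mixing weights \alpha_k, component densities f_k, temperature \gamma.
\max_{u,\,\alpha,\,\theta}\;
  \sum_{i=1}^{n}\sum_{k=1}^{K} u_{ik}\,\ln\!\bigl(\alpha_k f_k(x_i;\theta_k)\bigr)
  \;-\;\gamma\sum_{i=1}^{n}\sum_{k=1}^{K} u_{ik}\ln u_{ik},
\qquad
u_{ik}\;\propto\;\bigl(\alpha_k f_k(x_i;\theta_k)\bigr)^{1/\gamma}.
```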
12.
Wen-Liang Hung, Yen-Chang Chang, Shun-Chin Chuang 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2008,12(10):1013-1018
In this paper we propose an efficient algorithm based on Yang's (Fuzzy Sets Syst 57:365–375, 1993) concept, namely the fuzzy classification maximum likelihood (FCML) algorithm, to estimate the mixed-Weibull parameters. Compared with the EM method and the method of Jiang and Murthy (IEEE Trans Reliab 44:477–488, 1995), the proposed FCML algorithm presents better accuracy. Thus, we recommend FCML as another acceptable method for estimating the mixed-Weibull parameters.
13.
We have devised, written and tested an implementation of the Gaussian Maximum Likelihood classification method for a commercial image processor. This has resulted in significant savings in execution time for the classification of multispectral remotely sensed imagery, at very little cost to accuracy, when compared to a software version of the same algorithm.
14.
Nicolas Wicker, Jean Muller, Ravi Kiran Reddy Kalathur 《Computational statistics & data analysis》2008,52(3):1315-1322
Dirichlet distributions are natural choices to analyse data described by frequencies or proportions since they are the simplest known distributions for such data apart from the uniform distribution. They are often used whenever proportions are involved, for example, in text-mining, image analysis, biology or as a prior of a multinomial distribution in Bayesian statistics. As the Dirichlet distribution belongs to the exponential family, its parameters can be easily inferred by maximum likelihood. Parameter estimation is usually performed with the Newton-Raphson algorithm after an initialisation step using either the method of moments or Ronning's method. However, this initialisation can result in parameters that lie outside the admissible region. A simple and very efficient alternative based on a maximum likelihood approximation is presented. The advantages of the presented method compared to two other methods are demonstrated on synthetic data sets as well as for a practical biological problem: the clustering of protein sequences based on their amino acid compositions.
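A sketch of the standard estimation machinery the abstract refers to: method-of-moments initial values followed by Minka's fixed-point iteration, with the digamma inversion solved by Newton's method. This illustrates the usual approach, not the authors' new approximation:

```python
import numpy as np
from scipy.special import digamma, polygamma

def inv_digamma(y, n_iter=5):
    """Invert the digamma function by Newton's method."""
    x = np.where(y >= -2.22, np.exp(y) + 0.5, -1.0 / (y - digamma(1.0)))
    for _ in range(n_iter):
        x = x - (digamma(x) - y) / polygamma(1, x)
    return x

def dirichlet_mle(P, n_iter=200, tol=1e-10):
    """P: (n, K) matrix of proportions, each row on the simplex."""
    mean_log = np.log(P).mean(axis=0)
    # Method-of-moments initialization of the concentration parameters.
    m, v = P.mean(axis=0), P.var(axis=0)
    s = np.median(m * (1 - m) / np.maximum(v, 1e-12) - 1)
    alpha = np.maximum(m * s, 1e-3)
    # Minka's fixed point: digamma(a_k) = digamma(sum a) + mean log p_k.
    for _ in range(n_iter):
        new = inv_digamma(digamma(alpha.sum()) + mean_log)
        if np.max(np.abs(new - alpha)) < tol:
            return new
        alpha = new
    return alpha
```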
15.
Steve Su 《Computational statistics & data analysis》2007,51(8):3983-3998
This paper presents a two-step procedure that uses the method of moments or percentiles to find initial values and then numerically maximizes the log-likelihood to fit the appropriate generalized lambda distribution to data. The paper demonstrates the use of this procedure to fit well-known statistical distributions as well as some empirical data. Overall, numerical maximum likelihood estimation is a valuable alternative among existing methods of fitting. It provides not only convincing results in terms of quantile plots and goodness-of-fit tests but also has the advantage of lower variability in its parameter estimation compared to the existing starship (King and MacGillivray, 1999) and method-of-moments (Karian and Dudewicz, 2000) fitting schemes.
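The two-step recipe in miniature, using a gamma distribution as a stand-in because its density is simple to write down (the paper applies the same idea to the generalized lambda distribution, whose density is defined through its quantile function):

```python
import numpy as np
from scipy import stats, optimize

def fit_two_step(x):
    # Step 1: moment-matched starting values (gamma: mean = k*s, var = k*s^2).
    m, v = x.mean(), x.var()
    k0, s0 = m**2 / v, v / m
    # Step 2: numerically maximize the log-likelihood from that start.
    nll = lambda p: -np.sum(stats.gamma.logpdf(x, a=p[0], scale=p[1]))
    res = optimize.minimize(nll, x0=[k0, s0], method="Nelder-Mead")
    return res.x

x = stats.gamma.rvs(a=2.5, scale=1.3, size=2000, random_state=0)
print(fit_two_step(x))   # close to (2.5, 1.3)
```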
16.
Vickers AL, Modestino JW 《IEEE transactions on pattern analysis and machine intelligence》1982,(1):61-68
A new approach to texture classification is described which is based on measurements of the spatial gray-level co-occurrence probability matrix. This approach can make use of assumed stochastic models for texture in imagery and is an approximation to the statistically optimum maximum likelihood classifier. The efficacy of the approach is demonstrated through experimental results obtained with real-world texture data.
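For reference, the measurement underlying the classifier: a gray-level co-occurrence matrix tallies how often gray level i at one pixel co-occurs with gray level j at a fixed offset, normalized to joint probabilities. A minimal sketch, assuming a non-negative offset:

```python
import numpy as np

def glcm(image, levels=8, dy=0, dx=1):
    """Co-occurrence of gray level i at (r, c) with j at (r + dy, c + dx);
    image is a 2-D integer array with values in [0, levels)."""
    h, w = image.shape
    a = image[:h - dy, :w - dx].ravel()     # reference pixels
    b = image[dy:, dx:].ravel()             # offset neighbors
    m = np.zeros((levels, levels))
    np.add.at(m, (a, b), 1)                 # accumulate pair counts
    return m / m.sum()                      # joint probabilities

img = np.arange(16).reshape(4, 4) % 8
print(glcm(img, levels=8, dy=0, dx=1))
```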
17.
Distributed data mining applications, such as those dealing with health care, finance, counter-terrorism and homeland defense, use sensitive data from distributed databases held by different parties. This comes into direct conflict with an individual's need and right to privacy. It is thus of great importance to develop adequate security techniques for protecting privacy of individual values used for data mining.
18.
To improve the classification performance of the naive Bayes classifier, and to account for the varying importance of conditional attributes in the decision process, a Bayesian classification algorithm based on feature-selection weights is proposed. The importance of each feature term is represented by a value combining its chi-square statistic and its document frequency; this value is processed to obtain a weight for each feature term, from which a weighted Bayesian classifier is built. Building on a study of the characteristics of the Uyghur language, the algorithm is used to construct a Uyghur text classification model. Experiments on a collected Uyghur corpus show that the algorithm achieves better classification performance than naive Bayes.
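A sketch of the chi-square-by-document-frequency term weighting described above, assuming a binary term-presence matrix X; the abstract does not give the exact combination rule, so a simple normalized product is used as a placeholder:

```python
import numpy as np

def chi2_df_weights(X, y, target_class):
    """X: (n_docs, n_terms) binary term-presence matrix; y: class labels."""
    n = X.shape[0]
    pos = (np.asarray(y) == target_class)
    A = X[pos].sum(axis=0).astype(float)    # term present, class positive
    B = X[~pos].sum(axis=0).astype(float)   # term present, class negative
    C = pos.sum() - A                       # term absent, class positive
    D = (~pos).sum() - B                    # term absent, class negative
    # Standard CHI statistic for the 2x2 term/class contingency table.
    chi2 = n * (A * D - B * C) ** 2 / np.maximum(
        (A + B) * (C + D) * (A + C) * (B + D), 1e-12)
    df = X.sum(axis=0)                      # document frequency
    return chi2 * df / n                    # combined importance weight
```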
19.
The non-parametric Naive Bayes nearest neighbor classifier requires no training phase, yet it outperforms many well-trained learning-based image classifiers. Unfortunately, despite its high accuracy, it suffers from the heavy computational cost of distance computations in the local-feature space. This paper explores accelerating strategies from the perspectives of both algorithm design and software development. Our approach integrates the space-decomposition capability of product quantization (PQ) with the parallel accelerating capability of the underlying computational platform, the graphics processing unit (GPU). PQ is exploited to compress the indexed local features and prune the search space. The GPU eases most of the computational pressure by processing tasks in parallel. To achieve good parallel efficiency, a new sequential classification process is first designed and decomposed into independent components with high parallelism. Effective parallelization techniques are then presented to make use of the computational resources. A parallel heap array is built to accelerate feature quantization, and a distance lookup table is built to speed up feature search. Comparative experiments on the UIUC-Sport dataset demonstrate that our integrated solution outperforms other implementations significantly on a quad-core Intel Core i7 950 CPU and an NVIDIA GeForce GTX 460 GPU. A scalability experiment on the 80 Million Tiny Images database shows that our approach still performs well when a large-scale image database is explored.
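A CPU-only sketch of the product-quantization distance-table trick the paper accelerates on the GPU: split vectors into subspaces, quantize each block against a small codebook, then answer distance queries with per-subspace table lookups instead of full-dimensional computations. The helpers here are illustrative; the feature dimension is assumed divisible by n_sub, and real PQ typically uses k = 256 codewords:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def pq_train(X, n_sub=4, k=16):
    """Train one k-word codebook per subspace and code every vector."""
    d_sub = X.shape[1] // n_sub
    codebooks, codes = [], []
    for s in range(n_sub):
        block = X[:, s * d_sub:(s + 1) * d_sub]
        centroids, labels = kmeans2(block, k, minit="points")
        codebooks.append(centroids)
        codes.append(labels)
    return codebooks, np.stack(codes, axis=1)       # codes: (n, n_sub)

def pq_distances(query, codebooks, codes):
    """Approximate squared distances from query to every coded vector:
    one small distance table per subspace, then n_sub lookups per item."""
    d_sub = len(query) // len(codebooks)
    tables = [((cb - query[s * d_sub:(s + 1) * d_sub]) ** 2).sum(axis=1)
              for s, cb in enumerate(codebooks)]
    return sum(t[codes[:, s]] for s, t in enumerate(tables))
```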
20.
Time series models with parameter values that depend on the seasonal index are commonly referred to as periodic models. Periodic formulations for two classes of time series models are considered: seasonal autoregressive integrated moving average and unobserved components models. Convenient state space representations of the periodic models are proposed to facilitate model identification, specification and exact maximum likelihood estimation of the periodic parameters. These formulations do not require a priori (seasonal) differencing of the time series. The time-varying state space representation is an attractive alternative to the time-invariant vector representation of periodic models which typically leads to a high dimensional state vector in monthly periodic time series models. A key development is our method for computing the variance-covariance matrix of the initial set of observations which is required for exact maximum likelihood estimation. The two classes of periodic models are illustrated for a monthly postwar US unemployment time series.
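The exact-likelihood machinery referenced above, in miniature: the Kalman filter's prediction-error decomposition evaluates the Gaussian log-likelihood of a linear state space model. This sketch is time-invariant and univariate; the paper's periodic models let the system matrices depend on the seasonal index:

```python
import numpy as np

def kalman_loglik(y, T, Z, Q, H, a0, P0):
    """Prediction-error-decomposition log-likelihood for the model
    x_{t+1} = T x_t + eta_t (Var Q), y_t = Z x_t + eps_t (Var H),
    with univariate y_t and Z given as a 1-D array."""
    a, P, ll = a0.copy(), P0.copy(), 0.0
    for yt in y:
        v = yt - Z @ a                       # one-step prediction error
        F = Z @ P @ Z + H                    # prediction-error variance
        ll += -0.5 * (np.log(2 * np.pi * F) + v * v / F)
        K = P @ Z / F                        # Kalman gain
        a, P = a + K * v, P - np.outer(K, Z @ P)   # measurement update
        a, P = T @ a, T @ P @ T.T + Q              # time update
    return ll

# Local-level example: a random-walk level observed with noise.
rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=200)) + rng.normal(size=200)
print(kalman_loglik(y, T=np.eye(1), Z=np.ones(1), Q=np.eye(1),
                    H=1.0, a0=np.zeros(1), P0=1e4 * np.eye(1)))
```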