Similar Literature
1.
Ma Jianghong, Zhang Wenxiu, Liang Yi. Chinese Journal of Computers, 2003, 26(12): 1652-1659
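Complex massive data often appear as a mixture of several structural features, and the mixture model of regression classes is one description of such a mixture. Based on the statistical theory of finite mixture distributions and related identifiability results, this paper discusses the identifiability of mixture models for mining general regression classes in three cases of the regression variables: (1) fixed explanatory variables, (2) random explanatory variables, and (3) fixed explanatory variables with specified class parameters. For each case it gives sufficient conditions for the identifiability of mixture models of regression classes from the same family. A common feature of these conditions is that they all involve a special class of sets of explanatory variables, uniquely determined by the regression functions and regression parameters of the same family, whose elements are the points at which different regression parameters yield the same value of the regression function. In particular, when the regression function is linear, such a set is a hyperplane in the space of explanatory variables.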
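For the linear case mentioned at the end of this abstract, the special set can be written out in one line. A short derivation, assuming a regression function of the form f(x, β) = xᵀβ (this notation is ours, not necessarily the paper's):

```latex
S(\boldsymbol\beta_1,\boldsymbol\beta_2)
  = \{\mathbf{x} : \mathbf{x}^{\top}\boldsymbol\beta_1 = \mathbf{x}^{\top}\boldsymbol\beta_2\}
  = \{\mathbf{x} : \mathbf{x}^{\top}(\boldsymbol\beta_1 - \boldsymbol\beta_2) = 0\},
  \qquad \boldsymbol\beta_1 \neq \boldsymbol\beta_2 .
```

This is a hyperplane through the origin with normal vector β₁ − β₂. If x carries a leading constant 1 for an intercept, the set becomes an affine hyperplane in the remaining explanatory variables (or is empty when the two parameter vectors differ only in the intercept).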

2.
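A support vector regression (SVR) algorithm was applied to screen out the set of ionic characteristic parameters related to the activity coefficients of concentrated electrolyte solutions: the cation-to-anion radius ratio (R./Rm), the anion radius Rm, the cation radius R., and the cation-to-anion charge ratio (Zx/Zm). Using this set as the independent variables, empirical rules for the activity coefficients were summarized with SVR or PLS, and an algorithm was proposed that uses the known activity coefficients of a batch of concentrated electrolyte solutions to "transfer-predict" the activity coefficients of other electrolyte solutions. The accuracy of this "transfer" algorithm was assessed by leave-one-out cross-validation. The physical meaning of the empirical relations obtained by SVR is discussed and explained in terms of the corresponding-states theory of ionic systems.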
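The leave-one-out check used to judge the "transfer" algorithm can be sketched as follows. This is a minimal sketch under stated assumptions: it swaps in one-variable ordinary least squares as the model (the paper uses SVR or PLS on four ionic descriptors), and the function names and data are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (a stand-in for SVR or PLS)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def loo_residuals(xs, ys):
    """Leave-one-out: refit without sample i, then predict sample i."""
    residuals = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        residuals.append(ys[i] - (a + b * xs[i]))
    return residuals

def loo_rmse(xs, ys):
    """Root mean square of the held-out prediction errors."""
    res = loo_residuals(xs, ys)
    return (sum(r * r for r in res) / len(res)) ** 0.5

# Hypothetical descriptor values and responses, one per electrolyte.
print(loo_rmse([0.5, 0.7, 0.9, 1.1, 1.3], [1.2, 1.5, 1.9, 2.2, 2.4]))
```

Each electrolyte is predicted by a model that never saw it, which is what makes the error an honest estimate of how well values "transfer" to a new solution.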

3.
As a new sparse kernel modeling method, support vector regression (SVR) has been regarded as the state-of-the-art technique for regression and approximation. In [V.N. Vapnik, The Nature of Statistical Learning Theory, second ed., Springer-Verlag, 2000], Vapnik developed the ε-insensitive loss function for support vector regression as a trade-off between Huber's robust loss function and a loss that enables sparsity among the support vectors. Support vector kernel expansion offers a potential way to represent nonlinear dynamical systems and to underpin advanced analysis. However, the standard quadratic programming support vector regression (QP-SVR) is often computationally expensive to implement, and sufficient model sparsity cannot be guaranteed. To mitigate these drawbacks, this article applies soft-constrained linear programming support vector regression (LP-SVR) with a hybrid kernel to nonlinear black-box system identification. An innovative non-Mercer hybrid kernel is explored by leveraging the flexibility of LP-SVR in the choice of kernel functions. The simulation results demonstrate the ability to use more general kernel functions and the inherent advantage of LP-SVR over QP-SVR in terms of model sparsity and computational efficiency.
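The two losses the abstract contrasts can be compared directly. A small sketch of the standard textbook forms (ε and δ are user-chosen widths; the hybrid kernel and the LP formulation from the paper are not reproduced here):

```python
def eps_insensitive_loss(residual, eps):
    """Vapnik's epsilon-insensitive loss: zero inside the tube |r| <= eps."""
    return max(0.0, abs(residual) - eps)

def huber_loss(residual, delta):
    """Huber's robust loss: quadratic near zero, linear in the tails."""
    r = abs(residual)
    if r <= delta:
        return 0.5 * r * r
    return delta * (r - 0.5 * delta)

for r in (0.05, 0.5, 3.0):
    print(r, eps_insensitive_loss(r, 0.1), huber_loss(r, 1.0))
```

Every residual strictly inside the ε-tube costs nothing, so the corresponding training points receive zero dual coefficients and drop out of the kernel expansion; this is the source of SVR's sparsity, while the linear tails keep the robustness of Huber's loss.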

4.
Regression techniques such as ridge regression (RR) and logistic regression (LR) have been widely used in supervised learning for pattern classification. However, these methods mainly exploit class label information to learn the linear mapping function, and they become less effective when the number of training samples per class is small. In visual classification tasks such as face recognition, the appearance of the training images also conveys important discriminative information. This paper proposes a novel regression-based classification model, Bayesian sample steered discriminative regression (BSDR), which uses Bayes' formula to exploit both the sample class labels and the sample appearance when learning the linear mapping function. BSDR learns a linear mapping for each class to extract image class-label features, and classification can then be done simply with a nearest-neighbor classifier. The proposed BSDR method has advantages such as a small number of mappings, insensitivity to input feature dimensionality, and robustness to small sample sizes. Extensive experiments on several biometric databases demonstrate the promising classification performance of the method.

5.
Several regularization methods, including the group lasso and the adaptive group lasso, have been developed for the automatic selection of grouped variables (factors) in conditional mean regression. In many practical situations this problem arises naturally, when a set of dummy variables is used to represent a categorical factor and/or when a set of basis functions of a continuous variable is included in the predictor set. Complementing these earlier works, this paper examines simultaneous and automatic factor selection in quantile regression. To incorporate the factor information into regularized model fitting, adaptive sup-norm regularized quantile regression is proposed, which penalizes the empirical check loss function by the sum of factor-wise adaptive sup-norm penalties. The proposed method is shown to possess the oracle property. A simulation study demonstrates that it is a more appropriate tool for factor selection than adaptive lasso regularized quantile regression.
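The penalized objective described above can be written down directly. A minimal sketch that only evaluates it for a given coefficient vector, assuming a summed (not averaged) check loss, no intercept, and adaptive weights supplied by the caller; in adaptive schemes these weights usually come from an initial estimate, and actually minimizing the objective (e.g. as a linear program) is not shown:

```python
def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def sup_norm_objective(X, y, beta, groups, weights, tau, lam):
    """Check loss plus lam * sum over factors g of w_g * max_{j in g} |beta_j|.

    groups: list of index lists, one per factor (hypothetical layout);
    weights: adaptive weight w_g for each factor (assumed given).
    """
    loss = 0.0
    for xi, yi in zip(X, y):
        fitted = sum(b * x for b, x in zip(beta, xi))
        loss += check_loss(yi - fitted, tau)
    penalty = sum(w * max(abs(beta[j]) for j in g) for g, w in zip(groups, weights))
    return loss + lam * penalty

# One factor made of two dummy columns, evaluated at the median (tau = 0.5).
print(sup_norm_objective([[1, 0], [0, 1]], [1, 1], [1.0, 0.0], [[0, 1]], [2.0], 0.5, 0.1))
```

Because the sup-norm of a group is zero only when every coefficient in it is zero, the penalty tends to drop or keep a factor's columns together, which is the point of factor-wise selection.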

6.
Kernel regression is one model that has been applied to explain or design radial-basis neural networks. Practical application of kernel regression has shown that bias errors caused by the boundaries of the data can seriously affect the accuracy of this type of regression. This paper investigates correcting the boundary error by substituting an asymmetric kernel function for the symmetric kernel function at data points close to the boundary. The asymmetric kernel allows the estimate to approach the boundary much more closely without adversely affecting the noise-filtering properties of kernel regression.
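The boundary bias this abstract targets is easy to reproduce with a plain Nadaraya-Watson estimator and a symmetric Gaussian kernel. A sketch on hypothetical noise-free data; it shows the problem only and does not implement the paper's asymmetric-kernel correction:

```python
import math

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u)

def nadaraya_watson(x0, xs, ys, h):
    """Kernel-weighted local average of ys around x0 with bandwidth h."""
    weights = [gaussian_kernel((x - x0) / h) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

# Hypothetical noise-free data y = x on [0, 1].
xs = [i / 100 for i in range(101)]
ys = list(xs)

print(nadaraya_watson(0.0, xs, ys, h=0.1))  # noticeably above the true value 0
print(nadaraya_watson(0.5, xs, ys, h=0.1))  # close to the true value 0.5
```

At x0 = 0 every neighbour lies on one side, so the symmetric kernel averages only larger y values and the estimate is pulled upward; in the interior the two sides cancel.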

7.
Traditional regression analysis is usually applied to homogeneous observations. However, in several real situations the observations are not homogeneous, and traditional regression then fits the data poorly. To improve the goodness of fit in these cases, it is more suitable to apply so-called clusterwise regression analysis. The aim of clusterwise linear regression analysis is to embed clustering techniques into regression analysis; in this way, clustering methods are used to overcome the heterogeneity problem in regression analysis. Furthermore, by integrating cluster analysis into the regression framework, the regression parameters (regression analysis) and the membership degrees (cluster analysis) can be estimated simultaneously by optimizing a single objective function. In this paper, clusterwise linear regression is analyzed in a fuzzy framework. In particular, a fuzzy clusterwise linear regression model (FCWLR model) with symmetrical fuzzy output and crisp input variables is suggested for performing fuzzy cluster analysis within a fuzzy linear regression framework. A fitting index is proposed to measure the goodness of fit of the FCWLR model with fuzzy output. Several applications to artificial and real datasets illustrate the usefulness of the FCWLR model in practice.
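To make "membership degrees estimated together with regression parameters" concrete, here is the membership step of the classic fuzzy c-regression models of Hathaway and Bezdek, a swapped-in technique with crisp outputs only; the paper's FCWLR model with symmetric fuzzy outputs is not reproduced. A full algorithm would alternate this step with a weighted least-squares fit of each cluster's line.

```python
def fuzzy_memberships(residuals_sq, m=2.0):
    """Membership of one observation in each regression cluster.

    residuals_sq[i]: squared residual of the observation under cluster i's
    regression model; m > 1 is the fuzzifier (assumed default 2).
    """
    zeros = [i for i, e in enumerate(residuals_sq) if e == 0.0]
    if zeros:
        # Exact fit: share full membership among the perfectly fitting clusters.
        share = 1.0 / len(zeros)
        return [share if i in zeros else 0.0 for i in range(len(residuals_sq))]
    exponent = 1.0 / (m - 1.0)
    return [
        1.0 / sum((e_i / e_j) ** exponent for e_j in residuals_sq)
        for e_i in residuals_sq
    ]

print(fuzzy_memberships([1.0, 3.0]))  # the better-fitting cluster gets more weight
```

Memberships for each observation sum to 1, and an observation that one line fits three times better than another receives three times the membership in it (for m = 2).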

8.
Extraction of rules from artificial neural networks for nonlinear regression
Neural networks (NNs) have been successfully applied to a variety of problems, including classification and function approximation. They are especially useful as function approximators because they do not require prior knowledge of the input data distribution and have been shown to be universal approximators. In many applications, it is desirable to extract knowledge that explains how the problems are solved by the networks. Most existing approaches have focused on extracting symbolic rules for classification; few methods have been devised to extract rules from trained NNs for regression. This article presents an approach for extracting rules from trained NNs for regression. Each rule in the extracted rule set corresponds to a subregion of the input space, and a linear function of the relevant input attributes approximates the network output for all data samples in that subregion. Extensive experimental results on 32 benchmark data sets demonstrate the effectiveness of the proposed approach in generating accurate regression rules.

9.
Interval regression analysis using quadratic loss support vector machine
Support vector machines (SVMs) have been very successful in pattern recognition and function estimation problems for crisp data. This paper proposes a new method for evaluating interval linear and nonlinear regression models that combines the possibility and necessity estimation formulation with the principle of the quadratic loss SVM. Unlike the traditional SVM, this version uses a quadratic loss function. For data sets with crisp inputs and interval outputs, possibility and necessity models based on a quadratic programming approach have recently been used; they give more diverse spread coefficients than a linear programming approach. The quadratic loss SVM also uses a quadratic programming approach, and a further advantage of it in interval regression analysis is that it integrates both the central-tendency property of least squares and the possibilistic property of fuzzy regression, without being computationally expensive. The quadratic loss SVM allows interval nonlinear regression analysis to be performed by constructing an interval linear regression function in a high-dimensional feature space. The proposed algorithm is an attractive approach to modeling nonlinear interval data, and it is model-free in the sense that the underlying model function need not be assumed for an interval nonlinear regression model with crisp inputs and interval outputs. Experimental results are presented that indicate the performance of the algorithm.

10.
Ordinal regression is a kind of regression analysis used for predicting an ordered response variable. In these problems, the patterns are labelled by a set of ranks with an ordering among the different categories. The most common type of ordinal regression model is the cumulative link model, which relates an unobserved continuous latent variable to the ordered categories through a monotone link function. Logit and probit functions are examples of link functions used in cumulative link models. In this paper, a novel generalized link function based on a generalization of the logistic distribution is proposed. The proposed generalized link function can reproduce other link functions by changing two real parameters, \(\alpha \) and \(\lambda \). To test its performance, the generalized link function is included in a cumulative link model in which the latent function is determined by a standard neural network. For this model, the tunable thresholds and distribution parameters were reformulated to convert the constrained optimization problem into an unconstrained one. Experimental results demonstrate that the proposed approach achieves competitive generalization performance.
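How a cumulative link model turns a latent score into class probabilities can be sketched in a few lines. This uses the standard logit and probit links only; the paper's generalized link with parameters α and λ is not reproduced, though any increasing CDF could be passed as `cdf`. The latent score `f_x` would come from the neural network in the paper; here it is just a number.

```python
import math

def logistic_cdf(z):
    """Numerically stable logistic CDF (logit link)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def probit_cdf(z):
    """Standard normal CDF (probit link)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def cumulative_link_probs(f_x, thresholds, cdf=logistic_cdf):
    """P(y = k | x) = F(theta_k - f(x)) - F(theta_{k-1} - f(x)).

    thresholds must be increasing; theta_0 = -inf and theta_K = +inf are implicit.
    """
    cum = [0.0] + [cdf(t - f_x) for t in thresholds] + [1.0]
    return [cum[k + 1] - cum[k] for k in range(len(cum) - 1)]

print(cumulative_link_probs(0.0, [-1.0, 1.0]))          # three ordered classes
print(cumulative_link_probs(0.5, [0.0], cdf=probit_cdf))
```

The increasing thresholds are exactly the ordering constraint that the abstract says is reparameterized to obtain an unconstrained optimization problem.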

11.
Fuzzy Regression Analysis by Support Vector Learning Approach
Support vector machines (SVMs) have been very successful in pattern classification and function approximation problems for crisp data. In this paper, we incorporate the concept of fuzzy set theory into the support vector regression machine. The parameters to be estimated in SVM regression, such as the components of the weight vector and the bias term, are taken to be fuzzy numbers. This integration preserves the benefits of both the SVM regression model and the fuzzy regression model, and it is used to treat fuzzy nonlinear regression analysis. In contrast to previous fuzzy nonlinear regression models, the proposed algorithm is model-free in the sense that the underlying model function need not be assumed. By using different kernel functions, we can construct different learning machines with arbitrary types of nonlinear regression functions. Moreover, the proposed method achieves automatic accuracy control in the fuzzy regression analysis task: the upper bound on the number of errors is controlled by user-predefined parameters. Experimental results are presented that indicate the performance of the proposed approach.

12.
Efficient SVM Regression Training with SMO
The sequential minimal optimization algorithm (SMO) has been shown to be an effective method for training support vector machines (SVMs) on classification tasks defined on sparse data sets. SMO differs from most SVM algorithms in that it does not require a quadratic programming solver. In this work, we generalize SMO so that it can handle regression problems. However, one problem with SMO is that its rate of convergence slows down dramatically when the data are non-sparse and when there are many support vectors in the solution (as is often the case in regression), because kernel function evaluations tend to dominate the runtime in this case. Moreover, caching kernel function outputs can easily degrade SMO's performance even further, because SMO tends to access kernel function outputs in an unstructured manner. We address these problems with several modifications that enable caching to be used effectively with SMO. For regression problems, our modifications improve convergence time by over an order of magnitude.

13.
NDVI (Normalized Difference Vegetation Index) has been widely used to monitor vegetation changes since the early 1980s. By contrast, little use has been made of land surface temperature (LST), because of its sensitivity to the orbital drift affecting the NOAA (National Oceanic and Atmospheric Administration) platforms carrying the AVHRR sensor. This study presents a new method for monitoring vegetation using NDVI and LST data, based on an orbital-drift-corrected dataset derived from data provided by the GIMMS (Global Inventory Modeling and Mapping Studies) group. The method, named Yearly Land Cover Dynamics (YLCD), characterizes NDVI and LST behavior on a yearly basis through 3 parameters obtained by linear regression between NDVI and normalized LST data: the angle between the regression line and the abscissa axis, the extent of the data projected onto the regression line, and the regression coefficient. These parameters characterize, respectively, the vegetation type, the length of the annual vegetation cycle, and the difference between real vegetation and ideal cases. The worldwide distribution of these three parameters is shown, and a map integrating them is presented. This map differentiates vegetation according to climatic constraints and shows that the method has good potential for vegetation monitoring, provided that outliers in the data are well filtered.
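The three YLCD parameters can be computed from one year of paired observations with elementary regression formulas. A sketch under loudly stated assumptions: normalized LST is taken as the abscissa and NDVI as the ordinate, the LST values are already normalized, and "regression coefficient" is read as the Pearson correlation coefficient; the paper's exact conventions may differ.

```python
import math
from statistics import mean

def ylcd_parameters(lst_norm, ndvi):
    """Angle (degrees), projected extent and correlation for one pixel-year.

    Assumes x = normalized LST and y = NDVI; fails if either series is constant.
    """
    mx, my = mean(lst_norm), mean(ndvi)
    sxx = sum((x - mx) ** 2 for x in lst_norm)
    syy = sum((y - my) ** 2 for y in ndvi)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lst_norm, ndvi))
    slope = sxy / sxx
    angle = math.degrees(math.atan(slope))  # angle with the abscissa axis
    # Unit vector along the least-squares line, used to project the points.
    length = math.hypot(1.0, slope)
    ux, uy = 1.0 / length, slope / length
    proj = [(x - mx) * ux + (y - my) * uy for x, y in zip(lst_norm, ndvi)]
    extent = max(proj) - min(proj)
    r = sxy / math.sqrt(sxx * syy)
    return angle, extent, r

# Hypothetical monthly values for one pixel.
print(ylcd_parameters([0.1, 0.3, 0.6, 0.9, 0.7, 0.4], [0.7, 0.6, 0.4, 0.2, 0.3, 0.5]))
```

Because the extent is measured along the fitted line, a long annual swing between cool-green and hot-dry states yields a large value, matching its interpretation as cycle length.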

14.
Logistic regression models are frequently used in epidemiological studies for estimating associations that demographic, behavioral, and risk factor variables have on a dichotomous outcome, such as disease being present versus absent. After the coefficients in a logistic regression model have been estimated, goodness-of-fit of the resulting model should be examined, particularly if the purpose of the model is to estimate probabilities of event occurrences. While various goodness-of-fit tests have been proposed, the properties of these tests have been studied under the assumption that observations selected were independent and identically distributed. Increasingly, epidemiologists are using large-scale sample survey data when fitting logistic regression models, such as the National Health Interview Survey or the National Health and Nutrition Examination Survey. Unfortunately, for such situations no goodness-of-fit testing procedures have been developed or implemented in available software. To address this problem, goodness-of-fit tests for logistic regression models when data are collected using complex sampling designs are proposed. Properties of the proposed tests were examined using extensive simulation studies and results were compared to traditional goodness-of-fit tests. A Stata ado function svylogitgof for estimating the F-adjusted mean residual test after svylogit fit is available at the author's website http://www.people.vcu.edu/~kjarcher/Research/Data.htm.
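For reference, the traditional test that survey-based procedures are compared against is the Hosmer-Lemeshow statistic for independent observations. A minimal sketch of that unweighted iid version only (the paper's design-based F-adjusted test is not shown); it returns the statistic and its degrees of freedom, and the p-value would come from a chi-square table:

```python
def hosmer_lemeshow(y, p, groups=10):
    """Unweighted Hosmer-Lemeshow statistic for independent observations.

    y: 0/1 outcomes; p: fitted probabilities from a logistic model.
    Returns (statistic, degrees_of_freedom). A group whose mean fitted
    probability is exactly 0 or 1 would divide by zero.
    """
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    stat, used = 0.0, 0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue  # fewer observations than groups
        n_g = len(idx)
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        pbar = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * pbar * (1.0 - pbar))
        used += 1
    return stat, used - 2

# Hypothetical outcomes and fitted probabilities, split into 2 risk groups.
print(hosmer_lemeshow([0, 0, 0, 0, 1, 1, 1, 1, 1, 0], [0.2] * 5 + [0.8] * 5, groups=2))
```

Every observation counts equally here, which is exactly the independence assumption the abstract says breaks down for complex survey designs.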

15.
When analyzing survival data, the parameter estimates, and consequently the relative risk estimates, of a Cox model sometimes do not converge to finite values. This phenomenon is due to special conditions in a data set and is known as 'monotone likelihood'. Statistical software packages for Cox regression using the maximum likelihood method cannot deal appropriately with this problem. A new procedure to solve the problem was proposed by G. Heinze, M. Schemper, A solution to the problem of monotone likelihood in Cox regression, Biometrics 57 (2001). It has been shown that, unlike the standard maximum likelihood method, this method always leads to finite parameter estimates. We developed a SAS macro and an SPLUS library to make this method available from within one of these widely used statistical software packages. Our programs can also perform interval estimation based on the profile penalized log likelihood (PPL) and plot the PPL function, as suggested by Heinze and Schemper (2001).
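Monotone likelihood can be seen on a toy data set in which every event occurs in the subject with the largest covariate value in its risk set. A sketch of the ordinary Cox partial log-likelihood for one covariate (assuming no tied event times; data hypothetical); it only exhibits the problem and does not implement Heinze and Schemper's penalized-likelihood solution:

```python
import math

def cox_partial_loglik(beta, times, events, x):
    """Cox partial log-likelihood for a single covariate, no tied event times."""
    ll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects contribute only through risk sets
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        ll += beta * x[i] - math.log(sum(math.exp(beta * x[j]) for j in risk))
    return ll

# Both events occur in the x = 1 group before anyone with x = 0 fails.
times, events, x = [1, 2, 3, 4], [1, 1, 0, 0], [1, 1, 0, 0]
for beta in (0, 1, 2, 5, 10, 20):
    print(beta, cox_partial_loglik(beta, times, events, x))
```

The log-likelihood keeps rising toward the limit -log 2 as beta grows, so no finite maximizer exists and a Newton-Raphson fit drifts off to infinity.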

16.
Considerable intellectual progress has been made in the development of various semiparametric varying-coefficient models over the past ten to fifteen years. An important advantage of these models is that they avoid much of the curse of dimensionality, because the nonparametric functions are restricted to only some of the variables. More recently, varying-coefficient methods have been applied to quantile regression modeling, but all previous studies assume that the data are fully observed. The main purpose of this paper is to develop a varying-coefficient approach to estimating regression quantiles under random censoring. We use an inverse probability weighting approach to account for censoring and propose a majorize–minimize type algorithm to optimize the non-smooth objective function. The asymptotic properties of the proposed estimator of the nonparametric functions are studied, and a resampling method is developed for estimating the sampling variance. An important aspect of the method is that it allows the censoring time to depend on the covariates. Additionally, we show that this varying-coefficient procedure can be further improved when implemented within a composite quantile regression framework. Composite quantile regression has recently gained considerable attention because of its ability to combine information across different quantile functions. We assess the finite-sample properties of the proposed procedures in simulation studies. A real data application is also considered.

17.
The Burr type III distribution covers a wider region of the skewness-kurtosis plane, spanning several distributions including the log-logistic, Weibull and Burr type XII distributions. However, outliers may occur in the data set. Robust methods such as M-estimators with a symmetric influence function have been used successfully to diminish the effect of outliers on statistical inference. When the data distribution is asymmetric, however, these methods yield biased estimators. We present an M-estimator with an asymmetric influence function (AM-estimator), based on the quantile function of the Burr type III distribution, to estimate the parameters for complete data with outliers. Simulation results show that the AM-estimator generally outperforms the maximum likelihood and traditional M-estimator methods in terms of bias and root mean square error. One real example is used to demonstrate the performance of the proposed method.
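The AM-estimator is built on the Burr type III quantile function, which has a closed form. A sketch of the standard two-shape-parameter form F(x) = (1 + x^(-c))^(-k) for x > 0 and its inverse; the parameter names c and k are the usual textbook ones and may differ from the paper's, and the estimator itself is not reproduced:

```python
def burr3_cdf(x, c, k):
    """Burr type III CDF F(x) = (1 + x**(-c)) ** (-k) for x > 0."""
    return (1.0 + x ** (-c)) ** (-k)

def burr3_quantile(p, c, k):
    """Inverse of the CDF: Q(p) = (p**(-1/k) - 1) ** (-1/c) for 0 < p < 1."""
    return (p ** (-1.0 / k) - 1.0) ** (-1.0 / c)

for p in (0.1, 0.5, 0.9):
    q = burr3_quantile(p, c=2.0, k=3.0)
    print(p, q, burr3_cdf(q, c=2.0, k=3.0))  # the CDF recovers p
```

Applying `burr3_quantile` to uniform random numbers also gives Burr III samples by inverse-transform sampling, which is convenient for simulation studies like the one described above.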

18.
Landscape pattern analysis based on landscape metrics computed from remotely sensed data has been one of the major topics of landscape ecology, and much attention has focused on how spatial scale and the accuracy of remotely sensed data affect landscape metrics. However, few studies have assessed how landscape metrics change under the influence of land-use categorization. In this paper, we took the Bao'an district of Shenzhen city as the study area and analysed how land-use categorization influences 24 landscape metrics. The results showed a significant influence. Based on the characteristics of the response curves of the landscape metrics to changes in land-use categorization in regression analysis, and on the predictability of these relations, the 24 metrics fell into three groups. (1) Type I comprised 12 landscape metrics that were highly predictable as the land-use categorization changed, following simple functional relations in regression analysis. (2) Type II comprised seven indices that behaved in a complicated way as the categorization changed; their response curves, which were not easy to predict, consisted of two subsections and could not be described by a single function. (3) Type III comprised five indices whose behaviour under changing land-use categorization was unpredictable; their response curves could not be described by any particular function. This study highlights the need to analyse the effects of land-use categorization on landscape metrics in order to quantify landscape patterns clearly, and it provides insights into the selection of landscape metrics for comparative research on a given area under different land-use categorizations.

19.
Improved kernel regression for image restoration
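Steering kernel regression is an adaptive and effective image restoration method that has been widely used in image denoising, upscaling, and deblurring. However, because the model uses a Gaussian function as its kernel, edges in the restored image, especially fine edges, are often blurred by over-smoothing. An anisotropic kernel regression model for image restoration based on robust statistics is proposed: building on the steering kernel regression model, it incorporates an anisotropic distance and replaces the Gaussian kernel with a robust-statistics weight function. Extensive image restoration experiments show that, compared with steering kernel regression, the proposed method significantly improves the quality of the restored images, with a particularly clear advantage in preserving fine edges.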

20.
Swarm intelligence (SI) and evolutionary computation (EC) algorithms are often used to solve various optimization problems. They generally require a large number of fitness function evaluations (i.e., high computational cost) to obtain quality solutions. This requirement becomes more challenging when the optimization problem involves computationally expensive analyses and/or simulation tasks. To tackle this issue, meta-modeling has shown successful results in improving computational efficiency by approximating the fitness or constraint functions of these complex optimization problems. Meta-modeling approaches typically use polynomial regression, kriging, radial basis function networks, and support vector machines. Less attention has been given to the generalized regression neural network (GRNN) approach, yet it offers several advantages. Specifically, model construction does not require iterations, and its single parameter is known to be relatively insensitive, so selecting an optimal value usually requires less effort. In this paper, we use a generalized regression neural network to construct meta-models that approximate the fitness function in particle swarm optimization (PSO). To assess the performance and quality of the solutions, the proposed meta-modeling approach is tested on ten benchmark functions. The results are promising in terms of solution quality and computational efficiency, especially when compared with particle swarm optimization without meta-modeling and with several other meta-modeling methods in previously published literature.
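Why a GRNN needs no iterative training and has a single parameter is clearer from its prediction rule: it stores the evaluated points and returns a Gaussian-weighted average of their fitness values, controlled by one spread σ. A minimal sketch of a standard GRNN used as a fitness surrogate (the PSO loop and the paper's settings are not shown; the training points are hypothetical evaluations of the sphere function):

```python
import math

def grnn_predict(x, train_x, train_y, sigma):
    """GRNN output: Gaussian-weighted average of stored targets.

    If x is very far from all stored points and sigma is small, every weight
    can underflow to zero and the division fails.
    """
    weights = []
    for xi in train_x:
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        weights.append(math.exp(-d2 / (2.0 * sigma ** 2)))
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, train_y)) / total

# "Training" is just storing already-evaluated particles and their fitness.
train_x = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
train_y = [sum(v * v for v in p) for p in train_x]  # sphere function values
print(grnn_predict([0.5, 0.5], train_x, train_y, sigma=0.5))
```

A small σ makes the surrogate interpolate stored values closely, while a large σ flattens it toward the mean of the stored fitness values.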


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417