Similar literature
 Found 20 similar documents; search took 15 ms
1.
Spatial distribution of sponge species richness (SSR) and its relationship with environment are important for marine ecosystem management, but they are either unavailable or unknown. Hence we applied random forest (RF), generalised linear model (GLM) and their hybrid methods with geostatistical techniques to SSR data by addressing relevant issues with variable selection and model selection. It was found that: 1) of five variable selection methods, one is suitable for selecting optimal RF predictive models; 2) traditional model selection methods are unsuitable for identifying GLM predictive models and joint application of RF and AIC can select accuracy-improved models; 3) highly correlated predictors may improve RF predictive accuracy; 4) hybrid methods for RF can accurately predict count data; and 5) effects of model averaging are method-dependent. This study depicted the non-linear relationships of SSR and predictors, generated spatial distribution of SSR with high accuracy and revealed the association of high SSR with hard seabed features.  相似文献   

2.
The problem of variable selection within the class of generalized additive models, when there are many covariates to choose from but the number of predictors is still somewhat smaller than the number of observations, is considered. Two very simple but effective shrinkage methods and an extension of the nonnegative garrote estimator are introduced. The proposals avoid having to use nonparametric testing methods for which there is no general reliable distributional theory. Moreover, component selection is carried out in one single step as opposed to many selection procedures which involve an exhaustive search of all possible models. The empirical performance of the proposed methods is compared to that of some available techniques via an extensive simulation study. The results show under which conditions one method can be preferred over another, hence providing applied researchers with some practical guidelines. The procedures are also illustrated analysing data on plasma beta-carotene levels from a cross-sectional study conducted in the United States.  相似文献   

3.
A growing number of applications involve a large number of predictors measured repeatedly at a sequence of time points. In this article we investigate how a dimension reduction method can be employed to analyze such high-dimensional longitudinal data. The predictor dimension can be effectively reduced while full information about the regression mean is retained. Simultaneous variable selection along with dimension reduction is studied, and graphical diagnosis and model fitting after dimension reduction are investigated. The method is flexible enough to encompass a variety of commonly used longitudinal models.

4.
The use of the multinomial logit model is typically restricted to applications with few predictors, because in high-dimensional settings maximum likelihood estimates tend to deteriorate. A sparsity-inducing penalty is proposed that accounts for the special structure of multinomial models by penalizing the parameters that are linked to one variable in a grouped way. It is devised to handle general multinomial logit models with a combination of global predictors and those that are specific to the response categories. A proximal gradient algorithm is used that efficiently computes stable estimates. Adaptive weights and a refitting procedure are incorporated to improve variable selection and predictive performance. The effectiveness of the proposed method is demonstrated by simulation studies and an application to the modeling of party choice of voters in Germany.  相似文献   

5.
We present a Bayesian variable selection method for the setting in which the number of independent variables or predictors in a dataset is much larger than the available sample size. While most existing methods tolerate some degree of correlation among predictors, they do not use these correlations in variable selection; our method does. Our correlation-based stochastic search (CBS) method, the hybrid-CBS algorithm, extends a popular search algorithm for high-dimensional data, the stochastic search variable selection (SSVS) method. As in SSVS, we search the space of all possible models using variable addition, deletion, or swap moves. However, our moves through the model space are designed to accommodate correlations among the variables. We describe our approach for continuous, binary, ordinal, and count outcome data. The impact of the choice of prior distributions and hyperparameters is assessed in simulation studies. We also examine the performance of variable selection and prediction as the correlation structure of the predictors varies. We found that, when predictors were moderately to highly correlated, the hybrid-CBS gave lower prediction errors than SSVS and better identified the predictors truly associated with the outcome. We illustrate the method on data from a proteomic profiling study of melanoma, a type of skin cancer.

6.
This paper proposes a new method and algorithm for predicting multivariate responses in a regression setting. Research into the classification of high dimension low sample size (HDLSS) data, in particular microarray data, has made considerable advances, but regression prediction for high-dimensional data with continuous responses has had less attention. Recently Bair et al. (2006) proposed an efficient prediction method based on supervised principal component regression (PCR). Motivated by the fact that using a larger number of principal components results in better regression performance, this paper extends the method of Bair et al. in several ways: a comprehensive variable ranking is combined with a selection of the best number of components for PCR, and the new method further extends to regression with multivariate responses. The new method is particularly suited to addressing HDLSS problems. Applications to simulated and real data demonstrate the performance of the new method. Comparisons with the findings of Bair et al. (2006) show that for high-dimensional data in particular the new ranking results in a smaller number of predictors and smaller errors.  相似文献   

7.
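Variable selection method based on regression coefficients for near-infrared spectral analysis   Total citations: 1, self-citations: 0, citations by others: 1
A stepwise variable selection method based on regression coefficients is proposed. After the regression coefficient of each spectral variable is computed, the variables are sorted by absolute coefficient value in descending order, and the best variable subset is chosen step by step by forward selection using PLS cross-validation. The method was applied to near-infrared spectral data of corn and diesel fuel; 14, 12, and 30 optimal variables were selected for modeling corn protein, diesel cetane number, and diesel viscosity, respectively, and in every case the predictions were better than those of the full-spectrum model. The method is therefore an effective and practical variable selection approach for near-infrared spectra.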

8.
This article addresses some problems in outlier detection and variable selection in linear regression models. First, outlier detection suffers from problems known as smearing and masking. Smearing means that one outlier makes another, non-outlier observation appear to be an outlier; masking means that one outlier prevents another from being detected. Detecting outliers one by one may therefore give misleading results. In this article a genetic algorithm is presented which considers different possible groupings of the data into outlier and non-outlier observations, so that all outliers are detected at the same time. Second, it is known that outlier detection and variable selection can influence each other, and that different results may be obtained depending on the order in which the two tasks are performed. It may therefore be useful to consider these tasks simultaneously, and a genetic algorithm for simultaneous outlier detection and variable selection is suggested. Two real data sets are used to illustrate the algorithms, which are shown to work well. In addition, the scalability of the algorithms is examined in an experiment using generated data. I would like to thank Dr Tero Aittokallio and an anonymous referee for useful comments.

9.
In this paper, we propose the weighted fusion, a new penalized regression and variable selection method for data with correlated variables. The weighted fusion can potentially incorporate information redundancy among correlated variables for estimation and variable selection. Weighted fusion is also useful when the number of predictors p is larger than the number of observations n. It allows the selection of more than n variables in a motivated way. Real data and simulation examples show that weighted fusion can improve variable selection and prediction accuracy.  相似文献   

10.
The problem of selecting variables or features in a regression model in the presence of both additive (vertical) and leverage outliers is addressed. Since variable selection and the detection of anomalous data are not separable problems, the focus is on methods that select variables and outliers simultaneously. For selection, the fast forward selection algorithm, least angle regression (LARS), is used, but it is not robust. To achieve robustness to additive outliers, a dummy variable identity matrix is appended to the design matrix allowing both real variables and additive outliers to be in the selection set. For leverage outliers, these selection methods are used on samples of elemental sets in a manner similar to that used in high breakdown robust estimation. These results are compared to several other selection methods of varying computational complexity and robustness. The extension of these methods to situations where the number of variables exceeds the number of observations is discussed.  相似文献   
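
The augmentation step described in this abstract can be sketched in a few lines. This is only an illustration of appending an n-by-n identity block to the design matrix so that each observation gets its own candidate "outlier" column; it does not run LARS itself, and the function name `augment_with_outlier_dummies` is hypothetical, not taken from the paper.

```python
def augment_with_outlier_dummies(X):
    """Return [X | I]: the design matrix with one dummy column per observation.

    X is a list of n rows (lists of floats). Column p + i of the result is 1.0
    only in row i, so a selection method that picks that column is, in effect,
    flagging observation i as an additive outlier. (Hypothetical helper.)
    """
    n = len(X)
    return [
        list(row) + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(X)
    ]


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
for row in augment_with_outlier_dummies(X):
    print(row)
```

Any selection procedure can then be run on the augmented matrix; real predictors and outlier dummies compete for entry into the same selection set.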

11.
In this paper, we propose an integrated sparse Bayesian variable selection method for regressions with a large number of possibly highly correlated macroeconomic predictors. Variable selection is performed through the stochastic search variable selection technique. We assign a sparse prior distribution to the regression parameters and a correlation prior distribution to the binary vector. The performance of the proposed method is illustrated by forecasting one major macroeconomic time series of the US economy. Empirical results show that, in terms of absolute forecast error and log predictive likelihood, the proposed method performs better than three other methods.

12.
The second-order matching problem is to determine whether a first-order term without variables is an instance of a second-order term that may contain function variables as well as individual variables. The problem is well known to be NP-complete in general. In this paper, we first introduce several restrictions on second-order matching problems: bounds on the number, arity, and occurrences of function variables; ground (the term contains no individual variables); flat (the term contains no function constants); and predicate (no function variable occurs in the argument terms of any function variable). By combining these restrictions, we sharply separate the tractable second-order matching problems from the intractable ones. Finally, we compare these separations with those between decidable and undecidable second-order unification problems.

13.
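With the development of technology, network-linked data are increasingly used in statistical learning, machine learning, and related fields. In linear regression models, current research on variable selection for network-linked data mainly targets homogeneous samples, that is, samples whose individual effect α is the same. In practice, however, the individual effects of most samples are heterogeneous, and ignoring this heterogeneity can introduce large bias into model estimation and prediction. Therefore, for the case where individual effects in network data exhibit group heterogeneity, this paper ...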

14.
The prediction of stream water temperature is of interest because water temperature plays a significant ecological and economic role, for example in species distribution, fisheries, and industrial and agricultural water use. Such predictions are usually based on an appropriate mathematical model and on measurements of various atmospheric factors. In this paper, a probabilistic approach to predicting daily mean water temperature is proposed. The resulting model combines two Gaussian process regression models: the first describes the long-term component of water temperature and the second describes its short-term variations. The approach is developed further by modeling the short-term variations with multiple Gaussian process regression models instead of a single one. In addition, a variable selection procedure based on mutual information is presented, which is suitable for input variable selection when nonlinear models for stream water temperature prediction are developed. The proposed approach is compared with traditional modeling approaches on measurements from the Drava river in Croatia. The presented methodology can serve as a basis for predictive tools for water resource managers.

15.
A new slip model derived by molecular dynamics has been used to investigate the ultra-thin gas-lubricated slider bearings beneath the three bushings of an electrostatic micromotor in micro-electro-mechanical systems (MEMS). A modified Reynolds equation is proposed based on the modified slip model. Analytical solutions for flow rate, pressure distribution, load carrying capacity, and streamwise location are obtained from the modified Reynolds equation and compared with results from the literature. The comparison shows that the new second-order slip model is more accurate than the first-order slip model, earlier second-order slip models, and the MMGL model, and that it closely approximates the variable hard sphere (VHS) and variable soft sphere (VSS) models, which agree well with the solution of the linearized Boltzmann equation. The slip effect reduces the pressure distribution and the load carrying capacity and shifts the streamwise location of the load carrying capacity, so it should not be ignored when studying step-shaped slider bearings in micromotors for MEMS devices.

16.
In many statistical downscaling methods, atmospheric variables are chosen by using a combination of expert knowledge with empirical measures such as correlations and partial correlations. In this short communication, we describe the use of a fast, sparse variable selection method, known as RaVE, for selecting atmospheric predictors, and illustrate its use on rainfall occurrence at stations in South Australia. We show that RaVE generates parsimonious models that are both sensible and interpretable, and whose results compare favourably to those obtained by a non-homogeneous hidden Markov model (Hughes et al., 1999).  相似文献   

17.
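In graph matching models, the choice of weights strongly affects matching performance, but directly computed weights often do not reflect the actual images being matched. Following the idea of graph matching learning for the quadratic assignment problem, this work gives weight-learning methods for first-order and second-order maximum weight matching models. The first-order model uses image feature points directly as graph vertices, whereas the second-order model uses edges connecting certain feature points as vertices; both models can be solved with the Kuhn-Munkres algorithm. The first-order maximum weight matching model is essentially equivalent to the linear case of the quadratic assignment problem. Image matching experiments on the CMU House database show that the second-order model outperforms the first-order model, and that both perform better with learned weights than with directly computed weights.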
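
To make the maximum weight matching objective concrete, here is a minimal sketch that solves a tiny instance by brute force over all permutations. It is deliberately not the Kuhn-Munkres algorithm named in the abstract (which reaches the same optimum in polynomial time); the weight matrix and the function name `max_weight_matching` are made up for illustration.

```python
from itertools import permutations


def max_weight_matching(weights):
    """Brute-force maximum weight perfect matching on a square weight matrix.

    weights[i][j] is the weight of matching left vertex i to right vertex j.
    Returns (assignment, total), where assignment[i] is the right vertex
    matched to left vertex i. Only practical for very small n.
    """
    n = len(weights)
    best_perm, best_total = None, float("-inf")
    for perm in permutations(range(n)):
        total = sum(weights[i][perm[i]] for i in range(n))
        if total > best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total


weights = [
    [3, 1, 2],
    [2, 4, 6],
    [5, 2, 1],
]
print(max_weight_matching(weights))  # ([1, 2, 0], 12)
```

In a learned model the entries of `weights` would come from the trained weighting of feature similarities rather than being fixed by hand.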

18.
Two families of two-time-level difference schemes are developed for the numerical solution of first-order hyperbolic partial differential equations with one space variable. The space derivative is replaced by either (i) a first-order or (ii) a second-order backward difference approximant, and the resulting system of first-order ordinary differential equations is solved using A0-stable and L0-stable methods. The methods are used explicitly and are inexpensive to implement. They are tested on a number of problems from the literature involving wave-form solutions, increasing solutions with discontinuities in function values or first derivatives across a characteristic, and exponentially decaying solutions.
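
One way to see why such schemes can be "used explicitly" is the following sketch for the model problem u_t + a u_x = 0 with a > 0. It assumes a first-order backward difference in space and backward Euler in time; backward Euler is chosen here only as a familiar representative of an L0-stable method and is not claimed to be one of the paper's schemes. Because the backward difference couples each point only to its upstream neighbour, the implicit system is lower bidiagonal and can be solved by a single forward sweep from the inflow boundary.

```python
def upwind_backward_euler_step(u, r, inflow):
    """Advance u_t + a*u_x = 0 (a > 0) by one time step.

    u      : solution values at the current time level, u[0] at the inflow boundary
    r      : a * dt / h (any positive value; no stability restriction for this scheme)
    inflow : prescribed boundary value at the new time level

    Discretisation (an assumption for illustration):
        (1 + r) * new[i] = u[i] + r * new[i - 1]
    which is implicit in form but solved explicitly by sweeping left to right.
    """
    new = [inflow]
    for i in range(1, len(u)):
        new.append((u[i] + r * new[i - 1]) / (1.0 + r))
    return new


# A unit inflow value entering an initially zero profile.
print(upwind_backward_euler_step([0.0, 0.0, 0.0], r=1.0, inflow=1.0))  # [1.0, 0.5, 0.25]
```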

19.
The least absolute shrinkage and selection operator (LASSO) has been playing an important role in variable selection and dimensionality reduction for linear regression. In this paper we focus on two general LASSO models: Sparse Group LASSO and Fused LASSO, and apply the linearized alternating direction method of multipliers (LADMM for short) to solve them. The LADMM approach is shown to be a very simple and efficient approach to numerically solve these general LASSO models. We compare it with some benchmark approaches on both synthetic and real datasets.  相似文献   
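
ADMM-type solvers for LASSO problems typically rely on the soft-thresholding operator, which is the proximal operator of the L1 penalty. The sketch below shows that single building block only; it is not the LADMM algorithm of the paper, and the function names are illustrative.

```python
def soft_threshold(x, lam):
    """Proximal operator of lam * |x|: shrink x toward zero by lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def soft_threshold_vector(v, lam):
    """Apply soft-thresholding elementwise to a list of floats."""
    return [soft_threshold(x, lam) for x in v]


print(soft_threshold_vector([3.0, -3.0, 0.5], 1.0))  # [2.0, -2.0, 0.0]
```

Entries whose magnitude is at most `lam` are set exactly to zero, which is the mechanism by which L1-penalised methods perform variable selection.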

20.
《Journal of Process Control》2014,24(7):1046-1056
Soft sensors are used to predict response variables that are difficult to measure from predictor data that can be obtained relatively easily. Arranging time-lagged predictor data and applying partial least squares (PLS) to the resulting dataset is a popular approach for extracting the correlation between responses and predictors under process dynamics. However, the model input dimension soars once multiple time delays are incorporated. In addition, variable selection in the dynamic PLS (DPLS) model is a critical step for the robustness and accuracy of the inferential model, since irrelevant inputs degrade the prediction performance of the soft sensor. Sparse PLS (SPLS) is a variable selection method that simultaneously selects the important predictors and finds the correlation between predictors and responses. The sparsity of the model depends on a cut-off value in the SPLS algorithm that is determined by cross-validation; the threshold is therefore a compromise across all latent variable directions, and the inputs retained by SPLS need to be shrunk further to obtain a more compact model. In the presented work, named SPLS-VIP, the variable importance in projection (VIP) method is used to filter out the insignificant inputs from the SPLS result. An industrial soft sensor for predicting oxygen concentrations in an air separation process was developed based on the proposed approach. Compared with SPLS alone, the proposed approach further improved both prediction performance and model interpretability.
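
For readers unfamiliar with VIP, the following sketch computes scores from the commonly used definition VIP_j = sqrt(p * sum_a SS_a * (w_ja / ||w_a||)^2 / sum_a SS_a), where w_a is the PLS weight vector of component a and SS_a is the response variance that component explains. The inputs here are made-up numbers rather than the output of a fitted PLS model, and the cut-off of 1 in the usage line is a common convention, not necessarily the threshold used in the paper.

```python
import math


def vip_scores(weights, explained):
    """Variable importance in projection from PLS weights.

    weights[a][j] : weight of predictor j in latent component a
    explained[a]  : response sum of squares explained by component a
    Returns one VIP score per predictor; squared scores sum to the
    number of predictors p.
    """
    p = len(weights[0])
    total = sum(explained)
    scores = []
    for j in range(p):
        acc = 0.0
        for w_a, ss_a in zip(weights, explained):
            norm_sq = sum(w * w for w in w_a)
            acc += ss_a * (w_a[j] ** 2) / norm_sq
        scores.append(math.sqrt(p * acc / total))
    return scores


# Two latent components, three predictors (illustrative values only).
weights = [[0.8, 0.6, 0.0], [0.0, 0.6, 0.8]]
explained = [3.0, 1.0]
scores = vip_scores(weights, explained)
keep = [j for j, s in enumerate(scores) if s >= 1.0]
print(scores, keep)
```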
