Similar Articles
 20 similar articles were retrieved.
1.
This paper introduces a new approach to fitting a linear regression model to symbolic interval data. Each example in the learning set is described by a feature vector in which each feature value is an interval. The new method fits a linear regression model to the mid-points and ranges of the interval values assumed by the variables in the learning set. The lower and upper bounds of the interval value of the dependent variable are predicted from its mid-point and range, which are estimated by applying the fitted linear regression model to the mid-point and range of each interval value of the independent variables. The proposed prediction method is assessed by estimating the average behaviour of both the root mean square error and the square of the correlation coefficient in a Monte Carlo experiment. Finally, the approaches presented in this paper are applied to a real data set and their performance is compared.
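As a rough illustration of this centre-and-range idea (a sketch on synthetic data, not the authors' implementation), one linear model can be fitted to the mid-points and another to the ranges, and the interval bounds of the response reassembled from the two predictions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# synthetic interval-valued predictor, described by mid-point and range
x_mid = rng.uniform(0, 10, n)
x_rng = rng.uniform(0.5, 2.0, n)
# interval-valued response generated from the mid-points and ranges
y_mid = 3.0 + 2.0 * x_mid + rng.normal(0, 0.5, n)
y_rng = 0.5 + 0.8 * x_rng + np.abs(rng.normal(0, 0.1, n))

# fit one linear model to the mid-points and another to the ranges
A_mid = np.column_stack([np.ones(n), x_mid])
A_rng = np.column_stack([np.ones(n), x_rng])
beta_mid = np.linalg.lstsq(A_mid, y_mid, rcond=None)[0]
beta_rng = np.linalg.lstsq(A_rng, y_rng, rcond=None)[0]

# predicted bounds of the dependent interval from predicted mid-point and range
pred_mid = A_mid @ beta_mid
pred_rng = A_rng @ beta_rng
y_lower, y_upper = pred_mid - pred_rng / 2, pred_mid + pred_rng / 2
rmse_lower = np.sqrt(np.mean((y_lower - (y_mid - y_rng / 2)) ** 2))
print(beta_mid, beta_rng, rmse_lower)
```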

2.
We introduce the concept of a representative set of parameters for the multiple criteria outranking methods ELECTRE-GKMS and PROMETHEE-GKS, which apply the principle of robust ordinal regression. We exploit the necessary and possible results provided by these methods to choose a single instance of the preference model that represents all other compatible instances. The representative set of parameters is selected within an interactive, preference-driven procedure which allows some pre-defined targets to be combined into different scenarios. Each target concerns enhancement of the results of robust ordinal regression. Specifically, the decision maker (DM) may emphasize either the advantage of some alternatives over others, acknowledged by all compatible outranking models, or the ambiguity in the comparison of some other pairs of alternatives. By selecting the representative set of parameters, we satisfy the desire of some DMs to assign precise values to the variables of the model. We also enable exploitation of the outranking relation for these parameters in order to arrive at a representative recommendation in a traditional way.

3.
Application of genetic programming to symbolic regression   Cited by: 1 (self-citations: 0, by others: 1)
Genetic programming (GP) is a mathematical programming method based on Darwinian evolutionary theory. This paper discusses the application of GP to symbolic regression. Compared with traditional data-fitting methods, GP does not require the form of the fitting function to be given in advance; moreover, provided the initial population is large enough and the crossover and mutation probabilities are set reasonably, it does not become trapped in local optima and therefore has broader applicability. For curve fitting without a prescribed functional form, GP can automatically obtain both the functional form of the curve and its parameter values, avoiding the drawbacks of traditional methods. A concrete application example illustrates the use of GP in the processing of measurement data.
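As a hedged sketch only (not the code from the paper), the following minimal genetic program evolves expression trees for a toy target curve using subtree crossover and point mutation; the operator set, population size, probabilities and bloat limit are all assumed values:

```python
import random
import operator

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}
TERMINALS = ['x', 1.0, 2.0]

def random_tree(depth=3):
    if depth == 0 or (depth < 3 and random.random() < 0.3):
        return random.choice(TERMINALS)
    op = random.choice(list(OPS))
    return (op, random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    if tree == 'x':
        return x
    if isinstance(tree, float):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def nodes(tree, path=()):
    yield path                                   # enumerate subtree positions
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from nodes(child, path + (i,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, sub):
    if not path:
        return sub
    parts = list(tree)
    parts[path[0]] = replace(parts[path[0]], path[1:], sub)
    return tuple(parts)

def crossover(a, b):                             # swap a random subtree of a with one from b
    return replace(a, random.choice(list(nodes(a))), get(b, random.choice(list(nodes(b)))))

def mutate(tree):                                # replace a random subtree with a fresh one
    return replace(tree, random.choice(list(nodes(tree))), random_tree(depth=2))

def fitness(tree, xs, ys):
    err = sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys))
    return err if err == err else float('inf')   # treat NaN as worst fitness

xs = [i / 10 for i in range(-20, 21)]
ys = [x * x + 2 * x + 1 for x in xs]             # toy target curve

pop = [random_tree() for _ in range(200)]
for gen in range(40):
    pop.sort(key=lambda t: fitness(t, xs, ys))
    survivors = pop[:50]
    children = []
    while len(children) < 150:
        a, b = random.sample(survivors, 2)
        child = crossover(a, b)
        if random.random() < 0.2:                # mutation probability (assumed)
            child = mutate(child)
        if len(list(nodes(child))) > 60:         # simple bloat control
            child = random_tree()
        children.append(child)
    pop = survivors + children

best = min(pop, key=lambda t: fitness(t, xs, ys))
print(best, fitness(best, xs, ys))
```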

4.
There is an interest in the problem of identifying different partitions of a given set of units obtained according to different subsets of the observed variables (multiple cluster structures). A model-based procedure has been previously developed for detecting multiple cluster structures from independent subsets of variables. The method relies on model-based clustering methods and on a comparison among mixture models using the Bayesian Information Criterion. A generalization of this method which allows the use of any model-selection criterion is considered. A new approach combining the generalized model-based procedure with variable-clustering methods is proposed. The usefulness of the new method is shown using simulated and real examples. Monte Carlo methods are employed to evaluate the performance of various approaches. Data matrices with two cluster structures are analyzed taking into account the separation of clusters, the heterogeneity within clusters and the dependence of cluster structures.
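A small illustration of the model-selection step on synthetic data (scikit-learn's Gaussian mixtures and BIC stand in for the paper's model-based clustering machinery; not the authors' code):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# two synthetic clusters in two dimensions
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])

bics = {}
for k in range(1, 6):
    gm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bics[k] = gm.bic(X)                       # lower BIC is better
print(bics, "selected k =", min(bics, key=bics.get))
```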

5.
徐雪松, 舒俭. 《计算机应用》 (Journal of Computer Applications), 2014, 34(8): 2285-2290
To address the long computation time and low model-identification accuracy of traditional regression analysis methods for multi-model data sets, a new heuristic robust regression analysis method is proposed. The method mimics the clustering and learning principles of the immune system and uses a B-cell network as the tool for classifying and storing the data set: data are classified according to how well they fit a candidate model, which improves the accuracy of data classification. The process of extracting the model set is decomposed into repeated attempts of "clustering", "regression" and "re-clustering", and a parallel heuristic search is used to approximate the solution of the model set. Simulation results show that the regression analysis time of the proposed method is clearly shorter than that of traditional algorithms, and its model-identification accuracy is clearly higher. On an eight-model data set, the best-performing traditional algorithm, a sequential extraction algorithm based on RANSAC, achieved an average model-identification accuracy of 90.37% and required 53.3947 s; traditional algorithms with computation times below 0.5 s achieved accuracies under 1%. The proposed algorithm required only 0.5094 s and reached an accuracy of 98.25%.
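For context, a sketch of the RANSAC-based sequential-extraction baseline mentioned above, on invented two-line data (fit RANSAC, remove its inliers, repeat); thresholds and data are illustrative, and this is not the proposed immune-inspired algorithm:

```python
import numpy as np
from sklearn.linear_model import RANSACRegressor

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, (400, 1))
y = np.empty(400)
y[:200] = 2.0 * x[:200, 0] + 1.0 + rng.normal(0, 0.2, 200)    # points from model 1
y[200:] = -1.0 * x[200:, 0] + 20.0 + rng.normal(0, 0.2, 200)   # points from model 2

remaining_x, remaining_y, models = x, y, []
for _ in range(2):                                  # extract two linear models in turn
    ransac = RANSACRegressor(residual_threshold=1.0).fit(remaining_x, remaining_y)
    inliers = ransac.inlier_mask_
    models.append((ransac.estimator_.coef_[0], ransac.estimator_.intercept_))
    remaining_x = remaining_x[~inliers]             # drop inliers before the next extraction
    remaining_y = remaining_y[~inliers]
print(models)
```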

6.
Vehicle dynamics has been an active field of research for many decades. The well-known single-track model, introduced as early as 1940 by Riekert and Schunck, is still used to explain fundamental effects in vehicle dynamics such as under- and oversteering. In the meantime, however, very complex multibody dynamics models have also become available, which allow very detailed simulations. On the other hand, the real-time computations needed by active safety and driver assistance systems demand models of lower complexity. Since important effects at the handling limits are not covered by the linear single-track model, while complex multibody models cannot be integrated fast enough by the electronic control units of safety systems, models with an adjustable degree of complexity are desired. Model-predictive control, for example, is an emerging field of application and relies on models that are integrated over the prediction horizon. The selection of an appropriate model is therefore an important task. The crux of the matter is to make a compromise between computation time and accuracy while having only a rough guess of the model's accuracy. In this contribution a systematic approach is proposed. Instead of selecting an existing model that is supposed to match the requirements on computation time and level of detail, a complex model is reduced to match the requirements using symbolic model reduction techniques. This approach has two major advantages: first, the accuracy can be set by the user; second, the model is continuously adjustable in its complexity or propagation precision.
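A minimal sketch of the linear single-track model referred to above, with illustrative vehicle parameters (not taken from the paper):

```python
import numpy as np
from scipy.integrate import solve_ivp

m, Iz, lf, lr = 1500.0, 2250.0, 1.2, 1.6       # mass, yaw inertia, CoG-to-axle distances
cf, cr = 80000.0, 90000.0                       # front/rear cornering stiffnesses
vx = 25.0                                       # constant longitudinal speed [m/s]

def single_track(t, state, delta):
    vy, r = state                               # lateral velocity, yaw rate
    alpha_f = delta - (vy + lf * r) / vx        # front tyre slip angle
    alpha_r = -(vy - lr * r) / vx               # rear tyre slip angle
    fy_f, fy_r = cf * alpha_f, cr * alpha_r     # linear tyre forces
    return [(fy_f + fy_r) / m - vx * r,         # lateral acceleration minus centripetal term
            (lf * fy_f - lr * fy_r) / Iz]       # yaw acceleration

# constant steering angle of 0.02 rad, integrated over 5 s
sol = solve_ivp(single_track, (0.0, 5.0), [0.0, 0.0], args=(0.02,), max_step=0.01)
print("steady-state yaw rate [rad/s]:", sol.y[1, -1])
```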

7.
This study focuses on one of the most critical issues of modeling under severe conditions of uncertainty: determining the relative importance (weight) of the explanatory variables. The ability to determine the relative importance of explanatory variables, and the reliability of such an outcome, are of utmost importance to decision makers who utilize such models as components of decision support or decision making. We compare the reliability of the traditional method, multiple linear regression, with that of fuzzy logic-based soft regression. We provide a case study (a cross-national model of background factors facilitating economic growth) to illustrate the performance of both methods. We conclude that soft regression is the more reliable and consistent tool for determining the relative importance of explanatory variables.

8.
Applied Soft Computing, 2007, 7(1): 425-440
Uncertainty management has been considered essential for real world applications, and spatial data and geographic information systems in particular require some means for managing uncertainty and vagueness. Rough sets have been shown to be an effective tool for data mining and uncertainty management in databases. The 9-intersection, region connection calculus (RCC) and egg-yolk methods have proven useful for modeling topological relations in spatial data. In this paper, we apply rough set definitions for topological relationships based on the 9-intersection, RCC and egg-yolk models for objects with broad boundaries. We show that rough sets can be used to express and improve on topological relationships and concepts defined with these models.
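A toy illustration of rough-set lower and upper approximations from an indiscernibility partition (invented attribute table; the boundary region plays the role of a broad boundary):

```python
from collections import defaultdict

# toy information table: object -> attribute values (not from the paper)
objects = {
    'a': ('urban', 'high'), 'b': ('urban', 'high'),
    'c': ('rural', 'high'), 'd': ('rural', 'low'), 'e': ('rural', 'low'),
}
target = {'a', 'c', 'd'}                  # the concept to approximate

blocks = defaultdict(set)                 # equivalence classes of the indiscernibility relation
for obj, attrs in objects.items():
    blocks[attrs].add(obj)

lower, upper = set(), set()
for block in blocks.values():
    if block <= target:                   # block entirely inside the concept
        lower |= block
    if block & target:                    # block overlapping the concept
        upper |= block
print("lower:", lower, "upper:", upper, "boundary:", upper - lower)
```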

9.
We investigate the effects of semantically based crossover operators in genetic programming, applied to real-valued symbolic regression problems. We propose two new relations derived from the semantic distance between subtrees, known as semantic equivalence and semantic similarity. These relations are used to guide variants of the crossover operator, resulting in two new crossover operators: semantics aware crossover (SAC) and semantic similarity-based crossover (SSC). SAC, which was introduced and studied previously, is included here for the purpose of comparison and analysis. SSC extends SAC by more closely controlling the semantic distance between the subtrees to which crossover may be applied. The new operators were tested on a number of real-valued symbolic regression problems and compared with standard crossover (SC), context aware crossover (CAC), Soft Brood Selection (SBS), and No Same Mate (NSM) selection. The experimental results on the problems examined show that, with computational effort measured by the number of function node evaluations, only SSC and SBS were significantly better than SC, and SSC was often better than SBS. Further experiments were also conducted to analyse the sensitivity of performance to the parameter settings for SSC. This analysis leads to the conclusion that SSC is more constructive and has higher locality than SAC, NSM and SC; we believe these are the main reasons for the improved performance of SSC.
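A hedged sketch of the semantic relations described above: two (sub)expressions are compared through their outputs on a random sample of points, and the equivalence/similarity thresholds shown are assumed values, not those of the paper:

```python
import random

def semantics(expr, points):
    return [expr(x) for x in points]      # a subtree's semantics = its outputs on sample points

def semantic_distance(f, g, points):
    return sum(abs(a - b) for a, b in zip(semantics(f, points), semantics(g, points))) / len(points)

points = [random.uniform(-1.0, 1.0) for _ in range(20)]
f = lambda x: x * x                       # two example subtrees, written as functions
g = lambda x: x * x + 0.01 * x

d = semantic_distance(f, g, points)
equivalent = d < 1e-4                     # "semantic equivalence" threshold (assumed)
similar = 1e-4 <= d <= 0.4                # "semantic similarity" band (assumed)
print(d, equivalent, similar)
```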

10.
A new multivariate non-parametric regression method is considered, which is an extension of PLS1 regression using the multivariate local polynomial regression framework. A theoretical framework that can be used in order to study the asymptotic properties of the estimator is proposed, and the method is implemented on a real data set, made up of seasonal amounts of rainfall in the north of the Nordeste region of Brazil, to be explained by climatic variables. The performances of the three methods of PLS1 regression, multivariate local regression and local PLS1 regression are compared by means of cross-validation and the use of a validation period. All calculations have been implemented in the S-Plus package. The results confirm the good properties of local PLS1 regression.

11.
Traditional methods for creating diesel engine models include analytical methods, such as multi-zone models, and intelligence-based models, such as artificial neural network (ANN) based models. However, the analytical models require excessive assumptions, while the ANN models have many drawbacks, such as a tendency to overfit and difficulties in determining the optimal network structure. In this paper, several emerging advanced machine learning techniques, including the least squares support vector machine (LS-SVM), the relevance vector machine (RVM), the basic extreme learning machine (ELM) and the kernel-based ELM, are newly applied to the modelling of diesel engine performance. Experiments were carried out to collect sample data for model training and verification. Limited by the experimental conditions, only 24 sample data sets were acquired, resulting in data scarcity. Six-fold cross-validation is therefore adopted to address this issue. Some of the sample data are also found to suffer from the problem of data exponentiality, where the engine performance output grows exponentially with engine speed and engine torque. This seriously deteriorates the prediction accuracy. Thus, a logarithmic transformation of the dependent variables is used to pre-process the data. In addition, a hybrid of leave-one-out cross-validation and Bayesian inference is, for the first time, proposed for the selection of the hyperparameters of the kernel-based ELM. A comparison among the advanced machine learning techniques, along with two traditional types of ANN models, namely the back propagation neural network (BPNN) and the radial basis function neural network (RBFNN), is conducted. The model evaluation is made based on time complexity, space complexity and prediction accuracy. The evaluation results show that the kernel-based ELM with the logarithmic transformation and hybrid inference is far better than the basic ELM, LS-SVM, RVM, BPNN and RBFNN in terms of prediction accuracy and training time.
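A minimal sketch of a basic ELM regressor with the logarithmic output transformation discussed above (random hidden weights, least-squares output weights); the data, network size and response function are synthetic stand-ins, not the paper's engine data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (24, 2))               # e.g. normalised speed and torque (24 samples)
y = np.exp(X[:, 0] + 0.5 * X[:, 1])           # response that grows exponentially with the inputs
y_log = np.log(y)                             # logarithmic transformation of the output

n_hidden = 20
W = rng.normal(size=(2, n_hidden))            # random input weights (never trained)
b = rng.normal(size=n_hidden)                 # random biases
H = np.tanh(X @ W + b)                        # hidden-layer output matrix
beta = np.linalg.pinv(H) @ y_log              # output weights by least squares

pred = np.exp(H @ beta)                       # back-transform the predictions
print("training RMSE:", np.sqrt(np.mean((pred - y) ** 2)))
```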

12.
This study compared a non-parametric and a parametric model for discriminating among uplands (non-wetlands), woody wetlands, emergent wetlands and open water. Satellite images obtained on 6 March 2005 and 16 October 2005 from the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) and geographic information system (GIS) data layers formed the input for analysis using classification and regression tree (CART®) and multinomial logistic regression analysis. The overall accuracy of the CART model was 73.3%. The overall accuracy of the logit model was 76.7%. The accuracies were not statistically different from each other (McNemar χ2 = 1.65, p = 0.19). The CART producer's accuracy of the emergent wetlands was higher than the accuracy from the multinomial logit (57.1% vs. 40.7%), whereas woody wetlands identified by the multinomial logit model presented a producer's accuracy higher than that from the CART model (68.7% vs. 52.6%). A McNemar test between the two models and National Wetland Inventory (NWI) maps showed that their accuracies were not statistically different. Overall, these two models provided promising results, although they are not sufficiently accurate to replace current methods of wetland mapping based on feature extraction in high-resolution orthoimagery.
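For reference, the McNemar comparison can be reproduced in a few lines with statsmodels; the agreement table below is hypothetical, not the study's:

```python
from statsmodels.stats.contingency_tables import mcnemar

# hypothetical 2x2 agreement table for two classifiers on the same test samples:
# rows = model A correct/incorrect, columns = model B correct/incorrect
table = [[520, 35], [28, 417]]
res = mcnemar(table, exact=False, correction=False)   # chi-square version of the test
print(res.statistic, res.pvalue)
```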

13.
The epidemiological question of concern here is "can young children at risk of obesity be identified from their early growth records?" Pilot work using logistic regression to predict overweight and obese children demonstrated relatively limited success. Hence we investigate the incorporation of non-linear interactions to help improve the accuracy of prediction, by comparing the results of logistic regression with those of six mature data mining techniques. The contributions of this paper are as follows: a) a comparison of logistic regression with six data mining techniques, specifically for the prediction of overweight and obese children at 3 years using data recorded at birth, 6 weeks, 8 months and 2 years respectively; b) improved accuracy of prediction: at 8 months, accuracy is improved only very slightly, in this case by using neural networks, whereas at 2 years accuracy is improved by over 10%, in this case by using Bayesian methods. It has also been shown that the incorporation of non-linear interactions can be important in epidemiological prediction, and that data mining techniques are becoming sufficiently well established to offer the medical research community a valid alternative to logistic regression.

14.
The asymptotic and finite-data behavior of some closed-loop identification methods is investigated. It is shown that, when the output power is limited, closed-loop identification can generally identify models with smaller variance than open-loop identification. Several variations on some two-step identification methods are compared with the direct identification method. High-order FIR models are used as process models to avoid bias issues arising from inadequate model structures for the processes. Comparisons are, therefore, made based on the variance of the identified process models both for asymptotic situations and for finite data sets. Process model bias resulting from improper selection of the noise and sensitivity function models is also investigated.
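An illustrative sketch (not from the paper) of direct identification with a high-order FIR model: the measured output is regressed on lagged inputs by least squares; the signal, noise level and impulse response below are invented:

```python
import numpy as np

rng = np.random.default_rng(2)
n, order = 2000, 30
u = rng.normal(size=n)                                      # measured plant input
true_ir = 0.8 ** np.arange(order)                           # assumed true impulse response
y = np.convolve(u, true_ir)[:n] + 0.1 * rng.normal(size=n)  # measured output with noise

# regression matrix of lagged inputs (high-order FIR structure)
Phi = np.column_stack([np.concatenate([np.zeros(k), u[:n - k]]) for k in range(order)])
ir_hat = np.linalg.lstsq(Phi, y, rcond=None)[0]             # FIR coefficients by least squares
print(np.round(true_ir[:5], 3))
print(np.round(ir_hat[:5], 3))
```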

15.
In a small case study of the mixed hardwood Hyrcanian forests of Iran, three non-parametric methods, namely k-nearest neighbour (k-NN), support vector machine regression (SVR) and tree regression based on random forest (RF), were used for plot-level estimation of volume/ha, basal area/ha and stems/ha using field inventory and Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) data. Relevant pre-processing and processing steps were applied to the ASTER data for geometric and atmospheric correction and for enhancing quantitative forest parameters. After collecting terrestrial information on the trees in the 101 sample plots, the volume, basal area and tree number per hectare were calculated for each plot. In the k-NN implementation, different distance measures and values of k were tried, and cross-validation was used to find the best distance measure and the optimal k. In SVR, the best regularization parameters of four kernel types were obtained using leave-one-out cross-validation. RF was implemented using a bootstrap learning method with regularized parameters for the decision tree model and stopping. The validity of the performances was examined on unused test samples using absolute and relative root mean square error (RMSE) and bias metrics. In volume/ha estimation, the results showed that all three algorithms had similar performances; however, SVR and RF produced better results than k-NN, with relative RMSE values of 28.54, 25.86 and 26.86 (m3 ha-1) for k-NN, SVR and RF, respectively, although only RF could generate unbiased estimates. In basal area/ha and stems/ha estimation, RF was slightly superior in relative RMSE (18.39, 20.64) to SVR (19.35, 22.09) and k-NN (20.20, 21.53), but k-NN could generate unbiased estimates, unlike the other two algorithms.

16.
Clustering issues are fundamental to the exploratory analysis of bioinformatics data. This process may follow algorithms that are reproducible but make assumptions about, for instance, the ability to estimate the global structure by successful local agglomeration; alternatively, it may use pattern recognition methods that are sensitive to the initial conditions. This paper reviews two clustering methodologies and highlights the differences that result from changes in data representation, applied to a protein expression data set for breast cancer (n = 1,076). The two clustering methodologies are a reproducible approach to model-free clustering and a probabilistic competitive neural network. The results from the two methods are compared with existing studies of the same data set, and the preferred clustering solutions are profiled for clinical interpretation.

17.
We overview and discuss several methods for the Fourier analysis of symbolic data, such as DNA sequences, emphasizing their mutual connections. We consider the indicator sequence approach, the vector and symbolic autocorrelation methods, and methods such as the spectral envelope that, for each frequency, optimize the symbolic-to-numeric mapping to emphasize any periodic features of the data. We discuss the equivalence of, or connections between, these methods. We show that it is possible to define the autocorrelation function of symbolic data, assuming only that we can compare any two symbols and decide whether they are equal or distinct. The autocorrelation is a numeric sequence, and its Fourier transform can also be obtained by summing the squares of the Fourier transforms of the indicator sequences (zero/one sequences indicating the positions of the symbols). Another interpretation of the spectrum is given, borrowing from the spectral envelope concept: among all symbolic-to-numeric mappings there is one that maximizes the spectral energy at each frequency, and this leads to the spectrum.
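A small sketch of the indicator-sequence approach on a toy DNA string: the symbol spectrum is obtained by summing the squared magnitudes of the Fourier transforms of the zero/one indicator sequences (the sequence and its period are invented for illustration):

```python
import numpy as np

seq = "ACGTACGTACGTACGT"                           # toy sequence with period 4
symbols = sorted(set(seq))
indicators = {s: np.array([1.0 if c == s else 0.0 for c in seq]) for s in symbols}

# symbolic spectrum = sum of squared DFT magnitudes of the indicator sequences
spectrum = sum(np.abs(np.fft.fft(ind)) ** 2 for ind in indicators.values())
peak = np.argmax(spectrum[1:len(seq) // 2 + 1]) + 1   # skip the DC component
print("dominant frequency index:", peak, "-> period", len(seq) / peak)
```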

18.
Mean tree height, dominant height, mean diameter, stem number, basal area, and timber volume of 233 field sample plots were estimated from various canopy height and canopy density metrics, derived by means of a small-footprint laser scanner over young and mature forest stands, using ordinary least-squares (OLS) regression analysis, seemingly unrelated regression (SUR), and partial least-squares (PLS) regression. The sample plots were distributed systematically throughout two separate inventory areas with sizes of 1000 and 6500 ha, respectively. The plots were divided into three predefined strata. Separate regression models were estimated for each inventory, as well as common models utilizing the plots of both inventories simultaneously. In the models estimated by combining data from the two areas, the statistical effect of inventory was found to be significant (p < 0.05) in the mean height models only. A total of 115 test stands and plots, ranging in size from 0.3 to 11.7 ha, were used to validate the estimated regression models. The bias and standard deviations (in parentheses) of the differences between predicted and ground reference values of mean height, dominant height, mean diameter, stem number, basal area, and volume were -5.5% to 4.7% (3.1-7.3%), -6.0% to 0.4% (2.9-8.2%), -0.2% to 7.9% (5.5-15.8%), -21.3% to 12.5% (13.4-29.3%), -7.3% to 8.4% (7.1-13.6%), and -3.9% to 10.1% (8.3-14.9%), respectively. Only minor discrepancies occurred between the three investigated estimation techniques. None of the techniques provided predicted values that were superior to the other techniques over all combinations of strata and variables.

19.
Palmprint authentication using a symbolic representation of images   Cited by: 2 (self-citations: 0, by others: 2)
A new branch of biometrics, palmprint authentication, has attracted an increasing amount of attention because palmprints are rich in line features, so that low-resolution images can be used. In this paper, we propose a new texture-based approach for palmprint feature extraction, template representation and matching. An extension of SAX (Symbolic Aggregate approXimation), a time-series technique, to 2D data is the key to making this new approach effective, simple, flexible and reliable. Experiments show that by adopting the simple feature of grayscale information only, this approach achieves an equal error rate of 0.3% and a rank-one identification accuracy of 99.9% on a public database of 7752 palmprints. The new approach has very low computational complexity, so it can be efficiently implemented on slow mobile embedded platforms. The proposed approach does not rely on any parameter training process and is therefore fully reproducible. Moreover, besides palmprint authentication, the proposed 2D extension of SAX may also be applied to other problems in pattern recognition and data mining for 2D images.
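A hedged sketch of one-dimensional SAX (z-normalisation, piecewise aggregate approximation, Gaussian breakpoints) for a four-symbol alphabet; the 2D extension proposed in the paper is not reproduced here, and the signal is a toy example:

```python
import numpy as np

def sax(series, n_segments=8, breakpoints=(-0.6745, 0.0, 0.6745), alphabet="abcd"):
    x = (series - series.mean()) / series.std()       # z-normalise the series
    segments = np.array_split(x, n_segments)           # piecewise aggregate approximation (PAA)
    paa = np.array([seg.mean() for seg in segments])
    # map each PAA value to a symbol via the standard-normal breakpoints
    return "".join(alphabet[np.searchsorted(breakpoints, v)] for v in paa)

signal = np.sin(np.linspace(0, 2 * np.pi, 64))          # toy 1D signal
print(sax(signal))
```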

20.
Bounded data with excess observations at the boundary are common in many areas of application. Various individual cases of inflated mixture models have been studied in the literature for bound-inflated data, yet the computational methods have been developed separately for each type of model. In this article we use a common framework for computing these models, and expand the range of models for both discrete and semi-continuous data with point inflation at the lower boundary. The quasi-Newton and EM algorithms are adapted and compared for estimation of model parameters. The numerical Hessian and generalized Louis method are investigated as means for computing standard errors after optimization. Correlated data are included in this framework via generalized estimating equations. The estimation of parameters and effectiveness of standard errors are demonstrated through simulation and in the analysis of data from an ultrasound bioeffect study. The unified approach enables reliable computation for a wide class of inflated mixture models and comparison of competing models.
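As an illustration of one member of this model class, a zero-inflated Poisson model can be fitted with statsmodels on synthetic data; this stands in for, and is not, the article's unified framework:

```python
import numpy as np
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(3)
n = 500
x = rng.normal(size=n)
counts = rng.poisson(np.exp(0.5 + 0.8 * x))        # Poisson part
counts[rng.random(n) < 0.3] = 0                    # excess zeros at the lower boundary

X = np.column_stack([np.ones(n), x])               # intercept + covariate
model = ZeroInflatedPoisson(counts, X, exog_infl=np.ones((n, 1)))   # constant inflation part
result = model.fit(maxiter=200, disp=False)        # maximum likelihood (quasi-Newton by default)
print(result.params)
```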
