Similar literature
Found 20 similar articles (search time: 15 ms)
1.
The subject of this paper is a new approach to symbolic regression. Other publications on symbolic regression use genetic programming. This paper describes an alternative method based on Pareto simulated annealing. Our method uses linear regression to estimate constants. Interval arithmetic is applied to ensure the consistency of a model. To prevent overfitting, we assess a model not only on its predictions at the data points, but also on its complexity, for which we introduce a new measure. We compare our new method with the Kriging metamodel and with a symbolic regression metamodel based on genetic programming. We conclude that Pareto-simulated-annealing-based symbolic regression is very competitive with the other metamodel approaches.
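
The abstract above scores each model on two objectives, prediction error and complexity. As a minimal sketch of that two-objective comparison only (not of the simulated annealing search itself), the following filters a set of candidate models down to the non-dominated ones. The candidate expressions, their error values, and the complexity numbers are hypothetical; the paper's own complexity measure is not given in the abstract.

```python
from typing import List, Tuple

# (expression, error, complexity); all values are illustrative assumptions.
Model = Tuple[str, float, float]


def dominates(a: Model, b: Model) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(models: List[Model]) -> List[Model]:
    """Keep only the models that no other model dominates."""
    return [m for m in models if not any(dominates(o, m) for o in models)]


candidates: List[Model] = [
    ("x", 4.0, 1.0),
    ("x + x*x", 1.5, 3.0),
    ("sin(x) + x*x", 1.5, 5.0),      # same error as "x + x*x" but more complex
    ("x*x*x + sin(x)/x", 0.5, 8.0),
    ("exp(x)", 5.0, 2.0),            # worse than "x" on both objectives
]

front = pareto_front(candidates)
for expression, error, complexity in front:
    print(f"{expression}: error={error}, complexity={complexity}")
```

A multi-objective search would typically maintain such a front while proposing new candidates; choosing a single model from it remains a separate trade-off decision.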

2.
Technical Note: Naive Bayes for Regression   Cited by: 1 (self-citations: 0, citations by others: 1)
Eibe Frank, Leonard Trigg, Geoffrey Holmes, Ian H. Witten. Machine Learning, 2000, 41(1): 5-25
Despite its simplicity, the naive Bayes learning scheme performs well on most classification tasks, and is often significantly more accurate than more sophisticated methods. Although the probability estimates that it produces can be inaccurate, it often assigns maximum probability to the correct class. This suggests that its good performance might be restricted to situations where the output is categorical. It is therefore interesting to see how it performs in domains where the predicted value is numeric, because in this case, predictions are more sensitive to inaccurate probability estimates. This paper shows how to apply the naive Bayes methodology to numeric prediction (i.e., regression) tasks by modeling the probability distribution of the target value with kernel density estimators, and compares it to linear regression, locally weighted linear regression, and a method that produces model trees (decision trees with linear regression functions at the leaves). Although we exhibit an artificial dataset for which naive Bayes is the method of choice, on real-world datasets it is almost uniformly worse than locally weighted linear regression and model trees. The comparison with linear regression depends on the error measure: for one measure naive Bayes performs similarly, while for another it is worse. We also show that standard naive Bayes applied to regression problems by discretizing the target value performs similarly badly. We then present empirical evidence that isolates naive Bayes' independence assumption as the culprit for its poor performance in the regression setting. These results indicate that the simplistic statistical assumption that naive Bayes makes is indeed more restrictive for regression than for classification.

3.
The focus of this study is the use of the Monte Carlo method in fuzzy linear regression. Its purpose is to identify appropriate error measures for estimating fuzzy linear regression model parameters with the Monte Carlo method, since this approach estimates the parameters without any mathematical programming or heavy fuzzy arithmetic operations. In the literature, only two error measures (E1 and E2) are available for the estimation of fuzzy linear regression model parameters. Additionally, the accuracy of the available error measures under the Monte Carlo procedure has not been evaluated. In this article, mean square error, mean percentage error, mean absolute percentage error, and symmetric mean absolute percentage error are proposed for the estimation of fuzzy linear regression model parameters with the Monte Carlo method. Moreover, the estimation accuracies of the existing and proposed error measures are explored. The error measures are compared with each other in terms of estimation accuracy; this study demonstrates that the best error measures for estimating fuzzy linear regression model parameters with the Monte Carlo method are E1, E2, and the mean square error. On the other hand, the worst one is the mean percentage error. These results would be useful to enrich the studies that have already focused on fuzzy linear regression models.
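
The four proposed error measures can be sketched as follows. This is a minimal illustration on crisp (non-fuzzy) numbers using the common textbook definitions; the paper's fuzzy versions, and E1 and E2, are not specified in the abstract, and the sMAPE variant shown is one of several in use. The sample data are hypothetical.

```python
from typing import Sequence


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean square error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mpe(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean percentage error (signed, so errors of opposite sign can cancel)."""
    return 100.0 * sum((a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def smape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Symmetric MAPE, using the mean of |actual| and |predicted| as the denominator."""
    terms = (abs(a - p) / ((abs(a) + abs(p)) / 2.0) for a, p in zip(actual, predicted))
    return 100.0 * sum(terms) / len(actual)


# Hypothetical observations and model outputs (actual values must be nonzero).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

for name, measure in [("MSE", mse), ("MPE", mpe), ("MAPE", mape), ("sMAPE", smape)]:
    print(name, measure(actual, predicted))
```

In this sample the signed percentage errors cancel, so MPE is zero even though two of the three predictions are off by 10%. That is a general property of signed measures, not a claim about the paper's reasoning.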

4.
We apply our new fuzzy Monte Carlo method to a certain fuzzy linear regression problem to estimate the best solution. The best solution is a vector of triangular fuzzy numbers, for the fuzzy coefficients in the model, which minimizes one of two error measures. We use a quasi-random number generator to produce random sequences of these fuzzy vectors which uniformly fill the search space. We consider an example problem and show this Monte Carlo method obtains the best solution for one error measure and is approximately best for the other error measure.

5.
Parse-matrix evolution for symbolic regression   Cited by: 1 (self-citations: 0, citations by others: 1)
A data-driven model is highly desirable for industrial data analysis when the experimental model structure is unknown or wrong, or when the system concerned has changed. Symbolic regression is a useful method for constructing the data-driven model (regression equation). Existing algorithms for symbolic regression, such as genetic programming and grammatical evolution, are difficult to use because of their special target programming language (i.e., LISP) or an additional function parsing process. In this paper, a new evolutionary algorithm for symbolic regression, parse-matrix evolution (PME), is proposed. A chromosome in PME is a parse-matrix with integer entries. The mapping from the chromosome to the regression equation is based on a mapping table. PME can easily be implemented in any programming language and is easy to control. Furthermore, it does not need any additional function parsing process. Numerical results show that PME can solve symbolic regression problems effectively.

6.
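To address the problem that existing symbolic regression methods focus only on fitting error and neglect model simplification, a multi-objective artificial fish swarm algorithm is proposed that optimizes fitting error and model complexity simultaneously as objective functions. Syntax trees are encoded as binary heaps, so that good branches are inherited stably and decoding is easier. After introducing the concepts of mask, neighborhood, niche, and crowding degree, artificial fish behavior operators suited to the binary heap encoding are designed and defined, including random walk, foraging, following, and escaping. Extensive experiments show that the proposed algorithm obtains high-quality Pareto solutions in symbolic regression. Methods for selecting a compromise solution from the Pareto front and for reducing the algorithm's memory overhead are also discussed.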

7.
Symbolic regression is a machine learning task: given a training dataset with features and targets, find a symbolic function that best predicts the target given the features. This paper concentrates on dynamic regression tasks, i.e. tasks where the goal changes during the model fitting process. Our study is motivated by dynamic regression tasks originating in the domain of reinforcement learning: we study four dynamic symbolic regression problems related to well-known reinforcement learning benchmarks, with data generated from the standard Value Iteration algorithm. We first show that in these problems the target function changes gradually, with no abrupt changes. Even these gradual changes, however, are a challenge to traditional Genetic Programming-based Symbolic Regression algorithms because they rely only on expression manipulation and selection. To address this challenge, we present an enhancement to such algorithms suitable for dynamic scenarios with gradual changes, namely the recently introduced type of leaf nodes called Linear Combination of Features. This type of leaf node, aided by the error backpropagation technique known from artificial neural networks, enables the algorithm to better fit the data by utilizing the error gradient to its advantage rather than searching blindly using only the fitness values. This setup is compared with a baseline of the core algorithm without any of our improvements and also with a classic evolutionary dynamic optimization technique: hypermutation. The results show that the proposed modifications greatly improve the algorithm's ability to track a gradually changing target.

8.
Energy consumption has increased in recent decades at a rate ranging from 1.5% to 10% per year in the developed world. As a consequence, several efforts have been made to model energy consumption in order to achieve a better use of energy and to minimize environmental impact. Open problems in this area include energy consumption forecasting, user profile mining, energy source planning, and transportation, among others. To address these problems, it is important to have suitable tools to model energy consumption data series, so that analysts and CEOs can understand the underlying properties of the power demand in order to make high-level decisions. In this paper, we focus on the problem of energy consumption modelling, and provide a solution from the perspective of symbolic regression. More specifically, we develop hybrid genetic programming algorithms to find the algebraic expression that best models daily energy consumption in public buildings at the University of Granada as a testbed, and compare the benefits of Straight Line Programs with the classic tree representation used in symbolic regression. Regarding algorithm design, the outcomes of our experimentation suggest that Straight Line Programs outperform other representation models in the symbolic regression problems studied, and also that hybridization with local search methods can improve the quality of the resulting algebraic expression. On the other hand, with regard to energy consumption modelling, our approach empirically demonstrates that symbolic regression can be a powerful tool to find underlying relationships between multivariate energy consumption data series.

9.
In many real-world regression and forecasting problems, over-prediction and under-prediction errors have different consequences and incur asymmetric costs. Such problems entail the use of cost-sensitive learning, which attempts to minimize the expected misprediction cost, rather than minimize a simple measure such as mean squared error. A method has been proposed recently for tuning a regular regression model post hoc so as to minimize the average misprediction cost under an asymmetric cost structure. In this paper, we build upon that method and propose an extended tuning method for cost-sensitive regression. The previous method becomes a special case of the method we propose. We apply the proposed method to loan charge-off forecasting, a cost-sensitive regression problem that has had a bearing on bank failures over the last few years. Empirical evaluation in the loan charge-off forecasting domain demonstrates that the method we have proposed can further lower the misprediction cost significantly.
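
The abstract does not spell out the tuning procedure, so the following is only a sketch of one simple form of post hoc tuning, assumed here for illustration: add a constant shift to the predictions of an already-fitted model and choose the shift that minimizes the average asymmetric cost. The linear cost function, the per-unit costs, and the data are all hypothetical.

```python
from typing import Sequence


def cost(actual: float, predicted: float, under_cost: float, over_cost: float) -> float:
    """Linear asymmetric cost: under-prediction and over-prediction are priced differently."""
    error = actual - predicted
    return under_cost * error if error > 0 else over_cost * (-error)


def average_cost(actuals: Sequence[float], preds: Sequence[float], shift: float,
                 under_cost: float, over_cost: float) -> float:
    """Average cost after adding a constant shift to every prediction."""
    total = sum(cost(a, p + shift, under_cost, over_cost) for a, p in zip(actuals, preds))
    return total / len(actuals)


def best_shift(actuals: Sequence[float], preds: Sequence[float],
               under_cost: float, over_cost: float) -> float:
    """The average cost is convex and piecewise linear in the shift, with kinks at the
    residuals, so it is enough to try each residual (and zero) as a candidate."""
    candidates = [0.0] + [a - p for a, p in zip(actuals, preds)]
    return min(candidates, key=lambda s: average_cost(actuals, preds, s, under_cost, over_cost))


# Hypothetical validation data; under-prediction is assumed four times as costly.
actuals = [10.0, 12.0, 9.0, 15.0]
preds = [9.0, 12.0, 10.0, 13.0]
shift = best_shift(actuals, preds, under_cost=4.0, over_cost=1.0)
print("shift:", shift)
print("cost before:", average_cost(actuals, preds, 0.0, 4.0, 1.0))
print("cost after:", average_cost(actuals, preds, shift, 4.0, 1.0))
```

In practice the shift would be chosen on held-out data rather than on the data used to fit the model.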

10.
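To address background interference, illumination changes, and occlusion between targets in people counting for videos of real public scenes, a people counting method that combines feature map learning with first-order dynamic linear regression is proposed. First, a feature-map mapping model is built between the scale-invariant feature transform (SIFT) features of an image and the ground-truth target density map; the SIFT features and this mapping model are then used to obtain a feature map containing both target and background feature quantities. Next, because the background in typical surveillance video changes little and the background feature quantity in the feature map is therefore relatively stable, a people-count regression model is built from the integral of the feature map and the true count by first-order dynamic linear regression. Finally, the estimated count is obtained from this regression model. Experiments on the MALL and PETS2009 datasets show that, compared with the cumulative attribute space method, the proposed method reduces the mean absolute error by 2.2%; compared with the first-order dynamic linear regression method based on corner detection, it reduces the mean absolute error by 6.5% and the mean relative error by 2.3%.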

11.
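Application of genetic programming to symbolic regression   Cited by: 1 (self-citations: 0, citations by others: 1)
Genetic programming is a new type of search and optimization technique. This article introduces the basic principles of genetic programming and several key issues in the design and implementation of genetic programming algorithms, and studies symbolic regression based on genetic programming. Compared with traditional regression methods, the fitted functions obtained by this method are more accurate and more widely applicable. Symbolic regression of a sample function is used to show that the method is reasonable and feasible.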

12.
As an extension of multi-class classification, machine learning algorithms have been proposed that are able to deal with situations in which the class labels are defined in a non-crisp way. Objects exhibit in that sense a degree of membership to several classes. In a similar setting, models are developed here for classification problems where an order relation is specified on the classes (i.e., non-crisp ordinal regression problems). As for traditional (crisp) ordinal regression problems, it is argued that the order relation on the classes should be reflected by the model structure as well as the performance measure used to evaluate the model. These arguments lead to a natural extension of the well-known proportional odds model for non-crisp ordinal regression problems, in which the underlying latent variable is not necessarily restricted to the class of linear models (by using kernel methods).
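
For reference, the standard (crisp) proportional odds model that the abstract extends can be written as below. The notation is assumed here, not taken from the paper: K ordered classes, thresholds θ_k, and a latent function f that is linear in the classical model and, in a kernel version, a kernel expansion over training points x_i with kernel κ. Sign conventions vary between texts, and the paper's non-crisp extension is not reproduced.

```latex
% Cumulative-logit (proportional odds) form, under assumed notation
P(y \le k \mid x) = \frac{1}{1 + \exp\bigl(-(\theta_k - f(x))\bigr)},
\qquad k = 1, \dots, K-1,
\qquad \theta_1 \le \theta_2 \le \dots \le \theta_{K-1}

% Linear latent variable versus a kernel expansion
f(x) = w^{\top} x
\qquad \text{or} \qquad
f(x) = \sum_{i=1}^{n} \alpha_i \, \kappa(x_i, x)
```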

13.
A 5-axis milling machine has 39 independent geometric error components when the machine tool is considered as a set of five rigid bodies. The identification of the deterministic component of the systematic error is very important. It permits one to improve the accuracy close to the repeatability of the machine tool. This paper gives a new way to identify and compensate all the systematic angular errors separately and then use them further to identify the systematic translational error. Identification based on a new mathematical method and a stable numerical solution method is proposed. The model explains from first principles why some error components have no effect in a first order model. The identification of the total angular systematic errors can be done independently from the translation errors. However, the total translation error depends on the angular errors and the translation errors of each machine tool slide. The main problems solved are to find enough linearly independent equations and to avoid numerical instability in the computation. It is important to separate numerical problems and linear dependence. The very complex equations are first analyzed in symbolic form to eliminate the linear dependencies. The total number of linearly independent components in the model is reduced from 30 to 26 for the position dependent errors and from 9 to 3 for the position independent components. Secondly, the large system of linear equations is broken down into many smaller systems. The model is tested first with simulated errors modeled as cubic polynomials. An artifact-based identification is proposed and implemented based on drilling holes in various locations and orientations. New ways to measure the volumetric error directly are proposed. Direct measurement of the total volumetric error requires considerably less measurement than measuring all 6 components of each machine slide, especially in the case of a 5-axis machine.

14.
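To address the possible singularity of the solution in the ELM (extreme learning machine) learning algorithm, an ELM ridge regression learning algorithm with an optimized ridge parameter (ELMRR) is proposed. The algorithm replaces the original linear regression step with ridge regression, uses the root mean square error as the performance index, and determines the optimal ridge parameter with particle swarm optimization. To verify the effectiveness of the method, simulation experiments on function regression and classification problems are analyzed. The results show that the method improves the prediction performance of ELM and overcomes the difficulty of choosing the ridge parameter in traditional ridge regression.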
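
The substitution described above can be written out in the usual ELM notation, which is assumed here since the abstract gives no formulas: H is the hidden-layer output matrix, T the target matrix, β the output weights, λ > 0 the ridge parameter, and N the number of samples with targets t_i and outputs t̂_i. The particle swarm search over λ is not shown.

```latex
% Standard ELM: least-squares output weights via the Moore-Penrose pseudoinverse
\hat{\beta} = H^{\dagger} T

% Ridge regression variant: H^T H + \lambda I is invertible for any \lambda > 0,
% which avoids the singular case
\hat{\beta}_{\lambda} = \left( H^{\top} H + \lambda I \right)^{-1} H^{\top} T

% Performance index used to compare candidate values of \lambda
\mathrm{RMSE}(\lambda) = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left\| t_i - \hat{t}_i(\lambda) \right\|^2 }
```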

15.
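Based on an in-depth analysis of the maximum likelihood linear regression (MLLR) model adaptation algorithm that uses adaptive regression trees in speech recognition, this paper proposes a target-driven multi-layer MLLR adaptation (TMLLR) algorithm. Following a target-driven principle, the algorithm introduces a feedback mechanism and dynamically determines the transformation classes of the MLLR transform according to the increase in the likelihood of the objective function, which greatly improves the recognition rate of the system. Because of its special multi-layer structure, the algorithm also eliminates many redundant intermediate computations, so it achieves both high adaptation accuracy and fast adaptation. In supervised adaptation experiments, the error rate of the system adapted with this algorithm was 10% lower than that of the system adapted with the regression-tree-based MLLR algorithm, and adaptation was nearly twice as fast.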

16.
We propose a novel linear dimensionality reduction algorithm, namely Locally Regressive Projections (LRP). To capture the local discriminative structure, for each data point, a local patch consisting of this point and its neighbors is constructed. LRP assumes that the low dimensional representations of points in each patch can be well estimated by a locally fitted regression function. Specifically, we train a linear function for each patch via ridge regression, and use its fitting error to measure how well the new representations can respect the local structure. The optimal projections are thus obtained by minimizing the summation of the fitting errors over all the local patches. LRP can be performed under either supervised or unsupervised settings. Our theoretical analysis reveals the connections between LRP and the classical methods such as PCA and LDA. Experiments on face recognition and clustering demonstrate the effectiveness of our proposed method.

17.
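A genetic programming model based on multi-objective optimization   Cited by: 1 (self-citations: 0, citations by others: 1)
Genetic programming is often difficult to apply directly in engineering because the complexity of its hierarchical trees grows without restraint during evolution, which leads to excessive run times. This paper introduces the principle of multi-objective optimization into traditional genetic programming. The resulting model not only produces more accurate optimal results but also provides a way to control the size of the tree structure effectively during the random search. Experiments on symbolic regression problems gave good results.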

18.
Automatic programming is a type of programming that can analyze and solve problems using the principles of symbolic regression analysis. These methods can solve complex problems regardless of whether they have a specific pattern or not. In this work, we introduce the difference-based firefly programming (DFP) method as an improved version of the standard firefly programming method and describe it in detail. To evaluate the performance of the new method, its results are compared with those of the standard method and of other methods used to solve the same type of problems. DFP has also been used to forecast and model a real-world time-series problem, where it showed good performance as well. In general, the results demonstrate the improved performance of the newly introduced method and its ability to solve complex problems efficiently.

19.
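To address the problems that existing network-quality QoE perception models need many iterations to remove gross data errors and have small linear regression parameters, a modeling and simulation study of network-quality QoE perception based on user preference is proposed. Model parameters are determined according to user preference theory, and network-quality QoE perception data are acquired. On this basis, the MCD algorithm is used to identify and remove gross errors in the QoE data. From the cleaned data, an ROI weighting algorithm extracts the QoE data features, which are then substituted into a multiple linear regression equation to compute the network-quality QoE perception, thereby realizing QoE perception based on user preference. Experimental results show that, compared with three existing network-quality QoE perception models, the constructed model reduces the number of gross-error iterations and increases the linear regression parameter, which indicates that it performs better.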

20.
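Gene Expression Programming (GEP) performs well on symbolic regression problems whose targets are polynomial functions, but less well on non-polynomial functions that involve operators of different arities. Inspired by the gene silencing phenomenon in transgenic bioengineering, an extended GEP algorithm, SFGEP (Gene Expression Programming of Symbol Field), is proposed. An SFGEP chromosome consists of an expression-factor field and an expression-gene field. The chromosome is interpreted depth-first, and the differing operator arities are used to form suppressor factors and position effects for gene expression, which realizes a gene-silencing mechanism in chromosome interpretation. Experimental results show that, compared with traditional multigenic-chromosome GEP, SFGEP retains a certain ability to mine polynomial functions while performing better on non-polynomial functions containing operators of different arities, with a higher success rate and faster convergence.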
