Similar Documents
20 similar documents were found (search time: 0 ms).
1.
This paper introduces a new approach to fitting a linear regression model to symbolic interval data. Each example in the learning set is described by a feature vector whose feature values are intervals. The new method fits a linear regression model to the mid-points and ranges of the interval values assumed by the variables in the learning set. The lower and upper bounds of the interval value of the dependent variable are predicted from its mid-point and range, which are estimated by applying the fitted linear regression model to the mid-point and range of each interval value of the independent variables. The proposed prediction method is assessed by estimating the average behaviour of both the root mean square error and the square of the correlation coefficient in a Monte Carlo experiment. Finally, the approaches presented in this paper are applied to a real data set and their performance is compared.
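A minimal sketch of the centre-and-range idea described above, not the authors' exact estimator: two ordinary least-squares fits, one on interval mid-points and one on interval ranges, whose predictions are recombined into lower and upper bounds. The synthetic data and variable names are illustrative assumptions.

```python
# Hedged sketch of centre-and-range regression for interval-valued data
# (illustrative only; not the paper's exact formulation).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic interval-valued feature x = [x_low, x_high] and response y = [y_low, y_high].
x_mid = rng.uniform(0, 10, size=200)
x_rng = rng.uniform(0.5, 2.0, size=200)
y_mid = 3.0 * x_mid + 1.0 + rng.normal(0, 0.5, size=200)   # mid-point relationship
y_rng = 0.8 * x_rng + rng.normal(0, 0.05, size=200)        # range relationship

mid_model = LinearRegression().fit(x_mid.reshape(-1, 1), y_mid)   # regression on mid-points
rng_model = LinearRegression().fit(x_rng.reshape(-1, 1), y_rng)   # regression on ranges

# Predict the interval of y for a new interval x = [4.0, 6.0].
x_low, x_high = 4.0, 6.0
m = mid_model.predict(np.array([[(x_low + x_high) / 2.0]]))[0]
r = rng_model.predict(np.array([[x_high - x_low]]))[0]
print("predicted interval for y:", (m - r / 2.0, m + r / 2.0))
```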

2.
This paper presents adaptive and non-adaptive fuzzy c-means clustering methods for partitioning symbolic interval data. The proposed methods furnish a fuzzy partition and a prototype for each cluster by optimizing an adequacy criterion based on suitable squared Euclidean distances between vectors of intervals. Moreover, various cluster interpretation tools are introduced. Experiments with real and synthetic data sets show the usefulness of these fuzzy c-means clustering methods and the merit of the cluster interpretation tools.
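A rough sketch of a fuzzy c-means loop for interval data, using a squared Euclidean distance that compares lower and upper bounds separately; the paper's adaptive variants and interpretation tools are not reproduced here, and the update formulas are the standard (non-adaptive) FCM ones.

```python
# Illustrative (non-adaptive) fuzzy c-means for interval data; distances compare the
# lower and upper bounds of each interval separately. Not the paper's exact method.
import numpy as np

def interval_fcm(X_low, X_high, c=2, m=2.0, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    n = X_low.shape[0]
    U = rng.dirichlet(np.ones(c), size=n)                  # n x c fuzzy memberships
    for _ in range(n_iter):
        W = U ** m
        # Each cluster prototype is itself a vector of intervals (lower, upper bounds).
        P_low = W.T @ X_low / W.sum(axis=0)[:, None]
        P_high = W.T @ X_high / W.sum(axis=0)[:, None]
        # Squared Euclidean distance between interval vectors.
        d = ((X_low[:, None, :] - P_low[None]) ** 2
             + (X_high[:, None, :] - P_high[None]) ** 2).sum(axis=2)
        d = np.maximum(d, 1e-12)
        inv = d ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)           # standard FCM membership update
    return U, (P_low, P_high)

# Toy usage: two well-separated groups of one-dimensional intervals.
X_low = np.array([[1.0], [1.2], [0.9], [5.0], [5.3], [5.1]])
X_high = X_low + 0.5
U, prototypes = interval_fcm(X_low, X_high, c=2)
print(np.round(U, 2))
```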

3.
In this paper we propose a feature selection method for symbolic interval data based on a similarity margin. In this method, each class is parameterized by an interval prototype obtained through an appropriate learning process. A similarity measure is defined to estimate the similarity between an interval feature value and each class prototype, and a similarity margin concept is then introduced. Heuristic search is avoided by optimizing an objective function that evaluates the importance (weight) of each interval feature within the similarity margin framework. The experimental results show that the proposed method selects meaningful features for interval data. In particular, it yields a significant improvement on the classification task for three real-world datasets.

4.
This paper introduces different pattern classifiers for interval data based on the logistic regression methodology. Four approaches are considered, which differ in how the intervals are represented. The first classifier represents each interval by its centre and performs a classical logistic regression on the centres of the intervals. The second treats each interval as a pair of quantitative variables and performs a conjoint classical logistic regression on these variables. The third represents each interval by its vertices and applies a classical logistic regression to the vertices of the intervals. The last treats each interval as a pair of quantitative variables, performs two separate classical logistic regressions on these variables, and combines the results in an appropriate way. Experiments with synthetic data sets and an application to a real interval data set demonstrate the usefulness of these classifiers.
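A small sketch, on assumed synthetic data, of the first two representations described above: logistic regression on interval centres, and a joint regression on the lower and upper bounds treated as a pair of quantitative variables. It illustrates the representations only, not the paper's experiments.

```python
# Illustrative comparison of two interval representations for logistic regression:
# (a) interval centres only, (b) lower and upper bounds as a pair of variables.
# Synthetic data and names are assumptions, not the paper's setup.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 300
y = rng.integers(0, 2, size=n)                        # binary class labels
centres = rng.normal(loc=2.0 * y, scale=1.0, size=n)  # class-dependent interval centres
half_widths = rng.uniform(0.2, 1.0, size=n)
low, high = centres - half_widths, centres + half_widths

X_centre = centres.reshape(-1, 1)                     # representation (a)
X_bounds = np.column_stack([low, high])               # representation (b)

for name, X in [("centres", X_centre), ("bounds", X_bounds)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = LogisticRegression().fit(X_tr, y_tr)
    print(name, "accuracy:", round(clf.score(X_te, y_te), 3))
```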

5.
We investigate the effects of semantically based crossover operators in genetic programming applied to real-valued symbolic regression problems. We propose two new relations derived from the semantic distance between subtrees, termed semantic equivalence and semantic similarity. These relations are used to guide variants of the crossover operator, resulting in two new crossover operators: semantics aware crossover (SAC) and semantic similarity-based crossover (SSC). SAC, which was introduced and studied previously, is included here for comparison and analysis. SSC extends SAC by controlling more closely the semantic distance between subtrees to which crossover may be applied. The new operators were tested on several real-valued symbolic regression problems and compared with standard crossover (SC), context aware crossover (CAC), Soft Brood Selection (SBS), and No Same Mate (NSM) selection. The experimental results on the problems examined show that, with computational effort measured by the number of function node evaluations, only SSC and SBS were significantly better than SC, and SSC was often better than SBS. Further experiments were conducted to analyse the sensitivity of performance to the parameter settings of SSC. This analysis leads to the conclusion that SSC is more constructive and has higher locality than SAC, NSM and SC; we believe these are the main reasons for the improved performance of SSC.
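A hedged sketch of the semantic-distance idea underlying SAC/SSC: the semantics of a subtree is taken as its outputs on a sample of points, and two subtrees are compared by the mean absolute difference of those outputs; thresholds then decide whether crossover between them is allowed. The thresholds and the use of plain callables to stand in for subtrees are illustrative assumptions, not the paper's settings.

```python
# Illustrative semantic distance between two candidate subtrees, represented here
# simply as Python callables; threshold values are assumptions.
import numpy as np

def semantic_distance(f, g, sample_points):
    """Mean absolute difference of subtree outputs on a sample of the input domain."""
    return float(np.mean(np.abs(f(sample_points) - g(sample_points))))

def crossover_allowed(f, g, sample_points, lower=1e-4, upper=0.4):
    """Semantic-similarity check: subtrees must differ, but not by too much."""
    d = semantic_distance(f, g, sample_points)
    return lower < d < upper

pts = np.linspace(-1.0, 1.0, 20)
subtree_a = lambda x: x * x            # example subtree semantics
subtree_b = lambda x: x * x + 0.1 * x  # a semantically similar variant
print(semantic_distance(subtree_a, subtree_b, pts))
print(crossover_allowed(subtree_a, subtree_b, pts))
```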

6.
7.
Mutual-distance-based affinity propagation clustering for interval data (Cited: 1; self-citations: 0; citations by others: 1)
谢信喜  王士同 《计算机应用》2008,28(6):1441-1443
Symbolic clustering is an important extension of traditional clustering, and interval data are a common kind of symbolic data. The symmetric measures used in traditional clustering are not necessarily suitable for measuring interval data, and algorithm initialization has long been a serious source of interference in clustering. Therefore, a measure suited to interval data, called the mutual distance, is proposed, and on the basis of this measure a new clustering method, affinity propagation clustering, is adopted, which removes the initialization problem; this yields a mutual-distance-based affinity propagation clustering for interval data. Theoretical analysis and experimental comparison show that the algorithm performs better than the Euclidean-distance-based k-means algorithm.
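A loose sketch of the pipeline this abstract describes: build a pairwise distance matrix between interval objects and feed its negation to affinity propagation as a precomputed similarity, so no cluster centres need to be initialized. The Hausdorff-type interval distance used below is a stand-in, since the abstract does not define the mutual distance.

```python
# Illustrative pipeline: pairwise interval distances -> affinity propagation with a
# precomputed similarity matrix. The distance below is a stand-in for the paper's
# "mutual distance", which the abstract does not define.
import numpy as np
from sklearn.cluster import AffinityPropagation

def interval_distance(a, b):
    """Hausdorff-type distance between intervals a = (low, high) and b = (low, high)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

intervals = np.array([[1.0, 1.6], [1.1, 1.8], [0.9, 1.5],
                      [5.0, 5.7], [5.2, 6.0], [4.9, 5.5]])

n = len(intervals)
D = np.array([[interval_distance(intervals[i], intervals[j]) for j in range(n)]
              for i in range(n)])

# Affinity propagation expects similarities, so use negated distances.
ap = AffinityPropagation(affinity="precomputed", random_state=0).fit(-D)
print(ap.labels_)
```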

8.
9.
A large body of literature has accumulated on confidence interval construction for lognormal data, because many data in scientific inquiry are well approximated by this distribution. Procedures have usually been developed in a piecemeal fashion for a single mean, a single mean with excessive zeros, a difference between two means, and a difference between two differences (net health benefit). As an alternative, we present a general approach for all these cases that requires only confidence limits available in introductory texts. Simulation results confirm the validity of this approach. Examples arising from health economics are used to illustrate the methodology.
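The abstract does not spell out the general construction, so the sketch below only shows one standard textbook interval for a single lognormal mean (Cox's method) as a point of reference; it is not the authors' unified procedure.

```python
# Cox's method confidence interval for the mean of lognormal data, shown as a simple
# reference point; not the general approach proposed in the paper.
import numpy as np
from scipy import stats

def lognormal_mean_ci(x, alpha=0.05):
    y = np.log(x)                       # log-transformed data are approximately normal
    n, ybar, s2 = len(y), y.mean(), y.var(ddof=1)
    z = stats.norm.ppf(1 - alpha / 2)
    half = z * np.sqrt(s2 / n + s2 ** 2 / (2 * (n - 1)))
    centre = ybar + s2 / 2              # estimate of the log of the lognormal mean
    return np.exp(centre - half), np.exp(centre + half)

rng = np.random.default_rng(0)
x = rng.lognormal(mean=1.0, sigma=0.8, size=50)
print(lognormal_mean_ci(x))            # true mean is exp(1 + 0.8**2 / 2) ≈ 3.74
```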

10.
Ridge regression and its application to medical data (Cited: 1; self-citations: 0; citations by others: 1)
In this paper, data analysis techniques are employed to investigate the optimal properties of ridge estimators and the stability of regression estimates. Numerical examples from the medical field are used to compare the predictive ability of ridge regression analysis with that of ordinary regression analysis.
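For orientation, a minimal sketch contrasting ordinary least squares with ridge regression on nearly collinear predictors; the data are synthetic assumptions, not the paper's medical examples.

```python
# Minimal ridge vs. ordinary least squares on nearly collinear predictors
# (synthetic data; illustrative only).
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)      # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 2.0 * x1 + 1.0 * x2 + rng.normal(scale=0.5, size=n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)            # the penalty shrinks and stabilizes coefficients

print("OLS coefficients:  ", np.round(ols.coef_, 2))   # typically unstable under collinearity
print("Ridge coefficients:", np.round(ridge.coef_, 2)) # a more stable split of the shared effect
```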

11.
12.
This paper introduces dynamic clustering methods for partitioning symbolic interval data. These methods furnish a partition and a prototype for each cluster by optimizing an adequacy criterion that measures the fit between clusters and their representatives. To compare symbolic interval data, these methods use a single adaptive (city-block or Hausdorff) distance that changes at each iteration but is the same for all clusters. Moreover, various tools for interpreting the partitions and clusters of symbolic interval data furnished by these algorithms are presented. Experiments with real and synthetic symbolic interval data sets demonstrate the usefulness of these adaptive clustering methods and the merit of the partition and cluster interpretation tools.
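A minimal sketch of the two distances mentioned above, applied to vectors of intervals; the adaptive weights that change at each iteration of the authors' algorithms are omitted.

```python
# City-block and Hausdorff distances between two vectors of intervals, each interval
# given as (low, high). The per-iteration adaptive weights from the paper are omitted.
import numpy as np

def city_block_interval(u, v):
    """Sum over features of |low difference| + |high difference|."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    return float(np.sum(np.abs(u[:, 0] - v[:, 0]) + np.abs(u[:, 1] - v[:, 1])))

def hausdorff_interval(u, v):
    """Sum over features of max(|low difference|, |high difference|)."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    return float(np.sum(np.maximum(np.abs(u[:, 0] - v[:, 0]), np.abs(u[:, 1] - v[:, 1]))))

a = [(1.0, 2.0), (5.0, 7.0)]   # an object described by two interval features
b = [(1.5, 2.2), (4.0, 6.5)]
print(city_block_interval(a, b), hausdorff_interval(a, b))
```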

13.
Parse-matrix evolution for symbolic regression (Cited: 1; self-citations: 0; citations by others: 1)
Data-driven models are highly desirable for industrial data analysis when the experimental model structure is unknown or wrong, or when the system of interest has changed. Symbolic regression is a useful method for constructing such data-driven models (regression equations). Existing algorithms for symbolic regression, such as genetic programming and grammatical evolution, are difficult to use because of their special target programming language (i.e., LISP) or an additional function parsing process. In this paper, a new evolutionary algorithm for symbolic regression, parse-matrix evolution (PME), is proposed. A chromosome in PME is a parse-matrix with integer entries, and the mapping from the chromosome to the regression equation is based on a mapping table. PME can easily be implemented in any programming language, is easy to control, and needs no additional function parsing process. Numerical results show that PME can solve symbolic regression problems effectively.
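The abstract only states that a chromosome is an integer parse-matrix decoded through a mapping table, so the sketch below is a guessed illustration of that decoding idea (one row per operation, integer codes looked up in tables), not PME's actual encoding.

```python
# Guessed illustration of decoding an integer "parse matrix" into a regression
# expression via mapping tables; PME's real encoding may differ.
import numpy as np

OPS = {0: np.add, 1: np.subtract, 2: np.multiply}   # operator mapping table

def decode_and_eval(matrix, x):
    """Each row (op, arg1, arg2) combines terminals or earlier row results.
    Terminal codes: 0 -> variable x, 1 -> constant 1.0; codes >= 2 refer to
    the result of row (code - 2)."""
    results = []
    def operand(code):
        if code == 0:
            return x
        if code == 1:
            return np.ones_like(x)
        return results[code - 2]
    for op_code, a_code, b_code in matrix:
        results.append(OPS[op_code](operand(a_code), operand(b_code)))
    return results[-1]                               # last row gives the model output

x = np.linspace(-1, 1, 5)
# Rows: r0 = x * x ; r1 = r0 + x  ->  model x**2 + x
chromosome = [(2, 0, 0), (0, 2, 0)]
print(decode_and_eval(chromosome, x))
```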

14.
15.
Neuro-fuzzy learning with symbolic and numeric data (Cited: 1; self-citations: 0; citations by others: 1)
In real-world datasets we often have to deal with different kinds of variables; the data can be, for example, symbolic or numeric. Data analysis methods can often deal with only one kind of data. Even when fuzzy systems are applied – which do not depend on the scales of the variables – usually only numeric data are considered. In this paper we present learning algorithms for creating fuzzy rules and training membership functions from data with symbolic and numeric variables. The algorithms are extensions of our neuro-fuzzy classification approach NEFCLASS. We also demonstrate the applicability of the algorithms on two real-world datasets.

16.
Gene Expression Programming (GEP) significantly surpasses traditional evolutionary approaches to solving symbolic regression problems. However, existing GEP algorithms still suffer from premature convergence and slow evolution in the late phase of a run. To address these pitfalls, we designed a novel evolutionary algorithm, Uniform Design-Aided Gene Expression Programming (UGEP). UGEP uses (1) a mixed-level uniform table for generating the initial population and (2) multi-parent crossover operators that take advantage of the dispersion properties of uniform design. In addition to a theoretical analysis, we compared UGEP with existing GEP variants in a number of experiments on symbolic regression problems, including function fitting and chaotic time-series prediction. The experimental results indicate that UGEP excels in terms of both its ability to reach the global optimum and its convergence speed in solving symbolic regression problems.

17.
Recent developments in computing and technology, along with the availability of large amounts of raw data, have contributed to the creation of many effective techniques and algorithms in the fields of pattern recognition and machine learning. The main objectives in developing these algorithms are to identify patterns within the available data, to make predictions, or both. Great success has been achieved with many classification techniques in real-life applications. With regard to binary data classification in particular, analysis of data containing rare events or disproportionate class distributions poses a great challenge to industry and to the machine learning community. This study examines rare events (REs) with binary dependent variables containing many more non-events (zeros) than events (ones); such variables are difficult to predict and to explain, as has been shown in the literature. This research combines rare-event corrections to Logistic Regression (LR) with truncated Newton methods and applies these techniques to Kernel Logistic Regression (KLR). The resulting model, Rare Event Weighted Kernel Logistic Regression (RE-WKLR), is a combination of weighting, regularization, approximate numerical methods, kernelization, bias correction, and efficient implementation, all of which are critical to making RE-WKLR an effective and powerful method for predicting rare events. Comparing RE-WKLR to SVM and TR-KLR on non-linearly separable, small and large binary rare-event datasets, we find that RE-WKLR is as fast as TR-KLR and much faster than SVM. In addition, according to statistical significance tests, RE-WKLR is more accurate than both SVM and TR-KLR.
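RE-WKLR itself combines weighting, bias correction, kernelization and truncated Newton optimization; as a very loose stand-in built from off-the-shelf parts, the sketch below kernelizes the features with a Nystroem map and fits a class-weighted logistic regression to counteract the rare-event imbalance. It is not the authors' estimator.

```python
# Loose stand-in for the rare-event setting: RBF kernel approximation (Nystroem)
# plus class-weighted logistic regression. Not the RE-WKLR estimator itself.
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Imbalanced, non-linearly separable toy data: roughly 5% events.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.95, 0.05],
                           class_sep=0.8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = make_pipeline(
    Nystroem(kernel="rbf", n_components=200, random_state=0),   # kernelization
    LogisticRegression(class_weight="balanced", max_iter=1000)  # rare-event reweighting
)
model.fit(X_tr, y_tr)
print("accuracy on held-out data:", round(model.score(X_te, y_te), 3))
```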

18.
Traditional data-based soft sensors are constructed with equal numbers of input and output data samples, and the collected process data are assumed to be clean enough that no outliers are mixed in. However, such assumptions are too strict in practice. On the one hand, the easily collected input variables are sometimes corrupted with outliers; on the other hand, the output variables, also called quality variables, are usually difficult to obtain. These two problems make traditional soft sensors cumbersome. To deal with both issues, in this paper Student's t distributions are used in mixture probabilistic principal component regression modeling to tolerate outliers through their regulated heavy tails. Furthermore, a semi-supervised mechanism is incorporated into traditional probabilistic regression to deal with the unbalanced modeling issue. Two simulation case studies are provided to demonstrate the robustness and reliability of the new method.

19.
Application of genetic programming to symbolic regression (Cited: 1; self-citations: 0; citations by others: 1)
Genetic programming (GP) is a mathematical programming method based on Darwinian evolution theory. This paper discusses the application of GP to symbolic regression. Compared with traditional data-fitting methods, GP does not require the form of the fitting function to be specified in advance; moreover, when the initial population is large enough and the crossover and mutation probabilities are set reasonably, it does not get trapped in local optima and therefore has wider applicability. For curve fitting without a given functional form, GP can automatically derive both the functional form of the curve and its parameter values, avoiding the drawbacks of traditional methods. A concrete application example illustrates the use of GP in processing measurement data.
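As a toy illustration of the idea that symbolic regression searches over functional forms rather than fitting parameters of a fixed form, the sketch below performs a crude random search over small expression trees; a real GP run would add a population, crossover, and mutation as discussed above.

```python
# Toy symbolic-regression search over random expression trees. A crude random
# search used only to illustrate the idea; real GP adds populations, crossover,
# and mutation.
import random

OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
TERMINALS = ["x", 1.0, 2.0]

def random_tree(depth=3):
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)
    op = random.choice(list(OPS))
    return (op, random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    if tree == "x":
        return x
    if isinstance(tree, float):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def fitness(tree, xs, ys):
    return sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys))

random.seed(0)
xs = [i / 10.0 for i in range(-10, 11)]
ys = [x * x + 2.0 * x for x in xs]              # target curve, unknown to the search

best = min((random_tree() for _ in range(5000)), key=lambda t: fitness(t, xs, ys))
print("best expression:", best, "error:", round(fitness(best, xs, ys), 4))
```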

20.
Automatic programming is a type of programming that can analyze and solve problems using the principles of symbolic regression analysis. Such methods can solve complex problems regardless of whether the problems follow a specific pattern. In this work, we introduce the difference-based firefly programming (DFP) method as an improved version of the standard firefly programming method, and we analyze its performance in detail. To evaluate the new method, its results are compared with those of the standard method and of other methods used to solve the same type of problems. DFP has also been used to forecast and model a real-world time-series problem, where it likewise performed well. Overall, the results demonstrate the improved performance of the newly introduced method and its ability to solve complex problems efficiently.
