Similar Documents
 20 similar documents found (search time: 78 ms)
1.

Quantitative steganalysis seeks to extract additional information about the hidden message in covert communications. Most quantitative steganalyzers in the literature target a specific embedding algorithm and generally extract the payload information using a structural paradigm. Modern steganalyzers use supervised machine learning to estimate the stego payload from sophisticated feature sets. In this paper, an Ensemble Framework based universal quantitative steganalyzer for digital images is proposed which employs optimised Extreme Learning Machines as the base regressors. The framework exploits the inherent diversity of the base regressors, and the use of random subspaces of the image features further reduces the prediction error. The proposed ensemble regressor yields improved payload predictions when evaluated against the individual base regressors and other state-of-the-art algorithms. Experimental results across different embedding algorithms, image datasets and variedly sized feature sets demonstrate the robustness and wide applicability of the proposed framework.
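The random-subspace idea in this abstract can be sketched compactly. In the toy example below, closed-form ridge regressors stand in for the optimised Extreme Learning Machines, and synthetic Gaussian features stand in for real image features and payload sizes; all names and numbers are illustrative, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for image features and embedded payload sizes.
X = rng.normal(size=(200, 30))
y = X @ rng.normal(size=30) + 0.1 * rng.normal(size=200)

def fit_ridge(X, y, lam=1e-2):
    # Closed-form ridge regression, standing in for an optimised ELM.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Random-subspace ensemble: each base regressor sees a random feature subset.
n_base, sub_dim = 25, 15
models = []
for _ in range(n_base):
    idx = rng.choice(X.shape[1], size=sub_dim, replace=False)
    models.append((idx, fit_ridge(X[:, idx], y)))

def predict(models, X):
    # Average the base regressors' payload estimates.
    return np.mean([X[:, idx] @ w for idx, w in models], axis=0)

# Averaging over subspaces reduces error relative to one subspace regressor.
idx0, w0 = models[0]
err_single = np.mean((X[:, idx0] @ w0 - y) ** 2)
err_ensemble = np.mean((predict(models, X) - y) ** 2)
```

Because each base regressor misses different features, their errors partly cancel in the average, which is the diversity the framework exploits.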


2.
A novel technique is proposed for the incremental construction of sparse radial basis function (RBF) networks. The correlation between an RBF regressor and the training data is used as the criterion to position and shape the RBF node, and it is shown that this is equivalent to incrementally minimising the modelling mean square error. A guided random search optimisation method, the repeated weighted boosting search, is adopted to append RBF nodes one by one in an incremental regression modelling procedure. Experimental results demonstrate that the proposed method provides a viable alternative to existing state-of-the-art modelling techniques for constructing parsimonious RBF models that generalise well.
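A minimal sketch of the incremental construction idea: nodes are appended one at a time, each fitted to the current residual, which is equivalent to greedily minimising the modelling MSE. A crude random search stands in for the repeated weighted boosting search; the target function, node count and search budget are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# 1-D target to model with a sparse Gaussian RBF network.
x = np.linspace(-3, 3, 120)
y = np.sin(2 * x) * np.exp(-0.1 * x ** 2)

def rbf(x, c, w):
    # Gaussian basis function with centre c and width w.
    return np.exp(-((x - c) ** 2) / (2 * w ** 2))

# Incremental construction: append one node at a time, each positioned and
# shaped by random search (a stand-in for the repeated weighted boosting
# search), then subtract its contribution from the residual.
residual = y.copy()
nodes = []
for _ in range(6):
    best = None
    for _ in range(400):                      # random candidate (centre, width)
        c = rng.uniform(-3, 3)
        w = rng.uniform(0.2, 2.0)
        phi = rbf(x, c, w)
        a = phi @ residual / (phi @ phi)      # optimal weight for this node
        mse = np.mean((residual - a * phi) ** 2)
        if best is None or mse < best[0]:
            best = (mse, c, w, a)
    _, c, w, a = best
    nodes.append((c, w, a))
    residual -= a * rbf(x, c, w)              # greedy MSE minimisation step

mse_final = np.mean(residual ** 2)
```

Each appended node can only decrease the residual MSE, so the model grows until it is as sparse as the error budget allows.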

3.
This article proposes a hybrid model based on regressor combination to improve the accuracy of air-quality forecasting. The expectation-maximization algorithm was used to impute the missing values of the dataset. The optimal hyperparameter values for the regressors were found by grid search, based on the mean absolute error (MAE) during training. The regressors with the minimum MAE were then globally combined for prediction. The output of the regressor with the minimum absolute error between the actual and predicted values was chosen as the prediction result of the hybrid model. The performance of the proposed model was compared with that of sequential deep learning methods, namely long short-term memory and gated recurrent unit, in terms of MAE, mean relative error (MRE), and squared correlation coefficient (SCC). The imputed dataset was divided into training and testing subsets of different durations. According to the experimental results, the hybrid model performed better than the deep learning methods on all three metrics, irrespective of the training data length. Furthermore, the Akaike information criterion and Bayesian information criterion values suggested that the quality of the hybrid model was better than that of the deep learning models.
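The grid-search-by-MAE and minimum-absolute-error combination steps can be illustrated on a toy series. Mean imputation stands in for the expectation-maximization step, and two simple forecasters (a seasonal mean and persistence) stand in for the paper's regressors; note that the per-point minimum-error rule, as described in the abstract, needs the actual values and is therefore an oracle-style combination.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy hourly pollutant series: a daily cycle plus noise, with gaps.
t = np.arange(480)
y = 50 + 10 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 2, size=t.size)
y[rng.choice(t.size, 20, replace=False)] = np.nan

# Mean imputation stands in for the expectation-maximization step.
y = np.where(np.isnan(y), np.nanmean(y), y)
train, test = y[:360], y[360:]

def mae(pred, actual):
    return np.mean(np.abs(pred - actual))

def seasonal_forecast(history, n, period):
    # Predict each future point by the mean of past points in the same phase.
    start = len(history)
    return np.array([history[(start + i) % period::period].mean()
                     for i in range(n)])

# Grid search of the period hyperparameter by MAE on a held-out slice.
fit, val = train[:300], train[300:]
best_period = min((12, 24, 48),
                  key=lambda p: mae(seasonal_forecast(fit, len(val), p), val))

pred_seasonal = seasonal_forecast(train, len(test), best_period)
pred_persist = np.full(len(test), train[-1])   # persistence baseline

# Per-point rule from the abstract: keep the candidate whose prediction has
# the smaller absolute error against the actual value (oracle-style).
pick = np.abs(pred_seasonal - test) <= np.abs(pred_persist - test)
pred_hybrid = np.where(pick, pred_seasonal, pred_persist)
```

By construction the hybrid's MAE cannot exceed that of either candidate, which is why such a combination is used as an upper bound on achievable accuracy.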

4.
This paper studies the greedy ensemble selection family of algorithms for ensembles of regression models. These algorithms search for the globally best subset of regressors by making locally greedy decisions to modify the current subset. We abstract the key points of greedy ensemble selection algorithms and present a general framework, which is applied to a domain of important social and commercial value: water quality prediction.
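A minimal sketch of greedy forward ensemble selection, assuming we already hold each base regressor's predictions on a validation set; the pool of noisy regressors below is simulated, and stopping when no addition improves the error is one of several possible greedy policies.

```python
import numpy as np

rng = np.random.default_rng(3)

# Validation targets and a simulated pool of 12 noisy base regressors.
y = rng.normal(size=100)
preds = y + rng.normal(0, 1.0, size=(12, 100))

def mse(p):
    return np.mean((p - y) ** 2)

# Greedy forward selection: repeatedly add the regressor whose inclusion
# most reduces the averaged subensemble's validation error.
selected = []
current = np.zeros(100)
while True:
    best = None
    for i in range(len(preds)):
        if i in selected:
            continue
        trial = (current * len(selected) + preds[i]) / (len(selected) + 1)
        err = mse(trial)
        if best is None or err < best[0]:
            best = (err, i, trial)
    if best is None or (selected and best[0] >= mse(current)):
        break                     # pool exhausted, or no candidate improves
    _, i, current = best
    selected.append(i)
```

Each step is a local decision, yet the resulting subset is typically far better than any single regressor, which is the appeal of this family of algorithms.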

5.
An Improved K-Means-Boosting-BP Model for Imbalanced Police Incident Data   Cited: 1 (self-citations: 0, others: 1)
Objective: Understanding the spatio-temporal distribution of police incidents, building spatio-temporal prediction models with machine learning, and designing science-based policing and prevention plans that effectively suppress crime are key topics in the geography of crime. Prior work shows that police incidents concentrate in central urban districts or densely populated residential areas, so the data are spatio-temporally imbalanced; models trained on such data tend to become weak learners with low prediction accuracy. To solve this imbalanced regression problem, a Boosting algorithm based on K-Means clustering is proposed. Method: The algorithm builds on the Boosting ensemble learning framework, uses a GA-BP neural network to generate base learners, and aggregates the base learners with K-Means clustering, thereby promoting the weak learners to a strong learner. Results: Compared with SMOTEBoosting (Synthetic Minority Oversampling Technique Boosting), a common remedy for imbalanced regression, the proposed algorithm has two advantages: 1) it lowers both the minority-class mean squared error and the overall mean squared error, achieving an overall MSE of 9.85E-05 against SMOTEBoosting's 2.14E-04; 2) it better balances the precision and recall of minority-class recognition: KMeans-Boosting attains a recall of about 52% against SMOTEBoosting's roughly 91%, but its precision of 85% far exceeds SMOTEBoosting's 19%. Conclusion: KMeans-Boosting markedly reduces the overall mean squared error on imbalanced data and better balances the precision and recall of minority-class recognition; it is an effective algorithm for imbalanced regression and classification problems and can be extended to other domains that must handle imbalanced data.

6.
Investigating the accuracy of methods for forecasting agricultural commodity prices is an important area of study, and effective models are needed. Regression ensembles can serve this purpose. An ensemble is a set of combined models that act together to forecast a response variable with lower error. The general contribution of this work is to explore the predictive capability of regression ensembles in the agribusiness area by comparing ensembles among themselves, as well as with approaches based on a single model (reference models), for forecasting prices one month ahead. Monthly time series of the price paid to producers in the state of Paraná, Brazil, for a 60 kg bag of soybean (case study 1) and wheat (case study 2) are used. The ensembles bagging (random forests, RF), boosting (gradient boosting machine, GBM, and extreme gradient boosting machine, XGB) and stacking (STACK) are adopted. Support vector regression (SVR), a multilayer perceptron neural network (MLP) and k-nearest neighbors (KNN) are adopted as reference models. Performance measures such as mean absolute percentage error (MAPE), root mean squared error (RMSE), mean absolute error (MAE) and mean squared error (MSE) are used for model comparison. Friedman and Wilcoxon signed-rank tests are applied to the models' absolute percentage errors (APE). On the test sets, MAPE below 1% is observed for the best ensemble approaches. The XGB/STACK (Least Absolute Shrinkage and Selection Operator-KNN-XGB-SVR) and RF models show the best short-term forecasting performance for case studies 1 and 2, respectively, with statistically smaller APE than the reference models. Approaches based on boosting are consistent, providing good results in both case studies. Ranked by performance, the models are: XGB, GBM, RF, STACK, MLP, SVR and KNN. It can be concluded that the ensemble approach brings statistically significant gains, reducing prediction errors for the price series studied. The use of ensembles is recommended for forecasting agricultural commodity prices one month ahead, since their more assertive performance increases model accuracy and reduces decision-making risk.
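The stacking (STACK) idea can be sketched with two simple level-0 regressors and a linear level-1 combiner. This is not the paper's pipeline: the price series is synthetic, the base models are a linear autoregression and a naive kNN, and for brevity the meta-model is fitted on in-sample rather than out-of-fold base predictions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic monthly price series: trend + yearly cycle + noise.
t = np.arange(160)
price = 60 + 0.1 * t + 5 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, t.size)

# Twelve lagged prices as features for one-month-ahead forecasting.
L = 12
X = np.stack([price[i - L:i] for i in range(L, len(price))])
y = price[L:]
Xtr, Xte, ytr, yte = X[:120], X[120:], y[:120], y[120:]

def linear_fit(X, y):
    # Least-squares linear regressor with intercept; returns a predictor.
    A = np.c_[X, np.ones(len(X))]
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.c_[Z, np.ones(len(Z))] @ w

def knn_predict(Xq, Xref, yref, k=5):
    # Naive k-nearest-neighbours regressor on the lag vectors.
    d = ((Xq[:, None, :] - Xref[None, :, :]) ** 2).sum(-1)
    return yref[np.argsort(d, axis=1)[:, :k]].mean(axis=1)

# Level-0 regressors and their predictions on train and test sets.
lin = linear_fit(Xtr, ytr)
P_tr = np.c_[lin(Xtr), knn_predict(Xtr, Xtr, ytr)]
P_te = np.c_[lin(Xte), knn_predict(Xte, Xtr, ytr)]

# Level-1 stacking combiner: a linear regression over base predictions.
meta = linear_fit(P_tr, ytr)
stack_pred = meta(P_te)

def mape(pred, actual):
    return 100 * np.mean(np.abs((actual - pred) / actual))
```

The meta-model learns how to weight the base forecasts, which is the mechanism behind the low MAPE values the abstract reports for STACK.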

7.
Boosting Methods for Regression   Cited: 6 (self-citations: 0, others: 6)
Duffy, Nigel; Helmbold, David. Machine Learning, 2002, 47(2-3): 153-200
In this paper we examine ensemble methods for regression that leverage, or boost, base regressors by iteratively calling them on modified samples. The most successful leveraging algorithm for classification is AdaBoost, an algorithm that requires only modest assumptions on the base learning method for its strong theoretical guarantees. We present several gradient descent leveraging algorithms for regression and prove AdaBoost-style bounds on their sample errors using intuitive assumptions on the base learners. We bound the complexity of the regression functions produced in order to derive PAC-style bounds on their generalization errors. Experiments validate our theoretical results.
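For squared loss, gradient-descent leveraging reduces to fitting each base regressor to the current residuals and adding a shrunken copy to the ensemble. A sketch with regression stumps as the base learner follows; the learning rate, round count and data are arbitrary illustrations, not the paper's algorithms.

```python
import numpy as np

rng = np.random.default_rng(5)

x = rng.uniform(-3, 3, size=200)
y = np.sin(x) + 0.1 * rng.normal(size=200)

def fit_stump(x, r):
    # Weak base regressor: best single-threshold piecewise-constant fit to r.
    best = None
    for s in np.linspace(-3, 3, 41):
        mask = x <= s
        if mask.all() or not mask.any():
            continue
        pred = np.where(mask, r[mask].mean(), r[~mask].mean())
        sse = np.sum((r - pred) ** 2)
        if best is None or sse < best[0]:
            best = (sse, s, r[mask].mean(), r[~mask].mean())
    _, s, a, b = best
    return lambda q: np.where(q <= s, a, b)

# Leveraging loop: each round calls the base learner on the residuals
# (the negative gradient of squared loss) and adds a shrunken copy.
nu, F = 0.5, np.zeros_like(y)
ensemble = []
for _ in range(50):
    h = fit_stump(x, y - F)
    ensemble.append(h)
    F += nu * h(x)

mse_boosted = np.mean((F - y) ** 2)
mse_single = np.mean((fit_stump(x, y)(x) - y) ** 2)
```

Even a very weak base learner, iterated this way, drives the sample error far below that of a single call, which is the behaviour the paper's bounds formalise.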

8.
To address the limited volume of equipment trial data and the frequent missingness of equipment measurement data, a regression imputation method based on ensemble learning is proposed. With random forest and XGBoost as regressors, the method sets a fast-fill baseline and a feature-importance evaluation strategy, improves the reconstruction of data subsets and the iterative partitioning of training and test sets, and uses the Optuna framework to tune the regressors' hyperparameters automatically. The method is validated on launch-trial data from a missile system. The results show that ensemble-learning regression imputation clearly outperforms traditional statistical imputation as well as KNN and BP neural networks: the regression coefficient of determination stays above 0.95 across different missingness ratios, effectively addressing missing data in small-sample equipment trials. The method's generality and wider applicability are further verified on public KEEL benchmark datasets.

9.
To achieve fine segmentation of complex natural images, people often resort to an interactive segmentation paradigm, since fully automatic methods often fail to obtain a result consistent with the ground truth. However, when the foreground and background share similar colors in some areas, fine segmentation with conventional interactive methods usually relies on additional manual labels. This paper presents a novel interactive image segmentation method via a regression-based ensemble model with semi-supervised learning. The task is formulated as a non-linear problem that integrates two complementary spline regressors and strengthens the robustness of each regressor via semi-supervised learning. First, two spline regressors of a complementary nature are constructed, based on multivariate adaptive regression splines (MARS) and smooth thin plate spline regression (TPSR). Then, a regressor boosting method based on a clustering hypothesis and semi-supervised learning is proposed to assist the training of MARS and TPSR using the region segmentation information contained in unlabeled pixels. Next, a support vector regression (SVR) based decision fusion model integrates the results of MARS and TPSR. Finally, GraphCut is introduced and combined with the SVR ensemble results to achieve image segmentation. Extensive experiments on the BSDS500 and Pascal VOC benchmark datasets demonstrate the effectiveness of our method and validate that it is comparable with state-of-the-art methods for interactive natural image segmentation.

10.
Price forecasting matters for stabilising markets for bulk agricultural commodities, but prices correlate with many factors in complex ways. To address the strong dependence of current forecasting methods on data completeness and the difficulty a single model has in fully exploiting heterogeneous features, a boosted ensemble learning method is proposed that combines an attention-based convolutional bidirectional long short-term memory network (CNN-BiLSTM-Attention), support vector regression (SVR) and LightGBM. Experiments were run on datasets containing historical trading, weather, exchange-rate, oil-price and other feature data, with wheat and cotton price prediction as the target tasks. Mutual information was used for feature selection, the lower-error CNN-BiLSTM-Attention model was chosen as the base model, and it was combined with the machine learning models through linear regression for boosted ensemble learning. The results show root mean square errors (RMSE) of 12.812 and 74.365 on the wheat and cotton datasets respectively; compared with the three base models, these are reductions of 11.00% and 0.94%, 4.44% and 1.99%, and 13.03% and 4.39%, effectively lowering price prediction error.

11.
Learning regressors from low-resolution patches to high-resolution patches has shown promising results for image super-resolution. We observe that some regressors are better at dealing with certain cases, and others with different cases. In this paper, we jointly learn a collection of regressors which collectively yield the smallest super-resolving error for all training data. After training, each training sample is associated with a label indicating its 'best' regressor, the one yielding the smallest error. During testing, our method relies on the concept of 'adaptive selection' to choose the most appropriate regressor for each input patch. We assume that similar patches can be super-resolved by the same regressor, and use a fast, approximate kNN approach to transfer the labels of training patches to test patches. The method is conceptually simple and computationally efficient, yet very effective. Experiments on four datasets show that our method outperforms competing methods.

12.
This paper presents the cluster-based ensemble classifier, an approach to generating an ensemble of classifiers using multiple clusters within classified data. Clustering partitions the data set into multiple clusters of highly correlated data that are difficult to separate otherwise, and different base classifiers learn the class boundaries within the clusters. As the base classifiers engage different difficult-to-classify subsets of the data, their learning is more focussed and accurate. A selection rather than fusion approach produces the final verdict on patterns of unknown class. The impact of clustering on the learning parameters and accuracy of several learning algorithms, including neural networks, support vector machines, decision trees and the k-NN classifier, is investigated. Benchmark data sets from the UCI machine learning repository were used to evaluate the cluster-based ensemble classifier, and the experimental results demonstrate its superiority over bagging and boosting.

13.
Identifying the optimal subset of regressors in a regression bagging ensemble is a difficult task that has exponential cost in the size of the ensemble. In this article we analyze two approximate techniques especially devised to address this problem. The first strategy constructs a relaxed version of the problem that can be solved using semidefinite programming. The second one is based on modifying the order of aggregation of the regressors. Ordered aggregation is a simple forward selection algorithm that incorporates at each step the regressor that reduces the training error of the current subensemble the most. Both techniques can be used to identify subensembles that are close to the optimal ones, which can be obtained by exhaustive search at a larger computational cost. Experiments in a wide variety of synthetic and real-world regression problems show that pruned ensembles composed of only 20% of the initial regressors often have better generalization performance than the original bagging ensembles. These improvements are due to a reduction in the bias and the covariance components of the generalization error. Subensembles obtained using either SDP or ordered aggregation generally outperform subensembles obtained by other ensemble pruning methods and ensembles generated by the Adaboost.R2 algorithm, negative correlation learning or regularized linear stacked generalization. Ordered aggregation has a slightly better overall performance than SDP in the problems investigated. However, the difference is not statistically significant. Ordered aggregation has the further advantage that it produces a nested sequence of near-optimal subensembles of increasing size with no additional computational cost.
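Ordered aggregation is simple enough to sketch directly. The simulated "bagging ensemble" below mixes decent and poor regressors so that pruning visibly helps; for simplicity, selection is driven by training error rather than a separate validation set, exactly as the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(6)

y = rng.normal(size=150)
# Simulated bagging ensemble: 10 decent regressors and 10 poor (biased,
# noisy) ones, each giving predictions for the same 150 targets.
good = y + rng.normal(0, 0.5, size=(10, 150))
poor = y + 2.0 + rng.normal(0, 2.0, size=(10, 150))
preds = np.vstack([good, poor])

def mse(p):
    return np.mean((p - y) ** 2)

# Ordered aggregation: reorder the ensemble so that each position appends
# the regressor that most reduces the running average's error. Every
# prefix of the order is then a nested candidate subensemble, at no
# additional cost.
order, remaining = [], set(range(len(preds)))
running = np.zeros_like(y)
while remaining:
    i = min(remaining,
            key=lambda j: mse((running * len(order) + preds[j]) / (len(order) + 1)))
    running = (running * len(order) + preds[i]) / (len(order) + 1)
    order.append(i)
    remaining.remove(i)

# A pruned subensemble of 20% of the regressors vs. the full bagging average.
k = len(preds) // 5
pruned = preds[order[:k]].mean(axis=0)
full = preds.mean(axis=0)
```

The poor regressors sink to the end of the order, so the small prefix subensemble avoids their bias, mirroring the 20% result reported in the abstract.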

14.
This paper explores genetic programming and boosting to obtain an ensemble of regressors, and proposes a new formula for updating the weights as well as for the final hypothesis. Unlike previous studies in the literature, we investigate the use of the correlation metric as an additional factor alongside the error metric. This new approach, called Boosting using Correlation Coefficients (BCC), was arrived at empirically while trying to improve on the results of other methods. To validate the method, we conducted two groups of experiments. In the first, we apply BCC to time series forecasting, on academic series and in a wide Monte Carlo simulation covering the entire ARMA spectrum. Genetic programming (GP) is used as the base learner, and the mean squared error (MSE) is used to compare the accuracy of the proposed method against results obtained by GP, GP with traditional boosting, and the traditional statistical methodology (ARMA). The second group of experiments evaluates the proposed method on multivariate regression problems, with CART (Classification and Regression Tree) as the base learner.

15.
The traditional setting of supervised learning requires a large number of labeled training examples to achieve good generalization. In many practical applications, however, unlabeled training examples are readily available while labeled ones are fairly expensive to obtain; semisupervised learning has therefore attracted much attention. Previous research on semisupervised learning mainly focuses on semisupervised classification. Although regression is almost as important as classification, semisupervised regression is largely understudied. In particular, although cotraining is a main paradigm in semisupervised learning, few works have been devoted to cotraining-style semisupervised regression algorithms. In this paper, a cotraining-style semisupervised regression algorithm, COREG, is proposed. The algorithm uses two regressors, each of which labels the unlabeled data for the other; the confidence in labeling an unlabeled example is estimated through the reduction in mean squared error over the labeled neighborhood of that example. Analysis and experiments show that COREG can effectively exploit unlabeled data to improve regression estimates.
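A compressed sketch of the COREG idea on a 1-D toy problem: two kNN regressors with different k pseudo-label points for each other, scoring confidence by the MSE reduction over the labeled neighborhood. The pool sizes, k values and iteration counts are invented, and several refinements of the actual algorithm are omitted.

```python
import numpy as np

rng = np.random.default_rng(7)

def knn(Xq, Xref, yref, k):
    # 1-D k-nearest-neighbour regressor.
    d = np.abs(Xq[:, None] - Xref[None, :])
    return yref[np.argsort(d, axis=1)[:, :k]].mean(axis=1)

f = lambda x: np.sin(3 * x)
X_lab = rng.uniform(-1, 1, 12); y_lab = f(X_lab)   # few labeled points
pool = list(rng.uniform(-1, 1, 200))               # many unlabeled points
X_test = np.linspace(-1, 1, 100); y_test = f(X_test)

# Two regressors (different k) teach each other, COREG-style.
sets = [(X_lab.copy(), y_lab.copy()), (X_lab.copy(), y_lab.copy())]
ks = (3, 5)
for _ in range(30):
    for a, b in ((0, 1), (1, 0)):
        Xa, ya = sets[a]
        best = None
        for xi in pool[:40]:                       # small candidate pool
            yhat = knn(np.array([xi]), Xa, ya, ks[a])[0]
            nb = np.argsort(np.abs(Xa - xi))[:ks[a]]   # labeled neighborhood
            before = np.mean((ya[nb] - knn(Xa[nb], Xa, ya, ks[a])) ** 2)
            X2, y2 = np.append(Xa, xi), np.append(ya, yhat)
            after = np.mean((ya[nb] - knn(Xa[nb], X2, y2, ks[a])) ** 2)
            if best is None or before - after > best[0]:
                best = (before - after, xi, yhat)
        gain, xi, yhat = best
        if gain > 0:                               # confident pseudo-label
            Xb, yb = sets[b]
            sets[b] = (np.append(Xb, xi), np.append(yb, yhat))
            pool.remove(xi)

# Final estimate: average of the two mutually-taught regressors.
pred_semi = 0.5 * (knn(X_test, *sets[0], ks[0]) + knn(X_test, *sets[1], ks[1]))
mse_semi = np.mean((pred_semi - y_test) ** 2)
```

The MSE-reduction score plays the role that labeling confidence plays in classification-style cotraining, which is the key adaptation COREG makes for regression.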

16.
Boosting algorithms are a class of general methods for improving the performance of regression analysis. The main idea is to maintain a distribution over the training set. To make direct use of this distribution, a modified PLS algorithm is proposed and used as the base learner for nonlinear multivariate regression problems. Experiments on gasoline octane number prediction demonstrate that boosting the modified PLS algorithm gives better overall performance than the PLS algorithm alone.

18.
A fundamental principle in practical nonlinear data modeling is the parsimonious principle of constructing the minimal model that explains the training data well. Leave-one-out (LOO) cross validation is often used to estimate generalization error when choosing among different network architectures (M. Stone, "Cross validatory choice and assessment of statistical predictions", J. R. Statist. Soc., Ser. B, 36, pp. 117-147, 1974). Based on minimizing LOO criteria (the mean square of the LOO errors for regression, or the LOO misclassification rate for classification), we present two backward elimination algorithms as model post-processing procedures for regression and classification problems. The proposed backward elimination procedures exploit an orthogonalization step that enforces orthogonality between the subspace spanned by the pruned model and the deleted regressor. It is then shown that the LOO criteria used in both algorithms can be calculated via an analytic recursive formula, derived in this contribution, without actually splitting the estimation data set, reducing computational expense. Compared with most other model construction methods, the proposed algorithms have several advantages: (i) there are no tuning parameters to be optimized through an extra validation data set; (ii) the procedure is fully automatic, without an additional stopping criterion; and (iii) model structure selection is based directly on model generalization performance. Illustrative examples on regression and classification demonstrate that the proposed algorithms are viable post-processing methods for pruning a model to gain extra sparsity and improved generalization.
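The flavour of such analytic LOO formulas can be shown for ordinary least squares, where the LOO residual at point i equals the ordinary residual divided by (1 - H_ii), with H the hat matrix, so no refitting or data splitting is needed. This classical identity illustrates the idea; it is not the paper's specific recursion.

```python
import numpy as np

rng = np.random.default_rng(8)

# A small least-squares problem.
X = rng.normal(size=(60, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.3 * rng.normal(size=60)
A = np.c_[X, np.ones(60)]                       # design matrix with intercept

# Analytic LOO residuals: e_i / (1 - H_ii), with H = A (A^T A)^{-1} A^T.
H = A @ np.linalg.solve(A.T @ A, A.T)
w, *_ = np.linalg.lstsq(A, y, rcond=None)
e = y - A @ w
loo_analytic = e / (1 - np.diag(H))

# Brute-force check: refit with each point actually deleted.
loo_direct = np.empty(60)
for i in range(60):
    mask = np.arange(60) != i
    wi, *_ = np.linalg.lstsq(A[mask], y[mask], rcond=None)
    loo_direct[i] = y[i] - A[i] @ wi
```

The two residual vectors agree to numerical precision, which is why LOO criteria of this kind can be evaluated at essentially the cost of a single fit.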

19.
In this work, a consensual approach is developed for modeling RF/microwave devices. In the proposed method, multiple individual models generated by an expert-system ensemble are combined by a consensus rule, yielding consistent and improved generalization output with the highest possible reliability and accuracy. Here, the expert-system ensemble is constructed from competitive and diverse regressors, in our case a back-propagation artificial neural network (ANN), a support vector (SV) regression machine, k-nearest neighbor, and least squares algorithms, each of which generalizes independently of the others. In the case of excessive data, the ensemble of regressors can be trained on a subset consisting of the SVs to reduce the amount of data. The main feature of consensual modeling is that, owing to the diversity in each member's generalization process, the resulting consensus model identifies and encodes more aspects of the nonlinear relationship between the independent and dependent variables than a single model would. Thus, an enhanced single model is built by combining the most successful aspects of the competitive, diverse contributors. Finally, consensual modeling is demonstrated on two typical devices: a passive device (synthesis of a conductor-backed coplanar waveguide with upper shielding) and an active device (noise modeling of a microwave transistor). © 2010 Wiley Periodicals, Inc. Int J RF and Microwave CAE, 2010.

20.
Forecasting time series with genetic fuzzy predictor ensemble   Cited: 2 (self-citations: 0, others: 2)
This paper proposes a genetic fuzzy predictor ensemble (GFPE) for accurate prediction of chaotic or nonstationary time series. Each fuzzy predictor in the GFPE is built in two design stages, each performed by a different genetic algorithm (GA). The first stage generates a fuzzy rule base that covers as many training examples as possible. The second stage builds fine-tuned membership functions that make the prediction error as small as possible. These two design stages are repeated independently over different partition combinations of the input-output variables. The prediction error is reduced further by the GFPE, which combines multiple fuzzy predictors with an equal prediction-error weighting method. Applications to the Mackey-Glass chaotic time series and a nonstationary foreign currency exchange rate prediction problem are presented. The prediction accuracy of the proposed method is compared with that of other fuzzy and neural network predictors in terms of root mean squared error (RMSE).
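One plausible reading of equal prediction-error weighting is to weight each predictor inversely to its measured error, so that every weighted predictor contributes a comparable error share to the combination. The sketch below uses synthetic predictors with known noise levels, not fuzzy predictors; the series and error levels are invented.

```python
import numpy as np

rng = np.random.default_rng(9)

# Three predictors of the same series, with different error levels
# (stand-ins for the individually trained fuzzy predictors).
y = np.sin(np.linspace(0, 8 * np.pi, 500))
sigmas = (0.1, 0.2, 0.4)
preds = np.stack([y + rng.normal(0, s, y.size) for s in sigmas])

# Per-predictor error measured on a validation series.
mse_each = np.mean((preds - y) ** 2, axis=1)

# Error-based combination: weight each predictor inversely to its error,
# so every weighted predictor carries a comparable error share.
w = (1 / mse_each) / np.sum(1 / mse_each)
combined = w @ preds

rmse_combined = np.sqrt(np.mean((combined - y) ** 2))
rmse_best = np.sqrt(mse_each.min())
```

With roughly independent predictor errors, the weighted combination beats even the best individual predictor, which is the error reduction the GFPE relies on.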
