Similar documents
20 similar documents found
1.
Recurrent neural networks and robust time series prediction
We propose a robust learning algorithm and apply it to recurrent neural networks. This algorithm is based on filtering outliers from the data and then estimating parameters from the filtered data. The filtering removes outliers from both the target function and the inputs of the neural network. The filtering is soft in that some outliers are neither completely rejected nor accepted. To show the need for robust recurrent networks, we compare the predictive ability of least squares estimated recurrent networks on synthetic data and on the Puget Power Electric Demand time series. These investigations result in a class of recurrent neural networks, NARMA(p,q), which show advantages over feedforward neural networks for time series with a moving average component. Conventional least squares methods of fitting NARMA(p,q) neural network models are shown to suffer a lack of robustness towards outliers. This sensitivity to outliers is demonstrated on both the synthetic and real data sets. Filtering the Puget Power Electric Demand time series is shown to automatically remove the outliers due to holidays. Neural networks trained on filtered data are then shown to give better predictions than neural networks trained on unfiltered time series.
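The "soft" filtering idea, downweighting suspect points rather than hard-rejecting them, can be sketched with a Huber-style weight on MAD-standardized residuals. This is a minimal illustration of the principle, not the paper's exact filter; the cutoff c = 1.345 is a conventional default assumed here.

```python
import numpy as np

def soft_filter_weights(residuals, c=1.345):
    """Huber-style soft weights: 1 inside the cutoff, decaying outside.

    Points with small residuals are fully accepted (weight 1); large
    residuals are neither fully rejected nor accepted (0 < weight < 1).
    """
    # Robust scale via the median absolute deviation (MAD)
    scale = 1.4826 * np.median(np.abs(residuals - np.median(residuals)))
    z = np.abs(residuals) / max(scale, 1e-12)
    return np.where(z <= c, 1.0, c / z)

residuals = np.array([0.1, -0.2, 0.05, 8.0, -0.1])  # one gross outlier
w = soft_filter_weights(residuals)
```

The gross outlier receives a small but nonzero weight, while ordinary points keep full weight.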

2.
L1-norm locally linear embedding
Dimensionality reduction arises in many information-processing fields, including machine learning, pattern recognition, and data mining. Locally linear embedding (LLE) is an unsupervised nonlinear manifold learning algorithm for dimensionality reduction that is widely used owing to its good performance. To address the sensitivity of traditional LLE to outliers (or noise), a robust LLE algorithm based on L1-norm minimization (L1-LLE) is proposed. The local reconstruction matrix is obtained by L1-norm minimization, which reduces the energy of the reconstruction matrix and effectively resists interference from outliers (or noise). Built on existing optimization techniques, the L1-LLE algorithm is simple and easy to implement, and its convergence is proved. Tests on both synthetic and real data sets, compared against traditional LLE, show that L1-LLE is stable and effective.
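The robustness gained by replacing the L2 criterion with L1 can be seen in the simplest possible case: fitting a single constant. The L2 minimizer is the mean, which an outlier drags far away; the L1 minimizer is the median, which barely moves. This is a toy illustration of the principle behind L1-LLE, not the algorithm itself.

```python
import numpy as np

data = np.array([1.0, 1.1, 0.9, 1.05, 100.0])  # one gross outlier

l2_fit = data.mean()       # argmin_c sum (x_i - c)^2  -> pulled to ~20.8
l1_fit = np.median(data)   # argmin_c sum |x_i - c|    -> stays at 1.05
```

The same effect carries over to the local reconstruction step: an L1 objective lets a few corrupted neighbors be absorbed without distorting the weights for the rest.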

3.
Robust TSK fuzzy modeling for function approximation with outliers
The Takagi-Sugeno-Kang (TSK) type of fuzzy model has attracted great attention in the fuzzy modeling community due to its good performance in various applications. Most approaches to learning TSK fuzzy rules define their fuzzy subspaces based on the idea of training data being close enough rather than having similar functions. Moreover, training data sets often contain outliers, which seriously affect least-squares-error-minimizing clustering and learning algorithms. A robust TSK fuzzy modeling approach is presented. In this approach, a clustering algorithm termed robust fuzzy regression agglomeration (RFRA) is proposed to define fuzzy subspaces in a fuzzy regression manner with robustness against outliers. To obtain a more precise model, a robust fine-tuning algorithm is then employed. Various examples are used to verify the effectiveness of the proposed approach. The simulation results show that the proposed robust TSK fuzzy modeling indeed achieves superior performance over other approaches.

4.
To achieve robust estimation on noisy data sets, a recursive outlier-elimination-based least squares support vector machine (ROELS-SVM) algorithm is proposed in this paper. The algorithm recursively learns statistical information from the error variables of the least squares support vector machine and applies a criterion derived from robust linear regression to eliminate outliers. In addition, a decremental learning technique is implemented in the recursive training-eliminating stage, which ensures that outliers are eliminated at low computational cost. The proposed algorithm is compared with the re-weighted least squares support vector machine on multiple data sets, and the results demonstrate the remarkably robust performance of ROELS-SVM.
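The recursive training-eliminating loop can be sketched with ordinary linear least squares standing in for the LS-SVM, and a standardized-residual criterion standing in for the paper's robust-regression criterion. This is a simplified sketch, not the ROELS-SVM algorithm itself; the cutoff z_cut = 2.5 is an assumed choice.

```python
import numpy as np

def recursive_outlier_elimination(x, y, z_cut=2.5, max_iter=10):
    """Fit, drop the single worst point when its standardized residual
    exceeds z_cut, and refit -- a linear least-squares stand-in for the
    recursive training-eliminating stage of ROELS-SVM."""
    keep = np.ones(len(x), dtype=bool)
    for _ in range(max_iter):
        coef = np.polyfit(x[keep], y[keep], 1)
        r = y[keep] - np.polyval(coef, x[keep])
        worst = int(np.argmax(np.abs(r)))
        z = np.abs(r[worst]) / max(r.std(), 1e-12)
        if z <= z_cut or np.abs(r[worst]) < 1e-6:
            break                       # no significant outlier remains
        keep[np.flatnonzero(keep)[worst]] = False
    return coef, keep

x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0
y[4] += 25.0                            # inject a single gross outlier
coef, keep = recursive_outlier_elimination(x, y)
```

After the contaminated point is eliminated, the refit recovers the true slope and intercept.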

5.
The annealing robust backpropagation (ARBP) learning algorithm
Multilayer feedforward neural networks are often referred to as universal approximators. Nevertheless, if the training data are corrupted by large noise, such as outliers, traditional backpropagation learning schemes may not always deliver acceptable performance. Although various robust learning algorithms have been proposed in the literature, those approaches still suffer from an initialization problem. These robust learning algorithms employ the so-called M-estimator, whose loss function discriminates outliers from the majority of the data by degrading the effect of those outliers on learning. However, the loss function used in those algorithms may not discriminate correctly against the outliers. In this paper, the annealing robust backpropagation (ARBP) learning algorithm, which incorporates the annealing concept into robust learning, is proposed to deal with modeling in the presence of outliers. The proposed algorithm has been applied to various examples, and the results demonstrate its superiority over other robust learning algorithms regardless of the outliers. Additionally, the annealing schedule k/t, where k is a constant and t is the epoch number, was found experimentally to achieve the best performance among the annealing schedules tried.
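The annealing idea can be sketched as an M-estimator-style weight whose cutoff shrinks over epochs according to the k/t schedule: early training tolerates large residuals (so initialization matters less), while later training suppresses them. A minimal sketch, not the full ARBP algorithm; k = 10 is an assumed constant.

```python
def annealed_weight(residual, epoch, k=10.0):
    """M-estimator-style weight with an annealed cutoff beta = k / t.

    Residuals within the current cutoff keep full weight; beyond it the
    weight decays, so outliers influence learning less as t grows.
    """
    beta = k / epoch                    # the k/t annealing schedule
    z = abs(residual) / beta
    return 1.0 if z <= 1.0 else 1.0 / z

r = 5.0                                 # a fixed large residual
w_early = annealed_weight(r, epoch=1)   # beta = 10  -> full weight
w_late = annealed_weight(r, epoch=50)   # beta = 0.2 -> heavily downweighted
```

The same residual is fully trusted at epoch 1 and almost ignored by epoch 50, which is exactly the behavior the annealing schedule is meant to produce.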

6.
In real-world applications, the obtained data are always subject to noise or outliers. The learning mechanism of the cerebellar model articulation controller (CMAC), a neurological model, imitates the human cerebellum. CMAC has an attractive learning-speed property: a small subset of cells addressed by the input space determines the output instantaneously. The fuzzy cerebellar model articulation controller (FCMAC) incorporates the concept of fuzziness into CMAC to improve accuracy. However, distributing errors into the addressed hypercubes may yield unacceptable learning performance when the input data contain noise or outliers. The robust fuzzy cerebellar model articulation controller (RFCMAC) embeds the robust learning of an M-estimator into FCMAC to degrade noise or outliers. Meanwhile, support vector regression (SVR) is an algorithm grounded in machine learning theory that has been applied successfully to a number of regression problems where noise or outliers exist. Unfortunately, the practical application of SVR is limited by the need for the user to define a set of parameters to obtain admirable performance. In this paper, a robust learning algorithm based on SVR and RFCMAC is proposed. It combines the advantage of SVR, the ability to avoid corruption effects, with the advantages of RFCMAC, attractive learning performance and more accurate approximation. Additionally, particle swarm optimization (PSO) is applied to obtain the best parameter settings for SVR. Simulation results show that the proposed algorithm outperforms other algorithms.

7.
To address two difficulties of traditional subspace modeling techniques, namely high sensitivity to noise or outliers in the training data and the prohibitive cost of batch-mode model learning on large-scale high-dimensional samples, a new robust subspace modeling method is proposed. The method first uses a robust estimator based on the bisquare function, a gradient-descent learning rule, and an M-estimator to jointly learn and estimate the initial parameters of the linear model, automatically detecting, in a hierarchical fashion, sample-level outliers in the initial training set and signal-level outliers within samples; it then updates the parameters through robust incremental learning to obtain a reliable subspace model. Experiments show that this new robust subspace modeling method handles different types of noisy data effectively; when learning illumination subspace models it copes well with marked illumination changes, occlusion, and noise contamination, and it learns quickly.

8.
Data clustering has attracted a lot of research attention in the fields of computational statistics and data mining. In most related studies, the dissimilarity between two clusters is defined as the distance between their centroids or the distance between the two closest (or farthest) data points. However, all of these measures are vulnerable to outliers, and removing the outliers precisely is yet another difficult task. In view of this, we propose a new similarity measure, referred to as cohesion, to measure intercluster distances. Using this new measure of cohesion, we have designed a two-phase clustering algorithm, called cohesion-based self-merging (abbreviated as CSM), which runs in time linear in the size of the input data set. Combining the features of partitional and hierarchical clustering methods, algorithm CSM partitions the input data set into several small subclusters in the first phase and then continuously merges the subclusters based on cohesion in a hierarchical manner in the second phase. The time and space complexities of algorithm CSM are analyzed. As shown by our performance studies, cohesion-based clustering is very robust and possesses excellent tolerance to outliers in various workloads. More importantly, algorithm CSM is shown to cluster data sets of arbitrary shapes very efficiently and to provide better clustering results than prior methods.
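The two-phase pattern (partition into many small subclusters, then merge them hierarchically) can be sketched as follows. A simple grid stands in for CSM's partitional phase, and a centroid-distance criterion stands in for the cohesion measure; both substitutions are assumptions made for brevity.

```python
import numpy as np

def two_phase_cluster(X, grid=1.0, merge_dist=1.5):
    """Phase 1: partition points into small subclusters (a simple grid
    stands in for CSM's partitional phase).  Phase 2: hierarchically
    merge subclusters whose centroids are close (a centroid-distance
    criterion stands in for the cohesion measure)."""
    cells = {}
    for i, p in enumerate(X):
        cells.setdefault(tuple(np.floor(p / grid).astype(int)), []).append(i)
    members = [idx[:] for idx in cells.values()]
    centroids = [X[idx].mean(axis=0) for idx in members]
    merged = True
    while merged:                       # phase 2: agglomerative merging
        merged = False
        for a in range(len(members)):
            for b in range(a + 1, len(members)):
                if np.linalg.norm(centroids[a] - centroids[b]) < merge_dist:
                    members[a] += members[b]
                    centroids[a] = X[members[a]].mean(axis=0)
                    del members[b], centroids[b]
                    merged = True
                    break
            if merged:
                break
    return members

blob = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [0.9, 0.9], [1.2, 0.2]])
X = np.vstack([blob, blob + 10.0])      # two well-separated groups
clusters = two_phase_cluster(X)
```

The grid produces several subclusters per group; the merging phase then recovers the two underlying groups.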

9.
In this study, a hybrid robust support vector machine for regression is proposed to deal with training data sets containing outliers. The proposed approach consists of two stages. The first stage is data preprocessing: a support vector machine for regression is used to filter out outliers in the training data set. Since the outliers are removed from the training data, robust statistics are not needed to reduce the outliers' effects in the later stage. The training data set without the outliers, called the reduced training data set, is then used directly to train a non-robust least squares support vector machine for regression (LS-SVMR) or a non-robust support vector regression network (SVRN) in the second stage. Consequently, the learning mechanism of the proposed approach is much simpler than that of the robust support vector regression networks (RSVRNs) approach or the weighted LS-SVMR approach. Based on the simulation results, the proposed approach with non-robust LS-SVMR outperforms the weighted LS-SVMR approach when outliers exist, and the proposed approach with non-robust SVRNs likewise outperforms the RSVRNs approach.
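The two-stage filter-then-refit pattern can be sketched with a Theil-Sen-style robust preliminary fit standing in for the SVR filter, and ordinary least squares standing in for the non-robust second-stage learner. Both substitutions, and the 3.5-MAD cutoff, are assumptions for this sketch.

```python
import numpy as np

def two_stage_fit(x, y, z_cut=3.5):
    """Stage 1: a robust preliminary fit (median of pairwise slopes,
    standing in for the paper's SVR filter) flags outliers.
    Stage 2: ordinary least squares on the reduced training set."""
    i, j = np.triu_indices(len(x), k=1)
    slope = np.median((y[j] - y[i]) / (x[j] - x[i]))
    intercept = np.median(y - slope * x)
    r = y - (intercept + slope * x)
    mad = 1.4826 * np.median(np.abs(r - np.median(r)))
    keep = np.abs(r) <= z_cut * max(mad, 1e-12)   # the reduced data set
    return np.polyfit(x[keep], y[keep], 1), keep

x = np.arange(20, dtype=float)
y = 0.5 * x + 3.0
y[[3, 12]] += np.array([15.0, -20.0])   # two injected outliers
coef, keep = two_stage_fit(x, y)
```

Because the outliers never reach the second stage, the plain least-squares refit recovers the underlying line.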

10.
One-class learning algorithms are used when training data are available for only one class, called the target class. Data for the other class(es), called outliers, are not available. One-class learning algorithms are used for detecting outliers, or novelty, in the data. The common approach in one-class learning is to use density estimation techniques or to adapt standard classification algorithms to define a decision boundary that encompasses only the target data. In this paper, we introduce the OneClass-DS learning algorithm, which combines rule-based classification with a greedy search algorithm based on the density of features. Its performance is tested on 25 data sets and compared with eight other one-class algorithms; the results show that it performs on par with them.

11.
Grassmann averages of subspaces correspond to the principal components of Gaussian data and solve the scalability problem of PCA, but the algorithm assumes that a sample's contribution depends on its length, which can make it strongly susceptible to outliers. To address this, sample weights are built from the local characteristics of the data in unsupervised learning, or from class labels in supervised learning, yielding a sample-weighted Grassmann average algorithm. Experiments on UCI data sets and the ORL face database show that the new algorithm is robust and improves recognition rates by 1%-2% over existing methods.

12.
Applied Soft Computing, 2007, 7(3): 957-967
In this study, CPBUM neural networks with an annealing robust learning algorithm (ARLA) are proposed to overcome the problems conventional neural networks face when modeling with outliers and noise. In real applications, the obtained training data may contain outliers and noise. Although CPBUM neural networks converge quickly, they have difficulty dealing with outliers and noise, so their robustness must be enhanced. The ARLA overcomes the initialization and cut-off-point problems of traditional robust learning algorithms and can handle models with outliers and noise. In this study, the ARLA is used as the learning algorithm to adjust the weights of the CPBUM neural networks. It turns out that CPBUM neural networks with the ARLA converge quickly and are more robust against outliers and noise than conventional neural networks with a robust mechanism. Simulation results are provided to show the validity and applicability of the proposed neural networks.

13.
In machine learning theory and applications, feature selection is a common way to reduce the dimensionality of high-dimensional data. Most traditional feature selection methods assume complete data sets, and little attention has been paid to the missing data common in practice. Given that incomplete data contain unobserved information and may include outliers, a robust feature selection method based on probabilistic matrix factorization is proposed. A cluster-based probabilistic matrix factorization model approximates the missing values, effectively measuring the similarity of data between neighboring clusters, reducing the problem size, and lowering the imputation error. Since the missing values may contain a small number of outliers, feature selection is performed with an l2,1 loss function; the full feature selection procedure for incomplete data sets is given, and its convergence is analyzed theoretically. The method exploits all the information in an incomplete data set and effectively counters the influence of its outliers. Experimental results show that, compared with traditional feature selection methods, it selects fewer irrelevant features on synthetic data sets, mitigates the influence of outliers, and achieves higher classification accuracy with more accurately selected features on real data sets.

14.
Usually, regression data are contaminated with unusual observations (outliers). For that reason, robust regression estimators have been developed over the last 30 years, among the most famous being Least Trimmed Squares (LTS), MM, and Penalized Trimmed Squares (PTS). Most of these methods, especially PTS, rely on an initial leverage estimate, concerning x-outlying observations, of the data sample. Often, however, multiple x-outliers pull the distance measure towards their own values, causing leverage bias; this is the masking problem. In this work we develop a new algorithm for robust leverage estimation based on Least Trimmed Euclidean Deviations (LTED). Extensive Monte Carlo simulations, with varying types of outliers and degrees of contamination, indicate that the LTED procedure successfully identifies multiple outliers, and that the resulting robust leverage significantly improves PTS performance.

15.
The problem of selecting variables or features in a regression model in the presence of both additive (vertical) and leverage outliers is addressed. Since variable selection and the detection of anomalous data are not separable problems, the focus is on methods that select variables and outliers simultaneously. For selection, the fast forward selection algorithm, least angle regression (LARS), is used, but it is not robust. To achieve robustness to additive outliers, a dummy variable identity matrix is appended to the design matrix allowing both real variables and additive outliers to be in the selection set. For leverage outliers, these selection methods are used on samples of elemental sets in a manner similar to that used in high breakdown robust estimation. These results are compared to several other selection methods of varying computational complexity and robustness. The extension of these methods to situations where the number of variables exceeds the number of observations is discussed.
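The dummy-variable device for additive outliers can be sketched with a greedy forward-selection step standing in for LARS: append one identity column per observation, then select the dummy most correlated with the residual, which picks out the contaminated observation. This is an illustrative sketch under that simplification, not the paper's LARS-based procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
x = rng.normal(size=n)
y = 1.5 * x + rng.normal(scale=0.1, size=n)
y[7] += 10.0                             # additive (vertical) outlier

# Augmented design: the real variable plus one dummy column per observation,
# so an outlier can be absorbed by "its" dummy entering the selection set.
X_aug = np.column_stack([x, np.eye(n)])

# Greedy stand-in for LARS: fit the real variable first, then pick the
# dummy column most correlated with the residual.
coef_x = np.linalg.lstsq(x[:, None], y, rcond=None)[0]
residual = y - x * coef_x[0]
best_dummy = int(np.argmax(np.abs(X_aug[:, 1:].T @ residual)))

# Refit with the selected dummy: its coefficient estimates the outlier shift.
coef_aug = np.linalg.lstsq(np.column_stack([x, np.eye(n)[:, best_dummy]]),
                           y, rcond=None)[0]
```

The selected dummy corresponds to the contaminated observation, and its fitted coefficient approximates the injected shift of 10, so the real variable's coefficient is estimated from effectively clean data.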

16.
This article presents a new rule discovery algorithm named PLCG that can find simple, robust partial rule models (sets of classification rules) in complex data where it is difficult or impossible to find models that completely account for all the phenomena of interest. Technically speaking, PLCG is an ensemble learning method that learns multiple models via some standard rule learning algorithm, and then combines these into one final rule set via clustering, generalization, and heuristic rule selection. The algorithm was developed in the context of an interdisciplinary research project that aims at discovering fundamental principles of expressive music performance from large amounts of complex real-world data (specifically, measurements of actual performances by concert pianists). It will be shown that PLCG succeeds in finding some surprisingly simple and robust performance principles, some of which represent truly novel and musically meaningful discoveries. A set of more systematic experiments shows that PLCG usually discovers significantly simpler theories than more direct approaches to rule learning (including the state-of-the-art learning algorithm Ripper), while striking a compromise between coverage and precision. The experiments also show how easy it is to use PLCG as a meta-learning strategy to explore different parts of the space of rule models.

17.
Extreme learning machines (ELMs), with their fast training and strong generalization, have become a mainstream research direction in pattern recognition. However, the algorithm is unstable and easily disturbed by noise in the data set, so its classification performance in practice is often unremarkable. To improve classification accuracy in the presence of noise and outliers, a robust extreme learning machine based on angle optimization is proposed. Using the principle of angle optimization of a robust activation function, the method first reduces the influence of outliers on the classifier, preserving the global structure of the data and achieving a better denoising effect; it also avoids inaccurate solutions of the hidden-layer output matrix, further strengthening the ELM's generalization. Experiments on common image databases show that the proposed algorithm is more robust and achieves higher recognition rates than competing algorithms.
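The basic ELM mechanics the abstract builds on, random hidden weights plus closed-form output weights via a pseudo-inverse of the hidden-layer output matrix, can be sketched in a few lines. This is a plain ELM, without the paper's angle-optimized robust activation; the hidden-layer size and tanh activation are assumed choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def elm_fit(X, y, hidden=50):
    """Random hidden layer + least-squares output weights."""
    W = rng.normal(size=(X.shape[1], hidden))   # hidden weights: random, fixed
    b = rng.normal(size=hidden)
    H = np.tanh(X @ W + b)                      # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y                # closed-form output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

X = np.linspace(-1, 1, 100)[:, None]
y = np.sin(3 * X[:, 0])
W, b, beta = elm_fit(X, y)
err = np.max(np.abs(elm_predict(X, W, b, beta) - y))
```

Since only `beta` is learned, and in closed form, training is essentially one pseudo-inverse; this speed is exactly why the hidden-layer output matrix's sensitivity to noise matters, and what the abstract's robust variant targets.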

18.
A robust global manifold learning method
Wang Jing, Computer Engineering, 2008, 34(9): 192-194
Nonlinear dimensionality reduction is widely used in data mining, machine learning, image analysis, and computer vision. The isometric mapping algorithm (Isomap) is a global manifold learning method that can effectively learn the low-dimensional embedding of an isometric manifold, but it is not robust to outlier samples in the data. To address this, the paper proposes an outlier detection method and, building on the basic idea of Isomap, presents a robust global manifold learning method that improves Isomap's ability to handle outlier samples. Numerical experiments demonstrate the effectiveness of the method.

19.
The classification of imbalanced data is a major challenge for machine learning. In this paper, we present a fuzzy total margin based support vector machine (FTM-SVM) to handle the class imbalance learning (CIL) problem in the presence of outliers and noise. The proposed method incorporates a total margin algorithm, different cost functions, and a proper fuzzification of the penalty into FTM-SVM and formulates them for the nonlinear case. We consider a suitable family of fuzzy membership functions for assigning membership values, yielding six FTM-SVM settings. We evaluate the proposed FTM-SVM method on two artificial data sets and 16 real-world imbalanced data sets. Experimental results show that the proposed method achieves higher G_Mean and F_Measure values than several existing CIL methods. Based on the overall results, we conclude that FTM-SVM is effective for the CIL problem, especially in the presence of outliers and noise.
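The fuzzification of the penalty rests on assigning each training point a membership in [0, 1] that scales its misclassification cost, so likely outliers or noise contribute less to the margin objective. One common distance-to-class-centre membership function is sketched below; it is an assumed illustrative choice, not necessarily one of the six settings evaluated in the paper.

```python
import numpy as np

def fuzzy_memberships(X, delta=1e-3):
    """Distance-to-class-centre memberships: points far from the centre
    (likely outliers or noise) receive small membership, shrinking their
    penalty term in a fuzzy SVM objective."""
    center = X.mean(axis=0)
    d = np.linalg.norm(X - center, axis=1)
    return 1.0 - d / (d.max() + delta)   # delta keeps memberships > 0

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])  # last is an outlier
s = fuzzy_memberships(X)
```

In the SVM objective, each slack variable xi_i is then weighted by s_i, so the outlier's violation is cheap and the decision boundary is not dragged toward it.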

20.
Support vector regression (SVR) is now a well-established method for estimating real-valued functions. However, standard SVR does not deal effectively with the severe outlier contamination of both response and predictor variables commonly encountered in real applications. In this paper, we present a bounded influence SVR, which downweights the influence of outliers in all the regression variables. The proposed approach adopts an adaptive weighting strategy based on both a robust adaptive scale estimator for large regression residuals and the statistic of a "kernelized" hat matrix for leverage-point removal. Thus, our algorithm can accurately extract the dominant subset of corrupted data sets. Simulated linear and nonlinear data sets show the robustness of our algorithm against outliers. Finally, chemical and astronomical data sets that exhibit severe outlier contamination are used to demonstrate the performance of the proposed approach in real situations.
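The combination of a residual-scale weight (for response outliers) and a hat-matrix leverage weight (for predictor outliers) can be sketched in the linear case. This is a simplified sketch: the paper uses a robust adaptive scale estimator and a "kernelized" hat matrix, while here we use the ordinary hat matrix and a MAD scale, with assumed cutoffs c_res and c_lev.

```python
import numpy as np

rng = np.random.default_rng(3)

def bounded_influence_weights(X, y, c_res=2.5, c_lev=2.0):
    """Downweight observations with large residuals AND high leverage."""
    H = X @ np.linalg.solve(X.T @ X, X.T)   # ordinary hat matrix
    h = np.diag(H)                          # leverage of each observation
    r = y - H @ y                           # least-squares residuals
    s = 1.4826 * np.median(np.abs(r - np.median(r)))   # MAD residual scale
    w_res = np.minimum(1.0, c_res * max(s, 1e-12) / np.maximum(np.abs(r), 1e-12))
    w_lev = np.minimum(1.0, c_lev * h.mean() / np.maximum(h, 1e-12))
    return w_res * w_lev

X = np.column_stack([np.ones(10), np.arange(10.0)])
X[9, 1] = 40.0                              # a leverage point in the predictor
y = 2.0 + 0.5 * X[:, 1] + 0.1 * rng.normal(size=10)
y[4] += 8.0                                 # a response (vertical) outlier
w = bounded_influence_weights(X, y)
```

Both contamination types are caught: the vertical outlier is downweighted through its residual, and the leverage point through its hat-matrix diagonal, even though its response value lies on the true line.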


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号