Correlation structure of training data and the fitting ability of back propagation networks: Some experimental results |
| |
Authors: | Yildiz Nihat |
| |
Affiliation: | (1) Department of Physics, Cumhuriyet University, 58140 Sivas, Turkey |
| |
Abstract: | White [6–8] has theoretically shown that the learning procedures used in network training are inherently statistical in nature. This paper takes a small but pioneering experimental step towards understanding this statistical behaviour by showing that the results obtained are completely in line with White's theory. We show that, given two random vectors X (input) and Y (target) which follow a two-dimensional standard normal distribution, and fixed network complexity, the network's fitting ability definitely improves with increasing correlation coefficient r_XY (0 ≤ r_XY ≤ 1) between X and Y. We also provide numerical examples showing that both increasing the network complexity and training for much longer do improve the network's performance. However, as we clearly demonstrate, these improvements are far from dramatic, except in the case r_XY = +1. This is mainly due to the existence of a theoretical lower bound on the inherent conditional variance, as we show both analytically and numerically. Finally, the fitting ability of the network for a test set is illustrated with an example. |

Nomenclature:
X: General r-dimensional random vector. In this work it is a one-dimensional normal vector and represents the input vector.
Y: General p-dimensional random vector. In this work it is a one-dimensional normal vector and represents the target vector.
Z: General (r+p)-dimensional random vector. In this work it is a two-dimensional normal vector.
E: Expectation operator (Lebesgue integral).
g(X) = E(Y|X): Conditional expectation.
ε: Experimental random error (defined by Eq. (2.1)).
y: Realized target value.
o: Output value.
f: Network's output function. It is formally expressed as f: R^r × W → R^p, where W is the appropriate weight space.
λ: Average (or expected) performance function. It is defined by Eq. (2.2) as λ(w) = E[(Y − f(X,w))²], w ∈ W.
λ̂: Network's performance.
w*: Weight vector of the optimal solution; that is, the objective of the network is that λ(w*) be a minimum.
C_1: Component one.
C_2: Component two.
Z: Matrix of realised values of the random vector Z over n observations.
Z_t: Transformed version of the matrix Z such that X and Y take values in [0,1].
X_t, Y_t: Transformed versions of X and Y; both are standard one-dimensional normal vectors.
n_h: Number of hidden nodes (neurons).
r_XY: Correlation coefficient between either X and Y or X_t and Y_t.
λ̄_s and λ_k: In Eq. (3.1) and afterwards, λ̄_s is the average value over the 100 different Z_t matrices, and λ_k is the error function of the kth Z_t matrix. In Eq. (3.1) the summation runs from k = 1 to 100, and in Eq. (3.2) from i = 1 to n. In Eq. (3.2), o_ki and y_ki are the output and target values for the kth Z_t matrix and ith observation, respectively.
½λ(w*) and λ_k(w_n): λ_k(w_n) is the sample analogue of ½λ(w*).
σ_Y²: In Eq. (4.1) and afterwards, σ_Y² is the variance of Y.
σ_Yt²: Variance of Y_t. In Sect. 4.3 the transformation is Y_t = aY + b.
Y_max, Y_min: The maximum and minimum values of Y.
R: Correlation matrix of X and Y.
Σ: Covariance matrix of X and Y.
∈: Membership symbol in set theory. |
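The nomenclature defines Z_t as the data matrix rescaled so that X and Y take values in [0,1]. A minimal sketch of such a rescaling, assuming a standard min-max transform (the paper's exact constants a and b in Y_t = aY + b are not given here):

```python
import numpy as np

def minmax_scale(z: np.ndarray) -> np.ndarray:
    """Rescale each column of an (n, 2) data matrix Z into [0, 1].

    This realises Y_t = a*Y + b with a = 1/(Y_max - Y_min) and
    b = -Y_min/(Y_max - Y_min); the same affine map is applied
    column-wise to X as well.
    """
    zmin = z.min(axis=0)
    zmax = z.max(axis=0)
    return (z - zmin) / (zmax - zmin)

# Example: 500 realisations of a two-dimensional standard normal vector Z.
rng = np.random.default_rng(1)
z = rng.standard_normal((500, 2))
zt = minmax_scale(z)
print(zt.min(axis=0), zt.max(axis=0))  # each column now spans [0, 1]
```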
| |
Keywords: | Correlation structure Fitting ability Extended-Delta-Bar method Statistical neural networks Conditional variance Asymptotic behaviour |
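The abstract's lower-bound claim can be checked numerically: for a bivariate standard normal pair with correlation r_XY, the conditional variance Var(Y|X) = 1 − r_XY² is a floor on the achievable mean-squared error, whatever the network's complexity or training time. A minimal NumPy sketch of this (an illustration, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

for r in (0.0, 0.5, 0.9, 1.0):
    # Draw n samples of (X, Y) from a bivariate standard normal
    # with correlation coefficient r.
    cov = np.array([[1.0, r], [r, 1.0]])
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

    # For this distribution the best possible predictor is
    # E(Y|X) = r*X, and its residual (conditional) variance is
    # 1 - r^2 -- the error floor no fitted network can beat.
    mse_best = np.mean((y - r * x) ** 2)
    print(f"r = {r:.1f}: empirical MSE = {mse_best:.4f}, "
          f"bound 1 - r^2 = {1 - r**2:.4f}")
```

Only at r = 1 does the floor vanish, matching the paper's observation that improvements from extra hidden nodes or longer training are far from dramatic except when r_XY = +1.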
Indexed in SpringerLink and other databases.
|