White [6–8] has shown theoretically that the learning procedures used in network training are inherently statistical in nature. This paper takes a small but pioneering experimental step towards understanding this statistical behaviour by showing that the results obtained are fully in line with White's theory. We show that, given two random vectors $X$ (input) and $Y$ (target) which follow a two-dimensional standard normal distribution, and a fixed network complexity, the network's fitting ability improves with increasing correlation coefficient $r_{XY}$ ($0 \le r_{XY} \le 1$) between $X$ and $Y$. We also provide numerical examples showing that both increasing the network complexity and training for much longer improve the network's performance. However, as we demonstrate, these improvements are far from dramatic except in the case $r_{XY} = +1$. This is mainly due to the existence of a theoretical lower bound on the inherent conditional variance, as we show both analytically and numerically. Finally, the fitting ability of the network on a test set is illustrated with an example.
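To make the role of the conditional variance concrete, the following minimal sketch (ours, not the authors' experiment; it uses NumPy and the closed-form conditional mean rather than a trained network) illustrates the lower bound referred to above. For jointly standard normal $X$ and $Y$ with correlation $r_{XY}$, the conditional mean is $E(Y \mid X) = r_{XY} X$ and the conditional variance is $1 - r_{XY}^2$, so no network, however complex or long-trained, can drive the expected squared error below $1 - r_{XY}^2$.

```python
# Illustrative sketch only: compare the empirical squared error of the ideal
# predictor g(X) = E(Y | X) = r * X with the theoretical floor 1 - r^2.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

for r in (0.0, 0.5, 0.9, 1.0):
    cov = np.array([[1.0, r], [r, 1.0]])       # correlation matrix of X and Y (unit variances)
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    y_hat = r * x                               # ideal predictor g(X) = E(Y | X)
    mse = np.mean((y - y_hat) ** 2)             # empirical squared error of the ideal predictor
    print(f"r_XY = {r:.1f}: empirical MSE = {mse:.3f}, floor 1 - r^2 = {1 - r * r:.3f}")
```

The printed errors track the floor closely, which is consistent with the observation above that extra complexity or longer training yields only modest gains except when $r_{XY} = +1$, where the floor vanishes.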
Nomenclature

$X$: general $r$-dimensional random vector; in this work it is a one-dimensional normal vector and represents the input vector

$Y$: general $p$-dimensional random vector; in this work it is a one-dimensional normal vector and represents the target vector

$Z$: general $(r+p)$-dimensional random vector; in this work it is a two-dimensional normal vector

$E$: expectation operator (Lebesgue integral)

$g(X) = E(Y \mid X)$: conditional expectation

$\varepsilon$: experimental random error (defined by Eq. (2.1))

$y$: realized target value

$o$: output value

$f$: the network's output function, formally expressed as $f: \mathbb{R}^r \times W \to \mathbb{R}^p$, where $W$ is the appropriate weight space

$\lambda$: average (or expected) performance function, defined by Eq. (2.2) as $\lambda(w) = E[(Y - f(X, w))^2]$, $w \in W$

$\lambda(w)$: the network's performance

$w^*$: weight vector of the optimal solution; that is, the objective of the network is that $\lambda(w^*)$ is a minimum

$C_1$: component one

$C_2$: component two

$\mathbf{Z}$: matrix of realised values of the random vector $Z$ over $n$ observations

$\mathbf{Z}_t$: transformed version of $\mathbf{Z}$ such that the values of $X$ and $Y$ lie in $[0, 1]$

$X_t$, $Y_t$: transformed versions of $X$ and $Y$; both are standard one-dimensional normal vectors

$n_h$: number of hidden nodes (neurons)

$r_{XY}$: correlation coefficient between either $X$ and $Y$ or $X_t$ and $Y_t$

$\lambda_s$ and $\lambda_k$: in Eq. (3.1) and afterwards, $\lambda_s$ is the average value over 100 different $\mathbf{Z}_t$ matrices and $\lambda_k$ is the error function of the $k$th $\mathbf{Z}_t$ matrix. In Eq. (3.1) the summation runs from $k = 1$ to 100, and in Eq. (3.2) from $i = 1$ to $n$. In Eq. (3.2), $o_{ki}$ and $y_{ki}$ are the output and target values for the $k$th $\mathbf{Z}_t$ matrix and the $i$th observation, respectively

$\tfrac{1}{2}\lambda(w^*)$ and $\lambda_k(w_n)$: $\lambda_k(w_n)$ is the sample analogue of $\tfrac{1}{2}\lambda(w^*)$

$\sigma_Y^2$: in Eq. (4.1) and afterwards, $\sigma_Y^2$ is the variance of $Y$

$\sigma_{Y_t}^2$: variance of $Y_t$; in Sect. 4.3 the transformation is $Y_t = aY + b$

$Y_{\max}$, $Y_{\min}$: the maximum and minimum values of $Y$

$R$: correlation matrix of $X$ and $Y$

$\Sigma$: covariance matrix of $X$ and $Y$

$\in$: membership symbol in set theory
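Read literally, the entries for $\lambda_s$ and $\lambda_k$ suggest that Eqs. (3.1) and (3.2) have roughly the following shape. This is only our reading of the nomenclature, not a quotation of the paper; the normalisation constants (and a possible extra factor of $\tfrac{1}{2}$ in $\lambda_k$) are assumptions.

```latex
% Hedged reconstruction from the nomenclature descriptions; constants are assumptions.
\lambda_k \;=\; \frac{1}{n}\sum_{i=1}^{n}\left(y_{ki}-o_{ki}\right)^{2},
\qquad
\lambda_s \;=\; \frac{1}{100}\sum_{k=1}^{100}\lambda_k .
```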