Neuroscience in economics.
Ratiu, Ioan-Gheorghe; Carstea, Claudia-Georgeta; David, Nicoleta et al.
1. INTRODUCTION
Neural Networks, originally inspired by Neuroscience, provide
powerful models for statistical data analysis. Their most prominent
feature is their ability to "learn" dependencies based on a
finite number of observations. In Neural Networks, the term
"learning" means that the knowledge acquired from the samples
can be generalized to as yet unseen observations. In this sense, a
Neural Network is often called a Learning Machine. Neural Networks may
be considered a metaphor for an agent who learns the dependencies of his
environment and thus infers strategies of behavior based on a limited
number of observations.
2. STATISTICAL LEARNING THEORY
We present some results from Statistical Learning Theory (Vapnik,
1982, 1998; Pollard, 1984), which provides a basis for understanding the
generalization properties of existing Neural Network learning
algorithms. A principle is formulated which can be used to find a
classifier $\alpha_l$ whose performance is close to that of the optimal
classifier $\alpha^*$, independently of the hypothesis space used and of
any assumptions on the underlying probability $P_{XY}$. The principle
says that choosing $\alpha_l$ such that

$\alpha_l = \arg\min_{\alpha \in \Lambda} R_{emp}(\alpha)$ (1)

leads to the set of parameters $\alpha_l$ that minimizes the deviation
$|R(\alpha^*) - R(\alpha_l)|$, under conditions explicitly stated in the
paper.
This principle can be stated as "choose the classifier
$\alpha_l$ that minimizes the training error, or empirical risk";
it is known as Empirical Risk Minimization (ERM).
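The ERM selection rule can be sketched in a few lines of code. The threshold classifiers and the toy sample below are illustrative assumptions, not part of the text; the point is only the rule of Equation (1): pick the classifier in $\Lambda$ with the lowest training error.

```python
# Empirical Risk Minimization over a finite hypothesis space Lambda:
# choose the classifier alpha_l with the lowest training error.

def empirical_risk(classifier, samples):
    """Fraction of training samples the classifier labels incorrectly."""
    return sum(1 for x, y in samples if classifier(x) != y) / len(samples)

def erm(hypothesis_space, samples):
    """Equation (1): return the classifier minimizing the empirical risk."""
    return min(hypothesis_space, key=lambda a: empirical_risk(a, samples))

# A finite Lambda of threshold classifiers on the real line (an assumption).
hypothesis_space = [lambda x, t=t: 1 if x >= t else -1 for t in (0.0, 0.5, 1.0)]
samples = [(0.2, -1), (0.4, -1), (0.7, 1), (0.9, 1)]

alpha_l = erm(hypothesis_space, samples)
```

Here the threshold at 0.5 separates the sample without error, so ERM selects it.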
If $\Lambda$ contains a finite number of possible classifiers, the
principle of choosing $\alpha_l$ to approximate $\alpha^*$ is
consistent. Consistency means that the generalization error can be
bounded with probability one as $l$ tends to infinity. Vapnik (1982)
presented a new learning principle, Structural Risk Minimization (SRM).
The idea of this principle is to define a priori nested subsets
$\Lambda_1 \subset \Lambda_2 \subset \ldots \subset \Lambda$ of
functions and to apply the ERM principle (training error minimization)
in each of the predefined $\Lambda_i$ to obtain classifiers
$\alpha_l^i$. Exploiting the resulting inequality, one is able to select
the classifier $\hat{\alpha}$ that minimizes its right-hand side. Let us
make one remark about prior knowledge: the a priori choice of the nested
structure is where prior knowledge enters the SRM principle.
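The SRM idea can be sketched as follows. The complexity penalty `sqrt(capacity / l)` is only a stand-in for the right-hand side of the VC inequality (which the text does not reproduce), and the nested hypothesis sets are illustrative assumptions.

```python
import math

# Structural Risk Minimization: apply ERM inside each nested subset
# Lambda_1 ⊂ Lambda_2 ⊂ ..., then select the winner that minimizes
# training error plus a capacity penalty (a stand-in for the VC bound).

def empirical_risk(classifier, samples):
    return sum(1 for x, y in samples if classifier(x) != y) / len(samples)

def srm(nested_spaces, samples):
    """nested_spaces: list of (hypothesis_list, capacity), capacity increasing."""
    l = len(samples)
    best, best_bound = None, float("inf")
    for space, capacity in nested_spaces:
        a = min(space, key=lambda c: empirical_risk(c, samples))  # ERM in Lambda_i
        bound = empirical_risk(a, samples) + math.sqrt(capacity / l)
        if bound < best_bound:
            best, best_bound = a, bound
    return best

samples = [(0.2, -1), (0.8, 1)]
constants = [lambda x: -1, lambda x: 1]                      # Lambda_1
thresholds = constants + [lambda x: 1 if x >= 0.5 else -1]   # Lambda_2 ⊃ Lambda_1
chosen = srm([(constants, 1), (thresholds, 2)], samples)
```

The richer subset wins here because its zero training error outweighs its larger capacity penalty.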
3. ALGORITHMS FOR NEURAL NETWORK LEARNING
In the past the term Neural Network was used to describe a network
of "neurons" with a fixed dynamic for each neuron. We want to
abstract from the biological origin and view Neural Networks as purely
mathematical models. In these networks computations are performed by
feeding the data into the n units of an input layer from which they are
passed through a sequence of hidden layers and finally to m units of the
output layer. Each continuous decision function can be arbitrarily well
approximated by a Neural Network with only one hidden layer (Baum,
1988). Let $r$ denote the number of units in the hidden layer; it is
sufficient to consider a network described by:

$h(x; \alpha) = f_2(f_1(x; \beta); \gamma)$ (2)
where $f_1: \mathbb{R}^n \to \mathbb{R}^r$ and
$f_2: \mathbb{R}^r \to \mathbb{R}^m$ are continuous functions.
$\alpha = (\beta, \gamma)'$ is the vector of adjustable
parameters, consisting of $\beta$, the weight vector of the
hidden layer, and $\gamma$, the weight vector of the output layer.
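A minimal sketch of Equation (2), assuming sigmoid transfer functions (the text leaves $g_1$ and $g_2$ generic) and illustrative weights:

```python
import math

# Equation (2): h(x; alpha) = f2(f1(x; beta); gamma), a two-layer network
# with r hidden units. Sigmoid transfers are assumed for illustration.

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, beta, gamma):
    """beta: r hidden weight vectors; gamma: r output weights."""
    z = [sigmoid(sum(w * xi for w, xi in zip(b, x))) for b in beta]  # f1(x; beta)
    return sigmoid(sum(g * zj for g, zj in zip(gamma, z)))           # f2(z; gamma)

# n = 2 inputs, r = 2 hidden units, m = 1 output (illustrative weights).
y = forward(x=[1.0, 2.0], beta=[[0.5, -0.5], [-1.0, 1.0]], gamma=[1.0, -1.0])
```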
[FIGURE 1 OMITTED]
It is common practice to represent each unit where a computation is
performed (neuron) by a node, and each connection (synapse) by an
edge of a graph. An example of a two-layer Neural Network is shown in
Fig. 1. For the case of a two-layer perceptron one chooses:

$f_1(x; \beta) = (g_1(\beta_1' x), \ldots, g_1(\beta_r' x))'$ and
$f_2(z; \gamma) = g_2(\gamma' z)$,

where $z$ is the $r$-dimensional vector of hidden neuron activations,
$\beta = (\beta_1, \ldots, \beta_r)'$, and $g_1: \mathbb{R} \to \mathbb{R}$
and $g_2: \mathbb{R} \to \mathbb{R}$ are the "transfer" functions of the
neurons. This type of Neural Network is called a multilayer perceptron (MLP).
Another type is the radial basis function (RBF) network, whose
hidden-unit function $g_1(x, \beta_j, \sigma_j)$ is usually given by a
Gaussian of the form:

$g_1(x, \beta_j, \sigma_j) = \exp\left(-\|x - \beta_j\|^2 / (2\sigma_j^2)\right)$ (3)

Again, we consider the case of binary classification. Similarly to
backpropagation, the empirical risk becomes:

$R_{emp}(\alpha, x_t) \propto \frac{1}{2} \left( g_2\left( \sum_{j=1}^{r} \gamma_j\, g_1(x_t, \beta_j, \sigma_j) \right) - y_t \right)^2$ (4)
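Equations (3) and (4) can be sketched together as follows; the identity output transfer $g_2(a) = a$ and the single training pair are illustrative assumptions:

```python
import math

# Equation (3): Gaussian hidden units; Equation (4): the squared-error
# term for one training pair (x_t, y_t), with g2 taken as the identity.

def g1(x, beta_j, sigma_j):
    """Gaussian basis function of Equation (3)."""
    sq_dist = sum((xi - bi) ** 2 for xi, bi in zip(x, beta_j))
    return math.exp(-sq_dist / (2.0 * sigma_j ** 2))

def rbf_risk(x_t, y_t, betas, sigmas, gammas):
    """One-sample term of Equation (4)."""
    out = sum(g * g1(x_t, b, s) for g, b, s in zip(gammas, betas, sigmas))
    return 0.5 * (out - y_t) ** 2

# A single Gaussian centered exactly on x_t activates to 1, so the
# network output matches y_t = 1 and the risk term vanishes.
risk = rbf_risk([1.0, 0.0], 1.0, betas=[[1.0, 0.0]], sigmas=[1.0], gammas=[1.0])
```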
The main conceptual difference between MLPs and RBF networks
is that the former perform a global approximation in input space while
the latter implement a local approximation. The hidden neurons of an RBF
network specialize to localized regions in data space by fitting a set
of Gaussians to the data. In the extreme case where $r = l$, i.e. there
are as many hidden neurons as data points in the training set, the ERM
principle cannot lead to consistent learning, because such an RBF
network can be shown to have infinite VC dimension. In contrast to the
local approximation performed by RBF networks, an MLP considers the data
space as a whole and is thus able to capture complex dependencies
underlying the data. The advantage of preprocessing the data is the
reduction of their dimensionality. This counters the curse of
dimensionality: the number of samples necessary to obtain a small
generalization error grows exponentially with the number of dimensions.
Another way to incorporate this into the learning process is to minimize
$R_{emp}(\alpha) + k\|\alpha\|^2$, where $k$ has to be chosen
beforehand. Such a technique is called regularization and was
successfully used in the weight decay learning algorithm.
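A hedged sketch of such regularized minimization for a one-parameter linear model with squared loss; the data and the choice $k = 0.1$ are illustrative assumptions:

```python
# Gradient descent on R_emp(alpha) + k * ||alpha||^2 for a scalar linear
# model y ≈ alpha * x with squared loss (weight decay regularization).

def fit(samples, k, lr=0.1, steps=200):
    alpha = 0.0
    n = len(samples)
    for _ in range(steps):
        # gradient of the empirical risk plus the decay term 2*k*alpha
        grad = sum(2 * (alpha * x - y) * x for x, y in samples) / n + 2 * k * alpha
        alpha -= lr * grad
    return alpha

samples = [(1.0, 2.0), (2.0, 4.0)]   # data lies exactly on y = 2x
alpha = fit(samples, k=0.1)          # shrunk slightly below 2 by the penalty
```

The closed-form minimizer here is $\alpha = (\sum x_t y_t / n) / (\sum x_t^2 / n + k)$, so the penalty pulls $\alpha$ slightly below the unregularized value of 2.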
4. ECONOMIC APPLICATIONS OF NEURAL NETWORKS
With the application of backpropagation to Neural Network learning
and the revived interest in Neural Networks, Economists started to
adopt this tool as well, since Neural Networks for classification
and regression can easily be adapted to economic problems. The majority
of papers that use Neural Networks for classification tasks in Economics
can be found in the area of bankruptcy prediction of economic agents,
mainly banks. Coleman et al. (1991) integrated a Neural Network with an
expert system such that courses of action can be recommended to prevent
bankruptcy. Probably the largest share of economic
applications of Neural Networks can be found in the field of prediction
of time series in the capital markets. Usually, linear models of
financial time series (exchange rates, stock exchange series) perform
poorly and linear univariate models consistently give evidence for a
random walk. This has been taken in favor of the efficient market
hypothesis where efficiency means that the market fully and correctly
reflects all relevant information in determining security prices.
Applications of time series prediction outside the financial field
include macroeconomic variables, consumers' expenditure, and
agricultural economics. A less common application of Neural Networks in
Economics is the modeling of the learning processes of boundedly
rational adaptive artificial agents. These learning techniques are
essentially based on the ERM principle. A new Neural Network learning
technique that utilizes the SRM principle is the so-called Support
Vector Learning. It has been successfully applied in the fields of
character recognition, object recognition, and text categorization. We start by
developing the learning algorithm for the perceptron under the
assumption that the training set can be classified without training
error. Then we extend the learning algorithm to the case where the
objects are not linearly separable. Using a technique known as the
kernel trick we show how the learning algorithm can be extended to the
(nonlinear) case of MLPs and RBF networks. Each symmetric function
$K: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ that satisfies the
Mercer conditions corresponds to an inner product in some space $F$.
Such functions $K(\cdot, \cdot)$ are called kernels. To extend the Support Vector
method to nonlinear decision functions, kernels need to be found that
can easily be calculated and at the same time map to an appropriate
feature space F.
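Two standard examples of such kernels, sketched for illustration (the specific parameter choices are assumptions):

```python
import math

# Polynomial and Gaussian kernels: symmetric functions satisfying the
# Mercer conditions, i.e. inner products in some feature space F that
# can be evaluated without computing the feature map explicitly.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(u, v, degree=2):
    """K(u, v) = (u'v + 1)^d — inner product in the space of monomials up to degree d."""
    return (dot(u, v) + 1.0) ** degree

def rbf_kernel(u, v, sigma=1.0):
    """K(u, v) = exp(-||u - v||^2 / (2 sigma^2)) — here F is infinite-dimensional."""
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq / (2.0 * sigma ** 2))
```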
5. SUPPORT VECTOR NETWORKS FOR PREFERENCE LEARNING
We show how Neural Networks can be applied to the problem of
preference learning. The learned function should be transitive and
asymmetric.
Theoretical Background. The most important problem in solving
preference learning problems is the definition of an appropriate loss
for each decision $f(x; \alpha)$, where the true ordinal utility is
given by $y$. Since the $y$'s are ordinal, no knowledge is given about
the difference $y - f(x; \alpha)$. The loss given in Equation (1)
weights each incorrect assignment $f(x; \alpha)$ by the same amount and
is thus inappropriate as well. This leads to the problem that no risk
can be formulated to be minimized by a Neural Network learning
algorithm. To derive a Neural Network algorithm we make the assumption
that there is an unknown cardinal utility $U(x)$ that an object $x$
provides to the customer. Moreover, we assume that if $x^{(1)}$ is
preferred over $x^{(2)}$ then $U(x^{(1)}) > U(x^{(2)})$, and vice versa.
The advantage of such a model is that transitivity and asymmetry
are fulfilled for each decision function. In terms of Statistical
Learning Theory this means that our hypothesis space is maximally
reduced: we only want to learn decision functions with these properties.
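The latent-utility assumption can be made concrete with a small sketch. Turning each preference pair $x^{(1)} \succ x^{(2)}$ into a classification example on the difference vector is a standard construction for a linear utility $U(x) = w'x$; the linearity is an illustrative assumption here:

```python
# If x1 is preferred over x2 then U(x1) > U(x2); for a linear utility
# U(x) = w'x this turns each preference pair into one binary
# classification example on the difference vector.

def preference_to_sample(x1, x2):
    """Observed preference x1 ≻ x2 becomes the example (x1 - x2, +1)."""
    return ([a - b for a, b in zip(x1, x2)], 1)

def utility(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def consistent(w, x1, x2):
    """Does the learned utility reproduce the preference x1 ≻ x2?"""
    return utility(w, x1) > utility(w, x2)
```

Because every decision takes the form $U(x^{(1)}) > U(x^{(2)})$, transitivity and asymmetry hold automatically for any $w$.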
Let us illustrate the above discussion with an economic application
example. Consider a situation where two goods compete, i.e. $x =
(x_1, x_2)$ is a vector that describes a basket of two goods.
Assume an agent who has purchased a limited number of combinations. The
agent will order these combinations according to his preferences and
assign a utility level to each combination so as to achieve the
highest possible utility with the next purchase. To simulate this
situation we generated a limited number of combinations and classified
them according to an underlying true latent utility function, $U(x) =
x_1 x_2 / 2$, so as to implement the agent's preference
structure. This utility function is ordinal in the sense that any
monotonic transformation of it would not affect the resulting order of
combinations. The only given information is the set of ordered objects.
The process of learning the utility function is then simulated with a
Support Vector Network that represents, metaphorically, the learning
capacity of the agent.
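The simulation setup can be sketched as follows; the four baskets and the cubing transform are illustrative assumptions, used only to check the ordinal invariance the argument relies on:

```python
# Baskets of two goods ranked by the true latent utility
# U(x) = x1 * x2 / 2; a monotonic transform (cubing, valid here since
# all utilities are positive) leaves the ordering unchanged.

def latent_utility(x):
    return x[0] * x[1] / 2.0

baskets = [(1.0, 4.0), (2.0, 1.0), (3.0, 3.0), (0.5, 2.0)]

order = sorted(baskets, key=latent_utility)
order_transformed = sorted(baskets, key=lambda x: latent_utility(x) ** 3)
```

Only `order` would be handed to the learner; the latent utility itself stays hidden, exactly as in the text.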
6. CONCLUSION
We presented three commonly used learning algorithms: perceptron
learning, backpropagation learning, and radial basis function learning.
We distinguished three types of economic applications of Neural
Networks: classification of economic agents, time series prediction,
and the modeling of boundedly rational agents. While according to the
literature Neural Networks performed well, and often better than
traditional linear methods, when applied to classification tasks, their
performance in time series prediction was often reported to be only as
good as that of traditional methods. Finally, choosing Neural Networks
as models for boundedly rational artificial adaptive agents appears to
be a viable strategy, although alternatives exist. We presented a new
learning method, the so-called Support Vector Learning, which is based on
Statistical Learning Theory, shows good generalization and is easily
extended to nonlinear decision functions. This algorithm was used to
model a situation where a buyer learns his preferences from a limited
set of goods and orders them according to an ordinal utility scale.
7. REFERENCES
Baum, E. (1988). On the capabilities of multilayer perceptrons,
Journal of Complexity, 3, pp. 331-342
Coleman, K.; Graettinger, T. & Lawrence, W. (1991). Neural
Networks for Bankruptcy Prediction: The Power to Solve Financial
Problems, AI Review, July/August, pp. 48-50
Pollard, D. (1984). Convergence of Stochastic Processes, New York:
Springer-Verlag
Vapnik, V. (1998). Statistical Learning Theory, New York: John
Wiley and Sons
Vapnik, V. (1982). Estimation of Dependences Based on Empirical
Data, New York: Springer-Verlag