Neuroscience in economics.
Ratiu, Ioan-Gheorghe; Carstea, Claudia-Georgeta; David, Nicoleta et al.
1. INTRODUCTION
Neural Networks, originally inspired by Neuroscience, provide
powerful models for statistical data analysis. Their most prominent
feature is their ability to "learn" dependencies based on a
finite number of observations. In Neural Networks, the term
"learning" means that the knowledge acquired from the samples
can be generalized to as yet unseen observations. In this sense, a
Neural Network is often called a Learning Machine. Neural Networks may
be considered a metaphor for an agent who learns the dependencies of his
environment and thus infers strategies of behavior based on a limited
number of observations.
2. STATISTICAL LEARNING THEORY
We present some results from Statistical Learning Theory (Vapnik,
1982, 1998; Pollard, 1984), which provides a basis for understanding the
generalization properties of existing Neural Network learning
algorithms. A principle is formulated which can be used to find a
classifier $\alpha_l$ whose performance is close to that of the optimal
classifier $\alpha^*$, independently of the hypothesis space used and of
any assumptions on the underlying probability $P_{XY}$. The principle
says that choosing $\alpha_l$ such that

$\alpha_l = \arg\min_{\alpha \in \Lambda} R_{emp}(\alpha)$ (1)

leads to the set of parameters $\alpha_l$ that minimizes the deviation
$|R(\alpha^*) - R(\alpha_l)|$, under conditions explicitly stated in the
paper.
This principle can be stated as "choose the classifier
$\alpha_l$ that minimizes the training error, or empirical risk";
it is known as Empirical Risk Minimization (ERM).
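The ERM selection rule can be sketched in a few lines of code. The threshold classifiers and the toy sample below are illustrative assumptions, not part of the text; the point is only the rule of Equation (1): pick the classifier in $\Lambda$ with the lowest training error.

```python
# Empirical Risk Minimization over a finite hypothesis space Lambda:
# choose the classifier alpha_l with the lowest training error.

def empirical_risk(classifier, samples):
    """Fraction of training samples the classifier labels incorrectly."""
    return sum(1 for x, y in samples if classifier(x) != y) / len(samples)

def erm(hypothesis_space, samples):
    """Equation (1): return the classifier minimizing the empirical risk."""
    return min(hypothesis_space, key=lambda a: empirical_risk(a, samples))

# A finite Lambda of threshold classifiers on the real line (an assumption).
hypothesis_space = [lambda x, t=t: 1 if x >= t else -1 for t in (0.0, 0.5, 1.0)]
samples = [(0.2, -1), (0.4, -1), (0.7, 1), (0.9, 1)]

alpha_l = erm(hypothesis_space, samples)
```

Here the threshold at 0.5 separates the sample without error, so ERM selects it.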
If $\Lambda$ contains a finite number of possible classifiers, the
principle of choosing $\alpha_l$ to approximate $\alpha^*$ is
consistent. Consistency means that the generalization error can be
bounded with probability one as $l$ tends to infinity. Vapnik (1982)
presented a new learning principle, Structural Risk Minimization (SRM).
The idea of this principle is to define a priori nested subsets
$\Lambda_1 \subset \Lambda_2 \subset \ldots \subset \Lambda$ of
functions and to apply the ERM principle (training error minimization)
in each of the predefined $\Lambda_i$ to obtain classifiers
$\alpha_l^i$. Exploiting the resulting inequality, one is able to select
the classifier $\hat{\alpha}$ that minimizes its right-hand side. Let us
make one remark about prior knowledge: the a priori choice of the nested
structure is where prior knowledge enters the SRM principle.
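The SRM idea can be sketched as follows. The complexity penalty `sqrt(capacity / l)` is only a stand-in for the right-hand side of the VC inequality (which the text does not reproduce), and the nested hypothesis sets are illustrative assumptions.

```python
import math

# Structural Risk Minimization: apply ERM inside each nested subset
# Lambda_1 ⊂ Lambda_2 ⊂ ..., then select the winner that minimizes
# training error plus a capacity penalty (a stand-in for the VC bound).

def empirical_risk(classifier, samples):
    return sum(1 for x, y in samples if classifier(x) != y) / len(samples)

def srm(nested_spaces, samples):
    """nested_spaces: list of (hypothesis_list, capacity), capacity increasing."""
    l = len(samples)
    best, best_bound = None, float("inf")
    for space, capacity in nested_spaces:
        a = min(space, key=lambda c: empirical_risk(c, samples))  # ERM in Lambda_i
        bound = empirical_risk(a, samples) + math.sqrt(capacity / l)
        if bound < best_bound:
            best, best_bound = a, bound
    return best

samples = [(0.2, -1), (0.8, 1)]
constants = [lambda x: -1, lambda x: 1]                      # Lambda_1
thresholds = constants + [lambda x: 1 if x >= 0.5 else -1]   # Lambda_2 ⊃ Lambda_1
chosen = srm([(constants, 1), (thresholds, 2)], samples)
```

The richer subset wins here because its zero training error outweighs its larger capacity penalty.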
3. ALGORITHMS FOR NEURAL NETWORK LEARNING
In the past the term Neural Network was used to describe a network
of "neurons" with a fixed dynamic for each neuron. We want to
abstract from the biological origin and view Neural Networks as purely
mathematical models. In these networks computations are performed by
feeding the data into the n units of an input layer from which they are
passed through a sequence of hidden layers and finally to m units of the
output layer. Each continuous decision function can be arbitrarily well
approximated by a Neural Network with only one hidden layer (Baum,
1988). Let $r$ denote the number of units in the hidden layer; it is
sufficient to consider a network described by:

$h(x; \alpha) = f_2(f_1(x; \beta); \gamma)$ (2)
where $f_1: \mathbb{R}^n \to \mathbb{R}^r$ and
$f_2: \mathbb{R}^r \to \mathbb{R}^m$ are continuous functions.
$\alpha = (\beta, \gamma)'$ is the vector of adjustable
parameters, consisting of $\beta$, the weight vector of the
hidden layer, and $\gamma$, the weight vector of the output layer.
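A minimal sketch of Equation (2), assuming sigmoid transfer functions (the text leaves $g_1$ and $g_2$ generic) and illustrative weights:

```python
import math

# Equation (2): h(x; alpha) = f2(f1(x; beta); gamma), a two-layer network
# with r hidden units. Sigmoid transfers are assumed for illustration.

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, beta, gamma):
    """beta: r hidden weight vectors; gamma: r output weights."""
    z = [sigmoid(sum(w * xi for w, xi in zip(b, x))) for b in beta]  # f1(x; beta)
    return sigmoid(sum(g * zj for g, zj in zip(gamma, z)))           # f2(z; gamma)

# n = 2 inputs, r = 2 hidden units, m = 1 output (illustrative weights).
y = forward(x=[1.0, 2.0], beta=[[0.5, -0.5], [-1.0, 1.0]], gamma=[1.0, -1.0])
```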
[FIGURE 1 OMITTED]
It is common practice to represent each unit where a computation is
performed (neuron) by a node, and each connection (synapse) by an
edge of a graph. An example of a two-layer Neural Network is shown in
Fig. 1. For the case of a two-layer perceptron one chooses:

$f_1(x; \beta) = (g_1(\beta_1' x), \ldots, g_1(\beta_r' x))'$ and
$f_2(z; \gamma) = g_2(\gamma' z)$,

where $z$ is the $r$-dimensional vector of hidden neuron activations,
$\beta = (\beta_1, \ldots, \beta_r)'$, and $g_1: \mathbb{R} \to \mathbb{R}$
and $g_2: \mathbb{R} \to \mathbb{R}$ are the "transfer" functions of the
neurons. This type of Neural Network is called a multilayer perceptron (MLP).
Another type is the radial basis function (RBF) network, whose
hidden-unit function $g_1(x, \beta_j, \sigma_j)$ is usually given by a
Gaussian of the form:

$g_1(x, \beta_j, \sigma_j) = \exp\left(-\|x - \beta_j\|^2 / (2\sigma_j^2)\right)$ (3)

Again, we consider the case of binary classification. Similarly to
backpropagation, the empirical risk becomes:

$R_{emp}(\alpha, x_t) \propto \frac{1}{2} \left( g_2\left( \sum_{j=1}^{r} \gamma_j\, g_1(x_t, \beta_j, \sigma_j) \right) - y_t \right)^2$ (4)
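Equations (3) and (4) can be sketched together as follows; the identity output transfer $g_2(a) = a$ and the single training pair are illustrative assumptions:

```python
import math

# Equation (3): Gaussian hidden units; Equation (4): the squared-error
# term for one training pair (x_t, y_t), with g2 taken as the identity.

def g1(x, beta_j, sigma_j):
    """Gaussian basis function of Equation (3)."""
    sq_dist = sum((xi - bi) ** 2 for xi, bi in zip(x, beta_j))
    return math.exp(-sq_dist / (2.0 * sigma_j ** 2))

def rbf_risk(x_t, y_t, betas, sigmas, gammas):
    """One-sample term of Equation (4)."""
    out = sum(g * g1(x_t, b, s) for g, b, s in zip(gammas, betas, sigmas))
    return 0.5 * (out - y_t) ** 2

# A single Gaussian centered exactly on x_t activates to 1, so the
# network output matches y_t = 1 and the risk term vanishes.
risk = rbf_risk([1.0, 0.0], 1.0, betas=[[1.0, 0.0]], sigmas=[1.0], gammas=[1.0])
```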
The main conceptual difference between MLPs and RBF networks
is that the former perform a global approximation in input space while
the latter implement a local approximation. The hidden neurons of an RBF
network specialize to localized regions in data space by fitting a set
of Gaussians to the data. In the extreme case where $r = l$, i.e. there
are as many hidden neurons as data points in the training set, the ERM
principle cannot lead to consistent learning, because such an RBF
network can be shown to have infinite VC dimension. In contrast to the
local approximation performed by RBF networks, an MLP considers the data
space as a whole and is thus able to capture complex dependencies
underlying the data. The advantage of preprocessing the data is the
reduction of their dimensionality. This counters the curse of
dimensionality: the number of samples necessary to obtain a small
generalization error grows exponentially with the number of dimensions.
Another way to incorporate this into the learning process is to minimize
$R_{emp}(\alpha) + k\|\alpha\|^2$, where $k$ has to be chosen
beforehand. Such a technique is called regularization and was
successfully used in the weight decay learning algorithm.
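A hedged sketch of such regularized minimization for a one-parameter linear model with squared loss; the data and the choice $k = 0.1$ are illustrative assumptions:

```python
# Gradient descent on R_emp(alpha) + k * ||alpha||^2 for a scalar linear
# model y ≈ alpha * x with squared loss (weight decay regularization).

def fit(samples, k, lr=0.1, steps=200):
    alpha = 0.0
    n = len(samples)
    for _ in range(steps):
        # gradient of the empirical risk plus the decay term 2*k*alpha
        grad = sum(2 * (alpha * x - y) * x for x, y in samples) / n + 2 * k * alpha
        alpha -= lr * grad
    return alpha

samples = [(1.0, 2.0), (2.0, 4.0)]   # data lies exactly on y = 2x
alpha = fit(samples, k=0.1)          # shrunk slightly below 2 by the penalty
```

The closed-form minimizer here is $\alpha = (\sum x_t y_t / n) / (\sum x_t^2 / n + k)$, so the penalty pulls $\alpha$ slightly below the unregularized value of 2.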
4. ECONOMIC APPLICATIONS OF NEURAL NETWORKS
With the application of backpropagation to Neural Network learning
and the revived interest in Neural Networks, Economists started to
adopt this tool as well, since Neural Networks for classification
and regression can easily be adapted to economic problems. The majority
of papers that use Neural Networks for classification tasks in Economics
can be found in the area of bankruptcy prediction of economic agents,
mainly banks. Coleman et al. (1991) integrated a Neural Network with an
expert system such that courses of action can be recommended to prevent
bankruptcy. Probably the largest share of economic
applications of Neural Networks can be found in the field of prediction
of time series in the capital markets. Usually, linear models of
financial time series (exchange rates, stock exchange series) perform
poorly and linear univariate models consistently give evidence for a
random walk. This has been taken in favor of the efficient market
hypothesis where efficiency means that the market fully and correctly
reflects all relevant information in determining security prices.
Applications of time series prediction outside the financial field
include macroeconomic variables, consumers' expenditure, and
agricultural economics. A less common application of Neural Networks in
Economics is the modeling of the learning processes of boundedly
rational adaptive artificial agents. These learning techniques are
essentially based on the ERM principle. A new Neural Network learning
technique that utilizes the SRM principle is the so-called Support
Vector Learning. It has been successfully applied in the fields of
character recognition, object recognition, and text categorization. We start by
developing the learning algorithm for the perceptron under the
assumption that the training set can be classified without training
error. Then we extend the learning algorithm to the case where the
objects are not linearly separable. Using a technique known as the
kernel trick we show how the learning algorithm can be extended to the
(nonlinear) case of MLPs and RBF networks. Each symmetric function
$K: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ that satisfies the
Mercer conditions corresponds to an inner product in some space $F$.
Such functions $K(\cdot, \cdot)$ are called kernels. To extend the Support Vector
method to nonlinear decision functions, kernels need to be found that
can easily be calculated and at the same time map to an appropriate
feature space F.
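Two standard examples of such kernels, sketched for illustration (the specific parameter choices are assumptions):

```python
import math

# Polynomial and Gaussian kernels: symmetric functions satisfying the
# Mercer conditions, i.e. inner products in some feature space F that
# can be evaluated without computing the feature map explicitly.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(u, v, degree=2):
    """K(u, v) = (u'v + 1)^d — inner product in the space of monomials up to degree d."""
    return (dot(u, v) + 1.0) ** degree

def rbf_kernel(u, v, sigma=1.0):
    """K(u, v) = exp(-||u - v||^2 / (2 sigma^2)) — here F is infinite-dimensional."""
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq / (2.0 * sigma ** 2))
```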
5. SUPPORT VECTOR NETWORKS FOR PREFERENCE LEARNING
We show how Neural Networks can be applied to the problem of
preference learning. The learned function should be transitive and
asymmetric.
Theoretical Background. The most important problem in solving
preference learning problems is the definition of an appropriate loss
for each decision $f(x; \alpha)$, where the true ordinal utility is
given by $y$. Since the $y$'s are ordinal, no knowledge is given about
the difference $y - f(x; \alpha)$. The loss given in Equation (1)
weights each incorrect assignment $f(x; \alpha)$ by the same amount and
is thus inappropriate as well. This leads to the problem that no risk
can be formulated to be minimized by a Neural Network learning
algorithm. To derive a Neural Network algorithm we make the assumption
that there is an unknown cardinal utility $U(x)$ that an object $x$
provides to the customer. Moreover, we assume that if $x^{(1)}$ is
preferred over $x^{(2)}$ then $U(x^{(1)}) > U(x^{(2)})$, and vice versa.
The advantage of such a model is that transitivity and asymmetry
are fulfilled for each decision function. In terms of Statistical
Learning Theory this means that our hypothesis space is maximally
reduced: we only want to learn decision functions with these properties.
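The latent-utility assumption can be made concrete with a small sketch. Turning each preference pair $x^{(1)} \succ x^{(2)}$ into a classification example on the difference vector is a standard construction for a linear utility $U(x) = w'x$; the linearity is an illustrative assumption here:

```python
# If x1 is preferred over x2 then U(x1) > U(x2); for a linear utility
# U(x) = w'x this turns each preference pair into one binary
# classification example on the difference vector.

def preference_to_sample(x1, x2):
    """Observed preference x1 ≻ x2 becomes the example (x1 - x2, +1)."""
    return ([a - b for a, b in zip(x1, x2)], 1)

def utility(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def consistent(w, x1, x2):
    """Does the learned utility reproduce the preference x1 ≻ x2?"""
    return utility(w, x1) > utility(w, x2)
```

Because every decision takes the form $U(x^{(1)}) > U(x^{(2)})$, transitivity and asymmetry hold automatically for any $w$.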
Let us illustrate the above discussion with an economic application
example. Consider a situation where two goods compete, i.e. $x =
(x_1, x_2)$ is a vector that describes a basket of two goods.
Assume an agent who has purchased a limited number of combinations. The
agent will order these combinations according to his preferences and
assign a utility level to each combination so as to achieve the
highest possible utility with the next purchase. To simulate this
situation we generated a limited number of combinations and classified
them according to an underlying true latent utility function, $U(x) =
x_1 x_2 / 2$, so as to implement the agent's preference
structure. This utility function is ordinal in the sense that any
monotonic transformation of it would not affect the resulting order of
combinations. The only given information is the set of ordered objects.
The process of learning the utility function is then simulated with a
Support Vector Network that represents, metaphorically, the learning
capacity of the agent.
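The simulation setup can be sketched as follows; the four baskets and the cubing transform are illustrative assumptions, used only to check the ordinal invariance the argument relies on:

```python
# Baskets of two goods ranked by the true latent utility
# U(x) = x1 * x2 / 2; a monotonic transform (cubing, valid here since
# all utilities are positive) leaves the ordering unchanged.

def latent_utility(x):
    return x[0] * x[1] / 2.0

baskets = [(1.0, 4.0), (2.0, 1.0), (3.0, 3.0), (0.5, 2.0)]

order = sorted(baskets, key=latent_utility)
order_transformed = sorted(baskets, key=lambda x: latent_utility(x) ** 3)
```

Only `order` would be handed to the learner; the latent utility itself stays hidden, exactly as in the text.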
6. CONCLUSION
We presented three commonly used learning algorithms: perceptron
learning, backpropagation learning, and radial basis function learning.
We distinguished three types of economic applications of Neural
Networks: classification of economic agents, time series prediction,
and the modeling of boundedly rational agents. While according to the
literature Neural Networks performed well, and often better than
traditional linear methods, when applied to classification tasks, their
performance in time series prediction was often reported to be only as
good as that of traditional methods. Finally, choosing Neural Networks
as models for boundedly rational artificial adaptive agents appears to
be a viable strategy, although alternatives exist. We presented a new
learning method, the so-called Support Vector Learning, which is based on
Statistical Learning Theory, shows good generalization and is easily
extended to nonlinear decision functions. This algorithm was used to
model a situation where a buyer learns his preferences from a limited
set of goods and orders them according to an ordinal utility scale.
7. REFERENCES
Baum, E. (1988). On the capabilities of multilayer perceptrons,
Journal of Complexity, 3, pp. 331-342
Coleman, K.; Graettinger, T. & Lawrence, W. (1991). Neural
Networks for Bankruptcy Prediction: The Power to Solve Financial
Problems, AI Review, July/August, pp. 48-50
Pollard, D. (1984). Convergence of Stochastic Processes, New York:
Springer-Verlag
Vapnik, V. (1998). Statistical Learning Theory, New York: John
Wiley and Sons
Vapnik, V. (1982). Estimation of Dependences Based on Empirical
Data, New York: Springer-Verlag