Configuring artificial neural networks for stock market predictions.
Ruxanda, Gheorghe ; Badea, Laura Maria
Introduction
Driven by the prospect of high profits and benefits that can be
extracted from speculative activities, over the past decades, predicting
stock market prices has become an attainable goal for business
practitioners due to powerful estimation tools and advanced computing
resources. Available theories such as efficient market hypothesis (EMH)
and random walk support the idea that it is virtually impossible to
forecast stock prices. According to EMH, stock values reflect all
available information, any new knowledge quickly being absorbed by the
market in an efficient manner. On the other hand, the random walk theory
assumes that past values do not impact the current values as no trend
exists, all variations being the result of a random process.
Nevertheless, with the proper approach and advanced forecasting
models, market behaviour can be anticipated and exploited. For many years, the
classical Auto-Regressive Integrated Moving Average (ARIMA) technique
has been used to make stock exchange predictions. However, the stock
market evolution is difficult to predict using linear approaches. Recent
advancements in computational area make non-linear models a viable
option for time series estimations, and Artificial Neural Networks
(ANNs) are such mathematical representations, very popular these days in
many fields, including stock market prediction.
In this study differently configured ANNs are built and compared in
terms of forecasting errors when making predictions on Bucharest Stock
Market Index, BET. The paper is organized as follows: Section 1 gives an
overview on the related work regarding the use of ANNs in stock market
predictions. Section 2 presents relevant information about Bucharest
Stock Exchange, establishing a context. Information about the employed
data and the sampling technique is also briefly discussed in this part of the
paper. Section 3 details the model building steps and presents the
mathematical backgrounds of some optimization algorithms used for
network training: gradient descent and Broyden-Fletcher-Goldfarb-Shanno
method (henceforth also BFGS). These are further used to make
predictions on BET index. This section also presents a powerful
algorithm for numerical differentiation, which is a method used to
evaluate first order and second order derivatives. Section 4 provides
the results of the models built with ANNs and those obtained when
testing the networks on the Croatian market data. At the end of the
paper the main conclusions are presented regarding the use of Artificial
Neural Networks in stock market forecasting applications, and proposals
for future developments.
1. ANNs and stock market forecasting--literature review
After the period of scepticism in the late 1960s and the 1970s,
when Minsky and Papert (1969) criticised Neural Networks,
especially the perceptron model, researchers have delivered many improvements
and developments that support the use of ANNs. As such,
nowadays, ANNs have gained success in many areas, from mathematics and
informatics to medicine and economics (Iordache, Spircu 2011). Given
their flexibility in handling non-linear data, and considering superior
results offered when compared with traditional estimation techniques,
over the past years, Artificial Neural Networks have been intensely used
in forecasting applications. Exchange rate prediction and stock price
forecasting are common areas where ANNs have proven the ability to reach
good results. Egeli et al. (2003) used ANNs to predict the Istanbul Stock
Exchange market index and observed that these methods attain better
results than the Moving Average approach. Coupelon (2007) also
showed that Artificial Neural Networks provide good solutions when
predicting stock movements. Faria et al. (2009) compared the forecasting
power of ANNs with that of the adaptive exponential smoothing method
using the principal index of the Brazilian stock market. Isfan et al.
(2010) compared ANNs with traditional estimation techniques using
forecast results obtained on Portuguese stock market, proving also that
Neural Networks are very efficient when dealing with the non-linear
character of financial data. Also, Georgescu (2010) used one-value-ahead
Neural Networks forecasting methods to make stock exchange predictions
on the Romanian market.
More recently, Vahedi (2012) used ANNs to predict stock price in
Tehran Stock Exchange using investment income, stock sales income,
earnings per share and net assets. His results support the ability of
ANNs to perform well in stock exchange forecasting applications by using
quasi-Newton training algorithms. Using the observed values between 2003
and 2006 of the Nigerian Stock Exchange (NSE) market index, Idowu et al.
(2012) showed that ANNs can generate good predictions, however, without
disregarding the configuration process which needs to be performed very
carefully and which represents an essential factor in generating
meaningful results with these modelling techniques. Khan and Gour (2013)
compared the forecasting power of different technical methods with that of
ANNs and concluded that back-propagation Neural Networks generate better
outcomes.
2. Datasets and sampling methods
The Romanian Stock Market had a tradition of over 70 years before
restarting its activity in November 1995. Starting from 1996, when
the mass management/employee buy-out (MEBO) process took place, the
number of transactions performed by Bucharest Stock Exchange Market has
significantly increased, marking the beginning of a viable and promising
trading mechanism. In 1997, the Bucharest Stock Exchange introduced
its first synthetic index, BET, aimed to give a general image of
the stock exchange performance. BET is a weighted index of the
free-float capitalization of the top ten most liquid companies listed at
the Bucharest Stock Exchange. The liquidity ratio is calculated
semi-annually and this methodology allows BET index to represent a
support asset for financial derivatives and for structured products.
The evolution of the Romanian stock exchange followed an increasing
trend after the implementation of BET index, gaining the attention of
many investors. In 2004, Bucharest Stock Exchange capitalization reached
the level of almost 12 billion USD, which back then represented about
17% of the Romanian GDP. In 2006, the average value of daily
transactions surpassed the 10 million EUR threshold, this being also
highlighted by the ascending trend of the Bucharest Stock Market index,
BET. The peak was reached in July 2007, when BET index hit a value
ten times higher than in September 1997, when the index was first
introduced. Thus, mid-2007 brought a maximum point for BET index,
also marking the beginning of the Romanian Stock Exchange decline.
In 2008, the Romanian Stock Market severely felt the shocks
induced by the economic crisis, moving towards a new inflexion point
reached at the beginning of 2009, representing a new minimum this time.
Since then, the general evolution of BET index outlined a rising trend.
Using the data observed between 1st of January 2005 and 31st of
March 2013, the aim of this study was to forecast BET index value using
lagged prices and macroeconomic indicators which might prove relevant
in explaining the evolution of this indicator. Previous
studies (Zoicas, Fat 2005; Ungureanu et al. 2011) have already
emphasized links between certain Bucharest Stock Market indexes and
macroeconomic indicators like: EUR/RON exchange rate (where RON denotes
the Romanian New Leu), unemployment rate, inflation rate and different
forms of interbank average interest rates (ROBID and ROBOR by maturity
bands (1)). Nevertheless, considering the low frequency of the
information provided by the inflation rate and unemployment rate, only
the other two indicators (EUR/RON and interest rates), which offer daily
observations, were further considered in this study.
From an investment perspective, ROBID represents an
alternative to stock exchange and foreign currency placements.
Thus, this leaves only the EUR/RON and ROBID macroeconomic indicators for
the analysis. In order to prevent the model from considering irrelevant
information which unnecessarily overloads the training process of ANNs,
a simple regression model between BET index and each of the remaining
macroeconomic indicators was computed. Table 1 shows that ROBID_12M
provided the highest R-square (determination coefficient) for BET
compared with the other ROBID ratios, namely 0.101884. However, this is
still much lower than that of the EUR/RON indicator, which generated an
R-squared of 0.556330. Thus, only the EUR/RON exchange rate was further kept
for predicting BET index values.
Figure 1a provides the evolution of the closing BET index over the
selected timeframe compared with EUR/RON exchange rate available on the
official website of the National Bank of Romania. Missing values
resulting from non-trading days, such as legal holidays, were
replaced by the values from the most recent available days. The negative
correlation between the two time series is visible with the naked eye,
motivating the further search for connections between these
two series. Up until mid-2007, the blooming Romanian economy was
reflected by a rising trend observed in the evolution of BET index and
by significant appreciations of the national currency as related to EUR.
However, beginning with 2008, these two series have taken opposite
trends, with severe depreciations observed especially from October 2008,
the point when the worldwide economic crisis set in.
When building a model with Artificial Neural Networks, data
partitioning is a very important step. The initial data was split into
three datasets, as follows:
--Training set--80% of the initial dataset, which was used for
model development (1st of January 2005--5th of August 2011);
--Validation set--10% of the initial dataset, used for model
assessment (8th of August 2011--31st of May 2012);
--Test set--the remaining 10% of the initial dataset, which offers
a completely out-of-time reassessment of the model (1st of June
2012--31st of March 2013).
[FIGURE 1 OMITTED]
The partitioning rule is based on the chronological dimension,
meaning that the oldest 80% of the values have fallen into the training
set, the following 10% of these were included in the validation set, and
the most recent 10% were part of the test set.
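For illustration, the chronological split described above can be sketched as follows (Python/pandas; a minimal sketch assuming the observations are stored in a DataFrame sorted in ascending order by date):

import pandas as pd

def chronological_split(df, train_frac=0.80, valid_frac=0.10):
    # df is assumed to be sorted chronologically, oldest observation first
    n = len(df)
    n_train = int(n * train_frac)
    n_valid = int(n * valid_frac)
    train = df.iloc[:n_train]                    # oldest 80% for model development
    valid = df.iloc[n_train:n_train + n_valid]   # next 10% for model assessment
    test = df.iloc[n_train + n_valid:]           # most recent 10% for out-of-time testing
    return train, valid, test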
For additional out-of-sample testing of the final results obtained
on the Romanian market data, the models will be also checked on the
Croatian Stock Exchange official index, CROBEX. Croatia is a country
situated between South-Eastern Europe and Central Europe that has made
remarkable progress over the past years, supporting its accession to the EU.
CROBEX was introduced in 1997 and it measures the performance of Zagreb
Stock Exchange (ZSE) by including the 25 most liquid companies listed at
ZSE. Figure 1b shows that within the time period 1st of January
2005-31st March 2013 there is a high resemblance between BET index and
CROBEX index evolutions. Also, the relationship between EUR/HRK exchange
rate and Zagreb Stock Exchange indicator highlights a similar evolution
to the one observed on the Romanian market between BET stock index and
EUR/RON exchange rate, suggesting that this testing data was properly
chosen. Nevertheless, the testing will be performed only on the data
corresponding to the period included in the test set of the Romanian
data set, meaning on the timeframe 1st of June 2012-31st of March 2013.
3. Configuring ANNs for stock exchange predictions
Artificial Neural Networks are modelling techniques which have
successfully been used in previous stock exchange forecasting
applications. However, as with any other Neural Network model, their
performance depends on a number of elements such as: the network type,
the training method and other configuration components that will be
further approached. Often, some of these elements are selected based on
a process of trial-and-error comparison aimed to identify the model with
the lowest error, usually on the test set. However, the final decision
should always be based on a trade-off between training costs and
the benefits obtained from using a certain network.
The basic principle of ANNs lies in generating a signal or an
outcome based on a weighted sum of inputs which is afterwards passed
through an activation function as below:
y = f\left(\sum w x + b\right), (1)
where: x is the vector of input variables; w is the vector of
weights; b is the bias; f(.) is the activation function, and y is the
output vector.
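As a minimal illustration of Equation (1), a single node can be written as follows (Python/NumPy sketch; the logistic activation shown here is only one possible choice of f):

import numpy as np

def node_output(x, w, b, f=lambda z: 1.0 / (1.0 + np.exp(-z))):
    # Equation (1): weighted sum of inputs plus bias, passed through the activation f
    return f(np.dot(w, x) + b)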
One of the most used types of ANNs within stock exchange
applications is the feed-forward multilayer perceptron (MLP). This is
organized in three categories of layers (input layer, hidden layers and
output layer) and the information flow is performed in a feed-forward
manner. In this work, differently configured multilayer feed-forward
Neural Networks are developed to make predictions on Bucharest Stock
Exchange BET index. These configurations are further detailed in the
upcoming sections.
3.1. Input and output variables
This study seeks to predict BET index evolution using lagged
values, but also the signals induced by the evolution of EUR/RON
exchange rate. Egeli et al. (2003) used the price of the Istanbul Stock
Exchange value from one day before and the previous day TL/USD exchange
rate, along with other variables to predict the evolution of the stock
exchange index. Considering the non-stationary character of the stock
exchange data series (Georgescu 2011), the first difference of the log
time series were performed on BET index and on EUR/RON exchange rate.
Usually, for financial predictions the best outputs are reached when
short forecasting periods are considered. Therefore, the time horizon to
be predicted was set to one day ahead.
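The log-difference transformation applied to both series can be sketched as follows (Python/NumPy; a minimal sketch assuming the daily closing levels are stored as arrays):

import numpy as np

def log_diff(levels):
    # d_ln_X_t = ln(X_t) - ln(X_{t-1}), applied to the BET and EUR/RON closing levels
    return np.diff(np.log(levels))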
In this paper, input variables were established based on a stepwise
forward regression. Thus, for the EUR/RON exchange rate two lags were
tested, and for BET index five previous lags were evaluated with
respect to their p-values. Stepwise forward regression is performed by
initially estimating a linear regression of the dependent variable
against each independent variable. After selecting the independent
variable with the lowest p-value, all possible two-variable regressions
are computed in which one of the variables is the one found significant
in the initial estimation. If several of the two-variable
regressions are significant with respect to the p-values, then the model
which generates the lowest p-values is selected. Next, both of the added
variables are checked against the backwards p-value criterion, and
variables with p-value higher than the selected criterion are removed
from the model. After that, the next variable is added after choosing
the three-variable regression with the lowest p-values. After each new
variable entry, all included variables are again tested against the backwards
criterion and removed from the model if they do not meet the p-value
backwards criterion. The process stops when the lowest p-value of the
variables not yet included in the regression exceeds the forward
stopping criterion.
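A simplified sketch of this selection logic is given below (Python/statsmodels; the variable names and helper structure are illustrative, and the 0.1/0.1 thresholds correspond to the stopping criteria used in this study):

import statsmodels.api as sm

def stepwise_forward(y, X, p_in=0.1, p_out=0.1):
    # y: dependent series; X: DataFrame of candidate lagged predictors
    selected = []
    while True:
        remaining = [c for c in X.columns if c not in selected]
        if not remaining:
            break
        # forward step: the candidate with the lowest p-value enters if it is below p_in
        pvals = {c: sm.OLS(y, sm.add_constant(X[selected + [c]])).fit().pvalues[c]
                 for c in remaining}
        best = min(pvals, key=pvals.get)
        if pvals[best] >= p_in:
            break
        selected.append(best)
        # backward step: drop included variables whose p-value exceeds p_out
        while selected:
            fitted = sm.OLS(y, sm.add_constant(X[selected])).fit()
            worst = fitted.pvalues[selected].idxmax()
            if fitted.pvalues[worst] <= p_out:
                break
            selected.remove(worst)
    return selected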
Table 2 provides the results of the stepwise forward regression
using the selected stopping criteria: p-value forward greater than 0.1
and p-value backwards exceeding 0.1. The stepwise regression was
performed using the training and validation datasets. Results highlight
that the following indicators should be used as input variables for
predicting BET index at time point t:
--d_ln_BET_(-1)--the modification of BET index from one day before;
--d_ln_BET_(-3)--the modification of BET index from three days
before;
--d_ln_EUR_RON_(-2)--the modification of EUR/RON exchange rate from
two days before.
3.2. Hidden layers and hidden nodes
Even though there is no rule of thumb when setting the number of
hidden layers and hidden nodes, care must be taken when selecting these
elements. A high number of hidden nodes might generate an over-fitted
model, while a network with a small number of hidden units is at risk of
performing poorly on new observations. Usually, these elements are
selected after performing a series of experiments in which different
values are tested and final forecasting errors are compared. Thenmozhi
(2006) outlines that most of the studies on stock exchange prediction
using ANNs include up to 12 hidden nodes. However, past research has
only given some hints on how to set these values, the end decision still
depending on the analysed problem and, more precisely, on the available
data. For the current experiment one hidden layer was selected, and the
number of hidden nodes ranged between a minimum of 2 and a maximum of 6,
twice the number of input variables (Jha 2009).
3.3. Activation functions
Activation functions are applied to the weighted sum of inputs of a
node in order to generate a certain outcome. Sibi et al. (2013) provide
examples of activation functions that can be used when training Neural
Networks with the back-propagation method. As the non-linear character
of ANNs is given by the form of the activation functions, the most
common types especially within the hidden layers are those taking a
non-linear form. Among these, sigmoid ("S" shape) functions
are often preferred for their continuous character, which makes
differentiation possible, an important feature when training with
back-propagation, but also for their bounded range ([0,1] or [-1,1]),
which makes them easily interpreted. Some of the most common types of
sigmoid functions are:
--Logistic function (Verhulst 1845):
f(x) = \frac{1}{1 + e^{-x}}; (2)
--Hyperbolic tangent function (tanh) (Abbe Sauri 1774):
f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}; (3)
--Elliott function (Elliott 1993):
f(x) = \frac{x}{1 + |x|}. (4)
Based on previous studies and indications (Kaastra, Boyd 1995;
Bishop 1995), this paper analyses, in turns, the use of logistic and
hyperbolic tangent functions in the hidden layer. In the output layer,
linear activation function was selected for all networks built.
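The three functions in Equations (2)-(4) can be written directly as follows (Python/NumPy sketch; the hyperbolic tangent is kept in its explicit exponential form to mirror Equation (3)):

import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))                              # Equation (2)

def hyperbolic_tangent(x):
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))   # Equation (3), equivalent to np.tanh

def elliott(x):
    return x / (1.0 + np.abs(x))                                 # Equation (4)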
3.4. Error function
Every training cycle of an ANN generates a certain cost, measured
by using an error function which analyses the differences between the
network outputs and the target (desired) outputs. During each cycle, the
error corresponding to all training observations is reassessed, further
generating new adjustments in the network weights with the purpose of
minimizing the selected error function. An error function often used when
training MLP networks is the sum of squared errors, taking the following form:
SOS_{dataset} = E(w) = \frac{1}{2} \sum_{d=1}^{m} \left(t^{d} - o^{d}\right)^{2}, (5)
where: SOS_{dataset} is the error calculated on the analysed dataset;
m is the number of observations within the dataset; t^{d} is the target
value for observation d; and o^{d} is the output of the network for
observation d.
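In code, Equation (5) reduces to a one-line computation (Python/NumPy sketch):

import numpy as np

def sos(targets, outputs):
    # Equation (5): half the sum of squared target-output differences over the dataset
    return 0.5 * np.sum((np.asarray(targets) - np.asarray(outputs)) ** 2)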
3.5. Training algorithms
The way in which the network weights are adapted for meeting the
desired purpose defines the training algorithm and is essentially an
optimization problem. When learning with ANNs, the optimization problem
becomes the minimization of the error function E(w). For networks having
more than one layer of weights, there may be many local minima points
for which the gradient of the weights space satisfies the condition
[partial derivative]E/[partial derivative]w = 0. Therefore, in search of
that global minimum point for which the error function has the lowest
value, several adjustments are performed using the formula below:
w_{t+1} = w_{t} + \Delta w_{t+1}, (6)
where t is the number of the training cycle (epoch).
Local optimization can be divided into the following three classes:
non-derivative methods, first derivative (gradient) methods, and second
derivative methods. Financial applications mostly use the first derivative
gradient descent algorithm to adjust the weights during the training cycles.
Nevertheless, the reputation of this algorithm is often overshadowed by the
local minima problem and by a slow convergence process.
Second derivative optimization methods use the
Hessian matrix to determine the search direction. Examples of second
derivative methods are discrete Newton, quasi-Newton, and
Levenberg-Marquardt. Newton's methods assume that the objective
function can be locally approximated as a quadratic around the optimum,
and use the first and second derivatives to find the stationary point.
In quasi-Newton methods the Hessian matrix of second derivatives of the
function to be minimized does not need to be computed at any stage. The
Hessian is updated by analysing successive gradient vectors instead.
Past studies (Ruxanda, Smeureanu 2012; Antucheviciene et al. 2012;
Dadelo et al. 2012) indicate that decision making is mostly about
finding a preferable solution within an acceptable decision time and with
a bearable error level. This is where optimization algorithms play an
important role, as they offer a solution for the trade-off between
decision time and results. The optimization algorithms presented
below will be further analysed with respect to their errors when
performing stock market predictions.
3.5.1. Gradient descent
First introduced by Rumelhart et al. (1986), back-propagation
algorithm using gradient descent technique is a first derivative method
which uses gradient information calculated from the optimization
function to determine the search direction on the response surface
(Ruxanda 2010). Given a three layer MLP (one input layer, one hidden
layer and one output layer), the training process within the
back-propagation algorithm is described below.
The network weights are initially set to small random values.
Afterwards, the input model is applied and propagated through the
network generating outputs:
h_{j} = f(net_{j}) = f\left(\sum_{k} w_{jk} x_{k}\right), (7)
where: h_{j} is the output of the hidden unit j; net_{j} is
the input of the hidden node j; w_{jk} is the weight given to input
k for hidden node j; x_{k} is the input node k; and f(.) is the
activation function for the hidden layer.
These outputs are further used as entries for the output layer.
Weighted and summed up, they are passed through an activation function
in order to produce the final output:
o_{i} = g(net_{i}) = g\left(\sum_{j} w_{ij} h_{j}\right) = g\left(\sum_{j} w_{ij} f\left(\sum_{k} w_{jk} x_{k}\right)\right), (8)
where: o_{i} is the response of the output unit i; net_{i}
is the input of the output node i; w_{ij} is the weight given to the
hidden node j for the output node i; and g(.) is the activation function
from the output layer.
Considering the form of the error function provided in Equation
(5), for p output nodes and m input-output pairs, the error becomes:
E(w) = \frac{1}{2} \sum_{d=1}^{m} \sum_{i=1}^{p} \left(t_{i}^{d} - o_{i}^{d}\right)^{2}. (9)
Then, the errors are passed back through the network using the
gradient method by calculating the contribution of each hidden node and
deriving the adjustments needed to generate a better output. The
gradients for the hidden to output layer, and for the input to hidden
layer are presented in Equations (12) and (15) respectively:
\Delta w_{ij} = -\eta \frac{\partial E}{\partial w_{ij}} = \eta \sum_{d=1}^{m} \left(t_{i}^{d} - o_{i}^{d}\right) g'\left(net_{i}^{d}\right) h_{j}^{d}; (10)
\delta_{i}^{d} = g'\left(net_{i}^{d}\right)\left(t_{i}^{d} - o_{i}^{d}\right); (11)
\Delta w_{ij} = \eta \sum_{d=1}^{m} \delta_{i}^{d} h_{j}^{d}; (12)
\Delta w_{jk} = -\eta \frac{\partial E}{\partial w_{jk}} = \eta \sum_{d=1}^{m} f'\left(net_{j}^{d}\right)\left[\sum_{i=1}^{p} w_{ij} \delta_{i}^{d}\right] x_{k}^{d}; (13)
\delta_{j}^{d} = f'\left(net_{j}^{d}\right) \sum_{i=1}^{p} w_{ij} \delta_{i}^{d}; (14)
\Delta w_{jk} = \eta \sum_{d=1}^{m} \delta_{j}^{d} x_{k}^{d}, (15)
where η is the learning rate.
The new weights can be adjusted using also the momentum rate, which
considers the modifications performed in previous cycles:
\Delta w_{t+1} = -\eta \frac{\partial E(w_{t})}{\partial w_{t}} + \alpha \Delta w_{t}, (16)
where: α is the momentum rate; Δw_{t+1} is the weight modification
for cycle t + 1; and Δw_t is the modification in weights from the
previous cycle.
The learning rate controls the size of the step taken at each
iteration, and the momentum rate speeds up the convergence process in
flat regions, or reduces the jumps in regions with high fluctuations by
considering a fraction of the previous weight change. Although very
popular in practice, a downside of the gradient descent algorithm is
that the learning process is slow and thus, the convergence is highly
dependent on the values chosen for the learning and momentum rates.
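A minimal sketch of one batch training epoch under Equations (7)-(16) is given below (Python/NumPy; it assumes a tanh hidden layer and a linear output layer and omits the bias terms for brevity, so it is illustrative rather than the exact implementation used in this study):

import numpy as np

def gd_epoch(X, T, W_hid, W_out, dW_hid_prev, dW_out_prev, eta=0.1, alpha=0.1):
    # X: (m, k) inputs; T: (m, p) targets; W_hid: (j, k); W_out: (p, j)
    H = np.tanh(X @ W_hid.T)                                 # hidden outputs, Eq. (7)
    O = H @ W_out.T                                          # network outputs, Eq. (8), linear g
    delta_out = T - O                                        # Eq. (11), g'(net) = 1 for a linear output
    delta_hid = (1.0 - H ** 2) * (delta_out @ W_out)         # Eq. (14), tanh'(net) = 1 - h^2
    dW_out = eta * (delta_out.T @ H) + alpha * dW_out_prev   # Eqs (12) and (16)
    dW_hid = eta * (delta_hid.T @ X) + alpha * dW_hid_prev   # Eqs (15) and (16)
    error = 0.5 * np.sum(delta_out ** 2)                     # Eq. (9)
    return W_hid + dW_hid, W_out + dW_out, dW_hid, dW_out, error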
3.5.2. Broyden-Fletcher-Goldfarb-Shanno
Introduced in 1970 (independently by Broyden (1970); Fletcher
(1970); Goldfarb (1970); Shanno (1970)),
Broyden-Fletcher-Goldfarb-Shanno is a quasi-Newton optimization method
which provides good convergence. Although second derivative algorithms
usually require more computational resources, the BFGS algorithm uses only
an approximation of the inverse Hessian matrix rather than its fully
explicit calculation, based on estimations obtained only from first order
information.
In the case of the BFGS algorithm, the necessary condition for optimality
is the minimization of the error function E(w). The weight adjustments
when using the BFGS training algorithm are performed in an iterative manner,
as follows:
\Delta w_{t+1} = w_{t+1} - w_{t} = -\eta H_{t}^{-1} \frac{\partial E(w_{t})}{\partial w_{t}}, (17)
where: t indicates the training cycle; and H_t^{-1} is an approximation
of the inverse Hessian matrix [∂²E(w_t)]^{-1} at time point t.
Quasi-Newton methods require that the approximation of the matrix
H_{t+1}^{-1} satisfies the condition H_{t+1}^{-1} γ_t = δ_t. The
approximation of the inverse Hessian matrix used by the BFGS algorithm
is provided in the equation below:
H_{t+1}^{-1} = H_{t}^{-1} + \left(1 + \frac{\gamma_{t}^{T} H_{t}^{-1} \gamma_{t}}{\delta_{t}^{T} \gamma_{t}}\right) \frac{\delta_{t} \delta_{t}^{T}}{\delta_{t}^{T} \gamma_{t}} - \frac{\delta_{t} \gamma_{t}^{T} H_{t}^{-1} + H_{t}^{-1} \gamma_{t} \delta_{t}^{T}}{\delta_{t}^{T} \gamma_{t}}, (18)
where: δ_t = w_{t+1} - w_t and γ_t = ∂E(w_{t+1})/∂w_{t+1} - ∂E(w_t)/∂w_t.
The initial value of H_0^{-1} is the identity matrix. The adjustment
process is repeated until a stopping criterion is met, such as checking
the performance of the training process on an additional validation
dataset, which prevents over-fitting from affecting the model's
performance on new data.
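One BFGS iteration on a flattened weight vector can be sketched as follows (Python/NumPy; grad_fn is an assumed routine returning ∂E/∂w at a given point, the step length is fixed and no line search is performed, so this is only an illustrative sketch of Equations (17)-(18)):

import numpy as np

def bfgs_step(w, grad_fn, H_inv, eta=1.0):
    g = grad_fn(w)
    step = -eta * H_inv @ g                    # Eq. (17): quasi-Newton search step (delta_t)
    w_new = w + step
    gamma = grad_fn(w_new) - g                 # gamma_t: change in gradient between cycles
    dg = float(step @ gamma)
    if dg > 1e-12:                             # curvature guard before updating H_inv
        Hg = H_inv @ gamma
        # Eq. (18): rank-two update of the inverse Hessian approximation
        H_inv = (H_inv
                 + (1.0 + (gamma @ Hg) / dg) * np.outer(step, step) / dg
                 - (np.outer(step, Hg) + np.outer(Hg, step)) / dg)
    return w_new, H_inv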
Although in the specialized literature many other training
algorithms and derivations from these are proposed (Cocianu, State
2013), in this paper, gradient descent method and BFGS training
algorithm were used to make predictions on BET index values and were
compared in terms of estimation errors.
3.6. Stopping conditions
With non-linear optimization algorithms it is important to choose
certain stopping rules. Bishop (1995) presents five types of stopping
criteria: a fixed number of cycles has been performed, a certain time
has elapsed, the error function has decreased below a certain value, the
relative change in error is below a threshold, or the error calculated
on an independent dataset (validation set) has started to increase,
meaning that there is a risk of over-fitting the model. In this paper,
the training process was set to stop when one of the following events is
first reached: 500 cycles, or a variation of the average error for 20
consecutive epochs below 0.0000001. The error function is the sum of
squared errors between the network outputs and the target values,
computed as in Equation (5). The evolution of the error function on the
training set was also compared with the one from the validation dataset
in order to make sure that additional decreases in the training error
(training SOS) don't bring increases in the validation set error
(validation SOS).
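One way to express this stopping rule in code is sketched below (Python; the "variation of the average error" is interpreted here as the spread of the error over the last 20 epochs, which is an assumption):

def should_stop(errors, max_cycles=500, window=20, tol=1e-7):
    # errors: list of average training errors, one entry per completed epoch
    if len(errors) >= max_cycles:
        return True
    if len(errors) >= window:
        recent = errors[-window:]
        if max(recent) - min(recent) < tol:    # variation over 20 consecutive epochs
            return True
    return False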
3.7. An algorithm for numerical differentiation
The most important aspect of learning algorithms which use gradient and
Hessian matrix information is the numerical evaluation of the first order
and second order derivatives. The success of the ANN training process is
strictly related to the method used to perform this numerical evaluation.
An efficient
numerical differentiation algorithm was proposed by Professor Gheorghe
RUXANDA in the context of developing a language for analysis and
prediction--EMI. The algorithm is based on determining an optimal
variation of the argument for which the differentiation of a real
function is performed, and allows the estimation of first order and
second order derivatives (simple and mixed) with a high precision rate.
Below is presented the description of the algorithm written in
pseudo-code:
algorithm deriv;
external
  function f(x), x;
const
  cmin=$MinMachineNumber, cmax=$MaxMachineNumber, cprec=$MachinePrecision;
  climvs=cmin*10^7, climrs=cmin*10^16, cunit=1.0, cvd=1.005;
  precmax=1.5*10^(-IntegerPart(cprec)), precwrk=0.75*10^(-IntegerPart(cprec-10.5));
begin
  rs=cmin, rd=cunit, xabs=abs(x);
  if xabs > climrs then rs=precmax*xabs endif;
  if xabs > cunit then rd=xabs/precwrk endif;
  val=f(x), vs=abs(val);
  if vs < climvs then vs=cunit endif;
  vs=precwrk*vs, vd=cvd*vs, limps=rs, limpd=rd;
  sign=1.0, iter=1, cont=1;
  while (cvd*limps <= limpd) && (iter <= maxiter) && (cont == 1)
    p=sqrt(limps*limpd), delt=f(x+sign*p)-val;
    if abs(delt) <= vs then
      limps=p;
    else
      if abs(delt) >= vd then limpd=p, cont=0 endif;
    endif;
    if (delt == 0.0) && (iter > 2) then cont=0, break endif;
    iter++;
    if (iter > maxiter) && (cont == 1) && (sign == 1) then
      sign=-1, iter=0, limps=rs, limpd=rd;
    endif;
    if cont == 0 then
      if sign == 1 then
        vder=(f(x+p) - f(x-p))/(2.0*p);
      else
        vder=(val - f(x-p))/p, p=sign*p;
      endif;
      vder2=(f(x+2*p) - 2*f(x) + f(x-2*p))/(4.0*p*p);
    endif;
  endwhile;
  return cont, vder, vder2;
end.
The above pseudo-code of the algorithm is applicable for the
calculation of first order and second order derivatives of a
single-variable function. Nevertheless, the algorithm can easily be
adapted for the evaluation of multi-variable functions. Tested on
several classes of functions, the proposed algorithm has revealed an
average precision of 1.0e-9 for first order derivatives, and an average
precision of 1.0e-5 for second order derivatives. The obtained average
number of iterations needed to determine the optimal variation of the
argument used for differential evaluation equals 12.
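For illustration, the derivative formulas finally applied by the algorithm (the vder and vder2 expressions above) are the standard central differences; a minimal sketch is given below (Python; the step p is assumed to be already determined by the adaptive search, so the optimal-variation logic is omitted):

def central_derivatives(f, x, p):
    # first- and second-order central differences with step p
    d1 = (f(x + p) - f(x - p)) / (2.0 * p)
    d2 = (f(x + 2.0 * p) - 2.0 * f(x) + f(x - 2.0 * p)) / (4.0 * p * p)
    return d1, d2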
4. Results
The model building consisted of generating 100 different networks
from combining the following elements:
--The number of hidden nodes, which varied from 2 to 6;
--The types of activation functions from the hidden layer: logistic
sigmoid and hyperbolic tangent sigmoid;
--The training algorithms: gradient descent (GD) and BFGS.
For each distinct configuration, five networks with different initial
weights were trained to ensure consistent results (initial weights were
drawn from a normal distribution with mean 0 and variance 0.1). For the
gradient descent method, the learning
rate and the momentum parameters were set to 0.1 each. For each type of
training algorithm analysed in this paper, the best five networks in
terms of sum of squared error on the test sample (test SOS) were
retained. The test sample, acting as a totally independent dataset,
gives indications on the model predictive power on new information.
Table 3 provides the errors reached by the best five networks for each
learning algorithm. Values indicate that BFGS provides more accurate
predictions on all three datasets.
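The enumeration of the 100 configurations can be sketched as follows (Python; train_network is a hypothetical helper that builds one MLP with the given configuration, draws its initial weights from N(0, 0.1) using the supplied seed, trains it and returns the test SOS):

import itertools

def run_grid(train_network):
    results = {}
    for nodes, act, algo in itertools.product(range(2, 7), ['tanh', 'logistic'], ['GD', 'BFGS']):
        for seed in range(5):                  # five random initializations per configuration
            results[(nodes, act, algo, seed)] = train_network(nodes, act, algo, seed)
    return results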
The lowest error is achieved by the model MLP 3-4-1 BFGS 7 T which
uses BFGS training algorithm, four hidden nodes and hyperbolic tangent
function in the hidden layer. This network is obtained after performing
seven epochs of weight adjustments and generates a test SOS that is
57% lower than that of the best network using the gradient descent
algorithm.
Considering the overall ten best networks available in Table 3 and
the activation functions used in the hidden layer, hyperbolic tangent
sigmoid function performs better on the Romanian Stock Market data.
Regarding another varied element, the number of hidden nodes, results
showed that networks using the maximum selected number of six hidden
nodes do not appear among the five best-performing models in this
experiment. Therefore, there is no need to include too many hidden
nodes in the network, as this only increases training time and
complexity without improving the outcomes.
Evaluating the results obtained from applying these ten best
networks on the Croatian data (Table 4), we observe that the activation
function that reached the lowest error is logistic sigmoid this time.
Nevertheless, the best network using the BFGS training algorithm, MLP 3-3-1
BFGS 28 L, reached an error that is 69% lower than the one generated
by the best network using the gradient descent method, MLP 3-2-1 GD 3 T.
This gives us reason to state that the BFGS learning algorithm is a better
option when modelling volatile data such as stock market values.
Conclusions
Although Artificial Neural Networks are flexible non-parametric
methods that perform well on non-linear data series, their predictive
power is conditioned by a set of elements which define the network
configuration features. When building a predictive model for stock
market prices with ANNs, it is important that the modeller selects the
proper values for items like: the number of input and output variables,
the number of hidden layers and hidden nodes, the activation functions
in the hidden and output layers, the initial weights, the training
algorithm, and the stopping criteria. Similar to Coupelon's (2007)
remarks, we can state that in most cases there are no values for
these elements that can be considered best choices when forecasting stock
exchange prices, the selection process being based on performing several
experiments and choosing the network that offers the best results in
terms of a performance metric. However, with respect to the training
algorithm, this study has given evidence that BFGS outperforms the
classical gradient descent method, providing lower errors even in the
context of highly volatile data such as the one revealed by the stock
exchange market.
Idowu et al. (2012) pointed out that although ANNs do not allow
perfect estimations on volatile data such as the stock exchange market,
they certainly provide closer results to the real ones compared with
other techniques. In the present study, estimation results have indeed
revealed small errors on the test datasets. Thus, Artificial Neural
Networks can be used in an efficient manner to forecast stock market
prices based on past observations. Therefore, we can affirm that the EMH
theory, which states that stock prices cannot be predicted based on past
values, can be rejected.
Further steps and research directions regarding the evaluation of
ANNs in stock market predictions should consider the following:
--Analyse how results differ when performing random partitioning
for selecting the cases for the training and validation datasets;
--Include more predictors for estimating the stock exchange market,
such as international stock exchange indexes or qualitative factors;
--Use other training algorithms such as Levenberg-Marquardt
(Zayani et al. 2008), which offers an optimized approach to the local
minima problem.
Caption: Fig. 1. BET Index vs. EUR/RON (a); CROBEX Index vs.
EUR/HRK (b)
doi:10.3846/20294913.2014.889051
References
Abbe Sauri, M. 1774. Cours complet de mathematiques. A Paris, Aux
depens de Ruault. 656 p.
Antucheviciene, J.; Zavadskas, E. K.; Zakarevicius, A. 2012.
Ranking redevelopment decisions of derelict buildings and analysis of
ranking results, Economic Computation and Economic Cybernetics Studies
and Research 46(2): 37-62.
Bishop, C. 1995. Neural Networks for pattern recognition. Oxford:
Clarendon Press. 482 p.
Broyden, C. 1970. The convergence of a class of double-rank
minimization algorithms, Journal of Institute Mathematical Applications
6(1): 76-90. http://dx.doi.org/10.1093/imamat/6.1.76
Cocianu, C.; State, L. 2013. Kernel-based methods for learning
non-linear SVM, Economic Computation and Economic Cybernetics Studies
and Research 47(1): 41-60.
Coupelon, O. 2007. Neural network modeling for stock movement
prediction: a state of the art. Blaise Pascal University. 5 p.
Dadelo, S.; Turskis, Z.; Zavadskas, E. K.; Dadeliene, R. 2012.
Multiple criteria assessment of elite security personal on the basis of
ARAS and expert methods, Economic Computation and Economic Cybernetics
Studies and Research 46(4): 65-88.
Egeli, B.; Ozturan, M.; Badur, B. 2003. Stock market prediction
using Artificial Neural Networks, in Proceedings of the 3rd Hawaii
International Conference on Business, 26-28 July, 2012, Honolulu,
Hawaii, USA. 8 p.
Elliott, D. L. 1993. A better activation function for Artificial
Neural Networks. Institute for Systems Research, University of Maryland.
Faria, E. L.; Albuquerque, M. P.; Gonzalez, J. L.; Cavalcante, J.
T. P.; Albuquerque Marcio, P. 2009. Predicting the Brazilian stock
market through Neural Networks and adaptive exponential smoothing
methods, Expert Systems with Applications 36(10): 12506-12509.
http://dx.doi.org/10.1016/j.eswa.2009.04.032
Fletcher, R. 1970. A new approach to variable metric algorithms,
Computer Journal 13(3): 317-322.
http://dx.doi.org/10.1093/comjnl/13.3.317
Georgescu, V. 2011. An econometric insight into predicting
Bucharest stock exchange mean, return and volatility--return processes,
Economic Computation and Economic Cybernetics Studies and Research
45(3): 25-42.
Georgescu, V. 2010. Robustly forecasting the Bucharest stock
exchange BET index through a novel computational intelligence approach,
Economic Computation and Economic Cybernetics Studies and Research
44(3): 23-42.
Goldfarb, D. 1970. A family of variable-metric methods derived by
variational means, Mathematical Computations 24: 23-26.
http://dx.doi.org/10.1090/S0025-5718-1970-0258249-6
Idowu, P. A.; Osakwe, C.; Kayode, A. A.; Adagunodo, E. R. 2012.
Prediction of stock market in Nigeria using artificial neural network,
International Journal of Intelligent Systems and Applications 4(11):
68-74. http://dx.doi.org/10.5815/ijisa.2012.11.08
Iordache, A. M.; Spircu, L. 2011. Using Neural Networks in ratings,
Economic Computation and Economic Cybernetics Studies and Research
45(3): 101-112.
Isfan, M.; Menezes, R.; Mendes, D. A. 2010. Forecasting the
Portuguese stock market time series by using Artificial Neural Networks,
Journal of Physics: Conference Series 221(1): 13 p.
Jha, G. K. 2009. Artificial Neural Networks. Indian Agricultural
Research Institute, PUSA, New Delhi. 8 p.
Kaastra, I.; Boyd, M. S. 1995. Forecasting futures trading-volume
using Neural Networks, The Journal of Futures Markets 15(8): 953-970.
http://dx.doi.org/10.1002/fut.3990150806
Khan, A. U.; Gour, B. 2013. Stock Market trends prediction using
neural network based hybrid model, International Journal of Computer
Science Engineering and Information Technology Research 3(1): 11-18.
Minsky, M.; Papert, S. 1969. Perceptrons. Cambridge: MIT Press. 81
p.
Rumelhart, D.; Hinton, G.; Williams, R. 1986. Learning
representations by back-propagating errors, Nature 323: 533-536.
http://dx.doi.org/10.1038/323533a0
Ruxanda, G.; Smeureanu, I. 2012. Unsupervised learning with
expected maximization algorithm, Economic Computation and Economic
Cybernetics Studies and Research 46(1): 28 p.
Ruxanda, G. 2010. Learning perceptron neural network with
backpropagation algorithm, Economic Computation and Economic Cybernetics
Studies and Research 44(4): 37-54.
Shanno, D. 1970. Conditioning of quasi-Newton methods for function
minimization, Mathematical Computations 24: 647-656.
http://dx.doi.org/10.1090/S0025-5718-1970-0274030-6
Sibi, P.; Allwyn Jones, S.; Siddarth, P. 2013. Analysis of
different activation functions using Backpropagation Neural Networks,
Journal of Theoretical and Applied Information Technology 17(3):
1264-1268.
Thenmozhi, M. 2006. Forecasting stock index returns using Neural
Networks, Delhi Business Review 7(2): 59-69.
Ungureanu, E.; Burcea, F.-C.; Pirvu, D. 2011. The analysis of
interest rate and exchange rate influence's on stock market. Medium
run evidence from Romania, Annals Economic Science Series 17: 163-170.
Vahedi, A. 2012. The predicting stock price using artificial neural
network, Journal of Basic and Applied Scientific Research 2(3):
2325-2328.
Verhulst, P. F. 1845. Recherches mathematiques sur la loi
d'accroissement de la population, Nouveaux Memoires de l'Academie
Royale des Sciences et Belles-Lettres de Bruxelles 18: 41 p.
Zayani, R.; Bouallegue, R.; Roviras, D. 2008. Levenberg-Marquardt
learning neural network for adaptive predistortion for time-varying HPA
with memory, in OFDM Systems: 16th European Signal Processing Conference
(EUSIPCO 2008), 25-29 August, 2008, Lausanne, Switzerland.
Zoicas, I. A.; Fat, M. C. 2005. The analysis of the relation
between the evolution of the Bet Index and the main macroeconomic
variables in Romania (1997-2008), Annals of the University of Oradea:
Economic Science 3(1): 632-637.
Received 28 May 2013; accepted 24 November 2013
Gheorghe RUXANDA, Laura Maria BADEA
Bucharest University of Economic Studies, 15-17 Calea Dorobanti,
District 1, Bucharest, Romania
Corresponding author Laura Maria Badea
E-mail: laura.maria.badea@gmail.com
(1) ROBID is the interbank average interest rate for deposits, and
ROBOR is the interbank average interest rate for loans granted. Each one
is available for 8 maturities: overnight (ROBID_ON), tomorrow-next
(ROBID_TN), one week (ROBID_1W), one month (ROBID_1M), three months
(ROBID_3M), six months (ROBID_6M), nine months (ROBID_9M), and twelve
months (ROBID_12M).
Gheorghe RUXANDA. PhD in Economic Cybernetics, Editor-in-chief of
ISI Thomson Reuters Journal "Economic Computation and Economic
Cybernetics Studies and Research" and Director of Doctoral School
of Economic Cybernetics and Statistics. He is a Full Professor and PhD
Adviser within the Department of Economic Informatics and Cybernetics,
The Bucharest Academy of Economic Studies. He graduated from the Faculty
of Economic Cybernetics, Statistics and Informatics, Academy of Economic
Studies, Bucharest (1975) where he also earned his Doctor's Degree
(1994). He has had numerous research visits to the USA, England and France. He is a
Full Professor of Multidimensional Data Analysis (Doctoral School), Data
Mining and Multidimensional Data Analysis (Master Studies), Modeling and
Neural Calculation (Master Studies), Econometrics and Data Analysis
(Undergraduate Studies). Scientific research activity: over 35 years of
scientific research in both theory and practice of quantitative economy
and in coordinating research projects; 50 scientific papers presented at
national and international scientific sessions and symposia; 65
scientific research projects with national and international financing;
79 scientific papers published in prestigious national and international
journals in the field of economic cybernetics, econometrics,
multidimensional data analysis, microeconomics, scientific informatics,
of which eleven papers were published in ISI Thomson Reuters
journals; 18 manuals and university courses in the field of
econometrics, multidimensional data analysis, microeconomics, scientific
informatics; 31 studies of national public interest developed within the
scientific research projects. Fields of scientific competence:
evaluation, measurement, quantification, analysis and prediction in the
economic field; econometrics and statistical-mathematical modelling in
the economic-financial field; multidimensional statistics and
multidimensional data analysis; pattern recognition, learning machines
and Neural Networks; risk analysis and uncertainty in economics;
development of software instruments for economic-mathematical modelling.
Laura Maria BADEA is a PhD candidate in Economic Cybernetics at the
Bucharest Academy of Economic Studies, has an MA in Corporate Finance
(2010) and graduated from the Faculty of Finance, Insurance, Banking and
Stock Exchange from Bucharest University of Economic Studies (2008).
Scientific research activity: 2 published articles in ISI Thomson
Reuters Journals. Fields of scientific interest: machine learning and
other modelling techniques used for classification matters in economic
and financial domains, with a focus on Artificial Neural Networks.
Table 1. R-squared results for simple regressions of BET index against ROBID and EUR/RON

Indicator     R-squared
ROBID_ON      0.028535
ROBID_TN      0.033433
ROBID_1W      0.050462
ROBID_1M      0.056978
ROBID_3M      0.077268
ROBID_6M      0.092286
ROBID_9M      0.101399
ROBID_12M     0.101884
EUR/RON       0.556330
Table 2. Input variables for BET index
Dependent Variable: D_LN_BET_T
Method: Stepwise Regression
Included observations: 1924 after adjustments
Number of always included regressors: 3
Selection method: Stepwise forwards
Stopping criterion: p-value forwards/backwards = 0.1/0.1
Variable Coefficient Std. Error t-Statistic Prob.
D_LN_BET_T_1 0.077819 0.022683 3.430684 0.0006
D_LN_BET_T_3 -0.038682 0.022700 -1.704051 0.0885
D_LN_EUR_RON_T_2 -0.274951 0.094916 -2.896787 0.0038
Table 3. Neural Networks' results on Romanian data

Network name         Training SOS  Validation SOS  Test SOS   Training algorithm  No. of cycles  Hidden layer activation function
MLP 3-4-1 BFGS 7 T   3.225E-01     2.345E-02       8.120E-03  BFGS                7              Tanh
MLP 3-5-1 BFGS 5 T   3.226E-01     2.342E-02       8.130E-03  BFGS                5              Tanh
MLP 3-3-1 BFGS 28 L  3.191E-01     2.319E-02       8.140E-03  BFGS                28             Logistic
MLP 3-2-1 BFGS 5 L   3.225E-01     2.345E-02       8.140E-03  BFGS                5              Logistic
MLP 3-3-1 BFGS 7 T   3.224E-01     2.348E-02       8.150E-03  BFGS                7              Tanh
MLP 3-2-1 GD 3 T     3.932E-01     3.142E-02       1.908E-02  GD                  3              Tanh
MLP 3-3-1 GD 3 T     3.933E-01     3.144E-02       1.910E-02  GD                  3              Tanh
MLP 3-2-1 GD 4 T     3.938E-01     3.150E-02       1.916E-02  GD                  4              Tanh
MLP 3-5-1 GD 7 T     3.938E-01     3.149E-02       1.917E-02  GD                  7              Tanh
MLP 3-2-1 GD 5 T     3.946E-01     3.159E-02       1.926E-02  GD                  5              Tanh

Note: In the case of all networks, a linear activation function was used in the output layer.
Table 4. Results on Croatian data
Network name SOS for Croatian data
MLP 3-3-1 BFGS 28 L 4.572E-03
MLP 3-5-1 BFGS 5 T 4.573E-03
MLP 3-2-1 BFGS 5 L 4.580E-03
MLP 3-3-1 BFGS 7 T 4.587E-03
MLP 3-4-1 BFGS 7 T 4.605E-03
MLP 3-2-1 GD 3 T 1.473E-02
MLP 3-3-1 GD 3 T 1.476E-02
MLP 3-2-1 GD 4 T 1.481E-02
MLP 3-2-1 GD 5 T 1.482E-02
MLP 3-5-1 GD 7 T 1.490E-02