Sustainable economy inspired large-scale feed-forward portfolio construction.
Raudys, Sarunas; Raudys, Aistis; Pabarskaite, Zidrina
Introduction
Sustainable economy. In a general sense, sustainability is the
capacity to support, maintain or endure; sustainability can help a
community ensure that its social, economic and environmental systems are
well integrated and will endure. One of the main sustainability principles
requires the enhancement of local economic vitality. One must consider
this at every step of the process, regardless of whether the process
originates in manufacturing or in teaching.
Large-scale processes are often difficult to optimize and raise
sustainability issues. This has been noted in several research studies,
including studies of algae-based biofuel production (Board on Agriculture
and Natural Resources 2012).
Therefore, the natural way to combat large-scale optimization problems
is to break the problem down into smaller ones and address them locally,
later joining the results into one large global solution. Small-scale
solutions are easier to make sustainable, as illustrated by the fishery
industry (Cochrane et al. 2011). We follow this principle by creating
and analysing a multilevel feed-forward automated trading system
portfolio.
Portfolio construction. Financial problems play one of the leading
roles in the evolution of modern society. The diversity of assets or
financial trading participants is of great importance in this respect.
Diversification is also highly important for the sustainable development
of society (Jeucken 2001; McCormick 2012). Portfolio construction is a
field in which a large quantity of data is available and in which it is
easy to perform computer simulations. In this paper, we apply sustainable
economy principles to large-scale portfolio construction problems, where
we may have thousands of potential portfolio members. At the same time,
the task of portfolio management (PM) is one of the principal research
topics in financial markets, with clearly expressed performance
criteria. To create the portfolio, we use the formula

$$x_{Pi} = \mathbf{x}_{ri}\,\mathbf{w}_r^{T} = \sum_{j=1}^{N} w_{rj}\, x_{ji}, \qquad (1)$$

where $\mathbf{w}_r = (w_{r1}, w_{r2}, \ldots, w_{rN})$ is a multidimensional weight vector that determines the "optimal" investment proportions for the N trading robots. In the analysis in this paper, we suppose that the coefficients $w_{rj}$ fulfil the requirements

$$w_{rj} \geq 0, \qquad \sum_{j=1}^{N} w_{rj} = 1. \qquad (2)$$

To find the weights, one uses L days of investment profit history, $x_{ji}$ ($i = 1, 2, \ldots, L$; $j = 1, 2, \ldots, N$).
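As a concrete illustration of Eqs. (1)-(2), the short sketch below (hypothetical numbers, NumPy assumed) computes the portfolio return series from a small profit/loss matrix and a weight vector satisfying the constraints.

```python
import numpy as np

# Hypothetical daily profit/loss history of N = 4 trading robots over L = 5 days
# (x_ji: rows are days i, columns are robots j).
X = np.array([[10.0, 0.0, -5.0,  0.0],
              [ 0.0, 3.0,  0.0,  2.0],
              [-4.0, 0.0,  6.0,  0.0],
              [ 0.0, 1.0,  0.0, -1.0],
              [ 2.0, 0.0,  3.0,  0.0]])

w_r = np.array([0.4, 0.1, 0.3, 0.2])            # w_rj >= 0 and sum to 1, Eq. (2)
assert np.all(w_r >= 0) and np.isclose(w_r.sum(), 1.0)

x_P = X @ w_r                                   # Eq. (1): x_Pi = sum_j w_rj * x_ji
print(x_P)
```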
Automated trading systems. During the last decade, automated
trading systems (ATS) and especially high frequency trading (Chan 2008;
Aldridge 2010) have become very popular. They potentially promise good
returns with relatively small risk (Bookstaber 2009): positions are held
for a short time (from a few seconds to a few hours or a day), so
individual losses are small and the risk of large losses is minimized.
Profit is possible due to very short-term market inefficiencies. Usually,
the number of ATS is larger than the number of available assets, as it is
possible to create several different types of ATS trading a single asset.
With a high number of ATS, however, we face large-scale portfolio
construction problems.
Large-scale portfolio problems. Large-scale portfolio construction
becomes a significant research problem when the number of trading
strategies, N, is larger than the number of data points, L. In such
cases, small sample size problems arise in portfolio optimization. We
have a huge number of inputs and know little about the returns of these
time series and the correlations between them.
Portfolio of assets vs. trading systems. Historically, the majority
of efforts have been aimed at creating portfolios from a set of assets
(stocks, bonds, etc.), where we have a profit or loss every day. In
contrast, we use ATS instead of assets, so on any given day we may have a
profit, a loss or zero (zero if the ATS is not trading). To date, there
have been very few attempts to relate analyses of "economic crises" to
portfolio management problems using artificial and swarm intelligence
(Cura 2009) approaches.
Portfolio management approaches comprise two different lines of
attack: the first is based on the assets and the second on the successes
of automated trading systems. ATS trade infrequently and therefore the
data matrix is sparse; the data consist of roughly 70% zeros, and the
distribution of the data becomes bimodal.
Specificity of this paper. In this research, we aim to consider
large-scale sustainable economy problems inspired by ATS portfolio
construction. Here, a variety of diverse human factors is incorporated
in the trading software and plays a prominent role. In our analysis,
portfolio creation is not based on assets. This aspect distinguishes it
from previous research. The method of portfolio construction implemented
uses information on the successes and losses of the ATS that trade the
assets. In addition, the underlying trading systems are short term/high
frequency, so aggregated returns are not correlated to any great extent
to variations in the underlying instruments.
An additional idiosyncrasy of our analysis is the extremely large
number (11,730) of potentially useful trading robots and relatively
small (2,398) sample size (number of days) used to design portfolio
weights in situations where the data structure is changing constantly.
The literature on this topic is very sparse. Research studies have tended
to focus on trading systems, portfolio construction methods or
multi-agent systems (Smeureanu et al. 2012) separately. Some have
optimized trading system portfolios (Moody, Wu 1997; Dempster, Jones
2001), but very few have addressed large-scale trading system
portfolio optimization problems (Perold 1984). Multi-agent systems are
rarely used in trading, although occasional examples can be found
(Araujo, de Castro 2011). In the past, only the authors of this paper
have looked into large-scale feed-forward ATS portfolio construction
(Raudys, S., Raudys, A. 2011, 2012) and multivariate statistical
analysis of multidimensional investment tasks (Raudys 2013) as a whole.
The paper is organized as follows. In Section 1 we describe the
data. In Sections 2 and 3 we itemize the portfolio management tasks and
describe the novel decision-making methodology. In the last two
sections, we present the discussion and conclusions.
1. Data description
Certain economic simulations are difficult to perform due to the
lack of data, poor data quality or insufficient data. Thus, in some
cases it is useful to choose a related problem for which sufficient data
are available. This is especially acute for large-scale sustainability
problems, where the number of factors is huge. Sustainable environment
analysis is dispersed across a large number of diverse papers, and today
it is impossible to collect large-scale data with lengthy histories
covering the huge number of factors affecting sustainability. In the
analysis of ATS, however, we can study thousands of them and generate
lengthy data histories. The ATS are also mutually dependent.
1.1. Data source description
A systematic trading firm that trades futures in the global markets
provided us with three datasets of sizes p = 3,133, p = 7,708 and
p = 11,730, which we named A, B and C accordingly. Each dataset is a
collection of time series $X = (x_1, x_2, x_3, \ldots, x_p)$. Each time
series is created by simulating an automated trading system and recording
its daily profit and loss. The time series are very sparse and consist of
roughly 70 percent zeros, which means that on average an ATS refuses to
trade 7 out of 10 days. This is quite typical behaviour: for example, a
trend-following ATS will not trade until it detects a trend. One typical
example is presented in Figure 1, with the daily profit and loss of the
system in the top graph and the asset that is being traded in the bottom
graph; the ATS has occasional zero-profit/non-trading periods.
[FIGURE 1 OMITTED]
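For illustration, the fragment below (synthetic data standing in for an actual PnL matrix) shows how the sparsity figures quoted above can be measured; roughly 70% of the entries are forced to zero to mimic non-trading days.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for a PnL matrix: rows are days, columns are ATS, with
# about 70% of the entries zeroed out to imitate non-trading days.
pnl = rng.normal(size=(2500, 300)) * (rng.random((2500, 300)) > 0.7)

overall_zero_share = (pnl == 0).mean()          # fraction of zeros in the whole matrix
per_ats_zero_share = (pnl == 0).mean(axis=0)    # non-trading frequency of each ATS
print(round(overall_zero_share, 3), per_ats_zero_share[:3].round(3))
```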
1.2. Summary of datasets
A: a variety of ATS including trend-following, momentum, mean
reversion and seasonality systems. The set was composed over a long
period by adding the best-performing strategies.
B: one specific type of mean reversion system (MRS) optimised on
approximately 30 of the most liquid futures; for each instrument we
selected a set of the best solutions. The MRS typically buys if the
market is falling and sells if the market is rallying, in anticipation
of a reversal.
C: this dataset is similar to B, but the MRS logic is slightly
different. In the selection procedure we selected a bigger set of
systems, hence the dataset is bigger.
Table 1 summarizes the characteristics of the data.
1.3. Individual dataset description
The firm trades the most liquid US and European futures on global
exchanges (CME, CBOT, NYBOT, ICE, COMEX, EUREX): stock indexes,
energies, metals, commodities, interest rate products and foreign
exchange products (E-mini S&P 500, E-mini S&P MidCap 400, E-mini Russell
2000, E-mini NASDAQ-100, E-mini DOW ($5), Canadian Dollar, Swiss Franc,
Japanese Yen, Australian Dollar, Euro FX, British Pound, Sugar No. 11,
Coffee, Soybeans, Gold, Silver, Copper, DAX, EURO STOXX 50, Natural Gas,
Crude Oil, 2 Year U.S. Treasury Notes, 5 Year U.S. Treasury Notes, 10 Year
U.S. Treasury Notes, 30 Year U.S. Treasury Bonds and others). The set
ranges from high-frequency and short-term automated trading systems
running on minute data to systems with a trade duration of 5 days. The
majority of the systems exit their positions within 24 hours. All ATS
include realistic transaction costs and slippage. The firm's objective
was to create a portfolio with a robust Sharpe ratio on out-of-sample
data.
Some ATS are very similar and trade almost identically. The
correlation coefficients of dataset A (p = 3,133), grouped by similar
strategies, can be viewed in Figure 2. We can see groups of similar
strategies (yellow/light squares); orange areas represent uncorrelated
systems and red/darker squares represent negatively correlated systems
(very few).
[FIGURE 2 OMITTED]
1.4. Typical strategy logic description
Some ATS (or trading strategies, or simply strategies) are very
different from each other and profit from completely different market
inefficiencies. Some strategies work on high-frequency market data and
some work on daily or even weekly data. At any point, a strategy can be
long, short or flat, so profits can be generated in rising and falling
markets. Some strategies never hold a position overnight; others do.
The strategies use only technical analysis indicators and pay no
attention to fundamental data.
Trend-following systems tend to follow the market direction: if the
market is rising they buy, and if the market is falling they sell.
Momentum strategies behave similarly but use momentum as the market
direction indicator.
Mean reversion systems (also known as contra-trend strategies) tend
to take the opposite direction to trend-following ones. The rationale for
this behaviour is that the market moves in cycles/waves: if the market
rises significantly and the system sees a trend developing, it takes the
opposite direction in anticipation of a correction.
Interestingly, both types of system can be profitable on the same
financial instrument; the difference is the time frame. For example, the
market may be trending upwards but oscillate along the way, and mean
reversion strategies can profit from these oscillations.
Seasonality strategies use the rationale that the market repeats
itself on a specific time frame. Seasonality is not necessarily yearly;
it can occur on a monthly, weekly or daily basis. If a strategy spots
such behaviour, the next time it will take a position in anticipation of
the same market movement.
1.5. Trading strategy optimization
After an ATS is created, it can be calibrated/optimised for a
specific time frame, market or market conditions. The process involves
changing strategy parameters (e.g. moving average periods, take profit
or stop loss levels) and calculating simulated profit, Sharpe ratio or
other performance measures. Systems with the best results are selected
for the next level: inclusion in the portfolio. During the ATS
optimization procedure, many similar solutions (with small differences
in parameters and trading patterns) can be produced. The optimization is
typically performed using brute-force or genetic optimization methods.
This procedure produces many similar systems and, typically, the number
of systems exceeds the number of data points. Therefore, a robust,
sustainable portfolio creation procedure is required to select only an
optimal set of ATS for the portfolio.
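A minimal sketch of such a brute-force calibration loop is given below; the backtest function is a placeholder (the real trading logic, costs and slippage are not shown), and the parameter names are illustrative.

```python
import itertools
import numpy as np

def backtest(prices, ma_period, stop_loss):
    """Placeholder for a single-ATS simulation returning a daily PnL series;
    a real implementation would apply the trading rules, costs and slippage."""
    rng = np.random.default_rng(ma_period * 100 + int(stop_loss * 10))
    return rng.normal(0.02, 1.0, size=len(prices))

def sharpe(pnl):
    return pnl.mean() / pnl.std(ddof=1) * np.sqrt(252)

prices = 100.0 + np.cumsum(np.random.default_rng(1).normal(size=2000))
grid = itertools.product([10, 20, 50, 100], [0.5, 1.0, 2.0])    # brute-force grid
ranked = sorted(((sharpe(backtest(prices, m, s)), m, s) for m, s in grid), reverse=True)
print(ranked[:3])            # best parameter sets are candidates for the portfolio
```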
2. Problem description
In this section, we describe the portfolio management problem in
more detail. In particular, we pay attention to sustainable portfolio
construction, mean-variance portfolio optimization and the proposed
feed-forward portfolio construction.
2.1. Mean variance portfolio optimization
In the mean-variance framework (Markowitz 1952), one maximizes the
ratio of the sample mean (mean) to the standard deviation (std) of the
portfolio returns (a modified Sharpe ratio with no risk-free rate) for a
number (say k) of a priori selected return (profit or loss) values:

$$Sh = \frac{\mathrm{mean}(x_{Pi})}{\mathrm{std}(x_{Pi})}, \qquad (3)$$

where $\mathbf{x}_{ri} = (x_{r1i}, x_{r2i}, \ldots, x_{rNi})$, the superscript $T$ denotes the transpose operation, $\mathrm{mean}(x_{Pi}) = \bar{\mathbf{x}}_r \mathbf{w}_r^{T}$, $\mathrm{std}(x_{Pi}) = \sqrt{\mathbf{w}_r S_r \mathbf{w}_r^{T}}$, $\bar{\mathbf{x}}_r$ is the N-dimensional estimate of the mean vector $\mu$ of the returns, and $S_r = \frac{1}{T-1}\sum_{i=1}^{T}(\mathbf{x}_{ri} - \bar{\mathbf{x}}_r)^{T}(\mathbf{x}_{ri} - \bar{\mathbf{x}}_r)$ is an estimate of the $N \times N$ covariance matrix $\Sigma$.
In practice, instead of Eq. (3) a scaled ratio $Sh = \mathrm{mean}(x_{Pi})/\mathrm{std}(x_{Pi}) \times 15.8745$ is used, where $15.8745 = 252/\sqrt{252}$ is the annualised Sharpe ratio modifier for daily data.
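In code, the scaled (annualised) ratio can be computed as follows; the daily PnL series here is synthetic and NumPy is assumed.

```python
import numpy as np

def annualised_sharpe(daily_pnl):
    """Modified Sharpe ratio of Eq. (3), no risk-free rate, scaled by
    252 / sqrt(252) = sqrt(252) = 15.8745 for daily data."""
    return daily_pnl.mean() / daily_pnl.std(ddof=1) * np.sqrt(252)

rng = np.random.default_rng(2)
x_P = rng.normal(loc=0.5, scale=5.0, size=1000)     # hypothetical daily portfolio PnL
print(round(annualised_sharpe(x_P), 2))
```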
To speed up calculations, instead of the exact solution (minimization
of the standard deviation using Lagrange multipliers), we used an
approximate analytic solution. In our research, we minimize a cost
function in which the constraints (2) are incorporated:

$$\mathrm{Cost}(q_t) = \mathbf{w}_t S \mathbf{w}_t^{T} + \lambda_q (\mathbf{w}_t \bar{\mathbf{X}}^{T} - q_t)^2 + \lambda_1 (\mathbf{1}\mathbf{w}_t^{T} - 1)^2.$$

In the above equation, $\mathbf{1} = [1\; 1\; \ldots\; 1]$ stands for a row vector composed of N "ones". The scalars $\lambda_q$ and $\lambda_1$ control the constraint violations. The values of $\lambda_q$ and $\lambda_1$ should be sufficiently large to ensure that the terms $(\mathbf{w}_t \bar{\mathbf{X}}^{T} - q_t)^2$ and $(\mathbf{1}\mathbf{w}_t^{T} - 1)^2$ converge to 0. We presumed $\lambda_q = \lambda_1 = \lambda = 10^8$.
Then the optimal weights can be expressed in a straightforward way:

$$\mathbf{w}_t = (q_t \bar{\mathbf{X}} + \mathbf{1})\,\bigl(S/\lambda + \bar{\mathbf{X}}^{T}\bar{\mathbf{X}} + \mathbf{1}^{T}\mathbf{1}\bigr)^{-1}, \qquad (4)$$

where $\lambda$ plays the role of an optimization accuracy constant and $q_t = \mathbf{w}_t \bar{\mathbf{X}}^{T}$ is one of the F return values used to calculate the efficient frontier (Markowitz 1952).
Note that the optimization accuracy constant $\lambda$ controls the
weight magnitudes and acts as an additional regularisation constant
(Brodie et al. 2009; Raudys, S., Raudys, A. 2011; Zafeiriou et al. 2012;
Stuhlsatz et al. 2012). After calculating the vector $\mathbf{w}_t$, to
satisfy the constraints (2) we set negative weights ($w_j \leq 0$) to
zero and normalized $\mathbf{w}_t$ to meet constraint (2). The analytical
solution was roughly 30 times faster than the traditional
Lagrange-multiplier-based optimization procedure implemented in the
Matlab frontcon code. A simple analytical expression and high calculation
speed are very important when we have to generate a large number of
agents differing in the subsets of trading robots, the training
parameters and the training sequence.
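The sketch below is a minimal NumPy rendering of this approximate analytic solution: Eq. (4) followed by zeroing the negative weights and renormalising. The return data and the target return q_t are synthetic, and lambda = 10^8 as in the text.

```python
import numpy as np

def approx_mv_weights(X, q_t, lam=1e8):
    """Approximate analytic mean-variance weights, Eq. (4), with the
    post-processing described above: negative weights are set to zero and the
    vector is renormalised to satisfy constraints (2).
    X is an L x N matrix of daily returns, q_t a target portfolio return."""
    x_bar = X.mean(axis=0)                       # N-dimensional mean return vector
    S = np.cov(X, rowvar=False)                  # N x N sample covariance matrix
    ones = np.ones(X.shape[1])
    A = S / lam + np.outer(x_bar, x_bar) + np.outer(ones, ones)
    w = np.linalg.solve(A, q_t * x_bar + ones)   # A is symmetric, so this is Eq. (4)
    w = np.clip(w, 0.0, None)                    # bring negative weights to zero
    return w / w.sum()                           # renormalise so the weights sum to 1

rng = np.random.default_rng(3)
X = rng.normal(0.1, 1.0, size=(400, 30))         # hypothetical return history
w = approx_mv_weights(X, q_t=X.mean(axis=0).mean())
print(w.round(3)[:10], round(w.sum(), 6))
```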
2.2. Benefits of portfolio management model for sustainability
analysis
The mean/standard deviation ratio used in portfolio construction
has two important advantageous features:
1. Provided the number of inputs (assets or trading robots) is
large, the distribution of the weighted sum (1) can become close to
Gaussian (central limit theorem).
2. In the case of Gaussian returns, maximization of ratio (3) means
that the mean value of the returns is maximized provided the probability
of a loss, $\mathrm{Prob}(x_{Pi} < 0) = P_{\max}$, is fixed a priori,
where $P_{\max}$ is a freely chosen risk level. A positive feature of the
mean and standard deviation criterion is that the result does not depend
on the choice of $P_{\max}$.
Another encouraging feature lies in the fact that, in many
practical tasks, a good Gaussian fit can be obtained when the number of
inputs, N, exceeds one hundred, even in the case of correlated and
obviously non-Gaussian univariate distributions of the individual
returns $x_{rji}$. In this way, the mean-variance framework does not
require normality of the distribution of each single input; one only
needs the distribution of the weighted output, $x_{Pi} = \mathbf{x}_{ri}\mathbf{w}_r^{T}$,
to be close to Gaussian.
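A small simulation (illustrative only, with a Gaussian common factor and skewed idiosyncratic noise) shows the effect: the individual inputs are clearly skewed, while their equally weighted sum is nearly symmetric.

```python
import numpy as np

def skewness(x):
    return ((x - x.mean()) ** 3).mean() / x.std() ** 3

rng = np.random.default_rng(4)
N, L = 200, 5000
common = rng.standard_normal((L, 1))                       # shared market factor
noise = rng.exponential(scale=1.0, size=(L, N)) - 1.0      # skewed idiosyncratic part
X = 0.5 * common + noise                                   # correlated, non-Gaussian inputs

x_P = X @ np.full(N, 1.0 / N)                              # equally weighted output
print("skewness of a single input:", round(skewness(X[:, 0]), 2))
print("skewness of the weighted sum:", round(skewness(x_P), 2))
```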
The presence of an extremely large number of trading robots is a
characteristic peculiarity of automatic asset trading. If the sample
size, L, is small and the number of inputs, N, is large, instabilities
and a diminution in portfolio performance arise. The instabilities can be
reduced if regularizing constraints or penalty terms are incorporated
into the optimization procedure (Kan, Zhou 2007; DeMiguel et al. 2007).
The use of assumptions about a block-diagonal structure of the matrix
$\Sigma$ can dramatically reduce the number of parameters to be
estimated. In pattern classification, this model has been used for four
decades (see e.g. the review Raudys, Young 2004); now it is successfully
applied to portfolio management (Hung et al. 2000; Raudys, Zliobaite
2005). The authors in DeMiguel et al. (2007) and Kan, Zhou (2007) divided
the assets into three groups; as a result they used specific "block
type" assumptions and obtained a noticeable gain. Complexity reduction
is also obtained by using a tree-type dependence between the variables
$x_{r1}, x_{r2}, \ldots, x_{rN}$ (Bai et al. 2009; Raudys, Saudargiene
2001; Raudys 2001).
3. Multilayer feed-forward system for large-scale portfolio design
The feed-forward large-scale portfolio construction system is
composed of several parts. The steps are as follows:
--In order to obtain diversification and reduce the number of inputs,
we form trading groups that calculate the averages of a large number of
similar trading robots. For this purpose, for each single time interval
(say a 100- or 400-day interval) we cluster the ATS by the correlation
of their simulated trading track records, i.e. their profit and loss
series. We form the clusters in such a way that the correlations inside
the clusters are high while the correlations between the clusters are
much smaller.
--Then we join the trading systems inside each single cluster in
order to make a trading agent. For this purpose, the 1/N portfolio is
close to optimal (Raudys 2013).
--After reducing the number of factors (inputs), we can use
mean-variance-based portfolio design, i.e. perform the classical
Markowitz rule on the outputs of the first order agents.
--To increase the diversity we: 1) use distinct regularization
values; 2) carry out several clustering procedures based on time series
of different lengths. In this way we obtain a great number of trading
modules (see the schema in Fig. 3).
--To select the best modules for the final portfolio weight
calculation, we use a specially developed cost-sensitive multilayer
perceptron.
--We use the selected agents to produce the future portfolio as an
average of the weights of the modules recognized as belonging to "the
elite (the fourth) class".
In the next section we will explain the steps in more detail.
3.1. History length selection
The data used for the analysis cover the prior-to-crisis and crisis
time periods, characterized by a multitude of sudden changes. In a
preceding analysis (Raudys, S., Raudys, A. 2011) it was found that
300-600 days of data history produce the highest Sharpe ratio.
Calculations show that a small number of inputs (below 100) should be
used in such situations (Raudys 2013). Consequently, we have to use
either the non-trainable 1/N portfolio rule, or reduce the number of
inputs severely and use the mean-variance approach. Both ways have their
positive and negative aspects, so we need to find a compromise solution.
At first, we considered the conditions under which the 1/N portfolio
can be optimal and found that if
--the sum of the automated trading robot outputs is normally
distributed;
--the outputs of the $N_B$ robots are equally correlated, i.e.
$\rho_{ij} = \rho$;
--the mean returns of the robots are equal,
then the benchmark $1/N_B$ portfolio rule is the optimal solution. This
conclusion follows from a simple matrix algebra analysis of a covariance
matrix of the form

$$K = I\,(1 - \rho) + \mathbf{1}^{T}\mathbf{1}\,\rho,$$

when the matrix K is inserted instead of S into Equation (4).
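This can be checked numerically: plugging the equicorrelation matrix K into Eq. (4) with equal mean returns recovers the equally weighted portfolio (a minimal sketch with arbitrary N, rho and target return).

```python
import numpy as np

N, rho, lam = 50, 0.3, 1e8
ones = np.ones(N)
K = np.eye(N) * (1 - rho) + np.outer(ones, ones) * rho   # equicorrelation covariance
x_bar = np.full(N, 0.05)                                  # equal mean returns
q_t = 0.05                                                # arbitrary target return

A = K / lam + np.outer(x_bar, x_bar) + np.outer(ones, ones)
w = np.linalg.solve(A, q_t * x_bar + ones)                # Eq. (4) with K in place of S
w = np.clip(w, 0.0, None)
w /= w.sum()
print(np.allclose(w, 1.0 / N, atol=1e-6))                 # True: the 1/N rule is recovered
```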
The modern world is affected by fast technological, political and
economic changes, especially in the prior-to-crisis and crisis periods.
Thus, financial conditions are often varying and unpredictable, and
lengthy time series are often unsuitable for precise portfolio
calculation. One needs to employ shorter training histories (Raudys,
Zliobaite 2005; Raudys, Mitasiunas 2007; Raudys 2013). In our analysis of
diverse trading strategies we face data in which the trading robots
refuse to operate on 3/4 of the days. Therefore, the actual (effective)
lengths of the time series are much shorter and the training sample size
becomes a very important issue.
3.2. Clustering of the trading robots
One of the principal ideas proposed in this paper is that having a
large number of ATS (trading robots) allows us to group/cluster the most
similar ATS together and create almost uncorrelated groups. We group the
PnL series into clusters by their correlation using the k-means algorithm
and form 1/N portfolios in each of them. The k-means algorithm is a
well-known and popular algorithm for constructing groups of similar
items; it is described in many books and implemented in most data mining
software. In our portfolio design schema, we perform a cluster analysis
of the N x L data according to the correlations between the N time series
of length L. Here we use the absolute values $|1 - \mathrm{correlation}_{ij}|$
as the dissimilarity measure between the i-th and j-th time series.
Assuming that the robots inside a single group are similar, we can use
the mean values of their outputs, in this way realizing $1/N_B$ portfolio
rules that serve as R first order (expert) agents. Having a small number
of expert agents, we can use the mean-variance approach to design a more
complex agent (module) in which the R weights are calculated according to
Equation (4). To improve the small sample properties of the covariance
matrix we use the regularized covariance matrix

$$S_{\mathrm{reg1}} = S\,(1 - \tau) + D\,\tau, \qquad (5)$$

where D is a diagonal $N \times N$ matrix formed from the matrix S, and
$\tau$ is a regularization parameter ($0 \leq \tau \leq 1$).
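A compact sketch of this stage is given below. As a stand-in for the correlation-based k-means described above, it feeds the rows of the correlation matrix to scikit-learn's KMeans; the PnL matrix, the number of clusters and tau are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
pnl = rng.normal(size=(400, 300))                  # hypothetical L x N PnL matrix

# Cluster the ATS by the similarity of their PnL series (rows of the correlation
# matrix are used as features, a practical stand-in for correlation-based k-means).
corr = np.corrcoef(pnl, rowvar=False)
labels = KMeans(n_clusters=25, n_init=10, random_state=0).fit_predict(corr)

# First order (expert) agents: the 1/N_B portfolio inside each cluster.
experts = np.column_stack([pnl[:, labels == k].mean(axis=1) for k in range(25)])

# Regularized covariance matrix of the expert outputs, Eq. (5).
tau = 0.6
S = np.cov(experts, rowvar=False)
S_reg = S * (1 - tau) + np.diag(np.diag(S)) * tau
print(experts.shape, S_reg.shape)                  # (400, 25) (25, 25)
```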
[FIGURE 3 OMITTED]
We call this two-level decision-making schema of R non-trainable
(expert) agents and one trainable agent a feed-forward trading module
(FFM); see Figure 3.
3.3. Final feed-forward system design
A novelty of the present paper is the introduction of a vast number
of the feed-forward modules depicted in Figure 3. The modules differ in
the learning set size, L, used to perform the clustering of the data, the
regularization parameter, $\tau$, and the randomly selected subsets of
expert agents. In the experiments reported below, in each walk-forward
step we considered four learning set sizes, $L_1$ = 100, $L_2$ = 200,
$L_3$ = 300 and $L_4$ = 400 days prior to a 100-day validation period,
and clustered the ATS into R = 25 groups. After averaging, we formed 100
diverse expert agents. To increase the diversity, from the 100 agents we
formed 80 semi-randomly selected groups of $R_L$ = 60 expert agents to be
joined into 120 types of higher order agents. Each of the latter agents
was made by the mean-variance approach using one of four a priori
selected values of the covariance matrix regularization ($\tau$ = 0.4,
0.6, 0.7 or 0.8). Altogether, we used 480 diverse feed-forward modules in
the experiments.
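For illustration, the enumeration of module configurations can be written as below; plain random sampling of the agent subsets is used here, since the exact semi-random scheme is not spelled out in the text.

```python
import numpy as np

rng = np.random.default_rng(6)
n_agents, subset_size = 100, 60
taus = (0.4, 0.6, 0.7, 0.8)

# 120 subsets of 60 expert agents combined with 4 regularization values give the
# 480 module configurations quoted above (the subsets are drawn purely at random
# here, as an illustration of the idea).
subsets = [rng.choice(n_agents, size=subset_size, replace=False) for _ in range(120)]
modules = [{"agents": s, "tau": t} for s in subsets for t in taus]
print(len(modules))                                # 480
```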
3.4. Selection of "elite modules" for the final portfolio
calculation
Each single module depicted in Figure 3 gives a set of portfolio
weights. Some modules perform better in one time interval, while other
modules are preferable in another. To adapt to changes constantly, every
10 days we performed module selection for the final portfolio design.
Each time we divided the 480 modules into K = 4 equal groups (120 modules
in each) according to their mean return values during the last 20 days.
Then we used specific parameters of the modules: 1) the number of times
each of the 100 first order agents was employed in the module; 2) the
regularization parameter; and 3) the clustering data interval size.
For classification we used a four-class multilayer perceptron trained
with a pair-wise misclassification-cost-specific loss function, in which,
in the output layer, instead of minimizing the class-specific weights
$\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_K$, one minimizes
differences of two weight vectors, $\mathbf{w}_j - \mathbf{w}_h$
(Raudys, S., Raudys, A. 2010):
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (6)
We use a $K \times K$ dimensional misclassification cost matrix
$C_{\mathrm{cost}} = ((C_{hj}))$ to control the pair-wise
misclassification costs.
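The exact pair-wise cost loss of Eq. (6) is given in Raudys, S., Raudys, A. (2010) and is not reproduced here; the sketch below instead illustrates a simpler, generic cost-sensitive alternative, in which an ordinary classifier supplies class probabilities and the Table 2 cost matrix is applied at decision time, so that expensive errors (such as assigning a 1st-class module to the elite class) are discouraged. The data, labels and classifier settings are hypothetical.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Pair-wise misclassification cost matrix C_cost, values as printed in Table 2.
C_cost = np.array([[0, 2, 4, 20],
                   [1, 0, 1,  1],
                   [1, 1, 0,  1],
                   [1, 1, 1,  1]], dtype=float)

rng = np.random.default_rng(7)
X = rng.normal(size=(480, 20))                       # 480 modules, 20 attributes
y = rng.integers(0, 4, size=480)                     # hypothetical class labels (quartiles)

clf = MLPClassifier(hidden_layer_sizes=(2,), max_iter=3000, random_state=0).fit(X, y)
proba = clf.predict_proba(X)                         # P(true class h | module)
expected_cost = proba @ C_cost                       # expected cost of assigning class j
decisions = expected_cost.argmin(axis=1)             # cost-sensitive class assignment
print(np.bincount(decisions, minlength=4))
```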
4. Simulation study
In the simulation study we performed a set of experiments with
three different datasets. The proposed feed-forward methodology showed
better results than the benchmark method. Below we describe the
experiments in more detail. Model testing can be done in three ways.
Firstly, one creates a model, calibrates its parameters and tests
performance/quality on the same data set. This type of testing is widely
criticized because complex models can adapt to the training data and
generate optimistically biased results. Secondly, one calibrates the
model on one set and validates it on unseen data. This method is better
but still unreliable, as a single satisfactory out-of-sample result can
be a matter of luck; the approach is popular but limits us to a small
amount of unseen data. The third alternative is to repeat the experiment
with a multitude of pairs of data subsets: one subset is used for model
calibration and the other for performance evaluation. In time series
analysis this is called the walk-forward approach. Here, we train the
model on one set and test it on a small period of future data. Next, we
shift the training period by a period x and shift the testing period by
the same time x into the future. This procedure is repeated until there
is no more data left to shift the training and testing periods into.
Walk-forward analysis is gaining popularity; it is a time-consuming
process but allows viewing potential out-of-sample results over longer
time periods.
[FIGURE 4 OMITTED]
In this study, we organised the data into k = 16 time intervals
$z_i$ of 100 working days. Initially, we create a portfolio using the
$z_1 \ldots z_m$ intervals and test it on the $z_{m+1}$ interval. In the
next step, we create a portfolio using the $z_1 \ldots z_{m+1}$ range
and test it on $z_{m+2}$. The process is repeated until we reach
$z_{m+k}$. This is illustrated in Figure 4. In this way we obtain k
out-of-sample periods, which we can concatenate to get one long
out-of-sample period. In total we have m + k periods. Thus, in the
experiments reported below, we used 16 x 100 = 1,600 days of data for
testing.
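A minimal sketch of this anchored walk-forward split is shown below; the number of initial training intervals m is illustrative, chosen so that 16 out-of-sample periods are produced for a 2,400-day history.

```python
import numpy as np

def walk_forward_splits(n_days, interval=100, m=8):
    """Yield (train_index, test_index) pairs: train on intervals z_1..z_{m+i},
    test on z_{m+i+1}, growing the training window by one interval each step."""
    for start in range(m * interval, n_days - interval + 1, interval):
        yield np.arange(0, start), np.arange(start, start + interval)

splits = list(walk_forward_splits(n_days=2400, interval=100, m=8))
print("out-of-sample periods:", len(splits))          # 16
for train_idx, test_idx in splits:
    pass   # fit the portfolio weights on train_idx, evaluate them on test_idx
```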
4.1. Benchmark methods
In a small-scale problem we could use the classic Markowitz approach
for portfolio construction. Because of the size of our problem, however,
we cannot use classic Markowitz as a benchmark method: it is not capable
of handling large-scale, high-dimensional data. We are therefore forced
to use the simple 1/N rule as a benchmark instead, in which every
possible portfolio member is taken into account.
This rule is sometimes referred to as the equally weighted rule.
Several research works have noted that this rule can be rather good in
many practical portfolios (DeMiguel et al. 2009). We considered two
versions of the 1/N rule. In the first one, all ATS are used. In its
modification, only agents with an above-zero average profit during the
latest 400 trading days were selected.
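Both benchmark versions are easy to state in code; a sketch (synthetic data, NumPy assumed) follows.

```python
import numpy as np

def one_over_n_weights(pnl, filter_positive=False, lookback=400):
    """Equally weighted (1/N) benchmark.  With filter_positive=True only the ATS
    whose average profit over the last `lookback` days is above zero are kept,
    as in the modified benchmark version."""
    n = pnl.shape[1]
    if not filter_positive:
        return np.full(n, 1.0 / n)
    keep = pnl[-lookback:].mean(axis=0) > 0
    return keep / keep.sum() if keep.any() else np.full(n, 1.0 / n)

rng = np.random.default_rng(8)
pnl = rng.normal(0.0, 1.0, size=(1000, 50))           # hypothetical PnL history
print(one_over_n_weights(pnl)[:3],
      one_over_n_weights(pnl, filter_positive=True).sum())
```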
In spite of the known ways to reduce complexity, "there are
still many miles to go" before the gains promised by optimal
portfolio choice can actually be realized out of sample (Kan, Zhou 2007;
DeMiguel et al. 2007). One of the reasons for this mild success is the
incorrect normality assumption inherently incorporated into the standard
mean-variance framework. This factor is often observed when a small
number of assets and records are measured. Therefore, the simple fixed
(non-trainable) benchmark portfolio trading rule (1/N, the equally
weighted portfolio) with $w_{r1} = w_{r2} = \ldots = w_{rN} = 1/N$ is
suggested as a benchmark method (DeMiguel et al. 2007).
4.2. Portfolio construction experiments
In an attempt to choose the best models for the final portfolio
calculation, we split the 480 systems into four equal pattern classes
(120 models in each) according to their mean return values over the last
period's 10 days. The most successful models (the 4th class) were used in
the final portfolio weight calculation for the subsequent 10 validation
days. To "recognize" the 4th class models we used the
misclassification-cost-sensitive multilayer perceptron with 20 inputs,
two hidden units and four outputs. Such a perceptron allows a non-linear
decision boundary to be realized. The attributes for recognition of the
best models were: a) the 100 counts of how many times each expert agent
was used in the model; b) a generalized parameter that characterized the
learning set sizes of the given model; and c) the regularization
parameter, $\tau$.
In total, we had 102 attributes. The learning set size used to train
the perceptrons was 480. To improve the small sample properties of the
classification rule, we performed a singular value decomposition of the
480 x 102 dimensional data and used the first 20 principal components
for classification. The multilayer perceptron was trained, starting from
very small random initial weights, using a cost-sensitive algorithm aimed
at minimizing the sum of the pair-wise costs of misclassification
(Raudys, S., Raudys, A. 2010). In some of the experiments, the pair-wise
misclassification costs were calculated according to the values of the
differences between the average returns in the 4 pattern classes. The
best results, however, were obtained when our cost matrix prevented the
allocation of vectors of the first class (the most unsuccessful) to the
fourth class. In Table 2 we present the misclassification cost matrix
used in the experiments. Here, allocation of the worst models (the 1st
class) to the 4th class is predominantly penalized.
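The dimensionality reduction step can be sketched as follows (synthetic attribute matrix; the perceptron training itself is omitted).

```python
import numpy as np

rng = np.random.default_rng(9)
attributes = rng.normal(size=(480, 102))          # hypothetical 480 x 102 attribute matrix

# Singular value decomposition of the centred data; the first 20 principal
# components are kept as inputs for the cost-sensitive perceptron.
centred = attributes - attributes.mean(axis=0)
U, s, Vt = np.linalg.svd(centred, full_matrices=False)
components = centred @ Vt[:20].T                  # 480 x 20 matrix of principal components
print(components.shape)
```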
In Table 3 and Figure 5 we can see that the 1st and 4th classes are
separated rather well: no module from the first class was assigned to
class 4. This is what we tried to achieve by using a cost matrix that
assigns a very high cost to such an assignment. This result was achieved
using a perceptron with two hidden units that takes the cost matrix into
account during training. Figure 5 visualises the 480 agents in the 2D
space of the two hidden units.
We can see in the figure that class one (black dots) is not mixed
with class 4 (green dots). In the 16 walk-forward shifts of the training
and test data sets we designed the 480 trading modules 16 times.
Selection of the elite (4th class) modules was performed after every 10
trading days. So, the evaluation of the portfolio performance was
executed 160 times.
[FIGURE 5 OMITTED]
To avoid over-adaptation to the experimental material, the
parameters of the multistage portfolio weight calculation schema were
found while experimenting with the 11,730 trading agent data recorded in
the period from 2002 to 2010. The evaluation of the method's performance
was executed with another two data sets, formed from diverse trading
robot collections selected from 2002 to June 2012 and from 2003 to
December 2012 data archives. Below we present out-of-sample results: the
variation of the Sharpe ratio during the last 6 years.
Employing the proposed approach, in which agents differ in the
training history length, allowed us to improve the portfolio Sharpe ratio
from 5.23 for the equally weighted benchmark to 7.59 for the new system.
Both benchmark methods were approximately equally effective (Fig. 6, left
panel). The roughly 1.5-fold improvement is very stable and statistically
significant, and in almost all out-of-sample experiments the new system
produced a better Sharpe ratio (Fig. 6).
[FIGURE 6 OMITTED]
Conclusions
This paper presents a portfolio construction method inspired by
sustainable economy principles, so that large tasks are divided into
smaller ones, solved and later composed into a final solution. The
theoretical justification for this novel solution is based on
multivariate statistical analysis of multidimensional investment tasks,
particularly on relations between data size, algorithm complexity and
portfolio efficacy. Validation of the feed-forward decision-making
system was performed on large-scale financial data sets, taking into
consideration thousands of ATS during the last ten years.
Having a large number of portfolio candidates, we move towards the
Gaussian distribution of portfolio returns. In such a situation, one can
use the mean-variance approach. In the case of thousands of ATS,
however, the small sample size problem arises. In high dimensional
situations the employment of typical solutions, such as the Markowitz
optimization principle, becomes ineffective. Thus, it is necessary to
develop additional tools for reducing dimensionality with minimal loss
of useful information. Our novel multilevel feed-forward decision-making
schema comprises the following procedures:
1) clustering is used for dimensionality reduction and for
generating the first order trading agents by means of the non-trainable
1/N portfolio design rule;
2) mean-variance optimization is employed to take into account the
correlations between the outputs of the first order trading agents;
3) cost-sensitive multi-category classification is applied to select
the group of the best trading modules;
4) the final decision making is based on the non-trainable 1/N
portfolio rule.
The above sequence of procedures is based on sound theoretical
considerations and is explained in more detail in Raudys, S., Raudys, A.
(2011) and Raudys (2013). In the first procedure, we obtain a gain due
to the theoretically-based knowledge that for correlated agents with
similar mean returns, the non-trainable 1/N portfolio rule is well
founded. In the second procedure, we obtain a gain because we have a
relatively small number of first order trading agents and regularize the
covariance matrix while developing the trading modules. In the third
procedure, we have a gain in view of the fact that we are selecting the
most promising trading modules by means of a special multilayer
perceptron capable of taking into account the pair-wise costs of
misclassification. In the fourth procedure, we expect a gain due to
employing the 1/N rule. The gain can result from the fact that after the
use of randomization, the performances of all modules allocated to the
4th class and correlations between the module outputs should not differ
notably. Theory shows that in such situations the non-trainable 1/N rule
becomes close to optimal.
In our two large-scale empirical performance evaluations we
demonstrated the superiority of the novel method over the benchmark
methods in 16 out-of-sample periods. From theoretical and empirical
analysis it was clear that sample size issues are of great importance in
portfolio construction: shorter time series are beneficial to
out-of-sample portfolios when environments are undergoing frequent
change. This can be useful during a crisis period, in which the
environment is changing more rapidly than would usually be the case.
Therefore, for portfolio construction, shorter histories have to be used
(Raudys, S., Raudys, A. 2011).
The new trading system portfolio methodology has a theoretical
basis and has been verified empirically using the large financial data
sets. It shows promising results, although it can undoubtedly be
improved. One possible way would be to apply evolutionary and/or memetic
algorithms (Krasnogor, Smith 2005) instead of the current agent/module
selection procedure. Recent approaches aimed at pre-processing truly
high-dimensional input data into low-dimensional representations combined
with regularization (Stuhlsatz et al. 2012; Zafeiriou et al. 2012) could
facilitate the design of enhanced trading agents and modules.
The proposed multi-layer feed-forward portfolio construction system,
with its selection of the best agents and modules for each time interval,
allowed us to reduce the number of incorrect decisions and to increase
the Sharpe ratio. We believe that the use of adaptive multistage
feed-forward systems is suitable not only for financial portfolio
modelling. In sustainable ecology, sustainable economy and sustainable
society analysis tasks, a multitude of factors/agents (smaller elements
of the large model) influence the final decision. The similarity between
such tasks and the large-scale portfolio design strategy suggests that
the newly developed methodology is worth applying to wider areas of
research. We need to seek alternative problems that can provide
sufficient data and which are similar in nature to the modelling problems
discussed.
Caption: Fig. 1. Typical ATS (top) time series in our p = 3,133
sized dataset and the asset (E-mini S&P 500 futures) being traded
(bottom) by that ATS
Caption: Fig. 2. Correlation matrix of the p = 3,133 ATS dataset.
Yellow/light areas of the figure correspond to highly correlated ATS,
red areas correspond to negatively correlated ATS and orange areas
correspond to uncorrelated ATS
Caption: Fig. 3. Feed-forward flow of information in single trading
module of decision-making system
Caption: Fig. 4. Walk-forward testing: $z_{m+1}$ is the first
out-of-sample period and $z_{m+k}$ is the last
Caption: Fig. 5. Classification results into 4 pattern classes
(the first class is black; green is the "elite" class)
Caption: Fig. 6. Variation of the out-of-sample Sharpe ratio evaluated
over a six-year period with two diverse data sets of trading robots
(left panel: p = 7,708 dataset; right panel: p = 3,133 dataset)
doi:10.3846/20294913.2014.889773
Acknowledgments
This work was supported by the Research Council of Lithuania under
Grants MIP-043/2011 and MIP-018/2012.
References
Aldridge, I. 2010. High-frequency trading: a practical guide to
algorithmic strategies and trading systems. Hoboken, New Jersey: John
Wiley & Sons. 354 p.
Araujo, C.; de Castro, P. 2011. Towards automated trading based on
fundamentalist and technical data, Advances in Artificial
Intelligence--SBIA 2010, 112-121.
Bai, Z.; Liu, H.; Wong, W.-K. 2009. On the Markowitz mean-variance
analysis of self-financing portfolios, Risk and Decision Analysis 1:
35-42.
Board on Agriculture and Natural Resources. 2012. Sustainable
development of algal biofuels [online], [cited 17 November 2012].
Available from Internet: http://www.nap.edu/catalog.php?record_id=13437.
Washington DC, USA: The National Academic Press.
Bookstaber, R. 2009. Risk from high frequency and algorithmic
trading not as big as many think [online], [cited 30 August 2009].
Available from Internet:
http://seekingalpha.com/article/158962-risk-fromhigh-frequency-
andalgorithmic-trading-not-as-big-as-many-think
Brodie, J.; Daubechies, I.; De Mol, C.; Giannone, D.; Loris, I.
2009. Sparse and stable Markowitz portfolios, PNAS (Proceedings of the
National Academy of Sciences of the United States of America) 106(30):
12267-12272. http://dx.doi.org/10.1073/pnas.0904287106
Chan, E. P. 2008. Quantitative trading: how to build your own
algorithmic trading business. Hoboken, New Jersey: John Wiley &
Sons. 204 p.
Cochrane, K. L.; Andrew, N. L.; Parma, A. M. 2011. Primary
fisheries management: a minimum requirement for provision of sustainable
human benefits in small-scale fisheries, Fish and Fisheries 12(3):
275-288. http://dx.doi.org/10.1111/j.1467-2979.2010.00392.x
Cura, T. 2009. Particle swarm optimization approach to portfolio
optimization, Nonlinear Analysis: Real World Applications 10(4):
2396-2406. http://dx.doi.org/10.1016/j.nonrwa.2008.04.023
DeMiguel, V.; Garlappi, L.; Uppal, R. 2007. Optimal versus naive
diversification: how inefficient is the 1/N portfolio strategy?, Review
of Financial Studies 22(5): 1915-1953.
http://dx.doi.org/10.1093/rfs/hhm075
DeMiguel, V.; Garlappi, L.; Nogales, F. J.; Uppal, R. 2009. A
generalized approach to portfolio optimization: improving performance by
constraining portfolio norms, Management Science 55(5): 798-812.
http://dx.doi.org/10.1287/mnsc.1080.0986
Dempster, M. A. H.; Jones, C. M. 2001. A real-time adaptive trading
system using genetic programming, Quantitative Finance 1(4): 397-413.
http://dx.doi.org/10.1088/1469-7688/1/4/301
Hung, K. K.; Cheung, C. C.; Xu, L. 2000. New Sharpe-ratio-related
methods for portfolio selection, in Proc. of the IEEE/IAFE/INFROMS
Conference on Computational Intelligence for Financial Engineering,
26-28 March, 2000, New York, 34-37.
Jeucken, M. 2001. Sustainable finance and banking: the financial
sector and the future of the planet: peopleplanetprofit in the Financial
Sector. Guilford: Routledge. 320 p.
Kan, R.; Zhou, G. 2007. Optimal portfolio choice with parameter
uncertainty, Journal of Financial and Quantitative Analysis 42(3):
621-656. http://dx.doi.org/10.1017/S0022109000004129
Krasnogor, N.; Smith, J. 2005. A tutorial for competent memetic
algorithms: model, taxonomy, and design issues, IEEE Transactions on
Evolutionary Computation 9(5): 474-488.
http://dx.doi.org/10.1109/TEVC.2005.850260
Markowitz, H. 1952. Portfolio selection, The Journal of Finance
7(1): 77-91.
McCormick, R. 2012. Towards a more sustainable finance system, part
2: creating an effective civil society response to the crisis, Law and
Financial Markets Review 6(3): 200-207.
http://dx.doi.org/10.5235/175214412800650527
Moody, J.; Wu, L. 1997. Optimization of trading systems and
portfolios, in Proceedings of the IEEE/IAFE Computational Intelligence
for Financial Engineering (CIFEr), 23-25 March, 1997, New York, 300-307.
Perold, A. 1984. Large-scale portfolio optimization, Management
Science 30(10): 1143-1160. http://dx.doi.org/10.1287/mnsc.30.10.1143
Raudys, S. 2001. Statistical and neural classifiers: an integrated
approach to design. New York: Springer. 289 p.
http://dx.doi.org/10.1007/978-1-4471-0359-2
Raudys, S. 2013. Portfolio of automated trading systems: complexity
and learning set size issues, IEEE Transactions on Neural Networks and
Learning Systems 24(3): 448-459.
http://dx.doi.org/10.1109/TNNLS.2012.2230405
Raudys, S.; Young, A. 2004. Results in statistical discriminant
analysis: a review of the former Soviet Union literature, Journal of
Multivariate Analysis 89(1): 1-35.
http://dx.doi.org/10.1016/S0047-259X(02)00021-0
Raudys, S.; Mitasiunas, A. 2007. Multi-agent system approach to
react to sudden environmental changes, Lecture Notes in Artificial
Intelligence 4571: 810-823.
Raudys, S.; Raudys, A. 2010. Pair-wise costs in multi-class
perceptrons, IEEE Transactions on Pattern Analysis and Machine
Intelligence 32: 1324-1328. http://dx.doi.org/10.1109/TPAMI.2010.72
Raudys, S.; Raudys, A. 2011. High frequency trading portfolio
optimization: integration of financial and human factors, in Proc. 11th
International Conference on Intelligent Systems Design and Applications
(ISDA), 22-24 November, 2011, Cordoba, Spain, 696-701.
Raudys, S.; Raudys, A. 2012. Three decision making levels in
portfolio management, in IEEE Conference on Computational Intelligence
for Financial Engineering and Economics, 29-30 March, 2012, New York,
1-8.
Raudys, S.; Saudargiene, A. 2001. First order tree-type dependence
between variables and classification performance, IEEE Transactions on
Pattern Analysis and Machine Intelligence 23(2): 1324-1328.
http://dx.doi.org/10.1109/34.908975
Raudys, S.; Zliobaite, I. 2005. Prediction of commodity prices in
rapidly changing environments, Lecture Notes in Computer Science 3686:
154-163.
Smeureanu, I.; Ruxanda, G.; Diosteanu, A.; Delcea, C.; Cotfas, L.
A. 2012. Intelligent agents and risk based model for supply chain
management, Technological and Economic Development of Economy 18(3):
452-469. http://dx.doi.org/10.3846/20294913.2012.702696
Stuhlsatz, A.; Lippel, J.; Zielke, T. 2012. Feature extraction with
deep neural networks by a generalized discriminant analysis, IEEE
Transactions on Neural Networks and Learning Systems 23(4): 596-608.
http://dx.doi.org/10.1109/TNNLS.2012.2183645
Zafeiriou, S.; Tzimiropoulos, G.; Petrou, M.; Stathaki, T. 2012.
Regularized kernel discriminant analysis with a robust kernel for face
recognition and verification, IEEE Transactions on Neural Networks and
Learning Systems 23(3): 526-534.
http://dx.doi.org/10.1109/TNNLS.2011.2182058
Received 03 January 2013; accepted 31 May 2013
Sarunas RAUDYS, Aistis RAUDYS, Zidrina PABARSKAITE
Faculty of Mathematics and Informatics, Vilnius University,
Naugarduko g. 24, 03225 Vilnius, Lithuania
Corresponding author Sarunas Raudys
E-mail: sarunas.raudys@mif.vu.lt
Sarunas RAUDYS. Doctor Habil, Professor. He obtained his
Master's and PhD degrees in Computer Science from Kaunas
University of Technology, and his USSR Doctor of Science (Habil) degree from
Riga Institute of Electronics and Computer Science in 1978. Presently,
he is a Senior Researcher in Faculty of Mathematics and Informatics,
Vilnius University. Research interests: multivariate analysis,
statistical pattern recognition, data mining, artificial neural
networks, deep learning, evolvable multi-agent systems, artificial
economics, and artificial life.
Aistis RAUDYS received his PhD from the Institute of Mathematics
and Informatics, Lithuania, in the field of feature extraction from
multidimensional data. Currently, he works as a Senior Research Fellow
at Vilnius University Faculty of Mathematics and Informatics where he
teaches Algorithmic Trading Technologies. Previously, he worked as a
Researcher and also as a Software Developer in various software
companies. He collaborated with a number of top tier banks including
Deutsche Bank, Societe Generale and BNP Paribas. He is the author of 21
publications and scientific works. His research interests are in machine
learning for financial engineering and automated trading.
Zidrina PABARSKAITE obtained her PhD from Vilnius Gediminas Technical
University, Lithuania, in 2009. She is the author of 7 research articles.
She worked as a Lecturer and Data Analyst in the past. Currently, she is
working as a Postdoctoral Research Fellow at Kaunas University of
Technology in the field of Multivariate Data Analysis. Her research
focused on the web log mining process: enhancements of the web log data
preparation process, the application of different methods and algorithms
to web log data analysis, and the presentation of results.
Table 1. Detailed information about the datasets
Name L (days) p (robots) % of zeros From To
A 2,581 3,133 68.65% 11 Mar 2002 04 Dec 2012
B 2,517 7,708 71.84% 10 Jan 2003 03 Sep 2012
C 2,398 11,730 64.44% 01 Jan 2002 10 Mar 2011
Table 2. The 4 x 4 dimensional matrix, $C_{\mathrm{cost}}$, of pair-wise
misclassification costs
Class 1 2 3 4
1 0 2 4 20
2 1 0 1 1
3 1 1 0 1
4 1 1 1 1
Table 3. Numbers of allocations of the 120 vectors (modules) per class in
a single 20-day training session. Diagonal values represent correct
classifications; the off-diagonal values are the numbers of
misclassifications
Class 1 2 3 4
1 82 37 1 0
2 21 44 47 8
3 17 15 69 19
4 4 5 15 96