An empirical test of an IPO performance prediction model: are there "blue chips" among IPOs?
Miller, John; Stretcher, Robert
ABSTRACT
An earlier study of 563 firms which issued IPOs during 1997
identified and estimated a three-stage algorithm in which basic
accounting variables and indices available at the time of the IPO were
found to predict mean annual wealth appreciation from buy-and-hold stock
ownership for the ensuing three years. Firm size predicted membership in
the middle sixth and seventh deciles; sales, receivables turnover, and
retained earnings to assets predicted the top quintile; current debt
and selling costs predicted the lowest quintile. Since February 2001,
market trends have been generally negative. The current paper confirms
the earlier model despite these negative conditions.
PURPOSE OF THIS STUDY
An earlier investigation (Miller, 2003) uncovered a non-linear and,
indeed, non-metric anomaly in the joint distributions of the wealth
appreciation of companies with new initial public offerings and certain
accounting data made public at or around the date of the offering. The
earlier study was purely exploratory and consisted of specifying the
model and estimating the parameters of a three-stage prediction scheme.
The model was able to predict approximately three-fourths of the firms
correctly into three segments of wealth appreciation. The three segments
were the "MID" comprised of the sixth and seventh deciles,
"TOP" or the top quintile, and "LOW" or the bottom
quintile. It is the objective of this study to evaluate the performance
of the model in the face of the generally poor market conditions of the
two years immediately following model construction (March 2001 to
July 2003).
INTRODUCTION
It is not rare to find examples of data mining in the literature
relating financial data to stock market and general business
performance. Even the most influential of the early papers on company
failure prediction (e.g., Beaver, 1967, Altman, 1968, and Edmister,
1972) might be accused of too-enthusiastic opportunism by their use of
repeated analyses (one suspects) until a statistically significant
formulation appeared. And, to make matters worse, sample sizes were very
small and drawn as convenience samples rather than probability samples.
As is apparent from these cautionary examples, data mining is not always
a complimentary term. It is also called "data dredging" or the
over-working of data, and is a natural result of the statistician's
desire to do a thorough job. It may be said that the goal of any
statistical analysis is to uncover statistical significances. See Fisher
(1986) for a broader discussion of the tensions between the statistician
and his client. There is also a careful discussion of the problem in the
paper and subsequent comments in Chatfield (1995). Chatfield underscores
the potentials for disaster whenever a model is uncovered and fit to a
set of data, and then tested on the same set. This is especially true in
the cases of step-wise regression and time series analysis. While this
is not a novel idea, he goes further to argue that split sample designs
are also suspect and that models should preferably be tested on data
gathered at another time. Only then can his "model selection
biases" be removed. More generally, it can be argued that there are
two stages in any kind of scientific enterprise. Tukey (1977) has
developed a broad range of powerful "exploratory data" tools
to assist the researcher in uncovering explanatory models. But he would
agree that there is still a need for "confirmatory" analysis
(Tukey, 1980). Good scientific procedure calls for such confirmation not
to come from the model source, but from independent investigators
operating in other sites on related, but not identical, datasets. The
approach of the original model-building study (Miller, 2003) was strictly
"exploratory"; the present paper takes up the confirmatory phase.
As part of a fundamental reflection on the theoretical
underpinnings of the statistical analysis, Hand (1996) has expanded on
the opening provided by Velleman and Wilkinson (1993), who were
criticizing the psychophysicist Stevens' (1951) data measurement
scale hierarchy (nominal, ordinal, interval and ratio) that has become
almost routinely accepted in much of scientific work, especially the
social sciences and business research. Hand argued that the traditional
approach to science used a "representational measurement
theory" in which the data are integral parts of mathematical models
of empirical theories and are direct attempts to "formulate
properties that are observed to be true about certain qualitative
attributes" (foreword to the monumental Foundations of Measurement
trilogy; Krantz et al., 1971, Suppes et al., 1989, Luce et al., 1990;
quoted in Hand, 1996). This is the dominant assumption used by most
scientists in their work and is at least a century old. Later, as
physicists became troubled by such difficulties as those caused by the
dual nature of light, they began to relax the relationship between
their data and the real world. The development of "operational
measurement theory" is traced by Hand to Bridgman (1927) and is a
shift in the focus of the measurement theory from the empirical to the
mathematical construct being used to model that reality. In this case,
the emphasis is on how that model determines the properties of the data
measurement scale. It is exemplified in the elaborate models of latent
variables and structural equations used in the social sciences. There
the models are less a picture of some external reality and more of a
prediction scheme. Now the role of the statistician is merely to ensure
that the assumptions about the data structure do not violate that model,
not some underlying reality. The responsibility for connection between
the model and external reality is entirely that of the social scientist,
not the data analyst.
There was probably a time when accounting data was (reasonably)
thought to be representational. The representational approach is still
exemplified in the work of the Banque de France (Bardos, 2000) in which
classical Fisher linear discriminant analysis is used to forecast
company disasters. But, it is becoming more and more apparent that such
reliance upon the external reality of bookkeeping data is not warranted.
This relaxed approach is that of Zopounidis and Dimitras (1998), for
example. For the purposes of this analysis we will not assume that the
SEC-reported numbers are a fundamentally precise reflection of a company's
situation, but we will assume that the data can be relied on for
direction and relative size. That is, for most of the subsequent
analysis we assume qualitative rather than quantitative scaling.
The model used to begin this analysis was that of correspondence
analysis. It is one of many statistical procedures which have as their
raison d'etre the analytic development of a quantitative re-scaling
from data which are assumed only to be nominal or ordinal to begin with.
One popular model is that of the "tolerance" distribution of
Cox and Oakes (1984) and McCullagh (1980). Correspondence analysis has a
long history rooted in the work of Fisher (1940) in the middle of the
last century; however, it is certainly not the only possible analytic
procedure. There are many possible models that are used to rescore the
rows and/or columns of a contingency table. These models can have either
no conditions on the scores for the rows and/or columns (unrestricted
models), or it is possible to require that either the rows or the
columns or both must be ordinal in nature (restricted models). On the
one hand, Goodman (1981, 1986) has developed his R, C and RC models. The
latter is shown (Ritov and Gilula, 1991) to be equivalent to:
P_{ij} = \alpha_i \beta_j \exp(\gamma \mu_i \nu_j)   (1)
where Gamma is the coefficient of intrinsic association; the sets of
parameters Mu and Nu are the row and column scores to be "optimized"
by maximizing; Alpha and Beta are nuisance parameters. The rescorings are
centered to zero and scaled to one so that, according to Goodman, they
can be compared to the results of correspondence analysis, in which the
same standardizing takes place. Gilula, Krieger, and Ritov (1988) show
that this is a model in which entropy, in an information theory sense, is
being maximized.
Procedures for estimation of the parameters in the RC model have
been developed by Goodman in the unrestricted case. The R and C
association models in the restricted case were solved by Agresti, Chuang
and Kezouh (1987) and the RC model by Ritov and Gilula (1993).
The correspondence model used in this paper can be expressed as:
P_{ij} = P_{i.} P_{.j} (1 + \lambda \epsilon_i \delta_j)   (2)
where Lambda is the "coefficient of stochastic extremity"
(Gilula, Krieger, and Ritov, 1988), the sets of parameters Epsilon and
Delta are the scores to be "optimized" by maximizing, and
the P's are marginal proportions. "Stochastic extremity"
is a reference to the cumulative distributions of the rows (columns),
which are maximally distanced by this procedure. The coefficient Lambda is a
monotonic correlation in the sense of Kimeldorf and Sampson (1978),
which they define as the supremum of the correlation coefficient over all
possible monotonic functions of the two variables. Perhaps the most
interesting aspect of monotonic correlation is in its relation to
statistical independence. Unlike the case for an ordinary Pearson
correlation coefficient, when a monotonic correlation is zero, then the
variables are independent. The optimization solution in the unrestricted
case of this can be traced at least as far back as Fisher (1940) and
even Hotelling (1933, 1936), and can be easily derived through the
singular value decomposition of a certain matrix (Hirschfeld, 1935; he
later changed his name to Hartley). The latter parameter, Lambda, also has the
felicitous meaning of a canonical correlation. A further appealing
property of the correspondence model (and the RC model, too, for that
matter) is that the data are "stochastically ordered" in the
sense that if the scores are ordered, then the conditional cumulative
probabilities over those scores are similarly ordered (Ritov and Gilula,
1993).
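For concreteness, the first-dimension scores and the coefficient Lambda of equation (2) can be recovered directly from the singular value decomposition mentioned above. The following is a minimal sketch in Python (using NumPy only, not the SPSS routine employed in the study); the ten-by-ten table of counts is randomly generated purely for illustration.

```python
import numpy as np

def correspondence_scores(table):
    """First-dimension row and column scores for a two-way contingency table.

    Returns (row_scores, col_scores, lam), where lam is the leading
    canonical correlation (the "coefficient of stochastic extremity").
    Scores come out centered at zero and scaled to one with respect to
    the marginal proportions, as described for equation (2).
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                       # cell proportions P_ij
    r = P.sum(axis=1)                     # row marginals P_i.
    c = P.sum(axis=0)                     # column marginals P_.j
    # Standardized residuals from independence: D_r^{-1/2}(P - rc')D_c^{-1/2}
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, singular_values, Vt = np.linalg.svd(S)
    lam = singular_values[0]              # leading nontrivial singular value
    row_scores = U[:, 0] / np.sqrt(r)     # Epsilon_i
    col_scores = Vt[0, :] / np.sqrt(c)    # Delta_j
    return row_scores, col_scores, lam

# Illustrative 10 x 10 cross-tabulation (e.g., predictor deciles by MEANRT
# deciles); the counts are random and merely stand in for real data.
rng = np.random.default_rng(0)
toy_table = rng.integers(1, 20, size=(10, 10))
epsilon, delta, lam = correspondence_scores(toy_table)
print("Lambda =", round(float(lam), 3))
```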
These two approaches to constructing the restricted ordinal scales,
correspondence analysis and Goodman's RC model, will in general result
in similar scale values and similarly interpreted measures of
association, so long as the association between the pair of variables is
"weak." (This is
an observation by Goodman, 1981, for unrestricted models extended to
restricted models by Ritov and Gilula, 1993.) This is the situation in
most social sciences and business applications, and is certainly true
for the properties under investigation here. In fact, under the
commonly-held "market efficiency" presumption there should be
no correlation at all.
For this analysis, the particular algorithm is not that from Ritov
and Gilula (1993) in which they reparameterize the scales via a latent
variable approach and then use the EM algorithm to optimize the scores.
This analysis follows the venerable Benzecri (1973) to Gifi (1990)
track, which utilizes an "alternating least squares"
optimization due to Young, de Leeuw and Takane in the 1970s (see de
Leeuw, 1993). The specific implementation is the routine "Optimal
Scoring" developed for the Statistical Package for the Social Sciences
(SPSS) by Meulman (1992; see also the technical annotation for the SPSS
routine).
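The alternating least squares idea can be illustrated with a short sketch. The function below (a minimal stand-in we call monotone_rescore, not the SPSS "Optimal Scoring" implementation) alternates a reciprocal-averaging step with an isotonic projection, so that the predictor's rescoring is forced to be monotone in its original category order while the response side is left unrestricted, as in the analyses reported later. It assumes the integer category codes are already in their intended order.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def monotone_rescore(x_cat, y_cat, n_iter=200, tol=1e-9):
    """Alternating-least-squares rescoring of two categorical variables.

    x_cat, y_cat: integer category codes (e.g., decile memberships 0..9).
    Returns per-observation scores for x and y and their correlation,
    which plays the role of the monotone correlation in the text.
    """
    x_cat, y_cat = np.asarray(x_cat), np.asarray(y_cat)

    def standardize(z):
        return (z - z.mean()) / z.std()

    y = standardize(y_cat.astype(float))          # starting response scores
    x_levels, y_levels = np.unique(x_cat), np.unique(y_cat)
    r_old = 0.0
    for _ in range(n_iter):
        # Reciprocal averaging: each x-category gets the mean partner score.
        x_means = np.array([y[x_cat == k].mean() for k in x_levels])
        # Ordinal restriction: project onto monotone scores, weighted by size.
        weights = np.array([(x_cat == k).sum() for k in x_levels])
        iso = IsotonicRegression().fit(np.arange(len(x_means)), x_means,
                                       sample_weight=weights)
        x_means = iso.predict(np.arange(len(x_means)))
        lookup_x = dict(zip(x_levels, x_means))
        x = standardize(np.array([lookup_x[k] for k in x_cat]))

        # Unrestricted update for the response side.
        y_means = np.array([x[y_cat == k].mean() for k in y_levels])
        lookup_y = dict(zip(y_levels, y_means))
        y = standardize(np.array([lookup_y[k] for k in y_cat]))

        r = float(np.corrcoef(x, y)[0, 1])
        if abs(r - r_old) < tol:
            break
        r_old = r
    return x, y, r

# Toy example: deciles of a predictor versus deciles of MEANRT.
rng = np.random.default_rng(4)
x_dec = rng.integers(0, 10, 563)
y_dec = np.clip(x_dec + rng.integers(-3, 4, 563), 0, 9)
x_scores, y_scores, r = monotone_rescore(x_dec, y_dec)
print("monotone correlation (approx.):", round(r, 3))
```

Without the isotonic step the update reduces to ordinary reciprocal averaging, which should converge to the first correspondence-analysis dimension.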
However, we actually began this analysis not at the ordinal level,
but without any assumptions beyond categorization. Each of the
firms' predictor variables was reduced to deciles and submitted to
a correspondence analysis (SPSS "Optimal Scoring"). The use of
deciles is common in finance literature (e.g., Lakonishok, Shleifer and
Vishny, 1994) and it is the basic beginning data structure of this
paper's analysis. Later, investigation was made into other
"n-tiles" (from quintiles up to "20-tiles") only to
find no real difference in the correspondence analysis results. To
get a picture of the scale and unusual nature of these financial data,
consider the decile means of the variable measuring the average
over several years of the 12-month wealth appreciation (MEANRT). This
will be the primary response variable for the subsequent analysis, and a
major goal will be to rescale this to a manageable metric. While a
nearly linear pattern can be seen over the middle seven deciles, the
first and last two deciles break that pattern. The highest decile had an
average return four times that of the ninth decile. Note also that half
of the deciles had average performances which returned either no gain
or, more likely, a loss to those who held the stocks for a year.
This variable is defined by Compustat as: "The Total Return
concepts are annualized rates of return reflecting price appreciation
plus reinvestment of monthly dividends and the compounding effect of
dividends paid on reinvested dividends" (Research Insight, 2001).
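The reduction to deciles is mechanical; a minimal pandas sketch (the two columns and their values are invented, and the same call would be applied to each of the 25 predictors) is:

```python
import pandas as pd

# Invented values for two of the paper's variables, for illustration only.
ipo = pd.DataFrame({
    "TOTASS": [12.4, 890.0, 45.2, 3.1, 150.7, 22.9, 610.3, 75.0, 5.6, 301.2],
    "MEANRT": [-42.0, 8.5, -10.1, -95.3, 3.2, 12.7, 55.4, -20.6, -60.0, 4.1],
})

# Reduce each variable to decile memberships (1..10).  duplicates="drop"
# guards against ties producing fewer than ten distinct bin edges.
deciles = pd.DataFrame({
    name: pd.qcut(col, 10, labels=False, duplicates="drop") + 1
    for name, col in ipo.items()
})
print(deciles)
```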
The first set of regressions consisted of bivariate analyses of the
deciles for each of the 25 financial scales and ratios versus the mean
annual wealth appreciation for the period January 1998 (for those
companies that went public very early in 1997) to February 2001. Each of the 25
predictor variables was selected because it had been considered in
earlier research and was available at or near the time of issuance of
the IPO. (The results of the nominal-scaled analysis showed that the
predictor variables could be reasonably approximated as ordinal without
much loss of correlation, so the 25 optimal scaling analyses were re-run
forcing an ordinal restriction on the predictors but not on the
response). The table below and the subsequent graph show an example of
the results of those nominal-ordinal analyses. In general, the
relationships between the predictors and response were not strong, but
for many they were far from negligible.
[FIGURE 1 OMITTED]
One is struck immediately by the respectable performance of such
variables as TOTASS (total assets, r = .355), TOTLIAB (total
liabilities, r = .344), EBITDA (earnings before interest, taxes, depreciation, and amortization, r = -.343),
OPER.AT (operating income to assets ratio, r = -.328), NETPROFM (net
profits, r = -.319), RETE.AT (retained earnings to assets ratio, r =
-.310), and SALES (net sales, r = -.306). These correlations are, of
course, only potentials. They are the maxima found by a process designed
to adjust the response and predictor measures (row and column scores)
monotonically until such maxima are achieved. (Recall that ordinary
Pearson correlation coefficients are also maxima derived from a process
of optimization over all possible linear relations.) But, these are high
enough to be encouraging. Note also the valid sample sizes. No attempt
was made to eliminate any special classes of businesses (REITs,
financial institutions, etc.); if they reported data, they were
included. And, for many of these variables, 90% or more of the 563 total
firms did report the predictor variable. (See Table 1 below.)
Below are the graphs relating the original "raw" ordinal
integer scores to the rescoring values ("quantifications") for
an example variable which gave rise to the correlations above. There is
a persistent non-linear pattern in the MEANRT (average wealth
appreciation) response scorings. It strongly suggests that one end of
the predictor variable scale (the right end is indicated by the sign on
the correlation coefficient in the table above) is related to middling
performance at or above the median ("above average firms") and
that the variable is less able to discriminate between those at the
extremes. The top decile and the bottom three response deciles receive
nearly identical scores.
[FIGURE 2 OMITTED]
This example is only one of the very clear pictures resulting from
the pairings of wealth appreciation ("MEANRT") with each of
the 25 predictors. The rescoring for total assets (TOTASS chart on
right) puts the bottom 70% of the firms' assets at virtually the
same score, then distinguishes between the top three deciles. The
response curve (MEANRT-chart on left) has the interior sixth and seventh
deciles well above the others. It would appear that the larger
three-tenths of companies (in terms of assets size) tend to be in the
sixth and seventh decile (in terms of wealth appreciation). This pair of
deciles will be called the "MID" group.
Almost half (44%) of the middle wealth group were among the largest
companies (when they went public), while only about 13% of the remaining
companies were in that largest size group. It is apparent that the
rescoring found by the procedure and charted above, which has the effect
of equating the lowest seven deciles, leads to a linear pattern relating
the two variables.
Stage I
Having discovered the (relative) isolation of the MID Group,
recourse was made to a logistic regression to develop a prediction
scheme for this middle group. As compared to discriminant function
analysis, Press and Wilson (1978) showed that logistic regression is to
be preferred, primarily because the latter is better equipped to handle
non-normal predictors. All of the variables identified above as having
high potential correlations were tried out. The best prediction equation
(after a variation of the all possible subsets regression paradigm)
turned out to be:
[FIGURE 3 OMITTED]
The detail on the model includes the fact that the model involved
all but 38 of the firms: some variables were eliminated because too few
firms reported the data (e.g., 345 firms listed no intangible assets);
mean annual wealth appreciation was not available for seven firms; and
31 had no assets or net profits reported around the time of their IPO.
From this table (Table 2 above) we conclude that above average
performance (MID Group membership) is associated with larger total
assets and larger net profit margins--and that each (standard deviation)
step up the (rescored) asset ladder results in an increase in the odds
of being in the above average category (the "Exp(B)" column) of 85%,
while a corresponding step on the net profit scale results in an
increase of the odds of 42%. (Note that the direction of the association
must be checked against the Optimal Scoring runs above.) The
classification table for this regression shows that when the cut-off is
adjusted so that the prediction equation puts 101 firms into the MID
Group, it does so accurately in 45 (44%) of the cases. This adjustment
of the posterior odds corresponds to placing the costs of
misclassification other than at one-to-one. Instead, the cost of
misclassifying the smaller quintile--a false negative--was put closer
to four-to-one. This assignment of costs is in the spirit of recent
papers advocating re-evaluation of previous simplistic cost structures
(Provost, Fawcett, and Kohavi, 1998, Adams and Hand, 1999, Drummond and
Holte, 2007). Overall, 413 or 79% of the 525 firms for which all of the
data were available were correctly classified at this initial stage.
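A sketch of this first stage, under stated assumptions, follows: a logistic regression whose exponentiated coefficients give the "Exp(B)" odds effects, with the classification cut-off lowered until roughly 101 of the 525 firms are flagged, mimicking the adjustment of the posterior odds described above. The data are synthetic stand-ins for the rescored TOTASS and NETPROFM quantifications, so the fitted numbers will not reproduce Table 2.

```python
import numpy as np
import statsmodels.api as sm

# Synthetic stand-ins for the standardized quantifications of TOTASS and
# NETPROFM; y flags MID-group membership.  Purely illustrative data.
rng = np.random.default_rng(1)
n = 525
X = rng.standard_normal((n, 2))
y = (0.6 * X[:, 0] + 0.35 * X[:, 1] + rng.standard_normal(n) > 1.3).astype(int)

logit = sm.Logit(y, sm.add_constant(X)).fit(disp=False)
print(np.exp(logit.params))         # the "Exp(B)" column (constant first)

# Rather than the default 0.5 probability cut-off, lower the threshold until
# roughly 101 of the 525 firms are flagged as MID, which is one way of
# pricing a false negative more heavily than a false positive.
p = logit.predict(sm.add_constant(X))
threshold = np.quantile(p, 1 - 101 / n)
predicted_mid = p >= threshold
print(int(predicted_mid.sum()))     # approximately 101 firms flagged
```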
Stage II
At the second stage, the 101 firms identified by the initial
logistic regression were excluded from the analysis, and a new search for
predictors began. The result of this search for predictors of the TOP
group of IPO performers was three variables. Sales, receivable turnover
(RECTURN), and the ratio of retained earnings to assets (RETE.AT) all
entered the model, and all did so with negative coefficients, indicating
inverse relationships to the response, wealth appreciation. The details
of the table below show the significance of each of the three
predictors, and their impact on the probability of inclusion in the TOP
quintile of mean annual wealth appreciation firms.
The prediction equation places 85 firms into the TOP group,
and it is accurate in 37 (or 44%) of those predictions.
A total of 295, or 74% of the 396 firms available at this stage of
the analysis, were accurately predicted at this stage.
Stage III
The balance of the data, omitting the 101 predicted in Stage I and
the 85 predicted in Stage II, were again subjected to logistic
regressions with the aim of predicting membership in the lowest
quintile. This LOW Group prediction equation involved two variables,
current debt (DEBTCURR) and marketing expenses (SELLCOST). Perhaps
intuitively, current debt is a positive indicator for the LOW group,
while marketing effort is a negative indicator.
The details indicate that 270 firms remained with scores available
for the analysis. Every step up of one standard deviation in DEBTCURR
doubles the odds of being in the LOW Group, while each similar step along
the SELLCOST dimension reduces the odds by 37%.
The classification table below shows that 43 cases were
predicted into the LOW Group, with 17 of these accurate, or 40% correct
prediction among them. Overall, 198, or 73% of the 270 used in this
analysis were correctly predicted.
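The bookkeeping of the three stages amounts to a simple cascade: each stage is fit only on the firms not claimed by an earlier stage. The sketch below captures that structure; fit_stage and dummy_rule are hypothetical stand-ins for the fitted logistic regressions, and the data are random.

```python
import numpy as np
import pandas as pd

def staged_predictions(df, fit_stage):
    """Apply the three-stage scheme: each stage claims firms for its target
    group and removes them before the next stage is considered.

    df: one row per firm, columns holding the rescored predictors.
    fit_stage: callable(frame, predictor_names, group_name) returning a
    boolean array marking the firms that stage assigns to its group.
    """
    stages = [
        ("MID", ["TOTASS", "NETPROFM"]),            # Stage I
        ("TOP", ["SALES", "RECTURN", "RETE.AT"]),   # Stage II
        ("LOW", ["DEBTCURR", "SELLCOST"]),          # Stage III
    ]
    remaining, assigned = df, {}
    for group, predictors in stages:
        claimed = np.asarray(fit_stage(remaining, predictors, group), bool)
        assigned[group] = remaining.index[claimed]
        remaining = remaining.loc[~claimed]         # exclude before next stage
    return assigned

# Toy demonstration with random data and a dummy rule standing in for the
# fitted logistic regressions of Tables 2, 4, and 6.
rng = np.random.default_rng(3)
cols = ["TOTASS", "NETPROFM", "SALES", "RECTURN", "RETE.AT",
        "DEBTCURR", "SELLCOST"]
firms = pd.DataFrame(rng.standard_normal((563, len(cols))), columns=cols)

def dummy_rule(frame, predictors, group):
    return frame[predictors].mean(axis=1) > 0.8

groups = staged_predictions(firms, dummy_rule)
print({g: len(idx) for g, idx in groups.items()})
```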
DISCUSSION AND SUMMARY
A summary of this multi-stage prediction scheme would include its
ability to predict performance at three places along the continuum of
mean annual wealth appreciation. At the outset, all the variables are
segmented into deciles to preserve the ordinality while discarding the
exact metric of the variables. This is done for two reasons: 1) the
distribution of firms along these original variables is distinctly
non-normal with astronomical skewness and kurtosis numbers; and 2) it is
believed that the accounting practices vary between firms and that their
reported numbers bear only a tenuous relationship to the underlying
reality they purport to represent. The deciles are then submitted to a
re-scoring routine developed by Benzecri, Gifi, and others. This
rescoring attempts to maximize a measure of monotone correlation
(Kimeldorf & Sampson, 1978). It is these
"quantifications," or rescorings, that are used in the balance
of the analysis. The model uses different variables at each of three
stages. It starts by trying to predict membership in the middle of the
distribution (the sixth and seventh deciles). It is entirely possible
that the non-linear relationship being exploited here is the famous
"horseshoe" effect discussed throughout Gifi (1990). It
was originally believed that if an accurate prediction could be made of
the middle of the distribution, then the middle could be deleted and it
would be relatively easier to predict the extremes. While this ease did
not eventuate, it was possible to make respectable predictions at each
stage.
The summary table for the prediction is as follows. There were, at
the outset, three sets of firms of interest: MID, TOP and LOW.
The three accounted for 60% of the firms. The procedure proposed herein
actually predicted 253, or 45% of the original 563 firms and was
accurate in 99, or 39% of those predictions. However, when the absence
of key accounting data is taken into account, the procedure actually was
accurate in about three-fourths of its predictions.
CONFIRMATION
The original model was built on data from all IPOs issued during
the calendar year 1997. Since the criterion variable of most interest
was that of "annual wealth appreciation" (following Gompers
and Lerner, 1999 and many others), the earliest data came from January
1998, when only 31 firms were able to show the requisite 13 months of
data in order to be able to calculate the annual rate of change. One
year later, all 563 firms were showing wealth appreciation figures. By the end
of the model period, January 1998 up to February 2001, virtually all of
the 1990s "bubble" had evaporated, and the "interim"
period from March 2001 up to July 2003 saw a relatively subdued market
with most of the 1997 class of IPO stocks taking money from their
investors. (Table below.) As the table demonstrates, only the top
quintile was performing at annual rates above zero (returning one dollar
in market value plus accumulated and reinvested dividends for one dollar
invested at the beginning of the 12 month period). Over 40% had
essentially disappeared from the markets, either through failure,
merger, purchase, or any of the other exit routes from the markets. This
exodus posed a difficult challenge for the model.
[FIGURE 7 OMITTED]
The present confirmation study involved inspection of data from
Standard & Poor's Research Insight and included all but 24 of
the original IPOs; these firms showed stock price and other company
information up through July 2003. It might be expected that the missing
companies would be from the lower performing groups. In fact, though,
seven were from the TOP group, three from the MID group, and nine were
from the LOW group.
The additional data provided nearly 1.5 years (17 months, the
so-called "interim" period) of data beyond the earlier
analysis, which had a cut-off date of February 2001.
A more serious concern was that many of the companies were missing
one or more months of wealth appreciation data. In fact, 164 or 30.4%
were missing at least 20 of the months (the maximum missing was 29
months). Again, however, there was no indication that the absence of
data was related to the performance of the company's stock. The
percentages of those missing 20 data points or more were 28%, 35%, and
26% in the TOP, MID and LOW groups, respectively (these groups are those
established in terms of their mean performance over the original three
years). And, the same missing class represented 25%, 31%, and 27% of
those firms predicted to be in the TOP, MID and LOW groups,
respectively.
The missing data, then, was not seen as a differentially distorting
factor. All missing data (from 1999 on) were replaced by the value
"-100," representing the depressing fact that for most of
these firms their investors would have lost all of their investment had
they held onto their stock for the full 12 months. Discriminating
between the various shades of disappearances will be the subject of a
future analysis. For the purposes of this confirmation analysis,
however, any distortions due to this oversimplification will simply be
absorbed into the error estimation for the model.
The Model Tested. Two tests formed the basis of the confirmation of
the model: 1) study of the mean shifts of the predicted groups since
they were formed in mid-2001; and 2) study of the composition churn in
the predicted groups since mid-2001.
The questions to be answered are: 1) Are the predicted groups performing
at their (relative) predicted levels over the full five-year period? 2)
Does the model still predict full five-year performance better than
chance?
As might be expected from the overall patterns, the predicted
groups performed much differently in the "interim" period than
they had during the original three years. In the original period the TOP
group demonstrated a mean annual growth of about 50%, while the MID
group (actually the sixth and seventh deciles) was nearly even and
the LOW group depreciated at about a third per year. All three groups
declined during the interim period, and they did not all decline at the
same rate (p-value = .03 for a simple ANOVA including only the data from
the three groups). The TOP group actually declined the most, losing more
than 50 percentage points from its mean Wealth Appreciation level during
the original period. The MID group dropped the least, but still
performed at a level about 30 percentage points below its pre-2001
level. The LOW group lost performance at a rate mid-way between TOP and
MID.
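The "simple ANOVA" referred to here is a one-way analysis of variance across the three predicted groups' interim figures. A sketch with synthetic data (group sizes and approximate means taken from Table 9; the dispersion is an assumption) is:

```python
import numpy as np
from scipy.stats import f_oneway

# Synthetic interim annual wealth-appreciation figures for the predicted
# groups; sizes (75, 98, 43) and rough means follow Table 9, while the
# spread of 40 percentage points is assumed for illustration.
rng = np.random.default_rng(2)
top = rng.normal(-51, 40, 75)
mid = rng.normal(-32, 40, 98)
low = rng.normal(-45, 40, 43)

f_stat, p_value = f_oneway(top, mid, low)
print(round(float(f_stat), 2), round(float(p_value), 4))
```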
However, despite the distortions caused by the differential interim
performances, the five-year means retained their original relative
standings. The TOP group lost all investor advantage they had had in the
earlier period to return essentially what was invested, while the MID
group slid to a mean loss level of about a sixth of what was invested 12
months earlier. The LOW group maintained both its relative position at
the bottom, and its absolute level of draining about one-third of its
investors' money each year. The statistical test of these five-year
performances was very significant (p-value under .0001).
Finally, note that the range between the means of the TOP and LOW
prediction groups dropped from about 85 percentage points to about 41.5
percentage points. This is a very striking reduction and is no doubt at
least partially due to some sort of "regression to the mean"
effect. However, the entire market for IPOs went through a compression
relative to the earlier period. For example, if one looks at the
"interdecile range" (difference between the tenth and
ninetieth percentiles), its value for the original period was about 148
percentage points. The value for the interim period was about 135 points
while the overall five-year interdecile range was only 118 percentage
points (not shown).
Further, a comparison of the performance groups' composition
based on all five years of wealth appreciation shows the same pattern as
the means above. While the original model correctly predicted 43.2% of
the training set (99 out of 229 predicted into one of the groups) into
the three quintiles, the quintiles based on the full five years'
data were correctly predicted 29.6% of the time (64 of the 216 firms in
the TOP, MID and LOW quintiles were accurately predicted).
Under the random model, in which only 20% of the quintile
predictions should be accurate, a chi square goodness-of-fit test of the
original model had a test statistic value of 88.6 with a p-value below
10^-20 (a figure starting with 20 zeros), indicating very high
significance. The test of
the updated model had a chi square of 12.5 with a p-value of 0.0004.
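The goodness-of-fit computation for the updated model can be reproduced from the counts reported above (64 correct of the 216 firms in the target quintiles, against the 20% expected under the random model):

```python
from scipy.stats import chisquare

observed = [64, 216 - 64]                 # correct, incorrect predictions
expected = [0.2 * 216, 0.8 * 216]         # the 20% "random" benchmark
stat, p = chisquare(f_obs=observed, f_exp=expected)
print(round(float(stat), 1), float(p))    # approximately 12.5 and 0.0004
```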
DISCUSSION
The purpose of this empirical analysis was to evaluate a model
created in mid-2001 which has as its objective the prediction of annual
wealth appreciation performances over a three-year period (January 1998
to February 2001) of 563 IPOs issued in 1997. Since creation of the
model, the markets have suffered through a prolonged period of poor
returns to their investors. In fact, the data do suggest that forecasting
during the pre-2001 period may have been more likely to be successful
than the "interim" period since. The great bulge in the
performance of the TOP group (top quintile) had disappeared in early
2001. However, the strength of the stocks during 1998-2001 was enough
that the five-year (actually January 1998 to July 2003) means still
reflected that earlier performance. Perhaps it is a further mark of the
potency of the model that it was able to weather these extremes.
As with all models, there are more questions than answers. While this
empirical test may have contributed to the question of whether any
modeling might be effective, we still are concerned about generalizing
it. There are structural issues about using the techniques in other
times and for other types of equities. There are substantive questions
about cause and effect: the process by which the variables utilized
in the prediction scheme materialize and lead to the results in the
markets.
Future research is very much needed in several areas. To start
with, it would be of great interest to see if the non-metric,
"deconstructive" methods used to develop this model will be
similarly successful using the full five years' data. And, if
successful, is the five-year model similar to the earlier one? Does the
new model "find" the same groups? Does it employ the same, or
related, variables? (These variables are, to put it mildly,
inter-correlated. Each of the 26 predictors considered relates to one of
only a handful of underlying factors. While an untangling of the
variable intercorrelations might shed light on which of the variables is
most or least effective, the overall model strength should not be
affected.)
Use of alternative classification algorithms to the logistic
regression techniques is also worth considering. Some recent research
has found the new boosting procedures useful in predicting corporate
failures (Cortes et al., 2007). On the other hand, it should be noted
that the inherent strengths of older techniques like logistic regression
have been shown both in a study of successful companies very similar to
this one (Johnson and Soenen, 2003), and in more basic research coming
out of the statistical and machine learning communities (Holte, 1993,
Lim et al., 2000).
More work needs to be done on the "missing" firms. Not
all of the missing firms suffered catastrophic declines. Perhaps more
artful estimates of the transformed entities derived from the original
563 IPOs will provide more insight into the model's accuracy.
Not only the missing group, now accounting for nearly half of the
1997 IPOs, is of interest. It would be instructive to follow up on the
TOP group. How many of those in the TOP group are still up there? How
does the turmoil in TOP membership relate to the prediction model? What
characterizes the TOP firms which maintained versus those that slid?
REFERENCES
Adams, N. M. & Hand, D. J. (1999). Comparing classifiers when
the misallocation costs are uncertain. Pattern Recognition, 32,
1139-1147.
Agresti, A., C. Chuang & A. Kezouh. (1987). Order-restricted
score parameters in association models for contingency tables. Journal
of the American Statistical Association, 82, 619-623.
Altman, E. (1968). Financial ratios, discriminant analysis and
prediction of corporate bankruptcy. Journal of Finance, September 1968,
589-609.
Bardos, Mireille (2000). Detection of company failure and global
risk forecasting, in Data Analysis, Classification and Related Methods,
Kiers, Henk A.L., Rasson, Jean-Paul, Groenen, Patrick J. F. and Schader,
Martin (Eds.). Berlin, Springer.
Beaver, W. (1967). Financial ratios as predictors of failure,
Empirical Research in Accounting, Selected Studies, Supplement to
Journal of Accounting Research, 5, 71-111.
Bridgman, P. (1927). The logic of modern physics, New York,
Macmillan.
Chatfield, C. (1995). Model uncertainty, data mining and
statistical inference with comments, Journal of the Royal Statistical
Society, Series A (Statistics in Society), 158, 419-466.
Cortes, Alfaro E., Gamez Martinez, M. & Garcia Rubio, N.
(2007). Multiclass corporate failure prediction by Adaboost.M1.
International Advances in Economic Research, 13, 301-312.
Cox, D. & D. Oakes (1984). Analysis of survival data, London,
Chapman & Hall.
de Leeuw, J. (1993). Some generalizations of correspondence
analysis. Retrieved April 15, 2003 from
http://citeseer.nj.nec.com/deleeuw93some.html.
Drummond, C. & Holte, R. C. (2007). Cost curves: An improved
method for visualizing classifier performance. Machine Learning, 65,
95-130.
Edmister, R. (1972). An empirical test of financial ratio analysis
to small business failure prediction. Journal of Financial and
Quantitative Analysis, 7, 1477-1493.
Fisher, R. (1940). The precision of discriminant functions. Annals
of Eugenics, 10, 422-429.
Fisher, F. (1986). Statisticians, econometricians, and adversary
proceedings. Journal of the American Statistical Association, 81, 277-286.
Gifi, A. (1990). Nonlinear multivariate analysis. Chichester, John
Wiley & Sons.
Gilula, Z., A. Krieger & Y. Ritov (1988). Ordinal association in
contingency tables: Some interpretive aspects. Journal of the American
Statistical Association, 83, 540-545.
Gompers, P. (1995). Optimal investment, monitoring, and the staging
of venture capital. Journal of Finance, 50, 1461-1490.
Gompers, P. and J. Lerner (1999). The venture capital cycle.
Cambridge, Mass., The MIT Press.
Hand, D. (1996). Statistics and the theory of measurement (with
comments). Journal of the Royal Statistical Society, Series A,
159, 445-492.
Hirschfeld, H. (1935). A connection between correlation and
contingency. Proceedings of the Cambridge Philosophical Society,
31, 520-524.
Holte, R. C. (1993). Very simple classification rules perform well
on most commonly used datasets. Machine Learning, 11, 63-90.
Hotelling, H. (1933). Analysis of a complex of statistical
variables into principal components. Journal of Educational Psychology,
24, 417-441, 498-520.
Hotelling, H. (1936). Relations between two sets of variates.
Biometrika, 28, 321-377.
Johnson, R. & Soenen, L. (2003). Indicators of successful
companies. European Management Journal, 21, 364-369.
Kimeldorf, G. & A. Sampson (1978). Monotonic dependence. Annals
of Statistics, 6, 895-903.
Krantz, D., R. Luce, P. Suppes & A. Tversky (1971). Foundations
of measurement, vol. 1, Additive and polynomial representations. New
York, Academic Press.
Lakonishok, J., J. Shleifer & R. Vishny (1994). Contrarian
investment, extrapolation and risk. Journal of Finance, 49, 1541-1578.
Lim, T-S., Loh, W-Y. & Shih, Y-S. (2000). A comparison of
prediction accuracy, complexity, and training time of thirty-three old
and new classification algorithms. Machine Learning, 40, 203-228.
Luce, R., D. Krantz, P. Suppes & A. Tversky (1990). Foundations
of measurement, vol. 3, Representation, axiomatization, and invariance.
San Diego: Academic Press.
McCullagh, P. (1980). Regression models for ordinal data (with
discussion). Journal of the Royal Statistical Society, Series B, 42,
109-142.
Meulman, J. (1992). The integration of multidimensional scaling and
multivariate analysis with optimal transformations. Psychometrika, 57,
539-565.
Miller, J. M. (2003). Venture capital, entrepreneurship, and
long-run performance prediction: An application of data mining.
Unpublished doctoral dissertation, Rice University.
Press, S. & S. Wilson (1978). Choosing between logistic
regression and discriminant analysis. Journal of the American
Statistical Association, 73, 699-705.
Provost, F., Fawcett, T. & Kohavi, R. (1998). The case against
accuracy estimation for comparing induction algorithms. In: Proceedings
of the Fifteenth International Conference on Machine Learning, 43-48.
Ritov, Y. & Z. Gilula (1991). The order-restricted RC model for
ordered contingency tables: Estimation and testing for fit. Annals of
Statistics, 19, 2090-2101.
Ritov, Y. & Z. Gilula (1993). Analysis of contingency tables by
correspondence models subject to order constraints. Journal of the
American Statistical Association, 88, 1380-1387.
Stevens, S. (1951). Measurement, statistics, and psychophysics. In
Stevens, S. (Ed.), Handbook of Experimental Psychology. New York: John
Wiley & Sons.
Suppes, P., D. Krantz, R. Luce & A. Tversky (1989). Foundations
of measurement, vol. 2, Geometrical, threshold, and probabilistic
representations. San Diego: Academic Press.
Tukey, J. (1977). Exploratory data analysis. Reading, Mass.:
Addison-Wesley.
Tukey, J. (1980). We need both exploratory and confirmatory. The
American Statistician, 34, 23-25.
Velleman, P. & L. Wilkinson (1993). Nominal, ordinal, interval,
and ratio typologies are misleading. The American Statistician, 47,
65-72.
Zopounidis, C. & A. Dimitras (1998). Multicriteria Decision Aid
for the Prediction of Business Failure. Boston: Kluwer Academic
Publishers.
John Miller, Sam Houston State University
Robert Stretcher, Sam Houston State University
Table 1: Optimal Scoring Correlations
Predictor    Valid Data    Correlation    R^2
EBITDA 515 -0.343 0.118
SALES 532 -0.306 0.094
COSTSALE 532 -0.254 0.064
SELLCOST 442 0.290 0.082
INTEXP 504 -0.299 0.090
CURASTOT 469 0.238 0.045
INTANG 556 -0.179 0.032
TOTASS 540 0.355 0.125
DEBTCURR 507 -0.225 0.051
TOTCLIAB 476 0.241 0.058
TOTLTDBT 540 0.276 0.076
TOTLIAB 540 0.344 0.118
DEBTEBIT 515 -0.262 0.068
LIABNETW 540 -0.201 0.041
EBITASS 515 -0.279 0.078
CURRENTR 468 0.208 0.043
QUICKRAT 474 0.206 0.043
RECTURN 489 -0.209 0.044
TOTASST 507 -0.176 0.031
CASHTURN 505 -0.269 0.073
NETPROFM 525 -0.319 0.101
WCAP.AT 467 0.186 0.034
RETE.AT 534 -0.310 0.096
OPER.AT 439 -0.328 0.108
SALES.AT 509 -0.181 0.033
Source: Compustat estimates of mean wealth appreciation for 563 IPOs
in 1997. Input data were decile memberships and were not forced to
be ordinal.
Table 2: Prediction of the MID Group
Variable Coefficient Standard Error Significance Exp(B)
TOTASS 0.6157 0.1058 0.0000 1.8510
NETPROFM 0.3506 0.1434 0.0145 1.4199
Constant -1.6119 0.1277 0.0000 0.1995
Source: Compustat; logistic regression
Table 3: Stage I--Classification Table
Predicted \ Actual    Others    MID Group    Total
Others                   368           56      424
MID Group                 56           45      101
Total                    424          101      525
Table 4: Stage II--Prediction of the TOP Group
Variable Coefficient Standard Error Significance Exp(B)
SALES -0.4169 0.1714 0.0150 0.6591
RECTURN -0.3527 0.1544 0.0224 0.7028
RETE.AT -0.5055 0.1313 0.0001 0.6032
Constant -1.4973 0.1463 0.0000 0.2237
Source: Compustat, logistic regression
Table 5: Stage II--Classification Table
Predicted \ Actual    Others    TOP Group    Total
Others                   258           53      311
TOP Group                 48           37       85
Total                    306           90      396
Table 6: Stage III--Prediction of the LOW Group
Variable Coefficient Standard Error Significance Exp(B)
DEBTCURR 0.676 0.2389 0.0047 1.9660
SELLCOST -0.461 0.1705 0.0069 0.6307
Constant -1.158 0.1573 0.0000 0.3141
Source: Compustat; logistic regression
Table 7: Stage III--Classification Table
Predicted \ Actual    Others    LOW Group    Total
Others                   181           46      227
LOW Group                 26           17       43
Total                    207           63      270
Table 8: Classification Results Summary
                                  STAGE I     STAGE II    STAGE III
Group Predicted                   MID         TOP         LOW
Prediction Variables              TOTASS      SALES       DEBTCURR
                                  NETPROFM    RECTURN     SELLCOST
                                              RETE.AT
Odds Effects                      1.85        0.66        1.97
                                  1.42        0.70        0.63
                                              0.60
Firms Available for Prediction    525         396         270
Firms in Target Group             101         90          63
Predicted Total                   101         85          43
Predicted Accurately              45          37          17
Conditional Prediction
  Accuracy Rate                   79%         74%         73%
Table 9: Mean Shifts in Wealth Appreciation
Predicted Group    1997 to Feb 2001    Interim (Mar 2001 to July 2003)    1997 to July 2003    Number of Firms
TOP                           51.20                             -51.44                 2.73                 75
MID                           -5.26                             -31.61               -17.96                 98
LOW                          -33.87                             -44.77               -38.82                 43
Other                         -5.35                             -31.44               -17.57                324
Total                          0.25                             -35.31               -16.52                540
ANOVA tests of the means from the three prediction groups alone have
F ratios and p-values of 14.114, 0.00000175; 3.542, 0.0306;
7.814, 0.000531, respectively.
Table 10: Prediction Accuracy--Wealth Appreciation Quintiles
                 1997 to Feb 2001         1997 to July 2003
Group            Correct      Total       Correct      Total
TOP                   37         85            23         75
MID                   45        101            27         98
LOW                   17         43            14         43
Total              43.2%       100%         29.6%       100%