Testing Bayesian updating with the Associated Press Top 25.
Stone, Daniel F.
I. INTRODUCTION
Most studies of Bayesian updating use experimental data. (1)
Although this research has led to many insights into human behavior, it
is inherently subject to a few criticisms. A common one is that in
experimental settings agents lack expertise. Many experiments address
this concern by giving subjects opportunities to practice. Still, the
issue can never be completely mitigated. The intuition real-world agents
gain from years of experience is not replicable in the lab. (2)
This article uses a non-experimental data source--the voter ballots
of the Associated Press (AP) "Top 25" college (American)
football poll--to contribute to the behavioral economics belief updating
literature. (3) The AP poll is a weekly subjective ranking of the top
teams by dozens of experienced journalists. As the rankings are revised
primarily in response to game results, the main signals causing rank
updating are publicly observable. As each team plays at most one game
per week, there is at most one major new signal per team per week. These
features of the institutional setting make it an ideal field context for
analyzing belief updating. Furthermore, the prior and signal
distributions are much richer than those typically used in lab work,
where subjects often use binary random variables to update beliefs on a
single binary state variable. The richness of the rankings data yields
evidence of a variety of behaviors: that the poll voters appear to
Bayesian update in some situations, and respond both excessively
(overreact) and insufficiently (underreact) to new information in other
circumstances. The results both support the idea that even experts in
real-world settings indeed sometimes act in a non-Bayesian way, and
enhance understanding of the underlying causes of the different types of
belief updating behavior.
The article's empirical method can be summarized very briefly
as follows. I first estimate the voters' weekly Bayesian posterior rankings (henceforth the estimated posteriors). I then test for
systematic differences between the estimated posteriors and the
voters' actual posterior rankings (the observed posteriors), and
whether these differences are associated with particular contextual
factors.
Estimating the Bayesian posteriors is the main challenge. It is
unclear how voters "should" update their ranks, as the
instructions the AP provides voters regarding how to rank teams are vague
and intended to allow for subjective interpretation. (4) The estimation is based on a model of Bayesian updating that assumes, for each voter-season, there exists a "true" ranking--the ranking based on
full information on team qualities, performances, and any other factors
considered relevant--and that each voter's weekly goal is to rank
teams as closely as possible to this true ranking. The model thus
implies that to rank teams optimally each week voters should Bayesian
update their beliefs about true ranks. I estimate these Bayesian updated
beliefs by using each voter's final ranks for the season as an
estimator, or proxy, for her true ranks for that season. This allows use
of the empirical relations between final ranks and earlier ranks, and
final ranks and game results, to construct posterior distributions.
These distributions are then translated into the estimated posterior ranks.
The final ranks are a natural proxy for true ranks for two main
reasons. First, final ranks are based on more information than all
earlier ranks from the same season, and so should be the most precise
ranks for the season, from each voter's perspective at least.
Second, using the voters' own final ranks to proxy
"truth" allows each voter-season to have its own definition of
truth. Given the subjective nature of the rankings this seems
preferable, as opposed to imposing a single ideal ranking on all voters
(such as the aggregate final ranks). Still, other plausible proxies are
explored in robustness checks, discussed further below.
It is worth noting this method implies the estimated posteriors can
be interpreted as estimates of how to update toward a voter's own
final ranks as "quickly" as possible. If voters failed to do
this it would indicate they fail to use available information
efficiently, that is, they use this information in a non-Bayesian way.
However, as a voter's own final ranks could be flawed, updating
toward them maximally "quickly" is just a necessary, and not a
sufficient, condition for Bayesian updating. (5) I elaborate on this
issue, and all the details of the empirical method and model, further
below, especially in Section III. In particular, I provide evidence that
rank precision increases within the season.
In Section IV, after first showing evidence of the validity of the
estimated posteriors, I use ordinary least squares regressions to test
the null hypothesis of Bayesian updating. There are a number of results
supporting rejection of the null. Voters do not respond sufficiently to
home status and margin of victory over unranked opponents. (6) The
voters' responses to margin of victory over ranked opponents and
margin of loss against all opponents are closer to the estimated
posterior responses. The latter types of signals, losses and wins over
ranked teams, occur less often and are relatively informative regarding
the teams' final ranks. The results are thus consistent with the
voters being, in a sense, selectively Bayesian: determining their
posteriors in a "more Bayesian" way only in response to
relatively infrequent and informative signals. Voters seem to use a
"win is a win" heuristic the rest of the time. Voters may be
more responsive to characteristics of losses and wins over ranked teams
because those games are more salient, that is, receive more attention
from analysts, players, and coaches, and there are information
processing costs. (7) It is also possible that voters prefer to adjust
prior ranks minimally to preserve reputation or ego, and consequently
ignore evidence that is relatively ambiguous (Sloman, Fernbach, and
Hagmayer 2010).
There are also considerable deviations from Bayesian behavior that
seem to result from subtle, or non-salient, differences in the precision
of prior distributions for teams with different prior ranks. The
estimated priors imply the most highly ranked teams are much better than
moderately ranked teams, while the difference between moderately and
low-ranked teams is small. (8) Thus, voters should have relatively
precise prior beliefs for the ranks of the very best teams. This greater
precision implies that voter responses to losses by top-ranked teams
should be small: the mean estimated rank decline after losses by teams
ranked 1-5 is only 5.7 spots, whereas the mean estimated decline
following losses by teams ranked 6-10 is 8.9 spots. The mean observed
declines are 7.7 and 8.4, respectively. In other words, voters downgrade top 1-5 teams by around 2 (7.7-5.7) spots more than they should after
losses. (9)
In summary, a parsimonious explanation for the different types of
behavior is that voters are more responsive to information that is more
salient and less vague, conditional on actual information content.
Although the salience bias and its effects on belief updating are
recognized in psychology and even the popular press, (10) the
relationship does not seem fully appreciated by the economics literature
on belief updating. Barberis and Thaler (2003) note the relevance of
salience to over/underreaction, (11) but do not cite any studies
directly supporting this idea, although their article is an in-depth
survey. Recent experimental work (Holt and Smith 2009) and theory
(Epstein, Noor, and Sandroni 2010) seems to neglect variation in the
degree of salience among signals to focus on other issues. Recent work
on salience and attention (Chetty, Looney, and Kroft 2009) does not
focus on belief updating. The results from this article help fill this
gap in the literature. The results also support the theory that the
salience bias may sometimes be an optimal heuristic, as more salient
games are relatively informative, although ignoring variation of prior
precision seems strictly suboptimal. (12)
Before proceeding, it should be highlighted that using subjective
beliefs (voters' own final ranks) as a proxy for true values is an
unusual, and perhaps unique, empirical approach. The approach may even
seem internally inconsistent, as it may appear to not make sense that
individuals can make mistakes in updating toward their own beliefs,
especially when the mistakes are identified using data on the behavior
of those same individuals. However, it is certainly possible. A
mathematical example is provided to illustrate this idea in the
Supporting Information. The example also shows how rank precision can
increase through the season despite rank updates being affected by
mistakes. The intuition is essentially that mistakes, if sufficiently
small, largely "wash out," while legitimate rank updates are
more persistent.
The relevant question then is how this approach may bias the
results. It is fairly intuitive that, as alluded to above, the bias
likely would be against rejecting the null. This is most clearly seen
for the extreme case of the final rank updating. Using the final ranks
to proxy truth implies the observed posteriors for the next-to-final
rankings are tautologically Bayesian. So the null cannot possibly be
rejected for the final updating. More generally, the closer the updating
is to the final updating, the more likely it will appear the voters are
acting in a Bayesian way even when they are not. The natural way to
address this problem is to focus on early season updating. The sample
used for analysis is restricted to the first half (7 weeks) of each
season. As rankings do vary considerably in the final half, the issue is
substantially mitigated. That the problem is not too severe is supported
by the validity analysis and the fact that numerous results supporting
rejection of the null are indeed found.
Still, robustness checks are especially in order because of the
unusual framework. In Section IV, I show that results are actually very
similar when the posteriors are estimated using two alternative true
rank proxies: (1) the aggregate AP poll final rankings and (2)
"computer rankings." Both are essentially independent of each
individual voter's ranks, and thus avoid the endogeneity of true
ranks problem. Finally, Section V presents two analyses that completely
relax the main empirical framework, but also yield supportive results.
II. THE DATA
The AP college football poll is conducted once per week during the
college football season and teams play exactly one or zero games per
week. As games are the major signals regarding how the teams should be
ranked, the poll voters observe at most one major signal about each team
per week. (13) The signal probabilities--the empirical distributions of
the scores--should be common knowledge, as the voters have all observed
years of scores.
These two features of the data--the single major signal for each
team between observations of the voters' rankings, and knowledge of
the signal distributions--distinguish the rankings data from most
economic data, and allow the rankings to be used to study Bayesian
updating. In most economic situations there are many important signals,
which arrive erratically, that may affect beliefs. It is difficult to
tell which individuals observe which signals, and even more difficult to
say anything about the subjective probabilities of the signals. (14)
The first AP poll is taken before the season starts in late August
and the final poll occurs after the season ends in early January. The
poll is voted on by 60-65 leading college football journalists from
throughout the United States and different forms of media. Each voter
submits a ranking of the top 25 teams, and the aggregate ranking is
determined by assigning teams 25 points for each first-place vote, 24
for second, and so forth, and summing points by team (a Borda ranking).
The poll began in 1934 but the number of teams ranked by each voter has
changed over time, and has been 25 since 1989. Historically, the poll
has played a part in determining the national championship, but this
role ended in 2005. The individual ballots of the AP poll voters are not
confidential, but historical ones from before 2007 are not published or
even available on the Internet. The AP only makes the current
week's ballots available on its website, which is where I obtained
the 2007 and 2008 ballots. I obtained hard copies of the individual
ballots for the 2006 season directly from Paul Montella and Ralph Russo
of the AP. Historical aggregate polls and score data are widely
available. (15)
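To make the aggregation rule concrete, here is a minimal sketch of the Borda scoring described above (25 points for a first-place vote, 24 for second, and so on), written in Python; the ballot layout and team names are hypothetical.

```python
# A minimal sketch of the AP poll's Borda aggregation: each voter's #1 team
# gets 25 points, #2 gets 24, ..., #25 gets 1 point, and teams are sorted by
# total points. The DataFrame layout is an assumption for illustration.
import pandas as pd

ballots = pd.DataFrame({
    "voter": ["v1", "v1", "v1", "v2", "v2", "v2"],
    "team":  ["Ohio St", "USC", "LSU", "USC", "LSU", "Ohio St"],
    "rank":  [1, 2, 3, 1, 2, 3],
})

ballots["points"] = 26 - ballots["rank"]          # rank 1 -> 25 points, rank 25 -> 1
aggregate = (ballots.groupby("team")["points"]
             .sum()
             .sort_values(ascending=False))       # Borda order
print(aggregate)
```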
The empirical setting does have several weaknesses. First, the
voters do not have direct incentives relating to the quality of their
rankings. This is not too serious an issue, as the voters'
reputations, and thus career concerns, depend on their rankings. For
example, a voter was removed from the 2006 poll after mistaking a win
for a loss, and the voters' weekly ballots are scrutinized
carefully by bloggers and websites. (16) In addition, discussions with
voters indicate that they put substantial effort into producing their
best possible rankings.
There are three more significant weaknesses. The first is that the
voters only rank 25 out of more than 100 teams. The second is that the
data only include rankings, rather than distributions of subjective
probabilities regarding specific variables. The third is that there is
ambiguity regarding the criteria on which the teams are being ranked. I
discuss all of these weaknesses at length in the following section.
III. ESTIMATING THE BAYESIAN POSTERIORS
A. The True Rankings
To quickly restate the main empirical method: I first estimate the
voters' Bayesian posterior ranks (after observing game results;
Week 2 is posterior to Week 1, Week 3 posterior to Week 2, etc.), then
conduct tests on the differences between these estimates and the
observed posteriors. Estimating the Bayesian posteriors requires an
assumption regarding what the voters are updating toward. This is
ambiguous, as the criteria on which the rankings are based are
ambiguous. (17) As discussed in Section I, I assume each voter has a
true ranking for the season--an ideal ranking--which she attempts to
update toward throughout the season. I use each voter's final
ranking as an estimate of her true ranking, as the final ranks
incorporate both voter-specific rank criteria and should be the most
precise ranks for the season.
The estimated posteriors are thus essentially estimates of how to
update toward own final ranks efficiently. Consequently, the estimated
posteriors can really only be used to assess whether voters satisfy the
necessary condition for Bayesian updating that future changes in ranks
are not predictable using current information. If this condition is
violated, this would imply voter updating is not fully Bayesian. To
illustrate with an extreme example, if voters always rank teams that win
10 games in the final top 10, and teams that win their first game by 50
points always win 10 games, then non-top 10 teams that win their first
game by 50 should be ranked in the top 10 right away (i.e., in the Week
2 ranking). If voters failed to do this, it would indicate they do not
understand the informativeness of the first signal, that is, do not use
it efficiently.
However, efficient updating toward own final ranks is not a
sufficient condition for Bayesian updating. To again illustrate by
example, if voters simply held their ranks constant all season, then the
estimated posteriors would also be constant, as it would appear that the
prior ranks are equal to the true ranks, making the game results
completely uninformative. This would cause the estimated posteriors to
equal the observed posteriors, and the null of Bayesian updating would
not be rejected. Clearly, however, the rationality of this behavior
would be highly suspect. While it is good to be aware of this issue,
it would only be potentially problematic if the results largely failed
to support rejection of the null. Since, as discussed above, this is not
the case, the issue is not too concerning.
To examine the validity of the empirical method further, it is
worth discussing how exactly the voters actually determine their ranks.
One plausible criterion voters may use is subjective assessment of
season-long performance. If voters rank teams this way and are not aware
of possible mistakes they make in rank updating, then they will think of
their final ranks as literally their true ranks. If voters did make
mistakes and were aware of them, it would be natural to expect voters to
correct for the mistakes to the extent possible, and still think of
their final ranks as the most informed ranks for the season. The other
most plausible criterion is current team quality. (18) While quality may
change throughout the season, if quality is constant then final ranks
would be (at least in the voters' eyes) the most precise estimators
of true ranks throughout the season, since they incorporate maximal
information. Thus, if quality is
constant, whether voters rank teams on "season-long
performance" or "quality," it seems reasonable to use the
final ranks to proxy the true ranks. (19)
Table 1 presents evidence that ranks do indeed become more precise
throughout the season. The significant positive interaction term
indicates the marginal effect of the rank advantage of the favorite (the
team with ex ante better rank) on the probability of it winning the
game, and on its score advantage, is greater later in the season than
early. The results from the simplest model, presented in column 1,
indicate when the favorite has a 20 rank advantage, this increases the
favorite's chance of winning by only 10% in the early season (as
compared to having no rank advantage), but yields a 25% increase in the
late season. Moreover, the finding of Logan (2010) that rank responses
to losses decline throughout the season is consistent with voters
thinking their rank precision improves. Another advantage of using the
voters' own final ranks to proxy truth is they incorporate regional
or other voter-specific biases that are constant throughout a season.
(20)
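The kind of specification behind Table 1 can be sketched as follows; the code is illustrative only, the data file and column names are hypothetical, and the article's actual models (for example, with the score advantage as the outcome) may differ.

```python
# A hedged sketch of a Table 1-style model: a linear probability model of the
# favorite winning on its rank advantage, a late-season dummy, and their
# interaction. Variable names and the data file are assumptions.
import pandas as pd
import statsmodels.formula.api as smf

games = pd.read_csv("games.csv")                  # hypothetical: one row per game
games["late"] = (games["week"] > 7).astype(int)   # second half of the season
res = smf.ols("fav_won ~ rank_adv + late + rank_adv:late", data=games).fit(cov_type="HC1")
print(res.summary())
# A positive coefficient on rank_adv:late indicates the favorite's rank
# advantage predicts winning more strongly late in the season, i.e., ranks
# become more precise as the season progresses.
```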
It seems the main way the final-ranks-as-true-ranks proxy could be
problematic is if team qualities change substantially within seasons and
voters rank teams on current quality.
Then it would be difficult to distinguish changes in ranks that occur
during a season because of learning from those that are caused by
changes in quality. I examine this possibility by testing the hypothesis
that average score differences for teams of different final ranks are
constant throughout the season. (21) If voters rank teams on current
quality, and quality changes throughout the season, then teams highly
ranked in the final poll would have better performances in the later
part of the season, on average. This is because teams highly ranked in
the final poll would improve on average throughout the year, and teams
ranked poorly in the final poll worsen. See the Supporting Information
for a theoretical illustration of this phenomenon.
Table 2 presents empirical evidence that this issue is not a
problem. The table indicates that rankings are either based primarily on
season-long performance, or that team qualities do not change
significantly within seasons, perhaps due to the lack of a hot hand at
the team level (Camerer 1989). While home teams of final rank 1-12 do
beat teams of final rank 13-25 by a greater margin in the late season,
home teams of final rank 13-25 also perform better in later months
versus superior teams of rank 1-12. These results essentially offset each other, as they point in opposite directions. Neither of the other
results (for games between ranked teams and unranked teams that were
ranked in the final poll in one of previous two seasons) indicate
well-ranked teams' performances improved throughout the seasons.
Thus there is little reason to lose confidence in the null, that score
differences are uncorrelated with time.
A few further remarks are in order regarding the true ranks proxy.
As discussed in Section I, the sample is restricted to the first half of
each season to minimize the bias against rejecting the null caused by
the proxy; this restriction should also reduce the chance of voters
committing the hot-hand fallacy, that is, falsely inferring trend in
team quality changes. (22) Another potential problem with using voter
final ranks to proxy truth is it implies voters have no one to please
but themselves; their objective functions do not depend in any way on
others' perceptions of the accuracy of their rankings. (23) Also,
the voters may think the aggregated final ranks contain more
information than their individual final ranks. Hence, an alternative
proxy for true ranks that accounts for these issues is the aggregate
final ranks. I conduct the analysis using this alternative proxy, and a
version using the well-known Sagarin computer rankings as the true ranks
proxy, to estimate the voter prior distributions. These are rankings
calculated based only on game results and strength of schedule, so they
completely eliminate the endogeneity issue.
Figure 1 shows an illustration of the estimated priors given the
true ranks proxy: the final rank distributions conditional on prior rank
group, split out by teams that win and lose their immediate subsequent
game. The figure shows interesting variation in the precision of prior
distributions across prior rank. The distribution for top 1-5 teams
after losses dominates the distribution for top 6-10 teams after wins
(and losses). That is, even after losses, top 1-5 teams still have
higher probabilities of finishing in the top 10 and top 11-25 than top
6-10 teams do after they win. This implies that the priors for top 1-5
teams are quite strong, or precise--these teams are considerably better
than the next best group of teams. On the other hand, the distribution
for top 11-15 after losses is dominated by the distributions of both top
16-20 and 21-25 teams after wins. Moreover, the distributions for all
three of these rank groups after wins are fairly similar. This indicates
there is little distinguishing teams in the bottom 15 of the rankings,
that is, the priors for these teams are relatively imprecise. This
finding will be important for understanding the results from the main
analysis presented in Section IV.
[FIGURE 1 OMITTED]
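The Figure 1 objects are conditional distributions of final-rank groups given prior-rank group and the next game's outcome. A minimal sketch of how such a tabulation could be computed is below; the data file and column names are hypothetical.

```python
# Tabulate P(final-rank group | prior-rank group, win/loss), the objects
# plotted in Figure 1. Column names ("prior_rank", "final_rank", "won") are
# assumptions for illustration.
import pandas as pd

df = pd.read_csv("voter_team_weeks.csv")  # hypothetical team-week panel

prior_grp = pd.cut(df["prior_rank"], bins=[0, 5, 10, 15, 20, 25],
                   labels=["1-5", "6-10", "11-15", "16-20", "21-25"])
final_grp = pd.cut(df["final_rank"], bins=[0, 10, 25, 200],
                   labels=["top 10", "11-25", "unranked"])

# Row-normalized shares by prior group and result of the next game
priors = pd.crosstab([prior_grp, df["won"]], final_grp, normalize="index")
print(priors)
```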
B. Formal Framework
This subsection specifies a model of Bayesian rank updating used as
a framework for estimating the Bayesian posterior ranks. Let
$r^v_i$ denote the true rank of team i for voter v (in a particular season; season index suppressed), with $i \in \{1, \ldots, N\}$, $r^v_i \in \{1, \ldots, N\}$, and $v \in \{1, \ldots, V\}$, in which N is the total number of teams and V is the number of voters. Let $\hat{r}^v_{i,t}$ be the rank that voter v assigns to team i in week t, in which $\hat{r}^v_{i,t} \in \{1, \ldots, 25, \text{unranked}\}$, because the voters can only rank 25 teams, and $t \in \{1, 2, \ldots, T\}$, with T denoting the final week of the season. Thus, $\hat{r}^v_{i,t}$ is the observed rank of team i by v in week t. The voter index v is suppressed in the remainder of this subsection as it is unnecessary.
It is convenient to assume each voter's objective in each week $t \in \{1, 2, \ldots, T\}$ is to minimize the expectation of a quadratic loss function of current and true ranks: $E_t[\sum_{i=1}^{N} (\hat{r}_{i,t} - r_i)^2]$. For this purpose $\hat{r}_{i,t}$ can be equal to any number greater than 25 if in fact $\hat{r}_{i,t} = \text{unranked}$. If $\hat{r}_{i,t}$ were a continuous variable, clearly it would be optimal for voters to set $\hat{r}_{i,t} = E_t(r_i)$ for all i and t. They cannot do this though, because they have to assign the discrete ranks of 1 through 25 to 25 different teams. However, optimal behavior in the discrete case is similar to that of the continuous case, as shown in the following proposition:
PROPOSITION 1. For each $t \in \{1, 2, \ldots, T\}$, the loss function is minimized by ranking teams as follows:
(1) $E_t(r_i) > E_t(r_j) \Rightarrow \hat{r}_{i,t} \geq \hat{r}_{j,t}$.
Proof: Suppose not. Then there exist x, y, t such that $E_t(r_x) > E_t(r_y)$, and $\hat{r}_{x,t} < \hat{r}_{y,t}$ minimizes the loss function. The loss function can be written as $E_t[(\hat{r}_{x,t} - r_x)^2 + (\hat{r}_{y,t} - r_y)^2] + E_t[\sum_{i \neq x, y} (\hat{r}_{i,t} - r_i)^2]$. As $\hat{r}_{x,t}$ and $\hat{r}_{y,t}$ minimize the loss function, it must be true that
(2) $E_t[(\hat{r}_{x,t} - r_x)^2 + (\hat{r}_{y,t} - r_y)^2] \leq E_t[(\hat{r}_{y,t} - r_x)^2 + (\hat{r}_{x,t} - r_y)^2]$, which simplifies to $(\hat{r}_{y,t} - \hat{r}_{x,t})[E_t(r_x) - E_t(r_y)] \leq 0$,
which is a contradiction. This proves the result.
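A small brute-force check of Proposition 1 can also be run numerically; the sketch below uses toy expected ranks and confirms that the loss-minimizing assignment coincides with ranking teams by expected true rank (expected quadratic loss differs from the objective only by a variance term that does not depend on the assignment).

```python
# Brute-force check of Proposition 1 on toy numbers: assigning discrete ranks
# in order of expected true rank minimizes the expected quadratic loss.
from itertools import permutations
import numpy as np

exp_rank = np.array([2.3, 1.1, 4.0, 3.2])   # E_t(r_i) for four hypothetical teams
ranks = range(1, 5)                         # discrete ranks to assign

def loss(assignment):
    # Expected quadratic loss up to a constant that does not depend on the
    # assignment: sum_i (rhat_i - E_t[r_i])^2
    return np.sum((np.asarray(assignment) - exp_rank) ** 2)

best = min(permutations(ranks), key=loss)
by_expectation = tuple(int(x) for x in exp_rank.argsort().argsort() + 1)
print(best, by_expectation)  # both are (2, 1, 4, 3): the assignments coincide
```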
Thus, the objective function implies teams should be ranked in
order of expected final rank. To be clear, this objective function, and
Proposition 1, are just tools for translating subjective probability
distributions into ranks. The objective function may appear odd given
the true ranks proxy. If one were to interpret the proxy literally--that
the final ranks are literally true ranks--it would imply voters could
strategically manipulate their final ranks so as to optimize the
function. But I do not claim final ranks are literal truth; a
voter's final ranks are just her best guess at truth, in the same
way earlier ranks are the best guess at the time. Voters may rank teams
this way because of intrinsic incentives or (as discussed above) career
concerns incentives to rank teams accurately, which likely include the
incentive to avoid being accused of ranking manipulation. In fact, the
possibility of manipulation may explain why the AP does not instruct voters to rank teams based on their expected final ranks--if the AP made
this explicit, it could increase suspicion of manipulation. I should be
clear though that this framework is not claimed to perfectly capture
voter behavior and mindsets. It is intended just to be a coherent model
of their behavior that makes the subsequent empirical analysis
tractable.
Voters thus determine their optimal weekly ranks by updating their
beliefs about the true ranks using as much information as possible.
Attention is restricted to game result information, because this is the
information that is usually most important and is always publicly
observed. Let $s_{ij}$ be the points scored by home team i minus the points scored by away team j (i wins if and only if $s_{ij} > 0$). This variable has no time subscript because teams almost never play each other more than once.
Let $g(s_{ij} | r_i, r_j)$ be the conditional probability that the game between teams with true ranks $r_i$ and $r_j$ results in score $s_{ij}$ (the conditional signal probability). Let $f_{i,t}(r_i)$ be the subjective probability that team i has true rank $r_i$ in week t. ($f(\cdot)$ is the prior; $r_i$ is only indexed by i for clarity in the Bayesian updating formula below.)
After team i plays j, $s_{ij}$ is observed and voters can update their beliefs to $f_{i,t+1}(r_i | s_{ij})$ and $f_{j,t+1}(r_j | s_{ij})$. Technically, if beliefs about team i's rank change, beliefs about at least one other team's rank also must change. That is, for all $k \neq i, j$, the voters update $f_{k,t+1}(r | s_{ij})$. However, because these effects are minimal I ignore them. Similarly, I make the simplifying assumption that $f_{i,t}(r_i | r_j) = f_{i,t}(r_i)$ for all $j \neq i$.
Voters know $g(s_{ij} | r_i, r_j)$ from their observation of years of historical scores and true rankings. Voters can thus use a fairly straightforward application of Bayes' rule to update beliefs. For example, suppose the team indexed 10 hosts a game against team 11 and we are interested in the posterior probability that team 10 has true rank 1: $f_{10,t+1}(1 | s_{10,11})$. Using Bayes' rule, this is equal to the probability of $s_{10,11}$ given $r_{10} = 1$, $g(s_{10,11} | r_{10} = 1)$, times the prior that team 10 has true rank 1, $f_{10,t}(1)$, divided by the unconditional score probability, $g(s_{10,11})$. The first $g(\cdot)$ term depends on beliefs about the true rank of team 11, and the second depends on beliefs about the true ranks of both teams 10 and 11, specifically $g(s_{10,11}) = \sum_{r_{10}} \sum_{r_{11}} g(s_{10,11} | r_{10}, r_{11}) f_{10,t}(r_{10}) f_{11,t}(r_{11})$.
In general, the formula for Bayesian belief updating is:
(3) $f_{i,t+1}(r_i | s_{ij}) = \dfrac{\left[\sum_{r_j} g(s_{ij} | r_i, r_j) f_{j,t}(r_j)\right] f_{i,t}(r_i)}{\sum_{r'_i} \sum_{r_j} g(s_{ij} | r'_i, r_j) f_{i,t}(r'_i) f_{j,t}(r_j)}$.
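For concreteness, a minimal numerical sketch of Equation (3) on a coarsened rank support is given below; the prior vectors and the g table are hypothetical, and the signal space is collapsed to three outcomes purely for illustration.

```python
# A minimal numerical sketch of Equation (3), under the simplifying
# assumptions in the text (independent priors for the two teams, coarsened
# supports). All numbers are hypothetical.
import numpy as np

# Coarsened rank support: 0 = "top 10", 1 = "11-25", 2 = "unranked"
f_i = np.array([0.6, 0.3, 0.1])   # prior for home team i, f_{i,t}(r_i)
f_j = np.array([0.1, 0.4, 0.5])   # prior for away team j, f_{j,t}(r_j)

# g[s, r_i, r_j]: probability of signal s (0 = i wins big, 1 = i wins close,
# 2 = i loses) given the teams' true rank groups; hypothetical values that
# sum to one over s for each (r_i, r_j) pair.
g = np.array([
    [[0.5, 0.7, 0.8], [0.2, 0.5, 0.7], [0.1, 0.2, 0.5]],   # s = 0
    [[0.3, 0.2, 0.15], [0.4, 0.3, 0.2], [0.2, 0.3, 0.3]],  # s = 1
    [[0.2, 0.1, 0.05], [0.4, 0.2, 0.1], [0.7, 0.5, 0.2]],  # s = 2
])

s = 1  # observed signal: i wins a close game

# Numerator of Equation (3): [sum_{r_j} g(s | r_i, r_j) f_j(r_j)] * f_i(r_i)
num = (g[s] @ f_j) * f_i
post_i = num / num.sum()          # denominator = unconditional signal probability
print(post_i)                     # posterior f_{i,t+1}(r_i | s)
```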
C. Estimation Methodology
The estimated Bayesian posteriors are constructed using Equation
(3) and Proposition 1 to translate these distributions into rankings.
The procedure requires estimates of both of the components of Equation
(3), the f's and g's. The Supporting Information provides a
detailed discussion of how these are obtained; the remainder of this
subsection may be skipped without loss of continuity as well. To
summarize, I use empirical frequencies to estimate both sets of
distributions, but am forced to make several methodological compromises
because of data limitations. First, I coarsen the support for both the
f's and g's; second, I condition on aggregate final rank for
the g's; third, I assume all voters have the same f for each prior
rank; fourth, I use the 2006-2008 data to estimate the f's (the
same data that the analysis is conducted on). The first and third are
approximations and should not introduce any systematic bias. The second
may cause the estimated g's to be too "tight"; if so,
this would cause the signals to appear too informative, and the
estimates to move too far from the priors. This would bias the results
toward findings of underreaction. However, because it turns out the
aggregate and individual rankings are very similar in the final poll (as
a robustness check shows) this bias should be minimal. (24) The fourth
issue implies future data are used to estimate current priors, so the
estimated priors are based on information obviously unavailable to the
voters. This method should not be problematic if the priors are stable
from year to year. Then future final rank frequencies would be unbiased
estimates of past frequencies, which voters do observe, and I do not.
Still I check robustness by conducting a separate analysis on the 2008
data only with priors estimated from the 2006-2007 data. The robustness
checks reported in Section V also are not subject to this issue.
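A hedged sketch of the empirical-frequency estimation of the g's, with a coarsened score support, might look as follows; the data file, column names, and bin edges are assumptions, and the article's actual procedure (detailed in the Supporting Information) is more involved.

```python
# Estimate the g(.) distributions from empirical frequencies on a coarsened
# score support. Column names, bins, and the conditioning variables are
# assumptions for illustration.
import pandas as pd

games = pd.read_csv("historical_games.csv")  # hypothetical: one row per game

# Coarsen the score margin (home points minus away points) into bins
games["s_bin"] = pd.cut(games["margin"],
                        bins=[-100, -14, 0, 7, 14, 100],
                        labels=["lose_big", "lose", "win_close",
                                "win_mid", "win_big"])

# Empirical signal frequencies conditional on the two teams' rank groups
g_hat = (games.groupby(["home_rank_grp", "away_rank_grp"])["s_bin"]
              .value_counts(normalize=True)
              .rename("g_hat"))
print(g_hat.head(10))
```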
The other issue that needs to be addressed before constructing the
estimates is the limited number of teams that are actually ranked (the
fact that the voters only rank 25 out of approximately 120 Division I-A
teams). As most games are between ranked and unranked teams, some
objective method of distinguishing among unranked teams is needed. I use
three publicly observable variables to do this: (1) currently ranked by
at least one other voter, (2) ranked by at least one voter in final AP
poll in one of previous two seasons, and (3) ranked by at least one
voter in final AP poll in one of previous three to five seasons. I also
condition on YTD number of losses (0 vs. >0 in Weeks 1-3; 0-1 vs.
> 1 in Weeks 4+) for teams not currently receiving votes from another
voter. This expands the cardinality of the set of possible values of $r_{i,t}$ to 32, in which $r_{i,t} = 26$ means team i receives at least one vote from others in week t, $r_{i,t} = 27$ means team i does not receive any votes but was ranked in one of the two previous seasons and has zero losses, and so forth.
This method of distinguishing among unranked teams is not
sufficient for accurately estimating posterior beliefs for unranked
teams. Consequently, I only estimate posterior beliefs and rankings for
teams that are currently ranked. This forces a need to account for the
fact that several teams do indeed drop from the rankings for most voters
from 1 week to the next. I do this by restricting the maximum (worst)
estimated posterior rank to one greater than the number of teams that
are observed to stay in the poll, by voter week. I also re-rank observed
posteriors among teams that were in the prior poll, and assign the same
maximum rank (one greater than the number of teams that stay in the
voter week's top 25) to teams that drop from the empirical
rankings. This allows comparisons between Bayesian and observed
posteriors to be apples-to-apples, unconfounded by teams entering the
polls at various rank levels. (25) Finally, because teams are ranked in
order of expected rank I use the value 35 for expected rank conditional
on being unranked. (26)
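The rank-construction step can be sketched as follows, assuming expected final ranks have already been computed; the inputs and the coding of dropped teams are hypothetical.

```python
# Translate posterior beliefs into ranks by expected final rank (Proposition 1),
# cap the worst estimated posterior rank at one more than the number of teams
# staying in the voter's poll, and assign that same maximum rank to teams that
# drop from the observed poll. Toy inputs only.
import numpy as np

exp_final_rank = np.array([3.1, 8.4, 35.0, 12.2, 6.0, 28.0])  # E[r_i | s]; 35 = unranked value
stayed_in_poll = np.array([True, True, False, True, True, False])

max_rank = stayed_in_poll.sum() + 1                 # one worse than # of teams staying

est_posterior = exp_final_rank.argsort().argsort() + 1   # rank by expected final rank
est_posterior = np.minimum(est_posterior, max_rank)      # cap at the maximum rank

obs_posterior = np.array([2, 5, 0, 4, 3, 0])              # 0 = dropped (hypothetical coding)
obs_posterior = np.where(stayed_in_poll, obs_posterior, max_rank)
print(est_posterior, obs_posterior, max_rank)
```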
IV. ANALYSIS
A. Validity of the Estimated Posteriors
The validity of the estimation procedure and results can be
assessed before using them to test over/underreaction, as a natural
measure of the posteriors' accuracy--their distance from the final
rankings--is observable. As the estimated posteriors are conditioned on
limited information, if they are at least as close as the observed
posteriors, this would be strong evidence the estimated posteriors use
the information they are conditioned on efficiently, as compared to how
the observed posteriors use the same information. That is, it would be
evidence the estimated posteriors are reasonably unbiased and precise
(valid) estimates of the Bayesian responses to the information they
incorporate. The distances for observed priors and flat priors are also
reported for comparison. The observed priors are the voters'
rankings prior to the game results, and the flat priors are equal prior
rankings for all teams (the average rank).
Distance is measured using mean absolute deviation (MAD): $\frac{1}{n} \sum_{i,t,v} |\hat{r}^v_{i,t+1} - r^v_i|$; n is the number of observations (results are similar using other metrics). (27) Table 3 presents summary statistics, split
out by game result type. The estimated posteriors are actually closer to
the final ranks than the observed posteriors for each type of game
results. In addition, it appears voters at least approximate Bayesian
methods as the observed posteriors are slightly more accurate than the
observed priors, and observed priors are substantially more accurate
than the flat priors.
Because observations are correlated across voters by game, I
formally test the differences in MADs separately by voter, using paired
t tests. The MADs for the estimates are lower for 94 of the 121 voters.
This means the estimates predict the voters' own final ranks better than the
observed posteriors do for a large majority of voters. The null can be rejected in favor of the
estimated posteriors being more accurate at the 5% level for 31.4% of
the voters. The null is not rejected at 5% in favor of the observed
posteriors being more accurate for any voters. Given the limited
information the estimates are conditioned on, this is strong evidence
the estimates use the information they do incorporate efficiently,
supporting the estimates' validity.
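A sketch of this validity check, computing voter-level MADs and paired t tests, is given below; the data file and column names are assumptions.

```python
# Compute absolute deviations from the final ranks for the estimated and
# observed posteriors, then run a paired t test voter by voter. Column names
# are assumptions for illustration.
import pandas as pd
from scipy import stats

df = pd.read_csv("posteriors.csv")  # hypothetical voter-team-week panel

df["ad_est"] = (df["est_posterior"] - df["final_rank"]).abs()
df["ad_obs"] = (df["obs_posterior"] - df["final_rank"]).abs()

results = []
for voter, sub in df.groupby("voter"):
    t, p = stats.ttest_rel(sub["ad_est"], sub["ad_obs"])
    results.append({"voter": voter,
                    "mad_est": sub["ad_est"].mean(),
                    "mad_obs": sub["ad_obs"].mean(),
                    "t": t, "p": p})
results = pd.DataFrame(results)
print((results["mad_est"] < results["mad_obs"]).sum(),
      "voters with lower MAD for the estimates")
```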
B. Hypothesis Testing
Summary statistics for the estimated and observed responses to game
results are presented in Table 4, categorized by the most basic game
result types--win and loss (for simplicity I ignore byes)--and broken
out by prior rank categories. The overall mean estimated and observed
rank changes are similar, implying that on average the voters do not
particularly under- or overreact. However, there are some stark
contrasts within some of the specific rank groups. In particular, the
estimated and observed responses to wins are very different for teams
ranked 21-25; the estimated improvement is 4.9 spots while the observed
improvement is only 2.66 spots. The responses to losses are most
different for the top five teams; the mean estimated rank decline for
them is 5.68 spots, while the mean observed decline is 7.66 spots. These
statistics indicate the voters underreact to wins by low-ranked teams,
and overreact to losses by top-ranked teams. However, this does not
account for potentially confounding factors, or the correlation in
observations in the aggregated sample (the repetition of games across
voters).
To control for these things and estimate the determinants of
over/underreaction, I construct a simple measure of overreaction, and
regress it on a vector of covariates, with standard errors clustered by
game. The measure of overreaction is intended to measure excess rank
improvement following positive information and excess rank decline
following negative information. Underreaction is the opposite. Thus,
under/overreaction are not defined in the absence of a signal (weeks in
which teams have byes), and need to be defined differently depending on
the nature of the signal. I define the nature of the signal in the
simplest possible way that is agnostic to my construction of the
estimated posteriors: wins are positive signals, and losses negative.
The variable overreaction (OVER) is defined thus as estimated posterior
minus observed posterior after wins, and observed minus estimated
posterior after losses. The intuition for this is when OVER for a team
is positive after a win, then the observed posterior must be better than
the estimated posterior, indicating the voter overreacted to the win;
the intuition is similar for underreaction and losses. (28) OVER is then
used in the following regression equation, estimated separately for
games in which the ranked team wins and loses (as will be discussed
shortly the hypothesized coefficient signs depend on whether the game
was won/lost):
(4) $\text{OVER}_{ivts} = X_{ivts}\beta + \delta_v + \text{WEEK}_t \times \delta_v + \gamma_s + \epsilon_{ivts}$.
i, v, t, and s denote team, voter, week, and season, respectively;
$\delta_v$ is a voter fixed effect (FE) and $\gamma_s$ is a season FE. X is a vector of controls. These are defined in Table 5, and also include TOP1_5, TOP6_10, TOP11_15, and TOP16_20, which are dummies for team i being in the top 1-5 (for voter v in week t of season s), and so forth (top 21-25 is omitted).
account for voter-specific variation in priors or tendencies to over- or
underreact. The interaction of the voter FE and week is used to account
for possible heterogeneity of ranking definitions. If voters weigh
season performance and quality differently, their responses to games
might vary over time. (29) Summary statistics for variables used in the
regressions are presented in Table 5.
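A hedged sketch of the OVER construction and the Equation (4) regression with game-clustered standard errors follows; the column names are assumptions, the control set shown is only a subset of the article's X vector, and the voter-by-week interaction terms are omitted here.

```python
# Construct OVER (estimated minus observed posterior after wins, observed
# minus estimated after losses) and estimate a simplified version of
# Equation (4) with standard errors clustered by game. Column names and the
# data file are assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("voter_game_weeks.csv")  # hypothetical panel

df["OVER"] = np.where(df["won"] == 1,
                      df["est_posterior"] - df["obs_posterior"],
                      df["obs_posterior"] - df["est_posterior"])

formula = ("OVER ~ HOME + SMARGIN + OPPRANK + OR_SMARGIN "
           "+ TOP1_5 + TOP6_10 + TOP11_15 + TOP16_20 "
           "+ C(voter) + C(season)")

wins = df[df["won"] == 1]                 # estimate wins and losses samples separately
res = smf.ols(formula, data=wins).fit(
    cov_type="cluster", cov_kwds={"groups": wins["game_id"]})
print(res.params.filter(regex="HOME|SMARGIN|TOP"))
```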
If the voters were in fact Bayesian, the variables the estimates
are conditioned on (variables i = 1-4 from Table 5) would not affect the
estimated and observed posteriors in systematically different ways. That
is, these variables would have no effect on OVER; thus, the null hypothesis of Bayesian updating implies $\beta_i = 0$ for $i = 1, \ldots, 4$. The other variables may affect OVER despite Bayesian updating,
because the estimates are not conditioned on them. The primary
alternatives to Bayesian updating are overreaction and underreaction. I
specify the alternative hypothesis of overreaction for each of these
variables and the rank-group dummies below; the alternative hypothesis
of underreaction would imply each coefficient takes the opposite sign.
Overreaction Hypotheses:
(1) For wins: $\beta_1 < 0$; $\beta_2, \beta_3, \beta_4 > 0$; $\beta_{TOP1\_5} < \beta_{TOP6\_10} < \cdots < \beta_{TOP16\_20} < 0$.
(2) For losses: $\beta_1 > 0$; $\beta_2, \beta_3, \beta_4 < 0$; $\beta_{TOP1\_5} > \beta_{TOP6\_10} > \cdots > \beta_{TOP16\_20} > 0$. (30)
Part 1 can be explained as follows. Winning a game is positive
information, which in general causes rank to improve. Additional
positive information would cause the rank to improve excessively if the
voters overreact. If a win occurs away (HOME = 0), this is additional
positive information, because winning is less likely on the road for
teams of worse true rank. Thus, if voters overreact to home status for wins, then OVER would be lower when HOME = 1, so $\beta_1$ would be negative. If voters underreact to home status for wins, OVER would be greater when HOME = 1 and $\beta_1 > 0$. Similarly, if voters
overreact to additional positive information, their rank improvements
will be greater when the score margin is high, the opponent is ranked,
and margin of victory over ranked opponent is high. The coefficients for
the rank-group dummies are predicted to be increasing as the team's
prior rank worsens since, if voters overreact in general, they will
improve ranks excessively after all wins. However, they will do so to a
greater extent for worse ranked teams because they have further to
potentially move up, due to the censored nature of the data. Part 2 of
the overreaction hypotheses is analogous; negative information relating
to losses would cause voters to worsen the losing team's rank
excessively, if the voters overreact.
C. Results
Selected estimation results are presented in Table 6. (31) HOME is
significant at the 1% level for almost all specifications; the estimates
for both wins and losses samples imply voters do not appreciate the
importance of home-field advantage. SMARGIN is negative and highly
significant for wins, but close to 0 for losses. This implies voters are
insensitive to margin of victory but not margin of loss. Voters are more
responsive to margin of victory when the opponent is ranked. The rank of
the opponent does not have a significant effect on loss responses.
In the wins models, the rank-group dummies are highly significant
and of large magnitude for top 20 teams, indicating voters relatively
underreact to wins by the lowest ranked teams. In the losses models, the
top 1-5 dummy's coefficient stands out as it has a large, positive
coefficient, significant at 5% for the preferred specification, and the
top 11-15 dummy has a negative coefficient of smaller magnitude but
significant at 1%. These results indicate voters overreact to losses by
top 1-5 teams, especially relative to their reactions to losses by top
11-15 teams. The other control variables are mostly insignificant;
AGGRKDIFF is the only one with consistently strong economic and
statistical significance, indicating the voters are influenced by their
peers. The results for robustness specifications (2) and (3) are very
similar to those of the preferred model. The results for the regressions
estimated only on the 2008 sample are similar to the other results but
have substantially higher standard errors.
In summary, the results do not uniformly support the null or either
alternative hypothesis (underreaction or overreaction). The $\hat{\beta}_1$, $\hat{\beta}_2$, and $\hat{\beta}_3$ wins-model estimates support underreaction, but the $\hat{\beta}_4$ estimate supports overreaction. The $\hat{\beta}_1$ losses-model estimate supports underreaction, but the insignificant and small $\hat{\beta}_2$, $\hat{\beta}_3$, and $\hat{\beta}_4$ estimates support the null. In addition,
the rank-group coefficients are not fully consistent with any
hypothesis.
D. Interpretation
The fact that the non-Bayesian mistakes go in opposing directions
is not surprising, given that both overreaction and underreaction
behavior have been found in previous literature. In this subsection, I
discuss a possible common thread among these seemingly inconsistent
results. I realize this exercise may have the flavor of post-hoc
rationalization. The discussion is still presented because there does
appear to be one particular factor, salience, that has a great deal of
explanatory power, and is supported by previous literature, so it could
have reasonably been discussed as part of the a priori theory.
Voters underrespond to HOME and SMARGIN for wins. These variables
are arguably salient, in that the sports media of course often discuss
home-field advantage and the final score of games. But they are still
second order, and thus relatively non-salient, as compared to the binary
outcome of win/loss. Voters seem to update ranks in a similar way for
all wins, whether they occur at home or on the road, or by wide or small
margins. (32) In fact there is a saying in sports that "a win is a
win," meaning that athletes, coaches, and commentators ignore
negative features of wins (such as the score being close). The poll
voters appear to use a heuristic like this, despite the fact that these
factors do affect the estimated posteriors, indicating they are
informative regarding final rank. The results for OPPRANK and OR_SMARGIN
indicate voters generally do not take full account of the quality of
opponent, which is also less salient than the win/loss outcome, but are
more responsive to margin of victory when the win occurs against a
ranked opponent. Margin of victory is plausibly more salient for games
involving ranked opponents, because these games receive more attention.
There is no analogous saying that "a loss is a loss," and the
voters indeed are more responsive to home status and score margin for
losses. That estimated overreaction to score margin of loss is close to
zero for models (1)-(3) suggests that the voters are capable of revising
beliefs in a sophisticated way.
A natural question that now arises is why are losses and wins
against ranked teams relatively salient. One reason might be that they
happen relatively infrequently, which, in itself, would not justify
the differences in voter responsiveness. It is also possible though that
these more salient signals are actually more informative than other
signals. I test for this possibility in a simple way: regressions with
dependent variables of rank in final poll, and ranked/not ranked in
final poll, on a single independent variable, estimated Bayesian
posterior rank, separately for teams that lost, teams that beat ranked
teams, and teams that beat unranked teams. The estimated coefficients
and adjusted $R^2$ values are substantially higher for teams that
lost and those that beat ranked teams. This implies those signals indeed
are more predictive of final rank, meaning they contain more relevant
information, than wins over unranked teams. (33) This means that if
voters face the same effort costs in calculating their responses to all
signals, it would be rational to exert more effort in responding to the
more salient signals. Alternatively, voters may feel that minimizing
rank revisions may enhance reputation or ego, as it indicates having
strong priors, and so the benefit of unresponsiveness may be greater for
more vague, less salient, game results.
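A minimal sketch of this informativeness check is below: final rank regressed on the estimated Bayesian posterior rank, separately by signal type; the data file and column names are assumptions.

```python
# Regress final rank on the estimated Bayesian posterior rank separately for
# losses, wins over ranked opponents, and wins over unranked opponents, and
# compare coefficients and adjusted R-squared values. Column names assumed.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("voter_game_weeks.csv")  # hypothetical panel

subsets = {
    "losses": df["won"] == 0,
    "wins_vs_ranked": (df["won"] == 1) & (df["opp_ranked"] == 1),
    "wins_vs_unranked": (df["won"] == 1) & (df["opp_ranked"] == 0),
}
for name, mask in subsets.items():
    res = smf.ols("final_rank ~ est_posterior", data=df[mask]).fit()
    print(name, round(res.params["est_posterior"], 2),
          "adj R2:", round(res.rsquared_adj, 3))
```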
Another question is what explains the variation in the rank-group
coefficient estimates. One explanation is that the voters do not
appreciate the differences in the precision of priors noted above. As
these differences are subtle, this can also be interpreted as the voters
failing to appreciate less-salient information. As discussed in Section
III.A, the priors for top 1-5 teams are substantially stronger than
those of teams ranked just below them, but the priors for teams ranked
11-25 are fairly similar. Consequently, the Bayesian response to a loss
by a top 1-5 team should be small, and the response to a loss by a top
11-15 team should be large. The response to a win by a top 21-25 team
should also be large, compared to responses to wins by other teams,
because the teams ranked just better than 21-25 teams are not ex ante
much better. If, on the other hand, the voters think of the priors as
uniformly precise, they will treat losses and wins by all teams
similarly. This would cause voter reactions to wins by top 21-25 teams
to be too small, reactions to losses by top 1-5 teams to be too big, and
reactions to losses by top 11-15 teams to be too small--which is exactly
what occurs.
An alternative explanation for overreaction to losses by very top
teams is that voters put too much weight on these signals because they
are more unusual. However, this would imply voters should overreact to
losses by top 11-15 teams relative to losses by worse ranked teams,
which is clearly not the case. (34) An alternative explanation for
underreaction to wins by top 21-25 teams is that voters underreact to
wins in general, and this underreaction is most pronounced for the
lowest ranked teams simply because they have the furthest to rise in the
polls. But this would imply voters should underreact to wins by top
11-15 teams relative to wins by top 10 teams, which does not occur. A
final alternative explanation I explore is that voters react to aspects
of the signal, such as HOME and SMARGIN, differently for teams with
different prior ranks, which could cause the rank-group dummy estimates
to vary. To investigate this, I estimated the models separately for each
rank group and examine the constants; they indeed vary in a way
consistent with voters not appreciating prior precision variation. (35)
V. ROBUSTNESS CHECKS
A. Game Result Forecast Errors
In this subsection, I check whether the results hold using a
different assumption for the voters' objective function--that they
attempt to rank teams such that, ceteris paribus, higher ranked teams
are likely to perform better in future games than lower ranked teams.
Under this assumption, if historical information from earlier in the
season on ranks or game performances was predictive of future game
results conditional on current rank, current rank of opponent, and other
appropriate controls, this would imply current ranks do not incorporate
the historical information efficiently. That is, if voters make
systematic game result forecast errors that are correlated with
particular types of historical information, this would provide
alternative evidence voters responded to that information in
non-Bayesian ways. One reason this assumption is not used for the main
analysis is that it does not allow ranking criteria to be heterogeneous;
another is that game forecast errors could be caused by incorrect prior
ranks, in addition to incorrect rank updating. The main analysis is
agnostic to the accuracy of prior rank, allowing for a focus on the
updating process. (36)
The three strongest results regarding updating mistakes presented
in Section IV are that voters are underresponsive to score margin for
wins and to home status in general, and overresponsive to losses by top
five teams. (37) To test whether these factors are associated with
forecast errors, I separately regress two measures of game results,
SMARGIN and a binary win variable, on variables representing the ranked
team's history of home status (HIST_HOME), history of scores for
wins (HIST_WSM), and history of being ranked in the top five
(HIST_TOP5), along with controls for rank, opponent rank, current game
home status, STATE, REGION, and week and voter FEs.
HIST_HOME is defined as the number of home games minus the number
of away games the team has played prior to the current game. If voters
underrespond to home status then teams that play more home (away) games
will be overrated (underrated), so this variable should have a negative
coefficient. HIST_WSM is defined similarly (the sum of points minus
opponent's points, for games the team won only), but the hypothesis
goes the other direction--if voters underrespond to score margin, then
this variable should have a positive coefficient. HIST_TOP5 is the
number of weeks the team was ranked in the top five; it should be
positively associated with future game results for teams that were
downgraded from the top five if voters overreacted to negative signals
for those teams. To account for this, I estimate the models for both the
full sample, and a subsample of teams ranked six or worse. HIST_TOP5
should have a much stronger effect for the latter. I also estimate the
models separately for a subsample of games in which the opponent is
ranked by at least one voter, because quality of opponent can be
controlled for more precisely for these games. (38)
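The history regressors can be built from a team-week panel along the following lines; this is a sketch under assumed column names, not the article's exact construction.

```python
# Build HIST_HOME (home minus away games played before the current game),
# HIST_WSM (cumulative score margin over prior wins), and HIST_TOP5 (weeks
# previously ranked in the top five) from a sorted team-week panel.
import pandas as pd

df = pd.read_csv("team_weeks.csv").sort_values(["season", "team", "week"])

def prior_sum(s):
    # Cumulative sum within (season, team), excluding the current game
    return s.groupby([df["season"], df["team"]]).cumsum() - s

df["HIST_HOME"] = prior_sum(df["home"].map({1: 1, 0: -1}))
df["HIST_WSM"] = prior_sum(df["score_margin"].where(df["won"] == 1, 0))
df["HIST_TOP5"] = prior_sum(df["in_top5"])
print(df[["team", "week", "HIST_HOME", "HIST_WSM", "HIST_TOP5"]].head())
```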
Results are presented in Table 7. Results for HIST_WSM are
consistently positive and significant, indicating that current ranks do
not sufficiently account for historical margin of victory. HIST_TOP5 is
positive and significant for the models that exclude teams currently
ranked in the top five, which supports the conclusion that voters
overreact to negative signals for top five teams. However, home status
is consistently insignificant. Thus, only two of the three hypotheses
are supported; the evidence for one is inconclusive.
B. The 1991-2005 Aggregate Polls
While the individual voter ballots prior to 2006 are unavailable,
the aggregate poll data are publicly available for all seasons. Despite
their limitations, the aggregate data can be used to check that the main
trends found above hold for other seasons, and using other analytical
methods. In this subsection, I use very simple tests to verify the main
results from above (underreaction to home status and score margin for
wins, and overreaction to losses by top five teams). This robustness
check is conducted by comparing sample mean posterior and final ranks
for: teams that play at home versus on the road, top 21-25 teams that
win by more than 10 points versus top 16-20 teams that win by less than
10 points, and top 1-5 teams that lose versus top 6-10 teams that win.
If voters underrespond to home status, teams that play at home will tend
to have better posterior ranks because their game performances will be
inflated by the home advantage, but will have no better final ranks, as
being the home team does not make a team better in the long run. If
voters underrespond to score margin, teams ranked 16-20 who win by small
margins will have better posteriors than teams ranked 21-25 who win by
large margins, but the difference in final ranks should be smaller. If
voters overrespond to losses by top 1-5 teams, their posteriors will be
worse than those of winning top 6-10 teams, but the difference in final
ranks will be smaller. These results are reported in Table 8.
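A minimal sketch of one of these comparisons (top 1-5 teams that lose versus top 6-10 teams that win) is given below; the data file and column names are hypothetical.

```python
# Compare mean next-week posterior ranks and mean final ranks across the two
# contrasted groups, as in the Table 8-style checks. Column names assumed.
import pandas as pd
from scipy import stats

agg = pd.read_csv("aggregate_polls.csv")  # hypothetical 1991-2005 team-week panel

lose_top5 = agg[(agg["prior_rank"] <= 5) & (agg["won"] == 0)]
win_6_10 = agg[agg["prior_rank"].between(6, 10) & (agg["won"] == 1)]

for col in ["posterior_rank", "final_rank"]:
    t, p = stats.ttest_ind(lose_top5[col], win_6_10[col], equal_var=False)
    print(col, round(lose_top5[col].mean(), 1), round(win_6_10[col].mean(), 1),
          "p =", round(p, 3))
# Under overreaction to losses by top 1-5 teams, the posterior-rank gap should
# be large while the final-rank gap is small or statistically insignificant.
```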
All of the table's results support the conclusions of Section
IV. Teams that play their next game at home have better next-week
posterior ranks, but insignificantly different final ranks. (39) Top
21-25 teams that win games by large margins have worse posterior ranks
than top 16-20 teams that win by small margins, but the teams have
insignificantly different final ranks. Finally, top 1-5 teams that lose
have worse posterior ranks than top 6-10 teams that win, but the teams
have insignificantly different final ranks.
VI. CONCLUDING REMARKS
This article presents extensive evidence that real-world agents
with substantial experience make belief updating mistakes similar to those
committed by laboratory subjects. The agents also exhibit behavior
consistent with estimated Bayesian behavior in some circumstances. A
simple but powerful explanatory factor driving the different results
seems to be salience. The voters' responses to the most salient
aspects of the most salient signals (score margin of losses) are
estimated to be Bayesian. Other, less-salient aspects of the signals,
which are still informative, tend to be ignored. These results suggest
that, given their experience, the voters would fare relatively well if
faced with a simple belief updating task similar to the standard one
faced by experimental subjects. The overwhelming complexity of updating a
ranking of 25 teams in response to dozens of multi-dimensional signals (game
results) is part, but not all, of what causes the voters to rely on heuristics
and ignore relevant, but less salient, information.
The results also show the voters are unaware of subtle differences
in prior strength across the top 25 teams. Both underreaction and
overreaction sometimes result from this unawareness. These results are in
line with experimental work on confidence and the stability of systems.
It is hard to justify these mistakes with information processing costs,
or reputation concerns, however.
While the patterns found in the data studied in this article are
strong, they are ultimately very broad, and require validation in other
contexts. Using experiments and looking for other field data sources to
confirm the relationship between salience and belief updating, analyzing
individual-level heterogeneity (as in, e.g., El-Gamal and Grether 1995)
and the structural relationship between salience and belief updating at
a deeper level, and applying these results to the study of real-world
economic phenomena are important directions for future research.
doi: 10.1111/j.1465-7295.2011.00431.x
ABBREVIATIONS
AP: Associated Press
BCS: Bowl Championship Series
FE: Fixed Effect
MAD: Mean Absolute Deviation
YTD: Year-To-Date
REFERENCES
Amir, E., and Y. Ganzach. "Overreaction and Underreaction in
Analysts' Forecasts." Journal of Economic Behavior and
Organization, 37(3), 1998, 333-47.
Barberis, N., and R. Thaler. "A Survey of Behavioral
Finance." Handbook of the Economics of Finance, 10, 2003, 6-12.
Cai, H., Y. Chen, and H. Fang. "Observational Learning:
Evidence from a Randomized Natural Field Experiment." The American
Economic Review, 99(3), 2009, 864-82.
Camerer, C. "Does the Basketball Market Believe in the Hot
Hand?" The American Economic Review, 79(5), 1989, 1257-61.
Chetty, R., A. Looney, and K. Kroft. "Salience and Taxation:
Theory and Evidence." The American Economic Review, 99(4), 2009,
1145-77.
DellaVigna, S. "Psychology and Economics: Evidence from the
Field." Journal of Economic Literature, 47(2), 2009, 315-72.
Dominitz, J. "Earnings Expectations, Revisions, and
Realizations." Review of Economics and Statistics, 80(3), 1998,
374-88.
El-Gamal, M., and D. Grether. "Are People Bayesian? Uncovering
Behavioral Strategies." Journal of the American Statistical
Association, 90(432), 1995, 1137-45.
Epstein, L., J. Noor, and A. Sandroni. "Non-Bayesian
Learning." The BE Journal of Theoretical Economies, 10(1), 2010,
article 3.
Goff, B. "An Assessment of Path Dependence in Collective
Decisions: Evidence from Football Polls." Applied Economics, 28(3),
1996, 291-97.
Gonzalez, R., and G. Wu. "On the Shape of the Probability
Weighting Function." Cognitive Psychology, 38(1), 1999, 129-66.
Grether, D. "Bayes Rule as a Descriptive Model: The
Representativeness Heuristic." The Quarterly Journal of Economics,
95(3), 1980, 537-57.
Griffin, D., and A. Tversky. "The Weighing of Evidence and the
Determinants of Confidence." Cognitive Psychology, 24(3), 1992,
411-35.
Holt, C., and A. Smith. "An Update on Bayesian Updating."
Journal of Economic Behavior and Organization, 69(2), 2009, 125-34.
Kraemer, C., and M. Weber. "How Do People Take into Account
Weight, Strength and Quality of Segregated vs. Aggregated Data?
Experimental Evidence." Journal of Risk and Uncertainty, 29(2),
2004, 113-42.
Lebovic, J., and L. Sigelman. "The Forecasting Accuracy and
Determinants of Football Rankings." International Journal of
Forecasting, 17(1), 2001, 105-20.
Levitt, S. "Why Are Gambling Markets Organised So Differently
from Financial Markets?" The Economic Journal, 114(495), 2004,
223-46.
Levitt, S., and J. List. "What Do Laboratory Experiments
Measuring Social Preferences Reveal about the Real World?" Journal
of Economic Perspectives, 21(2), 2007, 153-74.
Logan, T. "Econometric Tests of American College
Football's Conventional Wisdom." Applied Economics, 43, 2010,
2493-518.
Massey, C., and G. Wu. "Understanding Under- and
Overreaction." The Psychology of Economic Decisions, 2, 2004,
15-29.
Nisbett, R., and L. Ross. Human Inference: Strategies and
Shortcomings of Social Judgment. Englewood Cliffs, NJ: Prentice Hall,
1980.
Nutting, A. "And After That, Who Knows?: Detailing the
Marginal Accuracy of Weekly College Football Polls." Journal of
Quantitative Analysis in Sports, 7(3), 2011, 1274.
Sloman, S., P. Fernbach, and Y. Hagmayer. "Self-Deception
Requires Vagueness." Cognition, 115, 2010, 268-81.
Surowiecki, J. "Running Numbers." The New Yorker, January
21, 2008.
Tversky, A., and D. Kahneman. "Judgment under Uncertainty:
Heuristics and Biases." Science, 185(4157), 1974, 1124-31.
Zafar, B. "How Do College Students Form Expectations?"
Journal of Labor Economics, 29(2), 2011, 301-48.
SUPPORTING INFORMATION
Additional Supporting Information may be found in the online
version of this article:
Appendix S1. An illustrative model of rank updating.
Appendix S2. Testing constant team qualities.
Appendix S3. Estimation of score distributions.
Appendix S4. Estimation of prior distributions.
(1.) See, for example, Tversky and Kahneman (1974), Grether (1980),
Massey and Wu (2004), and Holt and Smith (2009). DellaVigna (2009)
provides a thorough review of field evidence of deviations from rational
behavior, but does not refer to any studies that focus on belief
updating. There is a strand of the literature on belief updating that
analyzes survey data, such as Zafar (2011) and Dominitz (1998). These
studies, while non-experimental, lack the data to directly test Bayesian
updating.
(2.) There are a number of other fairly well-known criticisms of
experimental work. Levitt and List (2007) provide an interesting
discussion of some of these issues, including self-selection of
subjects, small stakes, self-consciousness, inability to confer with others, and insufficient time to make optimal decisions.
(3.) Several other academic studies have used the AP Top 25 as a
data source, including Goff (1996), Lebovic and Sigelman (2001), and
Logan (2010). None focus on analyzing the rationality of belief
updating.
(4.) Paul Montella, who manages the rankings for the AP, said in a
phone conversation in the summer of 2009: "There is no real
criteria for voting." While the voters are given a set of
guidelines before each season, they are limited in scope to avoid
imposing substantial structure on the voting. The guidelines are
discussed further below.
(5.) Of course, this discussion refers to efficient updating from
an ex ante, and not ex post, perspective. If the preseason number one
ranked team loses its first game, wins all subsequent games, and
finishes the season the consensus number one, lowering its rank after
the initial loss is only (very likely) optimal ex ante, and not
conditional on knowing the team is the "true" number one.
(6.) These results are consistent with those of Levitt (2004), who
found gamblers also do not pay full heed to home status and score
margin.
(7.) I follow Cai, Chen, and Fang's (2009) use of the term
saliency. They say, "The term 'saliency' is widely
used in the perceptive and cognitive psychology literature to refer to
any aspect of a stimulus that, for whatever reason, stands out from the
rest."
(8.) Nutting (2011) documents this phenomenon in detail; his
article includes a reference to a very apt quote made in 1989 by United
Press International sports editor Fred McMane: "I don't think
there are 25 good teams in the country. I think you generally see five
good teams, 10 who are fairly good, and after that, who knows?"
(9.) These results are consistent with those of Massey and Wu
(2004), who find overreaction is relatively likely in "stable
systems" (situations with relatively precise priors).
(10.) See, for example, Nisbett and Ross (1980) and Surowiecki
(2008).
(11.) They write, "If a data sample (signal) is representative
of an underlying model, then people overweight the data. However, if the
data is not representative of any salient model, people react too little
to the data."
(12.) Amir and Ganzach (1998) analyze over/underreaction with field
data, and analyze salience of the prior, but not the signal. Both
overreaction and underreaction may be caused by a host of more specific
biases identified in the psychology literature; for example, the
availability (overreaction) and the anchoring (underreaction) biases.
Although this article does not focus on any of these particular biases,
it supports the idea that salience may be a key factor that helps to
reconcile them. It is also worth noting that Griffin and Tversky (1992)
hypothesized and showed evidence that when a signal was
"strong" (extreme) but of low "weight" (low
credibility), it would cause overreaction, and vice versa for low
strength, high weight signals. Kraemer and Weber (2004) showed that this
relationship was reversed if the weight of information was presented
clearly. Again, the unifying factor may be salience--that normally
strength is more salient than weight, but if weight of information or
priors is presented relatively salient, it may indeed be overreacted to.
(13.) The voters do obtain other information about each team
besides game scores, but this information has a small impact on the
rankings, especially on a week-to-week basis.
(14.) For example, Dominitz (1998) analyzes revisions in beliefs
about future earnings using survey data. Although he has data on
subjective distributions of future earnings (priors) and the earnings
realizations from an intermediate point in time (signals), he does not
have sufficient information to estimate whether the observed belief
revisions are actually Bayesian. This is mainly because of a lack of
data on the subjective distributions of the signals. In fact, the author
explicitly comments on this issue--the difficulty of analyzing belief
updating using field data--saying: "This component of the analysis
calls attention to the breadth of data required to assess the
responsiveness of expectations to new information."
(15.) The historical aggregate AP polls and 'Others Receiving
Votes' (teams receiving some votes whose point totals were not in
the top 25) are from appollarchive.com and The [Baltimore] Sun.
Historical score data are from
http://homepages.cae.wisc.edu/dwilson/rsfc/history/howell/ and
http://www.knology.net/jashburn/football/archive/ (URL as of May 5,
2009).
(16.) See, for example,
http://sports.espn.go.com/ncf/news/story?id=2663882 and
http://pollspeak.com/.
(17.) Voters are given the following guidelines before each season,
which are intentionally left ambiguous and open to interpretation
according to Paul Montella of the AP: "Base your vote on
performance, not reputation or preseason speculation. Avoid regional
bias, for or against. Your local team does not deserve any special
handling when it comes to your ballot. Pay attention to head-to-head
results. Don't hesitate to make significant changes in your ballot
from week to week. There's no rule against jumping the 16th-ranked
team over the 8th-ranked team, if No. 16 is coming off a big victory and
No. 8 just lost 52-6 to a so-so team." The first line (which may
seem incongruous as there is a preseason poll) was added for the 2008
season, and Montella said it was not indicative of a policy change, but
just meant to encourage the voters to be more responsive to game
results. This is consistent with my finding that the voters are often
underresponsive to game information. Results for the 2008 season are
similar to those from earlier seasons regardless.
(18.) The terms "season-long performance" and
"quality" are admittedly somewhat vague. It is not necessary
for these terms to be defined precisely here, but performance can be
thought of as referring to the realization of game results, and quality
as referring to the unobserved team-specific distribution of game
results (probability of winning).
(19.) Another criterion the rankings might plausibly appear to be
based on is year-to-date (YTD) performance. This would be problematic,
however, because of the existence of a preseason poll. As there is no
YTD performance at that point, yet a poll exists, the preseason poll
cannot be an assessment only of performance that has been observed. It
follows that mid-season polls also cannot be based purely on YTD
performance. The
data bear this out as voters clearly do not rank teams purely on YTD
performance in early season polls, as, for instance, teams with two wins
and one loss are often ranked ahead of teams with three wins and no
losses. Similarly, if the weights placed on YTD performance varied
throughout the season, the rankings criteria would be time inconsistent
and this would be another form of deviation from rationality.
(20.) I thank Andrew Nutting for providing the data set used for
this analysis. The data sets used for the paper's main analysis do
not include ranks on teams for each week throughout the seasons, only
the first half of each season and final ranks. I do not find evidence of
precision increasing substantially in just the first half of seasons.
(21.) I use aggregate ranks for these tests due to lack of
historical individual rank data.
(22.) If voters believed team qualities were changing, they would
appear to overreact but would be making mistakes qualitatively distinct
from basic misuse of Bayes' rule.
(23.) I do test for, and find significant, the effects of
differences between individual and aggregate ranks on individual rank
changes. This is likely largely because of social learning. The data
indicate that indeed the voters do not attempt to rank teams as closely
to the aggregate ranks as possible. For example, in the first poll of
2006 Ohio State received the majority of first-place votes: 35 of 65. In
the second poll, after a strong opening win, Ohio State received 39
first-place votes. If voters were simply trying to match the aggregate
rankings, more than four of them would have switched their first-place
vote.
(24.) This method is used for all of the specifications reported in
Table 6. It is clearly not problematic (actually it is ideal) for the
robustness check that uses the aggregate polls as true rankings; it is
not ideal for the robustness check that uses computer rankings as truth.
But it is a reasonable approximation for this purpose as well, as the
final aggregate and computer polls are similar, and regardless should
not cause the same issue with the signals appearing too informative as
discussed above.
(25.) In other words, it allows the estimates to potentially
exactly match the observed data. To illustrate by example, suppose only
22 of 25 teams in voter 1's Week 1 ballot are ranked in Week 2.
Suppose the teams ranked 19-21 in Week 1 dropped out of the voter's
top 25 and were replaced by new teams (teams unranked by that voter in
Week 1), so the ranks of teams ranked 1-18 and 22-25 did not change. As
I know little about voter 1's beliefs about the new teams in the
poll (because they were unranked before they entered the poll) I ignore
them and adjust the observed Week 2 posteriors. I assign ranks 19-22 to
teams observed ranked 22-25, and 23 to the teams that dropped out. For
the Bayesian estimates, I assign rank 23 to all teams with estimated
rank 23 or higher. Hence, the estimated rankings can potentially be
exactly the same as the observed posteriors.
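A minimal sketch of this adjustment (in Python, with hypothetical data structures and a made-up function name) may help fix ideas; it assumes only that still-ranked Week 1 teams keep their relative order and that all dropped teams share the next available rank, as in the example above.

    def adjust_posteriors(week1_teams, week2_ranks):
        # week1_teams: the voter's Week 1 top 25, in rank order.
        # week2_ranks: dict mapping team -> observed Week 2 rank;
        #              teams that dropped out of the top 25 are absent.
        still_ranked = sorted((r, t) for t, r in week2_ranks.items()
                              if t in week1_teams)
        adjusted = {t: i + 1 for i, (_, t) in enumerate(still_ranked)}  # compress to 1..K
        dropout_rank = len(still_ranked) + 1
        for t in week1_teams:
            adjusted.setdefault(t, dropout_rank)  # dropped teams all share rank K+1
        return adjusted

With 22 of 25 Week 1 teams still ranked, the teams observed at ranks 22-25 receive adjusted ranks 19-22 and the three dropped teams receive rank 23, matching the example; estimated Bayesian ranks of 23 or worse are capped at 23 analogously.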
(26.) Results are similar for other values.
(27.) I adjust the rankings to account for number of teams, per
week and voter, not being in the final poll, in the same way that the
estimated posteriors are adjusted to account for number of teams in
observed posteriors and priors, as discussed in Section III.C.
(28.) This definition is straightforward and easily interpretable;
because of its simplicity, however, it does not allow for
"bad" wins or "good" losses, which certainly do
occur. I experimented with numerous other definitions of overreaction
that do account for these types of signals and found that they generally
do not result in substantially different results.
(29.) This would not imply the true ranks proxy is problematic.
Specifically, while the magnitudes of all voters' reactions to
signals should decrease as the season progresses (and beliefs become
more precise), the degree to which the reactions of voters who emphasize
performance decrease may be larger. This is because these voters'
belief revisions regarding future performance become less important as
the number of future games decreases. However, I do not expect this
difference to be substantial, as the sample is restricted to the first
half of the season and there is a large number of remaining games even
after the last week used in the analysis.
(30.) The other covariates may have significant effects on OVER for
reasons other than non-Bayesian ranking changes; thus, they are mainly
included to serve as controls and are not the focus of the discussion of
results. For example, voters may legitimately be influenced by AGGRKDIFF
via social learning, and while STATE and REGION seemingly should not
have any effect on ranking updates, if they did it would be for reasons
other than non-Bayesian updating.
(31.) Bootstrap standard errors are used because the dependent
variable is estimated. Results are similar with conventional estimates.
The coefficient estimates unreported are mostly insignificant.
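For readers unfamiliar with the procedure, a bare-bones game-clustered bootstrap is sketched below in Python. The DataFrame df, the game_id cluster identifier, and the regression formula are placeholders; this is an illustration of the general idea, not the article's estimation code.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    def cluster_bootstrap_se(df, formula, cluster="game_id", reps=500, seed=0):
        # Resample whole games (clusters) with replacement, re-estimate the OLS
        # coefficients each time, and report the standard deviation of the draws.
        rng = np.random.default_rng(seed)
        games = df[cluster].unique()
        draws = []
        for _ in range(reps):
            sampled = rng.choice(games, size=len(games), replace=True)
            boot = pd.concat([df[df[cluster] == g] for g in sampled],
                             ignore_index=True)
            draws.append(smf.ols(formula, data=boot).fit().params)
        return pd.DataFrame(draws).std()

    # se = cluster_bootstrap_se(df, "OVER ~ HOME + SMARGIN + OPPRANK + OR_SMARG")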
(32.) It is actually somewhat amazing how insensitive the
voters' responses are to home status. The mean observed rank
improvement following home wins is 1.36 spots; the mean improvement
following away wins is 1.43 spots. In contrast, the respective estimated
Bayesian rank improvements are 1.21 and 2.55 spots.
(33.) Results not reported; available on request. I use estimated
posterior rather than observed posterior ranks as the independent
variable because if the observed posteriors do not respond appropriately
to the signals the tests would be invalid. But results are similar
either way.
(34.) Other alternative explanations, such as voters committing the
base-rate fallacy (in which priors are in general ignored) or
probability weighting (in which very high/low probabilities are
interpreted as being closer to 0.5; see Gonzalez and Wu 1999), also
should cause overreaction to losses by top 6-15 teams, relative to
losses by top 16-25 teams, and thus cannot explain the results.
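For reference, a commonly used functional form capturing this compression toward 0.5 is the linear-in-log-odds weighting function studied by Gonzalez and Wu (1999); the display below is included only to illustrate the general shape and is not a form estimated in this article:

    w(p) = \frac{\delta p^{\gamma}}{\delta p^{\gamma} + (1 - p)^{\gamma}},
    \qquad \delta > 0, \; 0 < \gamma < 1.

For values of gamma below one the function is inverse-S shaped, so small probabilities are overweighted and large probabilities are underweighted; that is, perceived probabilities are pulled toward intermediate values.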
(35.) The constant is large for the model estimated on the
subsample of losing top 1-5 teams, and small for the losing 11-15 and
winning 21-25 teams subsamples. This indicates over/underreaction
tendencies are generally different for these subsamples, and not just
driven by differences in the types of signals (e.g., score margins) or
responsiveness to signal characteristics across subsamples; results
again are unreported. The claim that voters do not appreciate
differences in prior precision is supported by the literature on
confidence (Griffin and Tversky 1992), which has shown that while people
are in general overconfident, they are more so when facing a difficult
task, and people tend actually to be underconfident when facing easy
tasks. In the AP poll context, ranking top 11-25 teams is difficult,
which would make voters overconfident and use priors that are too
precise, causing underreaction to signals. Ranking top 1-5 teams is
easy, making voters underconfident, thus causing overreaction to
signals.
(36.) The new objective function assumption is not inconsistent
with the true rankings assumption used for the main analysis; as
discussed in Section III.A, if voters only ranked teams on quality, with
quality defined as likelihood of winning, the true rankings assumption
is still valid given the evidence that quality does not vary
substantially throughout the season. I also note forecast errors for the
original assumption could be defined as the differences between final
and posterior ranks. These are not analyzed because they are not clearly
observed for many teams, because voters typically only rank in their
final top 25 around 15 of the teams currently ranked. That is, for 10 of
the ranked teams in each voter's Week 1-7 polls, the final rank is
unobserved, meaning it could be anything from 26 to 120. Game results
are observed for all ranked teams that play games in each week, which is
the vast majority.
(37.) The specification (1) results imply voters react by over
three spots more than they should to losses by top 1-5 teams relative to
top 11-15 teams. The result that voters underreact to wins by top 21-25
teams is also strong, but of somewhat lower magnitude, and is more
difficult to verify in this context.
(38.) Rank and opponent rank are controlled for with FE for each
rank. For the models using games with unranked opponents, separate FE
are used for each rank group used to construct the estimated posteriors.
(39.) The overall mean final rank is worse than the posterior
because teams tend to become unranked as the season progresses.
DANIEL F. STONE: I thank Shan Zhou for excellent research
assistance, Paul Montella of the Associated Press for providing me with
the 2006 ballots and helpful discussion, Andrew Nutting for sharing data
and discussion, and Edi Karni, Matt Shum, Joe Aldy, Tumenjargal
Enkhbayar, Liz Schroeder, Carol Horton Tremblay, Stephen Shore, Peyton
Young, Basit Zafar, and seminar participants at the Econometric Society 2009 North American Summer Meeting and 2009 IAREP/SABE joint meeting for
helpful comments. Two referees and the coeditor (especially) also
provided very helpful feedback. The Sagarin rankings are a component of the Bowl
Championship Series (BCS) rankings along with other computer rankings. I
cannot use the BCS rankings because they are not computed after the
bowls. I use the Sagarin ratings because they were easily obtainable and
I expect other computer rankings would yield similar results.
Stone: Assistant Professor, Department of Economics, Oregon State
University, Corvallis, OR 97331. Phone 541 737 1477, Fax 541 737 5917,
E-mail dan.stone@oregonstate.edu
TABLE 1
Analysis of Within Season Changes in Rank Precision
Dep. Var = Favorite Wins (0/1)
(1) (2) (3) (4)
RANK_DIFF 0.005 0.0032 0.0213 -0.0054
(0.005) (0.011) (0.014) (0.014)
POST_WK7 -0.1143 * -0.1177 * -0.1022 -0.0982
(0.062) (0.063) (0.065) (0.066)
RANK_DIFF x POST_WK7  0.0134 **  0.0135 **  0.0124 **  0.0125 **
                      (0.0058)   (0.0058)   (0.0062)   (0.0062)
Full controls  [check] [check]
Rank and season FE  [check] [check]
[R.sup.2] 0.055 0.059 0.113 0.113
Observations 895 895 895 895
Dep. Var = Favorite Points - Underdog Points
(1) (2) (3) (4)
RANK_DIFF 0.290 * -0.185 0.797 * -0.571
(0.168) (0.408) (0.409) (0.494)
POST_WK7 -4.722 ** -5.015 ** -4.317 * -4.269 *
(2.111) (2.087) (2.206) (2.184)
RANK_DIFF x POST_WK7  0.556 ***  0.580 ***  0.507 **  0.523 **
                      (0.2110)   (0.2100)   (0.2210)  (0.2210)
Full controls  [check] [check]
Rank and season FE  [check] [check]
[R.sup.2] 0.102 0.108 0.155 0.156
Observations 895 895 895 895
Notes: Robust standard errors in parentheses. All models estimated
by ordinary least squares. Sample includes all games between teams
ranked in the aggregate AP top 25 from 1991 to 2008. "Favorite" is the
(ex ante) higher ranked team. RANK_DIFF = favorite's rank minus
opponent's rank; POST_WK7 = 0/1 dummy for the game occurring in Week 8
of the season or later. Dummy variables for home/away and bowl game
are included in all models. Full controls include RANK_DIFF squared
and dummies for favorite/opponent in top 5 and conference game. Rank
and season FE are dummies for favorite rank, opponent rank, and
season. Significance levels: * 10%; ** 5%; *** 1%.
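As a guide to the specification these notes describe, a minimal sketch in Python (using statsmodels) follows; the DataFrame games and its column names are hypothetical, and HC1 standard errors stand in for the robust errors reported in the table.

    import statsmodels.formula.api as smf

    # Favorite-wins indicator regressed on the rank gap, a second-half dummy,
    # and their interaction; the interaction term asks whether the rank gap
    # predicts outcomes better late in the season (i.e., rising precision).
    model = smf.ols(
        "favorite_wins ~ rank_diff * post_wk7 + home + bowl",
        data=games,
    ).fit(cov_type="HC1")
    print(model.summary())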
TABLE 2
Tests of [H.sub.0]: Mean Score Differences
Conditional on Final Rank Groups Are Equal
in the First and Second Halves of the Season
Home Final   Away Final                           p-Value for [H.sub.0]:
Rank         Rank         Period      [bar.s]     [[bar.s].sub.Aug-Oct] =
                                                  [[bar.s].sub.Oct-Dec]
1-12 13-25 Aug-Oct 15 13.6 0.14
1-12 13-25 Oct 16-Dec 15 16.9
1-12 Unranked Aug-Oct 15 22.6 0.35
1-12 Unranked Oct 16-Dec 15 21.0
13-25 1-12 Aug-Oct 15 -7.0 0.35
13-25 1-12 Oct 16-Dec 15 -4.7
13-25 Unranked Aug-Oct 15 15.5 0.09
13-25 Unranked Oct 16-Dec 15 12.9
Notes: "Final Rank" = final AP aggregate rank; [bar.s] = mean home
score-away score. Sample includes games played 1989-2008 with
at least one Division 1-A team on non-neutral field. "Unranked"
restricted to teams receiving votes in final aggregate poll in at
least one of previous two seasons.
TABLE 3
MADs from Final Ranks (SDs in Parentheses)
            [absolute value of:]
         Estimated         Observed          Observed         Flat
         Posterior -       Posterior -       Prior -          Prior -
         Observed Final    Observed Final    Observed Final   Observed Final
Wins 3.90 4.02 3.89 4.66
(3.48) (3.61) (3.66) (2.66)
Losses 2.13 2.36 3.14 4.47
(3.28) (3.38) (3.57) (2.22)
Byes 3.51 4.04 3.97 4.62
(3.72) (3.88) (3.89) (2.43)
Total 3.49 3.67 3.74 4.61
(3.53) (3.65) (3.68) (2.55)
Notes: Sample includes all games from Weeks 1 to 7 of 2006/2008
seasons with available data and games played on non/neutral sites
(for wins/losses); N = 21,758, 6,664, 2,881 for wins, losses, and
byes, respectively.
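In the notation of this table, the MAD statistic is simply the average absolute gap between a given set of ranks and the observed final ranks; written out (with notation introduced here only for exposition),

    \mathrm{MAD} = \frac{1}{N} \sum_{i=1}^{N} \left| r_i - r_i^{\mathrm{final}} \right|,

where r_i is, for example, the estimated posterior rank of team-week observation i and r_i^final is the corresponding observed final rank.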
TABLE 4
Mean Rank Improvement (Prior
Rank-Posterior Rank; SDs in Parentheses) by
Prior Rank Group
Prior Rank          Wins                       Losses
Group          Observed   Estimated      Observed   Estimated
1-5 0.06 0.00 -7.66 -5.68
(1.29) (1.41) (3.60) (5.67)
6-10 0.74 0.34 -8.39 -8.89
(2.05) (2.46) (4.20) (4.63)
11-15 1.44 0.97 -6.49 -8.11
(2.49) (3.47) (3.16) (2.71)
16-20 2.30 2.56 -3.93 -4.22
(2.88) (3.53) (2.12) (2.02)
21-25 2.66 4.91 -0.42 -0.36
(2.54) (3.62) (1.02) (1.06)
Total 1.39 1.67 -4.90 -5.05
(2.49) (3.48) (4.12) (4.59)
Notes: Sample defined as in Table 3; N = 21,758, 6,664
for wins, losses, respectively. "Observed" = voter prior
rank-observed posterior rank; "Estimated" = voter prior
rank-estimated posterior rank.
TABLE 5
Summary Statistics of Variables Used for Overreaction Hypothesis
Testing
Variable Definition
OVER Estimated overreaction
1. HOME Home game dummy
2. SMARGIN Own score-opponent's score
3. OPPRANK Opponent ranked dummy
4. OR_SMARG SMARGIN x OPPRANK
5. EXPERIENCE Voter years of experience (since 1999)
6. STATE Team in same state as voter dummy
7. REGION Team in same region as voter dummy
8. AGGRKDIFF Aggregate rank-voter rank
9. PREV_YR_RK Previous year final aggregate rank
Wins Losses
Variable M SD M SD
OVER -0.28 3.15 -0.15 3.69
1. HOME 0.66 0.47 0.43 0.50
2. SMARGIN 23.84 15.36 -11.45 10.27
3. OPPRANK 0.27 0.44 0.62 0.48
4. OR_SMARG 4.43 9.77 -8.71 11.02
5. EXPERIENCE 2.64 2.90 2.59 2.92
6. STATE 0.03 0.18 0.03 0.18
7. REGION 0.10 0.30 0.10 0.30
8. AGGRKDIFF 0.46 3.49 0.94 3.82
9. PREV_YR_RK 20.11 15.21 24.23 17.03
Notes: Sample defined as observations used in Tables 3 and 4 with no
missing values for numbered variables; N = 21,645, 6,613 for wins,
losses, respectively. Numbered variables are elements of X in (4).
TABLE 6
Overreaction Estimation Results
Wins
(1) (2) (3) (4)
HOME 1.556 *** 1.562 *** 1.588 *** 0.953 ***
(0.173) (0.152) (0.218) (0.328)
SMARGIN -0.089 *** -0.088 *** -0.090 *** -0.097 ***
(0.005) (0.006) (0.006) (0.011)
OPPRANK -1.925 *** -1.875 *** -1.932 *** -1.602 ***
(0.300) (0.313) (0.296) (0.592)
OR_SMARG 0.052 *** 0.049 *** 0.047 *** 0.046
(0.014) (0.017) (0.015) (0.035)
TOP1_5 2.545 *** 2.522 *** 2.515 *** 3.545 ***
(0.259) (0.216) (0.202) (0.529)
TOP6_10 2.586 *** 2.466 *** 2.447 *** 3.448 ***
(0.213) (0.179) (0.183) (0.406)
TOP11_15 2.579 *** 2.233 *** 2.432 *** 3.445 ***
(0.226) (0.201) (0.196) (0.393)
TOP16_20 1.744 *** 1.830 *** 1.523 *** 2.795 ***
(0.159) (0.163) (0.161) (0.315)
AGGRKDIFF -0.159 *** -0.163 *** -0.153 *** -0.170 ***
(0.013) (0.015) (0.016) (0.025)
[R.sup.2] 0.280 0.270 0.268 0.300
N 21,645 21,645 21,645 7,170
Losses
(1) (2) (3) (4)
HOME -1.220 *** -1.332 ** -1.194 *** -0.549
(0.472) (0.531) (0.450) (0.935)
SMARGIN -0.002 0.007 0.008 -0.057
(0.029) (0.029) (0.035) (0.076)
OPPRANK -0.077 -0.052 -0.204 -0.312
(0.639) (0.789) (0.649) (1.330)
OR_SMARG -0.012 -0.021 -0.023 0.038
(0.040) (0.044) (0.040) (0.080)
TOP1_5 2.127 ** 1.516 2.553 *** 3.061
(1.011) (1.197) (0.816) (2.348)
TOP6_10 -0.526 -0.332 -0.690 0.689
(0.480) (0.509) (0.497) (1.339)
TOP11_15 -1.341 *** -0.958 *** -1.122 *** -1.031
(0.243) (0.312) (0.299) (0.727)
TOP16_20 -0.135 -0.127 -0.186 0.135
(0.208) (0.245) (0.215) (0.407)
AGGRKDIFF 0.105 *** 0.107 *** 0.098 *** 0.045
(0.018) (0.016) (0.017) (0.040)
[R.sup.2] 0.167 0.122 0.192 0.171
N 6,613 6,613 6,613 2,010
Notes: Bootstrap standard errors clustered by game in parentheses.
The dependent variable is OVER in all models; the estimated
posteriors are constructed using the voters' individual final
rankings as the true rankings in specifications (1) and (4), and
using the aggregate AP rankings and Sagarin computer rankings as
the true rankings in (2) and (3), respectively. In specification
(4) the estimated posteriors are constructed using priors estimated
only on data from 2006 to 2007 and the regressions are estimated on
a sample using only 2008 data. Year FE, EXPERIENCE, STATE, REGION,
PREV_YR_RK, WEEK, voter FE, and voter FE-WEEK interactions included
in all specifications.
Significance levels: * 10%; ** 5%; *** 1%.
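A rough sketch of how such a specification might be coded is given below; the DataFrame wins, the lowercase fixed-effect identifiers (year, voter), and the use of plain OLS in place of the bootstrap procedure of footnote 31 are all simplifying assumptions made for illustration.

    import statsmodels.formula.api as smf

    formula = (
        "OVER ~ HOME + SMARGIN + OPPRANK + OR_SMARG"
        " + TOP1_5 + TOP6_10 + TOP11_15 + TOP16_20 + AGGRKDIFF"
        " + EXPERIENCE + STATE + REGION + PREV_YR_RK + WEEK"
        " + C(year) + C(voter) + C(voter):WEEK"  # year FE, voter FE, voter-week terms
    )
    fit = smf.ols(formula, data=wins).fit()
    print(fit.params.filter(like="TOP"))  # prior rank group coefficients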
TABLE 7
Game Forecast Errors Estimation Results
Dep. Var = Win
(1) (2) (3) (4)
HIST_HOME -0.0029 0.0025 -0.0273 -0.0307
(0.0178) (0.0208) (0.0323) (0.0386)
HIST_WSM 0.0019 *** 0.0018 *** 0.0023 ** 0.0020 *
(0.0007) (0.0006) (0.0010) (0.0011)
HIST_TOP5 0.0098 0.0638 *** -0.0148 0.0710 ***
(0.0146) (0.0139) (0.0284) (0.0224)
[R.sup.2] 0.242 0.271 0.170 0.167
N 29,294 23,488 10,586 7,898
Dep. Var = SMARGIN
(1) (2) (3) (4)
HIST_HOME 0.806 0.818 1.020 0.822
(0.710) (0.788) (0.997) (1.057)
HIST_WSM 0.079 *** 0.072 *** 0.087 ** 0.075 *
(0.025) (0.025) (0.036) (0.038)
HIST_TOP5 0.172 2.075 ** -0.326 2.211 **
(0.669) (0.897) (1.085) (0.996)
[R.sup.2] 0.377 0.387 0.259 0.233
N 29,294 23,488 10,586 7,898
Notes: Bootstrap standard errors clustered by game in parentheses.
Specifications (2) and (4) use samples restricted to teams ranked 6-25;
specifications (3) and (4) use samples restricted to games in which the
opponent is ranked by at least one voter. STATE, REGION, voter FE,
WEEK FE, rank FE, and opponent rank FE included in all specifications.
Significance levels: * 10%; ** 5%; *** 1%.
TABLE 8
Mean Aggregate Posterior, Final Ranks
Conditional on Selected Prior Rank Groups and
Game Results (Standard Errors in Parentheses)
                                              Observed         Observed
                                              Posterior Rank   Final Rank
Home teams                                    13.72 (0.29)     20.79 (0.49)
Away teams                                    16.03 (0.41)     20.85 (0.59)
p-Value ([H.sub.0]: Home = Away)              <0.01            0.94
Top 16-20 teams that win by <10               15.82 (0.33)     27.04 (1.80)
Top 21-25 teams that win by [greater than
  or equal to] 10                             19.92 (0.22)     26.26 (1.13)
p-Value ([H.sub.0]: Top 16-20 = Top 21-25)    <0.01            0.72
Losing top 1-5 teams                          10.52 (0.34)     13.35 (1.71)
Winning top 6-10 teams                         7.02 (0.11)     13.69 (0.74)
p-Value ([H.sub.0]: Losing = Winning)         <0.01            0.86
Notes: Sample includes Weeks 1-6 of 1991-2005 seasons.
Posterior rank is following week rank (rank from Weeks 2-7 of
same seasons). Final rank is postseason rank. Expected rank of
50 conditional on being unranked used to calculate unconditional
expected (mean) ranks.
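To make the last sentence of the notes concrete, the unconditional mean rank mixes the mean conditional on remaining ranked with an assigned rank of 50 for unranked teams; in the display below the 0.6/0.4 split and the conditional mean of 12 are invented numbers used only to illustrate the calculation:

    E[\mathrm{rank}] = \Pr(\mathrm{ranked}) \, E[\mathrm{rank} \mid \mathrm{ranked}]
      + \Pr(\mathrm{unranked}) \times 50,
    \qquad \text{e.g., } 0.6 \times 12 + 0.4 \times 50 = 27.2.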