Stochastic evolutionary dynamics in the repeated prisoners' dilemma.
Linster, Bruce G.
I. INTRODUCTION
The prisoners' dilemma is perhaps the most widely studied of all game theory applications: it is commonly employed in such diverse fields as economics, political science, and biology. The literature primarily explores how the cooperative outcome can be sustained in the context of the repeated prisoners' dilemma (RPD). This paper further examines the prisoners' dilemma, focusing on the infinite RPD played by individuals employing very simple strategies that can be thought of as "rules of thumb." Moreover, the strategies are exposed to an evolutionary process in which the best strategies survive over time while the weakest die out. However, a random element is incorporated to allow the "rules of thumb" to change. The resulting dynamics prove to be very interesting, generating populations whose characteristics depend on the probability of another encounter or how important the future is.
The prisoners' dilemma has been used to model many economic problems. For example, Green and Porter [1984] use it to analyze collusion among oligopolists, Brander and Spencer [1985] to explore the use of export subsidies, Farrell and Maskin [1989] to examine advertising among competing firms, Cornes and Sandler [1986, 37-42] to explain the provision of public goods, and Dixit and Nalebuff [1991, 115-18] to describe the conflict between Congress and the Federal Reserve over economic policy. For the purposes of this essay, it is important that these situations are repeated and the last play of a stage game is unknown beforehand. Such settings can be modeled as infinitely repeated games with discounted payoffs. Additionally, if the strategies in these repeated games can be represented by finite automata or computing machines, we can see how the mix of strategies in the population evolves. The evolutionary framework is very important here, and we can think of the process as representing a situation where the most successful strategies are imitated in future play of the game.
Maynard Smith and Price [1973] are among those credited with introducing evolutionary game theory, and Axelrod's [1984] use of evolutionary arguments to justify cooperation in his prisoners' dilemma tournament is one of the most well-known applications. Axelrod's tournament pitted strategies submitted as computer programs against one another, with the most successful increasing their proportion in the population while the less effective died out. A cooperative population emerged from the evolutionary process in the tournament. Others have studied infinitely repeated prisoners' dilemmas in an evolutionary framework, offering a wide variety of models with a correspondingly wide variety of results.(1)
An important contribution to the study of repeated games was the introduction of bounded rationality models, allowing strategies of finite complexity to be modeled.(2) The simple strategies studied here incorporate elements of these bounded rationality models. They are also subject to evolutionary forces and may mutate or change to similar strategies in an intuitively appealing way.
The next section is devoted to motivating the study of the repeated prisoners' dilemma as an important tool in economic analysis. Specifically, a very rudimentary model of collusion between duopolists is laid out, indicating how the game can be used to analyze this situation. Section III reviews the essentials of evolutionary stability as presented by Maynard Smith [1982, 10-27]. The results of computer simulations with randomly perturbed evolutionary processes are then reported. These allow us to see that if the future is not very important we may find noncooperative populations, but if the players care enough about the future, the population may cycle between being cooperative and noncooperative. Finally, the paper is summarized.
II. THE PRISONERS' DILEMMA AND ECONOMIC THEORY
The prisoners' dilemma has many applications in economic theory. Consider, for example, the situation found in many economic principles texts where each of two firms simultaneously decides how much output to produce in a given period. If they both produce their share of the monopoly output, they earn a profit of 3. However, if one firm defects by expanding output while the other sticks to the collusive output, the defecting firm will increase its profits to 5 while the cooperator earns 0. If they both defect from the agreement, they each earn a profit of 1. Such games are normally represented in the bimatrix form below, where C and D denote "cooperate" and "defect" respectively.

                    Player 2
                    D         C
Player 1    D      1,1       5,0
            C      0,5       3,3
If these duopolists meet only once, the only reasonable outcome has both firms defecting on their agreement, since D is the better strategy for each regardless of what the other firm does. In other words, (D,D) is a dominant strategy equilibrium. However, if the game continues into the next period with probability δ, or if the game is played infinitely long and the firms discount payoffs by δ, cooperation can be sustained if δ is large enough. Specifically, if δ ≥ 1/2, cooperation can be sustained by both firms choosing the GRIM strategy: cooperate until the opponent defects from the agreement, and then defect forever. To see that both players choosing GRIM is an equilibrium, first note that the firms get 3 + 3δ + 3δ² + ... = 3/(1-δ) if they both choose GRIM. If a firm is going to defect, it should do so as early as possible and defect thereafter, earning the payoff 5 + δ + δ² + ... = 5 + δ/(1-δ). A little algebra reveals that choosing the GRIM strategy is optimal against itself as long as δ ≥ 1/2.
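The algebra above is easy to check numerically. A minimal sketch (the payoff values 3, 5, 1, 0 come from the stage game above; the function names are ours):

```python
def grim_payoff(delta):
    """Discounted payoff when both firms play GRIM: 3 + 3d + 3d^2 + ... = 3/(1-d)."""
    return 3 / (1 - delta)

def defect_payoff(delta):
    """Best deviation: defect immediately (earn 5), then earn 1 forever after."""
    return 5 + delta / (1 - delta)

# GRIM is a best reply to itself exactly when 3/(1-d) >= 5 + d/(1-d),
# which rearranges to d >= 1/2.
for delta in [0.4, 0.5, 0.6]:
    sustained = grim_payoff(delta) >= defect_payoff(delta)
    print(f"delta={delta}: cooperation sustained = {sustained}")
```

At δ = 0.5 the two payoffs are exactly equal, which is the knife-edge case of the condition δ ≥ 1/2.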
This type of analysis can be applied to other economic phenomena like those described above. The important element, though, is a large enough probability of another interaction. If the prisoners' dilemma is played only once, determining the equilibrium outcome is trivial, but repeated play of the stage game allows for more interesting outcomes.
III. EVOLUTIONARY STABILITY: BASIC CONCEPTS
This section reviews the evolutionary stability concept that has been applied to games. The idea is based on the fact that the success of a given strategy depends not only on a given player's choice, but also on the characteristics of the population in which it plays. Loosely speaking, an evolutionarily stable strategy is one which cannot be successfully invaded. One problem with the notion of evolutionary stability is that there may be multiple stable strategies, not all equally plausible. The essentials of evolutionary stability are the focus of the present discussion.
The exposition may be clearer if we think of the players' strategies as genetically determined and asexual reproductive success (number of offspring) as the payoff. That is, the more successful strategies have a greater number of expected descendants in the next generation. The question under consideration here is which, if any, strategies will be immune from invasion from a small proportion of another strategy or mix of strategies. Another interesting question is which strategy can we expect to see when multiple stable strategies exist. In order to answer this question, section IV considers perturbations in the evolutionary process that can be thought of as mutations.
Consider evolutionary stability in the context of a two-player symmetric normal form game G defined by the four-tuple G = <S₁, S₂, u₁, u₂>, where Sᵢ is player i's strategy space and uᵢ: S₁ × S₂ → R is player i's payoff function. Player I chooses s ∈ S₁ while player II simultaneously chooses t ∈ S₂, and they receive payoffs u₁(s, t) and u₂(s, t). Since the game is symmetric, S₁ = S₂ and u₁(s, t) = u₂(t, s); that is, the role of each player is irrelevant. A strategy for player i is defined as σᵢ ∈ Δ(Sᵢ), where Δ(Sᵢ) is the set of all probability measures on Sᵢ. More formally, if there are k pure strategies, then
Δ(Sᵢ) = {σᵢ ∈ R^k : σᵢ(s) ≥ 0 for all s ∈ Sᵢ, and Σ σᵢ(s) = 1, the sum taken over s ∈ Sᵢ}.
Except where necessary for clarity, notation will be abused slightly as follows: S = S₁ = S₂ and u = u₁ = u₂.
A strategy σ, then, is an evolutionarily stable strategy of a symmetric game if and only if

(1) u(σ, σ) ≥ u(σ′, σ)

and

(2) u(σ, σ) = u(σ′, σ) implies u(σ, σ′) > u(σ′, σ′)

for all σ′ ≠ σ, σ′ ∈ Δ(S). In words, these conditions require that a strategy be a best reply against itself, and if there are alternate best replies, the evolutionarily stable strategy must do strictly better against an alternate best reply than the alternate best reply does against itself. Requirement (1) above means any evolutionarily stable strategy is a Nash equilibrium. However, the converse is not true because of (2), which is sometimes called the stability condition.
The definition above is actually a characterization of a more basic requirement which holds in models where players are randomly matched.(3) The following must hold for any sufficiently small proportion ε > 0 of invaders:

(1 - ε)u(σ, σ) + εu(σ, σ′) > (1 - ε)u(σ′, σ) + εu(σ′, σ′).

This requires that the expected payoff from playing σ in a population with proportion ε of σ′ players be strictly greater than the expected payoff of those playing σ′.
Consider the following game matrix.

                    Player 2
                    A         B
Player 1    A      2,2       0,2
            B      2,0       1,1
There are two symmetric Nash equilibrium strategies in this game, (A,A) and (B,B), but only the latter is evolutionarily stable. If the population is all A players, some small proportion ε of B players can successfully invade the population because the expected payoff to a B player is 2(1 - ε) + ε = 2 - ε, while the expected payoff to an A player is 2(1 - ε) + 0ε = 2 - 2ε. Since the B players do better on average than the A players for any ε > 0, the A strategy is not evolutionarily stable. However, a population of all B players cannot be invaded. If the A strategy makes up some small proportion ε of the population, it will earn an expected payoff of 0(1 - ε) + 2ε = 2ε. The B strategy, however, will earn 1(1 - ε) + 2ε = 1 + ε > 2ε for ε ∈ (0,1), and therefore cannot be invaded.
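The invasion arithmetic above follows the random-matching condition stated earlier, and can be verified directly. A sketch (the function name is ours, not the paper's):

```python
def invasion_repelled(payoff, native, invader, eps):
    """Random-matching ESS condition: the native strategy's expected payoff in a
    population with proportion eps of invaders strictly exceeds the invader's."""
    u = lambda s, t: payoff[s][t]
    native_pay = (1 - eps) * u(native, native) + eps * u(native, invader)
    invader_pay = (1 - eps) * u(invader, native) + eps * u(invader, invader)
    return native_pay > invader_pay

# Row player's payoffs in the game matrix above.
game = {'A': {'A': 2, 'B': 0}, 'B': {'A': 2, 'B': 1}}
print(invasion_repelled(game, 'A', 'B', 0.01))  # False: B invades an all-A population
print(invasion_repelled(game, 'B', 'A', 0.01))  # True: an all-B population repels A
```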
Clearly, the requirements for a strategy to be evolutionarily stable are more stringent than those of a symmetric Nash equilibrium. A Nash equilibrium in a two-player game is a strategy pair where each player's choice is a best reply to the other's. If every player in a population chooses the same strategy which is part of a symmetric Nash equilibrium, the population is neutrally stable in the sense that there exist no strategy choices that can be strictly more successful than the native strategy and, hence, displace it. For a strategy to be evolutionarily stable, it must be true that an invasion from any feasible mixed strategy will be unsuccessful with the invading strategy dying out.
One problem with this notion of stability is that games frequently have multiple stable strategies. Consider, for example, the common interest game presented below.

                    Player 2
                    A         B
Player 1    A      2,2       0,0
            B      0,0       1,1
This game has three Nash equilibria, all of which are perfect: (A,A), (B,B), and the mixed equilibrium (1/3, 2/3).(4) Evolutionary stability eliminates only the mixed-strategy equilibrium in this case. The more intuitively appealing of the remaining two is (A,A) because it yields the payoff-dominant outcome. However, evolutionary stability fails to eliminate the (B,B) equilibrium. The significance of this is even more apparent if we substitute a very large value for the payoff of 2. For any arbitrarily high payoff to (A,A) relative to the payoff to (B,B), both A and B are evolutionarily stable strategies.
The effect of perturbations in the evolutionary process is examined through computer simulations in the next section. Foster and Young [1990] provided a theoretical treatment of such disturbances in a general continuous-time model; here, discrete-time dynamics in the infinitely repeated prisoners' dilemma are analyzed through computer simulations. The random perturbations in the evolutionary process can be thought of as mutations. That is, when a strategy a reproduces, it will nearly always have offspring that are genetically identical. With a small probability μ, however, the offspring will be of some other type and choose a strategy a′.
IV. STOCHASTIC EVOLUTIONARY PROCESSES
The prisoners' dilemma represented in section II forms the stage game for the repeated game examined here. It is perhaps simplest to think of a generation of players with genetically encoded strategies playing an infinite game. The next generation is determined by an evolutionary process in which the reproductive success of each strategy depends on how well it performed relative to the population average. The prisoners' dilemma is played again, the evolutionary process is repeated, and so on. Remember that the infinite game need not last an infinitely long time; rather, the last play of the stage game is never known beforehand.
Two simple strategies in the infinitely repeated prisoners' dilemma are the GRIM strategy, which is cooperative in the sense that it is never the first to defect, and the "always defect" (DD) strategy. DD is always a subgame perfect equilibrium strategy in the repeated prisoners' dilemma because D is the dominant strategy in the stage game.(5) GRIM is subgame perfect if the discount factor (or probability of another play of the game), δ, is at least 1/2.
If only the GRIM and DD strategies are allowed to be genetically encoded in the players, the payoff matrix for the infinitely repeated game with discount factor δ is

                    Player 2
                    DD                            GRIM
Player 1    DD     1/(1-δ), 1/(1-δ)              5 + δ/(1-δ), δ/(1-δ)
            GRIM   δ/(1-δ), 5 + δ/(1-δ)          3/(1-δ), 3/(1-δ)

The actual values in the payoff matrix depend on the discount factor. For example, if δ = 0.9, the payoff matrix is

                    Player 2
                    DD        GRIM
Player 1    DD     10,10     14,9
            GRIM    9,14     30,30

The evolutionary stability concept described earlier does not indicate which of the strategies is more evolutionarily fit. In this simple game both pure strategies form symmetric Nash equilibria, and there is also a symmetric mixed-strategy Nash equilibrium, (1/17, 16/17). Applying the normal form evolutionarily stable strategy concept here eliminates the mixed-strategy equilibrium but does nothing to help us select between the cooperative and defecting equilibria. However, as we shall see, if the strategies are subject to a perturbed evolutionary process, the GRIM strategy will be selected.
An evolutionarily stable strategy is stable against a single small perturbation. However, if these disturbances take place repeatedly, evolutionary stability does not ensure a population will be secure against invasion by a different strategy. As the perturbations get smaller, the population selects the cooperative equilibrium in this particular game. If the discount factor, or shadow of the future as it is described in Axelrod and Dion [1988], is relatively small, the defecting equilibrium will emerge. In another very simple environment, introduced below, neither defection nor cooperation can be stable in a stochastic setting.
Simulations are performed instead of examining this problem analytically because any sensible mutation scheme will prove to be intractable. Also, adding strategies further increases the analytic complexity. To make the discussion as clear as possible, the stability of strategies in an evolutionary setting will be explored using computer simulations.
The exposition begins by discussing the mutation scheme under consideration. Either of the two strategies discussed in the simple version of the infinitely repeated game, GRIM and "always defect," can easily be represented by a finite automaton, or Moore machine, which can be described by a four-tuple <Q, q₀, λ, μ>. Q = {q₀, q₁, q₂, ..., q_m} is a finite set of states, q₀ ∈ Q is the initial state, λ: Q → {C,D} is the output function which maps the state into a strategic choice, and μ: Q × {C,D} → Q is the transition function which maps the current state and the opponent's choice into a state (not necessarily different). Moore machines not only allow us to model strategies in repeated games concisely and conveniently, they also let us calculate payoffs for infinite games analytically. When two finite automata play each other, they must eventually enter a cycle, so the sequence of payoffs can be represented by an infinite sequence which can be summed if we discount the stage game payoffs. Figure 1 provides representations of the two automata which implement GRIM and "always defect." The circles represent the different states, and the letter inside each circle indicates the action taken in that state. The initial state is on the left, and the arrows indicate transitions. For example, GRIM transitions from the cooperating state to the defecting state if the opponent defects; otherwise, it stays in the cooperating state.
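The two automata of Figure 1 can be sketched directly from the four-tuple definition. The sketch below plays two machines against each other and sums discounted stage payoffs; a long truncation stands in for the exact cycle-sum, since the discounted tail is negligible for δ < 1 (class and function names are ours):

```python
class Moore:
    """A Moore machine <Q, q0, lambda, mu>: states are integers, outputs maps
    each state to 'C' or 'D', transitions maps (state, opponent action) to a state."""
    def __init__(self, outputs, transitions, start=0):
        self.outputs = outputs
        self.transitions = transitions
        self.start = start

# GRIM: cooperate in state 0, move to absorbing defect state 1 if the opponent defects.
GRIM = Moore(outputs=['C', 'D'],
             transitions={(0, 'C'): 0, (0, 'D'): 1, (1, 'C'): 1, (1, 'D'): 1})
# DD: a single absorbing defect state.
DD = Moore(outputs=['D'], transitions={(0, 'C'): 0, (0, 'D'): 0})

# Row player's stage-game payoffs from section II.
STAGE = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}

def discounted_payoff(m1, m2, delta, periods=500):
    """Payoff to machine m1 against m2, summing delta-discounted stage payoffs."""
    s1, s2, total = m1.start, m2.start, 0.0
    for t in range(periods):
        a1, a2 = m1.outputs[s1], m2.outputs[s2]
        total += (delta ** t) * STAGE[(a1, a2)]
        s1, s2 = m1.transitions[(s1, a2)], m2.transitions[(s2, a1)]
    return total

print(round(discounted_payoff(GRIM, GRIM, 0.9), 2))  # approx 3/(1-0.9) = 30
print(round(discounted_payoff(DD, GRIM, 0.9), 2))    # approx 5 + 0.9/(1-0.9) = 14
```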
The evolutionary process employed in the simulations has two features. The first attempts to capture the idea that successful strategies will tend to be copied, and the other reflects the idea that over time these strategies will not be perfectly duplicated. First the replication process will be described, and then how the strategies mutate is explained.
To see how the replication process works, imagine a game in which there is a strategy space A with n strategies or automata, A = {a₁, a₂, ..., aₙ}. The prisoners' dilemma is then played at time t, and each strategy earns a payoff depending on the strategy with which it is matched. These payoffs can be represented in an n × n matrix B with element b_ij being the payoff to a player who uses machine aᵢ when matched against a player who uses aⱼ. Then we have
(3) B =
    [ b₁₁   b₁₂   ...   b₁ₙ ]
    [ b₂₁   b₂₂   ...   b₂ₙ ]
    [  ⋮     ⋮     ⋱     ⋮  ]
    [ bₙ₁   bₙ₂   ...   bₙₙ ].
As an example, in the infinitely repeated game with only GRIM and DD, the payoff matrix is
B = [ 1/(1-δ)      5 + δ/(1-δ) ]
    [ δ/(1-δ)      3/(1-δ)     ],
where DD is the first strategy and GRIM is the second. Also, the vector of proportions of each type of player in the population at time t ∈ {0,1,2,...} is p(t) = [p₁(t), p₂(t), ..., pₙ(t)]ᵀ. The expected payoff to a player of strategy aᵢ at time t is [Bp(t)]ᵢ, the ith element of the vector Bp(t). Likewise, the expected payoff for a member of the population at time t is p(t)ᵀBp(t). The replication process sets the proportion of strategy i at time t+1 equal to its proportion at time t multiplied by the ratio of its own expected payoff to the expected payoff of all players. Then we have the following dynamic process:
(4) pᵢ(t+1) = pᵢ(t){[Bp(t)]ᵢ / p(t)ᵀBp(t)}.
Clearly, whether a strategy's proportion in the population grows or shrinks depends on whether it is doing better or worse than average. The proportion of a strategy in the next generation depends not only on its own relative performance, but also on its proportion in the current population. This captures the idea that a strategy must be both successful and observed by other strategies to be copied.
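One step of the replicator process in equation (4) is a single vector operation. A sketch using the GRIM/DD payoff matrix at δ = 0.9 given above (the function name is ours):

```python
import numpy as np

def replicate(p, B):
    """One step of equation (4): p_i(t+1) = p_i(t) * [Bp]_i / (p^T B p)."""
    fitness = B @ p          # expected payoff of each strategy against the population
    avg = p @ fitness        # population-average payoff p^T B p
    return p * fitness / avg

# GRIM/DD payoff matrix at delta = 0.9, DD listed first, from the text.
B = np.array([[10.0, 14.0],
              [ 9.0, 30.0]])

p = np.array([0.5, 0.5])
for _ in range(50):
    p = replicate(p, B)
print(p.round(3))  # without mutation, the cooperative strategy takes over from a 50/50 start
```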
The dynamic process described above is based on the idea that some strategies would be so unsuccessful they would probably not be used in the future, while the superior ones would be imitated. This process allows us to evaluate how well a strategy will do when the less effective ones cease being important, and only the best strategies remain to play each other. The dynamic process simulates evolution with an infinitely large population and random matching. Essentially, a machine plays against a mixed strategy which is represented by the different proportions in the population.
In order to capture the notion that strategies mutate in the evolutionary process, the n × n matrix M is used to represent the rates at which strategies change into one another. A typical element m_ij is the probability that a strategy of type j mutates into a type i player. Therefore, after the replication process described above has taken place, the strategies undergo a mutation process described by M. The resulting population, described by p*(t+1), then plays the infinitely repeated game in period t+1. The proportions in the population are determined by p*(t+1) = Mp(t+1), where p(t+1) is the vector of proportions which result from the replication process described above.
Some very simple examples illustrate what the mutation matrix does. If any strategy can change into any other with equal probability μ, then M has m_ij = μ for i ≠ j and m_ii = 1 - (n - 1)μ, where n is the total number of strategies. On the other hand, if no mutation takes place, the above holds with μ = 0, and M is the n × n identity matrix. The only restrictions on this matrix are the following:
Σ m_ij = 1, the sum taken over i = 1 to n, for all j ∈ {1,2,...,n},

and m_ij ∈ [0,1] for all i, j ∈ {1,2,...,n}.
These merely require that a strategy either stays the same or changes. One machine of type a may not turn into some other number of type b machines.
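The uniform-mutation example can be sketched as follows. Since m_ij is the probability that a type j player becomes type i, each column of M sums to one, so mutation preserves total population shares (the function name is ours):

```python
import numpy as np

def uniform_mutation_matrix(n, mu):
    """M with m_ij = mu for i != j and 1 - (n-1)*mu on the diagonal,
    so every column sums to one: each machine either stays the same or changes."""
    M = np.full((n, n), mu)
    np.fill_diagonal(M, 1 - (n - 1) * mu)
    return M

M = uniform_mutation_matrix(2, 0.1)
p = np.array([0.3, 0.7])
p_star = M @ p               # p*(t+1) = M p(t+1)
print(M)
print(p_star, p_star.sum())  # mutated proportions still sum to one
```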
The mutation scheme used in the simulations is based on the two automata described in Figure 1 being constructed of two easily distinguishable parts. They both have the DD strategy in common because the second state of GRIM is simply DD. Initially it is assumed that it is as easy for DD to gain a cooperative state as it is for GRIM to lose one. A mutation rate of μ is hypothesized, meaning a GRIM strategy will change into DD, or a DD strategy will mutate into GRIM, with probability μ.
One thousand generations of the evolutionary process described above were simulated using these two strategies and a population of 100. The strategies were assigned a fitness based on their expected payoff against the population using equation (3) above with δ = 0.55 and δ = 0.7, and the replication process was simulated. Additionally, each machine changed to the other type with probability μ. The mutation process can be summarized by
M = [ 1-μ    μ  ]
    [  μ    1-μ ],
with μ taking on values of 0.2, 0.1, and 0.01. The initial population for all simulations was half GRIM and half DD, but the same qualitative results were observed regardless of the initial distribution.
The simulations indicate that when the discount factor is small, DD emerges from the evolutionary dynamic process as the mutation rate, μ, gets smaller. If δ is relatively large, the population tends to evolve to all GRIM. Figures 2-7 reveal that as μ decreases the population selects one strategy: DD for δ = 0.55 and GRIM for δ = 0.7. This is even clearer when we simulate the dynamic process with a mutation rate of 0.01. Although there is on average only one mutant per generation when μ = 0.01, the population selects one equilibrium outcome. This indicates the importance of repeated perturbations in this model. A single small perturbation would not cause the population to move away from either stable strategy, but repeated perturbations cause such a movement to take place.
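The two-strategy process can be sketched end to end by alternating replication (equation (4)) with mutation. This is a deterministic large-population sketch under our own simplifications, not the paper's finite-population code, but from a half-and-half start it reproduces the qualitative selection reported above:

```python
import numpy as np

def payoff_matrix(delta):
    """Infinite-RPD payoffs for {DD, GRIM}, DD first, from the section II stage game."""
    return np.array([[1 / (1 - delta),     5 + delta / (1 - delta)],
                     [delta / (1 - delta), 3 / (1 - delta)]])

def evolve(delta, mu, generations=1000):
    """Replication followed by uniform two-strategy mutation, each generation."""
    B = payoff_matrix(delta)
    M = np.array([[1 - mu, mu], [mu, 1 - mu]])
    p = np.array([0.5, 0.5])
    for _ in range(generations):
        fitness = B @ p
        p = p * fitness / (p @ fitness)  # replication, equation (4)
        p = M @ p                        # mutation, p* = Mp
    return p  # [share of DD, share of GRIM]

print(evolve(0.55, 0.01).round(2))  # mostly DD at the smaller discount factor
print(evolve(0.70, 0.01).round(2))  # mostly GRIM at the larger discount factor
```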
The intuition is not difficult to see. As the discount factor decreases, DD becomes a less inferior reply to GRIM. That is, the difference between what a GRIM player gets against himself and what a DD player gets against GRIM diminishes. Therefore, as δ gets smaller, DD players persist in the population long enough for another perturbation to take place. On the other hand, a GRIM player does not do as well relative to a DD player because the future is less important. If the discount factor is near one, though, the difference in payoffs to GRIM and DD players when they play a GRIM player is large enough that defectors die out quickly if almost all the population plays GRIM. If most of the population defects, a GRIM player can persevere until the next perturbation because the future is not discounted much. Also, if other cooperators appear, the GRIM strategy does much better against them than DD does, so it can increase its proportion of the population.
Now we will examine what happens in a population when we add feasible mutant strategies. The idea of feasible mutants is important here. Boyd and Lorberbaum [1987] pointed out that no pure strategy is immune from invasion in the infinitely repeated prisoners' dilemma. However, which strategies are feasible in a given environment is an important consideration. Darwin [1859] first suggested invading mutants should be readily derived from the native population.
Pollock [1989] proposed a heuristic for describing strategies in a given environment, portraying them in terms of "niceness" and "provokability." For example, DD is not nice, while GRIM is nice and maximally provokable. It is best to think of niceness as a binary characteristic of a strategy here: a strategy either will defect unprovoked or it will not. Provokability, on the other hand, comes in degrees. The mutant strategies in these simulations allow different levels of provokability, similar in spirit to Pollock's taxonomy. Specifically, mutants can be constructed from elements of the two strategies discussed above, allowing strategies called 2-GRIM, 3-GRIM, ..., n-GRIM in addition to GRIM (or 1-GRIM) and DD (or 0-GRIM). That is, the n-GRIM strategy transitions to the absorbing defection state after n defections by its opponent. However, it is not plausible for 5-GRIM to mutate into DD as easily as GRIM does. Therefore, if it is assumed that the probability a state is added or dropped is μ, the probability an m-GRIM machine changes into an n-GRIM machine, or an n-GRIM machine into an m-GRIM machine, is μ^|m-n| for n ≠ m. For example, if μ = 0.1 the GRIM strategy will mutate to DD or 2-GRIM with probability 0.1. However, it will mutate to 3-GRIM with probability 0.01, to 4-GRIM with probability 0.001, and so on.
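The μ^|m-n| scheme yields a mutation matrix whose off-diagonal entries decay geometrically with the number of states that must be added or dropped. A sketch (the function name is ours):

```python
import numpy as np

def ngrim_mutation_matrix(n_strategies, mu):
    """m_ij = mu^|i-j| for i != j, where index k corresponds to k-GRIM (0-GRIM = DD);
    each diagonal entry is set so that every column sums to one."""
    M = np.zeros((n_strategies, n_strategies))
    for i in range(n_strategies):
        for j in range(n_strategies):
            if i != j:
                M[i, j] = mu ** abs(i - j)
    for j in range(n_strategies):
        M[j, j] = 1 - M[:, j].sum()
    return M

M = ngrim_mutation_matrix(6, 0.1)
print(M[0, 1])       # GRIM -> DD: one state dropped, probability 0.1
print(M[0, 3])       # 3-GRIM -> DD: three states dropped, probability approx 0.001
print(M.sum(axis=0)) # every column sums to one
```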
Another group of simulations was run for 5000 generations using the six strategies described above and a population of 100. The payoffs in this game can be represented by the following matrix:
B = [b_{m+1,n+1}], where

b_{m+1,n+1} = 3/(1-δ)                        if m ≥ 1 and n ≥ 1,
              δᵐ/(1-δ)                       if m ≥ 1 and n = 0,
              5(1-δⁿ)/(1-δ) + δⁿ/(1-δ)       if m = 0 and n ≥ 1,
              1/(1-δ)                        if m = n = 0.
The element in the (m+1)st row and (n+1)st column indicates the payoff to an m-GRIM player when matched with one who plays n-GRIM, where m and n can take on the values 0, 1, 2, 3, 4, or 5. After B above is used in the replication process represented by (4), the following mutation matrix was employed, with αⱼ defined as Σ m_ij summed over i ≠ j, or the sum of the off-diagonal elements in the jth column.
M = [ 1-α₁    μ      μ²     μ³     μ⁴     μ⁵   ]
    [  μ     1-α₂    μ      μ²     μ³     μ⁴   ]
    [  μ²     μ     1-α₃    μ      μ²     μ³   ]
    [  μ³     μ²     μ     1-α₄    μ      μ²   ]
    [  μ⁴     μ³     μ²     μ     1-α₅    μ    ]
    [  μ⁵     μ⁴     μ³     μ²     μ     1-α₆  ]

TABLE I
Summary of Evolutionary Simulations

Strategy Set                       Discount Factor (δ)    Resulting Population
{DD, GRIM}                         0.55                   Defecting
{DD, GRIM}                         0.7                    Cooperating
{DD, GRIM, 2-GRIM, ..., 5-GRIM}    0.7                    Cycle between Defecting and Cooperating
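The six-strategy payoff matrix can be generated directly from the stage game: two nice strategies (m, n ≥ 1) cooperate forever, while against DD an n-GRIM player is exploited for n periods before mutual defection sets in. A sketch, with the payoff formulas derived by us from the GRIM/DD calculations in section II (the function name is ours):

```python
import numpy as np

def ngrim_payoffs(delta, max_n=5):
    """B[m, n]: discounted payoff to m-GRIM against n-GRIM, with 0-GRIM = DD."""
    size = max_n + 1
    B = np.zeros((size, size))
    for m in range(size):
        for n in range(size):
            if m >= 1 and n >= 1:
                B[m, n] = 3 / (1 - delta)           # mutual cooperation forever
            elif m == 0 and n == 0:
                B[m, n] = 1 / (1 - delta)           # mutual defection forever
            elif m == 0:
                # DD exploits n-GRIM for n periods (payoff 5), then both defect.
                B[m, n] = 5 * (1 - delta**n) / (1 - delta) + delta**n / (1 - delta)
            else:
                # m-GRIM is exploited for m periods (payoff 0), then both defect.
                B[m, n] = delta**m / (1 - delta)
    return B

B = ngrim_payoffs(0.9)
print(B[:2, :2])  # the top-left corner recovers the {DD, GRIM} matrix at delta = 0.9
```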
The results of the simulations are summarized in Figures 8-9. The graphs indicate that the possibility of mutation makes cooperation unsustainable in this stochastic environment. This is true even with a discount factor large enough that DD would not be stable in the earlier simulations. Even this simple mutation scheme can destroy GRIM as a stable strategy. Although the choice of strategies only up to 5-GRIM was arbitrary, if those strategies had not been enough to eliminate the cooperative strategy, still less provokable strategies could have been allowed until the cooperative strategies were displaced. The initial population in these simulations was half DD and half GRIM, but once again the same qualitative results were observed regardless of the initial population.
Moreover, these simulations suggest the population will tend to cycle over time between cooperative and defecting outcomes if δ is large enough. The same forces which cause the population to select GRIM in the two-strategy evolutionary process will cause the population to choose GRIM initially. However, over time the less provokable strategies drift into the population, since they do as well against the GRIM strategy as it does against itself. As the population becomes less provokable, it becomes susceptible to invasion by DD. After the population is overtaken by DD, the same forces which worked initially are present again, and the population cycles. This can be seen in Figure 9. The cooperating strategies dominated the population for a while, then the less provokable strategies allowed DD to invade, only to be overcome by the GRIM strategy once again. If δ is small enough, there are no forces moving the population toward cooperation. It seems reasonable to expect that as the mutation rate diminishes, the cycles will be less frequent but still present.
The results of the simulations are summarized in Table I. These outcomes are caused by the perturbations accumulating over time. If all of the strategies were available and the population were initially all GRIM players, one small disturbance would not change the equilibrium population. Its composition could change, but a small perturbation would not cause the population to become a defecting one. The cycling is the result of repeated perturbations.
V. SUMMARY
The simulations performed and reported here suggest cooperation may not be the outcome of an evolutionary process subject to continual perturbations. This paper described the notion of evolutionary stability in an environment subject to disturbances. Its application to the infinitely repeated prisoners' dilemma yields new insight into how strategies may evolve and which strategies will survive such processes. The strategies studied here can be thought of as "rules of thumb" employed by individuals in the population. For example, in the oligopoly problem the DD machine can be thought of as the strategy that cheats on any collusive agreement, while n-GRIM cooperates until the opponent cheats n times and then defects forever. The evolutionary component of this model allows the strategies to change over time.
Evolutionary stability is a very powerful concept, and it offers an interesting approach to explaining behavior we may observe. The dynamical system explored here can serve as a vehicle for analyzing the type of economic problems for which the repeated prisoners' dilemma is appropriate. Foster and Young [1990] examined the stability of populations when the payoff matrix is continuously subjected to perturbations. They show the population will generally select a set of strategies and will almost surely be arbitrarily close to that set as the variance of the disturbance goes to zero.
The principal result from these simulations is that under a reasonable mutation scheme in this particular evolutionary process, the population may not tend monotonically toward cooperation. In fact, in some of these simulations the population cycled between being cooperative and defecting. The implications for economic problems are clear. If the probability of another interaction is small, cooperation is unlikely even though it is equilibrium behavior. If future interaction is likely, we can expect the population to cycle between cooperation and defection. This suggests that duopolists may be observed cooperating for a while and then reverting to defection. In another context, two countries may trade without export subsidies and then change their behavior, only to be followed by further cooperation (free trade) later.
GRIM and DD were chosen as the initial strategies because both are subgame perfect equilibrium strategies in the infinitely repeated prisoners' dilemma. However, if we think of them as being implemented by Moore machines and imagine mutation as adding and subtracting parts of those machines to change how provokable they are, neither defection nor cooperation can survive indefinitely if the shadow of the future is large enough. If the discount factor is small enough, though, DD will be stable, even in the presence of these mutant strategies.
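The role of the discount factor can be made concrete with a standard discounted-payoff comparison. The payoff values below (T = 5, R = 3, P = 1) are Axelrod's conventional numbers, used here as an assumption for illustration; the paper's exact payoff matrix may differ.

```python
# Temptation, reward, and punishment payoffs (illustrative values).
T, R, P = 5.0, 3.0, 1.0

def grim_vs_grim(delta):
    """Two GRIM players cooperate forever: R every period, discounted."""
    return R / (1 - delta)

def dd_vs_grim(delta):
    """DD against GRIM: the temptation payoff T once, then mutual
    punishment P in every period thereafter."""
    return T + delta * P / (1 - delta)

# GRIM deters a permanent defector iff R/(1-d) >= T + d*P/(1-d),
# which rearranges to d >= (T - R) / (T - P).
threshold = (T - R) / (T - P)  # 0.5 with these payoffs
```

When the discount factor lies below this threshold, defection pays even against a trigger strategy, which is why DD is stable when the shadow of the future is small.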
1. Some examples of these studies are Boyd and Lorberbaum [1987], Farrell and Ware [1989], Fudenberg and Maskin [1990], Binmore and Samuelson [1989], and Linster [1992].
2. Aumann [1981] and Abreu and Rubinstein [1988] are some well-known examples.
3. Maynard Smith and Price [1973] provide details and an explanation of this.
4. The notation here is (p, 1 - p) where p is the probability assigned to strategy A.
5. Binmore [1987] offers an excellent discussion of this and other Nash equilibrium refinements.
REFERENCES
Abreu, Dilip, and Ariel Rubinstein. "The Structure of Nash Equilibrium in Repeated Games with Finite Automata." Econometrica, November 1988, 1259-81.
Aumann, Robert J. "Survey of Repeated Games," in Essays in Game Theory and Mathematical Economics in Honor of Oskar Morgenstern. Mannheim: Bibliographisches Institut, 1981, 11-42.
Axelrod, Robert. The Evolution of Cooperation. New York: Basic Books, 1984.
Axelrod, Robert, and Douglas Dion. "The Further Evolution of Cooperation." Science, December 1988, 1385-90.
Binmore, Kenneth G. "Modeling Rational Players, Part I." Economics and Philosophy, 3(1), 1987, 179-214.
Binmore, Kenneth G., and Larry Samuelson. "Notes on Evolutionarily Stable Strategies in Repeated Games Played by Finite Automata." Working paper, University of Michigan, 1989.
Boyd, Robert, and Jeffrey Lorberbaum. "No Pure Strategy is Evolutionarily Stable in the Repeated Prisoner's Dilemma Game." Nature, 327(1), 1987, 58-59.
Brander, James, and Barbara Spencer. "Export Subsidies and International Market Share Rivalry." Journal of International Economics, 18(1), 1985, 83-100.
Cornes, Richard, and Todd Sandler. The Theory of Externalities, Public Goods, and Club Goods. Cambridge: Cambridge University Press, 1986.
Darwin, Charles. On the Origin of Species. London: John Murray, 1859.
Dixit, Avinash, and Barry Nalebuff. Thinking Strategically. New York: W. W. Norton, 1991.
Farrell, Joseph, and Eric Maskin. "Renegotiation in Repeated Games." Games and Economic Behavior, December 1989, 327-60.
Farrell, Joseph, and Roger Ware. "Evolutionary Stability in the Repeated Prisoners' Dilemma." Theoretical Population Biology, 36(1), 1989, 161-66.
Foster, Dean, and Peyton Young. "Stochastic Evolutionary Game Dynamics." Theoretical Population Biology, 38(2), 1990, 219-32.
Fudenberg, Drew, and Eric Maskin. "Evolution and Cooperation in Noisy Repeated Games." American Economic Review, May 1990, 274-79.
Green, Edward, and Robert Porter. "Non-cooperative Collusion under Imperfect Price Information." Econometrica, January 1984, 87-100.
Linster, Bruce G. "Stochastic Evolutionary Stability in the Infinitely Repeated Prisoners' Dilemma Played by Two-state Moore Machines." Southern Economic Journal, April 1992, 880-903.
Maynard Smith, John. Evolution and the Theory of Games. Cambridge: Cambridge University Press, 1982.
Maynard Smith, John, and G. R. Price. "The Logic of Animal Conflict." Nature, 246(1), 1973, 15-18.
Pollock, Gregory. "Evolutionary Stability of Reciprocity in a Viscous Lattice." Working paper, Northwestern University, 1989.
BRUCE G. LINSTER, Associate Professor, United States Air Force Academy. I wish to thank the anonymous referees for their useful comments and suggestions.