Article information

Title: Testing the testers: Do more tests deter athletes from doping?
Authors: Baudouin, Claire; Szymanski, Stefan
Journal: International Journal of Sport Finance
Print ISSN: 1558-6235
Publication year: 2016
Issue: November
Publisher: Fitness Information Technology Inc.

Testing the testers: Do more tests deter athletes from doping?

Baudouin, Claire ; Szymanski, Stefan

Abstract

This paper examines whether increasing the frequency of testing deters athletes from doping. Since data is not available to analyze this problem directly, an indirect approach is required. We use the relationship between testing and Olympic performance to infer the relationship between testing and doping. This requires a variety of assumptions, the most important of which is that doping improves Olympic performance. The results suggest that in some sports, such as track & field (athletics) and wrestling, carrying out more tests does deter athletes from taking drugs. In other sports in which doping is believed to be more common, though, there is no evidence of a negative relationship between testing and doping. This is notably the case in cycling. This suggests that for some sports, increasing the frequency of testing may be a simple solution to the problem of doping. In other sports, though, the problem may have deeper roots.

Keywords: doping, Olympics, testing, WADA

Introduction

Doping scandals have become increasingly common in sport in recent decades. There exists a growing literature in sports economics that uses theory to analyze this problem. However, far less attention has been devoted to an empirical analysis of the performance of anti-doping agencies. This paper examines whether testing athletes more frequently deters them from taking drugs. This is an important question since if testing does deter doping, this suggests a relatively simple solution to the problem of doping in sport. If testing does not act to deter athletes from taking drugs, it may be necessary to consider redesigning the system.

Ideally, we would directly examine the relationship between the number of times an athlete is tested and whether or not the athlete chooses to cheat. However, such an approach is not possible since we do not know whether or not an athlete chose to take drugs. Consequently, a less direct approach is required. This paper uses the relationship between testing and Summer Olympic performance to infer the relationship between testing and doping. This requires a variety of assumptions, the most important of which is that doping improves Olympic performance. This appears to be true of many, although not all, Summer Olympic sports. If doping improves Olympic performance, we would expect that, accounting for other factors, countries that test more would perform worse at the Summer Olympics. This is because if testing acts to deter athletes from taking drugs, then countries that carry out more tests will have fewer competitors taking drugs and, if drugs confer a competitive advantage, will subsequently perform worse than would otherwise be expected.

The results suggest that in some sports, such as track & field and wrestling, carrying out more tests does deter athletes from taking drugs. In other sports in which doping is believed to be common, though, we cannot reject the null hypothesis that there is no relationship between testing and the proportion of medals won. This is notably the case in cycling, which in the past has been subject to various doping scandals.

There is a growing body of theoretical literature relating to the economics of doping. (1) However, this is one of the first empirical papers to analyze the existence of a relationship between doping and testing. Our analysis is possible since the World Anti-Doping Agency (WADA) has recently begun to publish detailed statistics concerning the number of tests carried out by National Anti-Doping Organizations (NADOs). To the best of our knowledge, this is the first economics paper to use this new data set. Two other papers have analyzed similar problems, although they have taken very different approaches. Hermann and Henneberg (2014) analyze how many tests would be necessary to deter athletes from doping. Using variables such as the window of detection for different drugs and how predictable testing is, they find that, depending on the sport, between 16 and 50 tests would need to be carried out on athletes each year in order to completely deter doping. Secondly, Mitchell and Stewart (2004) examine whether the announcement that EPO testing would be introduced at the 2000 Olympics led some athletes to choose not to participate. They find no evidence that this was the case.

This paper also draws on the existing literature that focuses on establishing the factors that best predict how well a country will perform at the Olympics (e.g., Leeds & Leeds, 2012; Pierdzioch & Emrich, 2013). These papers examine the importance of a variety of different factors and generally find previous performance, population, and gross domestic product (GDP) per capita to be among the most important variables. However, to date, no paper has included data on testing in an analysis of the determinants of Olympic performance.

Model

If athletes are expected utility maximizers, then an athlete will choose to take drugs if his or her expected utility from doing so exceeds that from not doing so. This model, therefore, draws on the methods developed in Becker (1968). Let a be an athlete's baseline performance without drug use, which is the same for all athletes. [b.sub.j] is the multiplicative gain that athlete j derives from doping, where [b.sub.j] ∈ [[b.bar], [bar.b]]. [b.sub.j] is known to athlete j, but the anti-doping agency only knows the underlying distribution from which [b.sub.j] is drawn. It is possible that [b.sub.j]<1, but in this case athlete j would never choose to take drugs. An athlete is tested with probability t and, if the athlete tests positive, he or she is banned for proportion n of his or her career. If an athlete takes drugs, he or she tests positive with probability [rho], while if the athlete does not, he or she tests positive with probability [theta], where [theta]<[rho]. Consequently, an athlete dopes if:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

If t increases while n, the punishment for testing positive, is held constant, Pr[t ≤ [beta]([b.sub.j])] decreases, as long as [rho], the probability of a true positive, is strictly positive. (2) Consequently, as the probability with which an athlete is tested increases, provided the punishment for testing positive remains unchanged, we would expect the proportion of athletes taking drugs to decline. It is important to note that we would not necessarily expect to see a negative relationship between testing and doping if countries were able to compensate for a low testing probability by setting a harsher punishment. However, in reality countries cannot do this since WADA's Code (WADA, 2015) specifies set punishments that are standardized across sports, and all countries that competed at the 2012 Olympics are signatories to the Code.

Even with standardized punishments, an increase in the testing probability only works to deter an athlete from doping if it increases the probability with which the athlete expects to test positive if he or she takes drugs, [rho]t. Consequently, if [rho] and [theta] are both equal to zero, an increase in t does not deter drug taking because the athlete expects that even if he or she takes drugs and is tested, he or she will never test positive.
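As a hedged illustration only, the doping decision can be made concrete under one additional assumption that is not stated in the text: that an athlete's expected payoff equals performance multiplied by the share of the career not spent banned. The functional form of the threshold [beta]([b.sub.j]) below follows from that assumption and is not necessarily the authors' own expression.

```latex
% Assumed payoff: performance times the expected share of the career not banned.
% This payoff form is an assumption made for illustration, not taken from the source.
\begin{align*}
EU_{\text{dope}}  &= a\, b_j\, (1 - t \rho n), \\
EU_{\text{clean}} &= a\, (1 - t \theta n), \\
EU_{\text{dope}} \ge EU_{\text{clean}}
  &\iff b_j - 1 \ge t\, n\, (\rho b_j - \theta) \\
  &\iff t \le \frac{b_j - 1}{n\,(\rho b_j - \theta)} \equiv \beta(b_j)
  \qquad \text{for } b_j > 1 \text{ and } \rho b_j > \theta .
\end{align*}
```

Under this sketch the threshold falls as n or [rho] rises, so fewer athletes satisfy the condition for any given t. When [rho] and [theta] are both zero the right-hand side of the first inequality vanishes and the condition holds for every t, which matches the observation that testing then has no deterrent effect.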

Ideal Approach

Ideally, we would investigate the relationship between testing probability and an athlete's choice of doping by carrying out a randomized controlled trial in which different athletes were tested with different probabilities and we could observe whether or not each athlete chose to take drugs. However, even assuming that permission for such a trial could be acquired from an anti-doping agency, it would not be possible to accurately observe whether or not an athlete had taken drugs. Consequently, it is necessary to use a more indirect approach to analyze this issue.

Alternative Approach

The approach this paper takes is to use the relationship between Olympic performance and testing to infer a relationship between testing and doping. The way in which this works can be seen from the chain rule:

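\[
\underbrace{\frac{dM_{is}}{d\mathit{SPA}_{is}}}_{\text{term 1}}
= \underbrace{\frac{dM_{is}}{dD_{is}}}_{\text{term 2}}
\times \underbrace{\frac{dD_{is}}{d\mathit{SPA}_{is}}}_{\text{term 3}}
\]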

where [M.sub.is] is the proportion of available Olympic medals won by country i in sport s, [SPA.sub.is] is samples collected in a year per athlete of country i in sport s, and [D.sub.is] is the proportion of athletes choosing to dope in country i in sport s. Term 3, the change in the proportion of athletes doping when testing changes, is the term we are really interested in, but data is not available to directly calculate term 3. Instead, data is available to calculate term 1, the change in Olympic performance when testing changes. From the chain rule it can be seen that it is possible to infer the sign, although not the magnitude, of term 3 from term 1, provided term 2 can be signed.

Term 2 is the change in Olympic performance when the proportion of athletes doping changes. There is a sizeable body of scientific evidence suggesting that doping improves performance in a wide range of sports such as cycling and weightlifting. In one study on recreational athletes, taking EPO increased running time to exhaustion by 34% (Buick et al., 1980) while in another, the time taken to run five miles improved significantly (Williams et al., 1980). Meanwhile, taking human growth hormone has been shown to result in a 4% increase in sprinting capacity (Meinhardt et al., 2010). In addition, there is evidence that athletes are motivated to "win at all costs" and are therefore willing to take drugs despite the risks that this entails. For example, Krumer et al. (2011) find that professional athletes have a higher subjective discount rate than non-athletes, suggesting that they are more biased to the present.

Consequently, we would expect that for at least some Summer Olympic sports, there would be a positive relationship between the proportion of athletes from a country who dope and Olympic performance of the country in question. This implies that, especially for sports with a sizeable strength, stamina, or speed component, [dM.sub.is]/[dD.sub.is] is likely to be positive. However, for other sports, and in particular skill-based sports, there is little evidence that drugs are able to improve performance.

If term 2 is positive, this means that the sign of term 1 must be the same as the sign of term 3. Consequently, if we find a negative relationship between Olympic performance and testing, this implies the relationship between testing and doping must also be negative.

Empirical Framework

Data

The dependent variable is the proportion of available medals won by country i in sport s at the 2012 Summer Olympic Games. The source for this data was the official website of the Olympic Games. (3) All medals were weighted equally so regardless of whether a country won gold or bronze, this was recorded as one medal. Using medal counts or medal shares is the standard approach used in the literature on Olympic performance by authors such as Bernard and Busse (2004), Johnson and Ali (2004), Leeds and Leeds (2012), and Pierdzioch and Emrich (2013). There are several reasons for using medal shares as our measure of Olympic performance. First, since at the elite level, even a small performance improvement can make the difference between winning a medal or not, we might expect medal shares to be particularly responsive to doping. Second, if a weighting system were to be employed, it is not clear how a gold medal should be weighted relative to silver or bronze. Finally, if gold, silver, and bronze medals were analyzed separately, the number of observations that do not have a positive entry for the dependent variable would increase even further. It was necessary to normalize the medal counts by the number of available medals in each sport since different sports award different numbers of medals. In total there are 1,673 observations at the level of country and sport.
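The construction of the dependent variable can be sketched as follows. This is a minimal illustration, assuming the raw data is a list of medal-winning countries per sport; the function name `medal_shares` and the example country codes and counts are hypothetical and do not come from the paper's data.

```python
from collections import Counter


def medal_shares(medal_winners, available_medals):
    """Return each country's share of the medals available in one sport.

    medal_winners: list of country codes, one entry per medal awarded,
        with gold, silver and bronze all counted as one medal.
    available_medals: total number of medals available in the sport.
    """
    if available_medals <= 0:
        raise ValueError("available_medals must be positive")
    counts = Counter(medal_winners)
    # Normalize by the number of available medals so that sports
    # awarding different numbers of medals are comparable.
    return {country: n / available_medals for country, n in counts.items()}


# Hypothetical sport with six medals available.
shares = medal_shares(["GBR", "GBR", "USA", "KEN", "GBR", "USA"], 6)
print(shares["GBR"])  # 0.5
```

Countries that win no medals in a sport do not appear in the returned dictionary; in a full data set they would be recorded with a share of zero.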

The source of data for testing statistics is the World Anti-Doping Agency. All data on testing statistics is presented in the 2012 Anti-Doping Testing Figures Report, which is publicly available on WADA's website. (4) This data was published for the first time in relation to testing carried out in 2012. The report provides a detailed breakdown of the number of tests carried out by each country in each sport. For example, data is available on the number of tests carried out by the United Kingdom on cyclists. Since several organizations within the same country may carry out drug tests, the figures for each organization within a country were aggregated in order to produce a single figure for each country.

Some countries, and in particular many third-world countries, lack the resources to conduct their own testing program. Such countries are often members of a regional anti-doping organization (RADO). RADOs pool resources in order to carry out tests on athletes belonging to member countries. (5) Information is only available on the total number of tests carried out by each RADO in each sport; we do not have data on the number of tests carried out by the RADO on athletes of each member country. Therefore RADO tests were allocated to member countries in proportion to the size of the team that each country sent to the Olympics. Since in total only 403 tests were carried out by all RADOs across all sports, this step is unlikely to significantly impact the results.
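The proportional allocation of RADO tests described above can be sketched as below. The function name `allocate_rado_tests` and the member countries and team sizes are hypothetical, chosen only to illustrate the arithmetic.

```python
def allocate_rado_tests(total_tests, team_sizes):
    """Split a RADO's tests in one sport across its member countries.

    total_tests: number of tests the RADO carried out in the sport.
    team_sizes: dict mapping member country to its Olympic team size
        in that sport.
    Tests are allocated in proportion to team size.
    """
    total_team = sum(team_sizes.values())
    if total_team == 0:
        # No member sent athletes in this sport, so nothing to allocate.
        return {country: 0.0 for country in team_sizes}
    return {
        country: total_tests * size / total_team
        for country, size in team_sizes.items()
    }


# Hypothetical RADO with three members and 12 tests in one sport.
allocation = allocate_rado_tests(12, {"A": 1, "B": 2, "C": 3})
print(allocation)  # {'A': 2.0, 'B': 4.0, 'C': 6.0}
```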

The difficulty with this data set is that data is only provided at the level of country and sport. We do not have data on how many times an individual athlete was tested or even on how many athletes in total were tested. Consequently, there is no information about the number of cyclists the United Kingdom tested, only the total number of tests carried out on cyclists. However, we are interested in how doping varies with tests per athlete, not total tests. The solution we adopt is to divide testing statistics by the team size country i sent to the 2012 Summer Olympics for sport s. If, for example, the UK carried out 10 tests on cyclists and sent two cyclists to the Olympics, we would record that the UK carried out five tests per cyclist. Data on team size was acquired from the website of The Guardian, (6) a British newspaper.
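The normalization by team size amounts to a single division, sketched here with the example from the text. The function name `samples_per_athlete` is hypothetical, and returning `None` for a country that sent no athletes in a sport is an assumption of this sketch rather than something the paper specifies.

```python
def samples_per_athlete(total_tests, team_size):
    """Approximate tests per athlete as total tests divided by Olympic team size."""
    if team_size <= 0:
        # Assumption of this sketch: the ratio is undefined when a country
        # sent no athletes in the sport.
        return None
    return total_tests / team_size


# Example from the text: 10 tests on cyclists, two cyclists at the Olympics.
uk_cycling = samples_per_athlete(10, 2)
print(uk_cycling)  # 5.0
```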

We might also expect that Olympic performance would display a strong element of path dependence with countries that previously performed well at certain sports also performing well in 2012. Bernard and Busse (2004) find lagged medal share is statistically significant and improves the fit of their model. Therefore the proportion of medals won by country i in sport s in 2008 is included as an independent variable. This captures the ability of a country in a specific sport.

In order to account for the possibility that a variety of omitted variables may affect both testing and Olympic performance, country fixed effects are included in the regression. Since variables such as GDP p.c., population, and corruption do not vary by sport, the effect of these variables is also captured through the fixed effects.

Summary statistics for the variables pooled across all observations are given in Table 1.

Potential Issues

With an ideal data set, it would be possible to infer the sign of the relationship between testing and doping from the sign of the relationship between testing and Olympic performance. However, due to a lack of data, there are several issues that may hamper any such attempt. These issues can be divided into three categories: issues of aggregation, issues of omitted variables, and issues of timing.

Issues of Aggregation

The first potential issue can be seen from Table 1, which shows that the maximum number of tests per athlete in the pooled data set is 414. This figure is clearly implausibly high and occurs due to our method of handling the problem that the finest division of testing statistics is at the country-sport level. Since we only have data on the total number of tests carried out by a country in a sport, we created a value for samples per athlete by dividing by Olympic team size. The difficulty with this approach is that not all athletes who are tested may be sent to the Olympics. In a small number of cases, this results in a value for samples per athlete which is infeasibly high. Therefore some of our regressions were run restricting observations to those in which samples per athlete was less than once per week. (7)
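The restricted sample can be sketched as a simple filter. This assumes that "less than once per week" corresponds to fewer than 52 samples per athlete per year; the exact cutoff the authors used is not shown in this section, and the record layout and values below are hypothetical.

```python
WEEKS_PER_YEAR = 52  # assumed cutoff for "less than once per week"


def restrict_observations(observations, max_spa=WEEKS_PER_YEAR):
    """Keep only country-sport observations with plausible samples per athlete."""
    return [obs for obs in observations if obs["spa"] < max_spa]


# Hypothetical country-sport records.
observations = [
    {"country": "A", "sport": "boxing", "spa": 414.0},
    {"country": "B", "sport": "rowing", "spa": 5.0},
    {"country": "C", "sport": "judo", "spa": 52.0},
]
restricted = restrict_observations(observations)
print([obs["country"] for obs in restricted])  # ['B']
```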

The second issue of aggregated test statistics is that even if the Olympic team size accurately conveyed the number of athletes being tested, average figures may mask effects if testing is targeted. For example, we can imagine a simple case in which there exist two heterogeneous groups of athletes. Group A may dope if the probability with which they are tested is sufficiently low. Group B has high moral standards and will never take drugs regardless of the testing probability. An anti-doping agency moves from a regime in which all athletes are tested with equal probability to one in which athletes in group A are tested more often than athletes in group B but the total number of tests is unchanged. The result of this would be a decline in average performance, but no change in total tests. Consequently, even though testing is effective (it deters group A from doping), this is not captured in the relationship between the average number of tests per athlete and performance. This model is only able to assess whether carrying out a greater number of tests deters cheating; it is incapable of analyzing whether an improvement in the quality of testing, for example through targeted testing, with no change in the total number of tests carried out, acts as a deterrent. This is because no data is available on the quality of testing by each national anti-doping agency.

Consequently, in this model, quality of testing is an omitted variable and may result in omitted variable bias. This issue will arise if countries have different qualities of testing and the quality of a country's testing is systematically related to the number of tests that are carried out. The direction of the bias depends on whether quality and number of tests are positively or negatively correlated. It seems most likely that quality and number of tests would be positively correlated since countries with greater resources are likely to be able to both carry out more tests and devote more money to better targeting tests. In this case, the relationship between number of tests and performance would be biased in a negative direction. Alternatively, there could be a negative relationship between the number of tests performed by a country and the quality of testing. This would occur if countries compensated for a lack of quality testing by carrying out a greater number of tests. In this case, any negative relationship between number of tests and performance could be masked by a correspondingly lower quality of testing positively impacting on performance. Most countries have only one organization that is responsible for carrying out tests for all sports. Consequently, we would not expect the quality of testing carried out by a country to differ systematically across sports. Therefore including country fixed effects in the regression should account for cross country differences in quality of testing.

Issues of Omitted Variables

We can only interpret the coefficient on samples per athlete (SPA) if there is no omitted variable bias. Such bias will occur if there is a variable that is correlated with both SPA and Olympic performance, for which we do not account. An obvious candidate for such a variable is funding for sport. Countries with more funding for sport are likely to perform better at the Olympics since they are able to spend more money on coaches and facilities. This funding is also likely to be used for testing, so these countries also carry out more tests. If this is true and we do not account for this effect, the coefficient on samples per athlete would be positively biased, since it would also be capturing the effect of increased funding. Since data is not available on funding for sport, country fixed effects are included in the regression.

The second potential issue with regard to omitted variables occurs because the data on testing only captures the average number of times an athlete was tested by his or her national anti-doping authority. However, athletes can also be tested by other organizations such as the testing authority for their sport. Although data is published on how many tests were carried out by other organizations in a sport, there is no information regarding how these tests are distributed across countries. In an ideal data set, we would observe the number of times an individual athlete was tested by any anti-doping agency, [SPA.sub.Tis]. Using [SPA.sub.N] to refer to samples per athlete conducted by a national anti-doping agency and [SPA.sub.O] to refer to samples per athlete carried out by other organizations, and for notational simplicity, ignoring other variables, we would want to estimate:

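\[
M_{is} = \alpha + \beta_1\,\mathit{SPA}_{Tis} + \varepsilon_{is}
       = \alpha + \beta_1\,(\mathit{SPA}_{Nis} + \mathit{SPA}_{Ois}) + \varepsilon_{is}
\]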

Instead what we are estimating is:

\[
M_{is} = \alpha + \gamma_1\,\mathit{SPA}_{Nis} + v_{is}
\]

Therefore:

\[
\hat{\gamma}_1 = \beta_1\,(1 + \delta)
\]

where [[beta].sub.1] is the coefficient on total samples per athlete in the equation we would want to estimate and [delta] is the coefficient from a regression of [SPA.sub.Ois] on [SPA.sub.Nis]. It is possible that if a country carries out more tests on its athletes, other testing agencies choose to carry out fewer tests. If this is the case, [SPA.sub.Nis] and [SPA.sub.Ois] would be negatively correlated, implying [delta]<0. It is not clear to what extent coordination of testing between authorities does occur, but regardless, we would not expect that when a country carries out one more test on an athlete, other organizations would reduce their number of tests by more than one. If it did in fact lead to a reduction of more than one, this would imply that the net effect of a country carrying out one more test would be a reduction in the number of total tests, which seems implausible. Therefore, -1<[delta]<0 and the sign of the estimate of [[gamma].sub.1] will be correct. However, we have attenuation bias since the estimate of [[gamma].sub.1] is biased towards zero. Our regression results will accurately convey the impact of an increase in national testing on Olympic performance. However, this may provide an under-estimate of the impact of an overall increase in testing if there is negative correlation between the tests carried out by different authorities.
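The attenuation argument can be checked numerically with a small noise-free example. All numbers below are hypothetical, and the names `beta_1` (the effect of total testing) and `delta` (the response of other organizations' testing to national testing) are labels chosen for this sketch. Regressing medal share on national samples per athlete alone recovers `beta_1 * (1 + delta)`, which has the same sign as `beta_1` but is closer to zero when `delta` lies between -1 and 0.

```python
def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x with an intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical parameter values, for illustration only.
alpha = 0.1     # intercept of the medal-share equation
beta_1 = -0.02  # effect of total samples per athlete on medal share
delta = -0.5    # other organizations cut half a test per extra national test

spa_national = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
spa_other = [3.0 + delta * n for n in spa_national]
medal_share = [
    alpha + beta_1 * (n + o) for n, o in zip(spa_national, spa_other)
]

# Regress medal share on national testing only, omitting other testing.
gamma_1 = ols_slope(spa_national, medal_share)
print(round(gamma_1, 6))  # -0.01
```

Because the example contains no noise, the slope equals `beta_1 * (1 + delta)` up to floating-point error; with real data the relationship would hold only in expectation.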

Issues of Timing

Finally, issues of timing occur as a result of data availability. The use of testing statistics for all of 2012 is potentially problematic, but unfortunately WADA does not publish testing statistics by month, nor were they published prior to 2012. The difficulty with using testing statistics from 2012 to analyze Olympic performance in 2012 is that while the Summer Olympics took place in August, the testing year ran until December. Ideally, we would have data on the number of samples per athlete in the year prior to the Olympics. Instead, some of the samples recorded in our testing statistics were carried out after the Olympics. This would be of particular concern if there was feedback from Olympic performance to testing. It seems unlikely that such feedback would occur. First, it is unlikely that sports' budgets would be reallocated so quickly after the Olympics. Second, in most countries testing is carried out by a national testing organization, not a national sport-specific organization. It seems most likely that the budget for such an organization would be independent of Olympic performance.

Even without feedback, the use of data from all of 2012 rather than the year prior to the Summer Olympics introduces measurement error into the testing statistics. If we use the classical errors in variables model, coefficients will be biased towards zero. However, the variance of the measurement error is likely to be small and as a robustness check, results were also derived using data from the 2013 World Championships in Athletics.
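For reference, the standard textbook result for classical errors in variables in a single-regressor model is reproduced below. The symbols x*, u, and the variance terms are introduced here for illustration and are not the paper's notation, and the exact expression differs once fixed effects and further regressors are included.

```latex
% Hypothetical notation: x^* is the true pre-Olympic samples per athlete,
% x is the observed full-year figure, and u is classical measurement error.
\begin{align*}
x_{is} &= x^{*}_{is} + u_{is},
  \qquad \operatorname{Cov}(x^{*}_{is}, u_{is}) = 0,
  \qquad \operatorname{Cov}(u_{is}, \varepsilon_{is}) = 0, \\
\operatorname{plim}\, \hat{\beta}_1
  &= \beta_1 \cdot \frac{\sigma^{2}_{x^{*}}}{\sigma^{2}_{x^{*}} + \sigma^{2}_{u}} .
\end{align*}
```

The ratio lies between zero and one, so the estimate keeps the sign of the true coefficient but shrinks towards zero, and the shrinkage is small when the variance of the measurement error is small relative to the variance of true testing.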

It is also possible to use the regression results to determine the extent to which problems have arisen due to normalizing by Olympic team size and funding being an omitted variable. Unfortunately, though, without a more detailed breakdown of the testing statistics it is not possible to assess the extent to which the other potential problems have occurred.

Regression Specification

The dependent variable in the regression is a proportion. Therefore we would ideally carry out a non-linear regression such as GLM with family set to binomial and link to logit. However, the problem with such non-linear models is that the coefficients are generally biased when fixed effects are included in the regression (Greene, 2004). There is considerable controversy surrounding whether or not it is appropriate to use the linear probability model when the dependent variable is limited. In our context, the advantage of using OLS is that including fixed effects then does not bias the coefficients. However, OLS can produce predictions that are less than zero or greater than one. Given there is no clear consensus on the best approach in this situation, regressions were carried out using both OLS and GLM.

The regression specification is:

\[
{}^{12}M_{is} = \alpha + \beta_{1s}\,\mathit{SPA}_{is} + \beta_{2s}\,{}^{8}M_{is} + \gamma_i + \varepsilon_{is}
\]

where 12[M.sub.is] is the proportion of medals won by country i in sport s in 2012 and 8[M.sub.is] is defined analogously for 2008; [SPA.sub.is] is the number of samples per athlete carried out by country i in sport s in 2012, while [[gamma].sub.i] are the country fixed effects.

Results

Results are presented in Table 2. Track & field, judo, rowing, shooting, and wrestling all have a coefficient on samples per athlete that is negative and significant at the 5% level in at least three of the regression specifications. Boxing and weightlifting have significant coefficients both when OLS and GLM are used, but only when all observations are included. Aquatics and table tennis both have a significant coefficient in only one of the regressions. However, in other sports we cannot reject the null hypothesis that there is no relationship between testing and the proportion of medals won ([[beta].sub.1s]=0). These results, therefore, suggest that while in some sports there exists a negative relationship between testing and doping, in other sports no such relationship exists.

Results are presented in Table 2 from regressions using both OLS and GLM techniques. While the magnitudes of the OLS and GLM coefficients cannot be compared, their statistical significance can be. This suggests the OLS and GLM techniques produce similar results since for all except four sports, both techniques show either no significance at the 5% level or significance at least at the 5% level. In addition, results are presented both when all observations are included in the regression and when observations are restricted to those in which samples per athlete is less than once per week. Restricting observations means that 39 observations are dropped.

For the vast majority of sports and specifications, the coefficient on medal share in 2008 is positive and highly significant. This confirms the finding of other studies, such as Bernard and Busse (2004), which have shown that Olympic performance is highly path dependent.

Robustness

Using these results, it is possible to assess, at least to some extent, the likelihood that the problems discussed in this paper have arisen with this data set. Since many countries have only one national testing agency, as opposed to separate national testing agencies for each sport, we would expect any biases that arise as a result of the available data to affect the coefficients on all sports in a similar manner. The first issue that was discussed was that since the testing statistics are only available at the country-sport level, data on samples per athlete was generated by dividing these statistics by Olympic team size. In a minority of observations, though, this approach results in the number of samples per athlete being implausibly high. Consequently, regressions were carried out both on the full set of observations and restricting observations to those in which samples per athlete is less than once per week. Table 2 suggests that the results are to some extent dependent on observations with high values of samples per athlete. Comparing the regressions in which observations are restricted with those in which they are not, in 16 sports restricting observations does not alter whether or not the coefficient on samples per athlete is significant at the 5% level for both OLS and GLM. Five sports have a coefficient that is significant at the 5% level when all observations are included but that loses significance under the restriction, either for OLS alone or for both OLS and GLM. Aquatics is the only sport whose coefficient is significant when observations are restricted but not otherwise.

This suggests that for some sports, the analysis is sensitive to the inclusion of high values of samples per athlete, especially when OLS is used. Considering the sports that seem to be sensitive to this suggests that issues of aggregation are likely to be worst in sports in which many more athletes compete than there are available places at the Olympics. In addition, sports such as boxing have a strong divide between amateur and professional athletes, with only amateur athletes being eligible to compete at the 2012 Summer Olympics.

Another issue that was discussed was that funding could simultaneously affect testing and Olympic performance. This would result in the coefficients on samples per athlete being biased in a positive direction. It is likely that in some sports, especially skill-based sports, drug taking offers little benefit and therefore does not occur. Such skill-based sports may also be precisely those sports in which increased spending on coaching and facilities is most beneficial. Consequently, if the analysis were picking up residual effects of funding, we would expect at least some sports, especially sports such as sailing, to have positive and significant coefficients. It is therefore reassuring that none of the regressions has a positive and significant coefficient on samples per athlete for any sport.

A further potential issue is that while the Summer Olympics were in August, the testing year ran until December. In order to check that the results are not being biased as a result of feedback from Olympic performance to testing, the regressions were repeated, replacing the 2012 medal proportions for track & field with data from the 2013 World Championships in Athletics. The coefficients on samples per athlete for track & field were closely comparable in both magnitude and significance, suggesting that feedback is not a cause for concern.

As an additional robustness check, a panel data set was compiled for track & field. This consisted of the proportion of medals won at the 2012 Olympics and the 2013 World Championships in Athletics as well as samples per athlete by country for both years. Samples per athlete for 2013 were computed using the WADA 2013 testing statistics for total number of tests carried out by each country in track & field, and then dividing by the size of the team each country sent to the 2013 World Championships. Coefficients were computed using the regression specification: (8)

\[
M_{it} = \alpha + \beta_1\,\mathit{SPA}_{it} + \gamma_i + \varepsilon_{it}
\]

where t refers to the time period.
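A linear version of this panel specification can be estimated with the within (fixed effects) transformation, sketched below in plain Python. This illustrates only the OLS case with country fixed effects; it does not reproduce the GLM estimates, and the function name `within_slope` and the data are hypothetical.

```python
from collections import defaultdict


def within_slope(rows):
    """Fixed-effects (within) estimate of the coefficient on samples per athlete.

    rows: iterable of (country, samples_per_athlete, medal_share) tuples,
        with one row per country and time period.
    Demeaning both variables by country removes the country fixed effects.
    """
    groups = defaultdict(list)
    for country, spa, share in rows:
        groups[country].append((spa, share))

    sxy = 0.0
    sxx = 0.0
    for obs in groups.values():
        mean_spa = sum(spa for spa, _ in obs) / len(obs)
        mean_share = sum(share for _, share in obs) / len(obs)
        for spa, share in obs:
            sxy += (spa - mean_spa) * (share - mean_share)
            sxx += (spa - mean_spa) ** 2
    if sxx == 0:
        raise ValueError("no within-country variation in samples per athlete")
    return sxy / sxx


# Hypothetical two-period panel: share = country effect - 0.01 * SPA.
panel = [
    ("A", 2.0, 0.10 - 0.01 * 2.0),
    ("A", 4.0, 0.10 - 0.01 * 4.0),
    ("B", 1.0, 0.06 - 0.01 * 1.0),
    ("B", 5.0, 0.06 - 0.01 * 5.0),
]
slope = within_slope(panel)
print(round(slope, 6))  # -0.01
```

The estimate is identified only from changes within a country between the two periods, which is why countries whose samples per athlete did not change contribute nothing to it.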

The same four techniques previously described were used to compute the coefficient on samples per athlete. The results are given in Table 3. In the main data set the coefficient on samples per athlete for track & field was negative and significant in the majority of the specifications. This is also the case in the panel data set, suggesting that the results from the main data set are indeed robust. We would not necessarily expect the magnitude of the coefficients to be comparable across these two data sets since for the panel data set fixed effects capture country fixed effects specifically for track & field, while when the main data set was used, the fixed effects were not sport specific.

Discussion

Overall, the results suggest that in some sports the anti-doping picture is potentially less bleak than it is often portrayed as being. This is particularly true of sports such as track & field, judo, rowing, shooting, and wrestling. In addition, boxing and weightlifting perform well in some of the specifications. These are all sports in which we would expect doping to potentially improve performance. In endurance track & field events and rowing, methods of doping that improve an athlete's red blood cell count are known to be effective (Thompson, 2012). Meanwhile, in disciplines requiring strength or speed, steroids have been shown to enhance performance. At first, the presence of shooting in the list of sports in which testing appears to deter doping may be puzzling. However, shooters can benefit from using beta-blockers to reduce tremors, allowing them to have a steadier aim (Oransky, 2008).

But in other sports in which doping is believed to be prevalent it does not appear to be the case that increasing testing significantly reduces doping. It is notable that cycling does not have a negative and significant coefficient on samples per athlete in any of the regression specifications, despite the fact that blood doping is believed to be widespread in the sport.

These results imply that in sports such as track & field, carrying out more tests may deter athletes from doping. It is important to note that the results do not imply that doping is not an issue in the sports with a negative and significant coefficient. Nevertheless, they do suggest that a relatively simple, though expensive, solution to the doping problem is to test athletes more frequently. In other sports, such as cycling, it seems that the problem of doping may have deeper roots. To better understand this, we need to consider why an athlete may not change his or her decision about whether to take drugs when tested more frequently. The key reason would be that the athlete does not believe he or she will test positive regardless of how often he or she is tested. In turn, there are two main explanations for why an athlete may hold this belief. The first is that the tests used by anti-doping agencies may not be capable of detecting the substances being used by athletes. For example, a test for THG, a designer steroid, was only developed when the U.S. anti-doping agency was anonymously sent a syringe containing the substance. A second possibility is that anti-doping agencies are colluding with athletes. This would be the case if anti-doping agencies were either warning athletes about upcoming tests or falsifying test results. There is evidence that the Russian Anti-Doping Agency has engaged in both of these practices (Pound et al., 2015).

This paper compares the effectiveness of testing across sports rather than across countries. The results, therefore, cannot be used to compare the testing regimes of different countries nor to comment directly on the Russian doping scandal. However, the results for certain sports are consistent with the possibility that at least some countries have corrupt testing regimes. In the future we hope to adapt our method to analyze the efficiency of testing across countries rather than across sports.

It is important to treat the results presented in this paper with caution. Several assumptions were required to conduct the analysis, mainly as a result of limited data availability. While it appears that many of the assumptions are indeed satisfied, it is not possible to check the validity of all of the assumptions.

Conclusion

The results presented in this paper suggest that while for some sports increased testing deters athletes from doping, in other sports there is no evidence that this is the case. Sports such as track & field and wrestling appear to have testing regimes that are potentially capable of deterring doping, while cycling does not. This suggests that in sports such as track & field, carrying out more tests would be an effective method for solving the problem of doping. In sports such as cycling, though, more analysis is required to determine why athletes may not believe they are at risk of testing positive despite being tested.

As WADA publishes testing statistics for more years, a panel data approach to analyzing the relationship between testing and doping will become possible. In particular, after the 2016 testing statistics are published, it will be possible to compile a panel data set consisting of two Summer Olympic Games. In the meantime, WADA continues to make new information available: for the first time, the 2013 testing statistics contain a breakdown of testing in specific disciplines, such as marathon running, within a sport. The availability of more detailed data, and a greater quantity of it, should enable more in-depth analysis to be carried out in the future.

References

Becker, G. (1968). Crime and punishment: An economic approach. Journal of Political Economy, 76, 169-217.

Berentsen, A. (2002). The economics of doping. European Journal of Political Economy, 18, 109-127.

Bernard, A. B., & Busse, M. R. (2004). Who wins the Olympic Games: Economic resources and medal totals. The Review of Economics and Statistics, 86, 413-417.

Buechel, B., Emrich, E., & Pohlkamp, S. (2014). Nobody's innocent: The role of customers in the doping dilemma. Journal of Sports Economics.

Buick, F. J., Gledhill, N., Froese, A. B., Spriet, L., & Meyers, E. C. (1980). Effect of induced erythrocythemia on aerobic work capacity. Journal of Applied Physiology, 48, 636-642.

Greene, W. H. (2004). The behavior of the fixed effects estimator in non-linear models. Econometrics Journal, 7, 98-119.

Hermann, A., & Henneberg, M. (2014). Anti-doping systems in sports are doomed to fail: A probability and cost analysis. Journal of Sports Medicine and Doping Studies, 4, 148-159.

Johnson, D., & Ali, A. (2004). A tale of two seasons: Participation and medal counts at the Summer and Winter Olympic Games. Social Science Quarterly, 85, 974-993.

Krumer, A., Shavit, T., & Rosenboim, M. (2011). Why do professional athletes have different time preferences than non-athletes? Judgment and Decision Making, 6, 542-551.

Leeds, E., & Leeds, M. (2012). Gold, silver, and bronze: Determining national success in men's and women's summer Olympic events. Journal of Economics and Statistics, 232, 279-292.

Maennig, W. (2002). On the economics of doping and corruption in international sports. Journal of Sports Economics, 3, 61-89.

Meinhardt, U., Nelson, A. E., Hansen, J. L., Birzniece, V., Clifford, D., Leung, K.-C., Graham, K., & Ho, K. K. Y. (2010). The effects of growth hormone on body composition and physical performance in recreational athletes: A randomized trial. Annals of Internal Medicine, 152, 568-577.

Mitchell, H., & Stewart, M. F. (2004). Does drug-testing deter participation in athletic events? EPO and the Sydney Olympics. Chance, 17(2), 13-18.

Oransky, I. (2008, August 15). Why would an Olympic shooter take propranolol? Scientific American. Retrieved from http://www.scientificamerican.com/article/olympics-shooter-doping-propranolol/

Pierdzioch, C., & Emrich, E. (2013). A note on corruption and national Olympic success. Atlantic Economic Journal, 41, 405-411.

Pound, R. W., McLaren, R. H., & Younger, G. (2015). The Independent Commission Report #1, Technical report.

Thompson, H. (2012). Performance enhancement: Superhuman athletes. Nature, 487, 287-289.

WADA (2015). World Anti-Doping Code 2015, Technical report.

Williams, M. H., Wesseldine, S., Somma, T., & Schuster, R. (1980). The effect of induced erythrocythemia upon 5-mile treadmill run time. Medicine and Science in Sports and Exercise, 13, 169-175.

Endnotes

(1) Examples include Berentsen (2002), Buechel et al. (2014), and Maennig (2002).

(2) Doping can only be deterred if $\beta(b_j) \le 1$. This will always be the case if $b \le (1 - \theta n)/(1 - \rho n)$.

(3) http://www.olympic.org/

(4) https://www.wada-ama.org/

(5) For example, the participating countries in the South Asia RADO are Bangladesh, Bhutan, Maldives, Nepal, and Sri Lanka.

(6) http://www.theguardian.com/sport/datablog/2012/jul/30/olympics-2012-alternative-medal-table#data

(7) Once per week is to some extent an arbitrary cutoff. However, given the cost of conducting tests, we believe it is highly unlikely that on average athletes would be tested more than once per week.

(8) Data is available for performance at the 2011 World Championships in Athletics but testing statistics are not available for this year. Since we have three periods of data for the dependent variable, it is also possible to compute coefficients for a regression specification that includes both fixed effects and lagged performance using the Arellano-Bond estimator. When this is done, the coefficient on samples per athlete is negative but not significant. However, the coefficient on lagged performance is also not significant while in the main regression it was highly significant. This suggests it is not appropriate to include both fixed effects and lagged performance in the regression. In the main data set that consists of only Olympic data, we can only compute general fixed effects rather than sport-specific fixed effects. Consequently, it is necessary to also include previous Olympic performance in order to account for the ability of a given country in the specific sport in question. In contrast, in the track & field panel data set, the fixed effects computed are country fixed effects specifically for track & field. Therefore, for the track & field data, fixed effects already capture a country's ability specifically in track & field.

Authors' Note

The authors would like to thank Ian Crawford and Tom Norman for their valuable comments on this paper. We are grateful to two anonymous referees for their helpful feedback. We would also like to acknowledge the Economic and Social Research Council (ESRC) for their financial support.

Claire Baudouin (1) and Stefan Szymanski (2)

(1) University of Oxford

(2) University of Michigan

Claire Baudouin recently completed her PhD at the Department of Economics. Her research focuses on the economics of doping.

Stefan Szymanski is the Stephen J. Galetti professor of sport management in the School of Kinesiology. His research interests focus on the economics and history of sport.

Table 1. Summary Statistics

| Variable | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|
| 2012 medal proportion | 0.0155409 | 0.0450829 | 0 | 0.5333334 |
| Samples per athlete | 7.549979 | 17.61526 | 0 | 414 |
| 2008 medal proportion | 0.0150872 | 0.0472328 | 0 | 0.6666667 |

Table 2. Coefficients on Samples per Athlete

| | I | II | III | IV |
|---|---|---|---|---|
| Method | OLS | GLM | OLS | GLM |
| Sample | All | All | SPA < 53 | SPA < 53 |
| Aquatics | -0.000162 (*) | -0.0500 (*) | -0.000533 (**) | -0.0445 |
| | (8.99e-05) | (0.0296) | (0.000263) | (0.0308) |
| Archery | -0.000898 | -0.0806 | -0.000910 | -0.0741 |
| | (0.000553) | (0.0567) | (0.000594) | (0.0572) |
| Badminton | -0.000721 (*) | -0.0625 | -0.000691 (*) | -0.0557 |
| | (0.000384) | (0.0387) | (0.000373) | (0.0371) |
| Boxing | -0.000142 (**) | -0.0179 (**) | -0.000125 | -0.0200 |
| | (7.12e-05) | (0.00879) | (0.000243) | (0.0141) |
| Canoe | -0.000204 | -0.0361 (*) | -0.000223 | -0.0341 |
| | (0.000287) | (0.0202) | (0.000296) | (0.0208) |
| Cycling | -7.79e-05 | -0.00575 | -8.32e-05 | -0.000127 |
| | (7.35e-05) | (0.00608) | (0.000216) | (0.00878) |
| Equestrian | 6.97e-05 | -0.0266 | 8.92e-05 | -0.0216 |
| | (0.000904) | (0.0690) | (0.000924) | (0.0712) |
| Fencing | 5.28e-05 | -0.0111 | 3.89e-05 | -0.00828 |
| | (0.000416) | (0.0310) | (0.000401) | (0.0297) |
| Gymnastics | -0.000168 | -0.0309 | -0.000170 | -0.0256 |
| | (0.000186) | (0.0297) | (0.000164) | (0.0257) |
| Judo | -0.000226 (**) | -0.0383 (**) | -0.000157 | -0.0593 (***) |
| | (9.91e-05) | (0.0169) | (0.000208) | (0.0225) |
| Other | 0.000173 | -0.00497 | 0.000783 | 0.00894 |
| | (0.000516) | (0.00715) | (0.00103) | (0.0228) |
| Rowing | -0.000358 (***) | -0.0596 (***) | -0.000846 (***) | -0.0551 (**) |
| | (0.000108) | (0.0228) | (0.000217) | (0.0234) |
| Sailing | -0.00147 | -0.152 | -0.00156 | -0.136 |
| | (0.00108) | (0.0990) | (0.00115) | (0.103) |
| Shooting | -0.000427 (**) | -0.102 (***) | -0.000442 (**) | -0.0972 (**) |
| | (0.000178) | (0.0393) | (0.000198) | (0.0396) |
| Table Tennis | -0.000434 (**) | -0.0516 | -0.000441 (*) | -0.0467 |
| | (0.000211) | (0.0382) | (0.000242) | (0.0392) |
| Taekwondo | 0.000378 | 0.00522 | 2.46e-05 | -0.00381 |
| | (0.000231) | (0.00506) | (0.000311) | (0.0176) |
| Tennis | -0.000232 | -0.0143 | -0.000272 | -0.0108 |
| | (0.000475) | (0.0291) | (0.000488) | (0.0289) |
| Track & Field | -0.000204 (**) | -0.0574 (**) | -0.000312 (*) | -0.0536 (**) |
| | (9.40e-05) | (0.0263) | (0.000187) | (0.0271) |
| Triathlon | 0.000242 | 0.00169 | 0.000570 | 0.0227 |
| | (0.000433) | (0.0119) | (0.000647) | (0.0247) |
| Volleyball | -0.000243 | -0.0452 | -0.000341 | -0.0350 |
| | (0.000357) | (0.0481) | (0.000766) | (0.0531) |
| Weightlifting | -7.80e-05 (**) | -0.0328 (***) | -0.000314 | -0.0367 (*) |
| | (3.14e-05) | (0.0107) | (0.000196) | (0.0193) |
| Wrestling | -0.000379 (***) | -0.0462 (***) | -0.000441 (***) | -0.0577 (**) |
| | (0.000146) | (0.0176) | (0.000165) | (0.0249) |
| Observations | 1,673 | 1,673 | 1,634 | 1,634 |

Notes: Clustered standard errors in parentheses. (***) p < 0.01, (**) p < 0.05, (*) p < 0.1. The regression also included the proportion of medals won in 2008 and country fixed effects.

Table 3. Coefficients on Samples per Athlete from Track & Field Panel Data Set

| | I | II | III | IV |
|---|---|---|---|---|
| Method | OLS | GLM | OLS | GLM |
| Sample | All | All | SPA < 53 | SPA < 53 |
| SPA | -0.000124 (*) | -0.0400 (**) | -0.000144 (**) | -0.0400 (**) |
| | (6.28e-05) | (0.0186) | (6.88e-05) | (0.0186) |
| Observations | 406 | 406 | 402 | 402 |

Notes: Clustered standard errors in parentheses. (***) p < 0.01, (**) p < 0.05, (*) p < 0.1. The regression also included country fixed effects.