
Article Information

  • Title: On the external validity of laboratory tax compliance experiments.
  • Authors: Alm, James; Bloomquist, Kim M.; McKee, Michael
  • Journal: Economic Inquiry
  • Print ISSN: 0095-2583
  • Year: 2015
  • Issue: April
  • Language: English
  • Publisher: Western Economic Association International
  • Keywords: Consumer behavior; Taxpayer compliance

On the external validity of laboratory tax compliance experiments.


Alm, James; Bloomquist, Kim M.; McKee, Michael


I. INTRODUCTION

Laboratory methods are now widely accepted as a methodological approach in economics and, increasingly, they have been used to examine specific public policy issues. There is much to be gained from careful laboratory experiments. They offer a low cost means of testing (and replicating) policy innovations, and they generate precise data on individual behavior, thereby allowing estimation of behavioral responses. Importantly, they allow many policy innovations to be introduced singly and exogenously in a controlled environment, and as a result, laboratory experiments are typically seen as having a high degree of "internal validity" (Brewer 2000; Campbell and Stanley 1966; Shadish, Cook, and Campbell 2002) because the causal relation between variables can be properly demonstrated. However, as emphasized by Plott (1987), using laboratory experiments can allow more general inferences regarding human behavior only when the setting implemented in the laboratory parallels what is observed in the naturally occurring world. Even beyond this requirement of "parallelism" is the need for generalizing behavioral observations from the laboratory to the field, or "external validity." Internal validity can be demonstrated through the evaluation of the design. However, external validity can only be verified empirically and only with respect to the specific setting being investigated. This paper investigates the external validity of laboratory experiments, and does so in the context of experiments on tax compliance behavior.

Tax evasion is central to many important policy questions. Current estimates report the "tax gap" (or difference between taxes owed and taxes paid) in the United States to be $450 billion annually (Internal Revenue Service 2012). Beyond these massive revenue losses, evasion creates major misallocations in resource use when individuals alter their behavior to cheat on their taxes. Its presence requires that government expend resources to detect noncompliance, to measure its magnitude, and to penalize tax evasion. Evasion alters the distribution of income in arbitrary, unpredictable, and unfair ways, and it may contribute to feelings of unjust treatment and disrespect for the law. More broadly, it is not possible to understand the true impact of taxation without recognizing the existence and the effects of tax evasion.

Laboratory methods have been used to examine a wide range of policies that may affect the compliance decision, policies that have not always proven amenable to either theoretical analyses or empirical analyses with field data. However, laboratory studies of compliance are sometimes viewed with skepticism. The most common criticism is that the student subjects typically used in experiments may not be representative of taxpayers. Undergraduates may have little experience with filing tax returns, and their economic and demographic backgrounds may differ from those of taxpayers. Another criticism is that the context of laboratory compliance experiments does not closely enough resemble the context in which actual compliance decisions are made. As a result, there is a concern that experimental results on policy innovations that rely upon student subjects in laboratory compliance experiments cannot be generalized to the population. It is this issue that we examine here. Building on previous research, we present several types of evidence on the external validity of experiments on individual compliance decisions. A first question examines whether behavior of laboratory participants is replicated by behavior of individuals making a similar decision in the naturally occurring world; that is, do participants in laboratory experiments exhibit different patterns of behavior than individuals in a similar naturally occurring setting? To answer this question, we utilize a special data set from the U.S. Internal Revenue Service (IRS) assembled as part of its National Research Program (NRP). These data allow us to compare actual taxpayer behavior with data generated by laboratory subjects, where everyone is engaged in a similar tax reporting decision. A second question examines a different aspect of external validity: that is, do students behave differently than nonstudents in identical laboratory experiments? We are able to answer this question with further analysis of previously reported data from laboratory experiments that compare the decisions of a population of adults with those of undergraduate students, both of whom participate in the identical laboratory experiment.

Together, we are therefore able to examine whether the "moments of the data" (e.g., the mean reporting compliance rate and its distribution) are similar when estimated in naturally occurring versus laboratory settings, and also whether the "treatment effects" of policy innovations are similar when estimated with different subjects (e.g., students and nonstudents).

Our analysis indicates that there is an overall similarity between the behavior of individual taxpayers in the field and of student subjects making comparable decisions in the laboratory, so that data from the laboratory closely align with data from the field. Our analysis also indicates that student and nonstudent subjects exhibit broadly similar behavior in the laboratory, even though there are some small differences in their responses to individual policy treatments. These results confirm that compliance behavior in the laboratory generalizes beyond the laboratory.

II. THE PROMISE AND THE PITFALLS OF LABORATORY EXPERIMENTS

As a science, economics is based on the development of theory and on the ability of that theory to explain observed behavior. However, unlike some other sciences, economics faces difficulties in empirically testing the predictive power of its theories using field data from the naturally occurring world. Even where field data are readily available, it is almost impossible to ensure the independence required to conduct econometric research using field data (Manski 2000). (1) Controlled field experiments can achieve this independence, and they often use participants who are representative of the larger population of interest; however, field experiments require simplified procedures, they are costly to implement, and they may raise ethical issues. (2) Overall, despite many significant methodological advances in recent years, there are few instances in which identification using field data, whether naturally occurring or from controlled field experiments, is uncontroversial and easily achieved. (3)

The use of laboratory experiments is a different response to these difficulties. Experimental methods involve the creation of a real microeconomic system in the laboratory, one that parallels the naturally occurring world that is the subject of investigation and one in which subjects (usually students) make decisions that yield individual financial payoffs whose magnitude depends on their decisions. (4) The essence of this system is control over the environment, the institutions, the incentives, and the preferences that subjects face. Control over preferences is particularly crucial, and is achieved via the method of "induced values." As described by Smith (1976), "[s]uch control can be achieved by using a reward structure to induce prescribed monetary value on actions."

Tax compliance seems especially amenable to laboratory investigation. Theoretical models yield ambiguous results when asked to incorporate many of the factors deemed relevant to the individual compliance decision, and many empirical studies of tax compliance using field data are plagued by the absence of reliable information on individual compliance decisions. It is difficult to measure--and to measure accurately--something that by its very nature people want to conceal. Even when data are available and not subject to confidentiality restrictions, it is also difficult to control in econometric work for the resulting errors in variables and the many unobservable factors that affect the compliance decision. Even aside from cost concerns, controlled field experiments face many of these same problems stemming largely from a loss of control over the decision setting. Laboratory methods allow many factors suggested by theory to be introduced orthogonally. Experiments also generate precise data on individual compliance decisions, which allow econometric estimation of individual responses in ways that are simply not possible with field data. Indeed, laboratory methods have been used to examine a wide range of factors in the compliance decision, factors that have not proven amenable to either theoretical or empirical analyses with field data (Alm and Jacobson 2007).

Of course, there are some obvious limitations of laboratory experiments, especially if the intention is to use the results for informing public policy. Perhaps the most compelling critique comes from Levitt and List (2007), who caution researchers about making the "parallelism" assumption necessary in using laboratory experiments to make general statements about behavior outside the laboratory. As we have argued earlier, parallelism is an internal validity issue addressed by the design. However, the deeper essence of criticisms such as Levitt and List (2007) is the external validity of the results. This issue can only be addressed empirically. If laboratory results comport with field observations where such results are available and comparable, then one has greater confidence in applying the laboratory results in cases where field data are not available. (5)

Of perhaps most relevance to the external validity of compliance experiments are subject pool effects. It is typically the case that laboratory subjects for various tax compliance experiments are drawn from student populations. Levitt and List (2007) suggest that student responses are unlikely to be the same as nonstudent responses in large part because students are younger, better educated, less representative, and less experienced in the decisions being examined than nonstudents. If valid, these concerns are especially germane for tax compliance experiments where a common comment on experimental analysis of tax compliance is that "undergraduate volunteers differ from the taxpayer population in very important ways," and so cannot "tell us something" about typical taxpayer behavior (Gravelle 2009).

Subject pool effects can be examined by comparing the responses of student subjects with nonstudent subjects in (more-or-less) identical laboratory experiments. There are relatively few such studies, but the available evidence is that the experimental responses of students are often largely the same as the responses of other subject pools in similar laboratory experiments (Ball and Cech 1996; Charness and Villeval 2009; Güth and Kirchkamp 2012; Güth, Schmidt, and Sutter 2007; Plott 1987). (6) Plott (1987) reports comparisons of behavior of student subjects with those of corporate executives in the same policy decision setting, and he observes similar decisions among the student subjects and the executives. Dyer, Kagel, and Levin (1989) study bidding behavior in auctions using experienced traders and students as subjects, and find similar results; Shogren et al. (1999) also find comparable responses between student and nonstudent subjects in a study of food safety choices in retail, survey, and experimental settings.

Also of importance for the external validity of compliance experiments are context effects. "Context" relates to the complex combination of individuals' perceptions and past experiences that influence how individuals respond in a laboratory setting designed to mimic the naturally occurring setting; that is, does the context in the laboratory decision resemble the context in the field for the same decision? The contextual setting effect can be examined by comparing student and nonstudent responses in laboratory experiments to the responses of participants in similarly constructed controlled field experiments, in which the same basic choice is examined in both settings. Brookshire, Coursey, and Schulze (1987) compare prices obtained from buyers of strawberries in a laboratory setting versus those in a field setting. The field setting in their study mimicked the laboratory market institution, but they implemented it with nonstudents making purchase decisions in their homes rather than in the laboratory. They find equivalent bidding behavior in both settings. More recently, there are investigations of behavior of fishermen (Carpenter and Seki 2011) and of water markets (Chermak et al. 2013). A range of other studies is summarized by Camerer (2011), in which student responses in laboratory experiments are compared to responses of participants in controlled field experiments in such areas as sports card trading, open-air flea markets, donations to student funds, soccer, communal fishing ponds, proofreading/exam grading, and restaurant spending. In most--although not in all--cases, these comparisons have shown no significant differences in behavior.

This previous literature has considered subject pool and context effects, but rarely have both been examined in the pursuit of a common decision setting and, to our knowledge, there are no studies that have looked at tax compliance. (7) Our focus is on tax reporting behavior, and we ask whether behavior observed in the laboratory is likely to be similar to the behavior observed in the naturally occurring environment. Specifically, do participants in laboratory experiments behave differently than individuals in a similar but naturally occurring setting? Further, do student subjects in tax compliance laboratory experiments behave differently than nonstudent subjects in identical laboratory experiments? The next sections present our results.

III. TEST (1): EXPERIMENTAL RESULTS VERSUS NONEXPERIMENTAL RESULTS

A first type of evidence compares experimental behavior with behavior in similar but naturally occurring settings (i.e., the field), in order to determine whether patterns of behavior in the field match patterns of behavior in the laboratory. For this evaluation of context effects, we compare the behavior of student subjects in experiments with that of individuals making similar decisions in the field who are subjected to random taxpayer audits conducted under the NRP of the IRS. We discuss the data, and we then present comparisons of tax reporting compliance by students in laboratory experiments versus actual taxpayers.

A. Data: Taxpayer Sample versus Experimental Sample

The comparisons here involved two separate data sets: taxpayer (field) data and experimental (laboratory) data. The "Taxpayer Sample" is a subsample of NRP data for tax year (TY) 2001 (Bennett 2005). In that year the NRP audited tax returns of 44,768 taxpayers selected using stratified random sampling, which can be weighted to represent the population of 125.8 million taxpayers who filed timely tax returns. Our subsample consisted of taxpayers whose sole source of income (pre- and post-audit) is from a Schedule C sole proprietorship. Filers with Schedule C income were selected because this source of income has no third-party information reporting (e.g., Form W-2 for wage income), and this mimics our laboratory setting in which there is no matched information on earnings. This reduced the NRP data to 1,673 NRP audit cases weighted to represent 1.1 million taxpayers. Our subsample was further narrowed to make the tax reporting task as similar as possible to the situation faced by laboratory subjects. Only those Schedule C filers having positive taxable income as determined by the examiner were selected. Again, excluding taxpayers with zero taxable income was done to ensure that taxpayers selected for comparison share circumstances similar to those faced by experimental subjects who decide to report none, some, or all of a positive amount of income. The resulting sample of taxpayers contained 1,101 NRP audit cases weighted to represent the tax returns of 559,555 individuals. Finally, within this data set, there were 29 cases where reported taxable income exceeded the amount of taxable income following examination. These cases (representing 13,131 taxpayers) were assumed to have 100% reporting compliance. (8)
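The sample-construction steps above can be sketched as a simple filter. The record field names below (`schedule_c_only`, `taxable_income_per_exam`, `reported_taxable_income`, `weight`) are hypothetical, not the actual NRP file layout:

```python
# Sketch of the Taxpayer Sample construction (hypothetical field names).
def build_taxpayer_sample(nrp_records):
    sample = []
    for r in nrp_records:
        # Keep only filers whose sole income source is a Schedule C business
        # (no third-party information reporting, as in the laboratory).
        if not r["schedule_c_only"]:
            continue
        # Keep only filers with positive examiner-determined taxable income.
        if r["taxable_income_per_exam"] <= 0:
            continue
        # Reporting compliance rate = reported taxable income divided by
        # taxable income per exam, capped at 1.0 for the cases where the
        # reported amount exceeded the examiner's figure.
        rate = min(r["reported_taxable_income"] / r["taxable_income_per_exam"], 1.0)
        sample.append({"weight": r["weight"], "compliance_rate": rate})
    return sample
```

The cap at 1.0 corresponds to the 29 audit cases treated as 100% compliant because reported taxable income exceeded the examiner-determined amount.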

Table 1 displays summary statistics for the Taxpayer Sample. The figures in the two rightmost columns refer to the mean of the individual reporting compliance rates and the overall mean reporting compliance rate, defined as the amount of taxable income reported divided by the amount of taxable income per exam. (9) The range of taxable income per exam for this sample spans five orders of magnitude, from less than $40 to more than $4 million. In calendar year 2002, the probability of audit was .57% for individual taxpayers as a whole and 1.72% for all Schedule C filers (Internal Revenue Service 2002).

The experimental data ("Experimental Sample") were collected from college-age subjects using a basic experimental design similar to the design discussed in more detail later (Alm, Deskins, and McKee 2009; Alm, Jackson, and McKee 2009; Alm and McKee 2004; Alm et al. 2010, 2012; McKee et al. 2008). (10) Participants earned income, chose whether to file a tax return, and (conditional upon filing) self-reported tax liability to the tax authority at an announced tax rate. Audits occurred with an announced probability, and any underreporting was discovered by the audit. If the participant had not paid the appropriate tax, then both unpaid taxes and penalties were collected. This process was repeated over multiple rounds, and subjects were paid their after-tax earnings at the end of the experiment.

The "Full Sample" of these experimental data consisted of 16,560 observations from 1,072 individual subjects, and contained observations for base case (or no treatment) scenarios and several treatment scenarios, including the existence of a public good, unofficial communication among participants, and official communication from the tax authority. In our comparisons, we used a "Selected Sample," or data from only the base case scenarios. (11) In these base case sessions, participants were informed of the number of audits performed (including zero if no audits were performed) following each round. This is similar to the IRS policy that makes publicly available the number of audits performed each year. The Selected Sample subset had 3,780 observations from 252 individuals. Descriptive statistics for both samples are shown in Table 2, for the five different audit rates in the experiments.

B. Mean Reporting Compliance Rates

A comparison of Tables 1 and 2 showed that mean reporting compliance rates (computed as the average of individual compliance rates) for the lowest two audit rate categories in the Selected Sample of the Experimental Sample were comparable to the unweighted mean compliance rate for individuals in the Taxpayer Sample. (12) The mean reporting compliance rate in the Experimental Sample is .286 when the audit rate is zero and .368 when the audit rate is .05. Assuming the TY 2001 audit rate of 1.72% for Schedule C filers, we interpolated a reporting compliance rate of .314 for the Experimental Sample for this audit rate. This rate was essentially identical to the unweighted mean reporting compliance rate for individuals of .313 for the Taxpayer Sample. (13)
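The interpolated figure can be checked with straight-line interpolation between the two lowest audit-rate cells (a back-of-the-envelope sketch; the paper does not specify its interpolation method):

```python
def interpolate(x, x0, y0, x1, y1):
    """Straight-line interpolation of y at x between (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Mean reporting compliance: .286 at a 0% audit rate, .368 at a 5% audit rate.
# Evaluate at the TY 2001 Schedule C audit rate of 1.72%.
rate = interpolate(0.0172, 0.0, 0.286, 0.05, 0.368)
print(round(rate, 3))  # 0.314
```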

We also conducted a simple test to determine if the observed difference is or is not statistically significant. For this test, we constructed four additional taxpayer samples using NRP data for TY 2006 to TY 2009 using the same criteria applied to the TY 2001 NRP data. Using these data, we calculated mean unweighted reporting compliance rates of .341, .321, .324, and .327, respectively, for these additional tax years, all with a largely unchanged audit rate. The average mean reporting compliance for all the five observations was .325, and the 95% confidence interval was .013 based on a standard deviation of .010 using these five observations. This implied that the population mean was between .312 and .338, which encompasses our interpolated value of .314 for the Experimental Sample. (14) This finding could be further strengthened by having more experimental observations on individuals' reporting behavior for audit probabilities that reflect more closely the conditions in the naturally occurring world (i.e., audit probabilities between .00 and .05), as well as by additional years of NRP data.
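The reported figures are consistent with a small-sample t interval computed over the five yearly means; a sketch, assuming the standard two-sided 95% t critical value for 4 degrees of freedom:

```python
import statistics

yearly_means = [0.313, 0.341, 0.321, 0.324, 0.327]  # TY 2001, TY 2006-2009
mean = statistics.fmean(yearly_means)
sd = statistics.stdev(yearly_means)          # sample standard deviation (n - 1)
t_crit = 2.776                               # 95% two-sided, 4 degrees of freedom
half_width = t_crit * sd / len(yearly_means) ** 0.5
print(round(mean, 3), round(sd, 3), round(half_width, 3))  # 0.325 0.01 0.013
```

The resulting interval, roughly .312 to .338, covers the interpolated experimental value of .314.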

C. Distribution of Reporting Compliance Rates

Another way to externally validate the experimental data is to compare the distribution of subjects' reporting compliance rates to those of actual taxpayers. Figure 1 displays the distribution of reporting compliance rates for the Taxpayer Sample (unweighted and weighted), and Figure 2 shows the distribution of individual reporting compliance rates for the Experimental Sample for different audit rates. (We omit in Figure 2 the observations from the Experimental Sample where the audit probability is .40 for brevity.)

[FIGURE 1 OMITTED]

Visual inspection of these plots revealed that both the Taxpayer Sample and the Experimental Sample have a bimodal distribution and an apparently random distribution of observations between these two modes. It is also evident from these plots that both samples exhibited a small and similarly sized group of individuals who exhibited 100% compliance even though the rational choice (from a purely economic standpoint) is to underreport income. Once again, laboratory experiments can reliably replicate the behavior in the naturally occurring world. (15)

[FIGURE 2 OMITTED]

IV. TEST (2): STUDENTS VERSUS NONSTUDENTS IN IDENTICAL EXPERIMENTS

The comparison of experimental behavior with behavior in similar but naturally occurring settings (i.e., the field) addresses one aspect of external validity (context effects). A second type of evidence compares student and nonstudent subjects in the same experimental setting in order to address subject pool effects.

These comparisons are based on further analysis of data derived from laboratory experiments conducted by Alm et al. (2010, 2012). (16) In both studies, the subject pool consisted of students and nonstudents, but the focus in those papers was on the policy instrument performance rather than subject pool effects. Here, we used these data to address the issue of subject pool effects by testing whether behavior is statistically different across the student versus nonstudent pools. By using two studies involving different subject pools and run at different times, we broadened the base for analyzing the effects of alternate subject pools. We first discuss the experimental designs, and we then present the comparison of student versus nonstudent responses.

A. The Experimental Designs

The basic experimental setting was common to both papers, and implemented the fundamental elements of the voluntary filing and reporting system of the individual income tax in most countries. The setting was "context rich," in that tax language was used throughout. Participants earned income by performing a task, chose whether to file a tax return, and (conditional upon filing) self-reported tax liability to the tax authority at an announced tax rate. At the time of filing and reporting decisions, only the individual knew his or her true (or expected) level of tax liability, and could choose to file and then to report any amount from zero on up. Audits occurred with an announced probability, and any underreporting was discovered by the audit, and the participant was required to pay unpaid taxes and penalties if he or she had not paid the appropriate taxes. This process was repeated over a number of rounds each representing a "tax year." Participants were informed that they would be paid their after-tax earnings at the end of the experiment, converted from lab dollars to U.S. dollars at a fixed and announced conversion rate. The sessions lasted 20 rounds; this was not announced to the subjects.

Participants were told, with certainty, of the audit probability, the penalty rate, and the tax rate. The tax rate was set at 35% for all sessions; the penalty rate was also fixed for all sessions at 150% (i.e., unpaid taxes plus a penalty of 50% of unpaid taxes if audited). The audit probability for filed tax returns was varied once within the session. Participants were also told that there was a zero probability of audit if no tax form was filed. (17) There was no public good financed by the tax payments in order to focus subject attention entirely on the filing and tax reporting tasks rather than fiscal exchange.
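A single round of this setting can be sketched as follows, using the stated parameters (a 35% tax rate and, if audited, payment of 150% of unpaid taxes, i.e., the unpaid taxes plus a 50% penalty). The function and variable names are ours, not the experimental software's:

```python
import random

TAX_RATE = 0.35
PENALTY_RATE = 0.50  # penalty of 50% of unpaid taxes, paid on top of them

def round_earnings(income, reported_income, audit_prob, rng=random):
    """After-tax lab-dollar earnings for one round, given a reporting choice."""
    earnings = income - TAX_RATE * reported_income
    unpaid = TAX_RATE * (income - reported_income)
    # An audit discovers all underreporting; the subject then pays the
    # unpaid taxes plus the penalty (150% of unpaid taxes in total).
    if unpaid > 0 and rng.random() < audit_prob:
        earnings -= unpaid * (1 + PENALTY_RATE)
    return earnings
```

A fully compliant subject keeps 65% of income regardless of the audit outcome; an audited full evader forfeits 52.5% of income.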

Into this setting, various policy innovations were introduced. A first set of experiments (Alm et al. 2010) investigated the effects of information services on compliance decisions. Here, the basic tax reporting decision was "complicated" in different treatments through the introduction of uncertainty regarding the true tax liability, and then information services were provided by the "tax administration" that partially or fully resolved the uncertainty, thereby allowing subjects to compute their tax liabilities more easily. Also contributing to complicating the decisions were a tax deduction (comparable to an itemized deduction) and a tax credit (comparable to the Earned Income Tax Credit), each of which was conditional upon filing. The tax deduction was set at 15% of income, and the tax credit began at a given level and declined at a stated rate as income increased. As a treatment, the exact levels of the deduction and credit were uncertain to the taxpayer at the time of filing. Uncertainty was implemented via mean-preserving spreads (with a uniform distribution) in each, where the participants were informed of the means and the ranges of the allowed credit and deduction. As an additional treatment, information services were provided that resolved the uncertainty. The information was complete, accurate, and costless to the participant.
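The liability computation being "complicated" here can be sketched as follows. The specific parameter values in the example are placeholders (the actual levels are reported in Table 5), and the function names are ours:

```python
import random

def tax_liability(income, deduction_rate, credit_base, credit_phaseout,
                  tax_rate=0.35):
    """Tax owed given an income-proportional deduction and a phasing-out credit."""
    deduction = deduction_rate * income              # e.g., 15% of income
    credit = max(credit_base - credit_phaseout * income, 0.0)
    return max(tax_rate * (income - deduction) - credit, 0.0)

def uncertain_level(mean, spread, rng=random):
    """Mean-preserving spread: a uniform draw on [mean - spread, mean + spread]."""
    return rng.uniform(mean - spread, mean + spread)
```

Under the uncertainty treatment, the allowed deduction rate and credit are realized via `uncertain_level`; the information-service treatment reveals the realized values before filing.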

A more direct set of positive inducements was also investigated in a second set of experiments (Alm et al. 2012). In one treatment, income tax credits were introduced that were available to participants, but only to those who filed a tax return. In a second treatment, a "social safety net" (e.g., unemployment replacement income) was present, in which individuals faced some probability of unemployment but replacement income could be provided, with the benefits conditional upon past filing behavior. There was a known probability of unemployment, and, if the individual became unemployed and earned no income, then he or she was unemployed for two periods. Unemployment replacement income was received only if the individual had filed a tax return in each of the two previous periods, and the level of the benefit was based on reported income.
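The safety-net rule can be sketched with a simple eligibility check. Basing the benefit on the average of the two prior reports is our illustrative assumption; the paper states only that the level was based on reported income:

```python
def unemployment_benefit(filed_history, reported_history, replacement_rate):
    """Replacement income for a newly unemployed participant.

    Eligible only if a tax return was filed in each of the two previous
    periods; the level is based on income reported in those filings
    (here, their average -- an illustrative assumption).
    """
    if len(filed_history) < 2 or not (filed_history[-1] and filed_history[-2]):
        return 0.0
    average_reported = (reported_history[-1] + reported_history[-2]) / 2
    return replacement_rate * average_reported
```

This conditioning is what gives nonfilers a forward-looking reason to file even when evasion is otherwise tempting.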

These various treatments are summarized in Tables 3 and 4, with Table 3 showing the information services design of Alm et al. (2010) and Table 4 showing the positive inducements design of Alm et al. (2012). (18) In Table 3, treatment T1 provides a baseline setting that entails no uncertainty and no tax authority information. The second treatment (T2) introduces tax liability uncertainty, in which participants face uncertainty regarding their allowed deduction and tax credit. The third treatment (T3) entails the same uncertainty as in the second treatment, but introduces the option of resolving the uncertainty by receiving information from the tax authority; that is, participants in this treatment were able to click on a button to reveal the true levels of the deduction and the tax credit. In Table 4, treatment T4 establishes a baseline with no positive inducements, a tax credit is introduced in T5, and an unemployment benefit is introduced in T6. The parameters used for the different treatments are reported in Table 5. The Appendix shows a representative screen. (19)

As noted above, the experimental interface and instructions made intensive use of tax language. Participants also decided whether or not to file a tax return. They disclosed tax liability in the same manner as on the typical tax form (e.g., entering income, deductions, and credits on a tax form). There was a time limit on the filing of income, comparable to a filing deadline, and the individual was automatically audited if he or she failed to file on time. A timer was shown on the screen; when 15 seconds remained, the timer changed the color to red, and the clock began to flash as a reminder that the filing period was about to end.

The dedicated laboratory consisted of 25 networked computers, a server, and software designed for these experiments. Sessions were conducted at a major state university, using both students and staff as participants. (20) Recruiting was conducted using the Online Recruiting System for Experimental Economics (ORSEE) developed by Greiner (2004). The participant database was built using announcements sent via email to all students and staff. Participants were invited to a session via email and were permitted to participate in only one tax experiment. The experiments followed procedures that implemented a single- and double-blind setting (e.g., no subject communication, use of computer screens to convey information, no individual identification, complete privacy in subject payment). Methods adhered to all guidelines concerning the ethical treatment of human subjects.

Of most importance for the purposes of this analysis, participants included both students and nonstudents, thereby allowing one aspect of the external validity of experiments to be examined: do students behave differently than nonstudents in identical experiments? A given session consisted of either student or nonstudent participants, not both. The experimental design was identical for students and nonstudents, with only the compensation varied for students and nonstudents by means of the exchange rate. The sessions lasted approximately 1 hour. For student participants, the conversion rate was 80 lab dollars to 1 U.S. dollar, while staff participants received a higher exchange rate to reflect their higher outside earnings, with a conversion rate of 50 lab dollars to 1 U.S. dollar. Earnings averaged $18 for student subjects. The average payoff for staff was $28.

B. Laboratory Experimental Results

A total of 347 individuals participated in a session in one of the two series of experiments. We present the distribution of subjects and some basic demographic data by treatment in Table 6. In the sessions designed to investigate the role of tax information services (T1 through T3), there were 131 subjects, 54% of whom were students. In the sessions designed to investigate the effects of positive inducements (T4 through T6), there were 216 subjects, 55% of whom were students. Note that while the design of the experiments was balanced in terms of treatments, it did not have equal numbers of students and nonstudents in each treatment. Because of this imbalance, a simple comparison by treatment of results for students versus nonstudents (e.g., average compliance rates) may be misleading. Instead, we focus on our econometric results, which include control variables that address subject characteristics. (21)

In order to control for various factors, for each series, we estimated the conditional effects of design parameters on reporting behavior, while holding other factors constant. We estimated these responses separately for the two experimental designs, using the basic specifications of:

Information Services

[Y.sub.i,t] = [[beta].sub.0] + [[beta].sub.1][Income.sub.i,t] + [[beta].sub.2][Wealth.sub.i,t] + [[beta].sub.3][AuditProbability.sub.i,t] + [[beta].sub.4][LagAudit.sub.i,t] + [[beta].sub.5][TaxLiabilityUncertainty.sub.t] + [[beta].sub.6][TaxAgencyInformation.sub.t] + [[beta].sub.7][X.sub.i] + [[psi].sub.t] + [u.sub.i] + [e.sub.i,t]

Positive Inducements

[Y.sub.i,t] = [[beta].sub.0] + [[beta].sub.1][Income.sub.i,t] + [[beta].sub.2][Wealth.sub.i,t] + [[beta].sub.3][AuditProbability.sub.i,t] + [[beta].sub.4][LagAudit.sub.i,t] + [[beta].sub.5][TaxCredit.sub.i] + [[beta].sub.6][UnemploymentBenefit.sub.i,t] + [[beta].sub.7][X.sub.i] + [[psi].sub.t] + [u.sub.i] + [e.sub.i,t]

where the dependent variable [Y.sub.i,t] denotes subject i's decision to report income in period t; [Income.sub.i,t] is subject i's earned income in period t; [Wealth.sub.i,t] is subject i's accumulated earnings (or wealth) in period t; [AuditProbability.sub.i,t] is the audit rate for subject i in period t; [TaxLiabilityUncertainty.sub.t] is an indicator variable that signifies the presence of uncertainty about tax features in period t; [TaxAgencyInformation.sub.t] is an indicator variable that signifies the presence of agency-provided information in period t; [TaxCredit.sub.i] is an indicator variable that signifies the presence of a tax credit that the subject can claim on filing a tax report; [UnemploymentBenefit.sub.i,t] is an indicator variable that signifies the presence of a safety net that (partially) makes up for income lost as a result of unemployment; [X.sub.i] is a dummy variable denoting whether the participant is a student (coded 1) or staff (coded 0), along with possible interaction terms; (22) [[psi].sub.t] is a set of T-1 dummies that capture potential nonlinear period effects (T denotes the number of time periods); [u.sub.i] are random effects that control for unobservable individual characteristics; [e.sub.i,t] is the contemporaneous additive error term; and [[beta].sub.k] is the coefficient for variable k. We also included a dummy variable for whether the individual was audited in the previous period ([LagAudit.sub.i,t]). We report results for a (subject) random effects generalized least squares estimation of the panel data, with standard errors corrected for clustering at the subject level. The dependent variable is the reporting compliance rate of individual i in period t, where [Y.sub.i,t] equals reported tax paid divided by true tax owed by individual i in period t. (23)
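The construction of the dependent variable reduces to a simple ratio; a minimal sketch (the function name is ours, and the paper's actual analysis code is not reproduced here):

```python
def compliance_rate(reported_tax: float, true_tax: float) -> float:
    """Reporting compliance rate Y_{i,t}: reported tax paid divided by
    true tax owed for subject i in period t."""
    if true_tax <= 0:
        raise ValueError("true tax owed must be positive")
    return reported_tax / true_tax

# A subject who owes 35 lab dollars but pays only 28 has a
# compliance rate of 0.8 for that round.
print(compliance_rate(28.0, 35.0))  # 0.8
```

Full compliance gives a rate of 1, nonreporting gives 0, and partial underreporting falls in between, which is what produces the bimodal distribution discussed elsewhere in the paper.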

We first estimated the models for the pooled sample (students and nonstudents), reported as Model 1 in Table 7 for the information services experiments (T1, T2, and T3) and in Table 8 for the positive inducements experiments (T4, T5, and T6). The various coefficient estimates are consistent with expectations.

In Model 2, we introduced a dummy variable (Student) denoting whether the subject is in the student or the nonstudent pool, equal to 1 for students and 0 for nonstudents. The coefficient on Student was not statistically different from zero in either series of experiments. Also, the remaining coefficients were virtually unchanged between Model 1 and Model 2 for the different subject pools, a result demonstrating that the pooled analysis (e.g., students/nonstudents) in Alm et al. (2010, 2012) was appropriate.

A more critical test of subject pool effects involves testing for differences in the behavioral responses of students versus nonstudents to the policy initiatives. To do this, we interacted the Student dummy variable with the policy treatment variables associated with each session. These results are reported in Model 3 in Tables 7 and 8. For the information services setting (Table 7), tax liability uncertainty actually increased compliance for the student pool (Model 3) while the overall effect of uncertainty was negative (Models 1 and 2), perhaps because the student pool's relative lack of experience in filing and reporting led them to overestimate the costs of reporting errors. However, the coefficient on the interaction effect when the information service was offered was not different from zero; that is, we cannot reject the null hypothesis that students and nonstudents respond in the same way. Also, the coefficient for the information services variable in Model 3 was largely the same as in Model 1 or Model 2.

When positive inducements were examined (Table 8), the coefficients on the interaction terms were never significant, and the coefficients on the treatment variables themselves were essentially unchanged whether or not these interaction terms were included. Here, students responded to the presence of the safety net (unemployment benefits) exactly as nonstudents did. When the refundable credit was interacted with student status, the coefficient was again not significant, also indicating that the subject pools respond in the same manner. (24)

Another policy variable varied across these experiments was the probability of being selected for an audit. Only two audit rates were implemented, so it is not overly surprising that the coefficient on Audit Probability was never significant. In addition, we interacted Audit Probability and Student in Model 4 in Tables 7 and 8. The coefficient on this interaction term was not significant either for the information services series (Table 7) or for the positive inducements series (Table 8).

In sum, the coefficient on the subject-type dummy variable by itself was never statistically significant for either set of experiments, and the coefficients on the student-treatment interaction terms were also insignificant in almost all cases, with the only exception arising over uncertain tax liabilities. Overall, students behaved largely the same as nonstudents in identical experiments in their reporting decisions, especially in their changes in compliance behavior in response to the policy variables (if not necessarily in the levels of their compliance behavior). (25)

V. CONCLUSIONS

Our analysis suggests two main conclusions regarding the external validity of tax compliance experiments. Both conclusions are consistent with the result that students and nonstudents behave largely the same. Even so, both also suggest areas where care must be taken in transferring the results of laboratory experiments to field settings.

First, experimental and nonexperimental (NRP) data indicate very similar patterns. The comparison of the Taxpayer and Experimental Samples finds that the experimental data can reliably replicate known features of taxpayer compliance behavior for similar decisions in the naturally occurring world, including a bimodal distribution of reporting compliance rates, a largely random distribution of individuals between the extremes, and the existence of a small group of "pathologically honest" individuals who report 100% of income. The Taxpayer and Experimental Samples also appear quite similar with respect to point estimates of the level of reporting compliance, with the caveats noted above. The interpolated reporting compliance rate for the Experimental Sample is indistinguishable from the mean reporting compliance rate for individuals in the Taxpayer Sample, and is within the 95% confidence interval based on five independent NRP sample data sets for TY 2001 and TY 2006 to TY 2009.
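The confidence-interval check described here can be sketched as follows. The annual sample means in the example are placeholders, not the actual NRP estimates, and the function name and critical value (hardcoded for five samples) are ours:

```python
import statistics

def within_95_ci(point_estimate: float, sample_means: list) -> bool:
    """Does a point estimate fall inside the two-sided 95% t-based
    confidence interval around the mean of five independent sample means?"""
    if len(sample_means) != 5:
        raise ValueError("critical value below is hardcoded for n = 5")
    mean = statistics.mean(sample_means)
    se = statistics.stdev(sample_means) / len(sample_means) ** 0.5
    t_crit = 2.776  # two-sided 95% critical value, t-distribution, df = 4
    return mean - t_crit * se <= point_estimate <= mean + t_crit * se

# Placeholder annual means (NOT the actual NRP values):
print(within_95_ci(0.314, [0.30, 0.31, 0.32, 0.33, 0.34]))  # True
```

With only five annual observations, the t-distribution (rather than the normal) is the appropriate basis for the interval, which is why the critical value exceeds 1.96.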

Second, the experimental responses of students are largely similar to the experimental responses of nonstudent subject pools when faced with policy treatments. When student status is interacted with the policy changes being implemented, the resulting coefficients are not generally significant. However, there is at least one exception to this result, and this gives rise to a caveat: care must be taken when the policy treatment may incorporate a substantial level of external experience. We find that students respond differently to the presence of tax liability uncertainty, and our conjecture is that this may be the result of nonstudent subjects having more experience with this specific phenomenon in the field. Regardless, however, we still find that the changes in compliance behavior in response to institutional changes (treatments) of these pools (if not always their levels) largely parallel each other.

In sum, our results are consistent with studies showing that laboratory behaviors largely parallel real-world behaviors in settings that compare similar types of decisions in similar types of settings. Our results are also consistent with studies demonstrating that student and nonstudent subjects behave, and especially respond, similarly. Concerns with the external validity of experimental results, at least in the context of tax compliance and in the comparison of changes in behavior, seem largely unwarranted.

Even so, we recognize that one must use the results from laboratory experiments with some care. However, such use depends largely upon the purpose of the experiment. According to Roth (1987), experiments can be classified into three broad categories that depend upon the dialog in which they are meant to participate. "Speaking to Theorists" includes those experiments designed to test well-articulated theories. "Searching for Facts" involves experiments that examine the effects of variables about which existing theory has little to say. "Whispering in the Ears of Princes" identifies those experiments motivated by policy issues. To date, most experiments in behavioral public economics have fallen into the first two categories. However, this is now changing, and experiments are being increasingly used to illuminate policy debates, especially in the area of tax compliance.

In sum, we believe that the reported results demonstrate that laboratory experiments in the area of tax compliance behavior meet the key conditions for external validity. This is an important result, especially because empirical analyses of compliance behavior with naturally occurring field data are limited and field experiments of compliance are costly to implement. We do not argue that laboratory experiments can be used to calibrate field results (e.g., provide point estimates). The stakes are obviously smaller in the laboratory, and the decision settings are necessarily less rich. Thus, the magnitudes of the responses to the external stimuli will be different in the two environments. However, as Kessler and Vesterlund (forthcoming) argue, " ... for most laboratory studies it is only relevant to ask whether the qualitative results are externally valid" (e.g., the direction of response), and not whether an exact quantitative result (e.g., the magnitude of response) is found in laboratory versus field data. They contend that "... there is much less (and possibly no) disagreement on the extent to which the qualitative results of a laboratory study are externally valid." Indeed, our results in this paper are largely consistent with their position: we have shown that the behavioral patterns are sufficiently similar that we can safely predict the effects that would arise in the field from a policy based on the results observed in the laboratory.

We find the result of our investigations both comforting and plausible. We believe that these results suggest that the burden should now be on skeptics to prove that results from laboratory compliance experiments differ in meaningful ways from the behavior we observe in the field.

APPENDIX: SAMPLE EXPERIMENTAL INSTRUCTIONS AND SCREEN SHOT [POSITIVE INDUCEMENTS VIA SOCIAL PROGRAMS EXPERIMENTS - UNEMPLOYMENT BENEFITS]

INTRODUCTION

You are about to participate in an experiment in economic decision making. Please follow the instructions carefully, as the amount of money you earn in the experiment will depend on your decisions. At the end of today's session, you will be paid your earnings privately and in cash. Please do not communicate with other participants during the experiment unless instructed. Importantly, please refrain from verbally reacting to events that occur during the experiment.

Today's experiment will involve several decision "rounds." You will not know the number of rounds until the end of the experiment. The rounds are arranged into multiple series. After all decision rounds are finished, we will ask you to complete a questionnaire.

Aside from decisions in "training" rounds, each decision impacts your earnings, which means that it is very important to consider each decision carefully prior to making it. Each decision round is separate from the other rounds, in the sense that the decisions you make in one round will not affect the outcome or earnings of any other round. All money amounts are denominated in lab dollars, and will be exchanged at a rate of xxx lab dollars to US$1 at the end of the experiment.

There are four parts to each decision round: the Income Earning Stage, the Tax Reporting Stage, the Audit Determination Stage, and the Round Summary Stage. We will now describe each part.

INCOME EARNING STAGE

In each round or period, you will complete a task that determines your income for the round. You will be required to sort the numbers 1 through 9 into the correct order. The task is timed. The person completing the task in the shortest time earns the highest income, the second fastest the second highest income, and so on.
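The rank-order payment rule above can be sketched as follows (the function name and the income amounts are illustrative; the actual income schedule is set by the experimenters):

```python
def assign_incomes(completion_times: dict, income_schedule: list) -> dict:
    """Pay incomes by speed rank: the fastest sorter earns the highest
    income, the second fastest the second highest, and so on."""
    ranked = sorted(completion_times, key=completion_times.get)
    return dict(zip(ranked, sorted(income_schedule, reverse=True)))

# Three participants; sorting times in seconds; a hypothetical schedule.
print(assign_incomes({"A": 12.0, "B": 8.5, "C": 10.1}, [60, 100, 80]))
# -> {'B': 100, 'C': 80, 'A': 60}
```

Tying pay to relative task performance means income is earned rather than endowed, a standard induced-value design choice.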

TAX REPORTING STAGE

When the tax year has finished, you enter the tax reporting or filing stage. You will know your income and your allowable deductions and credits, but these amounts are not known to the tax agency. You will fill out and file a tax form as you saw in the computer instructions.

After you choose income and deduction amounts to report, you click on the "FILE TAXES" button to submit your tax form. Your taxes are determined by subtracting what you report in deductions from what you report in income, and multiplying this difference by the tax rate of 35%. On your screen, this amount is included among the tax form calculations as "Reported Taxes."
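The "Reported Taxes" arithmetic reduces to a one-line calculation (a sketch; the interface performs this computation automatically, and the function name is ours):

```python
TAX_RATE = 0.35  # tax rate stated in the instructions

def reported_taxes(reported_income: float, reported_deductions: float) -> float:
    """'Reported Taxes' on the tax form: (income - deductions) x 35%."""
    return (reported_income - reported_deductions) * TAX_RATE

# Reporting 100 in income and 20 in deductions yields 28 in taxes.
```

Because taxes rise with reported income and fall with reported deductions, underreporting either way reduces the tax bill, which is what the audit stage is designed to catch.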

There is a timer on the tax reporting screen. If you do not file the tax form before the time runs out, this will be treated the same as if you submitted a form that reported 0 in income and 0 in deductions. In addition, your tax form will be automatically audited. In other words, it is not in your best interest to let the tax reporting screen time out!

AUDIT DETERMINATION STAGE

There is a chance that you will be randomly selected for audit. You will know this chance prior to making your tax reporting decisions. The chance does not increase or decrease depending on your current or past reporting choices or on the decision made by others in the group. This is a random selection process.

After you file the tax form, you will see an audit screen. While you are on this screen, the computer is randomly determining whether to select you for audit. This selection is done separately for each participant and each round.
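The audit lottery amounts to an independent random draw for each participant in each round; a sketch (the function name is ours):

```python
import random

def selected_for_audit(audit_probability: float, rng=random) -> bool:
    """Independent random audit draw; unaffected by past or current
    reporting choices and by other participants' decisions."""
    return rng.random() < audit_probability

# The degenerate cases make the rule concrete:
print(selected_for_audit(1.0))  # always True  (random() is in [0.0, 1.0))
print(selected_for_audit(0.0))  # always False
```

Independence across rounds and across participants is what makes the stated audit probability the only enforcement parameter a subject needs to consider.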

If you are selected for audit, your reported income, credits, and deductions will be checked against your actual income, credits, and deductions. These amounts will be checked separately. If you underreported your taxes, all unpaid taxes will be discovered. If you are not audited, however, no unpaid taxes will be discovered.

If you are audited, you will have unpaid taxes if you reported too little in income or too much in deductions or credits. Unpaid taxes are calculated as the difference between your actual and reported amounts multiplied by the tax rate. Any unpaid taxes discovered in the audit must be paid back.

If you have unpaid taxes, a penalty of 150% will be assessed. What this means is that, if you are audited, for every lab dollar in unpaid taxes you will have to pay back the 1 dollar you owed, and in addition you will have to pay .5 lab dollars in penalties.
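Putting the audit arithmetic together, a simplified sketch (income and deduction discrepancies are combined in one expression, whereas the experiment checks each line separately; credits are omitted, and the function name is ours):

```python
TAX_RATE = 0.35
PENALTY_RATE = 0.5  # 0.5 lab dollars of penalty per lab dollar of unpaid tax

def audit_assessment(actual_income, reported_income,
                     actual_deductions, reported_deductions):
    """Unpaid taxes discovered at audit, and the total bill: every lab
    dollar of unpaid tax is repaid plus a 0.5 penalty (150% in total)."""
    understatement = ((actual_income - reported_income)
                      + (reported_deductions - actual_deductions))
    unpaid = max(0.0, understatement * TAX_RATE)
    return unpaid, unpaid * (1 + PENALTY_RATE)

# Underreporting income by 20 leaves 7 in unpaid taxes and a 10.5 total bill.
```

The 1.5 multiplier is what the instructions describe as the 150% assessment: full repayment plus a 50% penalty on the unpaid amount.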

[FIGURE A1 OMITTED]

ROUND SUMMARY STAGE

After the tax reporting decision, three things can happen: (1) you are not audited; (2) you are audited but you did not underreport your taxes; or (3) you are audited and you did underreport your taxes. Your earnings are, of course, the same for the first two scenarios. The computer will calculate your earnings for you.

UNEMPLOYMENT

There is a chance that you will be unemployed in a round. The chance of this happening is shown on your screen as described in the computer instructions. If you are unemployed, you will not complete the income earning task in that round. Instead, you will receive unemployment benefits if you filed a tax form in the previous two rounds, calculated as 50% of the average income you reported in the previous two rounds on your filed tax forms. However, if you have not filed a tax form for the previous two rounds (both rounds), your unemployment benefits will be zero, and you will earn no income for the rounds you are unemployed.
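The benefit formula can be sketched as follows (a `None` entry marks a round in which no tax form was filed; the function name is ours):

```python
def unemployment_benefit(prior_reported_incomes):
    """50% of the average income reported on the previous two rounds'
    filed tax forms; zero unless a form was filed in BOTH rounds."""
    if len(prior_reported_incomes) != 2 or None in prior_reported_incomes:
        return 0.0
    return 0.5 * sum(prior_reported_incomes) / 2

# Filed both rounds, reporting 100 and 60: the benefit is 40.
# Missed either filing: the benefit is 0.
```

Because the benefit scales with reported (not actual) income, filing and reporting honestly in earlier rounds raises the safety-net payout, which is the positive inducement being tested.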

BEGINNING THE EXPERIMENT

We have now finished the instructions. We will continue on to a second training round. As with the first training round, your decisions in the training round will not affect your earnings. After the training round, you will have a final opportunity to ask questions.
ABBREVIATIONS

IRS: U.S. Internal Revenue Service
NRP: National Research Program


doi: 10.1111/ecin.12196

REFERENCES

Alm, J., T. Cherry, M. Jones, and M. McKee. "Taxpayer Information Assistance Services and Tax Reporting Behavior." Journal of Economic Psychology, 31(4), 2010, 577-86.

--. "Social Programs as Positive Inducements for Tax Participation." Journal of Economic Behavior & Organization, 84(1), 2012, 85-96.

Alm, J., J. Deskins, and M. McKee. "Do Individuals Comply on Income Not Reported by Their Employer?" Public Finance Review, 37(2), 2009, 120-41.

Alm, J., B. R. Jackson, and M. McKee. "Estimating the Determinants of Taxpayer Compliance with Experimental Data." National Tax Journal, 45(1), 1992, 107-14.

--. "Getting the Word Out: Increased Enforcement, Audit Information Dissemination, and Compliance Behavior." Journal of Public Economics, 93(3-4), 2009, 392-402.

Alm, J., and S. Jacobson. "Using Laboratory Experiments in Public Economics." National Tax Journal, 60(1), 2007, 129-52.

Alm, J., and M. McKee. "Tax Compliance as a Coordination Game." Journal of Economic Behavior & Organization, 54(3), 2004, 297-312.

Armantier, O., and A. Boly. "On the External Validity of Experiments in Corruption," in New Advances in Experimental Research on Corruption. Research in Experimental Economics, Vol. 15, edited by D. Serra and L. Wantchekon. Bingley, UK: Emerald Group, 2012, 117-44.

Ball, S. B., and P.-A. Cech. "Subject Pool Choice and Treatment Effects in Economic Laboratory Research." Research in Experimental Economics, 6, 1996, 239-92.

Bennett, C. "Preliminary Results of the National Research Program's Reporting Compliance Study of Tax Year 2001 Individual Returns." Paper presented at the Annual IRS Research Conference, Washington, D.C., 2005.

Bigoni, M., G. Camera, and M. Casari. "Strategies of Cooperation and Punishment among Students and Clerical Workers." Journal of Economic Behavior & Organization, 94, 2013, 172-82.

Bott, K., A. W. Cappelen, E. Ø. Sørensen, and B. Tungodden. "You've Got Mail: A Randomised Field Experiment on Tax Evasion." Norwegian School of Economics and Business Administration Discussion Paper. Oslo, Norway, 2013.

Brewer, M. B. "Research Design and Issues of Validity," in Handbook of Research Methods in Social and Personality Psychology, edited by H. T. Reis and C. M. Judd. Cambridge: Cambridge University Press, 2000, 3-16.

Brookshire, D" D. Coursey, and W. D. Schulze. "The External Validity of Experimental Economics Techniques: Analysis of Demand Behavior." Economic Inquiry, 25(2), 1987, 239-50.

Camerer, C. F. "The Promise and Success of Lab-field Generalizability in Experimental Economics: A Critical Reply to Levitt and List." California Institute of Technology, Division of the Humanities and Social Sciences Working Paper. Los Angeles, CA, 2011.

Campbell, D. T., and J. C. Stanley. Experimental and Quasi-Experimental Designs for Research. Chicago: Rand McNally College Publishing Co., 1966.

Cappelen, A. W., K. Nygaard, E. Ø. Sørensen, and B. Tungodden. "Efficiency, Equality and Reciprocity in Social Preferences: A Comparison of Students and a Representative Population." Norwegian School of Economics and Business Administration Discussion Paper. Oslo, Norway, 2010.

Carpenter, J., and E. Seki. "Do Social Preferences Increase Productivity? Field Experimental Evidence from Fisherman in Toyama Bay." Economic Inquiry, 49(2), 2011, 612-30.

Castro, L., and C. Scartascini. "Tax Compliance and Enforcement in the Pampas: Evidence from a Field Experiment." IDB Working Paper Series No. IDB-WP-472. Washington, D.C.: Inter-American Development Bank, 2013.

Charness, G., and M.-C. Villeval. "Cooperation and Competition in Intergenerational Experiments in the Field and the Laboratory." American Economic Review, 99(3), 2009, 956-78.

Chermak, J. M" K. Krause, D. S. Brookshire, and H. S. Burness. "Moving Forward by Looking Back: Comparing Laboratory Results with Ex Ante Market Data." Economic Inquiry, 51(1), 2013, 1035-49.

Davis, D. D., and C. A. Holt. Experimental Economics. Princeton, NJ: Princeton University Press, 1993.

Dyer, D., J. H. Kagel, and D. Levin. "A Comparison of Naive and Experienced Bidders in Common Value Offer Auctions: A Laboratory Analysis." Economic Journal, 99(1), 1989, 108-15.

Erard, B., and C.-C. Ho. "Searching for Ghosts: Who Are the Non-filers and How Much Tax Do They Owe?" Journal of Public Economics, 81(1), 2001, 25-50.

Falk, A., and J. J. Heckman. "Lab Experiments Are a Major Source of Knowledge in the Social Sciences." Science, 326(5952), 2009, 535-8.

Fellner, G., R. Sausgruber, and C. Traxler. "Testing Enforcement Strategies in the Field: Threat, Moral Appeal and Social Information." Journal of the European Economic Association, 11(3), 2011, 634-60.

Ferber, R., and W. Z. Hirsch. Social Experimentation and Public Policy. Cambridge: Cambridge University Press, 1982.

Fréchette, G. R. "Laboratory Experiments: Professionals versus Students," in The Methods of Modern Experimental Economics, edited by G. R. Fréchette and A. Schotter. New York: Oxford University Press, forthcoming.

Gramlich, E. M. "Reflections of a Policy Economist." The American Economist, 41(1), 1997, 22-30.

Gravelle, J. "Comments on Innovative Approaches to Improving Tax Compliance." The IRS Research Bulletin, Recent Research on Tax Administration and Compliance, Selected Papers Given at the 2008 IRS Research Conference. Washington, D.C., 2009, 59-60.

Greiner, B. "The Online Recruitment System ORSEE 2.0: A Guide for the Organization of Experiments in Economics." Working Paper Series in Economics 10, Department of Economics, University of Cologne. Cologne, Germany, 2004.

Güth, W., and O. Kirchkamp. "Will You Accept Without Knowing What? The Yes-No Game in the Newspaper and in the Lab." Experimental Economics, 15(4), 2012, 656-66.

Giith, W" C. Schmidt, and M. Sutter. "Bargaining Outside the Lab--A Newspaper Experiment of a Three-person Ultimatum Game." Economic Journal, 117(518), 2007, 449-69.

Harrison, G. W" M. Lau, and E. E. Rutstrom. "Theory, Experimental Design and Econometrics Are Complementary (And So Are Lab and Field Experiments)," in The Methods of Modern Experimental Economics, edited by G. R. Frechette and A. Schotter. New York: Oxford University Press, forthcoming.

Harrison, G. W., and J. A. List. "Field Experiments." Journal of Economic Literature, 42(2), 2004, 1009-55.

Heckman, J. J., and J. A. Smith. "Assessing the Case for Social Experiments." Journal of Economic Perspectives, 9(2), 1995, 85-110.

Internal Revenue Service. "IRS Data Book 2002, Publication 55B." Washington, D.C., 2002. Accessed November 12, 2014. http://www.irs.gov/pub/irs-soi/02databk.pdf.

--. "Tax Gap for Tax Year 2006." Washington, D.C., 2012. Accessed November 12, 2014. http://www.irs. gov/pub/newsroom/overview_tax_gap_2006.pdf.

Iyer, G. S., P. M. J. Reckers, and D. L. Sanders. "Increasing Tax Compliance in Washington State: A Field Experiment." National Tax Journal, 63(1), 2010, 7-32.

Kagel, J. H. "Laboratory Experiments," in The Methods of Modern Experimental Economics, edited by G. R. Frechette and A. Schotter. New York: Oxford University Press, forthcoming.

Kagel, J. H., and A. E. Roth, ed. The Handbook of Experimental Economics. Princeton, NJ: Princeton University Press, 1995.

Kessler, J., and L. Vesterlund. "The External Validity of Laboratory Experiments: The Misleading Emphasis on Quantitative Effects," in The Methods of Modern Experimental Economics, edited by G. R. Fréchette and A. Schotter. New York: Oxford University Press, forthcoming.

Kleven, H. J., M. B. Knudsen, C. T. Kreiner, S. Pedersen, and E. Saez. "Unwilling or Unable to Cheat? Evidence from a Randomized Tax Audit Experiment in Denmark." Econometrica, 79(3), 2011, 651-92.

Leamer, E. E. "Let's Take the Con Out of Econometrics." American Economic Review, 73(1), 1983, 31-43.

Levitt, S. D., and J. A. List. "What Do Laboratory Experiments Measuring Social Preferences Reveal About the Real World?" Journal of Economic Perspectives, 21(2), 2007, 153-74.

List, J. A. "The Behavioralist Meets the Market: Measuring Social Preferences and Reputation Effects in Actual Transactions." Journal of Political Economy, 114(1), 2006, 1-37.

Manski, C. F. "Economic Analysis of Social Interactions." Journal of Economic Perspectives, 14(3), 2000, 115-36.

McKee, M., J. Alm, T. Cherry, and M. Jones. "Final Report for TIRNO-07-P-00683 on Behavioral Tax Research." Washington, D.C., 2008.

Plott, C. R. "Dimensions of Parallelism: Some Policy Applications of Experimental Methods," in Laboratory Experimentation in Economics: Six Points of View, edited by A. E. Roth. New York: Cambridge University Press, 1987, 193-229.

Pomeranz, D. "No Taxation without Information: Deterrence and Self-Enforcement in the Value Added Tax." NBER Working Paper 19199. Cambridge, MA: National Bureau of Economic Research, 2013.

Roth, A. E. "Laboratory Experimentation in Economics," in Advances in Economic Theory, Fifth World Congress, edited by T. Bewley. Cambridge: Cambridge University Press, 1987, 269-99.

Scheiber, N. "Freaks and Geeks." The New Republic, 2(April), 2007, 27-31.

Shadish, W. R., T. D. Cook, and D. T. Campbell. Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Boston, MA: Houghton Mifflin, Inc., 2002.

Shogren, J., J. Fox, D. Hayes, and J. Roosen. "Observed Choices for Food Safety in Retail, Survey, and Auction Markets." American Journal of Agricultural Economics, 81(5), 1999, 1192-99.

Slemrod, J., M. Blumenthal, and C. Christian. "Taxpayer Response to an Increased Probability of Audit: Evidence from a Controlled Experiment in Minnesota." Journal of Public Economics, 79(3), 2001, 455-83.

Smith, V. L. "Experimental Economics: Induced Value Theory." American Economic Review Papers and Proceedings, 66(2), 1976, 274-9.

JAMES ALM, KIM M. BLOOMQUIST, MICHAEL MCKEE

* Portions of this research were funded by the US IRS (TIRNO-07-P-00683). The views expressed here are those of the authors and should not be interpreted as those of the U.S. Internal Revenue Service. Previous versions of this paper have been presented at the November 2010 National Tax Association Annual Conference in Chicago, IL, at the June 2011 Internal Revenue Service--Tax Policy Center Research Conference in Washington, D.C., and at a seminar at Virginia Tech (April 2013). We are grateful to Charles Christian, John Deskins, Brian Erard, Elaine Maag, Rosemary Marcus, Alan Plumley, Joel Slemrod, Nic Tideman, three anonymous referees, and the Editor for helpful comments and discussions. Alm: Department of Economics, Tulane University, 6823 St. Charles Avenue, 208 Tilton Hall, New Orleans, LA 70118. Phone 504 862 8344, Fax 504 865 5869, E-mail jalm@tulane.edu

Bloomquist: National Headquarters, Office of Research, U.S. Internal Revenue Service, 1111 Constitution Avenue NW, Washington, D.C. 20224. Phone 202 874 0171, Fax 202 874 0660, E-mail kim.bloomquist@irs.gov

McKee: Department of Economics, Walker College of Business, Appalachian State University, Boone, NC 28608. Phone 828 262 6080, Fax 828 262 6105, E-mail mckeemj@appstate.edu

(1.) See also the alternative perspectives of Leamer (1983), Heckman and Smith (1995), and Harrison and List (2004).

(2.) There is an extensive literature on the use of field trials in economic policy. The 1960s was a period of wide use of field trials in a variety of policy endeavors, including the provision of education services and income support programs. For many reasons, especially their costs and their potential for irreversible damages, the use of field trials has largely been abandoned (Ferber and Hirsch 1982; Gramlich 1997). More recently, field experiments have been increasingly used to test hypotheses derived from basic theory (Harrison and List 2004; List 2006). In general, the intent of these studies is less to establish the external validity of an experimental design than to provide a substitute for the laboratory by introducing social settings to the decision tasks.

(3.) For an especially provocative perspective on the difficulties of achieving identification, written for a nontechnical audience, see Scheiber (2007).

(4.) For comprehensive surveys of experimental methods, see Davis and Holt (1993) and Kagel and Roth (1995).

(5.) The general critique of Levitt and List (2007) has itself been the subject of energetic critiques. See Falk and Heckman (2009), Camerer (2011), Armantier and Boly (2012), Kagel (forthcoming), Harrison, Lau, and Rutström (forthcoming), and Fréchette (forthcoming).

(6.) For some contrary evidence that reports some differences between students and nonstudents, see Cappelen et al. (2010) and Bigoni, Camera, and Casari (2013).

(7.) Note that tax compliance has been the subject of several controlled field experiments. In a typical experiment, a treatment group of individuals is randomly selected to receive a letter from the tax authority suggesting that they will be under special scrutiny, while a control group of individuals does not receive the letter. Comparison of the treated group with the control group then gives a measure of the effectiveness of increased enforcement. One of the first of these field experiments was performed by Slemrod, Blumenthal, and Christian (2001); more recent examples include Iyer, Reckers, and Sanders (2010), Kleven et al. (2011), Fellner, Sausgruber, and Traxler (2011), Pomeranz (2013), Castro and Scartascini (2013), and Bott et al. (2013).

(8.) For example, if the taxpayer reported $110 in Schedule C net profits and the NRP examiner determined that the correct amount should have been $100, then the calculated reporting compliance ratio of 1.1 was recoded to 1.0. Recoding these 29 cases ensures that all 1,101 observations of the reporting compliance ratio fall in the range between 0 and 1, inclusive.

(9.) The term "income per exam" refers to the income that should have been reported based on the judgment of NRP examiners, and reflects population weights, as appropriate.

(10.) Note that this Experimental Sample overlaps partially with the experimental data in Alm et al. (2010, 2012), which we also use later and in which both students and nonstudents were the subjects. For our analysis in this section, we used only the student responses from Alm et al. (2010, 2012). All of our other experimental studies included only student participants.

(11.) We used the base case simulations since these observations excluded behavioral influences potentially induced by the specific treatments explored in the non-base case simulations, influences that we believe are likely absent in actual taxpayer behavior. However, as we report below, our basic findings hold using either the Full Sample or the Selected Sample of experimental data.

(12.) A compliance rate is computed for each subject after each round in the Experimental Sample and for each individual in the NRP-based Taxpayer Sample. We believe the unweighted mean is the appropriate basis for comparison here because it is not possible to construct weights for the Experimental Sample that would match the NRP-stratified sample weights.

(13.) The virtually identical values of 0.314 for the base case Experimental Sample and 0.313 for the TY 2001 Taxpayer Sample are apparently coincidental, although we found that the point estimate for the Experimental Sample falls within the 95% confidence interval using five independent Taxpayer Sample observations. See the discussion in the text.

(14.) A comparison using the full Experimental Sample gives similar results. Using the data for the full Experimental Sample found in Table 2, the interpolated compliance rate was 0.331. This value also falls within the 95% confidence interval calculated using the 5 years of NRP data.
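The interval check described in notes 13 and 14 is a standard t-based confidence interval built from the five annual NRP observations. A minimal sketch (the five annual rates below are placeholders, not the actual NRP values):

```python
# 95% t-interval from five annual observations; check whether a point
# estimate from the Experimental Sample falls inside it. The annual
# compliance rates here are hypothetical stand-ins for the NRP data.
from statistics import mean, stdev

rates = [0.30, 0.32, 0.31, 0.34, 0.29]   # hypothetical annual rates
m, s, n = mean(rates), stdev(rates), len(rates)
t_crit = 2.776                            # t(0.975) critical value, df = n - 1 = 4
half_width = t_crit * s / n ** 0.5
lo, hi = m - half_width, m + half_width

estimate = 0.331                          # experimental point estimate (note 14)
print(lo <= estimate <= hi)
```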

(15.) An additional comparison between behavior in laboratory experiments and field data would compare the behavioral elasticities estimated with laboratory data to the elasticities estimated with field data. The field data that we have here do not allow us to make these estimations and comparisons. However, Alm, Jackson, and McKee (1992) compared behavioral responses to audit and tax rates estimated with laboratory data to responses estimated with field data, and found very similar elasticities.

(16.) These studies are hereafter referred to as Alm et al. (2010, 2012), respectively. Because no subject participated in more than one session, we have a total of 347 subjects for our analysis here, as discussed later.

(17.) The probability of audit if the individual does not file was set at zero to reflect the fact that in most countries an individual who does not file faces no effective chance of detection. The actual audit probability for non-filers in the field may not be strictly zero. However, there is substantial evidence that this non-filer audit rate is effectively very close to zero. For example, in the United States, the IRS conducts audits of non-filers based on tips, on "lifestyle audits" in which visible expenditures flag an audit, or through passive income sources such as deposit interest. Even so, the frequency of non-filer audits is very low in the United States, and in many countries it is essentially zero (Erard and Ho 2001). Accordingly, for simplicity we implemented a zero audit probability in the laboratory setting. Note that this framework requires only that the probability of audit for non-filers be less than the probability of audit for filers.

(18.) The main intent of Alm et al. (2010, 2012) was to investigate policies to induce filing when non-filing is possible. Both policies were found to be effective.

(19.) T1 and T4 were separate "baselines" for each respective series of experiments, and did not present the same environment.

(20.) The student portion of the subject pool covered a very broad range of years of study and majors, and no single major exceeded 8% of the pool. The staff pool was similarly diverse, covering all levels of support staff and nonacademic professional staff.

(21.) In fact, we found some differences in the levels of compliance between the two subject pools across treatments. However, the changes in compliance in response to the treatment effects were quite similar for both subject pools, and it is this result that we emphasize in our discussion.

(22.) In Alm et al. (2010, 2012), a vector of demographic variables (e.g., gender, subject age, subject own preparation of tax returns, subject claimed as a dependent on parental tax returns) was included. However, these variables are highly correlated with participant membership in the subject pool, and so are not included separately in the current analysis.

(23.) Note that Income and Wealth are exogenous variables, which justifies their inclusion as explanatory variables. Income is earned each period prior to the tax reporting decision, and performance on the task (sorting nine numbers into the correct order) is uncorrelated with the tax reporting decision. Similarly, Wealth is accumulated over time, and, between the income earning task and the random nature of the audits, this variable is not correlated with past decisions.

(24.) The purposes of Alm et al. (2010, 2012) were narrowly defined to study possible policy actions of the tax agency in the areas of information services and positive inducements, and so they did not fully explore the effects of audit policies.

(25.) Note that we found similar results for filing behavior.
TABLE 1
Summary Statistics for Taxpayer Sample

                         Taxable Income
                           as Reported

                                    Standard          Sum
                N      Mean ($)   Deviation ($)   ($ millions)

Unweighted    1,101     5,461        12,081           6.0
Weighted     559,555    3,708         9,854         2,075.0

                     Taxable Income That
                  Should Have Been Reported

              Mean      Standard          Sum
              ($)     Deviation ($)   ($ millions)

Unweighted   25,277      132,064          27.8
Weighted     16,054      78,165         8,983.3

                  Mean Reporting
                  Compliance Rate

               Mean of     Overall
             Individuals    Mean

Unweighted      .313        .216
Weighted        .242        .231

Note: The data in this table reflect only the "raw" NRP
audit adjustments, and do not account for any unreported
income that the auditors did not detect.
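The two compliance-rate columns in Table 1 use different aggregations: the "Mean of Individuals" averages each taxpayer's (capped) ratio, while the "Overall Mean" divides total reported income by total income that should have been reported. A sketch with a hypothetical two-taxpayer example:

```python
# Hypothetical two-taxpayer illustration (not NRP records) of the two
# compliance summaries in Table 1.
reported  = [110, 10]     # taxable income as reported
corrected = [100, 100]    # taxable income per the NRP examiner

# Per-person ratios, capped at 1.0 as described in note 8.
ratios = [min(r / c, 1.0) for r, c in zip(reported, corrected)]
mean_of_individuals = sum(ratios) / len(ratios)   # (1.0 + 0.1) / 2 = 0.55
overall_mean = sum(reported) / sum(corrected)     # 120 / 200 = 0.60
```

Consistent with this, the table's unweighted sums give $6.0M / $27.8M ≈ .216, matching the reported unweighted overall mean.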

TABLE 2
Summary Statistics for Experimental Sample

                          Full Sample

                                            Mean
                                         Reporting
Audit         Number of    Number of     Compliance
Probability   Subjects    Observations      Rate

.00              16           240           .288
.05              180         2,700          .413
.10              356         5,580          .544
.30              298         4,710          .590
.40              222         3,330          .638
Total           1,072        16,560         .551

                          Selected Sample

                                            Mean      Overall Mean
                                         Reporting     Reporting
Audit         Number of    Number of     Compliance    Compliance
Probability   Subjects    Observations      Rate          Rate

.00              16           240           .288          .286
.05              48           720           .404          .368
.10              78          1,170          .475          .476
.30              32           480           .558          .536
.40              78          1,170          .672          .668
Total            252         3,780          .521          .517

TABLE 3
Experimental Treatments: Information Services
Experiments

Tax           Information Services Provided?
Liability
Uncertain?         No             Yes

No                 T1              --
Yes                T2              T3

TABLE 4
Experimental Treatments: Positive Inducements
via Social Programs Experiments

Positive Inducements Provided?

                      Yes,
     Yes, via         via
No   Tax Credit   Unemployment
                    Benefits

T4       T5            T6

TABLE 5
Experimental Parameters

Parameter           Values

Income              Mean = 50, High = 100, Low = 10,
                      Increment = 10
Audit Probability   .3 and .4; .0 if Not File is selected
Fine Rate           150%, fixed across all sessions
Tax Rate            35%, fixed across all sessions
Tax Deduction       20%, with uncertainty (when present)
                      via a uniform distribution
Tax Credit          Credit = 30 - .6 * Income, with
                      uncertainty via a uniform
                      distribution
Unemployment        .2 and .4, fixed for a session
  Probability
Unemployment        Benefits = .5 or .6 times average
  Benefit             reported income over the past
                      2 periods

TABLE 6
Descriptive Statistics for Student and Nonstudent Subjects

Treatment (Students/ Metric              Students   Nonstudents
  Nonstudents)

Information          Age (years)          20.1       43.8
Services

T1: 40 Students/     Gender (% male)      55.0       18.6
  18 Nonstudents

T2: 14 Students/     Dependent (% yes)    81.9        0
  20 Nonstudents

T3: 18 Students/     Prepare Own          27.7       44.1
  21 Nonstudents       Tax (% yes)
                     Number of            72         59
                       Subjects

Positive             Age (years)          20.2       43.9
  Inducements via
  Social Programs

T4: 50 Students/     Gender (% male)      51.7       21.2
  30 Nonstudents

T5: 20 Students/     Dependent (% yes)    83.6        3.0
  38 Nonstudents

T6: 46 Students/     Prepare Own          36.2       48.1
  32 Nonstudents       Tax (% yes)

                     Number of           116        100
                       Subjects

TABLE 7
Estimates for Reporting Compliance: Information
Services Experiments

                                 Dependent Variable:
                                 Tax Compliance Rate

Independent Variable          Model 1        Model 2

Constant                        .9200 **       .9074 **
                               (.0754)        (.076)
Period income                  -.0016 **      -.0016 **
                               (.0004)        (.0004)
Cumulative wealth              -.0003 **      -.0003 **
                               (.00003)       (.00003)
Audit probability              -.1653         -.1634
                               (.194)         (.194)
Lag audit                       .1632 **       .1630 **
                               (.022)         (.022)
Tax liability                  -.0491 **      -.0429 *
  uncertainty                  (.023)         (.020)
Tax agency                      .0636 **       .0622 **
  information                  (.0255)        (.024)
Student                                        .0222
                                              (.020)
Student X
Tax liability uncertainty
Student X
Tax agency information
Student X
Audit probability
Wald [chi square]            228.06 **      229.43 **
Panels                       131            131
N                           2489           2489

                                  Dependent Variable:
                                  Tax Compliance Rate

Independent Variable          Model 3        Model 4

Constant                        .9337 **       .9162 **
                               (.078)         (.105)
Period income                  -.0017 **      -.0017 **
                               (.0003)        (.0003)
Cumulative wealth              -.0003 **      -.0003 **
                               (.00003)       (.00003)
Audit probability              -.1105         -.1602
                               (.183)         (.269)
Lag audit                      -.0105         -.0104
                               (.015)         (.015)
Tax liability                  -.2254 **      -.2257 **
  uncertainty                  (.042)         (.042)
Tax agency                      .0752 **       .0752 **
  information                  (.032)         (.035)
Student                        -.0426         -.0117
                               (.037)         (.135)
Student X                       .3071 **       .3078 **
Tax liability uncertainty      (.057)         (.057)
Student X                       .0780          .0778
Tax agency information         (.068)         (.068)
Student X                                     -.0889
Audit probability                             (.363)
Wald [chi square]            245.58 **      247.40 **
Panels                       131            131
N                           2489           2489

Notes: Panel estimations with clustered (subject level)
standard errors. The dependent variable is the ratio of
reported taxes to true taxes of individual i in period t.

* and ** indicate significance at the 5% and 1% levels,
respectively.
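The error structure noted beneath Tables 7 and 8 (subject-level clustered standard errors on a panel of per-period compliance rates) can be sketched as follows. This is not the authors' code: the data are simulated, and the variable names and coefficient magnitudes are illustrative placeholders loosely patterned on Table 7.

```python
# Pooled regression of a per-period compliance rate on period income and
# a lagged-audit indicator, with standard errors clustered by subject.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_subjects, n_periods = 40, 20
n = n_subjects * n_periods

df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subjects), n_periods),  # panel id
    "income": rng.uniform(10, 100, n),                        # period income
    "lag_audit": rng.integers(0, 2, n).astype(float),         # audited last round?
})
# Simulated compliance rate, bounded to [0, 1] like the real dependent variable.
df["compliance"] = (0.9 - 0.0016 * df["income"]
                    + 0.16 * df["lag_audit"]
                    + rng.normal(0, 0.1, n)).clip(0, 1)

fit = smf.ols("compliance ~ income + lag_audit", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["subject"]})
print(fit.params)
```

Clustering at the subject level allows arbitrary correlation of errors across a subject's rounds, which is why the tables report "Panels" (subjects) separately from N (subject-rounds).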

TABLE 8
Estimates for Reporting Compliance: Positive
Inducements via Social Programs Experiments

                                 Dependent Variable:
                                 Tax Compliance Rate

Independent Variable         Model 1           Model 2

Constant                     .5429 **          .5028 **
                            (.064)            (.081)
Period income               -.0007 *          -.0007 *
                            (.0004)           (.0004)
Cumulative wealth           -.0001 **         -.0001 **
                            (.00003)          (.00003)
Audit probability            .0807             .0906
                            (.132)            (.132)
Lag audit                    .0051             .0049
                            (.015)            (.015)
Tax credit                   .1131 **          .1376 **
                            (.047)            (.055)
Unemployment benefit         .2468 **          .2541 **
                            (.088)            (.089)
Student                                        .0475
                                              (.054)
Student X
Tax credit
Student X
  unemployment benefit
Student X
Audit probability
Wald [chi square]         803.52 **         845.66 **
Panels                    216               216
N                        4104              4104

                                 Dependent Variable:
                                 Tax Compliance Rate

Independent Variable         Model 3           Model 4

Constant                     .5537 **          .4842 **
                            (.083)            (.069)
Period income               -.0007 *          -.0007 *
                            (.0003)           (.0003)
Cumulative wealth           -.0001 **         -.0001 **
                            (.00003)          (.00003)
Audit probability            .1072             .1224
                            (.131)            (.156)
Lag audit                    .0034             .0050
                            (.015)            (.0154)
Tax credit                   .1259 **          .1392 **
                            (.053)            (.057)
Unemployment benefit         .2364 **          .2382 *
                            (.112)            (.118)
Student                      .0103            -.1053
                            (.0864)           (.122)
Student X                    .1310             .1313
Tax credit                  (.098)            (.087)
Student X                    .1182             .1236
  unemployment benefit      (.103)            (.102)
Student X                                      .3250
Audit probability                             (.253)
Wald [chi square]         894.84 **         904.90 **
Panels                    216               216
N                        4104              4104

Note: Panel estimations with clustered (subject level)
standard errors. The dependent variable is the ratio of
reported taxes to true taxes of individual i in period t.

* and ** indicate significance at the 5% and 1% levels,
respectively.