On the external validity of laboratory tax compliance experiments.
Alm, James; Bloomquist, Kim M.; McKee, Michael
I. INTRODUCTION
Laboratory methods are now widely accepted as a methodological
approach in economics and, increasingly, they have been used to examine
specific public policy issues. There is much to be gained from careful
laboratory experiments. They offer a low cost means of testing (and
replicating) policy innovations, and they generate precise data on
individual behavior, thereby allowing estimation of behavioral
responses. Importantly, they allow many policy innovations to be
introduced singly and exogenously in a controlled environment, and as a
result, laboratory experiments are typically seen as having a high
degree of "internal validity" (Brewer 2000; Campbell and
Stanley 1966; Shadish, Cook, and Campbell 2002) because the causal
relation between variables can be properly demonstrated. However, as
emphasized by Plott (1987), using laboratory experiments can allow more
general inferences regarding human behavior only when the setting
implemented in the laboratory parallels what is observed in the
naturally occurring world. Even beyond this requirement of
"parallelism" is the need for generalizing behavioral
observations from the laboratory to the field, or "external
validity." Internal validity can be demonstrated through the
evaluation of the design. However, external validity can only be
verified empirically and only with respect to the specific setting being
investigated. This paper investigates the external validity of
laboratory experiments, and does so in the context of experiments on tax
compliance behavior.
Tax evasion is central to many important policy questions. Current
estimates report the "tax gap" (or difference between taxes
owed and taxes paid) in the United States to be $450 billion annually
(Internal Revenue Service 2012). Beyond these massive revenue losses,
evasion creates major misallocations in resource use when individuals
alter their behavior to cheat on their taxes. Its presence requires that
government expend resources to detect noncompliance, to measure its
magnitude, and to penalize tax evasion. Evasion alters the distribution
of income in arbitrary, unpredictable, and unfair ways, and it may
contribute to feelings of unjust treatment and disrespect for the law.
More broadly, it is not possible to understand the true impact of
taxation without recognizing the existence and the effects of tax
evasion.
Laboratory methods have been used to examine a wide range of
policies that may affect the compliance decision, policies that have not
always proven amenable to either theoretical analyses or empirical
analyses with field data. However, laboratory studies of compliance are
sometimes viewed with skepticism. The most common criticism is that the
student subjects typically used in experiments may not be representative
of taxpayers. Undergraduates may have little experience with filing tax
returns, and their economic and demographic backgrounds may differ from
those of taxpayers. Another criticism is that the context of laboratory
compliance experiments does not closely enough resemble the context in
which actual compliance decisions are made. As a result, there is a
concern that experimental results on policy innovations that rely upon
student subjects in laboratory compliance experiments cannot be
generalized to the population. It is this issue that we examine here.
Building on previous research, we present several types of evidence on
the external validity of experiments on individual compliance decisions.
A first question examines whether behavior of laboratory participants is
replicated by behavior of individuals making a similar decision in the
naturally occurring world; that is, do participants in laboratory
experiments exhibit different patterns of behavior than individuals in a
similar naturally occurring setting? To answer this question, we utilize
a special data set from the U.S. Internal Revenue Service (IRS)
assembled as part of its National Research Program (NRP). These data
allow us to compare actual taxpayer behavior with data generated by
laboratory subjects, where everyone is engaged in a similar tax
reporting decision. A second question examines a different aspect of
external validity: that is, do students behave differently than
nonstudents in identical laboratory experiments? We are able to answer
this question with further analysis of previously reported data from
laboratory experiments that compare the decisions of a population of
adults with those of undergraduate students, both of whom participate in
the identical laboratory experiment.
Together, we are therefore able to examine whether the
"moments of the data" (e.g., the mean reporting compliance
rate and its distribution) are similar when estimated in naturally
occurring versus laboratory settings, and also whether the
"treatment effects" of policy innovations are similar when
estimated with different subjects (e.g., students and nonstudents).
Our analysis indicates that there is an overall similarity between
the behavior of individual taxpayers in the field and of student
subjects making comparable decisions in the laboratory, so that data
from the laboratory closely align with data from the field. Our analysis
also indicates that student and nonstudent subjects exhibit broadly
similar behavior in the laboratory, even though there are some small
differences in their responses to individual policy treatments. These
results confirm that compliance behavior in the laboratory generalizes
beyond the laboratory.
II. THE PROMISE AND THE PITFALLS OF LABORATORY EXPERIMENTS
As a science, economics is based on the development of theory and
on the ability of that theory to explain observed behavior. However,
unlike some other sciences, economics faces difficulties in empirically
testing the predictive power of its theories using field data from the
naturally occurring world. Even where field data are readily available,
it is almost impossible to ensure the independence required to conduct
econometric research using field data (Manski 2000). (1) Controlled
field experiments can achieve this independence, and they often use
participants who are representative of the larger population of
interest; however, field experiments require simplified procedures, they
are costly to implement, and they may raise ethical issues. (2) Overall,
despite many significant methodological advances in recent years, there
are few instances in which identification using field data, whether
naturally occurring or from controlled field experiments, is
uncontroversial and easily achieved. (3)
The use of laboratory experiments is a different response to these
difficulties. Experimental methods involve the creation of a real
microeconomic system in the laboratory, one that parallels the naturally
occurring world that is the subject of investigation and one in which
subjects (usually students) make decisions that yield individual
financial payoffs whose magnitude depends on their decisions. (4) The
essence of this system is control over the environment, the
institutions, the incentives, and the preferences that subjects face.
Control over preferences is particularly crucial, and is achieved via
the method of "induced values." As described by Smith (1976),
"[s]uch control can be achieved by using a reward structure to
induce prescribed monetary value on actions."
Tax compliance seems especially amenable to laboratory
investigation. Theoretical models yield ambiguous results when asked to
incorporate many of the factors deemed relevant to the individual
compliance decision, and many empirical studies of tax compliance using
field data are plagued by the absence of reliable information on
individual compliance decisions. It is difficult to measure--and to
measure accurately--something that by its very nature people want to
conceal. Even when data are available and not subject to confidentiality
restrictions, it is also difficult to control in econometric work for
the resulting errors in variables and the many unobservable factors that
affect the compliance decision. Even aside from cost concerns,
controlled field experiments face many of these same problems stemming
largely from a loss of control over the decision setting. Laboratory
methods allow many factors suggested by theory to be introduced
orthogonally. Experiments also generate precise data on individual
compliance decisions, which allow econometric estimation of individual
responses in ways that are simply not possible with field data. Indeed,
laboratory methods have been used to examine a wide range of factors in
the compliance decision, factors that have not proven amenable to either
theoretical or empirical analyses with field data (Alm and Jacobson
2007).
Of course, there are some obvious limitations of laboratory
experiments, especially if the intention is to use the results for
informing public policy. Perhaps the most compelling critique comes from
Levitt and List (2007), who caution researchers about making the
"parallelism" assumption necessary in using laboratory
experiments to make general statements about behavior outside the
laboratory. As we have argued earlier, parallelism is an internal
validity issue addressed by the design. However, the deeper essence of
criticisms such as Levitt and List (2007) is the external validity of
the results. This issue can only be addressed empirically. If laboratory
results comport with field observations where such results are available
and comparable, then one has greater confidence in applying the
laboratory results in cases where field data are not available. (5)
Of perhaps most relevance to the external validity of compliance
experiments are subject pool effects. It is typically the case that
laboratory subjects for various tax compliance experiments are drawn
from student populations. Levitt and List (2007) suggest that student
responses are unlikely to be the same as nonstudent responses in large
part because students are younger, better educated, less representative,
and less experienced in the decisions being examined than nonstudents.
If valid, these concerns are especially germane for tax compliance
experiments where a common comment on experimental analysis of tax
compliance is that "undergraduate volunteers differ from the
taxpayer population in very important ways," and so cannot
"tell us something" about typical taxpayer behavior (Gravelle
2009).
Subject pool effects can be examined by comparing the responses of
student subjects with nonstudent subjects in (more-or-less) identical
laboratory experiments. There are relatively few such studies, but the
available evidence is that the experimental responses of students are
often largely the same as the responses of other subject pools in
similar laboratory experiments (Ball and Cech 1996; Charness and Villeval
2009; Güth and Kirchkamp 2012; Güth, Schmidt, and Sutter 2007; Plott
1987). (6) Plott (1987) reports comparisons of behavior of student
subjects with those of corporate executives in the same policy decision
setting, and he observes similar decisions among the student subjects
and the executives. Dyer, Kagel, and Levin (1989) study bidding behavior
in auctions using experienced traders and students as subjects, and find
similar results; Shogren et al. (1999) also find comparable responses
between student and nonstudent subjects in a study of food safety
choices in retail, survey, and experimental settings.
Also of importance for the external validity of compliance
experiments are context effects. "Context" relates to the
complex combination of individuals' perceptions and past
experiences that influence how individuals respond in a laboratory
setting designed to mimic the naturally occurring setting; that is, does
the context in the laboratory decision resemble the context in the field
for the same decision? The contextual setting effect can be examined by
comparing student and nonstudent responses in laboratory experiments to
the responses of participants in similarly constructed controlled field
experiments, in which the same basic choice is examined in both
settings. Brookshire, Coursey, and Schulze (1987) compare prices
obtained from buyers of strawberries in a laboratory setting versus
those in a field setting. The field setting in their study mimicked the
laboratory market institution, but they implemented it with nonstudents
making purchase decisions in their homes rather than in the laboratory.
They find equivalent bidding behavior in both settings. More recently,
there are investigations of behavior of fishermen (Carpenter and Seki
2011) and of water markets (Chermak et al. 2013). A range of other
studies is summarized by Camerer (2011), in which student responses in
laboratory experiments are compared to responses of participants in
controlled field experiments in such areas as sports card trading,
open-air flea markets, donations to student funds, soccer, communal
fishing ponds, proofreading/exam grading, and restaurant spending. In
most--although not in all--cases, these comparisons have shown no
significant differences in behavior.
This previous literature has considered subject pool and context
effects, but rarely have both been examined in the pursuit of a common
decision setting and, to our knowledge, there are no studies that have
looked at tax compliance. (7) Our focus is on tax reporting behavior,
and we ask whether behavior observed in the laboratory is likely to be
similar to the behavior observed in the naturally occurring environment.
Specifically, do participants in laboratory experiments behave
differently than individuals in a similar but naturally occurring
setting? Further, do student subjects in tax compliance laboratory
experiments behave differently than nonstudent subjects in identical
laboratory experiments? The next sections present our results.
III. TEST (1): EXPERIMENTAL RESULTS VERSUS NONEXPERIMENTAL RESULTS
A first type of evidence compares experimental behavior with
behavior in similar but naturally occurring settings (i.e., the field),
in order to determine whether patterns of behavior in the field match
patterns of behavior in the laboratory. For this evaluation of context
effects, we compare the behavior of student subjects in experiments with
that of individuals making similar decisions in the field who are
subjected to random taxpayer audits conducted under the NRP of the IRS.
We discuss the data, and we then present comparisons of tax reporting
compliance by students in laboratory experiments versus actual
taxpayers.
A. Data: Taxpayer Sample versus Experimental Sample
The comparisons here involved two separate data sets: taxpayer
(field) data and experimental (laboratory) data. The "Taxpayer
Sample" is a subsample of NRP data for tax year (TY) 2001 (Bennett
2005). In that year the NRP audited tax returns of 44,768 taxpayers
selected using stratified random sampling, which can be weighted to
represent the population of 125.8 million taxpayers who filed timely tax
returns. Our subsample consisted of taxpayers whose sole source of
income (pre- and post-audit) was a Schedule C sole proprietorship.
Filers with Schedule C income were selected because this source of
income has no third-party information reporting (e.g., Form W-2 for wage
income), and this mimics our laboratory setting in which there is no
matched information on earnings. This reduced the NRP data to 1,673 NRP
audit cases weighted to represent 1.1 million taxpayers. Our subsample
was further narrowed to make the tax reporting task as similar as
possible to the situation faced by laboratory subjects. Only those
Schedule C filers having positive taxable income as determined by the
examiner were selected. Again, excluding taxpayers with zero taxable
income was done to ensure that taxpayers selected for comparison share
circumstances similar to those faced by experimental subjects who decide
to report none, some, or all of a positive amount of income. The
resulting sample of taxpayers contained 1,101 NRP audit cases weighted
to represent the tax returns of 559,555 individuals. Finally, within
this data set, there were 29 cases where reported taxable income
exceeded the amount of taxable income following examination. These cases
(representing 13,131 taxpayers) were assumed to have 100% reporting
compliance. (8)
Table 1 displays summary statistics for the Taxpayer Sample. The
figures in the two rightmost columns refer to the mean of the individual
reporting compliance rates and the overall mean reporting compliance
rate, defined as the amount of taxable income reported divided by the
amount of taxable income per exam. (9) The range of taxable income per
exam for this sample spans five orders of magnitude from less than $40
to more than $4 million. In calendar year 2002, the audit probability
was .57% for individual taxpayers as a whole and 1.72% for all
Schedule C filers (Internal Revenue Service 2002).
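The reporting compliance rate used throughout this section is simple to state; a minimal sketch of the computation, with the 100% cap applied to the over-reporting cases noted above (variable names are ours):

```python
# Sketch of the reporting compliance rate behind Table 1: taxable income
# reported divided by taxable income per exam, with the 29 over-reporting
# cases treated as 100% compliant. Variable names are ours.
def reporting_compliance_rate(reported_taxable, exam_taxable):
    return min(reported_taxable / exam_taxable, 1.0)

print(reporting_compliance_rate(8_000, 10_000))   # 0.8
print(reporting_compliance_rate(12_000, 10_000))  # capped at 1.0
```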
The experimental data ("Experimental Sample") were
collected from college-age subjects using a basic experimental design
similar to the design discussed in more detail later (Alm, Deskins, and
McKee 2009; Alm, Jackson, and McKee 2009; Alm and McKee 2004; Alm et al.
2010, 2012; McKee et al. 2008). (10) Participants earned income, chose
whether to file a tax return, and (conditional upon filing)
self-reported tax liability to the tax authority at an announced tax
rate. Audits occurred with an announced probability, and any
underreporting was discovered by the audit. If the participant had not
paid the appropriate tax, then both unpaid taxes and penalties were
collected. This process was repeated over multiple rounds, and subjects
were paid their after-tax earnings at the end of the experiment.
The "Full Sample" of these experimental data consisted of
16,560 observations from 1,072 individual subjects, and contained
observations for base case (or no treatment) scenarios and several
treatment scenarios, including the existence of a public good,
unofficial communication among participants, and official communication
from the tax authority. In our comparisons, we used a "Selected
Sample," or data from only the base case scenarios. (11) In these
base case sessions, participants were informed of the number of audits
performed (including zero if no audits were performed) following each
round. This is similar to the IRS policy that makes publicly available
the number of audits performed each year. The Selected Sample subset had
3,780 observations from 252 individuals. Descriptive statistics for both
samples are shown in Table 2, for the five different audit rates in the
experiments.
B. Mean Reporting Compliance Rates
A comparison of Tables 1 and 2 showed that mean reporting
compliance rates (computed as the average of individual compliance
rates) for the lowest two audit rate categories in the Selected Sample
of the Experimental Sample were comparable to the unweighted mean
compliance rate for individuals in the Taxpayer Sample. (12) The mean
reporting compliance rate in the Experimental Sample is .286 when the
audit rate is zero and .368 when the audit rate is .05. Assuming the TY
2001 audit rate of 1.72% for Schedule C filers, we interpolated a
reporting compliance rate of .314 for the Experimental Sample for this
audit rate. This rate was essentially identical to the unweighted mean
reporting compliance rate for individuals of .313 for the Taxpayer
Sample. (13)
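The interpolated value follows from linear interpolation between the two observed audit rates (.00 and .05); linearity is our assumption, since the paper does not state its interpolation method. A minimal check:

```python
# Linear interpolation of the laboratory mean compliance rate at the
# field audit rate of 1.72%, between the observed rates at audit
# probabilities .00 (.286) and .05 (.368). Linearity is our assumption.
def interpolate(x, x0, y0, x1, y1):
    """Linearly interpolate the value at x between (x0, y0) and (x1, y1)."""
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

print(round(interpolate(0.0172, 0.00, 0.286, 0.05, 0.368), 3))  # 0.314
```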
We also conducted a simple test to determine if the observed
difference is or is not statistically significant. For this test, we
constructed four additional taxpayer samples using NRP data for TY 2006
to TY 2009 using the same criteria applied to the TY 2001 NRP data.
Using these data, we calculated mean unweighted reporting compliance
rates of .341, .321, .324, and .327, respectively, for these additional
tax years, all with a largely unchanged audit rate. The average mean
reporting compliance rate across all five observations was .325, and the
95% confidence interval had a half-width of .013, based on a standard
deviation of .010 from these five observations. This implied that the
population mean was
between .312 and .338, which encompasses our interpolated value of .314
for the Experimental Sample. (14) This finding could be further
strengthened by having more experimental observations on
individuals' reporting behavior for audit probabilities that
reflect more closely the conditions in the naturally occurring world
(i.e., audit probabilities between .00 and .05), as well as by
additional years of NRP data.
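For readers who want to verify the interval: a t-based small-sample interval (our assumption, since it reproduces the reported figures) over the five annual means gives the stated bounds.

```python
# Reproducing the 95% confidence interval above from the five annual mean
# compliance rates (TY 2001 and TY 2006-2009). A t-based interval is
# assumed, since that is what matches the reported half-width of .013.
import math
from statistics import mean, stdev

rates = [0.313, 0.341, 0.321, 0.324, 0.327]
m, s, n = mean(rates), stdev(rates), len(rates)
t_crit = 2.776  # t(0.975, df = 4)
half_width = t_crit * s / math.sqrt(n)
print(round(m, 3), round(s, 3), round(half_width, 3))
# 0.325 0.010 0.013 -> interval (.312, .338), which contains .314
```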
C. Distribution of Reporting Compliance Rates
Another way to externally validate the experimental data is to
compare the distribution of subjects' reporting compliance rates to
those of actual taxpayers. Figure 1 displays the distribution of
reporting compliance rates for the Taxpayer Sample (unweighted and
weighted), and Figure 2 shows the distribution of individual reporting
compliance rates for the Experimental Sample for different audit rates.
(For brevity, we omit from Figure 2 the observations from the
Experimental Sample where the audit probability is .40.)
[FIGURE 1 OMITTED]
Visual inspection of these plots revealed that both the Taxpayer
Sample and the Experimental Sample have a bimodal distribution and an
apparently random distribution of observations between these two modes.
It is also evident from these plots that both samples exhibited a small
and similarly sized group of individuals who reported 100% compliance
even though the rational choice (from a purely economic standpoint) is
to underreport income. Once again, laboratory experiments can reliably
replicate the behavior in the naturally occurring world. (15)
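The "rational choice" claim reflects the standard risk-neutral expected-payoff benchmark; the short derivation below is our sketch, since the paper invokes but does not spell it out.

```latex
% Our sketch of the risk-neutral benchmark behind the "rational choice"
% claim; the paper invokes but does not derive it. Per dollar of
% unreported income, a subject saves the tax \tau with certainty and,
% if audited (probability p), repays the unpaid tax plus a 50% penalty,
% i.e., 1.5\tau in total. Underreporting raises expected earnings when
\[
  \tau - p \cdot 1.5\,\tau \;>\; 0
  \quad\Longleftrightarrow\quad
  p \;<\; \tfrac{2}{3},
\]
% which holds at every audit rate used in the experiments (p <= .40),
% so a risk-neutral payoff maximizer would report no income.
```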
[FIGURE 2 OMITTED]
IV. TEST (2): STUDENTS VERSUS NONSTUDENTS IN IDENTICAL EXPERIMENTS
The comparison of experimental behavior with behavior in similar
but naturally occurring settings (i.e., the field) addresses one aspect
of external validity (context effects). A second type of evidence
compares student and nonstudent subjects in the same experimental
setting in order to address subject pool effects.
These comparisons are based on further analysis of data derived
from laboratory experiments conducted by Alm et al. (2010, 2012). (16)
In both studies, the subject pool consisted of students and nonstudents,
but the focus in those papers was on the policy instrument performance
rather than subject pool effects. Here, we used these data to address
the issue of subject pool effects by testing whether behavior is
statistically different across the student versus nonstudent pools. By
using two studies involving different subject pools and run at different
times, we broadened the base for analyzing the effects of alternate
subject pools. We first discuss the experimental designs, and we then
present the comparison of student versus nonstudent responses.
A. The Experimental Designs
The basic experimental setting was common to both papers, and
implemented the fundamental elements of the voluntary filing and
reporting system of the individual income tax in most countries. The
setting was "context rich," in that tax language was used
throughout. Participants earned income by performing a task, chose
whether to file a tax return, and (conditional upon filing)
self-reported tax liability to the tax authority at an announced tax
rate. At the time of filing and reporting decisions, only the individual
knew his or her true (or expected) level of tax liability, and could
choose to file and then to report any amount from zero on up. Audits
occurred with an announced probability, and any underreporting was
discovered by the audit, and the participant was required to pay unpaid
taxes and penalties if he or she had not paid the appropriate taxes.
This process was repeated over a number of rounds each representing a
"tax year." Participants were informed that they would be paid
their after-tax earnings at the end of the experiment, converted from
lab dollars to U.S. dollars at a fixed and announced conversion rate.
The sessions lasted 20 rounds; this was not announced to the subjects.
Participants were told, with certainty, of the audit probability,
the penalty rate, and the tax rate. The tax rate was set at 35% for all
sessions; the penalty rate was also fixed for all sessions at 150%
(i.e., unpaid taxes plus a penalty of 50% of unpaid taxes if audited).
The audit probability for filed tax returns was varied once within the
session. Participants were also told that there was a zero probability
of audit if no tax form was filed. (17) There was no public good
financed by the tax payments in order to focus subject attention
entirely on the filing and tax reporting tasks rather than fiscal
exchange.
Into this setting, various policy innovations were introduced. A
first set of experiments (Alm et al. 2010) investigated the effects of
information services on compliance decisions. Here, the basic tax
reporting decision was "complicated" in different treatments
through the introduction of uncertainty regarding the true tax
liability, and then information services were provided by the "tax
administration" that partially or fully resolved the uncertainty,
thereby allowing subjects to compute their tax liabilities more easily.
Also contributing to complicating the decisions were a tax deduction
(comparable to an itemized deduction) and a tax credit (comparable to
the Earned Income Tax Credit), each of which was conditional upon
filing. The tax deduction was set at 15% of income, and the tax credit
began at a given level and declined at a stated rate as income
increased. As a treatment, the exact levels of the deduction and credit
were uncertain to the taxpayer at the time of filing. Uncertainty was
implemented via mean-preserving spreads (with a uniform distribution) in
each, where the participants were informed of the means and the ranges
of the allowed credit and deduction. As an additional treatment,
information services were provided that resolved the uncertainty. The
information was complete, accurate, and costless to the participant.
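A minimal sketch of how such a mean-preserving uniform spread can be drawn; the means and half-ranges here are placeholders rather than the parameters reported in Table 5.

```python
# Illustration of an uncertain deduction or credit implemented as a
# mean-preserving spread with a uniform distribution: the draw is uniform
# on [mean - r, mean + r], so its expected value equals the announced
# mean. The means and half-ranges below are placeholders, not the
# parameters from Table 5.
import random

def draw_uncertain(mean_value, half_range):
    return random.uniform(mean_value - half_range, mean_value + half_range)

deduction = draw_uncertain(15.0, 5.0)  # hypothetical deduction draw
credit = draw_uncertain(8.0, 3.0)      # hypothetical credit draw
```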
A more direct set of positive inducements was also investigated in
a second set of experiments (Alm et al. 2012). In one treatment, income
tax credits were introduced that were available to participants, but only
to those who filed a tax return. In a second treatment, a "social
safety net" (e.g., unemployment replacement income) was present in
which individuals faced some probability of unemployment but replacement
income could be provided, with the benefits conditional upon past filing
behavior. There was a known probability of unemployment, and, if the
individual became unemployed and earned no income, then he or she was
unemployed for two periods. Unemployment replacement income was received
only if the individual had filed a tax return in each of the two
previous periods, with the level of the benefit based on reported income.
These various treatments are summarized in Tables 3 and 4, with
Table 3 showing the information services design of Alm et al. (2010) and
Table 4 showing the positive inducements design of Alm et al. (2012).
(18) In Table 3, treatment T1 provides a baseline setting that entails
no uncertainty and no tax authority information. The second treatment
(T2) introduces tax liability uncertainty, in which participants face
uncertainty regarding their allowed deduction and tax credit. The third
treatment (T3) entails the same uncertainty as in the second treatment,
but introduces the option of resolving the uncertainty by receiving
information from the tax authority; that is, participants in this
treatment were able to click on a button to reveal the true levels of
the deduction and the tax credit. In Table 4, treatment T4 establishes a
baseline with no positive inducements, a tax credit is introduced in T5,
and an unemployment benefit is introduced in T6. The parameters used for
the different treatments are reported in Table 5. The Appendix shows a
representative screen. (19)
As noted above, the experimental interface and instructions made
intensive use of tax language. Participants also decided whether or not
to file a tax return. They disclosed tax liability in the same manner as
on the typical tax form (e.g., entering income, deductions, and credits
on a tax form). There was a time limit on the filing of income,
comparable to a filing deadline, and the individual was automatically
audited if he or she failed to file on time. A timer was shown on the
screen; when 15 seconds remained, the timer turned red and began to
flash as a reminder that the filing period was about to end.
The dedicated laboratory consisted of 25 networked computers, a
server, and software designed for these experiments. Sessions were
conducted at a major state university, using both students and staff as
participants. (20) Recruiting was conducted using the Online Recruiting
System for Experimental Economics (ORSEE) developed by Greiner (2004).
The participant database was built using announcements sent via email to
all students and staff. Participants were invited to a session via email
and were permitted to participate in only one tax experiment. The
experiments followed procedures that implemented a single- and
double-blind setting (e.g., no subject communication, use of computer
screens to convey information, no individual identification, complete
privacy in subject payment). Methods adhered to all guidelines
concerning the ethical treatment of human subjects.
Of most importance for the purposes of this analysis, participants
included both students and nonstudents, thereby allowing one aspect of
the external validity of experiments to be examined: do students behave
differently than nonstudents in identical experiments? A given session
consisted of either student or nonstudent participants, not both. The
experimental design was identical for students and nonstudents, with
only the compensation varied for students and nonstudents by means of
the exchange rate. The sessions lasted approximately 1 hour. For student
participants, the conversion rate was 80 lab dollars to 1 U.S. dollar,
while staff participants received a higher exchange rate to reflect
their higher outside earnings, with a conversion rate of 50 lab dollars
to 1 U.S. dollar. Earnings averaged $18 for student subjects. The
average payoff for staff was $28.
B. Laboratory Experimental Results
A total of 347 individuals participated in a session in one of the
two series of experiments. We present the distribution of subjects and
some basic demographic data by treatment in Table 6. In the sessions
designed to investigate the role of tax information services (T1 through
T3), there were 131 subjects, 54% of whom were students. In the sessions
designed to investigate the effects of positive inducements (T4 through
T6), there were 216 subjects, 55% of whom were students. Note that while
the design of the experiments was balanced in terms of treatments, it
was not completely balanced in terms of equal numbers of students and
nonstudents within each treatment. As a result, a simple comparison by
treatment of results for students versus nonstudents (e.g., average
compliance rates) may be misleading. Instead, we focus on our
econometric results because this method includes control variables
that address subject characteristics. (21)
In order to control for various factors, for each series, we
estimated the conditional effects of design parameters on reporting
behavior, while holding other factors constant. We estimated these
responses separately for the two experimental designs, using the basic
specifications of:
Information Services

\[
Y_{i,t} = \beta_0 + \beta_1 \mathit{Income}_{i,t} + \beta_2 \mathit{Wealth}_{i,t} + \beta_3 \mathit{AuditProbability}_{i,t} + \beta_4 \mathit{TaxLiabilityUncertainty}_{t} + \beta_5 \mathit{TaxAgencyInformation}_{t} + \beta_6 X_i + \psi_t + u_i + e_{i,t}
\]

Positive Inducements

\[
Y_{i,t} = \beta_0 + \beta_1 \mathit{Income}_{i,t} + \beta_2 \mathit{Wealth}_{i,t} + \beta_3 \mathit{AuditProbability}_{i,t} + \beta_4 \mathit{TaxCredit}_{i} + \beta_5 \mathit{UnemploymentBenefit}_{i,t} + \beta_6 X_i + \psi_t + u_i + e_{i,t}
\]

where the dependent variable Y_{i,t} denotes subject i's decision to
report income in period t; Income_{i,t} is subject i's earned income in
period t; Wealth_{i,t} is subject i's accumulated earnings (or wealth)
in period t; AuditProbability_{i,t} is the audit rate for subject i in
period t; TaxLiabilityUncertainty_t is an indicator variable that
signifies the presence of uncertainty about tax features in period t;
TaxAgencyInformation_t is an indicator variable that signifies the
presence of agency-provided information in period t; TaxCredit_i is an
indicator variable that signifies the presence of a tax credit that the
subject can claim on filing a tax report; UnemploymentBenefit_{i,t} is
an indicator variable that signifies the presence of a safety net that
(partially) makes up for income lost as a result of unemployment; X_i
denotes a dummy variable indicating whether the participant is a student
(coded 1) or staff (coded 0), as well as possible interaction terms;
(22) \psi_t is a set of T-1 dummies that capture potential nonlinear
period effects (T denotes the number of time periods); u_i are random
effects that control for unobservable individual characteristics;
e_{i,t} is the contemporaneous additive error term; and \beta_k is the
coefficient for variable k. We also included a dummy variable for
whether the individual was audited in the previous period
(LagAudit_{i,t}). We report results for a (subject) random effects
generalized least squares estimation of the panel data, with standard
errors corrected for clustering at the subject level. The dependent
variable is defined as the reporting compliance rate of individual i in
period t, where Y_{i,t} equals reported tax paid divided by true tax
owed by individual i in period t. (23)
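A minimal sketch of this estimator using the linearmodels package; the data file and column names are our placeholders, the period dummies are omitted for brevity, and this is not the authors' actual code.

```python
# A minimal sketch of the estimator described above: (subject) random
# effects GLS on the subject-by-round panel, with standard errors
# clustered at the subject level. Uses the linearmodels package; the
# file and column names are placeholders, and the period dummies are
# omitted for brevity.
import pandas as pd
from linearmodels.panel import RandomEffects

df = pd.read_csv("experiment_rounds.csv")     # hypothetical data file
df = df.set_index(["subject", "period"])      # entity, time panel index

df["const"] = 1.0
exog = df[["const", "income", "wealth", "audit_prob",
           "uncertainty", "agency_info", "student", "lag_audit"]]

# Dependent variable: reporting compliance rate (reported tax / true tax)
model = RandomEffects(df["compliance_rate"], exog)
result = model.fit(cov_type="clustered", cluster_entity=True)
print(result.summary)
```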
We first estimated the models for the pooled sample (students and
nonstudents), reported as Model 1 in Table 7 for the information
services experiments (T1, T2, and T3) and in Table 8 for the positive
inducements experiments (T4, T5, and T6). The various coefficient
estimates are consistent with expectations.
In Model 2, we introduced a dummy variable (Student) denoting
whether the subject is in the student or the nonstudent pool, equal to 1
for students and 0 for nonstudents. The coefficient on Student was not
statistically different from zero for both series of experiments. Also,
the remaining coefficients were virtually unchanged across Model 1 and
Model 2 for the different subject pools, a result that demonstrated that
the pooled analysis (e.g., students/nonstudents) in Alm et al. (2010,
2012) was appropriate.
A more critical test of subject pool effects involves testing for
differences in behavioral responses of students versus nonstudents to
the policy initiatives. To do this, we interacted the Student dummy
variable with the policy treatment variables associated with each
session. These results are reported in Model 3 in Tables 7 and 8. For
the information services setting (Table 7), tax liability uncertainty
actually increased compliance for the student pool (Model 3) while the
overall effect of uncertainty was negative (Models 1 and 2), perhaps
because the relative lack of experience in filing and reporting by the
student pool leads them to overestimate the costs of reporting errors.
However, the coefficient on the interaction effect when the information
service is offered was not different from zero; that is, we cannot
reject the null hypothesis that students and nonstudents respond in the
same way. Also, the coefficient for the information services variable in
Model 3 was largely the same as in Model 1 or Model 2.
When positive inducements were examined (Table 8), the coefficients
on the interaction terms were never significant, and the coefficients on
the treatment variables themselves were not different than when these
interaction terms are or are not included. Here, students responded to
the presence of the safety net (unemployment benefits) exactly as did
nonstudents. When the refundable credit was interacted with student
status, the coefficient was again not significant, also indicating that
the subject pools respond in the same manner. (24)
Another policy variable modified during these experiments was the
probability of being selected for an audit. Only two audit
rates were implemented in these experiments, and it is not overly
surprising that the coefficient on Audit Probability was never
significant. In addition, we interacted Audit Probability and Student in
Model 4 in Tables 7 and 8. The coefficient on this interaction term was
not significant either for the information services series (Table 7) or
for the positive inducement series (Table 8).
In sum, the coefficient on the subject-type dummy variable by
itself was never statistically significant for either set of
experiments, and the coefficients on the student-treatment interaction
terms were also insignificant in almost all cases, with the only
exception arising over uncertain tax liabilities. Overall, students
behaved largely the same as nonstudents did in identical experiments in
their reporting decisions, especially in their changes in compliance
behavior in response to the policy variables (if not necessarily in the
levels of their compliance behavior). (25)
V. CONCLUSIONS
Our analysis suggests two main conclusions regarding the external
validity of tax compliance experiments. Both conclusions are consistent
with the result that students and nonstudents behave largely the same.
Even so, both conclusions also suggest the areas where care must be
taken in transferring the results of laboratory experiments to field
settings.
First, the experimental data and the nonexperimental (NRP) data indicate
very similar patterns. The comparison of the Taxpayer and Experimental
Samples finds that the experimental data can reliably replicate known
features of taxpayer compliance behavior for similar decisions in the
naturally occurring world, including a bimodal distribution of reporting
compliance rates, a largely random distribution of individuals between
the extremes, and the existence of a small group of "pathologically
honest" individuals who report 100% of income. The Taxpayer and
Experimental Samples also appear quite similar with respect to a point
estimate of the levels of reporting compliance, with the caveats noted
above. The interpolated reporting compliance rate for the Experimental
Sample is indistinguishable from the mean reporting compliance rate for
individuals for the Taxpayer Sample, and is within the 95% confidence
interval based on five independent NRP sample data sets for TY 2001 and
TY 2006 to TY 2009.
Second, the experimental responses of students are largely similar
to the experimental responses of nonstudent subject pools when faced
with policy treatments. When student status is interacted with the
policy changes being implemented, the resulting coefficients are not
generally significant. However, there is at least one exception to this
result, and this gives rise to a caveat: care must be taken when the
policy treatment may incorporate a substantial level of external
experience. We find that students respond differently to the presence of
tax liability uncertainty, and our conjecture is that this may be the
result of nonstudent subjects having more experience with this specific
phenomenon in the field. Regardless, however, we still find that the
changes in compliance behavior in response to institutional changes
(treatments) of these pools (if not always their levels) largely
parallel each other.
In sum, our results are consistent with studies showing that
laboratory behaviors largely parallel real-world behaviors in settings
that compare similar types of decisions in similar types of settings.
Our results are also consistent with studies that demonstrate that
student and nonstudent subjects behave and especially respond similarly.
Concerns with the external validity of experimental results, at least in
the context of tax compliance and in the comparison of changes in
behavior, seem largely unwarranted.
Even so, we recognize that one must use the results from laboratory
experiments with some care. However, such use depends largely upon the
purpose of the experiment. According to Roth (1987), experiments can be
classified into three broad categories that depend upon the dialog in
which they are meant to participate. "Speaking to Theorists"
includes those experiments designed to test well-articulated theories.
"Searching for Facts" involves experiments that examine the
effects of variables about which existing theory has little to say.
"Whispering in the Ears of Princes" identifies those
experiments motivated by policy issues. To date, most experiments in
behavioral public economics have fallen into the first two categories.
However, this is now changing, and experiments are being increasingly
used to illuminate policy debates, especially in the area of tax
compliance.
In sum, we believe that the reported results demonstrate that
laboratory experiments in the area of tax compliance behavior meet the
key conditions for external validity. This is an important result,
especially because empirical analyses of compliance behavior with
naturally occurring field data are limited and field experiments of
compliance are costly to implement. We do not argue that laboratory
experiments can be used to calibrate field results (e.g., provide point
estimates). The stakes are obviously smaller in the laboratory, and the
decision settings are necessarily less rich. Thus, the magnitudes of the
responses to the external stimuli will be different in the two
environments. However, as Kessler and Vesterlund (forthcoming) argue,
" ... for most laboratory studies it is only relevant to ask
whether the qualitative results are externally valid" (e.g., the
direction of response), and not whether an exact quantitative result
(e.g., the magnitude of response) is found in laboratory versus field
data. They contend that "... there is much less (and possibly no)
disagreement on the extent to which the qualitative results of a
laboratory study are externally valid." Indeed, our results in this
paper are largely consistent with their position: we have shown that the
behavioral patterns are sufficiently similar that we can safely predict
the effects that would arise in the field from a policy based on the
results observed in the laboratory.
We find the result of our investigations both comforting and
plausible. We believe that these results suggest that the burden should
now be on skeptics to prove that results from laboratory compliance
experiments differ in meaningful ways from the behavior we observe in
the field.
APPENDIX: SAMPLE EXPERIMENTAL INSTRUCTIONS AND SCREEN SHOT
[POSITIVE INDUCEMENTS VIA SOCIAL PROGRAMS EXPERIMENTS - UNEMPLOYMENT
BENEFITS]
INTRODUCTION
You are about to participate in an experiment in economic decision
making. Please follow the instructions carefully, as the amount of money
you earn in the experiment will depend on your decisions. At the end of
today's session, you will be paid your earnings privately and in
cash. Please do not communicate with other participants during the
experiment unless instructed. Importantly, please refrain from verbally
reacting to events that occur during the experiment.
Today's experiment will involve several decision
"rounds." You will not know the number of rounds until the end
of the experiment. The rounds are arranged into multiple series. After
all decision rounds are finished, we will ask you to complete a
questionnaire.
Aside from decisions in "training" rounds, each decision
impacts your earnings, which means that it is very important to consider
each decision carefully prior to making it. Each decision round is
separate from the other rounds, in the sense that the decisions you make
in one round will not affect the outcome or earnings of any other round.
All money amounts are denominated in lab dollars, and will be exchanged
at a rate of xxx lab dollars to US$1 at the end of the experiment.
There are four parts to each decision round: the Income Earning
Stage, the Tax Reporting Stage, the Audit Determination Stage, and the
Round Summary Stage. We will now describe each part.
INCOME EARNING STAGE
In each round or period, you will complete a task that determines
your income for the round. You will be required to sort the numbers 1
through 9 into the correct order. The task is timed. The person
completing the task in the shortest time earns the highest income, the
second fastest the second highest income, and so on.
TAX REPORTING STAGE
When the tax year has finished, you enter the tax reporting or
filing stage. You will know your income and your allowable deductions
and credits but these amounts are not known to the tax agency. You will
fill out and file a tax form as you saw in the computer instructions.
After you choose income and deduction amounts to report, you click
on the "FILE TAXES" button to submit your tax form. Your taxes
are determined by subtracting what you report in deductions from what
you report in income, and multiplying this difference by the tax rate of
35%. On your screen, this amount is included among the tax form
calculations as "Reported Taxes."
There is a timer on the tax reporting screen. If you do not file
the tax form before the time runs out, this will be treated the same as
if you submitted a form that reported 0 in income and 0 in deductions.
In addition, your tax form will be automatically audited. In other
words, it is not in your best interest to let the tax reporting screen
time out!
AUDIT DETERMINATION STAGE
There is a chance that you will be randomly selected for audit. You
will know this chance prior to making your tax reporting decisions. The
chance does not increase or decrease depending on your current or past
reporting choices or on the decision made by others in the group. This
is a random selection process.
After you file the tax form, you will see an audit screen. While
you are on this screen, the computer is randomly determining whether to
select you for audit. This selection is done separately for each
participant and each round.
If you are selected for audit, your reported income, credits, and
deductions will be checked against your actual income, credits, and
deductions. These amounts will be checked separately. If you
underreported your taxes, all unpaid taxes will be discovered. If you
are not audited, however, no unpaid taxes will be discovered.
If you are audited, you will have unpaid taxes if you reported too
little in income or too much in deductions or credits. Unpaid taxes are
calculated as the difference between your actual and reported amounts
multiplied by the tax rate. Any unpaid taxes discovered in the audit
must be paid back.
If you have unpaid taxes, a penalty of 150% will be assessed. What
this means is that, if you are audited, for every lab dollar in unpaid
taxes you will have to pay back the 1 dollar you owed, and in addition
you will have to pay .5 lab dollars in penalties.
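A worked version of this arithmetic may help; the sketch below nets reported income and deductions for simplicity, whereas the instructions check each amount separately.

```python
# A worked version of the arithmetic in these instructions: reported
# taxes are 35% of (reported income - reported deductions), and an audit
# recovers unpaid taxes plus a 50% penalty on them (the "150%" rule).
# For simplicity this sketch nets income and deductions, whereas the
# instructions check each amount separately.
TAX_RATE = 0.35

def taxes(income, deductions):
    return TAX_RATE * (income - deductions)

def audit_payment(true_income, true_deductions,
                  reported_income, reported_deductions):
    unpaid = taxes(true_income, true_deductions) - taxes(
        reported_income, reported_deductions)
    return 1.5 * max(unpaid, 0.0)  # each unpaid lab dollar costs 1.5

# Example: 7 lab dollars of unpaid taxes -> a 10.5 lab dollar assessment.
print(audit_payment(100.0, 20.0, 80.0, 20.0))  # 10.5
```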
[FIGURE A1 OMITTED]
ROUND SUMMARY STAGE
After the tax reporting decision, three things can happen: (1) you
are not audited; (2) you are audited but you did not underreport your
taxes; or (3) you are audited and you did underreport your taxes. Your
earnings are, of course, the same for the first two scenarios. The
computer will calculate your earnings for you.
UNEMPLOYMENT
There is a chance that you will be unemployed in a round. The
chance of this happening is shown on your screen as described in the
computer instructions. If you are unemployed, you will not complete the
income earning task in that round. Instead, you will receive
unemployment benefits if you filed a tax form in the previous two
rounds, calculated as 50% of the average income you reported in the
previous two rounds on your filed tax forms. However, if you have not
filed a tax form for the previous two rounds (both rounds), your
unemployment benefits will be zero, and you will earn no income for the
rounds you are unemployed.
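A short sketch of this benefit rule (function and variable names are ours):

```python
# Sketch of the unemployment benefit rule described above: 50% of the
# average income reported on the previous two rounds' tax forms, and
# zero unless a form was filed in both of those rounds.
def unemployment_benefit(filed_last_two, reported_incomes_last_two):
    """filed_last_two: (bool, bool); reported_incomes_last_two: two floats."""
    if not all(filed_last_two):
        return 0.0
    return 0.5 * (sum(reported_incomes_last_two) / 2)

print(unemployment_benefit((True, True), (100.0, 80.0)))   # 45.0
print(unemployment_benefit((True, False), (100.0, 80.0)))  # 0.0
```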
BEGINNING THE EXPERIMENT
We have now finished the instructions. We will continue on to a
second training round. As with the first training round, your decisions
in the training round will not affect your earnings. After the training
round, you will have a final opportunity to ask questions.
ABBREVIATIONS
IRS: U.S. Internal Revenue Service
NRP: National Research Program
doi: 10.1111/ecin.12196
REFERENCES
Alm, J., T. Cherry, M. Jones, and M. McKee. "Taxpayer
Information Assistance Services and Tax Reporting Behavior."
Journal of Economic Psychology, 31(4), 2010, 577-86.
--. "Social Programs as Positive Inducements for Tax
Participation." Journal of Economic Behavior & Organization,
84(1), 2012, 85-96.
Alm, J., J. Deskins, and M. McKee. "Do Individuals Comply on
Income Not Reported by Their Employer?" Public Finance Review,
37(2), 2009, 120-41.
Alm, J., B. R. Jackson, and M. McKee. "Estimating the
Determinants of Taxpayer Compliance with Experimental Data."
National Tax Journal, 45(1), 1992, 107-14.
--. "Getting the Word Out: Increased Enforcement, Audit
Information Dissemination, and Compliance Behavior." Journal of
Public Economics, 93(3-4), 2009, 392-402.
Alm, J., and S. Jacobson. "Using Laboratory Experiments in
Public Economics." National Tax Journal, 60(1), 2007, 129-52.
Alm, J., and M. McKee. "Tax Compliance as a Coordination
Game." Journal of Economic Behavior & Organization, 54(3),
2004, 297-312.
Armantier, O., and A. Boly. "On the External Validity of
Experiments in Corruption," in New Advances in Experimental
Research on Corruption. Research in Experimental Economics, Vol. 15,
edited by D. Serra and L. Wantchekon. Bingley, UK: Emerald Group, 2012,
117-44.
Ball, S. B., and P.-A. Cech. "Subject Pool Choice and
Treatment Effects in Economic Laboratory Research." Research in
Experimental Economics, 60, 1996, 239-92.
Bennett, C. "Preliminary Results of the National Research
Program's Reporting Compliance Study of Tax Year 2001 Individual
Returns." Paper presented at the Annual IRS Research Conference,
Washington, D.C., 2005.
Bigoni, M., G. Camera, and M. Casari. "Strategies of
Cooperation and Punishment among Students and Clerical Workers."
Journal of Economic Behavior & Organization, 94, 2013, 172-82.
Bott, K., A. W. Cappelen, E. Ø. Sørensen, and B. Tungodden.
"You've Got Mail: A Randomised Field Experiment on Tax
Evasion." Norwegian School of Economics and Business Administration
Discussion Paper. Oslo, Norway, 2013.
Brewer, M. B. "Research Design and Issues of Validity,"
in Handbook of Research Methods in Social and Personality Psychology,
edited by H. T. Reis and C. M. Judd. Cambridge: Cambridge University
Press, 2000, 3-16.
Brookshire, D" D. Coursey, and W. D. Schulze. "The
External Validity of Experimental Economics Techniques: Analysis of
Demand Behavior." Economic Inquiry, 25(2), 1987, 239-50.
Camerer, C. F. "The Promise and Success of Lab-field
Generalizability in Experimental Economics: A Critical Reply to Levitt
and List." California Institute of Technology, Division of the
Humanities and Social Sciences Working Paper. Los Angeles, CA, 2011.
Campbell, D. T., and J. C. Stanley. Experimental and
Quasi-Experimental Designs for Research. Chicago: Rand McNally College
Publishing, Co., 1966.
Cappelen, A. W., K. Nygaard, E. Ø. Sørensen, and B. Tungodden.
"Efficiency, Equality and Reciprocity in Social Preferences: A
Comparison of Students and a Representative Population." Norwegian
School of Economics and Business Administration Discussion Paper. Oslo,
Norway, 2010.
Carpenter, J., and E. Seki. "Do Social Preferences Increase
Productivity? Field Experimental Evidence from Fishermen in Toyama
Bay." Economic Inquiry, 49(2), 2011, 612-30.
Castro, L., and C. Scartascini. "Tax Compliance and
Enforcement in the Pampas: Evidence from a Field Experiment." IDB
Working Paper Series No. IDB-WP-472. Washington, D.C.: Inter-American
Development Bank, 2013.
Charness, G., and M.-C. Villeval. "Cooperation and Competition
in Intergenerational Experiments in the Field and the Laboratory."
American Economic Review, 99(3), 2009, 956-78.
Chermak, J. M" K. Krause, D. S. Brookshire, and H. S. Burness.
"Moving Forward by Looking Back: Comparing Laboratory Results with
Ex Ante Market Data." Economic Inquiry, 51(1), 2013, 1035-49.
Davis, D. D., and C. A. Holt. Experimental Economics. Princeton,
NJ: Princeton University Press, 1993.
Dyer, D., J. H. Kagel, and D. Levin. "A Comparison of Naive
and Experienced Bidders in Common Value Offer Auctions: A Laboratory
Analysis." Economic Journal, 99(1), 1989, 108-15.
Erard, B., and C.-C. Ho. "Searching for Ghosts: Who Are the
Non-filers and How Much Tax Do They Owe?" Journal of Public
Economics, 81(1), 2001, 25-50.
Falk, A., and J. J. Heckman. "Lab Experiments Are a Major
Source of Knowledge in the Social Sciences." Science, 326(5952),
2009, 535-8.
Fellner, G., R. Sausgruber, and C. Traxler. "Testing
Enforcement Strategies in the Field: Threat, Moral Appeal and Social
Information." Journal of the European Economic Association, 11(3),
2011, 634-60.
Ferber, R., and W. Z. Hirsch. Social Experimentation and Public
Policy. Cambridge: Cambridge University Press, 1982.
Frechette, G. R. "Laboratory Experiments: Professionals versus
Students," in The Methods of Modern Experimental Economics, edited
by G. R. Frechette and A. Schotter. New York: Oxford University Press,
forthcoming.
Gramlich, E. M. "Reflections of a Policy Economist." The
American Economist, 41(1), 1997, 22-30.
Gravelle, J. "Comments on Innovative Approaches to Improving
Tax Compliance." The IRS Research Bulletin, Recent Research on Tax
Administration and Compliance, Selected Papers Given at the 2008 IRS
Research Conference. Washington, D.C., 2009, 59-60.
Greiner, B. "The Online Recruitment System ORSEE 2.0: A Guide
for the Organization of Experiments in Economics." Working Paper
Series in Economics 10, Department of Economics, University of Cologne.
Cologne, Germany, 2004.
Güth, W., and O. Kirchkamp. "Will You Accept Without Knowing
What? The Yes-No Game in the Newspaper and in the Lab."
Experimental Economics, 15(4), 2012, 656-66.
Giith, W" C. Schmidt, and M. Sutter. "Bargaining Outside
the Lab--A Newspaper Experiment of a Three-person Ultimatum Game."
Economic Journal, 117(518), 2007, 449-69.
Harrison, G. W" M. Lau, and E. E. Rutstrom. "Theory,
Experimental Design and Econometrics Are Complementary (And So Are Lab
and Field Experiments)," in The Methods of Modern Experimental
Economics, edited by G. R. Frechette and A. Schotter. New York: Oxford
University Press, forthcoming.
Harrison, G. W., and J. A. List. "Field Experiments."
Journal of Economic Literature, 42(2), 2004, 1009-55.
Heckman, J. J., and J. A. Smith. "Assessing the Case for
Social Experiments." Journal of Economic Perspectives, 9(2), 1995,
85-110.
Internal Revenue Service. "IRS Data Book 2002, Publication
55B." Washington, D.C., 2002. Accessed November 12, 2014.
http://www.irs.gov/pub/irs-soi/02databk.pdf.
--. "Tax Gap for Tax Year 2006." Washington, D.C., 2012.
Accessed November 12, 2014. http://www.irs.
gov/pub/newsroom/overview_tax_gap_2006.pdf.
Iyer, G. S., P. M. J. Reckers, and D. L. Sanders. "Increasing
Tax Compliance in Washington State: A Field Experiment." National
Tax Journal, 63(1), 2010, 7-32.
Kagel, J. H. "Laboratory Experiments," in The Methods of
Modern Experimental Economics, edited by G. R. Frechette and A.
Schotter. New York: Oxford University Press, forthcoming.
Kagel, J. H., and A. E. Roth, ed. The Handbook of Experimental
Economics. Princeton, NJ: Princeton University Press, 1995.
Kessler, J., and L. Vesterlund. "The External Validity of
Laboratory Experiments: The Misleading Emphasis on Quantitative
Effects," in The Methods of Modern Experimental Economics, edited
by G. R. Frechette and A. Schotter. New York: Oxford University Press,
forthcoming.
Kleven, H. J., M. B. Knudsen, C. T. Kreiner, S. Pedersen, and E.
Saez. "Unwilling or Unable to Cheat? Evidence from a Randomized Tax
Audit Experiment in Denmark." Econometrica, 79(3), 2011, 651-92.
Leamer, E. E. "Let's Take the Con Out of
Econometrics." American Economic Review, 73(1), 1983, 31-43.
Levitt, S. D., and J. A. List. "What Do Laboratory Experiments
Measuring Social Preferences Reveal About the Real World?" Journal
of Economic Perspectives, 21(2), 2007, 153-74.
List, J. A. "The Behavioralist Meets the Market: Measuring
Social Preferences and Reputation Effects in Actual Transactions."
Journal of Political Economy, 114(1), 2006, 1-37.
Manski, C. F. "Economic Analysis of Social Interactions."
Journal of Economic Perspectives, 14(3), 2000, 115-36.
McKee, M., J. Alm, T. Cherry, and M. Jones. "Final Report for
TIRNO-07-P-00683 on Behavioral Tax Research." Washington, D.C.,
2008.
Plott, C. R. "Dimensions of Parallelism: Some Policy
Applications of Experimental Methods," in Laboratory
Experimentation in Economics: Six Points of View, edited by A. E. Roth.
New York: Cambridge University Press, 1987, 193-229.
Pomeranz, D. "No Taxation without Information: Deterrence and
Self-Enforcement in the Value Added Tax." NBER Working Paper 19199.
Cambridge, MA: National Bureau of Economic Research, 2013.
Roth, A. E. "Laboratory Experimentation in Economics," in
Advances in Economic Theory, Fifth World Congress, edited by T. Bewley.
Cambridge: Cambridge University Press, 1987, 269-99.
Scheiber, N. "Freaks and Geeks." The New Republic, April 2,
2007, 27-31.
Shadish, W. R., T. D. Cook, and D. T. Campbell. Experimental and
Quasi-Experimental Designs for Generalized Causal Inference. Boston, MA:
Houghton Mifflin, Inc., 2002.
Shogren, J., J. Fox, D. Hayes, and J. Roosen. "Observed
Choices for Food Safety in Retail, Survey, and Auction Markets."
American Journal of Agricultural Economics, 81(5), 1999, 1192-99.
Slemrod, J., M. Blumenthal, and C. Christian. "Taxpayer
Response to an Increased Probability of Audit: Evidence from a
Controlled Experiment in Minnesota." Journal of Public Economics,
79(3), 2001, 455-83.
Smith, V. L. "Experimental Economics: Induced Value
Theory." American Economic Review Papers and Proceedings, 66(2),
1976, 274-9.
JAMES ALM, KIM M. BLOOMQUIST, MICHAEL MCKEE
* Portions of this research were funded by the US IRS
(TIRNO-07-P-00683). The views expressed here are those of the authors
and should not be interpreted as those of the U.S. Internal Revenue
Service. Previous versions of this paper have been presented at the
November 2010 National Tax Association Annual Conference in Chicago, IL,
at the June 2011 Internal Revenue Service--Tax Policy Center Research
Conference in Washington, D.C., and at a seminar at Virginia Tech (April
2013). We are grateful to Charles Christian, John Deskins, Brian Erard,
Elaine Maag, Rosemary Marcus, Alan Plumley, Joel Slemrod, Nic Tideman,
three anonymous referees, and the Editor for helpful comments and
discussions. Alm: Department of Economics, Tulane University, 6823 St.
Charles Avenue, 208 Tilton Hall, New Orleans, LA 70118. Phone 504 862
8344, Fax 504 865 5869, E-mail jalm@tulane.edu
Bloomquist: National Headquarters, Office of Research, U.S.
Internal Revenue Service, 1111 Constitution Avenue NW, Washington, D.C.
20224. Phone 202 874 0171, Fax 202 874 0660, E-mail
kim.bloomquist@irs.gov
McKee: Department of Economics, Walker College of Business,
Appalachian State University, Boone, NC 28608. Phone 828 262 6080, Fax
828 262 6105, E-mail mckeemj@appstate.edu
(1.) See also the alternative perspectives of Leamer (1983),
Heckman and Smith (1995), and Harrison and List (2004).
(2.) There is an extensive literature on the use of field trials in
economic policy. The 1960s was a period of wide use of field trials in a
variety of policy endeavors, including the provision of education
services and income support programs. For many reasons, especially their
costs and their potential for irreversible damages, the use of field
trials has largely been abandoned (Ferber and Hirsch 1982; Gramlich
1997). More recently, field experiments have been increasingly used to
test hypotheses derived from basic theory (Harrison and List 2004; List
2006). In general, the intent of these studies is less to establish the
external validity of an experimental design than to provide a substitute
for the laboratory by introducing social settings to the decision tasks.
(3.) For an especially provocative perspective on the difficulties
of achieving identification, written for a nontechnical audience, see
Scheiber (2007).
(4.) For comprehensive surveys of experimental methods, see Davis
and Holt (1993) and Kagel and Roth (1995).
(5.) The general critique of Levitt and List (2007) has itself been
the subject of energetic critiques. See Falk and Heckman (2009), Camerer
(2011), Armantier and Boly (2012), Kagel (forthcoming), Harrison, Lau,
and Rutstrom (forthcoming), and Frechette (forthcoming).
(6.) For some contrary evidence that reports some differences
between students and nonstudents, see Cappelen et al. (2010) and Bigoni,
Camera, and Casari (2013).
(7.) Note that tax compliance has been the subject of several
controlled field experiments. In a typical experiment, a treatment group
of individuals is randomly selected to receive a letter from the tax
authority suggesting that they will be under special scrutiny, while a
control group of individuals does not receive the letter. Comparison of
the treated group with the control group then gives a measure of the
effectiveness of increased enforcement. One of the first of these field
experiments was performed by Slemrod, Blumenthal, and Christian (2001);
more recent examples include Iyer, Reckers, and Sanders (2010), Kleven
et al. (2011), Fellner, Sausgruber, and Traxler (2011), Pomeranz (2013),
Castro and Scartascini (2013), and Bott et al. (2013).
(8.) For example, if the taxpayer reported $110 in Schedule C net
profits and the NRP examiner determined that the correct amount should
have been $100, then the calculated reporting compliance ratio of 1.1
was recoded to 1.0. Recoding these 29 cases ensures that all 1,101
observations of the reporting compliance ratio fall in the range between
0 and 1, inclusive.
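For concreteness, the recoding rule in footnote 8 amounts to a simple clipping operation. The sketch below (in Python; the function name and structure are ours, not the authors' actual data-processing code) reproduces the $110/$100 example:

```python
# A minimal sketch of the recoding rule in footnote 8; the function name
# is illustrative, not taken from the authors' code.

def compliance_ratio(reported, corrected):
    """Reported-to-correct taxable income ratio, recoded to at most 1.0."""
    return min(reported / corrected, 1.0)

# Footnote 8's example: $110 reported against an NRP-corrected $100.
assert compliance_ratio(110.0, 100.0) == 1.0  # ratio 1.1 recoded to 1.0
assert compliance_ratio(50.0, 100.0) == 0.5   # ratios at or below 1 unchanged
```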
(9.) The term "income per exam" refers to the income that
should have been reported based on the judgment of NRP examiners, and
reflects population weights, as appropriate.
(10.) Note that this Experimental Sample overlaps partially with
the experimental data in Alm et al. (2010, 2012), which we used later
and in which both students and nonstudents were the subjects. For our
analysis in this section, we used only student responses from Alm et al.
(2010, 2012). All of our other experimental studies included only
student participants.
(11.) We used the base case simulations since these observations
excluded behavioral influences potentially induced by the specific
treatments explored in the non-base case simulations, influences that we
believe are likely absent in actual taxpayer behavior. However, as we
report below, our basic findings hold using either the Full Sample or
the Selected Sample of experimental data.
(12.) An individual's compliance rate is computed for each
subject after each round in the Experimental Sample and for each
individual in the NRP-based Taxpayer Sample. We believe using the
unweighted mean for comparison is appropriate here because it is not
possible to construct weights for the Experimental Sample that would
equate this group with the NRP-stratified sample weights.
(13.) The virtually identical values of 0.314 for the base case
Experimental Sample and 0.313 for the TY 2001 Taxpayer Sample are
apparently coincidental, although we found that the point estimate for
the Experimental Sample falls within the 95% confidence interval using
five independent Taxpayer Sample observations. See the discussion in the
text.
(14.) A comparison using the full Experimental Sample gives similar
results. Using the data for the full Experimental Sample found in Table
2, the interpolated compliance rate was 0.331. This value also falls
within the 95% confidence interval calculated using the 5 years of NRP
data.
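The check described in footnotes 13 and 14 can be illustrated with a short calculation: build a t-based 95% confidence interval from the five independent yearly Taxpayer Sample means and ask whether the experimental point estimate falls inside it. In this sketch the five yearly values are placeholders, not the actual NRP figures:

```python
# A hedged sketch of the footnote 13-14 comparison; the yearly means are
# placeholder values, not the underlying NRP data.
from statistics import mean, stdev
from scipy.stats import t

nrp_yearly_means = [0.313, 0.305, 0.320, 0.298, 0.325]  # placeholders
experimental_estimate = 0.314  # base case Experimental Sample (from the text)

n = len(nrp_yearly_means)
m, s = mean(nrp_yearly_means), stdev(nrp_yearly_means)
half_width = t.ppf(0.975, df=n - 1) * s / n ** 0.5
lo, hi = m - half_width, m + half_width

print(f"95% CI from {n} yearly observations: ({lo:.3f}, {hi:.3f})")
print("Experimental estimate inside CI:", lo <= experimental_estimate <= hi)
```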
(15.) An additional comparison between behavior in laboratory
experiments and field data would compare the behavioral elasticities
estimated with laboratory data to the elasticities estimated with field
data. The field data that we have here do not allow us to make these
estimations and comparisons. However, Alm, Jackson, and McKee (1992)
compared behavioral responses to audit and tax rates estimated with
laboratory data to responses estimated with field data, and found very
similar elasticities.
(16.) These studies are hereafter referred to as Alm et al. (2010,
2012), respectively. Because no subject participated in more than one
session, we have a total of 347 subjects for our analysis here, as
discussed later.
(17.) The probability of audit if the individual does not file was
set at zero to reflect the fact that, in most countries, an individual
who does not file faces no effective chance of detection. The actual audit
probability for non-filers in the field may not strictly be zero.
However, there is substantial evidence that this non-filing audit rate
is effectively very close to zero. For example, in the United States,
the IRS conducts audits of non-filers based on tips, on "lifestyle
audits" in which visible expenditures are a flag for an audit, or
through passive income sources such as deposit interest. However, the
frequency of non-filing audits is very low even in the United States, and
in many countries, it is essentially zero (Erard and Ho 2001).
Accordingly, for simplicity, we elected to implement a zero audit
probability in the laboratory setting. Note that this framework required
only that the probability of audit for non-filers be less than the
probability of audit for filers.
(18.) The main intent of Alm et al. (2010, 2012) was to investigate
policies to induce filing when non-filing is possible. Both policies
were found to be effective.
(19.) T1 and T4 were separate "baselines" for each
respective series of experiments, and did not present the same
environment.
(20.) The student portion of the subject pool covered a very broad
range of years of study and majors, and no single major exceeded 8% of
the pool. The staff pool was similarly diverse, covering all levels of
support staff and nonacademic professional staff.
(21.) In fact, we found some differences in the levels of
compliance between the two subject pools across treatments. However, the
changes in compliance in response to the treatment effects were quite
similar for both subject pools, and it is this result that we emphasize
in our discussion.
(22.) In Alm et al. (2010, 2012), a vector of demographic variables
(e.g., gender, subject age, subject own preparation of tax returns,
subject claimed as a dependent on parental tax returns) was included.
However, these variables are highly correlated with participant
membership in the subject pool, and so are not included separately in
the current analysis.
(23.) Note that Income and Wealth are exogenous variables, which
justifies their inclusion as explanatory variables. Income is earned
each period prior to the tax reporting decision, and performance on the
task (sorting nine numbers into the correct order) is uncorrelated with
the tax reporting decision. Similarly, Wealth is accumulated over time
and, given the income-earning task and the random nature of the audits,
is not correlated with past decisions.
(24.) The purposes of Alm et al. (2010, 2012) were narrowly defined
to study possible policy actions of the tax agency in the areas of
information services and positive inducements, and so they did not fully
explore the effects of audit policies.
(25.) Note that we found similar results for filing behavior.
TABLE 1
Summary Statistics for Taxpayer Sample

                                               Unweighted    Weighted
N                                              1,101         559,555
Taxable income as reported
  Mean ($)                                     5,461         3,708
  Standard deviation ($)                       12,081        9,854
  Sum ($ millions)                             6.0           2,075.0
Taxable income that should have been reported
  Mean ($)                                     25,277        16,054
  Standard deviation ($)                       132,064       78,165
  Sum ($ millions)                             27.8          8,983.3
Mean reporting compliance rate
  Mean of individuals                          .313          .242
  Overall mean                                 .216          .231

Note: The data in this table reflect only the "raw" NRP audit
adjustments, and do not account for any unreported income that the
auditors did not detect.
TABLE 2
Summary Statistics for Experimental Sample

Full Sample
Audit          Number of    Number of       Mean Reporting
Probability    Subjects     Observations    Compliance Rate
.00            16           240             .288
.05            180          2,700           .413
.10            356          5,580           .544
.30            298          4,710           .590
.40            222          3,330           .638
Total          1,072        16,560          .551

Selected Sample
Audit          Number of    Number of       Mean Reporting     Overall Mean
Probability    Subjects     Observations    Compliance Rate    Compliance Rate
.00            16           240             .288               .286
.05            48           720             .404               .368
.10            78           1,170           .475               .476
.30            32           480             .558               .536
.40            78           1,170           .672               .668
Total          252          3,780           .521               .517
TABLE 3
Experimental Treatments: Information Services Experiments

                            Information Services Provided?
Tax Liability Uncertain?    No      Yes
No                          T1      --
Yes                         T2      T3
TABLE 4
Experimental Treatments: Positive Inducements via Social Programs
Experiments

            Positive Inducements Provided?
No          Yes, via Tax Credit     Yes, via Unemployment Benefits
T4          T5                      T6
TABLE 5
Experimental Parameters

Parameter                   Values
Income                      Mean = 50, High = 100, Low = 10, Increment = 10
Audit probability           .3 and .4; .0 if Not File is selected
Fine rate                   150%, fixed across all sessions
Tax rate                    35%, fixed across all sessions
Tax deduction               20%, with uncertainty (when present) via a
                            uniform distribution
Tax credit                  Credit = 30 - .6 * Income, with uncertainty
                            via a uniform distribution
Unemployment probability    .2 and .4, fixed for a session
Unemployment benefit        Benefits = .5 or .6 times average reported
                            income over the past 2 periods
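To make the Table 5 parameterization concrete, the sketch below implements the deterministic parts of the tax credit and unemployment benefit formulas. The function names are ours, the uniform-noise components applied in the sessions are omitted, and the zero floor on the credit (which would otherwise go negative above an income of 50) is our assumption:

```python
# A sketch of the deterministic parts of Table 5; names and the credit
# floor are our assumptions, not the experiment's actual code.
TAX_RATE = 0.35   # 35%, fixed across all sessions
FINE_RATE = 1.50  # 150% of unpaid taxes
DEDUCTION = 0.20  # 20% of income (when present)

def tax_credit(income):
    """Expected credit: Credit = 30 - .6 * Income, floored at zero
    (the floor is our assumption)."""
    return max(30.0 - 0.6 * income, 0.0)

def unemployment_benefit(past_reported, rate=0.5):
    """Benefit = rate (.5 or .6) times average reported income over the
    past 2 periods."""
    recent = past_reported[-2:]
    return rate * sum(recent) / len(recent)

print(tax_credit(10))                  # 24.0 at the low income of 10
print(tax_credit(50))                  # 0.0 at the mean income of 50
print(unemployment_benefit([40, 50]))  # 22.5
```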
TABLE 6
Descriptive Statistics for Student and Nonstudent Subjects

Information Services
(T1: 40 students/18 nonstudents; T2: 14 students/20 nonstudents;
T3: 18 students/21 nonstudents)
Metric                      Students    Nonstudents
Age (years)                 20.1        43.8
Gender (% male)             55.0        18.6
Dependent (% yes)           81.9        0
Prepare own tax (% yes)     27.7        44.1
Number of subjects          72          59

Positive Inducements via Social Programs
(T4: 50 students/30 nonstudents; T5: 20 students/38 nonstudents;
T6: 46 students/32 nonstudents)
Metric                      Students    Nonstudents
Age (years)                 20.2        43.9
Gender (% male)             51.7        21.2
Dependent (% yes)           83.6        3.0
Prepare own tax (% yes)     36.2        48.1
Number of subjects          116         100
TABLE 7
Estimates for Reporting Compliance: Information Services Experiments

Dependent variable: tax compliance rate

Independent Variable           Model 1     Model 2     Model 3     Model 4
Constant                       .9200**     .9074**     .9337**     .9162**
                               (.0754)     (.076)      (.078)      (.105)
Period income                  -.0016**    -.0016**    -.0017**    -.0017**
                               (.0004)     (.0004)     (.0003)     (.0003)
Cumulative wealth              -.0003**    -.0003**    -.0003**    -.0003**
                               (.00003)    (.00003)    (.00003)    (.00003)
Audit probability              -.1653      -.1634      -.1105      -.1602
                               (.194)      (.194)      (.183)      (.269)
Lag audit                      .1632**     .1630**     -.0105      -.0104
                               (.022)      (.022)      (.015)      (.015)
Tax liability uncertainty      -.0491**    -.0429*     -.2254**    -.2257**
                               (.023)      (.020)      (.042)      (.042)
Tax agency information         .0636**     .0622**     .0752**     .0752**
                               (.0255)     (.024)      (.032)      (.035)
Student                                    .0222       -.0426      -.0117
                                           (.020)      (.037)      (.135)
Student x Tax liability                                .3071**     .3078**
  uncertainty                                          (.057)      (.057)
Student x Tax agency                                   .0780       .0778
  information                                          (.068)      (.068)
Student x Audit probability                                        -.0889
                                                                   (.363)
Wald χ2                        228.06**    229.43**    245.58**    247.40**
Panels                         131         131         131         131
N                              2489        2489        2489        2489

Notes: Panel estimations with clustered (subject-level) standard errors.
The dependent variable is the ratio of reported taxes to true taxes of
individual i in period t. * and ** indicate significance at the 5% and
1% levels, respectively.
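The estimation described in the notes to Tables 7 and 8 (panel estimation with subject-level clustered standard errors) can be approximated as in the sketch below. This is a generic illustration using pooled OLS with clustered errors in statsmodels, run on synthetic stand-in data; it is not the authors' estimation code, and the variable names only mirror the tables:

```python
# A generic approximation of panel estimation with subject-level clustered
# standard errors; the data frame is synthetic, not the authors' data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_subj, n_rounds = 20, 15
df = pd.DataFrame({
    "subject_id": np.repeat(np.arange(n_subj), n_rounds),
    "period_income": rng.choice([10, 30, 50, 70, 100], n_subj * n_rounds),
    "audit_probability": np.repeat(rng.choice([0.3, 0.4], size=n_subj), n_rounds),
    "lag_audit": rng.integers(0, 2, n_subj * n_rounds),
    "student": np.repeat(rng.integers(0, 2, size=n_subj), n_rounds),
})
df["compliance_rate"] = rng.uniform(0, 1, len(df))  # placeholder outcome

# Pooled OLS with standard errors clustered at the subject level.
model = smf.ols(
    "compliance_rate ~ period_income + audit_probability + lag_audit + student",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["subject_id"]})
print(model.summary())
```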
TABLE 8
Estimates for Reporting Compliance: Positive Inducements via Social
Programs Experiments

Dependent variable: tax compliance rate

Independent Variable           Model 1     Model 2     Model 3     Model 4
Constant                       .5429**     .5028**     .5537**     .4842**
                               (.064)      (.081)      (.083)      (.069)
Period income                  -.0007*     -.0007*     -.0007*     -.0007*
                               (.0004)     (.0004)     (.0003)     (.0003)
Cumulative wealth              -.0001**    -.0001**    -.0001**    -.0001**
                               (.00003)    (.00003)    (.00003)    (.00003)
Audit probability              .0807       .0906       .1072       .1224
                               (.132)      (.132)      (.131)      (.156)
Lag audit                      .0051       .0049       .0034       .0050
                               (.015)      (.015)      (.015)      (.0154)
Tax credit                     .1131**     .1376**     .1259**     .1392**
                               (.047)      (.055)      (.053)      (.057)
Unemployment benefit           .2468**     .2541**     .2364**     .2382*
                               (.088)      (.089)      (.112)      (.118)
Student                                    .0475       .0103       -.1053
                                           (.054)      (.0864)     (.122)
Student x Tax credit                                   .1310       .1313
                                                       (.098)      (.087)
Student x Unemployment                                 .1182       .1236
  benefit                                              (.103)      (.102)
Student x Audit probability                                        .3250
                                                                   (.253)
Wald χ2                        803.52**    845.66**    894.84**    904.90**
Panels                         216         216         216         216
N                              4104        4104        4104        4104

Note: Panel estimations with clustered (subject-level) standard errors.
The dependent variable is the ratio of reported taxes to true taxes of
individual i in period t. * and ** indicate significance at the 5% and
1% levels, respectively.