Norms of punishment: experiments with students and the general population.
Bortolotti, Stefania; Casari, Marco; Pancotto, Francesca et al.
I. INTRODUCTION
The issue of external validity of laboratory experiments has
received increasing attention in recent decades. While the vast
majority of experiments are conducted with a convenience sample of college
students, it remains an open question whether the behavior observed in
such studies is informative about society at large. Here we focus on
experiments on social dilemmas to study other-regarding preferences and
civic norms of cooperation. We compare cooperation levels in two
distinct subject pools originating from the same geographical area. One
sample was drawn from the student population of a large, public
university (Student treatment). The other sample was drawn from the
general adult population (Representative treatment) and stratified
according to gender, age, and employment status. Everyone participated
in public good games with and without opportunities for peer punishment.
Many contributions in the literature compare the
behavior of students with that of other pools of participants, with the
explicit aim of testing the external validity of laboratory experiments. (1) The
existing evidence appears to suggest that students are less prosocial
than other subject pools. For instance, in a prisoner's dilemma,
students cooperate less than white-collar workers (Bigoni, Casari, and
Camera 2012) or bicycle messengers (Burks, Carpenter, and Goette 2009).
Similarly, students are less prosocial than rural and urban citizens in
a public good game (Gachter and Herrmann 2011), than rural villagers in
the appropriation of common-pool resources (Cardenas 2005), and than
employees in a dictator game (Carpenter, Burks, and Verhoogen 2005;
Dragone, Galeotti, and Orsini 2013). The gap remains when one compares
students and professionals, that is, self-selected subjects with a high
degree of expertise who ordinarily deal with situations resembling the
experimental task. The evidence includes studies on voluntary
contributions to a public good among elected officials (Butler and
Kousser 2013) and shrimp fishermen (Carpenter and Seki 2011); threshold
public goods without refunding among nurses (Bram Cadsby and Maynes
1998); and trust games among CEO principals (Fehr and List 2004). (2)
More generally, the most appropriate pool of participants
depends both on the task and on the goal of the study. For instance,
contractors may be a better sample than college students for the
external validity of auction experiments (Dyer, Kagel, and Levin 1989)
and villagers may be a better sample than city dwellers for experiments
about the management of a renewable natural resource (Cardenas 2005).
Contractors and villagers are more appropriate than students in these
cases because they are more familiar with the experimental task and
because they are the ones who actually make the decisions in the field,
which makes their behavior more relevant. Both aspects boost the
external validity of the experimental results. (3)
The goal of the study is also relevant. When studying issues of
bounded rationality, for instance, one may prefer participants with very
high or very low cognitive skills, depending on the initial conjecture
to be tested. High-score participants may be preferred when collecting
evidence about the presence of a bound to rationality. Showing that game
theorists choose numbers in a guessing game away from the Nash
equilibrium prediction provides more compelling evidence about the
descriptive inaccuracy of the theory than using a representative sample
of the general population (Camerer 2003). Conversely, when participants with
low cognitive skills succeed at a task, this suggests that the task is not
too demanding. In short, to enhance the external validity of
experimental results, one should recruit participants with a bias
against the initial conjecture.
As we are interested in the norms of cooperation and punishment of
society broadly defined, the most appropriate pool of participants would
be a representative sample of the population at large. One reason is
that civic norms of cooperation are likely to be an emergent property of
a society. Consider, for instance, a society made up of young
and old citizens. Cooperation and punishment behaviors can develop in
different ways because of two driving forces: first, young and old
citizens may follow different group-specific norms; second, the same
individuals may behave differently when facing only people from the same
age group or when interacting in a mixed group. For instance, youngsters
may follow one norm when interacting with peers, but they may behave
differently when they interact with elderly people. When deciding
whether to punish or not, a young person may have no hesitations if the
target is another young person (i.e., in-group), but she may refrain
from punishing an old person (i.e., out-group). The propagation of
group-specific norms in a society can depend both on the relative size
of each group and on the interaction of in- and out-group norms. Hence,
the civic norms of a society cannot be reduced to the sum of the
behavior of specific subsamples.
Economic experiments conducted with a representative sample of the
population are rare. Recruiting such samples is difficult
because of logistical and technical issues. In addition, payments must be
higher to compensate participants' opportunity costs. Table 1
summarizes experiments comparing students and a sample of the general
population. (4)
We contribute to the current literature by ensuring high
methodological standards and comparability across participant pools.
Following Harrison, Lau, and Williams (2002), Bellemare and Kroger
(2007), Falk, Meier, and Zehnder (2012), and Cappelen et al. (2010), we
specified ex-ante stratification variables and quotas. To the best of
our knowledge, this study and Cappelen et al. (2010) are the
only experiments conducted in a laboratory to compare a student sample
with a stratified sample. We conducted the experiment by following the
same procedures for both students and the general population. Finally,
in order to increase comparability across subsamples, our study,
together with Falk, Meier, and Zehnder (2012), restricts participation
of the student and representative samples to those subjects resident in
a given region.
Two more technical issues emerge when running experiments outside
the group of college students. First, there can be logistical challenges
when running multiple rounds. Unlike most of the previous studies that
focused on one-shot experiments, we collected repeated measures of
cooperation--with and without punishment--to investigate whether
differences in contribution norms evolve over time or remain stable.
One-shot experiments may capture the initial other-regarding disposition
but not the reaction to others' choices. The second issue is the
subjects' level of understanding of the rules of interaction.
Participants with little formal education may struggle to grasp a
situation described in a formal and abstract manner. Instructions that
suit a college audience well may be obscure to ordinary people.
Misunderstanding of the instructions may then be responsible for
behavioral differences across subject pools.
We report three main findings. First, without punishment, in line
with previous evidence, we found that the general population cooperated
more than college students. Second, this result does not survive the
introduction of peer punishment: the opportunity to
punish increased cooperation among college students but not in the
general population. Third, this outcome did not stem from a lack of
punishment, as the general population sample punished more than the
student sample. Punishment did not promote cooperation among the general
population because it was frequently directed toward cooperators rather
than free-riders. Previous studies have shown that there exist wide
variations in punishment norms across societies: peer punishment
opportunities enable some societies to overcome collective action
problems, whereas they lead other societies into feuds and revenge that harm
cooperation (Henrich et al. 2010; Herrmann, Thoni, and Gachter 2008;
Ostrom, Walker, and Gardner 1992). Here we show experimentally that,
even within the same culture, punishment has a beneficial or a
detrimental effect on cooperation depending on the subsample of the
population involved. The remainder of the article is organized as
follows. Section II describes the characteristics of the subject pools,
the experimental tasks, and the procedures; Section III presents the main
results on cooperation and punishment; finally, Section IV discusses
the results and concludes.
II. PARTICIPANTS AND DESIGN
The experiment comprises two treatments--Representative and
Student--that vary only according to the composition of the participant
pool. All participants, regardless of the treatment, were born within
the Emilia-Romagna region (Italy). This information was common knowledge
and could help subjects to form more accurate expectations about norms
and others' behavior. This restriction was explicitly
stated during the recruitment process and publicly announced by the
experimenter at the beginning of each session.
The Representative sample was recruited among the general adult
population by two professional companies, both unaware of the goal of
the research. The companies contacted people by phone, drawing on
telephone directories and private databases, and through a local recruiter.
Recruiters were provided with a script to approach potential
participants. (5) To be eligible, subjects had to: (a) be at least 18
years old; (b) be born within the province of Ravenna (6); and (c) be
resident within the province of Ravenna. The sample was stratified
according to age (18-39, 40-59, 60 or older), sex, and employment status
(employed, homemakers or retired, others--including students and
unemployed). The target quotas for each category were defined according
to the composition of the Italian population. (7) To favor wider
participation, the subjects received a 30-Euro fuel voucher as a show-up
fee in addition to the earnings gained during the sessions.
The Student sample was recruited among the students of the
University of Bologna. The University of Bologna has around 90,000
students with campuses in four of the eight provinces of the
Emilia-Romagna region. Only students who were born in Emilia-Romagna were
invited and could take part in the study. (8) Invitations were sent to
subjects present in the ORSEE (Greiner 2004) database of the Bologna
Laboratory for Social Sciences (BLESS) at the time of the experiment.
(9) This sample comprises a standard participant pool of college
students, which is roughly balanced between Humanities, Science, and
Economic and Business majors. (10)
Table 2 reports the sociodemographic characteristics of the two
samples. While gender composition is similar, age and employment
compositions differ widely. In the Student sample the vast majority of
participants are aged between 18 and 39, whereas in the Representative
sample the largest share of participants (44.7%) belongs to the 40-59
category and the
remaining subjects are equally distributed across 18-39 and 60 or above.
About half of the participants in the Representative sample are employed
and about 13% are students. The overwhelming majority of participants
reported in the questionnaire that they were at least second-generation
natives of the region. Participants share deep-rooted geographical
origins, which may suggest shared social norms: about 87% of the
participants in the Representative sample and 84% in the Student
sample have one or both parents born in the region. (11)
Each session included a series of repeated Public Goods Games (PGG)
with and without punishment (within-subjects design). (12) Tasks were
presented in a fixed order in all sessions: each subject first played 8
periods of a PGG-Standard and then 8 periods of a PGG-Punishment. We
followed this order to help the general population to better understand
the punishment mechanism, which could have been more difficult to grasp
had it been presented first. (13) Before each period, participants were
divided into groups of N = 4 under a strangers-matching protocol.
Interaction was anonymous and there was no possibility to build an
individual reputation: a subject could not verify whether the same
participant was in his/her group in the following periods.
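The regrouping step can be illustrated with a minimal Python sketch. It assumes that groups are formed by a uniform random shuffle before each period, which the text does not specify; the function name `rematch` and the seed are hypothetical.

```python
import random

def rematch(subject_ids, group_size=4, rng=None):
    """Randomly partition the subjects of a session into groups for one period.

    Assumption: uniform random shuffling; the session size must be a
    multiple of the group size.
    """
    ids = list(subject_ids)
    if len(ids) % group_size != 0:
        raise ValueError("session size must be a multiple of the group size")
    if rng is None:
        rng = random.Random()
    rng.shuffle(ids)
    return [ids[i:i + group_size] for i in range(0, len(ids), group_size)]

# Hypothetical session of 20 subjects, regrouped before each of 8 periods.
rng = random.Random(0)
schedule = [rematch(range(20), rng=rng) for _ in range(8)]
```

Because subjects are reshuffled every period and identities are not displayed, a participant cannot track any specific group member across periods.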
In the PGG-Standard, each subject received an endowment of
$w_i = 20$ tokens and had to decide simultaneously how to allocate
those tokens between a group account ($x_i$) and a private account
($w_i - x_i$). Each group comprised N = 4 members and contributions to
the group account could only take four levels, $x_i \in \{0, 6, 14, 20\}$.
Individual earnings were determined as follows:
$$\pi_i^1 = w_i - x_i + a \sum_{j=1}^{N} x_j$$
where the marginal per capita return (MPCR) of the public good was
a = 0.5. At the end of each period, a subject could observe individual
contributions and earnings for each group member. Earnings cumulated
from one period to the next.
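The earnings rule of the PGG-Standard can be written as a short Python sketch. The parameter values (endowment of 20, MPCR of 0.5, four admissible contribution levels) come from the text; the function and constant names are hypothetical.

```python
ENDOWMENT = 20             # tokens received by each subject in each period
MPCR = 0.5                 # marginal per capita return of the public good (a)
LEVELS = (0, 6, 14, 20)    # admissible contributions to the group account

def standard_payoffs(contributions):
    """Return first-stage earnings for every member of one group."""
    if any(x not in LEVELS for x in contributions):
        raise ValueError("contributions must be one of 0, 6, 14, 20")
    group_return = MPCR * sum(contributions)
    # Each member keeps the tokens not contributed and receives the same
    # return from the group account.
    return [ENDOWMENT - x + group_return for x in contributions]

# One group with one member at each contribution level.
earnings = standard_payoffs([0, 6, 14, 20])  # [40.0, 34.0, 26.0, 20.0]
```

In this example the free-rider earns twice as much as the full contributor, which is the tension the game is designed to create.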
The PGG-Punishment was identical to the PGG-Standard but for the
addition of a second stage in which subjects had the opportunity to
reduce, at a cost, the earnings of the other group members. After
receiving feedback on individual contributions, every subject could
assign $p_i \in \{0, 1, 2\}$ deduction points to each group member; a
deduction point had a cost of 1 token for the punisher and reduced the
earnings of the targeted subject by $b = 4$ tokens. Punishment decisions
were simultaneous and earnings were computed as follows:
$$\pi_i = \pi_i^1 - b \sum_{j \neq i}^{N} p_j^i - \sum_{j \neq i}^{N} p_i^j.$$
At the end of each period, a subject could observe the deduction
points he/she received and his/her final earnings. The punisher's
identity was not revealed.
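The second-stage earnings can be sketched in Python as follows. The sketch assumes that each deduction point removes 4 tokens from the target (our reading of the parameter b) and costs the punisher 1 token; the function name and the matrix layout `points[i][j]` are hypothetical.

```python
PUNISH_COST = 1     # tokens paid by the punisher per deduction point
PUNISH_IMPACT = 4   # tokens removed from the target per point (assumed value of b)

def final_payoffs(stage_one, points):
    """Apply the punishment stage to first-stage earnings.

    points[i][j] is the number of deduction points (0, 1, or 2) that
    subject i assigns to subject j.
    """
    n = len(stage_one)
    if any(points[i][j] not in (0, 1, 2) for i in range(n) for j in range(n) if i != j):
        raise ValueError("deduction points must be 0, 1, or 2")
    result = []
    for i in range(n):
        received = sum(points[j][i] for j in range(n) if j != i)
        assigned = sum(points[i][j] for j in range(n) if j != i)
        result.append(stage_one[i] - PUNISH_IMPACT * received - PUNISH_COST * assigned)
    return result

# The full contributor (index 3) assigns 2 points to the free-rider (index 0).
points = [[0, 0, 0, 0],
          [0, 0, 0, 0],
          [0, 0, 0, 0],
          [2, 0, 0, 0]]
earnings = final_payoffs([40, 34, 26, 20], points)  # [32, 34, 26, 18]
```

Under these assumptions, punishing lowers the earnings of both the target and the punisher, so it is never payoff-improving within a period.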
In a one-shot interaction, it is a dominant strategy for rational
self-interested subjects to contribute zero in both PGG-Standard and
PGG-Punishment, because the marginal per capita return of the public
good is below 1, and to assign zero deduction points in
PGG-Punishment, because punishing is costly. Because the marginal per
capita return is above 1/N, group surplus is instead maximized when
everyone contributes their whole endowment and never punishes.
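The two benchmarks can be checked by brute force over the admissible contribution levels. The following Python sketch uses the parameters stated in the text and ignores the punishment stage; the function names are hypothetical.

```python
from itertools import product

ENDOWMENT, MPCR, LEVELS, N = 20, 0.5, (0, 6, 14, 20), 4

def payoff(own, others):
    """First-stage earnings of one subject given the others' contributions."""
    return ENDOWMENT - own + MPCR * (own + sum(others))

# Contributing zero is a strict best response to every profile of the others.
zero_dominant = all(
    payoff(0, others) > payoff(x, others)
    for others in product(LEVELS, repeat=N - 1)
    for x in LEVELS if x > 0
)

def group_surplus(profile):
    """Sum of earnings over all group members for one contribution profile."""
    return sum(payoff(x, profile[:i] + profile[i + 1:]) for i, x in enumerate(profile))

# Search all 4^4 profiles for the one with the highest total earnings.
best_profile = max(product(LEVELS, repeat=N), key=group_surplus)
```

Each contributed token returns only 0.5 tokens to the contributor but 2 tokens to the group as a whole, which is why the individual and collective optima diverge.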
The study comprised eight experimental sessions, equally divided
across treatments for a total of 212 subjects. The number of participants
per session ranged between 20 and 32, and the laboratory hardware and
set-up were
identical across subject pools and locations. The same experimenter read
the instructions in all sessions. Representative sessions were held in
Faenza in a large hotel conference room in the city center, where we
deployed the mobile BLESS. Student sessions took place in Bologna at the
permanent BLESS laboratory. (14)
In an effort to make the task more intuitive, we relied largely on
graphical elements. (15) To accommodate elderly people unfamiliar with
computers, all choices could be made by simply touching the screen (see
sample screens in Appendix SI, Supporting Information), so there was
no need to type or use a mouse. At the end of the session,
subjects filled in a questionnaire. The average Student (Representative)
session lasted about 90 (120) minutes. Subjects were paid in private at
the end of the session. The experiment paid 1 Euro for every 40 tokens
earned. There was no show-up fee in the Student sessions and a 30-Euro
fuel voucher in the Representative sessions, under the assumption of a
lower opportunity cost for students than for the general adult
population. Average per-capita earnings were 19.50 Euros in the Student
sessions and 17 Euros (plus the show-up fee) in the Representative
sessions.
III. RESULTS
We report five main results; we first consider aggregate behavior
(Results 1, 2, and 3) and then present the evolution of contributions
and punishment norms over time (Results 4 and 5).
In the PGG-Standard, how do observed contribution levels in the
student population compare to the ones observed in the representative
population?
RESULT 1. The representative sample cooperates more in the standard
Public Goods Game than the student sample.
The average cooperation level over the eight periods was 9.1 in the
Representative and 6.8 in the Student treatment. Support for Result 1 is
provided by Figure 1 and an ordered logit regression, where the
dependent variable is the contribution level of a subject in a period
(Table 4, Model 1). (16) The main explanatory (dummy) variable
Representative sample has a positive and highly significant coefficient,
hence suggesting that the general public cooperates more than college
students. To account for subjects' understanding, we also included
the dummy Low understanding, which is based on the number of
mistakes in the control questions and the time taken to answer
all questions correctly. The dummy takes value 1 for subjects in the last
decile of the distribution according to either the number of mistakes or
the total answering time. Our results are robust to alternative ways to
model understanding: in Model 2 we included a dummy that takes value 1
for subjects who made 4 or more mistakes in the control questions and 0
otherwise. While subjects who made more mistakes contribute
significantly more in the PGG-Standard, the difference between student
and representative sample remains large and significant. (17)
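The construction of the Low understanding dummy can be sketched in Python. The sketch assumes that "last decile" means above the 90th percentile of either distribution and uses the default interpolation of `statistics.quantiles`; the exact cut-off rule is not stated in the text, and the data are hypothetical.

```python
from statistics import quantiles

def low_understanding(mistakes, seconds):
    """Flag subjects above the 90th percentile in mistakes or answering time."""
    mistake_cut = quantiles(mistakes, n=10)[-1]   # last of the nine decile cut points
    time_cut = quantiles(seconds, n=10)[-1]
    return [int(m > mistake_cut or s > time_cut) for m, s in zip(mistakes, seconds)]

# Hypothetical data for 20 subjects: two make many mistakes, two others are slow.
mistakes = [0] * 18 + [5, 6]
seconds = list(range(119, 99, -1))   # subject 0 is the slowest, subject 19 the fastest
flags = low_understanding(mistakes, seconds)
flagged = [i for i, f in enumerate(flags) if f]   # [0, 1, 18, 19]
```

Because the dummy uses an either-or rule, it can flag more than 10% of subjects when the slowest respondents are not the ones who make the most mistakes.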
When following a very conservative approach and considering each
session as an independent observation, the difference in contributions
across subject pools in PGG-Standard is not statistically significant
(Mann-Whitney rank-sum, p = .149, $N_R = N_S = 4$, two-sided).
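Session-level tests rest on only four independent observations per treatment. A minimal Python sketch of a two-sided rank-sum test under the normal approximation follows, assuming no ties and no continuity correction; the text does not state how the p-values were computed, so this choice is an assumption, and the session averages in the example are hypothetical.

```python
import math

def rank_sum_p(sample_a, sample_b):
    """Two-sided Mann-Whitney p-value via the normal approximation.

    Assumptions: no ties across samples and no continuity correction.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # U counts the pairs in which the observation from A exceeds the one from B.
    u = sum(1 for a in sample_a for b in sample_b if a > b)
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - mean_u) / sd_u
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical session averages with complete separation between treatments.
p_separated = rank_sum_p([6.1, 6.5, 7.0, 7.4], [8.2, 8.8, 9.3, 9.9])
```

With four observations per group, complete separation gives p of about .021 under this approximation, whereas the exact permutation p-value for the same configuration would be 2/70. This illustrates why the session-level test is described as very conservative.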
[FIGURE 1 OMITTED]
[FIGURE 2 OMITTED]
In PGG-Punishment, how do contribution levels observed in a student
population compare to contributions in the general population?
RESULT 2. With punishment, the representative sample cooperates
less than the student sample. The opportunity of peer punishment
enhances cooperation levels in the student sample but not in the
representative sample.
The introduction of peer punishment reverses the ranking of treatments:
the general population contributes less than students. Average
cooperation in the PGG-Punishment was 8.9 in the Representative and 12.8
in the Student treatment. Support for Result 2 comes from Table 4 and
Figure 2.
The difference across subject pools in the PGG-Punishment is highly
significant according to an ordered logit regression on individual
contributions (Table 4, Model 4). The negative coefficient of the
explanatory variable Representative sample lends support to the evidence
that students are more cooperative than the general public in the
PGG-Punishment. The difference is also statistically significant
according to a Mann-Whitney rank-sum test (p = .021, $N_R = N_S = 4$,
two-sided). Moreover, the opportunity of peer punishment enhances
cooperation levels in the student sample but not in the representative
sample (Wilcoxon signed-rank test, two-sided: p = .068 for students,
$N_{S\text{-PGG-std}} = N_{S\text{-PGG-punish}} = 4$; p = .465 for the
representative sample, $N_{R\text{-PGG-std}} = N_{R\text{-PGG-punish}} = 4$).
(18) To illustrate this outcome,
we plotted individual average contributions in the two variants of the
PGG by subject (Figure 2). About 82% of students contribute on average
more with than without punishment opportunities (vs. 32% in the
representative sample). The upward shift in students' contributions
is present for free-riders and contributors alike.
We now consider individual decisions over time to better understand
the underlying dynamics of cooperation. Our experiment offers repeated
measures of cooperation, which allows us to analyze the initial
contribution levels as well as the dynamics of contribution and
punishment over time (Figure 1).
RESULT 3. Cooperation in the initial period is indistinguishable
between representative and student samples both with and without
opportunities to punish.
Table 3 and Figure 1 provide support for Result 3. In the
PGG-Standard, individual contributions in the first period are not
significantly different across subject pools (Mann-Whitney rank-sum, p =
.614, $N_R = 108$, $N_S = 104$, two-sided). The same conclusion
holds for the PGG-Punishment (p = .169, $N_R = 108$, $N_S = 104$,
two-sided). (19) We also regressed contributions in the first
period on the dummy Representative sample (see Table A1 in Appendix);
differences across treatments are not
statistically significant for either PGG-Standard (Model 1) or
PGG-Punishment (Model 2).
As shown in Figure 1, differences across treatments emerged over
time. While in the first period, the two pools are indistinguishable, in
the last period of the PGG-Standard, the representative sample shows a
cooperation level more than twice as large as the student sample (7.9
and 3.3, respectively). In particular, cooperation among students
unravels rather quickly, whereas the general population manages to
sustain a more stable contribution level. Support for this finding is
provided in Table 4 (Model 3). The negative coefficient for Period
reasserts the presence of a declining trend in the PGG-Standard, whereas
the positive coefficient in the interaction term indicates that the
decline in the Representative treatment is less pronounced than in the
Student treatment. The dynamics in the PGG-Punishment were exactly the
opposite (see Model 6); contributions tend to increase over time and the
upward trend is more marked in the Student than in the Representative
treatment.
What drives these different trends in cooperation across games and
subject pools? To answer this question, in the last part of this section
we will focus on individual decisions to contribute and punish. We first
consider whether the reaction to others' contributions--that is,
conditional cooperation--is the same across treatments. Are the
adjustment dynamics the same in our two participant pools?
RESULT 4. In the representative sample, current contributions
depend less on observed past contributions than in the student sample.
We consider an indirect measure of conditional cooperation
(Fischbacher, Gachter, and Fehr 2001; Kocher et al. 2008) and test how
current contributions adjust to previous contributions made by others.
(20) Here we mostly focus on the PGG-Standard, which in our view provides
a cleaner test of conditional cooperation. Indeed, in the PGG-Punishment
previous contributions are likely to be connected with punishment and
not just with cooperative behavior. (21)
Table 5 (Models 1 to 3) lends support to Result 4 for the
PGG-Standard. In all specifications, the dependent variable is the
contribution level at time t for each subject. In the first two models
we consider each sample separately and the regressor of interest is the
sum of other group members' contributions in period t - 1
(Others' contributions in t - 1). (22) In PGG-Standard,
Others' contributions in t - 1 has a positive and highly
significant impact on the student sample but is not significant in the
representative sample (Models 1 and 2 in Table 5, respectively). This
result is also confirmed in the pooled sample (Model 3). (23)
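The regressor Others' contributions in t - 1 can be built from period-level records, as in the following Python sketch. It assumes that the relevant group is the one the subject belonged to in period t - 1 (the contributions the subject actually observed), which matters under strangers matching; the record layout and the function name are hypothetical.

```python
from collections import defaultdict

def others_lagged(records):
    """Map (subject, period) to the contributions observed in the previous period.

    records: iterable of (subject, period, group, contribution) tuples.
    The value is the sum contributed by the other members of the group
    the subject belonged to in period - 1.
    """
    records = list(records)
    group_total = defaultdict(int)
    for _, period, group, x in records:
        group_total[(period, group)] += x
    observed = {(s, p) for s, p, _, _ in records}
    lagged = {}
    for subject, period, group, x in records:
        # Keep the lag only if the subject is also observed in the next period.
        if (subject, period + 1) in observed:
            lagged[(subject, period + 1)] = group_total[(period, group)] - x
    return lagged

# Hypothetical data: one group of four subjects observed for two periods.
data = [(s, 1, "g1", x) for s, x in enumerate([0, 6, 14, 20])]
data += [(s, 2, "g1", 6) for s in range(4)]
lags = others_lagged(data)   # {(0, 2): 40, (1, 2): 34, (2, 2): 26, (3, 2): 20}
```

A positive coefficient on this variable indicates that subjects raise their contribution after observing high contributions by others, which is the indirect measure of conditional cooperation used here.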
Models 4 to 6 in Table 5 replicate the same analysis for the
PGG-Punishment. Both pools tend to adjust to observed contributions.
However, the difference in conditional cooperation between the two
samples is less pronounced in the PGG-Punishment than in the
PGG-Standard: the coefficient of interaction Others' contributions
in t - 1 x Representative sample is indeed negative, although not
significant (see Model 6).
We now turn to the analysis of punishment behavior. The
differential impact of punishment on the two subject pools may be the
result of different amounts of punishment or different types of
punishment. We say punishment is prosocial when the target of the
punishment is a free-rider; conversely, we say punishment is antisocial
when the target is a high contributor. Does the representative pool
punish less than the student pool? Or does the representative pool
punish differently from the student pool?
[FIGURE 3 OMITTED]
RESULT 5. The representative sample punishes no less than the student
sample but engages more in antisocial punishment.
Support for Result 5 is presented in Figure 3 and Tables 6 and 7.
The extent of punishment is similar across treatments and, if anything,
it is higher in the representative than in the student sample (7.2 vs.
5.9). (24) Hence, the absence of a positive effect of punishment on
cooperation levels in the representative sample must stem from reasons
other than lack of punishment. The data suggest an explanation based on
differences in the target of the punishment as well as in the response
to the received punishment.
Punishment of free-riders is heavier in the Student than in the
Representative treatment (15.8 vs. 12.1 average points of punishment),
whereas the opposite is true for punishment of full cooperators (1.9 vs.
3.0). These differences in punishment are statistically significant
according to a logit regression (Table 6). Moreover, there is no element
that points to lack of understanding as a driver of punishment (Table
6); if anything, subjects with a lower level of understanding tend to
engage in more prosocial and less antisocial punishment as compared to
subjects that did best in the control questions. Figure 3 illustrates
this pattern. The steeper line indicates more favorable incentives for
cooperation.
Another way to measure punishment preferences is the level of
prosocial versus antisocial punishment. In line with other studies,
prosocial punishment is more frequent than antisocial punishment but the
ratio is very different in the representative and in the student sample
(2.5:1 vs. 4.8:1, respectively). Notice that this treatment difference
is present from the first period of interaction, which suggests that
revenge is not enough to account for antisocial punishment.
Those who deviate from the average group contribution are punished
significantly more, and punishment is more severe for less-than-average
contributions (i.e., a negative deviation) as compared to
more-than-average contributions (see Models 1 and 2 in Table 7). Sign
and magnitude of these coefficients are consistent with similar studies
using a strangers-matching protocol (see Fehr and Gachter 2000). (25)
When pooling all samples, the evidence suggests again that the
representative sample sanctions relatively fewer free-riders and
relatively more contributors than the student sample (Models 3 and 4 in
Table 7). This pattern could have discouraged cooperation and might
explain the weak impact of punishment within the representative sample.
In the representative sample there is significantly less punishment of
defectors than in the student sample (Negative deviation (abs) x
Representative).
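The deviation regressors can be sketched in Python as follows. The sketch assumes that deviations are measured from the average contribution of the whole group, including the target; the text does not say whether the target is excluded from the average, and the function name is hypothetical.

```python
def deviation_regressors(contributions):
    """Return (negative deviation in absolute value, positive deviation) per member.

    Assumption: deviations are taken from the mean contribution of the
    whole group, target included.
    """
    mean = sum(contributions) / len(contributions)
    rows = []
    for x in contributions:
        dev = x - mean
        rows.append((max(0.0, -dev), max(0.0, dev)))
    return rows

# One group with one member at each contribution level (group mean = 10).
rows = deviation_regressors([0, 6, 14, 20])
# [(10.0, 0.0), (4.0, 0.0), (0.0, 4.0), (0.0, 10.0)]
```

Splitting the deviation into two non-negative variables lets the regression estimate separate slopes for punishment of low and of high contributors, and interacting each with the Representative dummy captures the treatment difference.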
Besides shifting the target of punishment, the representative
sample also responds weakly to punishment received. The evidence comes
from logit regressions on the variations over time in the contribution
levels of free-riders and full cooperators (Table 8). More specifically,
the dependent variable takes value 1 if the contribution level in period
t is different from t - 1, 0 otherwise. Free-riders who receive
punishment do not subsequently increase their cooperation level; and
full cooperators who receive punishment do not decrease their
cooperation level (Models 1 and 4). These results stand in sharp
contrast to the behavior of the student sample, which strongly reacts to
punishment (Models 2 and 5). The treatment differences are significant
(see Deduction received in t - 1 x Representative in Models 3 and 6).
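The dependent variable of these regressions can be constructed as in the following Python sketch. It assumes that free-riders and full cooperators are identified by their contribution in period t - 1 (0 or 20, respectively), which the text does not state explicitly; the function name and the data are hypothetical.

```python
def change_rows(contributions, deductions_received, prior_level):
    """Build (period index, changed, deduction received in t - 1) rows for one subject.

    Only periods in which the subject contributed `prior_level` in the
    previous period are kept (0 for free-riders, 20 for full cooperators).
    """
    rows = []
    for t in range(1, len(contributions)):
        if contributions[t - 1] != prior_level:
            continue
        changed = int(contributions[t] != contributions[t - 1])
        rows.append((t, changed, deductions_received[t - 1]))
    return rows

# Hypothetical subject observed for five periods (period index starts at 0).
contributions = [0, 0, 6, 0, 14]
deductions = [0, 2, 1, 3, 0]
rows = change_rows(contributions, deductions, prior_level=0)
# [(1, 0, 0), (2, 1, 2), (4, 1, 3)]
```

A subject who reacts to punishment would show `changed` equal to 1 mainly in rows where the lagged deduction is positive.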
A comparison between the behavior of the student sample and that of the
young subjects in the representative sample could be of interest. If the
behavior in the two groups is similar, then the added value of a
representative sample would mostly originate from the variety in
sociodemographic characteristics. If the behavior differs, then how the
same subject adapts his or her behavior depending on who the others are
also becomes empirically relevant. A first exploratory analysis
points toward the former interpretation. Given the limited number of
young people within the representative sample, further studies are in
order before making firm claims.
IV. DISCUSSION AND CONCLUSIONS
This study compares the cooperative behavior of two samples sharing
similar geographical and cultural origins but differing along important
sociodemographic dimensions: college students and a representative
subsample of the general adult population. We find that results from
experiments on norms of cooperation and punishment among students cannot
be readily generalized to society at large.
In a social dilemma, we replicate the common finding that students
in a simple collective action task are on average less cooperative than
the general population (Result 1, see for instance, Bellemare and Kroger
2007; Bellemare, Kroger, and Van Soest 2008; Cappelen et al. 2010;
Belot, Duch, and Miller 2010). Previous studies show that, when facing
social dilemmas, some societies benefit from the availability of
opportunities for peer punishment while others do not, and punishment
opportunities magnify the existing differences across societies in their
ability to cooperate (Herrmann, Thoni, and Gachter 2008). Here we show
that, even within the same society, the impact of peer punishment in
promoting cooperation can vary widely depending on the subsample of the
population considered. Our results document that punishment can reverse
the ordering of subgroups in a society in terms of cooperativeness even
when both participant pools are from the same geographical area. In a
public goods game, punishment opportunities had a positive effect on
cooperation in the student subsample, whereas little or no effect was
detected in the general population. As a consequence, without peer
punishment, students contributed less than the general population; with
peer punishment students were more cooperative than the general
population (Result 2).
We found two main factors driving this differential effect of peer
punishment. One factor lies in distinct preferences for punishment.
There were differences in the way punishment was used by the two
participant pools: for instance, punishment levels were higher in the
Representative than in the Student treatment. More importantly, in the general
population a remarkable amount of punishment was directed toward
cooperators (i.e., antisocial punishment) and this happened with a
higher frequency than in the student pool, starting from period one.
Hence, punishment did not promote cooperation among the general
population because it was frequently directed toward cooperators rather
than free-riders. Another factor is the unresponsiveness to punishment
by the general population subsample. While students, both high and low
contributors, showed significant reactions to the punishment received in
the previous period, those reactions were not significant in the general
population. As a consequence, contributions in the student subsample
increase with repetition while they remain flat in the general
population.
More generally, a main behavioral difference between the subsamples
is the low reactivity of the general population to the feedback within
the experiment. We report no difference between the students and the
general population subsamples in their first period average contribution
to the public good game, either with punishment or without punishment.
The differences emerge with repeated interactions. In particular, in the
baseline public good game we document less conditionally cooperative
behavior among the general population than among students (Result 4). In
the public good game with punishment, as already mentioned, we observe a
smaller reaction to past punishment within the general population than
within students. One implication of this evidence is that caution is
warranted when generalizing results from experiments consisting of one-shot
social dilemmas, because some differences emerge only over time.
There could be a variety of reasons for the low reactivity of the
general population to experimental feedback. One reason could be the
poor understanding of the rules of the experimental set-up. When
venturing beyond college students, participants may lack a clear
comprehension of the situation at hand. In this study, we put extra
effort into the experimental design, software, and instructions to
facilitate understanding. Moreover, our econometric analysis confirms
our main results after controlling for understanding. Another possible
explanation is that some participants may update their beliefs more
slowly. Two motivations come to mind. A rational motivation could be
past exposure to many similar experiences. A behavioral motivation is
related to receiving feedback from someone inside or outside one's
own reference group. For instance, an elderly person may give low weight
to the feedback of a young person, because it is deemed irrelevant.
Both motivations would suggest a slower updating in an experiment
among the general population than among a homogeneous student
population. There can be other reasons, such as higher cognitive costs
of adjustment. In conclusion, these results should not be taken as a
sweeping indictment against laboratory experiments with student
populations. On the contrary, they are part of an ongoing effort to
identify those research questions that can be usefully addressed using
students and those that instead are best dealt with other types of
participants. While students are well suited for studying a number of
issues (e.g., theory testing, learning, and rationality), the use of a
representative sample of the general population is, in our view, the
most appropriate choice when investigating the emergence and the
maintenance of civic norms of cooperation and punishment, which is often
the result of the interaction between different social strata. For
instance, if one were to classify Italian society according to the
impact of peer punishment in promoting cooperation, one would draw
opposite conclusions depending on whether the experiment was run with
college students or with the general population.
ABBREVIATIONS
GMM: Generalized Method of Moments
MPCR: Marginal Per Capita Return
OLS: Ordinary Least Squares
PGG: Public Goods Game
doi: 10.1111/ecin.12187
APPENDIX
A. FIRST PERIOD AND DYNAMIC PANEL ESTIMATION
Result 3 suggests that contributions in the first period are
indistinguishable across subject pools. In Table A1, we regress
individual contributions on the Representative sample dummy and
find support for Result 3.
Result 4 suggests that differences emerge over time, as students
condition their behavior on previous experience more than the general
population does. Apart from conditional cooperation there can be two
additional factors that influence cooperation: (a) individual
(unconditional) preferences for cooperation; and (b) other unobserved
individual characteristics. All these motivations are captured by the
following equation:
(A1) $x_{i,t} = v_i + \alpha x_{i,t-1} + \beta \sum_{j=1}^{3} x_{j,t-1} + u_{i,t}$
where $x_{i,t}$ indicates the contribution to the public good of
subject i at time t; $\sum_{j} x_{j,t-1}$ indicates the sum of
the contributions of the other group members in the previous period and
is meant to capture conditional cooperation. Recall that groups
were formed according to a strangers-matching protocol at the beginning
of each period. The variable $x_{i,t-1}$ is the contribution of
subject i in period t - 1 and measures the persistence of subjects'
choices; we interpret this variable as a proxy for individual preferences
toward cooperation. Finally, $v_i$ is an individual time-invariant
component capturing intrinsic characteristics of each subject that
cannot be observed.
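As an illustration only, the two lagged regressors of Equation (A1) could be assembled from raw session data along the following lines. The record layout (subject, period, group, contribution), the function name build_regressors, and the toy numbers are assumptions made for this sketch; they are not the authors' data or code.

```python
from collections import defaultdict


def build_regressors(records):
    """Return rows (subject, period, contribution, own_lag, others_lag).

    records: iterable of (subject, period, group, contribution) tuples,
    a hypothetical layout chosen for this sketch.
    """
    by_subject_period = {}
    group_totals = defaultdict(float)
    for subject, period, group, contribution in records:
        by_subject_period[(subject, period)] = (group, contribution)
        group_totals[(period, group)] += contribution

    rows = []
    for (subject, period), (_, contribution) in sorted(by_subject_period.items()):
        previous = by_subject_period.get((subject, period - 1))
        if previous is None:
            continue  # the first period has no lagged values
        prev_group, own_lag = previous
        # Sum over the OTHER members of the subject's previous-period group:
        # the subject's own lagged contribution is excluded.
        others_lag = group_totals[(period - 1, prev_group)] - own_lag
        rows.append((subject, period, contribution, own_lag, others_lag))
    return rows


# Toy data: one group of four subjects observed for two periods.
toy_records = [
    (1, 1, "g1", 0), (2, 1, "g1", 6), (3, 1, "g1", 14), (4, 1, "g1", 20),
    (1, 2, "g1", 6), (2, 2, "g1", 6), (3, 2, "g1", 14), (4, 2, "g1", 0),
]
regressor_rows = build_regressors(toy_records)
for row in regressor_rows:
    print(row)
```

Because the subject's own lagged contribution is subtracted from the group total, the second regressor contains only the choices of the three other group members, which is what the exogeneity argument in the text relies on.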
To account for the endogeneity problem arising from the
introduction in the model of the variable [x.sub.i,t-1], we implement a
two-step GMM system estimator.
Following the general model of Equation (A1), we estimate a dynamic
panel of the form:
(A2) $x_{i,t} = \alpha_1 x_{i,t-1} + \beta_1 \sum_{j} x_{j,t-1} + u_{i,t}$
(A3) $u_{i,t} = v_i + e_{i,t}$
where $v_i$ are unobserved individual effects and $e_{i,t}$ are
the observation-specific errors, which have zero mean
($E[e_{i,t}] = E[v_i e_{i,t}] = 0$) and constant variance, and are
uncorrelated across time and individuals ($E[e_{i,t} e_{j,s}] = 0$ for each
i, j, t, s with $i \neq j$). We consider the variable
$\sum_{j} x_{j,t-1}$ as exogenous thanks to the strangers-matching
protocol implemented in our setting and to the fact that the
choice of individual i is excluded from the calculation of the aggregate
group contribution in the previous period.
Endogeneity is an issue in this context because of the small number
of time periods available for the estimation, which is known in the
literature as small sample bias (Nickell 1981). This could also be the
reason for the scarce use of this methodology in the experimental
literature. (26) We implement two-step GMM system estimators, (27) which
are robust to heteroskedasticity, with the Windmeijer (2005)
finite-sample correction to avoid downward bias in the standard errors.
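The small sample bias mentioned above can be illustrated with a short simulation. The sketch below is not the system GMM estimator used in the paper: it applies the simple within (fixed-effects) estimator to simulated data, and all parameter values (500 subjects, 6 versus 60 periods, a true persistence of 0.5, standard normal shocks) are arbitrary assumptions chosen for illustration.

```python
import random


def simulate_panel(n_subjects, n_periods, alpha, seed=0, burn_in=50):
    """Simulate x_it = v_i + alpha * x_i,t-1 + e_it for each subject."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_subjects):
        v = rng.gauss(0.0, 1.0)       # unobserved individual effect
        x = v / (1.0 - alpha)         # start at the subject's long-run mean
        for _ in range(burn_in):      # discard early draws
            x = v + alpha * x + rng.gauss(0.0, 1.0)
        series = [x]
        for _ in range(n_periods):
            x = v + alpha * x + rng.gauss(0.0, 1.0)
            series.append(x)
        panel.append(series)
    return panel


def within_estimate(panel):
    """OLS slope of x_it on x_i,t-1 after subtracting subject means."""
    numerator = 0.0
    denominator = 0.0
    for series in panel:
        current = series[1:]
        lagged = series[:-1]
        current_mean = sum(current) / len(current)
        lagged_mean = sum(lagged) / len(lagged)
        for x_t, x_lag in zip(current, lagged):
            numerator += (x_lag - lagged_mean) * (x_t - current_mean)
            denominator += (x_lag - lagged_mean) ** 2
    return numerator / denominator


TRUE_ALPHA = 0.5
alpha_short = within_estimate(simulate_panel(500, 6, TRUE_ALPHA, seed=1))
alpha_long = within_estimate(simulate_panel(500, 60, TRUE_ALPHA, seed=1))
print("true alpha:", TRUE_ALPHA)
print("within estimate, 6 periods: ", round(alpha_short, 3))
print("within estimate, 60 periods:", round(alpha_long, 3))
```

With few periods the within estimate of the persistence coefficient falls well below its true value, and the gap shrinks as the number of periods grows. This is the motivation for instrument-based estimators in short experimental panels.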
Because we want to include a time-invariant regressor in our model,
the Representative sample dummy, difference GMM estimators
are not appropriate (the time-invariant regressor would cancel
out in the procedure). Hence we use a system GMM estimator that
retains both the original and the differenced equations and uses both
levels and differenced variables as instruments. For simplicity, the
discrete dependent variable is treated as continuous, as in Hislop
(1994). (28)
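The reason a time-invariant regressor drops out of a differenced equation can be seen in one step. In the sketch below, $R_i$ stands for the Representative sample dummy and $\gamma$ for its coefficient; both symbols are introduced here for illustration and do not appear in Equations (A2) and (A3).

```latex
% Assumption: R_i is the Representative sample dummy, \gamma its coefficient.
\begin{align*}
x_{i,t} &= \alpha_1 x_{i,t-1} + \beta_1 \sum_{j} x_{j,t-1} + \gamma R_i + v_i + e_{i,t} \\
\Delta x_{i,t} &= \alpha_1 \Delta x_{i,t-1} + \beta_1 \Delta \sum_{j} x_{j,t-1}
  + \gamma (R_i - R_i) + (v_i - v_i) + \Delta e_{i,t} \\
&= \alpha_1 \Delta x_{i,t-1} + \beta_1 \Delta \sum_{j} x_{j,t-1} + \Delta e_{i,t}
\end{align*}
```

First-differencing removes the individual effect $v_i$, which is the purpose of the transformation, but it removes $\gamma R_i$ as well, so $\gamma$ can be identified only from an equation in levels.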
For each estimated model we check the p values of the Sargan and
Hansen tests of overidentifying restrictions and of the Arellano and Bond
(1991) test for second-order autocorrelation in first differences. The
former tests indicate whether the set of instruments is correctly
specified, whereas the Arellano-Bond test evaluates the presence of
residual autocorrelation in the differenced error, which in this context
signals endogeneity between the lagged endogenous variable and the
differenced fixed effect and would make the related variable an
invalid instrument. In Tables A2 and A3, we list the instruments
used for GMM system estimation in the Standard and Punishment
treatments, respectively. In Models 5 and 6 of the Punishment treatment,
a correct specification was not achieved using the same instruments:
this suggests that the two subject pools are substantially different and
that other variables need to be explored in order to explain
individual behavior in the Punishment treatment.
Table A4 reports estimates for the Standard (Models 1 to 3) and
Punishment (Models 4 to 6) variants of the Public Goods Game. Model 1
considers data only from the Representative treatment in the
PGG-Standard; the large and significant coefficient of the variable Own
contribution in t - 1 suggests that individual contributions in the
general population are very persistent. By contrast, the coefficient
of the variable Others' contribution in t - 1 is very small and not
significant at any conventional level. Taken together, these two
estimates confirm that in the representative sample subjects tend to
stick to their choices and are less influenced by others' behavior.
When considering the Student treatment only (Model 2), we find that
the coefficient on Others' contribution in t - 1 is larger and highly
significant, which supports the idea that conditional cooperation
plays a key role among students even after controlling for their own
contributions in t - 1.
When moving from the Standard to the Punishment variant, we find
that none of the explanatory variables can account for observed
cooperation in the Representative treatment (Model 4). In this case,
behavior is thus explained by a variable not included in the present
model; received punishment appears to be a likely candidate (see the
discussion in Section III). For the student sample, the only
marginally significant variable is Others' contribution in t - 1,
which yields further evidence in favor of the idea that students are
more conditional cooperators than the general public.
B. RECRUITMENT
Recruitment Procedure for the Representative Sample
Participants in the Representative treatment were recruited from
the general population of the province of Ravenna, which is part of
the Emilia-Romagna region, located in the North of Italy. Eligible
candidates for the study had to: (a) be at least 18 years old; (b) be born
in the province; (c) be resident in the province; (d) have a good knowledge
of spoken and written Italian. Before the experimental sessions
were carried out, the experimenters double-checked
participants' ID cards to guarantee that all subjects met the
requirements (age and place of birth). At the beginning of each session,
the experimenter announced that all subjects in the room were born and
resident in the same province (or at least in the region) with the
explicit aim of making this information common knowledge.
We wanted a representative sample of the Italian population with
respect to age, sex, and employment status, as these characteristics
might be relevant for the investigation of cooperation norms in a
society. The sample was stratified according to three categories of age
(18-39; 40-59; 60 and older), two of sex (male and female), and three
for employment status (employed; housewives and retired; others,
including students and unemployed). For the composition of the target
sample we referred to the 2009 statistics for the Italian population.
(29)
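The stratification described above yields 3 x 2 x 3 = 18 cells. A minimal sketch of how target quotas per cell could be derived from population shares is given below. The function names, the sample size of 100, and the equal placeholder shares are assumptions made for illustration; they are not the Istat figures or the quotas actually used in the study.

```python
from itertools import product

AGE = ["18-39", "40-59", "60 and older"]
SEX = ["male", "female"]
EMPLOYMENT = ["employed", "housewives and retired", "others"]


def build_strata():
    """Enumerate every age x sex x employment-status cell."""
    return list(product(AGE, SEX, EMPLOYMENT))


def allocate_quotas(shares, sample_size):
    """Split sample_size across cells in proportion to shares.

    Uses the largest-remainder rule so that the quotas are whole
    numbers and add up exactly to sample_size.
    """
    total = sum(shares.values())
    exact = {cell: sample_size * share / total for cell, share in shares.items()}
    quotas = {cell: int(value) for cell, value in exact.items()}
    leftover = sample_size - sum(quotas.values())
    by_remainder = sorted(
        exact, key=lambda cell: exact[cell] - quotas[cell], reverse=True
    )
    for cell in by_remainder[:leftover]:
        quotas[cell] += 1
    return quotas


strata = build_strata()
# Placeholder shares (all equal); real shares would come from census data.
placeholder_shares = {cell: 1.0 for cell in strata}
quotas = allocate_quotas(placeholder_shares, sample_size=100)
for cell, quota in quotas.items():
    print(cell, quota)
```

Replacing the placeholder shares with the population share of each cell would give the number of participants to recruit per cell for a given total sample size.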
We hired two professional companies--Metis-Ricerche and
Demoskopea--to recruit subjects that comply with the aforementioned
requirements. We provided these companies with a message and a script to
approach potential participants. Details about the study and the goal of
the experiment were not disclosed to subjects during recruitment;
recruiters had no prior knowledge of the purpose or the content of this
study. We asked them to recruit people resident both in the town and
outside the town where the experiment was carried out; in both
cases only subjects resident in the province of Ravenna could be
involved. In addition to the aforementioned requirements, special
categories of people were barred ex ante from participation, such as:
employees of the research sector; people who had participated in market
research in the preceding 3 months; recruiters' family members;
employees of marketing companies and of the press sector in general.
Moreover, no more than two people per session could be acquainted
with each other.
One company (Metis-Ricerche) recruited subjects for the first three
sessions. Potential subjects were identified with the use of telephone
book entries and approached by telephone calls. All phases of the
recruitment process were performed from the company's headquarters,
and, in case of acceptance, the company provided the participant with a
confirmation letter. This letter contained the same information that was
provided on the phone by the company operators and that we had
previously agreed upon with the company. Metis-Ricerche
decided not to renew its contract for the recruitment of
people in other locations, as the recruitment procedures turned out to
be more expensive than expected. The second company
(Demoskopea) was in charge of the recruitment of subjects for the last
session. Local representatives of Demoskopea directly contacted subjects
in each province. The local recruiters proceeded with the choice/random
extraction of names from telephone books and with random contacts
obtained through personal interactions as instructed by the
headquarters. Below, we report the message
used by both companies to recruit subjects for our study.
Message for recruiters with instructions
We would like to invite you to participate in a meeting organized
by the Universities of Bologna and Oxford. We are looking for people
born in the city and within the province of Faenza. The aim of this
study is strictly scientific. There are no commercial purposes and the
identity of all participants will always be kept anonymous. Our interest
is to understand how Italians take decisions in situations dealing with
money.
During the meeting you will be given several different situations
and you will be kindly asked to take decisions. Taking decisions is an
easy task. No particular skills are required.
We offer you a payment of 30 euros in petrol tokens, plus a sum in
cash that, according to your choices and those of other participants,
will amount up to 25 euros. You will be paid at the end of the meeting,
which we expect to last no more than 2 hours and a half.
If you wish to verify the accuracy of this information, please
contact [name of the secretary in charge], from the University of Bologna, or
visit the website http://www.unibo.it/Portale/Ricerca. If you accept
our offer, please come to [location] at [time]; [description of how
to get to the location].
By participating in this meeting, you make a contribution to one of
the few scientific research projects supported by the European
Commission in Italy.
F.A.Q. (In case somebody asks, recruiters are allowed to provide
the following extra information)
* How do we make our choices?
--Choices are made very easily, touching the screen with a finger.
It is just like an ATM or a cell phone with touch screen.
* In a nutshell, how does this activity work?
--You will be given several different situations and you will be
kindly asked to choose among alternatives. There is no right answer, we
just want to know your opinion.
Recruitment Procedure for the Student Sample
Subjects in the student sample were recruited
according to the standard procedure implemented in a regular laboratory
experiment. Announcements were sent to potential participants in the
ORSEE database, which is the one commonly used in the Bologna
Laboratory for Experiments in Social Sciences (BLESS). We slightly
changed the standard announcement to include the requirement that
subjects in this study had to be born in Emilia-Romagna. Subjects were
asked to come to the laboratory on the agreed day of
the session with a valid ID card, which was checked by experimenters to
verify the birth and residence requirements. Below, we provide
the announcement sent via the ORSEE platform
to recruit the participants for the student sessions.
Message for the student sessions
--Please, ignore this message if you were not born in Emilia
Romagna
Hello (first name last name)
You are kindly invited to participate in a research study in our
Laboratory of Experimental Economics.
Only people that were born in Emilia Romagna can take part in this
study.
Please, do not sign in if you were born in another region. People
born outside of Emilia Romagna, even if they sign in, will not be
granted the chance to participate in the study.
Sessions will take place on the following dates and time slots:
(session list)
To choose the session, please click on the following link:
(link)
(If you cannot click on the link, please select it, copy it by
clicking on the right button of the mouse and paste it in the address
line by clicking on right button once again.)
TABLE A1
Treatment Effect on Contributions in Period 1: PGG-Standard and Punishment
Dependent Variable: Contribution in Period 1
                        PGG-Standard    PGG-Punishment
                        Model 1         Model 2
Representative sample   0.118           -0.343
                        (0.298)         (0.345)
Low understanding       -0.338          -0.291
                        (0.389)         (0.360)
No. of observations     212             212
Log likelihood          -284.328        -288.298
Note: Ordered logit regression on individual contribution
levels in period 1, standard errors robust for clustering at the
session level (in parentheses).
TABLE A2
Instruments for System GMM Estimation of PGG-Standard
Instruments for first differences equation
Type of instrument    Standard                   GMM-type
Variable used         Others' contribution       Own contribution
Specification         First difference, lag 1    Level, lags from 3 to max
Instruments for levels equation
Type of instrument    Standard                   GMM-type
Variable used         Others' contribution       Own contribution
Specification         Level, lag 1               First difference, lag 2
TABLE A3
Instruments for System GMM Estimation of PGG with Punishment
Instruments for first differences equation
Type of instrument    Standard                   GMM-type
Variable used         Others' contribution       Own contribution
Specification         First difference, lag 2    Level, lags from 2 to max
Instruments for levels equation
Type of instrument    Standard                   GMM-type
Variable used         Others' contribution       Own contribution
Specification         Level, lag 2               First difference, lag 1
TABLE A4
Dynamic Panel Estimation
PGG-Standard
Dependent Variable:              Representative   Students     All Sample
Contributions                    Model 1          Model 2      Model 3
Own contribution in t - 1        0.711 **         0.533 **     0.460 ***
                                 (0.28)           (0.14)       (0.16)
Others' contribution in t - 1    0.023            0.086 ***    0.051 **
                                 (0.03)           (0.02)       (0.02)
Representative sample                                          3.134
                                                               (3.11)
No. of observations              756              728          1484
PGG-Punishment
Dependent Variable:              Representative   Students     All Sample
Contributions                    Model 4          Model 5      Model 6
Own contribution in t - 1        0.059            0.143        0.109 **
                                 (0.07)           (0.10)       (0.05)
Others' contribution in t - 1    0.006            0.113 *      0.051
                                 (0.08)           (0.06)       (0.06)
Representative sample                                          -6.008 ***
                                                               (1.92)
No. of observations              648              624          1272
Notes: Blundell and Bond (1998) panel estimation, p value
for Sargan and Hansen test (null hypothesis that the
overidentifying restrictions are valid) and Arellano Bond
test for second- and third-order autocorrelation in first
differences: (0.686, 0.518, 0.077) for Model 1; (0.000, 0.194,
0.022) for Model 2; (0.000, 0.153, 0.016) for Model 3;
(0.046, 0.178, 0.083) for Model 4; (0.002, 0.047, 0.403) for
Model 5; (0.260, 0.845, 0.373) for Model 6.
***, **, and * indicate significance at the 1%, 5%, and 10%
level, respectively.
REFERENCES
Alevy, J. E., M. S. Haigh, and J. A. List. "Information
Cascades: Evidence from a Field Experiment with Financial Market
Professionals." Journal of Finance, 62(1), 2007, 151-80.
Alpert, B. "Non-Businessmen as Surrogates for Businessmen in
Behavioral Experiments." Journal of Business, 40(2), 1967, 203-7.
Arellano, M., and S. R. Bond. "Some Tests of Specification for
Panel Data: Monte Carlo Evidence and an Application to Employment
Equations." Review of Economic Studies, 58(2), 1991, 277-97.
Arellano, M., and O. Bover. "Another Look at the Instrumental
Variable Estimation of Error-Components Models." Journal of
Econometrics, 68(1), 1995, 29-51.
Bellemare, C., and S. Kroger. "On Representative Social
Capital." European Economic Review, 51(1), 2007, 183-202.
Bellemare, C., S. Kroger, and A. Van Soest. "Measuring
Inequity Aversion in a Heterogeneous Population Using Experimental
Decisions and Subjective Probabilities." Econometrica, 76(4), 2008,
815-39.
--. "Preferences, Intentions, and Expectation Violations: A
Large-Scale Experiment with a Representative Subject Pool." Journal
of Economic Behavior & Organization, 78(3), 2011, 349-65.
Belot, M., R. Duch, and L. Miller. "Who Should Be Called to
the Lab? A Comprehensive Comparison of Students and Non-Students in
Classic Experimental Games." Discussion Papers 2010001, University
of Oxford, Nuffield College, 2010.
Bigoni, M., M. Casari, and G. Camera. "Strategies of
Cooperation and Punishment among Students and Clerical Workers."
Journal of Economic Behavior and Organization, 94, 2012, 172-82.
Bigoni, M., S. Bortolotti, M. Casari, D. Gambetta, and F. Pancotto.
"Cooperation Hidden Frontiers: The Behavioral Foundations of the
Italian North-South Divide." Technical Report, Department of
Economics, University of Bologna WP 882, 2013.
Block, M. K., and V. E. Gerety. "Some Experimental Evidence on
Differences between Student and Prisoner Reactions to Monetary Penalties
and Risk." Journal of Legal Studies, 24(1), 1995, 123-38.
Blundell, R., and S. Bond. "Initial Conditions and Moment
Restrictions in Dynamic Panel Data Models." Journal of
Econometrics, 87(1), 1998, 115-43.
Bosch-Domenech, A., J. Montalvo, R. Nagel, and A. Satorra.
"One, Two, (Three), Infinity, ...: Newspaper and Lab Beauty-Contest
Experiments." American Economic Review, 92(5), 2002, 1687-701.
Bram Cadsby, C., and E. Maynes. "Choosing between a Socially
Efficient and Free-Riding Equilibrium: Nurses versus Economics and
Business Students." Journal of Economic Behavior &
Organization, 37(2), 1998, 183-92.
Branas-Garza, P., M. Bucheli, and T. Garcia-Munoz. "Dynamic
Panel Data: A Useful Technique in Experiments." Technical Report
10/22, Department of Economic Theory and Economic History of the
University of Granada, 2011.
Burks, S., J. Carpenter, and L. Goette. "Performance Pay and
Worker Cooperation: Evidence from an Artefactual Field Experiment."
Journal of Economic Behavior & Organization, 70(3), 2009, 458-69.
Butler, D. M., and T. Kousser. "How Do Public Goods Providers
Play Public Goods Games?" 2013.
Camerer, C. F. Behavioral Game Theory. New York: Russell Sage
Foundation, 2003.
Cappelen, A. W., K. Nygaard, E. O. Sorensen, and B. Tungodden.
"Efficiency, Equality and Reciprocity in Social Preferences: A
Comparison of Students and a Representative Population." Discussion
Paper Series in Economics 28/2010, Department of Economics, Norwegian
School of Economics, 2010.
Cardenas, J. C. "Groups, Commons and Regulations: Experiments
with Villagers and Students in Colombia," in Psychology,
Rationality and Economic Behaviour, edited by B. Agarwal and A.
Vercelli. London: Palgrave, 2005, 242.
Carpenter, J., and E. Seki. "Do Social Preferences Increase
Productivity? Field Experimental Evidence from Fishermen in Toyama
Bay." Economic Inquiry, 49(2), 2011, 612-30.
Carpenter, J. P., S. Burks, and E. Verhoogen. Comparing Students to
Workers: The Effects of Social Framing on Behavior in Distribution
Games. Bradford, UK: Emerald Group Publishing Limited, 2005, 261-89.
Carpenter, J., C. Connolly, and C. Myers. "Altruistic Behavior
in a Representative Dictator Experiment." Experimental Economics,
11, 2008, 282-98.
Cooper, D. J. "Are Experienced Managers Experts at Overcoming
Coordination Failure?" The B.E. Journal of Economic Analysis &
Policy, 6(2), 2006, 1450.
Cooper, D. J., J. K. W. Lo, and Q. L. Gu. "Gaming against
Managers in Incentive Systems: Experimental Results with Chinese
Students and Chinese Managers." American Economic Review, 89(4),
1999, 781-804.
Croson, R., and K. Donohue. "Behavioral Causes of the Bullwhip
Effect and the Observed Value of Inventory Information." Management
Science, 52(3), 2006, 323-36.
Dejong, D. V., R. Forsythe, and W. C. Uecker. "A Note on the
Use of Businessmen as Subjects in Sealed Offer Markets." Journal of
Economic Behavior & Organization, 9(1), 1988, 87-100.
Dohmen, T., A. Falk, D. Huffman, and U. Sunde. "Representative
Trust and Reciprocity: Prevalence and Determinants." Economic
Inquiry, 46(1), 2008, 84-90.
Dragone, D., F. Galeotti, and R. Orsini. "Temporary Workers
Are Not Free-Riders: An Experimental Investigation." Technical
Report, University of Bologna, DSE-WP 915, 2013.
Dyer, D., J. H. Kagel, and D. Levin. "A Comparison of Naive
and Experienced Bidders in Common Value Offer Auctions: A Laboratory
Analysis." The Economic Journal, 99(394), 1989, 108-15.
Egas, M., and A. Riedl. "The Economics of Altruistic
Punishment and the Maintenance of Cooperation." Proceedings of the
Royal Society B, 275, 2008, 871-78.
Ermisch, J., D. Gambetta, H. Laurie, T. Siedler, and S. C. Noah
Uhrig. "Measuring People's Trust." Journal of the Royal
Statistical Society: Series A (Statistics in Society), 172(4), 2009,
749-69.
Exadaktylos, F., A. Espin, and P. Branas-Garza. "Experimental
Subjects Are Not Different." Nature Scientific Reports, 3(1213),
2013, 1-6.
Falk, A., S. Meier, and C. Zehnder. "Do Lab Experiments
Misrepresent Social Preferences?" Journal of the European Economic
Association, 11, 2012, 839-52.
Fehr, E., and S. Gachter. "Cooperation and Punishment in
Public Goods Experiments." American Economic Review, 90(4), 2000,
980-94.
Fehr, E., and J. A. List. "The Hidden Costs and Returns of
Incentives--Trust and Trustworthiness among CEOs." Journal of the
European Economic Association, 2(5), 2004, 743-71.
Fischbacher, U. "z-Tree: Zurich Toolbox for Ready-Made
Economic Experiments." Experimental Economics, 10(2), 2007, 171-78.
Fischbacher, U., S. Gachter, and E. Fehr. "Are People
Conditionally Cooperative? Evidence from a Public Goods
Experiment." Economics Letters, 71(3), 2001, 397-404.
Frechette, G. R. "Laboratory Experiments: Professionals versus
Students," in The Methods of Modern Experimental Economics, edited
by G. Frechette and A. Schotter. Oxford: Oxford University Press, 2009.
Gachter, S., and B. Herrmann. "The Limits of Self-Governance
When Cooperators Get Punished: Experimental Evidence from Urban and
Rural Russia." European Economic Review, 55(2), 2011, 193-210.
Gachter, S., B. Herrmann, and C. Thoni. "Norms of Cooperation
among Urban and Rural Dwellers. Experimental Evidence from Russia."
Mimeo, University of St. Gallen, Switzerland, 2003.
--. "Trust, Voluntary Cooperation, and Socio-Economic
Background: Survey and Experimental Evidence." Journal of Economic
Behavior and Organization, 55, 2004, 505-31.
Glaser, M., T. Langer, and M. Weber. Overconfidence of
Professionals and Lay Men: Individual Differences within and between
Tasks? Mannheim, Germany: University of Mannheim, 2005.
Greiner, B. "The Online Recruitment System ORSEE 2.0--A Guide
for the Organization of Experiments in Economics." Working Paper
Series in Economics 10, University of Cologne, Department of Economics,
2004.
Harbaugh, W. T., K. Krause, and T. R. Berry. "Garp for Kids:
On the Development of Rational Choice Behavior." American Economic
Review, 91(5), 2001, 1539-45.
Harbaugh, W. T., K. Krause, and L. Vesterlund. "Risk Attitudes
of Children and Adults: Choices over Small and Large Probability Gains
and Losses." Experimental Economics, 5(1), 2002, 53-84.
Harrison, G. W., and J. A. List. "Field Experiments."
Journal of Economic Literature, 42(4), 2004, 1009-55.
Harrison, G. W., M. I. Lau, and M. B. Williams. "Estimating
Individual Discount Rates in Denmark: A Field Experiment." American
Economic Review, 92(5), 2002, 1606-17.
Henrich, J., J. Ensminger, R. McElreath, A. Barr, C. Barrett, A.
Bolyanatz, J. C. Cardenas, M. Gurven, E. Gwako, N. Henrich, C.
Lesorogol, F. Marlowe, D. Tracer, and J. Ziker. "Markets, Religion,
Community Size, and the Evolution of Fairness and Punishment."
Science, 327(5972), 2010, 1480-4.
Herrmann, B., C. Thoni, and S. Gachter. "Antisocial Punishment
across Societies." Science, 319(5868), 2008, 1362-7.
Hislop, D. R. "State Dependence, Serial Correlation and
Heterogeneity in Intertemporal Labor Force Participation of Married
Women." Econometrica, 67(6), 1994, 1255-94.
Kocher, M. G., T. Cherry, S. Kroll, R. J. Netzer, and M. Sutter.
"Conditional Cooperation on Three Continents." Economics
Letters, 101(3), 2008, 175-8.
Murnighan, J. K., and M. S. Saxon. "Ultimatum Bargaining by
Children and Adults." Journal of Economic Psychology, 19(4), 1998,
415-45.
Nickell, S. "Biases in Dynamic Models with Fixed
Effects." Econometrica, 49, 1981, 1417-26.
Ostrom, E., J. Walker, and R. Gardner. "Covenants with and
without a Sword: Self-Governance Is Possible." American Political
Science Review, 86, 1992, 404-17.
Potters, J., and F. Van Winden. "Professionals and Students in
a Lobbying Experiment: Professional Rules of Conduct and Subject
Surrogacy." Journal of Economic Behavior & Organization, 43(4),
2000, 499-522.
Sade, O., C. Schnitzlein, and J. F. Zender. "Competition and
Cooperation in Divisible Good Auctions: An Experimental
Examination." Review of Financial Studies, 19(1), 2006, 195-235.
Stewart, M. "Maximum Simulated Likelihood Estimation of
Random-Effects Dynamic Probit Models with Autocorrelated Errors."
Stata Journal, 6(2), 2006, 256-72.
Windmeijer, F. "A Finite Sample Correction for the Variance of
Linear Efficient Two-Step GMM Estimators." Journal of Econometrics,
126(1), 2005, 25-51.
SUPPORTING INFORMATION
Additional Supporting Information may be found in the online
version of this article:
Appendix S1. Instructions.
(1.) The categories studied include business people and managers
(Alpert 1967; Cooper 2006; Cooper, Lo, and Gu 1999; Croson and Donohue
2006; Dejong, Forsythe, and Uecker 1988; Fehr and List 2004); prisoners
(Block and Gerety 1995); lay people (Glaser, Langer, and Weber 2005);
children (Harbaugh, Krause, and Berry 2001; Harbaugh, Krause, and Vesterlund
2002; Murnighan and Saxon 1998); finance industry professionals (Alevy,
Haigh, and List 2007; Sade, Schnitzlein, and Zender 2006); and public
affair officials (Potters and Van Winden 2000).
(2.) For a general review of experiments beyond social dilemmas
that compare students with subject pools of professionals see Frechette
(2009).
(3.) The choice is task specific, as contractors would not be the
most appropriate sample for studying the management of a renewable
natural resource.
(4.) Papers comparing students and nonstudents with the aim of
controlling for self-selection in the participation in experiments are
beyond the scope of this paper (for a review, see Exadaktylos, Espin,
and Branas-Garza 2013).
(5.) For a detailed description of the recruitment process, see the
Appendix.
(6.) Ravenna is one of the eight provinces of Emilia-Romagna.
(7.) These data were collected as part of a wider research project
to investigate social norms across various locations in Italy (Bigoni et
al. 2013), where Ravenna was selected as one of the provinces of
interest. For sample stratification, we referred to the figures of the
National Institute of Statistics concerning inhabitants as of January 1,
2009 (source Istat: http://demo.istat.it/pop2009/index1.html).
(8.) Because of the limited number of students that were born
within the province of Ravenna and present in the ORSEE database, we
decided to include among the potential participants subjects that were
born in all the provinces of Emilia-Romagna: all of them shared similar
socio-economic characteristics. As pointed out by Harrison and List
(2004), there are at least two factors that may restrict the
generalizability of laboratory results obtained with students: (a) there
is an endogenous sample selection among students participating in
experiments; (b) students are not informative about the general
population. As we are mainly interested in (b), we do not take any
additional precaution to limit endogenous sample selection among
students. In the same spirit, we did not exclude from the database the
small proportion of nonstudents that used to take part in experiments;
however, we retain the term "Student" for brevity.
(9.) As we are mainly interested in assessing to what extent
results obtained with a standard participant pool can be extended to the
general public, we opted for a group of participants as similar as
possible to the one commonly involved in standard lab experiments. To
this end, we sampled participants from the ORSEE database rather than
from the general college population of the University of Bologna.
(10.) The database includes a small fraction of nonstudents, most
of whom were former students living in the area. We had 18 nonstudent
participants (17%). Thanks to a questionnaire, we know that 14 are 32
years old or younger and that 9 hold a college degree. About 1/3 of them
hold a college degree and are looking for their first job.
(11.) The figures for the representative sample refer to the
province of Ravenna.
(12.) Each session included a total of five parts presented in a
fixed order: (1) choice over lotteries; (2) PGG-Standard; (3)
PGG-Punishment; (4) PGG-Standard; (5) PGG-Threshold. Subjects received
feedback on part 1 only at the end of the session. For the comparison of
norms of cooperation across subject pools, we focus only on parts 2 and
3. Instructions for all five parts are in the Appendix.
(13.) We did not control for order effects. Previous studies with a
similar set-up found no significant evidence of order effects (see
Herrmann, Thoni, and Gachter 2008, 5, SOM).
(14.) Upon arrival, subjects were seated at a visually separated
desk; no form of communication was allowed during the experiment. A
paper copy of the relevant instructions was handed out before each part
and read aloud by the experimenter. Before PGG-Standard and
PGG-Punishment, subjects had to answer a computerized quiz to ensure
their understanding. Everyone had to answer all questions correctly
before proceeding. The experiment was programmed and conducted with the
software z-Tree (Fischbacher 2007).
(15.) In programming our interfaces, we took inspiration from the
first wave of experiments conducted at the Internet Laboratory for
Experimental Economics, iLEE (for further details see:
http://www.econ.ku.dk/cee/ilee/description/ilee1/).
(16.) We opted for ordered probit regressions to take into account
that the dependent variable was not continuous but could take on only
four values. Models were estimated using the Gllamm package
(http://www.gllamm.org/). We also ran OLS specifications and Tobit
models to account for censoring at 0 and 20. Our results are robust to
the use of these different estimation procedures. Results of these
additional estimations are available upon request from the authors.
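As an illustration of the ordered probit logic in note 16, the sketch below maps a latent index into probabilities over the four admissible contribution levels (0, 6, 14, and 20 tokens). It is a minimal sketch only: the cutpoints and the 0.8 shift are hypothetical values rather than estimates from this study, and the individual-level random effects of the actual models are omitted.

```python
import math

# The four admissible contribution levels (tokens).
LEVELS = (0, 6, 14, 20)


def normal_cdf(z: float) -> float:
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(index: float, cutpoints: tuple) -> list:
    """Return P(y = level) for each level, given a latent index and
    increasing cutpoints c1 < c2 < c3:
    P(y = k) = Phi(c_k - index) - Phi(c_{k-1} - index)."""
    cdf = [0.0] + [normal_cdf(c - index) for c in cutpoints] + [1.0]
    return [upper - lower for lower, upper in zip(cdf, cdf[1:])]


def expected_contribution(probs: list) -> float:
    """Probability-weighted average of the contribution levels."""
    return sum(p * level for p, level in zip(probs, LEVELS))


# Hypothetical cutpoints and index shift, for illustration only.
CUTPOINTS = (-0.5, 0.3, 1.0)
baseline = ordered_probit_probs(0.0, CUTPOINTS)
shifted = ordered_probit_probs(0.8, CUTPOINTS)

print([round(p, 3) for p in baseline], round(expected_contribution(baseline), 2))
print([round(p, 3) for p in shifted], round(expected_contribution(shifted), 2))
```

A positive shift in the latent index, such as a positive treatment coefficient, moves probability mass from the lowest to the highest contribution level and therefore raises the expected contribution.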
(17.) In addition, we control for three alternative ways of
modeling low understanding: (1) subjects in the last quartile of the
distribution according to either the number of mistakes or the total
answering time; (2) subjects without a college degree or higher; (3)
subjects who contributed 6 or 14 in the PGG-Threshold. Results are
qualitatively similar under all specifications and are available upon
request from the authors.
(18.) Per-period profit decreases from PGG-Standard to
PGG-Punishment for both subject pools. In the Representative treatment
the earnings drop was more pronounced; subjects earned, on average,
about 9.1 tokens less in each period. The loss was only 1.5 tokens
among students.
(19.) For first period data, we consider each subject an
independent unit of observation.
(20.) Conditional cooperation is commonly defined as the
willingness to contribute to the common pool based on the expectation
that others will contribute as well. We consider an indirect measure and
assume that a subject's belief about future group members'
contributions depends on their past contributions. Our
strangers-matching protocol weakens this relation compared to a
partner-matching protocol. Alternatively, one could have used the
strategy method to directly elicit conditional cooperation.
(21.) If high cooperators are more likely to punish than free
riders, there should be a correlation between the punishment received by
a subject and others' contributions in the previous period.
(22.) We also control for time trend and low understanding as in
Table 4.
(23.) As a robustness check, we ran the same regressions using a
generalized method of moments (GMM) system methodology to check for
potential endogeneity of the variable Others' contributions in t -
1. Results are consistent with the present estimates and are reported in
the Appendix.
(24.) A Wilcoxon rank-sum test does not reveal any statistically
significant difference when taking each session as an independent
observation (p = .149, [N.sub.R] = [N.sub.S] = 4).
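With four sessions per treatment, the rank-sum test in note 24 compares only eight session-level observations. The sketch below computes the rank sum, an exact two-sided p-value by enumeration, and the usual normal approximation. The session-level values are hypothetical placeholders, not the data from this study, and the text does not say which variant of the test was used.

```python
import math
from itertools import combinations


def rank_sum(sample_a: list, sample_b: list) -> int:
    """Sum of the ranks of sample_a in the pooled data (assumes no ties)."""
    pooled = sorted(sample_a + sample_b)
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    return sum(rank[v] for v in sample_a)


def exact_two_sided_p(w_obs: float, n_a: int, n_b: int) -> float:
    """Share of all rank assignments at least as far from the mean as w_obs."""
    n = n_a + n_b
    mean = n_a * (n + 1) / 2
    sums = [sum(c) for c in combinations(range(1, n + 1), n_a)]
    extreme = sum(1 for s in sums if abs(s - mean) >= abs(w_obs - mean))
    return extreme / len(sums)


def normal_approx_two_sided_p(w_obs: float, n_a: int, n_b: int) -> float:
    """Two-sided p-value from the normal approximation (no continuity correction)."""
    n = n_a + n_b
    mean = n_a * (n + 1) / 2
    sd = math.sqrt(n_a * n_b * (n + 1) / 12)
    z = abs(w_obs - mean) / sd
    return math.erfc(z / math.sqrt(2.0))


# Hypothetical session-level averages, four sessions per treatment.
students = [8.2, 11.5, 12.9, 13.6]
representative = [7.9, 8.6, 9.4, 10.1]

w = rank_sum(students, representative)
p_exact = exact_two_sided_p(w, len(students), len(representative))
p_normal = normal_approx_two_sided_p(w, len(students), len(representative))
print(w, round(p_exact, 3), round(p_normal, 3))
```

With four observations per group there are only 70 possible rank assignments, so the exact and approximate p-values can differ noticeably at this sample size.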
(25.) Models 1 and 2 report results for Representative and Student
treatments, respectively. The variable Negative deviation (abs) has a
positive and highly significant coefficient in both treatments, which
supports the idea that the further a contribution falls short of
others' contributions, the more severe the punishment. Quite
surprisingly, the coefficient of the variable Positive deviation is
also positive and significant. This implies that punishment increases
as the gap between others' contributions and socially minded
subjects' contributions widens. The negative and highly significant
coefficient for Others' contributions implies that a deviation from
others' contributions is punished more severely if the sum of the
contributions is small.
(26.) A notable exception is Branas-Garza, Bucheli, and Garcia-Munoz
(2011), who compare static and dynamic panel estimation in an
experimental setting.
(27.) The problem of endogeneity in small samples was originally
tackled in the seminal paper by Arellano and Bond (1991); later
contributions have extended the applicability of the methodology in
various directions. See Arellano and Bover (1995) and Blundell and
Bond (1998).
(28.) We leave for future research the possibility of implementing a
dynamic discrete choice panel with endogenous regressors as in Stewart
(2006), for example.
(29.) We referred to the number of inhabitants registered on
January 1, 2009. Age range: 18-39 years, 34.8%; 40-59 years, 34.6%; 60
and more, 31.6%. Sex: male, 48%; female, 52%. Employment status:
employed, 42%; housewives and retired, 37%; others, 21%. Source:
http://demo.istat.it/pop2009/index1.html
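The way population shares such as those in note 29 translate into target quotas can be sketched as follows. This is only an illustration under stated assumptions: the largest-remainder rounding rule is our choice for the example and is not described in the text, and the total of 108 is the number of Representative participants reported in Table 2, used here as a hypothetical recruitment target.

```python
def target_quotas(n: int, shares_pct: dict) -> dict:
    """Allocate n places across categories in proportion to integer
    percentage shares, using largest-remainder rounding (an assumed rule)."""
    if sum(shares_pct.values()) != 100:
        raise ValueError("shares must sum to 100")
    quotas, remainders = {}, {}
    for name, pct in shares_pct.items():
        # Integer arithmetic: whole places and the leftover hundredths.
        quotas[name], remainders[name] = divmod(n * pct, 100)
    leftover = n - sum(quotas.values())
    # Hand the remaining places to the categories with the largest remainders.
    for name in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        quotas[name] += 1
    return quotas


N_PARTICIPANTS = 108  # assumed target, taken from Table 2

sex = target_quotas(N_PARTICIPANTS, {"male": 48, "female": 52})
employment = target_quotas(
    N_PARTICIPANTS,
    {"employed": 42, "housewives and retired": 37, "others": 21},
)
print(sex)
print(employment)
```

The function only accepts shares that sum to exactly 100, so rounded percentages need to be reconciled before they are passed in.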
STEFANIA BORTOLOTTI, MARCO CASARI and FRANCESCA PANCOTTO *
* This paper is part of a larger research project that also includes
Maria Bigoni and Diego Gambetta, who contributed to the design of
the experiment with the general population. We thank Maria Bigoni also
for her active help in running the sessions and programming the
software. The authors thank Michele Belot, Peter Martinsson,
participants in the seminar held at the University of Modena and Reggio
Emilia, the IMEBE 2013 meeting in Madrid, the 2013 Firenze Experimental
Economics workshop, and the 2014 AEW meeting in Rome for their helpful
comments and suggestions on previous versions of this paper. We
gratefully acknowledge the financial support of the ERC Starting Grant
Strangers 241196. The usual disclaimer applies.
Bortolotti: Research Fellow, Department of Economics, University of
Bologna, Bologna 40126, Italy. Phone +39 051 209 8135, Fax +39 051 209
8143, E-mail stefania.bortolotti@unibo.it
Casari: Professor, Department of Economics, University of Bologna,
40126 Bologna, Italy. Phone +39 051 209 8662, Fax +39 051 209 8493, E-mail
marco.casari@unibo.it
Pancotto: Associate Professor, Department of Communication and
Economics, University of Modena and Reggio Emilia, Reggio Emilia 40121,
Italy. Phone +39 0522 523 264, Fax +39 0522 523 205, E-mail
francesca.pancotto@unimore.it
TABLE 1
Studies Comparing College Students with
the General Population
Two distinct samples    Stratified Sample    Same Procedures for Both Samples    Lotteries    Dictator
This study    Y    Y(lab)    1
Cappelen et al. (2010)    Y    Y(lab)    1
Bellemare and Kroger (2007)    Y    N
Falk, Meier, and Zehnder (2012)    Y    Y(mail)
Carpenter, Connolly, and Myers (2008)    N *    N    1
Gachter, Herrmann, and Thoni (2003)    N *    N
Gachter, Herrmann, and Thoni (2004)    N *    N
Belot, Duch, and Miller (2010)    N    Y(lab)    1    1
Bosch-Domenech et al. (2002)    N    N
Only one sample
Harrison, Lau, and Williams (2002)    Y    Interview    1
Ermisch et al. (2009)    Y    Interview
Exadaktylos, Espin, and Branas-Garza (2013)    N *    Interview    1
Bellemare, Kroger, and Van Soest (2008)    N *    Internet    1
Bellemare, Kroger, and Van Soest (2011)    N *    Internet
Dohmen et al. (2008)    N *    Interview
Egas and Riedl (2008)    N *    Internet
Two distinct samples    Ultimatum    PGG    PGG w/pun    Trust Game    Beauty Contest
This study    R    R
Cappelen et al. (2010)    R
Bellemare and Kroger (2007)    1
Falk, Meier, and Zehnder (2012)    1
Carpenter, Connolly, and Myers (2008)
Gachter, Herrmann, and Thoni (2003)    1
Gachter, Herrmann, and Thoni (2004)    1
Belot, Duch, and Miller (2010)    R    1    1
Bosch-Domenech et al. (2002)    1
Only one sample
Harrison, Lau, and Williams (2002)
Ermisch et al. (2009)    1
Exadaktylos, Espin, and Branas-Garza (2013)    1    1
Bellemare, Kroger, and Van Soest (2008)    1
Bellemare, Kroger, and Van Soest (2011)    1
Dohmen et al. (2008)    1
Egas and Riedl (2008)    R    R
Two distinct samples    Country
This study    Italy
Cappelen et al. (2010)    Norway
Bellemare and Kroger (2007)    Netherlands
Falk, Meier, and Zehnder (2012)    Switzerland
Carpenter, Connolly, and Myers (2008)    United States
Gachter, Herrmann, and Thoni (2003)    Russia
Gachter, Herrmann, and Thoni (2004)    Russia
Belot, Duch, and Miller (2010)    UK
Bosch-Domenech et al. (2002)    Germany/US/Spain
Only one sample
Harrison, Lau, and Williams (2002)    Denmark
Ermisch et al. (2009)    Britain
Exadaktylos, Espin, and Branas-Garza (2013)    Spain
Bellemare, Kroger, and Van Soest (2008)    Netherlands
Bellemare, Kroger, and Van Soest (2011)    Netherlands
Dohmen et al. (2008)    Germany
Egas and Riedl (2008)    Netherlands
Notes: We consider a sample to be stratified if it has been
selected according to prespecified categories and target
quotas. N * indicates a representative sample that has not
been selected ex ante according to target quotas. In the
cells relative to each task, 1 indicates a one-shot game and
R a repeated game.
TABLE 2
Sociodemographic Characteristics of the Two
Samples
Representative Student
Sample Sample
Male 51.5% 55.8%
Age
18-39 24.3% 95.2%
40-59 44.7% 4.8%
60 or above 31.1% 0.0%
Employment status
Employed 47.6% 8.6%
Unemployed 10.7% 7.7%
Students 13.6% 82.7%
Housewife or retired 28.2% 1.0%
Education level
8th grade or lower 18.5% 1.0%
High school 47.5% 55.8%
College, Master, or PhD 34.0% 43.3%
Rootedness
Elementary school in the region (county) 86.4% 97.1%
Mother born in the region (county) 69.9% 72.1%
Father born in the region (county) 63.1% 70.2%
Sessions
Dates (dd/mm/yyyy) 02/03/2011 23/02/2011
04/03/2011 24/03/2011
05/03/2011 24/03/2011
01/10/2011 16/06/2012
No. of participants 108 104
Notes: Self-reported answers from a post-experimental
computerized questionnaire. Owing to a software failure,
questionnaire answers for one Representative session
(02/03/2011) were collected via phone a few weeks after the
session. Five participants did not answer the phone; as a result,
for the representative sample, questionnaire data are available
for 103 of 108 subjects.
TABLE 3
Average Contributions to the Public Good
All Periods First Period
PGG Representative Student Representative Student
Standard 9.11 6.79 11.04 10.54
Punishment 8.88 12.77 9.07 10.52
Note: Average individual contributions to the public good,
divided by subject pool and stage game.
TABLE 4
Treatment Effect on Contributions
Dependent Variable: Contribution
PGG-Standard
Model 1 Model 2 Model 3
Representative 0.842 *** 0.676 *** -0.117
sample (0.252) (0.261) (0.327)
Low understanding -0.018 -0.019
(0.337) (0.362)
4 or more mistakes 0.576 **
(0.284)
Period -0.365 ***
(0.033)
Period x 0.237 ***
Representative (0.044)
No. of 1696 1696 1696
observations
Dependent Variable: Contribution
PGG-Punishment
Model 4 Model 5 Model 6
Representative -1.487 *** -1.631 *** -0.573
sample (0.309) (0.326) (0.367)
Low understanding -0.506 -0.527
(0.412) (0.421)
4 or more mistakes 0.500
(0.352)
Period 0.192 ***
(0.032)
Period x -0.211 ***
Representative (0.043)
No. of 1696 1696 1696
observations
Note: Ordered logit regression on individual contribution
levels, individual-level random effects.
*** and ** indicate significance at the 1%
and 5% level, respectively.
TABLE 5
Conditional Cooperation and Observed Contributions
PGG-Standard
Dependent Variable: Representative Students Pooled Sample
Contribution Model 1 Model 2 Model 3
Others' contributions 0.004 0.030 *** 0.035 ***
in t - 1 (0.006) (0.007) (0.006)
Period -0.099 *** -0.284 *** -0.173 ***
(0.037) (0.045) (0.028)
Low understanding 0.560 -0.520 0.072
(0.475) (0.590) (0.370)
Representative sample 1.760 ***
(0.356)
Others' -0.034 ***
contribution in t - 1
X Representative (0.009)
No. of observations 756 728 1484
PGG-Punishment
Dependent Variable: Representative Students Pooled Sample
Contribution Model 4 Model 5 Model 6
Others' contributions 0.021 *** 0.042 *** 0.043 ***
in t - 1 (0.007) (0.007) (0.007)
Period -0.012 0.071 * 0.022
(0.036) (0.042) (0.027)
Low understanding -0.284 -0.796 -0.515
(0.564) (0.710) (0.446)
Representative sample -0.693
(0.458)
Others' -0.021 **
contribution in t - 1
X Representative (0.010)
No. of observations 756 728 1484
Note: Ordered logit regression on cooperation levels with
individual random effects and robust standard errors
(in parentheses).
***, **, and * indicate significance at the 1%,
5%, and 10% level, respectively.
TABLE 6
Received Punishment by Contribution Level
Dependent Variable: Deductions
Assigned (1 = Yes; 0 = No)
[x.sub.i] = 0 [x.sub.i] = 6
Model 1 Model 2
Representative -1.234 *** -0.364
sample (0.454) (0.354)
Low understanding -0.837 -0.322
(0.586) (0.473)
No. of observations 1074 1239
Dependent Variable: Deductions
Assigned (1 = Yes; 0 = No)
[x.sub.i] = 14 [x.sub.i] = 20
Model 3 Model 4
Representative -0.073 1.324 **
sample (0.393) (0.517)
Low understanding 0.263 1.825 ***
(0.522) (0.648)
No. of observations 1344 1431
Dependent Variable: Deductions
Assigned (1 = Yes; 0 = No)
[x.sub.i] = 0 [x.sub.i] = 6
Model 1 Model 2
Representative -1.089 ** -0.404
sample (0.473) (0.371)
Four or more -0.541 0.160
mistakes (0.494) (0.397)
No. of observations 1074 1239
Dependent Variable: Deductions
Assigned (1 = Yes; 0 = No)
[x.sub.i] = 14 [x.sub.i] = 20
Model 3 Model 4
Representative -0.234 0.991 *
sample (0.416) (0.545)
Four or more 0.541 1.179 **
mistakes (0.452) (0.583)
No. of observations 1344 1431
Note: Logit regression on assigned punishment, with
individual-level random effects.
***, **, and * indicate significance at the 1%, 5%, and
10% level, respectively.
TABLE 7
Treatment Effect on Punishment
Dependent
Variable: Pooled
Deduction Points Representative Students sample
Received Model 1 Model 2 Model 3
Others' -0.019 *** -0.033 *** -0.032 ***
contributions (0.006) (0.007) (0.007)
Positive deviation 0.104 *** 0.059 *** 0.062 ***
(0.016) (0.023) (0.022)
Negative 0.209 *** 0.356 *** 0.341 ***
deviation (abs) (0.019) (0.022) (0.018)
Period -0.033 -0.019 -0.027
(0.027) (0.031) (0.020)
Representative 0.469
Sample (0.365)
Others' contrib. x 0.012
Representative (0.009)
Pos. deviation x 0.041
Representative (0.027)
Neg. deviation x -0.125 ***
Representative (0.024)
No. of observations 864 832 1696
Notes: Ordered logit regression on deduction points
received, individual-level random effects. Negative deviation
is the absolute value of the deviation of a subject's contribution
level with respect to the average contribution of the others
in her group, in the case that the contribution falls short of the
average, and 0 otherwise. Positive deviation takes values other
than 0 when a subject's contribution is larger than the average
contribution of the others.
*** indicates significance at the 1% level.
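The two deviation regressors defined in the notes to Table 7 can be written out as a short sketch. The function name and the example group of three other members are hypothetical; only the definitions themselves come from the notes.

```python
def deviations(own: float, others: list) -> tuple:
    """Return (average of others, positive deviation, negative deviation in
    absolute value) for one subject in one period.

    Positive deviation is own minus the others' average when own is larger,
    and 0 otherwise. Negative deviation (abs) is the shortfall relative to
    the others' average when own is smaller, and 0 otherwise."""
    avg_others = sum(others) / len(others)
    gap = own - avg_others
    positive = max(gap, 0.0)
    negative_abs = max(-gap, 0.0)
    return avg_others, positive, negative_abs


# Hypothetical groups; contributions are drawn from the levels 0, 6, 14, 20.
low_contributor = deviations(6, [14, 20, 14])
high_contributor = deviations(20, [0, 6, 6])
print(low_contributor)
print(high_contributor)
```

At most one of the two deviation variables is nonzero for any observation, so their coefficients separately capture punishment of below-average and above-average contributors.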
TABLE 8
Variation in Contribution Levels and Punishment:
High versus Low Contributors.
Dependent Variable: Delta Contributions
(1 if |[Give.sub.t] - [Give.sub.(t-1)]| > 0)
Contributes 0 in t - 1
Representative Students All Samples
Model 1 Model 2 Model 3
Deduction received in -0.101 0.453 ** 0.428 **
t - 1 (0.141) (0.230) (0.213)
Period 0.048 0.038 0.039
(0.103) (0.165) (0.087)
Low understanding 0.491 0.085 0.324
(0.747) (1.072) (0.612)
Representative sample 2.302 **
(1.112)
Deduction received in -0.534 **
t - 1 x (0.258)
Representative
No. of observations 203 111 314
Dependent Variable: Delta Contributions
(1 if |[Give.sub.t] - [Give.sub.(t-1)]| > 0)
Contributes 20 in t - 1
Representative Students All Samples
Model 4 Model 5 Model 6
Deduction received in 0.293 0.806 *** 0.953 ***
t - 1 (0.311) (0.204) (0.218)
Period 0.2 -0.009 0.064
(0.143) (0.092) (0.078)
Low understanding -0.261 1.060 * 0.658
(1.325) (0.623) (0.622)
Representative sample 2.65 ***
(0.559)
Deduction received in -0.747 **
t - 1 x (0.333)
Representative
No. of observations 136 276 412
Notes: Logit regression on variation in cooperation levels
with individual random effects and clusters at the session
level. The dependent variable takes value 1 if contributions
in t and t - 1 are not identical and 0 otherwise. Models 1 to
3 consider subjects who contributed 0 in t - 1. Models 4 to 6
consider subjects who contributed 20 in t - 1.
***, **, and * indicate significance at the 1%, 5%, and 10%
level, respectively.
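The construction of the dependent variable and of the two subsamples described in the notes to Table 8 can be sketched as follows. The function name, the field names, and the sequences of contributions and deductions are hypothetical and serve only to illustrate the definitions.

```python
def build_rows(contributions: list, deductions: list, prev_level: int) -> list:
    """Build one row per period t >= 2 in which the subject contributed
    prev_level in t - 1. 'changed' is 1 if the contributions in t and t - 1
    are not identical and 0 otherwise."""
    rows = []
    for t in range(1, len(contributions)):
        if contributions[t - 1] == prev_level:
            rows.append(
                {
                    "period": t + 1,  # periods are numbered from 1
                    "deduction_prev": deductions[t - 1],
                    "changed": int(contributions[t] != contributions[t - 1]),
                }
            )
    return rows


# Hypothetical history for one subject over six periods.
contributions = [0, 0, 6, 20, 20, 14]
deductions = [3, 0, 1, 0, 2, 0]

rows_zero = build_rows(contributions, deductions, prev_level=0)
rows_full = build_rows(contributions, deductions, prev_level=20)
print(rows_zero)
print(rows_full)
```

Rows with prev_level equal to 0 correspond to the sample used in Models 1 to 3, and rows with prev_level equal to 20 to the sample used in Models 4 to 6.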