A consistent method for calibrating contingent value survey data.
Mansfield, Carol
1. Introduction
Contingent value (CV) surveys are used to estimate the economic value
of nonmarket goods, especially environmental goods. A major concern with
CV surveys is the potential for what have loosely been called
hypothetical and strategic biases in the answers to CV questions. For a
variety of reasons, often individual specific, a respondent's
answer to a CV question may differ from his or her true value for the
good. To address the potential problem of inaccurate bids, the initial
version of the proposed rules for the Oil Pollution Act of 1990 called
for all CV values to be divided in half.(1) The provision was intended
as a challenge to practitioners to develop a method for calibrating the
data from CV surveys.
This paper outlines a statistical method for calibrating the data
from CV surveys derived from the assumption that individuals make
constrained utility maximizing decisions. The method allows us to
determine the influence of individual characteristics on bias, as
distinct from their influence on the preference parameters. To
illustrate the logic of this approach, a specific functional form for
individual preferences was used to derive closed-form analytical
expressions for an individual's willingness-to-pay (WTP) and
willingness-to-accept (WTA). These functions allow systematic deviations
in individual responses to be explicitly modeled by providing a
structural interpretation of the error term. The framework is
appropriate for both open-ended and dichotomous choice data.
The random utility model framework focuses attention on the error
term, specifically on the possibility that there is an
individual-specific, systematic component to the error term that is
related to bias in CV responses. Evidence from experimental economics
and psychology suggests that different respondents may react differently
to the same survey question or laboratory experiment. Some elements of
this reaction may be correlated with observable characteristics, such as
education or age, while other elements will appear random to the
researcher. Thus, it is important for calibration techniques to allow
for the influence of individual characteristics on the existence and the
size of any bias in CV responses.
We illustrate this approach with three CV data sets using data from
both open-ended and dichotomous choice responses. The particular
applications were selected because comparable sets of laboratory or
simulated market data exist for each of the three CV data sets. This
allows a comparison between the results from the proposed calibration
model and the laboratory or simulated market data. Ideally, the
calibrated CV results should be compared to values from actual market
transactions rather than the results from laboratory or simulated market
experiments, which may also be biased. The calibration model proposed in
this paper can be applied to simulated market data as easily as CV data,
and for two of my data sets, I am able to estimate whether the
simulated market data suffer from biased responses as well.
The approach derived here does not require additional data beyond the
CV survey itself to implement; thus, it can be used to calibrate data
measuring use or nonuse values. Other calibration techniques for CV data
require actual market data from weakly complementary goods or the
identification of a surrogate market good for the nonmarket good valued
in the CV survey.(2) But in many cases, especially where nonuse values
are important, it may be impossible, if not contradictory, to define the
appropriate set of weakly complementary market goods. Instead, we
interpret the task of developing a calibration model for CV responses as
a logical problem that considers whether there are sufficient model
restrictions and sample information to identify the preference
parameters and distinguish them from sources of bias in CV responses.
While the choice of functional form for utility is an important
maintained assumption conditioning the results derived from this
approach to calibration, similar issues have been routinely addressed in
modeling consumer demand. A great deal of work in demand analysis has
focused on developing statistical models that can be used to test demand
theory. In much of this work, researchers have been forced to make
assumptions about the functional form for either the direct or the
indirect utility functions. Even when the functional forms are
restrictive, the resulting estimates can be informative and provide a
foundation for future research (see Deaton [1986] for a review).
Furthermore, failure to account for response bias when estimating bid
functions for CV data will yield parameter estimates that are a
composite of preference and response bias effects. Because response
biases may be positive or negative, the parameter estimates will be
difficult to interpret.
This paper is organized as follows. Section 2 develops a model of CV
bids and derives structural equations for WTP and WTA. The calibration
model is applied to the three data sets in section 3, and the results
are discussed in section 4. Section 5 contains ideas for further
research.
2. A Model for Calibration
A typical CV survey describes an environmental good and then proposes
a change for better or worse in some feature of that good. The
respondents are asked to decide how much they will pay for the
improvement or how much compensation they require if the change is for
the worse. Within this framework, assume that each respondent receives
utility from two goods: the environmental good (E), which has two
levels, E_high and E_low, and a Hicksian composite good, represented by
income (Y). An individual's true maximum willingness-to-pay, wtp_i, and
minimum willingness-to-accept, wta_i, then satisfy the equalities

U(Y_i - wtp_i, E_high; X_i) = U(Y_i, E_low; X_i)  (1)

U(Y_i + wta_i, E_low; X_i) = U(Y_i, E_high; X_i),  (2)

where X_i is a vector of individual characteristics and attitudes.
For a given utility function with parameter vector β, these
equations can be solved explicitly for wtp_i or wta_i as

wtp_i = f(Y_i, E_high, E_low, β(X_i); X_i)  (3)

wta_i = g(Y_i, E_high, E_low, β(X_i); X_i).  (4)
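Although the paper works with closed-form solutions, the implicit definitions in Equations 1 and 2 can also be solved numerically for an arbitrary utility specification. The sketch below does this with a root finder; the CES functional form and every parameter value are hypothetical, chosen only for illustration:

```python
# Solving the indifference conditions of Eqs. 1-2 numerically for an
# arbitrary utility function U(y, e). Illustrative only; the CES form
# and all parameter values below are hypothetical.
from scipy.optimize import brentq

def wtp(U, y, e_low, e_high):
    """Maximum payment leaving the respondent indifferent (Eq. 1)."""
    target = U(y, e_low)
    return brentq(lambda p: U(y - p, e_high) - target, 0.0, y - 1e-6)

def wta(U, y, e_low, e_high):
    """Minimum compensation leaving the respondent indifferent (Eq. 2)."""
    target = U(y, e_high)
    return brentq(lambda a: U(y + a, e_low) - target, 0.0, 1e9)

# A CES utility function U = (a*y^r + (1-a)*e^r)^(1/r), hypothetical values
a, r = 0.9, 0.5
U = lambda y, e: (a * y**r + (1 - a) * e**r) ** (1 / r)

p_hat = wtp(U, 2000.0, 200.0, 225.0)   # monthly income 2000, 200 -> 225 trees
a_hat = wta(U, 2000.0, 200.0, 225.0)
```

The brackets passed to the root finder reflect the economics: WTP cannot exceed income, and utility is monotone in the payment, so each equation has a unique root.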
Most CV studies rely on bid functions assumed to be linear in
observed characteristics.(3) By selecting a specific functional form for
utility, explicit closed-form solutions for WTP and WTA can be derived.
These structural equations will allow me to decompose the
individual's bid into a preference-based component that is my
estimate of WTP and WTA and a bias term that identifies systematic
deviations from the assumptions of the model.
I chose to estimate the equations based on a random utility model
(RUM) where WTP (or WTA) is treated as a random variable. The random
utility approach to analyzing CV data was popularized by Hanemann
(1984). From Hanemann and Kanninen (in press), "one wants to
formulate a statistical model for the CV responses that is consistent
with an economic model of utility maximization" (p. 4). The
calibration equations developed below exploit the link between the
economic and statistical models that is the foundation of the RUM
framework to identify deviations in CV bids from true WTP or WTA. My
model uses the direct utility function rather than the indirect utility
function (Hanemann 1984) or a variation function (McConnell 1990)(4)
because the calibration method was developed in conjunction with efforts
to estimate the parameters of utility functions associated with WTP and
WTA responses (Mansfield, in press). However, one could develop similar,
closed-form solutions for WTP and WTA from indirect utility functions.
Two sources of error, systematic and random, may cause an
individual's bid to differ from the amount he or she would actually
pay for the good if a market existed. Systematic over- or understatement
of WTP and WTA might occur due to factors such as the amount of time the
individual has to answer the question, the wording of the survey, or the
structure of the experiment. Hoehn and Randall (1987) and Crocker and
Shogren (1991) develop theoretical models of CV bids that predict
deliberate over- or understatement of WTP.(5) In addition, there is a
large literature base on the incentive properties of various survey and
experimental formats and the likelihood for strategic behavior.(6) For
example, Bohm (1984) hypothesizes that individuals who favor the action
proposed in the CV survey might purposely inflate their WTP if they did
not believe the survey would actually be used to determine the amount
they had to pay. Horowitz (1993) discusses the potential for
misunderstandings between the analyst and the respondent to contribute
to systematic bias in responses.
Furthermore, whether and by how much individual bids differ from
their true value will depend on the respondents' characteristics,
attitudes, and interpretation of the survey. Evidence that individuals
will react differently to identical incentive schemes can be found in
experiments such as Andreoni (1995) on the provision of public goods.
Herriges and Shogren (1996) found that local residents and
recreationists exhibited different anchoring behavior in a survey
valuing water quality improvements in an Iowa lake. Studies from the
psychology literature, reviewed in Krosnick (1991), indicate that the
response strategy an individual uses to answer a survey question may be
a function of his or her personal characteristics.
An ad hoc linear specification does not allow the analyst to
distinguish the influence of a characteristic, such as education, on
preferences from the influence of that characteristic on the propensity
of respondents to systematically inflate or deflate their CV bids.
Because individual characteristics and attitudes may affect both
preferences and the response strategy an individual adopts (e.g.,
systematic overstatement), I attempt to decompose the influences of
respondent attributes on these two elements of a CV bid.
Beyond these systematic influences, the CV bids will be subject to
random error due to unobserved heterogeneity in the respondents and the
inability of the variables available to the analyst to perfectly measure
the pertinent attitudes and characteristics of the respondents.
To capture both the systematic and random variations in individual
bids, a systematic bias parameter, C^P or C^A, is added to
Equations 3 and 4 along with a random error term. Equations 5-8 are
the bid functions derived from three utility functions: CES, constant
relative risk aversion (CRRA), and Cobb-Douglas.

CES: [Mathematical Expression Omitted]  (5)

CRRA: [Mathematical Expression Omitted]  (6)

Cobb-Douglas (U = α ln Y + (1 - α) ln E):

WTP_i = C^P(X_i) + Y_i - exp(ln(Y_i) + [(1 - α(X_i))/α(X_i)](ln(E_low) - ln(E_high))) + μ  (7)

WTA_i = C^A(X_i) + exp(ln(Y_i) + [(1 - α(X_i))/α(X_i)](ln(E_high) - ln(E_low))) - Y_i + ε  (8)

In each equation, C^P or C^A measures the systematic component of the
error term, while μ and ε, which have means of zero, measure the
random component. The parameters of the utility function determine the
characteristics of the respondent's preferences, and all of the
parameters can be specified as functions of individual characteristics
and attitudes. For example, in the CES utility function, the income
elasticity equals 1/ρ, while for the CRRA function, the income
elasticity is a function of λ and θ.
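The Cobb-Douglas bid functions of Equations 7 and 8 can be coded directly and checked against the indifference conditions of Equations 1 and 2. In the sketch below the bias and error terms are set to zero and the parameter values are hypothetical:

```python
# The closed-form Cobb-Douglas bid functions of Eqs. 7-8, with the
# systematic bias and random error terms set to zero. A sketch only;
# parameter values are hypothetical.
import math

def wtp_cd(y, e_low, e_high, alpha, c_p=0.0):
    """WTP = C^P + Y - exp(ln Y + [(1-a)/a](ln E_low - ln E_high))."""
    return c_p + y - math.exp(
        math.log(y) + (1 - alpha) / alpha * (math.log(e_low) - math.log(e_high)))

def wta_cd(y, e_low, e_high, alpha, c_a=0.0):
    """WTA = C^A + exp(ln Y + [(1-a)/a](ln E_high - ln E_low)) - Y."""
    return c_a + math.exp(
        math.log(y) + (1 - alpha) / alpha * (math.log(e_high) - math.log(e_low))) - y

# Verify against the indifference conditions (Eqs. 1-2):
alpha, y, e_low, e_high = 0.9, 2000.0, 200.0, 225.0
u = lambda yy, e: alpha * math.log(yy) + (1 - alpha) * math.log(e)
p = wtp_cd(y, e_low, e_high, alpha)
w = wta_cd(y, e_low, e_high, alpha)
assert abs(u(y - p, e_high) - u(y, e_low)) < 1e-9
assert abs(u(y + w, e_low) - u(y, e_high)) < 1e-9
```

The assertions confirm that the closed forms leave the respondent exactly indifferent, which is the defining property of WTP and WTA.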
3. Three Applications of the Calibration Model Using CV Data
This section describes the results from three applications of the
calibration model. The first data set is from Brookshire and Coursey
(1987) valuing the density of trees in a neighborhood park. The authors
conducted an open-ended CV survey and a laboratory experiment for the
commodity. The other two CV data sets are WTP and WTA dichotomous choice
data from a study by Bishop, Heberlein, and Welsh (BHW) valuing deer hunting permits (see Welsh 1986; Bishop and Heberlein 1990). In this
experiment, data from dichotomous choice CV bids were compared to bids
from identical SM experiments for a special one-day permit to hunt deer
in the Sandhill Wildlife Demonstration Area in Wisconsin prior to the
opening of the official Wisconsin hunting season.
Calibrating Open-Ended WTP Data
A list of the variables and summary statistics for the Brookshire and
Coursey CV and lab experiment data can be found in Table 1. The WTP
question asked the respondents their WTP for two increases in the number
of trees planted in a new neighborhood park - from 200 to 225 and then
250. The first component of the experiment consisted of a CV survey and
a modified Smith auction (without actual payment of bids) conducted
through door-to-door in-person interviews. I pooled these responses into
one data set with 170 observations.(7) The second component of their
analysis consisted of a laboratory experiment involving 27 respondents
conducted at a local school using the modified Smith auction with up to
five repeated trials.(8) The laboratory data provide a benchmark for the
calibration.(9)
As discussed in section 2, the systematic bias coefficient may be
determined by features of the experiment itself, in addition to
characteristics of the individual respondents. Unfortunately, because
the Brookshire and Coursey data set includes a limited number of
variables, this application is presented simply to illustrate one use of
this calibration model.
Table 1. Brookshire and Coursey WTP Data Set Means and Standard
Deviations (in Parentheses)
Variable Description CV Survey Lab Experiment
INCOME Monthly, after tax 2241.35 2196.22
(776.77) (717.10)
PARKVIEW Dummy = 1 if full or 0.31 0.52
partial view of park (0.46) (0.51)
HHSIZE Household size 3.21 3.07
(1.22) (1.07)
SCHOOL Dummy = 1 for child 0.47 0.44
attending elementary (0.50) (0.51)
school next to park
FINCOLLEGE Dummy = 1 if graduate 0.32 0.33
from college (0.47) (0.48)
GRADSCH Dummy = 1 if attended 0.19 0.41
some graduate school (0.39) (0.50)
or received advanced
professional or
technical degree
BID 17.15 8.48
(19.96) (10.02)
MEDIAN BID 10 5
N 170 27
Table 2 reports the results for the CV WTP data from three
specifications derived from CES utility (Eqn. 5) and one specification
from CRRA utility (Eqn. 6). Here α, ρ, λ, θ, and C^P are modeled as
linear combinations of the observed characteristics.(10) Several
specifications are reported to illustrate the impact of changing the
specification on the systematic bias coefficients and the resulting
calibrated bids.(11)
Since the aim of this exercise is to predict true WTP, it might be
more appropriate to judge specifications using a test based on
mean-squared error or a noncentral F-test rather than on the standard
F-test. Several tests of this type are described in Wallace (1972) and
Wallace and Toro-Vicarrondo (1969), including a test that focuses on
forecasting the conditional mean of the dependent variable rather than
the vector β. For example, in Table 2, one could compare the
restricted model CES (3) to either CES (1) or CES (2). Using CES (1) as
the unrestricted model, the F-statistic is 0.32 with (4, 153) degrees of
freedom. Both the standard F-test and an F-test with noncentrality
parameter of one-half fail to reject CES (3).
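The comparison just described can be sketched as follows. The F-statistic of 0.32 with (4, 153) degrees of freedom is the one reported above; the 5% level and the use of scipy are assumptions of the sketch:

```python
# Judging the restricted specification CES (3) against the unrestricted
# CES (1) with both the standard F-test and an F-test based on the
# noncentral F distribution with noncentrality parameter one-half, in
# the spirit of Wallace (1972). F-statistic and degrees of freedom are
# those reported in the text; the 5% level is an assumption.
from scipy.stats import f, ncf

F_stat, dfn, dfd = 0.32, 4, 153
crit_std = f.ppf(0.95, dfn, dfd)          # standard F critical value
crit_nc = ncf.ppf(0.95, dfn, dfd, 0.5)    # noncentral F critical value

fail_to_reject_std = F_stat < crit_std
fail_to_reject_nc = F_stat < crit_nc
```

Because the noncentral critical value exceeds the central one, a statistic that survives the standard test necessarily survives the noncentral test as well; here both fail to reject CES (3).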
[TABULAR DATA FOR TABLE 2 OMITTED]

The bias term C^P is positive for all the participants in the
CV survey regardless of which model is used, suggesting that the
participants systematically overstated their WTP. When the bias
parameter is not specified as a function of respondent characteristics,
the two different assumptions about utility, CES (3) and CRRA (4),
yield similar estimates of the systematic bias coefficient
(C^P intercept), about $10. As expected, given the limited quality of
the information available, none of the respondent characteristics used
to predict systematic bias in CES (1) and (2) in Table 2 are
significant.
[TABULAR DATA FOR TABLE 3 OMITTED]

To calibrate the data, I attempted to decompose the individual bids,
separating the portion of the bid that is related to preferences from
the portion that might be attributed to bias. In Table 3, the data are
calibrated by calculating the amount of bias in each individual answer,
Ĉ^P_i, and subtracting it from the amount the individual actually bid,
BID_i, to arrive at the calibrated bid, CALBID_i. So
CALBID_i = BID_i - Ĉ^P_i for each individual i in the experiment. The
first two rows of Table 3 present the mean and standard deviation of
the bids from the actual CV survey and lab experiment data. The other
four rows contain the mean and standard deviation of C^P_i and
CALBID_i from the four different models estimated in Table 2.(12)
If CALBID_i was less than zero, it was set equal to zero.
(Thus, the mean of the CALBIDs in Table 3 will not equal the mean
of BID minus the mean of C^P.) In models (1) and (2) of Table 3,
the mean value of the systematic bias term is greater than the actual
bids made by many of the participants. The high mean values for
C^P in models (1) and (2) may result from specifying C^P as
a function of respondent characteristics, none of which are significant
in Table 2 and all of which are positive.
The means of the CALBID_i's range from a low of $2.45 in
model (2) to a high of over $10 in models (3) and (4). (In models (3)
and (4), C^P was not specified as a function of any respondent
characteristics.) The actual mean bid from the lab experiment, $8.48,
falls within the range of estimated CALBIDs.
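The calibration step for open-ended data reduces to subtracting each respondent's fitted systematic bias from the observed bid and truncating at zero. The sketch below uses hypothetical bids, characteristics, and coefficients, not the estimates from Table 2:

```python
# The Table 3 calibration step: CALBID_i = max(0, BID_i - C^P(X_i)),
# with the systematic bias C^P linear in respondent characteristics.
# Bids, characteristics, and coefficients here are hypothetical.
import numpy as np

def calibrate(bids, X, c_coef):
    """Return calibrated bids and the fitted bias per respondent."""
    c_hat = X @ c_coef                 # fitted systematic bias C^P(X_i)
    return np.maximum(bids - c_hat, 0.0), c_hat

bids = np.array([17.0, 5.0, 40.0])
X = np.array([[1.0, 0.0],              # columns: intercept, a 0/1 dummy
              [1.0, 1.0],
              [1.0, 1.0]])
c_coef = np.array([8.0, 2.0])          # hypothetical bias coefficients
calbid, c_hat = calibrate(bids, X, c_coef)
```

Because of the truncation at zero, the mean of the calibrated bids need not equal the mean bid minus the mean bias, exactly as noted for Table 3.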
Calibrating Dichotomous Choice CV Survey Data
The Bishop, Heberlein, and Welsh experiment consisted of both WTP and
WTA CV surveys and simulated market (SM) surveys administered by mail.
The surveys valued special one-day deer hunting permits that are
distributed free each year to 150 hunters through a lottery held by the
Wisconsin Department of Natural Resources. Hunters who had lost the
lottery were sent the WTP questions. Half of these hunters were offered
a chance to actually purchase a deer hunting permit for a set price.
The other half received a similar hypothetical offer. The WTA questions
were sent to the lucky hunters who had won permits in the state
lottery. These hunters were offered a chance to sell their permits back
for a fixed price; again, half received a real offer while half
received a hypothetical offer. In all the experiments, the price of the
permit was varied over the sample.

[TABULAR DATA FOR TABLE 4 OMITTED]
In dichotomous choice questions, respondents are presented with a
proposed change in a good and then offered the option of either paying
or receiving some fixed amount of money to secure the change. The amount
of the offer is varied over the sample, and people must simply answer
yes or no to the CV question. Using dichotomous choice data, the RUM
framework is typically estimated using either a method suggested by
Hanemann (1984) or Cameron (see Hanemann and Kanninen, in press, p. 6). I
chose to use the method outlined in Cameron and James (1987) and Cameron
(1988), where the outcome of the choice process is treated as a random
variable.(13) However, in this case, the model predicts that an
individual will answer yes to a WTP question if the offered amount is
less than or equal to the individual's true WTP plus the systematic
bias term and an error term. Assuming the response error is
independently and normally distributed with mean zero, the resulting
likelihood function can be estimated using a maximum likelihood
technique.(14)
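This estimator can be sketched end to end on simulated data. The quality levels, sample design, and parameter values below are all hypothetical, and this is a sketch of the approach rather than the paper's estimation code; the key structural piece is the Cobb-Douglas WTP expression of Equation 7 inside a probit-style likelihood:

```python
# A minimal sketch of the Cameron-style estimator: respondent i answers
# "yes" iff the offered price t_i is at most WTP*_i + C + eps_i, with
# eps_i ~ N(0, sigma^2) and WTP*_i the Cobb-Douglas expression of
# Eq. 7. Data are simulated; all values are hypothetical.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

e_lo, e_hi = 1.0, 2.0   # hypothetical environmental quality levels

def wtp_star(alpha, y):
    """True WTP under Cobb-Douglas preferences (Eq. 7, no bias/error)."""
    return y - np.exp(np.log(y)
                      + (1 - alpha) / alpha * (np.log(e_lo) - np.log(e_hi)))

def neg_loglik(params, y, offers, yes):
    alpha, c, log_s = params
    s = np.exp(log_s)                      # keeps sigma positive
    p = norm.cdf((wtp_star(alpha, y) + c - offers) / s)
    p = np.clip(p, 1e-12, 1 - 1e-12)       # guard the logarithms
    return -np.sum(yes * np.log(p) + (1 - yes) * np.log(1 - p))

# Simulated sample; variation in income is what identifies alpha
# separately from the bias term C.
rng = np.random.default_rng(0)
y = rng.uniform(20_000, 60_000, 400)
offers = rng.uniform(5, 80, 400)
yes = (offers <= wtp_star(0.999, y) + 10 + rng.normal(0, 8, 400)).astype(float)

res = minimize(neg_loglik, x0=[0.998, 0.0, np.log(5.0)],
               args=(y, offers, yes), method="Nelder-Mead")
alpha_hat, c_hat, log_s_hat = res.x
```

A significant estimate of C would flag systematic bias in the yes/no responses, which is exactly the diagnostic applied to the BHW data below.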
[TABULAR DATA FOR TABLE 5 OMITTED]
Bid functions consistent with Cobb-Douglas preferences (Eqns. 7 and 8)
were used to construct the likelihood function.(15) Considering first
the CV WTP results in column 6 of Table 4, the model predicts that the
CV data are consistent with true WTP - the bias term C is insignificant.
Comparing the CV WTP results with the SM results in column 5 provides
support for this conclusion. In column 7, Equation 7 was estimated with
a data set combining the CV and SM WTP data. Comparing column 7 with
columns 5 and 6, I cannot reject the hypothesis that the data can be
jointly estimated at the 5% level using a likelihood ratio test. Not
surprisingly, the estimates for the expected value of WTP are quite
close: $31 in the SM and $35 in the CV survey (as calculated by the
survey authors; see Bishop and Heberlein 1990).
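The pooling comparison used here is a standard likelihood ratio test: twice the gap between the sum of the separate CV and SM log-likelihoods and the pooled log-likelihood is referred to a chi-squared distribution. The log-likelihood values and the number of restrictions below are hypothetical placeholders, not the estimates from Table 4:

```python
# Likelihood ratio test for pooling the CV and SM samples:
# LR = 2 * (ll_CV + ll_SM - ll_pooled) ~ chi-squared(k), where k is the
# number of parameters restricted to be equal across samples.
# The log-likelihoods and k below are hypothetical.
from scipy.stats import chi2

ll_cv, ll_sm, ll_pooled = -152.3, -148.9, -305.0   # hypothetical values
k = 4                          # hypothetical number of restrictions
LR = 2 * (ll_cv + ll_sm - ll_pooled)
p_value = chi2.sf(LR, df=k)
can_pool = p_value > 0.05      # fail to reject pooling at the 5% level
```

Failing to reject means the same preference and bias parameters fit both samples, which is the sense in which the WTP data "can be jointly estimated" above.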
The CV WTA estimates in column 3 of Table 4 suggest that the CV bids
overstate true WTA: the systematic bias coefficient is positive and
significant. Comparing the CV results with the SM results in column 2,
the estimated values of α remain comparable, but the bias term and
standard error (C and σ) are much larger in the CV estimation. In
this case, I can reject the hypothesis that the data can be jointly
estimated at any confidence level using a likelihood ratio test
comparing the results from the joint CV/SM data set in column 4 with
columns 2 and 3. In line with these results, the expected values of WTA
calculated by Bishop and Heberlein for the CV and SM experiments are not
nearly as close in value - $153 in the SM and $420 in the CV survey. One
disturbing result is the positive and significant systematic bias
coefficient in the SM WTA model. The SM experiment offered real cash
payments for the participants' deer hunting permits. Thus, one
might expect the bids from this experiment to reflect true WTA. In fact,
in the fuller specifications that include individual characteristics
presented in Table 6, the SM systematic bias coefficient is
insignificant, while the CV systematic bias coefficient remains
significant.
These preliminary estimates suggest that the CV WTP data do not need
to be calibrated for systematic bias, while the CV WTA data do. Table 5
lists the variables used to estimate the calibration equations along
with their means and standard deviations. The utility function
parameter α was specified as a function of the respondents' feelings
about deer hunting, a measure of the number of substitutes the hunters
felt they had for hunting, and the quality of deer hunting at Sandhill.
The systematic bias coefficient was modeled as a function of the
respondents' feelings about their right to hunt, their reactions to
the survey, and education.
[TABULAR DATA FOR TABLE 6 OMITTED]
Table 6 reports the results from re-estimating the WTA models in
Table 4, specifying the parameters as functions of the individuals'
observed characteristics. The CV and SM data sets were analyzed separately using three different specifications of the CV data and one
specification of the SM data. The last two columns of Table 6, CV WTA
(3) and SM WTA, have the same specification. These results confirm that
the CV bids overstated true WTA (C^A intercept is positive
and significant in CV WTA (3)). However, the model now suggests that
the SM data accurately represent true WTA (C^A intercept is not
significantly different from zero in the SM WTA results). When C^A
is specified as a function of individual characteristics and attitudes
in CV WTA (1) and (2), none of the systematic bias coefficient
estimates is individually significant. However, one cannot reject the
hypothesis that the systematic bias parameter estimates in CV WTA (2)
are jointly significant at the 10% level.(16) As for the specification
of α, EXQUALSH is significant in all three models, while FEEL 3 and
SUBST are significant in models (2) and (3).
Table 7. BHW Data Uncalibrated and Calibrated WTA Responses: Means and
Standard Deviations (in Parentheses)

                                          Variable(a)   Mean (SD)
Uncalibrated WTA (N = 68)                 PRED BID      799.80 (849.26)
Model (1)                                 C^A           191.41 (55.51)
  Calibrated WTA, CV Survey (N = 68)      PRED BID      160.32 (174.02)
Model (2)                                 C^A           183.19 (73.70)
  Calibrated WTA, CV Survey (N = 68)      PRED BID      190.76 (211.86)
Model (3)                                 C^A           184.14
  Calibrated WTA, CV Survey (N = 68)      PRED BID      195.23 (206.79)
Predicted WTA, SM Survey (N = 70)         PRED BID      110.94 (110.42)
(a) PRED BID was calculated for each individual from the equation

WTA = exp(log(Y) + [(1 - α)/α](log(E_high) - log(E_low))) - Y,

where α was specified as a linear function of an intercept, FEEL 3,
FEEL 8, REACT 11, SUBST, EXQUALSH, and BSTCHNC as in Table 6. PRED BID
(predicted WTA) was calculated for each respondent, and the mean is
presented in this table. The uncalibrated WTA is based on coefficients
from this model estimated without a systematic bias parameter (those
results are not presented). The calibrated results are based on the
coefficients in Table 6. C^A measures the amount of bias in each
individual's response. The value of C^A was calculated for each
individual using the parameter estimates from Table 6, and the mean
value of C^A over all individuals is presented here.
Calibrating the responses to the closed-ended questions is slightly
more difficult than for open-ended data because I do not have a direct
observation of the individual's minimum WTA. Rather, I infer
minimum WTA by estimating a WTA function. In this case, the calibrated
value for WTA (PRED BID) is calculated from the expression for WTA
derived from Cobb-Douglas preferences using the coefficient estimates
from Table 6. (See Table 7 for more details.) The PRED BID calculated
with the coefficients from Table 6 should represent true WTA because any
systematic bias in the CV bids, and thus in my estimates of the utility
function parameters, should be captured by the bias parameter.(17)
The means of all the individuals' values for C^A and PRED BID based on
the results in Table 6 are contained in Table 7.(18) The first row is
the predicted value of WTA based on estimates from a model that was
identical to the model in columns (1)-(3) in Table 6 except that it
contained no systematic bias coefficient. This provides an uncalibrated
estimate of WTA. The next rows contain, for each of the three CV
models, the bias term C^A and the calibrated values of WTA. Finally,
the last row contains predicted WTA for the SM data. According to this
model, most of the participants in the CV survey overstated their WTA:
the mean of the predicted WTA from the SM data, $110.94, is far lower
than the uncalibrated mean WTA of $799.80. The means of the PRED BIDs
from the three CV models (which range from $160 to $195) are lower than
the uncalibrated WTA but still higher than the mean of the PRED BID
from the SM estimate. Again, the three different CV models give the
reader a sense of how PRED BID and C^A vary under different
specifications.
4. Discussion
CV and SM responses from three data sets, Brookshire and
Coursey's WTP for trees and WTP and WTA values from BHW's deer
hunting permit data, have been considered in evaluating the calibration
model. The calibration model suggests that the Brookshire and Coursey
WTP CV bids and the BHW WTA CV bids were biased upwards, while the BHW
WTP CV bids were unbiased.
In the context of the literature on revealed preference methods, bids
from the BHW experiment for hunting permits should capture recreational
use value. In a meta-analysis of 287 benefit estimates, Walsh, Johnson,
and McKean (1990) compared the results from travel cost and WTP CV
estimates of recreational use value.(19) According to their analysis, CV
surveys produce lower values than travel cost models, but dichotomous
choice CV values are closer to the travel cost estimates than CV
estimates using an open-ended question format. Using data from studies
valuing a variety of quasi-public goods, Carson et al. (1996) examine
the ratio of WTP CV to revealed preference estimates. They found that
across 46 comparisons between CV and simulated market or experimental
data, including the WTP data from BHW, there was a close correspondence
between the values from the two methodologies.
For the BHW data, my calibration model predicted that neither the CV
nor the SM WTP results were biased. Thus, my results confirm the
findings of Carson et al. On the other hand, I estimated a substantial
upwards bias in the CV WTA results. The difficulty in measuring WTA
through CV surveys is well known, and neither meta-analysis included
data from WTA studies. The positive and significant bias term for the
WTA CV bids confirms the results found in other comparisons of WTA CV
and SM data, such as Fisher, McClelland, and Schulze (1988) (see
Mansfield, Van Houtven, and Huber [1997] for a discussion of the
difficulties in measuring WTA).
Respondents to the Brookshire and Coursey study were drawn from the
neighborhood surrounding the park, so their bids should reflect both
recreational use and aesthetic value for the extra trees. The results
from the calibration model predict that the WTP bids were biased
upwards. The study was not included in the meta-analysis on quasi-public
goods performed by Carson et al. (1996); however, the authors note that
"some CV estimates clearly exceed their revealed preference
counterparts, therefore one should not conclude that CV estimates are
always smaller than revealed preference estimates" (p. 93). Since
the bids from the simulated market experiment were lower than the bids
from the CV survey, the calibration model appears to have correctly
identified the upwards bias in the CV bids.
For this study, I deliberately chose data sets that included both CV
and SM components in order to provide a benchmark against which to judge
the calibrated CV results. Two issues related to this decision should be
emphasized. The first issue is that the SM bids themselves may not
accurately measure true WTP or WTA. The Brookshire and Coursey SM
experiments were conducted at the local high school, and it is unclear
how the participants interpreted the exercise. For example, it is
possible that the SM bids understated WTP if the respondents did not
believe that the money was actually going to be used to purchase
additional trees. Even the BHW SM experiment was probably considered
unusual, especially by the WTA respondents, since the yearly lottery for
permits had never before included opportunities to buy or sell the
permits. Thus, a comparison between the SM bids and the calibrated CV
bids is a test of convergent validity.(20)
The second issue relates to the application of this calibration
method to CV surveys that lack comparable simulated market or other data
against which to compare the calibration results, especially surveys
that measure primarily nonuse or existence values. The strength of this
calibration technique is that it does not require revealed preference or
experimental data to estimate bias and calibrate CV bids. However,
before the calibration technique can be applied in situations where
benchmark data do not exist, it must first be tested using data sets for
which benchmark data do exist, as in this paper. Tests such as these are
important for establishing the reliability of the technique and for
addressing issues such as appropriate functional form assumptions and
other specification issues.
Applying the lessons learned from these tests to other CV data sets
will require further assumptions about the way in which people respond
to various types of CV surveys. As with benefit transfer, the more
similarities that exist between the test data sets and the CV data that
need to be calibrated, the more confidence one might have in the
results.
Most, but not all, of the CV data sets for which complementary
benchmark data exist measure use value rather than nonuse value. In
order to apply the results from use value data to CV surveys measuring
nonuse value, assumptions must be made about the relationship between
use and nonuse values and the way people respond to the two types of CV
questions. Unfortunately, there is currently little agreement about
nonuse values, how they are formed, and their relationship to use
values. Thus, while this calibration method can be applied to any CV
data set, proper specification may be more difficult when a significant
portion of the value is nonuse.
5. Conclusions and Future Research
Despite the controversy surrounding CV surveys, they are often
employed to estimate the benefits of nonmarket environmental goods. The
results from CV surveys will vary in quality depending on the
circumstances of the survey implementation, including the expertise of
the analysts and the budget for the survey. The development of a
calibration technique for CV data would provide both a measure of the
reliability of the data and a way to adjust biased results.
The model proposed in this paper provides the basis for a simple and
inexpensive way of isolating bias and calibrating the responses from a
CV survey. The method does not require additional data beyond the CV
survey itself, allowing the calibration of both use and nonuse data.
Furthermore, whether an individual under- or overstates his or her bid
in a CV survey is related to the individual's characteristics and his or
her reaction to the format of the survey. This calibration method allows
me to separate out the effect of individual characteristics on
systematic bias from the effect of these characteristics on the
parameters of the utility function.
The challenge in developing techniques for calibrating CV data is
finding a benchmark against which to judge the results. To test this
calibration method, I used data sets for which laboratory or SM
benchmarks existed. This analysis suggests that only the BHW WTP CV data
produced unbiased values. In contrast, the calibration model predicts
that the responses from the other two CV surveys tested overstated true
WTP and WTA. For these data sets, the results of the calibration model
are encouraging: the calibration model corroborated the general pattern
observed from comparing the CV data with laboratory or SM data.
Further tests of this calibration model need not rely on data sets
that include SM or experimental components. The calibration model could
be tested using data from CV surveys that measured the value of the same
good with several different question formats. For example, suppose one
had data from two CV surveys measuring the value of the same good, an
open-ended and a dichotomous choice survey. Loosely speaking, if the
mean open-ended WTP was lower than the mean predicted WTP from the
dichotomous choice data, then the systematic bias parameter from the
calibration model should also be smaller for the open-ended data.
Studies such as this could help establish the reliability of the
calibration model.
As discussed above, the power of the calibration model could be
improved by a better understanding of how individuals answer CV
questions, including the traits or attitudes that inspire individuals to
give more or less accurate answers and variables that measure these
traits or attitudes. This is especially important for cases in which
benchmark data do not exist. Future research might use verbal protocols
or other debriefing techniques to develop more accurate models of
response behavior.
Finally, the choice of functional form is an important element of the
calibration model. To facilitate the estimation of more flexible
functional forms, future studies might also want to include more
variation in the bid space and in the attributes of the environmental
commodity.
I would like to thank William Evans for all his help. I am also
grateful to Maureen Cropper, John Horowitz, Glenn Harrison, Randall
Kramer, Kerry Smith, and two anonymous referees for their many useful
suggestions, and David Brookshire, Donald Coursey, Richard Bishop,
Thomas Heberlein, and Michael Welsh for supplying me with their data.
Any remaining errors are my own.
1 See Federal Register, vol. 59, no. 5 (January 7, 1994), p. 1146.
2 For example, see Blackburn, Harrison, and Rutstrom (1994), Cameron
(1992), and Eom and Smith (1994). One exception is work by Schulze,
McClelland, and Lazo (1994), who propose transforming the bids from
open-ended CV surveys with a Box-Cox specification until they fit a
normal distribution.
3 Exceptions include Hanemann (1984), Hoehn (1991), and Hoehn and
Loomis (1993). See also Hanemann and Kanninen (in press) for a
description of more general specifications.
4 McConnell (1990) develops the variation function as a change in the
expenditure functions.
5 Hoehn and Randall (1987) offer a model that predicts people will
understate WTP and overstate WTA when they lack time to think or a clear
definition of the commodity. Crocker and Shogren (1991) outline a model
in which the commodity is well defined but unfamiliar. Even with
adequate time to think about the question, the respondents need to
invest in learning about the unfamiliar good and thus will
systematically overstate WTP.
6 For a discussion of issues such as strategic behavior and free
riding in CV surveys, see, for example, the Winter 1994 issue of the
Natural Resources Journal.
7 No significant difference between the bids from the two experiments
was found. In the modified Smith auction, respondents were told the
total number of households who would be asked to contribute and the
total cost of the new trees. Three possible outcomes were explained to
each respondent. First, if the sum of the payments was less than the
cost, then the households paid nothing and no additional trees would be
planted. Second, if the sum of the payments equaled the cost, then each
household paid the amount they bid and the trees would be planted.
Finally, if the sum of the payments exceeded the cost, then each
household would pay a fraction of what they bid so that payments equaled
the cost of the new trees.
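The three possible outcomes of the modified Smith auction can be sketched as a short payment rule. This is a hypothetical illustration of the rule as described above; the function name and inputs are mine, not part of the original experiment:

```python
# Sketch of the modified Smith auction payment rule in footnote 7.
# Names and structure are illustrative; only the three outcomes
# (shortfall, exact cover, excess with proportional rebate) come
# from the description above.
def smith_auction_payments(bids, cost):
    """Return (payments, provided): what each household pays, and
    whether the public good (the new trees) is provided."""
    total = sum(bids)
    if total < cost:
        # Outcome 1: bids fall short; no one pays, trees are not planted.
        return [0.0] * len(bids), False
    if total == cost:
        # Outcome 2: bids exactly cover cost; each household pays its bid.
        return list(bids), True
    # Outcome 3: bids exceed cost; each household pays a fraction of its
    # bid so that collected payments exactly equal the cost.
    scale = cost / total
    return [b * scale for b in bids], True
```

For example, with bids of 10, 20, and 30 against a cost of 50, each household pays five-sixths of its bid and the collected payments sum exactly to the cost.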
8 During the lab experiment, the participants were divided into
groups and each individual was asked to write down his or her WTP. If the
sum of the group's WTP was greater than the cost of the additional
trees, the participants were required to pay the amount they bid. In the
actual experiment, payment was collected from only one of the groups; the
other groups did not collectively bid enough to cover the cost of the
extra trees and the experiment ended after five trials, as per the
instructions.
9 Unfortunately, the lab experiment data set was too small to estimate
the calibration model. To test the accuracy of the lab bids, I estimated
Equation 5 with [C.sup.P] specified as the function of an intercept term
and a dummy variable for participation in the lab experiment using a
data set combining the survey and lab experiment data. The results
suggest that the lab experiment bids were not subject to systematic
bias.
10 For example, in Equation 5 [Rho] = [[Rho].sub.intercept] +
[[Rho].sub.hhsize] * HHSIZE + [[Rho].sub.gradsch] * GRADSCH +
[[Rho].sub.school] * SCHOOL + [[Rho].sub.parkview] * PARKVIEW +
[[Rho].sub.fincollege] * FINCOLLEGE. Note that the parameters [Lambda]
and [Theta] have different interpretations than [Alpha] and [Rho], so
the coefficients on these parameters are not comparable.
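The linear parameterization of the bias term in this footnote can be written out as a small helper. Only the functional form (intercept plus coefficients times individual characteristics) comes from the footnote; the coefficient and characteristic values below are invented purely for illustration:

```python
# Illustration of the linear specification of the bias parameter rho in
# footnote 10: rho = intercept + sum of coefficients times characteristics.
# All numeric values here are made up for illustration only.
def rho(coeffs, characteristics):
    """coeffs and characteristics are dicts keyed by variable name."""
    return coeffs["intercept"] + sum(
        coeffs[name] * value for name, value in characteristics.items()
    )

example_coeffs = {"intercept": 0.2, "HHSIZE": 0.05, "GRADSCH": -0.1}
example_person = {"HHSIZE": 3, "GRADSCH": 1}
# rho for this person: 0.2 + 0.05*3 - 0.1*1 = 0.25
```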
11 Specifications of models (1) and (2) including
[[Alpha].sub.fincollege] were rejected due to multicollinearity
problems.
12 The values of [C.sup.P] and CALBID were calculated using
individual characteristics and the coefficient estimates in Table 2. The
means of the individual estimates are presented in Table 3 with their
standard deviations.
13 See McConnell (1990) for a discussion of assumptions about the
scale factor and the conditions under which Cameron and James'
model is identical to Hanemann's.
14 Specifically, the likelihood function was estimated using a
Newton-Raphson maximization technique and the covariance matrix was
estimated using the procedure of Berndt, Hall, Hall, and Hausman. See
Greene (1993) or Judge et al. (1985).
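As a minimal sketch of this estimation strategy, the following pairs a Newton-Raphson update with a BHHH covariance estimate, which replaces the Hessian with the sum of squared per-observation scores. It uses a one-parameter exponential likelihood for concreteness, not the paper's calibration likelihood:

```python
# Newton-Raphson MLE with a BHHH variance estimate, illustrated on an
# exponential model with rate lam (log-likelihood per observation:
# log(lam) - lam * x). This is a toy stand-in for the calibration
# likelihood in the paper, which is far more complex.
def fit_exponential(data, lam=0.5, tol=1e-10):
    for _ in range(100):
        score = sum(1.0 / lam - x for x in data)   # first derivative
        hess = -len(data) / lam ** 2               # second derivative
        step = score / hess
        lam -= step                                # Newton-Raphson update
        if abs(step) < tol:
            break
    # BHHH: variance estimated by inverting the sum of squared
    # per-observation scores, avoiding the analytic Hessian.
    bhhh_var = 1.0 / sum((1.0 / lam - x) ** 2 for x in data)
    return lam, bhhh_var
```

For exponential data the MLE is the reciprocal of the sample mean, which the iteration recovers in a handful of steps.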
15 The equations used in the likelihood function are highly nonlinear in the parameters to be estimated, and the CES and CRRA equations did
not converge. To facilitate the estimation of more complex models,
future studies could be designed with wider variation in the bid space
and the attribute space for the environmental commodity. The measure of
income used in the estimation also deserves more attention both in this
calibration model and in other demand models. Because willingness to pay is small in relation to total income, the statistical estimation process
could be improved if the analysis was based on some fraction of income.
For example, one could replace income with the household budget for
discretionary spending. Of course, determining such a budget is not a
trivial issue.
16 The test was made using a likelihood ratio test comparing CV WTA
(2) and CV WTA (3). The chi-square test statistic was 11.7 with 5
degrees of freedom.
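For reference, the p-value implied by this test statistic can be checked directly. The helper below is my own (not from the paper); it uses the closed-form chi-square upper tail for 5 degrees of freedom, obtained from the recurrence for the upper incomplete gamma function, and shows that 11.7 falls just inside the 5% rejection region:

```python
import math

def chi2_sf_5df(x):
    """Upper-tail probability of a chi-square with 5 degrees of freedom.

    Closed form: Q(x) = erfc(sqrt(t))
                 + (4 / (3 sqrt(pi))) * exp(-t) * (1.5 sqrt(t) + t**1.5),
    where t = x / 2.
    """
    t = x / 2.0
    return math.erfc(math.sqrt(t)) + (
        4.0 / (3.0 * math.sqrt(math.pi))
    ) * math.exp(-t) * (1.5 * math.sqrt(t) + t ** 1.5)

p_value = chi2_sf_5df(11.7)
# p is roughly 0.04, below the conventional 0.05 threshold
```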
17 Alternatively, the bids could be calibrated by first estimating
Equation 8 without a systematic bias term and using these estimates to
calculate an uncalibrated predicted WTA. Then the predicted WTA bid
could be calibrated by estimating [C.sup.A] in a separate equation and
subtracting it from the uncalibrated WTA. The same choice actually
exists for calibrating open-ended data. One could either subtract
estimated bias from the actual CV bid, as I did, or use the values of
[Alpha] and [Rho] from Table 2 to calculate a predicted WTP.
18 The values of [C.sup.A] and PRED BID are the means of all the
individual values. The individual values were calculated using the
coefficient estimates in Table 6 and the individual's
characteristics. [C.sup.A] and [Alpha] are linear combinations of the
variables in Table 6.
19 The analysis included variables to account for issues such as
whether the travel cost study included travel time, very general
measures of site quality, the type of CV question, and other
characteristics of the study.
20 Mitchell and Carson (1989) define convergent validity as "the
correspondence between a measure and other measures of the same
theoretical construct . . . . In convergent validity neither of the
measures is assumed to be a truer measure of the construct than the
other" (p. 204).
References
Andreoni, J. 1995. Cooperation in public goods experiments: Kindness and confusion. The American Economic Review 85:891-904.
Bishop, R. C., and T. A. Heberlein. 1990. The contingent valuation method. In Economic valuation of natural resources: Issues, theory, and
applications, edited by R. L. Johnson and G. V. Johnson. Boulder, CO:
Westview Press, pp. 81-104.
Blackburn, M., G. W. Harrison, and E. E. Rutstrom. 1994. Statistical
bias functions and informative hypothetical surveys. American Journal of
Agricultural Economics 76:1084-88.
Bohm, P. 1984. Revealing demand for an actual public good. Journal of
Public Economics 24:135-51.
Brookshire, D. S., and D. L. Coursey. 1987. Measuring the value of a
public good: An empirical comparison of elicitation procedures. American
Economic Review 77:554-65.
Cameron, T. A. 1988. A new paradigm for valuing non-market goods
using referendum data. Journal of Environmental Economics and Management
15:355-79.
Cameron, T. A. 1992. Combining contingent valuation and travel cost
data for the valuation of nonmarket goods. Land Economics 68:302-17.
Cameron, T. A., and M. D. James. 1987. Efficient estimation methods
for 'closed-ended' contingent valuation surveys. The Review of
Economics and Statistics 69:269-76.
Carson, R. T., N. E. Flores, K. M. Martin, and J. L. Wright. 1996.
Contingent valuation and revealed preference methodologies: Comparing
the estimates for quasi-public goods. Land Economics 72:80-99.
Crocker, T. D., and J. F. Shogren. 1991. Preference learning and
contingent valuation methods. In Environmental policy and the economy,
edited by F. J. Dietz, E van der Ploeg, and J. van der Straaten.
Amsterdam: North-Holland, pp. 77-94.
Deaton, A. 1986. Demand analysis. In Handbook of econometrics 3,
edited by Z. Griliches and M.D. Intriligator. Amsterdam: North-Holland,
pp. 1767-839.
Eom, S. Y., and V. K. Smith. 1994. Calibrated nonmarket valuation.
Unpublished paper, North Carolina State University.
Fisher, A., G. H. McClelland, and W. D. Schulze. 1988. Measures of
willingness to pay versus willingness to accept: Evidence, explanations
and potential reconciliation. In Amenity resource valuation: Integrating
economics with other disciplines, edited by G. L. Peterson, B. L.
Driver, and R. Gregory. State College, PA: Venture, pp. 127-34.
Greene, W. H. 1993. Econometric analysis. 2nd edition. New York:
Macmillan.
Hanemann, W. M. 1984. Welfare evaluations in contingent valuation
experiments with discrete responses. American Journal of Agricultural
Economics 66:332-41.
Hanemann, W. M., and B. Kanninen. 1998. The statistical analysis of
discrete response CV data. In Valuing environmental preferences: Theory
and practice of the contingent valuation method in the US, EC and
developing countries, edited by I. J. Bateman and K. G. Willis. Oxford:
Oxford University Press. In press.
Herriges, J. A., and J. F. Shogren. 1996. Starting point bias in
dichotomous choice valuation with follow-up questioning. Journal of
Environmental Economics and Management 30:112-31.
Hoehn, J. 1991. Valuing the multidimensional impacts of environmental
policy: Theory and methods. American Journal of Agricultural Economics
73:289-99.
Hoehn, J., and J. Loomis. 1993. Substitution effects in the valuation
of multiple environmental programs. Journal of Environmental Economics
and Management 25:56-75.
Hoehn, J., and A. Randall. 1987. A satisfactory benefit cost
indicator from contingent valuation. Journal of Environmental Economics
and Management 14:226-47.
Horowitz, J. 1993. A new model of contingent valuation. American
Journal of Agricultural Economics 75:1268-72.
Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lutkepohl, and T. C.
Lee. 1985. The theory and practice of econometrics. New York: John
Wiley.
Krosnick, J. A. 1991. Response strategies for coping with the
cognitive demands of attitude measures in surveys. Applied Cognitive
Psychology 5:213-36.
Mansfield, C. A. 1998. Despairing over disparities: Explaining the
difference between willingness-to-pay and willingness-to-accept.
Environmental and Resource Economics. In press.
Mansfield, C. A., G. Van Houtven, and J. Huber. 1997. Guilt by
association: Compensation and the bribery effect. Unpublished paper,
Duke University.
McConnell, K. E. 1990. Models for referendum data: The structure of
discrete choice models for contingent valuation. Journal of
Environmental Economics and Management 18:19-35.
Mitchell, R., and R. Carson. 1989. Using surveys to value public
goods: The contingent valuation method. Washington, DC: Resources for
the Future.
Schulze, W., G. McClelland, and J. Lazo. 1994. Methodological issues
in using contingent valuation to measure non-use values. Paper prepared
for DOE/EPA Workshop, May 19-20, 1994, Herndon, VA.
Wallace, T. D. 1972. Weaker criteria and tests for linear
restrictions in regression. Econometrica 40:689-98.
Wallace, T. D., and C. E. Toro-Vizcarrondo. 1969. Tables for the mean
squared error test for exact linear restrictions in regression. American
Statistical Association Journal 64:1649-63.
Walsh, R. G., D. M. Johnson, and J. R. McKean. 1990. Nonmarket values
from two decades of research on recreation demand. In Advances in
applied micro-economics 5, edited by V. K. Smith. Greenwich, CT: JAI Press, pp. 167-93.
Welsh, M. 1986. Exploring the accuracy of the contingent valuation
method: Comparisons with simulated Markets. Ph.D. thesis, Department of
Agricultural Economics, The University of Wisconsin-Madison, Madison,
Wisconsin.