A consistent method for calibrating contingent value survey data.
Mansfield, Carol
1. Introduction
Contingent value (CV) surveys are used to estimate the economic value
of nonmarket goods, especially environmental goods. A major concern with
CV surveys is the potential for what have loosely been called
hypothetical and strategic biases in the answers to CV questions. For a
variety of reasons, often individual specific, a respondent's
answer to a CV question may differ from his or her true value for the
good. To address the potential problem of inaccurate bids, the initial
version of the proposed rules for the Oil Pollution Act of 1990 called
for all CV values to be divided in half.(1) The provision was intended
as a challenge to practitioners to develop a method for calibrating the
data from CV surveys.
This paper outlines a statistical method for calibrating the data
from CV surveys derived from the assumption that individuals make
constrained utility maximizing decisions. The method allows us to
determine the influence of individual characteristics on bias, as
distinct from their influence on the preference parameters. To
illustrate the logic of this approach, a specific functional form for
individual preferences was used to derive closed-form analytical
expressions for an individual's willingness-to-pay (WTP) and
willingness-to-accept (WTA). These functions allow systematic deviations
in individual responses to be explicitly modeled by providing a
structural interpretation of the error term. The framework is
appropriate for both open-ended and dichotomous choice data.
The random utility model framework focuses attention on the error
term, specifically on the possibility that there is an
individual-specific, systematic component to the error term that is
related to bias in CV responses. Evidence from experimental economics
and psychology suggests that different respondents may react differently
to the same survey question or laboratory experiment. Some elements of
this reaction may be correlated with observable characteristics, such as
education or age, while other elements will appear random to the
researcher. Thus, it is important for calibration techniques to allow
for the influence of individual characteristics on the existence and the
size of any bias in CV responses.
We illustrate this approach with three CV data sets using data from
both open-ended and dichotomous choice responses. The particular
applications were selected because comparable sets of laboratory or
simulated market data exist for each of the three CV data sets. This
allows a comparison between the results from the proposed calibration
model and the laboratory or simulated market data. Ideally, the
calibrated CV results should be compared to values from actual market
transactions rather than the results from laboratory or simulated market
experiments, which may also be biased. The calibration model proposed in
this paper can be applied to simulated market data as easily as CV data,
and for two of my data sets, I am able to estimate whether the
simulated market data suffer from biased responses as well.
The approach derived here does not require additional data beyond the
CV survey itself to implement; thus, it can be used to calibrate data
measuring use or nonuse values. Other calibration techniques for CV data
require actual market data from weakly complementary goods or the
identification of a surrogate market good for the nonmarket good valued
in the CV survey.(2) But in many cases, especially where nonuse values
are important, it may be impossible, if not contradictory, to define the
appropriate set of weakly complementary market goods. Instead, we
interpret the task of developing a calibration model for CV responses as
a logical problem that considers whether there are sufficient model
restrictions and sample information to identify the preference
parameters and distinguish them from sources of bias in CV responses.
While the choice of functional form for utility is an important
maintained assumption conditioning the results derived from this
approach to calibration, similar issues have been routinely addressed in
modeling consumer demand. A great deal of work in demand analysis has
focused on developing statistical models that can be used to test demand
theory. In much of this work, researchers have been forced to make
assumptions about the functional form for either the direct or the
indirect utility functions. Even when the functional forms are
restrictive, the resulting estimates can be informative and provide a
foundation for future research (see Deaton [1986] for a review).
Furthermore, failure to account for response bias when estimating bid
functions for CV data will yield parameter estimates that are a
composite of preference and response bias effects. Because response
biases may be positive or negative, the parameter estimates will be
difficult to interpret.
This paper is organized as follows. Section 2 develops a model of CV
bids and derives structural equations for WTP and WTA. The calibration
model is applied to the three data sets in section 3, and the results
are discussed in section 4. Section 5 contains ideas for further
research.
2. A Model for Calibration
A typical CV survey describes an environmental good and then proposes
a change for better or worse in some feature of that good. The
respondents are asked to decide how much they will pay for the
improvement or how much compensation they require if the change is for
the worse. Within this framework, assume that each respondent receives
utility from two goods: the environmental good (E), which has two
levels, E_high and E_low, and a Hicksian composite good, represented by
income (Y). An individual's true maximum willingness-to-pay, wtp_i, and
minimum willingness-to-accept, wta_i, then satisfy the equalities

U(Y_i - wtp_i, E_high; X_i) = U(Y_i, E_low; X_i)  (1)

U(Y_i + wta_i, E_low; X_i) = U(Y_i, E_high; X_i),  (2)

where X_i is a vector of individual characteristics and attitudes.
For a given utility function with parameter vector β, these
equations can be solved explicitly for wtp_i or wta_i as

wtp_i = f(Y_i, E_high, E_low, β(X_i); X_i)  (3)

wta_i = g(Y_i, E_high, E_low, β(X_i); X_i).  (4)
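Although the paper works with closed-form solutions, the implicit definitions in Equations 1 and 2 can also be solved numerically for an arbitrary utility specification. The sketch below does this with a root finder; the CES functional form and every parameter value are hypothetical, chosen only for illustration:

```python
# Solving the indifference conditions of Eqs. 1-2 numerically for an
# arbitrary utility function U(y, e). Illustrative only; the CES form
# and all parameter values below are hypothetical.
from scipy.optimize import brentq

def wtp(U, y, e_low, e_high):
    """Maximum payment leaving the respondent indifferent (Eq. 1)."""
    target = U(y, e_low)
    return brentq(lambda p: U(y - p, e_high) - target, 0.0, y - 1e-6)

def wta(U, y, e_low, e_high):
    """Minimum compensation leaving the respondent indifferent (Eq. 2)."""
    target = U(y, e_high)
    return brentq(lambda a: U(y + a, e_low) - target, 0.0, 1e9)

# A CES utility function U = (a*y^r + (1-a)*e^r)^(1/r), hypothetical values
a, r = 0.9, 0.5
U = lambda y, e: (a * y**r + (1 - a) * e**r) ** (1 / r)

p_hat = wtp(U, 2000.0, 200.0, 225.0)   # monthly income 2000, 200 -> 225 trees
a_hat = wta(U, 2000.0, 200.0, 225.0)
```

The brackets passed to the root finder reflect the economics: WTP cannot exceed income, and utility is monotone in the payment, so each equation has a unique root.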
Most CV studies rely on bid functions assumed to be linear in
observed characteristics.(3) By selecting a specific functional form for
utility, explicit closed-form solutions for WTP and WTA can be derived.
These structural equations will allow me to decompose the
individual's bid into a preference-based component that is my
estimate of WTP and WTA and a bias term that identifies systematic
deviations from the assumptions of the model.
I chose to estimate the equations based on a random utility model
(RUM) where WTP (or WTA) is treated as a random variable. The random
utility approach to analyzing CV data was popularized by Hanemann
(1984). From Hanemann and Kanninen (in press), "one wants to
formulate a statistical model for the CV responses that is consistent
with an economic model of utility maximization" (p. 4). The
calibration equations developed below exploit the link between the
economic and statistical models that is the foundation of the RUM
framework to identify deviations in CV bids from true WTP or WTA. My
model uses the direct utility function rather than the indirect utility
function (Hanemann 1984) or a variation function (McConnell 1990)(4)
because the calibration method was developed in conjunction with efforts
to estimate the parameters of utility functions associated with WTP and
WTA responses (Mansfield, in press). However, one could develop similar,
closed-form solutions for WTP and WTA from indirect utility functions.
Two sources of error, systematic and random, may cause an
individual's bid to differ from the amount he or she would actually
pay for the good if a market existed. Systematic over- or understatement
of WTP and WTA might occur due to factors such as the amount of time the
individual has to answer the question, the wording of the survey, or the
structure of the experiment. Hoehn and Randall (1987) and Crocker and
Shogren (1991) develop theoretical models of CV bids that predict
deliberate over- or understatement of WTP.(5) In addition, there is a
large literature base on the incentive properties of various survey and
experimental formats and the likelihood for strategic behavior.(6) For
example, Bohm (1984) hypothesizes that individuals who favor the action
proposed in the CV survey might purposely inflate their WTP if they did
not believe the survey would actually be used to determine the amount
they had to pay. Horowitz (1993) discusses the potential for
misunderstandings between the analyst and the respondent to contribute
to systematic bias in responses.
Furthermore, whether and by how much individual bids differ from
their true value will depend on the respondents' characteristics,
attitudes, and interpretation of the survey. Evidence that individuals
will react differently to identical incentive schemes can be found in
experiments such as Andreoni (1995) on the provision of public goods.
Herriges and Shogren (1996) found that local residents and
recreationists exhibited different anchoring behavior in a survey
valuing water quality improvements in an Iowa lake. Studies from the
psychology literature, reviewed in Krosnick (1991), indicate that the
response strategy an individual uses to answer a survey question may be
a function of his or her personal characteristics.
An ad hoc linear specification does not allow the analyst to
distinguish the influence of a characteristic, such as education, on
preferences from the influence of that characteristic on the propensity
of respondents to systematically inflate or deflate their CV bids.
Because individual characteristics and attitudes may affect both
preferences and the response strategy an individual adopts (e.g.,
systematic overstatement), I attempt to decompose the influences of
respondent attributes on these two elements of a CV bid.
Beyond these systematic influences, the CV bids will be subject to
random error due to unobserved heterogeneity in the respondents and the
inability of the variables available to the analyst to perfectly measure
the pertinent attitudes and characteristics of the respondents.
To capture both the systematic and random variations in individual
bids, a systematic bias parameter, C^P or C^A, is added to
Equations 3 and 4 along with a random error term. Equations 5-8 are
the bid functions derived from three utility functions: CES, constant
relative risk aversion (CRRA), and Cobb-Douglas.

CES: [Mathematical Expression Omitted]  (5)

CRRA: [Mathematical Expression Omitted]  (6)

Cobb-Douglas (U = α ln Y + (1 - α) ln E):

WTP_i = C^P(X_i) + Y_i - exp(ln(Y_i) + [(1 - α(X_i))/α(X_i)](ln(E_low) - ln(E_high))) + μ  (7)

WTA_i = C^A(X_i) + exp(ln(Y_i) + [(1 - α(X_i))/α(X_i)](ln(E_high) - ln(E_low))) - Y_i + ε  (8)

In each equation, C^P or C^A measures the systematic component of the
error term, while μ and ε, which have means of zero, measure the
random component. The parameters of the utility function determine the
characteristics of the respondent's preferences, and all of the
parameters can be specified as functions of individual characteristics
and attitudes. For example, in the CES utility function, the income
elasticity equals 1/ρ, while for the CRRA function, the income
elasticity is a function of λ and θ.
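The Cobb-Douglas bid functions of Equations 7 and 8 can be coded directly and checked against the indifference conditions of Equations 1 and 2. In the sketch below the bias and error terms are set to zero and the parameter values are hypothetical:

```python
# The closed-form Cobb-Douglas bid functions of Eqs. 7-8, with the
# systematic bias and random error terms set to zero. A sketch only;
# parameter values are hypothetical.
import math

def wtp_cd(y, e_low, e_high, alpha, c_p=0.0):
    """WTP = C^P + Y - exp(ln Y + [(1-a)/a](ln E_low - ln E_high))."""
    return c_p + y - math.exp(
        math.log(y) + (1 - alpha) / alpha * (math.log(e_low) - math.log(e_high)))

def wta_cd(y, e_low, e_high, alpha, c_a=0.0):
    """WTA = C^A + exp(ln Y + [(1-a)/a](ln E_high - ln E_low)) - Y."""
    return c_a + math.exp(
        math.log(y) + (1 - alpha) / alpha * (math.log(e_high) - math.log(e_low))) - y

# Verify against the indifference conditions (Eqs. 1-2):
alpha, y, e_low, e_high = 0.9, 2000.0, 200.0, 225.0
u = lambda yy, e: alpha * math.log(yy) + (1 - alpha) * math.log(e)
p = wtp_cd(y, e_low, e_high, alpha)
w = wta_cd(y, e_low, e_high, alpha)
assert abs(u(y - p, e_high) - u(y, e_low)) < 1e-9
assert abs(u(y + w, e_low) - u(y, e_high)) < 1e-9
```

The assertions confirm that the closed forms leave the respondent exactly indifferent, which is the defining property of WTP and WTA.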
3. Three Applications of the Calibration Model Using CV Data
This section describes the results from three applications of the
calibration model. The first data set is from Brookshire and Coursey
(1987) valuing the density of trees in a neighborhood park. The authors
conducted an open-ended CV survey and a laboratory experiment for the
commodity. The other two CV data sets are WTP and WTA dichotomous choice
data from a study by Bishop, Heberlein, and Welsh (BHW) valuing deer hunting permits (see Welsh 1986; Bishop and Heberlein 1990). In this
experiment, data from dichotomous choice CV bids were compared to bids
from identical SM experiments for a special one-day permit to hunt deer
in the Sandhill Wildlife Demonstration Area in Wisconsin prior to the
opening of the official Wisconsin hunting season.
Calibrating Open-Ended WTP Data
A list of the variables and summary statistics for the Brookshire and
Coursey CV and lab experiment data can be found in Table 1. The WTP
question asked the respondents their WTP for two increases in the number
of trees planted in a new neighborhood park - from 200 to 225 and then
250. The first component of the experiment consisted of a CV survey and
a modified Smith auction (without actual payment of bids) conducted
through door-to-door in-person interviews. I pooled these responses into
one data set with 170 observations.(7) The second component of their
analysis consisted of a laboratory experiment involving 27 respondents
conducted at a local school using the modified Smith auction with up to
five repeated trials.(8) The laboratory data provide a benchmark for the
calibration.(9)
As discussed in section 2, the systematic bias coefficient may be
determined by features of the experiment itself, in addition to
characteristics of the individual respondents. Unfortunately, because
the Brookshire and Coursey data set includes a limited number of
variables, this application is presented simply to illustrate one use of
this calibration model.
Table 1. Brookshire and Coursey WTP Data Set Means and Standard
Deviations (in Parentheses)
Variable Description CV Survey Lab Experiment
INCOME Monthly, after tax 2241.35 2196.22
(776.77) (717.10)
PARKVIEW Dummy = 1 if full or 0.31 0.52
partial view of park (0.46) (0.51)
HHSIZE Household size 3.21 3.07
(1.22) (1.07)
SCHOOL Dummy = 1 for child 0.47 0.44
attending elementary (0.50) (0.51)
school next to park
FINCOLLEGE Dummy = 1 if graduate 0.32 0.33
from college (0.47) (0.48)
GRADSCH Dummy = 1 if attended 0.19 0.41
some graduate school (0.39) (0.50)
or received advanced
professional or
technical degree
BID 17.15 8.48
(19.96) (10.02)
MEDIAN BID 10 5
N 170 27
Table 2 reports the results for the CV WTP data from three
specifications derived from CES utility (Eqn. 5) and one specification
from CRRA utility (Eqn. 6). Here α, ρ, λ, θ, and C^P are modeled as
linear combinations of the observed characteristics.(10) Several
specifications are reported to illustrate the impact of changing the
specification on the systematic bias coefficients and the resulting
calibrated bids.(11)
Since the aim of this exercise is to predict true WTP, it might be
more appropriate to judge specifications using a test based on
mean-squared error or a noncentral F-test rather than on the standard
F-test. Several tests of this type are described in Wallace (1972) and
Wallace and Toro-Vicarrondo (1969), including a test that focuses on
forecasting the conditional mean of the dependent variable rather than
the vector β. For example, in Table 2, one could compare the
restricted model CES (3) to either CES (1) or CES (2). Using CES (1) as
the unrestricted model, the F-statistic is 0.32 with (4, 153) degrees of
freedom. Both the standard F-test and an F-test with noncentrality
parameter of one-half fail to reject CES (3).
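The comparison just described can be sketched as follows. The F-statistic of 0.32 with (4, 153) degrees of freedom is the one reported above; the 5% level and the use of scipy are assumptions of the sketch:

```python
# Judging the restricted specification CES (3) against the unrestricted
# CES (1) with both the standard F-test and an F-test based on the
# noncentral F distribution with noncentrality parameter one-half, in
# the spirit of Wallace (1972). F-statistic and degrees of freedom are
# those reported in the text; the 5% level is an assumption.
from scipy.stats import f, ncf

F_stat, dfn, dfd = 0.32, 4, 153
crit_std = f.ppf(0.95, dfn, dfd)          # standard F critical value
crit_nc = ncf.ppf(0.95, dfn, dfd, 0.5)    # noncentral F critical value

fail_to_reject_std = F_stat < crit_std
fail_to_reject_nc = F_stat < crit_nc
```

Because the noncentral critical value exceeds the central one, a statistic that survives the standard test necessarily survives the noncentral test as well; here both fail to reject CES (3).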
[TABULAR DATA FOR TABLE 2 OMITTED]

The bias term C^P is positive for all the participants in the
CV survey regardless of which model is used, suggesting that the
participants systematically overstated their WTP. When the bias
parameter is not specified as a function of respondent characteristics,
the two different assumptions about utility, CES (3) and CRRA (4),
yield similar estimates of the systematic bias coefficient
(C^P intercept), about $10. As expected, given the limited quality of
the information available, none of the respondent characteristics used
to predict systematic bias in CES (1) and (2) in Table 2 are
significant.
[TABULAR DATA FOR TABLE 3 OMITTED]

To calibrate the data, I attempted to decompose the individual bids,
separating the portion of the bid that is related to preferences from
the portion that might be attributed to bias. In Table 3, the data are
calibrated by calculating the amount of bias in each individual answer,
Ĉ^P_i, and subtracting it from the amount the individual actually bid,
BID_i, to arrive at the calibrated bid, CALBID_i. So
CALBID_i = BID_i - Ĉ^P_i for each individual i in the experiment. The
first two rows of Table 3 present the mean and standard deviation of
the bids from the actual CV survey and lab experiment data. The other
four rows contain the mean and standard deviation of C^P_i and
CALBID_i from the four different models estimated in Table 2.(12)
If CALBID_i was less than zero, it was set equal to zero.
(Thus, the mean of the CALBIDs in Table 3 will not equal the mean
of BID minus the mean of C^P.) In models (1) and (2) of Table 3,
the mean value of the systematic bias term is greater than the actual
bids made by many of the participants. The high mean values for
C^P in models (1) and (2) may result from specifying C^P as
a function of respondent characteristics, none of which are significant
in Table 2 and all of which are positive.
The means of the CALBID_i's range from a low of $2.45 in
model (2) to a high of over $10 in models (3) and (4). (In models (3)
and (4), C^P was not specified as a function of any respondent
characteristics.) The actual mean bid from the lab experiment, $8.48,
falls within the range of estimated CALBIDs.
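The calibration step for open-ended data reduces to subtracting each respondent's fitted systematic bias from the observed bid and truncating at zero. The sketch below uses hypothetical bids, characteristics, and coefficients, not the estimates from Table 2:

```python
# The Table 3 calibration step: CALBID_i = max(0, BID_i - C^P(X_i)),
# with the systematic bias C^P linear in respondent characteristics.
# Bids, characteristics, and coefficients here are hypothetical.
import numpy as np

def calibrate(bids, X, c_coef):
    """Return calibrated bids and the fitted bias per respondent."""
    c_hat = X @ c_coef                 # fitted systematic bias C^P(X_i)
    return np.maximum(bids - c_hat, 0.0), c_hat

bids = np.array([17.0, 5.0, 40.0])
X = np.array([[1.0, 0.0],              # columns: intercept, a 0/1 dummy
              [1.0, 1.0],
              [1.0, 1.0]])
c_coef = np.array([8.0, 2.0])          # hypothetical bias coefficients
calbid, c_hat = calibrate(bids, X, c_coef)
```

Because of the truncation at zero, the mean of the calibrated bids need not equal the mean bid minus the mean bias, exactly as noted for Table 3.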
Calibrating Dichotomous Choice CV Survey Data
The Bishop, Heberlein, and Welsh experiment consisted of both WTP and
WTA CV surveys and simulated market (SM) surveys administered by mail.
The surveys valued special one-day deer hunting permits that are
distributed free each year to 150 hunters through a lottery held by the
Wisconsin Department of Natural Resources. Hunters who had lost the
lottery were sent the WTP questions. Half of these hunters were offered
a chance to actually purchase a deer hunting permit for a set price.
The other half received a similar hypothetical offer. The WTA questions
were sent to the lucky hunters who had won permits in the state
lottery. These hunters were offered a chance to sell their permits back
for a fixed price; again, half received a real offer while half
received a hypothetical offer. In all the experiments, the price of the
permit was varied over the sample.

[TABULAR DATA FOR TABLE 4 OMITTED]
In dichotomous choice questions, respondents are presented with a
proposed change in a good and then offered the option of either paying
or receiving some fixed amount of money to secure the change. The amount
of the offer is varied over the sample, and people must simply answer
yes or no to the CV question. Using dichotomous choice data, the RUM
framework is typically estimated using either a method suggested by
Hanemann (1984) or Cameron (see Hanemann and Kanninen, in press, p. 6). I
chose to use the method outlined in Cameron and James (1987) and Cameron
(1988), where the outcome of the choice process is treated as a random
variable.(13) However, in this case, the model predicts that an
individual will answer yes to a WTP question if the offered amount is
less than or equal to the individual's true WTP plus the systematic
bias term and an error term. Assuming the response error is
independently and normally distributed with mean zero, the resulting
likelihood function can be estimated using a maximum likelihood
technique.(14)
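This estimator can be sketched end to end on simulated data. The quality levels, sample design, and parameter values below are all hypothetical, and this is a sketch of the approach rather than the paper's estimation code; the key structural piece is the Cobb-Douglas WTP expression of Equation 7 inside a probit-style likelihood:

```python
# A minimal sketch of the Cameron-style estimator: respondent i answers
# "yes" iff the offered price t_i is at most WTP*_i + C + eps_i, with
# eps_i ~ N(0, sigma^2) and WTP*_i the Cobb-Douglas expression of
# Eq. 7. Data are simulated; all values are hypothetical.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

e_lo, e_hi = 1.0, 2.0   # hypothetical environmental quality levels

def wtp_star(alpha, y):
    """True WTP under Cobb-Douglas preferences (Eq. 7, no bias/error)."""
    return y - np.exp(np.log(y)
                      + (1 - alpha) / alpha * (np.log(e_lo) - np.log(e_hi)))

def neg_loglik(params, y, offers, yes):
    alpha, c, log_s = params
    s = np.exp(log_s)                      # keeps sigma positive
    p = norm.cdf((wtp_star(alpha, y) + c - offers) / s)
    p = np.clip(p, 1e-12, 1 - 1e-12)       # guard the logarithms
    return -np.sum(yes * np.log(p) + (1 - yes) * np.log(1 - p))

# Simulated sample; variation in income is what identifies alpha
# separately from the bias term C.
rng = np.random.default_rng(0)
y = rng.uniform(20_000, 60_000, 400)
offers = rng.uniform(5, 80, 400)
yes = (offers <= wtp_star(0.999, y) + 10 + rng.normal(0, 8, 400)).astype(float)

res = minimize(neg_loglik, x0=[0.998, 0.0, np.log(5.0)],
               args=(y, offers, yes), method="Nelder-Mead")
alpha_hat, c_hat, log_s_hat = res.x
```

A significant estimate of C would flag systematic bias in the yes/no responses, which is exactly the diagnostic applied to the BHW data below.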
[TABULAR DATA FOR TABLE 5 OMITTED]
Bid functions consistent with Cobb-Douglas preferences (Eqns. 7 and 8)
were used to construct the likelihood function.(15) Considering first
the CV WTP results in column 6 of Table 4, the model predicts that the
CV data are consistent with true WTP - the bias term C is insignificant.
Comparing the CV WTP results with the SM results in column 5 provides
support for this conclusion. In column 7, Equation 7 was estimated with
a data set combining the CV and SM WTP data. Comparing column 7 with
columns 5 and 6, I cannot reject the hypothesis that the data can be
jointly estimated at the 5% level using a likelihood ratio test. Not
surprisingly, the estimates for the expected value of WTP are quite
close: $31 in the SM and $35 in the CV survey (as calculated by the
survey authors; see Bishop and Heberlein 1990).
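The pooling comparison used here is a standard likelihood ratio test: twice the gap between the sum of the separate CV and SM log-likelihoods and the pooled log-likelihood is referred to a chi-squared distribution. The log-likelihood values and the number of restrictions below are hypothetical placeholders, not the estimates from Table 4:

```python
# Likelihood ratio test for pooling the CV and SM samples:
# LR = 2 * (ll_CV + ll_SM - ll_pooled) ~ chi-squared(k), where k is the
# number of parameters restricted to be equal across samples.
# The log-likelihoods and k below are hypothetical.
from scipy.stats import chi2

ll_cv, ll_sm, ll_pooled = -152.3, -148.9, -305.0   # hypothetical values
k = 4                          # hypothetical number of restrictions
LR = 2 * (ll_cv + ll_sm - ll_pooled)
p_value = chi2.sf(LR, df=k)
can_pool = p_value > 0.05      # fail to reject pooling at the 5% level
```

Failing to reject means the same preference and bias parameters fit both samples, which is the sense in which the WTP data "can be jointly estimated" above.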
The CV WTA estimates in column 3 of Table 4 suggest that the CV bids
overstate true WTA: the systematic bias coefficient is positive and
significant. Comparing the CV results with the SM results in column 2,
the estimated values of α remain comparable, but the bias term and
standard error (C and σ) are much larger in the CV estimation. In
this case, I can reject the hypothesis that the data can be jointly
estimated at any confidence level using a likelihood ratio test
comparing the results from the joint CV/SM data set in column 4 with
columns 2 and 3. In line with these results, the expected values of WTA
calculated by Bishop and Heberlein for the CV and SM experiments are not
nearly as close in value - $153 in the SM and $420 in the CV survey. One
disturbing result is the positive and significant systematic bias
coefficient in the SM WTA model. The SM experiment offered real cash
payments for the participants' deer hunting permits. Thus, one
might expect the bids from this experiment to reflect true WTA. In fact,
in the fuller specifications that include individual characteristics
presented in Table 6, the SM systematic bias coefficient is
insignificant, while the CV systematic bias coefficient remains
significant.
These preliminary estimates suggest that the CV WTP data do not need
to be calibrated for systematic bias, while the CV WTA data do. Table 5
lists the variables used to estimate the calibration equations along
with their means and standard deviations. The utility function
parameter α was specified as a function of the respondents' feelings
about deer hunting, a measure of the number of substitutes the hunters
felt they had for hunting, and the quality of deer hunting at Sandhill.
The systematic bias coefficient was modeled as a function of the
respondents' feelings about their right to hunt, their reactions to
the survey, and education.
[TABULAR DATA FOR TABLE 6 OMITTED]
Table 6 reports the results from re-estimating the WTA models in
Table 4, specifying the parameters as functions of the individuals'
observed characteristics. The CV and SM data sets were analyzed separately using three different specifications of the CV data and one
specification of the SM data. The last two columns of Table 6, CV WTA
(3) and SM WTA, have the same specification. These results confirm that
the CV bids overstated true WTA (C^A intercept is positive
and significant in CV WTA (3)). However, the model now suggests that
the SM data accurately represent true WTA (C^A intercept is not
significantly different from zero in the SM WTA results). When C^A
is specified as a function of individual characteristics and attitudes
in CV WTA (1) and (2), none of the systematic bias coefficient
estimates is individually significant. However, one cannot reject the
hypothesis that the systematic bias parameter estimates in CV WTA (2)
are jointly significant at the 10% level.(16) As for the specification
of α, EXQUALSH is significant in all three models, while FEEL 3 and
SUBST are significant in models (2) and (3).
Table 7. BHW Data Uncalibrated and Calibrated WTA Responses: Means and
Standard Deviations (in Parentheses)

                                          Variable(a)   Mean (SD)
Uncalibrated WTA (N = 68)                 PRED BID      799.80 (849.26)
Model (1)                                 C^A           191.41 (55.51)
  Calibrated WTA, CV Survey (N = 68)      PRED BID      160.32 (174.02)
Model (2)                                 C^A           183.19 (73.70)
  Calibrated WTA, CV Survey (N = 68)      PRED BID      190.76 (211.86)
Model (3)                                 C^A           184.14
  Calibrated WTA, CV Survey (N = 68)      PRED BID      195.23 (206.79)
Predicted WTA, SM Survey (N = 70)         PRED BID      110.94 (110.42)
(a) PRED BID was calculated for each individual from the equation

WTA = exp(log(Y) + [(1 - α)/α](log(E_high) - log(E_low))) - Y,

where α was specified as a linear function of an intercept, FEEL 3,
FEEL 8, REACT 11, SUBST, EXQUALSH, and BSTCHNC as in Table 6. PRED BID
(predicted WTA) was calculated for each respondent, and the mean is
presented in this table. The uncalibrated WTA is based on coefficients
from this model estimated without a systematic bias parameter (those
results are not presented). The calibrated results are based on the
coefficients in Table 6. C^A measures the amount of bias in each
individual's response. The value of C^A was calculated for each
individual using the parameter estimates from Table 6, and the mean
value of C^A over all individuals is presented here.
Calibrating the responses to the closed-ended questions is slightly
more difficult than for open-ended data because I do not have a direct
observation of the individual's minimum WTA. Rather, I infer
minimum WTA by estimating a WTA function. In this case, the calibrated
value for WTA (PRED BID) is calculated from the expression for WTA
derived from Cobb-Douglas preferences using the coefficient estimates
from Table 6. (See Table 7 for more details.) The PRED BID calculated
with the coefficients from Table 6 should represent true WTA because any
systematic bias in the CV bids, and thus in my estimates of the utility
function parameters, should be captured by the bias parameter.(17)
The means of all the individuals' values for C^A and PRED BID based on
the results in Table 6 are contained in Table 7.(18) The first row is
the predicted value of WTA based on estimates from a model that was
identical to the model in columns (1)-(3) in Table 6 except that it
contained no systematic bias coefficient. This provides an uncalibrated
estimate of WTA. The next rows contain, for each of the three CV
models, the bias term C^A and the calibrated values of WTA. Finally,
the last row contains predicted WTA for the SM data. According to this
model, most of the participants in the CV survey overstated their WTA:
the mean of the predicted WTA from the SM data, $110.94, is far lower
than the uncalibrated mean WTA of $799.80. The means of the PRED BIDs
from the three CV models (which range from $160 to $195) are lower than
the uncalibrated WTA but still higher than the mean of the PRED BID
from the SM estimate. Again, the three different CV models give the
reader a sense of how PRED BID and C^A vary under different
specifications.
4. Discussion
CV and SM responses from three data sets, Brookshire and
Coursey's WTP for trees and WTP and WTA values from BHW's deer
hunting permit data, have been considered in evaluating the calibration
model. The calibration model suggests that the Brookshire and Coursey
WTP CV bids and the BHW WTA CV bids were biased upwards, while the BHW
WTP CV bids were unbiased.
In the context of the literature on revealed preference methods, bids
from the BHW experiment for hunting permits should capture recreational
use value. In a meta-analysis of 287 benefit estimates, Walsh, Johnson,
and McKean (1990) compared the results from travel cost and WTP CV
estimates of recreational use value.(19) According to their analysis, CV
surveys produce lower values than travel cost models, but dichotomous
choice CV values are closer to the travel cost estimates than CV
estimates using an open-ended question format. Using data from studies
valuing a variety of quasi-public goods, Carson et al. (1996) examine
the ratio of WTP CV to revealed preference estimates. They found that
across 46 comparisons between CV and simulated market or experimental
data, including the WTP data from BHW, there was a close correspondence
between the values from the two methodologies.
For the BHW data, my calibration model predicted that neither the CV
nor the SM WTP results were biased. Thus, my results confirm the
findings of Carson et al. On the other hand, I estimated a substantial
upwards bias in the CV WTA results. The difficulty in measuring WTA
through CV surveys is well known, and neither meta-analysis included
data from WTA studies. The positive and significant bias term for the
WTA CV bids confirms the results found in other comparisons of WTA CV
and SM data, such as Fisher, McClelland, and Schulze (1988) (see
Mansfield, Van Houtven, and Huber [1997] for a discussion of the
difficulties in measuring WTA).
Respondents to the Brookshire and Coursey study were drawn from the
neighborhood surrounding the park, so their bids should reflect both
recreational use and aesthetic value for the extra trees. The results
from the calibration model predict that the WTP bids were biased
upwards. The study was not included in the meta-analysis on quasi-public
goods performed by Carson et al. (1996); however, the authors note that
"some CV estimates clearly exceed their revealed preference
counterparts, therefore one should not conclude that CV estimates are
always smaller than revealed preference estimates" (p. 93). Since
the bids from the simulated market experiment were lower than the bids
from the CV survey, the calibration model appears to have correctly
identified the upwards bias in the CV bids.
For this study, I deliberately chose data sets that included both CV
and SM components in order to provide a benchmark against which to judge
the calibrated CV results. Two issues related to this decision should be
emphasized. The first issue is that the SM bids themselves may not
accurately measure true WTP or WTA. The Brookshire and Coursey SM
experiments were conducted at the local high school, and it is unclear
how the participants interpreted the exercise. For example, it is
possible that the SM bids understated WTP if the respondents did not
believe that the money was actually going to be used to purchase
additional trees. Even the BHW SM experiment was probably considered
unusual, especially by the WTA respondents, since the yearly lottery for
permits had never before included opportunities to buy or sell the
permits. Thus, a comparison between the SM bids and the calibrated CV
bids is a test of convergent validity.(20)
The second issue relates to the application of this calibration
method to CV surveys that lack comparable simulated market or other data
against which to compare the calibration results, especially surveys
that measure primarily nonuse or existence values. The strength of this
calibration technique is that it does not require revealed preference or
experimental data to estimate bias and calibrate CV bids. However,
before the calibration technique can be applied in situations where
benchmark data do not exist, it must first be tested using data sets for
which benchmark data do exist, as in this paper. Tests such as these are
important for establishing the reliability of the technique and for
addressing issues such as appropriate functional form assumptions and
other specification issues.
Applying the lessons learned from these tests to other CV data sets
will require further assumptions about the way in which people respond
to various types of CV surveys. As with benefit transfer, the more
similarities that exist between the test data sets and the CV data that
need to be calibrated, the more confidence one might have in the
results.
Most, but not all, of the CV data sets for which complementary
benchmark data exist measure use value rather than nonuse value. In
order to apply the results from use value data to CV surveys measuring
nonuse value, assumptions must be made about the relationship between
use and nonuse values and the way people respond to the two types of CV
questions. Unfortunately, there is currently little agreement about
nonuse values, how they are formed, and their relationship to use
values. Thus, while this calibration method can be applied to any CV
data set, proper specification may be more difficult when a significant
portion of the value is nonuse.
5. Conclusions and Future Research
Despite the controversy surrounding CV surveys, they are often
employed to estimate the benefits of nonmarket environmental goods. The
results from CV surveys will vary in quality depending on the
circumstances of the survey implementation, including the expertise of
the analysts and the budget for the survey. The development of a
calibration technique for CV data would provide both a measure of the
reliability of the data and a way to adjust biased results.
The model proposed in this paper provides the basis for a simple and
inexpensive way of isolating bias and calibrating the responses from a
CV survey. The method does not require additional data beyond the CV
survey itself, allowing the calibration of both use and nonuse data.
Furthermore, whether an individual under- or overstates his or her bid
in a CV survey is related to the individual's characteristics and his or
her reaction to the format of the survey. This calibration method allows
me to separate out the effect of individual characteristics on
systematic bias from the effect of these characteristics on the
parameters of the utility function.
The challenge in developing techniques for calibrating CV data is
finding a benchmark against which to judge the results. To test this
calibration method, I used data sets for which laboratory or SM
benchmarks existed. This analysis suggests that only the BHW WTP CV data
produced unbiased values. In contrast, the calibration model predicts
that the responses from the other two CV surveys tested overstated true
WTP and WTA. For these data sets, the results of the calibration model
are encouraging: the calibration model corroborated the general pattern
observed from comparing the CV data with laboratory or SM data.
Further tests of this calibration model need not rely on data sets
that include SM or experimental components. The calibration model could
be tested using data from CV surveys that measured the value of the same
good with several different question formats. For example, suppose one
had data from two CV surveys measuring the value of the same good, an
open-ended and a dichotomous choice survey. Loosely speaking, if the
mean open-ended WTP was lower than the mean predicted WTP from the
dichotomous choice data, then the systematic bias parameter from the
calibration model should also be smaller for the open-ended data.
Studies such as this could help establish the reliability of the
calibration model.
As discussed above, the power of the calibration model could be
improved by a better understanding of how individuals answer CV
questions, including the traits or attitudes that inspire individuals to
give more or less accurate answers and variables that measure these
traits or attitudes. This is especially important for cases in which
benchmark data do not exist. Future research might use verbal protocols
or other debriefing techniques to develop more accurate models of
response behavior.
Finally, the choice of functional form is an important element of the
calibration model. To facilitate the estimation of more flexible
functional forms, future studies might also want to include more
variation in the bid space and in the attributes of the environmental
commodity.
I would like to thank William Evans for all his help. I am also
grateful to Maureen Cropper, John Horowitz, Glenn Harrison, Randall
Kramer, Kerry Smith, and two anonymous referees for their many useful
suggestions, and David Brookshire, Donald Coursey, Richard Bishop,
Thomas Heberlein, and Michael Welsh for supplying me with their data.
Any remaining errors are my own.
1 See Federal Register, vol. 59, no. 5 (January 7, 1994), p. 1146.
2 For example, see Blackburn, Harrison, and Rutstrom (1994), Cameron
(1992), and Eom and Smith (1994). One exception is work by Schulze,
McClelland, and Lazo (1994), who propose transforming the bids from
open-ended CV surveys with a Box-Cox specification until they fit a
normal distribution.
3 Exceptions include Hanemann (1984), Hoehn (1991), and Hoehn and
Loomis (1993). See also Hanemann and Kanninen (in press) for a
description of more general specifications.
4 McConnell (1990) develops the variation function as a change in the
expenditure functions.
5 Hoehn and Randall (1987) offer a model that predicts people will
understate WTP and overstate WTA when they lack time to think or a clear
definition of the commodity. Crocker and Shogren (1991) outline a model
in which the commodity is well defined but unfamiliar. Even with
adequate time to think about the question, the respondents need to
invest in learning about the unfamiliar good and thus will
systematically overstate WTP.
6 For a discussion of issues such as strategic behavior and free
riding in CV surveys, see, for example, the Winter 1994 issue of the
Natural Resources Journal.
7 No significant difference between the bids from the two experiments
was found. In the modified Smith auction, respondents were told the
total number of households who would be asked to contribute and the
total cost of the new trees. Three possible outcomes were explained to
each respondent. First, if the sum of the payments was less than the
cost, then the households paid nothing and no additional trees would be
planted. Second, if the sum of the payments equaled the cost, then each
household paid the amount they bid and the trees would be planted.
Finally, if the sum of the payments exceeded the cost, then each
household would pay a fraction of what they bid so that payments equaled
the cost of the new trees.
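The three possible outcomes of the modified Smith auction can be sketched as a short payment rule. This is a hypothetical illustration of the rule as described above; the function name and inputs are mine, not part of the original experiment:

```python
# Sketch of the modified Smith auction payment rule in footnote 7.
# Names and structure are illustrative; only the three outcomes
# (shortfall, exact cover, excess with proportional rebate) come
# from the description above.
def smith_auction_payments(bids, cost):
    """Return (payments, provided): what each household pays, and
    whether the public good (the new trees) is provided."""
    total = sum(bids)
    if total < cost:
        # Outcome 1: bids fall short; no one pays, trees are not planted.
        return [0.0] * len(bids), False
    if total == cost:
        # Outcome 2: bids exactly cover cost; each household pays its bid.
        return list(bids), True
    # Outcome 3: bids exceed cost; each household pays a fraction of its
    # bid so that collected payments exactly equal the cost.
    scale = cost / total
    return [b * scale for b in bids], True
```

For example, with bids of 10, 20, and 30 against a cost of 50, each household pays five-sixths of its bid and the collected payments sum exactly to the cost.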
8 During the lab experiment, the participants were divided into
groups and each individual was asked to write down his or her WTP. If the
sum of the group's WTP was greater than the cost of the additional
trees, the participants were required to pay the amount they bid. In the
actual experiment, payment was collected from only one of the groups; the
other groups did not collectively bid enough to cover the cost of the
extra trees and the experiment ended after five trials, as per the
instructions.
9 Unfortunately, the lab experiment data set was too small to estimate
the calibration model. To test the accuracy of the lab bids, I estimated
Equation 5 with [C.sup.P] specified as the function of an intercept term
and a dummy variable for participation in the lab experiment using a
data set combining the survey and lab experiment data. The results
suggest that the lab experiment bids were not subject to systematic
bias.
10 For example, in Equation 5 [Rho] = [[Rho].sub.intercept] +
[[Rho].sub.hhsize] * HHSIZE + [[Rho].sub.gradsch] * GRADSCH +
[[Rho].sub.school] * SCHOOL + [[Rho].sub.parkview] * PARKVIEW +
[[Rho].sub.fincollege] * FINCOLLEGE. Note that the parameters [Lambda]
and [Theta] have different interpretations than [Alpha] and [Rho], so
the coefficients on these parameters are not comparable.
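The linear parameterization of the bias term in this footnote can be written out as a small helper. Only the functional form (intercept plus coefficients times individual characteristics) comes from the footnote; the coefficient and characteristic values below are invented purely for illustration:

```python
# Illustration of the linear specification of the bias parameter rho in
# footnote 10: rho = intercept + sum of coefficients times characteristics.
# All numeric values here are made up for illustration only.
def rho(coeffs, characteristics):
    """coeffs and characteristics are dicts keyed by variable name."""
    return coeffs["intercept"] + sum(
        coeffs[name] * value for name, value in characteristics.items()
    )

example_coeffs = {"intercept": 0.2, "HHSIZE": 0.05, "GRADSCH": -0.1}
example_person = {"HHSIZE": 3, "GRADSCH": 1}
# rho for this person: 0.2 + 0.05*3 - 0.1*1 = 0.25
```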
11 Specifications of models (1) and (2) including
[[Alpha].sub.fincollege] were rejected due to multicollinearity
problems.
12 The values of [C.sup.P] and CALBID were calculated using
individual characteristics and the coefficient estimates in Table 2. The
means of the individual estimates are presented in Table 3 with their
standard deviations.
13 See McConnell (1990) for a discussion of assumptions about the
scale factor and the conditions under which Cameron and James'
model is identical to Hanemann's.
14 Specifically, the likelihood function was estimated using a
Newton-Raphson maximization technique and the covariance matrix was
estimated using the procedure of Berndt, Hall, Hall, and Hausman. See
Greene (1993) or Judge et al. (1985).
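As a minimal sketch of this estimation strategy, the following pairs a Newton-Raphson update with a BHHH covariance estimate, which replaces the Hessian with the sum of squared per-observation scores. It uses a one-parameter exponential likelihood for concreteness, not the paper's calibration likelihood:

```python
# Newton-Raphson MLE with a BHHH variance estimate, illustrated on an
# exponential model with rate lam (log-likelihood per observation:
# log(lam) - lam * x). This is a toy stand-in for the calibration
# likelihood in the paper, which is far more complex.
def fit_exponential(data, lam=0.5, tol=1e-10):
    for _ in range(100):
        score = sum(1.0 / lam - x for x in data)   # first derivative
        hess = -len(data) / lam ** 2               # second derivative
        step = score / hess
        lam -= step                                # Newton-Raphson update
        if abs(step) < tol:
            break
    # BHHH: variance estimated by inverting the sum of squared
    # per-observation scores, avoiding the analytic Hessian.
    bhhh_var = 1.0 / sum((1.0 / lam - x) ** 2 for x in data)
    return lam, bhhh_var
```

For exponential data the MLE is the reciprocal of the sample mean, which the iteration recovers in a handful of steps.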
15 The equations used in the likelihood function are highly nonlinear in the parameters to be estimated, and the CES and CRRA equations did
not converge. To facilitate the estimation of more complex models,
future studies could be designed with wider variation in the bid space
and the attribute space for the environmental commodity. The measure of
income used in the estimation also deserves more attention both in this
calibration model and in other demand models. Because willingness to pay is small in relation to total income, the statistical estimation process
could be improved if the analysis was based on some fraction of income.
For example, one could replace income with the household budget for
discretionary spending. Of course, determining such a budget is not a
trivial issue.
16 The test was made using a likelihood ratio test comparing CV WTA
(2) and CV WTA (3). The chi-square test statistic was 11.7 with 5
degrees of freedom.
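For reference, the p-value implied by this test statistic can be checked directly. The helper below is my own (not from the paper); it uses the closed-form chi-square upper tail for 5 degrees of freedom, obtained from the recurrence for the upper incomplete gamma function, and shows that 11.7 falls just inside the 5% rejection region:

```python
import math

def chi2_sf_5df(x):
    """Upper-tail probability of a chi-square with 5 degrees of freedom.

    Closed form: Q(x) = erfc(sqrt(t))
                 + (4 / (3 sqrt(pi))) * exp(-t) * (1.5 sqrt(t) + t**1.5),
    where t = x / 2.
    """
    t = x / 2.0
    return math.erfc(math.sqrt(t)) + (
        4.0 / (3.0 * math.sqrt(math.pi))
    ) * math.exp(-t) * (1.5 * math.sqrt(t) + t ** 1.5)

p_value = chi2_sf_5df(11.7)
# p is roughly 0.04, below the conventional 0.05 threshold
```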
17 Alternatively, the bids could be calibrated by first estimating
Equation 8 without a systematic bias term and using these estimates to
calculate an uncalibrated predicted WTA. Then the predicted WTA bid
could be calibrated by estimating [C.sup.A] in a separate equation and
subtracting it from the uncalibrated WTA. The same choice actually
exists for calibrating open-ended data. One could either subtract
estimated bias from the actual CV bid, as I did, or use the values of
[Alpha] and [Rho] from Table 2 to calculate a predicted WTP.
18 The values of [C.sup.A] and PRED BID are the means of all the
individual values. The individual values were calculated using the
coefficient estimates in Table 6 and the individual's
characteristics. [C.sup.A] and [Alpha] are linear combinations of the
variables in Table 6.
19 The analysis included variables to account for issues such as
whether the travel cost study included travel time, very general
measures of site quality, the type of CV question, and other
characteristics of the study.
20 Mitchell and Carson (1989) define convergent validity as "the
correspondence between a measure and other measures of the same
theoretical construct . . . . In convergent validity neither of the
measures is assumed to be a truer measure of the construct than the
other" (p. 204).
References
Andreoni, J. 1995. Cooperation in public goods experiments: Kindness and confusion. The American Economic Review 85:891-904.
Bishop, R. C., and T. A. Heberlein. 1990. The contingent valuation method. In Economic valuation of natural resources: Issues, theory, and
applications, edited by R. L. Johnson and G. V. Johnson. Boulder, CO:
Westview Press, pp. 81-104.
Blackburn, M., G. W. Harrison, and E. E. Rutstrom. 1994. Statistical
bias functions and informative hypothetical surveys. American Journal of
Agricultural Economics 76:1084-88.
Bohm, P. 1984. Revealing demand for an actual public good. Journal of
Public Economics 24:135-51.
Brookshire, D. S., and D. L. Coursey. 1987. Measuring the value of a
public good: An empirical comparison of elicitation procedures. American
Economic Review 77:554-65.
Cameron, T. A. 1988. A new paradigm for valuing non-market goods
using referendum data. Journal of Environmental Economics and Management
15:355-79.
Cameron, T. A. 1992. Combining contingent valuation and travel cost
data for the valuation of nonmarket goods. Land Economics 68:302-17.
Cameron, T. A., and M. D. James. 1987. Efficient estimation methods
for 'closed-ended' contingent valuation surveys. The Review of
Economics and Statistics 69:269-76.
Carson, R. T., N. E. Flores, K. M. Martin, and J. L. Wright. 1996.
Contingent valuation and revealed preference methodologies: Comparing
the estimates for quasi-public goods. Land Economics 72:80-99.
Crocker, T. D., and J. F. Shogren. 1991. Preference learning and
contingent valuation methods. In Environmental policy and the economy,
edited by F. J. Dietz, E van der Ploeg, and J. van der Straaten.
Amsterdam: North-Holland, pp. 77-94.
Deaton, A. 1986. Demand analysis. In Handbook of econometrics 3,
edited by Z. Griliches and M.D. Intriligator. Amsterdam: North-Holland,
pp. 1767-839.
Eom, S. Y., and V. K. Smith. 1994. Calibrated nonmarket valuation.
Unpublished paper, North Carolina State University.
Fisher, A., G. H. McClelland, and W. D. Schulze. 1988. Measures of
willingness to pay versus willingness to accept: Evidence, explanations
and potential reconciliation. In Amenity resource valuation: Integrating
economics with other disciplines, edited by G. L. Peterson, B. L.
Driver, and R. Gregory. State College, PA: Venture, pp. 127-34.
Greene, W. H. 1993. Econometric analysis. 2nd edition. New York:
Macmillan.
Hanemann, W. M. 1984. Welfare evaluations in contingent valuation
experiments with discrete responses. American Journal of Agricultural
Economics 66:332-41.
Hanemann, W. M., and B. Kanninen. 1998. The statistical analysis of
discrete response CV data. In Valuing environmental preferences: Theory
and practice of the contingent valuation method in the US, EC and
developing countries, edited by I. J. Bateman and K. G. Willis. Oxford:
Oxford University Press. In press.
Herriges, J. A., and J. F. Shogren. 1996. Starting point bias in
dichotomous choice valuation with follow-up questioning. Journal of
Environmental Economics and Management 30:112-31.
Hoehn, J. 1991. Valuing the multidimensional impacts of environmental
policy: Theory and methods. American Journal of Agricultural Economics
73:289-99.
Hoehn, J., and J. Loomis. 1993. Substitution effects in the valuation
of multiple environmental programs. Journal of Environmental Economics
and Management 25:56-75.
Hoehn, J., and A. Randall. 1987. A satisfactory benefit cost
indicator from contingent valuation. Journal of Environmental Economics
and Management 14:226-47.
Horowitz, J. 1993. A new model of contingent valuation. American
Journal of Agricultural Economics 75:1268-72.
Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lutkepohl, and T. C.
Lee. 1985. The theory and practice of econometrics. New York: John
Wiley.
Krosnick, J. A. 1991. Response strategies for coping with the
cognitive demands of attitude measures in surveys. Applied Cognitive
Psychology 5:213-36.
Mansfield, C. A. 1998. Despairing over disparities: Explaining the
difference between willingness-to-pay and willingness-to-accept.
Environmental and Resource Economics. In press.
Mansfield, C. A., G. Van Houtven, and J. Huber. 1997. Guilt by
association: Compensation and the bribery effect. Unpublished paper,
Duke University.
McConnell, K. E. 1990. Models for referendum data: The structure of
discrete choice models for contingent valuation. Journal of
Environmental Economics and Management 18:19-35.
Mitchell, R., and R. Carson. 1989. Using surveys to value public
goods: The contingent valuation method. Washington, DC: Resources for
the Future.
Schulze, W., G. McClelland, and J. Lazo. 1994. Methodological issues
in using contingent valuation to measure non-use values. Paper prepared
for DOE/EPA Workshop, May 19-20, 1994, Herndon, VA.
Wallace, T. D. 1972. Weaker criteria and tests for linear
restrictions in regression. Econometrica 40:689-98.
Wallace, T. D., and C. E. Toro-Vizcarrondo. 1969. Tables for the mean
squared error test for exact linear restrictions in regression. American
Statistical Association Journal 64:1649-63.
Walsh, R. G., D. M. Johnson, and J. R. McKean. 1990. Nonmarket values
from two decades of research on recreation demand. In Advances in
applied micro-economics 5, edited by V. K. Smith. Greenwich, CT: JAI Press, pp. 167-93.
Welsh, M. 1986. Exploring the accuracy of the contingent valuation
method: Comparisons with simulated Markets. Ph.D. thesis, Department of
Agricultural Economics, The University of Wisconsin-Madison, Madison,
Wisconsin.