A hazard function approach to modeling consumer search.
Choong, Peggy
ABSTRACT
This study recognizes the stochastic element in the consumer search
process and develops a stochastic model of search termination that
incorporates the effects of time elapsed since commencing search,
individual and product characteristics and unobserved heterogeneity.
Results indicate substantial duration dependence. Perceived
benefits of search, size of the evoked set and the quality of past
experience are found to be important determinants of the hazard
function. This study highlights the importance of accounting for
unobserved heterogeneity and the sensitivity of the parameter estimates
to the specification of its distribution.
INTRODUCTION
What causes a customer to terminate the search process and
purchase? This is a pervasive question faced by numerous marketing
managers. Many studies have documented the effects of consumer
characteristics on the extent of information search for durable products
as well as patterns of search across information sources (Punj &
Staelin, 1983; Kiel & Layton, 1981; Furse, Punj & Stewart, 1984;
Beatty & Smith, 1987; Srinivasan & Ratchford, 1991; Putsis &
Srinivasan, 1994). However, to date, there has been little significant
work documenting the termination of consumer search and final purchase.
In addition, contemporary marketing literature on the extent of search
does not explicitly model the fact that the duration of search is a
stochastic process. Thus, a consumer may terminate search early after
chancing on a good deal, or may be unlucky and not obtain an acceptable
offer until late in the search.
A complete model of search behavior would account for this
stochastic element. Another shortcoming of the literature on search is
that it fails to model the possible effects of unobserved heterogeneity.
The most common method of accounting for observed heterogeneity is to
include consumer, retailer and product characteristics in the model, and
to estimate how the measured extent of search varies with these
variables. However, given the difficulty of determining and measuring
these characteristics, there are likely to be many variables that affect
search that are unmeasured. A complete model of search behavior would
explicitly model the effect of this unobserved heterogeneity, and
failure to do so may contaminate the estimates of the included variables
(Heckman & Singer, 1984).
This study focuses on the rate of search termination and its
determinants, using a stochastic model of search within the framework of
a conditional hazard function. The probability of search termination is
modeled as a function of the duration of search, measured consumer
characteristics and unobserved factors.
HAZARD FUNCTION
Hazard function models have been used extensively in the economics
and statistics literature, especially in research on job search,
employment and unemployment (Jones, 1988; Lancaster, 1985; Flinn &
Heckman, 1982). They have been used in a study of inter-purchase timing
in marketing (Jain & Vilcassim, 1991) but never in the area of search.
Since the hazard function can be thought of as the rate at which an
event occurs, it is well suited to the study of search termination.
For the purpose of this study, let the random variable T be the
time a consumer spends searching for external information before
purchasing an automobile. Duration T takes values in the interval
$[0, \infty)$. The hazard function $\lambda(t)$ can therefore be defined as:
(1) $\lambda(t) = \lim_{\Delta t \to 0} \frac{\Pr(t < T < t + \Delta t \mid T > t)}{\Delta t}$
Equation 1 indicates that the hazard function simply specifies the
instantaneous rate of search terminating at time t, given that the
consumer is still searching at t. In other words, conditional on the
consumer not having purchased, the hazard function measures the
likelihood of search ending at time t.
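Equation 1 can be checked numerically. The following Python sketch
approximates the limit with a small finite time step, using only a
survival function S(t) = Pr(T > t). The function name
hazard_from_survival, the step size and the two example survival
functions are illustrative assumptions and are not taken from the study.

```python
import math

def hazard_from_survival(survival, t, dt=1e-6):
    """Finite-difference version of Equation 1: Pr(t < T < t + dt | T > t) / dt."""
    s_t = survival(t)
    # Probability of terminating in (t, t + dt), given search is ongoing at t.
    conditional_prob = (s_t - survival(t + dt)) / s_t
    return conditional_prob / dt

# Illustrative survival functions (assumed, not estimated from the study's data).
def exponential_survival(t, rate=0.5):
    # Exponential durations: the exact hazard is the constant `rate`.
    return math.exp(-rate * t)

def weibull_type_survival(t):
    # S(t) = exp(-t^2), whose exact hazard is 2t (rising with elapsed time).
    return math.exp(-t ** 2)

constant_hazard = hazard_from_survival(exponential_survival, 2.0)
rising_hazard = hazard_from_survival(weibull_type_survival, 1.5)
```

In the first case the approximation stays close to the constant rate
regardless of t, while in the second it grows with t, which is the kind
of duration dependence the study tests for.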
The hazard function is a convenient method of organizing, testing
and interpreting data in cases where conditional probabilities are
theoretically or intuitively appealing. The basic requirements of the
hazard function are non-negativity and finiteness. These requirements
are less stringent than those of probability distributions, which must
not only be non-negative but also sum or integrate to unity.
Since there are likely to be individual differences in the rate of
terminating search, how individual characteristics enter into the hazard
model needs to be specified. Also, since identifying all relevant
characteristics is difficult, if not impossible, unobserved or
unmeasurable heterogeneity needs to be taken into account. Accordingly,
the conditional hazard, conditioned on a vector of consumer
characteristics X and on unmeasured heterogeneity $\theta$, is specified
as (Flinn & Heckman, 1982; Heckman & Singer, 1984):
(2) $\lambda(t \mid X, \theta) = \lambda_0(t) \, \phi(X, \beta) \, \psi(\theta)$,
where $\lambda_0(t)$ is the baseline hazard corresponding to
$\phi = \psi = 1$; $\beta$ is a vector of parameters corresponding to the
consumer characteristics X; and $\theta$ is unobserved heterogeneity. The
observed and unobserved heterogeneity act multiplicatively on the hazard
function and in effect serve to shift the hazard from its baseline.
For the specification of the effect of the measured covariates, the
commonly used form is adopted:
(3) $\phi(X, \beta) = \exp(X\beta)$
Since the expression exp(.) is always positive, the hazard function
is automatically non-negative and finite for all X and $\beta$. Following
Heckman and Singer (1984), the unobserved heterogeneity is
specified as follows:
(4) $\psi(\theta) = \exp(c\theta)$
where $\theta$ is the individual heterogeneity that remains constant
within each spell and c is the associated coefficient.
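The multiplicative structure of Equations (2) to (4) can be sketched in
a few lines of Python. This is a minimal illustration, not the
estimation code used in the study: the function names, the flat baseline
and all numerical values are hypothetical.

```python
import math

def conditional_hazard(t, x, beta, theta, c, baseline_hazard):
    """Hazard of Equation (2) for one consumer at elapsed time t."""
    # Equation (3): observed covariates enter through exp(X * beta).
    phi = math.exp(sum(x_k * b_k for x_k, b_k in zip(x, beta)))
    # Equation (4): unobserved heterogeneity enters through exp(c * theta).
    psi = math.exp(c * theta)
    return baseline_hazard(t) * phi * psi

def flat_baseline(t):
    # Hypothetical constant baseline, used only to isolate the covariate shift.
    return 0.2

HYPOTHETICAL_BETA = [0.8, -1.1]
reference = conditional_hazard(1.0, [0.0, 0.0], HYPOTHETICAL_BETA, 0.0, 1.5, flat_baseline)
shifted = conditional_hazard(1.0, [1.0, 0.0], HYPOTHETICAL_BETA, 0.0, 1.5, flat_baseline)
```

With all covariates and the heterogeneity term set to zero, the hazard
equals the baseline. Raising the first covariate by one unit scales the
hazard by exp(0.8) at every t, which is what it means for the
covariates to shift the hazard proportionally from its baseline.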
Finally, the baseline hazard is parameterized in as general a form
as possible. To this end, the Box-Cox formulation is adopted because the
most commonly used probability distributions are nested within this
general form (Cox, 1972).
(5) $\lambda_0(t) = \exp\left[\gamma_0 + \sum_{j=1}^{J} \gamma_j \frac{t^{\epsilon_j} - 1}{\epsilon_j}\right]$
The baseline hazard captures the time elapsed since embarking on
search, where t is the elapsed duration of search. Here again, the
expression exp(.) ensures the non-negativity of the baseline hazard and,
hence, of the hazard. Two distributions commonly used in studies of
duration, namely the Weibull and Erlang-2, are used in this study. These
are nested in the Box-Cox formulation, so statistical tests of their
suitability can be performed. Table 1 lists the restrictions on
the parameters in Equation (5) and the resulting probability
distribution.
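How the Box-Cox form nests the simpler distributions can be seen in a
short Python sketch of Equation (5). The function names and parameter
values below are hypothetical. The only substantive assumption is the
standard limit of the Box-Cox term, (t^e - 1)/e -> ln t as e -> 0,
which is what turns the first term into the Weibull case.

```python
import math

def box_cox_term(t, eps):
    """(t**eps - 1) / eps, with the limiting value ln(t) when eps is zero."""
    if abs(eps) < 1e-12:
        return math.log(t)
    return (t ** eps - 1.0) / eps

def baseline_hazard(t, gamma0, gammas, epsilons):
    """Equation (5): exp(gamma0 + sum_j gamma_j * box_cox_term(t, eps_j))."""
    index = gamma0 + sum(g * box_cox_term(t, e) for g, e in zip(gammas, epsilons))
    return math.exp(index)

# Exponential case: every gamma_j with j >= 1 is zero, so the hazard is constant.
exponential = baseline_hazard(2.0, 0.3, [], [])

# Weibull case: one term with eps_1 = 0, so the hazard is exp(gamma0) * t**gamma1.
weibull = baseline_hazard(2.0, 0.3, [0.7], [0.0])
```

Because each special case is obtained by restricting parameters of the
general form, the restricted and unrestricted fits can be compared with
standard nested-model tests.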
ESTIMATION
Defining $Y = (\gamma_0, \gamma_1, \beta, c)$, the
method of maximum likelihood is used to estimate Y. The likelihood
function of Y for individual i, conditional on $\theta$, is therefore
given by:
(6) $L_i(Y \mid \theta) = f(t_i \mid \theta)$
Substituting for $f(t \mid \theta)$ in the above equation and assuming
the covariates remain constant during the search, we obtain the
following conditional likelihood function:
(7) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]
By integrating over the distribution of $\theta$, the nuisance term
$\theta$ is eliminated. Therefore, the unconditional likelihood function
$L_i(Y)$ is given by:
(8) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]
Parameter estimates are obtained by maximizing the likelihood
function across all N individuals in the sample:
(9) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]
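The three expressions above did not survive in this copy of the text.
As a reading aid only, the block below gives standard forms that are
consistent with Equations (2) and (6) and with the surrounding
description. They rest on two assumptions that the text does not state
explicitly: every spell is complete (no censoring, since all respondents
had purchased), and the sample likelihood is written in logarithms. The
original expressions may differ in notation or detail.

```latex
\begin{align}
% Conditional likelihood: the density implied by the hazard of Equation (2)
L_i(Y \mid \theta) &= \lambda(t_i \mid X_i, \theta)\,
    \exp\!\left[-\int_0^{t_i} \lambda(u \mid X_i, \theta)\, du\right] \\
% Unconditional likelihood: heterogeneity integrated out over its distribution G
L_i(Y) &= \int L_i(Y \mid \theta)\, dG(\theta) \\
% Sample log-likelihood maximized over Y across the N individuals
\ln L(Y) &= \sum_{i=1}^{N} \ln L_i(Y)
\end{align}
```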
The specification of $G(\theta)$ is approached in two ways. First,
following past research, a standard normal distribution is adopted
(Massy, Montgomery & Morrison, 1970). Second, a non-parametric
approach is used in which the structural parameters and the distribution
of the unobserved covariates of the model are jointly estimated. Results
for the two approaches will be compared.
Estimates of the parameters are obtained through an iterative
maximum likelihood procedure using the CTM (Continuous Time Model)
program developed by George Yates at the National Opinion Research
Center (NORC), Chicago, Illinois. Non-parametric estimation of the
unobserved heterogeneity requires the joint estimation of Y and the
cluster points $\theta_1, \ldots, \theta_s$. Estimates
obtained through the iterative likelihood procedure are consistent
(Amemiya, 1985). To guard against convergence to a local optimum, the
iterative program is applied to different sets of starting values. When
the final estimates are nearly identical, global optimality is concluded
to have been achieved (Yi, Honore & Walker, 1987).
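The starting-value check can be illustrated with a small Python sketch.
It does not reproduce the CTM program: a simple compass (pattern) search
stands in for the iterative procedure, the model is a Weibull hazard
with no covariates or heterogeneity, all spells are assumed complete,
and the durations and starting values are hypothetical.

```python
import math

# Hypothetical search durations (arbitrary time units); not the study's data.
DURATIONS = [0.5, 1.2, 0.8, 2.0, 1.5, 0.9, 1.1, 1.7, 0.6, 1.3]

def neg_log_likelihood(params, durations=DURATIONS):
    """Negative log-likelihood for the Weibull hazard exp(g0 + g1 * ln t)."""
    g0, g1 = params
    if g1 <= -1.0:  # the integrated hazard is not finite in this region
        return float("inf")
    k = g1 + 1.0
    total = 0.0
    for t in durations:
        log_hazard = g0 + g1 * math.log(t)
        integrated_hazard = math.exp(g0) * t ** k / k
        total += log_hazard - integrated_hazard
    return -total

def compass_search(objective, start, step=0.5, tol=1e-7, max_iter=200000):
    """Derivative-free minimizer: try +/- step on each coordinate, halve on failure."""
    x = list(start)
    fx = objective(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                candidate = list(x)
                candidate[i] += delta
                f_candidate = objective(candidate)
                if f_candidate < fx:
                    x, fx, improved = candidate, f_candidate, True
        if not improved:
            step /= 2.0
    return x, fx

# Rerun the search from several hypothetical starting points.
STARTS = [(0.0, 0.0), (-1.0, 1.0), (1.0, 2.0), (-2.0, 0.5)]
estimates = [compass_search(neg_log_likelihood, s)[0] for s in STARTS]

# Largest disagreement between any two runs, over both parameters.
spread = max(abs(a[i] - b[i]) for a in estimates for b in estimates for i in range(2))
```

A small spread across runs is the practical evidence of a single optimum
that the text describes. Agreement across starting values supports, but
does not prove, global optimality.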
DESCRIPTION OF COVARIATES
Search is defined as the effort directed toward the acquisition of
marketer and non-marketer dominated information from the external
environment. It begins when need triggers the serious consideration of a
purchase and ends with the actual purchase transaction (Beatty &
Smith, 1987; Srinivasan & Ratchford, 1991). The hazard rate is
modeled as the dependent variable. In essence, the hazard can be thought
of as the rate of terminating search and is inversely related to
duration.
Based on a general knowledge of the automobile market and from
current marketing literature on search behavior, motivational
determinants of search that would influence the distribution of the
hazard function are identified. The variables included in the study are
consumer, product and demographic factors. Variable names and the
hypothesized sign of each effect on the hazard are given in parentheses.
1) Amount of Experience (AMOUNT, +) is defined as the number of new
automobiles purchased in the last ten years. Consumers who have
experience in buying cars are likely to develop simplifying procedures
that reduce the amount of time required to reach a decision (Alba &
Hutchinson, 1987; Johnson & Russo, 1984; Furse, Punj & Stewart,
1984). By being more efficient, the consumer spends less time searching
for external information. Duration of search is, therefore, reduced,
and it is posited that the amount of experience is positively related
to the hazard function.
2) Perceived Risk (RISK, -) is a measure of the consumer's
belief of the chance of incurring a physical, financial, performance or
convenience loss (Peter & Ryan, 1976; Peter & Tarpey, 1975;
Srinivasan & Ratchford, 1991). The higher a consumer's
perceived risk of making a wrong choice, the greater is the duration of
search. Hence, perceived risk is negatively related to the hazard
function.
3) Evoked Set (EVOKE, -) is the number of models included in the
individual's consideration set. A larger set would require more
extensive information search than a smaller set, thereby
leading to an extended duration of search. Evoked set is, therefore,
hypothesized to be negatively related to the hazard function.
4) Perceived Benefits of search (BENFT, -) is a measure of a
consumer's perception of potential gains from search. For example,
the consumer may benefit in the form of obtaining a better price or a
more satisfactory model. A greater perception of the benefits of search
would drive the consumer toward more extensive search. Perceived
benefits of search are, therefore, hypothesized to be negatively
related to the hazard function.
5) Interest (INTRST, -) in the product class would result in more
time spent collecting information (Maheswaran & Sternthal, 1990).
The rate of terminating search is thereby reduced. Interest is,
therefore, hypothesized to be negatively related to the hazard function.
6) Knowledge (KNO, +) is the knowledge and understanding that an
individual has of a product within a particular product class. It
enables the person to process information more efficiently by excluding
irrelevant information (Bettman & Park, 1980; Johnson & Russo,
1984; Beatty & Smith, 1987; Urbany, Dickson & Wilkie, 1989;
Brucks & Schurr, 1990). Duration of search is thereby shortened, and
product class knowledge is, therefore, posited to be positively related
to the hazard function.
7) Positive experience (EXPER, +) with the product reflects the
quality of past experience with the previous car and the dealer or
manufacturer. A positive experience builds feelings of trust and
confidence toward the manufacturer and/or dealer and impacts positively
on decision making in that product category. Positive experience
manifests itself in simplified decision processes often based on
simplistic rules (such as purchasing the same brand of car or buying
from the regular dealer). This is similar to what Bettman and Zins
(1977) refer to as "preprocessed choice." Therefore, we expect
greater amounts of positive experience to be accompanied by shorter
durations of search. In other words, positive experience is positively
related to the hazard function.
8) Price (PRICE, -). This is defined as the net price after
taxes. Consumers tend to spend a long time searching for items of higher
value (Kiel & Layton, 1981). The higher the price of the automobile,
the more extended the duration of search. Price is posited to be
negatively related to the hazard function.
9) Discount (DISCOUNT, +). This is the combined total of
manufacturer and dealer discounts. Discounts act as incentives to
purchase. Larger discounts would encourage consumers to terminate search
and complete the purchase transaction. Therefore, large discounts are
associated with higher rates of terminating search, and the covariate is
hypothesized to be positively related to the hazard function.
10) Age (AGE, +) reflects the life stage of an individual. Hempel
(1969) and Srinivasan and Ratchford (1991) have shown that older
individuals tend to engage in less search. In other words, their
duration of search is shorter. Hence, age is hypothesized to be
positively related to the hazard function.
11) Education (EDU, -) is used as a proxy measure of an
individual's ability to collect, process and use external
information (Newman & Staelin, 1972; Ratchford & Srinivasan,
1993). More educated consumers tend to engage in extended search,
leading to longer durations of search. Education is, therefore,
negatively related to the hazard function.
While all attempts have been made to adequately measure and include
variables that might account for heterogeneity, it is expected that
there remain some factors which are unaccounted for or unmeasurable. The
heterogeneity factor, c, captures these unexplained effects and leaves
the estimated parameters unbiased.
DATA
The data set used in this study is a subset of a data set obtained
through a mail survey of people who registered new cars in a
northeastern SMSA. The questionnaires elicited response from the person
mainly responsible for buying the new car. After eliminating all cases
with any missing data, 1024 usable cases remained, representing a
response rate of 46%. These were employed in the analysis.
The measure of time spent searching in this data set is the sum of
self-reported time spent in the search process on the following
categories: talking to friends/relatives, reading books/magazine
articles, reading/listening to ads, reading about car ratings in
magazines, reading automobile brochures/pamphlets, driving to/from
dealers, looking around showrooms, talking to salespersons, test driving
cars.
DISCUSSION
Equation (9) is estimated using the iterative maximum likelihood
procedure. While several commonly used distributions for the hazard
function were estimated, the Weibull hazard gave the best results.
Results for this model are displayed in Table 2. This table reports
results for three different specifications of the unobserved
heterogeneity factor, namely a specification that does not account for
unobserved heterogeneity, another that assumes standard normality, and
finally a non-parametric specification that represents heterogeneity in
terms of a discrete distribution of mass points.
In estimating the non-parametric specification, the end points of
the interval over which the support points are estimated are fixed at 0
and 1, and other support points between these are determined in
estimation. Also, the probability mass associated with each point is
estimated. In estimation, support points are added one at a time until
two points become clustered at approximately the same location. In the
analysis, five support points are required to adequately estimate the
underlying probability distribution. The estimated support points are 0,
0.33, 0.55, 0.75 and 1.00 with associated probabilities of 0.0264,
0.1392, 0.3194, 0.3924 and 0.1226 respectively.
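Under the non-parametric specification the integral over the
heterogeneity distribution reduces to a weighted sum over the support
points. The Python sketch below uses the five estimated support points
and probabilities reported above. Everything else is an assumption made
for illustration: a Weibull baseline, complete spells, a density of the
form hazard times exp(-integrated hazard), and hypothetical parameter
values rather than the estimates in Table 2.

```python
import math

# Estimated support points and probability masses reported in the text.
SUPPORT_POINTS = [0.0, 0.33, 0.55, 0.75, 1.00]
MASSES = [0.0264, 0.1392, 0.3194, 0.3924, 0.1226]

def weibull_density(t, gamma0, gamma1, x_beta, c, theta):
    """Density of a completed spell of length t, conditional on theta.

    Assumes gamma1 > -1 so that the integrated hazard is finite.
    """
    scale = math.exp(gamma0 + x_beta + c * theta)
    hazard = scale * t ** gamma1
    integrated_hazard = scale * t ** (gamma1 + 1.0) / (gamma1 + 1.0)
    return hazard * math.exp(-integrated_hazard)

def mixture_likelihood(t, gamma0, gamma1, x_beta, c):
    """Unconditional likelihood: conditional densities weighted by the masses."""
    return sum(
        p * weibull_density(t, gamma0, gamma1, x_beta, c, theta)
        for theta, p in zip(SUPPORT_POINTS, MASSES)
    )

# Hypothetical parameter values chosen only to keep the numbers well scaled.
HYPOTHETICAL = dict(gamma0=-1.0, gamma1=0.5, x_beta=0.0, c=1.0)
likelihood_at_2 = mixture_likelihood(2.0, **HYPOTHETICAL)
components = [weibull_density(2.0, theta=th, **HYPOTHETICAL) for th in SUPPORT_POINTS]
```

In estimation, the interior support points and all of the masses would
be free parameters chosen jointly with the structural parameters, rather
than fixed constants as in this sketch.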
Effects of Time on Search Termination
Looking across the columns in Table 2, the duration term
(coefficient of lnt) is seen to be significant at the 0.001 level. It
takes on the values of 0.25, 0.54 and 3.36 under the no heterogeneity,
standard normal and non-parametric specifications respectively.
The hazard is positively related to lnt in Table 2, implying that
the longer the time elapsed while searching, the greater is the
likelihood of terminating search. For the non-parametric heterogeneity
case, the estimated coefficient of lnt exceeds one, implying that the
second derivative of the hazard with respect to time is positive, which
means that the hazard increases at an increasing rate.
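The role of the threshold at one follows directly from the Weibull form
of the baseline hazard in Table 1. Writing the coefficient of lnt as
$\gamma_1$, the derivation is:

```latex
\begin{align}
\lambda_0(t) &= \exp(\gamma_0 + \gamma_1 \ln t) = e^{\gamma_0}\, t^{\gamma_1} \\
\frac{d\lambda_0}{dt} &= \gamma_1\, e^{\gamma_0}\, t^{\gamma_1 - 1} \\
\frac{d^2\lambda_0}{dt^2} &= \gamma_1 (\gamma_1 - 1)\, e^{\gamma_0}\, t^{\gamma_1 - 2}
\end{align}
```

For t > 0 the first derivative is positive whenever $\gamma_1 > 0$, and
the second derivative is positive only when $\gamma_1 > 1$. A
coefficient between zero and one therefore gives a hazard that rises at
a decreasing rate, while a coefficient above one gives a hazard that
rises at an increasing rate. Because the covariate and heterogeneity
terms in Equation (2) do not depend on t, the same signs carry over to
the conditional hazard.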
Effects of Covariates on Search Termination
The covariates with the strongest effects on the duration of search
are perceived benefits of search and size of the evoked set, both of
which tend to lower the hazard and lengthen the search. As expected,
both the amount and the quality of past experience are associated with
an increased hazard, and hence a shorter duration of search. The effect
of the covariate interest is significant at the 0.001 level. Interest in
a certain product class encourages more external search for information.
Past research indicates that knowledgeable consumers experience pleasure
in collecting and processing information (Maheswaran &
Sternthal, 1990). The results are similar to those of Srinivasan and
Ratchford (1991), who have shown that the interest a consumer has in a
certain product class is a major motivator of search. It follows, then,
that greater interest leads to a lower probability of terminating search
or lower hazard values.
While these results are in general agreement with past studies of
search effort for automobiles, our study has the advantage of
controlling for changes in the hazard through time, and for unmeasured
heterogeneity. The estimated effect of several of the covariates changes
considerably when heterogeneity is taken into account, indicating that
it is important to control for unmeasured heterogeneity when studying
the determinants of search.
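Because covariates enter the hazard through exp(X[beta]) in Equation
(3), a coefficient can be read as a proportional shift: raising a
covariate by one unit multiplies the hazard by the exponential of its
coefficient. The Python sketch below applies this to the three EVOKE
estimates in Table 2. The scaling of the covariates is not reported, so
what "one unit" means here is an assumption, and the function name is
hypothetical.

```python
import math

# EVOKE coefficients from Table 2 under the three heterogeneity specifications.
EVOKE_COEFFICIENTS = {
    "no_heterogeneity": -2.459,
    "standard_normal": -13.913,
    "non_parametric": -11.752,
}

def hazard_multiplier(beta_k, delta_x=1.0):
    """Factor by which the hazard changes when covariate k rises by delta_x."""
    return math.exp(beta_k * delta_x)

multipliers = {
    name: hazard_multiplier(beta_k) for name, beta_k in EVOKE_COEFFICIENTS.items()
}
```

All three multipliers lie below one, in line with the hypothesized
negative sign, but they differ by orders of magnitude across
specifications. This is one way to see how strongly the estimated
covariate effects depend on the treatment of heterogeneity.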
The heterogeneity factor, c, shown in Table 2 is significant at the
0.001 level in both the standard normal and non-parametric
specifications, so the null hypothesis of no heterogeneity is
rejected. This implies that unobserved heterogeneity has a positive
impact on the hazard function and if unaccounted for will contaminate
the parameter estimates (Heckman & Singer, 1984). While the more
flexible non-parametric model yields a higher log likelihood than the
model with normal heterogeneity, the two models are not nested, and no
formal significance test for their difference was run.
CONCLUSION
This study attempted to model the stochastic nature of search. One
of its contributions is to provide a framework within which three
distinct effects on the hazard function can be examined. They are the
effects of time, the influence of observed product and consumer
motivational factors and the significance of unobserved or unmeasured
heterogeneity.
The results show significant amounts of duration dependence and
point to duration as a major determinant of the rate of terminating
search. The estimated effect of time elapsed since commencing search is
biased when unobserved heterogeneity is not taken into account.
Another important finding relates to the magnitude and nature of
the unobserved heterogeneity. This component is found to be highly
significant and exerts substantial impact on the parameter estimates.
The results highlight the importance of accounting for unobserved
heterogeneity and the sensitivity of parameter estimates to the
specification of its distribution. Problems associated with the
assumption of standard normality for the unobserved heterogeneity are
also presented.
Covariates that exert the largest impact on the hazard function are
found to be the perceived benefits of search and the size of the evoked
set. Price and amount of discount are also found to be significant. Of
the demographic characteristics, age is found to be positively related
to the hazard while education is not significant.
Due to the nature of the data, this study restricts itself to a
single spell. Assuming that consumers build up an inventory of knowledge
and experiences, which impact on future actions and choices, it would be
interesting to build and estimate a model incorporating several spells.
REFERENCES
Alba, J. W. & Hutchinson, J. W. (1987). Dimensions of consumer
expertise. Journal of Consumer Research, 13, 411-454.
Amemiya, T. (1985). Advanced econometrics. Cambridge, MA: Harvard
University Press.
Bayus, B. L. (1991). The consumer durable replacement buyer. Journal
of Marketing, 55, 42-51.
Beatty, S. E. & Smith, S. M. (1987). External search effort: An
investigation across several product categories. Journal of Consumer
Research, 14, 83-95.
Bettman, J. & Park, C. W. (1980). Effects of prior knowledge and
experience and phase of the choice process on consumer decision
processes: A protocol analysis. Journal of Consumer Research, 7,
234-248.
Bettman, J. & Zins, M. (1977). Constructive processes in
consumer choice. Journal of Consumer Research, 4, 75-85.
Brucks, M. & Schurr, P. (1990). The effects of bargainable
attributes and attribute range knowledge on consumer choice processes.
Journal of Consumer Research, 16(4), 409-419.
Bucklin, L. P. (1969). Consumer search role: Enactment and market
efficiency. Journal of Business, 42, 416-438.
Cox, D. R. (1972). Regression models and life-tables. Journal of
the Royal Statistical Society, Series B, 34, 187-220.
Flinn, C. & Heckman, J. (1982). Models for the analysis of labor
force dynamics. Advances in Econometrics, 1, 35-95.
Furse, D. H., Punj, G. N. & Stewart, D. W. (1984). A typology of
individual search strategies among purchasers of new automobiles.
Journal of Consumer Research, 10, 417-427.
Heckman, J. & Singer, B. (1984). A method for minimizing the
impact of distributional assumptions in econometric models for duration
data. Econometrica, 52, 271-320.
Jain, D. C. & Vilcassim, N. J. (1991). Investigating household
purchase timing decisions: A conditional hazard function approach.
Marketing Science, 10, 1-13.
Johnson, E. J. & Russo, J. E. (1984). Product familiarity and
learning new information. Journal of Consumer Research, 11(June),
542-550.
Jones, S. (1988). The relationship between unemployment spells and
reservation wages as a test of search theory. The Quarterly Journal of
Economics, 743-765.
Kiel, G. C. & Layton, R. A. (1981). Dimensions of consumer
information seeking. Journal of Consumer Research, 8, 233-239.
Lancaster, T. (1985). Simultaneous equations models in applied
search theory. Journal of Econometrics, 28, 113-126.
Lancaster, T. (1990). The econometric analysis of transition data.
New York: Cambridge University Press.
Maheswaran, D. & Sternthal, B. (1990). The effects of
knowledge, motivation and type of message on ad processing. Journal of
Consumer Research, 17(1), 66-73.
Marmorstein, H., Grewal, D. & Fishe, R. (1992). The value of time
spent in price-comparison shopping: Survey and experimental evidence.
Journal of Consumer Research, 19, 52-61.
Massy, W. F., Montgomery, D. B. & Morrison, D. G. (1970).
Stochastic models of buying behavior. Cambridge, MA: MIT Press.
Newman, J. W. & Staelin, R. (1972). Prepurchase information
seeking for new cars and major household appliances. Journal of
Marketing Research, 9, 249-257.
Peter, J. P. & Ryan, M. J. (1976). An investigation of perceived
risk at the brand level. Journal of Marketing Research, 13, 186-188.
Peter, J. P. & Tarpey, L. X., Sr. (1975). Comparative analysis of
three consumer decision strategies. Journal of Consumer Research, 2,
29-37.
Punj, G. N. & Staelin, R. (1983). A model of consumer search
behavior for new automobiles. Journal of Consumer Research, 9, 366-380.
Putsis, W. & Srinivasan, N. (1994). Buying or just browsing?
The duration of purchase deliberation. Journal of Marketing Research,
31, 393-402.
Ratchford, B. T. & Srinivasan, N. (1993). An empirical
investigation of returns to search. Marketing Science, 12, 73-87.
Srinivasan, N. & Ratchford, B. T. (1991). An empirical test of
a model of external search for automobiles. Journal of Consumer
Research, 18, 233-241.
Urbany, J., Dickson, P. & Wilkie, W. (1989). Buyer uncertainty and
information search. Journal of Consumer Research, 16, 208-215.
Vilcassim, N. J. & Jain, D. C. (1991). Modeling purchase-timing
and brand-switching behavior incorporating explanatory variables and
unobserved heterogeneity. Journal of Marketing Research, 28, 29-41.
Yi, K. M., Honore, B. & Walker, J. (1987). Program for the
estimation and testing of continuous time multi-state multi-spell
models, user's manual, program version 50. Chicago, Ill: National
Opinion Research Center.
Peggy Choong, Niagara University
Table 1: Restrictions on Parameters in Equation (5) and the Resulting
Probability Distribution

Restrictions                     Baseline Hazard                                                                            Corresponding Probability Distribution
1. $\gamma_k = 0$, $k \ge 1$     $\exp(\gamma_0)$ = constant                                                                Exponential
2. $\epsilon_1 = 0$;             $\exp(\gamma_0 + \gamma_1 \ln t)$                                                          Weibull
   $\gamma_k = 0$, $k \ge 2$
3. $\epsilon_1 = 1$;             $\exp[(\gamma_0 - \gamma_1 - \gamma_1^2/2) + \gamma_1 t + \ln t + (\gamma_1^2/2) t^2]$     Approximately Erlang-2
   $\epsilon_2 \to 0$;
   $\epsilon_3 = 2$;
   $\gamma_1 < 0$;
   $\gamma_2 = 1$;
   $\gamma_3 = \gamma_1^2$;
   $\gamma_k = 0$, $k \ge 4$
Table 2: Parameter Estimates

Variables                 (a) No                (b) Standard          (c) Non-
                          Heterogeneity         Normal                Parametric
Intercept                    4.570 (++++)         22.798 (++++)          9.052 (++++)
                            (0.307)               (0.744)               (0.528)
lnt                          0.249 (++)            4.426 (++++)          3.356 (++++)
                            (0.029)               (0.167)               (0.179)
KNO (+)                      0.885 (++)            4.921 (++++)          3.688 (++++)
                            (0.383)               (0.579)               (0.621)
EXPER (+)                    1.225 (++++)          1.206 (++)            6.261 (++++)
                            (0.336)               (0.509)               (0.601)
AMOUNT (+)                   1.085 (++++)          0.461                 2.695 (++++)
                            (0.309)               (0.490)               (0.554)
RISK (-)                    -0.453                -2.219 (++++)         -1.279 (++)
                            (0.272)               (0.478)               (0.566)
EVOKE (-)                   -2.459 (++++)        -13.913 (++++)        -11.752 (++++)
                            (0.329)               (0.687)               (0.675)
BENFT (-)                   -3.588 (++++)        -16.371 (++++)        -14.438 (++++)
                            (0.305)               (0.684)               (0.782)
INTRST (-)                  -0.830 (++++)         -2.558 (++++)         -1.705 (++++)
                            (0.181)               (0.235)               (0.376)
PRICE (-)                   -1.259 (+++)          -1.054 (+)            -1.437 (+++)
                            (0.392)               (0.548)               (0.596)
DISCOUNT (+)                 0.964 (++)            1.412 (++)            3.009 (++++)
                            (0.427)               (0.589)               (0.651)
AGE (+)                      1.150 (++++)          7.199 (++++)          3.440 (++++)
                            (0.325)               (0.527)               (0.596)
EDU (-)                      0.618 (+)             2.892 (++++)          0.865
                            (0.339)               (0.493)               (0.524)
HETEROGENEITY FACTOR (c)        --                 3.887 (++++)         14.305 (++++)
                                                  (0.128)               (0.642)
Negative Log Likelihood    1624.39               1617.64               1606.0

Standard errors are in parentheses.
(++++) Significant at the p = 0.001 level;
(+++) Significant at the p = 0.02 level;
(++) Significant at the p = 0.05 level;
(+) Significant at the p = 0.1 level.