文章基本信息

标题：Using spreadsheets to conduct Monte Carlo experiments for teaching introductory econometrics. (Targeting Teaching).
作者：Craft, R. Kim
期刊名称：Southern Economic Journal
印刷版ISSN：0038-4038
出版年度：2003
期号：January
语种：English
出版社：Southern Economic Association
摘要：In recent years a number of authors have begun to use Monte Carlo simulation to help explain elementary concepts in statistics and econometrics (e.g., Gujarati 1999, pp. 79-81 and 159-161; Albright, Winston, and Zappe 2000, pp. 191-194; Studenmund 2001, pp. 101-102). Furthermore, Kennedy (1998a, b) has argued that the Monte Carlo experiment is an indispensable pedagogical tool for the undergraduate econometrics course. In particular, he believes that Monte Carlo methods can be used to illuminate the idea of a sampling distribution, a fundamental concept that is often difficult for students to grasp. Kennedy maintains that because the notion of a sampling distribution is the "statistical lens" that makes other statistical concepts clear, and because Monte Carlo methods provide a superior vehicle for acquiring this lens, introductory students should be required to describe a Monte Carlo experiment related to "every major topic in the course" (Kennedy 1998b, p. 148). Kennedy further contends, however, that intro ductory students should not be asked to conduct their own Monte Carlo experiments because of the high opportunity cost of learning to program standard econometric software.
关键词：Econometrics;Simulation;Simulation methods;Spreadsheet software;Spreadsheets;Statistics;Statistics (Data)

Using spreadsheets to conduct Monte Carlo experiments for teaching introductory econometrics. (Targeting Teaching).

Craft, R. Kim

1. Introduction

In recent years a number of authors have begun to use Monte Carlo simulation to help explain elementary concepts in statistics and econometrics (e.g., Gujarati 1999, pp. 79-81 and 159-161; Albright, Winston, and Zappe 2000, pp. 191-194; Studenmund 2001, pp. 101-102). Furthermore, Kennedy (1998a, b) has argued that the Monte Carlo experiment is an indispensable pedagogical tool for the undergraduate econometrics course. In particular, he believes that Monte Carlo methods can be used to illuminate the idea of a sampling distribution, a fundamental concept that is often difficult for students to grasp. Kennedy maintains that because the notion of a sampling distribution is the "statistical lens" that makes other statistical concepts clear, and because Monte Carlo methods provide a superior vehicle for acquiring this lens, introductory students should be required to describe a Monte Carlo experiment related to "every major topic in the course" (Kennedy 1998b, p. 148). Kennedy further contends, however, that intro ductory students should not be asked to conduct their own Monte Carlo experiments because of the high opportunity cost of learning to program standard econometric software.

I concur with Kennedy's assertion that teaching statistical concepts in the context of a Monte Carlo study is an enormously valuable pedagogical technique. I also agree that the marginal benefit derived from requiring students to program Monte Carlo experiments with standard econometric software, rather than to only demonstrate their ability to conceptualize the experiment, probably exceeds the marginal cost for most undergraduates. Nevertheless, it is clear that students can derive substantial learning benefits from conducting their own Monte Carlo experiments. It is well known that many students learn best through experiencing and experimenting, and even students with strong abstract reasoning abilities usually find experiential learning exercises helpful. The process of developing and experimenting with a simulation model can provide such experiential learning opportunities with otherwise very abstract statistical concepts; but the opportunity cost is a problem.

I have found that the benefit can substantially exceed the cost when introductory students use spreadsheets to perform their own Monte Carlo experiments. As I hope to demonstrate below, the opportunity cost of learning associated with conducting a Monte Carlo experiment with a spreadsheet can be relatively low. Nearly all of my econometrics students (juniors and seniors in finance and economics, most of whom have had a prior course in spreadsheet modeling) are familiar with spreadsheets and find the environment very natural. Thus, they learn the necessary spreadsheet commands quickly and are then able to focus on the purpose of the exercise rather than on programming. In addition, instructors can control the cost to some degree by supplying spreadsheet templates and directions tailored to the background of the students and the goals of the exercise (see Cahill and Kosicki 2001 for a useful discussion of this point). The use of spreadsheets to conduct Monte Carlo experiments in introductory courses can therefo re overcome the high-opportunity-cost problem to a large extent.

The use of spreadsheets to teach Monte Carlo simulation offers additional learning benefits because the environment is especially intuitive and user friendly. For instance, the spreadsheet setting invites exploration: Once a model is created, the effects of changing a parameter value can be investigated by simply entering a new number in a cell and pressing a key. Spreadsheet models can also be superior teaching tools because of the way data are displayed. Realizations of random events can be organized, annotated, and presented in a tabular form that undergraduates can easily relate to. With a well-designed spreadsheet model, students can almost see samples being drawn and estimates being created. As opposed to the "black box" nature of standard econometric software, these models can be very explicit, with each step connected in a straightforward way.

Finally, there are important positive externalities associated with the use of spreadsheets to teach Monte Carlo methods. Most of the spreadsheet techniques needed for a Monte Carlo study are useful in a variety of other modeling situations, ranging from economic theory (e.g., Cahill and Kosicki 2000) to personal finance (e.g., Holden and Womack 2000). Students therefore learn practical spreadsheet modeling skills along with econometrics. Moreover, since spreadsheet software is widely available, students are more likely to later use their knowledge of spreadsheets and Monte Carlo methods in a work environment. Because they recognize these benefits (with help from the instructor), most students are quite willing to invest in learning the necessary spreadsheet skills.

Along with the numerous authors who have promoted the use of spreadsheets in teaching economic theory, Judge (1999) introduced a simple method for conducting Monte Carlo experiments with spreadsheets. The approach proposed in this paper differs from Judge's approach in at least three important ways. First, the method outlined here allows for random regressors. Although it is not useful for most research applications, sampling from a population of regressors adds an element of realism (for cross-sectional analysis) that facilitates student comprehension--my students are able to get a better grasp of the meaning and significance of the "fixed in repeated samples" assumption when I contrast fixed regressors with random regressors in a Monte Carlo context. Another distinguishing feature of the approach presented here involves the method of sampling. I believe it is both more intuitive and more efficient than that used with Judge's approach; in particular, it is easier to draw larger samples and to repeat experime nts under alternative scenarios. Finally, because it is easier to repeat experiments under alternative scenarios, the approach proposed here facilitates student experimentation and enables instructors to accomplish more with any Monte Carlo exercise.

My objectives in the remainder of this paper are to (1) describe an effective method of conducting Monte Carlo experiments in Excel and (2) discuss some of the ways in which I have used spreadsheet-based Monte Carlo experiments successfully in teaching introductory econometrics courses.

2. Monte Carlo Simulation with Excel

In this section, I briefly outline how to conduct a Monte Carlo experiment in Microsoft Excel with a specific example involving a cross-sectional application. More detailed instructions designed for students, including a number of spreadsheet modeling tips and guidelines, can be found on the course Website at http://www.suu.edu/faculty/craft/EC4260/MonteCarlo.htm. The Website also contains corresponding Excel templates and other supporting material.

The presentation is based on a lecture and computer demonstration that I give students in preparation for their first Monte Carlo assignment. The initial demonstration and discussion usually take nearly one week of class meetings, but I believe it is time well spent for introductory students. I typically assign the first Monte Carlo exercise in the fourth or fifth week of the semester, after students are familiar with the basics of the classical multiple linear regression model and just before hypothesis testing. It is assumed that most have a basic understanding of spreadsheets. I teach the Monte Carlo experiment as a six-step process, which differs slightly from that typically outlined by other authors. Each step is described below with reference to the spreadsheet model displayed in Figures 1 and 2.

The Data-Generating Process

To begin the illustration, assume that the true data-generating process is a classical linear regression model with two explanatory variables:

Y = [[beta].sub.0] + [[beta].sub.1][x.sub.1] + [[beta].sub.2][x.sub.2] + u.

where the error term u is independently distributed Normal(0, [[sigma].sup.2]). Further, assume that the true parameter values are, say, [[beta].sub.0] = 25, [[beta].sub.1] = 6, [[beta].sub.2] = -2, and [[sigma].sup.2] = 2500. Any parameter values can, of course, be chosen. To make the example more concrete, I sometimes let the dependent variable represent something real, like the selling price of a house, and then choose the parameter values accordingly. Depending on the purpose of the exercise, it can also be useful to let the error term follow some nonnormal distribution.

Explanatory Variables

While a fixed set of regressors can be specified, and it is computationally simpler to do so, it is both more realistic and more intuitive for students to randomly draw the explanatory variables from a population. Moreover, undergraduates generally find it easier to grasp the notion of sampling from a finite population rather than from an infinite one. Students are therefore provided with a "population" of explanatory variables such as that displayed in columns C and D of Figure 1, in which the population size is 750. A much larger population is feasible, but I like to keep the in-class simulation small to speed up the demonstration. The independent variables in Figure 1 were created using Excel's random number generator, although the assumed population values can be obtained in different ways, such as pretending that some real-world sample of data is actually a census.' The explanatory variables can also be made to have certain properties if that is important for the exercise. For example, the variables can be constructed so that they exhibit a certain degree of collinearity by generating one as a linear function of the other: [x.sub.i2] = f([x.sub.i1]) + -[epsilon].sub.i], where f(.) is a linear function and [[epsilon].sub.i] is a stochastic term with desired properties.

Obtaining a Random Sample

The stage of the exercise in which a random sample is obtained consists of two steps: (i) generating the dependent variable according to the assumed data-generating process and (ii) obtaining a sample from the population. While these steps can be reversed, this ordering seems to be most natural for students. Excel will produce a random draw from a normal distribution with mean [mu] and standard deviation with the formula =NORMINV(RAND(),[mu][sigma]). This formula is entered into the error term cells of the spreadsheet (cells E10 : E759), with the arguments being referenced to fixed parameter value cells. The dependent variable is then generated for each element of the population with standard spreadsheet functions (cells B10 B759), again by referencing cells containing the appropriate population parameters. Note that once the formulas are entered for the first element, the remainder can be entered by simply copying and pasting if relative and absolute cell references are set properly.

The model will draw a new error term for each individual and consequently generate a new set of dependent variables every time the spreadsheet is recalculated. Excel can be set to recalculate formulas automatically whenever a change is made in the spreadsheet or to recalculate manually. When developing simulation models, it is generally best to have the spreadsheet set on manual calculation to avoid unnecessary recalculations.

Next, a random sample of n observations, { ([x.sub.i1], [x.sub.i2], [y.sub.i]) i 1,2,..., n], is drawn from the artificially created population using simple random sampling with replacement. Here I think it is worthwhile to briefly mention some of the sample selection issues that can arise in practice and to remind students that the Gauss-Markov theorem requires random sampling. A random selection can be made from the sampling frame with the Excel formula =INT(N*RAND())+ 1, which yields a uniformly distributed random integer from 1 to N. This formula is entered into cells G4 through G43 of the spreadsheet model (Figure 1) to obtain a sample of 40.

After n random numbers have been drawn, each is matched with the corresponding population element and the data are extracted. One way to accomplish this in Excel is with the VLOOKUP formula. The syntax for VLOOKUP is =VLOOKUP(lookup value, table, column number), where lookup value is the number that you want Excel to find in the first column of a table of data, table is a reference to the table of information from which you want to take data, and column number is the column in table from which to take the data. This formula is entered into those cells in which the sample observations and variables are to be displayed. For instance, cell 14 of the model contains VLOOKUP($G4,$A1O: $D$759,2), which says to find the number in the first column of the table defined by cell range A10 : D759 that matches the value in cell G4 and return the corresponding number from the second column of table AlO : D759. This process yields the first observation on the dependent variable. The remaining observations and variables are o btained similarly. For example, cell J4 returns [x.sub.11] with the formula = VLOOKUP($G4,$Al0 $D$759,3).

Estimating the Parameters

A regression model is now estimated from the sample data. The Excel array formula = LTNEST(range of y, range of x, const, stats), where x represents all of the explanatory variables arranged adjacent to each other on the spreadsheet, returns ordinary least squares (OLS) regression estimates and certain statistics. (2) The argument const is a logical value: If const is "true" (equal to 1) or omitted, an intercept term is included in the regression. The argument stats is also a logical value: If stats is "true" (equal to 1), LINEST returns certain regression statistics. Figure 2 shows the output generated by the LINEST formula for our example, where stats is "true." It is important to note that LINEST is an array formula. Array formulas can return more than one value, and the way in which they must be entered is different from that for regular formulas. (3) For instance, to generate the regression output illustrated in Figure 2, select the cell range P3:R7, enter the formula LINEST(I4 : I43,J4 : K43,1,1). (onet ime), and then, with the cell range still selected, press CTRL + SHIFT + ENTER. The formula will be automatically entered into each cell in the selected range with braces placed around it, indicating that it is an array formula, and regression output will be returned for the specific data set. To change an array formula, select the entire range of cells, make the change, and then press CTRL + SHIFT + ENTER again.

Running the Simulation

At this point, a new population is created, a new sample is selected, and new OLS estimates are obtained each time the spreadsheet is recalculated. In a classroom demonstration, it can be very helpful to run the simulation manually several times and let students observe the process while discussing the meaning and importance of the sampling distribution of [[beta].sub.j].

Although it was not designed for this specific purpose, the Excel data table command provides a simple way to run the simulation a large number of times and record the results (see the course Web page for a supplement called "On Using the Excel Data Table" for more information). (4) To illustrate, the following describes how to program Excel to draw 500 different samples, obtain OLS results for each sample, and store the corresponding values of [[beta].sub.1], se([[beta].sub.1]), and [R.sup.2] on another worksheet (while the simulation results can be placed anywhere, I like to put them on a separate worksheet to help drive home the sometimes difficult point that a sample of estimates is distinct from a sample of data). With most personal computers, it would be practical to perform a much larger simulation if desired, but 500 is adequate for the example. First, name the original worksheet Model and open a new worksheet and name it Results. (5) Next, on the Results sheet, create a table with rows labeled 1 thro ugh 500 and three columns labeled [[beta].sub.1], se([[beta].sub.1]), and [R.sup.2]. In the first row of the table, just under each column heading, place a reference to the cell in the Model worksheet that contains the value of [[beta].sub.1], se([[beta].sub.1]), and [R.sup.2]. This can be accomplished by typing "=" and then clicking on the appropriate cell in the Model worksheet. Figure 3 illustrates the new worksheet.

To implement the data table command, select the entire table, including the column with the row labels (cell range A3 : D502 in Figure 3); click on the Data menu and select Table; in the Table window, leave "Row input cell" blank and in "Column input cell" enter any empty cell on the Results worksheet outside of the table; click OK. Excel will fill in the body of the table with the formulas (=TABLE(,column input cell)) and will initially show the values from the first row. When F9 is pressed, Excel will put each row number from the first column of the table sequentially into the column input cell, recalculate the spreadsheet, and store the new values for [[beta].sub.1], se([[beta].sub.1]), and [R.sup.2]. This produces results for 500 OLS regression runs, as illustrated in Figure 3.

Analyzing the Results

The bottom rows of Figure 3 indicate some of the summary measures that might be used to investigate the results. It is also fairly simple--and typically very helpful for students--to create histograms of the estimates or statistics under consideration.

3. Class Discussion

I normally spend one or two entire class periods simply discussing results from the initial demonstration. (I also spend some class time discussing the results and implications of each Monte Carlo exercise assigned.) Depending on students' familiarity with spreadsheets, it may be a good idea to postpone a detailed discussion until after they have worked through the first Monte Carlo experiment so that they will be in a better position to focus on the meaning of the experiment rather than on the details of creating the spreadsheet. In either case, the following are some class discussion points and experiments that I have found to be useful following the introduction of Monte Carlo simulation:

* Compare and contrast the sample with the sampling distribution. In my experience, students frequently confuse these concepts (because they do not understand the sampling distribution concept). Given a set of Monte Carlo results, one can more clearly demonstrate how and why they are distinctly different things.

* Note that the experimental results support the Gauss-Markov theorem, discuss why they should, and take this opportunity to review the meaning and importance of the theorem.

* Discuss a histogram of [[beta].sub.j]s representing draws from a sampling distribution. Discuss why it appears to be approximately normally distributed. Change the random error term to follow some nonnormal distribution (a uniform distribution is easy to specify) and compare the resulting [[beta].sub.j] sampling distribution to the original. Discuss the implications for hypothesis testing.

* Compare and contrast the standard deviation of the sample of [[beta].sub.j].s with the OLS standard error, and discuss the interpretation and properties of se([[beta].sub.j]).

* Consider the variation in the [R.sup.2] statistic and discuss how even when the model is well specified, the OLS regression line might fit well or not so well depending on the variance of the error term and the characteristics of a particular sample (i.e., a low [R.sup.2] does not necessarily imply a "bad" model).

* Run an experiment over again with a larger sample size and compare histograms associated with each sampling distribution.

* Compare and contrast alternative estimators using summary statistics and histograms generated from a Monte Carlo simulation.

4. Monte Carlo Exercises

My course Web page includes spreadsheet-based Monte Carlo assignments that I have successfully used with undergraduate econometrics students. In addition, Kennedy (1998c) suggests numerous Monte Carlo exercises, most of which could easily be implemented using the spreadsheet framework described in this paper. With the first exercise, I typically provide a set of detailed instructions on how to conduct Monte Carlo simulation in Excel and a spreadsheet template similar to Figures 1 and 2 (see the course Web page).

In the introductory course, I generally focus on exercises that illustrate the effects of violations of the basic assumptions, the sampling distribution concept, and other fundamental principles. The objective of the first assignment, for instance, is to demonstrate the effects of an omitted variable when it is strongly correlated with the included variable(s) and when it is not. I provide students with a population regression model with two independent variables, complete with parameter values and an error distribution, and two different populations of explanatory variables. The explanatory variables are highly correlated in one population and much less so in the other (but I do not state this). Students are instructed to conduct a Monte Carlo experiment with each population estimating [[beta].sub.1] first with [x.sub.2] included and then with [x.sub.2] omitted (i.e., to purposely estimate an underspecified model). The students are then asked a series of questions regarding the properties of the OLS estimato r in each case. They are also asked to explain why the underspecified model yields an OLS estimator that is more biased in one experiment than in the other.

As they work through the first Monte Carlo exercise, most of my students quickly become comfortable with the necessary spreadsheet techniques, although I am often asked to debug their models (recall that nearly all of my students have had a prior course in spreadsheet modeling). And while the students have no problem with the idea that excluding a relevant variable causes OLS to be "wrong," except in special circumstances, their answers to the first assignment often indicate either a superficial or an incorrect understanding of basic principles. For instance, after completing the Monte Carlo exercise just described, students still

make comments such as, "the estimate is biased because the number is too small," "the estimate is more accurate when the standard error of the explanatory variable is smaller," or even, "it is important that the error term not be correlated with the y variable."

Thus, nearly all students need more time and experience to fully grasp the sampling distribution concept and other fundamental principles that I hope to illustrate through Monte Carlo simulation. It is therefore both beneficial and inexpensive at this point to assign Monte Carlo exercises to illustrate any econometric topic that deserves emphasis. This helps to drive home specific concepts and also deepens--or awakens--understanding of fundamental principles at the same time.

The second Monte Carlo assignment might be designed to demonstrate the effects of multicollinearity and how one can get an imprecise estimate of a regression parameter when [R.sup.2] is high and vice versa. This could be accomplished by asking students to compare and contrast Monte Carlo results from two scenarios: In the first case, specify a relatively large value for [[sigma].sup.2] along with a population with little collinearity among the independent variables; in the second case, specify the opposite conditions. Along with illustrating the implications of multicollinearity, this experiment helps students to understand the practical meaning of [R.sup.2], since they tend to become overly concerned with its magnitude, and also provides another opportunity to illustrate the sampling distribution concept.

While students with less spreadsheet experience may struggle with the techniques, it is important to note that the learning economies involved in developing Monte Carlo experiments in Excel are substantial. Once students have done one experiment, additional exercises require much less effort with respect to spreadsheet modeling; students are therefore better able to focus on the main purpose of the exercise. Thus, it seems that the marginal benefit is increasing while the marginal cost is decreasing over the first three or four exercises.

5. Conclusions

In his introductory text, Wooldridge (2000, p. iii) states that "Not only are undergraduates capable of learning about econometrics in a way that reflects modern practice, but this approach enhances their interest in and enjoyment of the subject." Making Monte Carlo experiments an integral part of the introductory course can help support this objective in several ways: first, developing an understanding of Monte Carlo methods leads to a deeper understanding of fundamental principles of econometrics and statistics; second, Monte Carlo experiments can help illustrate key econometric concepts and results; third, conducting Monte Carlo experiments gives students hands-on experience that is both instructive and enjoyable; and finally, experience with simulation gives beginning students an understanding of an important method of econometric research.

While the educational benefits of having students conduct their own Monte Carlo experiments are substantial, in the past the practicality of doing so has been limited because of the high opportunity cost involved in learning to program. More recently, however, it has become possible to conduct Monte Carlo experiments using standard spreadsheet software. Because most students have experience with spreadsheets, pick up new spreadsheet modeling techniques rather quickly, and generally find the spreadsheet environment to be intuitive and user friendly, the opportunity cost involved in having students conduct Monte Carlo experiments with spreadsheets is comparatively small. This paper has described a method for conducting meaningful Monte Carlo experiments with Excel and has discussed how it can be used as an effective teaching tool for introductory econometrics students.

(1.) The Excel formula RAND() generates a uniformly distributed pseudorandom number between 0.0 and 1.0 (technically. 0 [less than or equal to] RAND() < 0.999).

(2.) The Regression Analysis tool in Excel's Analysis ToolPak cannot be used here because it will not automatically recalculate the regression parameters during the resampling process.

(3.) Look op "array formula" or "=linest" in the Excel online help catalog for more information.

(4.) The process of resampling and analyzing the results can be greatly facilitated by commercial spreadsheet add-ins designed for this purpose, such as Crystal Ball or @Risk. My objective here is to show how to conduct a Monte Carlo simulation experiment using only the inherent capabilities of Excel.

(5.) To create and name the Results worksheet, (i) click the Insert menu and select Worksheet, (ii) click the Format menu and select Sheet-Rename, (iii) type the name of the new worksheet, and (iv) click anywhere on the new worksheet or press [Enter].

References

Albright, S. Christian, Wayne L. Winston, and Christopher J. Zappe. 2000. Managerial statistics. Pacific Grove, CA: Duxbury.

Cahill, Miles, and George Kosicki. 2000. Exploring economic models using Excel. Southern Economic Journal 66:770-92.

Cahill, Miles, and George Kosicki. 2001. A framework for developing spreadsheet applications in economics. Social Science Computer Review 19:186-200.

Gujarati, Damodar. 1999. Essentials of econometrics. Boston, MA: Irwin McGraw-Hill.

Holden, Craig W., and Kent L. Womack. 2000. Spreadsheet modeling in finance and investments courses. FEN Educator (online) 5(5). Available at: http://papers.ssm.com/sol3/papers.cfm?abstract id=241708.

Judge, Guy. 1999. Simple Monte Carlo studies on a spreadsheet. Computers in Higher Education Economics Review (online) 13(2). Available at: http://econltsn.ilrt.bris.ac.uk/cheer/ch13_2/ch13_2pl2.htm.

Kennedy, Peter E. 1998a. Teaching undergraduate econometrics: A suggestion for fundamental change. American Economic Review 88:487-91.

Kennedy, Peter E. 1998b. Using Monte Carlo studies for teaching econometrics. In Teaching undergraduate economics: Alternatives to chalk and talk, edited by W. E. Becker and M. Watts. Aldershot, UK: Edward Elgar pp. 141-59.

Kennedy, Peter E. 1998c. A guide to econometrics. 4th edition. Cambridge, MA: MIT Press.

Studenmund, A. H. 2001. Using econometrics: A practical guide. 4th edition. Boston, MA: Addison Wesley Lougman.

Wooldridge, Jeffrey M. 2000. Introductory econometrics. Cincinnati, OH: South-Western College Publishing.

R. Kim Craft *

* School of Business, Southern Utah University, Cedar City, UT 84720, USA; E-mail craftt@suu.edu.

The author appreciates helpful comments and suggestions from Joe Baker, David Tufte, and two anonymous referees. Remaining errors are the author's alone.