Building informal inference with TinkerPlots in a measurement context.
Watson, Jane ; Wright, Suzie
Several issues surround the continuing implementation of the Chance
and Data component of the mathematics curriculum in Australia. First is
its survival. Second, assuming it survives, is how far toward formal inference the curriculum should take students. Third is
what contexts are amenable for understanding the concepts in the
curriculum. Fourth is what tools are available to save time and assist
in the learning process. One of the ways of ensuring survival is to
convince decision makers that Chance and Data can be taught and learned
successfully.
As is probably true of other parts of the mathematics curriculum,
there is sometimes a tendency in Chance and Data to focus on small
components, without spending time to fit them into the overall picture
of handling data to answer questions and draw conclusions. Playing games
with dice might be fun but how does the activity lead to answering
meaningful questions? Finding the mean of a set of numbers might be good
practice in addition and division but what does it convey about the set
of numbers and how can it be useful in answering a question? Drawing a
graph might create an attractive, colourful picture but what story is
told about a data set, its variation and its clusters of values?
Statistics is about telling stories and answering questions based on
various types of data. For statisticians the questions involve
collecting samples from populations and drawing inferences about the
latter from the former, usually based on random selection. For school
students statistics is likely to be more what Tukey (1977) called
exploratory data analysis, perhaps answering questions limited to their
own experience on a known population from which a convenience, rather
than random, sample is drawn. One of the aims across the middle years of
school should be to provide students with a pathway for asking questions
about populations of which they see themselves as members. This
pathway is signposted with the techniques, such as finding middles,
drawing representations, and describing variation, which assist in
telling stories and answering questions. Associated with these
techniques there are now software packages that will ease the
computational burden and provide visual representations to make decision
making more intuitive than in the past. The packages can change the
focus from performing computations to interpreting and explaining.
Experiencing this process is part of informal inference, which will lay
the foundation for formal inference in later years.
This article uses a familiar setting to explore the issues
associated with developing ideas of informal inference and introduces
the software package, TinkerPlots (Konold & Miller, 2005), as a tool
to facilitate this development. Those wishing to follow up on
information about TinkerPlots can download a trial version at
www.keypress.com. An evaluation of TinkerPlots as an educational data
analysis package is provided by Fitzallen (2007). The value of one
representation provided in TinkerPlots, the hat plot, is explored in
detail by Watson, Fitzallen, Wilson, and Creed (in press).
The activities suggested in this article are intended for use with
middle and secondary students (grades 6 to 10). It is acknowledged,
however, that teachers in a school might need to work together to gain
an appreciation of the expected development of understanding and plan
for the background and level of the students they teach. The data and
suggestions presented here have arisen mainly from workshops with
inservice middle school teachers and preservice primary teachers, and
hence may provide models for similar sessions, as well as for activities
in the classroom. Examples of student work from grade 7 are also
included.
The context chosen for the investigations is body measurement.
Activities based on measuring hand span, foot length, arm span, and
height have been described by others (e.g., Clarke, 1996; Lovitt &
Clarke, 1992) and the famous drawing by Leonardo da Vinci of the
Vitruvian Man is often used as a motivation for asking a question about
arm span equalling height. The recent Australian Bureau of Statistics CensusAtSchool survey asked students for measurements of the height of
their belly button from the floor, the length of their right foot, and
their total height; these measurements hence provide an excellent data
base from which random samples can be collected ("2006
CensusAtSchool Questionnaire", 2006).
When planning a unit of work that aims to develop ideas associated
with informal inference, the starting point and questions need to be
considered carefully. Some mathematics educators, for example, would
suggest beginning with the da Vinci drawing and asking a question about
the population at large: Do you think it is true for all the people in
the world that their arm span lengths are equal to their heights?
Discussion would evolve into how this question could be answered, with
suggestions about appropriate kinds of data collection. Most students
will be interested in checking themselves and collecting data from their
classmates. Most high school teachers would assume that collecting these
data will lead to the production of a scatterplot with arm span measured
on one axis and height on the other. Jumping straight into this type of
investigation may be appropriate for students with some previous
experience in data handling and graphing (perhaps in grade 9 and above)
but for younger students it seems more appropriate to begin with a less
complex scenario in terms of the data handling expectations. Thus, even
though the question that begins with a population is quite easy to
understand, the techniques required to provide an answer may be
relatively sophisticated.
A less demanding approach for students may be to start with a
measurement activity, asking how accurately the arm span of a particular
member of the class (or the teacher) can be measured (Konold &
Pollatsek, 2002; Shaughnessy, 2006). In this way, questions of accuracy
and variation can be introduced: What does it mean to make an accurate
measurement? What variation can we expect in a measurement? Why is
accuracy important? These questions can be considered in a
narrow classroom context, such as the one suggested here, or expanded to
consider wider social or scientific contexts. From this initial investigation,
students can be encouraged to think about what a typical arm span
measurement for a particular age or grade level might be, before
comparing the arm span measurements of two groups, such as boys and
girls. Finally, students can investigate the association of two
variables, arm span and height, and draw conclusions from this. The
questions discussed in the previous paragraph about a larger population
can now be explored with more confidence. The following four
investigations provide examples of how such a unit might proceed.
Investigation 1: Measuring accurately
Four questions similar to the following can be used to begin an
investigation of accuracy in measurement.
1. What does it mean to make an accurate measurement?
2. What variation can be expected if a measurement is repeated?
3. Why is accuracy important?
4. How confident can we be that we have the "true"
measurement?
Although these questions appear to be about measuring, not
statistics, statistics can be used to help answer them. The following
steps in the investigation provide starting points for teachers to adapt
for their classes.
Setting the question
How accurately can the arm span of a person be measured? What
method should be used? What would be a reasonable estimate?
Discussion of various methods of measuring is likely to be a good
place to begin to answer the question. Why might more than one
measurement be needed? All students in the class can contribute by
suggesting how they would make the measurement. Would students expect
all measurements to be the same? Issues might include whether a person
would stand or lie on the floor, what instruments would be used to make
the measurements, and what accuracy of measurement should be recorded.
Data collection (interval data)
Each person measures the arm span of a single selected person (say,
with arms spread out, to nearest 0.5 cm). Discussion can focus on how
many measurements would be needed for a good estimate of the actual
value.
[FIGURE 1 OMITTED]
Representing data
The collected data can be listed and ordered in a table similar to
the one in Table 1, and initial discussion can be based on values observed in
the table. What is the largest value measured? What is the smallest
value? Are any values repeated? Students can enter the data on
TinkerPlots data cards and create a graphical representation for the
measurements. Figure 1 displays an example of a stacked dot plot.
Summarising data
Using the tools available in TinkerPlots, students can mark the
mean, median, mode, and range on the stacked dot plot. Any interesting
features can then be discussed. Are any of the averages the same? Are there any
outliers? Can they be explained? In Figure 1, for example, the values of
186 cm and 187 cm were recorded by one person who used a shorter ruler
than the others and by another person who measured "over
Nathan's body" rather than flat on the floor under him.
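For teachers who would like to check the summary values outside
TinkerPlots, the following is a minimal Python sketch. The measurements
are invented for illustration (they are not the Table 1 data), and the
185 cm cut-off used to set aside possible outliers is an assumption made
only for this example.

```python
from statistics import mean, median, multimode

# Hypothetical repeated measurements (cm) of one person's arm span.
# These values are invented for illustration; they are not the Table 1 data.
measurements = [181.0, 181.5, 182.0, 182.0, 182.0, 182.5, 183.0, 186.0, 187.0]


def summarise(values):
    """Return the mean, median, mode(s), and range of the measurements."""
    return {
        "mean": round(mean(values), 1),
        "median": median(values),
        "modes": multimode(values),
        "range": max(values) - min(values),
    }


# Summaries with every value included.
with_outliers = summarise(measurements)

# Summaries after setting aside values the class judges to be outliers.
# The 185 cm cut-off is an assumption made only for this example.
without_outliers = summarise([m for m in measurements if m < 185.0])

print(with_outliers)
print(without_outliers)
```

Comparing the two summaries shows which values respond to the unusual
measurements: here the mean and range change while the median and mode
do not.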
Constructing a hat plot in TinkerPlots is often helpful in
summarising a data set. A default hat plot covers the middle 50% of the
data values under its crown and the bottom and top 25% under its brims.
Does the hat plot help describe the spread of the data? Figure 2 shows a
hat plot for the plot in Figure 1 with the data remaining visible. How
does the graph help answer the question about how accurately the arm
span can be measured? What is the best estimate of the selected
person's arm span length from the data collected? Looking at the
crown of the hat should help narrow the value of the estimated arm span
without forcing the choice of a single value, such as the mean or
median.
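The boundaries of a default hat plot, as described above, can be
approximated with quartiles. The sketch below uses invented measurements
and Python's "inclusive" quartile method; the exact rule TinkerPlots
uses to place the edges of the crown is not stated here, so small
differences from the package's display are possible.

```python
from statistics import quantiles

# Hypothetical repeated measurements (cm); not the Table 1 data.
measurements = [181.0, 181.5, 182.0, 182.0, 182.0, 182.5, 183.0, 186.0, 187.0]


def hat_plot(values):
    """Return the intervals covered by the brims and crown of a hat plot.

    The crown spans the middle 50% of the values (first to third quartile);
    each brim spans the remaining 25% at either end.
    """
    q1, _q2, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "lower_brim": (min(values), q1),
        "crown": (q1, q3),
        "upper_brim": (q3, max(values)),
    }


hat = hat_plot(measurements)
print(hat)
```

For these invented values the crown is only 1 cm wide while the upper
brim stretches 4 cm, which is the kind of contrast students can use when
proposing a range for the "best" estimate.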
[FIGURE 2 OMITTED]
Chance and sampling questions
How could a better estimate of the selected person's arm span
be obtained? Discussion could focus, for example, on selecting another
sample of measures, collecting a larger sample of measurements, or using
a more consistent measuring instrument or technique.
Ways of randomly choosing a sample of measurements for this problem
could be discussed. Would the same data set, mean, median or mode be
obtained each time? How chance selection of a sample from a much larger
set of measurements might affect the mean, or other values, could be an
interesting topic of discussion.
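One way to make the effect of chance selection concrete is to simulate
it. The following sketch is an illustration only: the pool of 40
measurements is generated artificially (centred on 182 cm with an
assumed spread), and the sample size of 10 and the five repeats are
arbitrary choices.

```python
import random
from statistics import mean

# Fixed seed so the example is repeatable; any seed could be used.
rng = random.Random(2007)

# Hypothetical pool of 40 measurements (cm), recorded to the nearest 0.5 cm.
pool = [round(2 * rng.gauss(182.0, 1.5)) / 2 for _ in range(40)]


def sample_means(values, sample_size, repeats, rng):
    """Mean of each of `repeats` random samples drawn without replacement."""
    return [
        round(mean(rng.sample(values, sample_size)), 2)
        for _ in range(repeats)
    ]


means = sample_means(pool, sample_size=10, repeats=5, rng=rng)
print(means)
```

Running the sketch with different seeds or sample sizes lets students
see that the sample mean changes from sample to sample, and that larger
samples tend to give means that vary less.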
Drawing a conclusion
Students should finally write a summary report, including all of
the assumptions made, to explain how accurately the group measured the
arm span of a single person and what the best estimate is. Decisions
about the potential outliers and their inclusion in or exclusion from
the analysis need to be included in the report. Suggestions for further
investigation are also valuable to include. This report can be written
in a text box in TinkerPlots to include with the plots created or the
plots can be copied and pasted into Word documents. An
"informal" inference reached should include a "best"
estimate for the person's arm span, perhaps expressed as a range
and with some statement about the degree of confidence with which the
estimate is made.
Investigation 2: Measuring arm spans of a group
A natural progression from Investigation 1 is to consider the
typical arm span of a class of students, or of students of a certain
age. Students should have a feel for the accuracy of their individual
measurements (and may want to take several measurements and average them
in some way). The following steps suggest a possible pathway.
Setting the question
What is the typical arm span measurement of grade X students?
Data collection (interval data)
Students should discuss how their particular class can contribute
to answering the rather general question. After discussing and deciding
on a method of measurement, the next issue is how many measurements
would be needed for a good estimate of the typical arm span length for
the class. All students then have their arm spans measured (say, with
arms spread out, to nearest 0.5 cm).
Representing data
The next step is to create a table similar to Table 1 for students
to record their data (or add to the previous data set, perhaps as
"My_armspan"). This information can be entered into
TinkerPlots by each student and representations created for the class
data. Students should be given freedom to create their own preferred
graphical form. Again there may be occasion to discuss outliers if there
are some very unusual measurements recorded.
Summarising data
Students should be asked to fill in a text box in TinkerPlots
summarising what their representation tells them about the typical arm
span of their class. This may involve discussion of the mean, median,
mode, or range as found on their graphs. They may use the hat plot to
discuss the shape of the data and the variation in the data. Figure 3
shows an example. If Investigation 1 has preceded Investigation 2, a
discussion point would be the difference in the variation shown in the
two stacked dot plots in Figures 2 and 3. Why would the second be
expected to show more variation?
[FIGURE 3 OMITTED]
Chance
Students should discuss ways of randomly choosing a sample for
answering this question. Perhaps there are other grade X classes in the
school that could be measured. How would chance and a different sample
affect the mean, the median, the variation, and the shape of the hat
plot?
Drawing a conclusion
Students should then write a report, complete with graphs,
including all of the assumptions made, to explain how the class arrived
at its estimate of a typical arm span length for grade X students and to
indicate its degree of confidence in the estimate.
Investigation 3: Comparing measurements on two groups
A natural extension of Investigation 2 is to ask a question that
compares two groups, perhaps boys and girls, or students in different
grades. Questions to consider might be: Do boys have greater arm spans
than girls? Do students across the middle years have increasing arm
spans with higher grades? To make formal inferences about these
questions for a state or country would require random samples and
advanced techniques but much can be learned about the processes in the
informal inference arena.
The data collection and representation tasks would be similar to
Investigations 1 and 2. As an example, Figure 4 shows a portion of a
TinkerPlots table with data from 58 students in grades 5 to 8 with
gender also included. Two interesting comparisons are possible from this
data set. Figure 5 shows the stacked dot plots for the boys and girls in
the middle years, whereas Figure 6 shows the stacked dot plots for the
grades.
[FIGURE 4 OMITTED]
[FIGURE 5 OMITTED]
For this data set some very interesting observations about
differences in variation as well as typical arm span can be made. The
students in this school, for example, concluded that the variation in
arm spans of boys in the middle school was greater than the variation in
the arm spans of girls in the same grades. They also concluded that arm
span increased from grade 5 to grade 6 and from grade 6 to grade 7, but
then levelled off, probably related to growth spurts up to grade 7.
Including hat plots in the graphical representations further enhances
the discussion of "middles" and spread. For this school as the
population, the students could make definitive statements about the data
sets, to answer the questions and speculate about causes; but for a
larger population, they would have to reach informal inferences and
acknowledge uncertainty.
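The comparison of typical value and variation across groups can also be
sketched in a few lines of Python. The records below are invented and
are not the data from the 58 students; they were chosen only to show one
group with a wider spread than the other.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (group, arm span in cm) records; not the school's data.
records = [
    ("boy", 140.0), ("boy", 148.5), ("boy", 155.0), ("boy", 162.0), ("boy", 171.0),
    ("girl", 146.0), ("girl", 150.0), ("girl", 152.5), ("girl", 156.0), ("girl", 159.0),
]


def group_summary(records):
    """Return the median and range of arm span for each group label."""
    groups = defaultdict(list)
    for label, span in records:
        groups[label].append(span)
    return {
        label: {"median": median(spans), "range": max(spans) - min(spans)}
        for label, spans in groups.items()
    }


summary = group_summary(records)
print(summary)
```

The same function works for grade labels in place of gender, so the two
comparisons in Figures 5 and 6 correspond to two calls with different
grouping attributes.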
[FIGURE 6 OMITTED]
Investigation 4: Comparing measurements on two variables
An extension to Investigation 3 is to ask a question about the
association between two variables within the same group, namely arm span
length and height. In this investigation, students can be introduced (or
reintroduced) to da Vinci's Vitruvian Man and asked to consider the
questions: Is there an association between people's arm spans and
their heights? Are they the same or nearly the same? Is there a
"cause" of the association?
Data collection can involve students measuring their heights (say,
with shoes off, to nearest 0.5 cm) and their arm spans (if not measured
before). These data need to be recorded, perhaps on a whiteboard or
worksheet, in a manner that makes it easy for students to enter them
into new data cards in TinkerPlots (or into the data cards from
Investigation 1).
[FIGURE 7 OMITTED]
If they have the appropriate background, students can then produce
association graphs such as the one represented in Figure 7, with height
on one axis and arm span length on the other. Older students can also
use a calculator and see if there is a significant correlation between
the two attributes.
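For those without a statistical calculator, the correlation coefficient
can be computed directly from its definition. The sketch below uses
invented heights and arm spans and calculates Pearson's r only; it does
not carry out a significance test, which would need further assumptions
about how the sample was drawn.

```python
from math import sqrt

# Hypothetical paired measurements (cm); not the data shown in Figure 7.
heights = [150.0, 155.0, 160.0, 165.0, 170.0]
arm_spans = [149.0, 156.0, 159.0, 166.0, 171.0]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(heights, arm_spans)
print(round(r, 3))
```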
[FIGURE 8 OMITTED]
Summarising the data may involve finding the mean of the data on
each axis and discussing any outliers. Using the drawing tool in
TinkerPlots, it is possible to draw a "line of best fit"
showing the association between height and arm span (an example of this
is shown in Figure 8).
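A line drawn by eye with the drawing tool can be compared with a
calculated one. The sketch below fits a least-squares line, which is a
different technique from the hand-drawn line described above and is
offered only as a point of comparison; the data are invented.

```python
# Hypothetical paired measurements (cm); not the data shown in Figure 8.
heights = [150.0, 155.0, 160.0, 165.0, 170.0]
arm_spans = [149.0, 156.0, 159.0, 166.0, 171.0]


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = least_squares_line(heights, arm_spans)
print(round(slope, 2), round(intercept, 1))
```

A slope near 1 together with points lying close to the line would
support the claim that arm span and height are nearly the same; the
fitted line always passes through the point formed by the two means.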
Younger students who may not be familiar with scattergraphs may
suggest subtracting arm span length from height to see if the results
are zero or close to zero. This can be easily done using a special
formula in TinkerPlots that provides the difference between the two
attributes. Figure 9 contains the formula box showing how this can be
achieved. The difference can then be represented graphically, as in
Figure 10. Of interest in Figure 10 is how many differences
(My_Height-My_Armspan) are equal to zero, positive, or negative. What is
a reasonable difference from zero that students could observe and still
be able to say that the two measurements are "roughly the
same"? The hat plot might be useful here. There is no definitive
answer and again it might be necessary to check for (and explain)
outliers.
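The subtraction approach can be mirrored in Python as a check on the
TinkerPlots formula. In this sketch the paired values are invented, and
the 1.5 cm tolerance for "roughly the same" is an assumption chosen only
for illustration, since there is no definitive answer.

```python
# Hypothetical (My_Height, My_Armspan) pairs in cm; not the article's data.
pairs = [(150.0, 149.0), (155.0, 156.0), (160.0, 160.0), (165.0, 166.5), (170.0, 168.0)]


def classify_differences(pairs, tolerance):
    """Count differences (height - arm span) that are zero, positive, negative,
    and within `tolerance` cm of zero."""
    differences = [height - span for height, span in pairs]
    return {
        "differences": differences,
        "zero": sum(d == 0 for d in differences),
        "positive": sum(d > 0 for d in differences),
        "negative": sum(d < 0 for d in differences),
        "roughly_same": sum(abs(d) <= tolerance for d in differences),
    }


result = classify_differences(pairs, tolerance=1.5)
print(result)
```

Changing the tolerance and watching the "roughly_same" count change is a
simple way to show students that the conclusion depends on a choice they
must justify.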
[FIGURE 9 OMITTED]
Other students, perhaps at an age between those who would subtract
and those who would draw a scattergraph, might suggest dividing one
measurement by the other and seeing how close the ratios are to one.
This idea is of course related to how close the points on a scattergraph
lie to the straight line drawn at 45° from the origin. Figure 11
shows the formula box and what the associated graph would look like for
the preservice teachers' data.
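The ratio approach can be sketched in the same way. The paired values
are invented, and the tolerance of 0.01 (a ratio within 1% of one) is an
assumption made for this example rather than a recommended threshold.

```python
# Hypothetical (height, arm span) pairs in cm; not the preservice teachers' data.
pairs = [(150.0, 149.0), (155.0, 156.0), (160.0, 160.0), (165.0, 166.5), (170.0, 168.0)]


def ratios_near_one(pairs, tolerance):
    """Return each height/arm-span ratio and whether it lies within
    `tolerance` of one."""
    ratios = [height / span for height, span in pairs]
    close = [abs(ratio - 1.0) <= tolerance for ratio in ratios]
    return ratios, close


ratios, close = ratios_near_one(pairs, tolerance=0.01)
print([round(ratio, 3) for ratio in ratios])
print(close)
```

A ratio of exactly one corresponds to a point on the 45° line of a
scattergraph with equal scales, so the ratios and the scattergraph are
two views of the same comparison.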
Using a TinkerPlots text box, students can write a report, setting
the context for the question, answering the question, and explaining how
the analysis was carried out for their inferences. They can also
speculate on the "cause" of this association, being careful to
use probabilistic rather than declarative language. It would be
interesting in a class if different groups of students presented these
three representations (or others) and their associated arguments to
answer the question. Students could discuss which was the most
convincing.
[FIGURE 10 OMITTED]
[FIGURE 11 OMITTED]
Conclusion
The purpose of this article has been to motivate teachers to
present their students with meaningful investigations that lead to an
appreciation of the types of questions that informal inference can help
to answer. The chance and data curriculum is about much more than
finding averages and drawing graphs. All three averages found in
curriculum documents--mean, median and mode--can be illustrated in these
investigations. In Figure 1, for example, the median and the mode are
both 182, whereas the mean is 182.6 with the two potential outliers
included and 182 without them. These observations
should not, however, be the only focus of Investigation 1: variation
observed, reasons for it, and consequent qualified statements about
accuracy are essential to a meaningful report. Although these
investigations could be carried out without the use of TinkerPlots, the
package can save time and add creativity and student ownership to the
production of evidence and the creation of a final report answering the
initial questions.
Acknowledgements
This article was written in association with Australian Research
Council Linkage Grant No. LP0560543. Key Curriculum Press, publisher of
TinkerPlots, provided the software to all schools in the project.
References
2006 CensusAtSchool Questionnaire. (2006). Canberra, ACT:
Australian Bureau of Statistics. Retrieved 9 October 2007, from
http://www.abs.gov.au/websitedbs/CaSHome.nsf/Home/Students+Area
Clarke, D. M. (1996). The case of the mystery bone: A unit of work
on measurement for grades 5 to 8. North Ryde, NSW: The Mathematical
Association of New South Wales, Inc.
Fitzallen, N. (2007). Evaluating data analysis software: The case
of TinkerPlots. Australian Primary Mathematics Classroom, 12(1), 23-28.
Konold, C. & Miller, C. D. (2005). TinkerPlots: Dynamic data
exploration. Emeryville, CA: Key Curriculum Press.
Konold, C. & Pollatsek, A. (2002). Data analysis as the search
for signals in noisy processes. Journal for Research in Mathematics
Education, 33(4), 259-289.
Lovitt, C. & Clarke, D. (1992). MCTP Activity bank volume 2.
Carlton, Vic.: Curriculum Corporation.
Shaughnessy, J. M. (2006). Student work and student thinking: An
invaluable source for teaching and research. In A. Rossman & B.
Chance (Eds.), Proceedings of the Seventh International Conference on
Teaching Statistics: Working cooperatively in statistics education
[CD-ROM]. Voorburg, The Netherlands: International Association for
Statistical Education and the International Statistical Institute.
Tukey, J. W. (1977). Exploratory data analysis. Reading, MA:
Addison-Wesley.
Watson, J., Fitzallen, N., Wilson, K. & Creed, J. (in press).
The representational value of hats. Mathematics Teaching in the Middle
School.
Jane Watson & Suzie Wright
University of Tasmania
<jane.watson@utas.edu.au>
[TABLE 1 OMITTED]