Raising schooling attainments by grouping pupils within each class.
Prais, S.J.
The object of this Note is to caution against accepting, at least in
the context of English schooling policy, conclusions drawn by a group of
Canadian educational researchers from their survey (a
'meta-analysis' - as they call it) of a mass of earlier
classroom studies which, they say, on average significantly favour - not
'whole-class teaching' - but dividing pupils within each class
into small groups according to their ability ('homogeneous
within-class ability-grouping'). Issues of this kind have for long
been of great concern to educational policy makers; in simplistic terms:
those more anxious to advance social egalitarianism have tended to
favour mixed-ability teaching of the whole class, while others - more
worried about academic (or 'cognitive') attainments - have
preferred some form of division of pupils according to 'general
ability' (in whatever way that may be ascertained) or according to
attainments in particular subjects. As we shall see, a correct approach
requires broader strategies in the organisation of teaching than implied
by that simple dichotomy.
The argument of the present Note is threefold: (a) the Canadian
researchers have seriously mis-summarised their findings; (b) the real
issue for the class teacher is not whether simply 'to group or not
to group within a class', but rather in what proportions to divide
the time of each lesson amongst a range of teaching styles - for
example, the teacher addressing and questioning the whole class,
pupils' individual deskwork, and pupils' working in groups;
(c) the real issue for those responsible for English primary school
organisational policy is whether there are other organisational features
of primary schooling - normally to be found in high-attaining
Continental European countries - that are more important, and which
should now gradually be introduced in order that proper egalitarian
objectives of schooling can be combined with higher cognitive
attainments (especially in relation to slower-developing children -
including summer-born children - who seem to be particularly
disadvantaged under current English schooling arrangements).(1)
Canadian conclusions
The new Canadian meta-analysis - a survey of about a hundred
comparisons carried out by previous researchers - concludes that
'the practice of within-class grouping is supported by the results
of this review', and that 'overall, the average achievement
effect-size was +0.17'.(2) Let us first explain the quantitative
significance of that improvement: by average 'effect-size' is
meant the average improvement in attainments in grouped classes as
compared with control (ungrouped) classes measured in units of the
standard deviation of attainments in the control group; in simpler terms
this implies, for example, that if there are 25 pupils in a hypothetical
class then, ranking them by their achievements, the achievement of the
13th pupil - the median pupil - would be expected to rise to that of the
15th pupil as a result of such within-class grouping (all this is to be
taken as 'roughly speaking', assuming a normal probability
distribution is an adequate approximation here).(3) As a single summary
measure of the consequence of such a change, the value +0.17 of course
necessarily conceals many possible variations; for example, the main
improvements may have taken place among top pupils, while low-attaining
pupils may have stood still or perhaps even fallen. The first caution
that needs to be voiced therefore is that more than a single measure of
the consequences of such changes is needed.
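The conversion from an effect-size to a rank within a class can be retraced with a short calculation. What follows is a minimal sketch, assuming (as the text does) that attainments are normally distributed, and assuming a convention chosen here purely for illustration: that the pupil ranked r from the bottom of a class of N sits at percentile (r - 0.5)/N. The function name is hypothetical.

```python
from statistics import NormalDist


def expected_rank(effect_size: float, class_size: int = 25) -> float:
    """Rank (1 = weakest) that the control-class median pupil would reach.

    Assumption of this sketch: the pupil ranked r from the bottom sits at
    percentile (r - 0.5) / class_size of a normal distribution.
    """
    percentile = NormalDist().cdf(effect_size)
    return percentile * class_size + 0.5


if __name__ == "__main__":
    # 0.0 is the ungrouped median; +0.17 is the average quoted in the text.
    for d in (0.0, 0.17):
        print(f"effect-size {d:+.2f}: rank {expected_rank(d):.1f} of 25")
```

On these assumptions an effect-size of +0.17 moves the median pupil from rank 13 to a little under rank 15, in line with the 'roughly speaking' figure given in the text.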
Secondly, we must ask whether an improvement of only two places in
the ranking of the average pupil is enough to be worthwhile, taking into
account possible adverse effects on some pupils (especially, as just
indicated, a possible lowering of self-esteem and demotivation in the
low-attaining group) and the increased burdens on teachers in attempting
to attend simultaneously to the needs of many distinct attainment-groups
within their class. Much depends on whether improvements of that size (a
rise of two places in the ranking of the median pupil) can be expected
to cumulate over time - in which case over a five-year period of primary
schooling (for example) the median pupil's rise might cumulate to a
more obviously worthwhile ten ranks - or whether they are
once-and-for-all improvements measured at the end of a long period of
schooling. A related question: were the underlying comparisons based on
comparing two long-established samples of classes, one sample with
long-established within-class grouping practices throughout the school,
and the other sample with long-established teaching practices without
grouping; or were the comparisons based on two samples of schools, both
with established practices of not grouping, and where there were
short-term experimental introductions of within-class groupings in one
of the samples? Or, perhaps, vice versa: did both samples have
long-established grouping practices, and one sample give up grouping for
an experimental period? And how much training of teachers took place to
prepare them for experimental changes? In the case of experimental
short-term changes, the direction of change could affect the direction
of bias resulting from teachers in the experimental class being
instructed more carefully in how to present their teaching material.
Unfortunately, these aspects were given little attention in the
Canadian meta-analysis. We are told only that distinctions were drawn in
that analysis between 'experimental treatments' lasting a
'medium' period of 4-16 weeks and treatments lasting shorter
or longer than this range; and, surprisingly in view of an expectation
of cumulation, 'duration of treatment . . . was not [found to be]
significantly related to the size of the effect' (p.444). The
typical underlying comparison probably related to quite a short
experimental period in the school-life of a pupil; and the direction of
the experimental change was probably towards within-class grouping,
rather than giving up an existing system of within-class grouping and
trying whole-class teaching. Some long-term cumulation of effects thus
still remains possible, but no clear evidence was gathered in the course
of this meta-analysis.(4)
Almost anything can happen
Much more worrying than the small size of the average improvement was
the extraordinary variability in outcomes of the individual underlying
studies. The average of +0.17 was taken from 103 comparisons, based on
51 studies, narrowed down from an original pool of over 3,000
published articles on within-class grouping that had been identified by
computerised searches. The effect-sizes estimated in those 103
comparisons varied from +1.52 to -1.96 (p.439); that is to say, in terms
of the above hypothetical example of a class of 25 pupils, within-class
grouping in some studies raised the attainments of the class's
median pupil to that of the second pupil from the top while, in other
studies, grouping lowered the median pupil to the attainment of the
weakest pupil but one!(5)
It has to be emphasised that each underlying study included
sufficient pupils to warrant a published article that had gained the
approval of the referees of an educational journal. And yet there was
this extraordinary variability - more correctly, inconsistency - in
findings. A calculated average in such circumstances provides little
basis for drawing implications for policy; it would have been more
important to state as an overriding conclusion that the underlying
comparisons only tell us that almost anything can happen -
extraordinarily good or disastrously bad - if all that is done is to
group pupils within a class, rather than not group them. The precision,
or margin of uncertainty, attached to each of the comparisons (the
sampling error of the effect-size) should also have been taken into
account in attempting to calculate an average.(6)
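The difference between weighting by number of pupils and weighting by precision (see note 6) can be illustrated with a minimal sketch. The two comparisons, their pupil numbers and their error-variances are hypothetical figures invented for the illustration, not values taken from the Canadian meta-analysis; the function names are likewise assumptions.

```python
from math import sqrt


def size_weighted_mean(effects, pupils):
    """Average effect-size weighted by the number of pupils in each comparison."""
    return sum(d * n for d, n in zip(effects, pupils)) / sum(pupils)


def precision_weighted_mean(effects, variances):
    """Average weighted by precision (inverse error-variance), with its standard error."""
    weights = [1.0 / v for v in variances]
    mean = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    return mean, sqrt(1.0 / sum(weights))


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    effects = [0.2, 0.4]
    pupils = [50, 50]
    variances = [0.01, 0.04]  # the second comparison is the less precise
    print(size_weighted_mean(effects, pupils))
    print(precision_weighted_mean(effects, variances))
```

With equal pupil numbers the size-weighted average sits midway between the two estimates, whereas the precision-weighted average is pulled towards the comparison with the smaller error-variance.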
In any event, the Canadian authors went on to carry out a number of
further summary analyses, classifying the underlying comparisons by
about a dozen possible contributory factors - though taken only one at a
time, rather than introduced simultaneously in a grand multivariate
analysis. We may here consider the implications of four of those
factors: age of pupil; whether pupils were grouped by ability
('homogeneous grouping') or whether they were put into
mixed-ability groups; whether pupils were affected differently according
to their position in the attainment-ranking within the class; and
whether the number of pupils in each group, and the number of such
groups within a class, made a difference to the outcome.
Age of pupil
The most beneficial effect on average emerged from comparisons
relating to classes for pupils aged approximately 10-12 years (in
'late elementary' grades 4-6); an average effect-size of +0.29
was reported, corresponding in our hypothetical class of 25 pupils to
raising the attainments of the median pupil by two and a half ranks. But
the difficulty remains as before - there is immense variability in
observed outcomes even at these propitious ages: from their published
summary it can be deduced that the central 95 per cent of underlying
comparisons ranged from classes where the median pupil rose by 5 or 6 ranks,
to others where the median pupil fell by that number of ranks.(7) Even in
that most favourable age-group a teacher adopting class-grouping is
engaged in a very worrying form of Russian roulette with her
pupils' prospective attainments!
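The deduction set out in note 7 can be retraced step by step. The sketch below follows the note's arithmetic as written: it multiplies the width of the published confidence interval by the square root of the number of comparisons and, as the note's figures imply, places that amount on either side of the average. The function names are hypothetical, and normality is assumed as in the note.

```python
from math import sqrt
from statistics import NormalDist


def spread_of_comparisons(mean, ci_low, ci_high, n):
    """Retrace note 7: scale the published CI width by sqrt(n), centred on the mean."""
    spread = sqrt(n) * (ci_high - ci_low)
    return mean - spread, mean + spread


def percentile(effect_size):
    """Percentile of the control-class distribution reached, assuming normality."""
    return 100 * NormalDist().cdf(effect_size)


if __name__ == "__main__":
    # Figures quoted in note 7: average +0.29, CI +0.24 to +0.35, 36 comparisons.
    low, high = spread_of_comparisons(0.29, 0.24, 0.35, 36)
    print(round(low, 2), round(high, 2))
    print(round(percentile(low)), round(percentile(high)))
```

Under the textbook relation, in which a 95 per cent confidence interval spans about 2 x 1.96 standard errors, the same inputs would imply a range roughly half as wide; the sketch reproduces the note's figures and does not adjudicate between the two readings.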
At secondary school ages, the estimated effect-sizes were similar to
the general average; at early primary school ages, the effect-size was
negligible (the median pupil would be expected to rise by only a third
of a rank) and, as elsewhere, highly variable.
Grouping by ability
Some of the underlying studies compared different types of
within-class grouping. In some classes pupils were grouped according to
their attainments/ability (for example, into high, middle and
'foundation' groups) with each group perhaps being allocated
work of different difficulty; while in other classes, pupils were put
into mixed-ability groups with each group undertaking work of
essentially similar difficulty. One of the benefits often expected from
the latter arrangement is that the brightest child in each group acts as
a kind of assistant or surrogate teacher, able to explain to other
children in a way that is perhaps even better than the teacher's;
it also provides an opportunity for children to develop skills in
group-working (how to receive and give help, how to divide a large
problem amongst the group). It may be that a 'bright child'
could have learnt more by spending his time in other ways - but we need
not go into that here. These two kinds of arrangement - homogeneous
versus mixed-ability grouping - were the subject of twenty comparisons
selected for the Canadian meta-analysis, and the results were averaged
according to pupils' ability-levels; regrettably, the results were
not at the same time compared with classes in which there was no
grouping at all (this presumably arose because of the limitations of
their computerised programmes).
The results of the various comparisons again showed immense
variability, so that no clear implication follows for the policy to be
adopted in an individual class. All that can be said is that the
comparisons suggest that on average the following results are to be
expected. Low-ability children perform worse if they are in a
homogeneous group than if they are put into a mixed-ability group
(effect-size of -0.60, in their table 10); on the other hand, pupils of
medium and high ability perform better in homogeneous ability groups
(effect-sizes of +0.51 and +0.09 respectively). There is therefore a
conflict between the kind of grouping that would best serve the
interests of low-ability children and the kind of grouping that would
best help other children.
The evidence to be gleaned from this meta-analysis on whether
low-ability children would perform better if they were not grouped at
all (rather than put into mixed-ability groups) is puzzling. An
effect-size of +0.37 is quoted (in their table 8) for low-ability
children put into unspecified types of groups rather than not grouped;
that would be consistent with the -0.60 (quoted in table 10) for
homogeneous versus mixed-ability group only if low-ability children in
mixed-ability groups did very much better than if left ungrouped. That
is of course possible; but other surveys have provided other findings.
Some suggest that low-ability homogeneous groups lose 'a great deal
of ground' since they do not have the stimulus or help of
higher-ability children in their group; others find the reverse, perhaps
because of better focused teaching for their needs. There seems to be
wider agreement that the class's spread of attainments grows, and
so would the disparity of attainments of the nation's
school-leavers, if schools in general were to follow this practice.(8)
It is a pity that the authors of this meta-analysis did not explore such
issues in greater depth.
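The consistency argument between the two published figures can be made explicit. The following is a minimal sketch under stated assumptions: that the effect-sizes in the two tables are on a common scale, that the +0.37 is a simple weighted average of the homogeneous and mixed-ability effects (each measured against no grouping), and that the share of comparisons using homogeneous groups, which is not reported in the text, can be set hypothetically.

```python
def implied_effects(overall, homog_vs_mixed, share_homogeneous):
    """Effects relative to no grouping implied by the two published averages.

    Assumes overall = p * h + (1 - p) * m and h - m = homog_vs_mixed, where
    h and m are the homogeneous and mixed-ability effects and p is the
    (hypothetical) share of comparisons using homogeneous groups.
    """
    mixed = overall - share_homogeneous * homog_vs_mixed
    homogeneous = mixed + homog_vs_mixed
    return homogeneous, mixed


if __name__ == "__main__":
    # +0.37 and -0.60 are the figures quoted in the text; the shares are hypothetical.
    for p in (0.0, 0.5, 1.0):
        h, m = implied_effects(0.37, -0.60, p)
        print(f"share {p:.1f}: homogeneous {h:+.2f}, mixed-ability {m:+.2f}")
```

On these assumptions, whatever share is chosen, the implied benefit to low-ability children of mixed-ability groups over no grouping lies between +0.37 and +0.97, which illustrates why the combination of the two figures is puzzling.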
One might have expected that low-ability children would develop best
if taught for a good part of the time on their own in suitable
circumstances - in small classes, with an appropriately trained and
experienced teacher, using finely graduated teaching material; the
results presented in this meta-analysis cannot be regarded as
inconsistent with that view.
Size of group and number of groups
Groups of 3-4 pupils appear from this meta-analysis to be the most
effective in raising attainments on average; but groups that are only
slightly larger, of 5-7 pupils, show negligible benefit (effect-sizes of
+0.22 and +0.02 respectively, reported in their table 6).(9) In terms of
the number of groups in a class, this research thus tells us that a
class of 25 children would require seven or eight separate groups of
children to be effective; while if there were only four groups, each
with six or so children in such a class, then we might expect
pupils' attainments to be unchanged. One can think of many reasons
why such curious statistical findings may have arisen (for example,
small groups of pupils were observed mainly in very small classes); but,
on the surface at least, it seems that this meta-analysis provides
support only for dividing an average English primary class into too many
groups to be manageable!
Black, white or grey?
Experiments contrasting the advantages of white as against black
suffer from a central problem - the real choice should often be between
light grey and dark grey. In our context: many teachers normally spend
part of their weekly lesson-time in some form of differentiated
group-work, and try to mix their teaching-styles to meet the varied
needs of their class; when teachers are asked to partake in such an
experiment, some will be asked to give up group-work completely, and
others to adopt it as completely as possible. Both types of classes move
to what their teachers regard as a less-preferred situation. A better
simple contrast might therefore be between teachers devoting, say, less
than a quarter of teaching time to this approach, and those devoting
more than three-quarters (or, say, one-tenth and nine-tenths). But that
more delicate issue of 'finding the right mix' lay outside the
province of this meta-analysis.
Where do we go from here?
Let us now take the issues nearer home. In England a government-sponsored experimental initiative for teaching mathematics in
primary schools was established in 1997 - the National Numeracy Project
- based on specifying more detailed syllabuses than were provided under
the National Curriculum of 1988 (and subsequent revisions) combined with
firm advice that, for the greater part of each lesson, the class is to
be divided into three groups according to pupils' attainments. On
the basis of the meta-analysis under review here (and there seems to
have been some reliance by the Government's advisory bodies on the
summary conclusions drawn by its authors) we would expect (a)
considerable variety in outcomes for classes; (b) that pupils with low
attainments would tend to suffer differentially; (c) that the size of
the groups in each class would be found too large for the teacher to
provide adequate help to individual pupils who need it; and (d) that the
disparity in attainments of the class as a whole would grow
cumulatively. In the longer run (over three or four years, rather than
just a term or two's work that has so far been assessed), we might
expect the rate of progress of such a class as a whole to be moderated
with the increasing disparity in pupils' attainments.
It seems to be universal schooling-experience that at some age in the
course of their schooling the increasing disparity in pupils'
attainments requires some division into 'parallel'
differentiated classes or sub-groups; at least that seems to be true in
subjects such as foreign languages and mathematics where learning is
built up in a clearly cumulative way. But at what age should that
division take place? Too early an age carries the disadvantage of
demotivating slow-developers who might otherwise catch up - with
suitable help - and reach attainments within the normal range; too late,
and the burden on the teacher arising from disparity within the class
becomes too great - with the result that the progress of the whole class
suffers and, often most of all, the progress of low-attaining pupils to
whom the teacher cannot give the time they need in order to keep pace.
The familiar compromise is to make the division on transfer from primary
to secondary schooling, at ages varying mostly from 10 to 13; but during
primary ages Continental schools adopt other organisational strategies
to help cope better with disparity, and in particular with helping
low-attainers. Those strategies undoubtedly are not wholly strange to
English teachers, but need now to be considered afresh in England in a
generation when, as a result of technological progress and automation,
the penalty attached to leaving school with failings in basic subjects
such as mathematics has grown significantly.
Other organisational strategies
It would take us beyond the province of this Note to do more than
outline very briefly the main organisational features of schooling
judged as important by English teachers and school inspectors following
a systematic programme of observations of Continental classes organised
in recent years by the National Institute, and thought worth contrasting
with practice in English schools.(10) The overriding Continental
emphasis on whole-class teaching in primary schools is there aided by
the following organisational features:
(a) Flexibility in age of entry to school, depending on a
child's rate of development and readiness for school (there is
normal flexibility on the Continent of 4-6 months at each end of the
year of birth, depending on the child's maturity and subject to
parental consent, in contrast to the English rigid twelve months'
period based on date of birth which governs each year of schooling).
(b) Additional teaching time for pupils with difficulties, whether
during that part of the lesson when the main body of the class is
engaged on individual deskwork exercises, or at other times as part of a
teacher's time-tabled normal hours of duty. This strategy might be
considered as coming close to 'within-class grouping': but the
Continental objective of taking the whole class forward together leads
to radically different teaching approaches and consequences. The English
objective is to accept, encourage and widen differences among pupils
within a class: the Continental objective is to narrow differences
within the class. Skipping a class, or repeating a class, is the rare -
but sometimes the more worthwhile - option for those pupils who have
clearly moved outside the limits which can be accommodated if the class
as a whole is to move forward efficiently.
(c) Greater clarity on the essential elements of each year's
syllabus, specified in relation to what the great majority (say,
four-fifths) of pupils should master within the year; those essentials
are to be well consolidated, so that next year's teacher knows on
what to rely. This may seem obvious; but it has to be understood as
fundamentally contrasting with the National Curriculum in England which
is intended to be highly elastic in respect of each class's
teaching, encompassing what is normally to be expected of an average
child up to 3-4 years younger or older. For example, in a class at age
11 (the final year of primary school), an English teacher has to extend
her teaching to cover pupils' attainments expected for average
children of age 8 to age 14 (National Curriculum levels 2-3 to 5-6 are
specified as normally covering 80 per cent of children at age 11; for
the remaining 20 per cent of children, the teacher is expected to teach
even outside that range).(11) Pupils in a Continental class of course
also vary in their capabilities (though not as much as in England,
partly because of the measures described here) and extension and
additional consolidation work may have to be provided for them, to some
extent mirroring the problem faced in an English class; but the
important difference of principle remains that the Continental teacher
is expected to do his best not to encroach on the next class's
work, whereas the English teacher is specifically expected to stretch
his better pupils to the higher levels specified for average older
pupils.
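The age-range implied by the National Curriculum levels in item (c) can be checked against the definitions recalled in note 11. The sketch below assumes a strictly linear relation (one Level per two years, anchored on Level 6 at age 15) and reads 'levels 2-3' and '5-6' as the midpoints 2.5 and 5.5; both are simplifying assumptions of this illustration, and the function name is hypothetical.

```python
def typical_age(level: float) -> float:
    """Age at which an average pupil is expected to reach a given Level.

    Linear sketch anchored on note 11: one Level per two years of teaching,
    with Level 6 expected of average 15 year-olds.
    """
    return 15 - 2 * (6 - level)


if __name__ == "__main__":
    # Midpoints 2.5 and 5.5 stand in for 'levels 2-3' and '5-6' (an assumption).
    for level in (2.5, 4, 5.5):
        print(f"Level {level}: average age {typical_age(level):g}")
```

On this reading Level 4 corresponds to age 11, and the band from levels 2-3 to 5-6 corresponds to the span from age 8 to age 14 quoted in the text.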
In brief: there must be grave worries that the current encouragement
by official educational circles in England of within-class grouping of
pupils according to their ability will serve to widen the disparity of
pupils' attainments and, more generally, to exacerbate English
schools' teaching problems in relation to slow-developing children,
with a consequential slowing of the rate of progress of the class as a
whole. It is unfortunate that a too hurried and uncritical reading of an
academic Canadian summary of research may have encouraged that move.
Better ways forward are likely to be found by encouraging organisational
reforms, on the lines (just listed) that are virtually universal in
high-attaining Continental countries, and which serve to encourage more
whole-class teaching and so bring the whole class forward together.
NOTES
(1) See the section on the long tail of under-achievement by English
pupils, in comparison with leading Western European countries, in Prais,
S.J. (1997), 'How did English schools and pupils really perform in
the 1995 international comparison in mathematics?', National Institute
Economic Review, April, 1997; an expanded version is to appear in Oxford
Studies in Comparative Education, 1998.
(2) The quotations are from the conclusions (p. 451) and abstract (p.
423) of 'Within-class grouping: a meta-analysis', by Yiping
Lou of Concordia University, Montreal, plus five co-authors (listed
unalphabetically), all Canadian educationists (four from Concordia, one
from Alberta), Review of Educational Research (University of
Wisconsin-Madison), 1996 (vol. 66, no. 4), pp. 423-58. The first-named
author is described as a PhD candidate in the department of the second
co-author, Professor P.C. Abrami, Director of the Centre for the Study
of Classroom Processes at Concordia. This article considerably extends
previous surveys on this topic by Slavin (1987) and by Kulik (1987,
1991; full references are in Lou et al.); it has received recent
favourable attention in British official educational circles and, since
it may affect English educational policy adversely, it deserves rather
fuller examination here than might otherwise seem necessary (on
punctuation: 'effect-size' has become a technical term among
educationalists; for clarity I have written it here with a hyphen throughout).
(3) Instead of the standard deviation of the control group, sometimes
an estimate of the pooled standard deviation of the control and
experimental groups is taken. If they do not differ significantly, such
a pooling may be unobjectionable. But if we suspect, for example, that
the experimental group has a higher standard deviation, then such a
pooling may amount to 'throwing away the baby with the
bathwater'.
(4) Putting children into groups for only a short period such as a
term or semester, and then returning them to their previous
organisational arrangements, cannot be expected to leave much effect;
this was noted in the course of a recent valuable research survey of
wider alternative methods of Setting and Streaming by W. Harlen and H.
Malcolm (Scottish Council for Research in Education, Edinburgh, 1997, p.
19) - though based on a study of 12 year-olds in the US as long ago as
1960. That research survey did not, however, consider the Continental
organisational strategies mentioned at the end of the present note.
(5) These extreme values are probably not untypical 'outliers', as appears from the more detailed analyses quoted
below relying on their published 95 per cent confidence interval.
Unfortunately, the 95 per cent confidence interval they published for
their overall effect-size of +0.17 (of +0.16 to +0.23; p. 439) was
subject to a misprint - as evident from its asymmetry - and Professor
Abrami has now kindly told me (in response to my query) that it should
read +0.14 to +0.21.
(6) Their average was calculated by weighting the estimated
effect-size of each comparison by the number of pupils involved; it
would have been more efficient to weight by the precision of each
estimate (the inverse of the error-variance), which would depend partly
on the number of pupils and partly on the closeness of the relationship.
(7) Based on their table 8, p. 443. Their summary gives only the
number of comparisons (n), the average effect-size, and the '95 per
cent CI', which is the confidence interval for that average; but
what the reader really needs is the original range within which 95 per
cent of the estimated effect-sizes lie. Elementary statistical theory
fortunately allows the reader to deduce this (at least approximately) by
multiplying the range of their published confidence interval by √n.
For example, the 95 per cent CI calculated by the authors
for the effect-size of +0.29 quoted in the text above is +0.24 to +0.35,
and is based on 36 comparisons; the 95 per cent range of the original
studies is thus √36 × (0.35 - 0.24) = 0.66, leading to a
95 per cent range in the original comparisons from -0.37 to +0.95. The
table of the Normal integral function shows that the latter correspond
to the 36th and 83rd percentiles, i.e. to the 9th and 21st pupils in a
class of 25.
(8) See, for example, another recent valuable research review by S.
Hallam and I. Toutounji, What Do We Know About the Grouping of Pupils by
Ability? (University of London Institute of Education, 1996), esp. pp.
8-9.
(9) There are some unfortunate typographical errors in that table, in
that the mean effect-size for 5-7 pupils was published as -0.02 instead
of +0.02 (the confidence interval published as '-0.02 to
-0.09' should read '-0.09 to +0.04'; I am grateful to
Professor Abrami for responding to my query on this).
(10) See, for example, Luxton, R. and Last, G. (1997),
'Under-achievement and pedagogy', National Institute
Discussion Paper no. 112, February (forthcoming in Teaching Mathematics
and its Applications, 1998); Bierhoff, H. (1996), 'Laying the
foundations of numeracy', Teaching Mathematics and its
Applications; and S.J. Prais (1997), 'School readiness, whole-class
teaching and pupil's mathematical attainments', Oxford Review
of Education.
(11) The current validity of this requirement (which is to be traced
to the National Curriculum specification of 1986, and previously to the
English legendary 'seven-year spread of attainments' at each
age) was recently confirmed by an OFSTED report which noted: 'it is
normal for attainment at [age 11] the end of Key Stage 2 to range over
three or four National Curriculum levels in the core subjects and to
cover work as high as Level 5 and sometimes Level 6' (Using Subject
Specialists to Promote High Standards at Key Stage 2, OFSTED, 1997, p.
3). It will be remembered that a National Curriculum 'Level'
is defined to correspond to two years of teaching; and that Level 6 is
expected for average 15 year-olds.