Article information

Title: The measurement of student achievement in music using a Rasch measurement model.
Author: Waugh, Russell F.
Journal: Australian Journal of Education
Print ISSN: 0004-9441
Publication year: 2001
Issue: April
Language: English
Publisher: Sage Publications, Inc.
Keywords: Academic achievement; Music

The measurement of student achievement in music using a Rasch measurement model.

Waugh, Russell F.

This paper explains the making of an objective measure of music achievement for students in Western Australian government schools. Forty-five music achievement tasks were developed to reflect exemplary classroom practice for three levels of music achievement. The tasks included analysis and process type questions relating to listening and appreciating, identifying music aspects and performance, with some linked tasks across levels, to enable the tasks (items) for the three levels to be calibrated on the same continuum. The sample consisted of students from Years 3, 7, and 10. The tasks were placed onto a continuum of student achievement which was matched to a standards framework based on Student outcome statements: The Arts. A Rasch measurement model was used to create a music achievement scale and transform student raw scores into achievement estimates and item difficulties on the same scale, with a computer program called RUMM.

Introduction

The need to gather information about the effectiveness of education in `the arts' has been emphasised by the current push for accountability in education and recognition of the arts as one of the eight compulsory learning areas in the Western Australian K-10 curriculum. The generic title, `the arts', subsumes the disciplines of dance, drama, media, music and the visual arts. In Western Australia, it is intended that, during the primary school years, students will have the opportunity to experience several art forms and develop broadly based achievements in each discipline, with a view to specialisation in particular art forms at secondary school (Education Department of Western Australia, 1994, p.2). The present study, within a climate of educational accountability and a wider offering of the arts in Western Australian schools, focuses on the measurement of achievement in one aspect of the arts, namely music education.

The recognition of the arts as one of the important learning areas in education systems, as evidenced in initiatives such as the National Curriculum (Department of Education and Science, 1989), the American National Standards (Consortium of National Arts Education Associations, 1994), The arts--a curriculum profile for Australian schools (Curriculum Corporation, 1994a) and the Western Australian Student outcome statements (Education Department of Western Australia, 1996), reflects a trend towards wider recognition, within schools, of its importance in the development of the `whole person'. Arts educators involved in the writing of the Australian and Western Australian documents received strong support for the central role of the arts in school curricula in response to the draft versions of the documents (Emery, 1994, p.6). This support and recognition of the importance of the arts, together with an emphasis on accountability in schools, have led to an increased awareness of the necessity to evaluate student achievement in music objectively.

Although teachers regularly use methods of observation, checklists and anecdotal records within the classroom, the most common form of formal assessment used to establish levels, or compare students with the rest of the population, is test data that comprise formal gathering of information involving a structured situation, in which performance is assessed under standard conditions. This form of assessment is usually a requirement of entry into special educational courses or tertiary institutions and successful achievement in formal assessment is often a requirement of employers (Griffin, 1991, p.13). In learning areas that have been regarded as the core subjects such as mathematics and English, schools regularly use this type of formal testing to establish student grades or levels and, indeed in the area of music, formal testing of performance is commonplace. This testing in Western Australian schools, however, has been confined to the playing of set pieces and identifying students' knowledge of the musical elements, such as rhythm, melody, harmony, texture, and notation, and there has been no obvious attempt to gather information on students' creativity skills or their knowledge in the areas of aesthetics, criticism, or past and present contexts. The absence of an attempt to assess these skills is probably due to the difficulties involved in designing assessment instruments in these areas, and to the difficulties in reaching consensus as to how levels can be identified. Student outcome statements: The arts (Education Department of Western Australia, 1996) have now provided a framework for levels of development in the arts and, consequently, the opportunity to develop some methods of assessment in music learning is taken up in this study.

For the purposes of this study, the term `assessment' refers to the overall process of making analytical judgements, the term `evaluation' refers to the process of determining the extent to which individuals or groups possess certain skills, knowledge or abilities, the term `measurement' refers to the collecting of quantitative information to form a unidimensional scale and the term `test' refers to the use of a series of questions or activities to measure the skills, knowledge or abilities of individuals or groups (Lehman, 1996, p. 1).

Problems with current arts measurement

Five aspects of current assessment practices in music are called into question. First, current assessment practices do not indicate expertise in music; typical teacher assessments include comments such as `participates enthusiastically', `enjoys music' and `attends practice regularly'. These types of assessment help to maintain the view of music as a `frill' or extracurricular area and not as a `real' subject (Carlton, 1987, p.45; Gordon, 1992, p.24; Jorgensen, 1994, p.26; Kemp & Freeman, 1988, p.21; Lehman, 1996, p.6). Primary school reports to parents traditionally have placed undue emphasis on non-curricular factors, rather than on the skills and abilities of students. Second, there are no standards or objective measures that teachers can use across or between schools. Third, current practices do not provide a basis for teachers and schools to monitor student changes in music achievement. Fourth, current practices do not help make teachers accountable for helping students improve their music achievement. Fifth, assessment has not been carried out with interval level scales. Modern measurement programs are now available to create interval level scales where music achievement and item difficulties are calibrated on the same scale.

The present study examines the problem of a lack of reliable and systematic methodology for evaluating progress in music achievement in schools. It attempts to do this by developing an innovative range of authentic assessment tasks appropriate for use at system, school or classroom level, so that meaningful reporting of student outcomes in music can occur. For the purposes of this study, the term `authentic' describes assessment tasks that reflect exemplary classroom practice. The assessment tasks reflect good teaching and good assessment practice in classroom music. The skills and understandings identified in the authentic tasks are placed onto a continuum of students' skills which are matched to a standards framework based on Student outcome statements: The arts (Education Department of Western Australia, 1996).

Achievement tasks were developed for students in Year 3 (aged 8), Year 7 (aged 12) and Year 10 (aged 15). The reason for selecting these three levels is that they represent three significant stages of students' compulsory schooling--the conclusion of junior primary school, the conclusion of primary school and the conclusion of the compulsory years of education. Themes and stimulus material were linked across year levels and it will be possible in the future to develop assessment materials for students between these levels, using adaptations of the materials in the present study.

The knowledge, skills and abilities of students in the discipline of music were measured using the Extended Logistic Model of Rasch (Andrich, 1988a, 1988b) and were related to the Education Department of Western Australia (1994c) Student outcome statements: The arts as a framework for assessment. Links were made across age levels in an attempt to map progress of skills development along a developmental continuum.

Aims of the study

The aims of this study are to:

1 develop a music assessment instrument, incorporating both analysis and process aspects, appropriate for each of Year 3 (8 year olds), Year 7 (12 year olds) and Year 10 (15 year olds) in Western Australia;

2 show patterns of development from Year 3 through Year 7 to Year 10 by including common or `link' items in the instrument;

3 trial the music assessment instrument and generate marking keys based on data gathered at Western Australian schools;

4 mark the items and analyse the data using the Extended Logistic Model of Rasch so that the student measures of music achievement and the item difficulties are calibrated on the same interval level scale;

5 match the Music Achievement Scale to student outcome levels and determine cut-off points between levels;

6 analyse the data to provide state means for Year 3, Year 7 and Year 10 to provide teachers with comparisons of student performance;

7 analyse the data to provide comparative information on the performance of sub-groups; and

8 develop student profiles to provide teachers with descriptions of performance.

The Analysis task consisted of a set of stimulus materials to which students responded, primarily in relation to the Appreciating strands of `Responding, reflecting and evaluating', and `Understanding the role of the arts in society'. Students produced responses in relation to aesthetics, critical analysis, interpretation of meaning and music concepts, such as beat, rhythm, melody, dynamics, shape, mood and tension. Developmental processes involved comparisons and contrasts and the exploration of critical and contextual understanding focusing on particular periods of music history. Where possible, tasks were open-ended in order to provide students with the opportunity to demonstrate their maximum levels of ability. The Analysis task was designed to cover a time duration of approximately one lesson period at the appropriate level; that is, approximately 45 minutes at Year 3, 50 minutes at Year 7 and 60 minutes at Year 10.

A multi-media CD Rom version of the Year 3 Analysis task was designed and developed in consultation with a teacher colleague, who is not only an experienced Year 3 teacher but who is also a producer of educational computer software. The CD Rom was developed in an attempt to determine whether the limited literacy skills of Year 3 students, as well as the limitations involved in whole classroom access to stimulus materials, have an effect on students' results. The CD Rom includes visual material in high quality colour, sound digitised for music and moving images.

The CD Rom interface was designed so that students could complete tests at the screen, on an individual basis, thus allowing them the opportunity to listen to and view stimulus materials, as often as necessary, as well as having the questions read aloud, as often as necessary. Student responses were entered in a computer and, at the end of testing the whole class, the data were saved on a disk by the teacher, thus eliminating the need for large quantities of paperwork. There is already a high penetration of CD Rom in schools through school libraries and a proposal such as this may assist in increasing efficiency in the collection of data for future assessment. A small scale study involving approximately 120 Year 3 students was conducted during the present study, using both the CD Rom version and the hard copy version of the task.

The process aspect offered a broad view of student abilities through the documentation of steps in music learning which lead to the performance of their final products. The process targets Student outcome statements: The arts (Education Department of Western Australia, 1996) Expressing strands of Creating, exploring and developing ideas, Expressing and using skills, techniques, technologies and processes, and provides evidence of students' planning processes towards a simple composition and performance in music. The activities in which students engage provide opportunity for inquiry and the use of music language which are fundamental elements in the creative process leading to the development of worthwhile music. These activities provided direct evidence of the students' skills and learning, as well as concrete evidence for evaluation, using marking keys that were developed during trials. An important feature of the process instrument is the opportunity for students' reflection and self-appraisal of their work. The process assessment is designed to cover a time duration of approximately two lesson periods at the appropriate level and is based on a clearly structured framework, beginning with an appropriate stimulus and culminating in the performance of the composition.

The framework of Student outcome statements: The arts (Education Department of Western Australia, 1996) provides a series of descriptions of standards against which performance can be gauged. Test items are a set of developmental indicators of achievement that are mapped against the skills and abilities described at each level of the outcome statements. For purposes of reporting, descriptions of typical understandings which can be expected at each level are calibrated onto a measurement scale--the higher the calibration, the more difficult the item. Student levels of achievement are simultaneously calibrated on to the same scale and mapped as an arbitrary numerical scale that is organised at equal levels along the continuum, thus facilitating reporting of student performance data.
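Reporting against outcome levels amounts to a lookup of each calibrated measure against a set of cut-off points on the scale. The sketch below is illustrative only: the cut-off values, the function name `outcome_level` and the rule that a measure exactly on a cut-off is reported at the higher level are all assumptions, not values or rules taken from this study.

```python
from bisect import bisect_right

# HYPOTHETICAL cut-off points (in logits) between adjacent outcome levels:
# Level 1 | Level 2 | Level 3 | Level 4 | Level 5. Not values from the study.
HYPOTHETICAL_CUTOFFS = [-2.0, -1.0, 0.0, 1.0]

def outcome_level(measure, cutoffs=HYPOTHETICAL_CUTOFFS):
    """Return the outcome level for a measure on the common scale.

    Assumes cutoffs are sorted in ascending order and that a measure
    exactly on a cut-off is reported at the higher level.
    """
    return bisect_right(cutoffs, measure) + 1

# Example: three hypothetical student measures mapped to levels
levels = [outcome_level(m) for m in (-2.5, 0.3, 1.5)]
```

Because item difficulties sit on the same scale, the same lookup could be applied to an item's calibration to see which level it is targeting.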

Literacy competence is not a criterion and hence spelling, grammar and sentence construction are not assessed. The criteria for evaluation were emphasised during item writing and the design of marking keys, and students were made aware of these criteria during testing. Literacy levels were kept at an understanding appropriate to the year level. Ongoing consultation with classroom practitioners was undertaken to refine items and language for the relevant year levels.

Significance

The study adds to knowledge in three ways. First, it tests a theoretical model of standards based on Student outcome statements: The arts (Education Department of Western Australia, 1996), as it is applied to music learning. The model has been trialled in Western Australia over the past two years and is due to be operational in Western Australian government schools from 2001 onwards. The model has not been analysed using a Rasch measurement model before and this study provides the first test of the model.

Second, the study adds to knowledge of measurement of standards in music achievement. It will be of importance to teachers in Western Australia, as the assessment methods and instruments developed mean that specialist and generalist teachers in Western Australia will have access to reliable, authentic assessment material, reflecting exemplary classroom practice. It will not only provide teachers with a useful set of instruments with which to measure student progress in music, but it will also provide them with authentic models on which to base future assessable classroom activities. It will significantly contribute to teacher knowledge in music education and to the use by teachers of Student outcome statements: The arts (Education Department of Western Australia, 1996) to measure progress, because there are no current standardised benchmarks of student achievement in music at government schools in Western Australia.

Teachers engaging in classroom music programs will be able to use the material in four ways. First, they will be able to map activities to the outcome statement levels to provide clear examples of requirements at that level and, while all music teachers will find this useful, examples are particularly needed by generalist teachers. Second, teachers, both specialist and generalist, will be able to identify activities which can be matched to specific strands of Student outcome statements: The arts (Education Department of Western Australia, 1996). Third, they could access examples of activities demonstrating the aesthetically oriented sub-strands of `Responding, reflecting and evaluating' and `Understanding the role of the arts in society' which are currently unavailable. Fourth, they could link items across different levels of the Student outcome statements: The arts (Education Department of Western Australia, 1996). For instance, the marking keys will provide opportunities to measure open-ended responses at different levels on the continuum, providing links from one level to the next. At present, there are no syllabus documents in Western Australia which provide any of this information to either specialist or generalist teachers in music education.

The third way in which this study will add to knowledge is by helping educational administrators in gathering whole-school information in music. School administrators are obliged to develop a management information system in their schools that provides whole-school data in each of the eight learning areas, for reporting to the district superintendent and for planning priorities and future teaching programs. This study will provide data enabling them to gather reliable material in music achievement that can be interpreted and linked to achievement in other aspects of the arts.

Limitations

There are three limitations to this study. These are associated with the sample and its generalisability, restrictions of data to the subject of music and consistency of marker judgements of open-ended achievement tasks.

The first limitation refers to the population of Year 3, Year 7 and Year 10 students to whom the tasks were administered. The students were drawn from government schools only. No students from private schools or independent schools were tested. Hence, strictly, the results of this study are only representative of Year 3, Year 7 and Year 10 students in government schools in Western Australia. Because of the nature of group activities, whole classes were tested and only one Year 3 or Year 7 class was tested in each school. Schools and classes were drawn randomly from all Western Australian metropolitan and country primary schools with a minimum Year 3 or Year 7 population of six. In secondary schools, the whole class samples consisted of students who were currently studying music and were drawn randomly from all secondary schools in Western Australia which offered music at Year 10 and which had a minimum population of six in the music class.

The second limitation refers to the tests designed to test students' abilities and performances in music only. Therefore, although Student outcome statements: The arts (Education Department of Western Australia, 1996) outline generic levels across the five disciplines of dance, drama, media, music and visual arts, it is not possible to generalise about levels in disciplines other than music. That is, if students are reported as having achieved a level three in music, it is not possible to assume that they have achieved a level three in drama or dance, for example.

The third limitation refers to the consistency of marker judgements in contexts different from the standardised procedures used in this study. The student responses to the open-ended tasks had to be marked consistently. Markers were given one day's training and moderation so that they were able to establish and maintain consistent standards. Markers then took the completed student tasks away for marking. Spot checks were made on the marks and, where discrepancies were found, the affected tasks were re-marked. The standards, scales and profiles created in this study are only valid where teachers use the same marking standards.

Measurement model

The Extended Logistic Model of Rasch is used with the computer program Rasch Unidimensional Measurement Models (RUMM) (Andrich, Lyne, & Sheridan, 1997) to analyse the data. This model unifies the Thurstonian goal of item scaling with extended response categories for items measuring, for example, student achievement in music, which are applicable to this study. Item difficulties and student measures are placed on the same scale. The Rasch method produces scale-free student measures and sample-free item difficulties (Andrich, 1988b; Wright & Masters, 1982). That is, the differences between pairs of student measures and pairs of item difficulties are expected to be sample independent. The RUMM program parameterises an ordered threshold structure, corresponding with the ordered response categories of the items. The thresholds are boundaries located between the response categories and are related to the change in probability of responses occurring in the two categories separated by the threshold. Thresholds should be ordered when the data fit the model.
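To make the role of the thresholds concrete, the sketch below computes the probability of each ordered response category for one student on one item. It uses one common parameterisation of a Rasch model for ordered categories, in which the probability of score x is proportional to exp of the sum, over the first x thresholds, of (student measure - item difficulty - threshold). This form, the function name and the numerical values are assumptions for illustration; the exact parameterisation used inside RUMM is not given in the text.

```python
import math

def category_probabilities(ability, difficulty, thresholds):
    """Return [P(X=0), ..., P(X=m)] for an item with m ordered thresholds.

    Assumed form: P(X=x) is proportional to
    exp(sum over k=1..x of (ability - difficulty - thresholds[k-1])),
    with an empty sum (exponent 0) for x = 0.
    """
    exponents = [0.0]
    for tau in thresholds:
        exponents.append(exponents[-1] + (ability - difficulty - tau))
    peak = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical three-category item (scores 0, 1, 2) with ordered thresholds
probs = category_probabilities(ability=0.5, difficulty=0.0, thresholds=[-1.0, 1.0])
```

Under this form, a student located exactly at a threshold (measure minus item difficulty equal to the threshold) is equally likely to respond in the two categories that the threshold separates, which is the sense in which thresholds act as boundaries between categories.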

The zero point on the scale does not represent zero achievement of music. It is an artificial point representing the mean of the item difficulties, calibrated to be zero. It is possible to calibrate a true zero point, if it can be shown that an item represents zero music achievement. There is no true zero point in the present study.
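The arbitrary origin can be illustrated by re-centring a set of estimates so that the mean item difficulty is zero. This is a minimal sketch with hypothetical values and a hypothetical function name; it only shows that shifting items and students by the same constant leaves every difference between them unchanged.

```python
def centre_on_item_mean(item_difficulties, student_measures):
    """Shift items and students by the same constant so that the mean
    item difficulty becomes zero. Differences are preserved."""
    shift = sum(item_difficulties) / len(item_difficulties)
    centred_items = [d - shift for d in item_difficulties]
    centred_students = [m - shift for m in student_measures]
    return centred_items, centred_students

# Hypothetical uncentred estimates, in logits
items, students = centre_on_item_mean([1.5, 2.0, 3.5, 5.0], [2.0, 4.0])
```

A student measure of zero after centring therefore means only that the student is located at the average difficulty of the items, not that the student has no music achievement.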

The RUMM program substitutes the parameter estimates back into the model and examines the difference between the expected values predicted from the model and the observed values using two tests-of-fit: one is the item-trait interaction and the second is the item-student interaction. The item-trait test-of-fit (a chi-square) examines the consistency of the item parameters across the student estimates for each item and data are combined across all items to give an overall test-of-fit. The latter shows the collective agreement for all items across students of differing measures. The item-student test-of-fit examines both the response pattern of students across items and items across students. It examines the residual between the expected estimate and the actual values for each student-item summed over all items for each student and summed over all students for each item. The fit statistics approximate a t distribution with a mean of zero and a standard deviation of one, when the data fit the model. Negative values indicate a response pattern that fits the model too closely (probably because dependencies are present, see Andrich, 1985b) and positive values indicate a poor fit to the model (probably because `noise' or other measures are present).
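The residual underlying the item-student test-of-fit can be sketched as follows. Given the model probabilities for each response category of one student-item pair, the expected score and its variance follow directly, and the standardised residual is the observed score minus the expected score, divided by the standard deviation. The function names and example probabilities are hypothetical, and the further transformation RUMM applies to obtain its reported fit statistics is not reproduced here.

```python
import math

def expected_and_variance(probs):
    """Expected score and variance for category probabilities P(X=0..m)."""
    expected = sum(x * p for x, p in enumerate(probs))
    variance = sum((x - expected) ** 2 * p for x, p in enumerate(probs))
    return expected, variance

def standardised_residual(observed, probs):
    """(observed - expected) / sqrt(variance) for one student-item pair."""
    expected, variance = expected_and_variance(probs)
    return (observed - expected) / math.sqrt(variance)

def sum_squared_residuals(observed_scores, probs_per_item):
    """Sum of squared standardised residuals over all items for one student."""
    return sum(
        standardised_residual(x, p) ** 2
        for x, p in zip(observed_scores, probs_per_item)
    )

# Hypothetical model probabilities for a three-category item
example = standardised_residual(2, [0.25, 0.5, 0.25])
```

The same summation, taken over students for a fixed item instead of over items for a fixed student, gives the item-level counterpart described in the text.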

Sample

The final sample for music testing at Year 3 consisted of 40 classes, providing a total of 946 students. This compares with a total number of Year 3 students in government schools in 1996 of 20 661. Of the 946 students tested, 426 were identified as girls and 486 were identified as boys. There were 34 students who did not state their gender. Other sub-groups identified in the sample were Aboriginal and Torres Strait Islander students, of which there was a total of 59 at Year 3, and non-English-speaking background students, of which there was a total of 122 at Year 3.

The final sample for music at Year 7 consisted of 40 classes, providing a total of 921 students. This compares with a total number of Year 7 students in government schools in 1996 of 20 524. Of the 921 students tested, 397 were identified as girls and 487 were identified as boys, with 37 students not stating their gender. The total number of Aboriginal and Torres Strait Islander students identified in the sample at Year 7 was 44, and the total number of non-English-speaking background students was 114. The overall total of primary school music tests submitted for marking at Years 3 and 7 was 1867.

The final sample for music testing at Year 10 consisted of 20 classes, providing a total of 324 students. Of these, 172 were identified as girls and 139 were identified as boys, with 13 students not stating their gender. There were 17 Aboriginal and Torres Strait Islander students and 41 non-English-speaking background students identified in the sample.
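The sample figures reported above can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text; the dictionary layout and function names are hypothetical. No Year 10 population figure is given, so no sampling fraction is computed for that year.

```python
# Counts as reported in the text; population figures are 1996 government-school totals.
SAMPLE = {
    3: {"total": 946, "girls": 426, "boys": 486, "not_stated": 34, "population_1996": 20661},
    7: {"total": 921, "girls": 397, "boys": 487, "not_stated": 37, "population_1996": 20524},
    10: {"total": 324, "girls": 172, "boys": 139, "not_stated": 13, "population_1996": None},
}

def gender_counts_consistent(row):
    """True when girls + boys + not stated equals the reported total."""
    return row["girls"] + row["boys"] + row["not_stated"] == row["total"]

def sampling_fraction(row):
    """Sample size as a fraction of the year-level population, if known."""
    if row["population_1996"] is None:
        return None
    return row["total"] / row["population_1996"]

# Total primary school tests submitted for marking (Years 3 and 7)
primary_total = SAMPLE[3]["total"] + SAMPLE[7]["total"]
```

On these figures, the Year 3 and Year 7 samples each cover between 4 and 5 per cent of the corresponding 1996 government-school population.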

Structure of the Music Analysis Sub-test

A combination of multiple choice and extended answer question types was included in the tests and, where possible, tasks were open-ended in order to provide students with the opportunity to demonstrate their maximum levels of ability. As this was an assessment of music, student responses were not assessed for spelling or writing skills. Through the use of common items and common stimulus material, tasks allowed for linking of items through Years 3, 7 and 10, thus providing valuable information on student progression through the outcome levels. Where subjective questions asking for students' opinions or reflections were asked, they were used as prompts for further justification and were not scored.

At Year 3, teachers were provided with an audio tape of the piece `Ballet for children' (Bliss, 1995), which was recorded in parts, as well as containing verbal instructions for teachers on where to pause the tape. Teachers were then requested to: read the questions for part 1, play the passage of music for part 1, and read the questions one at a time, giving the students reasonable time to answer before going on to the next question. When part 1 was completed, they then repeated the procedure for parts 2 to 7.

The test contained 13 questions that were designed to assess the outcome levels primarily in the two Appreciation strands from Level 1 to 5.

Question 1 demonstrates a Level 1, multiple-choice item. Students were asked: `Where would you be most likely to hear this piece of music?'. They chose their answer from the selection provided which was: birthday party, orchestral concert, street parade, rock concert. This item covers the Level 1 statement: `identifies arts experiences in their own lives' in the strand Understanding the role of the arts in society (Education Department of Western Australia, 1996, p.3).

Question 2 demonstrates an extended answer item type and asks students: `Explain what you heard in the music that made you pick this answer' (referring to their answer to question 1). This question gave students the opportunity to provide a range of responses from Level 2, that is, `outlines features of their own and others' arts works and activities using simple arts terminology relating their responses to these features', to Level 5, that is, `uses arts terminology and critical frameworks to analyse and express informed opinions about arts works and activities', in the strand Responding, reflecting and evaluating (Education Department of Western Australia, 1996).

Question 5 represents an example of a subjective question asking for students' personal responses. Students were asked for their interpretation of the mood of the piece by selecting from the answers: `sleepy', `happy', `sad', or `angry'. Where students were asked for a personal response such as this, answers were not assessed. However, this type of question was always followed up by asking for a justification of their response as demonstrated by question 6; that is `Explain what you heard in the music that made you pick this answer'. This question required an extended answer that demonstrated students' knowledge of the elements of the music and allowed them to respond up to Level 5 in the strand `Understanding the role of the arts in society', that is: `identifies and discusses distinguishing features of arts works which locate them in a particular time, place or culture' (Education Department of Western Australia, 1996). All questions in the test, apart from multiple-choice items, had the capacity to earn partial credit for students who answered below the targeted level.

At Year 7, teachers were provided with an audio-tape of the same stimulus piece as that for Year 3, with an additional piece entitled `Dharpa' (Kellaway & Yunupingu, 1992). The format was similar to that of the Year 3 test with the test being presented in parts, from part 1 to part 9, containing a total of 15 questions. Teachers were instructed to: ask the students to read the questions for part 1 (or read aloud if you think that it is necessary), play the passage of music for part 1, give the students reasonable time to answer all the questions in part 1. When part 1 was completed, they were then asked to repeat the procedure for parts 2 to 9.

Question types were similar to those in the Year 3 tests with the addition of a `compare and contrast' item, as demonstrated by question 14, which allowed the students to compare and contrast the two stimulus pieces in the areas of instrumentation, expression and rhythm. This question covered the Level 5 statements: `Identifies and discusses distinguishing features of arts works which locate them in a particular time, place or culture'; and `Identifies and discusses the distinguishing features of arts works and activities in contemporary Australian society' (Education Department of Western Australia, 1996, p.3) from the strand `Understanding the role of the arts in society'.

Questions 3, 4, 5, 7 and 10 were linked to the Year 3 test. This provided the opportunity for comparisons to be made, and progress to be mapped, between Years 3 and 7 students. Items were coded so that the same item was given the same code name across the three levels. For instance, Year 3 item 7, Year 7 item 3 and Year 10 item 3 were coded MU07. As for the Year 3 test, answers to questions which were not multiple-choice item types earned partial credit for lower level responses.
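The common coding of link items can be pictured as a lookup from (year level, question number) to a shared item code, so that responses to the same item from different year levels are pooled before calibration. In the sketch below only the MU07 entry comes from the text; the fallback label format for unlinked items, the function names and the example scores are hypothetical.

```python
from collections import defaultdict

# (year level, question number) -> common item code.
# Only the MU07 mapping is stated in the text.
LINK_CODES = {(3, 7): "MU07", (7, 3): "MU07", (10, 3): "MU07"}

def item_code(year, question):
    """Return the shared code for a link item, or a year-specific label
    (hypothetical format) for an item that is not linked."""
    return LINK_CODES.get((year, question), f"Y{year}Q{question}")

def pool_by_code(responses):
    """Group (year, question, score) records by common item code."""
    pooled = defaultdict(list)
    for year, question, score in responses:
        pooled[item_code(year, question)].append((year, score))
    return dict(pooled)

# Hypothetical scored responses from the three year levels
pooled = pool_by_code([(3, 7, 1), (7, 3, 2), (10, 3, 3), (3, 1, 1)])
```

Pooling in this way is what allows a single difficulty to be estimated for a link item from students in all three year levels, placing the three tests on one scale.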

The structure for the Year 10 Analysis tests was similar to that of the Year 3 and 7 tests. Both of the stimulus pieces used at Year 7 were provided, together with an additional, more complex, contemporary piece entitled Earthcry Kakadu (Sculthorpe, 1989). The test consisted of 17 questions and the audio-tape was played in 10 parts.

Item types similar to those of the Year 3 and 7 tests were used, with the addition of more complex items, providing the potential for students to respond as high as Level 8: `Researches arts works from a variety of contexts, understanding how histories are constructed in the arts and how their own expression and appreciation of the arts is shaped by them'; and `Critically examines the ways the arts challenge and shape values and are influenced by prevailing values' (Education Department of Western Australia, 1996, p.3). An example of this is Question 13, which asks: `What effect has this style of music had on Australian culture?'.

It should be emphasised that, while items in all tests at Years 3, 7 and 10 were targeted towards particular outcome levels, all, apart from multiple-choice items, allowed for partial credit to be awarded and the analysis of the data, using a Rasch model, provided item difficulty estimates which enabled outcome levels of achievement to be established. Partial credit item categories for the Year 10 tests were outlined in the Year 10 Music analysis marking key.

It was possible to make comparisons among the three year levels, and to map progress from Year 3, through Year 7 to Year 10 through the use of link items. Questions 3, 4, 5, 6, 7 and 10 in the Year 10 tests are linked to both the Year 3 and Year 7 tests. An example of a successful link item is Question 10 in the Year 10 test, which refers to the stimulus piece `Ballet for children' (Bliss, 1995) and asks students: `Explain how the music ends'. This question gave students the opportunity to provide responses varying from a simple Level 1 answer such as `It ended very loud' to high level responses where they aurally identified and described distinguishing features and used musical language to describe and discuss elements such as harmonic and rhythmic tension.

Structure of the Music Process Sub-test

The structure for the Process tests was the same for Years 3, 7 and 10. First, students participated in a directed music warm-up that was intended to focus their thinking on the creative use of sound and different musical elements. Following the warm-up, they were presented with a stimulus that they examined before participating in a class brainstorming activity to discuss the stimulus. They were then instructed to write down their own ideas about different sounds that could be used to represent the stimulus, to join a small, pre-determined group to plan a composition reflecting the stimulus, and to notate the composition in either traditional form or their own style. Groups then rehearsed their pieces before performing them for the class. Teachers videotaped the group performances for central marking. Specific instructions were given for the videotaping process to avoid differences in the quality of productions. After all groups had presented their items, students were asked, individually, to complete a critique of their group's performance. Links were achieved through Years 3, 7 and 10 by using the same procedure, the same items and the same marking key across the three year groups. Tasks were developmental so that, potentially, it was possible for students at all levels to achieve as high as Level 8.

There were differences between the groups in time allocations, as primary school students cannot stay on task as long as Year 10 students. The stimulus material used at Year 3 was different from that used at Years 7 and 10 as the interpretation of a painting, which was required from the two higher year groups, was considered too difficult for Year 3 children.

The stimulus used at Year 3 was a videotaped excerpt from a newsreel depicting the calm before a storm, the build-up and climax of the storm and the stillness of the devastation after the storm. This structure was intended to guide the students into using basic form; that is, beginning, middle and end, in their compositions. In order to acquaint students with the points for assessment, they were supplied with information entitled `Ideas to help you make your composition'. The time specified for the Year 3 test was approximately 85 minutes, comprising approximately 40 minutes for the warm-up, viewing the stimulus, brainstorming, group planning and group rehearsal. Following a short recess or lunch break, the remaining 45 minutes was used for the final rehearsal, the group performance, the student critique of their performance, and collection of materials.

The structure for the Year 7 Process test was similar to that used at Year 3, except that the time allocation for the Year 7 Process test was 110 minutes. The first 55 minutes was allocated to the warm-up, brainstorming and discussion, group planning and rehearsal. After a short break, the second 55 minutes was used for the final rehearsal, group performance, student critique and collection of materials. The stimulus for Year 7 was a painting entitled Heaven and Earth (Pericles, 1978) which was selected to provide some contrast, intended to assist students in their use of form. Year 7 students were supplied with a more detailed guide than that provided at Year 3, to acquaint them with points for assessment. This guide, entitled `Ideas to help you make your composition', used musical terminology to describe the elements students were expected to include in their compositions. This terminology, however, was accompanied by explanations of meaning; for instance, `harmony--two or more sounds heard together'.

The structure for the Year 10 test was similar to that used at Years 3 and 7, except that, at Year 10, the time allocation was 115 minutes. There was no break in the time allocation as, unlike primary school children, Year 10 students are expected to work for this period of time without a break. The `Ideas to help you make your composition' page described the same musical elements as those for Year 7 except that there was no explanation of the musical terminology. The stimulus for Year 10 was the same painting, Heaven and Earth (Pericles, 1978), as that used for Year 7.

Development of the marking keys

Music Analysis marking key

In order to ascertain categories for the partial credit model to be used to mark the analysis items, it was necessary to trial the items with children in Western Australian classrooms. This was done by asking teachers to volunteer to administer the tests to their classes. After collection of the materials, the extended-answer test items were examined one by one to determine what types of responses students were likely to give. These were then collapsed into three or four general categories for each question, examined against Student outcome statements: The arts (Education Department of Western Australia, 1996) and categorised in order of difficulty. Answers which were wrong, made no sense, or were tautological were given 0 marks, answers which provided little information were given 1 mark, those which provided more were given 2 marks and so on. Items usually had between two and four categories. In four of the items, the analysis of the data showed that some categories were not discriminating sufficiently from each other. In these cases, categories had to be collapsed and the items re-scored.
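Collapsing categories and re-scoring an item, as described above, amounts to recoding each original mark to a smaller set of ordered marks. The following is a minimal sketch only; the function name, the mapping and the raw marks are hypothetical and do not come from the study's marking keys.

```python
def collapse_categories(scores, mapping):
    """Re-score an item by mapping each original category to a collapsed one."""
    new_values = [mapping[old] for old in sorted(mapping)]
    # Collapsing must preserve order: new marks never decrease as old marks increase.
    if any(b < a for a, b in zip(new_values, new_values[1:])):
        raise ValueError("mapping must preserve category order")
    # Collapsed marks should run 0, 1, 2, ... with no gaps.
    if sorted(set(new_values)) != list(range(max(new_values) + 1)):
        raise ValueError("collapsed categories must be consecutive from 0")
    return [mapping[s] for s in scores]

# Hypothetical item originally marked 0 to 4, where marks 2 and 3
# are assumed not to discriminate and are merged into one category.
hypothetical_mapping = {0: 0, 1: 1, 2: 2, 3: 2, 4: 3}
raw_marks = [0, 3, 2, 4, 1, 3]
rescored = collapse_categories(raw_marks, hypothetical_mapping)
```

After re-scoring, the item in this sketch has a maximum mark of 3 rather than 4, and the data would need to be re-analysed with the new categories.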

Music Process marking key

Experimentation was carried out to establish the most effective structure for marking keys. As mentioned previously, each group's performance was videotaped so markers could watch it as often as necessary to allocate the appropriate mark. Again, the trial material was used to finalise the most effective method of marking.

First, to reflect the development of skills, a line continuum was developed in a style similar to a Likert scale. For instance, the marker was prompted with the question, `How effectively has the student's artwork communicated his or her ideas?'. Along a continuous line across the page were three vertical marks. Under the first mark was the indicator, `not very effectively', with four descriptors (no mood evident, no evidence of form, no use of musical elements, and lacks confidence). Under the middle mark was `somewhat effectively' with three descriptors (suggests a mood, some evidence of musical elements, and some confidence shown). Under the third mark was `very effectively' with six descriptors (clearly shows mood, makes use of musical elements such as harmony, rhythm, makes good use of instruments, music has a form, and confident music). Under the fourth mark was `very effectively'. A problem with this method was the tendency for markers to allocate a level in between the indicators. An attempt was then made to divide the line into smaller degrees with 20 marks along the continuum so that levels between the descriptors could be measured. This resulted in markers tending to count the marks and give a score out of 20. This was detrimental to the notion of assessing and describing what students can actually do, and reverted to the old method of allocating a numerical score. It appeared that using this style of marking did not fit with the concept of the vertical progression of student achievement described in the outcomes framework and so experimentation was carried out to design a marking key in a vertical rather than a horizontal format.

Finally, a method known colloquially as a `marking tree' was developed. A prompt question to the marker, such as `How effectively has the group used expression?' was followed by a sequential, vertical list of competency levels matched to a mark allocation. For instance: 0 marks for `no evidence'--no expression--even sound, all loud or all soft; 1 mark for `beginning to develop'--slight changes in dynamics--loud/soft; 2 marks for `sound development'--obvious variation in dynamics, tempo and/or melody in an attempt to reflect mood; 3 marks for `well developed'--effective use of dynamics, tempo, rhythm, melody, harmony, tone, etc. to reflect mood--some evidence of organisation in planning as well as performance; 4 marks for `highly developed'--exceptional use of elements to create a pleasing sense of expression which clearly conveys mood--inclusion of appropriate variety of dynamics, tempo, rhythm, melody, harmony, tone, texture, legato and staccato--evidence of organisation/leadership in planning and performance. Using this structure, markers could not mark between the descriptors and had to allocate the one which most closely reflected the student's performance.
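The marking tree for the expression prompt can be pictured as a small lookup structure in which only whole marks with a matching descriptor are allowed. This is an illustrative sketch only: the descriptors are abbreviated from the text above, and the names `EXPRESSION_TREE` and `award_mark` are hypothetical rather than part of the study's materials.

```python
# Mark -> (competency level, abbreviated descriptor), following the example in the text.
EXPRESSION_TREE = {
    0: ("no evidence", "no expression; even sound, all loud or all soft"),
    1: ("beginning to develop", "slight changes in dynamics (loud/soft)"),
    2: ("sound development", "obvious variation in dynamics, tempo and/or melody"),
    3: ("well developed", "effective use of dynamics, tempo, rhythm, melody, harmony, tone"),
    4: ("highly developed", "exceptional use of elements which clearly conveys mood"),
}

def award_mark(tree, mark):
    """Return (mark, competency level) if the mark is a listed whole-number level.

    Marks between descriptors (for example 2.5) are rejected, mirroring the
    requirement that markers choose the single closest descriptor.
    """
    if not isinstance(mark, int) or mark not in tree:
        raise ValueError("mark must be one of the listed levels: %s" % sorted(tree))
    level, _descriptor = tree[mark]
    return mark, level

awarded = award_mark(EXPRESSION_TREE, 3)
```

In contrast with the 20-point line continuum, a structure of this kind leaves no room for an in-between score.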

Data collection

The tests were administered in school classrooms that reflected students' usual learning environments. In primary schools where a specialist music teacher normally taught music classes, the music teacher administered the tests. In primary schools where there was no music specialist, the teacher who normally taught music to the class administered the tests. This was usually the classroom teacher. In secondary schools, the specialist music teacher administered the tests.

In order to reduce variability in administration of the tests, explicit administration instructions were distributed to teachers. These included the overall time allocation for the tests, as well as times to be apportioned for specific sections of the tests. Instructions were also given as to what the teacher was required to prepare before administering tests. For the Process test, this included the viewing of a teacher training video demonstrating the warm-up and group work.

Teachers were instructed to help students who were having difficulty following instructions or reading questions, but were asked, emphatically, not to help them with the actual task. Standardised wording for the teacher's verbal instructions to the students was provided and teachers were instructed not to deviate from this, except to clarify understanding. At the Year 3 level, teachers were asked to read questions aloud while students followed, whereas at Years 7 and 10 they gave students time to read the questions themselves, assisting only when requested. The Analysis stimulus audio tapes were divided into parts to correspond with the parts in the test paper, with the voice on the tape instructing when to pause the tape.

For the Process test, teachers were instructed to organise the students into groups of four prior to testing. If numbers were uneven, groups of three or five were allowed. Some control over group selection was exercised by providing teachers with a numbered list on which an asterisk had been placed beside every fourth number. Teachers were then asked to copy students' names directly from their classroom attendance roll onto the list. Each asterisked student became the nucleus of a group and teachers then organised students to create the most suitable working groups.

Guidelines for the administration of the Process test were very explicit and teachers were asked to adhere rigidly to the verbal instructions provided during the time prior to the group planning and rehearsal session. During the group planning and rehearsal, teachers were asked to move around the room, supervising as they would in a normal classroom situation, dealing with questions or problems, or clarifying, when necessary, but without actually helping students with the task.

It was important to have good quality videotapes for the central marking of performances. To ensure that teachers supervised classes adequately during videotaping of performances, they were requested to work in collaboration with a support teacher or student to operate the video camera. Clear instructions as to the positioning of the camera, the background, the size of the performing area and identification of groups were provided. These instructions minimised the potential for markers to be influenced by either professionally produced videotapes or poor quality ones.

Transforming the logit values

For the purposes of reporting, and to eliminate the use of negative values for student ability, the logit scale was converted to a scale from 0 to 800, to reflect the eight levels of outcomes contained in Student outcome statements: The arts (Education Department of Western Australia, 1996). After being adjusted to 0.7 probability, the minimum logit value of the sampled students (-3.75) was transformed to the arbitrary scale score of 0. The maximum logit value (+4.56), after being adjusted to 0.7 probability, was transformed to the arbitrary scale score of 800. The conversion is linear, with a scaling factor of 800/[logitmax-logitmin] applied to each logit's distance above logitmin.
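The conversion can be sketched as a simple linear rescaling. The sketch assumes that the quoted values of -3.75 and +4.56 are used directly as the end points; the adjustment to 0.7 probability mentioned in the text is not modelled, and the function names are hypothetical.

```python
LOGIT_MIN = -3.75   # minimum student logit quoted in the text
LOGIT_MAX = 4.56    # maximum student logit quoted in the text
SCALE_MAX = 800.0   # top of the reporting scale

def logit_to_scale(logit, logit_min=LOGIT_MIN, logit_max=LOGIT_MAX, scale_max=SCALE_MAX):
    """Map a logit value onto the 0 to 800 reporting scale (assumed linear)."""
    factor = scale_max / (logit_max - logit_min)   # 800/[logitmax-logitmin]
    return (logit - logit_min) * factor

def scale_to_logit(score, logit_min=LOGIT_MIN, logit_max=LOGIT_MAX, scale_max=SCALE_MAX):
    """Inverse mapping, from a reporting-scale score back to logits."""
    factor = scale_max / (logit_max - logit_min)
    return logit_min + score / factor

lowest = logit_to_scale(LOGIT_MIN)    # end point of the scale: 0
highest = logit_to_scale(LOGIT_MAX)   # end point of the scale: 800
```

Under these assumptions the logit range is 8.31, so one logit corresponds to roughly 96.3 points on the reporting scale.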

Psychometric characteristics of the Music Achievement Scale

The 45 items relating to Music Achievement have a good fit to the measurement model, indicating strong agreement among all 2191 students on the different locations of the items on the scale (see Table 3). That is, there is strong agreement among the students on the item difficulties along the scale. The item threshold values are ordered from low to high, indicating that the students have answered consistently and logically with the ordered response format used (category responses were collapsed for several items to ensure correct ordering). The Index of Student Separability for the 45-item scale is 0.900 and the Index of Item Separability is 0.928. This means that the proportion of observed variance considered true is 90 per cent. The difficulties of the items have a similar spread along the scale to that of the student measures. This means that the items are targeted appropriately for the students.
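A separation index of this kind is commonly computed as the share of the observed variance in the estimates that is not attributable to measurement error. The sketch below uses that common definition with invented numbers; the exact computation performed by RUMM is not shown in this paper, so the function and the data are assumptions for illustration only.

```python
from statistics import mean, variance

def separation_index(estimates, standard_errors):
    """Proportion of observed variance in the estimates considered true.

    estimates       -- location estimates in logits (hypothetical values here)
    standard_errors -- standard error of each estimate
    """
    observed_variance = variance(estimates)                      # sample variance
    error_variance = mean([se ** 2 for se in standard_errors])   # mean squared error
    return (observed_variance - error_variance) / observed_variance

# Invented data chosen so that the arithmetic is easy to follow:
# observed variance 2.5, error variance 0.25.
hypothetical_estimates = [-2.0, -1.0, 0.0, 1.0, 2.0]
hypothetical_errors = [0.5, 0.5, 0.5, 0.5, 0.5]
index = separation_index(hypothetical_estimates, hypothetical_errors)
```

With these invented values the index is approximately 0.9, which would be read in the same way as the 0.900 reported for students: about 90 per cent of the observed variance is treated as true variance.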

The item-trait tests-of-fit indicate that the values of the item difficulties are strongly consistent across the range of student measures. The item-student tests-of-fit (see Table 3) indicate that there is good consistency of student and item response patterns. These data indicate that the errors are small and that the power of the tests-of-fit is excellent. All these data are evidence for the validity of the Music Achievement Scale.

Student performance levels

The mean level for each year group shows a clear pattern of development from Year 3 through Year 7 to Year 10, although there is considerable overlap in performance between the year groups, as would be expected. For example, the highest achieving 10 per cent of Year 7 students performed above the level demonstrated by approximately 25 per cent of Year 10 students.

Over 80 per cent of Year 3 students demonstrated skills associated with Level 2 outcomes in music. This means they can work in a group to plan and create a simple sound piece to interpret a given stimulus, including the creation of a simple score, notating their own rhythms, melodies and accompaniment patterns using simple known methods. They reflect upon music works, noting particular features including melody, instruments used, form and expression. They identify the purpose of a work and how it affects the way it should be performed. They apply simple critical reflections on their preferences and describe sounds using basic musical terms.

Over 55 per cent of Year 7 students demonstrated skills associated with Level 3 outcomes in music. This means they can compose short, simple, structured musical works using tuned or untuned percussion instruments, recorder, sounds from the environment, voice and body percussion. They are able to recognise aurally and describe musical features such as simple rhythmic and melodic patterns, tempo, instrumentation, timbre, dynamics and structure and use and interpret signs and symbols representing pitch, duration of sound and dynamics. They can describe obvious features such as repetition, form, changes in dynamics and texture, as well as identifying music from another culture and associating characteristics of the music with the style.

Over 80 per cent of Year 10 students demonstrated skills associated with Level 4 outcomes in music. This means they can create musical works that capture characteristics of a given stimulus and interpret elements of pitch, rhythm, dynamics and phrasing in composition. They explore major and minor tonalities, textures, forms, media, and invent a soundscape score related to the theme. They explore combinations of sounds from the environment, chords, ostinati, and incorporate known structures such as ternary or binary form. They are able to give reasons why a musical element used in a piece is important and how it was used to create the perceived mood, tension and purpose. They can compare music from different times, places or cultures, identifying notable differences in musical characteristics. Table 4 gives a summary of the overall performance of Year 3, 7 and 10 students in music and the scale of student performance and outcomes achieved.

Conclusions

A Music Achievement Scale to measure student outcomes in classroom music across both the Appreciating and the Expressing strands of Student outcome statements: The arts (Education Department of Western Australia, 1996) was successfully developed. Validity of the measure of student achievement in classroom music was established by testing the materials with a sample of 2191 students in Western Australian primary and secondary schools and conducting a Rasch model analysis using the RUMM program. Overall fit as well as individual fit of items to the model was established and thresholds were adjusted where necessary, so that they are properly ordered. The proportion of observed variance considered true was 90 per cent and the achievement tasks were developed according to a theoretical model. The power of the tests-of-fit was excellent.

The tests have been administered by both generalist and specialist music teachers in schools and are suitable for use by either group. Teachers will be able to use students' raw scores to compare their results with the data gathered across the state for this testing program. Outcomes which relate to aesthetics, critical analysis, interpretation of meaning and music concepts have not been measured with any level of reliability in Western Australian classrooms before, nor has there been any opportunity for teachers to make comparisons using a common framework. These tests will provide these opportunities, as well as providing a model of good classroom practice based on the Student outcome statements: The arts (Education Department of Western Australia, 1996) framework.

Marking keys and item descriptions have been worded to provide descriptions that can be understood by generalist as well as specialist teachers at both primary and secondary levels. Although the tests were designed for testing at Years 3, 7 and 10, they have been developed to reflect a developmental continuum and so are not targeted at specific year levels. This means that, although comparisons with state means at particular year levels are not possible, the tests can be used as a valuable tool for gathering classroom or whole school data in relation to Student outcome statements: The arts (Education Department of Western Australia, 1996).

Keywords

achievement

measurement techniques

music education

primary school students

school based assessment

secondary school students

Table 1 Music Analysis sub-test item links and levels

Code   Max score   Year 3   Year 7   Year 10   SOS level

Mu01   1           1        -        -         App 2.1
Mu02   2           2        -        -         App 2.3, 2.4
Mu03   1           3        -        -         App 2.1
Mu04   2           4        -        -         App 1.3, 2.3, 2.4
Mu05   No score    5        -        -         App 1.1, 2.1
Mu06   2           6        -        -         App 1.2 - 1.4
Mu07   1           7        3        3         App 1.2
Mu08   2           8        4        4         App 1.4, 1.5
Mu09   1           9        5        5         App 1.2, Exp 2.2
Mu10   1           10       6        6         App 1.2
Mu11   1           11       7        7         Exp 2.3
Mu12   4           12       10       10        App 1.2 - 1.5
Mu13   3           13       -        -         App 1.2 - 1.5
Mu14   1           -        1        1         App 1.2
Mu15   3           -        2        2         App 1.3 - 1.5
Mu16   No score    -        8        8         App 1.1 - 2.1
Mu17   3           -        9        9         App 1.3 - 1.5
Mu18   1           -        11       11        App 2.2
Mu19   3           -        12       12        App 2.2 - 2.5
Mu20   3           -        13       13        App 2.3 - 2.5
Mu21   4           -        14a      14a       App 1.3 - 1.6
Mu22   4           -        14b      14b       "
Mu23   3           -        14c      14c       "
Mu24   4           -        15       -         App 1.2 - 1.5
Mu25   3           -        -        15        App 1.2 - 1.8
Mu26   4           -        -        16        App 1.2 - 1.7, Exp 1.6
Mu27   4           -        -        17        App 1.4 - 1.5

Key: Mu14: Music coded item 14

App: Appreciating strand

Exp: Expressing strand

SOS: Student outcome statement level

1.3: sub-strand 1, level 3

2.5: sub-strand 2, level 5

Table 2 Music Process sub-test item links and levels

Code Max score Year 3 Year 7 Year 10

MuP01 4 1 1 1
MuP02 4 2 2 2
MuP03 4 3 3 3
MuP04 4 4 4 4
MuP05 4 5 5 5
MuP06 4 6 6 6
MuP07 4 7 7 7
MuP08 4 8 8 8
MuP09 4 9 9 9
MuP10 3 P
MuP11 4 C1
MuP12 4 C2
MuP13 3 P P
MuP14 4 C1
MuP15 4 C2
MuP16 4 C1
MuP17 4 C2
MuP18 4 1 1
MuP19 4 2 2
MuP20 4 3 3
MuP21 4 4 4
MuP22 4 5 5
MuP23 4 6 6
MuP24 4 7 7
MuP25 4 8 8
MuP26 4 9 9

Code SOS level

MuP01 Exp 1.1 - 1.8, App 1.1 - 1.8
MuP02 Exp 1.1 - 1.8
MuP03 "
MuP04 Exp 1.1 - 1.8
MuP05 Exp 2.1 - 2.8
MuP06 "
MuP07 "
MuP08 "
MuP09 Exp 1.1 - 1.8, Exp 2.1 - 2.8
MuP10 Exp 1.1 - 1.8
MuP11 App 1.1 - 1.8
MuP12 App 1.2 - 1.8
MuP13 Exp 1.2 - 1.8
MuP14 App 1.2 - 1.8
MuP15 App 1.2 - 1.8
MuP16 App 1.2 - 1.8
MuP17 App 1.2 - 1.8
MuP18 Exp 1.1 - 1.8, App 1.1 - 1.8
MuP19 Exp 1.1 - 1.8
MuP20 "
MuP21 Exp 1.1 - 1.8
MuP22 Exp 2.1 - 2.8
MuP23 "
MuP24 "
MuP25 "
MuP26 Exp 1.1 - 1.8, Exp 2.1 - 2.8

Key: MuP11: Music Process coded item 11

App: Appreciating strand

SOS: Student outcome statement level

Exp: Expressing strand

1.4: sub-strand 1, level 4

2.1: sub-strand 2, level 1

Table 3 Summary data of the reliabilities and fit statistics for the Music Achievement Scale (n=2191)

Statistic                                      Value

Index of Student Separability (reliability)    0.900
Index of Item Separability (reliability)       0.928
Item fit statistic: mean                       -1.377
Item fit statistic: standard deviation         +6.132
Student fit statistic: mean                    -0.248
Student fit statistic: standard deviation      +1.198
Item-trait interaction: chi-square             3456.156
Item-trait interaction: df                     450
Item-trait interaction: p                      <0.001
Power of test-of-fit                           excellent

Table 4 Summary of student achievement in music by year and framework level

Year      Number of students   Mean achievement music score   Standard deviation   Framework level

Year 3    946                  294                            85                   2
Year 7    921                  359                            82                   3
Year 10   324                  525                            105                  4/5

References

Andrich, D. (1982a). An extension of the Rasch model for ratings providing for both location and dispersion parameters. Psychometrika, 47, 105-113.

Andrich, D. (1982b). An index of person separation in latent trait theory, the traditional KR.20 index, and the Guttman scale response pattern. Education Research and Perspectives, 9 (1), 95-104.

Andrich, D. (1985a). An elaboration of Guttman scaling with Rasch models for measurement. In N. Brandon-Tuma (Ed.), Sociological methodology (chap. 2, pp. 33-80). San Francisco: Jossey-Bass.

Andrich, D. (1985b). A latent trait model for items with response dependencies: Implications for test construction and analysis. In S. Embretson (Ed.), Test design: Contributions from psychology, education and psychometrics (pp. 245-275). New York: Academic Press.

Andrich, D. (1988a). A general form of Rasch's Extended Logistic Model for partial credit scoring. Applied Measurement in Education, 1 (4), 363-378.

Andrich, D. (1988b). Rasch models for measurement (Sage university paper on quantitative applications in the social sciences, series number 07/068). Newbury Park, CA: Sage Publications.

Andrich, D. (1989). Distinctions between assumptions and requirements in measurement in the social sciences. In J.A. Keats, R. Taft, R.A. Heath, & S.H. Lovibond (Eds.), Mathematical and theoretical systems. Amsterdam: Elsevier Science Publishers.

Andrich, D. (1991). Rasch models for measurement (Quantitative applications in the social sciences 68). Beverly Hills: Sage Publications.

Andrich, D., Lyne, G., & Sheridan, B. (1997). RUMM: A Windows-based item analysis program employing Rasch Unidimensional Measurement Models. Perth: Murdoch University & Edith Cowan University.

Armstrong, C.L. (1994). Designing assessment in art. Reston: National Art Education Association.

Bliss, A. (1995). Bliss conducts Bliss: A colour symphony things to come, suite introduction & allegro [CD Rom]. London Symphony Orchestra. London: Dutton Laboratories.

Bonser, S. & Grundy, S. (1995). Trialing student outcome statements: An arts faculty focus. Perth: Quality Development and Educational Services.

Carlton, M. (1987). Music in education. London: Woburn Press.

Consortium of National Arts Education Associations. (1994). National standards for arts education. Reston: Music Educators National Conference.

Curriculum Corporation. (1994a). The arts--a curriculum profile for Australian schools. Burwood, Vic.: A.E. Keating (Printing) Pty Ltd.

Curriculum Corporation. (1994b). A statement on the arts for Australian schools. Burwood, Vic.: A.E. Keating (Printing) Pty Ltd.

Department of Education and Science. (1989). National curriculum: From policy to practice. Stanmore, NSW: Publications Dispatch Centre.

Education Department of Western Australia. (1993). SEA assessment policy: Art Years 11 and 12. East Perth: Education Department of Western Australia.

Education Department of Western Australia. (1987a). Practical and creative arts: Class music. Perth: Curriculum Branch.

Education Department of Western Australia. (1987b). The unit curriculum: Assessment and grading procedures. Perth: Curriculum Branch.

Education Department of Western Australia. (1993). Tertiary entrance examinations. Perth: Author.

Education Department of Western Australia. (1994a). Profiles of student achievement. Perth: Author.

Education Department of Western Australia. (1994b). Student achievement in health and physical education in Western Australian government schools. Perth: Author.

Education Department of Western Australia. (1994c). Student outcome statements: The arts. (Working edition.) Perth: Author.

Education Department of Western Australia. (1994d). Student outcome statements: Technology and enterprise (Working edition). Perth: Author.

Education Department of Western Australia. (1995). Student achievement in English in Western Australian government schools. Perth: Monitoring Standards in Education.

Education Department of Western Australia. (1996). Student outcome statements: The arts (Draft version). Perth: Author.

Education Research and Development Committee. (1980). National assessment of educational progress. Canberra: AGPS.

Emery, L. (1994). Room to move in the arts. EQ Australia, 1, 5-7.

Fairhall, J.H. Dogmatism and aesthetic judgment: A study of response to paintings. Unpublished PhD thesis, University of Western Australia, Perth.

Fortney, P.M. (1992). The construction and validation of an instrument to measure attitudes of students in high school instrumental music programs. Contributions to Music Education, 19, 32-45.

Gordon, E.E. (1992). Is it only in academics that Americans are lagging? American Music Teacher, 42 (3), 24-83.

Hanley, B. (1992a). Assessment and evaluation in music education: Reflections on British Columbia initiatives. Canadian Music Educator, 33 (5), 7-13.

Hanley, B. (1992b). Student assessment in music education. Canadian Music Educator, 33 (5), 19-24.

Hewton, J. (1985). Primary music evaluation. Brisbane: Queensland Dept. of Education.

Jorgensen, E.R. (1994). Justifying music instruction in American public schools. Bulletin: Council for Research in Music Education, No. 120, 15-31.

Kellaway, S. & Yunupingu, M. (1992). Tribal voice [CD Rom]. Sydney: Mushroom Records.

Kemp, A.E. & Freeman, S.W. (1988). New tasks for music in primary schools and teacher training. International Journal of Music Education, 11, 21-23.

Knight, S. (1992). Evaluation--alpha and omega. Canadian Music Educator, 33 (5), 25-32.

Lehman, P. (1994a). Issues of assessment. In Perspectives on implementation. Reston: Music Educators National Conference.

Lehman, P. (1994b). Writing standards for music. In The vision for arts education in the 21st century. Reston: Music Educators National Conference.

Lehman, P. (1996). Performance standards for music. Reston: Music Educators National Conference.

Madsen, C.K. (1990). Teacher intensity in relationship to music education. Bulletin, No. 104, 38-45.

Ministry of Education, British Columbia. (1994a). Assessment handbooks series: Performance assessment. Victoria: Queens Printer for British Columbia.

Ministry of Education, British Columbia. (1994b). Assessment handbooks series: Portfolio assessment. Victoria: Queens Printer for British Columbia.

Ministry of Education, British Columbia. (1994c). Assessment handbooks series: Student self-assessment. Victoria: Queens Printer for British Columbia.

Ministry of Education, British Columbia. (1994d). Assessment handbooks series: Student-centred conferences. Victoria: Queens Printer for British Columbia.

Ministry of Education. (1991). School accountability--policy and guidelines. Perth: Author.

Ministry of Education. (1989a). Frameworks: Unit curriculum art guide 8-10. Perth: Ministry of Education, Programmes Branch.

Ministry of Education. (1989b). Music in schools. Perth: Ministry of Education, Curriculum Programmes.

Music Educators National Conference. (1994). National standards for arts education. Reston: Author.

Music Educators National Conference Committee on Performance Standards. (1996). Performance standards for music. Reston: Music Educators National Conference.

Myford, C.M. (1989). The nature of expertise in aesthetic judgment: Beyond inter-judge agreement. Unpublished doctoral dissertation, University of Chicago.

Ogilvie, L. (1992). Stage-struck! Assessment and class music making. British Journal of Music Education, 9, 201-209.

Pericles, Leon. (1978). Heaven and Earth [Painting: Acrylic on board, 900x480mm]. Perth.

Rasch, G. (1980). Probabilistic models for intelligence and attainment tests (Expanded ed.). Chicago: University of Chicago Press. (Original work published in 1960)

Roberts, B. (1994). Assessment in music education: A cross-Canada study. Canadian Music Educator, 35 (5), 3-5.

Sculthorpe, P. (1989). Earthcry Kakadu, mangrove [CD Rom]. Sydney Symphony Orchestra. Sydney: Australian Broadcasting Corporation.

Willingham, L. (1992). Musical growth: Need we evaluate it? Canadian Music Educator, 33 (5), 41-43.

Wright, B.D. (1985). Additivity in psychological measurement. In E. E. Roskam (Ed.), Measurement and personality assessment (pp. 101-112). Amsterdam: Elsevier Science Publishers.

Wright, B. & Masters, G. (1981). The measurement of knowledge and attitude (Research memorandum no. 30). Chicago: University of Chicago, Department of Education, Statistical Laboratory.

Wright, B. & Masters, G. (1982). Rating scale analysis: Rasch measurement. Chicago: MESA Press.

Dr Beverley Pascoe is Manager, English and the Arts, Curriculum Council, 27 Walters Drive, Osborne Park, Western Australia 6017. Dr Russell Waugh is a Senior Lecturer, School of Education, Edith Cowan University, Pearson Street, Churchlands, Western Australia 6018.