
The relationship between teacher assessments and NCLB mathematics testing.


Herte, Christopher Mark


Abstract

This study examined the relationship between teacher assessments and grade eight New Jersey NCLB mathematics testing. It also examined how curricular changes affected student achievement for one New Jersey junior high school on the NCLB mathematics test.

The research considered teacher assessments and the New Jersey grade eight NCLB mathematics test for the 2003 and 2005 administrations. End-of-marking-period grades and midterm exam grades, expressed as percentages, for one of the two lowest tracked math courses were collected and analyzed with the 2003 and 2005 NCLB test scores (n > 200 each year). There is a need to determine how curricular changes affect the relationship between teacher assessments and NCLB test scores.

There was little relationship between teacher assessments and the components of the 2003 NCLB math test. Two years later, after the curricular changes, the relationship between teacher assessment and NCLB testing increased and the percent of students who demonstrated proficiency on the NCLB test increased. The increases in NCLB testing were statistically significant.

This paper reviewed the methodology used, the findings of the study, and how the results may impact similar school districts. Suggestions for further research and action are presented.

Introduction

Educational leaders have always faced the challenging task of increasing student achievement. Now, however, educational leaders must demonstrate increases in student achievement as measured on state assessments under the No Child Left Behind Act of 2002 (NCLB).

The No Child Left Behind (NCLB) legislation was signed into law on January 8, 2002 by President George W. Bush, initiating sweeping changes in the accountability of school districts to ensure that all children meet high academic standards. The accountability system requires that all public school districts administer tests in English language arts and mathematics in grades three through eight and in one year of high school. State agencies were required to set three-year benchmark scores, culminating with 100 percent of all children, in the aggregate as well as in each subgroup, passing state testing in grades three through eight and high school by the year 2014. Meeting these benchmarks in the aggregate as well as in each subgroup is the challenge of every public school.

In order to hold schools and districts accountable for the educational progress of their children, NCLB established serious sanctions for districts that continually fail to meet state-established benchmarks for all children. Schools are required to meet each benchmark, referred to as adequate yearly progress (AYP), each year for each subgroup and for the school as a whole. The subgroups considered are ethnicity, special education, and English language learners. Sanctions for not meeting AYP range in severity from offering inter-district school choice to district restructuring for schools that continually fail to meet AYP.

The New Jersey Department of Education annually develops and administers the Grade Eight Proficiency Assessment (GEPA). This assessment consists of multiple choice, short response, and extended response items. Its purpose is, in part, early identification of students who need remediation, as well as determining how well the students and the school are meeting the state standards in mathematics for grade eight (NJDOE, 2006a). The GEPA is a secure test that may not be reproduced, distributed, or discussed by educators and students.

The difficulty for educational leaders is that these scores may be insufficient to determine specific areas for program improvement necessary to increase student achievement. It is therefore important to determine the relationship between teacher assessments and the GEPA. Following the 2003 administration of the GEPA, a study was conducted to determine this relationship for one school district and each of the five ability level courses. This study found that the relationship between teacher assessment and the GEPA was very weak (Herte, 2005). A recommendation for further research from that study was to conduct the study again after curriculum alignment with the state standards as well as new textbook adoptions.

Discussion of Previous Study

Herte (2005) reported on the relationship of teacher assessments to the GEPA for the 2003 administration. Specifically, Herte reported that teacher assessments in Algebra Part 1 could explain only 6.6 percent of the variance in number sense, 8.4 percent of the variance in spatial sense, 6.3 percent of the variance in data analysis, 5 percent of the variance in patterns and functions, and 10.9 percent of the variance in the GEPA scale score. This very weak relationship indicated that the content of teacher assessments in Algebra Part 1 needed to be aligned to the GEPA.

The Algebra Part 1 students earned grades that were above passing on teacher assessments even though they could not achieve proficiency on the GEPA. Parents viewing only these higher report card grades could believe their children are achieving at a much higher level against the state standards than they actually are. Aligning the teacher assessments both in content and in expectation is necessary to obtain a clear picture of student achievement.

Herte (2005) recommended that similar districts take several steps, including aligning the curriculum with the state standards, using formative assessments to guide instruction, and infusing non-algebraic topics into algebra courses.

Theoretical Perspectives

Teacher Assessment

The NCTM (1997) suggested changes in the concept of program evaluation based upon assessment data. They included moving, "Toward detailed analyses of group data (e.g., examining variations in responses, and the disaggregation of data) and away from reporting only group means" (p. 67).

In a study of elementary and secondary mathematics and English teachers, McMillan and Nash (2000) found that there was variability among grading and assessment practices. Through interviews and coded responses, McMillan and Nash (2000) identified six themes regarding how teachers decide to use specific assessment and grading practices: 1) teacher beliefs and values, 2) classroom realities, 3) external factors, 4) teacher decision-making rationale, 5) assessment practices, and 6) grading practices. The model they identified described teachers' need for flexible assessment and grading practices so that they could accommodate each student individually. McMillan and Nash reported that the external pressure of recently mandated statewide testing caused teachers to incorporate a greater role for objective assessments and grading practices into their repertoire. Teachers reported that they made their assessments more aligned to the statewide assessment format, which enabled students to be more comfortable with the format of these statewide assessments.

The ability level of a class could also have an impact on student grades. McMillan and Nash (2000) noted, "The reality of poor student attitudes and inappropriate behavior, especially in remedial and standard classes, seemed to have both a direct and indirect impact on students' grades" (p. 13). McMillan and Nash (2000) also reported one teacher's interview comments about the possibility of the students passing the statewide assessment: "Given these remedial classes, there's no way they will pass it (SOL test), unless they make a total 900% change in their attitude and in their behavior, they're not" (p. 14).

McMillan and Nash (2000) observed that teachers believed their assessment of student achievement and assignment of grades gave a better understanding of the depth of student knowledge. This belief was predicated on their use of multiple assessments of different varieties so that students could demonstrate their achievement in various ways. The teachers used formative assessments extensively to guide instruction. The nature of formative assessments gave teachers necessary feedback as to when students had mastered concepts or when more or different instruction was necessary.

Boston (2002) examined the role of formative assessments and their diagnostic use in providing feedback to teachers and students. Boston (2002) stated, "Teachers can build in many opportunities to assess how students are learning and then use this information to make beneficial changes in instruction" (p. 1). Noting the importance of formative assessment, Boston (2002) stated, "While state tests provide a snapshot of a student's performance on a given day under test conditions, formative assessment allows teachers to monitor and guide students' performance over time in multiple problem-solving situations" (p. 1). This sentiment was mirrored by Guskey (2003, February), who stated, "Teachers who develop useful assessments, provide corrective instruction, and give students second chances to demonstrate success can improve their instruction and help students learn" (p. 7).

Guskey (2003, February) pointed out that the best assessments are the tests, quizzes, and assignments that teachers give on a regular basis. Teachers trust these assessments since they were developed by the teacher in order to address the curriculum. Additionally, the results are available to the teacher in order to alter the classroom instruction. Guskey (2003, February) pointed out that many teachers have not received instruction in creating assessments and as such may "construct their own in a haphazard fashion" (p. 7). Teachers should test what they teach, rather than teaching to the test. The need for corrective action following student assessment is critical. Guskey (2003, February) stated, "assessments must be followed by high-quality, corrective instruction designed to remedy whatever learning errors the assessment identified" (p. 9).

Diagnostic assessment of student learning will help ensure success for all students. Gandal and McGiffert (2003, February) stated, "Just as medical tests help diagnose and treat patients, rigorous and meaningful education assessments can help ensure the academic health of all students" (p. 39). Using only teacher assessments of students' achievement does not allow a healthy check of how students are meeting state standards, because a teacher-assigned grade in one school can mean something drastically different in another. Even with this limitation, instruction focusing on students' weaknesses can improve student achievement. Gandal and McGiffert acknowledged the importance of statewide testing but also recommended the use of teacher assessments to improve student achievement and improve instruction. Districts must begin to use ongoing assessments that give immediate feedback to teachers and students, enabling teachers to address students' weaknesses.

NCLB Security Issues

Policies that require exams to be secure, or closed, with items not disclosed to the public can have negative results if insufficient data are reported. While these policies may be necessary for test reliability when test items are reused in subsequent administrations, the ability to improve student achievement is limited to whatever reported results are available to school districts and parents. There is a need to identify specific individual student weaknesses in order to improve individual student achievement and to target topics or skills that need additional or alternate instruction. When educational professionals have only a few numbers to describe how a student or group of students achieved, it is very difficult to target instruction. Additionally, there have been difficulties with state testing that have yielded dramatic results.

Bowman (2003, November 19) reported that Florida's 1st District Court of Appeals ruled in 2003 that a father did not have the right to view the graduation test his son had repeatedly failed. The father was only allowed to view his son's score on the Florida Comprehensive Assessment Test (FCAT). The father, Steven O. Cooper, sued the State of Florida in 2001 after being denied access to his son's test booklet and answer sheets. Florida education officials argued that creating a new test each year would be prohibitively expensive. Bowman (2003, November 19) reported that Governor Jeb Bush praised the Court of Appeals decision saying, "The Florida Comprehensive Assessment Test has been a catalyst for student achievement, and today's decision allows us to maintain meaningful standards, while giving parents and educators the ability to monitor student gains" (p. 5).

Blair (2004, January 14) reported that in Chicago, the U.S. Court of Appeals confirmed the ruling of the lower court that a teacher and a publication editor did not have the right to publish several social studies and English tests that the Chicago school system had been piloting for three years. The publication editor argued that the public had the right to determine whether the exams were appropriate for the students. Judge Richard Posner wrote in the 3-0 decision that the teacher and editor had a right to be critical of the tests but not to publish the tests in their entirety (p. 5).

New Jersey has a secure test policy where teachers and parents may not view the three state administered tests: High School Proficiency Assessment (HSPA), the GEPA, and the NJ Assessment of Student Knowledge (NJ ASK). The NJDOE (2005b) warned, "Examiners, proctors, and other school personnel are NOT to look at, discuss, or disclose any test items before, during, or after the test administration" (p. 2). The NJDOE (2006b) further warned, "Security breaches may have financial consequences for the district, professional consequences for staff, and disciplinary consequences for students" (p. 8). The NJDOE (2006b) explained the reason for this as some of the items on the assessment will reappear in subsequent administrations and it is necessary to maintain the stability of the test.

GEPA Reporting

The NJDOE supplied the following reports: Individual Student, Summary of School Performance, School Performance by Demographic Groups, School Student Rosters, and Summary of District Performance. The NJDOE (2005b) reported the number of points each student received on each of the four core content clusters, on knowledge, and on problem solving, along with the scale score used to determine the proficiency level. The NJDOE (2005b) reported, "Cluster Data: Cluster data are provided to help identify students' strengths and weaknesses" (p. 17). The NJDOE (n.d.a) lists the skills and concepts that comprise each of the four content clusters.

The cluster, Number and Numerical Operations, consists of at least 15 separate skills or concepts. The cluster, Geometry and Measurement, consists of at least 19 separate skills or concepts, several of which have numerous components. The cluster, Patterns and Algebra, consists of at least 10 skills and concepts. The cluster, Data Analysis, Probability, and Discrete Mathematics, consists of at least 14 skills or concepts.

Reporting individual and group scores is essential for the improvement of curriculum and instruction to meet the challenging standards set by NCLB. GEPA reporting consists of a scale score and six sub-scores. Four sub-scores measure the four mathematics content clusters. The remaining two sub-scores are knowledge, which is the sum of the total points earned on the four previously mentioned sub-scores, and problem solving skills.

Assessment Errors

Errors in scoring can have great implications for states, districts, schools and students. The United States General Accounting Office (2002) noted that errors had been detected in contractor scoring by local district officials, parents, and individuals at state agencies. "Based on erroneous scores calculated by a contractor, one state sent thousands of children to summer school in the mistaken belief that their performance was poor enough to meet the criterion for summer intervention" (p. 16). The U.S. General Accounting Office (2002) also noted, "based on a contractor's erroneous scoring, a state incorrectly identified several schools as 'in need of improvement,' a designation that carries with it both bad publicity and extra expense" (p. 16).

In June 2003, the New York State Department of Education had difficulties with the Math A test, which is a graduation requirement. Based upon a survey by the State Department of Education, only 37 percent of the students taking the exam passed it. Richard Mills, Commissioner of Education for New York State, was quoted in Dillon (2003, June 25): "I think we made some mistakes with this exam, and it's up to us to identify and correct them" (p. B4). Because graduations were imminent, seniors were exempted from passing the test. Mills established an independent panel to review the exam and analyze its results, with the charge to make specific recommendations. Some of the recommendations included revising the mathematics standards to make them clearer and easier for teachers to apply, producing a suggested K-12 scope and sequence curriculum, and establishing a new Math A exam. By the end of August a new scoring chart for the June 2003 exam was created, raising most students' scores.

The U.S. General Accounting Office (2002) recommended to the Secretary of Education, Rod Paige, that,

   Assessment results are a key part of the mechanism for holding both
   schools and states accountable for improving educational performance.
   Thus, ensuring the completeness and accuracy of assessment data is
   central to measuring students' progress and ensuring accountability.
   Without adequate oversight of assessment scoring, efforts to identify
   and improve low-performing schools could be hindered by lack of
   confidence in assessment results or uncertainty regarding whether
   particular schools have been appropriately identified for improvement.
   (p. 19)

Assessment Irregularities

Popham (2006, April 19) pointed out that with the increased stakes of state testing under NCLB, some educators are more likely to resort to testing infractions to demonstrate improved test scores. Hurst (2004, October 6) reported that the number of testing irregularities in Nevada's public schools increased by more than 50 percent from the 2002-03 school year to 2003-04. The majority of the 121 incidents occurred at the secondary level, where students have access to technology such as cell phones that take pictures and can send text messages. Hurst (2004, October 6) further reported that, in response to this finding, Keith Rheault, Nevada state superintendent, maintained that the increased demand for schools to meet adequate yearly progress puts more pressure on teachers and students. Keller (2002, January 16) reported that in Austin, Texas, where district administrators were alleged to have manipulated state testing data, the district pleaded no contest and paid a fine of $5,000. Hoff (2003, November 5) cited 21 teachers caught cheating from 1998 through mid-2002. Hoff (2003, November 5) also reported that Robert Schaeffer, director of the Center for Fair & Open Testing, maintained that one could predict that some teachers and students will resort to cheating when the pressure to perform is increased. Manzo (2005, January 19) quoted Walter M. Haney, professor of education at Boston College, "'Even if there's not outright fraud, where people become so obsessed with raising test scores on one relatively narrow test,' cheating and other improprieties are likely to occur" (p. 14).

Assessment of students occurs for various purposes. The main purpose of assessment is to improve student achievement. Additional purposes are to improve instruction, to alter instruction (as in formative assessment) and to determine the mastery of content and skills by students. It is important that the assessments give an accurate measure of student achievement and be reliable from one administration to another. It is necessary to have security measures in place and uniform testing conditions in order to have results that will be meaningful.

Curricular Changes

The district in this study took several actions following the results of the 2003 GEPA administration. The district provided summer staff workshops for teachers to rewrite assessments and to examine the role of non-algebraic topics in algebraic courses during the summers of 2003 and 2004. During the fall of 2003, the district established a committee of teachers and administrators to examine new textbooks for all grade eight math courses. Following an analysis of the NJ state standards and textbooks, two textbooks were selected and piloted in two classes for approximately two months. The committee reconvened and selected one of these texts, Algebra 1, authored by Larson, Boswell, Kanold, and Stiff (2004), to be used in the three ability level math courses: Algebra Part 1, Algebra 1, and Algebra 1 Honors. These new textbooks replaced the ten-year-old textbooks formerly used for these grade eight mathematics courses.

Teachers using these new textbooks received one day of initial training from the publishing company. Students in Algebra Part 1 began using the new textbooks in September 2004. During the summer of 2004, teachers and administrators met and wrote the curriculum guide to be used by all teachers teaching Algebra Part 1. During two staff development days in November 2004, teachers met to discuss the progress of the new materials and curriculum and developed a common midterm exam and other assessments. Teachers met monthly for departmental meetings as well as informally in planning sessions.

Method

Setting

The setting for this study was a suburban public school district in central New Jersey with an enrollment of over 9,000 students in kindergarten through grade 12. The district has eight elementary schools serving kindergarten through grade five, a middle school with grades six and seven, a junior high school with grades eight and nine, and a high school with grades ten through twelve. The ethnicity of the district is 68 percent white, 24 percent Asian, 3 percent Hispanic, 3 percent African American, and 2 percent other. The socioeconomic status of the community is primarily middle and upper middle class, with many residents commuting to New York City for employment. The total cost per pupil, including transportation, was $11,073 during the 2002-2003 school year and $12,021 during the 2004-2005 school year; the New Jersey state averages were $11,646 and $12,567 during the same school years.

Research Question

Following curricular changes, what is the relationship of teacher assessments to the New Jersey NCLB Grade Eight Proficiency Assessment (GEPA)?

Independent Variables

Teacher assessments in mathematics consisted of four components: 1) first marking period grade; 2) second marking period grade; 3) third marking period grade; and 4) midterm exam. The marking period grade is defined as the weighted average assigned to a student by the student's teacher during a consecutive ten-week period; the three marking period variables are the first, second, and third marking period grades. Students in the same course were administered common assessments as part of the departmental practice. Each marking period grade was based upon the math department grading policy of the school district, consisting of a weighted average of 50 percent major assessments (tests), 25 percent minor assessments (quizzes), and 25 percent performance assessments (homework completion and class participation). Each marking period grade was calculated automatically using Intergrade software, resulting in a numerical percentage from 0 to 100.
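
As a minimal illustration of the grading policy described above, the following Python sketch computes a marking period grade from hypothetical category averages. The function and variable names are illustrative assumptions, not part of the district's Intergrade software.

    def marking_period_grade(test_avg, quiz_avg, performance_avg):
        """Weighted average per the district policy: 50 percent major assessments
        (tests), 25 percent minor assessments (quizzes), and 25 percent performance
        assessments (homework completion and class participation).
        All inputs and the result are percentages on a 0-100 scale."""
        return 0.50 * test_avg + 0.25 * quiz_avg + 0.25 * performance_avg

    # Hypothetical student: 78 on tests, 85 on quizzes, 90 on performance tasks.
    print(round(marking_period_grade(78, 85, 90), 1))  # 82.8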

The Midterm exam was the percentage correct that a student answered on a common criterion referenced test created by all teachers instructing students in Algebra Part 1 within the school district. The same midterm exam was administered to all students in Algebra Part 1, as is the departmental practice of the school district.

Dependent Variables

GEPA consists of five subscales: 1) Number sense; 2) Spatial sense; 3) Data analysis; 4) Patterns and functions; and 5) GEPA knowledge.

Number sense, based upon the New Jersey Core Curriculum Content Standard 4.1 (Number and Numerical Operations), was the GEPA subscale that measured the numerical skills of grade eight students. The range of scores for this scale was 0-12 and was converted to a percentage based on a total of 12 points (NJDOE, 2005b).

Spatial sense, based upon New Jersey Core Curriculum Content Standard 4.2 (Geometry and Measurement), was the GEPA subscale that measured the spatial and measurement skills of grade eight students. The range of scores for this scale was 0-12 and was converted to a percentage based on a total of 12 points (NJDOE, 2005b).

Patterns and functions, based upon New Jersey Core Curriculum Content Standard 4.3 (Patterns, Functions, and Algebra), was the GEPA subscale that measured the algebraic skills of grade eight students. The range of scores for this scale was 0-12 and was converted to a percentage based on a total of 12 points (NJDOE, 2005b).

Data analysis, based upon New Jersey Core Curriculum Content Standard 4.4 (Data Analysis, Probability, Statistics, and Discrete Mathematics), was the GEPA subscale that measured the data analysis, probability, statistics, and discrete mathematics skills of grade eight students. The range of scores for this scale was 0-12 and was converted to a percentage based on a total of 12 points (NJDOE, 2005b).

GEPA knowledge was the sum of the four sub-scores: 1) Number sense, 2) Spatial sense, 3) Patterns and functions, and 4) Data analysis. The range of scores for this scale was 0-48 points and is directly converted to the GEPA scale score (NJDOE, 2005b).

GEPA scale score measured how well prepared the student was relative to the New Jersey Core Curriculum Content Standards (NJDOE, 2003b). The New Jersey Department of Education has identified levels of proficiency. The range of scores for this scale was 150 to 300. Students with scores within the range of 150-199 are considered "partially proficient." Students with scores within the range of 200-249 are considered "proficient," while students with scores within the range of 250-300 are considered "advanced proficient" (NJDOE, 2003c).
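
To make the scoring rules above concrete, the following Python sketch converts a raw cluster score (0-12 points) to a percentage, sums the four clusters into GEPA knowledge (0-48 points), and labels a scale score with the NJDOE proficiency band. It is an illustration under the definitions above; the function names and sample values are assumptions.

    def cluster_percent(raw_points, max_points=12):
        """Convert a raw cluster score (0-12 points) to a percentage."""
        return 100.0 * raw_points / max_points

    def gepa_knowledge(number_sense, spatial_sense, patterns_functions, data_analysis):
        """GEPA knowledge is the sum of the four cluster raw scores (0-48 points)."""
        return number_sense + spatial_sense + patterns_functions + data_analysis

    def proficiency_level(scale_score):
        """Map a GEPA scale score (150-300) to its NJDOE proficiency band."""
        if scale_score < 200:
            return "partially proficient"   # 150-199
        elif scale_score < 250:
            return "proficient"             # 200-249
        return "advanced proficient"        # 250-300

    print(round(cluster_percent(8), 1))   # 66.7
    print(gepa_knowledge(8, 6, 7, 9))     # 30
    print(proficiency_level(212))         # proficient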

Selection of Subjects

This study is limited to one school district and the students who were assigned to the mathematics course Algebra Part 1 during the 2002-2003 school year (2003 cohort) and the 2004-2005 school year (2005 cohort). The ability level mathematics course Algebra Part 1 was selected due to the number of students enrolled as well as having the highest number of students failing to demonstrate proficiency on the GEPA. The total number of students at this school taking the GEPA was 778 in 2003 and 723 in 2005. The number of students who were enrolled in Algebra Part 1 for the first three marking periods and who took the GEPA was 254 in 2003 and 218 in 2005.

Procedure

Approval for the access and use of student data was obtained in writing from the superintendent of the school district prior to collecting any data. The request for use of the student data outlined the purpose of the study and how the analysis would be reported. The results and analysis were made available to the school district for its curricular purposes.

Teacher assessment data were collected electronically from the district's database. These data included the first marking period grade, second marking period grade, third marking period grade, and midterm exam. Using Microsoft Excel, these data were paired with the student database, which included demographic data. The GEPA scale score and sub-scores were manually entered into the Excel spreadsheet with the student name and teacher assessment for each student. These data were then exported to the Statistical Package for the Social Sciences (SPSS) for analysis.
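
The pairing step described above was carried out in Microsoft Excel; the sketch below shows a comparable merge in Python with pandas. The file and column names are hypothetical placeholders, not the district's actual exports.

    import pandas as pd

    # Hypothetical exports: gradebook data and manually entered GEPA scores.
    grades = pd.read_csv("teacher_assessments.csv")  # student_id, mp1, mp2, mp3, midterm
    gepa = pd.read_csv("gepa_scores.csv")            # student_id, cluster sub-scores,
                                                     # knowledge, scale_score

    # Keep only students present in both sources (enrolled for the marking periods and tested).
    merged = grades.merge(gepa, on="student_id", how="inner")
    merged.to_csv("algebra_part1_merged.csv", index=False)  # file later analyzed (SPSS in the study)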

The present study was conducted to determine the relationship between teacher assessments and the GEPA for two cohorts of eighth grade students during the 2002-2003 and 2004-2005 school years. The relationships between teacher assessments and each component of the GEPA were examined separately by calculating the variance explained by each contributing variable using stepwise multiple regression.
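
The study ran stepwise multiple regressions in SPSS. As one way to approximate that procedure, the sketch below implements a simple forward stepwise selection in Python with statsmodels, adding the teacher assessment component with the smallest coefficient p-value at each step. The column names, entry criterion (p < .05), and file name are assumptions for illustration, not the study's actual SPSS settings.

    import pandas as pd
    import statsmodels.api as sm

    def forward_stepwise(df, predictors, outcome, p_enter=0.05):
        """Greedy forward selection: repeatedly add the candidate predictor whose
        coefficient has the smallest p-value, as long as that p-value is below p_enter."""
        selected, remaining = [], list(predictors)
        while remaining:
            pvals = {}
            for cand in remaining:
                X = sm.add_constant(df[selected + [cand]])
                pvals[cand] = sm.OLS(df[outcome], X).fit().pvalues[cand]
            best = min(pvals, key=pvals.get)
            if pvals[best] >= p_enter:
                break
            selected.append(best)
            remaining.remove(best)
        final = sm.OLS(df[outcome], sm.add_constant(df[selected])).fit() if selected else None
        return selected, final

    df = pd.read_csv("algebra_part1_merged.csv")
    components = ["mp1", "mp2", "mp3", "midterm"]   # teacher assessment components
    order, model = forward_stepwise(df, components, "number_sense")
    print(order, round(model.rsquared, 3) if model else None)  # entry order and R-squared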

Results

The relationship between teacher assessments and the GEPA was stronger for the 2005 administration than the 2003. For the GEPA component, number sense, the relationship went from R² = .066 in 2003 to R² = .164 in 2005, as illustrated in Table 2. Using step-wise regression, the midterm exam was the only component for teacher assessment used in both the 2003 and 2005 regression models.

For the GEPA component, spatial sense, the relationship went from R² = .084 in 2003 to R² = .221 in 2005. Using step-wise regression, the midterm exam was the only component for teacher assessment used in the 2003 regression model. The 2005 regression model used the third marking period grade and then the midterm exam. This indicated that the third marking period grade in 2005 had a stronger relationship to spatial sense than in 2003.

For the GEPA component, patterns and functions, the relationship went from R² = .050 in 2003 to R² = .200 in 2005. Using step-wise regression, the midterm exam was the only component for teacher assessment used in the 2003 regression model. The 2005 regression model used the third marking period grade and then the midterm exam. This indicated that the third marking period grade in 2005 had a stronger relationship to patterns and functions than in 2003.

For the GEPA component, data analysis, the relationship went from R² = .063 in 2003 to R² = .278 in 2005. Using step-wise regression, the midterm exam was the only component for teacher assessment used in the 2003 regression model. The 2005 regression model used the midterm exam and then the third marking period grade. This indicated that the third marking period grade in 2005 had a stronger relationship to data analysis than in 2003.

GEPA knowledge also had a stronger relationship, going from R² = .109 in 2003 to R² = .336 in 2005. Using step-wise regression, the midterm exam was the only component for teacher assessment used in the 2003 regression model. The 2005 regression model used the third marking period grade and then the midterm exam. As with the previous three variables, this indicated that the third marking period grade in 2005 had a stronger relationship to GEPA knowledge than in 2003.

Since there were stronger relationships between teacher assessments and the GEPA, it was important to determine whether there were significant changes in the components of each. Table 1 illustrates the means for teacher assessments and the GEPA for the 2003 and 2005 cohorts. There was a decrease for the first marking period grade, from M = 81.278 (SD = 10.011) in 2003 to M = 77.807 (SD = 10.099) in 2005. Each component of teacher assessment had a lower mean in 2005 than in 2003. Conversely, there were increases in the means for each component of the GEPA from 2003 to 2005. Number sense had the greatest increase, of almost 17 percentage points, from M = 44.59, SD = 21.45 in 2003 to M = 61.58, SD = 19.16 in 2005. Based on the changes from 2003 to 2005, it was necessary to determine whether these changes were significant.

Random samples of 35 students from the 2003 cohort and 35 students from the 2005 cohort were selected. The null hypothesis was that there were no statistically significant differences between the scores from the 2003 and 2005 cohorts. Independent samples t tests were conducted with the null hypothesis tested at the p < .05 level. The differences between the 2003 and 2005 components of teacher assessments were not statistically significant, with p values of p = .318 for the first marking period grade, p = .655 for the second marking period grade, p = .691 for the midterm exam, and p = .120 for the third marking period grade.

Most of the differences between the 2003 and 2005 components of the GEPA were statistically significant at the p < .05 level. The differences in number sense, spatial sense, data analysis, and GEPA knowledge from 2003 to 2005 were all significant; with p values less than .05, the null hypothesis was rejected. These p values were p < .0005 for number sense, p = .009 for spatial sense, p < .0005 for data analysis, and p = .001 for GEPA knowledge. The p value for patterns and functions was p = .326 and, therefore, the null hypothesis could not be rejected for that component. Based on these findings, there were significant increases from 2003 to 2005 for most components of the GEPA.
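
A sketch of the independent samples t tests described above, using SciPy; the per-cohort files, column names, and random seed are assumptions for illustration, not the study's SPSS output.

    import pandas as pd
    from scipy import stats

    cohort_2003 = pd.read_csv("algebra_part1_2003.csv")   # hypothetical per-cohort files
    cohort_2005 = pd.read_csv("algebra_part1_2005.csv")

    # Random samples of 35 students per cohort, as in the study.
    s03 = cohort_2003.sample(n=35, random_state=1)
    s05 = cohort_2005.sample(n=35, random_state=1)

    for col in ["mp1", "mp2", "midterm", "mp3",
                "number_sense", "spatial_sense", "patterns_functions",
                "data_analysis", "knowledge"]:
        t, p = stats.ttest_ind(s03[col], s05[col])
        print(f"{col}: t = {t:.3f}, p = {p:.3f}")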

Were the significant increases in GEPA components reflected in the proficiency of the Algebra Part 1 students on the GEPA? In 2003, of the 254 students in Algebra Part 1, 52.8 percent were identified as proficient or advanced proficient, indicating that these students should not need remedial instruction. In 2005, the percentage increased: of the 218 students in Algebra Part 1, 68.3 percent were identified as proficient or advanced proficient, as illustrated in Figure 2.

Further analysis was conducted to determine whether this increase in the percent of students scoring proficient or advanced proficient was significant. Random samples of 35 students were selected from both the 2003 cohort and the 2005 cohort. The value of 0 was entered in the variable, proficiency level, for students who scored partially proficient, and the value of 1 was entered for students who scored either proficient or advanced proficient. The Mann-Whitney nonparametric test was conducted on these data with significance set at p < .05. The mean rank was 30 and the sum of the ranks was 1050 for the 2003 GEPA; the mean rank was 41 and the sum of the ranks was 1435 for the 2005 GEPA. Based on this, the percent of students proficient or advanced proficient in 2005 was statistically greater than in 2003, with a p value of p = .009.
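
A comparable sketch of the Mann-Whitney test on the 0/1 proficiency indicator, using SciPy; the files and column names are assumptions, and the scale-score cutoff of 200 follows the proficiency bands defined earlier.

    import pandas as pd
    from scipy import stats

    cohort_2003 = pd.read_csv("algebra_part1_2003.csv")
    cohort_2005 = pd.read_csv("algebra_part1_2005.csv")

    # Code proficiency: 1 if proficient or advanced proficient (scale score of 200 or above), else 0.
    prof_2003 = (cohort_2003["scale_score"] >= 200).astype(int)
    prof_2005 = (cohort_2005["scale_score"] >= 200).astype(int)

    # Random samples of 35 students per cohort, as in the study.
    s03 = prof_2003.sample(n=35, random_state=1)
    s05 = prof_2005.sample(n=35, random_state=1)

    u, p = stats.mannwhitneyu(s03, s05, alternative="two-sided")
    print(f"U = {u}, p = {p:.3f}")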

Discussion and Conclusions

In a climate of accountability and sanctions for schools that do not demonstrate Adequate Yearly Progress (AYP) under No Child Left Behind (NCLB) it is incumbent on educational leaders to passionately pursue methods and programs that improve student achievement. In this study, the relationship between teacher assessments and NCLB testing was determined for two testing administrations of the New Jersey GEPA. Following the 2003 GEPA administration, educational professionals aligned the curriculum to NJ state standards, researched and adopted new textbooks, and participated in staff development on assessment. The results for the 2005 GEPA were reported as having a stronger relationship between teacher assessments and the GEPA. Additionally, there was a statistically significant increase in most of the GEPA components. There was also a statistically significant increase in the percent of students that scored proficient or advanced proficient on the 2005 GEPA.

There are several possible causes for these increases, including that one or more of the actions taken by the educational professionals in the district were effective. Another possibility is that the mathematical achievement of the students in the 2005 cohort was higher than that of the 2003 cohort prior to the GEPA administrations. However, the district administration used the same criteria, including standardized test scores, to place students into the Algebra Part 1 course. There is also a possibility that the 2005 GEPA test items were not as difficult as those administered in 2003. However, the New Jersey Department of Education takes steps to maintain the statistical stability of each testing administration. These findings were limited to the two cohorts of students taking Algebra Part 1 and the GEPA.

It is necessary to align curriculum to the state standards for two reasons. The first is that these are the topics and skills that the state department of education has outlined as essential for all students. The second is that, in order to increase student achievement and make AYP, precious instructional time should be devoted to the topics and skills identified in the state standards. In states where insufficient data are reported on student achievement and where NCLB testing is secure and not released, this method of determining the relationship may yield important data for program improvement. However, the need to determine and address individual student weaknesses is not met by this method alone. Similar districts may find that common assessments, such as a midterm exam or district developed instrument, can yield valuable data for individual students through the use of item analyses.

Balancing the need for assessment security with usable data supplied by departments of education is imperative in meeting AYP goals.

Further Research

In this era of data-driven instruction it is important to determine how the accessibility of data for educational professionals informs instruction, improves programs, and increases student achievement. A qualitative study could be conducted to determine how educational professionals utilize achievement data. A study could be conducted to determine the relationship between teacher assessments and NCLB testing for schools that have met AYP goals and have high student achievement. A study could be conducted to determine the relationship between the Scholastic Aptitude Test (SAT) and NCLB high school testing. A study of a school or district that continually does not meet AYP goals could be conducted to determine the relationship between teacher assessment and NCLB testing.

[FIGURE 1 OMITTED]

[FIGURE 2 OMITTED]

References

Blair, J. (2004, January 14). Court rules against editor for publishing Chicago tests. Education Week, 23, p. 5.

Boston, C. (2002). The Concept of Formative Assessment. College Park, MD: ERIC Clearinghouse on Assessment and Evaluation. (ERIC Document Reproduction Service No. ED470206)

Bowman, D. H. (2003, November 19). Florida court rejects father's bid to view state test. Education Week, 23, p. 5.

Dillon, S. (2003, June 25). Citing flaw, state voids math scores. The New York Times.

Gandal, M., & McGiffert, L. (2003, February). The power of testing. Educational Leadership, 60, 39-42.

Guskey, T. R. (2003, February). How classroom assessments improve learning. Educational Leadership, 60, 7-11.

Herte, C. M. (2004). The Relationship of teacher assessment in grade eight mathematics to the New Jersey grade eight proficiency assessment in mathematics and the no child left behind accountability. ProQuest Information and Learning Company. Ann Arbor, MI, (UMI No. 3139638).

Hurst, M. D. (2004, October 6). Nevada report reveals spike in testing irregularities. Education Week, 24, p. 19, 22.

Keller, B. (2002, January 16). Austin cheating scandal ends in no-contest plea, fine. Education Week, 21, p. 3.

Larson, R., Boswell, L., Kanold, T.D., & Stiff, L. (2004). Algebra 1. Boston, MA: McDougal Littell.

Manzo, K. K. (2005, January 19). Texas takes aim at tainted testing program. Education Week, 24, p. 1,14.

McMillan, J. H., & Nash, S. (2000). Teacher classroom assessment and grading practices decision making. Richmond, VA: Metropolitan Educational Research Consortium. (ERIC Document Reproduction Service No. ED447195)

National Council of Teachers of Mathematics. (1997). Assessment Standards for School Mathematics. Reston, VA: Author.

National Council of Teachers of Mathematics. (2000). Principles and Standards for School Mathematics. Reston, VA: Author.

New Jersey Department of Education. (2003a). GEPA Student Preparation Booklet. Trenton, NJ: Author.

New Jersey Department of Education. (2003b). GEPA Test Manual. Trenton, NJ: Author.

New Jersey Department of Education. (2003c). School and district guidelines: Interpretation and use of GEPA results. Trenton, NJ: Author.

New Jersey Department of Education. (2005a). 2004-05 New Jersey School Report Card. Retrieved July 5, 2006 from www.state.nj.us/rc/rc05

New Jersey Department of Education. (2005b). GEPA Score Interpretation Manual: March 2005 Grade Eight Proficiency Assessment (GEPA). Trenton, NJ: Author.

New Jersey Department of Education. (2006a). GEPA Student Preparation Booklet. Trenton, NJ: Author.

New Jersey Department of Education. (2006b). GEPA Test Manual: Grade Eight Proficiency Assessment March 2006. Trenton, NJ: Author.

New Jersey Department of Education. (n.d.a.). Mathematics Standards. Retrieved September 9, 2003 from www.njpep.org/standards/revised_standards/Math_newstandards

Popham, W. J. (2003, February). The seductive allure of data. Educational Leadership, 60, 48-51.

Popham, W. J. (2006, April 19). Educator cheating on No Child Left Behind tests: can we stop it? Education Week, 25, pp. 32-33.

United States General Accounting Office. (2002). Title I: Education needs to monitor states' scoring of assessments. Report to the Secretary of Education. Washington, DC: General Accounting Office.

Christopher Mark Herte, High School Supervisor of Mathematics, West Windsor-Plainsboro Regional Schools
Table 1:

Teacher Assessments and GEPA by Year

                       Year of   Mean     Std.        Std. Error
                       Test               Deviation   Mean

Marking Period 1       2003      81.278   10.011       .628
                       2005      77.807   10.099       .684
Marking Period 2       2003      79.656   10.755       .675
                       2005      76.982   12.654       .857
Midterm Exam           2003      73.917   11.207       .703
                       2005      70.982   14.889      1.008
Marking Period 3       2003      81.652   12.018       .754
                       2005      78.060   11.035       .747
Number Sense           2003      44.59    21.45       1.35
                       2005      61.58    19.16       1.30
Spatial Sense          2003      40.58    20.84       1.31
                       2005      44.50    21.73       1.47
Patterns & Functions   2003      58.69    19.67       1.23
                       2005      62.23    17.65       1.20
Data Analysis          2003      51.97    19.73       1.24
                       2005      63.76    18.28       1.24
GEPA Knowledge         2003      48.85    16.76       1.05
                       2005      58.02    15.32       1.04

2003: N = 254, 2005: N = 218

Table 2:
Regression Comparisons 2003a and 2005b

GEPA                                                Standard
Component              Year   R       R²           Error

Number Sense           2003   .256    .066          2.494
                       2005   .405    .164         17.559
Spatial Sense          2003   .289    .084          2.398
                       2005   .471    .221         19.265
Patterns & Functions   2003   .224    .050          2.305
                       2005   .447    .200         15.863
Data Analysis          2003   .251    .063          2.296
                       2005   .527    .278         15.605
GEPA Total Points      2003   .330    .109         22.930
                       2005   .579    .336         12.547

(a) n = 254 (b) n = 218