Tips From Dr. Marzano
Formative Assessment & Standards-Based Grading
In lieu of formative assessments and summative assessments, the terms formative scores and summative scores can be used to describe how teachers employ assessments in the classroom.
Assessments have many forms and many uses, two of which are to provide formative and summative scores to students. The word score, rather than assessment, in the terms formative score and summative score signals that the essential difference between the two is not their format but their purpose in the assessment process. Teachers record and track formative scores from individual assessments as indicators of students’ knowledge or skill at particular moments in time. In comparison, summative scores are final scores based on the pattern of students’ responses over time. Teachers may base each score on a number of common assessment forms, such as obtrusive, unobtrusive, and student-generated assessments. However, formative scores are used for tracking progress, while summative scores express students’ mastery of a topic, generally at the end of a unit (pp. 27–28).
Formative scores should never be averaged to arrive at a student’s summative score.
Averaging may seem like a logical way to determine a student’s cumulative score at the end of a unit; however, this method is antithetical to the key principles of formative assessment. When a teacher tracks a student’s formative scores for one unit, the student’s scores will generally show a progression of learning. This means that a student’s scores will likely be lower at the beginning of a unit than at the end. Therefore, if a teacher averages a student’s formative scores to calculate a summative score, the resulting summative score would be lower than the student’s actual current level of skill, as it would give early scores the same weight as later scores. To avoid inaccurate summative scores, teachers can give more weight to scores at the end of the unit, which generally best reflect students’ level of mastery (p. 28).
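The arithmetic behind this advice can be sketched in a short program. This is an illustrative example only, not a formula from the text: the sample scores and the linearly increasing weights are hypothetical choices, used to show how giving later scores more weight moves the summative estimate closer to a student's end-of-unit performance.

```python
# Illustrative sketch (not Marzano's prescribed method): a simple average of
# formative scores versus a recency-weighted average. Scores and weights below
# are hypothetical.

def simple_average(scores):
    """Equal weight to every score; early, lower scores drag the result down."""
    return sum(scores) / len(scores)

def recency_weighted_average(scores):
    """Later scores count more, better reflecting end-of-unit mastery."""
    weights = range(1, len(scores) + 1)  # 1, 2, ..., n
    total = sum(w * s for w, s in zip(weights, scores))
    return total / sum(weights)

# A typical learning progression on a 0.0-4.0 proficiency scale.
formative_scores = [1.5, 2.0, 2.5, 3.0]

print(simple_average(formative_scores))            # 2.25
print(recency_weighted_average(formative_scores))  # 2.5
```

The weighted result (2.5) sits closer to the student's final demonstrated level (3.0) than the simple average (2.25), which treats early struggles as equal evidence of current mastery.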
A summative score is based on formative scores collected throughout a unit rather than a single final assessment.
While final cumulative assessments can be useful in gathering data about students’ current knowledge and skill in a topic area, every assessment contains error, which necessarily limits the definitiveness of any one assessment. It is essential that students’ summative scores are based on multiple sources of data to lessen the inherent error in all test forms. In addition, a student’s last formative assessment score is not an appropriate summative score, as it may not necessarily reflect the student’s current level of knowledge and skill. Teachers can evaluate individual students’ or the class’s learning progressions in tandem with any final assessment scores to determine the most representative summative scores (p. 29; see pp. 81–98 for a more detailed discussion).
Short oral responses are a great opportunity to provide instructional feedback.
Short oral responses are a great informal way to ensure that students grasp classroom content. Teachers pose questions and call on students to answer them, creating a low-stakes assessment opportunity and allowing teachers to correct any errors in understanding (that is, give instructional feedback). When students respond, it is important to ask students why they think their answer is correct, rather than simply judging the answer to be right or wrong and moving on. These opportunities for discussion and explanation give both teachers and students the chance to see what is clear and not clear about content in a low-stress environment, and teachers gain the opportunity to clarify any issues before moving on to more advanced content (p. 70).
Formal oral reports can be used in tandem with proficiency scales to serve as obtrusive formative assessments.
Oral reports, a classic formative assessment, can be likened to written essays, for which students develop multiple drafts before arriving at a final product, though with the added step of delivering that final product orally. In the same way that written essays can be scored using a proficiency scale, formal oral reports can also be scored using a proficiency scale. To do this, teachers should clearly specify the content that students should address in their presentations and understand the proficiency scale that will be used to score the oral report. The proficiency scale should identify content at the basic (2.0), proficient (3.0), and advanced (4.0) levels (pp. 70–71).
Teachers using probing discussions should tailor their follow-up questions to proficiency scales in order to ask the most useful questions for assessing a student’s understanding.
In a probing discussion, a teacher “meets one-on-one with a particular student and asks him or her to explain or demonstrate something.” In these situations, after or as the student gives his or her response, the teacher asks questions about that student’s responses. These follow-up questions are designed to give the teacher a clear idea of what a student does or does not know. In designing questions for probing discussions, teachers should use a proficiency scale for guidance, creating questions that align to the 2.0, 3.0, and 4.0 levels of the scale. As a student answers each question, the teacher evaluates the response as correct, incorrect, or partially correct and uses the student’s pattern of responses to assign a score. Once finished with the probing discussion, a teacher can use the results as a formal assessment by writing down the score in a grade book or as an opportunity for instructional feedback by correcting any misconceptions a student may have about material (p. 71).
Most assessments in today’s classrooms are based on a 100-point scale. The improper use of this scale can lead to incorrect student achievement scores.
One source of error associated with the 100-point scale is the range of scoring practices between classrooms. This type of scale reflects little to nothing about the difficulty level of each assessment. Weighting items differently from assessment to assessment, combined with uneven levels of difficulty, is akin to changing the scale that is being used from one assessment to the next. As a result, tracking student achievement over time using the 100-point scale can be tremendously difficult (p. 41).
A well-written scale can be thought of as an applied version of learning progression.
A scale should make it easy for teachers to design and score assessments. To be most useful, scales should be written in student-friendly language. Teachers should introduce each scale to students and explain what content is associated with each score value. Below is an example of a generic scale (pp. 44–45).
Table 3.5 Generic Form of the Scale

Score 4.0: More complex content
Score 3.0: Target learning goal
Score 2.0: Simpler content
Score 1.0: With help, partial success at score 2.0 content or higher
Score 0.0: Even with help, no success
Well-constructed scales are critical to scoring demonstrative and unobtrusive observations.
Unobtrusive assessments are most easily applied to demonstrations, since demonstrating a skill usually involves doing something observable. Mental procedures are more difficult to observe, however; typically, a teacher would need to ask probing questions to draw the student into a discussion. This discussion would be key to assessing the student’s level of skill (pp. 74–75).
Three types of assessments can and should be used in a classroom for a comprehensive system of formative assessment: obtrusive assessments, unobtrusive assessments, and student-generated assessments.
Student-generated assessments are probably the most underutilized form of classroom assessment. As the name implies, a defining feature of student-generated assessments is that students generate ideas about the manner in which they will demonstrate their current status on a given topic. To do so, they might use any of the types of obtrusive assessments discussed in the preceding text (pp. 23–24).
For example, one student might say that she will provide oral answers to any of the 20 questions in the back of chapter 3 of the science textbook to demonstrate her knowledge of the topic of habitats. Another student might propose that he design and explain a model of the cell membrane to demonstrate his knowledge of the topic (p. 25).
When tracking student progress using formative assessment, a 0 should not be used for a missing or incomplete assignment.
A score of 0 is never recorded in the gradebook if a student has missed an assessment or has not completed an assignment. Many assessment researchers and theorists have addressed this issue in some depth (Guskey & Bailey, 2001; Reeves, 2004). Briefly, no score should be entered into a gradebook that is not an estimate of a student's knowledge status for a particular topic at a particular point in time (p. 85).
Student-friendly scales should include examples of what a correct answer would look like for score 2.0, 3.0, and 4.0 content.
Scales that have been rewritten in student-friendly language should provide students with clear guidance as to what it would look like to demonstrate score 2.0, 3.0, and 4.0 competence (see Table 3.7 for an example of a student-friendly scale). It is much more likely that students have really considered and come to understand the goals when teachers give the class the opportunity to rewrite the scale(s) in their own words (pp. 46, 141).
One fact that must be kept in mind in any discussion of assessment—formative or otherwise—is that all assessments are imprecise to one degree or another.
This imprecision is explicit in a fundamental equation of classical test theory, which can be represented as follows:
Observed score = true score + error score (p. 13)
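The error term in this equation also underlies the earlier advice to base summative scores on multiple assessments. As a standard classical test theory sketch (not worked out in the source), suppose each of m assessments measures the same true score T with an independent error E_i of variance σ²_E. Averaging the observed scores leaves the true score intact while shrinking the error:

```latex
% Observed score on assessment i: X_i = T + E_i, with independent errors E_i
\bar{X} = \frac{1}{m}\sum_{i=1}^{m} X_i
        = T + \frac{1}{m}\sum_{i=1}^{m} E_i,
\qquad
\operatorname{Var}\!\left(\frac{1}{m}\sum_{i=1}^{m} E_i\right)
        = \frac{\sigma_E^2}{m}.
```

The more assessments that inform a summative score, the smaller the expected error, which is why no single assessment, including the final one, is an adequate basis for a summative score on its own.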