Monday, February 1, 2016

Science Program Evaluation and a System of Science Assessment

Assuming your district/school has established a vision for science education and large-scale, specific goals aligned to that vision, you will next need to determine a system of assessments for evaluating progress toward those goals. As mentioned in my last post, many districts are working to adopt and implement new science standards. Strategically assessing science-related outcomes at multiple levels will provide ongoing evidence of effective change – after all, why make changes if you don’t know whether they actually make any difference?

While it might be obvious, an evaluation of a science program based on these goals will take more than one assessment! In other words, the annual state standardized test, often the only systematic science test a school uses, will not measure the full range of outcomes tied to a meaningful vision for science education. Measuring that full range requires leaders to strategically implement a system of assessments. The Wisconsin DPI has a chart that illustrates some components of such a system, including formative, interim, and summative elements.

The majority of assessment will happen formatively at the classroom level. This is where teachers see the day-to-day use of scientific practices by their students as they investigate, communicate, and ask questions about science. It will be critical for teachers to have structures in place to discuss what they're observing from their students and to collaboratively determine next steps. These processes of informal formative assessment should drive instructional practice. If schools are moving toward the NGSS or the NRC Science Framework, formative assessment, like assessment at every other level, should be three-dimensional.

Common interim assessments and rubrics across classrooms and grade levels can support a collaborative understanding of students' abilities. These assessments provide a more formal view of student growth in science content knowledge and practice. Quality performance tasks can potentially provide the clearest information for collaborative groups of teachers to reflect on progress toward their goals. They need to be implemented well, however, in order to be useful: teachers must have time to score student work together and come to agreement on how particular examples meet the rubric criteria.

Large-scale district summative tests (or state-level tests) often afford the least data for specific instructional guidance. They might, however, suggest areas for professional development or foci for revised student project rubrics. For example, a set of district end-of-course exams might all show that students across the district struggle with using data effectively. These tests are often multiple-choice, a format that provides limited information about authentic science practice, but they can be paired effectively with open-ended opportunities for students to describe their reasoning.

An often-overlooked element of such an assessment system is an evaluation of students' attitudes toward science and their general scientific literacy. Do they see how science relates to their lives? Can they make sense of scientific evidence in popular media? Is science meaningful to them?

In summary, schools and districts reviewing and attempting to improve their science programs will struggle to gauge their success if they haven't defined which outcomes they want and how to measure them. A meaningful and strategic system of science assessment is an essential part of this process.


The next series of blog posts will discuss formative, interim, and summative assessments in more depth, as well as effective surveys of student attitudes. Each post will provide examples of these assessment types and suggestions for classroom or school use.