Component 3: Selecting Measures

The principal evaluation system purposes and standards should clearly define the types of practices and outcomes that will be assessed by the evaluation system, and measures should be selected accordingly. Measures are the methods that evaluators use to determine principals’ levels of performance. Principal evaluation approaches typically include measures of principal practice (i.e., the quality of principals’ performance on certain tasks or functions) and outcomes (i.e., anticipated impact on schools, teaching, and students). Selecting or, if need be, developing appropriate measures is essential to evaluation system design.

System design should carefully balance feasibility and fidelity of implementation against validity and reliability. An evaluation system that attempts to measure too much can become burdensome for principals, teachers, and evaluators, whereas one that measures too little can be viewed as invalid. A cumbersome and costly evaluation system is also unlikely to be implemented with strong fidelity.

For a more detailed discussion of these topics, see the full downloadable Acrobat version of A Practical Guide to Designing Comprehensive Principal Evaluation Systems.


[Map 3.1. Selecting measures: Evaluation system's purpose; Strength of measures; Application to all leadership contexts; Human & resource capacity]

[Map 3.2. Measuring growth in tested subject: Contributions to student learning growth; Tested subjects; % results based on growth model; Identification of teachers; Data linkage; Determination of adequate growth]

[Map 3.3. Alternative growth measures: Measures other than standardized tests; Identification of teachers; Identification of measures]


Guiding questions

  • How well does the selected measure align with the evaluation system’s purposes and definition of principal effectiveness?
  • Can the measure yield data to monitor the evaluation system?
  • Does the selected measure assist the state or district to meet pertinent federal, state, or other guidelines for principal evaluation?
  • What is the strength of evidence that the measure is fair, valid, reliable, feasible, and useful for all of the contexts of intended use?
  • What processes are in place (or need to be) to ensure the fidelity of the measure?
  • How do selected multiple measures complement each other to strengthen the performance evaluation?
  • Do the measures overlap so that they are redundant?
  • Do the measures contradict each other so that they are misaligned?
  • Is the measure reliable, valid, fair, feasible, and useful for all school leadership contexts?
  • How accurately do student growth measures depict student performance, regardless of context and, in particular, in nontested grades and subjects?
  • What human and resource capacity is necessary to implement the measure reliably and with validity?
  • Are there specific training needs that should be considered?
  • Who will be responsible for maintaining performance data and monitoring system quality?
  • Can resources be pooled between and within districts to implement the measure?

Plan to Use Other Measures or Satisfied With Current System


  • Will the other measures be rigorous and comparable across classrooms within a school and across schools?
  • How will other measures be used to generate principal evaluation results?
  • Is there evidence that the other measures can differentiate among teachers who are helping students learn at high levels and those who are not?
  • Will excluding student achievement as a factor be acceptable to the state legislature and the community?
  • How will measures be aggregated (e.g., an average of teacher scores) to provide a principal score?
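
The aggregation question above can be made concrete with a minimal sketch. It assumes each teacher has a single score on a common scale and a known student count; the function names and the sample figures are hypothetical and are not drawn from any state's system. The sketch contrasts a simple average of teacher scores with an average weighted by the number of students each teacher serves.

```python
from statistics import mean

def simple_average(teacher_results):
    """Unweighted mean: every teacher counts equally."""
    return mean(score for score, _ in teacher_results)

def student_weighted_average(teacher_results):
    """Mean weighted by the number of students behind each teacher score."""
    total_students = sum(count for _, count in teacher_results)
    if total_students == 0:
        raise ValueError("no students behind the teacher scores")
    return sum(score * count for score, count in teacher_results) / total_students

# Hypothetical (score, number_of_students) pairs; not real data.
teacher_results = [(3.0, 20), (4.0, 30), (2.0, 10)]
print(simple_average(teacher_results))
print(round(student_weighted_average(teacher_results), 2))
```

The two rules can yield different principal scores from the same teacher data, so the choice between them is a design decision rather than a technical detail.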


Plan to Use Student Achievement Growth


  • Are legislative changes required to implement an evaluation system that includes student growth as a component?
  • What types of data will need to be reported?
  • Does the state or district currently have human and financial capacity to collect, calculate, and report data with accuracy?
  • How will principals be matched to schools, and what decision rules need to be determined to attribute scores to a principal (e.g., for new principals or principals entering a school at mid-year)?
  • What types of data will be used in personnel decisions?


  • What statistical model of longitudinal student growth will promote the most coherence and alignment with the state’s accountability system? Examples: Colorado Growth Model, value-added models
  • How will the state or district select potential evaluation models? What technical characteristics does the state or district require?
  • Who will be involved in model selection and making decisions about model implementation (e.g., contextual variables to be included, determining exclusion and attribution rules)?
  • Who would support or oppose linking teacher and student data? Why? How will these concerns be addressed?
  • Will the other measures be rigorous and comparable across classrooms and schools?
  • Do these measures meet the federal requirements of rigor, measurement between two points in time, and comparability?


  • Should the percentage differ by the length of a principal’s leadership in a school, length of time as a school principal, or other factors (e.g., level of autonomy the principal has in the school, fiscal control)?
  • What percentage will be supported by the education community?
  • What will the state define as significant?
  • Is legislation necessary to determine the percentage?
  • Are the assessments reliable and valid to support a significant portion of the evaluation to be based on student progress?


  • Will all teachers of tested subjects be included?
  • What is the minimum number of students required for a teacher to be evaluated with student growth (e.g., five students per grade/content area)?
  • Are there certain student populations in which inclusion in value-added or other growth models may raise validity questions (e.g., students with disabilities, English learners)?
  • Can students working toward alternative assessments be included in the growth model?
  • How will the state or district choose a model? Will the task force meet with experts? Will the state assessment office investigate options?


Data Integrity


  • What validation process can be established to ensure clean data (e.g., teachers reviewing student lists, administrators monitoring input)?
  • Can automatic data validation programs be developed?
  • Are there certain student populations in which inclusion in value-added or other growth models is not appropriate (e.g., students with disabilities, English learners)?
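
As one illustration of what an automatic data validation program might check, the following minimal sketch flags suspect rows in a teacher-student linkage file. The record layout (student_id, teacher_id, course_id) and the specific checks are assumptions made for illustration; a state or district would define its own fields and rules. Flagged rows are candidates for review (e.g., by teachers verifying their student lists), not automatic errors, because a repeated student may reflect a legitimate co-teaching arrangement.

```python
from collections import Counter

def validate_roster(records):
    """Return (row_index, problem) pairs for linkage rows that need review."""
    problems = []
    # Count how often each student appears in the same course.
    seen = Counter((r.get("student_id"), r.get("course_id")) for r in records)
    for i, r in enumerate(records):
        if not r.get("student_id"):
            problems.append((i, "missing student_id"))
        if not r.get("teacher_id"):
            problems.append((i, "missing teacher_id"))
        key = (r.get("student_id"), r.get("course_id"))
        if r.get("student_id") and seen[key] > 1:
            problems.append((i, "student listed more than once for the same course"))
    return problems

# Hypothetical linkage records; not real data.
records = [
    {"student_id": "S1", "teacher_id": "T1", "course_id": "MATH5"},
    {"student_id": "S2", "teacher_id": "", "course_id": "MATH5"},
    {"student_id": "S1", "teacher_id": "T2", "course_id": "MATH5"},
]
for row, problem in validate_roster(records):
    print(row, problem)
```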


Teaching Context/Extenuating Circumstances


  • Have the teacher and principal attribution processes been established for all teaching and leadership situations?
  • How will teachers and principals in schools with high student absenteeism rates or highly mobile students be evaluated?
  • Has a focus group been held with teachers and principals to determine fair attribution?


  • How will performance standards be established for principals using student growth, and what will be considered “adequate” or “good”?
  • Will a relative or an absolute standard be set (e.g., growth-to-standard or relative growth)?
  • Will the standard be based on single-year estimates or estimates combined over time, subjects, or schools (for principals who change schools)?
  • How can uncertainty in growth or value-added estimates be taken into account in setting standards or assigning performance levels?
  • Who will be involved in setting standards?
  • Will the learning trajectory be different for at-risk, special needs, or gifted students?
  • Has the ceiling effect been addressed?
  • Will the use of accommodations affect the measure of student growth?
  • Does this measure meet the federal requirements of rigor, measurement between two points in time, and comparability?
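
One possible way to take uncertainty into account when assigning performance levels is sketched below. It assumes that each growth estimate comes with a standard error and that a rating is assigned only when an approximate 95 percent confidence interval lies entirely above or below the expected-growth threshold. The function name, the labels, the threshold of zero, and the 1.96 multiplier are illustrative assumptions, not a rule taken from any particular growth model.

```python
def classify_growth(estimate, standard_error, threshold=0.0, z=1.96):
    """Label a growth estimate only when its interval clears the threshold."""
    lower = estimate - z * standard_error
    upper = estimate + z * standard_error
    if lower > threshold:
        return "above expected growth"
    if upper < threshold:
        return "below expected growth"
    # The interval straddles the threshold, so the data cannot separate
    # this estimate from expected growth.
    return "not distinguishable from expected growth"

# Hypothetical (estimate, standard_error) pairs; not real data.
for estimate, se in [(0.5, 0.1), (0.1, 0.2), (-0.6, 0.2)]:
    print(estimate, se, classify_growth(estimate, se))
```

Under a rule of this kind, small schools with noisy estimates tend to fall in the middle category, which is one reason a state might combine estimates over several years before setting standards.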


Plan to Use Measures Other Than Standardized Tests but Not Student Achievement Growth

  • Will the other measures be rigorous and comparable across classrooms within a school and across schools?
  • How will other measures be used to generate principal evaluation results?
  • Is there evidence that the other measures can differentiate among teachers who are helping students learn at high levels and those who are not?
  • Will excluding student achievement as a factor be acceptable to the state legislature and the community?

Plan to Include Student Achievement Growth

  • What would be the challenge of using other measures of growth besides standardized assessment data?
  • Will the measures other than standardized tests be rigorous and comparable across classrooms?
  • Will all teachers (in both tested and nontested subjects) be evaluated with alternative growth measures? Only teachers of nontested subjects?
  • Which teachers fall under the category of nontested subjects?
  • Are there teachers of certain student populations or situations in which standardized test scores are not available or appropriate to utilize?
  • Will contributions to student learning growth be measured for related services personnel?

Content Standards

  • Do content standards exist for all grades and subjects?
  • Is there a consensus on the key competencies students should achieve in the content areas?
  • Can these content standards be used to guide selection and development of measures?

Measure Selection

  • Which stakeholders need to be involved in determining or identifying measures?
  • What type of meetings or facilitation will stakeholder groups require to select or develop student measures?
  • How will student growth in performance subjects (e.g., music, art, physical education) be determined?
  • Will the state use classroom-based assessments, interim or benchmark assessments, curriculum-based assessments, and/or the Four Ps (i.e., projects, portfolios, performances, products) as measures?
  • Are there existing measures that could be considered (e.g., end-of-course assessments, DIBELS, DRA)?
  • Could assessments be developed or purchased?
  • Are federal, state, or private funds available to conduct research?
  • How will the content validity be tested?
  • Can national experts in measurement and assessment be appointed to assist in conducting this research?


Guide to Evaluation Products

This online guide from the National Comprehensive Center for Teacher Quality includes detailed descriptions of more than 100 principal and teacher evaluation tools that are currently implemented and tested in districts and states throughout the country. Descriptions include research and resources, information on the teacher and student population assessed, costs, contact information, and technical support offered.

Using Short-Cycle Interim Assessment to Improve Educator Evaluation, Educator Effectiveness, and Student Achievement

This policy brief from Renaissance Learning discusses using assessment systems already in place in most school systems to supply additional estimates of teacher, principal, and school impact on student learning. “Short-cycle interim assessments” are a specific category of test designed to produce reliable results, whether administered weekly, monthly, or less frequently, depending on the purpose of assessment. Robust systems deliver results in multiple formats that are usable for a range of purposes. Short-cycle interim assessments can deliver additional insights with relatively little cost or delay.



The selection of measures for inclusion in the Tennessee principal evaluation system defines the types of practices and outcomes that will be assessed by the system. Student achievement measures (50 percent) are combined with measures of principals’ proficiency on professional standards and the quality of teacher observations (together 50 percent) to determine the level of principal effectiveness. Tennessee’s selection of measures has been informed by Race to the Top criteria and the availability of rigorous, feasible, and useful principal evaluation measures.

The primary purpose of the principal evaluation system is to identify and support instruction that will lead to high levels of student achievement. The 50 percent directly related to student achievement combines 35 percent for schoolwide composite growth and 15 percent for another achievement measure. For the 15 percent achievement measure, principals, in collaboration with an evaluator, select from a list of student achievement measures the one that aligns most closely with their primary responsibility. This could be state assessments for an elementary school or graduation rates for a high school. The 50 percent related to the quality of principals’ leadership and administrative practices combines 35 percent for a qualitative rating on Tennessee’s Instructional Leadership Standards and 15 percent for the quality of teacher observations.
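
The weighting described above can be illustrated with a short sketch. The four weights come from the description; everything else is an assumption made for illustration, including the component names, the shared 1 to 5 scale, and the use of a simple weighted average. The description does not state how Tennessee scales or combines the components in practice.

```python
# Component weights as described in the text (they sum to 100 percent).
WEIGHTS = {
    "schoolwide_growth": 0.35,
    "selected_achievement_measure": 0.15,
    "leadership_standards_rating": 0.35,
    "teacher_observation_quality": 0.15,
}

def composite_score(component_scores):
    """Weighted average of component scores on a shared (assumed) 1-5 scale."""
    missing = set(WEIGHTS) - set(component_scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)

# Hypothetical principal; the scores are illustrative, not real data.
example = {
    "schoolwide_growth": 4,
    "selected_achievement_measure": 3,
    "leadership_standards_rating": 5,
    "teacher_observation_quality": 4,
}
print(round(composite_score(example), 2))
```

For this hypothetical principal, the student achievement half contributes 0.35 × 4 + 0.15 × 3 = 1.85 and the leadership practice half contributes 0.35 × 5 + 0.15 × 4 = 2.35, for a composite of 4.2.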

For more details, see the following: