DTAMS - Middle Mathematics Teacher Assessments


Diagnostic Mathematics Assessments for Middle School Teachers serve two purposes: (1) to describe the breadth and depth of teachers' mathematics content knowledge so that researchers and evaluators can determine teacher knowledge growth over time, the effects of particular experiences (courses, professional development) on teachers' knowledge, or relationships among teacher content knowledge, teaching practice, and student performance; and (2) to describe middle school teachers' strengths and weaknesses in mathematics knowledge so that teachers can make appropriate decisions about further coursework or professional development.

The assessments measure mathematics knowledge in four content domains (Number/Computation, Geometry/Measurement, Probability/Statistics, Algebraic Ideas). Each assessment is composed of 20 items—10 multiple-choice and 10 open-response. Six versions of each assessment are available in paper-and-pencil format so that researchers, professional development providers, and course instructors can administer them as pre- and post-tests before and after workshops, institutes, or courses to determine growth in teachers' content knowledge.

To determine the breadth of assessable mathematics content for middle school teachers, development teams of mathematicians, mathematics educators, and teachers conducted extensive literature reviews regarding what mathematics middle school students and teachers should know. They used national recommendations, national and international test objectives, and research to determine appropriate mathematics content. Click on Middle School Mathematics Content Summary Chart [PDF] to see a summary of the analysis of these documents. The numbers in each cell represent page numbers in the documents and the letters (A1, PS3, NC6, . . .) represent bibliographic references for research articles. Mathematics topics that were identified in more than half of the sources (A in the far right column) were included in the assessments. Click on Research on Middle School Mathematics Bibliography for the bibliography. Each assessment has two to four mathematics subdomains. The table below summarizes the subdomains for each assessment:

Number/Computation   Geometry/Measurement        Probability/Statistics   Algebraic Ideas
Whole Numbers        2-D Geometry                Probability              Relations/Functions/Patterns
Rational Numbers     3-D Geometry                Statistics               Equations/Inequalities
Integers             Transformational Geometry                            Expressions/Polynomials
Number Theory        Measurement

The depth of mathematics knowledge to be assessed was based on research that defines types of mathematics knowledge. The four types of knowledge assessed were Type I: Memorized Knowledge; Type II: Conceptual Understanding; Type III: Problem Solving/Reasoning; and Type IV: Pedagogical Content Knowledge. Click on Types of Knowledge for Middle School Mathematics Teachers to see expanded descriptions. Teams of mathematicians, mathematics educators, and teachers developed items for each of the four content domains.

Each team developed item specification charts for each of the assessments. These charts describe the content and knowledge type of items on each of the four assessments. Click on Number/Computation, Geometry/Measurement, Probability/Statistics, and Algebraic Ideas to view the item specification chart for each assessment.

Establishing Validity

Three strategies were used to ensure the validity of the four mathematics assessments. First, as already described, the breadth and depth of mathematics content necessary for middle school teachers were defined by using national recommendations, objectives of standardized tests, and research on misconceptions for both middle school students and teachers. Second, teams composed of mathematicians, mathematics educators, and middle school teachers were used to create the Middle School Mathematics Content Summary Chart [PDF] and to develop prototype and parallel assessments. Third, national reviewers were used to assess the appropriateness of items created for the six forms of the four assessments. The 24 assessments (six forms of the four assessments) were sent to 35 mathematicians, mathematics educators, and middle school mathematics teachers across the country. These educators had responded to a national call for assessment reviewers. Each reviewer analyzed four sets of assessments. For each item on the assessments, reviewers were asked to (1) identify the mathematics content of the item from a list of specific topics (taken from the item specification charts), (2) identify the knowledge type (I, II, III, IV), and (3) indicate whether the item assessed important mathematics content for middle school teachers (high, medium, or low level of importance). The assessments were distributed to reviewers so that each of the 24 assessments was reviewed by at least six reviewers, with at least one mathematician, mathematics educator, and teacher reviewing each assessment. Therefore, the sets of six parallel items across all four assessments were reviewed by at least 36 different reviewers.

An item was deemed acceptable if (1) at least 60% of the reviewers identified it as assessing a particular content topic, (2) at least 60% of the reviewers identified it as assessing a particular knowledge type, and (3) at least 75% of the reviewers deemed it important knowledge for middle school teachers. Items that met all three of these criteria were included in the assessments. Items that met the last criterion and one of the first two criteria were revised, and items that met none of the criteria were omitted and replaced by new items. Of the 80 sets of parallel items across the four assessments, 41 met the acceptance criteria, and 39 were revised or rewritten. These 39 new or revised items were sent to 16 mathematicians, mathematics educators, and middle school teachers for an additional review. Of these 39 items, all but four (one from each assessment) met the three criteria above. These four items were revised based on comments from the 16 reviewers and included in the assessments.
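The acceptance rules above can be sketched as a small decision function. This is an illustration only: the function and variable names are ours, and the handling of cases the text leaves unspecified (e.g., an item meeting only the importance criterion) is an assumption.

```python
def classify_item(content_pct, type_pct, importance_pct):
    """Classify a reviewed item against the review thresholds described
    in the text: 60% agreement on content topic, 60% agreement on
    knowledge type, and 75% rating the content as important.

    NOTE: the source specifies only three outcomes (accept, revise,
    replace); treating all remaining cases as "replace" is an assumption.
    """
    meets_content = content_pct >= 0.60
    meets_type = type_pct >= 0.60
    meets_importance = importance_pct >= 0.75

    if meets_content and meets_type and meets_importance:
        return "accept"    # met all three criteria
    if meets_importance and (meets_content or meets_type):
        return "revise"    # important, but one agreement criterion missed
    return "replace"       # write a new item

print(classify_item(0.85, 0.70, 0.90))  # accept
print(classify_item(0.85, 0.40, 0.90))  # revise
print(classify_item(0.30, 0.40, 0.50))  # replace
```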

Establishing Reliability

To measure the reliability of the assessments and the quality of items, the research team used teachers who had participated in projects, professional development experiences, and courses focused on building middle school teachers' knowledge of mathematics. The assessments were administered in a pre-post format to preservice and practicing teachers participating in these projects across the country. Three types of reliability were computed from the results of these completed assessments. First, internal reliability was determined by computing Cronbach's alpha for internal consistency for each of the 24 assessments. Second, equivalency reliability was determined by computing Pearson product-moment correlations for each pair of parallel assessments completed by the same groups of teachers. Third, inter-scorer reliability was established using percentages of agreement among three graduate students who developed and used the scoring guides for scoring open-response items and eventually scored all the field tests.
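Of the three computations above, percent agreement for inter-scorer reliability is the simplest. A minimal sketch with made-up scorer data (names and the 0-4 rubric are our assumptions, not project specifics):

```python
def percent_agreement(scorer_a, scorer_b):
    """Fraction of items given identical scores by two scorers."""
    if len(scorer_a) != len(scorer_b):
        raise ValueError("scorers must rate the same items")
    matches = sum(1 for a, b in zip(scorer_a, scorer_b) if a == b)
    return matches / len(scorer_a)

# Two scorers rating ten open-response items on a 0-4 rubric (made-up data)
a = [4, 3, 2, 4, 1, 0, 3, 3, 2, 4]
b = [4, 3, 2, 3, 1, 0, 3, 2, 2, 4]
print(percent_agreement(a, b))  # 0.8
```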

From May 1, 2005, through March 1, 2006, 3462 assessments were completed by 2301 teachers in 38 projects in 17 states. Of these, 1049 teachers completed one assessment, 1100 teachers completed two assessments in pre-post format, and 71 teachers completed three assessments in pre-post-post format. Seventy percent of the teachers completed two forms of the Number/Computation assessments; 48 percent completed two forms of the Geometry/Measurement assessments; 33 percent completed two forms of the Probability/Statistics assessments; and 60 percent completed two forms of the Algebraic Ideas assessments.

Internal Reliability

Cronbach's alpha was used to assess the reliability index of each of the 24 forms of the middle school mathematics teacher assessments. The reliability coefficients for the assessments are reported in the table below:

Mathematics Assessments for Middle School Teacher Coefficients of Internal Reliability

Content Domain Assessment   Cases   Alpha
Algebraic Ideas             429     .87

Note that these Cronbach's alpha coefficients exceed the commonly accepted threshold of 0.7.
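For reference, Cronbach's alpha for k items is k/(k-1) x (1 - sum of item variances / variance of total scores). A standalone sketch of the standard formula (not the project's actual scoring code):

```python
def cronbach_alpha(item_scores):
    """Cronbach's alpha for internal consistency.

    item_scores: one list per item, each of equal length
    (one entry per teacher).
    """
    k = len(item_scores)
    n = len(item_scores[0])

    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    sum_item_vars = sum(variance(item) for item in item_scores)
    totals = [sum(item[i] for item in item_scores) for i in range(n)]
    return k / (k - 1) * (1 - sum_item_vars / variance(totals))

# Two perfectly consistent items yield alpha = 1.0
print(cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4]]))
```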

Equivalency Reliability

Pearson product-moment correlations were used to measure the strength (magnitude) of association between the pre- and post-measures of the teacher assessments. Parallel versions of an assessment are considered equivalent when the forms, administered to the same groups of teachers, yield highly correlated scores. As mentioned earlier, six parallel forms of assessments in each content domain were administered in pairs to a total of 862 teachers.
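The Pearson product-moment correlation for paired pre/post scores can be sketched as follows (the standard formula with illustrative data, not the project's scores):

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between paired score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Perfectly linearly related pre/post scores correlate at r = 1.0
print(pearson_r([10, 12, 15, 18], [40, 44, 50, 56]))  # 1.0
```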

These correlations compared teacher performances on different forms of the assessments completed in varied pre-post windows. Some differences in performance were expected because teachers who completed these forms were taking mathematics classes or participating in professional development workshops on these mathematics topics. Positive gain scores were observed in 27 of the 29 paired assessments (see table below). The number and distribution of significantly correlated pairs in Number/Computation, Geometry/Measurement, and Algebraic Ideas indicate equivalency reliability through the property of transitivity: if version 1 is equivalent to version 2, and version 2 is equivalent to version 3, then version 1 is equivalent to version 3.

Equivalency correlations across pre- and post-test versions of the four assessments.

[Correlation table not reproduced. It presented pre- versus post-test correlations in four panels: Number/Computation, Probability/Statistics, Geometry/Measurement, and Algebraic Ideas. Dark shaded squares indicated significant correlations at the 0.01 level, medium shaded squares were significant at the 0.05 level, and squares with diagonals were not significant (NC 1-6 and GM 1-4).]

This table reports the significant positive correlation for the Number Computation versions 1-3 using the results from Pair 1. The negative score for the Number Computation versions 1-3 from Pair 2 will be further analyzed by inspecting the individual sections for missing or incomplete responses. The non-significant but positive correlations for pairs in Number Computation (1-6) and Geometry and Measurement (1-4) may be attributed to the relatively small sample sizes. These pairs showed small differences in the means of the pre- and post-test scores. CRMSTD faculty and staff continue to assess scorers with regard to inter-scorer reliability.

Using the Assessments

Currently these assessments are available for use free of charge; however, CRMSTD staff will score them for a fee of $10 per teacher per assessment. Once scoring is complete, CRMSTD staff will send instructors and professional development providers a detailed summary of teachers' performance that includes scores on individual items, on each mathematics subdomain in the content area, and on the four knowledge types (memorized, conceptual understanding, higher-order thinking, pedagogical content knowledge). Instructors and providers can then use the scoring summary to analyze performance on specific items, subdomain topics, or knowledge types.

Ordering the Assessments

Send an email to CRMSTD staff indicating your interest, with a brief description of your intended use (e.g., with a Math-Science Partnership grant, for a research study, or for other professional development purposes). Also include the following information to help us plan and schedule our scorers:

  • content area(s) you wish to use (Number/Computation, Geometry/Measurement, Probability/Statistics, Algebraic Ideas)
  • approximate dates of administration
  • approximate number of teachers completing assessments
  • contact information (including email address) for the person to whom completed scoring summaries and fee invoices should be returned

Frequently Asked Questions (FAQ)

  1. Is training required to administer the measurement tool?
    Training is not required; we provide a short document with straightforward administration instructions. Guidelines for use, intended to help ensure test security, are also under development.
  2. What are the costs involved?
    Costs are $10 per assessment per teacher. When the electronic score report is sent, an invoice for that amount will be sent as well.
  3. How long does it take teachers to complete one assessment?
    Completion time generally varies between 45 and 75 minutes, with most participants taking about 60 minutes. Post-tests can take about 10 minutes longer, since some teachers write more in response to the open-response questions.
  4. How are the assessments delivered to us, and what is the process to have them scored by you?
    The process for obtaining and scoring the assessments is as follows:
    • Our assessment coordinator sends the assessment(s) electronically via email to the administrator or coordinator ordering the assessments.
    • The administrator or coordinator downloads the assessment(s) and makes as many copies as necessary for each teacher.
    • After administration, the administrator or coordinator mails the completed paper copies back to the CRMSTD for scoring at the address identified in Ordering the Assessments. The email address to which the scoring summary and invoice should be sent should be included with the assessments to be scored.
    • The score summary is sent back electronically along with the fee invoice.
  5. What are your recommendations for using these assessments?
    Our position is to leave it up to the users to decide how these data will best serve them. Below are some examples of what others have done or are doing with these assessments.
    • We do not provide national norms for the assessment scores because the samples of teachers taking the assessment may or may not be representative of teachers as a whole. The assessments are intended for diagnostic purposes and we suggest they are best used to measure growth or to identify strengths and weaknesses of individual teachers rather than comparing to established benchmark scores.
    • Some project directors have administered all four content areas as a pre-test in order to use results to determine on which content area they would like to focus their upcoming professional development. CAUTION: Due to test fatigue, we recommend not administering all four of the assessments on the same day. In this case, we also request that post-tests on all content areas also be done on a schedule convenient to you so that we can collect parallel form reliability data on the instruments.
    • Some project directors have chosen to focus primarily on one or more of the knowledge type subscores or content subcategory scores. For example, some users were more interested in the pedagogical content knowledge of teachers; others were interested in enhancing conceptual understanding; and still others wanted to focus on problem solving and reasoning. Some have chosen to emphasize one or two content category subscores, e.g., rational numbers for Number/Computation; relations, functions, and patterns for Algebraic Ideas; or probability for Probability/Statistics. We ask that complete assessments be administered to maintain the integrity of the assessments, but clients are free to use the various subscores returned as part of the score report in any way that is helpful to them. CAUTION: Since each of these subscores is based on fewer items than the overall assessment, conclusions drawn from subscores alone are more tentative than those drawn from total scores and should be made cautiously.
    • Some project directors have used these assessments in a pre-post design to look for gains, so that it is not necessary to have other norms. Some looked at gains in subscores (either knowledge type or content subcategory) as well as overall gains. The same caution applies as above.

If you have other questions about these assessments, please contact Dr. William S. Bush at 502-852-0590 or bill.bush@louisville.edu.