Diagnostic Mathematics Assessments for Middle School Teachers serve two purposes: (1) to describe the breadth and depth of mathematics content knowledge so that researchers and evaluators can determine teacher knowledge growth over time, the effects of particular experiences (courses, professional development) on teachers' knowledge, or relationships among teacher content knowledge, teaching practice, and student performance and (2) to describe middle school teachers' strengths and weaknesses in mathematics knowledge so that teachers can make appropriate decisions with regard to courses or further professional development.
The assessments measure mathematics knowledge in four content domains (Number/Computation, Geometry/Measurement, Probability/Statistics, Algebraic Ideas). Each assessment is composed of 20 items—10 multiple-choice and 10 open-response. Six versions of each assessment are available in paper-and-pencil format so that researchers, professional development providers, and course instructors can administer them as pre- and post-tests before and after workshops, institutes, or courses to determine growth in teachers' content knowledge.
To determine the breadth of assessable mathematics content for middle school teachers, development teams of mathematicians, mathematics educators, and teachers conducted extensive literature reviews regarding what mathematics middle school students and teachers should know. They used national recommendations, national and international test objectives, and research to determine appropriate mathematics content. Click on Middle School Mathematics Content Summary Chart [PDF] to see a summary of the analysis of these documents. The numbers in each cell represent page numbers in the documents, and the letters (A1, PS3, NC6, . . .) represent bibliographic references for research articles. Mathematics topics that were identified in more than half of the sources (A in the far right column) were included in the assessments. Click on Research on Middle School Mathematics Bibliography for the bibliography. Each assessment has 2-4 mathematics subdomains. The table below summarizes the subdomains for each assessment:
Number/Computation | Geometry/Measurement | Probability/Statistics | Algebraic Ideas |
---|---|---|---|
Whole Numbers | 2-D Geometry | Probability | Relations/Functions/Patterns |
Rational Numbers | 3-D Geometry | Statistics | Equations/Inequalities |
Integers | Transformational Geometry | | Expressions/Polynomials |
Number Theory | Measurement | | |
The depth of mathematics knowledge to be assessed was based on research that defines types of mathematics knowledge. The four types of knowledge assessed were Type I: Memorized Knowledge; Type II: Conceptual Understanding; Type III: Problem Solving/Reasoning; and Type IV: Pedagogical Content Knowledge. Click on Types of Knowledge for Middle School Mathematics Teachers to see expanded descriptions. Teams of mathematicians, mathematics educators, and teachers developed items for each of the four content domains.
Each team developed item specification charts for each of the assessments. These charts describe the content and knowledge type of items on each of the four assessments. Click on Number/Computation, Geometry/Measurement, Probability/Statistics, and Algebraic Ideas to view the item specification chart for each assessment.
Three strategies were used to ensure the validity of the four mathematics assessments. First, as already described, the breadth and depth of mathematics content necessary for middle school teachers were defined by using national recommendations, objectives of standardized tests, and research on misconceptions for both middle school students and teachers. Second, teams composed of mathematicians, mathematics educators, and middle school teachers were used to create the Middle School Mathematics Content Summary Chart [PDF] and to develop prototype and parallel assessments. Third, national reviewers were used to assess the appropriateness of items created for the six forms of the four assessments. The 24 assessments (six forms of the four assessments) were sent to 35 mathematicians, mathematics educators, and middle school mathematics teachers across the country. These educators had responded to a national call for assessment reviewers. Each reviewer analyzed four sets of assessments. For each item on the assessments, reviewers were asked to (1) identify the mathematics content of the item from a list of specific topics (taken from the item specification charts), (2) identify the knowledge type (I, II, III, IV), and (3) indicate whether the item assessed important mathematics content for middle school teachers (high, medium, or low level of importance). The assessments were distributed to reviewers so that each of the 24 assessments was reviewed by at least six reviewers, with at least one mathematician, mathematics educator, and teacher reviewing each assessment. Therefore, the sets of six parallel items across all four assessments were reviewed by at least 36 different reviewers.
An item was deemed acceptable if (1) at least 60% of the reviewers identified it as assessing a particular content topic, (2) at least 60% of the reviewers identified it as assessing a particular knowledge type, and (3) at least 75% of the reviewers deemed it important knowledge for middle school teachers. Items that met all three of these criteria were included in the assessments. Items that met the third criterion and one of the first two were revised, and items that met none of the criteria were omitted and replaced by new items. Of the 80 sets of parallel items across the four assessments, 41 met the acceptance criteria, and 39 were revised or rewritten. These 39 new or revised items were sent to 16 mathematicians, mathematics educators, and middle school teachers for an additional review. Of these 39 items, all but four (one from each assessment) met the three criteria above. These four items were revised based on comments from the 16 reviewers and included in the assessments.
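The acceptance rules above can be sketched as a simple filter. This is a minimal sketch: the thresholds come from the text, but the function name and argument layout are illustrative, and the text does not specify how items that met only the importance criterion (or only the first two) were handled, so this sketch routes those cases to replacement as well.

```python
def classify_item(pct_content: float, pct_type: float, pct_important: float) -> str:
    """Apply the reviewer-agreement criteria described above.

    pct_content   -- share of reviewers agreeing on the item's content topic
    pct_type      -- share of reviewers agreeing on the knowledge type (I-IV)
    pct_important -- share rating the item important for middle school teachers
    """
    meets_content = pct_content >= 0.60
    meets_type = pct_type >= 0.60
    meets_important = pct_important >= 0.75

    if meets_content and meets_type and meets_important:
        return "accept"            # included in the assessments as-is
    if meets_important and (meets_content or meets_type):
        return "revise"            # kept but rewritten
    return "replace"               # omitted and replaced by a new item
```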
To measure the reliability of the assessments and the quality of the items, the research team recruited teachers who had participated in projects, professional development experiences, and courses focused on building middle school teachers' knowledge of mathematics. The assessments were administered in a pre-post format to preservice and practicing teachers participating in these projects across the country. Three types of reliability were computed from the completed assessments. First, internal reliability was determined by computing Cronbach's alpha for internal consistency for each of the 24 assessments. Second, equivalency reliability was determined by computing Pearson product-moment correlations for each pair of parallel assessments completed by the same groups of teachers. Third, inter-scorer reliability was established using percentages of agreement among three graduate students who developed and used the scoring guides for the open-response items and eventually scored all the field tests.
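The inter-scorer statistic above (percentage of agreement) can be sketched for one pair of scorers; with three scorers it would be applied to each pair in turn. The function name and data layout are illustrative assumptions, not the project's actual code.

```python
def percent_agreement(scores_a, scores_b):
    """Share of responses to which two scorers assigned the same score.

    scores_a, scores_b: parallel lists holding the scores the two scorers
    gave to the same set of open-response items.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must be the same length")
    matches = sum(1 for a, b in zip(scores_a, scores_b) if a == b)
    return matches / len(scores_a)
```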
From May 1, 2005, through March 1, 2006, 3462 assessments were completed by 2301 teachers in 38 projects in 17 states. Of these, 1049 teachers completed one assessment, 1100 teachers completed two assessments in pre-post format, and 71 teachers completed three assessments in pre-post-post format. Seventy percent of the teachers completed two forms of the Number/Computation assessments; 48 percent completed two forms of the Geometry/Measurement assessments; 33 percent completed two forms of the Probability/Statistics assessments; and 60 percent completed two forms of the Algebraic Ideas assessments.
Cronbach's alpha was used to assess the reliability index of each of the 24 forms of the middle school mathematics teacher assessments. The reliability coefficients for the assessments are reported in the table below:
Mathematics Assessments for Middle School Teachers: Coefficients of Internal Reliability
Content Domain Assessment | Cases | Alpha |
---|---|---|
Number/Computation | 796 | .87 |
Geometry/Measurement | 429 | .87 |
Probability/Statistics | 543 | .90 |
Algebraic Ideas | 429 | .87 |
Note that these Cronbach's alpha coefficients far exceed the commonly accepted threshold of 0.7.
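As a rough illustration of the internal-consistency statistic reported above, Cronbach's alpha can be computed from the per-item score variances and the variance of teachers' total scores. This is a minimal sketch using only the standard library; the data layout (one list of teacher scores per item) is an assumption.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance).

    item_scores: list of per-item lists, each inner list holding all
    teachers' scores on that item (20 items per assessment here).
    """
    k = len(item_scores)                                    # number of items
    item_vars = sum(pvariance(item) for item in item_scores)
    totals = [sum(scores) for scores in zip(*item_scores)]  # each teacher's total
    total_var = pvariance(totals)
    return (k / (k - 1)) * (1 - item_vars / total_var)
```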
Pearson product-moment correlations were used to measure the strength (magnitude) of association between the pre- and post-measures of the teacher assessments. Parallel forms of an assessment are considered equivalent (reliable) when scores on the forms correlate strongly for the same group of teachers. As mentioned earlier, the six parallel forms of the assessments in each content domain were administered in pairs to a total of 862 teachers.
These correlations compared teacher performances on different forms of the assessments completed in varied pre-post windows. Some differences in performance were expected because teachers who completed these forms were taking mathematics classes or participating in professional development workshops on these mathematics topics. Positive gain scores were observed in 27 of the 29 paired assessments (see table below). The number and distribution of significantly correlated pairs in Number/Computation, Geometry/Measurement, and Algebraic Ideas indicate equivalency reliability through the property of transitivity; that is, if version 1 is equivalent to version 2 and version 2 is equivalent to version 3, then version 1 is equivalent to version 3.
Equivalency correlations across pre- and post-test versions of the four assessments.

[Table: pre-test versions (rows) crossed with post-test versions (columns) for each content domain. Number Computation: pre-tests 1, 2, 3, 5 against post-tests 2-6; Probability Statistics: pre-tests 1, 5 against post-tests 3, 4, 6; Geometry Measurement: pre-tests 1-4 against post-tests 2-6; Algebra: pre-tests 1-4 against post-tests 3-6. Dark shaded squares indicate correlations significant at the 0.01 level; medium shaded squares, correlations significant at the 0.05 level; squares with diagonals, correlations that were not significant (NC 1-6 and GM 1-4).]
This table reports the significant positive correlation for the Number Computation versions 1-3 pair using the results from Pair 1. The negative score for the Number Computation versions 1-3 pair from Pair 2 will be further analyzed by inspecting the individual sections for missing or incomplete responses. The non-significant but positive correlations for the pairs in Number Computation (1-6) and Geometry and Measurement (1-4) may be attributed to relatively small sample sizes; these pairs showed small differences between the means of the pre- and post-test scores. CRMSTD faculty and staff continue to assess scorers with regard to inter-scorer reliability.
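The equivalency correlations summarized above rest on the Pearson product-moment correlation between each teacher's scores on the two paired forms. A minimal sketch, assuming the paired scores arrive as two parallel lists:

```python
from math import sqrt

def pearson_r(pre, post):
    """Pearson product-moment correlation between paired pre- and post-test scores."""
    n = len(pre)
    mean_pre = sum(pre) / n
    mean_post = sum(post) / n
    # covariance numerator and the two standard-deviation denominators
    cov = sum((x - mean_pre) * (y - mean_post) for x, y in zip(pre, post))
    sd_pre = sqrt(sum((x - mean_pre) ** 2 for x in pre))
    sd_post = sqrt(sum((y - mean_post) ** 2 for y in post))
    return cov / (sd_pre * sd_post)
```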
Currently these assessments are available for use free of charge; however, CRMSTD staff will score them for a fee of $10 per teacher per assessment. Once scoring is complete, CRMSTD staff will send instructors and professional development providers a detailed summary of teachers' performance that includes scores on individual items, on each mathematics subdomain in the content area, and on the four knowledge types (memorized, conceptual understanding, higher-order thinking, pedagogical content knowledge), allowing them to analyze performance on specific items, subdomain topics, or knowledge levels.
Send an email to CRMSTD staff indicating your interest, with a brief description of your intended use (e.g., with a Math-Science Partnership grant, for a research study, or for other professional development purposes). Also include the following information to help us plan and schedule our scorers:
If you have other questions about these assessments, please contact Dr. William S. Bush at 502-852-0590 or bill.bush@louisville.edu.