Measuring Individual Growth With Conventional and Adaptive Tests
Measuring individuals or groups longitudinally is frequently necessary in social science research and applications. Substantial research and discussion have focused on the statistical properties of measures of change and on some of the psychometric problems involved. This Monte Carlo simulation study focused instead on the properties of the measurement instruments used to obtain scores that represent change or growth over five time points. It examined how well scores from conventional tests and computerized adaptive tests (CATs), used to measure individual growth curves, reflect true change. Data representing four different patterns of individual change and a baseline no-change condition were generated from an item response theory (IRT) model. The simulated tests were conventional peaked tests with narrow and wider ranges of difficulty and three levels of discrimination, and CATs drawn from banks with the same levels of discrimination. Conventional tests were scored by number correct and by IRT weighted maximum likelihood. Results showed that as the examinees’ scores moved away from the difficulty levels at which the tests were concentrated, number-correct scores over-estimated true change and had increasing amounts of error. High-discrimination conventional tests had the poorest recovery of change for both groups and individuals. IRT scoring of the conventional tests improved recovery of change somewhat. By contrast, CATs consistently estimated growth with minimal and consistent error and performed best with highly discriminating items.
Keywords: adaptive testing, computerized adaptive tests, conventional tests, individual growth, item response theory, measuring change, measuring growth, off-target tests
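The relation between a latent trait and the expected number-correct score on a peaked conventional test can be sketched in a few lines. This is only an illustration, not the study's simulation design: the abstract does not name the IRT model, so a two-parameter logistic (2PL) model is assumed here, and the test length, discrimination, difficulty range, and growth trajectory are all hypothetical values chosen for the example.

```python
import math

def p_correct(theta, a, b):
    """Probability of a correct response under a 2PL model (assumed, not from the source)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def expected_number_correct(theta, items):
    """Test characteristic curve: sum of item response probabilities at theta."""
    return sum(p_correct(theta, a, b) for a, b in items)

# Hypothetical peaked test: 20 items, discrimination 1.5,
# difficulties evenly spaced over the narrow range [-0.5, 0.5].
n_items = 20
peaked_test = [(1.5, -0.5 + i / (n_items - 1)) for i in range(n_items)]

# Hypothetical linear growth in the latent trait over five time points.
true_thetas = [0.0, 0.75, 1.5, 2.25, 3.0]

expected_scores = [expected_number_correct(t, peaked_test) for t in true_thetas]

# Change in expected number-correct score between consecutive time points.
gains = [later - earlier for earlier, later in zip(expected_scores, expected_scores[1:])]

for theta, score in zip(true_thetas, expected_scores):
    print(f"theta = {theta:4.2f}   expected number correct = {score:5.2f}")
print("gains between time points:", [round(g, 2) for g in gains])
```

Under these assumptions the trait grows by the same amount at every step, yet the change in expected number-correct score differs from step to step as the examinee moves away from the region where the item difficulties are concentrated. The sketch shows only this nonlinearity of the score metric; it does not model measurement error, IRT scoring, or adaptive item selection.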