Misclassification errors informed by response time in item factor analysis.


Access rights

Worldwide access


The measurement process necessarily yields observations that are measured with some degree of error. In education, researchers often want to measure difficult-to-observe constructs such as content knowledge, motivation, affect, and personality. A scale is created from multiple items that triangulate the construct of interest using the common information across items. One source of error that is rarely accounted for is measurement error in the item response itself. In this study, I propose an approach for measuring latent traits while accounting for item-level measurement error. The proposed approach differentially weights responses by how long an individual takes to respond to each item, i.e., response time as an absolute measure of the time spent on the item. Weighting responses by response time discounts the information provided by individuals who respond rapidly. As a result, individuals with longer response times more heavily inform the estimation of the model, and more highly weighted responses are theorized to more accurately reflect the construct of interest. Using more reliable information is a foundational step in finding validity evidence for inferences made with scales.

The purpose of this study was two-fold. First, simulation studies were conducted to show how the proposed model can be estimated and to demonstrate the effects of fitting traditional item factor models when data are prone to item-level measurement error. These studies show that parameter estimates (e.g., factor loadings and residual variances) may be severely biased upward or downward, and that coverage rates for interval estimates of the parameters varied widely across the conditions studied and across parameters. The results indicate that researchers' ability to make valid inferences about the underlying model is limited by how item-level measurement error is modeled.
Second, applied studies used data from the National Assessment of Educational Progress (NAEP) 2017 mathematics assessment and an open-source dataset on extroversion. The results from these applied studies demonstrate the applicability of the proposed model and show that inferences about reliability can depend heavily on how item-level measurement error is modeled. Finally, implications and applications of the proposed methods for educational research are discussed.
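The core idea of the abstract, discounting each person-by-item likelihood contribution by a weight derived from response time, can be sketched in a few lines. The following is a minimal illustration, not the dissertation's actual model or estimation routine: the logistic weight function, its `midpoint` and `scale` constants, and the one-factor two-parameter item response model are all illustrative assumptions.

```python
import numpy as np

def rt_weights(times, midpoint=5.0, scale=1.0):
    """Logistic weight in (0, 1): rapid responses are discounted.

    `midpoint` and `scale` are illustrative tuning constants,
    not values taken from the study.
    """
    return 1.0 / (1.0 + np.exp(-(times - midpoint) / scale))

def weighted_loglik(theta, loadings, intercepts, responses, times):
    """Response-time-weighted log-likelihood for a one-factor
    binary item factor model (illustrative sketch).

    Each person-by-item log-likelihood term is multiplied by a
    weight in (0, 1), so fast responses contribute less to
    estimation and slow responses contribute more.
    """
    # P(endorse/correct) under a two-parameter logistic item model
    logits = np.outer(theta, loadings) + intercepts
    p = 1.0 / (1.0 + np.exp(-logits))
    # Bernoulli log-likelihood per person-by-item cell
    ll = responses * np.log(p) + (1 - responses) * np.log(1 - p)
    # Down-weight cells with short response times
    return np.sum(rt_weights(times) * ll)
```

In a full analysis this weighted log-likelihood would be maximized (or sampled from, in a Bayesian setting) over the latent traits and item parameters; the sketch only shows where the response-time weights enter.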



Misclassification. Measurement error. Factor analysis. Bayesian methods.