Bayesian spatial misclassification model for areal count data with applications to COVID-19.


Access rights

Worldwide access.
Access changed 9/25/23.



As of December 14, 2020, there had been more than 72.1 million confirmed cases of COVID-19 and more than 1.61 million deaths globally. The United States had more than 16.2 million confirmed cases and 299,000 COVID-19-related deaths, the most cases and deaths of any country. However, even with this huge number of confirmed diagnoses, the public burden of the pandemic is still masked by under-reporting and misclassification. Building on Bayesian spatial models and Poisson regression, we study two topics that aim to provide a flexible quantitative approach for simulating and correcting the under-reporting and misclassification of COVID-19 at the US state level. Topic 1 quantifies under-reporting rates with Poisson-logistic regression, using prior information derived from the results of a SARS-CoV-2 antibody sampling study, and then estimates the true number of COVID-19 cases in each US state. Topic 1 also incorporates the Besag-York-Mollié 2 (BYM2) model to correct the bias in parameter estimates caused by ignoring spatial autocorrelation. Topic 2 proposes a bivariate Bayesian spatial misclassification model that can simultaneously calibrate the misclassification of two counts in the same area (for example, a state or county). Deaths related to COVID-19 are considered to be misclassified as deaths from other causes, and vice versa (although the latter is less common). In addition, because state-level death counts show obvious spatial similarity, BYM2 random effects are included to explain variability beyond that captured by the covariates. The model was applied to state-level COVID-19 deaths and deaths from other causes, achieving satisfactory results that can serve as a reference for estimating the true number of COVID-19 deaths. Topic 3 proposes and discusses sample size determination based on the skew-normal distribution. This method uses intensive Bayesian simulation to overcome the limitations of closed-form approximations and the normality assumption while ensuring sufficient statistical power and nominal coverage of the confidence interval (or credible set). Our approach demonstrates good performance and promising prospects for application.



Bayesian spatial model. Under-reporting. Misclassification. COVID-19. Skew-normal.