# Theses/Dissertations - Statistical Sciences

Permanent URI for this collection: https://hdl.handle.net/2104/4798


## Browsing Theses/Dissertations - Statistical Sciences by Title

Now showing 1 - 20 of 87


**A beta regression approach to nonparametric longitudinal data classification in clinical trials.** (2022-04-07) Hernandez, Roberto Sergio, 1995-; Tubbs, Jack Dale.

Classification is an important topic in statistical analysis. For example, in applications involving clinical trials, a common objective is to determine whether or not novel medicines and treatments differ from existing standards of care. The literature offers numerous methods and approaches for this problem when the endpoint of interest is normally distributed or can be approximated by an asymptotic normal distribution, yet approaches for non-normally distributed endpoints are limited. This is especially true when these endpoints are correlated across time. In this dissertation, we investigate several techniques for use with longitudinal, repeated measures data, with special interest in adapting some recent results from the literature on beta regression. The proposed methods provide a model, nonparametric with regard to the design endpoint, that can be used in the repeated measures problem.

**A power contrast of tests for homogeneity of covariance matrices in a high-dimensional setting.** (2018-10-31) Barnard, Ben Joseph, 1987-; Young, Dean M.

Multivariate statistical analyses, such as linear discriminant analysis, MANOVA, and profile analysis, have a covariance-matrix homogeneity assumption. Until recently, homogeneity testing of covariance matrices was limited to the well-posed problem, where the number of observations is much larger than the data dimension. Linear dimension reduction has many applications in classification and regression but has been used very little in hypothesis testing for equal covariance matrices. In this manuscript, we first contrast the powers of five current tests for homogeneity of covariance matrices under a high-dimensional setting for two population covariance matrices using Monte Carlo simulations.
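Power contrasts of this kind are usually estimated by Monte Carlo simulation: simulate the test statistic under the null to calibrate a critical value, then simulate under an alternative and count rejections. A minimal sketch in Python, using Box's M statistic (an illustrative choice, not necessarily one of the five tests contrasted here) in a deliberately low-dimensional toy setting:

```python
import numpy as np

def box_m(samples):
    """Box's M statistic for equality of covariance matrices."""
    k = len(samples)
    ns = np.array([s.shape[0] for s in samples])
    covs = [np.cov(s, rowvar=False) for s in samples]
    pooled = sum((n - 1) * c for n, c in zip(ns, covs)) / (ns.sum() - k)
    m = (ns.sum() - k) * np.log(np.linalg.det(pooled))
    m -= sum((n - 1) * np.log(np.linalg.det(c)) for n, c in zip(ns, covs))
    return m

rng = np.random.default_rng(0)
p, n, reps = 3, 30, 2000

def draw(cov_list):
    return [rng.multivariate_normal(np.zeros(p), c, size=n) for c in cov_list]

# Calibrate: empirical null distribution when both groups share one covariance.
null_stats = [box_m(draw([np.eye(p), np.eye(p)])) for _ in range(reps)]
crit = np.quantile(null_stats, 0.95)

# Power: second group's covariance inflated threefold.
alt_stats = [box_m(draw([np.eye(p), 3.0 * np.eye(p)])) for _ in range(reps)]
power = float(np.mean(np.array(alt_stats) > crit))
print(f"critical value {crit:.2f}, estimated power {power:.2f}")
```

In the genuinely high-dimensional regime, with the dimension comparable to or exceeding the sample size, the sample covariance determinants above degenerate, which is exactly why specialized tests and dimension reduction are needed.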
We then derive a linear dimension reduction method specifically constructed for testing homogeneity of high-dimensional covariance matrices. We also explore the effect of our proposed linear dimension reduction for two or more covariance matrices on the power of four tests for homogeneity of covariance matrices under a high-dimensional setting for two- and three-population covariance matrices. We determine that our proposed linear dimension reduction method, when applied to the original data before using an appropriate test, can yield a substantial increase in power.

**Adaptive designs for phase II clinical trials with binary endpoints.** (2019-02-01) Carlile, Tom; Johnston, Dennis A.; Baylor University.

Because the sample size varies while the estimate of the sample size is changing, the quality of the Gaussian approximation to the binomial is variable and thus undesirable. Adaptive designs also depart from the tradition of evaluating an entire sample of patients before analyzing the data. For an adaptive design with early stopping for either futility or efficacy, experimental designs are provided for some of the more common error probabilities. Following in the footsteps of Simon (1989), two different levels of clinical significance are assumed for each experimental design. Two problems with the original design are addressed. The first is that no explicit stopping boundary for efficacy was provided in the early stages. The second lies in assuming Bernoulli observations with identical probabilities of success; this is addressed by assuming the probability is a random variable and that each outcome is Bernoulli conditioned on that probability. When evaluating drugs, it is important to address efficacy and toxicity jointly. A joint model for both efficacy and toxicity is proposed and evaluated.
We also propose a method for dichotomizing efficacy and toxicity events in a way that incorporates their severity, duration, and type.

**Applications of Bayesian quantile regression and sample size determination.** (2018-03-21) King, J. Clay, 1984-; Song, Joon Jin; Stamey, James D.

Bayesian statistical methods reverse the philosophy of traditional statistical practice by treating parameters as random rather than fixed. In so doing, Bayesian methods are able to incorporate uncertainty about parameter values and offer new approaches to problems traditionally viewed through only one lens. One technique that was introduced forty years ago but has only been considered from the Bayesian perspective within the last twenty years is quantile regression (QR). Similarly, sample size determination is a staple of both introductory coursework in statistics and upper-level clinical trial design, but it has historically been presented with little to no mention of its construction under the Bayesian paradigm. With Bayesian research now rapidly building in both of these arenas, we offer two distinct applications of Bayesian QR to count data and present a Bayesian sample size determination scheme for a cost-effectiveness model.

**Applications of functional data analysis to environmental problems.** (August 2022) Durell, Luke, 1995-; Hering, Amanda S.

Functional data analysis (FDA) is a relatively recent framework within the statistical sciences, and while it offers compelling benefits to many applications, it has not yet gained widespread applied use. Two important environmental applications, water quality profile forecasting and larval fish photolocomotor response studies, measure functional data and stand to profit from employing FDA. In this work, we present the first application of FDA to these two problems in the environmental and biological sciences.
Specifically, this dissertation analyzes the most temporally and vertically dense dissolved oxygen lake profiles in the water quality forecasting literature. It is the first work to introduce full-function forecasting with exogenous variables, various machine learning approaches, and empirical prediction band construction in the context of functional principal component machine learning hybrid models. Additionally, this research introduces both a new permutation test for two-way functional ANOVA and the first simulation study comparing four global F-based statistics in a two-way functional ANOVA setting.

**Bayesian adaptive designs for non-inferiority and dose selection trials.** (2006-07-31) Spann, Melissa Elizabeth; Seaman, John Weldon, 1956-; Baylor University, Dept. of Statistical Sciences.

The process of conducting a pharmaceutical clinical trial often produces information in a way that can be used as the trial progresses. Bayesian methods offer a highly flexible means of using such information, yielding inferences and decisions that are consistent with the laws of probability and consequently admit ease of interpretation. Bayesian adaptive sampling methods offer the potential to accelerate the investigation of a drug without compromising the safety of the trial’s participants. These methods select a patient’s treatment based upon prior information and the knowledge accrued from the trial to date, which can reduce patient exposure to unsafe or ineffective treatments and therefore improve patient care in clinical trials. Improving the process of clinical trials in this manner benefits all involved, including the pharmaceutical companies and especially the patients: safer and less expensive drugs can make it to market faster. In this research we present a Bayesian approach to determining whether an experimental treatment is non-inferior to an active control treatment within a clinical trial that includes a placebo arm.
We incorporate this non-inferiority model in a Bayesian adaptive design that uses joint posterior predictive probabilities of safety and efficacy to determine adaptive allocation probabilities. Results from a retrospective study and a simulation illustrate use of the method. We also present a Bayesian adaptive approach to dose selection that uses effect sizes of doses relative to placebo to perform adaptive allocation and to select the most efficacious dose. The proposed design removes treatment arms if their performance relative to placebo or other treatment arms is undesirable. Results from analyses of simulated data are discussed.

**Bayesian adjustment for misclassification bias and prior elicitation for dependent parameters.** (2019-11-25) Lakshminarayanan, Divya Ranjani, 1993-; Seaman, John Weldon, 1956-

This research is motivated by problems in biopharmaceutical research. Prior elicitation is the formulation of an expert's beliefs about one or more uncertain quantities into a joint probability distribution, and it is often used in Bayesian statistics to specify prior distributions for parameters in the data model. However, there is limited research on eliciting information about dependent random variables, which is often necessary in practice. We develop methods for constructing a prior distribution for the correlation coefficient using expert elicitation. Electronic health records are often used to assess potential adverse drug reaction risk, but the recorded outcomes may be misclassified for many reasons. Unbiased estimation in the presence of outcome misclassification requires additional information.
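For intuition on why outcome misclassification requires outside information: the observed prevalence mixes sensitivity and specificity with the true prevalence, so the naive estimate is biased, but a validation substudy that pins down those error rates allows a correction. A hypothetical sketch, with all numbers invented and using the classical Rogan-Gladen correction rather than the Bayesian models developed here:

```python
import random

random.seed(1)

# True prevalence and misclassification rates (hypothetical values).
p_true, sens, spec = 0.20, 0.90, 0.95
n = 20000

# Simulate observed (misclassified) binary outcomes.
observed = []
for _ in range(n):
    y = random.random() < p_true                  # true status
    obs = (random.random() < sens) if y else (random.random() >= spec)
    observed.append(obs)

p_obs = sum(observed) / n  # naive estimate; its expectation is 0.22, not 0.20
# Rogan-Gladen correction: invert E[p_obs] = p*sens + (1-p)*(1-spec).
p_corrected = (p_obs + spec - 1) / (sens + spec - 1)
print(f"naive {p_obs:.3f}, corrected {p_corrected:.3f}")
```

The correction requires sensitivity and specificity to be known; in practice they are estimated from a validation substudy, which is the extra data source the Bayesian models here exploit.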
Using internally validated data, we develop Bayesian models for analyzing misclassified data with a validation substudy and compare their performance to existing frequentist approaches.

**Bayesian and likelihood-based interval estimation for the risk ratio using double sampling with misclassified binomial data.** (2011-01-05) Rahardja, Dewi Gabriela; Young, Dean M.; Baylor University, Dept. of Statistical Sciences.

We consider the problem of point and interval estimation for the risk ratio using double sampling with two-sample misclassified binary data. For such data, it is well known that the actual data model is unidentifiable. To achieve model identifiability, we obtain additional data via a double-sampling scheme. For the Bayesian paradigm, we devise a parametric, straightforward algorithm for sampling from the joint posterior density of the parameters, given the data. We then obtain Bayesian point and interval estimators of the risk ratio of two proportion parameters. We illustrate our algorithm using a real data example and conduct two Monte Carlo simulation studies to demonstrate that both the point and interval estimators perform well. Additionally, we derive three likelihood-based confidence intervals (CIs) for the risk ratio. Specifically, we first obtain closed-form maximum likelihood estimators (MLEs) for all parameters. We then derive three CIs for the risk ratio: a naive Wald interval, a modified Wald interval, and a Fieller-type interval. For illustration purposes, we apply the three CIs to a real data example. We also perform various Monte Carlo simulation studies to assess and compare the coverage probabilities and average lengths of the three CIs.
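As background on the first of these intervals, a naive Wald CI for a risk ratio is typically built on the delta-method standard error of the log risk ratio; the sketch below ignores the misclassification that the double-sampling machinery is designed to handle:

```python
import math

def wald_ci_risk_ratio(x1, n1, x2, n2, z=1.96):
    """Naive Wald CI for the risk ratio, via the delta method on log(RR)."""
    rr = (x1 / n1) / (x2 / n2)
    # Delta-method variance of log(RR) for two independent binomials.
    se = math.sqrt(1 / x1 - 1 / n1 + 1 / x2 - 1 / n2)
    return rr, (rr * math.exp(-z * se), rr * math.exp(z * se))

# Hypothetical counts: 30/100 events in one arm, 20/100 in the other.
rr, (lo, hi) = wald_ci_risk_ratio(30, 100, 20, 100)
print(f"RR = {rr:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

Exponentiating a symmetric interval on the log scale keeps the endpoints positive, which is one reason the log transform is standard here.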
The modified Wald CI performs the best of the three and has near-nominal coverage probabilities.

**Bayesian and maximum likelihood methods for some two-segment generalized linear models.** (2008-10-14) Miyamoto, Kazutoshi; Seaman, John Weldon, 1956-; Baylor University, Dept. of Statistical Sciences.

The change-point (CP) problem, wherein parameters of a model change abruptly at an unknown covariate value, is common in many fields, such as process control, epidemiology, and ecology. CP problems using two-segment regression models, such as those based on generalized linear models, are very flexible and widely used. For two-segment Poisson and logistic regression models, misclassification in the response is well known to cause attenuation of key parameters and other difficulties. How misclassification affects estimation of a CP in such models has not been studied. In this research, we consider the effect of misclassification on CP problems in Poisson and logistic regression, focusing on maximum likelihood and Bayesian methods.

**Bayesian and pseudo-likelihood interval estimation for comparing two Poisson rate parameters using under-reported data.** (2009-04-01) Greer, Brandi A.; Young, Dean M.; Baylor University, Dept. of Statistical Sciences.

We present interval estimation methods for comparing Poisson rate parameters from two independent populations with under-reported data, for both the rate difference and the rate ratio. We apply the Bayesian paradigm to derive credible intervals for both the ratio and the difference of the Poisson rates, and we construct pseudo-likelihood-based confidence intervals for the ratio of the rates. We begin by considering two cases for analyzing under-reported Poisson counts: inference when training data are available and inference when they are not.
From these cases we derive two marginal posterior densities for the difference in Poisson rates and corresponding credible sets. First, we perform Monte Carlo simulation analyses to examine the effects of differing model parameters on the posterior density. Then we perform additional simulations to study the robustness of the posterior density to misspecified priors. In addition, we apply the new Bayesian credible intervals for the difference of Poisson rates to an example concerning the mortality rates due to acute lower respiratory infection in two age groups for children in the Upper River Division in Gambia and to an example comparing automobile accident injury rates for male and female drivers. We also use the Bayesian paradigm to derive two closed-form posterior densities and credible intervals for the Poisson rate ratio, again in the presence of training data and without it. We perform a series of Monte Carlo simulation studies to examine the properties of our new posterior densities for the Poisson rate ratio and apply our Bayesian credible intervals for the rate ratio to the same two examples mentioned above. Lastly, we derive three new pseudo-likelihood-based confidence intervals for the ratio of two Poisson rates using the double-sampling paradigm for under-reported data. Specifically, we derive profile likelihood-, integrated likelihood-, and approximate integrated likelihood-based intervals. We compare coverage properties and interval widths of the newly derived confidence intervals via a Monte Carlo simulation. 
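As a sketch of the conjugate machinery behind such credible intervals: with independent Gamma priors, each Poisson rate has a Gamma posterior, and a credible interval for the rate ratio can be read off posterior samples. The counts and the diffuse prior below are hypothetical, and the under-reporting adjustments developed here are omitted:

```python
import random

random.seed(7)

# Observed counts and exposure times for two groups (hypothetical data).
x1, t1 = 40, 1000.0
x2, t2 = 25, 1000.0
a, b = 0.5, 0.0001  # diffuse Gamma(a, b) prior on each rate

# Posterior for each Poisson rate is Gamma(a + x, rate b + t);
# random.gammavariate takes (shape, scale), so scale = 1 / (b + t).
draws = sorted(
    random.gammavariate(a + x1, 1 / (b + t1)) /
    random.gammavariate(a + x2, 1 / (b + t2))
    for _ in range(20000)
)
lo, hi = draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws))]
print(f"95% credible interval for the rate ratio: ({lo:.2f}, {hi:.2f})")
```

Because the ratio of independent Gamma variates has no convenient closed-form quantiles in general, sampling and taking empirical quantiles is the usual shortcut.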
Then we apply our newly derived confidence intervals to an example comparing cervical cancer rates.

**Bayesian approach to partially validated binary regression with response and exposure misclassification.** (2018-06-09) Anderson, Katrina Julie, 1987-; Stamey, James D.

Misclassification of epidemiological and observational data is a problem that commonly arises and can have adverse ramifications on the validity of results if not properly handled. Considerable research has been conducted for the case where only the response or only the exposure is misclassified, while less work has been done on the simultaneous case. We extend previous frequentist work by investigating a Bayesian approach to dependent, differential misclassification models. Using a logit model with misclassified binary response and exposure variables and assuming a validation sub-sample is available, we compare the resulting confidence and credible intervals under the two paradigms. We compare results under varying validation subsample percentages: 100% (ideal scenario), 25%, 15%, 10%, 5%, 2.5%, and 0% (naive scenario) of the overall sample size. We extend this work further by examining scenarios in which the assumptions may falter: we assume independent, differential misclassification, increase the overall sample size, and vary the influence of our priors from diffuse to concentrated.

**Bayesian approaches for design of psychometric studies with underreporting and misclassification.** (2013-05-15) Falley, Brandi; Stamey, James D.; Baylor University, Dept. of Statistical Sciences.

Measurement error problems in binary regression are of considerable interest among researchers, especially in epidemiological studies. Misclassification can be considered a special case of measurement error, specifically for the situation in which the measurement is a categorical classification of items.
Bayesian methods offer practical advantages for the analysis of epidemiological data, including the possibility of incorporating relevant prior scientific information and the ability to make inferences that do not rely on large sample assumptions. Because of the high cost and time constraints of clinical trials, researchers often need to determine the smallest sample size that provides accurate inferences for a parameter of interest. Although most experimenters have employed frequentist methods, the Bayesian paradigm offers a wide variety of methodologies that are becoming increasingly popular in clinical trials because of their flexibility and ease of interpretation. We simultaneously estimate efficacy and safety, where the safety variable is subject to underreporting. We propose a Bayesian sample size determination method to account for the underreporting and appropriately power the study. We allow efficacy and safety to be independent, as well as dependent using a regression model; for both models, the safety variable may be underreported.

**Bayesian approaches for survival data in pharmaceutical research.** (2020-09-15) Prajapati, Purvi Kishor, 1992-; Stamey, James D.; Seaman, John Weldon, 1984-

In this research, we consider Bayesian methodologies to address problems in biopharmaceutical research, most of which are motivated by real-world problems in network meta-analysis, prior elicitation, and adaptive designs. Network meta-analysis is a hierarchical model used to combine the results of multiple studies, and it allows us to make direct and indirect comparisons between treatments. We investigate Bayesian network meta-analysis models for survival data based on modeling the log-hazard rates, as opposed to hazard ratios. Expert opinion is often needed to construct priors for time-to-event data, especially in pediatric and oncology studies.
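One common elicitation device for time-to-event priors, shown here only as a generic illustration rather than the method proposed in this work, is to ask an expert for two event-time quantiles and solve for the Weibull parameters they imply:

```python
import math

def weibull_from_quantiles(t1, q1, t2, q2):
    """Solve for Weibull shape k and scale lam such that
    P(T <= t1) = q1 and P(T <= t2) = q2."""
    c1, c2 = -math.log(1 - q1), -math.log(1 - q2)   # (t/lam)^k at each quantile
    k = math.log(c2 / c1) / math.log(t2 / t1)
    lam = t1 / c1 ** (1 / k)
    return k, lam

# Expert believes 25% of events occur by 6 months and 75% by 24 months.
k, lam = weibull_from_quantiles(6.0, 0.25, 24.0, 0.75)
print(k, lam)  # k ~ 1.13, lam ~ 18.0

# Check: the implied CDF reproduces the elicited quantiles.
cdf = lambda t: 1 - math.exp(-((t / lam) ** k))
```

Repeating this for several plausible expert answers, and then fitting a distribution over the resulting (k, lam) pairs, is one route to a joint prior on the Weibull parameters.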
For this, we propose a prior elicitation method for the Weibull time-to-event distribution based on potentially observable time-to-event summaries, which can be transformed to obtain a joint prior distribution for the Weibull parameters. Bayesian adaptive designs take advantage of accumulating information by allowing key trial parameters to change in response to accruing data and predefined rules. We introduce a novel model-based Bayesian assessment of reading speed that uses an adaptive algorithm to target key reading metrics. These metrics are used in the assessment of reading speed in individuals with poor vision.

**Bayesian approaches to correcting bias in epidemiological data.** (2011-05-12) Bennett, Monica M.; Stamey, James D.; Seaman, John Weldon, 1956-; Baylor University, Dept. of Statistical Sciences.

Bias in parameter estimation of count data is a common concern, and the concern is even greater when not all counts are recorded. Failing to adjust for underreported data can lead to incorrect parameter estimates. A Bayesian Poisson regression model to account for underreported data has previously been developed; we expand this model by using a multilevel Poisson regression. In our model we consider the case where the probability of reporting is the same for all groups, and the case where there are multiple reporting probabilities. In both situations we show the importance of accounting for underreporting in the analysis. Another common source of bias in parameter estimation is missing data. In particular, we consider missing data in follow-up studies aimed at estimating the rate of a particular event. If we ignore the missing data, then both the overall event rates and the uncertainty in the model parameters will be underestimated. To address this problem we extend an existing Bayesian model for missing data in follow-up studies to two multilevel models.
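The attenuation caused by underreporting is easy to see by simulation: independently thinned Poisson counts remain Poisson with mean scaled by the reporting probability, so the naive rate estimate is biased downward by exactly that factor. A toy sketch with invented values; the multilevel Bayesian treatment described here is far more general:

```python
import math
import random

random.seed(3)

lam, p_report, n = 4.0, 0.7, 5000  # true rate and reporting probability

def poisson(mu):
    """Draw a Poisson variate (Knuth's method; adequate for small mu)."""
    limit, k, prod = math.exp(-mu), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= limit:
            return k
        k += 1

# Each true event is independently reported with probability p_report,
# so observed counts are Poisson with thinned mean lam * p_report.
observed = [sum(random.random() < p_report for _ in range(poisson(lam)))
            for _ in range(n)]

naive = sum(observed) / n      # estimates lam * p_report, not lam
corrected = naive / p_report   # recoverable once the reporting rate is known
print(f"naive {naive:.2f}, corrected {corrected:.2f}")
```

The catch, as in the misclassification setting, is that the reporting probability is rarely known; training or validation data supply the information needed to estimate it.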
One model uses an overdispersion term to account for excess variability in the data. The second model uses random intercepts and slopes. The last topic that we consider is a meta-analysis comparison. We are interested in the performance of the methods for safety signal evaluation of rare events. This topic is of particular interest due to the recent FDA guidance for assessing cardiovascular risk in diabetes drugs. We consider three methods based on the Cox proportional hazards model, including a Bayesian approach. A formal comparison of the methods is conducted using a simulation study, in which we model two treatments and consider several scenarios.

**Bayesian approaches to problems in drug safety and adaptive clinical trial designs.** (2008-06-10) Mauldin, Jo A.; Seaman, John Weldon, 1956-; Baylor University, Dept. of Statistical Sciences.

The efficacy, safety, and cost of pharmaceutical products are critical issues in society today. Motivated both financially and ethically by these concerns, the pharmaceutical industry has continually worked to develop methods that provide more efficient and ethical assessments of the safety and efficacy of pharmaceutical products. There is an increased emphasis on more targeted treatments, with a focus on better patient outcomes. In this vein, recent applications of advanced statistical methods have allowed companies to reduce the costs of getting safe and effective products to market, savings that can be passed on to consumers in the form of price cuts or additional investment in research and development. Among the methods that have become increasingly important in drug development are adaptive experimental designs. We first investigate the impact of misclassification of response on a Bayesian adaptive design. A primary argument for the use of adaptive designs is the efficiency one gains over implementing a traditional fixed design.
We examine the design’s performance under misclassified responses and compare it to the case in which we account for the misclassification in a Bayesian model. Next, we examine the utility of safety lab measures collected during the clinical development of a drug. These labs are used to characterize a drug’s safety profile, and their scope can be limited when one is reasonably confident of no associated safety concern, facilitating reduced costs and less subject burden. We consider the use of a Bayesian generalized linear model and investigate the use of conditional means priors and power priors for the regression coefficients in the analysis of safety lab measures. Finally, we address the need for transparent benefit-risk assessment methods that combine safety and efficacy data and allow straightforward comparisons of treatment options. We begin by developing interval estimates for a commonly used benefit-risk ratio. We then propose the use of a Bayesian generalized linear model to jointly assess safety and efficacy, allowing direct comparisons of competing treatment options using posterior 95% credible sets and predictive probabilities.

**Bayesian dynamic borrowing strategies with power priors and quantifying prior information for circular priors.** (2023-08) Xie, Chang, 1994-; Seaman, John Weldon, 1956-

This dissertation concerns two problems in Bayesian prior construction. The first is the development of a dynamic historical borrowing strategy tailored for settings with small current or historical data samples. The second concerns strategies for the assessment of circular priors. Chapter two introduces a novel dynamic borrowing method that can be applied in both clinical and non-clinical settings. Recent approaches, such as that of Thompson et al. (2021), do not accommodate situations with limited current sample sizes. Our approach integrates data amplification techniques specifically for small data sets.
We assess the effectiveness of this method through simulation. In Chapter three, we explore the bioassay validation process, which necessitates borrowing from previous studies. We apply the method introduced in Chapter two to a case study of bioassay validation. The outcomes are then contrasted with results derived from another dynamic borrowing method that we present in that chapter. Chapter four shifts focus to the quantification of information contained in a circular prior. Circular data are measurements on a unit circle, with the von Mises model being the most widely used model for such data. We gauge the information contained in priors for the von Mises data model using prior equivalent sample size.

**Bayesian evaluation of surrogate endpoints.** (2006-07-29) Feng, Chunyao; Seaman, John Weldon, 1956-; Baylor University, Dept. of Statistical Sciences.

To save time and reduce the size and cost of clinical trials, surrogate endpoints are frequently measured instead of true endpoints. The proportion of the treatment effect explained by the surrogate endpoint (PTE) is a widely used, albeit controversial, validation criterion. Frequentist and Bayesian methods have been developed to facilitate such validation. The former does not formally incorporate prior information, a critical issue since confidence intervals on PTE are often unacceptably wide. Both the Bayesian and frequentist approaches may yield estimates of PTE outside the unit interval. Furthermore, the existing Bayesian method offers no insight into the prior used for PTE, making prior-to-posterior sensitivity analyses problematic. We propose a fully Bayesian approach that avoids both of these problems. We also consider the effect of interaction on inference for PTE.
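For reference, the frequentist version of PTE is usually computed by comparing the treatment coefficient before and after adjusting for the surrogate. The sketch below simulates hypothetical trial data and computes this quantity by ordinary least squares; nothing confines the resulting estimate to the unit interval, which is part of why the criterion is controversial:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4000

# Hypothetical mechanism: the surrogate carries most of the treatment effect.
z = rng.integers(0, 2, n)                      # treatment indicator
s = 2.0 * z + rng.normal(0, 1, n)              # surrogate endpoint
y = 1.0 * s + 0.5 * z + rng.normal(0, 1, n)    # true endpoint

# Unadjusted treatment effect: regress y on treatment alone.
X_u = np.column_stack([np.ones(n), z])
beta_u = np.linalg.lstsq(X_u, y, rcond=None)[0][1]

# Adjusted effect: regress y on treatment and the surrogate.
X_a = np.column_stack([np.ones(n), z, s])
beta_a = np.linalg.lstsq(X_a, y, rcond=None)[0][1]

pte = 1 - beta_a / beta_u   # proportion of treatment effect "explained"
print(f"unadjusted {beta_u:.2f}, adjusted {beta_a:.2f}, PTE {pte:.2f}")
```

Here the true unadjusted effect is 2.5 and the adjusted effect is 0.5, so the estimate should land near PTE = 0.8; with a weak surrogate or sampling noise, the same formula can easily stray below 0 or above 1.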
As an alternative to the use of PTE, we develop a Bayesian model for the relative effect and the association between surrogate and true endpoints, making use of power priors.

**Bayesian inference for bivariate Poisson data with zero-inflation.** (2017-07-27) Drevets, Madeline L., 1991-; Seaman, John Weldon, 1956-

Multivariate count data with zero-inflation are common throughout the pure and applied sciences; such count data often include excess zeros. Zero-inflated Poisson regression models have been used in several applications to model bivariate count data with excess zeros. In this dissertation, we explore a Bayesian approach to bivariate Poisson models in which one or both counts are zero-inflated, with a primary focus on informative prior structures for these models. Bayesian treatments of zero-inflated Poisson models have focused on diffuse prior structures for model parameters. Nevertheless, we demonstrate that such an approach can be problematic with respect to convergence. We offer an informative prior approach and propose methods of prior elicitation from a subject-matter expert, including methods for informative prior construction for an association parameter and for a multivariate distribution. We demonstrate our proposed methods within the context of a clinical example.

**Bayesian inference for correlated binary data with an application to diabetes complication progression.** (2006-10-26) Carlin, Patricia M.; Seaman, John Weldon, 1956-; Baylor University, Dept. of Statistical Sciences.

Correlated binary measurements can occur in a variety of practical contexts and afford interesting statistical modeling challenges. In order to model the separate probabilities for each measurement we must somehow account for the relationship between them. We focus our applications on the progression of the complications of diabetic retinopathy and diabetic nephropathy.
We first consider probabilistic models that employ Bayes’ theorem for predicting the probability of onset of diabetic nephropathy given that a patient has developed diabetic retinopathy, modifying the work of Ballone, Colagrande, Di Nicola, Di Mascio, Di Mascio, and Capani (2003). We consider beta-binomial models using the Sarmanov (1966) framework, which allows us to specify the marginal distributions for a given bivariate likelihood. We present both maximum likelihood and Bayesian methods based on this approach; our Bayesian methods include a fully identified model based on proportional probabilities of disease incidence. Finally, we consider Bayesian models for three different prior structures using likelihoods representing the data in the form of a 2-by-2 table. To do so, we treat the data as counts resulting from two correlated binary measurements: the onset of diabetic retinopathy and the onset of diabetic nephropathy. We compare the resulting posterior distributions from a Jeffreys prior, independent beta priors, and conditional beta priors, based on a structural-zero likelihood model and the bivariate binomial model.

**Bayesian inference for vaccine efficacy and prediction of survival probability in prime-boost vaccination regimes.** (2019-11-08) Lu, Yuelin, 1992-; Seaman, John Weldon, 1956-

This dissertation consists of two major topics on applying Bayesian statistical methods in vaccine development. Chapter two concerns the estimation of vaccine efficacy from validation samples with selection bias. Because there is selection bias in the validated group, the traditional assumption that the non-validated group is missing at random does not hold. A selection bias parameter is introduced to handle this problem. Extending the methods of Scharfstein et al. (2006), we construct and validate a data-generating mechanism that simulates real-world data and allows evaluation of their model.
We implement the Bayesian model in JAGS and assess its performance via simulation. Chapter three introduces a two-level Bayesian model for predicting survival probability from administered dose concentrations. This research is motivated by the need to use limited information to infer the probability of survival for the next Ebola outbreak under a heterologous prime-boost vaccine regimen. The first level models the relationship between dose and induced antibody count, using a two-stage response surface. The second level models the association between the antibody count and the probability of survival using logistic regression. We combine these models to predict survival probability from the administered dosage. We illustrate application of the model with three examples in this chapter and evaluate its performance in Chapter four.
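The two-level structure described above can be sketched generically: a dose-response curve for the induced antibody level feeds a logistic model for survival, and propagating level-one noise through level two yields a predicted survival probability per dose. Every functional form and parameter value below is invented for illustration, with an Emax curve standing in for the two-stage response surface:

```python
import math
import random

random.seed(11)

# Level 1 (hypothetical): expected log antibody count rises with dose
# via an Emax-type curve, with between-subject noise.
def log_antibody(dose):
    emax, ed50 = 8.0, 2.0
    return emax * dose / (ed50 + dose) + random.gauss(0, 0.5)

# Level 2 (hypothetical): survival probability is logistic in log antibody.
def survival_prob(log_ab, b0=-6.0, b1=1.2):
    return 1 / (1 + math.exp(-(b0 + b1 * log_ab)))

def predicted_survival(dose, n_draws=10000):
    """Propagate level-1 uncertainty into the level-2 prediction."""
    return sum(survival_prob(log_antibody(dose)) for _ in range(n_draws)) / n_draws

for dose in (0.5, 2.0, 8.0):
    print(dose, round(predicted_survival(dose), 3))
```

In a full Bayesian treatment, the fixed coefficients here would instead be draws from their posteriors, so the prediction would average over parameter uncertainty as well as between-subject variability.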