A modelling approach to the analysis of complex survey data
- Authors: Dlangamandla, Olwethu
- Date: 2021-10-29
- Subjects: Sampling (Statistics) , Linear models (Statistics) , Multilevel models (Statistics) , Logistic regression analysis , Complex survey data
- Language: English
- Type: Master's theses , text
- Identifier: http://hdl.handle.net/10962/192955 , vital:45284
- Description: Surveys are an essential tool for collecting data and most surveys use complex sampling designs to collect the data. Complex sampling designs are used mainly to enhance representativeness in the sample by accounting for the underlying structure of the population. This often results in data that are non-independent and clustered. Ignoring complex design features such as clustering, stratification, multistage and unequal probability sampling may result in inaccurate and incorrect inference. An overview of design-based and model-based approaches to inference for complex survey data, and of the differences between them, is given. This study adopts a model-based approach. The objective of this study is to discuss and describe the modelling approach in analysing complex survey data. This is specifically done by introducing the principal inference methods under which data from complex surveys may be analysed. In particular, discussions on the theory and methods of model fitting for the analysis of complex survey data are presented. We begin by discussing unique features of complex survey data and explore appropriate methods of analysis that account for the complexity inherent in the survey data. We also explore the widely applied logistic regression modelling of binary data in a complex sample survey context. In particular, four forms of logistic regression models are fitted. These models are generalized linear models, multilevel models, mixed effects models and generalized linear mixed models. Simulated complex survey data are used to illustrate the methods and models. Various R packages are used for the analysis. The results presented and discussed in this thesis indicate that a logistic mixed model with first and second level predictors has a better fit compared to a logistic mixed model with first level predictors.
In addition, a logistic multilevel model with first and second level predictors and nested random effects provides a better fit to the data compared to other logistic multilevel fitted models. Similar results were obtained from fitting a generalized logistic mixed model with first and second level predictor variables and a generalized linear mixed model with first and second level predictors and nested random effects. , Thesis (MSC) -- Faculty of Science, Statistics, 2021
- Date Issued: 2021-10-29
Generalized linear models, with applications in fisheries research
- Authors: Sidumo, Bonelwa
- Date: 2018
- Subjects: Western mosquitofish , Analysis of variance , Fisheries Catch effort South Africa Sundays River (Eastern Cape) , Linear models (Statistics) , Multilevel models (Statistics) , Experimental design
- Language: English
- Type: text , Thesis , Masters , MSc
- Identifier: http://hdl.handle.net/10962/61102 , vital:27975
- Description: Gambusia affinis (G. affinis) is an invasive fish species found in the Sundays River Valley of the Eastern Cape, South Africa. The relative abundance and population dynamics of G. affinis were quantified in five interconnected impoundments within the Sundays River Valley. This study utilised a G. affinis data set to demonstrate various classical ANOVA models. Generalized linear models were used to standardize catch per unit effort (CPUE) estimates and to determine environmental variables which influenced the CPUE. Based on the generalized linear model results, dam age, mean temperature, Oreochromis mossambicus abundance and Glossogobius callidus abundance had a significant effect on the G. affinis CPUE. The Albany Angling Association collected data during fishing tag and release events. These data were utilized to demonstrate repeated measures designs. Mixed-effects models provided a powerful and flexible tool for analyzing clustered data such as repeated measures data and nested data, hence they have become tremendously popular as a framework for the analysis of bio-behavioral experiments. The results show that the mixed-effects methods proposed in this study are more efficient than those based on generalized linear models. These data were better modeled with mixed-effects models due to their flexibility in handling missing data.
- Date Issued: 2018
Tolerance intervals for variance component models using a Bayesian simulation procedure
- Authors: Sarpong, Abeam Danso
- Date: 2013
- Subjects: Bayesian statistical decision theory , Multilevel models (Statistics)
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:10583 , http://hdl.handle.net/10948/d1021025
- Description: The estimation of variance components serves as an integral part of the evaluation of variation, and is of interest and required in a variety of applications (Hugo, 2012). Estimation of the among-group variance components is often desired for quantifying the variability and effectively understanding these measurements (Van Der Rijst, 2006). The methodology for determining Bayesian tolerance intervals for the one-way random effects model was originally proposed by Wolfinger (1998) using both informative and non-informative prior distributions (Hugo, 2012). Wolfinger (1998) also provided relationships with frequentist methodologies. From a Bayesian point of view, it is important to investigate and compare the effect on coverage probabilities if negative variance components are either replaced by zero, or completely disregarded from the simulation process. This research presents a simulation-based approach for determining Bayesian tolerance intervals in variance component models when negative variance components are either replaced by zero, or completely disregarded from the simulation process. This approach handles different kinds of tolerance intervals in a straightforward fashion. It makes use of a computer-generated sample (Monte Carlo process) from the joint posterior distribution of the mean and variance parameters to construct a sample from other relevant posterior distributions. This research makes use of only non-informative Jeffreys' prior distributions and uses three Bayesian simulation methods. Comparative results for the different tolerance intervals, obtained using a method where negative variance components are either replaced by zero or completely disregarded from the simulation process, are investigated and discussed in this research.
- Date Issued: 2013