Importance of various data sources in deterministic stock assessment models
- Authors: Northrop, Amanda Rosalind
- Date: 2008
- Subjects: Fish stock assessment -- Mathematical models , Fishery management -- Mathematical models , Fish populations -- Mathematical models , Error analysis (Mathematics) , Fishery management -- Statistical methods , Fish stock assessment -- Statistical methods
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:5571 , http://hdl.handle.net/10962/d1002811
- Description: In fisheries, advice for the management of fish populations is based upon management quantities that are estimated by stock assessment models. Fisheries stock assessment is a process in which data collected from a fish population are used to generate a model which enables the effects of fishing on a stock to be quantified. This study determined the effects of various data sources, assumptions, error scenarios and sample sizes on the accuracy with which the age-structured production model and the Schaefer model (assessment models) were able to estimate key management quantities for a fish resource similar to the Cape hakes (Merluccius capensis and M. paradoxus). An age-structured production model was used as the operating model to simulate hypothetical fish resource population dynamics for which management quantities could be determined by the assessment models. Different stocks were simulated with various harvest rate histories. These harvest rates produced Downhill trip data, where harvest rates increase over time until the resource is close to collapse, and Good contrast data, where the harvest rate increases over time until the resource is at less than half of its exploitable biomass, and then decreases, allowing the resource to rebuild. The accuracy of the assessment models was determined when data were drawn from the operating model with various combinations of error. The age-structured production model was more accurate at estimating maximum sustainable yield, maximum sustainable yield level and the maximum sustainable yield ratio. The Schaefer model gave more accurate estimates of Depletion and Total Allowable Catch. While the assessment models were able to estimate management quantities using Downhill trip data, the estimates improved significantly when the models were tuned with Good contrast data. When autocorrelation in the spawner-recruit curve was not accounted for by the deterministic assessment model, inaccuracy in parameter estimates was high. The assessment model management quantities were not greatly affected by multinomial ageing error in the catch-at-age matrices at a sample size of 5000 otoliths. Assessment model estimates were closer to their true values when log-normal error was assumed in the catch-at-age matrix, even when the true underlying error was multinomial. However, the multinomial had smaller coefficients of variation at all sample sizes (between 1000 and 10000 otoliths aged). It was recommended that the assessment model be chosen based on the management quantity of interest. When the underlying error is multinomial, the weighted log-normal likelihood function should be used in the catch-at-age matrix to obtain accurate parameter estimates. However, the multinomial likelihood should be used to minimise the coefficient of variation. Investigation into correcting for autocorrelation in the stock-recruitment relationship should be carried out, as it had a large effect on the accuracy of management quantities. (An illustrative Schaefer-dynamics sketch follows this record.)
- Full Text:
- Date Issued: 2008
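As context for the Schaefer model named above, a minimal Python sketch of Schaefer surplus-production dynamics; the parameter values and the rising-then-falling catch history are illustrative assumptions, not the thesis's hake values:

```python
import numpy as np

def schaefer(r, K, B0, catches):
    """Schaefer surplus-production dynamics:
    B[t+1] = B[t] + r*B[t]*(1 - B[t]/K) - C[t]."""
    B = [B0]
    for C in catches:
        B.append(max(B[-1] + r * B[-1] * (1 - B[-1] / K) - C, 1e-6))
    return np.array(B)

r, K = 0.4, 1000.0                  # hypothetical stock parameters
msy = r * K / 4                     # Schaefer MSY = rK/4
print(f"MSY = {msy:.0f}")

# a 'Good contrast'-style harvest history: rising catches, then a cutback
catches = np.concatenate([np.linspace(20, 150, 15), np.linspace(150, 60, 10)])
B = schaefer(r, K, K, catches)
print(f"final depletion B/K = {B[-1] / K:.2f}")
```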
A modelling approach to the analysis of complex survey data
- Authors: Dlangamandla, Olwethu
- Date: 2021-10-29
- Subjects: Sampling (Statistics) , Linear models (Statistics) , Multilevel models (Statistics) , Logistic regression analysis , Complex survey data
- Language: English
- Type: Master's theses , text
- Identifier: http://hdl.handle.net/10962/192955 , vital:45284
- Description: Surveys are an essential tool for collecting data and most surveys use complex sampling designs to collect the data. Complex sampling designs are used mainly to enhance representativeness in the sample by accounting for the underlying structure of the population. This often results in data that are non-independent and clustered. Ignoring complex design features such as clustering, stratification, multistage and unequal probability sampling may result in inaccurate and incorrect inference. An overview of, and difference between, design-based and model-based approaches to inference for complex survey data is discussed. This study adopts a model-based approach. The objective of this study is to discuss and describe the modelling approach in analysing complex survey data. This is specifically done by introducing the principal inference methods under which data from complex surveys may be analysed. In particular, discussions on the theory and methods of model fitting for the analysis of complex survey data are presented. We begin by discussing unique features of complex survey data and explore appropriate methods of analysis that account for the complexity inherent in the survey data. We also explore the widely applied logistic regression modelling of binary data in a complex sample survey context. In particular, four forms of logistic regression models are fitted. These models are generalized linear models, multilevel models, mixed effects models and generalized linear mixed models. Simulated complex survey data are used to illustrate the methods and models. Various R packages are used for the analysis. The results presented and discussed in this thesis indicate that a logistic mixed model with first and second level predictors has a better fit compared to a logistic mixed model with first level predictors. In addition, a logistic multilevel model with first and second level predictors and nested random effects provides a better fit to the data compared to other logistic multilevel fitted models. Similar results were obtained from fitting a generalized logistic mixed model with first and second level predictor variables and a generalized linear mixed model with first and second level predictors and nested random effects. (An illustrative random-intercept logistic sketch follows this record.) , Thesis (MSc) -- Faculty of Science, Statistics, 2021
- Full Text:
- Date Issued: 2021-10-29
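As a rough illustration of the random-intercept logistic models the abstract describes, a minimal sketch in Python using statsmodels' variational-Bayes mixed GLM; the thesis itself works in R, and all data, coefficients and predictor names here are simulated assumptions:

```python
import numpy as np
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

rng = np.random.default_rng(1)
n_clusters, m = 50, 20
cluster = np.repeat(np.arange(n_clusters), m)
u = rng.normal(0, 0.8, n_clusters)               # cluster random intercepts
x1 = rng.normal(size=n_clusters * m)             # level-1 predictor
x2 = np.repeat(rng.normal(size=n_clusters), m)   # level-2 predictor
eta = -0.5 + 0.9 * x1 + 0.6 * x2 + u[cluster]
y = rng.binomial(1, 1 / (1 + np.exp(-eta)))
df = pd.DataFrame({"y": y, "x1": x1, "x2": x2, "cluster": cluster})

# mixed-effects logistic model with a random intercept per cluster,
# fitted by variational Bayes
model = BinomialBayesMixedGLM.from_formula(
    "y ~ x1 + x2", {"cluster": "0 + C(cluster)"}, df)
result = model.fit_vb()
print(result.summary())
```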
Statistical and Mathematical Learning: an application to fraud detection and prevention
- Authors: Hamlomo, Sisipho
- Date: 2022-04-06
- Subjects: Credit card fraud , Bootstrap (Statistics) , Support vector machines , Neural networks (Computer science) , Decision trees , Machine learning , Cross-validation , Imbalanced data
- Language: English
- Type: Master's thesis , text
- Identifier: http://hdl.handle.net/10962/233795 , vital:50128
- Description: Credit card fraud is an ever-growing problem. There has been a rapid increase in the rate of fraudulent activities in recent years, resulting in a considerable loss to several organizations, companies and government agencies. Many researchers have focused on detecting fraudulent behaviours early using advanced machine learning techniques. However, credit card fraud detection is not a straightforward task since fraudulent behaviours usually differ for each attempt and the dataset is highly imbalanced, that is, the frequency of non-fraudulent cases outnumbers the frequency of fraudulent cases. In the case of the European credit card dataset, we have a ratio of approximately one fraudulent case to five hundred and seventy-eight non-fraudulent cases. Different methods were implemented to overcome this problem, namely random undersampling, one-sided sampling, SMOTE combined with Tomek links, and parameter tuning. Predictive classifiers, namely logistic regression, decision trees, k-nearest neighbour, support vector machines and multilayer perceptrons, are applied to predict whether a transaction is fraudulent or non-fraudulent. The models' performance is evaluated based on recall, precision, F1-score, the area under the receiver operating characteristic curve, the geometric mean and the Matthews correlation coefficient. The results showed that the logistic regression classifier performed better than the other classifiers except when the dataset was oversampled. (An illustrative resampling-and-classification sketch follows this record.) , Thesis (MSc) -- Faculty of Science, Statistics, 2022
- Full Text:
- Date Issued: 2022-04-06
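A minimal sketch of the resampling-plus-classification pipeline described above, using scikit-learn and imbalanced-learn on synthetic data standing in for the European credit card dataset:

```python
from imblearn.combine import SMOTETomek
from imblearn.metrics import geometric_mean_score
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, matthews_corrcoef, recall_score
from sklearn.model_selection import train_test_split

# synthetic, highly imbalanced data standing in for card transactions
X, y = make_classification(n_samples=20000, n_features=10,
                           weights=[0.998], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# oversample the minority class with SMOTE, then clean with Tomek links
X_bal, y_bal = SMOTETomek(random_state=0).fit_resample(X_tr, y_tr)

clf = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
y_hat = clf.predict(X_te)
print(f"recall = {recall_score(y_te, y_hat):.3f}")
print(f"F1     = {f1_score(y_te, y_hat):.3f}")
print(f"G-mean = {geometric_mean_score(y_te, y_hat):.3f}")
print(f"MCC    = {matthews_corrcoef(y_te, y_hat):.3f}")
```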
Rates of return to education of blacks in South Africa
- Authors: Serumaga-Zake, Philip A
- Date: 1991
- Subjects: Black people -- Education -- South Africa -- Statistics , Black people -- Employment -- South Africa -- Statistics , Black people -- South Africa -- Economic conditions -- Statistics
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:5565 , http://hdl.handle.net/10962/d1002084
- Description: The principal objectives of this empirical study were to test the hypothesis that education is a major determinant of people's earnings differentials and to calculate private and social rates of return to education of blacks in South Africa excluding Transkei, Bophuthatswana, Venda and Ciskei. Basically, the data for working men and women used in the study were extracted from the 1985 Current Population Survey files comprising a sample representative of the black population. Lifetime earnings profiles are constructed from these data for five educational levels, namely, no schooling up to standard 1, standards 2 to 4, standards 5 to 7, standards 8 to 9 and standard 10. Schooling is assumed to account for 60% of the income differentials between these profiles, after adjustment for the differing probabilities of finding work of persons in specific age-education groups. Imputed average household outlays on schooling are taken as the private direct cost of education, supplemented by estimates of per pupil spending by the various government departments responsible for black schooling for calculation of the social costs per year of primary and secondary schooling. Indirect costs in the form of imputed foregone earnings are included from standard 5 (age 15) onwards. The resulting private internal rates of return to education of males are about 16% at primary level and 24% for secondary schooling. Corresponding social rates of return are about 6% for primary and 15% for secondary education. The estimates for females indicate that between no schooling and the standards 2 to 4 level, the private and social rates of return are -1% and -4% respectively; from standards 2 to 4 to the standards 5 to 7 level, private returns of 12% and social returns of 4% are reported; and for the remaining secondary school phases, private returns of 32% and social returns of 15% are estimated. It is implied that black education is receiving minimal government financial assistance compared to that of the other population groups. The evidence of the study indicates that, besides education, marital status, locational, regional and occupational variables also influence earnings differentials; that the governments responsible for black education should emphasize human capital investment relative to physical capital investment; that on average more educated persons are better off than less educated ones; and that, with the exception of female early primary schooling, it is generally worthwhile for an individual to undertake an educational programme investment. (An illustrative internal-rate-of-return calculation follows this record.)
- Full Text:
- Date Issued: 1991
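The internal rates of return reported above are the discount rates that equate discounted costs of schooling with discounted earnings gains; a self-contained sketch of that calculation in Python, with an invented cash-flow stream for illustration:

```python
def npv(rate, cashflows):
    """Net present value of a stream indexed by year 0, 1, 2, ..."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

def irr(cashflows, lo=-0.99, hi=1.0, tol=1e-8):
    """Internal rate of return by bisection; assumes npv changes sign
    on [lo, hi], which holds for a costs-then-benefits stream."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(lo, cashflows) * npv(mid, cashflows) <= 0:
            hi = mid            # root lies in [lo, mid]
        else:
            lo = mid
    return (lo + hi) / 2

# hypothetical stream: 3 years of schooling costs plus foregone earnings,
# followed by 40 years of incremental earnings
flows = [-5000, -5000, -5000] + [1500] * 40
print(f"internal rate of return: {irr(flows):.1%}")
```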
Application of multiserver queueing to call centres
- Authors: Majakwara, Jacob
- Date: 2010
- Subjects: Call centers , ERLANG (Computer program language) , Queuing theory
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:5578 , http://hdl.handle.net/10962/d1015461
- Description: The simplest and most widely used queueing model in call centres is the M/M/k system, sometimes referred to as Erlang-C. For many applications the model is an over-simplification: the Erlang-C model ignores, among other things, busy signals, customer impatience and services that span multiple visits. Although the Erlang-C formula is easily implemented, it is not easy to obtain insight from its answers (for example, to find an approximate answer to questions such as "how many additional agents do I need if the arrival rate doubles?"). An approximation of the Erlang-C formula that gives structural insight into this type of question would help in understanding economies of scale in call centre operations. Erlang-C based predictions can also turn out highly inaccurate because of violations of underlying assumptions, and these violations are not straightforward to model. For example, non-exponential service times lead one to the M/G/k queue which, in stark contrast to the M/M/k system, is difficult to analyse. This thesis deals mainly with the general M/GI/k model with abandonment. The arrival process conforms to a Poisson process, service durations are independent and identically distributed with a general distribution, there are k servers, and customer abandonment times are independent and identically distributed with a general distribution. This thesis will endeavour to analyse call centres using the M/GI/k model with abandonment, and the data to be used will be simulated using the EZSIM software. The paper by Brown et al. [3] entitled "Statistical Analysis of a Telephone Call Centre: A Queueing-Science Perspective" will be the basis upon which this thesis is built. (An illustrative Erlang-C computation follows this record.)
- Full Text:
- Date Issued: 2010
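The Erlang-C formula mentioned above gives the probability that an arriving customer must queue in an M/M/k system; a direct transcription in Python (the arrival and service rates in the example are invented):

```python
from math import factorial

def erlang_c(arrival_rate, service_rate, k):
    """P(wait) in an M/M/k queue (the Erlang-C formula)."""
    a = arrival_rate / service_rate          # offered load in Erlangs
    rho = a / k                              # server utilisation
    if rho >= 1:
        return 1.0                           # unstable: everyone waits
    top = a ** k / factorial(k) / (1 - rho)
    bottom = sum(a ** n / factorial(n) for n in range(k)) + top
    return top / bottom

# example: 100 calls/hr, 12 calls/hr per agent, 10 agents
p_wait = erlang_c(100, 12, 10)
print(f"P(wait) = {p_wait:.3f}")
# mean queueing time under Erlang-C: W_q = P(wait) / (k*mu - lambda)
print(f"W_q = {p_wait / (10 * 12 - 100):.4f} hours")
```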
Generalized linear models, with applications in fisheries research
- Authors: Sidumo, Bonelwa
- Date: 2018
- Subjects: Western mosquitofish , Analysis of variance , Fisheries -- Catch effort -- South Africa -- Sundays River (Eastern Cape) , Linear models (Statistics) , Multilevel models (Statistics) , Experimental design
- Language: English
- Type: text , Thesis , Masters , MSc
- Identifier: http://hdl.handle.net/10962/61102 , vital:27975
- Description: Gambusia affinis (G. affinis) is an invasive fish species found in the Sundays River Valley of the Eastern Cape, South Africa. The relative abundance and population dynamics of G. affinis were quantified in five interconnected impoundments within the Sundays River Valley. This study utilised a G. affinis data set to demonstrate various classical ANOVA models. Generalized linear models were used to standardize catch per unit effort (CPUE) estimates and to determine environmental variables which influenced the CPUE. Based on the generalized linear model results, dam age, mean temperature, Oreochromis mossambicus abundance and Glossogobius callidus abundance had a significant effect on the G. affinis CPUE. The Albany Angling Association collected data during fishing tag and release events. These data were utilized to demonstrate repeated measures designs. Mixed-effects models provided a powerful and flexible tool for analyzing clustered data such as repeated measures data and nested data; hence they have become tremendously popular as a framework for the analysis of bio-behavioral experiments. The results show that the mixed-effects methods proposed in this study are more efficient than those based on generalized linear models. These data were better modeled with mixed-effects models due to their flexibility in handling missing data. (An illustrative CPUE-standardisation sketch follows this record.)
- Full Text:
- Date Issued: 2018
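A hedged sketch of CPUE standardisation with a generalized linear model, along the lines the abstract describes; the data are simulated, the covariate names echo the abstract, and the Gamma family with log link is one common illustrative choice rather than the thesis's fitted model:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 300
df = pd.DataFrame({
    "dam_age": rng.uniform(5, 60, n),
    "mean_temp": rng.uniform(14, 26, n),
    "year": rng.integers(2010, 2016, n).astype(str),
})
# simulate strictly positive CPUE with a log-linear mean
mu = np.exp(-1.0 + 0.02 * df.dam_age + 0.08 * df.mean_temp)
df["cpue"] = rng.gamma(shape=2.0, scale=mu / 2.0)

# Gamma GLM with log link: a common choice for standardising CPUE
# against environmental covariates, with year as a factor
fit = smf.glm("cpue ~ dam_age + mean_temp + C(year)", data=df,
              family=sm.families.Gamma(link=sm.families.links.Log())).fit()
print(fit.summary())
```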
Bayesian accelerated life tests: exponential and Weibull models
- Authors: Izally, Sharkay Ruwade
- Date: 2016
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: http://hdl.handle.net/10962/3003 , vital:20351
- Description: Reliability life testing is used for life data analysis in which samples are tested under normal conditions to obtain failure time data for reliability assessment. It can be costly and time consuming to obtain failure time data under normal operating conditions if the mean time to failure of a product is long. An alternative is to use failure time data from an accelerated life test (ALT) to extrapolate the reliability under normal conditions. In accelerated life testing, the units are placed under a higher than normal stress condition, such as voltage, current, pressure or temperature, to make the items fail in a shorter period of time. The failure information is then transformed through an acceleration model, commonly known as the time transformation function, to predict the reliability under normal operating conditions. The power law will be used as the time transformation function in this thesis. We will first consider a Bayesian inference model under the assumption that the underlying life distribution in the accelerated life test is exponentially distributed. The maximal data information (MDI) prior, the Ghosh, Mergel and Liu (GML) prior and the Jeffreys prior will be derived for the exponential distribution. The propriety of the posterior distributions will be investigated. Results will be compared when using these non-informative priors in a simulation study by looking at the posterior variances. The Weibull distribution as the underlying life distribution in the accelerated life test will also be investigated. The maximal data information prior will be derived for the Weibull distribution using the power law. The uniform prior and a mixture of Gamma and uniform priors will be considered. The propriety of these posteriors will also be investigated. The predictive reliability at the use-stress will be computed for these models. The deviance information criterion will be used to compare these priors. As a result of using a time transformation function, Bayesian inference becomes analytically intractable and Markov Chain Monte Carlo (MCMC) methods will be used to alleviate this problem. The Metropolis-Hastings algorithm will be used to sample from the posteriors for the exponential model in the accelerated life test. The adaptive rejection sampling method will be used to sample from the posterior distributions when the Weibull model is considered. (An illustrative Metropolis-Hastings sketch follows this record.)
- Full Text:
- Date Issued: 2016
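A minimal illustration of Metropolis-Hastings sampling for the exponential accelerated life test under an inverse power law, as sketched in the abstract; the flat priors and all numerical values are illustrative assumptions, not the MDI, GML or Jeffreys priors derived in the thesis:

```python
import numpy as np

rng = np.random.default_rng(42)

# simulate failure times at three elevated stress levels under an
# inverse power law: mean life theta(V) = alpha / V**beta
alpha_true, beta_true = 5e4, 1.6
stresses = np.array([120.0, 150.0, 180.0])
data = [(V, rng.exponential(alpha_true / V ** beta_true, size=30))
        for V in stresses]

def log_post(log_alpha, beta):
    # exponential likelihood with flat (improper) priors
    lp = 0.0
    for V, t in data:
        theta = np.exp(log_alpha) / V ** beta
        lp += -len(t) * np.log(theta) - t.sum() / theta
    return lp

# random-walk Metropolis-Hastings on (log alpha, beta)
theta = np.array([np.log(1e4), 1.0])
chain = []
for _ in range(20000):
    prop = theta + rng.normal(0, [0.05, 0.05])
    if np.log(rng.uniform()) < log_post(*prop) - log_post(*theta):
        theta = prop
    chain.append(theta)
chain = np.array(chain)[5000:]          # discard burn-in
alpha_hat = np.exp(chain[:, 0].mean())
beta_hat = chain[:, 1].mean()
print(f"posterior means: alpha ~ {alpha_hat:.0f}, beta ~ {beta_hat:.2f}")
# plug-in estimate of mean life at a hypothetical use stress V = 60
print(f"mean life at V=60: {alpha_hat / 60 ** beta_hat:.0f}")
```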
Bayesian hierarchical modelling with application in spatial epidemiology
- Authors: Southey, Richard Robert
- Date: 2018
- Subjects: Bayesian statistical decision theory , Spatial analysis (Statistics) , Medical mapping , Pericarditis , Mortality -- Statistics
- Language: English
- Type: text , Thesis , Masters , MSc
- Identifier: http://hdl.handle.net/10962/59489 , vital:27617
- Description: Disease mapping and spatial statistics have become an important part of modern-day statistics and have increased in popularity as the methods and techniques have evolved. The application of disease mapping is not confined to the analysis of diseases, as other applications of disease mapping can be found in econometric and financial disciplines. This thesis will consider two data sets. These are the Georgia oral cancer 2004 data set and the South African acute pericarditis 2014 data set. The Georgia data set will be used to assess the hyperprior sensitivity of the precision for the uncorrelated heterogeneity and correlated heterogeneity components in a convolution model. The correlated heterogeneity will be modelled by a conditional autoregressive prior distribution and the uncorrelated heterogeneity will be modelled with a zero mean Gaussian prior distribution. The sensitivity analysis will be performed using three models with conjugate, Jeffreys' and a fixed parameter prior for the hyperprior distribution of the precision for the uncorrelated heterogeneity component. A simulation study will be done to compare four prior distributions, namely the conjugate, Jeffreys', probability matching and divergence priors. The three models will be fitted in WinBUGS® using a Bayesian approach. The results of the three models will be in the form of disease maps, figures and tables. The results show that the hyperprior of the precision for the uncorrelated heterogeneity and correlated heterogeneity components is sensitive to changes and will yield different results depending on the specification of the hyperprior distribution of the precision for the two components in the model. The South African data set will be used to examine whether there is a difference between the proper conditional autoregressive prior and the intrinsic conditional autoregressive prior for the correlated heterogeneity component in a convolution model. Two models will be fitted in WinBUGS® for this comparison. Both hyperpriors of the precision for the uncorrelated heterogeneity and correlated heterogeneity components will be modelled using a Jeffreys' prior distribution. The results show that there is no significant difference between the results of the model with a proper conditional autoregressive prior and the one with an intrinsic conditional autoregressive prior for the South African data, although there are a few disadvantages of using a proper conditional autoregressive prior for the correlated heterogeneity, which are stated in the conclusion. (An illustrative convolution-model simulation follows this record.)
- Full Text:
- Date Issued: 2018
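To make the convolution-model structure concrete, a small forward-simulation sketch: correlated heterogeneity u drawn from a proper CAR prior plus uncorrelated heterogeneity v, on an invented chain of areas. The thesis fits these models in WinBUGS®; this Python code only simulates from the model, and all parameter values are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# convolution (BYM-style) model on a 1-D chain of 20 areas:
# log relative risk = b0 + u_i (spatially correlated) + v_i (unstructured)
n = 20
W = np.zeros((n, n))                    # adjacency: neighbours on a chain
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1
D = np.diag(W.sum(axis=1))

# proper CAR prior: u ~ N(0, [tau*(D - rho*W)]^-1); rho < 1 keeps the
# precision matrix proper (rho -> 1 recovers the intrinsic CAR)
tau, rho = 4.0, 0.95
Q = tau * (D - rho * W)
u = rng.multivariate_normal(np.zeros(n), np.linalg.inv(Q))
v = rng.normal(0, 0.3, n)               # uncorrelated heterogeneity

E = rng.uniform(20, 80, n)              # expected counts per area
y = rng.poisson(E * np.exp(-0.1 + u + v))
print("SMR per area:", np.round(y / E, 2))
```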
The application of statistical classification to predict sovereign default
- Authors: Vele, Rendani
- Date: 2023-10-13
- Subjects: Uncatalogued
- Language: English
- Type: Academic theses , Master's theses , text
- Identifier: http://hdl.handle.net/10962/424563 , vital:72164
- Description: When considering sovereign loans, it is imperative for a financial institution to have a good understanding of the sovereign it is transacting with. Defaults can occur if proper evaluation steps are not taken. To aid in the prediction of potential sovereign defaults, financial institutions, together with grading companies, quantify the risk associated with issuing a loan to a sovereign by developing sovereign default early warning systems (EWS). Various classification models are considered in this study to develop sovereign default EWS. These models are the binary logit, probit, Bayesian additive regression trees, and artificial neural networks. This study investigates the predictive performance of the various classification techniques. Sovereign information is not readily available, so missing data techniques are considered in order to counter the data availability issue. Sovereign defaults are rare, which results in an imbalance in the distribution of the binary dependent variable. To assess data sets with such characteristics, metrics for imbalanced data are considered for model performance comparison. From the findings, the Bayesian additive regression technique generated better results than the other techniques when considering a basic data analysis. Moreover, when cross-validation was considered, the neural network technique performed best. In addition, regional models had better results than the global model when considering model predictive capability. The significance of this study is to develop sovereign default prediction models using various classification techniques, enhancing previous literature and analysis through the application of Bayesian additive regression trees. (An illustrative cross-validated comparison follows this record.) , Thesis (MSc) -- Faculty of Science, Statistics, 2023
- Full Text:
- Date Issued: 2023-10-13
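A rough sketch of cross-validated classifier comparison on a rare-event binary target, in the spirit of the study; the data are synthetic, and gradient boosting stands in for BART, which scikit-learn does not provide:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# synthetic stand-in for sovereign indicator data: defaults are rare
X, y = make_classification(n_samples=2000, n_features=12, weights=[0.95],
                           random_state=3)

# average precision suits the rare-event target better than accuracy
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=3)
for name, clf in [("logit", LogisticRegression(max_iter=1000)),
                  ("boosted trees", GradientBoostingClassifier())]:
    scores = cross_val_score(clf, X, y, cv=cv, scoring="average_precision")
    print(f"{name}: mean AP = {scores.mean():.3f} +/- {scores.std():.3f}")
```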
Enhancing the use of large-scale assessment data in South Africa: Multidimensional Item Response Theory
- Authors: Lahoud, Tamlyn Ann
- Date: 2023-03-29
- Subjects: Uncatalogued
- Language: English
- Type: Academic theses , Master's theses , text
- Identifier: http://hdl.handle.net/10962/422389 , vital:71938
- Description: This research aims to enhance the use of large-scale assessment data in South Africa by evaluating assessment validity by means of multidimensional item response theory and its associated statistical techniques, which have been severely underutilised. Data from the 2014 administration of the grade 6 Mathematics annual national assessment was used in this study and all analyses were conducted using the mirt package in R. A two-parameter logistic item response theory model was developed which indicated a clear alignment between the model parameters and the difficulty specifications of the test. The test was found to favour learners within the central band of the ability scale. An exploratory five-dimensional item response theory model was then developed to investigate the alignment with the test specifications as evidence for construct validity. Significant discrepancies between the factor structure and the specifications of the test were identified. Notably, the results suggest that some items measured an ability that was not purely mathematical, such as reading ability, which would distort the test's representation of Mathematics ability, disadvantage learners with lower English literacy, and reduce the construct validity of the test. Further validity evidence was obtained by differential item functioning analyses, which revealed that fourteen items function differently for learners from different provinces. Although possible reasons for the presence of differential item functioning among provinces were not discussed, its presence provided sufficient evidence against the validity of the test. In conclusion, multidimensional item response theory provided an effective and rigorous approach to establishing the validity of a large-scale assessment. To avoid the pitfalls of the annual national assessments, it is recommended that these multidimensional item response theory and differential item functioning techniques be utilised for the development and evaluation of future national assessment instruments in South Africa. (An illustrative two-parameter logistic computation follows this record.) , Thesis (MSc) -- Faculty of Science, Statistics, 2023
- Full Text:
- Date Issued: 2023-03-29
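The two-parameter logistic model named above has a closed-form item response function; a small numpy sketch (the item parameters are invented, and the thesis itself uses the mirt package in R):

```python
import numpy as np

def p_2pl(theta, a, b):
    """Two-parameter logistic item response function:
    P(correct | theta) = 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# hypothetical item bank: discriminations a, difficulties b
a = np.array([1.2, 0.8, 1.5])
b = np.array([-0.5, 0.0, 1.0])

theta = np.linspace(-3, 3, 7)
for t in theta:
    print(f"theta={t:+.1f}  P(correct)={np.round(p_2pl(t, a, b), 2)}")

# test information peaks where the items discriminate best:
# I(theta) = sum_i a_i^2 * P_i * (1 - P_i)
P = p_2pl(theta[:, None], a, b)
info = (a ** 2 * P * (1 - P)).sum(axis=1)
print("test information:", np.round(info, 2))
```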