Philosophiae Doctor - PhD (Statistics and Population Studies)
Browsing by Author "Kotze, Danelle"
Now showing 1 - 4 of 4
Item Analysis and estimation of customer survival time in subscription-based businesses (University of the Western Cape, 2008). Mohammed, Zakariya Mohammed Salih; Kotze, Danelle; Maritz, Johannes Stefan. Dept. of Statistics, Faculty of Science.

Subscription-based industries have seen a massive expansion in recent decades. In this type of industry the customer has to subscribe in order to enjoy the service; therefore, well-defined start and end points of the customer's relationship with the service provider are known. The length of this relationship, that is, the time from subscription to service cancellation, is defined as customer survival time. Unlike transaction-based businesses, where the emphasis is on the quality of a product and on customer acquisition, subscription-based businesses focus on the customer and on customer retention. A customer focus requires a new approach: managing according to customer equity (the value of a firm's customers) rather than brand equity (the value of a firm's brands). The concept of customer equity is attractive and straightforward, but the implementation and management of the customer equity approach do present some challenges. Amongst these challenges is that the customer asset metric - customer lifetime value (the present value of all future profits generated from a customer) - depends upon assumptions about the expected survival time of the customer (Bell et al., 2002; Gupta and Lehmann, 2003). In addition, managing and valuing customers as an asset require extensive data and complex modelling. The aim of this study is to illustrate, adapt and develop methods of survival analysis for analysing and estimating customer survival time in subscription-based businesses. Two particular objectives are studied. The first objective is to redefine the existing survival analysis techniques in business terms and to discuss their uses in order to understand various issues related to the customer-firm relationship.
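The customer lifetime value defined in this abstract - the present value of all future profits generated from a customer - is conventionally computed by discounting the per-period margin, weighted by the probability that the customer has survived to each period. A minimal sketch of this standard calculation; the margin, retention rate, discount rate and horizon below are illustrative values, not figures from the thesis:

```python
def customer_lifetime_value(margin, retention, discount, horizon):
    """Present value of the expected profit stream from one customer:
    per-period margin, weighted by the probability of still being
    subscribed at period t, discounted back to today."""
    return sum(
        margin * retention ** t / (1.0 + discount) ** t
        for t in range(1, horizon + 1)
    )

# Illustrative values only: 100 margin per period, 80% retention,
# 10% discount rate, valued over 20 periods.
clv = customer_lifetime_value(100.0, 0.80, 0.10, 20)
```

Because the retention rate enters as a survival probability, any error in the estimated survival time propagates directly into the valuation - the dependence the abstract highlights.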
The lesson to be learnt here is the ability of survival analysis techniques to extract important information on customers with regard to their loyalty, their risk of cancelling the service, and their lifetime value. The ultimate outcome of this process of studying customer survival time is an understanding of the dynamics and behaviour of customers with respect to their risk of cancellation, survival probability and lifetime value. The estimates of customer mean survival time obtained from different nonparametric and parametric approaches - namely, the Kaplan-Meier method and the exponential, Weibull and gamma regression models - were found to vary greatly, showing the importance of the assumption imposed on the distribution of the survival time. The second objective is to extrapolate the customer survival curve beyond the empirical distribution. The practical motivation for doing so originates from two issues: calculating survival probabilities (retention rates) beyond the empirical data, and calculating the conditional survival probability and conditional mean survival time at a specific point in time and for a specific time window in the future. The survival probabilities are the main components needed to calculate customer lifetime value and, thereafter, customer equity. In this regard, we propose a survivor function that can be used to extrapolate the survival probabilities beyond the last observed failure time; the estimation of the parameters of the newly proposed extrapolation function is based entirely on the Kaplan-Meier estimates of the survival probabilities. The proposed function shows good mathematical accuracy. Furthermore, the standard error of the estimate of the extrapolation survival function has been derived. The function is ready to be used by business managers where the objective is to enhance customer retention and to emphasise a customer-centric approach.
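The abstract does not give the functional form of the proposed extrapolation function, so the sketch below only illustrates the general idea: estimate the Kaplan-Meier curve from censored churn data, then fit a simple parametric tail (here exponential, an assumed form chosen for this sketch) to the Kaplan-Meier points so that survival probabilities can be read off beyond the last observed failure time:

```python
import math

def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of S(t).
    times: observed durations; events: 1 = cancellation, 0 = censored.
    Returns (time, survival probability) pairs at each event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, s, i = [], 1.0, 0
    while i < len(data):
        t = data[i][0]
        d = n = 0
        while i < len(data) and data[i][0] == t:
            d += data[i][1]  # cancellations at time t
            n += 1           # everyone leaving the risk set at time t
            i += 1
        if d:
            s *= 1.0 - d / n_at_risk
            surv.append((t, s))
        n_at_risk -= n
    return surv

def exponential_tail(km):
    """Fit S(t) = exp(-lam * t) to the Kaplan-Meier points by least
    squares on -log S(t) = lam * t (regression through the origin)."""
    num = sum(t * -math.log(s) for t, s in km if s > 0)
    den = sum(t * t for t, s in km if s > 0)
    return lambda t, lam=num / den: math.exp(-lam * t)

# Months until cancellation for ten customers (0 = still subscribed
# at last observation); illustrative data, not from the thesis.
months  = [2, 3, 3, 5, 6, 8, 12, 12, 15, 20]
churned = [1, 1, 0, 1, 1, 1, 1, 0, 1, 0]
S = exponential_tail(kaplan_meier(months, churned))
retention_at_36 = S(36.0)  # extrapolated beyond the last observed time
```

Basing the tail fit entirely on the Kaplan-Meier estimates mirrors the approach the abstract describes for the thesis's own (unspecified) extrapolation function.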
The extrapolation function can be applied beyond customer survival time data, for example to clinical trial applications. In general, the survival analysis techniques were found to be valuable in understanding and managing the customer-firm relationship; yet much still needs to be done in this area of research to make these techniques, traditionally used in medical studies, more useful and applicable in business settings.

Item A framework for evaluating an introductory statistics programme at the University of the Western Cape (University of the Western Cape, 2009). Makapela, Nomawabo; Kotze, Danelle. Dept. of Statistics, Faculty of Science.

There have been calls from both the government and the private sector for Higher Education institutions to introduce programmes that produce employable graduates while at the same time contributing to the growing economy of the country by addressing the skills shortage. Transformation and intervention committees have since been introduced to monitor the extent to which these challenges are being addressed (DOE, 1996; 1997; Luescher and Symes, 2003; Forbes, 2007). Amongst the issues that needed urgent attention were the skills shortage and the underperformance of students, particularly university-entering students (Daniels, 2007; De Klerk, 2006; Cooper, 2001). Research, particularly in the South African context, has revealed several factors contributing to the underperformance of university-entering students and to the shortage of skills: the legacy of apartheid (which forced certain racial groups to focus on selected areas such as teaching and nursing), the schooling system (which leaves university-entering students struggling), and the gap between the home language and the academic language. Barrell (1998) places stress on language as a contributing factor towards the performance of students. Although not much research has been done on the skills shortage, most of the areas with a skills shortage require Mathematics, whether on a minimal or a comprehensive scale.
Students who have a strong Mathematics background in Grade 12 have been shown to perform better than students who have a limited Mathematics background or none at all (Hahn, 1988; Conners, McCown & Roskos-Ewoldsen, 1998; Nolan, 2002). The Department of Statistics offers an Introductory Statistics (IS) course at first-year level. Resources available to enhance student learning include a problem-solving component with web-based tutorials, in addition to three hours of lectures per week. The course material and all the necessary information regarding the course, including teach-yourself problems and useful websites and links that students can make use of, are stored in the Knowledge Environment for Web-based Learning (KEWL). Despite all the available information, the students were not performing well and were not interested in the course. The department regards statistical numeracy as a life skill; its desire is to break down the fear of Statistics and to bring about a change in students' mindsets. The study was part of a contribution to ensuring that the department has the best first-year Statistics students in the Western Cape, achieving a success rate comparable to the national norm.

Item Imputation techniques for non-ordered categorical missing data (University of the Western Cape, 2016). Karangwa, Innocent; Kotze, Danelle; Blignaut, Renette.

Missing data are common in survey data sets. Enrolled subjects often do not have data recorded for all variables of interest. The inappropriate handling of missing data may lead to bias in the estimates and to incorrect inferences. Therefore, special attention is needed when analysing incomplete data. Multivariate normal imputation (MVNI) and multiple imputation by chained equations (MICE) have emerged as the best techniques to impute, or fill in, missing data.
The former assumes a normal distribution for the variables in the imputation model, but can also handle missing data whose distributions are not normal. The latter fills in missing values taking into account the distributional form of the variables to be imputed. The aim of this study was to determine the performance of these methods when data are missing at random (MAR) or missing completely at random (MCAR) on unordered or nominal categorical variables treated as predictors or as response variables in regression models. Both dichotomous and polytomous variables were considered in the analysis. The baseline data used were the 2007 Demographic and Health Survey (DHS) data from the Democratic Republic of Congo. The analysis model of interest was the logistic regression of the woman's contraceptive method use status on her marital status, controlling or not for other covariates (continuous, nominal and ordinal). Based on the data set with missing values, data sets with observations missing at random or missing completely at random on either the covariates or the response variables measured on a nominal scale were first simulated, and then used for imputation purposes. Under the MVNI method, unordered categorical variables were first dichotomised, and then K − 1 dichotomised variables (where K is the number of levels of the categorical variable of interest) were included in the imputation model, leaving the remaining category as a reference. These variables were imputed as continuous variables using a linear regression model. Imputation with MICE considered the distributional form of each variable to be imputed; that is, imputations were drawn using binary and multinomial logistic regressions for dichotomous and polytomous variables, respectively.
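The dichotomisation step described above for MVNI can be sketched as follows; the rule used here for mapping the imputed continuous indicators back to a single category is an illustrative choice, not necessarily the one used in the thesis:

```python
def dummy_code(value, levels):
    """K-1 indicator coding for a nominal variable; the first level in
    `levels` is the reference category (all indicators zero)."""
    return [1.0 if value == lev else 0.0 for lev in levels[1:]]

def nearest_category(imputed, levels):
    """Map a vector of imputed (continuous) indicators back to a level:
    pick the largest indicator if it exceeds 0.5, else the reference."""
    best = max(range(len(imputed)), key=lambda j: imputed[j])
    return levels[best + 1] if imputed[best] > 0.5 else levels[0]

# Hypothetical levels for a nominal variable such as contraceptive method.
levels = ["none", "pill", "injection", "other"]
codes = dummy_code("pill", levels)          # [1.0, 0.0, 0.0]
back = nearest_category([0.2, 0.7, 0.1], levels)
```

MICE avoids this detour entirely by drawing imputations from a multinomial logistic model, which is why the two methods can disagree when the nominal variable is the one with missing values.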
The performance of these methods was evaluated in terms of bias and standard errors of the regression coefficients estimated to determine the association between the woman's contraceptive method use status and her marital status, controlling or not for other types of variables. The analysis was first done assuming that the sample was not weighted; the sample weight was then taken into account to assess whether the sample design would affect the performance of the multiple imputation methods of interest, namely MVNI and MICE. As expected, the results showed that for all the models, MVNI and MICE produced less biased estimates and smaller standard errors than the case deletion (CD) method, which discards cases with missing values from the analysis. Moreover, it was found that when data were missing (MCAR or MAR) on the nominal variables that were treated as predictors in the regression model, MVNI reduced bias in the regression coefficients and standard errors compared to MICE, for both unweighted and weighted data sets. On the other hand, the results indicated that MICE outperformed MVNI when data were missing on the response variables, whether binary or polytomous. Furthermore, it was noted that the sample design (sample weights), the rates of missingness and the missing data mechanisms (MCAR or MAR) did not affect the behaviour of the multiple imputation methods considered in this study. Thus, based on these results, it can be concluded that when missing values are present on outcome variables measured on a nominal scale in regression models, the distributional form of the variable with missing values should be taken into account.
When these variables are used as predictors (with missing observations), the parametric imputation approach (MVNI) would be a better option than MICE.

Item Statistical modelling of clustered and incomplete data with applications in population health studies in developing countries (University of the Western Cape, 2014). Adegboye, Oyelola Abdulwasiu; Kotze, Danelle.

The United Nations (UN) drafted eight Millennium Development Goals (MDGs) to be achieved by the year 2015, namely: eradicating extreme poverty and hunger, achieving universal primary education, promoting gender equality and women's empowerment, reducing child mortality, improving maternal health, combating HIV/AIDS, malaria and other diseases, ensuring environmental sustainability and, lastly, developing a global partnership for development. Public health studies often produce complicated and complex data sets; these data sets may be clustered, multivariate, longitudinal, hierarchical, spatial, temporal or spatio-temporal. This often results in what is called correlated data, because the assumption of independence among observations may not be appropriate. Shared genetic traits in studies of illness, or shared household characteristics among family members in studies of poverty, are examples of correlated data. In cross-sectional studies, individuals may be nested within sub-clusters (e.g., families) that are in turn nested within clusters (e.g., environments), thus causing correlation within clusters. Ignoring the structure of the data may result in asymptotically biased parameter estimates. Clustered data may also be the result of geographical location or time (spatial and temporal clustering). A crucial step in modelling correlated data is the specification of the dependency by choosing the covariance/correlation function. However, the choice for a particular application is often unclear, and diagnostic tests have to be carried out after fitting a model.
With a view to developing countries, this study investigates the prospects of achieving the MDGs through the development of flexible predictive statistical models. The first objective of this study is to explore the existing methods for modelling correlated data sets (hierarchical, multilevel and spatial) and then to apply these methods in a novel way to several data sets addressing the underlying MDGs. One of the most challenging issues in spatial or spatio-temporal analysis is the choice of a valid and yet flexible correlation (covariance) structure. In cases of high dimensionality, where the number of spatial locations or time points that produced the observations is large, the analysis of such data presents great computational challenges. It is debatable whether some of the classical correlation structures adequately reflect the dependency in the data. The second objective is to propose a new flexible technique for handling spatial, temporal and spatio-temporal correlations. The goal is to resolve these dependency problems by proposing a more robust method for modelling spatial correlation. The techniques are applied to different correlation structures and then combined to form the resulting estimating equations on the platform of the Generalized Method of Moments. The proposed model is therefore built on a foundation of Generalized Estimating Equations; this has the advantage of producing consistent regression parameter estimates under mild conditions, because the estimation of the regression parameters is separated from the modelling of the correlation. Thirdly, to account for spatio-temporal correlation in data sets, a method that decouples the two sources of correlation is proposed. Specifically, the spatial and temporal effects are modelled separately and then combined optimally.
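As a concrete example of one of the classical correlation structures whose adequacy the study debates, the exponential model lets correlation between two locations decay with their distance. A minimal sketch; the exponential form and the range parameter `phi` are illustrative assumptions, not choices stated in the abstract:

```python
import math

def exponential_correlation(coords, phi):
    """Exponential spatial correlation matrix: R[i][j] = exp(-d_ij / phi),
    with d_ij the Euclidean distance between locations i and j and phi
    the range parameter controlling how fast correlation decays."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    return [[math.exp(-dist(p, q) / phi) for q in coords] for p in coords]

# Three hypothetical survey sites; phi = 5.0 is an illustrative range.
sites = [(0.0, 0.0), (0.0, 3.0), (4.0, 0.0)]
R = exponential_correlation(sites, phi=5.0)
```

The matrix is symmetric with unit diagonal, and the correlation depends only on distance (stationarity and isotropy); whether such a structure adequately reflects the dependency in real data is exactly the question the study raises.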
The approach circumvents the need to invert the full covariance matrix and simplifies the modelling of complex relationships such as anisotropy, which is known to be extremely difficult. Lastly, large public health data sets contain a high proportion of zero counts, where it is very difficult to distinguish between "true" zeros and "imputed" zeros. This can be due to the reporting mechanism, as a result of insecurity and technical and logistical issues. The focus is therefore on implementing a technique capable of handling such a problem. The study makes the assumption that "imputed" zeros are a random event, considers the option of discarding the zeros, and then fits a conditional Poisson model, conditioning on all counts greater than 0.
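The conditional Poisson model described above - discarding the zeros and conditioning on counts greater than zero - is the zero-truncated Poisson. A minimal sketch of its probability mass function and of estimating the rate parameter from positive counts; the bisection solver is an illustrative implementation choice:

```python
import math

def zt_poisson_pmf(k, lam):
    """P(Y = k | Y > 0) for a Poisson(lam) count truncated at zero."""
    if k < 1:
        return 0.0
    return math.exp(-lam) * lam ** k / (math.factorial(k) * (1.0 - math.exp(-lam)))

def zt_poisson_mle(counts):
    """Maximum likelihood estimate of lam from strictly positive counts.
    The MLE solves mean(counts) = lam / (1 - exp(-lam)); the left-hand
    mean function lam / (1 - exp(-lam)) is increasing in lam, so simple
    bisection on (0, mean] suffices."""
    m = sum(counts) / len(counts)
    lo, hi = 1e-12, m  # the truncated mean always exceeds lam
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid / (1.0 - math.exp(-mid)) < m:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Conditioning in this way ignores any information in the zeros themselves, which is consistent with the study's assumption that the "imputed" zeros are a random event.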