How can synthetic data generation techniques enhance the lift accuracy of churn prediction models in imbalanced datasets from the telecommunications sector?

  • Jill Salii (Student)

Student thesis: Master's Thesis

Abstract

The thesis examined how effectively synthetic data can improve the predictive performance of churn prediction models trained on imbalanced datasets, particularly in the telecommunications industry. Because class imbalance is a significant obstacle in the telecommunications sector, the study assessed whether adding synthetic data can mitigate it. To this end, several synthetic data generation methods, including SMOTENC, ADASYN, TVAE, and CTGAN, were applied to a real-world dataset. The goal was to determine to what extent synthetic data could help overcome data imbalance and enhance the predictive capabilities of classification models. Although no significant improvement in the lift score was achieved, the study yielded valuable insights into the challenges of using synthetically generated data. The research highlighted the importance of a consistent and transparent data-cleaning strategy and the need for customized approaches to synthetic data models. The limitations encountered during the study were also discussed, including the small number of synthetic data models used and the dependence of synthetic data quality on the quality of the original data. Finally, the thesis offered insights for future research and for the practical application of common synthetic data methods to imbalanced real-world datasets in the telco industry.
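The lift score mentioned in the abstract can be illustrated with a minimal sketch. The record does not state the exact lift definition used in the thesis, so this assumes the common "lift in the top fraction" form: the churn rate among the highest-scored customers divided by the overall churn rate. The function name `lift_at_k`, the parameter `k`, and the toy data are hypothetical and for illustration only.

```python
def lift_at_k(y_true, scores, k=0.1):
    """Lift in the top k fraction of customers ranked by predicted churn score.

    Assumed definition (not confirmed by the thesis record):
    churn rate in the top k fraction / overall churn rate.
    """
    if not 0 < k <= 1:
        raise ValueError("k must be in (0, 1]")
    n = len(y_true)
    if n == 0 or n != len(scores):
        raise ValueError("y_true and scores must be non-empty and equally long")

    base_rate = sum(y_true) / n
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no churners")

    # Keep at least one customer in the targeted group.
    top_n = max(1, int(n * k))

    # Rank customers from highest to lowest predicted churn score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(label for _, label in ranked[:top_n]) / top_n

    return top_rate / base_rate


# Toy, made-up data: 2 churners among 10 customers (20% base rate).
y_true = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.1, 0.7, 0.2, 0.3, 0.05, 0.4, 0.15, 0.25]

print(lift_at_k(y_true, scores, k=0.2))
```

A lift of 1 means the model targets churners no better than random selection. On the toy data, one of the two top-ranked customers is a churner, so the top-20% churn rate is 50% against a 20% base rate, giving a lift of about 2.5.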
Date of Award: 8 May 2024
Original language: English
Awarding Institution
  • Universidade Católica Portuguesa
Supervisor: Nuno Filipe Loureiro Paiva (Supervisor)

Keywords

  • Synthetic data
  • Churn prediction
  • Imbalanced datasets
  • Telecommunication industry
  • SMOTENC
  • ADASYN
  • TVAE
  • CTGAN
  • Lift score
  • Data quality

Designation

  • Mestrado em Análise de Dados para Gestão
