Synthetic data, real impact: a framework for augmenting tabular datasets with synthetic data in machine learning

  • Jann Noah Bitzer (Student)

Student thesis: Master's Thesis

Abstract

Access to high-quality data is a persistent challenge in machine learning due to scarcity, cost, privacy constraints, and biases. While synthetic data has gained traction in large-scale AI applications as a way to overcome these challenges, its practical implementation in small and mid-sized businesses remains underexplored. This study addresses that gap by developing a structured and universal framework for integrating synthetic data augmentation into various machine learning processes. The approach systematically assesses augmentation ratios, selective filtering strategies, and their impact on predictive performance. The result is a scalable and actionable framework for businesses to use synthetic data, with practical guidance on augmentation strategies and performance evaluation. By addressing both technical and ethical considerations, this study advances the adoption of synthetic data as a transformative tool for data-driven decision-making in business environments.
Date of Award: 8 May 2025
Original language: English
Awarding Institution
  • Universidade Católica Portuguesa
Supervisor: Pedro Afonso Fernandes

Keywords

  • Synthetic data
  • Machine learning
  • Data augmentation
  • Workflow automation
  • Business analytics
  • Predictive modeling
  • AI implementation

Designation

  • Mestrado em Análise de Dados para Gestão
