Access to high-quality data is a persistent challenge in machine learning because of scarcity, cost, privacy constraints, and bias. While synthetic data has gained traction in large-scale AI applications as a way to overcome these challenges, its practical implementation in small and mid-sized businesses remains underexplored. This study addresses that gap by developing a structured, universal framework for integrating synthetic data augmentation into machine learning workflows. The approach systematically assesses augmentation ratios and selective filtering strategies and measures their impact on predictive performance. The result is a scalable, actionable framework that gives businesses practical guidance on augmentation strategies and performance evaluation. By addressing both technical and ethical considerations, this study advances the adoption of synthetic data as a transformative tool for data-driven decision-making in business environments.
| Date of Award | 8 May 2025 |
|---|---|
| Original language | English |
| Awarding Institution | Universidade Católica Portuguesa |
| Supervisor | Pedro Afonso Fernandes |
- Synthetic data
- Machine learning
- Data augmentation
- Workflow automation
- Business analytics
- Predictive modeling
- AI implementation
- Mestrado em Análise de Dados para Gestão (Master's in Data Analysis for Management)
Synthetic data, real impact: a framework for augmenting tabular datasets with synthetic data in machine learning
Bitzer, J. N. (Student). 8 May 2025
Student thesis: Master's Thesis