This thesis analyzes the bike-sharing system in Chicago and applies predictive models to forecast the hourly demand for shared bicycles from time-related and weather-related features. The dependent variable is Count, the total number of bicycles used per hour. The predictive models applied to this regression problem are Linear Regression, Random Forest, Gradient Boosting, Light Gradient Boosting, Extreme Gradient Boosting, and Multi-Layer Perceptron. Model accuracy is measured by the R² score, Root Mean Square Error, and Mean Absolute Error. To improve predictions, different hyperparameter settings are used in the predictive models. Without hyperparameter tuning, Random Forest achieves the best accuracy measures. After tuning, however, Gradient Boosting produces the most accurate predictions. The accuracy of Gradient Boosting improves with tuning, whereas Random Forest is almost unaffected by it for this regression problem. The second-best model after tuning is Extreme Gradient Boosting. The neural network model, Multi-Layer Perceptron, yields less accurate results than Random Forest and the boosting models for this type of problem. The features most important for accurate forecasts were Temperature, Hour, Weekend, Pressure, Uv_Index, and Day.
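The three accuracy measures named in the abstract can be illustrated with a minimal sketch. It is not the thesis code: the function name `regression_metrics` and the sample counts are hypothetical, chosen only to show how the R² score, Root Mean Square Error, and Mean Absolute Error are computed from observed and predicted hourly counts.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, and MAE for paired observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean Absolute Error: average magnitude of the errors.
    mae = sum(abs(e) for e in errors) / n

    # Root Mean Square Error: penalizes large errors more heavily than MAE.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # R^2: share of the variance in the observed values explained by the model.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"r2": r2, "rmse": rmse, "mae": mae}


# Made-up hourly counts, for illustration only (not data from the thesis).
observed = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

A higher R² and lower RMSE and MAE indicate a better fit. Comparing the same three numbers across models, before and after changing hyperparameters, is the kind of comparison the abstract describes.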
| Field | Value |
|---|---|
| Date of Award | 3 Feb 2023 |
| Original language | English |
| Awarding Institution | Universidade Católica Portuguesa |
| Supervisor | Nicolò Bertani (Supervisor) |
- Demand forecasting
- Shared bikes
- Weather data
- Machine learning
- Predictive models
- Mestrado em Análise de Dados para Gestão
Predicting hourly demand for shared bicycles with weather data and machine learning models
Peja, D. (Student). 3 Feb 2023
Student thesis: Master's Thesis