This thesis presents a study on the issue of Automobile Insurance Fraud. The purpose of this study is to increase knowledge concerning fraudulent claims in the Portuguese market, while raising awareness to the use of Data Mining techniques towards this, and other similar problems. We conduct an application of data mining techniques to the problem of predicting automobile insurance fraud, shown to be of interest to insurance companies around the world. We present fraud definitions and conduct an overview of existing literature on the subject. Live policy and claim data from the Portuguese insurance market in 2005 is used to train a Logit Regression Model and a CHAID Classification and Regression Tree. The use of Data Mining tools and techniques enabled the identification of underlying fraud patterns, specific to the raw data used to build the models. The list of potential fraud indicators includes variables such as the policy’s tenure, the number of policy holders, not admitting fault in the accident or fractioning premium payments semiannually. Other variables such as the number of days between the accident and the patient filing the claim, the client’s age, and the geographical location of the accident were also found to be relevant in specific sub-populations of the used dataset. Model variables and coefficients are interpreted comparatively and key performance results are presented, including PCC, sensitivity, specificity and AUROC. Both the Logit Model and the CHAID C&R Tree achieve fair results in predicting automobile insurance fraud in the used dataset.
Date of Award | 17 Sept 2012 |
---|
Original language | English |
---|
Awarding Institution | - Universidade Católica Portuguesa
|
---|
Supervisor | José Filipe Rafael (Supervisor) |
---|
Using data mining to predict automobile insurance fraud
Vale, J. B. D. (Student). 17 Sept 2012
Student thesis: Master's Thesis