An enhanced hybrid model for credit scoring using machine learning approach

Mugumya, Micheal

Date

2024-09

Abstract

This study investigates the application of a hybrid machine learning model for classifying loan statuses, combining logistic regression and decision tree classifiers. The dataset used comprises various loan records, with the target variable being the loan status, which is dichotomous in nature. The primary objective of this research is to develop a robust predictive model that can accurately determine the likelihood of a loan default. The analysis began with data preprocessing, including the handling of missing values and encoding of categorical variables. The dataset was then divided into training and testing sets to evaluate model performance. Two individual models, logistic regression and a decision tree, were initialized with class weighting to address potential class imbalances. These models were combined using a soft voting classifier to form a hybrid model, leveraging the strengths of both algorithms. The hybrid model was trained and tested, with its performance evaluated using key metrics such as accuracy, precision, recall, F1-score, and confusion matrix. The results indicated that the model achieved a reasonable level of accuracy, particularly in predicting non-default loans (class 0), as evidenced by a high number of true negatives. However, the model's performance in predicting default loans (class 1) was less satisfactory, with a notable number of false negatives, suggesting a need for further refinement. Visualizations, including the confusion matrix and bar plots of evaluation metrics, provided deeper insights into the model's predictive capabilities and highlighted areas where the model could be improved. These findings underscore the complexity of loan status prediction and the challenges associated with imbalanced datasets. Overall, this study demonstrates the potential of hybrid machine learning models in financial risk prediction, while also identifying critical areas for future research and model enhancement.
The implications of this research extend to financial institutions seeking to improve their risk management practices and enhance the accuracy of their loan approval processes.
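
The soft voting and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's code: the function names (`soft_vote`, `confusion_counts`, `evaluate`), the 0.5 decision threshold, and every probability and label below are invented for illustration. Soft voting is taken here to mean averaging each model's predicted probability of default and thresholding the mean, which is the behaviour of an unweighted soft voting classifier on a binary target.

```python
from typing import Dict, List, Sequence, Tuple


def soft_vote(prob_sets: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Average each model's P(default) per loan, then threshold the mean."""
    n_models = len(prob_sets)
    averaged = [sum(per_loan) / n_models for per_loan in zip(*prob_sets)]
    return [1 if p >= threshold else 0 for p in averaged]


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TN, FP, FN, TP), treating class 1 (default) as positive."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for the default class."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Invented toy data: six non-default loans (0) followed by four defaults (1).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
# Hypothetical P(default) from a logistic regression and from a decision tree.
lr_probs = [0.10, 0.20, 0.30, 0.40, 0.20, 0.60, 0.70, 0.40, 0.30, 0.80]
tree_probs = [0.00, 0.00, 0.00, 0.00, 1.00, 0.00, 1.00, 0.00, 0.00, 1.00]

y_pred = soft_vote([lr_probs, tree_probs])
print(confusion_counts(y_true, y_pred))
print(evaluate(y_true, y_pred))
```

On this toy data the averaged vote misses two of the four defaults, so recall on class 1 is lower than overall accuracy. That mirrors the kind of false-negative pattern the abstract reports, but the numbers carry no information about the study's actual results.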

URI

http://dissertations.umu.ac.ug/xmlui/handle/123456789/1789

Collections

Master of Science in Information Systems (Dissertations) [46]