An enhanced hybrid model for credit scoring using a machine learning approach
Abstract
This study investigates the application of a hybrid machine learning model for classifying loan
statuses, combining logistic regression and decision tree classifiers. The dataset used comprises
various loan records, with the target variable being the loan status, which is dichotomous in
nature. The primary objective of this research is to develop a robust predictive model that can
accurately determine the likelihood of a loan default.
The analysis began with data preprocessing, including the handling of missing values and
encoding of categorical variables. The dataset was then divided into training and testing sets to
evaluate model performance. Two individual models, logistic regression and a decision tree, were
initialized with class weighting to address potential class imbalance. These models were
combined using a soft voting classifier to form a hybrid model, leveraging the strengths of both
algorithms.
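The pipeline described above can be sketched as follows. This is a minimal illustration, assuming scikit-learn as the toolkit, which the abstract does not name. The synthetic data, the 80/20 stratified split, the tree depth, and all variable names are assumptions made for the example, not the study's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the loan records (assumption): roughly 20% defaults (class 1).
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.8, 0.2], random_state=0
)

# Hold out 20% of the records for testing, preserving the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Both base models use class weighting to counter the class imbalance.
log_reg = LogisticRegression(class_weight="balanced", max_iter=1000)
tree = DecisionTreeClassifier(class_weight="balanced", max_depth=5, random_state=0)

# Soft voting averages the class probabilities predicted by the two models.
hybrid = VotingClassifier(
    estimators=[("lr", log_reg), ("dt", tree)], voting="soft"
)
hybrid.fit(X_train, y_train)

proba = hybrid.predict_proba(X_test)  # averaged probabilities, one row per loan
y_pred = hybrid.predict(X_test)       # class with the highest averaged probability
```

With soft voting, a confident probability from one model can outweigh a marginal one from the other, which is one plausible reason to prefer it over hard (majority) voting when only two base models are combined.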
The hybrid model was trained and tested, with its performance evaluated using key metrics such
as accuracy, precision, recall, F1-score, and the confusion matrix. The results indicated that the
model achieved a reasonable level of accuracy, particularly in predicting non-default loans (class
0), as evidenced by a high number of true negatives. However, the model's performance in
predicting default loans (class 1) was less satisfactory, with a notable number of false negatives,
suggesting a need for further refinement.
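To make the relationship between these metrics concrete, the sketch below derives them from the four confusion-matrix counts for a binary problem. The function name and the toy labels are hypothetical and are not the study's results. They only illustrate how accuracy can look acceptable while recall on the default class (class 1) stays low because of false negatives.

```python
def binary_metrics(y_true, y_pred):
    """Compute confusion-matrix counts and derived metrics, treating class 1 as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defaults caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-defaults cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed defaults

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": accuracy, "precision": precision,
        "recall": recall, "f1": f1,
    }


# Toy labels (assumption): 6 non-default loans and 4 defaults.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 0, 0, 0]

metrics = binary_metrics(y_true, y_pred)
# Here accuracy is 0.6, yet recall on class 1 is only 0.25 (3 of 4 defaults missed).
```

For a lender, a false negative (a missed default) is typically the costlier error, so recall on class 1 is usually a more informative figure than overall accuracy on an imbalanced dataset.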
Visualizations, including the confusion matrix and bar plots of evaluation metrics, provided
deeper insights into the model's predictive capabilities and highlighted areas where the model
could be improved. These findings underscore the complexity of loan status prediction and the
challenges associated with imbalanced datasets.
Overall, this study demonstrates the potential of hybrid machine learning models in financial risk
prediction, while also identifying critical areas for future research and model enhancement. The
implications of this research extend to financial institutions seeking to improve their risk
management practices and enhance the accuracy of their loan approval processes.


