Explainable Neural Networks for Credit Risk Modeling

Authors | Subbu Venkataramanan, Gungor Ozer

Understanding and Interpreting Credit Risk Models

Credit scores are designed to measure how likely it is that a borrower will repay a loan in full. Many credit scores model the probability of loan default as a function of borrower features such as length of credit history, credit utilization, repayment history, and recent inquiries. For example, although the details are not fully disclosed, a leading credit score uses a combination of:

  • On-time payment history

  • Capacity-in-use of total credit line

  • Length of credit history

  • Types of credit used

  • Recent “new” credit requests

to quantify the probability of repayment or default. So, when an "adverse" credit decision such as rejecting an application is made, it can be clearly attributed to reasons like "Revolving balance to credit line too high" or "Serious delinquency in the last 5 years". Such explainability is crucial because federal regulations require financial institutions to provide "adverse action reasons" to rejected borrowers; it is not enough to say that the "credit score is too low". From the customer (borrower) perspective, these reasons help them understand what actions they would need to take to improve their credit score and, therefore, their chance of loan approval.

Most scoring systems, although useful, can lead to false conclusions and the denial of credit to many credit-worthy applicants. This is because these scoring systems use linear, less complex "shallow" models like logistic regression, which perform well enough when the amount of data and the number of parameters/predictors (exogenous variables) are small. As data and the number of predictors grow larger, however, the accuracy of linear models degrades.

Considering that both the amount (tradeline-level loan features, timing, sequence, and repayment history) and the variety (alternative data sources) of data available on borrowers are increasing dramatically, credit risk modelers need more sophisticated "deep" ML algorithms that can learn complex, non-linear functions. Neural network models, for example, have been shown to achieve higher accuracy when the data is "big".

The paradox here is that most complex ML algorithms capture non-linear relationships that are not readily explainable. In many sectors and industries, including the financial world, models need to be explainable before they can be adopted into day-to-day operations, both to comply with government regulations and to meet customer expectations.

Enter Deep Learning

Today, even personal computers are equipped with enough processing power to store and handle gigabytes of data. Thanks to distributed computing and fast algorithms, clusters of processors (HPC systems) can efficiently process terabytes of data. One such algorithm is the neural network, now called "deep learning," especially when there are several hidden layers. The method can describe very complex non-linear relationships between a large number of features (tradeline-level features, alternative data features, etc.) and the outcome (probability of repayment).

Neural networks are widely used in image, sound, speech, and text processing, and they produce impressive results on a wide range of classification and regression problems.

Neural Networks (NN) for Credit Risk Modeling

Although NNs achieve high accuracy in classification models and are extremely successful at capturing complex, non-linear relationships between huge numbers of variables and an outcome, they do not produce straightforward model coefficients. This gives the impression that they are black-box models, but that is not strictly true: they are explainable to some degree through sensitivity analysis. So far, however, these models have eluded fully regulatory-compliant explainability that includes adverse action reasoning. This in turn makes regular, unmodified neural network models unadoptable for credit underwriting. For this reason, despite their lower accuracy, simpler "shallow" models like logistic regression are still preferred by financial institutions.
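To make the sensitivity-analysis idea concrete, here is a minimal sketch using a toy two-layer network with random fixed weights (the architecture and weights are illustrative assumptions, not any production credit model). Each input feature is perturbed one at a time, and the resulting change in the predicted default probability indicates how sensitive the model is to that feature.

```python
import numpy as np

# Toy 2-layer network with fixed random weights (illustration only).
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))   # 4 input features -> 8 hidden units
W2 = rng.normal(size=(8, 1))   # 8 hidden units -> 1 output

def predict(x):
    """Forward pass: tanh hidden layer, sigmoid output (default probability)."""
    h = np.tanh(x @ W1)
    return 1.0 / (1.0 + np.exp(-(h @ W2)))

def sensitivity(x, eps=1e-4):
    """One-at-a-time sensitivity: output change per unit change in each feature."""
    base = predict(x)
    deltas = []
    for i in range(x.shape[1]):
        x_pert = x.copy()
        x_pert[:, i] += eps          # nudge one feature, hold the rest fixed
        deltas.append(float(np.abs(predict(x_pert) - base).mean()) / eps)
    return deltas

x = rng.normal(size=(1, 4))          # one applicant, 4 standardized features
scores = sensitivity(x)
print(scores)  # larger value = prediction more sensitive to that feature
```

This kind of analysis ranks features by influence for a given applicant, but, as the text notes, it falls short of the bucket-level adverse action reasons that regulators expect.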

Explainable Neural Networks from Scienaptic

At Scienaptic Systems, we have developed a novel NN algorithm that improves prediction accuracy while overcoming the explainability limitations of regular neural network models. This breakthrough algorithm can be applied to any supervised learning problem, such as credit risk modeling for financial institutions.

Scienaptic’s new explainable NN algorithm proposes an efficient way to discretize numerical inputs that is independent of their numerical range, toward creating an explainable neural network. The method is automated by nature and therefore also reduces the need for feature engineering, saving the significant labor and computing resources that manual intervention would require. Built on the optimized TensorFlow package, it is applicable to large data sets and brings banks and other lending institutions the power of highly predictive, explainable deep learning models that can drive up approval rates and lower loss rates.

Basically, the novel algorithm maps all exogenous variables to optimized buckets and, through them, to the final prediction. Through this connection, the modeled relationship is explainable in a straightforward manner. For example, a reason such as “Too many revolving trade inquiries in the past three months” can be readily provided because the number of such inquiries may fall into a bucket, say 5 to 8, that has ten times the default rate of the average. Note that the optimizations and transformations require minimal external input, meaning very little, if any, manual intervention is needed.
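The bucketing idea can be sketched as follows. This toy example uses simple quantile-based buckets on synthetic data (the feature, data, and thresholds are invented for illustration; Scienaptic's actual discretization is proprietary). Because quantiles depend only on the ordering of values, the buckets are independent of the raw numerical range, and each bucket's observed default rate can back a reason code.

```python
import numpy as np

# Synthetic data: inquiry counts and correlated default outcomes (illustrative).
rng = np.random.default_rng(1)
inquiries = rng.poisson(3, size=1000)                       # e.g. revolving trade inquiries
defaults = (rng.random(1000) < 0.05 * (1 + inquiries)).astype(int)

# Quantile-based bucket edges: range-independent, driven only by the data's ordering.
edges = np.unique(np.quantile(inquiries, [0.0, 0.25, 0.5, 0.75, 1.0]))
bucket = np.clip(np.searchsorted(edges, inquiries, side="right") - 1,
                 0, len(edges) - 2)

overall = defaults.mean()
for b in range(len(edges) - 1):
    rate = defaults[bucket == b].mean()
    # A bucket with a default rate well above average can back a reason code.
    flag = "  <- potential adverse action reason" if rate > 1.5 * overall else ""
    print(f"bucket [{edges[b]:.0f}, {edges[b + 1]:.0f}]: default rate {rate:.2%}{flag}")
```

An applicant whose inquiry count lands in a flagged bucket could then receive a reason of the form "Too many revolving trade inquiries," tied directly to that bucket's elevated default rate.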

The algorithm also provides score ranges, as FICO does, and the resulting model supports equally conclusive bucket-level statements, such as: EXCEPTIONAL (FICO 800+) borrowers have only a 1% default rate, while FAIR (579 < FICO < 670) borrowers are likely to default 27 times out of 100.
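As a small illustration of such score-range bucketing, the sketch below maps a numeric score to a tier label. The tier names echo FICO's published labels; the cutoffs are the commonly cited FICO ranges and are used here only as an example, not as Scienaptic's actual score bands.

```python
# Tier cutoffs (lower bound, label), checked from highest to lowest.
# Cutoffs follow commonly cited FICO ranges; illustrative only.
TIERS = [
    (800, "EXCEPTIONAL"),
    (740, "VERY GOOD"),
    (670, "GOOD"),
    (580, "FAIR"),
    (300, "POOR"),
]

def tier(score):
    """Return the tier label for a numeric score in [300, 850]."""
    for cutoff, label in TIERS:
        if score >= cutoff:
            return label
    raise ValueError("score below supported range")

print(tier(812))  # EXCEPTIONAL
print(tier(600))  # FAIR
```

Each tier can then carry its own observed default rate (e.g., 1% for EXCEPTIONAL), giving borrowers and underwriters an immediately interpretable summary of risk.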

In this sense, Scienaptic’s innovative algorithm is a leap forward for the much-needed explainability of sophisticated neural networks in credit underwriting and risk modeling. Such explainability, combined with the high accuracy of deep learning, will be of great value to credit risk models because it helps address federal regulations and consumer expectations. Successful implementations will bring higher approval rates for qualified borrowers and lower risk for financial institutions: a win-win.