Author | Subbu Venkataramanan
When a customer applies for a credit card, three decisions need to be made:
1. Approve or reject the application
2. Decide how much credit line to assign if approved
3. Decide what APR to charge
The key driver of the approve/reject decision is the creditworthiness of the applicant as measured by risk score(s). The price (APR) is largely determined by the market for the given risk level. It is therefore in credit line assignment that banks have significant room to delight the customer, providing an amount appropriate for her credit needs while also optimizing the bank’s risk-adjusted revenue. In other words, a win-win for both the customer and the lender.
There are three important considerations for architecting a credit line assignment algorithm:
1 Loss exposure
Expected monetary loss for the line assigned
2 Expected revenue
The line drives the customer’s ability to spend (interchange revenue) and to carry a revolving balance (interest revenue)
3 Customer’s credit need and expectation
Highly creditworthy customers, in general, will not use the card if given too low a line, though they may be more accepting of a low line on a store card. In a balance transfer scenario, the line may need to cover at least the balance transfer amount.
Optimizing credit line assignment thus requires sophisticated modelling and decision science processes and algorithms that leverage a diverse set of data: behaviour across channels, past and existing relationships, random line tests and their performance, credit bureau data and alternative data.
The optimization takes the form max(π), where π = NPV(REV_t − ECL_t): the net present value of revenue less expected credit loss over time. Doing this for, say, the credit card application limit assignment decision requires a continuous function that maps the assigned limit onto expected loss-adjusted revenue. The question being asked is: has AI evolved to the point where it can determine the REV function? Yes, and here is how.
The Objective: Optimize portfolio objectives (maximize profits, increase sales, build revolving balance, control loss rates) subject to management constraints (total incremental exposure, hurdle rates) by scanning across the customer’s relationships (on-us/off-us performance) and analysing key dimensions of channel affinity, risk and spend.
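The objective π = NPV(REV_t − ECL_t) can be sketched in a few lines. All numbers below are illustrative assumptions, not model output: hypothetical 12-month revenue and expected-credit-loss projections for one candidate limit, discounted at an assumed 1% monthly rate.

```python
def npv(cash_flows, monthly_rate=0.01):
    """Discount a stream of monthly cash flows to present value."""
    return sum(cf / (1 + monthly_rate) ** t
               for t, cf in enumerate(cash_flows, start=1))

def profit(rev, ecl, monthly_rate=0.01):
    """pi = NPV(REV_t - ECL_t) over a common horizon."""
    return npv([r - e for r, e in zip(rev, ecl)], monthly_rate)

# Illustrative 12-month projections for one candidate limit (hypothetical).
rev = [30.0] * 12   # monthly revenue, REV_t
ecl = [10.0] * 12   # monthly expected credit loss, ECL_t
pi = profit(rev, ecl)
```

In the full decision, this π would be computed per candidate limit and the limit with the highest π chosen.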
THE DRIVERS OF A CLI DECISION
Linear models like regression do not predict the revenue drivers of spend and finance charges well:
Low R-squared; predicted spend and finance charges do not flatten out with increasing credit limit, leading to corner solutions
More importantly, significant human expertise is needed to segment the data for any useful revenue prediction
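The corner-solution problem can be illustrated with a toy comparison (all coefficients hypothetical): a linear spend fit keeps rising with the limit, so a grid search lands on the boundary of the search range, while a saturating spend curve yields an interior optimum.

```python
import math

limits = range(500, 2600, 100)  # candidate limits: $500 .. $2,500

def spend_linear(limit):
    # A linear fit: predicted spend never flattens out.
    return 0.4 * limit

def spend_saturating(limit):
    # A saturating curve: spend flattens once the limit exceeds need.
    return 1000 * (1 - math.exp(-limit / 800))

def net_value(spend_fn, limit, margin=0.15, loss_rate=0.02):
    # Hypothetical loss-adjusted value of a limit under a spend model.
    return margin * spend_fn(limit) - loss_rate * limit

best_linear = max(limits, key=lambda L: net_value(spend_linear, L))
best_saturating = max(limits, key=lambda L: net_value(spend_saturating, L))
```

Under the linear model the search always picks the largest limit offered (the corner), whereas the saturating model produces an optimum strictly inside the range.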
Scienaptic’s Ether.Underwrite product uses advanced machine learning and AI:
Captures non-linear relationships; high predictive accuracy
Leverages far more features without sacrificing robustness
Human-expertise-driven segmentation is unnecessary due to the natural tree ensemble structure; several micro-segments instead of a few coarse segments
Ether’s proprietary methodology imposes monotonicity constraints for regulatory compliance
Performance data is fed back for auto-refit with little human intervention
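Monotonicity constraints are exposed directly by common tree-ensemble libraries (e.g., XGBoost’s monotone_constraints or scikit-learn’s monotonic_cst parameters); Ether’s own methodology is proprietary and not shown here. As a dependency-free illustration of the idea, here is a crude post-hoc cap that forces a score profile to be non-increasing along an axis such as FICO (a running-min pass, not full isotonic regression):

```python
def monotone_cap(scores):
    """Clip a score profile so it never increases along the axis
    (e.g., predicted risk must not rise as FICO rises)."""
    capped, current = [], float("inf")
    for s in scores:
        current = min(current, s)
        capped.append(current)
    return capped

# Hypothetical raw model scores ordered by increasing FICO.
risk_by_fico = [5.0, 4.0, 6.0, 3.0, 3.0, 2.0]
smoothed = monotone_cap(risk_by_fico)
```

In-training constraints are preferable in practice because the model redistributes the signal rather than having it clipped away afterwards.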
Ether optimizes line assignment by factoring in digital affinity across channels. The line is assigned where the expected value E(FC) + E(Spend) − E(Loss) is maximized. While the credit limit is mathematically real-valued, in practice the optimization is done over a finite number of values, for example $500 to $2,500 in increments of $100. Even so, this is computationally heavy in real time; Ether solves for this and returns a decision on a live application within milliseconds.
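A minimal sketch of that discrete search, with hypothetical stand-in functions for the expected finance charge, interchange, and loss components (in production each would be a fitted model, not a formula):

```python
import math

def expected_fc(limit):
    # Hypothetical: finance charges saturate as the limit grows.
    return 120 * (1 - math.exp(-limit / 1500))

def expected_interchange(limit):
    # Hypothetical: interchange on expected spend also saturates.
    return 9 * (1 - math.exp(-limit / 1000))

def expected_loss(limit):
    # Hypothetical: loss exposure grows with the limit.
    return 0.03 * limit

candidates = range(500, 2501, 100)  # $500 .. $2,500 in $100 steps
best_limit = max(
    candidates,
    key=lambda L: expected_fc(L) + expected_interchange(L) - expected_loss(L),
)
```

With saturating revenue and linearly growing loss, the maximizer sits strictly inside the grid rather than at an endpoint.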
This clearly demonstrates that AI can substitute for logistic regression across all of the cash flow components; however, it does not shed light on the process for uncovering the primary-effects function that maps a prospective limit onto activation, revolving balances, interchange, etc. These are an essential input to the profit maximization process.
At Scienaptic, the models for activation, expected spend, expected revolving balance, etc. are built on past data, with the actual limit that was assigned as one of the independent variables alongside other raw bureau and application variables. This provides the mapping from the assigned limit to the various revenue components. When these models are scored on a live application, the value of the limit variable is supplied by a loop iterating from a lower bound to an upper bound in some increment, and the limit chosen is the one that maximizes the objective function. A potential drawback of this approach is that, historically, the bank would not have run a true random line assignment test. However, there is usually still sufficient variation within some bounds for the mapping to work, and over time this can be improved. Also, other variables in the models include credit limits elsewhere, as seen in the bureau tradeline-level data, and these provide additional signals on how far our assigned limit is from what the customer got from other banks.
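That scoring loop can be sketched as follows, with a hypothetical stand-in for a fitted component model. Note the two mechanics described above: the candidate limit is injected into the feature vector on each pass, and bureau limits held elsewhere enter as a damping signal.

```python
def predict_value(features):
    """Hypothetical stand-in for fitted component models: loss-adjusted
    value grows with the limit but is damped by limits held elsewhere."""
    limit = features["limit"]
    elsewhere = features["bureau_total_limits"]
    return min(0.3 * limit, 0.075 * (limit + elsewhere)) - 0.1 * limit

def choose_limit(application, lo=500, hi=2500, step=100):
    best, best_val = lo, float("-inf")
    for limit in range(lo, hi + 1, step):
        features = dict(application, limit=limit)  # inject candidate limit
        val = predict_value(features)  # in practice: E(FC)+E(Spend)-E(Loss)
        if val > best_val:
            best, best_val = limit, val
    return best

chosen = choose_limit({"bureau_total_limits": 4000})
```

With these illustrative coefficients, the damping from the customer’s $4,000 of limits elsewhere pulls the chosen limit well below the top of the range.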
This does provide the function that maps an assigned limit to the cash flow components such as revolving balance, interchange, etc.
A secondary question is whether our product manages local and global credit risk constraints.
The constraints in our optimization are currently risk-policy-level constraints such as: for a customer in the FICO score band 690–720 with income between $1,000 and $2,000 per month, the assigned limit should be greater than $300 but less than $1,500. Global risk-weighted exposure constraints can be set at development time and the results tested against these policy constraints. If these are satisfactory, the optimization is rolled out, so long as the through-the-door population remains largely stable. The policy constraints can easily be changed within Ether (literally overnight) if monitoring throws up disproportionate numbers, very low limits or very high limits, or any other metric out of range.
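The policy band in that example can be sketched as a simple lookup-and-clamp. The band values come from the text; the table layout and function are illustrative assumptions, not Ether’s actual rule format.

```python
POLICY_BANDS = [
    # (fico_lo, fico_hi, income_lo, income_hi, limit_min, limit_max)
    (690, 720, 1000, 2000, 300, 1500),
]

def apply_policy(fico, monthly_income, proposed_limit):
    """Clamp an optimizer-proposed limit into the applicable policy band."""
    for f_lo, f_hi, i_lo, i_hi, l_min, l_max in POLICY_BANDS:
        if f_lo <= fico <= f_hi and i_lo <= monthly_income <= i_hi:
            return max(l_min, min(l_max, proposed_limit))
    return proposed_limit  # no band applies; leave the proposal unchanged
```

Because the bands are data rather than code, changing them overnight amounts to editing the table.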
The next question assesses the ability of AI-driven software to capture non-trivial data facts that are familiar to risk managers, and which would likely have been hand-specified for a logistic regression, by automatically learning the important predictive features without having to hand-code them.
Strategically, it would be valuable to develop clear prescriptions about which forms of information value are best left to the AI versus which will depend on a modeller’s data transformations. One possible test is whether to externally compute a debt ratio or to let the AI figure it out.
At Scienaptic, we have solved the problem of "representation learning" so that our neural network can automatically learn the important predictive features without having to hand-code them, especially in the credit risk domain. In other words, we are interested in proving that deep learning (or an ensemble with other methods) can indeed be the master algorithm that learns and uses features like the debt-to-total-income ratio to predict credit risk. We have had success with raw-feature-level transformations and the choice of activation functions. What we have currently is a suite of algorithms, including GBM, that do far better than logistic regression but are still dependent on features created either manually or automatically. Our models are also not black boxes, and adverse action reasons can be issued from them.
Our product also leverages thousands of features that we have authored from our experience, which we collectively call customer consciousness. But we are working on an algorithm that would do away with the need for any hand-made feature. Ratios are the hardest to learn, even simple ones like loan amount to income or loan amount to total household income. We have had success with different activation functions in neural nets that can potentially surmount this difficulty. Ensembles of models, especially pairing neural nets with random forests, have worked.
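One way to see why ratios are hard for additive models, and how a raw-feature transformation can help (a toy illustration only, not our production approach): a weighted sum of raw inputs cannot represent loan/income exactly, but in log space the ratio becomes a difference that a single linear unit can express.

```python
import math

def log_features(loan, income):
    # Transform raw inputs into log space.
    return [math.log(loan), math.log(income)]

def linear_unit(features, weights=(1.0, -1.0)):
    # One linear neuron; weights (+1, -1) compute log(loan) - log(income).
    return sum(w * f for w, f in zip(weights, features))

loan, income = 12000.0, 4000.0
log_ratio = linear_unit(log_features(loan, income))
ratio = math.exp(log_ratio)  # recovers loan / income
```

The point is that the choice of input transformation (here, a log) determines whether the ratio is learnable by a simple unit at all, which is why activation-function and transformation choices matter so much.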
We believe in harnessing the power of human-machine synergy to deliver better results than either humans or AI alone. It appears that our beliefs about AI versus human performance on traditional credit risk models are mostly compatible with this view.