Robust Financial Forecasting using ML

Author | Gungor Ozer

“Planning is bringing the future into the present so that you can do something about it now.”

Alan Lakein

A recent headline caught our attention (May’19): GAP reported a significant earnings miss in its first-quarter results and slashed its annual profit outlook. This resulted in a 15% drop in its stock price that day, with the price hovering close to its lowest level since mid-2016. The CEO blamed external factors such as one of the coldest, wettest quarters in recent memory, which resulted in fewer in-store visits by customers. It appeared to be a surprise to the company, as the forecast models apparently failed to take such factors into account.

Robust forecasting of financial series (for example, revenue, expenses, and profits) is the bedrock of sound financial planning, because methodology errors and process or data gaps can have significant shareholder impact and cause financial pain.

Key objectives for robust financial forecasting are:

Improve forecast accuracy using business & macro levers: For a financial forecast to be accurate, the business needs to factor in time series data for key business levers (the proverbial “weather impact” in the GAP example), seasonality, competitive actions, key macro-economic data, etc., in addition to the nominal historical trends. So, identifying the right levers that drive accurate forecasts & building robust time series models is critical.

Develop scenarios versus point forecasts: It’s not enough to create “point estimates” of a time series; it’s important to be able to flex the ML models for different scenarios, such as “worst-case”, “middle-of-road” and “best-case”, and then create probability-weighted forecasts. Communicating these to internal and external stakeholders is also critical to managing expectations.
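As a rough illustration of a probability-weighted forecast, the sketch below blends three scenario paths into one expected path and reports the range around it. The scenario values, the probabilities, and the function name `probability_weighted_forecast` are made-up placeholders for illustration, not figures or code from any real forecast.

```python
import numpy as np

# Hypothetical quarterly revenue forecasts (in $M) under three scenarios.
# All numbers below are illustrative placeholders.
scenarios = {
    "worst_case": np.array([90.0, 88.0, 85.0, 84.0]),
    "middle_of_road": np.array([100.0, 102.0, 104.0, 106.0]),
    "best_case": np.array([110.0, 115.0, 120.0, 126.0]),
}

# Assumed scenario probabilities; they must sum to 1.
probabilities = {"worst_case": 0.2, "middle_of_road": 0.6, "best_case": 0.2}


def probability_weighted_forecast(scenarios, probabilities):
    """Return the probability-weighted average path across scenarios."""
    total = sum(probabilities.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    return sum(probabilities[name] * path for name, path in scenarios.items())


weighted = probability_weighted_forecast(scenarios, probabilities)
low = np.min(list(scenarios.values()), axis=0)   # lower edge of the range
high = np.max(list(scenarios.values()), axis=0)  # upper edge of the range

for quarter, (lo, mid, hi) in enumerate(zip(low, weighted, high), start=1):
    print(f"Q{quarter}: {lo:.1f} / {mid:.1f} / {hi:.1f}")
```

Reporting the weighted path together with the low and high edges gives stakeholders a range rather than a single number.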

In the context of financial services, planners forecast revenue and expense streams. Some expense streams are easier to forecast, such as headcount-based payroll expenses or facility-driven op-ex. On the other hand, critical elements of revenue and loss forecasting are a lot more involved.

For example, forecasting revenue requires projecting account acquisition, which can be influenced by the bank’s promotions/incentives, pricing, retail footprint and campaigns. Similarly, projected losses depend on the origination vintages in the current portfolio, the composition of accounts, their credit quality and macro-economic conditions, which can impact loss rates differently across credit quality bands.

With all these different levers at play, improving forecast accuracy is a paramount challenge at financial institutions. To improve accuracy and develop robust “range” estimates, ML can be leveraged effectively using state-of-the-art models such as auto-regressive methods (for example, SARIMAX) and recurrent neural network algorithms (for example, LSTM).

These models do particularly well with business/macro levers, but building robust models requires experience in setting up the train/validation time series windows, transforming & scaling business & macro levers, selecting features & generalizing the model. We, at Scienaptic, have developed unique IP in this area based on our experience of working with multiple retail banks, supporting their loss/revenue forecasting in risk & finance.

Let’s illustrate these concepts with a case study on forecasting revolving consumer credit, based on public data available from the Fed.

Forecasting Revolving Consumer Credit (RCC)

  • Business Objective: We forecast revolving consumer credit (RCC) in the US and stress-tested our forecasts against various macro-economic scenarios. RCC is a collection of non-installment consumer loans, such as credit card debt or overdraft charges, and forecasting models need to leverage macro-economic variables such as GDP, unemployment (UE) and interest rates (IR). We have developed models that are accurate and that also forecast the response to various macro-economic scenarios.

  • Forecasting Method: Auto-regressive / Moving Average (ARMA) methods perform quite well at identifying the inherent structure of time series data and fitting it with a regression equation. We used the SARIMAX model, which has seasonal (S), autoregressive (AR), integrated (I), moving average (MA), and exogenous (X) components. We added the S, I and X components because the data call for them; for example, seasonality (S) is very important in revolving debt (as can be seen in the plot).

  • Feature Selection: ARMA models may suffer accuracy issues due to randomness in the endogenous time series (the RCC series). To improve accuracy, we developed an exogenous model in which the features are independent variables that influence the output series (the RCC series). In general, a good exogenous feature will be highly correlated with the dependent variable (DV). To avoid over-fitting (the low-bias, high-variance problem), Scienaptic has developed unique IP for feature selection called CCC: Correlation, Clustering, and Causality. This feature selection technique can be tailored to the specific business problem to identify the best features. It also considers the lagged effects of exogenous features to drive better predictions. In the case of RCC, our automated feature selection algorithm identified three macro variables: inflation, short-term interest rates, and bankruptcies. Bankruptcy was of interest because it displayed a long-lagged relationship to RCC. Inflation and interest rates had a short-term impact on RCC, with lags ranging from 1 to 5 months.

  • Cross Validation: A good time series model should also be stable across different training and test data. We have developed a systematic process for selecting the train/test windows, called Walk-Forward Cross-Validation (WF-CV). See the illustration of this concept below.
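The walk-forward idea in the Cross Validation bullet can be sketched roughly as follows. This is a generic expanding-window splitter, not Scienaptic’s WF-CV process; the function name `walk_forward_splits` and its parameters (`initial_train`, `horizon`, `step`) are hypothetical.

```python
def walk_forward_splits(n_obs, initial_train, horizon, step=None):
    """Yield (train_indices, test_indices) for expanding-window validation.

    Each fold trains on all observations before the cut-off and tests on
    the next `horizon` observations, so the model never sees the future.
    """
    step = horizon if step is None else step
    cutoff = initial_train
    while cutoff + horizon <= n_obs:
        yield list(range(cutoff)), list(range(cutoff, cutoff + horizon))
        cutoff += step


# Example: 60 monthly observations, 36-month initial window, 6-month folds.
folds = list(walk_forward_splits(n_obs=60, initial_train=36, horizon=6))
for train_idx, test_idx in folds:
    print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
```

Comparing the error across folds gives a sense of how stable a model is as the training window moves forward in time. scikit-learn’s `TimeSeriesSplit` offers a similar expanding-window split if a library implementation is preferred.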

Lift in Financial Time Series Models by Exogenous Feature Selection

To measure the “lift” of our approach, we compared model performance against a univariate SARIMA-only model with no exogenous predictors. The Scienaptic approach did a better job of predicting the holdout data, with a Root Mean Square Error (RMSE) of 0.107 compared to 0.245 for the baseline SARIMA model. The model was also stable, with a lower deviation between training and test RMSEs.

Macro-Scenario Assessment

One big advantage of using external variables is the ability to stress test the forecasts against certain macro-economic scenarios. Holding everything else constant:

  • A 20% increase/decrease in inflation led to a significant jump/drop in total revolving consumer credit.

  • RCC showed less sensitivity to the same 20% change in bankruptcies. Interestingly, an increase in bankruptcies led to a drop in RCC. We attribute this to the fact that the revolving credit of bankrupt consumers is written off by creditors, thereby reducing total RCC.
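The mechanics of such a stress test can be sketched as below: one exogenous column is scaled by ±20% while the others are held constant, and the forecast is re-run. The linear `predict` function, its coefficients, and the input values are hypothetical stand-ins chosen only to show the mechanics; they are not the fitted RCC model.

```python
import numpy as np


def shock_scenarios(predict, exog_future, column, shock=0.20):
    """Re-run a forecast with one exogenous column scaled up and down.

    `predict` is any callable mapping an exogenous matrix to a forecast
    path; every other column is held constant.
    """
    results = {"base": predict(exog_future)}
    for label, factor in (("up", 1 + shock), ("down", 1 - shock)):
        shocked = exog_future.copy()  # leave the caller's array untouched
        shocked[:, column] *= factor
        results[label] = predict(shocked)
    return results


# Hypothetical stand-in model: a fixed linear response to two drivers
# (column 0 = inflation, column 1 = bankruptcies). Coefficients are made up;
# the negative sign on bankruptcies mirrors the write-off effect noted above.
coefs = np.array([2.0, -0.5])


def predict(exog):
    return 100.0 + exog @ coefs


exog_future = np.array([[2.0, 1.0], [2.1, 1.1], [2.2, 1.2]])
inflation = shock_scenarios(predict, exog_future, column=0)
bankruptcies = shock_scenarios(predict, exog_future, column=1)

for name, result in (("inflation", inflation), ("bankruptcies", bankruptcies)):
    print(name, {label: np.round(path, 2).tolist() for label, path in result.items()})
```

With a fitted statsmodels SARIMAX result, `predict` could instead wrap a call such as `results.forecast(steps=len(exog), exog=exog)`.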

See the sensitivity charts for Inflation & Bankruptcies below:

Concluding Remarks

Time series forecasting is a specific branch of Machine Learning that can be used to model any data that progresses in time. Advanced time series models utilize both endogenous features (trend, seasonality) and exogenous features. When correctly identified, external predictors improve time series models significantly and add stress-testing capabilities to the resulting forecast models. At Scienaptic, we develop sophisticated time series models with fully automated feature selection & model generalization modules to drive rapid impact for clients.
