- Feb 28, 2020

Bridging the Data Gap

Scienaptic Research Group

The credit industry and the needs of borrowers have evolved rapidly in the past decade. One of the main challenges that lenders face is accurately assessing the risk of individuals and firms that have limited or no credit records with the nationwide credit reporting agencies (NCRAs). Such “credit invisible” and “thin-file” consumers constitute nearly 20% (45 million) of the US adult population. In addition, there are several “credit deserts” in the US: geographic areas with little access to traditional sources of credit. These unbanked and underbanked segments remain largely untapped by banks because traditional credit data is inadequate for assessing them.

The Data Gap in Traditional Credit Underwriting

When deciding whether to approve a loan application, banks and other lenders typically use FICO or VantageScore scores that are derived from NCRA records. They may use statistical credit scoring methods with defined parameters such as time in the business, profitability, disposable income, credit ratings, and so on. But there is limited integration with data from alternative data providers. So traditional credit underwriting tools end up following a linear, one-dimensional process using a limited data set.

Very often, the data is captured in multiple systems that are never integrated. All this makes it very difficult to build a holistic view of borrowers. Their fate depends on credit scores, and in the absence of a score their loan applications are rejected.

Alternative Data can Bridge the Data Gap to Access the Unbanked and Underbanked

The key to reaching this underserved market lies in assessing their creditworthiness using non-traditional or alternative data along with the existing traditional credit data.

Alternative credit data is no longer an unfamiliar topic. In late 2019, five US financial agencies issued a joint statement saying that they “recognize alternative data’s potential to expand access to credit and produce benefits for consumers.”

Let us go deeper into the sources of alternative data and how to bridge this data gap.

Types of Alternative Credit Data

There are plenty of static and dynamic alternative or non-traditional data sources that banks can use to avoid risk.

All this data is indirectly related to creditworthiness and can be used as indicators of a borrower’s income, financial skills, consumption profile, and social capital. Taken together, it gives banks and financial institutions a well-rounded perspective on an applicant.

Identifying and Collecting Alternative Credit Data

It is important to pick the right sources of alternative credit data. An alternative data partner needs to be experienced, reliable, and knowledgeable. According to management consulting firm Oliver Wyman, a good source of data should have good coverage, provide specific information, be accurate and up to date, and comply with consumer credit regulations such as the Fair Credit Reporting Act, the Equal Credit Opportunity Act, and the Gramm-Leach-Bliley Act.

Some of this data can be collected directly from borrowers, who are increasingly embracing the push for alternative data. According to a recent Experian study, 70% of Americans are ready to share more financial data in return for fairer credit decisions.

Partnering with data aggregators and using a data aggregation platform is another smart way to collect alternative data. Scienaptic recommends that banks and financial institutions use definitive sources of alternative data such as LexisNexis. Lenders can also partner with FinTech companies that will source, aggregate, and analyze the data.

Challenges for Banks in Using Alternative Data

While alternative data is transforming credit underwriting processes, there are some inherent challenges and risks that need to be addressed.

Systematic data flow: Banks need to collect huge amounts of unstructured data systematically into their systems, and use it for credit decisioning.

Data security and privacy: Call records, text data, and social media data are private. Third parties could store, analyze, or sell this data at any time, and data security breaches can leak sensitive customer information.

Transparency: Credit scoring methods used by many companies are protected trade secrets with little transparency. This makes it tough to determine whether they manipulate data or exploit incorrect data.

Data quality or data gaming: Answers to questionnaires and psychometric tests, as well as social media data, can easily be faked.

Complying with credit laws and regulations: Banks must have an internal governance team to ensure compliance with fair lending requirements. Their credit underwriting models have to be explainable and FCRA-compliant, with adverse action reasoning. Otherwise, they need to partner with a third-party firm that has expertise in both alternative data and credit laws and regulations.

Developing/testing risk models: Lenders not only need to develop and test predictive models, but should also glean powerful insights to make better lending decisions. They should partner with a technology provider who has deep experience in building alternative data models that are empirically derived and are consistent over time.

Clearly, there is an urgent need for sharper tools in credit underwriting. Lenders, data aggregators, and fintech companies should collaborate to develop alternative credit scoring methods that can learn from every interaction.

Representative view of a sharper credit decisioning system that effectively leverages alternative data

Using AI-based Credit Scoring with Alternative Data

AI- and machine learning (ML)-powered credit scoring that uses alternative data is the answer to lowering credit risk and increasing loan approvals for the “credit invisible” segment. With this technology, banks can better identify qualified prospects, flag high-risk prospects, and offer a more complete credit risk assessment.

How does an AI-powered Credit Underwriting System Work?

AI-powered credit underwriting using alternative data sources typically goes through a streamlined process:

Collect raw data from traditional and alternative sources: This can be done using APIs that automate data flows and synchronization, so that the most up-to-date data is available at any time.

Transform unstructured alternative data into an AI-ready form: Data needs to be normalized, cleaned, and catalogued before it is AI-ready. Even seemingly structured data such as company names needs to be identified and normalized before any downstream use. For example, XYZ Tech could be stored as XYZ Tech Limited, XYZ Tech Inc, or XYZTech in various data sources. Similarly, unstructured data such as images, fingerprints, sensor data, and GPS data needs to be processed appropriately. News websites need to be whitelisted or blacklisted according to their credibility. Credit scoring models should be trained to filter out old content, error pages, paywalls, etc., and to distinguish a fake news story about a borrower from a genuine one.

Analytics: Algorithms that analyze credit data usually identify the input variables with the highest predictive power for credit risk and assign a weight to each of them. All weighted variables must be fed into an explainable AI model to generate a credit score. This model will identify the predictive features and be able to pinpoint the top three reasons for a credit rejection. The ideal analytics solution needs to use an effective risk segmentation schema based on industry segments, obligor types, collateral type and value, guarantees, and lien position/seniority of claim.

Machine Learning: ML will ensure that whenever new data is collected, the algorithms automatically update the significant input variables and their weights, auto-adjusting the model. Continuous learning and adjustment will improve the accuracy of the final score and, over time, the predictive power of the model.

Platform: The credit underwriting platform must be able to run and manage multiple concurrent strategies and multiple handoffs. It should integrate decisioning into loan management system (LMS) and loan origination system (LOS) workflows, and offer both batch and real-time decisioning that fits seamlessly into the bank’s existing lending workflows.

Such an environment will accelerate the deployment of new-age credit decisioning without disrupting existing processes and technologies.
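To make the transform and analytics steps above more concrete, here is a minimal Python sketch. It first reduces the XYZ Tech name variants to a single comparison key, then scores an applicant with a small linear (logistic) model and reports the three features that pushed the risk estimate up the most. The suffix list, feature names, weights, and intercept are all hypothetical values chosen for illustration, and a simple linear model stands in here for whatever explainable model a lender actually uses.

```python
import math
import re

# Hypothetical list of legal-form tokens; a production list would be far longer.
LEGAL_SUFFIXES = {"limited", "ltd", "inc", "incorporated", "llc", "corp", "corporation"}


def normalize_company_name(raw: str) -> str:
    """Map spelling variants of a company name to one comparison key."""
    # Lowercase, then turn punctuation into spaces.
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", raw.lower())
    # Drop legal-form tokens such as "limited" or "inc" wherever they appear.
    tokens = [t for t in cleaned.split() if t not in LEGAL_SUFFIXES]
    # Join without spaces so "XYZ Tech" and "XYZTech" share a key.
    return "".join(tokens)


# Hypothetical weights on standardized features (0 = average applicant).
# A positive weight * value product raises the estimated default risk.
WEIGHTS = {
    "missed_utility_payments": 0.9,
    "months_at_current_address": -0.4,
    "rent_paid_on_time_ratio": -1.1,
    "recent_credit_inquiries": 0.5,
}
INTERCEPT = -1.0  # hypothetical baseline log-odds of default


def contributions(features: dict) -> dict:
    """Per-feature contribution to the log-odds of default."""
    return {name: WEIGHTS[name] * features[name] for name in WEIGHTS}


def default_probability(features: dict) -> float:
    """Logistic transform of the weighted sum of features."""
    z = INTERCEPT + sum(contributions(features).values())
    return 1.0 / (1.0 + math.exp(-z))


def top_adverse_reasons(features: dict, k: int = 3) -> list:
    """Names of the k features that increase estimated risk the most."""
    adverse = [(n, c) for n, c in contributions(features).items() if c > 0]
    adverse.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in adverse[:k]]


variants = ["XYZ Tech Limited", "XYZ Tech Inc", "XYZTech"]
print({normalize_company_name(v) for v in variants})  # {'xyztech'}

applicant = {
    "missed_utility_payments": 2.0,
    "months_at_current_address": -1.0,
    "rent_paid_on_time_ratio": -0.5,
    "recent_credit_inquiries": 0.2,
}
print(round(default_probability(applicant), 3))
print(top_adverse_reasons(applicant))
```

Because each contribution is just a weight multiplied by a standardized feature value, the reasons behind a score can be read directly off the model. More complex models would need a separate attribution method to produce comparable reasons.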

How Can Alternative Data be Used in such AI-Powered Underwriting Systems?

Some datasets are more predictive than others for different types of financial products. In case of missing information, various data sources can be triangulated.

Let us see how alternative data can be used for three typical lending products.

Personal loans for individuals: The combination of loan, rent, and utility repayment history, address, and property ownership data will paint a more accurate picture of a borrower’s ability and willingness to repay loans. This will help lenders monitor signs of financial distress. An incorrect address can be corrected by triangulating the addresses found on social media, GPS data, and other websites.

Credit cards for individuals: Financial transactions, interactions with the call center, and other application data can help to create a holistic knowledge base on a credit card applicant.

Working capital loans to businesses: The number of invoices raised per month, shipping patterns, etc. can be unexpected but reliable indicators of enterprise credit risk. Leadership history of small businesses can be verified by using data from Glassdoor, news websites and other sources such as Dun & Bradstreet. Details on fundraising, new product launches, and executive churn can be triangulated to indicate a company’s plans for sale or M&A, or even bankruptcy.
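The address triangulation mentioned for personal loans can be sketched as a simple majority vote across sources. The Python example below is only an illustration under stated assumptions: the source names, the sample addresses, and the rule that at least two sources must agree are all hypothetical, and real address matching would need proper parsing and geocoding rather than string cleanup.

```python
import re
from collections import Counter
from typing import Dict, Optional


def normalize_address(raw: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace so variants compare equal."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", raw.lower())
    return " ".join(cleaned.split())


def triangulate_address(reported: Dict[str, str], min_sources: int = 2) -> Optional[str]:
    """Return the normalized address that at least `min_sources` sources agree on.

    Returns None when no address reaches that level of agreement.
    """
    # Count how many sources report each normalized address, skipping blanks.
    counts = Counter(normalize_address(a) for a in reported.values() if a.strip())
    if not counts:
        return None
    address, votes = counts.most_common(1)[0]
    return address if votes >= min_sources else None


# Hypothetical addresses for one borrower, keyed by (hypothetical) source name.
reported = {
    "application_form": "12 Main St., Apt 4",
    "social_profile": "12 main st apt 4",
    "utility_bill": "12 Main St Apt 4",
    "old_listing": "98 Oak Avenue",
}
print(triangulate_address(reported))  # 12 main st apt 4
```

Returning None when the sources disagree keeps the decision to fall back to manual review explicit, instead of silently picking one source.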

Advantages of employing an AI-based platform in the scoring process

A comparison between AI-powered systems using traditional and alternative data and legacy systems that use traditional data will explain the advantages of the former.

This shows that AI-powered credit underwriting solutions can be an easy win for banks while providing the agility and seamless customer experience that borrowers look for.

AI, ML-Based Credit Scoring Platforms are the Way Forward for sharper credit decisions

With AI-powered credit underwriting using alternative data, banks can now uncover and serve the huge untapped opportunity – the unbanked and underbanked segments. The lower costs and increased competitiveness achieved will make this an attractive segment for banks.

The low-risk, accurate credit decisioning for credit invisible customers made possible by AI-powered credit underwriting platforms signals a disruptive era in credit scoring and promises to be the future of financial inclusion.