Author | Vinay Bhaskar
Just think about it for a sec... the question is non-trivial!
If you are a business leader who wants to leverage analytics in a new initiative, this question always comes up, bluntly or subtly. The dilemma is:
1. Do I have enough patterns already identified from my business knowledge or processes that I can amplify using machine learning? OR
2. Do I have a broad (but real) definition of the problem, with very few concrete scenarios available to drive analytical solutions?
In case 1, supervised methods apply; in case 2, you will need unsupervised methods to understand the patterns and develop decisioning based on them.
So what’s the big deal? Let’s try to explain this with an example. Say I am looking for fraud patterns across traditional channels (in-store purchases) and digital channels (like mobile) for a retailer.
Discovering versus Learning Patterns for Digital Fraud
Let’s say the retailer launched a digital channel for selling pet food six months ago and doesn't have much experience with fraud in this channel.
a. Since digital fraud scenarios are new, I might use clustering or outlier detection techniques to find customer purchases that stand out as outliers. I might also use time series event modelling, such as Markov chains or recurrent neural networks, to model customer behavior over time and spot anomalous behavior. The issue is that I don’t know whether the outliers identified actually represent fraud.
b. The analytics team then needs to go back to the business SMEs (domain experts) and ask them to manually verify and “tag” these outliers, which is an unexpected additional burden, especially if the volumes to analyze are large. Why is this important?
c. Tagging matters because the business is nervous about putting such systems into production when the risk of false positives, and their adverse impact on customer experience, is high.
d. Essentially, the business and analytics teams are flying blind and have to make a “leap of faith” that some mouse-trap is better than none! The unsupervised approach then needs constant refinement and re-learning, based on fraud data collected after deployment, to improve both its coverage of the fraud universe (the model’s sensitivity) and the quality of its detections (the model’s precision).
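To make steps a, b and d concrete, here is a minimal Python sketch. It assumes purchase amount is the only feature and uses a robust modified z-score (median absolute deviation with the conventional 0.6745 constant and 3.5 cutoff) as a stand-in for the clustering and sequence models mentioned above; the function names and sample data are hypothetical. Once SMEs tag the flagged orders and confirmed fraud accumulates after deployment, the same flags can be scored for precision and sensitivity.

```python
from statistics import median


def robust_outliers(amounts, threshold=3.5):
    """Return indices whose modified z-score exceeds `threshold`.

    Uses the median absolute deviation (MAD), which is less distorted by the
    outliers themselves than a mean/standard-deviation z-score.
    """
    med = median(amounts)
    mad = median([abs(a - med) for a in amounts])
    if mad == 0:
        return []  # no spread at all, so this rule cannot single anything out
    return [
        i for i, a in enumerate(amounts)
        if abs(0.6745 * (a - med) / mad) > threshold
    ]


def precision_and_sensitivity(flagged, known_fraud):
    """Score flags against fraud confirmed so far (SME tags plus post-deployment cases)."""
    flagged, known_fraud = set(flagged), set(known_fraud)
    true_pos = len(flagged & known_fraud)
    precision = true_pos / len(flagged) if flagged else 0.0
    sensitivity = true_pos / len(known_fraud) if known_fraud else 0.0
    return precision, sensitivity


# Hypothetical order amounts from the new digital channel.
amounts = [42.0, 39.5, 45.0, 41.0, 38.0, 44.0, 40.0, 310.0, 43.0, 39.0, 275.0, 12.0]
flags = robust_outliers(amounts)
print(flags)  # [7, 10, 11]

# Hypothetical outcome: SMEs confirm 7 and 10; orders 3 and 5 surface as fraud later.
print(precision_and_sensitivity(flags, {3, 5, 7, 10}))  # precision 2/3, sensitivity 0.5
```

Order 11 (a small basket) is flagged but turns out to be legitimate, which is exactly the kind of false positive that makes the business wary in step c.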
So what can you do instead?
a. Challenge the business and analytics teams to break the problem down into a series of narrow-footprint analytical problems for which you have a reasonable understanding of fraud behavior (even if only by proxy). You might not catch all the fraud, but it’s better to machine-learn from patterns in existing data than to try to discover them. In our example, some fraud patterns from the in-store world (like payment fraud or item-return fraud) might cross over and apply to digital channels. Build supervised models to capture this behavior and get immediate business impact. Manage false positives carefully: use descriptive analysis of non-fraud cases to build business rules that overlay the model scores and reduce false positives with minimal impact on fraud detection.
b. Go out and collect data from controlled experiments, and observe and analyze fraud behavior. Yes, that means you might need to wait 3-4 months until some patterns start to emerge, but that could help create a better mouse-trap downstream.
c. See if you can purchase external data at the point of digital purchase (for example, ID Vision from TransUnion provides a device risk score) to augment your feature set for prediction.
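The false-positive management in option a can be sketched as a rule layer applied after the model score. This is a minimal illustration, assuming scores already come from some supervised model; the 0.7 alert threshold, the low-risk rule (account tenure of a year or more and a basket of 50 or less, unless the score is 0.95 or higher) and the order data are all hypothetical, standing in for rules you would derive from descriptive analysis of non-fraud cases.

```python
from dataclasses import dataclass


@dataclass
class Order:
    order_id: str
    score: float       # fraud score from a supervised model (assumed given)
    tenure_days: int   # hypothetical feature: age of the customer account
    amount: float
    is_fraud: bool     # known outcome, used here only to evaluate the rules


def model_alerts(orders, threshold=0.7):
    """Alerts raised by the model score alone."""
    return [o for o in orders if o.score >= threshold]


def apply_overlay(alerts, min_tenure=365, max_amount=50.0, hard_score=0.95):
    """Suppress alerts in a hypothetical low-risk segment (long-tenured
    customers with small baskets) unless the model is very confident."""
    kept = []
    for o in alerts:
        low_risk = o.tenure_days >= min_tenure and o.amount <= max_amount
        if low_risk and o.score < hard_score:
            continue  # business rule overrides the model score
        kept.append(o)
    return kept


def summarize(alerts):
    """Return (fraud caught, false positives) for a list of alerts."""
    caught = sum(o.is_fraud for o in alerts)
    return caught, len(alerts) - caught


orders = [
    Order("A1", 0.92, 800, 30.0, False),
    Order("A2", 0.88, 20, 120.0, True),
    Order("A3", 0.75, 400, 25.0, False),
    Order("A4", 0.97, 900, 40.0, True),
    Order("A5", 0.71, 10, 60.0, False),
    Order("A6", 0.40, 5, 300.0, True),
]
print(summarize(model_alerts(orders)))                 # (2, 3)
print(summarize(apply_overlay(model_alerts(orders))))  # (2, 1)
```

On this toy data the overlay cuts false positives from 3 to 1 without losing either detected fraud. A6 stays missed in both cases, a reminder that the rules only trade off within what the model already flags.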
Broadly speaking, I see the unsupervised approach as "transient" in nature: you will eventually migrate to a supervised approach once you have sufficient tagged data and understand fraud patterns well. We have also built semi-supervised models that sequence clustering with supervised models to drive higher detection rates and lower false positives.
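One simple way to sequence clustering with a supervised step is sketched below, assuming two hypothetical features per transaction (amount and orders in the last hour). Plain k-means groups all transactions, then a deliberately trivial supervised step estimates each cluster's fraud rate from the few SME-tagged transactions and uses it to score the untagged ones. This illustrates the general idea only and is not the semi-supervised models described above; in practice, cluster membership would more likely feed a real classifier as a feature, and features would be scaled first.

```python
import math


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means. Initial centroids are simply the first k points,
    a simplification; k-means++ initialisation is the usual choice."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def cluster_fraud_rates(labels, tags, k):
    """Supervised step: fraud rate per cluster, using tagged points only.
    `tags` maps point index -> True (fraud) / False (legitimate)."""
    rates = []
    for c in range(k):
        tagged = [tags[i] for i, lab in enumerate(labels) if lab == c and i in tags]
        rates.append(sum(tagged) / len(tagged) if tagged else None)  # None: no tags yet
    return rates


# Hypothetical (amount, orders in the last hour) per transaction.
points = [(20, 1), (400, 9), (25, 1), (380, 8), (22, 2), (410, 10), (30, 1), (21, 1)]
labels = kmeans(points, k=2)
tags = {1: True, 5: False, 2: False, 6: False}  # the few SME-tagged transactions
rates = cluster_fraud_rates(labels, tags, k=2)
scores = {i: rates[labels[i]] for i in range(len(points)) if i not in tags}
print(labels)  # [0, 1, 0, 1, 0, 1, 0, 0]
print(scores)  # {0: 0.0, 3: 0.5, 4: 0.0, 7: 0.0}
```

Transaction 3 inherits its cluster's 0.5 rate from a single confirmed fraud, which shows both the appeal and the fragility of this approach when tags are scarce.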
At Scienaptic, we are working closely with clients, helping them navigate such issues to deliver real business impact.