Learning Fraud Anomalies, a Semi-Supervised Mouse Trap!

Author | Aditya Khandekar


While we are happily enjoying the Labor Day weekend, fraudsters are busy at work! I had a card breach this weekend for what looked like an innocuous car wash transaction. The catch? I am based in NY, my “context” is NY, yet the car wash transaction was in Atlanta, GA! I must thank the bank’s fraud engine, which caught this threat early before it got real ugly.


This got me thinking about writing a quick note on fraud detection, which applies across verticals like e-commerce, insurance, healthcare, and core retail and banking.



Have you ever thought of a blind person leading one with good vision in a complex maze? Sounds counter-intuitive, right? Well that’s a good analogy for building robust fraud engines using semi-supervised approaches.




Let’s dig deeper…


A lot of hard-to-catch fraud is “evolving” in nature, with new patterns emerging constantly. It’s hard to train supervised models on such patterns, because labelled examples of a brand-new pattern do not exist yet. That results in one of two outcomes:


The fraud model misses the fraud (a false negative).

The fraud model catches the fraud by chance, based on some other pattern it was trained on earlier. This can end up tagging a lot of good transactions as fraud (false positives), which is an operational nightmare and a bad customer experience.


So what can be done?


What if you stack an unsupervised anomaly detector on top of the trained fraud model? While this might seem like a simplistic view of the world, let’s explain this “system” with an example…





…Let’s say you are based in NY and took an overnight flight for a vacation trip to Paris. Suddenly the fraud model is throwing up false positives for your Starbucks purchase at the Paris airport (yeah, we really like our Starbucks latte, so much for brand affinity!). What if the anomaly detector was maintaining “state” information about you and had noted a magazine purchase at JFK airport, an Uber ride to JFK, and some in-flight purchases? Suddenly that Paris transaction does not look like an outlier anymore! State-maintaining models like Hidden Markov Models or LSTMs (Long Short-Term Memory, a neural-network-based method that maintains memory) are exactly the kind of models that can “moderate” the opinion of the trained fraud model and help the system make better decisions.
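To make the “moderation” idea concrete, here is a minimal sketch in Python. It is deliberately not an HMM or an LSTM: it is a simple rule-based stand-in that keeps a short transaction history per cardholder and discounts the supervised model’s score when that history shows a travel trail. Every name in it (Txn, CardholderState, TRAVEL_SIGNALS, the category labels, and the 0.8 discount) is a hypothetical choice for illustration, not part of any real fraud engine.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Txn:
    merchant: str
    category: str  # hypothetical labels, e.g. "airport_retail", "coffee"
    city: str


# Hypothetical categories treated as evidence that the cardholder is travelling.
TRAVEL_SIGNALS = {"airport_retail", "ride_share_to_airport", "in_flight"}

# Assumed maximum discount applied to the supervised score (illustrative only).
MAX_DISCOUNT = 0.8


@dataclass
class CardholderState:
    home_city: str
    recent: deque = field(default_factory=lambda: deque(maxlen=10))

    def travel_evidence(self) -> float:
        """Fraction of the travel signals seen in recent history (0.0 to 1.0)."""
        seen = {t.category for t in self.recent} & TRAVEL_SIGNALS
        return len(seen) / len(TRAVEL_SIGNALS)

    def moderate(self, txn: Txn, model_score: float) -> float:
        """Return the supervised fraud score, discounted for out-of-town
        transactions when recent history suggests travel, then remember txn."""
        adjusted = model_score
        if txn.city != self.home_city:
            adjusted = model_score * (1.0 - MAX_DISCOUNT * self.travel_evidence())
        self.recent.append(txn)
        return adjusted


if __name__ == "__main__":
    state = CardholderState(home_city="New York")
    trail = [
        Txn("Newsstand", "airport_retail", "New York"),
        Txn("Uber", "ride_share_to_airport", "New York"),
        Txn("In-flight snack", "in_flight", "In flight"),
    ]
    for t in trail:
        state.moderate(t, 0.1)
    paris = Txn("Starbucks", "coffee", "Paris")
    # The supervised model alone says 0.9; with the travel trail it drops.
    print(round(state.moderate(paris, 0.9), 2))  # 0.18
```

Without the travel trail, the same Paris purchase keeps its original score of 0.9. A real sequence model would learn this kind of context from data instead of relying on hand-written categories.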


There is an additional benefit: the system keeps learning as new patterns emerge, which can be very powerful.


When an expert investigates the results of the above “system” offline to confirm (or rule out) fraud, the findings act as a “Test and Learn” feedback loop: they drive better predictors for the supervised fraud models, and the system’s overall ability to detect fraud improves over time.





So is this real? What kind of business impact have we created?


Yes it is! Scienaptic has demonstrated a substantial reduction in false positives (40-70%) and improved authorization rates (that is, fewer declines) on the order of 30-50%, while improving the accuracy of fraud capture by 1.2-1.4x. We have implemented these methodologies on our enterprise modelling platform, Ether, to build out-of-the-box fraud detection systems.





We continue to explore new approaches to fraud detection using next-gen AI such as Reinforcement Learning and Adversarial Training. As a teaser, here is a comprehensive list of the major anomaly detection families. We have found good success with density-based methods (specifically Local Outlier Factor).
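For readers who have not met Local Outlier Factor (LOF) before, the sketch below implements the standard definition in plain Python: a point’s score is the average local density of its k nearest neighbours divided by its own local density, so values near 1 mean “as dense as its neighbours” and values well above 1 mean “isolated”. This is an illustrative toy, not the implementation used on Ether; the function name lof_scores, the choice k=3, and the two-dimensional sample points (imagine two already-scaled transaction features) are all assumptions made for the example.

```python
import math


def lof_scores(points, k=3):
    """Local Outlier Factor for each point (brute force, O(n^2)).

    Assumes len(points) > k and fewer than k exact duplicates of any point,
    otherwise a local density would be infinite.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_distance = []   # distance from each point to its k-th nearest neighbour
    neighbours = []   # indices within that k-distance (ties included)
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]
        k_distance.append(kd)
        neighbours.append([j for j in order if dist[i][j] <= kd])

    def local_reachability_density(i):
        # Reachability distance from i to j is max(k-distance(j), d(i, j)).
        reach = [max(k_distance[j], dist[i][j]) for j in neighbours[i]]
        return len(reach) / sum(reach)

    lrd = [local_reachability_density(i) for i in range(n)]

    # LOF = mean density of the neighbours relative to the point's own density.
    return [
        sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]


if __name__ == "__main__":
    # Five similar "transactions" and one that sits far from the rest.
    sample = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.8), (8.0, 8.0)]
    for point, score in zip(sample, lof_scores(sample, k=3)):
        print(point, round(score, 2))
```

On this sample the five clustered points score close to 1, while the isolated point scores far higher. On real data a library implementation, such as LocalOutlierFactor in scikit-learn, would be the more practical choice.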




On this Labor Day, as the weekend winds down, let’s drink to better mouse traps!
