Adversarial Machine Learning
Machine learning techniques were originally designed for environments in which the training and test data are assumed to be generated from the same (although possibly unknown) distribution and/or process. In the presence of intelligent and adaptive adversaries, however, this working hypothesis is likely to be violated.
Applying machine learning to use cases like fraud detection, anti-money laundering, and infosec presents a unique set of challenges:
- Little or no labeled data
- Non-stationary data distributions
- Model decay
- Counterfactual conditions
This event is entirely devoted to understanding how modern machine learning methods can be applied to these adversarial environments. We will have hands-on workshops as well as talks by leading practitioners from industry and academia.
Sep 10, 2016, 9:30am - 5:00pm
620 Folsom St #100
San Francisco, CA 94107
09:00 - 09:30 Registration
09:30 - 10:15 Tackling Bitcoin's Fraud Problems (Soups)
10:15 - 11:45 TensorFlow Workshop on Adversarial Examples (Illia)
11:45 - 12:15 Learning from Large Bodies of Malware (Zach)
12:15 - 01:00 Lunch
01:00 - 01:45 Dealing with Counterfactual Model Decisions (Alyssa)
01:45 - 02:30 Tackling the Full Spectrum of Threats Confronting the Enterprise (Arshak)
02:30 - 03:00 Break
03:00 - 03:45 Assessing Merchant Fraud Risk at Square (Thomson)
04:00 - 05:00 ML-based Detection of Fraud and Abuse (Jacob)
Adversarial ML Topics Covered
Expert Speakers That Understand Adversarial ML Challenges
Coinbase is the easiest place in the world to buy or sell crypto-currency (bitcoin, ethereum, litecoin). We also attract the most sophisticated fraudsters, who use stolen credit cards or bank accounts to purchase crypto-currency and move it out of our hosted wallets. Since crypto-currency is like cash and there's no way to recover the stolen goods once they're sold, our fraud problem is easily one of the hardest in the world.
I'll present some of the recent fraud trends we've seen and how we've addressed them using a combination of machine learning systems (a machine learning based risk score, anomaly detection, and related user modeling). I'll also talk about the Know Your Customer (KYC) program we've built to establish the identity of our customers.
TensorFlow has taken the deep learning world by storm. This workshop will be led by one of TensorFlow’s main contributors, Illia Polosukhin. Illia’s hands-on workshop will cover:
- Dropout - both for preventing overfitting and as a mechanism to estimate what the model doesn't know (prediction confidence)
- Augmenting training data with adversarial examples - to prevent overfitting and speed up training
- How to limit technical exploits of your models - e.g. preventing your model from going haywire using confidence thresholds, adversarial examples, a discriminator, separate classifiers, or simple whitelists
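As a framework-agnostic illustration of the adversarial-examples topic (this is not workshop material, and the logistic-regression model is hypothetical), here is a minimal sketch of the Fast Gradient Sign Method, which perturbs an input in the direction that most increases the model's loss:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps=0.1):
    """Fast Gradient Sign Method against a logistic-regression classifier.

    Moves x a step of size eps in the sign of the loss gradient,
    producing an adversarial example."""
    # For loss = -log p(y|x) with label y in {-1, +1}, the gradient of the
    # loss w.r.t. x is -y * sigmoid(-y * (w @ x + b)) * w.
    grad_x = -y * sigmoid(-y * (w @ x + b)) * w
    return x + eps * np.sign(grad_x)

# Toy example: a point correctly classified as +1 (w @ x + b = 1.0) ...
w = np.array([1.0, -2.0])
b = 0.0
x = np.array([2.0, 0.5])
x_adv = fgsm(x, +1, w, b, eps=0.6)
# ... is pushed across the decision boundary by a small perturbation.
print(w @ x + b > 0, w @ x_adv + b > 0)  # True False
```

The same idea scales to deep networks (and is one way to generate the augmentation data mentioned above): the only change is how the input gradient is computed.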
In this talk, methods for working with and learning from large bodies of malware samples will be demonstrated. We will discuss the process of modeling complex, evolving data, the uses of such systems in a production environment, and strategies for adapting to the natural adversaries that develop such malware.
Our validation methodology for identifying new malicious binaries, our general feature set, and our production environment for utilizing this data will be explored, along with a discussion of alternate approaches taken by other organizations to solve similar problems. Amongst those alternate solutions are the Polonium system, approaches generated for the Microsoft Kaggle challenge, and network-system hybrid approaches such as Mastino.
We announce Adversarial.AI, a unified platform that tackles the full spectrum of threats confronting the modern enterprise.
Applying machine learning to any adversarial use case presents a common set of challenges:
- Lack of labeled data
- Model decay and indeterminate retraining intervals
- Extreme class imbalance
- Dealing with adversarial examples (remember Tay?)
The machine learning and data analysis approaches we are developing specifically address these issues and can be generalized to a diverse set of domains including data breaches, network intrusions, identity theft, counterfeiting, brand scams, arbitrage, phishing, social engineering, money laundering, wire fraud, and internal fraud.
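Of the challenges listed above, extreme class imbalance has a standard first-line remedy: weight each class by its inverse frequency during training. A minimal sketch (this follows the common `class_weight='balanced'` convention from scikit-learn, and is not specific to Adversarial.AI):

```python
import numpy as np

def balanced_class_weights(labels):
    """Inverse-frequency class weights: n_samples / (n_classes * count).

    Rare classes get proportionally larger weights, so a classifier
    trained with these weights isn't dominated by the majority class."""
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

# 1 fraud case per 999 legitimate transactions
y = np.array([0] * 999 + [1])
print(balanced_class_weights(y))
# The rare fraud class receives roughly 1000x the weight of the majority class.
```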
Stripe processes billions of dollars a year in payments for businesses around the world. To protect our users from fraud, we use machine learning to score and block potentially fraudulent transactions. One major issue we faced when building this system was evaluating the performance of a fraud model in production when it's impossible to observe the counterfactual for charges the model deems fraudulent and thus declines: we can't determine whether a transaction is truly fraudulent if it's never processed. Monitoring the performance of the production system is business-critical, so we developed a system that lets us observe the outcomes we need to assess model performance. This talk will outline how that system works, discuss a few specific problems with the system we've solved, and pose some questions about the evaluation system that remain open.
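Stripe's actual system is not detailed here, but a common design for this counterfactual problem is to let a small random holdout of would-be-declined charges through and reweight their observed outcomes by inverse propensity. A hypothetical sketch (threshold and holdout rate are illustrative, not Stripe's):

```python
import random

def score_and_route(model_score, block_threshold=0.9, holdout_rate=0.01):
    """Route a charge: block if risky, but allow a small random holdout
    of would-be-blocked charges so their true outcomes can be observed.

    Returns (action, weight): weight is the inverse of the probability
    the charge was allowed, so weighted outcomes give unbiased estimates
    of model performance over all charges, including blocked ones."""
    if model_score < block_threshold:
        return "allow", 1.0
    if random.random() < holdout_rate:
        # This observed outcome stands in for 1/holdout_rate blocked charges.
        return "allow", 1.0 / holdout_rate
    return "block", 0.0
```

Aggregating fraud labels from the held-out charges, weighted by these factors, recovers an estimate of the model's precision on the declined population that would otherwise be unobservable.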
Sift Science is the leading provider of real-time machine learning fraud prevention for online businesses across the globe. Sift Science protects thousands of businesses from all kinds of fraud and abuse: stolen credit cards used to buy airline tickets or digital games, fake apartment or job listings, fraudulent money transfers, and referral-program abuse.
In this talk, we'll discuss some challenges building a machine learning system to detect all of these diverse kinds of fraud and abuse, including extracting features and training models on custom data specific to each business, leveraging our network of data to help each individual business, learning in real-time, and explaining our system's recommendations to customers.