October 23, 2018
Guest post by: Haojun Zhu, Data Scientist, Kount
Machine learning is a technique in which humans program computers to learn from data and make decisions. Today, fraud prevention is a business with a very narrow margin for error. Human mistakes in the machine learning lifecycle can mean the loss of tens of thousands of dollars in revenue. To prevent this, Kount uses the open source platform MLflow to manage its machine learning models. Kount has integrated this tool into its workflow for better version control and data governance practices.
Losing Track of Experiments
During a machine learning lifecycle, data scientists run a lot of experiments before coming up with a good machine learning model – they try many different combinations of data features and model parameters before obtaining a satisfactory final model. In this iterative process, every step should be well documented for reproducibility and repeatability. Otherwise, there’s no guarantee the right model makes it into production.
Data science is a new field, and these processes are often hardcoded or done manually, which introduces many errors. Sometimes, data scientists resort to spreadsheets to document changes. This is a simple yet effective approach when only a handful of models are being trained. However, spreadsheet tracking is no longer sufficient in today’s machine learning landscape, where dozens or even hundreds of models are trained before attaining an optimal model. Without a rigorous system of tracking experiments, it is extremely hard to get reproducible results.
Some models can take days to train. With ad hoc tracking like this, data scientists and machine learning engineers spend a lot of time asking questions such as ‘What set of parameters did we run this model with?’ and ‘Where is the input training data?’
Data Science is Reproducible
Kount uses MLflow to keep track of experiments and get reproducible results. This includes logging model parameters, performance metrics, source code, and input/output files for each experiment. The benefits of this machine learning lifecycle platform include:
- Documenting Parameters and Metrics. MLflow has a ‘Tracking’ component built in. This is an API and UI for logging model parameters and metrics when the ML code is packaged under the MLflow framework.
Figure: rows of model iterations, each with a different set of parameters and metrics.
- Traceability through Version Control. MLflow is also useful for version control. Running Kount’s ML code saves the model-generating script as an artifact, along with the latest Git commit hash. All model hyperparameters are represented as objects and changed through configuration, rather than being hard-coded or manually edited before spinning up new experiments.
- Exposing Hidden Patterns. With the MLflow Tracking web UI, Kount data scientists and machine learning engineers can see how changing model parameters affects model performance. Because tunable parameters and performance metrics are displayed side by side, hidden connections between model parameters and model metrics surface quickly. This dramatically decreases model tuning time by focusing attention on the parameters that matter.
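The practices in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Kount's actual code: `ModelConfig`, its field names, `expand_grid`, `run_sweep`, and `train_fn` are all hypothetical names introduced for this example. Only `mlflow.start_run`, `mlflow.log_params`, `mlflow.log_metrics`, and `mlflow.log_artifact` are MLflow Tracking calls. Hyperparameters live in a configuration object rather than being hard-coded, and each configuration becomes one tracked run.

```python
import itertools
from dataclasses import asdict, dataclass, replace
from typing import Callable, Dict, List, Optional, Sequence


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical hyperparameters; the field names are illustrative only."""
    learning_rate: float = 0.1
    max_depth: int = 6
    n_estimators: int = 100


def expand_grid(base: ModelConfig, grid: Dict[str, Sequence]) -> List[ModelConfig]:
    """Return one config per combination of the values listed in `grid`."""
    keys = list(grid)
    return [
        replace(base, **dict(zip(keys, values)))
        for values in itertools.product(*(grid[key] for key in keys))
    ]


def run_sweep(
    grid: Dict[str, Sequence],
    train_fn: Callable[[ModelConfig], Dict[str, float]],
    script_path: Optional[str] = None,
) -> None:
    """Train once per config and record each attempt as an MLflow run.

    `train_fn` is supplied by the caller (an assumption of this sketch) and
    must return a dict mapping metric names to float values.
    """
    # Third-party dependency, imported here so that the config helpers
    # above remain usable without MLflow installed.
    import mlflow

    for config in expand_grid(ModelConfig(), grid):
        with mlflow.start_run():
            mlflow.log_params(asdict(config))      # every hyperparameter
            mlflow.log_metrics(train_fn(config))   # performance metrics
            if script_path is not None:
                mlflow.log_artifact(script_path)   # the model-generating script
```

Calling `run_sweep({"max_depth": [4, 6], "learning_rate": [0.05, 0.1]}, train_fn=my_training_function)` would record four runs, one per parameter combination. Running `mlflow ui` afterwards opens the Tracking web UI, where those runs can be compared side by side.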
Using MLflow, Kount has adopted good practices for machine learning lifecycle management and data governance. Kount’s advanced machine learning technologies combine real-time data, historical insight, and industry expertise to help organizations approve more good orders while preventing fraud. Learn why not all machine learning is created equal for fraud detection in the eBook, The Truth About Machine Learning in Fraud Prevention.