In the fall of 1992, a small start-up company named Vicarious developed an artificial intelligence software product named “VICAR”. VICAR was designed to give the user a tool that substantially enhances productivity in any application that manipulates numbers, such as financial data or statistics.
In particular, VICAR was designed with certain key capabilities that set it apart from other products in this class:
Unparalleled speed:
Calculations are performed in compiled C code that runs directly on your computer’s hardware. This means no additional software is required to run VICAR; you can run it without installing anything else on your machine (including Microsoft Windows). It runs faster still on symmetric multiprocessing (SMP) systems such as dual-processor machines.
VICAR was initially sold to major financial institutions. For example, AIG (American International Group) used VICAR as a critical component of its risk management systems.
But the most promising market for Vicarious’ product at the time was Wall Street brokerage firms that relied on automated trading programs to handle huge volumes of stock trades per day. With VICAR, a single person could run a “virtual” desk where all transactions are performed by a computer system without any human intervention.
To sell its product, Vicarious built an impressive demo using some large financial datasets from those years and showcased the results at major Wall Street events in early 1993. In particular, the company demonstrated the remarkable speed of the software and showed that it could handle multi-dimensional analysis across any range of securities available in a given market.
The company received very positive feedback from the Wall Street executives who visited its booth at the events, until the day a chief trader from a large brokerage firm stopped by and noticed a critical flaw in VICAR’s analysis: if you were to sell every stock on an exchange except for one, the remaining stock would account for 100% of the volume traded, and the system would flag this as highly suspicious.
Amazingly, the Vicarious demo never showed this case study with just one stock on an exchange, because the company wanted to avoid drawing attention to such important issues. These were supposed to be carefully analyzed by the customer’s financial experts.
The trader kindly pointed out this critical error and Vicarious had no choice but to correct it in their code and remove that specific case study from its demos. From that point on, many customers started noticing other issues with VICAR’s outputs and stopped trusting them completely.
Why? Because these were simple cases that the software had NOT been tested against!
This is a classic example of two mistakes. The first is using machine learning models in production without having thoroughly tested them beforehand for all possible use cases. The second is drawing incorrect conclusions about a model’s behavior simply because the training dataset never included some important cases, while forgetting how easily such cases can be added when testing the model.
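To make the single-stock case concrete, here is a minimal sketch in Python of how such a case could be written down as a test. Everything in it is an assumption made for illustration: the function names `volume_shares` and `flag_dominant`, the 50% threshold, and the decision that a one-stock market should not be flagged. None of it is VICAR’s actual code.

```python
from typing import Dict, List


def volume_shares(volumes: Dict[str, float]) -> Dict[str, float]:
    """Return each symbol's fraction of the total traded volume."""
    total = sum(volumes.values())
    if total <= 0:
        # No trades at all: define every share as zero.
        return {symbol: 0.0 for symbol in volumes}
    return {symbol: v / total for symbol, v in volumes.items()}


def flag_dominant(volumes: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Flag symbols whose share of volume exceeds `threshold` (hypothetical rule).

    A market with at most one actively traded symbol is skipped: that symbol
    holds 100% of the volume by construction, so the share carries no signal.
    """
    active = [s for s, v in volumes.items() if v > 0]
    if len(active) <= 1:
        return []
    shares = volume_shares(volumes)
    return sorted(s for s, share in shares.items() if share > threshold)


def run_edge_case_tests() -> None:
    # Ordinary case: one symbol dominates a multi-symbol market.
    assert flag_dominant({"AAA": 900, "BBB": 50, "CCC": 50}) == ["AAA"]
    # Edge case from the story: only one stock is trading.
    assert volume_shares({"AAA": 1000}) == {"AAA": 1.0}
    assert flag_dominant({"AAA": 1000}) == []
    # Same edge case, with another symbol listed but untraded.
    assert flag_dominant({"AAA": 1000, "BBB": 0}) == []
    # Degenerate input: no trades at all.
    assert flag_dominant({"AAA": 0, "BBB": 0}) == []


if __name__ == "__main__":
    run_edge_case_tests()
    print("all edge-case tests passed")
```

Whether a one-stock market should be flagged is a domain decision. What matters is that the case is written down as an explicit test, so the behavior is chosen deliberately and not discovered at a demo booth.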
This story illustrates how important it is to thoroughly test your machine learning models on production data to ensure that they behave correctly under real-world circumstances. Any incorrect behavior of your model can have dramatic consequences for the business, especially when predictive models are used in automated systems that apply their outputs without human review.
While this case study involves Vicarious’ VICAR product, many other examples of trained machine learning models behaving incorrectly are documented around the web every day. A few years ago, there was a famous instance of an Amazon machine learning model wrongly predicting negative reviews for some users because it had been trained on “unrealistic” cases in which previous reviewers wrote bad comments about a product they had never purchased on the website. This example again shows how important it is to thoroughly test machine learning models for all possible use cases, and how costly it can be to miss a case in your dataset!
Rather than going through a lengthy description of what machine learning testing is, let’s go straight to some examples of tools for automating tests of machine learning systems, as well as their integration with any JVM-based production environment.
Testing Machine Learning Systems Automatically
There are many different ways to achieve this goal, from simple shell scripts to complex software packages. In the following sections, we’ll present a few popular options that might be useful depending on your specific situation.
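At the simple end of that spectrum, a custom script can be little more than a list of inputs paired with the properties the output must satisfy. The sketch below is an illustration only, written in Python for brevity: `toy_model`, `run_checks`, and the checks themselves are hypothetical names, not part of any existing package, and `toy_model` merely stands in for a real trained model.

```python
from typing import Callable, List, Sequence, Tuple

# A check is a name, an input, and a predicate the model's output must satisfy.
Check = Tuple[str, Sequence[float], Callable[[float], bool]]


def toy_model(features: Sequence[float]) -> float:
    """Hypothetical stand-in for a real model: returns the mean of the inputs."""
    return sum(features) / len(features) if features else 0.0


def run_checks(predict: Callable[[Sequence[float]], float],
               checks: List[Check]) -> List[str]:
    """Run every check and return the names of the ones that failed."""
    failures = []
    for name, features, is_ok in checks:
        try:
            if not is_ok(predict(features)):
                failures.append(name)
        except Exception:
            # A crash on any input counts as a failed check.
            failures.append(name)
    return failures


CHECKS: List[Check] = [
    ("typical input", [1.0, 2.0, 3.0], lambda y: y == 2.0),
    ("single value", [5.0], lambda y: y == 5.0),
    ("empty input", [], lambda y: y == 0.0),
    ("output stays in range", [0.0, 1.0], lambda y: 0.0 <= y <= 1.0),
]

if __name__ == "__main__":
    failed = run_checks(toy_model, CHECKS)
    print("failed checks:", failed)
```

A wrapper script or scheduled job could call `run_checks` against the deployed model and refuse to promote a new version whenever the returned list is not empty. The edge cases (a single value, an empty input) sit in the same list as the typical ones, so they are exercised on every run.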
Testing machine learning models automatically in production is one of the most important processes when building software products. The examples presented here are just a small subset of what tools are currently available for this purpose and should help you choose an approach that best suits your current situation, be it building your own custom script to test your models or using some existing package to achieve the same goal.