The research is finally done. You have developed, tested, and validated your machine learning (ML) model, and you are ready to hand it over to development and push it to production. The hard work is over. Or is it?
Understand the challenges of serving machine learning models
You suddenly think, “How do I make sure my model behaves as expected in production?”
There are several reasons why your hard-earned model can fail when deployed to production. Here are some of the most common.
Different data
Development and research do not use the same databases, and the data is not exactly the same. In production, inference runs in real time on data in memory, while research typically works against a data lake, using query engines to process data in batches.
Various environments
Production systems use languages like Java and C, while research works in Python. One way around this gap is to deploy the model as a web endpoint, but the performance requirements may be too strict for that solution.
Feature incompatibilities
There are great tools for streamlining model delivery from research to development. For example, researchers can use feature stores, where features are extracted in real time and stored in a database. Research works with the same data that the production system will later consume, and the store provides endpoints to run predictions against. This solution is especially useful when the research team knows more or less which features it plans to use. But as the data and context change from project to project, working with a feature store becomes difficult, since new features must be created each time. It is great for delivery, but it is inflexible and adds significant overhead when changing or adding features.
For these reasons, the preferred method on some projects is to deliver a specification with pseudocode or proof-of-concept (POC) code as a reference. The research team hands the document to development, which re-implements the solution in a way that fits the system, since research's code cannot be used as-is.
In most cases an experienced developer will have no difficulty implementing code from pseudocode and a spec, but the final product will not be exactly the same, and development and research must go through many rounds of bug fixing to reach the desired result.
One of the main reasons the research and development models differ is that developers cannot truly test their code. They can check that it conforms to the spec or pseudocode and compare final results, but they cannot test the code properly, because they have no way of knowing whether it works exactly as research intended. And if there is an error in the specification itself, it may take a long time before anyone notices.
Solutions to Machine Learning Model Delivery Challenges
One way to solve this problem is to deliver tests: the same tests research used when developing the model. Testing is one of the ways research teams improve the results of their ML models; it leads to better, more reliable results and can be reused in the final product. How do you deliver the tests? Through a test API endpoint. When research wants to test a feature, they implement an API endpoint for it and write a test against that endpoint. This makes the tests language-agnostic, since they run against API endpoints rather than against the code itself.
The same tests can then validate the model built by the development team, ensuring that the model in production works as expected. This method buys even more: research now delivers both the model and its tests to development, and keeping them up to date is research's responsibility, which increases research's ownership.
Development, on the other hand, can follow a test-driven development (TDD) methodology, which speeds things up: developers can test code as they write it and trust that it is ready once the tests pass, without constantly going back to recheck their work.
Why is testing the solution? Let’s take a look at the issues mentioned earlier and see how testing can solve these common problems.
Different data
The input data will differ, but with the tests in hand the developer can adjust the code until the resulting intermediate data is identical.
Various environments
Development uses a different environment and language than research, which makes distributing code from one group to the other difficult. Because the tests are delivered through an API, this whole conundrum is avoided: the tests act as an intermediate language.
Testing in machine learning model delivery
We used pytest to develop and run the tests, and Flask to serve a local test endpoint. Because the tests run against the local endpoint, the endpoint can later be swapped out and the same tests run against the development endpoint.
Research creates connectors that receive REST API calls and perform the inference and processing operations. Both operations take a DataFrame of input records and return JSON records.
Figure 2: Example of an endpoint interface
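The original figure is a screenshot, so here is a minimal sketch of what such a connector interface might look like. The names (`Connector`, `process`, `inference`, `SqliModelConnector`) and the toy detection logic are assumptions for illustration, not the author's actual code; the only constraints taken from the text are that both operations take a DataFrame of input records and return JSON records.

```python
import pandas as pd


class Connector:
    """Connector interface: each operation receives a DataFrame of input
    records and returns JSON records (a string of serialized records)."""

    def process(self, df: pd.DataFrame) -> str:
        """Feature extraction: DataFrame in, JSON records out."""
        raise NotImplementedError

    def inference(self, df: pd.DataFrame) -> str:
        """Model prediction: DataFrame in, JSON records out."""
        raise NotImplementedError


class SqliModelConnector(Connector):
    """Toy implementation: flags queries containing a single quote."""

    def process(self, df: pd.DataFrame) -> str:
        # Example extracted feature: length of each query string.
        features = pd.DataFrame({"length": df["query"].str.len()})
        return features.to_json(orient="records")

    def inference(self, df: pd.DataFrame) -> str:
        preds = pd.DataFrame({"is_sqli": df["query"].str.contains("'")})
        return preds.to_json(orient="records")
```

Keeping both operations behind one small interface is what lets the local test endpoint and the production endpoint expose the same behavior.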
The local connector implementation is a Flask application. The local endpoint used in research calls the connector functions from the tests.
Figure 3: Local endpoint implementation
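The figure itself is an image; the following is a hypothetical sketch of a local Flask endpoint consistent with the description. The route name, port, and the placeholder model inside `_inference_impl` are assumptions; the point is that the endpoint is a thin wrapper that turns JSON records into a DataFrame, calls the connector, and returns JSON records.

```python
import pandas as pd
from flask import Flask, request

app = Flask(__name__)


def _inference_impl(df: pd.DataFrame) -> str:
    # Placeholder model standing in for the real connector:
    # flag queries that contain a single quote.
    preds = pd.DataFrame({"is_sqli": df["query"].str.contains("'")})
    return preds.to_json(orient="records")


@app.route("/inference", methods=["POST"])
def inference():
    # JSON records in -> DataFrame to the connector -> JSON records out.
    df = pd.DataFrame(request.get_json())
    return _inference_impl(df)


if __name__ == "__main__":
    app.run(port=5000)
```

Because the tests only see the HTTP interface, this local application can later be replaced by the development endpoint without changing a single test.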
A helper function hides the JSON serialization, keeping the test as clean as if the endpoint did not exist. As an example, take a model that receives a query string and determines whether it contains a SQL injection payload. Below are the model's helper function and a test. The test sends a SQL injection payload and verifies that it is identified as such.
Figure 4: Helper function and test example
Below is another example, this time testing the feature extraction code. The _process helper function is similar to the _inference function in the previous example. null_values is a feature that counts the number of "null" words.
Figure 5: Example of a feature extraction test
The test files are part of the model development Git project and are created and maintained there.
Delivery to development
Developers are responsible for running the tests as part of the build process. They wrap the model and processing code in an API and point the tests at that API endpoint.
We used Docker to hide the tests' internal implementation. The tests are packaged as a Docker image containing a minimal Python runtime and the test dependencies. The image takes an endpoint as a parameter and returns the test results (passed, failed, etc.) and logs.
The ‘ENDPOINT’ environment variable is used to run the tests against a specific endpoint instead of the local test application. Below is an example of running the tests against an endpoint; the image name is sqli-ml-test.
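The original command is shown as an image; an invocation consistent with the description might look like the following. The endpoint URL is an assumption, and only the `ENDPOINT` variable name and `sqli-ml-test` image name come from the text.

```shell
# Run the packaged tests against a specific endpoint by passing the
# ENDPOINT environment variable into the container (URL is illustrative).
docker run --rm -e ENDPOINT=http://host.docker.internal:5000 sqli-ml-test
```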
Here is an example test result:
Figure 6: Pytest example output
Developers can choose which version of the tests to use from Git (master or a specific branch) and run them. Endpoints can run on local ports, and the build process can launch the endpoints and run the tests. With Docker Desktop, the tests can run on a developer's PC without installing Python or any dependencies.
Bug lifecycle flow
Tests are maintained and versioned alongside the source code and can be delivered to development on demand. Below is an example of a bug lifecycle flow: after a bug is found, either research or development writes a test; the test runs against both the research and development endpoints, and the code is updated accordingly.
Summary
Testing is important for research teams to ensure the usefulness of their ML models. By leveraging common tools such as pytest, Flask, Git, and Docker, we implemented a testing process that accelerates the transition from research to production and lets the same tests be reused in both environments. The tests remained well organized, and the delivery mechanism stayed transparent to the researchers, who were unaffected by whether the tests ran against a local endpoint or a remote one.
This methodology encourages researchers to maintain ownership of their code and tests after delivery. Development receives the tests, which accelerates progress and improves quality.
This post, “Overcoming Challenges in Delivering Machine Learning Models from Research to Operation,” was first published on the Imperva blog.
*** This is a syndicated blog post from the Security Bloggers Network, written by Ori Nagar. Read the original post: https://www.imperva.com/blog/overcoming-machine-learning-model-delivery-challenges/
