Agile Cloud Institute

Agile AI Platform Architecture with the Agile Cloud Manager

Part 6 of 10: Continuous Experimentation And Training

The entire transcript of this video is given below the video so that you can read and consume it at your own pace. We recommend that you both read and watch, because doing both makes the material easier to grasp completely.

Now let’s examine how work can more efficiently flow through the structures that can be defined in the kinds of A.I. project templates that we discussed in the preceding slide.

The images on the bottom left of this slide remind us of how changes in the environment can cause things like data drift and concept drift that can break or degrade the effectiveness of an A.I. model. These images show us that, if we define an A.I. model to interact with a green forest, that model will lose effectiveness if the forest undergoes acid rain, a forest fire, or even severe winter snow.

Continuous experimentation with new A.I. models is required for two reasons. First, it enables an organization to replace its broken or degraded models with new or retrained models. Second, it enables an organization to sustain an ongoing ideation process that explores new A.I. models as the business evolves.

Number one on the slide illustrates the many different approaches that an organization might identify as it thinks through how to solve an A.I. problem.

The life cycle of each approach can be seen in terms of passing through a series of environments.

Number two on the slide illustrates the dev environment.

Number three shows how an iteration of a model can proceed into a test environment if that iteration of the model matures to the point of being ready to progress beyond the dev environment.

Number four on the slide illustrates how the production environment can receive iterations of a model that have passed all of the requirements that are defined for the test environment.
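The progression from dev to test to production can be pictured as a set of exit requirements that each iteration of a model has to pass before it moves forward. The following is only a minimal sketch of that idea in Python. The class name ModelIteration, the EXIT_GATES table, and the accuracy and latency thresholds are all hypothetical values chosen for illustration. They are not part of the Agile Cloud Manager or of any specific A.I. project management tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Environment order described in the transcript: dev -> test -> production.
ENVIRONMENTS = ["dev", "test", "production"]


@dataclass
class ModelIteration:
    """One iteration of a model and the metrics measured for it (hypothetical)."""
    name: str
    metrics: Dict[str, float]
    environment: str = "dev"


Gate = Callable[[ModelIteration], bool]

# Hypothetical exit requirements per environment. Real requirements would be
# defined in the project template, not hard-coded as they are in this sketch.
EXIT_GATES: Dict[str, List[Gate]] = {
    "dev": [lambda m: m.metrics.get("accuracy", 0.0) >= 0.80],
    "test": [
        lambda m: m.metrics.get("accuracy", 0.0) >= 0.90,
        lambda m: m.metrics.get("latency_ms", float("inf")) <= 200.0,
    ],
}


def promote(iteration: ModelIteration) -> bool:
    """Move the iteration one environment forward if every exit gate passes."""
    gates = EXIT_GATES.get(iteration.environment)
    if gates is None:
        # Already in production, so there is no further environment.
        return False
    if not all(gate(iteration) for gate in gates):
        return False
    next_index = ENVIRONMENTS.index(iteration.environment) + 1
    iteration.environment = ENVIRONMENTS[next_index]
    return True
```

In this sketch, an iteration that fails a gate simply stays where it is, which leaves room for another round of experimentation in the same environment.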

Project management templates can be defined to make a number of things happen in each of these environments, including the following.

Number five on the slide illustrates that each iteration of a model in the dev and test environments needs to be packaged as a collection of five types of software-defined artifacts in order to ensure complete reproducibility of results.

The five types of software-defined artifacts that get packaged together include:

The model.
The version of the dataset that is used to train the model.
The code used to score the model.
The infrastructure as code.
And the configuration as code that provisions and otherwise configures the infrastructure, including deploying the model within an A.P.I.
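One way to picture this five-part package is as a single record that pins the exact content of each artifact. The following is a minimal sketch in Python, assuming that each artifact is available as bytes and that a SHA-256 content hash is an acceptable version identifier. The names ExperimentBundle, make_bundle, and bundle_id are hypothetical and are not taken from any particular tool.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


def sha256_of(content: bytes) -> str:
    """Return a content hash that serves as the artifact's version identifier."""
    return hashlib.sha256(content).hexdigest()


@dataclass(frozen=True)
class ExperimentBundle:
    """The five artifact types from the transcript, each pinned by content hash."""
    model: str
    dataset_version: str
    scoring_code: str
    infrastructure_as_code: str
    configuration_as_code: str

    def bundle_id(self) -> str:
        # Hash the sorted field values so the same five artifacts always
        # produce the same identifier.
        canonical = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return sha256_of(canonical)


def make_bundle(
    model: bytes, dataset: bytes, scoring: bytes, iac: bytes, cac: bytes
) -> ExperimentBundle:
    """Build a bundle from the raw content of each artifact (hypothetical helper)."""
    return ExperimentBundle(
        model=sha256_of(model),
        dataset_version=sha256_of(dataset),
        scoring_code=sha256_of(scoring),
        infrastructure_as_code=sha256_of(iac),
        configuration_as_code=sha256_of(cac),
    )
```

Because the identifier is derived from content, changing any one of the five artifacts, such as training on a different version of the dataset, produces a different bundle_id. That is the property that makes a result traceable back to exactly what produced it.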

Number six on the slide illustrates how all runs of each experiment can be recorded and compared using the A.I. project management tools. These comparisons become possible because the versions of each artifact have been recorded in a way that enables all of the results to become completely reproducible.
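Recording and comparing runs can be sketched as a small in-memory log. The following Python example is only an illustration, under the assumption that each run stores the versions of its artifacts alongside its metrics. The names RunRecord and RunLog are hypothetical, and a real A.I. project management tool would persist this information rather than keep it in a list.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RunRecord:
    """One run of an experiment, with the artifact versions that produced it."""
    experiment: str
    run_id: str
    artifact_versions: Dict[str, str]
    metrics: Dict[str, float]


class RunLog:
    """Hypothetical in-memory record of every run of every experiment."""

    def __init__(self) -> None:
        self._runs: List[RunRecord] = []

    def record(self, run: RunRecord) -> None:
        self._runs.append(run)

    def compare(
        self, experiment: str, metric: str, higher_is_better: bool = True
    ) -> List[RunRecord]:
        """Return the runs of one experiment, ranked best-first on one metric."""
        runs = [
            r for r in self._runs
            if r.experiment == experiment and metric in r.metrics
        ]
        return sorted(
            runs, key=lambda r: r.metrics[metric], reverse=higher_is_better
        )
```

The first entry returned by compare is the best-scoring run for the chosen metric, and its artifact_versions field says exactly which artifacts would need to be retrieved in order to reproduce it.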

Defining everything in software is what enables you to iterate through so many different model development projects while still complying with government regulations.

Number seven on the slide illustrates that the results of all the continuous experimentation can be used to identify the best model, and the best version of that model, to move into production.

Number eight on the slide illustrates how the artifacts that reach production are a slightly different bundle than what was used in the dev and test environments.

The items that are deployed into production that are identical to what was used in the dev and test environments include:

The model.
The code for scoring the model.
Infrastructure as code templates.
And configuration as code templates that provision and otherwise configure the infrastructure, including deploying the A.P.I. that will be wrapped around the A.I. model.
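The relationship between the two bundles can be sketched as a simple conversion that carries the four shared items over unchanged. The following Python example is a minimal illustration. The class names LowerEnvironmentBundle and ProductionBundle and the function to_production are hypothetical, and each field is assumed to hold a version identifier for the corresponding artifact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LowerEnvironmentBundle:
    """The five artifact versions packaged together in dev and test."""
    model: str
    dataset_version: str
    scoring_code: str
    infrastructure_as_code: str
    configuration_as_code: str


@dataclass(frozen=True)
class ProductionBundle:
    """The four artifact versions that are identical in production."""
    model: str
    scoring_code: str
    infrastructure_as_code: str
    configuration_as_code: str


def to_production(bundle: LowerEnvironmentBundle) -> ProductionBundle:
    """Carry the four shared artifacts over unchanged.

    The pinned training dataset version is the one item that is not carried
    into the production bundle in this sketch.
    """
    return ProductionBundle(
        model=bundle.model,
        scoring_code=bundle.scoring_code,
        infrastructure_as_code=bundle.infrastructure_as_code,
        configuration_as_code=bundle.configuration_as_code,
    )
```

Copying the version identifiers rather than rebuilding the artifacts is what keeps the items that reach production identical to the ones that passed the requirements of the lower environments.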

Number nine on the slide illustrates that live production data is what is used in the production environment. By contrast, the lower environments used data that was approved for use in the lower environments. Ideally, you should try to at least use samples of production data in the lower environments. But your governance rules might not allow that, so it is common for organizations to develop automated ways to generate datasets for dev and testing that approximate the characteristics of production data.
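Generating a dataset that approximates the characteristics of production data can be sketched very simply. The following Python example is a deliberately naive illustration. It assumes that every column is numeric, that columns are independent of each other, and that each column is roughly normally distributed, none of which is guaranteed for real data. It also assumes that your governance rules allow summary statistics to leave the production environment, which you would need to confirm. The function names profile and synthesize are hypothetical.

```python
import random
import statistics
from typing import Dict, List

Row = Dict[str, float]
Profile = Dict[str, Dict[str, float]]


def profile(rows: List[Row]) -> Profile:
    """Summarize each numeric column as a mean and a sample standard deviation.

    Requires at least two rows, because a sample standard deviation is
    undefined for fewer.
    """
    result: Profile = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        result[column] = {
            "mean": statistics.fmean(values),
            "stdev": statistics.stdev(values),
        }
    return result


def synthesize(data_profile: Profile, n_rows: int, seed: int = 0) -> List[Row]:
    """Draw synthetic rows from independent normal distributions.

    This ignores correlations between columns and any non-normal shape,
    so it is only a rough approximation of the profiled data.
    """
    rng = random.Random(seed)  # seeded so the generated dataset is reproducible
    return [
        {
            column: rng.gauss(stats["mean"], stats["stdev"])
            for column, stats in data_profile.items()
        }
        for _ in range(n_rows)
    ]
```

In this sketch, only the profile would need to cross from production into the lower environments, and the seed makes the generated dataset itself a reproducible, versionable artifact.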

Number ten on the slide illustrates monitoring.

Number eleven on the slide illustrates event triggers that can be defined to run specific processes whenever specific monitors identify that specific types of events have occurred.

Let’s look at monitors and event triggers together.

You can define many types of monitors and event triggers. Some of the general types include:

Data drift monitors continually test the characteristics of the data to validate whether or not the profile of the data has been changing over time. If the characteristics of the data change, then the performance of the model might be degraded. An event trigger tied to a data drift monitor might start a process of retraining the model in a lower environment using a new dataset that more closely resembles the new profile of production data.
Concept drift monitors continually test to see whether or not the underlying assumptions of the model are relevant. An event trigger that handles concept drift might replace the model with a different model from a lower environment.
Operations variables can also be monitored. Event triggers associated with operations variables can do all sorts of operations tasks, including sending alerts to relevant other systems and people, and also doing patching and maintenance as needed.
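The pairing of a monitor with an event trigger can be sketched for the data drift case. The following Python example is a minimal illustration and not a description of how any particular platform implements monitoring. It uses the population stability index as the drift measure, which is one common choice among many. The 0.2 threshold is a frequently quoted rule of thumb and is treated here as an assumption, and the names EventBus and check_data_drift and the "data_drift" event name are hypothetical.

```python
import math
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Sequence

Handler = Callable[[Dict[str, float]], None]


def population_stability_index(
    baseline: Sequence[float], current: Sequence[float], bins: int = 10
) -> float:
    """Compare two samples of one feature, using bins built from the baseline."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if baseline is constant

    def fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # A small floor avoids taking the logarithm of zero for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    b, c = fractions(baseline), fractions(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))


class EventBus:
    """Hypothetical registry that maps event names to trigger handlers."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def on(self, event: str, handler: Handler) -> None:
        self._handlers[event].append(handler)

    def emit(self, event: str, payload: Dict[str, float]) -> None:
        for handler in self._handlers.get(event, []):
            handler(payload)


def check_data_drift(
    bus: EventBus,
    baseline: Sequence[float],
    current: Sequence[float],
    threshold: float = 0.2,  # assumed rule-of-thumb threshold
) -> bool:
    """Emit a data_drift event when the drift measure exceeds the threshold."""
    psi = population_stability_index(baseline, current)
    if psi > threshold:
        bus.emit("data_drift", {"psi": psi})
        return True
    return False
```

A handler registered with bus.on("data_drift", ...) is where a retraining process in a lower environment could be started. Concept drift and operations monitors could follow the same pattern with their own event names and their own handlers.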

Taking a step back and looking at this entire slide together can help you see how everything needs to be software-defined. The sheer number of A.I. model projects means that all the redundant tasks need to be automated when each new project is created. Also, the reproducibility required for continuous experimentation and testing means that the artifacts and environments need to be trackable and reusable. Government laws also require auditability, and it is a lot easier to audit things that have been software-defined, with version control, and with the kinds of clear role-based access controls for all of these processes that were described in an earlier slide.

Proceed to Part Seven: Review Essential Components Of An AI Platform
