Agile Cloud Institute

Cross-Functional Architecture And Tools For Cloud-Based Operating Models

Logging for the Agile Cloud Manager

The Agile Cloud Manager is intended to be used inside a pipeline tool.

You can continue to use your current pipeline tool because any pipeline tool can potentially be used as the graphical user interface for the Agile Cloud Manager.

Reading the logs produced by the Agile Cloud Manager during runtime will become one of the most important tasks that your engineers perform. Your operations teams will use the logs to diagnose the day-to-day performance of the pipelines. Your platform engineering teams will also use the logs to develop the templates that will compose your appliances.

The Agile Cloud Manager’s logs consolidate output from many different underlying tools and organize that underlying output within the structure of the workflow that the Agile Cloud Manager creates to execute any given CLI Command.

Using the logs is a four-step process, which this article summarizes as follows:

  1. Locate logs from the Agile Cloud Manager
  2. Identify which tool is summarized in each line
  3. Identify the point in the workflow where something of interest occurred
  4. Examine the underlying logs at the point of interest in the workflow

Step One: Locate Logs From The Agile Cloud Manager

Logs are written both to your pipeline tool’s logs and to specific log file locations within each agent’s file structure.

Your pipeline tool’s graphical user interface will organize logs by job and then by step within each job.

The Agile Cloud Manager will create new logs each time one of the 12 CLI commands is run.

Therefore, you can locate the Agile Cloud Manager’s logs by navigating in your pipeline tool’s graphical user interface first to each relevant job and then to each step that calls one of the Agile Cloud Manager’s CLI commands within each relevant job.

Alternatively, the Agile Cloud Manager also writes two log files to each agent’s directory structure. On Windows agents, logs are written to $USER_HOME\acm\logs\. On Linux agents, logs are written to /var/log/acm/. The two native log files that the Agile Cloud Manager creates within these log directories are:

  1. log-verbose.log
  2. log-acm-summary.log

A new copy of log-verbose.log and log-acm-summary.log is created every time you run an Agile Cloud Manager CLI command. Each time, the previous version of each log file is backed up under a name that includes the <date-time-stamp> value from the log being backed up. As a result, the logs directory on the agent might contain numerous log files with names in the log-verbose<date-time-stamp>.log and log-acm-summary<date-time-stamp>.log formats.
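The naming convention above can be used to gather the current and backed-up log files from an agent. The following is a minimal Python sketch, not part of the Agile Cloud Manager itself. The function names are hypothetical, and using the current user's home directory as a stand-in for $USER_HOME is an assumption.

```python
import platform
from pathlib import Path


def acm_log_dir() -> Path:
    """Return the log directory described in this article for the agent's OS."""
    if platform.system() == "Windows":
        # Assumption: Path.home() stands in for $USER_HOME.
        return Path.home() / "acm" / "logs"
    return Path("/var/log/acm")


def list_acm_logs(log_dir: Path) -> dict:
    """Group current and backed-up log files by kind, sorted by file name."""
    # The patterns match both log-verbose.log and log-verbose<date-time-stamp>.log.
    verbose = sorted(p.name for p in log_dir.glob("log-verbose*.log"))
    summary = sorted(p.name for p in log_dir.glob("log-acm-summary*.log"))
    return {"verbose": verbose, "summary": summary}
```

Calling list_acm_logs(acm_log_dir()) on an agent would return the file names found there, or two empty lists if the directory does not exist.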

To access these logs in the agent directory structure, your provisioning scripts for each agent must map the locations of these log files in each agent to a network file share that will persist after each agent is destroyed when each job is completed.

Your pipeline tool’s graphical user interface will be a sufficient source of logs in most cases, because the slight differences in chronological ordering between the pipeline tool’s display and the native log files are usually not an impediment to the diagnostic use of logs. But the Agile Cloud Manager’s own native log files are available if you need them.

Step Two: Identify Which Tool Is Being Summarized On Each Line

The first thing you will notice in the logs is that the first block at the left of each line indicates which tool produced the output. For example, if Agile Cloud Manager workflow code is being summarized in the line, then the line will begin with [ acm ]. By contrast, if a shell command is being summarized in the line, then the line will begin with [ shell ]. If a terraform command is being summarized in the line, then the line will begin with [ terraform ], and so on.

Some of the values that can appear at the start of each line are:

[ acm ]		Agile Cloud Manager  
[ shell ]		Shell  
[ terraform ]	Terraform  
[ packer ]	Packer  
[ arm ]		ARM  
[ cf ]		CloudFormation  
[ az-cli ]		Azure CLI  
[ … ]		Others can be specified explicitly.  
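Because every line starts with a tool indicator in the same bracketed form, the indicator can be read mechanically. The sketch below is an illustration in Python, with hypothetical function names. It assumes the indicator always has the form of an opening bracket, a space, the tool name, a space, and a closing bracket, as in the values listed above.

```python
import re
from collections import Counter

# Matches a leading "[ name ]" block and captures the name.
PREFIX = re.compile(r"^\[ (?P<tool>[^\]]+?) \]")


def tool_of(line: str):
    """Return the tool indicator of a log line, or None if there is none."""
    match = PREFIX.match(line)
    return match.group("tool") if match else None


def count_by_tool(lines) -> Counter:
    """Count how many lines each tool contributed to a log."""
    return Counter(t for t in map(tool_of, lines) if t is not None)
```

For example, tool_of("[ acm ]     command is: on") returns "acm", and a line without a leading indicator returns None.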

Step Three: Identify Point In Workflow Where Something Of Interest Occurred

The logs are structured by breakpoints that summarize the progress of the workflows that get created by the Agile Cloud Manager when you run any of the CLI commands.

To identify the point in a workflow where something of interest occurred, you must therefore understand the structure of the workflow whose breakpoints provide structure to the Agile Cloud Manager’s logs.

Workflow That Agile Cloud Manager Creates For Each CLI Command Run

There are 12 possible types of steps in any workflow that the Agile Cloud Manager will create to execute one of the CLI commands on an appliance configuration that you provide. These 12 types of possible steps are listed as follows:

Start of appliance run  
Start of each system within appliance  
Start of foundation (if exists) within each system  
End of foundation (if exists)  
Start of ServiceTypes  
Start of each ServiceType  
Start of each instance of each ServiceType  
End of each instance of each ServiceType  
End of each ServiceType  
End of all ServiceTypes  
End of each system within appliance  
End of appliance run  

Examining the list of possible types of steps illustrates several aspects of the logs, including:

  1. A minimum of 10 steps will be included in each complete log, because there is a minimum of one system template with one instance of one service type, and because a foundation block is optional within a system template.
  2. There is no theoretical maximum number of steps because every line that contains the word “each” in the above list of possible step types can occur N times.
    1. Meaning that each appliance can theoretically include arbitrarily many system templates, which each can theoretically have arbitrarily many service types, which each can have arbitrarily many instances.
  3. The order of steps is linear, starting at either the top or the bottom of acm.yaml and proceeding on a “for each” basis down into the object model defined in each of the system templates that are referenced in acm.yaml.
    1. Note that workflows for “on” commands proceed from first to last in the configuration object model, while workflows for “off” commands proceed in reverse order from last to first in the configuration object model.
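The counting rule described above can be written out as a short calculation. This is a sketch in Python under stated assumptions: the input shape (a list of dictionaries with has_foundation and service_types keys) is hypothetical and is not the schema of acm.yaml or of the system templates. It only counts the twelve step types listed earlier.

```python
def count_workflow_steps(systems) -> int:
    """Count workflow steps for an appliance.

    systems is a list of dicts (hypothetical shape), each with:
      "has_foundation": bool, optional, defaults to False
      "service_types":  list with the number of instances per service type
    """
    steps = 2  # start and end of appliance run
    for system in systems:
        steps += 2  # start and end of the system
        if system.get("has_foundation", False):
            steps += 2  # start and end of the foundation
        steps += 2  # start and end of the ServiceTypes section
        for instance_count in system["service_types"]:
            steps += 2  # start and end of the ServiceType
            steps += 2 * instance_count  # start and end of each instance
    return steps


# One system, no foundation, one service type with one instance.
print(count_workflow_steps([{"service_types": [1]}]))  # 10
# One system, no foundation, one service type with two instances.
print(count_workflow_steps([{"service_types": [2]}]))  # 12
```

The first call reproduces the minimum of 10 steps. The second matches an appliance with one system and two instances of one service type, which yields 12 steps.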

Good design will result in each appliance being composed of relatively small numbers of system templates, service types, and service instances.

The number of steps in any given log file will directly correspond to the number of objects defined in each of the system template files that are referenced in the acm.yaml file.

The number of steps in the current workflow will be printed into the logs.

The following summaries will be printed to the logs at the workflow breakpoints where each of the workflow steps has just been completed:

  1. A human-readable summary that succinctly describes the current state of the workflow.
  2. A human-and-machine-readable Changes Manifest with detailed information.
  3. A machine-readable Changes Taxonomy that gives a JSON taxonomy of the workflow.

Human-Readable Changes Summary

A human-readable summary is given at each point in the workflow. This human-readable summary contains the same information given in the Changes Manifest and in the Changes Taxonomy, but is written in plain English.

One example of a human-readable changes summary from a command run by one of our demos is:

[ acm ] APPLIANBCE LEVEL:   
[ acm ]     command is: on  
[ acm ]     overallStatus changed from NOT Started to In Process  
[ acm ]     currentStep did NOT change since the last step and is: 0 out of 1 steps.   
[ acm ]     SYSTEMS:  Each system in the appliance will be summarized one at a time as follows:    
[ acm ]         tfbackend SUMMARY LEVEL:   
[ acm ]             name: tfbackend  
[ acm ]             system summary status did NOT change and is: NOT Started  
[ acm ]             system summary currentStep did NOT change and is: 0 out of 1 steps.   
[ acm ]             SERVICES SUMMARY LEVEL:   
[ acm ]                 all services summary status did NOT change and is: NOT Started  
[ acm ]                 all services summary currentStep did NOT change and is: 0 out of 1 steps.   
[ acm ]                 Each type of service is summarized as follows:   
[ acm ]                     tfBackend summary is as follows:   
[ acm ]                         tfBackend summary status did NOT change and is: NOT Started  
[ acm ]                         tfBackend summary currentStep did NOT change and is: 0 out of a total 2 steps.   
[ acm ]                          INSTANCES OF tfBackend SERVICE TYPE ARE:   
[ acm ]                             adminAccounts instance of tfBackend  
[ acm ]                                 adminAccounts summary status did NOT change and is: NOT Started  
[ acm ]                                 adminAccounts summary currentStep did NOT change and is: 0 out of a total 1 steps.   
[ acm ]                             pipelineAgents instance of tfBackend  
[ acm ]                                 pipelineAgents summary status did NOT change and is: NOT Started  
[ acm ]                                 pipelineAgents summary currentStep did NOT change and is: 0 out of a total 1 steps.   
[ acm ] ////////////////////////////////////////////////////////////////////  
[ acm ] CHANGE SUMMARY:   
[ acm ] ...     overallStatus changed from NOT Started to In Process  

If you examine this human-readable summary, you will notice several things:

  1. It clearly stands out from other logs because it is large and human readable, so that you can find it easily if you scroll through the logs.
  2. The CHANGE SUMMARY at the very bottom of the block lists what actually changed in the current step. In this case, only one thing happened, which was the change of the overallStatus from NOT Started to In Process.
  3. The status of each level of each affected system template in the appliance is listed in hierarchical order in plain English with indentation.
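The CHANGE SUMMARY portion of each block can also be pulled out of a log with a few lines of code. The following Python sketch uses a hypothetical function name and assumes, based only on the single example above, that each changed item appears on its own line beginning with "[ acm ] ..." directly after the "[ acm ] CHANGE SUMMARY:" line.

```python
def extract_change_summaries(lines):
    """Return one list of changed items per CHANGE SUMMARY block in a log."""
    marker = "[ acm ] CHANGE SUMMARY:"
    item_prefix = "[ acm ] ..."
    summaries = []
    current = None  # items of the block being read, or None outside a block
    for raw in lines:
        line = raw.rstrip()
        if line == marker:
            current = []
            summaries.append(current)
        elif current is not None and line.startswith(item_prefix):
            current.append(line[len(item_prefix):].strip())
        else:
            current = None  # any other line ends the block
    return summaries


sample = [
    "[ acm ] ////////////////////////////////////////////////////////////////////  ",
    "[ acm ] CHANGE SUMMARY:   ",
    "[ acm ] ...     overallStatus changed from NOT Started to In Process  ",
]
print(extract_change_summaries(sample))
# [['overallStatus changed from NOT Started to In Process']]
```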

Changes Manifest

A Changes Manifest will also be reprinted into the log file at the end of every step in the workflow. The Changes Manifest lists every step in the entire workflow, the changes that will be made in each step, and the current status of each of those changes.

Each of the changes in each step can do one of only two possible things:

  1. Change the status of the step to one of the following two possible values:
    1. In Process
    2. Completed
  2. Change the number of the current step by +1 to indicate progress through the workflow.

In addition, the status of each change will be reported as either True or False each time the Changes Manifest is printed into the logs, so that you can see the progression of changes being made as the Agile Cloud Manager steps through the workflow that it creates to execute a CLI command.

The intent is for the Changes Manifest to be both machine-readable and human-readable.

One example of a Changes Manifest from a command run by one of our demos is:

[ acm ] The current status of the 12 changes being made in this run is: 
[ acm ] {'changeIndex': 1, 'changeType': 'Start of appliance run', 'key': 'applianceStart', 'changes': [{'affectedUnit': ' appliance', 'Status': 'To In Process', 'Step': 'Same', 'changeCompleted': False}]}
[ acm ] {'changeIndex': 2, 'changeType': 'Start of a system', 'key': 'appliance/system:tfbackend', 'changes': [{'affectedUnit': ' appliance', 'Status': 'same', 'Step': '+1', 'changeCompleted': False}, {'affectedUnit': ' appliance/system:tfbackend', 'Status': 'To In Process', 'Step': 'Same', 'changeCompleted': False}]}
[ acm ] {'changeIndex': 3, 'changeType': 'Start of a services section', 'key': ' appliance/system:tfbackend', 'changes': [{'affectedUnit': ' appliance/system:tfbackend', 'Status': 'same', 'Step': '+1', 'changeCompleted': False}, {'affectedUnit': ' appliance/system:tfbackend/serviceTypes', 'Status': 'To In Process', 'Step': 'same', 'changeCompleted': False}]}
[ acm ] {'changeIndex': 4, 'changeType': 'Start of a serviceType', 'key': ' appliance/system:tfbackend/serviceTypes', 'changes': [{'affectedUnit': ' appliance/system:tfbackend/serviceTypes', 'Status': 'same', 'Step': '+1', 'changeCompleted': False}, {'affectedUnit': ' appliance/system:tfbackend/serviceTypes/tfBackend', 'Status': 'To In Process', 'Step': 'same', 'changeCompleted': False}]}
[ acm ] {'changeIndex': 5, 'changeType': 'Start of an instance of a serviceType', 'key': ' appliance/system:tfbackend/serviceTypes/tfBackend', 'changes': [{'affectedUnit': ' appliance/system:tfbackend/serviceTypes/tfBackend', 'Status': 'same', 'Step': '+1', 'changeCompleted': False}, {'affectedUnit': ' appliance/system:tfbackend/serviceTypes/tfBackend/adminAccounts', 'Status': 'To In Process', 'Step': '+1', 'changeCompleted': False}]}
[ acm ] {'changeIndex': 6, 'changeType': 'End of an instance of a serviceType', 'key': ' appliance/system:tfbackend/serviceTypes/tfBackend/adminAccounts', 'changes': [{'affectedUnit': ' appliance/system:tfbackend/serviceTypes/tfBackend/adminAccounts', 'Status': 'To Completed', 'Step': 'same', 'changeCompleted': False}]}
[ acm ] {'changeIndex': 7, 'changeType': 'Start of an instance of a serviceType', 'key': 'appliance/system:tfbackend/serviceTypes/tfBackend', 'changes': [{'affectedUnit': 'appliance/system:tfbackend/serviceTypes/tfBackend', 'Status': 'same', 'Step': '+1', 'changeCompleted': False}, {'affectedUnit': 'appliance/system:tfbackend/serviceTypes/tfBackend/pipelineAgents', 'Status': 'To In Process', 'Step': '+1', 'changeCompleted': False}]}
[ acm ] {'changeIndex': 8, 'changeType': 'End of an instance of a serviceType', 'key': 'appliance/system:tfbackend/serviceTypes/tfBackend/pipelineAgents', 'changes': [{'affectedUnit': 'appliance/system:tfbackend/serviceTypes/tfBackend/pipelineAgents', 'Status': 'To Completed', 'Step': 'same', 'changeCompleted': False}]}
[ acm ] {'changeIndex': 9, 'changeType': 'End of a serviceType', 'key': 'appliance/system:tfbackend/serviceTypes/tfBackend', 'changes': [{'affectedUnit': 'appliance/system:tfbackend/serviceTypes/tfBackend', 'Status': 'To Completed', 'Step': 'same', 'changeCompleted': False}]}
[ acm ] {'changeIndex': 10, 'changeType': 'End of a services section', 'key': 'appliance/system:tfbackend/serviceTypes', 'changes': [{'affectedUnit': 'appliance/system:tfbackend/serviceTypes', 'Status': 'To Completed', 'Step': 'same', 'changeCompleted': False}]}
[ acm ] {'changeIndex': 11, 'changeType': 'End of a system', 'key': 'appliance/system:tfbackend', 'changes': [{'affectedUnit': 'appliance/system:tfbackend', 'Status': 'To Completed', 'Step': 'Same', 'changeCompleted': False}]}
[ acm ] {'changeIndex': 12, 'changeType': 'End of appliance run', 'key': 'applianceEnd', 'changes': [{'affectedUnit': 'appliance', 'Status': 'To Completed', 'Step': 'Same', 'changeCompleted': False}]}

As you can see, the Changes Manifest consists of many lines of simple, JSON-like records that are easy enough for a human to read. The single quotes and the True and False values follow Python literal notation rather than strict JSON. Each line describes a specific step, with one or more smaller changes in that step, and each change carries a changeCompleted field marked either True or False.

The example above is the first printout in a log file, so changeCompleted is False in every step at the very start of a run.

The logs will print a new copy of the Changes Manifest at every step, so that if you examine the logs, you will see that each new copy of the Changes Manifest marks the changes in one more step as changeCompleted:True until the very last step shows all steps and all changes as changeCompleted:True.
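Each Changes Manifest line in the example above can be read by a program. The sketch below is one way to do it in Python, offered as an illustration and not as the Agile Cloud Manager's own parser. It relies on the observation that the example lines use Python literal notation (single quotes, True and False), so it uses ast.literal_eval instead of a JSON parser. The function names are hypothetical.

```python
import ast

ACM_PREFIX = "[ acm ] "


def parse_manifest_line(line: str):
    """Return the manifest entry on a log line as a dict, or None."""
    if not line.startswith(ACM_PREFIX):
        return None
    body = line[len(ACM_PREFIX):].strip()
    if not body.startswith("{"):
        return None
    try:
        # literal_eval only evaluates literals, so it is safe on log text.
        entry = ast.literal_eval(body)
    except (ValueError, SyntaxError):
        return None
    if isinstance(entry, dict) and "changeIndex" in entry:
        return entry
    return None


def step_is_completed(entry: dict) -> bool:
    """A step counts as completed when every change in it is completed."""
    return all(change["changeCompleted"] for change in entry["changes"])
```

Applying parse_manifest_line to every line of a log and keeping the results that are not None yields the successive printouts of the manifest, which can then be compared from one step to the next.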

Finding the Point Of Interest

To diagnose a problem, you can therefore look through the logs for a human-readable changes summary, which is marked by [ acm ] at the start of each line and has the distinctive structure shown above. The summary stands out because of its size and indentation, and it clearly tells you the status of the Agile Cloud Manager workflow at each point.

You can then further examine the Changes Manifest to identify which step at each point is the last step to be marked as completed.

The step after the last completed step would be the step where something broke, if anything broke.
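The rule of thumb above, that the step after the last completed step is the place to look, can be expressed as a small helper. This Python sketch assumes the manifest entries have already been turned into dictionaries with the changeIndex and changes fields shown in the example. The function name is hypothetical, and the sample data below is abbreviated and hand-edited for illustration, not copied from a real run.

```python
def first_incomplete_step(manifest):
    """Return the lowest-numbered step with an uncompleted change, or None."""
    for entry in sorted(manifest, key=lambda e: e["changeIndex"]):
        if not all(change["changeCompleted"] for change in entry["changes"]):
            return entry
    return None


# Hand-edited illustration: steps 1 and 2 completed, step 3 did not.
manifest = [
    {"changeIndex": 1, "changeType": "Start of appliance run",
     "changes": [{"changeCompleted": True}]},
    {"changeIndex": 2, "changeType": "Start of a system",
     "changes": [{"changeCompleted": True}, {"changeCompleted": True}]},
    {"changeIndex": 3, "changeType": "Start of a services section",
     "changes": [{"changeCompleted": True}, {"changeCompleted": False}]},
]
suspect = first_incomplete_step(manifest)
print(suspect["changeIndex"], suspect["changeType"])
# 3 Start of a services section
```

If the function returns None for the last manifest printed in a log, every step was marked as completed.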

Step Four: Examine Underlying Logs At Point Of Interest In Workflow

Once you know the point in the workflow, scroll down from the last human-readable changes summary and examine the output of each command run against the underlying tools to find the last command that ran, along with any error message that might have been printed.

Each underlying tool will have a unique pattern which will repeat in all its logs, so that you can learn to use the information to debug issues that arise in the performance of underlying tools.

For example, the command that the Agile Cloud Manager runs against a given tool will be printed in the logs along with any required information about the directory in which the command is being run.

Platform engineers developing with the Agile Cloud Manager can navigate their terminals to the directory given in the logs and can paste in the underlying third-party tool command that was run when the pipeline broke. This should give the information required to identify the root cause of the problem so that it can be fixed.

Platform engineers should be able to fix any underlying problems during development, so that those problems are resolved before each template is promoted to higher environments.

Some problems occur when third-party systems have outages elsewhere on the internet. Other problems occur when credentials expire, or for other reasons that have nothing to do with the template itself.
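The same reproduction step can be scripted when a terminal session is not convenient. The sketch below is a minimal Python illustration with a hypothetical function name. It does not parse the logs: the command and the directory are assumed to have been copied by hand from the log output, and the command is passed as a list of arguments.

```python
import subprocess


def rerun_logged_command(command, working_dir):
    """Re-run a command copied from the logs inside the logged directory.

    command is a list of arguments, for example ["terraform", "plan"].
    Returns the exit code, standard output, and standard error.
    """
    result = subprocess.run(
        command,
        cwd=working_dir,      # run where the pipeline ran the command
        capture_output=True,  # keep the output for inspection
        text=True,            # decode output as text
    )
    return result.returncode, result.stdout, result.stderr
```

A non-zero exit code together with the captured standard error is usually the starting point for identifying the root cause. Any credentials or environment variables that the pipeline supplied would also need to be present in the local session.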

Operations engineers who use the Agile Cloud Manager’s logs in pipelines can identify whether a problem simply requires re-running the job that broke in a pipeline, or whether other changes might need to be made, such as potentially updating credentials, or potentially deploying to a different cloud region if a cloud provider is having a regional outage.
