Agile Cloud Institute

Cross-Functional Architecture And Tools For Cloud-Based Operating Models

Data Lake House In Azure using ARM templates

This is a starting point for developing a data lake house appliance in Azure using ARM templates. This article is hands-on training for engineers. You can read about the architecture of this example Azure data lake house appliance separately.

Three system templates are orchestrated in this appliance.

The data engineer system gives an example of importing an output variable from the core system, a feature that requires version 1.4 or higher of Agile Cloud Manager.

This starting point contains elements from which you can build out a more full-featured appliance.

STEP ONE: Create DevBox

You will need a provisioned DevBox, running either Windows or Linux, in order to orchestrate this appliance using Agile Cloud Manager.

For convenience, we have defined a simple process for spinning up a DevBox at this link.

If you already have a working DevBox that you created for one of the other appliance examples, you can reuse that DevBox for this example if you do the following things first:

STEP TWO: Create keys.yaml and config.yaml

Once you have created a DevBox, you can create keys.yaml and config.yaml.

The Azure client that you use (identified by its clientId) must be assigned the Subscription Owner role for the subscription that you specify. Later, after you get the example appliance up and running, you can experiment with the Subscription Contributor or Subscription Administrator roles.

Several items will need to be collected from the Azure Portal, including the Subscription Name, the Subscription ID, the Tenant ID, the Client Name, the Client ID, and the Client Secret.

Unique strings will also need to be created for the rest of the variables. The example keys.yaml and config.yaml below give the number of characters for each value that you will need to create. We test with strings of these specific lengths. After you get the example appliance working with strings of the lengths specified below, you can experiment with strings of different lengths.

keys.yaml should contain the following fields, with valid values that you specify as described for each field below:

secretsType: master
clientName: Actual_Client_Name
sqlAdministratorLoginPassword: 1234567890abcd
clientId: 12345678-1234-1234-1234-123456789012
clientSecret: abcdefghijklmnopqrstuvwxyz78901234567890

config.yaml should contain the following fields, with valid values:

subscriptionId: actual-subscription-id-guid-string
subscriptionName: ActualSubscriptionName
tenantId: actual-tenant-id-guid-string
resourceGroupRegion: eastus
orgARM: <6character-unique-string>
rgLhCoreFoundation: <6character-unique-string>
rgLhCoreSynapseService: <5character-unique-string>
rgLhEngineerSynapseService: <5character-unique-string>
rgLhCorePauseService: <5character-unique-string>
rgLhCoreResumeService: <5character-unique-string>
rgLhCorePauseRoleService: <5character-unique-string>
rgLhCoreResumeRoleService: <5character-unique-string>
imageNameARM: <6character-unique-string>
lhCoreFoundationDeployNameARM: <10character-unique-string>
synapseCoreDeployNameARM: <10character-unique-string>
synapseEngineerDeployNameARM: <5character-unique-string>
pauseCoreDeployNameARM: <10character-unique-string>
resumeCoreDeployNameARM: <10character-unique-string>
rolePauseCoreDeployNameARM: <10character-unique-string>
roleResumeCoreDeployNameARM: <10character-unique-string>
networkName: <12character-unique-string>
sysName: <14character-unique-string>
pauseEngineerDeployNameARM: <10character-unique-string>
rgLhEngineerPauseService: <5character-unique-string>
resumeEngineerDeployNameARM: <9character-unique-string>
rgLhEngineerResumeService: <5character-unique-string>
rolePauseEngineerDeployNameARM: <10character-unique-string>
rgLhEngineerPauseRoleService: <5character-unique-string>
roleResumeEngineerDeployNameARM: <10character-unique-string>
rgLhEngineerResumeRoleService: <5character-unique-string>
mlScientistDeployNameARM: <10character-unique-string>
rgLhScientistMLService: <5character-unique-string>
wsName: <15character-unique-string>
kvName: <16character-unique-string>
rgName1: <5character-unique-string>

Both keys.yaml and config.yaml need to be valid YAML files, with a simple, one-level list of key/value pairs like what you see above.
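The unique strings do not have to be invented by hand. The following Python sketch generates one random value per key at the lengths listed above. It is an illustration under assumptions, not part of Agile Cloud Manager: the article specifies only lengths, so the character set (lowercase letters and digits, always starting with a letter) is an assumption, and the LENGTHS dictionary holds only a subset of the keys.

```python
import secrets
import string

# Assumption: the article gives only lengths, not allowed characters. This sketch
# uses lowercase letters and digits and always starts with a letter, a
# conservative choice for many Azure resource names.
FIRST = string.ascii_lowercase
ALPHABET = string.ascii_lowercase + string.digits

# A subset of the lengths from the example config.yaml above; add the
# remaining keys with their lengths in the same way.
LENGTHS = {
    "orgARM": 6,
    "rgLhCoreFoundation": 6,
    "rgLhCoreSynapseService": 5,
    "imageNameARM": 6,
    "lhCoreFoundationDeployNameARM": 10,
    "resumeEngineerDeployNameARM": 9,
    "networkName": 12,
    "sysName": 14,
    "wsName": 15,
    "kvName": 16,
}


def random_name(length: int) -> str:
    """Return a random string of the given length that starts with a letter."""
    first = secrets.choice(FIRST)
    rest = "".join(secrets.choice(ALPHABET) for _ in range(length - 1))
    return first + rest


def generate_values(lengths: dict[str, int]) -> dict[str, str]:
    """Generate one distinct random string per key, honoring each key's length."""
    used: set[str] = set()
    values: dict[str, str] = {}
    for key, length in lengths.items():
        value = random_name(length)
        while value in used:  # re-draw on the unlikely chance of a collision
            value = random_name(length)
        used.add(value)
        values[key] = value
    return values


def to_yaml_lines(values: dict[str, str]) -> str:
    """Render a flat, one-level 'key: value' listing like the examples above."""
    return "\n".join(f"{key}: {value}" for key, value in values.items()) + "\n"
```

Printing to_yaml_lines(generate_values(LENGTHS)) gives lines that you could paste into config.yaml below the Azure values such as subscriptionId and tenantId. Drawing from the secrets module rather than random is a cautious default, not a stated requirement of the appliance.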

If you have already created keys.yaml for the Azure Terraform/Packer appliance, you can reuse the Subscription Name, the Subscription ID, the Tenant ID, the Client Name, the Client ID, and the Client Secret. But you will need to change all the other values, as shown above. You will also need to make sure that the client referred to by the Client Name and the Client ID has been assigned the Subscription Owner role.

After you have created config.yaml and keys.yaml using the process documented in the link, place copies of config.yaml and keys.yaml in the $USER\acm\keys\starter directory of the DevBox.

USE PROPER SYNTAX FOR keys.yaml AND FOR config.yaml.

PROPER SYNTAX INCLUDES VALID YAML, A LIST OF ONE-LEVEL KEY/VALUE PAIRS AS SHOWN, THE CORRECT VALID VALUES FOR EACH OF THE AZURE CREDENTIALS, AND UNIQUE STRINGS OF THE SPECIFIED LENGTHS FOR ALL THE OTHER VALUES.
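Before running any acm command, you could check these syntax rules with a short script. The Python sketch below is a hypothetical helper, not an Agile Cloud Manager feature: parse_flat_yaml accepts only the flat key: value layout shown above (it is not a full YAML parser), and check_values reports Azure IDs that are not GUIDs, unique strings of the wrong length, and values that are repeated across keys.

```python
import re

# Azure subscription, tenant, and client IDs are GUIDs.
GUID = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def parse_flat_yaml(text: str) -> dict[str, str]:
    """Parse the one-level 'key: value' layout used by keys.yaml and config.yaml.

    Not a general YAML parser: it only accepts the flat layout shown above and
    raises ValueError on indentation, duplicate keys, or malformed lines.
    """
    pairs: dict[str, str] = {}
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # blank lines and comments are allowed
        if line[0].isspace():
            raise ValueError(f"line {number}: indented (nested) entry")
        key, sep, value = line.partition(":")
        # YAML needs a space after the colon in 'key: value'.
        if not sep or not key.strip() or not value.startswith(" ") or not value.strip():
            raise ValueError(f"line {number}: expected 'key: value'")
        key = key.strip()
        if key in pairs:
            raise ValueError(f"line {number}: duplicate key {key!r}")
        pairs[key] = value.strip()
    return pairs


def check_values(
    pairs: dict[str, str], guid_keys: list[str], lengths: dict[str, int]
) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems: list[str] = []
    for key in guid_keys:
        value = pairs.get(key)
        if value is None:
            problems.append(f"{key}: missing")
        elif not GUID.fullmatch(value):
            problems.append(f"{key}: not a GUID")
    seen: dict[str, str] = {}  # value -> first key that used it
    for key, length in lengths.items():
        value = pairs.get(key)
        if value is None:
            problems.append(f"{key}: missing")
            continue
        if len(value) != length:
            problems.append(f"{key}: expected {length} characters, got {len(value)}")
        if value in seen:
            problems.append(f"{key}: same value as {seen[value]}")
        else:
            seen[value] = key
    return problems
```

For example, you might call check_values(parse_flat_yaml(text), ["subscriptionId", "tenantId"], {"orgARM": 6, "kvName": 16}) on the text of config.yaml, extending the length map with the rest of the keys listed above.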

STEP THREE: Confirm that ACM Has Been Installed

After you have confirmed that $USER\acm\keys\starter\keys.yaml and $USER\acm\keys\starter\config.yaml have been properly created and placed, navigate the command line to any empty directory into which you want the Agile Cloud Manager to place resources for the given appliance in the DevBox.

Check the version by running the following command:

$ acm version  
1.4  

Always use the newest version of Agile Cloud Manager, because the versions are backward compatible and because we might change these example appliances over time to work only with newer versions.

VERSION 1.4 OR HIGHER OF AGILE CLOUD MANAGER IS REQUIRED FOR THIS APPLIANCE.
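If you script your DevBox preparation, this version check can be automated. The Python sketch below assumes that acm version prints a dotted number such as 1.4, as in the example above. It compares versions numerically rather than as text, so that a hypothetical 1.10 would count as newer than 1.4.

```python
import re

MINIMUM_VERSION = (1, 4)  # this appliance requires Agile Cloud Manager 1.4 or higher


def parse_version(output: str) -> tuple[int, ...]:
    """Pull the first dotted number out of `acm version` output, e.g. '1.4' -> (1, 4)."""
    match = re.search(r"\d+(?:\.\d+)*", output)
    if match is None:
        raise ValueError(f"no version number found in {output!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_supported(output: str, minimum: tuple[int, ...] = MINIMUM_VERSION) -> bool:
    """Compare as integer tuples so that 1.10 counts as newer than 1.4."""
    return parse_version(output) >= minimum


# Hypothetical usage, assuming `acm` is on PATH and prints its version as shown above:
#   import subprocess
#   out = subprocess.run(["acm", "version"], capture_output=True, text=True).stdout
#   print(is_supported(out))
```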

STEP FOUR: Run Setup CLI Command

Navigate to any empty directory in which you want to orchestrate the appliance, then download and install all the requirements for the appliance by running the following CLI command:

acm setup on sourceRepo=https://github.com/AgileCloudInstitute/acm-demo-azure-data-lake-house.git    

After the setup command completes, confirm that your current working directory contains an acmAdmin subdirectory and an acmConfig subdirectory, in addition to subdirectories for the other repositories that are listed in the appliance’s setupConfig.yaml file. You should now be able to find setupConfig.yaml inside the acmConfig subdirectory.

For example, on a Windows DevBox, you might run the dir command and see:

C:\p\a\acm_dl>dir  
 Volume in drive C is Windows  
 Volume Serial Number is 3E5E-9650  
  
 Directory of C:\p\a\acm_dl  
  
10/25/2023  04:48 PM    <DIR>          .  
10/23/2023  09:09 AM    <DIR>          ..  
10/23/2023  09:35 AM    <DIR>          acm-system-templates  
10/23/2023  09:34 AM    <DIR>          acmAdmin  
10/23/2023  09:34 AM    <DIR>          acmConfig  
10/23/2023  09:35 AM    <DIR>          azure-building-blocks  

The acmAdmin and acmConfig subdirectories will be present for any acm working directory after setup is run. The acm-system-templates and azure-building-blocks subdirectories are specific to this demo, and their names can be validated by examining the contents of the setupConfig.yaml file that you will find inside the acmConfig directory.
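This check can also be scripted. The Python sketch below lists anything missing from the working directory after acm setup on has run. The directory names come from the example listing above; treat the two demo-specific names as assumptions to confirm against your own setupConfig.yaml.

```python
from pathlib import Path

# acmAdmin and acmConfig appear in every acm working directory after setup.
# The other two names are specific to this demo; confirm them against
# acmConfig/setupConfig.yaml.
EXPECTED_SUBDIRS = ("acmAdmin", "acmConfig", "acm-system-templates", "azure-building-blocks")


def setup_problems(workdir: Path, expected: tuple[str, ...] = EXPECTED_SUBDIRS) -> list[str]:
    """List what is missing from workdir after `acm setup on` has completed."""
    problems = [
        f"missing subdirectory: {name}"
        for name in expected
        if not (workdir / name).is_dir()
    ]
    if not (workdir / "acmConfig" / "setupConfig.yaml").is_file():
        problems.append("missing file: acmConfig/setupConfig.yaml")
    return problems


# Usage, from the working directory where you ran setup:
#   print(setup_problems(Path.cwd()) or "setup looks complete")
```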

STEP FIVE: Run Beginner CLI Commands

After validating setup, start with some narrowly-scoped commands so that you can examine what is happening under the hood when Agile Cloud Manager runs.

Narrowly-scoped commands also make it easier for you to diagnose any problems that might occur, such as incorrect syntax in one of the keys.yaml or config.yaml variables, a Client ID that has not been assigned Subscription Owner permissions, or an Azure network outage.

First, create the foundation of the lakehouse-azure-core system by running the following command:

acm foundation on systemName=lakehouse-azure-core  

You can watch the foundation being created in the following two ways:

  1. The command line prints structured logs organized by workflow steps, which you can read about in the documentation on this web site. The same logs are stored in local log files on your DevBox machine.
  2. The Azure Portal GUI console at portal.azure.com has a resource groups dashboard. You can log in to that dashboard and watch Agile Cloud Manager’s automation create and delete resource groups. NOTE: If you are one of a small percentage of users whose Azure resource groups dashboard does not show the resource creation, you can instead look for the same resources by navigating in the Azure Portal GUI to “Home>Subscriptions>YourSubscriptionName>Resources”.

Continue to review the command line logs and the Azure Portal GUI as you run each of the remaining commands as follows.

Second, only after the foundation has been created successfully, create the services for the lakehouse-azure-core system by running the following command:

acm services on systemName=lakehouse-azure-core  

Third, after the services have been successfully created for the lakehouse-azure-core system, destroy the services by running the following command:

acm services off systemName=lakehouse-azure-core  

Fourth, after the services have been destroyed successfully, now destroy the foundation of the lakehouse-azure-core system by running the following command:

acm foundation off systemName=lakehouse-azure-core  

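The four Step Five commands can also be run as one guarded sequence, in which each command runs only after the previous one succeeds. This Python sketch is illustrative only: it assumes that acm is on your PATH and exits with a non-zero status when a command fails, which this article does not state. Running the commands by hand, as described above, still gives you the chance to review the logs and the Azure Portal between steps.

```python
import subprocess
from typing import Callable, Sequence

SYSTEM = "systemName=lakehouse-azure-core"

# The four narrowly-scoped commands from Step Five, in order.
BEGINNER_STEPS = [
    ["acm", "foundation", "on", SYSTEM],
    ["acm", "services", "on", SYSTEM],
    ["acm", "services", "off", SYSTEM],
    ["acm", "foundation", "off", SYSTEM],
]


def run_with_subprocess(command: Sequence[str]) -> int:
    """Run one command with its logs streaming to this console; return the exit code."""
    return subprocess.run(list(command)).returncode


def run_in_order(
    steps: list[list[str]],
    run: Callable[[Sequence[str]], int] = run_with_subprocess,
) -> list[str]:
    """Run each step only after the previous one succeeded; return the steps that ran.

    Assumes acm exits with a non-zero status when a step fails.
    """
    completed: list[str] = []
    for command in steps:
        text = " ".join(command)
        print(f"--> {text}")
        if run(command) != 0:
            raise RuntimeError(f"'{text}' failed; read the logs before going further")
        completed.append(text)
    return completed


# Usage, from the working directory where you ran setup:
#   run_in_order(BEGINNER_STEPS)
```

Passing the runner in as a parameter lets you dry-run the sequence first, for example with a function that only prints each command and returns 0.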
You have now created and then destroyed the core system.

STEP SIX: Run Appliance Commands

You can create the entire data lake house appliance with a single command by running the following:

acm appliance on  

The logs will be a lot more complex, because a lot more is being done under the hood when you create an entire appliance with only one command.

If your keys.yaml and config.yaml are correct and Azure is available, the command should complete without errors.

But an Azure service outage or a typo in your keys.yaml or your config.yaml could cause an error to be thrown.

If any problems occur, you can diagnose them using the logging documentation linked above.

The logs make it easy to pinpoint exactly where in the workflow a problem occurred.

Fix the problem, and then run the “acm appliance on” command again to confirm that it has been resolved.

After the appliance has been created, you must destroy the appliance with the following command:

acm appliance off  

Confirm in the Azure Portal GUI that all involved resources have been deleted before moving on. If you encounter a problem, you can diagnose the problem by examining the logs. If you need help, create a ticket at the project website and someone will respond to help you in a timely manner.

STEP SEVEN: Run Other CLI Commands

Experiment with other CLI commands after the appliance has been destroyed. The other CLI commands will enable you to create and destroy individual components of the appliance.

The documentation for the CLI commands is at this link.

You can read about the language that defines the objects on which the CLI commands work at this second link.

And you can read about operating on the object model using the CLI at this third link.

STEP EIGHT: Clean Up

Back up keys.yaml and config.yaml to a safe location so that you can reuse them later.

Confirm that all relevant resources have been deleted by viewing the Azure Portal GUI’s resource groups dashboard, or “Home>Subscriptions>YourSubscriptionName>Resources” if you are one of the small percentage of users for whom Azure does not display resources in the resource groups dashboard.

Run acm setup off if you wish.

Sometimes Windows machines will give a permissions error when the setup off command tries to delete local git folders in repositories that were downloaded during the setup on command. If that happens, you can delete the downloaded contents using Windows File Explorer. The setup off command only deletes the folders in the working directory in which you ran the setup on command, so you can just as easily delete those subfolders manually. You can also avoid this problem by running acm setup off from a command prompt with administrator privileges.

Confirm that anything you created has now been deleted.

After you have backed up keys.yaml and config.yaml to a safe location, make sure that neither file remains in your $USER\acm\keys\starter directory.
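The back-up-then-remove steps can be sketched in Python as follows. The backup location is your choice (keys.yaml holds your Client Secret, so pick somewhere protected). The function name and the byte-for-byte comparison before deleting are illustrative assumptions, not part of Agile Cloud Manager.

```python
import shutil
from pathlib import Path


def back_up_and_remove(starter_dir: Path, backup_dir: Path) -> list[Path]:
    """Copy keys.yaml and config.yaml to backup_dir, then delete the originals.

    An original is deleted only after its copy has been verified byte for byte.
    Returns the paths of the backup copies that were written.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    saved: list[Path] = []
    for name in ("keys.yaml", "config.yaml"):
        source = starter_dir / name
        if not source.is_file():
            continue  # nothing to back up for this name
        target = backup_dir / name
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        if target.read_bytes() != source.read_bytes():
            raise RuntimeError(f"backup of {name} does not match; original kept")
        source.unlink()
        saved.append(target)
    return saved


# Hypothetical usage, assuming $USER\acm\keys\starter is under your home directory:
#   back_up_and_remove(Path.home() / "acm" / "keys" / "starter", Path("D:/secure-backup"))
```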

Dig Deeper

If you encounter any errors, or if you want to experiment, dig deeper, and potentially clean up after running “acm appliance on” and “acm appliance off”, you can read the instructions at this link.

Return to the list of example appliances at this link