Agile Cloud Institute

Cross-Functional Architecture And Tools For Cloud-Based Operating Models

Data Lake House in AWS using CloudFormation with Agile Cloud Manager

This is a full-fledged data lake house appliance comprising several systems, complete with networking, IAM roles, source data ingested into S3, EMR clusters, a Glue Crawler example, and a SageMaker Studio domain.

The embedded video on this page will ask you to copy and paste things from the text below, and will also illustrate each step being done. The architecture of the appliance created on this page is described in a separate video in the architecture section of this web site.

STEP ONE: Create DevBox

You will need a provisioned DevBox in order to orchestrate this appliance using the Agile Cloud Manager. The DevBox can run either Windows or Linux.

For convenience, we have defined a simple process for spinning up a DevBox at this link.

If you already have a working DevBox that you created for one of the other appliance examples, you can reuse that DevBox for this example if you do the following things first:

STEP TWO: Create Administrative User

You will need an administrative user with specific permissions in order to run this example appliance.

You will also need to be working in the us-east-1 (N. Virginia) AWS region because some of the resource types are only available in us-east-1.

Begin by switching to the us-east-1 (N. Virginia) region, as shown in the following screenshot.

Then open a new CloudShell session, this time in the us-east-1 region.

Type the following command to download the template required to create the administrative user:

wget "https://github.com/AgileCloudInstitute/aws-building-blocks/blob/master/cf/lf-admin-user.yaml?raw=true" -O lf-admin-user.yaml

Then create the administrative user with the following command:

aws cloudformation create-stack --stack-name adminUser --capabilities CAPABILITY_NAMED_IAM --template-body file://lf-admin-user.yaml

Once the CLI command has run, open the AWS CloudFormation service in another browser tab and make sure to switch to the us-east-1 (N. Virginia) region. Locate the stack, which should be named adminUser, and wait for the stack status to read “CREATE_COMPLETE”.
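If you would rather wait from the command line than watch the console, the AWS CLI offers a `wait stack-create-complete` subcommand. The following is a minimal Python sketch, assuming the AWS CLI is installed and configured wherever the script runs. The helper names `wait_command` and `wait_for_stack` are hypothetical and are not part of the Agile Cloud Manager.

```python
import subprocess


def wait_command(stack_name, region="us-east-1"):
    """Build the AWS CLI command that blocks until a stack reaches CREATE_COMPLETE."""
    return [
        "aws", "cloudformation", "wait", "stack-create-complete",
        "--stack-name", stack_name,
        "--region", region,
    ]


def wait_for_stack(stack_name, region="us-east-1"):
    """Run the waiter; raises CalledProcessError if it exits with a nonzero status."""
    subprocess.run(wait_command(stack_name, region), check=True)
```

Calling `wait_for_stack("adminUser")` should return once the stack from this step has finished creating. A nonzero exit from the waiter (for example after a failed creation or a timeout) surfaces as an exception.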

Get the AWS Secrets as follows:

  1. Navigate to the Outputs tab of the stack.
  2. You will harvest the value of the “AccessKeyIdAdmin” output variable to use as the value of “AWSAccessKeyId” in your keys.yaml file, and you will harvest the value of the “SecretAccessKeyAdmin” output variable to use as the value of “AWSSecretKey” in your keys.yaml file.
  3. The Outputs tab in CloudFormation will have the URL of an AWS Secrets Manager secret for each of the two secrets.
  4. You will need to click from the outputs section to open the AWS Secrets Manager secret that will store each of these secret values.
  5. Then, in each AWS Secrets Manager secret’s page, click the “Retrieve Secret Value” button to reveal the secret.

You can see how this will look in the next section.
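The Outputs tab can also be read from the command line. The sketch below assumes you have saved the JSON printed by `aws cloudformation describe-stacks --stack-name adminUser --region us-east-1` and want to list the output variables named above. The function name `stack_outputs` is hypothetical, and the sample values are placeholders rather than real Secrets Manager URLs.

```python
import json


def stack_outputs(describe_stacks_json):
    """Map OutputKey -> OutputValue for the first stack in a describe-stacks response."""
    response = json.loads(describe_stacks_json)
    outputs = response["Stacks"][0].get("Outputs", [])
    return {item["OutputKey"]: item["OutputValue"] for item in outputs}


# Illustrative response only; the OutputValue strings are placeholders.
sample = json.dumps({
    "Stacks": [{
        "StackName": "adminUser",
        "Outputs": [
            {"OutputKey": "AccessKeyIdAdmin", "OutputValue": "<secrets-manager-url-1>"},
            {"OutputKey": "SecretAccessKeyAdmin", "OutputValue": "<secrets-manager-url-2>"},
        ],
    }]
})

for key, value in stack_outputs(sample).items():
    print(f"{key}: {value}")
```

The values printed are the links to the secrets, not the secrets themselves, so you still need to retrieve each secret value as described in the list above.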

STEP THREE: Create keys.yaml

Create a file in the DevBox named keys.yaml at $USER\acm\keys\starter\keys.yaml

After you have generated the keys for the newly-created admin user in STEP TWO above, place the AWSAccessKeyId and AWSSecretKey into your $USER\acm\keys\starter\keys.yaml under secretsType: master, so that your entire keys.yaml will look as follows:

secretsType: master  
AWSAccessKeyId: <ACTUAL-ID-REDACTED>  
AWSSecretKey: <long-alpha-numeric-actual-key-redacted>  
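Before moving on, it can help to confirm that the file has the three expected entries and that the placeholders were replaced. The following is a rough sketch using only the Python standard library. It assumes keys.yaml is a flat list of `key: value` lines as shown above, and the function names `parse_flat_yaml` and `keys_problems` are hypothetical.

```python
REQUIRED_KEYS = ("secretsType", "AWSAccessKeyId", "AWSSecretKey")


def parse_flat_yaml(text):
    """Parse simple 'key: value' lines; skips blank lines and # comments."""
    pairs = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or ":" not in stripped:
            continue
        key, _, value = stripped.partition(":")
        pairs[key.strip()] = value.strip()
    return pairs


def keys_problems(text):
    """Return a list of human-readable problems found in keys.yaml content."""
    pairs = parse_flat_yaml(text)
    problems = [f"missing {k}" for k in REQUIRED_KEYS if not pairs.get(k)]
    if pairs.get("secretsType") not in (None, "", "master"):
        problems.append("secretsType should be master")
    for k in ("AWSAccessKeyId", "AWSSecretKey"):
        # A value that still starts with '<' looks like an unreplaced placeholder.
        if pairs.get(k, "").startswith("<"):
            problems.append(f"{k} still holds a placeholder")
    return problems
```

An empty list from `keys_problems(open(path).read())` suggests the file is shaped correctly. It does not prove that the credentials themselves are valid.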

STEP FOUR: Create config.yaml

Create a file in the DevBox named config.yaml in the $USER\acm\keys\starter directory and add the following precise contents to the file (REMOVE THE INDENTATION FROM EACH LINE BUT KEEP EVERYTHING ELSE EXACTLY AS IT IS):

TPCDBName: tpc
DBMasterUser: tpcadmin
DBMasterPassword: BigData26!
EEKeyPair: MyKeyPair
LatestAmiId: /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2
organization: lhf3e
networkName: name-of-vnet
sysName: name-of-system
lhFoundationStackName: lhfoundation
region: us-east-1
CFNDatabaseName: tpc
lfUsersStackName: lh-iam-users
lfgluStackName: glue-database
ClassificationVal1: Sensitive
ClassificationVal2: Non-Sensitive
GroupVal1: developer
GroupVal2: campaign
GroupVal3: analyst
emrEngineerStackName: lhengineer
ReleaseLabel: emr-6.7.0
InstanceType: m4.large
EC2KeyPair: MyKeyPair
emrEngineerUserStackName: lh-engineer-users
EMRStepUserPassword: 2!PutRealPasswordInKeysYaml
emrScientistStackName: emr-scientist
glueScientistStackName: glue-scientist
lhScienceFoundationStackName: foundation-scientist
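Because the instructions above require every line to start without indentation, and because this example only works in us-east-1, a quick check of the pasted file can catch copy-and-paste mistakes. This is a sketch under those two assumptions only. The helper names `indented_lines` and `config_region` are hypothetical.

```python
def indented_lines(text):
    """Return 1-based numbers of non-blank lines that start with whitespace."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line.strip() and line[0] in " \t"
    ]


def config_region(text):
    """Return the value of the 'region' key, or None if the key is absent."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "region":
            return value.strip()
    return None
```

For a correctly pasted file, `indented_lines` should return an empty list and `config_region` should return `us-east-1`.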

STEP FIVE: Confirm that ACM Has Been Installed

After you have confirmed that $USER\acm\keys\starter\keys.yaml and $USER\acm\keys\starter\config.yaml have been properly created, navigate the command line to any directory into which you want the Agile Cloud Manager to place resources for the given appliance.

Check the version by running the following command:

$ acm version  
1.3  

The version must be at least 1.3 to successfully run the Lake House example appliance. If you have a lower version installed, upgrade to version 1.3 or later.
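If you script this check, compare the version numerically rather than as text. The sketch below assumes `acm version` prints a bare dotted number such as `1.3`, as in the example above. The function names are hypothetical.

```python
def parse_version(text):
    """Turn a dotted version string such as '1.3' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def version_at_least(installed, required="1.3"):
    """Compare part by part so that 1.10 counts as newer than 1.3."""
    return parse_version(installed) >= parse_version(required)
```

The tuple comparison matters because a plain string comparison would rank `1.10` below `1.3`.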

STEP SIX: Run Setup CLI Command

Download and install all the requirements for the Lake House appliance by running the following CLI command in the new directory into which you want the Agile Cloud Manager to place the resources for the appliance:

acm setup on sourceRepo=https://github.com/AgileCloudInstitute/acm-demo-lake-formation.git    

After the setup command completes, confirm that your current working directory contains an acmAdmin subdirectory and an acmConfig subdirectory, in addition to subdirectories for the other repositories that are listed in the appliance’s setupConfig.yaml file. You should now be able to find setupConfig.yaml inside the acmConfig subdirectory.

For example, on a Windows DevBox, you might run the dir command and see:

C:\path\to\mydirectory>dir  
 Volume in drive C is Windows  
 Volume Serial Number is 3E5E-9650  
  
 Directory of C:\p\a\acm_dl  
  
10/25/2023  04:48 PM    <DIR>          .  
10/23/2023  09:09 AM    <DIR>          ..  
10/23/2023  09:35 AM    <DIR>          acm-system-templates  
10/23/2023  09:34 AM    <DIR>          acmAdmin  
10/23/2023  09:34 AM    <DIR>          acmConfig  
10/23/2023  09:35 AM    <DIR>          aws-building-blocks  

The acmAdmin and acmConfig subdirectories will be present for any acm working directory after setup is run. The acm-system-templates and aws-building-blocks subdirectories are specific to this demo, and their names can be validated by examining the contents of the setupConfig.yaml file that you will find inside the acmConfig directory.
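The same check can be scripted so that it works on both Windows and Linux DevBoxes. This is a minimal sketch that assumes the four subdirectory names shown in the listing above. The function name `missing_subdirs` is hypothetical.

```python
from pathlib import Path

# Names taken from the directory listing above; adjust if setupConfig.yaml differs.
EXPECTED = ("acmAdmin", "acmConfig", "acm-system-templates", "aws-building-blocks")


def missing_subdirs(workdir, expected=EXPECTED):
    """Return the expected subdirectory names that are absent from workdir."""
    root = Path(workdir)
    return [name for name in expected if not (root / name).is_dir()]
```

Running `missing_subdirs(".")` from the working directory should return an empty list after a successful setup.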

STEP SEVEN: Run Appliance CLI Commands

You can create the entire appliance by running “acm appliance on”, and you can destroy the entire appliance by running “acm appliance off”.

But it is better to run smaller, more narrowly-scoped commands the first time.

Narrowly-scoped commands make it easier for you to learn how to examine the logs and to troubleshoot to understand what is going on.

Therefore, run the following create commands one at a time in sequence. Wait until each command has finished running, and monitor the progress in the AWS GUI console to see everything working properly. If you encounter any errors, examine the log files.

acm foundation on systemName=lakehouse-core  
acm services on systemName=lakehouse-core  

acm foundation on systemName=lakehouse-engineer  
acm services on systemName=lakehouse-engineer  
  
acm foundation on systemName=lakehouse-scientist  
acm services on systemName=lakehouse-scientist  

Then after you get all the above “on” commands working properly, run the “off” commands one at a time as follows:

acm services off systemName=lakehouse-scientist  
acm foundation off systemName=lakehouse-scientist  
  
acm services off systemName=lakehouse-engineer  
acm foundation off systemName=lakehouse-engineer  
  
acm services off systemName=lakehouse-core  
acm foundation off systemName=lakehouse-core  
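Note that the “off” commands above are the “on” commands in exactly reversed order: services come down before their foundation, and the last system created is the first one destroyed. The following sketch generates both sequences from the three system names used above, which makes that ordering explicit. The function names are hypothetical, and the sketch only prints the commands rather than running them.

```python
SYSTEMS = ["lakehouse-core", "lakehouse-engineer", "lakehouse-scientist"]


def on_commands(systems=SYSTEMS):
    """Foundation before services, systems in the order listed."""
    return [
        f"acm {layer} on systemName={name}"
        for name in systems
        for layer in ("foundation", "services")
    ]


def off_commands(systems=SYSTEMS):
    """Exact reverse of creation: services before foundation, last system first."""
    return [
        f"acm {layer} off systemName={name}"
        for name in reversed(systems)
        for layer in ("services", "foundation")
    ]


for command in on_commands() + off_commands():
    print(command)
```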

You can watch the appliance being created in the following two ways:

  1. The command line will print out structured logs organized by workflow steps, which you can read about in the documentation on this web site. These same logs will be stored in local log files on your machine.
  2. The AWS GUI console at aws.amazon.com has a CloudFormation stacks dashboard. You can log in to that dashboard and select the appropriate region to view the stacks in progress. The region must be us-east-1 for this example appliance.

After you have gotten all of the narrowly-scoped commands working, you can create and then destroy the entire appliance by running the following two commands one after the other:

acm appliance on  
acm appliance off    

Confirm in the AWS GUI console that all involved stacks have been deleted before moving on. If you encounter a problem, you can diagnose the problem by examining the logs. If you need help, create a ticket at the project website and someone will respond to help you in a timely manner.

STEP EIGHT: Run Other CLI Commands

Experiment with other CLI commands after the appliance has been destroyed. The other CLI commands will enable you to create and destroy individual components of the appliance.

The documentation for the CLI commands is at this link.

You can read about the language that defines the objects on which the CLI commands work at this second link.

And you can read about operating on the object model using the CLI at this third link.

STEP NINE: Clean Up

Back up keys.yaml and config.yaml someplace safe so that you can re-use them later.

Confirm that all relevant stacks have been deleted by viewing the AWS GUI console’s CloudFormation stacks dashboard for the us-east-1 region.
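The same confirmation can be made from saved CLI output. The sketch below assumes you have captured the JSON printed by `aws cloudformation list-stacks --region us-east-1`, and it reports every stack whose status is anything other than DELETE_COMPLETE. The function name `remaining_stacks` is hypothetical, and the stack names in the sample are illustrative values borrowed from config.yaml, not confirmed names of the deployed stacks.

```python
import json


def remaining_stacks(list_stacks_json):
    """Return (name, status) for every stack whose status is not DELETE_COMPLETE."""
    summaries = json.loads(list_stacks_json).get("StackSummaries", [])
    return [
        (item["StackName"], item["StackStatus"])
        for item in summaries
        if item["StackStatus"] != "DELETE_COMPLETE"
    ]


# Illustrative response only; real output lists every stack in the region.
sample = json.dumps({
    "StackSummaries": [
        {"StackName": "lhfoundation", "StackStatus": "DELETE_COMPLETE"},
        {"StackName": "glue-database", "StackStatus": "DELETE_FAILED"},
    ]
})

print(remaining_stacks(sample))
```

The result will also include stacks unrelated to this appliance, as well as the adminUser stack from STEP TWO if you have not deleted it, so review the list rather than expecting it to be empty.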

Run acm setup off if you wish.

Confirm that anything you created has now been deleted.

Make sure that there are no keys.yaml or config.yaml files left in your $USER\acm\keys\starter directory after you have backed up those files to a safe location.

Dig Deeper

If you encounter any errors, or if you want to experiment, dig deeper, and potentially clean up after running “acm appliance on” and “acm appliance off”, you can try reading the instructions at this link.

Return to the list of example appliances at this link.