Agile Cloud Institute

Cross-Functional Architecture And Tools For Cloud-Based Operating Models

Agile Data Lake House Architecture with the Agile Cloud Manager

The entire transcript of this video is given below the video so that you can read it at your own pace. Screenshots of each slide are also given below to make it easier for you to connect the words with the pictures. We recommend that you both read and watch so that you can grasp the material more completely.

An entire data lake house can be put under Agile management if you decompose it into meaningful subcomponents that can be deployed individually.

Agile Cloud Manager makes it easy to put a data lake house under Agile management because it lets you deploy subcomponents individually or as groups.

Agenda

In this workshop, we will explain how to manage Agile data lake houses by taking you through eight slides summarized as follows.

AgileDataLakeHouses2

What is an Agile Data Lake House?

Now let’s examine what a data lake house is.

AgileDataLakeHouses3

An Agile data lake house is a data architecture that combines the best aspects of three things: data lakes, data warehouses, and Agile management.

Components of a Data Lake House

To illustrate the components of a data lake house, let’s begin with the various types of data sources on the left, and then proceed through the various components all the way to the end users of the data lake house, which you can see on the right.

AgileDataLakeHouses4

The numbers will guide us through the components.

Number 1 on the slide marks the data sources.

Some of the many types of data that can be ingested into a data lake house include:

Storage is marked by number 2 on the slide. The storage layer includes a data lake and a data warehouse.

A central data catalog is number 3 on the slide. The data catalog is a meta-layer that enables all the data in the entire data lake house to be organized and searched. Your organization creates its own data catalog as a navigable map to all the resources in both the data lake and the data warehouse. A well-organized data catalog enables you to search the unstructured data in the data lake as easily as you can search the structured data in the data warehouse.
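To make the idea of a catalog as a navigable map more concrete, here is a minimal sketch in Python. It assumes a catalog is just a set of metadata entries that point at resources in either store. The class names, the entry fields, the resource names, and the tag-based search are all hypothetical illustrations, not the API of any particular catalog product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CatalogEntry:
    """Metadata describing one resource; the data itself stays in storage."""
    name: str
    store: str       # "lake" or "warehouse"
    location: str    # hypothetical path or table name
    tags: frozenset = field(default_factory=frozenset)


class DataCatalog:
    """A searchable map over resources in both the lake and the warehouse."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.name] = entry

    def search(self, tag):
        """Return names of entries carrying the tag, whichever store holds them."""
        return sorted(e.name for e in self._entries.values() if tag in e.tags)


catalog = DataCatalog()
catalog.register(CatalogEntry("call_recordings", "lake", "lake/raw/calls/",
                              frozenset({"sales", "audio"})))
catalog.register(CatalogEntry("orders", "warehouse", "analytics.orders",
                              frozenset({"sales"})))
catalog.register(CatalogEntry("payroll", "warehouse", "hr.payroll",
                              frozenset({"hr"})))

# One search covers unstructured lake data and structured warehouse data alike.
sales_resources = catalog.search("sales")
```

The point of the sketch is that the search never needs to know which store holds a resource, which is what lets the lake be searched as easily as the warehouse.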

Number 4 on the slide marks the data cleansing that gets done in the lake house. Many of your data pipelines are summarized here. This includes Extract-Transform-Load (ETL) pipelines. This also includes Extract-Load-Transform (ELT) pipelines.
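The difference between the two pipeline styles is only the order of the steps. The following Python sketch illustrates this with plain lists and dictionaries standing in for real sources and stores. The function names, the "raw" and "curated" keys, and the cleansing rules are assumptions made for illustration.

```python
def extract(source_rows):
    """Pull raw rows out of a source system (here, just a list)."""
    return list(source_rows)


def transform(rows):
    """Cleanse rows: trim whitespace, normalize case, drop empty values."""
    cleaned = (row.strip().lower() for row in rows)
    return [row for row in cleaned if row]


def run_extract_transform_load(source_rows, store):
    """Transform in the pipeline, then load only the cleansed rows."""
    store["curated"] = transform(extract(source_rows))


def run_extract_load_transform(source_rows, store):
    """Load the raw rows first, then transform them inside the store."""
    store["raw"] = extract(source_rows)
    store["curated"] = transform(store["raw"])


source = ["  Alice ", "", "BOB"]
etl_store, elt_store = {}, {}
run_extract_transform_load(source, etl_store)
run_extract_load_transform(source, elt_store)
```

Both runs end with the same cleansed rows, but only the Extract-Load-Transform run keeps the untouched raw rows in the store, where they remain available for later reprocessing.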

The transaction layer is marked by number 5 on the slide. The transaction layer is what connects the data catalog with the various applications that end users will employ to connect with the data that is summarized in the data catalog.

Integrations between the lake house and other tools employed by end users are summarized by number 6 on the slide. The types of tools that can integrate include:

Many different types of end users are summarized by number 7 on the slide. The types of users include as many different types of people as there might be in your extended enterprise. These include executives, sales teams, marketing teams, operations teams, human resources teams, financial teams, and other types of users.

Number 8 on the slide illustrates that all these different components of the lake house need to have quality administration, including enterprise-grade security, data integrity, and performance.

Different lake house vendors each have their own tools for managing each of the 8 basic types of components of a data lake house.

Architecture of an Agile, tool-agnostic data lake house begins by looking at all these components together as functional units before you even begin to make decisions about which tools to use for each component.

Layered Architecture

It can also help to look at a data lake house as four layers: an ingestion layer, a storage layer, a data management layer, and a consumption layer.

AgileDataLakeHouses5

The ingestion layer is where many different types of data are ingested into both the data lake and the data warehouse. Different types of automation are employed to do the data ingestion depending on the type of data. For example, streaming data might be ingested using tools like Kafka. Batch files might be ingested by scheduled jobs. Other types of data might be ingested using other types of tools.
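One way to picture how the ingestion layer chooses automation by data type is a simple dispatch table. The Python sketch below assumes each type of data maps to exactly one handler. The handler functions are hypothetical stand-ins that only label their input, so no real streaming or scheduling library is called.

```python
def ingest_stream(payload):
    # Stand-in for a streaming consumer (the kind of work a tool like Kafka does).
    return f"streamed:{payload}"


def ingest_batch(payload):
    # Stand-in for a scheduled job that picks up batch files.
    return f"batched:{payload}"


# Hypothetical registry: one ingestion method per type of data.
INGESTORS = {
    "stream": ingest_stream,
    "batch": ingest_batch,
}


def ingest(kind, payload):
    """Route a payload to the ingestion method registered for its type."""
    try:
        handler = INGESTORS[kind]
    except KeyError:
        raise ValueError(f"no ingestion method registered for {kind!r}") from None
    return handler(payload)


stream_result = ingest("stream", "clickstream-event")
batch_result = ingest("batch", "daily-export.csv")
```

Supporting a new type of data then means registering one more handler, without touching the handlers that already exist.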

The storage layer is where all the many different types of data will be housed. The storage layer can be partitioned so that it will be clear how close each partition is to being ready to be consumed. For example, there might be:

The data management layer is where all the computing work gets done. This includes running the many pipelines that transform and write data into the different partitions during the work of discovery, cleaning, processing, monitoring, serving, and archiving. The data catalog is part of the data management layer. All the governance and security get done at the data management layer.

The consumption layer is what many different types of user groups use to interact with the curated unified data store. Each of these user groups uses its own types of tools based on its skill sets and its needs.

Examine The Life Cycle Of Each Element Of Lake House

An Agile data lake house needs to have each of its components managed separately and deployed separately, while remaining part of an integrated, testable whole.

AgileDataLakeHouses6

The architecture therefore must consider the different life cycles of each of the different components.

You can start to identify the different life cycles if you examine three main things:

Group Elements Of Lake House Into Separate Systems

A data lake house is complex enough that it can include several different systems.
Each system in your data lake house can best be defined in a way that has clear functionality, life cycle, and building blocks.

AgileDataLakeHouses7

It can help to think of your data lake house appliance as a core system with several satellite systems.

The core system might include things that are shared by all the other systems. For example, your core system might include storage and IAM roles, among other things.

And each satellite system might define a functional unit that interacts directly with the core system to do specific things. A satellite system might correspond with a specific user group that does specific functions that can be grouped together as a coherent whole.

Building blocks refer to the 3rd party infrastructure templates that the Agile Cloud Manager orchestrates.
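The core-and-satellite idea can be sketched as a small dependency graph. The Python example below assumes that each satellite system depends on the core system and therefore should be deployed after it. That ordering is an assumption made for illustration, and the system names and resource names are hypothetical. This is not the Agile Cloud Manager's own object model.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class System:
    """One system of the appliance: its name, resources, and dependencies."""
    name: str
    resources: list = field(default_factory=list)
    depends_on: list = field(default_factory=list)


def deployment_order(systems):
    """Order systems so each one comes after the systems it depends on."""
    graph = {s.name: set(s.depends_on) for s in systems}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical appliance: a shared core plus two satellite systems.
appliance = [
    System("analytics-satellite", ["dashboards"], depends_on=["core"]),
    System("core", ["storage", "iam-roles"]),
    System("ingestion-satellite", ["batch-jobs"], depends_on=["core"]),
]

order = deployment_order(appliance)
```

Because every satellite lists the core as a dependency, the core always comes first in the resulting order, while the satellites remain independent of one another.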

Organize Components Of Each System

The Agile Cloud Manager’s simple templating language makes it very easy for you to organize the components of each system into units that can be deployed and operated on individually.

AgileDataLakeHouses8

Several different types of services might be included in each system.

The types of services can each do very different yet complementary things.

There might be multiple instances of each type of service in a given system. Each instance might be instantiated using different parameters, so that it behaves differently from the other instances of the same type. One example is running different instances of the same type of service in different regions.

An optional foundation can also be defined for each of your systems.

The foundation is intended to provide shared resources that are used by multiple different types of services. Networking is one example of something that might best be defined in a shared foundation.

The various types of services might each be deployed into the same network, and that same shared network would best be defined in a shared foundation.
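The relationship between a system, its optional foundation, its types of services, and the instances of each type can be sketched as a small data model. The Python example below is a hypothetical illustration only. It is not the Agile Cloud Manager's templating language, and the system name, service name, region values, and network value are all made up.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ServiceInstance:
    """One instance of a service type, distinguished by its parameters."""
    name: str
    parameters: dict


@dataclass
class ServiceType:
    name: str
    instances: list = field(default_factory=list)


@dataclass
class SystemDefinition:
    name: str
    foundation: Optional[dict] = None            # shared resources, e.g. networking
    service_types: list = field(default_factory=list)

    def deployable_units(self):
        """List every unit that could be deployed or operated on individually."""
        units = []
        if self.foundation is not None:
            units.append(f"{self.name}/foundation")
        for service_type in self.service_types:
            for instance in service_type.instances:
                units.append(f"{self.name}/{service_type.name}/{instance.name}")
        return units


# Two instances of the same service type, differing only in their region parameter.
query_engine = ServiceType("query-engine", [
    ServiceInstance("primary", {"region": "region-a"}),
    ServiceInstance("secondary", {"region": "region-b"}),
])
system = SystemDefinition("analytics", foundation={"network": "shared-network"},
                          service_types=[query_engine])
units = system.deployable_units()
```

In this sketch both instances would be deployed into the one network defined by the foundation, which is why the foundation is listed as its own unit ahead of the services that share it.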

Define Agile Products For Each Component

Agile Product Management can enable your organization to manage the evolution of each element of your data lake house separately and as a group of distinct entities, so that your organization can evolve more effectively.

AgileDataLakeHouses8a

Four layers can organize each of the new Agile products that you can define. These layers include:

Each of your new Agile products can get all the individual attention it needs including:

Example Lake House Appliance

A working example of a software-defined data lake house appliance is available in the marketplace section of AgileCloudInstitute.io. You can seed your Agile data lake house program with this working example.

AgileDataLakeHouses9

The example data lake house appliance is composed of three systems:

Each of the systems in the example lake house appliance includes its own:

Every aspect of the example data lake house appliance is software-defined, meaning that nothing will remain after you destroy each instance of the lake house appliance.

Every level defined in the example lake house appliance can be orchestrated with its own CLI deployments that perform CRUD-style create, update, and delete operations on that level.

You can learn about the CLI commands and about the object model in the Engineering section of AgileCloudInstitute.io.

Next Steps:

The next steps after reading this article include:
