Data Engineering on AWS - A Streaming Data Pipeline Solution (Includes Labs)

via AWS Skill Builder


Overview

In this course, you will learn to build a streaming data analytics solution using AWS services, including Amazon Kinesis, Amazon Data Firehose, and Amazon Managed Streaming for Apache Kafka (Amazon MSK). Amazon Kinesis is a massively scalable and durable real-time data streaming service. Amazon Data Firehose delivers streaming data to destinations such as Amazon S3 and Amazon Redshift. Amazon MSK offers a secure, fully managed, and highly available Apache Kafka service.

You will learn how Kinesis and Amazon MSK integrate with AWS services such as AWS Glue and AWS Lambda. The course addresses the streaming data ingestion, stream storage, and stream processing components of the data analytics pipeline. You will also learn to apply security, performance, and cost management best practices to the operation of Kinesis and Amazon MSK.
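As a concrete illustration of the AWS Lambda integration mentioned above, the following is a minimal sketch, not taken from the course materials, of a Lambda handler that consumes a batch of records from a Kinesis data stream. The event field names used here (such as event_type) are assumptions for illustration only.

```python
import base64
import json

def lambda_handler(event, context):
    """Minimal sketch: process a batch of records that Kinesis delivers to Lambda."""
    processed = 0
    for record in event["Records"]:
        # Kinesis record payloads arrive base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        click_event = json.loads(payload)

        # Hypothetical processing step: log the partition key and an assumed field.
        print(
            f"partition_key={record['kinesis']['partitionKey']} "
            f"event_type={click_event.get('event_type', 'unknown')}"
        )
        processed += 1

    return {"processedRecords": processed}
```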

The course is divided into learning modules and lab modules. The learning modules introduce new concepts and the AWS services you can use to build your solution. The lab modules are in-depth, hands-on activities with step-by-step instructions for applying what you’ve learned.

Activities

Interactive content, videos, knowledge checks, assessments, and hands-on labs

Course Objectives

  • Recognize an analytics customer challenge and describe an appropriate AWS solution, featuring a streaming data architecture, for solving it.
  • Describe data sources suitable for streaming applications and how that data is ingested.
  • Identify short-term and long-term storage services for streaming data.
  • Describe how to design and implement real-time data processing solutions.
  • Recognize how to serve streaming data for consumption by end users.
  • Describe how to optimize a streaming data pipeline using Amazon Kinesis, Amazon MSK, and Amazon Redshift.
  • Identify best practices for securing a streaming data pipeline.

Intended Audience

  • Data engineer
  • Data analyst
  • Data architect
  • Business intelligence engineer

Recommended Skills

  • 2–3 years of experience in data engineering
  • 1–2 years of hands-on experience with AWS services
  • Completed AWS Cloud Practitioner Essentials or equivalent
  • Completed Fundamentals of Analytics on AWS Part 1 and 2
  • Completed Data Engineering on AWS – Foundations

Course Outline

Module 1: Building a Streaming Data Pipeline Solution (75 min)

This module shows how to identify, select, and configure the appropriate AWS services to build a streaming data pipeline solution that meets a fictitious customer's business goals.

  • Introduction
  • Ingesting Data from Stream Sources
  • Storing Streaming Data
  • Processing Data
  • Analyzing Data
  • Final Assessment
  • Conclusion

Module 2: Streaming Analytics with Amazon Managed Service for Apache Flink (Lab) (45 min)

This lab is a step-by-step, hands-on activity in which you build a stream processing pipeline by ingesting clickstream data and enriching it with catalog data stored in Amazon Simple Storage Service (Amazon S3). You analyze the enriched data to identify sales per category in real time and visualize the output. An illustrative producer sketch follows the task list below.

  • Lab overview
  • Task 1: Set up the Zeppelin notebook environment
  • Task 2: Connect to the Amazon EC2 producer and start the clickstream generator
  • Task 3: Import the Zeppelin notebook
  • Task 4: Develop analytics in Amazon Managed Service for Apache Flink Studio with the Zeppelin notebook
  • Task 5: Understand in-memory table creation in the AWS Glue Data Catalog
  • Conclusion
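For orientation only: in the lab, the clickstream generator already runs on the provided Amazon EC2 producer instance (Task 2). The sketch below is an illustrative stand-in for such a generator, assuming a hypothetical stream name and event schema, and sending synthetic events to a Kinesis data stream with boto3.

```python
import json
import random
import time
import uuid

import boto3

# Hypothetical names; the lab environment provisions its own resources.
STREAM_NAME = "clickstream-lab-stream"
CATEGORIES = ["electronics", "books", "apparel", "home"]

kinesis = boto3.client("kinesis")

def generate_click_event() -> dict:
    """Build one synthetic clickstream event (illustrative schema only)."""
    return {
        "event_id": str(uuid.uuid4()),
        "product_category": random.choice(CATEGORIES),
        "price": round(random.uniform(5.0, 200.0), 2),
        "event_time": int(time.time() * 1000),
    }

def main() -> None:
    while True:
        event = generate_click_event()
        # Partitioning by category keeps each category's events on a consistent shard.
        kinesis.put_record(
            StreamName=STREAM_NAME,
            Data=json.dumps(event).encode("utf-8"),
            PartitionKey=event["product_category"],
        )
        time.sleep(0.1)

if __name__ == "__main__":
    main()
```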

Module 3: Optimizing and Securing a Streaming Data Pipeline Solution (45 min)

This module covers how to configure a fictitious customer's streaming data pipeline solution to increase efficiency, control costs, secure and protect the data, and govern the infrastructure.
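One typical control for securing streaming data at rest is server-side encryption on the stream storage layer. As a minimal sketch, assuming a hypothetical stream name and the AWS managed KMS key for Kinesis (these values are not from the course), the boto3 calls below enable and then check encryption on a Kinesis data stream.

```python
import boto3

# Hypothetical identifiers; substitute your own stream name and KMS key.
STREAM_NAME = "clickstream-lab-stream"
KMS_KEY_ID = "alias/aws/kinesis"  # AWS managed key for Kinesis

kinesis = boto3.client("kinesis")

# Enable server-side encryption at rest for the stream using AWS KMS.
kinesis.start_stream_encryption(
    StreamName=STREAM_NAME,
    EncryptionType="KMS",
    KeyId=KMS_KEY_ID,
)

# Check the stream summary; the encryption type is reported once the stream finishes updating.
summary = kinesis.describe_stream_summary(StreamName=STREAM_NAME)
print(summary["StreamDescriptionSummary"].get("EncryptionType"))
```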
