Data Engineering on AWS - A Streaming Data Pipeline Solution

via AWS Skill Builder


Overview

In this course, you will learn to build a streaming data analytics solution using AWS services, including Amazon Kinesis, Amazon Data Firehose, and Amazon Managed Streaming for Apache Kafka (Amazon MSK). Kinesis is a massively scalable and durable real-time data streaming service. Amazon MSK offers a secure, fully managed, and highly available Apache Kafka service.
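
As context for what ingestion into Kinesis looks like in practice, here is a minimal sketch using the AWS SDK for Python (boto3). The stream name and event payload are hypothetical placeholders, not resources provisioned by the course.

```python
# Minimal sketch: write one record to a Kinesis data stream with boto3.
# "example-clickstream" and the event fields are hypothetical placeholders.
import json
import boto3

kinesis = boto3.client("kinesis")

event = {"user_id": "u-123", "action": "add_to_cart", "price": 19.99}

response = kinesis.put_record(
    StreamName="example-clickstream",         # stream must already exist
    Data=json.dumps(event).encode("utf-8"),   # payload is sent as bytes
    PartitionKey=event["user_id"],            # controls shard assignment
)
print(response["ShardId"], response["SequenceNumber"])
```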

You will learn how Kinesis and Amazon MSK integrate with AWS services such as AWS Glue and AWS Lambda. The course addresses the streaming data ingestion, stream storage, and stream processing components of the data analytics pipeline. You will also learn to apply security, performance, and cost management best practices to the operation of Kinesis and Amazon MSK.
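
To illustrate the Lambda integration mentioned above, the following is a minimal sketch of a handler invoked through a Kinesis event source mapping. The record fields follow the standard Kinesis event structure; the processing step is a hypothetical placeholder.

```python
# Minimal sketch: a Lambda handler consuming a batch of Kinesis records.
import base64
import json

def lambda_handler(event, context):
    for record in event["Records"]:
        # Kinesis payloads arrive base64-encoded in the event document.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        print(f"partition key {record['kinesis']['partitionKey']}: {payload}")
    # Relevant only if ReportBatchItemFailures is enabled on the event source mapping.
    return {"batchItemFailures": []}
```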

The course is divided into learning modules and lab modules. The learning modules introduce new concepts and the AWS services you can use to build your solution. The lab modules are in-depth, hands-on activities with step-by-step instructions for applying what you’ve learned.

Activities: Interactive content, videos, knowledge checks, assessments, and hands-on labs

Course objectives:

  • Recognize an analytics customer challenge and describe an appropriate AWS solution, built on a streaming data architecture, for solving it.
  • Describe data sources suitable for streaming applications and how that data is ingested.
  • Identify short-term and long-term storage services for streaming data.
  • Describe how to design and implement real-time data processing solutions.
  • Recognize how to serve streaming data for consumption by end users.
  • Describe how to optimize a streaming data pipeline using Amazon Kinesis, Amazon MSK, and Amazon Redshift.
  • Identify best practices for securing a streaming data pipeline.

Intended audience:

  • Data engineer
  • Data analyst
  • Data architect
  • Business intelligence engineer

Recommended skills:

  • 2–3 years of experience in data engineering
  • 1–2 years of hands-on experience with AWS services
  • Completed AWS Cloud Practitioner Essentials or equivalent
  • Completed Fundamentals of Analytics on AWS Part 1 and 2
  • Completed Data Engineering on AWS – Foundations

Course outline:

Module 1: Building a Streaming Data Pipeline Solution

This module shows how to identify, select, and configure the appropriate AWS services for building a streaming data pipeline solution to meet a fictitious customer's business goals.

  • Introduction
  • Ingesting Data from Stream Sources
  • Storing Streaming Data
  • Processing Data
  • Analyzing Data
  • Final Assessment
  • Conclusion

Module 2: Streaming Analytics with Amazon Managed Service for Apache Flink (Lab)

This lab is a step-by-step, hands-on activity in which you build a stream processing pipeline by ingesting clickstream data and enriching it with catalog data stored in Amazon Simple Storage Service (Amazon S3). You then analyze the enriched data to identify sales per category in real time and visualize the output.
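
As a rough sketch of the kind of analysis the lab builds, the following PyFlink snippet joins a clickstream table with a catalog table and aggregates sales per category over one-minute windows. Table and column names are hypothetical; in the Studio notebook the table environment and the catalog-backed tables are provided for you, so treat this only as an outline of the query shape.

```python
# Minimal sketch, assuming "clickstream_events" and "product_catalog" are
# already registered (for example, in the AWS Glue Data Catalog).
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Sales per category over one-minute tumbling event-time windows.
t_env.execute_sql("""
    SELECT
        c.category,
        TUMBLE_START(s.event_time, INTERVAL '1' MINUTE) AS window_start,
        SUM(s.price) AS sales
    FROM clickstream_events AS s
    JOIN product_catalog AS c
      ON s.product_id = c.product_id
    GROUP BY
        c.category,
        TUMBLE(s.event_time, INTERVAL '1' MINUTE)
""").print()
```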

  • Lab overview
  • Task 1: Set up the Zeppelin notebook environment
  • Task 2: Connect to the Amazon EC2 producer and start the clickstream generator
  • Task 3: Import the Zeppelin notebook
  • Task 4: Develop analytics in Amazon Managed Service for Apache Flink Studio with the Zeppelin notebook
  • Task 5: Understand in-memory table creation in the AWS Glue Data Catalog
  • Conclusion

Module 3: Optimizing and Securing a Streaming Data Pipeline Solution

This module covers how to configure a fictitious customer's streaming data pipeline solution to increase efficiency, control costs, secure and protect the data, and govern the infrastructure.
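
As one concrete example of the kind of security control this module discusses, here is a minimal sketch of enabling server-side encryption on a Kinesis data stream with boto3. The stream name is a hypothetical placeholder, and the AWS managed key is used for simplicity; a customer managed KMS key works the same way.

```python
# Minimal sketch: turn on server-side encryption for an existing stream.
import boto3

kinesis = boto3.client("kinesis")

kinesis.start_stream_encryption(
    StreamName="example-clickstream",   # hypothetical stream name
    EncryptionType="KMS",
    KeyId="alias/aws/kinesis",          # AWS managed key for Kinesis
)
```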

  • Optimization
  • Security and Governance
  • Final Assessment

