What you need to know before you start

Starts 14 July 2026 23:02

Ends 14 July 2026

Data Engineering on AWS - A Batch Data Pipeline Solution (Includes Labs)

Data is constantly flowing into organizations from many sources. To derive insights and value from this data, it needs to go through an orchestrated pipeline of ingestion, storage, processing, and serving stages. This course will teach you how to build scalable, secu...

via AWS Skill Builder

Not specified

Optional upgrade available

All levels

Self-paced

Free

Optional upgrade available

Overview

Data is constantly flowing into organizations from many sources. To derive insights and value from this data, it needs to go through an orchestrated pipeline of ingestion, storage, processing, and serving stages.

This course will teach you how to build scalable, secure, and cost-effective batch data pipelines on AWS.

You will learn best practices for ingesting batch data from sources like databases and data lakes. The course explores services like AWS Glue and Amazon EMR for processing and transforming the raw data into analytics-ready datasets.

The course covers data cataloging with the AWS Glue Data Catalog. You will also learn how to serve processed data for analysis, machine learning, and reporting using services like Amazon Athena and Amazon QuickSight.
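The four stages named above (ingestion, storage, processing, serving) can be illustrated with a minimal in-memory sketch. This is not course material and does not call any AWS service: the sample data, the `raw/batch-001` key, and all function names are hypothetical, and a plain dictionary stands in for an object store such as Amazon S3.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw batch (assumption): temperature readings per city, in Celsius.
RAW_BATCH = """city,temp_c
Paris,21.0
Paris,23.0
Oslo,9.0
Oslo,11.0
"""


def ingest(raw_text):
    """Ingestion: parse one batch of CSV text into row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def store(rows, storage, key):
    """Storage: keep the raw rows under a key (a dict stands in for an object store)."""
    storage[key] = rows
    return key


def process(storage, key):
    """Processing: turn raw rows into an analytics-ready average per city."""
    readings = defaultdict(list)
    for row in storage[key]:
        readings[row["city"]].append(float(row["temp_c"]))
    return {city: sum(values) / len(values) for city, values in readings.items()}


def serve(dataset, city):
    """Serving: answer a consumer query against the processed dataset."""
    return dataset.get(city)


def run_pipeline(raw_text):
    """Run ingestion, storage, and processing in order for one batch."""
    storage = {}
    key = store(ingest(raw_text), storage, "raw/batch-001")
    return process(storage, key)


if __name__ == "__main__":
    dataset = run_pipeline(RAW_BATCH)
    print(serve(dataset, "Paris"))
```

In a real batch pipeline each function would be replaced by a managed service and the run would be triggered on a schedule or by an orchestrator; the sketch only shows how the stages hand data to one another.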

Activities

This course includes interactive content, videos, knowledge checks, assessments, and hands-on labs.

Course objectives

In this course, you will learn to do the following:

Describe the purpose, architecture, and processes of a batch data pipeline solution on AWS.
Identify the appropriate AWS services and configurations for building a batch data pipeline solution.
Explain the processes of data ingestion, processing, cataloging, and serving data for consumption in a batch data pipeline.
Implement automation, orchestration, security, and governance options for a batch data pipeline solution.
Monitor, optimize, and troubleshoot a batch data pipeline solution on AWS.
Build and deploy a batch data pipeline solution using AWS services like Amazon EMR, AWS Glue, Amazon S3, and Amazon Athena. (Labs 1 and 2)

Intended audience

This course is intended for the following job roles:

Data Engineers
Data Scientists
Data Analysts
Business Intelligence Engineers

Prerequisites

We recommend that attendees of this course have the following:

2-3 years of experience in data engineering
1-2 years of hands-on experience with AWS services
Completed AWS Cloud Practitioner Essentials
Completed Fundamentals of Analytics on AWS - Parts 1 and 2
Completed Data Engineering on AWS - Foundations

Course outline

Module 1 - Building a Batch Data Pipeline (35 min)

This section lays the foundation for building a batch data pipeline on AWS.

It covers the key design considerations and data ingestion methods, and it ends with an assessment of your understanding of how to construct a robust batch data pipeline solution.

Lesson 1: Course Navigation
Lesson 2: Introduction
Lesson 3: Designing a Batch Data Pipeline
Lesson 4: Ingesting Data
Lesson 5: Assessment
Lesson 6: Conclusion
Lesson 7: Contact Us

Module 2 - Implementing the Batch Data Pipeline (30 min)

After designing the batch pipeline, this section dives into the implementation details. You'll learn how to process and transform data, catalog it for governance, and serve it for consumption by analytics tools.

An assessment reinforces the concepts.

Lesson 1: Course Navigation
Lesson 2: Introduction
Lesson 3: Processing and Transforming Data
Lesson 4: Cataloging Data
Lesson 5: Serving Data for Consumption
Lesson 6: Assessment
Lesson 7: Conclusion

Module 3 - A Day in the Life of a Data Engineer (Lab) (60 min)

In this lab, you will use temperature and precipitation metrics to determine whether a company should stock summer or winter items for various cities. You'll create an AWS Glue crawler, review IAM policies, view the Data Catalog, run a Glue job to transform data,


