What you need to know before you start

Start: 17 July 2026 13:47

End: 17 July 2026

Introduction to Data Engineering Using Generative AI

Hands-On Beginner's Guide to GenAI and LLMs for Transforming, Loading, and Modeling Data with Python and SQL

via Udemy

6 hours 8 minutes

Optional upgrade available

Not Specified

Learn at your own pace

Paid Course

Overview

Hands-On Beginner's Guide to GenAI and LLMs for Transforming, Loading, and Modeling Data with Python and SQL

What you'll learn:

Use large language models to create Python code to implement data pipelines

Use LLMs to solve data loading, data transformation, and data quality assessment challenges

Create databases and analytic data models using generative AI

Create Python, SQL, and Bash scripts to implement common data engineering tasks

Updated description 3/12/2024

Generative AI tools such as ChatGPT, Claude, and Bard are making data engineering more accessible and more efficient. If you work with spreadsheets or business intelligence tools but aren't too familiar with Python or SQL, then generative AI can help you analyze data and build your own data pipelines and ETL/ELT processes.

Generative AI and LLMs will not replace data engineers or data analysts, but those who know how to use these AI tools will be able to build more capable and reliable data pipelines faster.

They will also have access to a tool that can help them develop their Python, SQL, and data modeling skills by providing a variety of examples of functional code, along with help interpreting error messages and troubleshooting processes that do not work as expected.

Learn Data Engineering Techniques as Well as Data Engineering Tools

In this course, you will learn how to break down data engineering problems into a series of tasks that can be automated using Python, SQL, and command line scripts generated by a large language model (LLM). Prompting an AI to "generate a data processing script to do X, Y, and Z" will probably not get you the results you expect.

LLMs are powerful tools, but they are not oracles. As with any tool, we need to understand what the tool is capable of and how to use the capabilities to meet our needs.

This course shows you how to think through data transformation and loading problems, incrementally building the components of a solution. It is organized into several topics that cover the fundamental skills needed to begin work in data engineering using GenAI, including:

Introduction to large language models, foundation models, and other AI topics related to data engineering. This course uses Claude AI from Anthropic, a large language model that is both well suited to data engineering code generation and free to use.

Working with CSV and JSON files

Data quality and data cleaning, including statistics and visualizations

Extraction, transformation, and load (ETL) / extraction, load, and transform (ELT) processes

Relational and NoSQL databases

Data modeling using dimensional data model patterns

Working with JSON data in relational databases such as PostgreSQL

The course begins with the most basic of data engineering tasks: working with files. You will learn how to quickly filter, transform, and find problems in data sets made up of comma-separated value (CSV) and JSON files.
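As a rough illustration of the kind of file-handling script described here, the sketch below filters a small CSV and a small JSON document using only the Python standard library. The sample data, column names, and function names are hypothetical and do not come from the course materials.

```python
import csv
import io
import json

# Hypothetical sample data; the course's actual data sets are not shown here.
CSV_TEXT = """order_id,customer,amount
1,alice,20.50
2,bob,
3,carol,7.25
"""

JSON_TEXT = '[{"id": 1, "tags": ["a", "b"]}, {"id": 2, "tags": []}]'


def rows_with_missing_amount(csv_text):
    """Return the CSV rows whose 'amount' field is empty."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if not row["amount"].strip()]


def ids_with_tags(json_text):
    """Return the ids of JSON records that have at least one tag."""
    records = json.loads(json_text)
    return [rec["id"] for rec in records if rec["tags"]]


missing = rows_with_missing_amount(CSV_TEXT)
tagged = ids_with_tags(JSON_TEXT)
print(missing)
print(tagged)
```

With real files, the same functions could be given the contents of a file read from disk instead of the inline strings.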

You'll also see how we can create samples from large data sets to efficiently experiment with different solutions to our data engineering needs. You will learn how to generate code that uses command line utilities like awk, a text processing and data extraction tool, and jq, a tool for parsing, filtering, and transforming JSON data.
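The description does not say which sampling method the course uses. One common choice for data that is too large to load at once is reservoir sampling, sketched below in Python as an assumption rather than the course's own approach. The function name, the seed parameter, and the generated rows are all hypothetical.

```python
import random


def reservoir_sample(lines, k, seed=0):
    """Return up to k items chosen uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)  # fixed seed so experiments are repeatable
    sample = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(line)
        else:
            # Replace an existing item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = line
    return sample


# Hypothetical stand-in for a large file; an open file object works the same way.
big_data = (f"row-{n}" for n in range(100_000))
sample = reservoir_sample(big_data, k=5)
print(sample)
```

Because the function reads its input one item at a time, memory use depends on the sample size and not on the size of the data set.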

If you are not familiar with tools like awk and jq, that is no problem. In this course, you will learn how to describe what you want in a solution so the LLM can choose an appropriate tool for the job.

Data quality is a primary concern in any data engineering project.

Fortunately, with GenAI and a basic understanding of data quality checks, you can quickly generate scripts to check for common data quality problems and apply transformations to the data to correct for those problems. Statistics and visualizations are important tools for ensuring data quality.
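A script for common data quality checks might look something like the following minimal sketch. The specific checks (missing values, duplicate keys, non-numeric values), the sample data, and the `quality_report` function are assumptions chosen for illustration; the course may cover different checks.

```python
import csv
import io
from collections import Counter

# Hypothetical data with a missing value, a duplicate key, and a non-numeric amount.
CSV_TEXT = """id,email,amount
1,a@example.com,10.0
2,,12.5
2,b@example.com,abc
3,c@example.com,9.0
"""


def quality_report(csv_text, key="id", numeric=("amount",)):
    """Count missing fields, non-numeric values, and duplicate keys."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    missing = Counter()
    non_numeric = Counter()
    for row in rows:
        for col, value in row.items():
            if value is None or value.strip() == "":
                missing[col] += 1
            elif col in numeric:
                try:
                    float(value)
                except ValueError:
                    non_numeric[col] += 1
    key_counts = Counter(row[key] for row in rows)
    duplicates = sorted(k for k, n in key_counts.items() if n > 1)
    return {
        "rows": len(rows),
        "missing": dict(missing),
        "non_numeric": dict(non_numeric),
        "duplicate_keys": duplicates,
    }


report = quality_report(CSV_TEXT)
print(report)
```

A report like this only flags problems. Deciding how to correct them (dropping rows, filling defaults, or fixing the source) still depends on the problem domain.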

In this course, you will learn how to use basic statistics and visualizations to help with data quality and data exploration. And because generative AI is used to generate code, you can spend more time learning about statistics, visualizations, and how to apply them to your problem domain and less time trying to find syntax errors or debug a logic error in your code.

Databases are the foundation of many applications and data analysis platforms.

You will learn about relational databases as well as NoSQL databases and when to use them. Databases are complicated systems that require that we describe how we want to structure our data.

This process is known as data modeling. This course will introduce data modeling with a focus on dimensional modeling, a commonly used data model pattern in data analytics.

You will also learn how to generate SQL code to implement dimensional models, load data into your database, and query and analyze data once it is loaded.

Now is a great time to become a data engineer because the demand for data engineering skills is high and we now have tools in place that allow us to focus on the problems we are solving while accelerating how quickly we can create scalable, reliable data pipelines.
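To make the dimensional modeling idea above concrete, here is a minimal star schema sketch: one fact table that references two dimension tables, followed by a query that aggregates facts by a dimension attribute. It uses Python's built-in `sqlite3` module as a stand-in so that it runs without a database server; the course description mentions PostgreSQL, and the table names, columns, and rows below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Dimension tables describe the "who/what/when"; the fact table holds the measures.
conn.executescript("""
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    category TEXT NOT NULL
);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    full_date TEXT NOT NULL,
    month INTEGER NOT NULL
);
CREATE TABLE fact_sales (
    date_key INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    quantity INTEGER NOT NULL,
    amount REAL NOT NULL
);
""")

# Load a few hypothetical rows into each table.
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Widget", "Hardware"), (2, "Manual", "Books")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20240311, "2024-03-11", 3), (20240312, "2024-03-12", 3)])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(20240311, 1, 2, 20.0), (20240312, 1, 1, 10.0), (20240312, 2, 3, 45.0)])

# Analyze: total sales amount per product category.
totals = conn.execute("""
    SELECT p.category, SUM(f.amount) AS total_amount
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(totals)
```

The same table layout carries over to PostgreSQL with minor changes to data types.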

Syllabus

Course Description

Introduction to Data Engineering and Generative AI

Course Objectives and Outcomes

Syllabus Overview and Course Structure

Data Engineering Fundamentals

Introduction to Data Pipelines

Key Concepts: ETL (Extract, Transform, Load)

Overview of Data Storage Solutions: Databases, Data Lakes, and Data Warehouses

Introduction to Generative AI

What Is Generative AI?

Overview of Generative Models: GANs, VAEs, and Transformers

Data Collection and Cleaning

Data Sources and Acquisition

Data Quality: Cleaning and Preprocessing Techniques

Automating Data Cleaning with Generative AI

Data Storage Systems

Relational vs. Non-Relational Databases

Introduction to Cloud Storage and Management

Leveraging Generative AI for Data Structuring

Data Transformation and Feature Engineering

Transformation Processes in Data Engineering

Feature Selection and Engineering Techniques

The Role of Generative AI in Feature Creation

Building and Managing Data Pipelines

Pipeline Architecture and Workflow Management

Tools and Platforms for Pipeline Automation

Using AI for Pipeline Optimization

Introduction to Machine Learning Infrastructure

ML Infrastructure in Data Engineering

Managing and Scaling ML Models with Generative AI

Case Studies and Applications of Generative AI in Data Engineering

Real-World Applications and Case Studies

Ethical Considerations and Best Practices

Review and Project Work

Capstone Project: Designing a Data Pipeline with Generative AI Integration

Presentations and Feedback

Conclusion and Future Directions

Summary of Key Topics

Future Trends in Data Engineering and AI

Course Feedback and Additional Learning Resources

Taught by

Dan Sullivan

Subjects

Data Science
